Modeling transcriptomic age using knowledge-primed artificial neural networks

The development of ‘age clocks’, machine learning models predicting age from biological data, has been a major milestone in the search for reliable markers of biological age and has since become an invaluable tool in aging research. However, beyond their unquestionable utility, current clocks offer little insight into the molecular biological processes driving aging, and their inner workings often remain non-transparent. Here we propose a new type of age clock, one that couples predictivity with interpretability of the underlying biology, achieved through the incorporation of prior knowledge into the model design. The clock, an artificial neural network constructed according to well-described biological pathways, allows the prediction of age from gene expression data of skin tissue with high accuracy, while at the same time capturing and revealing aging states of the pathways driving the prediction. The model recapitulates known associations of aging gene knockdowns in simulation experiments and demonstrates its utility in deciphering the main pathways by which accelerated aging conditions such as Hutchinson–Gilford progeria syndrome, as well as pro-longevity interventions like caloric restriction, exert their effects.

2 match quite well with those generated by the pathway-based model (Supplementary Figure 2 c) and only differ slightly in the magnitude of impact predicted. This is interesting, as it suggests that irrespective of the network design, the models appear to learn a similar correlative structure from the data, with the notable difference of course being the interpretability of the neuronal activation patterns, which is only given using the pathway-based architecture.

Comparison of transcriptomic age and visual age
The aging phenotype of the skin is a sum of many factors, prominently including the formation of coarse and fine wrinkles, a loss of structural rigidity and elasticity, as well as changes in texture and pigmentation. The human eye is trained well on the detection of these markers, so we resorted to the use of portrait images in order to bridge the gap between molecular parameters and phenotypic manifestations of aging and validate the predictions of the model. The images were captured in a standardized setup, taking evenly lit (flash diffuser), non-polarized and color-controlled frontal portrait images of the test subjects with eyes closed, any hair covered to reduce the impact of features unrelated to the skin (except facial hair for men), and naturally any make-up removed beforehand. To enable a comparison with the transcriptomic age predictions, we generated estimates of visual age for a subset of 154 subjects from the test set from these images, by asking a group of 31 volunteers to assess the ages of the test subjects from the photographs. Averaging the predictions from all volunteers revealed a strong correlation of the estimated visual ages to the chronological ages of the test subjects, with a median absolute error of only 4.38 years. Subsequent analysis using linear models (Supplementary Table 1) identified a significant positive association between the predicted transcriptomic age and the visual age assessments after adjusting for chronological age and gender of the test subjects (p = 0.016). This delivers evidence of a direct link between the molecular aging state of the skin as captured by the model and tangible phenotypical manifestations of skin aging.

Simulation of gene knockdowns with higher fold-change and multiple knockdowns
The log2 fold-change for the simulated gene knockdowns of all genes in the model was chosen to be in range of what would be realistically expected in expression differences in the context of "normal" physiological aging, therefore was set to a moderately strong effect size of -2. In order to test if observed effects on the age estimates, which were relatively small in magnitude, were also reproducible at higher magnitude, we have calculated the impacts for additional, stronger knockdowns (log2 fold-change = -10) as well, which showed concordantly stronger impact, but matching very well (R 2 = 0.98) with the showed accumulative effects when compared to single gene knockdowns.

Further model validation experiments
To illustrate the gain in reproducibility awarded by the utilization of an adapted loss function including a term optimizing the auxiliary output from the pathway neurons as well as the ensembling approach, we have added data showing the impact on reproducibility from a single network with and without the auxiliary loss operating on the pathway layer, as well the final ensemble model from separately retrained models (Supplementary Figure 4 ac).
To ensure the functioning of the model as expected, we trained another model with a permuted filter matrix, in which gene-pathway affiliations were unchanged but the order of genes was permuted. The model's pathway ranking did not differ substantially from the original model (Supplementary Figure 4 d), indicating that interpretability was not impeded and the model was functioning as intended.
In order to ensure that the relatively even distribution of the pathway ranking in the model was not We also performed several experiments introducing control pathways into the model, artificial pathways consisting of varying numbers of randomly sampled genes (10,25,50,100,200). Analyzing the pathway activation ranking after retraining the model identified the control pathways as showing lower than average correlation to the modeled phenotype in all cases (Supplementary Figure 6 a). We then repeated the perturbation experiments using the photoaging, Hutchinson-Gilford-Progeria-Syndrome and caloric restriction signatures to ensure that interpretability of the model was not impeded by the addition of the additional pathways, which generated results comparable to the original model (Supplementary Figure   6 bd), indicating that interpretability was not affected by the added control pathways. Overall model 5 accuracy was also not significantly altered (Supplementary Figure 6 e). One of the control pathways (control 5, the largest one consisting of 200 randomly sampled genes) ranked relatively high in the pathway ranking compared to the other control sets, presumably due to some correlative information present among the randomly sampled genes. In order to test this, we trained another model using a new set of control pathways consisting of randomly sampled genes with decidedly low associations to age (absolute Pearson correlation coefficient < 0.1). In concordance with the above mentioned hypothesis, these negative control pathways showed far lower correlation to the modeled phenotype (Supplementary In order to generate more thorough statistical estimates for the pathway ranking, we calculated 100 permutations on a network including a control pathway of 150 completely randomly chosen genes (matching the average size of the Hallmark pathways). The analysis of the resulting data using Wilcoxon rank sum tests comparing the Hallmark pathways to the introduced control showed that most of the pathways exhibited significantly stronger age association than the introduced baseline after multiple testing adjustment, with p53 signaling and several inflammatory pathways leading the list (Supplementary Table 2, same data as shown in Figure 2 d).

Supplementary Table 2:
Statistics from comparing the correlation of pathway neuron activations of every pathway in the model to a control pathway of 150 randomly sampled genes using Wilcoxon rank sum tests, with data derived from 100 random permutations, consisting of retraining and testing the neural network ensemble: