Genetic Determinants of Telomere Length in African American Youth

Telomere length (TL) is associated with numerous disease states and is affected by genetic and environmental factors. However, TL has been mostly studied in adult populations of European or Asian ancestry. These studies have identified 34 TL-associated genetic variants recently used as genetic proxies for TL. The generalizability of these associations to pediatric populations and racially diverse populations, specifically of African ancestry, remains unclear. Furthermore, six novel variants associated with TL in a population of European children have been identified but not validated. We measured TL from whole blood samples of 492 healthy African American youth (children and adolescents between 8 and 20 years old) and performed the first genome-wide association study of TL in this population. We were unable to replicate neither the 34 reported genetic associations found in adults nor the six genetic associations found in European children. However, we discovered a novel genome-wide significant association between TL and rs1483898 on chromosome 14. Our results underscore the importance of examining genetic associations with TL in diverse pediatric populations such as African Americans.

that adult TL appears determined prior to adulthood 31 , further research in diverse pediatric populations is necessary to validate the existence of genetic effects on TL early in life.
Previous genetic studies of TL have been done almost exclusively in populations of European ancestry 32 , yet there is evidence that TL varies by race/ethnicity [32][33][34] . African Americans have been shown to have longer telomeres throughout life [34][35][36] and a greater rate of telomere attrition than populations of European ancestry 37 . Population-specific differences in genetic variants have previously been shown across the genome 38 . Thus, it is possible that population-specific variation of genetic factors contributing to TL influences the difference in TL observed between populations of African and European ancestries 33 .
To further understand the relationship between genetic variants and TL, we performed the first large-scale genetic study of TL in African American children and adolescents (n = 492) from the Study of African Americans, Asthma, Genes and Environments (SAGE). Herein, we analyze genome-wide genetic data to attempt validation of previously reported genetic associations with TL and identify genetic variants influencing TL in African American children and adolescents.

Results
Study Population. Demographic information for the study population (n = 492) is presented in Table 1. The age of participants ranged from 8 to 20 with a median age of 15.8 (IQR = 12.4, 18.3; Table 1). Median African ancestry was 0.81 (IQR = 0.74, 0.85; Table 1); additionally, we observed a subtle, but positive, correlation between African Ancestry and TL (β = 0.333, P = 0.022, Supplementary Fig. S1 and Supplementary Table S1). While individuals with public health insurance had significantly longer TL than individuals with private health insurance (P = 1.84 × 10 −4 ), there was no significant association of age or maternal education with TL (Supplementary Table S1).

Evaluation of Previous Variants.
We evaluated 40 variants, 34 from adult studies ( Table 2) and six from a pediatric study (Table 3), for an association with log-transformed TL. None of the variants from either the adult or pediatric studies were significantly associated with TL in our study population (P > 0.05). To determine whether the combined effect of the six previously discovered pediatric variants was associated with TL in our study population, we calculated a weighted genetic prediction score (GPS) by aggregating the allele associated with longer TL in European children weighted by the published β-coefficient 30 . There was no significant association between the GPS and TL in our study population of African American children and adolescents (β = 0.377, P = 0.150, Fig. 1).

Discovery Genome-Wide Association Study.
We performed a discovery GWAS to identify significant and suggestive associations between common genetic variants and TL in our study population. We identified a novel association between rs1483898 and TL that reached genome-wide significance (P = 7.86 × 10 −8 , Fig. 2). Rs1483898 is an intergenic single nucleotide polymorphism (SNP) located proximal to the LRFN5 gene on chromosome 14. An increase in copies of the rs1483898 A allele was significantly associated with longer TL (β = 0.148, P = 7.86 × 10 −8 , Fig. 3) and rs1483898 had a minor allele frequency (MAF) of 0.236 in our study population. We also discovered 41 suggestive associations between common variants and TL (P < 2.32 × 10 −6 , Supplementary  Table S2). Of particular note were rs9675924 (β = −0.171, P = 2.27 × 10 −6 , Supplementary Table S2) located in CABLES1 and rs4305653 (β = 0.167, P = 1.81 × 10 −6 , Supplementary Table S2) located in TTC37. These genes have been previously associated with telomere biology 39,40 .

Discussion
In this study, we contribute to the nascent body of research on genetic determinants of TL by assessing the generalizability of genetic markers of TL to African American children and adolescents. Our results are consistent with recent studies in pediatric populations 29,30 that did not replicate variants associated with TL in adults [17][18][19][20][21][22][23][24][25][26] , suggesting that these variants may not play a significant role in the regulation of TL during the first two decades of growth and development. However, we were also unable to replicate genetic variants associated with TL in a population of European children 30 , highlighting potential population-specific effects of genetic associations with TL. Lastly, we identified a genome-wide significant variant, rs1483898, and 41 suggestively associated variants within genes relevant to telomere biology in a GWAS for TL in African American children and adolescents.    34 variants that have been used in recent years as proxies of TL in studies of disease risk 17,27 . We did not replicate these genetic associations in our study population of African American children and adolescents. Pediatric studies of TL by other groups 29,30 have also been unable to replicate associations found among adults, suggesting that the genetic components influencing TL may differ between adult and pediatric populations. It is possible that variants identified in adults relate to telomere maintenance in adulthood but do not regulate TL during earlier developmental windows. For example, resistance to telomere shortening during childhood may be influenced by genetic factors impacting telomerase, a critical enzyme in telomere elongation 41 that is influenced by genetic loci 42 and shows age-related reduction in activity 43 . TL is determined prior to adulthood dependent on the TL setting at birth and the rates of shortening and elongating during the first two decades of life 10 . These factors have genetic influences that have yet to be fully characterized 1,10 . We attempted replication of six TL-associated genetic variants discovered in healthy children of European ancestry 30 . We found no significant association with TL among the six variants independently or in a weighted GPS, which tests cumulative variation at multiple genetic loci. Heritability estimates of TL range from 36% to 82% 11    limited set of studies assessing TL-associated genetic variants in populations of African ancestry, all have been performed in adult populations. One study discovered genetic variants associated with TL in adults of European ancestry that were not associated in an adult population of African Americans 22 . Another study in adult African Americans was only able to replicate the effects of variants in TERC, the gene encoding the enzyme telomerase, that had been identified in populations of European ancestry 46 . We found a subtle positive association between the proportion of African ancestry and TL in African American children and adolescents, which is consistent with research among adults 34 . Considering TL dynamics vary by race/ethnicity 32-34 , our study augments the current literature by demonstrating that TL-associated genetic variants differ between ancestral populations in the pediatric age range. Ancestry-specific genetic associations with a phenotypic trait have been demonstrated previously 47 , thus, the difference we observed may result from population-specific effects impacting genetic regulation of TL. It is worth noting regional variation in environmental and social exposures between SAGE's urban San Francisco Bay Area and the more rural Nancy, France of the Stathopoulou et al. study 48 as potential factors effecting the association between genetic variants and TL and possibly precluding replication of the Stathopoulou et al. study.
We identified associations with TL in biologically relevant pathways relating to apoptosis, cell senescence and telomere replication. The most significant association, reaching genome-wide significance, was for rs1483898. Rs1483898 is located on the q21.1 arm of chromosome 14 in the regulatory region of LRFN5, a neuronal transmembrane protein. In our population of African American children and adolescents, increasing copies of the A allele of rs1483898 associated with longer TL. The A allele of rs1483898 has allele frequencies of 0.74, 0.86 and 0.45 in the African, European and East Asian populations of 1000Genomes, respectively 49 .
We identified 41 variants that were suggestively associated with TL. The A allele of rs9675924, located within an intron of cell cycle regulator CABLES1 on chromosome 18, is associated with shorter TL in our study population. CABLES1 is co-expressed with protein kinase CDK5, a known contributor to apoptosis in certain neuronal diseases 39 . CABLES1 has also been shown to inhibit cell proliferation and induce cell senescence in umbilical endothelial cells 40 . The C allele of rs4305653 associated with longer telomeres. This variant is located on chromosome 5 within an intron of TTC37, a component of the SKI complex which mediates protein-protein interactions. TTC37 is co-expressed with apoptosis-promoting protein APAF1 as well as with TEP1, a protein that binds to the RNA subunit of telomerase. There is evidence that TEP1 is involved in telomere replication but the nature of that relationship in humans remains unclear 39 . Ultimately, co-expression is only a proxy for co-regulation 50 ; replication and further investigation of our results are needed to better characterize relevant associations between these genetic loci and telomere biology.
Our study lacked an independent replication cohort to assess the reproducibility of our genetic associations due to the unique characteristics of our study population (African American children and adolescents with genetic and TL data). Measurement of TL in our study population provided a snapshot of TL at a specific point in the life course. Longitudinal studies of TL are required to understand changes in TL over the life course. The major advantages of our study are that (1) it provides novel information about the genetic determinants of TL in a non-European pediatric population, and (2) our depth of phenotype data allowed us to adjust for social, environmental and genetic covariates (Supplementary Table S1).
It is important to note that while our study was well powered to detect moderate to strong effects (f 2 = 0.15 and 0.35, respectively) in common variants with a MAF of 0.05 or higher in our study population, we were not powered to detect weak effects (f 2 < 0.02) 51 . It is possible that our inability to replicate the reported variants may be explained by limited statistical power; however, this would only be the case if these variants had effect sizes in our population that were significantly lower than the strong effects reported in previous studies. All but two of the variants of the 40 variants we attempted to replicate were common (MAF > 0.05) in both the populations in which they were discovered and in our study population. In summary, the paucity of research on factors affecting TL in pediatric and non-European populations creates a knowledge gap in the scientific understanding of gene-environment interactions regulating telomeres. Epidemiological studies reporting associations between TL and disease risk are potentially biased by the disease itself or exposures relating to treatment. Genetic proxies for TL have recently been employed to overcome these and other potential biases, such as social and environmental exposures. A critical assumption when using genetic proxies for TL is that they are generalizable across age and racial/ethnic groups. However, we were unable to replicate previous findings of TL-associated variants in our study population. We also identified novel genetic associations with TL that have not been identified in previous studies in pediatric or adult populations. Further telomere research in pediatric populations from diverse ancestral backgrounds is required to fully understand the impacts of age-and population-effects on the genetic regulation of TL.

Methods
Ethics statement. This study has been approved by the institutional review boards of University of California San Francisco, Kaiser Permanente and Children's Hospital Oakland. Written informed consent was obtained from all subjects or from their appropriate surrogates for participants under 18 years old. All methods were performed in accordance with the relevant guidelines and regulations for human subject research.

Study population.
Our study included 492 healthy controls from the Study of African Americans, Asthma, Genes and Environments (SAGE). SAGE is one of the largest ongoing gene-environment interaction studies of asthma in African American children and adolescents in the USA. SAGE includes detailed clinical, social, and environmental data on asthma and asthma-related conditions. Full details of the SAGE study protocols have been described in detail elsewhere [52][53][54] . Briefly, SAGE was initiated in 2006 and recruited participants with and without asthma through a combination of clinic-and community-based recruitment centers in the San Francisco Bay Area. Recruitment for SAGE ended in 2015. All participants in SAGE self-identified as African American and self-reported that all four grandparents were African American.
After all quality control procedures relating to TL measurement, TL was computed for 596 healthy controls in SAGE from whole blood. There were 495 healthy controls with complete sex, age, African ancestry, maternal educational attainment and health insurance information. Three individuals showed extreme outlier measurements for TL (three times the interquartile range) and were thus removed.
Maternal educational attainment was dichotomized based on whether a participant's mother had pursued education beyond high school (i.e., ≤12 versus >12 years of education). Health insurance type was defined as private versus public insurance. The genetic ancestry of each participant was determined using the ADMIXTURE software package 58 with the supervised learning mode assuming two ancestral populations (African and European) using HapMap Phase III data from the YRI and CEU populations as references 59 .
Variant selection and genotyping. TL-associated variants were selected for replication using criteria set a priori. We only tested genetic associations if the (i) published association reaches genome-wide significance (P ≤ 5 × 10 −8 ) on NHGRI-EBI GWAS Catalog by October 26, 2017; (ii) variant used as genetic proxy of TL in at least one study; (iii) variant reaches suggestive genome-wide significance (P ≤ 5 × 10 −5 ) in a novel GWAS of TL in children; (iv) variant has a minor allele frequency (MAF) of at least 1% in our study population. We identified variants from 11 studies [17][18][19][20][21][22][23][24][25][26]30 . Ten of the 11 studies were performed in adult populations and nine of the 11 studies were performed in populations of European descent, with the remaining two performed in Punjab Sikh 24 and Han Chinese 26 populations. In total, we identified 40 variants from the literature, of which 12 were genotyped and 28 were imputed. The 28 imputed SNPs had r 2 (squared correlation between imputed and expected genotypes) ranging from 0.88 to 1.00. DNA was isolated from whole blood collected from SAGE participants at the time of study enrollment using the Wizard ® Genomic DNA Purification kits (Promega, Fitchburg, WI). Samples were genotyped with the Affymetrix Axiom ® LAT1 array (World Array 4, Affymetrix, Santa Clara, CA), which covers 817,810 SNPs. This array was optimized to capture genetic variation in African-descent populations such as African Americans and Latinos 60 . Genotype call accuracy and Axiom array-specific quality control metrics were assessed and applied according to the protocol described in further detail in Online Resource 1. Data was submitted to the Michigan Imputation Server and phased using EAGLE v2.3 and imputed from the Haplotype Reference Consortium r1.1 reference panel using Minimac3 61 . Imputed SNPs were included if they had an r 2 higher than 0.3. Quality control inclusion criteria consisted of individual genotyping efficiency >95%, Hardy-Weinberg Equilibrium (HWE) P > 10 −4 , and MAF > 5%. Cryptic relatedness was also assessed to ensure that samples were effectively unrelated. Samples with an estimate of genetic relatedness greater than 0.025 were excluded. After quality control procedures, 7,519,176 imputed and genotyped SNPs were available for analysis. brief, relative TL for each sample was calculated using the delta-delta C T (2 −∆∆Ct ) formula 62 . Using this formula, the TL computed for each SAGE sample is proportional to the T/S ratio of that sample normalized to the T/S ratio of the PCR plate positive DNA control sample 62,64,65 . Inter-and intra-experimental coefficients of variation for our internal control (1301 cell line DNA) were 3% and 4.25%, respectively. Average amplification efficiency across plates was ≥90% for telomere and 36B4 assays. As TL was not normally distributed in our study population, we performed all parametric tests on a log-transformation of TL.

Replication analysis.
Genotypes for all 40 previously published SNP's in adults and children were tested for association with log-transformed TL in a multivariable linear regression analysis. Regression analyses were run separately for each SNP under an additive model to calculate the individual effect of the SNP on TL. Each regression analysis was adjusted for biological, environmental and social factors that may impact TL including sex, age, African ancestry, maternal education, and health insurance type. We adjusted for qPCR plate ID in all regression analyses to ensure that our results were not impacted by qPCR batch effects. To ensure direct comparison of results between previous studies and our current study we coded the effect alleles in our analysis to be the same as those used in previous studies.
Genetic Prediction Score construction. Recent research suggests the cumulative effect of multiple genetic markers may be a stronger predictor of a quantitative phenotype than the individual markers 66,67 . We therefore constructed a weighted GPS based on the six variants from Stathopoulou et al. to test their cumulative effect on TL 30 . We calculated each subject's weighted GPS by summing the number of alleles (0, 1 or 2) associated with longer telomeres after weighting the allele count by the reported β-coefficient from the literature. We assumed that an effect allele having a positive β-coefficient meant that each additional copy of that allele was positively associated with TL. We used the GPS as a predictor in a linear regression against log-transformed TL controlling for sex, age, genetic ancestry, maternal educational attainment, health insurance type and batch effects. We were unable to calculate a weighted GPS based on the 34 variants in adult studies because the effect size could not be standardized across the studies.
Calculation of population-specific genome-wide significance threshold. The standard GWAS threshold for statistical significance is 5 × 10 −8 . This number was derived by applying a Bonferroni correction for multiple testing to a dataset of one million independent markers/SNPs. However, in many cases, this threshold is overly conservative and can be inappropriate when (1) a smaller number of markers is genotyped, and (2) the assumption of independence of tests is violated. In order to adjust the Bonferroni correction based on the actual number of independent test performed on our dataset, we determined the number of independent tests using the protocol published by Sobota et al. 68 . This method estimates the effective number of independent tests in a genetic dataset after accounting for linkage disequilibrium (LD) between SNPs using the LD pruning function in the PLINK 1.9 software package 69 . The following parameters were used in PLINK 1.9 as advised by the authors: 100 SNP sliding window, step size of 5 base pairs, and a variance inflation factor of 1.25. Applying this method on 7,519,176 genotyped and imputed SNPs yielded 431,896 independent tests, which was then used to calculate the genome-wide significance threshold (Bonferroni correction 0.05/431,896 = 1.2 × 10 −7 ). A suggestive threshold was set at 2.3 × 10 −6 for association results based on the widely used formula: 1/(effective number of tests) 70 .
Discovery Genome-Wide Association Study. We performed a genome-wide association study (GWAS) using 7,519,176 genotyped and imputed SNPs to assess the relationship between SNP genotype and log-transformed TL. The GWAS linear regression model adjusted for sex, age, African ancestry, maternal educational attainment, health insurance type and batch effects. All testing was performed using PLINK1.9 69 . Manhattan plots ( Fig. 2A,B) were generated using the qqman package 71 in the R statistical software environment (R Development Core Team 2010) and LocusZoom 72 . Curated protein-protein interactions were extracted using the STRING database 39 . An integrated confidence score for the interaction ranges from 0.5 (medium confidence) to 1 (high confidence).