Whole-genome sequence-based analysis of thyroid function

Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H.; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D.; Hui, Jennie; Lim, Ee M.; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R.B.; Bell, Jordana T.; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L.; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M.; Naitza, Silvia; Walsh, John P.; Spector, Tim; Davey Smith, George; Durbin, Richard; Brent Richards, J.; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J.; Wilson, Scott G.

doi:10.1038/ncomms6681

Article
Open access
Published: 06 March 2015

Peter N. Taylor¹^na1,
Eleonora Porcu^2,3,4^na1,
Shelby Chew⁵^na1,
Purdey J. Campbell⁵^na1,
Michela Traglia⁶,
Suzanne J. Brown⁵,
Benjamin H. Mullin^5,7,
Hashem A. Shihab⁸,
Josine Min⁸,
Klaudia Walter⁹,
Yasin Memari⁹,
Jie Huang⁹,
Michael R. Barnes ORCID: orcid.org/0000-0001-9097-7381¹⁰,
John P. Beilby^11,12,
Pimphen Charoen^13,14,
Petr Danecek⁹,
Frank Dudbridge¹³,
Vincenzo Forgetta^15,16,
Celia Greenwood^15,16,17,
Elin Grundberg^18,19,
Andrew D. Johnson²⁰,
Jennie Hui^11,12,
Ee M. Lim^5,11,
Shane McCarthy⁹,
Dawn Muddyman⁹,
Vijay Panicker ORCID: orcid.org/0000-0003-1551-8411⁵,
John R.B. Perry^21,22,
Jordana T. Bell²²,
Wei Yuan²²,
Caroline Relton⁸,
Tom Gaunt⁸,
David Schlessinger²³,
Goncalo Abecasis⁴,
Francesco Cucca^2,3,
Gabriela L. Surdulescu²²,
Wolfram Woltersdorf²⁴,
Eleftheria Zeggini⁹,
Hou-Feng Zheng^16,25,
Daniela Toniolo^6,26,
Colin M. Dayan¹,
Silvia Naitza²,
John P. Walsh^5,7,
Tim Spector²²,
George Davey Smith⁸,
Richard Durbin⁹,
J. Brent Richards^15,16,22,25,
Serena Sanna²,
Nicole Soranzo ORCID: orcid.org/0000-0003-1095-3852⁹,
Nicholas J. Timpson⁸^na2,
Scott G. Wilson^5,7,22^na2 &
The UK10K Consortium

Nature Communications volume 6, Article number: 5681 (2015)

An Erratum to this article was published on 20 May 2015

This article has been updated

Abstract

Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10⁻⁹) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10⁻¹⁴). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10⁻⁹) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10⁻¹¹). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.

Introduction

Thyroid hormones have fundamental but diverse roles in vertebrate physiology, ranging from induction of metamorphosis in amphibians to photoperiodic regulation of seasonal breeding in birds¹. In humans, they are essential for adult health and childhood development^2,3, and levothyroxine is one of the most commonly prescribed drugs worldwide. Clinically, thyroid function is assessed by measuring circulating concentrations of free thyroxine (FT4) and the pituitary hormone thyrotropin (TSH); the complex inverse relationship between them renders TSH the more sensitive marker of thyroid status⁴. Even small differences in TSH and FT4, within the normal population reference range, are associated with a wide range of clinical parameters, including blood pressure, lipids and cardiovascular mortality, as well as obesity, bone mineral density and lifetime cancer risk⁵.

Twin and family studies estimate the heritability of TSH and FT4 as up to 65%⁶. Genome-wide association studies (GWAS) identified common variants associated with TSH and FT4^7,8,9; in a recent HapMap-based meta-analysis¹⁰, we identified 19 loci associated with TSH and 4 with FT4. However, these accounted for only 5.6% of the variance in TSH and 2.3% in FT4. Therefore, most of the heritability of these important traits remains unexplained.

The unidentified genetic component of variance might be explained by common variants poorly tagged by markers assessed in previous studies, or those with small effects. However, rarer variants within the minor allele frequency (MAF) spectrum might also account for a substantial proportion of the missing heritability as has been proposed for many polygenic traits¹¹. These variants, although individually rare (MAF<1%), are collectively frequent, and while their effects may be insufficient to produce clear familial aggregation, effect sizes for individual variants are potentially much greater than those observed for common variants. In addition, a greater understanding of the relative proportion of thyroid function explained by common variants is now possible with the availability of whole-genome sequencing (WGS) and this is essential to refine future research and analysis strategies when appraising the genetic architecture of thyroid function.

In this study, the first to utilize WGS to examine the genetic architecture of TSH and FT4, we perform single-point association analysis in two discovery cohorts in the UK10K project with WGS data available and a meta-analysis using genome-wide association study (GWAS) data with deep imputation from five additional data sets. We report three new loci associated with thyroid function in healthy individuals, undertake quantitative trait loci and DNA methylation analyses to further study these relationships and undertake genome-wide complex trait analyses (GCTA)¹² to assess the contributions of common variants (MAF≥1%) to variance in thyroid function. We also explore whether there is a shared polygenic basis between TSH and FT4. In individuals with WGS data, we perform sequence kernel-based association testing (SKAT) analysis to identify regions of the genome where rare variants have the strongest association with thyroid function and identify a novel locus associated with FT4. The results demonstrate that WGS-based analyses can identify rare functional variants and associations derived from rare aggregates. Larger meta-analyses of studies with WGS data are now required to identify additional common and rare variants, which may explain the missing heritability of thyroid function.

Results

Single-point association analysis

In the discovery study, using a meta-analysis of WGS data from the Avon Longitudinal Study of Parents and Children (ALSPAC) and TwinsUK cohorts (N=2,287) analysing up to 8,816,734 markers (Supplementary Tables 1 and 2; Supplementary Methods), we find associations at two previously described loci for TSH. These are NR3C2 (rs11728154; MAF=21.0%, B=0.21, s.e.=0.037, P=8.21 × 10⁻⁹; r²=0.99 with the previously reported rs10028213) and FOXE1 (rs1877431; MAF=39.5%, B=−0.19, s.e.=0.030, P=2.29 × 10⁻¹⁰; r²=0.99 with the previously reported rs965513). We find one borderline signal (between P=5.0 × 10⁻⁸ and P=1.17 × 10⁻⁸) at a novel locus FAM222A (rs11067829; MAF=18.3%, B=0.210, s.e.=0.038, P=3.73 × 10⁻⁸; Supplementary Figs 1a and 2; Supplementary Table 3). No variants show genome-wide significant association for FT4 (Supplementary Figs 1a and 3).

In a meta-analysis of the discovery cohorts and five additional cohorts, we find associations for 13 SNPs at 11 loci for TSH (N=16,335), of which 11 SNPs have been identified previously, and 4 SNPs at 4 loci for FT4 (N=13,651), of which 3 have been identified previously (Table 1; Figs 1a–c, 2a,b and 3; Supplementary Figs 1b and 3–6).
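As an illustration of how per-cohort estimates can be combined, the following is a minimal sketch of a fixed-effect inverse-variance meta-analysis with Cochran's Q as a heterogeneity measure. The choice of this particular model is an assumption, since the method used is described in the Supplementary Methods rather than here, and the input values are hypothetical and not taken from the study.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-cohort estimates.

    Returns the pooled effect, its standard error, the z statistic,
    a two-sided normal P value, Cochran's Q and the I^2 statistic.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided P from the normal distribution
    # Cochran's Q: weighted squared deviations of cohort effects from the pooled effect
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, z, p, q, i2

# Hypothetical per-cohort effect sizes and standard errors (illustration only)
cohort_betas = [0.10, 0.07, 0.09]
cohort_ses = [0.03, 0.04, 0.05]
beta, se, z, p, q, i2 = inverse_variance_meta(cohort_betas, cohort_ses)
print(f"pooled B={beta:.3f}, s.e.={se:.3f}, P={p:.2e}, Q={q:.2f}, I2={i2:.2f}")
```

A large Q relative to its degrees of freedom (number of cohorts minus one) indicates that cohort effects differ more than sampling error alone would predict.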

Table 1 Independent SNPs with MAF≥1% associated with serum TSH and FT4 levels in the overall meta-analysis.

**Figure 1: Regional and genome-wide association plots for TSH.**

**Figure 2: Regional and conditional plots for FT4.**

**Figure 3: Overview of our findings of SNPs associated with TSH and FT4.**

To determine whether our identified associations at established loci represented previous association signals, we analysed the linkage disequilibrium (LD) between the strongest associated variants from this study and those from our previous study¹⁰ (Supplementary Table 4). The top variants from loci in both studies were in strong LD (r²>0.6), apart from MBIP and FOXE1, although these were in strong LD with variants previously associated with TSH by others⁸. Two SNPs associated with TSH in our study are novel. One is at SYN2 (rs310763; MAF=23.5%, B=0.082, s.e.=0.014, P=6.15 × 10⁻⁹; Fig. 1a–c). SYN2 is a member of a family of neuron-specific phosphoproteins involved in the regulation of neurotransmitter release with expression in the pituitary and hypothalamus (http://biogps.org/#goto=genereport&id=6854). We also identify one novel variant at PDE8B (rs2928167; MAF=10.4%, B=−0.145, s.e.=0.019, P=5.94 × 10⁻¹⁴) in linkage equilibrium (r²=0.002, D′=0.17) with the previously described variant rs6885099 (ref. 10) and independent from our top SNP rs2046045 (P=1.93 × 10⁻¹¹) after conditional analysis. In the overall meta-analysis, we are unable to replicate the association between FAM222A and TSH observed in the discovery analysis (B=0.014, s.e.=0.015, P=0.378); however, we observe evidence of heterogeneity between cohorts (test for heterogeneity P=4.70 × 10⁻⁶; Supplementary Table 5), so this locus may yet find support in future WGS studies.

In our meta-analysis, we also identify four SNPs associated with FT4, three at previously established loci (DIO1, LHX3 and AADAT; Table 1; Fig. 3; Supplementary Figs 1b, 4e and 6; Supplementary Table 4). We find a novel uncommon variant at B4GALT6/SLC25A52 associated with FT4 (rs113107469; MAF=3.20%, B=0.225, s.e.=0.037, P=1.27 × 10⁻⁹; Fig. 2a). B4GALT6 is in the ceramide metabolic pathway, which inhibits cyclic AMP production in TSH-stimulated cells. However, the B4GALT6 signal (rs113107469) is in weak LD (r²<0.1, D′=0.66) with the Thr139Met substitution (rs28933981; MAF=0.4%) and it may therefore be a marker for this functional change in TTR. The Thr139Met substitution was associated with FT4 levels in our single-point meta-analysis (P=2.14 × 10⁻¹¹) but was not originally observed because its MAF was below our 1% threshold. Conditional analysis of the TTR region using rs28933981 as the conditioning marker in the ALSPAC WGS cohort reveals no evidence of association between rs113107469 in B4GALT6 and FT4 (P=0.124; Fig. 2b). Analysis using direct genotyping in the ALSPAC WGS and replication cohorts confirms the effect of the Thr139Met substitution on FT4 levels. Here, 0.79% of children were heterozygous for the Thr139Met substitution, which is positively associated with FT4 (B=1.70, s.e.=0.17, 95% CI 1.37, 2.03, P=3.89 × 10⁻²⁴). In the ALSPAC replication data set, rs113107469 in B4GALT6 was also positively associated with FT4 (P=0.0002); however, when conditioned on the Thr139Met substitution there was no longer any evidence of association (P=0.20). The Thr139Met substitution also appears to be functional: the variant protein has increased stability compared with wild-type transthyretin (TTR)^13,14 and binds thyroxine more tightly¹⁴, resulting in a twofold increase in thyroxine-binding affinity^15,16. Further details of the likely genes related to all our observed independent novel signals are shown in Supplementary Table 6.
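The interval quoted for the Thr139Met effect is consistent with a normal-approximation (Wald) 95% confidence interval, B ± 1.96 × s.e. The short sketch below recomputes it from the rounded values in the text; that the interval was constructed in exactly this way is an assumption, and the helper name `wald_ci` is ours.

```python
from statistics import NormalDist

def wald_ci(beta, se, level=0.95):
    """Normal-approximation (Wald) confidence interval for an effect estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return beta - z * se, beta + z * se

# Effect size and standard error for Thr139Met on FT4, as quoted in the text
low, high = wald_ci(1.70, 0.17)
print(f"95% CI: {low:.2f}, {high:.2f}")  # prints "95% CI: 1.37, 2.03"
```

Because the inputs are rounded to two decimal places, quantities recomputed this way (for example a P value) need not match the reported figures exactly.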

Expression quantitative trait locus analysis

Expression quantitative trait locus (eQTL) analysis^17,18 reveals that our SYN2 variant modulates SYN2 transcription in adipose, skin and whole-blood cells, but not lymphoblastoid cell lines (Supplementary Table 7). Furthermore, bioinformatics analysis suggests that the C-allele at rs310763 attenuates an EGR1 regulatory motif¹⁹. EGR1 is expressed in thyrocytes, regulates pituitary development^20,21 and may influence thyroid status via LHX3 promoter activity²⁰. Several other variants in the SYN2 gene region are in strong LD (r²>0.8) with rs310763, including the non-synonymous coding variant rs794999. Although predicted to be benign (PolyPhen-2 score=0.002 (ref. 22)), rs794999 is located in a DNase hypersensitivity cluster²³, influences four predicted regulatory motifs¹⁹ and appears to be under evolutionary constraint^24,25. SNPs identified in our study, or those in LD, also showed strong eQTL associations with PDE8B (P=8.69 × 10⁻²⁷), FOXE1 (P=9.10 × 10⁻⁵⁴) and AADAT (P=7.86 × 10⁻⁹) gene expression (Supplementary Table 7).

DNA methylation analysis

To further explore cis-regulatory effects of variants identified in our study, we carried out analysis of DNA methylation profiles in whole-blood samples in 279 individuals from the TwinsUK cohort. We find evidence for a methylation quantitative trait locus (meQTL) at the novel TSH-associated variant rs2928167 in PDE8B (P=4.38 × 10⁻⁷; Supplementary Table 8), which is also an eQTL in multiple tissues (Supplementary Table 7). Recently, meQTL effects using the same probe (cg16418800) in adipose tissue also identified a peak signal at rs2359775 (P=6 × 10⁻¹⁵), which is in LD with rs2928167 (r²=0.5). We find that variants in ABO (P=2.02 × 10⁻²³) and AADAT (P=1.80 × 10⁻⁸) also show strong evidence for cis-meQTL effects (Supplementary Table 8). In additional analyses in 745 ALSPAC children, we find strong meQTL associations for rs2359775 in PDE8B (P=3.03 × 10⁻²⁸) and variants in ABO (P=1.01 × 10⁻¹⁰¹) and AADAT (P=4.18 × 10⁻³⁴) (Supplementary Table 8).

SKAT analysis

Tests of the association between aggregates of rare variants (MAF<1%) in the WGS cohorts were restricted to genes relevant to thyroid function. We find no evidence of association from SKAT analyses with TSH. For FT4, however, we identify one SKAT bin with multiple-testing-corrected evidence for association (P≤1.55 × 10⁻⁵) in NRG1 (P=2.53 × 10⁻⁶; Fig. 4; Supplementary Table 9). NRG1 is a glycoprotein that interacts with the NEU/ERBB2 receptor tyrosine kinase, and is critical in organ development.

GCTA and polygenic score analysis

SNPs were thinned to a set of 2,203,581 approximately independent SNPs with an LD threshold of r²>0.2, a window size of 5,000 SNPs and step of 1,000 SNPs. A genomic relationship matrix was then generated for unrelated individuals. We fitted linear mixed-effect models and estimate that all assessed common SNPs (MAF>1%) explain 24% (95% CI 19, 29) and 20% (95% CI 14, 26) of TSH and FT4 variance, respectively (P≤0.0001; Supplementary Table 10). Polygenic score analyses²¹ based on SNPs with P values under a fixed threshold do not detect evidence of a polygenic signal for TSH or FT4, nor of a shared polygenic basis between thyroid function and key metabolic outcomes. However, a genetic score based on 67 SNPs previously associated with thyroid function in GWAS^8,10,26 shows strong evidence of association with TSH (P=7.9 × 10⁻²⁰) and FT4 (P=2.7 × 10⁻⁴), and we observe evidence of shared genetic pathways, with TSH associated with the FT4 gene score (P=7.0 × 10⁻⁴). These 67 SNPs explain 7.1% (95% CI 5.2, 9.0) of the variance in TSH and 1.9% (95% CI 1.1, 3.0) of the variance in FT4. Taken together, this suggests that many loci underlying thyroid function remain unknown.
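To make the idea of a SNP-based genetic score concrete, here is a minimal sketch that builds a weighted allele score and measures the share of trait variance it explains as a squared Pearson correlation. Whether the study weighted alleles by published effect sizes, and how variance explained was estimated, are assumptions; the function names, dosages, weights and trait values are hypothetical.

```python
def genetic_score(dosages, weights):
    """Weighted allele score: sum over SNPs of effect weight x allele dosage (0-2)."""
    return [sum(w * d for w, d in zip(weights, row)) for row in dosages]

def variance_explained(score, trait):
    """Squared Pearson correlation between a score and a trait."""
    n = len(score)
    mean_s = sum(score) / n
    mean_t = sum(trait) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(score, trait))
    var_s = sum((s - mean_s) ** 2 for s in score)
    var_t = sum((t - mean_t) ** 2 for t in trait)
    return cov * cov / (var_s * var_t)

# Hypothetical data: 4 individuals x 3 SNPs, with made-up per-allele weights
dosages = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
weights = [0.08, -0.15, 0.21]
trait = [0.35, -0.10, 0.40, 0.05]  # e.g. transformed TSH values (illustrative)

scores = genetic_score(dosages, weights)
print(f"variance explained: {variance_explained(scores, trait):.3f}")
```

In practice the weights would come from an independent discovery sample so that the variance estimate is not inflated by overfitting.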

Chemogenomic analysis

We undertook a database analysis of differential gene expression in cultured cells in response to hormone stimulation. We find SYN2 (rank 64 of 22,283 (HL60 cells)) rates highest among the genes studied in the experiment, providing strong support for the role of this newly discovered locus in thyroid metabolism. Two other genes, NRG1 and CAPZB, also show evidence of levothyroxine responsiveness in at least one cell line²⁷ on the basis of a genome-wide differential expression and rank in the top 5th percentile (Supplementary Table 11). Publicly available data on altered SYN2 expression in brain, limb and tail from control and levothyroxine-treated Xenopus laevis during metamorphosis also provide evidence for the relevance of SYN2 in thyroid function²⁸.

Discussion

In this study, we demonstrate the utility of WGS data (and SNP array data when deeply imputed to WGS reference panels) in appraising the genetic architecture of thyroid function. Using WGS data, we identify a rare functional variant in TTR that appears to drive the observed association between an uncommon novel variant near B4GALT6 and FT4, and we demonstrate a novel association with FT4 arising from rare aggregates in NRG1. We also show that common variants collectively account for at least 20% of the variance in TSH and FT4, a substantial advance on using only the ‘top SNPs’ from earlier GWA studies¹⁰. Taken together, this work indicates that both common variants with modest effects and rare variants with larger effects might explain a substantial proportion of the missing heritability of thyroid function, but larger studies are required to identify these variants. Studies including individuals with subclinical thyroid disease, particularly those who are negative for thyroid autoantibodies, may be particularly rewarding, as rare genetic variants with large effect sizes may be associated with serum TSH and FT4 concentrations outside the inclusion ranges we used and therefore would not be detected in our analyses.

Such endeavours are clinically relevant, as there has been a dramatic increase in levothyroxine prescribing for borderline TSH levels²⁹. At least three loci identified in this study show evidence of responsiveness to levothyroxine in cell line models. Borderline TSH levels may reflect the influence of genetic variation rather than overt autoimmune thyroid disease, in which case thyroid hormone replacement may not be appropriate. Our results indicate that further investigation of TSH heterogeneity at the population level is necessary.

Methods

Cohorts

Seven populations were used in this study: the TwinsUK WGS cohort, the TwinsUK GWAS cohort, the ALSPAC WGS cohort, the ALSPAC GWAS cohort, the SardiNIA cohort, the ValBorbera cohort and the Busselton Health Study cohort. Summary statistics and full descriptions of each cohort are given in Supplementary Methods and Supplementary Tables 1 and 2. All human research was approved by the relevant institutional ethics committees.

WGS data generation

Low-read depth WGS was performed in the TwinsUK and ALSPAC cohorts as part of the UK10K project. The SardiNIA cohort also had WGS data available (see Supplementary Methods).

Statistical analysis

An inverse normal transformation was applied to each trait within each cohort. Age, age², gender and any other cohort-specific variables (Supplementary Table 1) were applied as covariates. Genotype imputation was performed for relevant cohorts using the IMPUTE³⁰, MaCH³¹ or Minimac³² software packages, with poorly imputed variants excluded. See Supplementary Table 1 for cohort-specific details.
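The inverse normal transformation can be sketched as follows: trait values are ranked within a cohort, ranks are converted to quantiles and the quantiles are mapped through the standard normal inverse cumulative distribution function. The rank offset `c` (0.5 here) and the use of average ranks for ties are assumptions, since the text does not specify them, and the example values are hypothetical.

```python
from statistics import NormalDist

def inverse_normal_transform(values, c=0.5):
    """Rank-based inverse normal transformation.

    Each value is replaced by the standard normal quantile of
    (rank - c) / (n - 2c + 1); tied values share their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]

# Hypothetical TSH measurements from one cohort (illustration only)
tsh = [2.1, 0.4, 5.0, 1.3, 2.1]
print([round(x, 3) for x in inverse_normal_transform(tsh)])
```

The transformed values would then be modelled with the covariates listed above, either by including them in the association model or by residualising beforehand, depending on the cohort's pipeline.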