Gene expression profiling can be used to uncover the mechanisms by which loci identified through genome-wide association studies (GWAS) contribute to pathology [1,2]. Given that most GWAS hits are in putative regulatory regions and transcript abundance is physiologically closer to the phenotype of interest [2], we hypothesized that summation of risk-allele-associated gene expression, namely a transcriptional risk score (TRS), should provide accurate estimates of disease risk. We integrate summary-level GWAS and expression quantitative trait locus (eQTL) data with RNA-seq data from the RISK study, an inception cohort of pediatric Crohn's disease [3,4]. We show that TRSs based on genes regulated by variants linked to inflammatory bowel disease (IBD) not only outperform genetic risk scores (GRSs) in distinguishing Crohn's disease from healthy samples, but also serve to identify patients who in time will progress to complicated disease. Our dissection of eQTL effects may be used to distinguish genes whose association with disease is through promotion versus protection, thereby linking statistical association to biological mechanism. The TRS approach constitutes a potential strategy for personalized medicine that enhances inference from static genotypic risk assessment.
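The idea of summing risk-allele-associated expression can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the gene names, the expression values, and the `risk_direction` signs (+1 when the risk allele is assumed to raise expression, -1 when it is assumed to lower it) are hypothetical, and z-scoring each gene across samples before summing is an assumption made for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression across samples (mean 0, s.d. 1)."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def transcriptional_risk_score(expression, risk_direction):
    """Sum polarized, standardized expression per sample.

    expression: dict gene -> list of expression values, one per sample
    risk_direction: dict gene -> +1 or -1 (hypothetical eQTL-derived signs)
    """
    n_samples = len(next(iter(expression.values())))
    scores = [0.0] * n_samples
    for gene, values in expression.items():
        sign = risk_direction[gene]
        for i, z in enumerate(zscores(values)):
            # Polarize so that higher always means "more risk-like" expression.
            scores[i] += sign * z
    return scores


# Toy data: three samples, two hypothetical genes.
expression = {
    "GENE_A": [1.0, 2.0, 3.0],  # risk allele assumed to increase expression
    "GENE_B": [3.0, 2.0, 1.0],  # risk allele assumed to decrease expression
}
risk_direction = {"GENE_A": +1, "GENE_B": -1}
trs = transcriptional_risk_score(expression, risk_direction)
```

In this toy example the third sample has the highest score, because both genes are expressed in the direction assumed to be associated with the risk allele.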
Fairfax, B.P. & Knight, J.C. Genetics of gene expression in immunity to infection. Curr. Opin. Immunol. 30, 63–71 (2014).
Gibson, G., Powell, J.E. & Marigorta, U.M. Expression quantitative trait locus analysis for translational medicine. Genome Med. 7, 60 (2015).
Haberman, Y. et al. Pediatric Crohn disease patients exhibit specific ileal transcriptome and microbiome signature. J. Clin. Invest. 124, 3617–3633 (2014).
Kugathasan, S. et al. Prediction of complicated disease course for children newly diagnosed with Crohn's disease: a multicentre inception cohort study. Lancet 389, 1710–1718 (2017).
Witte, J.S., Visscher, P.M. & Wray, N.R. The contribution of genetic variants to disease depends on the ruler. Nat. Rev. Genet. 15, 765–776 (2014).
Wray, N.R., Yang, J., Goddard, M.E. & Visscher, P.M. The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet. 6, e1000864 (2010).
Wray, N.R., Goddard, M.E. & Visscher, P.M. Prediction of individual genetic risk to disease from genome-wide association studies. Genome Res. 17, 1520–1528 (2007).
Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).
Walters, T.D. et al. Increased effectiveness of early therapy with anti–tumor necrosis factor-α vs an immunomodulator in children with Crohn's disease. Gastroenterology 146, 383–391 (2014).
Liu, J.Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).
Westra, H.J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
Kabakchiev, B. & Silverberg, M.S. Expression quantitative trait loci analysis identifies associations between genotype and gene expression in human intestine. Gastroenterology 144, 1488–1496 (2013).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Di Narzo, A.F. et al. Blood and intestine eQTLs from an anti-TNF-resistant Crohn's disease cohort inform IBD genetic association loci. Clin. Transl. Gastroenterol. 7, e177 (2016).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Lee, J.C. et al. Genome-wide association study identifies distinct genetic contributions to prognosis and susceptibility in Crohn's disease. Nat. Genet. 49, 262–268 (2017).
Ning, K. et al. Improved integrative framework combining association data with gene expression features to prioritize Crohn's disease genes. Hum. Mol. Genet. 24, 4147–4157 (2015).
Singh, T. et al. Characterization of expression quantitative trait loci in the human colon. Inflamm. Bowel Dis. 21, 251–256 (2015).
Albert, F.W. & Kruglyak, L. The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 16, 197–212 (2015).
Gibson, G. & Weir, B. The quantitative genetics of transcription. Trends Genet. 21, 616–623 (2005).
de Souza, H.S. & Fiocchi, C. Immunopathogenesis of IBD: current state of the art. Nat. Rev. Gastroenterol. Hepatol. 13, 13–27 (2016).
McGovern, D.P., Kugathasan, S. & Cho, J.H. Genetics of inflammatory bowel diseases. Gastroenterology 149, 1163–1176 (2015).
Nabekura, T. et al. Costimulatory molecule DNAM-1 is essential for optimal differentiation of memory natural killer cells during mouse cytomegalovirus infection. Immunity 40, 225–234 (2014).
Martinet, L. & Smyth, M.J. Balancing natural killer cell activation through paired receptors. Nat. Rev. Immunol. 15, 243–254 (2015).
Petrillo, M.G. et al. GITR+ regulatory T cells in the treatment of autoimmune diseases. Autoimmun. Rev. 14, 117–126 (2015).
Reikvam, D.H. et al. Increase of regulatory T cells in ileal mucosa of untreated pediatric Crohn's disease patients. Scand. J. Gastroenterol. 46, 550–560 (2011).
Ye, C.J. et al. Intersection of population variation and autoimmunity genetics in human T cell activation. Science 345, 1254665 (2014).
Wiley, S.E. et al. The outer mitochondrial membrane protein mitoNEET contains a novel redox-active 2Fe-2S cluster. J. Biol. Chem. 282, 23745–23749 (2007).
Novak, E.A. & Mollen, K.P. Mitochondrial dysfunction in inflammatory bowel disease. Front. Cell Dev. Biol. 3, 62 (2015).
Levine, A. et al. Pediatric modification of the Montreal classification for inflammatory bowel disease: the Paris classification. Inflamm. Bowel Dis. 17, 1314–1321 (2011).
Satsangi, J., Silverberg, M.S., Vermeire, S. & Colombel, J.F. The Montreal classification of inflammatory bowel disease: controversies, consensus, and implications. Gut 55, 749–753 (2006).
Cleynen, I. et al. Inherited determinants of Crohn's disease and ulcerative colitis phenotypes: a genetic association study. Lancet 387, 156–167 (2016).
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Anders, S., Pyl, P.T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Robinson, M.D., McCarthy, D.J. & Smyth, G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
Onengut-Gumuscu, S. et al. Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers. Nat. Genet. 47, 381–386 (2015).
Leek, J.T., Johnson, W.E., Parker, H.S., Jaffe, A.E. & Storey, J.D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
Mecham, B.H., Nelson, P.S. & Storey, J.D. Supervised normalization of microarrays. Bioinformatics 26, 1308–1315 (2010).
Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).
Lee, M.N. et al. Common genetic variants modulate pathogen-sensing responses in human dendritic cells. Science 343, 1246980 (2014).
We are grateful to B. Zeng, D. Arafat, H. Somineni, S. Venkateswaran, and colleagues from the Gibson and Kugathasan laboratories for their support and helpful comments. We also would like to thank I. Mendizabal, J. Lachance and K. Jordan for comments on the manuscript. This research was supported by Project 3 (G.G., PI) of the NIH program project “Statistical and Quantitative Genetics” grant P01-GM0996568 (B. Weir, University of Washington, Director) as well as research grants from the Crohn's and Colitis Foundation of America (CCFA), New York, to the individual study institutions participating in the RISK study.
The authors declare no competing financial interests.
Integrated supplementary information
eQTLs detected in the vicinity of SNPs associated with IBD tend to show concordant effect size and direction in blood and ileum. The effects of 136 eQTLs available in ileum are shown (Supplementary Table 2). The x axis shows the β values for the eQTLs detected in peripheral blood from the Blood eQTL browser; the y axis shows the β values in the eQTL mapping study with ileal biopsies from the RISK study (see “Mapping study in the RISK cohort to build the ileal TRS” in the Online Methods). The dashed best-fitting least-squares regression line corresponds to Spearman r = 0.54 (P = 2 × 10⁻¹¹). Values in the corners indicate the percentage of loci in each quadrant, showing that 70% are concordant in direction of effect in the two tissues (P = 1.7 × 10⁻⁶, sign test).
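The direction-of-effect comparison described in this caption can be sketched as an exact two-sided sign test. The β values below are made up for illustration, the function names are hypothetical, and the exact binomial formulation is an assumption; the caption does not state which variant of the sign test was used.

```python
from math import comb


def concordance(betas_a, betas_b):
    """Count eQTLs whose effect has the same sign in both tissues."""
    # Pairs with a zero effect carry no sign information and are dropped.
    pairs = [(a, b) for a, b in zip(betas_a, betas_b) if a != 0 and b != 0]
    n_concordant = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return n_concordant, len(pairs)


def sign_test_two_sided(n_concordant, n_total):
    """Exact two-sided binomial test of concordance against p = 0.5."""
    k = max(n_concordant, n_total - n_concordant)
    upper_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * upper_tail)


# Hypothetical effect sizes for six eQTLs in two tissues.
blood_betas = [0.5, -0.2, 0.1, -0.4, 0.3, 0.2]
ileum_betas = [0.4, -0.1, -0.2, -0.3, 0.1, 0.6]

n_conc, n_pairs = concordance(blood_betas, ileum_betas)
p_value = sign_test_two_sided(n_conc, n_pairs)
```

With only six toy pairs the test has little power; the caption's result rests on 136 eQTLs.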
Supplementary Figure 2 Performance of the GRS and TRS based on the initial set of 157 candidate genes.
(a,b) Each plot shows the TRS based on 157 IBD genes associated with 96 eQTLs that are also associated with IBD or in LD with a SNP associated with the disease (Supplementary Table 1). The discriminatory performance of the GRS versus TRS based on these genes is shown for disease status: comparison of samples with Crohn’s disease (n = 210) and controls (n = 35) (a) and disease course (3-year period after diagnosis): comparison of samples that remain in non-complicated Crohn’s disease (B1; n = 183) and those that develop complicated disease (B2 and/or B3; n = 27) (b). The standardized GRS and TRS are shown on the y axis. Differences between groups (in s.d. units) along with P values (two-sided t test) are reported for each comparison.
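As a rough illustration of how a between-group difference "in s.d. units" might be computed, the sketch below standardizes a score across all samples and then compares group means. The scores and group labels are invented, the function names are hypothetical, and the Welch form of the t statistic is an assumption; the caption only states that a two-sided t test was used.

```python
from math import sqrt
from statistics import mean, pstdev, variance


def standardize(scores):
    """Rescale scores to mean 0 and s.d. 1 across all samples."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


def group_difference(scores, is_case):
    """Return (difference in s.d. units, Welch t statistic)."""
    z = standardize(scores)
    cases = [v for v, c in zip(z, is_case) if c]
    controls = [v for v, c in zip(z, is_case) if not c]
    diff = mean(cases) - mean(controls)
    # Standard error without assuming equal variances (Welch); an assumption.
    se = sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    return diff, diff / se


# Hypothetical risk scores for three controls followed by three cases.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
is_case = [False, False, False, True, True, True]
diff_sd, t_stat = group_difference(scores, is_case)
```

Turning the t statistic into a P value requires the t distribution, which is not in the Python standard library; `scipy.stats.ttest_ind` is one commonly used option.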
(a) Each point represents the –log₁₀(P value) (NLP) for the blood eQTL association and Crohn’s disease GWAS association for 157 candidate genes. Colors represent the significance of the SMR statistic, clearly showing that the most highly significant genes are strongly associated with both traits. Similar plots are observed for ulcerative colitis and IBD. All 39 genes with SMR P < 2.3 × 10⁻⁴ (red and brown dots) for all three disease classifications were included in the final SMR-based TRS. (b) The coloc H4 score estimates the posterior probability that the same causal variant drives both the GWAS and eQTL associations. This plot shows that poor SMR values (small NLPs) tend also to have low coloc H4 scores; however, only approximately half of the strong SMR values (large NLPs) have strong coloc H4 posterior probabilities. The 29 genes with coloc H4 greater than 0.8 for the three disease phenotypes were included in the final coloc-based TRS. This includes 14 genes not in the SMR set.
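The two inclusion rules in this caption (SMR P below 2.3 × 10⁻⁴, or coloc H4 above 0.8, in all three disease classifications) can be sketched as a simple filter. The two thresholds come from the caption; the gene names, the per-phenotype values, and the dictionary layout are hypothetical.

```python
SMR_P_THRESHOLD = 2.3e-4     # threshold stated in the caption
COLOC_H4_THRESHOLD = 0.8     # threshold stated in the caption
PHENOTYPES = ("CD", "UC", "IBD")  # Crohn's disease, ulcerative colitis, IBD


def select_genes(smr_p, coloc_h4):
    """Return (SMR set, coloc set); a gene must pass in every phenotype."""
    smr_set = {
        gene for gene, by_pheno in smr_p.items()
        if all(by_pheno[p] < SMR_P_THRESHOLD for p in PHENOTYPES)
    }
    coloc_set = {
        gene for gene, by_pheno in coloc_h4.items()
        if all(by_pheno[p] > COLOC_H4_THRESHOLD for p in PHENOTYPES)
    }
    return smr_set, coloc_set


# Hypothetical summary statistics for three made-up genes.
smr_p = {
    "GENE_A": {"CD": 1e-6, "UC": 5e-5, "IBD": 1e-7},
    "GENE_B": {"CD": 1e-5, "UC": 0.02, "IBD": 1e-4},   # fails for UC
    "GENE_C": {"CD": 0.01, "UC": 0.03, "IBD": 0.005},
}
coloc_h4 = {
    "GENE_A": {"CD": 0.95, "UC": 0.60, "IBD": 0.90},   # fails for UC
    "GENE_B": {"CD": 0.30, "UC": 0.10, "IBD": 0.20},
    "GENE_C": {"CD": 0.85, "UC": 0.90, "IBD": 0.92},
}

smr_genes, coloc_genes = select_genes(smr_p, coloc_h4)
coloc_only = coloc_genes - smr_genes  # genes selected by coloc but not by SMR
```

The set difference at the end mirrors the caption's observation that some coloc-selected genes are absent from the SMR set.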
Supplementary Figure 4 Relationship between transcriptional risk scores and location of inflammation.
Because the Paris classification of pediatric Crohn’s disease includes location of disease, which was strongly correlated with the degree of inflammation in the ileum from which biopsies were obtained, we plot here the relationship between disease location and the 29-gene coloc-derived TRS. RISK study patients were classified into two categories according to the presence/absence of visible ileal inflammation in endoscopies performed at diagnosis (L1 (ileum-only) and L3 (ileocolonic) cases were classified as ‘inflamed’; L2 (colonic-only) cases were classified as ‘non-inflamed’). Only two of the cases that progressed to complicated disease were non-inflamed; they are not shown owing to the small sample size. The TRS is slightly elevated in inflamed versus endoscopically non-inflamed B1 cases (P < 0.02) and is also elevated in B1 cases with non-inflamed ilea as compared to non-IBD controls (P < 1 × 10⁻⁶), confirming that the TRS picks up a signal that is related but complementary to inflammation. Complicated cases have an elevated TRS even relative to inflamed B1 cases (P < 7 × 10⁻⁴). A box plot of values is shown for each group along with P values for pairwise comparisons (two-sided t test).
Supplementary Figure 5 Performance of the GRS and TRS based on 39 susceptibility genes detected by SMR.
(a,b) Thirty-nine genes were detected by SMR as being under the control of 29 causal variants that account for the association detected by GWAS and the eQTL effect reported in the Blood eQTL browser (Supplementary Table 4). The performance of the GRS versus TRS based on these genes is shown for disease status: comparison of samples with Crohn’s disease (n = 210) versus non-IBD controls (n = 35) (a) and disease course (3-year period after diagnosis): comparison of samples that remain in non-complicated Crohn’s disease (B1; n = 183) versus those that develop complicated disease (B2 and/or B3; n = 27) (b). The standardized GRS and TRS are shown on the y axis. Differences between groups (in s.d. units) along with P values (two-sided t test) are reported for each comparison.
Supplementary Figure 6 Performance of PRSs based on LD-pruned variants at different significance inclusion thresholds.
(a,b) PRSs at different thresholds (Online Methods) successfully separate Crohn’s disease cases from non-IBD controls (a) but fail to distinguish according to development of complicated disease (b). The performance of PRSs using SNPs that pass a range of liberal P-value thresholds in GWAS analysis is shown (the inclusion threshold and total number of variants used are reported on the y axis). Differences between groups (in s.d. units) along with P values for each comparison are reported on the x axis.
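A P-value-thresholded polygenic risk score of the kind described here can be sketched as follows. The actual procedure is defined in the Online Methods, which are not reproduced in this section, so the weighting by GWAS effect size (`beta`), the variant identifiers, the dosages, and the thresholds are all assumptions for illustration. The variants are assumed to be LD-pruned already.

```python
def polygenic_risk_score(dosages, variants, p_threshold):
    """Weighted risk-allele count over variants passing a P-value threshold.

    dosages: dict variant id -> risk-allele dosage (0, 1 or 2) for one person
    variants: list of dicts with hypothetical keys "id", "p" and "beta"
    Returns (score, number of variants included).
    """
    included = [v for v in variants if v["p"] < p_threshold]
    score = sum(v["beta"] * dosages[v["id"]] for v in included)
    return score, len(included)


# Hypothetical, already LD-pruned GWAS summary statistics.
variants = [
    {"id": "rs_a", "p": 1e-8, "beta": 0.2},
    {"id": "rs_b", "p": 1e-4, "beta": -0.1},
    {"id": "rs_c", "p": 0.03, "beta": 0.05},
]
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

# Increasingly liberal inclusion thresholds (illustrative values).
thresholds = [5e-8, 1e-3, 0.05]
prs_by_threshold = {
    t: polygenic_risk_score(dosages, variants, t) for t in thresholds
}
```

Each entry pairs the score with the number of variants used, matching the caption's note that both the inclusion threshold and the variant count are reported.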
Supplementary Figures 1–6 (PDF 1128 kb)
eQTL association data in peripheral blood for 232 SNPs associated with IBD with genes <1 Mb away (7,389 SNP–gene pairs). (XLSX 1169 kb)
Replicability of blood eQTL effects in ileal tissue from the RISK study. (XLSX 58 kb)
coloc results for 163 SNP–gene pairs selected from the Blood eQTL browser. (XLSX 79 kb)
SMR results for 163 SNP–gene pairs selected from the Blood eQTL browser. (XLSX 86 kb)
eQTL association and coloc results for 46 genes controlled by SNPs associated with IBD in the RISK ileal eQTL mapping study. (XLSX 60 kb)
About this article
Cite this article
Marigorta, U., Denson, L., Hyams, J. et al. Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn's disease. Nat Genet 49, 1517–1521 (2017). https://doi.org/10.1038/ng.3936