Abstract
Many genetic variants affect disease risk by altering context-dependent gene regulation. Such variants are difficult to study mechanistically using current methods that link genetic variation to steady-state gene expression levels, such as expression quantitative trait loci (eQTLs). To address this challenge, we developed the cistrome-wide association study (CWAS), a framework for identifying genotypic and allele-specific effects on chromatin that are also associated with disease. In prostate cancer, CWAS identified regulatory elements and androgen receptor-binding sites that explained the association at 52 of 98 known prostate cancer risk loci and discovered 17 additional risk loci. CWAS implicated key developmental transcription factors in prostate cancer risk that are overlooked by eQTL-based approaches due to context-dependent gene regulation. We experimentally validated associations and demonstrated the extensibility of CWAS to additional epigenomic datasets and phenotypes, including response to prostate cancer treatment. CWAS is a powerful and biologically interpretable paradigm for studying variants that influence traits by affecting transcriptional regulation.
Code availability
Scripts to reproduce analyses from this study are available at https://github.com/scbaca/cwas, https://github.com/scbaca/chip_imputation and https://doi.org/10.5281/zenodo.6666796.
References
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP–trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552 (2014).
Hormozdiari, F. et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat. Genet. 50, 1041–1047 (2018).
Gallagher, M. D. & Chen-Plotkin, A. S. The post-GWAS era: from association to function. Am. J. Hum. Genet. 102, 717–730 (2018).
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Wray, N. R., Wijmenga, C., Sullivan, P. F., Yang, J. & Visscher, P. M. Common disease is more complex than implied by the core gene omnigenic model. Cell 173, 1573–1580 (2018).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Kim, J. et al. Gene expression profiles associated with acute myocardial infarction and risk of cardiovascular death. Genome Med. 6, 40 (2014).
Singh, T. et al. Characterization of expression quantitative trait loci in the human colon. Inflamm. Bowel Dis. 21, 251–256 (2015).
Ram, R. et al. Systematic evaluation of genes and genetic variants associated with type 1 diabetes susceptibility. J. Immunol. 196, 3043–3053 (2016).
Gong, J. et al. PancanQTL: systematic identification of cis-eQTLs and trans-eQTLs in 33 cancer types. Nucleic Acids Res. 46, D971–D976 (2018).
Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E. & Montgomery, S. B. Abundant associations with gene expression complicate GWAS follow-up. Nat. Genet. 51, 768–769 (2019).
Strober, B. J. et al. Dynamic genetic regulation of gene expression during cellular differentiation. Science 364, 1287–1290 (2019).
Knowles, D. A. et al. Allele-specific expression reveals interactions between genetic variation and environment. Nat. Methods 14, 699–702 (2017).
Ward, M. C., Banovich, N. E., Sarkar, A., Stephens, M. & Gilad, Y. Dynamic effects of genetic variation on gene expression revealed following hypoxic stress in cardiomyocytes. eLife 10, e57345 (2021).
Kumasaka, N., Knights, A. J. & Gaffney, D. J. Fine-mapping cellular QTLs with RASQUAL and ATAC-seq. Nat. Genet. 48, 206–213 (2016).
Wang, A. T. et al. Allele-specific QTL fine mapping with PLASMA. Am. J. Hum. Genet. 106, 170–187 (2020).
Kim-Hellmuth, S. et al. Genetic regulatory effects modified by immune activation contribute to autoimmune disease associations. Nat. Commun. 8, 266 (2017).
Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. 37, 109–124 (2021).
Yao, D. W., O’Connor, L. J., Price, A. L. & Gusev, A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 52, 626–633 (2020).
Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).
Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
McVicker, G. et al. Identification of genetic variants that affect histone modifications in human cells. Science 342, 747–749 (2013).
Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414 (2016).
Waszak, S. M. et al. Population variation and genetic control of modular chromatin architecture in humans. Cell 162, 1039–1050 (2015).
del Rosario, R. C. H. et al. Sensitive detection of chromatin-altering polymorphisms reveals autoimmune disease mechanisms. Nat. Methods 12, 458–464 (2015).
Grubert, F. et al. Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell 162, 1051–1065 (2015).
Gate, R. E. et al. Genetic determinants of co-accessible chromatin regions in activated T cells across humans. Nat. Genet. 50, 1140–1150 (2018).
Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).
Maurano, M. T. et al. Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo. Nat. Genet. 47, 1393–1401 (2015).
Deplancke, B., Alpern, D. & Gardeux, V. The genetics of transcription factor DNA binding variation. Cell 166, 538–554 (2016).
Alasoo, K. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat. Genet. 50, 424–431 (2018).
Gusev, A. et al. Allelic imbalance reveals widespread germline-somatic regulatory differences and prioritizes risk loci in renal cell carcinoma. Preprint at bioRxiv https://doi.org/10.1101/631150 (2019).
Benaglio, P. et al. Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits. Nat. Genet. 51, 1506–1517 (2019).
Jiang, X. et al. Shared heritability and functional enrichment across six solid cancers. Nat. Commun. 10, 431 (2019).
Davies, R. W., Flint, J., Myers, S. & Mott, R. Rapid genotype imputation from sequence without reference panels. Nat. Genet. 48, 965–969 (2016).
Stelloo, S. et al. Integrative epigenetic taxonomy of primary prostate cancer. Nat. Commun. 9, 4900 (2018).
Pomerantz, M. M. et al. Prostate cancer reactivates developmental epigenomic programs during metastatic progression. Nat. Genet. 52, 790–799 (2020).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Stouffer, S. A., Suchman, E. A., Devinney, L. C., Star, S. A. & Williams, R. M. Jr. The American Soldier: Adjustment During Army Life Vol. 1 (Princeton University Press, 1949).
Castel, S. E. et al. A vast resource of allelic expression data spanning human tissues. Genome Biol. 21, 234 (2020).
Liang, Y., Aguet, F., Barbeira, A. N., Ardlie, K. & Im, H. K. A scalable unified framework of total and allele-specific counts for cis-QTL, fine-mapping, and prediction. Nat. Commun. 12, 1424 (2021).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Emami, N. C. et al. Association of imputed prostate cancer transcriptome with disease risk reveals novel mechanisms. Nat. Commun. 10, 3107 (2019).
Schumacher, F. R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat. Genet. 50, 928–936 (2018).
Mancuso, N. et al. Large-scale transcriptome-wide association study identifies new prostate cancer risk regions. Nat. Commun. 9, 4079 (2018).
Pomerantz, M. M. et al. Analysis of the 10q11 cancer risk locus implicates MSMB and NCOA4 in human prostate tumorigenesis. PLoS Genet. 6, e1001204 (2010).
Meuleman, W. et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244–251 (2020).
Conti, D. V. et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat. Genet. 53, 65–75 (2021).
Wang, X. & Goldstein, D. B. Enhancer domains predict gene pathogenicity and inform gene discovery in complex disease. Am. J. Hum. Genet. 106, 215–233 (2020).
Kasowski, M. et al. Variation in transcription factor binding among humans. Science 328, 232–235 (2010).
Koh, C. M. et al. MYC and prostate cancer. Genes Cancer 1, 617–628 (2010).
Zhang, B. et al. Klf5 acetylation regulates luminal differentiation of basal progenitors in prostate development and regeneration. Nat. Commun. 11, 997 (2020).
Bhatia-Gaur, R. et al. Roles for Nkx3.1 in prostate development and cancer. Genes Dev. 13, 966–977 (1999).
Drobnjak, M., Osman, I., Scher, H. I., Fazzari, M. & Cordon-Cardo, C. Overexpression of cyclin D1 is associated with metastatic prostate cancer to bone. Clin. Cancer Res. 6, 1891–1895 (2000).
Economides, K. D. & Capecchi, M. R. Hoxb13 is required for normal differentiation and secretory function of the ventral prostate. Development 130, 2061–2069 (2003).
Wu, D. et al. Three-tiered role of the pioneer factor GATA2 in promoting androgen-dependent gene expression in prostate cancer. Nucleic Acids Res. 42, 3607–3622 (2014).
Ahmed, M. et al. CRISPRi screens reveal a DNA methylation-mediated 3D genome dependent causal mechanism in prostate cancer. Nat. Commun. 12, 1781 (2021).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Kim, S. M. et al. Regulation of mouse steroidogenesis by WHISTLE and JMJD1C through histone methylation balance. Nucleic Acids Res. 38, 6389–6403 (2010).
Jin, G. et al. Genome-wide association study identifies a new locus JMJD1C at 10q21 that may influence serum androgen levels in men. Hum. Mol. Genet. 21, 5222–5228 (2012).
Levasseur, A., St-Jean, G., Paquet, M., Boerboom, D. & Boyer, A. Targeted disruption of YAP and TAZ impairs the maintenance of the adrenal cortex. Endocrinology 158, 3738–3753 (2017).
Hawley, J. R. et al. Reorganization of the 3D genome pinpoints noncoding drivers of primary prostate tumors. Cancer Res. 81, 5833–5848 (2021).
Sáez, C. et al. Expression of basic fibroblast growth factor and its receptors FGFR1 and FGFR2 in human benign prostatic hyperplasia treated with finasteride. Prostate 40, 83–88 (1999).
Sweeney, C. J. et al. Chemohormonal therapy in metastatic hormone-sensitive prostate cancer. N. Engl. J. Med. 373, 737–746 (2015).
Pomerantz, M. et al. Genome-wide association study (GWAS) of response to androgen deprivation therapy (ADT) and survival in metastatic prostate cancer (PCa). JCO 34, 1540 (2016).
Whitaker, H. C. et al. N-acetyl-l-aspartyl-l-glutamate peptidase-like 2 is overexpressed in cancer and promotes a pro-migratory and pro-metastatic phenotype. Oncogene 33, 5274–5287 (2014).
Berndt, S. I. et al. Two susceptibility loci identified for prostate cancer aggressiveness. Nat. Commun. 6, 6889 (2015).
Zhang, Z. et al. An AR-ERG transcriptional signature defined by long-range chromatin interactomes in prostate cancer cells. Genome Res. 29, 223–235 (2019).
Baca, S. C. et al. Reprogramming of the FOXA1 cistrome in treatment-emergent neuroendocrine prostate cancer. Nat. Commun. 12, 1979 (2021).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Köster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
Loh, P. R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
van de Geijn, B., McVicker, G., Gilad, Y. & Pritchard, J. K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063 (2015).
Castel, S. E., Levy-Moonshine, A., Mohammadi, P., Banks, E. & Lappalainen, T. Tools and best practices for data processing in allelic expression analysis. Genome Biol. 16, 195 (2015).
Kuilman, T. et al. CopywriteR: DNA copy number detection from off-target sequence data. Genome Biol. 16, 49 (2015).
Delaneau, O. et al. A complete tool set for molecular QTL discovery and analysis. Nat. Commun. 8, 15452 (2017).
Whitlock, M. C. Combining probability from independent tests: the weighted Z-method is superior to Fisher’s approach. J. Evolut. Biol. 18, 1368–1373 (2005).
Gusev, A. et al. A transcriptome-wide association study of high grade serous epithelial ovarian cancer identifies novel susceptibility genes and splice variants. Nat. Genet. 51, 815–823 (2019).
Hukku, A. et al. Probabilistic colocalization of genetic variants from complex and molecular traits: promise and limitations. Am. J. Hum. Genet. 108, 25–35 (2021).
Barbeira, A. N. et al. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biol. 22, 49 (2021).
Aulchenko, Y. S., Struchalin, M. V. & van Duijn, C. M. ProbABEL package for genome-wide association analysis of imputed data. BMC Bioinf. 11, 134 (2010).
Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
Acknowledgements
This work is supported by grants from the PhRMA Foundation and the Kure It Cancer Research Foundation (S.C.B.). The androgen deprivation GWAS was supported in part by the National Cancer Institute of the National Institutes of Health under award numbers U10CA180820, U10CA180794 and UG1CA233180 (C.J.S.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We are grateful to the E3805: CHAARTED investigators; the patients who participated in the trial; the Prostate Cancer Foundation Mazzone Awards; and Sanofi for partial financial support and supplying docetaxel for early use (C.J.S.). In addition, we acknowledge Public Health Service grants CA180794, CA180820, CA23318, CA66636, CA21115, CA49883, CA16116, CA21076, CA27525, CA13650, CA14548, CA35421, CA32102, CA31946, CA04919, CA107868 and CA184734 (C.J.S.). We are grateful for the generous support of Rebecca and Nathan Milikowsky and Debbie and Bob First (S.C.B.).
Author information
Contributions
S.C.B., A.G. and M.L.F. conceived the study. S.C.B. analyzed the data and wrote the manuscript under the joint supervision of M.L.F. and A.G. A.F. performed ChIP–seq experiments. J.-H.S. generated and C.K. analyzed allelic imbalance data from LNCaP cells. T.M. and Y.D. analyzed SNP STARR-seq data under the supervision of N.L. and B.P. C.S. performed CRISPRi experiments under the supervision of D.Y.T. S.Z. assisted with analysis of ChIP–seq data. S.G. assisted with genotype imputation. S.L. and W.Z. provided prostate cancer ChIP–seq data. M.M.P. and V.W. analyzed ADT GWAS data performed on samples and data provided by C.J.S. J.A. assisted with implementation of the CWAS pipeline.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Jason Stein and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Accurate genotyping of SNPs from epigenomic data.
(a) Overview of 575 epigenomic datasets merged across 163 individuals for genotyping. Datasets are colored by cohort (see Supplementary Table 1). (b) Genomic distribution of reads in ChIP-seq, RNA-seq and input control (whole genome) data. The genome was divided into non-overlapping 500 base-pair windows and cumulative read counts for each bin were summed. For each datatype, five samples were randomly selected and down-sampled to 8.4 million reads for uniformity. The mean percentage of bins with the indicated number of read counts is shown for each datatype. (c) Number of covered SNPs (≥ 5 reads) versus total aggregated reads for each individual. (d) Number of covered SNPs (≥ 5 reads) for each individual (n = 165) as the indicated number of datasets are merged. Datasets were added in random order for a given individual. For boxplots, lower and upper hinges indicate 25th and 75th percentiles; whiskers extend to 1.5 × the inter-quartile range (IQR). (e) Correlation of imputed versus array-based genotype dosages across 24 individuals. (f) Receiver operating characteristic curve for detection of heterozygous SNPs using sequencing and imputation, with array-based genotypes as ground truth. Dotted red line indicates a mean sensitivity of 0.92 at a specificity of 0.9 in individuals of European ancestry.
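The windowing described for panel (b) can be sketched in a few lines. This is a minimal illustration only, assuming read start positions on a single chromosome are already available as 0-based integers; the function names and the toy data are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter

WINDOW = 500  # window size in base pairs, as stated for panel (b)


def reads_per_window(read_starts, chrom_length, window=WINDOW):
    """Count reads falling in each non-overlapping window of one chromosome."""
    n_bins = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // window] += 1
    return counts


def percent_bins_by_count(counts):
    """Percentage of windows that hold each observed read count."""
    tally = Counter(counts)
    total = len(counts)
    return {k: 100.0 * v / total for k, v in sorted(tally.items())}


# Hypothetical toy input: six reads on a 2,000 bp chromosome (four windows).
toy_reads = [10, 20, 499, 500, 1200, 1999]
toy_counts = reads_per_window(toy_reads, chrom_length=2000)
toy_percent = percent_bins_by_count(toy_counts)
```

In the caption's setting, a table like `toy_percent` would be computed per sample and then averaged across the five down-sampled samples of each datatype.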
Extended Data Fig. 2 Inferred ancestry of individuals in the study.
Projection of imputed genotypes onto the first two principal components of continental ancestry from ref. 78. Individual identifiers for outlier samples (with values > 2 × standard deviation) are labeled. Self-reported ancestry is coded by color.
Extended Data Fig. 3 Overlap of cQTLs with prostate tissue eQTLs.
(a) Enrichment of genetically determined AR peaks (left) and H3K27ac peaks (right) for overlap with GWAS risk SNPs eQTLs across various tissues. Empirical p values are derived from 10,000 permutations. (b) Number of AR and H3K27ac cQTLs that are also the top eQTL for a gene in prostate tissue. (c) Correlation of cQTL and eQTL effect size (β) for cQTL SNPs; p-value for Pearson correlation test is indicated. (d) Examples of SNPs (labeled with rs identifier) that are both AR cQTLs and eQTLs where the corresponding cPeak and eGene are connected by an H3K27ac HiChIP loop in LNCaP. cPeak coordinates are shown and eGene transcriptional start sites (TSS) are denoted. (e) Contingency table showing enrichment of H3K27ac HiChIP looping between the corresponding cPeak and eGene for cQTLs that are also eQTLs. Chi-square test p-values are indicated.
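For readers unfamiliar with empirical p values from permutations, as used in panel (a), the idea can be sketched as follows. The caption does not specify how the labels were permuted, so the shuffling scheme, the add-one correction, the function names and the toy data below are all assumptions made for illustration.

```python
import random


def empirical_p_value(observed, null_statistics):
    """One-sided empirical p value with the add-one correction (assumed)."""
    n_extreme = sum(1 for s in null_statistics if s >= observed)
    return (n_extreme + 1) / (len(null_statistics) + 1)


def permutation_overlap_test(peak_flags, snp_flags, n_perm=10_000, seed=0):
    """Shuffle which sites carry the peak label and recount overlaps."""
    rng = random.Random(seed)
    observed = sum(1 for p, s in zip(peak_flags, snp_flags) if p and s)
    shuffled = list(peak_flags)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(sum(1 for p, s in zip(shuffled, snp_flags) if p and s))
    return observed, empirical_p_value(observed, null)


# Hypothetical toy data: 50 sites, where the same 5 sites are both
# "genetically determined peaks" and "eQTL sites".
toy_peaks = [True] * 5 + [False] * 45
toy_snps = [True] * 5 + [False] * 45
toy_observed, toy_p = permutation_overlap_test(toy_peaks, toy_snps)
```

With 10,000 permutations, the smallest p value this estimator can report is 1/10,001, which is why the number of permutations bounds the resolution of an empirical test.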
Extended Data Fig. 4 Distribution of cQTLs around cPeaks.
cQTL SNP significance versus distance to the center of the corresponding cPeak for significant cQTLs (permutation-based q-value < 0.05). Dashed blue lines indicate ± 25 Kb from the peak center.
Extended Data Fig. 5 Conditioning of GWAS SNP significance on genetically predicted CWAS AR binding.
Genomic context of AR CWAS ARBS (depicted in green) that are significantly associated with prostate cancer risk. Manhattan plots indicate significance of SNP associations with prostate cancer before and after conditioning on genetically predicted CWAS ARBS activity. (a) and (b) show representative examples where ARBS explain most of the nearby cis-SNP GWAS significance. (c) CWAS ARBS at the promoter of GGCX, where residual GWAS significance remains after conditioning on ARBS, suggesting additional mechanisms underlying risk conferred by SNPs in this region.
Extended Data Fig. 6 Comparison of CWAS and GWAS significance for tested ARBS and H3K27ac peaks.
The absolute value of the association Z-score is plotted for CWAS peak-trait associations (y-axis) and GWAS SNP-trait associations for the most significant nearby SNP (x-axis). (a) shows ARBS and (b) shows H3K27ac peaks. Dashed horizontal lines indicate genome-wide significance thresholds for CWAS. Vertical dotted lines indicate the GWAS significance threshold of z = 5.45.
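As a quick check on the threshold quoted above, a Z-score can be converted to a two-sided p value under a standard normal null. The sketch below assumes a two-sided test; under that assumption, z = 5.45 appears consistent with the conventional genome-wide threshold of roughly p = 5 × 10^-8. The function name is hypothetical.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p value for a Z-score under a standard normal null."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


p_gwas = two_sided_p_from_z(5.45)
```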
Extended Data Fig. 7 Enrichment of prostate cancer GWAS risk SNPs in genetically determined AR peaks and H3K27ac peaks.
Enrichment and p-values for AR peaks (a) and H3K27ac peaks (b) derived from linkage disequilibrium score regression (ref. 5).
Extended Data Fig. 8 cQTL vs. eQTL activity at TMPRSS2 and NKX3-1 loci.
(a) Normalized AR ChIP-seq reads at the TMPRSS2 enhancer and TMPRSS2 expression stratified by genotype of the indicated SNP. (b) Normalized H3K27ac ChIP-seq reads at the NKX3-1 enhancer and NKX3-1 expression stratified by genotype of the indicated SNP. ρ and p-values indicate Pearson correlation coefficient for (a) and (b). (c) Estimated cis-SNP heritability for the indicated epigenomic features and corresponding genes. For boxplots, lower and upper hinges indicate 25th and 75th percentiles; whiskers extend to 1.5 × the inter-quartile range (IQR).
Supplementary information
Supplementary Information
Supplementary Methods and Supplementary Note.
Supplementary Tables
Supplementary Tables 1–11.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Baca, S.C., Singler, C., Zacharia, S. et al. Genetic determinants of chromatin reveal prostate cancer risk mediated by context-dependent gene regulation. Nat Genet 54, 1364–1375 (2022). https://doi.org/10.1038/s41588-022-01168-y
DOI: https://doi.org/10.1038/s41588-022-01168-y