Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk

Castel, Stephane E.; Cervera, Alejandra; Mohammadi, Pejman; Aguet, François; Reverter, Ferran; Wolman, Aaron; Guigo, Roderic; Iossifov, Ivan; Vasileva, Ana; Lappalainen, Tuuli

doi:10.1038/s41588-018-0192-y

Analysis
Published: 20 August 2018

Nature Genetics volume 50, pages 1327–1334 (2018)

Abstract

Coding variants represent many of the strongest associations between genotype and phenotype; however, they exhibit inter-individual differences in effect, termed ‘variable penetrance’. Here, we study how cis-regulatory variation modifies the penetrance of coding variants. Using functional genomic and genetic data from the Genotype-Tissue Expression Project (GTEx), we observed that in the general population, purifying selection has depleted haplotype combinations predicted to increase pathogenic coding variant penetrance. Conversely, in cancer and autism patients, we observed an enrichment of penetrance-increasing haplotype configurations for pathogenic variants in disease-implicated genes, providing evidence that regulatory haplotype configuration of coding variants affects disease risk. Finally, we experimentally validated this model by editing a Mendelian single-nucleotide polymorphism (SNP) using CRISPR/Cas9 on distinct expression haplotypes with the transcriptome as a phenotypic readout. Our results demonstrate that joint regulatory and coding variant effects are an important part of the genetic architecture of human traits and contribute to modified penetrance of disease-causing variants.

**Fig. 1: Regulatory variants as modifiers of coding variant penetrance.**

**Fig. 2: Analysis of regulatory effects at the individual level shows that pathogenic coding variants are depleted from higher-expressed and exon-including regulatory haplotypes in the general population.**

**Fig. 3: eQTL haplotype configurations that are predicted to increase pathogenic coding variant penetrance are depleted in the genomes of GTEx individuals.**

**Fig. 4: eQTL haplotype configurations that are predicted to increase pathogenic coding variant penetrance are enriched in individuals with cancer and autism spectrum disorder.**

**Fig. 5: Haplotype-aware genome editing of a Mendelian disease SNP in *FLCN* demonstrates that expression regulatory variation can modify its penetrance.**

Acknowledgements

We would like to thank members of the Lappalainen laboratory for discussion surrounding the project, and both Kristin Ardlie and Sampsa Hautaniemi, who supervised F.A. and A.C., respectively. We thank the GTEx donors for their contributions to science, the GTEx Laboratory, Data Analysis, and Coordinating Center (LDACC), and the GTEx analysis working group (AWG) for their work in generating the resource. In particular, we would like to thank Ayellet Segre and Xiao Li at the Broad for their work in performing WGS variant calling and phasing of GTEx v7 data. The GTEx Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (NIH) and by the National Cancer Institute (NCI), the National Human Genome Research Institute (NHGRI), the National Heart, Lung, and Blood Institute (NHLBI), the National Institute on Drug Abuse (NIDA), the National Institute of Mental Health (NIMH) and the National Institute of Neurological Disorders and Stroke (NINDS). We also acknowledge the families at the participating SSC sites, the principal investigators at each site, the coordinators and staff at the SSC sites, the SFARI staff and the UMass Amherst Community Campaign (UMACC). Funds for the SSC were provided by the Simons Foundation. Additionally, we acknowledge the contribution of TCGA specimen donors and The Cancer Genome Atlas Research Network for their analyses. Funds for TCGA were provided by the NCI and NHGRI. S.E.C. was supported by NHGRI grant 1K99HG009916-01; T.L. and S.E.C. were supported by NIGMS grant R01GM122924 and NIMH grant R01MH101814; T.L., S.E.C. and P.M. were supported by NIH contract HHSN2682010000029C; T.L. and P.M. were supported by NIMH grant R01MH106842; and T.L. was supported by NIH grants UM1HG008901 and 1U24DK112331. A.C. was supported by the Cancer Society of Finland and Academy of Finland grant 284598.

Author information

Authors and Affiliations

New York Genome Center, New York, NY, USA
Stephane E. Castel, Alejandra Cervera, Pejman Mohammadi, Aaron Wolman, Ivan Iossifov, Ana Vasileva & Tuuli Lappalainen
Department of Systems Biology, Columbia University, New York, NY, USA
Stephane E. Castel, Pejman Mohammadi, Ana Vasileva & Tuuli Lappalainen
Research Programs Unit, Genome-Scale Biology & Medicine, Department of Biochemistry and Developmental Biology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
Alejandra Cervera
Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
Pejman Mohammadi
The Scripps Translational Science Institute, La Jolla, CA, USA
Pejman Mohammadi
Broad Institute of MIT and Harvard, Cambridge, MA, USA
François Aguet
Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
Ferran Reverter & Roderic Guigo
Universitat Pompeu Fabra (UPF), Barcelona, Spain
Roderic Guigo
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
Ivan Iossifov

Contributions

S.E.C. and T.L. designed the study and wrote the manuscript. S.E.C., A.V., and T.L. designed analyses and experiments. S.E.C., A.C., F.A., A.W., and A.V. performed analyses and experiments. P.M. aided development of the test for regulatory modifiers of penetrance. F.R. and R.G. provided and assisted in analysis of GTEx PSI data. I.I. provided and assisted in the analysis of SSC data.

Corresponding authors

Correspondence to Stephane E. Castel or Tuuli Lappalainen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Illustration of the key features of the model of joint effects of regulatory and coding variants on functional gene dosage and selection.

Under the model, regulatory variation altering functional gene dosage is particularly important in loss-of-function heterozygotes, where the dosage of functional protein is already reduced to half. Our general assumption is that common regulatory variants typically have such low effects on gene dosage that, in the absence of coding variants, they do not cause severe disease or substantial reduction of fitness. Accordingly, in this example, under an additive model of gene expression, the more highly expressed eQTL allele increases expression by 1.25× and disease risk increases non-linearly with decreasing gene dosage; there are potentially large disease risk differences for loss-of-function heterozygotes depending on eQTL haplotype configuration. This results in purifying selection acting more strongly against haplotype configurations that decrease functional gene dosage, while acting more weakly on those that increase functional gene dosage. At the population level, this differential strength of purifying selection would result in haplotype configurations that increase functional gene dosage being present at higher frequencies than those that decrease dosage. We note that, while we believe that this general model is plausible for many genes with dosage sensitivity, other scenarios are likely to exist, and, for example, fully recessive genes or gain-of-function coding variants would not follow this model. Future work and larger data sets are needed to elucidate the full picture of the relative importance of different types of joint effects of regulatory and coding variants.
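The dosage arithmetic behind this model can be sketched in a few lines of Python. This is a toy illustration, not the authors' code: only the 1.25× eQTL effect comes from the legend, while the baseline haplotype expression of 1.0, the normalization to an individual homozygous for the weaker eQTL allele, and the assumption that a loss-of-function (LoF) allele contributes no functional protein are illustrative choices. The helper names `haplotype_expression` and `functional_dosage` are hypothetical.

```python
# Toy parameters mirroring the 1.25x example in the legend; illustrative assumptions only.
BASE_EXPR = 1.0   # assumed expression of a haplotype carrying the weaker eQTL allele
HIGH_FOLD = 1.25  # fold increase conferred by the more highly expressed eQTL allele


def haplotype_expression(eqtl_high: bool) -> float:
    """Expression contributed by a single haplotype under an additive model."""
    return BASE_EXPR * (HIGH_FOLD if eqtl_high else 1.0)


def functional_dosage(haplotypes) -> float:
    """Functional gene dosage relative to an individual homozygous for the weaker
    eQTL allele and carrying no LoF allele.

    `haplotypes` is a pair of (eqtl_high, carries_lof) tuples. An LoF allele is
    assumed to contribute no functional protein."""
    reference = 2 * BASE_EXPR
    functional = sum(haplotype_expression(high) for high, lof in haplotypes if not lof)
    return functional / reference


if __name__ == "__main__":
    # An LoF heterozygote who is also heterozygous for the eQTL.
    lof_on_high = [(True, True), (False, False)]  # functional copy on the weaker haplotype
    lof_on_low = [(True, False), (False, True)]   # functional copy on the stronger haplotype
    print(functional_dosage(lof_on_high))  # 0.5
    print(functional_dosage(lof_on_low))   # 0.625
```

With the LoF allele on the more highly expressed haplotype, the individual retains less functional dosage (0.5 versus 0.625 here); if risk rises non-linearly as dosage falls, this modest gap can correspond to a much larger difference in disease risk.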

Supplementary Figure 2 Using GTEx allelic expression and percent spliced in to estimate the penetrance of coding variants at the individual level.

a,b, Differences in allelic expression between rare (AF < 1%), potentially pathogenic (CADD > 15, n = 644,052 independent ASE measurements) (a) and benign (CADD < 15, n = 368,715 independent ASE measurements) (b) missense variants and allele frequency-matched synonymous variants across GTEx tissues. A negative difference indicates reduced expression of missense variants compared with synonymous controls. Bars show the 95% confidence interval of the difference of the means between missense and synonymous variants. Storey’s π₁ indicates the estimated proportion of true positives across the GTEx tissues. c, Box plots of mean aFC across each GTEx tissue for pathogenic and benign missense variants defined using PolyPhen (where pathogenic is probably damaging or possibly damaging). d, Comparison of allele frequency-matched synonymous-variant-normalized allelic expression between pathogenic and benign missense variants for each of the sets of variants analyzed in Fig. 2. e, Left, illustration of PSI z-score calculation from PSI measurements for the exon HsaEX0054530 in GTEx whole blood. Right, histogram of all absolute exon PSI z-scores for coding variants across GTEx whole blood, with the 90th percentile shown. Coding variants with an absolute PSI z-score greater than the 90th percentile were considered to be in exons with substantial variation in PSI compared with the population. P-values were calculated using a two-sided paired Wilcoxon signed-rank test. For box plots: bottom whisker, Q1 − 1.5 × IQR; top whisker, Q3 + 1.5 × IQR; box, IQR; center, median; outliers are not plotted for ease of viewing.
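The PSI z-score and 90th-percentile filter described in panel e can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used on GTEx: we assume PSI values arranged as an exon-by-individual matrix, z-scores computed per exon with the sample standard deviation, and hypothetical helper names `psi_z_scores` and `flag_variable_exons`.

```python
import numpy as np


def psi_z_scores(psi):
    """Z-score each individual's PSI against the population for that exon.

    `psi` is an (exons x individuals) array of percent-spliced-in values."""
    psi = np.asarray(psi, dtype=float)
    mean = psi.mean(axis=1, keepdims=True)
    sd = psi.std(axis=1, ddof=1, keepdims=True)  # assumes sd > 0 for every exon
    return (psi - mean) / sd


def flag_variable_exons(psi, variant_sites, percentile=90.0):
    """Flag coding variants whose exon shows an unusually high |PSI z-score|.

    `variant_sites` lists (exon_index, individual_index) pairs, one per coding
    variant; the cut-off is the given percentile of their |z| values."""
    z = psi_z_scores(psi)
    abs_z = np.array([abs(z[exon, ind]) for exon, ind in variant_sites])
    threshold = np.percentile(abs_z, percentile)
    return abs_z > threshold, threshold
```

Computing the cut-off over the coding variants' own |z| values, rather than over all exons, follows the histogram described in the legend; the ddof=1 choice is ours.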

Supplementary Figure 3 Test for regulatory modifiers of coding variant penetrance using phased genetic data.

a, As input, the test takes the phased genotypes of coding variants and the eQTL for that gene. For each individual heterozygous for a coding variant, a binary measure is produced to indicate whether the major (wild-type) allele is on the more weakly expressed eQTL haplotype. b, Across a population of individuals, the null expectation is that the observed haplotype configurations are a random sampling of all possible configurations, and thus, the proportion of observed major alleles on the more weakly expressed haplotype is equal to the frequency of the more weakly expressed haplotype in the population. The diagram depicts a single-gene example, but observations are aggregated across genes, and the difference between the observed frequency of major alleles on the more weakly expressed haplotype and the frequency of the more weakly expressed haplotype across those genes is calculated. c, Results of a test performed on simulated haplotype data from 1,000 individuals at 500 genes with 1,000 replicates using a frequency for the more weakly expressed haplotype of 50% and coding variant frequencies observed in GTEx, across a range of genes exhibiting joint effects between regulatory and coding variants and effect size. The simulated effect size is described by the x axis in terms of the percentage of observed haplotype configurations that decrease penetrance. d, Power to detect significant (α = 0.05) regulatory modifiers of penetrance from simulation data in c is robust across a range of effect sizes. e, Comparison of P-values calculated using either the bootstrap approach or the Poisson binomial distribution from 1,000 simulations of 1,000 haplotypes generated under the null hypothesis shows that they are extremely similar. The equality line is shown in red, and the Pearson correlation R and slope are listed in the panel. See the Methods for more information.
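A minimal sketch of this null model in Python, assuming each heterozygote's binary observation is an independent Bernoulli trial whose success probability is the population frequency of the more weakly expressed haplotype at that gene, so that the total count follows a Poisson binomial distribution. The function names and the two-sided P-value rule (summing all outcomes no more probable than the observed count) are our assumptions; the authors' implementation and its bootstrap alternative are described in the Methods.

```python
import numpy as np


def poisson_binomial_pmf(probs):
    """PMF of the number of successes among independent Bernoulli trials with
    success probabilities `probs`, computed by dynamic programming."""
    pmf = np.zeros(len(probs) + 1)
    pmf[0] = 1.0
    for p in probs:
        # Each trial either fails (count stays at k) or succeeds (k -> k + 1).
        # The right-hand side is evaluated before assignment, so old values are used.
        pmf[1:] = pmf[1:] * (1 - p) + pmf[:-1] * p
        pmf[0] *= 1 - p
    return pmf


def modifier_test(major_on_weak, weak_hap_freq):
    """Compare observed and expected placement of major alleles.

    major_on_weak[i] is 1 if heterozygote i carries the major (wild-type) allele
    on the more weakly expressed eQTL haplotype, else 0; weak_hap_freq[i] is the
    population frequency of that weaker haplotype at the relevant gene.
    Returns (observed - expected proportion, two-sided P)."""
    x = np.asarray(major_on_weak, dtype=int)
    p = np.asarray(weak_hap_freq, dtype=float)
    effect = x.mean() - p.mean()
    pmf = poisson_binomial_pmf(p)
    k = int(x.sum())
    # Assumed two-sided rule: total probability of outcomes no more likely than k
    # (a small relative tolerance absorbs floating-point ties).
    p_value = pmf[pmf <= pmf[k] * (1 + 1e-9)].sum()
    return effect, min(float(p_value), 1.0)
```

For example, four heterozygotes who all carry the major allele on the weaker haplotype at genes where that haplotype has frequency 0.5 give an effect of 0.5 and P = 0.125; a positive effect means more configurations with the major allele on the weaker haplotype than the null predicts.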

Supplementary Figure 4 Correlation between regulatory coding variant haplotype configuration and both eQTL effect size and gene conservation.

a,b, Results are shown for rare pathogenic (MAF < 1%, CADD > 15, n = 132,788 variant expression haplotype configurations; left) and rare benign (MAF < 1%, CADD < 15, n = 204,395 variant expression haplotype configurations; right) coding variants separately. For each correlation, the Spearman rho and P-value are listed above the plot. The red line shows the linear regression fit, and the blue line shows the LOESS regression fit with a span of 0.25. Data were grouped into 100 equally sized bins ordered by either |eQTL effect size| (a) or gene conservation (b). See ‘Gene sets’ in the Methods for a description of how gene conservation was calculated.
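The binning and rank correlation in this figure can be sketched as below, assuming one row per variant-haplotype configuration with a numeric score (|eQTL effect size| or gene conservation) and a binary configuration indicator. The legend does not say whether the Spearman rho was computed on the bins or on individual observations; this sketch correlates bin-level summaries. Spearman's rho is computed here as the Pearson correlation of average ranks to keep the example NumPy-only; `scipy.stats.spearmanr` would additionally return a P-value. All helper names are hypothetical.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks with ties assigned their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average of positions i..j, 1-based
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def binned_trend(score, config, n_bins=100):
    """Sort observations by `score`, split them into `n_bins` near-equal bins, and
    return per-bin mean score, per-bin proportion of `config` and Spearman rho."""
    score = np.asarray(score, dtype=float)
    config = np.asarray(config, dtype=float)
    order = np.argsort(score, kind="mergesort")
    x = np.array([b.mean() for b in np.array_split(score[order], n_bins)])
    y = np.array([b.mean() for b in np.array_split(config[order], n_bins)])
    return x, y, spearman_rho(x, y)
```

`np.array_split` tolerates counts that are not a multiple of `n_bins`, so bin sizes differ by at most one observation.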

Supplementary Figure 5 Gene-level metrics of common (MAF > 5%) regulatory and rare (MAF < 1%) coding variant haplotypes.

a–d, Haplotypes were generated using potentially pathogenic (CADD > 15) or benign (CADD < 15) rare coding variants and the top cross-tissue GTEx v6p eQTLs to define more highly and weakly expressed haplotypes. Histograms are shown of the more highly expressed haplotype frequency, the number of coding variants with haplotype data, the mean coding variant frequency, and the number of haplotypes observed at the gene level for haplotypes from 620 phased and imputed GTEx v7 whole genomes (a), 615 phased and imputed TCGA germline whole genomes (b), and phased and imputed array and whole exome data from 1,712 SSC unaffected siblings (c) and 2,304 SSC probands (d). Differences in the frequency distribution of the more highly expressed haplotype between populations result from differences in eQTL allele frequency. This is not expected to cause systematic bias in our test of modified penetrance shown in Supplementary Fig. 3.

Supplementary Figure 6 Analysis of joint eQTL and coding variant haplotype configurations in disease cohorts.

a, Analysis of rare benign variants in disease-implicated genes. b, Analysis of rare pathogenic variants in control genes selected to be within ±5% of the disease genes in the number of coding variants, coding variant frequency and number of eQTL coding variant haplotypes. The same number of haplotypes that were used in the disease gene analysis (Fig. 4) were sampled from the pool of haplotypes in control genes. c, Analysis of rare pathogenic variants in disease-implicated genes after removing individuals with Asian ancestry from the TCGA cohort. Median estimates and 95% confidence intervals were generated using 100,000 bootstraps, and two-sided empirical P-values were generated from these confidence intervals and combined between cohorts using Fisher’s method to produce meta P-values (*P < 0.05, **P < 0.01). See ‘Gene sets’ in the Methods for a description of the gene sets used, Supplementary Fig. 5 for a description of the eQTL coding variant haplotypes used for the analysis and Supplementary Table 3 for the full table of results, including individual cohort-level P-values.
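The cohort-level bootstrap and the Fisher combination can be sketched with the Python standard library alone. The resampling unit (individual observations), the default of 10,000 resamples (the legend used 100,000) and the rule for turning bootstrap draws into an empirical two-sided P-value are assumptions for illustration. Fisher's method itself is standard: X = −2 Σ ln pᵢ follows a χ² distribution with 2k degrees of freedom under the joint null, and for even degrees of freedom its survival function has a closed form. Function names are hypothetical.

```python
import math
import random
import statistics


def bootstrap_effect(major_on_weak, weak_hap_freq, n_boot=10_000, seed=1):
    """Resample observations with replacement and return the median effect,
    a 95% percentile interval and an empirical two-sided P (assumed rule)."""
    rng = random.Random(seed)
    n = len(major_on_weak)
    effects = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        # mean(observed) - mean(expected) over the resampled observations
        effects.append(sum(major_on_weak[i] - weak_hap_freq[i] for i in idx) / n)
    effects.sort()
    interval = (effects[int(0.025 * (n_boot - 1))], effects[int(0.975 * (n_boot - 1))])
    # Assumed: twice the fraction of bootstrap effects on the far side of zero.
    below = sum(e <= 0 for e in effects) / n_boot
    above = sum(e >= 0 for e in effects) / n_boot
    return statistics.median(effects), interval, min(1.0, 2 * min(below, above))


def fisher_combine(p_values):
    """Fisher's method for combining independent P-values (each must be > 0).

    Returns (X, P) with X = -2 * sum(ln p) and P = P(chi2_{2k} >= X)."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    # Closed-form survival function for 2k degrees of freedom:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 0.0
    for j in range(k):
        if j > 0:
            term *= half / j
        total += term
    return x, math.exp(-half) * total
```

A single P-value passes through Fisher's method unchanged, and two cohort-level P-values of 0.05 combine to roughly 0.017.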

Supplementary Figure 7 eQTL-haplotype-aware editing of a Mendelian SNP in 293T cells using the transcriptome as a phenotypic readout.

a, HaploReg v3 annotations of the top causal variant candidates for the FLCN eQTL. Highlighted in blue is variant rs1708629, which we consider to be the most likely causal variant, as it is annotated as promoter and enhancer in multiple Roadmap Epigenomics cell types, is bound by numerous proteins, and contains many protein-binding motifs. The r² value between the highlighted variant and other variants is listed. b, Visualization of eQTL SNP (rs1708629) genotyping reads, edited SNP (rs199643834) genotyping reads from representative monoallelic and single-copy clones, and targeted RNA-seq reads from representative clones with low SNP expression (snpLOW) and high SNP expression (snpHIGH). Ratios of the reference allele and alternative allele in targeted DNA and RNA sequencing are indicated. A full list of primers used for sequencing can be found in Supplementary Table 8. c, Volcano plot of differential expression analysis comparing two clones that are monoallelic for the edited SNP versus four wild-type clones. d, Illustration of haplotype configurations for the clonal cell lines analyzed.

Supplementary Figure 8 Comparison of absolute distance between coding variants and eQTL variants for analyzed gene sets.

a, Comparison of median pathogenic versus benign coding variant distances for each of the gene sets considered in Fig. 3. b, Comparison of median pathogenic versus benign coding variant distances from cases (TCGA/SSC probands) and controls (GTEx/SSC siblings) for each of the disease gene sets considered in Fig. 4. P-values were calculated using a two-sided Wilcoxon signed-rank test. For box plots: bottom whisker, Q1 − 1.5 × IQR; top whisker, Q3 + 1.5 × IQR; box, IQR; center, median; outliers are not plotted for ease of viewing.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8

Reporting Summary

Supplementary Table 1

Top cross-tissue GTEx v6p eQTLs per gene

Supplementary Table 2

TCGA individuals and respective cancer types used for analysis

Supplementary Table 3

Disease cohort-level results of the test for regulatory modifiers of penetrance

Supplementary Table 4

Results from targeted sequencing of edited 293T monoclonal cell lines

Supplementary Table 5

Genes with expression affected by the FLCN Mendelian disease SNP

Supplementary Table 6

Pathway-based gene set enrichment analysis of genes affected by the FLCN Mendelian disease SNP

Supplementary Table 7

Inferred ancestry of individuals in GTEx v7 and TCGA cohorts used for analysis

Supplementary Table 8

Oligonucleotides used in this study

About this article

Cite this article

Castel, S.E., Cervera, A., Mohammadi, P. et al. Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk. Nat Genet 50, 1327–1334 (2018). https://doi.org/10.1038/s41588-018-0192-y

Received: 28 February 2018
Accepted: 05 July 2018
Published: 20 August 2018
Issue Date: September 2018
DOI: https://doi.org/10.1038/s41588-018-0192-y
