Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.
All participants gave informed consent, with protocols approved by institutional review boards for ASL4 in Sardinia and by the University of Michigan. IRB exemption (OHSRP 11916) applied to analyses on coded data at collaborating institutions. M.P. is supported by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement 633964 (ImmunoAgeing). Z.Z. is supported by the National Science Foundation (NSF) GRFP (DGE-114747) and by the Stanford Center for Computational, Evolutionary, and Human Genomics (CEHG). Z.Z., J.R.D., and G.T.H. also acknowledge support from the Stanford Genome Training Program (SGTP; NIH/NHGRI T32HG000044). J.R.D. is supported by the Stanford Graduate Fellowship. K.R.K. is supported by Department of Defense, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship 32 CFR 168a. S.J.S. is supported by the NIHR Cambridge Biomedical Research Centre. The SardiNIA project is supported in part by the intramural program of the National Institute on Aging through contract HHSN271201100005C to the Consiglio Nazionale delle Ricerche of Italy. The RNA sequencing was supported by the PB05 InterOmics MIUR Flagship grant; by the FaReBio2011 “Farmaci e Reti Biotecnologiche di Qualità” grant; and by Sardinian Autonomous Region (L.R. no. 7/2009) grant cRP3-154 to F. Cucca, who is also supported by the Italian Foundation for Multiple Sclerosis (FISM 2015/R/09) and by the Fondazione di Sardegna (ex Fondazione Banco di Sardegna, Prot. U1301.2015/AI.1157.BE Prat. 2015-1651). S.B.M. is supported by the US National Institutes of Health through R01HG008150, R01MH101814, U01HG007436, and U01HG009080. All of the authors would like to thank the CRS4 and the SCGPM for the computational infrastructure supporting this project.
The authors declare no competing financial interests.
Cite this article
Pala, M., Zappala, Z., Marongiu, M. et al. Population- and individual-specific regulatory variation in Sardinia. Nat Genet 49, 700–707 (2017). https://doi.org/10.1038/ng.3840