Abstract

Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.
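As a rough illustration of what an expression outlier is, the sketch below flags individuals whose expression of a gene lies far from the cohort mean, measured as a z score. The abstract does not describe the procedure actually used, so the input layout, the threshold of 2.0, and the function names `expression_outliers` and `outlier_genes_per_individual` are all assumptions made for illustration. The family-based segregation step is not modeled.

```python
from statistics import mean, stdev


def expression_outliers(expr, threshold=2.0):
    """Return (gene, sample, z) for each sample whose |z| >= threshold.

    expr: dict mapping gene -> {sample_id: expression value}.
    Z scores are computed per gene across all samples. Both the input layout
    and the default threshold are illustrative assumptions, not the paper's.
    """
    hits = []
    for gene, by_sample in expr.items():
        values = list(by_sample.values())
        if len(values) < 3:
            continue  # too few samples for a meaningful spread
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # constant expression: z score undefined
        for sample, value in by_sample.items():
            z = (value - mu) / sd
            if abs(z) >= threshold:
                hits.append((gene, sample, z))
    return hits


def outlier_genes_per_individual(hits):
    """Count distinct outlier genes for each sample."""
    genes = {}
    for gene, sample, _ in hits:
        genes.setdefault(sample, set()).add(gene)
    return {sample: len(g) for sample, g in genes.items()}
```

In a family-based design, a further filter could keep only outliers that recur among relatives, but that step depends on pedigree details the abstract does not give.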

Acknowledgements

All participants gave informed consent, with protocols approved by institutional review boards for ASL4 in Sardinia and by the University of Michigan. IRB exemption (OHSRP 11916) applied to analyses on coded data at collaborating institutions. M.P. is supported by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement 633964 (ImmunoAgeing). Z.Z. is supported by the National Science Foundation (NSF) GRFP (DGE-114747) and by the Stanford Center for Computational, Evolutionary, and Human Genomics (CEHG). Z.Z., J.R.D., and G.T.H. also acknowledge support from the Stanford Genome Training Program (SGTP; NIH/NHGRI T32HG000044). J.R.D. is supported by the Stanford Graduate Fellowship. K.R.K. is supported by Department of Defense, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship 32 CFR 168a. S.J.S. is supported by the NIHR Cambridge Biomedical Research Centre. The SardiNIA project is supported in part by the intramural program of the National Institute on Aging through contract HHSN271201100005C to the Consiglio Nazionale delle Ricerche of Italy. The RNA sequencing was supported by the PB05 InterOmics MIUR Flagship grant; by the FaReBio2011 “Farmaci e Reti Biotecnologiche di Qualità” grant; and by Sardinian Autonomous Region (L.R. no. 7/2009) grant cRP3-154 to F. Cucca, who is also supported by the Italian Foundation for Multiple Sclerosis (FISM 2015/R/09) and by the Fondazione di Sardegna (ex Fondazione Banco di Sardegna, Prot. U1301.2015/AI.1157.BE Prat. 2015-1651). S.B.M. is supported by the US National Institutes of Health through R01HG008150, R01MH101814, U01HG007436, and U01HG009080. All of the authors would like to thank the CRS4 and the SCGPM for the computational infrastructure supporting this project.

Author information

Author notes

    • Mauro Pala & Zachary Zappala

    These authors contributed equally to this work.

    • Francesco Cucca & Stephen B Montgomery

    These authors jointly directed this work.

Affiliations

  1. Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy.

    Mauro Pala, Mara Marongiu, Roberto Cusano, Francesca Crobu, Maria G Piras, Antonella Mulas, Magdalena Zoledziewska, Michele Marongiu, Fabio Busonero, Andrea Maschio, Maristella Steri, Carlo Sidore, Serena Sanna, Edoardo Fiorillo, Andrea Angius & Francesco Cucca

  2. Department of Pathology, Stanford University School of Medicine, Stanford, California, USA.

    Mauro Pala, Xin Li, Kevin S Smith & Stephen B Montgomery

  3. Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy.

    Mauro Pala, Riccardo Berutti & Francesco Cucca

  4. Department of Genetics, Stanford University School of Medicine, Stanford, California, USA.

    Zachary Zappala, Joe R Davis, Kimberly R Kukurba, Elena P Sorokin, Gaelen T Hess, Michael C Bassik & Stephen B Montgomery

  5. Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, USA.

    Michael J Gloudemans

  6. CRS4, Advanced Genomic Computing Technology, Pula, Italy.

    Frederic Reinier, Riccardo Berutti & Chris Jones

  7. Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK.

    Stephen J Sawcer

  8. Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, USA.

    Alexis Battle

  9. Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.

    John Novembre

  10. Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA.

    Gonçalo R Abecasis

  11. Laboratory of Genetics, National Institute on Aging, Baltimore, Maryland, USA.

    David Schlessinger

Contributions

M.P., Z.Z., Mara Marongiu, G.R.A., D.S., F. Cucca, and S.B.M. conceived and designed the experiments. Mara Marongiu, R.C., F. Crobu, M.G.P., A. Mulas, M.Z., F.B., A. Maschio, E.F., and A.A. performed the experiments. M.P., Z.Z., X.L., J.R.D., M.J.G., G.R.A., F. Cucca, and S.B.M. performed statistical analysis. M.P., Z.Z., X.L., J.R.D., K.R.K., M.J.G., F.R., R.B., Michele Marongiu, M.S., C.S., S.S., A.B., J.N., G.R.A., D.S., F. Cucca, and S.B.M. analyzed the data. M.P., Z.Z., M.C.B., A.B., J.N., C.J., S.J.S., G.R.A., D.S., F. Cucca, G.T.H., E.P.S., K.S.S., and S.B.M. contributed reagents, materials, and/or analysis tools. M.P., Z.Z., J.N., G.R.A., D.S., F. Cucca, and S.B.M. wrote the manuscript. M.P. and Z.Z. contributed equally. F. Cucca and S.B.M. jointly directed research. All authors read and approved the final version of the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Francesco Cucca or Stephen B Montgomery.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–16, Supplementary Tables 1–28 and Supplementary Note