Population- and individual-specific regulatory variation in Sardinia

Abstract

Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.
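The abstract describes eQTL discovery only at a high level. The Python sketch below shows roughly what a single cis-eQTL test measures. It regresses one gene's expression on one variant's alternate-allele dosage (0, 1 or 2) and reports the slope (the per-allele effect size) and its t statistic. This is a minimal sketch, not the study's pipeline. The function name `eqtl_effect` is hypothetical. The sketch also omits covariates, hidden-factor correction, multiple-testing control, and the relatedness adjustment that a family-based cohort such as this one would require.

```python
import math
from statistics import fmean


def eqtl_effect(dosages, expression):
    """Simple linear regression of expression on genotype dosage (illustrative only).

    dosages: alternate-allele counts (0, 1 or 2) per individual.
    expression: normalized expression of one gene for the same individuals.
    Returns (beta, t): the slope (effect per allele) and its t statistic
    with n - 2 degrees of freedom. No covariates or relatedness correction.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matching vectors with at least 3 samples")
    mx, my = fmean(dosages), fmean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("monomorphic variant: no genotype variation")
    beta = sxy / sxx
    if syy == 0:
        return beta, 0.0  # constant expression: nothing to associate
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    if r * r >= 1.0:
        return beta, math.copysign(math.inf, r)  # perfect linear fit
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return beta, t
```

For example, `eqtl_effect([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.1, 3.0, 2.9])` gives a slope of about 0.925 expression units per alternate allele.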

Figure 1: QTLs show larger effect sizes in Sardinia than in Europe.
Figure 2: Differentiated eQTLs in Sardinia.
Figure 3: Outlier gene expression in Sardinian trios.
Figure 4: Properties of rare shared variants near outlier genes.
Figure 5: Gene expression patterns in carriers of rare splicing variants.
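Figures 3 and 4 concern expression outliers that segregate within trios. The Python sketch below illustrates the idea, assuming each gene's expression has already been normalized. It standardizes each gene across the cohort and flags a child whose |z| passes a threshold. It keeps the gene if at least one parent is an outlier in the same direction. The data layout, the function names, the "at least one parent" rule, and the default threshold of 2.0 are hypothetical choices for illustration. The abstract's median z score of 2.97 describes the outliers the study found, not its cutoff, and the sketch does not reproduce the study's exact criteria.

```python
from statistics import fmean, pstdev


def zscores(values):
    """Standardize one gene's expression values across all individuals."""
    mu, sd = fmean(values), pstdev(values)
    if sd == 0:
        # No variation across the cohort: nobody can be an outlier.
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def segregating_outliers(expr, trio, threshold=2.0):
    """Return (gene, sharing_parents, child_z) for outliers shared within a trio.

    expr: hypothetical layout, {gene: {individual_id: normalized_expression}}.
    trio: (child, parent1, parent2) individual IDs present for every gene.
    threshold: hypothetical |z| cutoff for calling an outlier.
    """
    child, p1, p2 = trio
    hits = []
    for gene, by_ind in expr.items():
        ids = list(by_ind)
        z = dict(zip(ids, zscores([by_ind[i] for i in ids])))
        zc = z[child]
        if abs(zc) < threshold:
            continue  # the child is not an outlier for this gene
        # Parents who are outliers in the same direction as the child.
        sharing = [p for p in (p1, p2) if abs(z[p]) >= threshold and z[p] * zc > 0]
        if sharing:
            hits.append((gene, sharing, zc))
    return hits
```

Requiring an outlier to be shared by a parent and child favors inherited effects over purely environmental or technical ones. The resulting genes are natural candidates for the search for proximal rare variants described in the abstract.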

Acknowledgements

All participants gave informed consent, with protocols approved by institutional review boards for ASL4 in Sardinia and by the University of Michigan. IRB exemption (OHSRP 11916) applied to analyses on coded data at collaborating institutions. M.P. is supported by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement 633964 (ImmunoAgeing). Z.Z. is supported by the National Science Foundation (NSF) GRFP (DGE-114747) and by the Stanford Center for Computational, Evolutionary, and Human Genomics (CEHG). Z.Z., J.R.D., and G.T.H. also acknowledge support from the Stanford Genome Training Program (SGTP; NIH/NHGRI T32HG000044). J.R.D. is supported by the Stanford Graduate Fellowship. K.R.K. is supported by Department of Defense, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship 32 CFR 168a. S.J.S. is supported by the NIHR Cambridge Biomedical Research Centre. The SardiNIA project is supported in part by the intramural program of the National Institute on Aging through contract HHSN271201100005C to the Consiglio Nazionale delle Ricerche of Italy. The RNA sequencing was supported by the PB05 InterOmics MIUR Flagship grant; by the FaReBio2011 “Farmaci e Reti Biotecnologiche di Qualità” grant; and by the Sardinian Autonomous Region (L.R. no. 7/2009) grant cRP3-154 to F. Cucca, who is also supported by the Italian Foundation for Multiple Sclerosis (FISM 2015/R/09) and by the Fondazione di Sardegna (ex Fondazione Banco di Sardegna, Prot. U1301.2015/AI.1157.BE Prat. 2015-1651). S.B.M. is supported by the US National Institutes of Health through R01HG008150, R01MH101814, U01HG007436, and U01HG009080. All of the authors would like to thank the CRS4 and the SCGPM for the computational infrastructure supporting this project.

Author information

Contributions

M.P., Z.Z., Mara Marongiu, G.R.A., D.S., F. Cucca, and S.B.M. conceived and designed the experiments. Mara Marongiu, R.C., F. Crobu, M.G.P., A. Mulas, M.Z., F.B., A. Maschio, E.F., and A.A. performed the experiments. M.P., Z.Z., X.L., J.R.D., M.J.G., G.R.A., F. Cucca, and S.B.M. performed statistical analysis. M.P., Z.Z., X.L., J.R.D., K.R.K., M.J.G., F.R., R.B., Michele Marongiu, M.S., C.S., S.S., A.B., J.N., G.R.A., D.S., F. Cucca, and S.B.M. analyzed the data. M.P., Z.Z., M.C.B., A.B., J.N., C.J., S.J.S., G.R.A., D.S., F. Cucca, G.T.H., E.P.S., K.S.S., and S.B.M. contributed reagents, materials, and/or analysis tools. M.P., Z.Z., J.N., G.R.A., D.S., F. Cucca, and S.B.M. wrote the manuscript. M.P. and Z.Z. contributed equally. F. Cucca and S.B.M. jointly directed research. All authors read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Francesco Cucca or Stephen B Montgomery.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–16, Supplementary Tables 1–28 and Supplementary Note (PDF 8770 kb)

About this article

Cite this article

Pala, M., Zappala, Z., Marongiu, M. et al. Population- and individual-specific regulatory variation in Sardinia. Nat Genet 49, 700–707 (2017). https://doi.org/10.1038/ng.3840
