Here we describe the insights gained from sequencing the whole genomes of 2,636 Icelanders to a median depth of 20×. We found 20 million SNPs and 1.5 million insertions-deletions (indels). We describe the density and frequency spectra of sequence variants in relation to their functional annotation, gene position, pathway and conservation score. We demonstrate an excess of homozygosity and rare protein-coding variants in Iceland. We imputed these variants into 104,220 individuals down to a minor allele frequency of 0.1% and found a recessive frameshift mutation in MYL4 that causes early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases and an intronic variant in GNAS associating with increased thyroid-stimulating hormone levels when maternally inherited. These data provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity.
References
- A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
- Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367 (2009).
- Identification of low-frequency variants associated with gout and serum uric acid levels. Nat. Genet. 43, 1127–1130 (2011).
- A mutation in APP protects against Alzheimer's disease and age-related cognitive decline. Nature 488, 96–99 (2012).
- Mutations in BRIP1 confer high risk of ovarian cancer. Nat. Genet. 43, 1104–1107 (2011).
- A rare variant in MYH6 is associated with high risk of sick sinus syndrome. Nat. Genet. 43, 316–320 (2011).
- Nonsense mutation in the LGR4 gene is associated with several human diseases and other traits. Nature 497, 517–520 (2013).
- Variant of TREM2 associated with the risk of Alzheimer's disease. N. Engl. J. Med. 368, 107–116 (2013).
- A rare nonsynonymous sequence variant in C3 is associated with high risk of age-related macular degeneration. Nat. Genet. 45, 1371–1374 (2013).
- A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nat. Genet. 44, 1326–1329 (2012).
- A germline variant in the TP53 polyadenylation signal confers cancer susceptibility. Nat. Genet. 43, 1098–1103 (2011).
- Identification of low-frequency and rare sequence variants associated with elevated or reduced risk of type 2 diabetes. Nat. Genet. 46, 294–298 (2014).
- Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
- Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).
- Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat. Genet. 42, 969–972 (2010).
- An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
- The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
- Detection of sharing by descent, long-range phasing and haplotype imputation. Nat. Genet. 40, 1068–1075 (2008).
- NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 40, D130–D135 (2012).
- Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070 (2010).
- The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 6, R44 (2005).
- Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection. J. Clin. Bioinforma. 2, 19 (2012).
- Human-specific insertions and deletions inferred from mammalian genome sequences. Genome Res. 17, 16–22 (2007).
- The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes. Genome Res. 23, 749–761 (2013).
- Mendelian Inheritance in Man and its online version, OMIM. Am. J. Hum. Genet. 80, 588–604 (2007).
- Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013).
- Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
- Ensembl 2013. Nucleic Acids Res. 41, D48–D55 (2013).
- A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823–828 (2012).
- The types and prevalence of alternative splice forms. Curr. Opin. Struct. Biol. 16, 362–367 (2006).
- Nonsense-mediated mRNA decay: terminating erroneous gene expression. Curr. Opin. Cell Biol. 16, 293–299 (2004).
- Genetic variation in a human odorant receptor alters odour perception. Nature 449, 468–472 (2007).
- The missense of smell: functional variability in the human odorant receptor repertoire. Nat. Neurosci. 17, 114–120 (2014).
- Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15, 901–913 (2005).
- Deterministic mutation rate variation in the human genome. Genome Res. 12, 1350–1356 (2002).
- Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
- PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res. 41, D377–D386 (2013).
- Reconstructing dynamic regulatory maps. Mol. Syst. Biol. 3, 74 (2007).
- ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
- A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906–913 (2007).
- dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
- Systematics and the Origin of Species from the Viewpoint of a Zoologist (Columbia University Press, 1942).
- A single BRCA2 mutation in male and female breast cancer families from Iceland with varied cancer phenotypes. Nat. Genet. 13, 117–119 (1996).
- An Icelandic example of the impact of population structure on association studies. Nat. Genet. 37, 90–95 (2005).
- Identification of an imprinted master trans regulator at the KLF14 locus related to multiple metabolic phenotypes. Nat. Genet. 43, 561–564 (2011).
- Parental origin of sequence variants associated with complex diseases. Nature 462, 868–874 (2009).
- The imprinted DLK1-MEG3 gene region on chromosome 14q32.2 alters susceptibility to type 1 diabetes. Nat. Genet. 42, 68–71 (2010).
- Central precocious puberty caused by mutations in the imprinted gene MKRN3. N. Engl. J. Med. 368, 2467–2475 (2013).
- Genomic imprinting: implications for human disease. Am. J. Pathol. 154, 635–647 (1999).
- Prevalence of diagnosed atrial fibrillation in adults: national implications for rhythm management and stroke prevention: the AnTicoagulation and Risk Factors in Atrial Fibrillation (ATRIA) Study. J. Am. Med. Assoc. 285, 2370–2375 (2001).
- Lifetime risk for development of atrial fibrillation: the Framingham Heart Study. Circulation 110, 1042–1046 (2004).
- Human fetal muscle and cultured myotubes derived from it contain a fetal-specific myosin light chain. Science 221, 955–957 (1983).
- Chromosomal assignment of two myosin alkali light-chain genes encoding the ventricular/slow skeletal muscle isoform and the atrial/fetal muscle isoform (MYL3, MYL4). Hum. Genet. 81, 278–282 (1989).
- Canalicular ABC transporters and liver disease. J. Pathol. 226, 300–315 (2012).
- Progressive familial intrahepatic cholestasis. Orphanet J. Rare Dis. 4, 1 (2009).
- Heterozygous MDR3 missense mutation associated with intrahepatic cholestasis of pregnancy: evidence for a defect in protein trafficking. Hum. Mol. Genet. 9, 1209–1217 (2000).
- Discovery of common variants associated with low TSH levels and thyroid cancer risk. Nat. Genet. 44, 319–322 (2012).
- Brown-Vialetto–Van Laere syndrome. Orphanet J. Rare Dis. 3, 9 (2008).
- Expanded polyglutamine domain possesses nuclear export activity which modulates subcellular localization and toxicity of polyQ disease protein via exportin-1. Hum. Mol. Genet. 20, 1738–1750 (2011).
- Exome sequencing reveals riboflavin transporter mutations as a cause of motor neuron disease. Brain 135, 2875–2882 (2012).
- Riboflavin transporter 3 involvement in infantile Brown-Vialetto-Van Laere disease: two novel mutations. J. Med. Genet. 50, 104–107 (2013).
- Impaired riboflavin transport due to missense mutations in SLC52A2 causes Brown-Vialetto–Van Laere syndrome. J. Inherit. Metab. Dis. 35, 943–948 (2012).
- Brown-Vialetto–Van Laere syndrome, a ponto-bulbar palsy with deafness, is caused by mutations in c20orf54. Am. J. Hum. Genet. 86, 485–489 (2010).
- Exome sequencing in Brown-Vialetto–van Laere syndrome. Am. J. Hum. Genet. 87, 567–569, author reply 569–570 (2010).
- Brown-Vialetto–Van Laere and Fazio Londe syndrome is associated with a riboflavin transporter defect mimicking mild MADD: a new inborn error of metabolism with potential treatment. J. Inherit. Metab. Dis. 34, 159–164 (2011).
- Cor pulmonale in a patient with Brown-Vialetto–Van Laere syndrome: a case report. J. Neurol. Sci. 300, 155–156 (2011).
- Pontobulbar palsy and sensorineural deafness (Brown-Vialetto–van Laere syndrome): the first case from Libya. Amyotroph. Lateral Scler. 11, 397–398 (2010).
- Progressive ponto-bulbar palsy with deafness. A clinico-pathological study. Acta Neurol. Belg. 76, 309–314 (1976).
- Sclérose latérale amyotrophique ou myasthénie bulbospinal avec exaltation des réflexes tendineux et contractions fibrillaires. J. Neurol. Psychiatry 6, 380–382 (1929).
- A case of amyotrophic lateral sclerosis complicated by progressive lipodystrophy. Edin. Med. J. 60, 281–293 (1953).
- Sequence variants from whole genome sequencing a large group of Icelanders. Sci. Data 2, 150011 doi:10.1038/sdata.2015.11 (2015).
- Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
- A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
- Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
- The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
- Ensembl 2012. Nucleic Acids Res. 40, D84–D90 (2012).
- Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res. 18, 1814–1828 (2008).
- Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res. 18, 1829–1843 (2008).
- Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).
- Supplementary Figure 4: Distribution of the number of observed alleles in 2,636 sequenced Icelanders by impact class. (95 KB)
Shown are the proportions of variants for which the minor allele was seen one to six times (MAF ≤ 0.11%).
- Supplementary Figure 5: Comparison of imputed and chip genotypes. (56 KB)
Shown is the fraction of the 28,204 SNPs identified in exons and splice regions and present on SNP chips that have r² > 0.8, 0.9 and 0.99 between the imputed and chip genotypes as a function of their derived allele frequency (DAF).
- Supplementary Figure 6: The five pedigrees containing the eight homozygous carriers of c.234delC in MYL4. (98 KB)
Symbols for homozygous carriers are colored black. Symbols for deceased individuals are stricken through with a forward-leaning line. Symbols for individuals who have not been genotyped directly are stricken through with a backward-leaning line. Under each individual are up to five lines containing information about the individual. First appears an identifier, consisting of a pedigree name (f1–f5), the generation of the individual in roman numerals and an enumerator within the generation. Second appear the individual’s year of birth and, if appropriate, the individual’s year of death. Third appears the individual’s c.234delC genotype, where D and W denote directly genotyped deletion and wild-type alleles, respectively, and d and w denote in silico genotypes inferred from the genotypes of relatives. The order of the alleles indicates the parent of origin, where the first allele comes from the father and the second allele comes from the mother, except for the three cases for whom parent of origin could not be assigned: f2-I:1, f2-I:2 and f3-I:2. Fourth appears an indication of whether the individual has been diagnosed with AF, with the age at onset given after the @ sign. Fifth appears the presence of other relevant phenotypes: sick sinus syndrome (SSS), pacemaker implantation (PM) and sudden cardiac death (SCD).
- Supplementary Figure 7: The transmission of chromosome 17 through pedigree f-2. (91 KB)
The transmission of the founding couple of pedigree f-2 can be reconstructed on the basis of the expected values for meiotic transmissions of chromosome 17. The horizontal red lines indicate the position of c.234delC in MYL4, and the small red square surrounding the line indicates the region around c.234delC shared identical by descent by the founding couple. The length of this interval is estimated to be 3.3 cM. The first and last 10 cM of the chromosome have been truncated. The sisters f2-II:6 and f2-II:9 are imputed to be carrying c.234delC on their paternal chromosome on the basis of the chromosomal region around the deletion having been transmitted to their children (dark blue). There is no clear transmission of either sister’s maternal chromosomal region around the deletion to one of her children (although f2-III:3 may have inherited her mother’s paternal chromosome, but a crossover occurred in the region around c.234delC where f2-III:3 is homozygous). However, for both sisters, the maternal chromosome carrying the deletion (light blue) was transmitted to an offspring at regions on both sides of MYL4 (to f2-III:2 and f2-III:4) such that, unless a double crossover occurred around MYL4, they both carry c.234delC on their maternal chromosome.
- Supplementary Figure 8: The families of the BVVL cases. (53 KB)
Shown are birth years and genotypes at the SLC52A2 mutation, where W denotes the wild-type allele and M denotes the mutated allele. Symbols for cases are colored black, and the symbols corresponding to the two siblings of case 4 who died early are colored gray. A forward slash indicates that the individual is deceased, and a backward slash indicates that an SLC52A2 genotype is not available for that individual.
- Supplementary Figure 9: The effect of the filtering steps on the number of sequence variants that are candidates for causing BVVL syndrome in the two sisters. (111 KB)
The occurrence of a rare syndrome such as BVVL in two sisters suggests that it is caused by a rare genotype with high penetrance. We therefore restricted our search to LoF and MODERATE-impact variants. The sisters are affected but neither parent is, which suggests an autosomal recessive mode of inheritance. Allelic frequency over 2% would dictate a homozygous frequency of over 1 in 2,500, which would be too high for BVVL syndrome. This brought the number of potential variants down to 3 from 147. This would not have been possible using non-Icelandic resources such as ESP, as 4 of the 147 variants are not present in the database. We note that crude filtering, such as removing all variants present in public databases, would result in removing the causative sequence variant. This left us with three correlated MODERATE-impact variants on chromosome 8q24.3: p.Leu339Pro (rs148234606) in SLC52A2, p.Gln931Arg in OPLAH and c.2982C>T in CPSF1. No one was imputed to be homozygous for the SLC52A2 variant in the set of additional chip-typed Icelanders, whereas 5 and 19 Icelanders were imputed to be homozygous for the OPLAH and CPSF1 SNPs, respectively. No early deaths were reported among these homozygous carriers, and the oldest homozygous carriers reached ages 77 and 89 years for OPLAH and CPSF1, respectively, which is inconsistent with diagnosis of BVVL, as only 1 of 77 reported BVVL syndrome cases has lived past 60 years (refs. 36–44).
- Supplementary Text and Figures (1,719 KB)
Supplementary Figures 1–9, Supplementary Tables 1–15 and Supplementary Note.
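Two of the thresholds quoted in the supplementary legends can be checked with simple arithmetic. The sketch below is illustrative only and is not the authors' pipeline: the function names are hypothetical, and the homozygote calculation assumes Hardy-Weinberg proportions (random mating), which is an approximation given the excess of homozygosity reported for Iceland.

```python
# Illustrative arithmetic only; function names are hypothetical and
# Hardy-Weinberg proportions are an assumption, not a claim from the source.

def expected_homozygote_frequency(allele_frequency: float) -> float:
    """Expected homozygote frequency q^2 under Hardy-Weinberg proportions."""
    return allele_frequency ** 2


def allele_frequency_from_count(allele_count: int, n_individuals: int) -> float:
    """Frequency of an allele seen `allele_count` times among 2N chromosomes."""
    return allele_count / (2 * n_individuals)


# Supplementary Figure 9: a 2% allele frequency implies about 1 homozygote
# in 2,500 individuals.
hom_freq = expected_homozygote_frequency(0.02)
one_in = round(1 / hom_freq)

# Supplementary Figure 4: a minor allele seen six times in 2,636 sequenced
# individuals (5,272 chromosomes) corresponds to a MAF of about 0.11%.
maf_six = allele_frequency_from_count(6, 2636)

print(f"Homozygote frequency at 2% allele frequency: 1 in {one_in}")
print(f"MAF for six observed alleles: {maf_six:.2%}")
```

Any allele more common than 2% would therefore produce homozygotes more often than 1 in 2,500 under these assumptions, which is the basis for the frequency filter described for Supplementary Figure 9.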