Abstract
Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring families and constructed a haplotype map of 20.4 million single-nucleotide variants and 1.2 million insertions and deletions. The intermediate coverage (∼13×) and trio design enabled extensive characterization of structural variation, including midsize events (30–500 bp) previously poorly catalogued and de novo mutations. We demonstrate that the quality of the haplotypes boosts imputation accuracy in independent samples, especially for lower frequency alleles. Population genetic analyses demonstrate fine-scale structure across the country and support multiple ancient migrations, consistent with historical changes in sea level and flooding. The GoNL Project illustrates how single-population whole-genome sequencing can provide detailed characterization of genetic variation and may guide the design of future population studies.
Acknowledgements
We wish to dedicate this work to the memory of David R. Cox, an enthusiastic supporter of human genetic research in the Netherlands for many years. The GoNL Project is funded by the BBMRI-NL, a research infrastructure financed by the Netherlands Organization for Scientific Research (NWO project 184.021.007). We acknowledge additional financial support from eBioGrid, CTMM/TraIT, the Ubbo Emmius Fund, the Netherlands Bioinformatics Center (NBIC) and EU-BioSHARE. We thank the individual participants of the biobanks; M. DePristo, E. Banks, R. Poplin and G. del Angel from the Broad Institute for expert advice on setting up our alignment and calling pipeline; K. Garimella for the initial implementation of PhaseByTransmission; G. Strikwerda, W. Albers, R. Teeninga, H. Gankema and H. Wind of the Groningen Center for Information Technology (see URLs) for support of the compute cluster and Target storage; E. Valentyn and R. Williams of Target (see URLs) for hosting project data on IBM GPFS storage; T. Visser and I. Nooren of BiG Grid (see URLs) and SURFsara for providing backup storage, additional computing capacity and expert advice; the team from MOLGENIS (see URLs) for software development support; H. Lauvenberg for handling data access requests; K. Zych for design of the GoNL logo; L. Franke, H.-J. Westra and J. Gutierrez-Achury for useful discussions; and S. Raychaudhuri and B. Neale for their critical reading of the manuscript. Target is supported by Samenwerkingsverband Noord Nederland, the European Fund for Regional Development, the Dutch Ministry of Economic Affairs, Pieken in de Delta and the provinces of Groningen and Drenthe. Target operates under the auspices of Sensor Universe. BiG Grid and the Life Science Grid are financially supported by the Netherlands Organization for Scientific Research (NWO). A.A. is funded by the Center for Medical Systems Biology-2, and D.I.B. is funded by the European Research Council (ERC 230374). A.S. and P.I.W.d.B. are recipients of VIDI awards (NWO projects 016.138.318 and 016.126.354, respectively).
Author information
Contributions
P.I.W.d.B., D.I.B., J.A.B., C.M.v.D., G.-J.B.v.O., P.E.S., M.A.S. and C.W. (chair) formed the steering committee of the GoNL Project. Biobanks are managed and organized by A.H., A.G.U., C.M.v.D., B.O., F.R., A.I. (for the Rotterdam and Erasmus Rucphen Family studies), D.I.B., G.W. (for the Netherlands Twin Register), P.E.S., M.B., A.J.M.d.C., H.E.D.S. (for the Leiden Longevity Study) and the members of the LifeLines Cohort Study. P.I.W.d.B. and M.A.S. jointly led the analysis group. Sequencing data were generated at BGI (Shenzhen, China) by Q.L., Y.L., Y.D., R.C., H.C., N.L., S.C. and J.W. Additional Complete Genomics sequencing data were generated by S.J.P., S.P., P.S. and D.R.C. through a partnership with Pfizer. F.v.D., P.B.T.N., P.D., L.C.F., A.K., M.D., H.B., K.J.v.d.V. and M.A.S. formed the operational data stewardship and processing center. P.B.T.N., F.v.D. and M.A.S. designed and implemented the compute cluster. M.D., H.B., A.K. and M.A.S. designed and implemented the MOLGENIS computing platform to scale up analysis pipelines for alignment, variant calling and imputation. F.v.D. and L.C.F. performed alignment with help from I.J.N., J.B. and B.D.C.v.S. L.C.F. and F.v.D. called SNVs. L.C.F., S.L.P., A.M., E.M.v.L., L.C.K., M. Sohail, A.A. and M.V. performed quality control. V.G., K.Y., L.C.F., T.M., A.S., R.E.H., S.A.M., W.P.K., F.H., J.Y.H.-K., E.-W.L., A.A., V.K., H.M., M.H.M. and J.B. formed the structural variation subgroup. L.C.F. developed the PhaseByTransmission module in GATK and performed de novo mutation analyses with P.P. A.M. performed haplotype phasing and imputation benchmarks. J.H.V. and L.H.v.d.B. provided Complete Genomics data for imputation benchmarking. W.P.K. and I.R. performed variant validation. C.W. and M.P. generated Immunochip data on all GoNL samples. S.L.P., C.C.E., A.M., P.F.P., I.P., A.A., N.A., M. Sohail, D.V. and S.R.S. performed population genetic analyses. M.v.O., M.V., M.L., J.F.J.L., M. Stoneking, P.d.K. and M. Kayser performed mitochondrial DNA analysis. P.D., A.M., A.K., E.M.v.L., L.C.K., K.E., C.M.-G., J.v.S., M. Kattenberg, J.J.H. and D.v.E. formed the imputation subgroup. P.B.T.N., K.J.v.d.V. and M.A.S. were responsible for the GoNL website and associated services (see URLs). C.W. conceived the GoNL Project. P.I.W.d.B. wrote the initial manuscript with critical input from L.C.F., A.M., S.L.P., P.F.P. and C.C.E. C.W., D.I.B., G.-J.B.v.O., L.C.K., A.A., M.A.S., P.E.S., S.R.S., J.Y.H.-K., I.P., J.H.V., P.d.K., W.P.K., T.M., A.S., V.G., J.T.d.D. and M. Kayser provided critical feedback on the manuscript. All authors have seen and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–22, Supplementary Tables 1–17 and Supplementary Note (PDF 5315 kb)
About this article
Cite this article
The Genome of the Netherlands Consortium. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet 46, 818–825 (2014). https://doi.org/10.1038/ng.3021
DOI: https://doi.org/10.1038/ng.3021