Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and cannot cost-effectively describe the context (haplotypes) in which genome variants co-occur. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ~100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10–20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.
Supplementary Information
This file contains Supplementary Figures 1–12, Supplementary Material with additional references, Supplementary Methods with additional Figures 1–14 and Supplementary Tables 1–13.