Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells

Journal name:
Nature
Volume:
487,
Pages:
190–195
DOI:
doi:10.1038/nature11236

Abstract

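Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and cannot cost-effectively describe the context in which genome variants co-occur (haplotypes). Here we describe long fragment read (LFR) technology, a low-cost DNA sequencing and haplotyping process. The method is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only about 100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removing false-positive single nucleotide variants that were not phased by multiple LFR haplotypes gave a final genome error rate of about 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10–20 human cells, as demonstrated here, will enable comprehensive genetic studies and a variety of clinical applications.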
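Why so little DNA still allows phasing can be checked with back-of-envelope arithmetic. The Python sketch below is illustrative only: it assumes roughly 3.3 pg of DNA per haploid human genome (about 6.6 pg per diploid cell), uses the 384 wells described in the Figure 1 legend, and treats every copy of a locus as landing in a well independently at random (a Poisson approximation). The function name `lfr_dilution` is hypothetical.

```python
import math

# Assumption (not stated in the paper): a haploid human genome weighs ~3.3 pg,
# so a diploid cell contains ~6.6 pg of DNA.
PG_PER_HAPLOID_GENOME = 3.3


def lfr_dilution(input_pg: float, n_wells: int = 384) -> dict:
    """Back-of-envelope statistics for splitting input DNA across LFR wells.

    Models every copy of a locus as landing in a well independently and
    uniformly at random (Poisson approximation).
    """
    haploid_copies = input_pg / PG_PER_HAPLOID_GENOME
    cells = haploid_copies / 2
    # Expected haploid copies of any given locus per well, which is also the
    # expected fraction of one haploid genome present in a well.
    copies_per_well = haploid_copies / n_wells
    # Half of those copies come from each parental haplotype.
    per_parent_per_well = copies_per_well / 2
    # Chance that a well holding the maternal copy of a locus also receives
    # at least one paternal copy of the same locus.
    p_both_parents = 1 - math.exp(-per_parent_per_well)
    return {
        "cells": cells,
        "copies_per_well": copies_per_well,
        "p_both_parents": p_both_parents,
    }


for pg in (100, 130):
    s = lfr_dilution(pg)
    print(f"{pg} pg ~ {s['cells']:.1f} cells; "
          f"{s['copies_per_well']:.3f} haploid genomes per well; "
          f"P(other parental copy in same well) = {s['p_both_parents']:.3f}")
```

Under these assumptions, 100–130 pg corresponds to roughly 15–20 diploid cells, each well receives only about 8–10% of a haploid genome, and the other parental copy of a locus shares the same well only about 4–5% of the time. That low collision rate is what lets co-occurrence within a well stand in for "on the same molecule".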


Figures

    Figure 1: The LFR technology.

    An overview of the LFR technology and controlled random enzymatic fragmenting is shown. (i) First, 100–130 pg of high molecular mass (HMM) DNA is physically separated into 384 distinct wells; (ii) through several steps, all within the same well without intervening purifications, the genomic DNA is amplified, fragmented and ligated to unique barcode adapters; (iii) all 384 wells are combined, purified and introduced into the sequencing platform of Complete Genomics (ref. 10); (iv) mate-paired reads are mapped to the genome using a custom alignment program and barcode sequences are used to group tags into haplotype contigs; and (v) the final result is a diploid genome sequence.
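Step (iv) relies on assigning each read back to its well from the barcode sequence, even when the barcode itself contains sequencing errors. The Python sketch below shows that idea with plain minimum-Hamming-distance decoding against a codebook. This is a swapped-in, simpler technique: it is not the Reed–Solomon decoder the Figure 2 legend refers to, and the codebook, the `max_errors` threshold and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def decode_barcode(tag: str, codebook: Dict[str, int],
                   max_errors: int = 1) -> Optional[int]:
    """Map a sequenced tag to a well index by minimum-Hamming-distance decoding.

    Returns None if no codeword lies within max_errors or if the nearest
    codeword is not unique (an ambiguous tag is safer to discard).
    """
    best_well: Optional[int] = None
    best_dist = max_errors + 1
    ambiguous = False
    for codeword, well in codebook.items():
        if len(codeword) != len(tag):
            continue
        d = hamming(tag, codeword)
        if d < best_dist:
            best_well, best_dist, ambiguous = well, d, False
        elif d == best_dist and best_well is not None:
            ambiguous = True
    return None if ambiguous else best_well


def group_reads_by_well(reads: Iterable[Tuple[str, str]],
                        codebook: Dict[str, int],
                        max_errors: int = 1) -> Dict[int, List[str]]:
    """Group (tag, read) pairs by decoded well, dropping undecodable tags."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for tag, read in reads:
        well = decode_barcode(tag, codebook, max_errors)
        if well is not None:
            groups[well].append(read)
    return dict(groups)


# Hypothetical 10-base codebook; real LFR adapters carry one 10-base
# Reed-Solomon codeword per well (384 in total).
codebook = {"AAAAAAAAAA": 0, "CCCCCCCCCC": 1, "GGGGGGGGGG": 2, "TTTTTTTTTT": 3}
reads = [("AAAAAAAAAT", "read1"), ("CCCCCCCCCC", "read2"), ("GGGGGTTTTT", "read3")]
print(group_reads_by_well(reads, codebook))
# prints {0: ['read1'], 1: ['read2']}
```

A single-error tag (`read1`) is corrected, while a tag equidistant from two codewords (`read3`) is discarded rather than guessed. A structured code such as Reed–Solomon achieves the same goal without scanning the whole codebook.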

    Figure 2: LFR haplotyping algorithm.

    a, Variation extraction. Variations are extracted from the aliquot tagged reads. The 10-base Reed–Solomon codes enable tag recovery by error correction. M denotes the number of genomic reads in the set (approximately 8 billion); N denotes the number of the candidate heterozygous loci in the genome (~3 million). b, Heterozygous SNP pair connectivity evaluation. The matrix of shared aliquots is computed for each heterozygous SNP pair within a certain neighbourhood. Loop 1 is over all the heterozygous SNPs. Loop 2 is over all the heterozygous SNPs on the chromosome that are in the neighbourhood of the heterozygous SNPs in loop 1 (K). This neighbourhood is constrained by the expected number of heterozygous SNPs and the expected fragment lengths. c, Graph generation. An undirected graph is made, with nodes corresponding to the heterozygous SNPs and the connections corresponding to the orientation and the strength of the best hypothesis for the relationship between those SNPs. The orientation is binary and is shown in the figure with a colour. Red and green depict a flipped and unflipped relationship between heterozygous SNP pairs, respectively. The strength is defined by using fuzzy logic operations on the elements of the shared aliquot matrix. d, Graph optimization. The graph is optimized by a minimum spanning tree operation. e, Contig generation. Each sub-tree is reduced to a contig by keeping the first heterozygous SNP unchanged, and flipping or not flipping the other heterozygous SNPs on the sub-tree, based on their paths to the first heterozygous SNP. The designation of parent 1 (P1) and parent 2 (P2) to each contig is arbitrary. The gaps in the chromosome-wide tree define the boundaries for different sub-trees/contigs on that chromosome. f, Optional mapping of LFR contigs to parental chromosomes. Using parental information, a ‘mother’ or ‘father’ label is placed on the P1 and P2 haplotypes of each contig.
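Panels b–e describe a pipeline that can be made concrete in a few dozen lines. The Python sketch below is a minimal illustration, not the authors' implementation, and rests on several assumptions: each heterozygous SNP is summarized as two sets of well IDs (one per allele), SNPs are sorted by position, a fixed `max_span` window (hypothetical, 10 kb) stands in for the neighbourhood set by expected fragment length, and the legend's unspecified fuzzy-logic strength is replaced by a hypothetical min/max score. Edges are given a cost equal to the negative strength, so that a minimum spanning tree (built here with Kruskal's algorithm) keeps the strongest links.

```python
from collections import defaultdict, deque
from typing import Dict, List, Sequence, Set, Tuple

# wells[k][a] = set of well (aliquot) IDs in which allele a (0 or 1) of
# heterozygous SNP k was observed. SNPs are assumed sorted by position.
Wells = Sequence[Tuple[Set[int], Set[int]]]


def shared_aliquot_matrix(a: Tuple[Set[int], Set[int]],
                          b: Tuple[Set[int], Set[int]]) -> List[List[int]]:
    """2x2 counts of wells in which allele i of SNP A co-occurs with allele j of SNP B."""
    return [[len(a[i] & b[j]) for j in (0, 1)] for i in (0, 1)]


def orientation_and_strength(m: List[List[int]]) -> Tuple[bool, int]:
    """Hypothetical fuzzy-style score: support = min (AND), evidence against = max."""
    unflipped = min(m[0][0], m[1][1]) - max(m[0][1], m[1][0])
    flipped = min(m[0][1], m[1][0]) - max(m[0][0], m[1][1])
    return flipped > unflipped, max(unflipped, flipped)


def find(parent: List[int], x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def phase_chromosome(positions: Sequence[int], wells: Wells, max_span: int = 10_000,
                     min_strength: int = 1) -> List[List[Tuple[int, int]]]:
    """Return contigs as sorted (snp_index, phase) lists; phase 0 puts allele 0 on P1."""
    n = len(positions)
    # (b, c) Score SNP pairs inside a positional neighbourhood; keep confident edges.
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if positions[j] - positions[i] > max_span:
                break  # positions are sorted, so later SNPs are even farther away
            flipped, strength = orientation_and_strength(
                shared_aliquot_matrix(wells[i], wells[j]))
            if strength >= min_strength:
                edges.append((-strength, i, j, flipped))  # cost = -strength
    # (d) Kruskal's algorithm: minimum spanning forest over cost, i.e. strongest links.
    parent = list(range(n))
    tree: Dict[int, List[Tuple[int, bool]]] = defaultdict(list)
    for _, i, j, flipped in sorted(edges):
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            tree[i].append((j, flipped))
            tree[j].append((i, flipped))
    # (e) Each sub-tree becomes a contig: its first SNP keeps phase 0 and every other
    # SNP's phase is the XOR of the flip flags on its path back to that SNP.
    phase: Dict[int, int] = {}
    contigs = []
    for start in range(n):
        if start in phase:
            continue
        phase[start] = 0
        members, queue = [start], deque([start])
        while queue:
            u = queue.popleft()
            for v, flipped in tree[u]:
                if v not in phase:
                    phase[v] = phase[u] ^ int(flipped)
                    members.append(v)
                    queue.append(v)
        contigs.append(sorted((k, phase[k]) for k in members))
    return contigs


# Toy data: one parent's molecules landed in wells 1-3, the other's in wells 4-6.
# SNP 1 carries its alleles in the opposite orientation; SNP 3 is too far away to link.
positions = [100, 200, 300, 50_000]
wells = [({1, 2, 3}, {4, 5, 6}), ({4, 5, 6}, {1, 2, 3}),
         ({1, 2, 3}, {4, 5, 6}), ({7}, {8})]
print(phase_chromosome(positions, wells))
# prints [[(0, 0), (1, 1), (2, 0)], [(3, 0)]]
```

The unlinked SNP ends up as its own contig, mirroring how gaps in the chromosome-wide tree define contig boundaries. Panel f is not sketched, since assigning P1 and P2 to mother and father requires parental genotypes.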

Accession codes

Primary accessions

Sequence Read Archive: SRP012316

References

  1. Human genome: Genomes by the thousand. Nature 467, 1026–1027 (2010)
  2. Wheeler, D. A. et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 452, 872–876 (2008)
  3. Wang, J. et al. The diploid genome sequence of an Asian individual. Nature 456, 60–65 (2008)
  4. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008)
  5. Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008)
  6. Ahn, S. M. et al. The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Res. 19, 1622–1629 (2009)
  7. Kim, J. I. et al. A highly annotated whole-genome sequence of a Korean individual. Nature 460, 1011–1015 (2009)
  8. McKernan, K. J. et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res. 19, 1527–1541 (2009)
  9. Pushkarev, D., Neff, N. F. & Quake, S. R. Single-molecule sequencing of an individual human genome. Nature Biotechnol. 27, 847–850 (2009)
  10. Drmanac, R. et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327, 78–81 (2010)
  11. Kitzman, J. O. et al. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nature Biotechnol. 29, 59–63 (2011)
  12. Rothberg, J. M. et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 475, 348–352 (2011)
  13. Suk, E. K. et al. A comprehensively molecular haplotype-resolved genome of a European individual. Genome Res. 21, 1672–1685 (2011)
  14. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001)
  15. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001)
  16. Tewhey, R., Bansal, V., Torkamani, A., Topol, E. J. & Schork, N. J. The importance of phase information for human genomics. Nature Rev. Genet. 12, 215–223 (2011)
  17. Browning, S. R. & Browning, B. L. Haplotype phasing: existing methods and new developments. Nature Rev. Genet. 12, 703–714 (2011)
  18. Roach, J. C. et al. Chromosomal haplotypes by genetic phasing of human families. Am. J. Hum. Genet. 89, 382–397 (2011)
  19. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007)
  20. Duitama, J. et al. Fosmid-based whole genome haplotyping of a HapMap trio child: evaluation of Single Individual Haplotyping techniques. Nucleic Acids Res. 40, 2041–2053 (2012)
  21. Zhang, K. et al. Long-range polony haplotyping of individual human chromosome molecules. Nature Genet. 38, 382–387 (2006)
  22. Ma, L. et al. Direct determination of molecular haplotypes by chromosome microdissection. Nature Methods 7, 299–301 (2010)
  23. Fan, H. C., Wang, J., Potanina, A. & Quake, S. R. Whole-genome molecular haplotyping of single cells. Nature Biotechnol. 29, 51–57 (2011)
  24. Yang, H., Chen, X. & Wong, W. H. Completely phased genome sequencing through chromosome sorting. Proc. Natl Acad. Sci. USA 108, 12–17 (2011)
  25. Drmanac, R. Nucleic acid analysis by random mixtures of non-overlapping fragments. US patent 7,901,891 (2006)
  26. Dean, F. B. et al. Comprehensive human genome amplification using multiple displacement amplification. Proc. Natl Acad. Sci. USA 99, 5261–5266 (2002)
  27. Kermani, B. G. & Shannon, K. W. Method and apparatus for quantification of DNA sequencing quality and construction of a characterizable model system using Reed–Solomon codes. US patent PCT/US2010/023083. (2010)
  28. The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005)
  29. Frazer, K. A. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007)
  30. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010)
  31. Carnevali, P. et al. Computational techniques for human genome resequencing using mated gapped reads. J. Comput. Biol. 19, 279–292 (2011)
  32. Conrad, D. F. et al. Variation in genome-wide mutation rates within and between human families. Nature Genet. 43, 712–714 (2011)
  33. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nature Methods 7, 248–249 (2010)
  34. MacArthur, D. G. et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823–828 (2012)
  35. Lohmueller, K. E. et al. Proportionally more deleterious genetic variation in European than in African populations. Nature 451, 994–997 (2008)
  36. Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W. W. & Lenhard, B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 32, D91–D94 (2004)
  37. Bryne, J. C. et al. JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res. 36, D102–D106 (2008)


Author information

  1. These authors contributed equally to this work.

    • Brock A. Peters &
    • Bahram G. Kermani

Affiliations

  1. Complete Genomics, Inc., 2071 Stierlin Court, Mountain View, California 94043, USA

    • Brock A. Peters,
    • Bahram G. Kermani,
    • Andrew B. Sparks,
    • Oleg Alferov,
    • Peter Hong,
    • Andrei Alexeev,
    • Yuan Jiang,
    • Fredrik Dahl,
    • Y. Tom Tang,
    • Juergen Haas,
    • Joseph E. Peterson,
    • Helena Perazich,
    • George Yeung,
    • Jia Liu,
    • Linsu Chen,
    • Michael I. Kennemer,
    • Kaliprasad Pothuraju,
    • Karel Konvicka,
    • Mike Tsoupko-Sitnikov,
    • Krishna P. Pant,
    • Jessica C. Ebert,
    • Geoffrey B. Nilsen,
    • Jonathan Baccash,
    • Aaron L. Halpern &
    • Radoje Drmanac
  2. Department of Genetics, Harvard Medical School, Cambridge, Massachusetts 02115, USA

    • Kimberly Robasky,
    • Alexander Wait Zaranek,
    • Je-Hyuk Lee,
    • Madeleine Price Ball &
    • George M. Church
  3. Program in Bioinformatics, Boston University, Boston, Massachusetts 02215, USA

    • Kimberly Robasky
  4. Wyss Institute for Biologically Inspired Engineering, Harvard Medical School, Cambridge, Massachusetts 02115, USA

    • Je-Hyuk Lee
  5. Present addresses: Aria Diagnostics, 5945 Optical Court, San Jose, California 95138, USA (A.B.S.); Halo Genomics, Dag Hammarskjölds väg 54A, 751 83 Uppsala, Sweden (F.D.).

    • Andrew B. Sparks &
    • Fredrik Dahl

Contributions

B.A.P., B.G.K., A.B.S. and R.D. conceived the study. B.A.P., B.G.K., R.D., O.A., Y.T.T., J.H., J.C.E., J.B., A.L.H. and G.B.N. performed analyses. B.A.P., A.B.S., P.H., A.A., Y.J., F.D., J.E.P., H.P., G.Y., J.L. and L.C. developed the laboratory processes and generated the LFR libraries. K.K., M.T.-S. and K.P.P. developed the basecaller and parts of the analysis pipeline. M.I.K. formatted, managed and uploaded data to the public archives. K.R., A.W.Z., J.-H.L., M.P.B. and G.M.C. generated and analysed the RNA sequencing data. B.A.P., B.G.K. and R.D. coordinated the study and wrote the paper. All authors contributed to revision and review of the manuscript.

Competing financial interests

Employees of Complete Genomics have stock options in the company; Complete Genomics has filed several patents on this work.

Corresponding authors

Correspondence to:

Tagged read data have been deposited in the NCBI Sequence Read Archive under accession number SRP012316. All sequence data and haplotype information for the LFR libraries generated in this study are also available at http://www.completegenomics.com/LFR.


Supplementary information

PDF files

  1. Supplementary Information (2.6M)

    This file contains Supplementary Figures 1–12, Supplementary Material with additional references, Supplementary Methods with additional Figures 1–14 and Supplementary Tables 1–13.
