Genome-wide in situ exon capture for selective resequencing

Abstract

Increasingly powerful sequencing technologies are ushering in an era of personal genome sequences and raising the possibility of using such information to guide medical decisions. Genome resequencing also promises to accelerate the identification of disease-associated mutations. Roughly 98% of the human genome is composed of repeats and intergenic or non–protein-coding sequences, so it is crucial to focus resequencing on high-value genomic regions. Protein-coding exons represent one such type of high-value target. We have developed a method that uses flexible, high-density microarrays to capture any desired fraction of the human genome, in this case corresponding to more than 200,000 protein-coding exons. Depending on the precise protocol, the fraction of captured fragments associated with targeted regions reaches 55–85%, and up to 98% of intended exons can be recovered. This methodology provides an adaptable route toward rapid and efficient resequencing of any sizeable, non-repeat portion of the human genome.
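The two headline figures above (the fraction of captured fragments associated with targeted regions, and the fraction of intended exons recovered) can be illustrated with a short Python sketch. It works from mapped read coordinates and a list of exon intervals. This is not the authors' analysis pipeline. It makes several assumptions: all coordinates are half-open `[start, end)` intervals on a single chromosome, a read counts as "associated" with a target if it overlaps an exon padded by an optional `flank`, and an exon counts as "recovered" if at least `min_depth` reads overlap it. The function names `merge_intervals`, `on_target_fraction` and `exon_recovery` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# Assumption: half-open [start, end) coordinates on a single chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def on_target_fraction(reads: List[Interval], exons: List[Interval], flank: int = 0) -> float:
    """Fraction of reads overlapping any exon padded by `flank` bp on each side."""
    if not reads:
        return 0.0
    targets = merge_intervals([(max(0, s - flank), e + flank) for s, e in exons])
    starts = [s for s, _ in targets]
    hits = 0
    for r_start, r_end in reads:
        # Check the last merged target that starts before the read ends.
        # Earlier targets end even sooner, so if this one misses the read, all do.
        i = bisect_right(starts, r_end - 1) - 1
        if i >= 0 and targets[i][1] > r_start:
            hits += 1
    return hits / len(reads)


def exon_recovery(reads: List[Interval], exons: List[Interval], min_depth: int = 1) -> float:
    """Fraction of exons overlapped by at least `min_depth` reads (simple O(reads x exons) scan)."""
    if not exons:
        return 0.0
    recovered = 0
    for e_start, e_end in exons:
        depth = sum(1 for r_start, r_end in reads if r_start < e_end and e_start < r_end)
        if depth >= min_depth:
            recovered += 1
    return recovered / len(exons)


if __name__ == "__main__":
    # Hypothetical toy data: three exons and five 36-bp reads.
    exons = [(100, 200), (500, 650), (1000, 1120)]
    reads = [(90, 126), (150, 186), (520, 556), (300, 336), (2000, 2036)]
    print(on_target_fraction(reads, exons))             # 0.6
    print(on_target_fraction(reads, exons, flank=200))  # 0.8
    print(exon_recovery(reads, exons))                  # 0.6666666666666666
```

Varying `flank` shows how sensitive the on-target figure is to the distance cutoff used to call a read "associated" with an exon. That relationship is the kind a read-exon distance distribution summarizes. The exon-recovery scan is quadratic for clarity. A genome-scale version would use sorted or indexed intervals instead.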

Figure 1: Array-based exon selection scheme followed by Illumina 1G sequencing.
Figure 2: Read-exon distance distributions.
Figure 3: Pairwise comparisons of exon-capture specificity.
Figure 4: Exon coverage versus Illumina 1G read depth.
Figure 5: Effect of a variant input DNA preparation on capture efficiency and read depth.

Acknowledgements

The authors thank M.Q. Zhang and A. Smith for their help with read mapping and analysis, J. Silva for providing the MCF10A cell line DNA, and M. Rooks, S. McCarthy and members of the McCombie and Hannon laboratories for helpful discussion. G.J.H. is an Investigator of the Howard Hughes Medical Institute and is supported in part by a kind gift from Kathryn W. Davis and major support from the Stanley Foundation. This work and the purchase of instrumentation were supported in part by grants from the US National Science Foundation and National Institutes of Health (M.Q. Zhang, G.J.H. and W.R.M.).

Author information

Correspondence to Gregory J Hannon or W Richard McCombie.

Ethics declarations

Competing interests

T.J.A., M.N.M., S.W.S., C.M.M. and M.J.R. are employees of Nimblegen, Inc.

Supplementary information

Supplementary Text and Figures

Supplementary Figure 1 and Supplementary Table 1 (PDF 93 kb)