Identification of genetic variants using bar-coded multiplexed sequencing

Abstract

We developed a generalized framework for multiplexed resequencing of targeted human genome regions on the Illumina Genome Analyzer using degenerate indexed DNA bar codes ligated to fragmented DNA before sequencing. Using this method, we simultaneously sequenced the DNA of multiple HapMap individuals at several Encyclopedia of DNA Elements (ENCODE) regions. We then evaluated the use of Bayes factors for discovering and genotyping polymorphisms. For polymorphisms that were either previously identified within the Single Nucleotide Polymorphism database (dbSNP) or visually evident upon re-inspection of archived ENCODE traces, we observed a false positive rate of 11.3% using strict thresholds for predicting variants and 69.6% for lax thresholds. Conversely, false negative rates were 10.8–90.8%, with false negatives at stricter cut-offs occurring at lower coverage (<10 aligned reads). These results suggest that >90% of genetic variants are discoverable using multiplexed sequencing provided sufficient coverage at the polymorphic base.
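The abstract does not state the likelihood model behind its Bayes factors, so the following is only a hypothetical illustration and not the authors' method. It computes a Bayes factor for "variant present" versus "homozygous reference" at a single base from the number of aligned reads `n` and alternate-allele reads `k`. It assumes a simple binomial model, a fixed per-base sequencing error rate (1% here) and equal prior weight on heterozygous and homozygous-alternate genotypes. The function names and the calling threshold `min_log10_bf` are illustrative assumptions.

```python
import math
from typing import List, Optional


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_lik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood of k alternate reads among n at allele fraction p.

    The binomial coefficient is omitted because it cancels in the Bayes factor.
    """
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def log10_bayes_factor(k: int, n: int, error_rate: float = 0.01) -> float:
    """Toy model: log10 Bayes factor for 'variant present' vs 'homozygous reference'."""
    ref = log_lik(k, n, error_rate)            # alt reads explained by sequencing error
    het = log_lik(k, n, 0.5)                   # heterozygous: half the reads carry the alt allele
    hom_alt = log_lik(k, n, 1.0 - error_rate)  # homozygous alternate
    # Assumption: equal prior weight on the two variant genotypes.
    variant = log_sum_exp([het, hom_alt]) - math.log(2.0)
    return (variant - ref) / math.log(10.0)


def call_variant(k: int, n: int, error_rate: float = 0.01,
                 min_log10_bf: float = 2.0) -> Optional[str]:
    """Return 'het' or 'hom_alt' if evidence passes the hypothetical threshold, else None."""
    if log10_bayes_factor(k, n, error_rate) < min_log10_bf:
        return None
    het = log_lik(k, n, 0.5)
    hom_alt = log_lik(k, n, 1.0 - error_rate)
    return "het" if het >= hom_alt else "hom_alt"


if __name__ == "__main__":
    # Same 50% alternate-allele fraction at increasing depth.
    for depth in (2, 4, 8, 16, 32):
        k = depth // 2
        print(depth, round(log10_bayes_factor(k, depth), 2), call_variant(k, depth))
```

Under this toy model the log10 Bayes factor for a balanced heterozygote grows roughly linearly with depth, and one alternate read out of two stays below the hypothetical threshold. That mirrors, in a simplified form, the abstract's observation that false negatives at strict cut-offs concentrated at low coverage.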


Figure 1
Figure 2: Comparison of index performance.
Figure 3: Relationship between mean and local coverage.
Figure 4: Discovery of variant bases by simultaneous analysis of all individuals.
Figure 5: Relationship between base-level coverage and Bayes factor for polymorphism discovery and variant genotyping.
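Figure 2 compares index performance, but this page gives neither the bar-code sequences nor the rule used to assign pooled reads to individuals. The sketch below is therefore hypothetical. It assumes each read begins with a fixed-length bar code, assigns the read to the sample whose bar code is uniquely closest in Hamming distance (within a small mismatch budget) and trims the bar code off. The bar codes, sample labels and `max_mismatches` value are invented for illustration, and the degenerate bases mentioned in the abstract are not modeled.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read: str, barcodes: Dict[str, str],
                max_mismatches: int = 1) -> Tuple[Optional[str], str]:
    """Match the read's leading bases to a bar code.

    Returns (sample, read_without_barcode), or (None, read) when the prefix is
    too short, too far from every bar code, or equally close to two bar codes.
    """
    if not barcodes:
        raise ValueError("barcodes must not be empty")
    length = len(next(iter(barcodes)))  # assumes all bar codes share one length
    prefix = read[:length]
    if len(prefix) < length:
        return None, read
    ranked = sorted((hamming(prefix, bc), bc) for bc in barcodes)
    best_dist, best_bc = ranked[0]
    if best_dist > max_mismatches:
        return None, read
    if len(ranked) > 1 and ranked[1][0] == best_dist:
        return None, read  # ambiguous: tie between two bar codes
    return barcodes[best_bc], read[length:]


def demultiplex(reads: List[str], barcodes: Dict[str, str],
                max_mismatches: int = 1) -> Dict[str, List[str]]:
    """Split pooled reads into per-sample bins plus an 'unassigned' bin."""
    bins: Dict[str, List[str]] = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        sample, trimmed = assign_read(read, barcodes, max_mismatches)
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(trimmed)
    return bins


if __name__ == "__main__":
    # Hypothetical 4-base bar codes; every pair differs at all four positions.
    codes = {"AACC": "sample_A", "GGTT": "sample_B", "CTAG": "sample_C"}
    pooled = ["AACCGATTACA", "GGTACCCC", "TTTTGGGG", "AC"]
    print(demultiplex(pooled, codes))
```

Requiring a unique best match within a small mismatch budget discards some reads but reduces reads credited to the wrong individual. If every pair of bar codes differs at no fewer than d positions, up to floor((d - 1) / 2) mismatches can be corrected unambiguously.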

Accession codes

GenBank/EMBL/DDBJ

Acknowledgements

We acknowledge funding from the state of Arizona, the US National Heart, Lung, and Blood Institute (U01 HL086528), the Stardust Foundation, Science Foundation Arizona and the National Institute of Neurological Disorders and Stroke (R01 NS059873).

Author information

Contributions

D.W.C., J.V.P., M.J.H., G.N. and D.A.S. contributed to initial experimental design. S.S., A.S., M.R., J.J.C., T.L. and T.L.P. contributed to development and execution of exact experimental protocols. J.V.P., D.W.C. and N.H. contributed to the development of bioinformatics and analysis pipelines.

Corresponding author

Correspondence to David W Craig.

Ethics declarations

Competing interests

G.N. is an employee of Illumina.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–2, Supplementary Tables 1–5, Supplementary Methods (PDF 434 kb)

About this article

Cite this article

Craig, D., Pearson, J., Szelinger, S. et al. Identification of genetic variants using bar-coded multiplexed sequencing. Nat Methods 5, 887–893 (2008). https://doi.org/10.1038/nmeth.1251
