Nature Genetics
38, 1413-1418 (2006)
Published online: 22 November 2006 | doi:10.1038/ng1921
Genome assembly comparison identifies structural variants in the human genome
Razi Khaja1, Junjun Zhang1, Jeffrey R MacDonald1, Yongshu He1, Ann M Joseph-George1, John Wei1, Muhammad A Rafiq1,2, Cheng Qian1, Mary Shago1, Lorena Pantano3, Hiroyuki Aburatani4, Keith Jones5, Richard Redon6, Matthew Hurles6, Lluis Armengol3, Xavier Estivill3,7, Richard J Mural8, Charles Lee9, Stephen W Scherer1 & Lars Feuk1
1 Program in Genetics and Genomic Biology, The Hospital for Sick Children and Department of Molecular and Medical Genetics, University of Toronto and The Centre for Applied Genomics, MaRS Centre, Toronto, Ontario, M5G 1L7, Canada.
2 Department of Biosciences, Commission on Science and Technology for Sustainable Development in the South Institute of Information Technology (CIIT), Islamabad-44000, Pakistan.
3 Genes and Disease Program, Center for Genomic Regulation, Charles Darwin s/n, Barcelona Biomedical Research Park, 08003 Barcelona, Catalonia, Spain.
4 Genome Science Laboratory, Research Center for Advanced Science and Technology, University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo 153-8904, Japan.
5 Affymetrix, Inc., Santa Clara, California 95051, USA.
6 The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
7 Pompeu Fabra University, Charles Darwin s/n, and National Genotyping Centre, Passeig Marítim 37-49, Barcelona Biomedical Research Park, Barcelona, Catalonia, Spain.
8 Windber Research Institute, 620 7th Street, Windber, Pennsylvania 15963-1331, USA.
9 Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, 20 Shattuck St., Boston, Massachusetts 02115, USA.
Correspondence should be addressed to steve@genet.sickkids.on.ca
Numerous types of DNA variation exist, ranging from SNPs to larger structural alterations such as copy number variants (CNVs) and inversions. Alignment of DNA sequence from different sources has been used to identify SNPs (refs. 1,2) and intermediate-sized variants (ISVs; ref. 3). However, only a small proportion of total heterogeneity is characterized, and little is known of the characteristics of most smaller-sized (<50 kb) variants. Here we show that genome assembly comparison is a robust approach for identification of all classes of genetic variation. Through comparison of two human assemblies (Celera's R27c compilation and the Build 35 reference sequence), we identified megabases of sequence (in the form of 13,534 putative non-SNP events) that were absent, inverted or polymorphic in one assembly. Database comparison and laboratory experimentation further demonstrated overlap or validation for 240 variable regions and confirmed >1.5 million SNPs. Some differences were simple insertions and deletions, but in regions containing CNVs, segmental duplication and repetitive DNA, they were more complex. Our results uncover substantial undescribed variation in humans, highlighting the need for comprehensive annotation strategies to fully interpret genome scanning and personalized sequencing projects.