Letters to Nature
Nature 424, 788-793 (14 August 2003) | doi:10.1038/nature01858; Received 11 April 2003; Accepted 16 June 2003
Comparative analyses of multi-species sequences from targeted genomic regions
J. W. Thomas1,11, J. W. Touchman1,2,11, R. W. Blakesley1,2, G. G. Bouffard1,2, S. M. Beckstrom-Sternberg1,2, E. H. Margulies1, M. Blanchette3, A. C. Siepel3, P. J. Thomas2, J. C. McDowell2, B. Maskeri2, N. F. Hansen2, M. S. Schwartz3, R. J. Weber3, W. J. Kent3, D. Karolchik3, T. C. Bruen3, R. Bevan3, D. J. Cutler4, S. Schwartz5, L. Elnitski5, J. R. Idol1, A. B. Prasad1, S.-Q. Lee-Lin1, V. V. B. Maduro1, T. J. Summers1, M. E. Portnoy1, N. L. Dietrich2, N. Akhter2, K. Ayele2, B. Benjamin2, K. Cariaga2, C. P. Brinkley2, S. Y. Brooks2, S. Granite2, X. Guan2, J. Gupta2, P. Haghighi2, S.-L. Ho2, M. C. Huang2, E. Karlins2, P. L. Laric2, R. Legaspi2, M. J. Lim2, Q. L. Maduro2, C. A. Masiello2, S. D. Mastrian2, J. C. McCloskey2, R. Pearson2, S. Stantripop2, E. E. Tiongson2, J. T. Tran2, C. Tsurgeon2, J. L. Vogt2, M. A. Walker2, K. D. Wetherby2, L. S. Wiggins2, A. C. Young2, L.-H. Zhang2, K. Osoegawa6, B. Zhu6, B. Zhao6, C. L. Shu6, P. J. De Jong6, C. E. Lawrence7, A. F. Smit8, A. Chakravarti4, D. Haussler3,9, P. Green10, W. Miller5 and E. D. Green1,2
1. Genome Technology Branch, National Human Genome Research Institute, and
2. NIH Intramural Sequencing Center, National Institutes of Health, Bethesda, Maryland 20892, USA
3. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
4. Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21287, USA
5. Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
6. Children's Hospital Oakland Research Institute, Oakland, California 94609, USA
7. The Wadsworth Center for Laboratories and Research, New York State Department of Health, Albany, New York 12201, USA
8. The Institute for Systems Biology, Seattle, Washington 98103, USA
9. Howard Hughes Medical Institute, University of California, Santa Cruz, California 95064, USA
10. Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
11. Present addresses: Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia 30322, USA (J.W.Th.); Translational Genomics Research Institute, Phoenix, Arizona 85004 and Department of Biology, Arizona State University, Tempe, Arizona 85287, USA (J.W.To.)
Correspondence to: E. D. Green1,2
Email: egreen@nhgri.nih.gov
GenBank accession numbers for BAC-derived sequences are provided in the Supplementary Information.
The systematic comparison of genomic sequences from different organisms represents a central focus of contemporary genome analysis. Comparative analyses of vertebrate sequences can identify coding [1-6] and conserved non-coding [4, 6, 7] regions, including regulatory elements [8-10], and provide insight into the forces that have rendered modern-day genomes [6]. As a complement to whole-genome sequencing efforts [3, 5, 6], we are sequencing and comparing targeted genomic regions in multiple, evolutionarily diverse vertebrates. Here we report the generation and analysis of over 12 megabases (Mb) of sequence from 12 species, all derived from the genomic region orthologous to a segment of about 1.8 Mb on human chromosome 7 containing ten genes, including the gene mutated in cystic fibrosis. These sequences show conservation reflecting both functional constraints and the neutral mutational events that shaped this genomic region. In particular, we identify substantial numbers of conserved non-coding segments beyond those previously identified experimentally, most of which are not detectable by pair-wise sequence comparisons alone. Analysis of transposable element insertions highlights the variation in genome dynamics among these species and confirms the placement of rodents as a sister group to the primates.
