Abstract

An understanding of how centromeric transition regions are organized is a critical aspect of chromosome structure and function; however, the sequence context of these regions has been difficult to resolve on the basis of the draft genome sequence. We present a detailed analysis of the structure and assembly of all human pericentromeric regions (5 megabases). Most chromosome arms (35 out of 43) show a gradient of dwindling transcriptional diversity accompanied by an increasing number of interchromosomal duplications in proximity to the centromere. At least 30% of the centromeric transition region structure originates from euchromatic gene-containing segments of DNA that were duplicatively transposed towards pericentromeric regions at a rate of six–seven events per million years during primate evolution. This process has led to the formation of a minimum of 28 new transcripts by exon exaptation and exon shuffling, many of which are primarily expressed in the testis. The distribution of these duplicated segments is nonrandom among pericentromeric regions, suggesting that some regions have served as preferential acceptors of euchromatic DNA.

Acknowledgements

We are grateful to the large-scale sequencing centres (Baylor College of Medicine, Cold Spring Harbor Laboratory, Genome Therapeutics Corporation, Harvard Partners Genome Center, Joint Genome Institute, The NIH Intramural Sequencing Center, The UK-MRC Sequencing Consortium, The University of Oklahoma Advanced Center for Genome Technology, The University of Texas Southwest, The Whitehead Institute for Biomedical Research, The Washington University Genome Sequencing Center and the Wellcome Trust Sanger Institute) for access to all large-scale finished sequence, genome assembly and trace sequence data from the human genome before publication. This work was supported by grants from NIH and DOE to E.E.E. and grants from P.R.I.N.C.E., MURST and Telethon to M.R.

Author information

Author notes

    • Xinwei She
    • Julie E. Horvath

    These authors contributed equally to this work

Affiliations

  1. Department of Genetics, Center for Computational Genomics and the Center for Human Genetics, Case Western Reserve University School of Medicine and University Hospitals of Cleveland, Cleveland, Ohio 44106, USA

    • Xinwei She
    • Julie E. Horvath
    • Zhaoshi Jiang
    • Ge Liu
    • Laurie Christ
    • Royden Clark
    • Cassy L. Gulden
    • Can Alkan
    • Jeff A. Bailey
    • Cenk Sahinalp
    • Stuart Schwartz
    • Evan E. Eichler
  2. Department of Genome Sciences, University of Washington School of Medicine, 1705 NE Pacific St, Seattle, Washington 98195, USA

    • Xinwei She
    • Zhaoshi Jiang
    • Evan E. Eichler
  3. UCSC Genome Bioinformatics Group, Center for Biomolecular Science & Engineering, University of California, Santa Cruz, 1156 High St, Santa Cruz, California 95064, USA

    • Terrence S. Furey
    • David Haussler
  4. Washington University School of Medicine, Genome Sequencing Center, 4444 Forest Park Boulevard, St Louis, Missouri 63108, USA

    • Tina Graves
    • Richard K. Wilson
  5. School of Computing Science, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada

    • Cenk Sahinalp
  6. Sezione di Genetica, DAPEG, University of Bari, Via Amendola 165/A 70126 Bari, Italy

    • Mariano Rocchi
  7. Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, Pennsylvania 16802, USA

    • Webb Miller

Competing interests

The authors declare that they have no competing financial interests.

Corresponding author

Correspondence to Evan E. Eichler.

Supplementary information

Word documents

  1. Supplementary Methods

    This file includes a detailed description of the Methods and additional references.

Excel files

  1. Supplementary Table 1a

    Gaps, repeats and exons within 2Mb pericentromeric regions.

  2. Supplementary Table 1b

    Gaps, repeats and exons within 5Mb pericentromeric regions.

  3. Supplementary Table 1c

    Duplications and pairwise alignments within 2Mb pericentromeric regions.

  4. Supplementary Table 1d

    Duplications and pairwise alignments within 5Mb pericentromeric regions.

  5. Supplementary Table 1e

    Homology between each 2Mb pericentromeric region and all other chromosomes.

  6. Supplementary Table 1f

    Homology between each 5Mb pericentromeric region and all other chromosomes.

  7. Supplementary Table 2

    Analysis of alpha satellite DNA placement in build 34 (July 2003) for each chromosome.

  8. Supplementary Table 3

    Cytogenetic vs. in silico analysis of human alpha-satellite containing sequences.

  9. Supplementary Table 4

    Assessment of all gaps within the finished human genome.

  10. Supplementary Table 5

    Complete analysis of cytogenetic vs. in silico analysis of human segmental duplications.

  11. Supplementary Table 6

    Brief summary of cytogenetic vs. in silico analysis of human segmental duplications.

  12. Supplementary Table 7

    Paralogous STS content of 10 pericentromeric duplicons in the human genome.

  13. Supplementary Table 8

    Duplicon junction analysis.

  14. Supplementary Table 9

    Non-human primate mapping of pericentromeric duplications on 2p11.

  15. Supplementary Table 10

    Summary of ancestral duplicons and corresponding gene composition.

  16. Supplementary Table 11a

    Refseq genes located within 2Mb pericentromeric regions.

  17. Supplementary Table 11b

    Refseq genes located within 5Mb pericentromeric regions.

  18. Supplementary Table 12

    Known genes and mRNAs within pericentromeric duplications from ancestral donors.

  19. Supplementary Table 13

    RT-PCR analysis of a subset of pericentromeric genes, mRNAs and ESTs.

PDF files

  1. Supplementary Figure 1a

    Repeat density plotted within 10Mb pericentromeric regions.

  2. Supplementary Figure 1b

    Exon density plotted within 10Mb pericentromeric regions.

  3. Supplementary Figure 2

    The 20 largest pericentromeric-pericentromeric alignments within 2Mb pericentromeric regions.

  4. Supplementary Figure 3

    Distribution of pericentromeric duplications for each chromosome by divergence.

  5. Supplementary Figure 4

    Methodology for identification of ancestral duplicons using mousenet.

  6. Supplementary Figure 5

    Ancestral duplicons within 15q11.

  7. Supplementary Figure 6

    Sequence similarity between acceptor duplicons and donor duplicons.

  8. Supplementary Figure 7

    Sequence structure of genes and mRNAs within pericentromeric regions.

About this article

DOI

https://doi.org/10.1038/nature02806
