The sequence and analysis of duplication-rich human chromosome 16

Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Chan, Yee Man; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip B.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Manohar, Chitra F.; Mark, Graham A.; McMurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; Scott White, P.; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman A.; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; Rokhsar, Daniel S.; Eichler, Evan E.; Gilna, Paul; Lucas, Susan M.; Myers, Richard M.; Rubin, Edward M.; Pennacchio, Len A.

doi:10.1038/nature03187

Article
Published: 23 December 2004


Nature volume 432, pages 988–994 (2004)


Abstract

Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,670 aligned transcripts, 19 transfer RNA genes, 341 pseudogenes and three transfer RNA pseudogenes. These genes include the metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukaemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. Whereas the segmental duplications of chromosome 16 are enriched in the relatively gene-poor pericentromere of the p arm, some are involved in recent gene duplication and conversion events that are likely to have had an impact on the evolution of primates and human disease susceptibility.


Main

The US Department of Energy (DOE) initiated the mapping and sequencing of human chromosome 16 in 1988 to contribute to the generation of a reference human genome sequence to be used in assessing the effects of radiation and for the study of human biology. This particular chromosome was targeted for sequencing in part because of the localization of the DNA repair gene ERCC4 to the p arm of chromosome 16 (ref. 1), the availability of a unique flow-sorted chromosome-specific cosmid library², and access to a mouse–human hybrid cell panel enabling the localization of clones to discrete cytogenetic intervals³. Further interest in human chromosome 16 stemmed from the clustering on this chromosome of the metallothionein genes, which participate in heavy metal transport and detoxification, an area of considerable biological interest to the DOE^4,5. Here we describe the finished human chromosome 16 sequence, which provides a reference for the further exploration of genomic sequence alterations and their relationship to human biology.

Mapping and sequencing

To provide the foundation for sequencing human chromosome 16, we constructed a physical map based on previous sequence-tagged site (STS) content maps^6,7,8 with a minimal final tiling path of 716 clones (618 bacterial artificial chromosomes (BACs), 79 cosmids, seven fosmids, five P1-derived artificial chromosomes (PACs), three yeast artificial chromosome (YAC) subclones, two P1 phages and two phage vectors), supplemented by five genomic polymerase chain reaction (PCR) fragments. The final sequence contains four gaps, two in each chromosome arm. One gap lies in the highly duplicated pericentromeric region of the p arm, and two other, non-pericentromeric gaps (one on each arm) span regions that resist stable cloning with conventional vectors; efforts are ongoing to close the estimated ∼25 kilobases (kb) of missing sequence using alternative vectors⁹. The final gap lies near the telomere of the q arm, in a region of subtelomeric repeats distal to the last identifiable cosmid subclone (AC137934) of a 16q telomere half-YAC, as previously described¹⁰.
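The clone counts listed above can be tallied with a short Python check. The dictionary keys are just the category names used in the text; the point is that the seven clone types sum to 716, and the five genomic PCR fragments, which are not clones, are additional tiling-path elements.

```python
# Tiling-path components as listed in the text.
clone_counts = {
    "BAC": 618,
    "cosmid": 79,
    "fosmid": 7,
    "PAC": 5,
    "YAC subclone": 3,
    "P1 phage": 2,
    "phage vector": 2,
}
pcr_fragments = 5  # genomic PCR products, not clones

total_clones = sum(clone_counts.values())
total_elements = total_clones + pcr_fragments
print(total_clones, total_elements)  # 716 721
```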

The high degree of segmental duplication of chromosome 16, coupled with the multiple haplotypes represented in the numerous clone libraries comprising the tiling path, hindered efforts to construct a valid clone-based representation of this chromosome. To resolve this issue, we adopted a strategy of high depth clone coverage from a library constructed from a single individual¹¹. This enabled the determination of both of the diploid haplotypes across the segmentally duplicated intervals. Overall, these efforts resulted in the generation of 78,884,754 base pairs (bp) of finished euchromatic sequence with an estimated accuracy¹² exceeding 99.9% and covering in excess of 99.9% of its euchromatin. Including the centromere and its adjacent heterochromatic portion of the q arm, sized together at 9.8 megabases (Mb) (see Methods), the total size of the chromosome is estimated at 88.7 Mb.

As a further assessment of the physical sequence, we compared it to the existing physical and genetic maps. We were able to account for all sequence-tagged sites from the Genethon¹³ microsatellite map and the deCODE¹⁴ and Marshfield¹⁵ genetic maps. We also compared the final DNA sequence with recombination distances in the deCODE female, male and sex-averaged meiotic maps (Fig. 1). The female recombination distances for chromosome 16 were similar to those of other human chromosomes, showing a linear relationship between recombination and physical distance with an average of 1.93 cM Mb⁻¹, excluding heterochromatin. However, the male meiotic map differed substantially in the region from 17 to 72 Mb, which spans a meiotic distance of only 22.5 cM, yielding an average of 0.50 cM Mb⁻¹. Finally, we found a marked increase in male recombination near the telomeres, exceeding 3 cM Mb⁻¹, consistent with other human chromosomes¹⁶.

Gene catalogue

We manually curated gene models as previously described¹⁷ and identified a total of 880 protein-coding gene loci (Table 1, Supplementary Information Table 1 and http://www.jgi.doe.gov/human_chr16) supported by 1,670 full-length (or nearly full-length) transcripts. These provided an average of 1.9 annotated transcripts per locus, with 450 of the loci showing strong evidence for alternative splicing in the form of two or more annotated messenger RNA transcripts. Additionally, 208 loci have ‘expressed sequence tag’ (EST) evidence for alternative splice forms, so that nearly 75% of loci display some evidence for alternative splice variants. Loci were further classified as ‘known genes’, ‘novel genes’ or ‘pseudogenes’, consistent with our previous definitions¹⁷, excluding loci without unique open reading frames and ab initio predictions without supporting evidence. Seven hundred and seventy-one known genes were modelled on the basis of 2,435 RefSeq transcripts as well as other complementary DNA sequence evidence in GenBank. More than one-third (36%) of these known genes were extended by more than 50 bp at the 5′ end, and 18% at the 3′ end, relative to RefSeq transcripts while maintaining their original open reading frames.
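The per-locus figures above are simple ratios. The sketch below reproduces them, assuming (as "Additionally" suggests) that the 208 EST-supported loci do not overlap the 450 loci with two or more annotated mRNAs; the variable names are illustrative only.

```python
loci = 880
annotated_transcripts = 1670
loci_with_multiple_mrnas = 450    # two or more annotated mRNA transcripts
loci_with_est_evidence = 208      # assumed disjoint from the 450 above

transcripts_per_locus = annotated_transcripts / loci
fraction_alt_spliced = (loci_with_multiple_mrnas + loci_with_est_evidence) / loci

print(f"{transcripts_per_locus:.1f} transcripts per locus")          # 1.9
print(f"{fraction_alt_spliced:.1%} of loci with splice-variant evidence")  # 74.8%
```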

**Table 1: Chromosome 16 sequence features.**


We identified thirty ‘novel genes’ on the basis of cDNA sequence, spliced ESTs and protein similarity to known human or mouse genes, and we modelled an additional 79 putative novel genes using orthologous mouse cDNA sequences and ab initio predictions. We also annotated 19 tRNA genes and three tRNA pseudogenes based on previous data¹⁸. Finally, we identified 341 pseudogenes and pseudogene fragments, of which 120 appear to be non-processed because they display an exon structure similar to that of the parent locus and are therefore likely to have resulted from genomic duplication events. The remaining 221 appear to be processed pseudogenes, presumably resulting from retrotransposition of spliced mRNAs or from insertion of mitochondrial genome sequence. At least one frameshift or premature stop codon (relative to the parent gene) was identified in 233 pseudogenes, and the remaining 108 were processed pseudogenes lacking introns and displaying poly(A) tracts in the adjacent genomic sequence. This supports the likely nonfunctional nature of these vestigial genes. To assess the quality of our pseudogene collection, we compared it to an earlier analysis¹⁹ describing 250 processed pseudogenes on chromosome 16. Initially, we mapped 233 of these 250 pseudogenes to 429 loci on chromosome 16 using BLAT²⁰ with 100% coverage and >99% identity. We then eliminated loci consisting of repetitive DNA²¹ (Smit, A. F. A. and Green, P., unpublished results), those covering less than 50% of the parent gene, and cases with a clearly retained intron/exon structure. This left 146 processed pseudogenes common to the previous study¹⁹ and ours, suggesting that our manual curation of the finished sequence identified 75 additional members.
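The comparison with the earlier pseudogene set amounts to a chain of filters followed by a subtraction. The following is a minimal sketch, not the pipeline actually used: the `CandidateLocus` record and its fields are hypothetical, and the 50% repeat-content cutoff used to decide that a locus "consists of repetitive DNA" is an assumption, since the text gives no threshold. The other cutoffs (100% coverage, >99% identity, at least 50% of the parent gene) come from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class CandidateLocus:
    """Hypothetical record for one BLAT placement of a previously reported pseudogene."""
    name: str
    coverage: float          # fraction of the query aligned (1.0 = 100%)
    identity: float          # fraction of identical aligned bases
    repeat_fraction: float   # fraction of the locus annotated as repetitive DNA
    parent_fraction: float   # fraction of the parent gene covered by the locus
    retains_introns: bool    # True if an intron/exon structure is clearly retained


def keep_as_processed_pseudogene(locus: CandidateLocus,
                                 max_repeat_fraction: float = 0.5) -> bool:
    """Apply the filters from the text; max_repeat_fraction is an assumed cutoff."""
    if locus.coverage < 1.0 or locus.identity <= 0.99:
        return False  # mapping required 100% coverage and >99% identity
    if locus.repeat_fraction > max_repeat_fraction:
        return False  # treated here as 'consisting of repetitive DNA'
    if locus.parent_fraction < 0.5:
        return False  # covers less than 50% of the parent gene
    if locus.retains_introns:
        return False  # looks non-processed
    return True


# Counts reported in the text.
processed_here = 221
shared_with_earlier_study = 146
print(processed_here - shared_with_earlier_study)  # 75 additional processed pseudogenes
```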

Large structural polymorphisms

We observed several large structural polymorphisms in the finished sequence of chromosome 16, which were often associated with segmental duplications. For instance, we further characterized a previously described stable length polymorphism within the 16p subtelomeric region^22,23. Whereas the shortest and most common allele had previously been finished (represented in NCBI Build 35), we isolated and sequenced the majority of a longer allele derived from a 16p telomere half-YAC, located close to the TTAGGG telomere repeat as defined in ref. 10. This allele is ∼137.5 kb longer than the allele in the current assembly. However, the shorter allele is not simply a truncation of the longer form: the telomeric 21,056 bp of the short allele is not present in the long allele, and the telomeric 158,607 bp of the long allele is not shared with the short allele. Both of these unique regions contain genes: the short allele contains a putative gene(s) represented by cDNAs MGC:75272 and MGC:52000, and the long allele contains genes encoding hypothetical protein XP_375548 (similar to septin), hypothetical protein XP_379920 (similar to capicua) and beta-tubulin 4Q (AAL32434).
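The ∼137.5-kb figure follows from the two unique telomeric segments, assuming the remainder of the two alleles is shared and of equal length:

```python
# Telomeric sequence unique to each 16p subtelomeric allele (bp), from the text.
unique_to_short_allele = 21_056
unique_to_long_allele = 158_607

# Assuming the rest of the two alleles is shared, the shared part cancels and
# the net length difference is the difference between the unique segments.
length_difference = unique_to_long_allele - unique_to_short_allele
print(length_difference)  # 137551, i.e. about 137.5 kb
```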

We also identified one of the most extensively duplicated regions on chromosome 16: a 500-kb interval at 16p11.2-12.1 composed of approximately 54 intrachromosomal duplications (Fig. 2 and Supplementary Table 2). This interval contains seven full or partial gene duplicates, among them the eukaryotic translation initiation factor 3 subunit 8 (EIF3S8), sulphotransferase 1A (SULT1A1) and the Batten disease gene (CLN3). Assembly of the region was initially complicated by the fact that the duplications were long (∼200 kb) and showed extraordinarily high sequence identity (98.33%). During the mapping of this region, sequence for a second haplotype variant from the RPCI-11 BAC library was nearly completed, except for one gap of ∼100 kb. Sequence comparison of these two haplotypes (EIFvar1 and EIFvar2) revealed a 452-kb inversion between them (Fig. 2). Analysis of the breakpoints suggests that a large palindromic duplication is responsible for this rearrangement.
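In a sequence comparison, an inversion between two haplotypes such as EIFvar1 and EIFvar2 appears as a run of alignment anchors whose orientation flips. The sketch below is a hypothetical illustration of that idea, not the Miropeats-based analysis described in the Methods; the anchor tuples and the `min_anchors` parameter are assumptions.

```python
from typing import List, Tuple

# Hypothetical anchor: (position on haplotype A, position on haplotype B, strand),
# where strand "-" means the anchor matches in reverse-complement orientation.
Anchor = Tuple[int, int, str]


def inverted_spans(anchors: List[Anchor], min_anchors: int = 3) -> List[Tuple[int, int]]:
    """Return (first, last) haplotype-A positions of runs of reverse-strand anchors."""
    spans: List[Tuple[int, int]] = []
    run: List[Anchor] = []

    def close_run() -> None:
        # Report the current run only if it is supported by enough anchors.
        if len(run) >= min_anchors:
            spans.append((run[0][0], run[-1][0]))
        run.clear()

    for anchor in sorted(anchors):  # walk along haplotype A
        if anchor[2] == "-":
            run.append(anchor)
        else:
            close_run()
    close_run()
    return spans


# Forward anchors, then a block whose haplotype-B positions run backwards, then forward again.
anchors = ([(i * 10, i * 10, "+") for i in range(5)]
           + [(100 + i * 50, 600 - i * 50, "-") for i in range(10)]
           + [(700, 700, "+")])
print(inverted_spans(anchors))  # [(100, 550)]
```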

**Figure 2: A 450-kb inversion haplotype on chromosome 16.**

Finished sequence was also generated across a recently duplicated, polymorphic 360-kb segment containing the human homologue of the hydrocephalus-inducing gene (HYDIN) at 16q22; a copy of this segment is inserted at chromosome 1q21.1 in some humans. The RPCI-11 BAC library seems to be heterozygous for this insertional polymorphism, and the current genomic assembly of chromosome 1 contains the haplotype lacking the insertion. We further investigated a recently described²⁴ copy number polymorphism between 16p11.2 and 6p25, which contains the DUSP22 gene. On the basis of extensive drafting of RPCI-11 BACs in the region and comparisons with drafted clones from monochromosomal libraries for chromosomes 6 and 16, we determined that the RPCI-11 library is homozygous for the absence of the DUSP22 duplication on chromosome 16. Taken together, these recently arisen large structural polymorphisms are striking examples of variability in the human genome and point to a potential mechanism contributing to phenotypic or disease susceptibility differences among humans. It is worth noting that 91 genes on chromosome 16 are located within segmental duplications; any of these regions could be unstable and could complicate studies of phenotypes linked to them. These observations are particularly relevant in light of recent findings^24,25 of abundant copy number polymorphisms within the genomes of normal individuals, including those described here.

Duplication analysis of chromosome 16

We performed a detailed analysis of duplicated genomic sequence (≥90% sequence identity and ≥1 kb in length), comparing chromosome 16 against the July 2003 assembly of the human genome. We found that 9.89% (7.8 Mb) of chromosome 16 consists of segmental duplications (Supplementary Table 2). Compared with other finished chromosomes and with the human genomic average (5.3%), chromosome 16 is one of the chromosomes most enriched for segmental duplications (Supplementary Table 2 and Supplementary Fig. 1). Nearly 9% of genome-wide human duplication alignments map to this chromosome. Intrachromosomal duplications are longer and show higher sequence identity than interchromosomal duplications (Fig. 3a and Supplementary Fig. 2). Although there is a general inverse correlation between duplication length and divergence, the effect is most pronounced for intrachromosomal duplications, whose average length exceeds 16 kb. The alignments show a clearly bimodal distribution of sequence identity: most interchromosomal duplication alignments show 93–95% sequence identity, whereas intrachromosomal duplications show greater than 97% sequence identity, consistent with a recent expansion of intrachromosomal duplications along the chromosome^26,27. On the basis of substitution rates between great apes, we estimate that as much as 7% of the mass of human chromosome 16 was added by segmental duplication events within the last 10 million years of human evolution²⁸.
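The 7.8-Mb figure counts each duplicated base once, whereas a base can take part in several pairwise alignments (the Methods report 26.12 Mb of aligned bases for the same set of alignments). A minimal sketch of that distinction, using hypothetical half-open intervals on chromosome 16, together with the percentage quoted above:

```python
from typing import List, Tuple


def merged_length(intervals: List[Tuple[int, int]]) -> int:
    """Bases covered by half-open [start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: bank the previous merged run.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Hypothetical duplicated intervals; the first two overlap.
hits = [(0, 5_000), (2_000, 8_000), (20_000, 21_500)]
print(sum(end - start for start, end in hits))  # 12500 aligned bases
print(merged_length(hits))                       # 9500 non-redundant bases

# Fraction of the finished sequence that is segmentally duplicated.
fraction_duplicated = 7.8e6 / 78_884_754
print(f"{fraction_duplicated:.2%}")              # 9.89%
```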

**Figure 3: Chromosome 16 segmental duplications.**

Segmental duplications are particularly clustered along the p arm of the chromosome (Supplementary Figs 1 and 3). As described previously²⁹, the 16p11 pericentromeric region represents the largest zone of interchromosomal duplications (Fig. 3b) accounting for 44% (937 of 2,146) of the total number of chromosome 16 alignments (Supplementary Table 4) and 55% (752 of 1,365) of all chromosome 16 interchromosomal alignments. Most of the interchromosomal duplications in this region map to the pericentromeric regions of other chromosomes (Fig. 3b). Large tracts of interstitial alpha-satellite DNA have been finished within proximal 16p11 and it is possible that such sequences have played a part in the frequent evolutionary exchange of pericentromeric DNA among non-homologous chromosomes³⁰. In stark contrast to 16p11, there is little evidence for extensive pericentromeric duplication on the q arm despite the fact that centromeric satellite boundary sequences have been traversed.
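The two pericentromeric proportions quoted above can be checked directly:

```python
# Alignment counts from the text.
pericentromeric, all_alignments = 937, 2_146
pericentromeric_inter, all_inter = 752, 1_365

print(f"{pericentromeric / all_alignments:.0%}")   # 44%
print(f"{pericentromeric_inter / all_inter:.0%}")  # 55%
```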

An additional 19 blocks of extensive duplication (>100 kb and more than five duplication alignments) were identified within the euchromatic portion of chromosome 16. These regions are composed of as many as 119 underlying duplicons (also known as low-copy repeats on chromosome 16, LCR16) that have been juxtaposed in different combinations within the duplication blocks. They contain various genes and gene fragments, such as NPIP, SULT1A, EIF3S8 and SMG1 (Supplementary Table 3). Most are duplicated several times, in varying copy numbers, with a high degree of sequence identity to their putative ancestral genes. Most seem to have been duplicated in concert with LCR16a, a segment that contains one of the most rapidly evolving gene families of the human genome^27,31.
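The text does not say how duplication alignments were grouped into blocks. The sketch below assumes, purely for illustration, that alignment intervals on chromosome 16 separated by at most a hypothetical `max_gap` are merged into one block, and then applies the stated cutoffs of >100 kb and more than five alignments.

```python
from typing import List, Tuple


def duplication_blocks(alignments: List[Tuple[int, int]],
                       max_gap: int = 10_000,
                       min_length: int = 100_000,
                       min_alignments: int = 6) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_alignments) for large, alignment-dense blocks.

    max_gap is a hypothetical clustering parameter; min_alignments=6 encodes
    'more than five alignments'.
    """
    blocks: List[Tuple[int, int, int]] = []
    start = end = count = 0
    for s, e in sorted(alignments):
        if count and s <= end + max_gap:
            # Close enough to the current block: extend it.
            end = max(end, e)
            count += 1
        else:
            if count:
                blocks.append((start, end, count))
            start, end, count = s, e, 1
    if count:
        blocks.append((start, end, count))
    # Keep blocks longer than 100 kb with more than five alignments.
    return [(s, e, n) for s, e, n in blocks
            if e - s > min_length and n >= min_alignments]


# Eight closely spaced 15-kb alignments plus one isolated alignment.
alignments = [(i * 20_000, i * 20_000 + 15_000) for i in range(8)] + [(500_000, 510_000)]
print(duplication_blocks(alignments))  # [(0, 155000, 8)]
```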

Comparative genomics

We compared human chromosome 16 to the chimpanzee, dog, mouse³², rat³³, chicken and fish³⁴ (Fugu rubripes) draft genomes to further explore the evolution and constraint of sequences along this chromosome. By first building segmental maps from DNA alignments of all of these vertebrate species, we were able to examine the global homologous chromosomal relationships between these genomes and human chromosome 16 (see Methods). We found no major rearrangements relative to the homologous chimpanzee chromosome 18. Comparison with the mouse and rat genomes revealed 26 chromosomal segments unbroken in all three species, ranging in size from 250 kb to 10.7 Mb (Fig. 4a). Further addition of the chicken genome to the multi-dimensional map yielded 33 segments ranging in size from 250 kb to 8.7 Mb (Fig. 4a). These segmental maps provide the substrates to precisely define the breakpoints that, in some cases, may have disrupted gene loci in the species containing the rearrangement.
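A segment that is unbroken in every species is delimited by breakpoints from any of the pairwise comparisons, so adding a species can only split existing segments, which is why the segment count rises when chicken is added. The sketch below illustrates this with hypothetical breakpoint coordinates (in human chromosome positions) and the 250-kb minimum segment length given in the Methods; it is not PARAGON's algorithm.

```python
from typing import Iterable, List, Tuple


def unbroken_segments(chrom_length: int,
                      breakpoints_per_species: Iterable[Iterable[int]],
                      min_length: int = 250_000) -> List[Tuple[int, int]]:
    """Cut the human chromosome at every breakpoint from any species comparison.

    Segments shorter than min_length are discarded.
    """
    cuts = sorted({0, chrom_length}.union(*(set(bps) for bps in breakpoints_per_species)))
    return [(s, e) for s, e in zip(cuts, cuts[1:]) if e - s >= min_length]


# Hypothetical breakpoints on a 10-Mb human chromosome.
mouse = [2_000_000, 5_000_000]
rat = [5_000_000, 5_100_000, 8_000_000]
chicken = [3_000_000]

print(len(unbroken_segments(10_000_000, [mouse, rat])))           # 4
print(len(unbroken_segments(10_000_000, [mouse, rat, chicken])))  # 5
```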

**Figure 4: Comparative analysis of human chromosome 16.**

We next identified slowly evolving regions, presumably under evolutionary constraint, through fine-scale DNA comparison of chromosome 16 with other vertebrate genome assemblies. Four different species combinations were selected to represent the accessible range of vertebrate evolutionary divergence times: human/mouse/rat, human/mouse/rat/dog, human/mouse/dog/chicken, and human/mouse/Fugu (see Methods). To explore potential noncoding functional elements on chromosome 16, the results were filtered to remove regions that substantially overlap annotated genes, spliced ESTs or mRNAs in human, mouse and rat, which resulted in the identification of 5,187 discrete conserved noncoding regions between human/mouse/rat, 6,159 between human/mouse/rat/dog, 1,862 between human/mouse/dog/chicken, and 191 between human/mouse/Fugu (Fig. 4b and Supplementary Table 1). Compared with genome-wide averages, the densities of human/mouse/rat and human/mouse/dog/chicken elements were only slightly higher on human chromosome 16 (Supplementary Table 1). In contrast, human/mouse/Fugu elements are present at ∼2.4 times the genome-wide density, indicating that although chromosome 16 as a whole has had ‘normal’ levels of noncoding constraint since the mammal/bird split, it has conserved more ancient functions to a surprising degree. Functional studies of these conserved elements are warranted to assess their possible biological activity, given that ∼98% of the human genome is noncoding.

We further explored an 8.7-Mb region at 16q12, on the basis of extreme features of evolutionary conservation. This region was first identified as the largest unbroken synteny segment between human/mouse/dog/chicken on chromosome 16 and contains 59% (112 of 191) of the human/mouse/Fugu noncoding elements. These elements are entirely clustered in a gene-poor 5-Mb subregion, which contains at least six developmental transcription factors, including SALL1 and three iroquois genes (IRX3, IRX5 and IRX6). This clustering is an example of the general bias of human–fish conserved sequences towards developmental genes³⁵. Interestingly, at least nine of these human/mouse/Fugu elements have significant sequence similarity to counterparts in the paralogous IRX gene cluster on chromosome 5, which is similarly located in a ‘forest’ of human–fish conservation³⁶. In vivo mouse transgenic data indicate that a significant percentage of these IRX conserved noncoding sequences behave as gene enhancers (Pennacchio, L. A., unpublished observation), suggesting that in addition to the well described conservation of the protein-encoding portions of genomic duplications, evolutionary constraint is also observable in adjacent gene regulatory sequences following genomic duplication events. This synteny block is an outlier even in terms of more recent noncoding conservation, with 917 (105 per Mb) human/mouse/rat and 590 (67.5 per Mb) human/mouse/dog/chicken elements.

The second longest chromosome 16 synteny block in human/mouse/dog/chicken neighbours the highly conserved SALL1-IRX segment and is similar in length (8.19 Mb) (Fig. 4c). Once again this region is gene poor, with its telomeric 7.6 Mb containing only three annotated genes, all members of the cadherin family: CDH8, CDH11 and CDH5. Within the full 8.19-Mb interval, we identified 968 (118 per Mb) human/mouse/rat conserved noncoding sequences. This is twice the genome-wide density, as was the case in the SALL1-IRX region. However, in stark contrast to the neighbouring SALL1-IRX region, this synteny block has no noncoding conservation between human/mouse/Fugu, suggesting that its noncoding functions, though just as constrained among mammals, are more diverged in distant species.
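The per-Mb densities quoted for the two synteny blocks are ratios of element counts to block length. The sketch below reproduces them and, for comparison, the chromosome-wide human/mouse/rat density implied by 5,187 elements over the finished sequence length. Using the finished length as the denominator is an assumption; the paper may have normalized differently (for example, by alignable sequence).

```python
def density_per_mb(n_elements: int, length_mb: float) -> float:
    """Conserved elements per megabase of sequence."""
    return n_elements / length_mb


# Human/mouse/rat conserved noncoding elements, from the text.
sall1_irx = density_per_mb(917, 8.7)                 # SALL1-IRX synteny block
cadherin = density_per_mb(968, 8.19)                 # cadherin synteny block
chromosome_wide = density_per_mb(5_187, 78.884754)   # assumed denominator: finished length

print(round(sall1_irx), round(cadherin), round(chromosome_wide))  # 105 118 66
```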

As a special category of constrained DNA, we also searched for ultra-conserved noncoding sequences, recently defined by the stringent criterion of at least 200 bp in length and 100% identity between the human, mouse and rat genomes³⁷. Of the 481 ultra-conserved elements found in the entire human genome, 15 (3.1%) were found on chromosome 16, with 11 having some evidence of being transcribed and processed into mature mRNAs. The above-mentioned bias towards developmental genes has also been noted³⁷ for ultra-conserved human/rodent elements. Indeed, 9 of the 15 ultra-conserved elements found on chromosome 16 lie in the same SALL1-IRX synteny block that contains the mammal/fish conservation cluster. This contrasts with the similarly sized cadherin synteny block, which contains no human–fish noncoding conservation and only one ultra-conserved element.

Finally, three regions on chromosome 16 have been selected by the National Human Genome Research Institute as part of the Encyclopedia of DNA Elements (ENCODE) project, an effort aimed at rigorously analysing 1% of the human genome sequence³⁸ (http://www.genome.gov/10005107). These three ENCODE regions include the well-studied alpha-globin-containing interval (ENm008) and two randomly chosen regions (ENr211 on 16p12.1 and ENr313 on 16q21). Interestingly, ENr313 is located within the large cadherin gene desert described above and is completely devoid of genes (Fig. 4d). Nonetheless, it harbours the same high density of human/mouse/rat and human/mouse/dog/chicken conserved noncoding elements as the rest of the cadherin synteny block, suggesting the presence of numerous unassigned functional sequences within this region. Ongoing studies by ENCODE will better define the relationship between functional elements and comparative sequence data such as that presented here.

Discussion

The primary sequence of human chromosome 16, as well as the human genome as a whole, now provides a key foundation for ongoing efforts such as ENCODE to deeply annotate all types of information encoded in our genome. This represents an enormous long-term challenge because genomic signatures embedded within the sequence of DNA perform a vast number of different operations across the trillions of cells within our bodies. These features range from relatively easily identified genes, to sequences involved in gene regulation (which use a plethora of signals to determine when, where and under what conditions a given gene is expressed), to probably even more complicated features such as higher-order chromosome structure and DNA involvement in replication and repair. It is inspiring to recall that only 50 years ago we had our first glimpse into the structure of DNA, which provided the foundation for generating the nearly complete human euchromatic sequence. The next 50 years will probably bring similarly impressive gains and enable us to precisely relate our primary genomic sequence to functional genomic signatures and their relationship to human biology.

Methods

Sizing of heterochromatic gaps

To estimate the size of the alpha satellite bands (16p11.1-16q11.1) encompassing the centromere and the satellite II heterochromatin in band 16q11.2, we used contour-clamped homogeneous electric field (CHEF) pulsed-field gel electrophoresis at various pulse times to resolve macrorestriction fragments between 100 kb and >7,000 kb. DNA from CY18 (a mouse–human hybrid containing a single human chromosome 16) was digested with several different rare-cutting restriction enzymes and separated on CHEF gels. Hybridization of blots of these gels with the 16-1 (16-specific alpha satellite) and pHuR 195 (16-specific satellite II) probes revealed a single band of alpha satellite (in three different enzyme digests) that did not overlap with any satellite II bands (data not shown). The smallest of these bands was a 1,800-kb XhoI fragment, which provided an upper size limit for the alpha satellite array encompassing the centromere of chromosome 16. SalI cut the satellite II heterochromatin into well-resolved large restriction fragments without cutting within the alpha satellite array. The sum of the SalI satellite II fragments was estimated at ∼7,800 kb, providing an upper size limit for the 16q11.2 satellite II heterochromatin of nominally 8 Mb. Together these account for 9.8 Mb of unsequenced heterochromatin encompassing cytogenetic bands 16p11.1-16q11.2, although it is likely that we sequenced partially into the boundaries of these regions in the adjacent tiling set clones.
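The size estimates combine as follows. This only restates the arithmetic, using the nominal 8-Mb figure for satellite II rather than the ∼7,800-kb sum, and it inherits the caveat that some boundary sequence may already be included in the finished euchromatin.

```python
finished_euchromatin_mb = 78_884_754 / 1e6   # finished sequence, in Mb
alpha_satellite_mb = 1.8                     # upper limit from the 1,800-kb XhoI fragment
satellite_ii_mb = 8.0                        # nominal size from the ~7,800-kb SalI sum

heterochromatin_mb = alpha_satellite_mb + satellite_ii_mb
total_mb = finished_euchromatin_mb + heterochromatin_mb
print(f"{heterochromatin_mb:.1f} Mb heterochromatin; {total_mb:.1f} Mb in total")
# 9.8 Mb heterochromatin; 88.7 Mb in total
```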

Segmental duplication analysis

We used a BLAST-based detection scheme³⁹ to identify all pairwise similarities representing duplicated regions (≥1 kb and ≥90% identity) within the finished sequence of chromosome 16 and between it and all other chromosomes in the NCBI genome assembly (build 34). A total of 2,146 pairwise alignments, representing 26.12 Mb of aligned base pairs and 7.8 Mb of non-redundant duplicated bases, were analysed on chromosome 16. The program Parasight (http://humanparalogy.gene.cwru.edu/parasight/) was used to generate images of pairwise alignments. The divergence of each duplication, the number of substitutions per site between the two sequences, was calculated using Kimura's two-parameter method, which corrects for multiple substitutions and for transition/transversion bias⁴⁰. Analysis of haplotype structural variation was performed using the program Miropeats (threshold = 3,000)⁴¹. The gene content of duplicated regions was analysed in 1% identity bins from 90% to 100%, using a non-redundant, non-overlapping set of known genes. A gene feature (exon) was considered duplicated if >50 bp of the feature overlapped a duplication. Thus, exons of 50 bp or less could not be scored as duplicated in this analysis.
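Kimura's two-parameter distance has a closed form, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transition and transversion differences. The sketch below implements that formula for one pairwise alignment, skipping gap and ambiguous columns (a choice made here, not stated in the Methods), together with the >50-bp exon overlap rule. The function names are hypothetical.

```python
import math
from typing import List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance (substitutions per site) between aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def exon_is_duplicated(exon: Tuple[int, int], duplications: List[Tuple[int, int]],
                       min_overlap: int = 50) -> bool:
    """True if more than min_overlap bp of the exon lie in duplicated sequence.

    Assumes half-open duplication intervals that do not overlap each other.
    """
    start, end = exon
    overlap = sum(max(0, min(end, e) - max(start, s)) for s, e in duplications)
    return overlap > min_overlap


print(round(kimura_2p("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # one transition in 10 sites
print(exon_is_duplicated((100, 150), [(0, 1_000)]))      # False: a 50-bp exon never qualifies
```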

Pseudogene identification

Pseudogenes were defined as gene models built by homology to known human genes in which the alignment between the model and the homologue shows at least one stop codon or frameshift. We identified regions of homology⁴² to human IPI (International Protein Index; http://www.ebi.ac.uk/IPI/IPIhelp.html) proteins in repeat-masked²¹ (Smit, A. F. A. and Green, P., unpublished results) chromosome 16 genomic sequence. For each such fragment of genomic sequence, we built gene models using the GeneWise⁴³ program. Overlapping models were then clustered, and the top-scoring model in each cluster was analysed for the presence of premature stop codons and frameshifts. The remaining models were then manually checked to confirm their pseudogene status.
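The cluster-then-check step can be sketched as follows. The `GeneModel` record is hypothetical and the disabling-mutation check is deliberately crude: it looks only for premature in-frame stop codons and treats a coding length that is not a multiple of three as a proxy for a frameshift, whereas the protein-to-genome alignment itself is the better source of frameshift calls. As in the text, flagged models would still need manual review.

```python
from dataclasses import dataclass
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class GeneModel:
    """Hypothetical homology-based gene model on chromosome 16."""
    start: int
    end: int      # half-open genomic span [start, end)
    score: float  # alignment score of the model against its homologue
    cds: str      # predicted coding sequence, 5' to 3'


def cluster_overlapping(models: List[GeneModel]) -> List[List[GeneModel]]:
    """Group models whose genomic spans overlap (single linkage along the chromosome)."""
    clusters: List[List[GeneModel]] = []
    cluster_end = 0
    for model in sorted(models, key=lambda m: m.start):
        if clusters and model.start < cluster_end:
            clusters[-1].append(model)
            cluster_end = max(cluster_end, model.end)
        else:
            clusters.append([model])
            cluster_end = model.end
    return clusters


def has_disabling_mutation(cds: str) -> bool:
    """Premature in-frame stop codon, or a length that is not a multiple of three."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        return True  # crude frameshift proxy
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])  # ignore the terminal codon


def pseudogene_candidates(models: List[GeneModel]) -> List[GeneModel]:
    """Top-scoring model per cluster, kept if it carries a disabling mutation."""
    candidates = []
    for cluster in cluster_overlapping(models):
        best = max(cluster, key=lambda m: m.score)
        if has_disabling_mutation(best.cds):
            candidates.append(best)  # still to be confirmed manually
    return candidates


models = [
    GeneModel(0, 1_000, 50.0, "ATGAAATAGAAATAA"),   # internal TAG stop
    GeneModel(500, 1_500, 30.0, "ATGAAAAAATAA"),    # overlaps the first model
    GeneModel(5_000, 6_000, 40.0, "ATGAAAAAATAA"),  # intact reading frame
]
print([m.start for m in pseudogene_candidates(models)])  # [0]
```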

Comparative analysis

Multi-species segmental homology maps were computed using PARAGON (v2.2; Couronne, unpublished work), which is based on BLASTZ⁴⁴ pairwise alignments of all genomes to human. After filtering out segments shorter than 250 kb in human, MLAGAN⁴⁵ alignments of homologous blocks were scanned for evolutionarily conserved regions using Gumby (v1.5; Prabhakar, unpublished work). These were visualized using Rank-VISTA (Prabhakar, unpublished work). Gumby uses the following three-step process to identify statistically significant conservation in the input global alignment: (1) noncoding regions in the alignment are used to estimate the local neutral mutation rates⁴⁶ between all pairs of aligned sequences, and these rates are used to derive a log-likelihood scoring scheme for slow versus neutral evolution⁴⁷, in which the slow rate is set at half the neutral rate; (2) each alignment position is then assigned a conservation score using a phylogenetically weighted sum-of-pairs scheme; (3) finally, a dynamic programming step scans the alignment for high-scoring segments (conserved regions) of any length. Conserved regions detected in this manner are assigned P-values using the same statistical formalism⁴⁸ as the BLAST algorithm⁴². Whereas BLAST assigns P-values relative to random permutations of the query and target sequences, Gumby P-values relate to random permutations of the columns in the input alignment. All results here were generated using a Gumby P-value threshold of 0.01 and a baseline human sequence length of 100 kb. Conserved noncoding regions were defined as conserved segments that overlap annotated exons, spliced ESTs or mRNAs from human, mouse or rat over no more than 25% of their length. At a Gumby P-value threshold of 0.01, 2.2% of the ungapped positions in the human genome were assigned to human/mouse/rat conserved noncoding segments.
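The scoring and scanning steps described above can be illustrated with a deliberately simplified, pairwise sketch. It is not Gumby: it scores a single pairwise alignment instead of using a phylogenetically weighted sum of pairs, approximates the per-site mismatch probability under slow evolution as half the neutral one (the Methods halve the rate, not the probability), and uses a raw score cutoff instead of a P-value. The noncoding test applies the 25% overlap rule, assuming the annotated features do not overlap one another. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def column_scores(matches: List[bool], p_neutral: float) -> List[float]:
    """Per-column log-likelihood ratio of 'slow' versus neutral evolution (0 < p_neutral < 1)."""
    p_slow = p_neutral / 2  # simplification: halve the mismatch probability
    match_score = math.log((1 - p_slow) / (1 - p_neutral))  # positive
    mismatch_score = math.log(p_slow / p_neutral)            # ln(1/2), negative
    return [match_score if m else mismatch_score for m in matches]


def high_scoring_segments(scores: List[float],
                          min_score: float) -> List[Tuple[int, int, float]]:
    """Linear scan returning (start, end_exclusive, score) of high-scoring runs.

    The running sum restarts whenever it drops to zero or below; the best-scoring
    stretch of each positive excursion is reported if it reaches min_score (> 0).
    """
    segments: List[Tuple[int, int, float]] = []
    running = best = 0.0
    seg_start = best_end = 0
    for i, s in enumerate(scores):
        if running <= 0.0:
            running, best, seg_start = 0.0, 0.0, i  # start a new candidate run
        running += s
        if running > best:
            best, best_end = running, i + 1
        if running <= 0.0 and best >= min_score:
            segments.append((seg_start, best_end, best))
            best = 0.0
    if best >= min_score:
        segments.append((seg_start, best_end, best))
    return segments


def is_noncoding(segment: Tuple[int, int], features: List[Tuple[int, int]],
                 max_overlap: float = 0.25) -> bool:
    """True if annotated features cover no more than 25% of the segment."""
    start, end = segment
    covered = sum(max(0, min(end, e) - max(start, s)) for s, e in features)
    return covered / (end - start) <= max_overlap


# A conserved 40-column run flanked by mismatches (hypothetical data).
matches = [False] * 5 + [True] * 40 + [False] * 5
segments = high_scoring_segments(column_scores(matches, p_neutral=0.3), min_score=5.0)
print([(s, e) for s, e, _ in segments])   # [(5, 45)]
print(is_noncoding((5, 45), [(40, 60)]))  # True: 5 of 40 columns overlap (12.5%)
```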

References

1. Siciliano, M. J. Chromosomal assignment of human genes coding for DNA repair functions. Isozymes Curr. Top. Biol. Med. Res. 15, 217–223 (1987)
2. Deaven, L. L. et al. Construction of human chromosome-specific DNA libraries from flow-sorted chromosomes. Cold Spring Harb. Symp. Quant. Biol. 51, 159–167 (1986)
3. Callen, D. F. et al. High-resolution cytogenetic-based physical map of human chromosome 16. Genomics 13, 1178–1185 (1992)
4. Hildebrand, C. E. & Enger, M. D. Regulation of Cd2+/Zn2+-stimulated metallothionein synthesis during induction, deinduction, and superinduction. Biochemistry 19, 5850–5857 (1980)
5. Stallings, R. L., Munk, A. C., Longmire, J. L., Hildebrand, C. E. & Crawford, B. D. Assignment of genes encoding metallothioneins I and II to Chinese hamster chromosome 3: evidence for the role of chromosome rearrangement in gene amplification. Mol. Cell. Biol. 4, 2932–2936 (1984)
6. Han, C. S. et al. Construction of a BAC contig map of chromosome 16q by two-dimensional overgo hybridization. Genome Res. 10, 714–721 (2000)
7. Doggett, N. A. et al. An integrated physical map of human chromosome 16. Nature 377, 335–365 (1995)
8. Cao, Y. et al. A 12-Mb complete coverage BAC contig map in human chromosome 16p13.1-p11.2. Genome Res. 9, 763–774 (1999)
9. Kouprina, N. et al. Construction of human chromosome 16- and 5-specific circular YAC/BAC libraries by in vivo recombination in yeast (TAR cloning). Genomics 53, 21–28 (1998)
10. Riethman, H. C. et al. Integration of telomere sequences with the draft human genome sequence. Nature 409, 948–951 (2001)
11. Osoegawa, K. et al. A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res. 11, 483–496 (2001)
12. Schmutz, J. et al. Quality assessment of the human genome sequence. Nature 429, 365–368 (2004)
13. Dib, C. et al. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380, 152–154 (1996)
14. Kong, A. et al. A high-resolution recombination map of the human genome. Nature Genet. 31, 241–247 (2002)
15. Broman, K. W., Murray, J. C., Sheffield, V. C., White, R. L. & Weber, J. L. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63, 861–869 (1998)
16. Yu, A. et al. Comparison of human genetic and sequence-based physical maps. Nature 409, 951–953 (2001)
17. Grimwood, J. et al. The DNA sequence and biology of human chromosome 19. Nature 428, 529–535 (2004)
18. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997)
19. Zhang, Z., Harrison, P. M., Liu, Y. & Gerstein, M. Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Res. 13, 2541–2558 (2003)
20. Kent, W. J. BLAT–the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002)
21. Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000)
22. Flint, J. et al. The relationship between chromosome structure and function at a human telomeric region. Nature Genet. 15, 252–257 (1997)
23. Wilkie, A. O. et al. Stable length polymorphism of up to 260 kb at the tip of the short arm of human chromosome 16. Cell 64, 595–606 (1991)
24. Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science 305, 525–528 (2004)
25. Iafrate, A. J. et al. Detection of large-scale variation in the human genome. Nature Genet. 36, 949–951 (2004)
26. Loftus, B. et al. Genome duplications and other features in 12 Mbp of DNA sequence from human chromosome 16p and 16q. Genomics 60, 295–308 (1999)
27. Johnson, M. E. et al. Positive selection of a gene family during the emergence of humans and African apes. Nature 413, 514–519 (2001)
28. Chen, F. C. & Li, W. H. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. 68, 444–456 (2001)
29. She, X. et al. The structure and evolution of centromeric transition regions within the human genome. Nature 430, 857–864 (2004)
30. Guy, J. et al. Genomic sequence and transcriptional profile of the boundary between pericentromeric satellites and genes on human chromosome arm 10p. Genome Res. 13, 159–172 (2003)
31. Eichler, E. E. et al. Divergent origins and concerted expansion of two segmental duplications on chromosome 16. J. Hered. 92, 462–468 (2001)
32. Waterston, R. H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002)
33. Gibbs, R. A. et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493–521 (2004)
34. Aparicio, S. et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297, 1301–1310 (2002)
35. Boffelli, D., Nobrega, M. A. & Rubin, E. M. Comparative genomics at the vertebrate extremes. Nature Rev. Genet. 5, 456–465 (2004)
36. Schmutz, J. et al. The DNA sequence and comparative analysis of human chromosome 5. Nature 431, 268–274 (2004)
37. Bejerano, G. et al. Ultraconserved elements in the human genome. Science 304, 1321–1325 (2004)
38. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306, 636–640 (2004)
39. Bailey, J. A., Yavor, A. M., Massa, H. F., Trask, B. J. & Eichler, E. E. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 11, 1005–1017 (2001)
40. Kimura, M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16, 111–120 (1980)
41. Parsons, J. D. Miropeats: graphical DNA sequence comparisons. Comput. Appl. Biosci. 11, 615–619 (1995)
42. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
43. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004)
44. Schwartz, S. et al. Human-mouse alignments with BLASTZ. Genome Res. 13, 103–107 (2003)
45. Brudno, M. et al. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13, 721–731 (2003)
46. Cooper, G. M. et al. Characterization of evolutionary rates and constraints in three mammalian genomes. Genome Res. 14, 539–548 (2004)
47. Boffelli, D. et al. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299, 1391–1394 (2003)
48. Karlin, S. & Dembo, A. Limit distributions of maximal segmental score among Markov-dependent partial sums. Adv. Appl. Prob. 24, 113–140 (1992)

Acknowledgements

We thank the International Chimpanzee Sequencing Consortium for pre-publication access to and permission to analyse the relevant portions of the chimpanzee genomic sequence; the Broad Institute for pre-publication access to the dog genome assembly; and the Washington University Genome Sequencing Center for pre-publication access to the chicken genomic assembly. We also thank D. Gordon of the University of Washington for his assistance in developing and customizing finishing tools, and T. Furey and G. Schuler for their efforts towards assessing the quality and completeness of our assemblies. This work was performed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program and by the University of California, Lawrence Livermore National Laboratory, Lawrence Berkeley National Laboratory, Los Alamos National Laboratory, and Stanford University.

Author information

Authors and Affiliations

DOE Joint Genome Institute, 2800 Mitchell Avenue, Walnut Creek, California, 94598, USA
Joel Martin, Laurie A. Gordon, Astrid Terry, Gary Xie, Uffe Hellsten, Michael Altherr, Andrea Aerts, Elbert Branscomb, John C. Detter, Tijana Glavina, David Goodstein, Igor Grigoriev, Nancy Hammon, Trevor Hawkins, Wayne Huang, Sanjay Israni, Jamie Jett, Kristen Kadner, Heather Kimball, Arthur Kobayashi, Yunian Lou, Steve Lowry, Jenna Morgan, Sam Pitluck, Martin Pollard, Paul Predki, Sam Rash, Asaf Salamov, Duncan Scott, Nina Thayer, Hope Tice, Mary Tran-Gyamfi, Anna Ustaszewska, Paul Richardson, Daniel S. Rokhsar, Susan M. Lucas, Edward M. Rubin & Len A. Pennacchio
Los Alamos National Laboratory, Los Alamos, New Mexico, 87545, USA
Cliff Han, Gary Xie, Michael Altherr, Heather Blumer, Nancy C. Brown, William J. Bruno, Judith M. Buckingham, David F. Callen, Connie S. Campbell, Mary L. Campbell, Evelyn W. Campbell, Jean F. Challacombe, Leslie A. Chasteen, Olga Chertkov, Han C. Chi, Lynn M. Clark, Judith D. Cohn, Mira Dimitrijevic-Bussod, Joseph J. Fawcett, Lynne A. Goodwin, Deborah L. Grady, Carl E. Hildebrand, Phillip B. Jewett, Marie-Claude Krawczyk, Tina Leyba, Jonathan L. Longmire, Thom Ludeman, Graham A. Mark, Kimberly L. McMurray, Linda J. Meincke, Robert K. Moyzis, Mark O. Mundt, A. Christine Munk, Beverly Parson-Quintana, Darryl O. Ricke, Donna L. Robinson, Elizabeth H. Saunders, Timothy Shough, Raymond L. Stallings, Malinda Stalvey, Robert D. Sutherland, Roxanne Tapia, Judith G. Tesmer, Nina Thayer, Linda S. Thompson, David C. Torney, Levy E. Ulanovsky, P. Scott White, Albert L. Williams, Patricia L. Wills, Jung-Rung Wu, David Bruce, Norman A. Doggett, Larry Deaven & Paul Gilna
Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California, 94550, USA
Elbert Branscomb, Mari Christensen, Matthew Groza, Arthur Kobayashi, Chitra F. Manohar & Richard D. Nandkeshwar
Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, California, 94720, USA
Shyam Prabhakar, Olivier Couronne, Edward M. Rubin & Len A. Pennacchio
Department of Genome Sciences, University of Washington, Seattle, Washington, 98195, USA
Xinwei She & Evan E. Eichler
Stanford Human Genome Center, Department of Genetics, Stanford University School of Medicine, 975 California Avenue, Palo Alto, California, 94304, USA
Yee Man Chan, Eva Bajorek, Stacey Black, Chenier Caoile, Mirian Denys, Mark Dickson, Julio Escobar, Dave Flowers, Dea Fotopulos, Maria Gomez, Eidelyn Gonzales, Lauren Haydu, Frederick Lopez, Lucia Ramirez, James Retterer, Alex Rodriguez, Ming Tsai, Nu Vo, Kevin Wu, Joan Yang, Jeremy Schmutz, Jane Grimwood & Richard M. Myers
Children's Hospital Oakland, Oakland, California, 94609, USA
Pieter DeJong

Corresponding authors

Correspondence to Edward M. Rubin or Len A. Pennacchio.

Ethics declarations

Competing interests

The authors declare that they have no competing financial interests.

Supplementary information

Supplementary Table 1

Chromosome features determined by identical annotation methods across the three chromosomes annotated by JGI, compared with selected genome-wide figures. PCG = protein-coding genes, PCT = protein-coding transcripts, CNS = conserved noncoding sequences as described in Methods. Genome data were derived from http://genome.ucsc.edu and Ensembl data from http://www.ensembl.org. N/D = not determined. (DOC 41 kb)

Supplementary Table 2

Bases involved in segmental duplications and pairwise alignments. Percentages of non-redundant duplications are based on a total non-gap genome size of 2,865,069,170 bp and a chromosome 16 length of 78,884,752 bp. Segmental duplications with >90% sequence identity and >1 kb in length were considered. (XLS 16 kb)
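
The non-redundant percentages in this table count each duplicated base once, even when that base lies in several overlapping pairwise alignments. The following minimal Python sketch shows that calculation. It makes three assumptions:

- Interval coordinates are half-open.
- The >90% identity and >1 kb filters are strict inequalities.
- The length filter applies to each alignment interval.

The function names are hypothetical.

```python
CHR16_LENGTH = 78_884_752      # chromosome 16 length quoted in this table
GENOME_NONGAP = 2_865_069_170  # total non-gap genome size quoted in this table


def nonredundant_bases(alignments, min_identity=0.90, min_length=1000):
    """Union length (bp) of duplicated intervals that pass the filters.

    alignments: iterable of (start, end, identity) on one sequence, with
    half-open [start, end) coordinates and identity as a fraction.
    """
    kept = sorted((s, e) for s, e, ident in alignments
                  if ident > min_identity and e - s > min_length)
    total, cur_s, cur_e = 0, None, None
    for s, e in kept:
        if cur_e is None or s > cur_e:  # disjoint: close the current run
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:  # overlapping or adjacent: extend the current run
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total


def percent_duplicated(alignments, reference_length=CHR16_LENGTH):
    """Non-redundant duplicated bases as a percentage of reference_length."""
    return 100.0 * nonredundant_bases(alignments) / reference_length
```

Passing `GENOME_NONGAP` as `reference_length` gives the genome-wide denominator instead of the chromosome 16 length.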

Supplementary Table 3

Duplicated genes. Duplications are binned by percent identity in 1% increments. Genes with at least one duplicated exon are listed; an exon was deemed duplicated if at least 50 bp of it was duplicated. A gene can be duplicated multiple times at different percent identities. (XLS 12 kb)
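
This legend states two rules. An exon counts as duplicated when at least 50 bp of it lies in a duplication. Genes are then binned by the duplication's percent identity in 1% steps. The Python sketch below illustrates both rules. Half-open coordinates, flooring the identity to choose the bin, and the input layout are assumptions, and `duplicated_gene_bins` is a hypothetical name.

```python
import math
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Overlap in bp between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def duplicated_gene_bins(genes, duplications, min_exon_overlap=50):
    """Map each 1% identity bin (e.g. 97 for 97-98%) to genes with a duplicated exon.

    genes: {gene_name: [(exon_start, exon_end), ...]}
    duplications: [(start, end, percent_identity), ...]
    """
    bins = defaultdict(set)
    for name, exons in genes.items():
        for d_start, d_end, pid in duplications:
            # A gene qualifies if any one exon has >= 50 bp inside this duplication
            if any(overlap(s, e, d_start, d_end) >= min_exon_overlap for s, e in exons):
                bins[math.floor(pid)].add(name)
    return dict(bins)
```

The loop runs over every duplication, so a gene overlapped by duplications of different identities appears in several bins. This matches the legend's note that a gene can be duplicated multiple times.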

Supplementary Table 4

Segmental duplications in pericentromeric and subtelomeric regions. Segmental duplications within 5 Mb of the centromere and within 2 Mb of a telomere of chromosome 16 are counted as pericentromeric and subtelomeric, respectively. (XLS 14 kb)
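
A minimal Python sketch of this windowing rule follows. The caller must supply the centromere coordinates, because none are given here. A duplication is treated as "within" a window if it overlaps that window at all, and intervals are half-open. These are assumptions, and `classify_duplication` is a hypothetical name.

```python
MB = 1_000_000


def classify_duplication(start, end, chrom_length, cen_start, cen_end,
                         pericentromeric_window=5 * MB, subtelomeric_window=2 * MB):
    """Label a duplication [start, end) as pericentromeric, subtelomeric or other.

    cen_start/cen_end: centromere boundaries (caller-supplied, hypothetical here).
    A duplication counts if it overlaps a window at all.
    """
    # Window extending 5 Mb on either side of the centromere
    if start < cen_end + pericentromeric_window and end > cen_start - pericentromeric_window:
        return "pericentromeric"
    # Windows covering the first and last 2 Mb of the chromosome
    if start < subtelomeric_window or end > chrom_length - subtelomeric_window:
        return "subtelomeric"
    return "other"
```

The order of the two checks matters only when the windows overlap. That would require the centromere to lie within 7 Mb of a chromosome end.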

Supplementary Table 5

Sequencing contributions by center and phase of completion. (DOC 20 kb)

Supplementary Figure 1

Distribution of segmental duplications. This schematic of chromosome 16 segmental duplications depicts the location of interchromosomal (red) and intrachromosomal (blue) duplicated sequence. Each horizontal line represents 5 Mb of sequence, with tick marks every 500 kb. Sequence gaps are represented as discontinuities within the horizontal line. The centromere is shown as a purple bar. Duplications detected by whole-genome shotgun sequence detection are represented as green bars above the chromosome sequence. (PDF 44 kb)

Supplementary Figure 2

Sequence similarity and aligned bases of segmental duplications. For all pairwise alignments, the total number of aligned bases was calculated and binned based on percent sequence identity. Sequence identity distributions for interchromosomal (red) and intrachromosomal (blue) duplicated bases are shown. (PDF 560 kb)

Supplementary Figure 3

Sequence identity of segmental duplications on chromosome 16. Interchromosomal (red) and intrachromosomal (blue) duplications are shown to scale along the horizontal line in 2 Mb increments. Green bars above the horizontal line correspond to duplications detected by an independent method, whole-genome shotgun sequence detection⁹. The underlying pairwise alignments of segmental duplications (>90% identity, >1 kb) are plotted by percent identity below the horizontal line. Different colors indicate the human chromosome on which each pairwise alignment lies (for example, chromosome 16 is shown in magenta and chromosome 18 in light blue). (PDF 89 kb)

About this article

Cite this article

Martin, J., Han, C., Gordon, L. et al. The sequence and analysis of duplication-rich human chromosome 16. Nature 432, 988–994 (2004). https://doi.org/10.1038/nature03187

Received: 09 September 2004
Accepted: 15 November 2004
Issue Date: 23 December 2004
DOI: https://doi.org/10.1038/nature03187