DNA sequence and analysis of human chromosome 18

Nusbaum, Chad; Zody, Michael C.; Borowsky, Mark L.; Kamal, Michael; Kodira, Chinnappa D.; Taylor, Todd D.; Whittaker, Charles A.; Chang, Jean L.; Cuomo, Christina A.; Dewar, Ken; FitzGerald, Michael G.; Yang, Xiaoping; Abouelleil, Amr; Allen, Nicole R.; Anderson, Scott; Bloom, Toby; Bugalter, Boris; Butler, Jonathan; Cook, April; DeCaprio, David; Engels, Reinhard; Garber, Manuel; Gnirke, Andreas; Hafez, Nabil; Hall, Jennifer L.; Norman, Catherine Hosage; Itoh, Takehiko; Jaffe, David B.; Kuroki, Yoko; Lehoczky, Jessica; Lui, Annie; Macdonald, Pendexter; Mauceli, Evan; Mikkelsen, Tarjei S.; Naylor, Jerome W.; Nicol, Robert; Nguyen, Cindy; Noguchi, Hideki; O'Leary, Sinéad B.; Piqani, Bruno; Smith, Cherylyn L.; Talamas, Jessica A.; Topham, Kerri; Totoki, Yasushi; Toyoda, Atsushi; Wain, Hester M.; Young, Sarah K.; Zeng, Qiandong; Zimmer, Andrew R.; Fujiyama, Asao; Hattori, Masahira; Birren, Bruce W.; Sakaki, Yoshiyuki; Lander, Eric S.

doi:10.1038/nature03983

Letter
Published: 22 September 2005

Nature volume 437, pages 551–555 (2005)

A Corrigendum to this article was published on 01 December 2005

Abstract

Chromosome 18 appears to have the lowest gene density of any human chromosome and is one of only three chromosomes for which trisomic individuals survive to term¹. There are also a number of genetic disorders stemming from chromosome 18 trisomy and aneuploidy. Here we report the finished sequence and gene annotation of human chromosome 18, which will allow a better understanding of the normal and disease biology of this chromosome. Despite the low density of protein-coding genes on chromosome 18, we find that the proportion of non-protein-coding sequences evolutionarily conserved among mammals is close to the genome-wide average. Extending this analysis to the entire human genome, we find that the density of conserved non-protein-coding sequences is largely uncorrelated with gene density. This has important implications for the nature and roles of non-protein-coding sequence elements.

Main

The International Human Genome Sequencing Consortium (IHGSC) recently completed a sequence of the human genome and published a report on the finishing of the human genome^2,3. Now, papers containing detailed reports about each human chromosome are bringing to light aspects of the biomedical and evolutionary implications of this work. Here we describe the completion of a physical map, high-quality finished sequence, and gene catalogue for human chromosome 18, which represents approximately 2.7% of the human genome.

The extremely low density of protein-coding genes on chromosome 18 (Table 1) offers an opportunity to study the conservation of non-protein-coding sequences. It was recently observed that, in addition to protein-coding sequences, ∼3% of the human genome shows a degree of evolutionary conservation among mammals that is significantly higher than background⁴. It is unclear whether this sequence consists mostly of regulatory elements related to genes or whether it represents other elements not tightly coupled to genes. These alternatives can be explored by comparing gene-rich and gene-poor chromosomes to see whether the proportion of conserved non-protein-coding sequence tends to scale with gene density or is unrelated to gene density.

Table 1 Chromosome 18 gene content

The finished sequence of chromosome 18 contains 76,117,153 bases and is interrupted by three euchromatic gaps, one gap at the 18q telomere and one gap containing the centromeric heterochromatin (Fig. 1 and Supplementary Table S2). These gaps are refractory to current cloning and mapping technology. The sizes of the euchromatic gaps were estimated by alignment to the regions of conserved synteny in the mouse genome⁴ (see Methods). The size of the telomeric gap was estimated using the size of the telomeric half-YAC (yeast artificial chromosome). The total size of these gaps is estimated at 118 kb. This corresponds to <0.2% of the euchromatic length of the chromosome, substantially lower than the average across the human genome (cited in ref. 3, also refs 5–7). Of the finished sequence, 79% was generated by the Broad Institute of MIT and Harvard (formerly the Whitehead Institute/MIT Center for Genome Research or WICGR), 20% by the RIKEN Genomic Sciences Center, and the remaining 1% by three other research groups (Supplementary Tables S3, S4). Details of construction of the clone map and sequencing are described in the Supplementary Information.

Figure 1: **Overview of human chromosome 18.**

Several analyses verify that nearly the entire euchromatic region of chromosome 18 is present and accurately represented in the finished sequence. Of the 332 gene sequences in the well-curated RefSeq⁸ data set that have been mapped to chromosome 18, all are present and complete in the finished sequence. In addition, the finished sequence shows excellent alignment to genetic and radiation hybrid maps (Supplementary Fig. S1). The genetic map⁹ shows perfect alignment, with no discrepancies among 156 sequence-based genetic markers (Supplementary Table S5). The radiation hybrid map¹⁰ shows good agreement, but contains local discrepancies as would be expected from its lower resolution (Supplementary Table S6).

We assessed the local accuracy of the clone path by aligning paired-end sequences from a human Fosmid library (designated WIBR2, representing 10× physical coverage) to the finished sequence³. By identifying discrepancies in the distances between Fosmid ends in the finished sequence and those expected on the basis of insert size constraints, one can detect errors in the clone path³. Our analysis revealed a single aberrant region, which was found to result from a bacterial artificial chromosome (BAC) clone containing a 21-kb deletion that was either present in the source genome or occurred in the cloning of the BAC; this clone was replaced with a non-deleted BAC from a different library. Finally, an independent quality assessment exercise commissioned by NHGRI estimated the accuracy of the finished sequence at less than one error per 100,000 bases¹¹ (J. Schmutz, personal communication).
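
As an illustration only, the end-pair check described above can be sketched as a filter on the spacing of paired ends along the finished sequence. The class name, the function name and the 32–48 kb acceptance range are assumptions made for this example (Fosmid inserts are roughly 40 kb); read orientation is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class EndPair:
    name: str
    left: int   # position of one end read on the finished sequence (bp)
    right: int  # position of the mate end read on the finished sequence (bp)


def discordant_pairs(pairs, min_insert=32_000, max_insert=48_000):
    """Return (name, span) for pairs whose spacing is outside the assumed range.

    min_insert / max_insert are hypothetical bounds, not values from the text.
    """
    flagged = []
    for pair in pairs:
        span = abs(pair.right - pair.left)
        if span < min_insert or span > max_insert:
            flagged.append((pair.name, span))
    return flagged


pairs = [
    EndPair("f1", 1_000_000, 1_040_000),  # 40 kb apart: consistent
    EndPair("f2", 1_010_000, 1_029_000),  # 19 kb apart: too close
    EndPair("f3", 1_020_000, 1_058_000),  # 38 kb apart: consistent
]
print(discordant_pairs(pairs))  # [('f2', 19000)]
```

A pair that lands too close together, as `f2` does here, is what a deletion in the clone path would look like: sequence present in the Fosmid is missing from the assembly between its two ends.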

We produced a manually curated catalogue of genes (see Methods), annotating 337 gene loci and 171 pseudogene loci on chromosome 18. These include all previously known genes on chromosome 18 (Table 1). According to the Hawk2 categorization scheme (http://www.sanger.ac.uk/Info/workshops/hawk2, see Supplementary Information) there are 243 ‘known’ genes, 49 ‘novel CDS’ (coding sequence of a gene), 10 ‘novel transcripts’, 11 ‘putative genes’, 11 ‘predictedplus genes’ and 13 ‘gene fragments’. All ‘novel transcript’ genes had expressed-sequence-tag (EST) evidence. For ‘putative genes’, only a subset of the exons were supported by one or more spliced ESTs. Only a small fraction of all loci, those in the ‘novel’ and ‘putative’ categories, were annotated as genes on the basis of spliced EST evidence only. Some ‘gene fragment’ loci may prove to be pseudogenes.

Using aligned EST evidence, it was possible to extend many of the previously known gene models at their 5′ or 3′ ends (see Supplementary Fig. S2 for an example). Approximately 57% of the RefSeq and mammalian gene collection (MGC) transcripts could be extended. The 5′ end extensions averaged 321 bp, and 3′ end extensions averaged 1,131 bp. In addition, a novel 5′ exon was found for 14% of the RefSeq or MGC transcripts, and a novel 3′ exon was found for 2.2%. The ability to extend the gene models probably reflects expanded databases of transcripts and ESTs. A sampling of the extended gene models was validated in the laboratory (see Supplementary Information).

We found an average of 10.7 exons per full-length known transcript, comparable to recent published reports of human chromosomes. Internal exon lengths average 155 bp, and the average transcript length is 3.1 kb for full-length transcripts of known genes. There is evidence of extensive alternative splicing, with gene loci having an average of 3.1 distinct transcripts and 71% having at least two transcripts. This rate of alternative splicing is comparable to recent reports^5,6.

The longest gene on chromosome 18 is DCC (deleted in colorectal carcinoma), spanning 1,190,632 bp. DCC also contains the longest intron at 411,177 bp. The longest mature transcript is laminin α3 (LAMA3) at 10,585 bp. The longest single exon is found in TCF4, being a 3′ exon of 5,700 bp. The gene with the most identified splice forms is TGIF (TGFβ-induced factor), which appears to have ten splice forms, of which two are represented by RefSeq transcripts. Of the 171 pseudogenes on chromosome 18, approximately two-thirds are processed (intronless) pseudogenes arising from retroposition, and the remaining one-third are unprocessed. In addition, we identified four transfer RNA genes on the chromosome, listed in Supplementary Table S7. An analysis of gene families revealed that several families have multiple members present on chromosome 18. These include members of the laminin and cadherin families of cell adhesion molecules, and a cluster of ten serpin serine protease inhibitors (see Supplementary Information). Careful analysis of gene models found 59 pairs of overlapping genes on chromosome 18, suggesting that overlapping genes may be 2–4 times more common than previously thought^12,13 (see Supplementary Information).

With an average of 4.4 genes per megabase (Mb), chromosome 18 has the lowest gene density of published human chromosomes (Supplementary Table S1). This gene density cannot be explained by chance fluctuation around a genome-wide mean (P < 10⁻¹², see Supplementary Information). The low gene density is reflected both in the low percentage of transcribed sequence (28.5%) and the small fraction of the chromosome included in exons (1.14% in all exons, 1.06% in coding exons). The G + C content (39.8%) is also low, consistent with the known positive correlation between G + C content and gene number¹⁴.

Chromosome 18 contains 24 gene deserts (defined as a 500-kb region without a coding gene, Supplementary Table S8), which together comprise 28 Mb or ∼38% of the total chromosome length. The sparsest region of the chromosome harbours only three genes over 4.5 Mb. Chromosome 18 also has the longest median length of introns among all chromosomes, reflecting a genome-wide inverse correlation between intron size and gene density (Supplementary Fig. S3).
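
The text defines a desert as a 500-kb region without a coding gene. One way to read that definition, sketched here, is to report every gene-free interval of at least 500 kb. The function name, the half-open coordinate convention and the decision to let chromosome ends bound a desert are assumptions made for this example.

```python
def gene_deserts(genes, chrom_length, min_length=500_000):
    """Return gene-free intervals of at least min_length bases.

    genes: iterable of (start, end) coding-gene spans, 0-based half-open.
    Overlapping genes are handled by tracking the furthest gene end seen.
    """
    deserts = []
    cursor = 0  # end of the right-most gene seen so far
    for start, end in sorted(genes):
        if start - cursor >= min_length:
            deserts.append((cursor, start))
        cursor = max(cursor, end)
    if chrom_length - cursor >= min_length:
        deserts.append((cursor, chrom_length))
    return deserts


# Toy coordinates, not real annotation.
genes = [
    (100_000, 150_000),
    (700_000, 900_000),
    (800_000, 1_000_000),   # overlaps the previous gene
    (1_200_000, 1_250_000),
]
deserts = gene_deserts(genes, chrom_length=2_000_000)
print(deserts)                           # [(150000, 700000), (1250000, 2000000)]
print(sum(e - s for s, e in deserts))    # 1300000
```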

Despite being gene-poor, chromosome 18 is not enriched in repeat sequences. Transposable element fossils cover 43.5% of the chromosome, which is typical across the genome. Chromosome 18 also has a relatively low proportion of segmental duplication (segmental duplications are defined as having greater than 90% identity and being longer than 1 kb). Segmental duplications constitute ∼2.5% (1.92 Mb) of the chromosome, with a greater representation of interchromosomal duplications (2.13%) than intrachromosomal duplications (0.55%). Some sequences are represented in both types of duplication (E. Eichler and X. She, personal communication).

The paucity of genes on chromosome 18 probably explains why it is one of only three autosomes (the others being chromosomes 13 and 21) for which trisomic individuals routinely survive to term¹ (www.trisomy.org, www.ndss.org). Although chromosomes 18 and 21 have roughly the same number of RefSeq genes (332 and 374 genes, respectively), chromosome 18 trisomy (Edwards syndrome) has much more severe health effects than chromosome 21 trisomy (Down syndrome). Edwards syndrome occurs in 1 in 5,000 live births, and ∼90% of affected individuals die before one year of age. In contrast, Down syndrome is more common (1 in 800 live births), and affected individuals are frequently able to cope with the numerous health consequences and survive to adulthood. The availability of gene catalogues for these two chromosomes will facilitate work to elucidate how the contributions of specific genes lead to such different clinical outcomes.

Four other syndromes are caused by gross abnormalities in chromosome 18, including three partial monosomies caused by deletion of part of the p or q arms (18p-, 18q- and ring18) and tetrasomy of the p arm (www.chromosome18.org). The gene catalogue presented here should facilitate identification of the critical genes associated with each syndrome.

At least 45 loci on chromosome 18 have been implicated in genetic disorders¹⁵ (Supplementary Table S9). The list includes at least four disorders for which the responsible gene and molecular mechanism of disease have been characterized (Supplementary Table S9). For two such diseases (methemoglobinaemia and erythropoietic protoporphyria), we found evidence for novel alternative splice forms that would result in coding sequence alterations (not shown).

Comparative gene analysis revealed one locus that may represent a newly evolved gene in the primate lineage, although its function is unknown. Among the annotated multi-exon genes contained in blocks of conserved synteny among mammals, only one lacks exonic conservation with rodents and dog: C18orf2, a predicted RefSeq gene. Within this block of conserved synteny there is a primate-specific ∼100-kb inversion (present in both human and chimpanzee). One of the endpoints of this inversion lies in the middle of the coding region of the gene, with the result that the region is not contiguous in either dog or rodent genomes. Partial sequencing of this gene in apes suggests that it is conserved at least as far back as orangutan (see Supplementary Information).

We compared chromosome 18 to its homologue chimpanzee chromosome 18 (ref. 16). The average sequence divergence is 1.25%, which is close to the genome-wide average. On a larger scale, the karyotype of human chromosome 18 differs from its homologues in the great apes by a human-specific pericentric inversion with an associated human-specific inverted duplication of 19 kb (refs 17, 18). As a consequence, human 18p corresponds to the proximal region of chimpanzee 18q. As large-scale chromosomal rearrangements can facilitate speciation^19,20, it is possible that this inversion has had a role in hominid evolution.

Finally, we sought to explore the still-mysterious nature of conserved non-protein-coding sequences. Recent comparison of the human and mouse genomes⁴ led to the surprising discovery that ∼5% of the human genome shows evolutionary conservation higher than the background rate (defined as the rate seen in ancestral repeat elements, which are presumed to be non-functional). Similar results have been seen in comparisons between the human and rat genomes²¹. As only 1–2% of the human genome encodes protein-coding exons, this indicates that the majority of human sequence under purifying selection is non-protein-coding. In principle, these non-protein-coding sequences could be (1) associated with protein-coding genes, such as those that directly or indirectly regulate the expression of protein-coding genes, or (2) independent of protein-coding genes, such as those that play a structural role in chromosome architecture or those that encode RNA genes.

We calculated the overall proportion of bases on each chromosome that are under purifying selection, and allocated this proportion as either protein-coding or non-protein-coding (see Methods). The computational analysis closely followed that used in recent mammalian comparisons^4,22 (see Methods). We compared the proportion of total sequence under selection (Fig. 2a) and non-protein-coding sequence under selection (Fig. 2b) to the proportion of coding sequence for each human chromosome. Chromosome 18 contains a low overall proportion of sequence under selection, but this is almost entirely explained by its low coding density, as there is no deficit in non-protein-coding sequence under selection. Approximately 4.2% of the bases on chromosome 18 appear to be under purifying selection, consisting of 0.6% in exons of protein-coding genes and 3.6% in non-protein-coding elements. The proportion of non-protein-coding sequence under selection is typical for human chromosomes. (Note that chromosomes 19 and 22 are outliers in this analysis; the many local gene family expansions make it difficult to assign orthology.)

Figure 2: **Scatter plots showing the fraction of syntenic region under selection plotted against the fraction of coding sequence in that region.**

As chromosomes vary widely in size, we repeated the analysis for 5-Mb windows across the human genome (Fig. 2c, d). Although there is more scatter in the data, the overall conclusion is very similar. Notably, the average proportion of non-protein-coding selected sequence in a window is ∼3.8%, and is slightly negatively correlated (R² = 0.08) with the proportion of coding sequence in the window.
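
For readers who want to reproduce this kind of summary, the R² of a simple least-squares fit of one proportion on the other can be computed as sketched below. The function name and the toy values are assumptions for illustration and are not data from this study.

```python
def regression_r2(x, y):
    """Return (slope, R^2) of the least-squares line y = a + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For a one-variable fit, R^2 equals the squared Pearson correlation.
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Toy values: x = coding fraction per window, y = selected non-coding fraction.
slope, r2 = regression_r2([0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 1.0, 0.0])
print(slope, r2)
```

A negative slope with a small R², as in this toy input, is the pattern the text describes: a weak inverse relationship that explains little of the variance.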

Our analysis shows that the density of conserved non-protein-coding sequences is largely independent of the density of protein-coding genes. It is interesting to note that examination of non-coding aligned sequences between human and chicken²³ showed a negative correlation with coding content, and a study of highly conserved non-coding sequences in intergenic regions of human chromosome 21 did not identify tight coupling to the starts and ends of genes^24,25.

What is the nature of the non-protein-coding elements? First, the elements might encode transcripts that are not translated into proteins, such as small RNA genes or large regulatory RNAs²⁶. Second, they might serve a structural role, with a constant density of such elements required to maintain chromosome structure independent of gene density. Such structural elements could be evolutionarily essential for maintenance of a region, but might be dispensable if the entire region were to be deleted; this might explain the recent observation in mouse that a 1-Mb deletion in a gene desert containing highly conserved elements has no discernable phenotypic effect²⁷. Third, the elements may be largely related to the regulation of protein-coding genes, but their distribution may be inversely correlated with gene density^28,29. It is possible that genes in gene-poor regions tend to have more elaborate regulatory controls, and this could partially explain the relative sparsity of genes in such regions. In any case, it is clear that the finished sequence of the human genome will reveal many features of biological function and provide a firm foundation for future systematic analyses.

Methods

Generation of the gene catalogue

We started by aligning all available human RefSeq, MGC and GenBank messenger RNA sequences, as well as GenPept sequences from several species, to the finished sequence. Gene models were inspected manually to ensure accurate transcriptional start and stop sites, and to correct splice sites. Non-canonical splice sites were used only if supported by sufficient complementary DNA-based evidence. Partial transcripts (those containing a partial open reading frame (ORF) or overlapping non-coding exons of sibling transcripts) were annotated in cases for which there was firm evidence of their existence. Gene symbols for biologically characterized loci were assigned by the HUGO Gene Nomenclature Committee. See Supplementary Table S10 for a complete list of gene symbols. Our annotations are available from the Vertebrate Genome Annotation database (VEGA, http://vega.sanger.ac.uk/Homo_sapiens).

Comparative analysis: creation of synteny maps

We performed full genomic alignments of repeat masked sequence from mouse⁴ (builds 31 and 33), rat²¹ and dog (CanFam 1.0; K. Lindblad-Toh, personal communication) with the human genome sequence using the PatternHunter program³⁰. We did this for human build 34 with the Broad finished chromosomes (8, 15, 17, 18) inserted, and also for human build 35 (mouse build 31 was used against human build 34, and mouse build 33 against human build 35). From these alignments we identified collinear clusters of conserved microsynteny, which were then used to form larger syntenic segments in a hierarchical fashion. Syntenic maps and their underlying syntenic anchors serve as the basis for identification of conserved elements.

Comparative analysis: identification of conserved elements

Starting with large-scale syntenic blocks defined by the human–mouse and human–dog syntenic maps, we generated pair-wise alignments within these syntenic blocks using the PatternHunter program³⁰. We then scanned 50-bp windows with 5-bp offset and calculated the fraction of aligning bases that were matches (discarding windows with fewer than 20 aligning bases). These percentage conservation values were locally normalized to the average conservation in the surrounding 5 Mb to generate Z-scores measuring divergence from the local average (0) for every window. We examined the joint empirical distribution of mouse and dog Z-scores for windows contained within ancestral repeat sequence (undergoing neutral evolution and believed to predate the mouse–human split) and windows overlapping coding exons (Supplementary Fig. S4a). Coding sequence is defined as all bases that are annotated as coding in any transcript. All analysis presented uses Ensembl³¹ genes on human build 35; analysis with both Ensembl and Broad annotations on build 34 yields substantially similar results (Supplementary Information).
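
A minimal sketch of the window scan follows, assuming the pairwise alignment is supplied as two equal-length gapped strings. Windows here step along alignment columns rather than human coordinates, and the Z-score uses the mean and standard deviation of whatever windows are passed in, as a stand-in for the surrounding 5 Mb. The function names and this particular normalization are assumptions; the text does not give the exact formula.

```python
import statistics


def window_conservation(human, other, size=50, step=5, min_aligned=20):
    """Return (start, fraction_identical) for each retained window.

    A column counts as aligning when neither row has a gap ('-').
    Windows with fewer than min_aligned aligning columns are discarded.
    """
    if len(human) != len(other):
        raise ValueError("alignment rows must have equal length")
    windows = []
    for start in range(0, len(human) - size + 1, step):
        aligned = matches = 0
        for a, b in zip(human[start:start + size], other[start:start + size]):
            if a == "-" or b == "-":
                continue
            aligned += 1
            if a.upper() == b.upper():
                matches += 1
        if aligned >= min_aligned:
            windows.append((start, matches / aligned))
    return windows


def z_scores(windows):
    """Normalize window conservation to the average of the supplied windows."""
    values = [fraction for _, fraction in windows]
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [(start, 0.0) for start, _ in windows]
    return [(start, (fraction - mean) / sd) for start, fraction in windows]


human_row = "A" * 100
other_row = "A" * 50 + "C" * 50   # identical first half, divergent second half
scored = z_scores(window_conservation(human_row, other_row))
print(scored[0], scored[-1])
```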

We combined dog and mouse Z-scores to generate a ‘composite’ Z-score (see Supplementary Information). We estimated the distribution of composite Z-scores for selected sequence by decomposing the global distribution of Z-scores into two components: a ‘neutral distribution’ centred at zero and corresponding to the conservation scores for ancestral repeat sequences, and a ‘selected distribution’ consisting of the residual after subtraction of the neutral distribution (Supplementary Fig. S4b). Taking into account the relative fractions of the aligning windows in each distribution, we were able to assign a probability that a window at a given score is under purifying selection.
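
The decomposition can be illustrated on binned scores. In this sketch the neutral histogram (from ancestral repeats) is scaled by an assumed neutral share of all windows and subtracted from the global histogram; the residual divided by the global density gives a per-bin probability of selection. The function name, the binning and the `neutral_fraction` parameter are assumptions, since the text does not state how the relative fractions were estimated.

```python
def selection_probability(global_counts, neutral_counts, neutral_fraction):
    """Per-bin probability that a window is under purifying selection.

    global_counts:    windows per composite-Z bin, all aligning windows
    neutral_counts:   windows per bin within ancestral repeats (same bins)
    neutral_fraction: assumed share of all windows evolving neutrally
    """
    if len(global_counts) != len(neutral_counts):
        raise ValueError("histograms must use the same bins")
    global_total = sum(global_counts)
    neutral_total = sum(neutral_counts)
    probabilities = []
    for g, n in zip(global_counts, neutral_counts):
        if g == 0:
            probabilities.append(0.0)
            continue
        global_density = g / global_total
        neutral_density = neutral_fraction * n / neutral_total
        # Residual after subtracting the neutral component, floored at zero.
        residual = max(0.0, global_density - neutral_density)
        probabilities.append(residual / global_density)
    return probabilities


# Toy histograms over five bins ordered from low to high conservation.
probs = selection_probability(
    global_counts=[100, 400, 300, 150, 50],
    neutral_counts=[20, 50, 25, 5, 0],
    neutral_fraction=0.8,
)
print([round(p, 3) for p in probs])
```

Bins dominated by the neutral component receive a probability near zero, while bins that ancestral repeats never reach receive a probability of one.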

We then divided the genome into non-overlapping 5-Mb windows. Within each such window, we counted the number of syntenic bases, the number of syntenic 50-bp windows, and the number of 50-bp windows under selection. The fraction of coding sequence (the explanatory variable in all regressions) was taken as the number of syntenic bases annotated as coding divided by the number of syntenic bases. The fraction under selection was calculated as the sum of all selection probabilities for all windows divided by the number of syntenic windows. If windows of only a certain class were considered, the probabilities were calculated only for windows in that class. We note that, on average, windows contained within coding exons scored only slightly higher than 0.67 probability of selection, owing to the large prior probability of neutrality. Thus, the slopes of all regressions are <1. For all analyses, we discarded any 5-Mb window with less than 4 Mb of syntenically assigned sequence (retaining >85% of all windows of non-zero euchromatic length). Similar results are obtained if the discarded windows are included, but the variance is higher.
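
The per-window bookkeeping described above can be sketched as follows. The function names and the use of `None` to mark a discarded window are assumptions for the example; the 4-Mb threshold and the two ratios come from the text.

```python
def coding_fraction(coding_syntenic_bases, syntenic_bases):
    """Syntenic bases annotated as coding divided by all syntenic bases."""
    return coding_syntenic_bases / syntenic_bases


def fraction_under_selection(window_probs, syntenic_bases,
                             min_syntenic=4_000_000):
    """Sum of selection probabilities divided by the number of syntenic
    50-bp windows, for one 5-Mb genomic window.

    Returns None when the 5-Mb window has too little syntenic sequence.
    """
    if syntenic_bases < min_syntenic or not window_probs:
        return None
    return sum(window_probs) / len(window_probs)


# Toy inputs: four 50-bp windows standing in for the many in a real 5-Mb window.
toy_probs = [0.0, 0.5, 1.0, 0.1]
print(fraction_under_selection(toy_probs, syntenic_bases=4_500_000))
print(fraction_under_selection(toy_probs, syntenic_bases=3_000_000))  # None
print(coding_fraction(45_000, 4_500_000))
```

To restrict the calculation to one class of window (for example non-protein-coding only), pass only the probabilities of windows in that class.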

Annotation

RefSeq (release 1), mammalian gene collection (MGC, 3 February 2003), dbEST and GenBank (29 December 2002) mRNAs were aligned to the genomic assembly using BLAT³². GenPept protein sequences (3 February 2003) were aligned using BLASTX³³ and GeneWise³⁴. All gene models were created manually using these aligned sequences as evidence, following HAWK2 (www.sanger.ac.uk/Info/workshops/hawk2) transcript type conventions. Gene models derived from aligned mRNA evidence were extended when possible using spliced EST evidence at the 5′ end and spliced and unspliced EST evidence in the 3′ untranslated region (UTR). Evidence was given relative priority as follows (high–low): RefSeq/MGC, GeneWise, other mRNAs, spliced ESTs and unspliced ESTs. We found CpG islands within 2-kb upstream and 1-kb downstream of the 5′ end of 73% of known category loci, which is somewhat higher than previous reports (in the range of 61–66%; cited in ref. 3, also refs 5–7).

References

1. Hernandez, D. & Fisher, E. M. Mouse autosomal trisomy: two's company, three's a crowd. Trends Genet. 15, 241–247 (1999)
2. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001)
3. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004)
4. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002)
5. Grimwood, J. et al. The DNA sequence and biology of human chromosome 19. Nature 428, 529–535 (2004)
6. Deloukas, P. et al. The DNA sequence and comparative analysis of human chromosome 10. Nature 429, 375–381 (2004)
7. Martin, J. et al. The sequence and analysis of duplication-rich human chromosome 16. Nature 432, 988–994 (2004)
8. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, D501–D504 (2005)
9. Kong, A. et al. A high-resolution recombination map of the human genome. Nature Genet. 31, 241–247 (2002)
10. Schuler, G. D. et al. A gene map of the human genome. Science 274, 540–546 (1996)
11. Schmutz, J. et al. Quality assessment of the human genome sequence. Nature 429, 365–368 (2004)
12. Yelin, R. et al. Widespread occurrence of antisense transcription in the human genome. Nature Biotechnol. 21, 379–386 (2003)
13. Veeramachaneni, V., Makalowski, W., Galdzicki, M., Sood, R. & Makalowska, I. Mammalian overlapping genes: the comparative perspective. Genome Res. 14, 280–286 (2004)
14. Mouchiroud, D. et al. The distribution of genes in the human genome. Gene 100, 181–187 (1991)
15. Rebhan, M. et al. GeneCards: encyclopedia for genes, proteins and diseases. (Weizmann Institute of Science, Bioinformatics Unit and Genome Center, Rehovot, Israel) http://bioinformatics.weizmann.ac.il/cards (1997)
16. The Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005)
17. Dennehey, B. K., Gutches, D. G., McConkey, E. H. & Krauter, K. S. Inversion, duplication, and changes in gene context are associated with human chromosome 18 evolution. Genomics 83, 493–501 (2004)
18. Goidts, V., Szamalek, J. M., Hameister, H. & Kehrer-Sawatzki, H. Segmental duplication associated with the human-specific inversion of chromosome 18: a further example of the impact of segmental duplications on karyotype and genome evolution in primates. Hum. Genet. 115, 116–122 (2004)
19. King, M. Species Evolution 72–91 (Cambridge Univ. Press, Cambridge, 1993)
20. Delneri, D. et al. Engineering evolution to study speciation in yeasts. Nature 422, 68–72 (2003)
21. Gibbs, R. A. et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493–521 (2004)
22. Chiaromonte, F. et al. The share of human genomic DNA under selection estimated from human-mouse genomic alignments. Cold Spring Harb. Symp. Quant. Biol. 68, 245–254 (2003)
23. Hillier, L. W. et al. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716 (2004)
24. Dermitzakis, E. T. et al. Comparison of human chromosome 21 conserved nongenic sequences (CNGs) with the mouse and dog genomes shows that their selective constraint is independent of their genic environment. Genome Res. 14, 852–859 (2004)
25. Dermitzakis, E. T., Reymond, A. & Antonarakis, S. Conserved non-genic sequences—an unexpected feature of mammalian genomes. Nature Rev. Genet. 6, 151–157 (2005)
26. Rastegar, M. et al. Sequential histone modifications at Hoxd4 regulatory regions distinguish anterior from posterior embryonic compartments. Mol. Cell. Biol. 24, 8090–8103 (2004)
27. Nobrega, M. A., Zhu, Y., Plajzer-Frick, I., Afzal, V. & Rubin, E. M. Megabase deletions of gene deserts result in viable mice. Nature 431, 988–993 (2004)
28. Ovcharenko, I. et al. Evolution and functional classification of vertebrate gene deserts. Genome Res. 15, 137–145 (2005)
29. Nobrega, M. A., Ovcharenko, I., Afzal, V. & Rubin, E. M. Scanning human gene deserts for long-range enhancers. Science 302, 413 (2003)
30. Ma, B., Tromp, J. & Li, M. PatternHunter: faster and more sensitive homology search. Bioinformatics 18, 440–445 (2002)
31. Hubbard, T. et al. The Ensembl genome database project. Nucleic Acids Res. 30, 38–41 (2002)
32. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002)
33. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
34. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004)

Acknowledgements

Special thanks are due to L. Gaffney for help with the manuscript, figures and tables, and to K. Lance for help with the manuscript. We are grateful to E. Eichler and X. She for sharing data on segmental duplications, T. Furey for help with lists of genetic markers and placement of RefSeqs, and K. Lindblad-Toh for sharing data from the dog genome project. In addition, we thank Ming Li and Bin Ma (BioInformatics Solutions Inc.) for providing PatternHunter and advice about how to choose appropriate parameters. We also acknowledge the HUGO Gene Nomenclature Committee (S. Povey, E. A. Bruford, V. K. Khodiyar, R. C. Lovering, M. J. Lush, T. P. Sneddon, C. C. Talbot Jr and M. W. Wright) for assigning official gene symbols. We are grateful to all the members, present and past, of the Broad (and Whitehead) sequencing platform for the consistent high quality of their data.

Author information

Charles A. Whittaker
Present address: MIT Center for Cancer Research, 77 Mass Avenue E18-570, Cambridge, Massachusetts, 02139, USA
Ken Dewar
Present address: McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, H3A 1A4, Canada
Jessica Lehoczky
Present address: Department of Human Genetics, University of Michigan Medical School, 1500 East Medical Center Drive, Ann Arbor, Michigan, 48109, USA

Authors and Affiliations

Broad Institute of MIT and Harvard, 320 Charles Street, Cambridge, Massachusetts 02141, USA
Chad Nusbaum, Michael C. Zody, Mark L. Borowsky, Michael Kamal, Chinnappa D. Kodira, Charles A. Whittaker, Jean L. Chang, Christina A. Cuomo, Ken Dewar, Michael G. FitzGerald, Xiaoping Yang, Amr Abouelleil, Nicole R. Allen, Scott Anderson, Toby Bloom, Boris Bugalter, Jonathan Butler, April Cook, David DeCaprio, Reinhard Engels, Manuel Garber, Andreas Gnirke, Nabil Hafez, Jennifer L. Hall, Catherine Hosage Norman, David B. Jaffe, Jessica Lehoczky, Annie Lui, Pendexter Macdonald, Evan Mauceli, Tarjei S. Mikkelsen, Jerome W. Naylor, Robert Nicol, Cindy Nguyen, Sinéad B. O'Leary, Bruno Piqani, Cherylyn L Smith, Jessica A. Talamas, Kerri Topham, Sarah K. Young, Qiandong Zeng, Andrew R. Zimmer, Bruce W. Birren & Eric S. Lander
RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
Todd D. Taylor, Yoko Kuroki, Hideki Noguchi, Yasushi Totoki, Atsushi Toyoda, Asao Fujiyama, Masahira Hattori & Yoshiyuki Sakaki
Mitsubishi Research Institute Inc., 2-3-6 Otemachi, Chiyoda-ku, Tokyo 100-8141, Japan
Takehiko Itoh
University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, 277-0882, Japan
Hideki Noguchi
HUGO Gene Nomenclature Committee, The Galton Laboratory, Department of Biology, University College London, Wolfson House, 4 Stephenson Way, London NW1 2HE, UK
Hester M. Wain
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
Asao Fujiyama
Kitasato Institute for Life Sciences, Kitasato University 1-15-1, Kitasato, Sagamihara, Kanagawa, 228-8555, Japan
Masahira Hattori

Corresponding author

Correspondence to Chad Nusbaum.

Ethics declarations

Competing interests

Accession numbers for all clones contributing to the finished sequence of human chromosome 18 can be found in Supplementary Table S3. The updated human chromosome 18 sequence can be accessed through GenBank accession number NC_000018. Reprints and permissions information is available at npg.nature.com/reprintsandpermissions. The authors declare no competing financial interests.

Supplementary information

Supplementary Notes

This file contains the Supplementary Methods (including those for mapping, sequencing and annotation; validation of map, sequence and annotation; comparative analysis), Supplementary Data (gene families, overlapping genes, domains, paralogs and potential newly evolving genes) and the Supplementary Discussion of gene density and conservation of non-protein coding sequences. This file also contains additional references. (DOC 105 kb)

Supplementary Tables S1-S11

Supplementary Tables S1-S11 provide information on: chromosome properties; gaps, clones, genetic markers and radiation hybrid markers on the chromosome 18 path; contributing sequencing centers; gene deserts and disease genes on chromosome 18; randomness of gene distribution by chromosome; gene names and symbols; overlapping genes on chromosome 18. (DOC 2485 kb)

Supplementary Figures S1-S4

Supplementary Figure S1, the relationship of the finished sequence map to genetic and radiation hybrid maps of chromosome 18. Supplementary Figure S2, an example of gene model extension. Supplementary Figure S3, the relationship between gene density and median intron length. Supplementary Figure S4, distributions of Z-scores for conserved elements analysis. (PDF 318 kb)

Supplementary Figure Legends

Text to accompany the above Supplementary Figures. (DOC 24 kb)

About this article

Cite this article

Nusbaum, C., Zody, M., Borowsky, M. et al. DNA sequence and analysis of human chromosome 18. Nature 437, 551–555 (2005). https://doi.org/10.1038/nature03983

Received: 21 January 2005
Accepted: 27 June 2005
Issue Date: 22 September 2005
DOI: https://doi.org/10.1038/nature03983
