The DNA sequence and comparative analysis of human chromosome 10

Deloukas, P.; Earthrowl, M. E.; Grafham, D. V.; Rubenfield, M.; French, L.; Steward, C. A.; Sims, S. K.; Jones, M. C.; Searle, S.; Scott, C.; Howe, K.; Hunt, S. E.; Andrews, T. D.; Gilbert, J. G. R.; Swarbreck, D.; Ashurst, J. L.; Taylor, A.; Battles, J.; Bird, C. P.; Ainscough, R.; Almeida, J. P.; Ashwell, R. I. S.; Ambrose, K. D.; Babbage, A. K.; Bagguley, C. L.; Bailey, J.; Banerjee, R.; Bates, K.; Beasley, H.; Bray-Allen, S.; Brown, A. J.; Brown, J. Y.; Burford, D. C.; Burrill, W.; Burton, J.; Cahill, P.; Camire, D.; Carter, N. P.; Chapman, J. C.; Clark, S. Y.; Clarke, G.; Clee, C. M.; Clegg, S.; Corby, N.; Coulson, A.; Dhami, P.; Dutta, I.; Dunn, M.; Faulkner, L.; Frankish, A.; Frankland, J. A.; Garner, P.; Garnett, J.; Gribble, S.; Griffiths, C.; Grocock, R.; Gustafson, E.; Hammond, S.; Harley, J. L.; Hart, E.; Heath, P. D.; Ho, T. P.; Hopkins, B.; Horne, J.; Howden, P. J.; Huckle, E.; Hynds, C.; Johnson, C.; Johnson, D.; Kana, A.; Kay, M.; Kimberley, A. M.; Kershaw, J. K.; Kokkinaki, M.; Laird, G. K.; Lawlor, S.; Lee, H. M.; Leongamornlert, D. A.; Laird, G.; Lloyd, C.; Lloyd, D. M.; Loveland, J.; Lovell, J.; McLaren, S.; McLay, K. E.; McMurray, A.; Mashreghi-Mohammadi, M.; Matthews, L.; Milne, S.; Nickerson, T.; Nguyen, M.; Overton-Larty, E.; Palmer, S. A.; Pearce, A. V.; Peck, A. I.; Pelan, S.; Phillimore, B.; Porter, K.; Rice, C. M.; Rogosin, A.; Ross, M. T.; Sarafidou, T.; Sehra, H. K.; Shownkeen, R.; Skuce, C. D.; Smith, M.; Standring, L.; Sycamore, N.; Tester, J.; Thorpe, A.; Torcasso, W.; Tracey, A.; Tromans, A.; Tsolas, J.; Wall, M.; Walsh, J.; Wang, H.; Weinstock, K.; West, A. P.; Willey, D. L.; Whitehead, S. L.; Wilming, L.; Wray, P. W.; Young, L.; Chen, Y.; Lovering, R. C.; Moschonas, N. K.; Siebert, R.; Fechtel, K.; Bentley, D.; Durbin, R.; Hubbard, T.; Doucette-Stamm, L.; Beck, S.; Smith, D. R.; Rogers, J.

doi:10.1038/nature02462

Article
Published: 27 May 2004

The DNA sequence and comparative analysis of human chromosome 10

P. Deloukas¹,
M. E. Earthrowl¹,
D. V. Grafham¹,
M. Rubenfield^2,3,
L. French¹,
C. A. Steward¹,
S. K. Sims¹,
M. C. Jones¹,
S. Searle¹,
C. Scott¹,
K. Howe¹,
S. E. Hunt¹,
T. D. Andrews¹,
J. G. R. Gilbert¹,
D. Swarbreck¹,
J. L. Ashurst¹,
A. Taylor¹,
J. Battles²,
C. P. Bird¹,
R. Ainscough¹,
J. P. Almeida¹,
R. I. S. Ashwell¹,
K. D. Ambrose¹,
A. K. Babbage¹,
C. L. Bagguley¹,
J. Bailey¹,
R. Banerjee¹,
K. Bates¹,
H. Beasley¹,
S. Bray-Allen¹,
A. J. Brown¹,
J. Y. Brown¹,
D. C. Burford¹,
W. Burrill¹,
J. Burton¹,
P. Cahill²,
D. Camire²,
N. P. Carter¹,
J. C. Chapman¹,
S. Y. Clark¹,
G. Clarke¹,
C. M. Clee¹,
S. Clegg¹,
N. Corby¹,
A. Coulson¹,
P. Dhami¹,
I. Dutta¹,
M. Dunn¹,
L. Faulkner¹,
A. Frankish¹,
J. A. Frankland¹,
P. Garner¹,
J. Garnett¹,
S. Gribble¹,
C. Griffiths¹,
R. Grocock¹,
E. Gustafson^2,3,
S. Hammond¹,
J. L. Harley¹,
E. Hart¹,
P. D. Heath¹,
T. P. Ho²,
B. Hopkins¹,
J. Horne²,
P. J. Howden¹,
E. Huckle¹,
C. Hynds²,
C. Johnson¹,
D. Johnson¹,
A. Kana²,
M. Kay¹,
A. M. Kimberley¹,
J. K. Kershaw¹,
M. Kokkinaki⁴,
G. K. Laird¹,
S. Lawlor¹,
H. M. Lee²,
D. A. Leongamornlert¹,
G. Laird¹,
C. Lloyd¹,
D. M. Lloyd¹,
J. Loveland¹,
J. Lovell¹,
S. McLaren¹,
K. E. McLay¹,
A. McMurray¹,
M. Mashreghi-Mohammadi¹,
L. Matthews¹,
S. Milne¹,
T. Nickerson¹,
M. Nguyen²,
E. Overton-Larty¹,
S. A. Palmer¹,
A. V. Pearce¹,
A. I. Peck¹,
S. Pelan¹,
B. Phillimore¹,
K. Porter¹,
C. M. Rice¹,
A. Rogosin^2,3,
M. T. Ross¹,
T. Sarafidou⁴,
H. K. Sehra¹,
R. Shownkeen¹,
C. D. Skuce¹,
M. Smith¹,
L. Standring²,
N. Sycamore¹,
J. Tester¹,
A. Thorpe¹,
W. Torcasso²,
A. Tracey¹,
A. Tromans¹,
J. Tsolas^2,3,
M. Wall¹,
J. Walsh²,
H. Wang²,
K. Weinstock²,
A. P. West¹,
D. L. Willey¹,
S. L. Whitehead¹,
L. Wilming¹,
P. W. Wray¹,
L. Young¹,
Y. Chen⁵,
R. C. Lovering⁶,
N. K. Moschonas⁴,
R. Siebert⁷,
K. Fechtel²,
D. Bentley¹,
R. Durbin¹,
T. Hubbard¹,
L. Doucette-Stamm^2,3,
S. Beck¹,
D. R. Smith^2,3 &
…
J. Rogers¹

Nature volume 429, pages 375–381 (2004)Cite this article

11k Accesses
62 Citations
18 Altmetric
Metrics details

Abstract

The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence.

You have full access to this article via your institution.

Download PDF

CoCas9 is a compact nuclease from the human microbiome for efficient and precise genome editing

Article Open access 24 April 2024

Single-cell analysis reveals context-dependent, cell-level selection of mtDNA

Article Open access 24 April 2024

Analysis and benchmarking of small and large genomic variants across tandem repeats

Article 26 April 2024

Main

We report here the final clone map and sequence assembly of human chromosome 10. This metacentric chromosome accounts for 4.5% of the genome and is best known for harbouring the PTEN tumour suppressor gene and the RET proto-oncogene.

With the human genome sequence in hand, the task ahead is to identify the different units of genetic information embedded in the sequence and understand their function both at the molecular and cellular level. In this study we address the former by reporting a comprehensive annotation of manually inspected gene structures and their correlation to sequence variation and other features of the genomic sequence. The annotation process is assessed by comparative analysis to the genome sequence of two rodents, Mus musculus and Rattus norvegicus, and three fishes, Tetraodon nigroviridis, Fugu rubripes and Danio rerio. Finally, we report our preliminary findings on the distribution of single base differences along human chromosome 10 in comparison to the chimpanzee genome.

The clone map and finished sequence

A clone map spanning the euchromatic regions of the short (p) and long (q) arm of human chromosome 10 was assembled by restriction fingerprint and sequence-tagged site (STS) content analysis¹. We identified clones by screening approximately 85 genomic equivalents of P1-derived artificial chromosome (PAC), bacterial artificial chromosome (BAC), yeast artificial chromosome (YAC), cosmid and fosmid libraries. The tiling path consists of 1,144 minimally overlapping clones (Supplementary Table S1) organized into 12 contigs (Table 1). Contig-1340 spans the entire p arm and harbours the boundary between euchromatin and heterochromatin with the proximal 250 kilobases (kb) extending into pericentromeric satellite repeats. In a detailed study of this region two further clones carrying satellite 3 sequences, contig-2069, were identified and mapped by pulse field gel electrophoresis (PFGE) ∼50 kb proximal to contig-1340 (ref. 2). Similarly, contig-43 harbours the q-arm boundary with the proximal 240 kb composed of satellite repeats³. In addition, clone RP11-745D9, 94% of which consists of α-satellite repeats, was arbitrarily placed proximal to contig-43 as it was suggested to map to human chromosome 10 (E. Eichler, personal communication). A 9.75-megabase (Mb) PFGE map spanning the chromosome 10 centromere⁴ places the core α-satellite block (D10Z1) ∼0.2 Mb distal to contig-102 and 1 Mb proximal to contig-43.

Table 1 Clone and sequence contigs on chromosome 10

Full size table

In contrast with the p arm, nine gaps remain in the clone map of the q arm. Five of them are clustered within a ∼4-Mb region (Table 1 and Fig. 1; see also Supplementary Fig. S1 for a detailed view). Our inability to walk across these five gaps was due to the extensive segmental duplications in this region (Fig. 2); however, we obtained a size estimate by fluorescent in situ hybridization (FISH) of RP11-172C24 (AL512595) and RP11-13E1 (AC013284) on metaphase chromosomes. We sized the remaining four gaps by FISH with clones immediately flanking each gap to extended DNA fibres. No gap was estimated to be larger than 50 kb in size. Altogether the euchromatic gaps account for no more than 840 kb (Table 1). Finally, we defined the location of both telomeres. Clone RP11-631M1 (AL713922) ends ∼20 kb away from the telomeric repeats of the p arm based on the telomeric half-YAC XX-YAC22O3 (http://www.wistar.upenn.edu/Riethman/). At the end of the q arm (qtel), clone XX-YAC2136 (BX322534) contains part of the telomeric repeat block.

**Figure 1: The sequence map of chromosome 10 and its features.**

**Figure 2: Chromosome 10 inter- and intrasegmental duplications.**

Each clone of the tiling path was subjected to random subcloning and sequencing at either the Genome Therapeutics Corporation (GENE) or the Sanger Institute—the initial draft sequence of a few clones was carried out by other centres that are credited in the corresponding submissions to the EMBL/GenBank/DDBJ databases. We finished clones according to the international finishing standard (http://genome.wustl.edu/Overview/g16stand.php). Of the 1,144 clones in the human chromosome 10 tiling path, 221 and 913 were finished at GENE and the Sanger Institute, respectively, and three elsewhere (Supplementary Fig. S1). The remaining seven clones show persistent deletion of internal fragments. In total, we finished 131,666,441 base pairs (bp) in 18 sequence contigs; euchromatic coverage is estimated at 99.4%. Sequence accuracy was estimated as described in ref. 5 and found to exceed 99.99%. The sequence assembly comprises all known chromosome 10 messenger RNAs (RefSeq set) and STS markers from available radiation hybrid⁶ and genetic maps^7,8 (T. Furey, personal communication). In addition, the integrity of the sequence map was independently assessed at the University of California, Santa Cruz, by alignment of fosmid and BAC paired end sequences (http://genome.cse.ucsc.edu/). Table 1 lists the size of each sequence contig, with the largest one spanning 44,693,577 bp.

The gene and protein index

The Sanger Institute has established a standardized annotation pipeline (outlined at http://vega.sanger.ac.uk/) in which gene structures are drawn on the basis of human interpretation of the combined supportive evidence generated during sequence analysis. Annotation of the human chromosome 10 sequence resulted in a total of 1,787 gene structures that we then classified, as described in ref. 9, into: (1) 654 ‘known’ genes; (2) 162 ‘novel genes’; (3) 219 ‘novel transcripts’; (4) 322 ‘putative genes’; and (5) 430 ‘pseudogenes’. Pseudogenes were further subdivided into processed (371) and unprocessed (59).

Excluding the pseudogenes, human chromosome 10 is a chromosome with an average gene density (10.4 genes Mb^-1). The 1,357 genes span 66,309,730 bp in total (mean 51,335 bp per gene). Therefore, 50.6% of the analysed sequence is transcribed, matching the figure reported for chromosome 22 (51%), which is gene-rich, but appearing elevated in comparison with chromosomes 6, 7, 14 and 20 (42.2%, 46.5%, 43.6% and 42.4%, respectively), which have gene densities similar to chromosome 10. The latter suggests that the human chromosome 10 genes have on average a larger genomic span than those on chromosomes 6, 7, 14 and 20. Gene size along human chromosome 10 varies enormously, with the two extremes being CTNNA3 (1,776,209 bp) and CALML5 (859 bp). Exons account for only 2.3% of the sequence and the mean exon size is 313 bp. The longest and shortest exons annotated in this study have a length of 9,763 (SH3MD1) and 3 bp (CDH23), respectively. CDH23 is also the gene with most exons (69) on this chromosome. Table 2 summarizes the features of each gene class.

Table 2 Structural characteristics of annotated gene structures on chromosome 10

Full size table

Alternative splicing is a major contributor to the complexity of the human transcriptome. We annotated a total of 4,204 transcripts for 1,357 gene structures (Table 2). No splice variants were annotated on the basis of alternative polyadenylation sites. Approximately 73% of the protein-coding genes have more than one transcript and 5.8 on average. For ADD3 we annotated 22 variants. Note that the use of partial expressed sequences (for example, expressed sequence tags (ESTs)) may result in the annotation of more than one transcript per splice variant. Given this caveat, our analysis suggests a significantly higher level of alternative splicing compared with previous estimates¹⁰. Approximately 50% of the 3,456 transcripts (known and novel) do not seem to encode a protein. Annotation of these transcripts is largely based on ESTs, many of which may correspond to aberrant transcripts. Their precise role is largely unknown but they may be part of the machinery of transcriptional regulation (for example, via nonsense-mediated decay). Nevertheless, there are 1,837 transcripts with an open reading frame (ORF). Of the 342 genes with at least two transcripts having a complete ORF (that is, possess both a 5′ and a 3′ untranslated region (UTR)), 312 encode at least two distinct peptides.

Identification of transcription start sites and promoter regions remains a challenge in the annotation process. First, we scanned the human chromosome 10 unmasked sequence and predicted a total of 1,025 CpG islands (Fig. 1), which are known to be associated with the 5′ end of an estimated 60% of human genes¹¹.We then used Eponine¹² to predict transcription start sites and FirstEF¹³ to predict regions that encompass the promoter and 5′ exon. In total, FirstEF and Eponine predicted 1,801 (rank = 1, score ≥0.8) and 2,800 features, respectively (Supplementary Fig. S1). Notably, 62% of FirstEF and 96% of Eponine features directly overlapped CpG islands, suggesting a heavy bias towards this feature in both algorithms. The distribution of CpG islands, FirstEF and Eponine hits was also assessed relative to the first exon of each of the 4,635 annotated transcripts using a window of 5,000 bp upstream and 1,000 bp downstream of the exon. Table 3 summarizes the results obtained per gene class. For example, in the ‘known’ class, 1,544 (49.6%) and 1,124 (36.1%) transcripts had a FirstEF and Eponine feature, respectively. Not surprisingly, 89.9% of FirstEF and 96.7% of Eponine transcripts were also associated with a CpG island. Note that Eponine predicts multiple transcription start sites per transcript (4.24 on average).

Table 3 Correlation of CpG island, FirstEF and Eponine features with annotated transcripts

Full size table

Regulation of gene expression by antisense transcription is a recognized mechanism with examples reported in species ranging from bacteria to mammals^14,15,16. We observed widespread occurrence of overlapping coding genes (either strand) in human chromosome 10 (101 pairs in total). In 38 cases one of the genes is fully contained within an intron of the other, typically on the opposite strand (34). For example, the second intron of the splice variant LIPA-004 encompasses four members of the IFIT gene cluster (Supplementary Table S3 and Fig. S2) and appears to be transcribed from the same bidirectional promoter as IFIT5 (IFIT cluster member). Interestingly, the LIPA-001 and -002 variants do not overlap any IFIT gene, whereas LIPA-003 and LIPA-005 both overlap with IFIT2 and IFIT4 (Supplementary Fig. S2). Mutations in LIPA can cause Wolman and cholesteryl ester storage disease (Online Mendelian Inheritance in Man (OMIM) 278000). Among partially overlapping pairs (opposite strands), 34% involve the respective 5′ exons, which is indicative in each case of a bidirectional promoter. We also searched for non-coding transcripts located on the opposite strand of coding genes. There are 67 antisense transcripts overlapping 63 coding genes. The two most common patterns observed were intragenic with partial overlap of one exon, and partial overlap with the most 5′ exon of the coding gene. For ZNF32, we found two antisense transcripts (ZNF32OS1 and ZNF32OS2).

Finally, we looked at the distribution of known protein domains in both human chromosome 10 (this study) and the whole genome (Ensembl v.17.33.1) proteome using InterProScan. At the gene level, 70.6% of the human chromosome 10 peptides have at least one InterPro match with a Pfam domain and 32% are multidomain (1.37 distinct InterPros on average). Supplementary Table S2 shows the top 24 domains in chromosome 10 alongside their genome-wide ranking, suggesting that this chromosome is enriched in peptides with a lipase (IPR000734), aldo/keto reductase (IPR001395) or alpha/beta hydrolase (IPR000073) domain. BLASTP analysis (e-values <10^-15) showed that all six genes encoding the peptides carrying the IPR001395 domain are clustered (AKR1C cluster) at position 4.8–5.3 Mb, whereas there are two lipase clusters, LIP (90.0–91.0 Mb; six members) and PNLIP (117.9–118.1 Mb; four members). In total, we found 42 gene clusters along human chromosome 10 (Supplementary Table S3).

Genomic landscape

The average G + C and repeat content of human chromosome 10 are 41.58% and 43.66%, respectively. The distribution of the main classes of repeats (Supplementary Table S4), the G + C and CpG density plots all seem to follow the known genome-wide trends. For example, the G + C content fluctuates along the chromosome, peaks at the qtel and is positively correlated to gene density (Supplementary Fig. S1). Large genes tend to be located adjacent to or within gene-poor regions, for example PRKG1 and PCDH15 (interval 52.0–59.0 Mb), whereas regions of high gene density seem enriched in short interspersed elements (SINEs).

Segmental duplications are an important feature of the genomic landscape, being an integral part of the evolutionary machinery. Figure 2a illustrates the interchromosomal duplications along human chromosome 10 (see also ref. 17 and http://humanparalogy.cwru.edu/SDD). Figure 2b shows the extensive segmental duplications within a 5 Mb region at 10q11 with sequences further dispersed at 10q22, 10q23.1 and 10q23.2. Using the draft genome sequence Crosier and colleagues¹⁸ reported that 10q11 has been subject to multiple rounds of local duplications in at least the last 30–40 million years (Myr); they also showed that a 10q11:10q23 paracentric inversion occurred after the divergence of orang-utan from other great apes and hypothesized that the 10q22 locus resulted from chromosome-specific duplicative transposition. Bryce and colleagues¹⁹ characterized three cathepsin-L-like paralogues, which are expressed pseudogenes, and reported FISH signals at 10q11, 10q22 and 10q23, a pattern previously seen with the GLUD paralogues²⁰ (Fig. 2b). The duplication of the CTSL locus between chromosomes 9 and 10 was estimated to have occurred some 40 Myr ago¹⁹. We identified four and three additional CTSLL and GLUD pseudogenes, respectively (Fig. 2b), consistent with the local duplication events involving BMS1L (known as KIAA0187)¹⁸ and CTGLF1 (this study). In total, we annotated 7 BMS1L and 11 CTGLF1 paralogues (Fig. 2b; functional genes in dark blue). Retroposition of a truncated KIAA1099 (CENTG2; chromosome 2) mRNA gave rise to a processed pseudogene on chromosome 10 (ref. 18). However, this pseudogene forms the 3′ exon of CTGLF1, suggesting that this gene resulted from a fusion between an ancestral gene and the CENTG2 pseudogene, and the retroposition event predated all segmental duplications. Note that there is also evidence of transcripts combining exons of CTGLF1 and BMS1L paralogues. Interestingly, we predicted a novel gene, FAM25A (based on mouse complementary DNA AK008614 and without similarity to any known protein), with seven human chromosome 10 paralogues (Fig. 2b) that follow the pattern of GLUD, BMS1L, CTGLF and CTSLL. In addition to these five types of low copy repeats, the segmental duplications in 10q11:10q22:10q23 seem to have impacted on the number of genes on this chromosome. Notably, 31% of all the functionally related gene clusters are located within 10q11 and 10q23 (Supplementary Fig. S1 and Table S3).

The average recombination rate across the chromosome is 1.32 cM Mb^-1 (Fig. 3). Note that ref. 7 used the draft human chromosome 10 sequence (inflated by ∼8%) and thus obtained a lower figure (1.21 cM Mb^-1). The rate of male recombination is higher than the female rate near the telomeres, whereas between D10S211 and D10S575 the female rate is higher (Fig. 3). This comparison also indicates the presence of two female-specific recombination hotspots (Fig. 3, arrows). As expected, pericentromeric regions display a low rate of recombination, more than twofold below the chromosome average. In particular, the region between D10S1247 and D10S1783 has a rate of 0.3 cM Mb^-1 and contains the only human chromosome 10 recombination ‘desert’²¹. The region of extensive segmental duplications at 10q11 shows a low rate of recombination (0.72 cM Mb^-1). Thus, meiotic recombination is unlikely to have been the driving force in generating these duplications. Our analysis confirmed the recombination ‘jungle’ between D10S1782–D10S1651, which we extended to D10S212 (3.4 cM Mb^-1); we refuted the one between D10S189 and D10S1728 (2.3 cM Mb^-1), and identified a new one between D10S1154 and D10S552 (4.55 cM Mb^-1).

**Figure 3: Alignment of the deCODE genetic map of chromosome 10 to the physical map from the telomeric end of the short arm to the telomeric end of the long arm.**

Comparative sequence analysis

Many of the functional units in present-day vertebrate genomes have been conserved through evolution. Coding regions are highly conserved across all vertebrates, whereas non-coding regions are conserved among more closely related species. The genome sequences of three fishes (Tetraodon, Fugu and zebrafish) and two rodents (mouse and rat) were publicly available at the time of analysis. We compared the sequence of human chromosome 10 to each of the above species and searched for conserved regions with coding potential—distinguishing functional non-coding elements above background sequence conservation requires the use of additional genomes²². We then correlated the obtained hits in each species with the annotated genes. As expected, each of the fish genomes showed higher specificity (> 0.82; that is, 82% of sequences conserved in fish overlapped human annotated exons) than the rodents (> 0.29), whereas the highest specificity was obtained using all genomes together (0.96). Sensitivity was higher using a rodent genome (0.7; that is, 70% of all annotated exons had matches in rodent) than a fish genome (0.4).

Of the 1,787 gene structures (16,765 exons), 84% (78% of exons) had at least one exon supported by a conserved region in one of the other genomes and 52% (32% of exons) in all genomes. Note that the figures given for exons are underestimated owing to lower sequence conservation in the untranslated regions. Protein-coding genes are highly conserved (98.5%; 85% of exons). In contrast, 61% (29% of exons) and 45% (27% of exons) of novel transcripts and putative genes, respectively, have at least one match. Furthermore, only 21% of novel transcripts and 8% of putative genes show conservation with a fish (91% for protein-coding genes). Typically, novel transcripts were annotated on the basis of solid experimental evidence (that is, human mRNA) and may represent either genes that have evolved more rapidly or non-coding RNAs.

On the basis of specificity, regions conserved in all six species can serve as a measure of completeness of the gene annotation process that occurred independently of the comparative analysis. We found 5,604 such evolutionarily conserved regions (ECRs) of which 5,292 mapped inside annotated exons (including pseudogenes). Of the remaining 312 ECRs, 142 were intergenic and 170 intragenic. On inspection, we found 79 of these ECRs with supportive evidence to annotate a missed exon, most of which were part of a pseudogene (79%). The remaining 233 ECRs provide the basis to estimate that we have annotated at least 95.8% of all conserved coding exons on human chromosome 10. This is a conservative estimate as 131 of these ECRs are located in introns and may represent conserved non-exonic sequences. Interestingly, 54 (41%) of them are associated with just four genes: C10orf11 (26; also known as CDA017), EBF3 (20), TCF7L2 (5) and PAX2 (3). All but C10orf11 (unknown function) are transcription factors. Figure 4 shows a MultiPipMaker²³ alignment of the orthologous EBF3 loci and the relative position of ECRs in the human gene. Note that sequence identity is often higher in ECRs than in exons.

**Figure 4: Multispecies alignment of orthologous *EBF3* loci.**

Sequence variation

During the human chromosome 10 project we discovered 35,882 single nucleotide polymorphisms (SNPs) by sequence alignment in regions of clone overlaps. In total, we mapped 143,364 SNPs (dbSNP release 115) to the chromosome 10 sequence. Supplementary Fig. S1 shows the density plots for randomly discovered²⁴ and all SNPs across the chromosome.

There are 5,864 (4.1%) exonic and 65,973 (46%) intronic SNPs. Of the 1,821 SNPs in coding exons 984 are non-synonymous. MSMB has the most polymorphic coding region with 43 SNPs kb^-1; it encodes a protein with inhibin-like activity and its expression is decreased in prostate cancer²⁵.

We also considered 729,553 human–chimpanzee single base differences (SBDs) remapped on the current assembly of human chromosome 10. These were high-confidence sequence differences originally identified by aligning 14 million shotgun reads of the chimpanzee genome, generated jointly by the Whitehead Institute and Washington University Genome Centers, to the human genome sequence assembly (NCBI build 31). We first removed all human–chimpanzee SBDs that co-localized with known human SNPs. Supplementary Fig. S1 shows the density plot of the remaining 703,338 SBDs. Of those, 55.3% are intergenic, 42.9% intronic and 1.8% exonic. The highest density of human–chimpanzee SBDs, fourfold greater than the average level, was observed in a 200-kb gene-poor region at 19.43–19.63 Mb. We then examined the 12,710 human–chimpanzee SBDs that lie in exons of the 816 human coding genes. Of those, 3,972 were in coding regions and can be subdivided further into 2,273 synonymous, 1,678 non-synonymous and 21 nonsense with respect to the human sequence. For each gene we calculated the rate of evolution of non-synonymous (K_a) and synonymous (K_s) changes, and the ratio K_a/K_s, which provides a measure of evolutionary selection. Supplementary Table S5 lists the 1,413 transcripts with at least one coding human–chimpanzee SBD sorted on the K_a/K_s value. There are only 29 transcripts (21 genes) that have a K_a/K_s value ≥1, whereas there are 484 without non-synonymous SBDs. Note that several caveats apply in this type of analysis owing to the incomplete nature of both the chimpanzee data and the list of human SNPs; we used the number of intronic human–chimpanzee SBDs per base in comparison to the chromosome average of 0.005 as a possible estimate of coverage. The gene with most non-synonymous human–chimpanzee SBDs is MKI67, an antigen identified by monoclonal antibody Ki-67, which appears to be fast evolving in humans (K_a/K_s = 1.038507; SNP data). The expression pattern of MKI67 in gastric and other cancers is under investigation as this gene is expressed in proliferating cells. Interestingly, a nonsense human–chimpanzee SBD is present in both of its coding transcripts. Among the 21 genes with nonsense human-chimpanzee SBDs notable examples are the serotonin receptor HTR7 (the neurotransmitter serotonin is thought to be involved in cognition and behaviour), PSAP (prosaposin; involved in variant Gaucher's disease and metachromatic leukodystrophy) and the developmental gene NODAL.

Medical implications

At the time of writing there were 85 disease loci reported on human chromosome 10 (http://www.ncbi.nlm.nih.gov/omim/), a 47% increase since 1999 (ref. 26). Several of these loci account for multiple disease phenotypes caused by mutations in the same gene; notable examples are FGFR2 (OMIM 176943), PTEN (OMIM 601728) and the proto-oncogene RET (OMIM 164761). Since PTEN was first shown to be mutated in brain, breast and prostate cancers²⁷, there has been an explosion of reported mutations (110 germline and 332 somatic mutations)²⁸ and disease phenotypes²⁹. Human chromosome 10 harbours several other genes involved in tumorigenesis; for example, deregulation of TLX1, NFKB2 or BMI1 caused by chromosomal translocations or amplifications has been detected in lymphatic neoplasms. Mapping of allelic imbalances and functional studies suggest the presence of additional tumour-related genes. The finished and annotated sequence is key in the process of cloning these and other hitherto unidentified disease-associated genes.

The prompt release of both the clone and sequence map resources throughout the project has accelerated the cloning of many disease-causing genes. To this end we recently showed as part of the European ADLTE consortium that mutations in the LGI1 gene cause autosomal dominant lateral temporal epilepsy³⁰. Notably, we found that the FRA10A folate-sensitive fragile site is located close to LGI1 and its expression is associated with the expansion of a polymorphic CGG repeat located at the 5′ UTR of FRA10AC1, a gene encoding a novel nuclear protein³¹. There are seven fragile sites mapping to human chromosome 10 (ref. 32).

The challenge ahead is to unravel the molecular basis of common disease. An increasing number of susceptibility loci for complex diseases is being mapped to human chromosome 10, including metabolic diseases such as type I diabetes (IDDM10 and IDDM17), psychiatric disorders such as schizophrenia, or neurodegenerative illnesses such as Alzheimer's disease^26,32. In a case control study of morbidly obese and healthy individuals Boutin and colleagues³³ identified a SNP in the GAD2 gene that increases the risk for obesity as well as a protective haplotype. Studies so far have mainly focused on candidate genes. The construction of a comprehensive haplotype map of the human genome is well underway³⁴, making it possible to undertake a systematic approach to scanning the genome for associations to disease-related and other phenotypes.

Methods

The mapping and sequencing methods used in the assembly of the bacterial clone and sequence map of chromosome 10, respectively, as well as the tools of the gene annotation pipeline are described in refs 1 and 35 (see ref. 36 for detailed protocols). Manual annotation of gene structures followed the guidelines agreed in the human annotation workshop (HAWK; http://www.sanger.ac.uk/HGP/havana/hawk.shtml), whereas gene symbols were approved where possible by the HUGO Gene Nomenclature Committee³⁷. Protein translations were analysed with InterProScan (http://www.ebi.ac.uk/interpro/scan.html), which was run via the Ensembl protein annotation pipeline, to obtain Pfam, Prosite, Prints and Profiles domain matches.

Alignments for inter- and intrachromosomal duplications were performed with WU-BLASTN (http://blast.wustl.edu) using the current sequence assembly of chromosome 10 and the NCBI34 build for the rest of the genome. All sequences were repeat masked with RepeatMasker (http://repeatmasker.genome.washington.edu) and low-quality alignments (e-value >10^-30, sequence identity <90%, length <80 bp) were discarded. For intrachromosomal duplications, self matches were discarded. For interchromosomal duplications, the sequence was split into 400-kb segments. Adjacent matches in the same orientation were joined together as described by ref. 38. Only blocks of 10 kb or greater were retained.

The following sequence assemblies were used for comparative analysis: M. musculus NCBI build 30 (Mouse Genome Sequencing Consortium; http://www.ensembl.org/Mus_musculus/resources.html); R. norvegicus version 2.0 (Rat Genome Sequencing Consortium; http://www.hgsc.bcm.tmc.edu/projects/rat); D. rerio Assembly version 1 (Sanger Institute; http://www.sanger.ac.uk/Projects/D_rerio); F. rubripes version 2 (International Fugu Genome Consortium; http://www.fugu-sg.org/project/info.html); and T. nigroviridis version 6 (Genoscope and Whitehead Institute for Genome Research; http://www.genoscope.cns.fr/externe/tetraodon/Ressource.html). The repeat-masked sequence of chromosome 10 was aligned to the mouse and rat genome sequences using BLASTZ³⁹ and the resulting matches were post-processed by axtBest and subsetAxt (W. J. Kent; http://www.soe.ucsc.edu/~kent/src) as described elsewhere³⁵. Alignment of the three fish genome sequences to chromosome 10 was performed with WU-TBLASTX using the scoring matrix, parameters and filtering strategy applied by Exofish⁴⁰. Overlapping alignments to different sequences were merged to produce contiguous regions of sequence conservation, analogous to the ECRs or ‘Ecores’, reported by Exofish.

SNPs in sequence overlaps were identified using a modification of the SSAHA software⁴¹. The chromosome 10 SNPs (dbSNP release 115) were mapped to the sequence assembly of this chromosome (this study) first with SSAHA and then Cross-Match. Of the approximately 14 million chimpanzee reads (http://www.genome.gov/11008056) mapped onto the human sequence assembly (NCBI build 31), those mapping to chromosome 10 were remapped to our sequence assembly and used to identify human–chimpanzee SBDs.

References

Bentley, D. R. et al. The physical maps for sequencing human chromosomes 1, 6, 9, 10, 13, 20 and X. Nature 409, 942–943 (2001)
Article ADS CAS Google Scholar
Guy, J. et al. Genomic sequence and transcriptional profile of the boundary between pericentromeric satellites and genes on human chromosome arm 10p. Genome Res. 13, 159–172 (2003)
Article CAS Google Scholar
Guy, J. et al. Genomic sequence and transcriptional profile of the boundary between pericentromeric satellites and genes on human chromosome arm 10q. Hum. Mol. Genet. 9, 2029–2042 (2000)
Article CAS Google Scholar
Jackson, M. S., See, C. G., Mulligan, L. M. & Lauffart, B. F. A 9.75-Mb map across the centromere of human chromosome 10. Genomics 33, 258–270 (1996)
Article CAS Google Scholar
Felsenfeld, A., Peterson, J., Schloss, J. & Guyer, M. Assessing the quality of the DNA sequence from the Human Genome Project. Genome Res. 9, 1–4 (1999)
CAS PubMed Google Scholar
Deloukas, P. et al. A physical map of 30,000 human genes. Science 282, 744–746 (1998)
Article ADS CAS Google Scholar
Kong, A. et al. A high-resolution recombination map of the human genome. Nature Genet. 31, 241–247 (2002)
Article CAS Google Scholar
Broman, K. W., Murray, J. C., Sheffield, V. C., White, R. L. & Weber, J. L. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63, 861–869 (1998)
Article CAS Google Scholar
Deloukas, P. et al. The DNA sequence and comparative analysis of human chromosome 20. Nature 414, 865–871 (2001)
Article ADS CAS Google Scholar
International Human Genome Sequencing Consortium, Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001)
Article Google Scholar
Antequera, F. & Bird, A. Number of CpG islands and genes in human and mouse. Proc. Natl Acad. Sci. USA 90, 11995–11999 (1993)
Article ADS CAS Google Scholar
Down, T. A. & Hubbard, T. J. Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res. 12, 458–461 (2002)
Article CAS Google Scholar
Davuluri, R. V., Grosse, I. & Zhang, M. Q. Computational identification of promoters and first exons in the human genome. Nature Genet. 29, 412–417 (2001)
Article CAS Google Scholar
Wagner, E. G. & Simons, R. W. Antisense RNA control in bacteria, phages, and plasmids. Annu. Rev. Microbiol. 48, 713–742 (1994)
Article CAS Google Scholar
Eddy, S. R. Non-coding RNA genes and the modern RNA world. Nature Rev. Genet. 2, 919–929 (2001)
Article CAS Google Scholar
Kiyosawa, H. et al. Antisense transcripts with FANTOM2 clone set and their implications for gene regulation. Genome Res. 13, 1324–1334 (2003)
Article CAS Google Scholar
Bailey, J. A. et al. Recent segmental duplications in the human genome. Science 297, 1003–1007 (2002)
Article ADS CAS Google Scholar
Crosier, M. et al. Human paralogs of KIAA0187 were created through independent pericentromeric-directed and chromosome-specific duplication mechanisms. Genome Res. 12, 67–80 (2002)
Article CAS Google Scholar
Bryce, S. D. et al. A novel family of cathepsin L-like (CTSLL) sequences on human chromosome 10q and related transcripts. Genomics 24, 568–576 (1994)
Article CAS Google Scholar
Deloukas, P., Dauwerse, J. G., Moschonas, N. K., van Ommen, G. J. & van Loon, A. P. Three human glutamate dehydrogenase genes (GLUD1, GLUDP2, and GLUDP3) are located on chromosome 10q, but are not closely physically linked. Genomics 17, 676–681 (1993)
Article CAS Google Scholar
Yu, A. et al. Comparison of human genetic and sequence-based physical maps. Nature 409, 951–953 (2001)
Article ADS CAS Google Scholar
Thomas, J. W. et al. Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424, 788–793 (2003)
Article ADS CAS Google Scholar
Schwartz, S. et al. PipMaker—a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586 (2000)
Article CAS Google Scholar
Sachidanandam, R. et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001)
Article ADS CAS Google Scholar
Vanaja, D. K., Cheville, J. C., Iturria, S. J. & Young, C. Y. Transcriptional silencing of zinc finger protein 185 identified by expression profiling is associated with prostate cancer progression. Cancer Res. 63, 3877–3882 (2003)
CAS PubMed Google Scholar
Deloukas, P., French, L., Meitinger, T. & Moschonas, N. K. Report of the third international workshop on human chromosome 10 mapping and sequencing 1999. Cytogenet. Cell Genet. 90, 1–12 (2000)
Article CAS Google Scholar
Li, J. et al. PTEN, a putative protein tyrosine phosphatase gene mutated in human brain, breast, and prostate cancer. Science 275, 1943–1946 (1997)
Article CAS Google Scholar
Bonneau, D. & Longy, M. Mutations of the human PTEN gene. Hum. Mutat. 16, 109–122 (2000)
Article CAS Google Scholar
Eng, C. PTEN: one gene, many syndromes. Hum. Mutat. 22, 183–198 (2003)
Article CAS Google Scholar
Morante-Redolat, J. M. et al. Mutations in the LGI1/Epitempin gene on 10q24 cause autosomal dominant lateral temporal epilepsy. Hum. Mol. Genet. 11, 1119–1128 (2002)
Article CAS Google Scholar
Sarafidou, T. et al. Folate-sensitive fragile site FRA10A is due to an expansion of a CGG-repeat in a novel gene FRA10AC1, encoding a nuclear protein. Genomics (in the press)
Moschonas, N. K. in Encyclopedia of the Human Genome Vol. 1, 618–625 (Nature Publishing Group, London, 2003)
Google Scholar
Boutin, P. et al. GAD2 on chromosome 10p12 is a candidate gene for human obesity. PLoS Biol. 1, 1–11 (2003)
Article Google Scholar
The International HapMap Consortium. The International HapMap project. Nature 426, 789–796 (2003)
Article Google Scholar
Mungall, A. J. et al. The DNA sequence and analysis of human chromosome 6. Nature 425, 805–811 (2003)
Article ADS CAS Google Scholar
Dunham, I. (ed.) Genome Mapping and Sequencing (Horizon Scientific, Wymondham, UK)
Wain, H. M., Lush, M., Ducluzeau, F. & Povey, S. Genew: the human gene nomenclature database. Nucleic Acids Res. 30, 169–171 (2002)
Article CAS Google Scholar
Cheung, J. et al. Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. 4, R25 (2003)
Article Google Scholar
Schwartz, S. et al. Human–mouse alignments with BLASTZ. Genome Res. 13, 103–107 (2003)
Article CAS Google Scholar
Roest Crollius, H. et al. Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nature Genet. 25, 235–238 (2000)
Article CAS Google Scholar
Ning, Z., Cox, A. J. & Mullikin, J. C. SSAHA: a fast search method for large DNA databases. Genome Res. 11, 1725–1729 (2001)
Article CAS Google Scholar

Download references

Acknowledgements

We thank the Fugu, Mouse and Rat Genome Sequencing Consortia; Genoscope and Whitehead Institute for Genome Research (Tetraodon genome data); the members of the zebrafish project at the Sanger Institute; the groups at the Washington University Genome Sequencing Center and the Whitehead Institute for early access to the chimpanzee whole-genome shotgun data; the laboratories of H. Riethman and M. S. Jackson for the provision of telomeric and pericentromeric clones, respectively; H. M. Wain, E. A. Bruford, C. C. Talbot, V. K. Khodiyar, M. J. Lush, M. W. Wright and S. Povey for assigning official gene symbols; the EMBL, GenBank and Ensembl database groups; I. Dunham for critical reading of the manuscript; and the Wellcome Trust for financial support.

Author information

Authors and Affiliations

The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
P. Deloukas, M. E. Earthrowl, D. V. Grafham, L. French, C. A. Steward, S. K. Sims, M. C. Jones, S. Searle, C. Scott, K. Howe, S. E. Hunt, T. D. Andrews, J. G. R. Gilbert, D. Swarbreck, J. L. Ashurst, A. Taylor, C. P. Bird, R. Ainscough, J. P. Almeida, R. I. S. Ashwell, K. D. Ambrose, A. K. Babbage, C. L. Bagguley, J. Bailey, R. Banerjee, K. Bates, H. Beasley, S. Bray-Allen, A. J. Brown, J. Y. Brown, D. C. Burford, W. Burrill, J. Burton, N. P. Carter, J. C. Chapman, S. Y. Clark, G. Clarke, C. M. Clee, S. Clegg, N. Corby, A. Coulson, P. Dhami, I. Dutta, M. Dunn, L. Faulkner, A. Frankish, J. A. Frankland, P. Garner, J. Garnett, S. Gribble, C. Griffiths, R. Grocock, S. Hammond, J. L. Harley, E. Hart, P. D. Heath, B. Hopkins, P. J. Howden, E. Huckle, C. Johnson, D. Johnson, M. Kay, A. M. Kimberley, J. K. Kershaw, G. K. Laird, S. Lawlor, D. A. Leongamornlert, G. Laird, C. Lloyd, D. M. Lloyd, J. Loveland, J. Lovell, S. McLaren, K. E. McLay, A. McMurray, M. Mashreghi-Mohammadi, L. Matthews, S. Milne, T. Nickerson, E. Overton-Larty, S. A. Palmer, A. V. Pearce, A. I. Peck, S. Pelan, B. Phillimore, K. Porter, C. M. Rice, M. T. Ross, H. K. Sehra, R. Shownkeen, C. D. Skuce, M. Smith, N. Sycamore, J. Tester, A. Thorpe, A. Tracey, A. Tromans, M. Wall, A. P. West, D. L. Willey, S. L. Whitehead, L. Wilming, P. W. Wray, L. Young, D. Bentley, R. Durbin, T. Hubbard, S. Beck & J. Rogers
Genome Therapeutics Corporation, 100 Beaver Street, Waltham, Massachusetts, 02453, USA
M. Rubenfield, J. Battles, P. Cahill, D. Camire, E. Gustafson, T. P. Ho, J. Horne, C. Hynds, A. Kana, H. M. Lee, M. Nguyen, A. Rogosin, L. Standring, W. Torcasso, J. Tsolas, J. Walsh, H. Wang, K. Weinstock, K. Fechtel, L. Doucette-Stamm & D. R. Smith
Agencourt Bioscience Corporation, 100 Cummings Center, Beverly, Massachusetts, 01915, USA
M. Rubenfield, E. Gustafson, A. Rogosin, J. Tsolas, L. Doucette-Stamm & D. R. Smith
Department of Biology, University of Crete & Institute of Molecular Biology and Biotechnology, Foundation of Research and Technology, PO Box 2208, 71409, Heraklion, Crete, Greece
M. Kokkinaki, T. Sarafidou & N. K. Moschonas
European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
Y. Chen
HUGO Gene Nomenclature Committee, Department of Biology, University College London, London, NW1 2HE, UK
R. C. Lovering
Institute of Human Genetics, University Hospital Schleswig-Holstein Campus Kiel, Schwanenweg 24, D-24105, Kiel, Germany
R. Siebert

Authors

P. Deloukas
View author publications
You can also search for this author in PubMed Google Scholar
M. E. Earthrowl
View author publications
You can also search for this author in PubMed Google Scholar
D. V. Grafham
View author publications
You can also search for this author in PubMed Google Scholar
M. Rubenfield
View author publications
You can also search for this author in PubMed Google Scholar
L. French
View author publications
You can also search for this author in PubMed Google Scholar
C. A. Steward
View author publications
You can also search for this author in PubMed Google Scholar
S. K. Sims
View author publications
You can also search for this author in PubMed Google Scholar
M. C. Jones
View author publications
You can also search for this author in PubMed Google Scholar
S. Searle
View author publications
You can also search for this author in PubMed Google Scholar
C. Scott
View author publications
You can also search for this author in PubMed Google Scholar
K. Howe
View author publications
You can also search for this author in PubMed Google Scholar
S. E. Hunt
View author publications
You can also search for this author in PubMed Google Scholar
T. D. Andrews
View author publications
You can also search for this author in PubMed Google Scholar
J. G. R. Gilbert
View author publications
You can also search for this author in PubMed Google Scholar
D. Swarbreck
View author publications
You can also search for this author in PubMed Google Scholar
J. L. Ashurst
View author publications
You can also search for this author in PubMed Google Scholar
A. Taylor
View author publications
You can also search for this author in PubMed Google Scholar
J. Battles
View author publications
You can also search for this author in PubMed Google Scholar
C. P. Bird
View author publications
You can also search for this author in PubMed Google Scholar
R. Ainscough
View author publications
You can also search for this author in PubMed Google Scholar
J. P. Almeida
View author publications
You can also search for this author in PubMed Google Scholar
R. I. S. Ashwell
View author publications
You can also search for this author in PubMed Google Scholar
K. D. Ambrose
View author publications
You can also search for this author in PubMed Google Scholar
A. K. Babbage
View author publications
You can also search for this author in PubMed Google Scholar
C. L. Bagguley
View author publications
You can also search for this author in PubMed Google Scholar
J. Bailey
View author publications
You can also search for this author in PubMed Google Scholar
R. Banerjee
View author publications
You can also search for this author in PubMed Google Scholar
K. Bates
View author publications
You can also search for this author in PubMed Google Scholar
H. Beasley
View author publications
You can also search for this author in PubMed Google Scholar
S. Bray-Allen
View author publications
You can also search for this author in PubMed Google Scholar
A. J. Brown
View author publications
You can also search for this author in PubMed Google Scholar
J. Y. Brown
View author publications
You can also search for this author in PubMed Google Scholar
D. C. Burford
View author publications
You can also search for this author in PubMed Google Scholar
W. Burrill
View author publications
You can also search for this author in PubMed Google Scholar
J. Burton
View author publications
You can also search for this author in PubMed Google Scholar
P. Cahill
View author publications
You can also search for this author in PubMed Google Scholar
D. Camire
View author publications
You can also search for this author in PubMed Google Scholar
N. P. Carter
View author publications
You can also search for this author in PubMed Google Scholar
J. C. Chapman
View author publications
You can also search for this author in PubMed Google Scholar
S. Y. Clark
View author publications
You can also search for this author in PubMed Google Scholar
G. Clarke
View author publications
You can also search for this author in PubMed Google Scholar
C. M. Clee
View author publications
You can also search for this author in PubMed Google Scholar
S. Clegg
View author publications
You can also search for this author in PubMed Google Scholar
N. Corby
View author publications
You can also search for this author in PubMed Google Scholar
A. Coulson
View author publications
You can also search for this author in PubMed Google Scholar
P. Dhami
View author publications
You can also search for this author in PubMed Google Scholar
I. Dutta
View author publications
You can also search for this author in PubMed Google Scholar
M. Dunn
View author publications
You can also search for this author in PubMed Google Scholar
L. Faulkner
View author publications
You can also search for this author in PubMed Google Scholar
A. Frankish
View author publications
You can also search for this author in PubMed Google Scholar
J. A. Frankland
View author publications
You can also search for this author in PubMed Google Scholar
P. Garner
View author publications
You can also search for this author in PubMed Google Scholar
J. Garnett
View author publications
You can also search for this author in PubMed Google Scholar
S. Gribble
View author publications
You can also search for this author in PubMed Google Scholar
C. Griffiths
View author publications
You can also search for this author in PubMed Google Scholar
R. Grocock
View author publications
You can also search for this author in PubMed Google Scholar
E. Gustafson
View author publications
You can also search for this author in PubMed Google Scholar
S. Hammond
View author publications
You can also search for this author in PubMed Google Scholar
J. L. Harley
View author publications
You can also search for this author in PubMed Google Scholar
E. Hart
View author publications
You can also search for this author in PubMed Google Scholar
P. D. Heath
View author publications
You can also search for this author in PubMed Google Scholar
T. P. Ho
View author publications
You can also search for this author in PubMed Google Scholar
B. Hopkins
View author publications
You can also search for this author in PubMed Google Scholar
J. Horne
View author publications
You can also search for this author in PubMed Google Scholar
P. J. Howden
View author publications
You can also search for this author in PubMed Google Scholar
E. Huckle
View author publications
You can also search for this author in PubMed Google Scholar
C. Hynds
View author publications
You can also search for this author in PubMed Google Scholar
C. Johnson
View author publications
You can also search for this author in PubMed Google Scholar
D. Johnson
View author publications
You can also search for this author in PubMed Google Scholar
A. Kana
View author publications
You can also search for this author in PubMed Google Scholar
M. Kay
View author publications
You can also search for this author in PubMed Google Scholar
A. M. Kimberley
View author publications
You can also search for this author in PubMed Google Scholar
J. K. Kershaw
View author publications
You can also search for this author in PubMed Google Scholar
M. Kokkinaki
View author publications
You can also search for this author in PubMed Google Scholar
G. K. Laird
View author publications
You can also search for this author in PubMed Google Scholar
S. Lawlor
View author publications
You can also search for this author in PubMed Google Scholar
H. M. Lee
View author publications
You can also search for this author in PubMed Google Scholar
D. A. Leongamornlert
View author publications
You can also search for this author in PubMed Google Scholar
G. Laird
View author publications
You can also search for this author in PubMed Google Scholar
C. Lloyd
View author publications
You can also search for this author in PubMed Google Scholar
D. M. Lloyd
View author publications
You can also search for this author in PubMed Google Scholar
J. Loveland
View author publications
You can also search for this author in PubMed Google Scholar
J. Lovell
View author publications
You can also search for this author in PubMed Google Scholar
S. McLaren
View author publications
You can also search for this author in PubMed Google Scholar
K. E. McLay
View author publications
You can also search for this author in PubMed Google Scholar
A. McMurray
View author publications
You can also search for this author in PubMed Google Scholar
M. Mashreghi-Mohammadi
View author publications
You can also search for this author in PubMed Google Scholar
L. Matthews
View author publications
You can also search for this author in PubMed Google Scholar
S. Milne
View author publications
You can also search for this author in PubMed Google Scholar
T. Nickerson
View author publications
You can also search for this author in PubMed Google Scholar
M. Nguyen
View author publications
You can also search for this author in PubMed Google Scholar
E. Overton-Larty
View author publications
You can also search for this author in PubMed Google Scholar
S. A. Palmer
View author publications
You can also search for this author in PubMed Google Scholar
A. V. Pearce
View author publications
You can also search for this author in PubMed Google Scholar
A. I. Peck
View author publications
You can also search for this author in PubMed Google Scholar
S. Pelan
View author publications
You can also search for this author in PubMed Google Scholar
B. Phillimore
View author publications
You can also search for this author in PubMed Google Scholar
K. Porter
View author publications
You can also search for this author in PubMed Google Scholar
C. M. Rice
View author publications
You can also search for this author in PubMed Google Scholar
A. Rogosin
View author publications
You can also search for this author in PubMed Google Scholar
M. T. Ross
View author publications
You can also search for this author in PubMed Google Scholar
T. Sarafidou
View author publications
You can also search for this author in PubMed Google Scholar
H. K. Sehra
View author publications
You can also search for this author in PubMed Google Scholar
R. Shownkeen
View author publications
You can also search for this author in PubMed Google Scholar
C. D. Skuce
View author publications
You can also search for this author in PubMed Google Scholar
M. Smith
View author publications
You can also search for this author in PubMed Google Scholar
L. Standring
View author publications
You can also search for this author in PubMed Google Scholar
N. Sycamore
View author publications
You can also search for this author in PubMed Google Scholar
J. Tester
View author publications
You can also search for this author in PubMed Google Scholar
A. Thorpe
View author publications
You can also search for this author in PubMed Google Scholar
W. Torcasso
View author publications
You can also search for this author in PubMed Google Scholar
A. Tracey
View author publications
You can also search for this author in PubMed Google Scholar
A. Tromans
View author publications
You can also search for this author in PubMed Google Scholar
J. Tsolas
View author publications
You can also search for this author in PubMed Google Scholar
M. Wall
View author publications
You can also search for this author in PubMed Google Scholar
J. Walsh
View author publications
You can also search for this author in PubMed Google Scholar
H. Wang
View author publications
You can also search for this author in PubMed Google Scholar
K. Weinstock
View author publications
You can also search for this author in PubMed Google Scholar
A. P. West
View author publications
You can also search for this author in PubMed Google Scholar
D. L. Willey
View author publications
You can also search for this author in PubMed Google Scholar
S. L. Whitehead
View author publications
You can also search for this author in PubMed Google Scholar
L. Wilming
View author publications
You can also search for this author in PubMed Google Scholar
P. W. Wray
View author publications
You can also search for this author in PubMed Google Scholar
L. Young
View author publications
You can also search for this author in PubMed Google Scholar
Y. Chen
View author publications
You can also search for this author in PubMed Google Scholar
R. C. Lovering
View author publications
You can also search for this author in PubMed Google Scholar
N. K. Moschonas
View author publications
You can also search for this author in PubMed Google Scholar
R. Siebert
View author publications
You can also search for this author in PubMed Google Scholar
K. Fechtel
View author publications
You can also search for this author in PubMed Google Scholar
D. Bentley
View author publications
You can also search for this author in PubMed Google Scholar
R. Durbin
View author publications
You can also search for this author in PubMed Google Scholar
T. Hubbard
View author publications
You can also search for this author in PubMed Google Scholar
L. Doucette-Stamm
View author publications
You can also search for this author in PubMed Google Scholar
S. Beck
View author publications
You can also search for this author in PubMed Google Scholar
D. R. Smith
View author publications
You can also search for this author in PubMed Google Scholar
J. Rogers
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to P. Deloukas.

Ethics declarations

Competing interests

The authors declare that they have no competing financial interests.

Supplementary information

Supplementary Table 1

Bacterial clones in the chromosome 10 tiling path. (DOC 31 kb)

Supplementary Table 2

Most common InterPro domains in the chromosome 10 proteome. (DOC 43 kb)

Supplementary Table 3

Gene clusters on chromosome 10. (DOC 84 kb)

Supplementary Table 4

Interspread repeats. (DOC 36 kb)

Supplementary Table 5

Human transcripts with human-chimpanzee Single Base Differences (h-cSBDs). (DOC 3320 kb)

Supplementary Figure 1

The human chromosome 10 sequence and its features. Detailed view of all map and sequence features (tile path, G+C, CpG, Repeats, SNPs, Conserved regions, Gene families, Gene structures). (PDF 11747 kb)

Supplementary Figure 2

Overlapping genes on opposite stands. The LIPA gene and the IFIT gene cluster on chromosome 10. (PDF 176 kb)

Supplementary Figure Legends (DOC 21 kb)

PDF version of Figure 1

Rights and permissions

Reprints and permissions

About this article

Cite this article

Deloukas, P., Earthrowl, M., Grafham, D. et al. The DNA sequence and comparative analysis of human chromosome 10. Nature 429, 375–381 (2004). https://doi.org/10.1038/nature02462

Download citation

Received: 17 December 2003
Accepted: 09 March 2004
Issue Date: 27 May 2004
DOI: https://doi.org/10.1038/nature02462

This article is cited by

Autosomal recessive congenital cataract is associated with a novel 4-bp splicing deletion mutation in a novel C10orf71 human gene
- M. Chograni
- H. M. Alahdal
- M. Rejili
Human Genomics (2023)
A perspective on molecular signalling dysfunction, its clinical relevance and therapeutics in autism spectrum disorder
- Sushmitha S. Purushotham
- Neeharika M. N. Reddy
- James P. Clement
Experimental Brain Research (2022)
Intellectual disability: dendritic anomalies and emerging genetic perspectives
- Tam T. Quach
- Harrison J. Stratton
- Anne-Marie Duchemin
Acta Neuropathologica (2021)
Genetic association and functional analysis of rs7903456 in FAM35A gene and hyperuricemia: a population based study
- Feng Yan
- Peng Sun
- Yujie Dai
Scientific Reports (2018)
Homeobox NKX2-3 promotes marginal-zone lymphomagenesis by activating B-cell receptor signalling and shaping lymphocyte dynamics
- Eloy F. Robles
- Maria Mena-Varas
- Jose A. Martinez-Climent
Nature Communications (2016)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Abstract

Similar content being viewed by others

Main

The clone map and finished sequence

The gene and protein index

Genomic landscape

Comparative sequence analysis

Sequence variation

Medical implications

Methods

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Competing interests

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links