The genome of the social amoeba Dictyostelium discoideum

Eichinger, L.; Pachebat, J. A.; Glöckner, G.; Rajandream, M.-A.; Sucgang, R.; Berriman, M.; Song, J.; Olsen, R.; Szafranski, K.; Xu, Q.; Tunggal, B.; Kummerfeld, S.; Madera, M.; Konfortov, B. A.; Rivero, F.; Bankier, A. T.; Lehmann, R.; Hamlin, N.; Davies, R.; Gaudet, P.; Fey, P.; Pilcher, K.; Chen, G.; Saunders, D.; Sodergren, E.; Davis, P.; Kerhornou, A.; Nie, X.; Hall, N.; Anjard, C.; Hemphill, L.; Bason, N.; Farbrother, P.; Desany, B.; Just, E.; Morio, T.; Rost, R.; Churcher, C.; Cooper, J.; Haydock, S.; van Driessche, N.; Cronin, A.; Goodhead, I.; Muzny, D.; Mourier, T.; Pain, A.; Lu, M.; Harper, D.; Lindsay, R.; Hauser, H.; James, K.; Quiles, M.; Madan Babu, M.; Saito, T.; Buchrieser, C.; Wardroper, A.; Felder, M.; Thangavelu, M.; Johnson, D.; Knights, A.; Loulseged, H.; Mungall, K.; Oliver, K.; Price, C.; Quail, M. A.; Urushihara, H.; Hernandez, J.; Rabbinowitsch, E.; Steffen, D.; Sanders, M.; Ma, J.; Kohara, Y.; Sharp, S.; Simmonds, M.; Spiegler, S.; Tivey, A.; Sugano, S.; White, B.; Walker, D.; Woodward, J.; Winckler, T.; Tanaka, Y.; Shaulsky, G.; Schleicher, M.; Weinstock, G.; Rosenthal, A.; Cox, E. C.; Chisholm, R. L.; Gibbs, R.; Loomis, W. F.; Platzer, M.; Kay, R. R.; Williams, J.; Dear, P. H.; Noegel, A. A.; Barrell, B.; Kuspa, A.

doi:10.1038/nature03481

Article
Published: 05 May 2005

Nature volume 435, pages 43–57 (2005)

Abstract

The social amoebae are exceptional in their ability to alternate between unicellular and multicellular forms. Here we describe the genome of the best-studied member of this group, Dictyostelium discoideum. The gene-dense chromosomes of this organism encode approximately 12,500 predicted proteins, a high proportion of which have long, repetitive amino acid tracts. There are many genes for polyketide synthases and ABC transporters, suggesting an extensive secondary metabolism for producing and exporting small molecules. The genome is rich in complex repeats, one class of which is clustered and may serve as centromeres. Partial copies of the extrachromosomal ribosomal DNA (rDNA) element are found at the ends of each chromosome, suggesting a novel telomere structure and the use of a common mechanism to maintain both the rDNA and chromosomal termini. A proteome-based phylogeny shows that the amoebozoa diverged from the animal–fungal lineage after the plant–animal split, but Dictyostelium seems to have retained more of the diversity of the ancestral genome than have plants, animals or fungi.

Main

The amoebozoa are a richly diverse group of organisms whose genomes remain largely unexplored. The soil-dwelling social amoeba Dictyostelium discoideum has been actively studied for the past 50 years and has contributed greatly to our understanding of cellular motility, signalling and interaction¹. For example, studies in Dictyostelium provided the first descriptions of a eukaryotic cell chemoattractant and a cell–cell adhesion protein^2,3.

Dictyostelium amoebae inhabit forest soil and consume bacteria and yeast, which they track by chemotaxis. Starvation, however, prompts the solitary cells to aggregate and develop as a true multicellular organism, producing a fruiting body composed of a cellular, cellulosic stalk supporting a bolus of spores. Thus, Dictyostelium has evolved mechanisms that direct the differentiation of a homogeneous population of cells into distinct cell types, regulate the proportions between tissues and orchestrate the construction of an effective structure for the dispersal of spores⁴. Many of the genes necessary for these processes in Dictyostelium were also inherited by Metazoa and fashioned through evolution for use within many different modes of development.

The amoebozoa are also noteworthy as representing one of the earliest branches from the last common ancestor of all eukaryotes. Each of the surviving branches of the crown group of eukaryotes provides an example of the ways in which the ancestral genome has been sculpted and adapted by lineage-specific gene duplication, divergence and deletion. Comparison between representatives of these branches promises to shed light not only on the nature and content of the ancestral eukaryotic genome, but also on the diversity of ways in which its components have been adapted to meet the needs of complex organisms. As the first genome of a free-living protozoan to be fully sequenced, that of Dictyostelium should be particularly informative for these analyses.

Mapping, sequencing and assembly

An international initiative to sequence the genome of Dictyostelium discoideum AX4 (refs 5, 6) was launched in 1998. The high repeat content and (A + T)-richness of the genome (the latter rendering large-insert bacterial clones unstable) posed severe challenges for sequencing and assembly. The response to these challenges was to use a whole-chromosome shotgun (WCS) strategy, partially purifying each chromosome electrophoretically and treating it as a separate project. This approach was supported by novel statistical tools to recover chromosome specificity from the impure WCS libraries, and by highly detailed HAPPY maps that provided a framework for sequence assembly. These approaches have enabled the completion of this difficult genome to a high standard, and are likely to be valuable in tackling the many other genomes that present challenges of composition and complexity.

Genome mapping

To support sequence assembly, we made high-resolution maps of the chromosomes using HAPPY mapping^7,8,9, which relies on analysing the sequence content of single DNA molecules prepared by limiting dilution. A total of 3,902 markers selected mostly from the emerging shotgun data were mapped, and maps of all six chromosomes were assembled (see Methods and Table 1; see also Supplementary Fig. 1 and Supplementary Table 1).

Table 1 Sequence assembly details

Genome sequencing and assembly

Two strategies were used to recover chromosome-specific data from impure WCS libraries (see Methods). The first (for chromosomes 1, 2 and 3) used enrichment of the respective libraries as the main statistical indicator of the chromosomal assignment of contigs, and HAPPY maps were used to guide assembly. The second strategy (for chromosomes 4, 5 and most of 6) used mapping data to assign sequences to chromosomes initially, with detailed HAPPY maps being used to validate final assemblies. A 1,508-kilobase (kb) portion of chromosome 6 was sequenced as a pilot project using a combination of approaches (see Methods).
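
The enrichment-based assignment used for chromosomes 1, 2 and 3 can be illustrated with a minimal sketch. It assumes, hypothetically, that each contig carries a count of reads from each whole-chromosome shotgun library, and it assigns the contig to the library in which it is most over-represented relative to library size. The function name, the input layout and all numbers are illustrative assumptions; the statistical tools actually used are those described in the Methods.

```python
def assign_contig(read_counts, library_sizes):
    """Return (best_library, enrichment) for one contig.

    read_counts: reads in this contig from each library (hypothetical input).
    library_sizes: total reads sequenced per library.
    Enrichment is the share of the contig's reads drawn from a library,
    divided by that library's share of all reads.
    """
    total_contig = sum(read_counts.values())
    total_reads = sum(library_sizes.values())
    if total_contig == 0:
        raise ValueError("contig has no reads")
    enrichment = {}
    for lib, size in library_sizes.items():
        observed = read_counts.get(lib, 0) / total_contig
        expected = size / total_reads
        enrichment[lib] = observed / expected
    best = max(enrichment, key=enrichment.get)
    return best, enrichment[best]


# Illustrative numbers only, not data from the paper.
library_sizes = {"chr1": 100_000, "chr2": 150_000, "chr3": 120_000}
contig_reads = {"chr1": 4, "chr2": 30, "chr3": 6}
best, score = assign_contig(contig_reads, library_sizes)
```

Normalizing by library size keeps a large library from winning simply because it contributes more reads overall.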

Repetitive tracts complicated assembly. For chromosomes 1, 2 and 3, inspection of polymorphisms, combined with HAPPY maps, allowed unambiguous assembly in many cases. For chromosomes 4, 5 and 6, low-coverage sequencing of AX4-derived yeast artificial chromosomes (YACs) alleviated the problems by providing a local data set within which the troublesome repeat element was present as a single copy. Nevertheless, some repeat tracts proved intractable and remain as gaps. Thirty-four unlinked (floating) contigs of >1 kb, totalling 225,339 base pairs (bp), remain unpositioned in the genome, but can be provisionally assigned to specific chromosomes based on their content of reads from the WCS libraries. Most or all of these floating contigs are bounded by repetitive regions. The chromosome 2 sequence in the current assembly supersedes that previously published⁹, having benefited from further HAPPY mapping and manual sequence finishing.

The six chromosomal assemblies span 33,817 kb (Table 1), including ∼156 kb in the form of clone-, sequence- and repeat gaps. Assuming that most of the floating contigs lie beyond the termini of the assemblies, the total genome size is estimated at 34,042,810 bp. In estimating the completeness of the sequence, we note that of 967 well-characterized D. discoideum genes, 957 (99%) were found initially in the assemblies. Of the remaining ten, seven (cupE, trxA, trxB, trxC, staA, staB and cinB) have close matches, suggesting that their GenBank entries may contain errors or represent alternative alleles. Only three (fcpA, wasA and roco5) had no matches in the initial assemblies, although the first two of these were recovered by searches of unincorporated sequence followed by local reassembly. Of 133,168 ‘qualified’ D. discoideum AX4 expressed sequence tags (ESTs of >200 bp and >20% G + C, and not matching mitochondrial sequence; ref. 10 and H. Urushihara et al., unpublished data), 128,207 (96.3%) are found in the assemblies (the higher proportion of missing sequences among the ESTs probably reflects the higher error rate inherent in EST data).
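
The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; the one derived quantity, the assembly span in base pairs, is back-calculated on the assumption that the floating contigs are simply added to the assembled span.

```python
# Figures quoted in the text.
floating_bp = 225_339
genome_bp = 34_042_810
known_genes, known_found = 967, 957
ests_total, ests_found = 133_168, 128_207

# Span implied for the six assemblies if the floating contigs are additive.
assembly_bp = genome_bp - floating_bp          # 33,817,471 bp, about 33,817 kb

gene_recovery = known_found / known_genes      # about 0.990
est_recovery = ests_found / ests_total         # about 0.963
floating_fraction = floating_bp / genome_bp    # about 0.0066, under 1%
```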

We conclude that the current assembly represents ≫95% of the chromosomal sequence (less than 1% of which is in floating contigs) and ≥99% of genes, with most of the missing sequence comprising complex or simple repeats. The most stringent test of the medium- to long-range accuracy of the assembly comes from comparison with the HAPPY maps. This is particularly true for chromosomes 4, 5 and 6, where HAPPY markers were used to nucleate contigs but not to guide their assembly or ordering, specifically to allow such a comparison to be made without circularity of argument. As can be seen, good agreement between map and sequence confirms the accuracy of the assembly (Fig. 1).

**Figure 1: Chromosomal assemblies compared against HAPPY map data.**

Sequence characteristics of the genome

The genome is (A + T)-rich (77.57%) and has a broadly uniform composition, apart from the more (G + C)-rich repeat-dense regions (Fig. 2). On a finer scale, nucleotide composition tracks the distribution of exons (see below). Among dinucleotides, CpG is under-represented, not just in absolute terms but also relative to its isomer GpC (the former occurring only 62% as often as the latter). This bias normally reflects cytosine methylation at CpG sequences, promoting their mutation to TpG (which is over-represented relative to GpT by 38%). Hence, these observations suggest that cytosine methylation may occur in Dictyostelium, contrary to earlier findings¹¹.
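
As an illustration of how such composition statistics are obtained, the following sketch counts overlapping dinucleotides and reports the ratio of one to another (for example CpG to GpC), together with the (A + T) fraction. The function names and the toy sequence are assumptions for illustration; no genome data are included.

```python
from collections import Counter


def dinucleotide_ratio(seq, a, b):
    """Ratio of overlapping occurrences of dinucleotide a to dinucleotide b."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    if counts[b] == 0:
        raise ValueError(f"{b} does not occur in the sequence")
    return counts[a] / counts[b]


def at_fraction(seq):
    """Fraction of bases that are A or T."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


toy = "AAGCTTGCATCGAT"                          # toy sequence, not genomic data
cpg_vs_gpc = dinucleotide_ratio(toy, "CG", "GC")
toy_at = at_fraction(toy)
```

Comparing CpG with its isomer GpC, rather than with an expectation from base frequencies alone, controls for the overall scarcity of G and C in an (A + T)-rich genome.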

**Figure 2: The genome of *Dictyostelium discoideum*.**

Simple sequence repeats are abundant and unusual

Simple sequence repeats (SSRs) are more abundant in Dictyostelium than in any other genome sequenced so far, comprising >11% of bases (Supplementary Fig. 2). In non-coding sequence, tracts of dinucleotides or longer motifs occur every 392 bp on average and comprise 6.4% of the bases. There is a bias towards repeat units of 3–6 bases, whereas dinucleotide tracts predominate in most other genomes. Homopolymer tracts are also abundant, comprising a further 16% of non-coding sequence. The base composition of non-coding SSRs and homopolymer tracts (99.2% A + T content) is even more biased than that of the surrounding sequence, suggesting that either selection or the mechanism of repeat expansion favours (A + T)-rich repeats.
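
A minimal sketch of SSR detection follows. It finds perfect tandem repeats of a short motif with a regular expression. The thresholds (units of 2 to 6 bases, at least four copies) and the function name are assumptions chosen for illustration, since the criteria used for the figures above are not restated here.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_copies=4):
    """Return (start, unit, copies) for perfect tandem repeats.

    The shortest repeating unit is preferred (lazy quantifier), and
    units made of a single base are skipped because they are
    homopolymer tracts rather than SSRs in the sense used here.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


toy_hits = find_ssrs("GGTAATAATAATAATCC")   # toy sequence
```

Imperfect repeats, which are common in real sequence, would need a tolerant matcher; this sketch deliberately handles only exact copies.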

Notably, SSRs are also abundant in protein-coding sequence, occurring on average every 724 bp within exons. We consider these coding SSRs in further detail below, in the context of proteins.

Transposable elements are clustered

The genome is rich in transposable elements^9,12. Completion of the sequence confirms the earlier observation that transposable elements of the same type are clustered, suggesting their preferential insertion within similar resident elements. However, none of the elements appears to use a specific sequence as a target for insertion: they insert at random within other elements of the same type. Non-long terminal repeat (LTR) retrotransposons are known to insert next to transfer RNA genes; we find many such instances (Fig. 2), but again no specific sequences were identified as insertion targets.

tRNAs are numerous and paired by specificity

The sequenced genome encodes 390 tRNAs, a number at the upper end of the eukaryotic spectrum (for example, Plasmodium falciparum = 43, Drosophila melanogaster = 284, Homo sapiens = 496). Allowing for the normal wobble rules in codon–anticodon pairing^13,14, every sense codon can be decoded, apart from the rare alanine codon GCG; we infer that the missing tRNA(s) lie in one or more gaps in the sequence. We also find a possible selenocysteine tRNA in the genome, as well as corresponding selenocysteine insertion targets in two predicted proteins (see Supplementary Fig. 3).

Dictyostelium, in common only with Acanthamoeba castellanii¹⁵, has been shown to lack certain apparently essential tRNAs in its mitochondrial genome¹⁶. It therefore seems likely that at least some chromosomally encoded tRNAs (those for valine, threonine, asparagine and glycine, as well as one arginine and two serine tRNAs) are imported into mitochondria.

Although the gross distribution of tRNAs is uniform, organization of tRNAs on a finer scale is striking: about 20% occur as pairs or triplets with identical anticodons (and usually 100% sequence identity), separated by <20 kb and often by <5 kb (Fig. 2). There are 41 such groups in the genome; a random distribution would produce few, if any. This pattern is unique among sequenced genomes, and suggests a wave of recent duplications. However, tRNA pairs are found in tandem, converging and diverging orientations with comparable frequencies, suggesting no straightforward duplication mechanism; nor is there usually duplication of extensive flanking sequences. Whether the preference of TRE elements for inserting adjacent to tRNAs is related to the large number and unusual distribution of tRNAs is unclear.
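
The grouping criterion described above (identical anticodons, neighbours separated by less than 20 kb) can be sketched as follows. The input layout, the function name and the toy coordinates are illustrative assumptions, not the annotation format used for the genome.

```python
from collections import defaultdict


def anticodon_groups(trnas, max_gap=20_000):
    """Group tRNA genes sharing chromosome and anticodon whose
    neighbouring members lie less than max_gap bp apart.

    trnas: iterable of (chromosome, position, anticodon) tuples.
    Returns a list of (chromosome, anticodon, positions) with >= 2 members.
    """
    by_key = defaultdict(list)
    for chrom, pos, anticodon in trnas:
        by_key[(chrom, anticodon)].append(pos)

    groups = []
    for (chrom, anticodon), positions in by_key.items():
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] < max_gap:
                current.append(pos)
            else:
                if len(current) > 1:
                    groups.append((chrom, anticodon, current))
                current = [pos]
        if len(current) > 1:
            groups.append((chrom, anticodon, current))
    return groups


# Toy coordinates for illustration only.
toy = [
    ("chr1", 10_000, "GAA"),
    ("chr1", 13_500, "GAA"),
    ("chr1", 900_000, "GAA"),
    ("chr2", 50_000, "TTC"),
    ("chr2", 58_000, "TTC"),
    ("chr2", 71_000, "TTC"),
    ("chr3", 5_000, "CAT"),
]
groups = anticodon_groups(toy)
```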

A chromosomal master copy of the extrachromosomal rDNA element

In Dictyostelium, ribosomal RNA genes lie on an 88-kb palindromic extrachromosomal element¹⁷, present at ∼100 copies per nucleus (Fig. 2). Evidence also exists of chromosomal copies: at least the central 3.2 kb of the element is located¹⁷ on chromosome 4, whereas chromosome 2 carries both a partial rDNA sequence and a 5S rRNA pseudogene^9,18.

In this study, two unanchored contigs assigned to chromosomes 4 and 5 contained junctions between rDNA sequences and complex repeats—attempts to extend the sequence and integrate these contigs into the assemblies failed owing to the highly repetitive nature of the adjoining sequences. We postulate that these contigs represent the junctions between a ‘master copy’ of the rDNA and the remainder of chromosome 4 (Fig. 2). One contig contains sequence matching a region of (G + C)-rich repeats near the centre of the palindrome, whereas the other matches sequence near the tip of the palindrome arm, adjacent to the one unclosed gap in the rDNA element sequence¹⁷. This gap is believed to represent a tandem array of short repeats, probably added post-synthetically to the extrachromosomal elements.

The structure of this master copy suggests a mechanism for generating the extrachromosomal copies by a process of transcription, hairpin formation and second-strand synthesis (Fig. 2). This process would account for the complete absence of sequence variation between the two arms of the palindrome.

Centromeres, telomeres and rearrangements

Repeat clusters may serve as centromeres

Centromeres mobilize eukaryotic chromosomes during cell division but vary widely in their structure and organization¹⁹, making them difficult to identify. Each Dictyostelium chromosome carries a single cluster of repeats rich in DIRS (Dictyostelium intermediate repeat sequence) elements^20,21 near one end²², and this sole but striking structural consistency suggests that these clusters may serve as centromeres. Although the repetitive nature of the chromosomal termini impeded their assembly, most of the cluster on chromosome 1 was assembled (Fig. 3) and shows a complex pattern of DIRS and related Skipper elements, each preferentially associated with others of the same type. Frequent insertions and partial deletions have created a mosaic with little long-range order.

**Figure 3: DIRS repeat region of chromosome 1.**

In Dictyostelium cells demonstrating condensed chromosomes characteristic of mitosis, DIRS-element probes hybridize to one end of each chromosome (Supplementary Fig. 4), consistent with the mapping data. DIRS-like elements in other species are more uniformly scattered along the chromosomes²³, suggesting that their restricted distribution in Dictyostelium chromosomes is functionally important. Furthermore, the DIRS-containing ends of the chromosomes cluster not only during mitosis, but also during interphase (Supplementary Fig. 4), as has been observed for centromeres in Schizosaccharomyces pombe²⁴.

rDNA sequences seem to act as telomeres

No (G + T)-rich telomere-like motifs were identified in the sequence; however, earlier findings²² suggested that the chromosomes terminate in the same (G + A)-rich repeat motif that caps the extrachromosomal rDNA element. We therefore surveyed all shotgun sequence to identify reads containing a junction between complex repetitive elements and rDNA-like sequence. Only 556 such reads were identified, of which 221 could be built into 13 contigs, which we refer to as C/R (complex-repeat/rDNA) junctions.

Of the 13 junctions, two represent known regions lying internally within the chromosomal assemblies. Of the remaining 11, one had twice the sequence coverage of the others, suggesting that it represents two distinct but identical portions of the genome (a possibility supported by the fact that another two of the junctions differed from each other by only two bases). Hence, we infer that the 11 remaining contigs represent 12 distinct junctions between repetitive elements and rDNA-like sequences—potentially one for every chromosomal end.

On the basis of their content of sequence reads from each of the whole-chromosome libraries, we assigned two of the C/R junctions to each of the chromosomes. Chromosomes 4 and 5 cannot be distinguished in this way, but three junctions, including the one believed to be present as two copies, are assigned to this chromosome pair. The point in the rDNA palindrome that is represented differs from one junction to the next (Supplementary Fig. 5), but several junctions fall at common parts of the palindrome. This may reflect a preference in the mechanism that forms or maintains the junctions, or may result from a homogenizing recombination between them or with other rDNA sequences. Certainly the low frequency of differences between the rDNA components of the junction fragments and the extrachromosomal rDNA element argues for some process that limits or rectifies mutation. At each junction, we see only the rDNA sequence that immediately adjoins the complex repeat, as further assembly is precluded by the multi-copy nature of rDNA. Therefore we cannot tell whether each junctional rDNA sequence extends to the telomere-repeat-carrying tip of the rDNA palindrome sequence, nor whether other sequences lie beyond the rDNA components.

HAPPY mapping of markers derived from six of these C/R junctions confirmed not only the chromosomal assignments that had been made based on the origins of their component sequences, but also their locations at the termini of the mapped regions of the chromosomes. For the other junctions, the absence of unique sequence features precluded such mapping. Taken as a whole, this evidence strongly suggests that rDNA-like elements form part of the telomere structure in D. discoideum, and that common mechanisms stabilize both the extrachromosomal rDNA element and the chromosomal termini.

Chromosome 2 duplication

Chromosome 2 of D. discoideum AX4 carries a perfect inverted 1.51-megabase (Mb) duplication (Fig. 2; see also refs 9, 25). This duplication, containing 608 genes, is known²⁵ to be absent from the wild-type isolate NC4 and from one of its direct descendants (AX2), but present in another (AX3); AX4 in turn is derived from AX3. The sequences adjoining the right-hand end of the duplication, namely a partial copy of a DIRS element (and a partial DDT-A element) and a region identical to part of the rDNA palindrome, both at about 3.74 Mb (Fig. 2), have been implicated in centromeric and telomeric functions, respectively, elsewhere in the genome.

We propose that this duplication arose from a ‘breakage-fusion-bridge’ cycle as first described in maize²⁶ and since observed in many genomes. The nearby DIRS and rDNA components, in this view, represent abortive attempts to stabilize the halves of the broken chromosome by establishing new telomeres and centromeres, followed by re-fusion of the pieces to create a restored and enlarged chromosome (Supplementary Fig. 6).

Chromosome 2 (the largest of the chromosomes, even discounting the duplication in AX4) may be prone to breakage: in the Bonner isolate of NC4, maintained in vegetative growth for 50 years, chromosome 2 is represented by two smaller fragments²⁷. Comparison with more recent data²² indicates that the break point in NC4-Bonner lies in the same region as the duplication in AX4, suggesting that NC4-Bonner underwent the early stages of this process, but that the chromosome fragments were stabilized and maintained after the initial breakage. Preliminary results (data not shown) from HAPPY mapping also suggest that although wild-type isolates V12M2 and NC4 both lack the duplication seen in AX4, NC4 may carry a duplication of ∼300 kb near the opposite end of chromosome 2.

Content and organization of the proteome

Prediction of protein-coding genes (see Methods) was performed on the complete set of chromosomes and floating contigs (Table 2). In assessing the completeness and accuracy of the predictions, we find that of the 957 well-characterized D. discoideum genes that are present in the current sequence, 823 (86%) are predicted as transcripts with structures matching the experimentally determined ones. For a further 123 (13%), the predicted transcript differs from the experimentally determined one, about one-half of these differing only in their 5′ boundary; the remaining 11 (1%), although present in the sequence, were not predicted as transcripts. Similarly, of the 128,207 qualified ESTs present in the current sequence, 127,097 (99.1%) fall within predicted transcripts. Combining our estimate of sequence coverage (above) with these estimates of the success of gene prediction, we infer that approximately 98% of all D. discoideum genes are present in the predicted set.

Table 2 Comparison between the predicted protein-coding gene set of D. discoideum and those of other organisms

The level of overprediction, conversely, is harder to estimate: prediction was performed generously to ensure that most true genes were represented. Of the 13,541 predicted proteins, 47.5% are represented by qualified ESTs, reflecting the inevitable bias in EST sampling. Among the shortest predicted proteins, fewer are represented by ESTs (for example, 21% of those of <60 amino acids); this is at least partly due to a higher level of overprediction. On the basis of the simplifying assumption that 50% of all genes coding for proteins of <100 amino acids are mis-predictions, we estimate the true number of genes at roughly 12,500. This number is closer to that seen in multicellular organisms than to that seen in most unicellular eukaryotes (Table 2). The same relative complexity is seen in the total number of amino acids encoded by the respective genomes; this measure of complexity is less affected by the inclusion of shorter (and hence more dubious) gene predictions. Introns in Dictyostelium are few and short, and intergenic regions are small, producing a compact genome of which 62% encodes protein.
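
The correction for overprediction amounts to simple arithmetic, sketched below. The number of predictions shorter than 100 amino acids is not stated in the text, so the value here is back-calculated from the rounded estimate of 12,500 and should be read as approximate. The function name is an illustrative assumption.

```python
predicted = 13_541          # predicted proteins (from the text)
estimated_true = 12_500     # rounded estimate (from the text)
mispredicted_share = 0.5    # simplifying assumption stated in the text

# Number of <100-residue predictions implied by the rounded estimate.
implied_short = round((predicted - estimated_true) / mispredicted_share)


def corrected_gene_count(n_predicted, n_short, share=mispredicted_share):
    """Predicted genes minus the assumed share of short mis-predictions."""
    return n_predicted - share * n_short


check = corrected_gene_count(predicted, implied_short)
```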

Genes are distributed approximately uniformly across the genome (Fig. 2). Although we do not see widespread clustering of genes with coordinated expression patterns (see Methods), we do find statistically significant (P < 0.01) clusters of genes expressed predominantly at some developmental stages or in specific cell types (Fig. 2).

(A + T)-richness influences protein composition and codon usage

Codon usage in Dictyostelium favours codons of the form NNT or NNA over their NNG or NNC synonyms, the bias being even greater than for the (A + T)-rich Plasmodium genome. Comparison of tRNA and codon frequencies (Supplementary Table 2) reveals a similar picture to that in human²⁸ and other eukaryotes, suggesting that the same use is made of ‘wobble’ and of base modifications (for example, of adenine to inosine in some tRNAs) to expand the effective repertoire of tRNAs.
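
The third-position bias can be measured directly from coding sequence. The sketch below reports the fraction of codons of the form NNT or NNA; the function name and the toy sequence are assumptions for illustration.

```python
def third_position_at_fraction(cds):
    """Fraction of codons in an in-frame coding sequence ending in A or T."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("length must be a non-zero multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return sum(codon[2] in "AT" for codon in codons) / len(codons)


# Toy coding sequence: ATG AAA TTA GGT CAA CCA TAA
toy_bias = third_position_at_fraction("ATGAAATTAGGTCAACCATAA")
```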

As in Plasmodium²⁹, the extreme (A + T)-richness is reflected not just in the choice of synonymous codons, but also in the amino acid composition of the proteins. Amino acids encoded solely by codons of the form WWN (where W indicates A or T and N indicates any base; these are Asn, Lys, Ile, Tyr and Phe) are much commoner in Dictyostelium proteins than in human ones; the reverse is true for those encoded solely by SSN codons (where S indicates C or G; these are Pro, Arg, Ala and Gly).

Geometry reflects phylogeny—duplications in the genome

The predicted gene set of Dictyostelium is rich in relatively recently duplicated genes. Of the 13,498 predicted proteins analysed, 3,663 fall into 889 families clustered by BLASTP similarities of e < 10^-40. Most (538) families contain only two members, but 351 families contain between three and 81 proteins (Supplementary Table 3). Hence, 2,774 (20%) of all predicted proteins have arisen by relatively recent duplication, potentially accounting for much of Dictyostelium's excess gene number compared with typical unicellular eukaryotes.
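
The count of duplicated proteins follows from treating one member of each family as the original copy and the remainder as duplicates. The arithmetic, using only the figures quoted above, is:

```python
proteins_analysed = 13_498
family_members = 3_663
families = 889
pair_families = 538

larger_families = families - pair_families          # 351
# One member per family is counted as the original; the rest are duplicates.
duplicates = family_members - families              # 2,774
duplicate_share = duplicates / proteins_analysed    # about 0.206, quoted as 20%
```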

We tried to infer the mechanisms by which such duplications arise and propagate in the genome. Where members of a family are clustered on one chromosome, the physical distance between family members often (23 out of 86 families examined) correlates strongly with their evolutionary divergence (see Methods). Where a family is split between different chromosomes, members on the same chromosome are often (23 out of 50 families examined) more related to each other than to members on different chromosomes; the reverse is never observed.

These findings suggest that three processes combine to account for most of the duplications in Dictyostelium: tandem duplication, local inversion and interchromosomal exchange. In this model, gene families expand by tandem duplication of either single genes or blocks containing several consecutive genes, as in an earlier model³⁰; inversions within these expanding clusters may reverse local gene order. An elegant illustration of these two processes is provided by a cluster of acetyl-coA synthetases on chromosome 2 (Fig. 4). The third process (exchange of segments between chromosomes) may fragment these clusters at any stage. If such an interchromosomal exchange splits a gene family early in its expansion, then each of the two resulting subfamilies has a long subsequent period of evolution independent of the other, so similarities will be greatest between genes on the same chromosome. If, conversely, the split occurs later, then all family members, whether on the same chromosome or on different chromosomes, will tend to resemble each other equally closely. We cannot exclude the possibility of duplication occasionally creating a second copy of a gene, or group of genes, directly on a different chromosome from the first. However, all instances that we have examined can be accounted for without such intermolecular duplication.

**Figure 4: Phylogeny of gene family members compared to their physical order.**

Amino acid repeats

Tandem repeats of trinucleotides (and of motifs of 6, 9, 12, and so on, bases) are unusually abundant in Dictyostelium exons and naturally correspond to repeated sequences of amino acids. However, at the protein level the situation is even more extreme: there are many further amino acid repeats that use different synonymous codons, and so do not arise from perfect nucleotide repeats. Among the predicted proteins, there are 9,582 SSRs of amino acids (homopolymers of length ≥10, or ≥5 consecutive repeats of a motif of two or more amino acids). Of these, the most striking are polyasparagine and polyglutamine tracts of ≥20 residues, present in 2,091 of the predicted proteins. Also abundant are low-complexity regions such as QLQLQQQQQQQLQLQQ: there are 2,379 tracts of ≥15 residues composed of only two different amino acids. In total, repeats or simple-sequence tracts of amino acids (even by these conservative definitions) occur in 34% of predicted proteins and encode 3.3% of all amino acids.
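
The two SSR definitions given above (homopolymers of at least 10 residues, or at least five consecutive copies of a motif of two or more residues) can be expressed as regular expressions. In this sketch the upper bound of 10 residues on motif length, the function name and the toy protein are assumptions added for illustration; tracts built from two different amino acids in irregular order, such as the QLQ example, are not covered.

```python
import re

HOMOPOLYMER = re.compile(r"([A-Z])\1{9,}")            # >= 10 identical residues
MOTIF_REPEAT = re.compile(r"([A-Z]{2,10}?)\1{4,}")    # >= 5 copies of a short motif


def amino_acid_ssrs(protein):
    """Return (kind, start, matched_text) tuples sorted by position."""
    protein = protein.upper()
    hits = [("homopolymer", m.start(), m.group(0))
            for m in HOMOPOLYMER.finditer(protein)]
    for m in MOTIF_REPEAT.finditer(protein):
        # Units made of one residue are already counted as homopolymers.
        if len(set(m.group(1))) > 1:
            hits.append(("motif", m.start(), m.group(0)))
    return sorted(hits, key=lambda h: h[1])


toy_protein = "MK" + "N" * 12 + "AS" + "QP" * 5 + "LE"   # toy sequence
toy_hits = amino_acid_ssrs(toy_protein)
```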

It seems likely that these repeats have arisen through nucleotide expansion, but have been selected at the protein level. Evidence for selection at the protein level is that any given trinucleotide repeat occurs predominantly in only one of the three reading frames. For example, the repeat …ACAACAACAACA… is usually translated as polyglutamine ([CAA]n) rather than polythreonine ([ACA]n) or polyasparagine ([AAC]n). Further evidence comes from the many trinucleotide repeats that have apparently mutated to produce only synonymous codons (for example, …GATGACGATGATGAC…, translated as polyaspartate). Moreover, the distribution of repeats and simple-sequence tracts is nonrandom: most proteins either have no such features (66% of proteins) or have two or more (18% of proteins), suggesting that they are tolerated only in certain types of protein. The polyasparagine- and polyglutamine-containing proteins appear to be over-represented in protein kinases, lipid kinases, transcription factors, RNA helicases and messenger RNA binding proteins such as spliceosome components (Supplementary Fig. 9). Protein kinases and transcription factors are also over-represented in the polyasparagine- and polyglutamine-containing proteins of Saccharomyces cerevisiae, so it is possible that these homopolymers serve some functional role in these protein classes. A more detailed analysis of amino acid homopolymers is given in Supplementary Tables 4–6 and Supplementary Figs 7–10.
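
The reading-frame argument can be made concrete by translating the example repeat in all three forward frames. The sketch below includes only the five codons needed for the two examples in the text (standard genetic code); the function name is an illustrative assumption, and any other codon is rendered as X.

```python
# Subset of the standard genetic code, sufficient for the examples in the text.
CODON_TABLE = {"ACA": "T", "CAA": "Q", "AAC": "N", "GAT": "D", "GAC": "D"}


def translate_frames(dna):
    """Translate a DNA string in the three forward reading frames."""
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


frames = translate_frames("ACAACAACAACA")
synonymous = translate_frames("GATGACGATGATGAC")[0]
```

The same nucleotide repeat yields polythreonine, polyglutamine or polyasparagine depending on frame, which is why a strong preference for one frame points to selection acting on the protein.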

Phylogeny, evolution and comparative proteomics

The organisms that diverged from the last common ancestor of all eukaryotes followed different evolutionary paths, but all retained the basic properties of eukaryotic cells. Their genomes have been sculpted by chromosomal deletions and duplications that led to lineage-specific gene family expansions, reductions and losses, as well as genes with new functions^31,32. Our analysis of Dictyostelium's proteome shows that similar mechanisms have shaped its genome, augmented by horizontal gene transfer from bacterial species.

Phylogeny of eukaryotes based on complete proteomes

Using morphological criteria, early workers were unsure whether to classify Dictyostelids as fungi or protozoa³³. Molecular methods indicated that they were amoebozoa and also suggested that Dictyostelium diverged from the line leading to animals at about the same time as plants^34,35. A study of more than 100 proteins suggested that Dictyostelium diverged after the plant–animal split, but before the divergence of the fungi³⁶. The recent finding of a gene fusion encoding three pyrimidine biosynthetic enzymes, shared only by Dictyostelium, fungi and Metazoa, indicates that the amoebozoa are a true sister group of the fungi and Metazoa³⁷.

To examine the phylogeny of Dictyostelium on a genomic scale, we applied an improved method for predicting orthologous protein clusters to complete eukaryotic proteomes³⁸ (for details, see Supplementary Information). The data were used to construct a phylogenetic tree that confirms the divergence of Dictyostelium along the branch leading to the Metazoa soon after the plant–animal split (Fig. 5). Despite the earlier divergence of Dictyostelium, many of its proteins are more similar to human orthologues than are those of S. cerevisiae, probably due to higher rates of evolutionary change along the fungal lineage. Whether the greater similarity between amoebozoa and Metazoa proteins translates into a generally higher degree of functional conservation between them compared to the fungi remains to be seen.

**Figure 5: Proteome-based eukaryotic phylogeny.**

Proteins shared by Dictyostelium and major organism groups

To examine shared functions, we identified eukaryote-specific Superfamily and Pfam protein domains, and sorted them according to their presence or absence within 12 completely sequenced genomes to arrive at their distribution among the major organismal groups (see Supplementary Tables 7–10 and Supplementary Fig. 11). Plants, Metazoa, fungi and Dictyostelium all share 32% of the eukaryotic Pfam domains (Fig. 6). The protein domains present in Dictyostelium, Metazoa and fungi, but absent in plants, are interesting because they probably arose soon after plants diverged and before Dictyostelium diverged from the line leading to animals. The major classes of domains in this group of proteins include those involved in small and large G-protein signalling (for example, RGS proteins), cell cycle control and other domains involved in signalling (Supplementary Tables 8 and 9). It also appears that glycogen storage and usage arose as a metabolic strategy soon after the plant–animal divergence, because glycogen synthetase seems to have appeared in this evolutionary interval.
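
The sorting of domains by presence or absence can be sketched as a simple set operation. The example below is only an illustration of the idea: the group names follow the text, but the domain identifiers (`dom1` to `dom4`) and the function `distribution` are hypothetical, and the real analysis used Superfamily and Pfam assignments across 12 genomes as described in the Supplementary Information.

```python
def distribution(presence):
    """Map each domain to the sorted tuple of groups in which it occurs."""
    domains = set().union(*presence.values())
    return {
        d: tuple(sorted(g for g, members in presence.items() if d in members))
        for d in sorted(domains)
    }

# Hypothetical presence/absence data: group -> set of domain identifiers.
presence = {
    "plants": {"dom1", "dom4"},
    "metazoa": {"dom1", "dom2", "dom3"},
    "fungi": {"dom1", "dom2"},
    "dictyostelium": {"dom1", "dom2", "dom3", "dom4"},
}

patterns = distribution(presence)

# Domains found in every group.
shared_by_all = [d for d, groups in patterns.items() if len(groups) == len(presence)]

# Domains present in Dictyostelium, Metazoa and fungi but absent in plants.
absent_in_plants = [
    d for d, groups in patterns.items()
    if groups == ("dictyostelium", "fungi", "metazoa")
]
```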

**Figure 6: Distribution of Pfam domains among eukaryotes.**

Particularly notable are the cases where otherwise ubiquitous domains appear to be completely absent in one group or another. For instance, Dictyostelium seems to have lost the genes that encode collagen domains, the circadian rhythm control protein timeless and basic helix–loop–helix transcription factors (Supplementary Table 7). Metazoa, on the other hand, appear to have lost receptor histidine kinases that are common in bacteria, plants and fungi, whereas Dictyostelium has retained and expanded its complement to 14 members³⁹.

Orthologues of human disease genes

An important motivation for sequencing the Dictyostelium genome was to aid the discovery of proteins that would facilitate studies of orthologues in human, with possible implications for human health. Although orthologues of human genes implicated in disease are of course present in many species, Dictyostelium provides a potentially valuable vehicle for studying their functions in a system that is experimentally tractable and intermediate in complexity between the yeasts and the higher multicellular eukaryotes. To assess the usefulness of Dictyostelium for investigating the functions of genes related to human disease we used the protein sequences of 287 confirmed human disease genes as queries and carried out a systematic search for putative orthologues in the Dictyostelium proteome⁴⁰. At a stringent threshold value of e ≤ 10^-40, we identified 64 such proteins. Of these, 33 were similar in length to the human protein and had similarity extending over >70% of the two proteins (Table 3). The number of Dictyostelium orthologues of human disease genes is lower than in D. melanogaster or Caenorhabditis elegans but higher than in S. cerevisiae or S. pombe. Of the 33 putative orthologues of confirmed human disease genes in Dictyostelium, five are absent in both S. cerevisiae and S. pombe (e-value ≤10^-30), a further four are absent from S. cerevisiae and two are not found in S. pombe.
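
The two-stage selection described above (an expectation cutoff followed by a coverage requirement) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Hit` record, its field names and the example identifiers are hypothetical, and "similar in length" is approximated here by requiring the alignment to span more than 70% of both proteins rather than by a separate length criterion.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query_id: str     # human disease protein (hypothetical identifier)
    subject_id: str   # Dictyostelium protein (hypothetical identifier)
    evalue: float     # expectation value of the match
    align_len: int    # number of aligned residues
    query_len: int    # length of the human protein
    subject_len: int  # length of the Dictyostelium protein

def putative_orthologues(hits, max_evalue=1e-40, min_coverage=0.70):
    """Keep hits that pass the e-value cutoff and whose alignment spans
    more than min_coverage of both the query and the subject protein."""
    kept = []
    for h in hits:
        if h.evalue > max_evalue:
            continue
        cov_query = h.align_len / h.query_len
        cov_subject = h.align_len / h.subject_len
        if cov_query > min_coverage and cov_subject > min_coverage:
            kept.append(h)
    return kept

example_hits = [
    Hit("HUMAN_A", "DICTY_1", 1e-80, 450, 500, 520),  # passes both filters
    Hit("HUMAN_B", "DICTY_2", 1e-55, 200, 900, 300),  # covers too little of the query
    Hit("HUMAN_C", "DICTY_3", 1e-12, 480, 500, 500),  # e-value above the cutoff
]
selected = [h.subject_id for h in putative_orthologues(example_hits)]
```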

Table 3 Dictyostelium genes related to human disease genes

Horizontal gene transfer

The acquisition of genes by horizontal transfer from one species to another (HGT) has become increasingly recognized as a mechanism of genome evolution^41,42,43. We identified 18 potential instances of HGTs, by screening Dictyostelium protein domains that are similar to bacteria-specific Pfam domains and have phyletic relationships consistent with HGT (see Supplementary Information). The transferred domains appear to have replaced functions, added new functions or evolved into new functions (Table 4). The thy1 gene, which encodes an alternative form of thymidylate synthase (ThyX), appears to have replaced the endogenous gene, as the conventional thymidylate synthase (ThyA) is not present⁴⁴. Other HGT domains also have established functions, which are presumably retained and give Dictyostelium the ability to degrade bacterial cell walls (dipeptidase), scavenge iron (siderophore), or resist the toxic effects of tellurite in the soil (terD). Still other horizontally transferred domains have become embedded within Dictyostelium genes that encode larger proteins. An example of this is the Cna B domain that is found within four large predicted proteins, one of which, colossin A, is predicted to be 1.2 MDa (Supplementary Fig. 12).

Table 4 Candidate horizontal gene transfers from bacteria

Dictyostelium ecology

Dictyostelium faces many complex ecological challenges in the soil. Amoebae, fungi and bacteria compete for limited resources in the soil while defending themselves against predation and toxins. For instance, the nematode C. elegans is a competitor for bacterial food and a predator of Dictyostelium amoebae, but also a potential dispersal agent for Dictyostelium spores⁴⁵. Dictyostelium has expanded its repertoire of several protein classes that are probably crucial for such interspecies interactions and for survival and motility in this complex ecosystem.

Polyketide synthases

A small number of natural products have already been identified from Dictyostelium, but the gene content suggests that it is a prolific producer of such molecules. Some of them may act as signals during development, such as the dichlorohexanophenone DIF-1, but others are likely to mediate currently unknown ecological interactions⁴⁶. Many antibiotics and secondary metabolites destined for export are produced by polyketide synthases, modular proteins of around 3,000 amino acids⁴⁷. We identified 43 putative polyketide synthases in Dictyostelium (see Supplementary Information). By contrast, S. cerevisiae completely lacks polyketide synthases and Neurospora crassa has only seven. Furthermore, two of the Dictyostelium proteins have an additional chalcone synthase domain, representing a type of polyketide synthase most typical of higher plants and found to be exclusively shared by Dictyostelium, fungi and plants. In addition to polyketide synthases, the predicted proteome has chlorinating and dechlorinating enzymes as well as O-methyl transferases, which could increase the diversity of natural products made. Thus, Dictyostelium appears to have a large secondary metabolism, which warrants further investigation.

ABC transporters

ATP-binding cassette (ABC) transporters are prevalent in the proteomes of soil microorganisms and are thought to provide resistance to xenobiotics through their ability to translocate small-molecule substrates across membranes against a substantial concentration gradient^48,49,50,51. There are 66 ABC transporters encoded by the genome, which can be classified according to the subfamilies defined in humans (ABCA, ABCB, ABCC, ABCD, ABCE, ABCF and ABCG) based on domain arrangement and signature sequences⁵². At least 20 of them are expressed during growth and are probably involved in detoxification and the export of endogenous secondary metabolites.

Cellulose degradation

Many of the predicted cellulose-degrading enzymes in the proteome (see Supplementary Information) that have secretion signals are expressed in growing cells that do not produce cellulose⁵³. The proteome also contains one xylanase enzyme that can degrade the xylan polymers that are often found associated with the cellulose of higher plants. Perhaps Dictyostelium uses these enzymes to degrade plant tissue into particles that are then taken up by cells. These enzymes may also aid in the breakdown of cellulose-containing microorganisms upon which Dictyostelium feeds. Alternatively, these enzymes may promote the growth of bacteria that can serve as food, because Dictyostelium's habitat also contains cellulose-degrading bacteria.

Specializations for cell motility

During both growth and development, Dictyostelium amoebae display motility that is characteristic of human leukocytes⁵⁴. As a consequence, studies of Dictyostelium have contributed significantly to cytoskeleton research⁵⁵. Dictyostelium's survival depends on an ability to efficiently sense, track and consume soil bacteria using sophisticated systems for chemotaxis and phagocytosis. Its multicellular development depends on chemotactic aggregation of individual amoebae and the coordinated movement of thousands of cells during fruiting body morphogenesis. The proteome reveals an astonishing assortment of proteins that are used for robust, dynamic control of the cytoskeleton during these processes. As suggested by functional parallels to human cells, these proteins are most similar to metazoan proteins in their variety and domain arrangements (Fig. 7; see also Supplementary Table 11). Surprisingly, although the actin cytoskeleton has been studied for over 25 years, 71 putative actin-binding proteins apparently escaped classical methods of discovery. For example, actobindins had not been previously recognized in Dictyostelium. Curiously, the actin depolymerization factor (ADF) and calponin homology (CH) domain proteins appear to have diversified by domain shuffling, a substantial fraction having domain combinations unique to Dictyostelium (Supplementary Table 12 and Supplementary Fig. 13). In addition to 30 actin genes, there are also orthologues of all actin-related protein (ARP) classes present in mammals, as well as three founding members of a new class (Supplementary Fig. 14).

**Figure 7: Microfilament system proteins.**

Cytoskeletal remodelling during chemotaxis and phagocytosis is regulated by a considerable number of upstream signalling components. Of the 18 Rho family GTPases in Dictyostelium, some are clear Rac orthologues and one belongs to the RhoBTB subfamily⁵⁶. However, the Cdc42 and Rho subfamilies characteristic of Metazoa and fungi are absent, as are the Rho subfamily effector proteins. The activities of these GTPases are regulated by two members of the RhoGDI family, by components of ELMO1–DOCK180 complexes and by a large number of proteins carrying RhoGEF and RhoGAP domains (>40 of each), most of which show domain compositions not found in other organisms. Remarkably, Dictyostelium appears to be the only lower eukaryote that possesses class I phosphatidylinositol-3-OH kinases, which are at the crossroads of several critical signalling pathways (for details of the regulators and their effectors, see Supplementary Table 13)⁵⁷. The diverse array of these regulators and the discovery of many additional actin-binding proteins suggest that there are many aspects of cytoskeletal regulation that have yet to be explored.

Multicellularity and development

The evolution of multicellularity was arguably as significant as the origin of the eukaryotic cell in enabling the diversification of life. The common unicellular ancestor of the crown group of organisms must have possessed the basic machinery to regulate nutrient uptake, metabolism, cellular defence and reproduction, and it is likely that these mechanisms were adapted to integrate the functions of cells in multicellular organisms. Dictyostelium achieved multicellularity through a different evolutionary route compared with plants and animals, yet the ancestors of these respective groups probably started with the same endowment of genes and faced the same problem of achieving cell specialization and tissue organization.

When starved, Dictyostelium develops as a true multicellular organism, organizing distinct tissues within a motile slug and producing a fruiting body comprised of a cellular, cellulosic stalk supporting a bolus of spores⁴. Thus, Dictyostelium has evolved differentiated cell types and the ability to regulate their proportions and morphogenesis. A broad survey of proteins required for multicellular development shows that Dictyostelium has retained cell adhesion and signalling modules normally associated exclusively with animals, whereas the structural elements of the fruiting body and terminally differentiated cells clearly derive from the control of cellulose deposition and metabolism now associated with plants. The Dictyostelium genome offers a first glimpse of how multicellularity evolved in the amoebozoan lineage. In the following sections, we consider some of the systems that are particularly relevant to cellular differentiation and integration in a multicellular organism.

Signal transduction through G-protein-coupled receptors

The needs of multicellular development add greatly to those of chemotaxis in demanding dynamically controlled and highly selective signalling systems. G-protein-coupled cell surface receptors (GPCRs) form the basis of such systems in many species, allowing the detection of a variety of environmental and intra-organismal signals such as light, Ca²⁺, odorants, nucleotides and peptides. They are subdivided into six families, which, despite their conserved secondary domain structure, do not share significant sequence similarity⁵⁸. Until recently, in Dictyostelium only the seven CAR/CRL (cAMP receptor/cAMP receptor-like) family GPCRs had been examined in detail^59,60. Surprisingly, a detailed search uncovered 48 additional putative GPCRs of which 43 can be grouped into the secretin (family 2), metabotropic glutamate/GABA_B (family 3) and the frizzled/smoothened (family 5) families of receptors (Fig. 8; see also Supplementary Information). The presence of family 2, 3 and 5 receptors in Dictyostelium was surprising because they had been thought to be specific to animals. Their occurrence in Dictyostelium suggests that they arose before the divergence of the animals and fungi and were later lost in fungi, and that the radiation of GPCRs pre-dates the divergence of the animals and fungi. The secretin family is particularly interesting because these proteins were thought to be of relatively recent origin, appearing closer to the time of the divergence of animals⁶¹. The putative Dictyostelium secretin GPCR does not contain the characteristic GPCR proteolytic site, but its transmembrane domains are clearly more closely related to secretin GPCRs than to other families (Fig. 8). Many downstream signalling components that transduce GPCR signals could also be recognized in the proteome, including heterotrimeric G-protein subunits (fourteen Gα, two Gβ and one Gγ proteins) and seven regulators of G-protein signalling (RGS) that share highest similarity with the R4 subfamily of mammalian RGS proteins.

**Figure 8: The G-protein-coupled receptors.**

SH2 domain signalling

In animals, SH2 domains act as regulatory modules of proteins in intracellular signalling cascades, interacting with phosphotyrosine-containing peptides in a sequence-specific manner. Dictyostelium is the only organism, outside of the animal kingdom, where SH2 domain phosphotyrosine signalling has been shown to occur⁶². What has been lacking in Dictyostelium is evidence of the other components of such signalling pathways; that is, equivalents of the metazoan SH2-domain-containing receptors, adaptors and targeting proteins. Three newly predicted proteins are strong candidates for these roles (Supplementary Fig. 15). One of them, CblA, is highly related to the metazoan Cbl proto-oncogene product. This is entirely unexpected because it is the first time that a Cbl homologue has been observed outside the animal kingdom. The Cbl protein is a ‘RING finger’ ubiquitin-protein ligase that recognizes activated receptor tyrosine kinases and various molecular adaptors⁶³. Remarkably, the Cbl SH2 domain went unrecognized in the protein sequence, but it was revealed when the crystal structure of the protein was determined⁶⁴. Thus, although SH2 domain proteins are less prevalent in Dictyostelium, there is the potential for the kind of complex interactions that typify metazoan SH2 signalling pathways.

ABC transporter signalling

Dictyostelium, like other organisms, has adapted ABC transporters to control various developmental signalling events. Several ABC transporters (TagA, TagB and TagC) are used for peptide-based signalling, similar to that previously observed for mating in S. cerevisiae and antigen presentation in human T cells^65,66,67. The novel domain arrangement of the Tag proteins—a serine protease domain fused to a single transporter domain—suggests that they have been selected for improved efficiency in signal production. Additional ABC transporters are needed for cell fate determination in Dictyostelium, suggesting that this ubiquitous protein family may be used in similar developmental contexts within many different species⁶⁸.

Kinases and transcription factors

Much cellular signal transduction involves the regulation of protein function through phosphorylation by protein kinases, often leading to the reprogramming of gene transcription in response to extracellular signals. The Dictyostelium proteome contains 295 predicted protein kinases, representing as wide a spectrum of kinase families as that observed in Metazoa (Supplementary Tables 14–16 and Supplementary Fig. 16). Given the presence of SH2-domain-based signalling it was surprising that no receptor tyrosine kinases could be recognized in the genome. However, Dictyostelium has a number of other receptor kinases, such as the histidine kinases and a group of eight novel putative receptor serine/threonine kinases, which are involved in nutrient and starvation sensing⁶⁹. Most of the ubiquitous families of transcription factors are represented in Dictyostelium, with the notable exception of the otherwise ubiquitous basic helix–loop–helix proteins (Supplementary Table 17 and Supplementary Fig. 17). Compared with other eukaryotes, Dictyostelium appears to have fewer transcription factors relative to the total number of genes, suggesting that many transcription factors have yet to be defined, or that the activities of a smaller repertoire of factors are combined and controlled to achieve complex regulation (Supplementary Table 18 and Supplementary Fig. 18).

Cell adhesion

Throughout Dictyostelium development, cells must modulate their adhesiveness to the substrate, to the extracellular matrix and to other cells in order to create tissues and carry out morphogenesis. To accomplish this, Dictyostelium uses a surprising number of components that have been normally only associated with animals. For example, disintegrin proteins regulate cell adhesiveness and differentiation in a number of Metazoa, and at least one Dictyostelium disintegrin, AmpA, is needed throughout development for cell fate specification⁷⁰. We also identified distant relatives of vinculin and α-catenin—normally associated with adherens junctions—which support the idea that the epithelium-like sheet of cells that surrounds the stalk tube contains such junctions⁷¹. Consistent with this, the Dictyostelium genome encodes numerous proteins previously described as components of adherens junctions in Metazoa, such as β-catenin (Aardvark), α-actinin, formins, VASP and myosin VII.

In animals, tandem repeats of immunoglobulin, cadherin, fibronectin III or E-set domains are often present in cell adhesion proteins, although their common protein fold pre-dates the emergence of eukaryotes. EGF/laminin domains are also found in adhesion proteins but, before the analysis of the Dictyostelium genome, no non-metazoan was known to have more than two EGF repeats in a single predicted protein. Dictyostelium has 61 predicted proteins containing repeated E-set or EGF/laminin domains, and many of these contain additional domains that suggest they have roles in cell adhesion or cell recognition, such as mannose-6-phosphate receptor, fibronectin III, or growth factor receptor domains and transmembrane domains (Fig. 9). In support of this idea, four of these proteins (LagC, LagD, AmpA and ComC) have been shown to be required for cell adhesion and signalling during development^70,72,73,74.

**Figure 9: Putative adhesion/signalling proteins.**

Cellulose-based structures

During development, Dictyostelium cells produce a number of cellulose-based structural elements. Dictyostelium slugs synthesize an extracellular matrix, or sheath, around themselves that is comprised of proteins and cellulose. Several of the smaller sheath proteins bind cellulose and are believed to have a role in slug migration, whereas the larger, cysteine-rich EcmA protein is essential for full integrity of the sheath and for establishing correct slug shape^75,76. During terminal differentiation, cellulose is deposited in the stalk and in the cell walls of the stalk and spore cells^77,78,79. The first confirmed eukaryotic gene for cellulose synthase was discovered in Dictyostelium and this gene has since been recognized in many plants, N. crassa and the ascidian Ciona intestinalis⁸⁰. The fungal and urochordate enzymes are more closely related to the Dictyostelium homologue than to plant or bacterial cellulose synthases, indicating that the common ancestor of fungi and animals carried a gene for cellulose synthase that was subsequently lost in most animals. The Dictyostelium genome encodes more than 40 additional proteins that are likely to be involved in cellulose synthesis or degradation, and are probably involved in the production and remodelling of cellulose fibres of the slug sheath, stalk tube and cell walls (see Supplementary Information).

The fundamental similarities in cellular cooperation found in Dictyostelium and in the Metazoa clearly resulted in a parallel positive selection for structural and regulatory genes required for cell motility, adhesion and signalling. Dictyostelium uses a set of signals and adhesion proteins that are distinct from those employed for similar purposes in Metazoa but, like the Metazoa, Dictyostelium has maintained a diversity of GPCRs, protein kinases and ABC transporters that enable it to respond to those signals. Dictyostelium has also retained and modified an organizational strategy perfected in plants, basing several structural elements on cellulose. At one level Dictyostelium has achieved multicellularity by using strategies that are similar to plants and Metazoa, but the differences between them suggest convergent evolution, rather than lineal descent from an ancestor with overt or latent multicellular capacities.

Conclusion

The complete protein repertoire of Dictyostelium provides a new perspective for studying its cellular and developmental biology. At a systems level, Dictyostelium offers a level of complexity that is greater than that of the yeasts, but much lower than that of plants or animals. Thus, high-resolution molecular analyses in this system may reveal control networks that are difficult to study in more complex systems, and may presage regulatory strategies used by higher organisms^81,82,83. At a practical level, the comparative genomics of Dictyostelium and related pathogens, such as Entamoeba histolytica, should aid in the functional definition of amoebozoa-specific genes that may open new avenues of research aimed at controlling amoebic diseases. Dictyostelium's adeptness at hunting bacteria also renders it susceptible to infections by intracellular bacterial pathogens^84,85. Dictyostelium and human macrophages display fundamental similarities in their cell biology, which has spurred the use of Dictyostelium as a model host for bacterial pathogenesis. It is also an attractive model in which to study other disease processes: for a number of human disease-related proteins, it provides a test-bed for studying their functions in a model organism that has greater similarity to higher eukaryotes than do the yeasts, yet shares the latter's experimental tractability.

The high frequency of repeated amino acid tracts in Dictyostelium proteins has long been known anecdotally, but we can now survey their precise nature and number, and find them to be more abundant than in any other sequenced genome. Many human diseases result from the expansion of triplet nucleotide repeats, some of which encode polyglutamine tracts that cause cell degeneration^86,87. Learning how Dictyostelium cells tolerate so many proteins with amino acid homopolymers will, we hope, help to elucidate the roles of these motifs in protein function and dysfunction.

Comparative genomic studies in eukaryotes are providing the raw material for global examinations of the evolution of cellular regulation and developmental mechanisms⁸⁸. Many genes have been lost in one species but retained in others, such that each new genome sequence adds to our understanding of the genetic complement of the eukaryotic progenitor. Thus, our understanding of eukaryotes will continue to be refined as more genome sequences become available from representatives of large groups of organisms whose genomes remain largely unexplored, such as the amoebozoa. The surprising molecular diversity of the Dictyostelium proteome, which includes protein assemblages usually associated with fungi, plants or animals, suggests that their last common ancestor had a greater number of genes than had been previously appreciated.

Methods

Details on the availability of reagents can be found in the Supplementary Information. All analyses described here were performed on Version 2.0 of the genome sequence. Updates to the sequence and annotation are available at http://www.dictybase.org and http://www.genedb.org/genedb/dicty/index.jsp. Further details of analyses not explicitly described below can be found in the Supplementary Information.

HAPPY mapping

A short-range (∼100-kb), high-resolution (∼8.54-kb) mapping panel was prepared as described⁹. Briefly, 96 aliquots each containing ∼0.52 haploid genome equivalents of sheared AX4 genomic DNA were pre-amplified by PEP (primer extension pre-amplification⁸⁹). A total of 4,913 STS markers (Supplementary Table 1) were typed by two-phase hemi-nested polymerase chain reaction (PCR; multiplexed for up to 1,200 markers in the first phase) on aliquots of the diluted PEP products. Maps were assembled from good-quality data essentially as described previously⁸. A second, longer-range (∼150 kb) mapping panel was used to confirm some linkages on chromosomes 2 and 5. HAPPY map analysis and PCR primer design for HAPPY mapping were performed using various custom programs (P.H.D. and A.T.B., unpublished).
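
The choice of roughly 0.52 haploid genome equivalents per aliquot can be made concrete with a short calculation. The sketch below assumes that copies of a marker are distributed among aliquots independently (Poisson sampling), which is an assumption introduced here for illustration; the function names are hypothetical, and the co-segregation measure is a crude stand-in for linkage scoring, not the statistic used by the authors' custom programs.

```python
import math

def fraction_positive(genome_equivalents):
    """Poisson probability that an aliquot holds at least one copy of a marker."""
    return 1.0 - math.exp(-genome_equivalents)

def cosegregation_fraction(marker_a, marker_b):
    """Fraction of aliquots in which two markers are both present or both absent.

    Each argument is a sequence of 0/1 typing results, one per aliquot.
    Closely linked markers tend to sit on the same DNA fragment and so agree
    more often than unlinked markers.
    """
    if len(marker_a) != len(marker_b):
        raise ValueError("marker vectors must cover the same aliquots")
    agree = sum(1 for a, b in zip(marker_a, marker_b) if a == b)
    return agree / len(marker_a)

# Expected share of positive aliquots at about 0.52 genome equivalents each.
p = fraction_positive(0.52)

# Expected number of positive aliquots in a 96-aliquot panel.
expected_positive_aliquots = 96 * p
```

Under this assumption each marker is expected to be present in about 40% of the aliquots, so both presence and absence are common enough for co-segregation between markers to be informative.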

Chromosome purification

Genomic DNA from D. discoideum strain AX4 was prepared and separated by pulsed field gel electrophoresis essentially as described^27,9, except that gels were run in stacked pairs; one member of each pair was stained with ethidium bromide, and bands excised from its unstained counterpart by alignment.

WCS and YAC subclone libraries

For WCS libraries, gel slices (above) were disrupted by several passages through a 30-gauge syringe needle, digested with β-agarase (NEB) and phenol-extracted. DNA was concentrated by ethanol precipitation, sonicated, end-blunted using mung bean nuclease and size-fractionated on 0.8% low-melting-point agarose gels. Fractions of 1.4–2 kb and 2–4 kb were excised, DNA extracted as before and ligated into the SmaI site of pUC18 or pUC19. Clone propagation and template preparation followed standard protocols.

For YAC subclone libraries, AX4-derived YACs were identified (and their position and integrity confirmed) by screening the set described by ref. 22 using markers from the HAPPY map. Subclones were prepared from PFG-purified YACs essentially as for the WCS libraries; contaminating yeast-derived sequences were filtered out in silico.

Sequencing and assembly

Details of the sequencing and assembly methods can be found in Supplementary Information. Generally, mapped sequence features were used to nucleate sequence contigs assembled from the WCS data, and extended using read-pair information and iterative searches for overlapping sequences, followed by directed gap closure using a range of approaches.

Fluorescent in situ hybridization

In situ hybridization was performed as in ref. 17.

Gene prediction and identification of sequence features

Full details are provided in the Supplementary Information. Briefly, automated gene prediction was performed using a combination of programs that had been trained on well-characterized D. discoideum genes, and the results integrated with reference to D. discoideum complementary DNA sequences and homology to genes in other species. Other features in the predicted proteins, and other sequence features, were identified using a variety of software packages.

Analysis of functional gene clustering

Microarray targets (refs 53, 90, 91; and N. Van Driessche and G. Shaulsky, unpublished data) and gene models were mapped onto the genome sequence using BLAST⁹² and the modified LIS algorithm⁹³. To look for clustering of genes with correlated temporal expression profiles, pairwise correlation coefficients were calculated for genes with known expression profiles on each chromosome⁹¹. Blocks of ≥6 consecutive genes were sought, for which either (1) all pairwise correlation coefficients were positive and ≥70% were >0.2 (genes with similar developmental trajectories) or (2) each gene had a partner with an absolute correlation coefficient value of >0.6 (tightly co-regulated genes); no statistically significant clusters met these criteria.
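
The two block criteria can be expressed compactly in code. The following is a minimal sketch, assuming a precomputed matrix `corr` of pairwise correlation coefficients for the genes of one chromosome in chromosomal order; the function names and the toy matrix are hypothetical, and the assessment of statistical significance is not shown.

```python
from itertools import combinations

def meets_criterion_1(corr, block):
    """All pairwise correlations positive and at least 70% of them above 0.2."""
    values = [corr[i][j] for i, j in combinations(block, 2)]
    if any(v <= 0 for v in values):
        return False
    return sum(v > 0.2 for v in values) >= 0.7 * len(values)

def meets_criterion_2(corr, block):
    """Every gene has a partner in the block with |correlation| above 0.6."""
    return all(
        any(abs(corr[i][j]) > 0.6 for j in block if j != i)
        for i in block
    )

def candidate_blocks(corr, min_size=6):
    """Return (start, end) index pairs of runs of at least min_size
    consecutive genes that satisfy either criterion."""
    n = len(corr)
    found = []
    for size in range(min_size, n + 1):
        for start in range(n - size + 1):
            block = range(start, start + size)
            if meets_criterion_1(corr, block) or meets_criterion_2(corr, block):
                found.append((start, start + size - 1))
    return found

# Toy matrix: genes 0-5 are mutually correlated (0.5); gene 6 is not (-0.1).
n_genes = 7
toy = [
    [1.0 if i == j else (0.5 if max(i, j) < 6 else -0.1) for j in range(n_genes)]
    for i in range(n_genes)
]
blocks = candidate_blocks(toy)
```

In the toy example only the run of genes 0 to 5 qualifies, through the first criterion; any block that includes gene 6 fails both tests.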

To look for clustering of genes associated with specific developmental stages^94,95 or cell types^90,96, the genome was scanned with various sized windows⁹⁷ for regions with significant (P < 0.01) over-representation of genes in any one of these groups.
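
A window scan of this kind can be sketched with a one-sided hypergeometric test. This is an illustration only: the choice of test, the function names and the toy data are assumptions made here, the actual procedure is the one given in the cited reference, and no correction for multiple testing is applied in the sketch.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the group of interest."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)) / total

def enriched_windows(flags, window, alpha=0.01):
    """Scan a chromosome with a fixed-size window of consecutive genes.

    flags[i] is True if gene i (in chromosomal order) belongs to the group.
    Returns (start, count, p) for every window with p < alpha.
    """
    N, K = len(flags), sum(flags)
    hits = []
    for start in range(N - window + 1):
        count = sum(flags[start:start + window])
        p = hypergeom_sf(count, N, K, window)
        if p < alpha:
            hits.append((start, count, p))
    return hits

# Toy chromosome of 100 genes in which genes 40-44 all belong to the group.
flags = [40 <= i <= 44 for i in range(100)]
hits = enriched_windows(flags, window=5)
window_starts = [start for start, _, _ in hits]
```

Repeating the scan with several window sizes, as the text describes, only requires calling `enriched_windows` once per size.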

Analysis of duplicated genes

Predicted protein sequences were clustered using TribeMCL⁹⁸, using a BLASTP expectation of <10^-40 as a cutoff. A χ² test rejected the hypothesis that members of a family are randomly distributed in the genome. Within each family, protein divergences (similarity distances computed using the ‘Protdist’ module of PHYLIP; http://evolution.genetics.washington.edu/phylip.html) and physical intergenic distances between all pairs of family members were tabulated, and the correlation coefficient between the former and latter values was calculated. Analysis was performed on the 86 gene families (representing 155 gene pairs) with at least 10 intrachromosomal distance pairings to provide robust statistical confidence.
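
The per-family comparison of protein divergence with physical distance amounts to a correlation over gene pairs. The sketch below assumes a Pearson coefficient, which the text does not specify, and uses invented values purely to show the calculation; the function name `pearson` and the `pairs` data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented gene pairs within one family:
# (protein divergence, physical intergenic distance in kb).
pairs = [(0.05, 2.0), (0.10, 4.0), (0.20, 8.0), (0.40, 16.0)]
divergences = [d for d, _ in pairs]
distances = [k for _, k in pairs]
r = pearson(divergences, distances)
```

A positive coefficient, as in this toy case, would indicate that more diverged family members tend to lie further apart on the chromosome.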

Other sequence analyses and graphical representation

Other sequence analyses (nucleotide and dinucleotide composition; identification of simple-sequence repeats in nucleotide and protein sequence; coding density computation; tRNA cluster identification) were performed using a range of custom software (P.H.D. and A.T.B., unpublished). Graphical representation of chromosomes in Fig. 2 was done primarily using Cinema4D-8.5 (Maxon Computer GmbH) after pre-processing using custom software (P.H.D.).

References

Kessin, R. H. Dictyostelium—Evolution, Cell Biology, and the Development of Multicellularity, xiv, 294 (Cambridge Univ. Press, Cambridge, 2001)
Konijn, T. M. et al. The acrasin activity of adenosine-3′,5′-cyclic phosphate. Proc. Natl Acad. Sci. USA 58, 1152–1154 (1967)
Müller, K. & Gerisch, G. A specific glycoprotein as the target site of adhesion blocking Fab in aggregating Dictyostelium cells. Nature 274, 445–449 (1978)
Raper, K. B. Pseudoplasmodium formation and organization in Dictyostelium discoideum. J. Elisha Mitchell Sci. Soc. 56, 241–282 (1940)
Raper, K. B. Dictyostelium discoideum, a new species of slime mold from decaying forest leaves. J. Agr. Res. 50, 135–147 (1935)
Knecht, D. A., Cohen, S. M., Loomis, W. F. & Lodish, H. F. Developmental regulation of Dictyostelium discoideum actin gene fusions carried on low-copy and high-copy transformation vectors. Mol. Cell. Biol. 6, 3973–3983 (1986)
Dear, P. H. & Cook, P. R. HAPPY mapping—linkage mapping using a physical analog of meiosis. Nucleic Acids Res. 21, 13–20 (1993)
Konfortov, B. A., Cohen, H. M., Bankier, A. T. & Dear, P. H. A high-resolution HAPPY map of Dictyostelium discoideum chromosome 6. Genome Res. 10, 1737–1742 (2000)
Glöckner, G. et al. Sequence and analysis of chromosome 2 of Dictyostelium discoideum. Nature 418, 79–85 (2002)
Urushihara, H. et al. Analyses of cDNAs from growth and slug stages of Dictyostelium discoideum. Nucleic Acids Res. 32, 1647–1653 (2004)
Smith, S. S. & Ratner, D. I. Lack of 5-methylcytosine in Dictyostelium discoideum DNA. Biochem. J. 277, 273–275 (1991)
Glöckner, G. et al. The complex repeats of Dictyostelium discoideum. Genome Res. 11, 585–594 (2001)
Crick, F. H. Codon-anticodon pairing: the wobble hypothesis. J. Mol. Biol. 19, 548–555 (1966)
Soll, D. & RajBhandary, U. (ed.) tRNA: Structure, Biosynthesis and Function (ASM, Washington DC, 1995)
Burger, G., Plante, I., Lonergan, K. M. & Gray, M. W. The mitochondrial DNA of the amoeboid protozoon, Acanthamoeba castellanii: complete sequence, gene content and genome organization. J. Mol. Biol. 245, 522–537 (1995)
Ogawa, S. et al. The mitochondrial DNA of Dictyostelium discoideum: complete sequence, gene content and genome organization. Mol. Gen. Genet. 263, 514–519 (2000)
Sucgang, R. et al. Sequence and structure of the extrachromosomal palindrome encoding the ribosomal RNA genes in Dictyostelium. Nucleic Acids Res. 31, 2361–2368 (2003)
Szafranski, K., Dingermann, T., Glöckner, G. & Winckler, T. Template jumping by a LINE reverse transcriptase has created a SINE-like 5S rRNA retropseudogene in Dictyostelium. Mol. Genet. Genom. 271, 98–102 (2004)
Eichler, E. E. & Sankoff, D. Structural dynamics of eukaryotic chromosome evolution. Science 301, 793–797 (2003)
Cappello, J., Cohen, S. M. & Lodish, H. F. Dictyostelium transposable element DIRS-1 preferentially inserts into DIRS-1 sequences. Mol. Cell. Biol. 4, 2207–2213 (1984)
Cappello, J., Handelsman, K. & Lodish, H. F. Sequence of Dictyostelium DIRS-1: an apparent retrotransposon with inverted terminal repeats and an internal circle junction sequence. Cell 43, 105–115 (1985)
Loomis, W. F., Welker, D., Hughes, J., Maghakian, D. & Kuspa, A. Integrated maps of the chromosomes in Dictyostelium discoideum. Genetics 141, 147–157 (1995)
Goodwin, T. J. & Poulter, R. T. Multiple LTR-retrotransposon families in the asexual yeast Candida albicans. Genome Res. 10, 174–191 (2000)
Appelgren, H., Kniola, B. & Ekwall, K. Distinct centromere domain structures with separate functions demonstrated in live fission yeast cells. J. Cell Sci. 116, 4035–4042 (2003)
Kuspa, A., Maghakian, D., Bergesch, P. & Loomis, W. F. Physical mapping of genes to specific chromosomes in Dictyostelium discoideum. Genomics 13, 49–61 (1992)
McClintock, B. The production of homozygous deficient tissues with mutant characteristics by means of the aberrant mitotic behaviour of ring-shaped chromosomes. Genetics 23, 315–376 (1938)
Cox, E. C., Vocke, C. D., Walter, S., Gregg, K. Y. & Bain, E. S. Electrophoretic karyotype for Dictyostelium discoideum. Proc. Natl Acad. Sci. USA 87, 8247–8251 (1990)
International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–941 (2001)
Gardner, M. J. et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419, 498–511 (2002)
Trusov, Y. A. & Dear, P. H. A molecular clock based on the expansion of gene families. Nucleic Acids Res. 24, 995–999 (1996)
Kellis, M., Birren, B. W. & Lander, E. S. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428, 617–624 (2004)
Dujon, B. et al. Genome evolution in yeasts. Nature 430, 35–44 (2004)
Raper, K. B. The Dictyostelids (Princeton Univ. Press, Princeton, New Jersey, 1984)
Loomis, W. F. & Smith, D. W. Consensus phylogeny of Dictyostelium. Experientia 51, 1110–1115 (1995)
Baldauf, S. L. & Doolittle, W. F. Origin and evolution of the slime molds (Mycetozoa). Proc. Natl Acad. Sci. USA 94, 12007–12012 (1997)
Bapteste, E. et al. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl Acad. Sci. USA 99, 1414–1419 (2002)
Nara, T., Hashimoto, T. & Aoki, T. Evolutionary implications of the mosaic pyrimidine-biosynthetic pathway in eukaryotes. Gene 257, 209–222 (2000)
Olsen, R. & Loomis, W. F. A model of orthologous protein sequence divergence. J. Mol. Evol. (in the press)
Thomason, P. & Kay, R. Eukaryotic signal transduction via histidine-aspartate phosphorelay. J. Cell Sci. 113, 3141–3150 (2000)
Fortini, M. E., Skupski, M. P., Boguski, M. S. & Hariharan, I. K. A survey of human disease gene counterparts in the Drosophila genome. J. Cell Biol. 150, F23–F30 (2000)
Jain, R., Rivera, M. C., Moore, J. E. & Lake, J. A. Horizontal gene transfer in microbial genome evolution. Theor. Popul. Biol. 61, 489–495 (2002)
Richards, T. A., Hirt, R. P., Williams, B. A. & Embley, T. M. Horizontal gene transfer and the evolution of parasitic protozoa. Protist 154, 17–32 (2003)
Iyer, L. M., Aravind, L., Coon, S. L., Klein, D. C. & Koonin, E. V. Evolution of cell-cell signaling in animals: did late horizontal gene transfer from bacteria have a role? Trends Genet. 20, 292–299 (2004)
Myllykallio, H. et al. An alternative flavin-dependent mechanism for thymidylate synthesis. Science 297, 105–107 (2002)
Kessin, R. H., Gundersen, G. G., Zaydfudim, V., Grimson, M. & Blanton, R. L. How cellular slime molds evade nematodes. Proc. Natl Acad. Sci. USA 93, 4857–4861 (1996)
Morris, H. R., Taylor, G. W., Masento, M. S., Jermyn, K. A. & Kay, R. R. Chemical structure of the morphogen differentiation inducing factor from Dictyostelium discoideum. Nature 328, 811–814 (1987)
Cane, D. E., Walsh, C. T. & Khosla, C. Harnessing the biosynthetic code: combinations, permutations, and mutations. Science 282, 63–68 (1998)
Holland, I. B. & Blight, M. A. ABC-ATPases, adaptable energy generators fuelling transmembrane movement of a variety of molecules in organisms from bacteria to humans. J. Mol. Biol. 293, 381–399 (1999)
Andrade, A. C., Van Nistelrooy, J. G., Peery, R. B., Skatrud, P. L. & De Waard, M. A. The role of ABC transporters from Aspergillus nidulans in protection against cytotoxic agents and in antibiotic production. Mol. Gen. Genet. 263, 966–977 (2000)
Mendez, C. & Salas, J. A. The role of ABC transporters in antibiotic-producing organisms: drug secretion and resistance mechanisms. Res. Microbiol. 152, 341–350 (2001)
Schoonbeek, H. J., Raaijmakers, J. M. & De Waard, M. A. Fungal ABC transporters and microbial interactions in natural environments. Mol. Plant Microbe Interact. 15, 1165–1172 (2002)
Anjard, C. & Loomis, W. F. Evolutionary analyses of ABC transporters of Dictyostelium discoideum. Eukaryot. Cell 1, 643–652 (2002)
Iranfar, N., Fuller, D. & Loomis, W. F. Genome-wide expression analyses of gene regulation during early development of Dictyostelium discoideum. Eukaryot. Cell 2, 664–670 (2003)
Devreotes, P. N. & Zigmond, S. H. Chemotaxis in eukaryotic cells: A focus on leukocytes and Dictyostelium. Annu. Rev. Cell Biol. 4, 649–686 (1988)
Noegel, A. A. & Schleicher, M. The actin cytoskeleton of Dictyostelium: a story told by mutants. J. Cell Sci. 113, 759–766 (2000)
Rivero, F. & Somesh, B. P. Signal transduction pathways regulated by Rho GTPases in Dictyostelium. J. Muscle Res. Cell Motil. 23, 737–749 (2002)
Merlot, S. & Firtel, R. A. Leading the way: directional sensing through phosphatidylinositol 3-kinase and other signalling pathways. J. Cell Sci. 116, 3471–3478 (2003)
Bockaert, J. & Pin, J. P. Molecular tinkering of G protein-coupled receptors: an evolutionary success. EMBO J. 18, 1723–1729 (1999)
Ginsburg, G. T. et al. The regulation of Dictyostelium development by transmembrane signalling. J. Eukaryot. Microbiol. 42, 200–205 (1995)
Raisley, B., Zhang, M., Hereld, D. & Hadwiger, J. A. A cAMP receptor-like G protein-coupled receptor with roles in growth regulation and development. Dev. Biol. 265, 433–445 (2004)
King, N., Hittinger, C. T. & Carroll, S. B. Evolution of key cell signaling and adhesion protein families predates animal origins. Science 301, 361–363 (2003)
Kawata, T. et al. SH2 signaling in a lower eukaryote: A STAT protein that regulates stalk cell differentiation in Dictyostelium. Cell 89, 909–916 (1997)
Thien, C. B. & Langdon, W. Y. Cbl: many adaptations to regulate protein tyrosine kinases. Nature Rev. Mol. Cell Biol. 2, 294–307 (2001)
Meng, W. et al. Structure of the amino-terminal domain of Cbl complexed to its binding site on ZAP-70 kinase. Nature 398, 84–90 (1999)
Shaulsky, G., Kuspa, A. & Loomis, W. F. A multidrug resistance transporter serine protease gene is required for prestalk specialization in Dictyostelium. Genes Dev. 9, 1111–1122 (1995)
Anjard, C., Zeng, C., Loomis, W. F. & Nellen, W. Signal transduction pathways leading to spore differentiation in Dictyostelium discoideum. Dev. Biol. 193, 146–155 (1998)
Good, J. R. et al. TagA, a putative serine protease/ABC transporter of Dictyostelium that is required for cell fate determination at the onset of development. Development 130, 2953–2965 (2003)
Good, J. R. & Kuspa, A. Evidence that a cell-type-specific efflux pump regulates cell differentiation in Dictyostelium. Dev. Biol. 220, 53–61 (2000)
Chibalina, M. V., Anjard, C. & Insall, R. H. Gdt2 regulates the transition of Dictyostelium cells from growth to differentiation. BMC Dev. Biol. 4, 8 (2004)
Blumberg, D. D., Ho, H. N., Petty, C. L., Varney, T. R. & Gandham, S. AmpA, a modular protein containing disintegrin and ornatin domains, has multiple effects on cell adhesion and cell fate specification. J. Muscle Res. Cell Motil. 23, 817–828 (2002)
Grimson, M. J. et al. Adherens junctions and β-catenin-mediated cell signalling in a non-metazoan organism. Nature 408, 727–731 (2000)
Dynes, J. L. et al. LagC is required for cell-cell interactions that are essential for cell-type differentiation in Dictyostelium. Genes Dev. 8, 948–958 (1994)
Wang, J. et al. The membrane glycoprotein gp150 is encoded by the lagC gene and mediates cell-cell adhesion by heterophilic binding during Dictyostelium development. Dev. Biol. 227, 734–745 (2000)
Kibler, K., Svetz, J., Nguyen, T. L., Shaw, C. & Shaulsky, G. A cell-adhesion pathway regulates intercellular communication during Dictyostelium development. Dev. Biol. 264, 506–521 (2003)
Morrison, A. et al. Disruption of the gene encoding the EcmA, extracellular matrix protein of Dictyostelium alters slug morphology. Dev. Biol. 163, 457–466 (1994)
Wang, Y. Z., Slade, M. B., Gooley, A. A., Atwell, B. J. & Williams, K. L. Cellulose-binding modules from extracellular matrix proteins of Dictyostelium discoideum stalk and sheath. Eur. J. Biochem. 268, 4334–4345 (2001)
Freeze, H. & Loomis, W. F. Chemical analysis of stalk components of Dictyostelium discoideum. Biochim. Biophys. Acta 539, 529–537 (1978)
Zhang, P., McGlynn, A., Loomis, W. F., Blanton, R. L. & West, C. M. Spore coat formation and timely sporulation depend on cellulose in Dictyostelium. Differentiation 67, 72–79 (2001)
West, C. M., Zhang, P., McGlynn, A. C. & Kaplan, L. Outside-in signaling of cellulose synthesis by a spore coat protein in Dictyostelium. Eukaryot. Cell 1, 281–292 (2002)
Blanton, R. L., Fuller, D., Iranfar, N., Grimson, M. J. & Loomis, W. F. The cellulose synthase gene of Dictyostelium. Proc. Natl Acad. Sci. USA 97, 2391–2396 (2000)
Thomason, P. A. et al. An intersection of the cAMP/PKA and two-component signal transduction systems in Dictyostelium. EMBO J. 17, 2838–2845 (1998)
Maeda, M. et al. Periodic signaling controlled by an oscillatory circuit that includes protein kinases ERK2 and PKA. Science 304, 875–878 (2004)
Soler-Lopez, M. et al. Structure of an activated Dictyostelium STAT in its DNA-unbound form. Mol. Cell 13, 791–804 (2004)
Solomon, J. M., Rupper, A., Cardelli, J. A. & Isberg, R. R. Intracellular growth of Legionella pneumophila in Dictyostelium discoideum, a system for genetic analysis of host-pathogen interactions. Infect. Immun. 68, 2939–2947 (2000)
Skriwan, C. et al. Various bacterial pathogens and symbionts infect the amoeba Dictyostelium discoideum. Int. J. Med. Microbiol. 291, 615–624 (2002)
Zoghbi, H. Y. & Orr, H. T. Glutamine repeats and neurodegeneration. Annu. Rev. Neurosci. 23, 217–247 (2000)
Brown, L. Y. & Brown, S. A. Alanine tracts: the expanding story of human illness and trinucleotide repeats. Trends Genet. 20, 51–58 (2004)
Rubin, G. M. et al. Comparative genomics of the eukaryotes. Science 287, 2204–2215 (2000)
Zhang, L. et al. Whole genome amplification from a single cell: Implications for genetic analysis. Proc. Natl Acad. Sci. USA 89, 5847–5851 (1992)
Iranfar, N. et al. Expression patterns of cell-type-specific genes in Dictyostelium. Mol. Biol. Cell 12, 2590–2600 (2001)
Van Driessche, N. et al. A transcriptional profile of multicellular development in Dictyostelium discoideum. Development 129, 1543–1552 (2002)
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
Zhang, H. Alignment of BLAST high-scoring segment pairs based on the longest increasing subsequence algorithm. Bioinformatics 19, 1391–1396 (2003)
Katoh, M. et al. An orderly retreat: dedifferentiation is a regulated process. Proc. Natl Acad. Sci. USA 101, 7005–7010 (2004)
Xu, Q. et al. Transcriptional transitions during Dictyostelium spore germination. Eukaryot. Cell 3, 1101–1110 (2004)
Maeda, M. et al. Changing patterns of gene expression in Dictyostelium prestalk cell subtypes recognized by in situ hybridization with genes from microarray analyses. Eukaryot. Cell 2, 627–637 (2003)
Cohen, B. A., Mitra, R. D., Hughes, J. D. & Church, G. M. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nature Genet. 26, 183–186 (2000)
Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002)


Acknowledgements

Sequencing and analysis of chromosomes 1, 2 and 3 were supported by grants from the DFG and by Köln Fortune, and that of chromosomes 4, 5 and 6 in the USA by grants from NICHD/NIH. Work in the UK/European Union (EU) was supported by a programme grant from the MRC to J.W., R.R.K., B.B. and P.H.D., and by the EU. Analyses at dictyBase were supported by a grant from the NIGMS/NIH to R.L.C. The Dictyostelium cDNA project was supported by Research for the Future of JSPS and by Grants-in-Aid for Scientific Research on Priority Areas of MEXT of Japan. The German team wishes to thank S. Förste, N. Zeisse, S. Rothe, S. Landmann, R. Schultz, C. Neuhoff and R. Müller for technical assistance. The US team thanks S. Kaminsky, S. Klein and T. Hewitt for their scientific foresight in the early stages of this project and for their support of Dictyostelium as a model system, and H. Hosak, O. Delgado, L. Lewis, K. Hamilton, J. Hume, C. Kovar Smith, D. Neal, P. Havlak, K. J. Durbin and P. Burch of the HGSC at Baylor College of Medicine. R.S. thanks L. Cortez, E. Joyner and B. Hill for their assistance during the course of the project. C.B. thanks M. Veron and P. Glaser for their support and discussions. P.H.D. thanks H. O'Hare for early involvement in the project, A. Ivens for discussions, and the MRC Centre Visual Aids department, Cambridge, for advice on graphics. We also thank S. Bowman and D. Lawson for their contribution to the EUDICT region in the initial stages of the project, and D. Martin at the Wellcome Trust Biocentre, University of Dundee, for running the GOtcha search of our gene models. The Japanese cDNA project thanks N. Ogasawara and I. Takeuchi for comments and encouragement, and others who participated in earlier stages of the project.

Author contributions

M. Platzer, R. R. Kay, J. Williams, P. H. Dear, A. A. Noegel, B. Barrell and A. Kuspa are co-senior authors.

Author information

N. Hall
Present address: The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland, 20850, USA
L. Eichinger, J. A. Pachebat, G. Glöckner, M.-A. Rajandream and R. Sucgang: *These authors contributed equally to this work

Authors and Affiliations

Center for Biochemistry and Center for Molecular Medicine Cologne, University of Cologne, Joseph-Stelzmann-Str. 52, 50931, Cologne, Germany
L. Eichinger, J. A. Pachebat, B. Tunggal, F. Rivero, P. Farbrother & A. A. Noegel
Laboratory of Molecular Biology, MRC Centre, CB2 2QH, Cambridge, UK
J. A. Pachebat, S. Kummerfeld, M. Madera, B. A. Konfortov, A. T. Bankier, M. Madan Babu, A. Wardroper, R. R. Kay & P. H. Dear
Genome Analysis, Institute for Molecular Biotechnology, Beutenbergstr. 11, D-07745, Jena, Germany
G. Glöckner, K. Szafranski, R. Lehmann, M. Felder, A. Rosenthal & M. Platzer
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
M.-A. Rajandream, M. Berriman, N. Hamlin, R. Davies, D. Saunders, P. Davis, A. Kerhornou, N. Hall, N. Bason, C. Churcher, J. Cooper, A. Cronin, I. Goodhead, T. Mourier, A. Pain, D. Harper, H. Hauser, K. James, D. Johnson, A. Knights, K. Mungall, K. Oliver, C. Price, M. A. Quail, E. Rabbinowitsch, M. Sanders, S. Sharp, M. Simmonds, S. Spiegler, A. Tivey, B. White, D. Walker, J. Woodward & B. Barrell
Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, 77030, USA
R. Sucgang, J. Song, G. Chen, X. Nie, L. Hemphill, B. Desany, M. Lu, R. Lindsay, J. Ma & A. Kuspa
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA
Q. Xu, E. Sodergren, N. van Driessche, G. Shaulsky, G. Weinstock, R. Gibbs & A. Kuspa
Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, 77030, USA
Q. Xu & G. Shaulsky
Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, 77030, USA
E. Sodergren, D. Muzny, M. Quiles, H. Loulseged, J. Hernandez, D. Steffen, G. Weinstock & R. Gibbs
Section of Cell and Developmental Biology, Division of Biology, University of California, San Diego, La Jolla, California, 92093, USA
R. Olsen, C. Anjard & W. F. Loomis
dictyBase, Center for Genetic Medicine, Northwestern University, 303 E Chicago Ave, Chicago, Illinois, 60611, USA
P. Gaudet, P. Fey, K. Pilcher, E. Just & R. L. Chisholm
Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, 305-8572, Japan
T. Morio, H. Urushihara & Y. Tanaka
Adolf-Butenandt-Institute/Cell Biology, Ludwig-Maximilians-University, 80336, Munich, Germany
R. Rost & M. Schleicher
Biochemistry Department, University of Cambridge, Cambridge, CB2 1QW, UK
S. Haydock
Division of Biological Sciences, Graduate School of Science, Hokkaido University, 060-0810, Sapporo, Japan
T. Saito
Unité de Genomique des Microorganismes Pathogenes, Institut Pasteur, 28 rue du Dr Roux, 75724, Cedex 15, Paris, France
C. Buchrieser
Department of Biology, University of York, York, YO10 5YW, UK
A. Wardroper
MRC Cancer Cell Unit, Hutchison/MRC Research Centre, Hills Road, CB2 2XZ, Cambridge, UK
M. Thangavelu
Centre for Genetic Resource Information, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
Y. Kohara
Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Minato, Tokyo, 108-8639, Japan
S. Sugano
Institut für Pharmazeutische Biologie, Universität Frankfurt (Biozentrum), 60439, Frankfurt am Main, Germany
T. Winckler
Department of Molecular Biology, Princeton University, Princeton, New Jersey, 08544-1003, USA
E. C. Cox
School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
J. Williams

Corresponding author

Correspondence to P. H. Dear.

Ethics declarations

Competing interests

The authors declare that they have no competing financial interests.

Supplementary information

Supplementary Notes

Document containing Supplementary Discussion and Supplementary Methods, with Supplementary Figures S1-S and Supplementary Tables S1-S18. (PDF 3594 kb)

Supplementary Figure S1

This is a complete version of Supplementary Figure S1, of which only a small portion is shown in the Supplementary Notes. This figure shows the alignment between the HAPPY maps and the chromosomal sequences. (PDF 10994 kb)

Supplementary Table S1

This is a complete version of Supplementary Table S1, of which only a small portion is shown in the Supplementary Notes. The table gives details of all the HAPPY markers and their chromosomal locations. (XLS 840 kb)

Supplementary Table S3

Raw Data for analysis of inferred gene duplication in the Dictyostelium genome. This table gives details of the recently duplicated genes discussed in the text. (XLS 827 kb)

PDF version of Figure 2


About this article

Cite this article

Eichinger, L., Pachebat, J., Glöckner, G. et al. The genome of the social amoeba Dictyostelium discoideum. Nature 435, 43–57 (2005). https://doi.org/10.1038/nature03481


Received: 16 September 2004
Accepted: 17 February 2005
Issue Date: 05 May 2005
DOI: https://doi.org/10.1038/nature03481


