INTRODUCTION

The centromere is a complex chromosomal structure responsible for proper chromosome and chromatid segregation during meiosis and mitosis. In almost all eukaryotes, centromeres are found at specific locations along chromosomes and are composed of blocks of satellite DNA, which are occasionally very large. Evidence shows that, despite the very high conservation of centromeric proteins (CENPs), centromeric satellite DNA sequences can differ substantially even among closely related species. Recently, two interconnected phenomena, human neocentromeres (HNs) and evolutionary new centromeres (ENCs, also called repositioned centromeres), have revolutionized our understanding of centromere function and its relationship to the underlying DNA sequences. HNs are centromeres that emerge in ectopic chromosomal regions and are devoid of alphoid sequences, that is, the alpha-satellite DNA present at primate centromeres. ENCs are centromeres that move to a new position along the chromosome without any change in marker order (no inversion or other structural rearrangement).

The discovery of evolutionary new centromeres

In the 1970s, chromosome banding triggered renewed interest in studies of karyotype evolution in primates and many other mammalian orders. Differences in centromere position along a chromosome were almost always interpreted as the result of a pericentric inversion or a complex rearrangement. However, in Dutrillaux's wide-ranging study of chromosomal evolution in 60 primate species, ‘centromere translocation’ was proposed as a possible mechanism for the evolution of chromosome 11 in some Cercopithecidae (Dutrillaux, 1979). He also hypothesized that in Cercopithecidae with high diploid numbers, where fissions did not occur at the centromere, new centromeres must have been gained. In their 1990 review of the evolution of human chromosomes, Clemente et al. (1990) suggested that differences in centromere position in the homologs of chromosomes 4, 6 and 10 did not appear to result from inversions but rather from the ‘activation/inactivation of centromeres’.

The advent of fluorescence in situ hybridization (FISH), painting probes in particular, provided more solid and reliable tools for studying karyotype evolution (Wienberg et al., 1990; Jauch et al., 1992). However, painting probes, although very efficient at detecting chromosomal translocations, cannot distinguish between an inversion and a centromere-repositioning event. Fortunately, the human genome-sequencing project produced large libraries of precisely mapped bacterial artificial chromosome (BAC) clones that can be used very efficiently in FISH experiments. Sequencing projects of other vertebrate species, which used the shotgun approach, made extensive use of BACs and fosmids, and the end sequences of these clones continue to be exploited to reliably anchor and close sequence contigs. The systematic use of BAC-FISH was very effective in guiding and disambiguating the sequence assembly of some genomes; see, for example, the cytogenetic frameworks that supported the sequence assemblies of macaque (Gibbs et al., 2007), as reported at http://www.biologia.uniba.it/macaque, and orangutan (Locke et al., 2011), as reported at http://www.biologia.uniba.it/orang. The BAC-FISH approach provided a powerful visual link to the cytogenetic organization of the species under study, particularly for centromeres, whose position is almost impossible to determine from sequence data.

Montefalcone et al. (1999) were the first to unequivocally demonstrate the existence of evolutionary centromere repositioning. They traced the evolutionary and phylogenetic history of chromosome IX in primates by FISH with cloned DNA probes. It became clear that, if the position of the centromere was not taken into account, a much more parsimonious scenario of rearrangements could account for the between-species differences in marker order. When the centromere was included, the analysis became an impossible jigsaw puzzle. The centromere was therefore hypothesized to have moved along the chromosome independently of the surrounding markers, with no need to invoke a seemingly endless series of inversions. Over the last decade, numerous other studies have found ENCs in primates and in other mammalian orders (Table 1). ENCs are now accepted as an important mechanism of genome evolution, on an equal footing with traditional chromosome rearrangements such as inversions, translocations, deletions and insertions.
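The parsimony argument above can be illustrated with a toy comparison. The sketch below is our own hypothetical illustration, not the procedure used by Montefalcone et al. (1999): each chromosome is written as an ordered list of placeholder markers plus a 'CEN' token. If the marker order (read in either orientation) is identical once the centromere is ignored, a centromere shift is the simplest explanation; otherwise at least one structural rearrangement, such as an inversion, is required.

```python
CEN = "CEN"  # hypothetical token marking the active centromere


def strip_centromere(order):
    """Return the marker order with the centromere token removed."""
    return [m for m in order if m != CEN]


def classify(reference, other):
    """Classify the difference between two chromosomes in a toy model.

    'identical'     : same marker order and same centromere position
    'repositioning' : same marker order (either orientation), centromere moved
    'rearrangement' : marker order differs, so at least one inversion or
                      other structural change is required
    """
    ref_markers = strip_centromere(reference)
    oth_markers = strip_centromere(other)
    if sorted(ref_markers) != sorted(oth_markers):
        raise ValueError("chromosomes must carry the same set of markers")
    # A chromosome has no intrinsic reading direction, so try both.
    if oth_markers == ref_markers:
        aligned = other
    elif oth_markers == ref_markers[::-1]:
        aligned = other[::-1]
    else:
        return "rearrangement"
    return "identical" if aligned == reference else "repositioning"


ancestral = ["a", "b", CEN, "c", "d", "e"]
shifted = ["a", "b", "c", "d", CEN, "e"]    # centromere moved, order kept
inverted = ["a", "b", CEN, "e", "d", "c"]   # paracentric inversion of c..e
print(classify(ancestral, shifted))   # repositioning
print(classify(ancestral, inverted))  # rearrangement
```

Treating the centromere as just another marker would force the 'shifted' case to be explained by inversions; removing it from the comparison is what exposes the repositioning.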

Table 1 Described cases of centromere-repositioning events in mammals

ENCs and HNs

Several lines of evidence (see below) suggest that ENCs and HNs are related phenomena. As with any mutational event, a fixed ENC must first arise on a single chromosome and then spread through the population. Yet large-scale cytogenetic studies at the population level are available, with few exceptions, only for humans (Bhasim, 2007). Our knowledge of the karyotypes of most species has usually been gained by investigating just a few individuals. These small sample sizes were justified by the relatively high conservation of the karyotype within a species, which led to the simplifying assumption that each species has a single karyotype (Dutrillaux, 1979). As a consequence, the chances of detecting a polymorphic ENC at an early stage were very low. There is, however, at least one exception, described below: a polymorphism in the orangutan that was thought to be a complex inversion but is now known to be an ENC (Locke et al., 2011).
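A back-of-the-envelope calculation shows why small samples rarely reveal a rare chromosomal variant. Assuming random sampling from a population in Hardy-Weinberg equilibrium (our simplifying assumption, not a claim of the cited studies), the probability that n diploid individuals carry at least one copy of a variant at frequency p is 1 - (1 - p)^(2n). The frequencies and sample sizes below are illustrative only.

```python
def detection_probability(p, n_individuals):
    """Probability that at least one of n diploid individuals carries a
    variant chromosome present at frequency p, assuming random sampling
    and independent chromosomes (Hardy-Weinberg)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - p) ** n_chromosomes


for p in (0.001, 0.01, 0.05):
    for n in (2, 5, 20):
        print(f"p={p:<6} n={n:<3} P(detect)={detection_probability(p, n):.3f}")
```

With p = 0.01 and five individuals, for example, the chance of seeing even one variant chromosome is below 10%.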

Cytogenetic studies of HNs have two enormous advantages. The first is that huge numbers of individuals pass through a powerful clinical filter. The vast majority of HNs were seeded in acentric fragments generated by a fortuitous rearrangement. Such fragments would normally be lost, but occasionally a neocentromere is seeded, rescuing the fragment. However, the rescued fragment constitutes a supernumerary chromosome with more or less severe phenotypic consequences that require medical attention. About 100 cases of HN have been described (for a review, see Marshall et al., 2008). The second advantage is the widespread practice of cytogenetic prenatal diagnosis, which can be regarded as a large, ongoing population study. Prenatal diagnosis often reveals fortuitous events, such as centromere shifts, that would otherwise never have been detected. These ‘real-time’ centromere-repositioning events in humans mimic the seeding events that lead to the formation of an ENC, supporting the view that ENCs and HNs are two sides of the same coin. For this reason, many of the hints and hypotheses used in discussing neocentromeres in mammals come from what we have learned from HNs, which therefore deserve a short summary.

Human neocentromeres

The vast majority of HNs were seeded in acentric fragments consisting of inverted duplications of a distal portion of a chromosome arm. Marshall et al. (2008) classified these as class I: acentric duplicated fragments, stabilized by a neocentromere, that behave as supernumerary chromosomes with clinical manifestations. Class II neocentromeres form on acentric fragments that were excised to form linear or ring chromosomes. In these cases, clinical problems are due to the accompanying deletions or to gene disruption, and ring instability can also lead to loss or duplication of the ring. Occasionally, class II neocentromeres were discovered following the malsegregation of a balanced rearrangement that did not cause phenotypic problems in the transmitting parent (Capozzi et al., 2008).

A third, very rarely reported type of neocentromere is the most pertinent to our discussion. These neocentromeres arise on intact chromosomes and functionally replace the normal centromere. The old centromere appears unchanged but is functionally inactivated, as shown by the absence of CENP-A and other centromere-specific proteins, which are instead present at the new centromeric site (Warburton et al., 1997; Voullaire et al., 1999). These neocentromeres do not cause clinical problems and, indeed, were discovered serendipitously, mostly through amniocentesis. Eight such cases have been described (see Table 1 in Hasson et al., 2011): two were de novo, and another two segregated through at least three generations. Two de novo cases, one on chromosome Y (Bukvic et al., 1996) and another on chromosome 7 (Liehr et al., 2010), are relevant for understanding the timing of normal centromere inactivation. The chromosome 7 case did not show any old/new centromere mosaicism, suggesting that the neo-chromosome 7 was already present in one gamete. The chromosome Y case was mosaic 45,X/46,XY/46,XneoY, indicating that the event was post-zygotic. In the familial or de novo case, no functionally dicentric chromosomes or mosaicism were observed. By contrast, mosaicism is relatively frequent among clinical neocentromeres, suggesting that neocentromeres might not be very efficient at ensuring mitotic segregation. The two contrasting observations can be reconciled by considering that there is probably strong selection against the loss of neocentromeric chromosomes that replace a normal homolog, whereas the loss of supernumerary neocentromeric markers is favored.

HNs also reveal another important feature: their clustering in specific chromosomal domains. Neocentromeres at 3q, 8p, 13q, 15q and Yq are especially frequent (see Figure 1a in Marshall et al., 2008).

Figure 1

Result of a fluorescence in situ hybridization (FISH) experiment on an orangutan heterozygous for the evolutionary new centromere (ENC) on chromosome 9, using as probe the products of a PCR performed on total orangutan DNA as template with primers specific for orangutan alpha-satellite DNA. Note the very small size of the centromeric alpha-satellite signal on both the normal and the variant chromosome 9. Forward primer: 5′-TCAACTCTGTGAGATGAATGCAAAC-3′; reverse primer: 5′-AAACATCTTTGTGATGTGTGCATTC-3′. PCR conditions: 95 °C for 3 min, followed by 35 cycles of 95 °C for 30 s, 60 °C for 60 s and 72 °C for 40 s. Primers were derived from a consensus sequence constructed from all the centromeric stretches of orangutan alpha-satellite DNA available in the Trace Archive database (http://www.ncbi.nlm.nih.gov/Traces/home/).
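The caption above notes that the primers were derived from a consensus of orangutan alpha-satellite stretches. The sketch below shows one simple way to build such a consensus, a majority rule over sequences that are already aligned. It is only an illustration under our own assumptions (equal-length aligned input, 'N' for ambiguous or gap-dominated columns, an arbitrary majority threshold), not the procedure actually used, and the toy sequences are invented.

```python
from collections import Counter


def majority_consensus(aligned_seqs, min_fraction=0.5):
    """Build a majority-rule consensus from equal-length aligned sequences.

    A column gets its most common base if that base occurs in more than
    `min_fraction` of the sequences; otherwise the column is written as 'N'.
    Gap characters ('-') are counted but never emitted as consensus bases.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_seqs):
        base, count = Counter(b.upper() for b in column).most_common(1)[0]
        if base != "-" and count / len(column) > min_fraction:
            consensus.append(base)
        else:
            consensus.append("N")
    return "".join(consensus)


# Invented toy alignment, not real alpha-satellite data.
reads = [
    "ACGTACGTAA",
    "ACGTACGTAA",
    "ACTTACGAAA",
    "ACGTAC-TAA",
]
print(majority_consensus(reads))  # ACGTACGTAA
```

Real alpha-satellite monomers are about 171 bp and highly divergent, so in practice the alignment step, not the majority vote, is the difficult part.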

ENCs in mammals

We can assume that neocentromeric chromosomes of classes I and II are not limited to humans. However, because of the clinical problems they cause, these neocentromeres have no evolutionary future. Indeed, most ENCs correspond to the third category of HN. A distinct ENC category, not yet reported clinically, comprises neocentromeres that arise following a chromosomal fission with a breakpoint outside the centromere. In these cases, a neocentromere forms in the acentric fragment resulting from the fission (occasionally in both fragments). The ENCs of human chromosomes 15 and 14 were generated in this way, following the fission of an ancestral chromosome corresponding to macaque chromosome 7 (Ventura et al., 2003).

Frequency of ENCs in mammals

Enough systematic data have now accumulated to provide information on the evolutionary rate and frequency of ENCs. In the macaque (Macaca mulatta), 9 of the 20 autosomal centromeres were shown to be evolutionarily new: those of chromosomes 1, 2, 4, 12, 13, 14, 15, 17 and 18 (Ventura et al., 2007). Comparative data show that these ENCs are found in all the Old World monkeys (OWMs) studied; they therefore accumulated during the 14-million-year span between the Hominoidea/Cercopithecoidea split (32 million years ago, MYA) and the Cercopithecinae/Colobinae divergence (18 MYA) (Perelman et al., 2011). By comparison, six human centromeres are evolutionarily new. The centromeres of chromosomes 3, 6 and 11 were repositioned along the chromosomes (Ventura et al., 2004; Cardone et al., 2007; Capozzi et al., 2009). Those of chromosomes 14 and 15 were seeded, as mentioned, in the Hominoidea ancestor following the fission of a chromosome corresponding to macaque chromosome 7 (Ventura et al., 2003). A non-centromeric fission of the 3/21 syntenic association in the Hominoidea ancestor generated chromosome 21 and its neocentromere.

The evolutionary history of the Equidae provides a very informative additional example. Carbone et al. (2006) compared chromosomal marker order between Burchell's zebra (Equus burchelli) and the donkey (Equus asinus), using the horse (Equus caballus) as an outgroup. The Equidae, and these three species in particular, underwent recent, rapid evolution and accumulated a large number of chromosomal changes (Trifonov et al., 2008). Zebra and donkey diverged about 0.9 MYA, while their common ancestor diverged from the horse around 2 MYA (Oakenfull and Clegg, 1998; Oakenfull et al., 2000). The study revealed that eight centromere-repositioning events took place during the evolution of this genus. Surprisingly, at least five of them occurred in the donkey after its divergence from the zebra. We say ‘at least’ because some chromosomes in these species are very small, and marker order and centromere position could not be established with certainty for them.
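To put the macaque and donkey figures on a common scale, the sketch below divides the number of reported repositioning events by the length of the lineage over which they accumulated. This is our own crude illustration, not an estimate from the cited studies: it assumes events are spread evenly over time, uses the lower bound ('at least five') for the donkey, and ignores undetected events on small chromosomes, so the values are rough minimum rates.

```python
def events_per_million_years(n_events, span_myr):
    """Crude average rate: number of fixed events divided by lineage length."""
    if span_myr <= 0:
        raise ValueError("time span must be positive")
    return n_events / span_myr


# Numbers taken from the text; spans in millions of years.
lineages = {
    # 9 macaque ENCs accumulated between 32 and 18 MYA
    "OWM ancestor (macaque ENCs)": (9, 32 - 18),
    # at least 5 donkey ENCs since the ~0.9 MYA zebra/donkey split
    "donkey lineage (lower bound)": (5, 0.9),
}

for name, (n, span) in lineages.items():
    print(f"{name}: {events_per_million_years(n, span):.2f} ENCs per Myr")
```

The donkey value (about 5.6 per Myr) is roughly an order of magnitude higher than the OWM value (about 0.64 per Myr), consistent with the rapid chromosomal evolution of the Equidae mentioned above, but with such small counts these figures are only indicative.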

These examples show that ENC formation can be relatively frequent, on a par with other types of chromosome rearrangement. What about the frequency of centromere-repositioning seeding events in general, including those that were seeded but then disappeared from the population, or that remained at low frequency and have not yet been discovered? The data in humans support the idea that fixed ENCs are just the tip of the iceberg.

Neocentromere seeding

The vast majority of ENCs possess a heterochromatic block similar to that of normal centromeres. This is particularly evident in macaque, where all nine ENCs have large blocks of alphoid DNA indistinguishable from those of other macaque centromeres (Ventura et al., 2007). Mature repositioned centromeres are thought to have slowly acquired large arrays of satellite DNA after being seeded in an anonymous sequence. The macaque repositioning events occurred at least 18 MYA, leaving plenty of time to ‘mature’.

The possibility that an ENC resulted from transposition of a functional centromere cannot be ruled out with certainty. However, the following lines of evidence support the view that ENCs result from epigenetic events rather than from the transposition of particular sequences.

  • All HNs are devoid of satellite DNA. FISH analysis did not detect any satellite DNA signal at the neocentromeric loci and, most importantly, in all class I cases neocentromere seeding was an opportunistic event triggered by the formation of the acentric fragment. In these cases, a simultaneous transposition of alpha-satellite sequences can reasonably be considered unrealistic.

  • Chromatin immunoprecipitation followed by hybridization on microarrays (ChIP-on-chip analysis) in HN cases, using anti-CENP-A and/or anti-CENP-C antibodies (see below), always showed that the centromeric function was associated with single-copy sequences. This circumstantial evidence was recently supported by data on horse and orangutan ENCs (Wade et al., 2009; Locke et al., 2011) (for details on orangutan see below). In both cases the neocentromere, precisely mapped by ChIP-on-chip analysis, was located in regions devoid of satellite sequences.
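ChIP-on-chip data are, in essence, enrichment ratios for probes tiled along the genome, and a neocentromere appears as a contiguous run of enriched probes. The sketch below calls such domains with a simple threshold on log2(ChIP/control) ratios. The threshold, the minimum run length and the example values are arbitrary assumptions for illustration; the published studies used their own, more elaborate analysis pipelines.

```python
import math


def enriched_domains(positions, chip, control, log2_threshold=1.0, min_probes=3):
    """Return (start, end) coordinates of runs of consecutive probes whose
    log2(ChIP/control) ratio is at or above `log2_threshold`.

    positions : sorted probe coordinates (bp)
    chip      : ChIP signal per probe (e.g. anti-CENP-A); must be positive
    control   : reference/input signal per probe; must be positive
    Runs shorter than `min_probes` probes are discarded as noise.
    """
    if not (len(positions) == len(chip) == len(control)):
        raise ValueError("input lists must have equal length")
    domains, run = [], []
    for pos, c, r in zip(positions, chip, control):
        if math.log2(c / r) >= log2_threshold:
            run.append(pos)
            continue
        if len(run) >= min_probes:
            domains.append((run[0], run[-1]))
        run = []
    if len(run) >= min_probes:  # close a run that reaches the last probe
        domains.append((run[0], run[-1]))
    return domains


# Invented example: 10 probes spaced 10 kb apart, enrichment at probes 4-7
# (0-based); probe 9 is an isolated spike and is discarded.
positions = [i * 10_000 for i in range(10)]
chip = [1.0, 1.1, 0.9, 1.2, 4.0, 5.0, 4.5, 3.8, 1.0, 2.5]
control = [1.0] * 10
print(enriched_domains(positions, chip, control))  # [(40000, 70000)]
```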

What about the features of the sequence underlying the neocentromere? ChIP-on-chip analysis has been performed in a number of mammalian neocentromere cases (Lo et al., 2001a, 2001b; Alonso et al., 2003, 2007; Cardone et al., 2006; Capozzi et al., 2008, 2009; Wade et al., 2009; Hasson et al., 2011; Locke et al., 2011). Comparison of these cases revealed no striking similarities; shared features were marginal, such as an occasional abundance of LINE1 repeats.

The cytogenetic mapping of neocentromeres showed that some of them cluster in specific chromosomal domains, in particular 3q, 13q and 15q (Marshall et al., 2008). This finding suggests that at least some of them might be linked to a specific sequence. However, ChIP-on-chip analysis showed that no two of the studied neocentromeres that apparently map to the same locus share the same seeding sequence (Alonso et al., 2003; Ventura et al., 2004; Hasson et al., 2011).

One ENC and one HN raised relevant points of discussion in this context.

The ENC found in macaque chromosome 18 (human 18) perfectly corresponded, in humans, to a clone gap, positioned at chromosome 18: 50 313 135–50 360 134 (UCSC genome browser, hg18 release) (Carbone et al., 2009). We found that the gap was composed of non-alpha, satellite-like DNA. Sequence analysis of several primate species suggested that this sequence was present in the Cercopithecidae ancestor at the time of the neocentromere seeding. This satellite DNA was subsequently replaced by alpha satellite DNA.

A second notable HN case was reported by Hasson et al. (2011), who investigated an HN on chromosome 8. Different experimental approaches indicated that the neocentromere was seeded in a domain at 8q21 consisting of a large array of tandemly repeated DNA with a 12-kb monomer. This tandemly repeated DNA more closely resembled multiple segmental duplications (SDs) than classical satellite DNA, whose repeats are usually much shorter. SDs at the seeding point have also been found for other neocentromeres, in particular those clustering at 15q24-26. This region is intriguing for an additional reason. The evolutionary history of chromosome 15 showed that chromosomes 15 and 14 originated, as mentioned, from a non-centromeric chromosomal fission (Ventura et al., 2003). A neocentromere formed in both derivative chromosomes, and the ancestral centromere, located in a region corresponding to 15q24-26, was inactivated. The abundant SDs clustered in this domain are remnants of the pericentromeric SDs that flanked the ancestral, now inactive, centromere. Capozzi et al. (2009) recently reported a similar example. The centromere of chromosome 6 in the primate ancestor was very likely located at 6p22.1, and it moved to its present-day position in the Hominoidea ancestor. The authors reported a familial case in which the centromere had repositioned back to its ancestral location. These findings raise additional questions. Are there hidden sequence features that are a legacy of the inactivated centromere? If such a legacy exists, is it due to primary or secondary structure? And why do so many neocentromeres cluster at 15q, whereas only one has been found at 6p and none at 2q21.2, where an ancestral centromere was recently inactivated following the telomere–telomere fusion that generated human chromosome 2? One hypothesis is that trisomies/tetrasomies of the distal part of chromosome 15 are compatible with life, whereas trisomies of other regions are not. Indeed, the neocentromere on chromosome 6 was found in an otherwise normal chromosome.

Additional intriguing relationships between neocentromeres and ENCs have been reported for chromosomes 13 and 3. In the case of chromosome 13, two novel ENCs were seeded in the same chromosomal domain in OWMs and in pig (Sus scrofa), lineages that diverged about 95 MYA (Cardone et al., 2006). On human chromosome 3, a repositioned centromere (normal phenotype, found by chance) and a clinical neocentromere were both seeded in the 3q26 domain, the locus to which the centromere repositioned in the OWM ancestor (Ventura et al., 2004). The same domain has therefore served as the seeding point for both an ENC and HNs.

Roizes (2006) hypothesized that centromere-repositioning events can be indirectly elicited by mutations, such as retrotransposon insertions in the centromere, that could impair its function. Hasson et al. (2011) noticed that the alpha-satellite array at the inactivated centromere of the repositioned chromosome 8 was substantially smaller than that of its homolog. In the orangutan, both the normal and the repositioned chromosome 9 showed a very low amount of centromeric alpha-satellite heterochromatin (see below).

An alternative hypothesis

Zeitlin et al. (2009) demonstrated that CENP-A, a crucial component of the centromere, is rapidly recruited to DNA double-strand breaks, along with three proteins (CENP-N, CENP-T and CENP-U) that associate with CENP-A at centromeres. The authors argue that, ‘since cell survival after radiation-induced DNA damage correlates with CENP-A expression level, we propose that CENP-A may have a function in DNA repair’. They also hypothesized that a neocentromere could emerge because of the presence of CENP-A at a breakpoint. All class I and II neocentromeres were seeded after a break that generated an acentric fragment. The proximity of the breakpoint to the neocentromere location has been noted in some studies (Ventura et al., 2003), whereas other studies found no relationship between neocentromeres and breakpoints (Warburton et al., 2000).

The ENC polymorphism in orangutan chromosome 9 (human 12)

Since the early days of comparative banding, cytogeneticists have been aware that orangutan chromosome 9 exists in two forms. The difference was interpreted either as an intrachromosomal translocation and insertion of a segment containing the centromere (Turleau et al., 1975) or as a paracentric inversion within an intrachromosomal translocation (de Boer and Seuanez, 1982). Later, de Boer and Seuanez (1982) and Ryder and Chemnick (1993) showed, by karyotyping numerous individuals, that these variants constitute a true polymorphism in orangutans; however, they never questioned that the different chromosome forms were due to complex structural rearrangements.

We now know that this polymorphism is not a complex rearrangement but an ENC (Locke et al., 2011). The heterochromatic blocks of alphoid DNA on both the repositioned and the normal chromosome 9 are almost undetectable by FISH (Figure 1). As mentioned above, this reduced size could have impaired centromere function on this chromosome and thus indirectly favored the emergence of a neocentromere.

Note that, to facilitate comparison with human chromosomes, Locke et al. (2011) referred to this chromosome as orangutan 12. Here, however, we prefer to follow the ICSN-recognized standard nomenclature and continue to refer to it as orangutan 9.

Over the years, the Freiburg laboratory has karyotyped a total of 59 orangutans, most of which are reported here for the first time (Supplementary Information). The publication of de Boer and Seuanez (1982) listed 71 individuals by assigned species and studbook registration number, including 11 of the 59 orangutans studied by the Freiburg laboratory. Combining the two data sets and counting these 11 individuals only once (59 + 71 - 11 = 119 orangutans), we obtain the following distribution of karyotypes for chromosome 9:

  • 51 Bornean orangutans (Pongo pygmaeus): 32 homozygous normal, 14 heterozygous and 5 homozygous ENC individuals, giving an ENC frequency of 0.235.

  • 50 Sumatran orangutans (Pongo abelii): 26 homozygous normal, 22 heterozygous and 2 homozygous ENC individuals, giving an ENC frequency of 0.260.

  • 18 hybrid orangutans: 12 homozygous normal, 6 heterozygous and 0 homozygous ENC individuals, giving an ENC frequency of 0.167.

Ryder and Chemnick (1993) studied 141 orangutans, but they did not list individuals separately or clearly assign them to the two species. The ENC frequency in their total sample is 0.138, considerably lower than the frequency of 0.235 in our combined sample (our data plus those of de Boer and Seuanez, 1982).
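The ENC frequencies listed above are chromosome (allele) frequencies: each heterozygote contributes one variant chromosome 9 and each ENC homozygote two, out of 2N chromosomes. The short sketch below is simply our arithmetic check of the reported numbers from the genotype counts, including the size of the combined sample after the 11 shared individuals are counted once.

```python
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    normal_homozygotes: int
    heterozygotes: int
    enc_homozygotes: int

    @property
    def individuals(self):
        return self.normal_homozygotes + self.heterozygotes + self.enc_homozygotes

    def enc_frequency(self):
        """Frequency of ENC chromosomes among all 2N chromosomes 9."""
        enc_chromosomes = self.heterozygotes + 2 * self.enc_homozygotes
        return enc_chromosomes / (2 * self.individuals)


samples = {
    "Bornean (P. pygmaeus)": GenotypeCounts(32, 14, 5),
    "Sumatran (P. abelii)": GenotypeCounts(26, 22, 2),
    "hybrids": GenotypeCounts(12, 6, 0),
}
total = GenotypeCounts(
    sum(g.normal_homozygotes for g in samples.values()),
    sum(g.heterozygotes for g in samples.values()),
    sum(g.enc_homozygotes for g in samples.values()),
)

# 59 Freiburg + 71 de Boer and Seuanez (1982), with 11 individuals in both sets
assert 59 + 71 - 11 == total.individuals == 119

for name, g in {**samples, "total": total}.items():
    print(f"{name}: N={g.individuals}, ENC frequency={g.enc_frequency():.3f}")
```

Running it reproduces 0.235, 0.260 and 0.167 for the three groups and 0.235 (56 ENC chromosomes out of 238) for the combined sample.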

It is not a simple task to relate these frequencies to those that would actually be found in natural populations. However, the ENC frequency is clearly notable. To understand the implications and dating of the ENC origin, we need to briefly review what is known about the taxonomy and phylogenetic history of orangutans.

Dating the origin of the orangutan ENC

The taxonomic rank of Bornean and Sumatran orangutans was debated for some time. Since the mid-1990s, it has become increasingly accepted that two species are present: Pongo pygmaeus in Borneo and P. abelii in Sumatra (Zhi et al., 1996; Perelman et al., 2011). In general, molecular estimates cluster around 1.5 million years for the separation of the two taxa. However, comparison of the sequenced genome assemblies (Locke et al., 2011) yielded a much lower estimate of about 400 000 years, which may reflect the overall slowdown in orangutan genome evolution that the same authors noted.

All estimated divergence dates between the two recognized species, whether early or late, long predate the final separation of Borneo and Sumatra into two islands (Steiper, 2006; Goossens et al., 2008).

Given its presence in both orangutan species, it seems highly likely that the ENC emerged in their common ancestor after its divergence from the lineage leading to African apes and humans, that is, between 15 and 1 MYA. The emergence may have been closer to the more recent date, because the ENC seems never to have acquired all the characteristics of a mature centromere (Locke et al., 2011) (see also Figure 1). Additionally, if it were old, we would expect it to have been either fixed or lost. In an individual heterozygous for an ENC, a meiotic crossover within the region delimited by the old and the new centromere would produce a dicentric and an acentric chromosome. Both derivatives are probably lost. However, a dicentric chromosome could inactivate one of its two centromeres, thus reverting to either the normal or the neocentric form.
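The meiotic argument can be checked with a toy model. In the sketch below (our illustration; chromosome length and positions are arbitrary), two homologs share the same marker order but carry their centromere at different positions; a single crossover exchanges everything distal to a chosen point, and the centromeres in each recombinant product are counted. Crossovers between the old and the new centromere yield one dicentric and one acentric chromosome, whereas crossovers outside that interval yield two monocentric products.

```python
def recombinants(n_markers, normal_cen, enc_cen, crossover):
    """Count centromeres in the two recombinant chromatids.

    Chromosomes are modelled as marker positions 0..n_markers-1 with the same
    order on both homologs; the centromere sits at `normal_cen` on one homolog
    and at `enc_cen` on the other. A single crossover between positions
    crossover-1 and crossover exchanges everything from `crossover` onward.
    Returns (centromeres in product 1, centromeres in product 2).
    """
    if not 0 < crossover < n_markers:
        raise ValueError("crossover must fall between two markers")
    # Product 1: normal homolog before the crossover, ENC homolog after it.
    product1 = int(normal_cen < crossover) + int(enc_cen >= crossover)
    # Product 2: ENC homolog before the crossover, normal homolog after it.
    product2 = int(enc_cen < crossover) + int(normal_cen >= crossover)
    return product1, product2


def describe(count):
    return {0: "acentric", 1: "monocentric", 2: "dicentric"}[count]


# Hypothetical 20-marker chromosome: old centromere at 5, ENC at 14.
for xo in (3, 10, 17):
    p1, p2 = recombinants(20, normal_cen=5, enc_cen=14, crossover=xo)
    print(xo, describe(p1), describe(p2))
```

Because marker order is unchanged, both products of every crossover carry a complete set of markers; only the number of centromeres differs, which is why a dicentric that silences one centromere can revert to a normal or a neocentric chromosome.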

Two points are important. First, the ENC polymorphism survived a fairly recent speciation event. Second, either the orangutan population was never particularly small, or unknown selective factors are maintaining this polymorphism.

ENCs in the X-chromosome of new world squirrel monkeys

The X chromosome is probably the most conserved chromosome among mammals (Chowdhary et al., 1998). With few exceptions (see Ventura et al., 2001), primate X chromosomes are apparently identical to the human X in banding pattern and centromere position. Therefore, the finding by Schempp et al. (1989) that the X chromosome of Saimiri sciureus (SSC) had undergone poorly understood intrachromosomal rearrangements, which had apparently moved the pseudoautosomal region to the distal part of the long arm, was of particular interest. Later, Dumas et al. (2007) hypothesized that the SSC X chromosome differed from the human X by either a pericentric inversion or a centromere shift.

If the X chromosome of S. sciureus carried an ENC, this would raise a series of questions about its distribution and evolution in New World primates. We therefore studied the marker order of the squirrel monkey X chromosome using a panel of appropriate BAC clones (see Table 2). The synteny of the two BAC clones flanking the human centromere, RP11-552J9 (Xp11.22) and RP11-135B16 (Xq11.1), was not disrupted, but both markers mapped to the long arm of the SSC X chromosome (Figure 2). The analysis revealed that a segment delimited, in human, by BAC RP11-24M7 (HSAXq21.33) and BAC RP11-265K3 (HSAXq28; at chromosome X: 154 603 527–154 763 828, very close to the telomere at chromosome X: 154 913 754) was inverted, and that a centromere was present at the breakpoint corresponding, in human, to Xq21.33 (Figure 2). The most parsimonious interpretation is that a centromere was seeded at the Xq21.33 breakpoint concomitantly with the inversion. The seeding event could have been favored by the break (see above) and/or by the presence of subtelomeric repetitive sequences. However, a different temporal order of the inversion and centromere-seeding events cannot be ruled out.
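The rearrangement proposed for the SSC X can be sketched by applying the inversion to a simplified human X marker order and then seeding a new centromere at the proximal (Xq21.33) breakpoint. Only the four BACs named in the text are used, plus a placeholder 'PAR1 (Xpter)' label that we add for orientation; the model ignores physical distances, so it does not compute which arm is longer. According to the FISH data in the text, the arm carrying RP11-552J9 and RP11-135B16 is the SSC long arm.

```python
CEN = "CEN"  # hypothetical token for the active centromere


def invert_and_seed(order, first, last):
    """Invert the segment first..last and seed a centromere at its proximal
    breakpoint, relabelling the old centromere as inactive.

    Assumes the inverted segment lies distal to the old centromere, as in
    the SSC X scenario described in the text.
    """
    i, j = order.index(first), order.index(last)
    if i > j:
        i, j = j, i
    inverted = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
    relabelled = ["cen (inactive)" if m == CEN else m for m in inverted]
    # The new centromere sits where the inverted segment now begins.
    return relabelled[:i] + [CEN] + relabelled[i:]


def arms(order):
    """Split a marker order into the two arms flanking the active centromere."""
    k = order.index(CEN)
    return order[:k], order[k + 1:]


human_x = ["PAR1 (Xpter)", "RP11-552J9", CEN, "RP11-135B16", "RP11-24M7", "RP11-265K3"]
saimiri_x = invert_and_seed(human_x, "RP11-24M7", "RP11-265K3")
arm_a, arm_b = arms(saimiri_x)
print(saimiri_x)
print("arm A:", arm_a)  # PAR1 and the two formerly pericentromeric BACs
print("arm B:", arm_b)  # RP11-24M7 (Xq21.33) ends up telomeric
```

The output places RP11-552J9 and RP11-135B16 together on one arm whose distal end carries the pseudoautosomal region, and puts the Xq21.33 BAC at the opposite telomere, matching the observations described in the text and in Figure 2c.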

Table 2 Human BACs used for FISH experiments on SSC to determine the marker order of the SSC chromosome X
Figure 2

Examples of FISH experiments with human BAC clones (see Table 2) on the squirrel monkey X chromosome, showing the position of (a) the human centromere and (b) the squirrel monkey centromere. (c) A human BAC that maps to Xq21.33 in humans and became telomeric following the inversion. For details, see text.

Dating the origin and phylogenetic distribution of the X chromosome ENC

In order to better understand the origin and distribution of the neoX chromosome, we need to briefly summarize what is known about the taxonomy and phylogeny of squirrel monkeys, a controversial group of New World monkeys. Historically, anywhere from 1 to 7 species and up to 16 subspecies have been recognized. Prior to Hershkovitz (1982), squirrel monkeys were generally regarded as a single species. Hershkovitz (1982), on the basis of morphology, geographic distribution and cytogenetic data, divided squirrel monkeys into four species: Saimiri boliviensis, S. sciureus, S. ustus and S. oerstedii. An additional species, S. vanzolinii, was reported in 1985 (Ayres, 1985). Although Costello et al. (1993) minimized the importance of the cytogenetic data and recognized only two species, most workers have followed Hershkovitz, with slightly different arrangements; Groves, for instance, recognized five Saimiri species (Groves, 2001).

Cytogeneticists have long recognized that squirrel monkeys from various geographic regions all have 44 chromosomes but differ in the number of acrocentric and biarmed chromosomes (Jones and Ma, 1975; Lau and Arrighi, 1976; Cambefort and Moro, 1978; Dutrillaux and Couturier, 1981; Moore et al., 1990; Garcia et al., 1995; Scammell et al., 2001). Karyotypes range from 5 acrocentric and 16 submetacentric to 7 acrocentric and 14 submetacentric chromosomes (see Supplementary Information for a summary of taxonomy and karyotypes). Given the confusing array of numbering systems, in this paper we follow the chromosome nomenclature adopted by Stanyon et al. (2000) and Dumas et al. (2007). Because different chromosomes varied according to taxonomic designation and geographic distribution, we also wanted to test whether the Saimiri neoX followed the same distinctions or was perhaps even polymorphic, as in the orangutan. We also reasoned that, if the neoX were found in some squirrel monkey taxa and not in others, it might help date the origin of the ENC.

The most recent molecular studies generally identified four distinct clades: S. oerstedii, S. sciureus, S. boliviensis and S. ustus (Lavergne et al., 2010; Perelman et al., 2011). Studies of both mtDNA (Chiou et al., 2011) and nuclear DNA (Perelman et al., 2011) found S. boliviensis to be the sister of all other Saimiri taxa. Either S. s. macrodon (Chiou et al., 2011) or S. ustus (Perelman et al., 2011) was proposed as the sister lineage to S. oerstedii/S. s. sciureus (Chiou et al., 2011). These studies found a very recent divergence of extant squirrel monkey species: S. boliviensis apparently diverged between 1.5 and 2.2 MYA, followed by a radiation of the other taxa between 0.7 and 1.2 MYA (Chiou et al., 2011; Perelman et al., 2011).

The repositioned centromere on the Saimiri X chromosome is certainly present in S. sciureus, S. boliviensis boliviensis and S. boliviensis peruviensis (Figure 3). A review of the literature shows that, whenever the X chromosome is illustrated with sufficient banding clarity, the repositioned centromere is evident in all squirrel monkeys regardless of taxonomic designation (Jones and Ma, 1975; Lau and Arrighi, 1976; Cambefort and Moro, 1978; Garcia et al., 1979, 1995; Dutrillaux and Couturier, 1981; Schempp et al., 1989; Scammell et al., 2001; Stanyon et al., 2008). The seemingly anomalous terminal position of the pseudoautosomal region on the long arm in the two Saimiri studied by Schempp et al. (1989) is now easily explained by the presence of the neoX.

Figure 3

4′,6-Diamidino-2-phenylindole (DAPI) banding (below) and trypsin-Giemsa banding (above) of chromosome X from the squirrel monkeys Saimiri sciureus (SSCX), S. boliviensis boliviensis (SBObX) and S. b. peruviensis (SBOpX). The banding patterns appear identical, strongly indicating that both S. boliviensis subspecies (SBO) share the same variant X as SSC (Figure 1).

It is also noteworthy that no neoX was found in any other New World primate, in particular in the Cebinae, the sister group of Saimiri. Because the ENC is present both in S. boliviensis, the earliest diverging lineage, and in the other squirrel monkey taxa, we conclude that the ENC in the X chromosome of Saimiri evolved in the common ancestor of all squirrel monkeys. The Cebus/Saimiri divergence was recently dated to about 15 MYA (Perelman et al., 2011) and S. boliviensis diverged 1.5 to 2.2 MYA; therefore, the ENC must have originated between about 15 and 1.5 MYA.
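The dating argument above follows a standard stem-branch bracket: a trait present in every member of a clade and absent from its sister group arose after the clade split from that sister group (stem age) and before the clade's first internal divergence (crown age). The minimal Python sketch below applies this to the estimates quoted in the text (about 15 MYA for Cebus/Saimiri; 1.5 to 2.2 MYA for the S. boliviensis split). The names `bracket_origin` and `Interval` are hypothetical, and the sketch assumes a single origin of the ENC with no subsequent loss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    oldest_mya: float
    youngest_mya: float


def bracket_origin(stem_age_mya: float,
                   crown_age_range_mya: tuple[float, float]) -> Interval:
    """Bracket the origin of a trait shared by all members of a clade
    but absent from its sister group (assumes one origin, no losses).

    The trait arose on the stem branch: after the split from the sister
    group (stem age) and before the clade's first internal split (crown
    age). Taking the youngest crown estimate gives the most conservative
    lower bound when the crown age is only known as a range.
    """
    crown_young = min(crown_age_range_mya)
    crown_old = max(crown_age_range_mya)
    if crown_old > stem_age_mya:
        raise ValueError("crown age cannot be older than stem age")
    return Interval(oldest_mya=stem_age_mya, youngest_mya=crown_young)


# Hypothetical usage with the divergence estimates cited in the text.
origin = bracket_origin(15.0, (1.5, 2.2))
print(origin)  # Interval(oldest_mya=15.0, youngest_mya=1.5)
```

Using the younger bound of the S. boliviensis estimate reproduces the 15 to 1.5 MYA window given in the text; a narrower window would require a firmer crown-age estimate.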

Some cytogenetic evidence suggests that the centromere is relatively old. In Figures 2a and 2c, DAPI staining reveals a substantial block of centromeric heterochromatin at the repositioned centromere of SSCX. Moreover, two human BACs mapping to the region where the new centromere was seeded failed to yield any FISH signal, suggesting that the pericentromeric region was extensively restructured after the repositioning event and, in turn, that considerable time has elapsed since the neocentromere was seeded (see below). Normally, only mature centromeres have these features. If the centromere is old, it is also less likely to be polymorphic in any Saimiri species. However, only additional research will conclusively answer these questions.

ENC evolutionary modifications after seeding

Mature eukaryotic centromeres, including ENCs, are composed of arrays of satellite DNA frequently surrounded by clusters of segmental duplications (SDs) (She et al., 2004). As illustrated above, ENCs most likely emerge in anonymous sequences and do not immediately alter the underlying sequence. ENC fixation in the population is accompanied by the acquisition of species-specific arrays of centromeric satellite DNA as well as clusters of pericentromeric SDs. FISH with specific probes can readily test for the presence of satellite DNA; if such probes are unavailable, total genomic DNA of the species under study can be hybridized at very high stringency. Characterizing the SDs around specific centromeres is considerably more complex. Detailed data on pericentromeric SDs are essentially limited to human and mouse, because all other genomes were sequenced using the whole-genome shotgun approach. The method designed by Bailey et al. (2002) can efficiently detect duplicated sequences in whole-genome shotgun data by calculating the relative depth of coverage in the raw pool of shotgun reads, but it cannot map them to chromosomal positions. The analysis of pericentromeric SDs in the six human ENCs (chromosomes 3, 6, 11, 14, 15 and 21) revealed that two of them (3 and 6) are among the poorest in SDs (She et al., 2004). It should be noted, however, that SDs were already present in the seeding region of some of the other human ENCs when those ENCs emerged (see Cardone et al., 2007). The only non-human pericentromeric region of an ENC examined in detail is that of macaque chromosome 4 (human 6q24.3) (Ventura et al., 2007). Comparison of the human 6q24.3 sequence with that of many other mammalian species indicated that the seeding domain was very likely devoid of satellite DNA and SDs. Following ENC seeding, a 250-kb segment was extensively and imperfectly duplicated around the new centromere. These duplications were strictly intrachromosomal.
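The depth-of-coverage principle attributed above to Bailey et al. (2002) can be illustrated with a minimal Python sketch. It assumes reads have already been aligned to a reference and summarized as per-base depth; the window size, the median baseline and the 1.5-fold threshold are illustrative choices rather than the published parameters, and `flag_duplicated_windows` is a hypothetical name. Reads from every copy of a duplication pile onto a single collapsed sequence, so depth rises roughly in proportion to copy number. This flags duplicated content but, as noted in the text, says nothing about where the other copies lie.

```python
from statistics import median


def flag_duplicated_windows(depths: list[float], window: int,
                            ratio_threshold: float = 1.5
                            ) -> list[tuple[int, int, float]]:
    """Return (start, end, depth_ratio) for windows with elevated read depth.

    depths: per-base read depth along one reference sequence (assumed input).
    A collapsed duplication attracts reads from all of its copies, so its
    depth is roughly (copy number) x (baseline depth). The copies' genomic
    locations are NOT recovered: the method detects but cannot map.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Mean depth in non-overlapping windows; an incomplete final window is dropped.
    windows = []
    for start in range(0, len(depths) - window + 1, window):
        chunk = depths[start:start + window]
        windows.append((start, start + window, sum(chunk) / window))
    if not windows:
        return []
    # Median window depth serves as a stand-in for single-copy coverage.
    baseline = median([d for _, _, d in windows])
    if baseline == 0:
        return []
    return [(s, e, d / baseline) for s, e, d in windows
            if d / baseline >= ratio_threshold]


# Toy example: a 100-bp stretch covered at three times the background depth.
toy_depths = [10] * 300 + [30] * 100 + [10] * 200
print(flag_duplicated_windows(toy_depths, window=100))  # [(300, 400, 3.0)]
```

A median baseline is used here because it is robust to the very windows being flagged; real pipelines estimate single-copy coverage from regions known to be unique.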
Interestingly, the two youngest ENCs studied so far, horse chromosome 11 and orangutan chromosome 9, are apparently devoid of satellite DNA. The process leading an ENC toward the complexity of a normal centromere therefore appears to be relatively slow, and forces acting to keep the region unaltered may oppose its restructuring. The presence of genes may be one such force. How ENCs mature (and how centromeres are inactivated) is still unclear. Once its neocentromere region is sequenced, orangutan chromosome 9 may eventually provide some clues.

Several studies have examined the expression of genes embedded in neocentromeric regions (Saffery et al., 2003; Nagaki et al., 2004; Lam et al., 2006; Yan et al., 2006) and concluded that neocentromeres do not affect gene expression per se. However, the accumulation of satellite DNA and the potential restructuring of pericentromeric regions can disrupt gene structure and, consequently, expression. The absence of (important) genes around pericentromeric regions can therefore be seen as a condition favoring, or at least not opposing, ENC fixation. Lomiento et al. (2008) found that, in primates, fixed ENCs were preferentially seeded in gene deserts. Alternatively, ENCs seeded close to genes may have remained poor in segmental duplications, as in the case of human ENCs 3 and 6 (She et al., 2004).

Telomere/centromere interchange

Ventura et al. (2004) reported on the evolutionary history of chromosome 3 in primates. In the ancestor of New World monkeys, the ancestral chromosome 3 split into three distinct acrocentric chromosomes. Marker order analysis confirmed that synteny and marker order were conserved in the three platyrrhine families Cebidae (Callithrix jacchus), Atelidae (Lagothrix lagothricha) and Pitheciidae (Callicebus pallescens). Strikingly, at least three centromere/telomere interchanges occurred: depending on the species examined, the centromere moved from one chromosome end to the other. Subtelomeric repetitive sequences and/or SDs may have played a role in these exchanges. Notably, Villasante et al. (2007) have proposed that, during the evolution of eukaryotic chromosomes, centromeres were derived from telomeres.

Concluding remarks: centromeres and genome sequencing

Over the last two decades, sequencing technology has made enormous leaps. The ‘parallel sequencing’ era, initiated in 2005 (Margulies et al., 2005), has allowed the sequencing of many human individuals (see the ‘1000 genomes’ project; http://www.1000genomes.org). Concomitantly, the sequencing of entire non-human genomes has progressed exponentially (see http://www.genome.gov/10002154), and the sequencing of 10 000 vertebrate genomes has been proposed (Genome 10K Community of Scientists, 2009). The giant panda genome was the first mammalian genome sequenced entirely by parallel sequencing (Li et al., 2010). However, this achievement also exposes the weakness of these technologies in reliably assembling sequences into chromosomes. The panda sequence is in fact a collection of scaffolds, so the position of centromeres along the chromosomes was not addressed at all. Additionally, the satellite DNA specific to the centromeres of the species under study may be unknown, or may also be present in non-centromeric regions (e.g., the stretches of alphoid sequence at human 2q21.2; chromosome 2: 132 682 845–132 722 540; UCSC hg18). Furthermore, centromere-repositioning events can only be identified in evolutionary studies that compare a phylogenetic array of species to distinguish between the ancestral and derived positions of a specific centromere. We can therefore anticipate that classical and molecular cytogenetics will continue to play a crucial role in identifying centromere movements, even in the era of massive genome sequencing. Indeed, all ENCs and HNs identified to date were discovered through classical and molecular cytogenetic investigations.