Main

The satellite DNA now known as alpha satellite DNA (AS) was first described as a large-scale repetitive sequence in the African green monkey,1 and was found to reside in the centromere.2 AS is now known to be a major DNA component of primate centromeres.3, 4 It consists of tandem repeats of a basic unit approximately 170 bp in length. Human AS comprises a large number of subfamilies. Some are simple tandem repetitions of the basic repeat unit, whereas others are organized into higher-order repeats (HORs), structures in which a group of several basic repeat units forms a larger unit that is itself repeated periodically. AS containing HORs is known to be more important for centromere function than AS lacking them.4, 5 The chromosome-specific organization of AS was discovered approximately 30 years ago,6 and chromosome-specific subfamilies are characterized by their HOR structures as well as by their monomer sequences. The present study addresses the evolutionary emergence of HORs in AS in the primate lineage.
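To make the HOR definition concrete, the toy sketch below treats each ~170-bp repeat unit as a single label (its monomer type) and reports the smallest period at which the label sequence repeats. It is an illustration only, not the method used in this study: the function name `hor_period` and the example labels are hypothetical, real monomer types would come from sequence alignment, and real arrays are rarely perfectly periodic.

```python
def hor_period(labels):
    """Return the smallest p such that labels[i] == labels[i + p] for all valid i.

    A period of 1 corresponds to a simple tandem array of one monomer type;
    a period greater than 1 corresponds to a higher-order repeat whose
    HOR unit contains p monomers.
    """
    n = len(labels)
    for p in range(1, n):
        if all(labels[i] == labels[i + p] for i in range(n - p)):
            return p
    return n  # no repetition detected within the sequence


# Hypothetical examples: a simple tandem array versus a 4-monomer HOR.
simple = list("aaaaaaaa")
hor = list("abcdabcdabcd")
print(hor_period(simple), hor_period(hor))  # prints "1 4"
```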

HORs have also been observed in the great apes, including the chimpanzee, gorilla and orangutan, although the periodicity varies less within each of these species than it does in humans.4, 7 Humans and great apes belong to the family Hominidae (hominids). This family, together with the family Hylobatidae (gibbons, also called small apes), forms the superfamily Hominoidea (hominoids). Computational analyses of shotgun sequencing data from the genome of Nomascus leucogenys (white-cheeked gibbon) suggested that HORs may be present in AS of members of the family Hylobatidae.8 However, subsequent experimental work by the same group found no direct evidence for HORs in AS of N. leucogenys, and the authors concluded that HORs are a peculiarity of hominids.9 In the present study, we obtained direct evidence for the presence of HORs in AS of Symphalangus syndactylus (siamang), another member of the family Hylobatidae. We have previously reported that this species carries AS in the terminal regions of its chromosomes in addition to the centromeres.10 To characterize this AS, we previously sequenced a 4101-bp region of a genomic DNA clone (GenBank accession number: AB678729).10 After that work was published, a re-analysis of this sequence (which contains 24 repeat units) in which the repeat units were aligned revealed a periodic pattern of nucleotide variation. Because the sequence was too short to resolve the organization of the repeat units in detail, in the present study we sequenced another clone, both to obtain a longer sequence and to confirm the presence of HORs in a second clone.

The analysis of the pSiaFos1 sequence suggested that its HOR is a mixture of two repeat intervals, one of four repeat units and one of six. If a block of 850 (5 × 170) consecutive nucleotides contains an HOR with an interval of four repeat units, the first and fifth units of the block are closely similar copies, so a dot matrix comparison of the block with itself should reveal the HOR as a diagonal offset by about 680 (4 × 170) nucleotides. A single sequencing reaction on a fosmid clone with a universal primer yields 1000–1100 nucleotides. The first 700 nucleotides have a very low error frequency, and the next 200 nucleotides, although more error-prone, still provide enough information for a dot matrix search for signs of HORs. These ~900 usable nucleotides therefore exceed the 850-nucleotide block required. We isolated 19 additional fosmid clones (pSiaFos2 to 20) that gave strong signals in the screening procedure described in our previous study.10 We then sequenced one end of each of the 19 clones and found that 16 contained AS. Dot matrix analysis of these 16 sequences (Supplementary Figure 1) suggested that 3 of them (pSiaFos7, 15 and 19) contain HORs with an interval of four repeat units. We selected one of these, pSiaFos7, as the second clone to be sequenced.
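The dot matrix reasoning above can be sketched in a few lines. This is a simplified, hypothetical illustration rather than the software we used: instead of drawing the dot plot, it counts exact word matches (word size 12, our assumption) on each diagonal, so a four-unit HOR in an 850-nucleotide block should dominate at an offset of 680. The test data are random "monomers", not siamang sequence.

```python
import random
from collections import Counter


def diagonal_match_counts(seq, word=12):
    """Self-comparison dot matrix, summarized by diagonal.

    Every pair of positions (i, j) with j > i whose `word`-long substrings
    are identical is a dot; dots on the same diagonal share the offset j - i.
    An HOR whose unit spans k monomers of ~170 bp shows up as a strong
    diagonal near offset k * 170.
    """
    index = {}
    for i in range(len(seq) - word + 1):
        index.setdefault(seq[i:i + word], []).append(i)
    counts = Counter()
    for positions in index.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                counts[positions[b] - positions[a]] += 1
    return counts


# Hypothetical test data: four random 170-bp "monomers" arranged as a
# 4-unit HOR and truncated to 850 nt (5 monomers).
rng = random.Random(0)
monomers = ["".join(rng.choice("ACGT") for _ in range(170)) for _ in range(4)]
block = ("".join(monomers) * 2)[:850]
counts = diagonal_match_counts(block)
offset, dots = counts.most_common(1)[0]
print(offset, dots)
```

With real, error-containing reads and imperfect monomer copies, exact word matches become sparser, which is why a visual dot plot with a tolerant scoring window is normally preferred.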

The sequencing strategy has been described previously;10 it involves preparing deletion clones of different sizes, sequencing them with a universal primer and assembling the reads into a single contiguous sequence. In the present study, however, we generated the deletions with exonuclease III and mung bean nuclease11 instead of restriction endonucleases, because these enzymes yield deletion clones of more varied sizes and thus allow sequencing samples to be collected more efficiently. We sequenced a 9517-bp region of the pSiaFos7 clone (GenBank accession number: AB819921) and found 55 consecutive repeat units in it.
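Assembling reads from nested deletion clones is complicated by the tandem repeat itself, since a read can match the growing sequence at several shifts one repeat unit apart. The sketch below is a hypothetical illustration, not our assembly procedure: it assumes each read's approximate start is known (for example, from the deletion-clone size) and accepts only an exact overlap within a narrow window around that estimate. The function name, the tolerance of 20 nt and the error-free reads are all assumptions.

```python
import random


def assemble_nested_reads(reads, approx_starts, tol=20, min_overlap=50):
    """Join reads from nested deletion clones into one contig.

    reads[k] is assumed to start roughly at approx_starts[k]. Each read is
    placed at the shift s within +/- tol of that estimate at which it agrees
    exactly with the growing contig. Keeping tol well below the ~170-bp unit
    length prevents out-of-register joins one repeat unit away.
    """
    contig = reads[0]
    for read, guess in zip(reads[1:], approx_starts[1:]):
        lo = max(0, guess - tol)
        hi = min(len(contig) - min_overlap, guess + tol)
        candidates = []
        for s in range(lo, hi + 1):
            overlap = min(len(contig) - s, len(read))
            if contig[s:s + overlap] == read[:overlap]:
                candidates.append(s)
        if not candidates:
            raise ValueError(f"no consistent placement near {guess}")
        s = min(candidates, key=lambda c: abs(c - guess))
        if s + len(read) > len(contig):
            contig = contig[:s] + read
    return contig


# Hypothetical check: a 9350-bp region of 55 "monomers" in a 4-unit HOR,
# read in 900-nt pieces starting every 500 nt, with start estimates off by
# up to 15 nt.
rng = random.Random(1)
monomers = ["".join(rng.choice("ACGT") for _ in range(170)) for _ in range(4)]
region = ("".join(monomers) * 14)[:55 * 170]
starts = list(range(0, 8501, 500))
reads = [region[s:s + 900] for s in starts]
approx = [s + (k * 7) % 31 - 15 for k, s in enumerate(starts)]
contig = assemble_nested_reads(reads, approx)
print(len(contig), contig == region)  # prints "9350 True"
```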

Supplementary Figure 2 shows the sequence alignment of the 55 repeat units of the pSiaFos7 clone; the alignment of the 24 repeat units of the pSiaFos1 clone has been published previously.10 In both alignments, variation is non-randomly distributed at many nucleotide positions. Supplementary Figure 3 shows pairwise sequence identities between the repeat units, with cells representing identities of 90–95% and >95% colored yellow and red, respectively. The red cells form several parallel step-like patterns, indicating that sets of basic repeat units recur periodically. To clarify the HOR structure further, we constructed a neighbor-joining phylogenetic tree of the repeat units, assigned numbers to distinctive blocks (clusters) in the tree, and examined the order of these numbers along the nucleotide sequences of the pSiaFos1 and pSiaFos7 clones (Figure 1). The most common patterns were ‘123456’, ‘123478’ and ‘1278’, which we designated α, β and γ, respectively. Thus, the sequenced AS contains an HOR structure in which the most common repeat intervals are six units (α and β) and four units (γ). No repeating arrangement of the α, β and γ patterns themselves was observed in the present data; such an arrangement might become apparent if a longer region were sequenced.
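The pattern assignment described above can be expressed as a simple greedy scan of a block-number string. This sketch is illustrative only: the function names are hypothetical and the example string is invented, not the data of Figure 1b. Because the three patterns diverge after their shared prefix ‘12’, at most one of them can match at any position, so a greedy left-to-right scan is unambiguous.

```python
PATTERNS = {"α": "123456", "β": "123478", "γ": "1278"}


def segment_blocks(blocks):
    """Greedily split a block-number string into the α, β and γ patterns.

    Positions that do not start any of the three patterns are reported as
    '?' so that irregular stretches remain visible.
    """
    out, i = [], 0
    while i < len(blocks):
        for name, pat in PATTERNS.items():
            if blocks.startswith(pat, i):
                out.append(name)
                i += len(pat)
                break
        else:
            out.append("?")
            i += 1
    return out


def interval_counts(segments):
    """Count HOR intervals (repeat units per pattern) among matched segments."""
    counts = {}
    for name in segments:
        if name in PATTERNS:
            n = len(PATTERNS[name])
            counts[n] = counts.get(n, 0) + 1
    return counts


# Hypothetical block-number string (not the real data from Figure 1b).
segs = segment_blocks("123456127812347812")
print(segs)                   # ['α', 'γ', 'β', '?', '?']
print(interval_counts(segs))  # {6: 2, 4: 1}
```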

Figure 1

Structure of the higher-order repeat. (a) Neighbor-joining phylogenetic tree of the 79 repeat units (A01–A24 from pSiaFos1 and B01–B55 from pSiaFos7), constructed using the MEGA5 program19 under default settings. Numbers 1–8 were assigned arbitrarily to the eight distinctive blocks, indicated by green vertical bars, so that their order in panel b is easy to recognize. (b) Order of the block numbers along the nucleotide sequences of the pSiaFos1 and pSiaFos7 clones. α, β and γ, indicated by thin green bars, are the three most common block-number patterns, ‘123456’, ‘123478’ and ‘1278’, respectively.
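The tree in Figure 1a was built with MEGA5 under default settings, which this sketch does not attempt to reproduce. As a hedged illustration of how a neighbor-joining tree is obtained from pairwise distances between aligned repeat units, the code below implements the standard Saitou–Nei procedure using uncorrected p-distances; the distance model and the function names are our assumptions.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Neighbor-joining tree from a symmetric distance matrix; returns Newick.

    At each step the pair (i, j) minimizing
    Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j) is joined, where r(i) is the
    row sum of distances. The final two nodes are joined by their remaining
    distance, giving an unrooted tree written in Newick form.
    """
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new node to every remaining node.
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    return f"({nodes[0]},{nodes[1]}:{d[0][1]:.4f});"


# Hypothetical four-unit example with additive distances.
print(neighbor_joining(["A", "B", "C", "D"],
                       [[0, 3, 3, 4], [3, 0, 4, 5], [3, 4, 0, 3], [4, 5, 3, 0]]))
```

For aligned repeat units, the distance matrix would be filled with `p_distance` for every pair before calling `neighbor_joining`.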

The centromere protein B (CENP-B) box is a 17-bp sequence embedded in AS that was first identified in humans as an important signal for centromere function.12 CENP-B, a highly conserved centromere-associated protein, binds to this sequence. We searched the consensus sequences of pSiaFos1 (Figure 4 of Koga et al.10) and pSiaFos7 (Supplementary Figure 2 of the present study) for a CENP-B box but found neither the box nor any similar sequence. This does not necessarily mean, however, that the HOR-containing AS lacks centromere function. Hybridization experiments have demonstrated the presence of the CENP-B box in AS of humans and great apes, but the same experiments did not detect it in gibbons or in the other primates examined.13 Its absence from siamang AS is therefore consistent with these earlier observations.
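A scan for the CENP-B box on both strands can be sketched with a regular expression. The consensus is not given in the text above; the sketch assumes the commonly cited human 17-bp consensus NTTCGNNNNANNCGGGN. It reports exact consensus matches only, so searching for merely similar sequences, as described above, would additionally require a mismatch-tolerant comparison. Function names are hypothetical.

```python
import re

# Assumed consensus: NTTCGNNNNANNCGGGN (N = any base), 17 bp. Written as a
# lookahead so that overlapping hits are also reported.
CENPB_BOX = re.compile(r"(?=([ACGT]TTCG[ACGT]{4}A[ACGT]{2}CGGG[ACGT]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_cenpb_boxes(seq):
    """Return (start, strand) for every exact consensus match on either strand.

    Starts are 0-based coordinates on the given (+) strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+") for m in CENPB_BOX.finditer(seq)]
    rc = reverse_complement(seq)
    for m in CENPB_BOX.finditer(rc):
        hits.append((len(seq) - m.start() - 17, "-"))
    return sorted(hits)


box = "CTTCGTTGGAAACGGGA"  # a sequence that fits the consensus
print(find_cenpb_boxes("AAAA" + box + "GGGG"))                 # [(4, '+')]
print(find_cenpb_boxes("TTT" + reverse_complement(box) + "TT"))  # [(3, '-')]
```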

S. syndactylus carries large blocks of constitutive heterochromatin in the terminal regions of its chromosomes,14 and these blocks are composed mostly or solely of AS.10 We performed fluorescence in situ hybridization to determine whether the HOR-containing AS originates from the centromeres or from the terminal heterochromatin. The results obtained with pSiaFos1 as a probe have been shown in our previous reports.10, 15 Hybridization signals appeared in both the centromeric and telomeric regions, so the location of the HOR-containing AS could not be determined. Additional assays using pSiaFos7 and several other clones produced the same signal patterns (Supplementary Figure 4). Thus, it cannot be determined at present from which region the HOR-containing AS originates. Even if it originates from the telomeric regions, these sequences might affect the composition of centromeric AS by being transferred to the centromere. One possible vehicle for such transfer is extrachromosomal circular DNA, which is known to often contain AS in human cells.16 There is also an example in a gibbon of vigorous amplification of a tandem-repeat sequence integrated into the centromere.17

Apes of the family Hylobatidae can be divided into four genera: Hoolock, Hylobates, Nomascus and Symphalangus.18 Of these, Nomascus and Symphalangus include species that carry AS at chromosomal ends in addition to the centromeres.9, 10, 15 In their study of N. leucogenys, Cellamare et al.9 cloned numerous AS fragments from both locations, analyzed their nucleotide sequences for HORs and concluded that AS of this species lacks HORs. In the present study, by contrast, we obtained evidence for HORs in AS of S. syndactylus. These contrasting results may reflect differences in the species studied or in the principal methods employed: computational analysis of a collection of short sequences in the study by Cellamare et al.,9 and conventional sequencing of long genomic clones in the present study.

It is widely postulated that HORs in AS are an attribute of hominids, but our results call for a modification of this view: HORs are an attribute of hominoids.