Interspecies conservation of organisation and function between nonhomologous regional centromeres

Despite the conserved essential function of centromeres, centromeric DNA itself is not conserved. The histone-H3 variant, CENP-A, is the epigenetic mark that specifies centromere identity. Paradoxically, CENP-A normally assembles on particular sequences at specific genomic locations. To gain insight into the specification of complex centromeres, here we take an evolutionary approach, fully assembling genomes and centromeres of related fission yeasts. Centromere domain organization, but not sequence, is conserved between Schizosaccharomyces pombe, S. octosporus and S. cryophilus with a central CENP-A Cnp1 domain flanked by heterochromatic outer-repeat regions. Conserved syntenic clusters of tRNA genes and 5S rRNA genes occur across the centromeres of S. octosporus and S. cryophilus, suggesting conserved function. Interestingly, nonhomologous centromere central-core sequences from S. octosporus and S. cryophilus are recognized in S. pombe, resulting in cross-species establishment of CENP-A Cnp1 chromatin and functional kinetochores. Therefore, despite the lack of sequence conservation, Schizosaccharomyces centromere DNA possesses intrinsic conserved properties that promote assembly of CENP-A chromatin.

Centromeres are the chromosomal regions upon which kinetochores assemble to mediate accurate chromosome segregation. Evidence suggests that both genetic and epigenetic influences define centromere identity [1][2][3][4][5][6][7][8][9] . Neocentromere formation at new locations lacking homology to centromeres 10 and the inactivation of one centromere of a dicentric chromosome despite it retaining centromeric sequences 11 indicate that centromere sequences are neither necessary nor sufficient for centromere assembly 1,7,9 . CENP-A is found at all active centromeres and is the epigenetic mark that specifies centromere identity 1,7,9 . Artificial tethering of CENP-A or CENP-A loading factors at non-centromeric locations on metazoan chromosomes is sufficient to trigger kinetochore assembly 5,12 . Thus, it is specialized chromatin rather than the primary sequence of centromeric DNA that determines where kinetochores, and hence functional centromeres, are assembled. Conversely, CENP-A is generally found on particular sequences in any given organism 1,2,4 and naked repetitive centromere DNA such as alpha-satellite DNA can provide a substrate for the de novo assembly of functional centromeres when introduced into human cells [1][2][3][4] . These observations suggest that, despite the lack of conservation between species, centromere sequences possess properties that make them attractive for assembly of CENP-A chromatin.
Schizosaccharomyces pombe, a paradigm for dissecting complex regional centromere function, has demarcated centromeres (35-110 kb) with a central domain assembled in CENP-A Cnp1 chromatin, flanked by outer-repeat elements assembled in RNA interference-dependent heterochromatin, in which histone-H3 is methylated on lysine-9 (H3K9) [13][14][15][16] . Heterochromatin is required for establishment but not maintenance of CENP-A Cnp1 chromatin 6,17 . We have proposed that it is not the sequence per se of S. pombe central-core that is key in its ability to establish CENP-A chromatin but the properties programmed by it 18,19 .
To investigate whether these properties are conserved, here we completely assemble the sequence across centromeres of other Schizosaccharomyces species and test their cross-species functionality. We show that although Schizosaccharomyces centromeres are not conserved in sequence, those of Schizosaccharomyces octosporus and Schizosaccharomyces cryophilus share with S. pombe a conserved organization of a central domain assembled in CENP-A Cnp1 chromatin, flanked by outer repeats assembled in heterochromatin. Syntenic clusters of tRNA and 5S-rRNA genes are present across S. octosporus and S. cryophilus centromeres, further emphasizing their conserved organization. By introducing minichromosomes bearing central domain sequences from S. octosporus and S. cryophilus into S. pombe, we demonstrate that these nonhomologous centromere sequences can be recognized between divergent species, allowing the establishment of CENP-A Cnp1 chromatin and functional centromeres. These observations indicate that centromere DNA possesses conserved properties that promote the establishment of CENP-A chromatin.
S. cryophilus heterochromatic outer repeats contain additional repetitive elements, including a 6.2 kb element (cTAR-14) with homology to the retrotransposon Tcry1 and transposon remnants at the mating-type locus 20 . tRNA gene clusters occur at transitions between CENP-A and heterochromatin domains in two of three centromeres in S. octosporus (S.oct-cen2, S.oct-cen3) and S. cryophilus (S.cry-cen1, S.cry-cen2), and are associated with low levels of both H3K9me2 and CENP-A Cnp1 (Fig. 2b, c), suggesting that they may act as boundaries, as in S. pombe [26][27][28] . No tRNA genes demarcate the CENP-A/heterochromatin transition at S.cry-cen3. Instead, this transition coincides precisely with 270 bp LTRs (Fig. 2b, Supplementary Tables 1, 6 and Supplementary Data 3), which may also act as boundaries [29][30][31] . Similar to tRNA genes, LTRs have been shown to be regions of low nucleosome occupancy, which may counter spreading of heterochromatin 31,32 . The transition between CENP-A Cnp1 and heterochromatin is poorly demarcated at S.oct-cen1 compared with other centromeres. This region lacks tRNA genes and, as only retrotransposon remnants are detectable in S. octosporus, the sequence of putative LTRs is unknown. It is possible that the long inverted imr repeats comprise a gradual transition zone at this centromere. tRNA gene clusters also occur near the extremities of all centromeres in both species, separating heterochromatin from adjacent euchromatin. tRNA genes and LTRs are thus likely to act as chromatin boundaries at fission yeast centromeres.
A high proportion (~32%) of tRNA genes in S. pombe, S. octosporus and S. cryophilus genomes are located within centromere regions 33 (Figs. 1a, 3c, Supplementary Table 8 and Supplementary Data 9, 10). Centromeric tRNA genes are intact and are conserved in sequence with their genome-wide counterparts, indicating that they are functional genes. Two major, conserved tRNA gene clusters reside exclusively within S. octosporus and S. cryophilus centromeres (p-value < 0.00001; q-value < 0.05) (Fig. 3c, d). Cluster 1 comprises several subclusters of 2-3 tRNA genes in various combinations of up to 8 tRNA genes, whereas Cluster 2 contains up to 5 tRNA genes (Fig. 3d); 17 different tRNA genes (14 amino acids) are represented, none of which are unique to centromeres (Fig. 3c). Intriguingly, the order and orientation of tRNA genes within clusters is conserved between species, but intervening sequence is not (Fig. 3d, e). Strikingly, as well as local tRNA gene cluster conservation, inspection of centromere maps reveals synteny of tRNA genes and clusters across large portions of S. octosporus and S. cryophilus centromeres. For example, the tRNA gene order AIR-RKL-E-T-T-L-DVAIR-RKLEF-A-DV (single-letter code) is observed at S.oct-cen1 and S.cry-cen3 (Supplementary Fig. 7). This synteny, together with both possessing small central cores and long imrs, suggests that these two centromeres are ancestrally related. Together, these similarities suggest ancestral relationships between S.oct-cen2 and S.cry-cen1, and between S.oct-cen3 and S.cry-cen2. Further, in places where synteny appears to break down, patterns of tRNA gene clusters suggest that specific centromeric rearrangements occurred between the species. For instance, tRNA gene clusters at the edges of S.cry-cen2R and S.cry-cen3L are consistent with an inter-centromere arm translocation relative to S.oct-cen1R and S.oct-cen2R, indicated by gene synteny maps (Figs. 1b, 4a and Supplementary Fig. 7).
Fission yeast centromeres show interspecies functionality. No central-core sequence homology was revealed between species using BLASTN. To identify potential underlying centromere sequence features, k-mer frequencies (5-mers), normalized for centromeric AT-bias, were subjected to principal component analysis (PCA). CENP-A Cnp1 -associated regions of S. pombe, S. octosporus and S. cryophilus genomes all group together, distinct from the majority of non-centromere sequences (p-value, 9.3 × 10⁻⁷) (Fig. 4b, c). Interestingly, S. pombe neocentromere-forming regions 34 also cluster separately from other genomic regions, sharing sequence features with centromeres. Surprisingly, taking GC content into account, the S. japonicus genome as a whole shows no significant difference in 5-mer frequency compared with the other three fission yeast genomes. In contrast, S. japonicus CENP-A Cnp1 -associated 5-mer frequencies show significant differences from its own wider genome sequence and from centromere sequences of the other fission yeast species (Supplementary Fig. 6). K-mer analysis and conserved centromeric organization prompted us to investigate cross-species functionality of protein and DNA components of Schizosaccharomyces centromeres. Green fluorescent protein (GFP)-tagged CENP-A Cnp1 protein from each species localized to S. pombe centromeres and complemented the cnp1-1 mutant 35 (Fig. 5a-c), indicating that heterologous CENP-A proteins assemble and function at S. pombe centromeres, despite normally assembling on nonhomologous sequences in their respective organisms.
Introduction of S. pombe central-core (S.pom-cnt) DNA on minichromosomes into S. pombe results in the establishment and maintenance of CENP-A Cnp1 chromatin if S.pom-cnt is adjacent to heterochromatin, or if CENP-A is overexpressed 6,17,18,36 . S.oct-cnt regions (3.2-10 kb) or S.pom-cnt2 (positive control) were placed adjacent to S. pombe outer-repeat DNA in minichromosome constructs (Fig. 6a), which were transformed into S. pombe cells overexpressing S. pombe GFP-CENP-A Cnp1 (hi-CENP-A Cnp1 ) 18 . Acquisition of centromere function is indicated by minichromosome retention on non-selective indicator plates (white/pale pink colonies) and by the appearance of sectored colonies (Fig. 6b, c). The pHET-S.pom-cnt2 minichromosome containing S.pom-cnt2 established centromere function at high frequency immediately upon transformation in hi-CENP-A Cnp1 cells (Table 1). Centromere function was also established on S.oct-cnt-containing minichromosomes in hi-CENP-A Cnp1 cells (Fig. 6b, c and Table 1). CENP-A Cnp1 ChIP-quantitative PCR (ChIP-qPCR) indicated that, for minichromosomes with established centromere function, CENP-A Cnp1 chromatin was assembled on nonhomologous S.oct-cnt DNA, to levels similar to those at endogenous S. pombe centromeres and to S.pom-cnt2 on a minichromosome (Fig. 6d). Minichromosomes containing S.oct-cnt provided efficient segregation function (Table 1), no longer requiring CENP-A Cnp1 overexpression to maintain that function once established (Fig. 6e), consistent with the self-propagating ability of CENP-A chromatin 5,18 . Minichromosomes containing S. cryophilus central-core regions (S.cry-cnt) were also able to establish functional centromeres and segregation function in S. pombe. These S.cry-cnt-bearing minichromosomes assembled CENP-A Cnp1 chromatin to high levels, similar to those at endogenous S. pombe centromeres (Supplementary Fig. 8).
Centromere function was not due to minichromosomes gaining portions of S. pombe central-core DNA (Supplementary Fig. 9). A similar minichromosome bearing a region (retrotransposon Tj7) highly enriched for CENP-A Cnp1 in S. japonicus did not convincingly form functional centromeres when introduced into S. pombe or assemble CENP-A Cnp1 chromatin to an appreciable extent (Supplementary Fig. 6). Thus, S. pombe, S. octosporus and S. cryophilus centromeres share a similar organization, underlying sequence features and cross-species establishment of CENP-A Cnp1 chromatin, whereas putative S. japonicus centromeres appear not to share these attributes. Our analyses indicate that S.oct-cnt and S.cry-cnt DNAs are competent to establish CENP-A chromatin and centromere function in S. pombe when CENP-A Cnp1 is overexpressed, suggesting that S. octosporus and S. cryophilus central-core DNA have intrinsic properties that promote the establishment of CENP-A chromatin despite lacking sequence homology.

Discussion
Our analyses indicate that the centromeres of S. pombe, S. octosporus and S. cryophilus share a conserved organization of a CENP-A Cnp1 -assembled central-core flanked by outer repeats assembled in H3K9me heterochromatin. Despite this conservation of organization, centromere sequence is not conserved, although underlying sequence features are detectable by PCA of 5-mer frequencies. The cross-species functionality of S.oct-cnt and S.cry-cnt central-core DNA in S. pombe suggests that the central-core regions of these three species are favoured substrates, sharing intrinsic properties that promote the establishment of CENP-A Cnp1 chromatin, properties that S.jap-Tj7 may lack. Although the nature of putative conserved CENP-A-promoting properties is unknown, recent studies have revealed distinctive characteristics of centromeric DNA. S. pombe central-core DNA has the innate property of driving high rates of histone-H3 nucleosome turnover, causing low nucleosome occupancy 19 and may programme pervasive low-quality RNAPII transcription to promote assembly of CENP-A chromatin 18 . These and other properties, such as non-B form DNA 37 , may contribute to an

Fig. 2 Domain organization of Schizosaccharomyces centromeres. a Immunostaining of centromeres in indicated Schizosaccharomyces species with anti-CENP-A Cnp1 antibody (green) and DNA staining (DAPI; red). Scale bar, 5 μm. b S. cryophilus centromere organization indicating DNA repeat elements. ChIP-seq profiles for CENP-A Cnp1 (purple) and H3K9me2-heterochromatin (orange) are shown above each centromere. Positions of tRNA genes (single-letter code of cognate amino acid; black), 5S rDNAs (red) and solo LTRs (pink) are indicated. Central cores (cnt; purple shades); innermost repeats (imr; blue shades); 5S-associated repeats (cFSARs; orange shades); tRNA gene-associated repeats (TARs) containing clusters of tRNA genes (green shades); heterochromatic repeats (cHR) and TARs associated with single tRNA genes (various colours: brown/pink/red); cTAR-14s, containing retrotransposon remnants (deep pink). For details, including individual repeat annotation, see Supplementary Fig. 4 and Supplementary Tables 3, 4. c S. octosporus centromere organization indicating DNA repeat elements. Labelling and shading as in b. Only oTAR-14ex (pale pink part) contains retrotransposon remnants. Colouring is indicative of homology within each species but only of possible repeat equivalence (not homology) between species; see Supplementary Table 5

Based on conserved features, ancestral Schizosaccharomyces centromeres may have consisted of a CENP-A Cnp1 -assembled central-core surrounded by tRNA gene clusters and 5S rDNAs. We surmise that RNAPIII promoters perhaps provided targets for transposon integration 38 , followed by heterochromatin formation to silence retrotransposons and preserve genome integrity 39,40 . The ability of heterochromatin to recruit cohesin 41 26 . In S. pombe, non-centromeric and centromeric tRNA genes and 5S rDNAs cluster adjacent to centromeres in a TFIIIC-dependent manner 27,28 . The multiple tandem centromeric 5S rDNAs and tRNA genes could contribute to a robust, highly folded heterochromatin structure promoting optimum kinetochore configuration for co-ordinated microtubule attachments and accurate chromosome segregation 44 . The lack of overt sequence conservation between centromeres of different species appears not to prevent functional conservation, which may be driven by underlying sequence features or properties such as the transcriptional landscape. Maintenance of centromere function has been observed at a pre-established human centromere (pre-assembled with CENP-A and an intact, functional kinetochore) in chicken cells 45

Methods
Cell growth and manipulation. Standard genetic and molecular techniques were followed. Fission yeast methods were as described 47 . Strains used in this study are listed in Supplementary Table 10. All Schizosaccharomyces strains were grown at 32°C in YES (Yeast Extract with Supplements), except S. cryophilus, which was grown at 25°C, unless otherwise stated. S. pombe cells carrying minichromosomes were grown in PMG-ade-ura. For low GFP-tagged CENP-A Cnp1 protein expression from episomal plasmids, cells were grown in PMG-leu with thiamine.
PacBio sequencing of genomic DNA. High-molecular-weight genomic DNA was prepared from S. cryophilus, S. octosporus and S. japonicus using a Qiagen Blood and Cell Culture DNA Kit (Qiagen), according to the manufacturer's instructions. Pacific Biosciences (PacBio) sequencing was carried out at the CSHL Cancer Center Next Generation Genomics Shared Resource. Samples were prepared following the standard 20 kb PacBio protocol. Briefly, 10-20 μg of genomic material was sheared via g-tube (Covaris) to 20 kb. Samples were damage repaired via ExoVII (PacBio), damage-repair mix and end-repair mix using the standard PacBio 20 kb protocol. Repaired DNA underwent blunt-end ligation to add SMRTbell adaptors. For some libraries, 10-50 kb molecules from 1 to 2 μg SMRTbell libraries were size selected using BluePippin (Sage Science), after which samples were annealed to PacBio SMRTbell primers per the standard PacBio 20 kb protocol. Annealed samples were

De novo whole genome assembly of PacBio sequence reads. PacBio reads were assembled using HGAP3 (The Hierarchical Genome Assembly Process version 3) 48 . Reads were first sorted by length and the top 30% used as seed reads by HGAP3. All remaining reads of at least 1 kb in length were used to polish the seed reads. These polished reads were used to de novo assemble the genomes and Quiver software was used to generate consensus genome contigs. Comparisons to the ChIP-seq input data and Broad Institute Schizosaccharomyces reference genomes 20 showed very high agreement with these datasets. The S. octosporus and S. cryophilus chromosomes were named according to their sequence lengths, the longest chromosome being labelled as chromosome I in each case.
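The seed-read partition described for the PacBio assembly can be illustrated with a minimal Python sketch. It is not HGAP3 code: the function name `split_seed_reads`, the read identifiers and the lengths are hypothetical, and it assumes that "top 30%" refers to the number of reads rather than to the number of bases.

```python
def split_seed_reads(read_lengths, seed_percent=30, min_polish_len=1000):
    """Partition reads into seed reads and polishing reads.

    read_lengths: dict mapping a read id to its length in bases.
    Assumption: the top `seed_percent` is taken by read count.
    """
    # Longest reads first; ties are broken by read id so the result is stable.
    ordered = sorted(read_lengths, key=lambda r: (-read_lengths[r], r))
    n_seed = len(ordered) * seed_percent // 100
    seeds = ordered[:n_seed]
    # The remaining reads are kept for polishing only if they are long enough.
    polish = [r for r in ordered[n_seed:] if read_lengths[r] >= min_polish_len]
    return seeds, polish


# Hypothetical read lengths, for illustration only.
reads = {"r1": 25000, "r2": 18000, "r3": 900, "r4": 4000, "r5": 12000,
         "r6": 700, "r7": 9000, "r8": 1500, "r9": 3000, "r10": 22000}
seeds, polish = split_seed_reads(reads)
print("seed reads:", seeds)
print("polishing reads:", polish)
```

Reads shorter than 1 kb that fall outside the seed set are simply dropped in this sketch, mirroring the length threshold stated in the text.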
De novo assembly of S. pombe genome using nanopore technology. Genomic DNA was extracted as described previously 49 . Briefly, cells were incubated with Zymolyase 20T to digest the cell wall, pelleted, resuspended in TE (10 mM Tris-HCl pH 8, 1 mM EDTA) and lysed with SDS, followed by addition of potassium acetate and precipitation with isopropanol. After treatment with RNase A and proteinase K, two phenol chloroform extractions were performed. DNA was precipitated in the presence of sodium acetate and isopropanol, followed by centrifugation and washing of the pellet with 75% ethanol. After air drying, the pellet was resuspended in TE. DNA purity and concentration were assessed using a Nanodrop 2000 and the double-stranded high-sensitivity assay on a Qubit fluorometer, respectively. Genomic DNA was sequenced using the MinION nanopore sequencer (Oxford Nanopore Technologies). Three sequencing libraries were generated using the one-dimensional (1D) ligation kit SQK-LSK108, the two-dimensional (2D) ligation kit SQK-NSK007 and the 1D Rapid sequencing kit SQK-RAD002, following the manufacturer's guidelines. Each library was sequenced on one MinION flow cell. Sequencing reads were base-called using Metrichor (1D and 2D ligation libraries) or Albacore (rapid sequencing library). The combined dataset incorporating reads from three flow cells was assembled using Canu v1.5. The assembly was computed using default Canu parameters and a genome size of 13.8 Mbp. QUAST v3.2 was used to evaluate the genome assembly.
Genome annotation and chromosome structure. Genes were annotated onto the genome both de novo, using BLAST and the sequences of known genes, and by using liftover (https://genome-store.ucsc.edu) to carry over the previous gene annotation information from the Broad Institute reference genomes (ref). CrossMap 50 was then used to lift the chain files over to the new, updated genome. The locations of tRNA genes were predicted using tRNAscan 51,52 . Dfam 2.0 53 was used to annotate repetitive DNA elements. MUMmer3.23 54 was used to compare the genomes and annotate repeat elements and tandem repeat sequences, including those located in centromeric domain and telomere sequences. Centromeric repeat elements were manually identified using BLASTN and MEGABLAST (https://blast.ncbi.nlm.nih.gov). Each repeat element was named according to its sequence features (association with tRNA genes and rDNAs) and location. The sequence of the wild-type (h 90 ) S. pombe mating-type locus was obtained by manually merging nanopore and PacBio contigs using available data 20 (Supplementary Fig. 10) and information at www.pombase.org/status/mating-type-region. Genome synteny alignment analysis was carried out using syMAP42 55,56 , based on orthologous genes among the three genomes.

ChIP-quantitative PCR analysis. For analysis of CENP-A Cnp1 association with minichromosomes bearing S. octosporus central-core DNA, three independent transformants with established centromere function (indicated by ability to form sectored colonies) for each minichromosome were grown in PMG-ade-ura cultures and fixed with 1% formaldehyde for 15 min at room temperature. ChIP was performed as previously described 57 . Briefly, 2.5 × 10⁸ cells were lysed by bead beating (Biospec) in 300 μl Lysis Buffer (50 mM Hepes-KOH pH 7.5, 140 mM NaCl, 1 mM EDTA, 1% (v/v) Triton X-100, 0.1% (w/v) sodium deoxycholate). Lysates were sonicated (Bioruptor, Diagenode) for 20 min (30 s on/off, high setting), followed by centrifugation at 17,000 × g (2 × 10 min) to pellet cell debris. Lysates were precleared for 1 h with 25 μl of Protein-G agarose beads (Roche) and 10 μl precleared lysate retained as 'input' sample. Three hundred microlitres of lysate was incubated overnight with 10 μl sheep anti-CENP-A Cnp1 serum and 25 μl Protein-G agarose beads. Beads were washed with Lysis Buffer, Lysis Buffer with 500 mM NaCl, WASH buffer (10 mM Tris-HCl pH 8, 0.25 M LiCl, 0.5% NP-40, 0.5% (w/v) sodium deoxycholate, 1 mM EDTA) and TE. DNA was recovered from input and IP samples using Chelex resin (BioRad). Ten microlitres of anti-CENP-A Cnp1 sheep antiserum 57 (raised to the N-terminal 19 amino acids of S. pombe CENP-A Cnp1 ) and 25 μl Protein-G-Agarose beads were used per ChIP. qPCR was performed using a LightCycler 480 and reagents (Roche), and analysed using LightCycler 480 Software 1.5 (Roche). Primers used in qPCR are listed in Supplementary Table 12. Mean %IP ChIP values for Sp-cnt or So-cnt on minichromosomes were normalized to %IP for endogenous S. pombe cnt1. Error bars represent SD.
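The normalization of %IP values to endogenous cnt1 can be sketched as follows. The text does not give the formula used to derive %IP from qPCR data, so this sketch assumes the common percent-of-input calculation with perfect primer efficiency (a doubling per cycle), equal elution volumes for input and IP, and an input fraction of 10 μl out of 300 μl. The function names and all Ct values are hypothetical.

```python
from statistics import mean, stdev


def percent_ip(ct_input, ct_ip, input_fraction=10 / 300):
    """Percent of input recovered in the IP.

    Assumption: perfect amplification efficiency (2-fold per cycle).
    """
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


def normalised_enrichment(target, reference):
    """%IP at a target locus divided by %IP at a reference locus.

    Each argument is a (Ct_input, Ct_IP) pair; the reference stands in
    for endogenous cnt1.
    """
    return percent_ip(*target) / percent_ip(*reference)


# Hypothetical Ct pairs for three transformants: (target, cnt1 reference).
transformants = [
    ((24.0, 26.1), (23.5, 25.4)),
    ((24.2, 26.0), (23.6, 25.5)),
    ((23.9, 26.2), (23.4, 25.6)),
]
ratios = [normalised_enrichment(t, ref) for t, ref in transformants]
print(f"mean = {mean(ratios):.2f}, SD = {stdev(ratios):.2f}")
```

Because the same input fraction applies to both loci, it cancels in the ratio; it only matters when absolute %IP values are reported.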
Chromatin immunoprecipitation sequencing. A modified ChIP protocol was used. Briefly, pellets containing 7.5 × 10 8 cells were lysed by four 1 min pulses of bead beating in 500 μl of lysis buffer (50 mM HEPES-KOH, pH 7.5, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate), with resting on ice in between. The insoluble chromatin fraction was pelleted by centrifugation at 6000 × g and washed with 1 ml lysis buffer before resuspension in 300 μl lysis buffer containing 0.2% SDS. Chromatin was sheared by sonication using a Bioruptor adaptor sequences) were selected using Ampure XP beads (Beckman Coulter). The libraries were sequenced following Illumina HiSeq2000 work flow (or as indicated in Supplementary Table 11).
Defining fission yeast centromeres. CENP-A Cnp1 and H3K9me2 ChIP-seq data were generated to identify centromere regions. ChIP-seq reads with mapping qualities lower than 30, or read pairs that were over 500 nt or <100 nt apart, were discarded. ChIP-seq data were normalized with respect to input data. Paired-end ChIP-seq data (single-end for S. japonicus) were aligned to the updated genome sequences using Bowtie2 59 . Samtools 60 , Deeptools 61 and IGV 62 were subsequently used to generate sequence data coverage files and to visualize the data. MACS2 63 was used to detect CENP-A Cnp1 and heterochromatin-enriched regions of the genome.
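The read-filtering thresholds stated above can be expressed as a short Python sketch. It operates on plain tuples rather than on alignment files, and the `ReadPair` type, the `keep_pair` function and the example records are hypothetical; a real pipeline would read these fields from the aligner output.

```python
from typing import NamedTuple


class ReadPair(NamedTuple):
    name: str
    mapq: int   # mapping quality reported by the aligner
    span: int   # distance between the two mates, in nt


def keep_pair(pair, min_mapq=30, min_span=100, max_span=500):
    """Keep a pair unless MAPQ < 30 or the mates are <100 nt or >500 nt apart."""
    return pair.mapq >= min_mapq and min_span <= pair.span <= max_span


# Hypothetical read pairs, for illustration only.
pairs = [
    ReadPair("p1", 42, 250),   # kept
    ReadPair("p2", 12, 250),   # discarded: low mapping quality
    ReadPair("p3", 42, 620),   # discarded: mates too far apart
    ReadPair("p4", 42, 80),    # discarded: mates too close together
]
kept = [p.name for p in pairs if keep_pair(p)]
print(kept)
```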
Centromere tRNA gene cluster analysis. To test for the enrichment of tRNA gene clusters at centromere regions, a greedy search approach was used to identify potential clusters. All tRNA genes <1000 bp apart were grouped into clusters. To test for significant clustering of tRNA genes at the centromere, the locations of tRNA genes across the genome were shuffled 1000 times. For each cluster observed in the real genome, the proportion of permutations where the same cluster was observed at least as many times was calculated to provide estimates of significance. Following conversion of these p-values to q-values to account for multiple testing, the centromere tRNA gene clusters each exhibited a q-value <0.005.
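A minimal Python sketch of the clustering and permutation steps is given below. It makes several simplifying assumptions that are not stated in the text: a single chromosome, gene positions given as single coordinates, clusters defined as ordered tuples of amino-acid labels, and shuffling implemented by permuting labels across the fixed gene positions. The function names and the example coordinates are hypothetical, and the conversion of p-values to q-values is not shown.

```python
import random
from collections import Counter


def cluster_genes(genes, max_gap=1000):
    """Greedily group genes whose neighbours are < max_gap bp apart.

    genes: list of (position, label). Returns one tuple of labels per
    cluster containing at least two genes.
    """
    clusters, current, last_pos = [], [], None
    for pos, label in sorted(genes):
        if last_pos is not None and pos - last_pos >= max_gap:
            if len(current) > 1:
                clusters.append(tuple(current))
            current = []
        current.append(label)
        last_pos = pos
    if len(current) > 1:
        clusters.append(tuple(current))
    return clusters


def cluster_p_values(genes, n_perm=1000, seed=0):
    """Fraction of permutations in which each observed cluster recurs
    at least as often as in the real data."""
    rng = random.Random(seed)
    observed = Counter(cluster_genes(genes))
    positions = [pos for pos, _ in genes]
    labels = [label for _, label in genes]
    hits = Counter()
    for _ in range(n_perm):
        rng.shuffle(labels)  # assumption: labels permuted over fixed positions
        permuted = Counter(cluster_genes(list(zip(positions, labels))))
        for cluster, count in observed.items():
            if permuted[cluster] >= count:
                hits[cluster] += 1
    return {cluster: hits[cluster] / n_perm for cluster in observed}


# Hypothetical gene coordinates and single-letter amino-acid labels.
genes = [(100, "A"), (600, "I"), (1200, "R"),
         (5000, "K"), (5400, "L"), (9000, "E")]
clusters = cluster_genes(genes)
p_values = cluster_p_values(genes, n_perm=200)
print(clusters)
```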
Hsp16 gene tree analysis. hsp16 paralogs from S. octosporus and S. cryophilus genomes were predicted using BLASTP. The predicted protein sequences from hsp16 genes across all four fission yeasts were aligned together with those from Saccharomyces cerevisiae using Clustal Omega. BEAST (Bayesian Evolutionary Analysis Sampling Trees) 64 and FigTree (http://tree.bio.ed.ac.uk/software/figtree/) were used to generate and view the hsp16 gene phylogenetic tree.
Fig. 6 Plasmids: pK-So-cnt3-2.6 kb, pK-So-cnt1-3.2 kb, pK-So-cnt2-4.7 kb, pK-So-cnt2-6.5 kb, pK-So-cnt2-10 kb, pK-Sp-cnt2-8.5 kb. b S. pombe transformants containing minichromosome plasmids were replica-plated to low-adenine non-selective plates: colonies retaining the chimeric minichromosome plasmid are white/pale pink, those that lose it are red. Representative plate showing pKp-So-cnt3-6.5 kb-containing colonies. c S. pombe cells containing pKp-So-cnt3-6.5 kb chimeric minichromosome were streaked to single colonies. Red colour indicates loss of minichromosome; small red sectors indicate low-frequency minichromosome loss and mitotic segregation function. d ChIP-qPCR for CENP-A Cnp1 on S. pombe hi-CENP-A Cnp1 cells containing chimeric minichromosomes with established centromere function. Three biologically independent transformants were analysed for each minichromosome (n = 3). ChIP enrichment on S.pom-cnt2 and S.oct-cnt-bearing minichromosomes is normalized to the level at endogenous S. pombe cnt1. Individual data points are shown as black dots. Error bars, SD. e Propagation of chimeric minichromosome stability. Cells containing pK(5.6 kb)-So-cnt2-10 kb were streaked on low-adenine-containing plates with or without thiamine, which results in repression or expression of high levels of S. pombe CENP-A Cnp1 . Source data available as a Source Data file.

5-mer frequency PCA. The CENP-A Cnp1 -associated sequences in the S. pombe, S. cryophilus and S. octosporus genomes are all ~12 kb in length. Each genome was therefore split into 12 kb sliding windows with a 4.5 kb overlap. The frequency of each 5-mer was calculated in each window using Jellyfish 65 . CENP-A Cnp1 -associated regions showed a general enrichment of AT base pairs relative to the genome as a whole. To normalize for GC content among the windows, all base pairs were randomized in each sequence window to generate 1000 artificial sequences with the same GC content. 5-mer frequencies were then recalculated for each of these 1000 artificial sequences and the true original 5-mer frequencies compared with these background frequencies by calculating a z-score. Consequently, these enrichment scores represent the k-mer enrichments in a given sequence normalized for GC content. Genome windows were split into six groups: CENP-A Cnp1 -associated sequences (CENP-A Cnp1 peaks covering >6 kb of sequence); outer-repeat heterochromatin regions (more than half the window covered by H3K9me2 peaks adjacent to CENP-A domains); subtelomeric regions (more than half the window covered by H3K9me peaks and close to the end of a chromosome); mating-type locus; neocentromere regions (identified using CENP-A Cnp1 ChIP-seq data of S. pombe neocentromere-containing strains 34 ); and remaining genome sequences. As the highly repetitive transposon-rich S. japonicus centromere regions are not fully assembled, the precise location of the centromere-kinetochore is unknown. We therefore adopted a limited PCA approach and selected the top 11 most highly enriched 12 kb regions from CENP-A Cnp1 ChIP-seq (Supplementary Data 11). These were compared with ten randomly selected non-centromeric sequences from each of the fission yeast genomes as above. Logistic regression and mean comparison were used to determine whether principal components (FactoMineR) were linked to the probability of a sequence belonging to a particular sequence group 66 .
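The windowing and GC-normalized z-score calculation can be sketched in plain Python. This is an illustration under stated assumptions, not the Jellyfish-based pipeline: the function names are hypothetical, only k-mers present in the original window are scored, the population standard deviation of the shuffled counts is used, and the PCA step itself is not shown.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def sliding_windows(length, size=12000, overlap=4500):
    """Start/end coordinates of windows of `size` bp overlapping by `overlap` bp."""
    step = size - overlap
    return [(start, start + size) for start in range(0, length - size + 1, step)]


def kmer_counts(seq, k=5):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_zscores(seq, k=5, n_shuffles=1000, seed=0):
    """Z-score of each observed k-mer count against base-shuffled sequences.

    Shuffling the bases preserves the base composition (and hence GC content)
    of the window while destroying its sequence order.
    """
    rng = random.Random(seed)
    observed = kmer_counts(seq, k)
    bases = list(seq)
    background = {kmer: [] for kmer in observed}
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        shuffled = kmer_counts("".join(bases), k)
        for kmer in observed:
            background[kmer].append(shuffled[kmer])
    scores = {}
    for kmer, count in observed.items():
        mu, sigma = mean(background[kmer]), pstdev(background[kmer])
        scores[kmer] = (count - mu) / sigma if sigma > 0 else 0.0
    return scores


# Toy sequence, for illustration only: runs of A and T are over-represented.
toy = "AAAAATTTTT" * 20
scores = kmer_zscores(toy, n_shuffles=200)
print(sorted(scores, key=scores.get, reverse=True)[:3])
```

In a full analysis, the resulting vector of z-scores for each window would form one row of the matrix passed to PCA.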
Construction of minichromosomes. S. pombe functional minichromosomes contain central domain DNA and flanking repeat DNA on one side; the long, inverted repeats found in the natural context are not tolerated by Escherichia coli 67 . S. octosporus and S. cryophilus central-core regions were amplified with primers indicated in Supplementary Table 12 and inserted as BglII-NcoI, BamHI-NcoI or BglII-SalI fragments into BglII-NcoI- or BglII-SalI-digested plasmid pK(5.6 kb)-MCS-ΔBam, which contains a 5.6 kb fragment of the S. pombe K (dg) outer repeat. To create plasmid pK-So-cnt2-10 kb, an additional 3.6 kb region from S.oct-cnt2 was inserted as a BamHI-SalI fragment into BglII-SalI-digested pK-So-cnt2-6.5 kb to make a 10 kb region of S. octosporus central core. For pKp plasmids, S. octosporus or S. cryophilus central-core regions were inserted as BglII-NcoI, SalI-BamHI or XhoI-BamHI fragments into BamHI-NcoI- or SalI-BamHI-digested plasmid pKp (pMC91), which contains a 2 kb region from the S. pombe K (dg) outer repeat. For the S. japonicus CENP-A Cnp1 -associated retrotransposon Tj7 20 (Supplementary Fig. 6), a region spanning almost the entire retrotransposon (but lacking the second LTR to avoid rearrangement or transposition problems in E. coli or S. pombe) was amplified by PCR with primers indicated in Supplementary Table 12 and cloned in two steps into the NotI-XbaI site of pK(5.6 kb)-MCS-ΔBam to make pK-Sj-Tj7-4.8 kb. Plasmids are listed in Supplementary Table 13.
Centromere establishment assay. Strains A7373 or A7408, which contain integrated nmt41-GFP-CENP-A Cnp1 to allow high-level expression of CENP-A 18 , were grown in PMG-complete medium and transformed using the sorbitol-electroporation method 68 . Cells were plated on PMG-uracil-adenine plates and incubated at 32°C for 5-10 days until medium-sized colonies had grown. Colonies were replica-plated to PMG-low-adenine (10 μg/ml) plates to determine the frequency of establishment of centromere function. These indicator plates allow minichromosome loss (red) or retention (white/pale pink) to be determined. Minichromosome retention indicates that centromere function has been established, and that minichromosomes segregate efficiently in mitosis. In the absence of centromere establishment, minichromosomes behave as episomes that are rapidly lost. Minichromosomes occasionally integrate, giving a false-positive white phenotype. To assess the frequency of such integration events and to confirm establishment of centromere segregation function, a proportion of colonies giving the white/pale pink phenotype upon replica plating were re-streaked to single colonies on low-adenine plates (sectored colonies are indicative of segregation function with low levels of minichromosome loss, whereas pure white colonies are indicative of integration into endogenous chromosomes) and the establishment frequency was adjusted accordingly.
Minichromosome stability assay. Minichromosome loss frequency was determined by half-sector assay. Briefly, transformants containing minichromosomes with established centromere function were grown in PMG-ade-ura to select for cells containing the minichromosome. At least two transformants were analysed per minichromosome. Cells were plated on low-adenine-containing plates and allowed to grow non-selectively for 4-7 days. Minichromosome loss is indicated by red sectors and retention by white sectors. To determine loss rate per division, all colonies were examined with a dissecting microscope. All colonies except pure reds were counted to give the total number of colonies. Pure reds were checked for the absence of white sectors and were excluded, because they had lost the minichromosome before plating. To determine colonies that lost the minichromosome in the first division after plating, 'half-sectored' colonies were counted. This included any colony that was 50% or greater red (including those with only a tiny white sector). Loss rate per division is calculated as the number of half-sectored colonies as a percentage of all (non-pure-red) colonies.
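The loss-rate calculation above amounts to a single ratio, sketched here in Python. The function name and the colony counts are hypothetical.

```python
def loss_rate_per_division(n_half_sectored, n_colonies, n_pure_red):
    """Half-sectored colonies as a percentage of all non-pure-red colonies.

    n_colonies is the total number of colonies on the plate, including
    pure-red colonies, which are excluded from the denominator.
    """
    scored = n_colonies - n_pure_red
    if scored <= 0:
        raise ValueError("no scorable (non-pure-red) colonies")
    return 100.0 * n_half_sectored / scored


# Hypothetical counts: 520 colonies, of which 20 pure red and 5 half-sectored.
rate = loss_rate_per_division(n_half_sectored=5, n_colonies=520, n_pure_red=20)
print(f"loss rate per division: {rate:.1f}%")
```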