Locus specific engineering of tandem DNA repeats in the genome of Saccharomyces cerevisiae using CRISPR/Cas9 and overlapping oligonucleotides

DNA repeats constitute a large part of the genomes of multicellular eukaryotes. Long considered junk DNA, their role in genome organization and tuning of gene expression is being increasingly documented. Synthetic biology has so far largely ignored DNA repeats as regulatory elements to manipulate functions in engineered genomes. The yeast Saccharomyces cerevisiae has been a workhorse of synthetic biology, owing to its genetic tractability. Here we demonstrate the ability to synthesize, in a simple manner, tandem DNA repeats of various sizes by Cas9-assisted oligonucleotide in vivo assembly in this organism. We show that long tandem DNA repeats of several kilobases can be assembled in one step for different monomer sizes and G/C contents. The combinatorial nature of the approach allows exploring a wide variety of designs for building synthetic tandem repeated DNA directly at a given locus in the Saccharomyces cerevisiae genome. This approach provides a simple way to incorporate tandem DNA repeats in synthetic genome designs to implement regulatory functions.

Tandem DNA repeats constitute a large fraction of eukaryotic genomes 1 . This fraction of eukaryotic genomes has long interested a large research community, with questions ranging from their evolutionary origin and function 2 and their epigenetic effect on gene expression 3 to their role in three-dimensional genome structure 4 . Interest in their evolutionary dynamics also stems from their role as markers for genetic footprinting 5 and in human diseases, like Huntington disease, myotonic dystrophy type 1 and several neurodegenerative diseases 6,7 . Yeast has long been a model of choice to study the biology of tandem DNA repeats. Large cloned arrays of human tandem DNA repeats inserted into the yeast genome [8][9][10] or carried by a plasmid 11 are stable across a wide range of sizes in a wild type background, allowing for identification of genetic pathways associated with repeat instability. Analyses of yeast mutant strains carrying tandem DNA repeats have allowed a better understanding of the roles of replication, DNA repair, recombination, transcription or DNA structures in the genetic stability of trinucleotide repeats 12 or human G-rich minisatellites 9,10 .
The S. cerevisiae genome contains natural tandem DNA repeats that have been studied for their potential functional role and capacity to evolve under phenotypic selection [13][14][15] . A classical example is the FLO genes locus, in which the copy number of FLO genes influences the flocculation phenotype and cell adherence to surfaces 15,16 . Another well-studied example is the copy number of rDNA genes, which can vary greatly between natural strains. Copy number variation within the rDNA locus has served as a model to understand concerted evolution of repeated DNA sequences or the role of replication stress in influencing copy number 13,17 . A measurable phenotypic effect of tandem repeat copy number variation (CNV) has also been shown for short nucleotide repeats located inside yeast promoters 18 . These studies highlight the potential of S. cerevisiae for testing phenotypic consequences of CNV of given tandem DNA repeats.
Experimental approaches to insert synthetic tandem DNA repeats in the yeast genome have been devised in the past. To generate the repeats, the main experimental approaches rely on an in vitro step, either by enzymatic ligation of monomers 19 , by polymerase chain reaction (PCR) 9 or by rolling circle amplification (RCA) 11 . Synthesis by ligation allows controlling the number of repeats to be assembled, whereas PCR and RCA allow generating larger repeats. In the case of PCR, sequence heterogeneity of the repeat monomer is often large, owing to frequent mispriming during PCR, whereas with RCA long stretches of faithfully replicated tandem repeats can be achieved. However, this technique is not well suited to engineer polymorphic repeats. To circumvent some of the limitations of the existing methods, we reasoned that a more versatile approach to generate synthetic DNA repeated arrays inside the yeast genome should be advantageous for many synthetic biology projects. Thanks to the efficiency of homologous recombination in S. cerevisiae, many techniques allowing direct DNA assembly in yeast using donor DNA from plasmids, PCR products or oligonucleotides have been proposed in the past [20][21][22] . These also include recent technologies of genome editing relying on CRISPR/Cas9 enzymes to selectively drive assembly of donor DNA at an induced double strand break. CRISPR/Cas9 allows easier scar-less genome editing, improves editing efficiency, and eases multiplex genome editing when compared to older techniques [23][24][25][26][27][28] .
In this report, we show that a simple experimental approach using CRISPR/Cas9 to assist insertion of partially overlapping oligonucleotides makes it possible to generate, in one experimental step, a diverse library of tandem DNA repeat arrays, ranging typically from 1 to about 100 repeats in size. We show that the efficiency depends on the size of the monomer and, to a lesser extent, on the G/C content of the repeats. In particular, the approach was successful for building repeated arrays from monomers ranging in size from 46 to 165 base pairs (bp). The approach should be useful for synthetic chromatin or promoter engineering, or functional genomics studies.

Results
Experimental Design. Tandem DNA repeats can be theoretically assembled from partially complementary oligonucleotides (Fig. 1). To assemble synthetic tandem DNA repeats directly at a specific locus in the yeast genome, we reasoned that we could assemble partially overlapping oligonucleotides at the site of a DNA double strand break generated in vivo by a CRISPR/Cas9 complex. We chose to target assembly of synthetic repeats into the non-essential YMR262 gene of chromosome XIII, which does not contain natural repeats (Fig. 1B). To that end, we first expressed constitutively from a centromeric plasmid an engineered version of Cas9 that previously promoted efficient genome editing in yeast 24,29 (see Methods). In a second step, this Cas9-expressing strain was co-transformed with a 2μ plasmid constitutively transcribing a guide RNA targeting codon 79 in the YMR262 gene as the Cas9 cutting site, partially complementary oligonucleotides promoting repeat assembly, and two donor DNA fragments containing regions complementary both to the genomic sequence surrounding the DNA double strand break and to the assembled repeats (Fig. 1A). The exact coordinate of the cutting site in the sacCer3 version of the S288c reference genome assembly is chrXIII:793958 (Fig. 1C). The two PCR-generated donor DNAs were homologous to 35-39 bp of the DNA repeat sequence to be assembled, and homologous to 246 bp and 100 bp of genomic DNA, respectively. The left PCR product was designed so that 361 bp of genomic DNA including the YMR262 promoter are lost upon repair, thus preventing influence of transcriptional activity on assembled repeats (see Methods, Fig. 1A and Supplementary Table s4). To test the generality of the method for generating repeats of various sizes and nucleotide compositions, we tested assembly of G/C poor (25% G/C), G/C neutral (50% G/C) and G/C rich (75% G/C) synthetic repeats of random sequence.
For each G/C content, we tested three monomer lengths of 4, 46 and 165 bp, giving 9 designs in total. The nine expected monomer sequences are given in Supplementary Table s1. Before building the repeats, we verified using a BLAST query against the S288c reference genome (sacCer3 version) that the six engineered 46 and 165 bp repeats were not already present in the S. cerevisiae genome, since this could have been a source of recombination artefacts. Monomers of this size are often found in nature in promoters, genes and non-coding satellite DNA 1 . The design of overlapping oligonucleotides used to assemble tandem DNA repeats of random sequence and of three different monomer lengths is given in Fig. 1D. Repeat monomer sequences were broken down into 40- to 60-mer oligonucleotides overlapping by 18-20 nucleotides, the last oligonucleotide overlapping with the first one, so that a repeated assembly can take place. The number of oligonucleotides used depended on the size of the monomer, two oligonucleotides being sufficient to assemble 4 bp and 46 bp repeats and four oligonucleotides to assemble 165 bp repeats. In the case of the G/C neutral 4 bp repeat, only one oligonucleotide was necessary in the design, since the ATGC 4 bp repeat is self-complementary (Supplementary Table s4).
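The tiling principle described above can be illustrated with a minimal Python sketch. It is not the procedure used to design the oligonucleotides listed in Supplementary Table s4: the function names (random_monomer, tile_monomer, as_oligos), the evenly spaced window starts, the fixed 18-nt overlap and the alternation of strands between neighbouring oligonucleotides are all assumptions made for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def random_monomer(length, gc, rng):
    """Random monomer whose G/C fraction is as close as possible to `gc`."""
    n_gc = round(length * gc)
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AT") for _ in range(length - n_gc)]
    rng.shuffle(bases)
    return "".join(bases)


def tile_monomer(monomer, n_oligos, overlap):
    """Cut a monomer into top-strand windows read circularly.

    Each window extends `overlap` nt into the next one, and the last
    window wraps around onto the start of the first (hypothetical layout).
    """
    length = len(monomer)
    starts = [round(i * length / n_oligos) for i in range(n_oligos)]
    windows = []
    for i, start in enumerate(starts):
        if i == n_oligos - 1:
            next_start = starts[0] + length  # wrap around the monomer
        else:
            next_start = starts[i + 1]
        span = next_start - start + overlap
        windows.append("".join(monomer[(start + k) % length] for k in range(span)))
    return windows


def as_oligos(windows):
    """Alternate strands so that neighbouring oligonucleotides can anneal."""
    if len(windows) % 2:
        raise ValueError("strand alternation needs an even number of windows")
    return [w if i % 2 == 0 else reverse_complement(w) for i, w in enumerate(windows)]


rng = random.Random(0)
monomer_46 = random_monomer(46, 0.50, rng)
windows_46 = tile_monomer(monomer_46, n_oligos=2, overlap=18)
oligos_46 = as_oligos(windows_46)
monomer_165 = random_monomer(165, 0.25, rng)
windows_165 = tile_monomer(monomer_165, n_oligos=4, overlap=18)
```

With these assumed parameters, the 46 bp monomer yields two 41-mers, and the 3' end of each window is identical to the 5' end of the next one, which is what allows the assembly to extend into an array of variable length.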
Characterisation of the in vivo assembled repeats. After transformation, cells were directly plated on selective media to transiently maintain plasmids expressing Cas9 and the guide RNA. In these conditions, only yeasts having repaired the double strand break by modification of the cutting site should survive 24 (Fig. 1A). In our hands, each transformation of 10^6 yeast cells yielded more than two hundred surviving clones. For comparison, transformation with the guide RNA only yielded fewer than 10 colonies on average (Supplementary Fig. s10). From each transformation experiment, 10 clones were picked and analyzed by Southern blotting. Results of two independent transformations are shown in Fig. 2. As exemplified in Fig. 2A, each transformant analyzed carried a modified YMR262 allele. Sizes of inserted DNA at the YMR262 locus varied widely depending on the type of repeat assembled. Assembled repeats up to 1 kb long were PCR amplified to allow precise size measurement after Sanger sequencing. For longer repeats, arrays were sized directly from analysis of the Southern blot. Assembly of 165 bp repeats using a design of four partially complementary oligonucleotides yielded the largest size diversity of recombinant insertions, with inserts ranging from one repeat to arrays larger than 15 kb, corresponding to an assembly of around a hundred monomers. The assembly of 46 bp repeats using a two-oligonucleotide design led mainly to assembly of a single repeat. This is not completely unexpected, given that the two donor DNAs partially overlap over about 24-26 bp by design, allowing the insertion of single repeats. However, in each case (G/C poor, neutral or rich), at least one long insertion was recovered, indicating that this approach allows synthesis of long repeats, yet at a lower efficiency than the four-oligonucleotide design. The assembly of 4 bp repeats led predominantly to short assembled repeats (5-10 repeats).
As in the case of 46 bp repeats, assembly of 5-10 repeats can be explained by direct recombination of overlapping sequences within donor DNA without insertion of overlapping oligonucleotides. However, long insertions of up to 100 repeats were recovered from the assembly of 4 bp G/C rich monomers, demonstrating that synthesis of large repeated arrays by oligonucleotide assembly is feasible in this context.
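The correspondence between insert size and repeat number used in this section is simple arithmetic, sketched below in Python. The helper names are hypothetical, and the calculation assumes that the measured size corresponds to the repeat array alone, ignoring any flanking sequence contained in the restriction fragment or PCR product.

```python
def copies_from_insert(insert_bp, monomer_bp):
    """Approximate number of monomers in an array of `insert_bp` base pairs."""
    if monomer_bp <= 0:
        raise ValueError("monomer size must be positive")
    return insert_bp / monomer_bp


def array_length_bp(copies, monomer_bp):
    """Length in bp of an array made of `copies` identical monomers."""
    return copies * monomer_bp


# A 15 kb insert of 165 bp monomers, as observed for the largest arrays.
largest_165 = copies_from_insert(15_000, 165)

# Size of a 100-copy array for each of the three monomer lengths tested.
hundred_copies = {m: array_length_bp(100, m) for m in (4, 46, 165)}
```

A 15 kb array of 165 bp monomers thus corresponds to about 91 copies, consistent with "around a hundred monomers", whereas 100 copies of a 4 bp monomer span only 400 bp and remain within the range accessible to PCR and Sanger sequencing.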

Fidelity of in vivo repeat assembly.
Repeats longer than 1 kb were out of reach for direct PCR amplification and Sanger sequencing. To check that the insertions were true repeats and verify the fidelity of the assembly process, we PCR amplified repeats shorter than 1 kb including genomic junctions. Amplified repeats were analyzed by Sanger sequencing for the 9 different designs. Sequences of isolated clones were scored for single nucleotide polymorphisms, insertions and deletions. Results are summarized in Table 1 and Supplementary Figs s1-s9. Out of a total of 262 genomic DNA-repeat junctions sequenced, we found only four substitutions, six indels and one insertion of five nucleotides, indicating that the fidelity of the insertion process is very high. Regarding the repeats, the accuracy and fidelity of tandem repeat assembly varied slightly depending on the G/C content. The least accurate assembly was observed for the G/C rich 165 bp repeats. G/C poor repeats tended to show more nucleotide substitutions and indels. These errors can be explained by misalignment of two oligonucleotides during annealing and/or recombination insertion. For the G/C rich 165 bp repeat, the main errors were large deletions of 5, 53, 64 and 115 bp, present in 5 out of 16 sequenced clones. For this particular repeat, we wondered if this high error rate was also due to misalignment of the repeat-coding oligonucleotides. As shown in Fig. 3, the deletions observed can be straightforwardly explained by spurious annealing of two oligonucleotides over a short G/C rich stretch of 7-10 complementary bases. This result suggests that short G/C rich complementary stretches of about 10 nucleotides are sufficient to drive oligonucleotide assembly, adding versatility to the oligonucleotide designs that can be implemented to drive repeat assembly with this approach.
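The kind of spurious annealing invoked above can be screened for computationally. The following Python sketch is an illustration only, not the analysis behind Fig. 3: it reports the longest stretch of one oligonucleotide that is perfectly complementary to a second one, computed as the longest common substring between the first oligonucleotide and the reverse complement of the second. The function names and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a, b):
    """Longest contiguous string shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous position pair.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def longest_annealing_stretch(oligo_a, oligo_b):
    """Longest stretch of oligo_a able to base pair perfectly with oligo_b."""
    return longest_common_substring(oligo_a, reverse_complement(oligo_b))


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Toy example: two oligonucleotides sharing an 8-nt G/C complementary core.
oligo_a = "AAAAGGCCGGCGAAAA"
oligo_b = reverse_complement("TTTTGGCCGGCGTTTT")
stretch = longest_annealing_stretch(oligo_a, oligo_b)
```

Applied to every pair of oligonucleotides in a design, such a check would flag unintended G/C rich complementary stretches of 7-10 nt before transformation. The length and G/C thresholds to use are left to the user.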

Discussion
The goal of this study was to provide a simple experimental way to build tandem DNA repeats of various sizes in the genome of S. cerevisiae. We showed that it can be achieved in one experimental step by using CRISPR/Cas9 and partially overlapping oligonucleotides. The use of CRISPR/Cas9 greatly simplifies genome editing even in genetically tractable organisms like Saccharomyces cerevisiae. Here we show that we can engineer, using CRISPR/Cas9, a combinatorial library of DNA repeat arrays of different sizes (typically from one to a hundred repeats) in a single-step experiment, for different monomer sizes and G/C contents. Interestingly, we observed that short G/C rich stretches of 7-10 nucleotides are sufficient for oligonucleotide annealing. In our experience, short G/C rich stretches homologous between oligonucleotides introduced sequence variability within the assembled repeats. This suggests that overlapping sequences between oligonucleotides used to drive repeat assembly could be shortened from 20 nucleotides to stretches of 10 G/C nucleotides. The approach was not very efficient for building arrays of tetranucleotide repeats that were diverse in size. Therefore, alternative oligonucleotide designs could be explored for microsatellite design, for example by reducing the size of the region complementary to the assembled repeats within the donor DNA, to minimize direct repair by donor DNA without oligonucleotide insertion.
In this study, we only sequenced shorter repeats amenable to PCR amplification and Sanger sequencing, but a different strategy would have to be implemented for the sequencing of longer repeats. Long read sequencing technologies like nanopore sequencing or SMRT sequencing can today circumvent issues posed by sequencing of long tandem DNA repeats [30][31][32] .
Table 1
Repeats         4P    4N    4R    46P   46N   46R   165P  165N  165R
#clones         19    15    20    17    16    16    11    15    16
#sequenced nt   728   280   1072  2252  1445  734   4279  8033  2802
substitutions   0     1     0     0     0     1     3     19    2
indels          0     0     0     10    1     0     3

In conclusion, beyond building homologous tandem DNA repeats, the approach should allow engineering combinatorial libraries of heterogeneous repeats that can be selected through an appropriate phenotypic screening of the recombinant cells (Fig. 4). Drawing examples from nature, tandem DNA repeats could be used for example to build spacer DNA between genetic elements 33 or for bottom-up engineering of regulatory regions, like promoters and enhancers 34,35 . Indeed, the approach presented here provides an alternative to combinatorial design of promoters 34 , e.g. with the introduction of transcription factor binding sequences in non-overlapping oligonucleotide regions. Since repeats are assembled statistically from the oligonucleotide mix provided during transformation, providing alternative oligonucleotide sequences located in non-overlapping regions of the repeat should be a straightforward way to increase the diversity of the library assembled into the repeated array. For example, the oligonucleotide designs proposed here for 46 bp and 165 bp monomers allow respectively 9 and twice 22 non-overlapping nucleotides where diverse binding sites can be incorporated. Notably, this approach should also be an interesting addition to the techniques developed by the Sc2.0 project (www.syntheticyeast.org) for the synthesis of a yeast synthetic genome, allowing introduction of repeated sequences in the synthetically assembled building blocks. Our technique therefore obviates the need to design repeats in the synthetic "chunks" 36 , as they can be assembled directly in vivo following sequence replacement. The only limitation of our technique is that it does not allow the exact number of repeats added to be predetermined.
However, given the large number of repeats that can be added by this approach, it might still be cost effective to select from a large library of engineered repeats as opposed to fully constructing large repeats in vitro. Finally, the approach would be easily testable in systems other than S. cerevisiae, depending on the ability of the host system to favor homologous recombination over non-homologous end joining to repair DNA double strand breaks.

Methods
[...] (Fig. 1B). The YPH499-Cas9 strain was created by transforming plasmid pAL30 into YPH499 by the LiAc protocol, to form the ALY0 strain 39 . In vivo synthesis of tandem DNA repeats at the YMR262 locus was achieved by transformation of yeast spheroplasts prepared according to the procedure established by the Larionov lab 40 with minor modifications. Briefly, the Cas9-expressing ALY0 strain was grown in SCD-His to an OD600 of 0.6-0.8. 10^7 spheroplasts were transformed with 1 μg of plasmid pAL31, 100 pmol of each repeat-forming oligonucleotide (AL-O-01 to 23, Supplementary Table s4), and 10 pmol of both donor DNAs. After transformation, cells were plated on SCD-His-Ura and incubated 48 to 72 hours at 30 °C to allow growth of surviving clones containing repeat insertions at the site of CRISPR/Cas9 cutting. Growing clones were first reisolated on SCD-His-Ura before screening.
Screening of edited clones by Southern blotting and Sanger sequencing. Screening was done by purification of genomic DNA from yeast transformants followed by Southern blotting. Genomic DNAs from twenty clones of each repeat design were digested with BamHI and DraI, which cut on each side of the insertion locus (Fig. 1B). For the Southern blots, we used a 1 kb long 32P-radiolabeled probe synthesized by PCR with primers AL-O-50 and 51 (Fig. 1B). [...] Whenever PCR products of multiple sizes were generated during the PCR reaction, the main band was purified before Sanger sequencing. This did not happen systematically, however, and most of the PCR reactions yielded products of unique size in our hands. Sanger sequencing was performed by GATC Biotech. Sequences are available in Supplementary Figs s1-s9.