Identification of a Xist silencing domain by Tiling CRISPR

Despite essential roles played by long noncoding RNAs (lncRNAs) in development and disease, methods to determine lncRNA cis-elements are lacking. Here, we developed a screening method named “Tiling CRISPR” to identify lncRNA functional domains. Using this approach, we identified Xist A-Repeats as the silencing domain, an observation in agreement with published work, suggesting Tiling CRISPR feasibility. Mechanistic analysis suggested a novel function for Xist A-repeats in promoting Xist transcription. Overall, our method allows mapping of lncRNA functional domains in an unbiased and potentially high-throughput manner to facilitate the understanding of lncRNA functions.

Dashed horizontal lines represent a FC level of 1.5. In panels 1-4, 197 sgRNAs that are significantly enriched with maxFC ≥ 1.5 and RSA p < 0.5 in D18 dox+ samples are colored in red and the rest in blue. Among them, 3 sgRNAs are also enriched in D18 dox− samples (highlighted in red in panels 5-6). The 7th panel shows neighborhood Log10 P for each sgRNA within a sliding window. These values were used to identify an sgRNA-enriched cluster (see Methods). The bottom schematic shows a ~2.4 kb Xist region corresponding to an enriched sgRNA cluster, as determined by neighborhood Log10 P. Red triangles represent 14 individual sgRNAs used for validation and downstream analysis. (d) Upon being transduced with an indicated sgRNA, cas9-cl36 cells were split into two groups, with one group treated with doxycycline (dox+) and the other with DMSO (dox−). After 7 days of continued culture in puromycin, the ratio of the percentage of RFP+ cells in dox+ vs dox− samples was calculated.
We reasoned that similar phenotypes would arise from mutations generated by adjacent or overlapping sgRNAs; thus, a "sgRNA cluster" would likely correspond to a true functional domain. Applying "sliding window" analysis with a window size of 30-300 bp, we identified one sgRNA cluster corresponding to the 15 to 2446 bp candidate region at the Xist transgene 5′-end (Fig. 1c, bottom 2 panels). Among 295 sgRNAs derived from this region, 167 were enriched, representing a hit rate of 56.6%, which is significantly higher than the overall hit rate of 12.9% when considering the entire transgene (p-value = 1.06e-107). We then randomly picked 14 enriched sgRNAs located across the candidate region for validation (Fig. 1c, bottom panel, red triangles). Upon transduction of individual sgRNAs, we determined sgRNA enrichment by measuring ratios of RFP+ cells between dox+ and dox− cells after 7 days of culture in puromycin. In comparison to scrambled sgRNA controls, which displayed ratios < 1, candidate sgRNAs displayed ratios ranging from 2.5 to 13.8 (Fig. 1d), confirming their enrichment. Overall, results derived from Tiling CRISPR suggest that a region at the Xist 5′ end is responsible for silencing function. Indeed, this region contains several conserved repeats that reportedly regulate Xist activity [18][19][20][21].
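The hit-rate comparison above can be checked with a short calculation. The sketch below recomputes both hit rates from the counts given in the text (167 of 295 sgRNAs in the candidate region; 197 of 1527 overall, taking the 197 enriched sgRNAs from the figure legend and the 1527 retained sgRNAs from the Methods) and evaluates a one-sided hypergeometric tail in log space. The text does not state which test produced the reported p-value, so the hypergeometric test is an assumption, and the function names are illustrative.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_log10_tail(k_obs, population, successes, draws):
    """log10 of P(X >= k_obs) for a hypergeometric X (assumed test)."""
    upper = min(successes, draws)
    logs = [
        log_comb(successes, k)
        + log_comb(population - successes, draws - k)
        - log_comb(population, draws)
        for k in range(k_obs, upper + 1)
    ]
    # Log-sum-exp keeps the very small tail probability from underflowing.
    m = max(logs)
    return (m + math.log(sum(math.exp(x - m) for x in logs))) / math.log(10)


# Counts taken from the text.
TOTAL_SGRNAS = 1527      # sgRNAs retained in the library
TOTAL_ENRICHED = 197     # enriched sgRNAs across the whole transgene
REGION_SGRNAS = 295      # sgRNAs derived from the 15-2446 bp region
REGION_ENRICHED = 167    # enriched sgRNAs within that region

region_rate = REGION_ENRICHED / REGION_SGRNAS
overall_rate = TOTAL_ENRICHED / TOTAL_SGRNAS
log10_p = hypergeom_log10_tail(
    REGION_ENRICHED, TOTAL_SGRNAS, TOTAL_ENRICHED, REGION_SGRNAS
)
print(f"region hit rate:  {region_rate:.1%}")
print(f"overall hit rate: {overall_rate:.1%}")
print(f"log10(p), hypergeometric tail: {log10_p:.1f}")
```

Working in log space matters here because the tail probability is far below the smallest value a double-precision float can comfortably accumulate term by term.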

PacBio-seq suggests A-repeats within the 2.4 Kb region as the silencing domain.
To narrow down silencing sequences within this region, we analyzed InDels generated by each of the 14 validated sgRNAs. We first distinguished internal from promoter InDels, as either would interfere with Xist function, while only internal InDels are informative for domain analysis. We compared levels of a ~100 bp amplicon covering the Xist transgene transcription start site (TSS) in transduced vs. parental cas9-cl36 cells using qPCR analysis (Supplementary Table 2). Relative to the parental cas9-cl36 control, 36-58% of cells infected with Xist-derived sgRNAs displayed intact promoters (Fig. 2a), suggesting that roughly half of the InDels are internal and do not perturb promoter function. The CRISPR-cas9 system generates both large and small InDels. To assess which types are likely responsible for loss of Xist silencing function, we derived cas9-cl36 clones infected with sgRNA Xist325, which targets a 20 bp region 1213 bp downstream of the Xist transgene TSS and outside the conserved repeats. Among 7 clones generated, 3 displayed 1046 bp deletions and the rest showed 2-13 bp deletions (Fig. 2b). For each clone, we calculated the proportion of surviving cells in dox+ relative to dox− cultures after 4 days of culture in puromycin. Proportions of 22-52% were detected in clones containing >1 kb deletions (Fig. 2b), compared with <5% in clones with small InDels. Therefore, we focused further analysis on large InDel detection.
A 6138 bp region at the Xist 5′-end was PCR-amplified using genomic DNA extracted from RFP+ cas9-cl36 cells that had been transduced with one of the 14 sgRNAs and cultured in dox plus puromycin for 7 days (Fig. 2c). To exclude InDels from the endogenous Xist gene, we used an upstream primer located at the Xist transgene promoter plus a downstream internal primer (Supplementary Table 2). Analysis of PCR products suggested the presence of large deletions in all samples (Fig. 2c). We then used PacBio long-read sequencing to identify and align deletions. While each sample displayed unique deletion patterns, a "common" deletion from 553 to 718 bp was detected in 13 of 14 samples, but not in the scrambled control (Fig. 2d), suggesting that this region, which is located within the Xist A-repeats, is essential for silencing function.
A-repeats are required for Xist transactivation. To assess this potential function, we derived clones in which the entire A-repeat region (367-730 bp) had been deleted using CRISPR-cas9-mediated homology-directed repair (HDR) (Fig. 3a, upper panel, and Supplementary Table 2; clones are designated RepA del). Following clone screening, we randomly picked 3 RepA del clones for analysis (Fig. 3a, lower panel, and Sanger sequencing). We first assessed cell survival upon Xist induction in puromycin. Unlike the scrambled control, which displayed ~5% cell survival, 30-35% of cells from RepA del clones survived (Fig. 3b), confirming that A-repeats function in puro r silencing. To assess underlying mechanisms, we evaluated Xist levels by RNA FISH and RT-qPCR following induction. Both methods revealed a significant lack of Xist RNA in RepA del clones (Fig. 3c,d). After excluding the possibility of promoter deletion in all clones (Fig. 3e), we reasoned that loss of the A-repeats may either inhibit Xist transcription or promote Xist decay. To determine which, we evaluated Xist half-life by actinomycin D treatment and detected no changes in RepA del clones vs controls (Fig. 3f). We then assessed nascent Xist RNA synthesis by bromouridine (BrU) labeling and immunoprecipitation of labeled RNA. Relative to scrambled controls, RepA del clones displayed a >67% decrease in levels of nascent Xist RNA (Fig. 3g), while no change was detected in nascent Gapdh mRNA, suggesting that loss of A-repeats downregulates Xist transcription. Overall, these experiments suggest that the RepA region is required for Xist transactivation.

Discussion
Although lncRNAs have been extensively analyzed, tools useful to assess their function are limited and mostly borrowed from methods initially devised to define mRNA activity. For example, lncRNA loss-of-function studies have been based on use of RNA interference to degrade lncRNA 24,25 or on CRISPR-Cas9, either to repress lncRNA expression through promoter manipulation 26 or to generate large deletions of the lncRNA gene loci 27 . These technologies have been effective in identifying biologically relevant lncRNAs, but it has remained difficult to push lncRNA functional analysis forward. Currently, lncRNA functional domains are often predicted based on RNA sequence conservation. However, it is well-established that functionally conserved lncRNAs show poor sequence conservation 28,29 , greatly limiting the utility of these approaches. In contrast, the technology we developed, Tiling CRISPR, directly identifies lncRNA functional sequences, whether they are highly or poorly conserved, in an unbiased manner. In addition, if multiple lncRNAs function in the same molecular pathway, it is possible to screen domains of multiple lncRNAs at the same time; thus, we envision that Tiling CRISPR is also amenable to high-throughput screening.
In this proof-of-concept study, using Xist lncRNA as a model, we demonstrated the feasibility of Tiling CRISPR. Our design of tiled sgRNAs was based on several considerations: 1) Like shRNAs, not all sgRNAs are effective in generating InDels. Since the only requirement for sgRNA design is that target sites are immediately followed by a Protospacer Adjacent Motif (PAM, 5′-NGG-3′), by chance a PAM sequence would occur about once every 8 nucleotides on either the forward or reverse strand and could be targeted by an sgRNA. Such high coverage greatly increases the efficiency of mutation generation. 2) Off-target effects of individual sgRNAs are well documented [30][31][32][33] . When applying Tiling CRISPR, we observed that functional sgRNAs form clusters, i.e. multiple functional sgRNAs are enriched at certain loci (Fig. 1c). Cluster formation suggests that mutations associated with neighboring sgRNAs give rise to similar phenotypes, greatly reducing concerns relevant to sgRNA off-target effects. We also observed that, unlike traditional CRISPR studies in which InDels are predominantly small, our screen produced large deletions (Fig. 2c,d). Indeed, use of the Non-Homologous End Joining (NHEJ) CRISPR system, in which Cas9 and a single sgRNA are introduced into cells without a donor sequence to direct homology-directed repair, reportedly generates a spectrum of genomic InDels, with the largest deletion up to 6 Kb 34 . Since small InDels are unlikely to abolish lncRNA function, we envision that Tiling CRISPR will primarily detect large deletions, as evidenced by Xist analysis.
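The estimate of roughly one PAM per 8 nucleotides can be reproduced with a back-of-the-envelope calculation. It assumes, as an idealization not stated in the text, that the four bases occur independently and with equal frequency. A forward-strand PAM (5′-NGG-3′) requires GG at a given dinucleotide, whereas a reverse-strand PAM appears as CC on the forward strand; a dinucleotide cannot be both, so the two probabilities add:

```latex
\begin{align*}
P(\mathrm{GG}) &= \tfrac{1}{4}\cdot\tfrac{1}{4} = \tfrac{1}{16}, \qquad
P(\mathrm{CC}) = \tfrac{1}{4}\cdot\tfrac{1}{4} = \tfrac{1}{16},\\
P(\text{PAM on either strand}) &= P(\mathrm{GG}) + P(\mathrm{CC}) = \tfrac{1}{8},\\
\text{expected spacing} &= \frac{1}{P(\text{PAM on either strand})} = 8\ \text{nt}.
\end{align*}
```

Real sequences deviate from this idealized base composition; the library described in the Methods has an average separation of 12.7 bp between sgRNAs.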
Using Tiling CRISPR, we successfully identified the known Xist silencing domain, A-repeats. A-repeats reportedly mediate silencing through multiple mechanisms: interacting with silencing factors 2,35 , recruiting genes into a Xist-mediated silencing compartment 36 , regulating Xist spreading 37,38 , or regulating Xist splicing 39 . However, our findings suggest a novel function whereby A-repeats positively regulate Xist transcription. In agreement, a genetic study has reported lack of Xist RNA and failure of X-inactivation when deleting A-repeats in mouse female embryos 40 . In addition to A-repeats, two other repetitive sequences, the F and C repeats located at the Xist 5′-end downstream of the A-repeats, reportedly regulate Xist spreading [41][42][43] . We did not detect these regions using Tiling CRISPR, possibly because their loss has relatively subtle effects on Xist-mediated gene silencing compared to A-repeat deletion. This idea is supported by a previous study showing that deletion of Repeats F or C alone did not alter Xist-mediated puro r silencing 21 .
Overall, we conclude that Tiling CRISPR provides a new tool to map lncRNA functional domains in an unbiased and potentially high-throughput manner. Domain identification will advance lncRNA research by enabling in-depth mechanistic analysis of lncRNA activity and will enable development of RNA-based therapeutics, such as oligonucleotides, useful to effectively target lncRNAs and block their activity in disease.
Tiling single guide RNA (sgRNA) design and cloning. The Xist sequence was adopted from RefSeq entry NR_001463.3 with chromosome range chrX:103460373-103483233 on GRCm38. sgRNAs were designed following these rules: (i) they are 20 bp long; (ii) target sites are immediately followed by a 5′-NGG PAM (Protospacer Adjacent Motif), a motif required for Cas9 endonuclease activity [44][45][46] ; and (iii) sgRNAs originate from both forward and reverse strands of Xist cDNA. A total of 1660 unique sgRNA sequences were identified from both strands with an average separation of 12.7 bp. After removing sgRNAs that match multiple locations on the mouse genome, 1527 sgRNAs were retained. The Rule Set 2 47 on-target scores for the sgRNAs were 0.48 ± 0.13. Retained sgRNAs were synthesized by CustomArray Inc., amplified by PCR, and cloned into the BbsI restriction sites of the lentiviral U6 sgRNA expression vector (lentiGuide-RFP).
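Design rules (i)-(iii) above can be illustrated with a short sequence scan. The following is a minimal sketch, not the authors' pipeline: it enumerates every 20-nt site followed by an NGG PAM on both strands of an input sequence. The function names are hypothetical, and the subsequent filtering of sgRNAs that match multiple genomic locations and the Rule Set 2 scoring are not shown.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sgrnas(seq, length=20):
    """List (strand, start, protospacer) for each site followed by NGG.

    `start` is the 0-based forward-strand coordinate of the leftmost base
    covered by the protospacer, whichever strand it lies on.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Need `length` bases plus a 3-nt PAM (N, G, G) after position i.
        for i in range(len(s) - length - 2):
            if s[i + length + 1 : i + length + 3] == "GG":
                start = i if strand == "+" else len(s) - (i + length)
                hits.append((strand, start, s[i : i + length]))
    return hits


def unique_protospacers(hits):
    """Distinct protospacer sequences, in order of first appearance."""
    return list(dict.fromkeys(p for _, _, p in hits))
```

For example, `find_sgrnas("A" * 20 + "TGG")` yields one forward-strand site starting at position 0, while the reverse complement of that input yields the same protospacer on the minus strand.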
Scientific Reports | (2019) 9:2408 | https://doi.org/10.1038/s41598-018-36750-0
CRISPR library screen. Cl36-Cas9 cells were transduced with the lentiviral sgRNA pool at a low multiplicity of infection (MOI = 0.2 or 0.5) and a representation of 700 cells per sgRNA. 2 × 10^6 (MOI = 0.5) or 5 × 10^6 (MOI = 0.2) Cl36-Cas9 cells were seeded in 15-cm 0.2% gelatin-coated dishes at a density of 1 × 10^6 cells/dish in ESC medium containing 10 μg/μl polybrene and lentivirus. The cell culture medium was changed after overnight incubation. Four days after transduction, cells were divided into 3 groups: cells from the reference group were harvested, and the sample and control groups were cultured in puromycin-containing medium with or without 1 μg/ml doxycycline (Sigma), respectively. Surviving cells were collected after 14 days of culture.
sgRNA cluster detection. Centered at each position, the distribution of FC values formed by its nearby sgRNAs within the ±n-bp window was compared to the value 1 using a one-sample t-test, where n ranges from 15 to 150 in increments of 1 to scan for the optimal window size giving the lowest p-value (P_Ttest). To correct for multiple testing, all FC values were randomly shuffled and the whole search process was repeated 1000 times to simulate the null distribution. The permutation test assigned each position a new P_perm, defined as the number of simulations with p ≤ P_Ttest divided by 1000. P_perm was further smoothed by expanding P_perm to positions within the same optimal window; the minimum P_perm at each position was then defined as its P_smooth. All p-values were calculated independently for each of the 4 D18 dox+ related FCs (D18 dox+/D18 dox− at MOI 0.2 or 0.5, or D18 dox+/D4 at MOI 0.2 or 0.5), which results in 4 sets of p-values per position. At each position, the least significant P_smooth across all FCs was taken as the neighborhood P-value (Fig. 1c).
An sgRNA cluster is defined as a region containing sgRNAs that display neighborhood P ≤ 0.01.
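The window scan and permutation step can be sketched in simplified form. This is not the authors' implementation, and it departs from the description in several labeled ways: it ranks windows by the one-sided one-sample t statistic against 1 rather than by the t-test p-value, it uses sgRNA positions as window centers, it counts permutations per position, and it omits the smoothing step and the combination across the 4 FC sets. All function and parameter names are hypothetical.

```python
import math
import random
import statistics


def t_score(values, mu=1.0):
    """One-sided one-sample t statistic of `values` against `mu`.

    Larger means stronger enrichment; -inf when fewer than 2 values.
    """
    if len(values) < 2:
        return -math.inf
    diff = statistics.fmean(values) - mu
    sd = statistics.stdev(values)
    if sd == 0.0:
        # Zero variance: only the sign of the mean difference is informative.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / (sd / math.sqrt(len(values)))


def best_scores(windows, fcs):
    """For each center, the best score over all candidate window sizes."""
    return [
        max(t_score([fcs[i] for i in idx]) for idx in per_center)
        for per_center in windows
    ]


def permutation_p(positions, fcs, half_widths=range(15, 151),
                  n_perm=1000, seed=0):
    """Per-position permutation p-value (simplified, unsmoothed)."""
    # Precompute which sgRNAs fall in each +/- n bp window per center.
    windows = [
        [[i for i, p in enumerate(positions) if abs(p - c) <= n]
         for n in half_widths]
        for c in positions
    ]
    observed = best_scores(windows, fcs)
    rng = random.Random(seed)
    shuffled = list(fcs)
    hits = [0] * len(positions)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # simulate the null by shuffling FC values
        for k, score in enumerate(best_scores(windows, shuffled)):
            if score >= observed[k]:
                hits[k] += 1
    return [h / n_perm for h in hits]
```

Positions whose value falls at or below 0.01 would then be grouped into a cluster, following the definition above. The default settings are slow in pure Python for a library of this size, so a vectorized implementation would be preferable in practice.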
Individual sgRNA validation. 14   in pUC19 and 2 lentiGuide-RFP constructs containing a pair of sgRNA flanking the target region. 72 hours after transfection, RFP positive cells were sorted (BD FACSAria II) and seeded into 96-well plates at 1 cell per well. Individual clones were expanded and genotyped by PCR and Sanger sequencing.
Cell viability assay. Cells were seeded into 96-well plates at 5,000 cells per well and cultured in puromycin-containing medium with or without 1 μg/mL doxycycline for 4 days. Cell viability was determined with the CellTiter-Glo® Luminescent Cell Viability Assay (Promega) using a CLARIOstar® microplate reader (BMG LABTECH).
Xist RNA FISH. FISH experiments were carried out as previously described 2 . Xist expression was induced in puromycin-free ESC medium containing 1 μg/ml doxycycline for 48 hours, and ES cells were dissociated and collected by cytospin. The slides were treated with CSK containing 0.5% Triton prior to paraformaldehyde fixation. The Xist pSx9-3 probe was labeled with Cy3-dUTP by nick-translation (Roche).
Assessment of Xist RNA stability. Xist expression was induced in the puromycin free ESC medium containing 1 μg/ml doxycycline for 24 hours. To assess RNA stability, actinomycin D (Sigma) at 5 μg/ml was added to cell culture and after 0, 3 or 6 hrs of incubation, cells were collected and RNAs were isolated for RT-qPCR.
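A half-life can be derived from such a time course by fitting first-order decay. The sketch below is one common approach and is not stated in the text: it assumes single-exponential decay and fits ln(level) against time by ordinary least squares. The function name is hypothetical, and the values in the usage note are synthetic, not measured data.

```python
import math


def half_life_hours(times_h, rel_levels):
    """Fit ln(level) = a - k * t by least squares; return ln(2) / k.

    Returns math.inf if the fitted slope is not negative (no decay).
    """
    ys = [math.log(v) for v in rel_levels]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    slope = sxy / sxx
    if slope >= 0:
        return math.inf
    return -math.log(2) / slope
```

With synthetic relative RNA levels of 1.0, 0.5 and 0.25 at the 0, 3 and 6 hr time points, `half_life_hours([0, 3, 6], [1.0, 0.5, 0.25])` returns a half-life of 3 hours.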
Assessment of Xist RNA synthesis. The experimental procedure was adapted from a previous publication 51 . Briefly, 2 mM bromouridine (BrU, Sigma) was added to the medium and cells were incubated with BrU at 37 °C for 30 min. RNAs were isolated and BrU-containing RNAs were pulled down for RT-qPCR using an anti-BrU antibody (BD Biosciences, Cat. # 555627) and Protein A/G beads (Thermo Fisher Scientific, Cat. # 88802).

Life Sciences Reporting Summary. Further information on experimental design and reagents is available in the Life Sciences Reporting Summary.

Data Availability
All high-throughput sequencing data were deposited in the Sequence Read Archive (SRA) under BioProject ID PRJNA507802. The remaining data that support the findings of this study are available from the corresponding author upon reasonable request.