Optimised insert design for improved single-molecule imaging and quantification through CRISPR-Cas9 mediated knock-in

The use of CRISPR-Cas9 genome editing to introduce endogenously expressed tags has the potential to address a number of the classical limitations of single molecule localisation microscopy. In this work we present the first systematic comparison of inserts introduced through CRISPR knock-in, with the aim of optimising this approach for single molecule imaging. We show that more highly monomeric and codon optimised variants of mEos result in improved expression at the TubA1B locus, despite the use of identical guides, homology templates, and selection strategies. We apply this approach to target the G protein-coupled receptor (GPCR) CXCR4 and show a further insert dependent effect on expression and protein function. Finally, we show that compared to over-expressed CXCR4, endogenously labelled samples allow for accurate single molecule quantification on ligand treatment. This suggests that despite the complications evident in CRISPR mediated labelling, the development of CRISPR-PALM has substantial quantitative benefits.

Super-resolution imaging techniques offer unique insights into cellular structures and organisation on a nanoscale inaccessible through conventional light microscopy. Of the many super-resolution methods currently available, single molecule localisation microscopy (SMLM) is unique in that it not only offers spatial resolution in the tens of nanometers [1][2][3][4][5][6], but also a means of quantifying single emitters for insights into protein stoichiometry and organisation [2][3][4][7].
PALM (Photo Activated Localisation Microscopy) and STORM (Stochastic Optical Reconstruction Microscopy) are archetypal SMLM methods which are based on the stochastic activation of subsets of molecules within a sample to allow for precise localisation and rendering post-acquisition [2][3]. The key difference between these two approaches lies in the nature of the reporters used to generate 'blinking' events which result from the stochastic activation of fluorophores. While in STORM paired dyes or organic fluorophores capable of switching within an appropriate redox buffer (dSTORM) are used, PALM relies on photoswitching fluorescent proteins (e.g. mEos, Dendra). Since the early inception of these techniques, a number of variations have been developed which aim to address the shortcomings of previous iterations and improve on both spatial resolution and the quantitative capability of SMLM (including iPALM, PAINT, and MINFLUX) [8][9][10].
SMLM techniques are potentially potent quantitative tools; however, single molecule imaging and quantification have significant caveats which are critical for accurate image acquisition, reconstruction, and quantification. First and foremost is the correct identification of single molecules, which is often confounded by repeated cycling between on and off states, resulting in multiple detections from a single emitter [11][12][13]. This is a significant limitation in single molecule quantification, and a considerable focus of the field is accurately detecting individual emitters and modelling fluorophore behaviour. A significant advance in the field would be the development of systems with a known labelling stoichiometry and known photoswitching kinetics. While PALM addresses both of these to an extent by introducing photoswitchable tags with a known 1:1 labelling ratio, the variability and artefacts introduced by over-expression are one of a number of problems inherent to this approach [14]. The endogenous expression of labels can offer a solution to this particular quantitative caveat of SMLM, and offers a number of other distinctive benefits. Cells expressing a genomically encoded tag will provide a large sample with defined expression profiles and label distributions, thus minimising sample to sample variation, which is a major concern in nanoscale measurements [15][16].
This approach has been used very successfully in prokaryotic cells, with a number of excellent studies highlighting the utility of endogenous labels in applying SMLM to interrogate nanoscale cellular architecture [17][18][19]. The application of this approach in mammalian cells has been limited in the past due to the poor efficiency and high costs associated with mammalian genome editing. With the advent of CRISPR-Cas9 genome editing, however, the insertion of labels into target endogenous loci has become substantially more convenient [20][21][22]. To date a handful of studies have applied this approach in live and fixed cell super-resolution microscopy, and of these only two studies use the integration of photoswitchable tags for CRISPR mediated SMLM [15][16][23][24]. Cho et al. do not report on the effect of knock-in on expression level, or the resulting quality of their single molecule imaging. Hansen et al. effectively used their HaloTag knock-ins and, while the data is not shown, report the generation of homozygous knock-ins with no effect of insertion on the target's expression level compared to wild type protein [23][24].
In our previous work, we reported a tag dependent expression of CRISPR edited TubA1B in multiple cell lines. While mEGFP is sufficiently expressed in Hek293T, A549, and Hel 92.1.7 to allow for diffraction limited imaging, mEos 3.2 tagged cells in all three samples show significantly reduced expression when compared to mEGFP knock-in. Of the three cell lines tested, only Hel 92.1.7 cells express mEos 3.2 tagged TubA1B at a high enough level to generate images comparable to dSTORM, according to quantitative measurements of resolution. As only heterozygous knock-ins were found, we reasoned that the expression of tagged tubulin was regulated to maintain cell function and morphology, and that where the intrinsic qualities of fluorophores have a more dramatic effect on protein function, the tagged allele would be subsequently down-regulated. As mEGFP is a highly monomeric, codon optimised fluorophore, its expression is better tolerated than mEos 3.2.
In this work we evaluate the effect of optimising a fluorescent protein on the expression of CRISPR-Cas9 generated knock-in cells (Fig. 1). We test mEos variants with improved monomericity and codon optimisation, and show a substantial improvement on knock-in to the TubA1B locus. To determine whether this approach is suitable for receptor quantification, we apply this approach to target CXCR4, a GPCR with significant therapeutic value, to evaluate the effect of knock-ins on receptor distribution and function. Finally, we explore the quantitative potential of CRISPR knock-in when compared to over-expression. We report a tag and gene dependent effect of insertion with significant implications for both single molecule imaging and the wider field of CRISPR mediated HDR (homology directed repair). To our knowledge, the behaviours of different tags in identical knock-ins have not been reported or systematically studied.
Similarly we show that compared to over-expression of CXCR4, CRISPR knock-in labelling of this receptor allows for accurate single molecule clustering analysis. This is a significant advance in the field of imaging receptor behaviours on the nanoscale, suggesting that despite the complications inherent to knock-in design, CRISPR-PALM is a powerful quantitative single molecule tool.

Results
Monomeric and codon optimised variants increase endogenous expression at the TubA1B locus for improved single molecule imaging. We reasoned that the variation in CRISPR labelled TubA1B observed in previous work was due to fundamental differences in fluorophore properties which could lead to down-regulation or degradation. mEGFP is a highly monomeric, codon optimised fluorescent protein while mEos 3.2 has been reported to oligomerise [15][25]. To test this hypothesis, HDR donor templates for TubA1B were generated carrying a codon optimised mEos 3.2 and both the original and a codon optimised version of mEos 4b (mEos4b CO) (Fig. S1) reported by Paez-Segala et al. [26]. These donors were co-transfected with a TubA1B targeting guide and Cas9 expressing plasmid into Hel 92.1.7, a cell line which we previously demonstrated expresses high levels of TubA1B but still exhibits significantly less mEos 3.2 tagged TubA1B when compared to endogenously labelled mEGFP at the same locus [15].
As in our previous work, cells were single cell sorted to select for the most fluorescent clones (and therefore cells which best express a specific label) (Fig. S2A). Individual clones were expanded and further validated by fluorescence-activated cell sorting (FACS) and immunofluorescence to ensure adequate expression before validation by PCR (Fig. S2B). As previously reported by ourselves and Roberts et al., no homozygous expression of tagged TubA1B is observed (loss of the genomic wild type band and wild type protein). Interestingly, however, markedly different expression profiles are evidenced on western blotting of validated clones (Fig. 2A). In wild type cells one band at the predicted molecular weight of tubulin is observed (approx. 50 kDa), while in each of our CRISPR knock-ins a heavier band consistent with the fusion protein is observed and quantified (approx. 80 kDa) (Fig. 2A). When compared to our previously reported mEos 3.2 clones, no significant difference in expression is observed in clones carrying a codon optimised variant of the original mEos 3.2 (Fig. 2B). mEos 4b shows a significant decrease in expression (p < 0.0001); however, the codon optimised version of this sequence demonstrates a significant increase in expression when compared to the mEos 3.2 clone (p = 0.0005, 0.0244, and 0.0022 respectively) (Fig. 2B). This western blot data was further supported by quantitative real-time PCR (qRT-PCR) amplifying the edited gene using tag specific forward primers, and a reverse primer anchored to the TUBA1B sequence (Fig. S4).
The highest expressing mEos 4b CO clone was taken forward to evaluate the effect of improved label expression on the resulting single molecule imaging. PALM imaging of clone C4 results in high quality single molecule images which demonstrate a significantly increased number of localisations compared to mEos 3.2 (p = 0.0230) (Figs 2C,D and S3). No significant differences in other measures of resolution, namely Fourier Ring Correlation and localisation precision (XY uncertainty), were observed (Fig. 2E,F). An improvement in RSP (the resolution scaled Pearson correlation coefficient, a global measure reported by the error mapping software SQUIRREL) was observed across inserts in Hel 92.1.7 cells (RSP = 0.728 and 0.844 for mEos 3.2 and mEos 4b CO respectively) (Fig. S5) [27].
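To make the RSP metric concrete: as we understand it, SQUIRREL reports a Pearson correlation between a diffraction limited reference image and the super-resolution image after the latter has been rescaled to the reference resolution. The minimal sketch below illustrates only the correlation step on two equally sized lists of pixel intensities. It is not SQUIRREL's implementation, the rescaling step is omitted, and the function name `pearson` and the toy pixel values are assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("inputs must have the same length (at least 2 pixels)")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Covariance and variances are left unnormalised; the 1/n factors cancel.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant image")
    return cov / sqrt(var_a * var_b)


# Toy example with hypothetical pixel values (flattened 2 x 3 images).
reference = [10, 40, 80, 20, 60, 90]
rescaled_sr = [12, 35, 85, 25, 55, 95]
print(round(pearson(reference, rescaled_sr), 3))
```

A value close to 1 indicates that the rescaled super-resolution image reproduces the intensity pattern of the reference image.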
In our previous work we reported a significant down-regulation of TubA1B-mEos 3.2 when compared to mEGFP in Hek293T cells, and showed that in these knock-in cell lines the poor expression of mEos 3.2 labelled cells does not allow for the clear resolution of microtubules and performs poorly when quantitatively compared to dSTORM images of Hek293T [15].
To determine whether the improvement in expression achieved by mEos 4b CO knock-in improves the quality of SMLM in Hek293T, where TubA1B is more poorly expressed compared to Hel 92.1.7, we generated Hek293T clones carrying TubA1B-mEos 4b CO and confirm the heterozygous insertion of the tag before quantitatively establishing expression (Fig. S6). Compared to previously generated mEos 3.2 clones, two of the three mEos 4b CO tagged cells show a significant improvement in expression (Fig. 3A). While this improvement is not as substantial as the increase observed in Hel 92.1.7 and remains significantly lower than mEGFP expression, there is a significant effect on the quality of resulting single molecule images (Fig. 3B). This is quantified as a significant improvement in the number of localisations, FRC, and localisation precision (XY uncertainty) (Fig. 3C-E). We also observe an improvement in RSP as measured by SQUIRREL (mEos 4b CO RSP = 0.718 compared to mEos 3.2, RSP = 0.565) (Fig. S8). Together this data indicates that across both cell lines, the codon optimised, monomeric mEos variant is better expressed, and the resulting improvement in the distribution of the endogenously expressed label also improves SMLM resolution and performance.
Figure 1. Multiple inserts including optimised mEos variants were cloned into identical donor plasmids to assess the effects of fluorophore properties on endogenous expression after CRISPR knock-in. Cells were identically transfected with specific guides, and after the clonal isolation of the brightest (and hence best expressing cells) by single cell sorting, cells were validated for insertion efficiency, expression level, and functional effects. Finally successful clones were interrogated for their efficiency as tools for single molecule microscopy through measurements of effective resolution and cluster distribution.
www.nature.com/scientificreports www.nature.com/scientificreports/ CRISPR targeting of CXCR4 allows for diffraction limited and super-resolved imaging of knock-in tagged receptor. Beyond its capacity for nanoscale resolution, single molecule imaging allows for unique quantification strategies which provide insights into the distribution, clustering behaviours, and stoichiometry of proteins of interest. Information of this kind is particularly relevant to the study of receptors, and as such the generation of a single molecule reporter based on endogenous expression is of particular interest to the field of receptor biology.
Figure 2. Optimised expression of mEos significantly improves SMLM imaging. Hel 92.1.7 clones CRISPR edited to express mEos 3.2, codon optimised mEos 3.2, mEos 4b, and codon optimised mEos 4b TubA1B inserts were isolated and compared to a previously reported TubA1B-mEos 3.2 clone to determine the effects of fluorophore properties and codon optimisation on expression level. (A) Western blotting of clones carrying specific mEos variants at the TubA1B locus showed a variation in expression of tagged tubulin dependent on the insert. (B) No significant difference was observed in mEos 3.2 codon optimised clones (ns p = 2.189, ns p = 0.2465, ns p = 0.1178), however a significant reduction in expression was observed in cells carrying mEos 4b when compared to the mEos 3.2 clone (****p < 0.0001, ****p < 0.0001, ****p < 0.0001). Conversely, a significant improvement in expression was observed in clones carrying a codon optimised mEos 4b (***p = 0.0005, *p = 0.0244, **p = 0.0022). (C,D) mEos 4b codon optimised (CO) clones produce high quality PALM images, with a significantly increased number of localisations compared to mEos 3.2 clones (*p = 0.0230). (E,F) No significant difference in Fourier ring correlation or localisation precision was observed (ns p = 0.2819, ns p = 0.8161). (n = 3 (S.D), statistical analysis performed using One-Way ANOVA with multiple comparisons to the mEos 3.2 clone for western data, and by t-test for quantitative SMLM comparisons. Complete unedited gels are included in Fig. S2).
The C-X-C chemokine receptor type 4 (CXCR4) is a G protein-coupled receptor essential for the proliferation of B-cell precursors during haematopoiesis, and plays a role in the homing and retention of most immune cells [28][29][30]. It is also vital for the development of the hematopoietic, cardiovascular, and nervous systems during embryogenesis [31][32][33][34][35].
Dysregulation of CXCR4 function can result in cancer progression and metastasis, immunodeficiency diseases (particularly WHIM syndrome which is characterised by mutations in CXCR4), and neurodevelopmental defects. CXCR4 also acts as a co-receptor in facilitating HIV infection. CXCR4 has therefore been the focus of major drug discovery efforts.
CXCR4 forms homomers as well as heteromers with a number of G protein-coupled receptors including CCR5 and CXCR7 [36][37]. These clustering behaviours affect its signalling properties [37][38][39][40], making it an ideal target for proof-of-principle CRISPR-PALM knock-in studies to establish receptor kinetics and stoichiometry in response to ligands. CXCR4, like many GPCRs, also presents a number of labelling challenges which can be addressed using a genome-editing approach, namely poor specific labelling by antibodies for both immunofluorescence (and hence dSTORM) and western blotting (Figs 4A and S9) which results in a dependence on over-expression systems to study the receptor [41]. CRISPR knock-in has successfully ameliorated the absolute need for over-expression of donor fused CXCR4 in the bioluminescence resonance energy transfer assay as demonstrated by White et al., suggesting that C-terminal knock-ins are functional for this receptor.
Figure 4. [...] Conversely, HaloTag knock-ins demonstrate a 50% reduction in expression (*p = 0.0301), while mEos 4b CO knock-in shows no change (ns p = 0.6661). (D) When RT PCR is performed using tag specific primers, the significant increase in mEGFP expression is likely due to the stability of mEGFP tagged CXCR4 RNA, suggesting that RNA stability is likely a factor in mediating the tag specific down-regulation observed (**p = 0.0050, *p = 0.0155). (E) CXCL12-mediated G protein activation measured by observing G protein dissociation/conformational changes by BRET in knock-in or wildtype cells transfected with cDNA encoding Gαi1/Nluc and Venus/Gγ2. Results are expressed as % maximal CXCL12 response observed in wildtype cells and show a 3-fold increase in response in mEGFP knock-in clones compared to wild type, with a 50% reduction in response in HaloTag knock-in, and no variation in mEos 4b CO clones. (F) Potency of CXCL12-mediated G protein activation in cells expressing wildtype or genome-edited CXCR4 (ns p = 0.0549, ns p = 0.9736, ns p = 0.5770). (n = 3 (S.D) for qRT-PCR data, compared by One-Way ANOVA with multiple comparisons. n = 5 for functional data).
Knock-in templates, described previously by White et al., were designed to target CXCR4 and introduce mEGFP, mEos 4b CO, and codon optimised HaloTag sequences to the C-terminus. HaloTag was included to expand the repertoire of single molecule labels, and because the range of organic fluorophores available for labelling this tag allows for potential future multiplexing with existing fluorescent proteins in CRISPR-PALM studies. As a chemical tag which is non-fluorescent until exposed to an appropriately modified organic fluorophore, HaloTag offers the benefits of a one to one labelling ratio through endogenous expression as well as the high photon counts (and thus localisation precision) offered by organic fluorophores.
In cells expressing CXCR4-mEGFP or CXCR4-HaloTag knock-ins, labelled protein was observed as expected at both the plasma membrane and endosomal sites. Surprisingly, however, we found that mEos 4b CO appeared to form aggregates in PCR validated clones (Figs 4A and S10), either due to mis-localisation of the fusion protein or cleavage of the tag. mEGFP and HaloTag knock-ins demonstrate even labelling at the plasma membrane across the clonal population (Fig. 4B), suggesting that knock-in establishes a stably expressing population suitable for large sample sizes.
Due to poor specific detection by western blotting, expression was determined by quantitative real-time PCR (qRT-PCR) (Fig. 4C,D). Compared to wild type, CXCR4-mEGFP knock-in clones demonstrate a substantial increase in expression (approximately 20-fold, *p = 0.038), while CXCR4-HaloTag knock-ins have an approximately 50% reduction in expression (Fig. 4C). No significant change in CXCR4 expression was observed in cells expressing the CXCR4-mEos 4b CO knock-ins, which, when paired with CXCL12 induced signalling responses that are comparable to wildtype, suggests that the tag is cleaved with no impairment of function or expression (Fig. 4C,E,F). qRT-PCR using primers specific to tagged CXCR4 demonstrates that the increased expression of CXCR4-mEGFP is likely due to an increase in mRNA stability, suggesting that mRNA stability is likely a factor influencing the variations in expression observed.
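Fold changes such as those above are commonly derived from qRT-PCR threshold cycles (Ct) using the 2^-ΔΔCt calculation. The text does not state which calculation was used here, so the sketch below is an assumption offered only to illustrate how a fold change relative to wild type can be obtained; the function name, the reference gene, and all Ct values are hypothetical.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt calculation (assumes ~100% primer efficiency).

    ct_target_*: Ct of the gene of interest (e.g. CXCR4) in sample / control
    ct_ref_*:    Ct of a reference (housekeeping) gene in sample / control
    """
    d_ct_sample = ct_target_sample - ct_ref_sample      # normalise sample
    d_ct_control = ct_target_control - ct_ref_control   # normalise control
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values: the knock-in crosses threshold one cycle earlier
# than wild type after normalisation, i.e. roughly twice the transcript.
print(fold_change(22.0, 18.0, 23.0, 18.0))
# One cycle later corresponds to roughly half the transcript.
print(fold_change(24.0, 18.0, 23.0, 18.0))
```

Under this calculation a fold change of 0.5 corresponds to a 50% reduction in expression, and a normalised difference of a little over four cycles corresponds to a roughly 20-fold increase.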
Furthermore, as reported by White et al., the addition of tags to the C-terminus of CXCR4 was well tolerated with no differences in the potency of CXCL12-mediated G-protein activation (Fig. 4F) or modulation of cAMP production (Fig. S11) between cells expressing wild type CXCR4 or CRISPR edited tagged CXCR4. However in agreement with an increase in CXCR4-mEGFP mRNA expression, an increase in the maximal response elicited by CXCL12 was observed (Fig. 4E) indicating an increase in tagged receptor expression.
Finally, the most significant implication of developing knock-in lines for single molecule imaging lies in the generation of a robust, consistently labelled sample which allows for the quantitative analysis of receptor organisation and function. To interrogate whether CRISPR generated CXCR4-HaloTag knock-ins offer a means of robust single molecule quantitation, knock-in cell lines were compared to Hek293T transfected with a CXCR4-HaloTag over-expression vector after treatment with CXCL12. Labelling CXCR4-HaloTag with Janelia Fluor 549 (JF549) allowed for stochastic activation and subsequent single molecule detection and cluster analysis (Fig. 5A).
Compared to Hek293T over-expressing CXCR4-HaloTag, CRISPR knock-in cells demonstrate a significant increase in median cluster area upon ligand treatment (*p = 0.0272) (Fig. 5B). Conversely, transfected cells do not show a statistically significant difference in cluster size (ns p = 0.9484). Histograms plotting the percentage of clusters by area show a clear shift towards larger clusters on ligand treatment in CRISPR knock-in cells, but no clear trend in transfected cells. This suggests that heterogeneity in over-expressing samples is likely responsible for the lack of a significant effect on ligand treatment (Fig. 5C).
A significant reduction in the percentage of clusters with a radius below 50 nm is observed in CRISPR cells (*p = 0.0384), alongside a parallel increase in the percentage of clusters with radii between 50-250 nm (*p = 0.0133) (Fig. 5D). No such trend is observed in transfected samples (p = 0.7277, p = 0.57557 respectively). These results are consistent with the trafficking and clustering of CXCR4 on ligand treatment, and show that while variations in transfected samples limit the application of quantitative single molecule imaging, CRISPR knock-in cells offer a consistently labelled sample which allows for the quantitative measurement of changes in cluster size and distribution on ligand treatment.
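The binning of clusters by radius described above can be sketched as follows. This is an illustrative example only, not the analysis pipeline used in this work: it assumes that each cluster is summarised by an area in nm² and that its radius is taken as the radius of a circle of equal area. The function name and the example areas are hypothetical.

```python
from math import pi, sqrt


def radius_bin_percentages(areas_nm2, low_nm=50.0, high_nm=250.0):
    """Percentage of clusters with equivalent radius below, within, or above a range.

    The equivalent radius is that of a circle with the same area: r = sqrt(A / pi).
    """
    if not areas_nm2:
        raise ValueError("at least one cluster area is required")
    counts = {"below": 0, "between": 0, "above": 0}
    for area in areas_nm2:
        radius = sqrt(area / pi)
        if radius < low_nm:
            counts["below"] += 1
        elif radius <= high_nm:
            counts["between"] += 1
        else:
            counts["above"] += 1
    total = len(areas_nm2)
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical cluster areas for untreated and ligand treated cells (nm^2).
untreated = [pi * r ** 2 for r in (20, 30, 40, 45, 120)]
treated = [pi * r ** 2 for r in (30, 60, 90, 150, 200)]
print(radius_bin_percentages(untreated))
print(radius_bin_percentages(treated))
```

Comparing the "below" and "between" percentages across replicates, with and without ligand, gives the kind of shift in cluster size distribution reported in Fig. 5D.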

Discussion
In this study, we expand on our previous observation of a tag dependent effect on the expression of CRISPR labelled genes. In our original work, we hypothesised that the reduction in expression observed in mEos 3.2 tagged TubA1B when compared to equivalent mEGFP knock-ins is likely due to fundamental properties of the fluorophore affecting function, and therefore driving a regulatory down-regulation of the targeted allele. We expand on the original observation by testing mEos variants with improved monomericity and codon optimisation to determine whether, where identical selection strategies, CRISPR guides and donor templates are used, variations in expression attributable to these properties are observed.
In Hel 92.1.7 cells, which highly express the TubA1B isoform and demonstrate sufficient labelling through mEos 3.2 expression for high quality CRISPR PALM, the introduction of a codon optimised, monomeric variant (mEos 4b CO) results in a significant improvement in expression as shown by western blotting, and as a result, an improvement in the number of single molecule detections recorded in subsequent imaging experiments. Codon optimisation of the original mEos 3.2 sequence alone was not enough to improve expression, and the original mEos 4b sequence was more poorly expressed. These findings led us to conclude that improved monomericity and codon optimisation together improve expression in Hel 92.1.7, and counteract the down-regulation of mEos labelled TubA1B we had previously observed.
To expand on this finding we generated equivalent CRISPR knock-ins in Hek293T, a cell line which demonstrated a significant down-regulation of the mEos tagged TubA1B gene in our previous work. The original mEos 3.2 TubA1B clones result in extremely poor SMLM imaging. mEos 4b CO Hek293T knock-ins demonstrate a 2-fold increase in expression which results in a substantial improvement in quantitative SMLM metrics; however, expression of the mEos 4b CO tagged TubA1B remains significantly less than mEGFP labelling of the same locus. This finding suggests that there are likely other properties of both fluorophore and sequence impeding expression in Hek293T, which may include maturation rates or further regulation at the mRNA level.
To investigate the applicability of this CRISPR-mediated labelling approach to the study and quantification of membrane receptors, we generated Hek293T cells carrying mEos 4b CO, mEGFP, and HaloTag knock-ins at the C-terminus of CXCR4 following the design of White et al. in their work establishing a Nanoluc knock-in at the same locus. CXCR4 is an ideal target for CRISPR approaches due to poor antibody labelling and the importance of receptor dynamics in viable drug targets like GPCRs [42]. We included a codon optimised HaloTag sequence as self-labelling enzymes are extensively used in over-expression studies of GPCRs, and these approaches offer substantial imaging benefits (including the possibility of multiplexing with mEos, high photon counts and a plethora of conjugated labels).
Interestingly we found a series of different expression and functional profiles on knock-in. mEos 4b CO appears to be cleaved and aggregates within the cell, with no evidence of functional effects on the receptor. mEGFP knock-ins label the receptor well, however over-expression resulting in enhanced CXCR4 signalling/ function is observed which is likely due to the elevated stability of mEGFP-CXCR4 mRNA as evidenced by qRT-PCR. Finally, HaloTag knock-ins result in the only homozygous insertion observed in our studies thus far, suggesting that this tag is well tolerated. However a 50% reduction in expression is observed in these knock-ins, as well as a similar reduction in CXCL12 mediated responses.
Recently Hansen et al. reported the use of HaloTag knock-in lines in the study of chromatin loop stability [24]. The authors report the generation of homozygous knock-ins which express HaloTag labelled protein at levels equivalent to wild type. This contrasts with our own findings that HaloTag knock-in cells demonstrate a 50% reduction in expression at the CXCR4 locus. We reproduce HaloTag knock-ins at the TubA1B locus in Hel 92.1.7 cells and find poor expression in these clones (approximately 4%, compared to the significantly higher expression of mEos and mEGFP labelled TubA1B in these cells; all cells reported are heterozygous for insertion) (Figs S12 and S13). Together these findings suggest a very gene specific effect of HaloTag knock-in at target loci.
Our interest is in the application of CRISPR mediated knock-in to generate high quality samples for robust quantitative SMLM. Clonal populations which are functionally validated and provide an established labelling density are potentially powerful quantitative tools. Therefore we perform single molecule imaging and cluster analysis of CRISPR and over-expression CXCR4 HaloTag cells to determine whether the changes in receptor clustering resulting from ligand treatment can be established in these samples.
Our data show that, due to the sample to sample variation inherent to transfection, no significant change in cluster distribution is evident on ligand treatment in transfected cells. Conversely, our CRISPR cells consistently demonstrate significant differences in cluster statistics, consistent with the clustering and trafficking of CXCR4 on treatment with CXCL12. These results suggest that endogenously expressing samples are better suited to robust single molecule quantification.

Conclusions
The use of CRISPR-Cas9 genome editing to introduce endogenously expressed fluorescent proteins remains a relatively small part of the literature, however with ever increasing knock-in efficiencies in classically difficult cell types, these experiments are becoming increasingly commonplace.
We demonstrate that targeting genes with identical guides, linkers, and homology templates but different tags achieves markedly different expression profiles and behaviours. To our knowledge, this has not been reported in the literature pertaining to CRISPR mediated knock-in to date. This variation in expression is likely due to a myriad of regulatory factors which are, in our hands, gene specific. Our results show that mEGFP is consistently well expressed regardless of the gene targeted, however none of our mEGFP knock-ins are bi-allelic, suggesting that in our targets there is an upper limit to the amount of tagged, endogenous protein tolerated. This work is consistent with reports from the Allen Institute, which find that on generation of a large knock-in mEGFP library specific genes do not tolerate bi-allelic insertions [43]. mEGFP labelling of CXCR4 appears to result in mRNA stabilisation which causes receptor over-expression, and as a result enhanced CXCR4 function and/or signalling.
The abnormal stability of mEGFP tagged CXCR4 mRNA implies that this enhanced signalling is likely due to functional over-expression on mEGFP knock-in at the CXCR4 locus.
In a recent review interrogating and discussing the effects of over-expression, Stepanenko and Heng highlight the genomic, epigenetic, and phenotypic effects of both transient and stable transfection on cells [14]. CRISPR mediated labelling offers a number of distinctive advantages in advanced light microscopy, including SMLM. However, our results show that establishing the effects of knock-ins is critical to generating useful biological models. This is particularly important in light of both increasing applications of CRISPR for HDR, and the growing evidence of other significant caveats to CRISPR, including off-target effects and low rates of correct HDR.
This study suggests that there is a need to determine the properties of both fluorophores and knock-in sequences which may affect function and expression. While we explore monomericity and codon optimisation, a myriad of other factors including maturation rates and mRNA stability are likely to be important. An 'ideal' insert is needed to fully realise the potential of these approaches, however CRISPR knock-in still promises to generate models which overcome classic limitations in cell biology and single molecule imaging.
Finally, our results show that despite the complexity involved in the generation of CRISPR lines which express specialist reporters well enough for high quality SMLM, the result is a stable population suitable for high quality single molecule imaging. Compared to the equivalent transfected samples, CXCR4 knock-ins allow for accurate cluster quantification. As imaging approaches and analytical methods consistently improve in the single molecule field, the generation of robust samples through CRISPR is likely to be a valuable addition to the growing single molecule arsenal.

Materials and Methods
[...] at 37 °C and 5% CO2. Adherent cells were passaged by washing T75 cell culture flasks (Corning) twice with sterile Phosphate Buffered Saline (PBS) without calcium or magnesium, before incubation with Trypsin-EDTA (Thermo Scientific) for 5 minutes at 37 °C. Cells were then re-suspended in complete DMEM before passaging at an appropriate dilution in a fresh flask. Suspension cells (Hel 92.1.7) were passaged by spinning cells at 1200 RPM for five minutes before re-suspension in fresh complete RPMI, dilution in fresh media, and incubation.
Adherent cells were transfected with Lipofectamine 3000 (Thermo Scientific) as per manufacturer guidelines. Briefly, cells were plated at a density of 1 × 10⁵ in 12 well plates (Thermo Scientific) 18 hours before transfection. Immediately preceding transfection, cells were washed twice with sterile PBS before incubation in Opti-MEM. The px459 plasmid, which expresses both the guide and Cas9-puro (Addgene plasmid #62988, a gift from Feng Zhang), was used to introduce the previously reported TubA1B targeting guide C or the CXCR4 targeting guide, together with Cas9 elements, into transfected cells 15,42 .
Suspension cells were transfected using the Neon electroporation system (Thermo). Briefly, cells were resuspended in a final total volume of 10 μL buffer R with 1 μg of total DNA (equimolar guide C px459 and donor) and a total of 1 × 10⁵ cells. 10 μL tips were used with 2x pulses at 20 ms pulse width and 1450 V. As described in our previous work (Khan et al. 2017), all CRISPR experiments included a complete panel of controls including a 'donor only' condition to establish that fluorescence was exclusively due to CRISPR mediated knock-in.
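As a rough illustration of the equimolar split described above: for double-stranded plasmids, mass per mole scales with length, so an equimolar mixture divides the total mass in proportion to plasmid length. The sketch below assumes this approximation; the plasmid lengths are hypothetical placeholders, not values reported in this work.

```python
def equimolar_masses(total_ng, lengths_bp):
    """Split total_ng of DNA so that each plasmid contributes equal moles.

    Assumes mass per mole is proportional to plasmid length (dsDNA),
    so each plasmid's mass share equals its share of the summed length.
    """
    total_length = sum(lengths_bp.values())
    return {name: total_ng * bp / total_length for name, bp in lengths_bp.items()}


# Hypothetical plasmid sizes in base pairs (illustrative only).
plasmid_lengths_bp = {"px459_guide_C": 9200, "donor": 4500}

# 1 ug (1000 ng) of total DNA, as used per electroporation.
masses_ng = equimolar_masses(1000.0, plasmid_lengths_bp)
for name, mass in masses_ng.items():
    print(f"{name}: {mass:.0f} ng")
```

Substituting the true plasmid lengths gives the mass of each construct to add per electroporation.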
For over-expression studies, a CXCR4-HaloTag CO sequence identical to that used in knock-in studies was cloned into a pcDNA3.1 over-expression vector and transfected at 25 ng per 35 mm MatTek dish. To maintain equivalent transfection efficiencies across experiments, transfection mixtures were bulked to a final total of 1 μg DNA with empty pGem-T-Easy vector.
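The bulking step amounts to simple arithmetic, sketched here using the 25 ng and 1 μg figures given above. The function name is illustrative only.

```python
def filler_dna_ng(expression_plasmid_ng, total_ng=1000.0):
    """Mass of empty vector needed to bring a transfection mix to total_ng."""
    if expression_plasmid_ng > total_ng:
        raise ValueError("expression plasmid mass exceeds the target total")
    return total_ng - expression_plasmid_ng


# 25 ng of CXCR4-HaloTag plasmid per dish, bulked to 1 ug total DNA.
empty_vector_ng = filler_dna_ng(25.0)
print(empty_vector_ng)
```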
Donor vector cloning. TubA1B donor plasmids were generated by ordering gBlock synthetic DNA (Integrated DNA Technologies) for both left and right homology arms and each of the inserts tested (mEos 3.2, mEos 3.2 codon optimised, mEos 4b, mEos 4b codon optimised, HaloTag codon optimised and mEGFP). Codon optimisation was performed using IDT's online tool. Each fragment was designed with a 20 bp segment overlapping the adjacent arm. Homology arms and inserts were cloned into an empty pGem-T-Easy backbone using a HiFi Gibson Assembly kit (New England Biolabs).
2 μL of each 10 μL reaction volume was transformed into competent cells as described in the previous section. Individual colonies were selected after overnight growth on ampicillin plates, miniprepped (following the manufacturer instructions included in the GeneJET kit, Thermo Fisher), and subjected to an EcoRI test digest to verify the presence of an insert. Clones carrying an insert of the correct size were further verified by Sanger sequencing. Validated plasmids were amplified.
Donor plasmids for CXCR4 genome engineering were generated by sub-cloning codon-optimised mEos 4b, mEos 4b and HaloTag as well as mEGFP, synthesised as double-stranded DNA gBlocks (Integrated DNA Technologies), into the CXCR4 donor vector described previously using XhoI and XbaI restriction enzymes 42 .
Flow cytometry and single cell sorting. Validation and verification of knock-ins was performed using an Accuri C6 Flow Cytometer (BD Technologies). Samples were gated according to forward and side scatter, with positive fluorescence for these experiments detected in the FL-1 channel. Single cell sorting was very generously performed by Matt MacKenzie (TechHub) on a BD FACSAria Fusion cell sorter with a 100 μm nozzle at 20 psi. Gates were set on the brightest population of cells to ensure highly expressing clones were selected.
Western Blotting. Clonal populations were expanded in 6-well plates (Thermo Scientific) before lysis in NP-40 buffer with protease inhibitors (Sigma). Lysates were incubated on ice for 30 minutes before a final 10 minute spin at 14,000 rcf. The supernatant was then added to 2x reducing sample buffer and boiled for 5 minutes.
Western Blots were prepared through SDS-PAGE on 4-12% gradient Bolt gels (Thermo Scientific). Gels were run for 15 minutes at 70 V, and 45 minutes at 125 V. Once run, each gel was transferred to a polyvinylidene difluoride (PVDF) membrane (Bio-Rad) using a Turbo transfer system (Bio-Rad). After transfer, membranes were blocked with 4% BSA in 0.1% Tween-20 Tris buffered saline (TBST) and probed with the relevant primary antibodies. Secondary incubations were performed using fluorescent antibodies (LiCor Instruments) in TBST. Anti-mouse 680 and anti-rabbit 800 fluorescent antibodies were used for detection using an Odyssey Fc (LiCor Instruments). Tubulin and GAPDH probes were used as previously described. Anti-CXCR4 antibodies Fusin C8352 (labelled antibody 1) and knock-out validated ab124824 (labelled antibody 2) were obtained from Thermo and Abcam respectively. Finally, Image Studio was used to quantify western blots.
CXCR4 Functional Assays. Cells were transiently transfected according to the manufacturer's instructions using FuGENE-HD transfection reagent (Promega, Wisconsin, USA) with plasmid cDNA coding for Gαi1/Nluc (synthesised by GeneArt, Invitrogen) and Venus/Gγ2 (a kind gift from A/Prof Kevin Pfleger) 24 hours after seeding 350,000 cells/well in a 6-well plate. Cells were harvested with 1x Trypsin-EDTA (Sigma Aldrich) and seeded into poly-D-lysine (Sigma Aldrich) coated white flat bottom 96 well plates (655089; Greiner Bio-One, Stonehouse, UK) at 30,000 cells/well 24 hours before performing an assay.
For each BRET assay, media was removed from 96-well plates containing CRISPR/Cas9-modified or wildtype HEK293 cells, and the cells were incubated with 1x Hanks' Balanced Salt Solution (1xHBSS; 25 mM HEPES, 10 mM glucose, 146 mM NaCl, 5 mM KCl, 1 mM MgSO4, 2 mM sodium pyruvate, 1.3 mM CaCl2, 1.8 g/L glucose; pH 7.45) supplemented with 0.1% BSA for 1 hour at 37 °C under atmospheric CO2. Furimazine (Promega, Wisconsin, USA) was then added to a final concentration of 10 μM and incubated for 5 minutes before filtered light emissions were measured at 460 nm (80 nm bandpass) and 535 nm (60 nm bandpass) continuously at 37 °C using a PHERAstar FS plate reader (BMG Labtech). After 5 reads, CXCL12 (0.1 pM-100 nM; PeproTech, Rocky Hill, USA) or buffer (HBSS containing 0.1% BSA) was added to triplicate wells and BRET was measured for a further 25 cycles. Raw BRET ratios were calculated by dividing the 535 nm emission by the 460 nm emission, with the maximal change in BRET used for analysis.
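The ratio calculation above can be sketched as follows. The raw ratio (535 nm emission divided by 460 nm emission) follows the text directly. How the "maximal change" is referenced is not specified here, so this sketch assumes it is the largest-magnitude deviation of the post-addition ratios from the mean of the 5 pre-addition reads. Function names and example values are illustrative.

```python
from statistics import mean


def bret_ratios(em535, em460):
    """Raw BRET ratio per read: acceptor (535 nm) over donor (460 nm) emission."""
    return [acceptor / donor for acceptor, donor in zip(em535, em460)]


def max_bret_change(em535, em460, n_baseline=5):
    """Largest-magnitude change in BRET ratio after ligand addition.

    Assumption: baseline is the mean ratio of the first n_baseline reads
    (taken before ligand or buffer is added); the sign of the change is kept.
    """
    ratios = bret_ratios(em535, em460)
    baseline = mean(ratios[:n_baseline])
    changes = [ratio - baseline for ratio in ratios[n_baseline:]]
    return max(changes, key=abs)


# Illustrative single-well trace: 5 baseline reads, then 3 reads after ligand.
em535 = [50.0, 50.0, 50.0, 50.0, 50.0, 40.0, 30.0, 35.0]
em460 = [100.0] * 8
change = max_bret_change(em535, em460)
print(round(change, 3))
```

In practice each of the triplicate wells would be processed this way before averaging, and the full 25 post-addition cycles would be supplied.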
For cAMP assays, wild type or CRISPR/Cas9-modified HEK293T were maintained as described for BRET assays above. On the day of assay cells were harvested with Dulbecco's Phosphate Buffered Saline (DPBS) supplemented with 0.2 g/L Ethylenediaminetetraacetic acid (EDTA; Sigma Aldrich) pre-warmed to 37 °C and re-suspended in stimulation buffer (HBSS containing 500 μM 3-Isobutyl-1-Methylxanthine and 0.1% BSA) at 400,000 cells/ml. cAMP production was measured using a LANCE® cAMP Detection Kit (PerkinElmer) following the manufacturer's instructions. Briefly, 5 μL of cells were added in duplicate to a white 384 well plate (ProxiPlate-384, PerkinElmer) containing buffer only (stimulation buffer containing Alexa Fluor® 647-anti cAMP antibody) or Forskolin (0.5 μM, Sigma-Aldrich) in the absence or presence of CXCL12 (1 pM-1 μM) for 45 minutes at room temperature. The reaction was then stopped by addition of the cAMP detection buffer containing Eu-W8044 labeled streptavidin and biotin-cAMP. Plates were then incubated for 1 hour at room temperature and fluorescence was measured at 615 nm and 665 nm, 50 μs after excitation at 320 nm, using an EnVision 2102 microplate reader (PerkinElmer).

Imaging (PALM and dSTORM). Clones of interest were imaged on 35 mm MatTek Dishes (MatTek Corporation) on a Nikon N-STORM system (Andor iXon Ultra DU897U EMCCD, Ti-E stand, Perfect Focus, Agilent MLC400 laser bed). A 100x 1.49 NA TIRF Objective was used for each acquisition, which was then reconstructed using ThunderSTORM 44 (Maximum Likelihood, Integrated Gaussian PSF fitting).
Hel 92.1.7 cells were treated with phorbol 12-myristate 13-acetate (PMA) (Sigma) and thrombopoietin (TPO, a generous gift from Ian Hitchcock) overnight to drive spreading and differentiation prior to imaging. Once seeded on MatTek dishes, cells were washed twice with PBS before treatment with microtubule stabilising buffer (MTSB: 80 mM PIPES pH 6.8, 1 mM MgCl2, 4 mM EGTA) and 0.5% Triton-X100 for 30 seconds before fixation with ice cold methanol at −20 °C for 3 minutes. Samples were then washed with TBST and imaged in PBS for CRISPR-PALM.
CXCR4 knock-ins were labelled with Janelia Fluor 549 (generously provided by Lavis lab) prior to fixation 45 . Cells were plated on 35 mm MatTek dishes overnight before the addition of 250 nM JF549 in complete DMEM and a subsequent 15 minute incubation. Samples were then washed with PBS and complete media before a further 30 minute incubation in unlabelled complete media. Finally, cells were washed 3x with PBS before fixation in formalin for 10 minutes. For CXCL12 treated samples 100 ng (10 μM) recombinant human CXCL12 (SDF-1α, Thermo) was added for 4 minutes prior to fixation. These samples were then imaged in blinking buffer (100 mM mercaptoethylamine-HCl, 1 g/mL catalase, 50 g/mL glucose oxidase, PBS) at 20 ms exposures and approximately 120 kW laser power (output at laser), with low level 405 nm illumination added once JF549 had been successfully shelved into a dark state. Where the 405 nm activation laser was used, the Auto Laser Power (Auto LP) function of the N-STORM system was used for consistency across experiments. Auto LP incrementally increases the power of the activation laser to maintain the number of detections per frame across the acquisition. Images were acquired until the sample was completely bleached as demonstrated in Fig. S3.
Cluster analysis. Images were reconstructed as previously described by Khan et al. using ThunderSTORM 44,46 . Prior to cluster analysis, images were drift corrected (cross-correlation) and filtered to remove detections with uncertainties above 75 nm. Duplicates were removed, and blinks recurring within 50 nm in successive frames were merged. Filtered detections were clustered using persistence based clustering, implemented using RSMLM and KNIME (workflow available on request) 47,48 . Detection density was estimated by counting the number of neighbouring detections within 20 nm and the persistence threshold was set to 10 detections. For each field of view the central 10 μm² was cropped and analysed. Clusters with fewer than 10 detections were removed from the analysis. Cluster area was defined by the convex hull of the detections within a cluster.
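Three of the steps above (the 20 nm neighbour count used as a density estimate, removal of clusters with fewer than 10 detections, and convex hull area) can be illustrated with a minimal pure-Python sketch. This is not the RSMLM/KNIME workflow used here and does not reproduce persistence based clustering; cluster labels are assumed to be already assigned, and excluding a detection from its own neighbour count is an assumption. All function names are illustrative.

```python
def neighbour_counts(points, radius_nm=20.0):
    """Count, for each detection, the other detections within radius_nm."""
    r2 = radius_nm ** 2
    return [
        sum(1 for j, q in enumerate(points)
            if j != i and (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= r2)
        for i, p in enumerate(points)
    ]


def _cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; area in squared input units (nm^2 for nm input)."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def cluster_areas(clusters, min_detections=10):
    """Convex hull area per cluster, dropping clusters below min_detections.

    clusters maps a cluster label to a list of (x_nm, y_nm) tuples.
    """
    return {
        label: polygon_area(convex_hull(pts))
        for label, pts in clusters.items()
        if len(pts) >= min_detections
    }


# Illustrative data: one 10-detection cluster spanning a 100 nm square,
# and one 5-detection cluster that is removed by the size filter.
square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0),
          (20.0, 30.0), (50.0, 50.0), (70.0, 20.0), (40.0, 80.0),
          (60.0, 60.0), (10.0, 90.0)]
small = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0), (2.0, 2.0)]
areas = cluster_areas({"A": square, "B": small})
print(areas)
```

The neighbour count here is quadratic in the number of detections; a spatial index would be needed for full fields of view.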
Image and statistical analysis. Statistical analysis was performed using GraphPad Prism 6, with statistical tests as indicated in figure legends. Briefly, significance was determined using either 2-way ANOVA or 1-way ANOVA with multiple comparisons. n = 3 across samples unless otherwise stated, and error bars represent standard deviation of the mean. FRC measurements were performed using the recently published NanoJ-SQUIRREL plug-in 49,50 .
CXCR4 functional data were normalized to the maximal response generated by wildtype HEK293T cells and analysed using GraphPad Prism 7. Concentration-response data were fitted with sigmoidal curves generated