smiFISH and embryo segmentation for single-cell multi-gene RNA quantification in arthropods

Recently, advances in fluorescent in-situ hybridization techniques and in imaging technology have enabled visualization and counting of individual RNA molecules in single cells. This has greatly enhanced the resolution in our understanding of transcriptional processes. Here, we adapt a recently published smiFISH protocol (single-molecule inexpensive fluorescent in-situ hybridization) to whole embryos across a range of arthropod model species, and also to non-embryonic tissues. Using multiple fluorophores with distinct spectra and white light laser confocal imaging, we simultaneously detect and separate single RNAs from up to eight different genes in a whole embryo. We also combine smiFISH with cell membrane immunofluorescence, and present an imaging and analysis pipeline for 3D cell segmentation and single-cell RNA counting in whole blastoderm embryos. Finally, using whole embryo single-cell RNA count data, we propose two alternative single-cell variability measures to the commonly used Fano factor, and compare the capacity of these three measures to address different aspects of single-cell expression variability.

In this manuscript, Calvo et al. describes applications of an established smFISH approach (smiFISH or single molecule inexpensive FISH) in several extant and emerging arthropod model organisms. They also developed image analysis pipelines for single-cell transcript expression analysis in embryos or tissues. In general, the data is of high quality, the text is clear, the method is detailed enough, and the analysis methods can be broadly interesting. Although there are no novel biological findings, that does not seem to be the focus of this manuscript. I recommend its publication if the following points can be addressed.
Major point: The current neighbor-finding algorithm relying only on cell-cell distance is almost too simple and can only work for tissues with mostly near hexagon cells. In the case of cell shape changes during epithelial folding, for instance, some cells will be elongated, and the algorithm will clearly fail to faithfully identify neighbors.
For this type of analysis to be more broadly applicable, I'd recommend implement a more sophisticated algorithm. For example, one could start by a clearly large search radius, and determine how many edges the connecting line between the queried cell center and a potential neighbor cell center crosses; a true neighbor should cross only once, as long as the cell topology is not too complicated. A more robust algorithm will be identifying the vertices and edges of each cell to literally using shared edges as the criterion.
It is up for the editor and the authors to decide whether this is too much additional work, but I believe it is reasonable giving the coding capability gauged from the figures.
Minor points: 1. When describing eve expression in Figure 1, do authors suggest "low" eve mRNA expressing cells have post-transcriptional regulation that further sharpen protein-level expression boundaries, or just to highlight the superb smFISH resolution comparing to immunofluorescence and traditional in situ?
To advance the use of single molecule FISH in whole mount tissues, it is essential to present the evidence that the procedure and subsequent analysis correctly measure mRNA density. But the mRNA counts per cell presented here are not consistent with previous studies. Further, to ascertain the extent of biological variability requires accounting for measurement error, but this has not been done. Sources of measurement error that the authors should account for include: -Detection failure. The authors should estimate the extent of under-counting using two sets of probes labeled with different colors and directed against the same mRNA. Moreover, at a glance the failure rate appears quite high in the case of nos mRNA, single mRNAs of which are densely distributed through the entire oocyte (PMID: 9895314, 25848747).
-mRNA crowding. mRNAs will be undercounted at a rate that increases with increasing density. This is not accounted for in the manuscript, but has a major impact on any conclusions about variability. The authors do appear to be undercounting the gap gene mRNAs. For example, Kr has a maximum density of around 1000 mRNA per cell, where the basal extent of the cell is taken at 12 microns below the cortex (PMID: 23953111).
-Tissue deformation upon processing and mounting. The fixing and mounting procedure can alter the apparent density of mRNAs and nuclei and can be a dominant source of measurement error (PMID: 23953111, Fig. S2). In the current manuscript, such deformation might well contribute to measurement error, given the clear spatial correlations in the variability shown in Fig. 5G. For example, Kr variability approaches the Poisson limit in the most rapidly expressing regions of the embryo, i.e. Fano factor = 1 (PMID: 23953111, Fig. S6); however, in the current manuscript, the Fano factor clearly exceeds 1 in this region, and the Fano factor exhibits spatial correlations around 4-5 nuclear diameters. This is consistent with local tissue deformations impacting the counts per cell.
As a final point, the usefulness of the two new measurements of variability is not immediately clear. As a means of assessing variability between cells, it might be more informative to assign each cell a value based on how many standard deviations away from the "neighborhood mean count" that cell is found, and a p value to say whether that distance is significantly larger than expected. None of the measurements presented (including the Fano factor) help the reader understand whether the mRNA content of a given cell is significantly larger (or smaller) than expected.
The current neighbor-finding algorithm relying only on cell-cell distance is almost too simple and can only work for tissues with mostly near hexagon cells. In the case of cell shape changes during epithelial folding, for instance, some cells will be elongated, and the algorithm will clearly fail to faithfully identify neighbors.
For this type of analysis to be more broadly applicable, I'd recommend implement a more sophisticated algorithm. For example, one could start by a clearly large search radius, and determine how many edges the connecting line between the queried cell center and a potential neighbor cell center crosses; a true neighbor should cross only once, as long as the cell topology is not too complicated. A more robust algorithm will be identifying the vertices and edges of each cell to literally using shared edges as the criterion. It is up for the editor and the authors to decide whether this is too much additional work, but I believe it is reasonable giving the coding capability gauged from the figures. Thanks to Reviewer 1 for this good point. We fully agree, and have developed an improved strategy ( Figure  4) for automated neighbour detection, that does not rely on cells being regular in shape/size. This new method works by slightly expanding the border of each cell in 3D by just ~10% of a cell diameter, such that now cell borders slightly overlap only with directly touching neighbour cells. The program then identifies all border intersections, producing a list of only directly touching neighbours. We verify that this new method achieves near perfect agreement with a manual neighbour count. All code for the new automated neighbour detection method is provided.
We also appreciate that for some analyses, comparing cell expression among a larger group of cells within a given area may be more meaningful than limiting only to touching neighbours. Accordingly, we have also provided supplementary code for calculating and filtering the Euclidian distance between cell centre coordinates, to identify all nearby cells within a user-specified search radius.
Minor points: 1. When describing eve expression in Figure 1, do authors suggest "low" eve mRNA expressing cells have post-transcriptional regulation that further sharpen protein-level expression boundaries, or just to highlight the superb smFISH resolution comparing to immunofluorescence and traditional in situ? It is just to highlight the resolution obtained with smiFISH. A great advantage of the technique is that a single mRNA spot is the same intensity and equally detectable irrespective of overall gene expression level.
This allows visualization of very low expressing cells, even just a few RNA per cell, such as those 'low' eve expressing cells between stripes. With traditional in-situ or immunofluorescence relying on measurement of overall fluorescence level, such cells would likely be classed as non-expressing, as their overall fluorescence level would not differ significantly from background.
2. When segmenting cells using Spectrin signal, how far was the sharp mid-section Spectrin signal was extrapolated? Any supporting evidence or argument to support the choice of the cell height? For analysing cell to cell variability, it is essential that the RNA assignment to individual cells is accurate. Therefore, image stacks were taken to the basal limit of membrane ingression but not further; as it is not possible to assign any mRNAs located lower than this membrane limit to specific cells with certainty. In the image analysed in Figure 3, the first 20 slices of Spectrin staining was maintained, and slice 20 extrapolated for a further 28 slices to bottom of the stack defined as the basal limit of membrane ingression. This has now been made clearer in the text.
The choice of cell height will depend upon the specific question. If accuracy of cell to cell variability is critical, then Spectrin staining should not be extrapolated beyond the basal membrane ingression limit. However, if total mRNA number is more critical, with less focus on cell-to-cell variability, deeper stacks can be taken to encompass every mRNA including any below the basal membrane limit, and the sharp midsection membrane extrapolated accordingly beyond the basal limit through the whole stack. In a new Supplementary Figure 2, we have included quantifications of this kind from an additional 12 blastoderm embryos, to assess biological variability of total Kr mRNA number using deeper stack depths beyond the membrane limit. Supplementary Figure 2g shows differing degrees of membrane ingression which was measured to sort embryos by age order. The degree of membrane extrapolation depends therefore on embryo age (degree of membrane ingression), and whether stack depth is intended to encompass just mRNAs within membranes, or also to include those below the membrane limit. These considerations have now been included in the main text.
3. In Figure 1, posterior mRNA accumulation of the nos mRNA is marginal. Is this common? Yes, the staining is consistent with previous studies; the important point lies in the timing of posterior nos accumulation during oogenesis. The egg chamber shown in Figure 1 is at stage 10, and previous studies show that the majority of posterior localization of nos mRNAs occurs during stages 13/14 of oogenesis, following nurse cell 'dumping' of mRNA (PMID: 12867026, PMID: 9895314, PMID: 7515724). We chose to show a stage 10 oocyte, because this demonstrates the sensitivity of smiFISH by capturing the earliest onset of posterior accumulation of nos mRNA, earlier than previously observed by classic in-situ (PMID: 7515724 and PMID: 9895314), and live imaging (stage 11 PMID: 12867026) but similar to previous FISH staining, (stage 10 PMID 25848747). The text referring to this figure has now been adjusted to highlight this point.
4. The abbreviation MYA (million years ago) needs to be explained when first mentioned. This has been fixed. For changes, please see Supplementary Figure 1, and line 103. Figure 2, the Hox gene multiplexing. Are bright foci representing active transcription bursts? Is it still possible to count mRNA dots in these foci? When plotting the distribution of mRNA expressions of these genes with direct evidence of active transcription, does the histogram have a long tail? Discussion of this point can be combined with Figure 3 discussion. Hox genes are generally long, ranging from ~10kb (Dfd) to ~103kb (Antp), therefore tend to show large bright transcriptional sites, representing a localized accumulation of multiple nascent RNAs in the process of transcription along the gene length. Therefore these bright foci are not so much individual bursts as they are more persistent marks of ongoing active transcription through the gene length. Intensity of transcription sites can in principle be quantified, compared to single mRNA spot intensity, and the nascent RNA number inferred (PMID: 25728770). This approach may be relevant for long genes where much RNA is held at the transcription site, and for questions concerning transcription dynamics. A note about transcription site quantification has now been included in the Figure 2 discussion.

In
For changes, please see lines 134-139.

Reviewer #2:
Calvo et al. have adapted an existing single molecule FISH method, previously developed for cultured cells, to whole mount preparations. They demonstrate that the method works for embryos and tissues from a handful of arthropods. Using deconvolution and spectral unmixing, they show that up to eight genes can be assessed in whole mounts. They present a membrane labeling protocol compatible with their FISH method. They also present two straightforward measurements of cell-to-cell variability.
The mRNA labeling method will certainly be regarded by the community as a valuable cost-saving measure. The compatibility with membrane labeling will also be very useful for cell segmentation. Demonstrating the application of spectral unmixing is also useful. These are valuable additions to the toolbox of single mRNA molecule detection. However, the quantification is not rigorous and detracts from the manuscript. The manuscript otherwise contains no novel conclusions. The work might be more appropriate for a methodscentric publication once the authors address concerns about the quantification and measurement error outlined below. This work is indeed intended as a methods paper, and aims to present a methodology pipeline for multi-gene mRNA quantification and cell to cell variability analysis in whole embryos. We do not aim to draw any particular biological conclusions from the specific genes imaged and quantified.
1) To advance the use of single molecule FISH in whole mount tissues, it is essential to present the evidence that the procedure and subsequent analysis correctly measure mRNA density. But the mRNA counts per cell presented here are not consistent with previous studies. Further, to ascertain the extent of biological variability requires accounting for measurement error, but this has not been done. Sources of measurement error that the authors should account for include: We agree that it is important to demonstrate accuracy of the quantitation method. The original smiFISH publication (PMID: 27599845) thoroughly assesses the accuracy of smiFISH detection in cell culture, so we did not originally consider it necessary to repeat validations of the smiFISH technique. However, since we are applying smiFISH to whole embryos, and using a different image analysis software (Imaris), we recognize that separate validations of quantification accuracy are important and will strengthen the work. We have therefore performed additional experimental work to assess the accuracy of our quantitation in different ways, and these validations are provided in a new Supplementary Figure 2.
While our numbers are lower than those reported in PMID: 23953111, our range of maxima are consistent with several other studies that quantify mRNAs of early patterning genes. Boettiger and Levine 2013 quantify snail mRNA/cell and report maxima ranging from ~150 to 300 per cell across multiple embryos at nc14 (PMID 23352665). Hoppe et al. 2020 quantify u-shaped (ush) and hindsight (hnt) mRNA/cell and report maxima ranging from ~80-280 (ush) and ~120-240 (hnt) per cell across multiple nc14 embryos (PMID 32758422). Bothma et al. 2014 report a maximum of ~250 eve mRNA/nucleus in eve stripe 2 at nc14 (PMID 24994903).
1 i) Detection failure. The authors should estimate the extent of under-counting using two sets of probes labeled with different colors and directed against the same mRNA. Moreover, at a glance the failure rate appears quite high in the case of nos mRNA, single mRNAs of which are densely distributed through the entire oocyte (PMID: 9895314, 25848747). In Supplementary Figure 2 a & b, we present this suggested two colour test for Kr, and confirm strong agreement in mRNA/cell detected between the two probe sets, indicating minimal under-counting due to detection failure. This two colour test is a useful addition to the methodology pipeline we present, allowing detection failure to be measured and factored into quantifications where absolute mRNA numbers are critical.
For changes, please see Supplementary Figure 2 and lines 205-208.
We do not agree that our failure rate for nos detection is high; instead we believe that it is an issue of developmental timing. We detect abundant nos mRNAs in the nurse cells, significant accumulation at the anterior oocyte edge, and marginal accumulation at the posterior pole, which is expected for a stage 10 egg chamber, and is consistent with previous studies. For example, one of the studies indicated (PMID: 9895314, Figure 7) and another (PMID: 7515724 Figure 2) use less sensitive classic in-situ, and failed to detect any posterior nos mRNA at stage 10, with first detection observed at stage 12, and the majority of accumulation at stages 13 and 14 (PMID: 7515724). Similarly, a live imaging study first detected marginal posterior nos mRNA accumulation at stage 11, again with the majority of accumulation occurring later at stages 12 and 13 (PMID: 12867026). The publication indicated (PMID: 9895314) states the following: "…endogenous nos RNA are highly abundant in the nurse cells at stage 10 (asterisk). This RNA cannot be detected in the oocyte until after stage 10 when the nurse cells empty their contents into the oocyte (not shown)." This publication does not show that nos mRNA is densely distributed throughout the oocyte; it shows that it is distributed throughout embryos. The other paper indicated (PMID 25848747) reports distribution of nos mRNAs throughout embryos (with accumulation at the posterior pole), but does not show dense distribution throughout the stage 10 oocyte; it shows increasing accumulation of nos RNAs at the posterior end of the oocyte, from stage 10 to stage 13. The paper states: 'Synthesized by the ovarian nurse cells, nos enters the oocyte en masse when the nurse cells "dump" their contents at the end of stage 10 and becomes distributed throughout the oocyte by diffusion and the concurrent streaming of the oocyte cytoplasm (ooplasm)'. It also states '…we observed continuous accumulation of nos in granules at the posterior of the oocyte beginning at stage 10 of oogenesis. nos-containing granules increase in both number and mRNA content up until the oocyte reaches maturity at stage 14'. We therefore consider that our early stage 10 egg chamber staining is consistent with previous publications, and captures the earliest beginnings of posterior nos accumulation.
1 ii) mRNA crowding. mRNAs will be undercounted at a rate that increases with increasing density. This is not accounted for in the manuscript, but has a major impact on any conclusions about variability. The authors do appear to be undercounting the gap gene mRNAs. For example, Kr has a maximum density of around 1000 mRNA per cell, where the basal extent of the cell is taken at 12 microns below the cortex (PMID: 23953111). We have directly addressed mRNA crowding experimentally in Supplementary Figure 2 c & d, by imaging Kr mRNAs with 40X objective then with 100X objective, through identical z depths in the same embryo, to determine whether at lower magnification spots are overcrowded and consequently undercounted. Visual inspection of Imaris spot detection shows successful separation of crowded touching spots at both magnifications. We found a slight increase in detection efficiency at 100X compared with 40X, (Supplementary Figure 2d, 40X mean=54, max=136, 100X mean=66, max=142, non-zero cells only). This difference is modest and cannot account for the difference between our numbers and the maximum 1000 mRNA per cell stated.
We further considered whether differences in either embryo age, the depth of imaging, or embryo to embryo biological variability, may account for the difference between our Kr mRNA numbers and the maximum ~1000/cell figure. Using identical settings, we quantified Kr mRNAs in an additional 12 blastoderm embryos, imaged beyond the basal limit of membrane ingression to z-depths of up to 24um, starting just above the most apical mRNAs and ending just below the most basal ones. This captured every mRNA through the cytoplasmic depth, in a variety of different aged embryos, as measured by the degree of membrane ingression. Across the 12 embryos, maximum mRNA/cell ranged from 123 to 337. Since deep imaging was used to ensure inclusion of every mRNA, it is clear that z-stack depth is not responsible for the difference between our Kr counts and the previously reported 1000 mRNA/cell maximum. The substantial biological variability we find between embryos imaged and quantified identically, is consistent with the wild-type embryo to embryo variability found for snail mRNAs in Boettiger and Levine 2013 (PMID 23352665). This biological variability may account for some of the difference up to ~200 mRNA/cell, but still cannot reconcile the full difference with the 1000 mRNA/cell maximum figure.
1 iii) Tissue deformation upon processing and mounting. The fixing and mounting procedure can alter the apparent density of mRNAs and nuclei and can be a dominant source of measurement error (PMID: 23953111, Fig. S2). In the current manuscript, such deformation might well contribute to measurement error, given the clear spatial correlations in the variability shown in Fig. 5G. For example, Kr variability approaches the Poisson limit in the most rapidly expressing regions of the embryo, i.e. Fano factor = 1 (PMID: 23953111, Fig. S6); however, in the current manuscript, the Fano factor clearly exceeds 1 in this region, and the Fano factor exhibits spatial correlations around 4-5 nuclear diameters. This is consistent with local tissue deformations impacting the counts per cell. We consider that the spatial correlations in variability arise from shared positional identity of the cell in the context of the expression pattern. Cells in similar regions have similar levels and combinations of upstream transcription factors present, and consequently would be expected to show similar behaviour in terms of expression variability. This seems evident from the fact that spatial patterns in variability closely match the gene expression patterns themselves. For this spatial correlation in variability to be explained by tissue deformation, the physical deformations would inexplicably need to occur in patterns matching the gene expression domains.
2) As a final point, the usefulness of the two new measurements of variability is not immediately clear. As a means of assessing variability between cells, it might be more informative to assign each cell a value based on how many standard deviations away from the "neighborhood mean count" that cell is found, and a p value to say whether that distance is significantly larger than expected. None of the measurements presented (including the Fano factor) help the reader understand whether the mRNA content of a given cell is significantly larger (or smaller) than expected. We have tested this suggestion of expressing the variability as number of standard deviations the cell mRNA number is from the 'neighbourhood mean count', which is the Z-score. The attached variability measures spreadsheet below shows that the Z-score performs similarly to our numerical variability (NV), returning identical values for regions 1, 3 & 4, which have identical variability in absolute mRNA number, and a lower score for region 2. However, like NV, the Z-score does not differentiate the proportional difference between regions 3 and 4, whereas our proportional variability (PV) measure does reflect this difference. The Z-score does potentially have the advantage of assigning a p value to the score, however, this relies on the assumption that the data is normally distributed. For many cells, the neighbour number is either too low to test for normality, or the neighbour counts are non-normally distributed (for example, giant, normality test error: 36% of neighbour groups, non-normal: 10% of neighbour groups, normal: 53% of neighbour groups). Non-normality may be expected for genes expressed in discrete patterns, where some neighbour groups will lie on expression boundaries. The Z-score may be a useful alternative to NV, in applications where mRNA is being compared among larger groups of cells within a given area of tissue, as opposed to just touching neighbours, since a greater neighbour number would allow the assumption of a normal distribution to be tested. A normally distributed neighbour group may also be more likely for ubiquitously expressed genes, than genes expressed in discrete domains. The measures PV and NV do faithfully reflect absolute numerical variability and proportional variability of a given cell, with respect to its direct neighbours. To interpret the significance of these scores, statistics can subsequently be performed on the raw NV and PV values, depending the biological question. For example, cells with a NV or PV score significantly higher than the population mean score can be identified, to highlight the most 'abnormally variable' cells within the embryo. Additional text discussing this has now been included in the discussion section. and we also increase the number and duration of washes, to account both for the fact that embryos and 94 tissues are thicker and more complex than cells, and that complete removal of solutions between 95 washes is less feasible. 96 97 An identical protocol was used for all species and tissues, the only minor differences were in the 98 sample fixation method, and the final mounting (detailed in online methods). Across species, we 99 stained for the same two genes, even-skipped (eve) in early embryos, and engrailed (en) in later 100 embryos (Figure 1). Single mRNA resolution was achieved in embryos of all species, with very low 101 non-specific background, evident from the regions outside of stripes that are devoid of signal. In both 102 Drosophila species (diverged ~50 million years ago), eve is expressed in seven stripes. Classically, 103 eve stripes detected with normal ISH or immuno-staining tend to have a discrete appearance 19-21 , but 104 here magnified panels showing the regions in between stripes at single molecule resolution reveal that 105 eve is expressed throughout the entire region enclosed by the seven stripes. The stripes represent 106 waves of alternating high and low expression. In accordance with previous observations, eve shows 107 different patterns in Tribolium, Parhyale and Nasonia, which may reflect distinct upstream regulatory 108 inputs, and the differing modes of segmentation in these species compared with Drosophila 22-24 . 109 110 Imaginal discs were stained for en and wingless (wg) (Figure 1). Both genes have regions within the 111 wing disc with markedly different expression levels (en, magnified panel), and both sharp and diffuse 112 boundaries (wg, magnified panels), presumably arising from regional differences in transcriptional 113 regulation. Ovaries were stained for bicoid (bcd) and nanos (nos) RNAs (Figure 1) To view eight genes together, optimal excitation and collection from each fluorophore is essential to 141 avoid bleed-through between channels. This image was acquired using a Leica SP8 confocal with 142 white light laser, tunable to each specific excitation wavelength. Narrow collection windows of 143 ~20nm were set, corresponding to emission peaks of each fluorophore. Line averaging 16x, and high 144 resolution 4096 x 4096 format enabled single RNAs to be resolved. Despite settings that minimized 145 bleed-through, some still persisted between certain channels, so the image was spectrally unmixed 146 following acquisition. To avoid the need for spectral unmixing, a six colour stain using DAPI, 147 AlexaFluor 488, Quasar 570, CalFluor 610, Quasar 670 and Quasar 705 is ideal.  Figure  183 3d (cells with zero RNA excluded). For cultured cells, the shape of histogram distributions of this 184 type has been used to make inferences about promoter behaviour 33 , based on an assumption that the 185 promoter in each cell has a common behaviour shared throughout the cell population, leading to a 186 certain signature evident from the distribution. For example, a promoter with high bursts of 187 transcription followed by long off periods is expected to produce a distribution with a long tail to high 188 values 33 , similar to that found here for eve. However, it is important to note that such inferences 189 cannot be made for non-ubiquitously expressed genes in whole embryos and tissues, as the with densely expressed mRNAs, imaging at 100X may therefore be advantageous. However, for 217 lower expressed genes, for nascent transcription site analysis, or where whole embryo data is more 218 critical than absolute numbers, then lower magnification 40X imaging is preferable.  analyzing cells together as a single pool may not be informative. Single-cell variability is better 237 addressed by comparing variability between a cell and its immediate neighbours. We define 238 immediately neighbouring cells as those that directly share a membrane border in the Spectrin channel. 239 Using a 2D segmentation plane from the embryo shown in Figure 3b, immediate neighbour number of 240 each cell was manually counted in half of the embryo (Figure 4a), giving a range of 2-8, with the 241 frequency distribution shown in Figure 4d. To automate neighbour detection, the spots function in 242 Imaris was employed with a low threshold to detect ~80,000 spots in the Spectrin channel, providing 243 a dense representation of the membrane as spots (Figure 4b). A custom R code was developed that 244 uses both Spectrin spot coordinates and the cell ID to which each spot belongs, to construct 3D 245 polygons that closely matched the original segmentation ( Figure 3C). Each polygon was slightly The strength of this polygon method over a more simplistic neighbour search radius approach is that 255 the polygon expansion factor is a small fraction (~10%) of the average cell diameter, therefore serves 256 to exclusively identify only directly bordering cells, even in tissues with different cell sizes and 257 shapes. For certain studies, comparing cells within a given area of tissue may be more meaningful 258 than limiting to only touching cells. To provide this as an alternative option for the analysis pipeline, 259 we also developed code to calculate the Euclidian distance between the centre point coordinates of 260 every cell, and then filter by a specified radius, to return for each cell a list of all neighbours within 261 that radius. All code is provided via the link under 'code availability'. 262 263

New measures to capture numerical and proportional single-cell variability 264
The smiFISH panels in Figure 5b show eve-expressing cells from an early germband Parhyale 265 embryo, and highlight how a single cell can have a markedly different expression level from its 266 immediately adjoining neighbours. Such single-cell variability within a population has been shown to 267 have important biological relevance, for example in fate determination 11 , cell behaviour 9 , and disease 12 . 268 Fano factor is a commonly used measure of local mRNA variability, and is calculated as The centre cells in panels c and e are equally different from their neighbours, both proportionately and 274 numerically, whereas the centre cell in panel d is not very variable, being the same as all but one of its 275 neighbours. However, Fano factor fails to distinguish any difference between c and d (both 6.11) and 276 incorrectly finds e much more variable than c (42.37 vs 6.11). To overcome this limitation, we 277 devised two alternative variability measures, the local numerical cell variability (NV), and the local 278 proportional cell variability (PV) (Figure 5a). Both measures express how different an individual cell 279 is from its immediate neighbours. NV is normalized by the maximum mRNA per cell for the whole 280 cell population, therefore a high NV value highlights cells whose mRNA difference from their 281 immediate neighbours is numerically large in terms of the maximum level at which that gene can be 282 expressed. PV is normalized by the maximum just for the neighbour group, and so high PV does not 283 necessarily mean a large difference in actual mRNA number, just that the cell has a high proportional 284 difference from its neighbours. Both measures return values between 0 (no variability) and 1 285 (maximum variability). In the scenarios shown in Figure 5 c-f, 550 is used as the population 286 maximum. Both NV and PV find the centre cells in scenarios c and e to be equally variable, and d to 287 be less variable. Importantly, NV is the same between scenarios c, e and f, since the numerical RNA 288 difference between the centre cell and each neighbour is the same, whereas PV finds scenarios c and e 289 to be proportionally more variable than f. 290 291 Fano factor, NV and PV were calculated for the five genes shown in Figure 3, using neighbours 292 defined by the automated neighbour detection method (Figure 4). Variability scores are displayed as 293 heatmaps (Figure 5g). Cells outside of expression domains that have a single mRNA, surrounded only 294 by non-expressing neighbours, have the maximum PV score of 1. While this is correct, we were more 295 interested to highlight cells that had high PV within actual expression domains. Therefore when Whole genome DNA and RNA sequencing is becoming increasingly feasible and affordable, and 311 consequently the number of non-model organisms with whole or partial genome sequence is rapidly 312 growing. Since only ~1kb of gene sequence is required to design a probe set, smFISH can be applied 313 with ease to non-model species, revealing both expression patterns and levels. Here we have tested 314 smiFISH, with modifications, across a range of arthropod species and sample types, and found that it 315 enabled single mRNA visualization with consistency and high specificity. We also combined 316 smiFISH with subsequent membrane immunofluorescence, allowing whole embryo single-cell 317 segmentation. The anti Drosophila alpha-Spectrin antibody used did not work in the non-Drosophilid 318 species tested, so appropriate species-specific membrane antibodies are required for use in different 319 organisms. 320 321 smiFISH makes multiplexing simple and flexible, and therefore imaging becomes the limitation on 322 how many genes can be viewed together. Using an imaging strategy to optimize fluorophore 323 excitation and capture of emission peaks, we could image nine different channels simultaneously 324 (with spectral unmixing), or six channels without unmixing. The capacity to image more genes 325 simultaneously is advantageous as it allows more potentially interacting genes to be studied within the 326 same cells, thus eliminating error due to sample variability. 327 328 A major strength of smFISH is that position of the cell within the sample is preserved, which allows 329 variability to be analysed on a cell by cell basis. We compared a commonly used measure of cell 330 variability, the Fano factor, with two alternative measures termed NV and PV, that were devised to 331 better highlight individual cell variability. Each measure has its own strengths and limitations, and is relevant for studying diverse aspects of expression analysis, and we anticipate that the detailed 360 multi-colour imaging strategy provided here will prove valuable for analysis of gene networks. 361 Finally, it is our view that methods to appropriately analyze spatial cell to cell variability will yield a 362 new level of information critical to understanding how individual cell behaviors lead to biological 363 outcomes.     CalFluor 610, imaged with 40X objective, beyond the basal limit of membrane ingression through 120 slices with z step size of 200nm, to capture every Kr mRNA in the cytoplasmic depth. f) Slice by slice profile of sum greyscale intensity across a region of interest (ROI) through 120 z-slices. The trough between slices ~40-60 corresponds to the nucleus. g) Spectrin membrane staining at a cross sectional plane shows differing degrees of membrane ingression between stage 5 blastoderm embryos, used as a measure of embryo age. h) smiFISH of Kr mRNAs using CalFluor 610, in 12 blastoderm embryos imaged with 40X objective, z step size of 200nm, through total z-depths ranging from 20μm to 24μm, set to extend from above the apical extent to below the basal extent of Kr mRNA spots. For cell segmentation, Spectrin membrane staining was extrapolated beyond the basal ingression limit, to the full stack depth. Identical Imaris spot detection settings were used across all embryos. The plot shows Kr mRNA/cell across the 12 blastoderm embryos arranged in ascending age order, as measured

Figure Species
Sample type Gene Fluorophore