Development of a pipeline for automated, high-throughput analysis of paraspeckle proteins reveals specific roles for importin α proteins

We developed a large-scale, unbiased analysis method to measure how functional variations in importin (IMP) α2, IMPα4 and IMPα6 each influence PSPC1 and SFPQ nuclear accumulation and their localization to paraspeckles. This addresses the hypothesis that individual IMP protein activities determine cargo nuclear access to influence cell fate outcomes. We previously demonstrated that modulating IMPα2 levels alters paraspeckle protein 1 (PSPC1) nuclear accumulation and affects its localization into a subnuclear domain that affects RNA metabolism and cell survival, the paraspeckle. An automated, high throughput, image analysis pipeline with customisable outputs was created using Imaris software coupled with Python and R scripts; this allowed non-subjective identification of nuclear foci, nuclei and cells. HeLa cells transfected to express exogenous full-length and transport-deficient IMPs were examined using SFPQ and PSPC1 as paraspeckle markers. Thousands of cells and >100,000 nuclear foci were analysed in samples with modulated IMPα functionality. This analysis scale enabled discrimination of significant differences between samples where paraspeckles inherently display broad biological variability. The relative abundance of paraspeckle cargo protein(s) and individual IMPs each influenced nuclear foci numbers and size. This method provides a generalizable high throughput analysis platform for investigating how regulated nuclear protein transport controls cellular activities.

Figure 1. Overview of experimental and analytical approaches used to identify changes to subnuclear foci in response to modulating the cell's nuclear transport capacity. Using either transient transfection with plasmids encoding GFP-tagged IMPα2/α4/α6 variants or siRNA knockdown of IMPα2/α4, the capacity of IMPαs to modulate delivery of PSPC1/SFPQ into the nucleus/paraspeckles was investigated. All images are Z-series captured via confocal laser scanning microscopy; scale bars represent 20 μm. (A) GFP-tagged IMPα isoforms and their functional properties. Binding is indicated as true (✓) or false (⨯), with an indication of binding strength (+/−).
[...] either full length or truncated ΔIBB variants (summarized in Fig. 1A). Transient over-expression of a GFP-tagged full-length IMPα protein will increase nuclear accumulation of its cargoes. In contrast, IMPαΔIBB isoforms, which lack the importin beta binding (IBB) domain, exhibit a dominant negative effect on cargo accumulation because they still bind cargo proteins but cannot bind IMPβ1 to form a functional transport complex [55]; the resulting competitive binding will diminish cargo availability for binding endogenous IMPα and thereby reduce cargo nuclear accumulation. For IMPα2, an additional control construct containing two point mutations in the NLS binding groove (lysine replacement at aa192 and arginine at aa396) was used (GFP-IMPα2-ED). These mutations significantly reduce cargo binding [55,56], but have little or no effect on endogenous IMPα2 cargo nuclear transport; this isoform serves as a control for non-nuclear transport-related effects arising from GFP-IMPα2 over-expression. The binding capacity and predicted nuclear transport outcomes from transfection with each IMPα isoform construct are summarized in Fig. 1A. Other control samples included GFP alone, mock-transfected and not-transfected cells. The second approach used to modulate IMPα levels in HeLa cells was siRNA knockdown, targeting IMPα2 and IMPα4.
In all experiments, tiled confocal z-series images were collected for 3D visualisation and analysis, allowing the full volume of numerous (between 143 and 813) individual cells to be analysed in each sample (pipeline outlined in Fig. 1B). Briefly, using Imaris software, cells, nuclei and PSPC1/SFPQ nuclear foci were identified. Results were exported from Imaris in CSV format, processed and compiled into a compatible format using a series of custom Python scripts, and then imported into the 'R environment for statistical computing' for analysis. This approach facilitated analysis of thousands of cells and quantification of hundreds of thousands of nuclear foci in a consistent and non-subjective manner. All raw data exported from Imaris, along with the custom Python, R and shell scripts which compile and analyse these data, are provided in Supplementary Dataset SD1.
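The compilation step described above can be illustrated with a minimal sketch. It is written in current Python 3 and is not the authors' code (their scripts are in Supplementary Dataset SD1); the column names `CellID` and `Value`, and the assumption that each exported file holds a single statistic, are hypothetical and do not describe the real Imaris export layout.

```python
import csv
import io


def compile_exports(exports):
    """Merge per-statistic CSV exports into one record per cell.

    `exports` maps a statistic name to CSV text with the hypothetical
    columns `CellID` and `Value` (an assumed layout for illustration).
    """
    cells = {}
    for stat_name, csv_text in exports.items():
        for row in csv.DictReader(io.StringIO(csv_text)):
            record = cells.setdefault(row["CellID"], {"CellID": row["CellID"]})
            record[stat_name] = float(row["Value"])
    # Sort by cell ID (as text) so the output order is reproducible.
    return [cells[key] for key in sorted(cells)]


# Illustrative data only, not values from the study.
exports = {
    "nucleus_mean_gfp": "CellID,Value\n1,12.5\n2,80.0\n",
    "foci_count": "CellID,Value\n1,0\n2,7\n",
}
table = compile_exports(exports)
```

Keeping one record per cell, with every statistic attached to it, is what makes later per-cell filtering and per-cell statistics straightforward.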
Cell gating was initially set to capture a very low GFP signal level, corresponding to the auto-fluorescence signal level. In this way, the cytoplasm and nucleus of every cell were identified, regardless of whether the cell was transfected or not. This approach removed the need to use an additional cell body marker, maximizing available fluorescence channels and minimising photo-damage by reducing laser exposure. To ensure that only transfected cells were analysed, a final mean GFP intensity threshold per cell (a higher GFP expression level) was later applied to the data using the R environment for statistical computing (Fig. 1Bx). This allowed the GFP thresholding to be applied to all test and control samples simultaneously, with adjustments made to identify a threshold where the detected cell number approached zero in the control samples. GFP thresholding was also selectively withheld from the control samples to extend analyses to non-transfected and mock-transfected cells using the same base parameters for cell, nuclei and foci detection. Overall, this analysis approach enabled accurate cytoplasmic identification of cells that had relatively low GFP-IMPα expression levels, achieving comprehensive measurements for all cells within each sample.
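The thresholding logic can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: the candidate thresholds, the tolerated fraction of control cells (`max_control_fraction`) and the field name `mean_gfp` are all hypothetical.

```python
def choose_gfp_threshold(control_intensities, candidates, max_control_fraction=0.01):
    """Return the lowest candidate threshold at which the fraction of
    control (non-GFP) cells still passing is at or below the tolerance."""
    for threshold in sorted(candidates):
        passing = sum(1 for value in control_intensities if value >= threshold)
        if passing / len(control_intensities) <= max_control_fraction:
            return threshold
    return None  # no candidate excluded enough control cells


def gate_transfected(cells, threshold):
    """Keep only cells whose mean GFP intensity reaches the threshold."""
    return [cell for cell in cells if cell["mean_gfp"] >= threshold]


# Illustrative intensities only (arbitrary units).
controls = [3.0, 4.5, 5.0, 6.2, 4.1]
threshold = choose_gfp_threshold(controls, candidates=[2, 4, 6, 8, 10])
sample = [{"id": "a", "mean_gfp": 5.5}, {"id": "b", "mean_gfp": 42.0}]
transfected = gate_transfected(sample, threshold)
```

Applying the gate after export, rather than during segmentation, means the same segmented objects can be reused with the gate switched off for non-transfected and mock-transfected controls.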
The nuclear detection threshold was set to ensure the nucleus was identified even in cells with a low level nuclear marker signal. Although this could slightly inflate the detected volume of each nucleus, this approach was chosen to avoid missing parts of some nuclei which could underestimate nuclear foci numbers. Using the R environment for statistical computing, nuclei on the edge of an image in the X, Y or Z image planes, and therefore likely to be incomplete nuclei, were excluded from the data sets. Subsequently, all cells without nuclei were also removed from the data sets (see Fig. 1Bx).
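A minimal sketch of this exclusion step is given below. It assumes each nucleus is described by a bounding box in voxel indices (`min` and `max` per axis) and that a cell without a detected nucleus carries `None`; these field names are hypothetical and are not taken from the Imaris export.

```python
def touches_edge(bbox_min, bbox_max, image_size):
    """True if the bounding box reaches the first or last voxel on any axis."""
    return any(
        low <= 0 or high >= size - 1
        for low, high, size in zip(bbox_min, bbox_max, image_size)
    )


def keep_complete_cells(cells, image_size):
    """Drop cells without a nucleus and cells whose nucleus touches an image edge."""
    kept = []
    for cell in cells:
        nucleus = cell.get("nucleus")
        if nucleus is None:
            continue  # no nucleus detected for this cell
        if touches_edge(nucleus["min"], nucleus["max"], image_size):
            continue  # nucleus is probably truncated by the image border
        kept.append(cell)
    return kept


# Illustrative image of 100 x 100 x 20 voxels (X, Y, Z).
image_size = (100, 100, 20)
cells = [
    {"id": "A", "nucleus": {"min": (10, 10, 2), "max": (30, 30, 10)}},
    {"id": "B", "nucleus": {"min": (0, 40, 3), "max": (15, 60, 9)}},
    {"id": "C", "nucleus": None},
    {"id": "D", "nucleus": {"min": (50, 50, 12), "max": (70, 70, 19)}},
]
complete = keep_complete_cells(cells, image_size)
```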
In the GFP-IMPα transient transfection study, the non-transfected (Not-Trans-C), mock-transfected (Mock-C) and GFP-transfected (GFP) control sample parameters for nuclear foci varied (Supplementary Tables S5, S6 and S7). We hypothesize that these differences reflect the physiological state of individual cells from each group in regard to cell cycle or local microenvironment differences at the time of sampling. This would be consistent with reported paraspeckle roles, but spotlights the paucity of knowledge about the inherent variability of paraspeckles within a population, and whether these are dynamically modulated within a cell in response to particular conditions. These results lead us to conclude that comparing outcomes within a single IMPα subtype, in which either cargo or IMPβ1 binding has been manipulated, is appropriate, while comparing between different IMPα subtypes should be undertaken cautiously, and with this information in mind.

Functional IMPα protein levels determine endogenous PSPC1 localization to paraspeckles.
Figure 1 (continued). (A, continued) Binding strength is indicated as (+/−). (B) Overview of image analysis pipeline. From this example merged z-series confocal image (Bi), the immunofluorescent signal for endogenous PSPC1 (Bii) was used to identify foci (Bvii), the nuclear marker DAPI (Biii) was used to identify the nuclei (Bvi) and the immunofluorescent signal for endogenous IMPα2 (Biv) was used to identify the cell body (Bv). The other options for paraspeckle marker, nuclear stain and cell or transfection marker for the GFP-tagged IMPα experiments (GFP-Trans) or siRNA knockdown experiments (siRNA KDs) are listed. The full 3D reconstruction of cells, their nuclei and foci (Bviii) is shown for this example image, but it should be noted that each sample in an experiment has 49+ of these images. Data about these cells, their nuclei and foci were exported from Imaris in CSV file format and reorganised ready for statistical analysis using custom Python scripts (Bix). Additional data manipulations were performed on a per experiment basis (as detailed for each) using custom R scripts (Bx), before final statistical analysis and outputs were generated using the R software environment for statistical computing (Bxi). The Python logo is a trademark of the Python Software Foundation (https://www.python.org; v2.7). The R logo (https://www.r-project.org/logo/) is licensed under CC BY-SA 4.0; the license terms can be found at https://creativecommons.org/licenses/by-sa/4.0/.
To assess the accuracy of the automated analysis pipeline, we initially compared its outcomes with those from our previous analysis using manual selection of individual cells [49]. HeLa cells were transiently transfected to express GFP-tagged IMPα2 or IMPα6, as each binds PSPC1 in a yeast two hybrid system and in an ELISA-based importin binding assay [49].
During cell/nucleus/foci detection in Imaris, nuclear PSPC1 foci were identified using the immunofluorescent signal for endogenous PSPC1 with parameters matching our previous study [49] to allow a direct comparison. All IMPα2 samples produced results similar to those previously reported [49], with GFP-IMPα2-ED control values intermediate to those obtained with the other two IMPα2 isoforms (summary comparison in Fig. 2; detailed comparison in Supplementary Table S1). This congruency demonstrates that automated detection of cells and nuclei is of comparable accuracy to the laborious manual cell image cropping. For all paraspeckle-related endpoints, all but one of the GFP-IMPα2ΔIBB sample values were significantly reduced relative to the GFP-IMPα2-FL values (Table 1 and Fig. 3A). The one exception was the geometric mean (GM) PSPC1 voxel intensity (per focus), which was not significantly reduced (Table 1L).
To interrogate nuclear accumulation, the mean of the fluorescent signal in the nucleus (Fn) and cytoplasm (Fc) was converted to a ratio (Fn/c) for each cell [44,57]. Mean PSPC1 Fn/c values for all IMPα2 samples increased with increasing IMPα2 functionality as expected (Table 1E and Fig. 3Aii; ΔIBB [lowest function]: 2.04; ED: 2.08; FL [highest function]: 2.69). The other GFP-IMPα2-FL sample parameters were unchanged or slightly increased compared with those from the GFP-IMPα2-ED control. The only significantly different result was the PSPC1 Fn/c value, indicating that the FL isoform significantly enhances PSPC1 nuclear accumulation.
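The Fn/c calculation can be sketched as below. This is a minimal illustration assuming the voxel intensities of the segmented nucleus and cytoplasm are available as plain lists; the values are invented, and the population summary uses a geometric mean because that is the summary statistic named in the table captions.

```python
from statistics import fmean, geometric_mean


def fn_c_ratio(nuclear_voxels, cytoplasmic_voxels):
    """Ratio of mean nuclear to mean cytoplasmic fluorescence for one cell."""
    nuclear_mean = fmean(nuclear_voxels)
    cytoplasmic_mean = fmean(cytoplasmic_voxels)
    if cytoplasmic_mean <= 0:
        raise ValueError("cytoplasmic mean must be positive to form a ratio")
    return nuclear_mean / cytoplasmic_mean


# Illustrative voxel intensities (arbitrary units), not data from the study.
cells = [
    {"nucleus": [200.0, 220.0, 180.0], "cytoplasm": [100.0, 90.0, 110.0]},
    {"nucleus": [300.0, 300.0], "cytoplasm": [100.0, 100.0]},
]
ratios = [fn_c_ratio(cell["nucleus"], cell["cytoplasm"]) for cell in cells]
population_gm = geometric_mean(ratios)
```

Because each mean is taken over the whole segmented compartment, bright foci inside the nucleus contribute to Fn in proportion to their volume rather than dominating a hand-drawn region.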
Our previous demonstration of IMPα6 binding to PSPC1 in yeast two hybrid and ELISA assays was extended here by measuring paraspeckle numbers and size in HeLa cells relative to IMPα6 functionality. Significant differences in several parameters were recorded when comparing the FL and ΔIBB variants of IMPα6. The ΔIBB variant exhibited a lower proportion of foci-positive cells, reduced nuclear accumulation of PSPC1 (PSPC1 Fn/c), a lower total volume of foci per cell and a reduction in the total signal from PSPC1 foci per cell when compared to the FL isoform (Table 1 and Fig. 3A). Although the number of foci measured per cell was reduced in the ΔIBB sample (FL: 6.00; ΔIBB: 4.41), this outcome did not reach significance, which most likely reflects the low proportion of cells containing nuclear foci in these samples (54.8% in FL [n = 91]; 30.8% in ΔIBB [n = 66]). This finding indicates that changing levels of IMPα6 will also influence PSPC1 nuclear accumulation and the characteristics of PSPC1-positive nuclear foci, as recorded for IMPα2.
These data demonstrate that IMPα2 and IMPα6 can each modulate endogenous PSPC1 nuclear accumulation and localization to paraspeckles. In addition, the direct comparison to our previous work with IMPα2 validates the automated analysis pipeline as an effective tool for detecting these outcomes.

Functional IMPα protein levels modulate endogenous SFPQ localization to paraspeckles.
To determine if changes in IMPα expression levels that altered PSPC1 nuclear accumulation and localization into paraspeckles also affected another core DBHS paraspeckle marker, we examined endogenous SFPQ in HeLa cells transiently transfected to express GFP-tagged IMPα constructs. IMPα2 variants influenced SFPQ localization to nuclear foci in a manner similar to that recorded for PSPC1 localization (Table 2 and Fig. 3B). The percentage of cells containing SFPQ nuclear foci was greatly increased in the IMPα2-FL group (83.9%), and slightly decreased in the IMPα2ΔIBB group (57.8%), compared to the IMPα2-ED control sample (58.7%); ED and ΔIBB values were each significantly different (p < 0.0001) from the FL outcome (Table 2D and Fig. 3Bi). The Fn/c for SFPQ was significantly reduced (p < 0.0001) in the ΔIBB (3.21) and ED (2.80) groups in comparison to IMPα2-FL (4.75); the ratios relative to FL (set to 1.0) are 0.675 for ΔIBB and 0.590 for ED (Table 2E and Fig. 3Bii). No other paraspeckle parameters displayed statistically significant differences. These outcomes suggest that SFPQ transport is affected by IMPα2 functionality, but its relationship to paraspeckles is not.
Table 1. Outcomes of modulating IMPα expression and transport function on endogenous PSPC1-positive nuclear foci. The analysed cell numbers for each GFP-tagged IMPα transfection group, the number of detected PSPC1-positive nuclear foci and proportion of cells determined to contain PSPC1 nuclear foci (detected by indirect PSPC1 immunofluorescence with an Alexa Fluor 546 [A546] secondary antibody) are presented. Samples were assessed on a per cell or per PSPC1 nuclear foci basis, with geometric means (GM) and 95% confidence intervals (95% CI) calculated. To determine significant differences between groups, a logistic regression (Lg Reg) model was used for PSPC1 foci positive/negative cells, linear regression (Ln Reg) models were used for per cell data and generalised estimating equations (GEE) were used for per PSPC1 nuclear foci data. Comparative significance values using IMPα-FL as the reference group (set at 1.000) are shown. Using Bonferroni correction, the significance threshold was reassigned from ≤ 0.05 to ≤ 0.008 (0.05 ÷ 6 experimental groups), with those outcomes below the threshold indicated (*). Further details are provided in Fig. 2A, with additional samples and analysis parameters included in Supplementary Tables S2 and S5.
Transfection with IMPα4 isoforms resulted in remarkable and significant differences between IMPα4-FL and IMPα4ΔIBB samples across the population, cell and individual foci parameters (Table 2). This demonstrates that IMPα4 functionality can determine SFPQ localization to nuclear foci. The IMPα6-FL group contained a significantly higher percentage of cells with nuclear foci (85.7%) than did the IMPα6ΔIBB group (41.2%; p < 0.0001; Table 2D and Fig. 3Bi). A significantly greater Fn/c per cell for SFPQ (FL: 5.32; ΔIBB: 2.66, p < 0.0001) and number of nuclear foci per cell (FL: 6.85; ΔIBB: 5.25, p = 0.0046) were measured within the IMPα6-FL group compared to the IMPα6ΔIBB group (Table 2E,H and Fig. 3Bi,Biii).
The absence of other statistically significant differences indicates that, while the number of paraspeckles per cell differs depending on IMPα6 functionality, the parameters of individual foci (volume and SFPQ intensity) do not.
These results show that changes in the functional levels of individual IMPαs influence multiple paraspeckle parameters, including the localization of specific, key components. Thus the relative intracellular abundance of individual importins, and their availability for cargo binding, will affect paraspeckle formation.
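The Bonferroni-adjusted significance thresholds quoted in the captions of Tables 1 and 2 (0.05 ÷ 6 and 0.05 ÷ 8 experimental groups) can be reproduced with a short sketch. The function names and the example p-values are illustrative assumptions and are not taken from the authors' R scripts.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def flag_significant(p_values, alpha, n_comparisons):
    """Map each outcome name to True if its p-value is at or below the threshold."""
    threshold = bonferroni_threshold(alpha, n_comparisons)
    return {name: p <= threshold for name, p in p_values.items()}


table1_threshold = bonferroni_threshold(0.05, 6)  # about 0.0083, reported as 0.008
table2_threshold = bonferroni_threshold(0.05, 8)  # 0.00625, reported as 0.0063

# Hypothetical p-values used only to show the flagging step.
flags = flag_significant({"outcome_a": 0.0046, "outcome_b": 0.0200}, 0.05, 8)
```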

Functional IMPα protein levels modulate exogenous dsRed2-PSPC1 localization to paraspeckles.
We predicted that changing levels of specific cargoes would also alter how IMPαs influence paraspeckle parameters. To test the impact of IMPα functionality when cargo is elevated, exogenous PSPC1 (DsRed2-PSPC1) and GFP-tagged IMPα constructs were co-transfected into HeLa cells.
A greater but not significantly different (p = 0.0614) proportion of cells contained PSPC1 foci in the IMPα2-FL (51.3%) compared to IMPα2-ED samples (43.7%), while this was significantly lower in the IMPα2ΔIBB group (38.2%; p = 0.0042, compared to FL; Table 3D and Fig. 3Ci). Only the DsRed2-PSPC1 Fn/c value was statistically significantly higher in the FL sample relative to the IMPα2-ED (p = 0.0001) and ΔIBB (p = 0.0019) groups (FL: 1.86; ΔIBB: 1.62; ED: 1.60; Table 3E and Fig. 3Cii). We interpret this as indicating that cells have an increased capacity for cargo transport (above endogenous levels) in the presence of increased levels of transport-competent IMPα2. A direct comparison of exogenous versus endogenous PSPC1 data is shown in Supplementary Table S1. As expected, samples containing exogenous PSPC1 have more and larger nuclear foci containing more PSPC1, relative to samples containing only endogenous PSPC1.
Analysis of IMPα4 variants revealed significant effects on several outcomes measured for exogenous PSPC1, but only when considered at the level of individual cells. The IMPα4-FL values were higher than ΔIBB levels for: percentage of cells with nuclear foci (FL: 59.6%; ΔIBB: 37%, p < 0.0001), number of foci per cell (FL: 14.94; ΔIBB: 7.78, p = 0.0001) and cumulative volume of foci (FL: 2.48; ΔIBB: 1.00, p = 0.0001). No significant reduction in DsRed2-PSPC1 Fn/c was recorded, which differs from the significant decreases observed with the transport-deficient isoforms of either IMPα2 or IMPα6. This suggests transport of exogenous PSPC1 is not regulated by IMPα4 levels, but that IMPα4 does influence PSPC1 localization into paraspeckles. This aligns with ELISA-based assays that measured IMPα4 binding to PSPC1 only at high IMPα4 concentrations, with weaker binding than was recorded for IMPα2 or IMPα6 [49].

Expression levels of IMPα2 or IMPα4 correlate with PSPC1 nucleocytoplasmic distribution.
Table 2. Outcomes of modulating IMPα expression and transport function on endogenous SFPQ-positive nuclear foci. The analysed cell numbers for each GFP-tagged IMPα transfection group, the number of detected SFPQ-positive nuclear foci and proportion of cells determined to contain SFPQ nuclear foci (detected by indirect SFPQ immunofluorescence with an Alexa Fluor 546 [A546] secondary antibody) are presented. Samples were assessed on a per cell or per SFPQ nuclear foci basis, with geometric means (GM) and 95% confidence intervals (95% CI) calculated. To determine significant differences between groups, a logistic regression (Lg Reg) model was used for SFPQ foci positive/negative cells, linear regression (Ln Reg) models were used for per cell data and generalised estimating equations (GEE) were used for per SFPQ nuclear foci data. Comparative significance values using IMPα-FL as the reference group (set at 1.000) are shown. Using Bonferroni correction, the significance threshold was reassigned from ≤ 0.05 to ≤ 0.0063 (0.05 ÷ 8 experimental groups), with those outcomes below the threshold indicated (*). Further details are provided in Fig. 2C, with additional samples and analysis parameters included in Supplementary Tables S3 and S6.
As an alternative approach to measuring the outcomes of modulating importin function, IMPα2 or IMPα4 knockdown by targeted siRNA was followed by simultaneous detection of either endogenous PSPC1 or SFPQ (each in duplicate experiments) and the relevant IMPα by indirect immunofluorescence (Supplementary Tables S8-S15). The mean intensity of IMPα2 per cell on a population basis was reduced across the four experimental samples by introduction of siRNA targeting IMPα2 when compared to the scrambled siRNA control.
The ratios of IMPα2 signal in siRNA-treated versus control samples were 0.43 and 0.59 for samples in which PSPC1 was detected, and 0.65 and 0.87 for SFPQ samples (calculated from values in Supplementary Tables S8-S15), demonstrating effective IMPα2 targeting by these siRNAs. This was confirmed by Western blot with cell lysates (data not shown). Although the attempted siRNA knockdown of IMPα4 was not consistently effective, these samples provided cell populations with a range of IMPα4 levels that were used in subsequent analyses. A faster approach for image acquisition was trialled, using a resonance scanner to capture confocal z-series images for these samples. While scanning times were reduced to approximately 25% (from 32 days with galvo-scan imaging to 8 days using the resonance scanner), reduced image quality made robust identification of foci impossible. As a consequence, outputs requiring foci detection are not presented or discussed for these experiments. Fn/c measurements, which require only detection of the cell nucleus and cytoplasm, were reliably determined from these images, allowing the influence of each IMPα on PSPC1 or SFPQ nuclear accumulation to be determined following resonance scanning. The PSPC1 Fn/c values in the IMPα2 siRNA knockdown samples were reduced to ~80% of their scrambled counterparts (0.78 and 0.80), with smaller reductions recorded for SFPQ (0.81 and 0.94; all calculated from values in Supplementary Tables S8-S15).
To explore the flexibility and power of creating hierarchically linked outputs that describe multiple aspects of each cell, a different analysis approach was applied. Instead of making comparisons between siRNA knockdown groups, these outputs based on fluorescence signal were considered across the whole population of cells, regardless of treatment group. Correlations between PSPC1 Fn/c and the IMPα signal within each cell are presented in Fig. 4. The upward sloping line in Fig. 4Ai indicates that, as IMPα2 levels increase within cells, PSPC1 Fn/c values also increase (correlation coefficients of 0.169 and 0.191 obtained for two independent experiments). The IMPα4 samples generated the opposite result, showing a reciprocal relationship between PSPC1 Fn/c and IMPα4 levels (downward sloping trend line, Fig. 4Bi; correlation coefficients of −0.294 and −0.350 for each of two experiments). These results provide an additional indication that IMPα2 is a nuclear transporter for endogenous PSPC1 in HeLa cells, and they suggest that IMPα4 is not. An alternative explanation for the lack of a positive correlation with IMPα4 levels may be that the expression across the cell population is relatively low and uniform, yielding a small dynamic range of signal. A similar analysis for SFPQ did not yield consistent results between replicates (Supplementary Figure S1); we interpret this to indicate IMPα2 and IMPα4 are not the only transporters for this paraspeckle protein because knockdown did not alter SFPQ distribution, while over-expression of IMPαs did (Fig. 3).
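A per-cell correlation of this kind can be sketched as below. The text does not state which correlation coefficient was calculated, so the use of Pearson's r here is an assumption, and the paired values are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical per-cell values: importin signal and cargo Fn/c.
importin_signal = [10.0, 20.0, 30.0, 40.0, 50.0]
cargo_fn_c = [1.8, 2.4, 2.0, 2.9, 2.6]
r_value = pearson_r(importin_signal, cargo_fn_c)
```

A positive coefficient corresponds to an upward sloping trend line and a negative coefficient to a downward sloping one; coefficients of the size reported here describe weak to moderate associations across a large, variable cell population.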
Finally, non- and mock-transfected cell groups alone were examined to study cell populations with a broad range of endogenous IMPα expression in the absence of any importin manipulations (Fig. 4Aii and Bii). The overall trends observed were similar to those obtained from the complete set of siRNA knockdown samples (Fig. 4Ai and Bi). This result confirms the value of previous studies, in which Fn/c values correlate with IMP-based transport outcomes. Most importantly, the result of analyzing cells which have not been transfected demonstrates how application of a high throughput image analysis system can yield sophisticated and functionally relevant outcomes using only indirect immunofluorescence to detect endogenous cargo(s) and IMP proteins. This provides an exciting avenue for studying nucleocytoplasmic transport within intact tissues, by examining developmental systems in the absence of manipulations.

Discussion
Development and application of an automated image analysis pipeline enabled the rigorous interrogation of how IMPα functionality affects paraspeckle number and size. Imaris software allowed non-subjective and relatively fast batch-processing of hundreds of 3D images to identify cells, nuclei and foci. This was linked into an analysis pipeline using Python and R scripts that extended the flexibility of data manipulation and provided access to a diversity of statistical analysis tools and graphical outputs. To also investigate nuclear transport of two key paraspeckle components, PSPC1 and SFPQ, distinct from their localization for nuclear foci formation, the pipeline calculated the ratio between the fluorescent nuclear and cytoplasmic signals for these proteins (Fn/c). Manual Fn/c measurement is very time-consuming, potentially subjective, and cannot be accurately applied to samples with uneven fluorescent signals that can arise from protein localization to subcellular structures, such as paraspeckles. Because our approach segments the entire nucleus and cytoplasm in 3D, brighter or darker structures in either compartment are accounted for in the measured means. Once appropriate cell/nucleus/vesicle detection parameters have been determined, many images/cells can be analysed easily, with high quality 3D image acquisition times then becoming the primary limiting factor for extending cell analysis numbers. At present, achieving the correct balance between lengthy imaging times and final image quality is a challenging aspect of such high throughput experiments. We trialled the use of resonance confocal scanning to accelerate image acquisition for the IMPα siRNA experiments. The associated loss of image quality made this approach inappropriate for quantification at the sub-organelle feature scale; however, analyses at the organelle feature scale (such as Fn/c outcomes) for whole cell populations provided meaningful measurement of endogenous nucleocytoplasmic transport activity.
Using an automated high-throughput image analysis pipeline can generate an overwhelming amount of data across multiple parameters in a relatively short time frame; sifting through this to identify the meaningful results can be both challenging and tedious. To help solve this problem we included principal component analysis (PCA) as part of the analysis pipeline. Through PCA, multiple parameters across groups of each experiment were condensed into two principal components, allowing a simple 2D relationship across all included parameters to be generated (Fig. 5). In addition to providing an accessible summary of the results, PCA also helps identify key outcomes during the initial stages of data analysis, thereby providing strategic directions for subsequent data interrogation.
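The idea behind this PCA step can be sketched with the standard library alone. The sketch computes only the first principal component by power iteration on the covariance matrix, which is enough to show how a single component can capture most of the variance; it is not the authors' implementation (their analysis was performed in R), and the group-by-parameter values are invented.

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centred rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in rows]
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    return cov, centred


def first_component(cov, iterations=500):
    """Estimate the leading eigenvector and eigenvalue by power iteration.

    Assumes the covariance matrix is non-zero and that the starting vector
    is not orthogonal to the leading eigenvector.
    """
    p = len(cov)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector.
    eigenvalue = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)
    )
    return vec, eigenvalue


def pc1_summary(rows):
    """Return PC1 scores per row and the fraction of variance PC1 explains."""
    cov, centred = covariance_matrix(rows)
    vec, eigenvalue = first_component(cov)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    scores = [sum(r[j] * vec[j] for j in range(len(vec))) for r in centred]
    return scores, eigenvalue / total_variance


# Hypothetical geometric means: one row per group, one column per parameter.
groups = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 1.0],
    [3.0, 5.9, 1.6],
    [4.0, 8.0, 2.0],
]
scores, explained = pc1_summary(groups)
```

In practice a library routine such as `prcomp` in R returns all components at once; the hand-written version is shown only to make the calculation explicit.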
The results in this study collectively demonstrate that modulating functional levels of IMPα2, IMPα4 and IMPα6 will impact nuclear import and delivery of PSPC1 and SFPQ to nuclear paraspeckles, and also provide evidence that the relative abundance of individual IMPαs and the cargo paraspeckle protein(s) influences these outcomes. In addition to reinforcing the knowledge that PSPC1 is a transport cargo of IMPα2 [49], the manipulation of IMPα6 functionality in HeLa cells provides new evidence that this importin can also effect nuclear transport of this core paraspeckle protein. The transport role of IMPα4 is less clear, because the Fn/c of over-expressed PSPC1 was not significantly different between samples co-transfected with either fully functional (FL) or transport-deficient (ΔIBB) isoforms. This contrasts with IMPα2 and IMPα6, for which the ΔIBB variants had lower nuclear-localized PSPC1 relative to FL counterparts. The endogenous SFPQ dataset (Fig. 3B and Table 2) differs, with IMPα2, IMPα4 and IMPα6 isoforms each influencing nuclear accumulation (Fn/c) and the percentage of foci-positive cells. Given that SFPQ has not been documented as an IMPα cargo, further investigation would be required to determine if these effects are a result of direct or indirect actions of IMPα. Importantly, all SFPQ paraspeckle parameters are significantly influenced by the IMPα4 isoform (but not by IMPα2 and IMPα6, for which no individual foci parameters were affected). This suggests a unique functional relationship exists between SFPQ and IMPα4 that facilitates SFPQ nuclear import and paraspeckle localization. IMPα4 over-expression does not increase exogenous PSPC1 nuclear accumulation, but increases DsRed2-PSPC1 nuclear foci numbers, indicative of higher paraspeckle numbers in each cell.
We hypothesize that IMPα4 over-expression mediates paraspeckle enlargement, potentially through the elevation of SFPQ in paraspeckles, thereby stabilizing NEAT1 RNA [17], and enabling higher levels of PSPC1 recruitment and accumulation into paraspeckles.
These findings will be of particular importance in developmental systems in which IMPα levels are dynamically regulated and paraspeckles or components thereof are also present. We previously showed that IMPα2 expression peaks in the embryonic mouse testis (E12.5) and the adult mouse testis at developmental stages overlapping with PSPC1 expression [49]. NEAT1 transcripts also increase during muscle differentiation from myoblasts into myotubes, when paraspeckles are documented as enlarged and present in greater numbers [12]. This observation is interesting given that regulated expression of the nuclear transport machinery has also been implicated in muscle differentiation, with increasing IMPα2 linked to myoblast proliferation, myocyte migration and myotube size [46].
IMPα2 expression has been identified as a prognostic marker of poor outcome in many cancers [58], including those in which the long non-coding paraspeckle RNA NEAT1 has been independently implicated, such as breast [59–62], colon [63], liver [64] and lung [65,66]. The link identified here between functional IMPα levels and the nuclear accumulation and localization of PSPC1 and SFPQ to paraspeckles leads us to speculate that enhanced paraspeckle formation and function may affect prognostic outcomes and provide therapeutic targets in oncology. The automated image analysis pipeline allowed for non-subjective, comprehensive examination of subcellular features on a mass scale, with the number of cells analysed extending far beyond what is feasible with manual analysis.
This adaptable, high-throughput analysis pipeline could be used to answer other research questions requiring quantification of subtle changes at subcellular levels or larger imaging scales. Within the Imaris cells module, the object named "vesicles" can be used to identify spots or foci, while the "nucleus" and "cell" components will identify larger objects. These three object types do not have to be cells, nuclei or vesicles; they could be anything, micro or macro, that is identifiable by intensity thresholding. Because the parameters from these object types are linked hierarchically within Imaris, the diversity of outputs, and information about their inter-relationships, is extensive. Furthermore, those with programming knowledge can obtain custom parameters by creating Imaris plug-ins (XTensions) or by calculating them from existing Imaris outputs within the R environment for statistical computing.
As imaging techniques advance and larger 3D data sets can be acquired in shorter time frames, automated analysis pipelines such as this, which allow subtle subcellular events to be rigorously interrogated across many thousands or millions of cells, will deepen our understanding of fundamental cellular processes.

Cell culture, transfection and indirect immunofluorescent staining. HeLa cells were maintained
in Dulbecco's modified eagle medium with 10% (v/v) fetal calf serum, Penicillin-Streptomycin (Pen-Strep), L-Glutamine and MEM Non-Essential Amino Acids in 5% CO 2 at 37 °C. Twenty-four hrs prior to transfection, cells were seeded on round coverslips in medium lacking Pen-Strep in 12 well plates for siRNA knockdown or 24 well plates for GFP/RFP-tagged construct transfections. Lipofectamine 2000 (Invitrogen) was used to transfect PSPC1 and IMPα 2 constructs, following the manufacturer's method with 2.5 μ g of DNA (single plasmid or 1.25 μ g of each for co-transfection). The Dharmacon ON-TARGETplus siRNA system (GE Life Sciences) with DharmaFECT 1 transfection reagent was used as per manufacturer's instructions. Pre-designed siRNAs targeting IMPα 2 (SMARTpool L-004702-00) and IMPα 4 (SMARTpool L-017477-00) were used, with a non-targeting (SCRAM siRNA) control pool (D-001810-10) as the siRNA negative control. At 48 hrs post transfection, cells were fixed in 3.2% paraformaldehyde (in PBS) for 10 min and washed (2 × 5 min, PBS) before proceeding to indirect immunofluorescence staining, as previously 49 . To detect endogenous mouse PSPC1 and SFPQ, mouse monoclonal antibodies specific to SFPQ and to the longer PSPC1 isoform were used 67 . Rabbit anti-IMPα 2 (Abcam, cat#ab84440) and goat anti-IMPα 4 (Abcam, cat#ab6039) were used to detect IMPα 2 and IMPα 4, respectively, for immunofluorescence. Primary antibodies (1:100 in 0.5% BSA/PBS) were applied overnight at 4 °C. Secondary antibodies, rabbit anti-mouse Alexa Fluor 546 (Molecular Probes-Invitrogen, cat#A11060) for GFP/RFP-tagged transfections and donkey anti-mouse Alexa Fluor 488 (Molecular Probes-Invitrogen, cat#A21202) plus goat anti-rabbit Alexa Fluor 546 (Molecular Probes-Invitrogen,   (Tables 1, 2 and 3 and Fig. 2) were used to perform PCA, allowing simultaneous comparisons of multiple parameters and revealing strong patterns between groups. 
In each experiment, PC1 explains >99% of the variance across all parameters; the distances between groups along the X axis (PC1) should therefore be considered the primary delineator. Paraspeckles were assessed within experimental groups using indirect immunofluorescence with an Alexa Fluor 546 (A546) secondary antibody to detect endogenous PSPC1 (A) or endogenous SFPQ (B), or through exogenous PSPC1 by co-transfecting with a plasmid encoding DsRed2-PSPC1 (C). Parameters used to compare the geometric means of groups within experiments using a specific paraspeckle marker (PSM; A: PSPC1, B: SFPQ, C: DsRed2-PSPC1) were "% cells positive for foci", "cytoplasmic PSM intensity", "nuclear PSM intensity", "PSM F n/c per cell", "PSM intensity per cell", "number of nuclear foci per cell", "sum volume of nuclear foci per cell", "sum nuclear foci PSM intensity per cell", "nuclear foci volume", "nuclear foci PSM intensity" and "sum nuclear foci PSM intensity".
HeLa cell image acquisition. Imaging was performed using a Leica SP5 laser scanning confocal system (DMI6000 microscope, motorised stage, 63× water/glycerol objective, Monash Micro Imaging Facility). Images were collected as Z-series and tiled in a 7 × 7 field-of-view grid (coverage of approximately 1.7 mm²), with resonant scanning mode (8000 Hz) used for siRNA samples (coverage of approximately 0.9 mm²).

Imaris-assisted image analysis to detect cells, nuclei and paraspeckles.
To assess paraspeckle number, size and PSPC1 intensity within each cell, the "Cells" module of the Imaris software package (Bitplane, Version 8) was used to batch process the identification of cells, their nuclei and paraspeckles within the larger image sets described above (as shown in Fig. 1Bi-viii). Throughout the GFP-tagged IMPα transient transfection experiments, Draq5 signal identified the nucleus, GFP signal identified the cell body, and nuclear foci were identified using the particular paraspeckle marker signal under investigation (i.e. PSPC1, SFPQ or DsRed2-PSPC1). For siRNA samples, DAPI signal identified the nucleus, IMPα (IMPα2 or IMPα4) signal identified the cell body, and nuclear foci were identified using the paraspeckle marker signal (PSPC1 or SFPQ). The results (output as CSV files) were combined and manipulated using Python scripts (Python Software Foundation, version 2.7), then analysed using R scripts (R Project for Statistical Computing, The R Foundation, version 3.2). Incomplete cells, meaning those with no nucleus or with a nucleus on the very edge of an image (in the X, Y or Z plane), were excluded from the datasets for analysis, and GFP thresholding was applied to datasets as described for the GFP-tagged IMPα transient transfection experiments. Additional R packages used for analysis were "car" 68, "epitools" 69, "geepack" 70 and "ggplot2" 71. Graphs presented in Fig. 2 were generated using Prism (GraphPad Software, Version 6), while all others were generated using R and the "ggplot2" package.
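The combining and filtering step described above can be sketched in Python as follows. This is a minimal illustration only, not the scripts used in the study: the column names (`cell_id`, `has_nucleus`, `nucleus_on_edge`, `gfp_mean`), the inline sample data and the threshold value are all hypothetical, and real Imaris exports use different headers and file layouts.

```python
import csv
import io

# Hypothetical per-cell exports; real Imaris CSV headers differ.
SAMPLE_A = """cell_id,has_nucleus,nucleus_on_edge,gfp_mean
a1,1,0,850.0
a2,0,0,900.0
a3,1,1,1200.0
"""
SAMPLE_B = """cell_id,has_nucleus,nucleus_on_edge,gfp_mean
b1,1,0,120.0
b2,1,0,640.0
"""


def load_cells(csv_text, sample):
    """Parse one per-cell CSV export and tag each row with its sample name."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "sample": sample,
            "cell_id": row["cell_id"],
            "has_nucleus": row["has_nucleus"] == "1",
            "nucleus_on_edge": row["nucleus_on_edge"] == "1",
            "gfp_mean": float(row["gfp_mean"]),
        })
    return rows


def keep_cell(cell, gfp_threshold):
    """Keep complete cells whose GFP signal reaches the threshold."""
    return (cell["has_nucleus"]
            and not cell["nucleus_on_edge"]
            and cell["gfp_mean"] >= gfp_threshold)


# Combine the exports, then drop incomplete and sub-threshold cells.
combined = load_cells(SAMPLE_A, "A") + load_cells(SAMPLE_B, "B")
GFP_THRESHOLD = 500.0  # assumed cut-off in arbitrary intensity units
filtered = [c for c in combined if keep_cell(c, GFP_THRESHOLD)]
kept_ids = [c["cell_id"] for c in filtered]
print(kept_ids)
```

Keeping the exclusion rule in a single predicate function makes it straightforward to apply the same criteria to every sample before the data are passed on for statistical analysis.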
Statistical Analysis. For statistical testing, individual cells were assumed to be independent, but paraspeckles within each cell were assumed to be correlated. When analysing the individual cell or paraspeckle data, three outcome types were generated: 1) binary responses based on whether or not a cell was positive for paraspeckles, 2) count data based on the number of paraspeckles within each cell (including/excluding zeroes) and 3) continuous data based on paraspeckle volume sum and paraspeckle PSPC1 intensity sum.
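As an illustration of how the three outcome types relate to the underlying measurements, the sketch below derives them from per-focus records. The record layout, identifiers and values are hypothetical and are not taken from the study's scripts.

```python
from collections import defaultdict

# Hypothetical per-focus records: (cell_id, focus_volume, focus_intensity).
foci = [
    ("c1", 0.5, 100.0),
    ("c1", 1.5, 300.0),
    ("c2", 2.0, 250.0),
]
all_cells = ["c1", "c2", "c3"]  # c3 has no detected foci


def summarise(cell_ids, focus_rows):
    """Build binary, count and continuous outcomes for every cell."""
    per_cell = defaultdict(list)
    for cell_id, volume, intensity in focus_rows:
        per_cell[cell_id].append((volume, intensity))
    summary = {}
    for cell_id in cell_ids:
        rows = per_cell.get(cell_id, [])
        summary[cell_id] = {
            "positive": len(rows) > 0,                   # binary outcome
            "n_foci": len(rows),                         # count outcome
            "volume_sum": sum(v for v, _ in rows),       # continuous outcome
            "intensity_sum": sum(i for _, i in rows),    # continuous outcome
        }
    return summary


outcomes = summarise(all_cells, foci)
for cell_id, values in outcomes.items():
    print(cell_id, values)
```

Cells without foci contribute to the binary outcome and to zero-inclusive counts, but their sums are zero and cannot be log-transformed directly.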
Comparisons between groups were made using generalised linear models (GLM): logistic regression for the binary data, and linear regression for the count and continuous data. As the count and continuous data were both skewed, data were transformed using the natural logarithm to allow valid statistical inference from the linear regression models. The p-values are based on the transformed data; however, the results were then back-transformed to give estimates on the original scale for ease of interpretation. By taking the exponent of the mean of the log-transformed data, the geometric mean and its confidence intervals (CIs) were obtained on the original linear scale. By taking the exponent of the linear regression coefficients obtained on the log-transformed scale, the ratios of the geometric means and their 95% CIs were obtained on the original scale. Odds ratios are given for logistic regression results. When assessing data on a per-paraspeckle basis, continuous outcomes were examined, which again required log transformation. Generalised estimating equations (GEE) were used to account for correlation between paraspeckles originating from the same cell 72.
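The back-transformation described above can be illustrated with a short Python sketch. It is not the R analysis used in the study: the function names and example values are hypothetical, and the CI uses a normal approximation as a simplifying assumption rather than the interval a fitted regression model would report. For a model with a single two-level group term, the linear regression coefficient on the log scale equals the difference in mean logs, and the exponentiated logistic regression coefficient equals the sample odds ratio, which is what the helpers compute directly.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_ci(values, level=0.95):
    """Geometric mean and CI, back-transformed from the log scale."""
    logs = [math.log(v) for v in values]
    centre = mean(logs)
    se = stdev(logs) / math.sqrt(len(logs))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    return math.exp(centre), math.exp(centre - z * se), math.exp(centre + z * se)


def geometric_mean_ratio(group, reference):
    """Ratio of geometric means: exponent of the difference in mean logs."""
    group_logs = [math.log(v) for v in group]
    reference_logs = [math.log(v) for v in reference]
    return math.exp(mean(group_logs) - mean(reference_logs))


def odds_ratio(pos_group, neg_group, pos_reference, neg_reference):
    """Odds of being foci-positive in one group relative to the reference."""
    return (pos_group / neg_group) / (pos_reference / neg_reference)


# Hypothetical per-cell foci volume sums (arbitrary units).
control = [1.0, 2.0, 4.0]
treated = [2.0, 4.0, 8.0]

gm, lo, hi = geometric_mean_ci(control)
ratio = geometric_mean_ratio(treated, control)
# Hypothetical counts: 30 of 40 treated cells and 20 of 40 control cells are foci-positive.
oratio = odds_ratio(30, 10, 20, 20)
print(gm, lo, hi, ratio, oratio)
```

A ratio above 1 indicates a larger geometric mean in the comparison group than in the reference group, and a CI for the ratio that excludes 1 corresponds to a significant difference on the log scale.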