A comprehensive analysis of gene expression changes in a high replicate and open-source dataset of differentiating hiPSC-derived cardiomyocytes

We performed a comprehensive analysis of the transcriptional changes occurring during human induced pluripotent stem cell (hiPSC) differentiation to cardiomyocytes. Using single cell RNA-seq, we sequenced > 20,000 single cells from 55 independent samples representing two differentiation protocols and multiple hiPSC lines. Samples included experimental replicates ranging from undifferentiated hiPSCs to mixed populations of cells at D90 post-differentiation. Differentiated cell populations clustered by time point, with differential expression analysis revealing markers of cardiomyocyte differentiation and maturation changing from D12 to D90. We next performed a complementary cluster-independent sparse regression analysis to identify and rank genes that best assigned cells to differentiation time points. The two highest ranked genes between D12 and D24 (MYH7 and MYH6) resulted in an accuracy of 0.84, and the three highest ranked genes between D24 and D90 (A2M, H19, IGF2) resulted in an accuracy of 0.94, revealing that low dimensional gene features can identify differentiation or maturation stages in differentiating cardiomyocytes. Expression levels of select genes were validated using RNA FISH. Finally, we interrogated differences in cardiac gene expression resulting from two differentiation protocols, experimental replicates, and three hiPSC lines in the WTC-11 background to identify sources of variation across these experimental variables.


Introduction
Single cell RNA-sequencing (scRNA-seq) has revolutionized the study of heterogeneous cell populations and cell state transitions at the single cell level, including during differentiation and development. ScRNA-seq datasets enable identification and characterization of transcriptionally distinct cell types and transitions in vivo and in vitro by revealing the expression of cell type-specific genes or gene modules [1][2][3][4][5] . This has allowed for the identification of unique cell types and transient states, including rare types that were previously undetectable by bulk sequencing methods 4,6-8 .
Cell state transitions during the differentiation of human pluripotent stem cells to cardiomyocytes have also been profiled by scRNA-seq 9,10 and are characterized by some of the same gene expression programs found during in vivo development 6,[11][12][13] . Transcriptional profiling of in vitro-derived cardiomyocytes after extended time in culture or after perturbations can serve as a benchmark, classifying maturation states and marker genes that are associated with cell function and phenotype [14][15][16] .
Identifying cardiac maturation state is important in settings where the utility of in vitro derived cardiomyocytes is limited by their immaturity and heterogeneity, such as in vivo cell transplantation where immature cells can be pro-arrhythmogenic [17][18][19] , and drug testing and phenotyping platforms where the degree of functional maturation affects cell performance 14,20 . Despite the importance of characterizing populations of more mature cardiomyocytes, most scRNA-seq studies have captured only time points early in differentiation (up to two to four weeks 10,15,21,22 , or up to eight weeks 9,20,23 ).
Furthermore, in vitro differentiation systems are prone to biological and technical variability influenced by replicates and differentiation protocols 24,25 , resulting in differences cell phenotype, as well as cardiac purity with the presence of other differentiated cell types in the population. Without extensive experimental and technical replicates in these types of transcriptomic studies, identification and validation of genes for more focused downstream analysis may be obscured or confounded by non-physiologically relevant factors. These issues have historically been challenging to address due to technical limitations in extended sample collection and storage, scRNA-seq experimental bottlenecks, and need for batch correction 26,27 .
To profile the dynamic cell populations during hiPSC differentiation to cardiomyocytes and their in vitro maturation after extended time in culture, we performed scRNA-seq on cells undergoing directed differentiation at Days 0, 12, 24, and 90. Differential expression analysis on de novo-identified cell clusters is often an important first step in scRNA-seq analysis to identify marker genes that distinguish cell types and cell states beyond the limited set of previously established markers 28,29 . We complemented this type of cluster-based differential expression analysis with a cluster-independent, time point-based penalized regression method 30,31 that ranks and prioritizes genes based on how well they can predict the differentiation time point. The expression patterns of top-ranked genes were validated in re-plated cardiomyocytes by imaging using RNA Fluorescence in Situ Hybridization (RNA FISH). While some of the top-ranked genes were expected given their roles in cardiomyocyte development, this time point-based method allowed us to identify a short list of candidate genes that distinguished between D12, D24, and D90 cardiomyocytes with high accuracy. These data indicated that gene sets as small as two to twelve genes may be informative in staging cardiomyocyte maturation and differentiation state during extended culture.
In addition to characterizing transcriptional changes that occur during in vitro differentiation and maturation of cardiomyocytes, we explored the technical and biological variability arising from in vitro differentiation experiments. We sequenced cells from a total of 55 samples from multiple independent differentiation experiments with two endogenously fluorescently-tagged cells lines and their unedited parental line, each differentiated using two differentiation protocols. This reproducibility analysis was performed at the two earlier time points (D12, D24) with samples that were processed in a single batch to limit downstream batch effects. The differentiation experiments, cell lines, and protocols were correlated at the population level. However, we identified some variation in cardiomyocyte profiles within and across differentiation experiments. While fluorescent tags did not affect the transcriptional profiles of in vitro differentiated cardiomyocytes, we observed differences in the timing of key gene expression changes between cardiomyocytes based on differentiation protocol. Taken together, this work provides a comprehensive analysis of gene expression changes in a highly replicable and open-source dataset of differentiating cardiomyocytes.

Single cell RNA-sequencing reveals distinct cell types and cardiomyocyte differentiation states after directed cardiomyocyte differentiation
To identify and characterize the distinct cell types and transcriptional states present during in vitro differentiation of hiPSC-derived cardiomyocytes, we performed scRNA-seq on cell populations spanning four stages of the cardiac differentiation process: undifferentiated hiPSCs (Day 0), early-and intermediate-stage cardiomyocytes (Day 12 and Day 24), and an aged cardiomyocyte population that served as a benchmark for more mature cardiomyocytes (Day 90) [32][33][34] . We used the scRNA-seq method SPLiT-Seq 35 for parallel processing of all D12/D24 samples, including those from two differentiation protocols, three cell lines, and multiple independent differentiation experiments (sample overview shown in Supplementary Fig. S1 and Supplementary Table S1).
We first focused our analysis on cells differentiated with a small molecule protocol (referred to as Protocol 1; Fig. 1A, Supplementary Fig. S1B, D) collected from all four time points (D0, D12, D24, D90; n = 11,619 cells), thereby establishing the baseline of cell types and gene expression patterns before expanding the analysis to the entire dataset. Unsupervised clustering identified 14 clusters representing three major categories of cells: undifferentiated hiPSCs, cardiomyocytes, and differentiated nonmyocytes (Fig. 1B). The cluster corresponding to undifferentiated hiPSCs (C2) was identified by expression of the pluripotency transcription factor POU5F1 (OCT-4) (Fig. 1B-D). Five of the 14 clusters contained cardiomyocytes based on the presence of the classic cardiomyocyte gene TNNT2 (cardiac troponin T; C0, C1, C3, C7, C12; Fig. 1C-D), comprising the majority (~72%) of the differentiated cell populations. Parallel measurement of cardiac muscle troponin T protein by flow cytometry in the same samples prior to sequencing confirmed the fraction of TNNT2+ cells (R 2 = 0.81; Fig. 2B Fig. 1D). RNA FISH on re-plated cardiomyocytes was used to confirm the presence of TNNT2+/MKI67+ cardiomyocytes by imaging (Fig. 1E).
In some cases, cell types were challenging to define due to the presence of multiple classical marker genes associated with cardiomyocytes and a secondary cell type within the same cluster.
Between the intermediate and late stage cardiomyocytes (D24 to D90), enriched gene ontology categories included "cell-substrate adhesion" and "extracellular matrix organization", in addition to "heart development" (Supplementary Fig. S3C, Supplementary Table S4).
Although measurably distinct from each other (Fig. 1B, Fig. 2C), the changes in gene expression between D12 and D24 were less pronounced than those seen between D24 and D90 with relatively modest shifts in median transcript levels between these two intermediate time points (Fig. 2A, Fig. 2C Table S2). The myosin heavy chain genes MYH6 and MYH7 were among the relatively small set of genes with expression changes greater than two-fold from D12 to D24 (Fig. 3A, Supplementary Table S2). Across the population, MYH6 was more abundant at D12, and MYH7 was more abundant at D24 (Fig. 2C-D), indicative of an expression switch associated with the maturation of human ventricular cardiomyocytes 47,48 . C1 (mostly D24) also showed an increase in expression of the gene encoding cardiac calcium regulator phospholamban (PLN; Fig. 2C) and a decrease in expression of the gene encoding the voltage-dependent calcium channel Cav1.3 (CACNA1D) compared to cells in C0 (mostly D12 ; Fig. 2F).
The early/intermediate D12/D24 (C0, C1) to late D90 (C3) cardiomyocyte transition was marked by expression changes in cardiac muscle myosin genes often used to stage cardiac maturation, including the transition in expression from MYH6 to MYH7 and MYL7 to MYL2 in ventricular cardiomyocytes [49][50][51] ( Fig. 2C, D). In addition, we observed shifts from D24 to D90 in the ventricular and slow skeletal myosin light chain isoform MYL3, which decreased in expression over time in culture, as well as smooth muscle myosin light chain MYL9, which increased in expression over time in culture (Fig. 2D). Members of the SMAD family, which mediate TGF-beta signaling and are involved in early stage cardiac development and cardiomyocyte differentiation 52 , decreased in expression over time, while SMAD negative regulator LDLRAD4 increased in expression during extended culture (Fig. 2F, Supplementary Fig. S3D). We observed other changes in expression in genes encoding components of signaling pathways including cyclic nucleotide (cAMP/cGMP) signaling (PDE3A, PDE10A, PDE1C, ADCY5, and ion channels (CACNA1C, SCN5A and KCNQ1; Supplementary Fig. S3D). Several collagens were upregulated in D90 cardiomyocytes (Fig. 2E), which is consistent with previous observations of collagen expression by cardiomyocytes during early in vivo heart development 11,53 . The collagen COL2A1 decreased with age, a pattern previously reported to distinguish trabeculated cardiomyocytes from compact cardiomyocytes in human fetal heart development 6 (Fig. 2E). Consistent with the known metabolic switch from glycolysis in hiPSCs to fatty acid oxidation in mature cardiomyocytes, D90 cardiomyocytes (C3) exhibited downregulation of the glucose transporter SLC2A3 and up-regulation of three metabolism associated genes: the long-chain fatty acid transport protein SLC27A6, the fatty acid binding protein FABP3 11 , and glucose response IGF-binding protein 5 (IGFBP5; Fig. 2F, Supplementary Fig. S3D).
A complementary approach to identify subsets of differentially expressed genes for downstream analysis Pairwise differential expression analysis (as shown above) is often used to identify genes of interest for downstream analysis such as functional validation or imaging 28 . This approach is clusterdependent, with genes that differ between clusters identified at a pre-specified fold change or significance level (ex. Log2 fold > 1). We sought an alternative, complementary method to rank differentially-expressed genes and prioritize markers for downstream biological assays to reduce the number of genes that need to be tested experimentally. To accomplish this, we used a time point-based  Table S3). Using only the expression level of the highest-ranked gene (MYH7) as input, a simple logistic model correctly assigned the cells in the holdout data set to D12 or D24 with an accuracy of 0.81 ( Fig.   3B). Including the expression level of MYH6, the second ranked gene, improved the accuracy of prediction of time point for a given cell to 0.84 ( Fig. 3B -red dot). The ranking of MYH6 and MYH7 as the top two genes is consistent with their changes in expression during early cardiomyocyte development 47,48 . Expanding the list to include the top 12 ranked genes raised prediction accuracy to 0.94 ( Fig. 3B -blue dot). These included the cardiac calcium regulator phospholamban (PLN) as well as non-classical cardiac genes such as the proteoglycans COL2A1 and VCAN. The top 40 genes resulted in a prediction accuracy of 0.97 ( Fig. 3B -purple dot) and using 1,877 of the most highly variable genes between D12 and D24 cardiomyocytes led to a prediction accuracy of 0.98 (Fig. 3B).
Thus, small numbers of genes can accurately classify differentiation time points, with diminishing returns by including additional genes in the model. One hundred percent of the top 12 ranked genes and 70% of the top 40 ranked genes were also identified as differentially expressed between C0 (D12) and C1 (D24) cardiomyocyte clusters, showing that the bootstrapped sparse regression method is consistent with standard cluster-based differential expression while providing additional ranking and prioritization of genes (Supplementary Table S2, Supplementary Table S3).
Of the top ranked genes at D12/D24, many showed a continued progression of up-or downregulation between D24 and the D90 time point, indicating that their expression levels may be informative in identifying differentiation and maturation states at later time points (Fig. 3C). To further evaluate this, we repeated the bootstrapped sparse regression analysis on D24 and D90 cardiomyocytes and found similar prediction accuracy results with different genes (Supplementary Fig.   S4C-F). The top three selected genes were able to assign cells to D24 or D90 with an accuracy of 0.94 and included A2M, a plasma protein secreted by cardiac fibroblasts, the maternally imprinted long noncoding RNA H19, and insulin-like growth factor IGF2 (Supplementary Fig. S4C

Validation of select genes using RNA FISH
We used RNA FISH to validate the expression patterns of select genes by quantifying transcripts in cardiomyocytes re-plated on glass for high-resolution imaging. Due to the technical challenges of culturing single cells out to D90, we performed RNA FISH only at the early and intermediate time points The remaining genes were chosen for being differentially expressed between cardiomyocyte clusters (C0/D12, C1/D24, C3/D90) and for their known roles in cardiomyocyte development (Fig. 3C). Transcript abundance was quantified in segmented cells at each time point, and most assayed genes showed trends consistent with the scRNA-seq data ( Fig. 3C-E). Of the top two performing genes predictive of D12/D24 in the bootstrapped sparse regression analysis (MYH6 and MYH7), RNA FISH showed an expression change for MYH6, which decreased in expression between D18 and D30 (Fig. 3D). MYH7 did not show a large expression change with RNA FISH. While the sample time points are similar, re-plating cells for RNA FISH delays the time points relative to scRNA-seq (D18/30 vs D12/24) and places the cells on a stiffer substrate 50 . Furthermore, while MYH6 continued to decrease after D24 in the scRNA-seq data ( Fig. 3C) and was among the top ranked genes between D24 and D90, MYH7 was not a top ranked gene in the bootstrapped sparse regression analysis between D24 and D90 (Supplementary Fig. S4C, Supplementary Table S3). This is consistent with the switch from MYH6 to MYH7 expression occurring before D18 and the observed stable expression of MYH7 between the D18 and D30 RNA FISH time points. The scRNA-seq analysis revealed a range of MYH6 and MYH7 expression that was largely anticorrelated in single cells, and this cell-to-cell heterogeneity was also confirmed by RNA FISH (Supplementary Fig. S4G, Fig. 3D). Finally, the RNA FISH confirmed the expression of non-classical cardiac genes such as H19 and COL2A1 in differentiated and re-plated cardiomyocytes (Fig. 3E).

Expanding early and intermediate time point analysis to explore reproducibility of differentiation
Analysis of differentiated cell populations from Protocol 1 revealed distinct cell types and changes in gene expression over the time course of differentiation and established a baseline for applying these interpretations to a broader dataset. In vitro differentiation systems are prone to biological and technical variability, potentially arising from variables including hiPSC line, culture conditions, and differentiation protocol steps. To assess the robustness of our protocols and analysis methods, we expanded our data set to characterize the reproducibility of cardiomyocyte differentiation in different protocols and hiPSC clones. We sequenced samples from the two intermediate time points that were differentiated with a second differentiation protocol that used a combination of cytokines and small molecules (Protocol 2).
While cells for both protocols were collected on the same day, the nomenclature reflects a two day difference (i.e., D12 for protocol 1 is the same as D14 for protocol 2, D24 for Protocol 2 is the same as TNNT2-non-cardiomyocytes clusters (c4, c5, c6, c8, c9, c10) ( Fig. 4B-D). c1 was almost exclusively composed of D24 and D26 cells and had the highest median MYH7 level and lowest median MYH6 level among the cardiomyocyte clusters (Fig. 4D), consistent with changes in expression of MYH6 and MYH7 during ventricular cardiomyocyte development. The other three non-proliferative cardiomyocyte clusters (c0, c2, c3) were dominated by D12/D14 cells and differed in their median level of MYH7 expression with c2 being the lowest and c0 being the highest (Fig. 4D).

Contribution of cell line and experimental replicates to variability in differentiation
We calculated gene-wise coefficients of determination (R²) for variables of interest (time point, cell line, protocol, differentiation experiment) to quantify sources of variation and found that time point was the major source of variation as expected. While differentiation protocol and differentiation experiment also contributed to variance in gene expression, the specific cell line was not a major contributor (Fig. 5A,   Supplementary Fig. S6A). The D12/D14 and D24/D26 samples included three hiPSC lines in the same genetic background (two clonal gene-edited lines with endogenous fluorescent tags: AICS-0011 TOMM20-mEGFP and AICS-0037 TNNI1-mEGFP, and their parental line WTC-11). Expression profiles across the three cell lines were highly correlated at the population level, and cells did not cluster by cell line, indicating that the presence of a fluorescent tag on TOMM20 or TNNI1 genes and the clonal selection process used to generate these lines did not alter the differentiation potential of hiPSCs or the transcriptional profiles of differentiated tagged cardiomyocytes (Fig. 4D, Fig. 5A).
Gene expression in samples derived from independent differentiations were also correlated at the population level (Supplementary Fig. S6B). To explore experimental differences that may be obscured in the global analysis, we independently clustered cardiomyocytes from each time point (Clusters c0, c1, c2, c3, and c7 in early atrial genes (NPPA and ACTC1), both of which tended to be higher in the Exp1 and Exp4 populations at D12 (c0) compared to Exp2, Exp3, and Exp5 at D12 (Fig. 5D, E). Despite these differences, cardiac troponin T levels, as detected by flow cytometry and by TNNT2+ expression, were similar across these samples, suggesting that cardiac troponin T alone may not sufficiently capture the biological variation across independent differentiation experiments and that more refined descriptors of cardiomyocyte state are needed (Fig. 5D, Supplementary Fig. S6C Fig. 4D) all being more prevalent in Protocol 1. The stromal population (c8, both protocols, FN1 positive) was generated by both protocols (Fig. 4D).
We next independently re-clustered cardiomyocytes from the largest differentiation experiment, Exp5 (Fig. 5F, G), to focus on protocol-specific differences within a high-replicate experimental setup. We found that cells clustered by differentiation protocol, and at the D24/D26 time point, there was a distinct trend for higher levels of MYH7 and lower levels of MYH6 in Protocol 1 compared to Protocol 2 ( Fig. 5G-H). This trend of reduced MYH7 expression in Protocol 2 was also found for the earlier time point  Fig. 2C, Supplementary Fig. S3A, Supplementary Fig. S7G). These data suggest that the cells from both protocols undergo a similar progression of differentiation and maturation in these first few weeks and that there is a subset of genes that can robustly distinguish between time points across both differentiation protocols.

Discussion
In this study, we used single cell RNA-sequencing to profile the cell populations present during cardiomyocytes. The transition between D24 and D90 cardiomyocytes encompassed changes in many of the structural, metabolic, and signaling transcriptional programs that are activated during in vivo cardiomyocyte development 9 . We identified several clusters that suggest in vitro cardiomyocyte maturation. Interestingly, we observed a subpopulation of D90 cardiomyocytes with upregulated expression of collagens and extracellular matrix-associated genes (C5, Supplementary Fig. S2A). This observation is consistent with previous findings of collagen expression during early in vivo heart development, perhaps indicative of a transient subpopulation of cardiomyocytes with extracellular-matrix related gene expression in the heart during early cardiac development 11,53 , but could also be a technical artifact resulting from doublets produced during sample processing. At D90, we observed multiple clusters displaying co-expression of classical cardiomyocyte genes and another secondary cell type, including a mixed cardiomyocyte-like population expressing the cardiac smooth-muscle TRP channel gene TRPM3 (C7, Fig. 1D, Supplementary Fig. S2A, D). The TRPM3 channel is activated by thermal and hypotonic conditions 38 , thus we postulate that co-expression in this context may be a consequence of the in vitro culture conditions.
In vitro cardiomyocyte differentiation is prone to biological and technical variability that may be driven by factors such as the genetic background of cell lines, the batch of cells and other reagents used at the outset of an experiment, cell cycle and density of undifferentiated hiPSCs, and the protocols used for cell culture and directed differentiation 9,54-58 . This variability is a challenge for data reproducibility and makes it difficult to draw conclusions about cell types or states based on a limited number of samples and conditions. Historically, evaluation of cardiomyocyte differentiation performance and quality control has been evaluated using a single metric: expression of cardiac troponin T by flow cytometry or immunochemistry 56 . However, differentiated populations vary not only in the percent of cells expressing cardiac troponin T, but also exhibit variation in functional phenotypes including contractility, sarcomere organization, and electrophysical properties 56,57,59 .To probe the robustness of transcriptional profiles at the single cell level, we multiplexed the single cell analysis of >15,000 cells from 48 independent samples, spanning two differentiation protocols, three edited cell lines, and numerous experimental replicates. We found that while population level gene expression was well-correlated across experimental replicates, cell lines, and protocols, there were some differences that were not fully captured by presequencing analysis of cardiac troponin T by flow cytometry. This heterogeneity included expression differences in MYH6 and MYH7 across experimental replicates and differences in genes associated with atrial or ventricular specification and smooth muscle (Fig. 5C-H, Supplementary Figs.

S6C-E, S7C-F).
Transcriptional differences in cardiomyocytes between differentiation time points were evaluated using pairwise, cluster-based differential expression analysis, a common approach in analyzing scRNAseq data. We complemented this analysis with a time point based, cluster-independent penalized regression method, which enabled ranking and prioritization of genes based on how well they predicted the differentiation time point. While some of the top genes identified were of known importance in cardiomyocyte biology, we observed a number of genes that have little to no previously reported role in cardiac differentiation or maturation. For example, the top three genes in the D24/D90 analysis were a plasma protein secreted by cardiac fibroblasts (A2M), long non-coding RNA H19, and insulin-like growth factor (IGF2). While the collective role of these three genes in cardiac maturation has not been robustly established, H19 and IGF2 are co-expressed during development [60][61][62] . Furthermore, H19 has been found to inhibit the abundance of the cardiac maturation-inducing micro-RNA let7 63 , and A2M has been reported to promote cardiomyocyte hypertrophy in ventricular cardiomyocytes 64 . Their performance in this prediction model suggests they may be important for distinguishing cardiomyocyte maturation states in other studies. Notably, this analysis revealed that gene sets as small as two to twelve genes enabled prediction of time point with high accuracy, similar to the accuracy achieved by using over 1000 of the most highly variable genes. These data indicate that a small subset of carefully-chosen gene targets may be informative for downstream studies where gene set size is limited, such as in functional knock-out assays, in vivo experiments, or image-based RNA FISH studies 65 .
In summary, we used scRNA-seq to profile gene expression in cardiomyocytes and noncardiomyocytes during cardiomyocyte differentiation and extended culture in vitro. We tested the robustness of our conclusions by sequencing 55 total samples from numerous differentiation experiments, differentiation protocols, and cell lines. We found that while cell types and gene expression were generally correlated at the population level, there were differences in cardiomyocyte gene expression by differentiation protocol and experimental replicate. Using a cluster-independent regression analysis, we identified sets of two to forty genes that predict cardiomyocyte time point with high accuracy.
This shows that a limited number of genes can be used to benchmark the stage of cardiomyocyte differentiation and maturation, providing insight into the quality of cardiomyocytes for use in subsequent in vitro models or in vivo therapeutic applications. thank the Allen Institute for Cell Science team for helpful scientific discussions and support. The WTC-11 line that we used to create our gene-edited cell lines was provided by the Bruce R. Conklin Laboratory at the Gladstone Institute and UCSF. We gratefully acknowledge funding from R21CA246358 and support from the Washington Research Foundation. We would like to thank the Allen Institute for Cell Science founder, Paul G. Allen, for his vision, encouragement, and support.

Declaration of Interests
A.B.R, C.R. and G.S. are shareholders of Parse Bioscience.

C.
Differentially expressed genes were identified between pairs of clusters corresponding to the undifferentiated hiPSC population (C2) and the three largest cardiomyocyte populations: D12 (C0), D24 (C1), and D90 (C3). Heatmap shows the top 10 up-and down-regulated genes from each pairwise cluster comparison. Normalized transcript abundance was centered and scaled across each row (z-score color scale on top; red = standard deviations above mean; blue = standard deviations below mean; white = mean; for visualization purposes, 4 was set as the maximum z-score, and z-scores > 4 were set to 4). The dendrogram is based on hierarchical clustering of genes. Each column corresponds to one cell. Genes referenced in Results are noted with an asterisk.  B.
All unique selected gene sets (resulting from different values of the regularization parameter lambda) were used to calculate the prediction accuracy of cell age in the scRNA-seq bootstrapped sparse regression holdout data set (see Materials and Methods). X-axis shows each unique gene set size, and y-axis shows cell age prediction accuracy in holdout data. Color of selected gene set sizes are as follows: red = 2 gene set at lambda = 0.487; blue = 12 gene set at lambda = 0.213; purple = 40 gene set at lambda 0.0494; gray = all other gene set sizes. The prediction accuracy for a set of highly variable genes (n = 1,877) between D12 and D24 is shown as the dot to the right of the x-axis break. Prediction accuracies for random gene sets of the same size are shown as box plots with outliers omitted (selected gene sets ranged in size from 1 to 83 genes, and for each gene set size, 100 random gene samples were used for accuracy calculation).

B.
Differentiation experiment schematic. One differentiation experiment includes all plates set up on that experiment date and may include multiple cell lines, protocols, and/or time points. One plate or one half-plate was set up for each intended collection time point per cell line and protocol. Differentiation experiments were seeded on different days using an independent source plate of hiPSCs. Each scRNA-seq sample originates from a single well in a differentiation plate; in some cases, multiple wells/samples were collected per plate but were never pooled.

C.
Cardiomyocytes (TNNT2+ cells) from all collected D12 samples (from each of the five differentiation experiments) were independently clustered and visualized using UMAP. Lower right UMAP is colored by cluster, and each other UMAP highlights in red cells from one of the five differentiation experiments.

D.
Group violin plot showing distributions of marker genes in D12 clusters with cluster breakdown by experiment and cell line shown in the bar plots below.

E.
Heat map of top differentially expressed genes between the four non-proliferative (MKI67-) D12 cardiomyocyte clusters (c0, c1, c2, c3). Normalized transcript abundance was centered and scaled across each row (z-score color scale below heatmap; red = standard deviations above mean; blue = standard deviations below mean; white = mean; for visualization purposes, 4 was set as the maximum z-score, and z-scores > 4 were set to 4). The dendrogram is based on hierarchical clustering of genes. Each column corresponds to one cell.  Figures, Legends, and Tables.

Cardiomyocyte differentiation using two protocols
Two differentiation protocols were used, referred to as "Protocol 1" and "Protocol 2" in the text. Protocol 1 is based on a previously reported small molecule differentiation method with modifications 69,70 . HiPSC cultures were dissociated into single cells using Accutase and plated into matrigel-coated 6 well-plates at 125k -350k cells per well in mTeSR1 supplemented with 1% P/S and 10 µM Rock Inhibitor (Day -3). Media was replaced for 2 days, and differentiation was initiated on the third day (Day 0) by replacing media with RPMI-1640 (Invitrogen #A10491-01) supplemented with B27 without insulin (Invitrogen #A1895601) and 6 µM CHIR99021 (Cayman Chemical #13122). Media was replaced after 48 hours (Day 2) with RPMI-1640 supplemented with B27 without insulin and 5 µM IWP2 (R&D Systems #3533). Media was again replaced after an additional 48 hours (Day 4) by RPMI-1640 supplemented with B27 without insulin. Every 2-3 days thereafter (starting on Day 6), media was replaced with RPMI-1640 supplemented with B27 containing insulin (Invitrogen #12587010) and 1% P/S. Cardiomyocyte samples at Day 90 were differentiated using an optimized version of Protocol 1 with Chiron and IWP2, both at 7.5 µM. A full protocol is available at the Allen Cell Explorer (www.allencell.org/methods-for-cells-in-the-lab, SOP: Cardiomyocyte differentiation methods_v1.2.pdf).
Across both differentiation protocols, spontaneous beating was generally observed between Days 7-12. We have recently updated our protocols and recommend the use of RPMI-1640 (Invitrogen #11875-093) and B27 supplement (Gibco #17504044) for cardiac differentiation using both protocols.

Experimental design for scRNA-seq studies
Each differentiation experiment encompassed all plates set up at one experiment date, and one source plate of hiPSCs was dissociated and seeded concurrently for both differentiation protocols, thus keeping the input hiPSCs the same within an experiment. Differentiation experiments were seeded on different days using an independent source plate of hiPSCs. Samples were collected on the same day for each paired collection time point listed in the scRNA-seq dataset, from both protocols (Day 12/D14, D24/D26). The difference in reported day (i.e., Day 12 vs 14) is due to the delay in differentiation initiation (Day 0) between the two protocols, as described above in the "Cardiomyocyte differentiation using two protocols" section above. Samples were collected 15 days after seeding (denoted as D12 for Protocol 1, D14 for Protocol 2) or 27 days later (denoted as D24 for Protocol 1, D26 for Protocol 2). Samples referred to as D90 were independently derived and were collected 93-96 days after initiating differentiation using a modified version of Protocol 1 as described above. Each scRNA-seq sample originates from a single well in a differentiation plate. In some cases, multiple wells/samples were collected per plate but were never pooled. Stem cell (D0) samples for scRNA-seq were independently cultured and were not used as a source plate for any of the differentiation setups. There was a total of 55 samples included in this study; all scRNA-seq sample metadata can be found in Supplementary Table S1.

Single cell dissociation for scRNA-seq
Stem cells were dissociated to single cells as described above and processed for RNA sequencing following the protocol detailed in the "scRNA-seq library preparation and sequencing" section below. All differentiated sample wells were visually inspected at the desired cardiomyocyte collection time point for successful cardiac differentiation on the basis of high cardiac purity, spontaneous beating, and morphology; wells that passed these criteria were collected for downstream analysis. For single cell dissociation, cardiomyocyte wells were washed with 1X DPBS (Gibco #14190-144) and incubated with pre-warmed 2X TrypLE Select (Gibco #A12177) diluted in Versene (Gibco #15040-066) for 8-10 minutes at 37°C. Monolayers were gently dissociated using a P1000 micropipette to obtain single cells, collected, and added to 5 mL of resuspension media -RPMI-1640 containing B27 supplement, 1% P/S, 10 µM Rock Inhibitor and 200 U/mL DNAse I (Millipore Sigma #260913-10MU). Wells were washed twice with an additional 1 mL of resuspension media, and the cell suspension was centrifuged at 300 g for 5 minutes at 4°C. The single cell suspension was gently resuspended in 5 mL of resuspension media and counted twice on a hemocytometer to obtain a total cell count for each sample.

Flow cytometry of scRNA-seq samples
An aliquot was taken from each cell sample at the time of single cell dissociation for cardiac Troponin T (cTnT) analysis by flow cytometry. Briefly, sample aliquots were fixed with 4% paraformaldehyde (Electron Microscopy Sciences #15710) in DPBS for 10 min. After fixation, samples were stained for 30 min in BD Perm/Wash TM buffer (BD Biosciences #512091KZ) with anti-cardiac Troponin T AlexaFluor® 647 (BD Biosciences #565744) or an equal mass of AF647 lgG1 κ isotype control (BD Biosciences #565571). Fixed cells were resuspended in 5% FBS (Gibco #10437028) in DPBS with 2 µg/mL DAPI, followed by processing and data acquisition on a CytoFLEX S (Beckman Coulter). Analysis was performed using FlowJo software V. 10.2 (Treestar).

scRNA-seq library preparation and sequencing
ScRNA-seq was performed using the SPLiT-seq method 35 . After single cell dissociation (see "Single cell dissociation for scRNA-seq") samples were prepared for library preparation by centrifuging the cell suspension at 300 g for 5 minutes at 4°C and then resuspending in 1 mL of cold RNAse-free PBS containing 0.05 U/µL Superase IN (Invitrogen #AM2696) and 0.05 U/µL Enzymatics RNAse Inhibitor (Qiagen #Y9240L) (mixture referred to throughout as PBS + RI). On ice, resuspended cells were passed through a 40 µm filter and fixed with 3mL of 1.33% formaldehyde (Electron Microscopy Sciences #15710) for 10 minutes. Fixed cells were permeabilized with 160 µL of 5% Triton X-100 (MilliporeSigma #T8787-50ML) + RNase Inhibitor for 3 minutes. Permeabilized cells were then centrifuged at 500 g for 5 minutes at 4°C and resuspended in 500 µL of PBS + RI and mixed with an additional 500 µL of 100 mM Tris-HCL (ThermoFisher #AM9855G) and then 20 µL 5% Triton X-100. Cells were then centrifuged at 500 g for 5 minutes at 4°C and the pellet resuspended in 300 µL 0.5X PBS + RI, then passed through a 40 µm filter. Filtered cells were counted using a hemocytometer and diluted to 1x10 6 cells/mL in 0.5X PBS + RI. Labeled tubes were placed in a RT (room temperature) Mr. Frosty (Thermo Fisher #5100-0001) and placed into a -80°C freezer for storage.
In-cell reverse transcription, ligation barcoding, lysis, and library preparation were carried out according to the protocol described in Rosenberg and Roco et al. 35 . Vials were thawed by placing tubes in a 37°C water bath, and vial contents were pipetted into wells containing barcoded, well-specific reverse transcription primers and a reverse transcription reaction mix. Each well contained a mixture of random hexamer and anchored poly(dT)_15 barcoded RT primers. Two different sequencing experiments were conducted. Sequencing Experiment 1 contained samples across Day 12/14 and Day 24/26, while Sequencing Experiment 2 contained samples across D0-Stem Cells, Day 90, and one re-sequenced sample each of Day 12 and Day 24 that had been included in Sequencing Experiment 1 to control for sequencing batch variability. Sample metadata including sequencing experiments can be found in Supplementary Table S1. For Sequencing Batch 1 (48 samples; see Supplementary Table S1), each sample was placed in a single well (4,000 cells per input sample) for a total of 192,000 cells. For Sequencing Batch 2 (9 samples, see Supplementary Table S1), each D0 and D90 sample was split over 6 wells (24,000 input cells per sample). The D12 and D24 samples in Sequencing Batch 2 were each split into 3 wells (12,000 input cells per sample). In both sequencing batches, after three rounds of barcoding, the cells were counted and divided into sub-libraries of 5000 cells before lysis. These sub-libraries were barcoded with a fourth unique barcode each and processed for sequencing on an Illumina NextSeq.

scRNA-seq data processing
Sequence alignment and quantification of intronic and exonic UMI (unique molecular identifier) counts were performed as described in Rosenberg et al. 35 using STAR 72 for alignment, TagReadWithGeneExon from Drop-seq tools to assign reads to genes, and Starcode 73 to collapse UMIs. Per gene intronic and exonic UMI counts were collapsed prior to generating the count matrix. To filter out ambient RNA barcodes, the number of UMIs per barcode were plotted with barcodes ordered by number of UMIs in descending order. The inflection point (or knee) in the plot was used as the threshold to separate barcodes originating from intact cells and barcodes originating from ambient RNA; all barcodes that fell below this threshold were removed. The remaining cells were further filtered for mitochondrial content (percent of UMI counts coming from mitochondrial genes) to remove low quality cells and total UMI counts to remove potential doublets with very large counts. The two sequencing batches were filtered separately. For Sequencing Batch 1 (D12, D14, D24, D26; see Supplementary Table S1), cells within the highest 5% of percent mitochondrial and total UMI counts distributions were removed. For Sequencing Batch 2 (D0, D90; see Supplementary Table S1), D90 cells had higher percentage mitochondrial UMIs than D0, which is consistent with previous studies showing an increase in mitochondrial content and number during cardiomyocyte maturation 74,75 . D90 cells also had a decrease in total UMI counts compared to D0, which is consistent with other studies indicating a decrease after differentiation 76 . Prompted by these observations, we used the same filtering approach (removing the highest 5% of cells based on UMI and mitochondrial distributions) but applied it independently to each time point (D0, D12, D24 and D90), which led to different maximum cutoffs being used for each time point. For both sequencing batches, genes detected in less than 10 cells and cells with less than 200 transcribed genes were removed from the matrix prior to clustering and visualization.

scRNA-seq clustering and visualization of D0 and Protocol 1 D12, D24, and D90 samples
Cells were clustered and visualized using the R (version 3.5.1) package Seurat (version 2.3.4) 77 . Normalized transcript abundance for each gene was calculated by dividing counts by the total counts per cell, multiplying by a scaling factor (10000) and log transforming the result using log1p (NormalizeData function in Seurat). For stem cell (D0) and Protocol 1 samples (D12, D24, D90), highly variable genes to be used for dimensionality reduction and clustering were identified using FindVariableGenes with the following parameters: mean.function = ExpMean, dispersion.function = LogVMR, x.low.cutoff = 0.05, x.high.cutoff = 4, y.cutoff = 0.5 (these parameters were chosen to define outlier genes after manual inspection of mean expression vs. dispersion plot). Counts were scaled using ScaleData with default parameters without regressing out any variables. Principal component analysis (PCA) was performed on the normalized and scaled matrix (with highly variable genes only) using RunPCA. Standard deviations of the principal components were plotted to determine the number of components to retain for clustering and visualization, and principal components above the inflection point in the plot were retained. Cells were clustered using the Jaccard-Louvain method 78 , which is based on shared nearest neighbor modularity optimization (FindClusters function with resolution=c(0.3, 0.4, 0.5, 0.6, 0.8, 1) and the following standard parameters: algorithm=1 (original Louvain algorithm), modularity.fxn=1 (standard modularity function)). Uniform Manifold Approximation and Projection (UMAP) 79 was used to visualize cells in a twodimensional space (RunUMAP function with default parameters). In Figs. 1-2 and Supplementary Fig.  S2, where only Protocol 1 is shown (D0, D12, D24, D90; n = 11,619 cells), clustering with resolution 0.5 was used for visualization, differential expression, and other downstream analyses. Pearson correlation was calculated for cardiac troponin T transcript and protein abundance between flow cytometry-based abundance (% cTnT positive cells, described in "Flow cytometry of scRNA-seq samples") and scRNA-seq based abundance (% of TNNT2 positive cells; Fig. 2B). Flow cytometry and scRNA-seq were performed on different cells obtained from the same differentiation sample (same differentiation well). Plots were created using R package ggplot2 (version 3.3.0) 80 , and violin plots were created using R package scrattch.vis (version 0.0.210) 81 .

scRNA-seq clustering and visualization of D12, D14, D24, D26 samples
To compare differentiation protocols, cell lines, and differentiation experiments, Protocols 1 and 2 early and intermediate time point cells (D12, D14, D24, D26; n = 15,878 cells) were clustered independently using the Seurat workflow described above (clustering with resolution 0.4 is shown, and clusters are numbered c0-c10; Fig. 4). Pairwise Spearman rank correlation coefficients were calculated between the three cell lines, five differentiation experiments, and two differentiation protocols (Fig. 5A,  Supplementary Fig. S6B, S7A). For independent clustering of time points and differentiation experiments, clustering with resolution 0.4 is shown in Fig. 5C-G, Supplementary Fig. S6C-E,  Supplementary Fig. S7C-G). Per gene variance explained (R²/coefficient of determination) for variables of interest (day of differentiation, protocol, differentiation experiment, differentiation protocol, cell line, # of genes detected, # of UMIs detected) was calculated using getVarianceExplained in R package scater (version 1.14.6) 82 . Distribution of variance explained for variables of interest is plotted for the top 5% of highly variable genes (602 genes identified using getTopHVGs from R packager scran version 1.14.6) 83 in the D12, 24, 24, and 26 population.

scRNA-seq differential expression analysis
Differentially expressed (DE) genes between clusters were identified by performing pairwise comparisons with edgeR (version 3.26.0) 84,85 . RNA composition normalization was performed with calcNormFactors, and negative binomial dispersions were estimated with estimateDisp. DE genes were identified by fitting gene-wise generalized linear models (glmFit; did not include intercept term in design) and performing a likelihood ratio test (glmLRT with contrasts). We retained genes with an absolute log2 fold change >= 1 and Benjamini-Hochberg adjusted p-value < 0.05). To remove potential false positives caused by dropouts in low depth data (i.e., gene expressed very highly in only a few cells within a cluster), we calculated the fraction of cells positive for each gene and retained only genes where the up-regulated group had at least 30% of cells positive for that gene. Heatmaps of top differentially expressed genes (Fig. 2C, Fig. 5E, Supplementary Fig. S2A, Supplementary Fig. S7G) were created using R package pheatmap (version 1.0.12) 86 and show scaled transcript abundance (normalized counts for each gene were scaled and centered). Dendrograms in heatmaps show hierarchical clustering of genes. For visualization purposes, maximum scaled transcript abundance cutoffs were applied in some cases (see heatmap figure legends), and cells are grouped by cluster. Biological Process (BP) gene ontology (GO) enrichment analysis was performed on differentially expressed genes between the following clusters: 1) C2/D0 + C0/D12, 2) C0/D12 + C1/D24, and 3) C1/D24 + C3/D90 using enrichGO function in the R package clusterProfiler (version 3.14.0) 87 . Redundancy in enriched GO terms was removed using the simplify function in clusterProfiler, and the top 10 simplified terms from each pairwise comparison are visualized in Supplementary Fig. S3C. All statistical calculations were performed in R 3.5.1, and plotting was performed using ggplot2 (version 3.3.0) 80 .

scRNA-seq feature selection analysis
In addition to differential expression analysis based on pairwise cluster comparisons, an orthogonal clustering-independent method was used for identifying genes that change between early and . 1000 bootstrap rounds (sampled 80% of cells without replacement at each round) were run with the training set to identify genes that had non-zero coefficients in all 1000 rounds at different values of the regularization parameter, lambda. Genes with non-zero coefficients in all 1000 bootstrap rounds at a given value of the regularization parameter, lambda, were reported as selected (see Supplementary Table S3 for top 40 selected genes ranked by lambda when first selected. Each set of genes selected from Protocol 1 (D12 vs. D24, D24 vs. D90) was used to predict the time point for cells in the corresponding test set by fitting a binomial generalized linear model (glmnet with ridge penalty, alpha = 0, and lambda = 1x10 -6 ) with the training data set and then using the model to predict time point in the test data set (Supplementary Fig. S3). Accuracy of prediction was calculated using the confusionMatrix function from the R package caret (version 6.0-85) 88 with positive = D12 time point for D12 vs D24, and positive = D24 time point for D24 vs D90. As a control for the accuracy of each selected gene set size, we took 100 random gene samples of the same size and calculated accuracy of predicting time point in the test data set. Accuracies are shown as box plots (outliers hidden) for random gene sets and as individual points for selected gene sets (Fig. 3B for D12 vs. D24, Supplementary Fig. S4D for D24 vs. D90).

Replating cardiomyocytes for imaging and RNA FISH
After performing cardiomyocyte differentiation, cells were dissociated into single cells at D12 (described in "Single cell dissociation for scRNA-seq" section) and seeded into glass bottom multiwell plates with an aliquot being used for flow cytometry analysis as described above. First, glass bottom multiwell plates (24-well, Cellvis P24-1.5H-N) were incubated at RT with 0.5 M glacial acetic acid (Fisher Scientific #BP1185-500) for 20-60 min and washed once with sterile milliQ (MQ) water. Wells were then treated with 0.1% PEI (Sigma Aldrich #408727-100ML) solution in sterile MQ water for 16-72 h at 4°C and rinsed with DPBS and sterile MQ water. Wells were then incubated with 25 µg/mL natural mouse laminin (Gibco #23017-015) diluted in sterile MQ water overnight at 4°C and removed immediately preceding cell plating. Cells were seeded at a density of 35,000 to 50,000 cells per well in RPMI-1640 supplemented with B27 containing insulin, 1% P/S, and 10 µM Rock Inhibitor. Media was changed after 24 hrs to RPMI-1640 supplemented with B27 containing insulin and 1% P/S, and media was changed every 2-3 days after until fixation. Cells were fixed at time points indicated in text for RNA FISH; the D18 time point reflects single cell dissociation at D12 and a 6-day recovery period after replating, and the D30 time points reflect additional maturation. Cells at the D30 time point were fixed between D29-D30. Both time points (D18 and D30) were seeded from the same source population of D12 differentiated cardiomyocytes that were replated together and maintained in parallel until fixation. Cells were fixed by removing media and washing twice with RNAse-free PBS, then incubated for 10 min at RT in a 4% paraformaldehyde solution (Electron Microscopy Sciences #15710). Fixation solution was removed, wells were washed once more with RNAse-free PBS, then stored in 70% ethanol at -20°C until RNA hybridization was performed.

RNA FISH using HCR v3.0
Gene validation by RNA FISH was performed using the HCR v3.0 method, following the HCR v3.0 protocol for "Mammalian cells on a slide" with modifications to adapt for samples on glass bottom multiwell plates (Molecular Instruments, https://www.molecularinstruments.com/protocols), as previously described 65 . Fixed wells were washed 4 times with 500 µL 2X SSC (Invitrogen #15557-044), incubated for 30-60 min at 37°C in probe hybridization buffer (Molecular Instruments), and hybridized overnight with 1.2 pmol of each probe set mixture containing 400 U/mL RNAse inhibitor (Enzymatics #Y9240L) at 37°C (250 µL per well). Custom probe sets can be found in Supplementary Table S5. Wells were probed for three genes in each experiment. Wells were washed with probe wash buffer (Molecular Instruments) supplemented with 400 U/mL RNAse inhibitor at 37°C for 30 min, washed 4 times with 2X SSC at RT, and incubated in amplification buffer (Molecular Instruments) for 30-60 min at RT. Hairpin amplifiers were prepared during this time; 18 pmol of hairpin amplifiers were heated to 95°C for 90 sec, protected from light and cooled, then combined and added into an amplification buffer containing 400 U/mL RNAse inhibitor. Hairpin mixtures were added to the appropriate wells (250 µL per well) and incubated for 45 min at RT, protected from light. Hairpin solution was removed, wells were washed with 2X SSC four times, nuclei were labeled with 2 µg/mL DAPI in 2X SSC for 5 min, followed by additional washes with 2X SSC. Samples were stored protected from light at 4°C in 2X SSC with 400 U/mL RNAse inhibitor until imaging.

Imaging RNA FISH samples
Imaging cardiomyocytes after RNA FISH (Fig. 1E, 3E, Supplementary Fig. 5) was performed on a Zeiss spinning-disk microscope with a 40x/1.2 NA W C-Apochromat Korr UV-vis infrared (IR) objective (Zeiss) and a 1.2x tube lens adapter for a final magnification of 48x, a CSU-X1 spinning-disk head (Yokogawa), and Orca Flash 4.0 camera (Hamamatsu) (pixel size 0.271µm in X-Y after 2x2 binning and 0.29 µm in Z). Standard laser lines (405, 488, 561, 640 nm), primary dichroic (RQFT 405, 488, 568, 647 nm) and the following Band Pass (BP) filter sets (Chroma) were used for fluorescent imaging: 450/50 nm for detection of DAPI, and 525/50 nm, 600/50 nm, and 690/50 nm for detection of RNA FISH probes. Brightfield images were acquired using an LED light source with peak emission of 740 nm with narrow range and a BP filter 706/95 nm for brightfield light collection.

Manual cell annotations in Napari
Multi-channel Z-stacks were loaded into Napari (https://napari.org) 89 and 2D single cell masks were generated by manually drawing cell boundaries in 2D while incorporating information from all channels collected during imaging (brightfield, two FISH probe channels, nuclei via DNA dye (DAPI), and alphaactinin-2-mEGFP (structure) signal if present). Single cell masks in fields of view (FOVs) were hand drawn by a single human expert. Cell boundaries were manually drawn for cells that were mostly within the FOV, and low-confidence/high cell density regions with many overlapping cells were avoided. 2D single cell masks were used downstream to create single cell transcript abundance measurements.

DNA (Nuclear) segmentation
Nuclear segmentation in 2D using the DNA channel was performed using CellProfiler (version 3.1.8) 90 . See CellProfiler pipeline for Zeiss in https://github.com/AllenCellModeling/fish_morphology_code for details. The DNA channel maximum intensity projection (MIP) was normalized with CellProfiler's RescaleIntensity module from the 5 th percentile to 95 th percentile of the raw image. Nuclei were segmented using minimum cross entropy thresholding to define the probability distributions of foreground and background regions in an image using CellProfiler's IdentifyPrimaryObjects module. Clumped objects were filtered by shape to identify nuclear objects in close proximity. Objects smaller than 500 pixels were considered debris and discarded. Nuclei were assigned to a cell if their centroids fell within the 2D segmented cell object. Unassigned nuclear objects were discarded and not used for further analysis.

RNA spot segmentation and feature extraction
RNA FISH transcripts were segmented using a transcript-specific segmentation workflow in the Allen Cell Structure Segmenter (Allen Cell Explorer) 91 . MIP image intensities were normalized, a Gaussian smoothing filter was applied to all images, and a 2D spot filter algorithm was applied to segment the transcript signal, where each transcript signal represented the location of the RNA as a diffraction-limited spot. This filter accounted for both the dot radius and the filter response to generate a binary result. Dot intensity levels varied by transcript and thus the filter response parameter was optimized for each RNA species, as listed in the table below. Segmented RNA FISH spots were quantified for each manually segmented cell in an FOV using CellProfiler's IdentifyPrimaryObjects and MeasureObjectSizeShape modules.

Microscope
Probe

Statistical Analysis
Details of specific statistical analyses for each section, sample sizes, and statistical tests used are given in the Methods and in the corresponding figure legends.