Date: 2023-09-25

Isolation of mouse LSK cells

LSK cells were obtained using a previously described protocol12. Adult mice were euthanized, and bone marrow was extracted and passed through a 70 µm filter. Cells were centrifuged at 300g for 10 min at 4 °C and resuspended in EasySep buffer (STEMCELL, 20144) at 100 million cells per ml, and differentiated cells were removed using the EasySep lineage depletion kit (STEMCELL, 19856). Cells were stained for Sca1 (Sca1-AF488; BioLegend clone D7) and cKit (CD117-PE; BioLegend clone 2B8) and sorted using the MoFlo Cell Sorter (Beckman Coulter) with a 130 µm nozzle.

Mice and derivation of MEFs

MEFs were derived from embryonic day (E)13.5 C57BL/6J embryos (Jackson Laboratory, 000664). Heads and visceral organs were removed, and the remaining tissue was minced with a razor blade and dissociated in a mixture of 0.05% trypsin and 0.25% collagenase IV (Life Technologies) at 37 °C for 15 min. The cell slurry was passed through a 70-µm filter to remove debris. Cells were washed and plated on 0.1% gelatin-coated plates in DMEM supplemented with 10% FBS (Gibco), 2 mM l-glutamine and 50 mM β-mercaptoethanol (Life Technologies). All animal procedures were based on animal care guidelines approved by the Institutional Animal Care and Use Committee at Washington University in St. Louis.

Lentivirus and retrovirus production

Lentiviral particles were produced by transfecting 293T-17 cells (American Type Culture Collection: CRL-11268) with the pSMAL-CellTag construct (see Supplementary Experimental Methods, CellTag-multi library synthesis), along with the packaging constructs pCMV-dR8.2 dvpr (Addgene, 8455) and pCMV-VSVG (Addgene, 8454). Retroviral particles for the bicistronic Hnf4α-T2A-Foxa1 construct were produced as previously described7. Virus was collected 48 h and 72 h after transfection and applied to cells immediately following filtering through a low-protein binding 0.45-μm filter. Wherever applicable, the virus was concentrated using ultracentrifugation: 20 ml of filtered viral supernatant was centrifuged at 50,000g for 2.5 h at 4 °C, the supernatant was removed and the virus was resuspended in 100 µl of DMEM and stored at −80 °C.

Section 1

Species-mixing experiment

For the species-mixing experiment, mouse iEP-LT cells were tagged with the CellTag-multi library, containing the barcode pattern (N)3GT(N)3CT(N)3AG(N)3TG(N)3CA(N)3, and human HEK 293T cells were tagged with the CellTag-multi-v0 library, containing the barcode pattern (N)5GTA(N)5CCT(N)5ATC(N)5GAT(N)5. Nuclei were isolated from both using the 10X Genomics scATAC-seq nuclei isolation protocol (CG000169) and mixed in a 1:1 ratio. The sample was processed using the standard 10X Genomics scATAC-seq library preparation (v1 kit) with modifications to capture CellTags (Supplementary Methods). Single-cell libraries were sequenced on an Illumina NextSeq-500, and sequencing data were aligned to a mixed species reference using CellRanger. The aligned BAM file was used for downstream analysis.
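As an illustration of how the two barcode layouts can be told apart, the sketch below converts each layout into a regular expression and tests a read against both. It is not the pipeline used in this study: the helper names `pattern_to_regex` and `classify_read` are hypothetical, N is assumed to stand for any of A, C, G or T, and labeling the CellTag-multi library as "v1" is an assumption.

```python
import re

# Barcode layouts from the text: runs of random bases (N) separated by fixed anchors.
CELLTAG_MULTI = "NNN" + "GT" + "NNN" + "CT" + "NNN" + "AG" + "NNN" + "TG" + "NNN" + "CA" + "NNN"
CELLTAG_MULTI_V0 = "NNNNN" + "GTA" + "NNNNN" + "CCT" + "NNNNN" + "ATC" + "NNNNN" + "GAT" + "NNNNN"


def pattern_to_regex(pattern):
    """Translate an N-containing layout into a compiled regex (assumes N = A, C, G or T)."""
    return re.compile("".join("[ACGT]" if base == "N" else base for base in pattern))


V1_RE = pattern_to_regex(CELLTAG_MULTI)
V0_RE = pattern_to_regex(CELLTAG_MULTI_V0)


def classify_read(read):
    """Hypothetical helper: report which library layout, if any, occurs in a read."""
    if V0_RE.search(read):  # the longer v0 layout is checked first
        return "v0"
    if V1_RE.search(read):
        return "v1"
    return "none"
```

A production parser would also have to handle sequencing errors and reverse-complemented reads, which this sketch ignores.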

Reads matching v0 or v1 CellTags were parsed from the mixed species single-cell aligned BAM file. Each cell barcode was assigned to one of four categories (human, mouse, doublet or noncell) based on CellRanger-ATAC species assignments, and the distribution of v0 and v1 reads was assessed across the four categories. Cells with fewer than two CellTag reads across both libraries were discarded, and the remaining cells were plotted on a species-mixing plot. We quantified interspecies cross-talk of CellTags by calculating the percentage of cells with at least two CellTag reads in which less than 95% of CellTag reads originated from the correct, species-specific CellTag library.
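The cross-talk calculation can be sketched as follows. This is a minimal illustration rather than the analysis code: the function name and input format are hypothetical, and it assumes that only barcodes called as human or mouse are scored, with human cells expected to carry v0 reads and mouse cells v1 reads.

```python
def crosstalk_percent(cells, min_reads=2, purity=0.95):
    """Percent of scored cells whose CellTag reads are < `purity` from the expected library.

    cells: iterable of (species, v0_reads, v1_reads) tuples.
    Assumption: human cells should carry v0 reads and mouse cells v1 reads;
    doublet and noncell barcodes are skipped.
    """
    kept = 0
    flagged = 0
    for species, v0_reads, v1_reads in cells:
        if species not in ("human", "mouse"):
            continue
        total = v0_reads + v1_reads
        if total < min_reads:  # discard cells with fewer than two CellTag reads
            continue
        correct = v0_reads if species == "human" else v1_reads
        kept += 1
        if correct / total < purity:
            flagged += 1
    return 100.0 * flagged / kept if kept else 0.0
```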

Assessing the effect of isRT on chromatin accessibility signal

We compared the effect of introducing an isRT step on scATAC-seq data quality. For this, two single-cell ATAC libraries were prepared with CellTagged HEK 293T cells using either the original 10X Genomics scATAC library preparation protocol (Original) or our modified method (Modified). Sequencing data from both were processed with ArchR74, dimensionally reduced using latent semantic indexing, clustered using Louvain clustering, and peaks were identified across samples. Normalized peak counts (counts per million) were calculated for each sample and plotted on a scatterplot, and the Pearson correlation coefficient was calculated to quantify the similarity between the genome-wide accessibility signal of the two samples.
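The normalization and correlation step can be illustrated with a short sketch. The function names are hypothetical and the inputs are assumed to be per-peak read counts for the same ordered peak set in both samples; the study itself used ArchR for processing.

```python
import math


def cpm(counts):
    """Scale raw per-peak counts to counts per million."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy example: the second sample is sequenced twice as deeply but has the same profile.
original = [10, 20, 30, 40]
modified = [20, 40, 60, 80]
r = pearson(cpm(original), cpm(modified))
```

Because counts per million removes the depth difference, the two toy profiles become identical after normalization.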

Analysis of clones in expanded reprogramming fibroblasts

A subset of the data obtained from our reprogramming dataset (described in Section 3) from days 12 and 21 was used for this analysis. Clones were identified following the workflow described in Supplementary Methods. CellTag abundance was calculated for each CellTag as the percent of cells containing that CellTag after filtering and binarization. Browser tracks depicting single-cell accessibility fragments were plotted using ArchR. Gene expression and gene score values were averaged on a clonal level. Spearman correlation coefficients were calculated between clonal gene expression and gene score both within (intraclonal) and across clones (interclonal).

Comparison of scRNA-seq, scATAC-seq and 10X multiome-based CellTag capture

The 10X Genomics RNA + ATAC Multiome libraries were prepared from reprogrammed cells from day 21 of replicate two of our reprogramming datasets (Section 3) and compared to a similar number of day 21 cells profiled with scRNA-seq and scATAC-seq for the same replicate. For multiome samples, CellTag amplicon libraries were obtained using cDNA generated during the scRNA part of the library prep (Supplementary Methods, CellTag-RNA PCR), but with 15 cycles of sample index PCR instead of the standard 11, and sequenced on a NextSeq-500. Multiome CellTag reads were processed exactly like scRNA-seq CellTag reads. CellTag library complexity was calculated as the total number of unique cell barcode–UMI–CellTag barcode combinations detected in each CellTag amplicon library. This analysis was omitted for scATAC-seq CellTag reads because they lack UMIs. To compare sequencing quality metrics (fraction of reads in peaks and percent mitochondrial reads), multiome, scATAC-seq and scRNA-seq data were downsampled to an equal sequencing depth per cell.

Section 2

Lineage tracing during in vitro mouse hematopoiesis

LSK cells were purified as described above and counted, and 5,500 cells were added to a 96-well U-bottom suspension culture plate (GenClone, 25-224) and allowed to recover at 37 °C for 2 h in broad myeloid differentiation media12 consisting of serum-free expansion medium (STEMCELL), penicillin–streptomycin (Pen–Strep), interleukin (IL)-3 (PeproTech, 213-13; 20 ng ml−1), FLT3-L (PeproTech, 250-31L; 50 ng ml−1), IL-11 (PeproTech, 220-11; 50 ng ml−1), IL-5 (PeproTech, 215-15; 10 ng ml−1), erythropoietin (PeproTech, 100-64; 3 U ml−1), thrombopoietin (PeproTech, 315-14; 50 ng ml−1), mouse stem cell factor (R&D Systems, Q78ED8; 50 ng ml−1) and IL-6 (R&D Systems, 406-ML-005; 10 ng ml−1).

To allow clone tracking, cells were transduced for 2 d with 10 µl of concentrated CellTag-multi virus (~25 k unique CellTag sequences) in 100 µl differentiation media, in the presence of 6 µg ml−1 diethylaminoethyl–Dextran after spin-fection at 800g for 90 min at 37 °C. Sixty hours (2.5 d) after the start of the experiment, 50% of the cells were collected for single-cell profiling. The remaining cells were split into two technical replicates and replated in fresh differentiation media. Finally, all the cells were collected on day 5 for single-cell profiling. At each time point, cells for single-cell profiling were split equally between scRNA-seq (single-index v3 kit) and scATAC-seq (v1 kit) with modifications to capture CellTags (Supplementary Methods). RNA libraries were sequenced on an Illumina NovaSeq-6000 and computationally dehopped. RNA CellTag amplicons were sequenced on an Illumina NextSeq-500. CellTag and transcriptome read files for each sample were processed together using CellRanger, using a custom mm10 reference containing GFP, to produce one BAM file per sample. ATAC libraries containing both accessible chromatin and CellTag fragments were sequenced on an Illumina NextSeq-500 and processed using CellRanger-ATAC, using the default mm10 reference genome. Aligned BAM files from both modalities were used for CellTag processing75, and other CellRanger and CellRanger-ATAC outputs were used for downstream single-cell analyses.

Basic single-cell and clonal analysis of the hematopoiesis dataset

scRNA-seq count matrices were processed using Seurat. Low-quality cells with high mitochondrial reads, low UMIs and features per cell were removed, and the two time points were integrated using SCTransform, dimensionally reduced using principal component analysis (PCA) and clustered using Louvain clustering. Fragments files from scATAC-seq samples were processed using ArchR v1.0.1. Valid cell barcodes (from CellRanger-ATAC) passing default ArchR quality filters were retained. Cells were dimensionally reduced using iterative LSI and clustered using Louvain clustering. Cell types were annotated using known hematopoietic marker genes in scRNA-seq12. Cell-type labels were transferred to scATAC-seq cells using Seurat label transfer, and annotations were manually verified by inspecting the accessibility of marker genes (gene activity scores). For RNA–ATAC co-embedding, the scRNA-seq gene expression matrix and the imputed76 scATAC-seq gene score matrix were used as input to the RunCCA function in Seurat. A union set of the top 5,000 HVG from each dataset was used for this co-embedding.

For clone calling, the cell × CellTag UMI (for RNA) and read (for ATAC) count matrices were obtained. The RNA matrix was binarized at a threshold of >1 UMI count per cell, and cells with 2–25 CellTags were retained. The ATAC matrix was binarized at a threshold of >1 read count per cell, and cells with 1–25 CellTags were retained. The two filtered matrices were merged, and the cell–cell Jaccard similarity matrix was computed and thresholded at 0.6 (for cell pairs within the same modality) and 0.5 (for cell pairs across modalities). The final thresholded matrix was used to identify clones across the entire dataset. Clone-cell embedding was computed as described in Supplementary Methods, and ForceAtlas2 was used to jointly visualize clones and cells. For single-modality clonal analysis, cell × CellTag matrices for each modality were processed separately with the same thresholds as above. A Jaccard threshold of 0.5 was used for ATAC clone calling and 0.6 was used for RNA clone calling.
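The clone-calling steps above can be sketched in a few functions. This is an illustration under stated assumptions, not the workflow from the Supplementary Methods: the function names are hypothetical, a pair is linked when its Jaccard similarity is greater than or equal to the threshold, and clones are taken to be connected components of the resulting graph with at least two cells.

```python
from itertools import combinations


def binarize(counts, min_count=2):
    """counts: {cell: {celltag: count}}. Keep CellTags with count > 1."""
    return {cell: {tag for tag, n in tags.items() if n >= min_count}
            for cell, tags in counts.items()}


def filter_cells(signatures, lo, hi=25):
    """Retain cells carrying between lo and hi CellTags (inclusive)."""
    return {cell: tags for cell, tags in signatures.items() if lo <= len(tags) <= hi}


def jaccard(a, b):
    return len(a & b) / len(a | b)


def call_clones(signatures, modality, within=0.6, across=0.5):
    """Group cells into clones; assumes clones are connected components (union-find)."""
    parent = {cell: cell for cell in signatures}

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path compression
            cell = parent[cell]
        return cell

    for a, b in combinations(signatures, 2):
        cutoff = within if modality[a] == modality[b] else across
        if jaccard(signatures[a], signatures[b]) >= cutoff:
            parent[find(a)] = find(b)

    clones = {}
    for cell in signatures:
        clones.setdefault(find(cell), set()).add(cell)
    return [members for members in clones.values() if len(members) > 1]
```

The all-pairs loop is quadratic in the number of cells, so a real dataset would need a sparse-matrix formulation; the sketch only shows the logic of the thresholds.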

For homoplasy simulation, we used a population size of 5,500 cells, 1–25 CellTags/cell and an average MOI of 3.4. A total of 100 simulations were performed, and average values were reported.
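A homoplasy simulation of this kind can be sketched as below. The details are assumptions made for illustration and may differ from the procedure actually used: each founder cell draws a Poisson-distributed number of CellTags (mean equal to the MOI, redrawn until it falls in the retained range), CellTags are sampled uniformly from the library, and homoplasy is reported as the fraction of independently labeled cells whose full signature matches that of another cell.

```python
import math
import random
from collections import Counter


def sample_poisson(rng, mean):
    """Knuth's algorithm for a Poisson draw (adequate for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_homoplasy(n_cells, library_size, moi, lo=1, hi=25, seed=0):
    """Fraction of independently labeled cells that share an identical signature."""
    rng = random.Random(seed)
    hi = min(hi, library_size)  # a cell cannot carry more distinct tags than exist
    signatures = []
    for _ in range(n_cells):
        n_tags = sample_poisson(rng, moi)
        while not lo <= n_tags <= hi:  # redraw cells outside the retained range
            n_tags = sample_poisson(rng, moi)
        signatures.append(frozenset(rng.sample(range(library_size), n_tags)))
    seen = Counter(signatures)
    return sum(1 for s in signatures if seen[s] > 1) / n_cells
```

Averaging over repeated runs, for example `sum(simulate_homoplasy(5500, 25000, 3.4, seed=s) for s in range(100)) / 100`, mirrors the 100-simulation design described above, with the library size taken from the approximate figure of 25,000 unique CellTags.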

Inference of lineage hierarchies using scRNA and scATAC lineage data

Lineage hierarchies were obtained using CoSpar77. Objects were initialized using the cospar.pp.initialize_adata_object function, with PCA (for RNA) and LSI (for ATAC) as input embeddings. The corresponding clone tables were added to each object using the cospar.pp.get_X_clone function. RNA and ATAC transition maps were then computed using the cospar.tmap.infer_Tmap_from_multitime_clones function, and fate hierarchies were obtained using cospar.tl.fate_hierarchy for major hematopoietic fates, as indicated in Fig. 2c,d. Finally, CoSpar-inferred trees were converted to Cassiopeia78 objects, and the Robinson–Foulds (RF) distance metric was calculated using the cassiopeia.critique.robinson_foulds function.

To assess changes in inferred fate hierarchies at different dataset sizes, the RNA object was subsampled to 10,000, 20,000 or 40,000 cells. For each subset, fate hierarchies were inferred independently as described above, and the RF distance between the full-dataset tree and each subsampled-dataset tree was calculated.
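For intuition about the RF distance, the sketch below computes it for two small rooted trees written as nested tuples. It is a simplified rooted-clade variant written for illustration and is not the Cassiopeia implementation; the function names and the toy fate labels are hypothetical.

```python
def clades(tree):
    """Return the non-trivial clades (leaf sets of internal nodes) of a rooted tree."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these leaves
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two toy fate hierarchies that disagree on which fate pairs with 'Mono'.
tree_full = ((("Mono", "Neu"), "Baso"), "Ery")
tree_sub = ((("Mono", "Baso"), "Neu"), "Ery")
distance = robinson_foulds(tree_full, tree_sub)
```

A distance of zero means the two hierarchies contain exactly the same groupings; each differing grouping adds one to the count.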

State–fate linkage in hematopoiesis

To link cell state with fate, each clone was assigned a fate label based on the predominant fate among its day 5 siblings. Scarce lineages were grouped by similarity (Ery/Meg, Baso/Eos/Mast, DCs). Clones labeled as transitions or progenitors were excluded from the state–fate analysis, unless specified. Fate bias scores were determined as the percentage of day 5 siblings belonging to the annotated fate label.
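The fate label and fate bias score for a single clone can be computed as in this minimal sketch. The function name is hypothetical, and ties between equally common fates are broken by first occurrence, which is an assumption.

```python
from collections import Counter


def clone_fate(day5_fates):
    """Return (predominant fate, fate bias in percent) for one clone's day 5 siblings."""
    counts = Counter(day5_fates)
    fate, n = counts.most_common(1)[0]
    return fate, 100.0 * n / len(day5_fates)
```

For example, a clone with day 5 siblings annotated as three monocytes and one neutrophil would be labeled 'Mono' with a fate bias of 75%.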

Each clone was divided into up to four subclones based on the time point and assay of each sibling, and the clone-cell embedding was recalculated. The overlap between RNA and ATAC subclones across the two single-cell modalities was assessed within each ‘fate potential’ group using the Wasserstein distance metric with a 30-dimensional UMAP-based embedding of the subclone nodes.

To evaluate if state subclones closer to the periphery of a ‘fate potential’ group exhibited less fate bias, we introduced a closeness metric. This metric measures the minimum distance of a state subclone from the centroid of an alternative fate potential group. A higher closeness metric indicates that a state subclone is further away from the centroids of other fate potential groups. We then plotted the relationship between the closeness metric and fate bias using a percentile plot, with the x axis representing the percentile rank for the closeness metric and the y axis showing the mean fate bias scores for state subclones passing that percentile rank.
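One way to read the closeness metric is sketched below. The function names and toy coordinates are hypothetical, and Euclidean distance in the embedding is assumed.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def closeness(subclone, own_group, groups):
    """Minimum distance from a subclone to the centroid of any other fate potential group.

    groups: {fate: [embedding coordinates of that group's subclones]}.
    """
    return min(math.dist(subclone, centroid(points))
               for fate, points in groups.items() if fate != own_group)


groups = {
    "Mono": [(0, 0), (2, 0)],
    "Neu": [(10, 0), (10, 2)],
    "Ery": [(0, 8), (0, 10)],
}
score = closeness((0, 0), "Mono", groups)
```

In this toy layout the nearest alternative centroid is that of the 'Ery' group at (0, 9), so the score is 9.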

To characterize functional priming of cell state, day 2.5 state siblings in each fate potential group were compared to the rest in gene expression and TF activity space. For scRNA-seq features, we used residuals obtained for the top 3,000 HVG after SCTransform normalization in Seurat. For scATAC-seq features, we used chromVAR-derived TF activity z scores (default mouse motif set in ArchR; 884 TF motifs). Correction for multiple hypothesis testing was performed using the Benjamini–Hochberg method, setting the FDR threshold for significance at 0.05, unless otherwise specified. Additionally, ‘biological process’ GO term enrichment analysis was performed for the top 100 gene markers for each fate potential group using the PANTHER classification system79 (release 17.0; http://geneontology.org/), and terms with FDR < 0.01 were reported in Supplementary Table 4.
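The Benjamini–Hochberg correction can be written in a few lines. This is a generic sketch of the standard step-up procedure with a hypothetical function name, not the code used for the analysis.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in adjusted]
```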

Fate prediction from cell state using machine learning

We performed state–fate machine learning to predict cell fate from the early state. A machine-learning classifier used single-cell features X of day 2.5 cells to predict discrete clonal fate labels (for example, ‘progenitor’, ‘monocyte’ and ‘neutrophil’). For RNA only, we used residuals of the top 3,000 genes. For ATAC only, we used TF activity z scores (k-nn imputation with k = 20). For RNA + ATAC, we paired siblings and used combined features. Repeated Stratified k-fold cross-validation (n_splits = 5, n_repeats = 5) was used for analysis, resulting in 25 accuracy/weighted F1 score values. Results are depicted using boxplots.

For each machine-learning task, we tested a panel of classifier architectures: logistic regression, LightGBM and random forest. Each was trained and evaluated using the procedure described above. Hyperparameter tuning was performed for each architecture, and the following values were tested:

Random Forest: n_estimators: [100, 300, 1000], max_depth: [10, 50, None], min_samples_leaf: [1,2,4], bootstrap: [True, False]

LightGBM: num_leaves: [7,15,31,80], max_depth: [5,9,30], min_data_in_leaf: [20,40,80], bagging_fraction: [0.8,1], bagging_freq: [3], feature_fraction: [0.1, 0.9]

Logistic Regression: penalty: ['l2', 'none'], C: np.logspace(-4, 4, 20), solver: ['lbfgs', 'newton-cg', 'saga'], max_iter: [1000]

The Python library ‘scikit-learn’ was used for all machine-learning analysis.
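The evaluation scheme described in this section (5 splits × 5 repeats, giving 25 accuracy and weighted F1 values) can be sketched with scikit-learn as follows. The data here are synthetic and the `evaluate` helper is hypothetical; only a default logistic regression is shown, without the hyperparameter search.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate


def evaluate(X, y, seed=0):
    """Repeated stratified 5x5 cross-validation; returns 25 accuracy and 25 weighted F1 values."""
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=seed)
    scores = cross_validate(
        LogisticRegression(max_iter=1000),
        X,
        y,
        cv=cv,
        scoring=["accuracy", "f1_weighted"],
    )
    return scores["test_accuracy"], scores["test_f1_weighted"]


# Synthetic stand-in for day 2.5 features (rows = cells) and clonal fate labels.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 5)) + y[:, None]
accuracy, weighted_f1 = evaluate(X, y)
```

Stratification keeps the proportion of each fate label roughly constant across folds, which matters when some fates are rare.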

Fate prediction using TF activities derived from distal, intronic, exonic and promoter peak sets

ATAC peaks were categorized (intronic, exonic, promoter or distal) using default ArchR definitions. TF activity scores were calculated for each peak set independently and used for the state–fate prediction as described before. To test if performance variation was due to different peak numbers, all sets were randomly subsampled to 8,823 peaks (exonic set size), and state–fate prediction was done using these new scores.

SHAP analysis

The ‘SHAP’ Python package was used for SHAP analysis to interpret trained machine-learning models. SHAP values were calculated using the TreeExplainer function from the package for trained random forest models. For each input feature and fate label, SHAP values were computed using each data point in the 25 test sets (n_splits × n_repeats). Because each data point appears in one test set per repeat, this resulted in 5 SHAP values per data point per feature, which were averaged to smooth out any outliers caused by model training artifacts.

Feature importance scores were then determined for each input feature regarding the prediction of each fate label by calculating the mean of absolute SHAP values for each feature-fate combination. To identify features positively or negatively correlated with the prediction of a fate label, SHAP correlation was performed. For each input feature, the Pearson correlation coefficient between its values (expression/TF activity) and its SHAP values for a specific fate was computed, resulting in one correlation value per feature per fate.
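Given SHAP values that have already been computed, the two summaries described above reduce to a mean of absolute values and a Pearson correlation. The sketch below uses made-up numbers and hypothetical function names, and it does not call the SHAP package itself; the gene names are placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def shap_summary(feature_values, shap_values):
    """For one fate label, return {feature: (mean |SHAP|, SHAP correlation)}.

    Both inputs map a feature name to one value per cell, in the same cell order.
    """
    summary = {}
    for feature, values in feature_values.items():
        shap = shap_values[feature]
        importance = sum(abs(v) for v in shap) / len(shap)
        summary[feature] = (importance, pearson(values, shap))
    return summary


# Toy values: high 'GeneA' pushes predictions toward the fate, high 'GeneB' away from it.
feature_values = {"GeneA": [0, 1, 2, 3], "GeneB": [0, 1, 2, 3]}
shap_values = {"GeneA": [-0.2, -0.1, 0.1, 0.2], "GeneB": [0.3, 0.1, -0.1, -0.3]}
summary = shap_summary(feature_values, shap_values)
```

A positive correlation marks a feature whose higher values favor the fate, and a negative correlation marks one whose higher values disfavor it, independently of how important the feature is overall.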

Section 3

Lineage tracing during iEP reprogramming

Cryo-preserved P0 MEFs were thawed and seeded on 0.1% gelatin-coated six-well plates, in DMEM supplemented with 10% FBS, 2 mM l-glutamine and 50 mM β-mercaptoethanol (Life Technologies) and Pen–Strep at a density of 30,000 cells per well. After overnight recovery at 37 °C, cells were transduced every 12 h for 2 d, with fresh Hnf4α-T2A-Foxa1 retrovirus in the presence of 4 μg ml−1 protamine sulfate (Sigma-Aldrich). During the last round of transduction, the retroviral mixture was supplemented with the CellTag-multi lentiviral library to initiate clone tracking. On day 0 of reprogramming, cell culture media was changed to hepato-medium (DMEM:F-12, supplemented with 10% FBS, 1 μg ml−1 insulin (Sigma-Aldrich), 100 nM dexamethasone (Sigma-Aldrich), 10 mM nicotinamide (Sigma-Aldrich), 2 mM l-glutamine, 50 mM β-mercaptoethanol (Life Technologies) and Pen–Strep, containing 20 ng ml−1 epidermal growth factor (Sigma-Aldrich)). After 72 h (day 3 of reprogramming), cells were dissociated, two-thirds of the cells were collected for single-cell sequencing and the remaining cells were replated on six-well plates coated with 5 μg cm−2 type I rat collagen (Gibco, A1048301). Two additional samples were collected on days 12 and 21 for single-cell sequencing. We used the 10X Genomics 3′ scRNA kit (v3.1; dual index) and the scATAC kit (v1.1) for single-cell profiling. This experiment was performed in two biological replicates.

CellTag PCR was performed for all scRNA-seq and scATAC-seq libraries, as described in Supplementary Methods. scRNA-seq and scATAC-seq libraries were sequenced on an Illumina NovaSeq-6000. CellTag amplicon libraries were sequenced on an Illumina NextSeq-500 to avoid any index hopping-related artifacts.

Basic single-cell and clonal analysis of the direct reprogramming dataset

scRNA-seq count matrices were processed using Seurat. Low-quality cells with high mitochondrial reads, low UMIs and features per cell were removed, and all time points and biological replicates were integrated, dimensionally reduced using PCA and clustered using Louvain clustering. Single-cell identity scores were obtained using Capybara, with fibroblast (MEF), reprogrammed-cell and dead-end trajectory references from a previous dataset7. Cells from days 12 and 21 were subsetted, reclustered and annotated as ‘reprogrammed’, ‘dead-end’ or ‘transition’ based on these cell identity scores and marker gene expression. scATAC cells were processed exactly as for the LSK dataset. Cells were annotated as ‘reprogrammed’, ‘dead-end’ or ‘transition’ based on marker gene accessibility. For RNA–ATAC co-embedding, the scRNA-seq gene expression matrix and the imputed76 scATAC-seq gene score matrix were used as input to the RunCCA function in Seurat. A union set of the top 2,000 HVG from each dataset was used for this co-embedding.

For clone calling, cell × CellTag UMI (for RNA) and read (for ATAC) count matrices were obtained for each modality. The RNA matrix was binarized at a threshold of more than one UMI count, and cells with 1–25 CellTags were retained. The ATAC matrix was binarized at a threshold of more than one read count, and cells with 1–25 CellTags were retained. To reduce false-positive rates, highly abundant single-CellTag signatures (single-CellTag signatures that were also present in multi-CellTag signatures) were removed from our analysis. The two filtered matrices were merged, and the cell–cell Jaccard similarity matrix was computed and thresholded at 0.6. The final thresholded matrix was used to identify clones across the entire dataset. Clone-cell embedding was computed (Supplementary Methods), and the UMAP algorithm was used to jointly visualize clones and cells.

For homoplasy simulation, we used a population size of 30,000 cells, 1–25 CellTags/cell and an average MOI of 2.25. Consistent with our clonal analysis, simulated single-CellTag signatures that were also present in simulated multi-CellTag signatures were excluded from homoplasy analysis. A total of 100 simulations were performed, and average values were reported. True/observed rate of homoplasy was calculated by comparing CellTag signatures of single cells across the two biological replicates.

State–fate analysis for the direct reprogramming dataset

Clones were annotated with one of the following three fates: reprogrammed, transition or dead-end, based on the modal cell type among fate siblings. Clonal fate bias scores were calculated as the percentage of fate siblings (days 12 and 21) belonging to the annotated fate label. Alluvial plots were constructed using the ggalluvial R package. State–fate machine-learning analysis was performed as in the ‘Fate prediction from cell state using machine learning’ section to predict ‘reprogrammed’ or ‘dead-end’ fates.

CellRank analysis was performed on a 40,000-cell subset of the scRNA-seq dataset due to scalability limitations. For feature enrichment analysis, day 3 siblings in state–fate clones were grouped by fate. Both on-target and off-target cell groups were expanded using k-nearest neighbors (k = 5) for peak and TF activity comparisons. TF activity results were further refined by discarding TFs with low gene score-TF activity correlation (<0.3). Motif enrichment analysis was performed using the HOMER package80 on on-target and off-target DERs with MEF DERs as background. Genomic regions annotated as ‘dELS’, ‘pELS’, ‘dELS,CTCF-bound’ or ‘pELS,CTCF-bound’ in the SCREEN database42 were used for enrichment analysis.

The FigR41 package was used for peak-to-gene linkage analysis. Optimal matching was used to pair RNA and ATAC cells from the same time points, followed by the runGenePeakcorr function to identify peak–gene pairs. Peak–gene pairs with an adjusted P value < 0.05 were retained. FOXA1 and HNF4A chromatin immunoprecipitation followed by sequencing (ChIP–seq) peaks from day 2 of reprogramming were obtained53 and added as custom annotations in ArchR. Single-cell accessibility z scores for each peak set were computed using the addDeviationsMatrix function in ArchR.

Computational analysis related to ZFP281 motifs

Tomtom analysis56 from the MEME-ChIP package was used to find highly similar motifs to Zfp281. The Zfp281 position frequency matrix was obtained from ArchR and used as input to the Tomtom web interface. Highly correlated TF motifs with q value less than 0.05 were obtained, and these were further subsetted for TF activities enriched in off-target destined cells, resulting in a total of four TF motifs for comparison with Zfp281. ZFP281 ChIP–seq peaks were obtained54, and single-cell accessibility z scores were computed using the addDeviationsMatrix function in ArchR. ZFP281 gene targets57 were used as inputs for a state–fate prediction model, which was trained and evaluated as described above and compared to a size-matched set of random genes.

Plasmid cloning related to Foxd2 and Zfp281 experiments

The nontargeting shRNA construct was obtained from Sigma-Aldrich (SHC202; pLKO.5-puro Control Plasmid). The Zfp281-targeting shRNA was obtained from Sigma-Aldrich (clone ID: TRCN0000255746) and cloned into the pLKO.5-puro lentiviral construct (Sigma-Aldrich, SHC201). For overexpression (OE), cDNA fragments were cloned into the pGCDNsam retroviral construct. Zfp281 cDNA was obtained from OriGene (MC205914), and Foxd2 cDNA was reverse transcribed from RNA obtained from long-term iEP cells.

Reprogramming with Foxd2 and Zfp281 perturbations

Reprogramming was performed as described above, with the following modifications. For OE, cells were transduced with a 1:1 mixture of Foxd2/Zfp281 retrovirus and Hnf4α–Foxa1 reprogramming retrovirus every 12 h for 2 d. Control cells were transduced with a 1:1 mixture of a GFP control retrovirus and Hnf4α–Foxa1 reprogramming retrovirus for the same amount of time. For knockdown (KD), cells were transduced with the nontargeting control/Zfp281–shRNA lentivirus every 12 h for 1 d after the 2-d Hnf4α–Foxa1 retroviral transduction was completed.

Single-cell analysis for Foxd2 and Zfp281 experiments

scRNA-seq libraries were prepared for all four samples (Zfp281 OE, OE control, Zfp281 KD and KD control) and sequenced on a NextSeq-500. Count matrices were generated and integrated using CellRanger count and aggr commands and processed using Seurat. Quality filtering was performed to remove cells with high mitochondrial reads and low UMIs and genes per cell. Cells were dimensionally reduced using PCA, cell cycle regressed, clustered using Louvain clustering and visualized using UMAP. Capybara identity scores were calculated as described in the ‘Basic single-cell and clonal analysis of the direct reprogramming dataset’ section above. Markers for each lineage across time points and uninduced MEFs were obtained (log2(fold change) > 0.7, adjusted P < 0.05) and used for gene module scoring for all four samples. Cell clusters enriched with on-target or off-target markers were annotated with the respective fates, and GO analysis was performed as described above (‘State–fate linkage in hematopoiesis’).

Spectra analysis for signaling pathways

Mouse-specific ligand–receptor pairs for each pathway were downloaded from the CellChat database. The top 25 genes positively associated with TGF-β signaling from the PROGENy81 database were also obtained. These gene lists were provided as global gene sets in Spectra. For cluster-specific factor fitting, seven gene lists enriched along the on-target and off-target reprogramming lineages at each time point and uninduced MEFs were used. Spectra model fitting was done with λ = 0.01, and resulting factor lists were compared to input gene lists to identify a BMP signaling factor and an activin/nodal/TGF-β signaling factor.

Colony formation assays

Colony formation assays were performed as previously described7. Reprogramming cells were seeded at low plating density in collagen-coated six-well plates within the first 4 d and allowed to form colonies over 2 weeks of reprogramming. Following this, cells were fixed using 4% paraformaldehyde, permeabilized using 0.1% Triton-X and processed for CDH1 (E-cadherin) staining using the VIP peroxidase substrate kit (Vector Laboratories, SK4600) and anti-mouse E-cadherin primary antibody (BD Biosciences; 1:100). Stained colonies were imaged using a flatbed scanner and quantified using the following script: https://github.com/morris-lab/Colony-counter.

Quantitative PCR and analysis

Cells were collected for RNA extraction (RNeasy kit; Qiagen) on day 12 of reprogramming, and RNA was reverse transcribed using the Maxima RT kit (Thermo Fisher Scientific, K1672). A total of 20 ng of reverse-transcribed RNA was mixed with TaqMan Gene Expression Master Mix (Thermo Fisher Scientific) and gene-specific TaqMan probes (Supplementary Table 11) in a 20 µl reaction volume and processed according to the manufacturer’s instructions (4371135) on the StepOne Plus qPCR system. Per-gene fold change for Foxd2-overexpressing cells was calculated relative to control reprogramming cells (Hnf4α–Foxa1 and GFP control OE) that were processed in parallel, after normalization to the housekeeping gene, Actb.
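A fold change of this kind is commonly computed with the comparative Ct (2^−ΔΔCt) method. The sketch below assumes that method and 100% amplification efficiency; the text does not state the exact formula used, and the function name and Ct values are hypothetical.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the comparative Ct method (assumed, not stated in the text).

    ct_ref_* are Ct values of the housekeeping gene (here, Actb) in each condition.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize to housekeeping gene
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Toy example: the target gene crosses threshold two cycles earlier in the sample.
example = fold_change(24.0, 18.0, 26.0, 18.0)
```

In the toy example the target is detected two cycles earlier in the sample at equal Actb levels, which corresponds to a fourfold increase.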

Reprogramming with activin/nodal/TGF-β signaling inhibition

Cells were reprogrammed as previously described. They were cultured in hepatic media with 2.6 µM SB431542 (STEMCELL, 72232) from day 0, changing the media every 2 d. On day 5, cells were collected for qPCR analysis and processed accordingly. Additionally, colony formation assays were conducted following the procedure described above.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.