Introduction

The diploid eukaryotic genome requires precise spatiotemporal regulation during development and homeostasis1,2,3. Autosomal gene transcription can occur in three states: bi-allelic; deterministic monoallelic (DeMA), in which the allele selection is predetermined (e.g., imprinted genes); and random autosomal monoallelic (RaMA), in which the allele selection is not predetermined and each allele has an equal chance of becoming the active or the inactive allele4. Alteration of the homeostatic expression from the selected allele of a DeMA or RaMA gene may result in allelic heterogeneity, which may partly explain the variable penetrance of disease-associated mutations in monoallelic genes5. Hence, extensive investigation into the establishment and mechanism of monoallelic gene expression can provide insights into development, homeostasis, and disease. RaMA gene expression is widely established in early mammalian development6,7,8,9,10. This gene expression paradigm has been explored using bulk cell transcriptomic data generated from in vitro F1 mouse embryonic stem cell (mESC) derived neural progenitor cells7,8; however, it has not been extensively investigated at single-cell resolution in cells or tissues committed to different lineage differentiation paradigms. Sex-biased autosomal gene expression has recently been reported, but allele-exclusive expression has not yet received much attention11,12. The regulatory bias based on the alleles’ origin and genetic background in autosomal monoallelic genes has been overlooked, as dogma implies that there is no distinction between maternal and paternal alleles based on the similarity of the nucleic acid chemistry. Thus, we sought to investigate the allele-specific expression profiles of monoallelic genes in cells of the cardiac lineage.

Here we show the importance of allele-specific monoallelic gene regulation in cardiac cells. In addition, we find epigenetic differences between deterministic and random autosomal monoallelically expressed genes. Further, we find a greater contribution of the maternal than the paternal allele in cardiac development, homeostasis, and disease, highlighting the importance of maternal influence on male cardiac tissue homeostasis. Taken together, we emphasize the importance of allele-specific insights into gene regulation.

Results

Allele-bias of DeMA gene expression

The establishment of monoallelically expressed genes in cardiac lineage cell types is currently unknown. We initially implemented an allele-specific bulk-RNA-seq approach on highly morphogenic (C57Bl/6J-maternal × CAST/EiJ-paternal) male F1 mouse embryonic stem cells (mESCs) derived from three blastocysts and on heterogeneous cardiac precursor cell cultures (CPCCs) derived in vitro from those three F1 mESC clones. CPCCs are predominantly enriched with cardiac stem cells (CSCs) and cardiac progenitor cells (CPCs; Fig. 1a and Supplementary Fig. 1a–c), but also contain an additional three cell types, including Mef2c positive cardiac precursor cells (Fig. 2c and Supplementary Fig. 2b). Since the CPCCs contain lineage committed naive and fully differentiated CSCs and CPCs, these cultures exhibit both a transcriptional profile similar to mESCs and a transcriptional profile specific to cardiac lineage cells (Fig. 1b and Supplementary Fig. 1c, d). To examine the parental origin of the transcripts, we systematically implemented a rigorous allele-specific gene filtering algorithm13 so that the significantly expressed autosomal genes from each parental background could be assigned to deterministic maternal (mat-mono), deterministic paternal (pat-mono), and bi-allelically expressed gene cohorts (Fig. 1c, d and Supplementary Fig. 1e). In this rigorous filtering approach, we only considered a gene to be a DeMA gene if it was solely expressed from one parental background in all three clonal biological replicates. The intersection of all mESC and CPCC mat-mono and pat-mono genes showed no allele switching of expressed genes between the cell types (except for three outliers) (Fig. 1e). Interestingly, during mESC to CPCC differentiation, we observed that 994 genes monoallelically expressed in mESCs had changed to biallelic status and 468 genes biallelically expressed in mESCs had changed to monoallelic status (Supplementary Fig. 1f, g). Of the 1496 monoallelic genes in CPCCs, 266 newly identified genes function in biological processes relevant to cardiac lineage cells (Supplementary Fig. 1h), indicating the biological relevance of DeMA gene establishment in cardiac lineage cells.
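The replicate-consistency rule described above (a gene counts as DeMA only if it is expressed solely from one parental allele in all three clonal replicates) can be sketched as follows. This is a minimal illustration and not the filtering algorithm of ref. 13: the function names, the `MIN_READS` threshold, and the handling of biallelic and inconsistent genes are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical thresholds; this section of the paper does not state them.
MIN_READS = 10        # minimum allele-assigned reads to call a gene expressed
MONO_FRACTION = 1.0   # "solely expressed": all allelic reads from one parent


def classify_replicate(mat_reads: int, pat_reads: int) -> str:
    """Call the allelic state of one gene in one replicate."""
    total = mat_reads + pat_reads
    if total < MIN_READS:
        return "not_expressed"
    if mat_reads / total >= MONO_FRACTION:
        return "mat-mono"
    if pat_reads / total >= MONO_FRACTION:
        return "pat-mono"
    return "biallelic"


def classify_gene(replicates: List[Tuple[int, int]]) -> str:
    """Keep a call only when every replicate agrees.

    `replicates` holds (maternal_reads, paternal_reads) per clone.
    Genes with mixed calls are reported as "inconsistent" (an assumption).
    """
    calls = {classify_replicate(m, p) for m, p in replicates}
    if len(calls) == 1:
        return calls.pop()
    return "inconsistent"


# Toy counts for three clones (illustrative values only).
print(classify_gene([(50, 0), (30, 0), (12, 0)]))   # same parent in all clones
print(classify_gene([(50, 0), (0, 30), (12, 0)]))   # parent differs between clones
```

Requiring agreement across all replicates is deliberately conservative: a gene that is monoallelic in only one or two clones is excluded from the deterministic cohorts rather than assigned to them.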

Fig. 1: Allelic expression of autosomal monoallelic genes.

a F1 mESCs grown in 2i medium and in vitro derived CPCC cultures. Representative images from three independent biological replicates are shown. Scale bar: 100 μm. b Hierarchical cluster heat map of F1 mESCs and CPCCs. Gene-wise dispersion by conditional maximum likelihood (Log2(CPM)) was assessed using the EdgeR pipeline, setting the cut-off z-score at four. Bulk-RNA-seq was performed with biological triplicates. c, d Volcano plots showing maternally and paternally expressed genes in mESCs and CPCCs. Log2 fold changes on the (-)X axis are deterministic mat-mono genes (red dotted line area) and those on the (+)X axis are deterministic pat-mono genes (blue dotted line area). Differential gene expression analysis was performed using the EdgeR pipeline. e Intersects of mESC and CPCC monoallelic genes. Some allele-specific monoallelic genes are common between mESCs and CPCCs; the remainder are unique to each cell type and do not switch alleles (except three outliers). f Mat-mono and pat-mono establishment is imbalanced in both mESCs and CPCCs. The numbers of genes with actively transcribing versus mature transcripts differ. g, h Intersects of actively transcribing genes versus genes with mature transcripts. A higher proportion of genes contain either only mature transcripts or only nascent transcripts. i qRT-PCR analysis of representative actively transcribing monoallelic genes lacking mature transcripts. Nascent and mature transcripts were assessed using a Click-iT™ assay, and the mature transcript control was assessed on the total RNA used in the Click-iT™ assay. A single technical replicate from each of the three biological replicates (n = 3) was used in the qRT-PCR analysis. Nascent transcript expression was normalized to nascent Actb transcript expression and mature transcript expression was normalized to mature Actb transcript expression. Source data 1-8.
Abbreviations: ‘-mat’: maternally expressed genes, ‘-pat’: paternally expressed genes, Bi & RaMA: biallelic and random autosomal monoallelic genes.

Fig. 2: Allele expression in cardiac cell types and allele-specific differential epigenetic regulation of monoallelic genes.

a Pseudo-time trajectory analysis for mESC, CPCC, and CB scRNA-seq data. The trajectory shows the linear-lineage correlation of the three experimental points. State 1: mainly mESCs; States 2, 3, and 4: mainly CPCCs; State 5: mainly CBs. Nodes of the trajectory indicate the terminating points between the experimental points. b–d Uniform Manifold Approximation and Projection (UMAP) based systematic cell cluster analysis for mESC, CPCC, and CB scRNA-seq data points. UMAP plots show three clusters for mESCs, five for CPCCs (Cluster 0: CSC, Cluster 2: CPC), and four for CBs (Cluster 0: CF, Cluster 1: CM, Cluster 2: EC, Cluster 3: PE). e Number of bi-allelic and DeMA genes expressed in seven cell types. For each cell type, the numbers of bi-allelic and DeMA genes expressed are shown for two transcript categories: mature transcripts and nascent transcripts. f Intersects of all maternal and paternal monoallelic genes in cardiac lineage cell types. A significantly higher number of mat-mono genes than pat-mono genes is shared between all cardiac cell types. Allele switching between maternal and paternal backgrounds occurred less commonly; 100 intersects are shown. Nodes show the size of the intersected gene set and the lines show the cell types that share the intersected genes. g Histone signatures of mat-mono genes. Histone mark enrichment is represented as the combined score, -Log10(adjP-value), and -Log10(P-value) for each signature. The P-value is computed from the Fisher exact test assuming binomial distribution and independence for the probability of any gene belonging to any set. The combined score was computed by taking the log of the P-value from the Fisher exact test and multiplying it by the z-score of the deviation from the expected rank. The Benjamini–Hochberg method was used to correct for multiple hypothesis testing when computing the adjusted P-value. h Schematic representation of histone mark association for DeMA gene alleles. Maternal alleles: purple. Paternal alleles: blue.
Abbreviations: ‘-mat’: maternally expressed genes, ‘-pat’: paternally expressed genes, Bi & RaMA: biallelic and random autosomal monoallelic genes. Source Data 9-36, 46-57.

To investigate whether distinct maternal versus paternal DeMA gene expression is due to the parent-of-origin of the alleles or the alleles’ genetic background, we re-analyzed three publicly available RNA-seq data sets (two mESC bulk RNA-seq: GSM240589714, one bulk RNA-seq from 32-cell blastocysts: GSE15210315 and one scRNA-seq data set from 32-cell blastocysts: GSE8081016) generated from C57BL/6J × CAST/EiJ reciprocal crosses. Although the number of genes expressed in the same cell types in the reciprocal crosses is the same, we found that the imbalance in DeMA gene expression between maternal and paternal alleles is due to the genetic background of the parental alleles. We found that the C57BL/6J allele is preferentially expressed in in vitro mESCs and uncultured cells from blastocysts, which, to our knowledge, has not previously been reported in studies using crosses between different genetic backgrounds (Supplementary Fig. 1i–m).

We observed that in mESCs and CPCCs, the number of mat-mono genes expressed was 5.5-fold and 5.0-fold higher than the number of pat-mono genes, respectively (Fig. 1f). To test whether this imbalanced allelic expression can be explained by differential transcription bursting frequencies in each genetic background, we assessed the active transcription status of the maternal and paternal alleles in our RNA-seq data in silico. We observed in both mESCs and CPCCs that the number of actively transcribed mat-mono genes is lower than the total mat-mono gene number (Fig. 1f) and that no nascent transcripts were detected for a subset of mature mat-mono mRNAs (ESCs: 88.2% and CPCCs: 85.7% from the total RNAs; Fig. 1g, h), perhaps due to those genes’ longer time-lag between transcriptional bursts. Conversely, a significant number of pat-mono actively transcribing genes were lacking, or exhibited a significantly reduced number of, mature transcripts (ESCs: 79.5% and CPCCs: 79.3% from the total active RNAs; Fig. 1g, h). To validate this unexpected observation, we performed nascent and mature RNA specific qRT-PCR for several candidate genes (Fig. 1i). qRT-PCR results were in accordance with our in silico observation and thus we suggest this is possibly due to delayed pre-mRNA processing or rapid mRNA decay17,18, or a yet unknown nascent mRNA perpetuation mechanism of those mRNA species.
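The comparison of actively transcribed genes with genes that have mature transcripts amounts to set intersections of the kind shown in Fig. 1g, h. The sketch below is illustrative only: the function name and gene labels are placeholders, and the choice of denominator for each percentage is an assumption, since the text reports the percentages "from the total RNAs" without further detail.

```python
from typing import Dict, Set


def nascent_mature_overlap(nascent: Set[str], mature: Set[str]) -> Dict[str, float]:
    """Summarize how genes with nascent and mature transcripts overlap.

    Percentages use the size of the corresponding full set as the
    denominator (an assumption made for this example).
    """
    both = nascent & mature
    mature_only = mature - nascent      # mature mRNA detected, no nascent signal
    nascent_only = nascent - mature     # actively transcribed, no mature mRNA
    return {
        "both": len(both),
        "mature_only": len(mature_only),
        "nascent_only": len(nascent_only),
        "pct_mature_without_nascent":
            100.0 * len(mature_only) / len(mature) if mature else 0.0,
        "pct_nascent_without_mature":
            100.0 * len(nascent_only) / len(nascent) if nascent else 0.0,
    }


# Toy example with placeholder gene names (not taken from the study).
summary = nascent_mature_overlap(
    nascent={"geneA", "geneB", "geneC"},
    mature={"geneC", "geneD", "geneE", "geneF"},
)
print(summary)
```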

Allele-biased DeMA gene expression in cardiac lineage commitment

To investigate allele-biased DeMA gene expression further in lineage committed and terminally differentiated somatic cardiac cells, we conducted allele-specific scRNA-seq analysis in mESCs (215-cells) and cells from in vitro derived CPCCs (175-cells) and cardiac bodies (CBs, 232-cells; Supplementary Fig. 1a and Supplementary Movie 1). Pseudo-time trajectory analysis confirmed the linear-lineage correlation of the mESCs, CPCCs and CBs (Fig. 2a). Next, the cellular heterogeneity of each point was defined by a systematic cell-cluster analysis approach (Fig. 2b–d and Supplementary Fig. 2a–c). From those cell-clusters we investigated allele-specific expression in seven known cell types: G1/S-mESCs, CSCs, CPCs, CF-cardiac fibroblasts, CM-cardiomyocytes, EC-endocardium cells, and PE-pericardial cells. Cardiac lineage cell-clusters did not further sub-cluster in cell cycle analysis. In agreement with the observation from bulk-RNA-seq data (Fig. 1f, g), allele-specific scRNA-seq analysis confirmed that the distinctive DeMA gene expression pattern from parental genomes remains preserved between mESCs and six cardiac lineage cell types (Fig. 2e and Supplementary Fig. 2d). In addition, active mat-mono and pat-mono genes exhibit imbalanced allelic expression frequencies, either higher in pat-mono genes (e.g., CSC, CPC, EC, and PE) or higher in mat-mono genes (e.g., ESCs, CF, and CM; Supplementary Fig. 2e). Further, a higher proportion of mat-mono genes are shared between cardiac cell types than pat-mono genes and they rarely exhibit allele-switching (Fig. 2f).

To further validate our data sets, we investigated whether known monoallelic genes expressed in the heart were included in our CM-specific DeMA gene set. We observed allele-specific expression of several known imprinted genes, including H19, Snrpn, Cobl, and Dlk1, in agreement with our data set, but some imprinted genes, such as Peg1019, were observed to be biallelically expressed in CMs. Some imprinted monoallelic genes have been shown to exhibit clustered gene expression. Although it was previously demonstrated that Ube3a exhibits clustered gene expression with Snrpn, we did not see Ube3a expressed in our single-cell CM-specific gene set. Instead, we detected that Snurf, which is within the Snrpn cluster, is also maternally expressed. As previously reported, we also observed maternal expression of Begain, which is positioned at the Chr12 centromeric boundary, ~383 kb from the imprinted gene Dlk1 at the Dlk1-Meg3 imprinted cluster20,21,22. However, we saw no Meg3 expression in CMs, perhaps due to cell type specific expression of Meg3 in the heart. Some of the differences we detect between our DeMA genes and those reported by others may be due to the different technical approaches used (e.g., single cell vs bulk tissue).

DeMA genes contain distinctive epigenetic marks

Allele-specific gene expression may be explained by the distinctive epigenetic status of the alleles23,24. However, the histone signatures of cardiac monoallelic genes have been largely unexplored. Previous studies suggest that H3K27me3 and H3K36me3 are the primary histone signatures of monoallelic genes25,26,27, but the signatures of the active versus silent allele were not examined individually. To address the allele-specific histone signatures of DeMA genes, we compiled all the mat-mono and pat-mono gene candidates identified in the seven cell types with ENCODE histone modification data to probe their common histone signatures (Supplementary Data 1). This analysis revealed that the maternal and paternal monoallelic alleles are enriched for different and unique histone signatures (Fig. 2g and Supplementary Fig. 2f). H3ac, H3K79me2/3, H3K4me3, H3K27ac, and H3K9ac are the signatures significantly enriched on mat-mono genes. H3K27ac marks active gene enhancers28,29. Bivalent enrichment of H3K4me3 with H3K9ac, and of H3K27ac with H3K79me2, occupies highly active gene promoters. In contrast, the same promoters contain H3K27me3 and H3K9me3 histone marks (of low abundance in our data, Supplementary Data 1) when those same genes become transcriptionally silent in cardiomyocytes30, suggesting the possibility that the silent histone signatures H3K27me3 and H3K9me3 mark the silent paternal alleles of maternally expressed genes. In our pat-mono genes, the most significant signature is H3K27me3, a silent gene histone signature at the promoter or downstream of the transcription start sites (TSSs). H3K4me1 (enhancer mark) is also present (with lower significance: combined score = 0.33) in our pat-mono genes, suggesting a model where H3K4me1 and H3K27me3 perhaps bivalently occupy the poised enhancers31 and H3K4me1, the unimodal histone mark, occupies the poised promoter of the active paternal allele. The heterochromatin associated H3K9me3 and H3K36me3 histone marks are of low abundance in our pat-mono genes (Supplementary Data 1). We speculate that the repressed maternal alleles of pat-mono genes are either marked by H3K27me3 and H3K9me3 at the promoter regions simultaneously with H3K36me3 in the gene bodies, or the enhancers/promoters maintain their silent state, perhaps by another mechanism such as miRNA, siRNA, or DNA secondary structures32,33. We also observed that the average transcription levels were higher in mat-mono genes than in pat-mono genes in mESCs, CFs, and CMs (Supplementary Fig. 2e), but in CSCs, CPCs, ECs, and PEs, the average transcription levels were higher in pat-mono genes than in mat-mono genes. This may be the result of the enhancer/promoter association of different chromatin signatures in different cell types. For our data, this means that in mESCs, CFs, and CMs, the enhancer associated H3K27ac mark may promote a higher rate of mat-mono gene transcription, while in pat-mono genes H3K4me1 poised promoters may drive steady-state transcription, resulting in lower transcription levels34,35. In contrast, in CSCs, CPCs, ECs, and PEs, unimodal H3K4me1 signatures on promoters (proximal to the TSS) may result in higher pat-mono gene transcription, an epigenetic mechanism specific to muscle cell development genes36. For the most part, the allele specificity of monoallelic genes does not change between cardiac cell types, although a few outliers exhibit a switch of parental origin while remaining monoallelic (Fig. 2f). This tightly regulated mechanism, in part in mat-mono genes, may be driven by H3K79me2/3, which may permanently ‘bookmark’ active alleles through mitosis37, as the H3K79me2/3 mark occupies active gene bodies through mitosis but is not involved in gene activation30,38,39,40. Taken together, incorporating our data with the published data, we propose an allele-specific histone signature model for DeMA genes as illustrated in Fig. 2h.
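The enrichment statistics named in the Fig. 2g legend (Fisher exact P-value, Benjamini–Hochberg adjustment, and a combined score formed from the log of the P-value multiplied by a z-score) can be sketched in a few lines. This is a hedged illustration using only the Python standard library: the function names and toy numbers are assumptions, the one-sided hypergeometric tail is one common form of the Fisher exact test for gene set overlap, and the z-score is taken as a given input because the legend does not describe how the expected rank is obtained.

```python
from math import comb, log
from typing import List


def fisher_right_tail(overlap: int, set_size: int, query_size: int, universe: int) -> float:
    """One-sided Fisher exact P-value: P(X >= overlap) under the hypergeometric model."""
    upper = min(set_size, query_size)
    denom = comb(universe, query_size)
    numer = sum(
        comb(set_size, k) * comb(universe - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return numer / denom


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):            # walk from largest to smallest P-value
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def combined_score(p_value: float, z_score: float) -> float:
    """Log of the P-value multiplied by the z-score, as worded in the legend."""
    return log(p_value) * z_score


# Toy inputs (illustrative only): 5 of 5 query genes fall in a 5-gene set
# drawn from a 10-gene universe; the z-score is a made-up value.
p = fisher_right_tail(overlap=5, set_size=5, query_size=5, universe=10)
print(p, combined_score(p, z_score=-2.0))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

In practice a statistics library would normally replace the hand-written test and correction; the explicit versions are shown only to make each quantity in the legend concrete.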

To experimentally validate the histone signatures in our proposed model, as a proof of principle, we next assessed the six allele-specific histone signatures in mESCs (Supplementary Fig. 3a) using a CUT&Tag41 assay. As expected, we detected strong H3K4me3 and H3K27ac signals around the TSSs of the genes (Supplementary Fig. 3b). However, when we assessed the allele-specific enrichment (Supplementary Fig. 3c, d), we noticed two peaks, one upstream and one downstream of the TSSs, probably indicating proximal enhancers/promoters 5′ of the TSSs and promoters 3′ of the TSSs. H3K4me1, the only enhancer mark on the paternal alleles in mESCs, was noticeably enriched upstream of the TSSs as well as in the gene bodies, probably at intergenic enhancers. H3K79me2 and H3K79me3, which are known to mark gene bodies, were also observed at the TSSs, suggesting that those signatures may assist the enhancer-promoter interactions of actively transcribing genes42,43. Moreover, we found that H3K36me3, a signature of actively transcribing gene bodies, was enriched at promoter regions (Supplementary Fig. 3a) but was not strongly present at DeMA gene promoters, matching the histone signatures we modeled for mESC (Supplementary Fig. 3a) DeMA genes (Supplementary Fig. 3c, d).

Next, to validate the histone signature model we proposed for DeMA genes, we assessed our mESCs’ maternal and paternal gene cohorts from bulk-RNA-seq data in allele-specific histone signatures from the CUT&Tag assay. We found 78.4% of our maternal DeMA genes fell within the maternal-specific histone signatures (Supplementary Fig. 3e, g). However, the less probabilistically significant (P = 0.03) in silico predicted paternal allele-specific histone signatures (Supplementary Fig. 2f – lower panel) only matched with 21% of the genes we identified in our CUT&Tag assay (Supplementary Fig. 3c, h). We reason that this is a probabilistic issue of the analysis of paternal DeMA genes caused by the predicted low-number (n = 2) of parental specific histone marks from in silico analysis. Therefore, our histone signature model, at least in part, is supported by the histone signatures associated with the maternal DeMA genes.

Context dependent RaMA gene expression

RaMA genes stochastically select the expressed and repressed alleles. To assess whether RaMA genes share the same regulatory mechanism as DeMA genes, we first identified RaMA genes in seven cell types using a rigorous gene filtering method (Supplementary Fig. 4a). In line with previous reports, we observed that RaMA genes are not clustered; rather, they are randomly positioned in the genome7,8,9,44 (Fig. 3a). An increased number of RaMA genes has been reported in in vitro cell differentiation paradigms7,8. To determine whether an increase in RaMA expression occurs in the cardiac lineage, we first evaluated bulk RNA-seq data from mESCs and CPCCs (Supplementary Fig. 4b). Although the number of RaMA genes in the two cohorts is comparable, the gene sets are distinct. The mESCs and CPCCs are heterogeneous (Fig. 2b, c), making it difficult to accurately assess RaMA gene expression. Therefore, to investigate RaMA establishment in homogeneous cell populations, we next evaluated scRNA-seq data of mESCs, CSCs, and CPCs. In contrast to the bulk-RNA-seq data, we observed a modest increase in the establishment of RaMA gene expression within the total expressed genes in mESCs, CSCs, and CPCs (1.44%, 2.8%, and 2.65%, respectively; Fig. 3b). Further, except for CMs, other cardiac cells also exhibit an increase in the number of RaMA genes. This is perhaps because mouse CMs are multinucleated; therefore, when CMs are analyzed at the single-cell level rather than the single-nucleus level, the true representation of RaMA genes in CMs is not captured. Next, we assessed whether RaMA gene establishment in CSCs and CPCs is required for establishing DeMA genes in more differentiated cells of the cardiac lineage. However, we found that these two instances are independent processes (Supplementary Fig. 4c) and that RaMA genes are mostly cell type specific; thus, RaMA gene establishment may not directly correlate with lineage commitment (Supplementary Fig. 4d).
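As a small worked example, the relative increase in RaMA gene establishment can be expressed as a fold change against the mESC value. The sketch below uses the three percentages quoted in the paragraph above; the function name is hypothetical, and because the Fig. 3b legend bases its fold changes on RaMA gene numbers rather than percentages, the values produced here are only an approximation of that panel.

```python
from typing import Dict


def fold_change_vs_reference(values: Dict[str, float], reference: str) -> Dict[str, float]:
    """Divide every value by the value of the reference cell type."""
    ref = values[reference]
    if ref == 0:
        raise ValueError("reference value must be non-zero")
    return {cell_type: value / ref for cell_type, value in values.items()}


# Percentages of RaMA genes among total expressed genes, as quoted in the text.
rama_pct = {"mESC": 1.44, "CSC": 2.8, "CPC": 2.65}

fold = fold_change_vs_reference(rama_pct, reference="mESC")
for cell_type, fc in fold.items():
    print(f"{cell_type}: {fc:.2f}-fold relative to mESC")
```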

Fig. 3: RaMA genes exhibit lineage specificity and expression is regulated by distinct histone modifications to monoallelic genes.

a Spatial arrangement of CSC, CPC, CF, and EC RaMA genes in the autosomes. Columns in each circle represent the gene positions. b Percentages and fold changes of RaMA genes upon ESC differentiation (scRNA-seq). Percentages are calculated relative to non-monoallelic genes and the fold changes are relative to the RaMA gene number in mESCs. c Histone signatures of RaMA genes from seven cell types. Thirty terms and 60 RaMA genes with the highest significance (combined score) are shown. Enriched terms are columns (column order is the sum) and the input genes are the rows. d Number of transcripts in RaMA genes in seven cell types. A one-sample t-test was performed and the error bars represent the standard deviations of the mean values. Each data point represents a RaMA gene; n = 159 (ESC), 283 (CSC), 211 (CPC), 399 (CF), 26 (CM), 247 (EC), and 83 (PE). e The number of known protein coding transcripts per gene in protein coding RaMA genes in seven cell types. Only experimentally validated proteins matched to merged Ensembl/Havana transcripts were considered as protein coding transcripts. f Schematic representation of the extrapolation of the histone mark association for expressed RaMA gene alleles. RaMA random autosomal monoallelic genes. Source data 37–45.

RaMA gene regulation is distinctive

Expression from a single allele is common to both RaMA and DeMA genes. Therefore, we speculated that the active and inactive alleles of these two gene cohorts may share similar histone signatures. Active RaMA gene alleles are primarily marked by H3ac, H3K27ac, H3K4me3, H3K79me2/3, and H3K9ac histone signatures (Fig. 3c and Supplementary Data 2). H3K27ac is known to mark active gene enhancers28,29. H3K4me3 with H3K9ac, and H3K27ac with H3K79me2, bivalently occupy highly active gene promoters30. It has been hypothesized that, once established, active alleles and silent alleles are ‘bookmarked’ in a clonal cell population4. H3K79me2/3 is known to be established at active gene bodies in an in vitro and in vivo CM differentiation paradigm45. The levels of H3K79me2 are known to fluctuate but are not erased during mitosis46,47; thus, H3K79me2/3 may serve as a ‘bookmark’ of the active allele of a RaMA gene. However, in our analysis, we found H3K36me3 to be the only histone signature associated with heterochromatic RaMA genes. H3K36me3 is known to recruit HDAC to prevent redundant run-off RNA Pol II transcription48,49; thus, a loose interplay between H3K36me3 and H3K79me2 in the allele body may play a role in orchestrating how alleles gain their active or silent signatures in a clonal cell population. Further, H3K36me3 and H3K79me2 marks have been shown to be present in nucleosomes, and H3K36me3 is enriched within exons, which has been suggested to inhibit pre-mRNA splicing50,51,52. We found that most of the RaMA genes express five to six alternatively spliced transcripts on average (Fig. 3d); however, 64% to 74% of these genes result in a single identified protein coding transcript (Fig. 3e), suggesting a possible histone model in which H3K36me3 may safeguard against unnecessary splicing events in the single protein coding transcripts of the RaMA genes (Fig. 3f).
Interestingly, ENCODE histone modification data for our RaMA gene set from the six cell types we studied reveals greater similarity to maternal DeMA genes, but is distinctive from pat-mono histone signatures (Supplementary Data 1 and 2). However, at this point, we are unable to provide an explanation for this observation, as ours or others’ published data do not lead to any avenue to explore the above observation.

To investigate whether the histone signature model we proposed for RaMA genes by in-silico analysis could be validated by experimental evidence, we evaluated our CUT&Tag data for a RaMA gene histone signature. Because the number of RaMA genes we observed in our bulk-RNA-seq data from mESCs was low (13 genes), we used the RaMA genes from scRNA-seq data in mESCs. We found 85.6% of the RaMA genes agreed with the enriched promoter and gene body histone signatures in our CUT&Tag data (Supplementary Fig. 4e–g), indicating the validity of the histone signature model we proposed for RaMA genes.

DeMA genes’ critical impact on the heart

Some DeMA genes are common to CSCs and CPCs, but some are unique to each cell type (Fig. 4a). Because of the lineage relationship of these cells, we sought to assess the significance of common and unique monoallelic genes in heart development. Surprisingly, we found that unique pat-mono alleles in CPCs are exclusively regulated by 10 TFs (Supplementary Data 3), five of which are zinc finger proteins (Zfps; Fig. 4b and Supplementary Fig. 5a). Further, those 10 TFs are known to be involved in the processes of muscle and heart cell homeostasis (Fig. 4c and Supplementary Data 4). A significant number of uniquely expressed mat-mono and pat-mono gene cohorts in CSCs and CPCs are enriched for processes related to heart development (Supplementary Data 4); notably, mat-mono genes contribute to embryonic heart development and young and adult heart cell physiology (e.g., MGS206: P = 2.04E-10, MGS210: P = 5.92E-10; Fig. 4d and Supplementary Fig. 5b), while pat-mono genes are involved in young and neonatal heart musculature (e.g., MGS844: P = 0.01, MGS766: P = 0.037; Fig. 4e). The genes common to CSCs and CPCs in the pat-mono and mat-mono cohorts exhibit diverse and broad functional profiles (Supplementary Fig. 5c, d); mat-mono genes are significantly involved in cardiomyocyte physiology (e.g., MGS320: P = 3.78E-18, MGS328: P = 3.74E-17), whereas pat-mono genes (e.g., Kcnd3, cardiac repolarization) participate in heart and skeletal muscle physiology (e.g., MGS1392: P = 0.014, MGS414: P = 0.015; Supplementary Fig. 5e, f and Supplementary Data 4). The number of mat-mono genes common to the four cell types from CBs was notably higher than the number of pat-mono genes (Fig. 4f, g) and some (e.g., Coa5, Ndufs1, Ndufa10) were critical for known cardiac related human disease phenotypes (e.g., Hypertrophic cardiomyopathy, HP:0001639, Padj = 9.942 × 10⁻⁴; Fig. 4h), whereas pat-mono genes were not (Supplementary Data 4).
CMs are the most specialized cell type in the heart and perform rhythmic beating, the primary function of the heart. We assessed the importance of the unique DeMA genes in CMs in disease causation. Unique mat-mono and pat-mono genes exhibited a wide functional profile (Supplementary Fig. 5g, h and Supplementary Data 3). However, the significantly higher TF functional profile for unique mat-mono genes compared to pat-mono genes suggests that the mat-mono genes hold greater risk factors for heart disease in males (Fig. 4i and Supplementary Fig. 6a–c, and Supplementary Data 3). Conversely, unique pat-mono genes in CMs do not provide insight into known human disease related phenotype or physiological relevance.

Fig. 4: Critical cardiac genes are DeMA.

a Intersects of mat-mono and pat-mono genes in CPCs and CSCs. b Functional profiles of CPC pat-mono genes. 588 out of 592 CPC pat-mono genes are enriched only in the TF functional profile enrichment category. Data are presented as the P-value (computed from the Fisher exact test assuming binomial distribution and independence for the probability of any gene belonging to any set) for each term. c–e Enrichment of unique CSC and CPC maternal and paternal monoallelic genes in known cardiac muscle development gene sets. Each unique monoallelic gene set was compiled with SysMyo muscle gene data, and data are presented as combined scores computed by taking the log of the P-value from the Fisher exact test and multiplying it by the z-score of the deviation from the expected rank. The length of the bars represents the combined score for each enrichment term. f, g Intersects of mat-mono (f) and pat-mono (g) genes in cells from CBs. A greater number of mat-mono genes is shared between all the cell types. h Functional profiles of mat-mono genes common to the four cell types in CBs. The highest enrichment was scored for TFs. Only the cardiac-specific human phenotype IDs are shown (P-value computed from the Fisher exact test assuming binomial distribution and independence for the probability of any gene belonging to any set). i Disease causality associated with CMs’ unique mat-mono genes in the heart. Disease correlations were curated from ClinVar 2019 data. Data are presented as combined scores computed by taking the log of the P-value from the Fisher exact test and multiplying it by the z-score of the deviation from the expected rank. GO Gene Ontology, MF molecular function, CC cellular component, BP biological process, KEGG Kyoto Encyclopedia of Genes and Genomes, REAC reactome, WP WikiPathway, TF transcription factor, MIRNA miRTarBase, CORUM comprehensive resource of mammalian protein complexes protein databases, HP human phenotype.

To gain insight into DeMA and RaMA genes’ functional relevance in CMs, we then assessed the functional profiles of these two gene classes. Gene ontology profiles showed that the DeMA genes were enriched in processes contributing to CM cell composition (85 - GO: terms), whereas RaMA genes exhibit no association with CM structure (Supplementary Fig. 6d, e). Further, only the DeMA genes exhibit enrichment in molecular function (62 - GO:MF terms), biological processes (259 - GO: BP terms) and transcription factor (639 - GO: TF terms) GO terms in regard to CMs. Moreover, from the total DeMA genes in CMs, 718 genes homologous to human were known to cause abnormalities in human heart musculature (GO: HP:0003011) (Supplementary Data 5). Interestingly, we also found that, out of 718 genes, 69.1% of genes were included in autosomal recessive inheritance genes in human (HP:0000007), confirming the significance of DeMA genes in heart musculature. However, there was no strong evidence of the involvement of RaMA genes in the establishment of cardiac musculature. RaMA genes were generally related to basic cellular housekeeping processes, for example Smug1: base excision repair, and Lig3: DNA ligase (Supplementary Fig. 6f, g and Supplementary Data 6). We further investigated whether there was any known clinical information associated with DeMA genes. By compiling our DeMA gene set (6284 genes) in ‘Jensen DISEASE’, an algorithm for clinical literature data mining, we found that 28 genes are included out of 62 clinically known congenital heart disease related genes (Supplementary Fig. 6h, i and Supplementary Data 7 and 8).

Discussion

Monoallelic gene expression by imprinting is a well-studied phenomenon and has significantly contributed to our understanding of gene expression in development and disease45,53,54. However, it has also been shown that, for some genes, monoallelic gene expression could be explained by non-imprinting mechanisms55, for example the establishment of recessive and dominant alleles56, the molecular mechanism of which is poorly understood. Also, the imprinted status of some genes may not be stable across multiple tissues, which could arise from the tissue context57,58 (https://www.geneimprint.com/site/genes-by-species). In this study, using our data from mESCs and publicly available data from mESCs and mouse blastocysts, we identified a role for genetic background in influencing monoallelic gene expression, which may influence the establishment of dominant and recessive alleles because of the higher number of polymorphisms between the two genomes. mESCs and cells in blastocysts are naïve cells and contain unique genetic and epigenetic marks; for example, gene imprinting is less abundant in those two cell types compared to lineage-committed cells. Consistent with this, Rivera-Mulia et al. demonstrated that the higher number of asynchronous DNA replication events between alleles is highly correlated with the unorganized nuclear architecture of naïve mESCs, but is lost in differentiated cells with structured nuclear architecture59. Therefore, the influence of the parental alleles’ genetic background on deterministic genes’ allele selection (active versus inactive) is perhaps due to the unique genetic and epigenetic status of mESCs and may not follow the “rules of stable monoallelic gene establishment” as in lineage-committed cells. To support this idea, we showed that the parental allele identity of the known imprinted genes (Snrpn, Cobl and Dlk1) matches our CMs DeMA data. 
This observation suggests that imprinted genes may not be strongly influenced by the genetic background of the parental alleles, but that non-imprinted monoallelic genes established by other mechanisms, for example dominant and recessive gene establishment, may change the origin of the active allele depending on the genetic background. Therefore, we intentionally avoided categorizing the DeMA genes identified in this study as imprinted genes, and instead broadly acknowledge them as deterministic monoallelic genes. Although we were not able to address this finding further as it is beyond the scope of our study, it will be worth pursuing in future studies using lineage-committed cardiac cells from reciprocal crosses at single-cell resolution.

Our data demonstrate potentially distinct gene regulatory mechanisms for the autosomal monoallelic maternal and paternal genomes in cardiac lineage cells, emphasizing monoallelic gene expression and its potential effects in cardiac development and disease. An interesting and unexpected outcome of our study is the disease susceptibility based on the exclusive expression of the maternal or paternal allele. When a critical gene is expressed solely from the single maternal allele, males are at a disadvantage in acquiring variable penetrance of disease (e.g., Fig. 4i, HP:0003215, Dicarboxylic aciduria). It is known that males are more susceptible to heart-related disease60,61,62. We observed that the maternally expressed set of genes in our cardiomyocyte DeMA cohort that are homologous to human genes (term HP:0000007 - AUH, DNAJC19, OPA3) is linked to the male-specific disease 3-Methylglutaconic aciduria type 2 (BTHS syndrome) (https://www.ncbi.nlm.nih.gov/medgen/107893). Therefore, our study provides insightful genetic evidence of maternally expressed monoallelic genes potentially being causative of male heart-related diseases. Further, exclusively expressed paternal deterministic monoallelic genes (in CPCs) primarily showed a contribution to neonatal and adult heart physiology, and this gene cohort was exclusively regulated by TFs (e.g., Esr1, SP1, Klf4) that are known to be involved in preventing heart attack and cardiac hypertrophy30,38. Further, the DeMA genes are scattered throughout the genome; however, some showed clustering, and thus may be susceptible to disease caused by locus heterogeneity63. In summary, in this study we emphasize the importance of allele-specific gene expression in development, homeostasis and disease and provide a comprehensive identification of deterministic monoallelic genes in unique cardiac cell types.

Methods

Generation of C57BL/6J-CAST/EiJ F1 Hybrid mESCs

F1 mESC lines were derived at the Cold Spring Harbor Laboratory (CSHL) Gene Targeting Shared Resource. In brief, four-week-old C57BL/6J female and four-week-old CAST/EiJ male mice were purchased from Jackson Laboratories and maintained in the CSHL Animal Shared Resource, which is fully accredited by the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC). C57BL/6J female mice were hormone treated and paired (five pairs) with the male CAST/EiJ mice (two females per male) and the plug rate was 30% from the matings. At 3.5 dpc, blastocysts were collected from three plugged C57BL/6J females. In total, 37 blastocysts were placed onto feeder (Applied Stem Cell, Cat. No. ASF-1014) monolayer cultures (2i medium: 15% FBS (Millipore, Cat. No. ES-009-B), 1X Glutamine (Millipore, Cat. No. TMS-002-C), 1X Non-Essential AA (Millipore, Cat. No. TMS-001-C), 0.15 mM 2-Mercaptoethanol (Millipore, Cat. No. ES-007-E), 100 U/ml Penicillin-Streptomycin (Gibco, Cat. No. 15140-122), 100 U/ml Lif (Millipore, Cat. No. ESG 1107), 1 μM PD0325901 (Sigma Aldrich, Cat. No. 444968), 3 μM CHIR99021 (Sigma Aldrich, Cat. No. 361571) in Knockout DMEM (Gibco, Cat. No. 10829-018)) in 96-well plates. Cultures were maintained for 14 days to establish hybrid F1 mouse embryonic stem cells (mESCs). From 37 cultures, eight mESC-like lines were selected and transferred into feeder-free 2i+LIF cultures. Cells were passaged twice and Zfp-1 expression (fwd: 5′-CTCATGCTGGGACTTTGTGT-3′, rev: 5′-TGTGTTCTGCTTTCTTGGTG-3′) in each clone was assessed prior to selecting positive (male) clones. Further, from the selected three male mESC clones (sexes were further verified in RNA-seq analysis), three single-cell derived clones were established for further experiments. 
Preliminarily, mESCs were characterized by qRT-PCR for Pou5f1 (fwd: 5′-TGTGGACCTCAGGTTGGACT-3′, rev: 5′-CTTCTGCAGGGCTTTCATGT-3′), Nanog (fwd: 5′-CAGATAGGCTGATTTGGTTGGTGT-3′, rev: 5′-CATCTTCTGCTTCCTGGCAA-3′) and Gata6 (negative control, fwd: 5′-CAGCCCACGTTACGATGAACG-3′, rev: 5′-AAAATGCAGACATAACATTCC-3′). RNA-Seq analysis re-confirmed the mESC clones as mESCs. Moreover, qRT-PCR followed by Sanger sequencing, using primers that cover known single nucleotide polymorphisms (SNPs) in the CAST/EiJ strain, further confirmed the hybrid genome of the mESCs. Allele-specific analysis of the RNA-Seq data reported, on average, 5–6 SNPs per transcript.

F1 mESC in vitro culture

Male mESCs were used in all the experiments. Plastic-bottom tissue culture dishes were coated with 300 μl 0.1% gelatin (Millipore, Cat. no. ES-006-B) and incubated for at least 5 min at 37 °C prior to cell seeding. Cells were plated at 1 × 10^5 per well in a six-well plate (FALCON, Cat. no. 353046) at clonal density in 2i culture medium (15% FBS (Millipore, Cat. No. ES-009-B), 1X Glutamine (Millipore, Cat. No. TMS-002-C), 1X Non-Essential AA (Millipore, Cat. No. TMS-001-C), 0.15 mM 2-Mercaptoethanol (Millipore, Cat. No. ES-007-E), 100 U/ml Penicillin-Streptomycin (Gibco, Cat. No. 15140-122), 100 U/ml Lif (Millipore, Cat. No. ESG 1107), 1 μM PD0325901 (Sigma Aldrich, Cat. No. 444968), 3 μM CHIR99021 (Sigma Aldrich, Cat. No. 361571) in Knockout DMEM (Gibco, Cat. No. 10829-018)). Cells were passaged when the cultures were 70–80% confluent. The cells were washed twice with 1X PBS (Thermo Fisher, Cat. No. 14190-250) and incubated for 3 min in a 37 °C humidified incubator with 200 μl of TrypLE Express Enzyme (Thermo Fisher, Cat. No. 12605028). The enzyme was neutralized by adding 3 ml of 10% FBS (Millipore, Cat. No. ES-009-B) in Knockout DMEM (Gibco, Cat. No. 10829-018), and the cells were centrifuged at 180 × g for 5 min at room temperature to remove the supernatant. The cell pellets were replated in 2.5 ml 2i per well (six-well plates). To cryopreserve F1 mESCs, 70–80% confluent cells from one well of a six-well plate were dissociated to obtain a single-cell suspension, resuspended in freezing medium (10% DMSO (Sigma Aldrich, Cat. No. D2650), 20% FBS (Millipore, Cat. No. ES-009-B) in Knockout DMEM (Gibco, Cat. No. 10829-018)), and frozen and stored at −80 °C.

Cardiac lineage cell differentiation

Male mESCs were resuspended in differentiation medium containing 15% FBS (Millipore, Cat. No. ES-009-B), 1X Glutamax (Thermo Fisher, Cat. No. 35050061), 1X Nonessential AA (Millipore, Cat. No. TMS-001-C), 0.15 mM 2-Mercaptoethanol (Millipore, Cat. No. ES-007-E), 100 U/ml Penicillin-Streptomycin (Gibco, Cat. No. 15140-122) in Knockout DMEM (Gibco, Cat. No. 10829-018) at a density of 5 × 10^4 cells/ml. In all, 20 μl drops of cell suspension (~1000 cells/drop) were placed on 10 cm sterile petri dish lids, which were turned gently upside down onto a cell culture dish containing 10 ml of 1x PBS (Thermo Fisher, Cat. No. 14190-250) so that hanging drops formed. Cells in the hanging drops were then cultured for 48 h in a 37 °C, 5% CO2 cell culture incubator to form cell-aggregates. At 48 h post-seeding, cell-aggregates were gently rinsed into a 15 ml conical-bottomed FALCON tube with 10 ml of differentiation medium and allowed to settle at the bottom of the tube. In all, 1 ml of direct cardiac lineage differentiation medium (DCLDM; differentiation medium supplemented with 0.1 mg/ml l-Ascorbic acid (Sigma Aldrich, Cat. No. A7506)) was gently added and the cell-aggregates were then transferred to a tissue culture petri dish containing 14 ml of DCLDM to form embryoid bodies (EBs) (tipped-off P1000 filter tips were always used for cell-aggregate transfer). Plates with floating EBs were gently shaken twice a day to prevent EBs from attaching to the plate bottoms. At 72 h post cell-aggregate plating, EBs started expressing cardiac stem cell and mesoderm lineage markers (Nkx2.5 (fwd: 5′-CCCCCAAGTGCTCTCCTG-3′, rev: 5′-CATCCGTCTCGGCTTTGT-3′), Gata4 (fwd: 5′-GCAGCAGCAGTGAAGAGATG-3′, rev: 5′-GCGATGTCTGAGTGACAGGA-3′)).
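The cell number per hanging drop quoted above follows directly from the seeding density and the drop volume. A minimal sketch of that arithmetic is given here; the function name is hypothetical and the numbers are the ones stated in the text.

```python
def cells_per_drop(density_cells_per_ml: float, drop_volume_ul: float) -> float:
    """Expected number of cells in one hanging drop.

    density_cells_per_ml: seeding density of the suspension (cells/ml)
    drop_volume_ul: volume of a single drop (microliters)
    """
    return density_cells_per_ml * drop_volume_ul / 1000.0  # 1 ml = 1000 microliters


# Values from the protocol: 5 x 10^4 cells/ml, 20 microliter drops
print(cells_per_drop(5e4, 20))  # 1000.0
```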

Establishment of cardiac precursor cell culture (CPCCs)

EBs at 72 h post cell-aggregate plating were collected into 15 ml FALCON tubes and washed three times with 10 ml of 1x PBS each, without disturbing the EBs. To harvest, the EBs were allowed to settle at the bottom of the tubes. Cells in the EBs were dissociated by adding 200 μl of TrypLE Express Enzyme (Thermo Fisher, Cat. No. 12605028), incubating in a 37 °C, 5% CO2 cell culture incubator for 2 min, adding 3 ml of 10% FBS (Millipore, Cat. No. ES-009-B) in Knockout DMEM (Gibco, Cat. No. 10829-018), and then mechanically dissociating by forced up-and-down pipetting with a P1000 pipette. Another 7 ml of Knockout DMEM (Gibco, Cat. No. 10829-018) containing 10% FBS (Millipore, Cat. No. ES-009-B) was added and the cells were centrifuged at 180 × g for 5 min at room temperature. Cell pellets were then resuspended (we used EBs from one plate) in differentiation medium containing 10 ng/ml EGF (Peprotech, Cat. No. AF-100-15) and 10 ng/ml Fgf2 (Peprotech, Cat. No. 450-33) (cardiac stem cell enrichment medium – CSCEM) and plated on 0.1% gelatin (Millipore, Cat. No. ES-006-B) coated wells of a 12-well plate (we used EB cells from one plate per well). Cells were passaged every other day for three passages to enrich for the CPCCs. At every passaging step, we re-plated 40% of the dissociated, harvested cells. Cells from the fourth passage were used in downstream applications.

Cardiac body culture

In all, cell-aggregates at 48 h post-seeding were cultured in DCLDM medium in a sterile petri dish for 5–6 days, with the dishes gently shaken twice a day. We combined cell aggregates from three 10 cm plates into one 10 cm plate with 20 ml DCLDM medium. DCLDM medium was refreshed every other day. Rhythmically beating non-attached cardiac bodies (CBs) could be observed by five to six days post cell-aggregate plating and were viable and functional for another four to five days (the maximum time point we tested). For the scRNA-seq experiment, beating CBs were picked using a tipped-off P1000 filter tip under a light microscope. CBs were washed twice with 1x PBS (Thermo Fisher, Cat. No. 14190-250) and then incubated with a 1:1 combination of TrypLE Express Enzyme (Thermo Fisher, Cat. No. 12605028) and Trypsin (VWR, Cat. No. 45000-666) for 5 min in a 37 °C, 5% CO2 cell culture incubator, and then the CBs were mechanically dissociated by pipetting up and down (P1000 pipette) followed by another 2 min incubation. Finally, the CBs were dissociated into single-cell suspensions by vigorous pipetting with a P200 pipette. The cell suspensions were further filtered through cell strainers (CellTrics-Partec, Cat. No. 04-004-2326) to remove the clumps.

Single-cell RNA-seq – sample preparation

The diameters of the cells in the mESC, CPCC and CB single-cell suspensions were measured using the Countess™ II FL Automated Cell Counter (Invitrogen, Cat. No. A27974). The sizes were mESC – 17 μm, CPCCs – 16 μm, CBs – 13 μm. We observed that cells from CBs measured between 20 and 25 μm (in microscopic images) in in vitro cultures, but they shrank when dissociated into single cells. cDNAs from single cells were synthesized using the C1™ Single-Cell Reagent Kit for mRNA Seq (Fluidigm, Cat. No. 100-6201). In brief, we used 96-well C1™ Single-Cell mRNA Seq IFC-10-17 (Fluidigm, Cat. No. 100-5760) integrated fluidic circuits (IFCs) to load the cells. Once the cells were harvested, cell suspensions were filtered through cell strainers (CellTrics-Partec, Cat. No. 04-004-2326) to avoid possible cell-aggregates before the cells were loaded into the IFCs. Approximately 750 cells were loaded onto an IFC each time and, as a control, External RNA Control Consortium (ERCC; Thermo Fisher, Cat. No. 4456740) RNA spike-ins were added to the lysis buffer to obtain 1:20,000 dilutions in the final cDNA samples. Cells in IFCs were processed using the Fluidigm C1™ Single-Cell Auto Prep System. After single cells were primed into compartments, the cell number (single or multiple cells) and quality (debris or healthy) were observed using a light microscope and recorded for each compartment. Total RNA in a typical mammalian cell ranges from 10 to 30 pg. cDNA from low-yield polyA+ RNA was synthesized using the SMART-Seq® v4 Ultra® Low Input RNA Kit for the Fluidigm® C1™ System (Clontech, Cat. No. 635026), which uses SMARTer-seq® chemistry (Switching Mechanism at 5′ End of RNA Template and Locked Nucleic acid technology) and is compatible with the Fluidigm C1™ Single-Cell Auto Prep System. cDNA from each cell was quantified using the Quant-IT™ PicoGreen® dsDNA Assay Kit (Thermo Fisher, Cat No. P11496). We used 0.3 ng of dsDNA per cell, only from the high-quality cells, in library preparation. 
Indexed libraries were generated using Illumina Nextera XT DNA Library preparation Index Kit (Illumina, Cat No. FC-131-1096 and FC-131-1002), which is capable of multiplexing (up to 96 available indexes). Indexed libraries generated for each sample from one experiment (from one IFC) were pooled together. We initially obtained nine pooled libraries and consolidated them into eight pooled libraries. Libraries were cleaned up using Agencourt AMPure XP beads (Beckman Coulter, Cat. No. A63880), and library quality and quantity were assessed on an Agilent 2100 bioanalyzer using high sensitivity DNA Analysis chips (Agilent, Cat. No. 5067-4626). All our libraries fell within a 500 bp average size, with concentrations ranging from 6.5 nM to 12 nM. In total we obtained 215 libraries for ESC, 175 libraries for CPC and 232 libraries for CM and cardiac smooth muscle cells. We performed 125 nt paired-end sequencing for these eight pooled libraries (one pooled library per lane) on the Illumina HiSeq2500 – V4 flow cell platform at the Cold Spring Harbor Laboratory Genome Center.
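Library concentrations in nM, such as those quoted above, are commonly derived from a mass concentration and the average fragment length. The sketch below shows the standard conversion, assuming an average mass of 660 g/mol per base pair of dsDNA. The function name and the example input are illustrative only; the text does not state how the bioanalyzer values were converted.

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA mass concentration to molarity (nM).

    Assumes an average molar mass of 660 g/mol per base pair (a common
    approximation, not a value taken from the text).
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_length_bp)


# Illustrative input only: 3.3 ng/ul at a 500 bp average size is about 10 nM
print(round(library_molarity_nm(3.3, 500), 2))
```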

Bulk RNA-seq – sample preparation

Cells from three mESC and CPCC biological replicates were used. Total RNA from each sample was extracted using the RNeasy Mini Kit (Qiagen, Cat. No. 74104), including the DNA digest step, according to the manufacturer’s guidelines. All the replicates were processed in parallel at each step. In all, 0.5 μg of total RNA from each sample was used to synthesize cDNAs from polyA+ RNA using Clontech SMART-Seq® v4 Ultra® Low Input RNA Kit for the Fluidigm® C1™ System (Clontech, Cat. No. 635026) following the manufacturer’s guidelines. The quality and quantity of cDNA from each replicate were assessed on the Agilent 2100 bioanalyzer using high sensitivity DNA Analysis chips (Agilent, Cat. No. 5067-4626). All the libraries fell within a 500 bp average size. We used 0.3 ng of dsDNA per replicate for library preparation. Indexed libraries were generated using Illumina Nextera XT DNA Library preparation Index Kit (Illumina, Cat No. FC-131-1096 and FC-131-1002). We performed 125 nt paired-end sequencing for each library independently on the same chip (one library per lane) on the Illumina HiSeq2500 – V4 flow cell platform at the Cold Spring Harbor Laboratory Genome Center.

Single-cell RNA-seq analysis

Sequencing data from each single-cell library from each experimental point were aligned to the GRCm38_v97 (mm10) mouse reference genome using the STAR 2.7 aligner embedded in Partek Flow software (trial version - https://www.partek.com/partek-flow/), following the guidelines, to obtain gene count tables. For the trajectory analysis, we used the ASAP v1 software suite64. Single gene count tables of 622 scRNA-seq libraries were processed using the DDRTree reduction method. For cluster analysis, we used separate gene count tables from three independent timepoints. Gene count tables were then used in the Seurat 3.1.0 (SeuratV3) single-cell data analysis package embedded in the Nucleic Acid SeQuence Analysis Resource (NASQAR)65, a web-based portal, for cell clustering. In brief, for each experiment the gene expression was normalized by “LogNormalize”, setting the scale factor to 10,000, to obtain the log-transformed data. To detect the variable genes across the single cells within each experimental point and to calculate the average expression and dispersion for each gene, we applied the recommended generic settings (Mean function-ExpMean, Dispersion function-LogVMR, X Low Cut-off value: 0.0125, X High cut-off value: 3 and Y cut-off value: 0.5) to mask the outliers. To regress out the variance caused by technical noise, batch effects and biological sources of variation, which would affect the clustering, we relied on the nCount_RNA feature embedded in the Seurat 3.1.0 package. After computing the linear dimensional reduction, we determined the statistically significant PCs using the supervised Elbow method and performed cell clustering with the resolution set to 0.6 by default. Then, non-linear dimensional reduction was performed with UMAP. Each cell cluster (CPC, CSC, CF, CM, EC, and PE) was then defined using known cell type specific markers66,67,68,69. 
Identification numbers (IDs) of the cells in each cluster were then matched with the library IDs of the original ‘.fastq’ files in order to sort the single-cell libraries into the relevant cell types. Cell cycle analysis was performed in the StemChecker web-portal (http://stemchecker.sysbiolab.eu). Positioning of deterministic monoallelic and RaMA genes in the genome was performed using Circa software (https://omgenomics.com/circa/).
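The “LogNormalize” step described above can be illustrated with a few lines of plain Python. This is a minimal sketch of the documented behaviour of that method (each cell's counts are divided by the cell's total, multiplied by the scale factor, and log1p-transformed), not the Seurat implementation itself; the function name and the toy counts are hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize raw counts given as a list of cells, each a list of gene counts.

    For every cell: value = ln(1 + count / total_counts_in_cell * scale_factor).
    Cells without any counts are returned as all zeros.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            normalized.append([0.0 for _ in cell])
            continue
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized


# Toy example: two cells, three genes (made-up counts)
toy_counts = [[10, 30, 60], [0, 5, 15]]
for row in log_normalize(toy_counts):
    print([round(v, 3) for v in row])
```

Because each cell is scaled by its own total before the log transform, cells sequenced to different depths become comparable, which is the purpose of the scale factor of 10,000 quoted in the text.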

Bulk RNA-seq analysis

Differential gene expression of mESC and CPCC bulk-seq data was assessed with DESeq2 (version 1.24.0), an R package for “Differential gene expression analysis based on the negative binomial distribution”, embedded in the NASQAR web-portal65.

Allele specific scRNA-seq and bulk-RNA-seq analysis

To perform the allele-specific single-cell and bulk-cell analysis, we used the MEA pipeline13 because it uses indels together with SNPs in the genome to call the parental background of the transcriptomic data. We manually implemented the pipeline with all its dependencies on the Cold Spring Harbor Laboratory computing cluster. In brief, the mm10 reference genome and genetic variants (.vcf), including SNPs (mgp.v5.merged.snps_all.dbSNP142.vcf.gz) and INDELs (mgp.v5.merged.indels.dbSNP142.normed.vcf.gz), were downloaded from the Ensembl genome browser (https://useast.ensembl.org/info/data/ftp/index.html) and an in silico diploid genome for C57BL/6J and CAST/EiJ was reconstructed utilizing SHAPEIT2. We then aligned each RNA-seq read file (.fastq) to the in silico genome using the STAR aligner (STAR-2.5.4b). Aligned ‘.sam’ output files were then converted into ‘.bam’ files using SAMtools (version 0.1.19)70. This process generated two ‘.bam’ files, one each for C57BL/6J and CAST/EiJ, per single cell. We then assembled all the ‘.bam’ files from a cell cluster into two separate data sets, one each for the C57BL/6J and CAST/EiJ strains. Those two ‘.bam’ file data sets were then used for differential gene expression analysis using the edgeR pipeline embedded in the SeqMonk bio-informatics web-portal (https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/). The gene count tables were then intersected manually using Excel to filter in the deterministic monoallelic (Supplementary Fig. 1e) and RaMA (Supplementary Fig. 4a) genes. For bulk-RNA-seq, each experimental data point was processed in the same way as above. Graphs and diagrams were generated using InteractiVenn71, the Intervene shinyapp72 and GraphPad PRISM (version 7.04).
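To make the distinction between the gene classes concrete, the sketch below classifies a gene from per-cell, allele-resolved read counts. It is an illustration only and not the filtering procedure used in this study (which relied on edgeR in SeqMonk followed by manual intersection of count tables): the function names, the minimum read count and the 98% allelic fraction are all hypothetical assumptions. Maternal reads are taken to be C57BL/6J and paternal reads CAST/EiJ, following the cross described in the Methods.

```python
def call_cell(maternal_reads, paternal_reads, min_reads=5, mono_fraction=0.98):
    """Call the allelic state of one gene in one cell.

    min_reads and mono_fraction are hypothetical thresholds chosen for
    illustration; they are not taken from the text.
    """
    total = maternal_reads + paternal_reads
    if total < min_reads:
        return None  # too few allele-informative reads to make a call
    if maternal_reads / total >= mono_fraction:
        return "maternal"
    if paternal_reads / total >= mono_fraction:
        return "paternal"
    return "biallelic"


def classify_gene(per_cell_counts, **thresholds):
    """Classify a gene from a list of (maternal_reads, paternal_reads) tuples, one per cell."""
    calls = [call_cell(m, p, **thresholds) for m, p in per_cell_counts]
    calls = [c for c in calls if c is not None]
    if not calls:
        return "not assessed"
    kinds = set(calls)
    if kinds == {"maternal"}:
        return "DeMA (maternal)"   # same allele active in every informative cell
    if kinds == {"paternal"}:
        return "DeMA (paternal)"
    if kinds == {"maternal", "paternal"}:
        return "RaMA"              # monoallelic per cell, but the active allele varies
    return "biallelic"


print(classify_gene([(20, 0), (15, 0), (31, 0)]))  # DeMA (maternal)
print(classify_gene([(20, 0), (0, 30)]))           # RaMA
print(classify_gene([(12, 10), (9, 14)]))          # biallelic
```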

Transcript number and functional protein coding gene analysis

The BioMart data mining tool (https://www.ensembl.org/biomart/martview) in the Ensembl genome browser was used to assess the number of transcripts per RaMA gene and the number of protein coding transcripts per RaMA gene. The protein coding transcripts were cross examined in the Uniprot database (https://www.uniprot.org/uniprot).

Gene functional profile and enrichment analysis

Gene functional analysis was performed using g:Profiler (version e102_eg49_p15_7a9b4d6)73. Numeric IDs were entered as ENTREZGENE_ACC, for annotated genes only. For multiple testing correction, the g:SCS threshold was used as the recommended method, with 0.05 as the threshold cut-off. Gene Ontology analysis (Molecular Function (MF), Cellular Component (CC), Biological Process (BP)), biological pathways (KEGG, Reactome (REAC), WikiPathway (WP)), regulatory motifs in DNA (TRANSFAC (TF), miRTarBase (MIRNA)), protein databases (CORUM), and the Human Phenotype Ontology (HP) were included in the analysis.

Gene enrichment analysis was performed using Enrichr web-portal developed by the Ma’ayan lab at the Data Coordination and Integration Center at Icahn School of Medicine at Mount Sinai74,75. Gene set enrichment was performed compiling the gene list with SysMyo Muscle Gene sets (created by the Duddy lab https://www.sys-myo.com/muscle_gene_sets/) embedded in Enrichr74,75. Histone association data was obtained by compiling gene sets in ENCODE histone modification 2015 database embedded in Enrichr74,75. Results were presented in a bar chart with -Log10(P-values), -Log10(adjP-values) and combined scores (log of the P-value from the Fisher exact test and multiplying that by the z-score of the deviation from the expected rank74,75). Disease relevance for gene sets was assessed with ClinVar 2019 (contained in Enrichr74,75), OMIM Disease (contained in Enrichr74,75), Rare Disease AutoRIF ARCHS4 Predictions (contained in Enrichr74,75) and Rare Disease GeneRIF ARCHS4 Predictions data bases (contained in Enrichr74,75). Results were presented with combined scores.
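The combined score described above multiplies the log of the Fisher exact test P-value by the z-score of the deviation from the expected rank. The following minimal sketch computes both quantities with the Python standard library only. It assumes a one-sided (over-representation) hypergeometric tail for the Fisher test and a natural logarithm; the function names and the toy numbers are hypothetical, and the z-score would in practice come from Enrichr rather than being supplied by hand.

```python
import math


def fisher_right_tail(overlap, set_size, list_size, background):
    """One-sided Fisher exact (hypergeometric) P-value.

    Probability of observing at least `overlap` genes shared between an input
    list of `list_size` genes and a gene set of `set_size` genes, when both
    are drawn from `background` genes.
    """
    upper = min(set_size, list_size)
    denominator = math.comb(background, list_size)
    numerator = sum(
        math.comb(set_size, k) * math.comb(background - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


def combined_score(p_value, z_score):
    """Combined score: log of the P-value multiplied by the rank z-score."""
    return math.log(p_value) * z_score


# Toy numbers only: 8 shared genes, a 40-gene set, a 100-gene list, 20,000 background genes
p = fisher_right_tail(overlap=8, set_size=40, list_size=100, background=20_000)
print(p, combined_score(p, z_score=-1.8))
```

With a small P-value (negative log) and a negative z-score, the product is positive, so larger combined scores indicate stronger enrichment.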

Nascent RNA and mature RNA analysis

We used the Click-iT™ Nascent RNA Capture Kit (Thermo Fisher, Cat. No. C10365) for nascent RNA capture. Triplicates of mESCs cultured on six-well plates were pulsed with 0.5 mM 5-ethynyl uridine (EU) for 0.5 h in a 37 °C, 5% CO2 incubator. Single-cell suspensions were obtained by dissociating cells with TrypLE and cell counts were performed using the Countess™ II FL Automated Cell Counter (Invitrogen, Cat. No. A27974). In total, 1 × 10^7 cells from each replicate were used to extract RNA. Total RNA was extracted using the RNeasy Mini Kit (Qiagen, Cat. No. 74104), including the DNA digestion step. Ten micrograms of total RNA from each replicate was used as input material in the protocol and EU-incorporated RNA was precipitated overnight at −75 °C. One microgram of biotinylated RNA with 50 ml of Dynabeads® MyOne™ Streptavidin T1 magnetic beads was used to capture biotinylated RNA per replicate, and the bead-RNA complex was suspended in 50 μl of wash buffer 2 and immediately processed for cDNA synthesis. We used the SuperScript® VILO™ cDNA synthesis kit (Thermo Fisher, Cat. No. 11754-050) as recommended in the Click-iT™ Nascent RNA Capture protocol. Two microliters of the undiluted cDNA was used per qRT-PCR reaction. qRT-PCR was performed with PowerUp SYBR Green Master Mix (Life Technologies, Cat. no. A25743). One microgram of the total RNA from the EU-pulsed RNA samples was used to synthesize control cDNA for each replicate. Nascent RNA detection primers were designed over the intron-exon boundaries, to avoid capturing transcripts with retained introns or processed transcripts. Mature RNA detection primers were designed so that the product spans neighboring exons separated by an intron of >1.5 kb. 
Primers used were: Thbs4-mature (fwd: 5′-AGGGGAACATCTCCGAAACT-3′, rev: 5′-AAAAGCGCACCCTGATGTAG-3′), Thbs4-nascent (fwd: 5′-GTGGACACCTGTGCTCTCTG-3′, rev: 5′-GTTGCAGCGGTACTTGAGGT-3′), Ccdc80-mature (fwd: 5′-ATGTTCCTCAGTTCCGATGG-3′, rev: 5′-TCCTCTCCAACACCCAAAAG-3′), Ccdc80-nascent (fwd: 5′-TGAAATTCATCGGTTGTCAG-3′, rev: 5′-ACTCCTCCAACTTCCTCTCC-3′), Bicd2-mature (fwd: 5′-AAGCTACGGAACGAGCTCAA-3′, rev: 5′-CCATGCGCAACAGAGAGTTA-3′), Bicd2-nascent (fwd: 5′-CTCAGGCTCCAGGAGAGATG-3′, rev: 5′-CCATGCGCAACAGAGAGTTA-3′), Hsbp1l1-mature (fwd: 5′-GAACGCGGCTGAGAATCTAC-3′, rev: 5′-CATGAGGTCATCCACGTTTC-3′), Hsbp1l1-nascent (fwd: 5′-CAGCCGTACTCCCTCAGTGT-3′, rev: 5′-CGACTTCCCATCTCTTCCAG-3′), 4932435O22Rik-mature (fwd: 5′-CTGAGGCTCTTTGGCACTTT-3′, rev: 5′-CCACAGCAGTCCTCCTAAGC-3′), 4932435O22Rik-nascent (fwd: 5′-AGCCAGCCAGCTGAACTATC-3′, rev: 5′-ATTACCGAAGTGTCCGATGC-3′), Mob1b-mature (fwd: 5′-TCCATTCCCGAAGAATTTCA-3′, rev: 5′-GTGCCAACTCTCGTCTGTCA-3′), Mob1b-nascent (fwd: 5′-GCATCAGGGGAGCTTAAGTG-3′, rev: 5′-GTGCCAACTCTCGTCTGTCA-3′), Tbx3-mature (fwd: 5′-CCTTCCACCTCCAACAACAC-3′, rev: 5′-GCATGCTGTTCAAATTGAGG-3′), Tbx3-nascent (fwd: 5′-CCTCCACTCCTCCAAAACAG-3′, rev: 5′-GCAGCCATGTATGTGTAGGG-3′), Kif17-mature (fwd: 5′-GCAACTACTTCCGCTCCAAG-3′, rev: 5′-CTCACCACCGAAGCTGTTTT-3′), Kif17-nascent (fwd: 5′-CCAGGAGTCATGGGAGTGAC-3′, rev: 5′-GCTTGGAGCGGAAGTAGTTG-3′).
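When primer sequences are copied from a typeset list such as the one above, a quick sanity check of length, base composition and orientation can catch transcription errors. The helper functions below are a hypothetical sketch for that purpose and are not part of the published workflow; they tolerate stray whitespace and reject any character other than A, C, G or T.

```python
def clean_primer(seq):
    """Strip whitespace, upper-case, and validate that only A/C/G/T remain."""
    cleaned = "".join(seq.split()).upper()
    invalid = set(cleaned) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in primer: {sorted(invalid)}")
    return cleaned


def gc_fraction(seq):
    """Fraction of G and C bases in the primer."""
    cleaned = clean_primer(seq)
    return (cleaned.count("G") + cleaned.count("C")) / len(cleaned)


def reverse_complement(seq):
    """Reverse complement, e.g. to compare a reverse primer with the template strand."""
    return clean_primer(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Thbs4-mature forward primer from the list above
thbs4_fwd = "AGGGGAACATCTCCGAAACT"
print(len(clean_primer(thbs4_fwd)), gc_fraction(thbs4_fwd))  # 20 0.5
```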

CUT&Tag assay

To assess the allele-specific chromatin accessibility of maternal and paternal genes in F1 mESCs, we used the CUT&Tag-IT™ assay kit (Cat. No. 53160) from Active Motif®. mESCs were cultured in 2i medium and, at 70% confluency, cells were harvested using TrypLE Express Enzyme (Thermo Fisher, Cat. No. 12605028). Cells from three wells of a 6-well plate (FALCON, Cat. no. 353046) were combined and filtered through cell strainers (CellTrics-Partec, Cat. No. 04-004-2326) to remove cell clumps. For each histone mark, 400,000 cells were used. We followed the protocol according to the assay kit manual, excluding the magnetic bead capture steps. Instead, we used centrifugation (5 min at 600 × g at room temperature per centrifuging step) to pellet the cells. Antibodies used: H3K36me3 (Active Motif - Cat. No. 61102), H3K4me1 (Active Motif - Cat. No. 39298), H3K27ac (Active Motif - Cat. No. 39134), H3K4me3 (Active Motif - Cat. No. 39160), H3K79me2 (Active Motif - Cat. No. 39144), H3K79me3 (Novus Biologicals - Cat. No. NB21-1383SS). We used 1 μg in 50 μl dilution for primary antibodies. The guinea pig anti-rabbit antibody provided with the kit was used as the secondary antibody at 1:100 dilution as well as the IgG control. We obtained 200,000 nuclei in the final cell suspension of every sample and all the nuclei were used for DNA extraction. Total fragmented DNA extracted from each sample was used to construct sequencing libraries using the Illumina index primers provided with the kit. Sequencing libraries were pooled (including the other two libraries from the same experiment) and sequenced in one lane using a MiSeq PE 300 V3 kit at the Cold Spring Harbor Laboratory Genome Center.

CUT&Tag data analysis

Allele-specific and non-allele-specific data analysis was carried out using the data analysis webtool provided by Active Motif®. Briefly, peak calling was performed using the SEACR (Sparse Enrichment Analysis for CUT&RUN) analysis strategy, which uses the global distribution of background signal to calibrate a simple threshold for peak calling76. Motif calling was performed with the Homer algorithm. For allele-specific read calling we used the MEA pipeline13 as previously described.

Microscopy

Rhythmically beating cardiac body brightfield movies were taken using a Zeiss Observer Z1 inverted fluorescence microscope using a ×10 objective.

Software used

Partek® Flow® software (Partek®). ASAP v1 software suite. Seurat 3.1.0 (SeuratV3) embedded in Nucleic Acid SeQuence Analysis Resource (NASQAR)65. StemChecker web-portal (http://stemchecker.sysbiolab.eu). Circa software (https://omgenomics.com/circa/). Methylomic and epigenomic analysis (MEA) pipeline13. SeqMonk bio-informatics web-portal (https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/). InteractiVenn71. Intervene shinyApp72. GraphPad PRISM (version 7.04). Ensembl BioMart data mining tool (https://www.ensembl.org/biomart/martview). g:Profiler (version e102_eg49_p15_7a9b4d6)73. Enrichr software suite74,75. Uniprot database (https://www.uniprot.org/uniprot). SEACR76 (Sparse Enrichment Analysis for CUT&RUN) provided by Active Motif®. Microsoft Office 2011. Adobe Illustrator 2020.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.