Heart enhancers with deeply conserved regulatory activity are established early in development

During the phylotypic period embryos from different genera show similar gene expression patterns, implying common regulatory mechanisms. To identify enhancers involved in the initial events of cardiogenesis, which occurs during the phylotypic period, we isolated early cardiac progenitor cells from zebrafish embryos and characterized 3838 open chromatin regions specific to this cell population. Of these regions, 162 overlapped with conserved non-coding elements (CNEs) that also mapped to open chromatin regions in human. Most of the zebrafish conserved open chromatin elements tested drove gene expression in the developing heart. Despite modest sequence identity, human orthologous open chromatin regions could recapitulate the spatial temporal expression patterns of the zebrafish sequence, potentially providing a basis for phylotypic gene expression patterns. Genome-wide, we discovered 5598 zebrafish-human conserved open chromatin regions, suggesting that a diverse repertoire of ancient enhancers is established prior to organogenesis and the phylotypic period.


Introduction:
The developmental hourglass model predicts a phylotypic stage during midembryogenesis when species within the same phylum display the greatest level of morphological similarities 1,2 . The hourglass model is also supported by comparative transcriptomic studies that demonstrated the most conserved gene expression occurred at the phylotypic stage [3][4][5] . The idea that conserved phylotypic gene expression is established through conserved enhancers is supported by several comparative epigenomic studies [6][7][8][9][10] . While most molecular studies of the phylotypic period have focused on whole embryos, recent evidence suggests that the exact developmental timing of maximal conservation varies in a tissue-specific manner 8 . We are only beginning to understand how conserved transcriptional programs for individual developmental lineages are set up prior to the phylotypic stage.
The heart, derived from the mesoderm, is the first organ formed during embryogenesis.
Heart development is orchestrated by conserved cardiac transcription factors (TFs) binding to cis-regulatory elements (CREs) 11,12 . Crucial cardiac specification events occur during early embryogenesis [13][14][15][16] . For example, distinct subtypes of mouse cardiac progenitors emerge within the gastrula stage preceding the expression of the canonical cardiac progenitor marker Nkx2.5, long before any organ structure is formed [13][14][15] . However how this potential early cardiac specification is controlled by enhancer elements, and the extent to which this process is evolutionarily conserved, is not known.
Heart enhancers have been extensively characterized in studies utilizing genome-wide profiling techniques including chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) [17][18][19][20][21][22] , computational predictions 23,24 and in vivo validation in mouse and zebrafish 4 embryos 17,18,20,23,24 . To date, the majority of in vivo heart enhancer discovery and validation experiments were performed in embryos, when the heart chamber structures were already established, or in adult hearts 17,18 . In vitro differentiation of embryonic stem cells (ESCs) into cardiac progenitors have also yielded insights into early cardiac development and have enabled the discovery of cardiac enhancers 19,21,22 . However, more work is needed to identify the CREs that regulate the early differentiation of mesoderm progenitors to cardiac lineages in the context of the developing embryo.
Despite the highly conserved cardiac TFs necessary for heart development, heart enhancers identified at mouse E11.5 show limited phylogenetic conservation compared to brain enhancers identified at the same developmental stage 8,17 . However, analysis of putative enhancers in mesoderm cells, derived from embryonic stem cells, show higher evolutionary constraint than the enhancers identified after organogenesis 8 . This suggests that the regulatory elements that establish the conserved cardiac transcriptional program may exist at the initial stages of heart development, presumptively during the time window from naïve mesoderm to cardiac progenitors.
Here, we set out to discover enhancers that are active in cardiac progenitor cells prior to the expression of the cardiac progenitor marker Nkx2. 5. We generated a zebrafish GFP reporter line driven by a mouse Smarcd3 enhancer (Smarcd3-F6) that is active in early gastrulating mesoderm 13 , in order to enrich for early zebrafish cardiac progenitors. We profiled gene expression and the accessible chromatin landscape of GFP positive cells using the assay for transposase-accessible chromatin using sequencing (ATAC-seq), single cell mRNA-seq, lineage tracing, and RNA in situ hybridization, both in wildtype embryos and following knockdown of the essential cardiac transcription factors Gata5/6. Results from these experiments indicated that 5 we had purified a population of cells enriched for cardiac progenitors. Using direct and indirect DNA alignments 25,26 , we identified accessible chromatin regions shared between zebrafish and human. We found that these conserved accessible chromatin elements were highly associated with developmental transcription factors that are regulated by Polycomb repressive complex 2 (PRC2). We confirmed the cardiac activity and functional conservation of many anciently conserved open chromatin regions using in vivo reporter assays. In sum, our study identified a set of conserved cardiac enhancers established prior to the phylotypic period, before the heart and other organ primordia appear, potentially providing a basis for common gene expression patterns across genera. Furthermore we uncovered ~6000 anciently conserved open chromatin regions that likely serve as enhancers for other cell lineages.

Results:
The mouse Smarcd3-F6 enhancer marks cardiac progenitors in zebrafish To examine the earliest events that contribute to cardiac lineage specification in zebrafish, we needed a means to isolate early cardiac progenitor cells in vivo. To identify a zebrafish marker that could drive GFP expression in cardiac progenitor cells prior to nkx2.5 expression, we tested a recently described early mouse cardiac enhancer, Smarcd3-F6 13 in our zebrafish model.
Lineage tracing experiments demonstrated that this enhancer labeled cardiac progenitor cells prior to Nkx2.5 expression in mouse embryos 13 . We also found that the Smarcd3-F6 region was enriched for the active enhancer mark H3K27ac and contains a CRE co-bound by several conserved cardiac TFs (GATA4, NKX2.5, TBX5) in mouse ESC differentiated cardiac precursors (CP) and cardiomyocytes (CM) 19,22 (Supplementary Fig. 1A). 6 To test if the Smarcd3-F6 enhancer functions as an early marker of cardiac progenitors in zebrafish, we generated a Tg(Smarcd3-F6:EGFP) transgenic line (Fig. 1A). RNA in-situ hybridization against gfp showed the enhancer activity was robustly detected at 6 hours postfertilization (hpf) along the embryonic margin (Fig. 1B), which contains mesendodermal progenitors including future cardiac cells 27 . Over the course of gastrulation, GFP positive cells migrated to encompass positions in the anterior and posterior lateral plate mesoderm (ALPM and PLPM) (Fig. 1B). Co-immunostaining comparing Tg(Smarcd3-F6:EGFP) and Tg(nkx2.5:ZsYellow) expression indicated that the Smarcd3-F6 enhancer marked almost all cardiac mesoderm expressing nkx2.5 at early somite stages (13 hpf) (Fig. 1C). Tg(Smarcd3-F6:CreERT2) lines were generated to trace the fate of Smarcd3-F6 labelled cells. By crossing Tg(Smarcd3-F6:CreERT2) to a Tg(βactin2-loxP-dsRed-loxP-GFP) reporter line, we found that following 4-hydroxytamoxifen (4-HT) addition at 8 hpf, cells labeled by the Smarcd3-F6 enhancer contributed to heart formation in later development ( Supplementary Fig. 1). Although the Smarcd3-F6 enhancer shows limited mammalian, and no zebrafish sequence conservation ( Supplementary Fig. 1A), the early labeling of cardiac lineages in zebrafish indicated that this enhancer would allow us to isolate a cell population enriched for cardiac progenitors.

Open chromatin regions enriched in the Smarcd3-F6 labeled population
Open chromatin profiles often identify the genomic regions where TFs and their cofactors bind and function 28,29 . Using ATAC-seq, we detected 155,879 open chromatin regions (ATAC-seq peaks) in GFP+ population and 153,777 in GFP-population. Our ATAC-seq peaks overlap with markers of active promoters (H3K4me3) and enhancers (H3K27ac, H3K4me1) that were previously identified from ChIP-seq experiments on whole embryos of similar stages 6 , as well as additional genomic regions, likely due to purifying a small subset of cells away from the 8 bulk embryo (Supplementary Fig. 3A). Most ATAC-seq peak regions (n=195,466) shared similar ATAC-seq signals in both GFP+ and GFP-populations. 5,471 peaks showed significant quantitative differences ( Fig. 2A)  We then asked which TF motifs were overrepresented in GFP+ specific peaks. We found GATA motifs showed the strongest enrichment, consistent with the crucial roles GATA factors play in heart and endoderm development [31][32][33][34] (Fig. 2C). In contrast, GFP-specific peaks were most highly enriched for SOX motifs (Fig. 2C). Like Gata4 and Gata6 in mice [35][36][37][38][39][40] , gata5 and gata6 play redundant but critical roles for zebrafish heart formation 31,33 . To test if the activity of the Smarcd3-F6 enhancer is regulated by Gata5 and Gata6 in zebrafish, we performed gata5 and gata6 knock-downs by injecting morpholinos into Tg(Smarcd3-F6: EGFP) embryos (Fig. 2D). 9 Supporting our motif enrichments we found that GFP expression by Smarcd3-F6 enhancer requires Gata5 and Gata6.

Comparative epigenomic analysis of accessible chromatin elements reveals phylogentically conserved cardiac enhancers
To identify regions of open chromatin that are conserved between zebrafish and human, we used two well-defined conserved non-coding element (CNE) datasets, zCNE 25 and garCNE 26 .
Both of these two resources contain conserved regions identified using direct alignment and indirect homology bridged by intermediate species (Fig. 3A). We associated zebrafish ATACseq peaks to CNEs if they overlapped a zebrafish-human or zebrafish-mouse CNE. Most  Table 1 for details). Of these 6294 zebrafish-human ATAC-seq peaks 176 were GFP+ specific, 264 were GFP-specific.
We found that GFP-specific peaks were ~4 times more enriched for conserved regions than the GFP+ ones (P=1.40e-46 using human CNEs; Chi-square test) (Fig. 3A). Previous work has shown that 30-45% of forebrain, midbrain and limb enhancers overlapped regions of extremely high sequence constraint, in contrast to only ~6% of cardiac enhancers 17 . Given that GFP+ and GFP-populations were enriched for cardiac and brain lineages respectively (Fig. 1F), our observations were consistent with these previous findings 17,41 . 10 Comparing genomic features (e.g. TF binding, chromatin accessibility, histone modifications) between species is a potentially powerful way to identify conserved regulatory function within alignable sequence for specific tissues or cell types [42][43][44][45][46][47] . For ATAC-seq peaks overlapping CNEs, we asked if the orthologous human CNEs also contained accessible chromatin according to DNase I hypersensitivity sites (DHS) reported by ENCODE3 48 (Fig. 3B).
The majority of the zebrafish ATAC-seq peaks overlapping zebrafish-human or zebrafish-mouse CNEs overlapped human or mouse CNEs containing accessible chromatin (89% for human (5598/6294) and 79% (4747/6047) for mouse; Fig. 3B,C). Overall ~3% of the total zebrafish ATAC-seq peaks were identified as accessible non-coding regions that are shared between zebrafish and human, or zebrafish and mouse. We refer to conserved accessible chromatin connected through CNEs as aCNEs. We compared our aCNEs to ultraconserved non-coding elements in the human genome 49 , which were defined as sequence elements with no mismatches for at least 200 base pairs between orthologous regions in the human, rat and mouse genomes. While 30% of ultraconserved elements overlap human aCNEs, only 3% of the aCNEs overlap ultraconserved elements (Supplementary Table 2). The fact that nearly 40% (2228/5598 for zebrafish-human aCNEs) of the aCNEs were found using indirect alignments indicates that many of these ancient aCNEs show limited sequence conservation and would likely have been overlooked using standard multiple genome alignment methods.

Conserved accessible chromatin regions drive expression in the developing heart
To gain insight into the enhancer activity of aCNEs, we compared them to elements tested in the VISTA enhancer database, which is the most comprehensive database of functionally validated enhancers. Roughly 10% (556/5849) of human or mouse aCNEs have been experimentally tested in vivo and reported in the VISTA Enhancer Browser, two thirds of which were recorded as positive enhancers (2017/07/29 release; https://enhancer.lbl.gov/). 43/162 of the GFP+ aCNEs that we found overlapped heart enhancers that were predicted using curated epigenomic data 23 .
To determine if some conserved open chromatin elements are bound by cardiac TFs during heart development, we collected published ChIP-seq and ChIP-exo data for GATA4, NKX2.5, TBX5, HAND2, MEF2A and SRF conducted in mouse embryonic hearts or cardiac cell types 20,22,50,51 (see Methods for detail). We found that among the 134 GFP+ specific ATACseq peaks that were aCNEs shared with mouse, 49 (37%) are bound by at least one cardiac TF at the orthologous regions of mouse genome and 34 (25%) are CREs that are bound by more than one cardiac TF (Supplementary Table 2). In contrast, cardiac TF binding was rarely observed in To assess the in vivo function of the GFP+ specific aCNEs, we used a transgenic reporter assay to test their activity during zebrafish embryonic development (Fig. 3C). The regions we selected included both direct (13 regions) and indirect (8 regions) alignments between zebrafish and human (Supplementary Table 3). We cloned the 21 zebrafish regions into a Tol2-based GFP enhancer detection vector and injected the constructs into zebrafish embryos (Fig. 3C).
We found 18/21 regions drove heart expression in at least 30% of the F0 embryos injected ( Supplementary Fig. 4), suggesting they are active enhancers in embryonic heart development. Within this set of 18 enhancers, 11 zebrafish aCNEs (ZaCNEs) were located near known cardiac genes, nine of which overlapped the experimentally determined binding sites of one or more cardiac TFs (GATA4, NKX2.5, TBX5, HAND2, MEF2A and SRF) in mouse hearts or cardiac cell types 20,22,50,51 (Supplementary Table 3). Four of the seven selected ZaCNEs with no known cardiac gene association had experimentally determined binding sites of one or more cardiac TF (Supplementary Table 3).
We raised stable transgenic lines for the 18 ZaCNEs that passed the 30% threshold in the F0 assays, to verify their cardiac activity (Fig. 3C). Except for ZaCNE18 for which only one transgene germline carrier was identified, we obtained multiple independent alleles for the other 17 ZaCNEs (Supplementary Table 3). Despite the random transgene integration mediated by the Tol2 transposase system, we observed consistent GFP expression in hearts in multiple alleles of the same enhancer lines for 15/17 ZaCNEs, with ZaCNE4 and ZaCNE7 being the exceptions ( Supplementary Fig. 5). These results from F1 or F2 stable transgenic embryos demonstrated high accordance with those obtained from the F0 generation. We classified the heart expression observed in the ZaCNE transgenic lines into 4 major categories ( Fig. 3D and Supplementary Fig.   5A). Some heart enhancers broadly label all heart structures (Category I), while others showed specific or enhanced expression in the ventricle (category II), atrioventricular canal (category III) or outflow tract (category IV) ( Fig. 3D and Supplementary Fig. 5). These results confirmed the diverse expression driven by aCNEs and suggested that aCNEs may play distinct roles in regulating heart gene expression.

Human-zebrafish aCNEs share conserved early cardiac activities
We wanted to determine if human-zebrafish orthologous aCNEs were functionally conserved with respect to their ability to control spatial-temporal gene expression patterns. We chose four aCNEs near essential cardiac genes hand2/HAND2 (aCNE1), tbx20/TBX20 (aCNE20) and mef2cb/MEF2C (aCNE5, aCNE19). All of these zebrafish sequences drove robust and specific heart expression in stable transgenic lines (ZaCNE1, ZaCNE5, ZaCNE19, ZaCNE20) 14 Our motif enrichment analyses suggested that GATA factors act as important regulators in the GFP+ population. Within GFP+ specific zebrafish aCNEs conserved with human, 66% (107/162) have at least one GATA motif and ~30% (52/162) have more than one GATA motif (Supplementary File. 5). In contrast, while using the same threshold, no significant GATA motifs were found in the conserved GFP-specific zebrafish aCNEs (Supplementary File. 5). To test the functional importance of the GATA motif in aCNEs, we mutated an aligned GATA motif within zebrafish and human aCNE1 regions (near the hand2/HAND2 locus) and compared their activities with that of the WT sequences ( Supplementary Fig. 6C). In F1 stable transgenic lines, we found that both zebrafish and human aCNE1 sequences with mutated GATA motif drove much weaker GFP expression in zebrafish hearts compared to the WT sequences ( Supplementary   Fig. 6C).
These results together demonstrate that despite sequence divergence between human and zebrafish aCNEs, we have identified a number of conserved accessible chromatin elements that share conserved spatiotemporal, GATA-dependent activity during the early stages of heart development.

aCNEs are enriched for lineage-specific developmental enhancers
Functional enrichment analysis by GREAT revealed that aCNEs were significantly associated with DNA binding proteins, with the highest enrichment being the homeodomain proteins (GO:0043565, sequence-specific DNA binding: FDR=0; IPR009057, Homeodomainlike: FDR=0; binomial test). Supporting the role of aCNEs as development enhancers, we observed that these regions were also highly enriched for genes regulated by polycomb 15 repressive complex 2 (PRC2) (MSigDB Perturbation, Set 'Suz12 targets': FDR=0, Set 'Eed targets': FDR=0; binomial test). To gain more insight into lineage-restricted functions of aCNEs we compared the human aCNEs orthologous to our GFP+ specific and GFP-specific zebrafish ATAC-seq peaks. Although human aCNEs specific to the GFP+ and GFP-populations both showed significant enrichments for PRC2 target genes and homeobox transcription factors, only a minority of PRC2 regulated genes were associated with both GFP+ and GFP-aCNEs, indicating that these two sets of aCNEs may be involved in distinct developmental processes ( To get a global understanding of the tissue-specific usage of aCNEs, we took advantage of the chromatin states of 127 human tissues/cell types that were predicated based on 5 chromatin marks (H3K4me3, H3K4me1, H3K36me3, H3K27me3 and H3K9me3) in the Roadmap Epigenomics Project 52 ( Fig. 5C and Supplementary Fig. 8). We simplified the 15 chromatin states into "Active", "Bivalent", "Polycomb repressed", "Quiescent" and "Other" categories ( Fig. 5C and Supplementary Fig. 8). Only 10% (899/8865) of the aCNEs showed constitutive activity ("Active" in > 80% the tissues/cell types) (Fig. 5D) Given that lineage-specific activity and enrichment for PRC2 binding are wellestablished properties of poised enhancers in ESCs 53-56 , we compared aCNEs to three enhancer types defined in a recent study using mouse ESCs: poised (P300+, H3K27me3+, H3K4me3-, H3K27ac-), primed (H3K4me1+, H3K4me3-, H3K27ac-, H3K27me3-) and active (P300+, H3K27ac+, H3K4me3-, H3K27me3-) 56 . Compared to all open chromatin regions, aCNEs 17 showed a 5-fold enrichment for poised enhancers (FDR < 4.44e-19 using hypergeometric test) with little enrichment for active (1.4 fold FDR < 0.0018 using hypergeometric test) or primed (1.1 fold FDR < 0.16 using hypergeometric test) enhancers. We also observed a strong enrichment for poised enhancers that were identified in human ES cells (fold of enrichment: 9.72, FDR < 1.14e-84 using hypergeometric test) 54 .
Conserved genomic regulatory blocks (GRBs), which are defined by cluster of CNEs, contain developmental genes, coincide the boundaries of topologically associating domains (TADs) and remain concordant between species of various evolutionary distances 57 . We found that nearly 90% (7846/8866) of human aCNEs fell into GRBs defined by using CNEs (70% identity over 50bp) between human and chicken 57 , suggesting aCNEs have been kept in a discrete set of syntenic regions during vertebrate evolution. Overall the genomic and epigenomic properties of aCNEs suggest that many of them may serve as lineage-restricted enhancers that facilitate the expression of developmental genes.

Discussion
The existence of a phylotypic period implies that the gene regulatory events that control phylotypic gene expression patterns are also likely to be evolutionarily constrained. In this study we characterized a population of cells enriched for cardiac progenitors in early zebrafish Vertebrate heart enhancers tend to show a lack of overt phylogenetic conservation relative to enhancers active in other tissues, such as the brain, at the same developmental stages 17,41 . Our results echo this finding, while at the same time highlighting a set of highly conserved heart enhancers. By comparing open chromatin within CNEs identified using approaches such as transitivity through a third species and ancestry reconstruction 25,26 , we discovered 60% more conserved open chromatin regions between zebrafish and human/mouse that would otherwise have been missed by using direct sequence alignment alone (Fig. 3A,B).
Nearly half of the heart enhancers we validated were from indirect sequence alignment (Supplementary Table 3), and zebrafish-human aCNE orthologs identified from either direct (aCNE1) or indirect (aCNE5, aCNE19, aCNE20) alignment share conserved cardiac activity to a similar degree, despite a 50-60% sequence identity. The aCNEs that we have validated can be used to further our understanding of heart development and regeneration. For example, using one of these "indirectly aligned" heart enhancers (aCNE21) we can recapitulate the endogenous expression of the nearby cardiac gene hey2 70 . Characterizing the epigenomic landscape of additional cell-lineages early in development will likely highlight further enhancers, and reveal specific aCNEs capable of driving cell or tissue-specific expression before, during and after the phylotypic period.
Attesting to their potential importance in regulating specific genes through long-range chromatin interactions, an established genomic property of CNEs is that they cluster into regions of conserved synteny, referred to as gene regulatory blocks (GRBs) 71,72 . A significant fraction of 20 GRB boundaries coincide with the boundaries of topological association domains (TADs) 57 . We found that nearly 90% (7846/8866) of our human aCNEs fell into GRBs comprised of CNEs between human and chicken (70% identity over 50bp) 57  Embryos were raised at 28.5 degree Celsius and staged as previously described 80 .  Supplementary Table 3

Immunostaining
Immunostaining was carried out as previously described with minor modifications 87 .
Embryos from Tg (Smarcd3-F6: EGFP) hsc70 and Tg(nkx2.5: ZsYellow) fb7 crosses were fixed at 13 hpf in 4% paraformaldehyde at 4 degrees Celsius overnight. After 3X5min washes in PBS with 0.1% Triton, embryos were permeabilized in PBS with 0.5% Triton for 4 hours at room temperature (RT). Embryos were then blocked in PBST (1% DMSO and 0.5% Triton X in PBS) with 5% Normal Goat Serum (Millipore, Cat# S26-LITER) for 1.5-2h at RT before incubated with primary antibodies (α-RCFP 1:500, Clontech Cat# 632475; α-GFP 1:1000, Torrey Pines Biolabs) at 4 degrees Celsius overnight. Next day, embryos were washed for 3-4 hours in PBS with 0.1% Triton at RT, with 6-8 changes of solution. Incubation of secondary antibodies was carried out at 4 degrees Celsius overnight, followed by the same washing procedure as that for primary antibodies. After staining, embryos were mounted in 1% (w/v) low melt agarose (Sigma, A9414) and imaged under a Nikon A1R Si Point Scanning Confocal. Bioanalyzer were used to check library quality and concentration. Libraries were 50bp singleend sequenced on Illumina HiSeq 2500 platform to a depth of (3.5-7.0) X 10 7 reads. Two biological replicates were collected for both GFP+ and GFP-cells.

Transgenic zebrafish enhancer assay
Candidate regions containing the zebrafish ATAC-seq peaks (21 ZaCNEs) or human DHSs (4 HaCNEs) were amplified from genomic DNA and recombined into pDONOR221 vector (Invitrogen, Gateway BP Clonase II Enzyme Mix, Cat# 11789020) before they were cloned into E1b-Tol2-GFP-gw vector eventually. 25 ng E1b-Tol2-GFP-gw plasmid carrying one aCNE and 150ng Tol2 mRNA were injected into WT embryos at one-cell stage. F0 founder embryos were raised to 48-52 hpf before subjected to imaging and heart expression scoring.
Candidate regions were considered as a heart enhancer if more than 30% of the injected embryos display GFP positive cells beating in zebrafish hearts, which was consistent with the criterion used in similar studies before 89 . 42-220 (average n=86) injected embryos were analyzed for each candidate regions. The genomic coordinates, lengths and nearby genes of candidate regions and primers used for cloning can be seen in the Supplementary Table 3. F0 embryos injected with enhancers that passed the 30% threshold were raised to screen for transgene germline carriers. Except enhancer ZaCNE18 for which only one carrier has been identified, 2-4 independent alleles have been identified for each enhancer (see Supplementary   Table 3). Though ectopic expression has been seen in some carriers, the cardiac expression patterns were similar in different alleles of the same transgene. 27
The primers were designed to convert the aligned GATA consensus sequence AGATAA to CCCCCC. pDONOR221 vectors carrying the mutated aCNE1 enhancers were PCR amplified from the original pDONOR221 containing the wildtype (WT) aCNE1 sequences using the primers above. After amplification, 50 ul PCR mixture was incubated with 1 ul DpnI (NEB, Cat#R0176S) at 37 degrees to remove WT enhancer templates. DpnI was inactivated at 80 degrees for 20min before transformation. Plasmid clones with the correct GATA motif mutation were confirmed by Sanger sequencing and used as entry clones to make E1b-Tol2-GFP constructs containing mutated aCNE1 enhancers. Independent germline carriers haven been identified for ZaCNE1_GATAMutated (n=5) and HaCNE1_GATAMutated (n=4) enhancers (see Supplementary Table 3). GFP expression levels in different GATA_Mutated alleles look slightly different but were generally weaker than WT_alleles. Images in Supplementary Fig. 6 were taken using alleles showing median expression level within all alleles.  Supplementary Table 4 for details).

Processing and analysis of bulk and single-cell mRNA-seq data
HTSeq-count 93 and Zv9 (Ensembl release 79) transcriptome annotation were used to determine the number of reads mapped to each gene. An average of 14486 genes were detected in bulk samples and 3901 genes in single cells.
For bulk mRNA-seq, genes that have at least 1 read per million in at least 2 replicates were kept for downstream analysis. edgeR package (version 3.18.1) was used for normalization and differential gene expression analysis 94 . 167 genes were identified as more highly expressed in GFP + cells compared to the GFPpopulation while 147 genes were more strongly expressed in GFPcells (FDR < 0.05, Fold change > 2). Volcano plot showing differentially expressed genes was generated with an in-house R script.
Functional enrichment analysis was performed using online tool g:Profiler 95 . A more stringent list of differentially expressed genes (FDR < 0.05, Fold change > 4) were ordered based on its fold change and then used as the input for g:Profiler. Only functional categories containing more than 2 genes but less than 500 genes were included in our analysis. Benjamini-Hochberg FDR method was used for multiple testing correction to adjust significance thresholds. The top10 enriched GO terms (sorted by FDR) for each gene list were plotted.
For single-cell data, 92 cells with more than 2000 genes detected (counts per million, CPM >0) were kept for clustering analysis. Gene counts (CPM) for each cell were first normalized using TMM method 96 and then winsorized before log transformation (In(CPM+1)).
To perform unsupervised clustering, 189 genes were selected and used, if they were differentially expressed in bulk mRNA-seq (|log2FC|>1.5, FDR < 0.1) and have an expression level (In(CPM+1) > 4) in at least one single cells. The Manhattan distance and ward.D method 29 were used for unsupervised clustering. To perform PCA analysis on single cells, R package FactoMineR 97 was used with default settings.
ATAC-seq reads mapping, peak calling and differential peak identification

Motif enrichment and functional enrichment analysis
Motif enrichment analysis was conducted using CentriMO 102 (version 4.11.2). In order to obtain the summits of GFP+/-specific peaks, we first perform BEDtool-intersect between the peak summits determined by MACS2 99,100 and the GFP+/-specific peaks identified by diffBind 101 . By doing this, we identified 3861 GFP+ specific summits and 1635 GFP-specific ones. 250bp was extended to each side from the peak summits and the new regions with 501 bp uniform length were used as input for CentriMO. To identify motifs enriched in GFP+(-) specific 30 peaks, the same numbers of shared peaks were subsampled and used as the comparative dataset in the "absolute and differential model". All other parameters were kept as default.

Identification of sequence-conserved open regions
Two zebrafish CNE datasets were used for this analysis. First, Zebrafish CNEs (zCNEs) conserved with human or mouse identified from Hiller et al. were obtained from the authors, including records of direct sequence alignment and transitive mapping through other species or reconstructed ancestries 25 . zCNEs that overlaps a zebrafish-human well-aligning window 25 by at least 15bp were defined as direct aligned. zCNEs that can be mapped back to mouse through transitive alignment or ancestry reconstruction but cannot be detected by direct alignment were defined as indirectly aligned CNEs. The same direct and indirect alignment definition was set for zCNE conserved with mouse (zCNE_mouse) To include more high-quality zebrafish CNEs into our analysis, we used another zebrafish-human CNE dataset identified in a recent study through transitive alignment via the spotted gar genome 26 , which we referred to as "garCNE". If the zebrafish coordinate of a garCNE does not overlap any zCNE records, we added it into our analysis and define it as 31 indirect aligned. Altogether, we collect 20,005 zebrafish CNEs conserved with human, with 10187 direct and 9818 indirect ones (Supplementary Table 1).
To establish the same system for CNEs conserved between zebrafish and mouse, we used liftOver to convert the human-zebrafish CNEs identified through gar genome to mouse-zebrafish  Table 1). Finally, we associated zebrafish ATAC-seq peaks to CNEs (sequence-conserved ATAC-seq peaks) if they overlap a CNE by at least one base pair. Nearly 90% of the these CNE-associated ATAC-seq peaks were completely excluded from gene coding regions defined by Ensembl transcriptome annotation (Zv9, release 79).

Identification of aCNEs
Human or mouse DNase master peak lists from ENCODE3 48 (https://www.encodeproject.org/search/?type=Annotation&annotation_type=DNase+master+pea ks) were used for identifying open regions anciently conserved between zebrafish and human or between zebrafish and mouse. Mouse DNase master peaks, which were provided as mm10 coordinates, were converted to mm9 coordinates using liftOver (-minMatch=0.95). If a CNE that overlaps a zebrafish ATAC-seq peak by at least one basepair also overlaps a DNase I hypersensitivity site (DHS) from the DNase master lists by at least one basepair, then these orthologous ATAC-seq peak and DHS linked via CNEs were identified as accessible CNEs (aCNEs). 32 There are 1,199,722 non-overlapping regions in mouse DHSs master list after liftOver, 4,193,929 non-overlapping regions in human DHS master list. Probably due to the total number differences, more human DHSs have been identified as aCNEs than mouse DHSs; similarly, more ATAC-seq peaks were identified as aCNEs shared with human than those with mouse (Supplementary Table 1).
We found analysis with the DHS master lists sometimes gave us a chain of DHSs that are conserved with the same ATAC-seq peaks. To avoid potential bias that may be introduced in analysis conducted by GRAET, we merged the DHSs that are within 200bp distance and conserved with the same ATAC-seq peaks. We used merged DHSs for analyses in Fig Quiescent/Low) and "Others" (8: ZNF genes & repeats, 9: Heterochromatin). If an aCNE region was associated with an "Active" state in more than 80% of the 127 tissues/cell types, the aCNE was considered as constitutively "Active". If an aCNE region were "Polycomb repressed" or "Quiescent" in more than 70% of the 127 tissues/cell types and "Active" in at least one epigenome, it was considered as "Active" in a lineage specific manner.

Processing and analyzing of published mouse cardiac TF ChIP-seq data
Mouse cardiac ChIP-seq or ChIP-exo data from the following studies were collected for annotating aCNEs.

Author Contributions
Conception

Competing Financial Interests statement
The authors declare no conflicts of interests.  Each curve shows the probability of the best match to a given motif occuring at a given position in the input sequences. Solid lines represent probabilities caculated from query peak sets (GFP+/-specific peaks) while dash lines show that from the background sequences (shared peaks). (D) GFP in-situ and immunostaining on Tg(Smarcd3-F6: EGFP) emrbyos that were uninjected (control) or injected with gata5/6 morpholinos. All staining and imaging were performed under the same condition for the control and gata5/6 KD groups.   Row names were formatted as "ID"-"EpigenomeName", which were retrieved from Roadmap Epigenomics Project metadata table (http://egg2.wustl.edu/roadmap/web_portal/meta.html).