Abstract
Endothelial cells (EC) differentiate from multiple sources, including the cardiopharyngeal mesoderm, which also gives rise to cardiac and branchiomeric muscles. The enhancers activated during endothelial differentiation within the cardiopharyngeal mesoderm are not completely known. Here, we use a cardiogenic mesoderm differentiation model that activates an endothelial transcription program to identify endothelial regulatory elements activated in early cardiogenic mesoderm. Integrating chromatin remodeling and gene expression data with available single-cell RNA-seq data from mouse embryos, we identify 101 putative regulatory elements of EC genes. We then apply a machine-learning strategy, trained on validated enhancers, to predict enhancers. Using this computational assay, we determine that 50% of these sequences are likely enhancers, some of which are already reported. We also identify a smaller set of regulatory elements of well-known EC genes and validate them using genetic and epigenetic perturbation. Finally, we integrate multiple data sources and computational tools to search for transcription factor binding motifs. In conclusion, we show EC regulatory sequences with a high likelihood to be enhancers, and we validate a subset of them using computational and cell culture models. Motif analyses show that the core EC transcription factors GATA/ETS/FOS are likely drivers of EC regulation in cardiopharyngeal mesoderm.
Background
In mammals, endothelial cells (EC) derive, through vasculogenesis, from the mesoderm but they can differentiate from multiple sources1,2. ECs are heterogeneous in their function, transcriptional program, and chromatin landscape; single-cell-based studies have provided a wealth of data on this effect3,4. Heterogeneity has different causes5, but, at least in part, it may depend upon lineage of origin as well as epigenomic and enhancer profiles.
The cardiopharyngeal mesoderm (CPM) lineage provides progenitors to various tissues and organs of the lower face, mediastinum, and heart6. The CPM also provides multipotent progenitors that differentiate into ECs7,8,9,10. In addition, the second heart field (SHF)11,12,13, which derives from the CPM, provides EC progenitors to various components of the cardiovascular system, including ECs of the pharyngeal arch arteries and outflow tract14,15, which, through endothelial-to-mesenchymal transition, contribute to cardiac valve formation. In particular, Tbx1 expression, which is a marker of the CPM and SHF, identified these ECs through genetic labeling driven by a Tbx1Cre allele15,16. In addition, time-controlled genetic labeling with an inducible Cre recombinase determined that Tbx1 was activated in EC progenitors within the time window E7.5-E8.5 in mouse embryos15. The molecular events that drive EC differentiation in the CPM are mostly unknown, with some exceptions8, and a suitable approach to define them would be to use a dynamic model in which chromatin remodeling and gene expression can be monitored at critical developmental times. Using such an approach, we found that the differentiation of cardiogenic mesoderm from mouse embryonic stem cells (mESCs) activated an endothelial transcription program. We then measured gene expression and chromatin accessibility in differentiating mESCs within the activation window to identify differentially expressed genes and differentially accessible regions. We then used EC gene information from published single-cell RNA-seq data obtained from Tbx1Cre and Mesp1Cre sorted cells from E8.5-E9.5 mouse embryos. Data integration and analysis identified and computationally scored 101 putative regulatory elements activated in the Tbx1Cre-selected EC cluster. Finally, we identified and validated putative regulatory elements associated with a small set of well-known EC genes.
In summary, our results provide a systematic experimental approach to identify cell type-specific regulatory elements during differentiation, and the results obtained shed light on the EC regulatory elements activated during cardiogenic mesoderm differentiation.
Results
Cardiac mesoderm differentiation of mouse embryonic stem cells activates endothelial differentiation, and it can be driven to yield a nearly homogeneous EC population
We have used a published protocol to derive cardiac mesoderm from mouse embryonic stem cells (mESCs)17 (Fig. 1a). We found expression of endothelial genes at differentiation day 4 (d4), while these genes were not detected at d2 (Fig. 1b). RNA-seq analysis performed on two replicates from two independent differentiation experiments confirmed the activation of an endothelial expression program within this time window (Fig. 1c, Table 1, and Supplementary Data 1). Therefore, we used the d2-d4 time window in our search for EC enhancers.
Flow cytometry using the endothelial-specific marker VE-Cadherin, encoded by the Cdh5 gene, revealed that at d2, there were no detectable VE-Cadherin+ cells, while at d4, a small percentage (19%) was present (Fig. 2a). Therefore, we extended the differentiation protocol in order to increase the EC population. To this end, at d4, we added a high concentration of VEGFA (200 ng/ml) and Forskolin (2 µM), as suggested by a published protocol18. Following this treatment, at d6 and d8, the percentage of VE-Cadherin+ cells increased to 57.4 and 91.3%, respectively (Fig. 2a). Similar results were obtained in multiple experiments. Next, we performed Matrigel assays on d8 cells19 in order to determine whether they formed the tubule-like networks expected of fully differentiated ECs. Results indicated that d8 cells had this capacity (Fig. 2b). Thus, this modified differentiation protocol produced a nearly homogeneous population of ECs.
An unbiased strategy identifies putative regulatory elements in early EC differentiation
Having identified the d2-d4 time window for the activation of an EC transcription program, we performed ATAC-seq at these two time points in order to localize regions of dynamic chromatin accessibility across the genome. Experiments were performed in two biological replicates (from two independent differentiation experiments), and we subsequently considered consensus peaks only, i.e., peaks that were called in both replicates. Thus, we identified a total of 20,268 consensus peaks at d2 and 17,110 at d4 (Supplementary Data 2), of which 8773 were differentially accessible regions (DARs) as determined by the DiffBind and Descan2 software tools20,21; again, we only considered DARs that were identified by both tools (Fig. 3a, b). We then derived a list of marker genes of an EC cluster obtained from single-cell RNA-seq (scRNA-seq) experiments performed on cells selected using a Tbx1Cre driver combined with a GFP reporter. Tbx1 is a CPM marker, and cells were FACS-purified from E8.5 and E9.5 mouse embryos22. The EC cluster was characterized by 252 marker genes (listed in Supplementary Data 3) that we intersected with our dataset of d2-d4 DARs opened at d4 (n = 4408) (Fig. 3a). This resulted in 101 regions that were significantly more accessible at d4, compared to d2, and were associated with EC expressed genes (Table 2). We then performed a computational prediction of the probability that these regions are enhancers. To this end, we used a machine-learning procedure based on a logistic regression test trained on validated enhancer sequences (see methods for details). The average scores obtained with this procedure are reported in Table 2. Of the 101 regions identified, 57 scored more than 0.5, indicating a significant likelihood of being enhancers; of these, 15 (26%) have been reported in the literature (references indicated in Table 2). 
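The filtering logic described above (consensus peaks, mapping of d4-open DARs to EC marker genes, and a score cutoff) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: DiffBind and DEScan2 perform statistical testing that is not reproduced here, and the interval representation, the function names, and the toy coordinates are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# (chromosome, start, end) with half-open coordinates; toy values only.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals lie on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def consensus_peaks(rep1: List[Interval], rep2: List[Interval]) -> List[Interval]:
    """Keep replicate-1 peaks that overlap at least one replicate-2 peak."""
    return [p for p in rep1 if any(overlaps(p, q) for q in rep2)]


def regions_for_markers(dar_to_gene: Dict[Interval, str],
                        marker_genes: Set[str]) -> Dict[Interval, str]:
    """Keep d4-open DARs whose associated gene is an EC-cluster marker."""
    return {r: g for r, g in dar_to_gene.items() if g in marker_genes}


def likely_enhancers(scores: Dict[Interval, float],
                     cutoff: float = 0.5) -> List[Interval]:
    """Regions whose prediction score exceeds the cutoff."""
    return [r for r, s in scores.items() if s > cutoff]


# Hypothetical toy data, not taken from the Supplementary Data.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]
dars = {("chr1", 100, 200): "Kdr", ("chr3", 10, 50): "Actb"}
markers = {"Kdr", "Notch1"}
scores = {("chr1", 100, 200): 0.8, ("chr3", 10, 50): 0.2}

print(consensus_peaks(rep1, rep2))
print(regions_for_markers(dars, markers))
print(likely_enhancers(scores))
```

In this sketch a peak counts as consensus if it overlaps a peak in the other replicate by at least one base; real peak-merging tools typically apply stricter or configurable overlap rules.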
A search for transcription factor binding motifs within the 101 regions identified significantly enriched motifs (as the background, we used the peakome associated with expressed genes). Specifically, we found motifs of the GATA and ETS transcription factor families, and of FOS, a subunit of the AP-1 transcription complex (Fig. 3c and Supplementary Data 4). GATA and ETS motifs are co-present in 55% (56 out of 101) of the regions tested. The expression of Gata1, Gata2, Gata4, Gata6, Fos, and Erg, as well as of other ETS family members (e.g., Ets1, Ets2, Etv2, Fli1, Elk1, Elf1), was strongly upregulated at d4 relative to d2 (Supplementary Data 1).
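The co-occurrence figure quoted above is a simple count over regions. A minimal Python sketch of that tally follows, assuming each region has already been annotated with the set of motif families found in it; the region names and annotations are hypothetical, and the function name is an illustrative choice.

```python
from typing import Dict, Set, Tuple


def co_occurrence(region_motifs: Dict[str, Set[str]],
                  family_a: str, family_b: str) -> Tuple[int, float]:
    """Count regions carrying both motif families and return (count, fraction)."""
    both = sum(1 for fams in region_motifs.values()
               if family_a in fams and family_b in fams)
    return both, both / len(region_motifs)


# Hypothetical annotations for four toy regions.
toy = {
    "region_1": {"GATA", "ETS"},
    "region_2": {"GATA"},
    "region_3": {"ETS", "FOS"},
    "region_4": {"GATA", "ETS", "FOS"},
}
count, fraction = co_occurrence(toy, "GATA", "ETS")
print(count, fraction)

# The percentage reported in the text: 56 of 101 regions.
print(round(100 * 56 / 101))  # 55
```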
Next, we repeated the procedure using marker genes expressed in endothelial clusters of a scRNA-seq dataset from Mesp1Cre-sorted cells at the same developmental stage (E9.5)22. Mesp1Cre-sorted cells include ECs derived from the entire anterior mesoderm and not just the cardiopharyngeal mesoderm, practically the entire vascular bed of the trunk, as the head and most posterior regions of the embryos were removed before sorting22. In this study, two endothelial clusters were identified, named c2 and c16, that shared 801 marker genes (Supplementary Data 3). We mapped DARs upregulated at d4 to these sets of genes and identified 536 unique putative regulatory elements, of which, 283 (52.8%) had a score above 0.5 (Supplementary Data 3). We conducted motif searches using DARs upregulated at d4 mapped to marker genes of the two endothelial clusters separately, 434 regions for Mesp1Cre c16 and 367 regions for Mesp1Cre c2. Results identified a more extensive set of motifs than the one identified using the Tbx1Cre dataset, but the most enriched ones were again GATA and ETS factors (Supplementary Data 4).
Identification and validation of EC regulatory elements (RE) associated with major EC differentiation genes
Next, we applied a different approach to the identification of EC-REs: we selected a group of well-known endothelial genes that were expressed at d4, on the basis of RNA-seq data, and that exhibited regions of increased chromatin accessibility at d4. We focused on six putative, non-promoter REs associated with six genes: Kdr (encoding VEGFR2), Cdh5 (encoding VE-CADHERIN), Eng (encoding ENDOGLIN), Flt1 (encoding VEGFR1), Pecam1, and Notch1. Computational prediction indicated that four of the six putative EC-REs identified had a score above 0.5, indicating a high probability of being enhancers (Table 3 and Fig. 4). Furthermore, the two putative EC-REs associated with Kdr and Notch1 are amongst the 101 regions open at d4 and mapped to EC marker genes (asterisks in Table 2).
To test the importance of the putative EC-REs, we used an epigenetic reprogramming strategy based on CRISPR-dCAS9:LSD1 (Fig. 5a). We first generated an ES cell line that stably expressed the dCAS9:LSD1 construct (named #B1 dCas9-LSD1). We then designed crRNAs targeting the six EC-REs and a control crRNA targeting a gene desert sequence (Supplementary Data 5). We transfected #B1 dCas9-LSD1 cells with targeting and control gRNAs complex (crRNAs:ATTO-tagged tracrRNA), FACS-purified the ATTO+ cells, and subjected them to the EC differentiation protocol. Cells were harvested at d4, d6, and d8 and the expression of the targeted genes was measured by quantitative real-time PCR (qPCR). Experiments were repeated at least four times. Results showed that LSD1 targeting of the six EC-REs resulted in reduced expression of the associated genes (Fig. 5b), with the exception of the Pecam1-associated RE, which also had a low probability score (Table 3). In most cases, the reduction in gene expression was more evident at the later stages of differentiation tested, namely d6 and, even more so at d8.
The EC-REs for Notch1 and Pecam1 are required for gene expression
Next, we selected the Notch1 and Pecam1 EC-REs for further testing based on the importance of the associated genes for EC differentiation, and because of the negative results obtained with epigenetic reprogramming for Pecam1. We deleted the putative EC-REs using CRISPR-Cas9; for each RE, we selected two gRNAs flanking the segment (Fig. 6a, gRNA sequences are shown in Supplementary Data 5), and these were transfected into mESCs along with the Cas9 protein and ATTO-labeled tracrRNA. We then plated FACS-purified ATTO+ cells to a clonal density. Clones were later picked and expanded into 96-well plates. DNA extracted from clones was screened by PCR to identify clones carrying homozygous deletion of the putative RE. We expanded two homozygously deleted clones for each deleted RE. All four clones were used in multiple differentiation experiments (n = 5). Results showed that both Pecam1 and Notch1 expression were significantly affected by the deletion of their respective REs at d6 and d8 (Fig. 6b).
We next tested whether the deletion of the Notch1 EC-RE affected the expression of a subset of NOTCH1 target genes, namely Hes1, Nrarp, and Dll4. Results showed that all of these genes were affected by the deletion of the enhancer, but with some differences. Specifically, Hes1 and Nrarp were significantly and consistently downregulated at d6 but not at d8. Conversely, Dll4 was downregulated only at d8 (Fig. 6c). Thus, deletion of the EC-RE identified here was sufficient to cause dysregulation of at least part of the NOTCH1 signaling pathway.
Next, we used the Notch1 enhancer deletion lines (clones #7 G, #11B Notch1-∆enh.in15), along with the parental WT line, to generate gastruloids as described in ref. 23. Gastruloids developed a primitive EC network (PECAM1-positive) in both the WT and mutant lines (examples in Fig. 7a, n = 10 immunostained for each clone). However, mutant gastruloids appeared more densely stained, although we could not quantify the staining due to the complexity of the patterns. Therefore, we subjected the lines to a standard Matrigel test and evaluated the branching points of the EC network generated (see Methods). Results of five independent experiments showed that Notch1 mutant clones developed a more intricate EC network with a significantly higher number of branch points (Fig. 7b). These results are consistent with the role of NOTCH1 signaling in limiting vessel branching24.
Discussion
The cardiopharyngeal mesoderm6 is a lineage that provides progenitors to various structures including those of the heart, pharyngeal apparatus, and vessels9,10. Endothelial cells are heterogeneous in origin and single-cell sequencing assays are starting to define specific transcription and chromatin profiles depending on the tissue of origin4. However, whether cells destined to differentiate into ECs are primed by distinct mechanisms according to their origin is still unclear. One possible avenue to address this question is to identify tissue-specific enhancers for each lineage. In this study, we propose an approach that leverages novel and published data, integrated with software tools, and genetic/epigenetic editing in a cell differentiation model. This integrated approach identified a group of putative EC enhancers, some of which had already been reported in the literature and were validated in our model. For this study, we used a mesoderm differentiation protocol originally proposed for cardiogenic mesoderm induction17 and observed the activation of EC-specific gene expression and chromatin remodeling (as assayed by ATAC-seq) 48 h after induction. However, at this stage (d4), only a small percentage of cells exhibit an EC phenotype as defined by the expression of VE-cadherin on the cell surface, suggesting that activation of the EC program is at an early stage. Boosting VEGF signaling after mesoderm induction promoted EC differentiation such that a near-homogeneous EC-like population was obtained at d8, as measured by VE-cadherin expression and Matrigel assays. We chose to leverage chromatin dynamics of EC-specific gene activation at an early developmental time window (d2-d4) in order to capture the regulatory sequences associated with the activation of the EC program in cardiogenic mesoderm.
To this end, we used data-rich bulk ATAC-seq and RNA-seq information, combined with published high-resolution tissue- and time-specific scRNA-seq of cells that were selected using the Tbx1Cre driver, Tbx1 being a marker of the cardiopharyngeal mesoderm. This enabled us to identify 101 regions that became more accessible in the selected time window and mapped to EC genes defined by scRNA-seq data (Table 2). Of the 57 putative enhancers scoring >0.5 identified through our unbiased approach, 15 (26%) were already reported in the literature, suggesting that our approach was efficient in detecting likely regulatory elements in our model. In addition, motif analysis showed enrichment of binding motifs for transcription factors known to be involved in EC development, further supporting the suitability of the approach. However, we did not validate these putative enhancers directly in our model; thus, further work will be necessary to establish the reliability of our approach for systematic identification of cell type-specific enhancer sequences.
The candidate gene approach was designed to identify regulatory sequences “activated” in our model and associated with genes known to be involved in EC development. We validated them regardless of the prediction score and found that five of the six putative REs tested regulate their respective genes. Overall, we identified regulatory elements for many of the known genes involved in EC development, including a subset of genes expressed in CPM-derived cells in vivo, such as Notch1 (ref. 8).
Epigenetic reprogramming, while providing consistent results, proved to be variable in our hands. Sources of variability may be the efficiency of transfection, the gRNAs, or perhaps the variable extent of chromatin modification induced by the dCAS9:LSD1 complex. Furthermore, the inconsistent results obtained with the Pecam1 putative enhancer using epigenetic reprogramming and gene editing may be due to different reasons. We speculate that perhaps the sequence is not a regulatory element (as suggested by the low prediction score), but the genetic deletion may have altered the expression of the gene by interfering with processes like RNA maturation/splicing or causing other structural perturbation of the gene.
The dCas9-recruited repressor could potentially cause chromatin modifications beyond the intended target sequence, particularly if the promoter is nearby. The six enhancers tested with this method are all fairly distant from the transcription start site (TSS, Table 3). The closest is 8.7 kb from the TSS; the others are between 15 and 67 kb away.
Searches for consensus sequences in the putative enhancer regions identified GATA motifs as the most enriched, together with motifs of ETS factors (ERG, FLI) and the AP-1 subunit FOS (Fig. 3c). EC genes from cell clusters derived using the Mesp1Cre driver22, which captures a larger and more diverse EC population than Tbx1Cre, exhibited DARs enriched for a more extensive set of motifs, but they also included GATA and ETS motifs. Interestingly, the Mesp1Cre dataset motifs also included transcription factor families that play a role in CPM development, such as T-BOX, FOX, and MEIS factors (Supplementary Data 4), raising the question of whether they may be involved in enabling the EC transcription program in the CPM.
Overall, the results of consensus sequence searches suggest that there is a core of transcription factors, GATA-ETS-FOS, that is central to the activation of the EC program in our model. There is ample literature indicating GATA and ETS transcription factors as general players in endothelial differentiation (reviewed in De Val and Black, 2009)2. It is therefore possible that during mesoderm induction in our system, GATA factors act as pioneers to establish the conditions necessary for the binding of other, lineage-determining factors, such as ETS/ERG. This is consistent with the established role of GATA factors as pioneer transcription factors (reviewed in refs. 25,26) and the role of ETS factors as core transcription factors in endothelial differentiation3,27,28,29,30,31.
Overall, our strategy efficiently identified putative enhancers of cell type-specific genes during differentiation. It provided us with an extensive list of regulatory sequences with probability scores calculated using a machine-learning approach. Furthermore, it allowed us to identify and validate a smaller set of regulatory sequences of well-known genes involved in EC differentiation. The identification and validation strategies applied here are applicable to other cell types, whenever a suitable differentiation model is available, although more extensive bench validation experiments will be required before the proposed approach may be considered an established pipeline for enhancer identification.
Methods
Mouse embryonic stem cells (mESC) culture and manipulation
ES-E14TG2a mESCs (ATCC CRL-1821) were cultured without feeders and maintained undifferentiated on gelatin-coated dishes in GMEM (Sigma Cat# G5154) supplemented with 10³ U/ml ESGRO LIF (Millipore, Cat# ESG1107), 15% fetal bovine serum (ES Screened Fetal Bovine Serum, US Euroclone Cat# CHA30070L), 0.1 mM non-essential amino acids (Gibco, Cat# 11140-035), 0.1 mM 2-mercaptoethanol (Gibco, Cat# 31350-010), 0.1 mM l-glutamine (Gibco, Cat# 25030081), 0.1 mM Penicillin/Streptomycin (Gibco, Cat# 10378016), and 0.1 mM sodium pyruvate (Gibco, Cat# 11360-070). Cells were passaged every 2–3 days using 0.25% Trypsin-EDTA (1X) (Gibco, Cat# 25200056) as the dissociation buffer.
For differentiation, E14-Tg2a mESCs were dissociated with Trypsin-EDTA and cultured at 75,000 cells/ml in serum-free media: 75% Iscove’s modified Dulbecco’s media (Cellgro Cat# 15-016-CV) and 25% HAM F12 media (Cellgro #10-080-CV), supplemented with N2 (GIBCO #17502048) and B27 (GIBCO #12587010) supplements, penicillin/streptomycin (GIBCO #10378016), 0.05% BSA (Invitrogen Cat#. P2489), l-glutamine (GIBCO #25030081), 5 mg/ml ascorbic acid (Sigma A4544) and 4.5 × 10⁻⁴ M monothioglycerol (Sigma M-6145). After 48 h in culture, the EBs were dissociated using the Embryoid Body dissociation kit (cod. 130-096-348 Miltenyi Biotec) according to the manufacturer’s protocol and reaggregated for 40 h in serum-free differentiation media with the addition of 8 ng/ml human Activin A (R&D Systems Cat#. 338-AC), 0.5 ng/ml human BMP4 (R&D Systems Cat# 314-BP), and 5 ng/ml human-VEGF (R&D Systems Cat#. 293-VE). The 2-day-old EBs were dissociated and 6 × 10⁴ cells were seeded onto individual wells of a 24-well plate coated with 0.1% gelatine in EC Induction Medium consisting of StemPro-34 medium (Gibco #10639011), supplemented with SP34 supplement, l-glutamine, penicillin/streptomycin, 200 ng/ml human-VEGF, and 2 μM Forskolin (Abcam, ab120058). The Induction Medium was changed after 1 day. On day 6 of differentiation, the cells were dissociated and replated on 0.1% gelatine-coated dishes at a density of 25,000 cells/cm² in EC Expansion Medium, consisting of StemPro-34 supplemented with 50 ng/ml human-VEGF. Stem cell-derived endothelial cells were maintained until they reached confluency (about 2–3 days). EC Expansion Medium was replaced every other day.
For the Matrigel assay, 300 µL of Matrigel (BD Matrigel Basement Membrane Matrix Growth Factor Reduced, Phenol Red Free cat. 356231) was aliquoted into each well of a 24-well plate and incubated for 30–60 min at 37 °C to allow the gel to solidify. About 200,000 ECs were then added to the Matrigel-coated well and cultured for 24 h at 37 °C. Formation of tubular structures on a two-dimensional Matrigel surface was observed after 16 to 24 h under an optical microscope. The quantification of branch points was performed using the Angiogenesis Analyzer module32 of Image J. The total number of branch points per image was quantified. We performed statistical analyses of branch counts using the parametric paired t-test, one-tailed.
Gastruloid formation assay
Gastruloids were generated as described in ref. 23. In brief, 300 mESCs were plated in 40 μL N2B27 medium in 96-well Ultra-Low Cluster Round Bottom Ultra-Low Attachment plates (7007, Corning). After 48 h, 150 μL of N2B27 supplemented with 3 μM CHIR-99021 (S1263, Selleckchem) were added to each well. Then after 72 h, the medium was changed with 150 μL of fresh N2B27. At 96 h, gastruloids were transferred 1:1 in 100 μL of medium in 24-well Flat Bottom Ultra-Low Attachment plates (3473, Corning), containing 700 μL of fresh N2B27 supplemented with 30 ng/mL bFGF (Recombinant Human FGF-basic, 100-18 C, Peprotech), 5 ng/ml VEGF (Recombinant Human VEGF165, 100-20, Peprotech), and 0.5 mM ascorbic acid (Sigma A4544) (N2B27 +++). Then, 50% of the medium was changed daily, at 120 h with 400 μL of fresh N2B27 +++ and at 144 h with N2B27, until 168 h.
Immunofluorescence and confocal imaging on fixed gastruloids
For whole-mount immunofluorescence, gastruloids at 168 h were washed in 1x PBS and fixed in 4% PFA overnight at 4 °C while shaking. Then, fixed samples were washed three times in 1x PBS (5 min each) and three times (5 min each) in blocking solution 1 (1x PBS, 10% Goat Serum, 0.1% Triton X-100) at 4 °C with agitation. Gastruloids were blocked for 1–2 h at 4 °C in blocking solution 1 and then incubated overnight with primary antibody anti PECAM1 (mouse MA3105, Thermo Fisher Scientific) 1:200 in blocking solution 1 at 4 °C with agitation. The day after, samples were washed two times (5 min each), then three times (15 min each), and finally four to six times (for 1 h total) in blocking solution 1 at 4 °C while shaking. Gastruloids were incubated overnight with secondary antibody Goat Anti-Armenian hamster IgG H&L (Alexa Fluor® 488, Abcam ab173003) and DAPI in blocking solution 1 at 4 °C while shaking. The day after, samples were washed two times (5 min each) in blocking solution 1 at 4 °C, then two times (5 min each) at room temperature (RT) with blocking solution 2 (1x PBS, 0.2% Goat Serum, 0.2% Triton X-100), and finally three times (15 min each) in blocking solution 2 at RT while shaking. Subsequently, gastruloids were incubated in blocking solution 2/100% glycerol (Sigma) 1:1 for 30 min at RT with agitation. Then, they were maintained in plates with blocking solution 2/70% glycerol 7:3 at 4 °C. Images were acquired with a Nikon A1 confocal microscope (equipped with Nikon Resonant Scanner and NIS-A/NIS-Elements software).
CRISPR-Cas9-mediated targeting
(A) Pecam1 intron 2-enhancer deletion was induced in E14-Tg2a using Alt-R™ CRISPR-Cas9 System (IDT) following the manufacturer’s specifications. This genome editing system is based on the use of a ribonucleoprotein (RNP) consisting of S. pyogenes Cas9 nuclease complexed with guide RNA (crRNA:tracrRNA duplex). The crRNA comprises a 20 nt custom-synthesized sequence that is specific for the target and a 16 nt sequence that is complementary to the tracrRNA. The specific crRNA sequences were: Pecam1_int2-crRNA1 and Pecam_int2-crRNA3 (sequences shown in Supplementary Data 5). CRISPR-Cas9 tracrRNA-ATTO 550 (5 nmol catalog no. 1075927) is a conserved 67 nt RNA sequence that is required for complexing to the crRNA so as to form the guide RNA that is recognized by S.p. Cas9 (Alt-R S.p. Cas9 Nuclease 3NLS, 100 μg catalog no. 1081058). The tracrRNA labeled with the ATTO™ 550 fluorescent dye is used to FACS-purify transfected cells. The protocol involves three steps: (1) annealing of the crRNA and tracrRNA, (2) assembly of the Cas9 protein with the annealed crRNA and tracrRNAs, and (3) delivery of the ribonucleoprotein (RNP) complex into mESC by reverse transfection. Briefly, we annealed equimolar amounts of resuspended crRNA and tracrRNA to a final concentration (duplex) of 1 μM by heating at 95 °C for 5 min and then cooling to room temperature. The RNA duplexes were then complexed with Alt-R S.p. Cas9 enzyme in OptiMEM media to form the RNP complex, which was then transfected into mESCs using the RNAiMAX transfection reagent (Invitrogen #13778-150). After 48 h incubation, cells were trypsinized and ATTO 550+ transfected cells were purified by FACS. Fluorescent cells (~65% of the total cell population) were plated at very low density to facilitate colony picking. We picked 96 clones and screened them by PCR. Primer sequences are indicated in Supplementary Data 5. Positive clones were confirmed by DNA sequencing.
(B) For the Notch1 intron 15-enhancer deletion, we followed the same procedure but using different target sequences: Notch1_int15-crRNA1 and Notch1_int15-crRNA3 (sequences shown in Supplementary Data 5).
Generation of dCas9-LSD1 expressing mESC line
About 20 µg of plasmid p-dCas9-LSD1-Hygro (a gift from Stephan Beck and Anna Koeferle, available through Addgene plasmid #104406; http://n2t.net/addgene:104406; RRID:Addgene_104406) was linearized with AhdI enzyme and electroporated in mESC (1 × 10⁷ cells/10 cm plate). The electroporation parameters used were 0.24 kV and 500 μF. The cells were maintained in Hygromycin B selection (500 µg/ml) for 10 days. Individual colonies were isolated, expanded, and screened by PCR for inserted sequences for both DNA and RNA. Primer sequences are in Supplementary Data 5.
CRISPR-dCas9:LSD1-mediated epigenetic reprogramming strategy
Epigenetic targeting of putative enhancer elements was induced by transfection of dCas9-LSD1-expressing mESC line with specific gRNA complex (crRNA:tracrRNA duplex). For each enhancer element, we designed three crRNA sequences (shown in Supplementary Data 5).
Then, we annealed equimolar amounts of resuspended crRNA and tracrRNA labeled with ATTO™ 550 fluorescent dye to a final concentration (duplex) of 1 μM by heating at 95 °C for 5 min and then cooling to room temperature. For gRNA transfection, cells were plated at 8 × 10⁵ per well in six-well plates and transfected with gRNA complex (crRNA:tracrRNA 10 nM) in an antibiotic-free medium using Lipofectamine RNAiMAX Reagent (Invitrogen #13778-150), according to the instructions of the manufacturer. Twenty-four hours after transfection, fluorescent ATTO 550+ cells (~90–95% of the total cell population) were harvested and subjected to the differentiation protocol. crRNA sequences are listed in Supplementary Data 5.
Flow cytometry
We dissociated cells with Trypsin-EDTA or with the Embryoid Body dissociation kit (cod. 130-096-348 Miltenyi Biotec). Dissociated cells (1 × 10⁶ cells/100 μl) were incubated with directly conjugated primary antibodies (VE-Cadherin-APC, mouse cod. 130-102-738) at 1:10 in PBS-BE solution (PBS, 0.5% BSA, 5 mM EDTA) for 20 min on ice. Subsequently, cells were washed twice with 2 ml of PBS-BE. Cells were analyzed using the BD FACS ARIAIII™ cell sorter. Negative controls were incubated with fluorochrome-labeled irrelevant isotype control antibody (REA Control APC, mouse cod. 130-113-446 Miltenyi Biotec).
Quantitative RT-PCR
Total RNA was isolated from mouse ESCs with QIAzol lysis reagent (Qiagen #79306), according to the manufacturer’s protocol. The isolated RNAs were quantified using a NanoDrop spectrophotometer 1000. Before reverse transcription, RNA samples were treated with DNAse I to eliminate any contamination with genomic DNA.
cDNA was transcribed using 1 or 2 μg total RNA with the High-Capacity cDNA reverse transcription kit (Applied Biosystem catalog. n. 4368814). cDNAs were amplified using myTaq™ DNA polymerase (Meridian Bioscience) and a standard three-step cycling PCR profile: 10 min at 94 °C, 30 amplification cycles (denaturation at 94 °C for 30 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s), followed by a final extension at 72 °C for 10 min. Quantitative gene expression analyses (qRT-PCR) were performed using PowerUp™ SYBR Green Master Mix (Applied Biosystem #A25742) on a StepOnePlus™ Real-Time PCR System. Relative gene expression was evaluated using the 2^−ΔCt method, with Gapdh expression as the normalizer. The run consisted of a holding stage (95 °C for 10 min), a cycling stage (95 °C for 15 s, 60 °C for 1 min, 40 cycles), and a melt curve stage (95 °C for 15 s, 60 °C for 1 min, 95 °C for 15 s). The cycle threshold (Ct) was determined during the geometric phase of the PCR amplification plots, as illustrated in the manufacturer’s protocol. Primer sequences are listed in Supplementary Data 5. GraphPad Prism software v8.00 (GraphPad) was used to analyze qRT-PCR data. Relative mRNA levels were analysed in triplicate and data are presented as means ± SD. A two-way repeated measures ANOVA (ANOVA two-way-RM) was used to assess the statistical significance of the interaction between “time” and “genotype” on gene expression. Two further tests were used to compare groups of data: the nonparametric Wilcoxon matched-pairs signed-rank test (one-tailed) and the parametric paired Student’s t-test (one-tailed). The Shapiro–Wilk test was used to assess the normality of the data distribution.
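As a worked illustration of the 2^−ΔCt calculation with Gapdh as the normalizer, the following Python sketch computes relative expression per replicate and summarizes it as mean ± SD. The Ct values are invented for the example and the function names are illustrative; this is not the analysis script used in the study, which was run in GraphPad Prism.

```python
from statistics import mean, stdev
from typing import List, Tuple


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """2^-dCt, where dCt = Ct(target gene) - Ct(reference gene, e.g. Gapdh)."""
    delta_ct = ct_target - ct_reference
    return 2 ** -delta_ct


def summarize(ct_targets: List[float], ct_references: List[float]) -> Tuple[float, float]:
    """Mean and sample SD of 2^-dCt across paired replicate measurements."""
    values = [relative_expression(t, r) for t, r in zip(ct_targets, ct_references)]
    return mean(values), stdev(values)


# Hypothetical triplicate Ct values (not measured data).
target_ct = [25.0, 25.2, 24.9]
gapdh_ct = [20.0, 20.1, 20.0]
avg, sd = summarize(target_ct, gapdh_ct)
print(f"relative expression = {avg:.4f} +/- {sd:.4f}")

# A target amplifying 5 cycles later than the reference gives 2^-5.
print(relative_expression(25.0, 20.0))  # 0.03125
```

Because the value is an exponential of a Ct difference, a one-cycle shift corresponds to a two-fold change in relative expression, which assumes roughly 100% amplification efficiency for both primer pairs.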
RNA-seq
Total RNA was isolated from d2 (n = 2 biological replicates) and d4 (n = 2 biological replicates) cells with QIAzol lysis reagent (Qiagen #79306), according to the manufacturer’s protocol. RNA concentration was estimated using a NanoDrop 1000 spectrophotometer. Libraries were prepared according to the Illumina strand-specific RNA-seq protocol and sequenced on the Illumina NextSeq500 platform in paired-end mode with 75 bp reads.
ATAC-seq
mESCs were collected on day 2 and day 4 and then washed two times in PBS, harvested, counted using a hemacytometer chamber, and pelleted. About 15,000 cells per sample were treated with Tagment DNA Buffer 2x reaction buffer with Tagment DNA Enzyme (Illumina) according to the manufacturer’s protocol. After washes in PBS, cells were suspended in 50 mL of cold lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630) and immediately spun down at 500 × g for 10 min at 4 °C. Fresh nuclei were processed with the transposition and purification reagents (Illumina #FC121-130): the nuclei were incubated at 37 °C in Transposition Reaction Mix (25 µL reaction buffer, 2.5 µL Transposase, 22.5 µL nuclease-free water), purified using the Qiagen MinElute PCR Purification Kit (catalog no./ID: 28006), and eluted in 10 µL of nuclease-free water. The sequencing library was prepared from the fragmented amplified tagmented DNA. Fragment size was evaluated using the Agilent 4200 TapeStation. Two biological replicates for each condition were sequenced using the Illumina NextSeq500 system to obtain paired-end (PE) reads of 60 bp.
Sequence data analysis
For RNA-seq data, we assessed the quality of the paired-end (PE) reads of length 75 bp using FastQC. We filtered out low-quality and short PE reads and trimmed the universal Illumina adapters using cutadapt (v2.9)33 with the parameters -q 30 -m 30. After trimming, we re-assessed the quality and compiled the report using multiQC. We aligned the PE reads to the mm10/GRCm38 reference genome (primary assembly) using the STAR aligner (v2.6.0a)34 in two steps: (i) generation of the Ensembl mm10 reference genome index (release 102), setting the parameters --sjdbGTFfile (annotation GTF file) and --sjdbOverhang 100; (ii) alignment of PE reads to the reference genome (--sjdbOverhang 100 --quantMode GeneCounts --outSAMtype BAM SortedByCoordinate). We provided the sorted BAM files as input and the mm10 (GRCm38.p4) primary assembly annotation file in GTF format (https://www.gencodegenes.org/mouse/release_M10.html) as reference annotation to quantify gene expression levels with the featureCounts function from the Rsubread package (v2.0.1)35 (annot.ext = ”mm10.v102” gtf file, useMetaFeatures=TRUE, allowMultiOverlap=FALSE, strandSpecific=2, countMultiMappingReads=FALSE). We then obtained the expressed gene matrix by removing zero-count and low-count genes (CPM <0.5) from the read count matrix using the proportion test method from the NOISeq package (v2.34.0)36. Next, we processed the expressed gene matrix using NOISeq (v2.34.0), obtaining a set of normalized read counts (Upper Quartile, UQUA), and identified differentially expressed (DE) genes using the noiseq function with a posterior probability cutoff >0.8. In parallel, we normalized the expressed count matrix with the estimateSizeFactors function in DESeq237, ran the DESeq function with default options, and selected the DE genes with an adjusted p value cutoff <0.01. Finally, we considered as DE only those genes that were called DE by both methods.
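The final selection step, keeping only genes called DE by both methods, can be sketched as a simple set intersection. The Python snippet below is a minimal illustration under stated assumptions: the gene names, probabilities, and adjusted p values are hypothetical, the function name `consensus_de` is ours, and a missing adjusted p value (as DESeq2 may report for filtered genes) is treated as not significant.

```python
def consensus_de(noiseq_prob, deseq2_padj, prob_cutoff=0.8, padj_cutoff=0.01):
    """Return genes passing both the NOISeq and the DESeq2 cutoffs."""
    noiseq_hits = {g for g, p in noiseq_prob.items() if p > prob_cutoff}
    deseq2_hits = {
        g for g, q in deseq2_padj.items()
        if q is not None and q < padj_cutoff  # None stands for a missing padj
    }
    return noiseq_hits & deseq2_hits


# Hypothetical per-gene results (illustrative only, not data from this study).
noiseq_prob = {"GeneA": 0.95, "GeneB": 0.85, "GeneC": 0.40}
deseq2_padj = {"GeneA": 0.001, "GeneB": 0.2, "GeneC": 0.0001, "GeneD": None}

common_de = sorted(consensus_de(noiseq_prob, deseq2_padj))
print(common_de)
```

Requiring agreement between a nonparametric and a parametric method is conservative: a gene supported by only one of the two tests is excluded from the common list.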
We performed gene ontology enrichment analysis using the gprofiler2 R package (v0.2.2)38, providing the common DE gene list as input and the expressed gene list as background, and setting the Benjamini–Hochberg FDR (BH-FDR) cutoff to <0.01.
ATAC-seq reads underwent quality control (using FastQC and multiQC), adapter trimming, and filtering using cutadapt (v2.9) with the parameters -q 30 -m 30 and the universal Illumina adapters. We then aligned the PE reads to the mouse genome (mm10), restricted to canonical chromosomes, with Bowtie2 (v2.3.4.3)39, setting the options -q -t --end-to-end --very-sensitive -X 1000. After removing reads mapping to the mitochondrial chromosome, we removed duplicates and reads mapping to multiple positions using sambamba (v0.6.8) with -F “[XS] == null and not unmapped and not duplicate”40. We called the ATAC peaks for each sample using MACS2 (v2.1)41 with the options -f BAMPE --nomodel --shift 100 --extsize 200, which are the suggested parameters to handle the Tn5 transposase cut site. We then removed the peaks overlapping the mm10-blacklist regions (downloaded from https://github.com/Boyle-Lab/Blacklist/blob/master/lists/mm10-blacklist.v2.bed.gz) using samtools42. For each condition, we defined the consensus list of enriched regions as the peak regions common to both replicates using the intersectBed function from BedTools v2.26.043. To identify differentially accessible regions (DARs) between d4 and d2, we selected the regions consistently enriched or decreased in DEScan2 (v1.18.2)20 and confirmed with the same sign in DiffBind (v3.8.4)21. For DEScan2, we first created a peak set using the finalRegions function (zThreshold = 1, minCarriers = 2) after loading all of the MACS2 peaks not overlapping blacklist regions. Then, we called the DARs with the edgeR44,45 method as follows: we estimated the count dispersion using the estimateDisp function, fitted the robust glmQLFit model, and applied glmQLFTest with default parameters, setting the adjusted p value cutoff to 0.01. For DiffBind (v3.8.4), we set dba.count(minOverlap = 2), dba.contrast(minMembers = 2), dba.analyze(method = DBA_EDGER), and an adjusted p value cutoff of 0.01.
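The consensus step described above (regions common to both replicates) amounts to an interval intersection. The following Python sketch reproduces that idea on toy data; it is not the BedTools implementation, it assumes half-open BED-style coordinates, and the peak coordinates and the function name `intersect_intervals` are hypothetical.

```python
def intersect_intervals(peaks_a, peaks_b):
    """Return the overlapping portions of intervals from two peak lists.

    Each peak is (chrom, start, end) with half-open, BED-style coordinates.
    A simple all-against-all comparison is used for clarity, not speed.
    """
    shared = []
    for chrom_a, start_a, end_a in peaks_a:
        for chrom_b, start_b, end_b in peaks_b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                shared.append((chrom_a, start, end))
    return sorted(shared)


# Hypothetical peaks from two replicates (illustrative only).
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
rep2 = [("chr1", 150, 250), ("chr2", 10, 40), ("chr1", 590, 700)]

consensus = intersect_intervals(rep1, rep2)
print(consensus)
```

Peaks present in only one replicate, such as the chr2 peaks in this toy example, do not contribute to the consensus list.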
Finally, we selected the DEScan2 DARs confirmed by DiffBind using subsetByOverlaps in GenomicRanges45. We annotated the consensus peak lists, the common peak set, and the DARs to genes using ChIPseeker (v1.29.1)46 with a transcript database built with the makeTxDbFromEnsembl function, associating each peak/region with the nearest gene, setting the TSS region to [−1000, 1000], and using release 102 of the Mus musculus Ensembl database. We then selected the chromatin regions enriched at d4 whose annotated nearest genes intersected lists of marker genes from the single-cell experiment described previously22: (i) the marker genes of endothelial cell clusters in Tbx1Cre Ctrl and cKO embryos at E9.5 (Supplementary Data 5, cluster 6 of the cited publication); (ii) the marker genes of endothelial cell clusters in Mesp1Cre Ctrl and cKO at E9.5 (Supplementary Data 3, cluster 2); (iii) the marker genes of endothelial cell clusters in Mesp1Cre Ctrl and cKO at E9.5 (Supplementary Data 3, cluster 16). We used these regions to identify enriched motifs using HOMER (Hypergeometric Optimization of Motif EnRichment)47. From the motif list of regions associated with genes intersecting marker genes of endothelial cell clusters in Tbx1Cre, we selected the GATA3 and ERG motifs, identified the regions containing both motifs, and calculated their percentage. We performed enhancer prediction on the regions associated with genes intersecting the list of marker genes of endothelial cell cluster 6 in Tbx1Cre Ctrl embryos at E9.5, as described in the next section. In addition, we merged the lists of regions derived from the intersections of d4 DARs with annotated nearest genes intersecting the marker gene lists of the Mesp1Cre c2 and c16 clusters, selected the unique regions, and performed enhancer prediction on them.
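The selection of regions by marker genes and the calculation of the percentage of regions carrying both motifs can be illustrated with a short Python sketch. Region identifiers, gene names, and motif assignments below are hypothetical placeholders, and the function name `percent_with_both_motifs` is ours; in the actual analysis, the motif hits come from HOMER and the gene annotation from ChIPseeker.

```python
def percent_with_both_motifs(regions, marker_genes, motif_a="GATA3", motif_b="ERG"):
    """Percentage of marker-gene-associated regions containing both motifs."""
    selected = [r for r in regions if r["nearest_gene"] in marker_genes]
    if not selected:
        return 0.0
    both = [r for r in selected if motif_a in r["motifs"] and motif_b in r["motifs"]]
    return 100.0 * len(both) / len(selected)


# Hypothetical annotated regions (illustrative only).
regions = [
    {"id": "region1", "nearest_gene": "GeneA", "motifs": {"GATA3", "ERG"}},
    {"id": "region2", "nearest_gene": "GeneB", "motifs": {"GATA3", "ERG"}},
    {"id": "region3", "nearest_gene": "GeneC", "motifs": {"GATA3", "ERG"}},
    {"id": "region4", "nearest_gene": "GeneA", "motifs": {"GATA3"}},
    {"id": "region5", "nearest_gene": "GeneB", "motifs": {"ERG"}},
]
marker_genes = {"GeneA", "GeneB"}  # stand-in for an EC cluster marker list

pct = percent_with_both_motifs(regions, marker_genes)
print(f"{pct:.1f}% of selected regions contain both motifs")
```

In this toy example, region3 is excluded because its nearest gene is not a marker gene, so the percentage is computed over the four remaining regions.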
Enhancer prediction
We implemented a machine-learning approach to assign to peak regions from ATAC-seq data a probability score of being enhancers, using a logistic regression model with the L1 penalty. We performed all analyses using RStudio and R version 4.2.0 (https://www.r-project.org/).
First, we downloaded the coordinates (chromosome, start-end position) of the 695 enhancers marked as positive from the VISTA Enhancer Browser (http://enhancer.lbl.gov). Since the enhancers’ coordinates were in mm9, we mapped them to mm10 using the LiftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver). Next, we created 695 non-enhancer regions to use as negative examples. For this purpose, we randomly sampled genomic coordinates that do not overlap the positive enhancer coordinates with the shuffle function from BedTools v2.26.043, using the parameters -g (mm10) and -excl (the positive enhancer file merged with the mm10-blacklist regions file, https://github.com/Boyle-Lab/Blacklist/blob/master/lists/mm10-blacklist.v2.bed.gz). These non-enhancer regions have the same lengths as the positive enhancers. Finally, we built a binary response vector with 1390 components, assigning 1 to the positive enhancers and 0 to the non-enhancer regions.
Second, we downloaded 385 datasets using the ChIPseeker (v1.36) R package with the function ChIPseeker::downloadGEObedFiles(genome= “mm10”) (https://bioconductor.org/packages/release/bioc/html/ChIPseeker.html), containing peak coordinates of histone modifications, p300, and CTCF at different cell states and in different cell types (in mm10). After merging the replicates using the intersectBed function from BedTools, we obtained 327 files. Then, we filtered out the datasets containing KO experiments or other treatments. Overall, we obtained 81 epigenetic tracks of peaks in bed format and 2 in bedgraph format. Next, we intersected the 1390 enhancer and non-enhancer regions with the 81 epigenetic tracks in bed format using the findOverlaps function from the GenomicRanges R package (v1.52.0) (https://bioconductor.org/packages/release/bioc/html/GenomicRanges.html). Finally, we built a binary matrix of dimension 1390 × 81 in which we assigned 1 to the positions with an overlap and 0 to those without.
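The construction of the binary feature matrix from region/track overlaps can be sketched as follows. This Python example is a toy illustration, not the GenomicRanges code used in the analysis: it assumes half-open BED-style coordinates, uses three hypothetical regions and two hypothetical tracks instead of 1390 regions and 81 tracks, and the function names are ours.

```python
def overlaps(region, peak):
    """True if two (chrom, start, end) intervals share at least one base."""
    return region[0] == peak[0] and region[1] < peak[2] and peak[1] < region[2]


def build_feature_matrix(regions, tracks):
    """Rows are regions, columns are tracks; 1 marks an overlap, 0 none."""
    return [
        [1 if any(overlaps(region, peak) for peak in track) else 0 for track in tracks]
        for region in regions
    ]


# Hypothetical regions and response vector (1 = enhancer, 0 = non-enhancer).
regions = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 50, 150)]
labels = [1, 1, 0]

# Hypothetical epigenetic tracks, each a list of peaks (illustrative only).
tracks = [
    [("chr1", 150, 300)],
    [("chr2", 140, 400), ("chr1", 1090, 1200)],
]

matrix = build_feature_matrix(regions, tracks)
for row, label in zip(matrix, labels):
    print(row, label)
```

Each column therefore encodes whether a given epigenetic mark is present at a region, which is the form of predictor a penalized logistic regression can use directly.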
For the remaining two epigenetic tracks in bedgraph format, we used the sum of the coverage at each of the 1390 regions. Combining the two parts gave a feature matrix of dimension 1390 × 83. We split the dataset into a training set (80% of the 1390 regions) and a test set (20% of the 1390 regions). Then, we trained the L1-penalized logistic regression (i.e., Lasso logistic regression) on the training set using K-fold cross-validation to choose the best regularization parameter. For this purpose, we used the glmnet package (v4.1.7) (https://cran.r-project.org/web/packages/glmnet/index.html) with the command: cv.out=cv.glmnet(X.train, Y.train,alpha=1, family = “binomial”). After that, we fitted the training set using lasso.mod.train = glmnet(X.train, Y.train, lambda = bestlam, alpha = 1, family = “binomial”), where bestlam is the regularization parameter obtained from the cross-validation. In the validation phase, we used the assess.glmnet() function to determine the accuracy on the test set. Finally, we predicted the scores (i.e., the probability of being an enhancer) for the genomic coordinates of interest. Since the scores vary between 0 (corresponding to minimum probability) and 1 (corresponding to maximum probability), we used a threshold of 0.5 to establish whether a given region is an enhancer. The whole procedure, from the random generation of the 695 non-enhancer regions to the final prediction, was repeated 10 times, and the final prediction consisted of the average of the individual prediction scores, to which we applied the threshold of 0.5.
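The final aggregation step, averaging the prediction scores over repeated runs and applying the 0.5 threshold, can be sketched in a few lines of Python. The scores below are hypothetical, only two runs are shown instead of 10, and the function name `ensemble_calls` is ours. Whether a score of exactly 0.5 counts as an enhancer is not stated in the text, so the strict inequality used here is an assumption.

```python
from statistics import mean


def ensemble_calls(scores_per_run, threshold=0.5):
    """Average per-region scores across runs and apply the threshold.

    scores_per_run: list of runs, each a list of scores in the same region order.
    Returns a list of (average_score, is_enhancer) tuples, one per region.
    """
    n_regions = len(scores_per_run[0])
    averaged = [mean(run[i] for run in scores_per_run) for i in range(n_regions)]
    # Assumption: a region is called an enhancer only if its score exceeds 0.5.
    return [(score, score > threshold) for score in averaged]


# Hypothetical scores for two regions over two runs (illustrative only).
scores_per_run = [
    [1.0, 0.25],
    [0.5, 0.25],
]

calls = ensemble_calls(scores_per_run)
print(calls)
```

Averaging over runs reduces the dependence of the final call on any single random draw of non-enhancer regions.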
Statistics and reproducibility
All differentiation experiments were repeated at least three times. RNA-seq and ATAC-seq experiments were performed in two replicates from two independent differentiation experiments. The specific statistical tests applied are specified in the sections above.
To evaluate differential gene expression from real-time PCR results, we used the two-way repeated measures ANOVA test. We also used the nonparametric Wilcoxon matched-pairs signed-rank test, one-tailed. To evaluate differential gene expression from RNA-seq data, we used NOISeq (Bayesian test without parametric assumption)36 and DESeq2 (frequentist test with negative binomial assumption)37. To evaluate differential accessibility from ATAC-seq data, we used DiffBind and DEScan2, as detailed above. For enhancer prediction, we used a lasso-type penalized logistic regression model.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Data supporting the results are included in the figures, Supplementary files, and in the GEO database under the accession number GSE235651 (RNA-seq and ATAC-seq data). The source data behind the graphs in the manuscript can be found in Supplementary Data 6. Any other data and scripts used to run data analysis software are available from the corresponding author (or other sources, as applicable) on reasonable request.
References
Aquino, J. B., Sierra, R. & Montaldo, L. A. Diverse cellular origins of adult blood vascular endothelial cells. Dev. Biol. 477, 117–132 (2021).
De Val, S. & Black, B. L. Transcriptional control of endothelial cell development. Dev. Cell 16, 180–195 (2009).
Sabbagh, M. F. et al. Transcriptional and epigenomic landscapes of CNS and non-CNS vascular endothelial cells. eLife 7, e36187 (2018).
Trimm, E. & Red-Horse, K. Vascular endothelial cell development and diversity. Nat. Rev. Cardiol. 20, 197–210 (2023).
Li, P. & Ferrara, N. Vascular heterogeneity: VEGF receptors make blood vessels special. J. Exp. Med. 219, e20212539 (2022).
Diogo, R. et al. A new heart for a new head in vertebrate cardiopharyngeal evolution. Nature 520, 466–473 (2015).
Lescroart, F., Dumas, C. E., Adachi, N. & Kelly, R. G. Emergence of heart and branchiomeric muscles in cardiopharyngeal mesoderm. Exp. Cell Res. 410, 112931 (2022).
Lescroart, F. et al. Defining the earliest step of cardiovascular lineage segregation by single-cell RNA-seq. Science https://doi.org/10.1126/science.aao4174 (2018).
Lescroart, F. et al. Early lineage restriction in temporally distinct populations of Mesp1 progenitors during mammalian heart development. Nat. Cell Biol. 16, 829–840 (2014).
Devine, W. P., Wythe, J. D., George, M., Koshiba-Takeuchi, K. & Bruneau, B. G. Early patterning and specification of cardiac progenitors in gastrulating mesoderm. eLife 3, e03848 (2014).
Kelly, R. G., Brown, N. A. & Buckingham, M. E. The arterial pole of the mouse heart forms from Fgf10-expressing cells in pharyngeal mesoderm. Dev. Cell 1, 435–440 (2001).
Mjaatvedt, C. H. et al. The outflow tract of the heart is recruited from a novel heart-forming field. Dev. Biol. 238, 97–109 (2001).
Waldo, K. L. et al. Conotruncal myocardium arises from a secondary heart field. Development 128, 3179–3188 (2001).
Wang, X. et al. Endothelium in the pharyngeal arches 3, 4 and 6 is derived from the second heart field. Dev. Biol. 421, 108–117 (2017).
Xu, H., Cerrato, F. & Baldini, A. Timed mutation and cell-fate mapping reveal reiterated roles of Tbx1 during embryogenesis, and a crucial function during segmentation of the pharyngeal system via regulation of endoderm expansion. Development 132, 4387–4395 (2005).
Huynh, T., Chen, L., Terrell, P. & Baldini, A. A fate map of Tbx1 expressing cells reveals heterogeneity in the second cardiac field. Genesis 45, 470–475 (2007).
Kattman, S. J. et al. Stage-specific optimization of activin/nodal and BMP signaling promotes cardiac differentiation of mouse and human pluripotent stem cell lines. Cell Stem Cell 8, 228–240 (2011).
Patsch, C. et al. Generation of vascular endothelial and smooth muscle cells from human pluripotent stem cells. Nat. Cell Biol. 17, 994–1003 (2015).
Arnaoutova, I. & Kleinman, H. K. In vitro angiogenesis: endothelial cell tube formation on gelled basement membrane extract. Nat. Protoc. 5, 628–635 (2010).
Righelli, D. et al. DEScan2: differential enrichment scan 2. R package version 1.22.0. https://bioconductor.org/packages/DEScan2 (2023).
Stark, R. & Brown, G. DiffBind: differential binding analysis of ChIP-Seq peak data version 3.0.15 from bioconductor. Bioconductor https://rdrr.io/bioc/DiffBind/ (2011).
Nomaru, H. et al. Single cell multi-omic analysis identifies a Tbx1-dependent multilineage primed population in murine cardiopharyngeal mesoderm. Nat. Commun. 12, 6645 (2021).
Rossi, G. et al. Capturing cardiogenesis in gastruloids. Cell Stem Cell 28, 230–240.e6 (2021).
Hellström, M. et al. Dll4 signalling through Notch1 regulates formation of tip cells during angiogenesis. Nature 445, 776–780 (2007).
Linnemann, A. K., O’Geen, H., Keles, S., Farnham, P. J. & Bresnick, E. H. Genetic framework for GATA factor function in vascular biology. Proc. Natl Acad. Sci. USA 108, 13641–13646 (2011).
Tremblay, M., Sanchez-Ferras, O. & Bouchard, M. GATA transcription factors in development and disease. Development 145, dev164384 (2018).
Kanki, Y. et al. Dynamically and epigenetically coordinated GATA/ETS/SOX transcription factor expression is indispensable for endothelial cell differentiation. Nucleic Acids Res. 45, 4344–4358 (2017).
Hagedorn, E. J. et al. Transcription factor induction of vascular blood stem cell niches in vivo. Dev. Cell 58, 1037–1051.e4 (2023).
Kalna, V. et al. The transcription factor ERG regulates super-enhancers associated with an endothelial-specific gene expression program. Circ. Res. 124, 1337–1349 (2019).
Liu, F. & Patient, R. Genome-wide analysis of the zebrafish ETS family identifies three genes required for hemangioblast differentiation or angiogenesis. Circ. Res. 103, 1147–1154 (2008).
Pimanda, J. E. et al. Gata2, Fli1, and Scl form a recursively wired gene-regulatory circuit during early hematopoietic development. Proc. Natl Acad. Sci. USA 104, 17692–17697 (2007).
Carpentier, G. et al. Angiogenesis analyzer for ImageJ - A comparative morphometric analysis of ‘endothelial tube formation assay’ and ‘fibrin bead assay’. Sci. Rep. 10, 11568 (2020).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Liao, Y., Smyth, G. K. & Shi, W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Res. 47, e47 (2019).
Tarazona, S. et al. Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Res. 43, e140 (2015).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Raudvere, U. et al. G:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res. 47, W191–W198 (2019).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
Yu, G., Wang, L.-G. & He, Q.-Y. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31, 2382–2383 (2015).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Harada, Y. et al. ETS-dependent enhancers for endothelial-specific expression of serum/glucocorticoid-regulated kinase 1 during mouse embryo development. Genes Cells 26, 611–626 (2021).
Rhodes, C. S., Matsunobu, T. & Yamada, Y. Analysis of a limb-specific regulatory element in the promoter of the link protein gene. Biochem. Biophys. Res. Commun. 518, 672–677 (2019).
Selvarajan, I. et al. Coronary artery disease risk variant dampens the expression of CALCRL by reducing HSF binding to shear stress responsive enhancer in endothelial cells. Preprint at bioRxiv https://doi.org/10.1101/2023.02.08.527795 (2023).
El Taghdouini, A. et al. Genome-wide analysis of DNA methylation and gene expression patterns in purified, uncultured human liver cells and activated hepatic stellate cells. Oncotarget 6, 26729–26745 (2015).
Perkins, E. B., Cunningham, J. G., Bracete, A. M. & Zehner, Z. E. Two homologous enhancer elements in the chicken vimentin gene may bind a nuclear factor in common with a nearby silencer element. J. Biol. Chem. 270, 25785–25791 (1995).
Ehrlich, K. C., Lacey, M. & Ehrlich, M. Epigenetics of skeletal muscle-associated genes in the ASB, LRRC, TMEM, and OSBPL gene families. Epigenomes 4, 1 (2020).
Spensberger, D. et al. Deletion of the Scl +19 enhancer increases the blood stem cell compartment without affecting the formation of mature blood lineages. Exp. Hematol. 40, 588–598.e1 (2012).
Lilly, B., Olson, E. N. & Beckerle, M. C. Identification of a CArG box-dependent enhancer within the cysteine-rich protein 1 gene that directs expression in arterial but not venous or visceral smooth muscle cells. Dev. Biol. 240, 531–547 (2001).
Le Bras, A. et al. VE-statin/egfl7 expression in endothelial cells is regulated by a distal enhancer and a proximal promoter under the direct control of Erg and GATA-2. PLoS ONE 5, e12156 (2010).
Zhou, P. et al. Mapping cell type-specific transcriptional enhancers using high affinity, lineage-specific Ep300 bioChIP-seq. eLife 6, e22039 (2017).
Becker, P. W. et al. An intronic Flk1 enhancer directs arterial-specific expression via RBPJ-mediated venous repression. Arterioscler. Thromb. Vasc. Biol. 36, 1209–1219 (2016).
Rice, S. J. et al. Genetic and epigenetic fine-tuning of TGFB1 expression within the human osteoarthritic joint. Arthritis Rheumatol. 73, 1866–1877 (2021).
Chan, W. Y. I. et al. The paralogous hematopoietic regulators Lyl1 and Scl are coregulated by Ets and GATA factors, but Lyl1 cannot rescue the early Scl-/- phenotype. Blood 109, 1908–1916 (2007).
Vijayabaskar, M. S. et al. Identification of gene specific cis-regulatory elements during differentiation of mouse embryonic stem cells: an integrative approach using high-throughput datasets. PLoS Comput. Biol. 15, e1007337 (2019).
Schütte, J. et al. An experimentally validated network of nine haematopoietic transcription factors reveals mechanisms of cell state stability. eLife 5, e11469 (2016).
Wamstad, J. A. et al. Dynamic and coordinated epigenetic regulation of developmental transitions in the cardiac lineage. Cell 151, 206–220 (2012).
Baumgartner, E. A., Compton, Z. J., Evans, S., Topczewski, J. & LeClair, E. E. Identification of regulatory elements recapitulating early expression of L-plastin in the zebrafish enveloping layer and embryonic periderm. Gene Expr. Patterns 32, 53–66 (2019).
Sundaram, V. et al. Functional cis-regulatory modules encoded by mouse-specific endogenous retrovirus. Nat. Commun. 8, 14550 (2017).
Acknowledgements
We thank Marchesa Bilio for technical help; Laura Pisapia and Enzo Mercadante at the flow cytometry facility, and Salvatore Arbucci at the Microscopy facility of the Institute of Genetics and Biophysics. RNA-seq and ATAC-seq samples were sequenced by Genomix4Life SrL, Salerno, Italy. This work was funded by grants from the Fondazione Telethon GMR22T1012 (to A.B.), Fondation Leducq 15CVD01 (A.B. and E.I.) and the Italian Ministry of University and Research PRIN 20179J2P9J (A.B. and E.I.) and PRIN 2022XFE7M2 (to A.B. and G.L.).
Author information
Contributions
I.A.: performed experiments, assembled figures, provided conceptual input to experimental design, and edited the manuscript; O.L. and V.P.K.: performed bioinformatic analysis, assembled figures, and contributed to manuscript editing; A.C.: performed experiments; S.A.: performed experiments; G.L.: performed experiments, provided experimental design input, and contributed to manuscript editing; R.F.: performed experiments; C.A.: supervised bioinformatic analysis, provided conceptual input, and contributed to manuscript editing; E.I.: provided funding, contributed conceptual input to experimental design, and edited the manuscript; A.B.: provided funding, designed the experimental plan, and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: David Favero. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Aurigemma, I., Lanzetta, O., Cirino, A. et al. Endothelial gene regulatory elements associated with cardiopharyngeal lineage differentiation. Commun Biol 7, 351 (2024). https://doi.org/10.1038/s42003-024-06017-8