Identification of downstream effectors of retinoic acid specifying the zebrafish pancreas by integrative genomics

Retinoic acid (RA) is a key signal for the specification of the pancreas. Still, the gene regulatory cascade triggered by RA in the endoderm remains poorly characterized. In this study, we investigated this regulatory network in zebrafish by combining RNA-seq, RAR ChIP-seq and ATAC-seq assays. By analysing the effect of RA and of the RA receptor (RAR) inverse-agonist BMS493 on the transcriptome and on the chromatin accessibility of endodermal cells, we identified a large set of genes and regulatory regions regulated by RA signalling. RAR ChIP-seq further defined the direct RAR target genes in zebrafish, including hox genes as well as several pancreatic regulators like mnx1, insm1b, hnf1ba and gata6. Comparison of zebrafish and murine RAR ChIP-seq data highlighted the conserved direct target genes and revealed that some RAR sites are under strong evolutionary constraints. Among them, a novel highly conserved RAR-induced enhancer was identified downstream of the HoxB locus and driving expression in the nervous system and in the gut in a RA-dependent manner. Finally, ATAC-seq data unveiled the role of the RAR-direct targets Hnf1ba and Gata6 in opening chromatin at many regulatory loci upon RA treatment.

Suppl. Figure S2. A RAR binding site is located at an equivalent place downstream the zebrafish and murine mnx1 gene. (A) Genome browser views of the Mnx1 gene locus in mouse (upper panel) and in zebrafish (lower panel) showing the presence of a RAR binding site downstream the gene by ChIP-seq analysis (gold tracks) and located at the border of a conserved regions. Sequence conservation is highlighted below the murine ChIP-seq peaks through the comparison with several vertebrate species (data from UCSC genome browser) as well as by the fish conserved genomic regions ("Cons. track" in blue). (B-F) The mnx1 RAR element is able to target GFP reporter expression in the dorsal pancreas (white arrow) and in the posterior gut (yellow arrows) in zebrafish transgenic assays at 24hpf (B) and 40hpf (C) stages. GFP expression is slightly increased by RA (panel E) and switch off by BMS493 (panel F) (treatments being performed during gastrulation).  (FDR<0,05). The green boxes highlight the RAR sites whose chromatin accessibility is not significantly modified by RA treatment (FDR>0,05).

Supplementary Tables :
Suppl. Table S1: List of genes having significant enriched expression in endodermal cells at 3somites stage (excel sheet 1) and at 8-somites stage (excel sheet 2).
The enrichment is given by log2 of fold change (FC)(RA versus DMSO) with the fold discovery rate (FDR). Normalized level of expression is given in count of reads per million (CPM) for the DMSO and RA treatments. Enriched genes were selected with a FDR<0.01.

Suppl. Table S2: List of genes regulated by RA treatments at 3-somites stage (excel sheet 1) and at 8-somites stage (excel sheet 2).
The enrichment is given by log2 of fold change (FC)(RA versus DMSO) with the fold discovery rate (FDR). Normalized level of expression is given in count of reads per million (CPM) for the DMSO and RA treatments. RA-regulated genes were selected based on FDR <0.01.
The enrichment is given by log2 of fold change (FC)(BMS493 versus DMSO) with the fold discovery rate (FDR). Normalized level of expression is given in count of reads per million (CPM) for the DMSO and BMS493 treatments. BMS493-regulated genes were selected based on FDR <0.01.

Suppl. Table S4: List of RAR binding sites identified by ChIP-seq.
Location of all 4750 RARa ChIP-seq peaks is given by their coordinates (start, end and width of peaks) on each zebrafish chromosome.
Suppl. Table S5: List of genes regulated by RA treatment and harbouring a RARa sites in their genomic vicinity.
Location of ChIP-seq peaks and the associated genes is given by their coordinates on each chromosome. Gene identity is given by their Ensembl ID, transcript ID and gene name. The location of the RAR ChIP-seq peaks and the distance from their assigned gene is given in columns I and J, respectively. Sites assigned to a gene stimulated by RA are given in excel sheet1; RAR sites assigned to a repressed gene are given in excel sheet 2.
Suppl. Table S6: List of RARa binding sites located in zCNE.
Location of zebrafish (excel sheet 1) and murine (excel sheet 2) RARa sites which are located in evolutionary conserved genomic sequences, with their assigned gene (given by Ensembl ID and gene name). Coordinates of ChIP-seq peaks (columns A-D), their location and distance to the assigned gene TSS are given in column I and J. The RARa sites located in highly conserved sequences from fish to mice are indicated in column L (excel sheet 1).

Suppl. Table S7: List of ATAC-seq peaks specifically identified in endodermal or non-endodermal cells.
Location of ATAC-seq peaks identified specifically in endoderm (sheet 1) or non-endoderm (sheet2) is given by their coordinates (start, end and width of peaks) on each chromosome. Peaks enriched in each cell type were selected based on FDR<0.05.
Suppl. Table S8: List of ATAC-seq peaks affected by RA treatments.
Location of ATAC-seq peaks induced (sheet 1) and repressed (sheet 2) by RA treatments is given by their coordinates on each chromosome. The gene assigned for each peak is given by their Ensembl ID and name. Selection of peaks was done based on FDR<0.05 (RA versus BMS493).