Multidimensional chromatin profiling of zebrafish pancreas to uncover and investigate disease-relevant enhancers

The pancreas is a central organ for human diseases. Most alleles uncovered by genome-wide association studies of pancreatic dysfunction traits overlap with non-coding sequences of DNA. Many contain epigenetic marks of cis-regulatory elements active in pancreatic cells, suggesting that alterations in these sequences contribute to pancreatic diseases. Animal models greatly help to understand the role of non-coding alterations in disease. However, interspecies identification of equivalent cis-regulatory elements faces fundamental challenges, including lack of sequence conservation. Here we combine epigenetic assays with reporter assays in zebrafish and human pancreatic cells to identify interspecies functionally equivalent cis-regulatory elements, regardless of sequence conservation. Among other potential disease-relevant enhancers, we identify a zebrafish ptf1a distal-enhancer whose deletion causes pancreatic agenesis, a phenotype previously found to be induced by mutations in a distal-enhancer of PTF1A in humans, further supporting the causality of this condition in vivo. This approach helps to uncover interspecies functionally equivalent cis-regulatory elements and their potential role in human disease.

T he mechanisms that tightly control transcription are essential for organ function. The transcriptional regulation of genes is controlled by non-coding cis-regulatory elements (CREs) spread over large genomic distances 1 . Genome-Wide Association Studies (GWAS) have identified many noncoding disease-associated alleles that have a hereditary component and overlap with CREs epigenetic signatures, suggesting that the disruption of CREs may be one of the genetic bases of human disease. This is the case of some pancreatic diseases, such as pancreatic cancer and diabetes [2][3][4][5][6] , that have a heavy societal burden, with incidence and death rates increasing worldwide 7-12 . Many previous studies demonstrated an enrichment of diabetesassociated variants in adult human islet enhancers 2,5,[13][14][15][16] , corroborating the hypothesis of pancreatic diseases being caused by alterations in CREs. Likewise, experimental in vivo and in vitro enhancer reporter assays also showed that specific islet enhancer variants correlate with altered regulatory functions 14,[17][18][19][20] . Studies of the role of CREs' mutations in the development of pancreatic diseases using in vivo models would provide invaluable insight given the complex regulatory networks involved. However, evidences from in vivo models of the role of CREs' mutations in the development of pancreatic diseases are still scarce [21][22][23] .
The zebrafish is a vertebrate model suitable for genetic manipulation 24 , with a pancreas that shares many similarities with the human pancreas, including similar transcription factors (TFs) and genetic networks of pancreatic development and function 25,26 . Thus, the zebrafish is a suitable in vivo model to validate causal regulatory variants. Yet, the identification of interspecies functionally equivalent CREs faces unsolved fundamental challenges, such as low conservation of interspecies noncoding sequences 27 and, for the minority of CREs whose sequence is conserved, their fast-evolving functionality 28 . Indeed, although sequence conservation of non-coding sequences has successfully been used to find enhancers, many with interspecies orthologous identities 29,30 , it has also been demonstrated to be insufficient for identifying all enhancers within a genome and between species 31,32 . To bypass these limitations, in this work we profiled the chromatin state of zebrafish pancreas cells and chromatin interaction points. We were able to accurately identify zebrafish pancreatic enhancers and, by comparisons with similar human datasets, we predicted functionally equivalent pancreatic enhancers. These findings revealed a previously unidentified human enhancer in the landscape of the tumour suppressor ARID1A 33,34 , with a potential role in the susceptibility to pancreatic cancer. Additionally, we explored the regulatory landscape of PTF1A, known to contain a human distal enhancer whose deletion leads to pancreatic agenesis/hypoplasia [35][36][37][38] , and found a zebrafish distal ptf1a enhancer that contains similar regulatory information to its human counterpart. We further demonstrated its functional equivalency by showing that its ablation induces pancreatic agenesis, explained by a reduction in the pancreatic progenitor domain early in development. Taken together, the multidimensional chromatin profiling used here allowed the establishment of previously unknown functional connections between human and zebrafish enhancers. These bridges between different species are invaluable for the prediction of new diseaserelevant enhancers and the study of their role in human disease.

Results
Zebrafish putative pancreatic enhancers share developmental roles. When comparing the basic structure of the human and zebrafish adult pancreas we observed that the organ structure is analogous between the two species (Fig. 1a). We further extended this comparison to the cellular composition of the main cell types of the pancreas between zebrafish, mouse 39,40 and human [39][40][41][42] , and found that the predominance of the major cellular types is maintained in these three vertebrates ( Supplementary Fig. 1). Because of these extended similarities between the zebrafish and mammal pancreas, the zebrafish has been used as a model to study pancreatic diseases 25,43 . Furthermore, these similarities hint at the existence of shared genetic networks that operate, likely through equivalent sets of CREs, in these three species. Thus, we explored the chromatin state and chromatin interaction points of zebrafish whole pancreas, to gather information about endocrine and exocrine cells, and compared it to human datasets. To identify CREs active in the zebrafish adult pancreas, we performed ChIP-seq for H3K27ac 44 , a key histone modification associated with active enhancers, and ATAC-seq 45 , an assay that identifies regions of open chromatin (Fig. 1b). We also performed HiChIP 17 against H3K4me3 46 to detect active promoters interacting with the uncovered enhancers (Fig. 1b). We found 14753 putative active enhancers, mostly in intergenic regions (57.8%), and 23298 putative active promoters corresponding to 9848 genes ( Fig. 1c; Supplementary Dataset 1a-c). To identify a subset of pancreatic enhancers with higher tissue-specificity, we compared the H3K27ac data from adult zebrafish pancreas to whole zebrafish embryos at four developmental stages, Dome, 80% epiboly, 24 h post-fertilisation (hpf) and 48 hpf 47 , since these comprise differentiated and non-differentiated cells from many different tissues. We found that 7115 putative enhancers (48.2%) are active only in the differentiated adult pancreas (PsE; Fig. 1c; Supplementary Dataset 1a-c) while the remaining 7638 (51.8%) are also broadly active in developing embryos (DevE), suggesting that their activity is not restricted to the pancreas. DevE presented 4 clusters (C1-4) with different H3K27ac abundance profiles during the different developmental stages ( Fig. 1d; Supplementary Fig.  2a; Supplementary Dataset 1e-l), suggesting that, apart from their activity in the adult pancreas, these enhancers might function in other cell types. C1 and C4 show similar levels of H3K27ac in all developmental stages, compatible with a putative ubiquitous enhancer activity, while C2 and C3 show different levels of H3K27ac during development, which may reflect a dynamic state of repression (C2) and activation (C3) of enhancers, or alternatively, differences in the abundance of cells where these enhancers are active during development.
Functional similarities between human and zebrafish pancreatic enhancers. Pancreatic enhancers are expected to activate the expression of genes in the pancreas. To test if the predicted enhancers correlate with the expression of target genes in the pancreas, we identified the nearest genes to each putative pancreatic enhancer 48,49 and observed that genes nearby PsE are enriched for exocrine pancreas expression (p < 4.27E−9; Supplementary Fig. 2b; Supplementary Dataset 2a-c), detected by in situ hybridisation 48,49 . These results contrast with the ones obtained for DevE, for which nearby genes are enriched for expression in several other tissues, including epidermis and endothelial cells (Supplementary Fig. 2; Supplementary Dataset 2d-f), suggesting a higher tissue-specificity of PsE. Additionally, the presence of endothelial expression also in genes associated to the PsE group suggests the detection of endothelial enhancers, likely derived from the vasculature present in the zebrafish adult pancreas (Supplementary Dataset 2d-f).
To improve the enhancer to gene association, we used H3K4me3 HiChIP to detect chromatin interactions between active promoters and putative enhancers in the zebrafish adult pancreas ( Fig. 1b; Supplementary Dataset 3a) and used RNA-seq to evaluate transcription ( Fig. 1b; 50 ). We found that, compared to all genes, PsE-associated genes have a higher average expression in multiple pancreatic cell types (Fig. 2a, Supplementary Dataset 3b). As expected, these expression results contrast with the lower average expression levels of the PsE-associated genes compared to all genes in a distantly related control tissue such as the muscle (Fig. 2a, Supplementary Dataset 3b). Similar results were obtained when analysing genes associated to the other identified clusters of pancreatic enhancers, specifically, DevE, C1-C4 and the total dataset of pancreatic enhancers altogether (PsEs+DevE; Supplementary Fig. 2c, d, Supplementary Dataset 3c-g), which had higher expression levels for at least one pancreatic adult tissue and lower expression levels in the muscle (control tissue), when compared to all transcribed genes. Next, we performed a similar analysis by calculating the ratio of the average expression level of genes associated to C1-4 and PsE putative enhancers (HC) divided by the average expression of all genes (AllG), using the previously published transcriptome of whole zebrafish embryos from 18 developmental stages 51 . We found that the genes associated to C1-4 and PsE have a HC/AllG ratio ≥ 1 ( Fig. 2b; Supplementary Fig. 2e) and that the HC/AllG ratio of the DevE associated genes is higher than the one of PsEassociated genes, for most of the analysed developmental time points (Fig. 2b) partially reflects the variation of H3K27ac signal observed in the enhancers of the C1-4 clusters (Fig. 1d, Fig. 2b and Supplementary Fig. 2e). For instance, the C2 group that shows an increased presence of H3K27ac signal at Dome and 80% epiboly developmental time-points (Fig. 1d), also shows an increased HC/AllG ratio in the earliest developmental time points (BDO:blastula to G75: 75%epiboly; Fig. 2b and Supplementary  Fig. 2e). These results suggest that C1-4 enhancers control gene expression in the adult differentiated pancreas, in addition to other cell types during development. Overall, these results increase the robustness of the pancreatic enhancers predictions, since it is possible to correlate with the transcription of the respective putative target genes.
To determine if the detected H3K27ac signal is a good predictor of active pancreatic enhancers, we performed in vivo enhancer reporter assays for 17 regions within the regulatory landscapes of known pancreatic genes. We selected sequences with detectable, but variable, H3K27ac signal overlapping with open chromatin, detected by ATAC-seq 45 [52][53][54] . These results validate the robustness of pancreatic enhancers prediction based on chromatin state and further suggest that the abundance of H3K27ac mark in genomic locations might improve such predictions.
We observed that out of 14753 putative zebrafish pancreatic enhancers, only 12.49% (n = 1842) could be directly aligned to the human genome 55 (Fig. 3a and Supplementary Dataset 3i-l). A similar proportion was found in the group of developmental enhancers (11.36%; 7326 out of 64,498; Fig. 3a). Using the corresponding human sequences from the pancreas and developmental enhancers groups, we found that they share similar PhastCons conservation scores ( Fig. 3b; Supplementary Fig. 3d and Supplementary Dataset 3m-p). Next, we wanted to determine if the zebrafish putative pancreatic enhancers that align to the human genome also overlap with H3K27ac signal from human pancreas. Only a minority of interspecies aligned sequences shared H3K27ac signal (total pancreas data set: 227 out of 1842; PsE: 115 out of 1052; DevE: 112 out of 790). The human sequences, that shared H3K27ac signal with zebrafish, did not show a higher average conservation score than the aligned sequences that showed H3K27ac signal in zebrafish alone ( Fig. 3b Fig. 3c; Supplementary Dataset 3q). Overall, these results suggest that pancreatic enhancer function is not a strong condition to impose sequence conservation.
Following these data, we assessed whether functionally equivalent pancreatic CREs exist between human and zebrafish, despite an overall lack of sequence conservation. To explore this possibility, we investigated if the genes interacting with each cluster of zebrafish enhancers were enriched for homologs of human genes associated with pancreatic diseases, which would suggest the existence of functionally equivalent pancreatic CREs with potential biomedical relevance. Such enrichment was observed for the clusters of late development and adult pancreas (PsE, C3 and C4; Fig. 3d; Supplementary Dataset 3r, s). Human gene-disease associations were retrieved from DisGeNET 56 and we observed that 306 out of 836 zebrafish genes (36.6%) homologous to human pancreas disease-associated genes also interact with zebrafish pancreatic enhancers.
Enhancers can exist in their typical form, as short and restricted regions of DNA, or they can be present as large regions of hyperactive chromatin referred to as super enhancers 13,57,58 . Several computational approaches have been applied to identify super enhancers in vertebrate genomes, including in human and zebrafish 59 . We searched for super enhancers active in the pancreas of human and zebrafish (Supplementary Dataset 1m, n; 275 in zebrafish and 875 in human), to understand if pancreatic super enhancers control the same genes in both species, further suggesting an equivalency in function. Gene ontology for putative target genes showed a similar enrichment for transcriptional regulation in both species and several of these genes corresponded to the same orthologues (32 out of the 271 zebrafish genes; Supplementary Fig. 3f-g), some with important pancreatic functions, such as INSR, a critical regulator of glucose homoeostasis 60 and GATA6, which plays a crucial role in pancreas development and β-cell function 61 ( Supplementary  Fig. 3h). We further inquired if human and zebrafish enhancers might operate similarly, using equivalent TFs. To test this, we performed a motif enrichment search for TF binding sites (TFBS) Fig. 1 The zebrafish pancreas, from histology to chromatin state. a Comparison of the basic structure of the human and zebrafish adult pancreas. Above: Dissected adult male Tg(insulin:GFP, elastase:mCherry) zebrafish; insulin and elastase promoters drive GFP expression in beta-cells (green) and mCherry in acinar cells (red), respectively. IN, intestine; LRL, Liver right lobe; LT, left testis; PI, principal islet; SI, secondary islets; SB, swim bladder. Below: Histology of the pancreas; transverse sections with hematoxylin/eosin staining showing islets of Langerhans (black dashed lines) surrounded by exocrine tissue in zebrafish and human pancreas. Magnification: ×40 and scale bar: 1 mm. b Genomic landscape of gata6 in the zebrafish adult pancreas showing the H3K27ac ChIP-seq profile (black) and ATAC-seq peaks (blue) from whole pancreas, RNA-seq from exocrine pancreas (green) and a heat map for chromatin interactions with gata6 promoter detected by HiChIP for H3K4me3 from whole pancreas (below). A putative enhancer sequence that interacts with the gata6 promoter is highlighted by the light blue box. c Bar plot (left panel) showing the number of genes with active promoters (defined by H3K4me3 signal, gray bar) and putative active enhancers in adult zebrafish pancreas (defined by H3K27ac mark, green bar), and their distribution throughout the regions of the genome (right panel). in regions of open chromatin identified by ATAC-seq 45 , within the 14753 pancreatic enhancers, and found several TFBS for known pancreatic TFs (ZP; Fig. 3f, Supplementary Fig. 4a, and Supplementary Dataset 3t, u). We also performed a similar analysis using available human whole pancreas datasets (HP 62 ; Datasets summarised in Supplementary Dataset 4g). To compare the extent of overlap of enriched motifs in human and zebrafish pancreatic enhancers with motifs enriched in other pancreas unrelated enhancers, we have performed a similar motif enrichment search for datasets of zebrafish embryos (D80, dome and 80%epiboly; 24 HPF, 24 hpf) and human heart ventricle (V 62 ; Datasets summarised in Supplementary Dataset 4g). We selected the top 140 enriched motifs from each dataset and observed that the majority of the common motifs were found in zebrafish (ZP) and human (HP) pancreas datasets (ZP,HP:98; ZP,D80:63; HP,D80:61) (Fig. 3g, Supplementary Fig. 4b), while comparisons with the human ventricle (V) showed that ZP,HP was the second largest group following HP, V ( Supplementary Fig. 4c).
Several TFs, such as Ptf1a, Pdx1, Pax6 and Sox9, are known to be important for pancreas function or development in several vertebrate species, including human and zebrafish 2,63-65 . As shown above, human and zebrafish pancreatic enhancers are enriched for many shared TFBS, therefore it is reasonable to expect that many of these TFBS are from TFs known to have an important pancreatic function. To test this hypothesis, we have selected 25 TFs known to be required for pancreas function and development and calculated the distribution of the respective TFBS motifs within the previously identified enriched motifs  ChIP-seq and ATAC-seq data accurately predict functional pancreatic enhancers. a Average expression of genes interacting with putative pancreas-specific enhancer sequences (PsE), detected by HiChIP for H3K4me3 (HC, n = 6174 genes), compared to the average expression of all genes (AllG, n = 33737 genes). Gene expression was determined from RNA-seq data from different pancreatic cells (acinar n = 4, duct n = 3, endocrine pancreas n = 4), whole pancreas (n = 2), and muscle (control; n = 2). One-sided Wilcoxon test (≥), p-values < 0.05 were considered statistically significant (****p < 2E−16). Error bars represent the 95% confidence interval. b Ratio between average expression of genes interacting with putative pancreatic enhancers (PsE, C1, C2, C3 and C4 clusters) and the average expression of all genes throughout zebrafish development. C1, C2, C3 and C4 are different clusters that compose the DevE category. BDO: blastula, dome; G50: gastrula, 50% epiboly; GSH: gastrula, shield; G75: gastrula, 75% epiboly, S1-4: segmentation, 1-4 somites; S14-19: segmentation, 14-19 somites; S20-25: segmentation, 20-25 somites; PP5: pharyngula, Prim-5; PP15: pharyngula, Prim-15; PP25: pharyngula, Prim-25; HLP: hatching, long-pec; LPM: larval, protruding-mouth; LD4: larval, day 4; LD5: larval, day 5. c Percentage of F0 zebrafish larvae with GFP expression in the exocrine pancreas following in vivo transient transgenesis reporter assays. The empty enhancer reporter vector was used as the negative control (NC). The depicted sequences (E1 to 10) represent the top 10 putative enhancer sequences with higher H3K27ac signal ("high H3K27ac" group). Values are represented as percentages and compared by two-sided Chi-square with Yates' correction test. p-values<0.05 were considered significant (****p < 0.0001, *p < 0.05). The exact p-value and n are discriminated in Supplementary Dataset 4. d Representative confocal image of the in vivo transient transgenesis reporter assays for the E3 sequence (n = 30). depicted in c) showing expression of GFP (green) in 11 dpf zebrafish pancreas (white dashed line), labelled by anti-Alcam staining (white) and anti-Amylase (red) antibodies (n = 30, from 2 independent injections, with 63.33% of larvae showing GFP expression in the exocrine pancreas). Nuclei were stained with DAPI (blue). Images were captured with a Leica SP5II confocal microscope. Scale bar: 60 µm. For a-c, source data are provided as a Source Data file.
described in Supplementary Dataset 3t. We found that the majority of the TFBS motifs from the pancreatic TFs were within the ZP,HP overlapping datasets, regardless of the compared groups ( Supplementary Fig. 4d-f). These results suggest that the same set of TFs operates in zebrafish and human pancreatic enhancers. Overall, these results argue in favour of interspecies functional equivalency of enhancers.
Landscape of arid1a reveals potential pancreatic cancer associated enhancer. To better address the hypothesis of interspecies functional equivalency of enhancers, we focused on the regulatory landscape of a gene that is potentially linked to human pancreatic diseases. We selected arid1ab, the orthologue of human ARID1A, a tumour-suppressor gene associated with cancer in several different cell types 33,34 , including pancreatic ductal adenocarcinoma 66 . ARID1A plays a key role in the regulation of DNA damage repair, by promoting an efficient processing of double-strand breaks into single-strand ends, being required to sustain DNA damage signalling and repair, hence suppressing tumorigenesis 67 . We identified several putative enhancers (zA.E1-4, Fig. 4a), that we tested in vivo using enhancer reporter assays (Supplementary   A ptf1a enhancer explains pancreatic agenesis causal variant in vivo. To further evaluate the interspecies functional equivalency of enhancers and their role in human pancreatic diseases, we focused on the human PTF1A locus, known to be controlled by a distal downstream enhancer whose deletion causes pancreatic agenesis 35 ( Fig. 5a; hP.E3). Concomitantly, we detected a zebrafish distal ptf1a enhancer, downstream of ptf1a (zP.E3), as well as two previously identified proximal enhancers (zP.E1 and zP.E2; 70 ). zP.E3 interacts with the promoter of ptf1a, observed by Hi-ChIP and 4C-seq ( Fig. 5a and Supplementary Fig. 5b), and could correspond to the functional equivalent enhancer whose deletion causes pancreatic agenesis in humans (hP.E3), although its sequence partially aligns with a more distal human sequence likely inactive in human pancreatic cells ( Supplementary Fig. 6).
In vivo enhancer assays for zP.E3 and hP.E3 showed strong and robust expression in progenitor cells (Fig. 5b), a result that is in agreement with the described activity of hP.E3 in vitro as a human developmental enhancer 35 . These results suggest that the human and zebrafish enhancers share some regulatory information. This is further supported by binding sites for FOXA2 and PDX1 in the human hP.E3, also predicted to bind to the zebrafish zP.E3 ( Supplementary Fig. 7a, b; 71 ). To further evaluate the role of zP.E3, we generated genomic deletions in the zP.E3 sequence ( Fig. 5c-g, Supplementary Fig. 8 and Fig. 9). Deletion1, a 632 bp deletion that includes the predicted Foxa2 and Pdx1 binding sites and the majority of transposase-accessible chromatin within zP.E3 ( Supplementary Fig. 9a), results in a decrease of the pancreatic progenitor domain area in homozygous mutants (Fig. 5 c, d, f), as well as a reduction in the expression levels of ptf1a ( Supplementary Fig. 9b). Furthermore, after pancreatic differentiation, the Deletion1 mutants displayed pancreatic hypoplasia (Fig. 5e, g; Supplementary Fig. 9c-e), and we observed the same phenotype for multiple independent deletions of zP.E3 generated in somatic cells ( Supplementary Fig. 8). In contrast, no phenotypes were observed for a 517 bp deletion within the zP.E3 enhancer, adjacent to Deletion1, which excludes the majority of accessible chromatin and predicted TF binding sites (Deletion2; Supplementary Fig. 9a, d, e), suggesting that the functional core of zP.E3 coincides with the regions of available chromatin that overlap with the predicted binding of Foxa2 and Pdx1. In agreement with the observed phenotypes, pancreatic hypoplasia is compatible with the described loss-of-function of ptf1a in zebrafish 70 and the loss of hP.E3 function in humans 35 . In light of these results, we suggest that pancreatic hypoplasia is the consequence of the reduction in the pancreatic progenitor domain caused by decreased levels of ptf1a due to the loss of an important pancreatic progenitor enhancer. Later on, after pancreatic differentiation, zP.E3 and hP.E3 enhancers acquire distinct activity patterns. The zebrafish zP.E3 enhancer is able to drive a consistent expression in differentiated pancreatic cells from late embryos up to adults ( Supplementary  Fig. 10), including acinar and duct cells, while the human hP.E3 enhancer shows almost a total lack of activity in differentiated acinar and duct cells, as previously observed in vitro 35 driving expression only in very few cells ( Supplementary Fig. 10). Overall, these results suggest that zebrafish and humans share a functionally equivalent distal enhancer of PTF1A during development, whose loss-of-function results in a reduction of the pancreatic progenitor domain, elucidating, in vivo, the causal link between the disruption of this enhancer in humans and pancreatic agenesis.

Discussion
Cis-regulatory mutations and sequence variations are associated with pancreatic cancer and diabetes 2-6 . However, the in vivo implications of these genetic changes are still unknown. Here, we explore the chromatin state of the zebrafish pancreas to uncover pancreatic enhancers and establish comparisons with humans, Fig. 3 The zebrafish and human pancreas share cis-regulatory similarities. a Percentage of predicted zebrafish pancreatic enhancer sequences aligned to the human genome. Sequences are grouped in different clusters: "Pancreas" that includes PsE and DevE; "PsE"; "DevE"; "Embryo" that include putative enhancers active only during embryonic development.  ARTICLE so that we can predict and model human pancreas diseaseassociated enhancers. We found that, although most of the zebrafish pancreatic enhancers do not share significant sequence identity with human pancreatic enhancers, they share many TFBS and their target genes are enriched for human pancreas diseases. These results suggest the existence of functionally equivalent enhancers in zebrafish and humans, as proposed for other tissues and species 72,73 . Indeed, recent studies looking into highly divergent species as human and sponges have located similarly functional enhancers within microsyntenic regions that, although do not share significant sequence identity, clearly recapitulate similar expression patterns in enhancer reporter assays, arguing in favour of functional equivalency 74 . This is likely the consequence of enhancers being fast-evolving sequences operating with a high degree of sequence flexibility 75 . Several mechanisms that may operate together during evolution can illustrate the potential for sequence flexibility of enhancers while retaining a consistent TFBS code. Among them, nucleotide alterations within   33 , we show that within a microsyntenic region within the arid1a locus in humans and zebrafish, there are pancreatic enhancers that share regulatory information, although not sharing significant sequence identity. We further show that the deletion of the human ARID1A pancreatic enhancer impairs ARID1A expression, defining a locus for non-coding mutations that may increase the risk for pancreatic cancer. We further explored the potential of functional equivalency for an enhancer of ptf1a 80 , in which both zebrafish and human enhancers share regulatory information and biological requirements during pancreas development. The loss-of-function of the zebrafish enhancer results in a decrease of the pancreatic progenitor domain and ultimately in pancreatic hypoplasia, a phenotype consistent with the impact of mutations described in the human regulatory landscape, which are associated with pancreatic agenesis 35 . The reduction of the pancreatic progenitor domain in zebrafish may explain the phenotype observed in humans, contributing to the clarification of its molecular and cellular origin. Interestingly, the deletion of the zebrafish ptf1a enhancer does not show a complete phenotypic penetrance, with~25% of the embryos having a pancreas morphologically similar to the controls, suggesting that other redundant enhancers may exist in the zebrafish regulatory landscape of ptf1a, compatible with a shadow enhancer identity 81 . Additionally, human and zebrafish ptf1a enhancers show divergent functions after differentiation. While the human enhancer shows very little activity in differentiated pancreatic cells, the zebrafish enhancer drives persistent reporter expression, suggesting that the phenotype in zebrafish after pancreatic differentiation could have the extra contribution of this late zebrafish specific function of the ptf1a enhancer.
Sequence conservation of CREs can be a good predictor of sequence functionality, however it holds important limitations in the prediction of equivalent functions. This is observed in the current work, where the vast majority of the zebrafish pancreatic enhancers that could be aligned to the human genome did not share marks of enhancer activity in pancreatic cells. This is further illustrated by zP.E3, which shows some partial alignment with a human sequence that has no active marks of enhancer in pancreatic cells. Many examples have been described showing how conserved sequences among divergent species might harbour divergent functions. These include differences in conserved enhancer sequences resulting in functional divergence 82,83 , to more striking examples of coding exons sequences repurposed to cis-regulatory functions 79 . Additionally, recent studies have shown that the ultra-conservation at sequence level observed in some enhancers is not necessary for the maintenance of tissue specific regulatory functions, suggesting that sequence constraint may partially result from other regulatory or unknown functions 75 .
The use of animal models to understand the role of CREs in the development of human diseases requires the identification of functionally equivalent sequences. As discussed above, sequence conservation is not a reliable predictor of functional conservation 84 and functional equivalent sequences might not present high sequence conservation 85 . This problem can be partially bypassed by combining the use of biochemical marks associated to CREs activity with enhancer reporter assays to identify similar regulatory information harboured by such sequences. In the current work we used this strategy, allowing us to identify and test in vivo enhancers that, when altered, can affect the expression of disease-associated genes. This strategy can help to identify where in the genome disease-causing non-coding mutations may occur by predicting disease-relevant CREs based on phenotypic description of CRE's loss-of-function. Furthermore, in the near future this strategy may be further improved by computational methods as well as the detection of TFBS in both species. These improvements could help to establish a correspondence of enhancers' identity genome wide.
The pancreas is a complex structure composed by multiple cell types. In this work we assessed the chromatin state of the whole pancreas of adult zebrafish in order to identify pancreatic CREs and their target genes. By associating CREs to the expression of target genes, we have shown that our dataset includes exocrine and endocrine CREs. This broad pancreatic enhancer map is very advantageous since it allows us to approach different biological and biomedical questions related with different pancreatic cell types. The pancreas also contains other cell types that are heavily intertwined, as is the case of endothelial cells. Indeed, several of our observations indicate the presence of endothelial enhancers in the described CREs datasets, namely the enrichment of endothelial expressing genes located nearby DevEs (Supplementary Dataset 2d-f) and the extended overlap of common motifs between pancreatic enhancers and heart ventricle enhancers. Fig. 4 The zebrafish and human arid1ab/ARID1A regulatory landscapes contain an equivalent pancreatic enhancer. a Genomic landscape of the zebrafish arid1ab gene, showing profiles for H3K27ac ChIP-seq (black), ATAC-seq (blue) and 4 C with viewpoint in the arid1ab promoter (magenta) in adult zebrafish pancreas (top); zoom-in in arid1ab regulatory landscape (middle). Human ARID1A genomic landscape (bottom) with H3K27ac enriched intervals from human pancreatic cell lines (HPCL, black bars, top-to-bottom: PT-45-P1, CFPAC-1 and HPAF-II), H3K27ac profile from human pancreas (WPT, black) and from non-pancreatic human cell lines (NPHCL; GM12878, H1-hESC, HSMM, HUVEC, K562, NHEK and NHLF; Data from ENCODE). Human/zebrafish sequence conservation (dark green). Tested putative enhancers are highlighted in grey (zA.E1 and zA.E3; no enhancer activity) and green (zA.E2, zA.E4 and hA.E4; enhancer activity). Zebrafish/human syntenic box (red box). b Transient in vivo enhancer reporter assays of zA.E4 and hA.E4 showing the percentage of zebrafish embryos with GFP expression in endocrine, acinar and duct cells (two-sided chi-square test with Yates correction; *p < 0.05; Endocrine cells: zA.E4, p = 0.0001; hA.E4, p = 0.0294; Acinar cells: zA.E4, p = 0.0391; hA.E4, p = 0.1167; Duct cell: zA.E4, p = 0.00001; hA.E4, p = 0.9731). Number of analysed embryos (n). Negative control (NC). c Luciferase enhancer reporter assays performed in human hTERT-HPNE cells for hA.E4, showing luc2/Nluc ratios, relative to the negative control (two-sided t-test; ****p < 0.0001; hA.E4 p-value = 0.0001; PC p-value < 0.0001). Data from three biological replicates (grey dots, n = 3) and Mean±SD (error bar). Negative control (NC). Positive control (PC). d Strategy for CRISPR-Cas9 deletions in the hA.E4 locus, indicating sgRNA target sites. e Representative images of transfected hTERT-HPNE human cells expressing pairs of sgRNAs and Cas9 (arrows). In control, sgRNAs target a H3K27ac depleted region, while sgRNAs in sgPair1 and sgPair2 target the hA.E4 locus. Left column show anti-ARID1A (grey) and right column GFP (green), mCherry (red) and DAPI (blue; nuclei). Representative images from three biological replicates. Scale bar: 40 μm. f Normalized ARID1A levels from immunocytochemistry images. Two-sided t-test depicted for p ≤ 0.05(*), p ≤ 0.01(**) and not significant (ns; Enhancers can be highly tissue specific, while others can be active in multiple tissues, as observed by the identification of PsE and DevE. The former showed H3K27ac profiles more restricted to the zebrafish adult pancreas, while the latter had broad profiles throughout development, suggesting their activity to be present in multiple tissues. The zP.E3 enhancer is not detected in the embryonic H3K27ac dataset, likely because its activity is highly restricted to pancreatic progenitor cells during development, resulting in its inclusion in the PsE group. A detailed analysis of the activity of this enhancer, from the larval stage to adulthood, shows it to be almost exclusively active in exocrine pancreatic cells (Supplementary Fig. 10e), illustrating the expected tissue specificity of PsE enhancers.   In this work, we identified pancreatic CREs in zebrafish, a model organism that is amenable to genetic manipulation and phenotyping. By establishing a correlation between human and zebrafish pancreatic CREs, functional testing of CREs can be performed in vivo, helping to clarify the role of CREs in pancreatic function and disease. In summary, the combination of techniques used in this work, allowed the identification of human cis-regulatory elements involved in disease. We show that transcriptional cis-regulation of the human and zebrafish adult pancreas have a high degree of similarity, allowing the functional exploration of cis-regulatory sequences in zebrafish, with the potential of translation to human pancreatic diseases.  87 . For the in vivo enhancer assays, embryos were anesthetized by adding tricaine (MS222; ethyl-3aminobenzoate methanesulfonate, #E10521-10G, Sigma-Aldrich) to the medium and selected by the internal positive control of transgenesis. For the establishment of transgenic and mutant zebrafish lines, embryos were microinjected, selected, bleached and grown until adulthood. Adult F0s were outcrossed with WT adults and the offspring screened for the internal control of transgenesis and the pattern of expression of the regulatory element, or for the respective mutations, by genotyping. In vivo reporter lines, Tg(ela:mCherry) and Tg(sst:mCherry), were used to label the exocrine and endocrine domain, respectively. The i3S animal facility and this project were licensed by Direcçaõ Geral de Alimentaçaõ e Veterinária (DGAV) and all the protocols used for the experiments were approved by the i3S Animal Welfare and Ethics Review Body.
Generation of plasmids for enhancer assays. Putative enhancer sequences were selected based on the overlap between H3K27Ac ChIP-seq and ATAC-seq signal in non-coding regions within the landscape of each pancreas-relevant gene. Sequences were PCR amplified from zebrafish genomic DNA using the primers in Supplementary Dataset 4b (designed to span the ChIP-seq and ATAC-seq signals) (Sigma-Aldrich), with the proof-reading iMax TM II DNA polymerase (#25261, INtRON Biotechnology) following the manufacturer's instructions for a standard 20 μl PCR reaction. PCR products were visualised by electrophoresis on an 1% agarose gel, the bands excised, purified with NZYGelpure kit (#MB011, NZYTech) and cloned into the entry vector pCR ® 8/GW/TOPO (#250020 Invitrogen, Ther-moFisher Scientific) according to manufacturer's instructions. The vectors were then recombined into the destination vectors Z48 90 , for transient enhancer assays, and ZED 91,92 , for stable transgenic lines, using Gateway ® LR Clonase ® II Enzyme mix (#11791020, Invitrogen, ThermoFisher Scientific), following manufacturer's instructions.
Genomic DNA from hTERT-HPNE cells was extracted 48 h after transfection and used as template for PCR amplification in order to genotype the tested conditions (Supplementary Dataset 4b).
Zebrafish pancreatic progenitor cells were extracted from 48 hpf embryos, immediately following euthanasia by rapid chilling, by repeated pipetting up and down in a gentle motion with 300 μL of Ginzburg fish Ringer's solution The hTERT-HPNE cells were fixed at 48 h after transfection in formaldehyde 4% (#F1635-500ML, Sigma-Aldrich) in PBS [137 mM NaCl (#S/3161/60, Fisher Chemical), 2.7 mM KCl (#2676.298, VWR), 10 mM NaHPO4 (#1.06342.0250, Merk), and 1.8 mM KH2PO4 (#1.06585.1000, Merk)] for 15 min at RT, permeabilized with 1% Triton X-100 (#X100, Sigma-Aldrich) in PBS and blocked with 2% BSA (#MB04602, NZYTech) in PBS for 20 min at RT. Incubation with primary antibody in 2% BSA/PBS (#MB04602, NZYTech) was O.N. at 4°C and in secondary antibody plus DAPI (1:1000, D1306 Invitrogen, ThermoFisher Scientific) was 3 h at 4°C in 2% BSA/PBS (#MB04602, NZYTech) for 3 h. Human cells were washed once after fixation and permeabilization, and three times after each incubation with primary and secondary antibodies with PBS for 10 min at RT. Fluorescence images were obtained at ×40 magnification on a Leica DMI6000 FFW microscope (v. 3.7.4.23463). Primary antibody used: anti-ARID1A (1:1000; #HPA005456 Sigma-Aldrich). Secondary antibody used: anti-rabbit Alexa Fluor 647 (1:1000, #A31573, ThermoFisher Scientific). In hTERT-HPNE immunohistochemistry images, the ARID1A nuclear staining was measured for each cell GFP + /mCherry+ and normalized for the average staining of the nucleus of all other cells in the same field (ratio=ARID1A expression/mean of ARID1A expression in the field). Then, we normalized the ratios using the control values. Statistical analysis. Two-tailed paired Student's t-test was applied to area quantifications, and in expression analysis. Chi-square test was applied to the in vivo validation of selected putative pancreatic enhancers and TFs motif comparisons. Wilcoxon test was applied to gene-to-enhancer association by chromatin interaction points comparisons. Fisher's exact test was applied to analyse the percentage of larvae in each phenotypic class. In all analyses, P < 0.05 was required for statistical significance and calculated in GraphPad Prism 5 (San Diego, CA, USA).

Processing and bioinformatic analysis
ChIP-seq analysis. High quality raw reads for the two replicates of H3K27ac ChIPseq (FASTQC v.0.11.5 96 , Supplementary Data 1 and 2) were aligned to the zebrafish genome (GRCz10/danRer10) using Bowtie2 (v.2.2.6) with default settings 97 . Before the alignment, the sequencing adapters were removed from the raw reads applying Skewer (v.0.2.1) 98 . The alignment file was converted into a bed file(Bedtools v.2.27) 99 and the data extended 300 bp, bigwig tracks generated and uploaded to UCSC Genome Browser (Fig.1b). Highly enriched regions (peaks) were obtained by MACS14 (v. 1.4.2) with the parameters "--nomodel, --nolambda and --space=30" 100 . During the ChIP-seq analysis the two replicates were processed independently. Reproducibility of the two biological replicates was measured by Pearson's correlation coefficient 101 in R. The same pipeline was applied to analyse human dataset from the ENCODE project (https://www.encodeproject.org/): ENCSR340GAZ; ENCSR748TFF. Regarding the embryo ChIP-seq datasets from the work by Bogdanovic and colleagues 47 , the data processed by the authors was used.
Identification of putative enhancers. To identify the best putative active enhancers in the zebrafish adult pancreas, we intersected the peaks from the two H3K27ac ChIP-seq replicates, generated by peak calling, selecting only the enriched regions present in both replicates (Bedtools intersect v.2.27 with the default parameters 99 ). Since H3K27ac is also present in promoter regions, we excluded peaks overlapping with TSS by intercepting our set of putative active enhancers with the TSS coordinates (Bedtools intersect with the parameter "-v"). To determine the presence of unreliable peaks, a "blacklist" was generated using H3K27ac ChIP-seq of different zebrafish tissues to identify putative false positive peaks. The used datasets from the DANIO-CODE consortium were the following(https://danio-code.zfin.org).: DCD002894SQ, DCD002921SQ, DCD003653SQ, DCD003654SQ, DCD003671SQ and DCD002742SQ. Then, MACS software was performed in these datasets using the same parameters described in the last section and the peaks that were present in at least 5 out 6 datasets were selected. This analysis generated 156 peaks, from which 102 overlapped with 69 peaks from the list of 14,753 putative pancreatic enhancers, representing less than 0,5% of the total dataset. We have used a published human "blacklist" of unreliable peaks 102 and observed that these represent 192 out of 102,548 of the human pancreas H3K27ac ChIP-seq called peaks (0.2% of the identified peaks). The zebrafish and human "backlist" of peaks is included in Supplementary Dataset 1o and annotated in Supplementary Dataset 1a.
4C-seq analysis. 4C-seq libraries were first inspected for quality control using FASTQC 96 (v.0.11.5, Supplementary Data 3-5) and demultiplexed using the script "demultiplex.py" from the FourCSeq package 107 , allowing for 1 mismatch in the primer sequence. 4C-seq data were analysed as previously described 108,109 . Briefly, reads were aligned to the zebrafish genome (GRCz10/danRer10) using Bowtie (v.1.1.2) 110 , keeping only uniquely mapping reads (-m 1). Reads within fragments flanked by restriction sites of the same enzyme or if fragments smaller than 40 bp were filtered out. In addition, reads falling ±5 kb from the viewpoint were filtered out. Mapped reads were then converted to reads-per-first-enzyme-fragment-end units, and smoothed using a 30 fragment mean running window algorithm (Figs. 4a and 5a).
HiChIP-seq analysis. H3K4me3 HiChIP-seq analysis from paired-end fastq files to pairs of interacting chromatin fragments were performed using a custom python script based on the default function of the pytadbit python library 111 . This library first uses GEM mapper (v.3.6) 112 to map paired reads independently to the zebrafish reference genome (GRCz10/danRer10, flags used by GEM mapper --maxdecoded-strata 1; --min-decoded-strata 0; -e 0.04). Then, reads are associated to a particular restriction fragment and paired together according to their read names. Once the reads are paired, the pairs of reads are filtered so that only those belonging to different restriction fragments are kept. Compressed sparse matrix files in cooler and hic formats were generated from those filtered reads using Cooler ("cload pairix" utility) and Juicer tools ("pre" utility) respectively for both visualisation and further analysis. From the hic file we obtained contact matrices detailing the coordinates of 2 interacting 5 kb chunks and the respective number of interactions, using Juicer tools ("dump" utility) and filtering for ≥2 interactions between chunks ≤100 kb apart. To predict the target promoters of putative active enhancers, only contacts connecting zebrafish pancreas active TSSs and putative active enhancers given by H3K27ac ChIP-seq peaks from whole pancreas, adult pancreas (PsE), developing pancreas (DevE) and the different enhancer clusters (C1-C4) were selected. An output table was produced with genes targeted by enhancers, per enhancer cluster (Supplementary Dataset 3a-g). Custom scripts are provided in a GitLab repository (https://gitlab.com/rdacemel/pancreasregulome).
Gene expression barplots. The average expression of genes associated with each enhancer cluster (PsE, DevE, C1-C4), as defined by HiChIP, was compared to the average expression of all genes present in the RNA-seq datasets using R and ggplot for drawing barplots (Fig. 2a, Supplementary Fig. 2c, Supplementary Dataset  Identification of Human/zebrafish syntenic blocks. Human/zebrafish syntenic blocks were defined by two aligned regions between both species that kept their relative position among each other. Pre-existing alignments available in the UCSC genome browser were used. Then, enhancers were searched within these blocks in both species. Conservation between zebrafish and human and PhastCons scores. To obtain the percentage of zebrafish putative active enhancers conserved with human, the coordinates of putative active enhancers from adult zebrafish pancreas and embryos at different development stages (GRCz10/danRer10) were used as input to the UCSC genome coordinate conversion tool (https://genome.ucsc.edu/cgi-bin/ hgLiftOver, liftover (v.1.04.00) to hg19, October 2019) (Fig. 3a). To visualise the conservation of the respective sequences, liftOver (v.1.04.00) to hg38 was done and their average PhastCons conservation score plotted (Fig. 3b). For this, we downloaded PhastCons scores in bigWig format from a 100-way multiple species alignment of 99 vertebrates against human (hg38) (hg38.phastCons100way.bw, October 2019) 116 and converted to BedGraph text format using the UCSC's utility bigWigToBedGraph (v.1.04.00). Then, the Bedtools 99 suite (v.2.27) was used to intersect and map different putative enhancer clusters in bed format with the conservation scores, storing for each putative enhancer the median and average PhastCons score. To know which of them overlap putative active enhancers in human pancreas, we used the Bedtools "intersect" tool with default ≥1 bp of overlap (Fig. 3b, blue). To calculate the Fold Change (FC) of the graph displayed in Fig. 3c, we have quantified the number of zebrafish H3K27ac positive sequences aligned with the human genome that also showed H3K27ac signal in human pancreas (ZebraHumanK27). As a control, we have performed a similar analysis, randomizing the aligned human sequences, quantifying the number of those that also showed H3K27ac signal in human pancreas, repeating this operation 10 5 times (randomZebraHumanK27). FC was calculated by the ratio: ZebraHumanK27/ [average(randomZebraHumanK27)] (Supplementary Dataset 3q). This was performed for the different populations of zebrafish enhancers (Pancreas, PsE, DevE, and embryo).
Transcription factor binding motifs enrichment. To refine our data, H3K27ac peaks were filtered with the ATAC-seq peaks. Then, the transcription factor binding site (TFBS) predictor program Hypergeometric Optimization of Motif EnRichment (HOMER v.4.11.1) was used to identify conserved sequence motifs enriched 103 . To evaluate our results, we also analysed, using HOMER, different acetylation data from: human pancreas, human ventricle, zebrafish embryos at 24 hpf and at dome +80%epiboly (Supplementary Dataset 3t, u and 4g). From the resulting analysis, we selected the top 140 enriched motifs for each dataset. These motifs were selected based on ranking and the groups were compared by performing hypergeometric enrichment tests. Fisher exact test from GraphPad Prism 7 (v.7.04) was performed to evaluate the enrichment in 25 known pancreas-related TFs (with Bonferroni correction). The HOMER software was also similarly applied in PsE, C1, C2, C3 and C4 in order to identify TFBS ( Supplementary Fig. 3f, g, Fig. 4 and Supplementary Dataset 3t, u).
Identification of super-enhancers. We applied ROSE (Ranking Ordering of Super-Enhancers, v.1) algorithm with default parameters to define super-enhancers in our whole pancreas acetylation data and in human pancreas acetylation data 58 . Then, we performed gene ontology analysis in both data using PANTHER software (v.14.0, on April 2019) and compared the molecular functions obtained (http:// pantherdb.org). To identify the genes shared between the two groups, we identified the human orthologous genes in our zebrafish list using Biomart (https:// www.ensembl.org/biomart; on April 2019) and compared the groups (Fig. 3e,  Supplementary Fig. 3h).
Disease association enrichment of genes from different enhancer clusters. To know whether the genes interacting with the pancreatic enhancer sets (PsE, C1-C4) include homologs of human genes associated with pancreatic diseases in a higher proportion than expected by chance, we took human gene-disease associations from DisGeNET (v.6.0) 56 , for the available pancreatic diseases. Then, we derived for each disease, the set of zebrafish genes homologous to the human diseaseassociated genes. In detail, pancreatic diseases and their associated genes were selected from the file containing all gene-disease links from DisGeNET (all_gen-e_disease_associations.tsv, downloaded from the DisGeNET website on April 2019, v6.0, http://www.disgenet.org/, Integrative Biomedical Informatics Group GRIB/ IMIM/UPF), filtering for associations with a score > 0.1 to exclude those based only on text-mining. The disease search term used was "pancrea*", followed by manually filtering for pancreas-related diseases and their human associated genes.
Gene annotations were obtained from Ensembl via BioMart on April 2019 selecting protein coding genes in zebrafish and gene homologs between human and zebrafish. We required a minimum of 15 zebrafish genes relating to a disease to avoid significant gene set enrichments only due to small group ratios without real over/under representations, yielding 16 pancreatic diseases totalling 836 zebrafish homologs of human genes associated to pancreatic diseases (Supplementary Dataset 3r). To check whether the genes interacting with various enhancer clusters (Embryo only, C1, C2, C3, C4, PsE) are enriched for pancreas disease-association, we performed hypergeometric tests for gene set enrichment with the 16 pancreatic diseases left (R phyper function, X: number of genes in disease Ai and in enhancer set Bi; M: number of genes in disease Ai, N: non-disease genesnumber of zebrafish protein coding genes minus M; K: number of genes in enhancer set Bi). The R package "qvalue" was used to correct for multiple testing using FDR and convert unadjusted p-values into q-values 117 . Hypergeometric enrichment was obtained as the ratio "(number disease genes in clusterX/number of genes in clusterX)/(number disease genes/number of protein coding genes)". Finally, diseases with an absolute enrichment ≥ 1.5 and a q-value ≤ 0.05 were considered significantly enriched in the respective cluster (Fig. 3d).
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
All raw sequencing data generated within this study has been submitted to ENA under accession number "PRJEB40292". The analysed data are available on "USCS browser [http://genome-euro.ucsc.edu/s/VDR_group_public_data/ Carrico_et_al_2020_ZebrafishPancreasRegulome]" and in Supplementary material.

Code availability
The custom code for analysis of optical action potential traces is available in gitbub (https://gitlab.com/rdacemel/pancreasregulome) 118