Functional annotations of three domestic animal genomes provide vital resources for comparative and agricultural research

Gene regulatory elements are central drivers of phenotypic variation and thus of critical importance towards understanding the genetics of complex traits. The Functional Annotation of Animal Genomes consortium was formed to collaboratively annotate the functional elements in animal genomes, starting with domesticated animals. Here we present an expansive collection of datasets from eight diverse tissues in three important agricultural species: chicken (Gallus gallus), pig (Sus scrofa), and cattle (Bos taurus). Comparative analysis of these datasets and those from the human and mouse Encyclopedia of DNA Elements projects reveals that a core set of regulatory elements are functionally conserved independent of divergence between species, and that tissue-specific transcription factor occupancy at regulatory elements and their predicted target genes are also conserved. These datasets represent a unique opportunity for the emerging field of comparative epigenomics, as well as the agricultural research community, including species that are globally important food resources.

1. The figures in the article look boring. There are too many histograms in the main article and the colors are not coordinated. The authors might find more ways to visualize their data efficiently and beautifully. 2. There are some symbol errors in some places. For example, "Trim Galore!" should be "Trim Galore". 3. Some parameters of the software used are not described. For example, the parameters of ChromHMM. 4. Line 217. The two "-" shouldn't be here. 5. Line 335. There are some typos in some places. For example, "iDeal" should be "ideal". 6. Line 445. The explanation of a name should be given where it first appears. For example, transcription start site (TSS) should not be explained here. 7. Line 445. Why is the TSS cutoff 2 kb? Have you tested other cutoffs, such as 3 kb or 1 kb?
Reviewer #2: Remarks to the Author: Comments to Authors Overall, the paper is well-written, and the resources newly generated here (i.e., functional genome annotations of three farm animal species) are of general interest and high impact in the wider fields including animal/human genetics and comparative genomics. The statistical methods used are solid. Through comparing epigenomes of multiple tissues across five species (including human and mouse), Dr. Zhou and his colleagues characterized the conservation of regulatory elements, particularly for enhancers. Taking cattle as an example, they also showed the importance of regulatory elements in the interpretation of the genetics of complex traits in livestock. I have a few major comments as shown below: 1) I think it could be useful to discuss the limitations of the current resources and next steps. For instance, in this pilot project of FAANG, the authors only considered adult male animals. Are there any plans to generate data from female tissues in the future, as the regulatory elements are tissue-dependent and most traits of economic importance may be relevant to female tissues, such as milk production (mammary gland) and egg production (ovary)? How about different developmental stages, and single-cell levels (as the difference across species may be due to differences in the cell composition of tissues)? 2) Could the authors biologically explain why each enhancer regulates more genes in chicken than in cattle and pig? Is there any technical bias from the different qualities of the reference genomes? Did you restrict your comparisons to 1-to-1 orthologous genes in Fig. 4b?
Minors: 1. Line 132-133: How did you define expressed genes? What about the non-expressed genes in terms of active TSS proximal REs? 2. Line 162-163: How did you define the four groups, i.e., A, B, C and D? It could be good to explain a bit in the main text as well. 3. Line 176: How about the function of genes targeted by conserved enhancers? Did they have similar targets across species? Did they show similar functions to genes with conserved promoters? 4. Line 223-225: Could the authors show the average distances (or the distribution of distances) between REs and their predicted targets? What proportion of them overlap with the "naive" approach (i.e., nearest genes)? 5. Line 225-226: The authors found that each RE interacts with more genes in chicken than in pig and cattle. Is this due to outliers (i.e., a few enhancers interact with many genes in chicken)? 6. The authors found that the majority of conserved enhancers were universal to all tissues (Line 175), while they also found that there is tissue-specific conservation of regulatory features across species (Line 250-251). Does this indicate that the degree of activity (i.e., signal density of H3K27ac) of conserved enhancers is an important driver of tissue-specificity that was conserved across species? 7. Line 256-263: These are data summaries. It could be better to put them at the beginning of the Results section (Data Overview). 8. Line 282: Is this 2.5 times difference statistically significant? 9. Could you explain a bit why the chromatin states annotated in chicken and pig covered much larger percentages of the genome than those in cattle (and even human and mouse) across most tissues (Fig. 1c)? However, cattle has more active regulatory regions than chicken (Fig. 1d). 10. Fig. 2a, Is this calculated based on the entire genome sequence information (or regions of regulatory elements)? 11. Fig. 3, could you show how you obtained the q-values (FDR?) in the legend? 12. Fig. 4a, Was the median/mean size measured in bp?
We are very grateful to the editor and reviewers for their efforts in reviewing our manuscript. These comments are extremely helpful in improving the strength and clarity of this paper. Below, please find a point-by-point response to each comment from the reviewers.

Reviewer #1 (Remarks to the Author):
The authors have tried to annotate genome-wide regulatory elements in chicken, pig and cattle and conducted a large-scale analysis comparing epigenomes, genomes, and transcriptomes of eight tissues in other vertebrates. The results show that intergenic enhancers have low genomic positional conservation compared to promoters and genic enhancers, and that some regulatory elements are functionally conserved across mammals and birds despite their evolutionary distance. The results and especially the datasets provide useful resources for future agricultural and comparative epigenomic studies on domestic animals.
Major points: 1. The DNase-seq and ATAC-seq show a significant difference. What causes this difference and how does it impact the downstream analysis? Why not use the same technology, say, ATAC-seq, which is a robust technology, to improve the integrity of the paper?

Response: The reviewer raises a great point. The DNase-seq and ATAC-seq datasets are quite similar in terms of the identification of open chromatin regions in the genome, as can be seen in Fig. 1e, which shows a similar percentage of regulatory elements in open chromatin regions across all three species.
We have also shown in a previous publication that open chromatin regions measured from DNase-seq and ATAC-seq on the same tissue (chicken lung) overlap substantially (https://doi.org/10.1038/s41598-020-61678-9). ATAC-seq produces more peaks than DNase-seq, but the genome coverage of these peaks is similar because ATAC-seq tends to produce narrower peaks. When it comes to identifying transcription factor occupancy with the footprinting method, a significant difference is indeed present. The HINT tool from the Regulatory Genomics Toolkit that we used has made significant progress in accounting for the biases present in these assays; however, there is still a difference between the two assays. We believe the DNase-seq footprinting results to be the most accurate, while the ATAC-seq footprinting results have a higher false positive rate. As these results were used for the transcription factor enrichment analyses, the higher false positive rate of footprints in ATAC-seq would have the effect of making it harder to achieve significant TF enrichment. Nevertheless, we are still able to show similarities in enrichment across species. It is possible that DNase-seq data from all species would have shown more similarities; however, we believe the similarities found with ATAC-seq data are real. The original experimental design for this project was for Prof. John Stam's lab at the University of Washington to generate DNase-seq data in all species; however, after data were generated for chicken, the difficulty, budget, and time required to produce high quality data made it unrealistic to produce such datasets for pig and cattle. With ATAC-seq now being the most commonly used method to map open chromatin, we switched to this assay for pig and cattle.
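As an illustrative aside, the point that one assay can call more peaks while covering a similar share of the genome can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analysis used in the manuscript or in the cited publication: the function names (`merge_intervals`, `covered_bp`, `shared_bp`) and the toy peak coordinates are hypothetical, and peaks are assumed to be half-open `[start, end)` intervals on a single chromosome.

```python
from typing import List, Tuple

# Half-open [start, end) interval on one chromosome (an assumption of this sketch).
Interval = Tuple[int, int]


def merge_intervals(peaks: List[Interval]) -> List[Interval]:
    """Sort peaks and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(peaks: List[Interval]) -> int:
    """Number of base pairs covered by at least one peak."""
    return sum(end - start for start, end in merge_intervals(peaks))


def shared_bp(a: List[Interval], b: List[Interval]) -> int:
    """Number of base pairs covered by both peak sets (two-pointer sweep)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Hypothetical toy data: two broad peaks versus five narrower peaks.
dnase_peaks = [(100, 600), (1000, 1600)]
atac_peaks = [(120, 320), (380, 580), (1050, 1300), (1350, 1600), (3000, 3200)]

print(len(dnase_peaks), covered_bp(dnase_peaks))  # peak count, covered bp
print(len(atac_peaks), covered_bp(atac_peaks))
print(shared_bp(dnase_peaks, atac_peaks) / covered_bp(dnase_peaks))
```

In this toy example the second set has more than twice as many peaks as the first, yet both cover the same number of base pairs and most of that coverage is shared. On real data the same quantities would normally be computed per chromosome with an established interval tool rather than with hand-written code.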