Tissue-specific 5-hydroxymethylcytosine landscape of the human genome

5-Hydroxymethylcytosine (5hmC) is an important epigenetic mark that regulates gene expression. Charting the landscape of 5hmC in human tissues is fundamental to understanding its regulatory functions. Here, we systematically profiled the whole-genome 5hmC landscape at single-base resolution for 19 types of human tissues. We found that 5hmC preferentially decorates gene bodies and outperforms gene body 5mC in reflecting gene expression. Approximately one-third of 5hmC peaks are tissue-specific differentially-hydroxymethylated regions (tsDhMRs), which are deposited in regions that potentially regulate the expression of nearby tissue-specific functional genes. In addition, tsDhMRs are enriched with tissue-specific transcription factors and may rewire tissue-specific gene expression networks. Moreover, tsDhMRs are associated with single-nucleotide polymorphisms identified by genome-wide association studies and are linked to tissue-specific phenotypes and diseases. Collectively, our results show the tissue-specific 5hmC landscape of the human genome and demonstrate that 5hmC serves as a fundamental regulatory element affecting tissue-specific gene expression programs and functions.

D NA methylation at the fifth position of cytosine (5mC), which is established and maintained by DNA methyltransferases, is a predominant epigenetic modification that is critical for various biological and pathological processes, including silencing of transposable elements, regulation of gene expression, genomic imprinting and X-chromosome inactivation 1,2 . 5-Hydroxymethylcytosine (5hmC), also known as the "sixth base" of DNA, was discovered as another relatively abundant form of cytosine modification in Purkinje neurons and mouse embryonic stem cells (mESCs) 3,4 . Further studies found that the ten-eleven translocation (TET) family of Fe(II)-and α-ketoglutaratedependent DNA dioxygenases (including TET1, TET2, and TET3) catalyzes the sequential oxidation of 5mC to 5hmC, 5formylcytosine (5fC) and 5-carboxylcytosine (5caC) [4][5][6] . Subsequently, the DNA repair enzyme thymine-DNA glycosylase (TDG) can excise 5fC and 5caC to generate abasic sites, which eventually results in the regeneration of unmodified cytosines by the base excision repair pathway [6][7][8] . This TET-TDG pathway is known as the active DNA demethylation pathway.
In addition to being an intermediate of 5mC oxidation, evidence shows that 5hmC is a stable epigenetic mark with regulatory functions 9 . The specific genomic distribution pattern of 5hmC, such as its high enrichment in the gene bodies of transcriptionally active genes, promoters and enhancers, hints at specific biological roles of 5hmC [10][11][12] . In addition, the 5hmC level undergoes highly dynamic changes during development, differentiation and cancer 4,13,14 . For instance, the global 5hmC content is dramatically reduced in multiple human cancers compared with that in the normal tissues adjacent to the cancer [15][16][17] , suggesting that dysregulation of genomic 5hmC may be involved in tumorigenesis. Mechanistically, abnormal hydroxymethylation status impacts chromatin structure by interrupting the interaction of 5hmC-specific binding proteins with 5hmC 18 . Altogether, existing data have suggested a critical role of 5hmC in developmental processes and an association of dysregulation of 5hmC with human diseases. Different methods have been developed to map the genomic distribution of 5hmC, including affinity-based methods (5hmC-DIP-Seq, CMS-Seq, GLIB, and hMe-Seal) and high-resolution methods (oxBS-Seq, TAB-Seq, hmC-CATCH, TAPS, and ACE-Seq) 19,20 . For instance, we previously developed hmC-CATCH 21 , which is a bisulfite-free method for genome-wide detection of 5hmC. hmC-CATCH couples selective chemical labeling and biotin pulldown [21][22][23][24] , thus allowing 5hmC enrichment and detection at single-base-resolution. While hydroxymethylome maps have been obtained for mammalian cell lines and mouse tissues, 5hmC profiles in human tissues have not been as comprehensively characterized.
In this study, we generated genome-wide, base-resolution 5hmC data via hmC-CATCH across 19 human tissue types, represented by 60 tissue samples derived from 6 Chinese donors. We found that 5hmC is enriched in gene bodies and that it exhibits a better positive correlation with gene expression than gene body 5mC. Tissue-specific differentially hydroxymethylated regions (tsDhMRs) reside in regions enrich for regulatory elements, and co-localize with transcription factor-binding sites, which may regulate the tissue-specific gene expression network. Furthermore, tsDhMRs were shown to enrich disease-related single-nucleotide polymorphisms (SNPs) identified by genomewide association studies (GWAS), linking 5hmC to human phenotypes and pathologies. Collectively, our results provide a highresolution, high-quality atlas of DNA hydroxymethylation across diverse human tissues and provide a resource for exploring the role of this modification in gene expression networks.

5-Hydroxymethylcytosine landscape of diverse human tissues.
To systematically investigate the dynamics of DNA hydroxymethylation across different human tissues, we collected samples of 19 tissue types from 6 Chinese donors: 3 males and 3 females ( Fig. 1a and Supplementary Tables 1 and 2). We profiled the 5hmC signals of the various tissues via hmC-CATCH (60 hmC-CATCH samples; 83-183 million paired-end reads per sample; average, 128 million paired-end reads) (Supplementary  Table 3). We found that the 5hmC-containing spike-in sequence was specifically and efficiently enriched ( Supplementary Fig. 1a) and displayed a high detection rate for 5hmC (~91% and~98% before and after pulldown) ( Supplementary Fig. 1b, c and Supplementary Table 4), both of which showed the high quality of the hydroxymethylome generated by hmC-CATCH. We found that 80% reads in the pulldown samples have C-to-T conversion and 94% of peaks contain at least one confident 5hmC site; nevertheless, in order to ensure the confidence of detection, we only selected the reads that contain C-to-T conversion signals (Supplementary Fig. 1d, e). This made sure that all peaks are 5hmCenriched regions. It is also worth mentioning that after removing reads without C-to-T conversion,~97% peaks were still identified. In addition, tissues derived from the same organ system from different donors were clearly clustered together ( Fig. 1b and Supplementary Fig. 1f), further demonstrating the confidence of the 5hmC data.
We next analyzed the genomic features of 5hmC. We identified 721,404 reproducible 5hmC peaks, most of which were located in intron and intergenic regions (Fig. 1c). The peak number of each tissue type ranged from 272,391 to 525,080 ( Supplementary  Fig. 1g). We found that 5hmC was highly enriched in gene bodies, especially in exon regions, while it was depleted in intergenic regions ( Supplementary Fig. 1h). Taking the HOXD gene cluster as an example, we observed 5hmC signals in distal regulatory elements, promoters and gene bodies; such signals were also dynamic among tissues (Fig. 1d). Hence, via hmC-CATCH, we were able to generate the whole-genome 5hmC landscape of different human tissues.
Hydroxymethylome at single-base-resolution. To obtain a more detailed and clearer landscape of the hydroxymethylome, we analyzed 5hmC sites at single-base-resolution. We identified 9,416,937 reproducible 5hmC sites, with the site number of each tissue type ranging from 1,217,850 to 2,429,878 (~3-10% of CpG sites were hydroxymethylated) ( Supplementary Fig. 2a). While the number of 5hmC peaks in the brain is comparable to other tissues ( Supplementary Fig. 1g, presumably due to the insensitivity of enrichment-based methods to overall modification level change), the number of 5hmC sites in the brain is significantly higher than that in other tissues (P-value: 6.57 × 10 −7 ), consistent with the higher 5hmC content in the brain 20 . A representative locus is shown in Supplementary Fig. 2b, in which the single 5hmC sites identified by hmC-CATCH co-localize nicely with 5hmC sites by TAB-seq. More than half of the 5hmC sites were identified in at least two tissues, suggesting that a substantial number of 5hmC sites could be conserved ( Supplementary  Fig. 2c). Most of the 5hmC sites were located in intron and intergenic regions (Fig. 2a) and were highly enriched in gene bodies ( Supplementary Fig. 2d).
5mC in the CG context is almost symmetrical due to the maintenance of DNA methyltransferase 1. A total of 98.46% of the 5hmC sites identified were located in the CG context (Fig. 2b), but only 13% of the 5hmCG sites were symmetrical (Fig. 2c), which is consistent with previous data in mESCs 25 . Thus, this pattern of 5hmC differs from that of 5mC, which is symmetrical in human tissues 26 . In addition, the proportion of 5hmC in the non-CG context (5hmCH) was variable in different tissues, ranging from 0.96% to 2.79% ( Fig. 2d and Supplementary Fig. 2e). Previous research suggests that DNA methylation is rapidly accumulated in 5mCH sites during synaptogenesis 27 , and we also found a higher 5hmCH ratio in the brain than that in other tissues ( Fig. 2d and Supplementary Fig. 2e), indicating that 5hmCH may play a role during brain development.
We next analyzed the 5hmC context in tissues and found a "CA hm CGT" motif for 5hmC ( Fig. 2e and Supplementary Fig. 2f, g), which coincided with the binding site of ARNT, a housekeeping gene that participates in important metabolic processes. Within ARNT ChIP-seq signals, cytosines are hydroxymethylated in most tissues, with evident C-to-T conversion in the sequencing reads and hence clear single 5hmC sites ( Fig. 2f and Supplementary Fig. 2h). This observation hints that 5hmC may positively influence the genomic occupancy of ARNT. Notably, it has been reported that the ARNT binding motif loses 5hmC signals in esophageal cancer patients compared to healthy individuals 28 . With regard to the 5hmCH modification, we found that the most frequent base following 5hmCH was adenine ( Supplementary  Fig. 2i, j), consistent with the 5mCH sequence preference 26 .
Taken together, the results of the base-resolution hydroxymethylome analysis presented above revealed the varied 5hmCH ratio among tissues, the asymmetrical distribution of 5hmCG and the sequence preference of 5hmC, providing a potential mechanism for how 5hmC can be modified, recognized and dynamically regulated.
Gene body 5hmC excels 5mC in reflecting gene expression. As the 5hmC level in the gene body shows a positive correlation with gene expression in mESCs and the mouse brain 18 , we analyzed whether the gene body 5hmC level in human tissues also corresponds to gene expression. Indeed, we found that stronger gene body 5hmC signals were correlated with higher gene expression levels ( Fig. 3a and Supplementary Fig. 3a). For instance, the genes that escape X chromosome inactivation display higher gene body 5hmC levels than the inactive genes ( Supplementary Fig. 3b, c). As the gene body 5mC level is reported to be positively correlated with transcription 1 , we also compared the gene body 5mC and 5hmC levels (Fig. 3b, c). Although 5mC co-localized with 5hmC in gene bodies (Fig. 3d), we found that the gene body 5hmC level exhibited a stronger quantitative correlation with gene expression levels than the gene body 5mC level. More specifically, we found 5hmC is almost linearly positively correlated with gene expression, while 5mC shows a non-monotonic and bell-shaped relationship with gene expression ( Supplementary Fig. 4a-e) 29 .
To further explore the dynamics of the gene body 5hmC across human tissues, we defined 4,031 tissue-specifically expressed genes using RNA expression data generated by the Genotype-Tissue Expression project (GTEx) 30 (Supplementary Fig. 4f). The expression of tissue-specific genes and gene body 5hmC levels also demonstrated a positive correlation, which was absent for 5mC ( Supplementary Fig. 4g-i). Correlations of representative tissue-specific marker genes are shown in Fig. 3e-g. For instance, PGA4 is specifically expressed in the stomach and encodes the precursor of pepsin; we found the highest level of gene body 5hmC signals in the stomach among all tissues (Fig. 3e, f). In contrast, gene body 5mC failed to display a positive correlation with tissue-specific gene expression (Fig. 3g). Taking CYP4A11 as another example showed the following: this gene is specifically expressed in the liver and is involved in drug metabolism and the synthesis of cholesterol; consistent with this function, we also found higher gene body 5hmC levels in the liver than in other tissues (Fig. 3h). Again, this pattern was not found for 5mC ( Supplementary Fig. 4j). As 5hmC shows a strong co-localization with Poll 21 , it was proposed to play a role in the transcription process. Collectively, these data show that the gene body 5hmC level correlates well with gene expression and may be associated with the maintenance of tissue-specific functions.  Table 2. e The most significant 5hmC motif in different genomic elements demonstrates partial sequence overlap with the known transcription factor motif of ARNT. f IGV visualization of the 5hmC site signals on chromosome 3. The gray bar represents the Cto-T counts. 5hmC is asymmetrical at the 5hmCG site. ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-24425-w tsDhMRs reside in regions that may function as tissue-specific regulatory elements. 5hmC may participate in development and cell differentiation 11,31,32 ; nevertheless, its regulatory functions in different tissues have not been reported. We found 5hmC showed high enrichment in cis-regulatory elements in the corresponding tissues (Fig. 4a). In contrast, 5mC are depleted in cis-regulatory elements (Fig. 4b). We identified 33.3% (240,269 out of 721,404 peaks) of the 5hmC peaks as being differentially hyperhydroxymethylated in different tissues, which we term as tissue-specific differentially hydroxymethylated regions (tsDhMRs) (Supplementary Fig. 5a, See Methods). Majority of the tsDhMRs are located in intron and distal regions (56.0% and 31.5%, respectively, Supplementary Fig. 5b). We further analyzed tsDhMRs at distal regions and found that they are enriched for H3K4me1 and H3K27ac in corresponding tissues, which are putative enhancer marks (Fig. 4c, d and Supplementary Fig. 5c). Thus, tsDhMRs reside in regions with regulatory potential. In addition, tsDhMRs show higher evolutionary conservation than random regions ( Supplementary Fig. 5d), suggesting the importance of such functional elements.
We then investigated how these tsDhMRs may contribute to the gene expression programs of specific tissues. We adopted an existing method to identify linkages between tsDhMRs and putative target genes occurring within a 500-kb window 33 . In Fig. 3 Gene body 5hmC correlates well with gene expression in human tissues. a 5hmC profiles of genes expressed at high (yellow), low (cyan), and silenced (blue) levels in heart tissue. b Spearman's correlation of gene body 5hmC (5mC) signals and gene expression levels. Random regions were selected as controls (***represents P-value < 0.0001; two-side Mann-Whitney U-test; n 5hmC = 16 tissue types; n 5mC = 12 tissue types; 5hmC vs. random Pvalue: 3.32 × 10 −9 ;5mC vs. random P-value: 7.396 × 10 −7 5hmC vs. 5mC P-value :6.57 × 10 −8 ; red lines represent the first and third quantiles). c 5mC profiles of genes expressed at high (yellow), low (cyan) and silenced (blue) levels in heart tissue. d Scatter plot showing the correlation of 5mC and 5hmC signals at gene bodies. e Heatmap displaying the expression levels of tissue-specific marker genes. Gene expression of matched tissue samples downloaded from the GTEx project. f, g Heatmap displaying normalized gene body 5hmC (f) and 5mC (g) signals of tissue-specific marker genes (order of rows as in e). 5mC data downloaded from ENCODE. h IGV visualization of the 5hmC signals at the gene body of CYP4A11 and surrounding regions. Gene expression levels of the gene in different tissues are shown on the right. NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-021-24425-w ARTICLE NATURE COMMUNICATIONS | (2021) 12:4249 | https://doi.org/10.1038/s41467-021-24425-w | www.nature.com/naturecommunications total, we identified 2,279 genes whose expression showed significant correlations with 5hmC signals in tsDhMRs ( Fig. 4e and Supplementary Fig. 5e). These were genes with tissue-specific expression, such as CYP11B2 in adrenal glands and NPPB in heart. Gene ontology (GO) analysis demonstrated that these genes are enriched in tissue-specific functions, such as learning or memory (brain), kidney epithelium development (kidney), female gonad development (ovary) and sex differentiation (uterus) ( Fig. 4f and Supplementary Fig. 5f). Similarly, KEGG pathway analysis revealed that tsDhMR-associated genes participate in tissue-specific functional pathways ( Supplementary Fig. 5g). CYP2C8, which is involved in the metabolism of xenobiotics, is highly expressed in liver, with liver-specific DhMRs upstream of CYP2C8 (Fig. 4g). Meanwhile, PTF1A plays a vital role in mammalian pancreatic development; and there are pancreasspecific DhMRs around the gene ( Supplementary Fig. 5h). In summary, tsDhMRs may contain putative regulatory elements, which specifically affect the expression of nearby functional genes.
Tissue-specific transcription factors are enriched in tsDhMRs. We next analyzed potential transcription factor-binding site (TFBS) within tsDhMRs. We found that more than half of the tsDhMRs overlapped with at least one TFBS ( Supplementary  Fig. 6a), and 20.1% of the distal tsDhMRs even overlapped with   ST  SX  TC  TR  UT   AG_1  AO_1  AO_2  BD_2  BD_3  BR_3  BR_3_BS  BR_3_CE  BR_3_HT  BR_3_SC  BR_4_BF  BR_5_BF  BR_6_BF  ES_3  HR_1  HR_4_LV  HR_6_LV  KD_4  KD_5  KD_6  LE_2_AP  LE_2_RE  LE_3_AP  LE_3_CL  LE_3_RE  LE_4  LE_6  LI_1  LI_6  LN_1  LN_4  LN_5  OV_5  OV_6  PA_3  PA_4  SE_2_IE  SE_2_JE  SE_3_IE  SE_3_JE  SE_4  SE_5  SK_1  SK_3  SK_6  ST_3  ST_4  ST_5  SX_1  SX_4  SX_5  TC_2  TC_3  TR_2  TR_3  SE_6  UT_4  UT_5  UT_6 steroid biosynthetic process cellular hormone metabolic process ketone biosynthetic process outflow tract morphogenesis outflow tract septum morphogenesis epithelial tube morphogenesis positive regulation of synaptic transmission learning or memory long-term synaptic potentiation regulation of vascular permeability vascular process in circulatory system positive regulation of myoblast differentiation metanephros development nephron tubule development kidney epithelium development steroid metabolic process fatty acid metabolic process lipid localization reproductive system development reproductive structure development male sex differentiation digestion cobalamin metabolic process extracellular matrix disassembly skin development epidermis development peptide cross-linking regulation of leukocyte cell-cell adhesion lymphocyte proliferation mononuclear cell proliferation branching morphogenesis of an epithelial tube epithelial tube morphogenesis smooth muscle contraction 5   no less than four TFBSs (Fig. 5a). We further investigated different transcription factors (TFs) enriched in tsDhMRs derived from each tissue type and found that tissue-specific TFs were significantly enriched in the corresponding tsDhMRs, regardless of H3K4me1/H3K27ac modification ( Fig. 5b and Supplementary  Fig. 6b, c)  The TF motifs enriched in distal tsDhMRs of each tissue. The color scale represents the -log 10 (P-value). One-side binomial test (default by homer2). c ChIP-seq signals of NEUROD1 and HNF4A around BR-specific DhMRs and LI-specific DhMRs. d IGV visualization of 5hmC signals and ChIP-seq signals of NEUROD1 and HNF4A in the brain and liver. e, f Key TF regulatory networks in the liver (e) and brain (f). Hexagon represents key TF; rhombi represents tsDhMRs; circles represents regulated genes. Lines linking the hexagon and rhombi indicate that the TF binds to the tsDhMRs, which is supported by TF ChIP-seq data. development, while HNF4A regulates the expression of several hepatic genes. By comparing to ENCODE ChIP-seq data, we found that NEUROD1 specifically binds to brain-specific DhMRs, while HNF4A specifically binds to liver-specific DhMRs (Fig. 5c and Supplementary Fig. 6d). We further used LINGO1 and CYP2C18, which are downstream targets of NEUROD1 and HNF4A in the brain and liver, respectively, to demonstrate the tissue-specific coexistence of 5hmC signals (Fig. 5d). Thus, key tissue-specific TFs may rewire their tissue-specific network through binding to tsDhMRs. To further understand the regulatory functions, we constructed regulatory networks mediated by HNF4A and NEUROD1 with their corresponding tsDhMRs. We found that HNF4A can regulate genes with tissue-specific expression, such as CYP4F2, CYP8B1, and UGT1A9, via a process potentially mediated by liver-specific DhMRs (Fig. 5e). Within the NEUROD1 network, brain-specific DhMRs regulate GRIK3, CSMD2 and other genes to perform their brain-specific functions (Fig. 5f). Through network analysis, we illustrated that the key TFs interact with tsDhMRs, which may further affect the expression of tissuespecific functional genes.
GWAS SNPs are preferentially located within tsDhMRs. We next analyzed the potential relationship of tsDhMRs with functional single-nucleotide polymorphisms (SNPs). We found that the tsDhMRs derived from each tissue highly overlapped with the GTEx single-tissue eQTL SNPs (Fig. 6a). In fact, tsDhMRs contain SNPs that are functional in the corresponding tissues. Moreover, tsDhMRs are also enriched for GWAS SNPs 34 with phenotypes related to the corresponding tissue functions (Fig. 6b and Supplementary Fig. 6e, f), indicating that tsDhMRs reside in regulatory elements that associate with tissue-related diseases. We used several examples to elaborate the findings: (1) SNPs related to electrocardiographic traits and QRS duration were highly enriched in heart-specific DhMRs; (2) SNPs related to metabolite levels and LDL cholesterol were enriched in liver-specific DhMRs; and (3) SNPs related to type 2 diabetes were enriched in pancreas-and adrenal gland-specific DhMRs.
More specifically, we used an example to illustrate the potential mechanism by which distal GWAS SNPs may impact diseases. HCN4, which is necessary for the cardiac pacemaking process, is specifically expressed in the heart (Fig. 6c). We found that several heart-related GWAS SNPs were localized in the heart-specific DhMR (Fig. 6c), indicating that the DhMR is associated with heart diseases. Moreover, several enhancers of the heart identified by ENCODE candidate cis-regulatory elements (cCREs) were located in this DhMR (Fig. 6c). We further integrated the VISTA enhancer data 35 , which were validated by transgenic mouse assays, and found an enhancer (hs2161 from VISTA) within the DhMR. This enhancer is specifically expressed in the mouse heart (images from VISTA database) (Fig. 6d). These data confirm that DhMR, located in enhancer, is functional in the heart and is related to heart diseases. These results indicate that dysregulation of tsDhMRs may be involved in human disease pathologies. Collectively, our data show that tsDhMRs could help us understand the function of distal GWAS SNPs in the corresponding tissues.

Discussion
In this study, we present a base-resolution atlas of 5hmC in human tissues. Hundreds of thousands of 5hmC peaks and millions of 5hmC sites were identified in this dataset, expanding the epigenomic landscape determined by previous large-scale efforts, for example, the ENCODE project. While our work was under review, a 5hmC tissue map of adults of European descent was published, which found that 5hmC is preferentially enriched on tissue-specific gene bodies and enhancers 36 . Thus, tissuespecificity of 5hmC is validated in different human ancestry. Moreover, this study has enabled additional findings: (1) The correlation between 5hmC/5mC levels and gene expression in multiple human tissues has not been directly compared previously. We found that tissue-specifically expressed genes show tissue-specific 5hmC patterns, which is not observed for 5mC; (2) tsDhMRs reside in regions with regulatory potential to control the expression of nearby tissue-specific functional genes; (3) tsDhMRs are enriched with tissue-specific transcription factorbinding sites and rewire the regulatory network of tissue-specific transcription factors; (4) tsDhMRs reside in regulatory elements that associate with tissue-related functional GWAS SNPs.
Our 5hmC tissue map is base-resolution, owning to the use of hmC-CATCH 21 . This feature not only improves the confidence of 5hmC detection ( Supplementary Fig. 1e), but also reveals information with regard to the detailed distribution pattern of 5hmC (Fig. 2). For instance, we found a "CAhmCGT" motif that resembles the binding motif of ARNT While approximately half of ARNT binding sites contain 5hmC, only a small part of 5hmC sites contain ARNT binding sites. Thus, ARNT binding only partially explains the 5hmC motif. Among the many possibilities for this observation, other transcription factors could be invovled.
Gene body 5hmC levels are positively correlated with gene expression, especially of genes with tissue-specific expression. Thus, the gene body 5hmC levels may be used to infer gene expression in tissues. This can be particularly useful in some precious clinical samples, where RNA degradation is severe (frozen samples, formalin-fixed paraffin-embedded samples, body fluids and so on). For instance, 5hmC levels in cell-free DNA (cfDNA) have been utilized as biomarkers for cancer diagnosis 21,[37][38][39] . While it is anticipated that healthy and cancerous tissues may have different hydroxymethylomes, cancerspecific 5hmC signatures, which may reflect the gene expression program in different cancers, could be identified for noninvasive cancer diagnosis. Thus, the relative stability of epigenetic modifications makes 5hmC a promising candidate for prediction of gene expression in clinical samples.
Using the hydroxymethylome of multiple tissues, we discovered that approximately one-third of all 5hmC peaks are tissue-specifically hyperhydroxymethylated. We found that tsDhMRs, residing in potential regulatory regions, are positively correlated with gene expression, which is in contrast to the fact that differentially methylated regions (DMRs) show a negative correlation 40 . Moreover, tissue-specific TFs are enriched in tsDhMRs, providing a mechanism by which key TFs may regulate tissue-specific gene expression via tsDhMRs. Based on our identified tsDhMRs, future studies could be designed to illustrate the specific mechanisms by which 5hmC can regulate tissue development and differentiation.
Through integration of GWAS SNP and 5hmC data, we discovered that tsDhMRs were significantly enriched with GWAS SNPs. Our data indicate that tsDhMRs contribute to tissuerelated diseases. Although GWAS SNPs are consistent among somatic cells, they may affect tsDhMR 5hmC levels and ultimately lead to dysfunctions of the corresponding tissues. Our analysis provides new insights into the understanding of GWAS data, where distal GWAS SNPs interact with tsDhMRs to regulate target genes. Disruption of tsDhMR 5hmC levels may result in tissue-related disease phenotypes.
Collectively, our data provide a rich resource for understanding the 5hmC landscape in human tissues. The reported human tissue hydroxymethylome adds to the knowledge of how this epigenetic mark may affect tissue-specific differentiation and diseases.

Methods
Biospecimen collection. We collected a total of 60 tissue samples from deceased donors who had died of natural and accidental deaths at Zhongshan Hospital (Shanghai, China); we targeted 19 distinct tissues from 6 postmortem Chinese individuals, including 3 males and 3 females. All samples were obtained research consent from the families and informed consent under a protocol approved by the ethics committee of School of Basic Medical Sciences of Fudan University, China. The collected samples were transferred to cryovials for long-term storage at −80°C.
Preparation of genomic DNA. Genomic DNA was isolated from tissues using the Blood/Cell/Tissue Genomic DNA Extraction Kit (TIANGEN, DP304) following the manufacturer's specifications. The 5hmC sequencing libraries were constructed with 200 ng of genomic DNA. Briefly, the DNA was fragmented with a ME220 Focused-ultrasonicator (Covaris, ME220) to 300-500 bp. We added a model sequence as a spike-in to the fragmented DNA (Ref, 5hmC and 5fC spike-in, Supplementary Data processing. Illumina sequencing adapters and low-quality reads were removed from raw sequencing data to obtain clean data. We added a sequencing barcode through the hmC-CATCH protocol to mark PCR-duplicated reads. Then, the PCR-duplicated reads were filtered out, and only one read was retained using an in-house script. The final cleaned reads were mapped to hg38 by Bismark (Version: v0.15.0). To enhance the signal-to-noise ratio, we used only read pairs with more than one C-to-T conversion for further analysis.
Assessing the C-to-T conversion rate of 5hmC sites. We added a model sequence as a spike-in before constructing the hmC-CATCH library, of which one C site was 100% hydroxymethylated. After treatment through the hmC-CATCH protocol, we observed a C-to-T conversion signal in this 5hmC site. We used the T/(C + T) of the sequencing data at this site to estimate the C-to-T conversion rate.
Identification of 5hmC sites. We used Bismark (Version: v0.15.0) to extract single-base-resolution information. Sites with less than five total bases (NT + NC) or three NTs were discarded for 5hmC calling. Then, we used the binomial distribution with N as the sequencing depth (NC + NT) and p as the normal cytosine conversion rate to assess the probability of observing NT by chance. We considered 5hmC sites with the Holm-Bonferroni method-adjusted P < 0.001 and located within a type of tissue 5hmC-enriched region as high-confidence 5hmC sites.
Identification of 5hmC peaks. Our hmC-CATCH approach could enrich DNA fragments with 5hmC. Here, we used a peak calling method to identify these regions with 5hmC. MACS2 (Version: v2.1.1) was applied to call peaks in each sample with the following command: "macs2 callpeak -t <5hmC bam> -c <input bam> -g hs -f BAMPE-keep-dup all-outdir <outdir> -n <sample name> " Annotation of 5hmC sites and peaks. The 5hmC sites or peaks were annotated by annotatePeaks.pl (Homer, Version: v4.5), and the "-annStats" parameter was added to quantify the enrichment of genomic elements compared to the background. Then, the log2 ratios of observation to expectation in all tissues were plotted as a bar chart by ggplot2 (R package, Version v3.5.1).
Analysis of the correlation of gene body 5hmC and 5mC. The RPKM values of all protein-coding genes were calculated and normalized as mentioned above. Spearman's correlation coefficients between the RPKM values and 5hmC/5mC signals in matched tissues were calculated by R (cor, method = "spearman", Version v3.5.1).
Identification of genes with tissue-specific expression. We used GTEx gene expression data (RNASeQCv1.1.9_gene_median_tpm) to identify the genes with tissue-specific expression that were defined as being highly expressed in one tissue. We first filtered out the genes with a mean TPM <1, which was regarded as low expression in all tissues. Then, we calculated the fold changes in the expression of protein-coding genes in one tissue over the mean values for other tissues. We ordered the genes according to fold change levels, and the top 300 in each tissue were regarded as genes with tissue-specific expression. To ensure that the top 300 genes with tissue-specific expression in each tissue were significantly more highly expressed than others, we filtered out the genes with fold changes lower than 2.
Identification of tissue-specific differential 5hmC regions. We first merged all 5hmC peaks from the 60 samples to obtain the total peaks using "Bedtools merge" (bedtools, Version: v2.27.1). Then, the read counts in each merged peak of all samples were calculated by "Bedtools multicov" (bedtools, Version: v2.27.1) and normalized as mentioned above. We merged all biological replicates from the same tissue to enhance the signals. We used the Poisson distribution to estimate the Pvalue of each peak in a tissue. The probability of read counts in each peak of one tissue was estimated by the one-tail Poisson distribution with the parameter λ as the mean for other tissues. The P-values were further adjusted by the Bonferroni method. The peaks with adjusted P-values < 0.05 and fold changes >2 were regarded as significant tissue-specific differential 5hmC regions.
Identification of tsDhMR-associated genes. We adapted a method to link tsDhMRs to putative genes 33 . We first identified all possible genes linked to tsDhMRs by searching any TSS of GENCODE (Version: v28) protein-coding genes within 500 kb of the tsDhMRs. Then, we calculated the Pearson correlation coefficients between normalized 5hmC signals and gene expression (TPM). To avoid spurious associations, we used cor.test (R, Version v3.5.1) to assess the stochastic links. Finally, we required that the confident links had a Pearson correlation over 0.8 and P-value lower than 0.05.
Motif-enrichment analysis. For each tissue, we used findMotifsGenome.pl (Homer, Version: v4.5) to find the motifs enriched in tsDhMRs with the following command: "findMotifsGenome.pl <tsDhMRs bed> hg38 <output> -p 5" Analysis of tsDhMRs associated with GWAS SNPs. We downloaded previously published GWAS datasets from the GWAS catalog 34 . Then, we calculated the associations of GWAS phenotypes and tsDhMRs. A GWAS phenotype always be associated with multiple SNPs. For each phenotype, we used Fisher's test to compute the significance and odds ratios between the phenotype-associated SNPs and tsDhMRs.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/ licenses/by/4.0/.