Integrated genomic analysis reveals regulatory pathways and dynamic landscapes of the tRNA transcriptome

tRNAs and tRNA-derived RNA fragments (tRFs) play various roles in many cellular processes outside of protein synthesis. However, comprehensive investigations of tRNA/tRF regulation are rare. In this study, we used new algorithms to extensively analyze the publicly available data from 1332 ChIP-Seq and 42 small-RNA-Seq experiments in human cell lines and tissues to investigate the transcriptional and posttranscriptional regulatory mechanisms of tRNAs. We found that histone acetylation, cAMP, and pluripotency pathways play important roles in the regulation of the tRNA gene transcription in a cell-specific manner. Analysis of RNA-Seq data identified 950 high-confidence tRFs, and the results suggested that tRNA pools are dramatically distinct across the samples in terms of expression profiles and tRF composition. The mismatch analysis identified new potential modification sites and specific modification patterns in tRNA families. The results also show that RNA library preparation technologies have a considerable impact on tRNA profiling and need to be optimized in the future.

www.nature.com/scientificreports/ (tRNA-Seq) have been introduced to overcome certain technical limitations [12][13][14][15] . For example, demethylation of the bases of tRNAs (DM-tRNA-Seq 14 , ARM-Seq 15 ) and tRNA fragments (Hydro-tRNAseq 16 ) were used to reduce sequencing bias resulting from posttranscriptional modifications or secondary structure. The Iso-tRNA-CP algorithm was designed to evaluate the relative expression levels of the tRNA genes based on their proportional transcript contribution to a corresponding isodecoder set to reduce the effects of sequencing bias 17 .
In this study, we analyzed publicly available large-scale NGS data from 1332 ChIP-Seq, 8 DM-tRNA-Seq, and 34 small-RNA Seq experiments to explore the mechanisms of transcriptional regulation of the tRNA genes and tRNA/tRF profiles in human cell lines and tissues (Fig. S1). More than 68 transcription factors and chromatin remodelers enriched in tRNA genes were identified in three human cell lines. The data show that tRNA transcription is tightly regulated by several disease-related pathways. Histone modifications, especially histone acetylation, may play important roles in the regulation of tRNA transcription. We used a new tRNAExplorer algorithm to identify 950 high-confidence tRFs in human cell lines and tissues. Comparison of tRNA profiles across the samples suggests that the profiles are dramatically distinct from each other in terms of expression and tRF composition. Certain new 5′-additions of tRFs, such as T −1 -addition, and 8 new potential modification sites were identified. Additionally, we found that tRNA cleavage sites are very conserved across the samples and clustered on exposed surfaces of tRNAs.

Materials and methods
Data collection and preprocessing. The human hg38 genome sequence (FASTA) and annotation (GTF) files were downloaded from the UCSC ftp server (https ://genom e.ucsc.edu/golde nPath ). The definitions of the tRNA genes based on the GtRNAdb database 13 were downloaded from the UCSC table browser (https ://genom e.ucsc.edu/cgi-bin/hgTab les). Structural information on the tRNA genes was predicted by tRNAScan-SE (http:// lowel ab.ucsc.edu/tRNAs can-SE/) 13 . Mitochondrial tRNA genes, pseudo tRNA genes, and genes with tRNAScan scores less than 30 were removed. The tRNA genes/transcripts were grouped into six levels based on the amino acids they carry, anticodons and sequence (Table S1).
ChIP-Seq data (bigWig files) were downloaded from the GEO database (https ://www.ncbi.nlm.nih.gov/geo/) (93 for h1-ESCs, 527 for HepG2 cells, and 712 for K562 cells; the details can be found in Supplemental Data Files S10-S12) 18 . All Chip-Seq datasets had at least 20 M reads with read lengths over 37 bp.
In this study, we analyzed two RNA-Seq datasets: DM-tRNA-Seq and Cell Lines-Tissues-Seq datasets. The DM-tRNA-Seq dataset was downloaded from the ENA database (https ://www.ebi.ac.uk/ena/brows er/view/ PRJNA 27730 9) and contained four samples with two technical repeats of purified tRNA and total RNA as templates with (+) or without (−) demethylase treatment (total RNA control, total RNA treatment, tRNA control, and tRNA treatment) 14 . The Cell Lines-Tissues dataset was downloaded from the ENCODE database (https :// www.encod eproj ect.org) and included 48 small-RNA-Seq datasets from 4 human cell lines and 8 tissues (Supplemental Data File S9). To minimize the batch effect, all small-RNA sequencing experiments selected were performed with the same cDNA library construction methods (https ://www.encod eproj ect.org/exper iment s/ ENCSR 000CR F/) and met three criteria: (1) the purified RNAs were size-selected to be shorter than 200 bp; (2) the read length was at least 100 base pairs; and (3) each FASTA file had 30 million aligned reads. Notably, the cDNA libraries of the DM-tRNA-Seq and Cell Lines-Tissues datasets were generated by template switching reaction and regular A-tailing cDNA synthesis method, respectively, which could have significantly influenced the results of tRNA profiling (see below).
ChIP-Seq data analysis. All ChIP-Seq data (bigWig files) available for each gene were combined for each cell line to achieve higher signal/noise ratios. Then, the combined bigWig files were normalized to the total number of mapped reads in the FASTQ file. To calculate the binding intensities for the matrix, we summed the pileup area around the transcriptional start site (TSS) of the tRNA genes within 100 bp up/downstream using the computeMatrix function in deepTools2 19 under a reference-point model. The matrix and binding profiles were generated by R and deepTools2 19 (Supplemental Data Files S14 and S15).
Definitions and classification of tRNA and tRFs. GtRNAdb gene symbols were used to identify the tRNA genes (http://gtrna db.ucsc.edu/docs/namin g/). The definitions of the tRNA hierarchy (gene-> isodecoders-> isoacceptors) were adapted from previous studies (Table S1) 20,21 . All tRNA genes sharing the same mature sequence were grouped into a tRNA family. The IDs of the tRNA families followed the pattern "tRFM#tRNA_ ID" (e.g., tRFM#tRNA-Glu-CTC-1-1), in which "tRFM" is the ID prefix of the tRNA families and tRNA_ID is the GtRNAdb gene symbol of a member that has the smallest transcript ID and gene locus ID. If a tRNA does not share its mature sequence with any other tRNA, it was assigned to a family containing only itself.
In this study, we defined tRFs as any RNA fragments that are cleavage products of the transcripts of the mitochondrial and nuclear tRNA genes. Therefore, full-length mature tRNAs were also considered a type of tRF. Structurally, tRFs fall into 12 distinct classes based on their overlapping range in the tRNA genes within up-and downstream sequences (60 bp UTRs) (Fig. 3A). Specifically, Full_tRNA and Full_U_tRNA represent full-length tRNAs and their precursors. The character "U" indicates that these tRFs contain the UTR sequences. The 5_tRNA_halves, 5_U_tRNA_halves, 3_tRNA_halves, and 3_U_tRNA_halves species correspond to four classes of the products of cleavage within the anticodon of mature tRNAs or their precursors. The 5_U_tRF and 3_U_tRF species correspond to the fragments that overlap with the upstream or downstream regions (UTRs) of mature tRNA but are shorter than 5_U_tRNA_halves and 3_U_tRNA_halves. The 5_tRFs and 3_tRFs species are shorter than 5_tRNA_halves and 3_tRNA_halves, respectively. Internal tRFs (i_tRFs) are remaining fragments derived from mature tRNAs. Finally, the "other" class represents all remaining tRFs, for example, tRFs that are only mapped upstream or downstream of the tRNA genes. The default minimum length of tRFs was set as 18 nts. www.nature.com/scientificreports/ The IDs of tRFs are composed of the pattern "tRF#Len-SeqCODE" (e.g., tRF#23-ZKXU53K80E), where 'tRF' is a prefix for the ID type, 'Len' (e.g., 23) indicates the length of tRFs, and 'SeqCODE' (e.g., ZKXU53K80E) is the sequence zip code generated by the Python code from MINTbase 22 . tRF sequencing analysis. Current tRNA-Seq bioinformatics tools have many limitations 12,23 ; thus, we developed a new program, tRNAExplorer, to process the tRNA-Seq data. Initially, low-quality reads were filtered out, and adapters were trimmed by Trimmomatic 24 . Then, BLASTN was used to search the reads against a customized tRNA transcript database that contained four major types of tRNAs to maximize the mapping possibility: (1) tRNA precursors with intron(s), (2) tRNA precursors without intron(s), (3) mature tRNAs, and (4) mature tRNAs with CCA. Only the best hits with identity over 96% were kept for subsequent analyses. The multiple aligned reads were equally assigned to related tRNAs. All tRFs were categorized into 12 classes based on the rules mentioned in the previous paragraph. The minimum requirements for tRF identification included the following criteria: (1) minimum identity of 96% between a tRF and its database sequence match, (2) longer than 18 nt, and (3) supported by more than 500 reads in at least one sample. High-confidence tRFs should be supported by more than 1000 reads in at least two samples. The relative abundance of a tRNA is represented by the normalized read number (NR), which is calculated as follows: NR = M × 10 9 /N, where M indicates the number of tRNA mapping reads and N indicates total reads in the FASTQ file. The program can generate analysis reports (several tsv files) for the expression matrix mismatch and cleavage site statistics of tRNA/tRFs across the samples. This information can be used to analyze the modifications and terminal addition patterns of tRNAs. Finally, tRNAExplorer also presents a Python kernel to implement more than 11 regular analyses. Most of plots in the manuscript including sample correlation, clustered matrix of tRNA/tRFs, tRF classification, pileup profiles of tRNAs, and others were prepared by the kernel. (https ://githu b.com/hqyon e/tRNAE xplor er/blob/maste r/help/ tRNAE xplor er_manua l.md). The results of the comparison between tRNAExplorer and current tools (such as MINTmap and tDRMapper) suggest its superiority in terms of the sensitivity of tRF identification, analysis, and running speed (data not shown). The source code and documentation for tRNAExplorer can be downloaded at the GitHub site (https ://githu b.com/hqyon e/tRNAE xplor er).
Protein interaction network analysis. Protein-protein interactions were analyzed using the STRING server (https ://strin g-db.org/) 25 . Functional annotations of the genes were downloaded from the UniProt database (https ://www.unipr ot.org) 26 .

Results
Transcriptional regulatory matrixes of tRNA genes. tRNA transcription is known to be driven by the Pol-III system; however, little is known about the control mechanism of tRNA transcription. To partially answer this question, we analyzed more than 1332 ChIP-seq data from three cell lines (K562, HepG2, and H1-ESC) to identify the tRNA gene-binding factors and related epigenetic markers. We calculated the binding/enrichment intensities of these factors and markers on the tRNA genes by directly normalizing the number of reads aligned to the corresponding promoters to the total aligned read count in the samples. The normalized read counts were used to construct the hierarchical clustering matrixes of 625 tRNAs across 253 (K562), 264 (HepG2), and 89 (H1-ESC) DBPs or histone markers (Figs. S2-S4 and "Supplemental Data"). Based on these results, we identified 85, 76 and 25 proteins bound to the tRNA genes in HepG2, K562, and H1-ES cells, respectively (see Tables S2-Table S4). We also constructed and compared two matrices, including 625 tRNAs across 89 factors shared by K562 and HepG2 cells (Fig. 1A,B, Tables S2 , Table S3). Analysis of the matrices indicated that tRNA genes can be clearly divided into two categories based on their binding/enrichment profiles. In category I, the tRNA genes (294 in HepG2 and 299 in K562 cell lines) bind more than 20 DNA-binding proteins (DBPs). Another category includes tRNA genes with low or even absent binding signals for all target proteins. Combination with RNA-Seq data clearly indicated that the tRNA genes of the first category have significantly higher expression levels on average than that of the tRNA genes of the second category. Although most category 1 tRNA genes (275) were shared by the two cell lines ( Fig. 2A), there was no correlation between their expression levels in the two cell lines (Fig. 2B). The top tRNA gene-binding factors (sorted by relative binding intensity) were also dramatically different between the categories (Fig. 2C,D). These results suggest that distinct DBP binding profiles are major contributors to the cellular specificity of tRNA profiles. The heatmaps and snapshots of aligned ChIP-Seq profiles of differential histone epigenetic markers and DBPs clearly show the details of distinct DBP binding profiles in the K562 and HepG2 cell lines ( Fig. 2E and Fig. S5).
Subsequent investigation of the cellular functions of the top tRNA gene-binding factors surprisingly indicated that both transcriptional activators and repressors cobind to the active tRNA genes (category I tRNA genes) in the two cell lines (Fig. 2D). For example, two key components of the SIN3A repression complex (SIN3A and RCOR1) and two transcriptional activators (ATF7 and ATF1) cobind to the tRNA genes in K562 cells. These phenomena were also observed in the case of histone acetylation enzymes (see below).
Histone modifications regulate tRNA transcription. Investigation of the ChIP-Seq profiles of histone modifications indicated that active markers (such as H3K4m3 and H3K9ac) were highly enriched and transcriptional repression markers (such as H3K9me3, H3K27me3, and H3K30me3) were absent at the promoters of category I tRNA genes (Fig. S5). Additionally, there were no significant differences between the two cell lines in terms of H3K4me3 and H3K9me3 profiles (Fig. S5A), suggesting that the modifications may function as basic mechanisms to enable the transcriptional potential of the tRNA genes as they did in regular genes.
Furthermore, the data showed that both histone acetyltransferase (EP300, KAT8) and histone deacetylases (HDAC1, HDAC2, and HDAC6) bind to the tRNA genes in combination with associated proteins (BRD4 for www.nature.com/scientificreports/  www.nature.com/scientificreports/ EP300 and REST, SIN3A, SIN3B, and RCOR1 for HDACs). For example, EP300 binds to the tRNA genes with HDAC1 and HDAC6 in K562 cells (Fig. 2D). KAT8 and HDAC2 cobind to the same tRNA genes in HepG2 cells. Surprisingly, the binding intensity of some subunits of HDAC complexes (such as HDAC6 and SIN3B) showed a relatively high positive correlation with tRNA gene expression in K562 and HepG2 cells (Fig. S5E,F). In H1-ES cells, ATF2, a histone acetyltransferase, binds to the tRNA genes in combination with HDAC6 (Fig. S4). Furthermore, the interaction network analyses clearly indicated that the EP300-SIN3A-centered and HDAC1/2-KAT6-centered networks are the major tRNA regulatory components in K562 and HepG2 cells, respectively (Fig. 2F). This finding supports a competition model for both sites and suggests that histone acetylation may play an important role in the regulation of tRNA transcription.
Other signaling pathways regulating tRNA transcription. In addition to histone acetylation, our data highlighted other signaling pathways that may be involved in tRNA regulation. For example, ATF1 and ATF7, two key factors of the cAMP signaling pathway 27,28 , were found associating with the tRNA genes in K562 cells. This observation raise the possibility that the tRNA pathway can be controlled by the cAMP pathway. Additionally, both MAX and MAZ can bind to the tRNA genes in K562 cells. These factors are the key components of the MLL complex and form transcriptionally activated complexes with the proto-oncogene Myc to contribute to autonomous proliferation and growth. High expression of tRNAs has been linked to cell transformation and proliferation 10,29,30 ; thus, it is of interest to test whether the cMyc or MLL pathways can promote tRNA transcription in cancer cells. Additionally, in H1 embryonic stem cells, POU5F1 (OCT4) and Nanog, two key pluripotency transcription factors, were bound to the tRNA genes, suggesting that the tRNA pathway may be a key component of the pluripotency network 31, 32 (Fig. S4). Overall, our data suggest that the tRNA pathway is regulated by a substantially higher number of regulatory mechanisms than assumed previously (Fig. S5, Table S4) in a cell-specific manner.
tRNA profiles revealed by DM-tRNA-Seq. Recent studies reported an optimized small tRNA-Seq technology known as DM-tRNA-seq that combines demethylation treatment and template shift strategy to achieve better performance in the detection of full-length tRNAs 14 . We categorized tRFs into 12 types based on their mapping range on the tRNA genes (Fig. 3A). Full-length tRNAs account only for a small portion (from 14 to 23%) of tRNA pools in all samples, suggesting that tRFs, the cleavage products of mature tRNAs, are stable and may have biological significance. Consistent with the original report 14 , the proportion of tRNA-mapped reads in the cDNA libraries was significantly increased (from 20 to 50%) in the purified tRNA template compared with that in the total RNA template (Fig. 3B). The correlation and expression analyses showed that the tRNA profiles of the purified tRNA and total RNA data were dramatically different (Fig. 3C,D). For example, the correlations between the total RNA and purified tRNA data were approximately 0.42-0.76, whereas the internal correlation between these two groups was usually higher than 0.86. tRNA-Glu was the most abundant isoacceptor in the total RNA profile, which was ranked 11th in the purified tRNA profiles (Fig. 3D). These results emphasize the importance of template selection for tRNA profiling. The length distribution of tRFs in the DM-tRNA-seq dataset shows three peaks at 75 nt, 38 nt, and 57 nt, which represent full-length tRNAs, 3′-tRNA_halves, and long 3′-tRFs, respectively (Fig. S6). As expected, a greater number of long tRFs (> 50 nt) were identified in the demethylated samples, especially in the tRNA-treatment samples (Fig. S6). The relative abundance of full-length tRNAs in the tRNA-demethylated samples was higher than that in the untreated samples (Fig. 3E). A total of 16 full-length tRNA isoacceptors were identified in tRNA-demethylated samples compared with 8 identified in tRNA-untreated samples (Fig. S7). Interestingly, very few 5_tRFs, including 5_tRNA_halves, were identified in the DM-tRNA-seq dataset, which accounted for a significant proportion in the Cell Lines-Tissues samples (see below).
Diversity of the tRF pools in the cell lines and tissues. tRF profiles across different samples were compared by analysis of 34 small-RNA sequencing datasets in the Cell Lines-Tissues dataset. We identified a total of 2103 tRFs, including 950 high-confidence tRFs, which were supported by at least 1000 reads in two samples (Supplemental Data Files 1 and 2). High-confidence tRFs included 3_tRFs (314), 3_tRNA_halves (199), 5_tRFs (162), and 3_tRNA_halves (199), suggesting that most tRFs are direct products of mature tRNA cleavage (Fig. S8A). The correlation analysis suggested that all samples can be clustered into two groups (blue and green rectangles in Fig. 4A). Group one included four types of tissues (ovary, esophageal squamous epithelium, transverse colon, and adrenal gland) and 2 cell lines (K562 and GM23338). The second group contained four types of tissues (parietal lobe, frontal cortex, diencephalon, metanephros, heart, liver, lung, and cerebellum) and 2 cell lines (OCI-LY7 and Karpas-422). Within each group, cell lines and tissues were separated from each other. Construction of an expression matrix of 50 tRNA isodecoders across all samples indicated that isodecoders were clustered into three groups, including 21 universally expressed tRNA isodecoders, 24 issue-specific tRNA isodecoders, and five nonexpressed tRNA isodecoders (Fig. 4B). The sample classification in the matrix was consistent with the correlation results indicating that the samples in group 2, but not group 1, express tissue-specific isodecoders. Analysis of the composition of the tRF pools in all samples indicated that 5′-tRFs and 5′-tRF-halves accounted for a significant proportion in the tissue samples but not in cell lines (Fig. 4C). In most cases, a single tRNA family had several expression patterns across the samples. For example, three tRFs were derived from the tRNA_Lys-CTT-4-1 family, including 5′-tRNA-halves, 3′-tRFs, and full-length tRNA, which were dominant in 20, 5, and 2 samples, respectively. Other 11 samples had both 5′-tRNA-halves and 3′-tRFs (Fig. S8B). Only 5′-tRNA-halves were observed in all 26 samples in the case of the tRNA-Val-TAC-4-1 family (Fig. S8C). www.nature.com/scientificreports/

5′-Additions in tRNAs and tRFs.
It is well established that the tRNA His guanylytransferase (Thg1) superfamily can catalyze the 3′-5′ synthesis of nucleic acids and G −1 addition to tRNA His with high specificity and efficiency 33 . As expected, the G −1 addition ratios of full-length tRNA His in the tissues were above 92%, and 11 out of 13 ratios were 100%. Interestingly, compared with the tissues, cell lines had relatively lower and more diverse G −1 addition ratios for tRNA His   www.nature.com/scientificreports/ respectively. In GM23338 and OCI-LY7 cells, the ratios of G −1 addition were even lower (15% and 19%, respectively) (Fig. 5A). These data suggest that the Thg1 pathway may be interrupted in these cell lines. Analysis of the ratio of the 5′-halves of tRNA His indicated that the G −1 addition ratios of the 5′-halves were significantly lower (approximately 15-70%) than that of full-length tRNA His in both cell lines and tissues, with the exception of GM23338, which showed a slight increase. Moreover, T −1 additions were found in 5′-halves in some tissues (e.g., cerebellum, diencephalon, lung, embryo metanephros, and parietal lobe) (Fig. 5B). In the case of 3′-halves of tRNA His , almost no G −1 additions were found in any of the samples (Fig. 5C). However, T −1 additions were found in all three cell lines and some tissues, such as the cerebellum, esophagus, and lung. Previous reports indicates that G −1 addition is detected only in full-length tRNA His34 . Our date suggested that the 5′-halves of tRNA His may www.nature.com/scientificreports/ www.nature.com/scientificreports/ inherit G −1 from full-length tRNAs and gradually lose it. The presence of T −1 additions in 3-halves of tRNA His suggested that these additions may occur after anticodon cleavage in a tissue-specific manner. Intriguingly, novel 5′-additions (e.g., A, U, C, UG, UA, and UU) were identified in various tRNAs in many samples. Approximately 5-45% of tRFs had 5′-addition modifications, and T −1 addition was the most abundant modification in 18 out of 34 samples (18/24) (Fig. 5D, Supplemental Data File 3). Comparison with G −1 addition indicated that T −1 addition has lower tRNA specificity and can be detected in both full-length tRNA and 3′_tRFs, suggesting that T −1 addition can be added to tRFs after the maturation and cleavage of tRNAs. These patterns were detected in various samples, suggesting that they should not result from random sequencing errors or artificial bias of library construction. Overall, the data suggested that 5′-additions of tRNAs are much more complex than thought previously, and the mechanisms and functions related to these novel 5′-additions are a subject of future investigations.

tRNA modification profiles in 293 T cells.
Previous studies suggested the potential use of NGS technology to identify and quantify tRNA base methylation 35,36 . Using tRNAExplorer, we reanalyzed the DM-tRNA-Seq dataset and found that the treatment with the dealkylating enzyme AlkB, which removes the methyl groups from m 1 A and m 3 C to revert the bases to their unmodified forms, dramatically decreased the ratio of mismatched reads vs total reads from approximately 89% (tRNA_control) to 31% (tRNA_treated), suggesting that most of the mismatches in the tRNA sequencing data may result from tRNA methylation (Fig. S9A). We calculated the mismatch ratio (MR) for all mismatched sites and found two peaks in the MR distribution at approximately 0% and 100% in all samples; AlkB treatment removed most of mismatched sites with MR between the two peaks (Fig. S9B). We identified a total of 16 potential modification sites (MR > 80% and > 200 supporting reads) across 44 isoacceptors in the tRNA-contol and tRNA-treatment data (Table S5).
The 58A mismatches were the most popular and were detected in most of the isoacceptors (41/44). These mismatches were detected only in tRNA-controls and were absent from the tRNA-treatment (AlkB) samples, suggesting the existence of m 1 A modification (Fig. 6A). This result is consistent with previous reports that demonstrated the presence of m 1 A58 in almost all human tRNAs 35 . We also found that the modification ratios of m 1 A58 were relatively low in the Arg-TCT, Glu-CTC, and Tyr-GTA tRNA families (Fig. 6A).
We also identified mismatches at position 34 in the tRNA nomenclature (wobble base of anticodon) in both tRNA-control and tRNA-treatment samples (Fig. S9C). The A-to-inosine (I) modification at the wobble base was reported in at least eight human tRNAs, which may be important for expanding the base-pairing capability from A34-U to I34-U/I34-C and even I34-A 37 . Our results show that approximately seven isoacceptors have this modification in 293 T cells; the mismatch ratios are always 100% and are not influenced by AlkB treatment (Fig. S9D). Although 2′O-methylation of G34/C34 is another common wobble modification in human tRNAs, we did not detect the G34/C34 mismatches, suggesting that NGS sequencing may not have sufficient sensitivity to detect this modification.
Nucleotide 37 in tRNAs is an A or G and is located at the immediate 3′ end of the anticodon nucleotides. Modifications of this position, such as N 1 -methylguanosine (m 1 G), N 6 -threonylcarbamoyladenosine (t 6 A), and N 6 -isopentenyladenosine (i 6 A), were reported to be essential for the prevention of frameshifting 38 . We identified the 37G and 37A mismatches in 7 and 3 isoacceptors, respectively (Fig. 6B,C). Interestingly, the 37A (A-> G) mismatches were detected only in the AlkB-treated samples, indicating that AlkB treatment may remove the methyl group from t 6 A or i 6 A thus enabling the detection of these modifications by NGS sequencing (Fig. 6D). As expected, the 37G modifications should be m 1 G because mismatches were only observed in the tRNA-control samples and were absent from the tRNA-treatment samples.
The m 2 2 G26 modification between the D and anticodon stems is assumed to be important for an increase in the stiffness of tRNAs 39 . We identified 19 isoacceptors that may have this modification in 293 T cells (Table S5). Many G26 mismatches were observed in both control and AlkB treatment samples, suggesting that m 2 2 G26 modification is resistant to AlkB treatment. The absence of G26 mismatches in some control samples is primarily due to the lack of the reads covering G26.
Additionally, ten mismatch sites, including A52, A67, A69, C25, G56, C32, C51, U53, U27, and U28, were observed only in one or two isoacceptors; C32 is known as a 3′-methylation site 36 . The functions and modification types of other sites are largely unknown. Overall, our results suggest the potential use of mismatches in the tRNA-Seq data to identify tRNA modifications and quantify their ratios in human samples (Fig. 6E, Table S5).

Modification profiles across tissues and cell lines.
To investigate the general patterns of tRNA modification across the samples and identify tissue/cell line-specific modification events, we calculated the mismatch ratio using the tissue and cell line data. The matrix of 58A tRNA families across all samples indicated that the modification status of A58 is distinct between the tRNA families. Asp-GTC-tRNAs and Val-TAC-tRNAs had the lowest modification ratios (2-30%); Glu-CTC-tRNAs, Glu-TTC-tRNAs, and Val-TAC-tRNAs had moderate modification levels (10-70%), and other families had high modification levels (70-100%) (Fig. 6F). In the case of the G9 (m 1 G) sites, the modification ratio in Val-TAC-4-1 was usually approximately 100%, which was significantly higher than that in other families. On the other hand, the G9 modification ratios in Glu-TTC, Glu-CTC, Pro-AGG, and Pro-CGG tRNAs were usually less than 10% in most samples. The data also indicated some tissue-specific modification events (Fig. S10A). For example, we identified four potential modification sites (G7, G10, A57, and G64) in most of the samples in the tRNA-Glu-CTC family, which is the most abundant tRNA family. In the case of A57, the modification ratios across the samples range from 7% (K562 cells) to 70% (ovary), suggesting that the modification statuses of the site may vary in various samples. However, the distributions of three other modifications were generally uniform (Fig. S10B). www.nature.com/scientificreports/ www.nature.com/scientificreports/ The cleavage sites of tRNAs. tRFs are the products of cleavage of tRNAs; hence, it is possible to detect the cleavage events based on the start and end sites of tRFs. However, it is necessary to distinguish true cleavage sites from early stops of cDNA synthesis triggered by certain posttranscriptional modifications. In the DM-tRNA-Seq dataset, cDNAs were synthesized by template switching from the adaptor to the 3′ end of the target RNA. In the case of the 5′ ends of tRFs, it is almost impossible to distinguish the cleavage sites from the early stop sites. On the other hand, the 3′ ends of tRFs should result from tRNA cleavage. However, most tRFs (> 99%) end with the 3′-termini of the tRNA genes (Fig. 3E) in the datasets; thus, no new cleavage sites can be identified based on the data.
In the Cell Lines-Tissues dataset, a poly-A tail and 5′-RNA adapter were added at the 3′ and 5′ ends of RNAs, respectively, before cDNA synthesis (https ://www.encod eproj ect.org/exper iment s/ENCSR 000CR F/). Early cDNA synthesis stop events can result in the absence of a 5′-RNA adapter of the first strand of cDNA and the failure of cDNA amplification (loss of tRFs) but do not result in the truncation of tRFs. Therefore, both the 5′ and 3′ ends of tRFs observed in the Cell Lines-Tissues dataset should result from RNA cleavage or random degradation, which can be distinguished by frequency. We identified a total of 1027 cleavage sites that were supported by at least 1000 reads in two samples (Supplemental Data File 4). We mapped the peaks of the 5′ and 3′ ends of tRFs to a virtual tRNA 70 bp in length to calculate the cleavage frequencies for each base in the virtual tRNA. The results indicated the presence of 11 and 5 peaks of cleavage sites suggested by the 5′ and 3′ ends, respectively (Fig. 7A). Most of the sites were located within five regions (5′ and 3′ terminals of tRNA, the connection region between the stem arm and D-loop, the anticodon loop, and the T-loop) (Fig. 7B). The 3D structure of tRNA suggests that all regions form exposed surfaces that are easily accessed by RNases (Fig. 7C). In addition to the 3′ and 5′ ends of mature tRNAs, cleavage sites are enriched within the anticodon loop or even in the anticodon itself (peaks a, b, and 7 in Fig. 7A). These sites are close to the wobble base (34) and 37 A/G. Additionally, the pileup profiles of TF-tRNA-Arg-CCG-2-1 can be categorized into four groups. Two conserved cleavage sites (T33/C34 and C40/ A41) were found in groups 2, 3, and 4 and resulted in the production of 5′-tRNA-halves of a 3′-tRF (Fig. 7D).
To explore the sequence specificity of the cleavage sites, we extracted 19952 unique sequences flanking highconfidence cleavage sites (supported by a minimum of 100 reads) with 7 bases and found 110 enriched patterns ( Fig. 7E and Supplemental Data File 5).

Discussion
In this study, we systematically examined the enrichment of eight histone modifications and 248 DNA-binding factors in the tRNA genes in three human cell lines to explore the mechanisms that regulate the tRNA pathway at the transcriptional level. As expected, active tRNA genes were consistently enriched with active markers (such as H3K4me3 and H3K4me2) and did not have repressive markers (such as H3K9me3 and H3K27me3) in various cells, suggesting that histone methylation may serve as a common mechanism that enables the expression of the tRNA genes. Interestingly, our data showed that highly expressed tRNA genes may serve as hubs of trans-factor binding and can recruit both transcriptional activators and repressors. For example, many subunits of histone acetylation and deacetylation complexes bind to and may compete for the tRNA genes in a cell-specific manner. Additionally, several signaling pathways (for example, the TORC1, Myc, ERK, and p53 pathways) are known to be involved in the Pol-III and tRNA synthesis systems [40][41][42] . Our study added several new candidates to the list, including the cAMP pathway in the tumor cell lines and the pluripotent pathway in H1 embryonic stem cells. These pathways are linked to human diseases and organism development 27,32 ; thus, our observations suggest the possible crosstalk between these pathways and present new clues for the investigations of novel tRF functions.
Additionally, our study compared tRF profiles across five human cell lines and eight tissues. The datasets used in the present study have some limitations (see below); however, the results clearly show the potential use of small-RNA NGS sequencing technology as a powerful method to simultaneously monitor the expression and modification, addition, and cleavage events of tRNA. Previously, several tRF databases were published 12,21,23 ; however, most of them were based on short RNA-sequencing data (read length < 40 nt), which complicates identification of the full sequences of long tRFs. The results of the present study indicated that a large portion of tRFs (50-60%) were longer than 40 nt (Fig. S6); therefore, we used long RNA-sequencing data (read length ≥ 100 nt) to avoid these issues. Our study presents a high-confidence tRF list for further investigation (Supplemental Data File 1). The data indicate that full-length tRNAs comprise only a small proportion of tRNA pools, suggesting the biological significance of tRFs. The compositions of tRFs were diverse across the samples, suggesting the existence of tissue/cell line-specific regulatory mechanisms. Our analyses also identified new modification sites and new patterns of modification and present valuable data resources to explore the tRNA dynamics. For example, the observation that the ratio of G −1 addition of tRNA His in cell lines is significantly lower than that in normal tissues suggests a dysfunction of the Thg1 pathway in the cell line samples. Additionally, our study identified more than 100 highly enriched sequence motifs flanking the tRNA cleavage sites (Fig. 7E). Although very few RNases were determined to have sequence specificity, our results suggest that some tRNA-cleaving RNases may recognize specific sequences or, interestingly, can use tRFs as guide RNAs to target cleavage sites similar to CRISPR. It is notable, that many clues and hypothesis raised by our study need be further validated by additional benchworks and data analyses.
Although optimized experimental methods, such as DM-tRNA-Seq and ARM-Seq, have significantly improved the sensitivity of tRNA sequencing technology, there is considerable room for improvement 14,15 . For example, very few 5-tRNA-halves were detected in the DM-tRNA-Seq dataset (Fig. 3E), suggesting the significant 3′-bias of RNA sequencing. HEK293T libraries were generated using a template switch method 14 ; thus, whether the absence of 5-tRNA-halves/5′-tRFs results from 3′-sequencing bias or the true biological nature of HEK293T cells requires additional investigation. Furthermore, cDNAs undergoing early RT stop can be amplified by the template switch method, which makes it difficult to distinguish tRNA cleavage events from the early www.nature.com/scientificreports/ stop of RT. Additionally, AlkB treatment can remove only limited modifications (N 1 -methyladenosine (m 1 A), N 3 -methylcytidine (m 3 C) and N 1 -methylguanosine (m 1 G)) 15 . Combination with other enzymes may improve the performance. Our analyses indicate that many modifications (for example, m 1 A58) do not induce early stops. Therefore, enzyme treatment should focus on the modifications that induce early stops. In the case of the Cell Lines-Tissues samples, regular A-tailing cDNA requires adaptor ligation at both ends for cDNA amplification and therefore may result in the loss of tRNA/tRFs due to early stops, which may be triggered by the modifications www.nature.com/scientificreports/ or secondary structure of tRNAs. The tRNA-Seq methods should be optimized in the following directions: (1) using purified tRNA rather than total RNA as template; (2) using ligation-based library construction method to avoid missing 5′_tRFs and 5′_tRNA_halves and inability to discriminate RNA cleavage from an early stop of RT; (3) removing more modifications that induce an early stop; (4) parallel experiments with and without enzyme treatments are required to evaluate the treatment efficiency and identify new modification sites; (5) the read length of RNA-Seq should be above 100 bp to capture long tRFs; and (6) advanced bioinformatics tools, such as tRNAExplorer, are required for simultaneous evaluation of tRNA/tRF quantification, modification, addition, and cleavage.

Conclusion
In summary, we extensively investigated the regulatory mechanisms and landscapes of tRNA using 1332 Chip-Seq datasets and 36 small-RNA-Seq datasets from human cell lines and tissues and found that tRNA genes are regulated by distinct transcriptional regulators, including histone acetylation, cAMP, and pluripotency pathways. We also identified 950 high-confidence tRFs suggesting that tRNA pools are dramatically distinct across cells and tissues in terms of the expression profiles and tRF composition. Tissues usually have higher levels of 5′-tRFs than that in cell lines. The analysis identified new potential modifications and uncovered specific cleavage patterns in tRNA families. The results also showed that RNA library preparation technologies have a considerable impact on tRNA profiling and require optimization in the future. Overall, our study provides a comprehensive global view and new insight into tRNA pools across various human cell lines and tissues. www.nature.com/scientificreports/