Main

During protein synthesis, ribosomes match messenger RNA codons to their cognate amino acids via base pairing with complementary anticodons in transfer RNAs. These short (approximately 76 nucleotides (nt)) adaptor molecules are highly similar in sequence and structure, and much more abundant than ribosomes in cells1, which enables rapid mRNA decoding. Their similarity poses a challenge as ribosomes must faithfully distinguish tRNAs that may differ by a single nucleotide but carry distinct amino acids. The relative abundance of charged tRNAs matching a given codon can modulate the rate and fidelity of mRNA translation2, and changes in the levels or function of individual tRNAs have been linked to cancer and neurological diseases3.

The tRNA repertoires of human cells and the mechanisms that control them remain largely unknown. This is due to the multicopy nature and simple promoter structures of tRNA genes, which makes their regulation challenging to predict or quantify. It is similarly challenging to accurately quantify mature tRNA transcripts because of their stable structure and abundant chemical modifications, which has resulted in poor characterization of tRNA expression in specific cellular contexts. There are 619 predicted tRNA genes in the human nuclear genome, which can potentially generate 432 unique tRNA transcripts from 57 tRNA anticodon families4. RNA polymerase III (Pol III) is directed to nuclear-encoded tRNA genes by transcription factor IIIC, which binds to short intragenic promoter elements (A box and B box) and recruits transcription factor IIIB (TFIIIB) to the variable regions upstream of tRNA loci5,6,7,8. TFIIIB then positions Pol III for initiation and can retain it for multiple rounds of transcription9,10,11. Because of this simplistic regulatory model, tRNA gene copy number is often used as a proxy for tRNA expression levels. However, although nearly all tRNA loci in yeast are transcribed during rapid growth12,13, Pol III enrichment at tRNA genes varies among mammalian tissues14,15,16,17,18. The molecular basis and quantitative impact of this selective transcription on tRNA levels are unknown. The lack of high-resolution measurements of tRNA levels has also led to controversy over their role in setting elongation rates in mammalian cells19,20.

Here we address these questions by combining modification-induced misincorporation tRNA sequencing (mim-tRNAseq)—a method we recently developed to quantify the abundance of mature tRNA with high accuracy and resolution21,22—with ribosome profiling and Pol III chromatin immunoprecipitation–sequencing (ChIP–Seq). We find that tRNA repertoires are extensively remodelled following the differentiation of human induced pluripotent stem cells (hiPSC) into neuronal and cardiac cell types. These changes, however, are not driven by altered codon usage across the transcriptome and they have a minimal impact on tRNA anticodon availability, which remains largely stable. Mechanistically, differential Pol III occupancy at tRNA loci determines mature tRNA levels and is driven by sequence features in the tRNA gene body and 5′ flanking regions. Decreased mTORC1 signalling following differentiation activates the Pol III repressor MAF1, which restricts Pol III to a subset of tRNA genes we define as ‘housekeeping’. We find that these genes are stably expressed and constitute the most abundant isodecoders in each anticodon family. This mechanism underlies the broad stability of tRNA anticodon pools and decoding rates in different cell types despite tRNA-repertoire remodelling during differentiation.

Results

Differentiation extensively remodels tRNA transcript pools

To define the composition of human tRNA pools in physiological settings, we designed a workflow using a reference hiPSC line (kucg-2)23, which circumvents the genetic variability of immortalized human cell lines24,25 and the changes in Pol III regulation following cellular transformation26,27,28,29. We used established small molecule-based protocols to direct hiPSC to differentiate into cardiomyocytes (CM)30,31 or dividing neuronal progenitor cells (NPC) from which we then obtained neurons32,33 (Fig. 1a). This isogenic panel contains cell types that are particularly affected by dysregulated tRNA metabolism in human diseases3. Immunostaining demonstrated the expected cell morphology and uniform expression of the pluripotency markers POU5F1 and SOX2 in hiPSC, the neural progenitor markers PAX6 and NESTIN in NPC, the neuronal markers MAP2 and CHAT in neurons, and ɑ-actinin-2 and cardiac troponin T in CM, confirming culture purity (Fig. 1b). Distinct and characteristic transcriptomic signatures in these cell lines and the robust expression of defined marker genes were also confirmed through RNA sequencing (RNA-Seq; Fig. 1c and Extended Data Fig. 1a,b).

Fig. 1: Transfer RNA anticodon pools are maintained largely stable across cell types despite extensive reprogramming of tRNA repertoires during differentiation.

a, Schematic of hiPSC differentiation into NPC, neurons and CM. b, Representative fluorescence microscopy images (from at least three independent experiments) of immunostaining for cell type-specific marker proteins and DAPI. CTNT, cardiac troponin T; ACTN2, ɑ-actinin-2. Scale bars, 10 µm. c, Gene expression heatmaps for known cell type- and proliferative state-specific markers in hiPSC, NPC, neuron and CM cultures (n = 2 biological replicates). Standardized Z scores were calculated using DESeq2-normalized RNA-Seq gene counts across samples. d, Schematic depicting tRNA classification. Distinct tRNA transcripts sharing an anticodon are called isodecoders and collectively constitute anticodon families. Members of different anticodon families that carry the same amino acid belong to the same isotype. e, Principal component (PC) analysis of variance-stabilizing-transformed count data for tRNA transcripts from DESeq2 for each cell line (n = 2 biological replicates; variance explained by each principal component in parentheses). f, Heatmap of tRNA transcript expression dynamics showing only differentially expressed transcripts in at least one differentiated cell line relative to hiPSC (Benjamini–Hochberg-adjusted Wald test, Padj ≤ 0.05). Hierarchically clustered expression heatmap showing the scaled Z score of normalized unique transcript counts in hiPSC, NPC, neurons and CM (n = 2 biological replicates; left). Differential expression for NPC, neurons and CM relative to hiPSC reported as log2-transformed fold changes (middle). Base mean normalized to the tRNA transcript across all samples (right). g, Principal component analysis as in e calculated from variance-stabilizing-transformed count data summed by tRNA anticodon. h, Heatmap as in f for count data summed by tRNA anticodon. Anticodon families previously38 associated with proliferating (P) or differentiated (D) cells are shown. Source numerical data and unprocessed blots are provided.


We then profiled the abundance of mature tRNA in these isogenic cell types using mim-tRNAseq21,22, which enables the accurate quantitation of individual tRNA transcripts (Fig. 1d). We obtained approximately 80% uniquely mapped and ≤2% multi-mapped reads for all samples, a median of approximately 80% of which were full length and >95% of which contained the post-transcriptionally added 3′ CCA tail (Extended Data Fig. 1c–e), indicating that they were derived from mature and translationally competent tRNAs. Less than 4% of uniquely mapped reads were derived from mitochondrial tRNAs (Extended Data Fig. 1f), and we obtained single-transcript resolution data for 373 of the 413 (90%) predicted nuclear-encoded tRNA transcripts in our curated reference. Seven of the remaining transcripts had <10 mapped reads and the others were mostly only distinguishable at sites that can carry misincorporation-inducing nucleotide modifications, which precludes read deconvolution (for example, tRNA-Pro-AGG-1 and tRNA-Pro-AGG-2; Extended Data Fig. 1g)22. To validate our ability to capture known instances of differential tRNA abundance, we examined tRNA-Arg-UCU-4, which is highly expressed in the nervous system of mice34 and humans35. This expression pattern was recapitulated in our workflow, as the proportion of reads mapping to tRNA-Arg-UCU-4 was significantly higher in neurons compared with hiPSC, NPC and CM (Extended Data Fig. 1h).

We next used DESeq2 (ref. 36) to compare the expression of individual tRNAs in differentiated cells relative to hiPSC. Principal component analysis demonstrated high reproducibility among replicates. The first principal component (reflecting cellular differentiation) accounted for 89% of the variation; tRNA transcript pools could also accurately distinguish cell types (Fig. 1e). Of the 373 unique tRNA transcripts we could deconvolute, 161 showed significant differences in expression of up to about 70-fold in one or more differentiated cell types compared with hiPSC (adjusted P (Padj) ≤ 0.05; Fig. 1f). The changes measured by mim-tRNAseq were highly concordant with northern blotting analysis performed as validation for three transcripts (Extended Data Fig. 1i,j). Among the remaining tRNAs, 205 had zero or low counts in all cell populations (<0.005% of tRNA-mapped reads; Supplementary Table 1). These data demonstrate that tRNA transcript pools in human cells are extensively remodelled during differentiation.

Transfer RNA anticodon levels are largely stable across cell types

To define how this reprogramming impacts the abundance of tRNA anticodon families (Fig. 1d), we aggregated uniquely mapped tRNA reads by anticodon before DESeq2 analysis. Among the 57 anticodon families encoded by the full set of predicted human tRNA genes, we found no evidence of expression for nine, while 47 were robustly expressed across all cell types and tRNA-Ile-GAU was only detectable at very low levels in hiPSC (0.002% of uniquely mapped reads compared with 2.9% for tRNA-Ile-AAU and 0.8% for tRNA-Ile-UAU). The different cell types were well-resolved by principal component analysis of anticodon-aggregated data (Fig. 1g) and 46 anticodon families were differentially regulated in at least one cell type (Padj ≤ 0.05; Fig. 1h and Supplementary Table 1). However, the differences in abundance of the tRNA anticodon families were much smaller compared with those for individual tRNA transcripts, which is consistent with moderate variability among tRNA anticodon pools across mouse tissues37. Apart from a strong decrease for the poorly expressed tRNA-Ile-GAU, the largest changes were for tRNA-Gly-CCC (increased by 1.7–2.5-fold) and the selenocysteine-inserting tRNA-SeC-UCA (increased by threefold); all other changes were between 0.7- and 1.7-fold. There was no separation of anticodon pools among proliferating (hiPSC and NPC) and non-dividing (neuron and CM) cells or a substantial overlap of tRNA anticodon changes in NPC, neurons and CM with previously identified differentiation- or proliferation-linked tRNAs38 (Fig. 1h and Supplementary Table 1). Although tRNA-Arg-UCU-4-1 was strongly upregulated in neurons (approximately 40% of tRNA-Arg-UCU), the abundance of this anticodon family decreased by approximately 1.4-fold because other isodecoders were downregulated (Extended Data Fig. 1k,l and Supplementary Table 1). 
Thus, despite extensive reprogramming of tRNA transcript pools, the availability of tRNA anticodons in human cells remains largely unchanged following differentiation.
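The aggregation step described above can be illustrated with a minimal sketch. It assumes transcript names of the form tRNA-isotype-anticodon-number and uses invented read counts (not data from this study) that merely mimic the tRNA-Arg-UCU pattern, in which one isodecoder rises while the family total falls. The function names are hypothetical.

```python
from collections import defaultdict

def anticodon_of(transcript_name):
    """Return the anticodon family, e.g. 'tRNA-Arg-UCU-4' -> 'Arg-UCU'.

    Assumes names follow tRNA-<isotype>-<anticodon>-<transcript number>.
    """
    parts = transcript_name.split("-")
    return f"{parts[1]}-{parts[2]}"

def sum_by_anticodon(counts):
    """Sum uniquely mapped read counts over all isodecoders of each anticodon."""
    totals = defaultdict(int)
    for name, value in counts.items():
        totals[anticodon_of(name)] += value
    return dict(totals)

# Invented toy counts, for illustration only.
hipsc = {"tRNA-Arg-UCU-1": 600, "tRNA-Arg-UCU-2": 300, "tRNA-Arg-UCU-4": 100}
neuron = {"tRNA-Arg-UCU-1": 300, "tRNA-Arg-UCU-2": 100, "tRNA-Arg-UCU-4": 300}

family_hipsc = sum_by_anticodon(hipsc)["Arg-UCU"]    # 1000
family_neuron = sum_by_anticodon(neuron)["Arg-UCU"]  # 700

# One transcript increases threefold while the family total decreases.
transcript_fold = neuron["tRNA-Arg-UCU-4"] / hipsc["tRNA-Arg-UCU-4"]
family_fold = family_neuron / family_hipsc
```

In this toy case the single transcript changes threefold while the anticodon family changes by a factor of 0.7, which shows how opposing isodecoder changes can cancel out at the anticodon level.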

Codon usage and decoding rates do not vary across cell types

We next investigated whether the small but significant differences in tRNA anticodon pools among cell types are linked to differences in codon demand as previous studies on its potential coordination with tRNA supply have reached conflicting conclusions17,18,38. Codon usage weighted for expression correlated significantly with tRNA anticodon abundance (Pearson’s correlation coefficient (r) = 0.57–0.66; Fig. 2a) and was strikingly stable across all four cell types (coefficient of variation (CV), 0.77–13.22%; Fig. 2b), in accordance with previous data from mouse and human tissues17,18. Notably, the codon usage of mRNAs that are highly abundant in all cells correlated significantly more strongly with tRNA anticodon levels than that of cell type-specific mRNAs that are expressed at high levels (Extended Data Fig. 2a,b). Despite their remodelling during differentiation, tRNA anticodon pools are thus equally well adapted to global codon demand across cell types.
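As a rough illustration of expression-weighted codon usage and its coefficient of variation, the sketch below weights every codon in a coding sequence by the TPM of its transcript. The gene names, sequences and TPM values are invented, and the use of the population standard deviation for the CV is an assumption, as the text does not specify the estimator.

```python
from collections import Counter
from statistics import mean, pstdev

def weighted_codon_usage(cds_by_gene, tpm_by_gene):
    """Proportion of total codon usage, with each CDS weighted by its TPM."""
    weighted = Counter()
    for gene, cds in cds_by_gene.items():
        tpm = tpm_by_gene.get(gene, 0.0)
        # Walk the sequence in frame, ignoring any trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            weighted[cds[i:i + 3]] += tpm
    total = sum(weighted.values())
    return {codon: w / total for codon, w in weighted.items()}

def coefficient_of_variation(values):
    """CV in percent (population standard deviation over the mean; an assumption)."""
    return 100.0 * pstdev(values) / mean(values)

# Invented toy input, for illustration only.
cds = {"geneA": "ATGGGGGGG", "geneB": "ATGAAA"}
tpm = {"geneA": 10.0, "geneB": 30.0}
usage = weighted_codon_usage(cds, tpm)  # ATG 40/90, GGG 20/90, AAA 30/90
```

Applying `coefficient_of_variation` to the usage of one codon across cell types gives a per-codon CV analogous to the range reported in the text.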

Fig. 2: Transfer RNA anticodon levels correlate with invariant codon demand and decoding rates across cell types.

a, Correlation of mean weighted codon usage to mean tRNA anticodon abundance from mim-tRNAseq scaled to the proportions of total tRNA-mapped reads (n = 2 biological replicates for each cell type). b, Aggregated codon usage weighted by mRNA expression (in transcripts per million, TPM) across all transcripts based on RNA-Seq data. The values shown per codon are the mean weighted codon frequencies per cell type, represented as the proportion of total codon usage. The range of CV per codon is indicated. Codon sequences and their corresponding amino acids (single-letter code) are provided. c,d, Correlation between the reciprocal of tRNA anticodon abundance calculated from the proportion of all tRNA-mapped reads to relative A-site codon dwell time measured separately for short (20–23 nt; left) and long (28–33 nt; right) footprints from hiPSC (c) and NPC (d) libraries prepared with cycloheximide and tigecycline (n = 2 biological replicates denoted as rep1 and rep2). a,c,d, Solid lines, linear regression model; grey shading, 95% confidence interval. The Pearson’s correlation coefficients are provided. e, Correlation between mean relative A-site codon dwell time in hiPSC and NPC from c,d (long footprints), coloured by log2 fold changes in tRNA anticodon abundance in NPC relative to hiPSC, Padj ≤ 0.05; grey dots denote non-significant changes; dashed line, y = x; the Pearson’s correlation coefficient is indicated. Source numerical data are provided.


To test whether the small differences in tRNA anticodon abundance we detected altered the decoding rates, we performed ribosome profiling39 in hiPSC and NPC using cycloheximide to inhibit elongation and tigecycline to block tRNA entry into empty ribosomal A sites40. This yielded two predominant ribosome footprint sizes: 20–23 nt (short) and 28–33 nt (long; Extended Data Fig. 3a). The A-site (but not P-site) codon dwell times were strongly anticorrelated with the abundance of cognate tRNA anticodon families in both hiPSC and NPC (Fig. 2c,d and Extended Data Fig. 3b), demonstrating the key role of tRNA anticodon availability for decoding rates in human cells. Unlike in yeast40,41, the correlation in human cells was stronger for long footprints (Fig. 2c,d). When tigecycline was not added to the cell extracts, the correlation was much more modest for short footprints (Pearson’s r = 0.32) and not significant for long footprints or P sites (Extended Data Fig. 3c–e).
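The relationship tested here, in which dwell time at a codon scales with the reciprocal of cognate tRNA anticodon abundance, can be sketched as a Pearson correlation. The values below are invented and the dictionaries are keyed by codon for simplicity; this is an illustrative calculation under those assumptions, not the analysis pipeline used in the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented values: cognate anticodon abundance (proportion of tRNA-mapped
# reads) and relative A-site dwell time per codon.
abundance = {"GGG": 0.01, "AAA": 0.04, "GAU": 0.02, "CUG": 0.05}
dwell_time = {"GGG": 1.6, "AAA": 0.8, "GAU": 1.2, "CUG": 0.7}

codons = sorted(abundance)
reciprocal_abundance = [1.0 / abundance[c] for c in codons]
r = pearson_r(reciprocal_abundance, [dwell_time[c] for c in codons])
```

A positive r against the reciprocal corresponds to the anticorrelation with abundance described in the text: rarer cognate tRNAs go with longer dwell times.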

The A-site codon dwell times in NPC and hiPSC were highly correlated (Pearson’s r = 0.9) and showed no clear relationship with differences in cognate tRNA abundance (Fig. 2e). For example, the tRNA-Gly-CCC levels were 1.7-fold higher in NPC cells but the average dwell time of ribosomes at GGG increased by only 5% (Extended Data Fig. 3f). We next compared the A-site codon dwell times for mRNAs that are expressed at high levels in both hiPSC and NPC (shared, n = 393) or that are cell type-specific (n = 114 in hiPSC and n = 80 in NPC; Extended Data Fig. 2a). Although approximately 20% of the ribosome footprints originated from shared mRNAs, <3% mapped to the cell type-specific transcripts, resulting in much less concordant codon dwell times between replicates (Extended Data Fig. 3g). Accordingly, the A-site codon dwell times for shared mRNAs were more highly correlated between NPC and hiPSC than the dwell times for cell type-specific mRNAs (Pearson’s r = 0.87 versus 0.54) but the higher variance of the latter was not explained by differences in cognate tRNA levels (Extended Data Fig. 3h,i). Therefore, the divergence of tRNA anticodon pools between NPC and hiPSC is not sufficient to substantially alter decoding speed.

Buffering of tRNA anticodon levels through major isodecoders

We investigated whether the stark disparity in the magnitudes of changes between tRNA transcript and anticodon abundance is due to the unequal contribution of distinct isodecoders to mature tRNA pools. To test this, we calculated the number of ‘major’ isodecoders that cumulatively contribute ≥90% of the tRNA-mapped reads for each anticodon family. Although the number of isodecoders in predicted human tRNA genes varies between one and 26, most mature tRNA anticodon families comprised 1–4 major isodecoders in hiPSC and only up to two in NPC and neurons (Fig. 3a,b). For example, the human genome encodes nine tRNA-Ala-UGC isodecoders, six of which were detectable in mature tRNA pools from hiPSC and two of which became predominant in differentiated cells (Fig. 3c). Similarly, three of the five predicted tRNA-Pro-UGG isodecoders are expressed in hiPSC and two of those account for most mature tRNA-Pro-UGG in differentiated cells (Fig. 3c). Globally, most minor isodecoders (>70%) were strongly downregulated in differentiated cells (by up to 70-fold); conversely, most major isodecoders were upregulated, albeit much more modestly (approximately 1.2–4-fold; Fig. 3d). Thus, tRNA anticodon pools in different human cell types are buffered through major isodecoders that are more stably expressed.
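One way to read the ≥90% criterion is to rank the isodecoders of an anticodon family by read count and take the smallest top-ranked set whose cumulative share reaches the threshold. The sketch below follows that reading, which is an assumption about the exact procedure, and uses invented counts.

```python
def major_isodecoders(read_counts, threshold=0.9):
    """Top-ranked isodecoders whose cumulative read share first reaches the threshold."""
    total = sum(read_counts.values())
    ranked = sorted(read_counts.items(), key=lambda item: item[1], reverse=True)
    majors, cumulative = [], 0
    for name, count in ranked:
        majors.append(name)
        cumulative += count
        if cumulative >= threshold * total:
            break
    return majors

# Invented read counts for six isodecoders of one anticodon family.
hipsc = {"iso1": 400, "iso2": 300, "iso3": 150, "iso4": 100, "iso5": 30, "iso6": 20}
neuron = {"iso1": 600, "iso2": 350, "iso3": 30, "iso4": 10, "iso5": 6, "iso6": 4}

n_major_hipsc = len(major_isodecoders(hipsc))    # 4
n_major_neuron = len(major_isodecoders(neuron))  # 2
```

With these toy counts, four isodecoders are needed to reach 90% in the first profile but only two in the second, mirroring the narrowing described above.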

Fig. 3: Human tRNA anticodon pools are buffered through stably expressed major isodecoders.

a, Distribution of gene copy numbers per isodecoder in predicted human tRNA genes. b, Distribution of isodecoder count cumulatively constituting ≥90% of each anticodon per cell type (mean read proportions from mim-tRNAseq; n = 2 biological replicates) for detectable anticodon families with fully resolved unique transcripts. c, Proportional isodecoder composition (mean from n = 2 biological replicates) for tRNA-Ala-UGC (left) and tRNA-Pro-UGG (right). d, Change in tRNA expression (Benjamini–Hochberg-adjusted Wald test, Padj ≤ 0.05) for transcripts that are detectable in at least one cell type (≥0.005% of tRNA-mapped reads). DE, differentially expressed. Box plots: centre line, median; box limits, upper and lower quartiles; whiskers, 1.5× the interquartile range. Source numerical data are provided.


Pol III occupancy at tRNA genes predicts mature tRNA levels

To define the regulatory mechanisms that favour the expression of major tRNA isodecoders, we generated genome-wide Pol III occupancy maps using ChIP–Seq for the Pol III catalytic core component RPC1 (ref. 42) and BRF1, the TFIIIB subunit that recruits Pol III to tRNA5 (Fig. 4a). As expected, the ChIP signals for both RPC1 and BRF1 were highly localized and overlapped with predicted tRNA genes (Fig. 4b), whereas BRF1 signal was absent from the spliceosomal small nuclear RNA gene RNU6-1, which recruits Pol III via a BRF2-containing TFIIIB complex5 (Extended Data Fig. 4a). Strikingly, we obtained a near-perfect linear correlation between RPC1 ChIP–Seq signal strength aggregated by transcript and tRNA levels, measured by mim-tRNAseq, in all the cell types we tested (r² = 0.88–0.9; Fig. 4c and Extended Data Fig. 4b). Differences in Pol III occupancy thus explain nearly all of the variation in mature tRNA abundance in hiPSC as well as during their differentiation.

Fig. 4: Differentiation restricts Pol III to a housekeeping tRNA gene set encoding major isodecoders.

a, Schematic of Pol III recruitment to tRNA genes. b, Representative ChIP–Seq signal at predicted tRNA genes in a locus on human chromosome 6 for RPC1 and BRF1. Data are from one biological replicate of hiPSC normalized to the estimated library sizes from counts over extended tRNA features (±125 bp) and scaled to reads per million (RPM). Insets: magnified views of regions around closely spaced tRNA loci. c, Correlation between tRNA abundance estimated by mim-tRNAseq (n = 373 unique transcripts) and RPC1 ChIP–Seq signal aligned to extended tRNA features (±125 bp) for hiPSC and neurons (mean from n = 2 biological replicates for each cell type; scaled to the proportions of total tRNA-mapped reads in each dataset). Solid blue lines, linear regression model; grey shading, 95% confidence interval; the Pearson’s correlation coefficients are indicated. d, Heatmaps of normalized RPC1 ChIP–Seq signal around tRNA gene start sites (±1 kbp) for single replicates. The tRNA genes were separated into housekeeping, repressed and inactive based on significant peaks in RPC1 ChIP–Seq data (FDR-adjusted P ≤ 0.05). e, UpSet plot of significant consensus RPC1 peaks (±125 bp) of annotated tRNA genes (FDR-adjusted P ≤ 0.05). Housekeeping tRNAs are shown in green and hiPSC-specific tRNAs in blue; the numbers in each group are indicated. f, Spike-in-normalized RPC1 ChIP–Seq read counts over tRNA features (±125 bp) versus the log2-transformed fold change in CM (left), NPC (middle) and neurons (right) relative to hiPSC presented as MA plots (n = 2 biological replicates for each cell type). Green and purple denote significantly higher and lower occupancies, respectively (FDR-adjusted P ≤ 0.05). Source numerical data are provided.


Differentiation restricts Pol III to housekeeping tRNA loci

To determine how the Pol III-transcribed tRNA repertoire changes during differentiation, we performed genome-wide peak identification in the RPC1 and BRF1 ChIP–Seq datasets. We found a nearly complete overlap between peaks at predicted tRNA genes for the same protein across biological replicates as well as between consensus RPC1 and BRF1 peaks in the same cell type (Extended Data Fig. 4c,d). Defining consensus tRNA peaks and filtering out tRNA genes with ≥25% ambiguously assigned reads enabled single-gene resolution analysis of 558 of the 619 (90%) predicted human tRNA genes.

Based on the striking reduction in the number of Pol III peaks at tRNA genes we observed following differentiation (Fig. 4d), we defined three distinct classes of human tRNA genes. The first comprised genes occupied by Pol III in all cell populations (n = 205), which we defined as housekeeping. This set encodes transcripts from all 47 anticodon families with detectable expression in the mim-tRNAseq datasets (Supplementary Table 1). Housekeeping genes represented 70 and 94% of major isodecoders in hiPSC and neurons, respectively (Extended Data Fig. 4e). The second set included tRNA genes that were not bound by Pol III in any cell type (n = 159), which we called ‘inactive’. The third set included tRNA genes at which a significant RPC1 ChIP peak in hiPSC is lost in one or more differentiated cell populations (‘repressed’; n = 194, Supplementary Table 2). The largest set of tRNA-overlapping RPC1 ChIP peaks was in hiPSC (n = 397), and differentiated cells contained subsets of these peaks (Fig. 4e and Supplementary Table 2). No tRNA genes gained RPC1 ChIP peaks specifically in CM, whereas peak sets from NPC and neurons each contained one cell type-specific peak. Consistent with these data, none of the mature tRNA transcripts that were present in a differentiated cell type were undetectable (<0.005% of tRNA-mapped reads) in hiPSC (Supplementary Table 1).
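The three classes defined above can be expressed as a simple rule over per-cell-type peak calls. The sketch below assumes a dictionary that records whether a significant RPC1 peak was called in each cell type; the function name and the catch-all "other" label (for a peak present only in a differentiated cell type) are illustrative choices, as the text does not state how such rare cases were assigned.

```python
def classify_trna_gene(peaks, reference="hiPSC"):
    """Assign a tRNA gene to an activity class from per-cell-type RPC1 peak calls.

    peaks maps cell type -> True if a significant RPC1 peak was called.
    """
    bound = [cell for cell, has_peak in peaks.items() if has_peak]
    if len(bound) == len(peaks):
        return "housekeeping"  # occupied by Pol III in all cell populations
    if not bound:
        return "inactive"      # not bound in any cell type
    if peaks[reference]:
        return "repressed"     # hiPSC peak lost in at least one differentiated type
    return "other"             # hypothetical catch-all, not a class from the text

# Invented peak calls for three hypothetical genes.
gene_a = {"hiPSC": True, "NPC": True, "neuron": True, "CM": True}
gene_b = {"hiPSC": True, "NPC": False, "neuron": False, "CM": True}
gene_c = {"hiPSC": False, "NPC": False, "neuron": False, "CM": False}

classes = [classify_trna_gene(g) for g in (gene_a, gene_b, gene_c)]
```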

To rule out cell line-specific effects, we performed RPC1 ChIP–Seq in other hiPSC and immortalized cell lines. Of the 397 RPC1 tRNA peaks in kucg-2 hiPSC, 362 were detectable in an independent reference line, wibj-2 hiPSC23, as well as in HEK293T cells (Extended Data Fig. 4f). Datasets from kucg-2 and wibj-2 contained 24 tRNA peaks that were not detected in HEK293T cells, whereas only ten tRNA peaks were present in HEK293T or wibj-2 but not in kucg-2 cells. Approximately one-third of the predicted human tRNA genes are thus not bound by RPC1 in two independent hiPSC lines and the immortalized HEK293T cells. Note that 97% of housekeeping genes (199/205) were predicted to be active by a random forest classifier trained on tRNA gene sequence and genomic context43 (Extended Data Fig. 4g). However, almost half of the tRNA genes that we found to be bound by Pol III in hiPSC and repressed during differentiation were not predicted as active by this approach.

To capture both global and gene-specific regulation44, we analysed differential RPC1 binding to tRNA genes after spike-in normalization45. Pol III occupancy at tRNA genes was significantly reduced in differentiated cells compared with hiPSC (n = 197 in CM, n = 397 in NPC and n = 403 in neurons; false-discovery rate (FDR) ≤ 0.05; Fig. 4f). The largest effect sizes were for genes with low-to-medium RPC1 signal, mirroring the decrease in low-abundance tRNA transcripts (Figs. 1f and 3d). The RPC1 signal increased by less than threefold at 116 tRNA genes with mid to high occupancy in CM. By contrast, only two tRNA genes had significantly higher occupancy in neurons: one did not pass the peak-calling threshold due to low counts and the other (tRNA-Arg-TCT-4-1) encodes the neuron-specific tRNA-Arg-UCU-4 (Fig. 4f). Differentiation is thus accompanied by a general reduction in Pol III binding to tRNA genes, which disproportionately affects lower-occupancy loci. This is not accounted for by an overall reduction in Pol III abundance because the levels of its core subunits RPC1 and RPC2 decreased only modestly in neurons and CM (Extended Data Fig. 4h).

Chromatin remodelling at tRNA genes following differentiation

To determine how chromatin state impacts tRNA gene activity46,47,48, we profiled nucleosome-free regions (NFRs) using the assay for transposase-accessible chromatin with sequencing (ATAC–Seq)49 and performed ChIP–Seq for the trimethylation of histone H3 at K4 (H3K4me3; which flanks the transcription start sites (TSS) of active Pol II genes)50, K27 (H3K27me3; which marks Pol II genes repressed in specific cell states)51 and K9 (H3K9me3; which marks repeat-rich heterochromatin)52. We observed a marked concordance between the presence of H3K4me3 and RPC1 ChIP signal at tRNAs, in line with previous data46,47 (Fig. 5a). Although the NFR signal generally coincided with the bodies of RPC1-bound tRNA genes, it was a worse predictor of mature tRNA levels than RPC1 ChIP signal, particularly in differentiated cells (r² = 0.56 for NFR ATAC–Seq versus r² = 0.9 for RPC1 ChIP–Seq in neurons; Fig. 4c and Extended Data Figs. 4b and 5a,b). Selective loss of RPC1 occupancy coincided with a loss of NFR and H3K4me3 signals as well as the appearance of H3K27me3. By contrast, inactive tRNA genes were in closed chromatin with weak H3K9me3 enrichment and not marked by H3K4me3 or H3K27me3 (Fig. 5a and Extended Data Fig. 5a). Analysis of whole-genome bisulfite sequencing datasets from human embryonic stem cells53 revealed near-complete CpG methylation at inactive tRNAs (Extended Data Fig. 5c), suggesting that DNA methylation may contribute to their silencing.

Fig. 5: Transfer RNA gene body and upstream sequences govern differential Pol III recruitment.

a, Representative heatmaps (bottom) and metagene profiles (top) of the ChIP–Seq signal for RPC1, H3K4me3 (K4me3), H3K27me3 (K27me3), H3K9me3 (K9me3) and NFRs from ATAC–Seq around tRNA gene start sites (±1 kbp) for hiPSC and neurons, separated by tRNA gene activity. b, Distances between tRNA genes from different activity classes to their nearest tRNA gene. Box plots: centre line, median; box limits, upper and lower quartiles; whiskers, 1.5× the interquartile range. c, Relationship between RPC1 occupancy at tRNA genes (mean from n = 2 biological replicates) and the predicted tRNAScan-SE score, separated by tRNA gene activity. Dashed lines, median tRNAScan-SE score and RPC1 occupancy; solid blue line, 55-bit score tRNAScan-SE threshold for functional tRNAs. d, Two-dimensional binned kernel motif density of the A (left) and B (right) box of each tRNA gene (n = 558) separated by tRNA activity (centre line, median). Motif counts for density estimation were based on a 90% match to the consensus motif. Dot colour is used to indicate whether the tRNAScan-SE score is above or below the 55-bit threshold for predicted functionality (Wilcoxon test). e, Schematic of CRISPR–Cas9 editing to replace tRNA-Pro-TGG-2-1 with tRNA-Pro-TGG-1-1 (left). Fraction of tRNA-mapped RPC1 ChIP–Seq reads at wild-type and CRISPR-edited hiPSC and NPC (n = 2 biological replicates for each cell type; bar, median) at the indicated tRNA genes. f, Receiver operating characteristic for tRNet performance on test data for each task. g, Top three significant TF-Modisco sequence motifs (FDR-adjusted P ≤ 0.01) for housekeeping tRNA genes. h, Schematic of CRISPR–Cas9 editing to insert the 100-bp sequence upstream of tRNA-Pro-TGG-1-1 in front of tRNA-Pro-TGG-2-1 (left). Fraction of tRNA-mapped RPC1 ChIP–Seq reads at wild-type (data from e) and CRISPR-edited hiPSC and NPC (n = 2 biological replicates; bar, median) at the indicated tRNA genes (right). 
i, Mean expression of CADM3 in RNA-Seq datasets across cell types (n = 2 biological replicates). j, Representative plot of normalized ChIP–Seq signal for RPC1 ChIP, K4me3 ChIP and RNA-Seq signal surrounding the tRNA-Arg-TCT-4-1 and CADM3 genes in hiPSC and NPC as well as neurons. WT, wild-type; edit, CRISPR-edited. Source numerical data are provided.


Nearly half of predicted human tRNA genes cluster on chromosomes 1 and 6; we thus investigated whether nearby loci modulate Pol III occupancy. The majority (80%) of housekeeping and repressed tRNA genes were in proximity to other active tRNA genes (median distance of 0.96 × 10³ and 3.69 × 10³ base pairs (bp)), which could facilitate the concentration of active Pol III in transcription ‘factories’54 to enable its recycling. By contrast, inactive tRNAs were more distant from other tRNAs (median of 380.5 kbp; Fig. 5b). Although half of the tRNAs with gene-resolved RPC1 ChIP data were either within (234/558, 42%) or near coding genes (≤500 bp; 44/558, 8%), their linear distance from active or inactive Pol II genes was not related to RPC1 occupancy (Extended Data Fig. 5d). Active human tRNA genes are thus most often in close proximity to each other but, in contrast to previous reports46,55, we found no clear association between Pol III signal and nearby Pol II activity.

Sequence-dependent features underlie tRNA gene regulation

To test whether selective tRNA gene expression is driven by sequence-dependent mechanisms, we first examined the relationship between RPC1 occupancy and overall bit scores from the tRNA gene prediction tool tRNAScan-SE4. All housekeeping genes surpassed the 55-bit score threshold suggested to distinguish functional tRNAs based on anticodon–isotype congruence, A- and B-box consensus match and secondary structure4, whereas 131 of the 159 (82%) inactive genes fell below this threshold (Fig. 5c and Extended Data Fig. 6a). However, 45 tRNA genes with bit scores of <55 had RPC1 peaks in hiPSC and there were no detectable RPC1 peaks at 28 loci with bit scores of >55 (Fig. 5c), confirming that the tRNAScan-SE score alone is not an accurate predictor of tRNA gene expression potential43.

To quantify the contribution of A- and B-box promoters to differential RPC1 binding, we first compared both promoters separated by activity status (Extended Data Fig. 6b). In line with previous data from mouse liver16, we found a high degree of sequence similarity and only subtle sequence differences across the three tRNA gene groups. To quantify these differences, we defined a consensus sequence for each promoter based on all predicted human tRNA genes (Extended Data Fig. 6c). There was a significantly higher density of both the A- and B-box consensus sequences in housekeeping tRNAs relative to repressed and inactive tRNAs (Fig. 5d), which suggests that subtle differences in A- and B-box promoters contribute to differential Pol III occupancy across human cell types. To experimentally validate this prediction, we replaced the housekeeping tRNA-Pro-TGG-2-1 gene with the repressed tRNA-Pro-TGG-1-1 in hiPSC using clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR associated protein 9 (Cas9) gene editing. These isodecoders differ by three nucleotides, one of which is in the A box (Fig. 5e), and the RPC1 ChIP signal at tRNA-Pro-TGG-1-1 is lost following differentiation to NPC. Conversely, the RPC1 occupancy at tRNA-Pro-TGG-2-1 increased (Extended Data Fig. 6d). ChIP–Seq revealed a strong reduction in RPC1 occupancy at the edited tRNA-Pro-TGG-2-1 locus in both hiPSC and NPC but comparable signal at the neighbouring tRNA-Pro-AGG-2-4 and the unedited tRNA-Pro-TGG-1-1 (Fig. 5e), confirming the importance of gene body sequences for Pol III occupancy strength.

Interestingly, the RPC1 ChIP signal at the edited tRNA-Pro-TGG-2-1 locus still increased in NPC compared with hiPSC, indicating that gene body sequences are not the sole determinant of transcriptional activity. In many instances identical tRNA genes with different flanking sequences are indeed characterized as different activity classes and become selectively occupied by Pol III during differentiation (for example, tRNA-Tyr-GTA-5 copies; Extended Data Fig. 6e). In such cases, differences in the 5′ flanking sequence might result in differential TFIIIB recruitment. As transcription initiates at a variable distance from the tRNA gene start11, matching position weight matrix models would miss over-represented motifs in these regions. We therefore adapted the BPNet convolutional neural network (CNN) architecture56, which can predict the sequence specificity of DNA-binding factors from experimental data56,57,58,59, to build a CNN called tRNet for predicting BRF1 binding from upstream tRNA sequences (Extended Data Fig. 6f). Trained using a 200-bp 5′ flanking sequence and tRNA activity status (housekeeping, repressed and inactive), tRNet was highly accurate in predicting tRNA activity in unseen data (75–78% across all folds) and could confidently distinguish housekeeping (area under the receiver operating characteristic (AUROC) = 0.91) and inactive (AUROC = 0.92) tRNA genes relative to all other classes, whereas repressed tRNAs were comparably more difficult to classify (AUROC = 0.81; Fig. 5f). Sequence motif detection56,60 revealed that GC-rich sequences and polyA stretches in 5′ flanking regions drive the predictive ability of tRNet for housekeeping tRNAs (Fig. 5g). Consistent with this, gene activity predictions based on chromatin state have suggested a regulatory role for GC content around tRNA loci43, whereas a polyA stretch may enhance DNA binding by TFIIIB through its TATA-binding protein subunit. 
By contrast, the upstream regions of inactive genes were enriched for polyT stretches (Extended Data Fig. 6g), which constitute Pol III termination signals that inhibit tRNA transcription in vitro7. To experimentally test whether tRNA 5′ flanking sequences can alter Pol III binding, we inserted the 100-bp sequence preceding tRNA-Pro-TGG-1-1 (GC content of 30%) directly upstream of tRNA-Pro-TGG-2-1 (GC content of 60%) in hiPSC using CRISPR–Cas9. The RPC1 and BRF1 ChIP signals were reproducibly decreased at tRNA-Pro-TGG-2-1 in NPC harbouring this edit (Fig. 5h and Extended Data Fig. 6h), corroborating the predictions from tRNet. Collectively, our data indicate a combinatorial effect of intragenic and 5′ flanking sequence features in determining Pol III occupancy at individual human tRNA genes.
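Three quantities recur in this analysis: a numerical encoding of the 5′ flanking sequence, its GC content and the one-versus-rest AUROC used to score each activity class. The Python sketch below illustrates them under stated assumptions; it is not the tRNet implementation and does not reproduce the BPNet-derived architecture. The function names are hypothetical, and the choice to encode ambiguous bases as all-zero rows is an assumption.

```python
def one_hot(seq):
    """Encode a DNA string as rows of four indicators (A, C, G, T)."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:  # ambiguous bases such as N stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows


def gc_content(seq):
    """Fraction of G or C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def auroc(scores, labels):
    """One-vs-rest AUROC: chance that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For a three-class problem such as housekeeping, repressed and inactive, `auroc` would be called once per class with that class labelled as positive and the other two as negative.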

tRNA-Arg-TCT-4-1 is co-regulated with CADM3 in neurons

The tRNA-Arg-TCT-4-1 gene, which is upregulated in neurons, represented a rare case of strong selectivity given the significant decrease in RPC1 occupancy at all other tRNA genes in comparison with hiPSC (Fig. 4f). It was classified as a housekeeping gene based on the presence of a significant RPC1 ChIP peak in consensus sets for all cell types (Supplementary Table 2), suggesting that it is also active in non-neuronal cells. Given that its A- and B-box sequences are identical to those in other tRNA-Arg-UCU isodecoders, we investigated whether its genomic context drives increased expression in neurons. The human tRNA-Arg-TCT-4-1 locus is >2.25 Mbp from other tRNA genes but it is 30 kbp from the TSS of CADM3, whose expression is particularly high in neuronal cells in the brain and eye61 (Human Protein Atlas, https://www.proteinatlas.org/). The genomic co-localization of CADM3 and tRNA-Arg-TCT-4-1 is conserved across vertebrates and the levels of CADM3 mRNA mirrored the tRNA-Arg-TCT-4-1 expression pattern we observed during hiPSC differentiation, with a strong upregulation specifically in neurons (Fig. 5i and Extended Data Fig. 1h).

As very few tRNA genes in mice and humans are in locations with conserved synteny like tRNA-Arg-TCT-4-1 (refs. 6,15,62), we thus considered that tRNA-Arg-TCT-4-1 could overlap with a distal cis-regulatory element of CADM3. Comparison of the tRNA-Arg-TCT-4-1 loci in mice and humans revealed a striking conservation not only of the tRNA gene body but also of a 140-bp region upstream (99% sequence identity; Extended Data Fig. 7a). Inspection of the GeneHancer database revealed that a neuron-specific in vivo-transcribed enhancer overlapping human tRNA-Arg-TCT-4-1 has been predicted based on cap analysis of gene expression (CAGE) data from the FANTOM5 panel of samples, with CADM3 as one of its potential targets63,64 (Fig. 5j). In accordance with enhancer-based regulation, the CADM3 mRNA levels were very low in hiPSC and NPC despite a high H3K4me3 ChIP signal at the TSS of the gene (Fig. 5j), which could be due to Pol II pausing. Pol II has indeed been found in a paused state at the Cadm3 promoter in NPC from the developing mouse cortex, with pausing relieved in their daughter neurons65. Neuron-specific CADM3 enhancer activation could thus potentiate Pol III transcription of the overlapping tRNA-Arg-TCT-4-1 by establishing a permissive chromatin state, which would account for the exceptionally high levels of tRNA-Arg-UCU-4 in the central nervous system34. Overall, 55 human tRNA genes overlap with predicted enhancers64, although only half of these enhancers (27) are transcribed based on FANTOM5 CAGE (Supplementary Table 3). Among other tRNA genes overlapping transcribed enhancers, tRNA-Lys-TTT-3-1 and tRNA-Lys-TTT-3-2 may also be co-regulated with enhancer targets in NPC and neurons (Extended Data Fig. 7b) but this seems to be a rare regulatory mechanism based on our dataset.

Selective tRNA gene repression is not driven by RPC7α loss

We investigated whether the selective repression of a tRNA gene subset following differentiation is linked to changes in Pol III composition. The human Pol III complex comprises 17 subunits66, one of which (RPC7) has two isoforms (RPC7α and RPC7β) encoded by two gene paralogues (POLR3G and POLR3GL). High RPC7α levels are a hallmark of embryonic stem cells and cancer; in healthy differentiated cells, RPC7α is largely replaced by RPC7β67,68,69,70. Accordingly, the levels of POLR3G mRNA and RPC7α protein were strongly decreased in NPC and nearly undetectable in neurons and CM (Extended Data Fig. 8a,b). The switch from RPC7α to RPC7β in Pol III thus coincides temporally with selective tRNA repression (Fig. 4d). We identified 294 consensus tRNA peaks in RPC7α ChIP–Seq from hiPSC, 292 of which were shared with RPC1 consensus peaks (Extended Data Fig. 8c). In contrast to 200 of 205 housekeeping tRNA genes (98%), only 93 of 194 repressed tRNA loci (48%) had significant RPC7α peaks (Extended Data Fig. 8d,e), indicating that RPC7α-containing Pol III is not preferentially enriched at these loci. We also found no significant changes in the RPC1 ChIP signal at tRNA genes in hiPSC depleted for RPC7α by inducible CRISPR interference71 (CRISPRi; Extended Data Fig. 8f,g), indicating that selective tRNA gene repression following differentiation does not result from RPC7α loss.

Selective tRNA repression correlates with MAF1 activation

We next focused on the Pol III repressor MAF1, which is kept inactive through phosphorylation by mTORC1 (refs. 72,73). Following a decrease in mTORC1 signalling triggered by low nutrient availability, MAF1 becomes dephosphorylated and inhibits Pol III (ref. 74). The gel migration pattern of MAF1 suggested that it is mostly phosphorylated in hiPSC (Fig. 6a), consistent with the requirement for high mTORC1 activity for pluripotency maintenance75. mTORC1 activity (measured by S6K1 and 4E-BP1 phosphorylation) was strongly diminished in differentiated cells, consistent with previous studies of human embryonic stem cell differentiation76 and mouse neurogenesis77. In agreement with this, MAF1 from differentiated cells migrated faster, which is indicative of phosphorylation loss. Interestingly, a small fraction of MAF1 remained partially phosphorylated when hiPSC were treated with the mTORC1 inhibitor rapamycin and MAF1 from NPC exhibited a similar pattern (Fig. 6b), indicating that phosphorylation at one or more sites in MAF1 (S60, S68 and S75)72 may be less sensitive to this drug. Diminished mTORC1 activity in differentiated cells thus activates MAF1 by altering its phosphorylation status.

Fig. 6: Diminished mTORC1 signalling in differentiated cells triggers MAF1-dependent selective tRNA gene repression.

a, Immunoblots of MAF1, phospho-S6K1 (S6K1-P) and phospho-4E-BP1 (4E-BP1-P) in hiPSC, NPC, neurons and CM (n = 3 biological replicates). Samples from both untreated and Torin 1-treated HEK293T cells (250 nM; 1 h) served as controls for mTOR inhibition. b, Immunoblot of a Phos-tag gel for MAF1 in hiPSC, NPC and hiPSC treated with 10 nM rapamycin for 8 h (n = 2 biological replicates). Vinculin (VCL) served as a loading control. c, Immunoblots of MAF1 in CRISPRi lines carrying an sgRNA targeting MAF1. Gene knockdown was induced by the addition of 2 µM doxycycline (Dox) for 3 d (hiPSC; top) or 6 d (NPC; bottom). For the hiPSC>NPC samples (middle), 2 µM Dox was added to MAF1 sgRNA-containing hiPSC for 3 d, followed by NPC derivation under continuous Dox treatment (n = 2 biological replicates). d, MA plots (generated by DiffBind) of spike-in-normalized RPC1 counts over tRNA features (±125 bp) versus the log2-transformed fold change for Dox-induced hiPSC (top), hiPSC>NPC (middle) and NPC (bottom) samples carrying an sgRNA targeting MAF1 relative to uninduced controls (n = 2 biological replicates for each cell type). Significantly higher and lower occupancies (FDR ≤ 0.05) are shown in green and purple, respectively. e, Heatmap of RPC1 ChIP–Seq occupancy changes following MAF1 depletion by inducible CRISPRi (+Dox) relative to uninduced controls (−Dox). RPC1 ChIP–Seq read counts over extended tRNA features (±125 bp; left). Normalized signal, accounting for estimated library sizes, was generated from these counts scaled to RPM. DiffBind differential occupancy analysis using spike-in normalization for induced samples relative to the corresponding uninduced controls reported as the log2-transformed fold change (right; n = 2 biological replicates for each cell type and condition; FDR-adjusted P ≤ 0.05). f, Model of selective tRNA expression following hiPSC differentiation. Rep, replicate. Source numerical data and unprocessed blots are provided.

To experimentally test whether MAF1 activation mediates selective tRNA gene repression, we used inducible CRISPRi to perform MAF1 knockdown in hiPSC and NPC as well as before NPC derivation from hiPSC. MAF1 depletion did not alter the levels of RPC1 (Fig. 6c) and only modestly increased Pol III occupancy at 39 tRNA genes in hiPSC. By contrast, more than 100 loci had significantly higher RPC1 ChIP signal strength in NPC derived in the absence of MAF1 (n = 109) or depleted of MAF1 after derivation (n = 110), with effect sizes that were primarily >fourfold, and up to approximately 30-fold, higher (Fig. 6d). Remarkably, nearly all of these genes belong to the set of tRNAs that are repressed following differentiation (Fig. 6e). None of the inactive tRNA genes gained significantly more RPC1 ChIP signal in MAF1-depleted NPC and only eight did so in hiPSC (Fig. 6e). The RPC1 ChIP signal strength at housekeeping tRNA genes also remained largely unaffected by MAF1 depletion. These data indicate that cell type-specific human tRNA repertoires are established in a MAF1- and mTORC1-dependent manner (Fig. 6f).
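The quantities behind Fig. 6d,e (reads per million, log2 fold change and an FDR cutoff) reduce to simple arithmetic, sketched below in Python. This is an illustration only: DiffBind applies spike-in normalization and its own statistical model, neither of which is reproduced here. The pseudocount and all function names are assumptions introduced for the example.

```python
import math


def rpm(count, library_size):
    """Scale a raw read count to reads per million (RPM)."""
    return count * 1e6 / library_size


def ma_point(induced, control, pseudocount=1.0):
    """Return (A, M): mean log2 signal and log2 fold change.

    The pseudocount (an assumption) avoids taking the log of zero.
    """
    a = math.log2(induced + pseudocount)
    b = math.log2(control + pseudocount)
    return (a + b) / 2, a - b


def is_significant(fdr, cutoff=0.05):
    """Apply the FDR <= 0.05 cutoff used in the figure."""
    return fdr <= cutoff


# A fourfold increase corresponds to M = 2 on the log2 scale.
print(ma_point(31, 7))
```

On this scale, the effect sizes of more than fourfold and up to approximately 30-fold described above correspond to M values between 2 and roughly 4.9.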

Discussion

Despite their crucial importance for faithful and efficient mRNA decoding, the composition of tRNA pools in human cells and their regulation have remained poorly defined due to technical limitations. Understanding this regulation is critical for identifying the molecular triggers of human diseases caused by tRNA dysregulation3,78 as well as for the design of effective mRNA- and tRNA-based therapeutics79,80. By applying orthogonal methods in hiPSC-based models, we show that despite extensive remodelling of tRNA repertoires, the levels of mature tRNAs with specific anticodons are maintained largely stable across diverse human cell types. This is mediated by constitutively high transcription of one-third of the predicted human tRNA genes, which we define here as housekeeping. These genes have distinct intragenic promoters and 5′ flanking sequences, and their products comprise the most abundant mature transcripts in each tRNA anticodon family. Housekeeping tRNA genes are largely resistant to MAF1-mediated Pol III repression, which we identify as the mechanism for silencing low-occupancy tRNA loci on differentiation. We propose that the maintenance of stable tRNA anticodon pools and global codon usage across cell types ensures consistent decoding rates throughout development, independently of cell identity.

By combining Pol III ChIP–Seq with high-resolution tRNA quantification in homogeneous populations of distinct isogenic and untransformed human cell types, we found that differences in Pol III occupancy explain nearly all of the variation in mature tRNA levels (r² = 0.9). This extraordinary concordance between two completely orthogonal workflows further underscores the quantitative nature of tRNA abundance measurements by mim-tRNAseq21,22. Whereas Pol III ChIP–Seq requires highly specific antibodies that are unavailable for most organisms, profiling mature tRNA repertoires with mim-tRNAseq is much more broadly applicable and we anticipate that it will help uncover other fundamental aspects of tRNA regulation.

The distinct A and B boxes and 5′ flanking sequence motifs of housekeeping tRNA genes may favour Pol III recruitment or facilitate its recycling at these loci, enabling their escape from MAF1-mediated repression during differentiation. A similar mechanism could account for the protection of some highly transcribed tRNA genes from stress-induced MAF1 inhibition in yeast81, mice82 and human fibroblasts83. The broader tRNA repertoires we found transcribed in cells with high mTORC1 activity, which is a hallmark of pluripotency but also of many cancers29, result in tRNA pools with a more diverse isodecoder composition. However, tRNA isodecoder diversity has surprisingly minor effects on decoding speed. We instead found that the relative abundance of tRNA anticodon families, which remains largely unchanged across cell types, determines translation elongation rates at different codons. In physiological contexts, a stable tRNA anticodon supply during development—maintained by housekeeping tRNA gene transcription—would minimize the potential for ribosome errors and protein misfolding that could result from decoding rate fluctuations84,85.

Why are active tRNA gene sets restricted during differentiation? Given that tRNAs are highly abundant, their synthesis is energetically costly and restriction of Pol III to housekeeping tRNA genes via MAF1 may help maintain tRNA anticodon pools in differentiated cells while conserving resources. In line with this, Maf1−/− mice are viable but have a lean phenotype and increased energy expenditure86. MAF1 plays a role in mouse adipogenesis87 and osteoblast differentiation88, but whether this is through tRNA repertoire reprogramming is difficult to dissect, given that the protein also inhibits Pol III-mediated transcription of 5S ribosomal RNA5. This coupling of rRNA and tRNA biogenesis may also serve to maintain the overall stoichiometry between ribosomes and tRNA molecules in cell types with distinct global translation demands.

Despite the strong correlation between H3K4me3 and RPC1 ChIP signals at tRNA genes in our datasets and previous studies46, we found no clear association of tRNA gene activity with Pol II transcription of nearby coding genes. However, in very rare cases—such as we propose for tRNA-Arg-TCT-4-1—an overlap with an enhancer element may boost the expression of an individual tRNA gene in specific cell contexts. Long-range regulatory DNA interactions, rather than linear distance to Pol II genes, could thus modulate the expression of specific tRNA genes in defined cell types. We found some evidence for a similar mode of regulation for tRNA-Lys-TTT-3-1 and tRNA-Lys-TTT-3-2 in NPC and neurons, and it remains possible that other tRNA loci we found to overlap with predicted enhancers may be differentially expressed in cellular contexts where these enhancers are active.

Methods

Cell culture and hiPSC differentiation

HEK293T/17 (American Type Culture Collection, CRL-11268) and Lenti-X 293T (Takara Bio, 632180) cells were cultured in DMEM high-glucose medium supplemented with 10% fetal calf serum (FCS) at 37 °C with 5% CO2. The HPSI0214i-kucg_2 and HPSI0214i-wibj_2 reference hiPSC lines23 were obtained from the European Collection of Authenticated Cell Cultures (catalogue number 77659901) and cultured in mTeSR Plus on Geltrex-coated plates at 37 °C with 5% CO2. Neurons and NPC were derived from HPSI0214i-kucg_2 cells using small molecules as previously described32,33. For NPC differentiation, hiPSC were cultured to 90% confluency and cut by scratching a chequered pattern into the dish with a cannula, followed by incubation with collagenase IV for 10–15 min at 37 °C. Cell clusters were carefully scratched off the plate, transferred to a 15 ml tube containing Neurobasal (N2B27) medium (Gibco, 21103049)—DMEM/F12 (Gibco, 21331020) 50:50, 0.5×N2 (Thermo Fisher Scientific, 17502048), 0.5×B27 (Thermo Fisher Scientific, 12587010), 2 mM GlutaMAX (Gibco, 35050061)—and pelleted by gravity. Cell clusters were washed once with N2B27 medium and transferred into NPC-induction medium (N2B27 with 200 µM ascorbic acid (Sigma-Aldrich, A4403), 3 µM CHIR99021 (Axon Medchem, Axon1386), 0.5 µM purmorphamine (Santa Cruz Biotechnology, sc-202785A), 150 nM dorsomorphin (Absource, S7306) and 10 µM SB431542 (Biomol, Cay12031)) with 5 µM ROCK inhibitor (Y-27632; Stemcell Technologies, 72305) in a sterile dish without coating, to allow embryoid body formation, and incubated at 37 °C with 5% CO2. The medium was exchanged every 2 d with NPC-induction medium without Y-27632. On day six, the embryoid bodies were dissociated into single cells by pipetting and plated into a Geltrex-coated well in NPC expansion medium (N2B27 with 200 µM ascorbic acid, 3 µM CHIR99021 and 0.5 µM purmorphamine). The medium was changed every other day. 
To remove non-NPC cells, a sequential digest was performed during the first passages using Accutase. Standard passaging was performed as for hiPSC single-cell passaging every 5 d at a ratio of 1:10.

For the differentiation of NPC to neurons33, cells were singularized with Accutase and 1 × 10⁶ cells were seeded into a six-well plate containing patterning medium (N2B27 with 200 µM ascorbic acid, 1 µM retinoic acid (Sigma-Aldrich, R2625), 0.5 µM purmorphamine and 10 ng ml−1 of both GDNF and BDNF (Peprotech, 450-10 and 450-02)). The cells were cultured for 6 d with a medium change every other day. On day six, the medium was changed to maturation medium (MM; N2B27 with 200 µM ascorbic acid, 100 µM dbcAMP (Sigma-Aldrich, D0627), 5 ng ml−1 GDNF and BDNF, and 1 ng ml−1 TGF-β3 (Peprotech, AF-100-36E)) with 5 ng µl−1 Activin A (Life Technologies, PHG9014). After 2 d, the medium was exchanged with MM without Activin A. The cells were maintained in plates for another 10 d, with medium exchanges every 2–3 d. On day 16, the cells were detached with Accutase, resuspended in MM, pelleted by centrifugation for 5 min at 200g and transferred to a new plate. CompE (0.1 µM; Merck, 565790) was added to the medium on day 19 to enhance neuronal maturation and the cells were harvested on day 21.

Cardiomyocytes were derived from HPSI0214i-kucg_2 hiPSC as previously described30,31, with some modifications. Accutase was used to dissociate hiPSC into single cells, which were then seeded on Matrigel-coated plates in day 0 differentiation medium (KO-DMEM (Gibco, 10829-018); 2 mM l-glutamine (Gibco, 25030-024); insulin, transferrin and selenious acid (5 µg ml−1 each; ITS; Corning, 354351); 10 ng ml−1 FGF2 (Peprotech, 100-18B-250); 1 µM CHIR99021 (Axon, 1386); 1 ng ml−1 BMP-4 (R&D, 314-BP-010); 5 ng ml−1 Activin A (Life Technologies, PHG9014) and 10 µM Y-27632). The medium was changed to transferrin/selenium medium (KO-DMEM, 2 mM l-glutamine, 5.5 µg ml−1 human transferrin (Sigma-Aldrich, TS8158-100mg), 6.7 ng ml−1 sodium selenite (Sigma-Aldrich, 214485) and 250 µM ascorbic acid (Sigma-Aldrich, A4403-100mg)) after 1 d. On days 2 and 3, the medium was replaced with transferrin/selenium medium supplemented with 0.2 µM WNT-inhibitor C59 (Tocris, 5148). The medium was exchanged daily until day 9. To enrich for CM, the cells were then starved of glucose for 1 d in glucose-free transferrin/selenium medium (DMEM without glucose (Gibco, A13320-01), 2 mM l-glutamine, 5.5 µg ml−1 human transferrin, 6.7 ng ml−1 sodium selenite, 250 µM ascorbic acid and 4 mM lactic acid (Sigma L4263-100ml))31. On day 10, the cells were dissociated with Accutase and plated on Matrigel-coated wells in CM-MM medium (KO-DMEM, 2% FCS (Gibco, 16000-044), 2 mM l-glutamine and 10 µM Y-27632). The following day the medium was replaced with fresh CM-MM without Y-27632, which was exchanged every 2 d until the cells were harvested on day 15.

HEK293T/17 and Lenti-X 293T cells were cultured (at 37 °C with 5% CO2) in DMEM high-glucose medium supplemented with 10% FCS and passaged using 0.25% trypsin in EDTA every other day at a ratio of 1:10–1:20.

Generation of an inducible CRISPRi hiPSC line

HPSI0214i-kucg_2 cells were engineered to express KRAB-dCas9 from a doxycycline-inducible promoter at the AAVS1 locus71 using pAAVS1-PDi-CRISPRn (a gift from B. Conklin; Addgene, plasmid 73500; http://n2t.net/addgene:73500; RRID: Addgene_73500). The cultures were selected with 100 µg ml−1 G418 until stable colonies originating from single cells formed. The colonies were picked and screened for heterozygous insertion by PCR using two primers flanking the AAVS1 locus (5′-CGAGAGCTCAGCTAGTCTTC-3′ and 5′-CTCTCCCTCCCAGGATCC-3′) and an additional primer binding the insert (5′-GTTCATTCAGGGCACCGGAC-3′). KRAB-dCas9 expression in positive clones was assessed by flow cytometry and immunoblotting after the addition of 2 µM doxycycline. Genome integrity was verified by G-band analysis of expanded clones.

CRISPR–Cas9 genome editing

CRISPR RNA and single-stranded oligodeoxynucleotide templates were obtained from IDT. Guide RNAs were assembled by annealing the CRISPR RNA (5′-UGUGGGCCAAGGCUAGGGAGGUUUUAGAGCUAUGCU-3′ for the Pro-TGG-2 gene body edit and 5′-UUGCUCAGCAGAUGGCUCGUGUUUUAGAGCUAUGCU-3′ for the Pro-TGG-2 upstream region edit) with trans-activating CRISPR RNA (IDT) in equimolar ratios at 95 °C for 5 min. The ribonucleoprotein complex was assembled by mixing 100 pmol guide RNA with 50 pmol Alt-R HiFi Cas9 (IDT) and incubating at room temperature for 20 min. HPSI0214i-kucg_2 cells were dissociated into single cells using Accutase and nucleofected with the ribonucleoprotein complex and HDR donor oligonucleotide in P3 solution (Lonza) using the CA137 programme in a Nucleocuvette strip. The cells were re-plated in mTeSR Plus supplemented with 1:10 CloneR (Stemcell Technologies, 05888). The medium was exchanged with mTeSR Plus every 2 d. Colonies were picked and expanded, and homozygous edited clones were identified by PCR amplification of genomic DNA and Sanger sequencing.

CRISPRi sgRNA design

Single guide RNAs (sgRNAs) were designed to target the TSS of genes in the GENCODE v19 annotation using an adapted workflow of the CRISPRiaDesign protocol (https://github.com/mhorlbeck/CRISPRiaDesign). To incorporate information about single nucleotide polymorphisms (SNPs) in the HPSI0214i-kucg_2 genome, we used GATK haplotype calls for the cell line (ftp://ftp.sra.ebi.ac.uk/vol1/ERZ447/ERZ447992/) and extracted variant sites only using gvcftools extract_variants v0.17.0. The resulting genomic variant call format (VCF) file was indexed and genotypes were called using GATK GenotypeGVCFs v4.1.0.0. From this, only SNPs were retained so as to preserve genomic context and position information between GRCh37 and our custom genome. This was achieved using GATK SelectVariants v4.1.0.0 with the -select-type SNP parameter. We then replaced nucleotides in the reference GRCh37 genome with the called genotype SNPs by generating a sequence dictionary from the reference genome using Picard CreateSequenceDictionary v2.17.10 and supplying the SNP VCF to GATK FastaAlternateReferenceMaker v4.1.0.0. To train an elastic net linear regression model for sgRNA activity predictions, this custom genome was used in combination with other supplied training data from the CRISPRia pipeline, including sgRNA activity scores and TSS predictions (https://github.com/mhorlbeck/CRISPRiaDesign/tree/master/data_files) and our own ATAC–Seq data from HPSI0214i-kucg_2 cells as a proxy for chromatin accessibility. After activity score prediction, off-targets were predicted per sgRNA as described89.
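Retaining only SNPs keeps the custom genome in the same coordinate system as GRCh37 because a single-base substitution never changes sequence length, whereas an insertion or deletion would shift every downstream position. The toy Python function below illustrates this idea only. It is not how GATK FastaAlternateReferenceMaker is implemented, and the function name and the `(position, ref, alt)` tuple layout are hypothetical.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with SNP alleles substituted.

    `snps` holds (1-based position, reference allele, alternate allele)
    tuples. Only single-base substitutions are accepted, so the output
    has the same length and coordinates as the input.
    """
    bases = list(reference)
    for pos, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError("only single-nucleotide substitutions are supported")
        if bases[pos - 1].upper() != ref.upper():
            raise ValueError(f"reference mismatch at position {pos}")
        bases[pos - 1] = alt
    return "".join(bases)


# Toy example: two substitutions on an eight-base sequence.
custom = apply_snps("ACGTACGT", [(2, "C", "T"), (8, "T", "G")])
print(custom, len(custom))
```

Because the output length matches the input, annotations such as TSS coordinates defined on the reference remain valid on the substituted sequence.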

CRISPRi knockdown

POLR3G- and MAF1-targeting sgRNAs (5′-GGACTCGCCGGAGCGCTCTG-3′ and 5′-GGTGCCGGCCGGCAAGGAAA-3′) were cloned in pU6-sgRNA EF1α-Puro-T2A-GFP by Gibson assembly. This plasmid was constructed by replacing BFP with GFP in pU6-sgRNA EF1α-Puro-T2A-BFP (a gift from J. Weissman; Addgene plasmid 60955; http://n2t.net/addgene:60955; RRID:Addgene_60955). Lentivirus stocks were produced by co-transfection of the resulting plasmid with packaging plasmids (gifts from D. Trono; pMDLg/pRRE, Addgene, plasmid 12251, http://n2t.net/addgene:12251, RRID:Addgene_12251; pRSV-Rev, Addgene, plasmid 12253, http://n2t.net/addgene:12253, RRID:Addgene_12253; and pMD2.G, Addgene, plasmid 12259, http://n2t.net/addgene:12259, RRID:Addgene_12259) into Lenti-X 293T cells with TransIT-Lenti transfection reagent (Mirus, MIR6603) following the manufacturer’s instructions. Viral supernatant was harvested 48–72 h after transfection, filtered through a 0.45 μm polyvinylidene fluoride syringe filter and then precipitated overnight with Lentivirus precipitation solution (Alstembio, VC125) at 4 °C. Virus stocks were concentrated tenfold in cold PBS, aliquoted and stored at −80 °C.

Lentiviral transduction of hiPSC was performed by adding thawed lentivirus stock mixed with fresh medium to plates, followed by an incubation of 10 min at 37 °C with 5% CO2 and the addition of trypsinized cells. The cells were incubated with lentivirus for 2 d before splitting and selection with 2.5 μg ml−1 puromycin for 2–3 d. NPC were transduced as per the protocol for hiPSC, except that the incubation with lentivirus was reduced to 1 d and performed in the absence of doxycycline. The cells were then selected with 2.5 μg ml−1 puromycin in the presence or absence of doxycycline until >80% of the cells were GFP-positive.

RNA isolation

Cells were lysed in lithium dodecyl sulfate (LiDS)/LET buffer (5% LiDS in 20 mM Tris, 100 mM LiCl, 2 mM EDTA, 5 mM dithiothreitol (DTT) pH 7.4 and 100 μg ml−1 proteinase K). The lysates were incubated at 60 °C for 10 min, pushed ten times through a 1 ml syringe with a 26 G needle and mixed by vortexing. Two volumes of cold acid phenol (pH 4.3), 1/10 volume 1-bromo-3-chloropropane and 50 µg glycogen (Thermo Fisher Scientific, AM9510) were added. The samples were mixed vigorously by vortexing, followed by centrifugation at 10,000g and 4 °C. The aqueous phase was transferred to a new tube and the phenol and 1-bromo-3-chloropropane extraction was repeated. RNA was then precipitated from the aqueous phase by the addition of three volumes of 100% ethanol and incubation at −20 °C for 30 min. The pellets were washed with 80% ethanol, air-dried and resuspended in RNase-free water. The RNA concentration was measured using a Nanodrop system and the samples were stored at −80 °C.

Northern blotting

For each sample, 0.5 µg total RNA was separated on denaturing gels (10% polyacrylamide in 7 M urea and 1×TBE). The RNA was transferred to Immobilon NY+ membranes (Millipore) in 1×TBE at 4 mA cm−2 for 40 min using a TransBlot Turbo apparatus (Bio-Rad) and crosslinked at 0.04 J in a Stratalinker ultraviolet light crosslinker. The membranes were incubated at 80 °C for 1 h and pre-hybridized at 55 °C for 4 h in hybridization buffer (20 mM Na2HPO4 pH 7.2, 5×SSC, 7% SDS, 2×Denhardt’s solution and 40 μg ml−1 sheared salmon sperm DNA). This was followed by overnight hybridization with 10 pmol 5′-end 32P-labelled probes (tRNA-Gly-CCC-2, 5′-CGGGTCGCAAGAATGGGAATCTTGCATGATAC-3′; tRNA-Arg-UCU-4, 5′-CGGAACCTCTGGATTAGAAGTCCAGCGCGCTCGTCC-3′ and tRNA-Asn-GUU-1, 5′-CGTCCCTGGGTGGGATCGAACC-3′) in hybridization buffer. Finally, the membranes were washed three times in 25 mM Na2HPO4 pH 7.5, 3×SSC, 5% SDS and 10×Denhardt’s solution, washed once in 1×SSC and 10% SDS, and exposed to PhosphorImager screens scanned on a Typhoon FLA 9000 (GE Healthcare). The band intensity was quantified using ImageJ.

Immunoblotting

Cells were lysed in RIPA buffer (20 mM Tris pH 7.5, 150 mM NaCl, 1% NP-40, 0.5% sodium deoxycholate and 0.1% SDS) supplemented with 10 µg ml−1 aprotinin, 20 µM leupeptin, 2.5 µM pepstatin A, 0.5 mM 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride and 1×phosphatase inhibitor cocktail (Cell Signaling Technologies, 5870). The protein concentration was determined using a Pierce BCA protein assay kit (Thermo Fisher Scientific, 23225). For each sample, 20 μg total protein was resolved by SDS–PAGE on 10% gels supplemented with 0.5% 2,2,2-trichloroethanol (Sigma, T54801) or on pre-cast 4–12% bis-Tris polyacrylamide gels (Life Technologies) in Bolt MES SDS running buffer (Invitrogen, B0002). Total protein stained with 2,2,2-trichloroethanol was imaged by ultraviolet light illumination on a ChemiDoc system (Bio-Rad). The proteins were then transferred to a nitrocellulose membrane (Amersham, 10600015). For visualizing total protein in pre-cast gels, the membranes were stained with Ponceau S solution (0.5% Ponceau S and 1% acetic acid) for 3 min at room temperature with gentle shaking and imaged on a ChemiDoc system (Bio-Rad) after rinsing with distilled water. The membranes were blocked in 5% milk in PBS-0.1% Tween-20 for 1 h, followed by overnight incubation at 4 °C with primary antibodies. The primary antibodies used for immunoblotting were anti-phospho-p70 S6 kinase (1:1,000; Cell Signaling Technologies, 9206S), rabbit anti-phospho-4E-BP1 (1:1,000; Cell Signaling Technologies, 2855T), mouse anti-Pol III RPC32/RPC7α (1:1,000; Santa Cruz Biotechnology, sc-21754), mouse anti-MAF1 (1:1,000; Santa Cruz Biotechnology, sc-515614 X), mouse anti-POLR3B/RPC2 (1:1,000; Santa Cruz Biotechnology, sc-515362), rabbit anti-POLR3A/RPC1 (1:1,000; Cell Signaling Technologies, 12825) and rabbit anti-vinculin (1:1,000; Cell Signaling Technologies, 13901). 
The membranes were then incubated with horseradish peroxidase (HRP)-labelled secondary antibodies (1:4,000; anti-rabbit IgG–HRP or anti-mouse IgG–HRP; Dianova, 111-035-003 and 115-035-003, respectively) at room temperature for 1 h. Proteins were visualized by chemiluminescence using SuperSignal west pico PLUS (Thermo Fisher Scientific, 34577) and imaged on an iBright system (Thermo Fisher Scientific).

For S6K1 and 4E-BP1 immunoblotting, the membranes were first probed with phospho-specific antibodies (1:1,000; anti-phospho-4E-BP1 and anti-phospho-p70 S6 kinase (T389); Cell Signaling Technologies, 2855T and 9206S, respectively). The membranes were stripped by two rounds of incubation in 25 ml Restore western blot stripping buffer (Thermo Fisher Scientific, 21059) for 15 min at room temperature with gentle shaking. This was followed by another round of blocking and the membranes were re-probed with anti-4E-BP1 (1:2,000; Cell Signaling Technologies, 9644) and anti-p70 S6 kinase (1:2,000; Cell Signaling Technologies, 2708T).

For Phos-tag immunoblotting, 20 µg total protein from each sample was mixed with 4×Laemmli sample buffer (Bio-Rad, 161-0747) supplemented with 25 mM DTT and boiled for 10 min at 95 °C. The denatured samples were run on Phos-tag gels (8% acrylamide in bis solution 29:1; 0.375 M Tris–HCl, pH 8.8, 20 µM Phos-tag (Wako, AAL-107) and 40 µM MnCl2) in 1×Tris/glycine/SDS running buffer (Bio-Rad, 1610732). The gels were washed twice with gentle shaking in transfer buffer (25 mM Tris, 192 mM glycine and 10% methanol) containing 1 mM EDTA (10 min each wash), followed by two washes (10 min each wash) in transfer buffer without EDTA. The proteins were transferred to polyvinylidene fluoride membranes (Amersham, 10600021) overnight in 25 mM Tris, 192 mM glycine and 10% methanol at 35 V and room temperature. The membrane was blocked in 5% milk in PBS-0.1% Tween-20 for 1 h, followed by overnight incubation with mouse anti-MAF1 (1:1,000; Santa Cruz Biotechnology, sc-515614 X) at 4 °C and anti-mouse IgG–HRP (1:4,000; Dianova, 115-035-003) at room temperature for 1 h. The proteins were visualized by chemiluminescence using SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Fisher Scientific, 34094) and imaged on an iBright system.

Immunostaining

Cells were cultured on glass-bottomed dishes (ibidi, 80827). For staining, the cells were washed with PBS and fixed in 3.7% formaldehyde for 10 min at room temperature. The formaldehyde was exchanged stepwise with PBS-0.02% Tween-20, followed by three complete washes with PBS. NPC and hiPSC were permeabilized with 0.5% Triton X-100 in PBS-0.02% Tween-20 for 10 min and blocked for 1 h in blocking solution (3% BSA and 0.1% Triton X-100 in PBS). The cells were incubated overnight at 4 °C with the primary antibody diluted in blocking solution (POU5F1 C-10, Santa Cruz Biotechnology, sc-5279, 1:400; SOX2 E-4, Santa Cruz Biotechnology, sc-365823, 1:200; NANOG P1-2D8, deposited to the Developmental Studies Hybridoma Bank (DSHB) by Common Fund Protein Capture Reagents Program (DSHB Hybridoma Product PCRP-NANOGP1-2D8), 1:200; PAX6, Abcam ab5790, 1:200; Nestin, R&D Systems, MAB1259, 1:200). After three washes with PBS-0.02% Tween-20, the cells were incubated with secondary antibody diluted in blocking solution for 1 h at room temperature (goat anti-mouse–Alexa Fluor 488, 1:2,000; goat anti-rabbit–Alexa Fluor 488, 1:2,000 or goat anti-mouse–Alexa Fluor 633, 1:500; Thermo Fisher Scientific, A-11001, A-11034 and A-21052, respectively). The cells were washed another three times in PBS-0.02% Tween-20 before imaging, and 4,6-diamidino-2-phenylindole (DAPI) was added during the second wash step (1:1,000). Neurons were permeabilized for 10 min in PBS-0.7% Tween-20 and blocked for 1 h in neuron blocking solution (1% BSA, 0.1% Triton X-100 and 10% FCS in PBS). The cells were washed once in 0.1% BSA in PBS and incubated overnight with the primary antibody diluted in PBS containing 1% BSA at 4 °C (anti-MAP2, 1:1,000 and anti-CHAT, 1:200; Abcam, ab92434 and ab6168, respectively). 
After three washes in PBS containing 0.1% BSA, the cells were incubated with the secondary antibody diluted in PBS containing 1% BSA for 1 h at room temperature (goat anti-rabbit A633, 1:500 and goat anti-chicken A488, 1:2,000; Thermo Fisher Scientific, A-21070 and A-11039, respectively), followed by another three washes with 1% BSA in PBS-0.05% Tween-20, with DAPI added during the second wash step (1:1,000). Cardiomyocytes were blocked and permeabilized in blocking solution (3% BSA and 0.1% Triton X-100 in PBS) for 1 h at room temperature. After three washes with PBS-0.1% Tween-20, the cells were incubated overnight at 4 °C with primary antibody (anti-α-actinin-2, Sigma-Aldrich, A7811, 1:800 and anti-cardiac troponin T, CT3, deposited to the DSHB by Lin, J. J. -C., 1:5) diluted in staining solution (1% BSA and 0.1% Tween in PBS). After three washes with PBS-0.1% Tween-20, the cells were incubated with the secondary antibody (goat anti-mouse–Alexa Fluor 488, Thermo Fisher Scientific, A-11001, 1:2,000) and DAPI (1:1,000) diluted in staining solution for 1 h at room temperature in the dark. The cells were washed three times in PBS-0.1% Tween-20 and imaged in PBS.

RNA-Seq library construction

A total of 250 ng of the same total RNA used for mim-tRNAseq library preparation was used for mRNA-Seq library construction with a Zymo-Seq RiboFree total RNA library kit (Zymo Research, R3000). The libraries were quantified using a Qubit dsDNA HS assay, fragment size was determined on an Agilent TapeStation and the libraries were sequenced for 120 cycles on an Illumina NovaSeq platform, generating >21 × 10⁶ reads per library.

tRNA-Seq library construction

The tRNA-Seq libraries were prepared using the mim-tRNAseq workflow21,22. Briefly, total RNA from two biological replicates for each cell line was mixed with synthetic Escherichia coli tRNA-Lys-UUU-CCA and E. coli tRNA-Lys-UUU-CC at a 3:1 ratio, followed by dephosphorylation with T4 PNK (NEB, M0201S) and ethanol precipitation. The RNA samples were resolved on denaturing 10% polyacrylamide, 7 M urea and 1×TBE gels. RNA of 60–100 nt in length was recovered by gel excision and elution from gel slices, followed by ethanol precipitation. The gel-purified tRNA was then ligated to pre-adenylated, barcoded 3′-adaptors22 in 1×T4 RNA ligase buffer, 25% PEG-8000, 20 U Superase In (Thermo Fisher Scientific, AM2696) and 1 µl T4 RNA ligase 2, truncated KQ (NEB, M0373S). The mix was incubated for 3 h at 25 °C and the ligation products were purified by size selection on a 10% polyacrylamide, 7 M urea and 1×TBE gel. Adaptor-ligated tRNA (100 ng) was annealed with 1 µl of 1.25 µM RT primer (5′-pRNAGATCGGAAGAGCGTCGTGTAGGGAAAGAG/iSp18/GTGACTGGAGTTCAGACGTGTGCTC-3′, where iSp18 is an 18-atom hexa-ethyleneglycol spacer) at 82 °C for 2 min, followed by incubation at 25 °C for 5 min. Reverse transcription was performed with 500 nM TGIRT (InGex, TGIRT50) in 50 mM Tris–HCl pH 8.3, 75 mM KCl, 3 mM MgCl2, 5 mM DTT (from a freshly prepared 100 mM stock), 1.25 mM dNTPs and 20 U Superase In at 42 °C for 16 h. After reverse transcription, NaOH was added to a final concentration of 0.1 M and the RNA was hydrolysed by incubating the samples for 5 min at 90 °C. Complementary DNA products were separated from unextended primer on a 10% polyacrylamide, 7 M urea and 1×TBE gel. Regions corresponding to cDNAs that were >10 nt longer than the RT primer were excised after SYBR Gold staining. Gel-purified and ethanol-precipitated cDNA was incubated for 3 h at 60 °C with CircLigase ssDNA ligase (Lucigen) in 1×reaction buffer supplemented with 1 mM ATP, 50 mM MgCl2 and 1 M betaine. 
Following enzyme inactivation for 10 min at 80 °C, one-fifth of the circularized cDNA was used directly for library construction PCR with a common forward (5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTC-3′) and unique indexed reverse primers (5′-CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTG-3′; NNNNNN, the reverse complement of an Illumina index sequence; asterisk, phosphorothioate bond) with KAPA HiFi DNA polymerase (Roche) in 1×GC buffer with initial denaturation at 95 °C for 3 min, followed by five cycles of 98 °C for 20 s, 62 °C for 30 s and 72 °C for 30 s at a ramp rate of 3 °C s−1. The PCR products were purified using a DNA Clean and Concentrator 5 kit (Zymo Research), quantified with a Qubit dsDNA HS kit (Thermo Fisher Scientific, Q32851) and sequenced for 150 cycles on an Illumina NextSeq 500 platform, generating >2.5 × 10⁶ reads per library.

ChIP–Seq library construction

Cells cultured in six-well plates were fixed with 0.8% methanol-free formaldehyde (Thermo Fisher Scientific, 28906) in DMEM medium for 10 min at room temperature with gentle shaking, followed by quenching with 0.125 M glycine for 5 min. All buffers in the subsequent steps of the protocol were supplemented with cOmplete EDTA-free protease inhibitor cocktail (Roche, 1187358000). The cells were washed twice with ice-cold PBS and resuspended in Farnham buffer (5 mM PIPES pH 8.0, 85 mM KCl and 0.5% IGEPAL-CA 630), followed by snap freezing in liquid nitrogen.

Chromatin was isolated and sheared following the NEXSON protocol90. Frozen cell pellets were thawed on ice and sonicated for 2 min in 1 ml tubes (Covaris, 520130) on a Covaris S220 ultrasonicator at peak power = 75 W, duty factor = 2% and cycles per burst = 200. Isolated nuclei were washed once with Farnham buffer and resuspended in shearing buffer (10 mM Tris–HCl pH 8.0, 0.1% SDS and 1 mM EDTA). Chromatin was sheared on a Covaris S220 system by sonication for 9 min (for BRF1 ChIP) or 18 min (for all other ChIP) in 1 ml tubes at peak power = 140 W, duty factor = 5% and cycles per burst = 200. The sheared chromatin was clarified by centrifugation for 10 min at 16,000g. DNA isolated from 10 µl sheared chromatin was used for size analysis on an Agilent TapeStation system; a DNA fragment-size distribution of 100–800 bp was considered suitable for ChIP. The sheared chromatin was snap-frozen and stored at −80 °C. A 10 µl aliquot was used to determine the DNA concentration using a Qubit dsDNA HS assay. For this, crosslinks were reversed by overnight incubation with 0.2 M NaCl at 65 °C, followed by incubation with 50 µg ml−1 RNase A (Thermo Fisher Scientific, EN0531) at 37 °C for 30 min, after which the samples were incubated with 200 µg ml−1 proteinase K (Sigma-Aldrich, P2308) at 65 °C for 1 h. The DNA was purified using a DNA ChIP clean and concentrator kit (Zymo Research, D5205) and eluted in 10 µl of 10 mM Tris pH 8.5 with 0.1 mM EDTA.

The sheared chromatin was thawed on ice. For the RPC1 ChIP, 5 μg chromatin was diluted 1:8 with ChIP Dilution buffer (23 mM Tris–HCl pH 8.0, 200 mM NaCl, 2.3 mM EDTA and 1.3% Triton X-100). Magna ChIP protein A + G magnetic beads (Merck, 16-663) were blocked with 5 mg ml−1 BSA in PBS for 2 h at room temperature on a rotating platform and resuspended in ChIP dilution buffer. Drosophila melanogaster spike-in chromatin (Active Motif, 53083) was added to a final concentration of 0.5% and the chromatin was pre-cleared by incubation with 10 µl blocked magnetic beads for 1 h at 4 °C on a rotating platform. The pre-cleared chromatin was incubated with 5 μg anti-POLR3A/RPC1 (Cell Signaling Technologies, 12825) and 0.2 µg Drosophila antibody (Active Motif, 61686) overnight at 4 °C on a rotating platform. For the H3K4me3 and H3K27me3 ChIP, 2 µg pre-cleared chromatin and 5 µl anti-H3K4me3 (Active Motif, 39159) or anti-H3K27me3 (Millipore, 07-449) was used per ChIP and spike-in chromatin was omitted. The samples were then incubated with 60 µl blocked magnetic beads for 2 h at 4 °C on a rotating platform. The beads were washed sequentially with low-salt buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA pH 8.0, 20 mM Tris–HCl pH 8.0 and 150 mM NaCl), high-salt buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA pH 8.0, 20 mM Tris–HCl pH 8.0 and 500 mM NaCl), lithium chloride buffer (0.25 M LiCl, 1% IGEPAL-CA 630, 1% sodium deoxycholate, 1 mM EDTA and 10 mM Tris–HCl pH 8.0) and Tris–EDTA buffer (10 mM Tris–HCl and 1 mM EDTA pH 8.0). Each wash was performed twice at 4 °C for 10 min on a rotating platform. The DNA was eluted through two incubations with ChIP elution buffer (1% SDS and 50 mM NaHCO3) for 30 min at room temperature on a rotating platform. The crosslinking was reversed and DNA was purified as for the input chromatin.

For the BRF1 and RPC7α/POLR3G ChIP, 5 μg chromatin was diluted 1:8 with ChIP RIPA buffer (50 mM Tris–HCl pH 8.0, 150 mM NaCl, 2 mM EDTA pH 8.0, 1% NP-40, 0.5% sodium deoxycholate and 0.1% SDS) and pre-cleared as described above. Magnetic beads were blocked with PBS containing BSA as for the RPC1 ChIP but resuspended in ChIP RIPA buffer. The pre-cleared chromatin (5 µg) was incubated overnight with 10 µl anti-BRF1 (Abcam, ab264191) or 20 µl anti-RPC7α/POLR3G (Santa Cruz Biotechnology, sc-21754) at 4 °C with rotation. The samples were then incubated with 60 µl blocked magnetic beads with rotation for 6 h at 4 °C. The beads were washed three times with low-salt buffer and once with high-salt buffer for 10 min at 4 °C with rotation. DNA was eluted from the beads by two sequential 30 min incubations with RIPA elution buffer (1% SDS and 100 mM NaHCO3) at room temperature with rotation.

H3K9me3 ChIP was performed following the Ren laboratory ChIP protocol (http://bioinformatics-renlab.ucsd.edu/RenLabChipProtocolV1.pdf). Dynabeads M-280 sheep anti-rabbit IgG (50 μl; Thermo Fisher Scientific, 11203D) were washed three times with 5 mg ml−1 BSA in PBS (BSA/PBS) and resuspended in 100 μl BSA/PBS. Anti-H3K9me3 (5 μl; Cell Signaling Technologies, 13969) was added to 900 μl BSA/PBS and then combined with the magnetic beads. The mixture was incubated overnight on a rotating platform at 4 °C. The beads were then washed three times with 1 ml BSA/PBS and resuspended in 100 μl BSA/PBS. ChIP reactions were set up by taking 5 μg chromatin and adjusting the volume to 1 ml with TE buffer (10 mM Tris–HCl and 1 mM EDTA pH 8.0). A 300 μl volume of STOCK solution (1% Triton X-100 and 0.1% sodium deoxycholate, prepared in Tris–EDTA buffer) was added to each reaction, followed by mixing with the resuspended 100 μl antibody–beads mixture. The mixture was incubated overnight on a rotating platform at 4 °C. The beads were washed eight times with 1 ml RIPA2 buffer (50 mM HEPES pH 8.0, 1 mM EDTA, 1% NP-40, 0.7% sodium deoxycholate and 0.5 M LiCl), followed by one wash with 1 ml Tris–EDTA. After removing the TE buffer using a magnetic rack, the beads were centrifuged for 1 min at 4,000 r.p.m. and the remaining liquid was removed. The protocol for DNA elution from beads was performed as for the RPC1 ChIP. Crosslinking reversal and DNA clean-up were performed as for input chromatin.

Sequencing libraries from ChIP-eluted DNA samples were prepared using an Ovation ultralow V2 DNA-Seq library preparation kit (Tecan, 0344NB) and SPRIselect beads (Beckman Coulter, B23318) according to the manufacturer’s protocol. The library concentration was determined using a Qubit dsDNA HS assay (Thermo Fisher Scientific) and fragment-size distribution was assessed on an Agilent TapeStation system. An Illumina NovaSeq platform was used to perform 110-bp paired-end sequencing, generating >30 × 10⁶ reads per library.

ATAC–Seq library construction

ATAC–Seq was performed using an ATAC–Seq kit (Active Motif, 53150) according to the manufacturer’s instructions. Briefly, two biological replicates of 50,000 cells from each cell type were tagmented at 37 °C for 60 min. After a nucleosomal banding pattern was verified in the resulting libraries on an Agilent TapeStation, the libraries were quantified using a KAPA library quantification kit (catalogue number, KK4854) and sequenced in a 75-bp paired-end run on an Illumina NextSeq 500 platform, generating 21.5–46 × 10⁶ reads per library.

Ribosome profiling library construction

Ribosome footprint libraries were prepared essentially as described previously40,91 with minor modifications. The cell medium was changed 2 h before harvesting. The cells were quickly washed with ice-cold PBS supplemented with 100 µg ml−1 cycloheximide (Sigma-Aldrich, C1988) and snap-frozen. For libraries prepared with cycloheximide in the lysis buffer, plates were thawed on ice and the cells were scraped off the plate in 400 µl polysome lysis buffer (20 mM Tris pH 7.4, 150 mM NaCl, 5 mM MgCl2, 1% Triton X-100, 1 mM DTT, 100 µg ml−1 cycloheximide, 25 U ml−1 Turbo DNase (Thermo Fisher Scientific, AM2238), 0.1% NP-40, 10 µg ml−1 aprotinin, 20 µM leupeptin, 2.5 µM pepstatin A, 0.5 mM 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride and 1×phosphatase inhibitor cocktail). The samples were vortexed vigorously, triturated through a 26 G needle and centrifuged for 7 min at 16,000g and 4 °C. The RNA concentration in the supernatant was measured using a Qubit RNA HS kit. RNA (20 µg) in 200 µl polysome lysis buffer was digested with 50 U RNase I (Thermo Fisher Scientific, AM2295) for 45 min at 2,000 r.p.m. and 22 °C.

For libraries prepared with cycloheximide and tigecycline in the lysis buffer, plates were thawed and cells from a 10 cm dish were lysed in 15 ml polysome lysis buffer supplemented with 0.1% NP-40 and 100 µg ml−1 tigecycline (Sigma-Aldrich, PZ0021). After incubation on ice for 5 min, extracts were pre-cleared by centrifugation for 5 min at 3,000g and 4 °C. Ribosomes were pelleted through 3 ml of a sucrose cushion (1 M sucrose, 20 mM Tris pH 8.0, 140 mM KCl, 5 mM MgCl2 and 1 mM DTT) in a Type 70 Ti rotor for 120 min at 50,000 r.p.m. and 4 °C. The ribosome pellets were rinsed once, dissolved in 200 µl drug-free polysome lysis buffer and incubated with 200 (hiPSC) or 300 U (NPC) RNase I for 45 min at 2,000 r.p.m. and 22 °C.

The RNase I digestion was stopped by the addition of 100 U Superase In (Thermo Fisher Scientific, AM2694), and the extracts were loaded on a 0.9 ml sucrose cushion (1 M sucrose in polysome lysis buffer), followed by centrifugation for 75 min at 120,000 r.p.m. and 4 °C in a S120AT2 rotor (Thermo Fisher Scientific). The pellet was dissolved in 400 µl LiDS/LET lysis buffer and RNA was extracted as described for total RNA isolation. The RNA (3 µg) was loaded on 15% polyacrylamide, 7 M urea and 1×TBE gels. Fragments in the range of 19–32 nt were excised from the gel and crushed with a pestle. The RNA was eluted in 400 µl gel elution buffer (0.3 M sodium acetate pH 4.5, 0.25% SDS, 1 mM EDTA pH 8.0) by heating (65 °C for 10 min), followed by snap freezing on dry ice for 10 min, thawing for 5 min at 65 °C and overnight incubation on a rotating wheel at room temperature. The gel debris was removed by centrifuging the samples through a Spin-X filter (Corning) and the RNA was purified by ethanol precipitation. The size-selected RNA was dephosphorylated for 45 min at 37 °C using T4 PNK (NEB, M0201S).

The dephosphorylated RNA was mixed with pre-adenylated adaptors containing five random nucleotides at their 5′ ends91 in 1×T4 RNA ligase buffer, 25% PEG-8000, 20 U Superase In and 1 µl T4 RNA ligase 2, truncated KQ (NEB, M0373S). The mix was incubated for 3 h at 25 °C and the ligation products were purified by size selection on a 12% polyacrylamide, 7 M urea and 1×TBE gel. The linker-ligated sample (50 ng) was used for rRNA depletion using a Ribo-Seq riboPOOL h/m/r depletion kit (siTOOLs) for the cycloheximide-only samples, and legacy RiboZero Gold kit (Illumina) for the cycloheximide + tigecycline samples following the manufacturer’s instructions. The rRNA-depleted footprints were annealed with RT primer (5′-pRNAGATCGGAAGAGCGTCGTGTAGGGAAAGAG/iSp18/GTGACTGGAGTTCAGACGTGTGCTC-3′) at 65 °C for 5 min and reverse transcribed for 30 min at 50 °C in an RT master mix containing 1×Protoscript II Buffer, 0.5 mM dNTPs, 10 mM DTT, 20 U Superase In and 200 U Protoscript II (NEB, E6560S). After reverse transcription, NaOH was added to a final concentration of 0.1 M and the RNA was hydrolysed by incubating the samples for 5 min at 90 °C. The cDNA products were purified by size selection on a 12% polyacrylamide, 7 M urea and 1×TBE gel, followed by ethanol precipitation. For cDNA circularization, a 20 µl reaction was prepared containing the gel-purified RT product mixed with 3 µM recombinant TS2126 RNA ligase 1 in circularization buffer (50 µM ATP, 2.5 mM MnCl2, 50 mM MOPS pH 7.5, 10 mM KCl, 5 mM MgCl2, 1 mM DTT and 1 mM betaine) and incubated for 3 h at 60 °C, followed by heat inactivation for 10 min at 80 °C. Libraries were constructed from circularized cDNA using the same primers as for tRNASeq. Amplification was performed using KAPA HiFi DNA polymerase (Roche) in 1×HiFi buffer with an initial denaturation at 95 °C for 3 min, followed by 6–10 cycles of 98 °C for 20 s, 62 °C for 30 s and 72 °C for 15 s at a ramp rate of 3 °C s−1. 
The PCR products were purified by size selection on an 8% polyacrylamide and 1×TBE gel. Excised gel slices were crushed with a pestle and DNA was eluted by rotating samples overnight in 300 µl DNA elution buffer (300 nM NaCl, 10 mM Tris–HCl pH 7.5 and 0.2% Triton X-100). After ethanol precipitation, the libraries were quantified using a Qubit dsDNA high sensitivity kit and 75–86-bp single-end sequencing was performed on an Illumina NextSeq 500 platform, generating >19 × 10⁶ reads per library.

Analysis of RNA-Seq data

RNA-Seq datasets were pre-processed to remove potential 3′ adaptors using Trim Galore v0.6.4 with default settings, retaining reads with a length of ≥20 nt. The reads were aligned to the GRCh38 human genome using STAR v2.6.1c with the following parameters: --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 1 --outFilterMismatchNmax 1 --quantMode TranscriptomeSAM GeneCounts. The featureCounts v1.6.2 software was used to count reads overlapping a filtered set of protein-coding gene annotations from the GENCODE basic gene annotation. Differential gene expression analysis was performed using DESeq2 v1.38.1. Gene expression heatmaps were generated using ComplexHeatmap v2.14.0 (ref. 92) by combining standardized gene counts and significant log2-transformed fold-change values from DESeq2 (Padj ≤ 0.05).

Analysis of tRNA-Seq data

Demultiplexing and 3′ sequencing adaptor removal was performed using cutadapt v3.5. Indels were disallowed (--no-indels) and both read ends were quality trimmed with a quality score of 30 (-q 30,30). As sequencing was performed with more cycles than the length of any sequenced fragment, all reads were expected to contain adaptors and only trimmed reads were retained with --trimmed-only. The reads were further trimmed to remove the two 5′-RN nucleotides introduced by circularization from the RT primer with -u 2. In both processing steps, reads <10 nt were discarded using -m 10. Analysis of tRNA expression and modification was performed with v1.2 of the mim-tRNAseq computational package (https://mim-trnaseq.readthedocs.io/en/latest/index.html)21. Briefly, the full set of 619 predicted tRNA genes for the hg38 human genome assembly were downloaded from GtRNAdb93 and the 22 mitochondrially encoded human tRNA genes were fetched from mitotRNAdb94. After intron removal and the addition of 5′-G (for tRNA-His) and 3′-CCA (for nuclear-encoded transcripts), a curated set of 599 nuclear-encoded tRNA sequences (excluding tRNAs with non-canonical secondary structure alignments or undetermined anticodons) and 22 mitochondrially encoded tRNA sequences was compiled as an alignment reference (--species Hsap). The reads were aligned to this reference with a cluster ID of 0.97, maximum mismatch tolerance at a number of nucleotides equal to 7.5% read length for the first alignment round and 5% read length for realignment, a deconvolution coverage ratio of 0.4 at mismatch sites to allow accurate cluster deconvolution and a minimum coverage threshold of 0.05% total reads per transcript for low coverage transcript filtering. 
The following command was used: mimseq --species Hsap --cluster-id 0.97 --threads 40 --min-cov 0.0005 --max-mismatches 0.075 --control-condition kiPSC --deconv-cov-ratio 0.4 -n hg38_diff --out-dir hg38_WTdiff_2rep_deconv0.4_ID0.97_0.075_remap0.05_v12/ --max-multi 6 --remap --remap-mismatches 0.05 sampleData_ht_diff_2rep.txt.

In addition, DESeq2 was run on tRNA transcripts at single-transcript resolution: transcripts that remained in unresolved clusters (identified by multiple transcript names separated by ‘/’) were first removed from the counts table and the DESeq2 analysis was repeated on the remaining transcripts. Isotype counts were generated by aggregating the anticodon counts for each tRNA isotype, and DESeq2 was additionally run on these count data.
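The two count-table manipulations described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names, the example transcript names and the assumed 'Isotype-Anticodon' naming scheme are hypothetical, and the DESeq2 step itself is not shown.

```python
from collections import defaultdict


def keep_resolved_transcripts(counts):
    """Drop rows that still represent unresolved clusters (names containing '/')."""
    return {name: n for name, n in counts.items() if "/" not in name}


def aggregate_by_isotype(anticodon_counts):
    """Sum per-anticodon counts into per-isotype counts.

    Assumes hypothetical keys of the form 'Isotype-Anticodon', e.g. 'Gly-GCC'.
    """
    totals = defaultdict(int)
    for name, n in anticodon_counts.items():
        isotype = name.split("-")[0]
        totals[isotype] += n
    return dict(totals)


# Hypothetical example tables
transcript_counts = {
    "tRNA-Gly-GCC-1": 120,
    "tRNA-Gly-GCC-2/tRNA-Gly-GCC-3": 80,  # unresolved cluster, removed
    "tRNA-Ala-AGC-1": 40,
}
resolved = keep_resolved_transcripts(transcript_counts)
isotype_counts = aggregate_by_isotype({"Gly-GCC": 200, "Gly-TCC": 50, "Ala-AGC": 40})
```

The resulting tables would then be passed to DESeq2 in the same way as the full transcript-level counts.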

Codon usage analysis

We used RSEM v1.3.1 to calculate coding-gene expression in TPM. First, we built a custom reference transcriptome annotation, which was defined on the basis of APPRIS annotations95. From these we extrapolated the MANE-annotated transcript for each gene and retained transcripts with a coding sequence beginning with an AUG codon and ending with a UAG/UAA/UGA codon, a nucleotide length that was a multiple of three and no unidentified bases. Sequences without a perfect match with a protein sequence in UniProtKB/SwissProt were removed, yielding a reference containing 16,731 transcripts.
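The sequence-level criteria for retaining a coding sequence can be expressed as a simple predicate. The sketch below is illustrative only; the function name is hypothetical, and the subsequent UniProtKB/SwissProt matching step is not covered.

```python
STOP_CODONS = {"UAG", "UAA", "UGA"}


def is_valid_cds(seq):
    """Return True if a coding sequence passes the sequence-level filters.

    Checks: starts with AUG, ends with a stop codon, length is a multiple
    of three and no unidentified bases. DNA input (T) is converted to RNA (U).
    """
    seq = seq.upper().replace("T", "U")
    return (
        len(seq) >= 6
        and len(seq) % 3 == 0
        and seq.startswith("AUG")
        and seq[-3:] in STOP_CODONS
        and set(seq) <= set("ACGU")
    )
```

For example, `is_valid_cds("ATGGCCTAA")` is true, whereas a sequence containing N or lacking a terminal stop codon is rejected.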

An RSEM reference for read alignment using STAR was built using rsem-prepare-reference with the --star option enabled. For TPM calculation, we used this reference and adaptor-trimmed RNA-Seq reads with rsem-calculate-expression for each sample.

To calculate the codon usage in each sample, we weighted the 61 sense codon frequencies of each transcript in our custom annotation by the TPM expression of the transcript in that sample. We counted start AUG codons separately from coding sequence AUG codons to distinguish the dynamics at start and internal methionine codons. These raw codon usages were then summed across all transcripts to generate aggregated codon usage per codon. For normalization, these values were divided by the sum of all codon usages per sample, yielding proportional codon usage.
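The TPM weighting and normalization can be illustrated with a short Python sketch. It assumes that 'codon frequencies' are per-transcript codon counts and tracks the start codon under a hypothetical key `AUG_start`; the function names and example sequences are illustrative and are not taken from the analysis code.

```python
from collections import Counter


def sense_codons(cds):
    """Split a coding sequence into codons, excluding the terminal stop codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]


def proportional_codon_usage(transcripts, tpm):
    """TPM-weighted codon usage, normalized to sum to 1 per sample."""
    usage = Counter()
    for tid, cds in transcripts.items():
        codons = sense_codons(cds)
        usage["AUG_start"] += tpm[tid]      # start codon counted separately
        for codon in codons[1:]:            # internal sense codons
            usage[codon] += tpm[tid]
    total = sum(usage.values())
    return {codon: value / total for codon, value in usage.items()}


# Hypothetical two-transcript sample
transcripts = {"t1": "AUGGCCGCCUAA", "t2": "AUGAUGUAA"}
tpm = {"t1": 10.0, "t2": 30.0}
usage = proportional_codon_usage(transcripts, tpm)
```

In this toy sample, the internal AUG of t2 contributes to the `AUG` entry, whereas both start codons contribute only to `AUG_start`.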

For comparison to tRNA anticodon abundance, we utilized raw mim-tRNAseq read counts summed by anticodon and converted to proportions of total tRNA-aligned reads. Where no perfect match between anticodon and codon was available due to wobble pairing, we duplicated the anticodon abundance of tRNAs that are known to wobble pair to such codons, such that all 61 sense codons had corresponding tRNA anticodon abundance values.
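The assignment of anticodon abundance to each codon, including the duplication for wobble-decoded codons, can be sketched as follows. The wobble map is supplied by the user and is a hypothetical input here; the function names and read counts are illustrative only.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return seq.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def codon_trna_abundance(anticodon_counts, wobble_map, codons):
    """Map each codon to the proportional abundance of its decoding anticodon.

    anticodon_counts: raw read counts summed by anticodon.
    wobble_map: codon -> anticodon used when no perfect-match tRNA exists
                (hypothetical, user-supplied).
    """
    total = sum(anticodon_counts.values())
    proportions = {ac: n / total for ac, n in anticodon_counts.items()}
    abundance = {}
    for codon in codons:
        cognate = reverse_complement_rna(codon)
        if cognate in proportions:
            abundance[codon] = proportions[cognate]
        else:
            # Duplicate the abundance of the wobble-pairing anticodon
            abundance[codon] = proportions[wobble_map[codon]]
    return abundance


# Hypothetical glycine example: GGU has no perfect-match anticodon (ACC)
counts = {"GCC": 300, "UCC": 100}
abundance = codon_trna_abundance(counts, {"GGU": "GCC"}, ["GGC", "GGU", "GGA"])
```

Here GGC and GGU both receive the abundance of the GCC anticodon, so the values across codons no longer sum to one.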

Ribosome profiling data analysis

Sequencing libraries were demultiplexed and adaptor-trimmed using Cutadapt v3.5 (ref. 96) as described for the tRNA-Seq. Trimmed reads >10 nt were aligned to a human rRNA reference using Bowtie v1.2.2 (ref. 97) with the following options: -p 40 -S --best. Ribosomal RNA-filtered reads were aligned to GRCh38 using STAR v2.6.1c (ref. 98) with the following options: --outFilterMultimapNmax 1 --outSAMtype BAM SortedByCoordinate --outFilterMismatchNmax 0 --alignEndsType Local --seedSearchStartLmax 14 --alignIntronMax 10000 --sjdbOverhang 28 --outFilterIntronMotifs RemoveNoncanonicalUnannotated --quantMode TranscriptomeSAM --outSAMattributes NH HI AS nM NM MD. Between 5.3 × 10⁶ and 21.9 × 10⁶ pre-processed reads were aligned to coding regions in the GRCh38 transcriptome.

We identified the A- and P-site codon in each open reading frame-mapped read using Scikit-ribo99, which uses a random forest with recursive feature selection for accurate A-site prediction and a generalized linear model for codon dwell time estimation based on matched ribosome profiling and RNA-Seq datasets. Kallisto 0.44.0 with the parameters -b 100 --single -l 180 -s 20 -t 40 was used to quantify transcript abundances in TPM from RNA-Seq data based on the reference set of MANE-annotated transcripts. To avoid memory errors due to the large size of the human genome and the presence of multiple transcript isoforms, all RNAfold dependencies in Scikit-ribo were omitted and the index was built separately for each chromosome. To make the hg38 Gene Transfer Format file compatible with Scikit-ribo, transcript/untranslated region annotations were removed. For each transcript, the start codon in the first exon and the stop codon in the last exon were adjusted to represent transcript start and end coordinates, taking into account the gene strand. To calculate relative codon dwell times (defined as the difference between the dwell time of each codon and the median of all codon dwell times99), short (20–22 nt) and long (28–33 nt) ribosome footprints were analysed separately.
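The definition of relative dwell time and the split into footprint size classes can be written compactly. This sketch only illustrates the definitions given above; it does not reproduce the Scikit-ribo model, and the function names and dwell-time values are hypothetical.

```python
from statistics import median


def footprint_class(length):
    """Assign a footprint to the short (20-22 nt) or long (28-33 nt) class."""
    if 20 <= length <= 22:
        return "short"
    if 28 <= length <= 33:
        return "long"
    return None  # footprints outside both ranges are not analysed


def relative_dwell_times(dwell_times):
    """Subtract the median of all codon dwell times from each codon's dwell time."""
    centre = median(dwell_times.values())
    return {codon: t - centre for codon, t in dwell_times.items()}


# Hypothetical dwell-time estimates for three codons
relative = relative_dwell_times({"AAA": 1.0, "GGC": 1.5, "CGA": 3.0})
```

Positive values indicate codons decoded more slowly than the median codon within the same footprint class.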

ChIP–Seq read alignment and multimapping analysis

ChIP–Seq and ATAC–Seq datasets were pre-processed to remove potential 3′ adaptors using Trim Galore v0.6.4 with default settings, retaining reads with a length of ≥20. Given the high frequency of tRNA gene duplication, which can include flanking sequences4,100, we first analysed the extent of multimapping for RPC1 ChIP–Seq reads mapping to predicted tRNA genes. First, 2 × 110-bp paired-end reads from RPC1 ChIP–Seq libraries were aligned to the human GRCh38 reference genome using STAR v2.6.1c, allowing up to one mismatch per read (--outFilterMismatchNmax 1), up to ten alignment positions (--outFilterMultimapNmax 10) and in end-to-end alignment mode with prohibited introns in reads (--alignEndsType EndToEnd --alignIntronMax 1). Read duplicates were then removed using Picard Tools MarkDuplicates v2.17.10, with REMOVE_DUPLICATES = true to enable direct filtering of duplicates in the output binary alignment map (bam) file. The mmquant v1.3 (ref. 101) tool was used to count reads overlapping the 619 predicted tRNA genes, with each gene extended by 125 bp of upstream and downstream sequence. Using a custom Python script, the mmquant output was parsed such that for each library input, an output was produced consisting of tRNA genes as rows and one column each for uniquely mapping read counts, multimapping read counts and the proportion of total reads per tRNA represented by multimapping reads. We defined tRNA genes that are not distinguishable in ChIP–Seq data by finding the consensus list of tRNA genes with ≥25% multimapping reads and ≥50 total aligned reads in RPC1 ChIP–Seq libraries (n = 61 from 27 isodecoders and 16 anticodon families; Supplementary Table 4). As expected, 20 of the 23 tRNA genes in the four tandem repeats of a cluster of tRNA genes on chromosome 1 (Glu-CTC, Gly-TCC, Asp-GTC, Leu-CAG and Gly-GCC)100 fall within this group. These 61 tRNA genes were excluded from all gene-level analyses of ChIP–Seq and ATAC–Seq datasets. 
Given that nearly all multi-mapped reads aligned to identical gene copies coding for the same tRNA transcript, one alignment position was randomly chosen and reported for such reads in Pol III occupancy and chromatin accessibility analysis aggregated by tRNA transcript.
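The per-gene multimapping summary and the thresholds used to flag indistinguishable tRNA genes can be sketched as follows. This is not the custom script used in the study: the function names and gene labels are hypothetical, the mmquant output parsing is omitted, and the 'consensus list' is assumed here to be the intersection across libraries.

```python
def multimapping_table(unique_counts, multi_counts):
    """Per-gene (unique, multimapping, multimapping fraction) for one library."""
    table = {}
    for gene in sorted(set(unique_counts) | set(multi_counts)):
        u = unique_counts.get(gene, 0)
        m = multi_counts.get(gene, 0)
        total = u + m
        table[gene] = (u, m, m / total if total else 0.0)
    return table


def indistinguishable_genes(tables, min_fraction=0.25, min_total=50):
    """Genes flagged in every library (>=25% multimapping and >=50 total reads)."""
    flagged = [
        {g for g, (u, m, frac) in t.items() if frac >= min_fraction and u + m >= min_total}
        for t in tables
    ]
    return set.intersection(*flagged)


# Hypothetical counts from two libraries
lib1 = multimapping_table({"tRNA-A": 90, "tRNA-B": 30, "tRNA-C": 10},
                          {"tRNA-A": 10, "tRNA-B": 30, "tRNA-C": 20})
lib2 = multimapping_table({"tRNA-A": 80, "tRNA-B": 45},
                          {"tRNA-A": 40, "tRNA-B": 15})
excluded = indistinguishable_genes([lib1, lib2])
```

In this example only tRNA-B is flagged in both libraries; tRNA-C has a high multimapping fraction but too few reads.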

ChIP–Seq and ATAC–Seq peak calling and annotation

Adaptor-trimmed ChIP–Seq and ATAC–Seq libraries were aligned to the GRCh38 reference genome using STAR with the following settings: up to one mismatch per read, a maximum of ten alignment positions, end-to-end alignment, prohibited introns and only one alignment reported per read (--outFilterMismatchNmax 1 --outFilterMultimapNmax 10 --alignEndsType EndToEnd --alignIntronMax 1 --outSAMmultNmax 1). Reads from spike-in-containing libraries were also aligned to the D. melanogaster r6.39 genome with the same settings, except only uniquely mapped reads were retained (--outFilterMultimapNmax 1). Read duplicates were then removed using Picard Tools MarkDuplicates v2.17.10 as described earlier, and for ATAC–Seq libraries, alignments to the mitochondrial genome were also filtered. To account for dimerization of the transposon before insertion49, filtered ATAC–Seq reads were additionally shifted by +4 bp and −5 bp for positive and negative strand alignments, respectively, using deepTools alignmentSieve v3.4.0, which was simultaneously used to split fragments with a maximum length of 100 nt representing NFRs. Both operations were performed simultaneously using the --ATACshift and --maxFragmentLength 100 parameters. The ATAC–Seq NFR alignments were then converted to BEDPE format for peak calling using alignmentSieve --BED.
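The strand-specific shift and the nucleosome-free fragment filter can be illustrated on single coordinates. The sketch below is a simplified illustration of the offsets stated above and is not the deepTools alignmentSieve implementation; the function names are hypothetical.

```python
def shifted_cut_site(five_prime_pos, strand):
    """Shift a read's 5' end by +4 bp (positive strand) or -5 bp (negative strand)."""
    if strand == "+":
        return five_prime_pos + 4
    if strand == "-":
        return five_prime_pos - 5
    raise ValueError(f"unknown strand: {strand!r}")


def is_nucleosome_free(fragment_length, max_length=100):
    """Keep fragments no longer than the NFR length cut-off (100 nt here)."""
    return fragment_length <= max_length
```

In practice both operations are applied to whole alignments in one pass, as described above.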

Peaks were called using MACS callpeak v2.2.6, supplying ChIP input samples from HPSI0214i-kucg_2 for the kucg-2 hiPSC and CM datasets, HPSI0214i-wibj_2 for the wibj-2 hiPSC datasets and from HPSI0214i-kucg_2-derived NPC for NPC and neuron datasets, specifying the fragment sizes (--extsize) with shifting model building disabled (--nomodel). The small region size used to calculate dynamic lambda was reduced to 500 bp (--slocal 500) and peak summits were also reported (--call-summits). MACS peak calling was performed on all reads without duplicate removal (--keep-dup all), as these had previously been filtered for duplicates using Picard Tools. For the ATAC–Seq peak calling, the BEDPE files generated above were used (--f BEDPE) without the corresponding control input samples, shifting model building was not disabled and fragment sizes were not specified. For the H3K27me3 ChIP–Seq peak calling, the --broad parameter was additionally specified to call broad peaks for this mark. For both data types, significant peaks were called if the FDR-adjusted Poisson distribution P value was ≤0.05. Predicted peaks were filtered using the ENCODE project unified GRCh38 blacklist regions bed file (https://www.encodeproject.org/files/ENCFF356LFX/) by identifying overlaps using bedtools intersect v2.29.2. The blacklist-filtered peak region summits were then annotated by searching for their nearest predicted tRNA locus with bedtools closest using the filtered set of tRNA genes excluding those with significant ‘within isodecoder’ multimapping reads in hiPSC, as defined above. Peaks with tRNA ‘hits’ were then defined for each sample as those within 125 bp of an annotated tRNA gene, whereas tRNA hits shared by both biological replicates of an experimental condition were used to define consensus tRNA peaks for that condition. 
Using RPC1 tRNA peak datasets, we defined housekeeping tRNAs as those shared between the consensus sets of all cell types, and persistently inactive tRNAs as those absent from all consensus sets. Repressed tRNA genes were defined as the difference between the union of all tRNA peaks in all cells and the housekeeping set.
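These class definitions amount to simple set operations, as in the following minimal sketch. The function name and the gene and cell-type labels are hypothetical, and the sketch assumes that the union is taken over the per-cell-type consensus sets.

```python
def classify_trna_genes(all_trnas, consensus_by_cell_type):
    """Split tRNA genes into housekeeping, repressed and inactive sets.

    all_trnas: iterable of gene names in the filtered annotation.
    consensus_by_cell_type: dict mapping cell type -> consensus tRNA peak genes.
    """
    consensus_sets = [set(genes) for genes in consensus_by_cell_type.values()]
    housekeeping = set.intersection(*consensus_sets)  # occupied in every cell type
    occupied_anywhere = set.union(*consensus_sets)    # occupied in at least one
    repressed = occupied_anywhere - housekeeping      # occupied in some, not all
    inactive = set(all_trnas) - occupied_anywhere     # never occupied
    return housekeeping, repressed, inactive


# Hypothetical gene names and consensus sets, for illustration only.
genes = ["tRNA-a", "tRNA-b", "tRNA-c", "tRNA-d"]
consensus = {
    "hiPSC": {"tRNA-a", "tRNA-b"},
    "NPC": {"tRNA-a"},
    "neuron": {"tRNA-a", "tRNA-c"},
    "CM": {"tRNA-a", "tRNA-b"},
}
housekeeping, repressed, inactive = classify_trna_genes(genes, consensus)
```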

ChIP–Seq coverage normalization and visualization

For visual analysis of ChIP–Seq datasets, duplicate-filtered bam files were converted into normalized bigWig signal tracks using deepTools v3.5.1. To calculate the required normalization factors for individual libraries, mmquant was used to count ChIP–Seq reads that overlapped the annotated hg38 tRNA genes extended by 125 bp at both ends. Using edgeR v3.34.1, normalization factors were calculated using these counts as input for the calcNormFactors function using the ‘RLE’ method. Relative library sizes were taken as the sum of reads assigned to tRNA features, scaled per million reads. The edgeR normalization factors were further multiplied by these library-size factors and the reciprocal of this product was used for normalized signal generation. DeepTools bamCoverage with a normalization bin size of 1 bp (--binSize 1), the previously calculated scale factors (--scaleFactor) and read extension using fragment lengths that were previously estimated by Phantompeakqualtools (--extendReads) was implemented to generate normalized signal files. Plotting of this signal was performed with deepTools computeMatrix (in reference-point mode) and plotHeatmap, using the tRNA gene start as a reference (--referencePoint TSS), bed files of housekeeping, repressed and inactive tRNAs as regions (-R), and either 500 bp or 1,000 bp flanking the tRNA gene start (-a 500 -b 500 or -a 1000 -b 1000, respectively).
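The scale factor passed to bamCoverage follows directly from the description above, as in this minimal sketch. The function name is hypothetical and the input values are invented; in practice the normalization factor would come from edgeR calcNormFactors and the read count from mmquant.

```python
def bamcoverage_scale_factor(norm_factor, trna_read_count):
    """Reciprocal of (normalization factor x tRNA library size per million reads)."""
    if norm_factor <= 0 or trna_read_count <= 0:
        raise ValueError("norm_factor and trna_read_count must be positive")
    library_size_per_million = trna_read_count / 1_000_000
    return 1.0 / (norm_factor * library_size_per_million)


# Invented example: 2 million tRNA-assigned reads, normalization factor of 1.0.
example_scale = bamcoverage_scale_factor(1.0, 2_000_000)
```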

Differential occupancy analysis with DiffBind

For the differential occupancy analysis, we utilized DiffBind v3.2.7 and specified the set of human tRNA genes filtered for <25% multimapping reads (as described earlier) for inclusion in the analysis. This enabled us to obtain occupancy analysis results for all tRNA genes regardless of the presence or absence of a ChIP peak. Briefly, we first generated a bed file of these tRNA genes extended by 200 bp on either end to capture all ChIP signals around each tRNA. To avoid peak merging by DiffBind, overlapping regions in these extended features (for tRNAs separated by less than 200 bp) were determined, using bedtools intersect, and subtracted from the extended features using bedtools subtract. Sample sheets specifying duplicate-filtered bam files for alignments to the human (‘BamReads’ column) and D. melanogaster (‘SpikeIn’ column) genomes, extended and processed tRNA regions (‘Peaks’ column) as well as metadata such as condition and replicate were supplied for DiffBind analysis. After read counting (dba.count), blacklisted regions were not filtered, as this had previously been done after peak calling, but non-redundant sets of greylist regions were determined and excluded from analysis using dba.blacklist with blacklist = FALSE. Normalization and differential occupancy analysis were performed using dba.normalize with RLE normalization from DESeq2 (Benjamini–Hochberg-adjusted Wald test P value) combined with spike-in normalization (normalize = DBA_NORM_RLE, spikein = TRUE) and dba.analyse. Finally, the results for individual contrasts were retrieved using the DiffBind dba.report function, and annotation information (that is, tRNA gene name) was restored using the annotatePeakInBatch function of ChIPpeakAnno v3.26.4.
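The removal of shared sequence from neighbouring extended features can be sketched as follows. This is a simplified stand-in for the bedtools intersect and bedtools subtract steps, not a reimplementation of them; it assumes 0-based half-open intervals on a single chromosome, and the function names and coordinates are hypothetical.

```python
def subtract_cuts(interval, cuts):
    """Remove the cut regions from one interval and return the remaining pieces."""
    start, end = interval
    pieces = []
    cursor = start
    for cut_start, cut_end in sorted(cuts):
        if cut_end <= cursor or cut_start >= end:
            continue  # cut lies outside what remains of the interval
        if cut_start > cursor:
            pieces.append((cursor, cut_start))
        cursor = max(cursor, cut_end)
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


def trim_shared_regions(intervals):
    """For each interval, drop every stretch it shares with another interval."""
    trimmed = []
    for i, (start, end) in enumerate(intervals):
        cuts = []
        for j, (other_start, other_end) in enumerate(intervals):
            if i == j:
                continue
            lo, hi = max(start, other_start), min(end, other_end)
            if lo < hi:  # non-empty overlap
                cuts.append((lo, hi))
        trimmed.append(subtract_cuts((start, end), cuts))
    return trimmed


# Two hypothetical extended features that overlap between 1100 and 1272.
trimmed = trim_shared_regions([(800, 1272), (1100, 1572)])
```

In this example each feature keeps only its unique portion, so DiffBind would receive two non-overlapping regions instead of one merged region.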

Whole-genome bisulfite sequencing analysis

Public whole-genome bisulfite sequencing data for the H1 human embryonic stem cell line were obtained from ENCODE project number ENCSR617FKV (Gene Expression Omnibus (GEO): GSE80911) by downloading the processed bed files of methylation state at CpG nucleotides for both biological replicates (ENCFF434CNG and ENCFF573YXL). A custom tRNA annotation was generated by extending each tRNA gene with 125 bp upstream for the filtered set of tRNA genes without significant multimapping in the RPC1 ChIP–Seq datasets. The CpG methylation data were then matched to these annotations using bedtools intersect v2.29.2. The proportion of CpG methylation, present in column 11 of the bed files, was plotted per biological replicate separated by tRNA gene activity defined by RPC1 occupancy in the four cell types (‘ChIP–Seq and ATAC–Seq peak calling and annotation’ section).

Sequence motif analysis

To compare A- and B-box sequences in the three activity classes of tRNAs defined from the RPC1 ChIP–Seq data, we first generated multiple sequence alignments of all human hg38 tRNA genes to tRNA covariance models using the cmalign command from Infernal v1.1.2. We then extracted A- and B-box sequences from these alignments corresponding to positions 9–21 and 75–85, respectively. Sequence logos were then generated for these subsequences, separated by tRNA activity class, using the Python package logomaker v0.8.

To define genome-wide motifs for A- and B-box promoter sequences, we used the online MEME prediction tool102 (https://meme-suite.org/meme/tools/meme) and uploaded all 619 predicted tRNAs in the hg38 genome from GtRNAdb93. Motif prediction was run in classic mode, with one occurrence per sequence allowed per motif, as is expected for A and B boxes in tRNA sequences. The search was limited to two motifs with a width of 9–11 nt, based on previous predictions of A- and B-box consensus motif lengths. Finally, motif searching was limited to the given strand only, as the supplied sequences were mature tRNA and not DNA. MEME found exactly two motifs in the input sequences and the consensus for each corresponded to known A- and B-box consensus sequences.

The results in XML format were downloaded, imported into R v4.2.2 using the read_meme function and plotted using view_motifs from universalmotif v1.16.0. These were converted into position weight matrices using universalmotif convert_type for each motif instance. Motif densities were calculated using a customized version of the seqPattern function plotMotifDensityMap that returns the motif densities in each sequence. Briefly, motifScanHits is called using the imported position weight matrices to return motif hits in each sequence passing a minimum motif counting score of 90% (minScore = 90%). Two-dimensional binned kernel density estimates were then calculated on the motif hits using bkde2D from KernSmooth v2.23 with a bandwidth of 1 bp in both coordinate directions. For each sequence, the maximum density score was extracted and used for comparing distributions of motif densities in each tRNA activity class.

tRNet architecture

The tRNet CNN is a multi-class CNN implemented in keras v2.2.4 (Tensorflow v1.15.5 backend) to predict the class of tRNA gene as housekeeping, repressed or inactive from genomic input sequence in one-hot-encoded format (A = [1,0,0,0], C = [0,1,0,0], G = [0,0,1,0] and T = [0,0,0,1]). Conceptually, the architecture is based on that described for BPNet56 with minor adjustments to the size of the receptive field of the network and the output (Extended Data Fig. 5g). Briefly, tRNet consists of an initial convolutional layer with 128 filters and a width of 20 bp, followed by eight consecutive dilated convolutional layers with 128 filters and a width of 10 bp, where the dilation rate is doubled at each layer. Such exponential dilation rates double the number of skip positions in the convolutional filter, effectively increasing the complexity of pattern learning and the receptive field in sequence space that is visible to the network. Each convolutional layer is followed by a rectified linear activation (f(x) = max(0,x)). A global max pooling follows the convolutional layers and precedes the fully connected hidden layer, which contains 32 neurons. The tRNet output consists of a final fully connected layer with softmax activation to three outputs, each representing the probability of a tRNA gene belonging to each class based on the input sequence.
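The input encoding and the effect of exponential dilation on the receptive field can be illustrated with a short sketch in plain Python; it is not the keras model itself. Encoding non-ACGT bases as all zeros is an assumption of the sketch. The dilation rate of the first dilated layer is not stated above, so it is left as an explicit argument.

```python
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(sequence):
    """Encode a DNA sequence as a list of 4-element rows (A, C, G, T).

    Bases other than A, C, G or T become all zeros (an assumption here).
    """
    return [ONE_HOT.get(base, [0, 0, 0, 0]) for base in sequence.upper()]


def receptive_field(first_dilation, first_width=20, dilated_width=10, n_dilated=8):
    """Receptive field (bp) of one width-20 layer plus eight dilated layers.

    Each dilated layer adds (width - 1) * dilation positions, and the
    dilation rate doubles from one layer to the next.
    """
    field = first_width
    dilation = first_dilation
    for _ in range(n_dilated):
        field += (dilated_width - 1) * dilation
        dilation *= 2
    return field


encoded = one_hot_encode("ACGT")
```

Under these assumptions, the stack would cover 2,315 bp if the first dilated layer used a rate of 1, and 4,610 bp if it used a rate of 2.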

tRNet transfer learning approach

During the training of tRNet we utilized a transfer learning approach from a network trained for a binary classification task. In this network the architecture is identical to that of tRNet, except that the final output layer consists of a single sigmoid activated output to predict whether the input sequence belongs to a housekeeping tRNA or not. Inputs for this model also only consisted of sequences from the subset of tRNA genes in housekeeping and inactive classes. Given the more distinct sequence difference between these two classes, this is a simpler classification problem from which learned features are exploited for better generalization in the final multi-class model. Transfer learning was achieved by training the modified model on input sequences from housekeeping and inactive genes, and their gene class labels obtained from called peaks from ChIP–Seq data. Next, all layers were frozen to prevent retraining of already trained layers and the model architecture was updated to replace the output layer with one producing three softmax activated outputs, as described earlier. This model was then retrained on one-hot-encoded sequence data from all three classes and their corresponding tRNA gene class labels. Finally, the last convolutional layer of the network was unfrozen and the model trained once again to optimize the weights of this layer for the new multi-class model.

CNN training and evaluation

All networks were trained with the same approach on 80% of the input data, and validated on 20% held-out data. To evaluate model performance, K-fold cross-validation (k = 5) was implemented on training data, and performance in the form of validation accuracy and loss across the five folds was compared. Training of the initial binary classification model, from which learning was transferred, was implemented using the Adam optimizer (learning rate = 0.00025, as determined by parameter hypertuning), binary cross-entropy loss function and early stopping with patience of ten epochs. For final model training, after transfer learning, the same training parameters were specified, except that a categorical cross-entropy loss function more tailored to the multi-class output of this model was used. Final model performance was evaluated on held-out testing data and the accuracy of the model predictions for each class was assessed using the AUROC by plotting the one-versus-rest macro-average scores.
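A one-versus-rest macro-average AUROC can be computed as in the following minimal sketch. It uses the rank-based (Mann–Whitney) definition of the AUROC and averages the per-class values with equal weight. The function names and example predictions are hypothetical, and the sketch is not necessarily the procedure used to produce the plotted curves.

```python
def binary_auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_ovr_auroc(true_classes, probabilities, n_classes=3):
    """Average the one-versus-rest AUROC over all classes with equal weight."""
    per_class = []
    for c in range(n_classes):
        labels = [1 if t == c else 0 for t in true_classes]
        scores = [row[c] for row in probabilities]
        per_class.append(binary_auroc(labels, scores))
    return sum(per_class) / n_classes


# Invented softmax outputs for four genes (classes 0, 1 and 2).
truth = [0, 1, 2, 0]
probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
macro_score = macro_ovr_auroc(truth, probs)
```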

Nucleotide contribution score calculation and motif analysis with TF-Modisco

To calculate the contribution scores of each nucleotide in each input sequence to the final prediction, we employed the SHAP DeepExplainer module, an extension of DeepLift for calculating SHAP contribution scores. These contribution scores, one for every nucleotide in every input sequence, are based on the difference in output between the model given a set of shuffled input sequences and the output of the model on actual tRNA upstream sequence. Ten dinucleotide-shuffled sequences for every input sequence were supplied for the contribution score calculation. The resulting DeepExplainer hypothetical contribution scores were multiplied by the one-hot encoded matrix for each sequence to derive the final contribution scores for each sequence. The hypothetical and final contribution scores were calculated separately for every output, or task, of the model, corresponding to classification of sequences as housekeeping, repressed or inactive tRNAs.
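The final step, multiplying hypothetical contribution scores by the one-hot matrix, is sketched below in plain Python. The function names and score values are invented for illustration; in practice the hypothetical scores would come from SHAP DeepExplainer.

```python
def final_contribution_scores(hypothetical, one_hot):
    """Element-wise product: keeps only the score of the base actually present."""
    return [
        [h * x for h, x in zip(hyp_row, hot_row)]
        for hyp_row, hot_row in zip(hypothetical, one_hot)
    ]


def per_position_scores(final_scores):
    """Collapse each position to one value by summing over the four bases."""
    return [sum(row) for row in final_scores]


# Invented hypothetical scores for the two-base sequence "AC" (columns A, C, G, T).
hypothetical = [[0.2, -0.1, 0.0, 0.3], [-0.5, 0.4, 0.1, 0.0]]
one_hot = [[1, 0, 0, 0], [0, 1, 0, 0]]
final_scores = final_contribution_scores(hypothetical, one_hot)
position_scores = per_position_scores(final_scores)
```

The hypothetical scores describe what each possible base would contribute at a position, whereas the final scores are non-zero only for the observed base.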

TF-Modisco v0.5.14.1 was then run on the contribution scores from SHAP DeepExplainer for each task separately to find sequence enrichment or motifs among nucleotides with high contribution to model output. Significant high-importance windows in the sequences, or seqlets, were detected using a sliding window size of 15 bp, a flanking sequence of 5 bp and seqlet FDR threshold of 0.01 (TfModiscoWorkflow(sliding_window_size = 15, flank_size = 5, target_seqlet_fdr = 0.01)). Final patterns were assembled from detected seqlets with a window size of 20 bp, flanking sequence of 10 bp and a minimum of 20 seqlets per cluster (TfModiscoSeqletsToPatternsFactory(trim_to_window_size = 20, initial_flank_to_add = 10, final_min_cluster_size = 20)).

Enhancer analysis

Enhancer elements for the GRCh38 genome were obtained from the UCSC GeneHancer Double Elite regulatory elements table, fetching elements whose identification and association to target genes are derived from more than one information source. First, enhancer elements per chromosome were downloaded with the table browser using a filter for ‘Enhancer’ in elementType (accessed 14 April 2023); these were merged to obtain all Double Elite GeneHancer enhancers in the GRCh38 genome. To find tRNA genes that overlap with this set of enhancers, bedtools closest v2.29.2 was used to obtain the closest enhancer to each of the tRNA loci with gene-resolution RPC1 ChIP occupancy data (n = 558); those overlapping tRNAs (distance of 0 bp) were retained (n = 55). As an additional source of evidence for enhancer activity, overlaps with FANTOM5 CAGE data were obtained from https://fantom.gsc.riken.jp/5/datafiles/reprocessed/hg38_latest/extra/enhancer/F5.hg38.enhancers.bed.gz and CAGE peaks overlapping all enhancer elements were identified using bedtools intersect v2.29.2. From this set of FANTOM5 CAGE-overlapping enhancers, those that also overlapped tRNAs were found with a combination of bedtools closest and filtering for a distance of 0 bp, as above. A table containing tRNA and overlapping enhancer position and identity information, FANTOM5 CAGE peak overlap, tissue and/or cell type specificity of each of these tRNA-associated enhancers as well as their predicted target genes was compiled (Supplementary Table 3). Tissue/cell type and target information were obtained by request from the GeneCards database (https://www.genecards.org/Guide/Datasets) for GeneHancer v5.16. Co-regulation with overlapping enhancers was assessed for tRNA genes that (1) were in the repressed and housekeeping class, (2) overlapped a Double Elite enhancer with FANTOM5 CAGE support and (3) demonstrated evidence of tissue/cell-type specificity in a cell type related to those used in this study. 
If an enhancer overlapped more than one tRNA, it was retained for analysis only if the overlapping tRNAs had the same activity status.
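The overlap and consistency filters can be sketched as follows. This is a simplified illustration rather than the bedtools workflow itself: it assumes 0-based half-open intervals on a single chromosome, treats a bedtools closest distance of 0 bp as a direct overlap, and uses hypothetical identifiers and coordinates.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def enhancers_with_consistent_trnas(trnas, enhancers):
    """Keep enhancers whose overlapping tRNAs all share one activity status.

    trnas: dict name -> (start, end, status)
    enhancers: dict enhancer id -> (start, end)
    Returns dict enhancer id -> list of overlapping tRNA names.
    """
    kept = {}
    for enhancer_id, (e_start, e_end) in enhancers.items():
        hits = [
            (name, status)
            for name, (t_start, t_end, status) in trnas.items()
            if overlaps(t_start, t_end, e_start, e_end)
        ]
        statuses = {status for _, status in hits}
        if hits and len(statuses) == 1:
            kept[enhancer_id] = [name for name, _ in hits]
    return kept


# Hypothetical loci, for illustration only.
trnas = {
    "tRNA-1": (100, 172, "housekeeping"),
    "tRNA-2": (300, 372, "housekeeping"),
    "tRNA-3": (900, 972, "repressed"),
    "tRNA-4": (1000, 1072, "housekeeping"),
    "tRNA-5": (5000, 5072, "inactive"),
}
enhancers = {"enh-A": (50, 400), "enh-B": (850, 1100), "enh-C": (2000, 2500)}
retained = enhancers_with_consistent_trnas(trnas, enhancers)
```

Here enh-A is retained because both overlapping tRNAs are housekeeping, enh-B is dropped because its tRNAs differ in status, and enh-C is dropped because it overlaps no tRNA.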

Statistics and reproducibility

No statistical method was used to pre-determine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. No data were excluded from the analyses. Information on the statistical tests used for each analysis and reproducibility is included in the relevant sections describing the method as well as in the figure legends.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.