Extensive 5′-surveillance guards against non-canonical NAD-caps of nuclear mRNAs in yeast

The ubiquitous redox coenzyme nicotinamide adenine dinucleotide (NAD) acts as a non-canonical cap structure on prokaryotic and eukaryotic ribonucleic acids. Here we find that in budding yeast, NAD-RNAs are abundant (>1400 species), short (<170 nt), and mostly correspond to mRNA 5′-ends. The modification percentage of transcripts is low (<5%). NAD incorporation occurs mainly during transcription initiation by RNA polymerase II, which uses distinct promoters with a YAAG core motif for this purpose. Most NAD-RNAs are 3′-truncated. At least three decapping enzymes, Rai1, Dxo1, and Npy1, guard against NAD-RNA at different cellular locations, targeting overlapping transcript populations. NAD-mRNAs are not translatable in vitro. Our work indicates that in budding yeast, most of the NAD incorporation into RNA seems to be disadvantageous to the cell, which has evolved a diverse surveillance machinery to prematurely terminate, decap and reject NAD-RNAs.


Supplementary Fig. 1 | NAD captureSeq library reference features and further analysis of the WT strain.
a, Method validation: Enrichment of spike-in NAD-RNAI (a regulatory RNA from E. coli) reads in WT NAD captureSeq. The height of the orange bars indicates the percentage of NAD-RNAI reads among total genome-mapped reads in the sample group (S, ADPRC fully-treated), and blue bar height indicates the same in the negative control group (N, minus ADPRC). Error bars represent the mean + standard deviation (sd), n=3. p values are denoted by asterisks: (*) p <0.05; (**) p <0.01; (Student's t test, one sided). This analysis revealed efficient enrichment of the synthetic pure NAD-RNA in all samples. Particularly high enrichment was observed in the unfragmented libraries.
b, Positions of the first nucleotide of spike-in RNAI-mapped reads in the WT sample, confirming the reliable identification of NAD-RNA 5'-ends from the NAD captureSeq data. c, Distribution of 5' UTR lengths in S. cerevisiae according to published data 1. The boxplot shows from bottom to top "minimum" (Q1-1.5 interquartile range (IQR, 25% to 75%)), first quartile (Q1, 25%), median (solid line, 50%), third quartile (Q3, 75%), "maximum" (Q3+1.5 IQR), and outliers (black dots). The sample size (n) is 4419. d, Positions of the first nucleotide of genome-mapped reads in the WT unfragmented NAD captureSeq sample group. The 5' UTR region (-120 to +50, relative to the translation start site (TLS) as '0') is enlarged in the upper panel, confirming the 5' UTR length distribution expected from the literature data visualized in Supplementary Fig. 1c.
e, Alignment of small RNAs (12-17 nt) with homology to TDH3 RNA observed in the WT unfragmented NAD captureSeq library. The red 'A' is the +1 nucleotide of the assumed transcription start site. The number on the right side of the sequence indicates the number of reads.
f, Enriched NAD-RNAs in the large fragmented NAD captureSeq libraries of the WT strain. All statistics parameters as in Fig. 1a. Performed in biological replicates, n=3. g, Enriched NAD-RNA from Walters' published WT BY4742 yeast library 2. All statistics parameters are as in Fig. 1a. h, Heatmap correlation between NAD-RNA enrichment (NAD captureSeq) and transcript abundance (transcriptome sequencing). Continuous line represents actual expression levels, while the dashed line serves as reference for comparison.
i, LC/MS standard curve for the quantification of pulled-down NAD-RNAs. The grey dashed line represents titrated nicotinamide riboside (NR, grey empty circles) as standard curve. Blue triangles denote the measured NR intensities for pulled-down NAD-TDH3 RNA, while red cubes represent NAD-POR1 RNA NR intensities. Performed in biological replicates, n=3.
j, Bioanalyzer electrophoresis of the NAD captureSeq cDNA amplicon. The red line represents the cDNA amplicon of the NAD captureSeq sample group (+ADPRC treatment), while the blue line represents the negative control group (-ADPRC treatment). The x axis represents the cDNA length in base pairs (bp), while the y axis denotes the signal intensity. Peaks at 15 and 1500 bp represent size standards added for Bioanalyzer electrophoresis.
Source data are provided as a Source Data file.

e,f, Hydrolysis of NAD (yielding NMN and AMP), catalyzed by WT Npy1 and mutant (E276Q) Npy1 in vitro. 32P-NAD was treated with the respective enzyme in the presence of 2 mM Mg2+ and 1 mM Mn2+, and reaction mixtures were separated by thin layer chromatography (TLC, NH4OAc/EtOH 4:6). g, Growth phenotype comparison between WT and npy1Δ strains under different conditions. Cells were spotted in 10-fold serial dilutions starting from OD600 = 1. The cells were cultured in normal YPD medium at 30 °C, while the NaCl set was additionally supplemented with 0.5 M NaCl. h, Analysis of expression level changes of transcripts upon removal of Npy1 by transcriptome sequencing. 7620 different transcripts were analyzed and are represented as dots. Red dots: up-regulated transcripts (fold change >1.414, normalized base mean >1, p <0.05 (DESeq2, negative binomial distribution)), blue dots: down-regulated transcripts (fold change <0.707, normalized base mean >1, p <0.05). Performed in biologically independent replicates, n=3.
i, Enriched NAD-RNAs in the small fragmented NAD captureSeq library of the npy1Δ strain. Other parameters as in Fig. 1a. Performed in biologically independent replicates, n=3. j, Enriched NAD-RNAs in the large fragmented NAD captureSeq library of the npy1Δ strain. Other parameters as in Fig. 1a. Performed in biologically independent replicates, n=3. k, LC/MS standard curve with total NAD-RNA. Same analysis as Supplementary Fig. 1i, but using total RNA for the measurement. Performed in biologically independent replicates, n=3.
l, Relative RNA NAD-modification ratio trend from integration of NAD captureSeq, transcriptome, and LC-MS data. The grey dashed line is y = x as reference, while the blue solid line represents the linear regression of the experimental data: NAD-ratio(ΔNpy1) = k × NAD-ratio(WT) + c (k denotes the slope, while c corresponds to the intercept).
Source data are provided as a Source Data file.
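The linear model described for panel l (slope k, intercept c) can be illustrated with an ordinary least-squares fit. The following is a minimal sketch only: the function name `fit_line` is hypothetical, the numbers are invented placeholders rather than data from this study, and the legend does not state which fitting routine was actually used.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = k * x + c; returns (k, c)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x and cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    k = sxy / sxx
    c = mean_y - k * mean_x
    return k, c


# Placeholder values (NOT measured data): relative NAD ratios per transcript
wt_ratio = [0.5, 1.0, 2.0, 4.0]
npy1_ratio = [1.2, 2.2, 4.2, 8.2]  # constructed as 2 * x + 0.2

k, c = fit_line(wt_ratio, npy1_ratio)
print(f"slope k = {k:.3f}, intercept c = {c:.3f}")
```

A slope above 1 in such a fit would correspond to points lying above the y = x reference line.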

Supplementary Fig. 3 | Detailed comparative analysis of yeast deletion mutants.
a, Time course of the cell density (OD600) for WT and all deletion mutants. The mutant strains harboring a RAI1 gene deletion are indicated by dashed lines while all other strains are plotted as continuous lines. The cells were incubated in YPD medium on 96 well plates at 30 °C under agitation. The cell density was inferred based on the optical density at 600 nm at the identical time points. b, Heatmap of the differential expression assessed by transcriptome sequencing, comparing the number of upregulated (red, above the "0 0 0" diagonal) and down-regulated (blue, below the "0 0 0" diagonal) transcripts between two respective strains. The number of RNA species was log10-transformed and resulting values color-coded. Example: Comparison of npy1∆ and WT strain yields 974 species exhibiting up-regulation and 827 species exhibiting down-regulation. c, Venn diagrams showing the intersection of up-/down-regulated RNA species between the three single-deletion strains by transcriptome sequencing. Biologically independent samples n=3. d, Heatmap of the intersection of enriched NAD-RNAs assessed by NAD captureSeq. The number of RNA species that overlap between two strains was log10 transformed and scaled by color intensity. Each group comprises biologically independent replicates, n=3. e, Heatmap of functional clustering of the top 250 enriched NAD-RNA species for WT and all 7 deletion strains by gene ontology (GO) terms. The color intensity represents the log10 transformation of the p value. Each group comprises biologically independent replicates, n=3.
Source data are provided as a Source Data file.

Supplementary Fig. 4 | Genome-wide 5'-transcript leader features of NAD-RNAs.
a, Comparison of the read profiles of WT and npy1Δ samples. Aligned reads of NAD captureSeq and transcriptome sequencing were normalized as RPM and visualized in the IGB. Green patterns represent accumulated reads in the fully treated sample group (+ADPRC), while the red traces were derived from the -ADPRC negative control in NAD captureSeq data sets (unfragmented libraries). The grey traces represent the read distribution of transcripts from transcriptome sequencing. b, Identical experimental and analytical procedures as in Fig. 6c, but using the small fragmented WT NAD captureSeq libraries.
c, Identical experimental and analytical procedures as in Fig. 6c, but using the large fragmented WT NAD captureSeq libraries. d, Identical experimental and analytical procedures as in Fig. 6c, but using the dxo1Δ rai1Δ npy1Δ triple knockout NAD captureSeq libraries.
Source data are provided as a Source Data file.

q, Scheme illustrating the TDH3 gene promoter and relevant mutations. The yellow 'A' is the TSS and referenced as +1. The red letters highlight mutations. r, Spike-in NAD-RNAIII and ppp-RNAI served as the standards for TDH3 RNA NAD-ratio quantification. Statistics parameters as in Fig. 7c. Error bars represent the mean ± sd. Performed in biologically independent replicates, n=3.
s, Flow cytometry analysis of GFP expression levels of the yeast strains shown in Fig. 7d, using three independent clones. The solid lines in the boxplot represent the median fluorescence signal intensity for each group (>75000 events in pOri, p-1, p+2, p+3, and p+459). The median values of each group were compared; p value is denoted by asterisks: (**) p <0.01 (Student's t test, one-sided). Outliers above 7000 are not shown. The layout of the boxplot is similar to Supplementary Fig. 1c. On the right panel, 30000 collected cells from the pOri and p-1 strains were first subdivided into single cell populations by a SSC-H gate. Subsequently, GFP signal intensity was measured after removal of background signal by GFP (488-E-A). The subgroup percentage of gated events is denoted beside the indicated range.
Source data are provided as a Source Data file.

c, Prediction of TL folding energy using RNAfold 6 as a function of the assumed transcript length. The orange bars represent the predicted minimum free energy of the top 25 highest-enriched NAD-RNAs when comparing the dxo1Δ rai1Δ npy1Δ triple knockout (S/N) with the dxo1Δ npy1Δ double knockout (S/N), while the grey bars refer to background RNAs (no significant NAD enrichment change (0.707 < dxo1Δ rai1Δ npy1Δ (S/N)/ dxo1Δ npy1Δ (S/N) < 1.414) between the two strains). The layout of the boxplot is identical to Supplementary Fig. 1c.

Yeast cells were cultured in 100 mL YPD medium and collected at OD600 0.8. The cells (biological triplicates) were washed twice with ice-cold dH2O, then twice with ice-cold PBS. The pelleted cells were then lysed by passing them twice through a French press (~0.69 kbar) in 2 mL protein lysis buffer (50 mM Tris-HCl (pH 7.5), 100 mM NaCl, 1 mM EDTA, 1 mM PMSF, 1 µg/mL leupeptin, 1 µg/mL pepstatin A). The lysate was centrifuged at 25,000 g, 4 °C, 20 min. The supernatant was collected, flash-frozen in liquid nitrogen, and stored at -80 °C until all samples were ready. The protein concentration was measured by BCA assay. Reduction of disulfide bonds in cysteine-containing proteins was performed using 10 mM dithiothreitol (56 °C, 30 min, in 50 mM HEPES, pH 8.5). Reduced cysteines were alkylated with 20 mM 2-chloroacetamide (room temperature, in the dark, 30 min, 50 mM HEPES, pH 8.5). Samples were prepared following the SP3 protocol 32 and subsequently trypsin (sequencing grade) was added in an enzyme to protein ratio of 1:50 for overnight digestion at 37 °C. Peptides were labelled using the TMT10plex 33 Isobaric Label Reagent, according to the manufacturer's instructions. For further sample clean-up, an OASIS HLB µElution Plate (Waters) was used.
Offline high pH reverse phase fractionation was carried out on an Agilent 1200 Infinity high-performance liquid chromatography system, equipped with a Gemini C18 column (3 μm, 110 Å, 100 x 1.0 mm, Phenomenex) 34.

Proteomics Mass Spectrometry Data Acquisition and Analysis. An UltiMate 3000 RSLC nano LC system (Dionex) was fitted with a trapping cartridge (µ-Precolumn C18 PepMap 100, 5 µm, 300 µm i.d. x 5 mm, 100 Å) and an analytical column (nanoEase M/Z HSS T3 column 75 µm x 250 mm C18, 1.8 µm, 100 Å, Waters). Trapping was carried out with a constant flow of solvent A (0.1% formic acid in water) at 30 µL/min onto the trapping column for 6 minutes. Subsequently, peptides were eluted via the analytical column with a constant flow of 0.3 µL/min with an increasing percentage of solvent B (0.1% formic acid in acetonitrile) from 2% to 4% in 4 min, from 4% to 8% in 2 min, followed by 8% to 28% for a further 96 min, and finally from 28% to 40% in another 10 min. The outlet of the analytical column was coupled directly to a QExactive plus (Thermo Scientific) mass spectrometer using the Proxeon nanoflow source in positive ion mode.
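The elution gradient described above can be written as a piecewise function of time. The following is a minimal sketch, assuming that each segment is a linear ramp and that t = 0 marks the start of elution after trapping (neither is stated explicitly in the text); the names `GRADIENT_STEPS` and `percent_b` are hypothetical.

```python
# (start %B, end %B, duration in min) for each gradient segment, as listed in the text
GRADIENT_STEPS = [
    (2.0, 4.0, 4.0),
    (4.0, 8.0, 2.0),
    (8.0, 28.0, 96.0),
    (28.0, 40.0, 10.0),
]


def percent_b(t_min):
    """Return the solvent B percentage at t_min minutes, assuming linear ramps."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for start, end, duration in GRADIENT_STEPS:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    # After the last segment, hold the final composition (an assumption)
    return GRADIENT_STEPS[-1][1]


total_minutes = sum(duration for _, _, duration in GRADIENT_STEPS)
print(f"Total gradient length: {total_minutes:.0f} min")
print(f"%B at 54 min: {percent_b(54):.1f}")
```

Summing the four segments gives a gradient length of 112 min, excluding the 6 min trapping step.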
The peptides were introduced into the QExactive plus via a Pico-Tip Emitter 360 µm OD x 20 µm ID; 10 µm tip (New Objective) and an applied spray voltage of 2.3 kV. The capillary temperature was set to 320 °C. A full mass scan was acquired with a mass range from 350 to 1400 m/z in profile mode in the FT with a resolution of 70000. The filling time was set to the maximum of 100 ms with a limitation of 3 x 10^6 ions. Data-dependent acquisition (DDA) was performed with the resolution of the Orbitrap set to 35000, with a fill time of 120 ms and a limitation of 2 x 10^5 ions. A normalized collision energy of 32 was applied. A loop count of 10 with count 1 was used and a minimum AGC trigger of 2e2 was set. A dynamic exclusion time of 30 s was used. The peptide match algorithm was set to 'preferred' and charge exclusion to 'unassigned'; charge states 1 and 5-8 were excluded. MS2 data was acquired in profile mode.
IsobarQuant 35 and Mascot (v2.2.07) were used to process the acquired data, which was searched against a Uniprot S. cerevisiae proteome database (UP000002311) containing common contaminants and reversed sequences. The following modifications were included in the search parameters: Carbamidomethyl (C) and TMT10 (K) (fixed modifications), Acetyl (N-term), Oxidation (M) and TMT10 (N-term) (variable modifications). A mass error tolerance of 10 ppm was set for the full scan (MS1) and of 0.02 Da for MS/MS (MS2) spectra. Further parameters were: trypsin as protease with a maximum of two missed cleavages; a minimum peptide length of seven amino acids; at least two unique peptides required for a protein identification. The false discovery rate on peptide and protein level was set to 0.01.
The protein.txt output file of IsobarQuant was analyzed using an R script. As a quality control filter, only proteins that were quantified with at least 2 unique peptides were used (2256 out of 6049 proteins remained). The signal_sum columns were annotated according to the experimental conditions. Batch effects were removed with the limma (v3.38.3) package and the data were subsequently normalized using vsn (v3.50.0). limma was used again to test for differentially expressed proteins between wild type and npy1Δ. Proteins were annotated as a hit with a fold change greater than 50% and a false discovery rate below 5%, and as a candidate with a fold change greater than 40% and a false discovery rate below 20%.
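The hit and candidate criteria can be sketched as a small classification function. This is an illustration only, written in Python rather than the R script used for the analysis. It assumes that a "fold change of 50%" means a ratio above 1.5 or below 1/1.5 (symmetric on the log scale), which the text does not spell out; the function name `classify_protein` is hypothetical.

```python
import math


def classify_protein(fold_change, fdr):
    """Label a protein as 'hit', 'candidate', or 'no hit'.

    fold_change: ratio npy1-deletion / wild type (must be > 0)
    fdr: false discovery rate for the comparison
    Assumption: thresholds are applied symmetrically on the log2 scale.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    abs_log2_fc = abs(math.log2(fold_change))
    if abs_log2_fc > math.log2(1.5) and fdr < 0.05:
        return "hit"
    if abs_log2_fc > math.log2(1.4) and fdr < 0.20:
        return "candidate"
    return "no hit"


# Placeholder values (NOT results from this study)
for fc, fdr in [(2.0, 0.01), (1.45, 0.10), (1.1, 0.001)]:
    print(fc, fdr, classify_protein(fc, fdr))
```

Under this reading, a protein that passes the hit fold-change threshold but only the candidate FDR threshold is reported as a candidate.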
Flow Cytometry Data Acquisition and Analysis. Cells, in biological triplicates, were grown in low fluorescence synthetic complete medium lacking leucine to mid-log phase. Flow cytometry (FCM) of yeast cells expressing GFP from p415-based plasmids was performed on a BD FACSCanto II (BD Bioscience) equipped with a 488-nm laser and a combination of 502-nm long-pass and 530/30-nm band-pass emission filters for GFP detection. In total, 300000 events were measured and analyzed using Flowing Software. Events corresponding to single cells were isolated first, and events at fluorescence background level were then removed. The fluorescence intensity of the remaining events was analyzed.
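The order of the analysis steps (single-cell gate, background removal, then intensity analysis) can be sketched as follows. This is a hedged illustration and not the Flowing Software workflow itself: the function name `median_gfp`, the event representation, the gate boundaries, and the background threshold are all hypothetical, and the event values are invented.

```python
from statistics import median


def median_gfp(events, ssc_h_range, gfp_background):
    """Median GFP of events inside the SSC-H gate and above background."""
    low, high = ssc_h_range
    # Step 1: keep events inside the single-cell gate
    singlets = [e for e in events if low <= e["ssc_h"] <= high]
    # Step 2: drop events at or below the fluorescence background
    gfp_values = [e["gfp"] for e in singlets if e["gfp"] > gfp_background]
    if not gfp_values:
        raise ValueError("no events left after gating")
    # Step 3: summarize the remaining events
    return median(gfp_values)


# Invented example events (NOT measured data)
events = [
    {"ssc_h": 500, "gfp": 1200},
    {"ssc_h": 520, "gfp": 1500},
    {"ssc_h": 5000, "gfp": 9000},  # outside the single-cell gate
    {"ssc_h": 510, "gfp": 20},     # background-level fluorescence
    {"ssc_h": 530, "gfp": 1800},
]

result = median_gfp(events, ssc_h_range=(400, 600), gfp_background=100)
print(result)
```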

Plasmid Construction.
Npy1: the open reading frame (ORF) of NPY1 gene was PCR-amplified from S. cerevisiae (BY4742 strain) gDNA, and NdeI and BamHI restriction sites introduced using the respective primers (Fw_NPY1 and Rev_NPY1, Table S1). The PCR products were then purified using the QIAquick PCR Purification Kit (QIAGEN) and digested with NdeI and BamHI-HF. The digested amplicons were subsequently ligated into a NdeI-and BamHI-digested pET-28a (+) plasmid (Novagen) by T4 DNA ligase, which is subsequently referred to as pET-28a-NPY1.
mRNA for in vitro translation: The cloning procedure to obtain template mRNA sequences of interest had three distinct steps. Firstly, the 5' UTR and 22 nt of the coding sequence (CDS) of the gene were PCR-amplified from S. cerevisiae (BY4742 strain) gDNA using the corresponding primers (Table S1). Secondly, the firefly luciferase sequence (1653 bp, same as from pGL4.10 [Luc2] Vector (Promega)) and the Renilla reniformis luciferase sequence (936 bp, same as from pRL-null Vector (Promega)) were cloned from the plasmids pBK77N_hybSV40_Luc2_empty_BGH_Renilla_CMV and pFK-DVs-R2A, respectively, using dedicated primer pairs (Fw_Luc2, Rev_luc2, Fw_Renilla and Rev_Renilla primers; Table S1). Thirdly, the T7 promoter sequence containing a HindIII site was fused with the 5'-terminal mRNA sequence, firefly luciferase sequence, and a poly(A) 30
YAAG promoter: The plasmid backbone used here was a derivative of the p415GPD vector 36. The TDH3 gene promoter and 15 nt of the CDS were obtained from S. cerevisiae gDNA by standard PCR amplification, using corresponding primers (F_TDH3_5UTR_15ntCDS_SacI and R_TDH3_5UTR_15ntCDS_XbaI; Table S1). The p415GPD plasmid and the generated PCR products were digested with SacI-HF and XbaI and subsequently ligated. Next, the superfolder GFP (sfGFP) sequence was amplified from the pMaM5 plasmid under standard PCR conditions, using appropriate primers (F_sfGFP_BamHI and R_sfGFP_HindIII; Table S1). These sfGFP-encoding amplicons and the partially assembled genetic constructs described above were digested with BamHI-HF and HindIII-HF and ligated together to give the pOri plasmid. Plasmids carrying mutations at the positions p-1, p+2, p+3, and p+459 were generated separately by PCR using dedicated primer pairs (Mutagenesis_R, Mutagensis_p-1A_F, Mutagensis_p+2T_F, Mutagensis_p+3T_F, and Mutagensis_p+459T_F, Table S1).
Correct insert sequences of all plasmids were further confirmed by Sanger sequencing.
gDNA Extraction. The gDNA isolation procedure was based on a published method 37 . 1.5 mL of the yeast overnight culture (BY4742 strain) were pelleted and resuspended in 200 µL lysis buffer, containing 2% Triton X-100, 1% SDS, 100 mM NaCl, 10 mM Tris-HCl (pH 8.0), 1 mM EDTA (pH 8.0). Resuspended cells were kept at -80 °C for 15 min and then directly heated to 95 °C for 1 min. Samples were subjected to two additional freeze-thaw cycles, as described. The mixture was then vortexed for 30 s. Subsequently, 200 µL chloroform was added and the mixture vortexed for 2 min at RT before centrifugation. Then the aqueous layer from the centrifugation was transferred into 400 µL ethanol (ice-cold). The solution was incubated at RT for 5 min. Then the solution was centrifuged at 20,000 g for 10 min at RT. The supernatant was collected and dried under vacuum. The gDNA pellet was then resuspended in 20 µL TE buffer (10 mM Tris-HCl (pH 8.0), 1 mM EDTA (pH 8.0)).

Protein Expression and Purification. ADPRC:
The Pichia pastoris GS115 pPICZαA/CYCLASE-2 strain was a gift from H.C. Lee. The protein expression and purification were performed as described previously with minor modifications 11. First, P. pastoris was grown on YPD agar plates containing 100 µg/mL Zeocin. Then, single colonies were used to inoculate 10 mL liquid YPD medium, and subsequently cultured in a final volume of 500 mL YPD medium, until the optical density of the cells reached its plateau phase at about 50 mg cell pellet per mL medium. Next, the YPD medium was replaced by 500 mL BMMY medium (1% yeast extract, 2% peptone, 100 mM K2HPO4/KH2PO4 (pH 6.0), 1.34% (w/v) yeast nitrogen base with ammonium sulfate without amino acids, 0.4 mg/L biotin, 0.5% (v/v) methanol). After an additional 24 h and again after 48 h of culturing the yeast in BMMY medium, 2.25 mL methanol were added. The culture supernatant was collected after 72 h in total by centrifugation (1000 g, 4 °C, 10 min). The 500 mL of supernatant were filtered through a 0.45 µm filter and then concentrated using Amicon Ultra-15 mL Centrifugal Filter Units 10 kDa, ultimately yielding 25 mL of concentrate. The proteins in the concentrate were dialyzed overnight against 1 L dialysis buffer, containing 50 mM NaOAc (pH 5.0), using a 5 kDa cut-off membrane (Carl Roth). Dialysis buffer was exchanged regularly with fresh 1 L dialysis buffer. The dialyzed protein solution was then loaded on 2x HiTrap SP HP 1 mL columns at a flow rate of 1 mL/min on an FPLC system (Bio-Rad). ADPRC was eluted by applying a salt gradient from 50 mM NaOAc (pH 4.0) to 50 mM NaOAc (pH 4.0), 1 M NaCl at 0.75 mL/min. The fractions that contained the ADPRC band upon SDS-PAGE analysis were pooled. The cyclase activity was determined using an NGD fluorometric assay.
Briefly, ADPRC was subjected to serial dilution ranging from 0.2 ng/µL to 1.2 ng/µL, with a constant amount of 60 µM NGD in HEPES Buffer (50 mM HEPES, 5 mM MgCl2, pH 7), in a total reaction volume of 20 µL. According to linear regression of the NGD kinetics, 1 U of activity was defined as 0.125 µg ADPRC that, at a concentration of 1.35 µg/mL, converted 60 µM NGD and reached a cGDPr fluorescence plateau after 130-140 s (JASCO spectrophotometer, λex = 300 nm; λem = 410 nm, high sensitivity, bandwidth 2 nm) at 25 °C 12.
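With 1 U defined as 0.125 µg ADPRC, enzyme amounts can be converted between mass and units. The following is a minimal sketch of that arithmetic for the dilution range and reaction volume given above; the function names are hypothetical, and the conversion assumes that activity scales linearly with enzyme mass.

```python
UG_PER_UNIT = 0.125  # 1 U of activity corresponds to 0.125 µg ADPRC (from the text)


def mass_in_reaction_ug(conc_ng_per_ul, volume_ul):
    """Mass of enzyme (µg) in a reaction of the given concentration and volume."""
    return conc_ng_per_ul * volume_ul / 1000.0


def adprc_units(mass_ug):
    """Convert an ADPRC mass in µg to activity units, assuming linear scaling."""
    if mass_ug < 0:
        raise ValueError("mass must be non-negative")
    return mass_ug / UG_PER_UNIT


# Lowest and highest dilution in a 20 µL reaction
for conc in (0.2, 1.2):
    mass = mass_in_reaction_ug(conc, 20.0)
    print(f"{conc} ng/µL -> {mass:.3f} µg -> {adprc_units(mass):.3f} U")
```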
Expression and affinity purification of Npy1 and Npy1(E276Q) were achieved by following standard procedures with minor changes. Expression in E. coli carrying the corresponding expression vector was induced at an OD600 of ~0.7 by adding 0.1 mM IPTG. The cells were then chilled for 20 min at 4 °C and incubated at 16 °C, 150 rpm, for an additional 16 h. Cell pellets were subsequently harvested by centrifugation and washed with ice-cold dH2O. The pelleted cells were then resuspended in HisTrap Buffer A (50 mM Tris-HCl (pH 7.8), 0.3 M NaCl, 5 mM MgSO4, 5 mM 2-mercaptoethanol, 5% glycerol, 5 mM imidazole) and lysed by sonication. After centrifugation of the lysates (37,500 g, 4 °C, 30 min), the supernatant was filtered through a 0.45 µm filter before loading it on a HisTrap HP 1 mL Column, using an FPLC system (Bio-Rad). The target protein was then eluted by an imidazole gradient, ranging from HisTrap Buffer A to HisTrap Buffer B, which contained an additional 500 mM imidazole. Based on SDS-PAGE analysis, fractions containing the target protein were pooled and concentrated by Amicon Ultra-15 mL Centrifugal Filter Units 10 kDa (Merck), and the HisTrap Buffer B was exchanged with Buffer G (50 mM Tris-HCl (pH 7.5), 200 mM NaCl, 0.1 mM DTT). Further purification of Npy1 and Npy1(E276Q) by size-exclusion chromatography (SEC) was achieved on a Sephacryl 16/60 S-200 High Resolution column. The final concentration of all proteins was determined using the Pierce BCA Protein Assay Kit, and the proteins were stored in 50% glycerol at -20 °C.