Nucleotide-amino acid π-stacking interactions initiate photo cross-linking in RNA-protein complexes

Photo-induced cross-linking is a mainstay technique to characterize RNA-protein interactions. However, UV-induced cross-linking between RNA and proteins at “zero-distance” is poorly understood. Here, we investigate cross-linking of the RBFOX alternative splicing factor with its hepta-ribonucleotide binding element as a model system. We examine the influence of nucleobase, nucleotide position and amino acid composition using CLIR-MS technology (crosslinking-of-isotope-labelled-RNA-and-tandem-mass-spectrometry), which locates cross-links on RNA and protein with site-specific resolution. Surprisingly, cross-linking occurs only at nucleotides that are π-stacked to phenylalanines. Notably, this π-stacking interaction is also necessary for the amino acids flanking phenylalanines to partake in UV-cross-linking. We confirmed these observations in several published datasets where cross-linking sites could be mapped to a high-resolution structure. We hypothesize that π-stacking to aromatic amino acids activates cross-linking in RNA-protein complexes, whereafter nucleotide and peptide radicals recombine. These findings will facilitate interpretation of cross-linking data from structural studies and from genome-wide datasets generated using CLIP (cross-linking-and-immunoprecipitation) methods.

The synthesis of all oligonucleotides was carried out with an MM12 synthesizer (Bio Automation Inc., Plano, TX) on a 50 nmol scale using 500 Å UnyLinker CPG (ChemGenes, Wilmington, MA).

Cross-linking and mass spectrometry
For the digestion using RNases, 75 µg of RNA-protein complexes were made from equimolar mixtures of unlabelled and isotope-labelled RNA and irradiated four times at 800 mJ/cm². Each irradiation step was separated by 1 min for sample cooling. After irradiation, samples were precipitated with 3 volumes of ethanol at -20°C and 1/10 volumes 3 M sodium acetate (pH 5.2), left at -20°C for at least 2 h, and centrifuged at 4°C for 30 min at 13,000 g. Resulting pellets were washed by brief vortexing in 80% ethanol at -20°C, and centrifuged repeatedly. Pellets were air-dried for 10 min, then were resuspended in 50 µL of 50 mM Tris-HCl (pH 7.9) with 4 M urea, and then diluted with 150 µL 50 mM Tris-HCl, pH 7.9. RNases A (Roche Diagnostics, Rotkreuz, CH) and T1 (ThermoScientific, Waltham, MA) were added at 5 μg or 5 U per mg of cross-linked sample, respectively. Samples were digested for 2 h at 52 °C. Samples were cooled on ice, and 2 μl of 1 M MgCl2 was added to each sample, followed by 125 U of benzonase (Sigma Aldrich, St Louis, MO) per mg of cross-linked complex. Samples were further digested for 1 h at 37 °C.
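The nuclease amounts above are stated per mg of cross-linked sample, so they scale linearly with the 75 µg used per digestion. A minimal sketch of that arithmetic follows; the function and dictionary names are illustrative and not part of the original protocol, and the cumulative UV figure is simply the nominal per-step energy multiplied by the number of steps.

```python
# Illustrative arithmetic only; names below are hypothetical helpers.
SAMPLE_UG = 75.0  # mass of RNA-protein complex per digestion (from the text)

# Dosing rates per mg of cross-linked sample, as stated in the protocol.
DOSE_PER_MG = {
    "RNase A (ug)": 5.0,
    "RNase T1 (U)": 5.0,
    "benzonase (U)": 125.0,
}


def enzyme_amounts(sample_ug, dose_per_mg):
    """Scale each per-mg dosing rate to the actual sample mass."""
    sample_mg = sample_ug / 1000.0
    return {name: rate * sample_mg for name, rate in dose_per_mg.items()}


def cumulative_uv_dose(per_step_mj_cm2=800.0, steps=4):
    """Nominal cumulative energy over all irradiation steps (mJ/cm^2)."""
    return per_step_mj_cm2 * steps


amounts = enzyme_amounts(SAMPLE_UG, DOSE_PER_MG)
for name, value in amounts.items():
    print(f"{name}: {value:.3f}")
print(f"Cumulative UV dose: {cumulative_uv_dose():.0f} mJ/cm^2")
```

For a 75 µg sample this gives 0.375 µg RNase A, 0.375 U RNase T1 and 9.375 U benzonase, with a nominal cumulative dose of 3200 mJ/cm².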
Trypsin was added in a 24:1 protein:enzyme ratio (w/w) to all samples, and the samples were incubated overnight at 37°C on a shaking incubator. Samples were heated to 70°C for 10 min to deactivate trypsin, purified by solid-phase extraction (SPE, Waters SepPak 50 mg tC18 cartridges, Milford, MA), and evaporated to dryness in a vacuum centrifuge. RNA-protein crosslinks were enriched by titanium dioxide affinity chromatography, as described previously 1 .
Briefly, dried samples were resuspended in 100 μl of 50% acetonitrile, 0.1% trifluoroacetic acid, 10 mM lactic acid (loading buffer). The samples were incubated for 30 min on a shaking incubator at >10,000 rpm with 5 mg of preequilibrated TiO2 beads (10 μm Titansphere PhosTiO, GL Sciences, Tokyo, JPN), and the beads settled by centrifugation. The supernatant was removed and replaced with 100 μl fresh loading buffer, and the sample incubated for a further 15 min. Centrifugation was repeated, the supernatant removed, and 100 μl 50% acetonitrile, 0.1% trifluoroacetic acid (washing buffer) was added, followed by 15 min incubation, centrifugation for 1 min at 10,000 g, and removal of the supernatant. Peptide-RNA adducts were eluted from the beads with 50 μl 50 mM ammonium phosphate, pH 10.5, and incubated for 15 min on a shaking incubator at >1000 rpm. Beads were settled by centrifugation as in previous steps, and the supernatant was carefully collected. The elution step was repeated a second time, and the eluate was stored on ice. The eluate was immediately acidified to pH 2-3 with TFA, and purified with solid phase extraction using self-packed Stage tips. Briefly, two layers of C18-filter (Empore, 3M, ThermoFisher scientific, Waltham, MA) in a 200 μl tip were washed with 100% acetonitrile, 80% acetonitrile with 0.1% formic acid, then twice with 5% acetonitrile with 0.1% formic acid. The sample was applied, and the tips then (v/v/v); B = water:acetonitrile:formic acid, 2:98:0.15 (v/v/v)) over 60 min with a flow rate of 300 nl/min.
The Orbitrap Elite was operated in data-dependent acquisition mode. The Orbitrap analyzer was used for acquisition of precursor ion spectra with a resolution of 120,000. Precursor ions were fragmented with collision-induced dissociation (CID, normalised collision energy = 35%), with dynamic exclusion enabled for 30 s. Fragment ions were detected at "Normal" resolution in the ion trap.
ThermoFisher RAW data files produced by the mass spectrometer were converted to centroided mzXML files with msconvert.exe (ProteoWizard version 3.0.9393) and searched against a FASTA database containing the FOXRRM protein sequence using xQuest. 2 All amino acids were specified as possible modification sites, and all possible adducts of 1-4 nucleotides in length, based on the sequence UGCAUGU or the subsequent mutated sequences,
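The search space of nucleotide adducts mentioned above can be illustrated with a short enumeration. This is a sketch under a stated assumption, namely that "possible adducts" means the distinct contiguous sub-sequences of 1-4 nucleotides of the RNA; it is not the xQuest configuration itself, and the function name is hypothetical.

```python
def enumerate_adducts(sequence, min_len=1, max_len=4):
    """Return distinct contiguous sub-sequences of the RNA, shortest first.

    Assumption: an adduct corresponds to a contiguous stretch of the RNA
    left attached to the peptide after nuclease digestion.
    """
    adducts = []
    for length in range(min_len, max_len + 1):
        for start in range(len(sequence) - length + 1):
            fragment = sequence[start:start + length]
            if fragment not in adducts:  # keep first occurrence only
                adducts.append(fragment)
    return adducts


adducts = enumerate_adducts("UGCAUGU")
print(len(adducts), adducts)
```

For UGCAUGU this yields 18 distinct candidates (4 mono-, 5 di-, 5 tri- and 4 tetra-nucleotides); a mutated sequence would be passed in the same way.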

Unbiased analysis of three large cross-linking datasets
For the analysis of the cross-linking data from Kramer et al. 4 , the human and yeast datasets were manually filtered for non-sulphur-containing cross-linking sites (Supporting Data 4). Next, cross-linked amino acids that are aromatic (F, W, Y, H) or within -/+3 amino acids of an aromatic amino acid were selected. For these, the Protein Data Bank (PDB) was searched for high-resolution structures using the UniProt number, and the assigned cross-linked amino acid or its neighbouring aromatic amino acid was manually evaluated for possible participation in a π-stacking interaction. Where no structure was available, closely related protein structures were also considered. Protein hits with no known structure (Date: January 1, 2022) or hits where the cross-linked amino acid is not in contact with the RNA were designated as "unknown".
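The selection criterion used here (a cross-linked residue that is itself aromatic, or lies within -/+3 positions of an aromatic residue) can be expressed as a simple window test. The sketch below is illustrative only: the peptide sequence is a made-up toy example, the function names are hypothetical, and the subsequent structural evaluation of π-stacking was manual and is not reproduced.

```python
AROMATIC = set("FWYH")  # aromatic residues as defined in the text


def near_aromatic(sequence, position, window=3):
    """True if the residue at 1-based `position` is aromatic or has an
    aromatic residue within `window` positions on either side."""
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise ValueError(f"position {position} outside sequence")
    lo = max(0, idx - window)
    hi = min(len(sequence), idx + window + 1)
    return any(residue in AROMATIC for residue in sequence[lo:hi])


def select_sites(sequence, positions, window=3):
    """Keep only the cross-linked positions that pass the window test."""
    return [p for p in positions if near_aromatic(sequence, p, window)]


# Toy sequence (hypothetical, not from any dataset): F at position 6.
toy = "GSAKLFGRTEEDAAAK"
selected = select_sites(toy, [6, 9, 10, 14])
print(selected)
```

In this toy case positions 6 (the phenylalanine itself) and 9 (three residues away) are retained, whereas positions 10 and 14 fall outside the window.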
For the analysis of the cross-linking data from Bae et al. 5 , the data were filtered for hits with a total peptide-spectrum match (PSM) count >3. Next, cross-linked amino acids that are aromatic (F, W, Y, H) or within -/+3 amino acids of an aromatic amino acid were manually evaluated for π-stacking interaction using the crystal structure  6 .
Surface (heavy atoms of RRM) and stick (heavy atoms of the RNA) represent the lowest energy structure (structure visualized with PyMOL (PyMOL Molecular Graphics System, Version 2.5 Schrödinger, LLC)). b) SDS-PAGE showing that G2/G6 mono-mutants undergo cross-linking with FOXRRM with increasing irradiation. The cross-linking band is indicated by XL (repeated three times). c) CLIR-MS plots of uniformly labelled G2/G6 mutants cross-linked to FOXRRM, in order to identify sites of cross-linking; the xQuest software was used to search for cross-linked mono-, di-, tri- and tetra-nucleotide adducts, which are colour coded. Mutation of G2 or G6 to A2 or A6 greatly attenuates cross-linking of the mutated nucleotides to the clusters of amino acids around positions 126 and 160, respectively. Inserted IC50 values are taken from Auweter et al. 6 and show that mutation of G2 and G6 attenuates binding to FOXRRM (the mutated nucleotide is labelled in red).
CLIR-MS analysis of singly-labelled FOXRBE mutants to FOXRRM using alkaline hydrolysis work-up. Plots show number of RNA adducts at each amino acid position of FOXRRM. The xQuest software was used to search for cross-linked mono-, di-, tri- and tetra-nucleotide adducts that include a labelled nucleotide, which are colour coded (*N indicates a 13C-labelled nucleotide and the mutated nucleotide is labelled in red).
Figure 8. CLIR-MS of PTBP1 in complex with IRES RNA of EMCV and overlay of NMR solution structure of PTBP1 in complex with CUCUCU. a) CLIR-MS analysis of PTBP1 in complex with the internal ribosomal entry site (IRES) of encephalomyocarditis virus (EMCV) analysed by Dorn et al. 1 . Samples were digested using RNases (RNase A, T1 and benzonase). Plots show the number of RNA adducts at each amino acid position. The different cross-linked mono-, di-, tri- and tetra-nucleotide adducts that are present in the IRES of EMCV are colour coded. The main cross-linked amino acids are annotated. b) Enlarged view of selected cross-linking clusters of PTBP1 in complex with IRES RNA of EMCV (same data as Fig. S8a, analysed by Dorn et al. 1 ) with annotated cross-linking adducts correlated with NMR solution structure of PTBP1 in complex with RNA binding motif CUCUCU (PDB ID: 2AD9, 2ADB, 2ADC) 9 . (Due to the more complex structure of the IRES EMCV RNA, the cross-linking adducts UU, GU and AU are detected, which are not present in the consensus RNA binding motif CUCUCU); red-highlighted sub-sequences likely result from hydrolysis of cytidine stacked to H457. Structures are visualized with PyMOL (PyMOL Molecular Graphics System, Version 2.5 Schrödinger, LLC). The π-stack of Y127 and uridine (i) has an intervening π-stacked arginine.