An updated map of Trypanosoma cruzi histone post-translational modifications

In humans and other eukaryotes, histone post-translational modifications (hPTMs) play an essential role in the epigenetic control of gene expression. In trypanosomatid parasites, by contrast, gene regulation occurs mainly at the post-transcriptional level. However, our group has recently shown that hPTMs are abundant and varied in Trypanosoma cruzi, the etiological agent of Chagas Disease, suggesting possible conserved epigenetic functions. Here, we applied an optimized mass spectrometry-based proteomic workflow to provide a high-confidence, comprehensive map of hPTMs distributed across all canonical, variant and linker histones of T. cruzi. Our work expands the number of known T. cruzi hPTMs by almost 2-fold, representing the largest dataset of hPTMs available for any trypanosomatid to date, and can be used as a basis for functional studies on the dynamic regulation of chromatin by epigenetic mechanisms and for the selection of candidates for the development of epigenetic drugs against trypanosomatids.

Measurement(s): histone_modification
Technology Type(s): mass spectrometry • nanoflow liquid chromatography-tandem mass spectrometry • Data-Dependent Acquisition
Sample Characteristic - Organism: Trypanosoma cruzi
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.13491165

(2021) 8:93 | https://doi.org/10.1038/s41597-021-00818-w | www.nature.com/scientificdata

[...] of targets for epigenetic drugs. Nevertheless, due to the high degree of similarity of its orthologous genes, it is very likely that epigenetic components are also essential for the growth and survival of this parasite and can represent important targets for the development of new therapies for Chagas Disease.
In this context, our group has been working on large-scale proteomic analysis to provide a global view of the PTM landscape for each of the T. cruzi histones, aiming to fill this information gap and to pave the way for functional epigenetic studies on trypanosomatids. Here, we applied optimized sample preparation, two parallel mass spectrometry-based proteomic approaches (GeLC-MS/MS and LC-MS/MS) with complementary sensitive/high-resolution fragmentation techniques (CID/HCD) and de novo assisted database search (Fig. 1) to deeply profile the PTMs of T. cruzi canonical, variant and linker histones. This increased the number of hPTM sites described for this parasite to 189 and the number of hPTM marks to 353 (Fig. 2), and supports the hypothesis that chromatin is dynamically regulated by hPTMs in trypanosomatids. A summary of the global numbers of T. cruzi hPTMs described to date is available in Table 1 and detailed in Fig. 3. Our updated T. cruzi hPTM dataset is the most comprehensive available for any trypanosomatid to date, and can be used as a basis for future functional studies and for the selection of targets for the development of anti-parasitic epigenetic drugs.

Methods
Cell culture and histone enrichment. T. cruzi Dm28c epimastigotes were cultured to log phase in liver infusion tryptose (LIT) medium [27] supplemented with 10% fetal bovine serum, without agitation, at 28 °C. Histone extraction and enrichment were performed as previously described [20], with some modifications. Briefly, 1 × 10^9 epimastigote cells were collected by centrifugation (10 minutes, 5000 g at 4 °C). Cells were lysed by resuspending the pellet in 1 ml of extraction buffer A (250 mM sucrose; 1 mM EDTA; 3 mM CaCl2; 10 mM Tris-HCl pH 7.4; 0.5% (v/v) saponin; 10 mM sodium butyrate; 1x protease inhibitor cocktail (Complete Mini EDTA free, Roche) and 1x phosphatase inhibitor cocktail (Roche)) and centrifuged for 10 minutes at 6000 g, 4 °C. The cell pellet was washed in 1 ml of extraction buffer B (extraction buffer A without saponin) and centrifuged for 10 minutes at 6000 g, 4 °C. The pellet, containing the cell nuclei, was resuspended in 1 ml of buffer C (1% (v/v) Triton X-100; 150 mM NaCl; 25 mM EDTA; 10 mM Tris-HCl pH 8; 10 mM sodium butyrate; 1x protease inhibitor cocktail (Complete Mini EDTA free, Roche) and 1x phosphatase inhibitor cocktail (Roche)) and then centrifuged for 20 minutes at 12000 g, 4 °C. The pellet was washed 3 times in 100 mM Tris-HCl pH 8, resuspended in 1 ml of 0.4 N HCl and incubated on a rotator overnight at 4 °C. Acid-soluble proteins were recovered in the supernatant after centrifugation for 15 minutes at 10000 g, 4 °C. The supernatant was transferred to a clean tube, acetone (8x the initial volume) was added, and the mixture was incubated overnight at −20 °C. The sample was centrifuged for 15 minutes at 3100 g, 4 °C. The acetone was carefully removed and the pellet was washed 3 times with 1 ml of acetone. The protein pellet was carefully dried at 37 °C and then resuspended in 50 µl of water.
[...], with an EASY nLC 1000 (Thermo Fisher Scientific) system connected to an LTQ Orbitrap XL (Thermo Fisher Scientific) mass spectrometer equipped with a nanoelectrospray ion source (Phoenix S&T). Chromatographic separation of the peptides took place in a one-column set-up, with a 30-cm analytical column (75 μm inner diameter, 350 μm outer diameter) packed in-house with reversed-phase C18 resin (ReproSil-Pur C18-AQ 1.9 µm, Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) and kept at a constant temperature of 60 °C. Solvent A was 0.1% formic acid, 5% DMSO in water, and solvent B was 0.1% formic acid, 5% DMSO in acetonitrile. Samples were injected onto the column and eluted at a flow rate of 250 nL/min, and peptide mixtures were separated with a linear gradient from 5% to 40% acetonitrile in 128 min. The mass spectrometer operated in Data-Dependent Acquisition (DDA) mode to automatically switch between MS and MS/MS (MS2) acquisition, applying both Collision-Induced Dissociation (CID) and Higher Energy Collisional Dissociation (HCD) to the 5 most intense peptides detected in each MS spectrum. For all samples, duplicate or triplicate LC-MS/MS runs were performed. Survey full scan MS spectra (300-1600 m/z range) were acquired in the Orbitrap analyzer with resolution R = 60,000 at m/z 400 (after accumulation to a target value of 1,000,000 in the linear ion trap), with preview scan enabled. Singly-charged precursor ions were not selected for fragmentation. Former target ions selected for MS/MS were dynamically excluded for 30 seconds. Total cycle time was approximately three seconds. Other mass spectrometric conditions were: spray voltage, 2.4 kV; no sheath and auxiliary gas flow; ion transfer tube temperature, 100 °C; collision gas pressure, 1.3 mTorr; normalized collision energy, 35% for MS2, using wide-band activation mode. The ion selection threshold was 250 counts for MS2.
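The precursor selection rules stated above (top 5 by intensity, no singly-charged ions, a 250-count threshold, 30-second dynamic exclusion) can be illustrated with a minimal Python sketch. This is not the instrument's firmware logic: the `Peak` record, the `select_precursors` function and the `mz_tol` exclusion window width are assumptions introduced only for illustration, since the text does not state how wide the exclusion window was.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float
    charge: int


def select_precursors(peaks, now_s, excluded, top_n=5, min_intensity=250.0,
                      exclusion_s=30.0, mz_tol=0.01):
    """Pick up to top_n precursors from one survey scan.

    excluded maps a previously selected m/z to the time (s) at which its
    exclusion expires; it is updated in place. mz_tol is a hypothetical
    window width, not a value taken from the text.
    """
    # Forget exclusions that have expired.
    for mz in [m for m, expiry in excluded.items() if expiry <= now_s]:
        del excluded[mz]

    def is_excluded(mz):
        return any(abs(mz - e) <= mz_tol for e in excluded)

    # Skip singly-charged ions, weak signals and recently fragmented targets.
    candidates = [p for p in peaks
                  if p.charge != 1
                  and p.intensity >= min_intensity
                  and not is_excluded(p.mz)]
    candidates.sort(key=lambda p: p.intensity, reverse=True)

    chosen = candidates[:top_n]
    for p in chosen:
        excluded[p.mz] = now_s + exclusion_s
    return chosen
```

Calling the function on successive survey scans with the same `excluded` dictionary shows the effect of dynamic exclusion: an ion selected at time 0 is skipped in a scan at 3 s and becomes eligible again once 30 s have elapsed.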
An activation q = 0.25 and an activation time of 30 ms were applied in MS2 acquisitions. The lock mass option [32], using DMSO peaks [33], was enabled in all full scans to improve the mass accuracy of precursor ions.
Data analysis. Peptides and hPTM sites were identified with the software Peaks Studio (version 10, Bioinformatics Solutions Inc) [34][35][36]. The sequential analysis by Peaks Studio started with de novo sequencing of fragment spectra (Peptide De Novo), followed by peptide sequence matching of the high-quality de novo tags against the database (Peaks DB) [36], considering the most frequent modifications, and then by peptide sequence matching of the remaining high-quality de novo only peptide tags (Peaks PTM) [35]. Proteins were searched against a database containing 20257 sequences of the T. cruzi Dm28c strain (downloaded on Aug 15, 2018 from TriTrypDB, http://www.tritrypdb.org). In all Peaks searches (Peptide De Novo, Peaks DB and Peaks PTM), the precursor mass tolerance was set to 10 ppm and the fragment ion mass tolerance was set to 0.5 Da (ion trap spectra) or 20 ppm (Orbitrap spectra). Minimum peptide size was set to five amino acids, allowing for two missed cleavages. The enzyme for theoretical digestion was Arg-C with specific digestion mode. For Peaks DB, monomethylation (K/R), dimethylation
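The mass tolerances in the search settings mix relative (ppm) and absolute (Da) units. The following Python sketch shows how such tolerances are commonly evaluated; it is not the Peaks Studio implementation, and the function names and the `analyzer` labels are assumptions made for illustration.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_matches(observed_mz, theoretical_mz, tol_ppm=10.0):
    """Precursor tolerance of 10 ppm, as stated in the search settings."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def fragment_matches(observed_mz, theoretical_mz, analyzer):
    """Fragment tolerance: 0.5 Da for ion trap spectra, 20 ppm for Orbitrap.

    The analyzer labels "ion_trap" and "orbitrap" are hypothetical names.
    """
    if analyzer == "ion_trap":
        return abs(observed_mz - theoretical_mz) <= 0.5
    if analyzer == "orbitrap":
        return abs(ppm_error(observed_mz, theoretical_mz)) <= 20.0
    raise ValueError(f"unknown analyzer: {analyzer}")
```

For a fragment with a theoretical m/z of 500, a 0.3 Da deviation is accepted under the ion trap tolerance but corresponds to 600 ppm, far outside the 20 ppm allowed for Orbitrap spectra.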

Data records
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [37] partner repository with the dataset identifier https://identifiers.org/pride.project:PXD019104 [38]. Representative spectra for all novel hPTM marks and additional tables supporting our dataset have been uploaded to figshare [39].

Technical Validation
In the present work, two parallel proteomic approaches (LC-MS/MS and GeLC-MS/MS) were used to deeply profile the hPTMs of canonical, variant and linker histones of T. cruzi epimastigotes (Fig. 1). Each proteomic approach was applied to two biological replicates, each of them divided into two or three technical replicates during sample preparation, and multiple LC-MS/MS runs were performed for each sample, totaling 27 .raw files. The experimental design adopted in this study allowed us to substantially expand the repertoire of hPTMs and led to very reliable and complementary data. One of the reasons for the improved identification of low-abundance peptides and of more PTM sites was the use of different approaches that not only decreased the complexity of the sample (histone enrichment by acid extraction and further separation by SDS-PAGE), but also explored different biochemical characteristics of histones. The derivatization of proteins before trypsin cleavage prevented overcutting and reduced the charge of the lysine-rich histone regions, especially in the N-terminal tails, producing peptides with suitable size and charge (doubly and triply charged in electrospray ionization MS [40]) for optimal high-energy-based peptide identification [41]. After protein propionylation, the samples from both proteomic strategies were directly subjected to digestion. Thus, this simplified methodology was efficient in the identification of T. cruzi hPTMs.
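The effect of blocking lysines before trypsin digestion can be illustrated with a simple in silico digestion. The Python sketch below is a simplification under stated assumptions: the `digest` helper is hypothetical, the toy sequence is invented and is not a T. cruzi histone, and cleavage is modeled as occurring after every listed residue with no further rules. The minimum length of five residues follows the search settings reported in the Methods.

```python
def digest(sequence, cleave_after, missed_cleavages=2, min_length=5):
    """Cleave C-terminal to every residue in cleave_after.

    Returns the sorted, non-redundant peptides of at least min_length
    residues, allowing up to missed_cleavages skipped sites.
    """
    # Split the sequence into fully cleaved pieces.
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])

    # Join neighbouring pieces to model missed cleavages.
    peptides = set()
    for i in range(len(pieces)):
        last = min(i + missed_cleavages + 1, len(pieces))
        for j in range(i, last):
            peptide = "".join(pieces[i:j + 1])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return sorted(peptides)


# Invented lysine-rich toy sequence, for illustration only.
toy_tail = "AKGKKSGEAKGSQKRQAKTLVRA"

# Unmodified protein: trypsin cleaves after both K and R.
tryptic = digest(toy_tail, "KR", missed_cleavages=0)

# Derivatized lysines: cleavage only after R (Arg-C-like specificity).
argc_like = digest(toy_tail, "R", missed_cleavages=0)
```

With complete cleavage, the unmodified toy sequence yields a single peptide of usable length, whereas the Arg-C-like digestion yields two longer peptides that together cover almost the whole sequence, which is the overcutting problem the derivatization step avoids.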
A combined list of all histone supporting peptides identified in the present work is available in figshare File 1 [39]. Each non-redundant peptide sequence was unambiguously identified by multiple features. The identified peptides matched several distinct gene products that represent each histone in the genome of T. cruzi, some of them demonstrating the expression of sequence-divergent histone isoforms (Fig. S1a). Among the multiple isoforms detected for each given histone, the one with the highest score and number of PTMs (e.g. H2A, BCY84_17381; H2B, BCY84_06298; H3, BCY84_02638; H4, BCY84_15632; H2A.Z, BCY84_22061; H2B.V, BCY84_04421; H3.V, BCY84_18558 and H1, BCY84_14748) was chosen as the model sequence to be used throughout the article. All histones were identified by multiple MS/MS spectra in both biological replicates of the two proteomic strategies (Fig. S1b). The quality of hPTM peptide identification can be verified through their mass accuracy and score distribution (Fig. S2). Also, the majority of hPTMs were detected across multiple samples and experiments, strengthening the reliability of our data (Fig. 4 and figshare File 2 [39]).
An aspect explored in our data is the relative abundance of hPTMs. In general, the hPTM marks identified displayed low abundance/occupancy. However, some sites of high abundance were also found, mainly for acetylation, methylation and a few phosphorylation sites (Fig. 5). These results seem to be in agreement with previous studies that show the low abundance of histone modifications in eukaryotes [42] and that methylation and acetylation are the most abundant hPTMs in T. cruzi [19].
Among the hPTM marks with high abundance in our dataset that were also detected in the compared studies are H3K76me/me2/me3, H4K10ac and H4K14ac, important marks involved in cell cycle regulation in trypanosomes [12,14,21]. Other marks in this group are present in variant histones (H3.VK94me2/me3, H2A.ZK54ac and H2A.ZK58ac) and seem to be trypanosome-specific [14]. The relative abundances of the modified peptides and individual modification sites identified in our dataset with a relative abundance equal to or higher than 10% are shown in Fig. S3a, b, respectively.
A total of 201 hPTM marks identified in the present work passed the site localization score criterion (Ascore >20). Among them, 94 confirmed the literature and 107 were newly described in the repertoire of T. cruzi hPTMs (representative spectra available in figshare File 5 [39]). Some previously described hPTMs were not detected here (n = 126), which is probably due to technical and biological reasons (e.g. different proteomic strategies, different strains of parasites, regions of histones not covered in a given study, etc.). Also, the chance of identifying hPTM marks in multiple experiments and studies seems to correlate with their intensity and relative abundance (Fig. 6 and figshare File 3 [39]). In addition to the validated hPTM marks described here, we identified another 111 with Ascore <20, which were not added to our final map. However, several of them were close to the threshold score and/or identified by multiple features/experiments. Therefore, to allow the reader to fully explore the data, a compiled list of all the hPTM marks identified in the present work (Ascore >20/<20 and literature) is available in figshare File 4 [39].
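The classification described above (marks passing the Ascore criterion, split into literature-confirmed and newly described, versus marks below the threshold) can be sketched in Python as follows. The record layout, the function name and the example records are invented placeholders and do not come from the dataset; treating a score of exactly 20 as not passing is also an assumption, since the text only specifies >20 and <20.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    histone: str
    site: str
    modification: str
    ascore: float


def classify_marks(marks, literature, threshold=20.0):
    """Split marks into confirmed, novel and below-threshold groups.

    literature is a set of (histone, site, modification) tuples that
    were described previously.
    """
    groups = {"confirmed": [], "novel": [], "below_threshold": []}
    for mark in marks:
        key = (mark.histone, mark.site, mark.modification)
        if mark.ascore <= threshold:
            # Assumption: a score of exactly 20 does not pass.
            groups["below_threshold"].append(mark)
        elif key in literature:
            groups["confirmed"].append(mark)
        else:
            groups["novel"].append(mark)
    return groups


# Invented placeholder records, not values from the dataset.
marks = [
    Mark("HistA", "K1", "ac", 45.0),
    Mark("HistA", "K2", "me1", 18.5),
    Mark("HistB", "S3", "ph", 27.3),
]
literature = {("HistA", "K1", "ac")}
groups = classify_marks(marks, literature)
```

Applied to the full list in figshare File 4, the same partition would be expected to reproduce the reported counts of 94 confirmed, 107 newly described and 111 below-threshold marks, provided the file encodes the literature status of each mark.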
Our updated and comprehensive profile of T. cruzi hPTMs, now available for further studies, reinforces that several residues are targets of multiple modifications, that some modification types are more abundant than others, and that hPTMs are widely distributed and diverse in both the tails and the globular domains of all histones, regions that each have distinct and important roles in the epigenetic regulation of higher eukaryotes [3,43].