Background & Summary

Trypanosoma cruzi, the causative agent of Chagas Disease, is a flagellated protozoan belonging to the order Kinetoplastida, family Trypanosomatidae1. About 6 to 7 million people are estimated to be infected with T. cruzi worldwide2, making it a serious public health problem. In its life cycle, T. cruzi passes through different hosts, including mammals and insects and, to adapt to these different environments, undergoes dramatic changes in its phenotype, which require a fine regulation of gene expression.

The regulation of gene expression in an organism can occur at different levels. In humans and most other eukaryotes, one of the key points of gene expression control is the regulation of transcription initiation by epigenetic mechanisms, such as histone post-translational modifications (hPTMs)3, and “epigenetic drugs” are already in use for the treatment of different diseases, such as cancer and neurological disorders4,5,6. In T. cruzi and other trypanosomatids, on the other hand, the control of gene expression occurs mainly post-transcriptionally, at the level of RNA and protein7. Despite that fact, the chromatin of T. cruzi is similar to that of other eukaryotes. It is organized into chromosomes containing canonical (H2A, H2B, H3 and H4), variant (H2A.Z, H2B.V and H3.V) and linker (H1) histones8, and common hPTMs (acetylation, methylation and phosphorylation) have been detected in both its replicative and non-replicative forms9,10,11,12,13,14,15,16,17,18,19. More recently, our group identified 13 different hPTM types in T. cruzi epimastigotes, starting to unravel a histone code that potentially supports the existence of chromatin regulation via post-translational modification in this parasite20.

The importance of a few modification types and specific hPTM marks for trypanosomatids has been demonstrated. In T. brucei, di- and trimethylation of H3K76 regulate the cell cycle21; methylation and acetylation act as signals for histone writers and erasers in the regulation of variant surface glycoproteins (VSGs)22,23; and variant histones and H3 trimethylation are enriched in probable transcription initiation sites24,25. Similarly, the origins of polycistronic transcription seem to be regulated by histone acetylation in Leishmania major26 and, in T. cruzi, acetylation and methylation mark regions of transcriptional initiation of divergent polycistronic transcription units, which contain evolutionarily conserved bidirectional promoters11. However, the global impact of hPTMs, the mechanisms underlying their function in trypanosomatids and the language of crosstalk regulation between different PTMs are still poorly understood.

For trypanosomatids of medical interest, particularly T. cruzi, the lack of basic information about epigenetic molecular players, such as the thorough identification and site-specific localization of histone PTM marks, has hindered the functional research needed to understand epigenetic control and to identify targets for epigenetic drugs. Nevertheless, due to the high degree of similarity of its orthologous genes, it is very likely that epigenetic components are also essential for the growth and survival of this parasite and can represent important targets for the development of new therapies for Chagas Disease.

In this context, our group has been working on large-scale proteomic analysis to provide a global view of the PTM landscape for each of the T. cruzi histones, aiming to fill this information gap and to pave the way for functional epigenetic studies on trypanosomatids. Here, we applied optimized sample preparation, two parallel mass spectrometry-based proteomic approaches (GeLC-MS/MS and LC-MS/MS) with complementary sensitive/high-resolution fragmentation techniques (CID/HCD) and de novo assisted database search (Fig. 1) to deeply profile the PTMs of T. cruzi canonical, variant and linker histones. This increased the number of hPTM sites described for this parasite to 189 and the number of hPTM marks to 353 (Fig. 2), and supports the hypothesis that chromatin is dynamically regulated by hPTMs in trypanosomatids. A summary of the global numbers of T. cruzi hPTMs described to date is available in Table 1 and detailed in Fig. 3. Our updated T. cruzi hPTM dataset represents the most comprehensive available for any trypanosomatid to date, and can be used as a basis for future functional studies and selection of targets for the development of anti-parasitic epigenetic drugs.

Fig. 1

Experimental workflow for the comprehensive profiling of T. cruzi hPTMs.

Fig. 2

Global map of hPTMs currently described for T. cruzi. Modification sites are marked in bold letters and the numbers below the sequence represent the position of the amino acid in the sequence (after removal of the initial methionine). Rectangles indicate the histone-fold within the globular domain. The line below the sequence represents the region covered in our experiments.

Table 1 Global numbers for hPTM identification of T. cruzi histones.
Fig. 3

Summary of the T. cruzi hPTMs described in the present work and the literature. Colors indicate which hPTM sites and marks were confirmed and which ones are novel. For the confirmed ones, the reference numbers indicate the respective work(s) where each modification site or PTM mark was previously described. Ac (acetylation), Ci (citrullination), Cr (crotonylation), Glu (glutarylation), Hib (hydroxyisobutyrylation), Ma (malonylation), Me (monomethylation), Me2 (dimethylation), Me3 (trimethylation), OH (hydroxylation), Ph (phosphorylation), (K), Su (succinylation), Ub (ubiquitination).

Methods

Cell culture and histone enrichment

T. cruzi Dm28c epimastigotes were cultured to log phase in liver infusion tryptose (LIT) medium27 supplemented with 10% fetal bovine serum, without agitation, at 28 °C. Histone extraction and enrichment were performed as previously described20, with some modifications. Briefly, 1 × 10⁹ epimastigote cells were collected by centrifugation (10 minutes, 5000 g at 4 °C). Cells were lysed by resuspending the pellet in 1 ml of extraction buffer A (250 mM sucrose; 1 mM EDTA; 3 mM CaCl2; 10 mM Tris-HCl pH 7.4; 0.5% (v/v) saponin; 10 mM sodium butyrate; 1x protease inhibitor cocktail (Complete Mini EDTA free, Roche) and 1x phosphatase inhibitor cocktail (Roche)) and centrifuged for 10 minutes at 6000 g, 4 °C. The pellet was washed in 1 ml of extraction buffer B (extraction buffer A without saponin) and centrifuged for 10 minutes at 6000 g, 4 °C. The pellet, containing the cell nuclei, was resuspended in 1 ml of buffer C (1% (v/v) Triton X-100; 150 mM NaCl; 25 mM EDTA; 10 mM Tris-HCl pH 8; 10 mM sodium butyrate; 1x protease inhibitor cocktail (Complete Mini EDTA free, Roche) and 1x phosphatase inhibitor cocktail (Roche)) and then centrifuged for 20 minutes at 12000 g, 4 °C. The pellet was washed 3 times in 100 mM Tris-HCl pH 8, resuspended in 1 ml of 0.4 N HCl and incubated on a rotator overnight at 4 °C. Acid-soluble proteins were recovered in the supernatant after centrifugation for 15 minutes at 10000 g, 4 °C. The supernatant was transferred to a clean tube, acetone (8x the initial volume) was added and the mixture was incubated overnight at −20 °C. The sample was centrifuged for 15 minutes at 3100 g, 4 °C. The acetone was carefully removed and the pellet was washed 3 times with 1 ml of acetone. The protein pellet was carefully dried at 37 °C and then resuspended in 50 µl of water.

Derivatization and digestion of histones

Samples were processed by two different proteomic strategies (GeLC-MS/MS and LC-MS/MS) based on the protocol described in our previous study20, with some modifications. For GeLC-MS/MS, histone-enriched extracts were resolved on 15% SDS-PAGE gels and stained with Coomassie blue. Histone bands were excised, destained (25 mM NH4HCO3 in 50% ethanol, shaking in a thermomixer at 800 rpm, 25 °C), derivatized with propionylation solution (propionic anhydride in 100 mM NH4HCO3, 1:10)28 and digested in gel29 with trypsin (sequencing grade modified, Promega) at a concentration of 12.5 ng/µL for 16 h at 37 °C. For LC-MS/MS, histone-enriched extracts were directly derivatized with propionylation solution (propionic anhydride in 2-propanol, 1:3)30 and digested in solution30 with trypsin (sequencing grade modified, Promega) at a protease/protein ratio of 1/20 for 16 h at 37 °C. For both proteomic strategies, reduction and alkylation were not performed, and peptide digests were desalted using C18 StageTips31 prior to nanoLC-ESI-MS/MS.

NanoLC-ESI-MS/MS analysis

Peptide mixtures were separated by online reversed-phase (RP) nanoscale capillary liquid chromatography (nanoLC) and analyzed by electrospray tandem mass spectrometry (ESI-MS/MS). The experiments were performed at the mass spectrometry facility P02-004 (Carlos Chagas Institute - Fiocruz Parana), with an EASY nLC 1000 (Thermo Fisher Scientific) system connected to an LTQ Orbitrap XL (Thermo Fisher Scientific) mass spectrometer equipped with a nanoelectrospray ion source (Phoenix S&T). Chromatographic separation of the peptides took place in a one-column set-up, with a 30-cm analytical column (75 μm inner diameter, 350 μm outer diameter) packed in-house with reversed-phase C18 resin (ReproSil-Pur C18-AQ 1.9 µm, Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) and kept at a constant temperature of 60 °C. Solvent A was 0.1% formic acid, 5% DMSO in water, and solvent B was 0.1% formic acid, 5% DMSO in acetonitrile. Samples were injected onto the column and eluted at a flow rate of 250 nL/min, and peptide mixtures were separated with a linear gradient from 5% to 40% acetonitrile in 128 min. The mass spectrometer was operated in data-dependent acquisition (DDA) mode to automatically switch between MS and MS/MS (MS2) acquisition, applying both collision-induced dissociation (CID) and higher-energy collisional dissociation (HCD) to the 5 most intense peptide ions detected in each MS spectrum. For all samples, duplicate or triplicate LC-MS/MS runs were performed. Survey full-scan MS spectra (300–1600 m/z range) were acquired in the Orbitrap analyzer with resolution R = 60,000 at m/z 400 (after accumulation to a target value of 1,000,000 in the linear ion trap), with preview scan enabled. Singly charged precursor ions were not selected for fragmentation. Former target ions selected for MS/MS were dynamically excluded for 30 seconds. Total cycle time was approximately three seconds.
Other mass spectrometric conditions were: spray voltage, 2.4 kV; no sheath and auxiliary gas flow; ion transfer tube temperature, 100 °C; collision gas pressure, 1.3 mTorr; normalized collision energy, 35% for MS2, using wide-band activation mode. The ion selection threshold was 250 counts for MS2. An activation q of 0.25 and an activation time of 30 ms were applied in MS2 acquisitions. The lock mass32 option, using DMSO peaks33, was enabled in all full scans to improve the mass accuracy of precursor ions.

Data analysis

Peptides and hPTM sites were identified with the software Peaks Studio (version 10, Bioinformatics Solutions Inc)34,35,36. The sequential analysis by Peaks Studio started with de novo sequencing of fragment spectra (Peptide De Novo), followed by peptide sequence matching of the high-quality de novo tags against the database (Peaks DB)36, considering the most frequent modifications, and then by peptide sequence matching of the remaining high-quality de novo only peptide tags (Peaks PTM)35. Searches were performed against a database containing 20257 sequences of the T. cruzi Dm28c strain (downloaded on Aug 15, 2018 from TriTrypDB, http://www.tritrypdb.org). In all Peaks searches (Peptide De Novo, Peaks DB and Peaks PTM) the precursor mass tolerance was set to 10 ppm and the fragment ion mass tolerance was set to 0.5 Da (ion trap spectra) or 20 ppm (Orbitrap spectra). Minimum peptide size was set to five amino acids, allowing for two missed cleavages. The enzyme for theoretical digestion was Arg-C with specific digestion mode. For Peaks DB, monomethylation (K/R), dimethylation (K/R), trimethylation (K), acetylation (K), acetylation (N-term), propionylation (K), propionylation (N-term), methylpropionylation (K), phosphorylation (S/T/Y) and oxidation (M) were set as variable PTMs. For Peaks PTM, in addition to the PTMs searched in Peaks DB, monomethylation (N-term), dimethylation (N-term), glutarylation (K), 2-hydroxyisobutyrylation (K), hydroxylation (Y), malonylation (K), succinylation (K), ubiquitination (K), crotonylation (K), citrullination (R), methylation (Q), phosphorylation (H), butyrylation (K), acetylation (S/T), formylation (K) and deamidation (N/Q) were added as variable PTMs. For identification of both peptides and proteins, the false discovery rate (FDR) was set to 1%. PTM sites with an Ascore ≥20 were automatically validated.
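As a rough illustration of what a 10 ppm precursor tolerance means in practice, the sketch below converts a mass difference to ppm and checks whether two near-isobaric modifications, lysine acetylation and trimethylation, could be confused at that tolerance. The monoisotopic mass shifts are standard values; the function names and the peptide mass of 1500 Da are assumptions chosen for illustration and are not part of the Peaks Studio workflow.

```python
# Monoisotopic mass shifts (Da) for some of the variable PTMs listed above.
PTM_DELTA = {
    "methyl": 14.015650,
    "dimethyl": 28.031300,
    "trimethyl": 42.046950,
    "acetyl": 42.010565,
    "propionyl": 56.026215,
    "phospho": 79.966331,
}
# Methylpropionylation is the sum of one methyl and one propionyl group.
PTM_DELTA["methylpropionyl"] = PTM_DELTA["methyl"] + PTM_DELTA["propionyl"]


def ppm_error(observed, theoretical):
    """Signed mass error of `observed` relative to `theoretical`, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=10.0):
    """True if the absolute mass error does not exceed `tol_ppm`."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


# Hypothetical unmodified peptide mass, used only for illustration.
base_mass = 1500.0
acetylated = base_mass + PTM_DELTA["acetyl"]
trimethylated = base_mass + PTM_DELTA["trimethyl"]

# Error incurred if an acetylated precursor were matched to the
# trimethylated candidate: roughly -23.6 ppm, outside a 10 ppm window.
mismatch_ppm = ppm_error(acetylated, trimethylated)
```

At this peptide mass the two candidates differ by more than the precursor tolerance, which is one reason high-resolution survey scans are useful for histone PTM assignment. The margin shrinks as peptide mass grows, so fragment-level evidence remains important.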

Data Records

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE37 partner repository with the dataset identifier https://identifiers.org/pride.project:PXD01910438. Representative spectra for all novel hPTM marks and additional tables for supporting our dataset have been uploaded to figshare39.

Technical Validation

In the present work, two parallel proteomic approaches (LC-MS/MS and GeLC-MS/MS) were used to deeply profile the hPTMs of canonical, variant and linker histones of T. cruzi epimastigotes (Fig. 1). Each proteomic approach was applied to two biological replicates, each of them divided into two or three technical replicates during sample preparation, and multiple LC-MS/MS runs were performed for each sample, totaling 27 .raw files. The experimental design adopted in this study allowed us to substantially expand the repertoire of hPTMs and yielded reliable and complementary data. One reason for the improved identification of low-abundance peptides and additional PTM sites was the use of different approaches that not only decreased the complexity of the sample (histone enrichment by acid extraction and further separation by SDS-PAGE), but also exploited different biochemical characteristics of histones. The derivatization of proteins before trypsin cleavage prevented overcutting and reduced the charge of the lysine-rich histone regions, especially in the N-terminal tails, producing peptides of suitable size and charge (doubly and triply charged in electrospray ionization MS40) for optimal high energy based peptide identification41. After protein propionylation, the samples from both proteomic strategies were submitted directly to digestion. This simplified methodology was thus efficient in the identification of T. cruzi hPTMs.
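The effect of lysine derivatization on digestion can be sketched with a simple in silico comparison. In the example below, cleavage after both K and R stands in for trypsin on underivatized protein, and cleavage after R only stands in for trypsin on propionylated protein (the Arg-C-like behaviour assumed in the database search). The `digest` function and the toy sequence are hypothetical and written for illustration only; the sequence is not a T. cruzi histone, and the proline rule of trypsin is ignored for simplicity.

```python
def digest(sequence, cleave_after, missed_cleavages=2, min_length=5):
    """Cleave C-terminal to any residue in `cleave_after`.

    Returns peptides with up to `missed_cleavages` missed sites and
    at least `min_length` residues.
    """
    # Split the sequence into fully cleaved fragments.
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])

    # Join neighbouring fragments to model missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Toy lysine-rich sequence (hypothetical, for illustration only).
toy = "SGKGKAGKKTVRASKAGLQFPVGR"

# Without derivatization: cleavage after K and R, no missed cleavages.
tryptic = digest(toy, "KR", missed_cleavages=0)

# With lysines blocked: cleavage after R only, no missed cleavages.
arg_c_like = digest(toy, "R", missed_cleavages=0)
```

In this toy case the lysine-rich stretch is cut into fragments shorter than five residues when K is cleavable, so only one peptide survives the length filter, whereas the Arg-C-like digestion keeps the whole sequence in two peptides of 12 residues each.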

A combined list of all histone-supporting peptides identified in the present work is available in figshare File 139. Each non-redundant peptide sequence was unambiguously identified by multiple features. The identified peptides matched several distinct gene products that represent each histone in the genome of T. cruzi, some of them demonstrating the expression of sequence-divergent histone isoforms (Fig. S1a). Among the multiple isoforms detected for each given histone, the one with the highest score and number of PTMs (e.g. H2A, BCY84_17381; H2B, BCY84_06298; H3, BCY84_02638; H4, BCY84_15632; H2A.Z, BCY84_22061; H2B.V, BCY84_04421; H3.V, BCY84_18558 and H1, BCY84_14748) was chosen as the model sequence to be used throughout the article. All histones were identified by multiple MS/MS spectra in both biological replicates of the two proteomic strategies (Fig. S1b). The quality of hPTM peptide identification can be verified through their mass accuracy and score distribution (Fig. S2). In addition, the majority of hPTMs were detected across multiple samples and experiments, strengthening the reliability of our data (Fig. 4 and figshare File 239).
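The overlap of hPTMs between experiments or approaches can be computed with plain set operations. The following is a minimal sketch: the function name is hypothetical, and although the mark names appear in this work, their assignment to each set is made up for illustration and does not reflect the contents of figshare File 2.

```python
def venn_regions(set_a, set_b):
    """Return the three regions of a two-set Venn diagram."""
    return {
        "only_a": set_a - set_b,
        "shared": set_a & set_b,
        "only_b": set_b - set_a,
    }


# Illustrative (invented) assignments of marks to the two approaches.
lc_ms = {"H3K76me3", "H4K10ac", "H4K14ac"}
gelc_ms = {"H4K10ac", "H4K14ac", "H2A.ZK54ac"}

regions = venn_regions(lc_ms, gelc_ms)
n_shared = len(regions["shared"])
```

The same function can be applied pairwise to replicate-level sets to obtain the counts shown in each part of a Venn diagram.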

Fig. 4

Identification of T. cruzi hPTMs across different experiments. Venn diagrams showing the number of hPTMs identified in (a) each individual experiment, (b) the different proteomic approaches, (c) LC-MS/MS experiments and (d) GeLC-MS/MS experiments. The hPTMs contained in each part of the Venn diagrams are listed in figshare File 239.

One aspect explored in our data is the relative abundance of hPTMs. In general, the hPTM marks identified displayed low abundance/occupancy. However, some sites of high abundance were also found, mainly for acetylation, methylation and a few phosphorylation sites (Fig. 5). These results seem to agree with previous studies showing that many histone modifications occur at low abundance in eukaryotes42 and that methylation and acetylation are the most abundant hPTMs in T. cruzi19. Among the hPTM marks with high abundance in our dataset that were also detected in the compared studies are H3K76me/me2/me3, H4K10ac and H4K14ac, important marks involved in cell cycle regulation in trypanosomes12,14,21. Other marks in this group are present in variant histones (H3.VK94me2/me3, H2A.ZK54ac and H2A.ZK58ac) and seem to be trypanosome-specific14. The relative abundances of the modified peptides and of the individual modification sites identified in our dataset with a relative abundance equal to or higher than 10% are shown in Fig. S3a and b, respectively.

Fig. 5

Relative abundance of hPTM sites. For each hPTM site identified, the relative abundance (occupancy) was calculated by dividing the intensity of the modified form by the sum of the intensities of the modified and unmodified forms of the site. Cases with 100% abundance are due to the lack of a detected intensity for the unmodified form.
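The occupancy calculation described in the caption can be written as a short function. This is a minimal sketch: the function name and the intensity values are hypothetical, and it assumes a single modified form per site.

```python
def relative_abundance(modified_intensity, unmodified_intensity):
    """Occupancy of a site: modified / (modified + unmodified)."""
    total = modified_intensity + unmodified_intensity
    if total == 0:
        raise ValueError("no intensity detected for this site")
    return modified_intensity / total


# Invented intensities, for illustration only.
partial = relative_abundance(2.0e6, 6.0e6)   # 0.25, i.e. 25% occupancy

# When no unmodified form is detected, occupancy is reported as 100%.
full = relative_abundance(5.0e5, 0.0)        # 1.0
```

The second call shows why sites without a detected unmodified counterpart appear with 100% abundance: the value reflects missing intensity data rather than a measured absence of the unmodified form.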

A total of 201 hPTM marks identified in the present work passed the site localization score criterion (Ascore >20). Among them, 94 confirmed marks reported in the literature and 107 were newly described in the repertoire of T. cruzi hPTMs (representative spectra available in figshare File 539). Some previously described hPTMs were not detected here (n = 126), which is probably due to technical and biological reasons (e.g. different proteomic strategies, different strains of parasites, regions of histones not covered in a given study, etc.). In addition, the chance of identifying hPTM marks in multiple experiments and studies seems to correlate to some extent with their intensity and relative abundance (Fig. 6 and figshare File 339). Besides the validated hPTM marks described here, we identified another 111 marks with Ascore <20, which were not added to our final map. However, several of them were close to the threshold score and/or identified by multiple features/experiments. Therefore, to allow the reader to fully explore the data, a compiled list of all the hPTM marks identified in the present work (Ascore >20/<20 and literature) is available in figshare File 439.
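A reader working with the compiled list may want to split marks by localization score. The sketch below assumes each mark is available as a (name, Ascore) pair and uses the Ascore ≥ 20 cut-off stated in the Methods; the function name, record layout, mark names and scores are all hypothetical and do not reproduce figshare File 4.

```python
def partition_by_ascore(marks, threshold=20.0):
    """Split (name, ascore) pairs into validated and below-threshold lists."""
    validated = [name for name, score in marks if score >= threshold]
    below = [name for name, score in marks if score < threshold]
    return validated, below


# Invented records, for illustration only.
example_marks = [
    ("mark_A", 45.2),
    ("mark_B", 20.0),
    ("mark_C", 18.7),
]

validated, below = partition_by_ascore(example_marks)
```

Keeping the below-threshold list, rather than discarding it, mirrors the approach taken here of reporting those marks separately so that borderline cases can be re-examined.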

Fig. 6

Abundance of exclusive and confirmed hPTM marks identified in this study compared with previously described analyses. hPTMs overlapping with other studies are color-coded according to (a) log2 intensity and (b) relative abundance. Detailed data are listed in figshare File 339.

Our updated and comprehensive profile of T. cruzi hPTMs, now available to be used for further studies, reinforces that several residues are targets of multiple modifications, that some modification types are more abundant than others and that hPTMs are widely distributed and diverse at both the tails and the globular domains of all histones, which are both regions with distinct and important roles in the epigenetic regulation of higher eukaryotes3,43.