Discovery of protein acetylation patterns by deconvolution of peptide isomer mass spectra

Protein post-translational modifications (PTMs) play important roles in the control of various biological processes including protein–protein interactions, epigenetics and cell cycle regulation. Mass spectrometry-based proteomics approaches enable comprehensive identification and quantitation of numerous types of PTMs. However, the analysis of PTMs is complicated by the presence of indistinguishable co-eluting isomeric peptides that result in composite spectra with overlapping features that prevent the identification of individual components. In this study, we present Iso-PeptidAce, a novel software tool that enables deconvolution of composite MS/MS spectra of isomeric peptides based on features associated with their characteristic fragment ion patterns. We benchmark Iso-PeptidAce using dilution series prepared from mixtures of known amounts of synthetic acetylated isomers. We also demonstrate its applicability to different biological problems such as the identification of site-specific acetylation patterns in histones bound to chromatin assembly factor-1 and profiling of histone acetylation in cells treated with different classes of HDAC inhibitors.


Description of how spectral deconvolution is carried out by Iso-PeptidAce
Raw file extraction
Information is extracted from Raw files using ProteoWizard [1] and MsFileReader. MS2 peak lists (masses and intensities) are stored per precursor mass and sorted by retention time. Spectrum precursor intensity, MS1 and MS2 injection times are also extracted.
Precursor intensity counts are converted to ions per millisecond (iMS). For each distinct precursor mass in a file, a curve describing the number of ions flowing into the system at any given time is built from the computed iMS values. Linear interpolation is used to compute the area under this curve over a given time interval. In particular, the number of ions that entered the C-trap and were fragmented (#ion) is estimated as the area under the curve between the beginning of the scan (rt) and its end (rt + MS2i), where MS2i is the MS2 injection time:

$$\#ion = \int_{rt}^{rt + MS2i} iMS(t)\,dt$$

These operations are performed for each file (both mixed samples and synthetic peptide runs). For the synthetic peptide runs, spectrum fragment intensity counts (fi) are normalized (nfi) using the number of ions (#ion) that were fragmented.
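The area-under-the-curve step can be illustrated with a minimal Python sketch. It assumes that the iMS curve is given as sorted time points with one iMS value each, that retention time and injection time share the same unit (milliseconds), and that normalization is a simple division of fi by #ion. The function names are hypothetical and are not taken from Iso-PeptidAce.

```python
from bisect import bisect_right


def interpolate(times, values, t):
    """Linearly interpolate the iMS curve at time t (clamped at both ends)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def ions_fragmented(times, ims, rt, ms2_injection):
    """Estimate #ion as the trapezoid area under the iMS curve on [rt, rt + ms2_injection]."""
    start, stop = rt, rt + ms2_injection
    # Integration knots: the window bounds plus every curve point inside the window.
    knots = [start] + [t for t in times if start < t < stop] + [stop]
    heights = [interpolate(times, ims, t) for t in knots]
    return sum(
        (t1 - t0) * (h0 + h1) / 2
        for t0, t1, h0, h1 in zip(knots, knots[1:], heights, heights[1:])
    )


def normalise_fragments(fragment_intensities, n_ions):
    """Assumed normalization: divide each fragment intensity count by #ion."""
    return [fi / n_ions for fi in fragment_intensities]


# Toy curve: iMS rises from 0 to 10 over 100 ms, then stays flat.
n_ions = ions_fragmented([0, 100, 200], [0, 10, 10], rt=50, ms2_injection=100)
nfi = normalise_fragments([875, 1750], n_ions)
```

Because the curve is piecewise linear, the trapezoid rule over these knots gives the exact area under the interpolated curve.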

Peptide Spectrum Matching
Iso-PeptidAce uses the peptide spectrum matching capabilities of PeptidAce (which is derived from Morpheus) to automatically identify peptide sequences and modifications across synthetic peptide runs. PeptidAce uses a no-enzyme in silico protein digestion routine to parse the provided FASTA file for potential matches. All modifications specified in the user-configurable comma-separated values file "Modifications.csv" are searched for. This automatic identification is useful for detecting impurities arising from the peptide synthesis process.
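As an illustration of what a no-enzyme digestion implies, the sketch below enumerates every contiguous subsequence of a protein within a length range. It is not the PeptidAce routine; the function name and the length bounds are hypothetical choices made for this example.

```python
def no_enzyme_peptides(protein, min_len=5, max_len=30):
    """Yield every contiguous subsequence of the protein with min_len <= length <= max_len."""
    for start in range(len(protein)):
        for length in range(min_len, max_len + 1):
            if start + length > len(protein):
                break  # longer peptides would run past the end of the protein
            yield protein[start:start + length]


# Toy sequence with short bounds so the full list stays readable.
candidates = list(no_enzyme_peptides("ACDE", min_len=2, max_len=3))
```

Without cleavage rules the number of candidates grows roughly with protein length times the width of the length range, which stays manageable for the small FASTA files typical of synthetic peptide runs.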
PeptidAce matches peptides to spectra by comparing theoretical fragment ions (a, b, c, x, y and z) with observed peaks within a specified ppm tolerance window. Peptide spectrum matches (psm) are scored by their cumulated normalized fragment intensities (nfi), the number of matching ions versus unmatched theoretical ions (mif) and the precursor accuracy (normalized to [0,1]):

$$\mathrm{score}(psm) = 0.33 \cdot nfi + 0.33 \cdot mif + 0.33 \cdot \mathrm{precursor\ accuracy}$$

In case of discrepancy (a precursor with inconsistent psm across its elution curve), the psm scores of similar peptides are summed and the best-scoring peptide is assigned to the precursor. Ambiguous spectra assigned to different peptides are discarded.
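The matching and scoring steps can be sketched as follows. This is a simplified illustration under stated assumptions: the three score components are taken to be already scaled to [0,1] and equally weighted at 0.33, and mif is interpreted here as the fraction of theoretical ions that are matched. All function names are hypothetical.

```python
def within_ppm(observed_mz, theoretical_mz, tolerance_ppm):
    """True if the observed m/z lies within the ppm window around the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6 <= tolerance_ppm


def matched_ion_fraction(observed_mzs, theoretical_mzs, tolerance_ppm):
    """Assumed form of mif: fraction of theoretical ions with at least one peak in tolerance."""
    matched = sum(
        any(within_ppm(obs, theo, tolerance_ppm) for obs in observed_mzs)
        for theo in theoretical_mzs
    )
    return matched / len(theoretical_mzs)


def psm_score(nfi, mif, precursor_accuracy):
    """Equal-weight combination of the three components, each assumed to be in [0, 1]."""
    return 0.33 * nfi + 0.33 * mif + 0.33 * precursor_accuracy


# Two of three theoretical ions are matched within 5 ppm.
mif = matched_ion_fraction([100.0, 200.0002], [100.0, 200.0, 300.0], tolerance_ppm=5)
score = psm_score(nfi=0.8, mif=mif, precursor_accuracy=0.9)
```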

Describing the system as a Maximum Flow problem
The flow network was built from the mixed spectra and the spectra acquired from the individual peptide isomers. The objective, finding the maximum flow of the system, is stated as follows:

How many ions of each peptide isomer does it take to optimally fill the fragment intensity counts of the mixed spectrum?
For each synthetic peptide found, the normalized fragment intensities (nfi) of all its spectra were averaged, and the n most intense fragments were selected. Only fragment masses with consistent normalized intensities (standard deviation below 50% of the mean) were considered sufficiently reproducible. Unstable fragments were flagged and ignored in subsequent analyses.
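The fragment selection can be sketched in Python as below. The sketch assumes that each spectrum is a mapping from fragment mass to nfi, that a fragment missing from a spectrum counts as zero intensity, and that "consistent" means a population standard deviation no larger than 50% of the mean. The function name and data layout are hypothetical.

```python
from statistics import mean, pstdev


def select_fragments(spectra, n, max_cv=0.5):
    """Average nfi per fragment mass, keep the n most intense, flag the unstable ones.

    spectra: list of dicts {fragment_mz: nfi} for one synthetic peptide.
    Returns (stable, unstable): a dict of mean nfi per stable mass, and a list of flagged masses.
    """
    all_mz = set().union(*spectra)
    stats = {}
    for mz in all_mz:
        values = [spectrum.get(mz, 0.0) for spectrum in spectra]
        stats[mz] = (mean(values), pstdev(values))
    # The n most intense fragments by averaged nfi.
    top = sorted(stats, key=lambda mz: stats[mz][0], reverse=True)[:n]
    # Keep fragments whose standard deviation is at most max_cv times their mean.
    stable = {mz: stats[mz][0] for mz in top if stats[mz][1] <= max_cv * stats[mz][0]}
    unstable = [mz for mz in top if mz not in stable]
    return stable, unstable


spectra = [
    {100.0: 1.0, 200.0: 0.5, 300.0: 0.1},
    {100.0: 1.2, 200.0: 0.1, 300.0: 0.1},
]
stable, unstable = select_fragments(spectra, n=2)
```

In this toy input the fragment at mass 200.0 varies too much between the two spectra and is flagged, while the fragment at 100.0 is kept with its averaged intensity.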
For synthetic peptides of similar mass, these n fragments were used to build a list of fragment masses ($aMz$) containing both common and unique fragment masses. The nfi of each averaged spectrum was used to associate intensities with this list of fragment masses ($nfi_j$, one list for each potential isomer $j$). A similar list of intensity counts ($fi_m$) was built for the mixed spectrum $m$ of matching precursor mass. For each mixed spectrum $m$, the maximum flow is defined as the sum of $\#ion_j$ that optimally fills the fragment intensity counts (where $\#ion_j$ is the number of ions for isomer $j$, and $\widehat{fi}_m$ is the theoretical list being built):

$$\widehat{fi}_m = \sum_j \left( \#ion_j \cdot nfi_j \right) \quad \text{and} \quad \widehat{fi}_m \le fi_m$$

Thus, the flow network can be described as a source connected to the peptide isomers $j$ (edges of unknown capacity), which are themselves connected to every selected fragment in $aMz$ (edge capacity determined by $nfi_j$), each fragment linking to the sink (edge capacity determined by $fi_m$). When the maximum flow is reached, the flows on the first set of edges, those leaving the source, give the ratio of each peptide isomer found in the mixed spectrum.
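For two isomers, the optimization described above can be illustrated with a small Python sketch. It does not implement a network-flow algorithm: it solves the equivalent linear program (maximize the total number of ions subject to the per-fragment capacity of the mixed spectrum) by enumerating the vertices of the feasible region. It assumes each isomer has at least one selected fragment with non-zero intensity, so that the problem is bounded. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def deconvolve_two_isomers(nfi_a, nfi_b, fi_mixed, tol=1e-9):
    """Maximize ions_a + ions_b subject to
    ions_a * nfi_a[k] + ions_b * nfi_b[k] <= fi_mixed[k] for every fragment k,
    with ions_a >= 0 and ions_b >= 0. Returns (ions_a, ions_b)."""
    # Each constraint (p, q, r) means p * x + q * y <= r.
    constraints = list(zip(nfi_a, nfi_b, fi_mixed))
    # Non-negativity written in the same form: -x <= 0 and -y <= 0.
    constraints += [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]

    def feasible(x, y):
        return all(p * x + q * y <= r + tol for p, q, r in constraints)

    best = (0.0, 0.0)
    # The optimum of a bounded linear program lies at a vertex, that is,
    # at the intersection of two constraint boundaries.
    for (p1, q1, r1), (p2, q2, r2) in combinations(constraints, 2):
        det = p1 * q2 - p2 * q1
        if abs(det) < 1e-12:
            continue  # parallel boundaries never intersect in a single point
        x = (r1 * q2 - r2 * q1) / det
        y = (p1 * r2 - p2 * r1) / det
        if feasible(x, y) and x + y > sum(best):
            best = (x, y)
    return best


# Toy case: one shared fragment and one unique fragment per isomer.
# The mixed spectrum was simulated from 100 ions of isomer A and 300 ions of isomer B.
nfi_a = [0.5, 0.1, 0.0]
nfi_b = [0.5, 0.0, 0.2]
fi_mixed = [200.0, 10.0, 60.0]
ions_a, ions_b = deconvolve_two_isomers(nfi_a, nfi_b, fi_mixed)
ratio_a = ions_a / (ions_a + ions_b)
```

In this toy case the unique fragments cap each isomer individually and the shared fragment caps their sum, so the recovered proportions match the simulated mixture (about 25% isomer A). For more than two isomers a general linear programming or flow solver would be needed in place of the vertex enumeration.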