Identification of species-specific peptide markers in cold-pressed oils

In recent years, cold-pressed vegetable oils have become very popular on the global market. Therefore, new versatile methods with high sensitivity and specificity are needed to detect and combat fraudulent practices. The objective of this study was to identify oilseed species-specific peptide markers, using proteomic techniques, for authentication of 10 cold-pressed oils. In total, over 380 proteins and 1050 peptides were detected in the samples. Among those peptides, 92 were found to be species-specific and unique to coconut, evening primrose, flax, hemp, milk thistle, nigella, pumpkin, rapeseed, sesame, and sunflower oilseed species. Most of the specific peptides were released from major seed storage proteins (11S globulins, 2S albumins) and oleosins. Additionally, the presence of allergenic proteins in the cold-pressed oils, including pumpkin Cuc ma 5, sunflower Hel a 3, and six sesame allergens (Ses i 1, Ses i 2, Ses i 3, Ses i 4, Ses i 6, and Ses i 7), was confirmed in this study. This study provides novel information on specific peptides that will help to monitor and verify the declared composition of cold-pressed oil as well as the presence of food allergens, which can be useful in the era of widely used unlawful practices.

Scientific Reports | (2020) 10:19971 | https://doi.org/10.1038/s41598-020-76944-z www.nature.com/scientificreports/
[...] to detect adulterations of high-price cold-pressed oils derived from various oleaginous seeds and fruits. Many advanced untargeted approaches have been introduced recently to address authenticity issues of edible oils, based on liquid or gas chromatography mass spectrometry (LC-MS or GC-MS) techniques for measuring various polar, nonpolar, and volatile compounds, as well as spectroscopic techniques including nuclear magnetic resonance (NMR) spectroscopy, near-infrared spectroscopy (NIR), attenuated total reflection-Fourier transform infrared spectroscopy (ATR-FTIR), or fluorescence, where the result is obtained by analysing the whole spectrum of the oils and developing statistical models [11][12][13][14].
When food fraud is considered in terms of food safety, for example in relation to food allergenicity, a targeted approach based on identifying specific proteins and peptide markers could be a method of choice. In this respect, MS-based proteomic analysis has the advantage of detecting the adulteration of unrefined oils and tracing potential allergenic proteins present in the sample simultaneously, in a single MS run. Depending on the type of oil, the protein extraction method, and the assessment methodology, the protein content of edible oils varies from 0.01 to 14.8 ppm [2,15]. Allergenic proteins such as cruciferin (11S globulin) and napin (2S albumin) have been identified in rapeseeds (Brassica napus) and also in unrefined rapeseed oil using a gel-based LC-MS method; however, these allergens have not been detected in refined rapeseed oil [16]. Moneret-Vautrin et al. [17] found that residual allergenic proteins in peanut oil, i.e. Ara h 1, can cause an allergic reaction in infants with atopic dermatitis who are sensitive to peanut, whereas Blom et al. [18] estimated the risk of allergic reactions to refined peanut oil as extremely low.
This study aimed to identify the unique peptide markers specific to oilseed species and proteins present in cold-pressed oils using proteomic techniques. An additional goal was to assess the presence of allergens. The protein and peptide composition of oils produced from 10 oilseeds, namely coconut, evening primrose, hemp, flax, milk thistle, nigella (also known as black cumin), pumpkin, rapeseed, sesame, and sunflower seeds, was analysed using ultra-high-performance liquid chromatography coupled to quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF-MS/MS). The markers identified could be useful for the detection of adulteration.

Results and discussion
Protein profiles. A previous study, comparing five methods using various solvents to separate proteins from unrefined and refined oils, showed that the protein content differs significantly depending on the extraction method [15]. Therefore, the two most efficient methods were tested on cold-pressed sunflower oil, i.e. extraction with acetone-hexane and extraction with acetone, at the beginning of this work. Since no considerable differences were observed in the extracted protein profiles, extraction with acetone was applied to all oil samples. Acetone extraction has been used successfully to identify allergenic proteins from commercial cold-pressed rapeseed oils [16]. Proteins were present in all the examined oils but the proteomic profiles differed considerably in the distribution and intensity of protein bands (Fig. 1). Visible differences reflect the species diversity of the major storage proteins. Under reducing SDS-PAGE conditions, the most abundant bands belonged to 11S globulin α and β subunits (MW 30-45 and 20-30 kDa, respectively). 11S globulin monomers (MW about 45-56 kDa) were less intense. Lower molecular weight bands (MW 10-21 kDa) were less abundant but showed species specificity.
Globulin, albumin and oleosin subunits have been reported in electrophoretic bands obtained from oilseed meals or isolates of various species, such as sesame protein isolate [19], hemp protein isolate [20] and sunflower seed and kernel proteins [21]. Regarding cold-pressed oils, proteins are minor components present in very low amounts as a result of gentle processing by pressing the oilseeds with a screw press without heating, and the resulting oil is purified only by sedimentation, filtration, or centrifugation. Therefore, identification of the proteins in vegetable oils has not been common in the literature. The fruit and seed proteins, mainly storage proteins and different oleosins, have been found in olive oil and palm oil [22,23], whereas knowledge about nigella, milk thistle, and evening primrose proteins is negligible. Major rapeseed allergens, such as napin and cruciferin (2S albumin and 11S globulin, respectively), have been extracted and identified from commercial cold-pressed rapeseed oil [16]. Accordingly, there is a need to conduct research to determine species-specific proteins and peptides found in cold-pressed vegetable oils.
Protein identification. To identify specific proteins, an in-solution tryptic digest of acetone-extracted proteins was analysed using the UHPLC-Q-TOF-MS/MS method. Figure 2 presents the 3D chromatograms obtained from the 10 examined cold-pressed oils. The National Center for Biotechnology Information (NCBI, U.S. National Library of Medicine) protein database was searched for protein and peptide identification across green plant taxonomy entries or appropriate genera (i.e. Brassica, Cannabis, Cocos, Cucurbita, Helianthus, Nigella, Linum, Oenothera, Sesamum, and Silybum). The number of identified proteins and tryptic peptides obtained from cold-pressed oils is shown in Fig. 3. In total, over 380 proteins and 1050 peptides were identified in the samples analysed. The highest number of proteins and peptides was found in the pumpkin and sunflower oils (106 and 105 proteins, respectively). The smallest number of proteins and peptides was determined in evening primrose, milk thistle, and nigella oils (only four, two, and three proteins, respectively). For these species, only a small fraction of the proteins have been sequenced, and thus, databases are highly incomplete. SDS-PAGE protein profiles obtained from these three oils confirmed the presence of storage protein bands of species-specific distribution and intensity (Fig. 1). All proteins identified with high confidence scores using the Spectrum Mill MS Proteomics Workbench (> 70% scored peak intensity and 10 ppm precursor mass tolerance) are listed in Supplementary material 1 for protein datasets. The matches and Spectrum Mill scores were evaluated at a false discovery rate (FDR) of 1% for the identity and homology threshold. The identified proteins belong, among others, to the 11S and 7S globulin, 2S albumin, and oleosin families. Some of those derived from hemp oil and sesame oil have been identified previously in hempseed and sesame seed defatted powders using LC-MS/MS [24,25].
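The precursor mass tolerance mentioned above is a relative criterion: a candidate match is kept only when the measured m/z lies within a given number of parts per million of the theoretical value. The following is a minimal sketch of that check, not the Spectrum Mill implementation; the function names and the example m/z values are hypothetical and chosen only for illustration.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 10.0) -> bool:
    """True if the absolute ppm error does not exceed tol_ppm.

    The default of 10 ppm mirrors the precursor tolerance quoted in the text;
    it is an assumption that the software applies it symmetrically.
    """
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical m/z values, not taken from the study.
    theoretical = 1000.0
    for observed in (1000.005, 1000.02):
        print(observed, round(ppm_error(observed, theoretical), 3),
              within_tolerance(observed, theoretical))
```

Because the tolerance is relative, the same 10 ppm window corresponds to a wider absolute m/z window for heavier precursors than for lighter ones.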
For the remaining eight oilseed species studied, there is a lack of comprehensive proteomic MS-based analysis. Recently, only the key proteins of oil palm mesocarp (Elaeis guineensis) have been identified by 2-DE and MALDI-TOF/TOF methods [23]. None of them coincided with the proteins detected in coconut oil, despite the close relationship between these two species, which belong to the same family, Arecaceae.
Species-specific peptide markers. The peptides identified and selected, based on the Spectrum Mill output scores, were searched against the NCBInr database using the protein Basic Local Alignment Search Tool (BLAST) and blastp algorithm (U.S. National Library of Medicine, Bethesda, MD), for species and protein specificity. In total, 92 species-specific peptides, released from 42 proteins (including different subunits), were identified in the examined oils. The largest number of unique peptides was found in sesame, sunflower and pumpkin oils (21, 18, and 17 peptides, respectively). Table 1 presents the unique, species-specific peptide markers identified in coconut, evening primrose, flax, hemp, milk thistle, nigella, pumpkin, rapeseed, sesame, and sunflower cold-pressed oils. The peptides' total intensities were in the range of 10^5-10^8 and scored peak intensities (SPI) were > 60%. All peptides detected in the examined cold-pressed oils are shown in Supplementary material 2 for peptide datasets.
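The selection logic behind a species-specific marker can be illustrated with a simple set comparison: a peptide qualifies only if it is assigned to exactly one species. The sketch below is a simplification under stated assumptions. It compares peptides only among the species supplied, whereas the study checked specificity by blastp against the whole NCBInr database; the function name and the peptide sequences are hypothetical placeholders, not sequences from Table 1.

```python
from collections import defaultdict


def species_specific(peptides_by_species: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, for each species, the peptides observed in that species only."""
    # Map every peptide to the set of species in which it was observed.
    owners: dict[str, set[str]] = defaultdict(set)
    for species, peptides in peptides_by_species.items():
        for peptide in peptides:
            owners[peptide].add(species)

    # Keep a peptide as a marker only when a single species owns it.
    markers: dict[str, set[str]] = {species: set() for species in peptides_by_species}
    for peptide, species_set in owners.items():
        if len(species_set) == 1:
            markers[next(iter(species_set))].add(peptide)
    return markers


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    observed = {
        "sesame": {"AAAK", "CCCR", "SHAREDK"},
        "sunflower": {"DDDK", "SHAREDK"},
        "pumpkin": {"EEER"},
    }
    for species, peptides in species_specific(observed).items():
        print(species, sorted(peptides))
```

A peptide shared by two or more species, such as the placeholder `SHAREDK`, is dropped for all of them, which is why incomplete databases can both hide true markers and let false ones through.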
Unique peptides were derived mainly from major seed storage proteins (i.e. cocosin, conlinin, edestin, other specific legumin-like or vicilin-like globulin subunits, and 2S albumins) and from oleosins, which are structural proteins found in plant parts with high oil content. Mass spectra of two peptide markers specific to coconut and hempseed obtained from 11S globulins are shown in Fig. 4. Product spectra of another three peptides unique to pumpkin oleosin 18.2 kDa-like, flax oleosin high molecular weight isoform, and black cumin nigellin-1.1 chain A are shown in Fig. 5. In the present study, most of the identified and selected peptides were specific to the species investigated, but in some cases the peptides were assigned to another species of a given genus, i.e. N. damascena or C. moschata, and thus the specificity of the protein for the genus can be confirmed.
To date, only a few comprehensive proteomic studies of oilseeds having application to food science have been published. Many protein sequences submitted to the NCBI database were obtained by methods of DNA sequencing. Several peptides released from hemp edestin 1 and edestin 2 identified in our study (Supplementary material 2) have been reported previously in hempseed defatted flour [24].
Regarding allergens with food route exposure, proteins derived from four of the ten species investigated, namely pumpkin, rapeseed, sesame, and sunflower, have been included in the database of officially recognized allergens maintained by the World Health Organization and International Union of Immunological Societies (WHO/IUIS) Allergen Nomenclature Sub-committee [25]. In the present study, some food allergens derived from pumpkin, sesame, and sunflower cold-pressed oils were detected. These allergens include pumpkin Cuc ma 5, sunflower Hel a 3, and six sesame allergens (Ses i 1, Ses i 2, Ses i 3, Ses i 4, Ses i 6, and Ses i 7). Details of the allergens and detected peptides are shown in Table 2. Recently, seven sesame allergens have been detected in raw sesame seeds and quantified in food products such as sauces, cookies, cake, and candy [26] based on selected signature peptides. Eight out of twelve of those previously selected signature peptides for sesame allergens were identified in the present study.
This study presents the discovery of unique peptide markers, but a further large-scale study is needed to confirm their utility in the authentication of commercially manufactured cold-pressed oils. The specificity of the peptide markers should also be verified further using different cultivars, geographical locations, etc. for each species.

Conclusions
This study provides a set of specific proteins and species-specific peptide markers which can help food analysts to verify the declared composition of cold-pressed oil as well as the presence of food allergens. Ninety-two specific peptides unique to coconut, evening primrose, flax, hemp, milk thistle, nigella, pumpkin, rapeseed, sesame, and sunflower were detected and identified in cold-pressed oils. The unique peptides were released mainly from specific seed storage proteins and oleosins. Additionally, several food allergens, i.e. pumpkin, sunflower, and sesame allergenic proteins, were observed in the relevant cold-pressed oils. The results should be beneficial in the era of widely used unlawful practices, mainly the modification of product composition with cheaper substitutes.

Preparation of samples. The oil was prepared in a cold-pressing process using a Yoda YD-ZY-02A oil press (Warsaw, Poland). The oil temperature during the production process was in the range of 38−50 °C and the efficiency of the pressing process in relation to the oil content was approximately 85%. Protein extraction with acetone was performed according to Martín-Hernández, Bénet, & Obert [15] with some changes. To 50 g of oil, 125 mL of cold acetone was added. After shaking, the mixture was stirred on a magnetic stirrer at 500 rpm for 1 h at 4 °C. The mixture was then centrifuged at 11,000 RCF and 4 °C for 15 min, and the supernatant was discarded. The precipitate was washed twice with 5 mL of cold acetone. The pellet was dried overnight in a drying oven at 40 °C, and then stored at −20 °C until proteomic analysis. The samples were analysed in duplicate.

Sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE). SDS-PAGE was performed to compare the profiles of the proteins extracted from oil samples according to a previously described method [27]. The dried pellet (5 mg) was dissolved in lysis buffer (8 M urea, 2 M thiourea, 0.05 mM Tris, 75 mM dithiothreitol (DTT), 3% SDS, and 0.05% bromophenol blue, pH 6.8) and heated at 98 °C for 4 min. The protein concentration was determined using a 2-D Quant kit (GE Healthcare Bio-Sciences, Fairfield, CT, USA). Protein aliquots (12 μg) were loaded onto 15% polyacrylamide gels prepared in a Hoefer SE250 system (GE Healthcare Bio-Sciences). A reference broad-range molecular weight standard (Bio-Rad Laboratories, Inc., CA, [...]

[...] and data acquisition were performed using Agilent MassHunter Workstation Software. The LC parameters were set as follows: 10 μL injection volume, 0.3 mL/min mobile phase flow. The mobile phase consisted of 0.1% formic acid in water (solvent A) and 0.1% formic acid in acetonitrile (solvent B). Gradient steps were applied as follows: 0-2 min, 2% B; 2-40 min, to 32% B; 40-45 min, to 37% B; 45-50 min, to 90% B; 50-55 min, 90% B; and a 5-min post-run at 2% B. The ion source gas (nitrogen) temperature was 250 °C, the flow rate was 14 L/min, nebulizer pressure was 35 psi, sheath gas temperature was 250 °C, and sheath gas flow was 11 L/min. The capillary voltage was set at 3500 V, nozzle voltage to 1000 V, and the fragmentor to 400 V. Positive ions formed in an electrospray were acquired in the range of 100-3000 m/z in MS scan mode and in auto MS/MS mode, with a scan rate of 5 scans/s for MS and 3 scans/s for MS/MS. Internal mass calibration was enabled by using two reference masses at 121.0509 and 922.0098 m/z.

A National Center for Biotechnology Information (NCBI, U.S. National Library of Medicine) protein database search for protein and peptide identification was performed using the Spectrum Mill MS Proteomics Workbench with > 50% scored peak intensity and 10 ppm precursor mass tolerance, with the following parameters: trypsin enzyme, taxonomy green plants or a given taxonomy genus (i.e. Brassica, Cannabis, Cocos, Cucurbita, Helianthus, Nigella, Linum, Oenothera, Sesamum, Silybum), two missed cleavages, 50 ppm product mass tolerance, carbamidomethylation as a fixed modification, and methionine oxidation as a variable modification. The matches and Spectrum Mill scores were evaluated at a false discovery rate (FDR) of 1% for the identity and homology threshold.
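The search parameters "trypsin enzyme" and "two missed cleavages" determine which candidate peptides are generated from each database protein. A minimal in-silico digestion sketch is given below. It assumes the common convention that trypsin cleaves C-terminal to lysine (K) or arginine (R) except when the next residue is proline (P); whether Spectrum Mill applies exactly this rule is not stated in the text, and the function names and the example sequence are hypothetical.

```python
def cleavage_fragments(sequence: str) -> list[str]:
    """Split a protein sequence at tryptic sites with no missed cleavages.

    Assumed rule: cleave after K or R unless the following residue is P.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder
    return fragments


def tryptic_peptides(sequence: str, missed_cleavages: int = 2) -> list[str]:
    """All peptides obtained by joining up to missed_cleavages + 1 adjacent fragments."""
    fragments = cleavage_fragments(sequence)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # Made-up sequence, not a protein from the study.
    example = "AAKCCRPDDKEE"
    print(cleavage_fragments(example))
    print(tryptic_peptides(example, missed_cleavages=2))
```

Allowing missed cleavages enlarges the candidate list, which increases the chance of matching incompletely digested peptides at the cost of a larger search space.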

Data availability
The datasets generated and analysed during this study are available from the corresponding author on reasonable request.