The intracellular proteome of African swine fever virus

African swine fever (ASF) is a viral disease that affects members of the Suidae family, including African bush pigs and warthogs as well as domestic pigs and wild boar. It is transmitted by direct contact between naïve and infected animals, by soft ticks of the genus Ornithodoros, or indirectly by movement of infected animals, improper disposal of contaminated animal products, or other sources related to human activity. The recent spread of ASF into Eastern and Central European countries is currently threatening the European pig industry. The situation is aggravated by the fact that, to date, no efficient vaccine is available. African swine fever virus (ASFV) is a large enveloped dsDNA virus encoding at least 150 open reading frames (ORFs). Many of the deduced gene products have not been described, let alone functionally characterized. We have analysed ASFV protein expression by mass spectrometry in three susceptible mammalian cell lines representing a susceptible host (wild boar) and two non-susceptible species (human and green monkey) and provide first evidence for the expression of 23 so far uncharacterized ASFV ORFs. Expression levels of several newly identified ASFV proteins were remarkably high, indicating importance in the viral replication cycle. Moreover, expression profiles of ASFV proteins in the three cell lines differed markedly.

by identification of peptides deduced from the viral genomic sequence. A GFP-expressing thymidine kinase (TK) negative recombinant of ASFV strain OURT 88/3 (OURT 88/3-ΔTK-GFP) was used for infection. OURT 88/3 is a naturally attenuated, non-hemadsorbing field strain isolated from Ornithodoros erraticus ticks in Portugal 12. It is being considered for use as a vaccine because it has been shown to protect pigs against challenge with the virulent ASFV strain OURT 88/1 [12][13][14]. Three susceptible cell lines were selected for proteome analysis after infection. WSL-HP is a lung cell line from wild boar, a natural host of ASFV, while HEK 293 cells originate from humans, who are not susceptible to infection. Vero cells were included because they have been used extensively for infection experiments with ASFV in the past.

Results and Discussion
MS study design. The three cell lines used in this study were fully susceptible to infection with the recombinant ASFV OURT 88/3-ΔTK-GFP expressing GFP under the control of the late ASFV p72 promoter [15][16][17]. As shown in Fig. 1, the virus replicated in all three cell lines, although final titers on Vero and HEK 293 cells were 20-fold and 16-fold lower than on WSL-HP cells, respectively. Before samples were prepared for the MS analysis, the infection conditions were adjusted to ensure that completely infected cell monolayers were used for the preparation of the protein extracts. This was done to achieve high sensitivity for the detection of ASFV proteins and to enable the MS-based quantitative comparison of ASFV protein expression between the different cells. Cell monolayers were inoculated with ASFV OURT 88/3-ΔTK-GFP, and GFP fluorescence, which indicates entry into the late phase of infection, was monitored over time. After 24 h (HEK 293 and Vero) and 48 h (WSL-HP), the cell monolayers were entirely positive for GFP fluorescence. Immunoblot and RT-qPCR analyses targeting the structural proteins p30, expressed with early kinetics 18,19, and p72, expressed with late kinetics 15,16, confirmed that the cell lines were in the late phase of infection (Fig. 2) at the given time points, which were then chosen to harvest the cells for the MS analysis.
For the proteome analysis of the cell extracts, a shotgun approach was applied using a mass spectrometric platform combining nano liquid chromatography (nLC) and matrix-assisted laser desorption/ionisation tandem time-of-flight (MALDI-TOF/TOF) MS. In short, total protein extracts of the three cell lines were prepared at the designated time points, and the proteins were digested into peptides, separated by nLC and analysed on the MALDI-TOF/TOF instrument. The protein composition of the extracts was then determined by querying the spectra against a sequence database containing host and virus proteins. To increase sensitivity and improve the sequence coverage of ASFV proteins in the mass spectrometric analysis, two replicates of each of the three infected cell lines were digested with three different proteases with complementary specificity (trypsin, chymotrypsin and Glu-C). The choice of proteases was based on an in-silico comparison of the physical properties (molecular weight; isoelectric point, pI; and the Grand Average of Hydropathy, GRAVY 20) of ASFV and Sus scrofa proteins, which had indicated that the viral proteins were, on average, smaller (37.7 kDa vs. 50.5 kDa), more basic (pI 7.81 vs. 7.33), and more hydrophobic (GRAVY −0.21 vs. −0.31) than the Sus scrofa proteins (Supplementary Fig. S1). The results of the six nLC-MALDI-TOF/TOF MS runs for every cell line were compiled into a single result file for the calculation of the identification scores, sequence coverages and the number of peptides identified in every protein. A comprehensive representation of the MS data is provided as Supplementary data, Tables S1 and S2.
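The GRAVY comparison mentioned above can be illustrated with a minimal sketch. It assumes the standard Kyte-Doolittle hydropathy scale and defines GRAVY as the mean hydropathy over all residues of a sequence; the function name `gravy` and the example peptide are hypothetical and are not taken from the study.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence.

    Raises KeyError for non-standard residue codes and ZeroDivisionError
    for an empty sequence.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical example peptide, for illustration only.
    print(round(gravy("MKTLLVA"), 3))
```

A more positive value indicates a more hydrophobic protein, which is the sense in which −0.21 is "more hydrophobic" than −0.31.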
Classification of ASFV proteins into structural, non-structural and uncharacterized proteins followed a recent review 6. Additionally, the literature was searched for any evidence for the expression of ASFV RNAs or proteins, and the references were added to the column 'Ref' in Supplementary Table S3. To the best of our knowledge, we demonstrate the existence of 23 ASFV proteins for which no evidence of expression was available so far.
Expression profiles of ASFV-specific proteins differed markedly between the cell lines (Fig. 3). Fifty-four proteins were detected in all three cell lines and may represent a core set of ASFV proteins required to maintain the late phase of infection in mammalian cell cultures. Ten proteins were detected exclusively in WSL-HP cells, while four more were identified exclusively in Vero cells. Twenty-three proteins were shared between WSL-HP and Vero cells but were undetectable in HEK 293 cells.
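The overlap categories described above can be derived from per-cell-line lists of identified proteins. The following is a minimal sketch that counts proteins per detection pattern; the function name `overlap_counts` and the toy identifiers are hypothetical and do not correspond to the proteins reported in the study.

```python
from collections import Counter


def overlap_counts(detected):
    """Count proteins per detection pattern.

    `detected` maps a cell line name to the set of protein identifiers
    found in that cell line. Each key of the result is a sorted tuple of
    the cell lines in which a protein was detected.
    """
    patterns = Counter()
    all_proteins = set().union(*detected.values())
    for protein in all_proteins:
        pattern = tuple(sorted(
            line for line, ids in detected.items() if protein in ids
        ))
        patterns[pattern] += 1
    return patterns


# Toy identifiers (hypothetical placeholders, not real ASFV proteins).
detected = {
    "WSL-HP": {"p1", "p2", "p3", "p4"},
    "Vero": {"p1", "p2", "p5"},
    "HEK 293": {"p1"},
}
counts = overlap_counts(detected)

if __name__ == "__main__":
    for pattern, n in sorted(counts.items()):
        print(pattern, n)
```

As a consistency check on the reported numbers, the four categories listed above sum to 54 + 10 + 4 + 23 = 91 proteins. Since 57 viral proteins were identified in HEK 293 cells, three further proteins must fall into categories involving HEK 293 cells that are not enumerated in this paragraph; this is an inference from the reported counts, not a statement from the source.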
The observed differences were evidently not due to technical issues, as the low number of 57 identified viral proteins in HEK 293 cells contrasted with the high number of 1,251 identified host cell proteins (vs. 1,194 in Vero and 969 in WSL-HP cells). They may instead reflect a limitation in the replication of ASFV in cells from the non-susceptible human species. Another factor contributing to the observed differential expression of ASFV proteins could be the different growth kinetics of OURT 88/3-ΔTK-GFP in the cells from different hosts (Fig. 1), which may have influenced expression profiles in the late phase. Also, proteins expressed exclusively in the early phase of infection may be underrepresented. In early studies using one-dimensional or two-dimensional electrophoresis (2DE) in combination with radioactive pulse labelling and inhibitors of viral DNA replication 21-24, the numbers of virus-induced proteins synthesized in the early and late phases of infection were assessed. Although the experimental conditions were different from ours, it is interesting to note that in the study by Esteves et al. 23, 24 of the 35 early proteins, and in the study by Urzainqui et al. 21 all early proteins, were also synthesized in the late phase. As these numbers are based on the incorporation of radioactive precursors in a pulse labelling experiment, we assume that the majority of early proteins will also be amenable to mass spectrometric detection in the late phase of infection, as they are still synthesized or have accumulated during infection. Indeed, a number of proteins listed in Supplementary Table S3 are expressed with early kinetics. In contrast to these early publications, MALDI-TOF MS was combined with 2DE analysis in a study focusing on the identification of host proteins modulated after ASFV infection 25. Approximately 60% of the 68 identified protein spots were of viral origin, but the identities of these proteins and their kinetics of expression were not reported.
For the qualitative analysis the use of two more proteases in addition to the most commonly used enzyme, trypsin, was beneficial for the yields and the sequence coverages of identified proteins. This observation was most striking for the comparison of tryptic and chymotryptic peptides obtained after digestion of infected WSL-HP cells. While only 31.9% of the 8,643 Sus scrofa specific peptides resulted from the chymotrypsin cleavage, this was the case for 44.7% of the 1,254 ASFV specific peptides. Using Fisher's exact test, these ratios indicated a highly significant overrepresentation of chymotryptic peptides among the ASFV derived peptides. Most likely, this was a consequence of the slightly divergent physicochemical properties of ASFV proteins in comparison to the host proteins.
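The enrichment test described above can be sketched as a one-sided Fisher's exact test on a 2 × 2 table (chymotryptic vs. other peptides, ASFV vs. Sus scrofa). The peptide counts below are reconstructed by rounding the reported percentages (44.7% of 1,254 ≈ 561; 31.9% of 8,643 ≈ 2,757) and are therefore approximations, not values taken from the supplementary data. Whether the original analysis was one-sided or two-sided is not stated, so the one-sided form is an assumption, and the function names are hypothetical.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing `a` or more
    counts in the top-left cell given fixed row and column totals.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    log_denominator = log_choose(total, row1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += exp(
            log_choose(col1, k)
            + log_choose(total - col1, row1 - k)
            - log_denominator
        )
    return min(1.0, p)


# Approximate counts reconstructed from the reported percentages.
asfv_chymo, asfv_other = 561, 1254 - 561
host_chymo, host_other = 2757, 8643 - 2757
p_value = fisher_greater(asfv_chymo, asfv_other, host_chymo, host_other)

if __name__ == "__main__":
    print(f"one-sided p-value: {p_value:.3g}")
```

Working with log-gamma values avoids computing very large binomial coefficients directly; a statistics library would typically be used in practice.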
A prerequisite for MS based proteome analysis is the correct annotation of the ORFs within the genome sequence. We have therefore also applied a proteogenomic approach to our MS data in order to identify any additional unidentified ORFs. To this end, the genome of ASFV strain OURT 88/3 was translated in the six possible reading frames and the resulting database was used to re-evaluate the mass spectra with the Mascot search engine (data not shown). However, peptide assignments to so-far unknown potential new reading frames were not observed so that we have no indication for any additions to the annotations in the published OURT 88/3 protein sequences.
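The six-frame translation underlying this proteogenomic search can be sketched as follows. This is a minimal illustration that assumes the standard genetic code and translates each frame in full, with stop codons shown as `*`; how the actual search database was built (for example, whether frames were split at stop codons or filtered by length) is not described in the text, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return translations of the three forward and three reverse frames."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    # Short hypothetical sequence, for illustration only.
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

Peptides matching only a translated frame outside the annotated ORFs would point to an unannotated coding region, which is the kind of evidence the search described above did not find.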
Of the 32 members of the multigene families (MGF) present in OURT 88/3, we have identified six, specifically one MGF 505 protein (MGF 505-9R) and five MGF 110 members (MGF 110-1L, -2L, -4L, -5L, and -14L). While MGF 110-14L and MGF 505-9R were expressed weakly and detected only in Vero cells, the expression of the four remaining MGF 110 proteins (1L, 2L, 4L, and 5L) was stronger and, with the exception of MGF 110-2L, they were detected in all three cell lines. While MGF 360 and MGF 530 members have been reported to be virulence determinants, to interact with the innate immune system 9,26,27, and to determine the host range of ASFV 10,11, MGF 110 genes have been shown to be non-essential for infection of and virulence in pigs 28. Nevertheless, MGF 110-1L, -4L and -5L are expressed in all three cell cultures, suggesting that they may play an at least beneficial role here. As MGF 110 proteins have been reported to contain signal peptides 29, the coding sequences were analysed for this feature with the SignalP 4.0 30 and Phobius 31,32 software. Twelve predicted mature sequences were added to the sequence database for a re-evaluation of the mass spectrometric data. As shown in Supplementary Fig. S2, the N-termini of the predicted mature sequences of all four identified MGF 110 proteins 1L, 2L, 4L, and 5L and also of pI329L were confirmed by MS. For 44 additional proteins of the 94 identified, the N-terminal peptides were identified (Supplementary data, Table S1). Only six were in the native amino form, while all others were modified by N-terminal acetylation, a common post-translational modification.
Quantitative proteome analysis. The quantitative evaluation of the MS analysis shown in Figs 4 and 5 was performed using the exponentially modified protein abundance index (emPAI) 33, a label-free approach that allows the calculation of protein abundances on the basis of the number of identified peptides that are assigned to a certain protein during the protein identification process. With 14 and 17 mole % (10.2 and 13.3 weight %) of total protein content, the expression of ASFV proteins in Vero and WSL-HP cells was substantial and markedly higher than in HEK 293 cells (6.3 mole %, 4.8 weight %). The lower number of identified ASFV-specific proteins in HEK 293 cells correlated with the lower content of ASFV proteins present in this cell line and could therefore be a matter of sensitivity, as poorly expressed proteins may have dropped below the detection limit. However, the quantitative evaluation of the MS data showed that some ASFV proteins were expressed in HEK 293 cells at similar or even higher levels than in WSL-HP or Vero cells (Fig. 4). This indicates that the expression of individual proteins differed markedly between the cell lines and argues against a general underrepresentation of ASFV proteins in HEK 293 cells. Although the abundances of individual ASFV proteins varied between the cell lines, a certain degree of correlation between the expression levels was observed, which was most striking for the ASFV proteins expressed in Vero and in WSL-HP cells (Fig. 4, right panel), representing a non-susceptible (Vero) and a susceptible (WSL-HP) host. This result suggests that the course of infection in both cell lines is similar and that both may be useful for infection studies with ASFV. This observation may also be helpful for the retrospective assessment of studies that have been performed with Vero cells in the past.
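The emPAI calculation referred to above can be sketched as follows, assuming the commonly cited definition: emPAI = 10^(N_observed / N_observable) − 1, with mole % obtained by normalising to the sum of all emPAI values and weight % by additionally weighting each value with the protein's molecular mass. The function names and the example proteins and numbers are hypothetical and are not taken from the study.

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index for one protein."""
    return 10 ** (n_observed / n_observable) - 1


def abundance_percentages(proteins):
    """Return {name: (mole_percent, weight_percent)}.

    `proteins` maps a protein name to a tuple of
    (observed peptides, observable peptides, molecular mass in kDa).
    """
    scores = {
        name: empai(observed, observable)
        for name, (observed, observable, _mass) in proteins.items()
    }
    weighted = {name: scores[name] * proteins[name][2] for name in proteins}
    total_score = sum(scores.values())
    total_weighted = sum(weighted.values())
    return {
        name: (
            100 * scores[name] / total_score,
            100 * weighted[name] / total_weighted,
        )
        for name in proteins
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    example = {
        "viral protein A": (6, 10, 17.0),
        "host protein B": (12, 30, 50.0),
        "host protein C": (3, 25, 42.0),
    }
    for name, (mole, weight) in abundance_percentages(example).items():
        print(f"{name}: {mole:.1f} mole %, {weight:.1f} weight %")
```

Summing the mole % or weight % values of all viral proteins in an extract gives totals of the kind reported above for each cell line.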
Non-structural proteins were found among the less expressed proteins in all three cell lines, while the most abundant were structural proteins and, surprisingly, so far uncharacterized proteins. In Fig. 5, an overview of the 20 most abundant proteins in the cell extracts (including host proteins) is given in panel A, and the abundances of the 20 highest ranking ASFV proteins (without cellular proteins) are compared in panel B. The abundance ranking within the ASFV proteins differed between the three cell lines (Fig. 5B), which may reflect different relevance of the proteins in cells of the three differentially susceptible hosts. However, there were also notable similarities. Three so far uncharacterized proteins ranked among the top 20 abundant proteins in WSL-HP cells, namely pK145R, pC129R, and pI73R. Of these, pK145R and pC129R ranked also among the top 20 most

Virus and cells. HEK 293 (CCLV-RIE #197) 34, WSL-HP (CCLV-RIE #1346) 35, and Vero cells (CCLV-RIE #0015) were provided by the Biobank of the Friedrich-Loeffler-Institut. Cells were cultivated in Minimum Essential Medium (MEM) supplemented with fetal bovine serum (10%) and Penicillin/Streptomycin solution (1%, Biochrom, Berlin, Germany) at 37 °C in a humidified atmosphere with 2.5% CO2. All live-virus experiments were carried out in a biocontainment facility that fulfils the safety requirements for ASF laboratories and animal facilities laid out in Chapter VIII of Commission Decision 2003/422/EC. The GFP-expressing ASFV mutant OURT 88/3-ΔTK-GFP was used throughout this study. OURT 88/3-ΔTK-GFP was generated from strain OURT 88/3 (kindly provided by Linda Dixon, Pirbright, UK) as described for the corresponding NHV recombinant 17. The correct integration of the GFP expression cassette regulated by the late ASFV p72 promoter was confirmed by direct sequencing of PCR amplicons spanning the mutagenized locus.
To improve infection rates, the cells were inoculated in 6-well or 24-well plates by centrifugation (600 × g at 37 °C) during the 1 h incubation period (JH Forth, L Käbisch, R Portugal, S Blome, GM Keil, unpublished). The inoculum was then replaced by fresh medium and cells were further incubated.
For measuring growth kinetics, cells were cultivated in 24-well plates and infected at a multiplicity of infection (MOI) of 2. After incubation for the times indicated, supernatants were removed and cells were scraped into phosphate buffered saline (
Immunoblots. Proteins were purified using the Trizol Reagent workflow as described in the manual, separated on polyacrylamide gels (10%) and electroblotted to nitrocellulose membrane 40. Blots were incubated with PBS containing 10% horse serum and 6% skimmed milk powder overnight at 4 °C. After incubation with appropriate dilutions of rabbit sera directed against p30 (1:20,000) or p72 (1:50,000) made up in blocking buffer (PBS containing 0.1% Tween-20 [PBS-T], 0.6% skimmed milk powder, and 1% horse serum) for 1 h at RT, the membranes were washed with PBS supplemented with 0.3% Tween-20 for 1 min and in PBS-T for 5 min and then incubated with secondary antibody (1:2,000 dilution in PBS-T, peroxidase-conjugated goat anti-rabbit, Dianova, #111-036-045) for 1 h at RT. Membranes were washed as above, and bound antibodies were detected using the Clarity Western ECL substrate (BioRad, #170-5061).
Mass spectrometric workflow. Proteome analysis was performed using a shotgun approach. Samples were digested into peptides, which were separated by nano liquid reversed phase chromatography (EASY-nLC II, Bruker), and spotted to a MALDI target by a Proteineer fcII sample spotting robot (Bruker). Mass spectrometric analysis was performed on an UltrafleXtreme MALDI-TOF/TOF instrument (Bruker). All reagents used were of highest purity available, all solvents were of MS grade.
MALDI-TOF/TOF mass spectrometry. Mass spectrometric analysis was performed in positive mode in the m/z range from 700 to 3,500 Da. A maximum of 40 peptide peaks per fraction with signal-to-noise ratios above 5 were selected for fragmentation. The spectra were processed with Flexanalysis software (version 3.4, Bruker) and the proteins were identified by the Mascot search engine (version 2.4.1, Matrix Science) using the following settings: peptide and fragment mass tolerances were set to 25 ppm and 0.7 Da, respectively. Oxidation of methionine and acetylation of protein N-termini were set as variable modifications, whereas carbamidomethylation of cysteine was set as a fixed modification. One missed cleavage site was tolerated for tryptic digests, and up to six for samples digested with Glu-C or chymotrypsin. The false discovery rate was set to a maximum of 2%. For the Mascot searches of samples from the different species, databases representing the human and the domestic pig proteomes were downloaded from the Ensembl repository 43, while the Chlorocebus sabaeus proteome was downloaded from the UniProt Knowledgebase 44. Proteins specified by the ASFV strain used, OURT 88/3 (GenBank AM712240.1 45), were added to all three host cell proteomes, and the appropriate compilation of viral and host cell proteomes was used for the different samples. The results of the Mascot database search were exported to the ProteinScape software (version 3.1, Bruker). Only proteins identified with at least two peptides exceeding the Mascot peptide identification score are reported. Proteins were quantified using the exponentially modified protein abundance index (emPAI) 46
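The effect of the missed-cleavage setting on the peptide search space can be illustrated with a minimal in-silico digestion sketch. The cleavage rules below are simplified textbook rules and are assumptions: they are not necessarily identical to the enzyme definitions used in the Mascot searches, and the function name and example sequence are hypothetical.

```python
import re

# Simplified cleavage rules (assumptions): cleave C-terminal to the listed
# residues unless the next residue is proline.
CLEAVAGE_RULES = {
    "trypsin": r"[KR](?!P)",
    "chymotrypsin": r"[FWY](?!P)",
    "glu-c": r"E(?!P)",
}


def digest(sequence, enzyme, missed_cleavages=0):
    """Return all peptides allowing up to `missed_cleavages` skipped sites."""
    rule = CLEAVAGE_RULES[enzyme]
    sites = [m.end() for m in re.finditer(rule, sequence)]
    # Peptide boundaries: sequence start, internal cleavage sites, sequence end.
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        last = min(start + 1 + missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    # Short hypothetical sequence, for illustration only.
    print(digest("AKRPGKC", "trypsin", missed_cleavages=0))
    print(digest("AKRPGKC", "trypsin", missed_cleavages=1))
```

Allowing more missed cleavages, as was done for Glu-C and chymotrypsin, increases the number of candidate peptides per protein and therefore the size of the search space.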