Genome-wide analysis of gene expression and protein secretion of Babesia canis during virulent infection identifies potential pathogenicity factors

Infections of dogs with virulent strains of Babesia canis are characterized by rapid onset and high mortality, comparable to complicated human malaria. As in other apicomplexan parasites, most Babesia virulence factors responsible for survival and pathogenicity are secreted to the host cell surface and beyond where they remodel and biochemically modify the infected cell interacting with host proteins in a very specific manner. Here, we investigated factors secreted by B. canis during acute infections in dogs and report on in silico predictions and experimental analysis of the parasite’s exportome. As a backdrop, we generated a fully annotated B. canis genome sequence of a virulent Hungarian field isolate (strain BcH-CHIPZ) underpinned by extensive genome-wide RNA-seq analysis. We find evidence for conserved factors in apicomplexan hemoparasites involved in immune-evasion (e.g. VESA-protein family), proteins secreted across the iRBC membrane into the host bloodstream (e.g. SA- and Bc28 protein families), potential moonlighting proteins (e.g. profilin and histones), and uncharacterized antigens present during acute crisis in dogs. The combined data provides a first predicted and partially validated set of potential virulence factors exported during fatal infections, which can be exploited for urgently needed innovative intervention strategies aimed at facilitating diagnosis and management of canine babesiosis.

Supplementary Document S1. CEGMA completeness and completeness relative to Toxoplasma gondii core eukaryotic gene families in selected apicomplexan species.

Isolation of genomic DNA, library preparation and genome sequencing
Whole blood from the experimental inoculated splenectomized dog was used. Washed blood samples were passed twice through a cellulose powder column (medium fibre powder; Sigma-Aldrich) to deplete host leukocytes (Rodriguez et al. 1986;Sriprawat et al. 2009 method was chosen in order to improve the enrichment the longer fragments.
After the run, a sequencing report was generated for every cell via the SMRT portal, in order to assess the adapter dimer contamination, the sample loading efficiency, the obtained average read-length and the number of filtered sub-reads.

Genome assembly and annotation
Long fragments sequenced on the Pacific Bioscience RSII had the SMRTbell adapters removed and filtered according to a minimum read length of 50 nucleotides and a minimum read quality of 75%. Moreover, contamination from fragments coming from the host Canis familiaris were also excluded following a mapping step with PacBio read-alignment software BLASR (Chaisson and Tesler 2012). The final draft of the genome was obtained using a two-step procedure. De novo assembly was initially performed using the HGAP2 algorithm (Chin et al. 2013) and the outcome was further polished using Quiver, a reference guided consensus tools developed as part of the SMRT analysis, PacBio proprietary analysis package. The resulting draft was carried over for gap-closing and further scaffolding using PBJelly v14.1.15 (English et al. 2012). BLASR was provided with the following settings: minMatch = 8, sdpTupleSize = 8, minPctIdentity = 75, bestn = 1, nCandidates = 10, maxScore = 500, and noSplitSubreads was activated.
Genome annotation was performed using different sources of evidence. First evidence was based on the Maker2 pipeline (Stanke and Waack 2003). Augustus (Version 3.1) (Korf 2004) was used as ab initio gene predictors using initially P. falciparum as model organism. The Trinity-assembled de novo transcriptome was used as transcript evidence (RNA seq is described below). Protein sequences from UniProt were used as homology-based evidence.
Repetitive genomic elements were identified and masked from annotation with RepeatMasker (http://www.repeatmasker.org) using the full Repbase database (Jurka et al. 2005). The Maker2 annotation was run a first time using the de novo transcriptome directly to infer gene predictions (i.e. est2genome=1) and a second time incorporating the transcriptannotation file (in .gff format) obtained from the first run (i.e. "pass-through" aided annotation). De novo transcriptome assembly (RNA isolation and processing is described below) was performed using the Trinity software suite v.2013-8-14 (Grabherr et al. 2011) using default parameters. The likely coding DNA sequences (CDS) and corresponding proteins within the de novo transcriptome assembly were estimated with Transdecoder (http://transdecoder.sourceforge.net/). Redundant transcripts potentially representing sequencing errors or genetic polymorphisms were clustered with the cd-hit software (Fu et al. 2012). This first approach resulted in 2187 gene-predictions. In order to improve the dataset, a second evidence was generated by ab initio gene prediction. Augustus was trained with complete gene sets to perform ab initio annotation with P. falciparum 3D7, B. bovis T2Bo, and T. annulata Ankara as model organisms, retrieved from EuPathDB release 26 (Aurrecoechea et al. 2010). Non-overlapping sequences to the first evidence were generated from the gfffiles with BEDTools genome arithmetic software (Quinlan 2014). This second approach resulted in additional 690 gene-predictions. Third dataset was based on Trinity-assembled de novo transcriptome transcript evidence and curation of the hits. The non-overlapping sequences to the first two sources of evidence resulted in additional 590 gene-predictions.

Analysis of genome completeness
We initially used CEGMA (Parra et al. 2007)

RNA preparation, wet lab transcriptomics, and transcriptomic data analysis
Host-contamination depleted B. canis blood stages were rapidly re-suspended in Trizol

Functional annotation and characterization
Functional annotation of the gene-predictions was achieved by gene ontology mapping  (Aurrecoechea et al. 2010).

Prediction of exported gene models
In a first stage, prediction of the B. canis exportome was based on different in-silico analysis tools, databases, and manual curation. Criteria for selection in the predicted exportome were based on the prediction of a canonical hydrophobic N-terminal signal peptide (cSP) by SignalP v3.0 (Bendtsen et al. 2004b) using conservative D-score discrimination of 0.46, prediction of transmembrane (TM) domain by TMHMM v2.0 (Krogh et al. 2001) using default settings, prediction of previously described and known apicomplexan secretory protein families using Markov-based method Pfam (Finn et al. 2014), functional gene ontology and pBlast data for protein homology detection to other apicomplexan (Pelle et al. 2015). Additionally, prediction of potential leaderless protein secretion was based by ab initio prediction of non-classical triggered protein secretion by SecretomP 1.0f using conservative NN-score cut-off of 0.6 (Bendtsen et al. 2004a). Prediction of GPI-anchors was performed using PredGPI server (Pierleoni et al. 2008) with the conservative model and an inclusion above 99% prediction probability. In silico GPI prediction without TM helices prediction were ignored and grouped according other evidences due to overestimation of GPI anchors (Pierleoni et al. 2008).
Transmembrane domain-containing proteins were further analyzed for sub-cellular direction to the parasite (plasma) membrane based on ProtComp 9.0.
On a second stage, proteins witch could not be assigned to any known apicomplexan secretory protein family and with an explicit prediction of subcellular targeting were assigned to its subcellular direction (and eliminated from the list of potential host-targeting proteins).
Apicoblast targeting was based on local ApicoAP v2.7.3 (Cilingir et al. 2012) based on SignalP 3.0 predicted signal peptides and B. bovis or B. microti as model organism. Mitochondrial targeting was predicted with MitPred v2.0 (Kumar et al. 2006) curated with TargetP v1.1 (Emanuelsson et al. 2000) prediction and pBLAST evidence. Nucleus targeting was based on NucPred (Brameier et al. 2007). Ambiguous results were cross-validated with the ApiLocDB v3 whenever possible. Subcellular location was further validated by combination of TargetP and ProtComp 9.0 results. Finally, proteins which contain only a TM-domain and could not be assigned to a known host-targeting protein family were not considered to target the host, as these proteins are likely to be retained within the parasite. This approach resulted in a list of proteins targeting the parasite surface, iRBC membrane, and which are potentially secreted to the host cell cytoplasm or beyond.

Samples for proteomic studies
Secreted soluble B. canis proteins. Short term in-vitro cultures of B. canis were performed from anticoagulated whole blood (12.3% v/v citrate-phosphate-dextrose-adenine; CPDA-1) from experimental inoculated dogs and kept always at 37°C. After removing plasma, blood samples were washed and passed twice through a cellulose powder column (medium fibre powder; Sigma-Aldrich) to deplete host leukocytes. Subsequently, parasites were cultured for 18 hours in serum-free medium based on RPMI-1640 (Sigma-Aldrich), 25 mM HEPES (Sigma-Aldrich), 25 mM NaHCO3, 2 mM L-glutamine (Sigma-Aldrich) and 100 µg/ml neomycine (Sigma-Aldrich). Each culture consisted of a total volume of 10 ml with 5% packed cell volume (approx. 1% parasitaemia) in 90 mm petri dishes (Thermo Fisher) at 37 °C, 4% CO2 and 3% O2 cultured in a modular incubator chamber (MIC-101; Billups-Rottenberg). The medium containing the secreted soluble Babesia proteins was collected, centrifuged (2,000 x g for 10 min) and the supernatant concentrated with an Amicon filtration unit and PLGC (10,000 NMWL) membranes (both Merck Millipore, Darmstadt, Germany). Concentrated antigens from the two different media were stored at -80 °C until used. Pooled samples from the three dogs were used for further studies. Similar, blood from a healthy canine blood donor was cultures and supernatant was processed as a negative control.
Infected host cell membrane bound B. canis proteins. CPDA-1 anticoagulated, leucocytedepleted blood was washed three times with ice cold RPMI-1640 and iRBC were concentrated by a Percoll gradient (GE Healthcare) centrifugation (Figueroa et al. 1990). Parasites were then released from the iRBC by streptolysin-O mediated lysis (Richier et al. 2006) and removed by another Percoll gradient centrifugation step (Rodriguez et al. 1986 cycle 20%, output control 2.0) on ice and proteins were precipitated in chilled acetone and lipids were removed. Ghost proteins were stored in homogenization buffer at -80°C until used.
The same protocol was used to collect ghost proteins from uninfected erythrocytes. 180 µm x 20 mm column (Waters Corporation) and separated on a BEH300 C18, 1.7 um, 75 µm x 150 mm column (Waters Corporation) using a gradient formed between solvent A (0.1% formic acid in water) and solvent B (0.1% formic acid in acetonitrile). The gradient started at 1% solvent B and the concentration of solvent B was increased to 40% within 60 minutes.

Mass spectrometry and protein identification
Following peptide data acquisition, different database searches were performed using the MASCOT search program against a database build on the B. canis annotated genes with a concatenated decoy database supplemented with commonly observed contaminants and the Swissprot database to increase database size, or against NCBI database for Canis lupus familiaris. The identified hits were then loaded onto the Scaffold Viewer version 4 (Proteome Software, Portland, US) and filtered based on relaxed or high stringency parameters. Relaxed stringency parameters correspond to a minimal mascot score of 20 for peptide probability, a protein probability greater than 50%, and a minimum of 1 unique peptide per protein. High stringent parameters were set to reach a corresponding false discovery rate (peptide and protein) <5% at 90% for peptide and protein probability, and 2 and 1 unique peptides per -14 -protein for the secreted soluble and membrane-bound proteins, respectively. All datasets were corrected for obvious host-and environmental contaminants by an additional pBLAST search with the single peptide hits.