In vivo gene expression analyses provide unique insights on P. vivax gametocytogenesis and chloroquine response

Studies of gene expression have provided insights on the regulation of Plasmodium parasites. However, few studies have targeted P. vivax, the cause of one third of all human malaria cases outside Africa, due to the lack of in vitro culture system and the difficulties associated with studying clinical samples. Here, we describe robust RNA-seq profiles of P. vivax parasites generated directly from infected patient blood. Gene expression deconvolution analysis reveals that most parasite mRNAs derive from trophozoites and that the asynchronicity of P. vivax infections is unlikely to confound gene expression studies. We also show that gametocyte genes form two clusters of co-regulated genes, possibly indicating the independent regulation of male and female gametocytogeneses. Finally, despite a large effect on parasitemia, we find that chloroquine does not alter trophozoite gene expression. Overall, our study highlights the biological knowledge that can be gathered by directly studying P. vivax patient infections. Importance Plasmodium vivax is the second most common cause of human malaria worldwide but, since it cannot be cultured in the laboratory, its biology remains poorly understood. In this study, we describe the analysis of the parasite gene expression profiles generated from 26 patient infections. We show that the proportion of male and female parasites varies greatly among infections, suggesting that they are independently regulated. We also compare the gene expression profiles of the same infections before and after treatment with chloroquine, a common antimalarial, and show that the drug efficiently kills most P. vivax parasites but appears to have little effect on one specific parasite stage, the trophozoites, in contrast with the effect of the drug on P. falciparum. Overall, our study exemplifies the biological insights that can be gained from applying modern genomic tools to study this difficult human pathogen.

chloroquine does not alter trophozoite gene expression. Overall, our study highlights the 23 biological knowledge that can be gathered by directly studying P. vivax patient infections. 24 25 Importance 26 Plasmodium vivax is the second most common cause of human malaria worldwide but, since it 27 cannot be cultured in the laboratory, its biology remains poorly understood. In this study, we 28 describe the analysis of the parasite gene expression profiles generated from 26 patient 29 infections. We show that the proportion of male and female parasites varies greatly among 30 infections, suggesting that they are independently regulated. We also compare the gene 31 expression profiles of the same infections before and after treatment with chloroquine, a 32 common antimalarial, and show that the drug efficiently kills most P. vivax parasites but appears 33 to have little effect on one specific parasite stage, the trophozoites, in contrast with the effect of 34

Introduction 38
Plasmodium vivax is the most widespread human malaria parasite, responsible for more than 8.5 39 million clinical malaria cases worldwide in 2016 and threatening more than 2 billion people in 90 40 countries [1,2]. Unfortunately, our understanding of the biology of this important human 41 pathogen remains limited and lags behind that of P. falciparum, primarily due to our inability to 42 continuously propagate P. vivax parasites in vitro [3]. Most studies of P. vivax thus rely on infected 43 blood samples, which complicates biological and molecular investigations due to the 44 polyclonality of most P. vivax infections, the concurrent presence of different parasite stages 45 (with their specific regulatory programs and responses), and the abundance of host molecules 46 that hamper genomic studies. As a result, studies of P. vivax transcriptomes, which could provide 47 unique insights on the biology of this parasite, have been few and far between, and limited to 48 parasites grown in short-term ex vivo cultures [4][5][6]. We therefore still have a very limited 49 understanding of the patterns of P. vivax gene expression during clinical infections or of their 50 changes upon antimalarial drug treatment. 51 Here, we expand on previous analyses demonstrating that RNA-seq data could be generated 52 directly from P. vivax-infected patients [7] and characterize the transcriptomes of 26 clinical P. 53 vivax isolates obtained from Cambodian patients enrolled in a chloroquine efficacy study ([8], 54 Popovici et al., under review). First, we describe variations in parasite gene expression among 55 infected patients and identify novel potential gametocyte markers. Second, we compare the gene 56 We extracted RNA from all blood samples stored in Trizol using the Zymo Direct-zol kit with an 78 in-column DNAse step and eluted RNA into 20 µL of water. We then prepared Illumina stranded 79 libraries after ribosomal RNA and globin mRNA reduction [7] and sequenced them on a HiSeq 80 2500 to generate a total of 1.2 billion paired-end reads of 50 bp (Supplemental Table S1). 81 82

Read alignment 83
We aligned all reads, first to the human genome (Hg38), then to the P01 P. vivax genome 84 (PlasmoDB P01 34 [9]) using Tophat2 [10] with the following parameters: -g 1 (to assign each 85 read to a single location), -I 5000 (maximum intron length), -library-type fr-firststrand (for 86 stranded libraries). We removed potential PCR duplicates using the samtools rmdup function. For 87 each annotated P. vivax gene and each sample, we determined the number of reads overlapping 88 any exon using custom perl scripts [7]. We then transformed the raw counts into normalized 89 transcripts per million (tpm) by dividing each gene count by the gene length (in kb) and by the 90 sum of these values (in millions) in each sample. For further analyses, we only considered 91 annotated P. vivax genes with more than 20 transcripts per million (tpm) in at least 10 samples, 92 resulting in 4,999 genes analyzed. 93

Gene expression deconvolution 95
To determine the proportion of mRNA molecules derived from parasites at each developmental 96 stage in each sample, we performed gene expression deconvolution using Cibersort [11]. We 97 retrieved stage-specific P. falciparum RNA-seq data [12] and converted P. falciparum to P. vivax 98 gene names using PlasmoDB, excluding all genes with no orthologs or multiple orthologs, leaving 99 a total of 2,901 genes used for gene expression deconvolution. We then ran Cibersort with 100 100 permutations and quantile normalization disabled. 101 102 Gene co-expression analysis 103 To examine the gene expression of gametocytes, we first retrieved putative gametocyte genes 104 from the literature and determined the Pearson correlation between the gene expression level 105 of each pair of genes across all samples. We then grouped the genes using unsupervised 106 hierarchical clustering and assessed the significance of the clusters using pvclust [13]. 107 We also identified putative novel gametocyte genes by identifying genes whose expression was 108 statistically correlated with one of the known gametocyte genes (with Pearson's R >0.8 and 109 p<0.01). 110

Determination of gene expression profile similarity 112
To assess the overall effect of the first dose of chloroquine on the parasite gene expression 113 profiles, we compared the mean difference in gene expression across all genes between paired 114 samples to the mean difference observed between randomly paired samples. We first calculated 115 the pair-wise normalized gene expression differences for each pair of samples (i.e., before and 116 after chloroquine) across all expressed genes using the following formula: 117 where Xi,0 stands for the normalized gene expression (in tpm) of the sample i at time 0 for gene 119 X. We then summed all pair-wise differences across all pairs of samples and compared this 120 number to the sum of the pair-wise differences obtained by randomly pairing samples (n=100) 121 (i.e., Xi,0-Xj,1 where i and j are different patients). 122

123
Identification of genes differentially expressed 124 We also determined which P. vivax genes were significantly affected by chloroquine by testing 125 for differential gene expression using EdgeR and paired analyses [14]. To account for the large 126 difference in the number of reads obtained from P. vivax mRNAs before and after chloroquine 127 treatment, we randomly subsampled all pre-treatment datasets to the same number of reads as 128 in their corresponding post-treatment datasets. Due to the lower read counts in samples after 129 treatment and the subsampling, only 4,280 genes remained expressed above the threshold 130 described above and were included in the analysis of the effect of chloroquine on parasite gene 131 expression. All analyses were corrected for multiple testing using false discovery rates [15,16]. 132 133

Variations in P. vivax gene expression among clinical infections 135
We extracted RNA from ~50 µL of whole blood collected from 26 Cambodian individuals seeking 136 treatment for vivax malaria. All patients were tested positive for Plasmodium vivax by RDT and 137 blood smears, and PCR confirmed that they were solely infected with P. vivax. We prepared and 138 sequenced a stranded RNA-seq library from each blood sample after globin and rRNA reduction 139 as previously described [7]. After removing reads originating from the human genome, we 140 aligned all reads to the most recent P. vivax genome sequence [9]. The percentage of reads 141 originating from P. vivax transcripts varied greatly among samples, from 1.09% to 38.78% (mean: 142 16.72%), with only a moderate correlation with the samples' parasitemia (Pearson's R=0.37,143 p=0.06, Supplemental Figure 1). Overall,253,491,854 reads aligned to the P. vivax 144 genome and 20 of the 26 samples yielded more than one million P. vivax reads (Supplemental 145 Table S1). Out of the 6,823 annotated P. vivax genes, 4,999 were deemed expressed in at least 146 ten patients and were further analyzed.  Table 1). The gene expression of these 21 predicted gametocyte genes were highly correlated 164 among samples but, interestingly, clustered into two distinct subsets poorly correlated with each 165 other ( Figure 2B). Thus, Pvs47, Pvs48/45, Hap2, the gamete egress and sporozoite traversal 166 protein, s16, and 3 CPW-WPC proteins were all positively correlated with each other (Cluster A 167 in Figure 2B, Pearson's R=0.15-0.96, p<0.01), while Pvs25, ULG8, the gametocyte associated 168 protein, gametocyte developmental protein 1, guanylate kinase, HMGB1, and 5 CPW-WPC 169 proteins all clustered into a second group (Cluster B, Pearson's R=0.05-0.86, p<0.01). Only genes 170 belonging to this second group (Cluster B) had their expression correlated with the 171 gametocytemia determined by microscopy (Pearson's R>0.5, p<0.01, Table 1). To expand our 172 knowledge of P. vivax genes possibly expressed in gametocytes, we then searched for additional 173 genes whose expression correlated with any of these gametocyte genes. Overall, the expression 174 of 1,613 genes were correlated with that of at least one of these known gametocyte genes 175 (Pearson's R>0.8, p<0.01) and could represent putative novel gametocyte genes. These included 176 460 genes correlated with members of Cluster A and 1,153 correlated with genes in Cluster B 177 (Supplemental Table S2). Gene ontology analyses showed that genes whose expression 178 correlated with those of gametocyte genes from Cluster A were enriched in biological processes 179 such as microtubule related genes, including dynein, kinesin, and tubulin. By contrast, genes 180 associated with intracellular trafficking and histone remodeling were overrepresented in Cluster 181

B. 182 183
Effect of chloroquine exposure on P. vivax transcriptome 184 To assess how P. vivax parasites respond to exposure to chloroquine, we compared the 185 expression profiles of the parasites collected, from the same 20 infections, before and eight hours 186 after the first dose of chloroquine treatment. Consistent with the parasite clearance induced by 187 chloroquine, we observed that the proportion of RNA-seq reads aligned to the P. vivax genome 188 decreased after chloroquine treatment ( Figure 3A). However, while the parasitemia measured 189 by microscopy only decreased, on average, by 36% eight hours after the administration of the 190 first dose of chloroquine, the proportion of P. vivax reads decreased by more than 72%. Gene 191 expression deconvolution analyses indicated that, eight hours after treatment, most parasite 192 RNAs (>95%) still primarily derived from trophozoites. While it did not separate the samples collected before and after chloroquine, PC1 was statistically 206 associated with chloroquine treatment in paired analyses (p=0.002, Supplemental Figure S3) 207 suggesting that parasite gene expression was influenced by the treatment. We therefore tested 208 the effect of chloroquine exposure on each annotated P. vivax gene. Out of 4,280 genes tested, 209 293 genes (195 downregulated and 98 upregulated) showed a statistically significant change in 210 expression following chloroquine treatment (FDR<0.1, Figure 3C). Interestingly, a large number 211 of exported protein genes (including 23 PHIST genes) and PIR genes were significantly 212 downregulated after treatment, as were many genes involved in erythrocyte invasion (e.g., 213 AMA1, DBP, MSP5, RBP2a, RBP2e or RBP3) (Supplemental Table S3). Genes upregulated included 214 ribosomal RNAs and histones (Supplemental Table S3). None of the candidate genes suspected 215 to be associated with chloroquine susceptibility in P. vivax or P. falciparum showed significant 216 changes in expression: for example, the chloroquine resistance transporter gene (CRT) showed 217 only a modest, non-significant, decrease in expression (log2 fold change = -0.47, FDR=0.42) as did 218 the multidrug resistance gene (MDR1) (log2 fold change = -1.48, FDR=0.27). Since we previously 219 showed that both CRT and MDR1 could be spliced into multiple isoforms in P. vivax [7], we also 220 tested whether the transcription of different isoforms was associated with chloroquine 221 susceptibility. In samples pre-treatment, we did not observe any association between the 222 isoforms expressed and the subsequent response to chloroquine: retention of the CRT intron 9, 223 that leads to a predicted early stop codon, was highly variable among infections (Supplementary 224

RNA-sequencing has provided invaluable insights on some of the fundamental characteristics of 232
Plasmodium gene expression and, for example, highlighted the complexity of the parasite 233 transcriptome with its extensive noncoding transcripts, long 5' and 3' untranslated regions 234 (UTRs), and, often, multiple isoforms per gene [4,7,[18][19][20]. Additionally, gene expression studies 235 have been extremely useful to identify transcriptional differences between parasite stages [5, 6, 236 12, 21, 22], in response to antimalarial drugs [23][24][25][26]  studying asynchronous infections and indicates that analyses of parasite RNA-seq profiles 252 generated directly from infected patient blood are unlikely to be confounded by stage differences 253 (and could be further controlled by using the gene expression deconvolution results as covariates 254 in the analyses). However, these results also imply that differences in gene regulation occurring 255 in rings or schizonts will be more difficult to detect from whole blood RNA-seq and may require 256 other approaches, such as profiling after short-term cultures or single-cell RNA-seq [32,33]. In 257 addition, we should caution that gene expression deconvolution relies on the gene expression 258 patterns of reference datasets generated from highly synchronized P. falciparum in vitro cultures 259 and that these samples might not fully recapitulate the profiles of parasites in vivo [27] or the 260 specificities of P. vivax development (though these limitations could be alleviated in the future 261 by using gene expression profiles generated by single-cell RNA-seq analyses of patient infections). 262 The dataset used for gene expression deconvolution did not include profiles from gametocytes 263 and these were therefore not considered in this analysis. However, we observed that the 264 expression levels of putative gametocyte markers were highly correlated with each other across 265 infections but, surprisingly, clustered in two distinct groups. Our co-expression analyses revealed 266 novel putative gametocyte genes and indicated that microtubule associated proteins were 267 overrepresented among the genes from one cluster, while intracellular trafficking and histone 268 remodeling genes were overrepresented in the second cluster. One explanation for these 269 observations is that the Cluster A (that includes hap2) represented genes expressed in male 270 gametocytes while Cluster B (that includes Pvs25) corresponded to female gametocyte genes 271 (Table 1). This hypothesis is consistent with the observation that gametocytemia (determined by 272 microscopy) was only correlated with the expression of genes from Cluster B since female 273 gametocytes are much more abundant and easier to detect by microscopy [34,35]. These data 274 indicate that not only the proportions of gametocytes vary across infections but, more 275 importantly, the lack of correlation between the two gene clusters suggests that the ratio of male 276 to female gametocytes differs among infections. This interpretation is exciting as it would imply 277 that the male and female gametocytogeneses are two distinct and independently-regulated 278 processes, which would provide a mechanism for the parasites to reduce the probability of self-279 fertilization in mosquito by isolating, perhaps temporally, the generation of gametes of different 280 sexes by a single clone. Note that by identifying new genes that are potentially highly expressed 281 in male and female P. vivax gametocytes (Supplemental Table S2), our findings will also facilitate 282 further investigations of this hypothesis using additional clinical samples. 283 We also explored how exposure to chloroquine affects the regulation of P. vivax gene expression 284 in vivo by sequencing RNA from parasites collected eight hours after the first dose of chloroquine 285 treatment. As expected, we observed a large decrease in the proportion of reads aligned to the 286 P. vivax genome when compared to samples pre-treatment. However, the magnitude of this 287 decrease was significantly greater than the decrease in parasitemia measured by microscopy and 288 might reflect the inclusion, in the microscopy counts, of dead or metabolically inactive parasites, 289 resulting in an overestimation of the parasitemia after chloroquine treatment. This hypothesis, 290 that will need to be validated in future analyses, could suggest that clearance rates, typically 291 determined for antimalarial drugs by microscopy, might be underestimated due to the 292 confounding rate at which dead parasites are cleared from the circulation. In striking contrast 293 with the important decrease in the proportion of parasite mRNA after treatment, we noted few 294 qualitative changes in parasite gene expression. Only a handful of genes were differentially 295 expressed after drug treatment and the overall profiles of gene expression remained unchanged 296 between infections. One possible explanation for this discrepancy is that the trophozoites, who 297 contribute the vast majority of the parasite transcripts (Figure 1), are unaffected by chloroquine 298 while the other blood stages are rapidly eliminated, leading to less "new" trophozoites eight 299 hours after treatment and therefore less total parasite RNAs. (Note that all P. vivax infections 300 included in this study were efficiently cleared by chloroquine and that we did not observe any 301 evidence of drug resistance in these parasites (Popovici et al., under review)). This hypothesis is 302 consistent with the results of ex vivo studies that showed that P. vivax trophozoites, in contrast 303 to P. falciparum trophozoites, are insensitive to chloroquine [36]. In this regard, it is also 304 interesting to note that the decrease in the expression level of invasion genes and exported 305 proteins after treatment could reflect a lower proportion of schizonts after treatment. 306 Alternatively, this decrease in the expression of invasion genes could reflect a change in the 307 strategy of the surviving parasites [37], preferentially opting for sexual commitment rather than 308 asexual multiplication. 309 310

Conclusion 311
Our study demonstrates that robust gene expression profiles can be generated directly from 312 blood samples of vivax malaria patients and that stage composition differences between 313 infections are unlikely to confound gene expression studies since mRNAs overwhelmingly derive 314 from trophozoites. Our analyses also reveal that the production of male and female gametocytes 315 are not correlated with each other, and that the gametocyte sex ratio varies significantly among 316 infections, providing a possible mechanism for the parasite to reduce self-fertilization. Finally, we 317 showed that chloroquine efficiently cleared most blood stage parasites but had little effect on 318 trophozoites. Overall, our study highlights the biological knowledge that can be obtained from 319 studying gene expression profiles of P. vivax clinical infections and provides a promising 320 framework to better understand this important human pathogen. 321

Funding information 322
This work was funded by a National Institutes of Health -NIAID award to DS (R01 AI103228). 323 Additional support was provided by a grant from the Institut Pasteur to DM (PTR 2014-490