Introduction

Despite integrated control efforts, there were an estimated 247 million malaria cases and 619,000 deaths in 2021, for which Plasmodium falciparum was the main causative agent1,2. The emergence and spread of drug-resistant parasites and insecticide-resistant mosquitoes have impeded progress toward sustainable reduction of morbidity and disease elimination in several endemic areas. As an alternative strategy, vaccination could be an important adjunct to malaria control. To date, remarkable progress has been made on vaccines against pre-erythrocytic stages, which target the infective sporozoites and probably the liver-stage parasites, whereas multiple merozoite antigens have been explored as candidates for vaccines against asexual blood stages3.

The ~ 80 kDa glutamic acid-rich protein of P. falciparum (PfGARP) is highly expressed during trophozoite development4 and is detectable during schizogony of intraerythrocytic parasites5. The gene encoding PfGARP of the FC27 strain contains 2248 bp, characterized by a short 5′-exon encoding a signal peptide followed by a 214 bp intron and a second exon spanning 653 codons6. PfGARP is composed entirely of intrinsically disordered structure and repetitive low complexity sequences in which glutamic acid, lysine and aspartic acid constitute over half of all amino acid residues in the protein. Four complex repeat-containing regions, three of which were rich in lysine residues, and two homopolymeric glutamic acid repeats have been identified in this protein6. It has been shown that the lysine-rich repeats in PfGARP account for an indispensable module for targeting the protein to the periphery of the infected erythrocyte. Furthermore, in vitro mutagenesis has revealed that the length of the lysine-rich repeats in PfGARP is crucial for peripheral targeting efficiency7. Meanwhile, variation in length of lysine-rich repeat regions occurred in several laboratory strains of P. falciparum including 3D7, Dd2, HB3, IT and 7G8 strains7. Domain mapping of PfGARP has identified an immunogenic lysine-rich repeat region as a secreted ligand capable of binding to an ectodomain of erythrocyte band 3, an anion-exchanger in the red cell membrane, as a host receptor5.

PfGARP-derived synthetic peptides containing the erythrocyte-binding repeats conferred aggregation of erythrocytes akin to rosette formation, a phenomenon contributing to microvascular obstruction during the pathogenesis of complicated malaria5. Meanwhile, mouse anti-PfGARP antibody elicited significant inhibition of parasite growth in vitro. Consistently, anti-PfGARP antibodies purified from pooled plasma of Tanzanian adults could remarkably halt parasite growth in culture. Aotus monkeys immunized with PfGARP-derived vaccines were protected against high parasitemia and severe anemia4. Anti-PfGARP antibodies per se could mediate parasite killing by triggering programmed cell death in the asexual blood-stage parasites. Tanzanian children who mounted anti-PfGARP antibody responses upon natural infections had lower risk of severe malaria than those without detectable antibodies. Likewise, the levels of parasitemia in Kenyan adolescents and adults inversely associated with the magnitude of natural anti-PfGARP antibody responses4. Therefore, PfGARP is a promising target for anti-disease vaccine while it is also considered to be a potential marker for disease progression3,4,8.

Antigenic polymorphisms in malarial vaccine candidates could hinder effective vaccine design if protective immunity is predominantly strain-specific9. Although it has been suggested that PfGARP exhibits meager genetic diversity, that conclusion was drawn mainly from whole genome sequence data, in which variation in the repetitive sequences requires further elucidation4,10. Herein, we analyzed the nucleotide sequences of this locus among P. falciparum populations from four major malaria endemic areas of Thailand. Results revealed limited sequence variation in non-repeat regions of PfGARP among Thai and global isolates, whereas differential diversity was observed in repeat domains. In addition to the four previously identified repeat-encoding domains and two homopolymeric glutamic acid repeat regions6, two additional regions have been newly recognized to possess repetitive sequences. Furthermore, parasite genetic structure and in silico prediction of immunogenic epitopes in PfGARP have been analyzed.

Results

Genetic diversity and structural organization of PfGARP

The PfGARP sequences were successfully obtained from all 80 isolates which revealed clear and non-superposed signals on electropherograms. Size variation in PfGARP was observed among Thai isolates, ranging from 2179 to 2284 bp. In total, 26 alleles of PfGARP were identified among Thai isolates whereas an isolate (MDCU32) from a Guinean patient analyzed in this study had a different sequence. Likewise, size and sequence variation were also observed among previously reported sequences of this locus among isolates from other malaria endemic areas (n = 18) including African, Indochina, South American and Western Pacific countries, all of which possessed distinct sequences with size variation from 2209 to 2266 bp (Supplemental Table S1). Together with previously reported complete coding sequences, 44 haplotypes were identified (Table 1). Of these, 26 haplotypes were found among 80 Thai isolates (Supplemental Fig. S1) in which the numbers of haplotypes and haplotype diversity of isolates from Tak, Ubon Ratchathani and Chanthaburi Provinces were remarkably higher than those of Yala Province (Table 1). Likewise, nucleotide diversity of P. falciparum population from Yala Province was significantly lower than those of other endemic areas (Table 1). Based on available 99 complete coding sequences, PfGARP can be divided into 13 blocks consisting of five non-repeat and eight repeat-containing regions (Fig. 1).
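
The two summary statistics reported in Table 1 can be illustrated with a minimal sketch. It assumes aligned sequences of equal length given as plain strings, computes nucleotide diversity as the mean proportion of differing sites over all pairs, and computes haplotype diversity with Nei's estimator. The function names and the toy sequences are illustrative only; the values in this study were obtained with MEGA and DnaSP, whose handling of gaps and standard errors is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (pi) among aligned sequences.

    Sites where either sequence of a pair carries a gap ('-') are skipped.
    Requires at least two sequences.
    """
    total = 0.0
    pairs = 0
    for a, b in combinations(seqs, 2):
        compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if compared:
            total += sum(x != y for x, y in compared) / len(compared)
        pairs += 1
    return total / pairs


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies).

    Whole strings are compared, so sequences differing only by gaps
    count as different haplotypes.
    """
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


# Toy alignment (hypothetical, not PfGARP data)
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
pi = nucleotide_diversity(toy)
hd = haplotype_diversity(toy)
```

In this sketch a population with few haplotypes, such as the one described for Yala Province, would yield a low value for both statistics.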

Table 1 Haplotype and nucleotide diversity of PfGARP among Thai and global isolates.
Figure 1

Schematic representation of PfGARP. Exons are shown as boxes and an intron as a dense line. Boxes are numbered above the scheme. Exons are characterized by conserved, repeat and homopolymeric glutamic acid regions. The amino acid position at the end of each block/domain is indicated beneath the scheme. The nucleotide positions for boxes Ia, Ib and II to XIII are 1–75, 290–571, 572–706, 707–997, 998–1215, 1216–1327, 1328–1462, 1463–1537, 1538–1855, 1856–1939, 1940–2026, 2027–2173, respectively (positions corresponding to coding sequence of the FC27 strain, GenBank accession no. J03998).

Exon I and intron

The 75-bp coding region in exon I exhibited perfect sequence identity among Thai and worldwide isolates (block 1a in Fig. 1). The adjacent intron region displayed two variants due to short insertion/deletion of TA residues: one possessed 214 bp and the other contained 216 bp. The former was more prevalent among Thai and worldwide isolates accounting for 80% and 83.3%, respectively.

Non-repeat regions in exon II

The non-repetitive sequences in exon II were highly conserved, containing two synonymous substitutions, c. 439A>G (E75) and c. 502A>T (I96), in block 1b and one synonymous substitution, c. 2248C>T (I678), in block 13 (positions corresponding to coding sequence of the FC27 strain, GenBank accession no. J03998) (Fig. 1). Five nonsynonymous substitutions occurred in conserved block 3 among non-Thai isolates: c. 707A>G (K165E), c. 781G>T (D193Y), c. 852C>T (P213L), c. 854T>G (Y214D) and c. 861A>G (Y216C). The distribution of these single nucleotide polymorphisms (SNPs) among isolates is shown in Supplemental Table S2. The remaining non-repeat regions, blocks 5 and 8, were perfectly conserved among Thai and worldwide isolates.

Repeat domain I (block 2)

Besides the four previously known repeat blocks, two additional repeat regions have been identified in PfGARP using the Tandem Repeats Finder program. Herein, these domains are designated repeat domains I-VI (RI-RVI), in which the previously reported repeat sequence motifs in blocks 1–4 correspond to domains RI, RIII, RIV and RVI, respectively6. Analysis of 80 Thai and 18 worldwide isolates has shown that domain RI could be assigned to 13 alleles, characterized by a KKX motif where X is D, K, E or H as previously described6. The tripeptide repeats in this domain varied from 12 to 19 units. Alleles in all repeat-containing regions were designated according to the number of amino acid residues. When different sequences contained an identical number of amino acids, alleles were further subdivided by appending a letter to the number to indicate variants; thereby, new alleles could be included in alphabetical order. Of these, eight alleles occurred in Thai isolates, which included RI-57, RI-51A, RI-51B, RI-51C, RI-48, RI-45A and RI-45B. The RI-45A allele was most prevalent and seemed to circulate across endemic provinces in Thailand (Table 2).
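
The allele designation rule described above (domain name, number of amino acid residues, and a letter only when several distinct sequences share the same length) can be sketched as follows. The function name, the first-seen ordering of letters and the toy peptide sequences are assumptions for illustration; they are not the actual PfGARP alleles.

```python
from collections import defaultdict
from string import ascii_uppercase


def designate_alleles(domain, sequences):
    """Map each distinct amino acid sequence to a name such as 'RI-45A'.

    Alleles are named by residue count; a letter is appended only when
    more than one distinct sequence has that length. Letters follow the
    order in which variants are first seen (an assumption of this sketch),
    and at most 26 variants per length are supported.
    """
    by_length = defaultdict(list)
    for seq in sequences:
        if seq not in by_length[len(seq)]:
            by_length[len(seq)].append(seq)  # keep first-seen order
    names = {}
    for length, variants in by_length.items():
        for i, seq in enumerate(variants):
            suffix = ascii_uppercase[i] if len(variants) > 1 else ""
            names[seq] = f"{domain}-{length}{suffix}"
    return names


# Toy KKX-type repeats (hypothetical sequences)
names = designate_alleles("RI", ["KKDKKE", "KKEKKD", "KKDKKE", "KKDKKEKKH"])
```

Because letters are assigned in order of appearance, a newly found variant of an existing length receives the next letter without renaming earlier alleles, which matches the stated intent that new alleles can be added in alphabetical order.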

Table 2 Diversity and distribution of repeat alleles in block II (repeat domain I) of PfGARP.

Repeat domain II (block 4)

Repeat domain II has been newly identified in this study, characterized by two copies of degenerate 33-codons encoding KKERKQKEKEMKE(or K)QE(or K) KIEKK(or E)K(or R)KKQ(or K)EEKEKKKQ(or K)E (or K) intervened by a short region encoding KERKKQE. The sequence of repeat domain II exhibited sequence conservation except a deletion of the last two lysine residues in all Thai and most worldwide isolates (Supplemental Table S3).

Repeat domain III (block 6)

Repeat domain III was characterized by degenerate pentapeptide motifs of the form E(or G/K)EH(or D)K(or E/K)E(or K/S), in which the repeats comprised EEHKE, GEHKE, GEDKE, GEHKK, EEHKK, GEHEE, EEHKS, GEHKS and KEHKE. In total, 19 alleles were identified, 11 of which were found in Thailand. Allele RIII-30 was most common and could be detected in all isolates from Yala Province, followed by alleles RIII-45A and RIII-35A, whereas the isolate MDCU32 had a unique sequence (Table 3).

Table 3 Diversity and distribution of repeat alleles in block 6 (repeat domain III) of PfGARP.

Repeat domain IV (block 7)

It has been shown that repeat domain IV (RIV) of PfGARP is a parasite ligand for human erythrocyte band 3 that could contribute to cytoadherence during asexual blood stage development of P. falciparum5. Although RIV was located adjacent to RIII, their sequences differed: RIV comprised 5 copies of the degenerate pentapeptide repeat KGKKX, where X was D, K, E or H as previously noted6. Analysis of Thai and worldwide isolates has shown perfect sequence identity in this domain, resulting in a single haplotype.

Repeat domain V (block 9)

The newly recognized repeat domain V (RV) was characterized by imperfect repeats encoding KEVE(or Q)EE(or gap)S(or gap), flanked by EEDKKEES and DEEEVEED at the N- and C-termini of this domain, respectively. Five alleles have been identified in which the C-terminal sequence of allele RV-23 had a deletion of five codons encoding EEVEE. Of these, four alleles have been detected among Thai isolates (Table 4).

Table 4 Diversity and distribution of repeat alleles in block 9 (repeat domain V) of PfGARP.

Repeat domain VI (block 11)

Repeat domain VI (RVI) contained degenerate heptapeptide repeats consisting of E(or D)E(or D)E(or D)XE(or D)E(or D)E(or D), where X is A, V, D, E or a gap, followed by (E)n(D)m residues, where n ranged from 1 to 5 and m from 1 to 3, respectively6. This repeat domain was the most polymorphic region in PfGARP, containing 27 alleles; 13 of these occurred among Thai isolates (Table 5). The isolate MDCU32 shared the same allele of this domain with the strain TG01 sequence from Togo (GenBank accession no. LR131450).

Table 5 Diversity and distribution of repeat alleles in block 11 (repeat domain VI) of PfGARP.

Homopolymeric glutamic acid repeats

Size variation was observed in homopolymeric glutamic acid repeats of domains E1 and E2, corresponding to blocks 10 and 12, respectively. The E1 domain contained 16 to 29 codons, characterized by interruption of perfect GAA repeats by GAG triplets. The E2 domain consisted of uninterrupted perfect GAA repeats with length variation from 5 to 11 codons (Supplemental Table S4).
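
The distinction between the interrupted E1 tract and the uninterrupted E2 tract can be made concrete with a short sketch that counts codons, GAG interruptions and the longest run of consecutive GAA codons in an in-frame DNA string. The function name and the example tract are hypothetical and do not reproduce any sequence from this study.

```python
def glutamate_repeat_profile(cds):
    """Summarise a homopolymeric glutamic acid tract given as in-frame DNA.

    Returns the number of codons, the number of GAG codons interrupting
    the GAA repeats, and the longest run of consecutive GAA codons.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    if any(c not in ("GAA", "GAG") for c in codons):
        raise ValueError("tract contains a codon other than GAA or GAG")
    longest = run = 0
    for c in codons:
        run = run + 1 if c == "GAA" else 0
        longest = max(longest, run)
    return {
        "codons": len(codons),
        "GAG_interruptions": codons.count("GAG"),
        "longest_GAA_run": longest,
    }


# Hypothetical E1-like tract: GAA repeats interrupted twice by GAG
profile = glutamate_repeat_profile("GAAGAAGAGGAAGAAGAAGAGGAA")
```

For an E2-like tract of perfect GAA repeats, the number of interruptions is zero and the longest run equals the codon count.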

Test for neutrality

Among the non-repeat blocks of PfGARP, the rate of nonsynonymous substitutions per nonsynonymous site (dN ± S.E. = 0.0086 ± 0.0038) significantly exceeded that of synonymous substitutions per synonymous site (dS ± S.E. = 0.0000 ± 0.0000) in block 3 (p = 0.024), whereas no significant difference between these parameters occurred in other blocks. Meanwhile, codon-based detection of deviation from selective neutrality using Fast Unconstrained Bayesian Approximation (FUBAR) identified positive selection at codons 193 (D>Y) and 214 (Y>D) in non-repeat block 3 (Supplemental Table S5). Likewise, purifying selection was detected at codons 75 (E) and 96 (I) in block 1b and codon 678 (I) in block 13 based on the FC27 sequence.

Phylogenetic analysis

Both neighbor-joining and maximum likelihood trees inferred from the complete coding sequences of PfGARP did not show distinct phylogenetic clades due to the lack of high bootstrap values supporting the main branches. Like African isolates, most Thai isolates did not show any clusters or distribution based on location of origin in the phylogenetic tree. This is expected considering the described highly variable repeat domains in PfGARP. Out of 13 blocks, variation in repeat domains I-III, V and VI could contribute to the topology of phylogenetic tree (Fig. 2).

Figure 2

Neighbor-joining tree inferred from the complete PfGARP gene sequences from Thai and worldwide isolates. Thai isolates with initials TSY and AP are from Tak, UB from Ubon Ratchathani, YL from Yala and CT from Chanthaburi Provinces. The numbers following these initials are used to label individual isolates. Bootstrap values greater than 60% are shown along the branches. Thai and African isolates are marked with circles and triangles, respectively. Scale denotes nucleotide substitutions per site.

Genetic differentiation

Population genetic structure inferred from allelic and genotypic frequencies of PfGARP was analyzed in P. falciparum populations from different endemic areas in Thailand by using Wright’s F-statistics. Almost all pairwise FST values among parasite populations from different endemic provinces significantly exceeded zero. However, the interpopulation variance between parasite populations from Tak and Chanthaburi Provinces was not statistically significant (p = 0.099) (Table 6).
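
The logic of a pairwise FST with a permutation test can be sketched as below. This is not the AMOVA-based estimator implemented in Arlequin; it substitutes the simpler Hudson-type estimator, FST = 1 − Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean between populations. The function names, the seed and the toy sequences are assumptions; only the default of 100 permutations follows the Methods.

```python
import random
from itertools import combinations, product


def _diff(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def hudson_fst(pop1, pop2):
    """Hudson-type FST = 1 - Hw/Hb; each population needs >= 2 sequences."""
    within = _mean(
        [_mean(_diff(a, b) for a, b in combinations(p, 2)) for p in (pop1, pop2)]
    )
    between = _mean(_diff(a, b) for a, b in product(pop1, pop2))
    return 0.0 if between == 0 else 1 - within / between


def permutation_test(pop1, pop2, n_perm=100, seed=0):
    """Return (observed FST, p), shuffling sequences between populations."""
    rng = random.Random(seed)
    observed = hudson_fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hudson_fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Two toy populations fixed for different sequences (hypothetical data)
fst, p = permutation_test(["AAAA"] * 3, ["AATT"] * 3)
```

A p value above the chosen significance level, as reported for the Tak and Chanthaburi comparison, means that the observed FST cannot be distinguished from the values obtained when population labels are assigned at random.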

Table 6 Genetic differentiation of P. falciparum populations inferred from PfGARP.

Parasitemia and PfGARP alleles

To determine whether variation in the number of amino acid residues in repeat regions of PfGARP was associated with parasitemia of the patients, analysis was performed using 76 isolates (Tak, n = 19; Ubon Ratchathani, n = 19; Chanthaburi, n = 18; and Yala, n = 20) whose parasite density could be determined. Of these, parasitemia ranged from 200 to 864,000 parasites per μL (median, 11,100 parasites/μL; geometric mean, 10,588 parasites/μL). Results revealed a tendency towards higher parasite density in patients infected with P. falciparum bearing more amino acid residues in repeat domains RIII and RVI including its flanking domains E1 and E2 (Kruskal–Wallis H test, p = 0.011 and 0.0281, respectively) (Table 7). No such tendency was observed for repeat domain V (p = 0.098), whereas a limited number of isolates in categorical data or no variation in the remaining repeat domains precluded the analysis.

Table 7 Length polymorphism in repeat domains of PfGARP and parasite density.

Predicted linear B cell epitopes

Linear B cell epitopes in PfGARP were predicted based on similarity to known epitope sequences as implemented in the BepiBlast web server11 and on protein language models as implemented in BepiPred-3.012. In total, nine B cell epitopes were predicted by the BepiBlast method, most of which spanned repeat domains. Three of these predicted epitopes, i.e. NDKENISE, KQKKIEKE and KKQEEKEK, were perfectly conserved across isolates; their sequences were similar to known epitopes in ankyrin repeat-containing protein of Ehrlichia chaffeensis, spike glycoprotein of severe acute respiratory syndrome coronavirus 1 and glutathione S-transferase isozyme of Schistosoma mansoni, respectively (Table 8). Repeat domain III contained four predicted epitopes that possessed sequence similarity either with genome polyprotein of dengue virus or M protein of Streptococcus pyogenes. Furthermore, two predicted epitopes were identified in repeat domain VI (Table 8). Meanwhile, prediction based on BepiPred-3.0 identified linear B cell epitopes mostly in conserved blocks 3 and 8. All repeat-containing domains received epitope scores below the cut-off threshold by this method (Supplemental Fig. S2). The epitope score for monoclonal antibody mAb7899, capable of killing P. falciparum in vitro4, was remarkably above the epitope threshold (the N-terminal part of block 8), albeit the sequence did not share similarity to known epitopes based on the BepiBlast method (Table 8 and Supplemental Fig. S2).

Table 8 Predicted linear B cell epitopes spanning 8 amino acids in PfGARP and their distribution among variant alleles.

Predicted helper T cell epitopes

Searching for potential helper T cell epitopes recognized by HLA class II molecules with allele frequency > 0.1 among Thai population13 has identified four peptides in blocks 1 and 3 of PfGARP that received peptide rank < 10 and IC50 < 1000 nM14. Three of these peptides were perfectly conserved across Thai and worldwide isolates. It is noteworthy that the four peptide variants in non-repeat block 3: (i) LLLSSPYQY, (ii) LLLSSLYQY, (iii) LLLSSPYQC and (iv) LLLSSPDQY, seemed to alter the peptide rank and IC50 for predicted HLA class II binding peptides, particularly amino acid substitutions in variants iii and iv (Table 9).

Table 9 Predicted HLA class II-binding peptides in PfGARP and distribution of variant alleles.

Discussion

PfGARP has recently been recognized as a potential anti-disease vaccine candidate against falciparum malaria3,4,8. However, the gene encoding this protein seems to be dispensable because PfGARP-knockout parasites could propagate normally in vitro15. Our analysis did not support natural deletion of this locus because PfGARP could be amplified by PCR from all isolates examined, corroborating previous whole genome sequence analysis16. Although the locus has been perceived as highly conserved based on whole genome sequence analysis, our study has shown differential variation in its repetitive sequences based on direct sequencing of P. falciparum clinical isolates from diverse malaria endemic areas of Thailand. With more sequences analyzed, two additional domains containing repetitive sequences (domains RII and RV) have been identified. Therefore, PfGARP comprises eight repeat blocks, two of which are the previously recognized homopolymeric glutamic acid-encoding domains, and five highly conserved non-repeat blocks6 (Fig. 1).

The number of haplotypes and the extent of nucleotide diversity of PfGARP were almost comparable across endemic provinces in Thailand except for Yala Province, in which only two haplotypes were identified and the nucleotide diversity was two orders of magnitude lower than that observed in other endemic provinces of the country (Table 1). Consistently, our previous analyses of genetic diversity of the genes encoding circumsporozoite protein and merozoite surface protein 2 of P. falciparum have shown a significantly lower number of haplotypes and level of nucleotide diversity of these loci among southern parasite isolates, including those from Yala and Narathiwat Provinces, than among those from Tak Province, a northwestern malaria endemic area. Likewise, the number of haplotypes and nucleotide diversity of the genes encoding apical membrane antigen-1 and merozoite surface proteins 1, 4 and 5 of the sympatric P. vivax population from Yala were significantly lower than those of Tak. Simultaneous reduction in genetic diversity of P. falciparum and P. vivax populations from Yala Province seemed to be due to population bottlenecks in both Plasmodium species as a consequence of control measures during the past decades and limited trans-border migration in Yala and Narathiwat Provinces17. However, the FST value inferred from sequence variation in PfGARP between Tak and Chanthaburi populations was not significantly different from zero, implying no genetic differentiation between these populations. Although the reason behind this finding remains elusive, a considerable number of indigenous malaria cases in Tak and Chanthaburi (including Trat) Provinces occurred among gem miners who routinely traveled between these malaria endemic areas for their occupations while insufficient treatment was common, leading to malarial gene flow between these endemic areas18. It has been suggested that drug-resistant P. falciparum strains were introduced from the Thai-Cambodian border to the Thai-Myanmar border in line with the gem trade between these areas. Although the gem trade was most active only during the late 1980s and early 1990s, genetic diversity within each population could have been fixed after local introduction to each endemic area18.

Due to meager nucleotide substitutions in non-repeat blocks of PfGARP, phylogenetic tree inferred from this locus mainly represented sequence variation in repeat domains. It is noteworthy that most Thai isolates were clustered in the same or related branches while a few Thai isolates (CT1597, UBT139 and UBT784) were placed outside of most Thai lineages (Fig. 2). Meanwhile, most African isolates tend to be scattered throughout the phylogenetic tree. It has been proposed that the expansion or reduction in repeat units could stem from slipped-strand mispairing mechanism or gene conversion which has been suggested to occur in repeat sequences of several malarial genes encoding vaccine candidate antigens19,20,21,22,23,24. The topology of phylogenetic tree may suggest that the repeat sequences in PfGARP seemed to have undergone independent concerted evolution, become divergent and potentially been fixed for characteristic repeat alleles between populations with geographic isolation (i.e. Southeast Asia and Africa) while diversification of repeat sequences could incidentally generate some related alleles across geographic areas25. Within populations, repeat sequence similarities may evolve in concert, probably following the process of random genetic drift and molecular drive which includes DNA repair and replication mechanisms in conjunction with population genetic processes26,27,28. However, extensive expansions or contractions in repeat domains of PfGARP could have been constrained by intrinsic stability of the repeat structure29,30 and/or their functional importance7,31,32. For example, repetitive sequence containing identical amino acids can adopt characteristic conformations that affect protein–protein interaction33. Interestingly, the length of homopolymeric glutamic acid repeats in domain E1 containing stretches of GAA interrupted by GAG was approximately three times longer than the perfect GAA repeats in domain E2. 
It seemed likely that long perfect triplet repeats encoding the same amino acids could be affected by structural instability at the DNA level unless they were interrupted by another triplet encoding the same amino acid as previously described29,30,34.

It is noteworthy that repeat domains RI-RIV were rich in lysine and other positively charged residues6. Besides the PEXEL/HT motif in non-repeat block 1 that elicited the translocation of the protein into the host cell membrane, it has been shown that the low complexity sequences encoding lysine-rich tandem repeats in RI-RIV of PfGARP are involved in protein targeting to the periphery of the P. falciparum-infected erythrocyte. Furthermore, the number of lysine-rich repeat units seemed to be associated with protein targeting efficiency7. Importantly, a minimum of 10 lysine-repeat units in domain RI seemed to be indispensable for protein targeting to the erythrocyte periphery7. Our sequence analysis has identified 13 alleles in domain RI with the number of lysine-repeat units ranging from 12 to 19; all exceeded the minimum number required for host cell peripheral targeting function (Table 2). The perfect sequence conservation of domain RIV in PfGARP implies functional or structural importance of the region. Based on the limited number of samples in this study, the length of repeat domains III and VI tended to be associated with parasite density, although more samples would be required to draw a firm conclusion. However, if this is the case, the expansion of repeat units in PfGARP could probably enhance parasite survival in malaria patients. Repeat-number polymorphism in protein-coding genes has been suggested to be influenced by selection pressure35. Likewise, the expansion of lysine-repeat units in PfGARP could confer a selective advantage for P. falciparum7. Like erythrocyte membrane protein 1 of P. falciparum (PfEMP-1) and other related proteins on the surface of infected erythrocytes, PfGARP has been suggested to be associated with the cytoadherence property of mature asexual blood stage parasites in order to avoid host immune destruction, especially splenic removal of abnormal and infected erythrocytes5.

Although the gene encoding PfGARP was cloned by screening a lambda phage expression library of P. falciparum with sera from Papua New Guinean adults over three decades ago, it was not until recently that the significance of this molecule was unveiled as an important target for host antibody responses capable of protecting African children with falciparum malaria from high parasitemia and severe symptoms4. It has been shown that anti-PfGARP antibodies conferred parasite killing through the induction of programmed cell death, as evidenced by the activation of caspase-like proteases and the fragmentation of parasite DNA of late trophozoites and schizonts. Importantly, the epitope for the mAb7899 antibody conferring parasite killing in vitro has been mapped to a perfectly conserved block of the protein, in which high scores for linear B cell epitopes were predicted by the BepiPred-3.0 algorithm (Supplemental Fig. S2). Although additional B cell epitopes await further investigation, prediction of linear B cell epitopes by sequence similarity with known epitopes as implemented in the BepiBlast web server has identified nine potential linear B cell epitopes spanning eight amino acids. Eight of these predicted epitopes were found in repeat domains, and most of them exhibited sequence variation among isolates (epitopes nos. 4–9 in Table 8). Intriguingly, sequence variation in repeat domains RIII and RVI could probably be influenced by host immune pressure.

It has been shown that cognate T cell epitopes in malarial vaccine candidate antigens play a crucial role in conferring clinical protection36,37. Searching for common Thai HLA class II binding peptides in PfGARP has predicted four epitopes in non-repeat regions (blocks 1 and 3) of the protein, three of which were invariant across isolates. Interestingly, amino acid substitutions at residues 214 (Y>D) and 216 (Y>C) could abolish predicted helper T cell epitope scores recognized by a common Thai HLA class II allele DRB1*12:02 (Table 9). At the nucleotide level, the substituted epitopes exhibited dN significantly exceeding dS, suggesting that positive selection has influenced sequence variation in block 3. Although four helper T cell epitopes have been predicted in PfGARP, potential recognition of these epitopes seemed to be limited to one or a few common HLA class II alleles/haplotypes in the Thai population. Meanwhile, a recent immunoinformatic and structural approach has suggested that a vaccine construct derived from PfGARP was predicted to induce both humoral and cellular immune responses38. Whether genetic restriction of host immune responses could compromise PfGARP vaccine efficacy awaits further studies.

In conclusion, sequence diversity in PfGARP seems to be limited to some repeat-encoding domains, whereas non-repeat regions were highly conserved, albeit microheterogeneity of sequence was observed particularly in regions potentially recognized by HLA class II molecules. With the limited number of isolates analyzed, expansion or reduction of lysine-rich and glutamic acid-rich repeat regions seemed to influence parasite density in malaria patients. With high sequence conservation in non-repeat and predicted immunogenic epitope regions, it is plausible that a PfGARP-derived vaccine may largely elicit strain-transcending immunity.

Materials and methods

Parasite isolates

Blood samples were obtained from symptomatic malaria patients who were diagnosed with P. falciparum infections by microscopic examinations of Giemsa-stained thin and thick blood films, using a 100 × objective. The patients attended malaria clinics or district hospitals during 2009 and 2014 in Tak, Chanthaburi, Ubon Ratchathani and Yala Provinces located in northwestern, eastern, northeastern and southern parts of Thailand, respectively (Supplemental Fig. S3). Demographic data of the patients are shown in Supplemental Table S6. All blood samples were preserved in EDTA and stored at − 40 °C until use. An isolate from a Guinean patient (isolate MDCU32) was used to validate the protocol to genotype a sample from high-transmission setting.

DNA extraction

Two hundred microliters of EDTA-preserved blood sample from each patient were used for DNA extraction using the Qiagen DNA mini kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. DNA samples were stored at − 40 °C until use.

PCR detection and genotyping of P. falciparum

All isolates diagnosed with P. falciparum monoinfections by microscopy were reaffirmed by species-specific nested PCR39. Genotypes of P. falciparum were determined by size polymorphism in block 2 of merozoite surface protein-1 (PfMSP1) and the central repeat region of merozoite surface protein-2 (PfMSP2) as described previously40. Isolates yielding single bands of both PfMSP1 and PfMSP2 on agarose gel electrophoresis were included for further analysis. In total, 80 isolates were used in this study, consisting of 20 isolates from each endemic province (Supplemental Fig. S3).

Parasite density

Estimation of parasite density was done from at least 200 white blood cells in Giemsa-stained thick blood films, using a 100 × objective. The procedure was performed by a well-trained microscopist with > 20 years of experience in detection and identification of malaria parasite species. Parasite density was determined twice using duplicated blood films from each patient.
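
The arithmetic behind a thick-film parasite density estimate can be sketched as follows. The text does not state the white blood cell count used for conversion, so the conventional value of 8000 cells/μL is assumed here and labeled as such; averaging the duplicate readings is likewise an assumption. All function names and counts are hypothetical.

```python
from statistics import geometric_mean, median

ASSUMED_WBC_PER_UL = 8000  # assumption: conventional value, not stated in the text


def parasites_per_ul(parasites_counted, wbc_counted, wbc_per_ul=ASSUMED_WBC_PER_UL):
    """Convert a thick-film count to parasites per microliter of blood."""
    if wbc_counted < 200:
        raise ValueError("count parasites against at least 200 white blood cells")
    return parasites_counted * wbc_per_ul / wbc_counted


def duplicate_density(reading_a, reading_b):
    """Average two readings from duplicate films, each (parasites, wbc).

    Averaging is an assumption of this sketch.
    """
    return (parasites_per_ul(*reading_a) + parasites_per_ul(*reading_b)) / 2


def summarise_densities(densities):
    """Return (median, geometric mean) of a list of parasite densities."""
    return median(densities), geometric_mean(densities)


# Hypothetical counts: 50 and 60 parasites per 200 white blood cells
density = duplicate_density((50, 200), (60, 200))
```

The median and geometric mean are the two summaries reported for the 76 isolates with measurable parasitemia; the geometric mean is less affected by the few very high densities.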

PCR amplification and sequencing of the PfGARP gene

The complete coding sequence of PfGARP was amplified by PCR using primers PfGARP-F0 (5′-ATAAATAAAGATTAGTATATTTAAAACG-3′) and PfGARP-R0 (5′-AAATAGCTTTGATTTAACACATTAC-3′). DNA amplification was carried out in a total volume of 20 µL containing 2 µL of DNA template, 2.5 mM each deoxynucleoside triphosphate, 3 μL of 10 × PCR buffer, 0.3 μM of each primer and 1.25 unit of ExTaq DNA polymerase (Takara, Seta, Japan). The PCR thermal profile included a preamplification denaturation at 94 °C for 1 min, 35 cycles of 94 °C for 40 s, 50 °C for 30 s and 72 °C for 3 min, and a final extension at 72 °C for 10 min. Amplicons were analyzed by 1% agarose gel electrophoresis, stained with ethidium bromide and visualized under UV transilluminator. Sequences were determined directly and from both directions using the PCR-purified products as templates and sequencing primers (Supplemental Table S7). Singletons and unique insertion-deletion of sequences were verified by re-sequencing of the PCR products from independent amplification reactions using the same genomic DNA as templates.

Data analysis

Sequence analysis included 80 nucleotide sequences of PfGARP from Thai isolates, one clinical isolate from Guinea (isolate MDCU32) and 18 publicly available complete gene sequences whose isolate names, countries of origin and GenBank accession numbers are as follows: 3D7 (Netherlands from West Africa, AL844501), CD01 (Congo, LR129686), Dd2 (Indochina, LR131290), FC27 (Papua New Guinea, J03998), FCC1/HN (Hainan in China, AF251290), GA01 (Gambia, LR131386), GB4 (Ghana, LR131402), KH1 (Cambodia, LR131418), KH2 (Cambodia, LR131306), HB3 (Honduras, LR131338), IGH-CR14 (India, GG6656811), IT (Brazil, LR131322), KE01 (Kenya, LR131354), ML01 (Mali, LR131481), SD01 (Sudan, LR131466), SN01 (Senegal, LR131434), TG01 (Togo, LR131450), and UGT5.1 (Vietnam, KE124372). Of these, the 3D7, FC27 and FCC1/HN sequences were determined by the Sanger dideoxy chain-termination method whereas the remaining sequences were assembled from next-generation sequencing platforms (Supplemental Table S1). Sequence alignment was performed by using the CLUSTAL_X program, with manual adjustment to keep codons aligned and maintain the reading frame in the coding region. The sequence from the FC27 strain was used as a reference6. Searching for nucleotide repeats was performed by using the Tandem Repeats Finder version 4.0 program with the default option. Nucleotide diversity (π), the rate of synonymous substitutions per synonymous site (dS) and the rate of nonsynonymous substitutions per nonsynonymous site (dN) were determined from the average values of sequence differences in all pairwise comparisons of each taxon and the standard error was computed from 1000 bootstrap pseudoreplicates implemented in the MEGA 6.0 program41. Haplotype diversity and its sampling variance were computed by taking into account the presence of gaps in the aligned sequences using the DnaSP version 5.10 program42.
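The two summary statistics named above can be illustrated with a minimal sketch. It is not a reimplementation of MEGA or DnaSP: π is computed here from uncorrected pairwise differences with gapped sites skipped pair by pair, and haplotype diversity uses Nei's estimator with identical aligned strings treated as one haplotype, both of which are simplifying assumptions. The toy alignment is hypothetical, and bootstrap standard errors are omitted.

```python
from collections import Counter
from itertools import combinations


def pairwise_difference(a, b):
    """Proportion of differing sites between two aligned sequences,
    skipping sites where either sequence has a gap ('-')."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def nucleotide_diversity(seqs):
    """pi: mean pairwise difference per site over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


def haplotype_diversity(seqs):
    """Nei's estimator: h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


# Hypothetical toy alignment (not PfGARP data)
alignment = ["ATGC", "ATGA", "ATGC", "TTGC"]
print(nucleotide_diversity(alignment))            # 0.25
print(round(haplotype_diversity(alignment), 4))   # 0.8333
```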
Natural selection on codon substitution was determined by using the fast unconstrained Bayesian approximation (FUBAR) method in the Datamonkey Web-Server43,44. A neighbor-joining phylogenetic tree based on nucleotide sequences was constructed by using the maximum composite likelihood parameter whereas a maximum likelihood tree was built using the Tamura-Nei model with a rate variation model that allowed some sites to be evolutionarily invariable. The Arlequin 3.5.2.2 software was used to determine genetic differentiation between populations, the fixation index (FST), using the analysis of molecular variance (AMOVA) approach akin to Weir and Cockerham’s method but taking into account the number of mutations between haplotypes45. One hundred permutations were used to determine the significance levels of the fixation indices. Prediction of linear B cell epitopes in PfGARP was performed by using sequence similarity to known experimentally verified epitopes from the Immune Epitope DataBase (IEDB) implemented in the BepiBlast Web Server11. Furthermore, linear B cell epitopes were also predicted based on protein language models implemented in BepiPred-3.012. Potential HLA-class II-binding peptides were analyzed by using the IEDB recommended 2.22 algorithm with a default 12–18 amino acid residues option. HLA-class II-binding peptides were considered predicted binders when the percentile rank was < 10 and the IC50 for HLA binding affinity was ≤ 1000 nM14. The analysis mainly concerned the common HLA class II haplotypes among Thai populations with allele frequency > 0.113.
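The two cut-offs applied to the HLA class II predictions (percentile rank < 10 and IC50 ≤ 1000 nM) amount to a simple conjunctive filter, sketched below. The record layout, field names, peptide strings and scores are hypothetical placeholders and do not reproduce the IEDB output format or any result from this study.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    allele: str             # HLA class II allele label (placeholder values below)
    peptide: str            # candidate peptide, 12-18 residues
    percentile_rank: float  # lower rank means stronger predicted binding
    ic50_nm: float          # predicted binding affinity in nM


def is_predicted_binder(p, max_rank=10.0, max_ic50_nm=1000.0):
    """Both thresholds from the text must hold: rank < 10 AND IC50 <= 1000 nM."""
    return p.percentile_rank < max_rank and p.ic50_nm <= max_ic50_nm


# Hypothetical records for illustration only
predictions = [
    Prediction("allele-A", "EEKKEEKKEEKKEEK", 4.2, 350.0),   # passes both
    Prediction("allele-A", "KKDDKKDDKKDDKK", 8.0, 2500.0),   # fails IC50
    Prediction("allele-B", "DEEDEEDEEDEEDE", 10.0, 900.0),   # fails rank (not < 10)
]

binders = [p for p in predictions if is_predicted_binder(p)]
print([p.peptide for p in binders])  # ['EEKKEEKKEEKKEEK']
```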

Ethical approval

This study was reviewed and approved by the Institutional Review Board in Human Research of Faculty of Medicine, Chulalongkorn University, Thailand (IRB No. 193/64; COA No. 468/2021). Prior to blood sample collection, written informed consent was obtained from all participants or from their parents or guardians. All procedures were performed in accordance with the relevant guidelines and regulations.

Accession numbers

Eighty-one complete sequences of the PfGARP gene of Plasmodium falciparum have been deposited in NCBI GenBank under accession numbers OQ197883-OQ197963.