Multilocus sequence analysis reveals high genetic diversity in clinical isolates of Burkholderia cepacia complex from India

Burkholderia cepacia complex (Bcc) is a complex group of bacteria causing opportunistic infections in immunocompromised and cystic fibrosis (CF) patients. Herein, we report multilocus sequence typing and analysis of the 57 clinical isolates of Bcc collected over the period of seven years (2005–2012) from several hospitals across India. A total of 21 sequence types (ST) including two STs from cystic fibrosis patient’s isolates and twelve novel STs were identified in the population reflecting the extent of genetic diversity. Multilocus sequence analysis revealed two lineages in population, a major lineage belonging to B. cenocepacia and a minor lineage belonging to B. cepacia. Split-decomposition analysis suggests absence of interspecies recombination and intraspecies recombination contributed in generating genotypic diversity amongst isolates. Further linkage disequilibrium analysis indicates that recombination takes place at a low frequency, which is not sufficient to break down the clonal relationship. This knowledge of the genetic structure of Bcc population from a rapidly developing country will be invaluable in the epidemiology, surveillance and understanding global diversity of this group of a pathogen.

MLST is a powerful tool for typing and has provided insights into the population structure and recombination events in bacteria 13,14 . However, MLST studies on Bcc from developing countries in general and from India, in particular are lacking. With high patient diversity and density along with heavy use of antimicrobial agents, such studies are need of the hour. The present study involved MLST analysis of 57 clinical isolates of Bcc obtained during the time span of seven years (2005-2012) from four hospitals across India. For the first time in India, Bcc has been isolated from CF patients and three isolates from CF were included in this study 15 . Further identified the species of Bcc using MLSA, inferred about the genetic diversity and recombination features. This analysis has revealed novel insights of genetic diversity of Bcc isolates from India, which is important in epidemiological surveillance.

Results
Total 57 Bcc isolates were included in this study. Out of these 57 isolates, 42 isolates were obtained from tertiary referral hospital of Post Graduate Institute of Medical Education and Research, Chandigarh, India (PGIMER; n = 42); 6 isolates from blood cultures from Fortis Escorts Heart Institute, New Delhi (FEHI; n = 6); 6 isolates from urine and blood culture were obtained from Topiwala National Medical College & B. Y. L. Nair Charitable Hospital, Mumbai (TNMC; n = 6) and 3 from blood cultures from Global Hospitals & Health City, Chennai (GHH; n = 3). A majority of isolates were obtained from blood cultures (n = 46) and rest from respiratory specimens (n = 6), endotracheal aspirate (ETA) (n = 2) and pus (n = 1), urine (n = 1) and body fluid (n = 1) from patient population having septicaemia, meningitis, peritonitis, pus, respiratory tract and urinary tract infections (Supplementary Table 1). Besides non-CF patients, three isolates are from CF patients, two of them were obtained from ETA and one from blood culture. E-MLST and population structure. A total of 21 sequence types (ST), including 12 novel STs, which, harboured 11 novel alleles were identified amongst 57 Bcc isolates included in this study. Locations of the hospitals from where the Bcc isolates were obtained and their STs identified are shown in Fig. 1 Fig. 2. Isolates 8947 and 9500 were isolated from the same patient at an interval of one week but had different sequence types as ST6 and ST628 respectively. Interestingly, in one case four serial isolates (4613, 5310, 5312 and 7216) were retrieved out from the same patient admitted in bone marrow transplant unit of PGIMER with two distinct novel STs (826 and 825). The isolates 4613, 5310 and 5312 with ST825 are the initial isolates and three weeks later isolate 7216 with ST-826 was isolated. Two novel STs (828, 826) were obtained from three CF isolates (1236, 7055 & 7716) under this study. Interestingly, ST 826 was also isolated from the blood culture of non-CF patient. Isolates 7055 and 7716 isolates were obtained from the same patient with a similar allelic profile (ST 826) but had different isolation sources, i.e. endotracheal aspirate and blood.
21 STs found in this study were grouped into 5 BURST groups (BG) and 12 singleton STs by eBURST (Supplementary Table 2). The largest BG had 15 isolates and 3 STs (628, 839 and 217) with predicted founder ST 628, second largest group comprised of 6 isolates and 2 STs (232 & 843). Another group, which comprised of 5 isolates and 3 STs (826, 821 and 822) with predictor founder 826 and 13 unrelated STs, which were predicted to be singletons. goeBURST analysis with all 1083 STs present in the PubMLST database (as on 4 th September 2016) including STs under this study (Fig. 3A) revealed that the seven STs-628, 839, 217, 826, 621, 822 & 841 are linked to the 31 clonal complex (CC), which is the largest clonal complex (Fig. 3B). Among the STs that are linked to the 31CC, ST 826 was obtained from an isolate of CF patient.
Sequence diversity. In order to assess the sequence diversity of the seven MLST loci of the isolates under study, average GC content was calculated, the number of polymorphic sites, nucleotide diversity (π), and ratio of non-synonymous (d N ) to synonymous (d S ) substitutions were determined for each MLST locus and mentioned in Table 1 indicating that the all seven MLST loci exhibiting purifying selection. The mean GC content of the MLST gene fragments ranging from the 61.1% for phaC to 69.3% for the trpB. The nucleotide diversity index (π ) ranging from the 0.00522 (gltB) to 0.02003 (gyrB). The number of polymorphic sites varied from the 13 for gltB to 46 for the gyrB, the most polymorphic locus.
Multi locus sequence analysis. All isolates included in this study are previously identified as Bcc by recA-RFLP. However, a major disadvantage of this method is that most species include multiple restriction profiles and overlapping of restriction profiles in multiple species which limits this method for accurate species identification of Bcc. Thus for the identification of species of the Bcc isolates we employed MLSA, which involves phylogenetic analysis on the nucleotide sequences of the alleles utilized in MLST. Phylogenetic analysis was carried out using multilocus sequence information of isolates under study by including the sequences from type or reference strain of 20 validated species of Bcc (Supplementary Table 3). The addition of reference strains or type strains in MLSA enables us the more accurate identification of the species of Bcc. The maximum likelihood tree based on seven loci showed that Bcc isolates under this study are grouped into two groups. First is a major group consisting of 51 isolates (89.47%), whose all 16 STs (including 8 novel) form one tight cluster along with  B. cenocepacia J2315 T and belong to one lineage. The second is a minor group consisting of 6 isolates (10.53%) belonging to 4 novel STs (823, 838, 825 & 837) that cluster with B. cepacia ATCC 25416 T . The resulting phylogenetic tree revealed that the major lineage belonged to B. cenocepacia while other minor but diverse lineage belonged to B. cepacia (Fig. 4). This suggests that the population of Bcc consist of only two species B. cenocepacia and B. cepacia. Interestingly, out of four serial isolates from the single patient, initial isolates (5310, 5312 and 4613) belong to B. cepacia lineage while the later isolate (7216) groups with the B. cenocepacia lineage. Similarly, amongst the isolates 8947 and 9500 from the same patient but initial isolate belongs to B. cepacia and later one to B. cenocepacia clad respectively. Three CF isolates including two from the same patient belong to the major B. cenocepacia lineage.
Recombination analysis. Split network analysis to examine the evidence for recombination amongst the 57 isolates revealed different structures in the split graphs for seven loci (Fig. 5). The split graphs for gltB, gyrB, recA, phaC and lepA revealed a network like with parallelogram structures indicating that intergenic recombination   had occurred during the evolutionary history of these genes. However, the split graphs of atpD and trpB are tree-like structures suggesting that the descent of these genes was clonal and absence of recombination. The split decomposition analysis of combined seven MLST loci display network like structure with rays of different length (Fig. 6A). The 21 STs representing all isolates are divided into two main groups, L1 and L2 which corresponds to isolates of B. cenocepacia and B. cepacia respectively identified by the MLSA. Group L1 and L2 are completely disconnected from each other suggests that the interspecies recombination had not occurred. However, intergenic recombination has occurred between the isolates of B. cenocepacia group (L1) during the course of evolution as parallelogram-shaped groupings were detected (Fig. 6B). ST 828, which contains only one isolate 1236 from CF  patient was clearly disconnected from both the groups suggesting that there is no recombination between this isolates and isolates of the other two group.
The phi test is a rapid statistically efficient test for recombination. The P value generated from phi test for all 21 STs is 5.923E-04 (Table 2)

Discussion
Multilocus sequence typing is a method of choice for the typing of Bcc due to its ability to differentiate the species in Bcc complex 5,8,16,17 . The MLST is also employed to study the global epidemiology of the Bcc isolates of CF 18 .
Recently, E-MLST typing had been described that addressed the shortcomings in the original MLST methodology 12 , which was performed in this study on 57 clinical isolates of Bcc from India which was never studied before.
MLST successfully revealed the extent of diversity in Bcc isolates from a rapidly developing country like India where co-incidentally the patient density and diversity is also high, which is further coupled with the intense use of antibacterial agents 19,20 . In this study, 57 clinical isolates of Bcc are distributed in to 21 STs including 12 novel STs indicating the extent of diversity. Not only could we identify a large number of novel STs but half of the population belonged to these novel STs. In fact, one novel ST (ST824) constituted 10.6% of the population. The study has also provided first insights into the diversity of CF isolates from India where CF is not prevalent 21,22 . The fact that one of this novel ST from CF patient was also isolated from non-CF patient suggests the need of more exhaustive and exclusive MLST studies on CF isolates.
Among the Bcc, B. cenocepacia and B. multivorans together account 85-97% of all Bcc infections. Despite this, the other species of which rarely found in clinical infections are B.cepacia, B. stabilis, B. vietnamiensis, B. dolosa,  B. ambifaria, B. anthinia and B. pyrocinia 18 . While MLST allowed us to type and establish their clonal groups and multilocus sequence analysis (MLSA) allowed us to infer phylogenetic relationship amongst the isolates and species of Bcc isolates. MLSA revealed that vast majority of the isolates belonged to B. cenocepacia and form one lineage, the remaining few belonged to B. cepacia and form a minor lineage. All three CF isolates belong to the B. cenocepacia. Interestingly, one CF isolate (1236) that also belongs to a novel ST forms out-group of dominant lineage suggesting that CF isolates can be more diverse or ancestral. Two serial isolates with identical STs from different tissues at the interval of one week from the CF patient suggests the chronic infection, which is a classic example of cepacia syndrome 23 . Hence, future studies will be aimed to look at more CF isolates for the extent of diversity and novel STs. Seven STs which comprises 21 isolates from our study are linked to 31CC, which is the largest clonal complex among all STs reported in the database. ST28 and ST32 strains are distributed globally to distinct locations and belong to clonal complex 31 18 . This suggests that isolates from our study are genetically closely related to these internationally spread clones. Analysis of recombination suggests that the population of Bcc is clonal in nature and mutations playing an important role rather than recombination in generating the genotyping diversity, however, few allelic loci exhibiting intergenic recombination. There is no evidence for the interspecies recombination but a larger collection of strains is required to investigate interspecies recombination events.

Conclusion
Overall, MLST and MLSA analysis provided major insights into the diversity of Bcc population in India. There is a need to extend the study to larger collection from several more hospitals to encompass the whole of India for effective epidemiological and surveillance studies. There is also an immediate need to undertake such studies exclusively on CF patients. Further whole genome sequencing studies on this collection of Bcc is required to understand genomic diversity, variation by horizontal gene transfer, candidate virulence loci and its resistome.

Material and Methods
Ethical clearance. Ethical clearance was obtained from Institute Ethics committee at PGIMER, Chandigarh Bacterial strains. We had screened clinical isolates collected from various specimens that included blood cultures, pus, respiratory specimens, sterile body fluids, intravenous catheters and cerebrospinal fluid (CSF), which were stocked over the last 7 years (2005-2012). Gram-negative, oxidase-positive, motile, NFGNB were  selected and the identification of isolates as members of the Bcc was carried out by a triphasic analysis as, growth on the B. cepacia selective agar (BCSA), biochemical tests like oxidase, lysine decarboxylase, ornithine decarboxylase, arginine dihydrolase, triple sugar iron (TSI) and restriction fragment length polymorphism (RFLP) of recA gene 24 . Blood agar and MacConkey agar were used for routine isolation of Bcc. Burkholderia cepacia selective medium (BCSA) was used for the recovery of Bcc from respiratory specimens of CF patients that included sputum, induced sputum, bronchioalveolar lavage etc. which are diagnosed as per the guidelines of CF Foundation 25 . Bcc isolates were lyophilized and stored at 4 °C for further reference.

Genomic DNA extraction, MLST locus amplification and sequencing. E-MLST was performed
by sequencing of the 7 housekeeping genes: ATP synthase beta chain (atpD), glutamate synthase large subunit (gltB), DNA gyrase subunit B (gyrB), recombinase A (recA), GTP binding protein (lepA), acetoacetyl-CoA reductase (phaC), Tryptophan synthase subunit B (trpB) according to previously published method and available at www.pubMLST.org/bcc 11,12 . The list of primer used in the MLST locus amplification is mentioned in Supplementary

Expanded Multi-Locus Sequence Typing (E-MLST).
Allele profiles, sequence types (ST) and clonal complexes were assigned by using the E-MLST database (www.pubMLST.org/bcc/). Alleles and ST that had not been previously described were submitted to the database and were assigned new allele numbers and STs.
Population structure analysis. The STs generated in MLST were subjected to the eBURST analysis by eBURSTv3 in order to assign them into the Clonal complex/Burst groups (BG). The eBURSTv3 is based on a model of bacterial evolution whereby a single ancestor founder ST undergoes diversification to produce a subset of closely related STs and available at http://eburst.mlst.net/ 26 . The relationship between the STs generated in this study with the existing STs in the global MLST database was assessed by using geoBURST 27 .
Sequence diversity analysis. The number of polymorphic sites, nucleotide diversity π ( ), average GC content was calculated by using DnaSP Version 5.10 28 . For calculation of the average non-synonymous/synonymous substitution rate ratios (dN/dS), we first corrected the allelic locus of gyrB, atpD and gltB in order to get them in translating frame and dN/dS ratio was calculated by using MEGA version 6.06 29 .
Multilocus sequence analysis. The concatenated nucleotide sequences of seven MLST allelic loci of reference strains of Bcc members (Supplementary Table 2) was retrieved from the PubMLST (www.pubMLST.org/ bcc/). Sequence alignment of the concatenated seven housekeeping gene fragments (E-MLST) of all isolates and reference strains of Bcc members was carried out and Maximum Likelihood tree was constructed by using General Time Reversible model, Gamma distributed and Invariant sites (G + I) with 1000 bootstrap replications using MEGA version 6.06 29 . Split network and Recombination analysis. The split network of STs and individual loci was generated by using neighbor-net method 30 using SplitTree4 31 . The pairwise homoplasy index (phi) test 32 implemented in SplitTree4 31 for recombination was performed, and the P value < 0.05 indicated recombination existed. LDhat programme 33 implemented in Recombination Detection Program (RDP) v.4.66 34 was used to calculate per site ρ /θ ratio based on the concatenated sequences of seven loci with 1,000,000 MCMC updates. The parameters ρ and θ represents the rate of recombination and mutation respectively. Linkage disequilibrium from allelic data was evaluated by calculating the standardised index of association (I A S ) using LIAN v3.7 35 (http://adenine.biz. fh-weihenstephan.de/lian_3.1/). The null hypothesis of complete linkage equilibrium (I A S > 0; presence of linkage disequilibrium or clonality) was tested by using Monte Carlo methods with 10,000 iterations on allelic profile.