Diversity of Mycobacterium tuberculosis in the Middle Fly District of Western Province, Papua New Guinea: microbead-based spoligotyping using DNA from Ziehl-Neelsen-stained microscopy preparations

Tuberculosis remains the world’s leading cause of death from an infectious agent, and is a serious health problem in Papua New Guinea (PNG) with an estimated 36,000 new cases each year. This study describes the genetic diversity of Mycobacterium tuberculosis among tuberculosis patients in the Balimo/Bamu region in the Middle Fly District of Western Province in PNG, and investigates rifampicin resistance-associated mutations. Archived Ziehl-Neelsen-stained sputum smears were used to conduct microbead-based spoligotyping and assess genotypic resistance. Among the 162 samples included, 80 (49.4%) generated spoligotyping patterns (n = 23), belonging predominantly to the L2 Lineage (44%) and the L4 Lineage (30%). This is consistent with what has been found in other PNG regions geographically distant from Middle Fly District of Western Province, but is different from neighbouring South-East Asian countries. Rifampicin resistance was identified in 7.8% of the successfully sequenced samples, with all resistant samples belonging to the L2/Beijing Lineage. A high prevalence of mixed L2/L4 profiles was suggestive of polyclonal infection in the region, although this would need to be confirmed. The method described here could be a game-changer in resource-limited countries where large numbers of archived smear slides could be used for retrospective (and prospective) studies of M. tuberculosis genetic epidemiology.

Results clinical sample set. A total of 345 smeared sputa were collected from June 2012 to March 2014, of which 206 showed evidence of mycobacterial DNA based on combined TaqMan qPCR assays (see 14 for details). Out of those 206 qPCR-positive samples, the spoligotyping and resistance analyses focused on the ones with the lowest Cq values after qPCR, with a cutoff of 36 chosen from preliminary spoligotyping results. Samples included in the analyses (n = 162) are summarised in Table 1, with their detailed microscopy results. The excluded samples (n = 44; Cq > 36) were mostly negative in microscopy (36 negative, 3 scanty, 5 unknown).
Genotypic diversity of the clinical isolates. Spoligotyping  Twenty-three different spoligotyping patterns belonged to 13 different families and five clusters with two or more samples (Fig. 1). The largest clusters were L2/Beijing SIT1 (n = 34), one new L4 pattern (likely L4.2/Ural1 according to TBminer) (n = 9), and one pattern labelled as 'mixed' (n = 11). Of the 23 patterns identified, eight patterns (n = 44 samples) have already been described in PNG, including four patterns described in the two previous PNG studies 8,9 as well as ours: L2/Beijing SIT1, L4/H3 SIT50, L4/T1 SIT53 and L4/T1 SIT393 (Fig. 1). The other 15 patterns (n = 36 samples) were "new" in PNG. Of note, eight patterns were detected in the two previous PNG studies 8,9 but were not identified in the Balimo/Bamu region, including L4/X1 SIT119 and L4/LAM9 SIT42 that each accounted for 6% of spoligotypes in Madang 8 . Figure 2 shows the distribution of Mtb diversity according to previous investigations in PNG (Fig. 2B) and the present study (Fig. 2B,C). L2 and L4 Lineages were present in all PNG settings, while L1 Lineage was detected in samples from three locations out of the five regions investigated, including the Balimo/Bamu region where it was detected in one sample (Fig. 2C). However, the relative percentage of each lineage varied between locations at the national level ( Fig. 2B) as well as at the regional level (Fig. 2C). In the present study, the biggest spatial cluster (n = 8) corresponds to the town of Balimo (Fig. 2C).

Spatial analysis of Mtb lineages in pnG.
Rif-resistance associated mutations. RIF-resistance was assessed by sequencing a 147-bp region of the rpoB gene (including RRDR). Out of the 162 qPCR-positive sputum smears included in the analyses, 116 (71.6%) were successfully sequenced ( Table 2). Mutations were identified in 7.8% (95% CI 2.9-12.6%) of the successfully sequenced samples, and all corresponded to the rpoB S450L codon mutation ( Table 2). Of note, one sample that tested positive for mycobacterial DNA based on qPCR resulted in a rpoB sequence originating from Pseudomonas aeruginosa. Considering that this sample showed a late positive qPCR (Cq 33.3 and 35.5), a negative microscopy result, but a '1 +' microscopy result on another slide from the same patient sampled the previous month, we believe this patient to be a true but low MTBC positive and a mixed sample containing MTBC and Pseudomonas.
Among the 79 sputum smears with both rpoB and spoligotyping results, 72 were rpoB wild type and seven were S450L mutants, with all the RIF-resistant samples belonging to the L2/Beijing Lineage (Fig. S1). Drug resistance was thus linked with Lineage 2 (Fisher exact test p = 0.003), as already observed in a study conducted at Modilon General Hospital, Madang Province 8 .

Discussion
The spoligotyping method is known to have limited discriminatory capacity due to the fact that it targets only a single genetic locus, covering less than 0.1% of the Mtb genome, and when used alone, is not sufficient for epidemiological linking studies 24 . Nevertheless, spoligotyping allows identification of most if not all genotypes of significant, clinical, and epidemiological relevance, such as the "Beijing" genotype 24 . In this study, we used spoligotyping to describe the genetic diversity of Mtb from patients presenting at BDH, originating from the Middle Fly District WP, PNG. We show a wide diversity of spoligotype patterns in the Balimo/Bamu region, and confirm the presence of the modern L2 and L4 Lineages, with a single ancient L1 Lineage case, originating from the Bamu LLG. We also analysed drug-resistance associated mutations in the rpoB gene, and identified 7.8% of the successfully sequenced samples showing RIF resistance, all being L2/Beijing Lineage isolates.
Populations in the Highlands are the oldest settlers of PNG, as shown by human genetic studies 25 , and have been isolated from the outside world for much longer than the populations in coastal sites. For this reason it was expected that evolutionary ancient lineages of Mtb (e.g. L1 Lineage) would be found in the low density populations of the Highlands, as opposed to modern lineages of Mtb (L2 and L4 Lineages) that would be found in the high density populations of the coastal regions 9 . However, a recent human genetic study determined that the genetic differentiation between populations from highland and lowland PNG does not seem to be as significant as previously thought 26 , and no statistically significant difference in the prevalence of L1 Lineage could be found in Goroka, Eastern Highlands, compared to coastal regions 9 . In the end, the pattern of dominance of modern lineages over ancient L1 Lineage seems consistent across PNG, with the 2% of L1 Lineage found in the Balimo/Bamu region not dissimilar to the 0.6% (1/173) to 8% (3/38) observed in the North Coast (East Sepik, Madang), Goroka or Alotau 8,9 . This pattern is strikingly different from what has been described in neighbouring South-East Asian countries. In Western New Guinea (WNG), i.e. the Indonesian part of the island, the same three lineages have been reported, but with a higher proportion of L1 Lineage (33.7%) 27 . This is also true in other countries of the region, with L1 Lineage representing 30.2% of samples in Makassar, Indonesia 28 , 56.4% in Malaysia 29 , and 36.3% in southern Vietnam 30 . This might be due to different importation histories of TB strains in PNG compared to the rest of South-East Asia. For example, it has been suggested that ancient TB lineages might have been replaced early on by modern and more successful lineages with the arrival of Europeans in PNG in the late 19 th century 31 .
In L4 Lineage, Latin-American and Mediterreanean (LAM) families (Lineage 4.3) were the most prevalent in WNG, as compared to PNG where other L4 families were found in high proportion in most of the locations. Interestingly, L4/SIT393 is reported in the three independent studies performed so far in PNG. L4/SIT393 is www.nature.com/scientificreports www.nature.com/scientificreports/ characterized by the absence of spacer sp14, a character that has not yet been shown to be monophyletic. A total of 60 cases have now been reported in PNG (Fig. 1), whereas until now, only 36 cases were described worldwide in SITVIT2 32 . The largest and best collection of L4 Lineage characterized so far in PNG has been described by Stucki and colleagues 33 that report, within the L4 Lineage, up to 40% of L4.10/PGG3 (Principal Genetic Group 3) strains, characterized by the katG463-CGG (Arg) plus gyrA95-AGC (Ser) combination and the Rv1501 C-> A www.nature.com/scientificreports www.nature.com/scientificreports/ allelic change 34,35 . We suggest that L4/SIT393 isolated in PNG could belong to the L4.10/PGG3 lineage. Future work should focus on SNP characterization of these isolates in PNG, and related analyses should shed interesting light into diversity at the sublineage level. www.nature.com/scientificreports www.nature.com/scientificreports/ An interesting result from our analysis was the likely identification of many mixed (or polyclonal) infections, which are infections with several different genotypes/strains of Mtb in a single patient 36 . In our study, many spoligotype patterns could be assigned as MANU2 in SITVIT2, but have been assigned as "mixed" patterns in our analysis, with strains harboring superimpositions of L2 and L4 patterns (Table 1), and seven orphans showing similar patterns. It has been suggested that mixed infections in clinical specimens can challenge TB spoligotyping and sequencing techniques, and might have been the cause of false spoligotypes previously reported 37,38 . MANU2 spoligotype has been suggested to be one of the five genotypes derived from mixed infections in clinical specimens 39  Typical L2/Beijing strains are reported at high frequency worldwide and have been associated with drug resistance and increased transmissibility 42,43 . Sequencing of a 147-bp rpoB region identified seven samples as RIF-resistant, all presenting the rpoB S450L codon mutation. This is in agreement with previous findings in the Balimo/Bamu region WP 44 and in South Fly District WP 12 and other provinces in PNG 8,9 . All RIF-resistant samples belonged to the L2/Beijing Lineage, as found in Vietnam 30,45 . Molecular epidemiological studies have suggested that certain Mtb genotypes are more successful than others, including some L2/Beijing genotypes that have a higher fitness and have evolved unique pathogenic characteristics 42,46,47 . Phylogenetic analysis has provided evidence that the more recently diversified strains (so-called modern/typical Beijing L2.3) are adapted to spread and cause disease, compared to other sublineages (so-called ancient/atypical Beijing L2.2 or proto-Beijing, L2.1) 48 . On Daru Island in the South Fly District WP, a drug-resistant TB outbreak was recently characterized using WGS and was found to be driven by a Beijing L2.2.1.1 strain 12 , corresponding to a modern Beijing L2.3 in the latest classification from Liu and colleagues 48 . Unfortunately, our study did not provide any information at the sublineage level regarding the L2/Beijing strains. To complement our spoligotyping, further genomic studies of this sublineage in the Balimo/Bamu region of PNG will be needed to assess the presence of super-spreaders, association with drug resistance, or any other genetic signature of epidemiological interest.
Our study confirms the limitation that clean spoligotyping results can be obtained mostly from AFB 3+ and 2+ slides. Nevertheless, the collected information about genetic diversity is invaluable for better understanding the epidemiology of TB, and archived smear slides are of sufficient quality to collect such data, as seen in this study. It is likely that prior targeted enrichment techniques may have improved our direct genotyping results from sputum smears 49,50 . However, such methods are not yet appropriate to local processing of samples. Before more sensitive methods can be introduced in countries such as PNG, huge numbers of archived smear slides could be used for retrospective and prospective studies of Mtb diversity.
In summary we demonstrate that, in countries where conservation of samples might be an issue, and/or hospital laboratories do not have the capacity for culture or Xpert MTB/RIF-based diagnosis of TB, uncultured biological material could be used to obtain rich genomic information using archived sputum smears. Increasing local in vitro diagnostic capacities using devices such as the GeneXpert (Cepheid, USA) is important to increase awareness and improve public health. Alternatively, research on efficiency of new lab-on-chip devices, that should be as simple and inexpensive as possible to be used in level 1 laboratories and run as close as possible to the bedside of the patient, could be another way to improve TB control in rural regions of PNG.

Methods
Study setting and sample collection. BDH is located in the town of Balimo in the Middle Fly District WP, PNG. Passive case detection for TB is conducted at BDH among individuals self-presenting for health care. Presumptive TB cases presenting at BDH were from the Balimo Urban, Gogodala Rural, and Bamu Rural LLG areas, i.e. three of the five LLGs belonging to the Middle Fly District WP. Sputum samples are collected routinely for diagnostic purposes from patients with clinical signs of TB, and are ZN-stained and examined for AFB (acidfast bacilli) by light microscopy, with slides then archived and stored. Each sputum sample was smeared on several slides, with up to five slides per sputum sample (see 14 for details). Archived slides were transferred to James Cook University, in Townsville, Australia, for DNA extraction from the sputum smears.   www.nature.com/scientificreports www.nature.com/scientificreports/ DNA extraction and detection of Mycobacterium species. DNA extracts were obtained using a modified version of a previously described Chelex DNA extraction method in biosafety level 2 laboratories 21 ; the modified protocol has been published elswhere 14 . The presence of MTBC in the extracted DNA was analyzed using a TaqMan real-time PCR 51 and the molecular diagnosis results have been published elsewhere 14 . Regarding duplicate sputum smears (i.e. originating from a single patient), only the smear showing the lowest Cq value was selected for further analysis. DNA aliquots were sent to the Institute for Integrative Cell Biology (Gif-sur-Yvette, France) for high-throughput spoligotyping.
Microbead-based spoligotyping. Microbead-based hybridization was performed as previously described 10,23 using the 'TB-SPOL' kits purchased from Beamedex (Beamedex, Orsay, France; www.beamedex.com). Detection was performed using a Luminex ® 200 platform (Luminex Corp, Austin, TX, USA) and XPONENT software for LX100/LX200 (version 3.1.871.0). Interpretation of raw data was undertaken jointly by two experts (CS, GR) who idependently assessed spoligotyping patterns as previously described 52 . When no clear-cut interpretation was possible, typing was considered as failed; in other cases, intermediate quantitative signals for some spacers only allowed categorization of the samples as "mixed patterns" (see Discussion). Individual spoligotyping patterns were further compared with the International Spoligotyping Database (SITVIT2) of Pasteur Institute of Guadeloupe (http://www. pasteur-guadeloupe.fr:8081/SITVIT2/indexfr.jsp), the latest updated release of SITVITWEB database 32 . Shared International Types (SIT) and spoligotype lineages and families were assigned according to signatures provided in SITVIT2 and to the latest genome-based lineage designations, i.e. L1-L7 Lineage labels and sublineage labels when possible, according to 53 and 34 . Spoligotyping patterns absent in SITVIT2 were reported as "orphans", as well as the patterns already described as "orphans" in the SITVIT2 database with no assigned SIT. Additional exploration of the family to which strains belonged was performed using TBminer (https://info-demo.lirmm.fr/tbminer/) 54 . This tool provides assignation according to different classifications that largely overlap. Contradictory assignations between the different classifications are indicative of either newly described subfamilies or of artefacts in the genotyping data (unpublished results). Spoligotyping from smear slides has previously shown better results with AFB-positive sputa quantified as '2+' and '3+' because of higher DNA loads 55 . However, instead of using microscopy results, we took into account Cq values obtained from qPCR targeting MTBC to select samples showing high DNA loads (see 14  Samples that were reactive (Cq ≤ 38) were sequenced in both directions (Macrogen Inc., Republic of Korea). Sequence chromatograms were analysed using Geneious R10 (Biomatters Limited, Auckland, New Zealand), with the consensus sequences of each sample aligned with and compared to the MTB H37Rv reference sequence (GenBank: AL123456.3) using the "Map to Reference" Geneious tool. The consensus numbering system used for the RIF-resistance associated rpoB gene mutations is according to that described previously for MTB H37Rv 56 .
Mapping the data. Patient data was obtained from the BDH TB patient and laboratory registers, with patient locations identified as the first residential address recorded. Each patient's location was matched to a census unit, based primarily on PNG census data obtained from the PNG National Statistical Office, or alternatively from the 2012 and 2017 PNG government election polling schedules 57,58 . Latitude and longitude coordinates were obtained from the PNG National Statistical Office and census data, based on census unit-level data for villages, and averaged coordinates of multiple census units for Balimo and larger villages where the precise census unit was unknown. Instances of alternate local names were checked and confirmed locally. Out of the 162 included samples, the residential address was not able to be determined for 14 (address absent from the register for 12 patients; unknown villages 'Sarau' and 'Owa' for two patients). Maps were created using a Mac OSX Version of QGIS v2.18 Las Palmas de Gran Canaria 59 (available from: http://www.kyngchaos.com/software/qgis). A licence-free PNG map with administrative areas (level 1) was downloaded as a shapefile from http://www.diva-gis.org/gdata. ethics statement. Ethics approval was obtained from the PNG Medical Research Advisory Council and registered under the reference MRAC No. 17.02. The information sheet and written informed consent forms were available where required. However, the need for written informed consent was waived by the Ethics Committee due to generally low levels of literacy, and verbal consent was obtained in the majority of the cases. The study was conducted with the permission and support of the Middle Fly District Health Service and the Evangelical Church of PNG Health Service, and at all times permission was obtained prior to sampling activities.

Data availability
All data generated or analysed during this study are included in the published article. Published: xx xx xxxx