Community-Associated Staphylococcus aureus from Sub-Saharan Africa and Germany: A Cross-Sectional Geographic Correlation Study

Clonal clusters and gene repertoires of Staphylococcus aureus are essential to understand disease and are well characterized in industrialized countries but poorly analysed in developing regions. The objective of this study was to compare the molecular-epidemiologic profiles of S. aureus isolates from Sub-Saharan Africa and Germany. S. aureus isolates from 600 staphylococcal carriers and 600 patients with community-associated staphylococcal disease were characterized by DNA hybridization, clonal complex (CC) attribution, and principal component analysis (PCA)-based gene repertoire analysis. Of all CCs identified, 73%, representing 77% of the isolates contained in these CCs, were predominant in either the African or the German region. Significant differences between African and German isolates were found for alleles encoding the accessory gene regulator type, enterotoxins, the Panton-Valentine leukocidin, the immune evasion gene cluster, and adhesins. PCA in conjunction with silhouette analysis distinguished nine separable PCA clusters, with five clusters primarily comprising African isolates and two clusters primarily comprising German isolates. Significant differences between S. aureus lineages in Africa and Germany may be a clue to explain the apparent difference in disease between tropical/(so-called) developing and temperate/industrialized regions. In low-resource countries, further clinical-epidemiologic research is warranted not only for neglected tropical diseases but also for major bacterial infections.


Results
Characteristics of healthy participants and patients are summarized in Tables 1 and 2, respectively. The median age (range) of asymptomatic carriers (volunteers) was 18 (0-61) years and 23 (0-89) years in the African and German study sites, respectively. Patients in Africa had a median age (range) of 3 (0-71) years, those in Germany 53 (0-98) years. German patients had a higher rate of previous hospital care, or overall healthcare. German patients more frequently showed risk factors for invasive S. aureus infection (as reflected by elevated Charlson comorbidity scores), and in Germany a larger proportion of clinical isolates was obtained from blood cultures when compared to clinical isolates from Africa. African patients had a higher rate of skin and soft tissue infections, while deep invasive infections of the bone/joint or respiratory tract were more frequently reported among German patients. Patients with a history of HIV infection were only found in the African group.
Of the 1,200 S. aureus isolates, 1,190 could be assigned to 32 CCs and three singleton STs. For seven isolates, the CC could not be deduced because they belonged to new MLSTs not covered by known array profiles. These isolates were either from Africa (n = 4, ST2734, ST2744, ST2370) or Germany (n = 3, ST2733, ST2678, ST2735). Three isolates (1.3%) that were not CC attributable by Iconoclust were attributed to CCs by affinity propagation (based on their MA profiles). Figure 1 displays the distribution of CCs of isolates from African and German study sites. Except for four CCs (CC80 and CC88 in Africa, CC50 and CC398 in Germany), all CCs with at least six isolates were found in Africa as well as in Germany. For 17 of the 40 detected CCs and STs, significant geographic distribution differences were found.
16/22 (73%) of the most frequently encountered CCs, and the vast majority of isolates contained within these CCs (896/1168, 77%), were significantly (p < 0.05) predominant either in Africa or in Germany. In the subgroup of clinical isolates, CCs were again significantly (p < 0.05) predominant in Africa or Germany, respectively (with the exception of clusters of low abundance, i.e. CC6 and CC50), while of the CCs contained in the subgroup of commensal isolates, among African isolates only CC88, CC121 and CC152, and among German isolates only CC7 and CC30 were significantly predominant. In addition, we used previously published whole genome sequences of a randomly selected subset of isolates (n = 154) of this study to construct a neighbor-joining tree based on the allelic profiles of 1861 S. aureus core genome features (cgMLST, Figure S1) 21 .
On visual inspection, this analysis also shows that the majority of clusters are based on the geographical region. Clusters of isolates from infection or colonization were not detected. From the MA repertoire, all genes with a known or presumed regulatory, virulence, and/or pathogenicity role were extracted and compared between isolates from Africa and Germany (Supplementary Table 1). African isolates contain accessory gene regulators agr type I through type IV (with apparently some cross-hybridization between agrI and agrIV, as the total adds up to >100%); in contrast, the majority of German isolates are of agrI, while agrIV was rarely found. Overall, enterotoxin gene recognition was low; yet, seb hybridized positively with DNA from African isolates, while sec, sed, sel, and the enterotoxin gene cluster egc were preferentially detected in isolates from Germany. A major difference was observed with leukocidins: the genes encoding the Panton-Valentine leukocidin (PVL), lukF-PV and lukS-PV, were recognized in almost one half of African clinical strains and were virtually absent in German isolates. The edinA and edinB immune evasion genes encoding the epithelial differentiation inhibitors were more frequently found among African isolates, as were the gene isaB encoding the immunodominant antigen B and the protease gene splB. Only fragments and not the full map gene encoding the extracellular adhesive protein Eap were detected among African isolates; the gene sasG encoding the biofilm-associated surface protein G was more frequently found among African isolates (Table S1).
The majority of resistance genes were equally distributed among isolates from Africa and Germany. In general, methicillin resistance (mecA) was low in isolates from Africa (7/ , p < 0.0001). merA, ermA, and tetM also displayed a significant difference between German and African isolates, yet, at an overall rate of target recognition of less than 10%. These findings correspond well to the phenotypic resistance profiles (Supplementary Table 2); here, striking differences in phenotypic resistance could be observed for tetracycline and trimethoprim-sulfamethoxazole with a larger proportion of resistant isolates in the African population, and clindamycin, with resistance more prevalent among German isolates.
The combined PCA/silhouette analysis identified nine PCA clusters (labelled #1-9, Fig. 2). Overall, the CC attribution of the isolates corresponded to these PCA clusters, i.e. the isolates confined to a PCA/silhouette cluster could be attributed to a specific CC. Clusters with preferential composition of 'African' isolates are
Figure 1. The CCs were sorted in ascending order according to the total number of isolates in the respective CC. The proportions of clinical (red) and nasal (green) isolates in the African and German group are shown. Differences in the distribution of CCs between Africa and Germany were calculated with Fisher's exact test; *p < 0.05, **p < 0.001.
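As an illustration of the Fisher's exact test named in the figure caption, the following is a minimal sketch in Python of a two-sided test on a 2x2 table (isolates in or not in a given CC, by region). The function name and the counts in the usage example are hypothetical and are not taken from the study data; the study itself reports using R.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts: a CC found in 40 of 600 African and 10 of 600 German isolates
p_value = fisher_exact_two_sided(40, 560, 10, 590)
print(f"two-sided p = {p_value:.2e}")
```

With many CCs tested in parallel, the resulting p-values would additionally need a multiple-testing adjustment, which this sketch does not perform.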
With the Kolmogorov-Smirnov test, we identified the MA hybridization targets that distinguished the isolates in the respective clusters #1-9 from all MA hybridization signals for each isolate of the collection; these genes are denoted in Fig. 2.
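The comparison described above can be sketched, for a single hybridization target, as a two-sample Kolmogorov-Smirnov test between the signals of isolates inside one cluster and those of all remaining isolates. This is a minimal Python sketch under stated assumptions: the function name and the signal values are hypothetical, the p-value uses the asymptotic Kolmogorov series rather than an exact computation, and no multiple-testing adjustment is applied.

```python
from math import exp, sqrt

def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov test: returns (D, asymptotic p-value)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Step through the pooled values; after each distinct value, compare the
    # two empirical distribution functions and keep the largest gap.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = d * sqrt(n * m / (n + m))
    if lam < 0.3:
        # The series below converges slowly here; the p-value is essentially 1
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))

# Hypothetical normalized signals of one target: cluster members vs. the rest
in_cluster = [0.90, 0.80, 0.95, 0.85]
others = [0.10, 0.05, 0.20, 0.90, 0.15, 0.00]
d_stat, p_val = ks_two_sample(in_cluster, others)
print(f"D = {d_stat:.3f}, p = {p_val:.3f}")
```

Targets with a large D (and a small adjusted p-value) would be the candidates that distinguish a cluster from the rest of the collection.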

Discussion
Here we present a prospective, cross-sectional geographic comparative study on strictly community-associated S. aureus isolates recovered under controlled, identical conditions in Germany and Sub-Saharan Africa and demonstrate that the cluster repartition among African and German isolates is profoundly inhomogeneous.
Studies from Europe revealed CC45, CC5, CC15, CC30, and CC8 to be the most frequently encountered clusters 5,6,8,22-24 . The overall smaller studies from Sub-Saharan Africa indicate that CC5, CC15, and CC30 are prevalent 14,20 , that MSSA-CC8 has been primarily reported from North Africa (whereas MRSA-CC8 has been found in Central and South Africa), and that CC121 was more frequently reported from Sub-Saharan countries compared to Europe 2 . Methicillin-sensitive CC80 has been more frequently reported from North Africa, and may be related to the community-associated MRSA clone ST80 prevalent in Europe 25 . CC88 isolates are also typically methicillin-resistant; because this cluster has almost uniquely been recovered from African regions, it was given the name 'African clone' 2 . The PVL-positive clonal complex CC152 may also have originated from Africa 14 , expanded through central Europe, and then acquired methicillin resistance 26 . From these literature reports we conclude: first, 'typical' S. aureus clusters such as CC5, CC15, and CC30 appear to be prevalent in both Europe and Africa. Second, another set of 'typical' clones (such as CC80, CC88, or CC152) is reported from Africa rather than from Europe, yet these clones do not seem to make up the bulk of isolates recovered in a non-endemic setting. Third, and probably most importantly, clear-cut studies allowing for frequency comparison between European and African clusters are lacking. The mechanism underlying the different population structures of S. aureus from Africa and Germany is unclear. The conservation of genomic patterns (e.g. gene clusters) and a subsequent clonal expansion could account for these differences. For instance, the ΦSa2 prophage, which carries lukF-PV and lukS-PV, was integrated in the CC80 lineage on a few occasions and subsequently clonally expanded in Africa and Europe 25 .
Factors that favor the expansion of one clone in one geographic area might be associated with the bacterium itself (e.g. competition between different clones and species). However, host and environmental factors certainly also play an important role, which should be addressed in future studies.
Our cross-sectional, comparative study now shows certain CCs of isolates from Africa to be indeed significantly prevalent (CC15, CC121, CC152) or even unique (CC88, CC80) compared to Germany. On the other hand, among isolates from Germany, other CCs are either significantly prevalent (such as CC45, CC30, CC7, CC22) or unique (CC398, CC50). PCA, which avoids multiple comparisons of single target recognition, confirmed this analysis, allowing clear separation of the predominant 'African' from the 'German' clusters. Does this clonal repartition imbalance contribute to a difference in disease spectrum? In Africa, higher rates of S. aureus-related pyomyositis are reported, frequently with bone, skin, and soft tissue involvement 15,27 , at times presenting with multifocal lesions 28 . Moreover, S. aureus is particularly frequent in skin and soft tissue infections in Africa 2 . Molecular epidemiologic studies (from the US and Europe) describe CC15 (in our study, 'African'), CC30 ('German'), and CC5, CC8, CC25 (in our study, 'balanced') as associated with invasive disease 7,23,29 , yet they do not provide clear clues as to a different disease presentation as a function of predominant CCs in tropical/temperate geographic areas. Our study now provides such an indication, as the two CCs in our study significantly linked with clinical (as opposed to commensal) origin were the 'African' clones CC121 and CC152 (while the two CCs associated with nasal provenience were the 'German' clones CC45 and CC101).
In addition to the clonal repartition difference between Sub-Saharan Africa and Germany, does the gene repertoire composition contained in the respective, imbalanced CCs contribute to different disease presentation? In our analysis, agrIV was over-represented in African and agrI in German isolates, consistent with previous studies demonstrating agrIV to be prevalent in African CC121 30 . Moreover, the previously reported difference in the positivity rate for lukF-PV and lukS-PV was clearly confirmed in this study 14,15,31 . The enterotoxin gene seb was also found to be predominant in African isolates, in line with results from studies performed on isolates from remote pygmy populations 31 , particularly among isolates of CC121. Of note is the difference in recognition of the isaB target, a gene expressed only in vivo 32 whose product inhibits autophagic flux, thus allowing S. aureus to evade host degradation 33 . The proteases are also considered of importance to virulence 34 , and splB was significantly more often detected in African isolates (while splE was predominant in German isolates). Among adhesion factors, significant differences were found for map, the gene conferring extracellular adherence protein (Eap) expression 35 , and for the surface protein gene sasG. For map/eap this difference was largely attributable to a lack of recognition of the eap variant in isolates of CC152 (an African clone whose genome also failed to hybridize with the sdrC target), while for sasG the difference was mainly attributable to CC121. These observations allow the conclusion that not only the clonal attribution but also certain regulatory, pathogenicity and virulence genes are differently distributed when comparing African and German S. aureus isolates obtained from patients with community-associated infection.
The MRSA prevalence in our study was very low (nasal isolates: 2%, clinical isolates: 3%) compared to many other studies from Sub-Saharan Africa (23-55%) 36 . However, these studies should be interpreted with caution as, in contrast to our study, neither the species identification of S. aureus nor methicillin resistance was confirmed. It is therefore likely that methicillin resistance is overreported in these studies.
The low rates of methicillin resistance could also be the result of strict exclusion of nosocomial, hospital-associated cases of infection. In accordance with the phenotypic data of many African studies showing a high resistance to penicillin 30 and tetracycline (21.8-92%) 37 , we found a significant predominance of the beta-lactamase operon and of the tetracycline resistance determinants tetK and tetM in the African isolates. Moreover, the erythromycin resistance genes ermA and ermC were more frequently found in German and African isolates, respectively (in line with a recent study 38 ). In part, these findings were also confirmed by the phenotypic resistance profile demonstrating significant differences in susceptibility to tetracycline (but not to erythromycin).
This study has a number of limitations. First, the discrepancy in population age and comorbidities between the German and African cohort potentially biases the 'true' distribution of clones and genes between isolates from the different geographic regions (although application of a multiple linear regression model for the detection rate of Panton-Valentine leukocidin genes failed to provide evidence that age acts as a confounding variable [not shown]). Similarly, the imbalance in the type of infection (as shown in Table 2) between patients from Germany and Africa may also be a confounder with respect to the CC and virulence gene profile. Ex ante, we deliberately did not attempt to match patients from Germany and Africa for age, comorbidity profiles, or type of clinical disease; instead, it was our goal to compare the patient characteristics and S. aureus isolates of a typical patient population presenting for primary medical care at German and African medical centers, and to avoid a potential bias incurred by imbalanced strata sizes. Secondly, the MA technique cannot distinguish between allelic variants not recognized by hybridization and complete absence of alleles or genes (this issue has been investigated recently by our group comparing whole genome sequencing (WGS) and MA of exemplar isolates, demonstrating that both techniques are highly but not fully reliable with respect to gene/allele identification [with 1.7% WGS errors and 1.8% MA errors] 39 ). Furthermore, the amount of gene transcripts or gene products was not assayed; thus, no correlation between transcript levels and geographic isolate provenience can be inferred. Thirdly, it was not possible to quality control the reliability of clinical case ascertainment beyond instructing the clinical personnel to follow the detailed written instructions provided together with the structured questionnaires, and attribution of clinical characteristics may therefore lack scrutiny.
Fourthly, we did not engage additional study sites from Europe; therefore, our comparison is limited to German isolates. However, in contrast to MRSA, MSSA have a similar population structure across Europe 5 . As the majority of our isolates were MSSA, results from Germany could be used as a surrogate for Europe.
In conclusion, prospectively collected, community-associated S. aureus isolates obtained from asymptomatic carriers and patients demonstrate profound and significant differences between Germany and various Sub-Saharan African regions, both with respect to clonal cluster attribution and gene repertoire, and for many genes the difference between the cohorts appears to be even more pronounced when only clinical isolates from both regions are analyzed. Thus, based on the overall clonal attribution and allele repertoire, our data provide first clues to explain the purported difference in clinical presentation and course of diseases caused by Staphylococcus aureus, a pathogen of major significance in both developing and developed regions.

Methods
Study design and participants. This is a cross sectional, geographical correlation study. Wherever applicable, described definitions and items on molecular epidemiology for infectious diseases study designs were applied 40 . Between years 2010 and 2012, a total of 1200 community-associated isolates was collected in three African (Lambaréné, Gabon; Bagamoyo, Tanzania; Manhiça, Mozambique) and three German study sites (Homburg, Freiburg, Münster). Every study site collected 100 non-duplicate isolates of healthy asymptomatic carriers. Exclusion criteria were (i) hospitalization within the past four weeks, (ii) antibacterial treatment within the past four weeks, and (iii) antituberculous treatment in the past four weeks. In addition, 100 clinical non-duplicate isolates were collected from human infection at each study site. The inclusion criteria were (i) clinical suspicion of infection by the treating physician, and (ii) community-onset of disease (outpatient clinic, or <48 h after admission). Clinical data were systematically recorded, electronically transmitted to the Freiburg study site, and checked for data consistency.
Ethical approval was obtained from the Ministry of Health and Social Welfare of Tanzania (A 81-
Isolate collection and microbiological methods. Nasal swabs from asymptomatic carriers and appropriate specimens from infection sites were collected, and species identification performed by standard methods and confirmed at the Homburg site by MALDI-TOF (BRUKER Daltonics GmbH, Bremen, Germany). Antimicrobial susceptibility testing was performed at the various study centers using standard techniques (Clinical and Laboratory Standards Institute, M100).
All isolates were transferred into storage tubes and shipped on dry ice to a central sample repository (Fraunhofer IBMT, Sulzbach, Germany) for long-term storage at −140 °C.
DNA microarray-based genotyping and MLST. All isolates were genotyped using the IdentiBAC® DNA microarray (MA, Alere Technologies GmbH, Jena, Germany). DNA extraction (Qiagen, Hilden, Germany) and hybridization were performed according to the manufacturer's instructions. Spot signals were analyzed using the ArrayMate® reader and corresponding Iconoclust® software (Alere Technologies GmbH, Jena, Germany), which attribute specific multilocus sequence typing (MLST) clonal complex (CC) and sequence type (ST) designations. MLST was carried out for samples that were not assigned by the MA 41 .
CC assignment confirmation and statistics. Correctness of the CC identification by MA was confirmed by WGS of 154 exemplars 39 defined by affinity propagation 42 . Principal component analysis (PCA) was performed to represent the isolate genotype in a two-dimensional projection. Given a large set of data, PCA identifies a small number of uncorrelated variables (termed principal components) that explain the maximum amount of variance in the data. In particular, the first two variables termed PCA1 and PCA2 describe the largest and second-largest variance in the data (Fig. 2).
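To make the two-dimensional projection concrete, the following is a minimal Python sketch of PCA restricted to the first two principal components. It is not the R procedure used in the study: the function names are hypothetical, the components are found by power iteration on the sample covariance matrix (an assumption chosen to keep the example dependency-free), and the small input matrix is invented for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pca_2d(rows, iters=500, seed=0):
    """Return (scores, variances) for the first two principal components.

    Assumes the data vary in at least two independent directions.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in rows]  # center columns
    # Sample covariance matrix (p x p)
    cov = [[sum(xi[a] * xi[b] for xi in x) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    comps, variances = [], []
    for _ in range(2):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            w = [dot(row, v) for row in cov]  # multiply by the covariance
            for c in comps:                   # stay orthogonal to earlier PCs
                proj = dot(w, c)
                w = [wi - proj * ci for wi, ci in zip(w, c)]
            norm = dot(w, w) ** 0.5
            if norm < 1e-12:
                break
            v = [wi / norm for wi in w]
        comps.append(v)
        # Variance explained by this component (its eigenvalue)
        variances.append(dot(v, [dot(row, v) for row in cov]))
    scores = [[dot(xi, c) for c in comps] for xi in x]
    return scores, variances

# Hypothetical isolates (rows) by hybridization targets (columns), 1 = detected
profiles = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0], [1, 0, 1, 0]]
scores, variances = pca_2d(profiles)
for (pc1, pc2) in scores:
    print(f"PCA1 = {pc1:+.2f}, PCA2 = {pc2:+.2f}")
```

The sign of each component is arbitrary, so mirrored plots of the same data are equivalent.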
Statistical analysis (Kolmogorov-Smirnov test, Hommel p-value adjustment) was used to determine genotypic differences between the isolate clusters defined by PCA. All comparisons were statistically analyzed by the chi-square test adjusted for multiple testing (Hommel p-value adjustment). Chi-square, multivariate, and principal component analyses were performed with the software "R", version 3.2.0. Silhouette analysis was carried out to determine the number of different isolate clusters in the PCA, and was performed with "R", version 3.2.2, using the function silhouette from the package "cluster", version 2.0.3, with default parameters.
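How a silhouette analysis helps to choose the number of clusters can be sketched as follows. This Python sketch is an independent illustration, not the R "cluster" package implementation used in the study: the function name is hypothetical, Euclidean distance is assumed, and the points stand in for invented PCA1/PCA2 coordinates. For each point, the silhouette width is (b − a) / max(a, b), where a is the mean distance to its own cluster and b the mean distance to the nearest other cluster; the partition with the highest average width is preferred.

```python
def silhouette_widths(points, labels):
    """Silhouette width of every point; requires at least two clusters."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # singleton clusters get width 0 by convention
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return widths

# Hypothetical 2D coordinates and two candidate partitions
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
for name, labs in [("left/right", [0, 0, 1, 1]), ("alternating", [0, 1, 0, 1])]:
    w = silhouette_widths(pts, labs)
    print(name, "mean silhouette width:", round(sum(w) / len(w), 3))
```

Repeating this for partitions with different numbers of clusters and keeping the one with the largest mean width is one common way to settle on a cluster count.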