Genomic characterization of novel Neisseria species

Of the ten human-restricted Neisseria species two, Neisseria meningitidis, and Neisseria gonorrhoeae, cause invasive disease: the other eight are carried asymptomatically in the pharynx, possibly modulating meningococcal and gonococcal infections. Consequently, characterizing their diversity is important for understanding the microbiome in health and disease. Whole genome sequences from 181 Neisseria isolates were examined, including those of three well-defined species (N. meningitidis; N. gonorrhoeae; and Neisseria polysaccharea) and genomes of isolates unassigned to any species (Nspp). Sequence analysis of ribosomal genes, and a set of core (cgMLST) genes were used to infer phylogenetic relationships. Average Nucleotide Identity (ANI) and phenotypic data were used to define species clusters, and morphological and metabolic differences among them. Phylogenetic analyses identified two polyphyletic clusters (N. polysaccharea and Nspp.), while, cgMLST data grouped Nspp isolates into nine clusters and identified at least three N. polysaccharea clusters. ANI results classified Nspp into seven putative species, and also indicated at least three putative N. polysaccharea species. Electron microscopy identified morphological differences among these species. This genomic approach provided a consistent methodology for species characterization using distinct phylogenetic clusters. Seven putative novel Neisseria species were identified, confirming the importance of genomic studies in the characterization of the genus Neisseria.

A phylogeny reconstructed from the sequences of 51 ribosomal loci identified distinct clusters among the Nspp isolates, each separated by long branches. This also confirmed the polyphyletic relationships of the N. polysaccharea isolates (Fig. 1B). A total of 115 rMLST profiles were identified: 9 among N. gonorrhoeae isolates; 18 among N. meningitidis isolates, with an additional isolate missing two loci and for which an rMLST profile could not be defined; 39 among N. polysaccharea isolates; and 49 among Nspp isolates. A network analysis using rMLST, which included additional genomes from the remaining human-restricted Neisseria 3 , showed that these Nspp isolates were distinct from other known species (Supplemental Fig. 1).
Nine distinct clusters were observed in a maximum-likelihood phylogeny derived from the 1114 cgMLST loci with complete sequences for all isolates; the cgMLST was defined for this study as explained in the "genomic diversity" heading of the method section. These were provisionally designated Nspp 1 to Nspp 9. At least two clusters could be differentiated among the N. polysaccharea isolates, designated N. polysaccharea 1 and N. polysaccharea 2 (Fig. 1C). Phylogeographical clustering was observed in both Nspp and N. polysaccharea groups as follows: (i) all the isolates from Nspp 1 (N = 36) and Nspp 8 (N = 1) were collected in Mali; (ii) Nspp 2 (N = 20), Nspp 4 (N = 14) and Nspp 5 (N = 21) were collected in Malawi; (iii) Nspp 3 (N = 9) came from Chad; (iv) Nspp 7 (N = 2) from The Gambia; and (v) Nspp 9 (N = 1) from Germany. The only cluster with isolates from different locations was Nspp 6 collected in Malawi (N = 2) and in the UK (N = 1). Similarly, all N. polysaccharea 2 isolates were collected in The Gambia (N = 6) and Mali (N = 7), two countries of the African meningitis belt, whereas N. polysaccharea 1 included isolates both from Europe and Africa. Even within the N. polysaccharea 1 cluster, these remained polyphyletic with clusters separated by deep branches. The ten isolates from Malawi (N = 10) formed a divergent branch and the remaining three African isolates, from Mali (N = 1) and The Gambia (N = 2), clustered more closely together compared to European isolates (Fig. 1C).
Average nucleotide identity identified seven putative novel Neisseria species. The cgMLST ANI calculated for the 13 representative genomes of the different clusters varied from 93.1% (between N. gonorrhoeae and Nspp 6 and N. gonorrhoeae and Nspp 7) to 96.6% (between Nspp 1 and Nspp 2). Only three pairwise comparisons yielded values >95%, these were between: (i) Nspp 1 and Nspp 2; (ii) Nspp 1 and Nspp 9; and (iii) Nspp 2 and Nspp 9. Using a threshold of 95% ANI, these findings indicate that these should be considered as distinct bacterial species. All the remaining ANI values were lower than 95% ( Fig. 2A). Similar results were obtained using whole genome ANI (Supplemental Fig. 2A).
Using the 95% threshold, the ANI data were used to classify the Nspp isolates into seven distinct species groups, each of which assigned a proposed species name:   www.nature.com/scientificreports www.nature.com/scientificreports/ after Jenny MacLennan (nee Green) who collected and characterized all the isolates from Malawi and The Gambia included in this study (green is viridi in Latin). 7. Finally, the name "Neisseria benedictiae" is suggested for the three Nspp 6 isolates, after Julia Bennett (bennet is benedictus in Latin) who pioneered the genomic work of N. bergeri isolates and the speciation of Neisseria with cgMLST approaches (Fig. 3).
The pairwise allelic comparison of the 1114 core loci with complete sequences between each defined cluster identified that, despite sharing the same loci, most clusters had distinct alleles, with only small numbers of shared allele between clusters. Nspp 1 and Nspp 2, however, had the largest number of loci (N = 693), sharing at least one allele, and similarly for Nspp 1 and Nspp 9 (N = 145) and Nspp 2 and Nspp 9 (N = 134) (Fig. 2C), consistent with the ANI analysis suggesting their grouping into a single species.
Neisseria polysaccharea is polyphyletic. The two N. polysaccharea clusters, designated N. polysaccharea 1 and N. polysaccharea 2, had an ANI value of 94.6%, confirming that they can be considered as two distinct species ( Fig. 2A). The pairwise allelic comparison based on the two clusters found that N. polysaccharea 1 also had large number of loci sharing at least one allele with various clusters (N. polysaccharea 2; Nspp 1, Nspp 2, Nspp 4, Nspp 5 and Nspp 6), suggesting that the cluster was not homogeneous and probably included more than one species. Three additional genomes of the N. polysaccharea 1 cluster were selected to attempt to characterize this further (Fig. 1C). 95% cgMLST and whole genome ANI ( Fig. 2B and Supplemental Fig. 2B) comparison among the genomes of these 5 isolates identified another distinct cluster, N. polysaccharea 3 (Fig. 3).
Phenotypic analysis of clusters. Macroscopic analysis of bacterial colony morphology established that colonies from all isolates tested were grey, exhibited γ-haemolysis, and formed a round shape on blood agar (BAP). The putative novel species exhibited edges with a regular shape and smooth texture; however, Neisseria blantyrii (Nspp 4) visually appeared to be moister. The size range of the colonies varied from 0.1 mm to 1 mm; N. meningitidis colony size range extended to 2 mm (Table 1).

Discussion
The concept and definition of species in prokaryotes continues to be subject to debate, as do the best methods for microbial taxonomy 30,31 . Many bacterial species have been defined solely on phenotypic characteristics 4 . For example, N. meningitidis and N. gonorrhoeae, were defined by their very different clinical manifestations, although they are very closely related and do not meet the 70% DDH criterion necessary to separate them into distinct species taxonomically 32 . The integration of ecological and genetic data has greatly improved classification. With the advent of high-throughput nucleotide sequencing studies, multi locus sequence analysis (MLSA) became an approach of choice 33 . The analysis of the seven MLST loci has proven to be sufficient to distinguish N. meningitidis, N. gonorrhoeae, and N. lactamica 34 , for example, which single-locus 16S RNA sequence analysis has failed to achieve 3 . Although housekeeping genes have predominantly been used, there have been no specific guidelines on the loci or the number of loci to be chosen for MLSA. The increased availability of whole or nearly complete genome sequence data allows species characterization at a higher resolution, as demonstrated through the use of the Neisseria genus cgMLST to identify multiple Neisseria species 3 . The phylogenomic approach 35 adopted here combines rMLST, MLSA, and ANI scores, using nucleotide sequence data of core genes, supplemented with a quantitative measure of genetic relatedness ( Fig. 2A,B). This allows phylogenetic relationships to be inferred among the collection of isolates under study (Fig. 1C). The 95% cgMLST ANI cut-off identified seven distinct species (Fig. 3); however, three clusters identified by phylogenomic analyses were indistinguishable by ANI and were therefore grouped into a single species named N. bergeri sp. nov. These three clusters can be described as subspecies (Nb Subspecies 1, 2, and 3). Three putative novel Neisseria species were assigned names, as clusters contained more than three representative isolates: N. uirgultaei sp. nov., N. blantyrii sp. nov., and N. viridiae sp. nov. The remaining clusters had fewer than three isolates and therefore will not be formally named; however, the species names ("N. maigaei", "N. basseii" and "N. benedictiae") are proposed for future reference. Identical ANI results were obtained using the full draft WGS (Supplemental Fig. 2A), confirming that using genes core to these human-restricted Neisseria species provided optimal genomic resolution with exclusion of intergenic regions and accessory genes not affecting results. Congruence between ANI results and cgMLST allelic comparison ( Fig. 2A,C), suggests that allelic diversity plays an important role in the speciation process, alongside gene presence/absence. The low number of shared alleles confirms the infrequency of cross-species recombination as previously observed in a study comparing N. meningitidis, N. lactamica, and N. gonorrhoeae 36 . Analyses identified distinct clusters among the N. polysaccharea isolates included in this study: two phylogenetic clusters, N. polysaccharea 2 and N. polysaccharea 3, could be reclassified as distinct species based on the 95% cgMLST and ANI ( Fig. 2B and Supplemental Fig. 2B). However, a subset (N. polysaccharea 1) remained polyphyletic.
A major limitation when undertaking taxonomic studies is ensuring the natural diversity of the bacterial population has been appropriately sampled. Studies are often geographically limited, leading to an underestimation of diversity. The different nasopharyngeal carriage studies undertaken in the UK 29,37 , countries of the African Meningitis Belt 19,26,38 , and in Malawi (outside the belt) 28 have led to the collection of a large number of Neisseria isolates; however, the NPNs were not always characterized at the genomic level. Our results indicate a phylogeographic clustering of genotypes, with isolates from the same country clustering together (Fig. 1C). Even though most recent carriage studies, done in the UK and in African countries, used the same laboratory methodology, most of the previously non-speciated isolates were recovered in Africa with only one Nspp found in the UK, suggesting a potentially greater diversity of Neisseria species in African countries than in the UK. However, differences in the age groups targeted, the whole population (AMB and Malawi) as opposed to 15-19 years old (UK), may have played a role in the differences observed.
WGS analysis of additional carried Neisseria isolates from various geographical regions will improve the taxonomy of this genus. Isolates previously characterized as "N. bergeri", based on nucleotide sequences of the rplF locus 20 , included additional putative novel Neisseria species, such as those identified in this study. Genome sequences obtained from the Malian collection of the MenAfriCar project found that most of Nspp genomes were classified as N. bergeri sp. nov. using the 95% cgMLST ANI analysis presented here; however, one was found to form a distinct putative species, "N. maigaei". Although the sequencing of the f_rplF locus provides species identity for previously defined species, our study confirms that it does not provide sufficient resolution for the characterization of new species. Each of the clusters identified had distinct f_rplF alleles except N. benedictiae (Nspp 6) and N. basseii (Nspp 7) which shared allele 69, consistent with recent HGT or shared ancestry.
All the putative novel species and the N. polysaccharea isolates were indistinguishable using standard biochemical tests, despite being clearly distinct by 95% cgMLST ANI. However, additional tests including biochemical analyses, pH, or chemotaxonomy (analysis of chemical composition of the cell) combined with a larger numbers of isolates, including additional type strains, may identify further differences 39 . Phenotypic tests are labor intensive, and the interpretation of the results can be subjective. The Diatabs TM tests were, overall, more straightforward than API®-NH tests, which have a strict McFarland Standard concentration. Sugar degradation by the Diatabs TM were accurate for the N. meningitidis and N. lactamica controls but were elsewhere unstable with repeats giving different results. On the other hand, the degradation of sugars assessed by the API®-NH were highly reproducible, but failed to give the expected results for N. meningitidis 24 . Coccoid cells, some of which exhibiting distinct extracellular membrane structures, were visible following SEM analyses (Fig. 4). After 20 hours of incubation, most of the putative novel species had "Heart-shaped" cells (Fig. 4). These cells were still visible after 48 h of incubation (data no shown) but additional time points may establish whether these forms were intermediate shapes of coccoid cells or remained distinctive. Overall, phenotypic tests were inconclusive.
The genomic analyses presented here, allowed putative novel species predominantly found in African countries to be identified. Determining the broader implications of these findings will require more sampling, genomic characterization and analyses of the interactions of these different species in the nasopharynx. This information will determine any role that they play in modulating carriage and invasion by N. meningitidis. Future (2019) 9:13742 | https://doi.org/10.1038/s41598-019-50203-2 www.nature.com/scientificreports www.nature.com/scientificreports/ microbiome and immunological studies will be crucial to shed further light on the dynamics of carriage, but genomic characterizations of the isolates will be necessary to adequately assess the diversity. Another novel, non-human-restricted, Neisseria species has been recently identified in the US, indicating the potential for other unsampled species 40 .
Results from this study suggest that Neisseria diversity may be greater in countries outside the AMB, for example Malawi, than within the belt. This could have important implications regarding the impact of those species in modulating the carriage and invasiveness of N. meningitidis, as the disease epidemiology is known to be very different between these two regions of Africa; however, it is likely that analysis of more genome sequences of the Neisseria isolates from the AMB, such as those from the other countries sampled alongside Mali and Chad during the MenAfriCar project 20 , will improve our understanding of the diversity seen within the AMB and allow more adequate hypotheses to be formulated. The analytical methods developed here provide a framework for future speciation endeavors and can easily be repeated to incorporate additional genomes when available. It is now evident that the diversity of human restricted Neisseria had been underestimated and more carriage studies using genomic methods should be encouraged to define a more accurate picture of the genus.

Materials and Methods
Isolates and genomes. Isolate collection. A total of 107 Neisseria isolates unclassified to species level (Nspp) were identified in the pubMLST Neisseria database; some of these had been recorded as "Neisseria bergeri", following preliminary analysis 20,25,28 . The majority of the Nspp were collected in carriage studies performed in Africa and included: (i) 57 isolates collected from Ndirande, Blantyre, Malawi, from 27,28,41 ; (ii) 37 isolates collected in Mali and 9 from Chad 20,38 ; and (iii) 2 isolates obtained in Basse, Gambia 26 . Two additional isolates were collected in Europe: the first in Germany in 24 ; and the second in the United Kingdom 29 .
DNA preparation, Genome sequencing and assembly. All the African Nspp isolates were sequenced de novo using the Illumina MiSeq platform. Isolates were cultured on blood agar plates (BAP) and incubated overnight for 16-24 hours at 37 °C, in 5% CO 2 . For 96 of the isolates, genomic DNA was extracted using the Wizard genomic DNA Purification kit (Promega) following the manufacturer's instructions; for the remaining nine isolates from Chad, DNA was extracted using the DNeasy Blood and Tissue kit (Qiagen), as previously described 42 . DNA samples were sequenced by pair-end sequencing on an Illumina HiSeq, with read lengths of 100 (N = 55), 125 (N = 46) and 150 (N = 4) base pairs at the Wellcome Trust Sanger Institute, using previously described methods 43 .
Resultant short reads were assembled using the Velvet genome assembly program (v1.2.08) 44 : all odd-numbered kmer lengths between 21-99, 21-124 and 21-149 were sampled for read lengths of 100, 125, and 150 base pairs respectively, using the VelvetOptimiser software (v2.2.4) 45 to automatically establish the optimal assembly parameters for Velvet (default optimization functions used). The assembled contigs were deposited in the Neisseria PubMLST database 46 which uses the Bacterial Isolate Genome Sequence Database (BIGSdb) software 47 and to the European Nucleotide Archive (Accession numbers in Table 1). Draft genomes were automatically curated at all loci defined in the database using the BLAST algorithm 48 and a sequence similarity threshold of >98%, allowing rapid annotation of known alleles and sequences which were very similar to defined loci 43 . Manual verification was then performed for variable loci, such as those containing internal stop codons, frame shifts or those with sequence similarities lower than 98%. Incomplete gene sequences at the beginning or end of a contig were identified as such.
Additional genomes. In addition to the 105 African Nspp sequenced de novo, and the two European strains previously sequenced and deposited into the http://pubMLST.org/neisseria database, 74 other genomes deposited in the database were chosen as described below, to complete the genome collection analyzed in this study. These represented three known human-associated Neisseria species and included all the genomes available at the time of analysis for N. polysaccharea (N = 45), all but one finished genomes of N. gonorrhoeae (N = 9) and representative genomes of the most common clonal complexes of N. meningitidis (N = 20) 43 , with a preference for finished genomes when available. Additional information and meta-data for each of the 181 genomes are available in Supplemental Table 1.

Genomic diversity. Phylogenetic analysis. A hierarchical gene-by-gene approach to Whole Genome
Sequence (WGS) analysis was applied to all 181 genomes, including the analysis of: (i) f_rplf, a fragment of the ribosomal gene rplF 9 , (ii) 51 of the 53 rMLST loci (BACT0060 (rpmE) and BACT0065 (rpmI) were excluded as they are known to be paralogous in Neisseria species); and (iii) a newly defined cgMLST scheme (Human-restricted Neisseria cgMLST v1.0), including 1441 loci present in at least 95% of isolates in this collection. All the sequences were aligned using MAFFT in pubMLST 46,49 .
Two methods where used in parallel to generate the list of core loci using: (i) the Genome Comparator tool of PubMLST was used to compare all genomes, using the N. meningitidis finished genome FAM18 (Accession number: AM421808) as a reference with parameters set at a minimum identity of 70%, a minimum alignment length of 50%, a BLASTN word size of 20 base pair and a core threshold of 95% (paralogous loci were excluded from this analysis and variable loci aligned and concatenated into a single sequence for each genome); (ii) Proteinortho, assembled draft genomes were downloaded from PubMLST for each isolate and used as input files for the annotation program Prokka, using default settings with the option "-addgenes" and "-compliant" 50 . The protein sequence outputs of Prokka were used as input files for Proteinortho, with default settings, to detect orthologous group of genes 51 .
Both methods generated a list of core loci present in 95% of the genomes included in the analysis; 1421 loci were identified by Genome comparator and 1366 by Proteinortho. An inclusive list of 1441 loci including any that were considered core to 95% of isolates by both methods was generated and used in the phylogenetic analysis.
www.nature.com/scientificreports www.nature.com/scientificreports/ Within these 1441 loci, 1114 had complete sequences in all the genomes, whereas the remaining 327 fell at the beginning or the end of a contig in at least one genome and were classified as incomplete genes and excluded from the phylogenetic analysis (Supplemental Table 2).
Phylogenetic analyses were performed for f_rplF, rMLST, and cgMSLT loci using a finished genome of Neisseria lactamica (pubMLST ID: 8851/isolate ID: 020-06) as an out-group. The f_rplF and rMLST phylogenies were generated in MEGA6 using the Maximum likelihood method with the following options: General Time Reversible model 52 , 100 bootstrap replications, Gamma distributed rate with Invariant sites and 5 discrete Gamma Categories. The cgMLST phylogeny was generated in FastTree 53 using the General Time Reversible model ("-gtr") and all other defaults options. Tree annotations were performed in MEGA6 54 .
Another rMLST phylogeny was reconstructed with the Neighbor Joining model in MEGA6 using the 181 genomes previously described and all 40 additional human-restricted Neisseria species included in the paper published by Bennett et al., which included the available type strains genomes 3 .
Pairwise allelic variations at the core genes, between each identified clusters, were assessed using the allele-overlap script comparing alleles of each gene between two defined sets of genomes 55 .
Statistical methods. Average nucleotide identity. The Average Nucleotide Identity (ANI) was calculated by comparing concatenated sequences of cgMLST loci from representative isolates of each phylogenetic cluster of interest: 13 isolates for the analysis of the whole phylogeny and 5 for the further characterization of the N. polysaccharea cluster. An online ANI tool, " the ANI calculator" 56,57 was used to calculate a two-way ANI between each pair of selected isolates using the method described previously 11 . Pairs of isolates with ANI values higher than 95% were considered as indistinguishable at the species level as suggested by the program. The analysis was also computed using the whole genomes sequences of the isolates.
Morphological analysis. Macroscopic assessment. Cell morphology was characterized, and the color, texture, shape and colony contour were visually assessed after 20 h of incubation. The size of individual colonies was measured using a graduated ruler.
Scanning Electron microscopy (SEM). Sub-cultured isolates were prepared for SEM after 20 hours of incubation, following adaptation of a protocol previously described 60 . Briefly, a piece of agar with bacterial cells attached was excised and fixed with a 2.5% glutaraldehyde solution in 0.1 M Phosphate buffer and incubated at room temperature for 1 hour. Specimens were then rinsed in 0.1 M Phosphate buffer and post-fixed with 1% osmium tetroxide for 1 hour. Specimens were then rinsed in Mili-Q water and dehydrated using an ethanol series: 50% for 10 min, 70% overnight (stored at 4 °C), 90% for 10 min, 95% for 10 min, 100% for 20 min and in pure ethanol three times for 20 min each. Ethanol was removed, and the samples were dried by incubating specimens in a 50% Hexamethyldisilazane (HMDS) and 50% ethanol solution for 15 min, followed by incubation in pure HMDS for an additional 15 min. Once HMDS was removed, samples were left in a fume hood overnight to allow HMDS fumes to evaporate. Samples were mounted on SEM stubs with a conductive carbon adhesive, then coated with 10-15 nm of gold using a Quorum Technologies Q150R ES sputter coater and were imaged with a Zeiss Sigma 300 field emission gun SEM at an accelerating voltage of 2 kV.
Sugar degradation and other biochemical tests. Oxidase tests were conducted with the oxidase test disks (Sigma-Aldrich) following the manufacturer's instructions. Catalase tests were undertaken by putting the bacteria in contact with a drop of hydrogen peroxide; the formation of bubbles was considered a positive reaction. Haemolysis was assessed for each sample by visual inspection of the cells on BAP.
Diatabs TM (Rosco Diagnostica) were used according to the manufacturer's instructions to test for sugar degradation of glucose, maltose, sucrose, and lactose, and other selected microbial enzymatic properties including: the ability to enzymatically hydrolyse aminopeptidase substrates like gamma glutamyl aminopeptidase (GGA), leucine aminopeptidase and proline aminopeptidase; the hydrolysis activity of the enzyme β-galactosidase on the orthonitrophényl-β-galactoside (ONPG) substrate; and the enzymatic hydrolysis of tributyrin into butyric acid and glycerol often used for the differentiation of Moraxella catarrhalis from Neisseria species. The results were assessed following the instructions of the test (Supplemental Table 3).
The API-NH ® kit (Biomerieux) for Neisseria characterisation was also tested in parallel, following the manufacturer's instructions (Supplemental Table 4).