Rapid MinION metagenomic profiling of the preterm infant gut microbiota to aid in pathogen diagnostics

The Oxford Nanopore MinION sequencing platform offers direct analysis of DNA reads as they are generated, which combined with its low cost, low power and extremely compact size, makes the device attractive for in-field or clinical deployment, e.g. rapid diagnostics. We employed the MinION platform for shotgun metagenomic sequencing and analysis of mixed gut-associated microbial communities; firstly, we used a 20 species human microbiota mock community to show that Nanopore metagenomic data can be classified reliably and rapidly. Secondly, we profiled bacterial DNA isolated from faeces from preterm infants at increased risk of sepsis and necrotising enterocolitis to analyse their gut microbiota. Using longitudinal samples, and comparing Illumina to MinION, we captured the diversity of the immature gut microbiota and observed how its complexity changes over time in response to interventions, i.e. probiotic, antibiotics and episodes of suspected sepsis. Finally, we performed a ‘real-time’ run from sample to analysis using a faecal sample of a critically ill infant. Real-time analysis was facilitated by our new NanoOK RT software package. We determined that we can reliably identify potentially pathogenic taxa (i.e. Klebsiella pneumoniae) along with corresponding AMR gene profiles in as little as one hour, post sequencing start. Furthermore, data obtained revealed insights into how antibiotic treatment decisions may be rapidly modified in response to specific AMR profiles, which was validated using pathogen isolation, whole genome sequencing and antibiotic susceptibility testing. Our results demonstrate that MinION sequencers offer the ability to progress from clinical samples to a potential tailored patient antimicrobial treatment in just a few hours.


INTRODUCTION
Next generation sequencing (NGS) has revolutionised profiling of environmental and clinical microbial communities. In particular, the culture-independent, sensitive, data-rich nature of metagenomic sequencing, combined with powerful bioinformatics tools, has allowed researchers to start to differentiate patient groups from healthy individuals based on their microbial profiles [1][2][3][4][5]. Microbiota-linked health effects include the ability to support immune system development, facilitate dietary metabolism, modulate the metabolome and provide antimicrobial protection [6][7][8]. Disturbances in the microbiota caused by external influences such as antibiotics have been associated with an increased risk of diseases including ulcerative colitis 9 , obesity, and autoimmune conditions 10,11 , and, within an infectious disease context, an increased risk of pathogen overgrowth 12 .
NGS platforms are often large capital investments with a considerable physical footprint, and generate huge quantities of data using a pipeline which combines many samples into a single sequencing run, which can take days or weeks to run and analyse. These are not ideal attributes for clinical settings. In contrast, the MinION platform released by Oxford Nanopore Technologies in 2014 is a pocket-sized sequencing device powered from a laptop's USB port and capable of producing gigabase yields of long reads in real time (see recent review of MinION technology and its applications 13 ). This low cost ($1000) platform may give scientists working within a clinical setting a readily portable and easy-to-use NGS platform that shows great potential for rapid pathogen screening, diagnosis, and treatment strategy design. Crucially, the device operates in 'real-time', allowing sequenced reads to be analysed immediately after they are generated. However, due to the 'real-time' nature of the platform and its differing error profile, it is critical to develop new bioinformatics pipelines to take full advantage of the data, particularly in the clinical arena.
The huge rise in antimicrobial resistance (AMR) [14][15][16] highlights the need for rapid methodologies to identify at-risk individuals, diagnose infectious agents and evaluate treatments. The MinION may represent such a technology for clinical and healthcare settings. One such at-risk patient cohort is premature infants, defined as born before the start of the 37th week of pregnancy, and accounting for 1 in 10 live births globally, with this number increasing 17 . These infants are born with underdeveloped gut physiology and immunity, which is associated with an increased risk of life threatening infections 18 . Furthermore, preterm infants are often born via Caesarean-section and receive multiple antibiotic courses during their hospitalisation that deplete their gut microbiota, including the beneficial genus Bifidobacterium, that may correlate with overgrowth of pathogens linked to diseases such as necrotising enterocolitis (NEC) 19,20 . Notably, this disease may be difficult to diagnose in its early stages and is often associated with fulminant deterioration. Pathogens currently linked to NEC include Clostridium perfringens and Klebsiella pneumoniae 21 . From a diagnostic perspective it is therefore important to be able to confidently detect (i) microbes at the species level to provide accurate diagnosis, (ii) species abundance within the microbiota (as these bacteria can also reside within the wider community, but not cause disease when at low levels) and (iii) AMR gene repertoires.
Despite technical challenges, MinIONs have been successfully used in medical research on low complexity samples. To date this includes surveillance of the Ebola outbreak in West Africa (using RT-PCR amplicons) 22 , characterisation of bacterial isolates 23 and DNA spiked into patient samples at known levels 24 or from samples heavily (~90%) infected with a single species 25,26 . Diagnostics in metagenomics samples is still challenging due to lower MinION sequence yields and accuracy, but is essential as most clinical samples are complex. Here we demonstrate MinION based metagenomics, first in a controlled (i.e. mock community) setting and then with clinical samples including in 'real time'. These studies allowed us to determine longitudinal microbiota profiles, gut-associated pathogens linked with sepsis or NEC and their AMR profiles. We benchmarked these data against conventional Illumina sequencing, whole-genome sequencing (WGS) on pathogen isolates and phenotypic antibiotic susceptibility testing.

Accurate classification of a microbial mock community using MinION sequencing
We benchmarked the accuracy of MinION technology by profiling a bacterial mock community of a staggered abundance mixture (HM-277D, BEI Resources). Sequencing was carried out on R7.3 flowcells using a "2D" library protocol that involves sequencing both strands of DNA. One flowcell produced 148,441 total reads, with 71,675 reads passing the default quality filter, a mean read size of 3,047 bp (read size is driven largely by the size of input DNA molecules) and a longest read of 40,561 bp (Table 1). Reads were analysed with NanoOK 27 and produced alignments to the 20 reference sequences with 82-89% identity (consistent with previous published analysis 28 ), including long error-free sequences of up to 223 bases (the full NanoOK report is available at https://github.com/richardmleggett/bambi). Coverage of each genome ranged from almost 0x (8 reads of Actinomyces odontolyticus) to 13x (7,695 reads of Streptococcus mutans), which is consistent with their mock concentrations (Supplementary Table 5). We benchmarked against 'conventional' (Illumina based) metagenomic sequencing using 1 million reads (equivalent to a MiSeq nano flowcell). To simulate an unknown community, reads were BLAST searched against the NCBI nt database and then classified taxonomically using MEGAN6 29 . This showed broadly similar abundance levels across both platforms, but in some cases MEGAN's Lowest Common Ancestor algorithm was able to assign a greater proportion of Nanopore reads to species level rather than genus or family, while in other cases a greater proportion of Illumina reads could be assigned to species (Fig. 1a). This apparent contradiction likely arises because the longer length of Nanopore reads provides better specificity in some cases, whereas the errors these longer reads contain may lessen specificity in others, depending on the sequence distance between two species within the same genus.
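The idea behind a Lowest Common Ancestor assignment can be illustrated with a deliberately naive sketch. It assumes each database hit for a read is represented as a root-first lineage tuple; the function name and the example lineages are hypothetical, and MEGAN's actual algorithm applies additional score and support thresholds that are not modelled here.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxonomic path shared by every hit of one read.

    `lineages` is a list of root-first tuples, one per hit (an assumed,
    simplified representation; not MEGAN's internal data model).
    """
    if not lineages:
        raise ValueError("at least one lineage is required")
    common = []
    # zip(*lineages) walks the ranks in parallel, from the root downwards
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            common.append(ranks[0])
        else:
            break  # hits disagree from this rank on
    return common


# Hypothetical example: two hits that agree only down to genus level
hits = [
    ("Bacteria", "Firmicutes", "Streptococcus", "Streptococcus mutans"),
    ("Bacteria", "Firmicutes", "Streptococcus", "Streptococcus sobrinus"),
]
assignment = lowest_common_ancestor(hits)  # ends at "Streptococcus"
```

A read whose hits all point to one species keeps its species-level assignment, whereas a read with hits to several species in a genus is pushed up to genus level, which is one way sequencing errors can reduce specificity.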
Comparing MinION and Illumina species abundance resulted in a Pearson correlation coefficient of 0.91 (Fig. 1b), which is comparable to previously published work assessing mock community composition on Illumina technology 30 .
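For readers who wish to reproduce this kind of comparison, a minimal sketch of the Pearson calculation follows. The abundance values are invented for illustration and are not the study's data; the function name is also an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-species relative abundances (%), one entry per species,
# listed in the same species order for both platforms
minion = [40.0, 25.0, 20.0, 10.0, 5.0]
illumina = [38.0, 27.0, 18.0, 12.0, 5.0]
r = pearson(minion, illumina)
```

Both lists must list the species in the same order, with zero entered for any species that a platform did not detect.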

Monitoring microbial disturbances in the preterm gut microbiota using MinION
We next sought to determine the relevance of MinION technology in a more clinical context: profiling at-risk preterm infants. We studied faecal samples from one preterm patient (P10) from days 13, 28 and 64 after birth (samples P10N, P10R and P10V respectively, Fig. 2a), and compared the MinION platform (version R7.3) to Illumina shotgun and Illumina 16S rRNA gene amplicon sequencing. We initially ran one flowcell for P10N, but it was necessary to run two flowcells for P10R and P10V in order to generate sufficient yield on this earlier MinION chemistry. This study generated between 145,342 and 234,453 reads per sample across 5 flowcells (Table 1). We confirmed that the MinION sequencing depth was sufficient to capture the complete species diversity of the samples by computing rarefaction curves (Supplementary Fig. 1). For all sequencing technologies and samples, the vast majority of species diversity was captured by ~20,000 reads. This analysis indicated that the yield and accuracy of R7.3 MinION flowcells were sufficient for analysis of low bacterial diversity preterm samples 31 .
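A rarefaction curve of this kind can be sketched as follows. This is a simplified illustration, assuming each read has already been assigned a taxon label; it uses a single random ordering of reads, whereas published rarefaction analyses typically average over many subsamples. The function name and toy data are hypothetical.

```python
import random


def rarefaction_curve(assignments, step, seed=0):
    """Count distinct taxa seen as more reads are sampled.

    `assignments` holds one taxon label per read. Returns a list of
    (reads_sampled, distinct_taxa) points taken every `step` reads,
    plus a final point at the full read count.
    """
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    shuffled = list(assignments)
    rng.shuffle(shuffled)
    seen = set()
    curve = []
    for i, taxon in enumerate(shuffled, start=1):
        seen.add(taxon)
        if i % step == 0 or i == len(shuffled):
            curve.append((i, len(seen)))
    return curve


# Toy data: 100 reads spread across three taxa (not the study's data)
reads = ["Klebsiella"] * 50 + ["Enterobacter"] * 30 + ["Bifidobacterium"] * 20
curve = rarefaction_curve(reads, step=10)
```

The curve flattens once additional reads stop revealing new taxa, which is the behaviour described above at roughly 20,000 reads.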
When we compared MEGAN's taxonomic assignments obtained using MinION vs. Illumina data we again observed that the results were comparable for the majority of the bacterial genera present in preterm infant P10 (e.g. Klebsiella, Enterobacter, Veillonella, Staphylococcus and Bifidobacterium) (Fig. 2b). Interestingly, for sample P10N (collected when the infant was receiving probiotic supplementation) both platforms indicated presence of one probiotic strain, Bifidobacterium bifidum. MinION and Illumina also detected the presence of Enterobacter cloacae (in sample P10R), a notorious pathogen causing late onset sepsis (LOS) in preterm infants 32 , which correlated with the clinical diagnosis of suspected sepsis (Fig. 2a, however no clinical microbiology testing was carried out to confirm this). Corresponding 16S rRNA gene data (Supplementary Fig. 2) provided similar profiles to our shotgun results (albeit with different abundances), but unsurprisingly the short 16S rRNA gene reads failed to differentiate some bacteria taxa even at genus level e.g. members of the family Enterobacteriaceae, which comprises commensal gut bacteria as well as opportunistic pathogens and whose full-length 16S rRNA genes are often indistinguishable from one another. Overall, these data highlight the potential for MinION technology and shotgun sequencing to be used in a clinical setting to confirm the impact of microbiota intervention studies (i.e. probiotics) and elucidate potential causative pathogens which may lead to clinical infection diagnoses.
Administration of antibiotics can lead to disruption of early colonisation by gut microbes, and may also contribute to the reservoir of AMR genes, the 'resistome' 33 . Here, we also determined the AMR profile in preterm infant P10 using the Comprehensive Antibiotic Resistance Database (CARD) and compared MinION sequencing data to HiSeq Illumina data. Supplementary Fig. 3 presents a summary of the AMR genes detected using both sequencing technologies. Overall, if we classify the AMR genes according to mode of action, the detection efficiency of MinION and Illumina was comparable, with only four genes with unique resistance mechanisms (mphC, fusB, sat-4 and vanRG) detected exclusively by Illumina. Focusing on gene abundance, four main groups (efflux pumps, β-lactamases, aminoglycosides and fluoroquinolones) were particularly prevalent (Supplementary Fig. 4). Importantly, MinION technology was able to detect AMR genes specific for certain species, such as ileS, encoding mupirocin 34 resistance in Bifidobacterium (sample P10N), or fosA2 35 , encoding fosfomycin resistance, which is specific for E. cloacae (sample P10R, Supplementary Table 1). These data illustrate the ability of MinION technology to detect the pool of AMR genes present in the samples tested, and also to determine species-specific AMR genes.

Use of MinION for rapid characterisation of gut-associated pathogenic bacteria in a preterm infant with necrotising enterocolitis
Having successfully detected a specific beneficial bacterium (i.e. B. bifidum after probiotic supplementation) and a potential opportunistic pathogen (i.e. E. cloacae) using MinION technology (Fig. 2b), we next performed a 'real time' run to evaluate how rapidly the MinION could do this in a clinical setting. Current rapid clinical microbiology tests, including determining antibiotic susceptibility, take between 36 and 48 h. We performed a 'real-time' run (from sample preparation to data analysis) using the most up-to-date MinION R9.4 flowcells on a faecal sample from a preterm infant (P8) clinically diagnosed with suspected NEC (Bell stage 1A) (Supplementary Fig. 5). Notably, this infant was exposed, before sample collection, to 43 days of non-concurrent antibiotic treatments (i.e. benzylpenicillin, gentamicin, meropenem, tazocin, vancomycin, flucloxacillin, metronidazole and amoxicillin).
We executed the 'real-time' run and timed all stages used for this pipeline including sample preparation (90 min), DNA quality control (45 min), 2D MinION library preparation and loading onto the MinION flowcell (2 h), and sequencing-and-data analysis (40 min for first non-specific AMR hit) (Fig. 3). Thus, the overall time from initiating DNA extraction to obtaining first clinically relevant data was less than 5 h. In this run, we also evaluated two different basecalling approaches for the nanopore sequencing. Initially we ran ONT's Metrichor cloud-based service, but we encountered significant time delays on the experimental day. Having generated 787,601 reads in just over 11 h, but basecalled only a small proportion of these, we then abandoned the attempt and reloaded the same flowcell using MinKNOW's local basecalling option, which was able to keep pace with read generation. Using this approach, we generated 423,422 reads in approximately 23.5 h, and it is these that we considered in the following analysis. Shortly after completing the experiment, ONT announced the discontinuation of the Metrichor basecalling service, explaining the lag we had observed.
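The stage timings above can be checked with a short calculation. The sketch below uses only the durations stated in the text; the names are illustrative. It also shows how a time after sequencing start converts to a total elapsed time, as quoted in the abstract and elsewhere in this article.

```python
# Durations (minutes) of the stages that precede sequencing, from the text
PREP_STAGES_MIN = {
    "sample preparation": 90,
    "DNA quality control": 45,
    "library preparation and loading": 120,
}
FIRST_AMR_HIT_MIN = 40  # sequencing and analysis until the first AMR hit


def total_elapsed(minutes_after_sequencing_start):
    """Minutes since the start of sample preparation."""
    return sum(PREP_STAGES_MIN.values()) + minutes_after_sequencing_start


def fmt(minutes):
    """Format a duration in minutes as 'H h M min'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} h {mins} min"


first_hit = fmt(total_elapsed(FIRST_AMR_HIT_MIN))  # 4 h 55 min, i.e. under 5 h
one_hour = fmt(total_elapsed(60))                  # 5 h 15 min
```

The three preparatory stages sum to 4 h 15 min, so every total quoted for a sequencing timepoint is that timepoint plus 4 h 15 min.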
The sequenced reads which successfully passed the quality filter (mean quality 9 or greater, giving 230,494 reads, Table 1) were analysed using the NanoOK RT pipeline, which groups reads into batches of 500 for analysis against the NCBI nt and CARD databases (see Online Methods for further details). The first 500 reads immediately indicated a dominance (230 reads) of K. pneumoniae (a potential causative organism that has been associated with NEC pathogenesis in preterm infants 36 ), as well as Proteus mirabilis (42 reads). By 1 h after sequencing start (5 h 15 min total), the pipeline had analysed 18,000 pass reads and K. pneumoniae accounted for around 75% of reads. To further verify that we had sequenced enough of the bacterial diversity existing in the sample at this time point, we compared taxonomic profiles from analysis completed at 6 h (91,000 reads, 10 h 15 min total time) and at 26 h (229,500 reads, 30 h 15 min total time, sequencing finished and all analysis complete). This comparison verified that there were no significant qualitative differences between the three taxonomic profiles (Fig. 4a).
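The batching strategy can be sketched in a few lines. This is not the NanoOK RT implementation: it assumes reads arrive as dictionaries with a `mean_quality` field, and the `classify` callable is a hypothetical stand-in for the BLAST search and taxonomic assignment step. Only the quality threshold (9) and batch size (500) are taken from the text.

```python
from collections import Counter
from itertools import islice


def batches(reads, size=500):
    """Yield successive lists of up to `size` reads from any iterable."""
    iterator = iter(reads)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def running_profile(reads, classify, min_quality=9.0, batch_size=500):
    """Yield (reads_analysed, cumulative_taxon_counts) after each batch.

    `classify` maps one read to a taxon label (hypothetical placeholder).
    """
    passed = (r for r in reads if r["mean_quality"] >= min_quality)
    totals = Counter()
    for chunk in batches(passed, batch_size):
        totals.update(classify(r) for r in chunk)
        yield sum(totals.values()), dict(totals)


# Toy input: 1,100 reads, 100 of which fail the quality filter
toy_reads = (
    [{"mean_quality": 10.0, "taxon": "K. pneumoniae"}] * 700
    + [{"mean_quality": 5.0, "taxon": "unclassified"}] * 100
    + [{"mean_quality": 9.0, "taxon": "P. mirabilis"}] * 300
)
snapshots = list(running_profile(toy_reads, classify=lambda r: r["taxon"]))
```

Because the profile is emitted after every batch, a dominant organism becomes visible as soon as the first few hundred reads are classified, without waiting for the run to finish.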
As highlighted previously, it is clinically important to detect AMR genes in metagenomic samples from preterm infants to guide appropriate antibiotic prescription. In our 'real-time' run we determined how rapidly we could map AMR genes from the CARD database with respect to sequencing depth over time. Fig. 4b shows the classes of AMR genes detected throughout the run. Several classes, including quinolone resistance, β-lactamases and efflux pumps (similar to observations in preterm infant P10), were detected in as little as 1 h after sequencing start. We were able to detect K. pneumoniae-specific SHV variants 37 as early as 1 h 8 min (at 21,000 reads, 5 h 23 min total time), whilst other lower abundance AMR genes in the sample, such as those conferring trimethoprim, streptothricin and colistin resistance, were not detected until 3-9 h post sequencing start (7-13 h total).
As MinION reads are typically longer than Illumina reads, we reasoned we could extract additional information by examining flanking sequences either side of each AMR hit and searching the NCBI nt database for hits that were independent (defined as ≥ 50 bp) from the AMR sequence. This 'walkout' study indicated that the majority of AMR genes within the whole metagenomic sample mapped to K. pneumoniae (Supplementary Fig. 6, Supplementary Table 2), including multidrug exporters such as acrB or oqxAB, associated with K. pneumoniae and conferring resistance to tetracycline, chloramphenicol, and fluoroquinolones. We also correlated specific AMR gene cassettes, including aadA, which can confer aminoglycoside resistance, to P. mirabilis. These data indicated that relevant AMR genes detected in a metagenomic sample and further mapped to known pathogenic species may facilitate tailored antibiotic treatment strategies for critically ill patients.
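One possible reading of the independence criterion is sketched below: a taxonomic hit on the same read counts towards the host species only if at least 50 bp of it lie outside the AMR gene alignment. This interpretation, the half-open coordinate convention and all names are assumptions made for illustration, not the authors' implementation.

```python
def independent_bases(hit_start, hit_end, amr_start, amr_end):
    """Bases of [hit_start, hit_end) that fall outside [amr_start, amr_end)."""
    overlap = max(0, min(hit_end, amr_end) - max(hit_start, amr_start))
    return (hit_end - hit_start) - overlap


def walkout(amr_hit, taxon_hits, min_independent=50):
    """Return taxa whose hits extend far enough beyond the AMR alignment.

    `amr_hit` and each entry of `taxon_hits` carry read coordinates under
    the keys "start" and "end" (a hypothetical record layout).
    """
    return [
        hit["taxon"]
        for hit in taxon_hits
        if independent_bases(
            hit["start"], hit["end"], amr_hit["start"], amr_hit["end"]
        ) >= min_independent
    ]


# Toy example on a single read (coordinates are invented)
amr = {"gene": "SHV", "start": 100, "end": 400}
candidates = [
    {"taxon": "plasmid vector", "start": 90, "end": 410},   # 20 bp outside
    {"taxon": "K. pneumoniae", "start": 0, "end": 600},     # 300 bp outside
    {"taxon": "P. mirabilis", "start": 380, "end": 450},    # 50 bp outside
]
hosts = walkout(amr, candidates)
```

Requiring bases outside the AMR alignment avoids attributing a gene to a species merely because the database entry for that species contains the same gene.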

Isolation of Klebsiella pneumoniae colonies from preterm infant P8 and phenotypic characterisation of antibiotic resistance
To validate the genotypic data obtained from our 'real time' MinION nanopore run, we isolated nine K. pneumoniae colonies from patient P8, and performed alignment analysis on their 16S rRNA gene sequences, which showed similarity levels ranging from 99.8% to 100% (Supplementary Table 3). To further confirm our metagenomics findings, we performed WGS on one K. pneumoniae isolate (assembled using Prokka, 69 contigs, ~5.7 Mb genome, 57% GC and 23.5x coverage). Notably, we observed five AMR genes (FosA5 [encoding fosfomycin resistance], acrA [efflux pump], oqxA, oqxB [quinolone resistance], and several different SHV allelic variants [encoding extended-spectrum β-lactamases, ESBLs]), which matched the genes identified from the MinION data, suggesting that metagenomic sequencing and our 'walk-out' analysis were sufficient to detect relevant AMR genes (Supplementary Fig. 8).
In order to demonstrate antibiotic resistance phenotypes correlating to presence of AMR genes we tested the susceptibility of the same K. pneumoniae isolate with the seven most commonly used antibiotics in NICUs (Supplementary Fig. 9). Interestingly, the K. pneumoniae isolate was found to have a higher minimum inhibitory concentration (MIC) breakpoint value for those antibiotics that were prescribed to the preterm infant P8 (i.e. benzylpenicillin, amoxicillin, metronidazole, gentamicin and vancomycin, Supplementary Fig. 5). In contrast, the MIC breakpoint values obtained for antibiotics that were not administered to the infant (i.e. cefotaxime) were lower and similar to those put forward by the European Committee on Antimicrobial Susceptibility Testing (EUCAST 38 ). These data correlate with the AMR genes determined from the MinION and WGS data; SHV variants encode resistance to β-lactam antibiotics (i.e. benzylpenicillin and amoxicillin). Interestingly, this K. pneumoniae isolate was also found to be vancomycin resistant by MIC testing, but we did not detect any specific vancomycin resistance genes encoded in its genome. This resistance may therefore be explained by the presence of various multidrug exporter genes such as acrA and oqxAB (Supplementary Fig. 9).

DISCUSSION
There are pressing clinical needs for faster microbial diagnostics and tailored, rather than widespread broad spectrum, antibiotic use. Gold standard microbial laboratory screening is limited by the growth rates of possible pathogens on selective media and the combinatorial tests that can be conducted. While microbiological tests take days, newer sequencing platforms, e.g. Illumina, could deliver results faster and provide rich data that can also be used to track hospital epidemics 39 , but they require specialised laboratories and large capital investment. The new MinION sequencer seems an ideal tool because it is smaller, faster and cheaper than Illumina machines. Using a combination of improved Nanopore sequencing chemistries, and our own open source Nanopore analysis packages, NanoOK RT and NanoOK Reporter, we showed that the MinION platform is able to successfully profile known metagenomes (i.e. mock community), and even clinical samples. Importantly, MinION sequencing data using the new R9.4 flowcells were comparable in discriminatory power to the conventional Illumina sequencing platform, and provided clinically relevant information within just 5 h from sample receipt. We were able to demonstrate three clinically relevant, and actionable, pieces of data: 1) abundance of microbial species present in the sample, 2) overall antibiotic resistance gene profile and 3) species-specific antibiotic profiles.
Our first findings, using a known metagenomic sample (mock community), indicated that the MinION could be suitable for detection of microbes from a mixed clinical sample and so we next tested longitudinal samples from a preterm infant residing in NICU. We found MinION and Illumina analyses were comparable in discriminating species and indicated one of the probiotic strains (i.e. B. bifidum) during the supplementation period was present (Fig. 2b). Furthermore, these data provided information on probiotic therapy indications after antibiotic administration; absence of these beneficial species may correlate with a 'disturbed' microbial profile 40 . Specifically, E. cloacae completely dominated the microbiota (sample P10R) and is a known sepsis pathogen (able to disseminate to the blood) in immunocompromised, including preterm, patients 41 . Therefore, profiling opportunistic gut-associated pathogens may provide key clinical data to prevent subsequent systemic infections.
One of the critical tests carried out by clinical microbiology labs is the detection of pathogens and their antibiotic sensitivities. This is particularly pertinent due to the increasing prevalence of antibiotic-resistant bacteria and lack of novel antibiotic development. We believe MinION sequencing may provide a rapid tool for crafting bespoke antibiotic treatment strategies in time-critical patients. When we investigated AMR genes detected by MinION and Illumina, both sequencing platforms generated reads mapping to genes with similar antibiotic resistance mechanisms (Supplementary Fig. 3), and only 4 genes (mphC, fusB, sat-4 and vanRG) out of 146 genes were detected exclusively by Illumina with unique resistance mechanisms. This result may be correlated to the lower MinION read count, and so could be mitigated by improvements in MinION technology for subsequent studies. Notably, we observed presence of AMR genes that corresponded to prescribed antibiotics; β-lactamase and aminoglycoside genes conferring resistance to benzylpenicillin and gentamicin, while fluoroquinolone resistance genes did not correlate to any prescribed antibiotics, which may relate to AMR gene transfer of strains from other sources (Supplementary Fig. 4). Thus, AMR profiling may guide clinical treatment decisions at an earlier stage of patient care.
Whilst our comparisons of preterm gut-associated microbial profiles highlighted the significant scope for MinION in a relevant clinical context, we wanted to demonstrate that the entire pipeline of sample preparation, library construction, sequencing and analysis could be carried out rapidly and in 'real-time'. Importantly, we used the most recent flowcells (R9.4) for this study, which have an improved error rate (~8% for 1D, and 4% for 2D reads) and yield. Specifically, for this study we added significant functionality to NanoOK for the 'real-time' analysis of species abundance and antibiotic resistance genes. The sample chosen for this run-through was from an extremely ill preterm infant P8 (born after 26 weeks' gestation with a birthweight of only 508 g), who had received multiple courses of antibiotics since birth (46 days of antibiotic treatment out of 63 days of life at sample collection), and presented with clinical NEC observations at the time of sample collection. Whilst MinION allows generation of long reads, DNA extraction to maximise this type of data is more time-consuming, thus in this study we utilised a rapid DNA extraction protocol (including a bead-beating step). Furthermore, previous studies, including our own (Alcon-Giner et al, unpublished), indicate that incomplete DNA extraction significantly biases metagenomic profiles obtained 42 , which may in turn limit pathogen detection and AMR analysis. Therefore, we acknowledge a limitation here which would be linked to the relatively short (N50 of 1,052 bp) reads produced during the MinION run and we expect improved DNA extraction methods would yield more powerful datasets.
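For reference, the N50 statistic quoted above is the read length at which reads of that length or longer contain at least half of all sequenced bases. A minimal sketch, with an illustrative function name and toy lengths:

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    Reads are taken longest first; the N50 is the length of the read at
    which the running total reaches half of all bases.
    """
    if not lengths:
        raise ValueError("at least one read length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example: 32 bases in total, and the two longest reads hold 16 of them
example = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Unlike the mean, the N50 is weighted by bases rather than by read count, so it is less affected by a large number of very short reads.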
We also wanted to provide as robust sequencing data as possible, thus we used the 2D library preparation for the run, which allows both strands to be sequenced and a consensus generated with greater accuracy. Our initial attempt was unsuccessful, with no reads passing filter, potentially owing to the quality of the flowcells. Additionally, our second attempt was slowed down by the Metrichor cloud base calling service (subsequently decommissioned), causing us to re-load the same flowcell to carry out local base calling and obtain impressively high yields (423,422 reads). This second attempt at a complete MinION run revealed a K. pneumoniae-dominated profile after just 1 h of sequencing (~18,000 reads), enabling us to confidently 'call' this potential pathogen. This analysis was further strengthened as more sequencing, e.g. timepoints 6 h and 26 h (after sequencing start), gave almost identical microbial profiles (Fig. 4a), as did Illumina sequencing (Supplementary Fig. 7). However, it should be noted that the single species domination in this sample facilitates early detection at lower read depth, and low level pathogen abundance would require deeper sequencing. Nevertheless, these data highlight how rapid diagnosis of pathogen overgrowth is possible using R9.4 Nanopore flowcells. Our data indicate the gut was colonised with K. pneumoniae in an infant with suspected NEC, which suggests that this bacterium may be causative in this illness. Importantly, K. pneumoniae has been linked to preterm NEC (and is supported by corresponding clinical observations of Bell's diagnostic staging criteria, Supplementary Fig. 5), with overgrowth in the intestine linked to pathological inflammatory cascades, facilitated by a 'leaky' epithelial barrier 43 .
Whilst detection of individual pathogens is important, a critical additional requisite is identification of AMR profiles so that tailored antibiotic treatment can be used. Real-time analysis of MinION data highlighted the presence of a significant metagenomic 'resistome', including resistance to colistin, a last resort antibiotic, indicated by detection of the gene arnA 44 . We noted that the greater the read depth, the greater the number of AMR genes detected, although importantly we were able to detect a significant number of AMR genes as rapidly as 1 h after sequencing start, including β-lactamases, quinolone, aminoglycoside and tetracycline resistance genes (Supplementary Table 4). Klebsiella is of particular concern in an AMR context due to the increasing emergence of multidrug-resistant isolates that cause severe infection, which represent a real threat to patient outcomes 45 . Therefore, to improve the potential for guiding antibiotic treatment decisions we performed the 'walk-out' study from the AMR gene sequences to determine which bacterial species were carrying which AMR genes. Our data indicated that these genes primarily mapped back to K. pneumoniae (Supplementary Fig. 6), and secondly to P. mirabilis. This level of data may significantly enhance clinical diagnosis and help determine antibiotic regimens in an intensive care environment. Notably, although infant P8 did not receive quinolone-type antibiotics, we detected quinolone resistance genes in these data (potentially acquired via horizontal gene transfer), which may indicate that prescription of, for example, ciprofloxacin may be ineffective in this patient.
Based on these data, a clinical decision (as determined by neonatologist Dr Paul Clarke, in a blinded fashion) at 1 h after sequencing start (5 h 15 min total) would be continuation of gentamicin and flucloxacillin or cefotaxime treatment (NICE guidance 46 states antibiotics should be given within 1 h of presentation of signs of sepsis). However, using 6 h AMR hit data (10 h 15 min total), the presence of acrD (a transporter belonging to the resistance-nodulation-division family, known to participate in the efflux of aminoglycosides 47 ) might suggest gentamicin resistance and thus the infant may be moved onto meropenem (a carbapenem) that is gaining popularity as a single agent for neonatal infections 48 . Interestingly, we first detected acrD at 2 h 49 min after sequencing started (50,500 reads analysed, 7 h 4 min since sample receipt), therefore an earlier change in prescribed antibiotic, i.e. to meropenem, could have been possible. At the final 26 h time-point (30 h 15 min total) the guidance would be to continue with meropenem, which is also a last resort antibiotic against Enterobacteriaceae, including K. pneumoniae 49 . In NEC cases, where the patient has deteriorated and sepsis is suspected, the presence of K. pneumoniae in the blood stream would be required for a complete diagnosis. However, few patients with NEC symptoms are also normally positive for this sepsis diagnostic, with sample processing taking approx. 36-48 h. Thus, MinION faecal profiling is potentially very important clinically because of the speed of the test, and may allow clinical decisions from first suspicions of disease, to rapidly identify potential gut-associated causative organisms and corresponding AMR profiles.
We sought to further benchmark our MinION pipeline by correlating our metagenomic observations with isolated K. pneumoniae strains that had undergone WGS and MIC testing. Notably, whole genome analysis indicated a close agreement of resistance genes determined by MinION analysis, specifically when comparing results from the 'walk-out' analysis (Supplementary Fig. 8). When subjecting strains to MIC testing (gold standard for profiling AMR), we observed phenotypic resistance to all main groups of antibiotics that had been prescribed to infant P8, including gentamicin (which would have continued to be used using 1 h post sequencing AMR hits; Supplementary Fig. 10). There was good association between AMR gene sequence detection and MIC testing, i.e. SHV and β-lactam antibiotics, which highlights that MinION could be extremely useful for rapid AMR profiling. However, other resistances identified via MIC testing, such as vancomycin and metronidazole, did not correlate to presence of specific resistance genes in the MinION, Illumina or WGS sequence data. These differences are not restricted to MinION, and could reflect a need to update AMR databases with phenotype testing. As such we expect that rapid NGS, via MinION, would inform early and more appropriate antibiotic choices for patient care, halting the rapid deterioration observed in critically ill patients. Subsequently, phenotypic testing via standard clinical microbiology labs would further refine clinical management of patients. Notably, MIC testing also suggested susceptibility to meropenem, which is the same antibiotic that we discussed may be prescribed (2 h 49 min post sequencing start, 7 h 4 min from sample receipt), based on the lack of resistance determinants in the metagenomic MinION analysis. Importantly, it is clear that preterm infants harbour a significant reservoir of AMR genes within the wider microbiome as well as in potentially pathogenic bacteria such as K. pneumoniae, which makes bespoke antibiotic treatment decisions non-trivial. This further highlights the urgent requirement for comprehensive AMR databases, new antibiotic development and alternative treatments such as novel antimicrobials or microbiota therapies.
We obtained MinION bioinformatics results in under 5 hours. In comparison, standard Illumina MiSeq sequencing (paired 250 bp reads) alone normally takes 39 hours, excluding sample preparation and analysis (https://www.illumina.com/systems/sequencingplatforms/miseq/specifications.html). PacBio (i.e. long-read) sequencing with the same sample prep and quality as our MinION experiment, using the rapid (4 hour) library method (http://www.pacb.com/wp-content/uploads/2015/09/Guide-Pacific-Biosciences-Template-Preparation-and-Sequencing.pdf), 30 minute diffusion and the shortest sequencing run (30 minutes), would take over 7 hours even without base calling or bioinformatics. This highlights the significantly quicker detection potential of MinION in a clinical setting.
A number of products announced by ONT could make Nanopore sequencing in clinical settings even more attractive. The SmidgION, expected end of 2018, is an even more compact device than the MinION that can be connected to a mobile phone. At the other extreme, the GridION X5 and PromethION are capable of running up to 5 and 48 flowcells simultaneously, respectively, facilitating better economies of scale. Sample preparation is already relatively simple, particularly for 1D libraries, but this is likely to be further simplified by VolTrax, a compact automated library preparation system from ONT that can manipulate fluids around an array of pixels. ONT have demonstrated 1D library preparation directly from an Escherichia coli cell culture entirely with VolTrax (https://nanoporetech.com/publications/voltrax-rapidprogrammable-portable-disposable-device-sample-and-library-preparation). Balancing the excitement about new products, the technology is still relatively immature and is frequently updated, issues that must be addressed before it can serve as a true diagnostic tool. Even within the time-frame of this study we have observed significant advances in optimisation of MinION technology, and we hope this could lead to a stable platform for large-scale clinical testing.
Whilst we show the results of single patient diagnostics alone, if this platform were to be broadly adopted, even within a single ward or hospital, additional epidemiological analysis should be possible. Transmission chain analysis of hospital-acquired pathogens has been shown using Illumina data 39 , and Nanopore sequence-based epidemiology was used to monitor the West Africa Ebola virus outbreak 50 .

CONCLUSION
In conclusion, we have demonstrated that MinION technology is an easy-to-use and rapid sequencing platform that can detect, in 'real time', gut-associated pathogens linked to potentially life-threatening infections in preterm infants, identifying specific pathogenic taxa (i.e. K. pneumoniae) and corresponding AMR profiles. Data obtained may allow clinicians to tailor antibiotic treatment strategies (i.e. change from gentamicin to meropenem) in a timely manner (~7 h decision from sample receipt). The utility of this approach was confirmed by comparison with Illumina metagenomic sequencing and by isolation and characterisation of K. pneumoniae strains, including WGS and phenotypic (i.e. MIC) testing. This suggests that MinION may be used in a clinical setting, potentially improving health care strategies and antibiotic stewardship for at-risk preterm infants in the future.

METHODS
Methods and associated references are available in the online version of the paper.

Accession codes
The Illumina and MinION read data supporting the conclusions of this article are available in the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under study accession PRJEB22207.
Strategic Programme Grant BB/J004669/1 and BBSRC Core Strategic Programme Grant BB/J004669/1. This work was funded via a Wellcome Trust Investigator Award to LJH (100974/Z/13/Z). Isolation work was funded by a Microbiology Society Research Visit Grant to TCB. LH is in receipt of an MRC Intermediate Research Fellowship in Data Science (UK MED-BIO, grant number MR/L01632X/1). We are grateful for the assistance of the Genomics Pipelines team at EI, as well as Tim Stitt and the NBI Computing Infrastructure for Science team. We are also grateful to research nurse Karen Few for obtaining consent from parents and collecting samples. We thank Chris Bennett and Sasha Stanbridge of the EI Communications team for producing the accompanying video.
The following reagent was obtained through BEI Resources, NIAID, NIH as part of the Human Microbiome Project: Genomic DNA from Microbial Mock Community B (Staggered, High Concentration), v5.2H, for Whole Genome Shotgun Sequencing, HM-277D.

AUTHOR CONTRIBUTIONS
LJH, MDC and RML designed the research; CAG, DH, MK, RML, SC, and TCB performed research; RML contributed new software tools; CAG, LH, LJH, MDC, PC, RML and SC analysed data; and CAG, LJH, MDC and RML wrote the paper.

COMPETING FINANCIAL INTERESTS
The authors have not received direct financial contributions from ONT, but RML and MDC have received a small number of free flowcells as part of the MAP and MARC programs. RML is in receipt of travel and accommodation expenses to speak at an ONT conference and is on a PhD student advisory team with a member of ONT staff.

Table 1: Nanopore flowcell versions and yields. Yield and length metrics for Nanopore runs. Flowcell for sample P8 (2D) was used for two experiments: (a) Initial Metrichor basecalled run which was abandoned due to network lag and (b) Local basecalled run which was used for results presented in the analysis. Sample P8 (1D) utilised the rapid transposase-based library preparation protocol but was ultimately unsuccessful in our experiment. Only pass reads were used for analysis; these are defined as having a mean Q score of 9 or greater.

Table 1 column headings: Sample, Flowcell, Sequencing kit.

Illumina sequencing of mock community
Illumina-compatible amplification-free paired-end libraries were constructed with inserts spanning from 600 bp to >1000 bp. A total of 600 ng of DNA was sheared in a 60 µl volume on a Covaris S2 (Covaris, Massachusetts, USA) for 1 cycle of 40 s with a duty cycle of 5%, cycles per burst of 200 and intensity of 3. Fragmented DNA was then end-repaired using the NEB End Repair Module (NEB, Hitchin, UK) and size selected with a 0.58x Hi Prep bead cleanup (GC Biotech, Alphen aan den Rijn, The Netherlands), followed by A-tailing using the NEB A tailing module (NEB) and ligation of adapters using the NEB Blunt/TA Ligase Master Mix (NEB). Three 1x bead clean-ups were then undertaken to remove all traces of adapter dimers. Library quality control was performed by running an Agilent BioAnalyser High Sensitivity chip, and libraries were quantified using the KAPA qPCR Illumina quantification kit. Based on the qPCR quantification, libraries were loaded at 9 pM on an Illumina MiSeq and sequenced with 300 bp paired reads.

MinION sequencing of mock community
MinION 2D libraries were constructed targeting inserts >8 kbp. A total of 1 µg of DNA was fragmented in a 46 µl volume in a Covaris G-tube (Covaris, Massachusetts, USA) at 6,000 rpm in an Eppendorf centrifuge 5417. Sheared DNA was then subjected to a repair step using the NEB FFPE repair mix (NEB, Hitchin, UK) and purified with a 1x Hi Prep bead clean-up (GC Biotech, Alphen aan den Rijn, The Netherlands). A DNA control was added to the repaired DNA, which was then end-repaired and A-tailed using the NEBNext Ultra II End Repair and A-Tailing Module (NEB) and purified with a 1x Hi Prep bead clean-up, after which the AMX and HPA MinION adapters were ligated using the NEB Blunt/TA Ligase Master Mix (NEB). An HP tether was then added and incubated for 10 min at room temperature, followed by a further 10 min room temperature incubation with an equal volume of pre-washed MyOne C1 beads (Thermo Fisher, Cambridge, UK). The library-bound beads were washed twice with bead binding buffer (ONT) before the final library was eluted via a 10 min incubation at 37 °C in the presence of the MinION Elution Buffer. The final library was then mixed with running buffer, fuel mix and nuclease free water and loaded onto an R7.3 flowcell per the manufacturer's instructions, and sequencing data were collected for 48 h.

Mock community data analysis
MinION reads were basecalled using the Metrichor service and downloaded as FAST5 files. NanoOK v0.54 27

Clinical samples

Ethical approval and preterm sample collection
The Ethics Committee of the Faculty of Medical and Health Sciences in the University of East Anglia (Norwich, UK) approved subject recruitment for this study. Protocol for faeces collection was laid out by the Norwich Research Park (NRP) Biorepository (Norwich, UK) and was in accordance with the terms of the Human Tissue Act 2004 (HTA), and approved with license number 11208 by the Human Tissue Authority. Infants admitted to the Neonatal Intensive Care Unit (NICU) of the Norfolk and Norwich University Hospital (NNUH, Norwich, UK) were recruited by doctors or nurses with informed and written consent obtained from parents. Oral probiotic supplementation provided to the infants in this study contained Bifidobacterium bifidum and Lactobacillus acidophilus (i.e. Infloran®, Desma Healthcare, Switzerland) strains with a daily dose of 2 x 10^9 of each species. Collection of faecal samples was carried out by researchers and stored at -80 °C prior to DNA extraction.

DNA extraction of faeces samples from preterm infants
Bacterial DNA was extracted using the FastDNA Spin Kit for Soil (MP) following the manufacturer's instructions but extending the bead-beating step to 1 min, and eluting the DNA with 55 °C DES. The starting faecal material used for the DNA extraction was between 100 and 150 mg. The purity and concentration of the DNA was assessed using a NanoDrop 2000c Spectrophotometer and Qubit® 2.0 fluorometer. Samples with DNA concentrations higher than 25 ng/µl were considered acceptable.

MinION shotgun library preparation
MinION 2D libraries were constructed as outlined for the mock (see above) except that for the R9.4 flowcells the final library was mixed with running buffer containing fuel mix, library loading beads and nuclease free water and loaded onto the flowcell per the manufacturer's instructions. MinION 1D libraries were prepared by incubating 200 ng of DNA with 2.5 µl FRM for 1 min at 30 °C then 1 min at 75 °C, followed by the addition of 1 µl RAD and 0.2 µl NEB Blunt/TA Ligase Master Mix (NEB) and a room temperature incubation for 5 min. The final library was then mixed with running buffer containing fuel mix, library loading beads and nuclease free water and loaded onto the flowcell per the manufacturer's instructions.
Libraries from infant P10 (P10N, P10V and P10R) were prepared using the SQK-MAP006 Genomic Sequencing Kit and sequenced on R7.3 flowcells for 48 h. ONT's MinKNOW software was used to collect signal data and the Metrichor cloud-based service was used for basecalling.

Illumina HiSeq 2500 shotgun library preparation
Libraries for samples P10N, P10R and P10V were prepared using the TruSeq Nano DNA Library Prep Kit according to the manufacturer's instructions and sequenced on an Illumina HiSeq 2500 with 150 bp paired-end reads. The library for P8 was prepared as for the amplification-free library for the mock (see above) and run at 9 pM on an Illumina MiSeq with 2 x 250 bp reads. Bioinformatic analysis started by processing raw reads through quality control using FASTX-Toolkit 53 , keeping reads with a minimum quality threshold of 33 for at least 50% of the bases. Reads that passed the threshold were aligned against the SILVA database (version: SILVA_119_SSURef_tax_silva) 54 using BLASTN 55 (ncbi-blast-2.2.25+; Max e-value 10e-3) separately for both pairs. After performing the BLASTN alignment, all output files were imported and annotated using the paired-end protocol of MEGAN 29 .
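The read-level quality filter described above can be illustrated with a minimal Python sketch. It is not the FASTX-Toolkit implementation: the function names and the Phred+33 decoding offset are assumptions made for illustration, while the threshold (33) and percentage (50%) are the values stated in the text.

```python
def decode_phred(quality_string, offset=33):
    """Convert a FASTQ quality string to integer scores.

    The offset of 33 is an assumption (Sanger / Illumina 1.8+ encoding).
    """
    return [ord(ch) - offset for ch in quality_string]


def passes_quality_filter(qualities, min_quality=33, min_percent=50.0):
    """Return True if at least `min_percent` of bases score >= `min_quality`."""
    if not qualities:
        return False  # assumption: empty reads are discarded
    good = sum(1 for q in qualities if q >= min_quality)
    return 100.0 * good / len(qualities) >= min_percent
```

A read is kept when `passes_quality_filter(decode_phred(quality_string))` is true; every other read is dropped before alignment.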

Time series study for infant P10
Illumina and MinION sequencing data for samples P10N, P10V and P10R from infant P10 were studied. For the Illumina samples, we removed PCR duplicates (remove_pcr_duplicates.pl script from https://github.com/richardmleggett/scripts), ran Trimmomatic 52 to remove adaptors and applied a sliding window quality filter (size 4, mean quality greater than or equal to 15) and then randomly sub-sampled 1 million reads (subsample.pl script from same location). These reads were used as the input to a blastn search of NCBI's nt database. For the Nanopore sequencing, we took only the reads classified as 'pass' reads (defined as 2D reads with a mean Q value >9) and performed no further preprocessing before running blastn. Using MEGAN6 29 , we removed reads matching Homo sapiens (accounting for < 0.1% per sample) and performed taxonomic analysis.
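As an illustration of the Illumina preprocessing steps above, the following Python sketch shows a simplified sliding window quality trim (window size 4, mean quality at least 15) and random sub-sampling. It is not the Trimmomatic algorithm or the subsample.pl script; the function names, the cut position (the start of the first failing window) and the use of a fixed random seed are assumptions for illustration.

```python
import random


def sliding_window_trim(qualities, window=4, min_mean=15):
    """Return the number of leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean quality falls below `min_mean`. Reads shorter than the window are
    kept whole in this sketch.
    """
    for i in range(len(qualities) - window + 1):
        if sum(qualities[i:i + window]) / window < min_mean:
            return i
    return len(qualities)


def subsample(reads, k, seed=0):
    """Randomly pick `k` reads without replacement (all reads if fewer than k)."""
    if len(reads) <= k:
        return list(reads)
    return random.Random(seed).sample(reads, k)
```

In the workflow described in the text, `k` would be 1,000,000 and the sub-sampled reads would then be passed to blastn.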

Real-time diagnostic study for preterm P8 using MinION and NanoOK RT
One sample from infant P8 was sequenced. Both 1D and 2D Nanopore libraries were prepared, using the SQK-RAD002 Rapid Sequencing Kit 1D and the SQK-LSK208 Ligation Sequencing Kit 2D, respectively, and each library was sequenced on an R9.4 flowcell. MinKNOW software was used to collect signal data. As described in the main text, an attempt to use the Metrichor basecalling service for the 2D library failed. Instead, local basecalling through MinKNOW was used; at the time, this basecalled only the first (template) strand, so it could not generate the more accurate 2D consensus reads. MinKNOW was also used for local basecalling of the 1D library, which cannot generate 2D reads.
To enable real-time analysis of MinION data, new functionality was added to NanoOK 29 . The new software, NanoOK RT, monitored a specified directory for FAST5 files as they were created. For each new file, a FASTA file was extracted automatically. For efficiency, FASTA files were grouped into batches of 500, and each batch was BLAST searched against the NCBI nt database (downloaded February 2017) and the CARD database (v1.1.1, downloaded October 2016) of antibiotic resistance genes 56 . NanoOK RT can also write out command files for MEGAN 29 , which allows more detailed analysis of community composition, either as the run proceeds or on completion. NanoOK RT is available as an extension to NanoOK, selectable as a run-time option, from https://github.com/TGAC/NanoOK. Another bioinformatic tool, NanoOK Reporter, was also developed for this project, and provided a graphical user interface to monitor the run and view summaries of the species and antibiotic resistance genes identified. The tool allowed the user to browse through data in real time as batches were processed, or after all of the results were in, using their timestamps to indicate when a result was first obtained. This summary data can also be exported as plain text files, and these were subsequently used for later analysis (below). NanoOK Reporter is available from https://github.com/richardmleggett/NanoOKReporter. Documentation for NanoOK Reporter, as well as a tutorial utilising the data from this publication, is available at https://documentation.tgac.ac.uk/display/NANOOK/NanoOK+Reporter.
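The directory-monitoring and batching pattern described above can be sketched in Python as follows. This is not the NanoOK RT source code: the function names are hypothetical, and the choice to hold back an incomplete batch until 500 files are available is an assumption about how a partial batch might be handled.

```python
import os


def scan_for_new_fast5(directory, seen):
    """Return FAST5 file names in `directory` not yet in `seen`, marking them seen."""
    new_files = sorted(
        name for name in os.listdir(directory)
        if name.endswith(".fast5") and name not in seen
    )
    seen.update(new_files)
    return new_files


def take_full_batches(pending, batch_size=500):
    """Split `pending` into complete batches plus a leftover list.

    Each complete batch would be handed to the BLAST searches; the leftover
    waits until enough new reads arrive (an assumption of this sketch).
    """
    n_full = len(pending) // batch_size
    batches = [pending[i * batch_size:(i + 1) * batch_size] for i in range(n_full)]
    leftover = pending[n_full * batch_size:]
    return batches, leftover
```

A monitoring loop would call `scan_for_new_fast5` at intervals, append the result to the pending list, and submit each batch returned by `take_full_batches` for searching against nt and CARD.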

Generation of resistance heat map in Figure 4b
We opened the CARD results using NanoOK Reporter and used the option to save summary data as a plain text file. This saved a text file for the analysis at each time point (here timestamped batches of 500 reads) summarising the counts of resistance genes identified up to that point. We took the latest time point file (chunk 459, available at https://github.com/richardmleggett/bambi) and extracted a list of the ARO (Antibiotic Resistance Ontology) numbers from the ID column. Each unique ARO was manually assigned to one of 13 groups (as displayed in the figure). We wrote a script (gather_heatmap_data.pl, same GitHub repository) to take the summary files, together with this mapping, and generate a file (BAMBI_P8_2D_Local_070317_hits.txt) summarising hits per group at each time point. An R script (plot_card_heatmap.R, same GitHub repository) takes this file and renders the heat map.
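The aggregation step (per-ARO counts at each time point collapsed into hits per group) can be sketched in Python as below. This does not reproduce gather_heatmap_data.pl; the function name, the in-memory data layout and the placeholder ARO identifiers and group names in the usage note are all hypothetical.

```python
def hits_per_group(timepoint_counts, aro_to_group):
    """Collapse per-ARO counts into per-group counts for each time point.

    timepoint_counts: list of dicts {ARO id: cumulative hit count}, one per
                      time point, in chronological order.
    aro_to_group:     dict {ARO id: group name} from the manual assignment.
    Returns {group name: [count at time point 0, count at time point 1, ...]}.
    """
    table = {group: [0] * len(timepoint_counts)
             for group in set(aro_to_group.values())}
    for t, counts in enumerate(timepoint_counts):
        for aro, n in counts.items():
            # A KeyError here flags an ARO missing from the manual mapping.
            table[aro_to_group[aro]][t] += n
    return table
```

For example, with the placeholder mapping `{"ARO_1": "group A", "ARO_2": "group A", "ARO_3": "group B"}`, each row of the returned table corresponds to one row of the heat map and each list position to one time point.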

Walking out from resistance genes to identify the encompassing bacteria
We wrote a shell script to go through all the CARD BLAST hits and write out each CARD hit and the corresponding nt hits for the same read (walk_out_preprocess.sh, available at https://github.com/richardmleggett/bambi). A second script (walk_out.pl) took the output from the first script and parsed it, read-by-read. If there was a hit in nt that began at least 50 bases before the start of the CARD hit, or at least 50 bases after the end of the CARD hit, then this species was taken as the encompassing species. The script also recorded a count of the number of times each species was seen.
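A minimal Python sketch of the walk-out rule is given below. It is not a port of walk_out.pl. It assumes that hit coordinates are positions on the read, that nt hits are listed best-first, and that the second clause of the rule means the nt hit extends at least 50 bases past the end of the CARD hit; all three points, and the function names, are assumptions for illustration.

```python
from collections import Counter

FLANK = 50  # minimum number of bases beyond the CARD hit, as stated in the text


def encompassing_species(card_hit, nt_hits, flank=FLANK):
    """Return the species of the first nt hit that flanks the CARD hit, or None.

    card_hit: (start, end) of the CARD hit on the read.
    nt_hits:  list of (species, start, end) for nt hits on the same read.
    """
    card_start, card_end = card_hit
    for species, nt_start, nt_end in nt_hits:
        if nt_start <= card_start - flank or nt_end >= card_end + flank:
            return species
    return None


def count_species(reads):
    """Count how often each species is taken as the encompassing species."""
    counts = Counter()
    for card_hit, nt_hits in reads:
        species = encompassing_species(card_hit, nt_hits)
        if species is not None:
            counts[species] += 1
    return counts
```

Reads whose nt hits cover only the resistance gene itself contribute nothing to the counts, which is the point of requiring flanking sequence before assigning the gene to a host species.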

Statistical analysis on number of reads
Read counts at different stages of the bioinformatics analysis are provided in Table 1. For comparative analysis, sequences were normalised using values from the sample with the lowest number of reads.
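One possible reading of this normalisation is that per-taxon counts in each sample are scaled down to the total of the smallest sample. The Python sketch below illustrates that reading only; the text does not specify whether scaling or sub-sampling was used, and the function name and data layout are hypothetical.

```python
def normalise_to_smallest(sample_counts):
    """Scale each sample's counts so that its total equals the smallest total.

    sample_counts: dict {sample name: {taxon: read count}}; every sample is
    assumed to contain at least one read.
    """
    totals = {name: sum(counts.values()) for name, counts in sample_counts.items()}
    smallest = min(totals.values())
    return {
        name: {taxon: n * smallest / totals[name] for taxon, n in counts.items()}
        for name, counts in sample_counts.items()
    }
```

After scaling, every sample sums to the same total, so relative abundances can be compared directly between samples of different sequencing depth.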

Isolation and biochemical characterisation of P8 Klebsiella pneumoniae strains
An aliquot (100 mg) of faecal sample was homogenised in 1 mL TBT buffer (100 mM Tris/HCl, pH 8.0; 100 mM NaCl; 10 mM MgCl2•6H2O) by pipetting and plate mixing at 1500 rpm for 1 h. Homogenates were serially diluted to 10^-4 in TBT buffer. Aliquots of 50 µl were spread on MacConkey (Oxoid) agar plates in triplicate and incubated aerobically at 37 °C overnight. Colonies were selectively screened for lactose-positive (i.e. pink) colonies. One colony of each morphology type was re-streaked on MacConkey agar three times to purify. Biochemical characterisation was performed using API 20E tests (Biomerieux) according to manufacturer's instructions.