# Nanopore DNA Sequencing and Genome Assembly on the International Space Station

## Abstract

We evaluated the performance of the MinION DNA sequencer in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained equimolar mixtures of genomic DNA from lambda bacteriophage, Escherichia coli (strain K12, MG1655) and Mus musculus (female BALB/c mouse). Nine sequencing runs were performed aboard the ISS over a 6-month period, yielding a total of 276,882 reads with no apparent decrease in performance over time. From sequence data collected aboard the ISS, we constructed directed assemblies of the ~4.6 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% consensus pairwise identity, respectively; de novo assembly of the E. coli genome from raw reads yielded a single contig comprising 99.9% of the genome at 98.6% consensus pairwise identity. Simulated real-time analyses of in-flight sequence data using an automated bioinformatic pipeline and laptop-based genomic assembly demonstrated the feasibility of sequencing analysis and microbial identification aboard the ISS. These findings illustrate the potential for sequencing applications including disease diagnosis, environmental monitoring, and elucidating the molecular basis for how organisms respond to spaceflight.

## Introduction

Durations for Mars missions are likely to range from 1.5 to 3 years, with 12 to 24 months of that time spent in transit between the planets, based on current propulsion technologies and planetary orbital dynamics. In response to spaceflight, the human immune response becomes dysregulated1, and microbial pathogenicity can increase during spaceflight2. Beyond gene expression-mediated virulence changes, it is unclear how microbial populations would evolve, both in terms of population ecology and genetic mutations, over the course of a multi-year mission with increased exposure to ionizing radiation and microgravity during transit. This ongoing microbial evolution could have a profound impact on crew health, as microbiome stability and dynamics are known to have significant effects on human health on Earth3,4. Considering the time required to reach Mars, intervention from Earth during the course of a Mars mission will be limited to electronic communication, meaning that any analyses or monitoring to be performed must be done in situ. There is also a clear need for in-flight clinical diagnostic capability to ensure that any infections can be managed appropriately, including the administration of targeted antimicrobials.

Sequencing is a technology that could potentially address several critical spaceflight needs: infectious disease diagnosis, population metagenomics, gene expression changes, and accumulation of genetic mutations. Based on size, power, and ease of use considerations, the MinION™ DNA sequencer (Oxford Nanopore Technologies, Oxford, UK) was the most spaceflight-ready of commercially available sequencers. This device sequences DNA and RNA by measuring current changes caused by nucleic acid molecules passing through protein nanopores embedded in membranes; the change in current is diagnostic of the sequence of the DNA or RNA occupying the pore at a given time. In order to evaluate the performance of the MinION in a space environment, we tested it aboard the International Space Station (ISS). Orbiting 400 km above the Earth and travelling at 28,000 km/h, the ISS is in constant freefall and maintains a continuous microgravity environment. Although the MinION has been successfully reported to operate in remote locations on Earth5,6,7,8, operation aboard the ISS poses additional challenges, including possible flow cell membrane disruption due to launch shocks and vibrations, difficulty in removing air bubbles that form during sample handling, along with a potential increase in susceptibility to those air bubbles introduced to the flow cell during loading in microgravity; air bubbles in particular can damage the flow cell membranes or nanopores, and can also block the nanopores directly. In parabolic flight testing of the nanopore sequencer, we obtained only three reads while airborne9. From this experience, we identified a number of procedural changes in order to improve performance in flight and enable the present work (see Materials and Methods).

Here we describe the results of nanopore DNA sequencing experiments performed aboard the ISS with samples containing an equimolar mixture of genomic DNA extracted from a virus (Enterobacteria phage lambda), a bacterium (Escherichia coli), and a model mammalian organism, the mouse (Mus musculus). In parallel, we performed control experiments on the ground and made cross-platform comparisons with genomic sequence data obtained from the same samples on Illumina MiSeq (Illumina, San Diego, CA) and PacBio (Pacific Biosciences, Menlo Park, CA) instruments. Simulated analyses of in-flight data using an automated metagenomic pipeline and de novo assembly algorithms demonstrate the feasibility of on-board, real-time analysis and genomic assembly for future sequencing applications in space.

## Results

### Flight and Ground Control sequencing with the MinION

Nine sequencing experiments were conducted aboard the ISS between August 26, 2016 and January 9, 2017 (Fig. 1A, Supplementary Table 1). Identical, simultaneous sequencing runs were also performed on the ground. For eight of the nine sequencing runs, a frozen sample containing ready-to-sequence DNA libraries was thawed and loaded on a new MinION flow cell, and the run was initiated using MinKNOW software (Oxford Nanopore Technologies, Inc. v0.51); for run 6, an initial 6 hour sequencing run was performed, after which a second thawed library was loaded on the flow cell and a 48 hour sequencing run was initiated (ISS6.2). All samples contained equimolar inputs of lambda, E. coli and mouse genomic DNA. Of the eighteen combined flight and ground experiments, all but two ground experiments (G2 and G6.2) produced good yields of high-quality sequencing data. The cause of the poor performance in G2 is not definitively known, although the low number of available pores suggested disruption of the nanopores during handling, shipping, or storage. For G6.2, the reduced performance was not unexpected because the flow cell appeared to have relatively fewer active pores prior to its first use (G6.1), and then was re-used without rinsing with the manufacturer’s recommended wash buffer; similarly, it was observed that ISS6.2 also had a decrease in available pores from ISS6.1. Two of the runs, G7 and ISS7, terminated earlier than expected. In the case of G7, the wireless adapter on the Surface Pro 3 was left on and the combined power consumption of the MinION and wireless card exceeded the charging capacity of the Surface Pro 3, causing it to power off; for ISS7, the power cord became disconnected at some point during the sequencing run and the Surface Pro 3 shut off when its battery ran out of power. Because sequence data files are written as each molecule is sequenced, the only consequence of these early terminations was a reduction in the total number of molecules sequenced.

Upon completion of sequencing experiments in-flight, all FAST5/HDF files produced were downloaded from the ISS to Earth. Data transfers for each sequencing run (4–20 Gb of data distributed among 15,000 to 61,000 files) took between 1 and 6 hours. The flight and ground data were then analyzed using a number of open-source and custom-developed bioinformatic workflows (see Materials and Methods).

A key determinant of the success or failure of sequencing on the MinION is the number of active pores identified during the MUX scan performed at the initiation of sequencing. An active pore is one where current can be measured going through the pore. Each flow cell contains a total of 2,048 nanopores, each of which is capable of sequencing molecules that pass through it. In practice, however, a subset of these nanopores will fail at some point during manufacture, transit, handling, storage or use, reducing the total number of active pores. Vibration testing on the ground suggested that ~70% of the nanopores in the R7 flow cells, as determined by platform QC analysis, should be active after launch vibration9. Although there was considerable variability in the number of active pores between the 18 flow cells used in this study, no statistically significant decrease in the number of active pores in the flow cells used on the ISS was observed compared with the ones used on the ground (Supplementary Table 1). Across all 9 runs, a total of ~284,000 reads were generated, as compared to ~130,000 from the ground controls. Thus, to a first approximation, MinION sequencing performance on the ISS was comparable to or better than MinION sequencing on the ground.

### Metagenomic analysis of in-flight nanopore data using the automated SURPIrt pipeline

To demonstrate the feasibility of analyzing sequencing data on the ISS, we ran simulations of real-time metagenomic analysis of all 9 pooled runs of in-flight nanopore data using the automated SURPIrt pipeline (Fig. 2A; Supplementary Figure 8)10,11. Species-specific reads corresponding to mouse, E. coli, and lambda from the sample mixtures were initially identified in SURPIrt within 1 minute of beginning sequence analysis by MegaBLAST alignment to the NCBI nt database (word size = 16; e-value = 1 × 10⁻⁵). The distribution of detected reads could be visualized in real-time as donut charts on a web browser refreshed every 30 s (Fig. 2B).

Automated analyses of all 276,882 nanopore reads in pooled runs 1 through 9 revealed a percent read count distribution of mouse, viral, and bacterial reads of 30.1%, 30.1%, and 30.0%, respectively, consistent with the expected proportions from equimolar mixing of the sample, with only 218 reads (0.08%) mapping to other organisms and 27,043 reads (9.8%) unidentified (Fig. 2B). Taxonomic classification using a lowest common ancestor (LCA) algorithm revealed that nearly all of the bacterial reads (99.7%) mapped to E. coli (or its parental taxa), while nearly all of the viral reads (99.9%) mapped to lambda or other phages in the Caudovirales order. The proportions of mouse, E. coli, and phage reads remained fairly consistent across all 9 runs (Fig. 2C), although relatively fewer lambda reads were detected in the earlier runs 1 through 3.
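The lowest common ancestor step can be illustrated with a minimal Python sketch. It is not the SURPIrt implementation: the toy taxonomy, the taxon names, and the function names are assumptions chosen for illustration. A read whose hits span several taxa is assigned to the deepest node that all of those taxa share.

```python
# Toy taxonomy as child -> parent links. The names are illustrative only
# and are not taken from the NCBI taxonomy used by the pipeline.
PARENT = {
    "Escherichia coli K-12": "Escherichia coli",
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Shigella flexneri": "Shigella",
    "Shigella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacteria": "root",
    "Lambda phage": "Caudovirales",
    "Caudovirales": "Viruses",
    "Viruses": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all hit taxa."""
    if not taxa:
        raise ValueError("at least one taxon is required")
    paths = [lineage(t) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walk the first lineage from leaf to root; the first shared node is the LCA.
    for node in paths[0]:
        if node in shared:
            return node
    raise ValueError("taxa do not share a root")
```

Under this scheme a read with hits to both an E. coli strain and a Shigella species would be reported at the family level, which is one way a read can end up assigned to a parental taxon of E. coli rather than to the species itself.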

Overall, the results from the SURPIrt pipeline for unbiased pathogen detection were comparable to those obtained by directly mapping the reads to the closest target reference genome in GenBank using GraphMap (Fig. 2D). When individual GraphMap-identified reads were aligned to the reference genome, the mean read lengths and average percent pairwise identities were 6,880 bp and 81.6% for mouse, 5,718 bp and 82.8% for E. coli, and 6,245 bp and 84.1% for phage. The range of read lengths varied considerably from 80 to 72,619 bp (Supplementary Table 2). To generate consensus sequences, individual reads were mapped to the 4.66 Mb E. coli genome, 48.5 kb lambda genome, and a representative mouse sequence, the 16.3 kb mitochondrial genome. Pairwise gapped alignment of the 3 full-length consensus sequences against their corresponding reference genomes using the MAFFT algorithm showed 96.7–100% consensus pairwise identity (Fig. 2E).
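As a rough illustration of how a percent pairwise identity can be derived from a gapped alignment, the following Python sketch counts matching columns. It assumes that a gap opposite a base counts as a mismatch and that columns gapped in both rows are ignored; the exact definition used by Geneious may differ, and the function name is hypothetical.

```python
def pairwise_identity(aligned_a, aligned_b):
    """Percent of informative alignment columns in which both rows agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column gapped in both rows carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns
```

The per-read error rate is then simply 100 minus the identity, so an average identity of 81.6% corresponds to an error rate of 18.4%.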

### De novo genome assembly across the sequencing platforms

To determine if the in-flight nanopore data generated on the ISS could be used for successful de novo assembly of the E. coli genome, we independently tested two long-read genome assemblers, Miniasm12 and Canu13,14, using 2D reads pooled from all 9 in-flight sequencing runs. We tested assemblies generated (1) directly from raw 2D data, (2) from remaining reads after background subtraction of mouse sequences, and (3) from reads assigned to E. coli only (Supplementary Figure 9). From each of these read subsets, Canu was able to assemble a single contig representing >99% of the complete E. coli genome sequence, with ≥98.6% consensus pairwise identity relative to the PacBio reference genome. In contrast, although Miniasm had a significantly faster run time than Canu (<1 minute versus 2 hours on a 64-core server), the assemblies had more contigs, were not as complete (85.1–87.6%), and were less accurate (87.1%). The improved accuracy of Canu relative to Miniasm was also observed previously15.

We further tested the ability to run these assemblies not only on computational servers but also on the cloud (the Amazon Elastic Compute Cloud/EC2 platform) and on a laptop. We observed that an 8-core, 32 Gb EC2 instance was sufficient to complete an entire Miniasm assembly within 15 seconds, and a similarly configured laptop with 8 hyperthreaded cores and 64 Gb RAM took less than 40 sec (Supplementary Figure 9). These results on the cloud and laptop showed that real-time genome sequencing and analyses were feasible in space. Analyses of nanopore data were not performed on the ISS in the current study, as available in-flight computing resources were not sufficient (e.g. the on-board Surface Pro 3 tablet uses a single-core i7 processor with 8 Gb of RAM). Nevertheless, in-flight sequence analysis capability could be achieved with the next generation of laptops, supplied to the ISS in June 2017. As the lambda prophage is inserted into the E. coli genome, these data constitute the first de novo assemblies of a complete bacterial and viral genome, and indeed, the first genome assemblies of any organism from sequence data generated solely off of planet Earth.

## Discussion

Although for this experiment samples were prepared on the Earth, recent simplifications of sample preparation for the MinION sequencer (e.g., Oxford Nanopore Technologies 1D rapid library preparation kits and VolTRAX™ automated sample preparation device) should be straightforward to adapt to the spaceflight environment, and are currently being optimized for deployment to the ISS. Just as on Earth, methods for the extraction of the nucleic acids themselves from ISS-derived samples will need to be tailored for each source. However, the Wetlab-2 project has already successfully demonstrated DNA and RNA extraction aboard the ISS16, and microbial DNA extraction could be performed with simple thermal lysis and magnetic bead clean-up, processes that are not gravity-dependent. Furthermore, we recently demonstrated that pipetting using both positive displacement and air displacement pipets was possible on the ISS as well as during microgravity intervals on a parabolic flight17.

The data obtained from sequencing aboard the ISS can readily recapitulate the measurements of nucleic acids from phages (lambda), bacteria (E. coli), and mammalian (mouse) DNA on Earth. Indeed, across all three species, the base quality and 2D read alignments were routinely above 85%, with equal or superior performance to the identical replicate libraries and flow cells tested on Earth. These results were also true when comparing the skips/base, stays/base, read length, and GC-content of the data. Sequence reads were also validated against sequencing data obtained on PacBio RSII and Illumina MiSeq instruments to confirm their accuracy. De novo assembly of nanopore reads collected in-flight enabled the generation of a high-quality genome assembly of E. coli (a single full-length contig at ≥98.6% identity). Importantly for microbiome and metagenomic applications, our results demonstrate that a de novo assembly of microbial genomes from raw, unfiltered sequence data corresponding to a complex metagenomic mixture is feasible in space. As with sequence data from any platform, the success of unfiltered assembly from nanopore reads will likely depend on the complexity of the metagenomic background in the sample, depth of sequencing, and error rates, which have been steadily decreasing over time for nanopores18. In this study, the individual error rates for identified mouse, E. coli, and lambda reads were 18.4%, 17.2%, and 15.9%, respectively, although coverage redundancy during directed or de novo assembly, as shown here, can reduce error rates to <2%. We also successfully tested genome assembly of in-flight nanopore reads using a cloud-based platform (Amazon EC2) and laptop. In aggregate, these results clearly validate the implementation of the MinION nanopore sequencer for rapid, in situ diagnostics and microbial identification on the ISS, and, ultimately, in any space environment.

Furthermore, lightweight sequencing platforms such as the one demonstrated here, coupled with sufficient local computing power, can be directly applied to terrestrial research applications in remote environments. The ability to analyze a subset of samples to assess sampling diversity and quality while in field locations such as the Arctic or on deep-sea drilling expeditions could greatly improve the overall yield of science from these campaigns.

From a spaceflight perspective, in the immediate future the MinION holds the potential to greatly improve the rate at which ISS research can be performed by allowing researchers rapid access to data obtained in-flight, rather than having to wait for sample return. With robust experiment planning and some foresight, research projects that required multiple flights over several years could now be performed in a matter of months, as researchers could monitor experiment progress in real-time and make adjustments as needed (i.e., cadence of time points, identifying a subset of samples that should be returned to Earth for further analysis, etc.). Studies of gene expression in-flight would also be enabled by a sequencing platform on the ISS, and could be performed more robustly and with less risk of experiment failure by reducing the need for storage of labile RNA in a freezer. Analysis aboard the ISS would also serve to help eliminate time constraints, such as those posed by organism re-acclimation upon return to Earth, facilitating more optimal and less arbitrary selection of time points for sample collection and analysis. Nanopore sequencing also has the potential to detect base modifications19,20, which could also enable in situ epigenetics studies of both DNA and RNA.

As exploration progresses beyond low Earth orbit toward extended missions in cis-lunar space and eventually to Mars, changes and advances in nanopore-based sequencing will be needed. Increases in sequencing accuracy will result in improved diagnostic capabilities by reducing the number of false positives and false negatives. As communication delays increase and data transfer rates decrease, local analysis of sequencing data will be essential. Aspects of this challenge are manifested on Earth in remote locations and point-of-care settings of clinical and public health significance, such as “hot spots” from outbreaks due to Ebola or Zika virus21,22,23. In the current study, we used the SURPIrt computational platform to simulate an automated metagenomic analysis of nanopore data in real-time, from read processing to microbial identification to genome assembly on both a server and a laptop, and we also showed that rapid (15 sec) assembly was possible, highlighting the ability to use these tools and techniques locally for future missions.

One of the outstanding questions for use of the sequencers such as the MinION in deep space exploration is flow cell stability over the course of an 18 to 36 month mission. Extreme temperatures and increased galactic and cosmic radiation exposure are less of a concern for flow cell stability during human missions, as crew members will require shielding from these conditions as well24. For robotic missions, however, enhancements in flow cell stability will likely be needed, which could be achieved through the development of more robust membranes in which the pores are embedded or with improvements in the resolving power of solid state nanopores. It is worth noting, however, that the present work has demonstrated flow cells are stable after 6 months in orbit, which is on par with at least a one-way transit to Mars; and radiation exposure would not seem to be a significant factor for protein nanopore stability: the Curiosity rover measured <1 Gray of radiation during its transit to Mars and 0.18 to 0.225 mGray per day on the Martian surface25,26, which are radiation doses orders of magnitude lower than doses (3,500 Gray) that proteins have been demonstrated to tolerate with no loss in function27. Flow cell reusability would be of tremendous benefit on Mars and other deep-space missions due to stringent mass limits imposed by cost and propulsion constraints; we were able to demonstrate in a limited fashion the reusability of flow cells aboard the ISS, where a minor decrease in the number of available pores was observed between the first and second flow cell loadings, and overall data yields were comparable to those obtained from 48 hour sequencing runs on flow cells only used once. Assuming sufficient flow cell stability is achieved, nucleic acid sequencing could play an important role on crewed missions to Mars to monitor crew health.

Once on Mars or another planetary surface, the sequencing platform would then become a powerful tool for surveillance and exploration. DNA-based life could be rapidly detected, enabling identification of Earth-derived contamination and perhaps even characterization of indigenous Martian life if it also uses DNA. Beyond life as we know it, direct analysis of molecules by nanopores has been used to detect base modifications in DNA19,20, to identify pathogens in clinical samples11, to sequence RNA23,28,29, and even to characterize proteins30. The ability of nanopore analyzers to accommodate a range of polymers increases the chance of detecting extraterrestrial life, which could use different bases or sugars in its genetic material beyond canonical nucleotide-based DNA and RNA.

## Materials and Methods

### Code availability

The Megablast-based SURPIrt code for metagenomic analysis of the data generated from this study can be found on the Github repository for the UCSF Chiu laboratory at https://github.com/chiulab. The remaining scripts are available at https://pbtech-vc.med.cornell.edu/git/mason-lab/nanopore_in_space/tree/master.

### Data availability

The datasets generated or analyzed during the current study are available from the authors on reasonable request; base-called .fastq files are available in the NASA GeneLab database under accession number 84; https://genelab-data.ndc.nasa.gov/genelab/accession/GLDS-84.

### Spaceflight Hardware

The full payload included the following items: two MinION devices, a USB 3.0 cable, nine R7.3 flow cells (Oxford Nanopore Technologies), nine sample syringes containing ground-prepared genomic DNA, nine empty sample syringes for air bubble removal, and a configuration flow cell (Supplementary Figure 10). A pipette kit including 10, 100, and 1,000 μl Rainin positive displacement pipettes (Mettler Toledo, Oakland, CA) and associated tips was included for contingency purposes if the syringe was not sufficient for bubble removal. These items were all launched from Cape Canaveral Air Force Station on the SpaceX CRS-9 Dragon capsule on July 18th, 2016. The MinKNOW™ software (Oxford Nanopore Technologies) required for operation of the MinION was loaded on a Microsoft Surface Pro 3 tablet and delivered to the ISS on the Orbital ATK OA-6 Cygnus Space Station Resupply Vehicle, which launched on March 22, 2016.

### Library Preparation and Sequencing

Samples analyzed in this study contained sequence libraries prepared from mixtures of female mouse BALB/C (Zyagen, San Diego, CA), Escherichia coli K-12 (Zyagen), and N 6-methyladenine-free bacteriophage lambda (New England Biolabs, Ipswich, MA) genomic DNA. These species were chosen based on their being model organisms corresponding to a eukaryote (mammal), bacterium, and virus, respectively. The samples were aliquoted for library preparation on three platforms: Oxford Nanopore Technologies (ONT) MinION (v6), Illumina MiSeq (v2), and the Pacific Biosciences RSII.

### MinION Library Preparation and Sequencing (Ground and ISS)

Aliquots of DNA for mouse, E. coli, and lambda libraries were sheared individually using Covaris g-TUBEs (Covaris, Boston, MA) by centrifugation at 4,800 × g for 2 min to produce fragments that were predominantly 8 kb. Fragmentation was verified using a 2100 Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA). Sheared mouse, E. coli, and lambda DNA samples were quantified using a Qubit 2.0 fluorometer (Thermo Fisher Scientific, Waltham, MA) and combined in equal abundances, targeting 1.5 µg total (0.5 µg each). Mixed DNA samples then underwent treatment to repair residual nicks, gaps, and blocked 3′ ends using Formalin-Fixed, Paraffin-Embedded (FFPE) DNA Repair Mix (New England Biolabs), with modifications to the manufacturer’s protocol to include a 0.5X Agencourt AMPure XP system magnetic bead clean-up (Beckman Coulter Genomics, Brea, CA) to remove small fragments of DNA. The repaired DNA was then prepared for sequencing according to Oxford Nanopore Technologies’ procedure for the SQK-MAP-006 Kit. Following library preparation, individual samples were mixed with a ~100 ng aliquot of pre-sequencing library (150 µl) with 162.5 µl of 2X running buffer, 6.5 µl fuel mix (Oxford Nanopore Technologies) and 156 µl molecular-grade sterile deionized water to a total volume of 450 µl. This volume was loaded into a 1 ml syringe and the syringe was capped. A total of 18 samples were prepared: nine flight samples and nine ground control samples. Capped syringes containing the samples were packaged with an identical 1 ml syringe for potential bubble removal and syringe tips were placed inside of a large plastic tube to facilitate transport. After syringe loading, the tubes were stored at −80 °C.

### Future flight-compatible sample preparation

While the DNA sequenced in-flight was prepared on the ground, recent demonstrations of the transfer of microliter fluid volumes with conventional pipettes in a microgravity environment17,19, in combination with Oxford Nanopore Technologies' commercially available rapid library preparation kit, make sample preparation on the ISS a realistic prospect. The miniPCR system is already onboard the ISS and has been used as both a heat block and thermal cycler in ground-based testing to support rapid 1D library preparation and amplifications of low input samples, respectively. Although a manual sample preparation process could be implemented on the ISS in the near future to enable the sequencing of spaceflight samples, flight certification of the soon-to-be-released sample and library preparation device, VolTRAX™ by Oxford Nanopore Technologies, could fully automate the entire process.

### Ground Processing of Cold Stowage Hardware

The nine flight samples were removed from the −80 °C freezer, placed on dry ice in a Styrofoam cooler, and shipped overnight to the launch processing facility at Kennedy Space Center (KSC). Once at KSC, the samples were maintained in a −80 °C freezer until transfer to the powered freezer on the SpaceX Dragon Capsule. To more closely approximate the handling of the flight samples, the nine ground control samples were also removed from the freezer, placed on dry ice in a Styrofoam cooler and allowed to sit on the laboratory bench for the same amount of time that the flight samples were out of the freezer during transit. A total of 18 R7.3 flow cells from the same manufacturing lot were shipped directly from the manufacturer to KSC for flight (9) and JSC to serve as ground controls (9). Upon receipt, the flow cells were stored at +4 °C until the time of launch. The flow cells and samples were maintained within the launch processing labs at KSC until 48 h before the launch, at which point they were loaded onto the vehicle.

### Launch and ISS Stowage Conditions

Flow cells were launched in a double cold bag in which the temperature was maintained between +2 and +8 °C. The DNA samples were launched in a powered freezer in which temperatures were maintained between −80 and −90 °C. Upon docking with the ISS, the flow cells and DNA samples were transferred to refrigerator (+2 to +8 °C) and freezer (−80 to −90 °C) dewars within the Minus Eighty Degree Laboratory Freezer for ISS (MELFI), respectively. All other items were stored at ambient ISS conditions.

### PacBio RSII Library Preparation and Sequencing

Single Molecule, Real-Time (SMRT) sequencing libraries were prepared using the SMRTbell Template Prep Kit 1.0 (Pacific Biosciences) and 20 kb Template Preparation Using BluePippin Size-Selection System protocol (Pacific Biosciences). For each sample, 5 μg were used. Library quality and quantity were determined using an Agilent 2200 TapeStation and Qubit dsDNA BR Assay (Life Technologies), respectively. Sequencing was conducted using P6-C4 chemistry and a v3 SMRT Cell (Pacific Biosciences) at Weill Cornell Medicine.

### Illumina MiSeq Library Preparation and Sequencing

Sequencing libraries were prepared from 1 ng of sample using the NexteraXT kit (Illumina) according to the manufacturer’s protocol. Libraries were indexed with dual 8-nt barcodes on each end of the sequencing amplicon. In total, 4 dual-indexed DNA sequencing libraries were constructed, corresponding to lambda bacteriophage, E. coli, mouse, and an equimolar mixture of DNA from the 3 organisms. Libraries were quantitated using the Agilent Bioanalyzer and Qubit fluorometer and sequenced on an Illumina MiSeq as a 1 × 160 bp single-end sequencing run. The approximate percentage of the run allocated to each library as determined by the quantified input concentration was 3% for lambda bacteriophage, 17% for E. coli, 30% for mouse, and 50% for the equimolar mixture.

### MinION Basecalling

Basecalling was performed using the Metrichor workflow “2D Basecalling for SQK-MAP006 - v1.107”. We used a custom shell script to extract one read from each fast5 file for further quantification and analysis, selecting the 2D read where available, and the higher quality of the 1D template or complement read where not.
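The read-selection rule can be sketched in a few lines of Python. This is not the custom shell script used in the study: the `Read` class, the `select_read` function, and the use of mean quality score as the measure of "higher quality" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:
    kind: str            # "2D", "template", or "complement"
    sequence: str
    mean_quality: float  # assumption: quality is summarized as a mean score


def select_read(two_d: Optional[Read],
                template: Optional[Read],
                complement: Optional[Read]) -> Optional[Read]:
    """Pick one read per fast5 file: the 2D read if present, else the better 1D read."""
    if two_d is not None:
        return two_d
    candidates = [r for r in (template, complement) if r is not None]
    if not candidates:
        return None  # the file held no basecalled read
    # On a tie, max() keeps the first candidate, i.e. the template read.
    return max(candidates, key=lambda r: r.mean_quality)
```

Keeping exactly one read per file avoids counting the same molecule more than once in the downstream read tallies.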

### GraphMap and Calculation of Species Counts/Proportions

As described previously9, we aligned reads to a combined Enterobacteria lambda phage (NCBI reference sequence NC_001416.1), Escherichia coli (NCBI reference sequence NC_000913.3), and Mus musculus (mm10, GRCm38.p4) genome using GraphMap version 0.3.0, with the command “graphmap align -r $ref -d $fi -o $name.sam”, which saves the top result for each read. We used the results to count the number of reads mapping to each of the three species and the fraction identity between reads and references.
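Counting reads per species from the resulting SAM output could look something like the Python sketch below. It is an illustration, not the script used in the study. It assumes that the lambda and E. coli reference names begin with their accession numbers and that every other reference name belongs to the mouse genome; the function names are hypothetical.

```python
from collections import Counter


def species_of(rname):
    """Map a SAM reference name to a species label."""
    if rname.startswith("NC_001416"):
        return "lambda"
    if rname.startswith("NC_000913"):
        return "E. coli"
    return "mouse"  # assumption: all remaining references are mouse sequences


def count_species(sam_lines):
    """Count primary alignments per species from an iterable of SAM lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            counts["unmapped"] += 1
        elif flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        else:
            counts[species_of(rname)] += 1
    return counts
```

With a file on disk, the function can be applied directly to an open handle, for example `count_species(open("run1.sam"))`, where the file name is a placeholder.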

For comparison of the relative species proportions in the sample mixture between the nanopore in-flight runs and the Illumina data, we separately aligned the Illumina and nanopore reads to the Mus musculus, E. coli, and lambda phage genomes using Bowtie2 in local alignment mode at default settings31 and GraphMap32, respectively. We then ensured only one unique mapping per read, which included assigning all lambda reads to the lambda genome (as the complete lambda genome is integrated in the E. coli chromosome), prior to calculating relative species proportions. Individual reads were mapped to the corresponding reference genome using the Geneious software package version 8.1.9 (Biomatters, Inc.). After determination of the consensus sequence in Geneious, consensus pairwise identities were calculated by taking each consensus sequence and performing a pairwise gapped alignment using the MAFFT algorithm (v7.0)33 at default settings (algorithm = “Auto”; scoring matrix = “200PAM/k = 2”, gap open penalty = 1.53; offset value = 0.123), as implemented in the Geneious software package.
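The one-mapping-per-read rule might be expressed as in the following Python sketch. The text specifies only that lambda takes precedence over E. coli; the remaining order in `PRIORITY`, the input structure, and the function names are assumptions for illustration.

```python
from collections import Counter

# Lambda first, as stated in the text; the rest of the order is an assumption.
PRIORITY = ("lambda", "E. coli", "mouse")


def assign_unique(hits_by_read):
    """Reduce each read's set of species hits to a single species label."""
    assigned = {}
    for read_id, species in hits_by_read.items():
        for candidate in PRIORITY:
            if candidate in species:
                assigned[read_id] = candidate
                break
    return assigned


def proportions(assigned):
    """Fraction of uniquely assigned reads belonging to each species."""
    counts = Counter(assigned.values())
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}
```

A read that aligns to both the lambda genome and the E. coli genome is therefore counted once, as lambda, before the proportions are computed.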

### De novo genome assembly from PacBio and Illumina data

To ensure that our E. coli alignments and sequencing measures were not a result of any strain or sample-specific genetic drift or contamination, we performed a de novo assembly of the E. coli genome used in this study using the PacBio data. We used the Hierarchical Genome Assembly Process (HGAP, v2) for read-cleaning and adapter trimming (pre-assembly), de novo assembly with Celera Assembler, and assembly polishing with Quiver34. Raw sequencing reads were filtered for length and quality such that the minimum polymerase read score was 0.8, the minimum subread length was 500 bp, and the minimum polymerase read length was greater than 100. The assembly was generated using Celera Assembler v1 with the default parameters and was polished using the Quiver algorithm34. Our assembly yielded a single contig of 4,734,145 base pairs at >99.7% accuracy and confirmed the E. coli sample as strain K-12.

We next performed independent de novo and directed assemblies of the E. coli genome using single-end Illumina data. Raw Illumina reads were preprocessed for trimming of adapters and removal of low-complexity and low-quality sequences. E. coli reads were identified using Bowtie2 alignment against the E. coli reference genome in local alignment mode at default parameters. De novo assembly was performed using the SPAdes genome assembler v3.8.2 (ref. 35) with the “careful” parameter. The de novo assembled contigs as well as the original dataset of Illumina reads were then separately mapped to the PacBio E. coli genome assembly using Geneious v8.0 (Biomatters, Inc.). Illumina reads aligning to lambda by Bowtie2 were also mapped to the lambda phage genome (LAMCG) using Geneious.

### SURPIrt (sequence-based ultra-rapid pathogen identification, real-time) analysis

The SURPIrt pipeline is a real-time analysis pipeline for automated metagenomic pathogen detection and reference-based genomic assembly from nanopore sequencing data. Modeled after previously published SURPI10 and Metapore11 software, SURPIrt incorporates Linux shell scripts and code from the Python, Perl, JavaScript, HTML, and Go programming languages. SURPIrt is currently being developed for analysis of clinical human samples, but was customized here for automated analysis of the NASA test mixture of mouse, E. coli, and lambda bacteriophage DNA. Specifically, reads are aligned successively using Megablast36 to mouse, viral RefSeq37, bacterial RefSeq37, and non-chordate eukaryotic10 sequence databases for detection of host (mouse) and microbial reads (Supplementary Figure 11). Reads are then taxonomically classified using a lowest common ancestor algorithm38, and graphical results are provided as tables, pie charts, and coverage maps. SURPIrt can be run on a server, cloud, or laptop.
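The idea behind lowest common ancestor classification can be sketched in a few lines. This is a simplified illustration, not SURPIrt's implementation or the full algorithm of ref. 38: it assumes each hit for a read is already available as a root-to-leaf lineage with the same rank order, and it omits any score or support thresholds. The function name and the example lineages are illustrative.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest root-to-leaf prefix shared by every lineage.

    Each lineage lists taxon names from the root downward. The last
    element of the result is the lowest common ancestor; an empty list
    means the hits share no taxon (or there were no hits).
    """
    if not lineages:
        return []
    shared: List[str] = []
    for taxa_at_rank in zip(*lineages):
        if all(taxon == taxa_at_rank[0] for taxon in taxa_at_rank):
            shared.append(taxa_at_rank[0])
        else:
            break
    return shared
```

A read whose hits all fall within one species is assigned to that species, whereas a read with hits in two genera of the same family is assigned only at the family level.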

### De novo genome assembly from nanopore data

We used Minimap v0.2-r124-dirty and Miniasm v0.2-r137-dirty as described in the GitHub tutorial, using 8 threads (https://github.com/lh3/miniasm). Three assemblies were generated: (1) using 2D reads that mapped exclusively to Escherichia coli K12 MG1655 using GraphMap from the four runs on the ISS, (2) using all 2D reads from the ISS that did not map to either human or mouse (using the hg38 human and mm10 mouse reference genomes), and (3) using all raw 2D reads. In parallel, we used Canu v1.7313 at default parameters with a specified target genome size of 4.8 Mb for de novo assembly. Runs were performed using both a 64-core computational server with 512 gigabytes (GB) of memory and an 8-hyperthreaded core laptop with 64 GB of memory. Three assemblies were generated using the same read sets as for Miniasm. Assembly metrics, including N50, were calculated by the “abyss-stats.pl” program in ABySS39. The assembled contigs were mapped to the PacBio-generated E. coli genome and visualized using Mauve version 2.4.0 (ref. 40). Consensus pairwise identities of the de novo-assembled E. coli genomes were estimated using JSpeciesWS41 after specifying the use of MUMmer42 for pairwise identity calculations.
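For readers unfamiliar with the N50 metric, a minimal sketch of one common definition follows: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. This is not the ABySS implementation, and it applies no minimum contig length cutoff, so values may differ from those reported by “abyss-stats.pl”.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """N50 of a set of contig lengths (0 for an empty assembly)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop once the longest contigs cover at least half the assembly.
        if running * 2 >= total:
            return length
    return 0
```

A single-contig assembly, such as the E. coli assembly described in the abstract, has an N50 equal to the length of that contig.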

## References

1. Taylor, G. R., Konstantinova, I., Sonnenfeld, G. & Jennings, R. Changes in the immune system during and after spaceflight. Adv Space Biol Med 6, 1–32, https://doi.org/10.1016/S1569-2574(08)60076-3 (1997).
2. Wilson, J. W. et al. Space flight alters bacterial gene expression and virulence and reveals a role for global regulator Hfq. Proc Natl Acad Sci USA 104, 16299–16304, https://doi.org/10.1073/pnas.0707155104 (2007).
3. Cho, I. & Blaser, M. J. The Human Microbiome: at the interface of health and disease. Nature reviews. Genetics 13, 260–270, https://doi.org/10.1038/nrg3182 (2012).
4. Kinross, J. M., von Roon, A. C., Holmes, E., Darzi, A. & Nicholson, J. K. The human gut microbiome: implications for future health care. Current gastroenterology reports 10, 396–403, https://doi.org/10.1007/s11894-008-0075-y (2008).
5. Johnson, S. S., Zaikova, E., Goerlitz, D. S., Bai, Y. & Tighe, S. W. Real-Time DNA Sequencing in the Antarctic Dry Valleys Using the Oxford Nanopore Sequencer. Journal of Biomolecular Techniques: JBT 28, 2, https://doi.org/10.7171/jbt.17-2801-009 (2017).
6. Edwards, A. et al. Deep Sequencing: Intra-Terrestrial Metagenomics Illustrates The Potential Of Off-Grid Nanopore DNA Sequencing. bioRxiv 133413, https://doi.org/10.1101/133413 (2017).
7. Hoenen, T. et al. Nanopore sequencing as a rapidly deployable Ebola outbreak tool. Emerging infectious diseases 22, 331, https://doi.org/10.3201/eid2202.151796 (2016).
8. Quick, J. et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 530, 228–232, https://doi.org/10.1038/nature16996 (2016).
9. McIntyre, A. B. et al. Nanopore sequencing in microgravity. npj Microgravity 2, 16035, https://doi.org/10.1038/npjmgrav.2016.35 (2016).
10. Naccache, S. N. et al. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome research 24, 1180–1192, https://doi.org/10.1101/gr.171934.113 (2014).
11. Greninger, A. L. et al. Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis. Genome medicine 7, 99, https://doi.org/10.1186/s13073-015-0220-9 (2015).
12. Li, H. Minimap: Experimental tool to find approximate mapping positions between long sequences, https://github.com/lh3/minimap/ (2015).
13. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. bioRxiv, https://doi.org/10.1101/071282 (2017).
14. Koren, S., Walenz, B. P., Berlin, K., Miller, J. R. & Phillippy, A. M. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. bioRxiv, https://doi.org/10.1101/071282 (2016).
15. Judge, K. et al. Comparison of bacterial genome assembly software for MinION data and their applicability to medical microbiology. Microbial Genomics 2, https://doi.org/10.1099/mgen.0.000085 (2016).
16. Parra, M. et al. Microgravity validation of a novel system for RNA isolation and multiplex quantitative real time PCR analysis of gene expression on the International Space Station. PLOS ONE 12, e0183480, https://doi.org/10.1371/journal.pone.0183480 (2017).
17. Rizzardi, L. F. et al. Evaluation of techniques for performing cellular isolation and preservation during microgravity conditions. npj Microgravity 2, 16025, https://doi.org/10.1038/npjmgrav.2016.25 (2016).
18. Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17, 333–351, https://doi.org/10.1038/nrg.2016.49 (2016).
19. Rand, A. C. et al. Mapping DNA methylation with high-throughput nanopore sequencing. Nature Methods 14, 411–413, https://doi.org/10.1038/nmeth.4189 (2017).
20. Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nature Methods 14, 407–410, https://doi.org/10.1038/nmeth.4184 (2017).
21. Quick, J. et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 530, 228–232, https://doi.org/10.1038/nature16996 (2016).
22. Sardi, S. I. et al. Coinfections of Zika and Chikungunya viruses in Bahia, Brazil, identified by metagenomic next-generation sequencing. Journal of clinical microbiology 54, 2348–2353, https://doi.org/10.1128/JCM.00877-16 (2016).
23. Kilianski, A. et al. Use of Unamplified RNA/cDNA–Hybrid Nanopore Sequencing for Rapid Detection and Characterization of RNA Viruses. Emerging infectious diseases 22, 1448, https://doi.org/10.3201/eid2208.160270 (2016).
24. Cucinotta, F. A. et al. Space radiation cancer risks and uncertainties for Mars missions. Radiation research 156, 682–688 (2001).
25. Hassler, D. M. et al. Mars’ surface radiation environment measured with the Mars Science Laboratory’s Curiosity rover. Science 343(6169), 1244797, https://doi.org/10.1126/science.1244797 (2013).
26. Zeitlin, C. et al. Measurements of energetic particle radiation in transit to Mars on the Mars Science Laboratory. Science 340, 1080–1084, https://doi.org/10.1126/science.1235989 (2013).
27. Ruhl, S. et al. Integrity of proteins in human saliva after sterilization by gamma irradiation. Appl. Environ. Microbiol 77, 749–755, https://doi.org/10.1128/AEM.01374-10 (2011).
28. Smith, A. M., Jain, M., Mulroney, L., Garalde, D. R. & Akeson, M. Reading canonical and modified nucleotides in 16S ribosomal RNA using nanopore direct RNA sequencing. bioRxiv 132274, https://doi.org/10.1101/132274 (2017).
29. Garalde, D. R. et al. Highly parallel direct RNA sequencing on an array of nanopores. bioRxiv 068809, https://doi.org/10.1101/068809 (2016).
30. Nivala, J., Marks, D. B. & Akeson, M. Unfoldase-mediated protein translocation through an alpha-hemolysin nanopore. Nat Biotechnol 31, 247–250, https://doi.org/10.1038/nbt.2503 (2013).
31. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–359, https://doi.org/10.1038/nmeth.1923 (2012).
32. Sovic, I. et al. Fast and sensitive mapping of nanopore sequencing reads with GraphMap. Nature communications 7, 11307, https://doi.org/10.1038/ncomms11307 (2016).
33. Katoh, K. & Standley, D. M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution 30, 772–780, https://doi.org/10.1093/molbev/mst010 (2013).
34. Chin, C. S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10, 563–569, https://doi.org/10.1038/nmeth.2474 (2013).
35. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19, 455–477, https://doi.org/10.1089/cmb.2012.0021 (2012).
36. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J Mol Biol 215, 403–410, https://doi.org/10.1016/S0022-2836(05)80360-2 (1990).
37. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research 35, D61–D65, https://doi.org/10.1093/nar/gkl842 (2007).
38. Huson, D. H., Auch, A. F., Qi, J. & Schuster, S. C. MEGAN analysis of metagenomic data. Genome Research 17, 377–386, https://doi.org/10.1101/gr.5969107 (2007).
39. Simpson, J. T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res 19, 1117–1123, https://doi.org/10.1101/gr.089532.108 (2009).
40. Darling, A. E., Mau, B. & Perna, N. T. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5, e11147, https://doi.org/10.1371/journal.pone.0011147 (2010).
41. Richter, M., Rossello-Mora, R., Oliver Glockner, F. & Peplies, J. JSpeciesWS: a web server for prokaryotic species circumscription based on pairwise genome comparison. Bioinformatics 32, 929–931, https://doi.org/10.1093/bioinformatics/btv681 (2016).
42. Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol 5, R12, https://doi.org/10.1186/gb-2004-5-2-r12 (2004).

## Acknowledgements

The Biomolecule Sequencer team thanks P. Whitson for performing sequencing experiments aboard the ISS and is grateful to support personnel at the NASA Johnson Space Center and Marshall Space Flight Center, especially D. Voss and L. Gibson. We thank M. Weislogel for discussions on the physics of microfluidics in microgravity and Oxford Nanopore Technologies for technical and logistics support. We would also like to thank the Epigenomics Core Facility of Weill Cornell Medicine and Roger Altman. A.S.B. and S.C.W. acknowledge the ISS program office for funding. K.K.J. acknowledges support from the NASA Postdoctoral Program administered through the Universities Space Research Association. J.P.D., M.K.L. and T.A.S. acknowledge support from the NASA Astrobiology Institute through the Goddard Center for Astrobiology. A.B.R.M., N.A. and C.E.M. thank the Epigenomics Core Facility at Weill Cornell Medicine and acknowledge the Starr Cancer Consortium grant (I9-A9-071) and funding from the Irma T. Hirschl and Monique Weill-Caulier Charitable Trusts, Bert L and N Kuggie Vallee Foundation, the WorldQuant Foundation, The Pershing Square Sohn Cancer Research Alliance, NASA (NNX14AH50G, 15Omni2-0063), the National Institutes of Health (R25EB020393, R01ES021006), the Bill and Melinda Gates Foundation (OPP1151054), and the Alfred P. Sloan Foundation (G-2015-13964). C.Y.C., S.F., D.S., S.S., and G.Y. are supported by the National Institutes of Health (R01-HL105704, R21-AI120977) and Abbott Laboratories, Inc.

## Author information

### Contributions

All authors contributed to writing the manuscript. S.L.C.W., K.K.J., J.P.D., M.L.L., D.J.S., D.J.B., T.A.S. and A.S.B. designed the procedures for in-flight sequencing and certified hardware and reagents for spaceflight. S.E.S. prepared the ground and flight samples and performed the ground control sequencing experiments. K.H.R. assisted with experiment design and performed the flight sequencing experiments. C.Y.C., S.F., D.S., S.S. and G.Y. analyzed data from flight and ground control samples, and performed orthogonal analyses on the Illumina and PacBio platforms. A.B.R.M., N.A. and C.E.M. assisted with experiment design, analyzed ground and flight data, and performed orthogonal analyses on the PacBio and Illumina platforms. S.J., D.J.T. and F.I. assisted with developing flight-compatible sample loading procedures and the development of data analysis workflows in Metrichor.

### Corresponding author

Correspondence to Aaron S. Burton.

## Ethics declarations

### Competing Interests

Three authors (D.J.T., S.J. and F.I.) are employees of Oxford Nanopore Technologies, the company that produces the MinION sequencing technology. They assisted with experiment planning and instrument testing for flight. Analyses of nanopore data were performed independently by the bi-coastal team of the Chiu and Mason labs and the scientists at Oxford Nanopore Technologies. C.Y.C. is the director of the UCSF-Abbott Viral Diagnostics and Discovery Center and receives research support from Abbott Laboratories, Inc. All other authors declare no conflicts of interest.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Castro-Wallace, S.L., Chiu, C.Y., John, K.K. et al. Nanopore DNA Sequencing and Genome Assembly on the International Space Station. Sci Rep 7, 18022 (2017). https://doi.org/10.1038/s41598-017-18364-0
