Introduction

Kilobase-length, single-stranded DNA (ssDNA) is essential to numerous biotechnological applications including sequencing1, cloning2, homology directed repair templating for gene editing3, DNA-based digital information storage4,5, and scaffolded DNA origami6,7,8,9. Specifically, scaffolded DNA origami enables the fabrication of custom structured nanoscale objects with application to nanoscale lithography10,11, light harvesting and nanoscale energy transport12,13,14,15, metal nanoparticle casting16, and therapeutic delivery17,18. In this approach, a long ssDNA scaffold is folded via self-assembly into arbitrary, user-specified shapes by slow annealing in the presence of complementary short oligonucleotide “staples”. These staples are designed using Watson-Crick base-pair complementarity to the scaffold, forcing sequences that are far apart in sequence space to be close in physical space. Fully automated, top-down computational design of scaffolded DNA origami nanostructures has now been enabled by sequence design algorithms in both 2D and 3D19,20,21,22,23, enabling the democratization of otherwise complex scaffolded DNA origami design that previously excluded non-experts. DNA origami generated from these algorithms with sizes from 10–50-nm typically require scaffold lengths of 1–3 kb. However, therapeutic and materials science applications that require large-scale, low-cost scaffold with custom length and sequence requirements are still hindered by limitations in production of circular, isogenic scaffolds on the 1–3 kb scale24,25,26,27,28.

The most common low-cost source of native cssDNA for scaffolded DNA origami is the 7,249-base single-stranded M13mp18 (ref. 29) phage genome, which has allowed production of up to 410 mg of cssDNA per liter of E. coli growth through fed-batch fermentation30. However, because therapeutic and materials science applications of DNA origami often require exact-size scaffolds less than 3,000 nt, scalable production of custom scaffold lengths is of value. To achieve custom sequence design of mini-scaffold DNA, helper plasmid systems are employed where the M13 coding sequences are sub-cloned onto a double-stranded, low-copy number vector that is co-transformed with a phagemid containing an ssDNA origin of replication (e.g., f1 origin) that allows for the synthesis and packaging of ssDNA. The most commonly used helper plasmid system is M13KO7 (ref. 31), which maintains a packaging sequence, albeit with a mutated packaging signal to reduce the packaging frequency. This system has shown utility in phage display32,33,34, and has also been applied to produce phagemids that encode either a 1,983-nt or 2,404-nt sequence, both containing a cssDNA f1 origin, a dsDNA pUC origin, and an ampicillin selection marker24,28. However, these phagemid cssDNAs were contaminated by other DNAs, both from the dsDNA phagemid and M13KO7 (ref. 24). Importantly, isolation of the target ssDNA from these DNA impurities would require subsequent purification steps for any further scale-up or would introduce background sequence contamination from the helper plasmid and dsDNA and consequently lower yields in applications to DNA origami folding.

To overcome these limitations, the ssDNA origin of replication and packaging signal was entirely removed from a helper plasmid (E. coli str. M13cp)35, thereby enabling the biological production of isogenic cssDNA without DNA impurities. Leveraging this advance, one innovative approach to achieving custom bacterial scaffolds was recently implemented using a modified M13 origin of replication, but scalability was not reported and a lack of a selection marker in the produced material indicated that scalability might be challenging25. Another recent novel approach was the self-excision of ssDNA from phage DNA for scaffolds less than 3,200 nt26. However, the produced strands are linear and, without additional chemical stabilization36, would be prone to exonuclease degradation37. Additionally, the excision process requires further purification steps to remove residual DNAzymes. In each case, optimization for pure cssDNA production38 without genomic or plasmid contamination is still required25,26. Additionally, elimination of the antibiotic selection marker from both of these approaches25,26 ultimately does not allow for subsequent re-infection without this selective control, limiting downstream biological applications where reinfection would be useful, such as for biological sequence amplification. Thus, there remains a critical need for scalable production of isogenic cssDNA at the 1–3 kb length scale that maintains replication capacity.

To address this need, we show that isogenic miniphage production of cssDNA is scalable by fermentation using the E. coli str. SS320 with the M13cp helper plasmid. Three miniphages were synthesized using both classic restriction and restriction-free (RF) cloning39,40. The miniphages presented here maintain the selection marker and origin of replication, which allows for the reinfection of the phage in culture while reducing the occurrence of contaminating dsDNA because they do not contain a double-strand origin of replication, similar to the natural M13 phage. Monitoring phage yields and growth rates of the bacteria, we identify an 8-hour timepoint after inoculation that yields maximal cssDNA production with no detectable DNA contamination. Scalable silica-column-based DNA extraction techniques from clarified media yielded 2 mg of pure cssDNA per liter of culture with final endotoxin levels similar to detergent-based methods of endotoxin removal41. We demonstrate for the first time bioreactor-scalability of production of highly pure cssDNA of less than 3,000 nt in length, which is essential for the generation of circular scaffolds for wireframe DNA origami nanoparticles with partial sequence control19,22,23, with additional applications to write-once, read-many archival DNA data storage.

Results

A variant of extension-overlap, restriction-free cloning39,40 using long ssDNA (Fig. 1a) was applied for the de novo assembly of a miniphage genome containing only an f1 origin of replication and an ampicillin resistance selection marker (phPB52). Two kilobase-scale megaprimer ssDNAs were generated using asymmetric PCR (aPCR)37 using 5′-phosphorylated primers: a top-strand megaprimer encoding the f1 origin sequence (427 nt)42 and a bottom-strand megaprimer encoding an ampicillin resistance cistron (bla; 1,249 nt) (Figs 1b and S1). The kilobase primers were synthesized such that the two sequences contained a complementary sequence of 20 nt on each of the 5′ and 3′ ends (Fig. 1a). The two megaprimers were mixed at equimolar concentration and completed to dsDNA using PCR, followed by enzymatic ligation. The ligated plasmid (Fig. S2) was transformed into chemically competent E. coli str. M13cp35 and dual selected on ampicillin and chloramphenicol with no detectable toxicity due to the miniphage, resulting in normal colony shape and size. Two out of eight colonies screened were found to be of the exact sequence desired. Liquid culture was inoculated and grown in a shaker flask, after which the culture was centrifuged to separate the phage-containing media from the bacterial pellet. Phage in the clarified media were visualized by TEM, showing the anticipated size and homogeneity (Figs 1c and S1). The cssDNA from this phage was isolated using silica column purification and showed 88% cssDNA purity according to agarose gel imaging (Fig. 1d), with an approximate yield of 0.5 mg per liter of bacterial growth, while the bacterial pellet showed helper plasmid, dsDNA intermediate phage DNA, and cssDNA.

Figure 1

Scalable bacteriophage production of isogenic cssDNA. (a) Restriction-free (RF) cloning was used to assemble miniphage phPB52, which was transformed into E. coli containing the M13cp helper plasmid for production of isogenic cssDNA. (b) aPCR was used to generate the two ssDNA megaprimers for RF cloning encoding the f1 origin of replication (f1 ori) and the bla ampicillin selection marker (Selection). See Fig. S1a for uncropped color image. (c) Phage particles from clarified media were visualized by TEM. See Fig. S1b for uncropped image. (d) DNA purification from the bacterial pellet and the clarified media shows mostly pure cssDNA in the media, and cssDNA, dsDNA phagemid, and helper plasmid in the bacterial pellet. See Fig. S1c for uncropped color image.

Having generated a phage containing only the f1 origin and a resistance gene, demonstrated to be stably produced and exported to the media from the helper strain, we next sought to generate a second phage with a synthetic fragment of DNA that is orthogonal in sequence to bacterial and phage genomes. A fragment of length 844 nt was ligated between the f1 origin and the bla cistron using standard restriction cloning to generate a plasmid of size 2,520 nt (phPB84; Fig. S2). This plasmid was transformed into the helper strain, and the produced phage was purified and its sequence verified by primer walking with Sanger sequencing (External Tables S1 and S2).

Figure 2

Shaker flask production of pure cssDNA. (a) Shaker flask growth of phPB84 was used to optimize conditions for phage amounts and purity. (b) Time-course assay of cssDNA production of phPB84, with cssDNA yield calculated by absorbance at 280 nm and purity adjusted by agarose gel band intensity, showing maximum yield and purity at the 8-hour timepoint. The 16-hour timepoint is from a separate culture, and therefore is not included in the plot. See Fig. S1a for uncropped color gel image. (c) Comparison between DH5α F′Iq and SS320 showing a two-fold yield increase in the SS320 strain. (d) Comparison between growth media showing five-fold improved cssDNA yield in 2 × YT after 8 hours of production. (e) Comparison of five pH values for cssDNA production, controlled by use of 100 mM HEPES-NaOH. Error bars indicate standard deviation of triplicate experiments. See Figs S3 and S4 for uncropped color triplicate measurements of all experiments.

To obtain milligram-scale production of cssDNA with high genetic purity of the final material, we used a shaker flask setup (Fig. 2a) to vary the growth time, the E. coli strain, the growth media, and the media pH to determine optimal conditions. We found that the highest and purest yield of cssDNA occurred at the 8-hour timepoint after inoculation, near the end of log phase, with production falling off thereafter and dsDNA contamination appearing in the media at the 12-hour timepoint (Figs 2b and S3). Two strains were tested for production: DH5α F′Iq (Invitrogen) and the SS320 strain (Invitrogen). Both express the F pilus and are commonly used for phage production, and each was transformed with the M13cp helper plasmid purified from E. coli str. M13cp. Strain SS320 showed approximately double the cssDNA yield (Figs 2c and S4) and was therefore chosen as the strain for further optimization of growth conditions. Terrific broth (TB) and 2 × yeast extract tryptone (2 × YT) media for bacterial growth and cssDNA production were both evaluated for phage growth while also monitoring dsDNA contamination using agarose gel analysis (Figs 2d and S4). Notably, TB had significant dsDNA contamination by the 8-hour timepoint (Fig. S4), and 2 × YT was therefore chosen as the optimal medium for batch production. Next, we investigated the pH sensitivity of the production of phage material38, which exhibited a three-fold increase in yield at pH 6.8 and 7.2 compared to pH 8 (Figs 2e and S4).

Having identified the optimal growth conditions in the shaker flask setup, we next identified conditions for scale-up in a batch fermenter process (Fig. 3a) using a Sartorius Stedim 5 L fermenter (Sartorius, Germany). Shaker flask conditions were transferred to the bioreactor setup, including the use of 2 × YT media, while the pH was held at 7.0 using external phosphoric acid and ammonium hydroxide. The growth curve was monitored using O.D.600 absorbance measurements, and the pH and dissolved oxygen were monitored by calibrated probes. Each timepoint was additionally monitored for cssDNA and dsDNA production using agarose gel analysis (Figs 3b and S5), showing maximal cssDNA yield at the 8-hour timepoint, as with the shaker flask, with minimal contaminating dsDNA up to the 12-hour timepoint (Fig. S5). Extraction of 900 mL of media for phage purification was carried out at the 8-hour timepoint and processed using a silica-column-based approach specifically designed to reduce endotoxin levels (EndoFree Megaprep Kit, Qiagen, MD). Gel band intensity analysis after kit purification showed no detectable dsDNA contamination (Fig. 3c). Sanger sequencing by primer walking verified the sequence of the phage DNA (External Tables S1 and S2). The kit-based purification yielded 2 mg of cssDNA/L of culture, matching the yield from phenol-chloroform extraction. Endotoxins were tested using a colorimetric assay (ToxinSensor Chromogenic LAL Endotoxin Assay Kit, GenScript, NJ), showing that the final product had endotoxin levels of 1.1 ± 0.1 E.U./ml at a cssDNA concentration of 10 nM, similar to endotoxin reduction by Triton X-114 (ref. 41) (Fig. S6). Circularization of the produced cssDNA was verified by incubation with exonuclease I, showing no detectable degradation after 30 min (Fig. 3c).

Figure 3

Batch fermenter production of pure cssDNA. (a) Scalable production in a stirred-tank bioreactor. (b) Time-course assay of cssDNA yield based on agarose gel band intensity analysis (Fig. S5), with the 8-hour timepoint used for 900 mL cssDNA purification. (c) Silica-column DNA purification from the PEG-precipitated phPB84 phage showed no detectable dsDNA contamination, similar in purity to commercially available M13mp18, yielding 2 mg of DNA per liter of culture at the 8-hour timepoint. Stability against exonuclease I (ExoI) degradation after 30 min of co-incubation indicates that the ssDNA is circular.

Having implemented a method for milligram-scale production of isogenic miniphage cssDNA, we next applied the method to produce custom-length single-stranded DNA scaffold with partial sequence control for application to wireframe scaffolded DNA origami (Fig. 4). We used the DAEDALUS design algorithm19 to design a DNA-scaffolded pentagonal bipyramid with a 52-bp edge length (1,580-nt scaffold length) using the smallest phPB52 phage genome sequence (1,676 nt) and a second DNA-scaffolded pentagonal bipyramid with an 84-bp edge length (2,520-nt scaffold length) using the phPB84 phage genome sequence (2,520 nt). Notably, any DNA origami with scaffold lengths larger than 1,676 nt can have perfectly matched phage genome lengths, as exemplified in the pentagonal bipyramid with an 84-bp edge length. DNA origami object folding was characterized using agarose gel mobility shift assays and transmission electron microscopy (TEM), which confirmed monodispersed object sizes with near quantitative yield of self-assembly (Figs 4a, S7 and S8).

Figure 4

Applications of scalable, isogenic miniphage production. (a) Pentagonal bipyramids of 52-bp and 84-bp edge lengths were folded using phPB52 and phPB84 as scaffolds, respectively. Agarose gel mobility shift assays and TEM were used to validate the folding of the scaffold to the expected design. See Figs S7 and S8 for uncropped gel images and example full-field TEM micrographs, respectively. Scale bars represent 20 nm. (b) Phage particles are natively protected from environmental degradation and easy to amplify by bacterial infection, and thus provide a compelling method for archival and amplification of digital information encoded in DNA. The encoding scheme shown here can generate bio-orthogonal sequences that are designed to limit secondary structure and recombination sites. DNA encoding a digital text file containing a line from The Crucible43 (full encoded text: “The answer is in your memory and you need no help to give it to me. Why did you dismiss Abigail Williams?”) was ligated to the phPB52 vector and subsequently produced and amplified in bacteria. Sanger sequencing by primer walking was used to retrieve the original digital file.

As an alternative application, we applied our platform to package digital information encoded in the DNA sequence for write-once, read-many archival DNA storage. Specifically, we cloned a sequence into the phPB52 variable domain that encoded a line of text (“The answer is in your memory and you need no help to give it to me. Why did you dismiss Abigail Williams?”) from Act II, Scene 2 of The Crucible by Arthur Miller43 (phCruc; Fig. 4b). While the binary representation of this encrypted text file was converted into a DNA sequence using direct nucleotide conversion (A or C representing 0 and T or G representing 1), other encryption or compression approaches could alternatively be employed. A universal forward primer, together with header information, was added to the 5′ end of the sequence, and an end-of-file (EOF) marker, random slack space, and a universal reverse primer were added to the 3′ end (Fig. 4b). The sequence was optimized for single-strandedness by ensuring that no region of the sequence had more than 7 bases that were repeated in, or complementary to, any other region of the sequence. The DNA “memory block” was cloned into the phPB52 sequence (Fig. 4b), and four-milliliter production of the phCruc phage showed 95% purity of the cssDNA as judged by agarose gel band intensities. Sanger sequencing was used to retrieve the insert sequence and decode the digital message (External Table S1).
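The single-strandedness rule described above (no stretch longer than 7 bases repeated in, or complementary to, another region) can be illustrated with a short Python sketch. This is a simplified stand-in that scans exact 8-nt windows; it is not the BLASTn-based check described in the Methods, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_word_conflicts(seq: str, word: int = 8) -> List[Tuple[int, int, str]]:
    """Report windows of length `word` that repeat, or are reverse-complementary
    to, an earlier window. Each hit is (earlier_start, later_start, kind)."""
    seen: Dict[str, int] = {}  # window sequence -> first start position
    conflicts: List[Tuple[int, int, str]] = []
    for i in range(len(seq) - word + 1):
        kmer = seq[i:i + word]
        rc = reverse_complement(kmer)
        if kmer in seen:
            conflicts.append((seen[kmer], i, "repeat"))
        if rc in seen:
            conflicts.append((seen[rc], i, "complement"))
        seen.setdefault(kmer, i)
    return conflicts
```

A sequence that returns an empty list has no exact 8-nt repeat or reverse-complement match under this simplified test; a word size of 8 corresponds to "more than 7 bases". Imperfect matches, which an alignment tool could report, are not detected here.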

Discussion

We applied the E. coli str. M13cp strain for scalable bioproduction of pure cssDNA, which has the capabilities of generating isogenic material for biotechnological applications including scaffolded DNA origami and digital information archiving and amplification, amongst other uses. The method employed here to direct purification of phage cssDNA without additional dsDNA contamination allows for new technology development in synthetic cssDNA sequence production that can be made bio-orthogonal and scalable, enabling future application to novel therapeutics and materials. Additional advances in the scaffolded DNA origami field are applicable to this strain, including the incorporation of DNAzymes26 that would allow for greater control over the sequence and size of the produced linear ssDNA. However, in the approach used here, maintenance of the f1 origin and the selection marker in the produced phage allows for reinfection across the culture, which is important for subsequent biological amplification such as needed in phage display and, here, archival information storage. Moreover, circularization blocks exonuclease activity, which may prove important for therapeutic applications44. In the future, improved understanding of phage biology should enable new approaches to excising specific coding sequences from M13 to generate engineered systems specifically designed for production of cssDNA.

The yields from the bioreactor approach used here were lower than wild type phage production that has been extensively optimized26,30. This is due in part to the loss of the native feedback control over gene expression in the phage genome45, the use of batch fermentation as opposed to a fed-batch approach that would allow for higher cell density30, and plasmid loss due to ampicillin selection. Interestingly, we were not able to obtain clones of kanamycin or chloramphenicol selection cistrons on the vector purely under the control of the f1 origin. This may be due to the use of their respective cognate promoters, which might be overcome by alternative single-strand-specific promoters46. This resistance insertion would then allow for fed-batch scale-up, leading to significantly improved yields.

Increased cssDNA production yields, together with advances in custom sequence design25,26 and bio-orthogonality47,48 with and without protein coding sequences, suggest that our approach is amenable to therapeutic applications in which ssDNA are used in circular49 or linear50 forms. In particular, scalable production of pure ssDNA at lower costs could enable yields required for therapeutic dosages of kilobase-length HDR template strands51, a strategy that is further enabled by applying a DNAzyme approach for linearization26. Scalable biological production of scaffolded DNA origami now matches production amounts from solid-state DNA synthesis commonly used for staple production, so that scaffolded DNA origami nanoparticles may now be produced at reasonable cost for mouse and higher animal therapeutic studies. Staple sequences synthesized with modifications to improve staple stability may further enhance nanoparticle lifetimes36.

The alternative application of custom length and sequence scaffolds to encode digital information offers a write-once, read-many approach to low-cost massive archival data storage. Phage packaging is known to improve DNA stability against nuclease and chemical degradation52, and ease of amplification in bacterial cultures makes this an intriguing method for native archival storage and biological-based information amplification, compatible with all sequencing strategies developed for M13 shotgun sequencing. Further, knowledge of phage biology and phage display offers an interesting set of possibilities for conditional amplification of phage sequences that encode specific digital information, a possible alternative or complement to the current PCR-based solutions that are being developed53.

Materials and Methods

Plasmid assembly by single-stranded DNA

All sequences of phage genomes (External Table S1) and primers (External Table S2) are contained in the External Supplementary Tables Excel file. The sequence of the f1 origin of replication was ordered from Integrated DNA Technologies (IDT, Inc., Coralville, IA) as a gBlock™ with 20 nt primers flanking the 5′ and 3′ sides designed to have a calculated melting temperature of 57 °C54. Double-stranded DNA was generated by amplification of the synthetic gBlock f1 sequence with Phusion™ polymerase (New England Biolabs, Inc., Ipswich, MA). The beta-lactamase (bla) ampicillin resistance gene with its promoter and terminator sequences was amplified from pUC19 using Phusion™ polymerase and 5′ and 3′ primers extended on their 5′ ends by the complementary pair of the f1 gBlock fragment. In each case, the PCR-amplified material was purified by ZymoClean agarose gel purification (Zymo Research, Inc., Irvine, CA) and column cleanup (Qiagen miniprep spin purification kit, Qiagen, Inc., Germany). Single-stranded DNA was generated using asymmetric production with 200 ng of purified dsDNA and 1 μM 5′-phosphorylated primer and Accustart HiFi polymerase (QuantaBio, Inc., Beverly, MA) in 1× Accustart HiFi buffer with 2 mM MgCl2, and cycled 25 times, as previously described37. The ssDNA was gel- and column-purified. The two ssDNA products were then mixed in a 1:1 molar ratio and the ssDNA was converted to dsDNA using Phusion polymerase, column purified, and ligated using T4 DNA ligase (NEB) in 1× T4 DNA ligation buffer with 30 ng of amplified DNA incubated at room temperature overnight.

E. coli strains M13cp35, DH5α F′Iq (Thermo Fisher, Inc., Waltham, MA), and SS320 (Lucigen, UK) were each transformed with the M13cp helper plasmid (a generous gift of Dr. Andrew Bradbury, Los Alamos National Lab) and made competent by washing log-phase grown cells in ice cold 100 mM CaCl2. 20 µL of competent cells were transformed with 2 µL of phagemid DNA ligation mix. Cells were incubated on ice for 30 minutes, heat shocked at 42 °C for 45 seconds, and then put on ice. Pre-warmed SOB media was added and the cell culture was shaken at 37 °C for 1 hour. 100 µL of cells were plated evenly across a Luria-Agar (LA) media plate made with 100 µg/mL ampicillin and 15 µg/mL chloramphenicol.

Individual colonies were selected and grown in 5 mL of Terrific Broth (TB) supplemented with 1% glycerol overnight at 37 °C. Bacteria were removed by centrifuging at 4,000 rpm for 10 minutes. Supernatant was removed and placed in a new 1.5 mL spin column and spun at 4,000 rpm for an additional 10 minutes. 1 µL of the clarified supernatant was added to 20 µL of nuclease-free water and heated to 95 °C for 5 minutes, after which 1 µL of the heated solution was added to a Phusion PCR mix containing enzyme, buffer, nucleotides, and the forward and reverse primers used to generate the plasmid. Positive colonies were determined by the presence of the PCR amplicon as visualized by agarose gel, and the purified phage were sent for Sanger sequencing. Of the eight colonies chosen, two were shown to have the correct sequence. The bacterial pellet was processed to purify all of the DNA it contained by alkaline lysis and column purification (Qiagen miniprep spin kit, Qiagen, Inc., Germany).

Purified dsDNA was PCR amplified from phPB52 to have EcoRI and PstI nuclease sites between the bla resistance cistron and the f1 ori. The PCR product was purified, digested, alkaline phosphatase treated, and gel purified. The synthetic DNA insert encoding digital information was generated using a computational algorithm that optimizes single-stranded compatibility. The synthetic sequences were amplified from a pUC19 vector containing the sequences to have flanking EcoRI and PstI sites, and digested with EcoRI and PstI nucleases. The product was gel purified. The inserts were ligated to the phPB52 digested vector in 1× T4 DNA ligation buffer (NEB) with 30 ng of vector DNA and a three-fold molar excess of synthetic inserts, incubated at room temperature overnight, and transformed into competent helper strain E. coli. These generated the phPB84 and phCruc phage genomes.

Synthetic phage production

Phage-producing colonies, as judged by positive PCR, gel visualization, and sequencing results, were grown overnight in 4 mL 2 × YT supplemented with 100 μg/mL ampicillin, 15 μg/mL of chloramphenicol and 5 μg/mL of tetracycline (Sigma-Aldrich, Inc.) in 15 mL culture tubes shaken at 200 RPM at 37 °C. The following day, the cultures were diluted to an O.D.600 of 0.05 in 2 × YT supplemented with 100 μg/mL ampicillin, 15 μg/mL of chloramphenicol and 5 μg/mL of tetracycline and grown for between 3 h and 27 h for time course experiments, and for 8 h for media, pH, and strain optimization experiments. For pH optimization, the pH was controlled by addition of 100 mM HEPES-NaOH to the 2 × YT media. Strain-specific antibiotics were used as recommended by the manufacturer. After the chosen time point, the cultures were spun down at 4,000 RPM for 15 minutes, after which the supernatant was removed to a fresh tube, spun at 4,000 RPM for an additional 15 minutes, and filtered using a 0.45 μm cellulose acetate filter. For gel and NanoDrop quantification, 400 μL of the clarified media was lysed by addition of Qiagen Buffer P1 supplemented with Proteinase K (20 μg/mL final; Sigma) and RNase A/T1 and incubated at 37 °C for 1 h, followed by addition of Qiagen Buffer P2, heating to 70 °C for 15 minutes, and return to room temperature. Qiagen Buffer N3 was then added and the precipitate was centrifuged. One volume of 100% ethanol was added to the supernatant, which was applied to a Qiagen spin column and purified. The purified eluate DNA concentration was determined by A280 absorbance from a NanoDrop 2000 (Thermo Fisher) for each time point and condition tested, and the eluate was run on a 1% agarose gel in 1 × Tris-Acetate-EDTA (TAE) stained with SybrSafe (Thermo Fisher) for visualization of the product. The ssDNA purity was judged by ImageJ55 intensity analysis, and the amount of ssDNA from each time point or condition was calculated as this purity multiplied by the total amount of DNA found from A280 absorbance.
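The purity adjustment described above is simple arithmetic, sketched here in Python. The function names and example numbers are hypothetical; the band intensities would come from the gel image analysis and the total DNA from the absorbance reading.

```python
from typing import Sequence


def band_purity(css_intensity: float, other_intensities: Sequence[float]) -> float:
    """Fraction of lane intensity in the cssDNA band (0 to 1)."""
    total = css_intensity + sum(other_intensities)
    if total <= 0:
        raise ValueError("lane has no measurable intensity")
    return css_intensity / total


def purity_adjusted_amount(total_dna_ug: float, purity: float) -> float:
    """cssDNA amount = gel-derived purity x total DNA from absorbance."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return total_dna_ug * purity


# Hypothetical lane: cssDNA band of 880 units, contaminating bands of 70 and 50 units.
purity = band_purity(880.0, [70.0, 50.0])
css_ug = purity_adjusted_amount(10.0, purity)  # 10 ug total DNA is a made-up value
```

With these made-up inputs the purity is 0.88, so 10 μg of total DNA would be reported as 8.8 μg of cssDNA.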

For milligram-scale production of synthetic miniphage, a Sartorius Stedim fermenter was used for growing 5 L of culture. An overnight culture was grown in 2 × YT supplemented with 100 μg/mL ampicillin, 15 μg/mL of chloramphenicol and 5 μg/mL of tetracycline and diluted to an O.D.600 of 0.05 for inoculating 5 L of media. The growth media for the batch fermentation was also 2 × YT supplemented with 100 μg/mL ampicillin, 15 μg/mL of chloramphenicol and 5 μg/mL of tetracycline. Oxygen and pH were monitored throughout the growth, and the pH was maintained at 7.0 with phosphoric acid and ammonium hydroxide, with a constant agitation of 400 RPM. Time points were taken approximately every hour and samples were processed as above for the shaker flask. At 8 h, 900 mL of liquid culture was removed for processing. For milligram-scale purification of ssDNA, bacteria in the 900 mL of liquid culture were pelleted by centrifuging twice at 4,000 × g for 20 min, followed by 0.45 μm cellulose acetate filtration. Phage from clarified media were precipitated by adding 6% w/v of polyethylene glycol-8000 (PEG-8000) and 3% w/v of NaCl and stirring continuously at 4 °C for 1 h. Precipitated phage were collected by centrifuging at 12,000 × g for 1 h, the PEG-8000 supernatant was removed completely, and the pellet was resuspended in 30 mL of 10 mM Tris-HCl pH 8.0, 1 mM ethylenediaminetetraacetic acid (EDTA) buffer (TE buffer). The phage was then processed using an EndoFree Maxiprep (Qiagen, Germany) column-based purification, following the manufacturer’s protocol with two adjustments. First, proteinase K (20 μg/mL final) was added to EndoFree Buffer P1 and incubated at 37 °C for 1 h before addition of EndoFree Buffer P2 and incubation at 70 °C for 10 min. The lysed phage was returned to room temperature before proceeding. Second, after removal of endotoxins, 0.2 v/v of 100% ethanol was added to the clarified sample before applying it to the EndoFree Maxiprep column, to increase ssDNA binding. All other steps remained the same, and the cssDNA was eluted in 1 mL of endotoxin-free TE buffer. The amount of collected DNA was judged by absorbance at A280, and the purity was judged by running on a 1% agarose gel in 1× TAE stained with ethidium bromide.
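As a worked example of the reagent amounts implied above, the following Python sketch computes the PEG-8000 and NaCl masses for 900 mL of clarified media and the ethanol volume for the column-binding adjustment. It assumes that percent w/v is taken relative to the starting media volume and that "0.2 v/v" means 0.2 volumes of ethanol per volume of sample; the 50 mL sample volume is a made-up value.

```python
def grams_for_percent_w_v(percent_w_v: float, volume_ml: float) -> float:
    """Mass of solute in grams for a given % w/v (g per 100 mL) and volume."""
    return percent_w_v * volume_ml / 100.0


def ethanol_volume_ml(sample_ml: float, volumes: float = 0.2) -> float:
    """Ethanol to add, expressed as a multiple of the sample volume."""
    return volumes * sample_ml


media_ml = 900.0                                  # clarified media volume from the text
peg_g = grams_for_percent_w_v(6.0, media_ml)      # 6% w/v PEG-8000
nacl_g = grams_for_percent_w_v(3.0, media_ml)     # 3% w/v NaCl
etoh_ml = ethanol_volume_ml(50.0)                 # hypothetical 50 mL clarified sample
```

Under these assumptions, 900 mL of media calls for 54 g of PEG-8000 and 27 g of NaCl.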

Endotoxin amounts were tested using the ToxinSensor chromogenic LAL endotoxin assay kit (GenScript, Piscataway, NJ) following the manufacturer’s protocol. The cssDNA phPB84 was diluted to 10 nM in endotoxin-free water, with absorbance read for each measurement on a standard curve and the 10 nM sample on an Evolution 220 UV/Vis spectrophotometer (Thermo Fisher). Stability against exonuclease I degradation was tested by incubating cssDNA phPB84 with exonuclease I in 1× exonuclease buffer (NEB) at 37 °C for 30 min. The reaction was quenched by incubation at 80 °C for 15 min, and was subsequently run on a 1% agarose gel in 1 × TAE stained with ethidium bromide.

Agarose gels were either cast as 1% gels using SeaKem agarose stained with SybrSafe or were purchased as precast Reliant 1% Gold agarose stained with ethidium bromide (SeaKem). Gels were imaged under blue light, and the images were then converted to black-and-white and color-inverted to show a black band using Adobe Photoshop CC2018. Original gel images can be found in the associated Supplementary Information.

DNA origami assembly

DNA purified from phages phPB52 and phPB84 was used to fold pentagonal bipyramids with edge lengths of 52 base pairs and 84 base pairs, respectively. Staples for each object were generated from the automated scaffold routing and staple design software DAEDALUS19. Staples were synthesized by IDT and are listed in External Tables S3 and S4. To fold the nanoparticles, 20 nM of bacterially produced and purified scaffold was incubated with a 20-fold molar excess of staples in 1 × TAE buffer with 12 mM MgCl2. The objects were annealed over 13 hours from 95 °C to 24 °C as previously described19, and the folded particles were run on a 1% agarose gel in 1 × TAE buffer with 12 mM MgCl2 with the respective cssDNA scaffolds for reference. The folded nanoparticles were purified using a 100 kDa MWCO spin concentrator (Amicon) for a total of five 5-fold buffer exchanges as purification for TEM.
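The folding stoichiometry above (20 nM scaffold, 20-fold molar excess of each staple) can be turned into pipetting volumes with a short Python sketch. The reaction volume and stock concentrations below are hypothetical placeholders, not values from this work.

```python
def staple_concentration_nm(scaffold_nm: float, molar_excess: float) -> float:
    """Final concentration of each staple for a given molar excess over scaffold."""
    return scaffold_nm * molar_excess


def stock_volume_ul(final_nm: float, stock_nm: float, reaction_ul: float) -> float:
    """Volume of stock needed so the reaction reaches final_nm (C1*V1 = C2*V2)."""
    if stock_nm < final_nm:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_nm * reaction_ul / stock_nm


SCAFFOLD_FINAL_NM = 20.0   # from the text
STAPLE_EXCESS = 20.0       # from the text

reaction_ul = 50.0         # hypothetical reaction volume
scaffold_stock_nm = 200.0  # hypothetical scaffold stock
staple_stock_nm = 2000.0   # hypothetical staple pool stock (per staple)

staple_final_nm = staple_concentration_nm(SCAFFOLD_FINAL_NM, STAPLE_EXCESS)
scaffold_ul = stock_volume_ul(SCAFFOLD_FINAL_NM, scaffold_stock_nm, reaction_ul)
staples_ul = stock_volume_ul(staple_final_nm, staple_stock_nm, reaction_ul)
```

Each staple ends up at 400 nM; with the placeholder stocks, a 50 μL reaction would take 5 μL of scaffold and 10 μL of staple pool, with the remainder made up by buffer and MgCl2.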

DNA data encoding scheme

Kilobase-length DNA sequences are optimized for single-strandedness by limiting self-complementarity and repeat sequences, as well as by reducing mononucleotide repeats37. Digital information can be encoded to satisfy these requirements by pseudo-randomizing the bitstream through encryption and using a one-to-two bit-to-base encoding scheme. This was implemented using a Python script that converts a digital file to a DNA sequence satisfying the constraints for single-strandedness, accepting a sequence only if BLASTn returns no problematic sequences of word size greater than 7. The digital file bitstream is encrypted using AES in block cipher mode with a randomly generated password (“ry%Tr*>2Y><NFv5aqAEhU@Q046Cy$n92”) and a randomly generated 16 nt DNA sequence initialization vector (“TAATTTACTTATTCTC”). The encoded bitstream is then translated to a DNA sequence with 0 represented randomly by an A or C and 1 represented randomly by a G or T. Homopolymers longer than 5 nucleotides are not allowed, and bases are swapped to the other choice base as required. The converted sequence is then a concatenation of a master forward primer sequence (CTTGGGTGGAGAGGCTATTC), a file type identifier (GTTTAAGGTCACATCGCATG), the initialization vector, a four-nt base-4 (T: 0, G: 1, A: 2, C: 3) memory page (TGTT), an eight-nt base-4 digital file size (GTTGAGCC), the bitstream data, an end-of-file sequence (GTACTAGTCGACGCGTGGCC), randomly generated slack space to a user-specified DNA block length, and a master reverse primer sequence (GATCTCCTGTCATCTCACCT). The slack space is randomly generated, as are the individual bit-to-base choices, and thus the bases are iteratively swapped as needed to satisfy the BLASTn word size constraint.
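The bit-to-base translation and block layout described above can be sketched in Python as follows. This is a minimal illustration, not the published script: the AES encryption and the BLASTn-driven base swapping are omitted, the homopolymer rule is applied only to the data region, the most-significant-digit-first order of the base-4 fields is an assumption, and all function names are hypothetical.

```python
import random

ZERO_BASES = "AC"        # bit 0 -> A or C
ONE_BASES = "GT"         # bit 1 -> G or T
BASE4_DIGITS = "TGAC"    # T: 0, G: 1, A: 2, C: 3
MAX_HOMOPOLYMER = 5

# Fixed sequences quoted in the text.
FORWARD_PRIMER = "CTTGGGTGGAGAGGCTATTC"
FILE_TYPE_ID = "GTTTAAGGTCACATCGCATG"
INIT_VECTOR = "TAATTTACTTATTCTC"
EOF_MARKER = "GTACTAGTCGACGCGTGGCC"
REVERSE_PRIMER = "GATCTCCTGTCATCTCACCT"


def bits_to_dna(bits: str, rng: random.Random) -> str:
    """Translate a '0'/'1' string to DNA, picking one of two bases per bit and
    switching to the other base when a run would exceed MAX_HOMOPOLYMER."""
    out = []
    for bit in bits:
        choices = ZERO_BASES if bit == "0" else ONE_BASES
        base = rng.choice(choices)
        tail = out[-MAX_HOMOPOLYMER:]
        if len(tail) == MAX_HOMOPOLYMER and all(b == base for b in tail):
            base = choices[1 - choices.index(base)]
        out.append(base)
    return "".join(out)


def int_to_base4(value: int, width: int) -> str:
    """Encode a non-negative integer as fixed-width base-4 DNA digits
    (most significant digit first; the digit order is an assumption)."""
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 4)
        digits.append(BASE4_DIGITS[digit])
    if value:
        raise ValueError("value does not fit in the field")
    return "".join(reversed(digits))


def build_block(data_dna: str, page: int, file_size: int,
                block_length: int, rng: random.Random) -> str:
    """Concatenate header, data, EOF marker, random slack, and reverse primer."""
    head = (FORWARD_PRIMER + FILE_TYPE_ID + INIT_VECTOR
            + int_to_base4(page, 4) + int_to_base4(file_size, 8))
    fixed = len(head) + len(data_dna) + len(EOF_MARKER) + len(REVERSE_PRIMER)
    if block_length < fixed:
        raise ValueError("block length too short for the payload")
    slack = "".join(rng.choice("ACGT") for _ in range(block_length - fixed))
    return head + data_dna + EOF_MARKER + slack + REVERSE_PRIMER
```

For example, `build_block(bits_to_dna("01000001", random.Random(0)), 1, 8, 200, random.Random(1))` returns a 200-nt block. A complete encoder would then re-draw or swap bases until the sequence passes the word-size check.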

To reconstruct the file, the extraneous header and footer data is stripped from the digital encoded sequence and the sequence is converted back to bitstream data by direct conversion of A or C to 0 and G or T to 1. Applying the Python 3.4 PyCrypto module’s AES function in block cipher mode with the 16-nt sequence initialization vector and password given above to the bitstream data allows for the retrieval of the original text file containing the line from The Crucible (“The answer is in your memory and you need no help to give it to me. Why did you dismiss Abigail Williams?”). We have made the source of this Python decoding algorithm freely available on GitHub at https://github.com/lcbb/ssDNA-memory.
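The stripping and base-to-bit steps of this reconstruction can be sketched in Python as below. The field widths follow the sequence lengths given in the encoding description; the AES decryption step is deliberately left out because the key derivation and cipher settings are not specified here, and the function names are hypothetical. The repository linked above remains the reference implementation.

```python
from typing import Dict

# Header fields in order, with widths taken from the sequences quoted in the text.
HEADER_LAYOUT = (
    ("forward_primer", 20),
    ("file_type", 20),
    ("init_vector", 16),
    ("memory_page", 4),
    ("file_size", 8),
)
EOF_MARKER = "GTACTAGTCGACGCGTGGCC"
BASE4_VALUE = {"T": 0, "G": 1, "A": 2, "C": 3}


def base4_to_int(field: str) -> int:
    """Decode base-4 DNA digits, most significant first (order is an assumption)."""
    value = 0
    for base in field:
        value = value * 4 + BASE4_VALUE[base]
    return value


def parse_block(block: str) -> Dict[str, str]:
    """Split a memory block into header fields and the data region before EOF."""
    fields: Dict[str, str] = {}
    pos = 0
    for name, width in HEADER_LAYOUT:
        fields[name] = block[pos:pos + width]
        pos += width
    end = block.find(EOF_MARKER, pos)
    if end < 0:
        raise ValueError("end-of-file marker not found")
    fields["data"] = block[pos:end]
    return fields


def dna_to_bits(dna: str) -> str:
    """Direct conversion: A or C -> 0, G or T -> 1."""
    bits = []
    for base in dna:
        if base in "AC":
            bits.append("0")
        elif base in "GT":
            bits.append("1")
        else:
            raise ValueError("unexpected base: " + repr(base))
    return "".join(bits)


def bits_to_bytes(bits: str) -> bytes:
    """Pack whole groups of 8 bits into bytes; trailing partial bits are dropped."""
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

The bytes returned by `bits_to_bytes(dna_to_bits(parse_block(block)["data"]))` would still be ciphertext and would need the AES step to recover the text. One caveat of this sketch is that it takes the first occurrence of the end-of-file sequence after the header, which assumes the marker does not occur by chance inside the data.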

Transmission electron microscopy

The structured DNA pentagonal bipyramids with 52-base-pair and 84-base-pair edge lengths assembled using the phage-produced scaffolds were visualized by transmission electron microscopy (TEM). A volume of 200 µL of folded reaction was purified from excess staples and buffer exchanged into 20 mM Tris-HCl pH 8.0 and 8 mM MgCl2 using a 100 kDa MWCO spin concentrator (Amicon, Merck Millipore, Billerica, MA). The concentration was subsequently adjusted to 5 nM. Carbon film copper grids (CF200H-CU; Electron Microscopy Sciences Inc., Hatfield, PA) were glow discharged and the sample was applied for 60 seconds. The sample was then blotted from the grid using Whatman 42 ashless paper, and the grid was placed on a drop of freshly prepared 1% uranyl-formate with 5 mM NaOH for 10 s56. Remaining stain was wicked away using Whatman 42 paper and the grid was dried before imaging. The grid was imaged on an FEI Tecnai with a Gatan camera.