The essential genome of the crenarchaeal model Sulfolobus islandicus

Sulfolobus islandicus is a model microorganism in the TACK superphylum of the Archaea, a key lineage in the evolutionary history of cells. Here we report a genome-wide identification of the repertoire of genes essential to S. islandicus growth in culture. We confirm previous targeted gene knockouts, uncover the non-essentiality of functions assumed to be essential to the Sulfolobus cell, including the proteinaceous S-layer, and highlight essential genes whose functions are yet to be determined. Phyletic distributions illustrate the potential transitions that may have occurred during the evolution of this archaeal microorganism, and highlight sets of genes that may have been associated with each transition. We use this comparative context as a lens to focus future research on archaea-specific uncharacterized essential genes that may provide valuable insights into the evolutionary history of cells.


Essential genes were predicted to be significantly underrepresented in the insertion locations extracted from the transposon mutagenesis and sequencing data (Tn-seq). It is important to note that this may make them indistinguishable from genes that are not strictly essential for growth but whose disruption causes a severe growth defect, and thus our definition of "essential" extends to these [...] used a combination of two programs: ESSENTIALS 18 and Tn-Seq Explorer 19. Both methods report essential gene candidates by separating essential and non-essential genes into a bimodal distribution of scores. ESSENTIALS does so by calculating a log ratio of observed and expected reads in each gene (log2FC), while Tn-Seq Explorer uses a sliding window approach to examine the absolute number of insertions in and around genes and calculates an Essentiality Index (EI) for each. The former tends to underestimate the number of essential genes, while the latter tends to overestimate it 19. A total of 445 genes lie within the suggested range for both methods (log2FC ≤ -5.1 and EI < 4), leaving 178 genes within only one range, or "unassigned" as essential or non-essential. The remaining 2,105 protein-coding genes are likely non-essential for growth under these conditions (Fig. 1b and [...] of cdvB, may be incorrectly called essential in our Tn-seq analysis. We can readily obtain cdvB3 disruption mutants (Supplementary Fig. 3b) and the growth of a cdvB3 mutant strain is indistinguishable from that of the wild-type strain (data not shown); thus this gene was removed from the essential gene list. An explanation of why this gene is mischaracterized would require further investigation, but it is possible that, because the score distributions for essential and non-essential genes overlap, this gene was simply not hit enough times to achieve significance.

To further investigate our automated assignments, we screened eight "unassigned" genes in S. islandicus M.16.4 that were called essential by one method or the other but not both. We were unable to obtain mutants for six of them. Of these, five genes, i.e., lig (M164_1953), priL (M164_1568), priX (M164_1652), rnhII (M164_0197), and tfs2 (M164_1524), were called essential via EI but not log2FC, while thrS1 (M164_0290) was called essential based on log2FC but not EI. In contrast, knockouts of the two "unassigned" genes called essential by EI but not log2FC, udg4 (M164_0085), encoding uracil-DNA glycosylase family 4, and rpo8 [...] incubation of transformation plates, again consistent with a severe growth defect (Supplementary Fig. 2b, 2c, and 3b). This suggests the presence of false negatives and a stronger bias to underestimate than overestimate the true number of essential genes. Because not all genes in the unassigned categories were genetically tested, we conservatively excluded all unassigned genes from the essential gene list. By contrast, knockouts for all 76 non-essential genes tested were successfully obtained and verified by PCR analysis (Supplementary Table 3 and Supplementary Fig. 3). These include hjm/hel308a (M164_0269), cdvB1 (M164_1700), topR1 (M164_1732), and three DExD/H-box family helicase genes (M164_0809, M164_2103, and M164_2020), the homologs of which were previously thought to be essential in a related strain, S. islandicus Rey15A 21-24 (Supplementary Table 3 [...]
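The way the two score cutoffs combine into essential, "unassigned", and non-essential calls can be illustrated with a minimal Python sketch. The cutoffs (log2FC ≤ -5.1 and EI < 4) are those quoted in this section; the function name, variable names, and example scores are hypothetical and are not values from the study.

```python
from collections import Counter

# Cutoffs quoted in the text; all names below are hypothetical.
LOG2FC_CUTOFF = -5.1  # ESSENTIALS call: log2FC <= -5.1
EI_CUTOFF = 4         # Tn-Seq Explorer call: EI < 4


def classify_gene(log2fc: float, ei: float) -> str:
    """Combine the two program calls into one of three categories."""
    by_essentials = log2fc <= LOG2FC_CUTOFF
    by_explorer = ei < EI_CUTOFF
    if by_essentials and by_explorer:
        return "essential"      # inside the suggested range of both methods
    if by_essentials or by_explorer:
        return "unassigned"     # inside the range of only one method
    return "non-essential"      # outside both ranges


# Made-up (log2FC, EI) pairs for illustration only.
example_scores = {
    "geneA": (-7.2, 0),
    "geneB": (-2.0, 2),
    "geneC": (-6.0, 9),
    "geneD": (0.3, 15),
}
calls = {gene: classify_gene(fc, ei) for gene, (fc, ei) in example_scores.items()}
summary = Counter(calls.values())
print(calls)
print(summary)
```

Requiring agreement between both programs is the conservative choice described above: a gene flagged by only one method stays "unassigned" rather than being added to the essential list.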

The cellular function of the Sulfolobus S-layer is unknown, but it is believed to provide resistance to osmotic stress and contribute to cell morphology 28. S-layer deficient mutants have never been successfully cultivated in any archaeal species; therefore the S-layer was assumed to be essential. [...] of slaA, slaB, and slaAB via a MID (marker insertion and unmarked target gene deletion) recombination strategy 32. PCR amplification with two primer sets, which bind the flanking and internal regions of the S-layer genes, respectively (Fig. 3a), confirmed the successful deletion of slaA, slaB, and slaAB from the chromosome of the genetic host RJW004 (wild type) (Fig. 3b).

We next tested for the absence of the S-layer proteins in growing cells. Isolation of a white precipitate, described previously as the S-layer 33, was possible only in the wild type and, to a much lesser extent, in the ΔslaB mutant strain (Supplementary Fig. 4a and 4b). Transmission electron microscopy (TEM) analysis confirmed that this extracted protein precipitate from both the wild type and ΔslaB formed crystalline lattice structures (Supplementary Fig. 4c). Finally, we tested

Unlike Euryarchaeota and most extremely thermophilic bacteria, Crenarchaeota possess two copies of reverse gyrase 35,36, both believed to be essential for growth 21,37. Tn-seq analysis indicated that topR1 (M164_1732) was non-essential, which was confirmed by a successful disruption (Supplementary Fig. 1b). Interestingly, as mentioned above, topR2 (M164_1245) was called essential but we could obtain topR2 disruption mutants (Supplementary Fig. 1c [...] Fig. 7 and 8). Together these data support four primary stages in the evolution of contemporary S. islandicus cells and allow us to assign specific essential genes to these potential transitions in the evolution of the cell.

3 * Genes are put into a category if they are present in >50% of the organisms in each group, i.e., universal genes are present in >50% of each of the Bacteria, Archaea, and Eukarya groups. "Other" refers to genes that do not meet these criteria. ** NOG categories "Function unknown" or "General functional prediction only". Full list shown in Supplemental Table 6.

The highest number of essential genes are shared broadly across the tree of life (Universal in
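The >50% presence rule in the table footnote can be sketched in Python as follows. This is only an illustration of the stated criterion for the "universal" category: the function names, the data layout (one presence flag per organism in each group), and the example values are assumptions, and how the other categories were computed is not specified here.

```python
from __future__ import annotations

THRESHOLD = 0.5  # "present in >50% of the organisms in each group"
DOMAINS = ("Bacteria", "Archaea", "Eukarya")


def fraction_present(presence: list[bool]) -> float:
    """Fraction of organisms in one group that carry the gene."""
    return sum(presence) / len(presence) if presence else 0.0


def is_universal(presence_by_group: dict[str, list[bool]]) -> bool:
    """True if the gene is in >50% of organisms in each of the three domains."""
    return all(
        fraction_present(presence_by_group.get(domain, [])) > THRESHOLD
        for domain in DOMAINS
    )


# Hypothetical presence/absence calls, one flag per sampled organism.
widely_shared = {
    "Bacteria": [True, True, False],
    "Archaea": [True, True, True, False],
    "Eukarya": [True, False, True],
}
archaea_only = {
    "Bacteria": [False, False, False],
    "Archaea": [True, True, True, False],
    "Eukarya": [False, False, False],
}
print(is_universal(widely_shared), is_universal(archaea_only))
```

Note that the comparison is strict, so a gene present in exactly half of a group's organisms does not count as present in that group.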

Tn-seq data processing and analysis

Illumina FASTQ reads from all three libraries that were fewer than 50 bp in length, had a quality score below 30, and did not contain the 23-bp transposon sequence were removed. The remaining reads were stripped of transposon and adapter sequence and aligned to the S. islandicus M.16.4 genome (NC_012726) using the Burrows-Wheeler Bowtie 2 alignment tool 56. Reads that mapped to multiple locations in the genome or to ambiguous sites were set aside, as were those with an alignment length of less than 11 base pairs. Using in-house software, the resulting .sam alignment files were converted to lists that included unique insertion locations, the strand to which they aligned, and the number of reads associated with that event (Supplementary Dataset 1). Insertions that occurred in the same location but on different strands or in separate libraries were considered independent events. Tn5 transposase has been shown to prefer certain insertion sites over others 57, so each reported site was extracted and nucleotide [...] Table 1). The program uses a sliding window approach and returns an essentiality index (EI) based on the number, location, and spatial concentration of insertion sites within each individual gene. It also allows for the adjustment of the stated start and end points of the gene. As is the default, insertions in the first 5% and last 20% of genes were excluded to compensate for misannotated start codons and proteins for which C-terminal deletions are tolerated, respectively. The program suggested an EI maximum of 3 (Fig. 1b).

[...] Points indicate individual genes plotted according to the scores returned by each program. Histograms indicate the number of genes of a particular score, and the dotted lines indicate the recommended cutoffs returned by each program as the local minimum between the essential and non-essential score distributions. Essential genes meet both criteria (lower-left quadrant). The protein-coding genes that met only the ESSENTIALS or the Tn-Seq Explorer criteria were deemed "unassigned candidates", leaving the rest as likely non-essential to S. islandicus M.16.4 growth under these conditions. A complete list of the log2FC and EI values for the S. islandicus M.16.4 genes from the combined mutant libraries is provided in Supplemental Dataset 2.
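Two of the bookkeeping rules in the Methods, treating insertions at the same location but on different strands or in separate libraries as independent events, and excluding insertions in the first 5% and last 20% of each gene, can be sketched in Python as below. This is not the in-house software or Tn-Seq Explorer itself: the data structure, function names, coordinates, and the handling of minus-strand genes are all assumptions made for illustration.

```python
from typing import NamedTuple


class Insertion(NamedTuple):
    position: int  # genome coordinate of the insertion (hypothetical layout)
    strand: str    # "+" or "-"
    library: str   # mutant library the reads came from


def unique_events(insertions):
    """Collapse exact duplicates; a different strand or library stays separate."""
    return set(insertions)


def trimmed_window(start, end, strand, head_pct=5, tail_pct=20):
    """Gene interval after dropping the first 5% and last 20% of its length."""
    length = end - start + 1
    head_bp = length * head_pct // 100
    tail_bp = length * tail_pct // 100
    if strand == "+":
        return start + head_bp, end - tail_bp
    # Assumption: for a minus-strand gene the 5' end is the higher coordinate.
    return start + tail_bp, end - head_bp


def count_in_gene(events, start, end, strand):
    """Count insertion events that fall inside the trimmed gene interval."""
    lo, hi = trimmed_window(start, end, strand)
    return sum(lo <= event.position <= hi for event in events)


# Made-up insertions around a hypothetical 1,000-bp gene at 1001..2000.
raw = [
    Insertion(1020, "+", "lib1"),  # falls in the excluded first 5% of a + gene
    Insertion(1500, "+", "lib1"),
    Insertion(1500, "+", "lib1"),  # exact duplicate, collapsed
    Insertion(1500, "-", "lib1"),  # same site, other strand: independent
    Insertion(1500, "+", "lib2"),  # same site, other library: independent
    Insertion(1900, "-", "lib3"),  # falls in the excluded last 20% of a + gene
]
events = unique_events(raw)
plus_count = count_in_gene(events, 1001, 2000, "+")
minus_count = count_in_gene(events, 1001, 2000, "-")
print(len(events), plus_count, minus_count)
```

Integer arithmetic is used for the trimmed lengths so that the window boundaries do not depend on floating-point rounding.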