Designing efficient genetic code expansion in Bacillus subtilis to gain biological insights

Bacillus subtilis is a model Gram-positive bacterium that is commonly used both to explore questions in bacterial cell biology and for industrial applications. To enable greater understanding and control of proteins in B. subtilis, here we report broad and efficient genetic code expansion in B. subtilis by incorporating 20 distinct non-standard amino acids into proteins using three different families of genetic code expansion systems and two different codons. We use these systems to achieve click-labelling, photo-crosslinking, and translational titration. These tools allow us to demonstrate differences between E. coli and B. subtilis stop codon suppression, validate a predicted protein-protein binding interface, and begin to interrogate properties underlying bacterial cytokinesis by precisely modulating cell division dynamics in vivo. We expect that the establishment of this simple and easily accessible chemical biology system in B. subtilis will help uncover many new biological insights and aid genetic code expansion in other organisms.

-The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
-A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
-The statistical test(s) used AND whether they are one- or two-sided (only common tests should be described solely by name; describe more complex techniques in the Methods section)
-A description of all covariates tested
-A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
-A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
-For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted

Software and code
Policy information about availability of computer code

Data collection
Gen5 (basic version, BioTek) was used to collect all plate reader data. XCalibur Software 4.3 (Thermo Fisher Scientific) was used to collect all mass spectrometry data. NIS-Elements software version 5.02.01 was used to acquire all microscopy images except the sporulation images, which were acquired with Zen 2.0 software (Zeiss). ChemiDoc analysis was performed with Image Lab Touch 2.2.

Data analysis
Sequest (Proteome Discoverer, Thermo Fisher Scientific) was used to analyze mass spectrometry data. Microsoft Excel (365) was used to analyze plate reader data, and GraphPad Prism 9 was used to generate plots, calculate standard deviations and perform best-fit titration analysis. Division microscopy data were analyzed with MATLAB version 2018b with the Morphometrics and DeepCell packages, with FIJI version 1.53 including the TrackMate plugin, and with custom code available at https://bitbucket.org/garnerlab/squyres-2020/src/master/. Gel and microscopy data were analyzed and prepared for presentation with ImageJ (FIJI, version 1.53).
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
-Accession codes, unique identifiers, or web links for publicly available datasets
-A description of any restrictions on data availability
-For clinical datasets or third party data, please ensure that the statement adheres to our policy

All DNA sequence data used in this manuscript are provided in the Supplementary Information files, and all raw numerical data and gel images obtained from measurements in this study are provided as Supplementary files associated with this manuscript. Source data are provided with this paper.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
We followed the precedent in the field of N=3 for plate reader, time-course and nsAA-concentration liquid chromatography/mass spectrometry experiments, which generally showed low variation among sample replicates within an experiment. No calculation was used to determine sample size; however, these experiments were repeated and gave very similar results. The other experiments, including the remaining mass spectrometry experiments, microscopy experiments and crosslinking, used a sample size of N=1. We did not use a calculation to determine sample size, and we did not always repeat experiments. However, the analytical techniques, analysis methods and controls that we used give us high confidence in the results obtained.

Data exclusions
Many more experiments were performed than are presented. Some of these were replicate experiments with similar results, and many were negative experiments that tested different conditions or nonstandard amino acids for various purposes. For instance, click-labelling of incorporated pAzF on cell surfaces does not overcome background incorporation, the standard MaPylRS cannot efficiently incorporate the photocrosslinker AbK, external Apidaecin peptides do not improve nsAA incorporation in B. subtilis, and knocking out PrmC improves nsAA incorporation efficiency but is too toxic to use for most applications. These experiments are excluded for the sake of concision and because generating publication-quality data for these negative results would be unacceptably costly in time and materials.

Replication
Plate reader experiments were typically conducted multiple times, often with slightly different experimental or procedural elements in an attempt to optimize data collection. In general, the findings reported in this paper were observed in all experiments, with exceptions where cells did not grow or contamination was believed to have occurred. Mass spectrometry, gel blot and microscopy data were sometimes replicated in this fashion, as noted in the Statistics and Reproducibility section of the manuscript.

Randomization
Samples assigned to different experimental groups were generated from the same source. When one strain was exposed to +nsAA and -nsAA conditions, a single culture of that strain was split immediately before nsAA was added.

Blinding
All experiments were unblinded, and strains were referred to by their strain numbers throughout experimental procedures, data collection and analysis. This is because fully blinding rapidly iterated experiments is difficult, and because it is unlikely that researcher bias in the handling of bacterial strains would produce the replicable effect sizes observed in this manuscript.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Goat anti-rabbit antibody was obtained from Abcam (catalog # ab6721).