Targeted base editing in the plastid genome of Arabidopsis thaliana

Bacterial cytidine deaminase fused to the DNA binding domains of transcription activator-like effector nucleases was recently reported to transiently substitute a targeted C with a T in the mitochondrial DNA of cultured mammalian cells1. We applied this system to targeted base editing of the Arabidopsis thaliana plastid genome. The targeted Cs were homoplasmically substituted with Ts in some T1 plantlets, and the mutations were inherited by their offspring independently of the vectors introduced into the nucleus.

tions are expected to be the best way to make desired single nucleotide polymorphisms (SNPs) without disturbing any other genes or regulatory regions in the plastid genomes of common crops or elite lines. As targets, we selected three genes whose modification was expected to produce observable effects: 16S rRNA, whose modification was expected to confer resistance to an antibiotic; and two genes whose modification was expected to cause poor growth, namely rpoC1, which encodes part of the DNA-directed RNA polymerase subunit β′, and psbA, which encodes photosystem II (PSII) protein D1.
As in the previous study 1 , the CD domain (163 amino acids (aa)) of the Burkholderia cenocepacia DddA toxin (1,427 aa) was split at the 1,333rd or 1,397th amino acid. Each amino-terminal or carboxy-terminal half of the CD was linked to the C terminus of the DNA binding domain of platinum TALEN 6 (pTALECD; Fig. 1a). The N terminus of each pTALECD was linked to the plastid-targeting signal peptide (PTP) of the Arabidopsis thaliana RecA1 protein (51 aa) 7,8 (Fig. 1b), while the C terminus was linked to a uracil glycosylase inhibitor (UGI) 1,9 to inhibit excision of the generated uracil (Fig. 1b). The nucleotide sequences of CD and UGI were codon-optimized for A. thaliana. A pair of PTP-pTALECD-UGIs (ptpTALECDs) was expressed from a single plant transformation vector under the control of efficient RPS5A promoters 10 (Fig. 1b). We established a system to efficiently assemble the complex tandem expression vectors of ptpTALECD for each target sequence on the Ti plasmid (Extended Data Fig. 1) by replacing the FokI domain in the vectors used in a previous study 11 with CD-UGI (Extended Data Fig. 2). We introduced the vectors into the nucleus of A. thaliana by floral dipping 12 and attempted to substitute C/G with T/A in 16S rRNA (Fig. 1c), rpoC1 (Fig. 1d) and psbA (Fig. 1e). Substitution of G5 and/or G8 in 16S rRNA (highlighted in red in Fig. 1a) with A would confer spectinomycin (Spm) resistance (see below) 13,14 , while substitution of C6 in rpoC1 and C10 in psbA with T would change their initiation codons from ATG (methionine) to ATA (isoleucine). As a result, the encoded proteins would accumulate less, and the mutants would grow poorly 15,16 and/or be unable to grow photoautotrophically 17,18 . Neutral mutations at other C/G pairs in the target windows, that is, the regions between the sequences recognized by the left and right transcription activator-like effector (TALE) domains, were also expected.
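Because the deaminase acts on a C on one strand, the ATG-to-ATA change described above is a G-to-A change when read on the coding strand. The minimal Python sketch below makes this strand logic explicit; it assumes a hypothetical six-base gene start and the standard genetic code, and the helper names are illustrative, not part of the authors' workflow.

```python
# Sketch: a C-to-T edit on the template strand reads as G-to-A on the coding strand.
# Assumptions: hypothetical toy sequence, standard genetic code.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG ("*" = stop)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def c_to_t(seq: str, pos: int) -> str:
    """Return seq with the C at 0-based position pos replaced by T (deamination product)."""
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not C")
    return seq[:pos] + "T" + seq[pos + 1:]


def translate(seq: str) -> str:
    """Translate the complete codons of a coding-strand sequence (one-letter codes)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


coding = "ATGACT"                   # hypothetical first two codons of a gene
template = revcomp(coding)          # the complementary strand, 5'->3'
g_pos = 2                           # the G of ATG on the coding strand
c_pos = len(coding) - 1 - g_pos     # its partner C on the template strand
edited_coding = revcomp(c_to_t(template, c_pos))
print(coding, translate(coding), "->", edited_coding, translate(edited_coding))
```

Running it prints `ATGACT MT -> ATAACT IT`, i.e. the start codon now encodes isoleucine.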
Twelve ptpTALECD expression vectors were constructed (four pairs of CD halves for each of the three targets). Each vector was introduced into A. thaliana and, 23 days after stratification (DAS), the targeted regions of the T1 plants were analysed by Sanger sequencing. Only constructs that yielded T1 plants are shown in Fig. 1c-e and Supplementary Table 1a-c. In all three target windows, C/G pairs were replaced with T/A pairs in multiple T1 lines (Fig. 1c-h and Supplementary Table 1a-c). Surprisingly, in many lines the targeted base(s) appeared to be homoplasmically substituted (homo), whereas in other lines they appeared to be heteroplasmically or chimaerically substituted (h/c; Fig. 1c-h and Supplementary Table 1a-c). Such homoplasmic mutations might have arisen through stochastic sorting processes, such as selection of mutations present in a small number of plastid genome copies (that is, plastid sorting) 19 or gene conversion 20 . Alternatively, ptpTALECD might have mutated the C/G pairs in the target windows of all plastid genome copies at an early stage of embryogenesis, because the RPS5A promoter used for ptpTALECD expression is reported to drive strong expression in egg cells and early embryos 21,22 . Not all C/G pairs in the target windows were substituted, and the positions of the substituted pairs were biased in all three target windows (Fig. 1c-e). Three homoplasmically substituted bases were Cs in a 5′-TC-3′ context (Cp* in Fig. 1c-e), the preferred target 1 , but a C in a 5′-AC-3′ context in the 16S rRNA gene was also substituted (Fig. 1c).
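The context preference described above can be illustrated by listing the C/G pairs in a target window and flagging those whose C lies in a 5′-TC-3′ context on its own strand. This is a sketch under stated assumptions: the function name and toy sequences are hypothetical, and, as the 16S rRNA result shows, a TC context is a preference rather than a requirement.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    pos: int             # 0-based position in seq (top strand)
    top_base: str        # "C" or "G" as read on the top strand
    edited_strand: str   # strand that carries the C that would be deaminated
    tc_context: bool     # True if that C is preceded (5') by T on its own strand


def window_candidates(seq: str, start: int, end: int) -> List[Candidate]:
    """List C/G pairs in seq[start:end] and flag 5'-TC-3' contexts on either strand."""
    seq = seq.upper()
    out: List[Candidate] = []
    for i in range(start, end):
        base = seq[i]
        if base == "C":
            # C on the top strand: its 5' neighbour is the preceding top-strand base.
            prev = seq[i - 1] if i > 0 else "N"
            out.append(Candidate(i, base, "top", prev == "T"))
        elif base == "G":
            # G on the top strand pairs with a C on the bottom strand. The bottom strand
            # runs in the opposite direction, so that C's 5' neighbour pairs with the
            # next top-strand base; a bottom-strand T appears as a top-strand A.
            nxt = seq[i + 1] if i + 1 < len(seq) else "N"
            out.append(Candidate(i, base, "bottom", nxt == "A"))
    return out


# Hypothetical toy sequence; the window is positions 2..5 ("TCGA").
for cand in window_candidates("AATCGAT", 2, 6):
    print(cand)
```

Both candidates in this toy window are in a TC context: the C at position 3 on the top strand and, via the G at position 4, a C on the bottom strand.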
To investigate the stability of mutation rates during plant development, total DNA extracted from an emerging leaf of T1 plants at 11

NATurE PlANTS
formation 19 ; that is, it had differently coloured sectors (wild-type-like green and pale), and the mutation rate at the Cp* in 16S rRNA differed between the sectors (Extended Data Fig. 3a,b). Remarkably, most of the bases that were homoplasmically substituted at 11 DAS were still homoplasmically substituted at 23 DAS (86.4%, 19/22; Fig. 1i), suggesting that the targeted bases of T1 plants transformed with the ptpTALECD expression vectors were homoplasmically substituted at high frequency and that the homoplasmic mutations remained stably fixed during development.
Next, we investigated the off-target effects of ptpTALECD on the plastid and mitochondrial genomes. Because both organelle genomes are maternally inherited, off-target mutations in them cannot be segregated from the desired mutation by conventional crossing. The total genomes of 17 T1 plants were sequenced (NovaSeq, Illumina) and Fig. 2a 1397CN 8, 9, 13, 16), while it was either heteroplasmically or chimaerically substituted in the remaining one (rpoC1 1397CN 3). Each redundant mutation in the inverted repeats of the plastid genome was counted as one mutation. The targeted bases in these 16 lines were confirmed to be homoplasmically or dominantly substituted, and the base in the remaining line was confirmed to be heteroplasmically or chimaerically substituted (Supplementary Table 2). Line 16S rRNA 1397CN 1 did not develop any true leaves (Extended Data Fig. 4) and died before 23 DAS. A total of 116 off-target mutations (allele frequencies ≥1%) were detected in the plastid genomes (Fig. 2b-f and Supplementary Table 2). Most (69.0%) were located within 2,000 base pairs (bp) of the target windows, whereas only a few (11.2%) were located within 20 bp of sequences similar to those recognized by the TALEs. The rest were found in other regions. No dominant off-target mutations were detected in the mitochondrial genomes of these 17 lines, including 16S rRNA 1397CN 1 (Supplementary Table 2). These results indicate that ptpTALECD only infrequently introduced off-target point mutations into organelle genomes and can specifically and homoplasmically (or dominantly) substitute C/G with T/A in the target windows.
All but one of the T1 plants that were transformed with a 16S rRNA-targeting ptpTALECD vector and in which the first Cp* (G5) and/or C10 was homoplasmically substituted were fertile (Supplementary Table 1a); the exception was 16S rRNA 1397CN 1. To investigate whether the mutations were stably inherited by the offspring, the T2 progeny of three of these lines (16S rRNA 1397CN 2, 1397CN 8 and 1397NC 3) were genotyped (Fig. 3a and Extended Data Fig. 5a). Transgenic T2 plants were identified by seed-specific green fluorescent protein (GFP) fluorescence from Ole1pro::Ole1-GFP (ref. 23 ) on the transfer DNA (T-DNA; Fig. 1b) and/or by a positive polymerase chain reaction (PCR) result showing the presence of the ptpTALECD reading frame (Figs. 1b and 3a). The progeny stably inherited the homoplasmic mutations (Fig. 3a and Extended Data Fig. 5a). Interestingly, some T2 plants had white, red or variegated cotyledons (Fig. 3b and Extended Data Fig. 5b), unlike their parents (Extended Data Fig. 4). All of these plants were GFP positive (Fig. 3a and Extended Data Fig. 5a), and most of them (8/9) had additional mutation(s) in or near the target window of 16S rRNA (Extended Data Fig. 5a). Because, as mentioned above, the RPS5A promoter used for ptpTALECD expression is reported to drive strong expression in egg cells and early embryos 21,22 , de novo mutagenesis may have occurred at an early developmental stage in these transgenic T2 plants with abnormal cotyledons. In contrast, all T-DNA-free null-segregant T2 plants examined carried the targeted mutations without any of these additional phenotypes. No major off-target mutations were detected in the three null-segregant T2 plants (16S rRNA 1397CN 8 lines 1, 2 and 8) whose genomes were sequenced by next-generation sequencing (Fig. 2a and Supplementary Table 2).
Homoplasmic mutations in rpoC1 (G3) and psbA (C10) in other lines were also inherited by their T2 progeny (Extended Data Figs. 6a and 7a). These data indicate that plastid genomes carrying artificially introduced point mutations were stably inherited independently of the nuclear T-DNA, and they suggest that transgene-free plants with targeted point mutations in their plastid genomes were successfully established.
The antibiotic Spm binds to a specific site in Escherichia coli 16S rRNA and inhibits translation 24 . Substitution of a specific G near this site with A confers Spm resistance (Spmr) 13 . The targeted G5 in the Arabidopsis plastid 16S rRNA gene is homologous to this G. Several mutations are known to confer Spmr on flowering plants 25,26 , but none of them occurs at the position of the targeted G5. T2 seeds obtained from a T1 plant in which G5 was homoplasmically substituted with A (16S rRNA 1397CN 2; Supplementary Table 1a) were sown on plates containing Spm. Many of the seedlings that germinated from these seeds were Spmr, regardless of the presence of seed GFP fluorescence (  1397CN 15) that had the G5 mutation at very low frequency at 11 DAS and no mutation at 23 DAS (Supplementary Table 1a) were also Spmr. These progeny germinated from GFP-positive seeds (Fig. 3c). In five of them, G5 was homoplasmically substituted with A, and in 13 others it was dominantly substituted with A (Extended Data Fig. 9). This suggests that the inherited nuclear T-DNA caused a major de novo mutation at G5. Together, these results suggest that homoplasmic substitution of G5 with A confers Spmr on A. thaliana. Furthermore, because the Spmr or Spms phenotypes of the GFP-negative T2 progeny were predictable from the SNPs at G5 in their T1 parents, the null-segregant T2 plants were likely to have inherited the mutation(s) carried by their parent and unlikely to carry additional mutations.
Previous studies showed that the accumulation of D1 protein (encoded by psbA) and/or the maximum quantum yield of PSII (Fv/Fm) drastically decreased in mutants deficient in psbA expression 17,18 . These mutants were also pale 17 and could not grow photoautotrophically 17,18 . Surprisingly, psbA 1397NC 1, which had the homoplasmic mutation at the psbA initiation codon (C10) at both 11 and 23 DAS (Supplementary Table 1c), could grow photoautotrophically and set viable seeds. To investigate the effect of this homoplasmic mutation at the psbA initiation codon (C10) on psbA expression, we measured Fv/Fm and D1 accumulation in T-DNA-free null-segregant T2 progeny of psbA 1397NC 1 that were confirmed to have inherited the homoplasmic mutation. Unexpectedly, their growth (Extended Data Fig. 6b,c) and D1 accumulation (Extended Data Fig. 6d-f) were comparable to those of wild-type plants, and Fv/Fm was only slightly lower than in wild-type plants (Extended Data Fig. 6g). One possibility is that another codon served as the initiation codon. It could be another AUG, or possibly a GUG or UUG, which can also serve as start codons in the chloroplast 27 . Upstream of the altered AUA, no such codons occur after the nearest in-frame stop codon. Downstream, the next potential start codons would shorten the protein by at least 10%, but they can be excluded because the D1 protein in the mutant was the same size as in the wild type (Extended Data Fig. 6d). These results suggest that replacing the AUG with AUA does not greatly impair translation initiation of psbA or the D1 level, but that the AUG codon is necessary for full PSII activity. Thus, a better way to knock out a plastid gene might be to create a premature stop codon in its reading frame rather than to change the initiation codon to AUA.
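A C•G-to-T•A editor can create a stop codon from only a few sense codons. The sketch below (hypothetical helper name and toy sequence; not a procedure from this study) scans a coding sequence for codons that a single C-to-T change on the coding strand (CAA, CAG, CGA) or on the template strand, which reads as G-to-A on the coding strand (TGG), would turn into TAA, TAG or TGA. Whether such a codon can actually be edited still depends on placing it inside a TALE-flanked target window and on bystander C/G pairs in that window.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Coding-strand view of a C:G-to-T:A edit: C->T if the deaminated C is on the
# coding strand, G->A if it is on the template strand.
EDITS = {"C": "T", "G": "A"}


def stop_creating_edits(cds: str) -> Iterator[Tuple[int, str, int, str]]:
    """Yield (codon_index, codon, offset_in_codon, new_codon) for every single
    C->T or G->A change (coding strand) that turns a sense codon into a stop codon.
    The start codon (index 0) and existing stop codons are skipped."""
    cds = cds.upper()
    for idx in range(1, len(cds) // 3):
        codon = cds[3 * idx:3 * idx + 3]
        if codon in STOP_CODONS:
            continue
        for off, base in enumerate(codon):
            new_base = EDITS.get(base)
            if new_base is None:
                continue
            new_codon = codon[:off] + new_base + codon[off + 1:]
            if new_codon in STOP_CODONS:
                yield idx, codon, off, new_codon


for hit in stop_creating_edits("ATGCAATGGCGATAA"):  # hypothetical toy ORF
    print(hit)
```

For this toy ORF the scan reports (1, 'CAA', 0, 'TAA'), (2, 'TGG', 1, 'TAG'), (2, 'TGG', 2, 'TGA') and (3, 'CGA', 0, 'TGA'); in practice one would favour the earliest such codon that fits a target window.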
In rpoC1, contrary to expectation, none of the homoplasmic mutations obtained was at the initiation codon. Instead, they were at the second codon, where they caused a synonymous mutation (Ile to Ile; Fig. 1d and Supplementary  progeny of rpoC1 1397CN 8, which had the synonymous homoplasmic mutation at both 11 and 23 DAS, inherited the homoplasmic mutation and appeared to grow as well as wild-type plants (Extended Data Fig. 7b,c). These experiments showed that ptpTALECD can specifically introduce homoplasmic C-to-T mutations into target windows in the A. thaliana plastid genome and that the mutations are stably (and probably maternally) inherited by the progeny seeds. Previous attempts to introduce homoplasmic mutations into mammalian mitochondrial genomes were unsuccessful 1,28 . The method was also successful in the inverted repeats, where mutations are thought to arise at a lower rate because of their greater potential for copy correction 29 ; 16S rRNA lies in the inverted repeats, and targeted point mutations were successfully introduced into both copies. Compared with traditional methods for plastid transformation, such as biolistics, ptpTALECD technology has three advantages. First, it allows plastid-genome editing of A. thaliana without specific mutants 3,4 , a specific ecotype 30 or tissue culture, which is a major obstacle to plastid transformation. Second, it could probably be used to edit the plastid genomes of other plant species that are recalcitrant to plastid transformation but amenable to nuclear transformation. Third, it could be used to create plastid-genome-edited plants that carry no foreign genes in their genomes; such plants are not regarded as GMOs in several countries. On the other hand, the ptpTALECD method has some limitations in accuracy. For example, unwanted substitutions occurred in the target windows (C10 in 16S rRNA and G3 in rpoC1; Fig. 1c,d), while homoplasmic mutations were not introduced at some intended target C/G pairs (G8 in 16S rRNA and C6 in rpoC1; Fig. 1c,d). These problems might be avoided by shifting the TALE recognition sites a few base pairs upstream or downstream, by using target windows of different sizes or by optimizing the linker sequences between the TALE and CD 31 . In any case, only a few mutations in this study were off target. We also obtained null-segregant T2 plants that had the targeted homoplasmic mutation but no off-target mutations (Fig. 2a and Supplementary Table 2).
This technology may also be useful for improving agronomic traits. For example, amino acid polymorphisms in the plastid-encoded ribulose 1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit are expected to affect carbon assimilation (and oxygenation) rates 32,33 , and some polymorphisms in psbA (not involving the C10 targeted in this study) enhance herbicide resistance 34 . In addition, null-segregant plants are not regarded as GMOs in some countries, and the introduced mutations would not be transmitted through pollen 2,5 . Therefore, plants whose plastid genomes have been precisely edited by ptpTALECD might be more acceptable to the public. This technology could also be used to create premature stop codons, substitute amino acids and modify RNA editing sites. Thus, ptpTALECD technology has the potential to accelerate both plant breeding and basic research on plastid-encoded genes.
Designing the TALE binding sequence. The TALE target sequences were designed on both sides of the CD targeting window with Old TALEN Targeter (https://tale-nt.cac.cornell.edu/node/add/talen-old). Wherever possible, the first base recognized by each TALE was chosen to lie immediately 3′ of a T. The minimum length of each TALE target sequence was 15 bp to ensure specific binding. All TALE binding sequences and the target windows between them are shown in Supplementary Table 3 and Fig. 1c-e.

Vector constructions. A pair of left and right ptpTALECDs in Ti plasmids (Extended Data Fig. 1b) for each target was constructed using the Platinum Gate assembly kit and multisite Gateway (Thermo Fisher) as described in our previous study of mitochondria-targeted TALENs 11 .
The DNA binding domains of ptpTALECD were assembled with the Platinum Gate TALEN system 6 following the same previous study 11 (Extended Data Fig. 1a). Each FokI coding sequence in the previous mitoTALEN vectors used for assembly step 2 was first replaced with the CD-half and UGI coding sequences using the In-Fusion HD Cloning Kit (TaKaRa; Extended Data Fig. 2). The CD-half and UGI coding sequences were designed to encode the same amino acids as those used by Mok et al. 1 and were synthesized by Eurofins Genomics with codon usage optimized for A. thaliana (https://www.eurofinsgenomics.jp/jp/orderpages/gsy/gene-synthesis-multiple/; Supplementary Table 4). The reading frames in the assembled first and third entry vectors and the second entry vector (below) were transferred into the Ti plasmid 10 by a multi-LR reaction with LR Clonase II Plus enzyme (Thermo Fisher Scientific; Extended Data Fig. 1b). The second entry vector contained an Arabidopsis heat-shock protein terminator 35 , an Arabidopsis RPS5A promoter and the N-terminal PTP (51 aa) of Arabidopsis RECA1 (refs. 7,8 ; Extended Data Fig. 10a). This Ti plasmid was derived from the Gateway destination Ti plasmid pK7WG2 (ref. 36 ) by replacing the CaMV 35S promoter with the Arabidopsis RPS5A promoter and inserting the PTP coding sequence and Ole1pro::Ole1-GFP from pFAST02 (ref. 23 ; Extended Data Fig. 10b; http://www.inplanta.jp/pfast.html). All primers used for vector construction are listed in Supplementary Table 5. All plasmids have been deposited in Addgene (IDs 171723-171736), where their sequences are also available.
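The TALE site rules given under "Designing the TALE binding sequence" (a T immediately 5′ of each binding site where possible, binding sites of at least 15 bp, and the target window between the two sites) can be sketched as a brute-force enumeration. This is not the Old TALEN Targeter algorithm: the function name is hypothetical, the 14-18 bp window range and the 20 bp maximum TALE length are assumed defaults, and the T requirement is enforced strictly rather than "as far as possible".

```python
from typing import Iterator, NamedTuple


class TalePair(NamedTuple):
    left_start: int   # 0-based start of the left TALE site (top strand)
    left_len: int
    window_len: int   # target window between the two sites
    right_len: int    # right TALE site starts immediately after the window


def tale_pairs(seq: str, target: int,
               tale_lens=range(15, 21),    # >= 15 bp as in the text; 20 bp cap assumed
               window_lens=range(14, 19),  # assumed window lengths
               ) -> Iterator[TalePair]:
    """Enumerate left/right TALE site pairs whose intervening window contains target.

    Left TALE: binds the top strand and needs a T immediately 5' of its site.
    Right TALE: binds the bottom strand; its 5' T appears on the top strand as an A
    immediately 3' of the right site."""
    seq = seq.upper()
    n = len(seq)
    for w in window_lens:
        for win_start in range(target - w + 1, target + 1):
            win_end = win_start + w              # exclusive end of the window
            for ll in tale_lens:
                left_start = win_start - ll
                if left_start < 1 or seq[left_start - 1] != "T":
                    continue
                for rl in tale_lens:
                    right_end = win_end + rl     # exclusive end of the right site
                    if right_end >= n or seq[right_end] != "A":
                        continue
                    yield TalePair(left_start, ll, w, rl)
```

In use, `list(tale_pairs(region, target_index))` returns every candidate design covering the base to be edited; a real design tool would additionally rank candidates, for example by off-target similarity elsewhere in the genome.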

Plant transformation and screening transformants.
Col-0 plants were transformed by floral dipping 12 with Agrobacterium tumefaciens strain C58C1 harbouring one of the transformation vectors described above. Transgenic T1 seeds were first selected by seed GFP fluorescence. GFP-positive seeds were sown on 1/2 MS medium (section Plant material and growth conditions) supplemented with 125 mg l−1 claforan. In addition, GFP-negative seeds were sown on 1/2 MS medium containing 50 mg l−1 kanamycin and 125 mg l−1 claforan.

Sanger sequencing and next generation sequencing and their analyses. Total DNA was extracted from an emerging true leaf or a cotyledon of the selected seedlings with the Maxwell RSC Plant DNA Kit (Promega). To genotype transgenic lines, plastid DNA sequences around the CD targeting windows were amplified with the primer sets listed in Supplementary Table 6. Purified PCR products were Sanger sequenced (Eurofins Genomics) to detect substitution of the targeted bases. The data were analysed with Geneious Prime (v.2020.2.2).
We called SNPs in the plastid and mitochondrial genomes from total-DNA sequencing data. Paired-end libraries were prepared by Macrogen Japan using a Nextera XT DNA Library Prep Kit (Illumina) and sequenced on an Illumina NovaSeq 6000 platform. For preprocessing, low-quality and adaptor sequences were trimmed from the reads using Platanus_trim v.1.0.7 (http://platanus.bio.titech.ac.jp/pltanus_trim). The paired-end reads of each strain were mapped to the reference sequences (AP000423.1 and BK010421.1) using BWA (v.0.7.12) 37 in single-end mode. Inadequately mapped reads, with mapping identity ≤97% or alignment cover rate ≤80%, were filtered out. SNPs were then called using the samtools mpileup command (-uf -d 30000 -L 2000) and the bcftools call command (-m -A -P 0.1) 38 . Finally, we listed positions at which variants with allele frequencies (AFs) ≥0.1 were detected in at least one strain, including the WT (Fig. 2a). SNPs with AFs ≥0.01 were also called for positions with read depths ≥500 (Supplementary Table 2).
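The read filter and the two AF thresholds above can be expressed as small functions. This sketch assumes definitions that the text does not give: identity is 1 − NM/alignment columns, the cover rate is the fraction of read bases inside the alignment, and AF is the fraction of reads carrying the most common non-reference base. In the actual pipeline these values come from BWA alignments and samtools/bcftools output; the function names here are hypothetical.

```python
import re
from typing import Dict, NamedTuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def passes_read_filter(cigar: str, nm: int, read_len: int,
                       min_identity: float = 0.97, min_cover: float = 0.80) -> bool:
    """Drop reads with identity <= 97% or alignment cover rate <= 80%.

    Assumed definitions: identity = 1 - NM / alignment columns (M, =, X, I, D);
    cover rate = query bases in the alignment (M, =, X, I) / read length."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    columns = sum(n for n, op in ops if op in "M=XID")
    aligned = sum(n for n, op in ops if op in "M=XI")
    if columns == 0 or read_len == 0:
        return False
    identity = 1 - nm / columns
    cover = aligned / read_len
    return identity > min_identity and cover > min_cover


class SiteCall(NamedTuple):
    depth: int
    af: float            # frequency of the most common non-reference base
    listed_main: bool    # AF >= 0.1 (positions listed in Fig. 2a)
    listed_low_af: bool  # AF >= 0.01 at depth >= 500 (Supplementary Table 2)


def call_site(counts: Dict[str, int], ref: str) -> SiteCall:
    """Apply the two AF thresholds to per-base read counts at one position."""
    depth = sum(counts.values())
    alt = max((n for base, n in counts.items() if base != ref), default=0)
    af = alt / depth if depth else 0.0
    return SiteCall(depth, af, af >= 0.1, af >= 0.01 and depth >= 500)


print(passes_read_filter("10S90M", nm=1, read_len=100))  # identity ~0.989, cover 0.9
print(call_site({"C": 590, "T": 10}, ref="C"))           # low-AF call at depth 600
```

Keeping the two thresholds as separate flags mirrors the text, where the same position can appear in Fig. 2a, in Supplementary Table 2, or in both.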
To evaluate whether proximity to the target sites or similarity to TALE sequences influenced the locations of off-target mutations, we tallied off-target mutations that were within 2,000 bp of the target site or within 20 bp of sequences ≥70% similar to those recognized by one of the TALEs.
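The tally can be sketched as follows, assuming ungapped similarity scanned on both strands, a circular plastid genome, and mutation lists already collapsed so that each inverted-repeat pair counts once. The text does not say whether the exact on-target TALE sites were excluded from the "similar sequences"; here they are included. All names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def circ_dist(a: int, b: int, n: int) -> int:
    """Shortest distance between positions a and b on a circular genome of length n."""
    d = abs(a - b) % n
    return min(d, n - d)


def dist_to_interval(pos: int, start: int, end: int, n: int) -> int:
    """Distance from pos to the closed interval [start, end]; 0 if pos lies inside.
    Assumes the interval itself does not span the origin."""
    if start <= pos <= end:
        return 0
    return min(circ_dist(pos, start, n), circ_dist(pos, end, n))


def similar_sites(genome: str, tale: str, min_identity: float = 0.70) -> List[int]:
    """0-based starts where tale or its reverse complement matches the genome with
    at least min_identity identical bases (ungapped; origin-spanning sites ignored)."""
    k = len(tale)
    probes = (tale, revcomp(tale))
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if any(sum(a == b for a, b in zip(window, p)) / k >= min_identity for p in probes):
            hits.append(i)
    return hits


def tally(mutations: Iterable[int], genome: str,
          target_windows: Sequence[Tuple[int, int]], tales: Sequence[str],
          near_target: int = 2000, near_site: int = 20) -> Tuple[int, int]:
    """Count mutations within near_target bp of a target window and, separately,
    within near_site bp of a TALE-like site."""
    n = len(genome)
    sites = [(s, s + len(t) - 1) for t in tales for s in similar_sites(genome, t)]
    n_target = n_site = 0
    for m in mutations:
        if any(dist_to_interval(m, a, b, n) <= near_target for a, b in target_windows):
            n_target += 1
        if any(dist_to_interval(m, a, b, n) <= near_site for a, b in sites):
            n_site += 1
    return n_target, n_site
```

The two counts are kept separate because a mutation can satisfy both criteria; dividing each by the total number of off-target mutations gives percentages of the kind reported in the main text.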
Genotyping T2 plants. T2 seeds obtained from several T1 lines were sown on 1/2 MS medium (section Plant material and growth conditions). The target windows were genotyped from a cotyledon of seedlings at 7 DAS (for Fig. 3a and Extended Data Figs. 5a, 6a and 7a) or 13 DAS (for Extended Data Fig. 9) in the same way as for T1 plants (above). The ptpTALECD PCRs were performed with the primers listed in Supplementary Table 6.
Screening Spm-resistant plants. T2 seeds obtained from a T1 line in which G5 in 16S rRNA was homoplasmically substituted at both 11 and 23 DAS, together with control seeds, were sown on 1/2 MS medium (section Plant material and growth conditions) containing 0, 10, 50 or 100 mg l−1 Spm (without Plant Preservative Mixture for Extended Data Fig. 8a-d). Seedling phenotypes were observed at 8 DAS.
Measurement of chlorophyll fluorescence. Chlorophyll fluorescence was measured using a MINI-pulse-amplitude modulation portable chlorophyll Maximum quantum yield of PSII was calculated as Fv/Fm. These procedures were performed independently three times (experimental replicates = 3). In each replicate, four plants of each genotype (Col-0 and psbA 1397NC 1 T2) were analysed; means and standard errors were calculated, and the Fv/Fm values of the two groups were compared by a two-tailed Welch's t-test.
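Fv/Fm and the Welch comparison can be written out explicitly. This sketch uses the standard definition Fv = Fm − F0 (F0 and Fm being the minimal and maximal fluorescence of a dark-adapted leaf) and the Welch-Satterthwaite degrees of freedom; the input readings are made-up values for illustration, not data from this study. For the two-tailed P value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def fv_fm(f0: float, fm: float) -> float:
    """Maximum quantum yield of PSII, Fv/Fm = (Fm - F0) / Fm."""
    return (fm - f0) / fm


def sem(xs: Sequence[float]) -> float:
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    return stdev(xs) / math.sqrt(len(xs))


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up (F0, Fm) readings for four plants per genotype in one replicate.
col0 = [fv_fm(f0, fm) for f0, fm in [(0.30, 1.80), (0.32, 1.85), (0.29, 1.78), (0.31, 1.83)]]
mutant = [fv_fm(f0, fm) for f0, fm in [(0.34, 1.75), (0.35, 1.77), (0.33, 1.72), (0.36, 1.79)]]
print(f"Col-0 {mean(col0):.3f} ± {sem(col0):.3f}, mutant {mean(mutant):.3f} ± {sem(mutant):.3f}")
print("t = %.2f, df = %.1f" % welch_t(col0, mutant))
```

Welch's version is used rather than Student's t-test because it does not assume equal variances in the two genotypes, which matters with only four plants per group.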
SDS-polyacrylamide gel electrophoresis and immunoblot analyses. Leaf extract was prepared by grinding rosette leaves with a mortar and pestle in an ice-cold buffer (20 mM Tricine (pH 8.4) containing 330 mM sorbitol, 10 mM NaHCO3, 5 mM EGTA and 5 mM EDTA). After filtration through two layers of Miracloth, intact chloroplasts were collected by centrifugation for 5 min at 4,800g. The purified chloroplasts were ruptured in a buffer containing 20 mM HEPES-KOH (pH 7.6), 5 mM MgCl2, 2.5 mM EDTA and complete ULTRA protease-inhibitor cocktail (Roche). The insoluble fraction containing thylakoids and envelopes was separated from the soluble fraction by centrifugation for 2 min at 15,000g and resuspended in the same buffer. The chlorophyll concentration was determined as described previously 39 . Chloroplast thylakoid and membrane proteins were solubilized in SDS-PAGE sample buffer. Proteins solubilized from the thylakoid membrane corresponding to 1-2 μg of chlorophyll were separated by 12.5% (w/v) SDS-PAGE and electrotransferred onto polyvinylidene fluoride membranes. The membranes were probed with the antibodies, and the protein-antibody complexes were labelled using the ECL Prime western-blotting detection system (GE Healthcare). Chemiluminescence was detected with a lumino-image analyser (LAS4000, GE Healthcare). Anti-PsbA and anti-AtpB were purchased from Agrisera. Anti-PetA and anti-PsbO were kindly provided by A. Makino (Tohoku University, Japan) and T. Endo (Kyoto University, Japan), respectively.
Table 4) and entry vectors for step2 assembly containing the FokI coding sequence were amplified with primers corresponding to the template (Supplementary Table 5). The purified PCR products were mixed with 5× In-Fusion HD Cloning Enzyme Premix (TaKaRa) and incubated at 50 °C for 15 min.


Data
All data supporting the findings of this study are available in the article and the supplementary information. Vector sequences will be deposited in GenBank and addgene. Accession numbers of the reference sequences for organelle genome used in this study are AP000423.1 (plastid) and BK010421.1 (mitochondrion).

nature research | reporting summary

Life sciences study design

Sample size
At least seven T1 plants were analyzed for each targeted gene. At least eight T2 progeny of three T1 lines were genotyped. To investigate spectinomycin resistance, at least 30 seeds per line were sown on one plate. The number of T1 plants was determined by the maximum number that could be grown in our growth cabinets. The number of T2 seeds used was determined by the maximum number of T2 seeds available at the time of the experiment.
Data exclusions
No data were excluded.

Replication
For each targeted gene, at least seven T1 lines were genotyped and targeted bases were successfully substituted in at least three lines. Total DNA NGS was successfully performed on 17 T1 lines, 3 T2 plants and 3 wild-type plants. At least eight T2 progeny of the seven T1 lines were genotyped and all data are shown. Spectinomycin resistance was assayed four times, with the same results. Chlorophyll fluorescence measurements and immunoblot experiments were performed in triplicate and duplicate, respectively.
Randomization
All T1 plants were subjected to Sanger sequencing. The samples subjected to total DNA NGS were chosen randomly from among the plants that, by Sanger sequencing, appeared to have a homoplasmic mutation at the targeted base. T2 seeds used for genotyping were selected randomly from among those whose T1 parents had a homoplasmic mutation in the target window. T2 seeds used for the spectinomycin resistance assay were selected randomly from among those whose T1 parents had a homoplasmic mutation at G5 in 16S rRNA. T2 progeny of psbA 1397NC 1 used for chlorophyll fluorescence measurements and immunoblot experiments were selected randomly from among the T-DNA null segregants.

Blinding
Blinding was not required because all analyses including genotyping by sequencing and phenotyping by antibiotics and molecular methods could be carried out without making any subjective judgements.