Letter

A genome-wide algal mutant library and functional screen identifies genes required for eukaryotic photosynthesis



Photosynthetic organisms provide food and energy for nearly all life on Earth, yet half of their protein-coding genes remain uncharacterized [1,2]. Characterization of these genes could be greatly accelerated by new genetic resources for unicellular organisms. Here we generated a genome-wide, indexed library of mapped insertion mutants for the unicellular alga Chlamydomonas reinhardtii. The 62,389 mutants in the library, covering 83% of nuclear protein-coding genes, are available to the community. Each mutant contains unique DNA barcodes, allowing the collection to be screened as a pool. We performed a genome-wide survey of genes required for photosynthesis, which identified 303 candidate genes. Characterization of one of these genes, the conserved predicted phosphatase-encoding gene CPL3, showed that it is important for accumulation of multiple photosynthetic protein complexes. Notably, 21 of the 43 higher-confidence genes are novel, opening new opportunities for advances in our understanding of this biogeochemically fundamental process. This library will accelerate the characterization of thousands of genes in algae, plants, and animals.
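Because every mutant carries unique barcodes, a pooled screen comes down to counting barcodes in each growth condition. The supplementary information describes normalizing barcode read counts to 100 million total reads and comparing each barcode's abundance after photosynthetic growth (TP medium, light) with its abundance after heterotrophic growth (TAP medium, dark). The Python sketch below illustrates only that normalization and ratio under simple assumptions; the function names and toy counts are hypothetical, and the published analysis code (deposited on GitHub) is not reproduced here.

```python
from typing import Dict

# Normalization target taken from Supplementary Fig. 5a (100 million total reads).
SCALE = 100_000_000


def normalize(counts: Dict[str, int], scale: int = SCALE) -> Dict[str, float]:
    """Rescale raw barcode read counts so that one sample sums to `scale` reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {barcode: n * scale / total for barcode, n in counts.items()}


def light_dark_ratios(tp_light: Dict[str, int],
                      tap_dark: Dict[str, int]) -> Dict[str, float]:
    """Per-barcode ratio of normalized abundance after growth in TP medium in
    the light to normalized abundance after growth in TAP medium in the dark.

    Barcodes with zero abundance in the dark sample are skipped (no reference
    abundance); barcodes absent from the light sample get a ratio of 0.
    """
    light = normalize(tp_light)
    dark = normalize(tap_dark)
    return {
        barcode: light.get(barcode, 0.0) / d
        for barcode, d in dark.items()
        if d > 0
    }


# Toy counts for three hypothetical barcodes.
tp_light = {"bc_A": 150, "bc_B": 150, "bc_C": 0}
tap_dark = {"bc_A": 100, "bc_B": 100, "bc_C": 100}
ratios = light_dark_ratios(tp_light, tap_dark)
for barcode, ratio in sorted(ratios.items()):
    print(f"{barcode}\t{ratio:.2f}")
```

A barcode whose ratio falls well below 1 (bc_C here) marks a mutant that was depleted during photosynthetic growth, which is the behavior expected when the disrupted gene is required for photosynthesis.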


Code availability

All programs written for this work have been deposited at GitHub (see URLs).

Data availability

Insertion details and distribution information for mutants are available through the CLiP website at https://www.chlamylibrary.org/. The mass spectrometry proteomics data on the cpl3 mutant have been deposited to the ProteomeXchange Consortium via the PRIDE [71] partner repository with dataset identifier PXD012560. Other data that support the findings of this study are available from the corresponding author upon reasonable request.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1.

    Karpowicz, S. J., Prochnik, S. E., Grossman, A. R. & Merchant, S. S. The GreenCut2 resource, a phylogenomically derived inventory of proteins specific to the plant lineage. J. Biol. Chem. 286, 21427–21439 (2011).

  2.

    Krishnakumar, V. et al. Araport: the Arabidopsis information portal. Nucleic Acids Res. 43, D1003–D1009 (2015).

  3.

    Levine, R. P. Genetic control of photosynthesis in Chlamydomonas reinhardi. Proc. Natl Acad. Sci. USA 46, 972–978 (1960).

  4.

    Gutman, B. L. & Niyogi, K. K. Chlamydomonas and Arabidopsis. A dynamic duo. Plant Physiol. 135, 607–610 (2004).

  5.

    Harris, E. H., Stern, D. B. & Witman, G. B. The Chlamydomonas Sourcebook (Academic Press, 2009).

  6.

    Rochaix, J. D. Chlamydomonas reinhardtii as the photosynthetic yeast. Annu. Rev. Genet. 29, 209–230 (1995).

  7.

    Li, J. B. et al. Comparative genomics identifies a flagellar and basal body proteome that includes the BBS5 human disease gene. Cell 117, 541–552 (2004).

  8.

    Silflow, C. D. & Lefebvre, P. A. Assembly and motility of eukaryotic cilia and flagella: lessons from Chlamydomonas reinhardtii. Plant Physiol. 127, 1500–1507 (2001).

  9.

    Li, X. et al. An indexed, mapped mutant library enables reverse genetics studies of biological processes in Chlamydomonas reinhardtii. Plant Cell 28, 367–387 (2016).

  10.

    Terashima, M., Specht, M. & Hippler, M. The chloroplast proteome: a survey from the Chlamydomonas reinhardtii perspective with a focus on distinctive features. Curr. Genet. 57, 151–168 (2011).

  11.

    Pazour, G. J., Agrin, N., Leszyk, J. & Witman, G. B. Proteomic analysis of a eukaryotic cilium. J. Cell Biol. 170, 103–113 (2005).

  12.

    Merchant, S. S. et al. The Chlamydomonas genome reveals the evolution of key animal and plant functions. Science 318, 245–251 (2007).

  13.

    Zones, J. M., Blaby, I. K., Merchant, S. S. & Umen, J. G. High-resolution profiling of a synchronized diurnal transcriptome from Chlamydomonas reinhardtii reveals continuous cell and metabolic differentiation. Plant Cell 27, 2743–2769 (2015).

  14.

    Duanmu, D. et al. Retrograde bilin signaling enables Chlamydomonas greening and phototrophic survival. Proc. Natl Acad. Sci. USA 110, 3621–3626 (2013).

  15.

    Dent, R. M. et al. Large-scale insertional mutagenesis of Chlamydomonas supports phylogenomic functional prediction of photosynthetic genes and analysis of classical acetate-requiring mutants. Plant J. 82, 337–351 (2015).

  16.

    Allen, J. F., de Paula, W. B., Puthiyaveetil, S. & Nield, J. A structural phylogenetic map for chloroplast photosynthesis. Trends Plant Sci. 16, 645–655 (2011).

  17.

    Giordano, M., Beardall, J. & Raven, J. A. CO2 concentrating mechanisms in algae: mechanisms, environmental modulation, and evolution. Annu. Rev. Plant Biol. 56, 99–131 (2005).

  18.

    Goldschmidt-Clermont, M. & Rahire, M. Sequence, evolution and differential expression of the two genes encoding variant small subunits of ribulose bisphosphate carboxylase/oxygenase in Chlamydomonas reinhardtii. J. Mol. Biol. 191, 421–432 (1986).

  19.

    Suzuki, Y., Arae, T., Green, P. J., Yamaguchi, J. & Chiba, Y. AtCCR4a and AtCCR4b are involved in determining the poly(A) length of granule-bound starch synthase 1 transcript and modulating sucrose and starch metabolism in Arabidopsis thaliana. Plant Cell Physiol. 56, 863–874 (2015).

  20.

    Wang, H. et al. The global phosphoproteome of Chlamydomonas reinhardtii reveals complex organellar phosphorylation in the flagella and thylakoid membrane. Mol. Cell. Proteomics 13, 2337–2353 (2014).

  21.

    Bassi, R., Soen, S. Y., Frank, G., Zuber, H. & Rochaix, J. D. Characterization of chlorophyll a/b proteins of photosystem I from Chlamydomonas reinhardtii. J. Biol. Chem. 267, 25714–25721 (1992).

  22.

    Sager, R. & Zalokar, M. Pigments and photosynthesis in a carotenoid-deficient mutant of Chlamydomonas. Nature 182, 98–100 (1958).

  23.

    Baek, K. et al. DNA-free two-gene knockout in Chlamydomonas reinhardtii via CRISPR–Cas9 ribonucleoproteins. Sci. Rep. 6, 30620 (2016).

  24.

    Jiang, W., Brueggeman, A. J., Horken, K. M., Plucinak, T. M. & Weeks, D. P. Successful transient expression of Cas9 and single guide RNA genes in Chlamydomonas reinhardtii. Eukaryot. Cell 13, 1465–1469 (2014).

  25.

    Shin, S. E. et al. CRISPR/Cas9-induced knockout and knock-in mutations in Chlamydomonas reinhardtii. Sci. Rep. 6, 27810 (2016).

  26.

    Slaninová, M., Hroššová, D., Vlček, D. & Mages, W. Is it possible to improve homologous recombination in Chlamydomonas reinhardtii? Biologia 63, 941–946 (2008).

  27.

    Greiner, A. et al. Targeting of photoreceptor genes in Chlamydomonas reinhardtii via zinc-finger nucleases and CRISPR/Cas9. Plant Cell 29, 2498–2518 (2017).

  28.

    Ferenczi, A., Pyott, D. E., Xipnitou, A. & Molnar, A. Efficient targeted DNA editing and replacement in Chlamydomonas reinhardtii using Cpf1 ribonucleoproteins and single-stranded DNA. Proc. Natl Acad. Sci. USA 114, 13567–13572 (2017).

  29.

    Liu, X. L., Yu, H. D., Guan, Y., Li, J. K. & Guo, F. Q. Carbonylation and loss-of-function analyses of SBPase reveal its metabolic interface role in oxidative stress, carbon assimilation, and multiple aspects of growth and development in Arabidopsis. Mol. Plant 5, 1082–1099 (2012).

  30.

    Klein, R. R. & Houtz, R. L. Cloning and developmental expression of pea ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit N-methyltransferase. Plant Mol. Biol. 27, 249–261 (1995).

  31.

    Johnson, X. et al. MRL1, a conserved pentatricopeptide repeat protein, is required for stabilization of rbcL mRNA in Chlamydomonas and Arabidopsis. Plant Cell 22, 234–248 (2010).

  32.

    Wang, L. et al. Chloroplast-mediated regulation of CO2-concentrating mechanism by Ca2+-binding protein CAS in the green alga Chlamydomonas reinhardtii. Proc. Natl Acad. Sci. USA 113, 12586–12591 (2016).

  33.

    Wang, Y. & Spalding, M. H. An inorganic carbon transport system responsible for acclimation specific to air levels of CO2 in Chlamydomonas reinhardtii. Proc. Natl Acad. Sci. USA 103, 10110–10115 (2006).

  34.

    Gao, H., Sage, T. L. & Osteryoung, K. W. FZL, an FZO-like protein in plants, is a determinant of thylakoid and chloroplast morphology. Proc. Natl Acad. Sci. USA 103, 6759–6764 (2006).

  35.

    Martinis, J. et al. ABC1K1/PGR6 kinase: a regulatory link between photosynthetic activity and chloroplast metabolism. Plant J. 77, 269–283 (2014).

  36.

    Kim, E. H., Lee, Y. & Kim, H. U. Fibrillin 5 is essential for plastoquinone-9 biosynthesis by binding to solanesyl diphosphate synthases in Arabidopsis. Plant Cell 27, 2956–2971 (2015).

  37.

    Lefebvre-Legendre, L. et al. Loss of phylloquinone in Chlamydomonas affects plastoquinone pool size and photosystem II synthesis. J. Biol. Chem. 282, 13250–13263 (2007).

  38.

    Wilde, A., Lünser, K., Ossenbühl, F., Nickelsen, J. & Börner, T. Characterization of the cyanobacterial ycf37: mutation decreases the photosystem I content. Biochem. J. 357, 211–216 (2001).

  39.

    Stöckel, J., Bennewitz, S., Hein, P. & Oelmüller, R. The evolutionarily conserved tetratrico peptide repeat protein pale yellow green7 is required for photosystem I accumulation in Arabidopsis and copurifies with the complex. Plant Physiol. 141, 870–878 (2006).

  40.

    Heinnickel, M. et al. Tetratricopeptide repeat protein protects photosystem I from oxidative disruption during assembly. Proc. Natl Acad. Sci. USA 113, 2774–2779 (2016).

  41.

    Lezhneva, L., Amann, K. & Meurer, J. The universally conserved HCF101 protein is involved in assembly of [4Fe-4S]-cluster-containing complexes in Arabidopsis thaliana chloroplasts. Plant J. 37, 174–185 (2004).

  42.

    Meurer, J., Meierhoff, K. & Westhoff, P. Isolation of high-chlorophyll-fluorescence mutants of Arabidopsis thaliana and their characterisation by spectroscopy, immunoblotting and northern hybridisation. Planta 198, 385–396 (1996).

  43.

    Douchi, D. et al. A nucleus-encoded chloroplast phosphoprotein governs expression of the photosystem I subunit PsaC in Chlamydomonas reinhardtii. Plant Cell 28, 1182–1199 (2016).

  44.

    Felder, S. et al. The nucleus-encoded HCF107 gene of Arabidopsis provides a link between intercistronic RNA processing and the accumulation of translation-competent psbH transcripts in chloroplasts. Plant Cell 13, 2127–2141 (2001).

  45.

    Carlotto, N. et al. The chloroplastic DEVH-box RNA helicase INCREASED SIZE EXCLUSION LIMIT 2 involved in plasmodesmata regulation is required for group II intron splicing. Plant Cell Environ. 39, 165–173 (2016).

  46.

    Perron, K., Goldschmidt-Clermont, M. & Rochaix, J. D. A factor related to pseudouridine synthases is required for chloroplast group II intron trans-splicing in Chlamydomonas reinhardtii. EMBO J. 18, 6481–6490 (1999).

  47.

    Rivier, C., Goldschmidt-Clermont, M. & Rochaix, J. D. Identification of an RNA–protein complex involved in chloroplast group II intron trans-splicing in Chlamydomonas reinhardtii. EMBO J. 20, 1765–1773 (2001).

  48.

    Jacobs, J. et al. Identification of a chloroplast ribonucleoprotein complex containing trans-splicing factors, intron RNA, and novel components. Mol. Cell. Proteomics 12, 1912–1925 (2013).

  49.

    Marx, C., Wünsch, C. & Kück, U. The octatricopeptide repeat protein Raa8 is required for chloroplast trans splicing. Eukaryot. Cell 14, 998–1005 (2015).

  50.

    Link, S., Engelmann, K., Meierhoff, K. & Westhoff, P. The atypical short-chain dehydrogenases HCF173 and HCF244 are jointly involved in translational initiation of the psbA mRNA of Arabidopsis. Plant Physiol. 160, 2202–2218 (2012).

  51.

    Schult, K. et al. The nuclear-encoded factor HCF173 is involved in the initiation of translation of the psbA mRNA in Arabidopsis thaliana. Plant Cell 19, 1329–1346 (2007).

  52.

    Wei, L. et al. LPA19, a Psb27 homolog in Arabidopsis thaliana, facilitates D1 protein precursor processing during PSII biogenesis. J. Biol. Chem. 285, 21391–21398 (2010).

  53.

    Ma, J. et al. LPA2 is required for efficient assembly of photosystem II in Arabidopsis thaliana. Plant Cell 19, 1980–1993 (2007).

  54.

    Komenda, J. et al. The cyanobacterial homologue of HCF136/YCF48 is a component of an early photosystem II assembly complex and is important for both the efficient assembly and repair of photosystem II in Synechocystis sp. PCC 6803. J. Biol. Chem. 283, 22390–22399 (2008).

  55.

    Peng, L. et al. LOW PSII ACCUMULATION1 is involved in efficient assembly of photosystem II in Arabidopsis thaliana. Plant Cell 18, 955–969 (2006).

  56.

    Tardif, M. et al. PredAlgo: a new subcellular localization prediction tool dedicated to green algae. Mol. Biol. Evol. 29, 3625–3639 (2012).

  57.

    Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

  58.

    Zhang, R. et al. High-throughput genotyping of green algal mutants reveals random distribution of mutagenic insertion sites and endonucleolytic cleavage of transforming DNA. Plant Cell 26, 1398–1409 (2014).

  59.

    Rubin, B. E. et al. The essential gene set of a photosynthetic organism. Proc. Natl Acad. Sci. USA 112, E6634–E6643 (2015).

  60.

    Wetmore, K. M. et al. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons. mBio 6, e00306-15 (2015).

  61.

    Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57, 289–300 (1995).

  62.

    Berthold, P., Schmitt, R. & Mages, W. An engineered Streptomyces hygroscopicus aph 7″ gene mediates dominant resistance against hygromycin B in Chlamydomonas reinhardtii. Protist 153, 401–412 (2002).

  63.

    Porra, R. J., Thompson, W. A. & Kriedemann, P. E. Determination of accurate extinction coefficients and simultaneous equations for assaying chlorophylls a and b extracted with four different solvents: verification of the concentration of chlorophyll standards by atomic absorption spectroscopy. Biochim. Biophys. Acta 975, 384–394 (1989).

  64.

    Saroussi, S. I., Wittkopp, T. M. & Grossman, A. R. The type II NADPH dehydrogenase facilitates cyclic electron flow, energy-dependent quenching, and chlororespiratory metabolism during acclimation of Chlamydomonas reinhardtii to nitrogen deprivation. Plant Physiol. 170, 1975–1988 (2016).

  65.

    Rappsilber, J., Ishihama, Y. & Mann, M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 75, 663–670 (2003).

  66.

    Kulak, N. A., Pichler, G., Paron, I., Nagaraj, N. & Mann, M. Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat. Methods 11, 319–324 (2014).

  67.

    Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).

  68.

    Maul, J. E. et al. The Chlamydomonas reinhardtii plastid chromosome: islands of genes in a sea of repeats. Plant Cell 14, 2659–2679 (2002).

  69.

    Michaelis, G., Vahrenholz, C. & Pratje, E. Mitochondrial DNA of Chlamydomonas reinhardtii: the gene for apocytochrome b and the complete functional map of the 15.8 kb DNA. Mol. Gen. Genet. 223, 211–216 (1990).

  70.

    Bern, M., Kil, Y. J. & Becker, C. Byonic: advanced peptide and protein identification software. Curr. Protoc. Bioinformatics 40, 13.20 (2012).

  71.

    Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47, D442–D450 (2019).


Acknowledgements

We thank O. Vallon for helpful discussions; M. Cahn and G. Huntress for developing and improving the CLiP website; X. Ji at the Stanford Functional Genomics Facility and Z. Weng at the Stanford Center for Genomics and Personalized Medicine for deep sequencing services; A. Itakura for help in library pooling; S. Ghosh, K. Mendoza, M. LaVoie, L. Galhardo, X. Li, Y. Wang, and Q. Chen for technical assistance; K. Barton, W. Briggs, and Z.-Y. Wang for providing lab space; J. Ecker, L. Freeman Rosenzweig, and M. Kafri for constructive suggestions on the manuscript; and the Princeton Mass Spectrometry Facility for proteomics services. This project was supported by a grant from the National Science Foundation (MCB-1146621) awarded to M.C.J. and A.R.G., grants from the National Institutes of Health (DP2-GM-119137) and the Simons Foundation and Howard Hughes Medical Institute (55108535) awarded to M.C.J., a German Academic Exchange Service (DAAD) research fellowship to F.F., Simons Foundation fellowships of the Life Sciences Research Foundation to R.E.J. and J.V.-B., EMBO long-term fellowships (ALTF 1450-2014 and ALTF 563-2013) to J.V.-B. and S.R., a Swiss National Science Foundation Advanced PostDoc Mobility Fellowship (P2GEP3_148531) to S.R., and a Westlake University startup fund to X.L.

Author information

X.L. developed the method for generating barcoded cassettes. R.Y. and S.R.B. optimized the mutant generation protocol. R.Y., N.I., and X.L. generated the library. J.M.R., N.I., A.G., and R.Y. maintained, consolidated, and cryopreserved the library. X.L. developed the barcode sequencing method. N.I., X.L., R.Y., and W.P. performed combinatorial pooling and super-pool barcode sequencing. X.L. performed LEAP-Seq. W.P. developed the mutant mapping data analysis pipeline and performed data analyses for barcode sequencing and LEAP-Seq. W.P. analyzed insertion coverage and hot- and coldspots. R.Z. and J.M.R. performed insertion verification PCRs and Southern blots. F.F., R.E.J., and J.V.-B. developed the library screening protocol. F.F., J.V.-B., and X.L. performed the photosynthesis mutant screen and barcode sequencing. R.E.J. and W.P. developed data analysis methods and implemented them for the photosynthesis screen. X.L. and T.M.W. annotated the hits from the photosynthesis screen. X.L., J.M.R., and S.R. performed growth analysis, molecular characterizations, and complementation of cpl3. S.S. and T.M.W. performed physiological characterizations of cpl3. M.T.M. and S.S. performed western blots on the photosynthetic protein complexes. M.T.M. performed microscopy on cpl3. X.L., W.P., and T.S. performed proteomic analyses. M.L. and P.A.L. maintained, cryopreserved, and distributed mutants at the Chlamydomonas Resource Center. X.L., W.P., A.R.G., and M.C.J. wrote the manuscript with input from all authors. M.C.J. and A.R.G. conceived and guided the research and obtained funding.

Competing interests

The authors declare no competing interests.

Correspondence to Martin C. Jonikas.

Integrated supplementary information

  1. Supplementary Figure 1: A pipeline was developed for generating barcoded cassettes and for generating an indexed and barcoded library of insertion mutants in Chlamydomonas.

    a, A long oligonucleotide primer containing a random sequence region (indicated in gray) was used as a template for the extension of a shorter oligonucleotide primer (Supplementary Table 1). The resulting double-stranded product contains a random sequence region (22 bp in length; termed ‘barcode’). This product was restriction digested to generate a sticky end for subsequent ligation. The above steps were performed to produce both the 5’ and the 3’ ends of the cassette. The 5’ end of the cassette is shown as an example. b, The pMJ016c plasmid was digested to yield the backbone of the cassette. c, The 5’ and 3’ ends of the cassette generated above were ligated together with the cassette backbone to yield the cassette CIB1. d, The components of the cassette CIB1 are shown. CIB1 contains the HSP70-RBCS2 promoter (with an intron from RBCS2), the AphVIII gene that confers resistance to paromomycin, two transcriptional terminators (T1: PSAD terminator; T2: RPL12 terminator), and two barcodes (each 22 bp in length). e, Following transformation and arraying of individual mutants, the sequence of the barcodes contained in each insertion cassette was unique to each transformant but initially unknown for each colony. f, Barcodes were amplified from combinatorial pools of mutants, sequenced, and traced back to single colonies (Supplementary Fig. 2a-e; Supplementary Note). After this step, the barcode sequence for each colony was known. For simplicity, only one side of the cassette is shown. g, Barcodes and genomic sequences flanking the insertion cassettes were amplified from a pool of the library. By pooled next-generation sequencing, the sequence flanking each insertion cassette was paired with the corresponding barcode (Supplementary Fig. 2f). The flanking sequences were used to determine the insertion site in the genome. Because the colony location for each barcode was determined in the previous step, insertion sites could then be assigned to single colonies.

  2. Supplementary Figure 2: Combinatorial pooling, barcode deconvolution to colony and determination of insertion sites.

    a, To determine which plate each barcode was on, each plate of mutants was pooled into one of 570 plate-pools. The plate-pools were then further combinatorially pooled into 21 plate-super-pools, in such a way that each plate-pool was in a unique combination of plate-super-pools. The barcodes present in each plate-super-pool were determined by deep sequencing, and the barcodes were assigned to plates based on the combination of plate-super-pools in which they were found. A similar process was applied to determine the colony position of each barcode. Combining the plate and colony data yielded a specific position for each barcode. b, The barcodes on the 5’ and 3’ sides of the cassette were sequenced separately, each with a single-end Illumina read. With the sequencing primers we used (indicated on the cassette), the reads start with the barcode sequence and extend into the cassette. c, Most barcode colony positions were identified with no errors; that is, they were found in one of the expected combinations of super-pools. Some were found in a combination of super-pools that differed at one or more positions from every expected combination, but their positions could still be identified owing to the redundancy built into our method. The much higher number of one-error cases in the colony data than in the plate data is due to the loss of one of the colony-super-pools for a significant fraction of the samples (Supplementary Note). d, Both a plate and a colony position were identified for most barcodes. e, The number of barcodes mapped to an individual colony varied, with 2 being the most common. For colonies with two mapped barcodes, the large majority had one 5’ and one 3’ barcode, likely derived from the two sides of one cassette. f, LEAP-Seq reads are paired-end reads, with the proximal read containing the cassette barcode and immediately flanking genomic sequence, and the distal read containing flanking genomic sequence a variable distance away from the insertion site.
During transformation, short fragments of genomic DNA, likely originating from lysed cells, are often inserted between the cassette and the true flanking genomic DNA. We refer to these short DNA fragments as ‘junk fragments’ (Zhang, R. et al., Plant Cell 26, 1398–1409, 2014 and Li, X. et al., Plant Cell 28, 367–387, 2016). Such junk fragments can lead to incorrect insertion mapping if only the immediately flanking genomic sequence is obtained. LEAP-Seq data can be used to detect the presence of junk fragments at an insertion junction based on two key characteristics: (1) the number of read pairs in which both reads aligned to the same locus and (2) the longest distance spanned by such read pairs. g, The two key characteristics are plotted for the original full library, before any mapping corrections were applied. h, The same two characteristics are plotted for confidence level 1 and 2 insertions. For confidence level 2 insertions, only the side with no junk fragment is shown; for confidence level 1 insertions, one randomly chosen side is shown. i, LEAP-Seq data can be used to correct cases of probable junk fragment insertions and determine the most likely correct insertion position. The corrected data can be visualized using two modified key characteristics: the number of distal reads aligned to the corrected location, and the distance spanned by such reads. j, The modified characteristics are plotted for confidence level 4 insertions.

  3. Supplementary Figure 3: Characterization of genomic disruptions in mutants in the library.

    a, Mutants in the library were divided into four confidence levels, corresponding to different mapping scenarios. The insertion sites of a number of randomly chosen mutants in each category were verified by PCR (mutants from confidence levels 1 and 2 were assayed as one group; Supplementary Table 6). The numbers and percentages of confirmed insertions are shown in the last column. b, Most mutants have a single mapped insertion, and fewer than 20% contain two or more mapped insertions. c, Eighteen randomly selected mutants from the four confidence levels were analyzed by Southern blotting using the coding sequence of AphVIII as the probe. Mutants are numbered, and the details of their insertion sites are presented in Supplementary Table 6. The mutant number is highlighted in red when the Southern blot was interpreted to indicate at least two insertions in that mutant. The wild-type strain CC-4533 (WT) was included as a negative control. d, Most genomic deletions accompanying cassette insertions are smaller than 100 bp, but deletions of up to 10 kb are present in some mutants. Deletions larger than 10 kb may also be present, but they were too rare to be clearly detected in the aggregate numbers. e, Most genomic duplications accompanying cassette insertions are smaller than 10 bp, but they can be up to 30 bp. Larger duplications may be present, but they are too rare to be detected in the aggregate numbers. f, The distribution of junk fragment lengths was determined using a dataset of 651 insertions in which two cassettes flank a junk fragment, allowing us to precisely map both ends of the junk fragment using LEAP-Seq. Most junk fragments are smaller than 320 bp, but we have detected some up to 1 kb in size. Larger junk fragments may be present, but they are too rare to be detected in the aggregate numbers. Note that the x-axes in d–f are on a logarithmic scale.
Data presented in this figure are described in the Supplementary Note.

  4. Supplementary Figure 4: The distribution of insertions in the genome is largely random, and the hotspots fall into two classes.

    a, For each chromosome, the observed insertion density is shown as a heatmap in a wide column, followed by three narrow columns depicting three simulated datasets in which insertions were placed in randomly chosen mappable genomic locations. The simulated data provide a visual guide to the amount of variation expected from a random distribution. The large white areas present in both the observed and simulated data correspond to repetitive genomic regions in which insertions cannot be mapped uniquely. The red and blue circles/lines to the left of each chromosome show statistically significant insertion hot spots and cold spots, respectively. To ensure that we are showing true insertion density rather than artifacts caused by junk fragments or other mapping inaccuracies, the insertion-density plot and the identification of hot and cold spots are based on confidence level 1 insertions only. In contrast, Fig. 1c shows the distribution of insertions of all confidence levels over the genome. b and c, Each plot represents a 1-kb genomic region surrounding one hot spot, showing multiple features of that region, as listed in the legend. The plots show the 22 1-kb regions with the highest total numbers of insertions. The total number of insertions for each region is listed above each plot, along with the genomic position and the y-axis range. b, Seven of the top 22 hot spots are narrow, with 20 or more insertions in a 10-bp area and a total width of 20–30 bp, with few or no additional insertions in the surrounding 1 kb. c, Fifteen of the top 22 hot spots are wider, with multiple peaks of high insertion density spanning at least hundreds of base pairs. In either class, the insertion density peaks do not appear to correlate reliably with any of the other genomic features shown. Data presented in this figure are described in the Supplementary Note.

  5. Supplementary Figure 5: The barcode sequencing method is robust.

    a, The barcode sequencing read counts (normalized to 100 million total reads) for each insertion were highly reproducible between technical replicates, with a Spearman’s correlation of 0.978. For 94% of barcodes, the normalized read counts in the two replicates differed by no more than 2-fold. b, The TP-light/TAP-dark ratios of multiple barcodes in the same mutant are consistent, with a Spearman’s correlation of 0.744. Only 4% of insertion pairs had a more than 5-fold difference between ratios. See also Fig. 3b,c.

  6. Supplementary Figure 6: Molecular characterization of the cpl3 mutant.

    a, The cassette insertion site is indicated on a model of the CPL3 gene from the Chlamydomonas v5.5 genome. Two cassettes are inserted in opposite orientations, with one of them truncated on the 3' side (indicated by a notch); the 5' ends may be intact or truncated. The orange box arrow indicates insertion of a small fragment of unknown origin. Binding sites for primers g1, g2, g3, and c1 are indicated. b, PCR genotyping results of cpl3 and complemented lines. PCR with the primer pair ‘g1 + g2’ indicated the presence of an insertion within the CPL3 gene in the cpl3 mutant and the presence of wild-type CPL3 sequence in the wild type (from the native CPL3 locus) and in the complemented lines (from the complementation construct inserted at a random site in the genome of each line). PCR with the primer pair ‘g3 + g2’ demonstrated the disruption of the native CPL3 locus in the cpl3 and comp1–3 lines, as the binding site for primer g3 is present only in the native CPL3 locus and not in the complementation construct. PCR with primer pairs ‘g1 + c1’ and ‘g2 + c1’ showed the presence of a cassette inserted into the CPL3 gene in cpl3 as well as in the complemented lines. c, cpl3 mutants transformed with the CPL3 gene were arrayed and grown photosynthetically in the absence of acetate for one day under 100 µmol photons m⁻² s⁻¹ light and four additional days under 500 µmol photons m⁻² s⁻¹ light before imaging. The circled colony is a positive control strain that grows photosynthetically. d, The same transformants were grown for five days in the presence of acetate in the medium under 50 µmol photons m⁻² s⁻¹ light. All colonies grew similarly. e, CPL3 contains conserved tyrosine phosphatase motifs. Sequences of CPL3 in Chlamydomonas and its homolog psychrophilic phosphatase I (PPI) in Shewanella sp. were aligned using Clustal Omega (Sievers, F. et al., Mol. Syst. Biol. 7, 539, 2011). Asterisks (*), colons (:), and periods (.) indicate conserved, strongly similar, and weakly similar amino acid residues, respectively. The motifs that are conserved among multiple protein phosphatases (Tsuruta, H. et al., J. Biochem. 137, 69–77, 2005) are boxed. Data in panels a–d are described in the Supplementary Note. See also Fig. 4.

  7. Supplementary Figure 7: Phenotypic characterization of the cpl3 mutant.

    a, cpl3, the wild-type strain (WT), and the complemented line (comp1) all contain a normal cup-shaped chloroplast. Representative images of confocal chlorophyll fluorescence, bright field, and an overlay are shown for each strain. b, cpl3 has a lower chlorophyll a/b ratio than WT and comp1 (P < 0.03, Student’s t-test). Error bars indicate standard deviations (n = 3). c, Western blots show that cpl3 accumulates lower levels of the PSII subunit CP43, the PSI subunit PsaA, and the chloroplast ATP synthase subunit ATPC. For PsaA, bands with a higher molecular weight have been observed when this antibody is used on Chlamydomonas samples (see the product sheet for Agrisera antibody AS06-172-100). An asterisk indicates the band at the expected PsaA molecular weight. α-tubulin served as a loading control. Major bands cropped from this panel are also presented in Fig. 4e.
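Supplementary Figure 2a,c describes assigning each barcode to a plate from the combination of super-pools in which it is detected, with built-in redundancy that still identifies positions when the observed combination has mismatches. One common way to exploit such redundancy is nearest-codeword decoding, sketched below in Python. Everything here is an illustrative assumption rather than the published design: the toy codebook (constant-weight pool sets with pairwise Hamming distance of at least 4, so a single missed or spurious detection still decodes uniquely), the read-count threshold, and all function names. The codes actually used are listed in Supplementary Tables 2 and 3.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence

Code = FrozenSet[int]  # set of super-pool indices into which one item is placed


def toy_codebook(n_items: int, n_pools: int, weight: int, min_dist: int) -> List[Code]:
    """Greedily choose `n_items` pool sets of size `weight` whose pairwise
    Hamming distance (size of the symmetric difference) is at least `min_dist`.
    Hypothetical construction, for illustration only."""
    chosen: List[Code] = []
    for combo in combinations(range(n_pools), weight):
        candidate = frozenset(combo)
        if all(len(candidate ^ code) >= min_dist for code in chosen):
            chosen.append(candidate)
            if len(chosen) == n_items:
                return chosen
    raise ValueError("not enough codewords; increase n_pools or weight")


def detected_pools(read_counts: Sequence[int], threshold: int) -> Code:
    """Super-pools whose read count for one barcode reaches `threshold`."""
    return frozenset(i for i, n in enumerate(read_counts) if n >= threshold)


def decode(observed: Code, codebook: List[Code], max_errors: int = 1) -> Optional[int]:
    """Index of the unique codeword within `max_errors` of `observed`, or None
    when nothing is close enough or the best match is tied."""
    if not codebook:
        return None
    ranked = sorted((len(observed ^ code), i) for i, code in enumerate(codebook))
    best_dist, best_index = ranked[0]
    if best_dist > max_errors:
        return None
    if len(ranked) > 1 and ranked[1][0] == best_dist:
        return None  # ambiguous: two codewords equally close
    return best_index


# Toy example: 20 plate-pools encoded across 21 super-pools.
codebook = toy_codebook(n_items=20, n_pools=21, weight=3, min_dist=4)
plate = 7
exact = codebook[plate]
dropout = exact - {min(exact)}  # barcode missed in one of its super-pools
print(decode(exact, codebook), decode(dropout, codebook))
```

With weight 3 and a minimum distance of 4, any two codewords share at most one pool, so 21 super-pools allow at most 70 such codewords (210 pool pairs, 3 per codeword); encoding 570 plate-pools would need heavier codewords or a different code, which is why this toy should not be read as the library's actual scheme.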
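The two LEAP-Seq characteristics described for Supplementary Figure 2f,g (the number of read pairs whose two reads align to the same locus, and the longest distance spanned by such pairs) can be computed per insertion junction from aligned read pairs. The Python sketch below assumes alignment has already been done; the `ReadPair` record layout, the `max_span` window that defines "same locus", and the cutoffs in `looks_like_junk` are hypothetical choices, not the values used to assign confidence levels in the library.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ReadPair:
    """One aligned LEAP-Seq read pair (hypothetical record layout)."""
    prox_chrom: str               # proximal read: adjacent to the cassette
    prox_pos: int
    dist_chrom: Optional[str]     # distal read: None if it did not align
    dist_pos: Optional[int]


def junction_features(pairs: Iterable[ReadPair], max_span: int = 5000) -> Tuple[int, int]:
    """Return (number of pairs whose reads align to the same locus,
    longest distance spanned by such a pair). 'Same locus' is taken to mean
    same chromosome and at most `max_span` bp apart, an assumed cutoff."""
    n_same, longest = 0, 0
    for p in pairs:
        if p.dist_chrom is None or p.dist_pos is None:
            continue  # an unaligned distal read carries no information
        if p.dist_chrom == p.prox_chrom:
            span = abs(p.dist_pos - p.prox_pos)
            if span <= max_span:
                n_same += 1
                longest = max(longest, span)
    return n_same, longest


def looks_like_junk(pairs: Iterable[ReadPair], min_pairs: int = 3,
                    min_longest: int = 500, max_span: int = 5000) -> bool:
    """Flag a junction as a possible junk-fragment insertion when few distal
    reads confirm the proximal locus or they span only a short distance.
    Both thresholds are illustrative, not the published criteria."""
    n_same, longest = junction_features(pairs, max_span)
    return n_same < min_pairs or longest < min_longest


clean = [ReadPair("chr1", 1000, "chr1", 1000 + d) for d in (200, 800, 1500)]
junk = [
    ReadPair("chr1", 1000, "chr7", 50_000),  # distal read in the true flank, beyond the junk
    ReadPair("chr1", 1000, "chr7", 50_400),
    ReadPair("chr1", 1000, "chr1", 1_150),   # distal read still inside the junk fragment
    ReadPair("chr1", 1000, None, None),      # distal read did not align
]
print(junction_features(clean), looks_like_junk(clean))
print(junction_features(junk), looks_like_junk(junk))
```

The logic follows the caption: when a junk fragment sits between the cassette and the true flank, the proximal read maps to the fragment's origin while most distal reads, which lie beyond it, map elsewhere. Few pairs then agree, and those that do span only short distances, consistent with most junk fragments being smaller than 320 bp (Supplementary Fig. 3f).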

Supplementary information

  1. Supplementary Information

    Supplementary Figures 1–7 and Supplementary Note

  2. Reporting Summary

  3. Supplementary Table 1

    Primers and experimental design for PCRs in this research

  4. Supplementary Table 2

    Binary codes for plate super-pooling

  5. Supplementary Table 3

    Binary codes for colony super-pooling

  6. Supplementary Table 4

    Read counts for each barcode in each combinatorial super-pool

  7. Supplementary Table 5

    List of all mapped mutants in the library

  8. Supplementary Table 6

    Primers and results of PCRs used to verify the insertion sites of randomly picked mutants from the mutant library

  9. Supplementary Table 7

    Statistically significant insertion hotspots and coldspots

  10. Supplementary Table 8

    Statistically significant depleted functional terms

  11. Supplementary Table 9

    Candidate essential genes

  12. Supplementary Table 10

    Read counts of barcodes before and after pooled growth in the photosynthesis screen

  13. Supplementary Table 11

    Statistics of the pooled growth data for all genes

  14. Supplementary Table 12

    Summary of previous characterizations of the roles of higher- and lower-confidence genes in photosynthesis

  15. Supplementary Table 13

    Read counts of cpl3 exon and intron alleles in the pooled screens

  16. Supplementary Table 14

    Proteomic characterization of the cpl3 mutant


Figures

Fig. 1: A genome-wide library of Chlamydomonas mutants was generated by random insertion of barcoded cassettes and mapping of insertion sites.
Fig. 2: The library covers 83% of Chlamydomonas genes.
Fig. 3: A high-throughput screen using the library identifies many genes with known roles in photosynthesis and many novel components.
Fig. 4: CPL3 is required for photosynthetic growth and accumulation of photosynthetic protein complexes in the thylakoid membranes.