Using BEAN-counter to quantify genetic interactions from multiplexed barcode sequencing experiments

Abstract

The construction of genome-wide mutant collections has enabled high-throughput, high-dimensional quantitative characterization of gene and chemical function, particularly via genetic and chemical–genetic interaction experiments. As the throughput of such experiments increases with improvements in sequencing technology and sample multiplexing, appropriate tools must be developed to handle the large volume of data produced. Here, we describe how to apply our approach to high-throughput, fitness-based profiling of pooled mutant yeast collections using the BEAN-counter software pipeline (Barcoded Experiment Analysis for Next-generation sequencing) for analysis. The software has also successfully processed data from Schizosaccharomyces pombe, Escherichia coli, and Zymomonas mobilis mutant collections. We provide general recommendations for the design of large-scale, multiplexed barcode sequencing experiments. The procedure outlined here was used to score interactions for ~4 million chemical-by-mutant combinations in our recently published chemical–genetic interaction screen of nearly 14,000 chemical compounds across seven diverse compound collections. Here we selected a representative subset of these data on which to demonstrate our analysis pipeline. BEAN-counter is open source, written in Python, and freely available for academic use. Users should be proficient at the command line; advanced users who wish to analyze larger datasets with hundreds or more conditions should also be familiar with concepts in analysis of high-throughput biological data. BEAN-counter encapsulates the knowledge we have accumulated from, and successfully applied to, our multiplexed, pooled barcode sequencing experiments. This protocol will be useful to those interested in generating their own high-dimensional, quantitative characterizations of gene or chemical function in a high-throughput manner.

Fig. 1: Overview of multiplexed barcode sequencing experiments and their processing using the BEAN-counter software.
Fig. 2: Design of large-scale, pooled and multiplexed chemical–genetic interaction screens.
Fig. 3: Schematic of the steps involved in processing large-scale interaction screens using BEAN-counter.
Fig. 4: The large signature that we observe in and remove from most of our datasets.
Fig. 5: An inoculation date-related effect in one of our datasets.
Fig. 6: Typical barcode and index tag abundance distributions.
Fig. 7: Manual examination of the dataset to flag mutants and conditions for removal.
Fig. 8: Analysis of same-compound, same-index-tag, and same-lane correlations to detect the presence of batch effects and uninformative signal.
Fig. 9: Removal of large, uninformative signature via singular-value decomposition (SVD).
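
As an illustration of the idea named in Fig. 9, the following minimal Python sketch removes the largest singular component from a mutant-by-condition score matrix. It is not BEAN-counter's implementation: the function name `remove_top_components`, the parameter `n_remove`, the matrix orientation, and the toy data are all assumptions made for this example.

```python
import numpy as np


def remove_top_components(scores, n_remove=1):
    """Return `scores` with its `n_remove` largest singular components removed.

    `scores` is assumed to be a 2-D array with mutants as rows and
    conditions as columns (an assumption of this sketch).
    """
    scores = np.asarray(scores, dtype=float)
    # Thin SVD: scores == (u * s) @ vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(scores, full_matrices=False)
    s_kept = s.copy()
    s_kept[:n_remove] = 0.0  # zero out the dominant component(s)
    return (u * s_kept) @ vt


# Toy data (hypothetical): one strong signature shared by every condition,
# plus independent noise.
rng = np.random.default_rng(0)
signature = np.outer(rng.normal(size=50), np.ones(8)) * 10.0
toy_scores = signature + rng.normal(size=(50, 8))

cleaned = remove_top_components(toy_scores, n_remove=1)
```

In practice, how many components to remove is a judgment call that depends on inspecting the data; the value of one used here is arbitrary.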

Code availability

The source code for BEAN-counter is available from https://github.com/csbio/BEAN-counter. It requires a license for use (the license can be obtained at http://z.umn.edu/beanctr). It is free for academic use and can be purchased on a per-project basis for commercial use.

Data availability

All data needed to process the example dataset into chemical–genetic interaction scores are available at http://csbio.cs.umn.edu/BEAN-counter/example_dataset/. These data are a subset of the complete large-scale chemical–genetic interaction dataset, which is available from http://mosaic.cs.umn.edu, the supplementary material of the associated article (ref. 10), or the corresponding author upon reasonable request.

References

1. Giaever, G. et al. Genomic profiling of drug sensitivities via induced haploinsufficiency. Nat. Genet. 21, 278–283 (1999).
2. Parsons, A. B. et al. Integration of chemical-genetic and genetic interaction data links bioactive compounds to cellular target pathways. Nat. Biotechnol. 22, 62–69 (2004).
3. Parsons, A. B. et al. Exploring the mode-of-action of bioactive compounds by chemical-genetic profiling in yeast. Cell 126, 611–625 (2006).
4. Pierce, S. E., Davis, R. W., Nislow, C. & Giaever, G. Genome-wide analysis of barcoded Saccharomyces cerevisiae gene-deletion mutants in pooled cultures. Nat. Protoc. 2, 2958–2974 (2007).
5. Costanzo, M. et al. The genetic landscape of a cell. Science 327, 425–431 (2010).
6. Costanzo, M. et al. A global genetic interaction network maps a wiring diagram of cellular function. Science 353, aaf1420 (2016).
7. Hoepfner, D. et al. High-resolution chemical dissection of a model eukaryote reveals targets, pathways and gene functions. Microbiol. Res. 169, 107–120 (2014).
8. Lee, A. Y. et al. Mapping the cellular response to small molecules using chemogenomic fitness signatures. Science 344, 208–211 (2014).
9. Estoppey, D. et al. Identification of a novel NAMPT inhibitor by CRISPR/Cas9 chemogenomic profiling in mammalian cells. Sci. Rep. 7, 42728 (2017).
10. Piotrowski, J. S. et al. Functional annotation of chemical libraries across diverse biological processes. Nat. Chem. Biol. 13, 982–993 (2017).
11. Roguev, A. et al. Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast. Science 322, 405–410 (2008).
12. Ryan, C. J. et al. Hierarchical modularity and the evolution of genetic interactomes across species. Mol. Cell 46, 691–704 (2012).
13. Frost, A. et al. Functional repurposing revealed by comparing S. pombe and S. cerevisiae genetic interactions. Cell 149, 1339–1352 (2012).
14. Vizeacoumar, F. J. et al. A negative genetic interaction map in isogenic cancer cell lines reveals cancer cell vulnerabilities. Mol. Syst. Biol. 9, 696 (2013).
15. Babu, M. et al. Quantitative genome-wide genetic interaction screens reveal global epistatic relationships of protein complexes in Escherichia coli. PLoS Genet. 10, e1004120 (2014).
16. Hart, T. et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell 163, 1515–1526 (2015).
17. Hillenmeyer, M. E. et al. The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 320, 362–365 (2008).
18. Wildenhain, J. et al. Prediction of synergism from chemical-genetic interactions by machine learning. Cell Syst. 1, 383–395 (2015).
19. Smith, A. M. et al. Quantitative phenotyping via deep barcode sequencing. Genome Res. 19, 1836–1842 (2009).
20. Smith, A. M. et al. Highly-multiplexed barcode sequencing: an efficient method for parallel analysis of pooled samples. Nucleic Acids Res. 38, e142 (2010).
21. Cleveland, W. S. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. 74, 829–836 (1979).
22. Cleveland, W. S. LOWESS: a program for smoothing scatterplots by robust locally weighted regression. Am. Stat. 35, 54 (1981).
23. Yang, Y. H. with contributions from Paquet, A. & Dudoit, S. marray: Exploratory analysis for two-color spotted microarray data. https://rdrr.io/bioc/marray/ (2009).
24. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
25. Piotrowski, J. S. et al. Chemical genomic profiling via barcode sequencing to predict compound mode of action. Methods Mol. Biol. 1263, 299–318 (2015).
26. Piotrowski, J. S. et al. Plant-derived antifungal agent poacic acid targets β-1,3-glucan. Proc. Natl. Acad. Sci. USA 112, E1490–E1497 (2015).
27. Baryshnikova, A. et al. Quantitative analysis of fitness and genetic interactions in yeast on a genome scale. Nat. Methods 7, 1017–1024 (2010).
28. Morales, E. H. et al. Accumulation of heme biosynthetic intermediates contributes to the antibacterial action of the metalloid tellurite. Nat. Commun. 8, 15320 (2017).
29. Giaever, G. & Nislow, C. The yeast deletion collection: a decade of functional genomics. Genetics 197, 451–465 (2014).
30. Ho, C. H. et al. A molecular barcoded yeast ORF library enables mode-of-action analysis of bioactive compounds. Nat. Biotechnol. 27, 369–377 (2009).
31. Ben-Aroya, S. et al. Toward a comprehensive temperature-sensitive mutant repository of the essential genes of Saccharomyces cerevisiae. Mol. Cell 30, 248–258 (2008).
32. Spirek, M. et al. S. pombe genome deletion project: an update. Cell Cycle 9, 2399–2402 (2010).
33. Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. 2, 2006.0008 (2006).
34. Andrusiak, K. Adapting S. cerevisiae Chemical Genomics for Identifying the Modes of Action of Natural Compounds. Master’s thesis, University of Toronto (2012).
35. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
36. Bao, E., Jiang, T., Kaloshian, I. & Girke, T. SEED: efficient clustering of next-generation sequences. Bioinformatics 27, 2502–2509 (2011).
37. Shimizu, K. & Tsuda, K. SlideSort: all pairs similarity search for short reads. Bioinformatics 27, 464–470 (2011).
38. Mahé, F., Rognes, T., Quince, C., de Vargas, C. & Dunthorn, M. Swarm: robust and fast clustering method for amplicon-based studies. PeerJ 2, e593 (2014).
39. Zorita, E., Cuscó, P. & Filion, G. J. Starcode: sequence clustering based on all-pairs search. Bioinformatics 31, 1913–1919 (2015).
40. Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
41. Vetrovský, T., Baldrian, P., Morais, D. & Berger, B. SEED 2: a user-friendly platform for amplicon high-throughput sequencing data analyses. Bioinformatics 34, (2018).
42. Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to count barcode reads. Bioinformatics 34, 739–747 (2018).
43. Dai, Z. et al. edgeR: a versatile tool for the analysis of shRNA-seq and CRISPR-Cas9 genetic screens. F1000Res. 3, 95 (2014).
44. Mun, J., Kim, D.-U., Hoe, K.-L. & Kim, S.-Y. Genome-wide functional analysis using the barcode sequence alignment and statistical analysis (Barcas) tool. BMC Bioinformatics 17, 475 (2016).
45. Robinson, D. G., Chen, W., Storey, J. D. & Gresham, D. Design and analysis of Bar-seq experiments. G3 (Bethesda) 4, 11–18 (2014).
46. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
47. Leek, J. T. & Storey, J. D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735 (2007).
48. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
49. Levy, S. F. et al. Quantitative evolutionary dynamics using high-resolution lineage tracking. Nature 519, 181–186 (2015).
50. Simpkins, S. W. et al. Predicting bioprocess targets of chemical compounds through integration of chemical-genetic and genetic interaction networks. Preprint at https://www.biorxiv.org/content/early/2018/05/18/111252 (2018).

Acknowledgements

S.W.S. thanks B. VanderSluis for proofreading the manuscript and testing the software and also A. Becker at the University of Minnesota Genomics Center for discussions regarding amplicon sequencing issues. This work was supported by RIKEN (http://www.riken.jp/en/) Strategic Programs for R&D, the National Institutes of Health (https://www.nih.gov/; R01HG005084, R01GM104975), and the National Science Foundation (https://www.nsf.gov/; DBI 0953881). S.W.S. was supported by an NSF Graduate Research Fellowship (00039202), an NIH Biotechnology training grant (T32GM008347), and a one-year fellowship from the University of Minnesota Bioinformatics and Computational Biology (BICB) Graduate Program (https://r.umn.edu/academics-research/graduate-programs/bicb). S.C.L. and J.S.P. were supported by a RIKEN Foreign Postdoctoral Research Fellowship. S.C.L. was supported by a RIKEN CSRS (http://www.csrs.riken.jp/en/) Research Topics for Cooperative Projects Award (201601100228) and a RIKEN FY2017 Incentive Research Projects Grant. H.N.W. was supported by a one-year BICB fellowship from the University of Minnesota. C.B. was supported by JSPS KAKENHI (https://www.jsps.go.jp/english/e-grants/) grant no. 15H04483. C.L.M. and C.B. are fellows in the Canadian Institute for Advanced Research (CIFAR, https://www.cifar.ca/) Genetic Networks Program. Computing resources and data storage services were partially provided by the Minnesota Supercomputing Institute and the UMN Office of Information Technology, respectively. Software licensing services were provided by the UMN Office for Technology Commercialization. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

C.B., M.Y., C.L.M., J.S.P., and S.C.L. conceived the project. R.D. and C.L.M. designed and R.D. wrote the original analysis software, which was extended and re-implemented as a full analysis pipeline by S.W.S. with assistance from J.N. and H.N.W. S.C.L., J.S.P., and Y.Y. generated chemical–genetic interaction data for software development and testing. J.N., S.C.L., and J.S.P. performed extensive user testing of the software. H.O. provided the RIKEN Natural Product Depository compounds used in the example dataset. S.W.S. wrote the manuscript with input and editing from all authors.

Correspondence to Chad L. Myers.

Ethics declarations

Competing interests

The authors declare competing financial interests. A license is required to use the BEAN-counter software (http://z.umn.edu/beanctr). It is free for academic use and must be purchased on a per-project basis for commercial use.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Piotrowski, J. S. et al. Nat. Chem. Biol. 13, 982–993 (2017): https://doi.org/10.1038/nchembio.2436

Morales, E. H. et al. Nat. Commun. 8, 15320 (2017): https://doi.org/10.1038/ncomms15320

Integrated supplementary information

Supplementary Figure 1 The BEAN-counter configuration file provides necessary information for the pipeline.

Schematic showing how the configuration file coordinates the processing of pooled interaction screening data by specifying the location of the raw data, the structure of the PCR amplicons, and the mappings from genetic barcode to mutant and index tag to condition. Columns in bold are required in order to process data with BEAN-counter.
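
To make the role of these mappings concrete, the following Python sketch counts reads per (mutant, condition) pair for a made-up amplicon layout of index tag, common primer, then genetic barcode. It does not reproduce BEAN-counter's configuration format or matching logic: every sequence, name, and length below is hypothetical, and only exact matches are counted.

```python
from collections import Counter

# Hypothetical mappings of the kind the configuration file is said to supply.
INDEX_TAG_TO_CONDITION = {
    "ACGTACGTAC": "DMSO_rep1",
    "TGCATGCATG": "compound_A",
}
BARCODE_TO_MUTANT = {
    "AAAACCCCGGGGTTTTAAAA": "mutant_1",
    "CCCCAAAATTTTGGGGCCCC": "mutant_2",
}

# Hypothetical amplicon structure: [index tag][common primer][barcode].
COMMON_PRIMER = "GATTACAGATTACA"  # placeholder sequence, not a real primer
TAG_LEN = 10
BARCODE_LEN = 20


def count_reads(reads):
    """Count exact-match reads for each (mutant, condition) pair."""
    counts = Counter()
    primer_end = TAG_LEN + len(COMMON_PRIMER)
    for read in reads:
        tag = read[:TAG_LEN]
        primer = read[TAG_LEN:primer_end]
        barcode = read[primer_end:primer_end + BARCODE_LEN]
        if primer != COMMON_PRIMER:
            continue  # read does not have the expected structure
        condition = INDEX_TAG_TO_CONDITION.get(tag)
        mutant = BARCODE_TO_MUTANT.get(barcode)
        if condition is None or mutant is None:
            continue  # unknown index tag or barcode
        counts[(mutant, condition)] += 1
    return counts


# Toy reads built from the hypothetical pieces above.
reads = [
    "ACGTACGTAC" + COMMON_PRIMER + "AAAACCCCGGGGTTTTAAAA",
    "ACGTACGTAC" + COMMON_PRIMER + "AAAACCCCGGGGTTTTAAAA",
    "TGCATGCATG" + COMMON_PRIMER + "CCCCAAAATTTTGGGGCCCC",
    "N" * 44,  # unrecognizable read, skipped
]
counts = count_reads(reads)
```

Exact matching is the simplest possible choice for this sketch; real sequencing data contain errors, so a production pipeline would typically need some tolerance for mismatches.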
