
Active learning for optimal intervention design in causal models

Abstract

Sequential experimental design to discover interventions that achieve a desired outcome is a key problem in various domains including science, engineering and public policy. When the space of possible interventions is large, making an exhaustive search infeasible, experimental design strategies are needed. In this context, encoding the causal relationships between the variables, and thus the effect of interventions on the system, is critical for identifying desirable interventions more efficiently. Here we develop a causal active learning strategy to identify interventions that are optimal, as measured by the discrepancy between the post-interventional mean of the distribution and a desired target mean. The approach employs a Bayesian update for the causal model and prioritizes interventions using a carefully designed, causally informed acquisition function. This acquisition function is evaluated in closed form, allowing for fast optimization. The resulting algorithms are theoretically grounded with information-theoretic bounds and provable consistency results for linear causal models with known causal graph. We apply our approach to both synthetic data and single-cell transcriptomic data from Perturb–CITE-sequencing experiments to identify optimal perturbations that induce a specific cell-state transition. The causally informed acquisition function generally outperforms existing criteria, allowing for optimal intervention design with fewer but carefully selected samples.
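To make the workflow concrete, the following minimal Python sketch illustrates the sequential design loop the abstract describes: a linear Gaussian structural causal model with a known DAG, shift interventions, conjugate Bayesian updates of the edge weights and an acquisition step that selects the candidate intervention whose predicted post-interventional mean is closest to the target. The greedy-style acquisition used here corresponds to one of the baselines compared in Fig. 5, not the proposed CIV/CIV-OW criteria; the function names, prior scales and toy simulator are assumptions made for illustration, not the authors' implementation (which is deposited at ref. 49).

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth linear Gaussian SCM on a known DAG (nodes in topological order).
p = 5
B_true = np.triu(rng.normal(size=(p, p)), k=1) * (rng.random((p, p)) < 0.7)
noise_sd = 0.1

def post_interventional_mean(B, a):
    """Mean of X under the shift intervention a, where X = B.T @ X + a + eps."""
    return np.linalg.solve(np.eye(p) - B.T, a)

def sample_interventional(a, n=10):
    """Draw n samples from the true SCM under the shift intervention a."""
    eps = rng.normal(scale=noise_sd, size=(n, p))
    return np.linalg.solve(np.eye(p) - B_true.T, (a + eps).T).T

# Bayesian linear regression of every node on its (known) parents,
# with a N(0, tau2 * I) prior on the incoming edge weights.
tau2, sigma2 = 1.0, noise_sd ** 2
parents = [np.flatnonzero(B_true[:, j]) for j in range(p)]  # known causal graph
post_mean = [np.zeros(len(pa)) for pa in parents]
post_prec = [np.eye(len(pa)) / tau2 for pa in parents]

def update_posterior(X, a):
    """Conjugate Gaussian update of the edge-weight posterior from samples X."""
    for j, pa in enumerate(parents):
        if len(pa) == 0:
            continue
        Phi, y = X[:, pa], X[:, j] - a[j]        # the shift enters as an offset
        prec_new = post_prec[j] + Phi.T @ Phi / sigma2
        mean_new = np.linalg.solve(
            prec_new, post_prec[j] @ post_mean[j] + Phi.T @ y / sigma2)
        post_prec[j], post_mean[j] = prec_new, mean_new

def B_posterior_mean():
    """Assemble the posterior-mean weight matrix."""
    B = np.zeros((p, p))
    for j, pa in enumerate(parents):
        B[pa, j] = post_mean[j]
    return B

# Sequential design loop with a greedy-style acquisition.
mu_star = post_interventional_mean(B_true, rng.normal(size=p))  # target mean
candidates = [rng.normal(size=p) for _ in range(200)]           # intervention set

def acquisition(a):
    """Distance of the predicted post-interventional mean to the target mean."""
    return np.linalg.norm(post_interventional_mean(B_posterior_mean(), a) - mu_star)

for t in range(20):
    a_t = min(candidates, key=acquisition)             # select next intervention
    update_posterior(sample_interventional(a_t), a_t)  # observe samples, update beliefs
    gap = np.linalg.norm(post_interventional_mean(B_true, a_t) - mu_star)
    print(f"step {t:2d}: ||mu(a_t) - mu*|| = {gap:.3f}")
```

The paper's causally informed criteria additionally exploit the posterior over the causal model rather than only its mean, which this greedy surrogate omits.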


Fig. 1: Overview schematic of the active learning framework for optimal intervention design in causal models.
Fig. 2: Example causal model.
Fig. 3: Illustration of the output-weighted non-negative measure ν(a) on the space of all possible interventions \(\mathcal{A}\).
Fig. 4: Convergence of the selected intervention a(t) to the optimal intervention a*.
Fig. 5: Comparison of different acquisition functions (random baseline, greedy baseline, MaxV baseline, CV baseline and our proposed CIV and CIV-OW) in a simulation study.
Fig. 6: Results on perturbational single-cell gene expression dataset.

Data availability

The Perturb–CITE-seq data (ref. 36) can be obtained from https://doi.org/10.1038/s41588-021-00779-1.

Code availability

All code has been deposited at ref. 49.

References

  1. Cherry, A. B. & Daley, G. Q. Reprogramming cellular identity for regenerative medicine. Cell 148, 1110–1122 (2012).
  2. Todorov, E. & Jordan, M. I. Optimal feedback control as a theory of motor coordination. Nat. Neurosci. 5, 1226–1235 (2002).
  3. Blanchard, A. B. et al. Bayesian optimization for active flow control. Acta Mech. Sin. 37, 1786–1798 (2021).
  4. Sunar, N., Birge, J. R. & Vitavasiri, S. Optimal dynamic product development and launch for a network of customers. Oper. Res. 67, 770–790 (2019).
  5. Serrao-Neumann, S., Di Giulio, G. M., Ferreira, L. C. & Choy, D. L. Climate change adaptation: is there a role for intervention research? Futures 53, 86–97 (2013).
  6. Fu, Y., Zhu, X. & Li, B. A survey on instance selection for active learning. Knowl. Inf. Syst. 35, 249–283 (2013).
  7. Jesson, A. et al. Causal-BALD: deep Bayesian active learning of outcomes to infer treatment-effects from observational data. In Adv. Neural Information Processing Systems Vol. 34, 30465–30478 (NeurIPS, 2021).
  8. Cohn, D. A., Ghahramani, Z. & Jordan, M. I. Active learning with statistical models. J. Artif. Intell. Res. 4, 129–145 (1996).
  9. Houlsby, N., Huszár, F., Ghahramani, Z. & Lengyel, M. Bayesian active learning for classification and preference learning. Preprint at arXiv https://doi.org/10.48550/arXiv.1112.5745 (2011).
  10. Lattimore, F., Lattimore, T. & Reid, M. D. Causal bandits: learning good interventions via causal inference. In Adv. Neural Information Processing Systems Vol. 29 (2016).
  11. Lee, S. & Bareinboim, E. Structural causal bandits: where to intervene? In Adv. Neural Information Processing Systems Vol. 31 (2018).
  12. Aglietti, V., Lu, X., Paleyes, A. & González, J. Causal Bayesian optimization. In Int. Conf. Artificial Intelligence and Statistics 3155–3164 (PMLR, 2020).
  13. Alabed, S. & Yoneki, E. BoGraph: structured Bayesian optimization from logs for expensive systems with many parameters. In Proc. 2nd European Workshop on Machine Learning and Systems 45–53 (2022).
  14. Branchini, N., Aglietti, V., Dhir, N. & Damoulas, T. Causal entropy optimization. In Int. Conf. Artificial Intelligence and Statistics 8586–8605 (PMLR, 2023).
  15. Cahan, P. et al. CellNet: network biology applied to stem cell engineering. Cell 158, 903–915 (2014).
  16. Kemmeren, P. et al. Large-scale genetic perturbations reveal regulatory networks and an abundance of gene-specific repressors. Cell 157, 740–752 (2014).
  17. Spirtes, P., Glymour, C. N., Scheines, R. & Heckerman, D. Causation, Prediction, and Search (MIT Press, 2000).
  18. Pearl, J. Causality (Cambridge Univ. Press, 2009).
  19. Rothenhäusler, D., Heinze, C., Peters, J. & Meinshausen, N. Backshift: learning causal cyclic graphs from unknown shift interventions. In Adv. Neural Information Processing Systems Vol. 28 (2015).
  20. Zhang, J., Squires, C. & Uhler, C. Matching a desired causal state via shift interventions. In Adv. Neural Information Processing Systems Vol. 34 (2021).
  21. Eberhardt, F. & Scheines, R. Interventions and causal inference. Philos. Sci. 74, 981–995 (2007).
  22. Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR–Cas9. Nat. Rev. Genet. 16, 299–311 (2015).
  23. Joung, J. et al. A transcription factor atlas of directed differentiation. Cell 186, 209–229 (2023).
  24. Replogle, J. M. et al. Mapping information-rich genotype–phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559–2575 (2022).
  25. Sen, R., Shanmugam, K., Dimakis, A. G. & Shakkottai, S. Identifying best interventions through online importance sampling. In Int. Conf. Machine Learning 3057–3066 (PMLR, 2017).
  26. Koumoutsakos, P. & Leonard, A. High-resolution simulations of the flow around an impulsively started cylinder using vortex methods. J. Fluid Mech. 296, 1–38 (1995).
  27. Rackham, O. J. et al. A predictive computational framework for direct reprogramming between human cell types. Nat. Genet. 48, 331–335 (2016).
  28. Geiger, D. & Heckerman, D. Parameter priors for directed acyclic graphical models and the characterization of several probability distributions. Ann. Stat. 30, 1412–1440 (2002).
  29. Kuipers, J. & Moffa, G. The interventional Bayesian Gaussian equivalent score for Bayesian causal inference with unknown soft interventions. Preprint at arXiv https://doi.org/10.48550/arXiv.2205.02602 (2022).
  30. Kuipers, J., Moffa, G. & Heckerman, D. Addendum on the scoring of Gaussian directed acyclic graphical models. Ann. Stat. 42, 1689–1691 (2014).
  31. Kleijn, B. J. & van der Vaart, A. W. The Bernstein–von-Mises theorem under misspecification. Electron. J. Stat. 6, 354–381 (2012).
  32. Sapsis, T. P. Output-weighted optimal sampling for Bayesian regression and rare event statistics using few samples. Proc. R. Soc. A 476, 20190834 (2020).
  33. Mohamad, M. A. & Sapsis, T. P. Sequential sampling strategy for extreme event statistics in nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 115, 11138–11143 (2018).
  34. Astudillo, R. & Frazier, P. Bayesian optimization of function networks. In Adv. Neural Information Processing Systems Vol. 34, 14463–14475 (NeurIPS, 2021).
  35. Bubeck, S. et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5, 1–122 (2012).
  36. Frangieh, C. J. et al. Multimodal pooled Perturb–CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet. 53, 332–341 (2021).
  37. Carretero, R. et al. Analysis of HLA class I expression in progressing and regressing metastatic melanoma lesions after immunotherapy. Immunogenetics 60, 439–447 (2008).
  38. Jaeger, J. et al. Gene expression signatures for tumor progression, tumor subtype, and tumor thickness in laser-microdissected melanoma tissues. Clin. Cancer Res. 13, 806–815 (2007).
  39. Cheng, Q. et al. SOX4 promotes melanoma cell migration and invasion though the activation of the NF-κB signaling pathway. Int. J. Mol. Med. 40, 447–453 (2017).
  40. Cao, X., Khare, K. & Ghosh, M. Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models. Ann. Stat. 47, 319–348 (2019).
  41. Kirsch, A., Van Amersfoort, J. & Gal, Y. BatchBALD: efficient and diverse batch acquisition for deep Bayesian active learning. In Adv. Neural Information Processing Systems Vol. 32 (2019).
  42. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
  43. Hagberg, A., Swart, P. & Schult, D. A. Exploring Network Structure, Dynamics, and Function Using NetworkX (Los Alamos National Lab, 2008).
  44. Squires, C. CausalDAG: creation, manipulation, and learning of causal models. GitHub https://github.com/uhlerlab/causaldag (2018).
  45. Reisach, A., Seiler, C. & Weichwald, S. Beware of the simulated DAG! Causal discovery benchmarks may be easy to game. In Adv. Neural Information Processing Systems Vol. 34, 27772–27784 (NeurIPS, 2021).
  46. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 1–5 (2018).
  47. Abid, A., Zhang, M. J., Bagaria, V. K. & Zou, J. Exploring patterns enriched in a dataset with contrastive principal component analysis. Nat. Commun. 9, 1–7 (2018).
  48. Solus, L., Wang, Y. & Uhler, C. Consistency guarantees for greedy permutation-based causal inference algorithms. Biometrika 108, 795–814 (2021).
  49. Zhang, J. uhlerlab/actlearn_optint: v1. Zenodo https://doi.org/10.5281/zenodo.8170179 (2023).

Download references

Acknowledgements

J.Z., C.S. and C.U. were partially supported by the National Center for Complementary and Integrative Health at the National Institutes of Health (NCCIH/NIH), the Office of Naval Research (N00014-22-1-2116), the National Science Foundation (DMS-1651995), the MIT-IBM Watson AI Lab, MIT J-Clinic for Machine Learning and Health, the Eric and Wendy Schmidt Center at the Broad Institute and a Simons Investigator Award to C.U. T.P.S. acknowledges support by the Office of Naval Research (N00014-21-1-2357) and the Air Force Office of Scientific Research (MURI FA9550-21-1-0058). C.S. was partially supported by an NSF Graduate Fellowship.

Author information

Authors and Affiliations

Authors

Contributions

J.Z., L.C., T.P.S. and C.U. conceived the research and designed the method. J.Z. derived the theoretical results and performed the numerical experiments. L.C. and J.Z. processed the biological data. C.S. and J.Z. derived the extension of the DAG–Wishart distribution. All authors interpreted the results and wrote the paper.

Corresponding authors

Correspondence to Themistoklis P. Sapsis or Caroline Uhler.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Virginia Aglietti and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Mirko Pieropan, in collaboration with the Nature Machine Intelligence team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparison of different acquisition functions in a simulation study where the underlying causal graph is the complete graph, half of the nodes are selected at random as intervention targets, and we vary the number of nodes p.

Each plot corresponds to an average over 10 instances, and each method is run 20 times and averaged. (A)-(C) Relative distance between the target mean μ* and the best approximation \(\boldsymbol{\mu}_t^*\) (defined in Fig. 5A of the main text) up to time step t. Lines denote the mean over 10 instances; the shading corresponds to one standard deviation. (D)-(F) Relative distance statistics of each method averaged over 10 instances at the last time step (t = 50). (G)-(I) Squared distance (mean ± s.e.m.) between the optimal intervention a* and the best approximation \(\mathbf{a}_t^*\) used to obtain \(\boldsymbol{\mu}_t^*\), up to time step t.
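For reference, a minimal sketch of how the two reported metrics could be computed, assuming the relative distance is the Euclidean distance normalized by the norm of the target mean and the error bars are standard errors of the mean across repeated runs (both are assumptions; the exact definitions are given in Fig. 5A of the main text):

```python
import numpy as np

def relative_distance(mu_t_best, mu_star):
    """Relative distance between the best achieved post-interventional mean and
    the target mean (Euclidean norm, normalized by ||mu*||; normalization assumed)."""
    return np.linalg.norm(mu_t_best - mu_star) / np.linalg.norm(mu_star)

def mean_and_sem(values):
    """Mean and standard error of the mean across repeated runs,
    as used for the 'mean ± s.e.m.' error bars."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(len(v))
```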

Extended Data Fig. 2 Comparison of different acquisition functions in a simulation study where the underlying causal graph is the complete graph, the most downstream half of the nodes are fixed as intervention targets, and we vary the number of nodes p.

Each plot corresponds to an average over 10 instances, and each method is run 20 times and averaged. (A)-(C) Relative distance between the target mean μ* and the best approximation \(\boldsymbol{\mu}_t^*\) (defined in Fig. 5A of the main text) up to time step t. Lines denote the mean over 10 instances; the shading corresponds to one standard deviation. (D)-(F) Relative distance statistics of each method averaged over 10 instances at the last time step (t = 50). (G)-(I) Squared distance (mean ± s.e.m.) between the optimal intervention a* and the best approximation \(\mathbf{a}_t^*\) used to obtain \(\boldsymbol{\mu}_t^*\), up to time step t.

Extended Data Fig. 3 Comparison of different acquisition functions in a simulation study where the underlying causal graph is the complete graph on 30 nodes, the most downstream nodes are fixed as intervention targets, and we vary the number of intervention targets.

Each plot corresponds to an average over 10 instances, and each method is run 20 times and averaged. (A)-(D) Relative distance between the target mean μ* and the best approximation \(\boldsymbol{\mu}_t^*\) (defined in Fig. 5A of the main text) up to time step t. Lines denote the mean over 10 instances; the shading corresponds to one standard deviation. (E)-(H) Relative distance statistics of each method averaged over 10 instances at the last time step (t = 50). (I)-(L) Squared distance (mean ± s.e.m.) between the optimal intervention a* and the best approximation \(\mathbf{a}_t^*\) used to obtain \(\boldsymbol{\mu}_t^*\), up to time step t.

Extended Data Fig. 4 Comparison of different acquisition functions in a simulation study where we vary the underlying causal graph (complete graph, Erdős–Rényi graph with edge probability 0.8, a sparser Erdős–Rényi graph and path graph) and the most downstream half of the nodes are fixed as intervention targets.

Each plot corresponds to an average over 10 instances on a 30-node DAG with 15 perturbation targets. Each method is run 20 times and averaged. (A)-(D) Relative distance between the target mean μ* and the best approximation \(\boldsymbol{\mu}_t^*\) (defined in Fig. 5A of the main text) up to time step t. Lines denote the mean over 10 instances; the shading corresponds to one standard deviation. (E)-(H) Relative distance statistics of each method averaged over 10 instances at the last time step (t = 50). Note that the DAGs become sparser from left to right. (I)-(L) Squared distance (mean ± s.e.m.) between the optimal intervention a* and the best approximation \(\mathbf{a}_t^*\) used to obtain \(\boldsymbol{\mu}_t^*\), up to time step t.

Extended Data Fig. 5 Comparison of our acquisition functions to baseline acquisition functions adapted from prior works (EI-Int: based on Expected Improvement, MI-Int: based on Mutual Information, and UCB-Int: based on Upper Confidence Bound) in a simulation study where the underlying causal graph is the complete graph on 10 nodes and the most downstream 5 nodes are fixed as intervention targets.

Each plot corresponds to an average over 10 instances, and each method is run 20 times and averaged. (A) Relative distance between the target mean μ* and the best approximation \(\boldsymbol{\mu}_t^*\) (defined in Fig. 5A of the main text) up to time step t. Lines denote the mean over 10 instances; the shading corresponds to one standard deviation. (B) Relative distance statistics of each method averaged over 10 instances at the last time step (t = 50). (C) Runtime per iteration of each method in seconds.

Extended Data Fig. 6 Performance of the different acquisition functions under three types of DAG misspecification, where the underlying causal graph is a 5-node random Erdős–Rényi DAG with edge density 0.5 and 3 intervention targets.

Each plot corresponds to an average of the relative distance at time step 10 across 10 instances. Each method is run 10 times and averaged. SHD denotes the number of misspecified edges. (A)-(C) Three types of DAG misspecifications.

Extended Data Fig. 7 Learned linear Gaussian SCM on the 36 genes of interest based on the control cells.

(A) Learned DAG on the 36 considered genes. Nodes are arranged top to bottom by topological order and colored by the module/program they belong to in Supplementary Fig. 7. (B) Parameters used in GSP (ref. 48) to learn the above DAG. (C) Pearson r scores obtained by regressing each non-source gene against its parents. Blue: average scores in the learned DAG; grey (with error bars): average scores in a random graph (100 samples).
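A rough sketch of how the per-gene scores in panel (C) could be computed: each non-source gene is regressed on its parents in the learned DAG by ordinary least squares on the control-cell expression matrix, and the Pearson r between the fitted and observed values is recorded. The function name, the OLS fit and the intercept term are assumptions made for illustration, not the authors' code.

```python
import numpy as np
from scipy.stats import pearsonr

def parent_regression_scores(X, parents):
    """For each non-source gene j, regress its expression on that of its parents
    (ordinary least squares with an intercept) and record the Pearson r between
    the fitted and observed values. X is a cells x genes matrix; `parents` maps
    gene index -> list of parent indices in the learned DAG."""
    scores = {}
    for j, pa in parents.items():
        if len(pa) == 0:          # source genes have no parents and are skipped
            continue
        Phi = np.column_stack([X[:, pa], np.ones(X.shape[0])])
        coef, *_ = np.linalg.lstsq(Phi, X[:, j], rcond=None)
        r, _ = pearsonr(Phi @ coef, X[:, j])
        scores[j] = r
    return scores
```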

Extended Data Fig. 8 Comparison of the different acquisition functions for identifying the intervention that matches the target mean for 5 different ground-truth target genes.

Squared distance (mean ± s.e.m.) between the target mean μ* and the best approximation \(\boldsymbol{\mu}_t^*\) is reported across time steps t. (A)-(E) Comparison of 6 acquisition functions. (F)-(J) Same as the top row, showing only 3 methods to de-clutter the plots and comparing our CIV acquisition function against the random and greedy baselines. Each plot is captioned with its ground-truth target gene.

Extended Data Fig. 9 Gene expression changes for three examples of different knock-out perturbations.

Comparison of target-gene expression in the control cell population and in the perturbed cell population of the corresponding knock-out experiment. (A)-(C) Included target genes: MYC, EIF3K and HLA-C. The mean expression of the target gene is given in each subcaption.

Extended Data Fig. 10 Comparison of acquisition functions for identifying interventions that match the target mean of perturbing MYC.

The reported metric is the squared distance (mean ± s.e.m.) between the target mean μ* and the best approximation \(\boldsymbol{\mu}_t^*\) across all time steps t. (A) All methods. (B) De-cluttered subset of methods.

Supplementary information

Supplementary Information

Supplementary Figs. 1–18 and discussion.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Zhang, J., Cammarata, L., Squires, C. et al. Active learning for optimal intervention design in causal models. Nat Mach Intell 5, 1066–1075 (2023). https://doi.org/10.1038/s42256-023-00719-0
