Abstract
Counterfactual inference is a powerful tool, capable of solving challenging problems in high-profile sectors. To perform counterfactual inference, we require knowledge of the underlying causal mechanisms. However, causal mechanisms cannot be uniquely determined from observations and interventions alone. This raises the question of how to choose the causal mechanisms so that the resulting counterfactual inference is trustworthy in a given domain. This question has been addressed for causal models with binary variables, but it remains unanswered for categorical variables. We address this challenge by introducing, for causal models with categorical variables, the notion of counterfactual ordering, a principle positing desirable properties that causal mechanisms should possess, and prove that it is equivalent to specific functional constraints on the causal mechanisms. To learn causal mechanisms satisfying these constraints, and to perform counterfactual inference with them, we introduce deep twin networks. These are deep neural networks that, when trained, are capable of twin network counterfactual inference, an alternative to the abduction–action–prediction method. We empirically test our approach on diverse real-world and semisynthetic data from medicine, epidemiology and finance, reporting accurate estimation of counterfactual probabilities while demonstrating the issues that arise with counterfactual reasoning when counterfactual ordering is not enforced.
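Twin network counterfactual inference duplicates the causal model into a factual branch and a counterfactual branch that share the same exogenous noise, so a counterfactual query becomes a single forward evaluation rather than the three-step abduction–action–prediction procedure. The following is a minimal sketch under stated assumptions: the categorical mechanism `f` and the query are illustrative toy choices, not the learned deep twin networks of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy structural causal model X -> Y with exogenous noise U (3 categories).
# This hand-written mechanism stands in for a trained causal mechanism.
def f(x, u):
    return (x + u) % 3

n = 100_000
u = rng.integers(0, 3, size=n)       # exogenous noise, shared by both branches
x_factual = np.ones(n, dtype=int)    # observed treatment X = 1
x_cf = np.zeros(n, dtype=int)        # counterfactual treatment X = 0

# Twin-network evaluation: the same mechanism is applied in both branches
# with the SAME noise samples, coupling the factual and counterfactual worlds.
y_factual = f(x_factual, u)
y_cf = f(x_cf, u)

# Example counterfactual query: P(Y_{X=0} = 0 | X = 1, Y = 1),
# estimated by conditioning the shared-noise samples on the factual evidence.
mask = y_factual == 1
p = float(np.mean(y_cf[mask] == 0))
print(p)  # → 1.0
```

Because this toy mechanism is deterministic given the shared noise, the query resolves to 1.0; the paper's contribution is learning such mechanisms with neural networks while enforcing counterfactual ordering, so that queries of this form remain trustworthy.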
Data availability
All our datasets are publicly available and free to use for research purposes. The Kenyan water dataset originates from ref. 38, licensed under a non-commercial use clause and with a requirement for secure storage; both conditions have been fulfilled by the authors. The twin mortality dataset, on the other hand, was used as supplied by ref. 20. Finally, the semisynthetic and synthetic datasets can be replicated with the code provided.
Code availability
Our codebase is available in ref. 39 for public use under an MIT licence.
References
Schwab, P., Linhardt, L. & Karlen, W. Perfect match: a simple method for learning representations for counterfactual inference with neural networks. Preprint at https://arxiv.org/abs/1810.00656 (2018).
Alaa, A. M., Weisz, M. & Van Der Schaar, M. Deep counterfactual networks with propensity-dropout. In ICML 2017 Workshop on Principled Approaches to Deep Learning. Preprint at https://arxiv.org/abs/1706.05966 (2017).
Shi, C., Blei, D. M. & Veitch, V. Adapting neural networks for the estimation of treatment effects. In Advances in Neural Information Processing Systems (NeurIPS). Preprint at https://arxiv.org/abs/1906.02120 (NeurIPS, 2019).
Pearl, J. Causality 2nd edn (Cambridge University Press, 2009).
Bareinboim, E., Correa, J. D., Ibeling, D. & Icard, T. On Pearl’s Hierarchy and the Foundations of Causal Inference (Columbia University–Stanford University, 2020).
Richens, J. G., Lee, C. M. & Johri, S. Improving the accuracy of medical diagnosis with causal machine learning. Nat. Commun. 11, 3923 (2020).
Oberst, M. & Sontag, D. Counterfactual off-policy evaluation with Gumbel–Max structural causal models. Proc. Mach. Learning Res. 97, 4881–4890 (2019).
Lagnado, D. A., Gerstenberg, T. & Zultan, R. Causal responsibility and counterfactuals. Cogn. Sci. 37, 1036–1073 (2013).
Kusner, M., Loftus, J., Russell, C. & Silva, R. Counterfactual fairness. Adv. Neural Inf. Process. Syst. 30 (2017).
Galhotra, S., Pradhan, R. & Salimi, B. Explaining black-box algorithms using probabilistic contrastive counterfactuals. In Proc. of the 2021 International Conference on Management of Data. Preprint at https://arxiv.org/abs/2103.11972 (2021).
Li, A. & Pearl, J. Unit selection based on counterfactual logic. In Proc. of the 28th International Joint Conference on Artificial Intelligence (2019).
Tian, J. & Pearl, J. Probabilities of causation: bounds and identification. Ann. Math. Artif. Intell. 28, 287–313 (2000).
Zhang, J. & Bareinboim, E. Bounding causal effects on continuous outcomes. In Proc. of the AAAI Conference on Artificial Intelligence, Vol. 35, no. 13, 12207–12215 (2021).
Balke, A. & Pearl, J. Probabilistic evaluation of counterfactual queries. In Probabilistic and Causal Inference: The Works of Judea Pearl, 237–254 (2022).
Pearl, J. Probabilities of causation: three counterfactual interpretations and their identification. Synthese 121, 93–149 (1999).
Dua, D. & Graff, C. UCI Machine Learning Repository http://archive.ics.uci.edu/ml (2017).
Sandercock, P., Niewada, M. & Czlonkowska, A. International Stroke Trial Database version 2 https://doi.org/10.7488/DS/104 (2011).
Cuellar, M. & Kennedy, E. H. A non-parametric projection-based estimator for the probability of causation, with application to water sanitation in Kenya. J. R. Stat. Soc. A 183, 1793–1818 (2020).
Louizos, C. et al. Causal effect inference with deep latent-variable models. Adv. Neural Inf. Process. Syst. 30, 6449–6459 (2017).
Yoon, J., Jordon, J. & Van Der Schaar, M. GANITE: estimation of individualized treatment effects using generative adversarial nets. In International Conference on Learning Representations (2018).
Pawlowski, N., Castro, D. C. & Glocker, B. Deep structural causal models for tractable counterfactual inference. In Advances in Neural Information Processing Systems (NeurIPS). Preprint at https://arxiv.org/abs/2006.06485 (NeurIPS, 2020).
Lorberbom, G., Johnson, D. D., Maddison, C. J., Tarlow, D. & Hazan, T. Learning generalized Gumbel–Max causal mechanisms. Adv. Neural Inf. Process. Syst. 34, 26792–26803 (2021).
Joshi, S., Koyejo, O., Vijitbenjaronk, W., Kim, B. & Ghosh, J. Towards realistic individual recourse and actionable explanations in black-box decision making systems. Preprint at https://arxiv.org/abs/1907.09615 (2019).
Pawelczyk, M., Agarwal, C., Joshi, S., Upadhyay, S. & Lakkaraju, H. Exploring counterfactual explanations through the lens of adversarial examples: a theoretical and empirical analysis. Proc. Mach. Learning Res. 151, 4574–4594 (2022).
Balke, A. & Pearl, J. Bounds on treatment effects from studies with imperfect compliance. J. Am. Stat. Assoc. 92, 1171–1176 (1997).
Zhang, J., Tian, J. & Bareinboim, E. Partial counterfactual identification from observational and experimental data. In International Conference on Machine Learning, 26548–26558. PMLR (2021).
Imbens, G. W. & Angrist, J. D. Identification and estimation of local average treatment effects. Econometrica 62, 467–475 (1994).
Parbhoo, S., Bauer, S. & Schwab, P. NCoRE: neural counterfactual representation learning for combinations of treatments. Preprint at https://arxiv.org/abs/2103.11175 (2021).
Shalit, U., Johansson, F. D. & Sontag, D. Estimating individual treatment effect: generalization bounds and algorithms. In the International Conference on Machine Learning (ICML). Preprint at https://arxiv.org/abs/1606.03976 (ICML, 2017).
Johansson, F., Shalit, U. & Sontag, D. Learning representations for counterfactual inference. Proc. Mach. Learning Res. 48, 3020–3029 (2016).
Goudet, O. et al. in Explainable and Interpretable Models in Computer Vision and Machine Learning 39–80 (Springer, 2018).
Sill, J. & Abu-Mostafa, Y. S. Monotonicity hints for credit screening. In Progress in Neural Information Processing: Proc. of the 1996 International Conference on Neural Information Processing (ICONIP) Vol. 96, 123–127 (1996).
Sivaraman, A., Farnadi, G., Millstein, T. & Van den Broeck, G. Counterexample-guided learning of monotonic neural networks. In Advances in Neural Information Processing Systems (NeurIPS). Preprint at https://arxiv.org/abs/2006.08852 (NeurIPS, 2020).
Gupta, M. et al. Monotonic calibrated interpolated look-up tables. J. Mach. Learning Res. 17, 3790–3836 (2016).
Graham, L., Lee, C. M. & Perov, Y. Copy, paste, infer: a robust analysis of twin networks for counterfactual inference. In NeurIPS Causal ML Workshop 2019 (2019).
Reynaud, H. et al. D’ARTAGNAN: counterfactual video generation. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2022 (eds Wang, L. et al.) (Lecture Notes in Computer Science Vol. 13438, Springer, 2022).
Ye, X., Leake, D., Huibregtse, W. & Dalkilic, M. Applying class-to-class Siamese networks to explain classifications with supportive and contrastive cases. In Case-Based Reasoning Research and Development (eds Watson, I. & Weber, R.) 245–260 (Springer, 2020).
Kremer, M., Leino, J., Miguel, E. & Peterson, A. Replication data for: Spring cleaning: rural water impacts, valuation, and property rights institutions. Harvard Dataverse https://doi.org/10.7910/DVN/28063 (2015).
Vlontzos, A. thanosvlo/Twin_Causal_Nets: citable release. Zenodo https://zenodo.org/record/7118761 (2022).
Acknowledgements
We acknowledge and thank our sources of funding and support for this paper. Funding for this work was provided by Imperial College London, the MAVEHA (EP/S013687/1) project and the UKRI London Medical Imaging & Artificial Intelligence Centre for Value-Based Healthcare (A.V., B.K.). We also received graphics processing unit (GPU) donations from Nvidia.
Author information
Authors and Affiliations
Contributions
A.V. and C.M.G.-L. contributed to the theoretical formulations; A.V. developed the codebase and ran the experiments; A.V., B.K. and C.M.G.-L. contributed to the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Mark Keane and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Results and Figures.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Vlontzos, A., Kainz, B. & Gilligan-Lee, C.M. Estimating categorical counterfactuals via deep twin networks. Nat Mach Intell 5, 159–168 (2023). https://doi.org/10.1038/s42256-023-00611-x