Exploring the cloud of variable importance for the set of all good models

Dong, Jiayun; Rudin, Cynthia

doi:10.1038/s42256-020-00264-0

Article
Published: 10 December 2020


Nature Machine Intelligence volume 2, pages 810–824 (2020)

A preprint version of the article is available at arXiv.

Abstract

Variable importance is central to scientific studies, including the social sciences and causal inference, healthcare and other domains. However, current notions of variable importance are often tied to a specific predictive model. This is problematic: what if there were multiple well-performing predictive models, and a specific variable is important to some of them but not to others? In that case, we cannot tell from a single well-performing model if a variable is always important, sometimes important, never important or perhaps only important when another variable is not important. Ideally, we would like to explore variable importance for all approximately equally accurate predictive models within the same model class. In this way, we can understand the importance of a variable in the context of other variables, and for many good models. This work introduces the concept of a variable importance cloud, which maps every variable to its importance for every good predictive model. We show properties of the variable importance cloud and draw connections to other areas of statistics. We introduce variable importance diagrams as a projection of the variable importance cloud into two dimensions for visualization purposes. Experiments with criminal justice, marketing data and image classification tasks illustrate how variables can change dramatically in importance for approximately equally accurate predictive models.
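The abstract describes the VIC only in words. The sketch below illustrates the idea under several assumptions that are ours, not the paper's: the model class is linear regression without an intercept, the loss is mean squared error, the Rashomon set keeps every model whose loss is at most (1 + ε) times the best loss and is approximated by randomly perturbing the least-squares solution, and a variable's importance is measured as permutation model reliance in the spirit of Fisher, Rudin and Dominici (ref. 3), that is, the loss after permuting that variable's column divided by the original loss. The function names `variable_importance_cloud` and `permutation_model_reliance`, the tuning values and the synthetic data are hypothetical.

```python
import numpy as np


def permutation_model_reliance(beta, X, y, j, rng):
    """Loss after permuting column j, divided by the original loss (squared error)."""
    base = np.mean((y - X @ beta) ** 2)
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between x_j and y
    switched = np.mean((y - X_perm @ beta) ** 2)
    return switched / base


def variable_importance_cloud(X, y, epsilon=0.05, n_candidates=5000, scale=0.3, seed=0):
    """Approximate the Rashomon set of linear models; return one importance vector per model."""
    rng = np.random.default_rng(seed)
    # Best model in the (assumed) class: least squares without an intercept.
    beta_star = np.linalg.lstsq(X, y, rcond=None)[0]
    loss_star = np.mean((y - X @ beta_star) ** 2)
    # Candidate models: the optimum plus random perturbations of it.
    noise = scale * rng.standard_normal((n_candidates, X.shape[1]))
    candidates = np.vstack([beta_star, beta_star + noise])
    losses = np.mean((y[:, None] - X @ candidates.T) ** 2, axis=0)
    # Rashomon set: candidates whose loss is within a factor (1 + epsilon) of the best.
    rashomon = candidates[losses <= (1 + epsilon) * loss_star]
    # VIC: each good model maps to its reliance on every variable.
    vic = np.array([
        [permutation_model_reliance(b, X, y, j, rng) for j in range(X.shape[1])]
        for b in rashomon
    ])
    return rashomon, vic


# Usage on synthetic data: x1 matters, x2 matters less, x3 is irrelevant.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((500, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + data_rng.standard_normal(500)
models, vic = variable_importance_cloud(X, y)
print(len(models), "good models")
print("reliance range per variable:", vic.min(axis=0).round(2), vic.max(axis=0).round(2))
```

Each row of `vic` is one good model's importance vector, so the spread of a column across rows indicates whether a variable is always, sometimes or never important among these approximately equally accurate models.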

**Fig. 1: The VIC for uncorrelated variables: $\rho_{12} = 0$, $\rho_{1Y} = 0.4$ and $\rho_{2Y} = 0.5$.**

**Fig. 2: The VIC for correlated variables.**

**Fig. 3: Tuning the parameters (r, M).**

**Fig. 4: VID for recidivism using logistic regression.**

**Fig. 5: The VID for image classification.**

**Fig. 6: Visualizing representative models.**
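As a worked example of the setting in the Fig. 1 caption ($\rho_{12} = 0$, $\rho_{1Y} = 0.4$, $\rho_{2Y} = 0.5$), assume that $X_1$, $X_2$ and $Y$ are standardized, that the model class is linear regression $f(x) = \beta_1 x_1 + \beta_2 x_2$, and that the loss is population squared error, so $L(\beta) = 1 - 2\beta^\top \rho_Y + \beta^\top \Sigma \beta$. Replacing $X_j$ with an independent copy raises the loss by $2\beta_j\rho_{jY} - 2\beta_j\beta_k\rho_{jk}$, and model reliance is again taken as the ratio of the two losses. Because $\Sigma$ is the identity here, $L(\beta) = L^* + \lVert \beta - \beta^* \rVert^2$, so the Rashomon set is a disk around $\beta^* = (0.4, 0.5)$ with $L^* = 0.59$. The tolerance $\varepsilon = 0.1$, the grid and the function names are illustrative choices, not values or code from the paper.

```python
import numpy as np

# Correlations from the Fig. 1 caption; X1, X2 and Y are assumed standardized.
RHO_12, RHO_1Y, RHO_2Y = 0.0, 0.4, 0.5
SIGMA = np.array([[1.0, RHO_12], [RHO_12, 1.0]])  # covariance matrix of (X1, X2)
RHO_Y = np.array([RHO_1Y, RHO_2Y])                # covariances of X1 and X2 with Y


def loss(beta):
    """Population squared loss E[(Y - beta1*X1 - beta2*X2)^2]."""
    return 1.0 - 2.0 * beta @ RHO_Y + beta @ SIGMA @ beta


def model_reliance(beta, j):
    """Loss with X_j replaced by an independent copy, divided by the original loss."""
    k = 1 - j  # index of the other variable
    switched = loss(beta) + 2.0 * beta[j] * RHO_Y[j] - 2.0 * beta[j] * beta[k] * SIGMA[j, k]
    return switched / loss(beta)


beta_star = np.linalg.solve(SIGMA, RHO_Y)  # optimal coefficients, here (0.4, 0.5)
loss_star = loss(beta_star)                # 1 - 0.4**2 - 0.5**2 = 0.59
epsilon = 0.1                              # illustrative Rashomon tolerance

# Scan a grid of coefficient pairs and keep those inside the Rashomon set.
grid = np.linspace(-1.0, 1.5, 251)
vic = np.array([
    (model_reliance(np.array([b1, b2]), 0), model_reliance(np.array([b1, b2]), 1))
    for b1 in grid
    for b2 in grid
    if loss(np.array([b1, b2])) <= (1 + epsilon) * loss_star
])

print(round(model_reliance(beta_star, 0), 4), round(model_reliance(beta_star, 1), 4))  # 1.5424 1.8475
print(len(vic), "good models; MR1 range:", vic[:, 0].min().round(3), "to", vic[:, 0].max().round(3))
```

At the optimum the reliances are 0.91/0.59 ≈ 1.54 for $X_1$ and 1.09/0.59 ≈ 1.85 for $X_2$; other models in the disk give different pairs of values, and the set of all such pairs is the VIC for this example.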

Data availability

The datasets analysed in this paper are publicly available. The COMPAS dataset is available in ProPublica’s repository⁷. The image dataset of dogs and cats is available at ImageNet⁸. The image we use in Fig. 6 is provided in the Supplementary Information. Our experiment on the in-vehicle coupon recommendation dataset in the Supplementary Information uses data from ref. 21. Source data are provided with this paper.

Code availability

The code used in this paper can be downloaded from ref. 22.

References

1. Breiman, L. et al. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat. Sci. 16, 199–231 (2001).
2. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
3. Fisher, A., Rudin, C. & Dominici, F. All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res. 20, 1–81 (2019).
4. Semenova, L., Rudin, C. & Parr, R. A study in Rashomon curves and volumes: a new perspective on generalization and model simplicity in machine learning. Preprint at https://arxiv.org/abs/1908.01755 (2020).
5. Lin, J., Zhong, C., Hu, D., Rudin, C. & Seltzer, M. Generalized and scalable optimal sparse decision trees. Preprint at https://arxiv.org/abs/2006.08690 (2020).
6. Flores, A. W., Bechtel, K. & Lowenkamp, C. T. False positives, false negatives, and false analyses: a rejoinder to machine bias: there’s software used across the country to predict future criminals. And it’s biased against blacks. Fed. Prob. 80, 38–46 (2016).
7. Larson, J., Mattu, S., Kirchner, L. & Angwin, J. How we analyzed the COMPAS recidivism algorithm. GitHub https://github.com/propublica/compas-analysis (2017).
8. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).
9. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at https://arxiv.org/abs/1409.1556 (2014).
10. Brown, G., Pocock, A., Zhao, M.-J. & Luján, M. Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J. Mach. Learn. Res. 13, 27–66 (2012).
11. Vinh, N. X., Zhou, S., Chan, J. & Bailey, J. Can high-order dependencies improve mutual information based feature selection? Pattern Recognit. 53, 46–58 (2016).
12. Choi, Y., Darwiche, A. & Van den Broeck, G. Optimal feature selection for decision robustness in Bayesian networks. In Proc. 26th International Joint Conference on Artificial Intelligence (IJCAI) 1554–1560 (AAAI, 2017).
13. Van Haaren, J. & Davis, J. Markov network structure learning: a randomized feature generation approach. In Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI, 2012).
14. Coker, B., Rudin, C. & King, G. A theory of statistical inference for ensuring the robustness of scientific results. Preprint at https://arxiv.org/abs/1804.08646 (2018).
15. Friedman, J. H. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
16. Velleman, P. F. & Welsch, R. E. Efficient computing of regression diagnostics. Am. Stat. 35, 234–242 (1981).
17. Casalicchio, G., Molnar, C. & Bischl, B. Visualizing the feature importance for black box models. Preprint at https://arxiv.org/abs/1804.06620 (2018).
18. Gevrey, M., Dimopoulos, I. & Lek, S. Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol. Model. 160, 249–264 (2003).
19. Harel, J., Koch, C. & Perona, P. Graph-based visual saliency. In Advances in Neural Information Processing Systems 545–552 (NeurIPS, 2007).
20. Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T. & Zeileis, A. Conditional variable importance for random forests. BMC Bioinformatics 9, 307 (2008).
21. Wang, T. et al. A Bayesian framework for learning rule sets for interpretable classification. J. Mach. Learn. Res. 18, 2357–2393 (2017).
22. Dong, J. & Rudin, C. Jiayun-Dong/vic v1.0.0. Zenodo https://doi.org/10.5281/zenodo.4065582 (2020).
23. Hayashi, F. Econometrics (Princeton Univ. Press, 2000).

Author information

Authors and Affiliations

Department of Economics, Duke University, Durham, NC, USA
Jiayun Dong
Departments of Computer Science, Electrical and Computer Engineering, and Statistical Science, Duke University, Durham, NC, USA
Cynthia Rudin

Contributions

Both authors contributed to the conception, analysis and writing of the study. J.D. conducted the experiments and designed the code.

Corresponding author

Correspondence to Jiayun Dong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Machine Intelligence thanks Professor Kristian Kersting and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Logistic loss function.

Logistic loss.
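For reference, the standard logistic loss for a real-valued score $f(x)$ and a label $y \in \{-1, +1\}$ is given below, together with the empirical loss over $n$ training points to which a Rashomon threshold such as $L(f) \le (1 + \varepsilon) L(f^*)$ could be applied. The label encoding, the natural logarithm and the form of the threshold are assumptions; the exact parameterization plotted in Extended Data Fig. 1 is not reproduced here.

```latex
\ell\bigl(f(x), y\bigr) = \log\bigl(1 + e^{-y f(x)}\bigr),
\qquad
L(f) = \frac{1}{n} \sum_{i=1}^{n} \log\bigl(1 + e^{-y_i f(x_i)}\bigr)
```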

Extended Data Fig. 2 VID of decision tree models for the recidivism experiment.

VID for recidivism: decision trees. This is the projection of the VIC onto the space spanned by the four variables of interest: age, race, prior criminal history and gender. Unlike in Fig. 4, the VIC is generated by a Rashomon set that consists of all the good decision trees instead of logistic regression models. The diagrams should nonetheless be interpreted in the same way as in Fig. 4.
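The caption describes a VID as a projection of the VIC onto the variables of interest. A minimal sketch of such a projection, assuming the VIC is stored as an array with one row per good model and one column per variable, takes every pair of the variables of interest and keeps the two corresponding columns, giving the point set that one panel of a diagram would scatter-plot. The function name `vid_projections`, the two extra variable names and the random VIC values are hypothetical placeholders, not results from the paper.

```python
import itertools

import numpy as np


def vid_projections(vic, names, of_interest):
    """Return, for every pair of variables of interest, the VIC projected onto that pair.

    vic: array of shape (n_good_models, n_variables), one importance vector per good model.
    """
    column = {name: i for i, name in enumerate(names)}
    return {
        (a, b): vic[:, [column[a], column[b]]]
        for a, b in itertools.combinations(of_interest, 2)
    }


# Hypothetical VIC: 200 good models, random importances for illustration only.
names = ["age", "race", "prior_criminal_history", "gender", "extra_1", "extra_2"]
rng = np.random.default_rng(0)
vic = 1.0 + rng.gamma(2.0, 0.1, size=(200, len(names)))

panels = vid_projections(vic, names, ["age", "race", "prior_criminal_history", "gender"])
for (a, b), points in panels.items():
    print(f"{a} vs {b}: {points.shape[0]} points")
```

Each panel could then be drawn with, for example, matplotlib's `Axes.scatter`; four variables of interest give six pairwise panels.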

Supplementary information

Supplementary Information

Supplementary experiment.

Source data

Source Data Fig. 1

Source data for our experiment.

About this article

Cite this article

Dong, J., Rudin, C. Exploring the cloud of variable importance for the set of all good models. Nat Mach Intell 2, 810–824 (2020). https://doi.org/10.1038/s42256-020-00264-0

Received: 06 February 2020
Accepted: 25 October 2020
Published: 10 December 2020
Issue Date: December 2020
DOI: https://doi.org/10.1038/s42256-020-00264-0
