# Quantifying the dynamics of failure across science, startups and security

## Abstract

Human achievements are often preceded by repeated attempts that fail, but little is known about the mechanisms that govern the dynamics of failure. Here, building on previous research relating to innovation<sup>1–7</sup>, human dynamics<sup>8–11</sup> and learning<sup>12–17</sup>, we develop a simple one-parameter model that mimics how successful future attempts build on past efforts. Solving this model analytically suggests that a phase transition separates the dynamics of failure into regions of progression or stagnation and predicts that, near the critical threshold, agents who share similar characteristics and learning strategies may experience fundamentally different outcomes following failures. Above the critical point, agents exploit incremental refinements to systematically advance towards success, whereas below it, they explore disjoint opportunities without a pattern of improvement. The model makes several empirically testable predictions, demonstrating that those who eventually succeed and those who do not may initially appear similar, but can be characterized by fundamentally distinct failure dynamics in terms of the efficiency and quality associated with each subsequent attempt. We collected large-scale data from three disparate domains and traced repeated attempts by investigators to obtain National Institutes of Health (NIH) grants to fund their research, innovators to successfully exit their startup ventures, and terrorist organizations to claim casualties in violent attacks. We find broadly consistent empirical support across all three domains, which systematically verifies each prediction of our model. Together, our findings unveil detectable yet previously unknown early signals that enable us to identify failure dynamics that will lead to ultimate success or failure. 
Given the ubiquitous nature of failure and the paucity of quantitative approaches to understand it, these results represent an initial step towards the deeper understanding of the complex dynamics underlying failure.


## Data availability

This paper makes use of restricted access data from the National Institutes of Health (NIH), protected by the Privacy Act of 1974 as amended (5 U.S.C. 552a). Deidentified data necessary to reproduce all plots and statistical analyses are freely available at https://yian-yin.github.io/quantifyFailure. Those wishing to access the raw data can apply for access following the procedures outlined in the NIH Data Access Policy document (http://report.nih.gov/pdf/DataAccessPolicy.pdf). The VentureXpert database is available from Thomson Reuters. The Global Terrorism Database is publicly available at https://www.start.umd.edu/gtd/.

## Code availability

Code is available at https://yian-yin.github.io/quantifyFailure.

## References

1. Fortunato, S. et al. Science of science. Science 359, eaao0185 (2018).
2. Harford, T. Adapt: Why Success Always Starts with Failure (Farrar, Straus and Giroux, 2011).
3. Wuchty, S., Jones, B. F. & Uzzi, B. The increasing dominance of teams in production of knowledge. Science 316, 1036–1039 (2007).
4. Jones, B. F. The burden of knowledge and the “death of the renaissance man”: is innovation getting harder? Rev. Econ. Stud. 76, 283–317 (2009).
5. Sinatra, R., Wang, D., Deville, P., Song, C. & Barabási, A.-L. Quantifying the evolution of individual scientific impact. Science 354, aaf5239 (2016).
6. Liu, L. et al. Hot streaks in artistic, cultural, and scientific careers. Nature 559, 396–399 (2018).
7. Hu, Y., Havlin, S. & Makse, H. A. Conditions for viral influence spreading through multiplex correlated social networks. Phys. Rev. X 4, 021031 (2014).
8. Barabási, A.-L. The origin of bursts and heavy tails in human dynamics. Nature 435, 207–211 (2005).
9. González, M. C., Hidalgo, C. A. & Barabási, A.-L. Understanding individual human mobility patterns. Nature 453, 779–782 (2008).
10. Castellano, C., Fortunato, S. & Loreto, V. Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009).
11. Malmgren, R. D., Stouffer, D. B., Campanharo, A. S. & Amaral, L. A. N. On universality in human correspondence activity. Science 325, 1696–1700 (2009).
12. Argote, L. Organizational Learning: Creating, Retaining and Transferring Knowledge (Springer Science & Business Media, 2012).
13. Sitkin, S. B. Learning through failure: the strategy of small losses. Res. Organ. Behav. 14, 231–266 (1992).
14. Yelle, L. E. The learning curve: historical review and comprehensive survey. Decis. Sci. 10, 302–328 (1979).
15. Dutton, J. M. & Thomas, A. Treating progress functions as a managerial opportunity. Acad. Manage. Rev. 9, 235–247 (1984).
16. Huber, G. P. Organizational learning: the contributing processes and the literatures. Organ. Sci. 2, 88–115 (1991).
17. Cannon, M. D. & Edmondson, A. C. Failing to learn and learning to fail (intelligently): how great organizations put failure to work to innovate and improve. Long Range Plann. 38, 299–319 (2005).
18. Kaplan, S. N. & Lerner, J. in Measuring Entrepreneurial Businesses: Current Knowledge and Challenges (Univ. Chicago Press, 2016).
19. Eggers, J. P. & Song, L. Dealing with failure: serial entrepreneurs and the costs of changing industries between ventures. Acad. Manage. J. 58, 1785–1803 (2015).
20. National Consortium for the Study of Terrorism and Responses to Terrorism. Global Terrorism Database (GTD) https://www.start.umd.edu/research-projects/global-terrorism-database-gtd (2018).
21. Clauset, A. & Gleditsch, K. S. The developmental dynamics of terrorist organizations. PLoS ONE 7, e48633 (2012).
22. Johnson, N. et al. Pattern in escalations in insurgent and terrorist activity. Science 333, 81–84 (2011).
23. Newell, A. & Rosenbloom, P. S. in Cognitive Skills and their Acquisition 1 (ed. Anderson, J. R.) 1–55 (Erlbaum, 1981).
24. Anderson, J. R. Acquisition of cognitive skill. Psychol. Rev. 89, 369–406 (1982).
25. Muth, J. F. Search theory and the manufacturing progress function. Manage. Sci. 32, 948–962 (1986).
26. Wright, T. P. Factors affecting the cost of airplanes. J. Aeronaut. Sci. 3, 122–128 (1936).
27. March, J. G. Exploration and exploitation in organizational learning. Organ. Sci. 2, 71–87 (1991).
28. Foster, J. G., Rzhetsky, A. & Evans, J. A. Tradition and innovation in scientists’ research strategies. Am. Sociol. Rev. 80, 875–908 (2015).
29. Arbesman, S. The Half-life of Facts: Why Everything We Know Has an Expiration Date (Penguin, 2013).
30. Madsen, P. M. & Desai, V. Failing to learn? The effects of failure and success on organizational learning in the global orbital launch vehicle industry. Acad. Manage. J. 53, 451–476 (2010).
31. Argote, L., Beckman, S. L. & Epple, D. The persistence and transfer of learning in industrial settings. Manage. Sci. 36, 140–154 (1990).
32. Kuhn, T. S. The Structure of Scientific Revolutions (Univ. Chicago Press, 2012).
33. Merton, R. K. Singletons and multiples in scientific discovery: a chapter in the sociology of science. Proc. Am. Phil. Soc. 105, 470–486 (1961).
34. Gompers, P., Kovner, A., Lerner, J. & Scharfstein, D. Performance persistence in entrepreneurship. J. Financ. Econ. 96, 18–32 (2010).

35. de Holan, P. M. & Phillips, N. Remembrance of things past? The dynamics of organizational forgetting. Manage. Sci. 50, 1603–1613 (2004).

36. Schelling, T. C. Micromotives and Macrobehavior (WW Norton & Company, 2006).
37. Watts, D. J. A simple model of global cascades on random networks. Proc. Natl Acad. Sci. USA 99, 5766–5771 (2002).
38. Holme, P. & Newman, M. E. Nonequilibrium phase transition in the coevolution of networks and opinions. Phys. Rev. E 74, 056108 (2006).
39. Ginther, D. K. et al. Race, ethnicity, and NIH research awards. Science 333, 1015–1019 (2011).
40. Boudreau, K. J., Guinan, E. C., Lakhani, K. R. & Riedl, C. Looking across and looking beyond the knowledge frontier: intellectual distance, novelty, and resource allocation in science. Manage. Sci. 62, 2765–2783 (2016).
41. Bromham, L., Dinnage, R. & Hua, X. Interdisciplinary research has consistently lower funding success. Nature 534, 684–687 (2016).
42. Banal-Estanol, A., Macho-Stadler, I. & Pérez Castrillo, D. Key Success Drivers in Public Research Grants: Funding the Seeds of Radical Innovation in Academia? CESifo Working Paper Series 5852 (CESifo, 2016).
43. Ma, A., Mondragón, R. J. & Latora, V. Anatomy of funded research in science. Proc. Natl Acad. Sci. USA 112, 14760–14765 (2015).
44. Levitt, B. & March, J. G. Organizational learning. Annu. Rev. Sociol. 14, 319–338 (1988).
45. Argote, L. & Epple, D. Learning curves in manufacturing. Science 247, 920–924 (1990).
46. Merton, R. K. et al. The Matthew effect in science. Science 159, 56–63 (1968).
47. Huang, J., Ertekin, S. & Giles, C. L. Efficient name disambiguation for large-scale databases. In European Conference on Principles of Data Mining and Knowledge Discovery 536–544 (Springer, 2006).
48. Shen, H. Inequality quantified: Mind the gender gap. Nature 495, 22–24 (2013).
49. Larivière, V., Ni, C., Gingras, Y., Cronin, B. & Sugimoto, C. R. Bibliometrics: global gender disparities in science. Nature 504, 211–213 (2013).
50. Yang, T. & Aldrich, H. E. Who’s the boss? Explaining gender inequality in entrepreneurial teams. Am. Sociol. Rev. 79, 303–327 (2014).
51. Argote, L., Insko, C. A., Yovetich, N. & Romero, A. A. Group learning curves: the effects of turnover and task complexity on group performance. J. Appl. Soc. Psychol. 25, 512–529 (1995).
52. Bailey, C. D. Forgetting and the learning curve: a laboratory study. Manage. Sci. 35, 340–352 (1989).

## Acknowledgements

We thank C. Song, A. Clauset, B. Uzzi, B. Jones, E. Finkel, J. Van Mieghem, A. Bassamboo and Y. Xie for helpful discussions, and H. Sauermann and S. Havlin for suggesting extensions of the model, which led us to discover the kα and kαδ models. This work is supported by the Air Force Office of Scientific Research under award numbers FA9550-15-1-0162, FA9550-17-1-0089 and FA9550-19-1-0354, National Science Foundation grant SBE 1829344, the Alfred P. Sloan Foundation (G-2019-12485) and the Northwestern University Data Science Initiative. This work does not reflect the position of the NIH.

## Author information

D.W. conceived the project and designed the experiments; Y.Y. and Y.W. collected data and performed empirical analyses with help from D.W. and J.A.E.; Y.Y. and D.W. carried out theoretical calculations; all authors collaboratively designed the model and interpreted results; D.W. and Y.Y. wrote the manuscript; all authors edited the manuscript.

Correspondence to Dashun Wang.

## Ethics declarations

### Competing interests

Y.W. and D.W. serve as special volunteers (unpaid) to the NIH. The remaining authors declare no competing interests.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Peer review information Nature thanks Shlomo Havlin and Henry Sauermann for their contribution to the peer review of this work.

## Extended data figures and tables

### Extended Data Fig. 1 The k model.

a–f, Simulation results from the model (α = 0.6) for the cases of k = 0 (a, d) and k → ∞ (b, e) in terms of the average quality (a–c) and efficiency (d–f) of each attempt. k = 0 recovers the chance model, predicting a constant quality (c) and efficiency (f). k → ∞ predicts temporal scaling that characterizes the dynamics of failure (e) with improved quality (b), recovering predictions from learning curves and Wright’s law. g–j, Illustration of the mapping between failure dynamics (g, h) and canonical ensembles (i, j). The canonical system is characterized by three states a, b and c with energy densities $$E_a(h)$$, $$E_b(h)$$ and $$E_c(h)$$. Here we assume $$E_a(h) = (2\varepsilon h - 1)^2$$, $$E_b(h) = (2h - 1)^2$$ and $$E_c(h) = [2\varepsilon(1 - h) - 1]^2$$, where $$\varepsilon \to 0^+$$. ε is introduced to distinguish state a from state c, both of which reduce to $$E_a(h) = E_c(h) = 1$$ in the limit $$\varepsilon \to 0^+$$. We map $$f \to (2\Gamma - 1)^2$$, $$N \to \ln n$$, $$h \to K$$ and $$E_i(h) = [2\Gamma_i(K) - 1]^2$$. In this case, the two transition points k* and k* + 1 correspond to h = 0 and h = 1 in the canonical ensemble.
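The mapping in g–j can be checked numerically. The Python sketch below evaluates the three energy densities as written in the caption. It uses a small ε (10⁻³, an arbitrary stand-in for ε → 0⁺) and scans h for the lowest-energy state. The function names, the value of ε and the h grid are illustrative choices, not part of the original analysis. Under these assumptions, the lowest-energy state changes only at h ≈ ε/(1 + ε) and h ≈ 1/(1 + ε). These values tend to 0 and 1 as ε → 0⁺, matching the two transition points described in the caption.

```python
# Numerical check of the canonical-ensemble mapping (Extended Data Fig. 1g-j).
# Assumptions: eps = 1e-3 stands in for eps -> 0+, and the h grid is arbitrary.


def energies(h, eps=1e-3):
    """Energy densities of states a, b and c, using the forms given in the caption."""
    return {
        "a": (2 * eps * h - 1) ** 2,
        "b": (2 * h - 1) ** 2,
        "c": (2 * eps * (1 - h) - 1) ** 2,
    }


def ground_state(h, eps=1e-3):
    """Label of the state with the lowest energy density at h."""
    e = energies(h, eps)
    return min(e, key=e.get)


def transition_points(h_min=-0.5, h_max=1.5, steps=2000, eps=1e-3):
    """Scan a uniform h grid and return the grid points where the ground state changes."""
    points = []
    previous = None
    for i in range(steps + 1):
        h = h_min + (h_max - h_min) * i / steps
        state = ground_state(h, eps)
        if previous is not None and state != previous:
            points.append(h)
        previous = state
    return points


if __name__ == "__main__":
    for h in (-1.0, 0.5, 2.0):
        print(h, ground_state(h))
    print(transition_points())
```

With these definitions, state c has the lowest energy for h < 0, state b for 0 < h < 1, and state a for h > 1. Only the locations of the switches, and not the state labels, are what the caption relates to k* and k* + 1.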

### Extended Data Fig. 2 Predicting temporal dynamics in science, entrepreneurship and security.

a–c, We compare the goodness of fit of three models for the temporal dynamics of NIH grants (a, n = 10,345), startups (b, n = 275) and terrorist attacks (c, n = 136). For each individual sample, we fit each model to all but the last inter-event time (n = 1, …, N − 1) and compare its prediction with the observed last inter-event time. The tested functional forms are power law, $$t_n = a n^{b}$$; exponential, $$t_n = a b^{n}$$; and linear, $$t_n = a + b n$$. We then calculate the frequency with which each model achieves the minimum error among the three forms, with the error defined as $$|\,\log ({t}_{N})-\,\log ({\hat{t}}_{N})|$$. The power-law model offers consistently better predictions. d–f, As in a–c, but using $$|{t}_{N}-{\hat{t}}_{N}|$$ as the loss function.
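The leave-last-out comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' fitting code:

- The power-law and exponential forms are fitted by ordinary least squares in log space.
- The linear form is fitted by ordinary least squares on the raw values.
- A non-positive linear prediction is scored as an infinite log error.

The helper names (`predict_last`, `best_form`, `win_frequency`) are hypothetical.

```python
import math
from collections import Counter

FORMS = ("power", "exponential", "linear")


def _ols(xs, ys):
    """Ordinary least squares for ys = c0 + c1 * xs; returns (c0, c1)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return my - c1 * mx, c1


def predict_last(t, form):
    """Fit t_1..t_{N-1} with one functional form and return the predicted t_N.

    Assumption: power-law and exponential forms are fitted by linear regression
    in log space; the linear form by ordinary least squares on raw values.
    """
    ns = list(range(1, len(t)))  # n = 1, ..., N - 1
    ys = t[:-1]
    n_last = len(t)              # n = N
    if form == "power":          # log t_n = log a + b log n
        c0, c1 = _ols([math.log(n) for n in ns], [math.log(y) for y in ys])
        return math.exp(c0) * n_last ** c1
    if form == "exponential":    # log t_n = log a + n log b
        c0, c1 = _ols(ns, [math.log(y) for y in ys])
        return math.exp(c0 + c1 * n_last)
    if form == "linear":         # t_n = a + b n
        c0, c1 = _ols(ns, ys)
        return c0 + c1 * n_last
    raise ValueError(f"unknown form: {form}")


def best_form(t, loss="log"):
    """Return the form with the smallest error on the held-out last inter-event time."""
    errors = {}
    for form in FORMS:
        pred = predict_last(t, form)
        if loss == "log":
            # A non-positive prediction has no logarithm; score it as an infinite error.
            errors[form] = abs(math.log(t[-1]) - math.log(pred)) if pred > 0 else math.inf
        else:
            errors[form] = abs(t[-1] - pred)
    return min(errors, key=errors.get)


def win_frequency(samples, loss="log"):
    """Fraction of samples for which each form attains the minimum error.

    Samples with fewer than three inter-event times are skipped, because at
    least two points are needed to fit each form.
    """
    wins = Counter(best_form(t, loss) for t in samples if len(t) >= 3)
    total = sum(wins.values())
    return {form: wins[form] / total for form in FORMS}
```

For an exactly power-law sequence such as `[100 * n ** -0.5 for n in range(1, 7)]`, `best_form` returns `"power"`. Applying `win_frequency` over many individuals gives frequencies of the kind summarized in a–c; passing `loss="abs"` corresponds to d–f.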

### Extended Data Fig. 3 Predicting ultimate success in science, entrepreneurship and security.

a–c, Area under the receiver operating characteristic curve (AUC) for the prediction task. We apply two logistic regression models (Supplementary Information 6.1) to predict ultimate success in NIH grants (a), startups (b) and terrorist attacks (c). The centres and error bars of AUC scores denote the mean ± s.e.m. calculated from tenfold cross-validation over 50 randomized iterations (green, model 1; red, model 2). d, e, As in a, but predicting ultimate success in NIH grants for male (d) and female (e) investigators.
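A dependency-free Python sketch of the evaluation protocol follows. It runs repeated tenfold cross-validation of a logistic regression, scores each fold by AUC and summarizes the result as mean ± s.e.m. The feature matrix `X` is a placeholder, because the predictors of models 1 and 2 are specified in Supplementary Information 6.1 and not reproduced here. The sketch also makes these assumptions:

- Folds are unstratified.
- The logistic regression is fitted by plain, unregularized gradient descent.
- Folds lacking one of the two classes are skipped.
- The s.e.m. is taken across the per-iteration mean AUCs.

```python
import math
import random
import statistics


def auc(scores, labels):
    """Rank-based AUC: chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the mean logistic log-loss; w[0] is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def repeated_cv_auc(X, y, folds=10, repeats=50, seed=0):
    """Mean and s.e.m. (across repeats, so repeats >= 2) of the fold-averaged AUC."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    per_repeat = []
    for _ in range(repeats):
        rng.shuffle(idx)
        fold_aucs = []
        for f in range(folds):
            test = idx[f::folds]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            w = fit_logistic([X[i] for i in train], [y[i] for i in train])
            # The linear predictor ranks samples the same way as the fitted probability.
            scores = [w[0] + sum(wj * xj for wj, xj in zip(w[1:], X[i])) for i in test]
            labels = [y[i] for i in test]
            if 0 < sum(labels) < len(labels):  # AUC needs both classes in the fold
                fold_aucs.append(auc(scores, labels))
        if fold_aucs:
            per_repeat.append(statistics.mean(fold_aucs))
    sem = statistics.stdev(per_repeat) / math.sqrt(len(per_repeat))
    return statistics.mean(per_repeat), sem
```

In practice, scikit-learn's `LogisticRegression`, `StratifiedKFold` and `roc_auc_score` cover the same steps with stratified folds and regularization options.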

### Extended Data Fig. 4 Model validations.

a, An illustration of the component dynamics. We extract all MeSH terms associated with the nth attempt, $$S_n$$, and calculate the number of new terms $$m_n$$, defined as $$|{S}_{n}-({S}_{n-1}\cup \cdots \cup {S}_{n-k})|$$. b, Testing component dynamics in NIH grant applications. We calculate the dynamics of $$M_n = \langle m_n \rangle / \langle m_1 \rangle$$ using different k and compare it with $$T_n$$. The centres and error bars of $$M_n$$ show the mean ± s.e.m. (n = 5,899) for different k. The shaded area shows mean ± s.e.m. of $$T_n$$ (log scale) measured on the same subset. All k > 3 lead to similar trends between $$M_n$$ and $$T_n$$. c–e, Length of failure streak after randomization in science (c), entrepreneurship (d) and security (e). We take the samples used in Fig. 1 and shuffle the success/failure labels across all attempts. This operation keeps both the overall success rate and each individual's total number of attempts constant. f–h, Temporal scaling patterns within the successful group in science (f), entrepreneurship (g) and security (h). We separate the successful group into two subgroups (narrow winners and clear winners) based on eventual performance (0.9 in evaluation score for D1, 0.5 in investment amount for D2 and 1 in wounded individuals for D3). The shaded area shows mean ± s.e.m. of $$T_n$$ (log scale).
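Two procedures in this caption lend themselves to a short Python sketch: counting new MeSH terms relative to the previous k attempts (a, b), and the label randomization behind c–e. The sketch rests on these assumptions:

- Each attempt's MeSH terms are available as a set.
- Each individual's outcomes are an ordered list of booleans (True for success).
- For illustration, a failure streak is the number of failures before the first success.

All function names are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def new_term_counts(attempts, k):
    """m_n = |S_n - (S_{n-1} ∪ ... ∪ S_{n-k})| for one individual's ordered attempts.

    attempts: list of sets of MeSH terms, one per attempt (S_1, S_2, ...).
    Attempts before the first one are treated as empty.
    """
    counts = []
    for n, terms in enumerate(attempts):
        previous = set().union(*attempts[max(0, n - k):n])
        counts.append(len(terms - previous))
    return counts


def component_dynamics(individuals, k):
    """M_n = <m_n> / <m_1>, averaging over individuals who reached attempt n."""
    by_n = defaultdict(list)
    for attempts in individuals:
        for n, m in enumerate(new_term_counts(attempts, k), start=1):
            by_n[n].append(m)
    m1 = statistics.mean(by_n[1])
    return {n: statistics.mean(ms) / m1 for n, ms in sorted(by_n.items())}


def shuffle_outcomes(sequences, rng):
    """Permute success/failure labels across all attempts of all individuals.

    The pooled labels are reassigned in blocks of the original lengths, so the
    overall success rate and each individual's number of attempts are unchanged.
    """
    pooled = [label for seq in sequences for label in seq]
    rng.shuffle(pooled)
    shuffled, start = [], 0
    for seq in sequences:
        shuffled.append(pooled[start:start + len(seq)])
        start += len(seq)
    return shuffled


def failure_streaks(sequences):
    """Failures before the first success; sequences without any success are skipped."""
    return [seq.index(True) for seq in sequences if True in seq]
```

For instance, `new_term_counts([{"a"}, {"b"}, {"a"}], k=1)` gives `[1, 1, 1]`, whereas `k=2` gives `[1, 1, 0]`. The difference arises because the third attempt reuses a term from two attempts earlier. Comparing `failure_streaks` before and after `shuffle_outcomes` mirrors the randomization test in c–e.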

### Extended Data Fig. 5 Robustness check on definition of unsuccessful group.

a–l, Robustness check as we change the threshold of inactivity to 3 years. a–c, Failure streak in science (a), entrepreneurship (b) and security (c). Blue circles represent real data from the successful group and dashed lines represent fitted Weibull distributions. d–f, Temporal scaling patterns in science (d), entrepreneurship (e) and security (f). The shaded area shows mean ± s.e.m. of $$T_n$$ (log scale). g–i, Performance dynamics in science (g, n = 641, 231, 578, 190, from left to right), entrepreneurship (h, n = 248, 1,332, 237, 1,312, from left to right) and security (i, n = 238, 198, 236, 199, from left to right). The successful and unsuccessful groups that experienced a large number of consecutive failures before the last attempt (at least 5 for D1, 3 for D2 and 2 for D3) appear indistinguishable for first failures (two-sided Welch’s t-test; P = 0.566, 0.671 and 0.349), but quickly diverge for second failures (two-sided Welch’s t-test; P = 2.09 × 10⁻², 4.95 × 10⁻³ and 7.77 × 10⁻²). The successful group also shows significant improvement in performance (one-sided Welch’s t-test; P = 7.03 × 10⁻², 2.37 × 10⁻² and 2.32 × 10⁻²), which is absent for the unsuccessful group (one-sided Welch’s t-test; P = 0.717, 0.176 and 0.786). Data are mean ± s.e.m. j–l, AUC score of predicting ultimate success in science (j), entrepreneurship (k) and security (l). The centres and error bars of AUC scores denote the mean ± s.e.m. calculated from tenfold cross-validation over 50 randomized iterations. m–x, As in a–l but using 7 years as the threshold of inactivity. Sample sizes are s: n = 620, 101, 559, 76; t: n = 248, 977, 237, 989; u: n = 216, 152, 214, 153. P values in s–u (from bottom to top) are P = 0.883 (s), 0.671 (t), 0.456 (u); P = 2.25 × 10⁻² (s), 1.38 × 10⁻³ (t), 8.34 × 10⁻² (u); P = 4.59 × 10⁻² (s), 2.37 × 10⁻² (t), 3.33 × 10⁻² (u); P = 0.838 (s), 0.446 (t), 0.775 (u). *P < 0.1, **P < 0.05, ***P < 0.01, NS, not significant (P ≥ 0.1).
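The Welch’s t-tests reported in these captions can be reproduced with SciPy: `scipy.stats.ttest_ind` with `equal_var=False`, plus `alternative="greater"` for the one-sided comparison. The standard-library Python sketch below spells out the computation instead. It computes the Welch statistic, the Welch–Satterthwaite degrees of freedom and a Student t P value obtained from the regularized incomplete beta function, which is evaluated by a continued fraction. The caller decides which samples enter `a` and `b` (for example, first-failure performance of the successful versus the unsuccessful group); that pairing is an assumption, not something this code determines.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_pvalue(t, df, alternative="two-sided"):
    """P value of a Student t statistic; 'greater' tests mean(a) > mean(b)."""
    two_sided = betainc(df / 2.0, 0.5, df / (df + t * t))
    if alternative == "two-sided":
        return two_sided
    if alternative == "greater":
        return two_sided / 2.0 if t > 0 else 1.0 - two_sided / 2.0
    raise ValueError(f"unknown alternative: {alternative}")


def welch_t_test(a, b, alternative="two-sided"):
    """Welch's unequal-variance t-test; returns (t, Welch-Satterthwaite df, P value)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, t_pvalue(t, df, alternative)
```

For `a = [1, 2, 3, 4]` and `b = [2, 3, 4, 5]`, the equal variances and sample sizes give exactly 6 degrees of freedom, the same as a pooled t-test. The "weighted" variant in Extended Data Fig. 9 would need per-sample weights in the means and variances, which this sketch does not implement.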

### Extended Data Fig. 6 Robustness check on D1.

a–c, Failure streak as we change the score threshold to 55 (a), exclude revisions as successes (b) and only focus on new principal investigators without previous R01 grants (c). Blue circles represent real data from successful groups and dashed lines represent fitted Weibull distributions. d–f, Temporal scaling patterns as we change the score threshold to 55 (d), exclude revisions as successes (e) and only focus on new principal investigators without previous R01 grants (f). The shaded area shows mean ± s.e.m. of $$T_n$$ (log scale). g–i, Performance dynamics as we change the score threshold to 55 (g, n = 768, 189, 686, 170, from left to right), exclude revisions as successes (h, n = 252, 145, 216, 123, from left to right) and only focus on new principal investigators without previous R01 grants (i, n = 1,164, 308, 1,530, 334, from left to right). The successful and unsuccessful groups that experienced a large number of consecutive failures before their last attempt (at least 5 for g and h, and 3 for i) appear indistinguishable for first failures (two-sided Welch’s t-test; P = 0.242, 0.819, 0.289) but quickly diverge for second failures (two-sided Welch’s t-test; P = 3.40 × 10⁻⁴, 3.40 × 10⁻², 9.70 × 10⁻⁷). The successful group also shows a significant improvement in performance (one-sided Welch’s t-test; P = 4.23 × 10⁻², 3.04 × 10⁻², 1.92 × 10⁻⁴), which is absent for the unsuccessful group (one-sided Welch’s t-test; P = 0.863, 0.754, 0.997). Data are mean ± s.e.m. j–l, AUC score of predicting ultimate success as we change the score threshold to 55 (j), exclude revisions as successes (k) and only focus on new principal investigators without previous R01 grants (l). The centres and error bars of AUC scores denote the mean ± s.e.m. calculated from tenfold cross-validation over 50 randomized iterations. *P < 0.1, **P < 0.05, ***P < 0.01, NS, P ≥ 0.1.
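The dashed Weibull curves in a–c require a fitted shape and scale. The caption does not state the fitting procedure, so the Python sketch below assumes continuous maximum-likelihood estimation on strictly positive streak lengths. A discretized Weibull would be the more careful choice for integer-valued data. The shape is found by bisection on the standard likelihood equation, and the scale then follows in closed form. `fit_weibull` is a hypothetical helper name.

```python
import math


def fit_weibull(data, lo=1e-3, hi=100.0, tol=1e-10):
    """Maximum-likelihood Weibull fit; returns (shape k, scale lam).

    Treats the data as continuous and strictly positive (an assumption for
    integer-valued streak lengths). The shape solves
        sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0,
    which is increasing in k, so bisection on [lo, hi] finds the root.
    The scale is then (mean(x^k))^(1/k).
    """
    x_max = max(data)
    y = [x / x_max for x in data]  # rescaling avoids overflow in x**k; shape is scale-free
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / len(logs)

    def score(k):
        powers = [v ** k for v in y]
        return sum(p * l for p, l in zip(powers, logs)) / sum(powers) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    lam = x_max * (sum(v ** k for v in y) / len(y)) ** (1.0 / k)
    return k, lam
```

A quick sanity check is to fit draws from `random.weibullvariate(3.0, 1.5)` (scale 3, shape 1.5) and confirm that the estimates land close to those values.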

### Extended Data Fig. 7 Robustness check on D2.

a–c, Failure streak as we change the threshold of high-value mergers and acquisitions (M&A) to 5% (a), exclude M&As as successes (b) and classify unicorns as successes (c). Blue circles represent real data from successful groups and dashed lines represent fitted Weibull distributions. d–f, Temporal scaling patterns as we change the threshold of high-value M&A to 5% (d), exclude M&As as successes (e) and include unicorns as successes (f). The shaded area shows mean ± s.e.m. of $$T_n$$ (log scale). g–i, Performance dynamics as we change the threshold of high-value M&A to 5% (g, n = 251, 1,304, 243, 1,284, from left to right), exclude M&As as successes (h, n = 248, 1,335, 237, 1,315, from left to right) and include unicorns as successes (i, n = 257, 1,330, 244, 1,311, from left to right). The successful and unsuccessful groups that experienced a large number of consecutive failures before their last attempt (at least 3) appear indistinguishable for first failures (two-sided Welch’s t-test; P = 0.937, 0.647, 0.620) but quickly diverge for second failures (two-sided Welch’s t-test; P = 9.92 × 10⁻³, 4.94 × 10⁻³, 6.33 × 10⁻³). The successful group also shows a significant improvement in performance (one-sided Welch’s t-test; P = 2.16 × 10⁻², 2.37 × 10⁻², 2.77 × 10⁻²), which is absent for the unsuccessful group (one-sided Welch’s t-test; P = 0.224, 0.158, 0.167). Data are mean ± s.e.m. j–l, AUC score for predicting ultimate success as we change the threshold of high-value M&A to 5% (j), exclude M&As as successes (k) and include unicorns as successes (l). The centres and error bars of AUC scores denote the mean ± s.e.m. calculated from tenfold cross-validation over 50 randomized iterations. *P < 0.1, **P < 0.05, ***P < 0.01, NS, P ≥ 0.1.

### Extended Data Fig. 8 Robustness check on D3.

a–c, Failure streak as we focus on all samples (a), samples of human-targeted attacks (b) and include vague data on fatalities (c). Blue circles represent real data from successful groups and dashed lines represent fitted Weibull distributions. d–f, Temporal scaling patterns as we focus on all samples (d), samples of human-targeted attacks (e) and include vague data on fatalities (f). The shaded area shows mean ± s.e.m. of $$T_n$$ (log scale). g–i, Performance dynamics as we focus on all samples (g, n = 231, 231, 229, 232, from left to right), samples of human-targeted attacks (h, n = 176, 173, 173, 174, from left to right) and include vague data on fatalities (i, n = 227, 147, 225, 148, from left to right). The successful and unsuccessful groups that experienced a large number of consecutive failures before their last attempt (at least 2) appear indistinguishable for first failures (two-sided Welch’s t-test; P = 0.400, 0.859, 0.395), but quickly diverge for second failures (two-sided Welch’s t-test; P = 2.08 × 10⁻³, 6.70 × 10⁻³, 3.76 × 10⁻³). The successful group also shows a significant improvement in performance (one-sided Welch’s t-test; P = 2.55 × 10⁻², 5.65 × 10⁻², 3.77 × 10⁻²), which is absent for the unsuccessful group (one-sided Welch’s t-test; P = 0.970, 0.901, 0.967). Data are mean ± s.e.m. j–l, AUC score of predicting ultimate success as we focus on all samples (j), samples of human-targeted attacks (k) and include vague data on fatalities (l). The centres and error bars of AUC scores denote the mean ± s.e.m. calculated from tenfold cross-validation over 50 randomized iterations. m–o, Temporal scaling patterns as we change the threshold for the successful group to fatal attacks that killed at least 5 (m), 10 (n) and 100 (o) people. *P < 0.1, **P < 0.05, ***P < 0.01, NS, P ≥ 0.1.

### Extended Data Fig. 9 Additional robustness checks.

a–i, Robustness check as we control for temporal variation. a–c, Failure streak in science (a), entrepreneurship (b) and security (c). Blue circles represent real data of successful groups and dashed lines represent fitted Weibull distributions. d–f, Temporal scaling patterns in science (d), entrepreneurship (e) and security (f). The shaded area shows mean ± s.e.m. of $$T_n$$ (log scale). g–i, Performance dynamics in science (g, n = 628, 145, 571, 123, from left to right), entrepreneurship (h, n = 248, 1,332, 237, 1,312, from left to right) and security (i, n = 231, 173, 229, 174, from left to right). The successful and unsuccessful groups that experienced a large number of consecutive failures before their last attempt (at least 5 for D1, 3 for D2 and 2 for D3) appear indistinguishable for first failures (two-sided weighted Welch’s t-test; P = 0.814, 0.728, 0.330) but quickly diverge for second failures (two-sided weighted Welch’s t-test; P = 1.80 × 10⁻², 3.10 × 10⁻², 4.56 × 10⁻²). The successful group also shows significant improvement in performance (one-sided weighted Welch’s t-test; P = 2.10 × 10⁻², 1.92 × 10⁻², 4.53 × 10⁻²), which is absent for the unsuccessful group (one-sided weighted Welch’s t-test; P = 0.755, 0.175, 0.903). Data are mean ± s.e.m. j–l, Performance dynamics as we compare first and halfway attempts in science (j, n = 628, 145, 582, 111, from left to right), entrepreneurship (k, n = 248, 1,332, 240, 1,294, from left to right) and security (l, n = 231, 173, 228, 175, from left to right). The successful and unsuccessful groups that experienced a large number of consecutive failures before their last attempt (at least 5 for D1, 3 for D2 and 2 for D3) appear indistinguishable for first failures (two-sided Welch’s t-test; P = 0.898, 0.671, 0.289) but diverge for halfway failures (two-sided Welch’s t-test; P = 2.18 × 10⁻⁵, 1.34 × 10⁻², 1.34 × 10⁻²). 
The successful group also shows significant improvement in performance (one-sided Welch’s t-test; P = 2.35 × 10⁻², 4.54 × 10⁻², 3.69 × 10⁻²), which is absent for the unsuccessful group (one-sided Welch’s t-test; P = 0.992, 0.252, 0.955). Data are mean ± s.e.m. m–o, Performance dynamics as we compare the first and penultimate attempts in science (m, n = 628, 145, 896, 87, from left to right), entrepreneurship (n, n = 248, 1,332, 227, 1,199, from left to right) and security (o, n = 231, 173, 230, 173, from left to right). The successful and unsuccessful groups that experienced a large number of consecutive failures before the last attempt (at least 5 for D1, 3 for D2 and 2 for D3) appear indistinguishable for first failures (two-sided Welch’s t-test; P = 0.898, 0.671, 0.289) but diverge for penultimate failures (two-sided Welch’s t-test; P = 8.50 × 10⁻⁸, 3.12 × 10⁻², 1.13 × 10⁻²). The successful group also shows a significant improvement in performance (one-sided Welch’s t-test; P = 5.79 × 10⁻⁹, 4.30 × 10⁻², 1.33 × 10⁻²), which is absent for the unsuccessful group (one-sided Welch’s t-test; P = 0.980, 0.138, 0.923). Data are mean ± s.e.m. p–r, Correlation between the length of the failure streak and initial performance (samples with repeated failures) in science (p, n = 12,171), entrepreneurship (q, n = 2,086) and security (r, n = 441). The correlation is weak across all three datasets (Pearson correlation; r = −0.051, −0.011, −0.107 for p, q, r, respectively). s–u, Lengths of failure streaks still follow fat-tailed distributions when conditioned on samples in the bottom 10% of initial performance in science (s, n = 6,339), entrepreneurship (t, n = 2,438) and security (u, n = 1,092). A two-sided Kolmogorov–Smirnov test between the sample and exponential distributions rejects the hypothesis that the two distributions are identical (P < 0.01). *P < 0.1, **P < 0.05, ***P < 0.01, NS, P ≥ 0.1.
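Panels s–u compare failure-streak lengths with an exponential distribution. The exact test setup is not given here, so the Python sketch below assumes a one-sample, two-sided Kolmogorov–Smirnov test against an exponential whose rate is fixed by maximum likelihood (1/mean). Two caveats make the asymptotic P value only approximate:

- The rate is estimated from the same data.
- Streak lengths are integers, so the data contain ties.

A Lilliefors-type correction or a parametric bootstrap would be more rigorous. The function names are hypothetical.

```python
import math


def kolmogorov_sf(lam, terms=100):
    """Asymptotic Kolmogorov survival function Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here, and Q(lam) is indistinguishable from 1
    total = sum(2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, total))


def ks_exponential(data):
    """One-sample two-sided KS test against an exponential with rate 1/mean(data).

    Returns (D, approximate P value).
    """
    xs = sorted(data)
    n = len(xs)
    rate = n / sum(xs)  # maximum-likelihood exponential rate, 1 / mean
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # Largest gap between the fitted CDF and the empirical step function at x.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    effective_n = math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)  # common small-sample correction
    return d, kolmogorov_sf(effective_n * d)
```

SciPy users can obtain the same statistic with `scipy.stats.kstest(data, "expon", args=(0, np.mean(data)))`, subject to the same caveat about estimated parameters.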

### Extended Data Fig. 10 Generalization of the k model.

a, The α parameter connects the potential to improve (1 − x) with the likelihood p of creating new versions through $$p = (1 - x)^{\alpha}$$. b, Phase diagram of the kα model. The two-dimensional parameter space is separated into three regimes, with boundaries at kα = 1 and (k − 1)α = 1. c, The impact of the δ parameter on the scaling exponent γ for k = 1, 2, 3 and α = 0.4, 0.8, 1.2. We find that δ may affect the temporal scaling exponent when it is small, but has no further effect beyond the threshold $$\delta^* = \min(\alpha, 1/(k - 1))$$. d, Phase diagram of the kαδ model for k = 3, with boundaries at α = δ, (k − 1)δ = 1, (k − 1)δ + α = 1, kα = 1 and (k − 1)α = 1.
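The quantities in this caption are simple enough to evaluate directly. The Python sketch below does three things:

- It computes p = (1 − x)^α.
- It assigns a (k, α) pair to one of the three regimes of panel b by counting how many boundaries it lies beyond. The first boundary in b is read as kα = 1, which places the transitions at k = 1/α and 1/α + 1.
- It evaluates δ* = min(α, 1/(k − 1)).

Several choices are assumptions of this sketch. The regimes are numbered 1 to 3, a point exactly on a boundary is placed in the higher regime, and δ* = α at k = 1, where 1/(k − 1) is undefined. The helper names are hypothetical.

```python
def new_version_probability(x, alpha):
    """p = (1 - x)^alpha, where 1 - x is the potential to improve (x in [0, 1])."""
    return (1.0 - x) ** alpha


def k_alpha_regime(k, alpha):
    """Regime index (1, 2 or 3) in the k-alpha plane.

    Assumed boundaries: k * alpha = 1 and (k - 1) * alpha = 1. Points exactly
    on a boundary are assigned to the higher regime.
    """
    if k * alpha < 1:
        return 1
    if (k - 1) * alpha < 1:
        return 2
    return 3


def delta_star(k, alpha):
    """Threshold beyond which delta no longer changes the scaling exponent.

    delta* = min(alpha, 1 / (k - 1)); for k = 1 the second term is treated
    as infinite, so delta* = alpha (an assumption of this sketch).
    """
    return alpha if k <= 1 else min(alpha, 1.0 / (k - 1))


if __name__ == "__main__":
    for alpha in (0.2, 0.4, 0.6):
        print(alpha, k_alpha_regime(3, alpha), delta_star(3, alpha))
```

At fixed k = 3, increasing α moves the model from regime 1 (α < 1/3) through regime 2 (1/3 ≤ α < 1/2) to regime 3 (α ≥ 1/2).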

## Supplementary information

### Supplementary Information

This file contains the following sections: 1 Data description; 2 Related work and models; 3 Modeling failure dynamics; 4 Generalized models; 5 Empirical measurements; 6 Prediction task; and 7 Robustness checks. It also contains Supplementary Tables 1–4 and additional references.

## Rights and permissions


Yin, Y., Wang, Y., Evans, J.A. et al. Quantifying the dynamics of failure across science, startups and security. Nature 575, 190–194 (2019) doi:10.1038/s41586-019-1725-y

DOI: https://doi.org/10.1038/s41586-019-1725-y