a, b, Comparison between theoretical and empirical articles (a) and between review and non-review articles (b). a, We separate 4,258 papers from www.arXiv.org published between 1992 and 2003 into two groups on the basis of the number of figures they contain, giving 1,502 articles without figures and 2,756 articles with figures. The assumption is that empirical papers tend to contain more figures than theoretical papers (ref. 23). We match these articles to the WOS datasets and observe that, for both theoretical and empirical articles, the disruption percentile decreases as team size grows. b, We select two groups of WOS articles on the basis of journal name: 22,672 reviewing articles published across 48 journals that have both ‘annual’ and ‘review’ in the title, and their 1,338,808 references (reviewed articles). For both reviewing and reviewed articles, the disruption percentile decreases with team size. c, d, Comparison of US patents across classes and owners. We plot the disruption percentile against team size for the seven most popular classes of patents (92,175 patents) (c) and the five companies legally assigned the most patents (21,261 patents) (d) from 2002 to 2009. We observe that the decrease in disruption with increasing team size holds broadly across classes and owners. The curves are smoothed with the moving average technique used in Extended Data Fig. 4 (smoothing parameter k = 0.1). Because sample size decreases rapidly with team size in the patent data, we assign equal weights across team sizes when applying the smoothing technique. e, f, Comparison of GitHub software projects across programming languages and code-base sizes. We plot the disruption percentile against team size for the seven most popular programming languages (18,702) (e) and four scales of code-base sizes (24,853 code-bases) (f) from 2011 to 2014. The decrease in disruption with growth of team size holds broadly across programming languages and code-base sizes.
g, Simplified citation networks comprising focal papers (blue diamonds), references (grey circles) and subsequent work (rectangles). Subsequent work may cite: (1) only the focal work (i, green), (2) only its references (k, black) or (3) both focal work and references (j, brown). A reference identified as popular is coloured in red, and self-citations are shown by dashed lines (with corresponding subsequent work coloured in light brown). Five definitions of disruption are provided for comparison. D0 is the definition of disruption used in the main text. D1 is defined the same way as D0, but with self-citations excluded. D2 is defined the same way as D0, but only considers popular references. We identify a reference as popular if its citation count falls within the top quartile of the total citation distribution (≥24 citations). D3 simplifies D0 by measuring only the fraction of papers that cite the focal paper and not its references, among all papers citing the focal paper, which equals ni/(ni + nj). D4 is similar to D3, but counts citations rather than cited papers when calculating the fraction (for example, if a single referenced paper is cited five times, then it receives a count of five rather than one in this measure). h, A citation network copied from g, with one additional citation edge (brown curve) added. As a consequence, some, but not all, disruption measure variants change. i, All disruption measures decrease with team size. D0 and D1 are indexed by the right y axis and the other disruption measures are indexed by the left y axis. One hundred thousand randomly selected WOS papers (97,188 papers remained after excluding missing data) are used to calculate these disruption values.
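As an illustration of how the counts in g translate into disruption values, the following minimal Python sketch classifies subsequent work into the three types (i, j, k) and computes D3 = ni/(ni + nj) as stated above. The caption does not restate the formula for D0; the sketch assumes the commonly used disruption index form (ni − nj)/(ni + nj + nk), which should be checked against the main text. The toy network, the paper labels and the function names are hypothetical and do not reproduce the network drawn in g or h.

```python
from fractions import Fraction


def classify_citers(focal, references, citations):
    """Count subsequent papers of type i, j and k.

    citations maps each subsequent paper to the set of papers it cites.
    i: cites only the focal paper; j: cites the focal paper and at least
    one of its references; k: cites only references.
    """
    refs = set(references)
    n_i = n_j = n_k = 0
    for cited in citations.values():
        cites_focal = focal in cited
        cites_refs = bool(refs & set(cited))
        if cites_focal and cites_refs:
            n_j += 1
        elif cites_focal:
            n_i += 1
        elif cites_refs:
            n_k += 1
    return n_i, n_j, n_k


def d3(n_i, n_j):
    """D3 as given in the caption: n_i / (n_i + n_j)."""
    total = n_i + n_j
    return Fraction(n_i, total) if total else None


def d0(n_i, n_j, n_k):
    """ASSUMPTION: D0 = (n_i - n_j) / (n_i + n_j + n_k); not stated in the caption."""
    total = n_i + n_j + n_k
    return Fraction(n_i - n_j, total) if total else None


# Hypothetical toy network: focal paper "F" with references "R1" and "R2".
citations = {
    "s1": {"F"},
    "s2": {"F"},
    "s3": {"F", "R1"},
    "s4": {"R1"},
    "s5": {"R2"},
}
before = classify_citers("F", ["R1", "R2"], citations)  # (2, 1, 2)

# Add one citation edge (s1 -> R2), in the spirit of panel h.
citations_h = dict(citations, s1={"F", "R2"})
after = classify_citers("F", ["R1", "R2"], citations_h)  # (1, 2, 2)

print("D3:", d3(*before[:2]), "->", d3(*after[:2]))  # 2/3 -> 1/3
print("D0:", d0(*before), "->", d0(*after))          # 1/5 -> -1/5
```

In this toy case the added edge turns one type-i paper into a type-j paper, so both D3 and the assumed D0 fall. D1, D2 and D4 are not implemented here; they would additionally require self-citation flags, reference citation counts and per-edge citation counts, respectively.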