Grand challenges and emergent modes of convergence science

To address complex problems, scholars are increasingly faced with challenges of integrating diverse domains. We analyzed the evolution of this convergence paradigm in the ecosystem of brain science, a research frontier that provides a contemporary testbed for evaluating two modes of cross-domain integration: (a) cross-disciplinary collaboration among experts from academic departments associated with disparate disciplines; and (b) cross-topic knowledge recombination across distinct subject areas. We show that research involving both modes features a 16% citation premium relative to a mono-domain baseline. We further show that the cross-disciplinary mode is essential for integrating across large epistemic distances. Yet we find research utilizing cross-topic exploration alone—a convergence shortcut—to be growing in prevalence at roughly 3% per year, significantly outpacing the more essential cross-disciplinary convergence mode. By measuring shifts in the prevalence and impact of different convergence modes in the 5-year intervals up to and after 2013, we find that shortcut patterns may relate to competitive pressures associated with Human Brain funding initiatives launched that year. Without policy adjustments, flagship funding programs may unintentionally incentivize suboptimal integration patterns, thereby undercutting convergence science’s potential in tackling grand challenges.

The history of scientific development is characterized by a pattern of convergence-divergence cycles (Roco et al 2013). In convergence, originally distinct disciplines synergistically interact to address complex problems and accelerate breakthrough discovery (National Research Council 2014). In divergence, in addition to fragmentation resulting from conflicting social forces (Balietti et al 2015), spin-offs occur as new techniques, tools and applications spawn. The evolving fusion of multi-domain expertise during the present convergence cycle carries significant intellectual and organizational challenges (Bromham et al 2016; Fealing & eds. 2011; National Research Council 2005; Pavlidis et al 2014). The core issue is that contemporary convergence takes place in the context of team science (Milojevic 2014; Wuchty et al 2007). Accordingly, collaboration across distinct academic cultures and units faces behavioral (Van Rijnsoever & Hessels 2011) and institutional barriers (National Research Council 2014).
Two early successful examples of convergence are worth mentioning to draw a comparative baseline. The first is the Manhattan Project (MP), in which physicists, chemists, and engineers worked successfully in the 1940s to control nuclear fission and produce the first atomic bomb under a tightly run government program (Hughes & Hughes 2003). A half-century later (1990s-2000s), the Human Genome Project (HGP) forged a multi-institutional bond integrating biologists and computer scientists, under an organizational design known as the consortium science model, whereby teams of teams organize around a well-posed central grand challenge (Helbing 2012), with a common goal to share benefits equitably within and beyond institutional boundaries (Petersen et al 2018). In 10 short years, the HGP led to the mapping and identification of the human genetic code, ushering civilization into the genomics era.
* To whom correspondence should be addressed; E-mail: ipavlidis@uh.edu or apetersen3@ucmerced.edu
Brain science is presently supported by major funding programs that span the world (Grillner et al 2016). In late 2013, the United States launched the BRAIN Initiative® (Brain Research through Advancing Innovative Neurotechnologies), a public-private effort aimed at developing new experimental tools that will unlock the inner workings of brain circuits (Jorgenson et al 2015). At the same time, the European Union launched the Human Brain Project (HBP), a 10-year funding program based on exascale computing approaches, which aims to build a collaborative infrastructure for advancing knowledge in the fields of neuroscience, brain medicine, and computing (Amunts et al 2016). In 2014, Japan launched the Brain Mapping by Integrated Neurotechnologies for Disease Studies (Brain/MINDS), a program to develop innovative technologies for elucidating primate neural circuit functions (Okano et al 2015). China followed in 2016 with the China Brain Project (CBP), a 15-year program targeting the neural basis of human cognition (Poo et al 2016). Canada (Jabalpurwala 2016), South Korea (Jeong et al 2016), and Australia (Committee et al 2016) followed suit, launching their own brain programs in the late 2010s.
By nature and historical precedence, convergence tends to operate on the frontier of science. In the 2010s, brain science was declared the new research frontier (Quaglio et al 2017), promising health and behavioral applications (Eyre et al 2017). Intensification of brain research has been taking place against a backdrop of an increasingly globalized, interconnected and online scientific commons. This stands in sharp contrast to the nationally unipolar and offline backdrop of the MP and even the HGP. Moreover, the brain funding programs were designed to act as behavioral incentives in a scientific marketplace, aimed at bringing together diverse scholars and ideas. However, despite being oriented around the compelling structure-function brain problem, there were few guidelines on how to configure scholarly expertise to address the brain challenge. As such, these characteristics render brain research a "live experiment" in the international evolution of the convergence paradigm.
Accordingly, we apply data-driven methods to reconstruct the brain science ecosystem as a way to capture the contemporary "pulse" of convergence, explored through a progressive series of research questions regarding its prevalence, anatomy and scientific impact. Given the pervasive funding championing the Human Brain Science (HBS) challenge, we further analyze how the trajectory of HBS convergence has been impacted by the ramp-up of flagship funding initiatives around the world. While previous work explored the role of cross-disciplinary collaboration in the Human Genome Project (Petersen et al 2018), here we extend that framework to differentiate between (a) the disciplinary diversity of the research team and (b) the topical diversity of their research, two alternative means of cross-domain integration. We leverage existing taxonomies to distinguish mono-domain versus cross-domain activity: for disciplines, the Classification of Instructional Program (CIP) system developed by the U.S. National Center for Education Statistics; and for topics, the Medical Subject Heading (MeSH) ontology developed by the U.S. National Library of Medicine. Accordingly, we classify HBS research according to four integration types defined by a mono-/cross-{discipline × topic} domain decomposition.
In a highly competitive and open science system with multiple degrees of freedom, our motivating hypothesis is that more than one operational cross-domain integration mode is likely to emerge. With this in mind, we identify five research questions (RQ), addressed in turn by the figures in series. The first (RQ1) regards how to define convergence, which we address by developing a typological framework that is generalizable to other frontiers of biomedical science and is relevant to the evaluation of multiple billion-dollar HBS flagship projects around the world. The second (RQ2) regards the status and impact of brain science convergence: Have HBS interfaces developed to the point of sustaining fruitful cross-disciplinary knowledge exchange? Does the increasing prevalence of teams adopting convergent approaches correlate with higher scientific impact research? RQ3 addresses whether convergence is evenly distributed across HBS subdomains, and which empirical combinations of distinct subject areas (knowledge) and disciplinary expertise (people) are overrepresented in convergent research. RQ4 follows by seeking to identify whether convergence is evenly distributed over time and geographic region. And finally, RQ5: does the propensity to pursue convergence science, or the citation impact of convergence science, depend on the convergence mode? To address this question, we implement hierarchical regression models that differentiate between three convergence modes: research involving cross-disciplinary collaboration, cross-subject-area exploration, or both. Given the lucrative nature of flagship funding initiatives, we hypothesize that the ramp-up of HBS flagships correlates with shifts in the prevalence and relative impact of research adopting these different convergence modes.
Our results identify timely and relevant science policy implications. Given contemporary emphasis around accelerating breakthrough discovery (Helbing 2012) by way of strategic research team configurations (Börner et al 2010), convergence science originators called for cross-disciplinary approaches integrating distant disciplines (National Research Council 2014). Instead, our analysis reveals that HBS teams recently tend to integrate diverse topics without necessarily integrating appropriate disciplinary expertise, an approach we identify as a convergence shortcut.
Efficient long-range exploration facilitated by multidisciplinary teams is a defining value proposition of convergence science (National Research Council 2014), and provides a testable mechanism underlying the increased likelihood of large team science producing high-impact research (Wuchty et al 2007). Hence, the emergence and densification of cross-domain interfaces are likely to increase the potential for breakthrough discovery by catalyzing recombinant innovation (Fleming 2001), which effectively expands the solution space accessible to problem-solvers. It then follows that certain configurations are likely to amplify the effectiveness of recombinant innovation. Adapting a triple-helix model of medical innovation (Petersen et al 2016), recombinant innovation manifests from integrating expertise around the three dimensions of supply, demand and technological capabilities: (i) the fundamental biology domain that supplies a theoretical understanding of the anatomical structure-function relation, (ii) the health domain that identifies demand for effective science-based solutions, and (iii) the techno-informatics domain which develops scalable products, processes and services to facilitate matching supply from (i) with demand from (ii) (Yang et al 2021).
In order to overcome the challenges of selecting new strategies from the vast number of possible combinations, prior research finds that innovators are more likely to succeed by way of exploiting their own local expertise (Fleming 2001) rather than individually exploring distant configurations by way of internal expansive learning (Engeström & Sannino 2010). Extending this argument, exploration at uncharted multidisciplinary interfaces is likely to be more successful when integrating knowledge across a team of experts from different domains, thereby hedging against recombinant uncertainty underlying the exploration process (Fleming 2004).
A complementary argument for convergence derives from the advantage of diversity for harnessing collective intelligence and identifying successful hybrid strategies (Page 2008). Recent work provides additional empirical support for the competitive advantage of diversity, using cross-border mobility (Petersen 2018) as an instrument for social capital disruption to identify the positive role of research topic and collaborator diversity.

Data collection and notation
Figure 1 shows the multiple sources combined in our study, which integrates publication and author data from Scopus, PubMed, and the Scholar Plot web app (Majeti et al 2020) (see Supplementary Information (SI) Appendix S1 for a detailed description). In total, our data sample spans 1945-2018 and consists of 655,386 publications derived from 9,121 distinct Scopus Author profiles, to which we apply the following variable definitions and subscript conventions to capture both article- and scholar-level information. At the article level, subscript $p$ indicates publication-level information such as publication year, $y_p$; the number of coauthors, $k_p$; and the number of keywords, $w_p$. Regarding the temporal dimension, a superscript $>$ (respectively, $<$) indicates data belonging to the 5-year "post" period 2014-2018 (5-year "pre" period 2009-2013), while $N(t)$ represents the total number of articles published in year $t$. Regarding proxies for scientific impact, we obtained the number of citations $c_{p,t}$ from Scopus, which are counted through late 2019. Since nominal citation counts suffer from systematic temporal bias, we use a normalized citation measure, denoted by $z_p$ (see Methods: Normalization of Citation Impact). Regarding author-level information, we use the index $a$; e.g., we denote the academic age, measured in years since a scholar's first publication, by $\tau_{a,p}$.
To address RQ1 we classified research according to three category systems indicative of topical, disciplinary and regional clusters. The first category system captures research topic clusters grouped into Subject Areas (SA); counts for each article are represented by a vector with 6 elements, $\vec{SA}_p$. The variable $N_{SA,p}$ counts the total number of SA categories present in a given article, with min value 1 and max value 6.
The second taxonomy identifies disciplinary clusters determined by author departmental affiliation, which we categorized according to Classification of Instructional Program (CIP) codes. Article-level CIP category counts are represented by $\overrightarrow{CIP}_p$, with 9 elements pertaining to the following categories: (1) Neurosciences, (2) Biology, (3) Psychology, (4) Biotech. & Genetics, (5) Medical Specialty, (6) Health Sciences, (7) Pathology & Pharmacology, (8) Engineering & Informatics, and (9) Chemistry & Physics & Math. The variable $N_{CIP,p}$ counts the total number of CIP categories present in a given article, with min value 1 and max value 9; Methods and SI Appendix S1 offer more details.
The third taxonomy captures the broad regional scope of each research article team, determined by each Scopus author's affiliation location and represented by the vector $\vec{R}_p$, which has 4 elements representing North America, Europe, Australasia, and rest of World. See Fig. S1 for the composition of SA and CIP clusters, and SI Appendix S1 for additional description of how these classification systems are constructed. Figure S2 (Fig. S3) shows the frequency of each SA (CIP) category and the pairwise frequency of all {SA, SA} ({CIP, CIP}) combinations over the 10-year period centered on 2014, along with their relative changes after 2014; see SI Appendix S2-S3 for discussion of the relevant changes in SA and CIP categories after 2014.
We represent the collection of article features by $\vec{F}_p$. As indicated in Fig. 1, based upon the distribution of types tabulated as counts across vector elements, an article is either cross-domain, representing a diverse mixture of types denoted by $X$; or mono-domain, denoted by $M$. We use a generic operator notation to specify how articles are classified as $X$ or $M$: $O(\vec{F}_p) = X$ or $M$. The objective criterion of the feature operator $O$ is specified by its subscript: for example, $O_{SA}(\vec{F}_p)$ yields one of two values, $X_{SA}$ or $M$; similarly, $O_{CIP}(\vec{F}_p) = X_{CIP}$ or $M$. Note that all scholars map onto a single CIP, hence solo-authored research articles are by definition classified by $O_{CIP}$ as $M$. While we acknowledge that it is possible for a scholar to have significant expertise in two or more domains, we do not account for this duality, as it is likely to occur at the margins; hence, the home department CIP represents the scholar's principal domain of expertise. We also classify articles featuring both $X_{SA}$ and $X_{CIP}$ as $O_{SA\&CIP}(\vec{F}_p) = X_{SA\&CIP}$ (and otherwise $M$).
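As a minimal sketch of the operator notation above, the following Python snippet classifies an article as cross-domain (X) or mono-domain (M) from its category counts. The function names `classify` and `classify_joint` and the example count vectors are hypothetical, and the criterion used here (two or more categories present) is an assumption corresponding to the most generic cross-domain case; it is not the authors' implementation.

```python
from typing import Sequence


def classify(counts: Sequence[int], label: str) -> str:
    """Return 'X_<label>' if two or more categories are present, else 'M'."""
    n_present = sum(1 for c in counts if c > 0)
    return f"X_{label}" if n_present >= 2 else "M"


def classify_joint(sa_counts: Sequence[int], cip_counts: Sequence[int]) -> str:
    """Return 'X_SA&CIP' only when the article is cross-domain in both SA and CIP."""
    is_x_sa = classify(sa_counts, "SA") == "X_SA"
    is_x_cip = classify(cip_counts, "CIP") == "X_CIP"
    return "X_SA&CIP" if (is_x_sa and is_x_cip) else "M"


# Hypothetical article: MeSH terms from SA 1 and SA 4, all authors from CIP 1.
sa_p = [2, 0, 0, 1, 0, 0]            # counts over the 6 SA categories
cip_p = [3, 0, 0, 0, 0, 0, 0, 0, 0]  # counts over the 9 CIP categories

sa_class = classify(sa_p, "SA")            # cross-topic
cip_class = classify(cip_p, "CIP")         # mono-discipline
joint_class = classify_joint(sa_p, cip_p)  # not cross-domain in both
```

In this toy case the article combines topics without combining disciplines, which is the pattern the text refers to as a convergence shortcut.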
To complement these categorical measures, we also developed a scalar measure of an article's cross-domain diversity (see Materials & Methods: Measuring cross-domain diversity for additional details). By way of example, consider the vector $\vec{SA}_p$ (or $\overrightarrow{CIP}_p$), which tallies the SA (or CIP) counts for a given article $p$ published in year $t$. We apply the outer tensor product $\vec{SA}_p \otimes \vec{SA}_p$ (or $\overrightarrow{CIP}_p \otimes \overrightarrow{CIP}_p$) to represent all pairwise co-occurrences in a weighted matrix $D_p(\vec{v}_p)$ (where $\vec{v}_p$ represents a generic category vector; see SI Appendix S4 for examples of the outer tensor product). The sum of the elements in this co-occurrence matrix is normalized to unity so that each $D_p(\vec{v}_p)$ contributes equally to averages computed across all articles from a given year or period. Since the off-diagonal elements represent cross-domain combinations, their relative weight, given by $f_{D,p} = 1 - \mathrm{Tr}(D_p) \in [0, 1)$, is a straightforward Blau-like measure of variation and disparity (Harrison & Klein 2007).
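The diversity measure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: it normalizes the full symmetric outer product, whereas the Methods and SI Appendix S4 may normalize a different form (for example, only the upper-triangular part), which would change the numerical values. The function names and example vectors are hypothetical.

```python
from typing import List, Sequence


def cooccurrence_matrix(v: Sequence[int]) -> List[List[float]]:
    """Outer product of v with itself, normalized so all elements sum to 1."""
    total = sum(v)
    if total <= 0:
        raise ValueError("category vector must contain at least one count")
    norm = float(total * total)  # sum over i, j of v_i * v_j equals (sum v)^2
    return [[vi * vj / norm for vj in v] for vi in v]


def cross_domain_diversity(v: Sequence[int]) -> float:
    """Blau-like diversity: one minus the trace of the normalized matrix."""
    d = cooccurrence_matrix(v)
    trace = sum(d[i][i] for i in range(len(v)))
    return 1.0 - trace


# Hypothetical SA count vectors with 6 elements each.
mono = cross_domain_diversity([3, 0, 0, 0, 0, 0])   # single category
mixed = cross_domain_diversity([1, 0, 0, 1, 0, 0])  # two categories, equal weight
```

Under this normalization a mono-domain article has zero diversity, and the value approaches but never reaches 1 as counts spread evenly over more categories, consistent with the half-open interval stated in the text.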

Descriptive Analysis
Increasing prevalence of cross-domain science. With the continuing shift towards large team science (Milojevic 2014; Pavlidis et al 2014; Petersen et al 2014; Wuchty et al 2007), one might expect a similar shift in the multiplicity of domains spanned by modern research teams, but to what degree? Figure 2(A) addresses RQ2 by showing the frequencies of mono-domain ($M$) research articles versus cross-domain articles ($X$) in our HBS sample. Articles were separated into above- and below-average citation impact ($z$) for each publication-year cohort ($t$), and within each of these two subsets we calculated the fraction $f_\#(t|z)$ of articles containing combinations across # = 1, 2, 3 and 4 categories. The fraction of mono-domain articles is trending downward, which we observe for both research topics (SA) and authors' disciplinary affiliations (CIP). The decline is much steeper for SA than for CIP. Correspondingly, cross-domain articles have become increasingly prevalent, in particular for SA. For both SA and CIP, two-category mixtures are more frequent than three-category mixtures, which in turn are more frequent than four-category mixtures. Accordingly, in the sections that follow we do not distinguish between cross-domain articles with different #.
As a first indication of the comparative advantage associated with $X$, we observe a robust inequality $f_\#(t|z > 0) > f_\#(t|z < 0)$ for cross-domain research ($\# \geq 2$), meaning a higher frequency of cross-domain combinations is observed among articles with higher impact. Contrariwise, in the case of mono-domain research the opposite phenomenon occurs, $f_1(t|z > 0) < f_1(t|z < 0)$. Taking into consideration temporal trends, these robust patterns indicate a faster depletion of impactful mono-domain articles, coincident with an increased prevalence of impactful research drawing upon integrative recombinant innovation.
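To make the tabulation behind the fractions of articles by number of categories concrete, the sketch below computes them for one publication year, separately for the above- and below-average citation subsets. The record layout, the function name `category_fractions`, and the toy data are hypothetical; assigning articles with a normalized citation value of exactly zero to the below-average subset is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (publication year, normalized citation impact z, number of categories #)
Article = Tuple[int, float, int]


def category_fractions(
    articles: Iterable[Article], year: int, above_average: bool
) -> Dict[int, float]:
    """Fraction of articles with each number of categories, within one impact subset."""
    counts = Counter(
        n for (t, z, n) in articles if t == year and (z > 0) == above_average
    )
    total = sum(counts.values())
    return {n: c / total for n, c in counts.items()} if total else {}


# Toy data for a single year; values are illustrative only.
toy = [
    (2015, 0.8, 2), (2015, 1.1, 1), (2015, 0.2, 2),
    (2015, -0.3, 1), (2015, -0.9, 1), (2015, -0.5, 2),
]
high = category_fractions(toy, 2015, above_average=True)
low = category_fractions(toy, 2015, above_average=False)
```

In this toy sample the two-category fraction is larger in the high-impact subset than in the low-impact subset, which is the direction of the inequality reported in the text.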
Recombinant innovation at the convergence nexus. Comprehensive analysis of biomedical science indicates that convergence has largely been mediated around the integration of modern techno-informatics capabilities (Yang et al 2021). Yet within any domain, in particular HBS, the question remains as to the development of a functional nexus that sustains and possibly even accelerates high-impact discovery by both expanding the number of possible functional expertise configurations and supporting rich cross-disciplinary exchange of new knowledge and best practices. The robust inequality $f_\#(t|z > 0) > f_\#(t|z < 0)$ provides support at the aggregate level, but does not lend any structural evidence.
To further address RQ2, Fig. 2(B) illustrates the composition of the HBS convergence nexus, showing integration of cross-disciplinary expertise across three broad yet distinct biomedical domains. Shown are the populations of HBS researchers by region, represented as collaboration networks compared over two non-overlapping 10-year intervals to indicate dynamics. Each node represents a researcher, colored according to three disciplinary CIP superclusters: (i) neurobiological sciences (corresponding to CIP 1-4), (ii) health sciences (CIP 5-7), and (iii) engineering & information sciences. Node locations are fixed to facilitate visual representation of network densification. Inter- and cross-regional comparison points to the emergence and densification of cross-domain interfaces (see also Fig. S4). Because the network layout is determined by the underlying structure, there is a high degree of clustering by node color, emphasizing both the relative sizes of the subpopulations, which are well balanced across region and time, and the convergent interfaces where cross-disciplinary collaboration and knowledge exchange are likely to be catalyzed. As such, these communities of expertise conjure the image of a Pólya urn, whereby successful configurations reinforce the adoption of similar configurations.
FIG. 2: Trends in cross-domain scholarship in Human Brain Science. (A) Fraction $f_\#(t|z)$ of articles published each year $t$ that feature a particular number (#) of categories. Articles are split into an above-average citation subset ($z_p > 0$) and a below-average citation subset ($z_p < 0$). Upper panel: Articles categorized by SA. Middle panel: Articles categorized by CIP; subpanel shows data on logarithmic y-axis. Lower panel: Articles categorized by both SA and CIP. Distinguishing frequencies by citation group indicates higher levels of cross-domain combinations among research articles with higher scientific impact, for both SA and CIP. However, cross-domain activity levels are visibly higher for SA than for CIP, indicating higher barriers to boundary-crossing arising from mixing different scholar expertise. (B) Snapshots of the collaboration network at 10-year intervals indicating researcher population sizes by region, and the densification of convergence science at cross-disciplinary interfaces. Nodes (researchers) are sized according to the number of collaborators (link degree) within each time window.
The links that span disciplinary boundaries are fundamental conduits across which scientists' strategic affinity for exploration (Foster et al 2015; Rotolo & Messeni Petruzzelli 2013) is effected via cross-disciplinary collaboration that brings "together distinctive components of two or more disciplines" (Nissani 1995; Petersen et al 2018). Our analysis of cross-disciplinary collaboration indicates that the fraction of articles featuring convergent collaboration has continued to grow over the last two decades (see Fig. S4). In what follows we further distinguish between integration across neighboring (Leahey & Moody 2014) and distant domains, with the latter appropriately representing convergence (National Research Council 2005, 2014; Roco et al 2013).
Cross-domain convergence of expertise (CIP) and knowledge (SA). In the context of the bureaucratic structure-function problem, team assembly should be optimized by strategically matching scholarly expertise and research topics to address the particular demands of a particular challenge. Hence, with 9 different disciplinary (CIP) domains historically faced with a variety of challenges, RQ3 addresses to what degree these domains differ in terms of their composition of targeted SA. Fig. 3(A) illustrates the evolution of topical diversity within and across each CIP cluster, revealing several common patterns. First, nearly all domains show a reduction in research pertaining to structure (SA 2), with the exception of Biotechnology & Genetics, which was oriented around the structure-function problem from the outset. As such, this domain features a steady balance between SA 2-5, while being an early adopter of techno-informatics concepts and methods (SA 6). Early balance around the innovation triple-helix (Petersen et al 2016) may explain to some degree the longstanding success of the genomics revolution, as the core disciplines of biology and computing were primed for a fruitful union (Petersen et al 2018). Other HBS disciplinary clusters are also integrating techno-informatic capabilities, reflecting a widespread pattern observed across all of biomedical science (Yang et al 2021).
Which CIP-SA combinations are overrepresented in boundary-crossing HBS research? Inasmuch as mono-domain articles identify the topical boundary closely associated with individual disciplines, cross-domain articles are useful for identifying otherwise obscured boundaries that call for both $X_{CIP}$ and $X_{SA}$ in combination. We identified these novel CIP-SA relations by collecting articles that are purely mono-domain for both CIP and SA (i.e., those with $O_{CIP}(\vec{F}_p) = O_{SA}(\vec{F}_p) = M$) and a complementary non-overlapping subset of articles that are simultaneously cross-domain for both CIP and SA (i.e., $O_{SA\&CIP}(\vec{F}_p) = X_{SA\&CIP}$).
Starting with mono-domain articles, we identified the SA that are most frequently associated with each CIP category. Formally, this amounts to calculating the bi-partite network between CIP and SA, denoted by $M_{CIP}M_{SA}$. These CIP-SA associations are calculated by averaging the $\vec{SA}_p$ for mono-domain articles from each CIP category, given by $\vec{SA}_{CIP}$. Figure 3(B) highlights the most prominent CIP-SA links (see SI Appendix S5 for more details). Likewise, we also calculated the bi-partite network $X_{CIP}X_{SA}$ using the subset of $X_{SA\&CIP}$ articles.
To identify the cross-domain frontier, we calculated the network difference $\Delta_{XM} \equiv X_{CIP}X_{SA} - M_{CIP}M_{SA}$, and plot the links with positive values, i.e., CIP-SA links that are over-represented in $X_{CIP}X_{SA}$ relative to $M_{CIP}M_{SA}$. Results identify SA that are reached by way of cross-disciplinary teams. SA 2 (Anatomy and Organisms) and 3 (Phenomena & Processes), representing the structure-function problem, stand out as a potent convergence nexus accessible by teams combining disciplines 1, 2, 4 and 9.
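The difference step can be illustrated with a short Python sketch. It assumes that both bi-partite networks are already available as dictionaries mapping a (CIP, SA) pair to a link weight; how those weights are computed for cross-disciplinary articles is described in SI Appendix S5 and is not reproduced here. The function name and all weights below are hypothetical toy values.

```python
from typing import Dict, Tuple

Link = Tuple[int, int]  # (CIP category index, SA category index)


def overrepresented_links(
    x_net: Dict[Link, float], m_net: Dict[Link, float]
) -> Dict[Link, float]:
    """Return links whose weight is larger in the cross-domain network."""
    links = set(x_net) | set(m_net)
    delta = {k: x_net.get(k, 0.0) - m_net.get(k, 0.0) for k in links}
    return {k: d for k, d in delta.items() if d > 0}


# Toy weights (illustrative only, not taken from the paper).
m_net = {(1, 1): 0.50, (1, 3): 0.25, (8, 6): 0.75}
x_net = {(1, 1): 0.25, (1, 3): 0.50, (8, 2): 0.25, (8, 6): 0.50}

frontier = overrepresented_links(x_net, m_net)
```

Only links with a positive difference are retained, matching the plotting rule stated in the text; links absent from one network are treated as having zero weight there, which is an additional assumption of this sketch.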
A related key insight concerns the relative increase in SA integration achieved by increased CIP diversity. Figure S5 compares the average number of SA integrated by teams with varying number of distinct CIP, $N_{CIP,p}$. On average, mono-disciplinary teams ($N_{CIP,p} = 1$) span 2.2 SA, whereas teams with $N_{CIP,p} = 3$ span 19% more SA, confirming that cross-disciplinary configurations are functional in achieving research breadth.

Quantitative Model
Trends in cross-domain activity.To address the temporal and geographic parity associated with RQ4, we define three types of cross-domain configurations -Broad, Neighboring, and Distant -defined according to a particular combination of SA and CIP categories featured by a given article.
Broad is the most generic cross-domain configuration, based upon combinations of any two or more SA (or CIP) categories, and represented by our operator notation as $X_{SA}$, $X_{CIP}$ and $X_{SA\&CIP}$ (and otherwise $M$).
Neighboring is the X configuration that captures the neuro-psychological ↔ bio-medical interface, representing articles that contain MeSH from SA (1) and also from SA (2, 3 or 4), represented summarily as $X_{\mathrm{Neighboring},SA}$, with analogous notation $X_{\mathrm{Neighboring},CIP}$ and $X_{\mathrm{Neighboring},SA\&CIP}$ (and otherwise $M$).
Distant is the X configuration that captures the neuro-psycho-medical ↔ techno-informatic interface. The specific sets of category combinations representing this configuration are SA [1-4] × [5,6]; and for CIP, [1,3,5] × [4,8]. As above, articles featuring categories that span these sets are represented by $X_{\mathrm{Distant},SA}$, $X_{\mathrm{Distant},CIP}$ and $X_{\mathrm{Distant},SA\&CIP}$, whereas articles that do not belong to a counterfactual set indicated by $M$. By way of example, the bottom of Figure 1 illustrates an article combining SA 1 and 4, which is thereby classified as both $X_{SA}$ and $X_{\mathrm{Neighboring},SA}$; and an article featuring CIP 1, 3, 5 and 8, which is thereby both $X_{CIP}$ and $X_{\mathrm{Distant},CIP}$.
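The Neighboring and Distant definitions can be expressed as a simple set test, sketched below in Python. The category sets are taken from the text (SA 1 versus SA 2-4 for Neighboring; SA 1-4 versus SA 5-6 and CIP 1, 3, 5 versus CIP 4, 8 for Distant); the Neighboring sets for CIP are not stated in this section and are therefore omitted. The names `Interface` and `spans` are hypothetical.

```python
from typing import Set, Tuple

Interface = Tuple[Set[int], Set[int]]  # the two sides of a cross-domain interface

NEIGHBORING_SA: Interface = ({1}, {2, 3, 4})
DISTANT_SA: Interface = ({1, 2, 3, 4}, {5, 6})
DISTANT_CIP: Interface = ({1, 3, 5}, {4, 8})


def spans(present: Set[int], interface: Interface) -> bool:
    """True when the article has at least one category on each side of the interface."""
    side_a, side_b = interface
    return bool(present & side_a) and bool(present & side_b)


# The two examples given in the text (bottom of Figure 1).
sa_present = {1, 4}          # article combining SA 1 and 4
cip_present = {1, 3, 5, 8}   # article featuring CIP 1, 3, 5 and 8

is_neighboring_sa = spans(sa_present, NEIGHBORING_SA)
is_distant_sa = spans(sa_present, DISTANT_SA)
is_distant_cip = spans(cip_present, DISTANT_CIP)
```

Applied to the two examples from the text, the first article spans the Neighboring SA interface but not the Distant SA interface, and the second spans the Distant CIP interface.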
To complement these categorical variables, we also developed a Blau-like measure of cross-domain diversity, given by $f_{D,p}$ (see Methods: Measuring cross-domain diversity). Figure 4 shows the trends in mean diversity $f_D(t)$ for the Broad, Neighboring, and Distant configurations. For each configuration we provide a schematic motif illustrating the combinations measured by $D_p(\vec{v}_p)$, with diagonal components representing mono-domain articles (indicated by 1 on the matrix diagonal) and upper-diagonal elements capturing cross-domain combinations (indicated by X). Comparing SA and CIP, there are higher diversity levels for SA, in addition to a prominent upward trend. In terms of CIP, Fig. 4(A) indicates a decline in Broad diversity in recent years, with North America (NA) showing higher levels than Europe (EU) and Australasia (AA); these general patterns are also evident for Neighboring diversity, see Fig. 4(B). Distant CIP diversity shown in Fig. 4(C) indicates a recent decline for AA and NA, with NA peaking around 2009; contrariwise, EU shows a steady increase consistent with the computational framing of the Human Brain Project.
In contradistinction, all three regions show a steady increase irrespective of configuration in the case of SA diversity, consistent with scholars integrating topics without integrating scholarly expertise, possibly owing to differential costs associated with each. For both Broad and Neighboring configurations, NA and EU show remarkably similar levels of SA diversity above AA; however, in the case of Neighboring, AA appears to be catching up quickly since 2010, see Fig. 4(D,E). In the case of Distant, all regions show a steady increase that appears to be in lockstep for the entire period. See Figs. S6-S7 and SI Appendix Text S6 for trends in SA and CIP diversity across additional configurations.
Regression model: propensity for and impact of X. To address RQ5, we constructed article-level and author-level panel data to facilitate measuring factors relating to SA and CIP diversity and shifts related to the ramp-up of HBS flagship projects circa 2013 around the globe. To address these two outcomes, we modeled two dependent variables separately. In the first model the dependent variable is the propensity for cross-domain research (indicated by $X$; depending on the focus around topics, disciplines or both, $X$ is specified by $X_{SA}$, $X_{CIP}$ or $X_{SA\&CIP}$). We use a Logit specification to model the likelihood $P(X)$. In the second model the dependent variable is the article's scientific impact, proxied by $c_p$. Building on previous efforts (Petersen 2018; Petersen et al 2018), we apply a logarithmic transform to $c_p$ that facilitates removing the time-dependent trend in the location and scale of the underlying log-normal citation distribution (Radicchi et al 2008) (see Methods: Normalization of Citation Impact). Figure S9 shows the covariation matrix between the principal variables of interest.
Model A: Quantifying the propensity for X and the role of funding. As defined, $O(\vec{F}_p) = X$ or $M$ is a two-state outcome variable with complementary likelihoods, $P(X) + P(M) = 1$. Thus, we apply logistic regression to model the odds $Q \equiv P(X)/P(M)$, measuring the propensity to adopt cross-domain configurations. We then estimate the annual growth in $P(X)$ by modeling the odds as $\log(Q_p) = \beta_0 + \beta_y y_p + \vec{\beta} \cdot \vec{x}$, where $\vec{x}$ represents the additional controls for confounding sources of variation, in particular increasing $k_p$ associated with the growth of team science (Milojevic 2014; Wuchty et al 2007). See SI Appendix Text S7, in particular Eqns. (S2)-(S4), for the full model specification; and Tables S1-S3 for parameter estimates.
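As a worked illustration of the logit specification, the Python sketch below converts between probabilities, odds and log-odds, and shows how a year coefficient translates into a percent change in the odds per year. The coefficient values are hypothetical and chosen only for illustration; the exact conversion the authors use to report annual growth is given in the SI and is not assumed here.

```python
import math


def odds(p_x: float) -> float:
    """Odds Q = P(X) / P(M), with P(M) = 1 - P(X)."""
    if not 0.0 <= p_x < 1.0:
        raise ValueError("P(X) must lie in [0, 1)")
    return p_x / (1.0 - p_x)


def probability(log_odds: float) -> float:
    """Invert the logit link: P(X) from log(Q)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def log_odds(year: float, beta_0: float, beta_y: float, controls: float = 0.0) -> float:
    """Linear predictor beta_0 + beta_y * year + (summed contribution of controls)."""
    return beta_0 + beta_y * year + controls


def annual_odds_growth_pct(beta_y: float) -> float:
    """Percent change in the odds associated with a one-year increase."""
    return 100.0 * (math.exp(beta_y) - 1.0)


# Hypothetical coefficients; year measured relative to an arbitrary reference year.
beta_0, beta_y = -1.0, 0.03
q_now = math.exp(log_odds(10, beta_0, beta_y))
q_next = math.exp(log_odds(11, beta_0, beta_y))
growth = annual_odds_growth_pct(beta_y)  # equals 100 * (q_next / q_now - 1)
```

Because the model is linear in the log-odds, the year-over-year ratio of odds is constant and equal to the exponential of the year coefficient, regardless of the baseline or the controls.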
Summary results shown in Fig. 5(A) indicate a roughly 3% annual growth in $P(X_{SA})$, consistent with descriptive trends shown in Fig. 2. In contradistinction, growth rates for $P(X_{CIP})$ are generally smaller, indicative of the additional barriers to integrating individual expertise as opposed to just combining different research topics. In the case of $P(X_{SA\&CIP})$, the growth rate is higher for Distant, where the need for cross-disciplinary expertise cannot be short-circuited as easily as in Neighboring.
A relevant dimension of RQ5 is how HBS projects have altered the propensity for $X$. Hence, we added an indicator variable $I_{2014+}$, which takes the value 1 for articles with $y_p \geq 2014$ and 0 otherwise. Figure 5(B) indicates a significant decline in $P(X)$ for $X_{CIP}$ and $X_{SA\&CIP}$ for each configuration, on the order of -30%; this result is consistent with the recent increase in $f_1(t|z)$ visible in Fig. 2(B).
Model B: Quantifying the citation premium associated with X and funding. We model the normalized citation impact $z_p = \alpha_a + \gamma_{X_{SA}} I_{X_{SA},p} + \gamma_{X_{CIP}} I_{X_{CIP},p} + \vec{\beta} \cdot \vec{x}$, where $\vec{x}$ represents the additional control variables and $\alpha_a$ represents an author fixed-effect to account for unobserved time-invariant factors specific to each researcher. The primary test variables are $I_{X_{SA},p}$ and $I_{X_{CIP},p}$, two binary factor variables: $I_{X_{SA},p}$ takes the value 1 if article $p$ features $X_{SA}$ and 0 otherwise, and $I_{X_{CIP},p}$ is defined similarly for CIP. To distinguish estimates by configuration, for Neighboring we specify $I_{X_{Neighboring,SA}}$ and $I_{X_{Neighboring,CIP}}$, with similar notation for Distant. Full model estimates are shown in Tables S4-S5.
Figure 5(C) summarizes the model estimates ($\gamma_{X_{SA}}$, $\gamma_{X_{CIP}}$ and $\gamma_{X_{SA\&CIP}}$) quantifying the citation premium attributable to X. To translate the effect on $z_p$ into the associated citation premium in $c_p$, we calculate the percent change $100\Delta c_p/c_p$ associated with a shift in $I_{X,p}$ from 0 to 1. Observing that $\sigma_t \approx \langle\sigma\rangle = 1.24$ is approximately constant over the period 1970-2018, and due to the property of logs, the citation percent change is given by $100\Delta c_p/c_p \approx 100\,\langle\sigma\rangle\,\gamma_X$ (see SI Appendix S7B).
Our results indicate a robust, statistically significant positive relationship between cross-disciplinarity ($X_{CIP}$) and citation impact, consistent with the effect size in a different case study of the genomics revolution (Petersen et al 2018), which supports the generalizability of our findings to other convergence frontiers. To be specific, we calculate an 8.6% citation premium for the Broad configuration ($\gamma_{X_{CIP}} = 0.07$; p < 0.001), meaning that the average cross-disciplinary publication is more highly cited than the average mono-disciplinary publication. We calculate a smaller 5.9% citation premium associated with $X_{SA}$ ($\gamma_{X_{SA}} = 0.05$; p < 0.001). Yet the effect associated with articles featuring $X_{CIP}$ and $X_{SA}$ simultaneously is considerably larger (16% citation premium; $\gamma_{X_{SA\&CIP}} = 0.13$; p < 0.001), suggesting an additive effect.
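The conversion from a coefficient on $z_p$ to a percent citation premium can be checked with a few lines of arithmetic. The sketch below uses the linearized relation $100\,\langle\sigma\rangle\,\gamma$ with $\langle\sigma\rangle = 1.24$, as implied by the reported values, and for comparison an exact exponential form that we assume follows from the $\ln(c+1)$ transform. The function names are hypothetical, and small differences from the reported premiums are expected because the published coefficients are rounded.

```python
import math

SIGMA = 1.24  # approximately constant std. dev. of ln(c + 1), as reported in the text

def citation_premium_pct(gamma, sigma=SIGMA):
    """Linearized percent change in citations for a 0 -> 1 shift in the indicator."""
    return 100.0 * sigma * gamma

def citation_premium_exact_pct(gamma, sigma=SIGMA):
    """Percent change in (c + 1) implied by a shift of gamma in z (assumed exact form)."""
    return 100.0 * (math.exp(sigma * gamma) - 1.0)

if __name__ == "__main__":
    # Rounded coefficient estimates quoted in the text for the Broad configuration.
    for label, gamma in [("X_CIP", 0.07), ("X_SA", 0.05), ("X_SA&CIP", 0.13)]:
        print(label, round(citation_premium_pct(gamma), 1),
              round(citation_premium_exact_pct(gamma), 1))
```

The linear form slightly understates the exact one for positive coefficients, and the gap grows with the size of the coefficient.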
Comparing results for the Neighboring configuration to the baseline estimates for Broad, the citation premium is relatively larger for $X_{SA}$ (11% citation premium; $\gamma_{X_{Neighboring,SA}} = 0.088$; p < 0.001) and roughly the same for $X_{CIP}$ and $X_{SA\&CIP}$. This result reinforces our findings regarding the convergence "short-cut" (when $X_{CIP}$ is absent), indicating that this approach is more successful when integrating domain knowledge across shorter distances, consistent with innovation theory (Fleming 2001).
The configuration most representative of convergence is Distant, which compared to Broad and Neighboring features a smaller effect size for $X_{SA\&CIP}$ (5.2% citation premium; $\gamma_{X_{Distant,SA\&CIP}} = 0.04$; p < 0.001). The reduction in $\gamma_{X_{Distant,SA\&CIP}}$ relative to values for the Broad and Neighboring configurations likely reflects the challenges of bridging communication, methodological and theoretical gaps across the Distant neuro-psycho-medical ↔ techno-informatic interface. More interestingly, this configuration is distinguished by a negative $X_{SA}$ estimate, indicating that the convergence shortcut yields less-impactful research than mono-domain research. Nevertheless, it is notable that for this convergent configuration there is a clear hierarchy indicating the superiority of cross-disciplinary collaboration approaches to integrating research across distant domains.
As in the Article-level model, we also tested for shifts in the citation premium attributable to the advent of Flagship HBS project funding using a similar difference-in-difference (DiD) approach. Figure 5(D) shows the citation premium $\gamma_{X_{SA\&CIP}}$ for articles published prior to 2014, and the difference $\delta_{X+}$ corresponding to the added effect for articles published after 2013. For Broad and Neighboring we observe $\delta_{X+} < 0$, indicating a reduced citation premium for post-2013 research. By way of example for the Broad configuration: whereas cross-domain articles published prior to 2014 show a 19% citation premium ($\gamma_{X_{SA\&CIP}} = 0.15$; p < 0.001), those published after 2013 have just a 19% - 11% = 8% citation premium ($\delta_{X_{SA\&CIP}+} = -0.09$; p < 0.001). The reduction of the citation premium is even larger for Neighboring ($\delta_{Neighboring,X_{SA\&CIP}+} = -0.16$; p < 0.001). Yet for Distant, we observe a different trend: research combining both $X_{SA}$ and $X_{CIP}$ simultaneously has an advantage over research with just $X_{CIP}$ or $X_{SA}$, in that order ($\delta_{Distant,X_{SA\&CIP}+} = 0.04$; p = 0.016; 95% CI = [0.01, 0.08]).
We briefly summarize coefficient estimates for the other control variables. Consistent with prior research on cross-disciplinarity (Petersen et al 2018), we observe a positive relationship between team size and citation impact ($\beta_k = 0.415$; p < 0.001), which translates to a $\langle\sigma\rangle\beta_k \approx 0.5\%$ increase in citations associated with a 1% increase in team size (since $k_p$ enters in log in our specification). We also observe a positive relationship for topical breadth ($\beta_w = 0.03$; p < 0.001), which translates to a much smaller $\langle\sigma\rangle\beta_w \approx 0.04\%$ increase in citations associated with a 1% increase in the number of major MeSH headings. Finally, regarding the career lifecycle, we observe a negative relationship with increasing career age ($\beta_\tau = -0.011$; p < 0.001), consistent with prior studies (Petersen et al 2018), translating to a $100\langle\sigma\rangle\beta_\tau \approx -1.3\%$ change in $c_p$ associated with every additional career year. See Tables S4-S5 for the full set of model parameter estimates.

Behind the Numbers
Further qualitative inspection of prominent research articles in this category identifies four key convergence themes associated with past or developing breakthroughs:
Magnetic Resonance Imaging (MRI). MRI technology has been instrumental in identifying structure-function relations in brain networks, and has reshaped brain research since the 1990s. As a method that involves both sophisticated technology and core brain expertise, MRI has been a focal point for $X_{Distant,SA\&CIP}$ scholarship. For example, ref. (Van Dijk et al 2012) addresses the problem of motion, a pernicious confounding factor that can invalidate MR brain results. Hence, this research article exemplifies how a fundamental problem threatening an entire line of research acts as an attractor of distant cross-disciplinary collaborations with an all-encompassing theme, including authors from CIP 5 (medical specialists) and CIP 8 (engineers and computer scientists), while thematically spanning four topical domains: SA 2 (Anatomy & Organisms), SA 3 (Phenomena & Processes), SA 5 (Techniques & Equipment), and SA 6 (Technology & Information Science).
Genomics. Following the completion of the Human Genome Project (HGP) in the early 2000s, genomics and biotechnology methods have established a foothold in brain research. This convergent frontier made headway in solving long-standing morbidity riddles and formulating novel therapies, e.g. providing a deeper understanding of the genetic basis of developmental delay (Cooper et al 2011) and developing treatment for glioblastoma using a recombinant poliovirus (Desjardins et al 2018). Both of these articles include authors from CIP 4 and 5; thematically, they cast a wide net, with the former spanning SA 1, 3, 4 and 5, while the latter covers SA 2, SA 4 and SA 5.
Robotics. In the early 2010s, neurally controlled robotic prostheses reached fruition by way of collaboration between neuroscientists (CIP 1) and biotechnologists (CIP 4). A prime example of this emerging bio-mechatronics frontier is research on robotic arms for tetraplegics (Hochberg et al 2012), which thematically covers all of SA 1-6.
Artificial Intelligence (AI) and Big Data. Following developments in machine learning (ML) capabilities, deep AI methods were brought to bear on MR data, pushing brain imaging towards more quantitative, accurate, and automated diagnostic methods. Research on brain lesion segmentation using Convolutional Neural Networks (CNN) (Kamnitsas et al 2017) is an apt example, produced by collaboration between medical specialists (CIP 5) and engineers (CIP 8), and spanning SA 2-4 and SA 6. Simultaneously, massive brain datasets combined with powerful AI engines made their appearance, along with methods to control noise and ensure their validity, as exemplified by ref. (Alfaro-Almagro et al 2018), produced by neuroscientists (CIP 1), health scientists (CIP 6), and engineers (CIP 8), and also featuring a nearly exhaustive topical scope (SA 2-6).
Altogether, case analysis indicates that $X_{Distant,SA\&CIP}$ products are characterized by significant SA integration, typically including 3-4 non-technical SA plus 1-2 technical SA. This thematic coverage exceeds the disciplinary bounds implied by the CIP set of the authors, which typically includes one non-technical CIP plus one technical CIP.

Discussion
In a highly competitive and open science system with multiple degrees of freedom, more than one operational mode is likely to emerge. To assess the different configurations that exist, we developed an {author discipline × research topic} classification that enables examination of several operational modes and their relative scientific impact.
Competing Convergence Modes: Our key result regards the identification and assessment of a prevalent convergence shortcut characterized by research combining different SA ($X_{SA}$) but not integrating cross-disciplinary expertise ($M_{CIP}$). Assuming the HBS ecosystem to be representative of other competitive science frontiers, our results suggest that the two operational modes of convergence evolve as substitutes rather than complements. Trends from the last five years indicate an increasing tendency for scholars to shortcut cross-disciplinary approaches, and instead integrate by way of expansive learning. This appears to be in tension with the intended mission of flagship HBS programs. Instead, our analysis provides strong evidence that the rise of expedient convergence tactics may be an unintended consequence of the race among teams to secure funding.
In order to provide timely assessment of convergence science, we addressed our fundamental RQ1 (how to measure convergence?) by developing a generalizable framework that differentiates between diversity in team expertise and research topics. While it is true that a widespread paradigm shift towards increasing team size has transformed the scientific landscape (Milojevic 2014; Wuchty et al 2007), this work challenges the prevalent view that larger teams are innately more adept at prosecuting cross-domain research. Indeed, convergence depends not only on team size but also on team composition. In reality, however, research teams targeting the class of hard problems calling for convergent approaches are faced with coordination costs and other constraints associated with crossing disciplinary and organizational boundaries (Cummings & Kiesler 2005, 2008; Van Rijnsoever & Hessels 2011). Consequently, teams are likely to economize in disciplinary expertise, and instead integrate cross-domain knowledge in part (or in whole) by way of polymathic generalists comfortable with the expansive learning approach. As a result, a team's composite disciplinary pedigree tends to be a subset of the topical dimensions of the problem under investigation.
As a consistency check, we also find this convergence shortcut to be more widespread in research involving topics that are epistemically close, as represented by the Neighboring configuration we analyzed. Contrariwise, in the neuro-psycho-medical ↔ techno-informatic interface, belonging to the Distant configuration, convergent cross-disciplinary collaboration runs strong. Perhaps not by serendipity, mixed analysis further indicates that this is exactly the configuration where transformative science has long been occurring.
Arguably, a certain degree of expansive learning is needed for multidisciplinary teams to operate in harmony. For example, in the case of a psychologist collaborating with a medical specialist, it would be ideal if each one knew a little bit about the other's field, so that they establish an effective knowledge bridge. After all, this is what transforms a multidisciplinary team into a cross-disciplinary team, such that convergence becomes operative. However, this approach is not the dominant trend in HBS (see the Article-level model), which is possibly a response to the broad and longstanding paradigm promoting interdisciplinarity (Nissani 1995) with less emphasis on cross-domain collaboration. Again using our simple example, it may be that the medical specialist prefers not to partner at all with psychologists in the prosecution of bi-domain research, i.e., opting for the streamlined substitutive strategy of total replacement over the strategy of partial redundancy, which comes with the risks associated with cross-disciplinary coordination.
A limitation of our framework is that we do not specify what task (e.g. analysis, conceptualization, writing) a given domain expert performed, and hence do not account for division of labor in the teams analyzed here. Indeed, recent work provides evidence that larger teams tend to have higher levels of task specialization (Haeussler & Sauermann 2020), which provides a promising avenue for future investigation, i.e., to provide additional clarity on how bureaucratization (Walsh & Lee 2015) offsets the recombinant uncertainty (Fleming 2001) associated with cross-disciplinary exploration. Another limitation regards the nuances of HBS programs that we do not account for, e.g. different grand objectives, funding levels and disciplinary framing, which vary across flagships. Yet as a truly multidisciplinary ecosystem, we believe HBS provides an ideal testbed for evaluating the prominence, interactions, and impact of the constitutional aspects of convergence (Eyre et al 2017; Grillner et al 2016; Jorgenson et al 2015; Quaglio et al 2017).
Our results also provide clarity regarding recent efforts to evaluate the role of cross-disciplinarity in the domain of genomics (Petersen et al 2018), where we used a similar scholar-oriented framework that did not incorporate the SA dimensions. One could argue that the cross-disciplinary citation premium reported in the genomics revolution arises simply from the genomics domain being primed for success. Indeed, Fig. 3(A) shows that HBS scholars in the Biotech. & Genetics discipline maintained high levels of SA diversity extending back to the 1970s. We do not observe similar patterns for other HBS sub-disciplines. Yet our measurement of a ∼16% citation premium for research featuring both modes ($X_{SA\&CIP}$) is remarkably similar in magnitude to the analogous measurement of a ∼20% citation premium reported in (Petersen et al 2018).
Econometric Analysis: In order to accurately measure shifts in the prevalence and impact of cross-domain integration, in addition to how they depend on the convergence mode, we employed an econometric regression specification that leverages author fixed-effects and accounts for research team size, in addition to a battery of other CIP and SA controls. Regarding the growth rate of HBS convergence science, Fig. 5(A) indicates that research integrating topics and disciplinary expertise is growing at roughly 2-4% annually relative to the mono-disciplinary baseline; however, this upward trend reversed after the ramp-up of HBS flagships, as indicated in Fig. 5(B). Our results also indicate that the citation impact of publications from polymathic teams ($X_{Neighboring,SA}$ and $X_{Distant,SA}$) is significantly lower than the impact of publications from more balanced cross-disciplinary teams ($X_{Neighboring,SA\&CIP}$ and $X_{Distant,SA\&CIP}$), see Fig. 5(C). On a positive note, a difference-in-difference strategy provides support that HBS research featuring the $X_{Distant,SA\&CIP}$ configuration has increased in citation impact following the ramp-up of HBS flagships, see Fig. 5(D).
There are various possible explanations to consider, most prominent of which is that the cognitive and resource demands required to address grand scientific challenges have outgrown the capacity of even monodisciplinary teams, let alone solo genius (Simonton 2013).
Reflecting upon these results together, it is somewhat troubling that the polymathic trend proliferates and competes with the gold standard, that is, configurations featuring a balance of cross-disciplinary teams and diverse topics ($X_{SA\&CIP}$). Counterproductively, flagship HBS projects appear to have incentivized expansive research strategies, manifest in a relative shift towards $X_{SA}$ since the ramp-up of flagship projects in 2014. This trend may depend upon the particular flagship's objective framing. Take for instance the US BRAIN Initiative, with the expressed aim to support multi-disciplinary mapping and investigation of dynamic brain networks. As such, its corresponding research goals promote the integration of Neighboring topics, where scientists with polymathic tendencies may feel more emboldened to short-circuit expertise. In addition, there are practical pressures associated with proposal calls. Another possible explanation, regarding team formation, is that it may be easier and faster for researchers to find collaborators from their own discipline when faced with the pressure to meet proposal deadlines. Additionally, funding levels are not unlimited, and bringing additional reputable specialists into the team comes with significant financial considerations. Hence, a natural avenue for future investigation is to test whether other convergence-oriented funding initiatives also unwittingly amplify such suboptimal teaming strategies.
Theoretical insights - expansive learning: Indeed, the polymathic trends described here pre-existed the flagship HBS projects, and so must have deeper roots. One hypothesis is that this trend represents an emergent scholarly behavior owing to efficient 21st century means to pursue new topics by way of expansive learning (Engeström & Sannino 2010), since the learning costs associated with certain tasks characterized by explicit knowledge have markedly decreased with the advent of the internet and other means of rapid high-fidelity communication. Indeed, many of the activity signals brought to the fore by this study bear the hallmarks of expansive learning. Perhaps the most telling signal is the propensity towards topically diverse publications (Fig. 4(D-F)), which largely stems from horizontal movements in the research focus of individual scientists rather than vertical integration among experts from different disciplines (Fig. 4(A-C)). The scientific system is increasingly interconnected, as evident from the densification of collaboration networks and emergent cross-disciplinary interfaces (Fig. 2(B)). These interfaces satisfy the conditions that are conducive to boundary crossing, especially with respect to research topics, which can act as structures facilitating "minimum energy" expansion (Toiviainen 2007). To this point, we also assessed whether the relationship between CIP diversity and SA integration depends on whether the configuration represents neighboring or distant domains. Analyzing the set of $X_{SA\&CIP}$ articles, we find that expansive integration is consistently most effective in Distant configurations, e.g. teams with $N_{CIP,p} = 3$ span roughly 32% more SA than their mono-disciplinary counterparts (Fig. S5(B)).
Policy Implications: Consistent also with other studies in expansive learning, actions taken by participants do not necessarily correspond to the intentions of the interventionists (Rasmussen & Ludvigsen 2009). The participants are brain scientists in this case, and the interventionists are the funding agencies and the scientific establishment at large. While the latter aim to promote research powered by true multidisciplinary teams, the former appear to prefer to shortcut around this ideal.
Policy makers and other decision-makers within the scientific commons are faced with the persistent challenge of efficient resource allocation, especially in the case of grand scientific challenges that foster aggressive timelines (Stephan 2012). The implicit uncertainty and risk associated with such endeavors is bound to affect reactive scholar strategies, and this interplay between incentives and behavior is just one source of complexity among many that underlie the scientific system (Fealing & eds. 2011).
To begin to address this issue, policies addressing the challenges of historical fragmentation in Europe offer guidance. European Research Council (ERC) funding programs have been powerful vehicles for integrating national innovation systems by way of supporting cross-border collaboration, brain-circulation and knowledge diffusion, yet with unintended outcomes that increase the burden of the challenge (Doria Arrieta et al 2017). To address this fragmentation, many major ERC collaborative programs require multinational partnerships as an explicit funding criterion. Motivated by the effectiveness of this straightforward integration strategy, convergence programs can include analogous cross-disciplinary criteria or review assessment to address the convergence shortcut. Such guidelines could help to align polymathic vs. cross-disciplinary pathways towards more effective cross-domain integration. Much like the vision for brain science (towards a more complete understanding of the emergent structure-function relation in an adaptive complex system), a better understanding of cross-disciplinary team assembly, among other team science considerations (Börner et al 2010), will be essential in other challenging frontiers calling on convergence.

Methods
Normalization of citation impact. We normalized each Scopus citation count, $c_{p,t}$, by leveraging the well-known log-normal properties of citation distributions (Radicchi et al 2008). To be specific, we grouped articles by publication year $y_p$, and removed the time-dependent trend in the location and scale of the underlying log-normal citation distribution. The normalized citation value is given by $z_p = [\ln(c_{p,t} + 1) - \mu_t]/\sigma_t$, where $\mu_t \equiv \langle \ln(c_t + 1) \rangle$ is the mean and $\sigma_t \equiv \sigma[\ln(c_t + 1)]$ is the standard deviation of the citation distribution for a given $t$; we add 1 to $c_{p,t}$ to avoid the divergence of $\ln 0$ associated with uncited publications, a common method which does not alter the interpretation of results.
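The normalization can be sketched in a few lines: group articles by publication year, take $\ln(c+1)$, and standardize within each year. This is a minimal sketch under stated assumptions: the function name `normalize_citations` and the toy records are hypothetical, and the population standard deviation is used because the text does not specify which estimator was applied.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

def normalize_citations(records):
    """Return z-scores of ln(c + 1), standardized within each publication year.

    records: list of (publication_year, citation_count) tuples.
    Assumption: population standard deviation (pstdev) within each year.
    """
    logs_by_year = defaultdict(list)
    for year, c in records:
        logs_by_year[year].append(math.log(c + 1))
    # Per-year location (mu_t) and scale (sigma_t) of the log-citation distribution.
    stats = {t: (mean(v), pstdev(v)) for t, v in logs_by_year.items()}
    z = []
    for year, c in records:
        mu, sigma = stats[year]
        # A year with no variation carries no information; map it to z = 0.
        z.append((math.log(c + 1) - mu) / sigma if sigma > 0 else 0.0)
    return z

if __name__ == "__main__":
    toy = [(2000, 0), (2000, 9), (2000, 99), (2010, 3), (2010, 30)]
    print(normalize_citations(toy))
```

Because the standardization is done separately for each year, older articles with more time to accrue citations are compared only against their own cohort.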
Figure S8(G) shows the probability distribution $P(z_p)$ calculated across all $p$ within five-year non-overlapping time periods. The resulting normalized citation measure is well-fit by the Normal $N(0, 1)$ distribution, independent of $t$, and thus is a stationary measure across time. Publications with $z_p > 0$ are thus above the average log citation impact $\mu_t$, and since they are measured in units of standard deviation $\sigma_t$, standard intuition and statistics of z-scores apply. The annual $\sigma_t$ value is rather stable across time, with average and standard deviation $\langle\sigma\rangle \pm SD = 1.24 \pm 0.09$ over the 49-year period 1970-2018.
Subject Area classification using MeSH. Each MeSH descriptor has a tree number that identifies its location within one of 16 broad categorical branches. We merged 9 of the science-oriented MeSH branches (A, B, C, E, F, G, J, L, N) into 6 Subject Area (SA) clusters (see Fig. 1). Figure S1 shows the 50 most prominent MeSH descriptors for each SA cluster. Hence, we take the set of MeSH for each $p$, denoted by $W_p$, and map these MeSH to the corresponding SA cluster (represented by the operator $O_{SA}$), yielding a count vector with six elements: $O_{SA}(W_p) = \vec{SA}_p$. The distribution $P(N_{SA})$ of the number of SA per publication shows that 72% of articles have two or more SA; the mean (median) number of SA per article is 2.1 (2), with standard deviation 0.97 and maximum 6.
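The mapping $O_{SA}$ from MeSH tree numbers to a six-element count vector can be sketched as a dictionary lookup on the leading branch letter. The branch-to-cluster assignment below is an assumption inferred from the SA names used in the text (the authoritative assignment is in Fig. 1), and the tree numbers in the example are illustrative placeholders rather than records from the dataset.

```python
# Assumed assignment of the 9 retained MeSH branches to the 6 SA clusters,
# inferred from the SA labels quoted in the text; see Fig. 1 for the actual one.
BRANCH_TO_SA = {
    "F": 1,            # Psychiatry & Psychology
    "A": 2, "B": 2,    # Anatomy & Organisms
    "G": 3,            # Phenomena & Processes
    "C": 4, "N": 4,    # Health
    "E": 5,            # Techniques & Equipment
    "J": 6, "L": 6,    # Technology & Information Science
}

def sa_count_vector(tree_numbers):
    """Map MeSH tree numbers (one per descriptor) to a 6-element SA count vector."""
    counts = [0] * 6
    for tree_number in tree_numbers:
        sa = BRANCH_TO_SA.get(tree_number[:1].upper())
        if sa is not None:          # descriptors from other branches are ignored
            counts[sa - 1] += 1
    return counts

def number_of_sa(counts):
    """N_SA: number of distinct SA clusters present in an article."""
    return sum(1 for x in counts if x > 0)

if __name__ == "__main__":
    example = ["A08.186", "E01.370", "L01.224", "D12.776"]  # illustrative only
    vec = sa_count_vector(example)
    print(vec, number_of_sa(vec))
```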
Disciplinary classification using CIP. We obtained host department information from each scholar's Scopus Profile. Based upon the information provided in the profile description, and in some cases using additional web search and data contained in the Scholar Plot web app (Majeti et al 2020), we manually annotated each scholar's home department name according to the National Center for Education Statistics Classification of Instructional Programs (CIP) codes. We then merged these CIP codes into 9 broad clusters and three super-clusters (Neuro/Biology, Health, and Science & Engineering, as indicated in Fig. 1); for a list of constituent CIP codes for each cluster see Fig. S1(C). Analogous to the notation for assigning $\vec{SA}_p$, we take the set of authors for each $p$, denoted by $A_p$, and map their individual departmental affiliations to the corresponding CIP cluster (represented by the operator $O_{CIP}$), yielding a count vector with nine elements: $O_{CIP}(A_p) = \vec{CIP}_p$.
Measuring cross-domain diversity. We developed a measure of cross-domain diversity defined according to categorical co-occurrence within individual research articles. Each article $p$ has a count vector $\vec{v}_p$: for discipline categories $\vec{v}_p \equiv \vec{CIP}_p$ and for topic categories $\vec{v}_p \equiv \vec{SA}_p$. We then measure article co-occurrence levels by way of the normalized outer-product
$$D_p(\vec{v}_p) \equiv \frac{U(\vec{v}_p \otimes \vec{v}_p)}{||U(\vec{v}_p \otimes \vec{v}_p)||}, \qquad (2)$$
where $\otimes$ is the outer tensor product and $U(G)$ is an operator yielding the upper-diagonal elements of the matrix $G$ (i.e. representing the undirected co-occurrence network among the categorical elements). In essence, $D_p(\vec{v}_p)$ captures a weighted combination of all category pairs. The resulting matrix represents dyadic combinations of categories as opposed to permutations (i.e., capturing the subtle difference between an undirected and directed network). While we did not explore it further, this matrix formulation may also give rise to higher-order measures of diversity associated with the eigenvalues of the outer-product matrix. The notation $||...||$ indicates the matrix normalization implemented by summing all matrix elements. The objective of this normalization scheme is to control for the variation in $\vec{v}_p$ in a systematic way. As such, this co-occurrence is an article-level measure of diversity which controls for variations in the total number of categories and different count statistics for elements belonging to $\vec{CIP}_p$ and $\vec{SA}_p$. Consequently, totaling $D_p(\vec{v}_p)$ across articles from a given publication year yields the total number of articles published in that year, $\sum_{p | y_p \in t} ||D_{p,t}|| = N(t)$.
We also define a categorical diversity measure for each article given by $f_{D,p} = 1 - \mathrm{Tr}(D_p) \in [0, 1)$, which corresponds to the sum of the off-diagonal elements in $D_p$. The average article diversity by publication year is denoted by $\langle f_D(t) \rangle$. In simple terms, articles featuring a single category have $f_{D,p} = 0$ whereas articles featuring multiple categories have $f_{D,p} > 0$. While the result of this approach is nearly identical to the standard diversity index $1 - \sum_i (v_{p,i}/\sum_j v_{p,j})^2$ (also referred to as the Gini-Simpson index), $f_{D,p}$ is motivated by way of dyadic co-occurrence rather than the standard formulation motivated around repeated sampling.
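The co-occurrence matrix and the diversity measure $f_{D,p}$ can be computed directly from a count vector. The sketch below assumes that $U(\cdot)$ retains the diagonal together with the upper triangle, which is our reading of the definition since $\mathrm{Tr}(D_p)$ enters $f_{D,p}$; the function names are hypothetical, and the Gini-Simpson index is included only for side-by-side comparison.

```python
def cooccurrence_matrix(v):
    """Normalized upper-triangular outer product D_p of a count vector v.

    Assumption: the operator U keeps the diagonal and the upper triangle.
    """
    n = len(v)
    upper = [[float(v[i] * v[j]) if j >= i else 0.0 for j in range(n)]
             for i in range(n)]
    total = sum(map(sum, upper))
    if total == 0:
        raise ValueError("count vector has no non-zero categories")
    return [[x / total for x in row] for row in upper]

def diversity(v):
    """f_D = 1 - Tr(D_p): total weight carried by distinct category pairs."""
    d = cooccurrence_matrix(v)
    return 1.0 - sum(d[i][i] for i in range(len(v)))

def gini_simpson(v):
    """Standard Gini-Simpson index, 1 - sum of squared category shares."""
    total = sum(v)
    return 1.0 - sum((x / total) ** 2 for x in v)

if __name__ == "__main__":
    for vec in ([0, 3, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0], [2, 1, 1, 0, 1, 0]):
        print(vec, round(diversity(vec), 3), round(gini_simpson(vec), 3))
```

Under this reading, a mono-category article returns exactly 0 for both measures, while the two differ in magnitude for multi-category articles because $f_{D,p}$ counts each unordered pair once.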
Data accessibility: All data analyzed here are openly available from Scopus and PubMed APIs.
Competing Interests: The authors declare that they have no competing financial interests.
Author Contributions: AMP performed the research, participated in the writing of the manuscript, collected, analyzed, and visualized the data; MA developed software to collect, analyze, and visualize the data; and IP designed the research, performed the research, and participated in the writing of the manuscript.
Funding: AMP and IP acknowledge funding from NSF grant 1738163 entitled 'From Genomics to Brain Science'.
Acknowledgements: The authors acknowledge support from the Eckhard-Pfeiffer Distinguished Professorship Fund. AMP acknowledges financial support from a Hellman Fellow award that was critical to this project. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
reported in each article. In total, we encountered 14,212 distinct Major Topic MeSH.
Geographic Regions. We obtained geographic location data from each scholar's Scopus Profile, associating each individual with one of 77 countries; the top five countries represented are the United States with 5030 scholars, Germany with 1192, UK with 1074, China with 1049, and Japan with 894. These coauthors associate each $p$ with a set of countries, which we cluster into four localized regions indexed by $R$: North America (NA), corresponding to $R = 1$ (United States and Canada); Europe (EU), $R = 2$ (33 European Union and non-European Union countries including Norway, Switzerland, Israel, Iceland, and Serbia); Australasia (AA), $R = 3$ (People's Republic of China, Japan, South Korea, Australia, Taiwan, New Zealand, Singapore, Malaysia, and Thailand); and World, $R = 4$ (remaining countries including Brazil, India, Turkey, and South Africa, among others). 88% of the publications are covered by regional clusters $R = 1, 2, 3$. The most prominent distinction between regions is for NA and EU, which both feature increases in Technology & Information Science [SA 6] that are relatively larger than observed for AA and World, likely reflecting the technological capacity related to the tech hubs in these regions; another distinction relates to Psychiatry & Psychology [SA 1], which increases in EU and AA more than in NA and World; and also to Health [SA 4], which increases in NA and AA more than in EU and World.

S3. Levels and changes in SA and CIP co-occurrence before and after 2014
We also seek to identify which category pairs are frequently combined in articles, and to assess their frequency shifts after 2014. To this end, we introduce a tensor-product method to readily measure SA and CIP co-occurrence statistics for the purpose of identifying particular cross-domain orientations observed in cross-domain HB science.
In order to juxtapose the relative frequencies of mono-category articles separately from multi-category articles, we define a modified outer-product matrix designed purely for visualization purposes:
$$\tilde{D}_p(\vec{v}_p) \equiv \begin{cases} U(\Upsilon)/||U(\Upsilon)|| & \text{if } \vec{v}_p \text{ has 2 or more non-zero elements,} \\ \mathrm{DiagonalMatrix}(\mathrm{Sign}[\vec{v}_p]) & \text{if } \vec{v}_p \text{ has one non-zero element,} \end{cases}$$
where $\otimes$ is the outer tensor product and $\circ$ indicates the element-wise or Hadamard product. Note that this definition is slightly different from $D_p(\vec{v}_p)$ defined in Eq. (2). The difference occurs in the first case, for which the matrix $\Upsilon \equiv \vec{v}_p \otimes \vec{v}_p - \mathrm{DiagonalMatrix}(\vec{v}_p \circ \vec{v}_p)$ has its diagonal elements eliminated via subtraction, i.e. $\mathrm{Tr}(\Upsilon) = 0$. Simply stated, when $\vec{v}_p$ (representing $\vec{SA}_p$ or $\vec{CIP}_p$) has 2 or more non-zero elements, we count only the off-diagonal elements of the outer-product matrix and are not concerned with the relative frequencies of the on-diagonal elements. Contrariwise, in the case that there is just one category present, e.g. $\vec{v}_p = \{0, 3, 0, 0, 0, 0\}$, we track only the diagonal element, which counts the occurrence of the single category. In this second case, the resulting matrix $\tilde{D}_p(\vec{v}_p) = \mathrm{DiagonalMatrix}(\mathrm{Sign}[\vec{v}_p])$ has only one non-zero element, which occurs for the diagonal value $\tilde{D}_{22} = 1$; all other matrix elements are 0. Note that in either case the total sum of all elements is normalized to unity, $||\tilde{D}_p(\vec{v}_p)|| = 1$. This normalization implies that totaling $\tilde{D}_p(\vec{v}_p)$ across articles from a given publication year yields the total number of articles, $N(t)$.
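The two cases of the visualization matrix can be made concrete with a short sketch. It follows the verbal description above (off-diagonal pairs only for multi-category articles, a single unit diagonal entry for mono-category articles); the function name `modified_cooccurrence` is hypothetical, and restricting to the upper triangle is an assumption carried over from the operator $U$.

```python
def modified_cooccurrence(v):
    """Visualization matrix: unit-normalized, with the two cases kept separate.

    - one non-zero category : a single 1 on the corresponding diagonal element
    - two or more categories: upper-triangular off-diagonal pair weights only
    """
    n = len(v)
    nonzero = [i for i, x in enumerate(v) if x > 0]
    if not nonzero:
        raise ValueError("count vector has no non-zero categories")
    m = [[0.0] * n for _ in range(n)]
    if len(nonzero) == 1:
        m[nonzero[0]][nonzero[0]] = 1.0
        return m
    for i in range(n):
        for j in range(i + 1, n):       # diagonal is excluded in this case
            m[i][j] = float(v[i] * v[j])
    total = sum(map(sum, m))
    return [[x / total for x in row] for row in m]

if __name__ == "__main__":
    print(modified_cooccurrence([0, 3, 0, 0, 0, 0])[1][1])   # mono-category case
    print(modified_cooccurrence([1, 0, 2]))                   # multi-category case
```

Keeping the two cases separate means that mono-category frequencies (diagonal) and pairwise combinations (off-diagonal) can be read from the same aggregate matrix without one diluting the other.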
We then calculated the aggregate co-occurrence matrix, denoted by $C^{<} = \sum_{p \,|\, y_p \in [2009-2013]} \tilde{D}_p$, using all articles published in the pre-period. It then follows from our normalization procedure that the total across all matrix elements is proportional to the total number of articles published in a given period. We denote the matrices calculated from the CIP and SA vectors by $C^{<}_{CIP}$ and $C^{<}_{SA}$, respectively. To measure relative changes, we then calculated the percent difference in each matrix element between the pre- and post-periods, denoted by $\Delta C$, applying a normalization that corrects for bias associated with differences in the number of articles published in each of the pre- and post-periods. To illustrate why this correction is important, we randomized the counts contained in $\vec{v}_p = \vec{SA}_p$ and plot the resulting $C^{<}_{rand.,SA}$ and $\Delta C_{rand.,SA}$ matrices in Fig. S11. As anticipated, this randomization scheme eliminates the variation among on-diagonal elements and among off-diagonal elements in panel (A); moreover, in panel (B) the off-diagonal elements all show percent change values in the range of ±3%, which indicates the threshold for distinguishing statistically significant percent changes in the real data.
Returning to the real data and the calculation of $C^{<}_{CIP}$, the most notable results of this visualization are the consistently strong couplings between CIP category [1] and all other categories [2,3,4,5,6]; between categories [1,2,3] and [5]; and also between categories [1,2,5] and [6]. Also of note is the higher-order clique among [1,5,6], where each CIP is strongly coupled to each of the others. Conversely, we observe relatively weak coupling between [7,8,9] and nearly all other CIP.
Other prominent CIP couplings vary by region: NA shows relatively higher coupling between [1,4], [4,5] and [5,8] compared to other regions; and EU shows relatively higher coupling between [1,9] and [2,9]. Regarding the shifts from the pre- to post-2014 period captured by $\Delta C_{CIP}$, the NA and EU regions show consistent increases in CIP pairs [4,7], [4,9], [3,8] and [2,7]; and consistent decreases in [1,8], [2,9], [6,9] and all combinations between [5] and [7,8,9]. Notably, AA exhibits higher percent change levels, which follows from the fact that several elements in $C^{<}_{CIP}$ are nearly 0.
Regarding the shifts from the pre- to post-2014 period captured by $\Delta C_{SA}$, the most consistent increases are between SA [1] and each of [2,4,6]; and between [4] and both [5,6]. Conversely, the most consistent decreases in coupling are between [3] and both [2,6], and between [5,6]. The matrices for NA and EU are rather similar, with the most prominent distinction occurring for [2,6], showing a -12% change for NA and a +5% change for EU; and also for [3,6], showing a 12% decrease for NA but no significant change for EU. This latter disparity is an example of where EU may be taking the lead in in-silico-oriented approaches to HB science, consistent with the framing of the Human Brain Project.
The most notable distinction for AA relative to NA and EU is in the larger magnitude of shifts, representing a period of international convergence for all couplings involving SA [1], and in particular between [1,2] and between [1,6]; contrariwise for AA, there is a prominent decoupling between SA [5,6] which is consistent with the relative shifts away from these two SA to compensate for the prominent redirection towards [1] and [4], as also indicated by Fig. S3(B).

S4. Calculation of cross-domain co-occurrence: an illustrative example of the Tensor Product
Calculating $f_{D,p}$ begins with the outer product of a categorical count vector with itself, e.g. $\vec{SA}_p \otimes \vec{SA}_p$, where $\otimes$ is the outer tensor product. The resulting matrix represents dyadic combinations of categories as opposed to permutations (i.e., capturing the subtle difference between an undirected and a directed network). The subtle difference between the Blau index and $f_{D,p}$ arises from the operator $U(G)$, which is imposed to capture combinations rather than permutations. Hence, this perspective offers a new pathway to the formulation of the common Blau diversity index by way of co-occurrence rather than repeated sampling.
Take for example an article $p$ with 4 metadata entities belonging to 3 categories, $\vec{v}_p = \{1, 2, 0, 0, 1, 0\}$. Calculation of the co-occurrence matrix $D_p(\vec{v}_p)$ using the normalized outer-product defined in Eq. (2) yields

$$D_p(\vec{v}_p) = \frac{1}{11} \begin{pmatrix} 1 & 2 & 0 & 0 & 1 & 0 \\ 0 & 4 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

The categorical diversity is calculated as the total across off-diagonal elements, $f_{D,p} = 1 - \mathrm{Tr}(D_p) = 5/11$. For completeness, consider the representation of a mono-disciplinary article with the same number of metadata entities that all fall into the second category, $\vec{v}_p = \{0, 4, 0, 0, 0, 0\}$. Then $D_p(\vec{v}_p)$ has a single non-zero element, $D_{22} = 1$, and so $f_{D,p} = 0$.
What does this measure capture? Notably, $f_{D,p}$ accounts for both categorical differences (Shannon-like) and concentration disparity (Gini-like) (Harrison & Klein 2007). On the one hand, articles with more variation in SA categories will correspond to larger $f_{D,p}$ values, as the number of non-zero off-diagonal elements is proportional to $\binom{M_p}{2} \sim M_p^2$, where $M_p$ is the number of distinct SA present; on the other hand, the off-diagonal elements will be relatively larger in combination if the count values contained in $\vec{SA}_p$ are more evenly distributed, i.e., are not highly concentrated in just one category.
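The worked example can be reproduced with a short sketch. Since Eq. (2) is not restated in this section, the code assumes that $D_p$ is the upper-triangular part of the outer product (diagonal included) normalized by the sum of its elements; this assumption reproduces the value 5/11 quoted above. The function names are hypothetical, and the Blau index is included only for comparison.

```python
import numpy as np

def cooccurrence_matrix(v):
    """Normalized upper-triangular outer product (assumed form of Eq. (2))."""
    v = np.asarray(v, dtype=float)
    upper = np.triu(np.outer(v, v))  # combinations, not permutations
    return upper / upper.sum()

def diversity(v):
    """f_D = 1 - Tr(D): the share of weight on off-diagonal elements."""
    return 1.0 - np.trace(cooccurrence_matrix(v))

def blau_index(v):
    """Standard Blau index, 1 - sum of squared category shares."""
    share = np.asarray(v, dtype=float) / np.sum(v)
    return 1.0 - np.sum(share ** 2)

print(diversity([1, 2, 0, 0, 1, 0]))   # approximately 0.4545 (= 5/11)
print(diversity([0, 4, 0, 0, 0, 0]))   # 0.0 for a mono-category article
print(blau_index([1, 2, 0, 0, 1, 0]))  # 0.625, which counts permutations
```

The gap between 5/11 and 0.625 for the same vector reflects the combinations-versus-permutations distinction: the Blau index counts each off-diagonal pair twice, whereas the upper-triangular form counts it once.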

S5. Bi-partite network between CIP and SA
We quantify the empirical association between CIP and SA categories by aggregating the information contained in $\vec{CIP}_p$ and $\vec{SA}_p$. We first applied this method to the subset of mono-domain articles, comprised of $p$ with $O_{CIP}(\vec{F}_p) = O_{SA}(\vec{F}_p) = M$. By definition, each of these articles features just a single CIP, making it possible to identify the SA that are most frequently associated with mono-domain researchers from that CIP category. Formally, this amounts to calculating the bi-partite network between CIP and SA, operationalized by averaging the $\vec{SA}_p$ for mono-domain articles from each CIP category. The bi-partite network labeled $M_{CIP} M_{SA}$ in Figure 4(G) shows only the most prominent CIP-SA links.
For juxtaposition, we also calculated the bi-partite network using the non-overlapping subset of articles with $O_{SA\&CIP}(\vec{F}_p) = X_{SA\&CIP}$. Since these articles by construction have $N_{CIP,p} \geq 2$, we define the average association between CIP and SA as $\langle \vec{SA} \rangle_{CIP} = \sum_{p \in CIP} (\vec{SA}_p / N_{SA,p}) / N_{CIP,p}$, where the vector $\vec{SA}_p / N_{SA,p}$ contributes to the average for all CIP present in $\vec{CIP}_p$. The bi-partite network labeled $X_{CIP} X_{SA}$ in Figure 4(G) also shows just the most prominent CIP-SA links, applying the same threshold, which excludes links with weight less than half that of the most prominent CIP-SA link for a given CIP.
Let $A$ ($B$) denote the matrix representation of $X_{CIP} X_{SA}$ ($M_{CIP} M_{SA}$) after pruning less prominent CIP-SA links. We then compute the difference between the matrices, $\Delta_{XM} \equiv C = A - B$, such that positive (negative) elements of $C$ indicate prominent links that are relatively over-represented in cross-domain (mono-domain) articles. The Sankey chart labeled $\Delta_{XM}$ in Figure 4(G) shows just the positive elements, which tend to be larger in magnitude than the (relatively few) negative elements.
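The pruning and differencing steps can be sketched as follows, assuming each network is stored as a matrix with one row per CIP and one column per SA. The weights below are invented for illustration, and the function names are hypothetical.

```python
import numpy as np

def prune_links(weights, fraction=0.5):
    """Zero out links weaker than `fraction` of the strongest link in each CIP row."""
    w = np.asarray(weights, dtype=float)
    row_max = w.max(axis=1, keepdims=True)
    return np.where(w >= fraction * row_max, w, 0.0)

def over_representation(cross, mono, fraction=0.5):
    """Difference C = A - B between the pruned cross-domain and mono-domain networks."""
    return prune_links(cross, fraction) - prune_links(mono, fraction)

# Hypothetical CIP-by-SA association weights (2 CIP rows, 3 SA columns)
cross = [[0.5, 0.3, 0.2], [0.1, 0.5, 0.4]]
mono = [[0.7, 0.2, 0.1], [0.2, 0.7, 0.1]]
print(over_representation(cross, mono))
```

Positive entries mark links that are more prominent in the cross-domain network, and negative entries mark links that are more prominent in the mono-domain network.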
In Fig. 3(B) we presented the bi-partite network of prominent CIP-SA relations for the Broad configuration. Figure S12 complements those results showing the bipartite network for both the Neighboring and Distant configurations, which provide cross-validation for the choice of CIP and SA categories they represent.

S6. Historical trends in SA & CIP diversity: 2000-2018
We investigate historical trends in SA & CIP diversity using the matrix $D_p$ defined in Eq. (2), which simultaneously measures mono-dimensional and multi-dimensional features of each article. More specifically, we define $f_{D,p} = 1 - \mathrm{Tr}(D_p)$ as the fraction of the article's co-occurrence matrix capturing combinatorial diversity. Hence, in the limiting case that the article features just a single category, $f_{D,p} = 0$; and when all categories are present in equal quantities, $f_{D,p} = (d - 1)/(d + 1)$, where $d$ is the dimension of the categorical vector $\vec{v}_p$. As $d$ increases, $f_{D,p}$ approaches 1. Hence, for sufficiently large $d$, $0 \leq f_{D,p} \lesssim 1$. Figures S8(D,E) show the unconditional distributions, $P(N_{SA})$ and $P(N_{CIP})$, with observed values spanning the full ranges up to $d = 6$ and $d = 9$, respectively.
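The upper limit quoted above can be checked directly. The sketch below assumes (Eq. (2) is not restated in this section) that $D_p$ is the normalized upper-triangular part of the outer product, as suggested by the role of $U(G)$ in Sec. S4, and considers a vector with $d$ categories that each have the same count $n$.

```latex
\begin{align}
\sum_{i} \left[ U(\vec{v} \otimes \vec{v}) \right]_{ii} &= d\, n^2 , \\
\sum_{i<j} \left[ U(\vec{v} \otimes \vec{v}) \right]_{ij} &= \binom{d}{2} n^2 = \frac{d(d-1)}{2}\, n^2 , \\
f_{D,p} &= \frac{d(d-1)/2}{d + d(d-1)/2} = \frac{d-1}{d+1} .
\end{align}
```

Under this assumption the maximum attainable values are $5/7 \approx 0.71$ for the $d = 6$ SA categories and $8/10 = 0.8$ for the $d = 9$ CIP categories.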
As a bounded quantity, the average article-level diversity $\langle f_D(t) \rangle = N(t)^{-1} \sum_{p \in N(t)} f_{D,p}$ is an appropriate measure of a characteristic article, where $N(t)$ is the number of articles being considered from year $t$. However, $\langle f_D(t) \rangle$ is nevertheless sensitive to bias associated with a systematic increase over time in $\langle N_{CIP,t} \rangle$ and $\langle N_{SA,t} \rangle$, the average number of categories present per article per year. We address this issue by applying a temporal deflator, which adjusts the annual averages to account for systematic shifts in the underlying data generating process. To be specific, we define $\tilde{f}_{D,SA}(t) = \langle f_{D,SA}(t) \rangle \times [\bar{\mu}_{SA} / \mu_{SA}(t)]$, where $\mu_{SA}(t) = \langle N_{SA,t} \rangle / \sigma_{N_{SA,t}}$ is the inverse coefficient of variation (also called the signal-to-noise ratio) with respect to the number of SA per article, represented by $N_{SA,p}$; and $\bar{\mu}_{SA}$ is the average value calculated across the roughly 3 decades of analysis. Figure S11(C) shows that $\mu_{SA}(t)$ is increasing steadily with time. Hence, adjusting for this secular growth is essential so that observed increases are not simply artifacts of the underlying growth in $N_{SA,p}$ or $N_{CIP,p}$. We apply the same method to adjust for systematic shifts in $N_{CIP,t}$.
To illustrate the utility of this deflator method, we randomized the SA for all articles (by randomly shuffling the counts in each $\vec{SA}_p$). This randomization demonstrates that there is no trend in the corresponding $\tilde{f}_{D,SA}(t)$, indicating that the method removes the underlying bias.
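A minimal sketch of the deflator follows. It assumes that the standard deviation is the population standard deviation and that the long-run value $\bar{\mu}$ is the simple mean of the annual $\mu(t)$ values; neither choice is specified in this section. The function name and the input data are hypothetical.

```python
import numpy as np

def deflated_diversity(f_by_year, counts_by_year):
    """Rescale annual mean diversity by mu_bar / mu(t).

    f_by_year: annual mean diversity values, one per year.
    counts_by_year: per-year lists of the number of categories per article.
    """
    # Inverse coefficient of variation (signal-to-noise ratio) for each year
    mu = np.array([np.mean(n) / np.std(n) for n in counts_by_year])
    mu_bar = mu.mean()  # assumed: simple average over all years
    return np.asarray(f_by_year, dtype=float) * (mu_bar / mu)

# Hypothetical data: diversity rises only because articles list more categories
f_raw = [0.2, 0.3]
counts = [[1, 2, 3], [2, 3, 4]]
print(deflated_diversity(f_raw, counts))  # both years are rescaled toward 0.25
```

In this invented example the raw increase from 0.2 to 0.3 tracks the growth in the mean number of categories per article, so the deflated series is flat.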
Returning to the empirical data, Fig. S6 shows the evolution of disciplinary diversity captured by coauthors' departmental affiliations. Each panel shows $\tilde{f}_{D,CIP}(t)$ calculated for a specified combination of categories contained in each $\vec{CIP}_p$ vector, as indicated by the schematic motif provided alongside each panel. For example, Fig. S6(A) calculates $\tilde{f}_{D,CIP}(t)$ from all 9 CIP categories considered independently, whereas Fig. S6(B) collects the counts associated with the combined categories [1-4] and [5-9] and calculates the diversity based upon the fraction of $D_p$ belonging to the single off-diagonal element $D_{12,p}$, which records the disciplinary mixing between these two supergroups.
Figure S6(A) is calculated using the Broad configuration, and exhibits a slow increase in CIP diversity from 1990 to the mid 2000s in the North American (NA) and European (EU) regions, which stalled thereafter, and even declined in the last decade for NA and AA, but not for EU. Figure S6(B) shows relatively lower levels and trends in the diversity at the intersection of supercategories [1-4] (representing traditional neuro/biology departments) and [5-9] (representing all other CIP jointly). By way of comparison, this trend indicates that the decline in panel (A) is not derived from the intersection explored in panel (B). Instead, Figs. S6(C,D) indicate that the decline in (A) is attributable to declines at the individual intersections between all permutations of CIP categories 1-7, and to a lesser extent between the three disciplinary subdomains: neuro/biology [1-5], health [5-7] and science and engineering [8-9]. Overall, we also observe higher levels of CIP diversity in NA, followed by EU, and then by AA.
Likewise, Figure S7 shows the evolution of research topic diversity captured by SA counts in each $\vec{SA}_p$. We observe much stronger trends for SA, suggesting that scholars also tend to cross disciplines as mono-disciplinary teams rather than via cross-disciplinary collaboration. Fig. S7(A) shows $\tilde{f}_{D,SA}(t)$ calculated for the Broad configuration, which includes all SA categories. The diversity trend has been increasing since 1990 for all regions, but at a reduced pace since the early 2010s. Similar to our findings for CIP, we observe AA lagging the other two regions; however, in the case of SA we observe more similar levels of diversity between EU and NA. Figure S7(B) indicates that much of the increase in SA diversity is attributable to research combining Health [SA 4] and the other categories; in other words, the domain of health science appears to be a persistent driving force behind convergence trends. Supporting evidence for this observation is also captured in the hierarchical clustering of SA represented by the minimum spanning tree (MST) representation of the aggregate SA co-occurrence matrix $D_{SA,p}$; see Fig. S1(B). By way of comparison, the analogous MST representation of $D_{CIP,p}$ in Fig. S1(D) features a less prominent hierarchy across the CIP categories.
We analyzed several additional SA category subsets and super-category combinations to more deeply explore the anatomy of research topic diversity. Similarly, a significant component of the increasing diversity captured between SA [4,5,6] derives from the increase in research centered around Techniques & Equipment [5] and Technology & Information Science [6], although the contribution shown in Fig. S7(F) adds to increases in diversity only until 2010, after which there is a prominent decline. Interestingly, this configuration emphasizes the leading role of AA since 2010 in combining these two areas. To further emphasize the role of Health, we exclude this category [4] from the diversity measures shown in Fig. S7(G), which indicates that combinations of SA across the traditional domains of biology and the technology-oriented domains also saturated around 2010, and that their contribution to SA diversity primarily appears when considering the biology [1-3] and technology-oriented [5,6] domains as super-clusters, as illustrated in Fig. S7(H).

S7. Panel regression: model specification
We constructed article-level and author-level panel data to facilitate measuring factors relating to SA and CIP diversity and shifts related to the ramp-up of three regional HB flagship projects circa 2013, and several others thereafter. Figure S8 shows the distribution of various article-level features; and Figure S9 shows the covariation matrix between the principal variables of interest.
We use the following operator notation to specify how we classify articles as being cross-domain ($X$) or mono-domain ($M$). Starting with the feature vector $\vec{F}_p$, we obtain a binary diversity classification for each article, denoted by $X$ and $M$. We specify the objective criteria of the feature operator $O$ by its subscript. For example, $O_{SA}(\vec{F}_p) = X_{SA}$ if two or more SA categories are present, otherwise the value is $M$; and by analogy, $O_{CIP}(\vec{F}_p) = X_{CIP}$ if two or more categories are present, and otherwise $O_{CIP}(\vec{F}_p) = M$. In the case of models oriented around articles featuring $X_{SA}$ and $X_{CIP}$ simultaneously (represented by $O_{SA\&CIP}(\vec{F}_p) = X_{SA\&CIP}$), we exclude the set of articles classified as $X_{SA}$ but not $X_{CIP}$ and those classified as $X_{CIP}$ but not $X_{SA}$. Hence, in what follows, the counterfactual baseline group for $X_{SA\&CIP}$ articles is also the subset of mono-domain articles, which facilitates comparison of effect sizes across models oriented around $X_{CIP}$, $X_{SA}$ and $X_{SA\&CIP}$. Because all Scopus scholars map onto a single CIP, and since this model is primarily concerned with identifying factors associated with orientation towards cross-domain research, we exclude solo-authored research papers (i.e. those with $k_p = 1$) from this analysis, since the outcome for those articles is predetermined (i.e. $P(M) = 1$); for the same reason, we also exclude articles with a single Major MeSH category (i.e. those with $w_p = 1$).
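The classification rules can be summarized in a short sketch. The function names and string labels are hypothetical stand-ins for the operators $O_{SA}$, $O_{CIP}$ and $O_{SA\&CIP}$, and the inputs are assumed to be per-article category count vectors.

```python
def classify(counts):
    """Return 'X' if two or more categories are present, otherwise 'M'."""
    present = sum(1 for c in counts if c > 0)
    return "X" if present >= 2 else "M"

def classify_joint(sa_counts, cip_counts):
    """Joint SA & CIP classification; mixed cases are excluded (None)."""
    sa, cip = classify(sa_counts), classify(cip_counts)
    if sa == "X" and cip == "X":
        return "X_SA&CIP"
    if sa == "M" and cip == "M":
        return "M"
    return None  # X in one dimension only: dropped from the joint models

print(classify_joint([1, 2, 0, 0, 1, 0], [2, 0, 0, 1, 0, 0, 0, 0, 0]))  # X_SA&CIP
print(classify_joint([0, 4, 0, 0, 0, 0], [3, 0, 0, 0, 0, 0, 0, 0, 0]))  # M
print(classify_joint([1, 2, 0, 0, 1, 0], [3, 0, 0, 0, 0, 0, 0, 0, 0]))  # None
```

Returning `None` for the mixed cases mirrors the exclusion described above, so that the comparison group for the joint category is the same mono-domain subset used in the single-dimension models.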

A. Article-level
For each article we also include several covariates of $I_{X,p}$: the article publication year $y_p$; the mean journal citation impact, calculated as the average $z_p$ for articles from journal $j$, denoted by $z_j = \langle z_p \,|\, \text{journal } j \rangle$; the natural logarithm of the total number of coauthors, $\ln k_p$; and the natural logarithm of the total number of Major MeSH terms, $\ln w_p$. As additional controls, we also include the total number of international regions associated with the authors' affiliations, $N_{R,p}$ (with min value 1 and max value 4), and also the total number of categories featured by the article, $N_{SA,p}$ and $N_{CIP,p}$.
We then model the odds $Q \equiv P(X)/P(M)$ by way of a Logit regression model, specified in the case of $X_{SA}$ in terms of $\mathrm{Logit}\, P(X_{SA}) = \log[P(X_{SA})/P(M)]$ and in the case of $X_{SA\&CIP}$ in terms of $\mathrm{Logit}\, P(X_{SA\&CIP}) = \log[P(X_{SA\&CIP})/P(M)]$, where in each case the log-odds are modeled as a linear combination of the covariates described above. To account for errors that are geographically correlated over time, we estimated the model using robust standard errors clustered on a regional categorical variable. The full set of parameter results is tabulated in models (1)-(3) in Tables S1-S3, which report the exponentiated coefficients. To be specific, the exponentiated coefficient $\exp(\beta)$ is the odds ratio, representing the factor by which $Q$ changes for each 1-unit increase in the corresponding independent variable, i.e. $Q_{+1}/Q = \exp(\beta)$. In real terms, $100\beta \approx 100(\exp(\beta) - 1)$ represents the percent change in $Q$ corresponding to a 1-unit increase in the corresponding independent variable (where the approximation holds for small $\beta$ values). As a result, $\exp(\beta)$ values that are less than (greater than) unity indicate variables that negatively (positively) correlate with the likelihood $P(X)$.
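The interpretation of the exponentiated coefficients can be made concrete with a small sketch. The coefficient value used below is hypothetical and is not taken from Tables S1-S3; the function names are likewise illustrative.

```python
import math

def logit(p_x):
    """Log-odds of X versus M when P(M) = 1 - P(X)."""
    return math.log(p_x / (1.0 - p_x))

def odds_ratio(beta):
    """Factor by which the odds Q change per 1-unit increase in a covariate."""
    return math.exp(beta)

def percent_change_in_odds(beta):
    """Exact percent change in Q; close to 100 * beta when beta is small."""
    return 100.0 * (math.exp(beta) - 1.0)

beta = 0.05  # hypothetical coefficient
print(odds_ratio(beta))              # about 1.0513
print(percent_change_in_odds(beta))  # about 5.13, versus 100 * beta = 5
```

For larger coefficients the two quantities diverge, which is why the approximation $100\beta$ is restricted to small $\beta$.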
Quantifying shifts in propensity for CIP and SA diversity associated with the announcement of global Flagship HB projects circa 2013
In order to identify shifts in the 5-year period after the 2013 ramp-up of HB projects worldwide, we incorporated an interaction between the pre-/post-periods, indicated by $I_{2014+,p}$, which takes the value 1 for $y_p \geq 2014$ and 0 otherwise, and a categorical variable specifying the region, represented by $I_{R,p}$. We use the Rest of World region category (indicated by countries colored gray in Fig. 1) as the baseline for regional comparison, since these countries did not feature flagship HB programs on the scale of those announced in Australia, Canada, China, Japan, Europe, South Korea, and the United States.
By way of example, in the case of modeling the likelihood $P(X_{SA})$, the interaction term is added in the second row of the specification for $\mathrm{Logit}\, P(X_{SA}) = \log[P(X_{SA})/P(M)]$. To distinguish the types of model variables, $\beta$ is used to identify coefficients associated with continuous variables, $\gamma$ is used for indicator variables, and $\delta$ is used to indicate interactions between indicator variables. In particular, the coefficient $\delta_{R+}$ measures the Difference-in-Difference (DiD) estimate of the effect of HB projects on the propensity for research teams to pursue $X_{SA}$ approaches. Figures S6 and S7 demonstrate that historical trends in the prevalence of cross-domain diversity satisfy the parallel trend assumption for CIP and SA, respectively. The full set of parameter results is tabulated in models (4)-(6) in Tables S1-S3, and the point estimates for principal test variables are visually summarized in Fig. S10.

B. Author-level Model B
In the second model, we seek to measure the relation between the two different types of article diversity, CIP and SA, and the article's scientific impact, proxied by $c_p$. Our approach leverages the hierarchical features of the article-level data grouped into author-specific subgroups representing HB researcher publication portfolios. As a result, model coefficients represent estimates net of author-specific time-independent factors. In other words, this fixed-effect specification yields parameter estimates that are net of the author-specific baseline $\alpha_a = z_a$, where $a$ is an author index. This specification identifies a clear counterfactual framework for identifying the different outcomes associated with $X$ and $M$ that are relevant to researcher problem identification and team-assembly strategies.
First, in order to measure relative differences in citation impact within and across publication cohorts, we apply a logarithmic transform that facilitates removing the time-dependent trend in the location and scale of the underlying log-normal citation distribution (Radicchi et al 2008). As such, the normalized citation impact defined in Eq. (1) is

$$z_p = \frac{\ln(1 + c_{p,t}) - \mu_t}{\sigma_t},$$

where $\mu_t \equiv \langle \ln(1 + c_t) \rangle$ is the mean and $\sigma_t \equiv \sigma[\ln(1 + c_t)]$ is the standard deviation of the log-citation distribution for articles grouped by publication year. We uniformly add 1 to each $c_{p,t}$ count to avoid the divergence $\ln 0$ associated with uncited publications, a common method that does not alter the interpretation of our results. Importantly, the standard deviation $\sigma_t \approx \sigma = 1.24$ is approximately constant over the focal period of our analysis. Consequently, we are able to transform the relation between $z_p$ and a given covariate into a percent change in $c_{p,t}$ associated with the same covariate. More specifically, building on previous work (Petersen 2018; Petersen et al 2018) we define the citation premium as the percent change $100 \Delta c_p / c_p$ associated with a shift in the independent variable $v$. For the sake of simplicity, consider the basic linear model $Y(c) = z_p = \beta_0 + \beta_v v$ with the decomposition of differentials, $\partial Y(c)/\partial v = (\partial Y/\partial c)(\partial c/\partial v) = \beta_v$; it follows from the property of logarithms that $\partial Y/\partial c = 1/[\sigma_t (1 + c)]$. Calculating the percent change $100 \Delta c_p / c_p$ follows from rearranging the differential relations above, yielding $dc_p / [\sigma_t (1 + c_p)\, dv] = \beta_v$. Hence, when the independent variable $v$ is a binary indicator variable, the shift from value 0 to 1 corresponds to $dv = 1$, and so, using $1 + c_p \approx c_p$, the percent change is $100 \Delta c_p / c_p \approx 100\, dc_p / c_p \approx 100 \times \sigma_t \times \beta_v \approx 100 \times \sigma \times \beta_v$. By extension, when the independent variable is a scalar quantity, the percent change in $c_p$ associated with a 1-unit increase $dv$ is also given by $100 \times \sigma \times \beta_v$. And in the case that the scalar quantity enters in logarithm (e.g. $\ln k_p$), a 1% increase in $v$ corresponds to a $\sigma \times \beta_v$ percent increase in $c_p$.
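The normalization and the conversion from a regression coefficient to a citation premium can be sketched as follows. The value $\sigma = 1.24$ is taken from the text; the citation counts, the coefficient value and the function names are hypothetical, and the standard deviation is assumed to be the population standard deviation within a publication-year cohort.

```python
import numpy as np

def normalized_citations(citations):
    """z-score of ln(1 + c) for articles from a single publication year."""
    log_c = np.log1p(np.asarray(citations, dtype=float))  # ln(1 + c), finite for c = 0
    return (log_c - log_c.mean()) / log_c.std()

def citation_premium_percent(beta, sigma=1.24):
    """Approximate percent change in citations per unit shift in a covariate."""
    return 100.0 * sigma * beta

cohort = [0, 1, 3, 10, 50, 200]  # hypothetical citation counts for one year
z = normalized_citations(cohort)
print(z.mean(), z.std())              # approximately 0 and 1 by construction
print(citation_premium_percent(0.1))  # about 12.4 percent for a hypothetical beta of 0.1
```

Because $z_p$ is standardized within each year, coefficients estimated on $z_p$ are comparable across publication cohorts, and the single factor $\sigma$ converts them back into percent changes in citations.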
Quantifying the effect of cross-domain diversity on scientific impact
While previous work aimed to identify the role of $X_{CIP}$ in the ecosystem of biology and computing researchers that championed the genomics revolution (Petersen et al 2018), here we seek to simultaneously identify the relative impact of $X_{CIP}$ and $X_{SA}$ in the emerging ecosystem of HB science. In this way, we are able to compare research strategies that leverage combinations of diverse researcher expertise, i.e. cross-disciplinary collaboration, to those that do not, in the ultimate pursuit of interdisciplinary knowledge and research (Nissani 1995).
To this end, we model the relation between $z_p$ and $X_{CIP}$ & $X_{SA}$ by applying ordinary least-squares (OLS) regression to estimate the coefficients of the panel regression model implemented with researcher profile fixed effects, where the model parameters are estimated using Huber-White robust standard errors, which account for heteroscedasticity and serial correlation within the publication set of each scholar, indexed by $a$.
The control variables in Eq. (S7) include $\ln k_p$, the natural logarithm of the total number of coauthors; $\ln w_p$, the natural logarithm of the total number of Major MeSH terms; and the career age variable $\tau_{a,p}$, measuring the number of years since the researcher's first publication, which captures variation attributable to the career life cycle. We also include factor variables controlling for publication year and other article-level features measured by $\vec{SA}_p$, $\vec{CIP}_p$ and $\vec{R}_p$. We exclude solo-authored research papers (i.e. those with $k_p = 1$) along with articles with a single Major MeSH category (i.e. those with $w_p = 1$).
Table S4 shows the full parameter estimates for six similar models that differ primarily in the type of cross-domain diversity included as the principal test variable, represented generically by $I_X$. In models (1)-(3) we vary the specification of the types of SA and CIP being tested. To be specific, in model (1) we include indicators $I_{X_{SA}}$ and $I_{X_{CIP}}$, where $I_{X_{CIP}}$ takes the value 1 if $O_{CIP}(\vec{F}_p) = X_{CIP}$ and 0 if $O_{CIP}(\vec{F}_p) = M$, and similarly for $I_{X_{SA}}$. These definitions of $X$ correspond to the Broad configuration, calculated using all CIP and SA categories, as indicated in Figures 4(A,D). According to this definition, articles combining SA (CIP) from any two or more categories are classified as $X_{SA}$ ($X_{CIP}$).
In model (2) we use $X$ indicators defined according to the Neighboring configuration.
In model (3) we use $X$ indicators defined according to the Distant configuration, representing longer-distance or "Convergent" cross-domain activity, here capturing the neuro-psycho-medical versus techno-computational interface. In our model specification, $X$ is represented by the binary indicator variables $I_{X_{Distant,SA}}$ and $I_{X_{Distant,CIP}}$. In the case of $X_{Distant,SA}$, this interface corresponds to articles combining SAs from [1-4] with those on the techno-computational side of the interface.
Likewise, models (4)-(6) instead focus on $X_{SA\&CIP}$ (represented by $I_{X_{SA\&CIP}}$); each model corresponds to either the Broad, Neighboring or Distant configuration defining $X_{SA}$ and $X_{CIP}$. As such, these models test the citation premium associated with articles featuring cross-domain diversity in combination. Because we exclude the confounding subsets of articles featuring $X_{SA}$ but not $X_{CIP}$, or vice versa, the counterfactual to $X_{SA\&CIP}$ consists of articles that are mono-dimensional in both categories. Thus, since the counterfactual groups are similar, the citation premiums estimated by $\gamma_{X_{SA\&CIP}}$ are comparable with the $\gamma_{X_{SA}}$ and $\gamma_{X_{CIP}}$ estimated in models (1)-(3). The full set of parameter results is reported in Table S5, and the transformed point estimates measuring the percent increase in $c_p$ associated with each $X$ definition are visually summarized in Fig. 5(C).
Quantifying shifts in the effect of cross-domain diversity associated with the announcement of global Flagship HB projects circa 2013
We test for shifts in the citation premium attributable to the advent of global Flagship HB projects by introducing an interaction between $I_{2014+,p}$ and $I_{X_{SA\&CIP}}$, as indicated by the addition of two terms into the second row of the model specification. As before, this Difference-in-Difference approach is based upon the counterfactual comparison of articles featuring $X_{SA\&CIP}$ to those featuring $M$, integrating an additional comparison between articles published after 2014 and those published before 2014. As in the previous citation model, the model parameter $\gamma_{X_{SA\&CIP}}$ represents the citation premium attributable to research endeavors simultaneously featuring cross-domain combinations of both SA and CIP. However, in this specification $\gamma_{X_{SA\&CIP}}$ applies to articles published before 2014. The analogous estimate of the relative citation premium for articles published after 2014 is $\gamma_{X_{SA\&CIP}} + \delta_{X_{SA\&CIP}+}$. In other words, if all other covariates are held at their average values, then the citation premium difference is given by $\delta_{X_{SA\&CIP}+}$, with positive (negative) values indicating an increase (decrease) in the citation premium after 2014. The principal test variables $\gamma_{X_{SA\&CIP}}$, $\delta_{X_{SA\&CIP}+}$ and their sum are visually summarized in Fig. 5(D).

[2009-2018]. (C) Increased frequency of convergent domain combinations between P1 and P2. For example, the most prominent convergent interface is between Neuro/Bio and Health, which was featured in 8.6% of articles in P1 and 10.7% in P2, corresponding to a +24% growth in P2 relative to P1. All percent increases are significant at the p < 0.001 level based on a two-sample two-tailed z-test comparing the proportions for P1 and P2.
Axis label: # CIP per pub., $N_{CIP,p}$.
... defined in Eq. (2); each curve is calculated for articles belonging to a given geographic region, as determined by the coauthors' regional affiliations: Australasia (red), Europe (blue), and North America (orange). For each panel we provide a matrix motif indicating the set of focal CIP categories; counts for categories included in brackets are considered in union. For example, panel (A) calculates $\tilde{f}_{D,CIP}(t)$ across all 9 CIP categories (each category considered separately), whereas panel (B) calculates each $D_p$ by considering just two super-groups, the first consisting of the union of CIP counts for categories [1-4], and the second comprised of categories [5-9].
... defined in Eq. (2); each curve is calculated for articles belonging to a given geographic region, as determined by the coauthors' regional affiliations: Australasia (red), Europe (blue), and North America (orange). For each panel we provide a matrix motif indicating the set of focal SA categories; counts for categories included in brackets are considered in union. For example, panel (A) calculates $\tilde{f}_{D,SA}(t)$ across all 6 SA categories (each category considered separately), whereas panel (C) calculates each $D_{SA,p}$ by considering a subset of four SA categories 1-4.

Figure labels: N(t), t; Kp, nMeSHMain, Zp, XSAp, XCIPp, NEUROSHORTXSAp, NEUROSHORTXCIPp, NEUROLONGXSAp, NEUROLONGXCIPp, NRegp, NSAp.

FIG. 1: Data collection and classification schemes. The upper part of the figure shows the data generation mechanism along with the resulting topical (SA) and disciplinary (CIP) clusters. The middle part shows, on a world map, the regional clusters pertaining to three large HBS funding initiatives: North America (NA), Europe (EU), and Australasia (AA). The lower part shows an example of how all three categorizations are operationalized for analytic purposes. Circles represent four research articles with authorship from distinct regions. The articles feature different keyword (SA) or disciplinary (CIP) category mixtures, each assigned one of two diversity measures: mono-domain (M) and cross-domain (X).
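The mono-domain (M) versus cross-domain (X) labelling described in this caption can be illustrated with a minimal sketch. The function name `diversity_label` and the example articles are hypothetical and are not taken from the study's pipeline; the sketch assumes each article has already been mapped to a list of SA categories (from its keywords) and a list of CIP categories (from its coauthors' departments).

```python
from typing import Iterable


def diversity_label(categories: Iterable[int]) -> str:
    """Return 'M' if all entries fall in one category, 'X' if they span several."""
    distinct = set(categories)
    if not distinct:
        raise ValueError("an article needs at least one category")
    return "X" if len(distinct) > 1 else "M"


# Hypothetical articles: SA categories of the keywords, CIP categories of the coauthors.
articles = [
    {"sa": [1, 1], "cip": [1, 1]},      # mono-topic, mono-discipline
    {"sa": [1, 4, 5], "cip": [3, 3]},   # cross-topic only
    {"sa": [2, 3], "cip": [1, 8]},      # cross-topic and cross-discipline
]

for article in articles:
    print(diversity_label(article["sa"]), diversity_label(article["cip"]))
```

The second example article is the kind of case the abstract calls a convergence shortcut: the topics are mixed, but the team is drawn from a single discipline.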
p , each corresponding to top-level Medical Subject Heading (MeSH) categories implemented by PubMed, which are indicated by the letters in brackets: (1) Psychiatry & Psychology [F], (2) Anatomy & Organisms [A,B], (3) Phenomena & Processes [G], (4) Health [C,N], (5) Techniques & Equipment [E], and (6) Technology & Information Science [J,L]; notably, regarding the structure-function problem that is a fundamental focus in much of biomedical science, category (2) represents the domain of structure while (3) represents function.

FIG. 3: Evolution of SA boundary-crossing within and across disciplinary clusters. (A) SA composition of HBS research within disciplinary (CIP) clusters. Each subpanel represents articles published by researchers from a given CIP cluster, showing the fraction of article-level MeSH belonging to each SA over 5-year intervals across the period 1970-2018. The increasing prominence of SA 5 & 6 in nearly all domains, in particular CIP 4 (Biotech. & Genetics), indicates the critical role of informatic capabilities in facilitating biomedical convergence science (Yang et al 2021). (B) Empirical CIP-SA association networks calculated for non-overlapping sets of mono-domain (MCIP MSA) and cross-domain (XCIP XSA) articles, based upon the Broad configuration. The difference between these two bipartite networks (∆XM) indicates the emergent research channels that are facilitated by simultaneous XCIP and XSA boundary crossing, in particular the integration of SA 2 with SA 3 (i.e. the structure-function nexus) facilitated by teams combining disciplines 1, 2, 4 and 9.
1: Neurosciences; 2: Biology; 3: Psychology; 4: Biotech. & Genetics; 5: Medical Specialty; 6: Health Sciences; 7: Pathology & Pharmacology; 8: Eng. & Informatics; 9: Chemistry & Physics & Math

FIG. 4: Evolution of CIP and SA diversity in Human Brain Science research. (A-F) Each fD(t) represents the average article diversity measured as categorical co-occurrence, by geographic region: North America (orange), Europe (blue), and Australasia (red). Each matrix motif indicates the set of CIP or SA categories used to compute Dp in Eq. (2); categories included in brackets are considered in union. For example, panel (A) calculates fD,CIP(t) across all 9 CIP categories, whereas panel (B) is based upon counts for two super-groups, the first consisting of the union of CIP counts for categories 1 and 3, and the second comprising categories 2, 4, 5, 6 and 7. (A,D) Broad diversity is calculated using all categories considered as separate domains; (B,E) Neighboring represents the shorter-distance boundary across the neuro-psychological ↔ bio-medical interface; (C,F) Distant represents longer-distance convergence across the neuro-psycho-medical ↔ techno-informatic interface.

FIG. 5: Propensity for X and citation impact attributable to cross-domain activity at the article level. (A) Annual growth rate in the likelihood P(X) of research having cross-domain attributes represented generically by X. (B) Decreased likelihood P(X) after 2014. (C) Citation premium estimated as the percent increase in cp attributable to cross-domain mixture X, measured relative to mono-domain (M) research articles representing the counterfactual baseline. Calculated using a researcher fixed-effect model specification which accounts for time-independent individual-specific factors; see Tables S4-S5 for full model estimates. Note that "Broad" corresponds to XSA, XCIP, XSA&CIP; "Neighboring" corresponds to XNeighboring,SA, XNeighboring,CIP, XNeighboring,SA&CIP; and "Distant" corresponds to XDistant,SA, XDistant,CIP, XDistant,SA&CIP. (D) Difference-in-Difference (δX+) estimate of the "Flagship project effect" on the citation impact of cross-domain research. Shown are point estimates with 95% confidence interval. Asterisks above each estimate indicate the associated p-value level: * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure S2(A) shows the relative frequency $f^{<}_{R,CIP}$ ($f^{>}_{R,CIP}$) by region, calculated in the 5-year period before (<) and after (>) the HB flagship project ramp-up year 2014. Each $f_{R,CIP}$ value represents the average $\overrightarrow{CIP}_p$ vector calculated across all articles belonging to a particular region, and normalized to unity to facilitate comparison, i.e. $\sum_{CIP=1}^{9} f_{R,CIP} = 1$. In both the pre-2014 period [2009-2013] and post-2014 period [2014-2018], the most prominent disciplines are Neurosciences [CIP 1] and Medical Specialty [5] in the North American (NA) and European (EU) regions. The Australasian (AA) region shows higher levels of scholars from disciplines in Engineering & Informatics [8] and Chemistry & Physics & Math [9] than their NA and EU counterparts in the pre-2014 period. However, after 2014 we observe a realignment of AA with the remarkably similar NA and EU profiles. This realignment is achieved by decreases in Engineering & Informatics [8] and Chemistry & Physics & Math [9], and increases in Neurosciences [1] and Medical Specialty [5]. Fig. S2(B) shows these relative shifts calculated as the difference $\Delta f_{R,CIP} = f^{>}_{R,CIP} - f^{<}_{R,CIP}$. Overall, there appears to be a remarkable synchrony in the direction and magnitude of $\Delta f_{R,CIP}$ for the NA and EU regions, primarily associated with decreases in Neurosciences [1] and Pathology & Pharmacology [7] and increases in Psychology [3] and Medical Specialty [5]. NA is the only region showing an increase in both Science & Engineering domains [CIP 8 & 9].

Similarly, Fig. S3(A) shows the analogous frequencies $f^{<}_{R,SA}$ ($f^{>}_{R,SA}$) for each SA by region. In the pre-2014 period, the most prominent SA categories are Anatomy & Organisms [SA 2] and Health [4], with all regions showing similar distribution profiles. The most prominent distinction in AA is a reduced prominence of Psychiatry & Psychology [1]. By and large, the profiles remain consistent in the post-2014 period, with AA and NA experiencing prominent increases in Health [4], and AA showing a modest increase in Psychiatry & Psychology [1], which nevertheless does not fully compensate for the initial deficit in this category with respect to both NA and EU. Figure S3(B) indicates that all regions experienced a consistent decline in research involving the structure-oriented topics associated with Anatomy & Organisms [2], as well as the function-oriented topics associated with Phenomena & Processes [3]. The most prominent distinction between regions is for NA and EU, which both feature increases in Technology & Information Science [6] that are relatively larger than observed for AA and World, likely reflecting the technological capacity related to the tech hubs in these regions. Another distinction relates to Psychiatry & Psychology [SA 1], which increases in EU and AA more than in NA and World; a further one relates to Health [4], which increases in NA and AA more than in EU and World.
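The normalization to unity and the pre/post shift discussed above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the helper names and the example counts are hypothetical, and the sketch assumes that summing category counts over a region's articles and then normalizing is equivalent to normalizing the average category vector.

```python
from typing import List, Sequence


def normalized_shares(counts: Sequence[float]) -> List[float]:
    """Scale category counts so that the shares sum to one within a region."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return [c / total for c in counts]


def share_shift(pre_counts: Sequence[float], post_counts: Sequence[float]) -> List[float]:
    """Difference of normalized shares: post-period share minus pre-period share."""
    pre = normalized_shares(pre_counts)
    post = normalized_shares(post_counts)
    return [b - a for a, b in zip(pre, post)]


# Hypothetical counts for three categories in one region (not data from the study).
pre_period = [50, 30, 20]
post_period = [40, 40, 20]
print(share_shift(pre_period, post_period))
```

Because both periods are normalized to unity, the shifts sum to zero within a region, so an increase in one category's share is always offset by decreases elsewhere.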
Figure S7(C,D) shows that the increasing diversity associated with Health [4] is largely captured via the incorporation of technology- and informatics-oriented capabilities [5,6], as opposed to integrating more traditional biological SA representing research domains associated with questions relating to how Anatomy & Organisms (structure) [2] and Phenomena & Processes (function) [3] relate to complex human behavior addressed by Psychiatry & Psychology [1], as illustrated in Fig. S7(E).
Model A: Quantifying factors associated with propensity for CIP and SA diversity

In the first model, we seek to better understand the factors associated with the prevalence of CIP and SA diversity as they evolve over time, and in particular their relation to the launching of the HB flagship programs. In order to model the article-level factors (indicated by p) associated with cross-domain research activity, we define the binary indicator variable generically denoted as $I_{X,p}$. By way of example, if we are considering SA diversity, then the indicator variable $I_{X_{SA},p}$ takes the value 1 if $O_{SA}(\vec{F}_p) = X_{SA}$ and 0 if $O_{SA}(\vec{F}_p) = M$. We then model the 2-state odds

$$Q \equiv \frac{P(O_{SA}(\vec{F}_p) = X_{SA})}{P(O_{SA}(\vec{F}_p) = M)} = \frac{P(X_{SA})}{P(M)},$$

which represents the propensity for cross-domain research, where $P(X_{SA}) + P(M) = 1$. Likewise, in the case of CIP diversity we model the odds as

$$Q \equiv \frac{P(O_{CIP}(\vec{F}_p) = X_{CIP})}{P(O_{CIP}(\vec{F}_p) = M)};$$

and finally, we also consider the likelihood of research featuring both types of cross-domain activity, for SA & CIP, represented as

$$Q \equiv \frac{P(O_{SA\&CIP}(\vec{F}_p) = X_{SA\&CIP})}{P(O_{SA\&CIP}(\vec{F}_p) = M)}.$$

... we use X indicators defined according to the Neighboring configuration representing shorter-distance cross-domain activity, here capturing the neurobiological-vs-bioengineering interface. In our model specification, X is represented by the binary indicator variables $I_{X_{Neighboring,SA}}$ and $I_{X_{Neighboring,CIP}}$. In the case of $X_{Neighboring,SA}$, this interface corresponds to articles combining at least one MeSH mapping onto SA [1] (Psychiatry & Psychology) and at least one MeSH mapping onto SA [2] (Anatomy & Organisms), [3] (Phenomena & Processes) or [4] (Health). In the case of $X_{Neighboring,CIP}$, this interface corresponds to articles combining at least one coauthor whose department maps onto CIP [1] (Neurosciences) or [3] (Psychology) and at least one coauthor whose department maps onto CIP [2] (Biology), [4] (Biotechnology & Genetics), [5] (Medical Specialty), [6] (Health Sciences) or [7] (Pathology & Pharmacology). Note that all Scopus scholars map onto a single CIP, and so solo-authored research articles are by definition mono-disciplinary.
(corresponding to Psychiatry & Psychology (mind), Anatomy & Organisms (structure), Phenomena & Processes (function) and Health, respectively) and at least one MeSH mapping onto SAs [5,6] (Techniques & Equipment and Technology & Information Science, respectively). In the case of $X_{Distant,CIP}$, this interface corresponds to articles combining at least one coauthor whose department maps onto CIPs [1,3,5] (Neurosciences, Psychology and Medical Specialty, respectively) and at least one coauthor whose department maps onto CIPs [4,8] (Biotechnology & Genetics and Engineering & Informatics, respectively).
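The interface definitions above amount to a two-sided membership test, which the following minimal sketch illustrates. The constant and function names are hypothetical; the category sets follow the Neighboring and Distant configurations as summarized in the caption of Fig. S10 (SA [1] × [2-4] and CIP [1,3] × [2,4-7] for Neighboring; SA [1-4] × [5,6] and CIP [1,3,5] × [4,8] for Distant).

```python
from typing import Iterable, Set, Tuple

Interface = Tuple[Set[int], Set[int]]

# Category sets on each side of the interface, as listed in the text.
NEIGHBORING_SA: Interface = ({1}, {2, 3, 4})
NEIGHBORING_CIP: Interface = ({1, 3}, {2, 4, 5, 6, 7})
DISTANT_SA: Interface = ({1, 2, 3, 4}, {5, 6})
DISTANT_CIP: Interface = ({1, 3, 5}, {4, 8})


def crosses(categories: Iterable[int], interface: Interface) -> int:
    """Return 1 if the article has at least one category on each side, else 0."""
    side_a, side_b = interface
    present = set(categories)
    return int(bool(present & side_a) and bool(present & side_b))


# Hypothetical article: keywords map onto SA 1 and 5; coauthors map onto CIP 1 and 2.
sa_categories = [1, 5]
cip_categories = [1, 2]
print(crosses(sa_categories, NEIGHBORING_SA))    # no SA 2-4 keyword
print(crosses(sa_categories, DISTANT_SA))        # SA 1 combined with SA 5
print(crosses(cip_categories, NEIGHBORING_CIP))  # CIP 1 combined with CIP 2
```

Under these definitions an article can qualify as Distant without qualifying as Neighboring, so the two indicators are not nested.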

FIG. S1: Subject Area and Disciplinary clusters. (A) Principal MeSH terms comprising 6 Subject Area (SA) clusters. (B) Minimum spanning tree representation of topical hierarchy based upon SA co-occurrence within articles; node size proportional to total number of articles featuring a particular SA. (C) CIP codes comprising 9 disciplinary clusters. (D) Minimum spanning tree representation of disciplinary hierarchy based upon CIP co-occurrence within articles; node size proportional to total number of articles featuring a particular CIP.
and $C_{CIP,ij}$: matrices reporting % differences in co-occurrences between the 5-year pre/post periods
12. the set of "Major Topic" MeSH keywords for a publication, $\vec{W}_p$; the SA operator converts accordingly: $O_{SA}(\vec{W}_p) = \overrightarrow{SA}_p$; similarly the set of Scopus authors on are, $\vec{A}_p$ and
24. $N_{CIP,p}$: article-level count variable indicating the total number of CIP represented (independent of concentrations), i.e. min = 1 and max = 9
25. $\bar{z}_j$ is the average z value calculated across all articles published in a specific journal (indexed by j).
published in year t; normalized citations $z_p$ (involving $\mu_t$ and $\sigma_t$)
5. number of coauthors, $k_p$
6. number of regions associated with a publication, $R_p$
7. The article-level categorical diversity measure is calculated using $D_p$, the outer-product matrix used to count co-occurrences at the article level, drawing on the generic categorical vector $\vec{v}_p$
period 2014-2018 (5-year period 2009-2013)
9. f indicates a generic fraction/frequency variable (i.e. with range [0,1]); subscript indicates the variable context, e.g. $f_{D,SA}(t)$ ($f_{D,CIP}(t)$) indicates the co-occurrence per article measured using SA (CIP).
FIG. S2: Temporal and regional distributions of CIP-coded author departments in human brain research. (A) Relative frequency of department CIP clusters in the 5-year period before 2014 ($f^{<}_{R,CIP}$) and after 2014 ($f^{>}_{R,CIP}$); f values are normalized to unity within region. (B) Shift in CIP cluster frequencies given by the difference $\Delta f_{R,CIP} = f^{>}_{R,CIP} - f^{<}_{R,CIP}$. (C) Disciplinary {CIP, CIP} co-occurrence in human brain science, by region. Each co-occurrence matrix $C^{<}_{CIP}$ measures the frequency of a given {CIP, CIP} pair over the 5-year pre-period 2009-2013 based upon publications associated with one of three broad geographic regions; see Eqn. (S1) for its definition. By construction, matrix element values $C^{<}_{CIP,ij}$ are proportional to the net share of publications featuring the indicated pair. Diagonal elements measure the frequency of publications featuring only a single CIP category. Note the use of two legends, one for the mono-disciplinary diagonal elements (gray-scale legend reported in units of 1000 publications) and one for off-diagonal elements (color-scale legend reported in units of 100 publications); as indicated by the legend scales, mono-CIP publications occur with significantly higher frequency than multi-CIP publications. (D) Relative change (post minus pre period) in the co-occurrence matrix: $\Delta C_{CIP,ij}$ measures the percent difference in the frequency of publications characterized by each {CIP, CIP} pair.
FIG. S3: Temporal and regional distributions of Subject Areas (SA) in human brain research. (A) Relative frequency of topical SA clusters in the 5-year period before 2014 ($f^{<}_{R,SA}$) and after 2014 ($f^{>}_{R,SA}$); f values are normalized to unity within region. (B) Shift in SA cluster frequencies given by the difference $\Delta f_{R,SA} = f^{>}_{R,SA} - f^{<}_{R,SA}$. (C) Topical {SA, SA} co-occurrence in human brain science, by region. Each co-occurrence matrix $C^{<}_{SA}$ measures the frequency of a given {SA, SA} pair over the 5-year pre-period 2009-2013 based upon publications associated with one of three broad geographic regions; see Eqn. (S1) for its definition. By construction, matrix element values $C^{<}_{SA,ij}$ are proportional to the net share of publications featuring the indicated pair. Diagonal elements measure the frequency of publications featuring only a single SA category. Note the use of two legends, one for the mono-dimensional diagonal elements (gray-scale legend) and one for off-diagonal elements (color-scale legend), both of which are reported in units of 1000 publications. (D) Dynamic co-occurrence matrix, $\Delta C_{SA,ij}$, measuring the percent difference (post minus pre) in the frequency of publications characterized by each {SA, SA} pair.
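The co-occurrence matrices and their percent differences described in the captions of Figs. S2-S3 can be sketched as follows. Eqn. (S1) is not reproduced in this section, so the counting rule below is an assumption made for illustration: each article adds one count per unordered pair of distinct categories it features, and an article featuring a single category adds one count to the corresponding diagonal element. The function names and example articles are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence


def cooccurrence(articles: Sequence[Sequence[int]], n_categories: int) -> List[List[int]]:
    """Count category pairs per article; diagonal holds single-category articles."""
    matrix = [[0] * n_categories for _ in range(n_categories)]
    for categories in articles:
        present = sorted(set(categories))
        if len(present) == 1:
            i = present[0] - 1  # categories are numbered from 1
            matrix[i][i] += 1
        else:
            for a, b in combinations(present, 2):
                matrix[a - 1][b - 1] += 1
                matrix[b - 1][a - 1] += 1  # keep the matrix symmetric
    return matrix


def percent_change(pre: List[List[int]], post: List[List[int]]) -> List[List[Optional[float]]]:
    """Element-wise 100 * (post - pre) / pre; None where the pre-period count is zero."""
    return [
        [100.0 * (q - p) / p if p else None for p, q in zip(row_pre, row_post)]
        for row_pre, row_post in zip(pre, post)
    ]


# Hypothetical articles described by their category lists, for two periods.
pre_articles = [[1], [1, 2], [1, 2], [2, 3]]
post_articles = [[1, 2], [1, 2], [1, 2]]
c_pre = cooccurrence(pre_articles, 3)
c_post = cooccurrence(post_articles, 3)
print(percent_change(c_pre, c_post)[0][1])
```

Returning None for pairs absent in the pre-period avoids dividing by zero; how such pairs are handled in the published figures is not stated in this section.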
FIG. S5: Expansive topical integration facilitated by CIP diversity. The number NCIP,p of distinct CIP featured by a given article is a measure of disciplinary diversity. (A) Average number of SA per article, NSA, computed for articles with a given NCIP,p and conditioned on the normalized citation impact. (B) Average number of SA per article, NSA, computed for articles featuring XSA&CIP according to a given configuration (Broad, Neighboring and Distant). The Distant configuration consistently corresponds to the highest levels of SA diversity. Comparing panels (A) and (B), NSA is also consistently larger for the NCIP subsets in (B) featuring XSA&CIP. For both panels, the horizontal dashed red line represents the baseline for comparison, computed as the average number of SA, NSA = 2.2, calculated for mono-disciplinary articles (NCIP,p = 1).
FIG. S6: Trends in cross-disciplinary (CIP) scholarship in human brain science. Each curve corresponds to fD,CIP(t), representing the average article diversity measured as categorical CIP co-occurrence in the off-diagonal matrix elements of DCIP,p, see Eq. (2); each curve is calculated for articles belonging to a given geographic region, as determined by the coauthors' regional affiliations: Australasia (red), Europe (blue), and North America (orange). For each panel we provide a matrix motif indicating the set of focal CIP categories; counts for categories included in brackets are considered in union. For example, panel (A) calculates fD,CIP(t) across all 9 CIP categories (each category considered separately), whereas panel (B) calculates each Dp by considering just two super-groups, the first consisting of the union of CIP counts for categories [1-4], and the second comprising categories [5-9].
FIG. S7: Trends in cross-topical (SA) scholarship in human brain science. Each curve corresponds to fD,SA(t), representing the average article diversity measured as categorical SA co-occurrence in the off-diagonal matrix elements of DSA,p, see Eq. (2); each curve is calculated for articles belonging to a given geographic region, as determined by the coauthors' regional affiliations: Australasia (red), Europe (blue), and North America (orange). For each panel we provide a matrix motif indicating the set of focal SA categories; counts for categories included in brackets are considered in union. For example, panel (A) calculates fD,SA(t) across all 6 SA categories (each category considered separately), whereas panel (C) calculates each DSA,p by considering a subset of four SA categories 1-4.
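The diversity measure referenced in these captions can be illustrated with a short sketch. Eq. (2) is not reproduced in this section, so the specific normalization used here is an assumption: the outer product of an article's category count vector is restricted to its upper triangle (diagonal included), normalized to sum to one, and the diversity is taken as the weight falling on the off-diagonal elements. The helper `group_counts`, which forms the bracketed super-groups by summing counts in union, is likewise hypothetical.

```python
from typing import List, Sequence


def group_counts(counts: Sequence[float], groups: Sequence[Sequence[int]]) -> List[float]:
    """Sum category counts within each super-group (category ids start at 1)."""
    return [sum(counts[c - 1] for c in group) for group in groups]


def article_diversity(counts: Sequence[float]) -> float:
    """Share of upper-triangular outer-product weight lying off the diagonal."""
    n = len(counts)
    diagonal = sum(x * x for x in counts)
    off_diagonal = sum(counts[i] * counts[j] for i in range(n) for j in range(i + 1, n))
    total = diagonal + off_diagonal
    if total == 0:
        raise ValueError("the article has no category counts")
    return off_diagonal / total


# Hypothetical article: coauthor counts over the 9 CIP categories.
cip_counts = [1, 0, 1, 0, 0, 0, 0, 1, 0]
print(article_diversity(cip_counts))  # all 9 categories treated separately

# Two super-groups, categories 1-4 versus 5-9, as in the caption of Fig. S6(B).
print(article_diversity(group_counts(cip_counts, [[1, 2, 3, 4], [5, 6, 7, 8, 9]])))
```

Under this assumed form a mono-domain article scores exactly zero, and merging categories into super-groups can only lower or preserve an article's score, which is why the constrained configurations isolate a specific interface.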

FIG. S8: Distributions of article-level variables. (A) N(t) is the number of HB articles by year. (Inset) The log-citation distribution is well described by a log-normal distribution (see panel G). As such, µt and σt corresponding to log-transformed citation counts are appropriate measures of log-normal location and scale; the average and standard deviation are ⟨σ⟩ ± SD = 1.24 ± 0.09 over the 49-year period 1970-2018. (B) P(k) is the probability distribution (PDF) of the number of coauthors per article. (C) P(w) is the PDF of the number of Major Topic MeSH "keywords" per publication, denoted by wp. (D) Each MeSH keyword maps onto one of the 6 SA clusters. Shown is the PDF of the number of distinct SA categories per publication, NSA,p. (E) Each departmental affiliation maps onto one of the 9 CIP clusters. Shown is the PDF of the number of distinct CIP categories per publication, NCIP,p. (F) Each Scopus Author's affiliation maps onto one of 4 regions: Australasia, Europe, North America, and (rest of) World. Shown is the PDF of the number of region categories per publication, NR,p. (G) Probability distribution (PDF) of zp disaggregated by publication cohort {t}; each green curve represents the smoothed kernel density estimate of P(z), calculated with kernel bandwidth = 0.1. Data are split into 5-year periods from 1965-2018, with the first panel including data from 1945-1964. Each PDF shows the baseline Normal distribution N(0, 1), demonstrating the stability of the distribution of normalized citation impact values over time, thereby facilitating robust cross-temporal modeling.
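The normalized citation impact zp described in this caption can be sketched as a within-year standardization of log-transformed citation counts. The exact transformation is not given in this section, so the use of ln(1 + c) (which keeps uncited articles defined) and of the population standard deviation are assumptions; the function name and the records are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def normalized_citations(records: Sequence[Tuple[int, int]]) -> List[float]:
    """Standardize ln(1 + citations) within each publication year.

    records: (publication year, citation count) pairs.
    Returns one z value per record, in the input order.
    """
    logs_by_year: Dict[int, List[float]] = defaultdict(list)
    for year, citations in records:
        logs_by_year[year].append(math.log(1 + citations))

    # mu_t and sigma_t: location and scale of log citations for cohort t.
    stats = {year: (mean(v), pstdev(v)) for year, v in logs_by_year.items()}

    z_values = []
    for year, citations in records:
        mu_t, sigma_t = stats[year]
        if sigma_t == 0:
            raise ValueError(f"no citation variation in year {year}")
        z_values.append((math.log(1 + citations) - mu_t) / sigma_t)
    return z_values


# Hypothetical records: (year, citations).
records = [(2000, 0), (2000, 9), (2000, 99), (2001, 3), (2001, 15)]
print(normalized_citations(records))
```

Standardizing within each cohort is what allows articles of different ages to be compared on a common scale, consistent with the stable N(0, 1) baseline shown in panel (G).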
FIG. S9: Cross-correlation and descriptive statistics for regression model variables. Upper-diagonal elements: bivariate histogram between row and column variables. Diagonal elements: histogram for the variable indicated by the row/column labels. Lower-diagonal elements: bivariate cross-correlation coefficient; light-shaded squares indicate the Pearson correlation coefficient between two variables that are both continuous measures; dark-shaded squares indicate the Cramér's V association between two variables that are both nominal (categorical).
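For readers unfamiliar with the association measure used for the nominal variables, the following sketch computes Cramér's V from a contingency table using the standard chi-square-based formula. It is an illustration only: no bias correction is applied, the function name and example tables are hypothetical, and whether the authors used a corrected variant is not stated in this section.

```python
import math
from typing import Sequence


def cramers_v(table: Sequence[Sequence[float]]) -> float:
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1))) for a contingency table."""
    n_rows, n_cols = len(table), len(table[0])
    if min(n_rows, n_cols) < 2:
        raise ValueError("need at least two rows and two columns")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("the table is empty")

    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Hypothetical 2x2 tables: perfectly associated and fully independent variables.
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[5, 5], [5, 5]]))
```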
FIG. S10: Summary of Logit model parameter estimates. (A-C) Reported are 100β for the main covariates of interest reported in Tables S1-S3, quantifying the percent increase in the odds Q ≡ P(X)/P(M) associated with a one-unit increase in: (A) mean journal citation impact zj,p; (C) ln k; (B) number of coauthors, kp; (C) number of major MeSH terms (keywords), wp; (D-F) difference-in-difference estimates (100δR+) capturing the effect of Flagship project ramp-ups after 2013 on rates of cross-domain research, at three levels of specificity regarding the diversity range captured by X. The Broad configuration corresponds to unconstrained combinations of SA and CIP (represented by XSA, XCIP, XSA&CIP). The Neighboring configuration corresponds to a specific set of category combinations capturing the neurobiological-vs-bioengineering interface, represented by SA [1] × [2-4] and CIP [1,3] × [2,4-7] (and represented by XNeighboring,SA, XNeighboring,CIP, XNeighboring,SA&CIP). The Distant configuration also identifies a specific set of category combinations, capturing the neuro-psycho-medical-vs-techno-computational interface, represented by SA [1-4] × [5,6] and CIP [1,3,5] × [4,8] (XDistant,SA, XDistant,CIP, XDistant,SA&CIP). Reported are the percent increases in Q, a ratio representing the propensity for cross-domain research relative to mono-domain research, directly associated with the ramp-up of Brain projects in: (D) Australasia; (E) Europe; (F) North America. Shown are point estimates with 95% confidence interval. Standard errors are clustered by region to account for residuals that are correlated within regions over time. Asterisks above each estimate indicate the associated p-value level: * p < 0.05, ** p < 0.01, *** p < 0.001.
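The reading of 100β as a percent increase in the odds rests on the small-coefficient approximation exp(β) ≈ 1 + β. The sketch below, with hypothetical function names and illustrative coefficient values, compares the exact percent change in the odds implied by a logit coefficient with that approximation.

```python
import math


def odds(p_cross: float) -> float:
    """Two-state odds Q = P(X) / P(M), with P(M) = 1 - P(X)."""
    if not 0.0 <= p_cross < 1.0:
        raise ValueError("P(X) must lie in [0, 1)")
    return p_cross / (1.0 - p_cross)


def percent_change_in_odds(beta: float) -> float:
    """Exact percent change in Q for a one-unit increase in a covariate."""
    return 100.0 * (math.exp(beta) - 1.0)


def approx_percent_change(beta: float) -> float:
    """First-order approximation, valid when beta is small."""
    return 100.0 * beta


# Illustrative coefficients only (not estimates from Tables S1-S3).
for beta in (0.03, 0.30):
    print(beta, percent_change_in_odds(beta), approx_percent_change(beta))
```

The two readings agree closely for coefficients of a few hundredths and diverge as the coefficient grows, so the 100β shorthand is best reserved for small effects.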
p ); importantly, this definition accounts for variability in $N_{SA}$ by normalizing the sum of the SA counts contained in $\overrightarrow{SA}_p$ by $N_{SA,p}$ so that each article contributes equally to the average. Less prominent CIP-SA links are pruned from our Sankey chart visualization in order to emphasize the most meaningful CIP-SA relations. To this end, we remove the weakest

Table: reports odds ratio $e^{\beta}$ ($\approx 1 + \beta$ for small $\beta$)

TABLE S1 :
Modeling the prevalence of cross-domain activity at the article level. Article-level analysis implemented using the logit model. The dependent variable is a binary indicator variable taking the value 1 if the article features cross-domain combinations (represented by XSA,p or XCIP,p or XSA&CIP,p) and 0 otherwise. Publication data included: articles published in period yp ∈ [1970, 2018] with kp ≥ 2 and wp ≥ 2. Robust standard errors are shown in parentheses below each point estimate. Reported are odds ratios, exp(β).

TABLE S2 :
Conditional definition of Xp, identifying "Neighboring" or shorter-distance cross-domain combinations. Article-level analysis implemented using the logit model. The dependent variable is a binary indicator variable taking the value 1 if the article features cross-domain combinations (represented by XNeighboring,SA,p or XNeighboring,CIP,p or XNeighboring,SA&CIP,p) and 0 otherwise. Publication data included: articles published in period yp ∈ [1970, 2018] with kp ≥ 2 and wp ≥ 2. Robust standard errors are shown in parentheses below each point estimate. Reported are odds ratios, exp(β).

TABLE S3 :
Conditional definition of Xp, identifying "Distant" or longer-distance cross-domain combinations. Article-level analysis implemented using the logit model. The dependent variable is a binary indicator variable taking the value 1 if the article features cross-domain combinations (represented by XDistant,SA,p or XDistant,CIP,p or XDistant,SA&CIP,p) and 0 otherwise. Publication data included: articles published in period yp ∈ [1970, 2018] with kp ≥ 2 and wp ≥ 2. Robust standard errors are shown in parentheses below each point estimate. Reported are odds ratios, exp(β).

TABLE S4 :
Career-level analysis using panel model with individual researcher fixed effects. Publication data included: articles published in period yp ∈ [1970, 2018] with kp ≥ 2 and wp ≥ 2; only includes researchers with Na ≥ 10 articles satisfying these criteria. Robust standard errors are shown in parentheses below each point estimate. Y indicates additional fixed effects included in the regression model.

TABLE S5 :
Flagship Project effect: Career-level analysis using panel model with researcher fixed effects. Publication data included: articles published in period yp ∈ [1970, 2018] with kp ≥ 2 and wp ≥ 2; only includes researchers with Na ≥ 10 articles satisfying these criteria. Robust standard errors are shown in parentheses below each point estimate. Y indicates additional fixed effects included in the regression model. * p < 0.05, ** p < 0.01, *** p < 0.001