Introduction: a new testbed for evaluating interdisciplinary research

Many of the world’s contemporary challenges are inherently complex and cannot be addressed or resolved by any single discipline, requiring a multifaceted and integrated approach across disciplines (Gibbons et al., 1994; Frodeman et al., 2010; Aldrich, 2014; Ledford, 2015). Given the widespread recognition today that cross-disciplinary communication and collaboration are necessary to not only pursue a curiosity-driven quest for fundamental knowledge but also address complex socioeconomic issues, interdisciplinary research (IDR) has become increasingly central to both academic interest and government science policies (Jacobs and Frickel, 2009; Roco et al., 2013; NRC, 2014; Allmendinger, 2015; Van Noorden, 2015; Davé et al., 2016b; Wernli and Darbellay, 2016). Accordingly, various national and international programmes, focusing especially on promoting IDR, have recently been launched and developed in many countries through specialised research funding and grants or through staff allocations (e.g., Davé et al., 2016a; Gleed and Marchant, 2016; Kuroki and Ukawa, 2017; NSF, 2019).

Driving these pro-IDR policies and the attendant rhetoric is an implicit assumption that IDR is inherently beneficial and has a more substantial impact compared with traditional disciplinary research. However, this assumption has rarely been supported by solid scientific evidence, and in most cases, the supposed merit of IDR has been based on anecdotal evidence from specific narrative examples or case studies (for related perspectives, see e.g., Jacobs and Frickel, 2009, p. 60; Weingart, 2010, p. 12). Considering the fact that significant resources have been and are being invested in promoting IDR, better clarity regarding the relationship between interdisciplinarity and its potential benefit, particularly the research performance, could help increase accountability for such policy actions.

Extant literature has investigated the relationship between interdisciplinarity and the research performance by using various data sources and methodologies, with different operationalisation of both dimensions (e.g., Steele and Stier, 2000; Rinia et al., 2001; Rinia et al., 2002; Adams et al., 2007; Levitt and Thelwall, 2008; Larivière and Gingras, 2010; Chen et al., 2015; Elsevier, 2015; Yegros-Yegros et al., 2015; Leahey et al., 2017). Owing to such diverse investigation approaches, it is unsurprising that the results are usually neither consistent nor conformable and sometimes are even contradictory among the literature. Given this situation, it is desirable that a more robust and reproducible methodology be developed and implemented to systematically assess the value of IDR in practice. The present study seeks to contribute to this goal by developing a new testbed for IDR evaluation. The focus is especially placed on highly cited paper clusters known as the research fronts (RFs), which are defined by a co-citation clustering method (Small, 1973). In this new approach, the research interdisciplinarity is characterised by the disciplinary diversity of the papers that compose the RF, and the research performance is operationalised and measured as a field-normalised citation-based measure at the RF level.

This proposed RF-based approach has three major advantages over common approaches that focus, for instance, on individual papers (Steele and Stier, 2000; Adams et al., 2007; Larivière and Gingras, 2010; Chen et al., 2015; Elsevier, 2015; Yegros-Yegros et al., 2015) to investigate the potential effect of interdisciplinarity on high-impact research. First, through the analyses of RFs, it is possible to capture a snapshot of the most lively, animated and high-impact research currently being undertaken in the academic sphere, since the papers composing RFs are classified as the most highly cited papers for each science discipline. As science policymakers, leaders, funders and practitioners are often most interested in promoting and supporting high-impact research, the evidence and insights obtained through this investigation of RFs can assist them in formulating more accountable policy recommendations that otherwise cannot be adequately addressed. Second, the RF is a unique manifestation of knowledge integration from different science disciplines. By construction, the interdisciplinarity operationalised at the RF level does not represent a mere parallel existence of discrete knowledge sources from multiple disciplines; rather, it indicates the state of the knowledge integration from multiple disciplines to create new knowledge syntheses. This organic scientific knowledge structure can be captured more effectively and robustly through RFs than through, for instance, an individual paper’s reference list. Consequently, the emergence of a new high-impact research area will also be more reliably detected at the RF level than at the paper level. The third advantage of the proposed RF-based approach is related to the technicalities. As discussed, RFs are unique self-organised units of knowledge in which bibliographically important information is effectively compressed and integrated. 
As this study considers thousands of papers, it is considerably more efficient and effective to handle RFs compared with a multitude of papers while conducting data retrieval, analysis and visualisation. These multifold advantages of the RF-based approach enable this study to comprehensively and uniquely assess the value of interdisciplinarity.

Methods: through the lens of emergent research fronts

The analyses in this study were based on the data retrieved from the Essential Science Indicators (ESI) database, published by Clarivate Analytics, and data published by the National Institute of Science and Technology Policy (NISTEP) of Japan. In this section, the definitions for the main terms used in this paper—the RFs, the research areas, the research impact and the interdisciplinarity index—are provided. Subsequently, the regression model specification used in this study and the rationale behind it are detailed.

Research fronts and (broad) research areas

The bibliometric data for the research papers (regular scientific articles and review articles) and citation counts were derived from more than 10,000 journals indexed in the Web of Science Core Collection published by Clarivate Analytics. The master journal list is updated regularly, with each journal being assigned to only one of the 22 ESI research areas (see Supplementary Table S1). Given a pre-set co-citation threshold, the original ‘ESI-RFs’ were defined based on the number of times the pairs of papers had been co-cited by the specified year and month within a five-year to six-year period. The ESI-RF investigation in this paper was focused on papers classified as ‘Highly Cited Papers’ in the ESI database, which are the top 1% for annual citation counts in each of the 22 ESI research areas based on the 10 most recent publication years.
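The co-citation step described above can be illustrated with a minimal sketch. It is not the ESI implementation: the paper identifiers, the raw-count threshold and the use of connected components to form clusters are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of cited papers appears in the same reference list."""
    counts = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_by_threshold(counts, threshold):
    """Group papers linked by at least `threshold` co-citations (connected components).

    Assumption: a simple union-find over thresholded pairs stands in for the
    clustering rule, which the text does not spell out.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), n in counts.items():
        if n >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for paper in list(parent):
        clusters.setdefault(find(paper), set()).add(paper)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical citing papers, each given as the highly cited papers it references.
citing = [["P1", "P2"], ["P1", "P2", "P3"], ["P3", "P4"], ["P3", "P4"]]
counts = cocitation_counts(citing)
fronts = cluster_by_threshold(counts, threshold=2)
print(fronts)
```

With these toy data, P1 and P2 are co-cited twice and P3 and P4 are co-cited twice, so two candidate fronts emerge; pairs co-cited only once fall below the threshold and do not link the two groups.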

Based on the ESI framework, the NISTEP’s Science Map dataset (NISTEP, 2014, 2016, 2018) defines a set of ‘aggregate RFs’ using a second-stage clustering in each of the three data periods: 2007–2012, 2009–2014 and 2011–2016, which are denoted in this study as \(S_{2012}\), \(S_{2014}\) and \(S_{2016}\), respectively. Each dataset comprised approximately 800–900 such ‘aggregate RFs’ (hereinafter referred to as ‘RFs’). The i-th RF in the aggregate dataset \(S = S_{2012} \cup S_{2014} \cup S_{2016}\) was denoted by \({\mathrm{RF}}_i\). After excluding two RFs with missing data, there were |S| = 2,560 RFs collected for the total data period (2007–2016), with a cumulative number of 53,885 papers (Table 1).

Table 1 Distribution of papers by research area by data period

For this study’s purpose, the 22 ESI research areas were reorganised into nine broad categories based on the classification scheme in Supplementary Table S1. Of these, we focused on the following eight categories composed of 19 ESI natural science areas: ‘Environmental and Geosciences’, ‘Physics and Space Sciences’, ‘Computational Science and Mathematics’, ‘Engineering’, ‘Materials Science’, ‘Chemistry’, ‘Clinical Medicine’ and ‘Basic Life Sciences’, which we denote collectively as \({\mathscr{R}}\). The other category, composed of the three ESI ‘non-natural-science’ areas (‘Economics and Business’, ‘Social Sciences, General’ and ‘Multidisciplinary’), was excluded from the analyses because the main research outputs in these areas were books rather than journal papers, and the areas were thus under-represented in the data.

Research impact measure

Although higher citations do not necessarily represent the intrinsic value or quality of a paper, research impact is commonly operationalised as a citation-based measure (e.g., Steele and Stier, 2000; Rinia et al., 2001, 2002; Adams et al., 2007; Levitt and Thelwall, 2008; Larivière and Gingras, 2010; Chen et al., 2015; Elsevier, 2015; Yegros-Yegros et al., 2015), which is due to not only its intuitive and computational simplicity but also the data availability and tractability. Moreover, the citation-based research impact is often defined as a field-normalised measure, that is, the absolute citation counts divided by the world average in each discipline, in order to account for the disciplinary variations in publication and citation practices. This study also used a surrogate field-normalised citation-based measure of research impact; however, in contrast to previous studies, it was defined and measured at the RF level rather than at a paper level (Steele and Stier, 2000; Adams et al., 2007; Larivière and Gingras, 2010; Chen et al., 2015; Elsevier, 2015; Yegros-Yegros et al., 2015), at a journal level (Levitt and Thelwall, 2008) or at a research programme level (Rinia et al., 2001, 2002).

Let \(N_i\) be the number of papers comprising \({\mathrm{RF}}_i\), and let \(N_i = \mathop {\sum}\nolimits_{{\mathrm{A}} \in {\mathscr{R}}} {N_{i,{\mathrm{A}}}}\) be its decomposition by research area, where \(N_{i,{\mathrm{A}}}\) is the number of papers in \({\mathrm{RF}}_i\) attributed to research area \({\mathrm{A}} \in {\mathscr{R}}\). Let \(X_i\) be the actual citation counts received by \({\mathrm{RF}}_i\). Let also \(C_{{\mathrm{A}};y/m}\) be the baseline citation rate for research area A as recorded in the ESI database as of the specified year and month (‘y/m’), which is defined as the total citation counts received by all papers attributed to research area A divided by the total number of papers attributed to the same research area in the 10 years of the Web of Science. Then, the mean baseline citation rate for each research area A, denoted \(\left\langle {C_{\mathrm{A}}} \right\rangle\), was obtained by averaging \(C_{{\mathrm{A}};y/m}\) over all the ESI data periods from March 2017 to January 2019 (i.e., from y/m = 2017/03 to 2019/01; bimonthly) (Supplementary Table S2). Subsequently, the research impact measure for \({\mathrm{RF}}_i\) was defined by

$$I_i = \frac{{X_i}}{{\mathop {\sum}\nolimits_{{\mathrm{A}} \in {\mathscr{R}}} {N_{i,{\mathrm{A}}}\left\langle {C_{\mathrm{A}}} \right\rangle } }}\,, \qquad (1)$$

that is, the ratio of the actual citation counts earned by \({\mathrm{RF}}_i\) to the citation counts expected for the same RF given its disciplinary composition.
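As a minimal sketch of this ratio, the following computes the research impact of a single RF. The function name, the paper counts, the citation total and the baseline rates are hypothetical values chosen for illustration; they are not taken from the ESI data or the Supplementary Tables.

```python
def research_impact(citations, papers_by_area, mean_baseline):
    """Actual citations of an RF divided by its expected citations.

    The expected count sums, over research areas, the number of papers in
    the area times that area's mean baseline citation rate.
    """
    expected = sum(n * mean_baseline[area] for area, n in papers_by_area.items())
    return citations / expected


# Hypothetical RF: 10 papers spread over two areas, 1,300 citations in total.
papers_by_area = {"Chemistry": 6, "Materials Science": 4}
mean_baseline = {"Chemistry": 15.0, "Materials Science": 10.0}  # made-up rates

impact = research_impact(1300, papers_by_area, mean_baseline)
print(impact)  # expected citations: 6*15 + 4*10 = 130, so the ratio is 10.0
```

Because the denominator weights each area's baseline by the number of papers from that area, an RF dominated by a high-citation field needs more citations to reach the same impact value.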

Interdisciplinarity index

The context-dependent nature of research interdisciplinarity has made its identification and assessment far from trivial, hitherto without a broad consensus on its operationalisation (Porter and Chubin, 1985; Morillo et al., 2003; Huutoniemi et al., 2010; Klein et al., 2010; Wagner et al., 2011; Siedlok and Hibbert, 2014; Adams et al., 2016). Numerous attempts have been made to develop methodologies for operationalising interdisciplinarity in practice, not only at the paper level (Morillo et al., 2001; Adams et al., 2007; Porter and Rafols, 2009; Larivière and Gingras, 2010; Chen et al., 2015; Elsevier, 2015; Yegros-Yegros et al., 2015; Leahey et al., 2017) but also at a journal level (Morillo et al., 2003; Levitt and Thelwall, 2008; Leydesdorff and Rafols, 2011) or at a research programme level (Rinia et al., 2001; Rinia et al., 2002). Still, it is most popularly defined at a paper level, either in terms of ‘knowledge integration’, as measured through the proportion of references from different disciplines, or ‘knowledge diffusion’, as measured through the proportion of citations received from different disciplines (Porter and Chubin, 1985; Adams et al., 2007; Van Noorden, 2015). Regardless of the operationalisation level, a more refined quantitative approach to interdisciplinarity, conceptualised as disciplinary diversity, necessarily requires the following three aspects: ‘variety’ (number of disciplines involved), ‘balance’ (distribution evenness across disciplines) and ‘dissimilarity’ (degree of dissimilarity between the disciplines) (see Rao, 1982; Stirling, 2007). Most previous IDR studies have evaluated interdisciplinarity based on either variety or balance, while some recent studies (e.g., Porter and Rafols, 2009; Leydesdorff and Rafols, 2011; Mugabushaka et al., 2016) have made efforts to incorporate the aspect of dissimilarity as well.

This study also operationalised interdisciplinarity as an integrated measure of the aforementioned three aspects; however, in contrast to previous studies, it was operationalised at the RF level. Specifically, the interdisciplinarity index for \({\mathrm{RF}}_i\) was defined and evaluated using the following ‘canonical’ formula (Okamura, 2018):

$${\it{\Delta }}_i = \left[ {{\sum\limits_{{\mathrm{A}} \in {\mathscr{R}}}} {{\sum\limits_{{\mathrm{B}} \in {\mathscr{R}}}} {w_{i,{\mathrm{A}}}w_{i,{\mathrm{B}}}\left\langle {M_{{\mathrm{AB}}}} \right\rangle } } } \right]^{ - 1}\,. \qquad (2)$$

Here, \(w_{i,{\mathrm{A}}}\) denotes the relative abundance of research area A in \({\mathrm{RF}}_i\), defined in the previous notation as \(w_{i,{\mathrm{A}}} = N_{i,{\mathrm{A}}}/N_i\) and satisfying \({\sum\nolimits_{{\mathrm{A}} \in {\mathscr{R}}}} {w_{i,{\mathrm{A}}} = 1}\). The effective affinity (i.e., similarity) between each pair of research areas A and B in \({\mathscr{R}}\), denoted \(\left\langle {M_{{\mathrm{AB}}}} \right\rangle\) in (2), was defined as the time-averaged Jaccard index (see Supplementary Methods and Discussion), where, as before, the bracket ‘〈…〉’ represents the average over the 12 ESI data periods. Figure 1 shows the chord diagram representation of the affinity matrix (see Supplementary Table S3 for the source data), from which it is evident that the degree of affinity varies considerably between different pairs of disciplines.
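The index can be evaluated directly from the relative abundances and the affinity matrix. The sketch below is illustrative only: the function name and the affinity value of 0.2 are hypothetical (not from Supplementary Table S3), and it assumes a symmetric affinity matrix with unit self-affinity, as holds for a Jaccard index.

```python
def interdisciplinarity_index(weights, affinity):
    """Inverse of the affinity-weighted double sum over research areas.

    weights:  dict mapping area -> relative abundance (should sum to 1)
    affinity: nested dict, affinity[a][b] in [0, 1], with affinity[a][a] == 1
    """
    total = 0.0
    for a, w_a in weights.items():
        for b, w_b in weights.items():
            total += w_a * w_b * affinity[a][b]
    return 1.0 / total


# Hypothetical affinity between two areas (assumed value, for illustration).
affinity = {
    "Chemistry": {"Chemistry": 1.0, "Physics": 0.2},
    "Physics": {"Chemistry": 0.2, "Physics": 1.0},
}

mono = interdisciplinarity_index({"Chemistry": 1.0}, affinity)
mixed = interdisciplinarity_index({"Chemistry": 0.5, "Physics": 0.5}, affinity)
print(mono, mixed)
```

A mono-disciplinary RF gives an index of 1. For an even two-area mix with affinity m, the double sum is 0.5 + 0.5m, so the index is 2/(1 + m): it approaches 2 for completely dissimilar areas and falls back to 1 when the two areas are indistinguishable.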

Fig. 1

A chord diagram representation of the affinities between research areas. The affinity indices were defined as the time-averaged Jaccard similarity indices and were evaluated between each pair of research areas (Supplementary Methods and Discussion). They were assigned to each connection between the research areas, represented proportionally by the size of each arc, from which it is evident that the degree of affinity varied considerably for different pairs of the disciplines (see Supplementary Table S3 for the source data)

The interdisciplinarity index (2) is unique because it is conceptualised as the effective number of distinct disciplines involved in each RF and is robust regarding the research discipline classification scheme. Specifically, it has the special property of remaining invariant under an arbitrary grouping of the constituent disciplines, given that the between-discipline affinity is properly defined for all pairs of disciplines. For instance, suppose one is interested in measuring the interdisciplinarity of \({\mathrm{RF}}_i\) based on the classification scheme \({\mathscr{R}}_1\) and someone else wishes to measure the interdisciplinarity of the same \({\mathrm{RF}}_i\) based on the more aggregate classification scheme \({\mathscr{R}}_2\). Then, for the interdisciplinarity index to be a consistent measure of disciplinary diversity, both approaches must result in the same value for the interdisciplinarity; that is, \({\it{\Delta }}_i\left[ {{\mathscr{R}}_1} \right] = {\it{\Delta }}_i\left[ {{\mathscr{R}}_2} \right]\). Otherwise, an inconsistency arises: the interdisciplinarity changes with the level (or ‘granularity’) of the research discipline classification, while the physical content of the RF (i.e., the constituent papers) remains the same. Note that popular (dis)similarity-based diversity measures such as the Rao-Stirling index (Rao, 1982; Stirling, 2007) and the Leinster-Cobbold index (Leinster and Cobbold, 2012) do not generally satisfy this invariance property; to the best of our knowledge, the only known diversity measure that respects this invariance property is given by formula (2), the theoretical grounds for which have recently been established for a general diversity/entropy quantification context (Okamura, 2018).
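The invariance can be checked numerically under one possible reading of ‘properly defined’. The sketch below assumes that the affinity between two aggregated groups is the abundance-weighted mean of the affinities between their member areas; this aggregation rule, the area labels and all numbers are assumptions for illustration and are not confirmed by the text.

```python
def index(weights, affinity):
    """Inverse of the affinity-weighted double sum (same form as formula (2))."""
    total = sum(
        w_a * w_b * affinity[a][b]
        for a, w_a in weights.items()
        for b, w_b in weights.items()
    )
    return 1.0 / total


def aggregate(weights, affinity, groups):
    """Merge fine areas into groups.

    Assumed rule: group-level affinity is the abundance-weighted mean of the
    member-level affinities (including within-group pairs).
    """
    g_w = {g: sum(weights[a] for a in members) for g, members in groups.items()}
    g_aff = {}
    for g, members_g in groups.items():
        g_aff[g] = {}
        for h, members_h in groups.items():
            num = sum(
                weights[a] * weights[b] * affinity[a][b]
                for a in members_g
                for b in members_h
            )
            g_aff[g][h] = num / (g_w[g] * g_w[h])
    return g_w, g_aff


# Hypothetical fine-grained scheme with three areas A, B and C.
weights = {"A": 0.5, "B": 0.3, "C": 0.2}
affinity = {
    "A": {"A": 1.0, "B": 0.4, "C": 0.1},
    "B": {"A": 0.4, "B": 1.0, "C": 0.2},
    "C": {"A": 0.1, "B": 0.2, "C": 1.0},
}

fine = index(weights, affinity)
coarse = index(*aggregate(weights, affinity, {"AB": ["A", "B"], "C": ["C"]}))
print(fine, coarse)
```

Under this assumed rule the double sum is unchanged by construction, so the fine and coarse schemes return the same value. Note that the merged group then carries a self-affinity below 1, reflecting its internal heterogeneity.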

Using this formula, the interdisciplinarity index for each RF in S was obtained, from which it was found that 43.6% of the RFs were mono-disciplinary (i.e., Δ = 1) and more than half were interdisciplinary (Fig. 2a; median = 1.2, range = 2.5; see also Supplementary Fig. S1a).

Fig. 2

Relationship between research impact and interdisciplinarity. a The histogram for the interdisciplinarity index (median = 1.2, range = 2.5, interquartile range = 0.58); b The histogram for the log-transformed research impact (mean = 1.2, SD = 0.83); c The scatterplot showing the associations between the interdisciplinarity index and the log-transformed research impact. The solid line in the scatterplot represents the robust linear model fit. The shaded region and the dashed lines, respectively, indicate the 95% confidence interval based on the standard error of the mean and on the standard error of the forecast, including both the uncertainty of the mean prediction and the residual

Regression model

Based on the aforementioned operationalisations of the research impact and the interdisciplinarity index, the relationship between the two variables was analysed using a regression analysis method. As the histogram analysis showed that the original research impact distribution was skewed, it was log-transformed so that the distribution curve was closer to a normal curve (Fig. 2b; mean = 1.2, SD = 0.83; see also Supplementary Fig. S1b). The scatterplot of the log-transformed research impact against the interdisciplinarity index indicated that these variables were relatively linearly related (Fig. 2c; see also Supplementary Fig. S2a–c). Subsequently, the following multiple linear regression model was investigated:

$$\ln \left( {I_i} \right) = {\boldsymbol{x}}_{i}{\boldsymbol{\beta}}, \qquad (3)$$

where \({\boldsymbol{x}}_i\) was a 1 × k vector of predictive variables and \({\boldsymbol{\beta}}\) was a k × 1 vector of regression coefficients, which were the unknown parameters to be estimated (with k being some integer). To deal with the possible issue of heteroscedasticity, the model was analysed using heteroscedasticity-robust standard errors (i.e., the Huber-White estimators of variance). In addition, a test for serial correlation (i.e., the Breusch-Godfrey Lagrange multiplier test) was conducted as a post-estimation procedure, which indicated that there was no serial correlation between the residuals in each model considered (see below).
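To make the estimation step concrete, the sketch below fits the simplest case (an intercept plus one predictor) by ordinary least squares and computes a heteroscedasticity-robust standard error for the slope. It uses the basic Huber-White (HC0) sandwich form; whether the study applied a small-sample correction is not stated, so that choice is an assumption, and the four data points are invented.

```python
import math


def ols_with_robust_se(x, y):
    """Simple OLS of y on x with an HC0 (Huber-White) standard error for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich estimator: squared residuals weighted by squared deviations of x.
    robust_var = sum(
        (xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, residuals)
    ) / sxx ** 2
    return intercept, slope, math.sqrt(robust_var)


# Hypothetical data: interdisciplinarity index vs. log-transformed impact.
delta = [1.0, 1.5, 2.0, 2.5]
log_impact = [1.1, 1.2, 1.5, 1.6]

intercept, slope, robust_se = ols_with_robust_se(delta, log_impact)
print(intercept, slope, robust_se)
```

For the full models with several predictors, the same sandwich idea applies in matrix form; a statistics package would normally be used rather than hand-written sums.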

For comparability, five different regression models corresponding to different specifications of the predictive variables were analysed and labelled Models 1–5, with the following sets of predictive variables, respectively, defined for each model:

$$\begin{array}{l}{\mathrm{Model}}\,1:\quad {\boldsymbol{x}}^{(1)} = ({1},\,\Delta)\,,\\{\mathrm{Model}}\,2:\quad {\boldsymbol{x}}^{(2)} = ({\boldsymbol{x}}^{(1)},\,IntlCollab,\,IntlCiting)\,,\\{\mathrm{Model}}\,3:\quad {\boldsymbol{x}}^{(3)} = ({\boldsymbol{x}}^{(2)},\,Year\,\, {\mathrm{dummies}})\,,\\{\mathrm{Model}}\,4:\quad {\boldsymbol{x}}^{(4)} = ({\boldsymbol{x}}^{(3)},\,Research\,Area\,\,{\mathrm{control}}\,{\mathrm{set}})\,,\\{\mathrm{Model}}\,5:\quad {\boldsymbol{x}}^{(5)} = ({\boldsymbol{x}}^{(4)},\,Country\,\, {\mathrm{control}}\,{\mathrm{set}})\,.\end{array}$$

In Model 1, the interdisciplinarity index was used as the only predictive variable, which was added to the intercept term (constant). In Model 2, the variables IntlCollab and IntlCiting, denoting the proportion of internationally collaborated papers among the papers comprising an RF and among the citing papers, respectively, were included as additional predictive variables. Models 3, 4 and 5, in the same manner, represented the prior model with a new set of predictive variables, respectively, added as follows: Year dummy variables for the different years (2012, 2014 and 2016) of the Science Map to capture the possible time-fixed effects; a ‘Research Area’ control set to represent the proportion of papers belonging to each research area \({\mathrm{A}} \in {\mathscr{R}}\); and a ‘Country’ control set to represent the proportion of papers for which authors from each country of \({\mathscr{C}}\) = {US, France, UK, Germany, Japan, South Korea, China} contributed (measured on a fractional-count basis). The last two control sets were introduced to, respectively, account for the possible discipline-related and country-related effects that could reflect such factors as research environment, practices and cultures intrinsic to each discipline and/or country.

In interpreting the regression results, each regression coefficient \(\beta_k\) (i.e., the k-th component of \({\boldsymbol{\beta}}\) in Eq. (3)) indicated that a one-point increase in the predictive variable \(x_k\) was associated with a \(\beta_k\)-point increase in ln(I), or equivalently, a \([\exp(\beta_k) - 1] \times 100\%\) increase in the research impact (I) at the specified significance level. Care should be taken in interpreting the results for the proportion variables (IntlCollab, IntlCiting, ‘Research Area’ and ‘Country’ control sets) as the regression coefficients for each of these variables represented the effect on the criterion variable (i.e., the log-transformed research impact) associated with a 100-percentage-point increase in the proportion variable (i.e., a change in the proportion from 0 to 1). For the time-fixed effects, the base category was chosen as Year = 2014, against which the effects of the other two data periods (corresponding to Year = 2012 and 2016) were measured. For the ‘Research Area’ control set, the effect of the proportion of each research area in \({\mathscr{R}}\) was measured against the set of ‘residual’ (i.e., ‘non-natural-science’) ESI research areas. Finally, for the ‘Country’ control set, the effect of the share of each country in \({\mathscr{C}}\) was measured against the set of those countries not listed in \({\mathscr{C}}\).

Results: interdisciplinarity as a key driver of impact at research fronts

The results of the multiple regression analyses for all the five models (n = 2,560; two-tailed) are summarised in Supplementary Table S4. Based on the adjusted \(R^2\) for each model (the bottom row of the table), Model 5 was found to be the preferred model in terms of the goodness-of-fit, and therefore, this model was considered in detail in this study; see Table 2 for the summary table.

Table 2 Results of the multiple regression analysis with robust standard errors

In particular, the estimated coefficient for the interdisciplinarity index was found to be positive and statistically highly significant. Specifically, a one-point increase in the interdisciplinarity index in an RF (i.e., an increase in the effective number of distinct disciplines by one) is, on average, associated with an increase of approximately 20% in the research impact (\((e^{0.186} - 1) \times 100\% \approx 20\%\)), holding other relevant factors constant (P < 0.001). This appears to imply that, on average, a high-impact RF is more likely to be formed either in the presence of disciplines that are more dissimilar or with a more balanced mix of distinct disciplines, or both. What this indicates is that although the papers composing the RFs were already high-impact papers, being classified as ‘Highly Cited Papers’ in the ESI database, the degree of ‘high impact’ at the RF level was found to be higher on average as the interdisciplinarity level increased. Notably, this implication was found to hold sufficiently generally, reproducing the same results qualitatively for each data period separately (Supplementary Fig. S2a–c).
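The conversion from a coefficient in the log-linear model to a percentage change in impact can be reproduced in a few lines. The coefficient 0.186 is the value quoted above; the function name and its `delta` parameter are introduced here only for illustration.

```python
import math


def percent_change(beta, delta=1.0):
    """Percentage change in impact for a `delta`-point change in a predictor,
    given its coefficient `beta` in a model of ln(impact)."""
    return (math.exp(beta * delta) - 1.0) * 100.0


# Coefficient of the interdisciplinarity index reported in the text.
change = percent_change(0.186)
print(round(change, 1))  # 20.4
```

The same function with a smaller `delta` gives the effect of a fractional change in a predictor, which is how the coefficients of the proportion variables would be read.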

Though outside the main scope of the present study, the regression results led to additional intriguing implications for the research impact predictors. In particular, the regression coefficient for IntlCollab implied that a 1% increase in the international collaboration in an RF was, on average, associated with an approximately 0.6% increase in the research impact (P < 0.001), which was also found to hold sufficiently generally across the three data periods. By contrast, the regression coefficient for IntlCiting was found to be negatively significant (P < 0.001). For the time-fixed effects, the research impact was found to be, on average, statistically significantly lower in the ‘2012’ data compared with the ‘2014’ or ‘2016’ data (P < 0.001). However, no statistically significant difference was observed between the ‘2014’ and ‘2016’ data (see also Supplementary Fig. S1b, which already indicated this trend via the kernel density estimations for the criterion variable). Further, the coefficient for each of the ‘Research Area’ variables was found to be positively significant (P < 0.001), indicating that, on average, a paper belonging to any area of \({\mathscr{R}}\) is likely to have a higher research impact compared with a paper attributed to the ‘residual’ (i.e., ‘non-natural-science’) research area. Finally, the result for each of the country-share variables in \({\mathscr{C}}\) provided some intriguing insights into its effect on the research impact. For instance, the result for the variable ‘US’ implied that, on average, replacing 1% of the contributions from the ‘residual’ countries with contributions from the US was associated with an approximately 0.3% increase in the research impact (P < 0.001). These observed relationships between the research impact and each predictor variable, along with their policy implications, should be investigated in future studies.

Discussion: evolving landscape of cross-disciplinary research impact

To further enhance our understanding of the relationship between interdisciplinarity and research impact, a more detailed investigation of the finer structures and evolutionary dynamism of high-impact research over time and across disciplines is desirable. For this purpose, we present in the following a new bibliometric visualisation technique and demonstrate its potential use in the study of interdisciplinarity.

‘Science Landscape’: a novel bibliometric visualisation approach

Significant efforts have been made to visualise scientific outputs, especially bibliometric data regarding the citation characteristics. Such efforts have been partially successful in displaying the links between and across various research disciplines or subject categories (Small, 1999; Boyack et al., 2005; Igami and Saka, 2007; Leydesdorff and Rafols, 2009; Porter and Rafols, 2009; Van Noorden, 2015; Klavans and Boyack, 2017; Elsevier, 2019). Each alternative form of ‘science mapping’ has its own merit in particular situations, offering complementary and synergistically beneficial implications not only for a deeper understanding of academic (inter-)disciplinarity but also for policy implementation. To contribute to the evidence base in this fast-growing and innovative field, here we present a new technique, called the Science Landscape, that visualises research impact and its development patterns in relation to the entire natural science discipline corpus. The same research impact measure and the interdisciplinarity index as used in the previous sections were employed to ensure methodological consistency between the empirical implications drawn from this new visualisation technique and the quantitative evidence already obtained from the regression analyses.

In the Science Landscape diagrams (Fig. 3a–c), the eight (broad) research areas were arranged along the edge of a circular map, with the angle of each research area being proportional to the number of papers attributed to that research area. Each RF was then mapped onto the circular map for each data period (Supplementary Fig. S3a–c), so that the distance from the edge to the centre indicated the RF’s interdisciplinarity index; that is, the closer it was to the centre, the greater the degree of interdisciplinarity. The angle around the centre was determined by the disciplinary composition; that is, the closer it was to a particular research area, the higher its share in the disciplinary composition. A similar circular research field frame (27 subject areas) is used in the ‘Wheel of Science’ for Elsevier’s SciVal system based on Scopus data (Klavans and Boyack, 2017; Elsevier, 2019); however, the objectives, what is mapped and how it is mapped all differ. In particular, the Science Landscape shown here was based on 3D mapping technology, so that the height of each \({\mathrm{RF}}_i\) was proportional to the log-transformed research impact, \(\ln(I_i)\), with the highest (‘over the clouds’) and lowest (‘under the sea’) research impact levels being depicted in red and blue, respectively. Here the heights of the RFs were not added vertically; rather, at each map position, the maximum height value was used to depict the surface of the landscape. The rationale behind this method was that for the current purpose of investigating the cross-disciplinary spectrum of research impact, it was more meaningful and informative to visualise ‘individually outstanding high-impact RFs’ rather than ‘a number of low-impact RFs additively forming high peaks’.
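Two of the encoding rules described above lend themselves to a short sketch: sector angles proportional to paper counts, and a surface built from the maximum (not the sum) of overlapping heights. The paper counts, grid cells and heights below are hypothetical, and the sketch does not attempt the actual peak profile, whose functional form is given only in the Supplementary Methods and Discussion.

```python
import math


def sector_angles(paper_counts):
    """Angle (in radians) allocated to each research area, proportional to its papers."""
    total = sum(paper_counts.values())
    return {area: 2.0 * math.pi * n / total for area, n in paper_counts.items()}


def landscape_surface(peaks):
    """Combine RF heights per map cell by taking the maximum, not the sum.

    peaks: iterable of (cell, height), where height stands for ln(impact)
    and may be negative ('under the sea').
    """
    surface = {}
    for cell, height in peaks:
        surface[cell] = max(height, surface.get(cell, -math.inf))
    return surface


# Hypothetical paper counts for three areas (not the values in Table 1).
paper_counts = {"Chemistry": 300, "Engineering": 100, "Clinical Medicine": 600}
angles = sector_angles(paper_counts)

# Hypothetical RF contributions landing on two grid cells.
peaks = [((0, 0), 0.4), ((0, 0), 2.1), ((0, 1), -0.3), ((0, 0), 1.0)]
surface = landscape_surface(peaks)
print(angles, surface)
```

In this toy case the cell (0, 0) takes the height 2.1 of its single strongest RF, whereas additive stacking would have produced 3.5 from three unrelated contributions.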

Fig. 3

Dynamic evolution of research impact across disciplines. Corresponding to each data period (2007–2012 (a), 2009–2014 (b) and 2011–2016 (c)), the Science Landscape diagrams are shown. The figures on the left show the top views and the figures on the right show the bird’s-eye views. The eight ‘base’ research areas are arranged along the edge of the circular map, and the angle allocated to each research area is proportional to the number of papers from each discipline. The highest and lowest levels of research impact are depicted in red and blue, respectively

Moreover, each RF’s concrete disciplinary composition was indicated by the direction(s) towards which the RF’s peak tails (see Supplementary Fig. S4). For instance, in the Science Landscape for 2009–2014 (Fig. 3b), there is a high research impact peak (I = 100.7) near the centre that has one tail towards ‘Comp & Math’ and another tail towards ‘Basic Life Sciences’ (the solid square region). In light of the original NISTEP’s Science Map dataset (NISTEP, 2016), this peak corresponds to the RF characterised by feature words such as ‘RNA Seq’ and ‘next generation sequencing’. Then, intuitively, this correspondence indicates that during this period, there was a scientific breakthrough related to new sequencing technology that occurred at the intersection of these two disciplines. Further technical and mathematical details including the explicit functional form of the 3D research impact profile are presented in Supplementary Methods and Discussion.

Given the above encoding, the Science Landscape diagrams (Fig. 3a–c) clearly illustrate how the shape of interdisciplinarity has changed over the three data periods. It is noticeable that the overall landscape of the research impact has never been static, monolithic or homogeneous; rather, it evolves dynamically, both over time and across disciplines. One of the most remarkable features can be seen in the northwest of the map (dashed circle region) at the low ivory-white-coloured ‘mountains’ in 2007–2012 (Fig. 3a), where new high-impact RFs are evolving and developing into a group of yellow-coloured mid-height ‘mountains’ in the years up to 2009–2014 (Fig. 3b) and towards 2011–2016 (Fig. 3c). This dynamic research impact growth indicates the increased IDR focus around the region during the data period. Thus, this visualisation can assist in identifying where the scientific community’s focus of attention is undergoing a massive change, where high-impact IDR is underway worldwide, and where new knowledge domains are being created. Each landscape appears to represent the superposition of the following two research impact evolutionary patterns: one that has steady, stable or predictable development that accounts for the ‘global’ or ‘evergreen’ structure of the landscape, and the other that represents a breakthrough in science or a discontinuous innovation, induced ‘locally’ in a rather abrupt or unpredictable manner. The challenge of science policy, therefore, is developing ways to address each of these dynamic evolutionary patterns and the mechanism thereof and to promote IDR in a more evidence-based manner with increased accountability for the investments made.

Summary and conclusions: towards evidence-based interdisciplinary science policymaking

This study revisited the classic question of how strongly interdisciplinarity influences research performance by focusing on the highly cited paper clusters known as the RFs. The RF-based approach developed in this paper had several advantages over more traditional approaches based on a paper-level or journal-level analysis. These advantages included quality screening, cross-disciplinary knowledge synthesis, structural robustness and effective data handling. Based on data for 2,560 RFs, spanning all natural science disciplines and published from 2007 to 2016, the potential effect of interdisciplinarity on the research impact was evaluated using a regression analysis. It was found that an increase of one in the effective number of distinct disciplines involved in an RF was associated, with high statistical significance, with an approximately 20% increase in the research impact, defined as a field-normalised citation-based measure. These findings provide verifiable evidence for the merits of IDR, shedding new light on the value and impact of crossing disciplinary borders. Further, a new visualisation technique, the Science Landscape, was applied to identify the research areas in which high-impact IDR is underway and to investigate its evolution over time and across disciplines. Collectively, this study established a new framework for understanding the nature and dynamism of IDR in relation to existing disciplines and its relevance to science policymaking.
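
As a reading aid for the approximately 20% figure, suppose the regression is log-linear in the research impact, that is, ln(I) = a + b × D + ..., where D is the effective number of disciplines. This functional form is an assumption made here for illustration only; the model specification is not restated in this section. Under that assumption, one additional effective discipline multiplies the impact by exp(b). The minimal sketch below, with hypothetical function names, converts between the coefficient and the percentage change.

```python
import math


def percent_change_per_unit(beta: float) -> float:
    """Percentage change in the outcome per one-unit increase in a predictor,
    assuming a log-linear model ln(y) = a + beta * x (an illustrative assumption)."""
    return (math.exp(beta) - 1.0) * 100.0


def beta_for_percent_change(pct: float) -> float:
    """Inverse mapping: the log-linear coefficient implied by a percentage change."""
    return math.log(1.0 + pct / 100.0)


# Coefficient implied by a 20% increase per additional effective discipline.
beta_20 = beta_for_percent_change(20.0)

# Under the same assumption, effects compound multiplicatively:
# two additional effective disciplines give 1.2 * 1.2 = 1.44, i.e. about 44%.
two_unit_change = percent_change_per_unit(2 * beta_20)
```

Under this reading, the effect compounds rather than adds: two additional effective disciplines would correspond to an increase of about 44%, not 40%.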

Validity and limitations

The new conceptual and methodological framework developed in this paper to reveal the nature of IDR should be of interest to a wide range of communities and people involved in research activities. However, as with any bibliometric research, this study faced various limitations that may have affected the general validity of the findings, and thus its practicability in the real policymaking process is necessarily limited. To conclude, some of the key issues and challenges are highlighted below.

First, both the regression analysis results and the Science Landscape visualisations should be assessed with caution as they may be highly dependent on the research area classification scheme, which is not unique. Research area specifications other than those used in this study could also have been applied. For instance, a factor-analytical approach (Leydesdorff and Rafols, 2009) to identify a ‘better justified’ set of academic disciplines could be useful in providing a more nuanced assessment and understanding of the nature of interdisciplinarity and could possibly have higher robustness and reliability. Moreover, a different research area arrangement along the edge of the circular map would have resulted in different Science Landscape visualisations, and the cross-disciplinary spectrum of research impact might have been more plentiful or profound than observed in this study.

Second, in relation to the first point, the quantification of the affinity between the research areas could have been refined in other acceptable ways. Our rationale for defining the between-discipline affinity using the Jaccard index was that papers from closer (i.e., higher-affinity) research areas were more likely to be co-cited, and thus more likely to belong to the same ESI-RF (see Supplementary Methods and Discussion). In this approach, the affinity matrix was defined solely by bibliometric means, and therefore its matrix elements may have been biased to some degree by the publication and citation practices of the existing disciplines. Consequently, it may have failed to capture the inherent ‘true’ between-discipline affinities responsible for the ‘true’ interdisciplinarity operationalised at the RF level.
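
To make the idea of a Jaccard-based affinity concrete, the following minimal sketch computes a Jaccard index for each pair of research areas from the sets of RFs in which each area appears. This is only one plausible reading: the exact definition used in this study is given in Supplementary Methods and Discussion and may differ. The function names and the toy RF-to-area mapping are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |a ∩ b| / |a ∪ b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_matrix(rf_areas: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Pairwise affinity between research areas, measured as the Jaccard index
    of the sets of RFs in which each area appears (illustrative assumption)."""
    # Invert the mapping: research area -> set of RF identifiers containing it.
    area_rfs: dict[str, set[str]] = {}
    for rf_id, areas in rf_areas.items():
        for area in areas:
            area_rfs.setdefault(area, set()).add(rf_id)
    # One entry per unordered pair, keyed in alphabetical order.
    return {
        (x, y): jaccard(area_rfs[x], area_rfs[y])
        for x, y in combinations(sorted(area_rfs), 2)
    }


# Hypothetical toy data: which research areas contribute papers to each RF.
toy_rfs = {
    "rf1": {"Comp & Math", "Basic Life Sciences"},
    "rf2": {"Basic Life Sciences", "Chemistry"},
    "rf3": {"Comp & Math", "Basic Life Sciences"},
    "rf4": {"Chemistry"},
}
affinity = affinity_matrix(toy_rfs)
```

In this toy example, ‘Comp & Math’ and ‘Basic Life Sciences’ share two of the three RFs in which either appears, giving an affinity of 2/3, whereas ‘Comp & Math’ and ‘Chemistry’ never share an RF and have an affinity of 0. Because such a measure is built entirely from co-occurrence in RFs, it inherits any bias in citation practices, which is the limitation noted above.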

Third, it is unlikely that the regression model specification used in this study included every salient predictor of research impact. For example, factors such as the types of research institute, departmental affiliations, individual journal characteristics and funding opportunities (e.g., funding agencies and programmes/fellowships) were not considered in the model owing to their unavailability in the dataset. Moreover, the links between different scientific specialties, irrespective of their academic discipline, could also have influenced research performance. These omitted variables may have affected the regression results because they may be associated with both the criterion variable (i.e., the research impact) and some predictor variables, including the interdisciplinarity index.
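
The mechanism behind this concern can be illustrated with a small simulation on purely synthetic data, unrelated to the dataset of this study. In the sketch below, an omitted variable z influences both the predictor x and the outcome y; regressing y on x alone then overstates the effect of x, whereas partialling out z (via the Frisch–Waugh–Lovell residual approach) recovers it. All names and parameter values are hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualise(v, z):
    """Residuals of v after a simple regression on z (with intercept)."""
    b = ols_slope(z, v)
    mv, mz = mean(v), mean(z)
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]


def simulate(n=5000, true_effect=1.0, omitted_effect=2.0, seed=0):
    """Return (naive, adjusted) slope estimates for the effect of x on y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]   # omitted variable
    x = [zi + rng.gauss(0, 1) for zi in z]    # predictor, correlated with z
    y = [true_effect * xi + omitted_effect * zi + rng.gauss(0, 1)
         for xi, zi in zip(x, z)]             # outcome depends on both x and z
    naive = ols_slope(x, y)                   # z left out of the model
    adjusted = ols_slope(residualise(x, z), residualise(y, z))  # z controlled for
    return naive, adjusted


naive, adjusted = simulate()
```

With these synthetic settings, the naive slope is close to 2 even though the true effect of x is 1, while the adjusted slope is close to 1. The direction and size of such a bias in any real analysis depend on how the omitted variables relate to the predictors and the outcome, which cannot be determined from the present data.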

Finally, there are inherent limitations in using citation-based methods to evaluate research performance. Combining bibliometric approaches with qualitative expert judgement would be preferable for drawing policy implications and recommendations from a wider context. Although the societal impacts of research (see e.g., Bornmann, 2013) were beyond the scope of the present work, it is hoped that this study’s findings can be extended to incorporate such societal aspects. In so doing, it is also important to consider not only the benefits but also the costs of IDR (Yegros-Yegros et al., 2015; Leahey et al., 2017) if interdisciplinary approaches are to provide viable policy options for decision-makers.

With further conceptual and methodological improvements, it is hoped that future studies can reveal more about the nature of IDR and its intrinsic academic and/or societal value by overcoming some of the aforementioned limitations. Continued efforts will contribute to the development of more evidence-based and accountable IDR strategies, which will be imperative for addressing and overcoming the world’s contemporary and future challenges.