The inverted U-shaped relationship between knowledge diversity of researchers and societal impact

With the increasing importance of interdisciplinary research, some studies have focused on the role of reference diversity by analysing reference lists of published papers. However, the relationship between the knowledge diversity of collaborating team members and research performance has been overlooked. In this study, we measured knowledge diversity through the disciplinary attributes of collaborating authors and research performance (understood as societal impact) through altmetric data. The major findings are: (1) The relationship between interdisciplinary collaboration diversity and societal impact is not a simple linear one, showing an inverted U-shaped pattern; and (2) As the number of collaborative disciplines increases, the marginal effects diminish or even become outweighed by the costs, showing a predominance of negative influences. Hence, diversity in interdisciplinary collaboration does not always have a positive impact. Research collaborations need to take into account the cost issues associated with the diversity of member disciplines.

The relationship between interdisciplinary collaboration diversity and societal impact. Understanding the essence of the relationship between knowledge diversity in interdisciplinary collaboration and societal impact involves investigating the impact of collaboration diversity on research performance 32 . Most studies suggest that collaboration diversity has a positive effect on research output, which encourages research institutions to actively engage in diverse collaborations and researchers to seek new partners 5,11,33,34 . However, other studies have found that heterogeneity in team members' knowledge hinders creativity and reduces performance 9,35 . These inconsistent findings may reflect the fact that this relationship is non-linear.
Theoretically, knowledge diversity among collaborating members is often considered a double-edged sword. On the one hand, in terms of information processing, knowledge diversity challenges existing knowledge structures, stimulates creativity and divergent thinking, and facilitates knowledge reconstruction by identifying and integrating knowledge from different fields 9,36 . Just as the weak ties hypothesis emphasises the importance of different resources, richer and more diverse information contributes to improved collaborative performance 37,38 . On the other hand, increasingly heterogeneous knowledge can exacerbate the knowledge boundaries among team members and increase the cognitive costs required for shared knowledge construction 39 .
Similar trends were observed at the level of team interactions. Highly heterogeneous teams experienced some degree of conflict when collaborating because of their different perceptions of the task 40,41 . Moderate levels of task conflict are more conducive to complex issue resolution than excessively high or low levels 42 . Simultaneously, researchers' pride in their own discipline can prompt protective behaviour in the domain of knowledge. Moreover, the use of technical discipline-specific terms may deepen communication barriers and cause other challenges 43 . The effectiveness of team communication and decision-making decreases as diversity increases 44 .
It is clear from the above that as knowledge diversity increases, the advantages do not always outweigh the disadvantages; in terms of diversity, there can be "too much of a good thing". There is a threshold of positive effects beyond which undesired outcomes are produced, resulting in an inverted U-shaped non-linear relationship 45,46 . Accordingly, we propose the following hypothesis: The relationship between knowledge diversity in interdisciplinary collaboration and societal impact is inverted U-shaped. As diversity increases, societal impact reaches a peak and then tends to decline.

Results
Descriptive statistics and correlation analysis. The descriptive statistics and correlation analysis of the variables revealed significant positive correlations between Tweets mentioned counts and the number of subjects (Table 1). The correlation coefficients were below the critical value of 0.7, indicating that there were no serious collinearity issues between the variables. Accordingly, we proceeded with further hypothesis testing and regression analysis.
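The collinearity screen described above (pairwise correlation coefficients compared against a cut-off of 0.7) can be illustrated with a minimal sketch. The variable names and the toy values are hypothetical and are not the study's data; the function names are introduced here for illustration only.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def high_correlations(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every variable pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical toy data, chosen so that exactly one pair exceeds the cut-off.
toy = {
    "subject_count": [1, 2, 3, 4, 5, 6],
    "author_count": [2, 4, 6, 8, 10, 12],  # a multiple of subject_count
    "time_lag": [5, 1, 4, 2, 6, 3],
}
flagged = high_correlations(toy)
```

An empty result from `high_correlations` would correspond to the situation reported in the text, where no pair reaches the 0.7 cut-off.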

Regression analysis.
Given that the variance of the dependent variable was larger than its mean, indicating an over-dispersed distribution (i.e. it did not meet the requirements of a general multiple linear regression), we used a negative binomial regression, which is applicable to such asymmetric datasets. The variance inflation factor (VIF) for each variable was less than 10, excluding the possibility of multicollinearity between the independent and control variables. The regression Eq. (1) is as follows:

$$\mathrm{TMC} = \beta_0 + \beta_1\,\mathrm{Subjectcount} + \beta_2\,\mathrm{Subjectcount}^2 + \beta_3\,\mathrm{Organizationcount} + \beta_4\,\mathrm{Authorcount} + \beta_5\,\mathrm{Journal} + \beta_6\,\mathrm{Timelag} + \beta_7\,\mathrm{Fields}_{PSE} + \beta_8\,\mathrm{Fields}_{LES} + \beta_9\,\mathrm{Fields}_{BHS} \qquad (1)$$

Model 1 showed the regression results for all the control variables on societal impact. Models 2 and 3 tested the linear term and the quadratic term, respectively, of diversity in interdisciplinary collaboration on societal impact (Table 2). The 95% confidence interval for the alpha value of the negative binomial regression model did not include zero, indicating that the null hypothesis of "alpha = 0" was rejected at the 5% significance level; hence, the negative binomial regression model was acceptable 47 .
As seen in Fig. 1a, the fitted curve for Tweets mentioned count to subject count showed an inverted U-shaped trend. This study adopted the AIC and BIC statistics to compare the fit of the quadratic model with that of the linear model. When comparing the two, a smaller AIC or BIC means a better model fit. The quadratic model captured more of the pattern in the data than the linear model (see Models 2 and 3 of Table 2 for detailed results). Likewise, the pseudo R² values for the three models indicated that the model including the quadratic term fitted better.
The results showed that the linear term for knowledge diversity in interdisciplinary collaboration positively predicted the number of Tweets, while the quadratic term was negative and significant (β 1 = 0.099, p < 0.001, β 2 = − 0.006, p < 0.001), thus supporting the hypothesis. To further test the inverted U-shaped hypothesis, we adopted the utest command in Stata and found that the extreme point was within the data limits (1 < 6.62 < 22), and the 95% confidence interval was [4.745, 8.892]. As such, the null hypothesis of no inverted U-shape could be rejected at the 5% significance level (t = 3.85, p < 0.001). The trends in collaboration diversity and societal impact are plotted against the negative binomial regression results in Fig. 1b. Figure 1c shows a further analysis of the marginal effects. As the diversity of interdisciplinary membership increases, the marginal effect on societal impact continues to diminish and even becomes negative. At an average number of disciplines of 6.6, the equilibrium point between costs and benefits is reached. Beyond this point, the costs begin to outweigh the benefits and the slope becomes negative.
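Two of the checks used in this section can be sketched in a few lines: the over-dispersion check (variance larger than the mean) that motivates the negative binomial model, and the AIC/BIC comparison between the linear and quadratic specifications. The counts, log-likelihoods and parameter counts below are hypothetical placeholders, not values from Table 2; only the sample size of 1554 is taken from the text.

```python
import math


def dispersion_ratio(counts):
    """Sample variance divided by the mean; values well above 1 suggest over-dispersion."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("mean is zero; ratio undefined")
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical tweet counts with a long right tail.
toy_counts = [0, 1, 1, 2, 3, 5, 8, 40]
ratio = dispersion_ratio(toy_counts)

# Hypothetical fit summaries (log-likelihood, number of parameters).
n_obs = 1554
linear = {"ll": -5000.0, "k": 9}
quadratic = {"ll": -4990.0, "k": 10}

quadratic_preferred = (
    aic(quadratic["ll"], quadratic["k"]) < aic(linear["ll"], linear["k"])
    and bic(quadratic["ll"], quadratic["k"], n_obs) < bic(linear["ll"], linear["k"], n_obs)
)
```

Because BIC penalises each extra parameter by ln n rather than 2, the quadratic term must improve the log-likelihood by more under BIC than under AIC before it is preferred.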

Robustness tests.
To verify the regression results, we conducted robustness tests using another important measure of societal impact: the altmetric attention score (Table 3 and Fig. 2a,b). Since the altmetric attention score is not integer-valued, negative binomial regression was not appropriate. Instead, we applied a Box-Cox transformation to the dependent variable and used OLS regression. The results showed that the linear term for collaboration diversity positively predicted the altmetric attention score, while the quadratic term was negative and significant (β 1 = 0.162, p < 0.01, β 2 = − 0.012, p < 0.01).
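For readers unfamiliar with the Box-Cox transformation mentioned above, a minimal sketch of the standard one-parameter form follows. The value of the parameter λ used in the study is not stated in this section, so the values passed below are purely illustrative assumptions.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a strictly positive value y.

    (y**lam - 1) / lam for lam != 0, and ln(y) for lam == 0.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


# Illustrative attention scores and an assumed lambda of 0.5.
scores = [1.0, 4.0, 25.0]
transformed = [box_cox(s, 0.5) for s in scores]
```

In practice λ is usually estimated from the data (for example by maximum likelihood) rather than fixed in advance; the sketch leaves that step out.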
To further test the inverted U-shaped hypothesis, we adopted the utest command in Stata and found that the extreme point was within the data limits (1 < 6.823 < 22), and the 95% confidence interval was [5.045, 9.865]. As such, the null hypothesis of no inverted U-shape could be rejected at the 5% significance level (t = 3.18, p < 0.001). Furthermore, the results of the marginal effects analysis similarly indicated that the instantaneous slope of societal impact with respect to subject diversity shifted from positive to negative. This suggests that when the number of disciplines exceeds the optimal size, the marginal costs outweigh the marginal benefits and the negative impact becomes dominant (see Fig. 2c).
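The location of the extreme point follows from the quadratic specification: the slope β1 + 2·β2·x is zero at x* = −β1 / (2·β2). The sketch below applies this to the rounded robustness-test coefficients reported above (β1 = 0.162, β2 = −0.012). It is only a point-estimate check of the shape and is not a reimplementation of Stata's utest, which also tests the significance of the slopes at both ends of the data range; the function names are introduced here for illustration.

```python
def slope(b1, b2, x):
    """Instantaneous slope of b1*x + b2*x**2 at x."""
    return b1 + 2 * b2 * x


def turning_point(b1, b2):
    """Value of x at which the quadratic's slope is zero."""
    if b2 == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b1 / (2 * b2)


def has_interior_peak(b1, b2, lower, upper):
    """True if the curve rises at `lower`, falls at `upper`, and peaks in between."""
    if b2 >= 0:
        return False
    peak = turning_point(b1, b2)
    return (
        lower < peak < upper
        and slope(b1, b2, lower) > 0
        and slope(b1, b2, upper) < 0
    )


# Rounded coefficients from the robustness test; data limits of 1 and 22 disciplines.
peak = turning_point(0.162, -0.012)
inverted_u = has_interior_peak(0.162, -0.012, 1, 22)
```

With these rounded inputs the peak falls at about 6.75 disciplines, close to the reported 6.823; a difference of this size is compatible with the coefficients having been rounded to three decimals.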

Discussion and conclusion
Main findings. Grounded in the question of whether diversity in collaboration is always optimal, this study explored the relationship between collaboration diversity and societal impact in interdisciplinary research, and the role of cognitive distance. Based on a literature review and theoretical analysis, we suggested that this relationship may take the form of an inverted U-shaped pattern. We used a sample of co-authored studies published in Nature and Science to test the hypothesis. The results indicated a significant inverted U-shaped relationship between the number of cross-disciplinary collaborators and both the number of times the paper was mentioned on Twitter and the altmetric attention score. This implies that as the knowledge diversity of the collaboration members increases, societal impact tends to decline after it reaches a certain peak.
Current mainstream research has dissected the benefits of collaboration diversity from a variety of perspectives. Philosophical explanations can be traced back to Aristotle's famous defence of the "wisdom of the multitude": "Hence the many are better judges than a single man of music and poetry; for some understand one part, and some another, and among them they understand the whole" 48 . This emphasises the role of diversity in facilitating collective decision-making. Furthermore, cognitive neuroscience experiments have shown that the proximity of other people's ideas effectively enhances interpersonal brain synchronisation, thereby increasing team creativity and vice versa 49 . In contrast, some meta-analytic studies have found small or zero effect sizes for the positive relationship between demographic or cultural diversity and team performance 50,51 . Hence, both logical deduction and empirical studies continue to note advantages as well as disadvantages of collaboration diversity.
This controversy may indicate that the advantages of collaboration diversity have their own specific scope and boundary conditions 52 . This study confirms, from a knowledge perspective, that the benefits of collaboration diversity may be reduced after a certain peak due to excessive costs. In scientific research, where divergence and convergence processes alternate, diversity is essential if scientists are to produce novel and rigorous results 53 ; it is also important to pay attention to the cost of too much diversity and avoid inefficiency and the phenomenon of "too many cooks spoiling the broth". It is interesting to note that this finding is similar to the optimal scale effect found in recent years in scientific collaboration. Price's famous prediction 54 that scholarly publications will "move steadily toward an infinity of authors per paper" is being borne out, with the "scientist-as-lonely genius" myth becoming further detached from reality 55,56 . In fact, the benefits of team size were shown not to follow a linear growth pattern. With the continued "inflation" of collaboration size, citation rates tended to decline, presenting an inverted U-shaped relationship 7,57,58 . As Wu et al. 13 found, large teams tend to develop existing science and technology more than small teams, resulting in fewer disruptive innovation breakthroughs. Increased team size is often accompanied by greater diversity among team members 58 , introducing more collective intelligence and innovative perspectives. There are exceptions, such as very diverse small groups and large, highly homogenous teams. The key question is whether size or diversity is more important in the relationship between interdisciplinary collaborative research and research performance.

Table 3. Regression analysis results. Robust standard errors are in parentheses; ***p < 0.01, **p < 0.05, *p < 0.1.
Collective intelligence research has focused on this relationship, with Condorcet's jury theorem predicting that heterogeneous team composition may be more important when faced with highly complex decisions which are susceptible to bias 59 . To improve the accuracy of collective decision-making, a high level of diversity needs to be maintained if team size is to be continuously increased 60 . Despite controlling for the variable of team size, the present study still found an inverted U-shaped distribution in the relationship between collaboration diversity and societal impact, indicating the relatively greater importance of diversity in collective decision-making performance compared to team size. Zhu et al. 58 discovered that research team members' own research diversity played a moderating role in the relationship between team size and the influence of the research. In any case, our conclusions need to be interpreted with caution, as the relationship between size and diversity may be reversed under other conditions. For example, in real-world social activities, the performance of collective decision-making depends on a variety of factors, such as average individual accuracy and decision bias 59,61 .
Implications. In theoretical terms, this study focused on the performance of collaboration diversity from the perspective of knowledge diversity in interdisciplinary collaboration. As an important cognitive basis for scientific collaboration, knowledge diversity among team members is a key element that influences innovation performance. In contrast to previous studies that have emphasised the value of collaboration diversity, this study quantitatively verified that collaboration diversity is a double-edged sword, peaking in the number range of 6 to 7, after which its negative effects may outweigh the positive ones.

In practical terms, the promotion of scientific and technological innovation through interdisciplinary collaboration has become a major concern for national governments and research institutions. This study provides important insights into the formation of research teams and knowledge management. It is important to be aware of the differences in disciplinary backgrounds among team members and to try to optimise the level of disciplinary diversity. During collaboration, research teams, especially those working at the intersection of the humanities and sciences, need to be aware of potential communication barriers and conflicts among members.
Limitations and future research. Note that while research on societal impact has widely used altmetrics as indicators, we are aware of the limitations of measuring societal impact through analysis of social media activity. The findings of this study need to be validated with multiple sources of data and research methods. For example, societal impact is reflected not only in the level of attention but also in its content. It is necessary to explore metrics for quantifying the valence of social evaluation. Furthermore, in addition to diversity and differences in disciplinary distances, interdisciplinary collaboration also involves disparity and balance. Future research should reveal the patterns in interdisciplinary collaboration along more dimensions.
As a major part of scientific activity, scientific collaboration is a dynamic and changing process of cognitive interaction comprising a comprehensive and complex collection of intertwined factors, such as member attributes and interactions. This study focused only on the relationship between the disciplinary diversity of team members and the outcome of collaboration as reflected in societal impact. It remains to be seen how disciplinary diversity affects the innovation performance of collaboration and how organisations can respond to the "marginal dilemma" identified in the present study. Future research should explore how to effectively control costs and the decline in creativity, while maximising team members' innovative behaviour.

Methods
Data. Nature and Science are multidisciplinary journals, and choosing papers from the same journals in the same year excludes the effect of different impact factors and years of publication. Thus, we conducted a search of the Web of Science Core Collection, limited to the journals Nature and Science, the publication date "2018" and the literature type "Article", giving a total of 1596 articles. The data collection process was as follows. First, the basic information of each article was obtained, such as the DOI, authors, institutions, and publication date. Second, using the DOI of each article, the number of mentions on Twitter and the overall altmetrics score were obtained from altmetric.com. The above data were crawled with the help of the altmetrics package for Python (version 3.6.6) on 3 August 2022.
To avoid missing data and ensure the matching of information in the literature, the following papers were excluded: retracted papers; papers with missing information, such as authors' institutions; papers that were missing altmetrics data; single-author papers; and papers with unclear information on the discipline covered by the primary or secondary institution. The final valid sample comprised 1554 papers.
Variables. The dependent variable was societal impact, and the independent variable was diversity in interdisciplinary collaboration (Table 4). The control variables were the date of publication, institutions, and number of authors. The specifics of societal impact, interdisciplinary collaboration diversity, and the control variables are detailed below.
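The exclusion criteria listed above amount to a record-level filter. The following is a minimal sketch under the assumption that each paper is held as a dictionary; every key name (`retracted`, `authors`, `institution`, `discipline`, `tweet_mentions`, `attention_score`) is hypothetical and chosen only for illustration.

```python
def keep_paper(paper):
    """Return True if a paper record passes all exclusion criteria.

    `paper` is a dict with hypothetical keys; see the toy records below.
    """
    if paper.get("retracted", False):
        return False  # retracted papers
    authors = paper.get("authors") or []
    if len(authors) < 2:
        return False  # single-author papers (or no author information)
    if any(not a.get("institution") for a in authors):
        return False  # missing institution information
    if paper.get("tweet_mentions") is None or paper.get("attention_score") is None:
        return False  # missing altmetrics data
    if any(not a.get("discipline") for a in authors):
        return False  # discipline of the institution could not be determined
    return True


def author(institution, discipline):
    """Small helper to build a toy author record."""
    return {"institution": institution, "discipline": discipline}


# Hypothetical toy records, not the study's data.
papers = [
    {"authors": [author("Univ A", "Physics"), author("Univ B", "Chemistry")],
     "tweet_mentions": 12, "attention_score": 30.5},
    {"retracted": True,
     "authors": [author("Univ A", "Physics"), author("Univ B", "Biology")],
     "tweet_mentions": 3, "attention_score": 4.0},
    {"authors": [author("Univ C", "Biology")],
     "tweet_mentions": 7, "attention_score": 9.1},
    {"authors": [author("Univ D", "Physics"), author("Univ E", None)],
     "tweet_mentions": 1, "attention_score": 2.0},
    {"authors": [author("Univ F", "Physics"), author("Univ G", "Sociology")],
     "tweet_mentions": None, "attention_score": None},
]
sample = [p for p in papers if keep_paper(p)]
```

Applying the filter as one function keeps the criteria in a single place, which makes it easier to report how many records each rule removes.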
Societal impact. The dependent variable was the societal impact of each paper. To measure it, we adopted the main altmetrics indicator, namely the number of mentions on Twitter 62 . Twitter, as a major public social media platform, broadens the audience and dissemination channels of academic research findings and helps to improve the understanding and application of academic research by the public 63 .
Interdisciplinary collaboration diversity. Following Zhang et al. 31 and Zhang and Zhang 64 , we obtained the number of disciplines involved in each study to measure the diversity of interdisciplinary collaboration from a scientific activity perspective, starting with information on each author's institution. Data processing was conducted as follows (Fig. 3). First, the complete address of each institution was extracted, and the characteristic words of the secondary institution were identified, which included information on the discipline. If the secondary institution did not show subject information, the characteristic words of the primary institution were identified. Second, a disciplinary classification scheme was constructed based on the 254 disciplinary categories of the Web of Science and their corresponding five scientific disciplines (Life Sciences & Biomedicine, Technology, Arts & Humanities, Social Sciences, and Physical Sciences). Third, the subject information words of each institution were matched to the subject categories in the classification scheme. Non-English words and ambiguous abbreviations and acronyms were manually checked to ensure accuracy. Following this method, we were able to count the number of different disciplines involved in each publication.
Control variables. This study controlled for the number of authors, the number of institutions and the type of journal, as these variables have been suggested to correlate with the academic impact of scientific papers 30 . Considering the real-time nature of altmetric data facilitated by online research communication, we also calculated the time lag between the date of publication and the date on which the altmetric data were obtained. This helped control for the effect of differences in the publication dates of the papers in the sample.
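The discipline-counting procedure described under "Interdisciplinary collaboration diversity" (match the secondary institution first, fall back to the primary institution, then count distinct categories per paper) can be sketched as follows. The keyword map is a tiny hypothetical stand-in for the Web of Science based scheme, and the `secondary` and `primary` fields are assumed to have been extracted from the address already.

```python
# Hypothetical keyword-to-category map; the study's scheme is built on the
# Web of Science subject categories and is far larger.
KEYWORDS = {
    "physic": "Physics",
    "chem": "Chemistry",
    "biolog": "Biology",
    "comput": "Computer Science",
    "sociol": "Sociology",
}


def match_discipline(name):
    """Return the first category whose keyword occurs in `name`, else None."""
    lowered = (name or "").lower()
    for keyword, category in KEYWORDS.items():
        if keyword in lowered:
            return category
    return None


def affiliation_discipline(affiliation):
    """Use the secondary institution if it names a subject, else the primary one."""
    return (
        match_discipline(affiliation.get("secondary"))
        or match_discipline(affiliation.get("primary"))
    )


def subject_count(affiliations):
    """Number of distinct disciplines across a paper's affiliations."""
    found = {affiliation_discipline(a) for a in affiliations}
    found.discard(None)  # unmatched affiliations would need manual checking
    return len(found)


# Hypothetical affiliations for one paper.
paper_affiliations = [
    {"secondary": "Department of Physics", "primary": "University A"},
    {"secondary": "School of Chemistry", "primary": "University B"},
    {"secondary": "Laboratory 12", "primary": "Institute of Computing Technology"},
    {"secondary": "Dept. of Physics and Astronomy", "primary": "University C"},
]
n_subjects = subject_count(paper_affiliations)
```

Simple substring matching of this kind is error-prone (for example, "biochemistry" would match the "chem" keyword here), which is consistent with the manual checking of ambiguous cases that the text describes.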
Due to the wide variation in societal attention to research on different topics 65 , this study used subject classification data from the Leiden ranking, which flags one or more main fields covered by each publication. The five fields were: Biomedical and health sciences, Life and earth sciences, Mathematics and computer science, Physical sciences and engineering, and Social sciences and humanities. We included in our analysis the coverage of each of these fields.

Data availability
The datasets used or analysed during the current study are available from the corresponding author on reasonable request.