Introduction

The phenomenon of “the rich get richer”, also called preferential attachment in the field of complex networks1, is common in many fields2,3 (for example, see the references cited in Table 1). The scientific community, however, is composed of scientists, a special group of people who focus on proposing, investigating and implementing original and creative ideas. It is therefore plausible that the “rich get richer” phenomenon is less pronounced in the fields investigated by scientists than in other areas. Ideally, scientists choose their fields of investigation according to their scientific interest and the scientific value of the questions, not according to the hotness of the fields. In this work, using papers published in the American Physical Society (APS) Physical Review journals from 1976 to 2009, we test whether a new paper is more likely to be in a hot field than in a relatively unknown field at the time it is published. We also compare scientists from different countries, since such comparisons can be informative. In China, modern scientific development is still very young, and many scientists believe that, compared with many other countries, China has more followers relative to original thinkers. In this work, we offer direct empirical support for this hypothesis. Finally, we determine whether the degree of tracing hot fields differs for papers with different numbers of authors, affiliations and references. We find that scientists who collaborate with more authors or more affiliations tend to follow hotter fields than those who work with few collaborators or affiliations. Moreover, papers with a small number of references are, on average, more interest-driven or value-driven, whereas papers with a large number of references are more hotness-driven.
These empirical findings, particularly if the analysis is repeated in other fields and over longer periods of time, could provide valuable information for policy makers.

Table 1 Values of α, the degree of preferential attachment, in different networks collected from the literature, together with the number of nodes N (and M for bipartite networks) and the number of links E. The first column gives the name of the investigated database, with the relationship studied in brackets. Note that APS-PACS, Google Earth and Digg are bipartite networks between individuals and groups, while all other networks describe relations among individuals.

Results

Empirically, preferential attachment has been found in many systems. Table 1 lists typical systems together with their exponents α; the positive values indicate that new links are more likely to attach to nodes with larger degree. Later, we compare our results on hotness tracing of newly published papers against the other phenomena listed in this table.

First, we examine preferential attachment of papers in the PR-PACS data set (see Methods for details). Fig. 1 (a) displays, in a log-log plot, the cumulative function κ(k) with respect to the size k of the field that a newly published paper belongs to. The positive exponents α indicate that new papers are more likely to focus on hot fields (larger sizes), i.e., new links preferentially attach to nodes (PACS) with larger sizes. In other words, scientists do publish more new papers in currently hot fields. We obtain the exponents α by least-squares fitting from k = 1 to k = 300, because the curves deviate from a straight line at large k owing to poor statistics. For different years t, κ(k) always follows a power law, namely κ(k) ~ k^{α+1}, but with slightly different exponents α (as shown in Fig. 1 (b)). We also plot the distribution of field sizes N(k) as an inset in Fig. 1 (a); it is highly skewed.

Figure 1

Empirical preferential attachment to hot fields of new papers in the PR-PACS data set.

(a) The cumulative probability functions κ(k) in the years 2000 and 2008. Inset: the distribution function of the sizes of fields. (b) The exponents α for different years. We start our measurements from 1995 rather than 1976, the first year of the data, because the method requires a relatively large initial system. Note that α = 0.52–0.87 is near the lower end of the exponents collected in Table 1. The 2009 exponent is smaller than those of the preceding several years, for reasons that we do not yet know. Unless stated otherwise, Δt is one year in this paper. Throughout this paper, the straight lines are guides to the eye.

Compared with the preferential attachment phenomena in the other systems listed in Table 1, the exponents α = 0.52–0.87 from our PR-PACS data are near the lower end. Of the other 13 entries in Table 1, only sexual contacts in sexual networks (α = 0.32–0.80)4, membership of users in groups on Digg (α = 0.69)5 and friendship relations on LinkedIn (α = 0.6)6 show approximately the same level of hotness tracing. It might be easy to “follow” a star member on a social networking website such as Flickr (α = 1.0)6; however, it might not be so easy to shift to a different research field, join the user group of a different product or change sexual partners towards hotter choices. Besides the difficulty of changing fields, the low exponent might also arise because scientific work is more independent and more interest-driven or value-driven than activity in other fields.

Secondly, we test whether the intensity of tracking hot topics in scientific research differs between countries. We classify the papers according to the country of the first author's affiliation and calculate the absolute contribution ratios r_c of several major countries within the PR-PACS data set: USA (33.07%), Germany (9.95%), Japan (6.94%) and China (3.73%). As seen from Fig. 2 (a), in the year 2008 the exponent of P. R. China (α = 1.06) is larger than that of the other countries, e.g., USA (0.74), Germany (0.86) and Japan (0.93). Moreover, as shown in the inset of Fig. 2 (a), the exponent α for China is generally larger than that of the other countries across years. These results indicate that the phenomenon of tracing hotness is more pronounced among Chinese scholars. In addition, we find that the UK (3.80%) and France (5.55%) lie between the USA and Germany, while Russia (1.89%) lies between Germany and China.

Figure 2

Results for authors from the countries that contribute most to the PR-PACS data set.

(a) The cumulative probability functions κ(k) in the year 2008 for USA, Germany, Japan and P. R. China. The exponent for Chinese authors (α = 1.06) is larger than that of the other countries. Inset: the exponents α for different years; the exponent for China is always larger than that for the USA. (b) The relative contribution ratio R_c(k) for USA, Germany, Japan and P. R. China.

To provide a comparative study, we also calculate the relative contribution ratio R_c(k) of papers from country c that belong to fields of size k (see Methods for details). As seen from Fig. 2 (b), the relative contribution ratio R_c(k) of Chinese scholars is smaller than 1 in cold fields (smaller k) but larger than 1 in hot fields (larger k), indicating that Chinese scholars contribute less than their average share to cold fields and more than their average share to hot fields. The situation of the USA is the opposite. This difference provides a second, independent indication that Chinese scientists are keener to follow hot topics than scientists in the United States, in agreement with our previous observations.

Considering that scientific research in China is still young, it is understandable that a large share of it is in hotness-driven fields rather than value-driven fields. The observed order of the degree of hotness tracing (USA, Germany, Japan, China) is more or less consistent with our intuition. How these different positions relate to the scientific policies of each country, or even to its culture and values, is certainly worth further investigation but is outside the scope of the current study. We simply want to demonstrate the capability of the methods discussed above for analyzing publication records and to present some basic findings obtained with them.

Next, we measure the influence of the number of authors and affiliations on the degree of tracing hot topics. We classify the papers according to their number of authors and number of affiliations. It is argued in Ref. 7 that research works with many authors or many affiliations typically focus more on hot topics, because it might be exactly the hotness of the subject that made collaboration attractive to the scientists, and because a joint task team is generally more likely to focus on short-term projects than on long-term projects. Here we examine this argument using the PR-PACS data. We can see from Fig. 3 that, overall, α, the degree of severity of tracing hot topics, increases with the number of authors and affiliations. These results provide empirical support for the arguments in Ref. 7. In a sense, global collaboration is not necessarily a good strategy for high-quality, value-driven research topics, as suggested in Ref. 7.

Figure 3

The preferential attachment exponents α with respect to the number of authors N_Aut (the number of affiliations N_Aff in the inset) in the years 2000 and 2008.

Note that every three numbers for authors are grouped together and labelled as the intermediate number, e.g., 1, 2 and 3 are grouped together and labelled as 2. Overall, the exponents α increase with number of authors and affiliations.

Moreover, it is also claimed that the size of teams and the share of multi-university collaborations have grown rapidly in recent decades8,9,10 and that teams and multi-university collaborations tend to produce more highly cited papers9,10. It is generally believed that papers in hot fields acquire citations more easily than papers in cold fields, because more papers focus on similar topics. Therefore, the findings of Refs. 8,9,10 do not imply that multi-university collaborations and larger teams are making bigger or more significant progress; the effect might instead be due to the fact that the fields they work in are bigger. We check this hypothesis here by testing whether papers with more authors and more affiliations are more likely to attach to hot fields. As shown in Fig. 3, this is indeed the case. From this viewpoint, our results are consistent with the claims in Refs. 9,10.

Finally, we investigate the effect of the number of references on tracking hot topics. We classify the papers according to their number of references. The average number of references per paper today is much larger than in earlier times. For earlier times, one can intuitively hypothesize that a pioneering paper or a paper of good quality typically cited fewer references. Today, however, the number of references may or may not relate to how innovative a paper is. Here, we examine this hypothesis. As seen from Fig. 4, the exponents α, the degree of severity of tracing hot topics, increase with the number of references, which could indicate that papers with a larger number of references are more likely to be on hot topics. Note that, for larger numbers of references, the values of α in recent years are larger than those in earlier years, suggesting that scientists trace hot fields somewhat more strongly in recent years than in earlier times. Therefore, the hypothesis is reasonable overall.

Figure 4

The preferential attachment exponents α with respect to the number of references N_Ref.

Clearly, the exponents α increase with the number of references. For larger numbers of references, the value of α is somewhat larger in recent years than in early years.

Discussion

We have found that, in the fields of science covered by papers in the APS Physical Review journals, hot fields attract more newly published papers; however, scientific work is less hotness-driven than the other systems we compared it with. Among the major countries, tracing hotness is more pronounced in China than elsewhere, which could be related to the fact that China is still a developing country in terms of scientific research. We also found that papers with more authors, more affiliations and more references were on subjects that were more hotness-driven. These observations could be valuable to policy makers who fund scientific projects.

Here, only data from physics publications were analyzed. A cross-field comparison would be interesting, particularly if similar studies can be done on mathematics and on social sciences such as economics. Similar studies can also be applied to producing, selling or buying products: when a product is manufactured, sold or bought, one can ask how strongly this depends on the number of units of that product already manufactured, sold or bought. Such studies could be valuable for marketing research.

It might also be interesting to determine how often outstanding or important works are in hot fields when they are first published. Fields become hot, gradually or suddenly, after major breakthroughs are presented in a few pioneering papers, which might later be awarded or honoured. Using records of awards such as the Nobel Prize, the Fields Medal and the Turing Award, it would be possible to perform a statistical analysis of papers with major breakthroughs. All the award-winning papers could be collected and the same analysis used to determine α for them, which could then be compared with the overall α.

The above investigation can be performed not only at the macro level of the countries of authorship but also at the mesoscopic level of the affiliations of authorship. In this way, one might be able to compare tradition, culture and research quality at various levels of academic units. In principle, one could also collect all published papers of one author, treat them as a subset and study the preferential attachment phenomenon within it, provided the collection is large enough. It would then be possible to use this to measure the creativity and originality of a researcher.

Methods

Data description and notations

Our data set is a collection of all papers published in the American Physical Society (APS) Physical Review journals from 1976 to 2009. Each paper is recorded as a data entry, which includes its title, date of publication, classification numbers according to the AIP Physics and Astronomy Classification Scheme (PACS), author(s), affiliation(s) and reference(s) to other papers within the data set. The entire data set contains more than 320,000 entries of various types, such as article, brief report, rapid communication, comment, reply, erratum, essay, announcement and editorial. Here, we consider only research papers (article, brief report and rapid communication) with PACS numbers and refer to this subset as the PR-PACS data set. In the end, we have N = 315,082 papers, each denoted by i, and M = 5,472 PACS numbers, each denoted by λ.

We use PACS, the established hierarchical classification system of physics, to identify the fields within physics. The investigated APS-PACS system is a bipartite network with two kinds of nodes, papers and PACS, and one kind of link, the relation between a paper and a PACS, which is assigned by the researchers. Mathematically, one matrix characterizes the relation between papers and PACS: A is an N × M adjacency matrix with element $a_{i\lambda} = 1$ if paper i belongs to PACS λ (in other words, PACS λ appears in paper i) and 0 otherwise. In general, one paper can belong to several fields (have several PACS), so there are a total of 900,832 links between papers and PACS, i.e., $E = \sum_{i}\sum_{\lambda} a_{i\lambda} = 900{,}832$. The size of a field, $k_\lambda = \sum_{i} a_{i\lambda}$, is defined as the number of papers that belong to it and describes the popularity (hotness) of the field. We can then calculate the number N(k) of fields with size k. Based on the PR-PACS data, our main concern is whether new papers tend to attach to hot fields (with larger size k), i.e., whether new links preferentially attach to nodes with larger degree.
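The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the toy `papers` dictionary, the PACS codes in it, and the helper names `field_sizes` and `size_distribution` are all hypothetical, and the adjacency matrix is stored implicitly as a mapping from each paper to its PACS codes.

```python
from collections import Counter

# Hypothetical toy records: paper id -> list of PACS codes (not real APS data).
papers = {
    "p1": ["05.45.-a", "89.75.Hc"],
    "p2": ["89.75.Hc"],
    "p3": ["89.75.Hc", "03.67.-a"],
    "p4": ["05.45.-a"],
}

def field_sizes(paper_to_pacs):
    """Return the field sizes: the number of papers carrying each PACS code."""
    sizes = Counter()
    for codes in paper_to_pacs.values():
        # set() guards against a code listed twice on the same paper
        sizes.update(set(codes))
    return sizes

def size_distribution(sizes):
    """Return N(k): the number of fields that contain exactly k papers."""
    return Counter(sizes.values())

k = field_sizes(papers)          # size of every field
links = sum(k.values())          # E, the total number of paper-PACS links
N_of_k = size_distribution(k)    # N(k)
print(dict(k), links, dict(N_of_k))
```

Storing only the nonzero entries keeps memory proportional to the number of links rather than to N × M, which matters for a matrix of the size quoted above.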

Measuring preferential attachment

Here, we explain our method for testing preferential attachment on a temporal data set. The basic idea is to investigate whether new links are more likely to attach to nodes with larger degree (size). We calculate the empirical value of the relative probability T(k) that a new paper published within a short period Δt connects to a field that has size k before time t (Ref. 11) as follows. The corresponding time-dependent absolute probability P_k(t) that a new paper is published in a field with size k is proportional to T(k) n_k(t)/N(t), where n_k(t) is the number of fields with size k and N(t) is the number of fields immediately before time t. T(k) can therefore be estimated by making a histogram of the sizes k of the fields to which each paper is added within the time period Δt, in which each sample is weighted by a factor of N(t)/n_k(t),

$$T(k) = \frac{N(t)}{n_k(t)} \sum_{i \in \Delta t} \; \sum_{\lambda:\, k_\lambda(t) = k} a_{i\lambda},$$

where the outer sum runs over the papers i published within the period Δt and k_λ(t) = k means that the field to which these papers belong has size k at time t. We now have the empirical curve T(k) from the above statistical analysis. In order to compare T(k) conveniently across different times t, T(k) is normalized to give T′(k) (Refs. 12,13).
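As a rough sketch of the weighted histogram just described, the following Python function accumulates a weight N(t)/n_k(t) for every link that a new paper makes to a field of size k. The function name, the toy inputs and the decision to skip fields that did not exist before time t are assumptions made for illustration; the source does not say how brand-new fields are handled.

```python
from collections import Counter

def attachment_histogram(sizes_before, new_papers):
    """Estimate the unnormalized T(k) for one time window.

    sizes_before: dict mapping each field to its size k immediately before time t
    new_papers:   list of field lists, one per paper published within the window
    """
    n_k = Counter(sizes_before.values())   # n_k(t): number of fields with size k
    n_fields = len(sizes_before)           # N(t): number of fields before time t
    T = Counter()
    for fields in new_papers:
        for field in fields:
            if field not in sizes_before:
                continue  # assumption: a field first seen in this window is skipped
            size = sizes_before[field]
            T[size] += n_fields / n_k[size]   # each sample weighted by N(t)/n_k(t)
    return dict(T)

# Hypothetical toy window: two fields of size 1, one field of size 4.
sizes_before = {"A": 1, "B": 1, "C": 4}
new_papers = [["A", "C"], ["C"], ["C", "D"]]
T = attachment_histogram(sizes_before, new_papers)
print(T)
```

The weighting compensates for the fact that small fields are far more numerous than large ones, so that T(k) reflects the attractiveness of a single field of size k rather than of all such fields together.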

The preferential attachment hypothesis states that the rate T′(k) with which a node with k links acquires new links is a monotonically increasing function of k (Ref. 1), namely

$$T'(k) \sim k^{\alpha}.$$

For the BA model, α = 1 (Ref. 1). To obtain a smooth curve from noisy data, we take the cumulative function form instead of T′(k):

$$\kappa(k) = \int_{0}^{k} T'(k')\, \mathrm{d}k'.$$

Thus, κ(k) should be proportional to k^{α+1}. We can now fit the empirical curve from the previous statistical analysis and compare it against this hypothesized curve of preferential attachment. This is the general procedure for all the analyses presented in this work. The method has been used extensively to measure preferential attachment in different kinds of networks4,5,6,11,12,13,14,15,16,17,18,19.
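The fitting step can be illustrated with a short, dependency-free Python sketch. It assumes that the cumulative function is approximated by a running sum over the observed sizes and that α is read off as the least-squares slope of log κ against log k, minus one, over a chosen range (k = 1 to 300 is the range quoted in the Results). The function names and the synthetic input are hypothetical.

```python
import math

def cumulative(T):
    """Running sum of T over increasing k, a discrete stand-in for kappa(k)."""
    kappa, total = {}, 0.0
    for k in sorted(T):
        total += T[k]
        kappa[k] = total
    return kappa

def fit_alpha(kappa, k_min=1, k_max=300):
    """Least-squares slope of log(kappa) versus log(k) on [k_min, k_max], minus 1."""
    pts = [(math.log(k), math.log(v))
           for k, v in kappa.items() if k_min <= k <= k_max and v > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    return sxy / sxx - 1.0

# Synthetic check: an exact power law kappa(k) = k^1.8 should give alpha = 0.8.
synthetic_kappa = {k: k ** 1.8 for k in range(1, 301)}
alpha = fit_alpha(synthetic_kappa)
print(round(alpha, 6))
```

With real data the running sum only approximates the integral, so the fitted slope at small k can deviate slightly from α + 1 even for a perfect power law in T′(k).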

To test whether the preferential attachment of scientific research differs between countries, we separated the entire data set according to the country of the first author's affiliation and then compared the countries that contribute most (USA, Germany, Japan, China). With the separated data set, we examine preferential attachment by counting only the papers from authors in country c:

$$T_c(k) = \frac{N(t)}{n_k(t)} \sum_{\substack{i \in \Delta t \\ \mathrm{aff}_i = c}} \; \sum_{\lambda:\, k_\lambda(t) = k} a_{i\lambda}.$$

Here aff_i = c means that the principal affiliation of paper i is in country c. In counting k and n_k(t), we included papers from all countries, meaning that scientists from all countries face the temptation of tracing the same overall hotness in the entire PR-PACS data set. Besides countries, the above calculation can be applied in the same way to any other feature of the papers, such as the number of authors, affiliations or references.

Measuring relative contribution ratio Rc (k)

The absolute contribution of a country is measured simply by the percentage of published papers from that country out of the total number of published papers,

$$r_c = \frac{\sum_{k} m_k^{c}}{\sum_{c'} \sum_{k} m_k^{c'}},$$

where $m_k^{c}$ is the number of papers from country c belonging to fields with size k. Here we present a more detailed breakdown of this absolute contribution: for each field size, we take the percentage of papers from that country out of all papers in fields of that size and then normalize it by the absolute contribution of that country,

$$R_c(k) = \frac{1}{r_c}\, \frac{m_k^{c}}{\sum_{c'} m_k^{c'}}.$$

This is a static measure, so it is easy to perform. In a sense it also describes how often scientists in that country are pursuing hot fields.
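A minimal Python sketch of this static measure follows. It rests on one reading of the text that should be treated as an assumption: papers are counted once per field they belong to, the absolute contribution of a country is its share of all such counts, and the relative ratio at size k is the country's share among fields of size k divided by that absolute contribution. The function name and the toy records are hypothetical.

```python
from collections import defaultdict

def relative_contribution(records):
    """Compute absolute shares and relative contribution ratios.

    records: iterable of (country, k) pairs, one per paper-field link,
             where k is the size of the field the paper belongs to.
    Returns (r, R) with r[c] the absolute share and R[c][k] the relative ratio.
    """
    m = defaultdict(lambda: defaultdict(int))   # m[c][k]: count for country c at size k
    per_k = defaultdict(int)                    # count over all countries at size k
    total = 0
    for country, k in records:
        m[country][k] += 1
        per_k[k] += 1
        total += 1
    r = {c: sum(mk.values()) / total for c, mk in m.items()}
    R = {c: {k: (mk[k] / per_k[k]) / r[c] for k in mk} for c, mk in m.items()}
    return r, R

# Hypothetical toy data: country X favours small fields, country Y large ones.
records = [("X", 1), ("X", 1), ("X", 10),
           ("Y", 1), ("Y", 10), ("Y", 10), ("Y", 10), ("Y", 10)]
r, R = relative_contribution(records)
print(r, R)
```

A ratio above 1 at large k and below 1 at small k, as for country Y in this toy example, corresponds to the pattern the Results describe for hot-field followers.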