Introduction

The strength of most interactions in nature typically decreases with the distance between objects or constituents. The most famous example is Newton’s gravitational force, which is known to decay with the square of the distance between the masses. This principle also holds outside the realm of physical processes. Recent studies on mobile phone communication networks1,2 and blogs3 have revealed that the probability for a social tie to occur between agents decays as a power of their distance.

Likewise, scientific interactions are likely to take place between scholars located in the same or nearby areas. Scientists tend to cluster in space, since the elaboration and progress of a project require frequent discussions between collaborators, which are hardly possible if they live far apart. Cultural, linguistic and institutional differences create additional obstacles to long-distance cooperation4. Further, research funding is mostly allocated at the national level5, thus favoring regional over international collaborations.

Nowadays, the Internet and the greater affordability of international transportation have enormously reduced distances between people, helping to overcome both geographic and cultural barriers6,7,8. This in turn has made scientific collaborations between distant scholars far easier than before9,10,11,12,13,14. Nevertheless, the role of geography in the creation and recognition of scientific output is not yet fully understood. For example, how do scientific interactions depend on distance? Is collaboration concentrated within the perimeter of a university, a city or a country, as it used to be in the past, or has it become truly international, possibly due to modern information and communication technologies?

Multi-authored collaborations offer a major opportunity for science15, as they integrate a wide range of competences and skills to attack difficult problems with an enhanced chance of success. Indeed, the last decades have witnessed the formation of ever larger research teams16,17. In particular, multi-university collaborations have been growing at a fast pace and are more likely to lead to high-impact publications18, especially if they involve different countries19,20. On the other hand, there is also evidence of decreasing returns to team size, likely due to management inefficiencies, which limit the productivity gains from collaboration21.

Geographic proximity is also likely to affect the process of giving and receiving credit for someone’s work, as expressed by paper citations. For most papers one expects the probability of citation to decay with distance, as new findings are typically more visible in the area where the authors operate. This is confirmed by a recent study22. In addition, collaboration patterns are likely to influence, and be influenced by, citations. While collaborating, scholars become more familiar with the scientific output of their co-authors, which then has a higher chance of being cited in the future. In turn, scholars who frequently cite each other’s work have strongly overlapping research interests and are more likely to become co-authors sooner or later. Therefore citations and collaborations between distinct locations are likely to be correlated. It is therefore crucial to assess how collaborative patterns affect citation flows, in order to disentangle the actual impact of a publication (and, therefore, its merit) from credit coming through social networking. A geographic analysis of citation flows between cities is also useful to understand how quickly a new result gets recognized by the scientific community in different geographical areas, which may help to uncover how new scientific paradigms spread and become established23.

Knowing how scientific interactions vary with distance is also valuable for practical reasons. To scholars, it might suggest how to choose collaborators in order to optimize the impact and visibility of their research. To institutions and governments, it might guide the allocation of funds between regional and international projects, in order to improve the scientific outcome for a given amount of resources. It is then not surprising that spatial scientometrics has acquired a prominent role during the last few years, with a number of studies exploiting the enhanced availability of citation data24. Yet other factors, notably funding, also play a crucial role in the development of a research project: funding not only contributes towards the direct and overhead costs of research but also facilitates cooperation and collaboration among researchers working in different locations and different fields25. Since both public and industrial resources are used to fund academic research, it is also natural to ask what results and impact are obtained with these resources26,27.

We have performed the first comprehensive study of citation and collaborative interactions between different geographic locations. We used one of the world’s largest citation databases to derive the citation and the collaboration networks, i.e. weighted networks where nodes are cities and links are citations and collaborations between the corresponding cities (see Methods). The analysis of these networks28,29,30,31 reveals the existence of gravity laws as well as non-trivial correlations between collaborations and citations. Finally, we explore the importance of research and development funding in promoting high-quality science, by studying the relationship between national expenditure, the number of publications and their impact in terms of number of citations for different countries.

Results

The research contribution of each country in terms of the (normalized) number of citations received, NCite, is illustrated in the world map of Fig. 1A. Colored maps can be misleading, because large areas visually dominate regardless of the value they represent. We therefore created a cartogram, in which geographic regions are deformed and rescaled in proportion to their relative research contribution32. The citation strengths of countries span seven orders of magnitude. North America and Europe receive 42.3% and 35.3% of the world’s citations, respectively. In contrast, Asia accounts for only 17.7% of the world’s citations, while the combined contribution of Africa, South America and Oceania is lower than 5%. In this ranking the United States is the leading country, followed by the United Kingdom, Germany, Japan and China. The corresponding world map in terms of countries’ (normalized) number of publications is shown in Supplementary Fig. S1 online. This heterogeneity suggests that a small number of countries make a substantial contribution to research, while the contribution of the rest is negligible. In Supplementary Fig. S2 online we report the results for the average number of citations of each country.

Figure 1

Properties of the world citation network.

(A) Citation map of the world where the area of each country is scaled and deformed according to the number of citations received, which is also represented by the color of each country. (B) Citation distribution of papers of the top 20 countries. If a paper is written by authors from multiple countries, the paper contributes to each country. (C) When the distributions in (B) are normalized by the average number of citations of each country, they fall on top of each other. (D) Probability distribution function of the number of citations received by each city. (E) Cumulative distribution function of the link weights $w_{ij}$ (excluding self-links) and self-links $w_{ii}$ in the citation network of cities. (F) Node in-strength against its in-degree for the city citation network. (G) Link weight against the product of the strengths of the connected nodes in the city citation network. For each plot we show the corresponding best-fit lines and power law exponents.

To assess the quality of papers published by different countries, we consider the number of citations of each paper associated with that country. In Fig. 1B we plot the probability distribution of the number of citations of papers in the 20 largest countries. A paper is associated with a country if at least one of its affiliations is from that country. All these distributions are broad and span four orders of magnitude. When each distribution is rescaled by the average number of citations of papers of the respective country, all curves collapse onto a single curve (Fig. 1C). This result suggests that the functional form of the citation distribution is the same in each country and that the differences between countries can be effectively summarized by the average number of citations. This type of universality holds at the level of scientific disciplines as well33.
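The rescaling behind Fig. 1C can be illustrated with a short Python sketch. It assumes, hypothetically, that each paper is given as a set of country codes together with its citation count; the function name is ours, not the authors’. Each citation count is divided by the mean of its country, and a multi-country paper is counted once for every country, as in Fig. 1B.

```python
from collections import defaultdict

def rescale_by_country_mean(paper_countries, citations):
    """Return {country: [c / <c>_country, ...]} for the papers of each country.

    paper_countries: list of sets, the distinct countries in each paper's affiliations.
    citations: list of citation counts, aligned with paper_countries.
    A paper contributes once to every country in its affiliations.
    """
    per_country = defaultdict(list)
    for countries, c in zip(paper_countries, citations):
        for country in countries:
            per_country[country].append(c)
    rescaled = {}
    for country, counts in per_country.items():
        mean = sum(counts) / len(counts)
        # A country whose papers are never cited cannot be rescaled; skip it.
        if mean > 0:
            rescaled[country] = [c / mean for c in counts]
    return rescaled
```

Histogramming the rescaled lists of different countries on common logarithmic bins would then show whether they collapse onto one curve.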

Next we consider the contribution at the level of cities. In Fig. 1D we plot the probability distribution of the cities’ citations. The distribution is broad, spanning five orders of magnitude, and follows a power law decay with exponent 1.46 ± 0.03. This suggests a relationship with city population, as the city size distribution obeys Zipf’s law34,35, i.e. it decays as a power law (with exponent 2). The observed power law scaling might point to a self-organization phenomenon driven by agglomeration benefits in science. These advantages can be due to the ease of collaboration between groups working in similar fields, shared infrastructure and support, etc., which lead to efficient integration and transfer of information.
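The text does not state how the exponent 1.46 was fitted. As one possible approach, the Python sketch below uses the standard continuous maximum-likelihood estimator for a power-law tail, $\hat\alpha = 1 + n\,[\sum_i \ln(x_i/x_{\min})]^{-1}$, with standard error $(\hat\alpha - 1)/\sqrt{n}$. This is a swapped-in technique, and the lower cutoff x_min is an assumption that must be chosen separately. The helper `sample_powerlaw` is hypothetical and only generates synthetic data to check the estimator.

```python
import math
import random

def powerlaw_mle(values, x_min):
    """Continuous MLE of alpha for p(x) ~ x**(-alpha), x >= x_min.

    Returns (alpha_hat, standard_error); values below x_min are ignored.
    """
    tail = [x for x in values if x >= x_min]
    n = len(tail)
    if n == 0:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; alpha is undefined")
    alpha = 1.0 + n / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(n)

def sample_powerlaw(alpha, x_min, n, seed=0):
    """Hypothetical test helper: n draws from p(x) ~ x**(-alpha) by inverse transform."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], so every draw is >= x_min and finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

On real city data, the estimate depends on x_min, so the cutoff would normally be chosen by a goodness-of-fit criterion rather than fixed by hand.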

We now consider the weighted citation network between cities, where the nodes are cities connected by weighted, directed links indicating publications of one city citing publications of another. The network has 18,199 nodes and 9,494,021 links, including 14,447 self-links (i.e., citations within the same city). In Fig. 1E we plot the cumulative distributions of the weights of self-links and of links between different nodes. Both distributions are broad; however, the weights of self-links are more heterogeneous, revealing a bias towards self-citations. Next we calculate the number of incoming links, i.e., the in-degree $k_i^{\mathrm{in}}$ of each node i, and its in-strength $s_i^{\mathrm{in}} = \sum_j w_{ji}$, which equals the number of (normalized) citations received. By plotting the in-strength against the in-degree, we find a power law scaling behavior $\langle s^{\mathrm{in}} \rangle(k^{\mathrm{in}}) \propto (k^{\mathrm{in}})^{\alpha}$ (Fig. 1F). However, there are two distinct scaling regimes: for nodes with small $k^{\mathrm{in}}$ (< 200) the exponent is α = 0.91 ± 0.03 (regression coefficient ± standard error of the estimate R = 0.95 ± 0.01), while for large $k^{\mathrm{in}}$ (≥ 200) the exponent is α = 2.20 ± 0.08 (R = 2.01 ± 0.01). The super-linear behavior suggests that stronger links are more frequently connected to high in-degree nodes. The out-strength of the nodes follows a similar relationship with their out-degree (see Supplementary Fig. S1 online). Finally, we plot the weights of the links against the product $s_i^{\mathrm{out}} s_j^{\mathrm{in}}$ of the out-strength of the citing node and the in-strength of the cited node (Fig. 1G). This product is proportional to the weight of the link between i and j that would be expected by chance if papers cited each other at random. Even in this case there are two distinct scaling regimes, $w_{ij} \propto (s_i^{\mathrm{out}} s_j^{\mathrm{in}})^{\alpha}$, where α = 0.13 ± 0.01 (R = 0.19 ± 0.0003) if the product is less than $2 \times 10^{7}$, while for larger values of the product α = 0.99 ± 0.01 (R = 1.07 ± 0.001). This suggests that the observed citation flow is as expected between high-strength nodes, while it is much lower between cities with low strength.
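A minimal Python sketch of the in-degree and in-strength computation and of the two-regime fit, assuming the city citation network is given as a list of directed (citing, cited, weight) triples with one entry per link. The function names are hypothetical; the fit is ordinary least squares on log-transformed averages, the break point of 200 is taken from the text, and excluding self-links from the in-degree is our assumption.

```python
import math
from collections import defaultdict

def in_degree_and_strength(weighted_edges):
    """Return (k_in, s_in) dicts from directed (i, j, w) triples, i citing j.

    Self-links (i == j) are skipped here, an assumed convention.
    """
    k_in = defaultdict(int)
    s_in = defaultdict(float)
    for i, j, w in weighted_edges:
        if i == j:
            continue
        k_in[j] += 1   # one more incoming link (edges assumed unique per pair)
        s_in[j] += w   # in-strength: sum of incoming weights
    return dict(k_in), dict(s_in)

def loglog_slope(xs, ys):
    """OLS slope of log(y) against log(x); all values must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    if len(set(lx)) < 2:
        raise ValueError("need at least two distinct x values")
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

def regime_slopes(k_in, s_in, k_break=200):
    """Fit <s_in>(k_in) ~ k_in**alpha separately below and at/above k_break."""
    by_k = defaultdict(list)
    for node, k in k_in.items():
        by_k[k].append(s_in[node])
    avg = {k: sum(v) / len(v) for k, v in by_k.items()}  # <s_in> at each degree
    low = sorted(k for k in avg if k < k_break)
    high = sorted(k for k in avg if k >= k_break)
    return (loglog_slope(low, [avg[k] for k in low]),
            loglog_slope(high, [avg[k] for k in high]))
```

The same `loglog_slope` helper applies to the out-strength versus out-degree relation by swapping the roles of i and j.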

Let us now consider the collaboration network at the city level, where the nodes are cities and weighted undirected links indicate the presence and frequency of collaborations between scholars of different cities. There are 18,199 nodes in the network and 1,256,718 undirected links including 14,954 self-links. The weight of the self-links indicates the amount of internal collaboration. The degree of a node i indicates the number of other cities with which i collaborates and its strength is indicative of, but not coincident with, the number of papers written by scholars of institutions in that city.

In Fig. 2A we plot the cumulative probability distribution of link weights. As for citations, the weights of self-links are more broadly distributed than the weights of the links between different cities, showing that scholars of a city collaborate more frequently with each other than with colleagues from any other city. The distributions of collaboration and citation streams between cities differ from their analogues in mobile phone communications and world trade, which show log-normal distributions2,36. Next, we consider the fraction of internal collaboration, given by the ratio $w_{ii}/s_i$ of the weight of the self-link to the strength of the node. By plotting $w_{ii}/s_i$ against the strength $s_i$, we see that the ratio increases with $s_i$, indicating that as city size increases, most of its collaborations take place within the city (Fig. 2B). Conversely, for small cities most papers are written with external collaborators. The node strength scales with its degree as $\langle s \rangle(k) \propto k^{\alpha}$, where α = 1.66 ± 0.04 (R = 1.65 ± 0.01) (Fig. 2C). This super-linear scaling suggests that higher-degree nodes are more frequently connected by stronger links.
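The internal-collaboration ratio $w_{ii}/s_i$, the strength and the degree of each city can be computed from an undirected edge list, as in the following Python sketch (hypothetical names). How a self-link enters the strength is a convention the text does not specify; here it is counted once, which is an assumption.

```python
from collections import defaultdict

def internal_collaboration(undirected_edges):
    """Return {city: (s_i, k_i, w_ii / s_i)} from undirected (i, j, w) triples.

    Each undirected link is listed once. The strength s_i sums the weights of
    all links at i, with a self-link (i, i, w) counted once (an assumed
    convention). The degree k_i counts distinct neighbours other than i.
    """
    strength = defaultdict(float)
    self_weight = defaultdict(float)
    neighbours = defaultdict(set)
    for i, j, w in undirected_edges:
        if i == j:
            strength[i] += w
            self_weight[i] += w
        else:
            strength[i] += w
            strength[j] += w
            neighbours[i].add(j)
            neighbours[j].add(i)
    return {
        i: (s, len(neighbours[i]), self_weight[i] / s)
        for i, s in strength.items()
        if s > 0
    }
```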

Figure 2

Properties of the world collaboration network.

(A) Cumulative probability distribution of the link weights in the collaboration network of cities. Self-links are shown separately. (B) Fraction of internal collaboration, indicated by the ratio $w_{ii}/s_i$ of the weight of the self-link and the strength $s_i$ of a node, against $s_i$. (C) Strength of a node against its degree. The straight line indicates a power law behavior with exponent 1.66 ± 0.04. In these plots we use the same colorbar as in Fig. 1.

Let us explore the relationship between the citation and the collaboration networks at both the country and the city level. At the country level the collaboration network comprises 226 nodes and 10,308 undirected links, including 219 self-links. The citation network also has 226 nodes, but 28,869 directed links, including 215 self-links. In Fig. 3, we plot the weight $w_{ij}^{\mathrm{coll}}$ of links of the collaboration network against the weight $w_{ij}^{\mathrm{cit}}$ of the same links in the citation network. We find the scaling $w_{ij}^{\mathrm{coll}} \propto (w_{ij}^{\mathrm{cit}})^{\alpha}$, where α = 1.04 ± 0.01 (R = 1.08 ± 0.008) for countries (Fig. 3A) and α = 0.82 ± 0.02 (R = 1.05 ± 0.002) for cities (Fig. 3B), i.e. the amount of collaboration grows approximately linearly with the amount of citations exchanged between the two countries or cities.
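Comparing the two networks requires matching each collaboration link with the corresponding citation link. Because the collaboration network is undirected and the citation network is directed, the Python sketch below assumes that the two citation directions are summed; the text does not say how this was handled. Self-links are excluded, and the exponent is the ordinary least-squares slope on the logarithms. The function name is hypothetical.

```python
import math

def matched_loglog_slope(collab, citation):
    """Slope alpha in w_coll ~ w_cit**alpha over pairs present in both networks.

    collab: {(i, j): w} undirected collaboration weights, one entry per pair.
    citation: {(i, j): w} directed citation weights, i citing j.
    Citation weights are symmetrised as w_ij + w_ji (an assumption).
    """
    sym_cit = {}
    for (i, j), w in citation.items():
        if i != j:
            key = frozenset((i, j))
            sym_cit[key] = sym_cit.get(key, 0.0) + w
    xs, ys = [], []
    for (i, j), w in collab.items():
        key = frozenset((i, j))
        if i != j and w > 0 and sym_cit.get(key, 0.0) > 0:
            xs.append(math.log(sym_cit[key]))
            ys.append(math.log(w))
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct matched citation weights")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```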

Figure 3

Correlation between the world citation and collaboration networks.

Weight of the links in the citation network against the corresponding links in the collaboration network at the (A) country level and (B) city level. Power law scaling is shown by solid lines with exponents 1.04 ± 0.01 and 0.82 ± 0.02, respectively. Density plot of the number of citations of a publication against the number of (C) co-authors, (D) countries and (E) cities in the affiliations. The circles indicate the average trend.

We now consider how the number of citations of a paper depends on the number of coauthors of that paper and on the number of affiliations of its coauthors. It has been previously shown that papers published by teams often get more citations than single-author papers17,18. Our results also show that the average number of citations of a publication increases with its number of co-authors (Fig. 3C). Furthermore, the average number of citations of a publication increases with the number of countries and cities in its authors’ affiliations (Fig. 3D and E). To separate the effect of the number of coauthors from that of the type of collaboration (internal, domestic or international), we grouped papers by their affiliations and number of coauthors. In Table 1, we consider papers with a given number of authors and categorize them according to whether all the affiliations listed in the paper are from a single city, from multiple cities in a single country or from different countries. For an equal number of authors, publications with international affiliations receive a statistically significant increase (p < $10^{-4}$) in the number of citations with respect to publications with only domestic affiliations. Thus, crossing territorial boundaries also pays off in terms of scientific impact. In contrast, multiple domestic affiliations do not increase the number of citations when a publication has fewer than 6 authors.

Table 1 Dependence of citations on collaboration. We categorize each paper by the number of authors and their affiliations. For each of these groups we indicate the fraction of papers in the group and the mean number of citations. The error represents the standard error of the mean, calculated using bootstrap resampling with replacement.
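The grouping used in Table 1 and its bootstrap error bars can be sketched in Python as follows, assuming each paper’s affiliations are available as (city, country) pairs. Function names, the number of resamples and the seed are illustrative choices, not the authors’ settings.

```python
import random

def collaboration_type(affiliations):
    """Classify one paper from its list of (city, country) affiliation pairs.

    'internal': a single city; 'domestic': several cities in one country;
    'international': more than one country.
    """
    countries = {country for _, country in affiliations}
    cities = set(affiliations)
    if len(countries) > 1:
        return "international"
    if len(cities) > 1:
        return "domestic"
    return "internal"

def bootstrap_sem(values, n_resamples=1000, seed=0):
    """Standard error of the mean from bootstrap resampling with replacement."""
    if n_resamples < 2:
        raise ValueError("need at least two resamples")
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    grand = sum(means) / n_resamples
    var = sum((m - grand) ** 2 for m in means) / (n_resamples - 1)
    return var ** 0.5
```

Papers would first be split by number of authors and by `collaboration_type`, and `bootstrap_sem` applied to the citation counts of each group.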

Next we consider the effect of geographical proximity on the citation and collaboration networks by determining the geographic location (latitude and longitude) of each place in the dataset37 (see Methods). We found that the probability that there is a link between two cities in the collaboration network decreases as a power law as the distance between the two cities increases (Fig. 4A). The power law exponent is 0.57 ± 0.01. Our results differ from those obtained in Ref. 38, where it was found that the distribution of distances between co-authors decreases exponentially. This difference might be due to the limited dataset used in Ref. 38, which included only papers published before 1990, and possibly also to recent advances in communication and transportation technologies.
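One way to estimate the probability of a link as a function of distance is to bin all unordered pairs of distinct cities on a logarithmic distance scale and compute the fraction of linked pairs in each bin. The Python sketch below assumes that the distance and link status of every pair are already available; the binning scheme and the function name are hypothetical. With 18,199 cities there are about 1.7 × 10^8 pairs, so a practical implementation would need to stream or vectorize this computation.

```python
import math

def link_probability_by_distance(pair_records, n_bins=10):
    """Fraction of linked city pairs in logarithmic distance bins.

    pair_records: iterable of (d_ij, linked) for every unordered pair of
    distinct cities, with d_ij in km and linked True if the pair shares a link.
    Pairs with d_ij <= 0 are dropped. Returns [(bin_center_km, fraction), ...]
    for the non-empty bins, in order of increasing distance.
    """
    records = [(d, bool(linked)) for d, linked in pair_records if d > 0]
    lo = math.log10(min(d for d, _ in records))
    hi = math.log10(max(d for d, _ in records))
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal distances
    totals = [0] * n_bins
    links = [0] * n_bins
    for d, linked in records:
        b = min(int((math.log10(d) - lo) / width), n_bins - 1)
        totals[b] += 1
        links[b] += linked
    return [
        (10 ** (lo + (b + 0.5) * width), links[b] / totals[b])
        for b in range(n_bins)
        if totals[b]
    ]
```

A power-law exponent such as the 0.57 quoted above would then follow from a straight-line fit of log(fraction) against log(distance) over these points.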

Figure 4

Effect of geographical proximity in the world collaboration and citation networks.

The probability of existence of a link as a function of the distance between two cities in the (A) collaboration network and (B) citation network. Distribution of the ratio of the link weight and the product of the strengths of its endpoints in the (C) collaboration network and (D) citation network, against the distance $d_{ij}$ between the cities. For each distance the average ratio is also shown. The solid lines indicate power law behavior with exponents α = 1.16 ± 0.03 and 0.77 ± 0.02, respectively.

Many spatially embedded networks have been observed to follow gravity laws37, where the flow between two locations follows

$$T_{ij} \propto \frac{P_i P_j}{d_{ij}^{\alpha}} \tag{1}$$

Here, $T_{ij}$ is the flow between nodes i and j, $P_i$ and $P_j$ are the populations of nodes i and j, respectively, and $d_{ij}$ is the geodesic distance between i and j, the value of the exponent α depending on the system. For the collaboration network, with node strengths playing the role of populations, Eq. 1 becomes

$$w_{ij} \propto \frac{s_i s_j}{d_{ij}^{\alpha}} \tag{2}$$

In Fig. 4C, we plot the ratio $w_{ij}/(s_i s_j)$ against the distance $d_{ij}$ for all node pairs. We found that, as the distance increases, $w_{ij}/(s_i s_j)$ decreases as a power law with exponent α = 1.16 ± 0.03 (R = −0.97 ± 0.002), except at very short distances. As we have seen before, collaboration and citation between two places are correlated. Hence, we also look at geographical proximity in the citation network. We found that the probability that there is a link between two cities in the citation network also decreases with distance as a power law (Fig. 4B). In this case the power law exponent is much lower (0.30 ± 0.01). The gravity law for the citation network reads

$$w_{ij} \propto \frac{s_i^{\mathrm{out}} s_j^{\mathrm{in}}}{d_{ij}^{\alpha}} \tag{3}$$

In Fig. 4D we plot $w_{ij}/(s_i^{\mathrm{out}} s_j^{\mathrm{in}})$ against the distance for all node pairs in the citation network. As for the collaboration network, we found that this ratio decreases with distance as a power law, with exponent α = 0.77 ± 0.02 (R = −0.35 ± 0.001). The above analysis shows the existence of an important spatial component in both the citation and the collaboration network: both collaborators and citations typically come from the authors’ spatial neighborhood. Further, long-distance collaborations as well as citations decrease as a power law of distance. The difference between the scaling exponents of the two networks suggests that two distant places are more likely to cite each other than to collaborate. Additional results are shown in Supplementary Fig. S3 online.
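The gravity-law analysis above, which averages the ratio of a link’s weight to the product of its endpoint strengths at each distance and fits a power law, can be sketched in Python as follows. We assume logarithmic distance bins and an arithmetic mean of the ratio within each bin, consistent with the “average ratio” shown in Fig. 4; the bin count and the function name are hypothetical.

```python
import math

def gravity_exponent(records, n_bins=20):
    """Estimate alpha in w_ij / (s_i * s_j) ~ d_ij**(-alpha).

    records: iterable of (d_ij, w_ij, s_i, s_j) for linked pairs i != j, all > 0.
    For the directed citation network pass s_i^out (citing) and s_j^in (cited).
    The ratio is averaged within logarithmic distance bins; alpha is minus the
    OLS slope of log10(average ratio) against the mean log10(distance) per bin.
    """
    pts = [(math.log10(d), w / (si * sj)) for d, w, si, sj in records]
    lo = min(x for x, _ in pts)
    hi = max(x for x, _ in pts)
    width = (hi - lo) / n_bins or 1.0
    sum_x = [0.0] * n_bins
    sum_r = [0.0] * n_bins
    count = [0] * n_bins
    for x, r in pts:
        b = min(int((x - lo) / width), n_bins - 1)
        sum_x[b] += x
        sum_r[b] += r
        count[b] += 1
    xs = [sum_x[b] / count[b] for b in range(n_bins) if count[b]]
    ys = [math.log10(sum_r[b] / count[b]) for b in range(n_bins) if count[b]]
    if len(xs) < 2:
        raise ValueError("need at least two non-empty distance bins")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

Excluding the shortest distances from the fit, where the text notes a deviation from the power law, would amount to filtering `records` by `d_ij` beforehand.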

The research performance of each country is generally estimated on the basis of the number of publications and citations. Although these are straightforward measurements of research output, they depend on a wide spectrum of resources39. For instance, the number of researchers and the facilities (instruments, laboratories, libraries and other resources) available typically differ between countries. A key determinant is the funding available for research and development (R&D). To quantify the R&D expenses of a country we consider the fraction of gross domestic product (GDP) that is spent on R&D. To account for differences in price levels between countries, we express R&D spending in terms of purchasing power parity (PPP). In Fig. 5A, we plot the number of citations NCite against the R&D expenditure and find that it scales linearly with funding. Such a correlation is not surprising, but the value of the scaling exponent is non-trivial. It suggests that a country cannot perform or contribute substantially unless a corresponding amount of funding is available for research. Moreover, the research contribution in terms of citations also scales linearly with the number of researchers in the country (Fig. 5B). This result is consistent with the fact that R&D expenditure is correlated with the number of researchers. The number of publications of a country shows a similar scaling with R&D expenditure and number of researchers (Supplementary Fig. S4 online).

Figure 5

Relation between research outcome and funding.

Number of citations NCite of a country against (A) the expenditure on research and development (in millions of PPP dollars per year) and (B) the number of researchers in that country. The solid lines indicate power law scaling with exponents 0.99 ± 0.03 and 0.98 ± 0.04, respectively. (C) Average number of citations per paper of a country against the average spending per researcher. The horizontal line indicates the average number of citations over all papers of all countries; the vertical line indicates the threshold of about $100,000 per researcher per year.

Finally, as a measure of the impact of a country’s scientific output we consider the average number of citations to the publications of that country. In Fig. 5C we plot this number against the average spending per researcher per year (R&D expenditure divided by the number of researchers). The latter is not the average salary of researchers in that country, as it includes other expenditures such as infrastructure, bureaucracy and instruments. This plot is much more scattered than the previous ones and does not show any definite correlation pattern. To identify groups of countries with similar characteristics we use the k-means clustering technique40. With k = 2, the countries separate into two groups, one with average spending below about $100,000 per researcher per year and the other with average spending above that value (Supplementary Fig. S5 online). Other clustering methods give qualitatively similar results. This separation into two groups, distinguished by the average spending per researcher per year (vertical line in the plot), also reveals another striking feature. If the average spending is less than about $100,000 per researcher per year, the average number of citations increases with spending. However, if the average spending exceeds this limit, the average number of citations becomes scattered and independent of funding. For example, very rich countries such as Kuwait and Luxembourg have high funding per researcher, yet their average number of citations per paper is low. India and Brazil also have high funding per researcher but a low average number of citations; this might mean that they invest more in infrastructure. Switzerland, Costa Rica, Panama, Germany, Austria, the Netherlands and the United States have high spending per researcher and a high average number of citations.
If we display the number of cites per paper averaged over all countries (horizontal line), we see that there are no countries in the top left quadrant, i.e. it is not possible to do better than the world’s average unless there is sufficient spending. Additional measures of a country’s research performance and corresponding rankings are reported in the Supplementary Table S1 online.
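The two-group split can be illustrated with k-means on the one-dimensional spending data. The Python sketch below is a plain implementation of Lloyd’s algorithm with deterministic quantile initialization (our choice); it is not necessarily the implementation used in Ref. 40, and the spending values in the usage lines are invented.

```python
def kmeans_1d(values, k=2, n_iter=100):
    """Lloyd's k-means on one-dimensional data; returns (centroids, labels).

    Centroids start at evenly spaced quantiles of the sorted data (a choice
    made here for determinism), then assignment and mean-update steps
    alternate until the centroids stop changing or n_iter is reached.
    """
    xs = sorted(values)
    n = len(xs)
    if n < k:
        raise ValueError("need at least k values")
    centroids = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


if __name__ == "__main__":
    # Invented spending per researcher, in thousands of PPP dollars per year.
    spending = [20, 30, 40, 150, 200, 250]
    centroids, labels = kmeans_1d(spending)
    low, high = sorted(centroids)
    print(labels, "boundary:", (low + high) / 2)
```

With two clusters, the boundary between the groups lies at the midpoint of the two centroids; for the invented values above the centroids are 30 and 200, so the boundary falls at 115.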

Discussion

Our thorough analysis of the world citation and collaboration networks has revealed that the effects of geography on the dynamics of science are relevant, despite the recent advances in communication and transportation. The occurrence of gravity laws for both citation and collaboration implies a preference by scientists to interact with peers in their geographic areas. However, long-distance interactions are not rare, as the interaction strength and probability are characterized by power law decays. Our work follows similar findings in mobile phone communication1,2, social media3 and international trade41, reinforcing the belief that gravity laws hold in several different contexts and that scientific interactions are not exceptional from this point of view. Thus, the gravity law is a fundamental relationship holding also in human dynamics.

Citation and collaboration streams between distinct locations are strongly correlated, with an approximately linear relation. An increase in the number of collaborations between two cities is then expected to be followed by a proportional increase in the flow of citations between the cities. This can be explained by the fact that people or groups working in similar fields and subject areas are more likely both to cite and to collaborate with each other; it also suggests a natural bias towards self-citation, of which we have provided strong quantitative evidence.

From the point of view of scientific impact, it pays off for a team to bring together several institutions with strong international participation. While part of this effect could be explained by the fact that having people from different locations facilitates the circulation of a work, which then becomes more visible and more likely to be cited, the trend indicates that high-quality work is more likely to be produced through international collaborations. It would be valuable to disentangle the impact due to social networking from that due to the quality of the paper. Our findings pave the way for the first quantitative assessment of this issue. As a consequence, we expect to observe an increasing tendency to form large teams with members from many different countries in the future.

We also disclose a striking effect in the relationship between the national expenditure per researcher and the impact of the scientific output of a country. If the average spending per researcher per year is low, it is impossible for a country to do better than the world average, in terms of the average number of citations per paper. So there is a minimal funding quota that needs to be exceeded if a country wishes to have a scientific output of high average quality. Exceeding the threshold, however, does not guarantee success. This suggests that in science money acts as a kind of threshold motivator: if one does not pay people enough they will not be motivated and the outcomes of the research are poor; if people are paid sufficiently to take the issue of money off the table, internationally competitive findings are within reach. On the other hand, for conceptual and creative tasks, paying more than a certain threshold does not necessarily increase the output42,43,44. Further, our analysis reveals that at the country level funding has a positive linear impact on the research output, both in terms of number of publications and of citations. Thus, it is not possible for a country to increase its research output substantially without a sizeable increase in investments.

In the future we plan to study the role of cities' population, in particular on the distributions of citation and collaboration strengths along with their flows. It is well known that most characteristics of cities are strongly correlated to the size of their populations45. Furthermore, an analysis of the evolution of the world citation and collaboration networks would show how the spatial dimension of science dynamics has been affected by the progress of technology, internationalization and extreme events (e.g. wars, economic crises). This way one could infer how the scientific landscape has been shaping up in the last decades and how it is possible to create more efficient partnerships, via dedicated funding programs at the national and/or international level and consequently a more productive and successful scholarly world.

Methods

Data description

We have analyzed all publications (articles, reviews and editorial comments) written in English from 2003 to the end of 2010 included in the database of the Institute for Scientific Information (ISI) Web of Science. For each publication we extract the affiliations of the authors and the corresponding citations to that publication. We parsed the affiliations of all publications and determined the geographic location at the city and country level. If there are multiple affiliations listed in a publication, the publication is associated with all represented cities and countries. After obtaining the locations we used publicly available resources (www.wikipedia.org and maps.google.com) to determine their coordinates (latitude and longitude). Our dataset consists of 8,094,948 publications which have received 62,105,592 citations during the period 2003–2010. We were able to extract the geographical information from 8,092,314 publications. Affiliations refer to 226 countries and 37,750 cities. To remove anomalies due to misclassification, we have only considered places that appear in at least 5 publications during the period 2003–2010. This cutoff leaves 18,199 cities, producing 99.8% of the total publications and receiving 99.9% of the total citations.

Country-level information on expenditure for research and development (R&D) in terms of purchasing power parity (PPP) and on the number of researchers in R&D is obtained from the World Bank Data (databank.worldbank.org) for each year from 2003 to 2010. By aggregating these yearly datasets we determine the average of each quantity for the period 2003–2010. R&D expenditure data are available for 102 countries and researcher counts for 89 countries; both are available for 77 countries. Further details can be found in the Supplementary Methods online.

Network construction

We have analyzed the data at the country and the city level. As the publications and their affiliations form a bipartite graph, we construct the collaboration network between countries (cities) by projecting it onto the space of affiliations. In this collaboration network individual countries (cities) act as nodes and links between them indicate that they have appeared in the same publication. If a paper is written by authors with n affiliations, we put undirected links between each possible pair of collaborating countries (cities), with every link having weight . The total weight between any pair of nodes is the sum of all the weights over all the publications in the dataset. If there is a single affiliation in a publication then we put a self-link with weight 1.
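The projection described above can be sketched in Python. The sketch assumes, for illustration only, that each of the n(n − 1)/2 affiliation pairs of a paper receives weight 2/[n(n − 1)], so that every paper contributes a total weight of 1, mirroring the 1/(nm) normalization used for citations; this weighting is our assumption and may differ from the authors’ choice. A single-affiliation paper adds a self-link of weight 1, as stated in the text. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def collaboration_network(papers):
    """Project papers onto places; returns {(a, b): weight} with a <= b.

    papers: iterable of lists, each giving the city (or country) of every
    affiliation of one paper; repeated places are allowed and yield self-links.
    ASSUMPTION: each affiliation pair gets weight 2 / (n * (n - 1)).
    """
    weights = defaultdict(float)
    for affs in papers:
        n = len(affs)
        if n == 0:
            continue
        if n == 1:
            weights[(affs[0], affs[0])] += 1.0  # single affiliation: self-link
            continue
        w = 2.0 / (n * (n - 1))  # assumed per-pair weight
        for a, b in combinations(affs, 2):
            weights[tuple(sorted((a, b)))] += w
    return dict(weights)
```

Summing the weights over all papers, as the text describes, happens implicitly because every paper adds to the same dictionary.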

To build the citation network between countries (cities), we start from the citation network of papers, in which two papers are linked if one cites the other. If a paper written by authors with n affiliations cites a paper written by authors with m affiliations, we put n × m directed connections from each of the n citing countries (cities) to each of the m cited countries (cities), every link having weight 1/(nm). The total weight of a directed link between two countries (cities) is the sum of all the weights over all the citations in the dataset. Since there can be multiple affiliations from the same country (city) in a publication, there are self-loops both in the world citation and in the world collaboration networks.
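The weighting rule just described, n × m directed links of weight 1/(nm) per citation, can be sketched in Python as follows. The input format (a list of citing/cited paper identifiers plus a mapping from each paper to the places of its affiliations) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def citation_network(citations, affiliations):
    """Aggregate paper citations into weighted directed links between places.

    citations: iterable of (citing_paper, cited_paper) identifiers.
    affiliations: {paper_id: list of cities (or countries), repeats allowed}.
    Each citation adds weight 1 / (n * m) to every (citing place, cited place)
    link, so it contributes total weight 1; equal places give self-loops.
    """
    weights = defaultdict(float)
    for citing, cited in citations:
        src = affiliations.get(citing, [])
        dst = affiliations.get(cited, [])
        if not src or not dst:
            continue  # skip citations whose papers lack location data
        w = 1.0 / (len(src) * len(dst))
        for a in src:
            for b in dst:
                weights[(a, b)] += w
    return dict(weights)
```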

Great-circle distance

The geodesic, or great-circle, distance is the shortest distance between two points on the Earth, measured along a path on its surface. Given the latitudes and longitudes of two points, we used the Haversine formula to calculate the great-circle distance between them46. In these calculations, we took the Earth’s radius to be 6372.8 km.
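For reference, a minimal Python implementation of the Haversine great-circle distance, using the Earth radius of 6372.8 km given above. Coordinates are assumed to be in decimal degrees; the function and constant names are ours.

```python
import math

EARTH_RADIUS_KM = 6372.8  # value stated in the text

def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against a exceeding 1 through floating-point rounding.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))
```

For example, `haversine_km(36.12, -86.67, 33.94, -118.40)` (Nashville to Los Angeles) gives about 2887.26 km.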