Abstract
Recent science of science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile c_{i} (r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each c_{i} (r) to a common distribution function. Since two scientists with equivalent Hirsch h-index can have significantly different c_{i} (r) profiles, our results demonstrate the utility of the β_{i} scaling parameter in conjunction with h_{i} for quantifying individual publication impact. We show that the total number of citations C_{i} tallied from a scientist's N_{i} papers scales as C_{i} ∼ h_{i}^{1+β_{i}}. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.
Introduction
A scientist's career path is subject to a myriad of decisions and unforeseen events, such as Nobel Prize-worthy discoveries^{1}, that can significantly alter an individual's career trajectory. As a result, the career path can be difficult to analyze since there are potentially many factors (individual, mentor-apprentice, institutional, coauthorship, field)^{2,3,4,5,6,7,8,9} to account for in the statistical analysis of scientific panel data.
The rank-citation profile, c_{i} (r), represents the number of citations of individual i to his/her paper r, ranked in decreasing order c_{i} (1) ≥ c_{i} (2) ≥ … ≥ c_{i} (N), and provides a quantitative synopsis of a given scientist's publication career. Here, we analyze the rank-ordered citation distribution c_{i} (r) for 300 scientists in order to better understand patterns of success and to characterize scientific production at the individual scale using a common framework. The review of scientific achievement for postdoctoral selection, tenure review, and award and academy selection, at all stages of the career, is becoming largely based on quantitative publication impact measures. Hence, understanding quantitative patterns in production is important for developing a transparent and unbiased review system. Interestingly, we observe statistical regularities in c_{i} (r) that are remarkably robust despite the idiosyncratic details of scientific achievement and career evolution. Furthermore, empirical regularities in scientific achievement suggest that there are fundamental social forces governing career progress^{10,11,12,13}.
We group the 300 scientists that we analyze into three sets of 100, referred to as datasets A, B and C, so that we can analyze and compare the complete publication careers of each individual, as well as across the three groups:

[A] 100 high-profile scientists with average h-index 〈h〉 = 61 ± 21. These scientists were selected using the citation shares metric^{9} to quantify cumulative career impact in the journal Physical Review Letters (PRL).

[B] 100 additional “control” scientists with average h-index 〈h〉 = 44 ± 15.

[C] 100 current assistant professors with average h-index 〈h〉 = 14 ± 7. We selected two scientists from each of the top-50 US physics departments (departments ranked according to the magazine U.S. News).
In the methods section we describe in detail the selection procedure for datasets A, B and C, and in Tables S1–S6 we provide summary statistics for each career.
There are many conceivable ways to quantify the impact of a scientist's N_{i} publications. The h-index^{14} is a widely acknowledged single-number measure that serves as a proxy for production and impact simultaneously. The h-index h_{i} of scientist i is defined by a single point on the rank-citation profile c_{i} (r) satisfying the condition
c_{i}(h_{i}) = h_{i}. [1]
To address the shortcomings of the h-index, numerous remedies have been proposed in the bibliometric sciences^{15}. For example, Egghe proposed the g-index, where the most cited g papers cumulate g^{2} citations overall^{16}, and Zhang proposed the e-index, which complements the h and g indices quantitatively^{17}.
To justify the importance of analyzing the entire profile c_{i} (r), consider a scientist i = 1 with rank-citation profile c_{1}(r) ≡ [100, 50, 33, 25, 20, 16, 14, 12, 11, 10, 9…] and a scientist i = 2 with c_{2}(r) ≡ [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9…]. Both scientists have the same h-index value h = 10, although scientist 1 tallies 2.9 times as many citations as scientist 2 from his/her 10 most-cited papers (291 versus 100). Hence, an additional parameter β_{i} is necessary in order to distinguish these two example careers. Specifically, the β_{i} parameter quantifies the scaling slope in c_{i} (r) for the high-rank papers corresponding to small r values. In this simple illustration, β_{1} ≈ 1 while β_{2} ≈ 0.
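The two example careers can be checked numerically. The following is a minimal Python sketch, not the fitting procedure used in this study: the helper names `h_index` and `loglog_slope` are hypothetical, only the eleven citation counts listed above are used (the profiles continue beyond them), and β is estimated here by a plain least-squares slope of ln c(r) against ln r over the top 10 ranks rather than by the full DGBD regression.

```python
import math

def h_index(citations):
    """Number of ranks r whose r-th most cited paper has at least r citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for r, c in enumerate(ranked, start=1) if c >= r)

def loglog_slope(citations, top):
    """Least-squares estimate of beta in c(r) ~ r^(-beta) over ranks 1..top.

    Illustrative assumption: a simple log-log fit, not the DGBD regression.
    """
    ranked = sorted(citations, reverse=True)[:top]
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx

# Only the values printed in the text; the real profiles continue past rank 11.
c1 = [100, 50, 33, 25, 20, 16, 14, 12, 11, 10, 9]
c2 = [10] * 10 + [9]

print(h_index(c1), h_index(c2))        # both equal 10
print(sum(c1[:10]) / sum(c2[:10]))     # 291 / 100 = 2.91
print(loglog_slope(c1, 10), loglog_slope(c2, 10))
```

The two slopes come out close to 1 and 0, which is the contrast that the h-index alone cannot register.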
In Fig. 1 we plot c_{i} (r) for 5 extremely high-impact scientists. The individuals EW, ACG, MLC and PWA are physicists with the largest h_{i} values in our data set; BV is a prolific molecular biologist whom we include in this graphical illustration in order to demonstrate the generality of the statistical regularity we find, which likely exists across disciplines. However, citation and h-index metrics should not be compared across disciplines since baseline publication and citation rates can vary significantly between research fields^{8,9}. To demonstrate how the single point c_{i} (h_{i} ) is an arbitrary point along the c_{i} (r) curve, we also plot the lines H_{p} (r) ≡ p r for 5 values of p = {1, 2, 5, 20, 80}. The value p ≡ 1 recovers the h-index h_{1} = h proposed by Hirsch. The intersection of any given line H_{p} (r) with c_{i} (r) corresponds to the “generalized h-index” h_{p} ,
c_{i}(h_{p}) = p h_{p}, [2]
proposed in^{18} and further analyzed in^{19}, with the relation h_{p} ≤ h_{q} for p > q. Since the value p ≡ 1 is chosen somewhat arbitrarily, we take an alternative approach, which is to quantify the entire c_{i} (r) profile at once (which is also equivalent to knowing the entire h_{p} spectrum). Surprisingly, because we find regularity in the functional form c_{i} (r) for all 300 scientists analyzed, we can relate the relative impact of a scientist's publication career using the small set of parameters that specify the c_{i} (r) profile for the entire set of papers ranging from rank r = 1…N_{i} . Using a much smaller parameter space than the h_{p} spectrum, we can begin to analyze the statistical regularities in the career accomplishments of scientists.
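As an illustration of the generalized h-index, the sketch below computes h_{p} for the five slopes p used in Fig. 1. It is a hedged example rather than the definition used in refs. 18 and 19: the function name `generalized_h` is hypothetical, and for integer ranks the intersection of H_{p}(r) = p r with c(r) is taken, by assumption, to be the largest rank r with c(r) ≥ p r. The input is the illustrative profile c_{1}(r) from the introduction.

```python
def generalized_h(citations, p=1.0):
    """Largest rank r with c(r) >= p * r (discrete reading of c(h_p) = p * h_p)."""
    ranked = sorted(citations, reverse=True)
    h_p = 0
    for r, c in enumerate(ranked, start=1):
        if c >= p * r:
            h_p = r
        else:
            # c(r) is non-increasing and p * r is increasing, so no later rank qualifies.
            break
    return h_p

c1 = [100, 50, 33, 25, 20, 16, 14, 12, 11, 10, 9]
spectrum = [generalized_h(c1, p) for p in (1, 2, 5, 20, 80)]
print(spectrum)  # [10, 7, 4, 2, 1]
```

The values decrease as p grows, in line with the relation h_{p} ≤ h_{q} for p > q.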
The aim of this analysis is not to add another level of scrutiny to the review of scientific careers, but rather, to highlight the regularities across careers and to seed further exploration into the mechanisms that underlie career success. The aim of this brand of quantitative social science is to utilize the vast amount of information available to develop an academic framework that is sustainable, efficient and fruitful. Young scientific careers are like “startup” companies that need appropriate venture funding to support the career trajectory through lows as well as highs^{13}.
Results
A Quantitative Model for c_{i} (r)
For each scientist i, we find that c_{i} (r) can be approximated by a scaling regime for small r values, followed by a truncated scaling regime for large r values. Recently a novel distribution, the discrete generalized beta distribution (DGBD)
c_{i}(r) ≡ A_{i} r^{−β_{i}} (N_{i} + 1 − r)^{γ_{i}}, [3]
has been proposed as a model for rank profiles in the social and natural sciences that exhibit such truncated scaling behavior^{20,21}. The parameters A_{i} , β_{i} , γ_{i} and N_{i} are each defined for a given c_{i} (r) corresponding to an individual scientist i; however, we suppress the index i in some equations to keep the notation concise. We estimate the two scaling parameters β_{i} and γ_{i} using Mathematica software to perform a multiple linear regression of ln c_{i} (r) = ln A_{i} − β_{i} ln r + γ_{i} ln(N_{i} + 1 − r) in the base functions ln r and ln(N_{i} + 1 − r). In our fitting procedure we replace N with r_{1}, the largest value of r for which c(r) ≥ 1 (we find that r_{1}/N_{i} ≈ 0.84 ± 0.01 for careers in datasets A and B). Figs. 1 and 2 demonstrate the utility of the DGBD to represent c_{i} (r), for both large and small r. The regression correlation coefficient R_{i} > 0.97 for all ln c_{i} (r) profiles analyzed.
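The regression of ln c(r) on ln r and ln(r_{1} + 1 − r) can be sketched as follows. This is a minimal pure-Python illustration, not the Mathematica procedure used in the study: the function names `solve_linear` and `fit_dgbd` are hypothetical, the fit uses every rank up to r_{1} (without any subsampling), and the synthetic profile in the usage lines is made up solely to check that known parameters are recovered.

```python
import math

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_dgbd(citations):
    """Least-squares fit of ln c = ln A - beta ln r + gamma ln(r1 + 1 - r).

    Returns (A, beta, gamma, r1), where r1 is the largest rank with c(r) >= 1.
    """
    ranked = sorted(citations, reverse=True)
    r1 = sum(1 for c in ranked if c >= 1)
    rows = [(1.0, math.log(r), math.log(r1 + 1 - r)) for r in range(1, r1 + 1)]
    y = [math.log(ranked[r - 1]) for r in range(1, r1 + 1)]
    # Normal equations (X^T X) w = X^T y for the three regression coefficients.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(3)]
    ln_a, coef_ln_r, coef_ln_tail = solve_linear(xtx, xty)
    return math.exp(ln_a), -coef_ln_r, coef_ln_tail, r1

# Synthetic check with made-up parameters (not data from the study).
n, a0, beta0, gamma0 = 50, 500.0, 0.8, 0.6
synthetic = [a0 * r ** -beta0 * (n + 1 - r) ** gamma0 for r in range(1, n + 1)]
a_fit, beta_fit, gamma_fit, r1 = fit_dgbd(synthetic)
print(a_fit, beta_fit, gamma_fit, r1)
```

Solving the normal equations directly keeps the sketch dependency-free; for real citation data a numerically more robust least-squares routine would be a reasonable substitute.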
The DGBD proposed in^{20} is an improvement over the Zipf law (also called the generalized power-law or Lotka-law^{22}) model and the stretched exponential model^{14} since it reproduces the varying curvature in c_{i} (r) for both small and large r. Typically, an exponential cutoff is imposed in the power-law model and justified as a finite-size effect. The DGBD does not require this assumption, but rather, introduces a second scaling exponent γ_{i} which controls the curvature in c_{i} (r) for large r values. The DGBD has been successfully used to model numerous rank-ordering profiles analyzed in^{20,21} which arise in the natural and socio-economic sciences. The relative values of the β_{i} and γ_{i} exponents are thought to capture two distinct mechanisms that contribute to the evolution of c_{i} (r)^{20,21}. Due to the data limitations in this study, we are not able to study the dynamics in c_{i} (r) through time. Each c_{i} (r) is a “snapshot” in time and so we can only conjecture on the evolution of c_{i} (r) throughout the career. Nevertheless, we believe that there is likely a positive feedback effect between the “heavyweight” papers and “newborn” papers, whereby the reputation of the “heavyweight” papers can increase the exposure and impact the perceived significance of “newborn” papers during their infant phase. Moreover, the 2-regime power-law behavior of c_{i} (r) suggests that the reinforcement dynamics can be quantified by the scale-free parameters β and γ.
The β_{i} value determines the relative change in the c_{i} (r) values for the high-rank papers and thus it can be used to further distinguish the careers of two scientists with the same h-index. In particular, smaller β values characterize flat profiles with relatively low contrast between the high-rank and low-rank regions of any given profile, while larger β values indicate a sharper separation between the two regions.
In Fig. 2(a) we plot c_{i} (r) for each scientist from dataset [A] as well as the average of the 100 individual curves (see Figs. S1 and S2 for analogous plots for datasets [B] and [C]). We find robust power-law scaling
c_{i}(r) ∼ r^{−β_{i}} [4]
for 10^{0} ≤ r ≤ 10^{2}. The scaling value calculated for other rank-size (Zipf) distributions in the social and economic sciences is typically around unity, β ≈ 1, for example in studies of word frequency^{23} and city size^{20,21,24}. Here we calculate β_{i} for each individual author and observe a distribution which is centered around characteristic values 〈β〉 = 0.83 ± 0.23 [A], 〈β〉 = 0.70 ± 0.16 [B], 〈β〉 = 0.79 ± 0.38 [C].
We calculate each β_{i} value using a multilinear least-squares regression of ln c_{i} (r) for 1 ≤ r ≤ r_{1} using the DGBD model defined in Eq. [3]. To properly weight the data points for better regression fit over the entire range, we use only 20 values of c_{i} (r) data points that are equally spaced on the logarithmic scale in the range r ∈ [1, r_{1}]. We elaborate the details of this fitting technique in the methods section. We plot five empirical c_{i} (r) along with their corresponding best-fit DGBD functions in Fig. 1 to demonstrate the goodness of fit for the entire range of r.
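One way to pick ranks that are equally spaced on the logarithmic scale is sketched below. The function name `log_spaced_ranks` is hypothetical, and two details are assumptions rather than choices documented here: the log-spaced positions are rounded to the nearest integer rank, and ranks that coincide after rounding are kept only once, so fewer than 20 distinct ranks may remain when r_{1} is small.

```python
import math

def log_spaced_ranks(r1, n_points=20):
    """Integer ranks in [1, r1] that are approximately equally spaced in ln r."""
    if r1 < 1:
        raise ValueError("r1 must be at least 1")
    if r1 == 1 or n_points < 2:
        return [1]
    step = math.log(r1) / (n_points - 1)
    # Round each log-spaced position to an integer rank and clamp it to [1, r1];
    # the set removes the duplicates that rounding creates at small r.
    ranks = {min(r1, max(1, round(math.exp(k * step)))) for k in range(n_points)}
    return sorted(ranks)

ranks = log_spaced_ranks(100)
print(ranks)
```

Restricting the regression to these ranks prevents the many low-cited papers at large r from dominating the fit.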
In order to demonstrate the common functional form of the DGBD model, we collapse each c_{i} (r) along a universal scaling function c(r′) = 1/r′, by using the rescaled rank values r′ ≡ r^{β_{i}} defined for each curve. In Figs. 2(b), S1(b) and S2(b), we plot the quantity c_{i} (r′) ≡ c_{i} (r)/[A_{i} (r_{1} + 1 − r)^{γ_{i}}], using the best-fit β_{i} , γ_{i} and A_{i} parameter values for each individual c_{i} (r) profile. While the curves in Fig. 2(a) are jumbled and distributed over a large range of c(r) values, the rescaled c_{i} (r) curves in Fig. 2(b) all lie approximately along the predicted curve c(r′) = 1/r′.
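The data collapse can be illustrated with a short sketch. Dividing the DGBD of Eq. [3] by A (r_{1} + 1 − r)^{γ} leaves r^{−β}, so plotting the rescaled citations against r^{β} should give 1/r′; that reading of the rescaled rank is an inference from Eq. [3]. The function name `rescale_profile` and the synthetic parameters are hypothetical.

```python
def rescale_profile(citations, a, beta, gamma, r1):
    """Return (r', c') pairs with r' = r**beta and c' = c(r) / (A * (r1 + 1 - r)**gamma)."""
    ranked = sorted(citations, reverse=True)
    points = []
    for r in range(1, r1 + 1):
        r_rescaled = r ** beta
        c_rescaled = ranked[r - 1] / (a * (r1 + 1 - r) ** gamma)
        points.append((r_rescaled, c_rescaled))
    return points

# Synthetic profile that follows the DGBD exactly (made-up parameters).
a0, beta0, gamma0, n = 500.0, 0.8, 0.6, 50
profile = [a0 * r ** -beta0 * (n + 1 - r) ** gamma0 for r in range(1, n + 1)]
collapsed = rescale_profile(profile, a0, beta0, gamma0, n)
print(max(abs(x * y - 1.0) for x, y in collapsed))
```

For an exact DGBD profile the product r′ c′ equals 1 up to rounding error; for empirical profiles the scatter around 1 indicates how well the fit describes the data.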
Using c_{i} (r) to quantify career production and impact
A main advantage of the h-index is the simplicity with which it is calculated, e.g. ISI Web of Knowledge^{25} readily provides this quantity online for distinct authors. Another strength of the h-index is its stable growth with respect to changes in c_{i} (r) due to time- and information-dependent factors^{26}. Indeed, the h-index is a “fixed-point” of the citation profile. This time stability is evident in the observed growth rates of h for scientists. Average growth rates, calculated here as h/L, where L is the duration in years between a given author's first and most recent paper, typically lie in the range of one to three units per year (this annual growth rate corresponds to the quantity m introduced by Hirsch^{14}). Annual growth rates h/L ≈ 3 correspond to exceptional scientists (for the histogram of P(h/L) see Fig. S3 and for h/L values see the SI text (Tables S1–S6)). As a result, h/L is a good predictor for future achievement along with h^{27}.
It is truly remarkable how a single number, h_{i} , correlates with other measures of impact. Understandably, being just a single number, the h-index cannot fully account for other factors, such as variations in citation standards and coauthorship patterns across disciplines^{28,29,30}, nor can h_{i} incorporate the full information contained in the entire c_{i} (r) profile. As a result, it is widely appreciated that the h-index can underrate the value of the best-cited papers, since once a paper transitions into the region r ≤ h_{i} , its citation record is discounted, until other less-cited papers with r > h_{i} eventually overcome the rank “barrier” r = h_{i} . Moreover, as noted in^{14}, the papers for which r > h_{i} do not contribute any additional credit.
Instead of choosing an arbitrary h_{p} as a productivity-impact indicator, we use the analytic properties of the DGBD to calculate a crossover value r*. In the methods section, we derive an exact expression for r*, which highlights the distinguished papers of a given author. To calculate r*, we use the logarithmic derivative χ(r) ≡ d ln c(r)/dr to quantify the relative change in c_{i} (r) with increasing r. We define papers as “distinguished” if they satisfy the inequality |χ(r)| > |\bar{χ}| in the high-rank regime, where \bar{χ} is the average value of χ(r) over the entire range of r values. This inequality selects the peak papers which are significantly more cited than their neighbors. The peak region corresponds to a “knee” in c_{i} (r) when plotted on log-linear axes. The dependence of r* and c_{i} (r*) on the three DGBD parameters β_{i} , γ_{i} and N_{i} is provided in the methods section.
The advantage of r* is that this characteristic rank value is a comprehensive representation of the stellar papers in the high-rank scaling regime, since it depends on the DGBD parameter values β_{i} , γ_{i} and N_{i} and thus probes the entire citation profile. Fig. 3 shows a scatter plot of the “c-star” c* ≡ c_{i} (r*) and h_{i} values calculated for each scientist and demonstrates that there is a nontrivial relation between these two single-value indices. It also shows that for scientists within a small range of c* there is a large variation in the corresponding h values, in some cases straddling all three sets of scientists. Also, there are several values which significantly deviate from the trend in Fig. 3, which is plotted on log-log axes. These results reflect the fact that the h-index cannot completely incorporate the entire c_{i} (r) profile. We plot the histogram of and values in Figs. S4 and S5, respectively.
To further contrast the values of c* and the h-index, we propose the “peak indicator” ratio Λ_{i} ≡ c_{i} (r*)/h_{i} , which corrects specifically for the h-index penalty on the stellar papers in the peak region of c_{i} (r). Thus, all papers in the peak region of c_{i} (r) satisfy the condition c_{i} (r) ≥ h_{i} Λ_{i} . In an extreme example, R. P. Feynman has a peak value Λ ≈ 36, indicating that his best papers are monumental pillars with respect to his other papers which contribute to his h-index. Fig. S6 shows the histogram of Λ_{i} values, with typical values for dataset [A] scientists 〈Λ〉 ≈ 3.4 ± 3.9 and for dataset [B] scientists 〈Λ〉 ≈ 2.2 ± 1.1. This indicator can only be used to compare scientists with similar h values, since a small h_{i} can result in a large Λ_{i} .
An alternative “single number” indicator is C_{i} , an author's total number of citations
C_{i} = Σ_{r=1}^{N_{i}} c_{i}(r), [5]
which incorporates the entire c_{i} (r) profile. However, it has been shown that √C_{i} correlates well with h_{i} ^{31}, a result which we will demonstrate in Eq. [6] to follow directly from a c_{i} (r) with β_{i} ≈ 1.
We test the aggregate properties of c_{i} (r) by calculating the aggregate number of citations C_{β,h} for a given profile,
C_{β,h} = Σ_{r=1}^{N′} A r^{−β} = A H_{N′,β} ≈ h^{1+β} H_{N′,β}, [6]
where H_{N′,β} ≡ Σ_{r=1}^{N′} r^{−β} is the generalized harmonic number and is of order O(1) for β ≈ 1. We neglect the γ_{i} scaling regime since the low-rank papers do not significantly contribute to an author's C_{i} tally. We approximate the coefficient A in Eq. [6] using the definition c(h) ≡ h, which implies that A/h^{β} ≈ h. We use the value N′ ≡ 3 h, so that C_{β,h} can be approximated by only the two parameters h_{i} and β_{i} for any given author. We justify this choice of N′ by examining the rescaled c_{i} (r/h), which we consider to be negligible beyond rank r = 3 h_{i} for most scientists. In Fig. 4(a), we plot for each scientist the predicted C_{β,h} value versus the empirical C_{i} value and we find excellent agreement with our theoretical prediction given by Eq. [6]. In Fig. 4(b), we plot for each scientist the total number of citations using the best-fit DGBD model c_{m} (r) ≡ c_{i} (r; β_{i} , γ_{i} , A_{i} , r_{1}) to approximate c_{i} (r). The excellent agreement demonstrates that the fluctuations in the residual difference c_{m} (r) − c_{i} (r) cancel out on the aggregate level. Furthermore, a comparison of the quality of agreement between the theoretical C_{i} values and the empirical C_{i} values in Fig. 4(a) and (b) shows the importance of the additional γ_{i} scaling regime in the DGBD model.
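The two-parameter estimate of the total citation count can be written out directly. The sketch below assumes the pure power-law form c(r) = A r^{−β} with A ≈ h^{1+β} and the cutoff N′ = 3h described above; the function names `generalized_harmonic` and `predicted_total_citations` are hypothetical.

```python
def generalized_harmonic(n, beta):
    """Generalized harmonic number H_{n,beta} = sum_{r=1}^{n} r**(-beta)."""
    return sum(r ** -beta for r in range(1, n + 1))

def predicted_total_citations(h, beta):
    """Estimate C_{beta,h} = h**(1 + beta) * H_{N',beta} with N' = 3h."""
    n_prime = 3 * h
    return h ** (1.0 + beta) * generalized_harmonic(n_prime, beta)

# Two authors with the same h = 10 but different scaling exponents.
flat = predicted_total_citations(10, 0.0)    # flat profile: 3 * h**2 = 300
steep = predicted_total_citations(10, 1.0)   # c(r) = 100 / r summed up to r = 30
print(flat, steep)
```

For equal h, the estimate grows with β, which is why β_{i} adds information that h_{i} alone does not carry.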
Discussion
We use the DGBD model to provide an analytic description of c_{i} (r) over the entire range of r and provide a deeper quantitative understanding of scientific impact arising from an author's career publication works. The DGBD model exhibits scaling behavior for both large and small r, where the scaling for small r is quantified by the exponent β_{i} , which, for many of the scientists analyzed, can be approximated using only two values of the generalized h-index h_{p} (see SI text). In particular, we show that for a given h value, a larger β_{i} value corresponds to a more prolific publication career, since C_{i} ∼ h_{i}^{1+β_{i}}.
Many studies analyze only the high-rank values of generic Zipf ranking profiles c(r), e.g. computing the scaling regime for r < r_{c} below some rank cutoff r_{c} . However, these studies cannot quantitatively relate the large observations to the small observations within the system of interest. To account for this shortcoming, our method for calculating the crossover values r*, r_{×} and r^{+}, which we elaborate in the methods section, can be used in general to quantitatively distinguish relatively large observations and relatively small observations within the entire set of observations. Moreover, the DGBD model has been shown to have wide application in quantifying the Zipf rank profiles in various phenomena^{21}.
To measure the upward mobility of a scientist's career, in the SI text we address the question: given that a scientist has index h, what is her/his most likely h-index value Δt years in the future? In consideration of the bulk of c_{i} (r) and following from the regularity of c_{i} (r) for r ≈ h, we propose a model-free gap-index G(Δh) as both an estimate and a target for future achievement which can be used in the review of career advancement. The gap index G(Δh), defined as a proxy for the total number of citations a scientist needs to reach a target value h+Δh, can detect the potential for fast h-index growth by quantifying c_{i} (r) around h. This estimator differs from other estimators for the time-dependent h-index^{33,34,35} in that G(Δh) is model independent.
Even though the productivity of scientists can vary substantially^{9,36,37,38,39} and despite the complexity of success in academia, we find remarkable statistical regularity in the functional form of c_{i} (r) for the scientists analyzed here from the physics community. Recent work in^{8,9,40} calculates the citation distributions of papers from various disciplines and shows that proper normalization of impact measures can allow for comparison across time and discipline. Hence, it is likely that the publication careers of productive scientists in many disciplines obey the statistical regularities observed here for the set of 300 physicists. Towards developing a model for career evolution, it is still unclear how the relative strengths of two contributing factors (i) the extrinsic cumulative advantage effect^{2,3,9} versus (ii) the intrinsic role of the “sacred spark” in combination with intellectual genius^{37} manifest in the parameters of the DGBD model.
With little calculation, the β_{i} metric developed here, used in conjunction with h_{i} , can better answer the question, “How popular are your papers?”^{41}. Since the cumulative impact and productivity of individual scientists are also found to obey statistical laws^{9,11}, it is possible that the competitive nature of scientific advancement can be quantified and utilized in order to monitor career progress. Interestingly, there is strong evidence for a governing mechanism of career progress based on cumulative advantage^{9,11,42} coupled with the inherent talent of an individual, which results in statistical regularities in the career achievements of scientists as well as professional athletes^{11,43,44}. Hence, whenever data are available^{45,46}, finding statistical regularities emerging from human endeavors is a first step towards better understanding the dynamics of human productivity.
Methods
Selection of scientists and data collection
We use disambiguated “distinct author” data from ISI Web of Knowledge. This online database is host to comprehensive data that are well-suited for developing testable models for scientific impact^{9,32,40} and career progress^{11}. In order to approximately control for discipline-specific publication and citation factors, we analyze 300 scientists from the field of physics.
We aggregate all authors who published in Physical Review Letters (PRL) over the 50-year period 1958–2008 into a common dataset. From this dataset, we rank the scientists using the citation shares metric defined in^{9}. This citation shares metric divides equally the total number of citations a paper receives among the n coauthors and also normalizes the total number of citations by a time-dependent factor to account for citation variations across time and discipline.
Hence, for each scientist in the PRL database, we calculate a cumulative number of citation shares received from only their PRL publications. This tally serves as a proxy for his/her scientific impact in all journals. The top 100 scientists according to this citation shares metric comprise dataset [A]. As a control, we also choose 100 other dataset [B] scientists, approximately randomly, from our ranked PRL list. The selection criteria for the control dataset [B] group are that an author must have published between 10 and 50 papers in PRL. This likely ensures that the total publication history, in all journals, be on the order of 100 articles for each author selected. We compare the tenured scientists in datasets A and B with 100 relatively young assistant professors in dataset [C]. To select dataset [C] scientists, we chose two assistant professors from the top 50 U.S. physics and astronomy departments (ranked according to the magazine U.S. News).
For privacy reasons, we provide in the SI tables only the abbreviated initials for each scientist's name (last name initial, first and middle name initial, e.g. L, FM). Upon request we can provide full names.
We downloaded datasets A and B from ISI Web of Science in Jan. 2010 and dataset C from ISI Web of Science in Oct. 2010. We used the “Distinct Author Sets” function provided by ISI in order to increase the likelihood that only papers published by each given author are analyzed. On a case by case basis, we performed further author disambiguation for each author.
Statistical significance tests for the c(r) DGBD model
We test the statistical significance of the DGBD model fit using the χ^{2} test between the 3-parameter best-fit DGBD c_{m} (r) and the empirical c_{i} (r). We calculate the p-value for the χ^{2} distribution with r_{1} − 3 degrees of freedom and find, for each data set, the number of c_{i} (r) with p-value [A], 19 [B], 22 [C] for p_{c} = 0.05 and 8 [A], 22 [B], 37 [C] for p_{c} = 0.01.
The significant number of c_{i} (r) which do not pass the χ^{2} test for p_{c} = 0.05 results from the fact that the DGBD is a scaling function over several orders of magnitude in both r and c_{i} (r) values, and so the residual differences [c_{i} (r) − c_{m} (r)] are not expected to be normally distributed since there is no characteristic scale for scaling functions such as the DGBD. Nevertheless, the fact that so many c_{i} (r) do pass the χ^{2} test at such a high significance level provides evidence for the quality-of-fit of the DGBD model. For comparison, none of the c_{i} (r) pass the χ^{2} test using the power-law model at the p_{c} = 0.05 significance level. In the next section, we will also compare the macroscopic agreement in the total number of citations for each scientist and the total number of citations predicted by the DGBD model for each scientist and find excellent agreement.
Derivation of the characteristic DGBD r values
Here we use the analytic properties of the DGBD defined in Eq. [3] to calculate the special r values from the parameters β, γ and N which locate the two tail regimes of c(z) and, in particular, the distinguished paper regime. The scaling features of the DGBD do not readily convey any characteristic scales which distinguish the two scaling regimes. Instead, we use the properties of ln c_{i} (r) to characterize the crossover between the high-rank and the low-rank regimes of c_{i} (r).
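We begin by considering c_{i} (r) under the centered rank transformation z = r − z_{0}, where z_{0} = (N + 1)/2, then
c(z) = A (z_{0} + z)^{−β} (z_{0} − z)^{γ} [7]
in the domain z ∈ [− (z_{0} − 1), (z_{0} − 1)]. The logarithmic derivative of c(z) expresses the relative change in c(z),
χ(z) ≡ d ln c(z)/dz = −β/(z_{0} + z) − γ/(z_{0} − z) = −(1/z_{0}) [β/(1 + x) + γ/(1 − x)], [8]
where x = z/z_{0}. The extreme values of χ(z) at the endpoints z = ∓(z_{0} − 1) are given by
χ(−(z_{0} − 1)) = −(β + γ/N), χ(z_{0} − 1) = −(γ + β/N), [9]
and the average value is calculated by,
\bar{χ} = [1/(2(z_{0} − 1))] ∫_{−(z_{0} − 1)}^{z_{0} − 1} χ(z) dz = [ln c(z_{0} − 1) − ln c(−(z_{0} − 1))]/(N − 1) = −(β + γ) ln N/(N − 1). [10]
The function χ(z) takes on the value of \bar{χ} twice at the values z^{±} corresponding to the solutions to the quadratic equation,
\bar{χ} z^{2} + (β − γ) z − [(β + γ) z_{0} + \bar{χ} z_{0}^{2}] = 0, [11]
which has the solution
z^{±} = {(β − γ) ± [(β − γ)^{2} + 4 \bar{χ} ((β + γ) z_{0} + \bar{χ} z_{0}^{2})]^{1/2}}/(2|\bar{χ}|) [12]
for \bar{χ} < 0. Converting back to rank, then
r^{±} = z^{±} + z_{0}, [13]
and so the value r* ≡ r^{−} is the special rank value which distinguishes the set of excellent papers of each given author. The c-star value c_{i} (r*) is thus a characteristic value arising from the special analytic properties of c_{i} (r). This method for determining the crossover value r* can be applied to any general rank order profile which can be modeled by the DGBD.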
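The crossover rank r* can also be located numerically, which gives an independent check on any closed-form expression. The sketch below is an illustration under stated assumptions, not the derivation used in this study: it takes χ(r) = −β/r − γ/(N + 1 − r) from the DGBD of Eq. [3], takes the average of χ over r ∈ [1, N] as [ln c(N) − ln c(1)]/(N − 1), and bisects on the interval between r = 1 and the inflection point of ln c(r), where χ is increasing. The function names and the example parameters β = γ = 1, N = 99 are hypothetical.

```python
import math

def chi(r, beta, gamma, n):
    """Logarithmic derivative d ln c / dr of c(r) = A r**(-beta) (n + 1 - r)**gamma."""
    return -beta / r - gamma / (n + 1 - r)

def chi_average(beta, gamma, n):
    """Average of chi over r in [1, n]: (ln c(n) - ln c(1)) / (n - 1)."""
    return -(beta + gamma) * math.log(n) / (n - 1)

def r_star(beta, gamma, n, iterations=200):
    """Smallest rank at which chi(r) equals its average value, found by bisection."""
    if beta <= 0 or gamma < 0 or n < 2:
        raise ValueError("requires beta > 0, gamma >= 0 and n >= 2")
    target = chi_average(beta, gamma, n)
    lo = 1.0
    # Inflection point of ln c(r), where d(chi)/dr = 0; clamped to the last rank.
    hi = min(float(n), (n + 1) / (1.0 + math.sqrt(gamma / beta)))
    if chi(lo, beta, gamma, n) >= target or chi(hi, beta, gamma, n) < target:
        raise ValueError("no crossover between rank 1 and the inflection point")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if chi(mid, beta, gamma, n) < target:
            lo = mid   # still in the steep peak region, move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

r_peak = r_star(1.0, 1.0, 99)
print(r_peak)
```

Papers with rank below the returned value lie in the steep peak region of the profile; comparing this number with h shows how many of the h-index papers belong to that region.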
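Furthermore, the crossover z_{×} between the β scaling regime and the γ scaling regime is calculated from the inflection points of ln c(z),
d^{2} ln c(z)/dz^{2} = β/(z_{0} + z)^{2} − γ/(z_{0} − z)^{2} = 0, [14]
which has 2 solutions z_{×}^{±} = z_{0} (√β ± √γ)/(√β ∓ √γ). Only z_{×}^{−} is a physical solution, since |z_{×}^{+}| > z_{0}. Transforming back to rank values, we find r_{×} = z_{×}^{−} + z_{0} = (N + 1)/(1 + √(γ/β)). We illustrate these special z values in Fig. 5.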
References
Mazloumian, A., Eom, Y.H., Helbing, D., Lozano, S., Fortunato, S. How citation boosts promote scientific paradigm shifts and Nobel prizes. PLoS ONE 6(5), e18975 (2011).
Merton, R. K. The Matthew effect in science. Science 159, 56–63 (1968).
Merton, R. K. The Matthew effect in science, II: Cumulative advantage and the symbolism of intellectual property. ISIS 79, 606–623 (1988).
Cole, J. R. Social Stratification in Science (Chicago, Illinois, The University of Chicago Press, 1981).
Guimera, R., Uzzi, B., Spiro, J., Amaral, L. A. N. Team assembly mechanisms determine collaboration network structure and team performance. Science 308, 697–702 (2005).
Malmgren, R. D., Ottino, J. M., Amaral, L. A. N. The role of mentorship in protégé performance. Nature 463, 622–626 (2010).
Azoulay, P., Zivin, J. S. G., & Wang, J. Superstar Extinction. Q. J. of Econ. 125 (2), 549–589 (2010).
Radicchi, F., Fortunato, S. & Castellano, C. Universality of citation distributions: Toward an objective measure of scientific impact. Proc. Natl. Acad. Sci. USA 105, 17268–17272 (2008).
Petersen, A. M., Wang, F., Stanley, H. E. Methods for measuring the citations and productivity of scientists across time and discipline. Phys. Rev. E 81, 036114 (2010).
Simonton, D. K. Creative productivity: A predictive and explanatory model of career trajectories and landmarks. Psychol. Rev. 104, 66–89 (1997).
Petersen, A. M., Jung, W.–S., Yang, J.–S. & Stanley, H. E. Quantitative and empirical demonstration of the Matthew effect in a study of career longevity. Proc. Natl. Acad. Sci. USA 108, 18–23 (2011).
Wu, J., Lozano, S., Helbing, D. Empirical study of the growth dynamics in real career h-index sequences. J. Informetrics 5, 489–497 (2011). (In press)
Petersen, A. M., Riccaboni, M., Stanley, H. E., Pammolli, F. Persistency and Uncertainty in the Academic Career. (2011). In preparation.
Hirsch, J. E. An index to quantify an individual's scientific research output. Proc. Natl. Acad. Sci. USA 102, 16569–16572 (2005).
Bornmann, L., Mutz, R., Daniel, H.–J. Are there better indices for evaluation purposes than the h Index? A comparison of nine different variants of the h Index using data from biomedicine. JASIST 59, 001–008 (2008).
Egghe, L. Theory and practise of the g-index. Scientometrics 69, 131–152 (2006).
Zhang, C.–T. Relationship of the h-index, g-index and e-index. JASIST 62, 625–628 (2010).
van Eck, J. N., Waltman, L. Generalizing the h- and g-indices. J. Informetrics 2, 263–271 (2008).
Wu, Q. The w-index: A measure to assess scientific impact focusing on widely cited papers. JASIST 61, 609–614 (2010).
Naumis, G. G., Cocho, G. Tail universalities in rank distributions as an algebraic problem: The beta-like function. Physica A 387, 84–96 (2008).
Martinez-Mekler, G., Martinez, R. A., del Rio, M. B., Mansilla, R., Miramontes, P., Cocho, G. Universality of rank-ordering distributions in the arts and sciences. PLoS ONE 4, e4791 (2009).
Egghe, L., Rousseau, R. An informetric model for the Hirsch-index. Scientometrics 69, 121–129 (2006).
Zipf, G. Human Behavior and the Principle of Least Effort (Cambridge, MA, Addison-Wesley, 1949).
Gabaix, X. Zipf's law for cities: An explanation. Q. J. of Econ. 114 (3), 739–767 (1999).
ISI Web of Knowledge: www.isiknowledge.com/
Henzinger, M., Sunol, J., Weber, I. The stability of the h-index. Scientometrics 84, 465–479 (2010).
Hirsch, J. E. Does the h index have predictive power? Proc. Natl. Acad. Sci. USA 104, 19193–19198 (2008).
Batista, P. D., Campiteli, M. G., Martinez, A. S. Is it possible to compare researchers with different scientific interests? Scientometrics 68, 179–189 (2006).
Iglesias, J. E., Pecharromán, C. Scaling the h-index for different scientific ISI fields. Scientometrics 73, 303–320 (2007).
Bornmann, L., Daniel, H.–J. What do we know about the h index? JASIST 58, 1381–1385 (2007).
Redner, S. On the meaning of the h-index. J. Stat. Mech. 2010, L03005 (2010).
Radicchi, F., Fortunato, S., Markines, B., Vespignani, A. Diffusion of scientific credits and the ranking of scientists. Phys. Rev. E 80, 056103 (2009).
Egghe, L. Dynamic h-index: the Hirsch index in function of time. JASIST 58, 452–454 (2006).
Burrell, Q. L. Hirsch's h-index: A stochastic model. J. Informetrics 1, 16–25 (2007).
Guns, R., Rousseau, R. Simulating growth of the h-index. JASIST 60, 410–417 (2009).
Shockley, W. On the statistics of individual variations of productivity in research laboratories. Proc. of the IRE 45, 279–290 (1957).
Allison, A. D., Stewart, J. A. Productivity differences among scientists: Evidence for accumulative advantage. Amer. Soc. Rev. 39(4), 596–606 (1974).
Huber, J. C. Inventive productivity and the statistics of exceedances. Scientometrics 45, 33–53 (1998).
Peterson, G. J., Presse, S., Dill, K. A. Nonuniversal power law scaling in the probability distribution of scientific citations. Proc. Natl. Acad. Sci. USA 107, 16023–16027 (2010).
Radicchi, F., Castellano, C. Rescaling citations of publications in Physics. Phys. Rev. E 83, 046116 (2011).
Redner, S. How popular is your paper? An empirical study of the citation distribution. Eur. Phys J. B 4, 131–134 (1998).
De Solla Price, D. A general theory of bibliometric and other cumulative advantage processes. JASIST 27, 292–306 (1976).
Petersen, A. M., Jung, W.–S. & Stanley, H. E. On the distribution of career longevity and the evolution of home-run prowess in professional baseball. EPL 83, 50010 (2008).
Petersen, A. M., Penner, O. & Stanley, H. E. Methods for detrending success metrics to account for inflationary and deflationary factors. Eur. Phys. J. B 79, 67–78 (2011).
Lazer, D., et al. Computational social science. Science 323, 721–723 (2009).
Castellano, C., Fortunato, S., Loreto, V. Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009).
Redner, S. Citation statistics from 110 years of Physical Review. Phys. Today. 58, 49–54 (2005).
Acknowledgements
We thank J. E. Hirsch and J. Tenenbaum for helpful suggestions.
Author information
Contributions
A. M. P., H. E. S., & S. S. designed research, performed research, wrote, reviewed and approved the manuscript. A. M. P. performed the numerical and statistical analysis of the data.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Electronic supplementary material
Supplementary Information
Supplementary Information Text
Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
About this article
Cite this article
Petersen, A., Stanley, H. & Succi, S. Statistical regularities in the rank-citation profile of scientists. Sci Rep 1, 181 (2011). https://doi.org/10.1038/srep00181
DOI: https://doi.org/10.1038/srep00181