Predicting scientific success

Acuna, Daniel E.; Allesina, Stefano; Kording, Konrad P.

doi:10.1038/489201a

Comment
Published: 12 September 2012

Future impact

Nature volume 489, pages 201–202 (2012)

Daniel E. Acuna, Stefano Allesina and Konrad P. Kording present a formula to estimate the future h-index of life scientists.

We research scientists often worry about the future of our careers. Is our research an exciting path or a dead end that will end our careers prematurely? Predicting scientific trajectories is a daily task for hiring committees, funding agencies and department heads who probe CVs searching for signs of scientific potential.

One popular measure of success is physicist Jorge Hirsch's h-index¹, which captures the quality (citations) and quantity (number) of papers, thus representing scientific achievements better than either factor alone. A scientist has an h-index of n if he or she has published n articles receiving at least n citations each². Einstein, Darwin and Feynman, for example, have impressive h-indices of 96, 63 and 53, respectively. According to Hirsch, an h-index of 12 for a physicist — meaning 12 papers with at least 12 citations each — could qualify him or her for tenure at a major university.
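
The definition above translates directly into a few lines of code. The following is a minimal sketch, not taken from the article; the function name `h_index` and the citation counts are illustrative assumptions.

```python
def h_index(citations):
    """Return the largest n such that n papers have at least n citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's five papers.
print(h_index([10, 8, 5, 4, 3]))  # prints 4
```

Four papers have at least four citations each, but there are not five papers with at least five, so the h-index of this made-up record is 4.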

Credit: ILLUSTRATION BY DAVID PARKINS

However, the h-index³ and similar metrics⁴ can capture only past accomplishments, not future achievements⁵. Here we attempt to predict the future h-index of scientists on the basis of features found in most CVs.

We maintain that the best way of predicting a scientist's future success is for peers to evaluate scientific contributions and research depth, but think that our methods could be valuable complementary tools.

The typical research CV contains information on the number of publications, those in high-profile journals, the h-index and collaborators. One can also infer interdisciplinary breadth, the length and quality of training, the amount of funding received and even the standing of the scientist's PhD adviser. Such factors are taken into account for hiring decisions, but how should they be weighted? Fortunately, obtaining data on the scientific activities of individual researchers has never been easier. Using all of these features, we can begin to probe the scientific enterprise statistically.

Vital statistics

To construct a formula to predict future h-index, we assembled a large data set and analysed it using machine-learning techniques. Our initial sample from academictree.org, a crowd-sourced website listing scientists' mentors, trainees and collaborators, contains the names and institutions of about 34,800 neuroscientists, 2,000 scientists studying the fruit fly Drosophila and 1,300 evolutionary researchers. We matched these authors to records in Scopus, an online database of academic papers and citation data. We restricted our analysis to authors who had accrued an h-index greater than 4 (to exclude inactive scientists); to publications after 1995 (because electronic records are sparse before then); to authors who had published their first manuscript in the past 5–12 years; and to authors who were identifiable in Scopus.

That left us with 3,085 neuroscientists, 57 Drosophila researchers and 151 evolutionary scientists for whom we constructed a history of publication, citation and funding.
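
The inclusion criteria listed above can be read as a simple filter. The sketch below is one possible reading, not the authors' code: the record fields, the function names and the choice of 2012 as the reference year for "the past 5–12 years" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AuthorRecord:
    # Hypothetical record layout; the article does not specify one.
    name: str
    h_index: int
    first_publication_year: int
    matched_in_scopus: bool


def is_eligible(author, reference_year):
    """Apply the author-level criteria: identifiable in Scopus,
    h-index greater than 4, first paper 5 to 12 years before reference_year."""
    career_years = reference_year - author.first_publication_year
    return (
        author.matched_in_scopus
        and author.h_index > 4
        and 5 <= career_years <= 12
    )


def usable_publication_years(years):
    """Keep only publications after 1995, the publication-level criterion."""
    return [year for year in years if year > 1995]


# Made-up example: first paper in 2004, evaluated against an assumed year of 2012.
candidate = AuthorRecord("A. Example", h_index=6,
                         first_publication_year=2004, matched_in_scopus=True)
print(is_eligible(candidate, reference_year=2012))
print(usable_publication_years([1994, 1995, 1996, 2003]))
```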

For each year since the first article published by a given scientist, we used the features that were available at the time to forecast their h-index a number of years into the future. For example, we reconstructed how the CV features of a scientist looked five years after publishing his or her first article, and found a relationship between those features and the reconstructed h-index five years on.
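
Reconstructing a past h-index requires knowing when each citation arrived. The sketch below shows one way to do this under an assumed data layout (each paper as a dictionary with a `published` year and a `cited_in` list of citing years); the layout, the function names and the toy record are illustrative, not the article's.

```python
def h_index(citation_counts):
    """Largest n such that n papers have at least n citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count < rank:
            break
        h = rank
    return h


def h_index_as_of(papers, year):
    """h-index using only papers and citations that existed by `year`."""
    counts = [
        sum(1 for cited in paper["cited_in"] if cited <= year)
        for paper in papers
        if paper["published"] <= year
    ]
    return h_index(counts)


def training_pair(papers, years_into_career=5, horizon=5):
    """Return (h-index at the observation year, h-index `horizon` years later)."""
    first_year = min(paper["published"] for paper in papers)
    observed = first_year + years_into_career
    return h_index_as_of(papers, observed), h_index_as_of(papers, observed + horizon)


# Toy publication history (entirely made up).
papers = [
    {"published": 2000, "cited_in": [2001, 2002, 2003, 2006, 2008]},
    {"published": 2002, "cited_in": [2003, 2004, 2007, 2009]},
    {"published": 2004, "cited_in": [2005, 2006, 2010]},
    {"published": 2007, "cited_in": [2008, 2009, 2010]},
]
print(training_pair(papers))  # prints (2, 3)
```

For this toy author, the h-index five years after the first paper (2005) is 2, and five years further on (2010) it is 3; pairs like this would form the inputs and targets of a forecasting model.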

Starting with neuroscientists, we attempted to predict the h-index of each scientist 5 years ahead, a timescale relevant for tenure decisions, using a linear regression with elastic net regularization⁶ (see Supplementary Information). The model predicted the future h-index accurately, yielding a respectable R²=0.67, cross-validated across scientists (an R² of 1 would imply that the model predicts the data perfectly). A simplified model containing only the number of published articles, the h-index, years since first publication, number of publications in prestigious neuroscience journals (Nature, Science, Nature Neuroscience, Neuron and the Proceedings of the National Academy of Sciences) and the number of distinct journals still performed nearly equally well (R²=0.66; see 'Predict your future h-index').
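
For readers unfamiliar with the technique, the sketch below fits an elastic-net-regularized linear regression by coordinate descent and scores it with R². It is a generic textbook-style implementation, not the authors' code: the objective uses a common parametrization with a penalty strength `lam` and a mixing weight `alpha` (both names are assumptions), the data are toy numbers, and the cross-validation across scientists described above is omitted.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(X, y, lam, alpha, sweeps=2000):
    """Minimise (1/2N) * sum of squared residuals
    + lam * (alpha * L1(beta) + (1 - alpha) / 2 * squared L2(beta)).
    Returns (intercept, beta). The intercept is not penalised."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred features
    residual = [yi - y_mean for yi in y]  # residual when beta is all zeros
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            denom = col_sq[j] + lam * (1 - alpha)
            if denom == 0:
                continue  # constant feature with no ridge term: leave at zero
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / denom
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(x_mean[j] * beta[j] for j in range(p))
    return intercept, beta


def predict(intercept, beta, row):
    return intercept + sum(b * x for b, x in zip(beta, row))


def r_squared(y_true, y_pred):
    """1 minus residual sum of squares over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy data generated from y = 1 + 2*x1 + 3*x2 (not real bibliometric data).
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [1 + 2 * a + 3 * b for a, b in X]
intercept, beta = fit_elastic_net(X, y, lam=0.1, alpha=0.5)
fitted = [predict(intercept, beta, row) for row in X]
print(round(r_squared(y, fitted), 3))
```

With `lam=0` the fit reduces to ordinary least squares; larger values of `lam` shrink the coefficients, and with `alpha` near 1 some are set exactly to zero, which is how a model with many CV features can collapse to a handful of predictors.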

Predicting the future careers of Drosophila and evolutionary scientists leads to somewhat worse predictions (R²=0.54 and R²=0.61, respectively, based on scientists 3–15 years into their careers) but still better than predictions based on the h-index alone (R²=0.38 and R²=0.39, respectively). This indicates that generalizations to other fields within and outside of life science may be limited¹. But for neuroscientists, at least, the predictions extend well to longer periods of time, such as ten years into the future (R²=0.52). Over time, using just the h-index performs much worse than taking all features into account (see 'Paths to success', left panel).

The main five predictive features change in importance for predicting h-indices over increasingly longer periods (see 'Paths to success', right panel). The power of the h-index declines. The number of articles written, the diversity of publication in distinct journals and the number of articles published in five prestigious journals all become increasingly influential over time.

Future fortunes

It is risky to make any causal interpretations of these results. However, we will briefly speculate on why these features might be important predictors of future success. Some features directly affect the potential for a high h-index, such as the number of articles written. These features can also indirectly affect a scientist's future success, because scientists who are productive and publish many papers tend to remain productive. Publishing in many different journals may lead to fewer overlapping populations of scientists who cite the work, and hence higher growth potential for articles. A scientist who has published in several distinct journals is also likely to be someone with broad training who contributes in many ways. The number of publications in leading journals can increase the visibility of a scientist's other papers, past and future.

If promotion, hiring or funding were largely based on indices (h-index, the model used here or any other measure), then some scientists would adapt their behaviour to maximize their chances of success. Models such as ours that take into account several dimensions of scientific careers should be more difficult for researchers to game than those that focus on a single measure.

Our formula is particularly useful for funding agencies, peer reviewers and hiring committees who have to deal with vast numbers of applications and can give each only a cursory examination. Statistical techniques have the advantage of returning results instantaneously and in an unbiased way. Building and analysing massive data sets to track scientific careers could also help to identify potential gender, racial and other biases⁷,⁸,⁹ and advance our understanding of how science develops.

Although our findings and predictions may not alleviate scientists' angst over their careers, the results offer some comfort by showing that the future is not so random. The occasional rejection of a paper may feel unjust and indiscriminate, but in the long run, such factors seem to average out, rendering h-index trajectories relatively predictable.

Box 1: Metrics: Predict your future h-index

These are approximate equations for predicting the h-index of neuroscientists in the future. They are probably reasonably precise for life scientists, but likely to be less meaningful for the other sciences. Try it for yourself online at go.nature.com/z4rroc.

• Predicting next year (R² = 0.92):

• Predicting 5 years into the future (R² = 0.67):

• Predicting 10 years into the future (R² = 0.48):

Key: n, number of articles written; h, current h-index; y, years since publishing first article; j, number of distinct journals published in; q, number of articles in Nature, Science, Nature Neuroscience, Proceedings of the National Academy of Sciences and Neuron.
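
To show how a formula of this kind would be applied, the sketch below evaluates a weighted sum of the five features in the key plus an intercept. This form is an assumption based on the linear regression described in the text; the published equations may transform some features differently. The coefficient values are arbitrary placeholders chosen for illustration and are not the article's.

```python
def predict_future_h(features, weights, intercept):
    """Evaluate intercept + sum of weight * feature over the named features.
    Both dictionaries are keyed by the symbols in the box: n, h, y, j, q."""
    return intercept + sum(weights[name] * features[name] for name in weights)


# A made-up CV: 20 articles, h-index 8, 6 years since first paper,
# 10 distinct journals, 1 article in the listed prestigious journals.
cv = {"n": 20, "h": 8, "y": 6, "j": 10, "q": 1}

# PLACEHOLDER coefficients, NOT the published ones.
placeholder_weights = {"n": 0.5, "h": 1.0, "y": -0.5, "j": 0.1, "q": 0.2}
placeholder_intercept = 1.0

print(round(predict_future_h(cv, placeholder_weights, placeholder_intercept), 1))
```

Keeping the coefficients in a dictionary keyed by the box's symbols makes it straightforward to swap in a different set for each forecast horizon.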

References

1. Hirsch, J. E. Proc. Natl Acad. Sci. USA 102, 16569–16572 (2005).
2. Redner, S. J. Stat. Mech. Theory Exp. 3, L03005 (2010).
3. Peterson, I. ScienceNews 2 December 2005; available at http://go.nature.com/iawd5o.
4. Alonso, S., Cabrerizo, F. J., Herrera-Viedma, E. & Herrera, F. J. Informetr. 3, 273–289 (2009).
5. Hirsch, J. E. Proc. Natl Acad. Sci. USA 104, 19193–19198 (2007).
6. Zou, H. & Hastie, T. J. Roy. Stat. Soc. B 67, 301–320 (2005).
7. Dwan, K. et al. PLoS ONE 3, e3081 (2008).
8. Ginther, D. K. et al. Science 333, 1015–1019 (2011).
9. Allesina, S. PLoS ONE 6, e21160 (2011).

Author information

Authors and Affiliations

Daniel E. Acuna is a research associate at the Rehabilitation Institute of Chicago, Illinois 60611, USA, and a research affiliate in biomedical engineering at Northwestern University, Evanston, Illinois 60208, USA.
Stefano Allesina is assistant professor in ecology and evolution and at the Computation Institute at the University of Chicago, Illinois 60637, USA.
Konrad P. Kording is associate professor of physical medicine and rehabilitation, physiology, and applied mathematics at Northwestern University, and at the Rehabilitation Institute of Chicago.

Corresponding author

Correspondence to Daniel E. Acuna.

Supplementary information

Supplementary Information (PDF 348 kb)

About this article

Cite this article

Acuna, D., Allesina, S. & Kording, K. Predicting scientific success. Nature 489, 201–202 (2012). https://doi.org/10.1038/489201a

Issue Date: 13 September 2012
