Altmetrics: diversifying the understanding of influential scholarship

The increase in the availability of data about how research is discussed, used, rated, recommended, saved and read online has allowed researchers to reconsider the mechanisms by which scholarship is evaluated. It is now possible to better track the influence of research beyond academia, though the measures by which we can do so are not yet mature enough to stand on their own. In this article, we examine a new class of data (commonly called “altmetrics”) and describe its benefits, its limitations, and recommendations for its use and interpretation in the context of research assessment. This article is published as part of a collection on the future of research assessment.


Introduction
To date, academia's traditional framework for attempting to understand influential scholarship has lacked a concern with "real world" impacts. In the sciences, supplementing peer review with researchers' expert-granted awards and their publications' citation-based metrics has meant that researchers' influence has been measured only among other researchers. In the social sciences, arts and humanities, research assessment has been similarly limited to tracking impact within academia, using the prestige of one's publisher as a proxy for the importance of one's work (again, as a supplement to peer review practices).
What of influence beyond the academy? In recent years, the increasing use of the social web by scholars and the public alike to discuss research has made it possible to broaden our understanding of what makes for "influential" scholarship. By mining this scholarly Big Data, popularly described as "altmetrics", we can start to understand influence beyond what has traditionally been recognized, seeing researchers' marks on culture, policy, the economy and education.
The rise in available, diverse impact data has been met by an increased demand upon researchers to prove that the work they are pursuing is of relevance to "the real world". Funding agencies, governments, and even university administrators are now tasking researchers with showcasing the value of what they do beyond the academy. While this trend has been decried as "injurious neoliberalism" by some (Gill, 2009), others have welcomed a change that better rewards them for pursuing research that has a direct effect upon society, and for doing outreach to the public (Terras, 2012; Howard, 2013; Piwowar, 2013).
In this article, I will examine the use of altmetrics in evaluating research.
Altmetrics complement the dominant understanding of influence. Mapping researchers' influence on the Internet has been of concern since at least the late 1990s (Cronin et al., 1998). Altmetrics as a concept, however, is much younger, having first been articulated in 2010 by a group of scientists in the Altmetrics Manifesto (Priem et al., 2010). The authors point out that by using data from the social web, we can start to track and quantify interactions with scholarship that were previously invisible: "… that dog-eared (but uncited) article that used to live on a shelf now lives in Mendeley, CiteULike, or Zotero, where we can see and count it. That hallway conversation about a recent finding has moved to blogs and social networks; now, we can listen in. The local genomics dataset has moved to an online repository; now, we can track it. This diverse group of activities forms a composite trace of impact far richer than any available before. We call the elements of this trace altmetrics" (Priem et al., 2010).
No widely accepted formal definition of altmetrics exists. Thelwall and Kousha (2015a) have characterized altmetrics as being "derived from social media (for example, social bookmarks, comments, ratings, tweets)" and distinct from "web citations in digitised scholarly documents (for example, eprints, books, science blogs or clinical guidelines)", that is, references to research within online sources that are formally "cited" in the manner that references appear in formally published, peer-reviewed literature. Holmberg (2014) has similarly defined altmetrics as pertaining specifically to social media. Moed (2015), on the other hand, casts a much larger net, defining altmetrics simply as "traces of the computerization of the research process"; the NISO Altmetrics Initiative (2016) has similarly offered a broad, all-encompassing definition in a recent report. At least one definition lies between these two extremes: Haustein (2016) sees altmetrics as overlapping with various types of informetrics (scientometrics, webometrics and bibliometrics, to name a few), both distinct from and sharing characteristics with each.
A number of characteristics apply across the board to altmetrics. Altmetrics are noted for being quick to accumulate, available for any research output format (that is, not just journal articles or books, but also datasets, software and presentations), and useful for understanding the use and uptake of scholarship among many audiences (Priem et al., 2010; Sud and Thelwall, 2013; Kousha and Thelwall, 2015a, b).
Altmetrics are also, by their very nature, diverse and ever-changing: as Moed (2015) points out, anything that can be text-mined from the Web is potentially a type of altmetric, so there is no canonical list of websites or data sources that make up "proper" or "real" altmetrics. To date, researchers have studied a wide variety of data sources under the banner of altmetrics.
However, as we examine below, the diversity of data represented under the nebulous umbrella term "altmetrics" means that different types of altmetrics data can mean very different things, depending on who is using research, what they are doing with it, and what their use implies about that research's influence upon the world.
Patterns exist in how research is used online, and these patterns can expose so-called "flavors of impact" for scholarship (Piwowar, 2012; Priem et al., 2012). One study (Priem et al., 2012) found at least four distinct "flavors" for scientific publications: items that had been "Read, bookmarked, and shared" online; items that had been "Read and cited"; "expert picks", which had been recommended on Faculty of 1000 Prime, bookmarked on Mendeley, and otherwise used by scholars; and "popular hits", which had been read often and shared on social media but had seen little attention from scholarly social networks. Other researchers have pointed to text-mining syllabi, book reviews (Zuccala et al., 2015; Kousha and Thelwall, 2015c) and Mendeley bookmarks (Mohammadi et al., 2016) as ways to track the "flavors" of educational impact, public popularity, and scholarly readership and intent to cite, respectively.
For altmetrics related to books and journal articles, a commonly asked question is, "Does this indicator correlate with citations?" For the most part, these metrics do not. Researchers have found only moderate correlations between citations and Mendeley readership (Li and Thelwall, 2012; Priem et al., 2012), Faculty of 1000 Prime ratings (Priem et al., 2012; Waltman and Costas, 2014), and mentions of research in scholarly blogs (Shema et al., 2014). Weak and even negative correlations exist for indicators like tweets (Bornmann, 2015). This lack of correlation between most altmetrics and citations is a significant finding, but not for the reason that most assume. Some researchers have suggested that the lack of strong correlation shows that altmetrics can help us uncover new "flavors of impact", beyond the scholarly impact that we have traditionally been preoccupied with (Priem et al., 2012; Thelwall et al., 2013). However, other researchers have cautioned that far more research (such as source content analysis or creator interviews) is needed to fully understand the nature of the attention and impact that various altmetrics represent (Sud and Thelwall, 2013; Bornmann, 2016).
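Correlation studies of this kind usually report a rank-based coefficient, because altmetric and citation counts are highly skewed. Purely as an illustration, the following Python sketch computes Spearman's rank correlation (the Pearson correlation of tie-averaged ranks) using only the standard library. The function names and the five-article dataset are hypothetical, and the studies cited above may have used different coefficients, tie handling or software.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


readers = [12, 40, 3, 25, 8]   # hypothetical Mendeley reader counts
citations = [2, 9, 0, 4, 3]    # hypothetical citation counts, same articles
rho = spearman(readers, citations)
print(round(rho, 2))  # 0.9
```

Here the two rankings differ only for the first and last articles, so rho is about 0.9. Real altmetrics datasets contain many zero counts, which is why the sketch averages the ranks of tied values rather than ranking them arbitrarily.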
The broad net cast by altmetrics also allows software developers, data curators and other collaborators on the average research project to be better credited for their work. Often, these collaborators are crucial to a research project but do not contribute to writing the related articles or books. Thus, in a system focused on counting citations or reading publisher bylines, one that rewards authorship rather than contributorship, these "nontraditional" researcher roles cannot be properly recognized. If hard evidence of the value of all types of contributions (the number of users of their software (Singh Chawla, 2016), adaptations of their datasets (Peters et al., 2016) and so on) were accepted for professional advancement, these researchers could get the credit they deserve.
Full of promise, but currently imperfect. The many benefits of altmetrics as a class of complementary impact metrics should not overshadow their current limitations, which have been identified by Wouters and Costas (2012) as including:
• They do not "meet crucial requirements for data quality and indicator construction" … meaning that certain "web based tools may create statistics and indicators on incorrect data, without being possible for the user to detect or correct the data properly";
• In general, few altmetrics data sources normalize their data, making cross-discipline comparisons difficult; and
• Most tools are not transparent about data coverage (that is, what disciplines are included, what sources are indexed, or other such details about how data is gathered).
Moreover, in many ways altmetrics' limitations mirror those of other quantitative impact metrics:
• They mean little in isolation;
• They are subject to disciplinary and other biases;
• They can be gamed; and
• Though we use the shorthand of "impact metrics" to describe altmetrics, they actually measure attention, not true impact.
Much like citation counts, altmetrics cannot be properly interpreted in isolation. After all, do 13 Wikipedia mentions for a research article mean that the article is performing well or poorly? One has to use disciplinary and age-based comparisons to truly understand these numbers. Those 13 Wikipedia mentions for a biomedical research article published last year may turn out to be a lot if the average article in that discipline, published in the same time frame, has received only 4 Wikipedia mentions. Citation-based indicators like the Source Normalized Impact per Paper (SNIP) and SCImago Journal Rank (SJR) were created for a similar reason: to allow for discipline- and age-appropriate comparisons of research articles and journals (Falagas et al., 2008; Moed, 2011).
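The arithmetic in the Wikipedia example can be made concrete with a small sketch that divides an output's count by the mean count of its comparison cohort (same discipline, same publication year). The function name `relative_to_cohort` and the cohort values are hypothetical; this simple mean-based ratio is similar in spirit to field-normalized indicators, but it is not the formula behind SNIP or SJR.

```python
from statistics import mean


def relative_to_cohort(count, cohort_counts):
    """Return an output's count divided by the mean count of its cohort.

    A value above 1.0 means the output received more attention than the
    average output in the cohort (e.g. same discipline and publication year).
    """
    cohort_mean = mean(cohort_counts)
    if cohort_mean == 0:
        raise ValueError("cohort mean is zero; the ratio is undefined")
    return count / cohort_mean


# Hypothetical biomedical articles from the same year, averaging 4 mentions.
cohort = [0, 2, 4, 6, 8]
print(relative_to_cohort(13, cohort))  # 3.25
```

A ratio of 3.25 says the article received a little over three times the cohort average. Because altmetric counts are skewed, a few very popular outputs can pull the mean upward, which is one reason percentile-based context is often preferred.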
To provide the necessary context for altmetrics, several approaches have been used and recommended to date. Hicks et al. (2015) recommend the use of percentiles in particular as a means of providing such context. Percentiles are favored by altmetrics services like Altmetric and Impactstory (for example, "This article has a high score compared to outputs of the same age and source (97th percentile)"). Researchers have proposed normalized counts for Mendeley readership (Bornmann and Haunschild, 2016a; Haunschild and Bornmann, 2016) and Twitter mentions (Bornmann and Haunschild, 2016b), allowing cross-discipline and cross-time comparisons to be made. Sentiment analysis of altmetrics like tweets has also been proposed as a means to better understand what is actually being said about a piece of research on a wide scale (Friedrich et al., 2015). The use of "baskets of metrics" has been recommended by groups like the HEFCE Metrics Review panel (Wilsdon et al., 2015) and the Snowball Metrics initiative (Colledge, 2014), encouraging researchers to use many related, appropriate metrics at once to showcase particular "flavors of impact" for their scholarship.
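A percentile places a raw count within a reference set of comparable outputs. The sketch below uses one common percentile-rank convention (the share of reference outputs scoring below the count, with ties counted as half). It is an assumption for illustration only: services such as Altmetric and Impactstory may define their percentiles differently, and the function name and reference counts are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(count, reference_counts):
    """Percentage of reference outputs scoring below `count`, ties counted as half."""
    if not reference_counts:
        raise ValueError("reference set is empty")
    ordered = sorted(reference_counts)
    below = bisect_left(ordered, count)            # strictly lower counts
    equal = bisect_right(ordered, count) - below   # counts tied with `count`
    return 100.0 * (below + 0.5 * equal) / len(ordered)


# Hypothetical counts for ten outputs of the same age and source.
reference = [0, 0, 1, 2, 3, 5, 8, 13, 21, 40]
print(percentile_rank(13, reference))  # 75.0
```

Unlike a ratio to the mean, a percentile is not distorted by a handful of extremely popular outputs, which makes it better suited to the skewed distributions typical of altmetrics.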
Another challenge of altmetrics lies in their disciplinary biases. Research in biomedical science, social science and the humanities has been shown to garner more online attention than scholarship from other disciplines, making cross-disciplinary comparisons with raw metrics impossible without the use of percentiles or weighted indicators. Research has also shown that certain altmetrics currently reflect gender (Paul-Hus et al., 2015) and regional (Alperin, 2015) biases.
Gaming is another concern for altmetrics, though to date there have not been any major cases of purposeful manipulation of altmetrics for personal gain. Perhaps the biggest gaming danger for altmetrics lies in benevolent Twitter bots (accounts set up to tweet when new papers are added to a repository or when research on a particular topic is discussed in the media), which one study has shown account for upwards of 9% of all tweets related to papers submitted to arXiv in 2012. Gaming of pageviews and downloads has also been a concern for publishers and repositories (Gordon et al., 2015).
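To see how automated accounts can inflate a paper's tweet count, the sketch below counts tweets per paper with and without accounts already flagged as bots, and reports the bots' overall share of tweets. It assumes a hypothetical input format, a list of `(account, paper_id)` pairs plus a set of known bot accounts; actually detecting bots is a separate and harder problem that this sketch does not attempt.

```python
from collections import Counter


def split_bot_tweets(tweets, bot_accounts):
    """Count tweets per paper with and without flagged bot accounts.

    `tweets` is a list of (account, paper_id) pairs; `bot_accounts` is a set
    of account names already judged to be automated (hypothetical inputs).
    Returns (all counts, non-bot counts, share of tweets posted by bots).
    """
    all_tweets = Counter(paper for _, paper in tweets)
    human_tweets = Counter(
        paper for account, paper in tweets if account not in bot_accounts
    )
    bot_count = sum(1 for account, _ in tweets if account in bot_accounts)
    bot_share = bot_count / len(tweets) if tweets else 0.0
    return all_tweets, human_tweets, bot_share


tweets = [
    ("new_papers_bot", "paper-A"),
    ("alice", "paper-A"),
    ("bob", "paper-A"),
    ("new_papers_bot", "paper-B"),
]
all_tweets, human_tweets, share = split_bot_tweets(tweets, {"new_papers_bot"})
print(all_tweets["paper-B"], human_tweets["paper-B"], share)  # 1 0 0.5
```

In this toy data, paper-B's only tweet comes from a repository bot, so its attention disappears entirely once bots are excluded.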
Similarly, legitimate self-promotion can have an effect upon an article's altmetrics. As Adie (2013) explains, gaming exists on a spectrum along with other activities that can potentially showcase the value of research, both directly and indirectly. These activities break down into four general themes (Fig. 1):
• Legitimate Promotion (intent exists, value added): "Alice has a new paper out. She asks those grad students of hers who blog to write about it."
• Spam (no intent, no value): Spam networks pick up legitimate posts at random from others and replicate them, hoping to fool content-based analysis systems into thinking that they are real users. This is by far the most common scenario we [Altmetric] see.
• Gaming (intent exists, no value): "Alice has a new paper out. She believes that it contains important information for diabetes patients and so signs up to a '100 retweets for $$$' service."
• Incidental (no intent, value but not directly related to the article): "Just tried to access paper x but hit the paywall."
By far the biggest current limitation of altmetrics is that they are understood to measure attention, not impact. That is, altmetrics tend to be composed of metrics that can indicate whether many people are reading or discussing research, but include few metrics that can indicate whether research findings are being used and are having a positive effect upon the world (Sugimoto, 2015). This is another trait that altmetrics share with citations, which alone are not always a good indicator of impactful research; after all, citations can occur for many reasons (Cronin, 1984; Bornmann and Daniel, 2008).
However, we cannot rule out the possibility that certain types of altmetrics data may, with further study, be found to be early indicators of "real world" or non-traditional scholarly impact. Beyond the use of altmetrics as signals of "attention" (itself a vague concept), little is known about the motivations behind the online actions that result in altmetrics (Sud and Thelwall, 2013; Bornmann, 2016). Bornmann and Haunschild (forthcoming), in exploring the applicability of the Leiden Manifesto principles to altmetrics, have pointed out that altmetrics are, in theory, better suited than citations to "measure performance against the research missions of the institution, group, or researcher". However, as discussed above, more research in the form of content analyses and other investigative methods is needed to confirm the meaning of such altmetrics and to map those meanings to various impact types.
Though altmetrics currently share many of the limitations of citations, making them poor choices as a quantitative means of understanding true research impact, these drawbacks are not immutable. As these relatively young metrics mature, and as the services that provide them mature as well, we may start to encounter improved altmetrics, with context, clean (not gamed) data, and accurate impact measures baked in from the start.
Recommendations for using altmetrics. Though altmetrics currently share many of the same limitations as citation-based metrics, there are a number of ways that the use of altmetrics can improve upon the use of their bibliometric predecessors. Following are recommendations specifically for researchers on how to keep altmetrics from becoming just another set of numbers that academics need to boost. These recommendations draw upon and overlap with previous recommendations made on the use of altmetrics in evaluation scenarios (Colledge, 2014; Thelwall, 2014; Wilsdon et al., 2015). Researchers should keep these recommendations in mind when using altmetrics to demonstrate the attention to and impact of their work, and administrators and reviewers should also keep them in mind when interpreting research impact metrics.
Recommendation 1: Always use altmetric counts in context. As described above and recommended in the Leiden Manifesto (Hicks et al., 2015), the best way to contextualize any metric is to compare it with averages for research published in the same discipline and year, or even with research by authors of the same gender or nationality, given the biases that exist for all of these characteristics (Konkiel, 2016). Some altmetrics services (namely, Altmetric and Impactstory) offer predetermined performance percentiles for all the altmetrics they provide, based on year and, in the case of Altmetric, on discipline as well. The Public Library of Science (PLOS) journals all offer a similar feature for graphs of page views and downloads (Fig. 2).
Where such pre-calculated percentiles do not already exist, it is possible to collect and calculate these contextual numbers manually (Bornmann and Haunschild, 2016a, b). However, it is recommended that this task be undertaken with the help of a bibliometrics expert such as a librarian.
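Collecting contextual numbers manually usually starts by grouping comparable outputs into cohorts. The sketch below groups records by chosen fields (by default discipline and publication year) and returns the mean count for each cohort. The record layout, field names and counts are hypothetical assumptions; the `keys` parameter could in principle group by other attributes mentioned above, such as author nationality, where such data exist. It is no substitute for the bibliometrics expertise recommended above.

```python
from collections import defaultdict
from statistics import mean


def cohort_means(records, keys=("discipline", "year")):
    """Group records by the given fields and return the mean count per cohort.

    Each record is a dict holding the grouping fields plus a "count" field
    (a hypothetical layout chosen for this sketch).
    """
    groups = defaultdict(list)
    for record in records:
        cohort = tuple(record[k] for k in keys)
        groups[cohort].append(record["count"])
    return {cohort: mean(counts) for cohort, counts in groups.items()}


records = [  # hypothetical Wikipedia-mention counts
    {"discipline": "biomedicine", "year": 2015, "count": 2},
    {"discipline": "biomedicine", "year": 2015, "count": 4},
    {"discipline": "biomedicine", "year": 2015, "count": 6},
    {"discipline": "history", "year": 2015, "count": 0},
    {"discipline": "history", "year": 2015, "count": 1},
]
means = cohort_means(records)
print(means[("biomedicine", 2015)], means[("history", 2015)])
```

Comparing an output only against its own cohort's mean (or, better, its percentile within that cohort) avoids ranking a history article against biomedical articles that attract far more online attention.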
Another important dimension of context is the purpose for which altmetrics are being used to document the attention to or influence of research. The use of a metric (or "basket of metrics") has different implications when used to make funding decisions, as opposed to promotion and tenure decisions or national evaluation exercises (Thelwall, 2014), for example. Researchers and evaluators should always bear this in mind.
Recommendation 2: Use altmetrics to find compelling impact evidence. Though quantitative altmetrics cannot themselves currently serve as evidence of true impact, some metrics can signal that a lot of attention is being paid to research, and in turn that "pathways to impact" exist. Examples of such pathways can include media coverage for a book, which in turn can lead to downstream cultural impact, enriching the lives of the public; citations to a journal article in public policy documents, which can be read to discover whether governments are enacting laws based on research; or patient advocacy groups sharing a journal article on Twitter, which may help those affected by a disease to improve their health.
Such "pathways to impact" evidence lies in the qualitative data underlying altmetrics, such as the news stories, policy documents and tweets themselves. It is up to individuals to find those "gems" of impact evidence, using metrics to discover when attention is being paid to research in the first place.
Recommendation 3: Use "baskets of metrics," rather than one number in isolation. No single number can summarize the many flavors of impact of research, nor can it even capture the various gradients that exist within a single flavor (Hicks et al., 2015). For example, in showcasing interest from clinicians, a public health researcher might include PubMed Central pageviews, tweets from practitioners, and references to an article in Wikipedia (which over half of all doctors reportedly consult when making diagnoses (Beck, 2014)) to showcase distinct uses of an article among a particular stakeholder group: readership, discussion and use in practice. Such diverse uses cannot be communicated in a single number. As such, it is up to researchers to create their own "baskets of metrics" to communicate impact, composed of appropriate indicators of attention and influence among a specific audience (Wilsdon et al., 2015). Starting places for assembling these "baskets" can be found in the Snowball Metrics Recipe Book (Colledge, 2014) or by creating an Impactstory profile, which offers badges highlighting attention types (Fig. 3).
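One way to keep a "basket of metrics" from collapsing into a single score is to store each indicator separately, grouped by the kind of use it signals. The sketch below is a hypothetical data structure, not the Snowball Metrics or Impactstory format; the class name, method names and all values are invented for illustration, following the clinician example above.

```python
from dataclasses import dataclass, field


@dataclass
class MetricsBasket:
    """A basket of indicators for one stakeholder group, kept as separate numbers."""

    audience: str
    # Maps a type of use (e.g. "readership") to {indicator name: value}.
    indicators: dict = field(default_factory=dict)

    def add(self, use_type, name, value):
        self.indicators.setdefault(use_type, {})[name] = value

    def report(self):
        # Deliberately no total: each type of use is reported on its own line.
        lines = [f"Audience: {self.audience}"]
        for use_type, metrics in self.indicators.items():
            parts = ", ".join(f"{name} = {value}" for name, value in metrics.items())
            lines.append(f"  {use_type}: {parts}")
        return "\n".join(lines)


basket = MetricsBasket("clinicians")
basket.add("readership", "PubMed Central pageviews", 1520)   # hypothetical value
basket.add("discussion", "tweets from practitioners", 37)    # hypothetical value
basket.add("use in practice", "Wikipedia references", 2)     # hypothetical value
print(basket.report())
```

The design choice is simply to refuse to sum: pageviews, practitioner tweets and Wikipedia references signal different kinds of use and are reported side by side.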
Recommendation 4: Advocate for altmetrics as opportunity, not evaluation. There is worry among academics that altmetrics may become just another evaluative mechanism: a set of required benchmarks imposed by administrators, another suite of numbers (like citations or the h-index) that one needs to worry about. Some warn that if altmetrics must be reported in evaluations, "academics and research support offices [will be pushed] towards wasting their time trying to attract tweets etc. to their work" (Thelwall, 2014). However, this does not have to be the case.
It is in the power of faculty councils, department chairs, grant review boards, and hiring and promotion committees-groups led by researchers themselves-to declare that altmetrics should only be used as a voluntary growth mechanism: a means to understand where they are succeeding and to share that attention and those pathways to impact with others. Thelwall (2014) has suggested that altmetrics can be "particularly valuable for social impact case studies but can also be useful to demonstrate educational impacts for research". By insisting that altmetrics be an option, not a requirement, to use in promotion and tenure dossier preparation guidelines, job applications, grant proposals and other professional advancement opportunities, researchers can retain control over the appropriate use of these metrics.

Recommendation 5: Evaluators should use and interpret altmetrics carefully. In cases where researchers find themselves in the evaluator's chair, whether on grant review panels, search committees, or in other such scenarios, there is an important principle to keep in mind when interpreting altmetrics. Experts have recommended that metrics should "supplement, not supplant, expert judgement" (Hicks et al., 2015; Wilsdon et al., 2015). Thelwall (2014) adds, "[A]ssessors should use the alternative metrics to guide them to a starting position about the impact of the research but should make their own final judgement, taking into account the limitations of alternative metrics".

Conclusion
Altmetrics are a new class of research impact and attention data that can help researchers understand their influence and share it with others, for a variety of purposes. Though altmetrics currently have limitations in their formulation and use, these relatively young metrics are still evolving and may eventually become more accurate measures of true research impact than their bibliometric predecessors. Until that day, researchers considering using altmetrics should follow recommendations like those above, which can help ensure that altmetrics are used properly and guard against their abuse.