Erratum: “Excellence R Us”: university research and the fetishisation of excellence

Correction to: Palgrave Communications (2017) 3 Article number: 16105 doi: 10.1057/palcomms.2016.105; Published 19 January 2017; Updated 7 February 2017 Previously the spelling of Martin Paul Eve’s institution in the Author Information section was given as “Birckbeck, University of London, UK ”. This has now been corrected to “Birkbeck, University of London, UK”.

scholars? Does "excellence" live up to the expectations that academic communities place upon it? Is "excellence" excellent? And are we being excellent to each other in using it?
This article examines the utility of "excellence" as a means for organizing, funding, and rewarding science and scholarship. It argues that academic research and teaching are not well served by this rhetoric. Nor, we argue, are they well served by the use of "excellence" to determine the distribution of resources and incentives to the world's researchers, teachers and research institutions. While the rhetoric of "excellence" may seem in the current climate to be a natural method for determining which researchers, institutions, and projects should receive scarce resources, we demonstrate that it is not as efficient, accurate, or necessary as it may seem. Indeed, as we show, a focus on "excellence" impedes rather than promotes scientific and scholarly activity: it simultaneously discourages both the intellectual risk-taking required to make the most significant advances in paradigm-shifting research and the careful "Normal Science" (Kuhn [1962] 2012) that allows us to consolidate our knowledge in the wake of such advances. It encourages researchers to engage in counterproductive conscious and unconscious gamesmanship. And it impoverishes science and scholarship by encouraging concentration rather than distribution of effort. The net result is science and scholarship that is less reliable, less accurate, and less durable than research assessed according to other criteria. While we acknowledge that it often seems politically necessary to argue for "excellence", and while we understand that funding and accreditation bodies and agencies must play a political as well as scientific game, we here present the evidence that the internalization of such rhetoric into the research space can be counter-productive.
The article itself falls into three parts. In the first section, we discuss "excellence" as a rhetoric. Drawing on work by Michèle Lamont and others, we argue that "excellence" is less a discoverable quality than a linguistic interchange mechanism by which researchers compare heterogeneous sets of disciplinary practices. In the second section, we dig more deeply into the question of "excellence" as an assessment tool: we show how it distorts research practice while failing to provide a reliable means of distinguishing among competing projects, institutions, or people. In the final section, we consider what it might take to change our thinking on "excellence" and the scarcity it presupposes. We consider alternative narratives for approaching the assessment of research activity, practitioners, and institutions and discuss ways of changing the "scarcity-thinking" that has led us to our current use of this fungible and unreliable term. We propose that a narrative built on "soundness" and "capacity" offers us the opportunity to focus on the practice of productive research and on the crucial role that social communication and criticism play. Where there is more heterogeneity and greater opportunity for diversity of outcomes and perspectives, we argue, research improves.
What is "excellence"? In her book, How Professors Think: Inside the Curious World of Academic Judgment, Michèle Lamont opens by noting that "'excellence' is the holy grail of academic life" (Lamont, 2009, 1). Yet, as she quickly moves to highlight, this "excellence is produced and defined in a multitude of sites and by an array of actors. It may look different when observed through the lenses of peer review, books that are read by generations of students, current articles published by 'top' journals, elections at national academies, or appointments at elite institutions" (3). Or as Jack Stilgoe suggests: "'Excellence' tells us nothing about how important the science is and everything about who decides" (Stilgoe, 2014).
ARTICLE PALGRAVE COMMUNICATIONS | DOI: 10.1057/palcomms.2016.105
This tallies with the work of others who have considered reforms to the review process in recent years. Kathleen Fitzpatrick, for instance, has also situated the crux of evaluation in the evaluator, not the evaluated. For, as Fitzpatrick notes, "in using a human filtering system, the most important thing to have information about is less the data that is being filtered, than the human filter itself: who is making the decisions, and why. Thus, in a peer-to-peer review system, the critical activity is not the review of the texts being published, but the review of the reviewers." (Fitzpatrick, 2011, 38) The challenge here is that it is not possible to conduct a "review of the reviewers" without some reference to the evaluated material. It is possible to query the conduct of reviewers or the process they are (supposed to be) applying against another set of disciplinary norms (that is, are the reviewers acting in good faith? Have they provided a useful report? Do they know the field as normatively defined?); but to assess qualitative aspects of reviewers' judgment of a specific work requires an external evaluation of the work itself: a type of circularity in which a preshared evaluative culture must exist in order to pass judgment on the evaluation that is its basis, the "shared standards" of which Lamont writes (2009: 4).
Yet despite the anti-foundational nature of this problem, there remains a pressing need, in Lamont's view, to ensure that "peer review processes [... are] themselves subject to further evaluation" (247). Calls for training in peer review practices as well as calls for greater transparency occur across disciplinary boundaries, but generally without addressing the differences in practice that occur on either side of those boundaries. Lamont suggests that current remedies to this problem, which mostly consist of changing the degrees of anonymity or the point at which review is conducted (pre- versus post-filter), are insufficient and constitute "imperfect safeguards". Instead, she suggests, it is more important that members of peer-review communities should be educated "about how peer evaluation works," avoiding the pitfalls of homophily (in which review processes merely re-inscribe value to work that exhibits similitude to pre-existing examples) by re-framing the debate as a "micro-political process of collective decision making" that is "genuinely social" (246-247). As with most problems in scholarly communication, the challenge with peer review is therefore not technical but social.
As Lamont and others show, then, "excellence" is a pluralized construct that is specific to (and conservative within) each disciplinary environment. Yet even the most obvious solution to this challenge, interdisciplinary diversity of evaluators, only leads to further problems. For the differences in practice of review and perceptions of "excellence" across disciplinary boundaries, combined with a lack of appreciation that these differences exist, make it difficult to reach consensus within such diverse pools of reviewers. This is because, as Stirling (2007b) has noted, "it is difficult indeed to contemplate any single general index of diversity that could aggregate properties [...] in a uniquely robust fashion". If diversity itself cannot easily be collapsed onto a single measurable vector, then there is little hope of aggregating diverse senses of "excellence" into a coherent and universal framework.
This suggests that "excellence" resides between different communities and is ill-structured or ill-defined in each context. Local groups and disciplines may have their own more specific (though sometimes conventional rather than explicit) measures of "excellence": biologists may treat some aspects of performance as "excellent" (for example, number of publications, author position, citation counts), while failing to recognize aspects considered equally or more "excellent" by English professors (large word counts, single authorship, publication or review in popular literary magazines and journals) (O'Donnell, 2015). Finally, as we will go on to show, it is clear that evaluative cultures are operating without even internal consensus beyond a few broad categories of performance.
That said, it remains tempting to argue that such concepts of value, even if they are ungrounded and unshared, can be used pragmatically to foster consensus. This is the point of Wittgenstein's (2001: section 293) famous "beetle in a box" metaphor, which he uses to exemplify the "private language argument". For Wittgenstein, the question of unique noncommunicable epistemic knowledge (such as pain experience) should actually be framed in terms of public, pragmatic language games or contexts. If we each have an object in a box that is called a "beetle," but none of us can see each other's "beetles", he argues, then the important thing is not what the objects in our boxes actually are but rather how we negotiate and use the term socially to engender intersubjective understanding or action. In such cases, "if we construe the grammar of the expression of sensation on the model of 'object and designation', the object drops out of consideration as irrelevant" and designation is all that matters.
We might therefore productively ask: even if "excellence" is a concept that carries little or no information content, either within communities or across them, might it nonetheless be useful as a "beetle"? That is, as a carrier of interpretation or a set of social practices functioning as an expert system to convert intrinsic, qualitative, and non-communicable assessment into a form that allows performance to be compared across disciplinary or other boundaries? Might it, indeed, even be useful given the political necessity for research communities and institutions to present an (ostensibly) unified front to government and wider publics as a means of protecting their autonomy? Could "excellence" be, to speak bluntly, a linguistic signifier without any agreed upon referent whose value lies in an ability to capture cross-disciplinary value judgements and demonstrate the political desirability of public investment in research and research institutions?
In actual practice, it is not even useful in this way. Although, as its ubiquity suggests, "excellence" is used across disciplines to assert value judgements about otherwise incomparable scientific and scholarly endeavours, the concept itself mostly fails to capture the disciplinary qualities it claims to define. Because it lacks content, "excellence" serves in the broadest sense solely as an (aspirational) claim of comparative success: that some thing, person, activity, or institution can be asserted in a hopefully convincing fashion to be "better" or "more important" than some other (often otherwise incomparable) thing, person, activity, or institution, and, crucially, that it is, as a result, more deserving of reward. But this emphasis on reward, as Kohn (1999) and others have demonstrated, is itself often poisonous to the actual qualities of the underlying activity.
Is "excellence" good for research? Thus far, we have been arguing that "excellence" is primarily a rhetorical signalling device used to claim value across heterogeneous institutions, researchers, disciplines, and projects rather than a measure of intrinsic and objective worth. In some cases, the qualities of these projects can be compared in detail on other bases; in many (perhaps most) cases, they cannot. As we have argued, the claim that a research project, institution, or practitioner is "excellent" is little more than an assertion that that project, institution, or practitioner can be said to succeed better on its own terms than some other project, institution, or practitioner can be said to succeed on some other, usually largely incomparable, set of terms.
But what about these sets of "own terms"? How easy is it to define the "excellence" of a given project, institution, or practitioner on an intrinsic basis? Even if we leave aside the comparative aspect, are there formal criteria that can be used to identify "excellence" in a single research instance on its own terms or that of a single discipline?
Research suggests that this is far harder than one might think. Academics, it turns out, appear to be particularly poor at recognizing a given instance of "excellence" when they see it, or, if they think they do, at getting others to agree with them. Their continued willingness to debate relative quality in these terms, moreover, creates a basis for extreme competition that has serious negative consequences.
Do researchers recognize excellence when they see it? The short answer is no. This can be seen most easily when different potential measures of "excellence" conflict in their assessment of a single paper, project, or individual. Adam Eyre-Walker and Nina Stoletzki, for example, conclude that scientists are poor at estimating the merit and impact of scientific work even after it has been published (2013). Post-publication assessment is prone to error and biased by the journal in which the paper is published. Predictions of future impact as measured by citation counts are also generally unreliable, both because scientists are not good at assessing merit consistently across multiple metrics and because the accumulation of citations is itself a highly stochastic process, such that two papers of similar merit measured on other bases can accumulate very different numbers of citations just by chance. Moreover, Wang et al. (2016) show that in terms of citation metrics the most novel work is systematically undervalued over the time frames that conventional measures use, including, for instance, the Journal Impact Factor that Eyre-Walker and Stoletzki suggest biases expert assessment. This is true even of work that can be shown to be successful by other measures. Campanario, Gans and Shepherd, and others, for example, have traced the rejection histories of Nobel and other prize winners, including for papers reporting on results for which they later won their recognition (Gans and Shepherd, 1994; Campanario, 2009; Azoulay et al., 2011: 527-528). Campanario and others have also reported on the initial rejection of papers that later went on to become among the more highly cited in their fields or in the journals that ultimately accepted them (Campanario, 1993, 1995, 1996; Campanario and Acedo, 2007; Calcagno et al., 2012; Nicholson and Ioannidis, 2012; Siler et al., 2015).
Yet others have found a generally poor relationship between high ratings in grant competitions and subsequent "productivity" as measured by publication or citation counts (Pagano, 2006; Costello, 2010; Lindner and Nakamura, 2015; Fang et al., 2016; Meng, 2016).
As this suggests, academics' abilities to distinguish the "excellent" from the "not-excellent" do not correlate well with one another even within the same disciplinary environment (there tends to be greater agreement at the other end of the scale, distinguishing the "not acceptable" from the "acceptable," see Cicchetti, 1991; Weller, 2001). To earn citations or win prizes for a rejected manuscript, after all, authors need to begin by convincing a different journal (and its referees) to accept work that others previously have found wanting.
But this is not something that only Nobel prize winners are good at: as Weller reported in the early years of this century, most (51.4%) rejected manuscripts were ultimately published; in the vast majority of cases (approximately 90%), these previously rejected articles were accepted on their second submission and, in the vast majority of these cases (also approximately 90%), at a journal of similar prestige and circulation (Weller, 2001).
While these statistics have almost certainly changed in the last few years with changes in the demographics of submission and, especially, the development of venues that focus on the publication of "sound science" (Public Library of Science, 2016), the basic sense that journal peer review is a gatekeeper that is frequently circumvented remains.
Articles that are initially rejected and then go on to be published to great acclaim or even just in journals of a similar or higher ranking represent what are in essence false negatives in our ability to assess "excellence." They are also evidence of terrible inefficiency. The rejection of papers that are subsequently published with little or no revision at journals of similar rank increases the costs for everyone involved without any countervailing improvement in quality. In addition to multiplying the systemic cost of refereeing and editorial management by the number of resubmissions, such articles also present an opportunity cost to their authors through lost chances to claim priority for discoveries, for example, or, even more commonly, lost opportunities for citation and influence (Gans and Shepherd, 1994; Campanario, 2009; Şekercioğlu, 2013; Brembs, 2015; Psych Filedrawer, 2016).
More worryingly, there is also considerable evidence of false positives in the review process: that is to say, submissions that are judged to meet the standards of "excellence" required by one funding agency, journal, or institution, but do worse when measured against other or subsequent metrics. In a somewhat controversial work, Peters and Ceci submitted papers in slightly disguised form to journals that had previously accepted them for publication (Peters and Ceci, 1982; see Weller, 2001 for a critique). Only 8% overall of these resubmissions were explicitly detected by the editors or reviewers to whom they were assigned. Of the resubmissions that were not explicitly detected, approximately 90% were ultimately rejected for methodological and/or other reasons by the same journals that had previously published them; they were rejected, in other words, for being insufficiently "excellent" by journals that had decided they were "excellent" enough to enter the literature previously.
When it comes to funding, a similar pattern of false positives may pertain: a study by Nicholson and Ioannidis (2012) suggests that highly cited authors are less likely to head major biomedical research grants than less-frequently-cited but socially better-connected authors who are associated with granting agency study groups and review panels. Fang, Bowen and Casadevall have discovered that "the percentile scores awarded by peer review panels" at the NIH correlated "poorly" with "productivity as measured by citations of grant-supported publications" (Fang et al., 2016). These findings suggest a bias towards conformance and social connectedness over innovation in funding decisions in a world in which success rates are as low as 10%. They also provide further evidence of funding-agency bias against disruptively innovative work noted by many researchers over the years (Kuhn [1962] 2012; Campanario, 1993, 1995, 1996, 2009; Costello, 2010; Ioannidis et al., 2014; Siler et al., 2015).
Fraud, error and lies. To the extent that the above are evidence of inefficiencies in the system, some might argue that individual problems in determining "excellence" in specific cases are resolved in the longer term and over large samples. Of course, these examples only show work for which multiple measures of "excellence" can be compared: given their unreliability, this suggests that work that is not measured more than once may be unjustly suppressed or unjustly published, without us being able to tell the difference. On the other hand, it is presumably possible that even such extreme examples of differing perceptions of "excellence" represent honest differences of opinion as to the qualitative merit of the research or researchers. The same cannot be said, however, of actual fraud and outright errors.
As various studies have concluded, reported instances of both fraud and error (as measured through retractions) are on the rise (Claxton, 2005; Dobbs, 2006; Steen, 2011; Fang et al., 2012; Grieneisen and Zhang, 2012; Yong, 2012b; Chen et al., 2013; Andrade, 2016). This is particularly true at higher prestige journals (Resnik et al., 2015; Siler et al., 2015; Belluz, 2016). If we add to this list of (potentially) "false positives" studies that cannot be replicated, the number of papers that meet one measure of "excellence" (that is, passing peer review, often at "top" journals) while failing others (that is, being accurate and reproducible, and/or non-fraudulent) rises considerably (Dean, 1989; Burman et al., 2010; Lehrer, 2010; Bem, 2011; Goldacre, 2011; Yong, 2012b; Rehman, 2013; Resnik and Dinse, 2013; Hill and Pitt, 2014; Chang and Li, 2015; Open Science Collaboration, 2015). It is the very focus on "excellence", however, that creates this situation: the desire to demonstrate the rhetorical quality of "excellence" encourages researchers to submit fraudulent, erroneous, and irreproducible papers, at the same time as it works to prevent the publication of reproduction studies that can identify such work.
In other words, erroneous, and especially fraudulent or irreproducible, papers are interesting because they represent a failure of both our ability to identify and predict actual qualitative "excellence" and the incentive system that is used to encourage scientists and scholars to produce the kind of sound and defensible work that should be a sine qua non for quality. As Fang, Steen, and Casadevall (2012; cf. Steen, 2011, for which the later article represents a correction) have shown, the majority of retracted papers are withdrawn for reasons of misconduct including fraud, duplicate publication, or plagiarism (67.4%), rather than error (21.3%), although inadvertent error should presumably itself be disqualification from "excellence". But even these figures may under-represent the true incidence of misconduct. Mistakes and errors made in good faith are a natural and necessary part of the research process. Yet, as focus groups and surveys conducted by various researchers have demonstrated, some forms of error can be misconduct in the form of a (semi-)deliberate strategy for ensuring quick and/or numerous publications by "'cutting a little corner' in order to get a paper out before others or to get a larger grant,... [or] because... [a researcher] needed more publications that year" (Anderson et al., 2007: 457-458; see also Fanelli, 2009; Tijdink et al., 2014; Chubb and Watermeyer, 2016).
Thus in one small sample of detailed surveys, Fanelli showed that while only a small percentage of scientists (1.97% pooled weighted average, n = 7) admitted to fabricating, falsifying, or modifying data, a much larger percentage claimed to have seen others engaging in similarly outright fraudulent activity (14.12%, n = 12). Furthermore, even larger percentages had engaged in (33.7%) or seen others engage in (72%) questionable research described using less negatively loaded language (Fanelli, 2009; the percentage of scientists admitting to explicit misconduct is considerably higher [15%] in Tijdink et al., 2014). As Fanelli concludes: "Considering that these surveys ask sensitive questions and have other limitations, it appears likely that this is a conservative estimate of the true prevalence of scientific misconduct" (2009, 9), a conclusion very strongly supported by the anecdotal admissions of Anderson et al.'s focus groups.
The drive for "excellence" in the eyes of assessors is shown even more starkly in work by Chubb and Watermeyer (2016). In structured interviews, academics in Australia and the United Kingdom admitted to outright lies in the claims of broader impacts made in research proposals. As the authors note: "Having to sensationalize and embellish impact claims was seen to have become a normalized and necessary, if regretful, aspect of academic culture and arguably par for the course in applying for competitive research funds" (6). Quoting an interviewee, they continue, "If you can find me a single academic who hasn't had to bullshit or bluff or lie or embellish to get grants, then I will find you an academic who is in trouble with his [sic] Head of Department" (6; "[sic]" as in Chubb and Watermeyer). Here we see how a competitive requirement, perceived or real, for "excellence", in combination with a lack of belief in the ability of assessors to detect false claims, leads to a conception of "excellence" as pure performance: a concept defined by what you can get away with claiming in order to suggest (rather than actually accomplish) "excellence".
What is striking about these behaviours, of course, is that they are unrelated to (and to a great extent perhaps even incompatible with or opposed to) the actual qualities funders, governments, journal editors and referees, and researchers themselves are ostensibly using "excellence" to identify. No agency, ministry, press, or research office intentionally uses "excellence" as shorthand for "able to embellish results or importance convincingly", even as the researchers being adjudicated under this system report such embellishment as a primary criterion for success. Whether it occurs through fraud, cutting corners, or exaggeration, this performance of "excellence" is commonly justified as being necessary for survival, suggesting a cognitive and cultural dissonance between those aspects of their work that the performers feel are essential and those aspects they feel they must emphasise, overstate, embellish, or fabricate to appear more "excellent" than their competitors. The evidence that fraud and corner-cutting are a problem at the core of the research process suggests that the pressure for these performances of "excellence" is not restricted to stages that do not matter. As Kohn argues, reward-motivation affects scientific creativity (the ability to "break out of the fixed pattern of behaviour that had succeeded in producing rewards… before") as much as it does evidence-gathering or the inflation of results (1999, 44; see also Lerner and Wulf, 2006; Azoulay et al., 2011; Tian and Wang, 2011).
Competition for scarce resources and the performance of "excellence". So why do researchers engage in this kind of dubious activity? Clearly for both Chubb and Watermeyer's interviewees, as well as those identified as having committed scientific fraud, it is competition for scarce resources, whether funding, positions, or community prestige. Of course this is not a new issue (Smith, 2006). Taking time away from his work on the Difference Engine, Charles Babbage published an analysis of what he saw as the four main kinds of scientific frauds in an 1830 polemic, Reflections on the Decline of Science in England: And on Some of Its Causes. These included the self-explanatory "hoaxing" and "forging," in addition to "trimming" ("clipping off little bits here and there from those observations which differ most in excess from the mean and in sticking them on to those which are too small") and "cooking" ("an art of various forms, the object of which is to give ordinary observations the appearance and character of those of the highest degree of accuracy") (Babbage, 1831: 178; see Zankl, 2003; and Secord, 2015 for a discussion).
The motivation for these frauds, then as now, involves prestige and competition for resources. Babbage's typology of fraudulent science was but a minor chapter in a book otherwise mostly concerned with the internal politics of the Royal Society. He attributed the decline he saw in English science to the lack of attention and professional opportunities available to potential scientists. He was, as a result, keenly sensitive to questions of credit and its importance in determining rank and authority. Indeed, as Casadevall and Fang remind us, "Since Newton, science has changed a great deal, but this basic fact has not. Credit for work done is still the currency of science…. Since the earliest days of science, bragging rights to a discovery have gone to the person who first reports it" (Casadevall and Fang, 2012: 13). The prestige of first discovery always has been a scarce resource. Now that that prestige is measured also through the scarce resource of authorship in "the right journals" and coupled ever more strongly to the further scarce resources of career advancement and grant funding, it should not be a surprise that the competition for those markers has become steadily stronger. The performance of "excellence" has become more marked as a result.
If scandals such as fraudulent articles were the only way in which this overwhelming competitive focus on "excellence" hurt research, it would be bad enough. But the emphasis on rewarding the performance of "excellence" also has a more general impact on research capacity: it is the mechanism by which "the Matthew effect" (that is, the disproportionate accrual of resources to those researchers and institutions that are already well-rewarded) operates in a hyper-competitive research environment, creating distortions throughout the research cycle, even for work that is not fraudulent or the result of misconduct (Bishop, 2013; as its etymology implies, the "Matthew effect" predates today's hypercompetition, see Merton, 1968, 1988): it increases the stakes of the competition for resources and, as a result, encourages gamesmanship; creates a bias towards (nondisruptively) novel, positive, and even inflated results on the part of authors and editors; and discourages the pursuit and publication of types of "Normal Science" (such as replication studies) that are crucial to the viability of the research enterprise, without being glamorous enough to suggest that their authors are "excellent".
Positive bias and the decline effect. Just how destructive this need to perform "excellence" is can be illustrated by the well-known bias towards positive results in scientific publication (for example, Dickersin et al., 1987, 2005; Sterling, 1959; Kennedy, 2004; Young and Bang, 2004; Bertamini and Munafò, 2012; Rothstein, 2014; Psych Filedrawer, 2016). Thus, for example, Fanelli (2011) demonstrated a 22% growth between 1990 and 2007 in the "frequency of papers that, having declared to have 'tested' a hypothesis, reported a positive support for it". This is all the more remarkable given that the late 1980s were themselves not a halcyon period of unbiased science: in a 1987 study of 271 unpublished and 1041 published trials, Dickersin et al. found that 14% of unpublished and 55% of published trials favoured the experimental therapy (1987). As Young et al. suggest, "the general paucity in the literature of negative data" is such that "[i]n some fields, almost all published studies show formally significant results so that statistical significance no longer appears discriminating" (2008, 1419).
Another artifact of this positive bias is the "decline effect," or the tendency for the strength of evidence for a particular finding to decline over time from that stated on its first publication (Schooler, 2011; Gonon et al., 2012; Brembs et al., 2013; Groppe, 2015; Open Science Collaboration, 2015). While this effect is also well-known, Brembs et al. have recently shown that its presence is significantly positively correlated with journal prestige as measured by Impact Factor: early papers appearing in high prestige journals report larger effects than subsequent studies using smaller samples (2013, see Figs. 1b and 1c in this reference).
The bias against replication. Finally, there is a bias against the publication of replication studies in disciplines where such patterns make scientific sense. Indeed, there are currently insufficient structural incentives to perform work that "merely" revalidates existing studies, fuelled by a focus on novelty in most definitions of "excellence". As Nosek et al. note: "Publishing norms emphasize novel, positive results. As such, disciplinary incentives encourage design, analysis, and reporting decisions that elicit positive results and ignore negative results. Prior reports demonstrate how these incentives inflate the rate of false effects in published science. When incentives favour novelty over replication, false results persist in the literature unchallenged, reducing efficiency in knowledge accumulation" (2012). This bias against replication is even more remarkable, however, when it involves studies that invalidate rather than confirm the original result, especially when the original result has a high profile or is potentially field-defining: qualities that one would assume would increase the novelty and interest of the (non-)replication itself (Goldacre, 2011; Wilson, 2011; Nosek et al., 2012; Yong, 2012a, b; Aldhous, 2011; for a view from the other side of replication, see Bissell, 2013). This is, in part, a function of publishing economics: commercial journals earn money from subscription, access, and reprint fees (Lundh et al., 2010); high profile results and a high prestige reflected by a high Impact Factor help maintain the demand for these journals and hence ensure both a continuing stream of interesting new material and a steady or rising income for the journal as a whole (Lawrence, 2007; Munafò et al., 2009; Lundh et al., 2010; Marcovitch, 2010).
Undercutting (or perhaps even qualifying) the high-profile results that help bring in these subscribers, new articles, and attention attacks the very foundation of this success: a journal that publishes high profile but incorrect papers is undercutting its case for subscription and author submissions. One doesn't need to imagine a conspiracy to promote poor science to understand how a conscious or unconscious bias against replication studies might arise under such circumstances.
The reluctance of major journals to publish replication studies embeds this bias in the incentive system that guides authors. As Wilson notes: "[M]ajor journals simply won't publish replications. This is a real problem: in this age of Research Excellence Frameworks and other assessments, the pressure is on people to publish in high impact journals. Careful replication of controversial results is therefore good science but bad research strategy under these pressures, so these replications are unlikely to ever get run. Even when they do get run, they don't get published, further reducing the incentive to run these studies next time. The field is left with a series of 'exciting' results dangling in mid-air, connected only to other studies run in the same lab" (2011). As Rothstein (2014) argues, "The consequences of this problem include the danger that readers and reviewers will reach the wrong conclusion about what the evidence shows, leading at times to the use of unsafe or ineffective treatments".
Homophily. Thus far, we have been discussing the negative impact of "excellence" largely in terms of its effect on the practice and results of professional researchers. There is, however, another effect of the drive for "excellence": a restriction in the range of scholars, of the research and scholarship performed by such scholars, and of the impact such research and scholarship has on the larger population. Although "excellence" is commonly presented as the most fair or efficient way to distribute scarce resources (Sewitz, 2014), it in fact can have an impoverishing effect on the very practices that it seeks to encourage. A funding programme that looks to improve a nation's research capacity by differentially rewarding "excellence" can have the paradoxical effect of reducing this capacity by underfunding the very forms of "normal" work that make science function (Kuhn [1962] 2012) or by distracting attention from national priorities and well-conducted research towards a focus on performance measures of North America and Europe (Vessuri et al., 2014). A programme that seeks to reward Humanists, similarly, by focussing on output in "high impact" academic journals paradoxically reduces the impact of these same disciplines by encouraging researchers to focus on their professional peers rather than broader cultural audiences (Readings, 1996), reducing the domain's relevance even as its performance of "excellence" improves. A programme of concentration on the "best" academics, in other words, can have the effect of focussing attention on problems and approaches in which "excellence" can be performed most easily rather than those that could benefit the most (or provide the greatest actual impact) from increased attention.
Moreover, a concentration on the performance of "excellence" can promote homophily among the scientists themselves. Given the strong evidence that there is systemic bias within the institutions of research against women, under-represented ethnic groups, non-traditional centres of scholarship, and other disadvantaged groups (for a forthright admission of this bias with regard to non-traditional centres of scholarship, see Goodrich, 1945), it follows that an emphasis on the performance of "excellence" (or, in other words, being able to convince colleagues that one is even more deserving of reward than others in the same field) will create even stronger pressure to conform to unexamined biases and norms within the disciplinary culture: challenging expectations as to what it means to be a scientist is a very difficult way of demonstrating that you are the "best" at science; it is much easier if your appearance, work patterns, and research goals conform to those of which your adjudicators have previous experience. In a culture of "excellence" the quality of work from those who do not work in the expected "normative" fashion runs a serious risk of being under-estimated and unrecognised (King et al., 2014, 2016; O'Connor and O'Hagan, 2015; University of Arizona Commission on the Status of Women, 2015; this is, in part, an explanation for the systemically underreported and poorly acknowledged and rewarded work of women "assistants" in many of the great scientific discoveries of the twentieth century). There is a clear case to answer that, absent substantial corrective measures and awareness, a focus on "excellence" will continue to maintain rather than work to overcome social barriers to participation in research by currently underrepresented groups.
Homophily is in some senses a variant on Merton's "Matthew effect," discussed above. It is also a variant on the old argument that existing power structures (those populated by those whom it is assumed already exemplify "excellence") tend towards conservatism in their processes of evaluation. It underpins the calls to reassess the focus of mainstream scholarship, whether this is "great men" history, the "Dead White Male" in the literary "canon", or the bias towards the ills of the western male patient in medical research. As Barbara Herrnstein Smith says with respect to literary evaluation: "…[a work that 'endures'] will also begin to perform certain characteristic cultural functions by virtue of the very fact that it has endured... In these ways, the canonical work begins increasingly not merely to survive within but to shape and create the culture in which its value is produced and transmitted and, for that very reason, to perpetuate the conditions of its own flourishing" (Herrnstein Smith, 1988, emphasis in the original). In other words, the works that are considered "excellent", and the people who are, will always be evaluated, like the canon that shapes the culture that transmits it, on a conservative basis: past performance by preferred groups helps establish the norms by which future performances of "excellence" are evaluated. Whether it is viewed as a question of power and justice or simply as an issue of lost opportunities for diversity in the cultural coproduction of knowledge, an emphasis on the performance of "excellence" as the criterion for the distribution of resources and opportunity will always be backwards looking, the product of an evaluative process by institutions and individuals that is established by those who came before and resists disruptive innovation in terms of people as much as ideas or process.

Alternative narratives: working for change
If, as we have argued, "excellence" in all its many forms and meanings is both unreliable as a measure of actual quality, and pernicious in the way it promotes poor behaviour and discourages good, what then are the alternatives? Given the political realities that have promoted the use of this rhetoric in defence of science and scholarship, are there other, less damaging ways in which we can evaluate and promote the value of research and its communication?
Because "excellence" is used so ubiquitously across the research space, a complete answer to this question is far beyond the scope of any single paper: there is no single alternative that can replace the rhetoric of "excellence" in scholarly publishing, research funding, government and university policy, public relations, and promotion and tenure practices. In some areas, moreover, technological and economic changes suggest fairly obvious directions in which progress is being made, a prime example being the change from the physical scarcity that characterized adjudication in print journals to the abundance that, technically at least, characterizes a web-based publication infrastructure (for well-known discussions of this, see Shirky, 2010; Nielsen, 2012).
In many ways, however, the greatest challenge is research funding and infrastructure. The continuing competition for government and private funds raises questions of prioritization and adjudication that are unlikely to be rapidly answered by changes in technology or attitudes. A central test of our critique of rhetorics of "excellence" is therefore to ask whether there are any alternatives in this arena. Since funding applications tend to collect examples of "excellence" from other aspects of the research enterprise as a form of justification (success in funding is a function of one's ability to demonstrate "excellence" in different types of performance), it also represents the apex of the problem.
Perhaps because it is so hard, the tendency in policy, at least in the traditional North Atlantic centres of research in the last several decades, has clearly been in a non-distributive direction: for the concentration of resources on "top" institutions (in earlier periods, such as the early space race, for example, the focus was arguably more distributive). The Research Excellence Framework in the United Kingdom (REF) and massive new research centres such as the Crick in London are intended to create a "critical mass" of "excellent" or "world-leading" research. In Canada, which is an outlier internationally in the push towards stratification (Usher, 2016), it remains the case that the "top" universities (which have their own independent lobby group) receive a disproportionate share of research resources when measured, for example, against the percentage of students (including Doctoral students) they educate (U15 Group of Canadian Research Universities/Regroupement des universités de recherche du Canada, 2016). In the much larger U.S. post-secondary system, ten universities received nearly 20% of all government research funds; as Weigley and Hess note, while these universities are among the richest in the country in terms of their endowments, public funding still constitutes the largest part of their R&D funding (2013).
Many have questioned the value of such an inequitable distribution of funds when a less concentrated, or less unequal, distribution could achieve greater outcomes. Dorothy Bishop argues, with respect to the REF, that there should be less of a disparity between rewarding research that is perceived to be "the best" and that which is perceived as merely average. Instead, Bishop (2013) argues, all research submitted to the REF should receive some funding and the perceived best research should receive a smaller overall proportionate gain. This would have the benefit of decreasing the funding gulf between elite and middle-tier universities and would encourage diversity in the process. Of course such an approach may be politically troublesome for the academy, as long as the criterion it promotes is relative "excellence" rather than, say, "capacity", "breadth", "soundness", "comprehensiveness" or "accessibility". If funding is allocated on a scattered basis, following the logic that predictive approaches to quality are weak at best, then the authority claims of the university are substantially devalued as long as the rhetoric used to defend them privileges a "winner-take-all" measure of effectiveness.
There is, however, a compelling case to be made for the value of greater redistribution of research funding. Cook et al. (2015) showed that for UK Bioscience groups an optimal allocation of fixed resources would involve spreading the money between a larger number of smaller groups. This was the case whether number of publications or number of citations was used as the measure of productivity. A similar conclusion is reached by Fortin and Currie, who argue that scientific impact is only "weakly money-limited" and that a more productive strategy would be to distribute funds based on "diversity" rather than perceptions of "excellence" (Fortin and Currie, 2013). Gordon and Poulin argued that, for science funding in Canada through the Natural Sciences and Engineering Research Council (NSERC, the main STEM funding agency), it would have cost less at a whole system level simply to distribute the average award to all eligible applicants than to incur the costs associated with preparing, reviewing and selecting proposals (2009; although see Roorda, 2009 for a critique of their calculation). A rough calculation of the system costs of preparing failed grant applications would suggest that they are of the same order of magnitude as research grant funding itself (Herbert et al., 2013).
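The shape of this whole-system comparison can be made concrete with a back-of-the-envelope sketch. The Python below is illustrative only: the function names and every figure are hypothetical placeholders, not values taken from Gordon and Poulin (2009) or Herbert et al. (2013), and the cost model (a single fixed preparation-and-review cost per application) is a simplifying assumption of ours.

```python
def competitive_cost(applicants, success_rate, average_award, cost_per_application):
    """Whole-system cost of a competition: the awards paid out plus the
    effort spent preparing and reviewing every application (assumed fixed)."""
    awards = applicants * success_rate * average_award
    overhead = applicants * cost_per_application
    return awards + overhead


def baseline_cost(applicants, average_award):
    """Cost of simply giving every eligible applicant the average award."""
    return applicants * average_award


def baseline_is_cheaper(success_rate, average_award, cost_per_application):
    """Baseline funding costs less when the per-application overhead exceeds
    the award money withheld from rejected applicants, i.e. when
    average_award * (1 - success_rate) < cost_per_application."""
    return average_award * (1 - success_rate) < cost_per_application


# Hypothetical figures, for illustration only.
applicants = 1000
success_rate = 0.7
average_award = 30_000
cost_per_application = 12_000

print(competitive_cost(applicants, success_rate, average_award, cost_per_application))
print(baseline_cost(applicants, average_award))
print(baseline_is_cheaper(success_rate, average_award, cost_per_application))
```

Under this simplified model the comparison reduces to a single inequality, which is why the result is so sensitive to the estimate of application costs, the point on which critiques of such calculations tend to focus.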
What this suggests is that "excellence" is not the only policy choice concerning the resourcing of research, nor even, necessarily, the only politically compelling one. Arguments can be made for a variety of different methods of funding research, from concentrating resources on the most deserving, allegedly "excellent", institutions and researchers, to distributing them amongst all those that meet some minimum criteria, or even amongst some subset of these chosen by lottery (Health Research Council of New Zealand, 2016; Fang et al., 2016). In the context of scarce resources and a desire to maximize outcomes, indeed, there is even an argument for focussing most attention on the worst institutions: those that might most benefit from resources to improve (Bishop, 2013), have the greatest scope for improvement, and would go the longest way to ensuring an increase in basic capacity. In this case, rather than "excellence", appraisers would be looking for some sort of baseline level of qualification, "credibility" (Morgan, 2016), perhaps, or "soundness". This would be a shift from focussing on evaluation of outputs to an evaluation of practice.
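A threshold-plus-lottery scheme of the kind mentioned above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not a description of the procedure used by any funder cited here: the function name, the "soundness score" scale and the threshold are all hypothetical.

```python
import random


def allocate_by_lottery(applications, minimum_score, grants_available, seed=None):
    """Fund a random subset of the applications that clear a minimum bar.

    `applications` maps an applicant name to a (hypothetical) soundness
    score; applicants below `minimum_score` are never funded.
    """
    eligible = sorted(name for name, score in applications.items()
                      if score >= minimum_score)
    # If there is enough money for everyone eligible, no draw is needed.
    if grants_available >= len(eligible):
        return eligible
    rng = random.Random(seed)  # seeded so that a draw can be audited
    return sorted(rng.sample(eligible, grants_available))


# Hypothetical applicants and scores, for illustration only.
applications = {"A": 0.9, "B": 0.4, "C": 0.7, "D": 0.65, "E": 0.8}
funded = allocate_by_lottery(applications, minimum_score=0.6,
                             grants_available=2, seed=1)
print(funded)
```

The design choice worth noting is that the score is used only as a gate: above the threshold, a higher score confers no advantage, which is what distinguishes such a scheme from a ranking by "excellence".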
The challenge with any redistributive scheme is how to engage with politics. While such schemes propose interesting and valuable thought experiments, they do not address the need to work with governments that must account for the distribution of public funds and may fear the optics of a system built on criteria other than "the best". The narrative of, and the need for, "excellence" (like that of "international competitiveness") is important as a shared language of externally recognizable symbols that justify funding to government and to wider publics.
As noted earlier, this serves the interests of those who have already "earned" the label. The local construction of "excellence" is inherently conservative, and maintaining its structures serves the interests of those who hold local power. Therefore, narratives arguing for redistribution need to be more than just interesting ideas and more than simply factually correct. They need to be politically as well as intellectually compelling.
Soundness and capacity over "excellence". This is where a rhetoric built around "soundness" and "capacity" offers opportunities. The idea that "sound research is good research", and "more research is better than less" (that our focus should be on thoroughness, completeness, and appropriate standards of description, evidence, and probity rather than flashy claims of superiority) presents an alternative to the existing notions of "excellence". Such a narrative also addresses deeper concerns regarding a breakdown in research culture through hypercompetition. These terms resonate with public and funder concerns for value, and they align with the need for improved communications and wider engagement encouraged by many governments and agencies.
It might be argued in the case of "soundness" in particular that the term is as subjective as "excellence". Stirling (2007a) has argued that the implication that expert analysis can be free from subjective values in determining something like "soundness" is itself misleading and exclusionary. Certainly "soundness" or "scientificness" rhetorics have been used to give credibility to controversial technologies and to shut a range of perspectives out of public discourse in ways that are similar to uses of "excellence" we have criticized.
But the evaluation of "soundness" is based in the practice of scholarship, whereas "excellence" is a characteristic of its objects (outputs and actors). In this sense "soundness" aligns well with approaches that locate the value of scholarship and evaluation in the nature of its processes (that is, "proper practice") and its social conduct. While disagreeing on what the outputs of research can actually mean, scholars from Fleck, through Merton, Kuhn, Ravetz and Latour have all focussed on how practice in a social context in which norms and ethics are sustained and enforced leads to productive scholarship (Fleck [1935] 1979; Ravetz, 1973; Latour and Woolgar, 1986; Latour, 1987). "Soundness" can be assessed by how it supports socially developed and documentable processes and norms. In contrast, assessment of "excellence" depends on how convincing the performance of importance and impact is. Like "excellence", the criteria for "soundness" are not universal qualities distinct from pre-existing socially developed practice; but in contrast to "excellence", the qualities of "soundness" can be benchmarked. They are also more precise: "excellence" in the senses we are discussing is used to describe the competitive position of an entire performance in relation to others; "soundness" focusses on details: statistical or bibliographic appropriateness, say, or well-chosen evidence.
Another question about "soundness" involves its cross-disciplinary application. What is "soundness" in the context of the Humanities? Eve (2014, 144) has suggested that "soundness" in a humanities paper might involve the ability to "evince an argument; make reference to the appropriate range of extant scholarly literature; be written in good, standard prose of an appropriate register that demonstrates a coherence of form and content; show a good awareness of the field within which it was situated; pre-empt criticisms of its own methodology or argument; and be logically consistent". More recently, Morgan (2016) has suggested that "credibility" may be the humanities equivalent of "soundness". Others have focussed on the term "quality" in the sense in which it is used in quality assurance (Funtowicz and Ravetz, 1990; Funtowicz and Ravetz, 2003), as fitness for an explicitly defined purpose. As we have argued above, all of these appear to capture the sense that productive scholarship can be defined by allegiance to socially defined research practice as much as performance of success.
Our argument here is not that expanding our boundary for resourcing from "excellence" to "soundness" and "capacity" is all that is necessary to change research culture and improve the distribution of resources; rather, it is that a move from resourcing based on the performance of an ineluctable quality to one based on the demonstration of documentable, socially developed practice is the first step to solving the problems our rhetoric of "excellence" has created. Soundness appears to be a plausible basis on which to build a new narrative, or rather to combine existing threads into a more consistent rhetorical framework. Such a framework will work to refocus our attention on research that is sufficiently valuable to be worth pursuing. To drive adoption and practice towards making this real, however, will require more than narrative. It will need resources to be redistributed towards supporting a broader class of research activities.
Do soundness and capacity sell? Although we have been focussing on funding, the rhetoric of soundness and capacity, about the idea that the most important quality of research is that it be done and done with care, does resonate with other aspects of the research enterprise.
Some examples of this are the broad area of reproducibility (Burman et al., 2010; Lehrer, 2010; Goldacre, 2011; Yong, 2012b; Rehman, 2013; Chang and Li, 2015; Open Science Collaboration, 2015), reporting guidelines for animal experiments (Kilkenny et al., 2010) and clinical trials (Schulz et al., 2010), and work on registered replication studies in social psychology (Simons et al., 2014). All have been areas of substantial professional and popular discussion and the emphasis on the need for clarity of description and "doing things properly" is consistent. The idea that research must be reproducible, safe, and complete can be at least as compelling an argument as that it must be simply excellent.
Another place where the rhetoric of "soundness" and "capacity" has had considerable success is the online journal PLOS ONE and the journals that have since begun to follow its approach (see Note 2). PLOS ONE was launched with the stated aim of publishing any scientific research that was deemed technically sound, regardless of its perceived novelty or impact. This approach was made possible by two developments in academic publishing: the move to fully online publications without the need for print editions, and the growing acceptance of Article Processing Charge (APC)-funded Open Access as a viable publication model. These enabled the journal to consider and publish any manuscript that met its criteria, with no limitations on page space or fixed subscription revenue. As a result, the journal grew very quickly, becoming the largest journal in the world within five years of launching (MacCallum, 2011). The PLOS ONE model has been widely emulated, with almost every major scientific publisher now offering a journal with similar editorial criteria. This has created a competitive landscape with interesting properties. Traditional journals compete by seeking to publish the most "excellent" papers that they can attract and demonstrate this by the number of papers they reject. This also leads authors to self-select for submission to those journals only the papers they consider most important, avoiding, for example, "wasting" anybody's time by submitting "nonoriginal" work such as replication studies. Over time, success in this venture, its own form of hypercompetition, leads to a differentiated set of ranked journals driven by their own performative targets, or aspirations to join the top ranks. Authors and editors engage in a cycle of performance that reduces the breadth of research journals are willing to publish and authors willing to submit.
PLOS ONE and its competitors also compete, but on quite different terms and in ways that arguably improve rather than imperil the research enterprise. Speed of publication, for example, always features in author surveys, and journals like PLOS ONE often advertise their average turnaround times. They even compete on the basis of journal prestige, reputation and Impact Factor (Solomon, 2014), albeit with a heavier emphasis on soundness and number of publications (that is, capacity) rather than exclusivity and "excellence". Even when the criterion for inclusion is only soundness, membership in the club of authors still provides a prestige benefit: that the doors of the club are more open does not necessarily mean that there is no benefit to membership (Potts et al., 2016).
But PLOS ONE and similar journals also demonstrate that it is not enough simply to create mechanisms that test for soundness and capacity. Even when offered a distributive narrative, researchers often still find it difficult to avoid the concentrating rhetoric of "excellence". A common complaint from the managers of journals such as PLOS ONE, indeed, is that their journals' referees, who are usually made up of previous authors, often seek to reject papers that they feel do not meet their own perceptions of "excellence," instead of focussing on the journal's formal criterion of "soundness". Many anecdotes from PLOS ONE authors, likewise, involve being surprised by how tough the refereeing process was for their articles, a response that signals relative "excellence" that might otherwise not be apparent to the reader (see especially Curry, 2012 and comments). The performance of "excellence", the signalling of relative superiority through an additional line on the CV, is still more important from a career perspective than the science itself: nobody gets tenure for publishing to arXiv, no matter how good the quality of their research. At least that appears to be what most tenure-track academics believe. And while reader attention or online conversation are gaining some currency as indicators of qualities valued in an article, the current discourse indicates that authors need to feel that they have cleared a higher bar than they in fact have.
In other words, initiatives like PLOS ONE will have truly succeeded in changing researchers' own bias towards (ultimately undemonstrable) "excellence" only when their rejection rate is seen to be less important than the evidence that controls are in place to ensure and encourage the recognition of "soundness".

Caveats and further work
The potential scope of the project of this article is huge, and we have only been able to touch on some of its aspects. We have focused on narratives and rhetoric and sought to bring evidence of how existing rhetorics are damaging. What we have not done, as a variety of both anonymous reviewers and non-anonymous commenters have noted, is address the power politics that underlie many of the structures that we are critiquing. Nor have we analysed the degree to which different actors within the system are able to enact change.
Understanding how the changes we propose in narrative and indeed culture can be achieved politically and institutionally is a much larger project, one on which others are already engaged and one that is critically important in the current political climate. Institutional change is challenging and slow. We hope that alongside the criticism, implicit and explicit, of some existing institutions, we have offered some routes forward to be investigated and explored.
We have also not undertaken a historical analysis. While we draw on literature from a range of periods, we have not addressed how and when our current narratives developed. While we would argue that these narratives have deep roots, we have neither the expertise nor the space to probe the history through which excellence rhetorics became institutionalized in their current forms. The differing registers and locations of excellence rhetorics over time (policing access to the right clubs, publication in the right journals, career success and contributions to institutional funding) are deserving of further study, which would additionally strengthen the political analysis.

Closing the loop: planning for cultural change
In this article, we have advanced an argument that "excellence" is not just unhelpful to realising the goals of research and research communities but actively pernicious. A narrative of scarcity combined with "excellence" as an interchange mechanism leads to concentration of resources and thence hypercompetition. Hypercompetition in turn leads to greater (we might even say more shameless, see Anderson et al., 2007; Fanelli, 2009; Tijdink et al., 2014; Chubb and Watermeyer, 2016) attempts to perform this "excellence", driving a circular conservatism and reification of existing power structures while harming rather than improving the qualities of the underlying activity.
We have also argued that, while many commentaries reviewed throughout this piece lay the blame for this at the feet of external actors (institutional administrators captured by neo-liberal ideologies, funders over-focussed on delivering measurable returns rather than positive change, governments obsessed with economic growth at the cost of social or community value), the roots of the problem in fact lie in the internal narratives of the academy and the nature of "excellence" and "quality" as supposedly shared concepts that researchers have developed into shields of their autonomy. The solution to such problems lies not in arguing for more resources for distribution via existing channels, as this will simply lead to further concentration and hypercompetition. Instead, we have argued, these internal narratives of the academy must be reformulated.
Finally, we have argued for a more pluralistic approach to the distribution of resources and credit. Where competition does take place it should do so on the basis of the many different qualities, plural, that are important to different communities using and creating research. But it should also be recognized that competition is not, in this context, an unalloyed good. In the context of assessing the risks of application of research, Stirling and others argue for "broadening out and opening up" the technology assessment process (Ely et al., 2014, see also Stilgoe, 2014), that is to say increasing both the set of criteria considered and the range of people who have a voice in its assessment and application. The same approach needs to be applied to research assessment itself.
This leads to our argument for a focus on redistribution instead of concentration, which, we suggest, is necessary for three core reasons. First, because "excellence" cannot be recognized or defined consensually, except as a Wittgensteinian "beetle in a box" that no one has ever seen, and even then, unlike Wittgenstein's beetle-owners, by researchers who cannot agree even within disciplinary communities on which aspects of "excellence" might matter or be useful. Second, because, as we have argued, there is a case to be made for redistribution on its own merits. Unlike concentration, and the hypercompetition to which it leads, which break down our standards and cultures in systematic, predictable, and negative ways, redistribution enhances capacity and breadth of participation. And third, because we have shown that top-loading of research funding based upon anti-foundational principles of "excellence" is likely to hurt the incremental advances upon which research implicitly relies.
The argument for redistribution is a challenging one to advance. The rhetorics of scarcity, of concentration and competition are linked to strong cultural and economic narratives, particularly in the United Kingdom and United States. But as a route towards this goal we have argued that it is possible to build upon existing narratives of "soundness", "credibility" and "capacity" (which is to say on narratives of reproducibility, transparency, high-quality reporting, and a breadth and diversity of activity) to make a case for strong cultural practices that focus on fundamental standards that define proper scholarly and scientific practice. This focus on the practice of research, including its communications, rather than the performance of success at research can also be aligned with developing narratives of Responsible Research and Innovation and public engagement. For instance, the approach of Post-Normal Science advocated by Funtowicz and Ravetz (1990, 2003) focuses on assessing the quality of the process of research practice, and emphasises the need to effectively communicate the weaknesses of any claims made on the basis of research.
In taking this approach we root the discourse in long-standing traditions and culture, while also engaging with the newer concerns. It is through showing that we can recognize sound and credible research and that we can build strong cultures and communities around that recognition, that we lay the groundwork for making the case for redistribution. And that would be excellent.

Notes
1 The name of the Matthew Effect is derived from Matthew 13:12: "For whosoever hath, to him shall be given, and he shall have more abundance: but whosoever hath not, from him shall be taken away even that he hath".
2 As noted in the disclosure of competing interests, three of the authors of this article have worked for PLOS previously.