Researchers often complain about the indicators that hiring and grant committees use to judge them. In the past ten years, initiatives such as the San Francisco Declaration on Research Assessment and the Leiden Manifesto have pushed universities to rethink how and when to use publications and citations to assess research and researchers.

The use of rankings to assess universities also needs a rethink. These league tables, which include the Academic Ranking of World Universities (ARWU) and the Times Higher Education World University Ranking (THE WUR), determine eligibility for scholarships and other income, and sway where scholars decide to work and study. Governments devise policies and divert funds to help institutions in their countries claw their way up these rankings. Researchers at many institutions, such as mine, miss out on opportunities because of where those institutions place.

Two years ago, the International Network of Research Management Societies (INORMS), a collective of research-management organizations, invited me to chair a new working group on research evaluation with members from a dozen countries. From our first meeting, we were unanimous about our top concern: the need for fairer and more responsible university rankings. When we drew up criteria for what such rankings would entail and rated the rankers against them, their shortcomings became clear.

This week, the Global Research Council, which includes heads of science- and engineering-funding agencies, is gathering experts online to discuss how assessments can improve research culture. This should include how university rankings are constructed and used.

The literature on research management is full of critiques of rankings. Rankings are methodologically challenged — often using inappropriate indicators such as counting Nobel-prizewinning alumni as a proxy for offering a quality education. They favour publications in English, and institutions that did well in past rankings. So, older, wealthier organizations in Europe and North America consistently top the charts. Rankings apply a combination of indicators that might not represent universities’ particular missions, and often overlook societal impact or teaching quality.

Nonetheless, they have become entrenched, with new rankers cropping up each year. As with the journal impact factor, students, faculty members and funders turn to rankings as a lazy proxy for quality, no matter the flaws. The consequences are all too real: talent deterred, income affected. And inequities quickly become embedded.

Our working group combed the literature to develop our criteria, and asked for feedback through various community discussion lists open to academics, research-support professionals and related groups. We synthesized the feedback into 20 principles covering good governance (such as declaring financial conflicts of interest), transparency (of aims, methods and data), measuring what matters (in line with a university’s mission) and rigour (indicators should be good proxies for what they claim to measure).

Then we converted these principles into a tool to assess rankings, qualitatively and quantitatively (see go.nature.com/2ioxhhoq). We recruited international specialists to assess six of the world’s highest-profile rankers, and invited rankers to self-assess. (Only one, CWTS Leiden, did so.) Richard Holmes, editor of the University Ranking Watch blog, calibrated the results, which we presented as profiles, not rankings.
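
As a purely illustrative sketch, and not a reproduction of the INORMS tool or its results, the snippet below shows how ratings against the four principle areas could be reported as a per-area profile rather than collapsed into a single league-table score. The rankers, scores and 0–3 scale are invented for the example.

```python
# Hypothetical sketch: report principle-area ratings as profiles, not a single rank.
# The principle areas come from the text; rankers and scores are made up.

AREAS = ["governance", "transparency", "measuring what matters", "rigour"]

# Illustrative expert ratings per ranker and area (0 = poor, 3 = good).
ratings = {
    "Ranker A": {"governance": 2, "transparency": 1,
                 "measuring what matters": 1, "rigour": 1},
    "Ranker B": {"governance": 3, "transparency": 3,
                 "measuring what matters": 2, "rigour": 2},
}

def profile(name: str) -> str:
    """Return a per-area profile line instead of collapsing to one number."""
    scores = ratings[name]
    return f"{name}: " + ", ".join(f"{area} {scores[area]}/3" for area in AREAS)

for ranker in ratings:
    print(profile(ranker))
```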

The rankings with the largest audiences (ARWU, the QS World University Rankings, THE WUR and the US News & World Report global ranking) were found most wanting, particularly in terms of ‘measuring what matters’ and ‘rigour’. None of these ‘flagship’ rankings considers open access, equality, diversity, sustainability or other society-focused agendas. None allows users to weight indicators to reflect a university’s mission. Yet all claim to identify the world’s best universities.
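
To see why fixed weights matter, here is a small, invented arithmetic example (the universities, indicators, scores and weights are all made up). Under the weights a ranker might impose, one institution comes out ahead; re-weighting to reflect a teaching-focused mission reverses the order.

```python
# Illustrative only: two made-up universities scored (0-100) on two made-up indicators.

indicators = {
    "Univ X": {"citations": 90, "teaching": 50},
    "Univ Y": {"citations": 60, "teaching": 85},
}

def composite(scores: dict, weights: dict) -> float:
    # Weighted sum of indicator scores; weights are assumed to sum to 1.
    return sum(weights[k] * scores[k] for k in weights)

fixed_weights = {"citations": 0.7, "teaching": 0.3}    # imposed by the ranker
mission_weights = {"citations": 0.3, "teaching": 0.7}  # teaching-led mission

for name, scores in indicators.items():
    print(name,
          round(composite(scores, fixed_weights), 1),
          round(composite(scores, mission_weights), 1))
```

Run as written, Univ X leads under the fixed weights (78.0 to 67.5) but trails under the mission weights (62.0 to 77.5), which is exactly the sensitivity that a single fixed-weight composite hides.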

Rankers might argue that our principles are unrealistic: that it is impossible to be completely fair in such evaluations, and that simple, overarching metrics have their place. I counter that we derived the principles from the community’s expectations of best practice, and that if rankers cannot meet them, perhaps they should stop ranking, or at least be honest about the inherent uncertainty in their conclusions (in our assessment, only CWTS Leiden attempted this).

Ultimately, rankers need to be made more accountable. I take heart from new expectations about how researchers are evaluated. From January 2021, UK research funder Wellcome will fund only organizations that present evidence that they conduct fair output assessments for researchers. Similarly, the European Commission’s ‘Towards 2030’ vision statement calls for higher education to move beyond current ranking systems for assessing university performance because they are limited and “overly simplistic”.

We hope that drawing attention to rankings’ weaknesses will attract allies to push for change, such as neutral, independent oversight and the standards of ethics and rigour that are applied to other aspects of academia.

Such pressure could lead to greater alignment between the world rankers’ approaches and the higher-education community’s expectations for fair and responsible rankings. It might also help users to wise up to rankings’ limitations, and to exercise due caution when using them for decision-making. Either would be progress.