Psychology looks to physics to solve replication crisis
Brian Nosek explains the potential for accelerating collaboration.
6 January 2020
In the summer of 1998, a group of psychologists left their labs at Yale University and headed to the beach. They pitched a tent and began running experiments.
Among them was Brian Nosek, a graduate student interested in the subconscious biases that affect our social interactions. The effects he was looking for are subtle. For the signal to cut through the noise, he needed thousands of research participants – far more than he could feasibly test on his own.
His fellow students and postdocs were facing a similar predicament, so they decided to consolidate their efforts. Together, they would run up to 300 beach-goers a day through their experiments, handing out lottery tickets and sodas as rewards.
“It was fun,” Nosek recalls. “We got great tans.”
The findings of his experiments were never published. But for Nosek, now a professor of psychology at the University of Virginia and executive director of the Center for Open Science in Charlottesville, Virginia, the experience was his first taste of team science and a lesson in the power of collaboration.
Traditionally, psychology researchers have worked individually or in small groups, often within the same lab. Most papers in the field’s top journals have five or fewer authors.
But as psychologists have begun to grapple with concerns about the reliability and validity of many of their key findings, the need for broader collaboration has become increasingly urgent.
Nosek draws an analogy with physics, where mutual dependence on shared research infrastructure has made massive collaboration the norm. He points to CERN and LIGO as the most obvious examples.
“It’s not that we need to all gather around a machine,” he says. “But to do the kind of science that we want to do, we need to gather our access to human populations. [We need] to be able to look at small effects, to look at how those effects might vary across culture and context, and to then see how robust and how generalizable they are.”
Like many in the field, Nosek identifies 2011 as a turning point. At conferences and in online discussions, concerns were growing about questionable research practices and failed replications. But there was, he says, a frustrating lack of hard evidence.
“Everyone was just passing around anecdotes,” he recalls. “I thought, ‘This is ridiculous. We’re scientists. We have to evaluate it.’ But I immediately knew that the only way we could do that was as a crowd source.”
In 2014, Nosek and 50 co-authors published the first Many Labs project, which attempted multiple replications of 13 key findings from psychology. A year later came the Reproducibility Project: Psychology, with 270 authors.
Many Labs projects 2 and 3 have followed, with 192 and 64 authors, respectively. Many Labs projects 4 and 5 were published as preprints in December and October 2019, respectively, and 2018 saw the first Many Analysts project, with 29 independent research teams interrogating the same dataset. Projects replicating key findings in infant psychology (Many Babies) and brain electrophysiology (Many EEGs) are also underway.
The latest such initiative, Systematizing Confidence in Open Research and Evidence (SCORE), is, as Nosek puts it, “all of these replication projects on steroids”.
Recently funded with up to US$7.6 million by the United States Defense Advanced Research Projects Agency, SCORE aims to provide credibility scores for studies in the social sciences using human experts and artificial intelligence. The scoring algorithm will then be validated via crowd-sourced replications.
Crowd-sourcing from within
The success of these various replication projects has, says Nosek, owed much to their transparency. “There has been no hidden agenda,” he says. “Everyone could see what we were doing and anyone who wanted to get involved could get involved.”
A limitation, however, is that each of these projects was driven by a small leadership group that set the research agenda. As psychology moves from large-scale replication studies to original research, and from one-off projects to longer-term, institutionalized collaborations, the community needs a stake in deciding which studies are performed.
This future, Nosek says, is exemplified by the Psychological Science Accelerator. Initiated in 2017 by Chris Chartier, an associate professor at Ashland University in Ohio, this “CERN for psychological science” is a standing network of 548 labs in 72 countries that are prepared to collaborate on large-scale psychology research projects.
Members are invited to submit proposals, which are then peer reviewed and rated by other Accelerator members, who indicate whether or not they would run the studies in their own labs. To date, six projects have been approved.
The first, a cross-cultural study of people’s judgement of faces, is currently under review as a Stage 2 Registered Report with the journal Nature Human Behaviour. Together, its 243 co-authors have collected data from almost 11,500 participants in 41 countries.
Nosek, an unofficial advisor to the Accelerator, has been following with interest. “They’ve done a very nice job of setting transparent standards, defining a process, making it inclusive, and really initiating something that is sustainable,” he says.
At the same time, he is conscious of the challenges ahead for initiatives like the Accelerator. Prominent among these is ensuring that researcher contributions are recognized and valued by hiring, promotion, and funding committees.
To this end, the Accelerator has adopted the CRediT (Contributor Roles Taxonomy) schema, which formalizes 14 different scientific roles and allows researchers to articulate their precise contribution to a project in a way that goes beyond authorship of the resultant paper.
Large scale vs small team
But for the collaborative model to really fulfil its promise, Nosek argues, psychology needs a more fundamental culture change.
It should be possible, he says, to build a career by specializing in different components of the research cycle; to play to individual strengths and build a reputation as, for example, a theorist, a data collector, or an analyst, without necessarily having to excel at everything.
There are, Nosek admits, important trade-offs between large-scale collaborations and the traditional, small-team approach. But ultimately, he sees the two approaches as complementary. The exhaustive testing of a single, well-articulated idea on a global scale needs to build upon smaller-scale testing of many different ideas by individual labs.
The evidence, he suggests, is that this division of labour is emerging organically. “If you look at who is contributing to these large projects, it’s not the faculty at Stanford and Harvard. They’ve got resources. They do productive research in the traditional model.”
Instead, he argues, the large collaborations are tapping into a latent pool of under-resourced talent – researchers who don’t have a grant or whose teaching and administration commitments leave insufficient time to conduct meaningful research on their own.
“We are massively growing contributorship to science,” Nosek says. “If we frame it that way, it’s not really a trade-off at all.”