Introduction

Scientific disinformation is the intentional spreading of misleading or outright false content that purports to have a basis in scientific methods and practices. Circulation of inaccurate scientific information can damage both institutions and individuals, further affecting the relationship of trust between science and society1. Successful misconceptions influence public debate on decisions regarding the effectiveness of a vaccine, the adoption of solutions mitigating climate change, or the cost of a social policy. A prime example of the detrimental effects of scientific disinformation comes from the use of ivermectin, an oral drug that has been widely used in several countries as a treatment against COVID-19, despite no evidence of clinical efficacy2. The sharing of false information is easily fuelled by political or social motivations that disregard the best scientific evidence on the matter.

There are structural challenges to fighting the spread of false or misleading information on social media. One key issue is that companies often perceive a trade-off between engaging users and monitoring viral but potentially fake content, to the point of favouring the former over the latter3. Countering disinformation is made even more difficult when there is a deliberate intent behind the dissemination. For example, at the peak of the coronavirus infodemic, only 16% of fact-checked disinformation was labelled as such by Facebook’s algorithms, partly because content creators were able to simply repost content with minor changes, thus escaping detection4. It is therefore essential that, in combination with systematic changes in policy, users themselves are empowered against malicious or false content. Lay evaluation of science-related disinformation is harder than that of other forms of disinformation (e.g. political) because in the former case the lines between expertise and pseudoexpertise are blurred, and incompetent or otherwise biased sources pose as experts on topics like epidemiology or climate change5.

Research on countering disinformation has developed substantially over the last decade, bringing a wealth of different approaches6,7,8,9,10. These include debunking, the systematic correction of false claims after they have been seen or heard11,12; pre-bunking, preventive measures applied before exposure to disinformation7,13; nudging, interventions that steer users’ choices without limiting their freedom of choice14; and boosting, the empowerment of users by fostering existing competences or instilling new ones14. All of the above approaches have proven useful in a social media context, not least through ingenious and innovative adaptations of classical paradigms. Debunking has been extensively studied, with several experiments focusing on the source15,16,17,18 and the timing19 of fact checking. Research has also explored whether evaluations of the quality of content and sources can be delegated to the so-called wisdom of crowds, with encouraging results20,21,22,23 (for a less optimistic perspective, see24,25). Studies on pre-bunking have largely focused on the concept of inoculation7,26, namely exposing users to disinformation strategies in order to ease their recognition in future settings. Inoculation has demonstrated pronounced and lasting effects when introduced through games27,28,29,30. Nudging has been tested by showing warning labels for unchecked or false claims31,32,33,34, and by priming users to pay attention to the accuracy of content they might be willing to share35,36,37 (however, see38 for a critique of this approach). Finally, boosting has been tested by presenting users with lists of news/media literacy tips or guidelines on how to evaluate information online39,40,41,42,43, producing some remarkable results and some non-significant ones.

A promising example of a media literacy intervention comes from researchers interested in understanding how fact checkers navigate information when evaluating unfamiliar sources44. The researchers catalogued fact checkers’ strategies and distilled them into an educational curriculum called Civic Online Reasoning45,46. In particular, fact checkers adopt two core strategies to avoid being biased in their search. The first strategy is lateral reading, namely leaving a website and opening new tabs along a horizontal axis in order to use the resources of the Internet to learn more about a site and its claims. The appearance of a website can be misleading about its reliability, so reading laterally helps to identify potential issues such as undisclosed interests or false credentials. The second core strategy is click restraint, that is, sifting through the results of a web search before clicking on any link. Given the well-documented tendency to open the first results without looking further47, order of appearance is vulnerable to manipulation: websites can improve their rank in the results to increase incoming traffic, a process called search engine optimisation. Restraint from clicking thus prompts users to explore the various sources to discern which ones are the most trustworthy. Lateral reading and click restraint seem particularly well suited when content has origins that are hard to identify or that appear legitimate on the surface, a feature that has been associated with content creators spreading scientific disinformation48.

In the absence of expertise and content knowledge, users can rely on a number of external cues to infer whether information presented as scientific is reliable49. Lateral reading and click restraint can thus be used when scientific disinformation is deceptively sophisticated and difficult to detect. Indeed, training on Civic Online Reasoning has proven effective in countering disinformation among high school and college students50,51,52, as well as elderly citizens53. Despite extensive research on Civic Online Reasoning, so far little attention has been paid to the application of these techniques on social media. It is therefore unclear how effective presenting these strategies on a social network can actually be.

Critical thinking strategies might not be the only potentially effective tools in evaluating scientific (dis)information. For instance, users might not be sufficiently motivated to evaluate the truthfulness of the content they see6,54,55. Many users might share news simply because it comes from a source they trust or like, or because it aligns with their values, without paying much attention to accuracy. The spread of scientific disinformation is thus not only related to false beliefs, but also to motivated behaviour, paired with strong personal identities and values. In order to better exploit the benefits of critical thinking tools, it is therefore also important to identify the effect of being motivated to know the truth about a given topic. It may be that people, while somewhat familiar with fact-checking techniques, are only willing to apply them when identifying the truthfulness of the information is reinforced by specific incentives.

One way to test the effect of motivation, then, is the use of monetary incentives. In other words, does paying participants for being accurate increase their accuracy in evaluating content? The idea behind this intervention is that money increases motivation, and thus the attention paid to otherwise ignored cues about the accuracy of content. A 40-year meta-analysis56 points out that both monetary incentives and intrinsic motivation predict performance, and that incentives are particularly relevant when they are directly tied to performance. Moreover, a study conducted in a setting comparable to the present experiment showed that monetary incentives are the main driver for people to spend time solving online tasks even in the face of small average earnings57.

Monetary incentives have proven to be a cost-effective tool for modifying behaviour in domains such as health and human development58, where an early boost in motivation often promotes the adoption of cheap preventive behaviours, thereby avoiding costly consequences59. From a psychological perspective, the use of incentives builds on the attention-based account of disinformation spread. This account posits that certain features of social networks favour the dissemination of interesting and unexpected content at the expense of accuracy6,60. Recent research in this field has found both laboratory and field evidence that the accuracy of content is often overlooked and that simple cues reminding participants to evaluate accuracy reduce their willingness to share fake news35,37,61,62,63 (or possibly increase true news sharing38). Increasing accuracy through incentives is not an entirely novel idea in social media either, as shown by a recent initiative promoted by Twitter64. Although these premises indicate that this type of intervention can be effective, it is not a given that economic incentives will have a positive effect on scientific content evaluation. In an experimental setting in particular, social media content is subject to higher scrutiny than when users scroll through their news feed35. It is therefore possible that additional incentives may not further increase participants’ accuracy.

The aim of the present study was to test and compare the effectiveness of Civic Online Reasoning techniques and monetary incentives in contributing to the recognition of science-related content on social media. We conducted two pre-registered experiments in which participants observed and interacted with one out of several Facebook posts that linked to an article presenting science-themed information. Participants were free to conduct further research on external websites in order to form a more accurate idea of the scientific validity of the post. Once satisfied with the information they had gathered, participants rated how scientifically valid the claims contained in the post were. To test the usefulness of Civic Online Reasoning techniques, we designed a pop-up, shown before the post, presenting the lateral reading and click restraint strategies (Fig. 1). The use of a pop-up ensured that participants processed the content before observing the post, an approach that has also been adopted in previous research62. A pop-up could easily be adapted to a social media setting as a regular reminder, with the necessary precautions to prevent its salience from declining over time65,66. To test the effect of monetary incentives, we instead doubled the participation fee (equivalent to an average +£8.40/hour) if participants correctly guessed the validity of the post they were evaluating.

Figure 1

Screenshot of the pop-up presented to participants.

Experiment 1

In Experiment 1, we tested separately the efficacy of the pop-up and of monetary incentives, and compared their effects to a control condition with no interventions. To assess whether the interventions are effective over the widest possible range of contexts, we used a set of 9 different Facebook posts varying in several properties, such as the scientific topic, the source reputation, and the source’s level of factual reporting. The original pre-registration of this experiment can be retrieved from osf.io/gsu9j.

Materials and methods

Ethics statement

All participants gave their written informed consent for participating in the experiment. The experimental protocols were approved by the Research Ethics Committee (CER) at the University of Paris (IRB No: 00012021-05), and all research was performed in accordance with the relevant guidelines and regulations.

Participants

We recruited 2700 U.K. residents through the online platform prolific.co on 11 March 2021 (for a rationale of the sample size, see S1 Methods). Average age was 36 (\(SD=13.5\), 8 not specified), 60.7% of participants were female (39.1% male, 0.2% other), and 55.6% had a Bachelor’s degree or higher. Although recruitment explicitly specified that the experiment was supported only on computers or laptops, 316 participants (11.7%) completed the experiment on a mobile device. As our hypotheses were based on the assumption that search would happen on a computer (where internet browsing makes lateral reading easy), neither stimuli nor measures were designed for mobile use. We therefore had to exclude these participants from the analyses. Analyses were thus conducted on 2384 participants.

Design

We conducted the experiment on Qualtrics and lab.js67. During the experiment, participants observed and were able to interact with one out of several Facebook-like posts (Fig. 2 shows three examples; an interactive example from Experiment 2 is available online). Participants’ task was to rate the scientific validity of the statements reported in the title, subtitle, and caption of the post (“how scientifically valid would you rate the information contained in the post?”; 6-point Likert scale from (1) “definitely invalid” to (6) “definitely valid”). The researchers independently rated the scientific validity of the posts’ content as valid/invalid according to pre-specified criteria (see S3 Methods). Participants could take as much time as they wanted to give their rating. Crucially, participants were also explicitly told that they were allowed to leave the study page before evaluating the post. After the rating, participants completed a questionnaire and were paid £0.70 for their time. Median completion time of the experiment was 5 minutes.

Experimental conditions Participants were randomly assigned to one of three experimental conditions: control, incentive, and pop-up. In the control condition, participants completed the task as described above. In the incentive condition, participants’ participation fee was doubled if their rating matched the one given by the experimenters. Unbeknownst to participants, the correctness of the answer depended only on whether it fell on the valid or invalid side of the scale, and not on its extremity (e.g. having answered 4 instead of 5), even though we selected unambiguously valid or invalid content. In the pop-up condition, presentation of the post was preceded by a pop-up (Fig. 1) presenting a list of Civic Online Reasoning techniques (e.g., lateral reading, click restraint) as tips to verify the information in the post.

Stimuli Each participant observed one out of nine possible Facebook posts (Fig. 2; see S1 File for a full list). Posts varied in terms of: (i) scientific validity of the content (i.e., six valid and three invalid posts, containing verified or debunked information, respectively; S3 Methods); (ii) topic (i.e., three on climate change, three on the coronavirus pandemic, three on health and nutrition); (iii) factual reporting of the source, based on ratings from mediabiasfactcheck.com (i.e., three high/very high versus six low/very low); (iv) source reputation, as measured in a screening survey (S4 Methods; three categories: trusted (2 posts), distrusted (4), unknown source (3)). Posts were balanced so that each topic had three posts: one from a source with high factual reporting displaying valid information, one from a source with low factual reporting displaying valid information, and one from a source with low factual reporting displaying invalid information.

Figure 2

Examples of the stimuli presented, varying in topic (Climate Change, Health and Nutrition, COVID-19), factual reporting (high, low, low), scientific validity (high, low, low), and source reputation (trusted, untrusted, unknown source).

We standardised emoji reactions across all posts to control for their influence. In addition, the post date and the numbers of reactions and shares were blurred. The rest of the post was accessible to the participant, who could click on different links to access the source Facebook page, the original article, and the Wikipedia page (if present). Text and images were taken from the article and are publicly available (original links: osf.io/ces8g for Experiment 1 and osf.io/xsr43 for Experiment 2). Captions were short statements of a scientific nature, i.e. facts or events pertaining to some scientific mechanism.

Measures

Accuracy We computed two measures of accuracy: correct guessing and accuracy score. Correct guessing is a dichotomous variable that tracks whether a participant gave a ’valid’ (vs. ’invalid’) rating when the post content was actually scientifically valid (vs. invalid). Accuracy score is instead a standardised measure ranging from zero to one, with 0 indicating an incorrect “1” or “6” validity rating, 0.2 indicating an incorrect “2” or “5” rating, 0.4 an incorrect “3” or “4” rating, 0.6 a correct “3” or “4” rating, 0.8 a correct “2” or “5” rating, and 1 a correct “1” or “6” rating. The accuracy score allows us to distinguish validity evaluations that are associated with different behaviours: for instance, not all participants would be willing to share content that they rated as 4 in terms of scientific validity. In addition, the accuracy score is statistically more powerful than correct guessing as it retains more response information68. We thus considered the accuracy score as our main index.
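
As a minimal sketch of this scoring scheme in R (the language of our analyses), the two measures could be computed as follows; function and variable names are ours, and we assume that ratings of 4–6 count as a ‘valid’ guess and 1–3 as ‘invalid’.

```r
# Sketch of the two accuracy measures (hypothetical names).
# 'rating' is the 1-6 validity rating; 'valid' is TRUE if the post is scientifically valid.
correct_guess <- function(rating, valid) {
  (rating >= 4) == valid                     # dichotomous correct guessing
}

accuracy_score <- function(rating, valid) {
  level <- abs(rating - 3.5) + 0.5           # 1 = "3"/"4", 2 = "2"/"5", 3 = "1"/"6"
  ifelse(correct_guess(rating, valid),
         0.4 + 0.2 * level,                  # correct:   0.6, 0.8, 1.0
         0.6 - 0.2 * level)                  # incorrect: 0.4, 0.2, 0.0
}

accuracy_score(6, valid = TRUE)   # 1.0 (correct and most extreme rating)
accuracy_score(5, valid = FALSE)  # 0.2 (incorrect "5" rating)
```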

Search behaviour During the evaluation of the post, we tracked participants’ behaviour on the study page. We measured the time spent both inside and outside the page, and a series of dummy variables tracking whether participants had clicked on any of the links present (e.g., Facebook page, article page, Wikipedia page). Based on these calculations we were able to estimate participants’ response times and search behaviour.

Civic online reasoning After having rated the scientific validity of the post, participants completed a questionnaire investigating the factors that could have influenced their choice. In order to test our hypotheses, we asked participants whether they engaged in lateral reading and click restraint. Participants were said to have used lateral reading if they reported having searched for information outside the study page (yes/no question), and if they specifically searched on a search engine among other destinations (multiple selection question). Participants were said to have used click restraint if they further reported looking beyond the first results suggested by the search engine (multiple choice question). Critically, questions were formulated in such a way as to avoid suggesting which answer to select, and thus reduce the influence of the experimenter.

Control measures In addition to measures of accuracy and civic online reasoning, we included a series of control measures in our analyses (S5 Methods). These included self-report measures of confidence in the validity rating, plausibility of the post content, subjective relevance of obtaining accurate information about the post, familiarity with the source, perceived trustworthiness of the source, subjective knowledge of the topic, trust in scientists, conspiratorial beliefs, and a scientific literacy test. In addition to responses to the questionnaire, we obtained information about participants from the recruiting platform, such as their level of education, socio-economic status, social media use, and belief in climate change.

Analyses

Statistical tests were conducted using base R69. We adopted the standard 5% significance level to test against the null hypotheses. All tests were two-tailed unless otherwise specified. Post-hoc tests and multiple comparisons were corrected using the Benjamini-Hochberg procedure, and 95% confidence intervals were also family-wise corrected. Non-parametric statistics were log-transformed for conciseness. For probability differences, the lower boundary indicates the 2.5% quantile of the effect of the target variable starting from the 2.5% quantile of the baseline probability estimate, whereas the upper boundary indicates the 97.5% quantile of the effect of the target variable starting from the 97.5% quantile of the baseline probability estimate. Given the small number of stimuli (\(N<10\)), we do not cluster errors by Facebook post in our regression analyses. Using random effects, however, yields comparable results in magnitude and statistical significance unless otherwise reported.
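
The Benjamini-Hochberg correction corresponds to base R’s p.adjust; as a small illustration (the p-values below are made up, not taken from our analyses):

```r
# Benjamini-Hochberg correction of a set of post-hoc p-values (illustrative values only)
p_raw <- c(0.004, 0.021, 0.310)
p.adjust(p_raw, method = "BH")
#> 0.0120 0.0315 0.3100
```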

Deviations from the pre-registered protocol

Although we tried to be as faithful as possible to the original pre-registered protocol, we made some changes which we report here:

  • Scientific validity labels: labels for 1 and 6 responses were changed from “completely” to “definitely” invalid/valid

  • Exclusion of mobile users: we anticipated that participants would access the experiment exclusively through a computer or laptop, and we explicitly defined this as a requirement for participating in the study. However, some participants did take part using a mobile device. For this reason we had to introduce an additional exclusion criterion, use of a mobile device (see Participants).

  • Effect of interventions on accuracy score: we report an ordinal logistic regression (Effect of interventions), originally listed as an exploratory analysis, in lieu of the pre-registered ANOVA test. We deemed it preferable to report a non-parametric test owing to the strong violation of normality of the dependent variable. The ANOVA analysis yields the same results; it is reported in S1 Analyses.

  • Effect of interventions on correct guessing: to test correct guessing, pre-registered analyses proposed the use of a probit regression. However, we chose to report the results of a logistic regression for ease of comparison with the other tests reported, considering that the two regressions yielded the same results.

With the exception of the above-mentioned deviations, we conducted our analyses as described in the original pre-registration.

Results

Participant randomisation was balanced across conditions (Chi squared test, \(\chi ^2(2)=0.016\), \(p=.99\)). Median time to evaluate the Facebook post was 33 seconds in the control condition (incentive condition: 45 seconds; pop-up condition: 35 seconds; minimum overall time: 2 seconds, maximum overall time: 40 minutes). In the pop-up condition, participants spent an additional median time of 11 seconds on the pop-up. On a scale from 1 to 6 (3.5 response at chance level), average accuracy score in the control condition was 4.35 (\(SD=1.20\); incentive condition 4.48, \(SD=1.32\); pop-up condition 4.35, \(SD=1.19\)). In the control condition, 78.2% of participants correctly guessed the scientific validity of the post (incentive condition: 80.1%; pop-up condition: 78.1%).

Effect of interventions

To test the effect of our interventions on accuracy, we adopted two tests, one for the accuracy scores and one for correct guessing (original pre-registered analyses are presented in S1 Analyses; see also Deviations from the pre-registered protocol). Since accuracy scores were clearly non-normally distributed (Shapiro-Wilk test, all \(p<0.001\)), we used an ordinal logistic regression in place of the linear regression to test the effect of condition on accuracy scores. Results showed a significant effect of incentive (\(\beta =0.293\) [0.092, 0.494], \(z=3.225\), \(p=0.003\)) and a lack of significance for the pop-up (\(\beta =-0.009\) \([-0.207, 0.188]\), \(z=-0.103\), \(p=.918\)). According to the model, the probability of giving a “definitely valid” (“definitely invalid”) correct response increases by 4.4% [1.5%, 8.2%] in the incentive condition compared to the control condition. Exploratory analyses suggest that incentives were particularly effective in increasing accuracy scores for valid posts (against control: \(\beta =0.3582\) [0.07329, 0.6431], \(z=3.268\), \(p=0.003\); against pop-up: \(\beta =0.3713\) [0.0913, 0.6514], \(z=3.447\), \(p=0.003\); S5 Analyses). These last results should be taken with caution, however, as posts from trusted sources all presented valid content.
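
A minimal sketch of such an ordinal model in R, using polr from the MASS package (one common choice for proportional-odds regression; the data frame and column names here are hypothetical, not those of our analysis scripts):

```r
library(MASS)  # polr() for proportional-odds (ordinal) logistic regression

# 'd' is a hypothetical data frame with one row per participant:
#   acc_score : accuracy score as an ordered factor (0, 0.2, ..., 1)
#   condition : factor with levels "control", "incentive", "popup"
d$acc_score <- ordered(d$acc_score)
d$condition <- relevel(factor(d$condition), ref = "control")

fit_ord <- polr(acc_score ~ condition, data = d, Hess = TRUE)
summary(fit_ord)    # coefficients on the log-odds scale with standard errors
exp(coef(fit_ord))  # odds ratios for incentive and pop-up vs. control
```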

Technique adoption

To compare the adoption of Civic Online Reasoning techniques between experimental conditions (pre-registered hypothesis 2) we used a logistic regression with technique use (adoption of both lateral reading and click restraint) as predicted variable and experimental condition as predictor. Results revealed that both incentive and pop-up increased technique adoption (Fig. 3; incentive: \(\beta =1.042\) [0.527, 1.556], \(z=4.728\), \(p<0.001\); pop-up: \(\beta =1.556\) [1.065, 2.046], \(z=7.405\), \(p<0.001\)), but that the increase was markedly higher with the presence of the pop-up than with monetary incentives (\(\beta =0.514\) [0.157, 0.871], \(z=3.362\), \(p<0.001\)).
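
A hedged sketch of this test in R (hypothetical variable names; technique is coded 1 when a participant reported both lateral reading and click restraint):

```r
# Logistic regression of technique adoption on experimental condition
fit_tech <- glm(technique ~ condition, data = d, family = binomial)
summary(fit_tech)   # incentive and pop-up each contrasted against control

# Pop-up vs. incentive contrast: refit with "incentive" as the reference level
d2 <- transform(d, condition = relevel(condition, ref = "incentive"))
fit_tech2 <- update(fit_tech, data = d2)
summary(fit_tech2)
```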

Figure 3

Race chart of self-reported external search behaviour. Bars indicate the proportion of participants in each experimental condition reporting having searched in each category of websites. Lateral reading is identified with the proportion of participants searching for information on a search engine (light red), whereas click restraint is the subset of these participants who reported not stopping at the first algorithmically-ranked results of the search (dark red).

Exploratory: technique adoption

Since our measure of technique use is based on self-report, responses might have been biased by external expectations. We therefore checked whether participants who reported the use of techniques actually left the study page by tracking their behaviour on the post’s web page. According to our measures, 80% of these participants left the study page in the control condition, compared to 87% in the pop-up and 90% in the incentive condition. This result, if anything, suggests that our interventions did not increase the rate of false reporting. Moreover, even after accounting for false reports, results did not differ (incentive: \(\beta =1.156\) [0.594, 1.719], \(z=4.791\), \(p<0.001\); pop-up: \(\beta =1.626\) [1.087, 2.166], \(z=7.024\), \(p<0.001\); pop-up > incentive: \(\beta =0.467\) [0.095, 0.845], \(z=2.920\), \(p=0.004\); see sections S2 Analyses and S6 Analyses for an in-depth exploration of participants’ search behaviour).

Did the use of lateral reading and click restraint actually improve post evaluation? And did the use of techniques mediate the effect of our interventions? To test the first question, we ran an ordinal logistic regression with accuracy score as predicted variable, and a standard logistic regression with correct guessing as predicted variable, both with adoption of techniques as the sole predictor. Results showed that accuracy score improved significantly if a participant reported using Civic Online Reasoning techniques (\(\beta =0.526\) [0.274, 0.778], \(z=4.090\), \(p<0.001\)). According to the model, the use of Civic Online Reasoning techniques increased the probability of giving a “definitely valid” (“definitely invalid”) correct response by 8.8% [4.0%, 14.7%]. This result, however, was not confirmed by the standard logistic regression on correct guessing, which found no significant effect of technique adoption (\(\beta =0.219\) \([-0.121,0.580]\), \(z=1.228\), \(p=0.220\)).

Based on these results, we proceeded to test whether pop-up and incentives had some mediated impact on accuracy score through technique adoption. To test mediation we used the R package MarginalMediation70. Technique adoption was found to mediate the effect of both incentive and pop-up on accuracy score (incentive: unstandardised \(\beta =0.004\) [0.001, 0.006], \(z=4.728\), \(p<0.001\); pop-up: unstandardised \(\beta =0.007\) [0.003, 0.012], \(z=7.405\), \(p<0.001\)). Although testing for a single mediator cannot rule out countless other explanatory variables, this analysis suggests an indirect relation between both interventions and accuracy scores.
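
The reported analysis relies on the MarginalMediation package; purely as an illustration of the underlying product-of-coefficients logic (not of that package’s interface), an indirect effect with a bootstrap confidence interval could be sketched in base R as follows, here using the dichotomous correct-guessing outcome for simplicity and entirely hypothetical variable names:

```r
# Indirect effect of the pop-up on correct guessing via technique adoption
# (product-of-coefficients sketch; the published analysis used MarginalMediation)
indirect_popup <- function(data) {
  a <- coef(glm(technique ~ condition, family = binomial, data = data))["conditionpopup"]
  b <- coef(glm(correct ~ technique + condition, family = binomial, data = data))["technique"]
  unname(a * b)   # indirect path on the log-odds scale
}

set.seed(1)
boot <- replicate(2000, indirect_popup(d[sample(nrow(d), replace = TRUE), ]))
quantile(boot, c(0.025, 0.975))   # percentile bootstrap confidence interval
```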

Exploratory: response times

As we expected monetary incentives to increase motivation, we tested whether response times (a common proxy for increased deliberation and attention) were affected by our interventions. We compared participants’ evaluation time of the post (excluding the time spent on the pop-up) across conditions by way of a Kruskal-Wallis rank sum test. The test was significant (\(\chi ^2(2)=67.63\), \(p<0.001\)), thus we conducted post hoc comparisons. All comparisons were significant, with participants in the incentive condition taking significantly more time than control (\(\log (V)=8.02\), \(p<0.001\)) and pop-up (\(\log (V)=5.54\), \(p<0.001\)) participants, and pop-up participants taking more time than control (\(\log (V)=2.41\), \(p=0.016\)).
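
In R, this corresponds roughly to an omnibus Kruskal-Wallis test followed by BH-corrected pairwise Wilcoxon comparisons (variable names hypothetical):

```r
# Omnibus test of evaluation time across the three conditions
kruskal.test(eval_time ~ condition, data = d)

# Post hoc pairwise comparisons, Benjamini-Hochberg corrected
pairwise.wilcox.test(d$eval_time, d$condition, p.adjust.method = "BH")
```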

We tested whether longer evaluation times predicted higher accuracy scores by means of an ordinal logistic regression with log-transformed evaluation time as predictor and accuracy score as predicted variable. Results revealed a significant and positive association (\(\beta =0.182\) [0.095, 0.268], \(z=4.12\), \(p<0.001\)). The result was confirmed also for correct guessing (logistic regression, \(\beta =0.242\) [0.120, 0.366], \(z=3.87\), \(p<0.001\)).

We additionally looked at how much time participants spent outside the study page when they left without clicking any link (a proxy of lateral reading). The Kruskal-Wallis test was again significant (\(\chi ^2(2)=13.482\), \(p=0.001\)): of those participants who performed such external searches, control participants spent less time outside the page than participants in both the incentive (\(\log (V)=2.85\), \(p=0.006\)) and the pop-up conditions (\(\log (V)=3.58\), \(p=0.001\)), whereas we found no significant difference between incentive and pop-up (\(\log (V)=.92\), \(p=0.360\)).

Exploratory: source reputation

Civic Online Reasoning techniques were originally designed to help evaluate content from seemingly legitimate but unknown websites44. We thus analysed differences in the effects of our interventions based on the recognisability and perceived trustworthiness of the posts’ sources. The importance of a source’s perceived trustworthiness was exemplified by two posts covering the same scientific article, one from BBC News (a source trusted by most participants) and another from the Daily Mail (a source barely trusted by most participants). Although the posts covered the same content and presented similar wording, participants’ evaluations of the two posts differed considerably: the average accuracy score was 4.7 for the BBC piece (\(SD=1.05\)) and 4.05 for the Daily Mail piece (\(SD=1.08\); ordinal regression: \(\beta =1.255\) [.926, 1.584], \(z=7.470\), \(p<0.001\)), and the proportion of correct guesses was 90.7% and 77.3%, respectively (logistic regression: \(\beta =1.059\) [0.568, 1.576], \(z=4.132\), \(p<0.001\)).

Perhaps not surprisingly, we observed that, in the pop-up condition, adoption of lateral reading and click restraint was strongly linked with source type (Chi squared test with technique adoption and source category as variables, \(\chi ^2(2)=15.407\), \(p<0.001\)): when the source was trusted, only 6.7% of participants used these techniques, whereas the proportion was 20% when the source was unknown. We then tested differences in the effects of the interventions on accuracy scores and correct guessing by source type. Likelihood-ratio tests confirmed the importance of this variable for both analyses (\(p<0.001\)); however, family-wise corrected contrasts revealed only one significant result, the effect of incentive on accuracy scores for unknown sources (\(\beta =0.558\) [0.114, 1.001], \(z=3.445\), \(p=0.005\); Fig. 4; see S4 Analyses for results about the uncorrected contrasts).
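
A sketch of the association test between source category and technique adoption (hypothetical names; restricted to participants in the pop-up condition):

```r
# Technique adoption by source category (trusted, distrusted, unknown) in the pop-up condition
d_popup <- subset(d, condition == "popup")
tab <- table(d_popup$source_category, d_popup$technique)
chisq.test(tab)   # 3 x 2 table, hence 2 degrees of freedom
```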

Figure 4

Bootstrap estimates of the average accuracy score by experimental condition and source reputation (Min. 1, Max. 6, random response: 3.5). Asterisks refer to significance of contrasts in the ordinal logistic regression. Black: family-wise corrected contrasts; dark grey: uncorrected contrasts. *\(p<0.05\), **\(p<0.01\), ***\(p<0.001\).

Discussion

Results from Experiment 1 suggest that paying participants to be accurate increases the accuracy score but not the proportion of participants correctly guessing the scientific validity of the posts. Exploratory analyses suggest that, compared to control, participants with an incentive gave more extreme answers, reported engaging in Civic Online Reasoning techniques more often (and did leave the page more often), spent more time searching for information outside the study page, and took longer to evaluate the post (even compared to pop-up participants). These results support the idea that monetary incentives affect accuracy, possibly by increasing motivation and attention in the task, although this hypothesis would need further testing.

By contrast, the presence of the pop-up did not seem to directly affect any indicator of accuracy. In spite of that, participants in the pop-up condition reported more lateral reading and click restraint, and searched outside the study page more frequently. In turn, this increase in the use of Civic Online Reasoning techniques (up to +13.5% when the source is unknown) seems to mediate a small but significant increase in accuracy scores (exploratory marginal mediation analysis), suggesting an indirect effect of the pop-up. A pop-up effect is possibly visible for posts produced by unknown sources, where correct guessing (but not accuracy scores) is slightly higher in the pop-up condition than in control (S4 Analyses).

These results suggest that monetary incentives might have more consistent effects than the presentation of Civic Online Reasoning techniques. At the same time, we observe considerable variability in participants’ behaviour depending on specific features of the posts. For instance, source reputation seems to have a remarkable effect on the adoption of Civic Online Reasoning techniques, which were (foreseeably) overlooked by almost all participants when looking at posts from generally trusted sources.

One potential takeaway from these findings is that prior beliefs (e.g. familiarity with and opinion of the source) might affect both the rate at which participants look for information outside the content provided and the way they look for such information. To explore this possibility, we designed a second experiment in which we tried to reduce the influence of prior beliefs by presenting posts from generally unknown sources. Lack of source knowledge is indeed common on social media (e.g., sponsored content), and it should arguably increase the rate at which participants rely on external information. In addition, we included a fourth condition to test the combination of monetary incentives and Civic Online Reasoning techniques, and to explore whether and how the two interact.

Experiment 2

In line with evidence in the literature, we expected an increased impact of our interventions in a context where participants could rely on less prior information. We thus conducted a second experiment that was statistically powered to test for this possibility. Experiment 2 replicated the format of the first experiment, with two main modifications: 1) we ran a pre-screening survey to identify lesser-known sources of information and only used those sources as the basis for the Facebook posts the participants were asked to evaluate; 2) we added an experimental condition that included both the incentive and pop-up interventions, to test the interaction between the two. We advanced the idea that the two intervention strategies might trigger distinct behavioural outcomes (i.e., increased time spent on the task and use of Civic Online Reasoning). If this is the case, then combining the two interventions should produce even stronger effects on accuracy. The original pre-registration of this experiment can be retrieved from osf.io/w9vfb.

Materials and methods

Ethics statement

All participants gave their written informed consent for participating in the experiment. The experimental protocols were approved by the Research Ethics Committee (CER) at the University of Paris (IRB No: 00012021-05), and all research was performed in accordance with the relevant guidelines and regulations.

Participants

We recruited 3004 U.K. residents through the online platform prolific.co on 24 May 2021 (for a rationale of the sample size, see S2 Methods). All participants gave their informed consent for participating in the experiment. Average age was 36 (\(SD=13.2\), 6 not specified), 63.1% of participants were female (36.7% male, 0.2% other), and 59.4% had a Bachelor’s degree or higher. Per our pre-registered criteria, we excluded one participant who was not a resident in the United Kingdom. Analyses were thus conducted on 3003 participants.

Design

The major difference from the first experiment was that sources of the Facebook posts were unknown to most participants. In addition, we included a fourth condition where we gave participants a monetary incentive and also showed them the pop-up with the Civic Online Reasoning techniques. Thus, the experiment had a between-subjects design with 2 factors, pop-up (present, absent) and monetary incentive (present, absent). Median completion time of the experiment was 5 minutes.

Stimuli

Participants observed one out of 6 posts that varied in terms of: the scientific validity of the content, i.e. the validity of the scientific statements in the title, subtitle, and caption of the post; the topic (climate change, coronavirus pandemic, and health and nutrition); factual reporting of the source, based on ratings from mediabiasfactcheck.com (3 high/very high versus 3 low/very low). All posts came from sources relatively unknown to participants, as measured in a preliminary survey and confirmed by participants’ familiarity ratings. There were two distinct posts for each topic, one from a source with high factual reporting displaying valid information, one from a source with low factual reporting displaying invalid information.

Some titles, subtitles and captions of the posts included references to governmental or academic institutions. To prevent these references from affecting the evaluation of the content, we slightly rephrased some sentences to remove this information. In addition, we also corrected grammatical mistakes in the text that could have given away the reliability of the source.

Adherence to pre-registration

We conducted our analyses as described in the original pre-registration, but some of the results for the pre-registered hypotheses are presented in the Supplementary materials. Results for pre-registered hypothesis 1 are presented in two forms: in S7 Analyses in the original formulation, and in the main text as a logistic regression (Technique adoption). The result for pre-registered hypothesis 6 is instead presented in S7 Analyses.

Results

Participant randomisation was balanced across conditions (Chi squared test, \(\chi ^2(1)=0.409\), \(p=0.52\)); the average N per post, per condition was 125 (minimum 106, maximum 146). Median time to evaluate the Facebook post was 33 seconds in the control condition, 48 seconds in the incentive condition, 34 seconds in the pop-up condition, and 58 seconds in the incentive + pop-up condition (minimum overall time: 2.5 seconds, maximum overall time: 22 minutes). When the pop-up was present, participants spent an additional median time of 11 seconds on the pop-up. On a scale from 1 to 6 (3.5 response at chance level), average accuracy score in the control condition was 3.96 (\(SD=1.33\); incentive condition: 4.20, \(SD=1.41\); pop-up condition: 4.07, \(SD=1.33\); incentive + pop-up: 4.29, \(SD=1.44\); Fig. 5). In the control condition, 64.6% of participants correctly guessed the scientific validity of the post (incentive condition: 71.2%; pop-up condition: 66.2%; incentive + pop-up: 72.9%). Overall performance was generally lower than in Experiment 1, most likely because the use of relatively unknown news sources forced participants to evaluate content without relying on source knowledge.

Effect of interventions

To test the individual and combined effects of pop-up tips and monetary incentives (pre-registered hypotheses 3, 4, and 5) we conducted two tests, one for each accuracy index. For accuracy scores, we used two ordinal logistic regression models, one with pop-up and monetary incentive as predictors, and another including the same variables plus the interaction between pop-up and incentive as an additional predictor. For correct guessing, we compared two logistic regressions, one with correct guessing as dependent variable and pop-up and monetary incentive as predictors, and another including the same variables plus the interaction between pop-up and incentive as an additional predictor. For both indices, we then adopted the model that fitted the data best according to a likelihood-ratio test. Perhaps surprisingly, model comparison favoured the models without the interaction term (accuracy score: \(\chi ^2(1)=0.032\), \(p=0.858\); correct guessing: \(\chi ^2(1)=0.007\), \(p=.931\)); we thus tested the effects of incentives and pop-up assuming that they are (approximately) orthogonal. Results revealed a significant effect of incentive on both accuracy scores (\(\beta =0.350\) [0.194, 0.505], \(z=5.371\), \(p<0.001\)) and correct guessing (\(\beta =0.313\) [0.124, 0.501], \(z=3.954\), \(p<0.001\)), and a significant effect of pop-up on accuracy scores (\(\beta =0.137\) \([-0.018, 0.292]\), \(z=2.115\), \(p=0.034\); mixed-effects regression with errors clustered by post: \(p=0.052\)), but not on correct guessing (\(\beta =0.076\) \([-0.112, 0.265]\), \(z=0.966\), \(p=0.334\)). In addition, we found that the combination of the two interventions significantly increased both accuracy indices compared to control (accuracy score: \(\beta =0.487\) [0.268, 0.705], \(z=5.315\), \(p<0.001\); correct guessing: \(\beta =0.389\) [0.123, 0.654], \(z=3.496\), \(p<0.001\)), and that the contribution of incentive was greater than the contribution of pop-up (accuracy score: \(\beta =0.213\) \([-0.007,0.432]\), \(z=2.307\), \(p=0.028\); correct guessing: \(\beta =0.2362\) \([-0.032, 0.504]\), \(z=2.103\), \(p=0.047\)). According to the ordinal logistic regression model, the combination of the two interventions led to a 10.4% [5.4%, 14.2%] increase in correct guessing, and a 6.9% [2.8%, 12.4%] increase in “definitely” correct responses compared to control.
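
A sketch of this model-comparison step (hypothetical names; popup and incentive coded as two-level factors, polr from MASS assumed for the ordinal models as above):

```r
library(MASS)

# Accuracy score (ordinal): main-effects model vs. model with interaction
m_main <- polr(acc_score ~ popup + incentive, data = d, Hess = TRUE)
m_int  <- polr(acc_score ~ popup * incentive, data = d, Hess = TRUE)
anova(m_main, m_int)                 # likelihood-ratio test of the interaction

# Correct guessing (logistic): same comparison
g_main <- glm(correct ~ popup + incentive, family = binomial, data = d)
g_int  <- glm(correct ~ popup * incentive, family = binomial, data = d)
anova(g_main, g_int, test = "Chisq")
```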

Figure 5

Bootstrap estimates of the average accuracy score by experimental condition (Min. 1, Max. 6, random response: 3.5). Asterisks refer to significance of contrasts in the ordinal logistic regression. *\(p<0.05\), **\(p<0.01\), ***\(p<0.001\).

Technique adoption

We tested whether technique adoption was influenced by either intervention following a similar procedure to our test for correct guessing (comparison of two logistic regressions with/without interaction; pre-registered hypothesis 1). Likelihood-ratio tests again favoured the model without the interaction (\(\chi ^2(1)=0.245\), \(p=0.621\)). Model contrasts revealed several significant differences (Fig. 6): both incentive (\(\beta =0.725\) [0.471, .978], \(z=6.829\), \(p<0.001\)) and pop-up (\(\beta =1.191\) [.926, 1.455], \(z=10.736\), \(p<0.001\)) significantly increased the use of Civic Online Reasoning techniques, but the pop-up effect was significantly stronger than the effect of the incentive (\(\beta =0.466\) [0.106, 0.826], \(z=3.093\), \(p=0.002\)). In addition, the combined effect of pop-up and incentive was also significant (\(\beta =1.915\) [1.542, 2.288], \(z=12.263\), \(p<0.001\)), leading to an estimated 16.5% [8.6%, 26.0%] increase in technique use compared to control.

To test the robustness of these findings, we checked, as in Experiment 1, the rate of false reporting (i.e., participants who said they used fact-checking techniques while they did not even leave the study page). False reporting was 22.2% in the control condition, 16% in the pop-up condition, 15.3% in the incentive condition, and 12.8% in the condition with both interventions. Exploratory analyses suggest that results did not differ after accounting for false reporting (pop-up: \(\beta =1.210\) [.924, 1.496], \(z=10.094\), \(p<0.001\); incentive: \(\beta =0.761\) [0.488, 1.033], \(z=6.669\), \(p<0.001\); pop-up>incentive: \(\beta =0.449\) [0.061, 0.838], \(z=2.759\), \(p=0.006\); pop-up + incentive: \(\beta =1.971\) [1.570, 2.372], \(z=11.729\), \(p<0.001\); see S8 Analyses for an exploration of participants’ search behaviour).

Figure 6

Race chart of self-reported external search behaviour. Bars indicate the proportion of participants in each experimental condition reporting having searched in each category of websites. Lateral reading is identified with the proportion of participants searching for information on a search engine (light red), whereas click restraint is the subset of these participants who reported not stopping at the first algorithmically-ranked results of the search (dark red).

To test whether participants who adopted civic online reasoning techniques performed better in the task (pre-registered hypothesis 2) we ran two tests, one for each accuracy index. For accuracy scores, since they were non-normally distributed (Shapiro-Wilk test, all \(p<0.001\)), we used an ordinal logistic regression model with accuracy score as dependent variable and adoption of techniques as a dummy predictor variable. For correct guessing, we used a logistic regression with correct guessing as dependent variable and adoption of techniques as a dummy predictor variable. According to the models, participants adopting Civic Online Reasoning techniques were more accurate in terms of both accuracy score (\(\beta =0.591\) [0.414, 0.767], \(z=6.560\), \(p<0.001\)) and correct guessing (\(\beta =0.506\) [0.281, 0.738], \(z=4.345\), \(p<0.001\)). According to the ordinal regression model, technique adoption increased the probability of giving a “definitely valid” (“definitely invalid”) correct response by 9.5% [5.9%, 13.7%].

We also tested whether the use of Civic Online Reasoning techniques mediated the effect of the interventions with two marginal mediation analyses on accuracy score and correct guessing. Technique adoption was found to mediate the effect of both incentive and pop-up on accuracy score (incentive: unstandardised \(\beta =0.007\) [0.003, 0.010], \(z=6.829\), \(p<0.001\); pop-up: unstandardised \(\beta =0.011\) [0.006, 0.015], \(z=10.736\), \(p<0.001\)) and correct guessing (incentive: unstandardised \(\beta =0.008\) [0.004, 0.013], \(z=6.829\), \(p<0.001\); pop-up: unstandardised \(\beta =0.014\) [0.007, 0.021], \(z=10.736\), \(p<0.001\)).

Exploratory: response times

We compared participants’ evaluation time of the post across conditions using linear regressions with rank-transformed time as dependent variable and pop-up and incentive as predictors, with and without interaction. Again, model comparison favoured the model without the interaction (\(F(1)=1.104\), \(p=0.293\)). All contrasts were significant: both incentive (\(\beta =370\) [297, 443], \(t(2928)=12.127\), \(p<0.001\)) and pop-up (\(\beta =61\) \([-12,134]\), \(t(2928)=2.011\), \(p=0.044\)) increased evaluation times; however, incentive did so to a greater extent (\(\beta =309\) [205, 413], \(t(2928)=7.105\), \(p<0.001\)). Also, the combination of incentive and pop-up led to higher evaluation times than control (\(\beta =431\) [329, 534], \(t(2928)=10.070\), \(p<0.001\)). We tested whether longer evaluation times were associated with higher accuracy scores by means of an ordinal logistic regression with log-transformed evaluation time as predictor and accuracy score as predicted variable. Results revealed a significant and positive association (\(\beta =0.152\) [0.081, 0.223], \(z=4.22\), \(p<0.001\)). The result was confirmed also for correct guessing (logistic regression, \(\beta =0.204\) [0.117, 0.292], \(z=4.56\), \(p<0.001\)). We also compared the duration of non-click external searches across conditions with the same procedure as for total evaluation times, again finding no interaction between interventions (\(F(1)=0.1746\), \(p=0.676\)). Results showed a significant effect of incentive (\(\beta =52\) [15, 90], \(t(726)=3.355\), \(p=0.001\)), pop-up (\(\beta =80\) [43, 116], \(t(726)=5.170\), \(p<0.001\)), and their combination (\(\beta =132\) [80, 184], \(t(726)=6.100\), \(p<0.001\)), but found no significant difference between the two interventions (\(\beta =27\) \([-26,80]\), \(t(726)=1.217\), \(p=0.224\)).
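
A sketch of the rank-based comparison of evaluation times (hypothetical names; eval_time in seconds):

```r
# Rank-transform evaluation times, then compare models with and without the interaction
d$rank_time <- rank(d$eval_time)
t_main <- lm(rank_time ~ popup + incentive, data = d)
t_int  <- lm(rank_time ~ popup * incentive, data = d)
anova(t_main, t_int)    # F test for the interaction term
summary(t_main)         # effects of incentive and pop-up on ranked evaluation time
```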

Discussion

Results from Experiment 2 confirmed the effectiveness of monetary incentives on accuracy and provided evidence for the potential usefulness of fact-checking tips when the post’s source is unknown. Monetary incentives increased both accuracy scores and correct guessing (pre-registered hypothesis 4), the rate of (self-reported) Civic Online Reasoning techniques, and the frequency and duration of non-link searches outside the study page. Participants offered a monetary incentive spent more time evaluating the post than those who were not. Lastly, incentives seemed to increase sharing intentions for valid information compared to control (S11 Analyses).

Contrary to Experiment 1, the pop-up intervention seemed to increase accuracy scores, but not correct guessing (pre-registered hypothesis 3). We observed that the presence of the pop-up dramatically increased technique adoption (even compared to the presence of incentives; pre-registered hypothesis 1) and the rate of non-link external searches, which in turn were linked to an increase in both measures of accuracy (pre-registered hypothesis 2). Exploratory marginal mediation analyses confirm an indirect effect of the pop-up on accuracy measures via an increase in search outside the post page.

In this experiment, we also tested the interaction between incentive and pop-up (pre-registered hypothesis 5). Model comparison showed no interaction between the two interventions, suggesting that the pop-up and monetary incentives contributed separately to the increase in accuracy. We additionally observed that monetary incentives increased the time participants spent reading the pop-up: the median time was 12.3 seconds with an incentive compared to 9.6 seconds when the incentive was absent (pre-registered hypothesis 6, S7 Analyses). Despite this increase in reading times, our statistical tests did not detect an increased pop-up effect on any other metric.

General discussion

In this research, we studied whether presenting fact-checking tips and monetary incentives increases the correct evaluation of science-themed Facebook posts. In two experiments, participants rated the scientific validity of the content of one out of several posts, with some participants receiving a monetary reward when they responded correctly and other participants being shown a pop-up window (superimposed on the Facebook post itself) that contained a list of fact-checking techniques proposed in the literature (Civic Online Reasoning). Results showed that monetary incentives work as an accuracy booster. Moreover, data on search times and the extremity of validity ratings corroborated the hypothesis that incentives operate by increasing motivation and, subsequently, attention to the content and other features of the post. This effect is particularly remarkable given the strong benchmark against which it was compared: a control condition where we simply asked participants to assess the scientific validity of the content of the post. In fact, just by reading the instructions, participants in the control condition likely exerted a greater degree of attention than when routinely browsing social media35. The effectiveness of the pop-up as a way of introducing participants to fact-checking techniques received support in cases where the source of the post was relatively unknown, i.e. when participants could rely on little prior information to evaluate the posts. Furthermore, given that the presence of the pop-up significantly increases the adoption of Civic Online Reasoning techniques, and that the use of these techniques is, in turn, a strong predictor of participants’ performance on the task, marginal mediation analyses support the hypothesis that the pop-up may have an indirect positive effect on performance.

One of the original aims of this study was to establish whether incentives and techniques could be compared in their effectiveness in improving the evaluation of scientific content, even when that content is not directly assessable without technical expertise. In this respect, our results suggest that the presence of the pop-up has less impact on subsequent evaluation than monetary incentives. We suspect that the effectiveness of fact-checking advice may be hampered by several factors. A first explanation is that adoption of the techniques might not have been sufficient to avoid the influence of previous beliefs about the content or of habitual search styles. For example, if participants considered a content plausible in the first place, they might have selectively ignored conflicting information even when it was clearly present in the search results (i.e. confirmation bias71); similarly, if a participant relied primarily on certain sources of information, consulting these sources might have steered the interpretation in the wrong direction. It is unclear, however, how such biases might have meaningfully reduced the effectiveness of the pop-up but not of the monetary incentives. A second possibility is that participants did not engage in click restraint and instead relied only on the first few sources favoured by ranking algorithms, at the risk of not getting enough contextual information to make a correct assessment. Although we cannot be certain that more extensive searches lead to more reliable information sources, we argue that they facilitate a more balanced evaluation of the information available72. Lastly, the reduced impact of the pop-up may derive from its brevity: Civic Online Reasoning techniques have so far been tested after being taught in extensive courses. It is therefore possible that simply presenting a condensed set of tips on the best techniques is not enough to fully understand and master them. This possibility is in line with previous unsuccessful interventions presenting news literacy tips40,65,73. Thus, a true ability to recognise pseudo-scientific information might only come from a minimal mastery of critical-thinking skills, which cannot be achieved by simply adding a snippet of information to a post in the form of a pop-up.

Despite the asymmetric contributions of monetary incentives and fact-checking techniques, our results also indicate that the interventions may work in a complementary way. In particular, Experiment 2 shows that these two interventions do not appear to interact with each other. This result, which was replicated across different variables of interest, suggests that the working mechanisms of the interventions are largely orthogonal, and thus that they can be combined to achieve even stronger evaluation performance.

Our results on incentives are in line with an attention-based account of information processing on social media; that is, increased deliberation is sufficient to decrease belief in false content6. Our results add to the literature of attention-based interventions by showing how monetary incentives can additionally modulate motivation and attention and increase performance.

These promising results were not self-evident, as several experiments have cautioned against the universal effectiveness of monetary incentives as a behavioural driver74,75,76. In fact, incentives that are either too small or too large have been shown to decrease rather than increase motivation77.

Moreover, when explicit incentives seek to modify behaviour in areas such as education, environmental actions, and the formation of healthy habits, a conflict arises between the direct extrinsic effect of incentives and how these incentives may crowd out intrinsic motivations. Seeking accuracy in judging news is certainly driven by the intrinsic motivations of individuals. In all likelihood, however, these intrinsic motivations do not conflict with monetary incentives. Seeking accuracy, unlike deliberately adopting ecological behaviour or going on a diet, is a largely automatic process.

Another concern was that motivation and attention might not be sufficient for content that is hardly accessible to non-experts. It was thus unexpected to observe that incentives were effective even when participants evaluated information based on scientific and technical reports, and thus had to rely on external knowledge and intuition when claims and data were not immediately available.

Compared to previous work on Civic Online Reasoning44, our study finds correlational and causal evidence supporting the importance of lateral reading and click restraint as predictors of accurate evaluation, especially (as initially intended) when information about the source is scarce. Notably, this is the first reported evidence of a general-population intervention in a social media context, extending the evidence for its applicability. We note, however, that the connection between our intervention (the pop-up) and technique use is only indirect, as participants were free to ignore the recommendations. Stronger evidence for the efficacy of Civic Online Reasoning techniques could come from within-subject studies that selectively limit the use of the techniques to assess their direct impact on users’ behaviour.

Our results also partly support the literature on media and news literacy39. Previous successful attempts at using fact-checking tips relied on presenting participants with some of the Facebook guidelines for evaluating information41,42. Critically, these tips acted by reducing post engagement (liking, commenting, sharing) and the perceived accuracy of headlines from hyper-partisan and fake news sources. Given that our results highlight the effectiveness of fact-checking tips when participants are less familiar with the source, we suspect that the use of such tips is inversely associated with source familiarity and reputation: the more a source is well-known and widely respected, the less participants will rely on guidelines and recommendations. This interpretation is in line with research on media credibility cues: a site’s credentials are often seen as a sign of expertise78, and experimental evidence suggests that users rely on expertise79 and source reputation80 to guide their judgement. At the same time, however, studies have shown that in contexts like social media peripheral cues, such as clear ’sponsored content’ labels, can be disregarded81. Similarly, previous studies on disinformation claim that source information has little impact on judging the accuracy of social media content82,83,84. Although we did not directly test for the presence/absence of source information, we did find that familiarity with and trust in a source largely affected the search style and the evaluation of the content, suggesting that providing this information to participants had a meaningful effect on their validity evaluations. One way to reconcile these apparently antithetical conclusions is by considering the relative capability of participants to assess the plausibility of information: source knowledge can be a viable heuristic when information is harder to evaluate. Indeed, we suspect that in our experiment information about the source was often easier to assess than the plausibility of the content itself. In addition, compared to previous experiments, participants could open the original article of the post to confirm that it had actually been produced by the source and not fabricated, a factor that probably increased reliance on the source. These considerations and our findings are not sufficient to ascertain whether and under what circumstances reliance on the source is beneficial or detrimental; however, we argue that source information is important in many situations85,86.

Our study does not come without limitations. Possibly the most critical issue is the limited number of stimuli used across experiments (15), which did not allow us to properly control for many features that could impact the evaluation of the posts. Even though we cannot exclude confounding variables and biases in the selection of stimuli, we tried as much as possible to follow a standardised procedure with pre-defined criteria in order to exclude stimuli that could be considered problematic. Moreover, even though most of the literature and the present study have focused on standardised stimuli reporting content from news sources, we recognise that scientific (dis)information comes in several formats that also depend on the topic, the audience, and the strategy of the creator. We decided to exclude other types of formats (e.g. videos or screenshots) to minimise differences in experience between users; we think, however, that future research should explore in more depth the effect of different media formats on the spread of disinformation and on possible counteracting interventions. Another limitation to the generalisability of our results comes from the nature of the samples, consisting of UK residents recruited from an online platform, which also suggests a certain familiarity with technology. Indeed, one large obstacle to the use of Civic Online Reasoning techniques is the limited or selective access to the Internet in several countries87, where navigation plans can be limited to messaging apps only. Moreover, tips about lateral reading and click restraint should be adapted to audiences with different levels of digital literacy, as one-size-fits-all messages have been shown to be ineffective in samples with low familiarity with the online environment41. Future research should explore how these techniques can be proposed in contexts with limited resources, and what alternative approaches can be taken to bypass Internet constraints. Lastly, the study explored the effectiveness of the interventions when using a computer, as the very concept of lateral reading is based on browsing horizontally through internet tabs on a computer. Although nothing precludes the use of such techniques on other devices such as a mobile phone or tablet, the user interface is often not optimised for searching different contents at the same time, making their use more cumbersome. This is particularly problematic considering that social media are predominantly accessed through mobile devices. A promising direction in the fight against disinformation will be to study the influence of the device and UI on the ability of users to access high-quality information. Further studies should also investigate how much the ease of accessing information from within a specific app could prompt users to fact-check what they see. For example, many apps allow users to check information on the internet via an internal browser without leaving the app itself.

Conclusion

This study set out to assess the relative effectiveness of monetary incentives and fact-checking tips in recognising the scientific validity of social media content. We found strong evidence that incentivising participants increases the accuracy of their evaluations; we also found evidence that fact-checking tips increase evaluation accuracy when the source of the information is unknown. These results suggest a promising role for attention and search strategies, and open the way to testing multiple approaches in synergy to achieve the most effective results.