From alternative conceptions of honesty to alternative facts in communications by US politicians

The spread of online misinformation on social media is increasingly perceived as a problem for societal cohesion and democracy. The role of political leaders in this process has attracted less research attention, even though politicians who ‘speak their mind’ are perceived by segments of the public as authentic and honest even if their statements are unsupported by evidence. By analysing communications by members of the US Congress on Twitter between 2011 and 2022, we show that politicians’ conception of honesty has undergone a distinct shift, with authentic belief speaking that may be decoupled from evidence becoming more prominent and more differentiated from explicitly evidence-based fact speaking. We show that for Republicans—but not Democrats—an increase in belief speaking of 10% is associated with a decrease of 12.8 points of quality (NewsGuard scoring system) in the sources shared in a tweet. In contrast, an increase in fact-speaking language is associated with an increase in quality of sources for both parties. Our study is observational and cannot support causal inferences. However, our results are consistent with the hypothesis that the current dissemination of misinformation in political discourse is linked to an alternative understanding of truth and honesty that emphasizes invocation of subjective belief at the expense of reliance on evidence.


Introduction
Numerous indicators suggest that democracy is in retreat worldwide e.g., 1;2 . Although symptoms and causes of this democratic backsliding are difficult to tease apart, the widespread dissemination of misinformation (on social media, in hyperpartisan news sites, and in political discourse) is undoubtedly a challenge to democracies 3 . There is increasing evidence that exposure to misinformation can cause people to change their behavior e.g., 4 . Exposure to misinformation has been identified as a contributing cause of voting for populist parties in Italy 5 and has been causally linked to ethnic hate crimes in Germany ( 6 ; for a review of causal effects, see 7 ). Note that we use "misinformation" as an umbrella term to refer to any information that people consume and which later turns out to be false. Misinformation can be spread unintentionally, when communicators mistakenly believe some item of information to be true, or it can be spread intentionally, for example in pursuit of a political agenda. Intentionally disseminated misinformation is often referred to as "disinformation". The psychological and cognitive consequences of disinformation are indistinguishable from those of unintentional misinformation, and we therefore use the latter term throughout.
Misinformation has several troubling psychological attributes. First, misinformation lingers in memory even if people acknowledge, believe, and try to adhere to a correction 8 . Even though people may adjust their factual beliefs in response to corrections 9 , their political behaviors and attitudes may be largely unaffected 10;11 . Second, perhaps most concerningly, in some circumstances people may even come to value overt dishonesty as a signal of "authenticity" 12 . A politician who routinely and blatantly misinforms the public is overtly violating the established societal norm of being accurate and truthful. Within a populist logic, this norm violation identifies the politician as an enemy of the "establishment" and, by implication, an authentic champion of "the people"; dishonesty and misinformation thus become a sign of distinction 12 . For example, polls have shown that around 75% of Republicans considered President Trump to be "honest" at various points throughout his presidency (e.g., NBC poll, April 2018). This perception of honesty is at odds with the records of fact checkers and the media, which have identified more than 30,000 false or misleading statements by Trump during his presidency (Washington Post fact checker).
This discrepancy between factual accuracy and perceived honesty is, however, understandable if "speaking one's mind" on behalf of a constituency is considered a better marker of honesty than veracity. The idea that untrue statements can be "honest", provided they arise from authentic belief speaking, points to a distinct ontology of honesty that does not rely on the notion of evidence, but on a radically constructivist appeal to an intuitive shared experience as "truth" 3 . There have been several attempts to characterize this ontology of truth and honesty and the stream of misinformation it gives rise to e.g., 13;3;14 . A recent analysis of ontologies of political truth ( 3 ; see also 15 ) proposed two distinct conceptions of truth: "belief-speaking" and "truth-seeking". Belief-speaking relates only to the speaker's beliefs, thoughts, and feelings, without regard to factual accuracy. Truth-seeking, by contrast, relates to the search for accurate information and an updating of one's beliefs based on that information.
The first of these two ontologies echoes the radical constructivist "truth", based on intuition and feelings, that characterized 1930s fascism e.g., 16 . This conception of truth sometimes rejects the role of evidence outright. For example, Nazi ideology postulated the existence of an "organic truth" based on personal experience and intuition that can only be revealed through inner reflection but not external evidence e.g., 16;17 . Contemporary variants of this conception of truth can be found in critical postmodern theory 18 and right-wing populism 19;20 . The second ontology, based on truth-seeking, aims to establish a shared evidence-based reality that is essential for the well-being of democracy 21 . This conception of truth aims to be dispassionate and does not admit appeals to emotion as a valid tool to adjudicate evidence, although it also does not preclude truth-finding from being highly contested and messy ( 22 vs. 23 ).
For democratic societies, a conception of truth that is based on "belief-speaking" alone can have painful consequences, as democracy requires a body of common political knowledge in order to enable societal coordination 21 . For example, people in a democracy must share the knowledge that the electoral system is fair and that a defeat in one election does not prevent future wins. Without that common knowledge, democracy is at risk. The attempts by Donald Trump and his supporters to overturn the 2020 election results with baseless claims of electoral fraud have brought that risk into sharp focus 24 . To achieve a common body of knowledge, democratic discourse must go beyond belief-speaking. In particular, democratic politics requires truth-seeking by leaders; otherwise, they may choose to remain wilfully ignorant of embarrassing information, for example, by refusing briefings from experts that are critical of their favoured public-health policy. A corollary of this requirement is that the public considers truth-seeking by politicians as an indicator of honesty rather than (only) belief-speaking.
Although truth and honesty are closely linked concepts, with honesty and truthfulness being nearly synonymous 25 , in the present context they need to be disentangled for clarity. We focus here primarily on conceptions of honesty, which refers to a virtuous human quality and a socially recognized norm, rather than truth, which refers to the quality of information about the world. Thus, the two ontologies of truth just introduced describe how the world can be known, namely either through applying intuition or seeking evidence, irrespective of the virtuous qualities (or lack thereof) of the beholder. Nonetheless, this ontological dichotomy maps nearly seamlessly onto the different conceptions of honesty that we characterize as belief-speaking and truth-seeking, respectively.
To date, there has been much concern but limited evidence about the increasing prevalence of belief-speaking at the expense of truth-seeking in American public and political life. We aim to explore this presumed shift in conceptions of truth and honesty by focusing on Twitter activity by members of both houses of the U.S. Congress. The U.S. is not only the world's leading democracy but it is also a crucible of the contemporary conflict between populism and liberal democracy and the intense partisan polarization it has entailed 26 . The choice of Twitter is driven by the fact that public outreach on Twitter has become one of the most important avenues of public-facing discourse by U.S. politicians in the last decade 27 and is frequently used by politicians for agenda-setting purposes 28 .
Our analysis addressed several research questions: Can we identify aspects of belief-speaking and truth-seeking in public-facing statements by members of Congress? And if so, how do these conceptions evolve over time? What partisan differences, if any, are there? Is the quality of shared information linked to the different conceptions of honesty? To answer these questions, we performed a computational analysis of an exhaustive dataset of tweets posted by U.S. politicians, detecting links to misinformation sources and analyzing text of tweets and news sources.

Identifying different conceptions of honesty in political speech
We first sought to identify the two components of truth and honesty (belief-speaking and truth-seeking) in public-facing political speech by elected U.S. officials. For our analyses, we collected a corpus of tweets from members of the U.S. Congress between January 1, 2011 and December 31, 2022. After removing retweets and duplicates, our corpus contained a total of 4,527,814 tweets (see Methods for details). Twitter accounts were categorized by party affiliation.
To measure the conceptions of honesty in text, we created two dictionaries of words associated with each of the concepts. We followed a computational grounded theory approach 29 to incorporate both expert knowledge and computational pattern recognition. We started with a list of seed words for each conception, followed by computational expansion and iterative pruning and refinement through human input (see Methods for details).
We validated the dictionaries in three steps. First, to validate the candidate keywords (selected by the authors), we created a survey on Prolific and asked participants (N = 51) to rate each keyword's representativeness of the two honesty components on two separate Likert scales. We then ran paired t-tests between each word's representativeness ratings for belief-speaking and truth-seeking, respectively. Keywords that were rated as significantly more representative for belief-speaking (truth-seeking) were included in the belief-speaking (truth-seeking) dictionary. The final dictionaries include a total of 37 keywords for each component and are provided in Table 1 (see Methods and online supplement Sections S1 and S2 for details). Following the distributed dictionary representation (DDR) approach 30 , we converted the keywords into vector embeddings using a pretrained algorithm (GloVe). Those representations capture nuanced contextual information and are amenable to a vector-similarity approach to establish overlap between each dictionary and the text or document of interest (see Methods for details).
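The DDR idea described above can be illustrated with a minimal sketch. It assumes that a dictionary and a text are each represented by the mean of their word vectors and compared by cosine similarity. The three-dimensional vectors, the word lists, and the function names are all hypothetical and chosen for illustration only; they are not the GloVe embeddings or the 37-keyword dictionaries used in the study, and the centering and length correction applied in the paper are omitted.

```python
import math

# Hypothetical toy embeddings (assumption); real GloVe vectors have far more dimensions.
TOY_EMBEDDINGS = {
    "believe": [0.9, 0.1, 0.0],
    "feel": [0.8, 0.2, 0.1],
    "evidence": [0.1, 0.9, 0.0],
    "data": [0.0, 0.8, 0.2],
    "budget": [0.1, 0.1, 0.9],
}


def mean_vector(words, embeddings):
    """Average the embeddings of all words that have a known vector."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("none of the words has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dictionary_similarity(text, dictionary, embeddings):
    """Similarity between a whitespace-tokenized text and a keyword dictionary."""
    tokens = text.lower().split()
    return cosine(mean_vector(tokens, embeddings), mean_vector(dictionary, embeddings))


# Hypothetical two-word dictionaries (assumption), far smaller than the real ones.
belief_dictionary = ["believe", "feel"]
truth_dictionary = ["evidence", "data"]

d_b = dictionary_similarity("I believe", belief_dictionary, TOY_EMBEDDINGS)
d_t = dictionary_similarity("I believe", truth_dictionary, TOY_EMBEDDINGS)
```

In this toy example the text "I believe" is closer to the belief-speaking dictionary than to the truth-seeking one, because words without an embedding are skipped and the remaining word lies near the belief-speaking keywords.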
In the second validation step, we applied the dictionaries to our tweet corpus and calculated the semantic similarity D_b and D_t between each tweet and the belief-speaking and truth-seeking dictionaries, respectively (see Methods for details). A positive semantic similarity means that a piece of text is more similar to the words contained in a dictionary, whereas a negative similarity means that it is more dissimilar. We then sampled tweets that had a high belief-speaking or truth-seeking similarity or were dissimilar to both honesty components. We again created a survey on Prolific with the same setup as described for the keyword validation. Using tweets for which a majority of human raters agreed that they were representative of "belief-speaking" or "truth-seeking" as ground truth, we find satisfactory agreement between the computed belief-speaking and truth-seeking similarity scores and human ratings, with AUC = 0.824 for belief-speaking and AUC = 0.772 for truth-seeking (see Methods and online supplement Section S3 for details).
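As a reminder of what the reported AUC values measure, the following sketch computes the area under the ROC curve as the probability that a randomly chosen positive item (here, a tweet that human raters labelled as representative) receives a higher similarity score than a randomly chosen negative item. The function name and the example labels and scores are hypothetical; the study's own computation may differ in detail.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison; ties between a positive and a negative count 0.5."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical human labels (1 = rated representative) and similarity scores.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auc = roc_auc(example_labels, example_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the values reported above indicate that the similarity scores rank human-labelled tweets well above chance.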
In the third validation step we applied the dictionaries to historic articles from the New York Times for three text categories: "opinion", "politics" and "science" (see Methods for details). We found that articles in the "science" category are more similar to truth-seeking than all articles on average (D_t^sci − D_t = 0.033), followed by articles in the opinion (D_t^op − D_t = 0.006) and politics (D_t^pol − D_t = −0.006) categories. Articles in the opinion category show the highest similarity to the belief-speaking dictionary (D_b^op − D_b = 0.013), followed by articles in the science (D_b^sci − D_b = 0.009) and politics (D_b^pol − D_b = −0.007) categories. The analysis of New York Times content confirmed our expectation that articles in the science category are most similar to truth-seeking while articles in the opinion category are most similar to belief-speaking. It did not confirm our expectation of politics being more similar to truth-seeking than opinion articles and more similar to belief-speaking than science articles.
Finally, to establish the uniqueness of our dictionaries and to differentiate the honesty conceptions from existing similar measures, we investigated the relationship between our two components and text features such as authenticity 31 , analytic language 32 and a moral component reflecting judgemental language 33 , each measured using LIWC 2022 34 , as well as positive and negative sentiment measured using VADER 35 . We calculated scores for each of these components for every tweet in the corpus. Both belief-speaking and truth-seeking are negatively correlated with "analytic", although the correlation with belief-speaking (r = −0.27) is about twice as high as with truth-seeking (r = −0.16). Both honesty components are positively correlated with "authentic", "moral" and negative sentiment, while the correlation with positive sentiment is positive for belief-speaking (r = 0.06) and slightly negative for truth-seeking (r = −0.01). All correlations are highly significant (p < 0.001) but small; the correlation with the largest magnitude (r = −0.27) is observed between belief-speaking similarity and "analytic". Details of the comparison with LIWC and VADER scores are summarised in the online supplement Section S4. In summary, these analyses show that belief-speaking and truth-seeking do not overlap greatly with existing related measures of text features.

Partisan and temporal dynamics of conceptions of honesty
Having validated our dictionaries, we produced textual scatterplots 36 (see Methods for details) to illustrate individual terms that are characteristic of the two honesty components. Figure 1 shows diagnostic words in a two-dimensional plot, with the x- and y-axes representing party and honesty conception, respectively. Each dot is a unigram from the Twitter corpus, and its colour is associated with party keyness (a word with positive party keyness occurs more often for texts from members of a given party than expected by chance). The closer to a corner a word is, the more it characterizes that particular conception of honesty and party dimension. See Methods for details on how words in the figure are represented. We see that Republican belief-speaking keywords, situated in the top-left corner, often refer to political opponents or ideologies ("biden", "democrats", "conservatives") or conservative values ("freedom", "liberty"). On the other hand, truth-seeking keywords by the same party are linked to economic ("energy", "taxpayer", "trade") or foreign policy aspects ("china", "chinese") and the military. On the right-hand side of the figure, we find that Democrat belief-speaking tweets also regard politicians and political ideology ("trump", "democrats", "republicans"), and social justice ("color", "discrimination", "justice"), whereas truth-seeking texts particularly concern the climate crisis ("climate"), as well as social welfare and healthcare ("worker", "care", "pre existing condition").
Figure 1 The figure depicts the distribution of keywords on a textual scatterplot. Every term is a dot with two coordinates associated with party (x-coordinate) and honesty component (y-coordinate) keyness. Each coordinate represents a Scaled F-Score (SFS) value ranging from −1 to 1. The word color is associated with the party keyness. We only show word labels where SFS > 0.65 or SFS < −0.65 for readability reasons. Below the scatterplot we show four example tweets associated with the four quadrants of the scatterplot.
The online supplement (Section S6) explores the topics of politicians' communications further. The analysis of some controversial topics revealed that these topics invoked more belief-speaking or truth-seeking than the average tweet, with only a few exceptions. For example, vaccine-related discourse involved far less belief-speaking than other controversial topics such as climate change or the opioid crisis for both parties.
We next examined the temporal trends of the two honesty components. For the following analyses, we use the centered and length-corrected belief-speaking and truth-seeking similarity scores D_b and D_t (see Methods for details).
To arrive at a finer-grained picture of the variability of these components between individual politicians, we calculated the average belief-speaking similarity D_b^acc and truth-seeking similarity D_t^acc of tweets for each individual politician. Note that acc denotes an account average. Figure 2 [...]. This overall increase in both belief-speaking and truth-seeking similarity also becomes apparent in Figure 2 E and F, and is especially pronounced after the presidential election in late 2016.
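The account averages can be sketched as a simple group-by-and-mean over per-tweet scores. The tuple layout, the account identifiers, and the function name below are assumptions made for illustration; the scores are presumed to be already centered and length-corrected.

```python
from collections import defaultdict
from statistics import mean


def account_averages(tweets):
    """Map each account to its mean (belief-speaking, truth-seeking) similarity.

    `tweets` is an iterable of (account_id, d_b, d_t) tuples (hypothetical layout).
    """
    grouped = defaultdict(list)
    for account_id, d_b, d_t in tweets:
        grouped[account_id].append((d_b, d_t))
    return {
        account_id: (mean(b for b, _ in values), mean(t for _, t in values))
        for account_id, values in grouped.items()
    }


# Hypothetical per-tweet scores for two accounts.
example_tweets = [
    ("account_a", 0.5, 0.25),
    ("account_a", 1.5, 0.75),
    ("account_b", -0.5, 0.0),
]
example_averages = account_averages(example_tweets)
```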
This parallel increase for both belief-speaking and truth-seeking could reflect the fact that in recent years, topics concerning fake news have become increasingly central to political discourse 37 , resulting in opposing claims and counterclaims (e.g., Donald Trump routinely accused mainstream media such as the New York Times of spreading "fake news", 28 ). Whereas those claims represented mainly belief speaking, they were accompanied by increasing attempts by the media, and other actors, to correct misinformation through truth-seeking discourse.

Relation of honesty components to information trustworthiness
To test our hypothesis that belief-speaking is preferentially associated with dissemination of misinformation, we analyzed the association between belief-speaking and truth-seeking, respectively, and the quality of the information that is being relayed. To assess information quality, we examined links to websites external to Twitter that were shared by the accounts. We followed an approach employed by similar research in this domain 38;39 and used a trustworthiness assessment by professional fact checkers of the domain a link points to. We used the NewsGuard information nutrition data base 40 as well as an independently compiled data base of domain trustworthiness labels 41 (see Methods and online supplement Sections S7 and S8 for details).
The NewsGuard data base as of the beginning of March 2022 indexed 6,860 English language domains. Each domain is scored on a total of 9 criteria, ranging from "doesn't label advertising" to "repeatedly publishes false information". Each criterion awards a varying number of points for a total of 100. Domains with fewer than 60 points are considered "not trustworthy". The majority of indexed domains (63%) are considered trustworthy. After excluding links to other social media platforms (e.g., twitter.com, facebook.com, youtube.com and instagram.com) as well as links to search services (google.com, yahoo.com), the data base covered between 20% and 60% of the links posted by members of the U.S. Congress, with a steadily increasing share of links covered over time and no difference in coverage between the parties (see also Extended Data Figure 3).
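The link-scoring step described above can be sketched as follows: extract the domain of a shared link, skip social media and search domains, look up the domain score, rescale it to the unit interval, and apply the 60-point threshold. The domain names and scores in the example are invented placeholders, not NewsGuard data, and the function names are hypothetical. The simple "www." stripping is an assumption; other subdomains (for example mobile variants) would need additional handling.

```python
from urllib.parse import urlparse

# Domains excluded in the text: social media platforms and search services.
EXCLUDED_DOMAINS = {
    "twitter.com", "facebook.com", "youtube.com",
    "instagram.com", "google.com", "yahoo.com",
}
TRUST_THRESHOLD = 60  # domains with fewer points count as "not trustworthy"


def link_domain(url):
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def score_link(url, domain_scores):
    """Return score information for a link, or None if excluded or not indexed."""
    domain = link_domain(url)
    if domain in EXCLUDED_DOMAINS or domain not in domain_scores:
        return None
    score = domain_scores[domain]
    return {
        "domain": domain,
        "rescaled": score / 100,  # rescale the 0-100 score to [0, 1]
        "trustworthy": score >= TRUST_THRESHOLD,
    }


# Hypothetical scores for made-up domains (assumption, not NewsGuard data).
example_scores = {"example-news.com": 87.5, "example-blog.org": 40}
good = score_link("https://www.example-news.com/story", example_scores)
bad = score_link("https://example-blog.org/post", example_scores)
skipped = score_link("https://twitter.com/some_account", example_scores)
```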
For each tweet, we calculated the belief-speaking and truth-seeking similarity D_b and D_t. Figure 3 A and B shows S_NG, the NewsGuard score rescaled to [0, 1], over the belief-speaking and truth-seeking similarity, respectively, for each tweet posted by a member of Congress.
To investigate the relationship between D_b, D_t and S_NG, we fitted a linear mixed effects model with random slopes and intercepts for every member of Congress following Equation (1). The lines shown in Figure 3 A and B show S_NG predicted by the model depending on D_b, D_t, respectively, party P and their interaction terms (see Methods for details).
Figure 3 Relation of information quality with belief-speaking and truth-seeking. A and B show the rescaled NewsGuard score S_NG of links posted by individual U.S. Congress members over belief-speaking (D_b) and truth-seeking (D_t) similarity measured in tweet texts, respectively. The lines and shaded areas indicate NewsGuard score predictions and 95% confidence intervals from a linear mixed effects model (see Eq. (1)). C and D show the rescaled NewsGuard score S_NG over belief-speaking and truth-seeking similarity measured in article texts scraped from the tweeted links. The lines and shaded areas indicate NewsGuard score predictions and 95% confidence intervals from a linear regression model (see Eq. (2)). The scatter plots show only 10^5 data points per panel and vertical jitter was applied to visually separate data points. Note that we truncated the y-axis at 0.6. The full data is shown in Extended Data [...]
[...] Table 2 for the full regression statistics and Extended Data Figure 1 for a visualization of the fixed effect of the three-way interaction.
Therefore, an increase in D_b of 10% predicted a decrease in S_NG of 12.8, but only for members of the Republican party. An increase in D_t of 10% predicted an increase in S_NG of 2.1 for Democrats and of 10.6 for Republicans. For Democrats, we find no significant relationship between S_NG and belief-speaking similarity. Predictions of the NewsGuard score depending on belief-speaking and truth-seeking similarity based on the two-way interactions between honesty components and party are shown as lines in panels A and B of Figure 3, respectively.
In the online supplement Section S9, we explore this pattern further by considering NewsGuard scores and honesty components broken down by state and party. We find that the quality of information being shared by Republicans tends to be lower in southern states (e.g., AL, TN, TX, OK, KY) than in the north (e.g., NH, AK, ME), although there are also striking exceptions (e.g., NY). For Democrats, no clearly discernible pattern across states emerges. We also find that the voting patterns during the 2020 presidential election in their home state did not affect the quality of news being shared by members of Congress.
To exclude a dependence of these results on use of the NewsGuard data base, we validated this analysis with an independently collected list of news outlet reliability from academic and fact-checking sources. Results are reported in the online supplement (Section S7) and are consistent with results reported in the main text. In addition, using the different outlet reliability data base, we also find a significant effect of belief-speaking similarity on the quality of shared information for Democrats that goes in the same direction as the effect for Republicans.
Finally, we wanted to know whether the content of belief-speaking and truth-seeking words in the texts found at the websites the tweets linked to was also indicative of low information quality. To this end, we attempted to scrape the text of all linked websites (see Methods). We successfully collected text from about 65% of links. We excluded texts with fewer than 100 words and retained only one copy of the text when multiple tweets contained links to the same website. In addition, we excluded all articles collected from links that were posted by members of both parties (2462 texts, 0.91% of articles), such that every link had a unique party designation. This resulted in a total of 271,171 unique news texts.
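The three filtering rules for scraped article texts (one copy per link, a minimum length, and a unique party designation) can be sketched as below. The record layout, the function name, and the example URLs are hypothetical; the minimum of 100 words is taken from the text and lowered in the example only to keep it short.

```python
def filter_articles(records, min_words=100):
    """Deduplicate article texts by URL and apply the exclusion rules.

    `records` is an iterable of (url, party, text) tuples (hypothetical layout),
    one per tweet that linked to the article.
    """
    parties_by_url = {}
    text_by_url = {}
    for url, party, text in records:
        parties_by_url.setdefault(url, set()).add(party)
        text_by_url.setdefault(url, text)  # keep only the first copy of each text

    kept = []
    for url, text in text_by_url.items():
        if len(parties_by_url[url]) > 1:
            continue  # link was posted by members of both parties
        if len(text.split()) < min_words:
            continue  # text is too short
        (party,) = parties_by_url[url]
        kept.append((url, party, text))
    return kept


# Hypothetical records; min_words is lowered to 3 so the toy texts stay short.
example_records = [
    ("https://example.org/a", "D", "one two three four"),
    ("https://example.org/a", "D", "one two three four"),  # duplicate link
    ("https://example.org/b", "R", "too short"),           # fewer than 3 words
    ("https://example.org/c", "D", "shared by both parties"),
    ("https://example.org/c", "R", "shared by both parties"),
]
example_kept = filter_articles(example_records, min_words=3)
```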
We investigated the dependence of the NewsGuard score associated with the domain the text was scraped from on the belief-speaking similarity and the truth-seeking similarity of the article text (rather than of the original tweet). We fitted a linear regression model to predict the rescaled NewsGuard score S_NG depending on party, the belief-speaking and truth-seeking similarities D_b and D_t, and the two-way interaction terms (see Equation (2) and Methods for details).
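For orientation, a regression of the kind described in this paragraph could take the following form. This is only a sketch consistent with the verbal description: the coefficient names are assumptions, party P is assumed to be coded as a binary indicator, and the text does not state whether an interaction between D_b and D_t is also included. Equation (2) in the Methods remains the authoritative specification.

```latex
S_{NG} = \beta_0 + \beta_1 P + \beta_2 D_b + \beta_3 D_t
       + \beta_4 \,(P \times D_b) + \beta_5 \,(P \times D_t) + \varepsilon
```

Under this form, the interaction coefficients \beta_4 and \beta_5 capture how the association between each honesty component and source quality differs between the two parties.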
We show both the data for individual links and the model predictions for D_b and D_t in Figure 3 C and D. [...] Table 3 for the full regression statistics. Our analysis of article texts therefore reproduces the main results from our analysis of tweet texts.

Discussion
We curated two dictionaries that captured the distinction between an evidence-based conception of honesty (truth-seeking) and a conception based on intuition, subjective impressions, and feelings (belief-speaking). We confirmed the validity and diagnosticity of the dictionaries by soliciting ratings from human participants both for individual keywords and for documents, and by showing that belief-speaking prevailed in opinion pieces in the New York Times but not in their science section, whereas the reverse occurred for truth-seeking.
Applying those dictionaries to public political discourse by members of the U.S. Congress, represented by their tweets, we find a bipartisan increase of the use of both truth-seeking and belief-speaking language over time, in particular from late 2016 onward. The use of truth-seeking and belief-speaking language is particularly intense for controversial topics, and this is also a bipartisan phenomenon.
The parties differ considerably, however, when the quality of information being shared is considered. Overall, Republicans tend to share information of lower quality than Democrats (see also 41 ), and this difference is in large part driven by belief-speaking: the more Republicans engage in belief-speaking, the more likely they are to share low-quality information. This relationship is absent (or attenuated; see Section S7 in online supplement) for Democrats.
Our results have several theoretical and practical implications that deserve to be explored. First, our data cast a new light on several recent analyses of the American public's information diet that have shown that conservatives are more likely to encounter and share untrustworthy information than their counterparts on the political left 38;42;43;41 . Several reasons have been put forward for this apparent asymmetry, for example that partisans are motivated to share derogatory content towards the political outgroup 44 . Because greater negativity towards Democrats is mostly found in lower-quality outlets, conservatives may disproportionately share untrustworthy information because it is satisfying a need for outgroup derogation 45 .
Our analysis offers another explanation, namely that the public is sensitive to cues provided by the political elites which, as we have shown here, also differ considerably in the accuracy of content that they share on social media. Specifically, Republican politicians frequently, though not always, share low-quality information and are thus providing a cue to their partisan followers of the legitimacy of those outlets. Similar evidence for the sensitivity of the public to leadership cues has been observed in the climate change arena, where the growing polarization of the public along party lines mainly resulted from the Republican leadership gradually assuming a more hostile stance towards the science of climate change 46 .
Our analysis furthermore identified belief-speaking as a "gateway" rhetorical technique for the sharing of low-quality information. The more Republican politicians appeal to beliefs and intuitions, rather than evidence, the more likely they are to share low-quality information. For Democrats, this association was absent in the main analysis using NewsGuard scores, and it was attenuated if an independent source of domain quality was used (see online supplement Section S7). This pattern raises the question of why, if belief-speaking gives licence to the sharing of misinformation, it is only Republicans (or mainly Republicans) who avail themselves of that option.
A possible answer can be found in the finding that belief-speaking is associated with greater negative emotion (see online supplement Section S4). Belief-speaking may therefore result from Republican politicians' desire to derogate Democrats, as suggested by 45 . On that view, negative emotional content should be a mediator of the association between belief-speaking and low quality of shared content. Conversely, if belief-speaking were instrumental in the sharing of low-quality content for other reasons, then it should mediate the association involving negative emotionality. We report two competing mediation models in the online supplement (Section S10). While the models cannot definitively adjudicate between the two possibilities, the analyses suggest the former hypothesis is in a better position to explain the mediating effect on the spread of low-quality news among Republicans. Within this framework, and concordant with 45 , negative emotion associated with derogation of the opponent is the driving force behind the association between belief-speaking and the spread of low-quality content among Republicans. Further indirect support for this possibility is provided by the fact that Republican members of Congress do not exclusively share misinformation. When they engage in truth-seeking, Republicans' accuracy of shared information rises nearly to the same level as that of Democrats.
Finally, we return to the argument advanced at the outset, namely that belief-speaking can be a marker of "authenticity" which allows partisan followers to consider a politician to be honest despite them promulgating low-quality or false information. We cannot directly test this argument based on the present data because we have no way of ascertaining the perceived honesty of the politicians in our sample. We do, however, have state-level electoral data from the 2020 presidential election, which show that Republicans did not suffer an electoral penalty for their use of belief-speaking and the associated sharing of low-quality information (online supplement, Section S9). There is no association between the accuracy of Republicans' shared information and the vote share for Trump, suggesting that voters were not deterred by belief-speaking-based dissemination of misinformation.
Our analysis was limited to communications by the "political class" in the United States, and although the U.S. is the world's leading democracy, the trends uncovered here should not be considered in isolation but deserve to be contrasted with observations in other countries and cultures. A recent comparison of the overall accuracy of information shared by U.S. members of Congress found that their accuracy was lower, even among Democrats, than the information shared by parliamentarians from mainstream parties in the U.K. and Germany 41 . Although there were differences between parties in those two countries as well, they were small in magnitude and European conservatives were more accurate than U.S. Republicans, underscoring that conservatism is not, per se, necessarily associated with reliance on low-quality information. Another international comparison of populist leaders (Trump in the U.S., Modi in India, Farage in the U.K. and Wilders in the Netherlands) found some commonalities among those politicians, such as the use of insults against political opponents, but also identified Trump as an outlier in the use of critical language 47 . Further examinations of belief-speaking and truth-seeking outside the U.S. context are therefore urgently needed to explore the generality of our findings and to redress the existing global imbalance in research activity 7 .
Future research is also needed to examine the temporal stability of the patterns we observed here. Although our analysis extended to the end of 2022, thus covering two months of Twitter activity after it was taken over by Elon Musk, there is no guarantee that the platform will remain stable in the future. Likewise, in the same way that sharing of misinformation mushroomed after 2016 41, the long-term trend towards populism may reverse, and the sharing of misinformation may become less frequent in the future. Our analysis is therefore best understood as a historical and contemporary picture of political discourse rather than a pointer to the future.
Finally, future research should also address the particular role played by social media in our analysis. We de-emphasized this angle because when our analysis was extended to mainstream news articles shared by the members of Congress we found very similar results compared to the tweets. However, there may be other situations in which social media play a uniquely different role from conventional mainstream media, and those situations remain to be identified and examined.
For each of the Twitter handles, metadata were collected on February 10, 2023 via the Twitter API v2 using the Python package twarc 48. Metadata included the account's handle, user name, creation date, location, user description, number of followers, number of accounts followed, and tweet count. Out of the 1278 accounts, 220 were not accessible because they had been deleted, suspended, or set to "private".
To build the text corpus, all tweets posted by the collected Twitter accounts starting from November 6, 2010 and up to December 31, 2022 were collected, using academic access to the Twitter API. Note that following this approach we include all tweets posted by a given account in the given time span, not just tweets that were posted while a politician was in office. Earlier tweets all the way back to 2006 could be retrieved, but we chose 2010 as the earliest date due to changes in the design of retweeting in the Twitter platform at that time. The retweet button was introduced in November 2009 (previously retweeting was done by hand), and it took approximately a year for users to start using it consistently. Furthermore, the prominence of Twitter in U.S. politics emerged later, especially since 2012. The resulting corpus consisted of a total of 5,914,107 tweets, of which 3,463,409 were original tweets, 531,289 were quote tweets, 575,044 were replies and 1,351,346 were retweets. Note that quoting, replying and retweeting are not exclusive categories. We removed retweets from the corpus because they do not constitute original content. The number of tweets consistently increased from around 100,000 in 2011 to over 600,000 in 2020 and then declined to around 500,000 in 2022. We removed exact matches (i.e., duplicates) and included only tweets with more than 10 words. The final corpus contained 3,897,032 tweets. In addition to the tweet text, the corpus contained the tweet creation date as well as a unique identifier of the account that posted the tweet. The identifier permitted linkage to the metadata collected about the user accounts, such as party affiliation.
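The filtering steps described above can be sketched as follows. This is a minimal illustration and not the code used for the study: the record fields (`id`, `text`, `is_retweet`) are hypothetical, and counting words by splitting on whitespace is an assumption.

```python
def build_corpus(tweets):
    """Drop retweets, drop exact duplicate texts, keep tweets with > 10 words.

    `tweets` is a list of dicts with the hypothetical keys
    "id", "text" and "is_retweet".
    """
    seen_texts = set()
    corpus = []
    for tweet in tweets:
        if tweet["is_retweet"]:
            continue  # retweets are not original content
        text = tweet["text"]
        if text in seen_texts:
            continue  # exact match of an earlier tweet
        seen_texts.add(text)
        # Assumption: words are counted by splitting on whitespace.
        if len(text.split()) > 10:
            corpus.append(tweet)
    return corpus


raw = [
    {"id": 1, "text": "a b c d e f g h i j k", "is_retweet": False},
    {"id": 2, "text": "a b c d e f g h i j k", "is_retweet": False},
    {"id": 3, "text": "too short", "is_retweet": False},
    {"id": 4, "text": "a b c d e f g h i j k l", "is_retweet": True},
]
kept_ids = [t["id"] for t in build_corpus(raw)]
```

In this toy input only the first tweet survives: the second is an exact duplicate, the third has too few words, and the fourth is a retweet.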
We find a large variance in the number of tweets posted by individual accounts, ranging from only one tweet in the observed time period to 52,055 tweets, with a median number of 2876 tweets per account. To exclude a dependence of our results on highly prolific accounts, we also conducted the main analysis reported in Figure 3 and Extended Data Table 2 using only the latest 3200 tweets per account. Results from this analysis are highly consistent with the analysis using all available tweets. See online supplement Section S11 for details. In addition, we show which accounts contribute most to the overall increase of belief-speaking and truth-seeking (see online supplement Section S12).
In addition to the perspective of individual tweets, we also considered the perspective of individual links in the analysis presented in Section 2. For this analysis, we only considered tweets that contained at least one link (2,700,539 tweets). Because a single tweet can contain more than one link, we expanded the dataset such that every entry referred to a single link, transferring the tweet-level honesty-component labels to the individual links. This resulted in a total of 2,844,901 links. From each link, we extracted the domain the link pointed to. If the link was shortened using a link-shortening service such as bit.ly, we followed the link to retrieve the full domain name. The domains were then matched against the NewsGuard domain trustworthiness database as well as the independently compiled list of trustworthiness labels (described in Section 4.7 and Section S7 in the online supplement).
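The expansion from tweets to links can be illustrated with a short sketch. The field names are hypothetical, the "www." prefix handling is an assumption, and the resolution of shortened links (which requires network requests) is deliberately left out.

```python
from urllib.parse import urlparse


def extract_domain(url):
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host


def explode_links(tweets):
    """Create one row per link, copying tweet-level scores to each link.

    `tweets` is a list of dicts with the hypothetical keys
    "id", "urls", "D_b" and "D_t".
    """
    rows = []
    for tweet in tweets:
        for url in tweet["urls"]:
            rows.append({
                "tweet_id": tweet["id"],
                "url": url,
                "domain": extract_domain(url),
                "D_b": tweet["D_b"],
                "D_t": tweet["D_t"],
            })
    return rows


example_tweets = [
    {"id": 1, "D_b": 0.1, "D_t": -0.2,
     "urls": ["https://www.nytimes.com/a.html", "https://example.org/b"]},
    {"id": 2, "D_b": -0.3, "D_t": 0.4, "urls": ["http://Example.org/c"]},
]
link_rows = explode_links(example_tweets)
```

Each resulting row can then be joined to a domain-level trustworthiness table on the `domain` field.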

Honesty component keywords and validation
We relied on keywords to identify the relevant subsets of tweets that involved the presumed distinct conceptions of honesty. Initially, two lists of keywords, one for each honesty component, were generated by the researchers involved in this article. The aim was to capture linguistic cues whose presence might signal that one of the components has been enacted by the speaker. To illustrate, initial keywords for truth-seeking included terms such as "reality", "assess", "examine", "evidence", "fact", "truth", "proof", and so on. For belief-speaking, initial keywords were terms such as "believe", "opinion", "consider", "feel", "intuition", or "common sense".
The lists were expanded computationally using a combination of the fasttext library 49 and colexification networks 50;51. Using the fasttext embeddings, we expanded the seed words to include words that have a cosine similarity score above 0.75. Colexification networks connect words in a language based on their common translations to other languages, thus signalling words that can be used to express multiple concepts. For example, the words "air" and "breath" are considered to be colexifications because they both translate into the same word in multiple languages ("sukdun" in Manchu, "vu:jnas" in Kildin Sami, "jind" in Nenets; 52). Colexification networks have been used recently to study emotion structures in language 53 and are predictors of word meaning ratings 50. Including colexification networks in lexicon expansion gives word lists with a better trade-off between precision and recall 51 than previous approaches using wordnet or word embeddings, such as empath. We subsequently filtered the expanded lists to remove duplicates, overlapping terms appearing in more than one list, and lemma inflections (e.g., "convey", "conveys", "conveyed"). The keywords were then used to identify texts relevant to the presumed conceptions of honesty.
To validate the keyword lists, we asked participants in an online survey to score each term on two scales reflecting the honesty components. Data were acquired on September 20, 2022, from 50 individuals (male = 15, female = 34, unlisted = 1; age M = 39.5, SD = 15.8) using the Prolific survey platform 54 . Participants were asked to score each term on two distinct Likert scales ranging from 1 to 5, which respectively indicated low and high representativeness of the word for that honesty component. The instructions provided to participants can be found in the online supplement (Section S1). The distributions of ratings collected for each keyword are shown in the online supplement, Figures S1 and S2.
We next performed paired t-tests to see how participants sorted the terms into the two conceptions. The results of the t-tests are shown in the online supplement (Table S1). Out of 98 keywords, 61 were judged to belong in the category we previously assigned them to, 24 did not reach the significance threshold (p < 0.05) and were therefore removed, and 13 were classified by participants as belonging to the opposite category. We followed the raters' indications and moved the keywords that were classified as belonging to the opposite category from their original dictionary to the other dictionary. The final list of keywords for both dictionaries is given in Table 1.
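The sorting rule can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the sign convention (positive t indicates belief-speaking) follows the description in the online supplement, and the critical t-value is passed in by the caller instead of being derived from a p-value (for 50 raters, i.e. 49 degrees of freedom, the two-sided 5% critical value is approximately 2.01).

```python
import math
from statistics import mean, stdev


def paired_t(belief_ratings, truth_ratings):
    """Paired t statistic of per-rater differences (belief minus truth)."""
    diffs = [b - t for b, t in zip(belief_ratings, truth_ratings)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def sort_keyword(initial, belief_ratings, truth_ratings, t_crit):
    """Return 'kept', 'moved' or 'removed' for a single keyword."""
    t = paired_t(belief_ratings, truth_ratings)
    if abs(t) < t_crit:
        return "removed"  # no significant difference between the two scales
    rated = "belief-speaking" if t > 0 else "truth-seeking"
    return "kept" if rated == initial else "moved"


# Toy ratings from five raters; 2.776 is the two-sided 5% critical
# value for 4 degrees of freedom.
clear = sort_keyword("belief-speaking", [5, 4, 5, 4, 5], [1, 2, 1, 1, 2], 2.776)
flipped = sort_keyword("truth-seeking", [5, 4, 5, 4, 5], [1, 2, 1, 1, 2], 2.776)
unclear = sort_keyword("belief-speaking", [3, 2, 3, 4, 3], [3, 3, 2, 3, 4], 2.776)
```

The three toy keywords illustrate the three outcomes reported above: confirmed, moved to the opposite dictionary, and removed.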

Identification of honesty components in text
As a first preparatory step, we removed URLs and replaced user handles on Twitter with the word "user". We then split the tweet texts into individual tokens (words). We then created embeddings of each word contained in the honesty component dictionaries (see Table 1) with GloVe 55 trained on 840B tokens from the Common Crawl corpus, following the distributed dictionary representation (DDR) approach 30 . We note that the word "seem" from the belief-speaking dictionary is included in the list of stopwords of GloVe. We therefore removed "seem" from the stopword list to include it into the dictionary embedding that was calculated using GloVe.
We then averaged the single-word embeddings within every honesty component to create an embedded representation of the entire dictionary. Similarly, we embedded every token contained in a given tweet and calculated an average of all token embeddings to create an embedded representation of the tweet. For every tweet and both components we then calculated the cosine similarity between the embedded tweet representation and the embedded dictionary representations to arrive at a belief-speaking similarity score $D_b$ and a truth-seeking similarity score $D_t$ for the given tweet. Similarity scores range from -1 (not similar at all) to 1 (perfectly similar).
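The averaging and cosine-similarity steps can be sketched as follows. The two-dimensional embeddings are invented toy values that stand in for the pretrained vectors, the function names are hypothetical, and tokens without an embedding are simply skipped, which is an assumption.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ddr_similarity(tokens, dictionary, embeddings):
    """Cosine similarity between the averaged tweet and dictionary embeddings."""
    dict_vec = mean_vector([embeddings[w] for w in dictionary if w in embeddings])
    tweet_vec = mean_vector([embeddings[t] for t in tokens if t in embeddings])
    return cosine(tweet_vec, dict_vec)


# Toy embeddings (illustrative values only).
toy_embeddings = {
    "believe": [1.0, 0.0], "feel": [0.8, 0.2],
    "evidence": [0.0, 1.0], "fact": [0.2, 0.8],
    "i": [0.5, 0.5],
}
tokens = ["i", "feel"]
d_b = ddr_similarity(tokens, ["believe", "feel"], toy_embeddings)
d_t = ddr_similarity(tokens, ["evidence", "fact"], toy_embeddings)
```

For this toy tweet the belief-speaking similarity exceeds the truth-seeking similarity, as expected for a text built around "feel".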
We find that similarity scores correlate with the length of tweets (number of characters), with Pearson's r = 0.37 (p < 0.001) for belief-speaking and r = 0.42 (p < 0.001) for truth-seeking. In addition, the length of tweets systematically increases over the years, particularly after the increase in the character limit of a tweet from 140 characters to 280 characters in 2017. To remove the trend in similarity scores due to increasing tweet length, we fit two linear models $D_b \sim$ tweet length and $D_t \sim$ tweet length. We then use these linear models to predict $D_b$ and $D_t$ for every tweet based on its length and subtract this prediction from the measured belief-speaking and truth-seeking similarity, resulting in the centered and length-corrected similarity scores $D_b$ and $D_t$ which we report throughout the publication.
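The length correction amounts to keeping the residuals of a simple linear regression of similarity on tweet length. A minimal sketch with invented toy values and hypothetical function names:

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def length_corrected(scores, lengths):
    """Subtract the length-based prediction from each similarity score."""
    intercept, slope = fit_line(lengths, scores)
    return [s - (intercept + slope * l) for s, l in zip(scores, lengths)]


lengths = [50, 100, 150, 200]        # tweet lengths in characters (toy values)
raw_scores = [0.30, 0.42, 0.48, 0.60]  # raw similarity scores (toy values)
corrected = length_corrected(raw_scores, lengths)
```

Because the fitted line includes an intercept, the residuals have zero mean and are uncorrelated with length, which is why the corrected scores are both centered and free of the length trend.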
To measure belief-speaking and truth-seeking similarity in the text of the articles collected from links posted by Congress Members on Twitter (see Section 4.9 below), we followed the same approach as described for the text of the tweets above but measured the length of an article as the number of words it contains instead of the number of characters.
To test the robustness of our results to perturbations of the dictionaries, we recalculated belief-speaking and truth-seeking similarities using versions of the dictionaries where 7 words (20%) were removed from the dictionary at random before embedding the words and calculating dictionary representations. We then re-ran the regression of $S_{NG}$ on $D_b$, $D_t$, party and the interaction terms (see Equation (1)), where $D_b$ and $D_t$ are the belief-speaking and truth-seeking similarities, calculated using the representations of the perturbed dictionaries. The distribution of estimates for the fixed effects of the two-way interaction between party and $D_b$, and party and $D_t$ over 100 perturbations are shown in Extended Data Figure 2. While the estimates for the effect of $D_b$ and $D_t$ on NewsGuard score vary by about 20% between different perturbed dictionary versions, the effects never change direction and always stay significant (p < 0.001) for Republicans, as reported in the main text.
In addition to GloVe 55 embeddings, we also calculated $D_b$ and $D_t$ using word2vec 56 and fasttext 49 embeddings of both the dictionary keywords and the tweets, to exclude a dependence of our results on the choice of embedding. We note that similar to GloVe, the word "seem" is included in the stopword list of word2vec and was removed from the stopword list before computing the embeddings. Results of fitting the linear mixed effects model following Eq. (1) using the alternative embeddings for the dictionaries and tweet texts are shown in the online supplement Section S13. Results are similar to the results using GloVe embeddings (see Table 2). This shows that our results do not depend on the algorithm or the corpus (Common Crawl for GloVe and word2vec versus Google News for fasttext) that was used to train the embedding.
Lastly, we also investigated which individual keyword likely contributed most to the overall increases of belief-speaking and truth-seeking reported in Figure 2. We report the results in the online supplement Section S14.

Honesty component document-level validation
To validate our measures of the belief-speaking and truth-seeking honesty components on the document level, we asked human raters to rate individual tweets with respect to their similarity to the two honesty components.
To this end, we sampled 20 tweets from the top belief-speaking and bottom truth-seeking quartile, as well as 20 tweets from the top truth-seeking and bottom belief-speaking quartile. In addition, we sampled 20 tweets that simultaneously belonged to the bottom belief-speaking and truth-seeking quartiles. Each sample of 20 tweets included 10 tweets from Democrats and 10 from Republicans.
We then created a survey on Prolific 54 and asked participants (N = 51) to rate each tweet's representativeness of the two honesty components on two separate Likert scales. We followed exactly the same setup as described in Section 4.2 above, but presented full tweets instead of single keywords. The instructions provided to participants can be found in the online supplement (Section S1). In addition, we included an attention check in the survey, with the aim of excluding all participants that failed the check. To this end, we asked all participants to select "5" for both categories halfway through the survey. Only one person failed the check. The responses of this person were excluded from the survey, resulting in N = 50 total responses (male = 25, female = 24, nonbinary = 1; age M = 37.6, SD = 12.88). Data were acquired on February 10, 2023. The distributions of ratings collected for each tweet are shown in the online supplement (Section S2).
We then wanted to quantify the performance of our computed similarity scores when used as a classifier. To this end, for each honesty component we coded the 20 tweets that were selected from the top belief-speaking [truth-seeking] similarity quartile as "belief-speaking" ["truth-seeking"] and the 40 tweets that were selected from the bottom similarity quartile of that component as "not belief-speaking" ["not truth-seeking"]. We then classified every tweet for which a majority of human raters selected either a "4" or a "5" for how characteristic a tweet was for "belief-speaking" ["truth-seeking"] as "belief-speaking" ["truth-seeking"] to create a ground-truth dataset to compare our classifier against. We obtained ROC curves for belief-speaking and truth-seeking by varying the threshold for belief-speaking [truth-seeking] similarity to categorise a tweet as "belief-speaking" ["truth-seeking"] (akin to varying response criteria in a behavioral study). The ROC curves are shown in the online supplement (Section S2). The area under the curve is high in both cases, with AUC = 0.824 for belief-speaking and AUC = 0.772 for truth-seeking.
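The evaluation can be sketched with two small helpers. The function names are hypothetical. The AUC is computed here through its rank interpretation (the probability that a randomly chosen positive tweet receives a higher similarity score than a randomly chosen negative one, with ties counted as one half), which equals the area under the ROC curve obtained by sweeping the threshold.

```python
def majority_label(ratings):
    """True if more than half of the raters chose a 4 or a 5."""
    high = sum(1 for r in ratings if r >= 4)
    return 2 * high > len(ratings)


def roc_auc(scores, labels):
    """Area under the ROC curve for scores against boolean ground-truth labels."""
    positives = [s for s, l in zip(scores, labels) if l]
    negatives = [s for s, l in zip(scores, labels) if not l]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four tweets with similarity scores and human labels.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [True, False, True, False]
toy_auc = roc_auc(toy_scores, toy_labels)
```

In the toy example three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.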

New York Times corpus
We retrieved data from the New York Times (NYT) through their archive API (https://developer.nytimes.com/docs/archive-product/1/overview). By iterating over the months since the founding of the newspaper in the 19th century, we retrieved information on every article in the archive. The information returned by the API included the article title and an abstract that summarizes the article content, as well as additional metadata such as publication date and section of the paper. This approach is different to earlier research that used the NYT API to obtain a number of articles over time that contain certain terms, which does not yield any further text or ways to filter the data 57. Because we needed text to identify honesty components in articles, the archive endpoint was more suitable than the term search function of the NYT API, despite not giving us the full text of all articles but only returning a summary.
We extracted three distinct categories of content from the NYT corpus based on the sections identified in the metadata: (i) An "opinion" category which comprises opinion pieces such as "OpEds"; (ii) a "politics" category consisting of articles in the sections U.S., Washington, and World; and (iii) a "science" category which includes health, science, education, and climate articles. We chose these three clusters because we expected opinion articles to contain more belief-speaking, whereas we expected science articles to contain more truth-seeking. We expected articles in the politics cluster to fall in between.
We retrieved a total of 809,271 articles consisting of 240,567 opinion articles, 518,123 politics articles, and 50,581 science articles.

Word and topic keyness analysis
The scatterplot in panel A of Figure 1 was produced following the approach described in Scattertext 36 , a Python package designed to illustrate words and phrases that are more characteristic of a category such as party than others.
To derive how characteristic a word is of a category, we start from raw word frequencies: for each word $w_i \in W$ and category $c_j \in C$ we define the precision of the word $w_i$ with respect to the category as
$$\mathrm{prec}(i, j) = \frac{\#(w_i, c_j)}{\sum_{c \in C} \#(w_i, c)}.$$
Here, the function $\#(w_i, c_j)$ represents the number of times $w_i$ occurs in a document labeled with the category $c_j$. Therefore, $\mathrm{prec}(i, j)$ represents the discriminative power of a given word across categories regardless of its frequency in the given category.
Similarly, we define the frequency with which a word occurs in a category $c_j$ as
$$\mathrm{freq}(i, j) = \frac{\#(w_i, c_j)}{\sum_{w \in W} \#(w, c_j)}.$$
To combine $\mathrm{prec}(i, j)$ and $\mathrm{freq}(i, j)$ into a single score, we scale and standardize both values using a normal cumulative distribution function $\Phi(z)$ and then calculate the harmonic mean $H$ between the two contributions (see 36 for details). This yields the Scaled F-Score SFS for every word $w_i$ and category $c_j$ that is defined as
$$\mathrm{SFS}(i, j) = H\left(\Phi(\mathrm{prec}(i, j)), \Phi(\mathrm{freq}(i, j))\right).$$
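The precision, frequency and harmonic-mean steps described above can be sketched as follows. This is an illustration and not the Scattertext implementation: the function names and word counts are invented, and standardizing each quantity to a z-score across words before applying the normal CDF is an assumption about how the scaling is done.

```python
import math
from statistics import mean, pstdev


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi_scaled(values):
    """Standardize values across words, then map through the normal CDF."""
    mu = mean(list(values.values()))
    sd = pstdev(list(values.values()))
    return {w: normal_cdf((v - mu) / sd) for w, v in values.items()}


def scaled_f_scores(counts, category):
    """Scaled F-Score per word; `counts` maps word -> {category: count}."""
    total_in_category = sum(c.get(category, 0) for c in counts.values())
    prec = {w: c.get(category, 0) / sum(c.values()) for w, c in counts.items()}
    freq = {w: c.get(category, 0) / total_in_category for w, c in counts.items()}
    p, f = phi_scaled(prec), phi_scaled(freq)
    # Harmonic mean of the two scaled contributions.
    return {w: 2 * p[w] * f[w] / (p[w] + f[w]) for w in counts}


# Invented toy counts for two categories, "R" and "D".
toy_counts = {
    "freedom": {"R": 8, "D": 2},
    "healthcare": {"R": 2, "D": 8},
    "the": {"R": 50, "D": 50},
}
sfs_r = scaled_f_scores(toy_counts, "R")
```

With these counts, "freedom" receives a higher score for category "R" than "healthcare", because it is both more precise for and more frequent in that category.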
For our application case, we want to represent how representative a word is not only for a single category (like "Republican") but rather on a spectrum of representativeness that ranges from "more Democratic" to "more Republican". To this end, we need to map the two distinct scores $SFS_D$ for the category "Democratic" and $SFS_R$ for the category "Republican" to a single score that ranges from −1 to +1. For two arbitrary categories x and y we therefore define a combined score. This maps two SFS (one for category x and one for category y) that are both defined in the range [0, 1] to a single score in the range [-1, 1]. To this end, $SFS_y$ is mapped to [-1, 0], the SFS with the larger magnitude is selected and then rescaled to the new range. In our application case, this then yields a single Scaled F-Score $SFS_{party}$ that is -1 for more Republican tweets and +1 for more Democratic tweets.
To calculate representativeness along the "belief-speaking to truth-seeking" dimension, we follow a similar approach. Before we can calculate the SFS for belief-speaking and truth-seeking, we first need to transform the continuous honesty similarity scores $D_b$ and $D_t$ into a binary honesty component label for each tweet. To this end, we divided the tweets into quantiles according to their belief-speaking [truth-seeking] similarity. We then categorized the tweets with a belief-speaking [truth-seeking] similarity in the top 20% as "belief-speaking" ["truth-seeking"]. If a tweet was part of the upper quantile for both components, then the higher of the two similarity values was used to assign a category to the tweet. We then followed the approach described above to calculate a single Scaled F-Score $SFS_{honesty}$ from $SFS_b$ (for belief-speaking) and $SFS_t$ (for truth-seeking).
As a result, each word had two SFS scores: $SFS_{party}$ and $SFS_{honesty}$. These two scores were used as x- and y-coordinates for the scatterplot shown in panel A of Fig. 1. The X-shaped structure of the words in the scatterplot indicates that words that are characteristic for one dimension (e.g., party) are likely also characteristic for the other dimension (e.g., honesty component). Words that are not characteristic of any category (like stopwords) cluster in the middle.

NewsGuard nutrition labels
Following precedent 38;39, we use source trustworthiness as an estimator for the trustworthiness of an individual piece of shared information. We use nutrition labels provided by NewsGuard, a company that offers professional fact checking as a service and curates a large database of domains. The trustworthiness of a domain is assessed in nine categories, each of which awards a number of points: does not repeatedly publish false content (up to 22 points), gathers and presents information responsibly (18), regularly corrects or clarifies errors (12.5), handles the difference between news and opinion responsibly (12.5), avoids deceptive headlines (10), website discloses ownership and financing (7.5), clearly labels advertising (7.5), reveals who is in charge, including any possible conflicts of interest (5), the site provides names of content creators, along with either contact or biographical information (5).
NewsGuard categorizes domains with a score of 60 or higher as "generally adheres to basic standards of credibility and transparency" 40 . Similar to 58 , we use this value as a threshold below which we categorize a domain and the link pointing to it as "not trustworthy".
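The scoring and thresholding can be made concrete with a small worked example. The short criterion labels are hypothetical, and treating each criterion as all-or-nothing is a simplifying assumption of this sketch; the point values and the threshold of 60 are those given above.

```python
# Point values of the nine criteria (labels are hypothetical shorthand).
CRITERIA_POINTS = {
    "no_false_content": 22.0,
    "responsible_presentation": 18.0,
    "corrects_errors": 12.5,
    "separates_news_and_opinion": 12.5,
    "no_deceptive_headlines": 10.0,
    "discloses_ownership": 7.5,
    "labels_advertising": 7.5,
    "reveals_who_is_in_charge": 5.0,
    "names_content_creators": 5.0,
}
TRUST_THRESHOLD = 60.0


def domain_score(criteria_met):
    """Sum the points of all criteria a domain satisfies (all-or-nothing)."""
    return sum(CRITERIA_POINTS[c] for c in criteria_met)


def is_trustworthy(score):
    """Domains scoring below the threshold are treated as not trustworthy."""
    return score >= TRUST_THRESHOLD


full_score = domain_score(CRITERIA_POINTS)  # all nine criteria met
weak_score = domain_score(["discloses_ownership", "labels_advertising",
                           "names_content_creators"])
```

A domain meeting all nine criteria reaches the maximum of 100 points, whereas one that only satisfies the three transparency criteria listed for `weak_score` reaches 20 points and falls below the threshold.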
After excluding links to other social media platforms (e.g., twitter.com, facebook.com, youtube.com, and instagram.com) as well as links to search services (google.com, yahoo.com), the NewsGuard database covers between 20% and 60% of the links posted by members of the U.S. Congress, with a steadily increasing share of links covered over time (see Extended Data Figure 3 A).

Regression
We performed a range of regression analyses to quantify the relationship between various manifestations of honesty components and information quality.
For the predictions shown in Figure 3 A and B we fitted a linear mixed-effects model, Equation (1), for tweets from members of the U.S. Congress. In this model, $S_{NG}$ is the NewsGuard nutrition score of a domain a Congress member linked to in a post on Twitter, rescaled to [0, 1]. $D_b$ and $D_t$ are the centered and length-corrected belief-speaking and truth-seeking similarity of the text in the tweet with the link, respectively (see section "Identification of honesty components in text" above). P is the party designation of the account that posted the tweet, which can be "Republican" or "Democrat". We include random slopes and intercepts for every account (userID). We fitted the model using the lmer function from the R library lme4 59. Regression results are reported in Extended Data Table 2. Data distribution was assumed to be normal but this was not formally tested.
For the predictions shown in Figure 3 C and D, we fitted a model for articles that were linked to by the U.S. Congress members. In this model, $D_b$ and $D_t$ are the centered and length-corrected belief-speaking and truth-seeking similarity scores of the article text retrieved from the link. We fitted the model using an ordinary least squares fitting approach from the Python package statsmodels 60. Regression results are reported in Extended Data Table 3. Data distribution was assumed to be normal but this was not formally tested. Note that we do not fit a linear mixed-effects model for the statistical analysis of the articles, since there is no clear nesting of articles within individual Twitter accounts, as a single article can be linked to from multiple accounts.

News article collection
Excluding links to other social media platforms (e.g., twitter.com, facebook.com, youtube.com and instagram.com) as well as links to search services (google.com, yahoo.com), our corpus of tweets contained 1,027,050 unique links to news articles that were shared by members of Congress. Of these links, 462,853 pointed to sites that were indexed by the NewsGuard database (see Section 4.7 above). We scraped the text of these sites using Newspaper3k 61, a Python package for scraping and curating news articles. Some links were broken, restricted, or could not be scraped by the package. In addition, we removed all articles that contained fewer than 100 words or were shared only by independent politicians (i.e., not Republican or Democrat). This resulted in an overall scraping coverage of 65%. Broken down by trustworthiness, coverage was 65% for trustworthy links (N = 291,143) and 82% for untrustworthy links with a NewsGuard score < 60 (N = 7,776). We retained only one copy of each news article if it was shared multiple times, and we removed from the main analysis articles that were shared by members of more than one political party (i.e., each retained link was shared either by Republicans or by Democrats, but not both). This was done to ensure each article had only a single party designation such that our statistical analysis of articles was comparable to our statistical analysis of tweets. This resulted in the removal of 2,462 articles (0.91% of all remaining articles), which were analyzed separately. To provide a marker for apparent bipartisan agreement, we plot the mean and standard deviation of honesty component similarity and $S_{NG}$ for the articles shared by both parties (gray ellipses in Extended Data Figure 4). Removing these articles left us with a corpus of 271,171 article texts.
The distribution of NewsGuard scores as well as the belief-speaking and truth-seeking similarity in each article is shown in Extended Data Figure 4 C and D.

Data availability
The

Acknowledgments
This report was partly funded by the Templeton Foundation through a grant awarded to Wake Forest University for the "Honesty Project". SL was also supported by funding from the Humboldt Foundation in Germany, and SL and DG are beneficiaries of the ERC Advanced Grant PRODEMINFO (101020961). JL was supported by the Marie Skłodowska-Curie grant No. 101026507.
We acknowledge Travis Coan for helpful feedback on the manuscript.

Inclusion and ethics
This study is based on publicly available archival Twitter data on U.S. Members of Congress and their official staff and campaign accounts. Only public figures are analyzed and only content that was not deleted by the time of data retrieval was considered. All U.S. Members of Congress in curated Twitter account lists are included as long as their Twitter accounts were public at the retrieval date. We focused on the two major parties to have sufficient evidence for statistical analysis, and our results cannot be extended to independent members of Congress or members of other parties besides the Democratic and the Republican party.

Competing interests
The authors declare no competing interests.

Extended data figures and tables
Extended Data Figure 4: A shows the rescaled NewsGuard score $S_{NG}$ of links shared in tweets by members of the U.S. Congress over belief-speaking similarity $D_b$. Red and blue dots denote tweets by Democrats and Republicans, respectively. B shows $S_{NG}$ over truth-seeking similarity $D_t$ in tweets. C and D show the same information but with $D_b$ and $D_t$ calculated using the text of the articles that were linked instead of the tweet texts. The grey ellipses indicate the mean and standard deviation of the honesty component similarity and $S_{NG}$ for articles shared by members of both parties. These articles were excluded in the regression analysis.
Extended Data Table 3: Results of an ordinary least-squares regression for rescaled NewsGuard score of each link $S_{NG}$ on belief-speaking similarity $D_b$ and truth-seeking similarity $D_t$ in articles collected from links in tweets, following Eq.

Appendix S1 Instructions to participants during keyword validation
What follows is a verbatim copy of the instructions provided to the participants who rated the keywords.
People can have different ideas about what it means to be "honest".
We are focusing on two ideas of honesty.
One is based on intuition, "gut feeling" and authenticity. According to this idea, people speak the truth and are honest when they "say what they felt to be true in the moment". Whether or not claims are correct reflections of reality is not as important. We call this idea of honesty and truth "belief speaking".
The other idea is based on evidence, analysis, and veracity. According to this idea, people speak the truth and are honest when their claims align with the evidence. Whether or not claims are authentic reflections of a person's feelings is not as important. We call this idea of honesty and truth "truth seeking".
Your task is to judge, for each of the words below, which idea of honesty it is most closely related to. If someone uses that word, does it likely reflect belief speaking? Or does the word likely reflect truth seeking?
Please indicate which idea of honesty each word is closest to by selecting, for each column, a value from 1 to 5, where 1 means that the word is the least representative of that category, and 5 means that the word is highly representative of that category. There are no right or wrong answers, we are interested in your analysis of the meaning of those words.

Appendix S2 Dictionary keyword validation results
To validate the keywords contained in the belief-speaking and truth-seeking dictionaries we asked raters on the survey platform Prolific 54 to score each term on two scales reflecting their representativeness for belief-speaking and truth-seeking, respectively. The collected data contain responses from 50 participants and ratings from 1 to 5 for each keyword. Data were acquired on September 20, 2022. The instructions provided to participants are reported in section "Prolific Questionnaire Instructions". The distributions of ratings collected for each keyword are shown in Figures S1 and S2.
To determine the validity of each keyword, we conducted t-tests between the distribution of representativeness ratings for belief-speaking and the distribution of representativeness ratings for truth-seeking for every keyword. If the difference between the distributions was significant (α = 0.05), the keyword was included in the belief-speaking dictionary if the t-value was positive, and in the truth-seeking dictionary if the t-value was negative. Results of the t-tests for each keyword are reported in Table S1.
Table S1: Results of the t-tests of the keyword ratings performed by 50 raters. "component" indicates the honesty component a given keyword was initially assigned to. The column "valid" is a binary variable indicating whether our initial component assignment for the keyword was confirmed by the raters, based on the t-value direction (positive for belief-speaking, negative for truth-seeking) and a significance level of α = 0.05. The column "opposite" indicates whether a keyword was shifted to the opposite honesty component dictionary. This happened when the t-value was significant (α = 0.05) but in the opposite direction than initially assumed. Rating distributions are shown in Figure S1 for the keywords that were initially categorised as "belief-speaking" and in Figure S2 for the keywords that were initially categorised as "truth-seeking".
Figure S1: Boxplots of rating distributions for keywords we originally categorized as belief-speaking. Clear cases where our categorizations were confirmed are, for example, 'opinion', 'feel', 'believe'. Examples of discarded keywords are 'clearly', 'undoubtedly', 'sure'. The only reversed case is 'observe', categorized as 'truth-seeking' by the raters.

Appendix S3 Document-level validation results
To validate the belief-speaking and truth-seeking measures, we asked raters on the survey platform Prolific 54 to score tweets on two scales reflecting their representativeness for belief-speaking and truth-seeking, respectively.
Tweets shown to the participants were sampled from the full corpus of tweets with the aim of covering both high and low values of the honesty component similarities D_b and D_t. We thus sampled 20 tweets from the top belief-speaking and bottom truth-seeking quartile, as well as 20 tweets from the top truth-seeking and bottom belief-speaking quartile. In addition, we sampled 20 tweets that simultaneously belonged to the bottom belief-speaking and truth-seeking quartiles. Each sample of 20 tweets included 10 tweets from Democrats and 10 from Republicans.
The collected data contains responses from 50 participants (one participant from the initial 51 participants was excluded due to failing the attention check) and ratings from 1 to 5 for each tweet for belief-speaking and truth-seeking, respectively. Data were acquired February 10, 2023. The instructions provided to participants are the same as those reported in Section S1 with the only adaptation that the term "word" was replaced with the term "tweet".
To create a ground-truth dataset against which to compare our classifier, we then classified a tweet as "belief-speaking" ["truth-seeking"] if a majority of human raters selected either a "4" or a "5" for how characteristic the tweet was of "belief-speaking" ["truth-seeking"]. This resulted in 27 tweets that were classified as "belief-speaking", 21 tweets that were classified as "truth-seeking" and 12 tweets that were classified as neither by human raters.
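The majority rule used to build the ground-truth labels can be sketched as follows. This is an illustration only: the function names and the toy ratings are hypothetical, and a strict majority (more than half of the raters) is assumed.

```python
def majority_high(ratings, cutoff=4):
    """True if a strict majority of raters chose `cutoff` or higher (1-5 scale)."""
    return sum(r >= cutoff for r in ratings) > len(ratings) / 2


def label_tweet(belief_ratings, truth_ratings):
    """Return a ground-truth label from the two sets of human ratings.

    ASSUMPTION: a tweet passing both majorities is reported as "both";
    the source does not say how such a case would be handled.
    """
    is_belief = majority_high(belief_ratings)
    is_truth = majority_high(truth_ratings)
    if is_belief and is_truth:
        return "both"
    if is_belief:
        return "belief-speaking"
    if is_truth:
        return "truth-seeking"
    return "neither"


# Toy example with five hypothetical raters.
print(label_tweet([5, 4, 4, 2, 1], [1, 2, 2, 3, 4]))
```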
To assess the performance of our similarity-based classifier, we calculated the ROC curve obtained by varying the threshold on the belief-speaking similarity D_b above which a tweet is classified as "belief-speaking" (see Figure S3, left panel). The ROC curve for the truth-seeking similarity D_t is shown in the right panel of Figure S3. The area under the curve is high in both cases, with AUC = 0.824 for belief-speaking and AUC = 0.772 for truth-seeking. The distributions of ratings collected for each keyword are shown in Figures S4, S5 and S6.
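The threshold sweep behind an ROC curve and its AUC can be sketched in a few lines. The sketch below is not the code used for Figure S3; the function names and the toy scores are hypothetical. It computes the AUC as the probability that a randomly chosen positive tweet has a higher similarity than a randomly chosen negative one, which equals the area under the ROC curve.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs as the decision threshold sweeps over observed scores."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= thr)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy similarities and human labels (1 = rated "belief-speaking").
d_b = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
human = [1, 1, 0, 1, 0, 0]
print(roc_auc(d_b, human))
```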

Appendix S4 VADER text analysis
We explored the content of the tweet texts within the two honesty components using Valence Aware Dictionary for sEntiment Reasoning (VADER) 35 . VADER is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. VADER computes sentiment polarity of a text and provides a "positive" and "negative" sentiment score, as well as a "neutral" and "compound" score.
Correlations between VADER scores and belief-speaking and truth-seeking similarity are given in Table S2. In addition, we show the time-development of the positive and negative scores broken down for the top and bottom quantiles of belief-speaking and truth-seeking similarity in Figure S7.

Table S2
Pearson correlation between belief-speaking and truth-seeking similarity and LIWC scores measuring the prevalence of "analytic", "authentic" and "moral" language, as well as positive and negative sentiment measured with VADER.

Appendix S5 LIWC text analysis
We also explored the content of the tweet texts within the two honesty components using the Linguistic Inquiry and Word Count (LIWC) program 62 . LIWC is a text processing software that has been continuously developed for more than two decades. It computes several indicator variables from text based on word lists generated by psychologists and validated in various experiments, similar to our approach in generating the belief-speaking and truth-seeking word lists.
With the Beta version of LIWC-2022 software (https://www.liwc.app/), we computed the scores for each tweet text for the following LIWC categories: authenticity, analytic, and moral. Authenticity indicates to what extent the language used is perceived as honest and genuine 31 . Analytic is linked to logical and formal thinking 32 . Finally, moral reflects the judgmental language expressed by positive or negative evaluation of someone's behavior or character 33 . The scores provide an efficient summary of those attributes in each text.
Correlations between LIWC scores and belief-speaking and truth-seeking similarity are given in Table S2. In addition, we show the time-development of the scores broken down for the top and bottom quantiles of belief-speaking and truth-seeking similarity for the "analytic", "authentic" and "moral" components in Figure S8.
Figure S7 shows the timelines of LIWC scores for positive and negative emotions for the top and bottom quantile for belief-speaking and truth-seeking similarity. We performed the same analysis for "authentic", "analytic" and "moral" language, using LIWC dictionaries as described in the Methods Section "LIWC text analysis". The time development of "analytic" language broken down by honesty component is shown in Figure S8, panels A to D, the time development of "authentic" language is shown in Figure S8 panels E to H and the time development of "moral" language is shown in Figure S8 panels I to L.
Figure S8 Time-development of LIWC scores of "analytic", "authentic" and "moral" language in tweets of members of the U.S. Congress. Panels A and B show the "analytic" score for tweets that belong to the top belief-speaking and truth-seeking similarity quantile, while panels C and D show the "analytic" score for the bottom similarity quantiles. Timelines are normalized by the overall "analytic" score (baseline) measured in the full corpus. Red and blue lines correspond to tweets by Republicans and Democrats, respectively. Panels E to H show the same information as panels A to D, but for "authentic" language, while panels I to L show the same information for "moral" language. The 95% confidence intervals (indicated by shading) were computed with bootstrap sampling over 1,000 iterations. Dashed vertical lines indicate dates of presidential elections in 2016 and 2020. Timelines are smoothed, using a rolling average over three months.
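The two post-processing steps named in the Figure S8 caption, bootstrap confidence intervals and a three-month rolling average, can be sketched as follows. The details are assumptions: the sketch uses a percentile bootstrap of the mean and a trailing window, neither of which is specified in the text, and all names and values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_iter=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_iter))
    lo_idx = int((1 - level) / 2 * n_iter)
    hi_idx = int((1 + level) / 2 * n_iter) - 1
    return means[lo_idx], means[hi_idx]


def rolling_mean(series, window=3):
    """Trailing rolling average; early points use the values available so far."""
    return [mean(series[max(0, i - window + 1): i + 1]) for i in range(len(series))]


# Toy monthly scores (hypothetical, not study data).
monthly = [1, 2, 3, 4, 5]
print(rolling_mean(monthly))
print(bootstrap_ci(list(range(1, 11))))
```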

Appendix S6 Topic analysis
To investigate the prevalence of belief-speaking and truth-seeking, we performed topic modelling using the Python package BERTopic 63 . Following a three-step approach, the package uses the Sentence-BERT (SBERT) framework to create the embeddings for each document, then uses the Uniform Manifold Approximation and Projection (UMAP) technique 64 to decrease the dimensionality of embeddings and identify clusters through HDBSCAN 65 . Finally, it creates topic representations using class-based term-frequency inverse-document-frequency (c-TF-IDF). We opted for BERTopic rather than other techniques such as Latent Dirichlet Allocation (LDA) because BERTopic performs better when modelling short and unstructured texts such as Twitter data 66;67 . Since BERTopic relies on an embedding approach, data were only minimally preprocessed to keep the original sentence structure: we lemmatized the entire dataset to produce cleaner topic representations, and only removed URLs from the texts.
Since the number of documents was too large to fit a topic model of all documents, we restricted the corpus to the last 3200 tweets from each account. We also applied thresholds to the topic modelling: The document minimum frequency was set to 200 in order to reduce the number of small topics. The number of neighboring sample points used when making the manifold approximation was set to 100 to produce a more global view of the embedding structure. Finally, the minimum document frequency for the c-TF-IDF was set to 50 to reduce the topic-term matrix size and decrease memory-related issues during the computation. With these settings, the model was able to identify 363 topics.
To check whether this was an optimal number of topics, we used ldatuning 68 , an R package that trains multiple models and calculates validation metrics. Despite the fact that ldatuning does not employ embeddings but Latent Dirichlet allocation and that the data it modelled was preprocessed by removing stopwords and irrelevant text (numbers, unknown characters, URLs, Twitter handles), it indicated 300 as an optimal number of topics for the dataset, thus converging towards the BERTopic results.
Building on the topic modelling, we investigated the difference between belief-speaking and truth-seeking in communication about controversial topics in U.S. politics, such as foreign policy, climate change, or the death penalty, and how this differs by party. The selection of controversial topics presented here is inspired by other research in the same area, e.g. 69 and current research topics of non-partisan think-tanks, e.g. 70 . By default, BERTopic assigns each document to a single topic. Therefore, we used this information to calculate how particular controversial topics were distributed across parties and components, as shown in Figure S9. To do this, we grouped the tweets by the topic they were assigned to as well as by the party affiliation of the politician who created them. We then averaged their belief-speaking and truth-seeking similarity scores to calculate D_b^{topic, party} and D_t^{topic, party}, respectively. We repeated this procedure for all 20 topics of interest. We also calculated the average belief-speaking similarity score D_b and truth-seeking similarity score D_t for all 363 topics found by BERTopic. Finally, we subtracted the full-corpus component averages from the component averages of each topic to highlight how parties differ in honesty-speech when talking about controversial matters.
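The grouping and subtraction can be sketched as below. It follows the sign convention described for Figure S9 (topic average minus full-corpus average, so that positive values mean above-average use). The record layout with keys `topic`, `party`, `d_b` and `d_t` is a hypothetical choice for illustration, not the authors' data format.

```python
from collections import defaultdict
from statistics import mean


def topic_party_deviation(tweets):
    """Deviation of per-(topic, party) mean similarities from the corpus means.

    `tweets` is a list of dicts with keys 'topic', 'party', 'd_b', 'd_t'
    (hypothetical layout). Returns {(topic, party): (dev_b, dev_t)}.
    """
    corpus_b = mean(t["d_b"] for t in tweets)
    corpus_t = mean(t["d_t"] for t in tweets)

    groups = defaultdict(list)
    for t in tweets:
        groups[(t["topic"], t["party"])].append(t)

    return {
        key: (
            mean(t["d_b"] for t in members) - corpus_b,
            mean(t["d_t"] for t in members) - corpus_t,
        )
        for key, members in groups.items()
    }


# Toy corpus (hypothetical values).
corpus = [
    {"topic": "climate change", "party": "D", "d_b": 0.4, "d_t": 0.6},
    {"topic": "climate change", "party": "D", "d_b": 0.2, "d_t": 0.4},
    {"topic": "climate change", "party": "R", "d_b": 0.1, "d_t": 0.1},
    {"topic": "vaccines", "party": "R", "d_b": -0.3, "d_t": 0.1},
]
print(topic_party_deviation(corpus))
```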
In Figure S9 A and B we show the average belief-speaking and truth-seeking similarity within a given topic, D_b^{topic, party} and D_t^{topic, party}, minus the average belief-speaking and truth-seeking similarity calculated over the full corpus, D_b and D_t, for members of the Democratic and Republican parties, respectively. Each horizontal bar in the figure thus represents the deviation from the average score across the entire corpus. A value greater than zero implies that a topic involved more belief-speaking or truth-seeking than expected on average, and a value less than zero implies below-average invocation of belief-speaking or truth-seeking. It is immediately apparent that most of these controversial topics invoked more belief-speaking or truth-seeking than the average tweet, with only a few exceptions. For example, vaccine-related discourse involved far less belief-speaking than any other topic for both parties.
There is, however, also considerable heterogeneity in the amount of belief-speaking and truth-seeking used between the topics: Topics such as impeachment, religious freedom and Putin / Ukraine show a large amount of belief-speaking in both parties, whereas topics such as vaccines show little. Similarly, for truth-seeking the topics climate change, impeachment and religious freedom show a large share of this honesty component for both parties whereas the LGBTQ topic shows little.
There are also marked differences in the balance of belief-speaking and truth-seeking within a topic and between the parties. The topics of climate change, gun violence, COVID-19 and the gender pay gap have the largest difference in belief-speaking, with tweets by Democrats containing more belief-speaking than those by Republicans. The topics of climate change, police, Afghanistan and abortion have the largest difference in truth-seeking, with tweets by Democrats containing more truth-seeking, while for the topic of animal cruelty, tweets by Republicans contain more truth-seeking.

Appendix S7 Validation using an independently compiled list of unreliable news sources
To exclude a dependence of the main results reported in Section "Relation of honesty components to information trustworthiness" on use of the NewsGuard data base, we validated this analysis with an independently collected list of news outlet reliability from academic and fact-checking sources. Details on how this list was compiled are reported in Section "Independent list of untrustworthy sources" below. Using this list, we can assign an accuracy score S_a ranging from 1 to 5 as well as a transparency score S_t ranging from 1 to 3 to each domain. In addition, a domain with an accuracy score of ≤ 2 and/or a transparency score of 1 will be labelled as "unreliable". Similar to the analysis above, we analyse the dependency of the accuracy score S_a rescaled to [0; 1] and the transparency score S_t rescaled to [0; 1] on the centered and length-corrected belief-speaking and truth-seeking similarity measured in tweet texts, D_b and D_t, respectively.
We fit a linear mixed effects model with party as fixed variable and random slopes and intercepts for every Congress Member for each of the two scores. We see the same pattern for the transparency score S_t, with a significant negative relation with D_b and a significant positive relation with D_t for both parties, as well as a significant effect of party, the interaction terms party × D_b and party × D_t, and the three-way interaction D_b × D_t × party.
Full regression statistics are reported in Tables S3 and S4. We note that there is extensive agreement between the trustworthiness labels in the NewsGuard data base and the alternative data base: An account that is labelled "untrustworthy" in the NewsGuard data base has a high chance of being labelled "unreliable" in the alternative database as well (Krippendorff's α of 0.84). This is also shown in a recent preprint 71 that compares both data bases.

Appendix S8 Independent list of untrustworthy sources
We compiled a list of trustworthiness ratings from a range of academic sources and fact-checking sites. Most of these sources were also used by 72 .
The main challenge in combining lists from different fact checkers lies in unifying the labels the fact checkers assign to the domains. To address this, we devised a scheme where we rated each domain on two dimensions that we consider to be important to assess reliability and trustworthiness of information: "accuracy" and "transparency". We devise an accuracy score S_a that varies from 1 (false information) to 5 (scientific) and a transparency score S_t that varies from 1 (no transparency) to 3 (transparent). We provide a more detailed description of the five accuracy and three transparency levels in Tables S5 and S6. Mappings of the labels of individual fact checking sites to accuracy and transparency scores as well as the full list of domains are provided at 41 . After mapping all individual lists to the accuracy and transparency dimensions, we label every domain that has an accuracy score of 1 (False Information) or 2 (Clickbait) and/or a transparency score of 1 (No Transparency) as "unreliable". This results in a total of 2,170 domains being labelled as "unreliable" and 2,597 as "reliable".
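The labelling rule and the rescaling of the two scores to [0; 1] can be written compactly. This is a minimal sketch: the function names are hypothetical, and the rescaling is assumed to be a linear min-max transformation, which the text does not state explicitly.

```python
def rescale(score, lowest, highest):
    """Linearly map a score from [lowest, highest] to [0, 1] (assumed min-max)."""
    return (score - lowest) / (highest - lowest)


def is_unreliable(accuracy, transparency):
    """Rule from the text: accuracy of 1 or 2 and/or transparency of 1."""
    return accuracy <= 2 or transparency == 1


# A hypothetical domain with accuracy 3 (of 1-5) and transparency 1 (of 1-3).
s_a, s_t = 3, 1
print(rescale(s_a, 1, 5), rescale(s_t, 1, 3), is_unreliable(s_a, s_t))
```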
For the 1,677 domains that are contained in both data bases, the Krippendorff's α between "untrustworthy" (score < 60 in NewsGuard) and "unreliable" in the independently compiled data base is 0.84, which shows a very high agreement between the two databases. The independently compiled domain list including the unified labels is openly accessible at https://doi.org/10.5281/zenodo.6536692.
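For readers unfamiliar with the agreement measure, the following sketch computes Krippendorff's α for the simplest case that applies here: two "coders" (the two data bases), nominal labels and no missing values. It is an illustration with toy labels, not the code used to obtain the value of 0.84, and the function name is hypothetical.

```python
from collections import Counter


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal labels, no missing data."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must label the same units")
    n = 2 * len(coder_a)  # total number of pairable values
    counts = Counter(coder_a) + Counter(coder_b)

    # Observed disagreement: each disagreeing unit contributes two
    # off-diagonal entries to the coincidence matrix.
    disagreements = sum(a != b for a, b in zip(coder_a, coder_b))
    d_o = 2 * disagreements / n

    # Expected disagreement from the pooled label frequencies.
    d_e = sum(
        counts[c] * counts[k] for c in counts for k in counts if c != k
    ) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1 - d_o / d_e


# Toy labels for ten domains (1 = flagged as low quality), not study data.
newsguard = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
independent = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
print(krippendorff_alpha_nominal(newsguard, independent))
```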
After excluding links to other social media platforms (e.g., twitter.com, facebook.com, youtube.com, and instagram.com) as well as links to search services (google.com, yahoo.com), the database covers a very similar share of links as the NewsGuard data base (between 20% and 60%); see also Extended Data Figure 3 B in the main article.

Appendix S9 Honesty components by state
To examine geographical heterogeneity, we averaged NewsGuard scores across representatives and senators within each state, broken down by party. The results are shown in Figure S10, plotting each state's NewsGuard score against average belief-speaking similarity (left panels) and truth-seeking similarity (right panels), respectively. The size of plotting symbols additionally represents the vote share for Trump (in the bottom panels) and for Biden (top panels) during the 2020 presidential election. It can be seen that the quality of information being shared by Republicans tends to be lower in southern states (e.g., AL, TN, TX, OK, KY) than in the north (e.g., NH, AK, ME). For Democrats, no clearly discernible pattern emerges.
We also considered the outcome of the 2020 presidential election and compared the states that were called for Trump and Biden, respectively. In states that were called for Biden, Democrat members of Congress on average have a NewsGuard score of 94.5 whereas Republicans have 88.6. In states that were called for Trump, the NewsGuard scores were 94.2 (Democrats) and 87.7 (Republicans), respectively. These differences were small, suggesting that the electoral pattern in their home states did not affect the quality of information shared by members of Congress.

Appendix S10 Mediation analysis
Why is it the case that belief speaking is the preferred means to spread low-quality information? One possibility is that belief-speaking is the result of Republican politicians' desire to disparage Democrats, as suggested by 45 , given that belief speaking was found to be associated with greater negative sentiment (see Figure S7), and given that lower-quality information tends to be biased towards negativity 82 . According to this theory, the relationship between belief-speaking and low-quality shared information should be mediated by negative sentiment. On the other hand, if belief speaking were involved in the dissemination of poor quality content for other reasons, it should mediate the association involving negative sentiment.
To test these opposing predictions, we examined separately for Democrats and Republicans whether (1) negative sentiment mediated the effects of belief speaking on sharing low-quality information, or (2) belief speaking mediated the effects of negative sentiment on sharing low-quality information. For each user, we computed mean scores of negative sentiment (measured via VADER, see Section "VADER text analysis"), belief speaking similarity, and prevalence of sharing low-quality news (average NewsGuard score of the shared articles). We conducted a causal mediation analysis using the 'mediation' R package 83 and a bootstrap method with 10,000 iterations. See Tables S7 and S8 for the full details. These results align with the findings of 45 , suggesting that the relationship between belief-speaking and low-quality shared information is indeed driven by negative sentiment.

Appendix S11 Robustness analysis using only a restricted number of tweets per account
The number of tweets posted by an individual account varies widely: while the median number of tweets posted by an account is 2876, the mean is 4278, with the most prolific account posting 52,055 tweets and 10% of the accounts posting 9800 tweets or more in the observed time span (November 6, 2010 to December 31, 2022).
To assess whether our results are driven by accounts that post a large number of tweets, we repeat our main analysis reported in Figure 3, including only the latest 3200 tweets from every account. The results of fitting the linear mixed effects model following Eq. (1), reported in Table S9, show only minute deviations from the results presented in the main text where we used all tweets to fit the model (see Extended Data Table 2).

Appendix S12 Increase of belief-speaking and truth-seeking similarity by account
To investigate the overall increase of both belief-speaking and truth-seeking reported in Fig. 2 in the main text, we investigated which politicians contributed most to the overall increase in both honesty components. We show the top 10 accounts with the largest change in belief-speaking and truth-seeking similarity between the 2010-2013 and the 2019-2022 period for both Democrats and Republicans in Tables S10 and S11.

Appendix S13 Robustness analysis using different embeddings
In addition to GloVe 55 embeddings used for the results presented in the main text, we also calculated D_b and D_t using word2vec 56 and fasttext 49 embeddings to exclude a dependence of our results on the choice of embedding. We note that both GloVe and fasttext were trained on the "common crawl" corpus, whereas word2vec was trained on Google news, a corpus with a more restricted scope. Results for the linear mixed effects modeling following Eq. (1) using word2vec and fasttext embeddings are shown in Tables S12 and S13, respectively. Results for both word2vec and fasttext are similar to the results using GloVe reported in Extended Data Table 2.
Table S12 Results of a linear mixed effects model for the dependence of the rescaled NewsGuard score of each link S_NG on belief-speaking similarity D_b and truth-seeking similarity D_t in tweets, with party P as fixed variable following Eq. (1). In contrast to Table S9 and Extended Data Tables 2 and 3 in the main text, the belief-speaking and truth-seeking similarities have been calculated using word2vec 56 embeddings.
Table S13 Results of a linear mixed effects model for the dependence of the rescaled NewsGuard score of each link S_NG on belief-speaking similarity D_b and truth-seeking similarity D_t in tweets, with party P as fixed variable following Eq. (1). In contrast to Table S9 and Extended Data Tables 2 and 3 in the main text, the belief-speaking and truth-seeking similarities have been calculated using fasttext 49 embeddings.

Appendix S14 Increase of belief-speaking and truth-seeking similarity by keyword
To assess which keywords in the belief-speaking and truth-seeking dictionaries contributed most to the increase of overall belief-speaking and truth-seeking similarity, we created embeddings of single keywords to calculate the centered and length-corrected similarity D_kw of tweets to a given keyword. For every keyword, we then calculated the mean similarity for tweets from the years 2010 to 2013 and for tweets from the years 2019 to 2022. We show the increase in similarity for every keyword in Figure S11.