A simulation-based analysis of the impact of rhetorical citations in science

Authors of scientific papers are usually encouraged to cite works that meaningfully influenced their research (substantive citations) and avoid citing works that had no meaningful influence (rhetorical citations). Rhetorical citations are assumed to degrade incentives for good work and benefit prominent papers and researchers. Here, we explore if rhetorical citations have some plausibly positive effects for science and disproportionately benefit the less prominent papers and researchers. We developed a set of agent-based models where agents can cite substantively and rhetorically. Agents first choose papers to read based on their expected quality, become influenced by those that are sufficiently good, and substantively cite them. Next, agents fill any remaining slots in their reference lists with rhetorical citations that support their narrative, regardless of whether they were actually influential. We then turned agents’ ability to cite rhetorically on-and-off to measure its effects. Enabling rhetorical citing increased the correlation between paper quality and citations, increased citation churn, and reduced citation inequality. This occurred because rhetorical citing redistributed some citations from a stable set of elite-quality papers to a more dynamic set with high-to-moderate quality and high rhetorical value. Increasing the size of reference lists, often seen as an undesirable trend, amplified the effects. Overall, rhetorical citing may help deconcentrate attention and make it easier to displace established ideas.


Supplementary Discussion
The Main paper considered several moderating variables, e.g., literature size. Here, as we vary parameters to test for robustness, we also sweep what is arguably the most important and policy-relevant moderator: the citing budget. As we change each focal parameter, we show how the results vary as the citing budget goes from 20 to 100 and thus includes more rhetorical citations.

Varying quality and initial rhetorical value distributions
In the Main model, we assumed long-tail distributions for quality and initial rhetorical value. The tail's fatness may have a significant effect on the three metrics of community health. Here, we examine the robustness of our principal conclusions using different value distributions. In the Main model we used a Beta distribution, β(1, w), for quality and rhetorical value. The parameter w determines the fatness of the distribution's tail; in the Main model we set w = 6. We vary it to 4 and 8 to see how the results change with more and fewer high-value papers, respectively. In addition, we consider a normal distribution N(0.5, 0.1). All distributions are shown in Figure S1, with 10,000 draws from each.
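As an illustration of how these distributions differ in their upper tails, the following minimal sketch draws 10,000 values from each and reports the share of draws above 0.5. It uses only Python's standard library. The function names, the seed, the 0.5 cutoff, and the clipping of normal draws to [0, 1] are assumptions made for illustration, not details taken from the Main model.

```python
import random


def draw_values(kind, n=10_000, seed=0):
    """Draw n paper values from one of the distributions discussed above."""
    rng = random.Random(seed)
    if kind == "normal":
        # N(0.5, 0.1); clipping to [0, 1] is an assumption of this sketch.
        return [min(1.0, max(0.0, rng.gauss(0.5, 0.1))) for _ in range(n)]
    # Otherwise `kind` is the Beta shape parameter w, i.e. Beta(1, w).
    return [rng.betavariate(1, kind) for _ in range(n)]


def share_above(values, cutoff=0.5):
    """Fraction of draws above an arbitrary, illustrative cutoff."""
    return sum(v > cutoff for v in values) / len(values)


if __name__ == "__main__":
    for kind in (4, 6, 8, "normal"):
        print(kind, round(share_above(draw_values(kind)), 4))
```

For Beta(1, w) the probability of exceeding a cutoff c is (1 - c)^w, so the share above 0.5 should shrink as w grows from 4 to 8, which is what "fewer high-value papers" means here.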

Fewer high-value papers
Similarly to the above, using a distribution with fewer papers of high quality and rhetorical value (w = 8), we find that rhetorical citing still improves the three metrics of health, consistent with the Main results.

Varying distributions of threshold
In the Main results, we sampled adoption thresholds for agents from a uniform distribution on

Varying size of perception error
In the Main result, perception error was distributed Normal(0, 0.05). Here, we consider how the results change when the standard deviation is lowered to 0.02 or raised to 0.1.

Higher noise
Figure S6 shows that rhetorical citing improves the three metrics, consistent with the Main results.
Fit and perception errors can vary a paper's quality from 0.5 to a random number in [0.1, 0.9]. For each panel, the shaded areas indicate the bootstrapped 95% confidence intervals derived from 20 simulation runs, while the lines depict the average values across these runs.
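The quoted range is consistent with adding the widest fit offset to roughly three standard deviations of perception error. The sketch below checks that arithmetic. The three-sigma cutoff and the function name are assumptions of this illustration, not something stated in the text.

```python
def perceived_quality_range(quality, fit_half_width, noise_sd, n_sigma=3.0):
    """Approximate range of perceived quality around a true quality value.

    Assumes (for illustration only) that perceived quality deviates by at
    most the fit half-width plus n_sigma standard deviations of noise.
    """
    spread = fit_half_width + n_sigma * noise_sd
    return quality - spread, quality + spread


# Higher noise: fit drawn from [-0.1, 0.1], perception error sd = 0.1.
low, high = perceived_quality_range(0.5, 0.1, 0.1)
print(round(low, 2), round(high, 2))
```

Under the same assumption, a fit of Uniform(-0.05, 0.05) with the Main noise level of 0.05 gives a spread of 0.2 around 0.5.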

Lower noise
Figure S7 shows that rhetorical citing increases the correlation (left panel), churn (middle panel), and decreases citation inequality (right panel), consistent with the Main results.
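This supplement does not restate how citation inequality is measured. As one common way to quantify concentration, the sketch below computes a Gini coefficient over citation counts. Treating Gini as the inequality metric is an assumption made for illustration, and the function name is hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative counts.

    Returns 0.0 for perfectly equal counts and approaches 1.0 as the
    total concentrates in a single item.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on ascending-sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


print(gini([5, 5, 5, 5]))    # equal counts
print(gini([0, 0, 0, 10]))   # all citations on one paper
```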
Fit and perception errors can vary a paper's quality from 0.5 to a random number in approximately [0.35, 0.65].

Less varied fit
Figure S8 shows that rhetorical citing improves the three metrics, consistent with the Main results. For each panel, the shaded areas indicate the bootstrapped 95% confidence intervals derived from 20 simulation runs, while the lines depict the average values across these runs.
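The shaded bands are described as bootstrapped 95% confidence intervals over 20 runs. A minimal sketch of one way to compute such a band for a single point on a curve is a percentile bootstrap of the mean. The resampling scheme, the number of resamples, and the synthetic run values are assumptions of this illustration and may differ from the procedure actually used.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower_idx = int((1 - level) / 2 * n_boot)
    upper_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lower_idx], means[upper_idx]


# Synthetic placeholder standing in for one metric across 20 simulation runs.
placeholder_rng = random.Random(1)
runs = [placeholder_rng.gauss(0.6, 0.05) for _ in range(20)]
print(statistics.fmean(runs), bootstrap_ci(runs))
```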

More varied fit
Figure S9 shows that rhetorical citing improves churn (middle panel) and citation inequality (right panel). However, the citation-quality correlation changes little between the full and null models. The intuition is that with a more varied fit, a paper's existing perceived quality has little effect on whether researchers subsequently cite it, because fit can substantially raise or lower its quality in a given reader's eyes. Hence, the lock-in effect observed in the null models in our Main results is reduced. In effect, rhetorical citing has effects on the community health metrics that are similar to, but much stronger than, those of adding substantial person-specific fit to the citing process.

Varying reinforcement strengths
Varying α
The parameter α measures how a paper's citation count affects perceptions of its quality, with α = 0.001 in the Main model. Note that because the maximum citation count is 1000, the maximum effect of citations on perceived quality is 0.001*1000 = 1, which equals the maximum underlying quality and the maximum underlying rhetorical value. Here, we consider the cases α = 0.002 (maximum citation premium = 2) and α = 0 (citations have no impact on perceived quality). Figure S10 shows that the full model outperforms the two null models on all three metrics if readers do not rely on citations to perceive quality. In contrast, Figure S11 shows that under a higher reinforcement strength, the full model outperforms the two null models on churn and inequality, but is similar on correlation. Greater reinforcement increases the concentration of substantive citations among elite-quality papers and, because perceived quality is a component of rhetorical value, increases rhetorical citations to them as well, diminishing the differences between the models.
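The arithmetic behind the maximum citation premium can be made explicit. The sketch assumes the premium is simply α times the citation count, bounded by the maximum count of 1000 mentioned above. The function name is hypothetical, and how the premium enters perceived quality in the Main model is not reproduced here.

```python
MAX_CITATIONS = 1000  # maximum citation count stated in the text


def citation_premium(alpha, citations):
    """Boost to perceived quality from citations, assumed linear in the count."""
    if not 0 <= citations <= MAX_CITATIONS:
        raise ValueError("citations must lie in [0, MAX_CITATIONS]")
    return alpha * citations


# Largest possible premium under each reinforcement strength considered.
for alpha in (0.0, 0.001, 0.002):
    print(alpha, citation_premium(alpha, MAX_CITATIONS))
```

With α = 0.002 the largest premium is twice the maximum underlying quality, which is consistent with citations dominating perceived quality in that setting.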

Supplementary Methods
In contrast to the heterogeneous agent models presented in Main, homogeneous agents perceive quality identically (there is no fit_{i,j}), perceive rhetorical value identically, and have the same adoption threshold of 0.5. We initialize models with homogeneous agents the same way as with heterogeneous ones: literature size = 600, reading budget = 120, citing budget = 40, and timesteps = 1000. Figure S12 shows the citation distribution across paper quality.
After 1000 iterations, the distribution is nearly bimodal, with some papers having 0 citations and some 1000. Given that citation distributions in practice are never bimodal, we do not believe such a simple model is sufficiently realistic. For completeness, we present all three community health metrics for the three versions of this model (Full, Null-reference, Null-threshold) in Figure S13, but do not interpret it further.

Figure S1. Value distributions. Additional distributions for quality and initial rhetorical value: more high-value papers, β(1, 4); fewer high-value papers, β(1, 8); and values following a Gaussian distribution N(0.5, 0.1) among papers. The model presented in the Main paper uses a Beta distribution β(1, 6), with a moderate number of high-value papers. 10,000 draws from each distribution are displayed.

Figure S2. The results with more high-value papers, β(1, 4). For each panel, the shaded areas indicate the bootstrapped 95% confidence intervals derived from 20 simulation runs, while the lines depict the average values across these runs.

Figure S3. The results with fewer high-value papers, β(1, 8). For each panel, the shaded areas indicate the bootstrapped 95% confidence intervals derived from 20 simulation runs, while the lines depict the average values across these runs.

Figure S4. The results under a normal distribution of values N(0.5, 0.1). For each panel, the shaded areas indicate the bootstrapped 95% confidence intervals derived from 20 simulation runs, while the lines depict the average values across these runs.

Figure S5 shows that rhetorical citing improves the three metrics, consistent with the Main results.
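The alternative threshold distribution named in the Figure S5 caption, N(0.5, 0.2) truncated within [0, 1], can be sampled in several ways. A minimal sketch using rejection sampling follows. Redrawing out-of-range values (rather than clipping them) is an assumption of this illustration, as are the function name, the seed, and the number of agents.

```python
import random


def sample_threshold(rng, mean=0.5, sd=0.2, low=0.0, high=1.0):
    """Draw from N(mean, sd) truncated to [low, high] by rejection sampling."""
    while True:
        x = rng.gauss(mean, sd)
        if low <= x <= high:
            return x


rng = random.Random(0)
# Hypothetical population of 1000 agents, one adoption threshold each.
thresholds = [sample_threshold(rng) for _ in range(1000)]
print(min(thresholds), max(thresholds))
```

Because the truncation is symmetric around 0.5, redrawing keeps the mean at 0.5, whereas clipping would pile a small amount of probability mass exactly on 0 and 1.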

Figure S5. Three models with different thresholds, N(0.5, 0.2) truncated within [0, 1]. For each panel, the shaded areas indicate the bootstrapped 95% confidence intervals derived from 20 simulation runs, while the lines depict the average values across these runs.

[0.35, 0.65]. For each panel, the shaded areas indicate the bootstrapped 95% confidence intervals derived from 20 simulation runs, while the lines depict the average values across these runs.

Varying distribution of fit
In the Main result, we sampled fit for agents from a uniform distribution on [-0.1, 0.1]. Here, we consider how the results change when the range of the distribution is increased to [-0.2, 0.2] or decreased to [-0.05, 0.05].

Figure S8. Three metrics of community health with a less varied fit, Uniform(-0.05, 0.05). Fit and perception errors can vary a paper's quality from 0.5 to a random number in [0.3, 0.7].

Figure S9. Three metrics of community health with a more varied fit, Uniform(-0.2, 0.2). Fit and perception errors can vary a paper's perceived quality from 0.5 to a random number in

Figure S10. Three metrics under zero reinforcing strength (α = 0). For each panel, the shaded areas indicate the bootstrapped 95% confidence intervals derived from 20 simulation runs, while the lines depict the average values across these runs.

Figure S11. Three metrics under a higher reinforcing strength (α = 0.002). For each panel, the shaded areas indicate the bootstrapped 95% confidence intervals derived from 20 simulation runs, while the lines depict the average values across these runs.

Figure S12. Full models after 1000 iterations with different values of β. For each panel, the shaded areas indicate the bootstrapped 95% confidence intervals derived from 20 simulation runs, while the lines depict the average values across these runs.

Figure S13. Citation distribution in the homogeneous model after 1000 iterations.

Figure S14. Metrics of community health produced by three models (Full, Null-reference, and Null-threshold), all with homogeneous agents, after 1000 iterations.