Contemporary attitudes and beliefs on coronary artery calcium from social media using artificial intelligence

Somani, Sulaiman; Balla, Sujana; Peng, Allison W.; Dudum, Ramzi; Jain, Sneha; Nasir, Khurram; Maron, David J.; Hernandez-Boussard, Tina; Rodriguez, Fatima

doi:10.1038/s41746-024-01077-w

Brief Communication
Open access
Published: 30 March 2024

npj Digital Medicine volume 7, Article number: 83 (2024)

Abstract

Coronary artery calcium (CAC) is a powerful tool to refine atherosclerotic cardiovascular disease (ASCVD) risk assessment. Despite growing interest in CAC, contemporary public attitudes toward it are not well described in the literature, yet they have important implications for shared decision-making around cardiovascular prevention. We used an artificial intelligence (AI) pipeline consisting of a semi-supervised natural language processing model and unsupervised machine learning techniques to analyze 5,606 CAC-related discussions on Reddit. A total of 91 discussion topics were identified and classified into 14 overarching thematic groups. These included the strong impact of CAC on therapeutic decision-making, ongoing non-evidence-based use of CAC testing, and the patient-perceived downsides of CAC testing (e.g., radiation risk). Sentiment analysis also revealed that most discussions had a neutral (49.5%) or negative (48.4%) sentiment. The results of this study demonstrate the potential of an AI-based approach to analyze large, publicly available social media data to generate insights into public perceptions about CAC, which may help guide strategies to improve shared decision-making around ASCVD management and public health interventions.

Atherosclerotic cardiovascular disease (ASCVD) remains the leading cause of death in the United States¹. Earlier identification of and intervention for ASCVD are critical for reducing its morbidity and mortality, as over a third of all ASCVD deaths occur in individuals with no prior symptoms¹. Detection of coronary artery calcification (CAC) by a specialized computed tomography (CT) scan (“CAC scan”) can help guide patients and clinicians in shared decision-making around cardiovascular risk assessment². As such, CAC scans are endorsed by multiple medical societies as powerful tools for personalizing cardiovascular risk and preventive therapy recommendations^3,4. CAC may also be a strong motivator for improving health behaviors, including lifestyle changes and adherence to preventive therapies like statins^5,6.

While public interest in CAC has grown over time, current public perceptions about CAC are not well-described⁷. Understanding these beliefs about CAC is critical, as it may help frame shared decision-making discussions and guide public health interventions around ASCVD. Artificial intelligence (AI)-enabled analysis of large volumes of social media data can provide an efficient approach for analyzing contemporary public opinions on common health-related topics and allow for a systematic evaluation of emerging themes⁸. Reddit is a free and widely used social media platform with over 52 million daily active users and over 30 billion views every month⁹. In this study, we leverage an artificial intelligence pipeline using natural language processing and unsupervised learning to characterize real-world perceptions about CAC using discussions on Reddit.

We extracted a total of 5606 unique CAC-related discussions (1017 posts, 4589 comments) from 3545 unique users across 990 subreddits from March 29, 2008, through May 21, 2023 (Supplementary Fig. 1). The largest number of discussions from a single author was 26, while 3463 (97.7%) authors contributed fewer than six discussions each. The subreddits with the most discussions were r/keto (7.5% of all discussions), r/Cholesterol (7.0%), and r/AskDocs (5.8%). The number of CAC-related discussions increased by an average of 57.2% yearly. Using a pretrained, sentence-level Bidirectional Encoder Representations from Transformers (BERT) model, we embedded these discussions into a vectorized language space, in which they were further dimensionally reduced and clustered to identify a total of 91 topics (Fig. 1). These topics were further clustered to identify 14 overarching groups. The largest topics and groups centered around CAC testing to evaluate symptoms (e.g., palpitations, chest pain, and anxiety) and de-risking non-ischemic cardiovascular disease (groups 1, 5); interpreting CAC scores in the context of lifestyle and lipid results (groups 2, 4, 8, and 10); and the disadvantages of CAC testing (e.g., financial cost, radiation risk) (Table 1). Other notable topics included indications for CAC testing (e.g., topics 10, 27, 42, 48), CAC and statins (e.g., topics 24, 31, 34), how ketogenic diets can affect CAC (e.g., topics 16, 19), radiation exposure risk (e.g., topics 22, 45), insurance issues (e.g., topics 29, 30, 37, 50), and celebrities with CAC (e.g., topics 43, 55, 56). A separate pretrained BERT model was used to analyze the sentiment of each discussion, uncovering that 49.5% of discussions were neutral, 48.4% were negative, and 2.1% were positive. The average sentiment of all discussions remained stably neutral-to-negative (−0.42 to −0.50) each year from 2013 through 2023 (Supplementary Fig. 2).
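The counts reported in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. The mapping of sentiment labels to scores of −1, 0, and +1 is an assumption made here for the check; the text does not state how the yearly average sentiment was scored.

```python
# Figures quoted in the paragraph above.
posts, comments = 1017, 4589
total_discussions = posts + comments

authors_total = 3545
authors_under_six = 3463
share_under_six = 100 * authors_under_six / authors_total

# Sentiment proportions quoted above.
neutral, negative, positive = 0.495, 0.484, 0.021

# ASSUMPTION (not stated in the source): negative = -1, neutral = 0, positive = +1.
# Under that scoring, the mean sentiment is the positive share minus the negative share.
mean_sentiment = (+1) * positive + 0 * neutral + (-1) * negative

print(total_discussions)
print(round(share_under_six, 1))
print(round(mean_sentiment, 3))
```

Under this assumed scoring, the overall mean falls inside the reported yearly range of −0.42 to −0.50, which suggests the proportions and the average are at least mutually consistent.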

**Fig. 1: Topic modeling revealed 91 topics and 14 groups.**

Table 1 Overview of Groups of Topics With Example Text

Our AI-enabled analysis of public perceptions of CAC testing demonstrates how well our previously described algorithm for topic modeling generalizes to another clinical domain⁸. A powerful aspect of our pipeline is leveraging techniques in unsupervised machine learning that obviate the need for topic prespecification, which allows discovery of previously unexpected ideas (e.g., non-evidence-based use of CAC). Such topic modeling analyses can also provide clinical insights that may be further explored to test generated hypotheses. By harnessing the power of AI on pre-existing datasets, we demonstrate a fast, inexpensive method of gathering public opinions that would otherwise require time- and finance-intensive clinical registries and user surveys to collect. Through this efficient extraction and interpretation of large volumes of social media data, AI also offers the ability to continuously evaluate public sentiment over time, monitor for emerging topics, and stream clinical insights to key stakeholders that could impact clinical care.

Our study revealed several noteworthy insights about public perceptions around CAC testing. First, CAC testing had a strong impact on therapeutic decision-making. Many discussions emphasized the power of a CAC score of zero as a way of de-risking individuals and avoiding statin therapy. While a CAC-based de-escalation strategy is supported by practice guidelines, the presence of other risk-enhancing lifestyle or clinical factors (e.g., diabetes) may affect these decisions¹⁰. Conversely, many discussions where a non-zero CAC was noted demonstrated how these findings helped motivate lifestyle changes. Ultimately, CAC interpretation is nuanced, and our study highlights that public discussions around interpretation of CAC results may not always be guideline-concordant, underscoring the need for patient and clinician shared decision-making.

Second, there were several discussions surrounding non-evidence-based uses of CAC testing, including for evaluation of patients with cardiac symptoms, such as chest pain and palpitations. This may be discordant with current clinical guidelines, which endorse the use of CAC testing in primary prevention among asymptomatic patients, particularly those with intermediate ASCVD risk³. Many discussions also misattributed the negative predictive value of a CAC scan to evaluate non-specific symptoms typically not related to ASCVD risk assessment, which may further misrepresent the current indications for CAC to the public. Future work may focus on evaluating the dynamics of how such misinformation can be amplified in social media frameworks and ultimately help determine optimal strategies for containing its spread.

Third, we identified discussions regarding the disadvantages of CAC testing, including out-of-pocket costs due to lack of insurance coverage and radiation exposure. However, many individuals still found value in CAC testing despite costs and radiation. The cost-effectiveness of CAC has been reported elsewhere in the literature¹¹. Although the radiation risk associated with CAC testing is minimal, similar to ambient radiation from living in large cities¹², our work identified that patients may be concerned about this risk when deciding to pursue CAC testing.

Finally, we found that the sentiment around CAC-related discussions was mostly neutral-to-negative. This is consistent with prior studies evaluating healthcare discussions on Reddit, which identify a negative tone and expressions of sadness, fear, and anger that are believed to reflect the underlying patient experience in a complex healthcare environment¹³. This negativity bias is well reported in the media and can impact health outcomes¹⁴, suggesting the importance of public health efforts to moderate misinformation¹⁵.

This study should be interpreted in the context of its limitations. Discussions in this study reflect views of Reddit users, who have historically been younger and may not be broadly representative of patients at high risk of ASCVD¹⁶; however, CAC testing is most appropriate for lower and intermediate risk individuals. While a variety of search terms were used, this dataset may not capture all CAC-related discussions on Reddit if individuals use other terms to refer to CAC. The clustering techniques we employed may reflect linguistic concordance to determine similarity rather than clinical concordance, which may lead to seemingly redundant topics and groups. This limitation highlights how AI can augment, but not replace, researchers in analyzing large datasets, and opens the door to consider how more advanced NLP techniques, like large language models, can improve this pipeline.

In this AI-enabled qualitative study of discussions on Reddit, we identified contemporary public perceptions and sentiments around CAC, which included the impact of CAC on therapeutic decision-making, non-evidence-based use of CAC testing, and the perceived downsides of CAC testing. The themes uncovered from this study highlight potential areas of patient concern and misinformation that can be addressed to improve shared decision-making around ASCVD management, improve statin adherence rates, and reduce ASCVD morbidity and mortality.

Methods

Dataset

Reddit (www.reddit.com) was used as the data source for this study¹⁷. It is composed of communities called ‘subreddits’ which are prefixed by “r/” and are focused on specific topics (e.g., r/AskDoctors, r/WorldNews, r/Keto). Users may interact with the platform by creating a “post” to initiate a new discussion thread and by commenting on other users’ posts as part of discussions (“comments”). Most subreddits, including all posts and comments contained within them, are openly accessible and visible without having to create a Reddit user account.

To create a list of CAC-related discussions from Reddit, an Application Programming Interface (API) called PushShift was used to search all the posts and comments on Reddit for case-insensitive matching on the following commonly used terms for CAC scans: “coronary artery calcium”, “coronary calcium”, “cac score”, “calcium score”, and “heart scan”^7,18.
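The case-insensitive term matching described above can be sketched as a simple local filter. This is a minimal illustration rather than the study's code: it does not call the PushShift API, the function name `mentions_cac` and the example texts are hypothetical, and plain substring matching is an assumption about how the terms were applied.

```python
import re

# Search terms quoted in the Methods text above.
CAC_TERMS = [
    "coronary artery calcium",
    "coronary calcium",
    "cac score",
    "calcium score",
    "heart scan",
]

# One alternation pattern, compiled case-insensitively.
# re.escape keeps each term literal even if it contained regex metacharacters.
CAC_PATTERN = re.compile("|".join(re.escape(term) for term in CAC_TERMS), re.IGNORECASE)


def mentions_cac(text: str) -> bool:
    """Return True if any search term occurs in the text, ignoring case."""
    return CAC_PATTERN.search(text) is not None


# Hypothetical example texts, not taken from the dataset.
examples = [
    "My CAC Score came back as zero last week.",
    "Has anyone paid out of pocket for a heart scan?",
    "Started keto in January and feeling great.",
]
matched = [text for text in examples if mentions_cac(text)]
print(len(matched))
```

Substring matching of this kind also accepts longer words that begin with a term (for example, "heart scanner"), so a stricter filter might add word boundaries.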

This study was deemed exempt from ethical review since it did not involve human subjects as defined in 45 United States’ Code of Federal Regulations (CFR) 46.102(f) or 21 CFR 50.3(g).

Data analysis

Details around topic modeling and sentiment analysis in this paper are described elsewhere⁸. Briefly, after preprocessing, discussions were embedded into a numerical representation using a pretrained, sentence-level Bidirectional Encoder Representations from Transformers (BERT) model called all-MiniLM-L6-v2¹⁹, which has been trained on over 600 million Reddit posts and a dataset containing over 12 million papers from medical journals. This embedding was then simplified into a smaller representation using the Uniform Manifold Approximation and Projection (UMAP) algorithm to improve the performance of clustering into topics with Spectral Clustering. Since topics may be similar in content but be differentiated by other embedded features from the model (e.g., linguistic style, tone), a subsequent clustering analysis was performed to find overarching themes of discussion (“groups”). The numbers of topics and groups were automatically determined based on optimizing the Silhouette Coefficient and Davies-Bouldin Index, which are mathematical measures of how similar discussions are within a cluster relative to how similar those discussions are to those in other clusters. A separate BERT model, RoBERTa, pretrained on social media posts, was used to classify sentiment (i.e., “positive”, “neutral”, or “negative” classification of text).
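To make the two cluster-quality measures concrete, the sketch below computes the Silhouette Coefficient and the Davies-Bouldin Index in plain Python for two candidate clusterings of a toy point set. It is a minimal illustration under stated assumptions: the points and candidate labelings are invented, Euclidean distance is assumed, and the way the study combined the two measures to choose the number of clusters is not described here, so the sketch simply reports the best candidate under each measure.

```python
from math import dist
from statistics import mean


def group_by_label(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points (range -1 to 1; higher is better)."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


def davies_bouldin_index(points, labels):
    """Mean worst-case ratio of cluster scatter to centroid separation (lower is better)."""
    clusters = group_by_label(points, labels)
    centroids = {k: tuple(mean(axis) for axis in zip(*pts)) for k, pts in clusters.items()}
    scatter = {k: mean(dist(p, centroids[k]) for p in pts) for k, pts in clusters.items()}
    return mean(
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    )


# Toy data (hypothetical): two well-separated blobs of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Candidate labelings keyed by number of clusters; k=3 splits the first blob.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}

results = {
    k: (silhouette_coefficient(points, labels), davies_bouldin_index(points, labels))
    for k, labels in candidates.items()
}
best_by_silhouette = max(results, key=lambda k: results[k][0])
best_by_davies_bouldin = min(results, key=lambda k: results[k][1])
print(best_by_silhouette, best_by_davies_bouldin)
```

In practice these measures would be evaluated on the reduced embeddings for a range of candidate cluster counts; library implementations are far more efficient than this quadratic-time sketch.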

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data used in this manuscript are available at https://github.com/sssomani/cac_reddit.

Code availability

The code used in this manuscript is available at https://github.com/sssomani/cac_reddit.

References

1. Tsao, C. W. et al. Heart disease and stroke statistics-2023 update: a report from the American heart association. Circulation 147, e93–e621 (2023).
2. Greenland, P. & Lloyd-Jones, D. M. Role of coronary artery calcium testing for risk assessment in primary prevention of atherosclerotic cardiovascular disease. JAMA Cardiol. 7, 219 (2022).
3. Grundy, S. M. et al. 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA guideline on the management of blood cholesterol: a report of the American college of cardiology/American heart association task force on clinical practice guidelines. Circulation 139, e1082–e1143 (2019).
4. Golub, I. S. et al. Major global coronary artery calcium guidelines. JACC Cardiovasc. Imaging 16, 98–117 (2023).
5. Sandhu, A. T. et al. Incidental coronary artery calcium: opportunistic screening of previous nongated chest computed tomography scans to improve statin rates (NOTIFY-1 project). Circulation 147, 703–714 (2023).
6. Muhlestein, J. B. et al. Effect on patient adherence to primary prevention recommendations for statin therapy based on the national guidelines-supported pooled cohort risk equation or a coronary artery calcium score: preliminary findings from the vanguard study for the corcal randomized clinical outcomes trial. J. Am. Coll. Cardiol. 75, 5 (2020).
7. Dzaye, O. et al. Temporal trends and interest in coronary artery calcium scoring over time: an infodemiology study. Mayo Clin. Proc. Innov. Qual. Outcomes 5, 456–465 (2021).
8. Somani, S., van Buchem, M. M., Sarraju, A., Hernandez-Boussard, T. & Rodriguez, F. Artificial intelligence-enabled analysis of statin-related topics and sentiments on social media. JAMA Netw. Open. 6, e239747 (2023).
9. Curry, D. Reddit Revenue and Usage Statistics (2023) https://www.businessofapps.com/data/reddit-statistics/ (2020).
10. Patel, J. et al. Assessment of coronary artery calcium scoring to guide statin therapy allocation according to risk-enhancing factors. JAMA Cardiol. 6, 1161 (2021).
11. Venkataraman, P. et al. Cost-effectiveness of coronary artery calcium scoring in people with a family history of coronary disease. JACC Cardiovasc. Imaging 14, 1206–1217 (2021).
12. Gerber, T. C. & Gibbons, R. J. Weighing the risks and benefits of cardiac imaging with ionizing radiation. JACC Cardiovasc. Imaging 3, 528–535 (2010).
13. Maleki, N., Padmanabhan, B. & Dutta, K. The effect of monetary incentives on health care social media content: study based on topic modeling and sentiment analysis. J. Med. Internet Res. 25, e44307 (2023).
14. Indremo, M., Jodensvi, A. C., Arinell, H., Isaksson, J. & Papadopoulos, F. C. Association of media coverage on transgender health with referrals to child and adolescent gender identity clinics in Sweden. JAMA Netw. Open. 5, e2146531 (2022).
15. Trethewey, S. P. Medical misinformation on social media. Circulation 140, 1131–1133 (2019).
16. Stocking, G., Holcomb, J. & Mitchell, A. 1. Reddit News Users More Likely to be Male, Young and Digital in Their News Preferences https://www.pewresearch.org/journalism/2016/02/25/reddit-news-users-more-likely-to-be-male-young-and-digital-in-their-news-preferences/ (2016).
17. Reddit. Dive Into Anything https://www.reddit.com (2023).
18. Rodriguez, F. et al. Readability of online patient educational materials for coronary artery calcium scans and implications for health disparities. J. Am. Heart Assoc. 9, e017372 (2020).
19. Hugging Face. Sentence-Transformers/all-MiniLM-L6-v2 https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 (2023).

Acknowledgements

Dr. Rodriguez was funded by grants from the NIH National Heart, Lung, and Blood Institute (1K01HL144607; R01HL168188), the American Heart Association/Harold Amos Faculty Development program, and the Doris Duke Charitable Foundation (Grant #2022051). These funding organizations played no role in study design, data analysis, manuscript preparation, or the decision to submit for publication.

Author information

Authors and Affiliations

Department of Medicine, Stanford University, Stanford, CA, USA
Sulaiman Somani & Allison W. Peng
Cardiovascular Institute, Stanford University, Stanford, CA, USA
Sulaiman Somani, Allison W. Peng, Ramzi Dudum, Sneha Jain, David J. Maron & Fatima Rodriguez
Department of Medicine, University of California, San Francisco—Fresno, Fresno, CA, USA
Sujana Balla
Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA
Ramzi Dudum, Sneha Jain, David J. Maron & Fatima Rodriguez
Division of Cardiovascular Prevention and Wellness, Department of Cardiology, Houston Methodist DeBakey Heart & Vascular Center, Houston, TX, USA
Khurram Nasir
Stanford Prevention Research Center, Palo Alto, CA, USA
David J. Maron
Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
Tina Hernandez-Boussard
Center for Digital Health, Stanford University, CA, USA
Fatima Rodriguez

Contributions

S.S. and F.R. conceived and designed the study. S.S. acquired the data. S.S., S.B., A.W.P. and S.S.J. helped analyze the data. S.S. drafted the manuscript. All authors reviewed the data and revised the manuscript.

Corresponding author

Correspondence to Fatima Rodriguez.

Ethics declarations

Competing interests

F.R. reports consulting relationships with Healthpals, Novartis, NovoNordisk (CEC), Movano Health, Esperion Therapeutics, Kento Health, Inclusive Health, Arrowhead Pharmaceuticals, HeartFlow, and Edwards outside the submitted work. The remaining authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Material

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Somani, S., Balla, S., Peng, A.W. et al. Contemporary attitudes and beliefs on coronary artery calcium from social media using artificial intelligence. npj Digit. Med. 7, 83 (2024). https://doi.org/10.1038/s41746-024-01077-w

Received: 17 November 2023
Accepted: 07 March 2024
Published: 30 March 2024
DOI: https://doi.org/10.1038/s41746-024-01077-w