Leveraging deep learning to understand health beliefs about the Human Papillomavirus Vaccine from social media

Du, Jingcheng; Cunningham, Rachel M.; Xiang, Yang; Li, Fang; Jia, Yuxi; Boom, Julie A.; Myneni, Sahiti; Bian, Jiang; Luo, Chongliang; Chen, Yong; Tao, Cui

doi:10.1038/s41746-019-0102-4

Brief Communication
Open access
Published: 15 April 2019

npj Digital Medicine volume 2, Article number: 27 (2019)

Abstract

Our aim was to characterize health beliefs about the human papillomavirus (HPV) vaccine in a large set of Twitter posts (tweets). We collected a Twitter data set related to the HPV vaccine from 1 January 2014, to 31 December 2017. We proposed a deep-learning-based framework to mine health beliefs on the HPV vaccine from Twitter. Deep learning achieved high performance in terms of sensitivity, specificity, and accuracy. A retrospective analysis of health beliefs found that HPV vaccine beliefs may be evolving on Twitter.

Introduction

Human papillomavirus (HPV) is the most common sexually transmitted infection and causes several types of cancer, including cervical, vaginal, vulvar, penile, anal, and oropharyngeal cancers. Although the HPV vaccine is highly effective, vaccine refusal is common among parents of adolescents.^1 Understanding parental beliefs about the HPV vaccine is an important step toward developing effective and targeted vaccine promotion strategies.^1,2 The Health Belief Model (HBM) is the most widely used conceptual framework in health behavior research to explain why people adopt behaviors that lead to healthy lives.^3 Studies have found that HBM constructs are associated with HPV vaccine intention and uptake.^4,5,6

Traditional survey methods present significant limitations in assessing public health beliefs, including difficulties in reaching a large-scale population and tracking changes in real time.^7,8,9 Social media enables millions of people to voluntarily and continuously share self-generated content, which allows access to the health beliefs of a large-scale population. Understanding the large amount of free text data on social media, however, requires advanced algorithms. Previous efforts were focused on developing traditional machine learning-based approaches to understand attitudes and health beliefs toward the HPV vaccine.^10,11,12 Deep learning is a set of advanced computational models that has achieved state-of-the-art performance for various tasks in natural language understanding.^13,14,15,16 The efficacy of deep-learning-based approaches to mining health beliefs about the HPV vaccine from Twitter discussions is unknown.

Results

We focused on four primary HBM constructs: perceived susceptibility, perceived severity, perceived benefits, and perceived barriers. The inter-annotator agreements for the four HBM constructs were 0.727, 0.807, 0.831, and 0.834, respectively. Our deep-learning models achieved satisfactory results in terms of sensitivity, specificity, and accuracy on the testing sets. The models achieved a mean accuracy of 80.50% for identifying HBM-related tweets and between 80.33% and 89.82% for the four HBM constructs. Table 1 shows the constructs, their definitions, sample tweets, and the performance (estimated sensitivity, specificity, and accuracy, with their 95% confidence intervals) of the proposed deep-learning model.

Table 1 The annotation of HPV vaccination discussion on Twitter with respect to the four Health Belief Model (HBM) primary constructs and the performance of the deep-learning classifier on each annotation

Applying the model to the 956,262 unlabeled tweets, we classified 652,252 tweets, obtained from 216,864 unique Twitter user IDs, as HBM related. Among the related tweets, 184,604, 243,206, 373,228, and 309,501 tweets were categorized into the four primary HBM constructs, respectively. For each month from 2014 to 2017, we calculated the number of HBM-related tweets; we further defined the prevalence of each HBM construct as the ratio of the number of tweets related to that construct to the total number of HBM-related tweets. Temporal analysis of the overall data (Fig. 1) showed that the prevalence of tweets in the perceived susceptibility/severity constructs increased every year, while the prevalence of tweets categorized into perceived benefits/barriers decreased.
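The prevalence definition above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `monthly_prevalence`, the construct labels, and the `(month, labels)` input shape are assumptions made for the example. The overall shares are computed from the counts reported in the text; they sum to more than 1 because a tweet can belong to several constructs.

```python
from collections import Counter, defaultdict

# Construct labels are illustrative names, listed in the order used in the text.
CONSTRUCTS = ("susceptibility", "severity", "benefits", "barriers")


def monthly_prevalence(tweets):
    """Return {month: {construct: share of that month's HBM-related tweets}}.

    `tweets` is an iterable of (month, labels) pairs for HBM-related tweets
    only; `labels` holds the constructs assigned to one tweet (multi-label).
    """
    totals = Counter()
    per_construct = defaultdict(Counter)
    for month, labels in tweets:
        totals[month] += 1
        for label in set(labels):
            per_construct[month][label] += 1
    return {
        month: {c: per_construct[month][c] / totals[month] for c in CONSTRUCTS}
        for month in sorted(totals)
    }


# Overall shares from the counts reported in the text.
total_hbm = 652_252
counts = dict(zip(CONSTRUCTS, (184_604, 243_206, 373_228, 309_501)))
overall = {c: n / total_hbm for c, n in counts.items()}

# Hypothetical toy input: two tweets in one month, one in the next.
sample = [
    ("2015-02", ["barriers"]),
    ("2015-02", ["barriers", "benefits"]),
    ("2015-03", ["severity"]),
]
by_month = monthly_prevalence(sample)
```

On the toy input, the February share of barriers is 2/2 and the share of benefits is 1/2, which mirrors how a spike in one construct shows up in a monthly series.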

A significant shift in health beliefs was seen in 2016. We checked the Twitter discussion as well as historical news media from 2016 and found that the shift was due largely to promotional articles on the HPV vaccine from several influential media sources, including the New York Times (“HPV Sharply Reduced in Teenage Girls Following Vaccine, Study Says,” 23 February 2016) and Time (“The HPV Vaccine Is Lowering Infection Rates,” 22 February 2016), as well as others. These articles accounted for a large proportion of the discussion at that time.

As can be seen in Fig. 1, two spikes in barriers were found in February and July 2015. We reviewed the Twitter discussion during these two time periods and identified corresponding events that contributed to the high prevalence of barriers: the spike in February was due mainly to the Toronto Star’s story on Gardasil, titled “A Wonder Drug’s Dark Side” (5 February 2015), whereas the spike in July was due mainly to the news that the European Medicines Agency was conducting a review of the HPV vaccine’s side effects.

Discussion

We performed a retrospective analysis of HPV vaccine health beliefs, using Twitter data pulled from a large population. Our findings indicate that the number of tweets that correspond to certain HBM-related constructs has undergone a substantial temporal shift, which may indicate that HPV vaccine beliefs on Twitter are evolving. The increase in the number of tweets related to perceived susceptibility/severity may reflect an improved understanding of the prevalence of HPV and HPV-related cancers as well as an increased awareness of the severity of these cancers. Likewise, the decrease in tweets related to perceived barriers may reflect a shift in parental assessment of the risk/benefit ratio in accepting the HPV vaccine for their teen. Specific events that may contribute to the changes in health beliefs were identified. Further analysis of the impact of these events could benefit the promotion of HPV vaccination. There are, however, certain limitations of our study. For example, our study did not consider information about the users and classified tweets independently. In the future, we plan to develop novel computational algorithms to understand health beliefs on the user level by analyzing the historical tweets for each user.

This study demonstrates the potential for utilizing social media to better understand HPV vaccine health beliefs. With deep-learning approaches, our study was able to map large-scale Twitter discussions on HPV vaccines to HBM constructs in a highly accurate manner. Such deep-learning approaches can complement traditional surveys with real-time surveillance of the Twitter population.

Methods

Data collection and annotation

A combination of HPV vaccine-related keywords (i.e., HPV, human papillomavirus, Gardasil, and Cervarix) was used to collect 956,262 English-language tweets from 1 January 2014 to 31 December 2017, using the Twitter streaming API (~1% of the entire stream volume). Three reviewers categorized a subset of 6000 tweets based on their relevance to the HBM constructs. Each tweet was assigned to none (not related to HBM), one, or multiple HBM constructs. The reviewers first annotated the same 500 tweets and resolved disagreements by discussion. Then, the reviewers categorized the remaining 5500 tweets independently. This manually categorized data set served as the gold-standard data for training and evaluation of the deep-learning model.
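As a rough local illustration of the keyword criterion, the sketch below checks whether a tweet's text mentions any of the four keywords. It is an assumption-laden stand-in: the actual collection relied on the Twitter streaming API, whose server-side matching rules may differ, and the names `KEYWORDS`, `PATTERN`, and `mentions_hpv_vaccine` are hypothetical.

```python
import re

# The four keywords listed in the text; matching is case-insensitive
# and restricted to whole words (an assumption for this sketch).
KEYWORDS = ("HPV", "human papillomavirus", "Gardasil", "Cervarix")
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def mentions_hpv_vaccine(text):
    """Return True if the text contains any of the collection keywords."""
    return PATTERN.search(text) is not None


# Hypothetical example texts, not taken from the data set.
examples = [
    "Just got my #HPV shot today",
    "gardasil side effects?",
    "flu season again",
]
flags = [mentions_hpv_vaccine(t) for t in examples]
```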

Deep-learning model

We frame the automatic categorization of tweets into the HBM constructs as text classification tasks. We propose an attentive recurrent neural network (RNN)-based deep-learning model for these tasks. The architecture of the proposed model can be seen in Fig. 2. Our model consists of four computation layers: (1) a token-embedding layer that maps each token (i.e., word) in the text to a 200-dimension vector; pre-trained Global Vectors for Word Representation (GloVe) Twitter (trained on 2 billion tweets)^17 is used to initialize the token-embedding layer; (2) a bidirectional RNN (Bi-RNN) layer^18 that takes the output of the token-embedding layer as the input and outputs a high-dimensional vector (length: 50) that represents the tweet content by capturing both forward and backward information from the text; (3) an attention layer^19 that augments the bidirectional RNN layer by capturing salient information from the RNN output; and (4) a Softmax layer that normalizes the attention output into a probability distribution for classification.
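To make layers (3) and (4) concrete, here is a minimal sketch of attention pooling over Bi-RNN outputs followed by a Softmax classifier, written in plain Python. It is not the authors' implementation: the text does not give the attention formula, so the scoring function (a learned vector applied to tanh of each hidden state, loosely in the spirit of the cited attention-based Bi-LSTM work) is an assumption, and the toy dimensions and weight values are made up for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(H, w):
    """Collapse T hidden vectors into one vector using attention weights.

    H: list of T hidden vectors (one per token) from the Bi-RNN layer.
    w: attention vector of the same length as each hidden vector.
    """
    # One scalar score per token: w . tanh(h_t)  (assumed scoring form).
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in H]
    alphas = softmax(scores)
    dim = len(H[0])
    pooled = [sum(a * h[j] for a, h in zip(alphas, H)) for j in range(dim)]
    return pooled, alphas


def classify(pooled, W, b):
    """Linear layer plus softmax: returns one probability per class."""
    logits = [
        sum(wj * xj for wj, xj in zip(row, pooled)) + bias
        for row, bias in zip(W, b)
    ]
    return softmax(logits)


# Toy example: 3 tokens, hidden size 3, 2 classes (all values hypothetical).
H = [[0.1, -0.2, 0.3], [0.5, 0.4, -0.1], [0.0, 0.2, 0.2]]
w = [1.0, 0.5, -0.5]
W = [[0.2, -0.1, 0.4], [-0.3, 0.6, 0.1]]
b = [0.0, 0.1]

pooled, alphas = attention_pool(H, w)
probs = classify(pooled, W, b)
```

The attention weights `alphas` sum to 1 across tokens, so tokens with higher scores contribute more to the pooled tweet representation that the Softmax layer sees.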

We split the task into two steps: (1) categorize the tweet based on whether it is relevant to any of the HBM constructs (one classification task) and (2) categorize the relevant tweets into the four primary HBM constructs (four independent classification tasks). For Step 1, we divided all gold-standard tweets (6000 in total) into training, validation, and testing sets with a proportion of 7:1:2. For Step 2, we divided all HBM-related tweets (3264 in total) in the gold standard into training, validation, and testing sets with the same proportion. We performed hyper-parameter tuning on the validation set and evaluated the models on the testing sets. We repeated the random sampling of the tweets 30 times with the same proportion and calculated the sensitivity, specificity, and accuracy for each model in each repetition. We further calculated the mean and confidence interval of these values for each model. After the evaluation, we applied one set of trained models to categorize the remaining unlabeled tweets into the four primary HBM constructs.
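The evaluation protocol can be sketched as follows, under stated assumptions. The helper names (`split_7_1_2`, `binary_metrics`, `mean_ci95`) are hypothetical, and the text does not say how the confidence intervals were computed, so the normal-approximation interval used here is an assumption rather than the authors' method.

```python
import math
import random
import statistics


def split_7_1_2(items, seed):
    """Shuffle items and split them into train/validation/test at 7:1:2."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) // 10
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


def mean_ci95(values):
    """Mean and normal-approximation 95% interval (an assumed CI method)."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, (mean - half_width, mean + half_width)


# Sizes of the Step 1 split for the 6000 gold-standard tweets.
train, val, test = split_7_1_2(range(6000), seed=0)
print(len(train), len(val), len(test))  # 4200 600 1200
```

In a full run, one would call `split_7_1_2` with 30 different seeds, train and evaluate a model on each split, collect the three metrics from `binary_metrics` per repetition, and summarize each metric with `mean_ci95`.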

Ethics approval and consent to participate

This study received expedited review and IRB approval from the Committee for the Protection of Human Subjects at The University of Texas Health Science Center at Houston. Waiver of informed consent was granted by the IRB due to the retrospective design of the study. The approved IRB protocol number is HSC-SBMI-16–0291.

Data availability

The data that support the findings of this study are available from the corresponding author upon request. The data are not publicly available due to privacy concerns for Twitter users.

Code availability

The code that supports the findings of this study is available from the corresponding author upon request.

References

1. Gilkey, M. B., Calo, W. A., Marciniak, M. W. & Brewer, N. T. Parents who refuse or delay HPV vaccine: differences in vaccination behavior, beliefs, and clinical communication preferences. Hum. Vaccin. Immunother. 13, 680–686 (2017).
2. Larson, H. J. et al. Measuring vaccine confidence: analysis of data obtained by a media surveillance system used to analyse public concerns about vaccines. Lancet. Infect. Dis. 13, 606–613 (2013).
3. Rosenstock, I. M. The health belief model and preventive health behavior. Health Educ. Monogr. 2, 354–386 (1974).
4. Reiter, P. L., Brewer, N. T., Gottlieb, S. L., McRee, A.-L. & Smith, J. S. Parents’ health beliefs and HPV vaccination of their adolescent daughters. Soc. Sci. Med. 69, 475–480 (2009).
5. Donadiki, E. M. et al. Health belief model applied to non-compliance with HPV vaccine among female university students. Public Health 128, 268–273 (2014).
6. Skinner, C. S., Tiro, J. & Champion, V. L. The health belief model. In: Glanz K., Rimer, B. K., Viswanath, K., eds. Health Behavior: Theory, Research, and Practice. 5th ed. 75–94 (Jossey-Bass, San Francisco, 2015).
7. Mitra, T., Counts, S. & Pennebaker, J. Understanding Anti-Vaccination Attitudes in Social Media. In Proc. of the Tenth International AAAI Conference on Web and Social Media (ICWSM 2016) 269–278 (Cologne, Germany, 2016).
8. Chan, B., Lopez, A. & Sarkar, U. The canary in the coal mine tweets: social media reveals public perceptions of non-medical use of opioids. PLoS ONE 10, 1–10 (2015).
9. Kagashe, I., Yan, Z. & Suheryani, I. Enhancing seasonal influenza surveillance: topic analysis of widely used medicinal drugs using Twitter data. J. Med. Internet. Res. 19, 315–3151 (2017).
10. Shapiro, G. K., Surian, D., Dunn, A. G., Perry, R. & Kelaher, M. Comparing human papillomavirus vaccine concerns on Twitter: a cross-sectional study of users in Australia, Canada and the UK. BMJ Open 7, e016869 (2017).
11. Du, J., Xu, J., Song, H.-Y. & Tao, C. Leveraging machine learning-based approaches to assess human papillomavirus vaccination sentiment trends with Twitter data. BMC Med. Inform. Decis. Mak. 17, 69 (2017).
12. Dunn, A. G., Leask, J., Zhou, X., Mandl, K. D. & Coiera, E. Associations between exposure to and expression of negative opinions about human papillomavirus vaccines on social media: an observational study. J. Med. Internet. Res. 17, e144 (2015).
13. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
14. Young, T., Hazarika, D., Poria, S. & Cambria, E. Recent trends in deep learning based natural language processing. IEEE Comput Intell Mag. 13, 55–75 (2018).
15. Tang, D., Wei, F., Qin, B., Liu, T. & Zhou, M. Coooolll: A Deep Learning System for Twitter Sentiment Classification. In Proc. of the 8th International Workshop on Semantic Evaluation (SemEval–2014), 208–212 (Dublin, Ireland, 2014).
16. Du, J. et al. Extracting psychiatric stressors for suicide from social media using deep learning. BMC Med. Inform. Decis. Mak. 18, 43 (2018).
17. Pennington, J., Socher, R. & Manning, C. D. GloVe: Global Vectors for Word Representation. Empirical Methods in Natural Language Processing (EMNLP). https://nlp.stanford.edu/projects/glove/. 2014. Accessed 3 Jan 2019.
18. Schuster, M. & Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Trans Signal Process. 45, 2673–2681 (1997). Accessed 3 Jan 2019.
19. Zhou, P. et al. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification. In Proc. of the 54th Annual Meeting of the Association for Computational Linguistics (Vol. 2: Short Papers) 207–212 (ACL, Berlin, Germany, 2016).

Acknowledgements

The authors thank Dr. Lu Tang for helpful discussion on behavior change theories. The authors also thank Dr. Sharon Lynn Bear for the language editing service. J.D. received funding support from the UTHealth Innovation for Cancer Prevention Research Training Program Pre-doctoral Fellowship (Cancer Prevention and Research Institute of Texas Grant No. RP160015); J.B. received funding support from the National Science Foundation under Award No. 1734134; and C.T. and Y.C. received funding support from the National Institutes of Health under Award Nos. R01LM011829, R01AI130460, 1R01LM012607, R01AI116794 and R01LM009012.

Author information

Authors and Affiliations

School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Jingcheng Du, Yang Xiang, Fang Li, Yuxi Jia, Sahiti Myneni & Cui Tao
Texas Children’s Hospital, Houston, TX, USA
Rachel M. Cunningham & Julie A. Boom
School of Public Health, Jilin University, Changchun, China
Yuxi Jia
Baylor College of Medicine, Houston, TX, USA
Julie A. Boom
Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
Jiang Bian
Perelman School of Medicine, The University of Pennsylvania, Philadelphia, PA, USA
Chongliang Luo & Yong Chen
Institute for Biomedical Informatics, The University of Pennsylvania, Philadelphia, PA, USA
Yong Chen
Center for Evidence-based Practice, The University of Pennsylvania, Philadelphia, PA, USA
Yong Chen

Contributions

J.D. and C.T. have full access to all data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: J.D., S.M., and C.T. Data annotation: J.D., F.L., and Y.J. Drafting of the manuscript: J.D., R.M.C., and C.T. Acquisition, analysis, or interpretation of data: All authors. Critical revision of the manuscript for important intellectual content: All authors. Study supervision: C.T.

Corresponding author

Correspondence to Cui Tao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Du, J., Cunningham, R.M., Xiang, Y. et al. Leveraging deep learning to understand health beliefs about the Human Papillomavirus Vaccine from social media. npj Digit. Med. 2, 27 (2019). https://doi.org/10.1038/s41746-019-0102-4

Received: 09 December 2018
Accepted: 26 March 2019
Published: 15 April 2019
DOI: https://doi.org/10.1038/s41746-019-0102-4
