INTRODUCTION

Rare diseases affect an estimated 300 million people worldwide [1]. Though definitions vary across countries, the National Institutes of Health defines a rare disease as one affecting fewer than 200,000 people in the United States [2]. Approximately 80% of rare diseases are genetic in etiology [2]. Patients with rare diseases experience extended diagnostic odysseys lasting an average of six years, while some individuals remain undiagnosed indefinitely [1]. Though specific symptoms vary widely, many rare disease patients suffer from complex, poorly understood medical conditions, and the vast majority of rare diseases lack a Food and Drug Administration–approved treatment [3]. Patients and families struggle to access health-care providers with sufficient knowledge of their conditions and must often coordinate health care across multiple specialty providers [4].

Conducting health research on patients with rare diseases is particularly challenging. The low prevalence of each condition means that those who share a given diagnosis are likely to be geographically dispersed, making it difficult to recruit sufficient numbers of patients for research [4]. However, the rapid expansion of social media over the last 15 years has provided a new opportunity for rare disease patients to find each other. There is evidence that rare disease patients use social media frequently for a range of purposes, including social and informational support, research, and advocacy [5]. Due in part to rare disease patients’ high utilization of social media, there has been growing interest in using social media to facilitate rare disease research across the spectrum of clinical and translational research [6].

Social media methods have several advantages over traditional methods (e.g., clinic-based recruitment) in health research, including increased access to patients, larger sample sizes, and more efficient recruitment [7]. Additionally, the content users generate on social media may provide valuable patient-reported data on disease course, health outcomes, and quality of life, as well as a new forum for delivery and evaluation of targeted health interventions [8]. For rare disease research, social media has been used to collect patient histories [9], examine patient needs via content analysis of support group posts [10], recruit rare disease patients for studies [11], and allow for data mining of information on symptoms and health outcomes [12]. Additionally, research has examined social media itself to understand how and why rare disease patients and their families use these online platforms [12]. Social media presents an opportunity to conduct new types of research in rare disease and to address longstanding challenges in research with these patients and families.

There are also potential drawbacks to using social media for studying rare diseases, including issues of representation and generalizability in study samples. Research in other conditions suggests that samples drawn from social media may be subject to biases in terms of gender, race/ethnicity, and age [13]. As there are over 7,000 rare diseases, findings or methods applicable to one disease may not apply to others [2]. This is particularly a concern in studies that make claims about the experiences of “rare disease patients” overall. Systematic over- or underrepresentation of certain diseases, or of patients from certain sociodemographic subgroups, could have a cumulative effect of biasing our knowledge of rare diseases. While social media represents a potentially powerful tool for rare disease research, it is necessary to understand these possible drawbacks.

The goal of this study is to systematically review the peer-reviewed academic literature on the use of social media in rare disease research. In this review, we examine how social media has been used in rare disease research, the types of research questions examined, the methods used, and the characteristics of participants included in these studies, with a focus on identifying gaps and opportunities in rare disease research using social media.

MATERIALS AND METHODS

We conducted a systematic review of the peer-reviewed academic literature. Our preregistered protocol containing our detailed methods is available at Open Science Framework (https://osf.io/, protocol ID 97fd6).

Eligibility criteria

We included studies that met the following criteria: (1) focused on the topic of rare or undiagnosed genetic diseases, (2) used social media to conduct the research, and (3) were published in English in a peer-reviewed journal between 1 January 2004 and 10 November 2020. We chose 2004 as our start date because Facebook, the first widely used social media site, was launched in this year [14]. We included rare diseases that meet the US definition of rare disease (a disease affecting fewer than 200,000 people in the United States) [2]. We defined social media as any online site with user-generated content that also allowed for direct communication between user-specific profiles and groups. Examples included Facebook, Twitter, Reddit, and YouTube. We included studies that focused on rare diseases in general, a single rare disease, a group of diseases including at least one rare disease, or caregivers for those with rare diseases. We excluded studies that focused on rare infectious diseases or rare diseases with a known nongenetic etiology, as research on these acquired diseases may focus on issues such as prevention, and therefore may not be applicable to most rare diseases.

Search strategy

We developed a sensitive search strategy in collaboration with an academic reference librarian (A.L.W.). Our rare disease search terms included general terms for rare disease as well as keywords based on the rare disease categories defined by the Genetic and Rare Diseases Information Center (GARD) and Orphanet [15]. Our social media search terms included synonyms for social media (e.g., “online forum”) as well as the names of the ten social media sites with the largest number of global users during our search period [14]. We adapted our search string for six databases: PubMed, Embase, and CINAHL (via EBSCO) to find biomedical literature; PsycINFO (via ProQuest) to find psychology literature; Communications & Mass Media Complete (via ProQuest) to find social media and communications literature; and Web of Science to find interdisciplinary literature. Search strings for each database can be found in Appendix A.
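As a rough illustration of how a two-concept search string of this kind can be assembled, the Python sketch below joins synonyms within each concept with OR and then combines the two concepts with AND. The term lists and the helper names (quote, build_search_string) are hypothetical placeholders chosen for illustration; they are not the strings used in this review, which are given in Appendix A, and each database would still need its own syntax adjustments.

```python
def quote(term: str) -> str:
    """Wrap multiword terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_search_string(concept_a: list[str], concept_b: list[str]) -> str:
    """Join synonyms with OR within a concept and combine the concepts with AND."""
    block_a = " OR ".join(quote(t) for t in concept_a)
    block_b = " OR ".join(quote(t) for t in concept_b)
    return f"({block_a}) AND ({block_b})"


# Hypothetical example terms (illustration only; see Appendix A for the real strings).
rare_disease_terms = ["rare disease", "orphan disease", "undiagnosed disease"]
social_media_terms = ["social media", "online forum", "Facebook", "Twitter"]

search_string = build_search_string(rare_disease_terms, social_media_terms)
print(search_string)
```

Keeping each concept as a separate list makes it straightforward to add or remove synonyms without touching the Boolean structure.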

To supplement our broad search, we also conducted targeted searches within the following journals that focus on rare or genetic diseases and/or online research: Rare Diseases, Intractable & Rare Diseases Research, Orphanet Journal of Rare Diseases, Rare Diseases of the Immune System, Rare Tumors, Journal of Genetic Counseling, Genetics in Medicine, and The Journal of Medical Internet Research. Finally, we manually reviewed the reference lists of all included articles, as well as any relevant reviews identified through our search, and screened all articles with keywords such as “rare disease” and “social media” based on our inclusion criteria.

Article selection

The results of our database and manual searches were exported into Zotero [16]. Duplicates and retracted papers were removed, and the remaining articles were uploaded into Covidence [17]. One reviewer (E.G.M.) screened each article for eligibility by title and abstract. Articles that clearly failed the eligibility criteria were excluded. Two reviewers (E.G.M. and M.C.H.) then independently reviewed the full text of the remaining articles to assess eligibility. Disagreements between reviewers were resolved through an iterative consensus process involving multiple rounds of deliberative discussion.

Data extraction

We extracted detailed study characteristics and recorded them in Microsoft Excel, version 16.45 [18]. One author (E.G.M.) extracted verbatim text from each article relevant to publication details, study aims, methods, participants, results, strengths, and limitations from all eligible studies. The verbatim text excerpts were then uploaded into Dedoose, where categorical variables were created and assigned by two independent reviewers (E.G.M. and G.F.) [19]. Variables were created to categorize study aims, methods used, study design, disease categories, specific disease(s) studied, role of social media, social media site(s) used, justification for social media sampling frame, and countries represented in the study. We used disease categories taken from the GARD website to categorize rare diseases into groups [2]. A full codebook with definitions can be found in Appendix B.

Data analysis

We conducted descriptive data analysis using Excel to generate summary statistics for the entire sample. We also summarized participant demographics across each study, as well as patient demographics when a given study participant was a caregiver.
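The cross-study summaries reported in the Results take the form of a mean and standard deviation of per-study percentages. A minimal Python sketch of that kind of calculation follows. The per-study values are invented for illustration and are not data from this review. The use of an unweighted mean and the sample standard deviation (statistics.stdev) are assumptions, since the exact formulas applied in Excel are not stated in the text.

```python
from statistics import mean, stdev

# Hypothetical per-study percentages of female participants (not data from this review).
percent_female_by_study = [55.0, 62.5, 71.0, 80.0, 90.5]


def summarize(values):
    """Return the unweighted mean and sample standard deviation across studies."""
    return mean(values), stdev(values)


avg, sd = summarize(percent_female_by_study)
print(f"{avg:.1f}% (±{sd:.1f}%)")
```

An unweighted mean gives every study the same influence regardless of its sample size; weighting by the number of participants would be an alternative design choice.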

RESULTS

Our initial search yielded 3,437 articles, and 12 more were found in our manual search. After removal of duplicates and screening, 120 articles were included (Fig. 1) [6, 8–12, 20–133]. Supplementary Table 1 includes detailed data for each study.

Fig. 1: PRISMA flow diagram detailing screening and selection of articles.

Final sample size = 120 articles.

Study characteristics

The studies reflected four broad goals, with some studies addressing multiple goals. The most common goal was the evaluation of the psychosocial challenges and needs of patients or caregivers, including topics related to mental health, social support, coping behaviors, and health-care access (n = 48, 40.0%), e.g., [29], [65], [88]. Many studies sought to evaluate patient physical health status or outcomes (n = 40, 33.3%), examining topics like survival comparisons between countries [21], the effectiveness of various treatments, e.g., [24], [58], [113], and phenotypic characterizations of diseases, e.g., [44], [53], [74]. Some studies aimed to gather information on patients’ use of social media itself (n = 34, 28.3%), e.g., [28], [38], [56], and 14 studies (11.7%) aimed to develop social media research methods, typically through feasibility studies, e.g., [27], [49], [70] (Table 1).

Table 1 Study characteristics.

The majority of studies were observational (n = 114, 95.0%), e.g., [6], [40], [73], and cross-sectional (n = 107, 89.2%), e.g., [35], [67], [126]. More than half (n = 69, 57.5%) used surveys to collect data from rare disease patients or caregivers, e.g., [32], [66], [105], and a third (n = 37, 30.8%) conducted secondary data analyses of existing social media content (e.g., posts, videos, tweets), e.g., [42], [63], [102]. Other methodologies included telephone, video, or in-person interviews (n = 10, 8.3%), e.g., [27], [35], [96], and/or clinical research (n = 11, 9.2%) such as physical evaluations or natural history studies, e.g., [40], [59], [74]. Five studies (4.2%) involved an online intervention in which a social media group was created for the purposes of the study [8, 20, 67, 81, 94] (Table 1).

The number of studies published annually increased over time. No studies were published before 2007. The year with the most studies published (n = 24) was 2020, even though our review only included studies published before 10 November 2020, and so did not capture the entire calendar year. The use of surveys increased the most from 2004 to 2020 compared to other methods (Supplementary Fig. 1).

Types and uses of social media

Eleven different social media sites were used across the 120 studies, and 22 studies (18.3%) used more than one platform, e.g., [73], [112], [120]. The most commonly used platforms were Facebook (n = 59, 49.2%), e.g., [77], [123], [131], and Twitter (n = 28, 23.3%), e.g., [9], [85], [100] (Table 2).

Table 2 Social media use characteristics.

Across all studies, 79 (65.8%) used social media for recruitment of study participants, including for online surveys, in-person studies, clinical trials, or phone, video, and in-person interviews, e.g., [33], [92], [96]. Of these 79 studies, 64 also used social media for primary data collection (including through a link to an online survey hosted by a different platform), e.g., [88], [106], [118]. Secondary data analysis of the existing content of social media sites was the second most common method used (n = 38, 31.7%), e.g., [10], [104], [120]. Only four studies used social media for intervention delivery. Examples of intervention delivery included the creation of a social media site to determine its impact on the psychological well-being of rare disease patients or caregivers [20, 22, 94] and the use of social media to provide postoperative care training for rare disease patients [8] (Table 2).

Diseases studied

The 120 studies included 101 different diseases. The majority of studies (n = 89, 74.2%) focused on a single rare disease, e.g., [21], [87], [128], while others focused on a subset of multiple rare diseases (n = 18, 15.0%) e.g., [50], [116], [117], or a mixture of rare and nonrare diseases (n = 7, 5.8%) [9, 45, 47, 57, 121, 126, 127]. A subset (n = 5, 4.2%) stated a focus on “rare diseases” but did not specify which diseases were included [11, 12, 27, 38, 60], and one of the studies (0.8%) focused on undiagnosed rare diseases (Fig. 2) [81].

Fig. 2: Diseases studied.

Pie chart of diseases studied by rare disease focus. Accompanying table of the most commonly studied rare diseases across all studies. ALS: amyotrophic lateral sclerosis, BPDCN: blastic plasmacytoid dendritic cell neoplasm.

Across all studies, cystic fibrosis (CF; n = 14, 11.7%) was the most frequently studied disease, e.g., [35], [75], [88], followed by amyotrophic lateral sclerosis (ALS; n = 7, 5.8%), e.g., [56], [57], [126]. Blastic plasmacytoid dendritic cell neoplasm (BPDCN) [100–103] and Huntington disease [10, 26, 41, 49] were each included in 4 (3.3%) studies, and Hirschsprung disease [8, 128, 129], myeloproliferative neoplasm [99, 103, 132], neurofibromatosis type 1 [20, 70, 110], and sickle cell disease [28, 82, 120] were each included in 3 (2.5%) studies (Fig. 2). All other diseases were represented in two or fewer studies. Eighteen (15.0%) of the studies focused on rare cancers, e.g., [64], [94], [100]. Supplementary Tables 2 and 3 contain a full list of diseases and disease categories included across studies.

Study participant characteristics

Of the 84 studies that included data collection from human subjects, 40.5% (n = 34) included only rare disease patient participants, e.g., [43], [83], [88]; 19.0% (n = 16) included only caregivers of rare disease patients, e.g., [20], [94], [96], [114]; and 39.3% (n = 33) included both caregivers and patients, e.g., [50], [81], [97]. Demographic reporting included patient self-report, caregiver self-report, and caregivers reporting the demographics of the patients they cared for. In addition, some studies included both patients and caregivers, but did not disaggregate patient demographic data by reporter.

Across all studies, race/ethnicity of both patients and caregivers was underreported compared to age and sex. Demographics of caregivers were reported less frequently than patient demographics (Fig. 3).

Fig. 3: Percentage of studies reporting age, race, and sex of patients and caregivers.

Patient group includes self-reported patients, caregiver-reported patients, and patients for whom the reporter is unclear.

Race/ethnicity

Only 20 studies reported information on patient race and/or ethnicity, e.g., [27, 70, 117, 133] (Fig. 3). For race/ethnicity, we summarized demographics across studies using sociopolitical categories (i.e., Black, White), though studies used a range of both social and ancestry-based terms [134]. Across all studies reporting patient race/ethnicity (n = 20 papers, including patient self-report, caregiver report of patient race/ethnicity, and patients for whom the reporter is not specified), a mean of 85.0% (±11.0%) of patients identified as White. For the subset of these papers in which rare disease patients self-reported their race/ethnicity (n = 12 papers), a mean of 88.6% (±12.3%) of patients identified as White. No studies reported the race/ethnicity of the rare disease patient when only the caregiver was reporting. When caregivers reported their own race/ethnicity (n = 6 papers), a mean of 78.6% (±22.0%) of caregivers identified as White (Table 3).

Table 3 Study participant demographics by reporter.

Reporting of non-White racial and ethnic categories was widely variable. Of the 26 studies that reported either patient or caregiver race/ethnicity, 10 reported only percentage White, e.g., [40], [58], [114], 5 reported only percentage White and percentage Hispanic [22, 27, 29, 48, 117], and 11 reported percentage White and at least one other race, e.g., [26], [70], [133]. Among the 16 studies reporting racial/ethnic categories other than White, an average of 4.6% of respondents identified as Black (±3.2%), 5.4% as Asian American (±3.7%), 15.4% as Hispanic (±27.7%), and 1.03% as Native American (±1.2%).

Sex

In studies with rare disease patient participants (n = 35 papers), an average of 80.0% were female (±21.0%) (Table 3). However, when we excluded studies that focused on diseases that disproportionately affected one sex or that used gender-biased recruitment methods (e.g., recruiting from a women’s group) (n = 17 papers), e.g., [35], [67], [125], the mean dropped to 63.8% (±15.6%) female.

In studies with only caregivers reporting on patient demographics (n = 17 papers), 52.9% reported the sex of the cared-for patient (n = 9 papers). For those studies that did report patient sex, a mean of 51.2% (±6.7%) of patients cared for were female. Caregivers reported their own sex in only 33.3% of studies (n = 16 papers out of 45 studies with at least some caregiver reporting), with an average of 77% female (±23.5%) across studies. One study had skewed recruitment because it recruited from a fathers’ group [94]. When this study was excluded, an average of 83% of caregiver participants were female across studies (±11.8%) (Table 3 and Fig. 3).

Across all studies (including patient-reported, caregiver-reported, and those in which the reporter was not specified) (n = 62 papers), a mean of 70.1% (±22.5%) of all patients were female, suggesting that rare disease research on social media may underrepresent male rare disease patients.

Age

In studies with rare disease patient participants (n = 36 papers), 91.7% (n = 33) reported patient age. Of these, all studies included adult patients, but only 12.1% (n = 4) also included pediatric patients (<18 years of age) [47, 65, 75, 130]. In studies with only caregiver participants (n = 17 papers), 88.2% (n = 15) reported the patients’ ages, e.g., [24], [37], [53]. Of these, all included caregivers of pediatric patients, and 66.7% of studies (n = 10) also included caregivers of adult patients, e.g., [11], [84], [92]. The majority of pediatric patients were therefore studied through their caregivers. The methods used to collect and report pediatric patients’ age varied widely; among those papers that reported a mean patient age (n = 7 papers), the average age of pediatric patient participants across studies was 7.6 years old (±5.7 years).

Across all studies that reported patient age, regardless of reporter (n = 67 papers), more studies included adult rare disease patients than pediatric rare disease patients (88.4% vs. 50.7%). Caregiver age was reported in 24.4% (n = 11) of papers that included at least some caregivers (n = 45 papers), and all caregivers were adults (Fig. 3).

Location

Fifty-four (45.0%) of the studies reported at least one country or region for study participants, e.g., [20]–[22]. The mean number of countries reported was 6.5 (±8.6), though 22.5% (n = 27) of studies included participants from only one country, e.g., [10], [40], [41]. The highest number of countries represented by one study was 40, though individual countries were not specified [70]. The majority of countries were Western and English-speaking, and the most frequently reported countries were the United States (n = 36), e.g., [126], [130], [131]; the United Kingdom (n = 20), e.g., [32], [35], [44]; Canada (n = 18), e.g., [21], [30], [36]; Australia (n = 16), e.g., [67], [68], [74]; and New Zealand (n = 9), e.g., [6], [30], [44]. This is not surprising given that our review was limited to studies published in English. See Supplementary Table 4 for the frequencies of all countries reported.

Participant sampling and sample size

All samples were convenience samples; no studies reported using a representative sampling frame. Over half (n = 72, 60.0%) of the studies did not provide justifications for their choices of social media platforms or sampling frames for participants, e.g., [26], [29], [85]. When reported, the most common justification for choice of social media was the size of a particular online community (e.g., “this platform has the most users”) (n = 25, 20.8%), e.g., [42], [70], [132], followed by an existing collaboration with a group or organization (n = 19, 15.8%), e.g., [23], [55], [76]. The remaining studies that provided justifications for their sampling indicated that they chose a certain platform because it included a format of information (e.g., videos) or type of data (e.g., group was unmoderated, information was public, participants were verified) that was specific to study goals (n = 14, 11.7%), e.g., [9], [57], [104].

Study sample sizes ranged from 2 to 4,860 people. Six studies claimed to have reached the largest cohort of their specific rare disease ever recruited for a single study, though we did not independently verify these claims [6], [42], [46], [55], [90], [117].

DISCUSSION

The results of this systematic review indicate that there has been a rapid increase in the use of social media for rare disease research over the last 13 years. However, this research is still limited in terms of goals, methods, and study designs, as well as its representativeness of the broader rare disease community, both in terms of disease type and patient demographics.

Our results indicate that, in rare disease research, social media has primarily been used for recruitment in observational, cross-sectional studies. This is in contrast to social media research in other fields, where researchers have developed methods for employing interventional and longitudinal designs in social media research, as well as strategies to help reach a more representative sample using social media [135, 136]. In our sample, a small number of studies used social media in unique ways, for example, by mining social media data to identify adverse reactions to medications in order to guide drug development for rare diseases [137], or as a component of communication for postdischarge follow-up with patients and caregivers [8]. Social media has the potential to be further utilized by those in the rare disease community—as it has in other fields, such as cancer prevention and adolescent health—to increase mental and physical well-being of patients and to share health information and emerging research [13, 138].

In rare disease research, the cohorts of patients recruited through social media may not represent the broader rare disease community in terms of gender and race/ethnicity. Despite social media use being nearly equal across racial groups and only slightly higher in women than men, White and female participants were overrepresented in our included studies [139]. This is consistent with other reviews that have found an overrepresentation of White and female participants in studies using social media [140, 141]. Race/ethnicity was also highly underreported and, when reported, used a range of sociopolitical and ancestry-based categories in direct contrast to recent guidelines [134]. This lack of diversity is a problem in genomics studies in general, as individuals identifying as Black, Native American, or Hispanic/Latino are rarely included in genome-wide association studies [142]. Our findings indicate that, in addition to being excluded from genome sequencing studies, racial and ethnic minorities also are excluded from studies of rare diseases using social media. This is concerning, as our review demonstrates that a key focus of these studies is the psychosocial and health needs and challenges of rare disease patients and their families. By excluding non-White populations, who may face additional challenges associated with racial bias in our health-care system and society more broadly, we are likely overlooking the needs of many rare disease patients and families.

Additionally, only 4 of the 120 studies in our sample included a pediatric perspective that came directly from the pediatric patient. This may be due to the challenges of obtaining parental consent for research using social media and suggests that different approaches may be needed to understand the pediatric perspective on rare disease. It is difficult to assess the extent to which participants recruited through social media research are representative of the broader rare disease community, as there is currently no unified system for tracking epidemiologic data on rare diseases in the United States [143]. Regardless, it is clear that rare diseases affect all races, genders, and ages, and therefore information drawn from cohorts that are mostly White, female, and adult might not be generalizable to the rare disease community at large.

Studies were not representative in terms of disease type, with only 101 (1.4%) of the estimated 7,000 rare diseases represented. Some diseases were also over- or underrepresented relative to their prevalence. For example, cystic fibrosis was by far the most studied disease (n = 14, 11.7%), while sickle cell disease, which has a prevalence in the United States of over three times that of cystic fibrosis (100,000 vs. 30,000) [2, 144], was only included in three studies. Further, many rare disease communities may not currently be reachable through social media, as suggested by a recent study showing that only one in five rare pediatric diseases has a disease-specific group on Facebook [145]. It is possible that patients without dedicated groups for their specific diseases are present in groups for the broader rare disease community, but more research is needed to understand the distribution of rare diseases across the landscape of social media. Over- or underrepresentation of certain rare diseases may bias the general body of rare disease research, making it more difficult to draw conclusions about the needs and characteristics of rare disease patients as a whole.
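The arithmetic behind these comparisons can be checked with a short Python sketch. All input figures are taken from the paragraph above. The variable names and the "studies per 100,000 patients" normalization are illustrative additions, not a metric used in this review.

```python
# Figures quoted in the paragraph above.
diseases_studied = 101
estimated_rare_diseases = 7000
cf_prevalence_us = 30_000     # cystic fibrosis, approximate US prevalence
scd_prevalence_us = 100_000   # sickle cell disease, approximate US prevalence
cf_studies = 14
scd_studies = 3

# Share of the estimated rare diseases that appear in at least one included study.
share_of_diseases = 100 * diseases_studied / estimated_rare_diseases

# How many times more prevalent sickle cell disease is than cystic fibrosis.
prevalence_ratio = scd_prevalence_us / cf_prevalence_us

# Illustrative normalization (an assumption): included studies per 100,000 US patients.
studies_per_100k_cf = cf_studies / cf_prevalence_us * 100_000
studies_per_100k_scd = scd_studies / scd_prevalence_us * 100_000

print(f"Share of rare diseases represented: {share_of_diseases:.1f}%")
print(f"SCD-to-CF prevalence ratio: {prevalence_ratio:.1f}")
print(f"Studies per 100,000 patients: CF {studies_per_100k_cf:.1f}, SCD {studies_per_100k_scd:.1f}")
```

Under this rough normalization, cystic fibrosis has many more included studies per patient than sickle cell disease, which is the imbalance the paragraph describes.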

Based on the findings of this review, we suggest a few key steps researchers can take to improve the quality and utility of future rare disease research using social media:

  1. Researchers can focus on increasing representativeness in study samples. They should explore new methods to reach underrepresented demographics, such as reaching out to social media groups that specifically include male or non-White rare disease patients and their families, and/or diversify their recruitment methods beyond social media. Researchers could also use social media to contact community groups (many of which have a presence on social media), and these groups could help recruit patients and families outside of the social media platform. If that is not possible, researchers should, at a minimum, discuss the potential implications of bias in their results.

  2. Researchers can more thoroughly report participant demographics and study methods. This should include clear reporting of race, gender, age, and nationality for both patients and caregivers. While there may be some occasions (for example, when studying ultrarare diseases) when concerns about confidentiality may be an obstacle to reporting detailed demographic data for a given study sample, researchers should still consider how gender, race/ethnicity, and age bias may influence their findings. Furthermore, new recommendations for publishing research in genetics and genomics have emphasized the need for authors to explicitly define race in genetics research, and for journals to provide clear guidelines for reporting race and other sociopolitical characteristics [134].

  3. Researchers can clearly report methodological details including the social media platform used. They should justify both their choice of platform and choice of sampling frame for recruitment. Social media is not a monolith, and different groups may attract different types of rare disease patients. While some groups are solely for rare disease patients, others cater specifically to family members or health-care providers; others are open to all these categories [5]. Researchers should carefully consider their study goals when choosing the type of social media group to target. For further guidance on rigor in research using social media, rare disease researchers can turn to standardized reporting checklists developed specifically for online research [146].

Our study has several limitations. First, although we developed a detailed search string and augmented our search with manual review (see Appendix A), there is no way to ensure we captured every study on every rare disease. This limitation is inherent in the study of “rare diseases,” which are, by their very nature, a large group of heterogeneous conditions that are methodologically challenging to study. However, we believe our carefully designed search string substantially mitigated this limitation. Second, our review only includes studies published in the peer-reviewed research literature, and therefore does not include studies that may have been conducted by advocacy or other nonprofit organizations. However, the peer-reviewed literature is widely considered to represent the most rigorous scientific work available and should be held to the high standards outlined in our conclusion. Third, our review was limited to studies published in English. Given the global burden of rare disease, we likely missed studies published in other languages, and therefore our review may lack a global perspective on how social media is used in rare disease research.

Conclusion

Social media is increasingly used to study hard-to-reach populations, including rare disease patients and their caregivers, in innovative and important ways. While social media is a potentially powerful tool, its current application in rare disease research is limited to primarily observational, cross-sectional studies using surveys to examine patient experiences and patient-reported outcomes. In addition, rare disease patients and caregivers reached by social media studies may not be representative of the rare disease population by gender or race/ethnicity, and represent only a small percentage of the over 7,000 identified rare diseases. As scholars explore new approaches to using social media for rare disease research, careful attention should be paid to representation within this large and diverse patient community.