Social connectivity is known to impact health. Social isolation is a predictor of mortality comparable to smoking, hypertension, and physical inactivity1. Social enrichment has a strong positive effect on biological2 and functional health outcomes3,4. Social connections are also potentially modifiable, making them ideal targets for changing habits such as smoking, exercise, and diet5.

Despite their promise in health, social networks are poorly understood in patient populations, and interventions aimed at networks are nascent. One main reason is a lack of clear definition of the network surrounding a patient6,7. Traditional social network metrics are actually summary indices of social support that query the total number of social contacts, social resources available, and community engagement8. Multiple clinical trials that have used such measures in patient populations have failed to demonstrate a change in patient outcomes9,10,11. A more precise set of measures is needed to map the specific people in the social system, one-by-one, and the nature of ties between persons to clarify a network's properties.

In this study, we introduce a social network assessment tool that quantifies patients’ personal network structure and health characteristics in a web-based, secure, and scalable form. The tool is a survey adapted from a validated instrument, the General Social Survey12, and captures the structure of social ties and composition of demographics and habits around the index patient. We demonstrate the utility of the tool by quantifying the personal networks of 1493 individuals at risk for multiple sclerosis. The participants are enrolled in the Genes and Environment in Multiple Sclerosis (GEMS) project, a prospective cohort study of people with first-degree family history of MS13. The goal of the GEMS project is to identify novel genetic and environmental risk factors, including the social environment. Prior work has shown that asymptomatic MS family members who have a high burden of genetic and environmental risk factors have evidence of diminished neurologic function14. Here, we show a relationship in the GEMS cohort between social network metrics and neurological disability. We demonstrate that quantifying social networks in large-scale clinical studies offers an effective platform to identify previously unknown social environment risk factors that are potentially modifiable.


Creating a scalable online tool to assess social networks

We designed a HIPAA-compliant structured social network questionnaire adapted primarily from the General Social Survey12,15 (Supplementary Methods 1). The schema of data acquisition and potential use is presented in Fig. 1. The questionnaire comprises ~48 questions with adaptation to responses. The estimated completion time of the questionnaire is 10–15 min. The questionnaire begins with three traditional name generators, in which participants named all people with whom they had discussed important matters, socialized, or sought support in the last 3 months. The number of people who could be named was not capped. Next, participants answered questions that evaluate the connections between each pair of the first ten persons in the network, including the strength of ties in three levels (strangers, weak, and strong). Finally, participants answered questions about the characteristics and health habits of each of the first ten persons in the network7. The online questionnaire was hosted on the Research Electronic Data Capture (REDCap) server, a secure web platform for administering questionnaires in clinical research16. A version of the instrument is available for use in the REDCap Shared Library. Code to analyze and visualize data created from the instrument is available on GitHub.

Fig. 1

Overview of data collection, analysis, and interventions. This flowchart shows the social network data acquisition, identification of modifiable elements in the social environment, and potential intervention strategies

The assessment generated two main categories of network metrics, structure and composition, based on graph theoretical statistics. Within the category of social network structure, size is the number of individuals in the network, excluding the index participant or “ego”. Density is a measure of connectivity of individuals in the network, calculated as the sum of ties, excluding the ego’s ties, divided by all possible ties17. Constraint is a more detailed version of density that quantifies the extent to which the ego’s connections are to individuals who are connected to one another. Effective size is the number of non-redundant members in the network18. Maximum degree is the highest number of ties by a network member, and mean degree is the average number of ties by a network member. Equations for these measures are available in Supplementary Methods 2.
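The structural metrics defined above can be illustrated with a minimal Python sketch. It is not the authors' R code, and the exact equations are in Supplementary Methods 2, which are not reproduced here. The sketch assumes that weak and strong ties both count as ties, that degree is counted among network members only (excluding the ego), that effective size uses Borgatti's simplification for unweighted ego networks, and that constraint follows Burt's standard formula with equal tie weights. The function name `ego_structure` and the input layout are hypothetical.

```python
def ego_structure(alters, ties):
    """Structural metrics for one personal (ego) network.

    alters: list of network member names (the ego is excluded).
    ties:   iterable of (a, b) pairs of members who know each other.
            Assumption: weak and strong ties are treated alike.
    """
    n = len(alters)
    if n == 0:
        return {"size": 0, "density": 0.0, "effective_size": 0.0,
                "max_degree": 0, "mean_degree": 0.0, "constraint": 0.0}

    # Undirected neighbor sets among network members only.
    nbrs = {a: set() for a in alters}
    for a, b in ties:
        if a != b:
            nbrs[a].add(b)
            nbrs[b].add(a)

    n_ties = sum(len(s) for s in nbrs.values()) // 2
    possible = n * (n - 1) // 2
    density = n_ties / possible if possible else 0.0

    degrees = [len(nbrs[a]) for a in alters]

    # Borgatti's simplification of Burt's effective size (binary ties).
    effective_size = n - 2 * n_ties / n

    # Burt's constraint on the ego. The ego is tied to every member,
    # so each member q has total degree len(nbrs[q]) + 1.
    constraint = 0.0
    for j in alters:
        direct = 1 / n  # share of the ego's ties invested in j
        indirect = sum((1 / n) * (1 / (len(nbrs[q]) + 1)) for q in nbrs[j])
        constraint += (direct + indirect) ** 2

    return {"size": n, "density": density, "effective_size": effective_size,
            "max_degree": max(degrees), "mean_degree": sum(degrees) / n,
            "constraint": constraint}


# Hypothetical network: A, B and C all know each other; D knows none of them.
metrics = ego_structure(["A", "B", "C", "D"],
                        [("A", "B"), ("B", "C"), ("A", "C")])
print(metrics)
```

In this toy network, half of the possible ties among members are present, and the isolated member D lowers both density and constraint relative to a fully connected network.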

Within the social network composition category, several metrics quantify the ratio of member characteristics in the network. For instance, the percent kin is the percent of individuals in the network who are family members. Standard deviation of age represents the spread of ages. The diversity of sex index is the mix of men and women in the network, according to the index of qualitative variation19, with a value of 1 indicating equal mix of men and women. The diversity of race is the mix of races similarly calculated. Importantly, the questionnaire also queries the health behavior environment around the participant by examining the percentage of the network members with negative health habits, including smoking, sedentary lifestyle, not visiting doctors regularly, and poor compliance with prescription medications. All compositional variables were created to account for network size. Specifically, the number of members that fit a category was divided by the network size to create the percentage.
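As an illustration of the compositional metrics, the following Python sketch computes a percentage, the standard deviation of age, and the index of qualitative variation (IQV). It is not the authors' R code. The IQV is written in its common textbook form, K/(K − 1) × (1 − Σ pᵢ²), where K is the number of possible categories, and the member data and function names are hypothetical.

```python
from collections import Counter
from statistics import stdev


def percent(flags):
    """Percent of network members for whom a yes/no characteristic is true."""
    flags = list(flags)
    return 100.0 * sum(bool(f) for f in flags) / len(flags)


def iqv(categories, k):
    """Index of qualitative variation for one categorical characteristic.

    categories: observed category of each network member.
    k:          number of possible categories (e.g. 2 for sex).
    Returns 0 when all members share a category and 1 for an equal mix.
    """
    categories = list(categories)
    n = len(categories)
    if n == 0 or k < 2:
        return 0.0
    sum_sq = sum((count / n) ** 2 for count in Counter(categories).values())
    return (k / (k - 1)) * (1 - sum_sq)


# Hypothetical four-member network.
members = [
    {"kin": True,  "sex": "F", "age": 34, "smokes": False},
    {"kin": True,  "sex": "M", "age": 61, "smokes": True},
    {"kin": False, "sex": "F", "age": 29, "smokes": False},
    {"kin": False, "sex": "M", "age": 45, "smokes": False},
]

print("percent kin:", percent(m["kin"] for m in members))
print("percent smokers:", percent(m["smokes"] for m in members))
print("SD of age:", stdev(m["age"] for m in members))
print("diversity of sex:", iqv((m["sex"] for m in members), k=2))
```

Because each percentage is divided by the network size, networks of different sizes can be compared on the same scale.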

Demonstrating network quantification in a nation-wide cohort

We assessed the social networks of 1493 GEMS participants from across the United States (Supplementary Fig. 1), which represented 57% of the cohort as of October 2016. In Table 1, we report the demographic and clinical information of the cohort at the time of the study, separated into subgroups of asymptomatic participants and participants with an MS diagnosis. Asymptomatic participants had a lower age on average than participants with an MS diagnosis, consistent with the previously reported demographics of the cohort13.

Table 1 Demographics and clinical characteristics of the participants

The primary outcome measure of functional disability was the MSRS-R, a self-reported outcome of functional disability validated for people with MS. The MSRS-R is a brief questionnaire that correlates with traditional clinical instruments20,21. The eight domains of MSRS-R include walking, using arms and hands, vision, speaking clearly, swallowing, cognition, sensation, and the bowel and bladder function for a maximum score of 32. In this cohort of primarily asymptomatic people at risk for MS, we chose MSRS-R as an outcome measure because few alternative self-reported outcome measures have the advantages of being concise and validated in early MS. As expected, the median MSRS-R score was higher in the MS group than in the asymptomatic group.

To visualize each participant’s social network structure, we plotted a montage of all participants’ networks, ranging from the smallest to the largest, with the strength of each tie highlighted in color (Fig. 2). The average network consisted of eight people who were densely linked (67% of all possible ties were present). Furthermore, an average of 44% of all network members were kin, 38% were supportive of the index participant, and there was a nearly equal mix of men and women (diversity score of 0.89 with one being an equal mixture of men and women). Race, on the other hand, was not varied within networks with a diversity score of 0, indicating that most members in a participant’s network were of the same race. Weak ties, denoting those who are less familiar with the participant, ranged from 20% to 67% depending on the measure. The percent of individuals who were known for less than 6 years by the respondent was 20% in asymptomatic persons and 12% in MS patients (P = 0.001, Wilcoxon rank-sum test), suggesting a reduction in recent acquaintances in participants with an MS diagnosis. Otherwise, differences in network structure and general network composition between asymptomatic and MS participants were small and not significant (Table 2).

Fig. 2

Structure of participants’ personal social network. Each small network has a black circle that represents the participant who is surrounded by white circles who are the network members. The lines connecting the circles are red if the relationship is strong and blue if the relationship is weak. Networks are arranged from the smallest (top left) to the largest (bottom right)

Table 2 Network characteristics

To visualize the milieu of health habits around the participant, we plotted a montage of all participants’ networks, ranging from the healthiest environment to the least healthy (Fig. 3). On average, the network composition with respect to health habits skewed toward social environments in which most network members have healthy habits. Seventeen percent of participants had personal networks in which all members were healthy. On average, the percent of network members who do not exercise was 33%, and this was the highest value out of the examined negative health habits. There was a weak negative correlation between network size and the percentage of network members with unhealthy habits (Pearson’s correlation = −0.13 ± 0.05, P < 0.0001). Because we did not detect differences in network composition with respect to healthy habits between asymptomatic and MS participants, we were able to pursue joint analyses of these two subgroups.

Fig. 3

Health habits in participants’ personal social network. In each network, a black circle is the participant, a white circle is a healthy social contact, and a red dot is an unhealthy social contact. Unhealthiness is defined as someone who does any of the following: smokes, does not exercise, does not visit doctors regularly, or is not compliant with prescription medications. Networks are arranged from least negative health influence (top left) to most negative health influence (bottom right)

Having established the basic properties of our data, we examined the relationship between network metrics and self-reported functional disability outcome. Given the number of network metrics and to account for multiple testing burdens, we grouped the network variables into structure and composition categories. We then used a permutation-based omnibus test to examine the associations of these two groups of network metrics with the MSRS-R. The observed distribution of P-values in the omnibus test was greater than chance for network composition (P < 0.0001, all; P = 0.008, asymptomatic subgroup; P = 0.001, MS subgroup), but not for network structure (P = 0.066, all; P = 0.14, asymptomatic subgroup; P = 0.25, MS subgroup) (Table 3, Fig. 4). Thus, our global assessments indicated that network composition, rather than network structure, was associated with self-reported functional disability based on the MSRS-R scores (Table 3).

Table 3 Relationship of the composite categories of network variables to MSRS-R in all participants
Fig. 4

Comparison of expected versus observed regression results. Quantile–quantile plot of expected versus observed P-values of composite network structure and network composition metrics in relation to neurological function and disability in the full cohort (a, b) and subgroups of asymptomatic (c, d) and MS participants (e, f). The expected P-values (-log10[P-value]) are shown on the x-axis and the observed P-values (-log10[P-value]) are shown on the y-axis. The dark gray area indicates the confidence interval range as generated by chance at a threshold of P = 0.10, and the light gray area indicates the range for P = 0.05. The observed values for composition, and not structure, are outside of the gray areas, suggesting that composition is associated with the MSRS-R score beyond chance after accounting for multiple testing burden and correlation structure of the composition variables

To deconstruct these global effects of the social network, we examined the association of individual network metrics with the MSRS-R, adjusting for sex, age, marital status, and years of education (Table 4). None of the network structure metrics were significantly associated with MSRS-R score, consistent with the global assessment. Two network composition features were significantly associated with MSRS-R score: the percent of network members who (1) do not go to a doctor regularly or (2) are deemed to have a negative health influence on the respondent. The strongest association was with the percent of network members who are deemed to have a negative health influence (β = 0.017 ± 0.005, P = 0.016, linear regression).

Table 4 Relationship of individual network variables to MSRS-R

In exploratory analyses, we examined the relationship between each individual’s Genetic and Environmental Risk Score (GERS) and her or his social network size. The GERS is an aggregate estimate of an individual’s MS risk based on validated genetic and environmental susceptibility factors. We have previously reported that the GERS is informative of MS risk beyond family history in the GEMS cohort of first-degree family members13. Using the published GERS based on previously reported genetic and environmental risk factor data available among a subset of the GEMS participants (n = 999 all, n = 920 asymptomatic subgroup, n = 79 MS subgroup), we noted an association in linear regression between larger network size and increased GERS (β = 0.82 ± 0.19, P = 2.43 × 10⁻⁵, all) (Supplementary Table 1). This finding appears to be driven by the larger network size of women participants relative to men. In a regression analysis, network size is inversely related to male sex (β = −1.87 ± 0.42, P = 8.71 × 10⁻⁶, all). Among asymptomatic participants, both a history of mononucleosis (β = 1.13 ± 0.40, P = 0.005) and a higher genetic risk score for MS susceptibility (β = 0.65 ± 0.24, P = 0.006) were also associated with a larger network size in the linear regression (Supplementary Table 1).


In this in-depth analysis of social networks in family members of MS patients, we demonstrate the ease and utility of deploying our online questionnaire that evaluates an individual’s social network in a structured manner. In a few weeks and using only electronic communication, we collected complete data on 1493 individual GEMS participants. This large data set allowed us to pursue analyses in a statistically robust manner and to produce highly significant results. These results represent an important milestone in studies of MS and other neurologic conditions with a long prodromal neurodegenerative phase by providing investigators with the key data needed to support power calculations and guide future study designs. In particular, we found that asymptomatic family members at risk of MS have enough variance in our measure of self-reported disability to yield strong association results with compositional but not structural variables. Most prominently, the health habits of persons in their social environment were strongly associated with the participant’s self-reported neurological dysfunction, and the percent of network members who have a negative health influence had the strongest association with disability. While these results need to be validated, they show (1) that studies of “at risk” individuals in which overt symptoms of a neurologic disease have not yet become manifest are feasible and (2) that network composition is an area that deserves further dissection in individuals at risk for MS and perhaps for other neurodegenerative diseases.

Our assessment adds to a growing list of web-based personal network surveys that translate the complexity and burdensome features of this type of questionnaire into a more usable and scalable form22. Two examples in public health include: (1) EgoWeb 2.023, an open-source software that may be used for motivational interviewing using network graphics and (2) OpenEddi24, a tool designed for interactive, tablet, or mobile-ready field collection of network data. Our tool is unique, in that it is a HIPAA-compliant data collection tool, able to be completed by patients without an interviewer, and has the capability to handle large volumes of data from clinical populations using electronic communications. The assessment also included questions customized for patients or at-risk individuals with a focus on social support and health-related behaviors of network members. These dimensions are critical for future planning of network interventions to improve health and quality-of-life outcomes in clinical settings.

One mechanism that may explain some of our findings is homophily, the tendency of individuals to associate with others who are similar to themselves. Similarity-breeding social connection has been described in other social network studies25. Race and ethnicity are the strongest linkage factors leading to homogeneous personal environments25, and we found this in our study as well. There are also many examples of health behavior homophily. Children’s social network composition is significantly associated with several aspects of children’s own health26. Latrine ownership in rural India is correlated with latrine usage among social contacts, after control of caste, education, and income27. An individual’s weight is influenced by obesity of spouses and same-sex social contacts28, and incident type 2 diabetes is associated with obesity in spouses29. Aspirin use is correlated with aspirin use among friends and family30. Taken together, these findings point to core human behaviors that are shared among like-minded social contacts, with eating and physical activity as major driving forces for these effects.

Two more mechanisms that may explain the association of network members’ health habits and the participant’s neurological disability are social contagion and antecedent exposures. Social contagion is a type of social influence in which behavior in one or many network members affects the behavior of the index participant. Detection of this effect requires longitudinal data and network modeling, such as stochastic actor-oriented or instrumental variable approaches, to understand the spread of behaviors through social ties. For example, one study shows the spread of physical activity in 1 million users of a smartphone running application31. Antecedent exposures influencing both parties may be another contributor. For example, rural environments with poor access to medical services may influence the habits of all members of the network with regard to seeking medical care. Finally, a combination of these factors may explain the association of poor health habits in the network and a person’s neurological disability.

The association between an individual’s susceptibility for MS, as determined by GERS, and social network size is a preliminary finding that requires further investigation. This may be explained by the inclusion of sex as a component of GERS13 and prior observation that women tend to have larger social networks15. However, the imbalance of men (19%) and women (81%) in this study potentially complicates the interpretation. Another explanation is that larger network size reflects broader exposure to infectious agents that are associated with MS susceptibility, such as history of infectious mononucleosis13. Indeed, we observed a positive association between mononucleosis and network size among asymptomatic participants. Finally, the role of genetic factors in network size is provocative, but the effect is modest and needs further investigation.

Our study has limitations. First, we were unable to establish causality and directionality of the associations or the mechanisms of homophily in this cross-sectional study. Within the GEMS platform, we are gathering longitudinal social network data. Second, the primary outcome measure of neurological disability (MSRS-R) was skewed toward low scores due to the larger proportion of self-reported asymptomatic participants in the GEMS cohort who have low scores in this instrument. This could reduce the precision of our analyses due to a floor effect. Further, the study may be underpowered to compare asymptomatic and MS subgroups, given the modest number of the MS cases (i.e., familial MS). Larger studies of individuals with sporadic MS will better answer whether social network variables influence disease worsening in MS. Third, unmeasured confounders that influence report of social networks and functional disability could have affected our findings. We attempted to address this limitation by adjusting for major factors reported in the literature, including age, sex, and marital status. Fourth, we ascertained social network metrics based on participants’ self-report of their social networks. While this approach may introduce unknown biases, prior work reassuringly had shown self-reported personal networks of intimate contacts to be accurate32. Finally, this study of the GEMS participants, who were recruited through advocacy groups, social media, and electronic communications, may not have broad generalizability because these participants are more socially engaged and better educated than the general population. Future studies of more diverse populations and other chronic neurological disorders will be critical.

The social environment is ubiquitous and important for understanding human disease etiologies and outcomes. Social network features, in general, represent an emerging group of metrics that inform aspects of health and disease, but are not currently well captured by many biomedical research studies. We outline an approach of quantitative social network analysis that is readily adaptable in clinical investigations. The questionnaire that we have developed for quantifying social networks is available through the open-source REDCap platform. In the empirical work described, we found that the health behaviors of persons surrounding an individual at risk for MS were associated with the individual’s own functional status. These results suggest that interventions aimed at modulating network composition through education or treatment of members in a social network hold the promise of a novel complementary approach to managing MS onset and disease course.


Study design and participants

In a cross-sectional design, we invited GEMS participants to complete an online questionnaire assessing social networks and current neurological disability in October 2016 (Supplementary Methods 1). The questionnaire was live for 6 weeks, with reminders sent to non-responders. At the time, the GEMS cohort included 2632 first-degree family members from across the United States recruited using patient advocacy groups, social media, and word-of-mouth13. The inclusion criteria were: being 18 to 50 years of age at enrollment and having at least one first-degree relative with a diagnosis of MS (e.g., parent, full-sibling, or child). While asymptomatic family members who are at risk for MS represent the main focus of the GEMS project, we also recruited family members who already have an MS diagnosis for comparison in this cross-sectional study. MS cases were confirmed by review of medical records. The institutional review boards of all participating sites (Partners HealthCare, National Institutes of Health, and University of Pittsburgh) approved the study. All participants provided written informed consent.

Statistical methods

To compare the demographic characteristics between asymptomatic participants and confirmed MS cases, we performed a t test for age, chi-squared tests for dichotomous variables of sex, marital status, and living alone, as well as non-parametric Wilcoxon rank-sum tests for years of education and MSRS-R. Similarly, we performed non-parametric Wilcoxon rank-sum tests to compare network metrics between asymptomatic participants and participants with MS diagnosis.
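For readers unfamiliar with the rank-sum comparison used for the network metrics, the following Python sketch shows one way to compute it. It is not the authors' R code. It assumes a two-sided test based on the normal approximation with a tie correction and no continuity correction, so P-values can differ slightly from the output of statistical packages, and the function names and sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties share their average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation.

    Returns (U statistic for group_a, approximate P-value).
    """
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(group_a) + list(group_b))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:  # all values identical, no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical values: percent of recently met contacts in two small groups.
asymptomatic = [25, 30, 10, 40, 20, 33]
ms_cases = [10, 0, 12, 20, 5]
print(rank_sum_test(asymptomatic, ms_cases))
```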

To assess the association with MSRS-R score, we performed a linear regression for each network variable, adjusting for age, sex, and marital status. In this analysis, MSRS-R was modeled as the dependent variable and each network characteristic as the independent variable. Within each network metrics category (structure and composition), we calculated the false discovery rate to adjust for multiple testing. To examine any potential bias due to non-normal distributions, we performed a sensitivity analysis applying non-parametric Spearman correlation tests.
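The false discovery rate adjustment within a category can be sketched in Python as follows. The text does not name the procedure, so this sketch assumes the widely used Benjamini-Hochberg method. It is not the authors' R code, and the P-values shown are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Assumption: the false discovery rate is controlled with the
    Benjamini-Hochberg step-up procedure.
    """
    m = len(pvals)
    # Walk from the largest P-value to the smallest, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank of pvals[i] in ascending order (1-based)
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values for the metrics in one category (e.g. composition).
composition_pvals = [0.010, 0.040, 0.030, 0.005]
print(bh_adjust(composition_pvals))
```

Applying the adjustment separately to the structure and composition categories mirrors the grouping described above, so each category carries its own multiple-testing burden.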

To examine the hypothesis that as a category, social network variables were associated with the MSRS-R score, we performed an empirical omnibus test. In the first stage of this analysis, we calculated the P-values of association between each network variable and MSRS-R score using linear regression as described above. In the second stage, we used a Fisher’s meta-analysis to combine these P-values and calculate a chi-squared statistic. We then compared this chi-squared statistic to an empirical distribution of chi-squared statistics as generated by 10,000 random permutations. By permuting the MSRS-R score, we maintained the correlation structure of the network variables. The empirical omnibus P-value was then calculated as the number of times that the chi-squared statistic from the 10,000 permutations was greater than the true chi-squared statistic divided by the total number of permutations. To generate a quantile–quantile plot, we plotted the observed −log10 (P-value) of each association between a network variable and MSRS-R score against the expected −log10 (P-value). The 90% and 95% empirical confidence intervals were determined using empirical P-values as generated by the 10,000 permutations. We performed the omnibus test in all participants as well as in the subset of asymptomatic participants and the subset of participants with MS diagnosis.
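The two-stage omnibus procedure described above can be sketched in Python. It is not the authors' R code and makes one loudly simplifying assumption: the per-variable P-values come from an unadjusted Pearson correlation with a Fisher z approximation, not from the covariate-adjusted linear regressions used in the study. Fisher's combined statistic is −2 Σ ln(Pᵢ), and only the outcome is permuted so that the correlation structure among the network variables is preserved. All function names and data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def association_pvalue(x, y):
    """Two-sided P-value for a correlation (Fisher z approximation).

    Stand-in for the covariate-adjusted regression P-value; needs n > 3.
    """
    r = max(min(pearson_r(x, y), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def fisher_statistic(pvals):
    """Fisher's combined chi-squared statistic, -2 * sum(ln P)."""
    return -2.0 * sum(math.log(max(p, 1e-300)) for p in pvals)


def omnibus_test(variables, outcome, n_perm=10000, seed=0):
    """Permutation-based omnibus test for one category of variables."""
    rng = random.Random(seed)
    observed = fisher_statistic(
        [association_pvalue(v, outcome) for v in variables])
    shuffled = list(outcome)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the outcome only
        permuted = fisher_statistic(
            [association_pvalue(v, shuffled) for v in variables])
        if permuted > observed:  # "greater than", as in the text
            exceed += 1
    return observed, exceed / n_perm


# Hypothetical data: two network variables and an outcome for 50 people.
rng = random.Random(1)
outcome = [rng.gauss(0, 1) for _ in range(50)]
related = [v + rng.gauss(0, 0.5) for v in outcome]
unrelated = [rng.gauss(0, 1) for _ in range(50)]
print(omnibus_test([related, unrelated], outcome, n_perm=500, seed=2))
```

Because the statistic is compared with its own permutation distribution, the test does not rely on the independence assumption behind the usual chi-squared reference distribution for Fisher's method, which matters when the network variables are correlated.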

In exploratory analyses, we assessed the relationship of GERS (a published estimate of MS risk based on an individual’s known genetic burden and environmental exposures for MS susceptibility) and social network metrics. Here, we performed linear regressions adjusting for age, modeling network size as the dependent variable, and the GERS (and its components: history of infectious mononucleosis, sex, smoking status, environmental risk score, and genetic risk score) as the independent variables. All analyses were performed in R version 3.2 (ref. 33). All statistical tests were two-sided. Given the exploratory nature of the analysis and data, power calculations were not performed prior to analysis. Permutations and nonparametric tests were used to avoid bias due to any non-normal data or unequal variances between groups, as necessary.

Code availability

An updated version of the instrument called “Personal Network Survey for Clinical Research” is available in the REDCap Shared Library. We have also uploaded to GitHub a comprehensive R codebase for researchers who use the instrument to analyze and visualize their data. R code used specifically for this project can be made available upon request.