Introduction

Autoimmune (AI) diseases, such as systemic lupus erythematosus (SLE), rheumatoid arthritis (RA) and multiple sclerosis (MS), are the consequence of an individual’s immune system inappropriately responding to self-tissue1,2. While a polygenic predisposition contributes to disease risk for AI diseases3,4,5, AI diseases are relatively uncommon (for instance, crude incidence rates of SLE are 5.3/100,000 person-years in a European ancestry population)6. This is due, in part, to the fact that an environmental exposure is often required to trigger the production of disease-facilitating autoantibodies7.

The diagnostic criteria for AI diseases typically require the presence of a minimal number of stereotypical clinical features, and it may take many years for an individual to develop enough symptoms to meet the diagnostic criteria for an AI disease8,9,10,11. Thus, the prevalence of individuals with subclinical or variably penetrant disease may be higher than prevalence estimates suggest12.

White blood cell (WBC) counts in peripheral blood are a quantitative trait, and increased or decreased counts are part of the stereotypical disease process for AI disorders13,14. In general, quantitative traits are more sensitive than binary traits (e.g., disease status) for detecting associations. Thus, we examined whether polygenic predictors of AI diseases were associated with WBC counts measured in large, unselected populations. This tests the hypothesis that, for relatively rare conditions such as AI diseases15, significant associations can be detected in populations where epidemiological studies predict a low disease prevalence, if the disease is in fact more common than expected. If these associations reflect disease mechanisms, they should follow a pattern consistent with the known epidemiology of the disease; consequently, the magnitude of the associations would differ between men and women.

We examined several AI diseases: SLE is an uncommon AI disease16 with a strong genetic component3 that is characterized by a decrease in WBC counts in response to disease activity13,17; RA is a more common AI disease than SLE18 that may exhibit moderate increases in WBC counts during active disease14; and, because there is substantial genetic overlap among AI diseases4,19,20, we also included other common AI diseases: ulcerative colitis (UC), Crohn’s disease (CD), MS, type 1 diabetes (T1D), and autoimmune thyroiditis (AIT). We observed that a genetic predisposition to several AI diseases was associated with WBC counts, with larger effects in females.

Methods

Figure 1 summarizes the research questions and study design.

Figure 1

Summary of the research question and study design. Summary-level data from genome-wide association studies (GWAS) were extracted for 7 autoimmune (AI) diseases to develop and validate genetic instruments by testing their association with the respective AI disease in BioVU using disease-specific polygenic risk scores (PRS) and PheCodes. Validated instruments were used as exposures in the Mendelian randomization (MR) analyses to test the genetic association between AI diseases and white blood cells (WBCs). Significant associations in the MR analyses were validated by testing the association between disease-specific PRS and transformed measured WBC counts in ARIC and BioVU. Stratified analyses by sex were performed to measure the magnitude of association in males and females.

Study population

Individual-level phenotype data were obtained from BioVU, the Vanderbilt University Medical Center’s (VUMC) Biobank. A detailed description of BioVU has been published elsewhere21. Briefly, BioVU accrues DNA from samples obtained during routine clinical care from patients who have consented to have a DNA sample stored. DNA is extracted from discarded samples and linked to a de-identified version of the electronic health record (EHR) at VUMC. Individuals included in this retrospective observational study were previously genotyped on the Illumina Infinium Multi Ethnic Genotyping Array (MEGAEX) platform (described below) as part of a broad-based institutional genotyping initiative. The analyses were restricted to participants between 18 and 65 years of age of White European Ancestry, as determined by HAPMAP reference populations and principal components. All participants included in the WBC count analyses had at least one WBC count recorded in their EHR collected on the same day as a health maintenance exam (a wellness visit for routine clinical care).

The Atherosclerosis Risk in Communities (ARIC) study comprised 8926 unrelated genotyped adults of White European Ancestry aged between 45 and 64 years old and followed from 1986 through 201522. ARIC data were obtained from dbGaP (phs000280).

Approval for the present study was obtained from the VUMC Institutional Review Board.

The BioVU and ARIC cohorts represent a clinical- and community-based cohort, respectively. It would be expected that AI diseases would be enriched in a clinical cohort, as compared to a community cohort in which the prevalence of the diseases would be expected to be more comparable to the general population.

Phenotype data

In BioVU, WBC counts were extracted from the EHR. Analyses were restricted to WBC counts between 1.5 and 25 × 1000 cells/mm³ collected as part of a “health maintenance” exam (defined as a clinical encounter associated with a wellness visit denoted by any of the following ICD-9/ICD-10 codes: V70.9, V20, V20.1, V20.2, V70, V70.0, Z00.8, Z00.129, Z00.00). Age was defined as the age at the time of the WBC measurement. For participants with multiple WBC measurements (83.8%), the median WBC value was used and age was defined as the median age across those measurements. Clinical diagnoses for the AI diseases were defined using PheCode phenotypes (https://phewas.mc.vanderbilt.edu/), which are collections of related ICD-9/10-CM (International Classification of Disease) diagnosis codes23,24. Cases were defined as individuals with two or more instances of a PheCode in their medical record25,26. Controls were defined as those without the PheCodes. For RA, we also excluded those with juvenile RA from the controls. The following PheCodes were employed: SLE (695.40, 695.41, 695.42), RA (714.1), UC (555.20, 555.21), CD (555.1), MS (335), T1D (250.10, 250.11, 250.12, 250.13, 250.14, 250.15), AIT (245.21).
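The phenotype rules above can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the function names, the tuple-based record layout, and the choice to count every recorded PheCode entry as one "instance" are assumptions made for the example.

```python
from collections import Counter, defaultdict
from statistics import median

# Accepted WBC range from the text, in units of 1000 cells/mm^3.
WBC_MIN, WBC_MAX = 1.5, 25.0


def summarize_wbc(measurements):
    """Collapse repeated measurements to one record per participant.

    measurements: iterable of (participant_id, age_at_measurement, wbc).
    Values outside the accepted range are dropped; the median WBC and the
    median age of the remaining measurements are kept.
    """
    by_person = defaultdict(list)
    for pid, age, wbc in measurements:
        if WBC_MIN <= wbc <= WBC_MAX:
            by_person[pid].append((age, wbc))
    return {
        pid: {
            "median_age": median(age for age, _ in rows),
            "median_wbc": median(wbc for _, wbc in rows),
        }
        for pid, rows in by_person.items()
    }


def case_control_status(participant_ids, phecode_events, disease_codes):
    """Label each participant as 'case', 'control', or None (excluded).

    phecode_events: iterable of (participant_id, phecode) entries.
    Assumption: each entry counts as one instance. Two or more instances
    define a case, zero instances a control, and exactly one instance is
    treated as neither.
    """
    counts = Counter(pid for pid, code in phecode_events if code in disease_codes)
    status = {}
    for pid in participant_ids:
        n = counts.get(pid, 0)
        status[pid] = "case" if n >= 2 else "control" if n == 0 else None
    return status
```

Treating participants with exactly one PheCode instance as neither cases nor controls follows from the two definitions in the text, which leave that group unassigned.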

In the ARIC cohort, WBC counts collected at visit 1 whose values fell within the range used for BioVU (between 1.5 and 25 × 1000 cells/mm³) were examined.

Genetic data

SNP genotyping of BioVU subjects was performed on the Illumina Infinium Multi-Ethnic Genotyping Array (MEGAEX) platform. Quality control (QC) analyses used the PLINK v1.90β3 software27. One of each pair of individuals with a second or higher degree of relatedness was excluded. Prior to imputation, genetic data were filtered and standardized through the HRC-1000G-check tool v4.2.5 (http://www.well.ox.ac.uk/~wrayner/tools/) and pre-phased using Eagle v2.4.128. Principal components (PCs) were calculated using the SNPRelate package29. Data were imputed using the Michigan Imputation Server in conjunction with the 10/2014 release of the 1000 Genomes cosmopolitan reference haplotypes30. Imputed data were filtered for a sample missingness rate < 2%, a SNP missingness rate < 4% and SNP deviation from Hardy–Weinberg equilibrium at P-value < 10⁻⁶. After QC, 7,585,258 SNPs were available for analysis. Polygenic risk scores were calculated using PLINK v231. Quality control for the ARIC data set followed the guidelines in the dbGaP release and used PLINK version 1.07 to obtain a sample of unrelated individuals27. ARIC data underwent the same QC and imputation protocol as BioVU.

GWAS summary statistics

Summary statistics for SLE (7219 cases and 15,991 controls)3, RA (29,880 cases and 73,758 controls)4, UC (4176 cases and 9500 controls)5, CD (4474 cases and 9500 controls)5, MS (9772 cases and 17,376 controls)32, T1D (18,942 cases and 501,638 controls)33, and AIT (15,654 cases and 379,986 controls)34 were obtained from publicly available large-scale GWAS performed on individuals of European ancestry and were used to define genetic instruments to construct polygenic risk scores (PRS) for each AI disease. The WBC GWAS summary statistics were from a European ancestry subset of a study of blood cell traits in 746,667 participants across 5 global populations and were used to test the associations with the aforementioned seven AI diseases35.

Analysis

For each AI disease, genetic instruments were constructed using SNP associations derived from GWAS. Independent SNPs were selected using a pruning-and-thresholding algorithm that selected an LD-reduced set of SNPs (r² < 0.05)36 with a minor allele frequency (MAF) > 5% and an association P-value < 5 × 10⁻⁸ using PLINK v2.
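The selection logic can be illustrated with a greedy sketch. This is not PLINK's implementation, which also applies physical-distance windows and other options; the function name, the dictionary keys, and the `r2` lookup callable are hypothetical, and only the three thresholds come from the text.

```python
def prune_and_threshold(snps, r2, p_max=5e-8, maf_min=0.05, r2_max=0.05):
    """Greedy selection of an LD-reduced set of genome-wide significant SNPs.

    snps: list of dicts with keys 'id', 'p' (GWAS P-value) and 'maf'.
    r2:   callable (id_a, id_b) -> pairwise LD r^2 (hypothetical lookup).
    """
    # Thresholding: keep common SNPs that pass the significance cutoff.
    candidates = [s for s in snps if s["p"] < p_max and s["maf"] > maf_min]
    # Pruning: walk from the strongest association down, keeping a SNP only
    # if it is in low LD with every SNP already kept.
    candidates.sort(key=lambda s: s["p"])
    kept = []
    for snp in candidates:
        if all(r2(snp["id"], other["id"]) < r2_max for other in kept):
            kept.append(snp)
    return [s["id"] for s in kept]
```

Ordering candidates by P-value means that, within a block of correlated variants, the most strongly associated SNP is the one retained.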

A Mendelian randomization (MR) framework was used to test for associations between genetic instruments for AI diseases and WBC count. MR is an instrumental variable approach used to explore etiological relationships between exposures and outcomes37,38,39. It employs SNPs associated with a chosen exposure as instrumental variables that define the direction and magnitude of associations between the exposure/risk factor and a chosen outcome38,39. The general assumptions of the MR approach require that the genetic instruments (1) are associated with the exposure or risk factor of interest, (2) are not associated with confounders of the exposure-outcome association, and (3) are not associated with the outcome conditional on the risk factor and confounders38,39.

The first assumption that the genetic instrument was associated with the AI disease of interest was ascertained by testing its association with cases and controls for the corresponding clinical diagnosis in BioVU using PheCode phenotypes24. A polygenic risk score (PRS) was computed for each AI disease by summing the product of each SNP effect size and the SNP dosage in BioVU participants. The PRS was standardized to have a mean of 0 and a standard deviation (SD) of 1. The association between the PRS and the PheCode phenotype was tested using a logistic regression model adjusted for sex, median age, and 10 PCs as covariates. Associations represent odds-ratios per 1 SD change in the PRS. An association P-value < 0.05 was considered significant.
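As a rough illustration of the score described above, the sketch below sums weight × dosage per participant and then standardizes the result. The function name and list-of-lists input are assumptions for the example, and the use of the sample standard deviation is a choice the text does not specify.

```python
from statistics import fmean, stdev


def polygenic_risk_scores(dosages, weights):
    """Standardized PRS for each participant.

    dosages: one row per participant, one effect-allele dosage (0 to 2)
             per SNP, in the same SNP order as `weights`.
    weights: per-SNP effect sizes (log odds-ratios) from the disease GWAS.
    """
    # Raw score: sum over SNPs of effect size times dosage.
    raw = [sum(w * d for w, d in zip(weights, row)) for row in dosages]
    # Standardize to mean 0 and SD 1 (sample SD is an assumption here).
    mu, sd = fmean(raw), stdev(raw)
    return [(x - mu) / sd for x in raw]
```

Because the score is standardized, a regression coefficient on it is read per 1 SD change in the PRS, which matches how the odds-ratios in the text are reported.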

A two-sample MR approach was used to test the association between selected AI diseases and WBC count using inverse-variance weighted regression (IVWR). To ensure that significant associations were not due to pleiotropy, sensitivity analyses were performed using the pleiotropy-robust MR-Egger and weighted median methods to confirm the magnitude and direction of associations38,39,40. When horizontal pleiotropy was detected, MR-PRESSO was applied to assess and correct for horizontal pleiotropy through outlier removal41. Association analyses were performed using the genetic instrument for each AI disease as the exposure, and SNPs from the GWAS of WBC count as the outcome. Effect sizes represent the change in transformed WBC count per unit change in the log odds-ratio of the AI disease. A Bonferroni-adjusted association P-value ≤ 0.05/7 (0.007) for the IVWR effect size was considered significant. In secondary analyses, we examined genetic instruments that were specific to the AI disease of interest by excluding SNPs shared with any of the other AI diseases at P-value < 5 × 10⁻⁴. All MR analyses were performed using the MendelianRandomization R package37.
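For orientation, a fixed-effect inverse-variance weighted estimate can be computed from per-SNP summary statistics as sketched below. This is a simplified stand-in under stated assumptions, not the routine used in the study: the R package offers additional model choices (for example, random-effects variants) that can yield different standard errors, and the function name and inputs here are hypothetical.

```python
from math import erfc, sqrt

# Significance threshold from the text: 0.05 divided by 7 AI diseases.
BONFERRONI_ALPHA = 0.05 / 7


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate, its standard error, and a two-sided P-value.

    beta_exposure: per-SNP log odds-ratios for the AI disease.
    beta_outcome:  per-SNP effects on transformed WBC count.
    se_outcome:    per-SNP standard errors of the outcome effects.
    """
    num = sum(bx * by / se**2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    estimate = num / den
    std_err = sqrt(1.0 / den)
    # Two-sided P-value from a normal approximation.
    p_value = erfc(abs(estimate / std_err) / sqrt(2.0))
    return estimate, std_err, p_value
```

The estimate is equivalent to a weighted regression of the outcome effects on the exposure effects through the origin, so a negative value means that a higher genetic liability to the disease goes with a lower WBC count.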

Significant associations from the MR analyses were validated by testing the associations between the PRS for the associated AI disease and measured WBC counts in two independent data sets (ARIC and BioVU). Because the distribution of WBC counts was skewed, log-transformed WBC counts were the primary outcome variable. The PRS was computed as described above. The association between log-transformed WBC counts and the PRS was tested using a linear regression model adjusted for sex, age, and 10 PCs as covariates. A Bonferroni-adjusted P-value (< 0.05 divided by the number of AI diseases tested) was considered significant. Sex-stratified analyses were also performed.

The BioVU population included prevalent cases of the selected AI diseases. Thus, our primary analysis was performed among participants without a recorded diagnosis of the AI diseases of interest. Case definitions were based on PheCode phenotypes24,25, as described above.

Data are shown as frequency (percentage) for categorical variables and median [interquartile range] for continuous variables; and associations are shown as effect size (95% confidence interval).

Ethics approval and consent to participate

Per Vanderbilt University IRB approval.

Results

Study design

Figure 1 summarizes the study design. GWAS summary data were extracted to develop genetic instruments for the seven AI diseases. We constructed polygenic risk scores for each AI disease and validated them by testing their association with their respective disease in BioVU. Cases were defined by two or more PheCodes for the AI disease. Validated genetic instruments were used as exposures in the MR analyses to test the genetic association between each AI disease and WBC counts using summary-level GWAS data. If the AI disease was significantly associated with WBC counts in the MR analyses, the relationship between the PRS for the AI disease and measured WBC counts was studied in ARIC and BioVU, and stratified analyses by sex were performed.

Validation of the genetic instruments for each AI disease

To confirm that the selected SNPs were associated with the respective exposure (and were thus valid instruments for use in the MR), we constructed SNP-based genetic instruments for each AI disease and tested their associations with cases and controls for the respective disease. Each PRS was significantly associated with the corresponding AI diagnosis in BioVU (P-value < 1 × 10⁻¹⁰ in all 7 AI diseases, Supplementary Table 1).

Mendelian randomization

In the IVWR analysis, three genetic instruments were significantly associated (P-value < 0.007) with WBC counts: SLE, MS, and RA (Fig. 2, Table 1). For each 1-unit increase in the log-odds of a diagnosis of SLE, there was a corresponding change of −0.05 (95% CI −0.06, −0.03) units in transformed WBC count. The change for MS was −0.05 (95% CI −0.07, −0.02) and for RA was 0.02 (95% CI 0.01, 0.03) (Table 1). Sensitivity analyses for RA identified a significant MR-Egger intercept (P-value = 0.001), suggesting a possible bias in the IVWR estimate. The MR-Egger association statistic for RA was 0.05 (95% CI 0.03, 0.07) (Supplementary Table 2). The MR-PRESSO analysis for RA identified 47 outliers that inflated the estimate by 91% (P < 2 × 10⁻⁶). The exclusion of these outliers returned an association statistic of 0.010 (95% CI 0.003, 0.02), P = 0.005.

Figure 2

Scatter plots of the inverse-variance weighted regression for systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), and multiple sclerosis (MS) with WBC counts (log transformed). X axes represent the log-odds of the genetic instruments on disease risk (SLE, RA, and MS); Y axes represent the effect of the genetic instruments on transformed white blood cell (WBC) counts. IVW inverse-variance weighted regression.

Table 1 Genetic associations between genetic instruments for 7 common autoimmune diseases and white blood cell counts using inverse variance weighted (IVWR) method.

Polygenic risk score (PRS)

We tested associations between a PRS for SLE, MS, and RA and measured WBC counts in a community-based cohort (ARIC). The ARIC population included 8,926 individuals; 47% were male, with a median age of 54 [IQR: 49, 59] years and a median WBC count of 6.0 × 10³ [5.0, 7.3] cells/mm³. A higher PRS for SLE and MS was associated with a lower log-transformed WBC count (SLE: −0.01, 95% CI −0.017, −0.005 per SD increase in the PRS; MS: −0.01, 95% CI −0.017, −0.005) (Table 2). For RA, the direction of association was consistent with the results of the IVWR analyses, but the association was not significant (0.004, 95% CI −0.001, 0.010).

Table 2 Associations between autoimmune polygenic risk scores and measured transformed white blood cell counts in ARIC and BioVU.

We performed similar analyses in a medical center-derived cohort (BioVU). The BioVU population included 41,442 individuals; 40% were male, with a median [IQR] age of 49 [36, 57] years and a median [IQR] WBC count of 7.5 × 10³ [6.1, 9.3] cells/mm³ (Supplementary Table 3). Among those with > 1 WBC count measurement, the median [IQR] duration between first and last measurements was 7.0 [2.4–12.3]. After excluding cases for each respective AI disease, we found similar associations with PRS for the 3 AI diseases in BioVU compared to the ARIC cohort (Table 2), with modest improvement of the estimates when cases and non-cases were analyzed together (Supplementary Table 4).

To determine whether the PRS for the 3 AI diseases exhibit the same association with WBC counts in males and females, we first confirmed the association of each PRS with its respective disease in both sexes (Supplementary Table 5). Then, we examined the magnitude of the point estimates for associations between the PRS for the 3 AI diseases and measured WBC counts stratified by sex. In the ARIC cohort, the effect size per SD change in the PRS was larger for females for SLE and MS, but not for RA (Fig. 3, Supplementary Table 6). A similar pattern was seen in BioVU, except that the effect sizes were larger for females for each AI disease (Fig. 3).

Figure 3

Association between polygenic risk scores for select autoimmune diseases and log transformed white blood cell counts, by sex, in ARIC and BioVU. Participants with a diagnosis of the respective autoimmune disease were excluded in BioVU. SLE systemic lupus erythematosus, MS multiple sclerosis, RA rheumatoid arthritis.

Discussion

We studied the association between polygenic predictors of select AI diseases and WBC counts. Associations examined using an MR framework (IVWR analysis) showed that polygenic predictors for SLE and MS were inversely associated with WBC counts, while a polygenic predictor for RA showed a positive association with WBC counts. Consistent genetic associations were also observed using directly measured WBC counts in a cohort where the prevalence of AI diseases would be expected to be very low (ARIC) and in a clinical population where individuals with known AI diseases were excluded (BioVU). These associations were primarily seen among women, in whom incidence rates of these diseases are known to be higher.

In support of the validity of these findings, we showed that the genetic instruments were associated with the expected phenotype; in addition, we tested for horizontal pleiotropy and calculated a new estimate excluding outliers in our sensitivity analysis. We took a two-step approach to understand the relationship between AI diseases and WBC count. First, we used MR as a screen to determine whether a change in WBC count is a downstream consequence of an AI disease. Then, we extended these findings by examining the associations between a polygenic risk score and measured WBC counts in subpopulations derived from a hospital-based (BioVU) and a community-based (ARIC) cohort.

AI diseases associate with higher or lower levels of WBCs through differing mechanisms. Reduced WBC counts are observed in up to 50% of individuals with SLE and are one of the features contributing to the classification criteria17,42,43. The WBC subtypes most impacted are neutrophils and lymphocytes13,17, and both lymphopenia (low lymphocyte counts) and neutropenia (low neutrophil counts) can be caused by several factors, including disease severity13,17 as well as autoantibodies directed against these cells44,45,46. In addition, lower WBC counts are secondarily associated with the use of immunomodulating treatments, hypersplenism and viral infections46,47.

While MS is typically not associated with WBC abnormalities, an increase in WBC counts at disease onset and relapse was observed in individuals with MS compared to matched healthy controls in a small clinical study, but this increase did not correlate with disease activity48.

RA, on the other hand, is associated with mildly elevated WBC counts, and levels often correlate with active disease14,49. The higher WBC counts in RA are likely due to elevated levels of neutrophils, which are important mediators of inflammation in the joint, a characteristic of this disease14. WBC count increases may also be secondary to treatment with corticosteroids50. A rare complication of RA is Felty syndrome, which is associated with neutropenia and splenomegaly51,52.

For RA and SLE, the directions of the genetic associations observed in this study are consistent with clinical observations. In general, the association between SLE genetics and WBC counts was the most robust across populations, which is consistent with the observation that low WBC counts are a common feature of this disease. The inverse association between MS and WBC counts was also consistently observed across data sets, which was not anticipated, as this has not been observed to be a robust pattern of association in epidemiological studies. There are several plausible explanations for the observed associations between the genetic predisposition to some AI diseases and WBC counts.

First, it can be argued that our study population is enriched for cases of these AI diseases. SLE is an uncommon disease that occurs more frequently in women than in men. The prevalence rates among individuals of white race are ~ 8.5/10,000 for women and ~ 0.9/10,000 for men16. Thus, in a community cohort such as the ARIC data set, an association between a genetic predisposition for SLE and a clinical symptom found in some SLE patients would not be expected, given that PRS are in general relatively weak proxies of disease risk and that fewer than 5 cases of SLE would be expected in this small cohort of ~ 9000 individuals. Moreover, RA is a relatively more common disease, with sex-specific prevalence rates of ~ 2.6/1000 men and ~ 7.1/1000 women18, but no significant association was found for genetic susceptibility for RA in ARIC, suggesting that case enrichment is an unlikely explanation.
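The expected case counts behind this argument can be reproduced with a short calculation. It uses the ARIC cohort size and male fraction reported in the Results together with the sex-specific prevalence rates quoted above, and assumes, as a simplification, that those population rates apply unchanged to ARIC participants.

```python
# Cohort composition reported for ARIC in the Results.
n_total = 8926
male_fraction = 0.47
n_male = n_total * male_fraction
n_female = n_total - n_male

# Sex-specific prevalence rates quoted in the text.
sle_prev_female, sle_prev_male = 8.5 / 10_000, 0.9 / 10_000
ra_prev_female, ra_prev_male = 7.1 / 1_000, 2.6 / 1_000

# Expected number of prevalent cases if the population rates apply to ARIC.
expected_sle = n_female * sle_prev_female + n_male * sle_prev_male
expected_ra = n_female * ra_prev_female + n_male * ra_prev_male

print(f"Expected SLE cases: {expected_sle:.1f}")
print(f"Expected RA cases:  {expected_ra:.1f}")
```

Under these assumptions the expected number of SLE cases is about 4.4, consistent with "fewer than 5", while the same calculation gives roughly ten times as many expected RA cases.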

Another explanation is that the polygenic predictors of these AI diseases capture genetic mechanisms that modulate WBC levels that are constitutively active. This could be plausible for SLE, where a low WBC is one of the diagnostic criteria for disease43. Individuals with a genetic predisposition toward lower WBCs are more likely to satisfy the diagnostic criteria. Consequently, a GWAS of SLE and a PRS derived from this GWAS could contain SNPs that associate with WBC, independent of SLE disease status. However, for MS and RA, changes in WBCs are not a stereotypical finding, making this a less likely explanation for the associations observed with these diseases.

Genetic disease risk represents a continuous risk spectrum. Under this scenario, all clinical features (including WBC effects) are manifest among individuals in proportion to the dose of risk alleles they carry. This is true for the genetics underlying many continuous traits, such as height53. With AI diseases, it is thought that an environmental trigger is required to produce disease-related autoantibodies, immune dysregulation, and subsequent disease expression7,54. Thus, in the absence of a trigger (and thus absence of disease), an association between a genetic predisposition and disease-associated biomarkers would not be expected.

It is also possible that the prevalence of subclinical or incomplete AI disease is higher than population estimates would suggest, and that a polygenic predictor is effective at detecting subclinical disease associations when examining a continuous disease feature. Under this scenario, it would be expected that effect sizes (such as the magnitude of the association with WBC levels) would be greater among women, who have higher incidence rates, than among men. Consistent with this, effect sizes were larger for women for each of the diseases examined. A sex differential would not be expected if the PRS were capturing constitutive regulation of WBC levels, as this activity should be comparable across sexes. Based on the findings presented here, we hypothesize that the observed associations between the genetic predictors for these 3 AI diseases and WBC counts across these populations are explained by some degree of immune dysfunction and/or subclinical disease that translates into modest changes in WBC counts. However, we cannot rule out that the PRS are far more sensitive than we would otherwise assume, and that they are able to detect associations in populations with very low numbers of cases.

This study has limitations. While we excluded individuals in the BioVU population with known AI diseases, it is possible that these diseases were represented at higher rates than expected in the study populations. In addition, if individuals with a disease were under active treatment with medications that reduced or increased WBC levels, this could lead to larger or smaller (depending on drug and disease) effect sizes in the PRS associations that may be detectable among smaller numbers of cases. Finally, these analyses were restricted to individuals of White European Ancestry. For AI diseases like SLE, incidence rates are higher among individuals of African ancestries55. Thus, these results are likely not generalizable to other ancestries.

In conclusion, we found that a genetic predisposition toward some AI diseases (SLE, RA and MS) was associated with either lower or higher WBC counts in multiple populations. The directions of associations for SLE and RA were consistent with clinical observations of lower and higher WBC counts, respectively, as were differences in the magnitudes of the associations across sexes.