A diagnostic classifier for gene expression-based identification of early Lyme disease

Servellita, Venice; Bouquet, Jerome; Rebman, Alison; Yang, Ting; Samayoa, Erik; Miller, Steve; Stone, Mars; Lanteri, Marion; Busch, Michael; Tang, Patrick; Morshed, Muhammad; Soloski, Mark J.; Aucott, John; Chiu, Charles Y.

doi:10.1038/s43856-022-00127-2

Article
Open access
Published: 22 July 2022

Venice Servellita¹^na1,
Jerome Bouquet¹^na1,
Alison Rebman²,
Ting Yang²,
Erik Samayoa¹,
Steve Miller¹,
Mars Stone³,
Marion Lanteri³,
Michael Busch³ (ORCID: orcid.org/0000-0002-1446-125X),
Patrick Tang⁴ (ORCID: orcid.org/0000-0003-1583-5484),
Muhammad Morshed⁵,
Mark J. Soloski²,
John Aucott² &
Charles Y. Chiu^1,6 (ORCID: orcid.org/0000-0003-2915-2094)

Communications Medicine volume 2, Article number: 92 (2022)

Abstract

Background

Lyme disease is a tick-borne illness that causes an estimated 476,000 infections annually in the United States. New diagnostic tests are urgently needed, as existing antibody-based assays lack sufficient sensitivity and specificity.

Methods

Here we perform transcriptome profiling by RNA sequencing (RNA-Seq), targeted RNA-Seq, and/or machine learning-based classification of 263 peripheral blood mononuclear cell samples from 218 subjects, including 94 early Lyme disease patients, 58 uninfected control subjects, and 66 patients with other infections (influenza, bacteremia, or tuberculosis). Differentially expressed genes among the 25,278 in the reference database are selected based on ≥1.5-fold change, ≤0.05 p value, and ≤0.001 false-discovery rate cutoffs. After gene selection using a k-nearest neighbor algorithm, the comparative performance of ten different classifier models is evaluated using machine learning.

Results

We identify a 31-gene Lyme disease classifier (LDC) panel that can discriminate between early Lyme patients and controls, with 23 genes (74.2%) that have previously been described in association with clinical investigations of Lyme disease patients or in vitro cell culture and rodent studies of Borrelia burgdorferi infection. Evaluation of the LDC using an independent test set of samples from 63 subjects yields an overall sensitivity of 90.0%, specificity of 100%, and accuracy of 95.2%. The LDC test is positive in 85.7% of seronegative patients and found to persist for ≥3 weeks in 9 of 12 (75%) patients.

Conclusions

These results highlight the potential clinical utility of a gene expression classifier for diagnosis of early Lyme disease, including in patients negative by conventional serologic testing.

Plain language summary

Lyme disease is a bacterial infection spread by ticks and there are nearly half a million cases a year in the United States. However, the disease is difficult to diagnose and existing laboratory tests have limited accuracy. Here, we develop a new genetic test, described as a Lyme disease classifier (LDC), for diagnosing early Lyme disease from blood samples by assessing the patient’s response to the infection. We find that the LDC can identify early Lyme disease patients (those presenting with symptoms within weeks of a tick bite) accurately, even before standard laboratory tests turn positive. In the future, the LDC may be clinically useful as a test for Lyme disease to diagnose patients earlier in the course of their illness, thus guiding more timely and effective treatment for the infection.

Introduction

Lyme disease is a systemic tick-borne infection caused by Borrelia burgdorferi sensu lato and the most common vector-borne disease in the United States¹. Lyme disease can cause arthritis, facial palsy, neuroborreliosis (neurological disease including meningitis, radiculopathy, and encephalitis), and even myocarditis resulting in sudden death². Most patients treated with appropriate antibiotics recover rapidly and completely, but 5–15% of patients develop persistent or recurring symptoms. When prolonged and associated with functional disability, patients are considered to have post-treatment Lyme disease syndrome (PTLDS)^3,4. The failure to diagnose and treat Lyme disease in a timely fashion results in higher morbidity and protracted recovery times⁵.

Diagnosis of early Lyme disease is challenging⁶. Clinical manifestations can be highly variable, presenting as non-specific “flu-like” symptoms, and a characteristic bullseye erythema migrans (EM) rash is seen only 60–70% of the time⁷. Available FDA-approved serologic assays, including two-tier antibody testing recommended by the CDC for diagnosis, are negative in up to 40% of early Lyme patients^8,9,10. Nucleic acid testing is hindered by low titers of B. burgdorferi in the blood during acute infection, with only 20–62% reported sensitivity of detection^11,12.

The advent of the genomics era has spurred the development of diagnostic tests based on transcriptome (“RNA-Seq”) analyses of the human host response¹³. Classification by gene expression profiling has been useful in the identification of various infections, including Staphylococcal bacteremia¹⁴, active versus latent tuberculosis¹⁵, influenza^16,17, and COVID-19^18,19. Transcriptome profiling of peripheral blood mononuclear cells (PBMCs)²⁰ or EM skin lesions²¹ from patients with early Lyme disease has demonstrated pronounced inflammatory responses dominated by interferon signaling. Machine learning (ML)-based analyses of RNA-Seq data have been used for cancer classification²², but have not yet been applied to infectious disease diagnosis. Here we sought to leverage iterative ML analyses of global and targeted RNA-Seq data to define a panel of differentially expressed genes (DEGs) to distinguish Lyme disease from non-Lyme controls. This panel, referred to as a Lyme disease classifier (LDC), consisted of 31 genes and was able to diagnose Lyme disease with >95% accuracy, including in >85% of Lyme seronegative patients.

Methods

Patient information

Patient enrollment, chart review, collection of clinical samples, and analysis of clinical samples by transcriptomic profiling or targeted RNA sequencing were done under protocols approved by the Institutional Review Boards of Johns Hopkins University (JHU) (JHU IRB # NA_00011170) and the University of California, San Francisco (UCSF IRB # 17–241124211). Written informed consent was obtained from all JHU Lyme disease and uninfected control patients for enrollment into the study. No consents were obtained from other, non-JHU patients since only remnant clinical samples from these patients were used, and the samples were analyzed under protocols approved by the UCSF IRB as part of a “no subject contact” biobanking study with waiver of consent (UCSF IRB #17–2411).

All 94 Lyme disease subjects included in this study presented with a physician-documented EM of ≥5 cm and either (i) concurrent flu-like symptoms, including at least one of fever, chills, fatigue, headache, or new muscle or joint pains, or (ii) dissemination of the EM rash to multiple skin locations. Controls (n = 26) were enrolled from the same physician practice as cases. Two-tier serological Lyme disease testing was performed on clinical Lyme patients by a clinical reference laboratory (Quest Diagnostics) at the first visit and at 3 weeks, following a standard 3-week course of doxycycline treatment. Patients found to be Lyme seropositive at the first visit did not undergo repeat testing. Seropositivity was assessed according to established CDC criteria²³, including the requirement that patients have had symptoms for ≤30 days for Lyme diagnosis by positive ELISA and IgM testing. All controls were required to have a negative Lyme serologic test and no clinical history of Lyme disease to be enrolled in the study. All Lyme disease patients and controls were enrolled in Maryland, USA, an area highly endemic for Lyme disease.

PBMC samples from 57 patients diagnosed with other infections were collected at UCSF, and samples from 22 controls (asymptomatic blood donors) were collected at the Blood Systems Research Institute in San Francisco, California. Patients with other infections were diagnosed with either bacteremia (n = 21), caused by Enterococcus faecium, Escherichia coli, Klebsiella pneumoniae, Staphylococcus aureus, Staphylococcus epidermidis, or Streptococcus pneumoniae by standard plate culture, or influenza (n = 36) by positive RT-PCR testing (Luminex NxTAG Respiratory Pathogen Panel). PBMC samples from 19 adults, comprising 9 patients diagnosed with tuberculosis using an interferon-gamma release assay (Oxford Immunotec T-SPOT.TB) and 10 uninfected controls, were collected at the British Columbia Centre for Disease Control in Vancouver, Canada.

PBMCs were isolated from freshly collected whole blood in EDTA tubes (kept at 4 °C for <24 h) using Ficoll (Ficoll-Paque Plus, GE Healthcare) and total RNA was extracted from 10⁷ PBMCs using TRIzol reagent (Life Technologies).

Transcriptome sequencing

Messenger RNA was isolated with the Oligotex mRNA mini kit (Qiagen). The Scriptseq RNA-Seq library preparation kit (Epicentre) was used to generate the RNA-Seq libraries according to the manufacturer’s protocol. Libraries were sequenced as 100 bp paired-end reads on a HiSeq 2000 instrument (Illumina).

Samples were processed in two batches (Fig. 1). Set 1 corresponds to samples from 28 Lyme disease patients and 13 matched control samples as previously described²⁰. Set 2 corresponds to samples from 13 new Lyme disease and 6 matched control samples prepared and sequenced alongside samples from 6 influenza and 6 bacteremia patients. One sample was not included in the pooled analysis due to insufficient read counts.

**Fig. 1: Flowchart of the approach used to develop and validate a 31-gene Lyme disease classifier panel for identification of early Lyme disease.**

Transcriptome RNA-Seq data analyses

Paired-end reads were mapped to the human genome (hg19), followed by annotation of exons and calculation of FPKM (fragments per kilobase of exon per million fragments mapped) values for all 25,278 expressed genes with version 2 of the TopHat/Cufflinks pipeline²⁴. Differential expression of genes was calculated using the variance modeling at the observational level (voom) transformation²⁵, which applies precision weights to the count matrix, followed by linear modeling with the limma package. Genes were considered to be differentially expressed when the change was ≥1.5-fold, the p value was ≤0.05, and the adjusted p value (or false-discovery rate, FDR) was ≤0.001²⁶.
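
The three cutoffs above can be illustrated with a minimal sketch. It assumes that the adjusted p value is a Benjamini-Hochberg FDR (the text does not name the adjustment method) and that fold change is computed as a ratio of positive group means; the gene names and values are hypothetical and are not data from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, min_fold=1.5, max_p=0.05, max_fdr=0.001):
    """genes: list of (name, mean_case, mean_control, p_value) with positive means."""
    fdr = benjamini_hochberg([g[3] for g in genes])
    selected = []
    for (name, case, control, p), q in zip(genes, fdr):
        # Treat up- and down-regulation symmetrically.
        fold = max(case, control) / min(case, control)
        if fold >= min_fold and p <= max_p and q <= max_fdr:
            selected.append(name)
    return selected


# Hypothetical expression means and p values, for illustration only.
example_genes = [
    ("GENE_A", 30.0, 10.0, 1e-6),  # 3-fold up, passes all cutoffs
    ("GENE_B", 12.0, 10.0, 1e-7),  # significant, but only 1.2-fold
    ("GENE_C", 5.0, 20.0, 2e-4),   # 4-fold down, passes all cutoffs
    ("GENE_D", 40.0, 10.0, 0.03),  # passes p <= 0.05 but fails the FDR cutoff
]
print(select_degs(example_genes))  # ['GENE_A', 'GENE_C']
```

GENE_D shows why the FDR cutoff is the binding constraint here: a nominal p value of 0.03 clears the 0.05 threshold but is far above 0.001 after adjustment.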

Targeted RNA sequencing

Quantitative analysis of a custom panel of transcripts of interest was performed using a targeted RNA enrichment sequencing approach that incorporated an anchored multiplex PCR technique. PBMC samples (~1 million cells) were extracted using Zymo DirectZol RNA Miniprep Kit with on-column DNase following the manufacturer’s instructions. Reverse transcription was performed using the Illumina TruSeq Targeted RNA Expression Kit on 50 ng of RNA according to the manufacturer’s instructions. A custom panel of oligonucleotides representing the genes of interest was designed and ordered using the Illumina DesignStudio platform. This pool of oligonucleotides, each attached to a small RNA sequencing primer (smRNA) binding site, was used to hybridize, extend, and ligate the second strand of cDNA from targeted genes of interest. Thirty-five cycles of amplification were then performed using primers with a complementary smRNA sequence. The resulting libraries were sequenced on an Illumina MiSeq to a depth of ~2500 reads per sample per gene. Expression counts per sample per gene were calculated on the instrument using MiSeq Reporter targeted RNA workflow software (revision C). Briefly, following demultiplexing and FASTQ file generation, reads from each sample were normalized in R and then aligned locally against references corresponding to targeted regions of interest using a banded Smith–Waterman algorithm²⁷.

Machine learning

The k-nearest neighbor classification with leave-one-out cross-validation algorithm (KNNXV)³², as implemented on GenePattern²⁸, was applied to the set of DEGs identified by RNA-Seq-based transcriptome profiling, with k = 3, signal-to-noise ratio feature selection, and Euclidean distance. The number of features was iteratively decreased until maximum accuracy was reached.
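
As a rough illustration of the classification step only, the following sketch implements k-nearest neighbor voting (k = 3, Euclidean distance) with leave-one-out cross-validation in plain Python. It is not the GenePattern KNNXV module: the signal-to-noise feature selection and the iterative reduction of feature count are omitted, and the two-feature data points are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loocv_accuracy(x, y, k=3):
    """Hold out each sample once and classify it from all remaining samples."""
    correct = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        correct += knn_predict(rest_x, rest_y, x[i], k) == y[i]
    return correct / len(x)


# Hypothetical two-gene expression profiles, for illustration only.
x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [2.0, 2.1], [2.2, 1.9], [1.9, 2.3], [2.1, 2.0]]
y = ["control"] * 4 + ["lyme"] * 4
print(loocv_accuracy(x, y))  # 1.0 for these well-separated clusters
```

In a feature-reduction loop of the kind described above, `loocv_accuracy` would be re-evaluated for each candidate number of features and the smallest set reaching the best accuracy retained.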

Class prediction performance on targeted RNA sequencing read counts was assessed by the receiver-operating characteristic (ROC) metric using the glmnet²⁹ and caret³⁰ packages in R for ten different ML methods at default parameters: classification and regression trees (“rpart” method), generalized linear models (“glmnet” method), linear discriminant analysis (“lda” method), k-nearest neighbor (“knn” method), random forest (“rf” method), eXtreme Gradient Boosting (“xgbTree” method), neural networks (“nnet” method), linear and radial support vector machine (“svmLinear” and “svmRadial” methods), and nearest shrunken centroid (“pam” method). Subsequent feature selection and fitting of the generalized linear model (glmnet) were performed using 10-fold cross-validation with regularization using a lasso (least absolute shrinkage and selection operator) penalty and lambda (λ) parameter. The value of lambda that provided the minimum mean cross-validated error was used to determine the optimal set of genes.
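
For orientation, the lasso-penalized logistic regression objective that glmnet minimizes for a binary outcome can be written as follows. The symbols are introduced here for illustration and do not appear in the original text: $y_i \in \{0, 1\}$ is the class label (Lyme versus non-Lyme) of sample $i$, $x_i$ is its vector of gene expression counts, $N$ is the number of training samples, and $\beta_0$ and $\beta$ are the fitted intercept and gene coefficients.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
- \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
\;+\; \lambda \lVert \beta \rVert_1
```

Larger values of λ shrink more coefficients exactly to zero, so choosing λ by cross-validation also determines which genes remain in the panel.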

Statistical methods

The performance of the classifier was evaluated with the use of ROC curves, calculation of area under the curve (AUC)³¹, and estimates of sensitivity, specificity, positive predictive value, and negative predictive value. A Mann–Whitney nonparametric test was used for the analysis of continuous variables, and Fisher’s exact test was used for categorical variables. All confidence intervals were reported as two-sided binomial 95% confidence intervals. Statistical analysis was performed, and plots were generated using R software, version 4.0.3 (R Project for Statistical Computing).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Results

The study comprised a total of 263 samples from 218 subjects (Table 1 and Supplementary Data 1). The 218 subjects included 94 Lyme disease patients, 66 infected “non-Lyme” controls with influenza (n = 36), tuberculosis (n = 9), and other bacteremia (n = 21), and 58 uninfected asymptomatic controls. All Lyme patients, including 61 seropositive and 33 seronegative by clinical two-tiered antibody testing, had documented EM rash and history of tick exposure at the time of presentation and were enrolled in the “Study of Lyme disease Immunology and Clinical Events” study at the Johns Hopkins Medical Institute. Control subjects categorized as uninfected asymptomatic were from regions with an incidence of Lyme disease of ≤0.2% (San Francisco, California and Vancouver, British Columbia) or had a negative Lyme serology test and no clinical history of tick-borne disease. No significant differences in age or sex were noted between Lyme and control subjects.

Table 1 Performance characteristics of the 31-gene Lyme disease classifier.

Transcriptome profiling using RNA-Seq was initially performed on PBMC samples from 72 subjects, including 41 Lyme patients and 31 controls (Fig. 1). Included were 41 samples from 28 Lyme patients and 13 uninfected controls (set 1), as previously reported²⁰. For the remaining 31 samples from 13 Lyme patients and 18 controls (set 2), a mean of 30 (±17 standard deviation) million reads was generated per sample (Supplementary Fig. 1). No batch effect based on the geographic site of the collection was observed (Supplementary Fig. 2). DEGs were selected separately for each set of PBMC samples using the KNNXV ML feature selection algorithm³². The best accuracy for sets 1 and 2 was achieved using a panel of 58 and 60 genes, respectively.

These genes, along with an additional top 50 DEGs that were ranked according to adjusted p value/FDR in order of decreasing significance and did not overlap with the two panels, were then combined into a 172-gene targeted RNA sequencing panel (Supplementary Data 2). The 172-gene panel was used to test 90 samples (38 Lyme seropositive, 9 Lyme seronegative, and 43 controls) over 2 targeted RNA expression sequencing runs (TREx, “targeted RNA expression” runs 1 and 2). A subset of 86 genes out of 172 (50%) with the maximum differences in gene expression between Lyme and “non-Lyme” control samples across the first 2 TREx runs was identified using Welch’s t-test at a p < 0.05 cutoff. The smaller 86-gene panel was then used to analyze an additional 119 samples in TREx runs 3 and 4.

Next, ML-based methods were applied to select from the list of 86 candidate genes and determine the optimal combination of genes and classification model for the LDC. We randomly partitioned samples from TREx runs 1–4 into a training set and a test set. After ensuring that all Lyme samples in the training set came from laboratory-confirmed (“Lyme seropositive”) Lyme disease patients and that no prior analyses had been performed on the independent test set, 137 and 63 samples were assigned to the training and test sets, respectively, at an approximately 2:1 (68.5%:31.5%) ratio. The training set was used to evaluate ten different ML algorithms for feature and model selection while varying the number of features (genes) from 1 to 86 for discriminating Lyme from non-Lyme patients using a 10-fold cross-validation scheme (Supplementary Fig. 3). A generalized linear model (“glmnet”) was found to provide the highest AUC-ROC statistic (97.2%), with the AUC-ROC of other methods varying from 70 to 93%. The optimal cutoff as determined by Youden’s J statistic (Youden, 1950) was 0.3. The highest AUC and lowest rate of misclassification error were found with a panel of 31 genes (Fig. 2A).
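
Youden’s J statistic is defined as sensitivity + specificity − 1, and the optimal cutoff is the score threshold that maximizes it. A minimal sketch of that search is shown below; the scores and labels are hypothetical and are not the study data, and the convention that a score strictly above the cutoff is called positive is an assumption made for this example.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 = Lyme, 0 = non-Lyme; a score strictly above the cutoff is positive.
    Ties in J are resolved in favor of the lowest cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(s > cutoff and y == 1 for s, y in zip(scores, labels))
        tn = sum(s <= cutoff and y == 0 for s, y in zip(scores, labels))
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical classifier scores, for illustration only.
scores = [0.05, 0.10, 0.20, 0.30, 0.25, 0.45, 0.60, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cutoff, j = youden_cutoff(scores, labels)
print(cutoff, j)  # 0.2 0.75
```

Because J weights sensitivity and specificity equally, a cutoff chosen this way does not account for disease prevalence or for unequal costs of false-positive and false-negative results.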

**Fig. 2: A 31-gene Lyme disease classifier derived using the generalized linear model machine learning algorithm.**

Based on the expression of the 31 genes in the finalized LDC panel, a disease score ranging from 0 to 1 was calculated, with a score >0.3 classified as Lyme and <0.3 as “non-Lyme”. Compared to two-tier Lyme antibody testing as a reference gold standard, training set sensitivity, specificity, and AUC-ROC using this scoring metric were 95.5% (95% CI 84.1–100%), 86.0% (95% CI 77.4–98.9%), and 97.2% (95% CI 95.0–99.3%), respectively (Fig. 2B and Table 1). Five of 44 (11.4%) Lyme samples and 12 of 93 controls (12.9%) in the training set were misclassified (Fig. 2C). LDC sensitivity was comparable between subjects who were seropositive at presentation and those who became seropositive after 3 weeks (Table 1, 88% versus 89%, respectively).

For the independent test set of 63 samples, the LDC had an overall accuracy of 95.2% (95% CI 86.7–99.0%), with a sensitivity of 90% (95% CI 83.3–100%) and specificity of 100% (95% CI 90.9–100%) relative to two-tier Lyme antibody testing and based on misclassification of 1 Lyme seropositive and 2 Lyme seronegative samples (Fig. 2D, E). LDC sensitivity was higher in subjects who were seropositive at presentation than in those who became seropositive after 3 weeks (Table 1, 100% versus 83%, respectively). LDC sensitivities for Lyme seropositive and seronegative samples were 93.7% and 85.7%, respectively (Table 1).
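
The test-set figures can be cross-checked with simple arithmetic. The confusion-matrix counts below are not stated in the text; they are back-calculated here from the reported percentages and misclassifications (1 missed of 16 seropositive and 2 missed of 14 seronegative Lyme samples, leaving 33 controls with no false positives) and should be treated as an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts inferred from the reported percentages (an assumption, not source data):
# 30 Lyme samples with 3 missed, 33 controls with none misclassified.
test_set = diagnostic_metrics(tp=27, fn=3, tn=33, fp=0)
for name, value in test_set.items():
    print(f"{name}: {value:.1%}")
# sensitivity: 90.0%
# specificity: 100.0%
# accuracy: 95.2%
```

Under these inferred counts, the subgroup sensitivities work out to 15/16 (about 93.7%) for seropositive and 12/14 (about 85.7%) for seronegative samples, in line with the values reported above.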

The 31 identified genes on the panel were related to immune cell signaling (n = 7), cell division (n = 6), apoptosis (n = 3), cell growth and differentiation (n = 3), cell trafficking (n = 2), B. burgdorferi receptor-binding (n = 2), and other functions (n = 8) (Fig. 2F). Many genes (23 of 31, 74.2%) had previously been described in association with cell culture (n = 20), murine (n = 2), and Lyme disease patient studies (n = 3) of B. burgdorferi infection (Supplementary Data 3).

To evaluate the persistence of the LDC gene signature, we analyzed available serially collected samples from a subset of 18 clinical Lyme patients at 0 week (time of initial clinical presentation with EM rash) and 3 weeks (following completion of a 3-week course of doxycycline treatment) (Fig. 3). Among four Lyme seronegative cases, three (75%) had a discordant result, with negative Lyme serology but a positive LDC score of >0.3 (Fig. 3, P2–P4). Two of these three cases seroconverted at 3 weeks by IgM testing (Fig. 3, P2 and P4) but did not formally fulfill CDC criteria since the duration of illness from onset of symptoms was >30 days (although they would be considered seropositive using a 6-week cutoff as suggested by others)³³, while the remaining seronegative/LDC-positive patient (Fig. 3, P3) was ELISA positive, had one and two bands for IgM and IgG, respectively, at 3 weeks, and thus appeared close to seroconverting. Among the 4 cases with late seroconversion 3 weeks after the presentation (Fig. 3, P5–P8), 3 of 4 (Fig. 3A, P6–P8) were positive by LDC testing at 0 week, while P5 was negative at 0 week but positive at 3 weeks. Ten of 13 cases (76.9%) that were LDC positive at 0 week remained persistently positive at 3 weeks (Fig. 3, P2, P7, P8, P9, P10, P11, P15, P16, P17, and P18), while the remaining 3 (Fig. 3, P6, P12, and P14) showed a decline in the LDC score below the 0.3 threshold.

**Fig. 3: Longitudinal testing of clinical Lyme patients using the Lyme disease classifier.**

Samples from ten patients collected at 3 weeks and/or 6 months after the clinical presentation of Lyme disease were available and, based on LDC testing, could be assigned into two subgroups with different longitudinal trajectories (Fig. 4). One subgroup (Fig. 4, I) contained three patients with positive LDC scores at 0 week (Fig. 4, P2, P12, and P14) that declined at 3 weeks but rebounded by 6 months. P12 and P14 had persistent symptoms at 6 and 12 months, respectively, but without the functional disability to meet clinical criteria for PTLDS^3,4. The other subgroup (Fig. 4, II) contained seven patients who had gradual declines in LDC score from 0 week to 6 months. Among these seven patients, two were symptomatic at 6 months but returned to usual state of health at 1 year (Fig. 4, P13 and P16), while one Lyme seronegative patient diagnosed with clinical PTLDS was negative by LDC testing at all three time points (Fig. 4, P1).

**Fig. 4: Lyme disease classifier scores from longitudinally collected patient samples.**

Unfortunately, 6-month samples were not available for two Lyme disease patients who met clinical criteria for PTLDS and had a persistently positive LDC signature at 3 weeks (Fig. 3B, P4 and P9).

Discussion

Here we applied transcriptome profiling, targeted RNA-Seq, and iterative ML-based analyses to construct a 31-gene LDC with 90% sensitivity and 100% specificity in identifying clinical Lyme patients at the time of initial presentation. A condensed diagnostic panel of 31 multiplexed gene targets makes it amenable to implementation on commercial multiplexed nucleic acid testing instruments³⁴ or on targeted RNA next-generation sequencing platforms, with the latter being used in 2020–2021 for clinical SARS coronavirus 2 (SARS-CoV-2) testing under FDA Emergency Use Authorization³⁵. We also found that 77% of Lyme disease patients with a positive LDC at initial presentation remained positive for at least 3 weeks, consistent with earlier work on the Lyme disease transcriptome²⁰. This observation indicates that an LDC classifier may be useful for Lyme disease diagnosis during the approximately 3-week “window period” prior to the generation of detectable antibody levels by two-tiered testing²³. Taken together, the LDC classifier meets four of the five characteristics of an “ideal” Lyme disease diagnostic, as described by Schutzer et al.⁸, including high sensitivity in early infection, high specificity, ≤24 h turnaround time (if implemented on a multiplexed nucleic acid testing platform), and testing from easily collected samples such as blood. Thus, the LDC classifier may be useful as a complementary diagnostic to serologic testing, which exhibits high sensitivity (95–100%) in later stages of Lyme disease (the sole remaining characteristic out of 5), but inadequate sensitivity (29–77%) in early Lyme^10,36.

As expected, most of the genes (74%, 23 of 31) in the LDC classifier panel had previously been reported as related to Lyme disease based on in vitro and in vivo investigations. However, the LDC would have been nearly impossible to construct a priori given that selection of an optimal set of genes would have been difficult and that 8 of the 31 (25.8%) genes had not been previously described in the literature. Notably, only 7 (22.6%) genes in the panel were associated with immune cell signaling, of which 3 (9.7%) were related to interferon signaling, in contrast with prior reports demonstrating strong immune and inflammatory responses in early Lyme disease^20,21,37,38. Unlike these previous studies, here we incorporated controls from patients with acute febrile infections from viruses (influenza) or other bacteria, potentially explaining why only a minority of LDC genes were associated with immune cell signaling. Instead, many of the identified genes in the LDC were related to cell division and proliferation, autophagy, and apoptosis. It has previously been shown that PBMCs from patients with Lyme disease proliferate in vitro in response to B. burgdorferi infection³⁹. B. burgdorferi has also been shown to induce autophagy in infected PBMCs resulting in the production of cytokines such as interleukin-1β⁴⁰. In addition, phagocytosis of B. burgdorferi induces apoptosis in human monocytes⁴¹ and also in neuronal cells of the dorsal root ganglia⁴². Genes associated with these signaling pathways may be more specific to Lyme disease and thus more useful as diagnostic biomarkers than those focused solely on immune and inflammatory responses. Further research on the genes identified in the LDC classifier to investigate their involvement in Borrelia pathogenesis is warranted.

Prior studies have used gene expression to profile Lyme disease patients from PBMCs^20,37,38, although our study incorporates larger numbers of Lyme disease cases and controls. The three previously reported studies present similar findings showing an increase in immune and inflammatory response genes, particularly those interferon-regulated, in Lyme disease cases relative to uninfected controls. The study by Clarke, et al.³⁷ also reported the development of a diagnostic classifier of 20 genes for early Lyme disease, but the performance was not evaluated with an independent test set. The study by Petzke, et al.³⁸ reported two kinds of classifiers for discriminating between Lyme disease cases and controls and between Lyme disease cases that resolve after treatment and those that progress to having persistent symptoms. All these classifiers are limited by the absence of controls from other viral and bacterial infections to exclude overlapping immune and inflammatory response genes. In fact, only two genes in our LDC classifier, TYMS, a DNA replication and repair gene, and GRN, a cell proliferation gene, are shared with these prior classifiers^37,38. Other “omics” technologies have been used to develop classifiers for Lyme disease. For example, a previous study reported a metabolomic signature with 88% sensitivity and 95% specificity for the identification of seropositive Lyme⁴³, although the controls in that study were different (infectious mononucleosis, fibromyalgia, severe periodontitis, and syphilis).

One limitation of the current study is the absence of controls from other, less common tick-borne (e.g., babesiosis, anaplasmosis, ehrlichiosis, rickettsiosis, and Powassan virus infection) and spirochetal (e.g., syphilis, leptospirosis) infections. However, nearly all of these other tick-borne and spirochetal infections can be diagnosed by conventional microbiological molecular and/or serologic testing⁴⁴. In addition, we previously reported more overlap in the transcriptomic signature of Lyme disease with viral (influenza) infection than with bacterial infection²⁰. This suggests that the human host response to Lyme disease is likely different from that to other tick-borne and spirochetal infections. The finding of 23 of 31 genes in the classifier being related to Borrelia infection also supports the contention that the LDC is specific to Lyme disease. Another limitation is the small number of longitudinally collected samples at 3 weeks (n = 17) and 6 months (n = 10). Here we focused on a classifier for early Lyme disease based on host gene expression. Further investigation will be needed to determine its potential role in the evaluation of Lyme disease patients with chronic symptoms and/or PTLDS. Finally, it can be challenging to develop and clinically validate an RNA expression-based assay for 31 genes simultaneously. However, it may be feasible to decrease the number of genes on the panel without unduly sacrificing performance (Fig. 2A), and FDA authorization of targeted omics-based tests for COVID-19³⁵ suggests a potential regulatory pathway for the deployment of a multiplexed Lyme diagnostic in the near future.

As ~86% of samples from patients persistently seronegative at 0 and 3 weeks were correctly classified as Lyme, our LDC classifier may allow more accurate stratification of presumptive Lyme patients testing negative by serology. In the absence of “gold-standard” testing, it cannot be proven that these seronegative patients were infected by B. burgdorferi. Nevertheless, documentation of EM rash in all Lyme patients in this study, even in those who tested seronegative, concurrent “flu-like” symptoms, and enrollment during tick season in a region highly endemic for Lyme disease suggest that this may indeed be the case. Evidence in support of infection is also provided by the finding that three of the four LDC-positive, seronegative patients exhibited borderline serologic responses just outside of formal CDC criteria for seropositivity. Conversely, the remaining seronegative Lyme patient, who was also negative by LDC testing (Figs. 3 and 4, P1), appears to be a likely bona fide Lyme-negative case, despite being incidentally diagnosed with PTLDS. More accurate discrimination of Lyme patients using the LDC may be clinically useful by prompting diagnostic workup for a different tick-borne disease or other acute illness. The identification of a subgroup of three patients (out of ten) with a persistently positive LDC signature at 6 months, two of whom had ≥6 months of persistent symptoms, warrants further study on the potential utility of the LDC for diagnosis and monitoring of Lyme disease patients with chronic symptoms.

Data availability

All data in this study were submitted to the National Institutes of Health (NIH) database of Genotypes and Phenotypes (dbGaP) (read count tables, raw FASTQ files for transcriptome sets 1 and 2 accession number phs002794.v1.p1). Public summary phenotype data are available at the dbGaP study report web page: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002793.v1.p1. Individual-level data, including transcriptomic sequencing data, are available for download by authorized investigators via https://view.ncbi.nlm.nih.gov/dbgap-controlled. The sequencing data are only available via restricted access as patients did not consent for the public release of their data and to protect patient confidentiality. Metadata for the 263 clinical samples included in this study are provided in Supplementary Data 1. Source data used to generate the main figures are provided in Supplementary Data 4.

Code availability

Code used to reproduce the ML analysis for LDC model prediction and feature selection has been deposited in a Zenodo repository (doi: 10.5281/zenodo.5987532)⁴⁵.

References

Rosenberg, R. et al. Vital signs: trends in reported vectorborne disease cases – United States and Territories, 2004–2016. MMWR Morb. Mortal Wkly Rep. 67, 496–501 (2018).
Forrester, J. D. et al. Notes from the field: update on Lyme carditis, groups at high risk, and frequency of associated sudden cardiac death–United States. MMWR Morb. Mortal Wkly Rep. 63, 982–983 (2014).
Aucott, J. N., Rebman, A. W., Crowder, L. A. & Kortte, K. B. Post-treatment Lyme disease syndrome symptomatology and the impact on life functioning: is there something here? Qual. Life Res. 22, 75–84 (2012).
Rebman, A. W. & Aucott, J. N. Post-treatment Lyme disease as a model for persistent symptoms in Lyme disease. Front. Med. (Lausanne) 7, 57 (2020).
Marques, A. Chronic Lyme disease: a review. Infect. Dis. Clin. North Am. 22, 341–360, vii–viii (2008).
Branda, J. A. & Steere, A. C. Laboratory diagnosis of Lyme borreliosis. Clin. Microbiol. Rev. 34, e00018–19 (2021).
Steere, A. C. et al. Systemic symptoms without erythema migrans as the presenting picture of early Lyme disease. Am. J. Med. 114, 58–62 (2003).
Schutzer, S. E. et al. Direct diagnostic tests for Lyme disease. Clin. Infect. Dis. 68, 1052–1057 (2019).
Aguero-Rosenfeld, M. E. & Wormser, G. P. Lyme disease: diagnostic issues and controversies. Expert Rev. Mol. Diagn. 15, 1–4 (2015).
Steere, A. C., McHugh, G., Damle, N. & Sikand, V. K. Prospective study of serologic tests for Lyme disease. Clin. Infect. Dis. 47, 188–195 (2008).
Aguero-Rosenfeld, M. E., Wang, G., Schwartz, I. & Wormser, G. P. Diagnosis of Lyme borreliosis. Clin. Microbiol. Rev. 18, 484–509 (2005).
Eshoo, M. W. et al. Direct molecular detection and genotyping of Borrelia burgdorferi from whole blood of patients with early Lyme disease. PLoS One 7, e36825 (2012).
Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).
Ahn, S. H. et al. Gene expression-based classifiers identify Staphylococcus aureus infection in mice and humans. PLoS One 8, e48979 (2013).
Anderson, S. T. et al. Diagnosis of childhood tuberculosis and host RNA expression in Africa. N. Engl. J. Med. 370, 1712–1723 (2014).
Woods, C. W. et al. A host transcriptional signature for presymptomatic detection of infection in humans exposed to influenza H1N1 or H3N2. PLoS One 8, e52198 (2013).
Zaas, A. K. et al. Gene expression signatures diagnose influenza and other symptomatic respiratory viral infections in humans. Cell Host Microbe 6, 207–217 (2009).
Butler, D. et al. Shotgun transcriptome, spatial omics, and isothermal profiling of SARS-CoV-2 infection reveals unique host responses, viral diversification, and drug interactions. Nat. Commun. 12, 1660 (2021).
Ng, D. L. et al. A diagnostic host response biosignature for COVID-19 from RNA profiling of nasal swabs and blood. Sci. Adv. 7, eabe5984 (2021).
Bouquet, J. et al. Longitudinal transcriptome analysis reveals a sustained differential gene expression signature in patients treated for acute Lyme disease. mBio 7, e00100–e00116 (2016).
Marques, A. et al. Transcriptome assessment of erythema migrans skin lesions in patients with early Lyme disease reveals predominant interferon signaling. J. Infect. Dis. 217, 158–167 (2017).
Zhang, Y. H. et al. Identifying and analyzing different cancer subtypes using RNA-seq data of blood platelets. Oncotarget 8, 87494–87511 (2017).
Moore, A., Nelson, C., Molins, C., Mead, P. & Schriefer, M. Current guidelines, common clinical pitfalls, and future directions for laboratory diagnosis of Lyme disease, United States. Emerg. Infect. Dis. 22, 1169–1177 (2016).
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).
Dalman, M. R., Deeter, A., Nimishakavi, G. & Duan, Z. H. Fold change and p-value cutoffs significantly alter microarray interpretations. BMC Bioinformatics 13(Suppl 2), S11 (2012).
Okada, D., Ino, F. & Hagihara, K. Accelerating the Smith-Waterman algorithm with interpair pruning and band optimization for the all-pairs comparison of base sequences. BMC Bioinformatics 16, 321 (2015).
Reich, M. et al. GenePattern 2.0. Nat. Genet. 38, 500–501 (2006).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).
Hanley, J. A. & Hajian-Tilaki, K. O. Sampling variability of nonparametric estimates of the areas under receiver operating characteristic curves: an update. Acad. Radiol. 4, 49–58 (1997).
Golub, T. R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999).
Branda, J. A. et al. 2-tiered antibody testing for early and late Lyme disease using only an immunoglobulin G blot with the addition of a VlsE band as the second-tier test. Clin. Infect. Dis. 50, 20–26 (2010).
Poritz, M. A. & Lingenfelter, B. Multiplex PCR for detection and identification of microbial pathogens. In Advanced Techniques in Diagnostic Microbiology, 3rd edn, Vol. 2: Techniques (eds. Tang, Y.-W. & Stratton, C. W.) (Springer International Publishing, Cham, 2018).
First NGS-based COVID-19 diagnostic. Nat. Biotechnol. 38, 777 (2020).
Branda, J. A. et al. Advances in serodiagnostic testing for Lyme disease are at hand. Clin. Infect. Dis. 66, 1133–1139 (2018).
Clarke, D. J. B. et al. Predicting Lyme disease from patients’ peripheral blood mononuclear cells profiled with RNA-sequencing. Front. Immunol. 12, 636289 (2021).
Petzke, M. M. et al. Global transcriptome analysis identifies a diagnostic signature for early disseminated Lyme disease and its resolution. mBio 11, e00047–20 (2020).
Kalish, R. S. et al. Human T lymphocyte response to Borrelia burgdorferi infection: no correlation between human leukocyte function antigen type 1 peptide response and clinical status. J. Infect. Dis. 187, 102–108 (2003).
Buffen, K. et al. Autophagy modulates Borrelia burgdorferi-induced production of interleukin-1beta (IL-1beta). J. Biol. Chem. 288, 8658–8666 (2013).
Cruz, A. R. et al. Phagocytosis of Borrelia burgdorferi, the Lyme disease spirochete, potentiates innate immune activation and induces apoptosis in human monocytes. Infect. Immun. 76, 56–70 (2008).
Ramesh, G., Santana-Gould, L., Inglis, F. M., England, J. D. & Philipp, M. T. The Lyme disease spirochete Borrelia burgdorferi induces inflammation and apoptosis in cells from dorsal root ganglia. J. Neuroinflammation 10, 88 (2013).
Molins, C. R. et al. Development of a metabolic biosignature for detection of early Lyme disease. Clin. Infect. Dis. 60, 1767–1775 (2015).
Rodino, K. G., Theel, E. S. & Pritt, B. S. Tick-borne diseases in the United States. Clin. Chem. 66, 537–548 (2020).
Chiu, C. Y., Servellita, V. & Bouquet, J. A diagnostic classifier for gene expression-based identification of early Lyme disease [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5987532 (2022).

Acknowledgements

This work was supported by grants from the Bay Area Lyme Foundation, the Steven and Alexandra Cohen Foundation, the Benioff Foundation, the Swartz Foundation, the Stabler Foundation, the Global Lyme Alliance, and the National Institutes of Health (grants R01-HL105704 and P30-AR05350). We would like to thank Yvonne Simpson for identifying and preparing tuberculosis patient and control samples for this study.

Author information

These authors contributed equally: Venice Servellita, Jerome Bouquet.

Authors and Affiliations

Department of Laboratory Medicine, University of California, San Francisco, CA, USA
Venice Servellita, Jerome Bouquet, Erik Samayoa, Steve Miller & Charles Y. Chiu
Lyme Disease Research Center, Division of Rheumatology, Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
Alison Rebman, Ting Yang, Mark J. Soloski & John Aucott
Blood Systems Research Institute, San Francisco, CA, USA
Mars Stone, Marion Lanteri & Michael Busch
Sidra Medical and Research Center, Doha, Qatar
Patrick Tang
British Columbia Centre for Disease Control, Vancouver, BC, Canada
Muhammad Morshed
Department of Medicine, Division of Infectious Diseases, University of California, San Francisco, CA, USA
Charles Y. Chiu

Contributions

J.B. and C.Y.C. conceived of and designed the study. J.B. performed the experiments. V.S., J.B., A.R., T.Y., E.S., S.M., M.S., M.L., M.B., P.T., M.M., M.J.S., and J.A. collected samples and associated clinical and laboratory metadata. V.S., J.B., A.R., T.Y., and C.Y.C. analyzed clinical and epidemiological data. V.S., J.B., and C.Y.C. analyzed the gene expression data. V.S., J.B., and C.Y.C. wrote the manuscript. V.S. and C.Y.C. designed the figures. V.S., J.B., M.J.S., J.A., and C.Y.C. edited the manuscript.

Corresponding author

Correspondence to Charles Y. Chiu.

Ethics declarations

Competing interests

C.Y.C. and J.A. are on the scientific advisory board for the Bay Area Lyme Foundation. The other authors declare no competing interests.

Peer review

Peer review information

Communications Medicine thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Description of Additional Supplementary Files

Peer Review File

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Servellita, V., Bouquet, J., Rebman, A. et al. A diagnostic classifier for gene expression-based identification of early Lyme disease. Commun Med 2, 92 (2022). https://doi.org/10.1038/s43856-022-00127-2

Received: 09 November 2021
Accepted: 17 May 2022
Published: 22 July 2022
DOI: https://doi.org/10.1038/s43856-022-00127-2
