SCGB1D2 inhibits growth of Borrelia burgdorferi and affects susceptibility to Lyme disease

Lyme disease is a tick-borne disease caused by bacteria of the genus Borrelia. The host factors that modulate susceptibility to Lyme disease have remained mostly unknown. Using epidemiological and genetic data from FinnGen and the Estonian Biobank, we identify two previously known variants and a previously unreported common missense variant in the gene encoding Secretoglobin family 1D member 2 (SCGB1D2) protein that increases susceptibility to Lyme disease. Using live Borrelia burgdorferi (Bb), we find that recombinant reference SCGB1D2 protein inhibits the growth of Bb in vitro more efficiently than recombinant protein carrying the deleterious SCGB1D2 P53L missense variant. Finally, using an in vivo murine infection model, we show that recombinant SCGB1D2 prevents infection by Borrelia in vivo. Together, these data suggest that SCGB1D2 is a host defense factor present in the skin, sweat and other secretions that protects against Bb infection, and they open an exciting therapeutic avenue for Lyme disease.

FinnGen (www.finngen.fi/en) is a joint research project of the public and private sectors, launched in Finland in the autumn of 2017, that aims to genotype 500,000 Finns, including prospective and retrospective epidemiological and disease-based cohorts as well as hospital biobank samples 1. We defined Lyme disease by extracting International Classification of Diseases (ICD)-9 (1048A) and ICD-10 (A69.2) codes from hospital inpatient, hospital outpatient and primary outpatient health registries. Our FinnGen data consisted of 7,354 individuals with Lyme disease and 404,827 controls (Supplementary Table 1).
The Estonian Biobank is a population-based biobank of the Estonian Genome Center at the University of Tartu. Its cohort comprises 212,955 participants and closely reflects the age, sex and geographical distribution of the Estonian population. Lyme disease cases were defined by ICD-10 code A69.2. The Estonian Biobank sample included 18,001 cases and 187,549 disease-free controls (Supplementary Table 1).
The resulting full data sample consisted of 617,731 individuals with 25,355 cases and 592,376 controls.
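As a quick arithmetic check (not part of the original analysis), the per-cohort case and control counts reported in Supplementary Table 1 can be summed to recover the combined sample size of 617,731 individuals and 25,355 cases. The dictionary layout below is just an illustrative container.

```python
# Per-cohort counts as reported in Supplementary Table 1.
cohorts = {
    "FinnGen": {"cases": 7_354, "controls": 404_827},
    "EstBB": {"cases": 18_001, "controls": 187_549},
}

total_cases = sum(c["cases"] for c in cohorts.values())
total_controls = sum(c["controls"] for c in cohorts.values())
total_samples = total_cases + total_controls

# Case fraction per cohort, for orientation only.
for name, c in cohorts.items():
    n = c["cases"] + c["controls"]
    print(f"{name}: n={n}, case fraction={c['cases'] / n:.3f}")

print(total_cases, total_controls, total_samples)
```

The case fraction differs markedly between the cohorts, which is one reason a meta-analysis of per-cohort GWAS results is preferred over pooling raw counts.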

Analysis of individual genetic variants
To study the genetics of Lyme disease, we analyzed a total of 617,731 samples from FinnGen Data Freeze 10 and the Estonian Biobank, including 25,355 individuals with a Lyme disease diagnosis. In both cohorts, we performed genome-wide association testing as implemented in REGENIE 2. We combined the GWAS results from both cohorts using a fixed-effect meta-analysis model in METAL 3.
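The fixed-effect meta-analysis step can be sketched as follows. METAL offers both a sample-size-based scheme (its default) and an inverse-variance scheme (SCHEME STDERR); the text does not say which was used, so this minimal sketch assumes inverse-variance weighting of per-cohort log-odds ratios. The input numbers are hypothetical, not study results.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis for one variant.

    betas: per-cohort effect estimates (log odds ratios)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]           # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)                       # SE of the pooled estimate
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal P-value
    return beta, se, z, p


# Hypothetical FinnGen-like and EstBB-like summary statistics (NOT study data).
beta, se, z, p = fixed_effect_meta([0.10, 0.14], [0.02, 0.015])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} P={p:.2e}")
```

Cohorts with smaller standard errors pull the pooled estimate toward their own value, which is the defining property of the fixed-effect, inverse-variance model.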
The meta-analysis revealed three genome-wide significant signals (P < 5.0 × 10⁻⁸). The characteristics of these loci are presented in Supplementary Table 2. Our analysis identified genome-wide significant signals at the TLR1 locus (rs17616434) and in the HLA region (rs9276610), located at the HLA class II locus. In addition, we observed the strongest association at SCGB1D2, where the lead variant was the missense variant rs2232950. Curiously, this variant causes an amino acid change from proline to leucine, a change predicted to be deleterious by several databases. 3-4).

Supplementary Table 2. Lead variants from meta-analysis for Lyme disease
Supplementary Table 3. Linkage disequilibrium (r²) between lead variants from FinnGen and the meta-analysis and the HLA alleles in FinnGen. We show associations with P-value < 0.001. The tests were computed with two-sided logistic regression.
Supplementary Figure 2. Meta-analysis of HLA locus conditioning for main effect.
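Supplementary Table 3 reports LD between lead SNPs and HLA alleles as r². The text does not say how r² was computed; one common approach, assumed here for illustration, is the squared Pearson correlation between per-individual dosages (0 to 2 copies) of the SNP and of the imputed HLA allele. The genotype vectors below are hypothetical.

```python
def dosage_r2(x, y):
    """Squared Pearson correlation between two per-individual dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


# Hypothetical dosages for six individuals (NOT study data).
snp_dosage = [0, 1, 2, 0, 1, 2]       # e.g. copies of a lead SNP allele
allele_dosage = [0, 1, 1, 0, 0, 2]    # e.g. copies of an imputed HLA allele
print(round(dosage_r2(snp_dosage, allele_dosage), 3))
```

Estimates from phased haplotypes can differ from this dosage-based approximation, particularly for imputed alleles.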

b) TLR1 locus
Toll-like receptors control innate immune responses that affect the first-line defense against pathogens and infections. The TLR locus associated with Lyme disease has previously been associated, in particular, with TLR activation and the subsequent cytokine response 4. In our study, we identified a signal with a non-coding variant at the TLR1 locus that was in high LD with two missense variants in TLR1 (rs5743618, Ser602Ile, r² = 0.90; and rs4833095, Asn248Ser, r² = 0.89). Conditional analysis adjusting for the main effect (rs17616434) did not reveal additional signals. After conditioning on the main effect, the P-values for the missense variants were P = 0.22 (rs5743618, Ser602Ile) and P = 0.40 (rs4833095, Asn248Ser), suggesting that the signal from these missense variants is either captured by or shared with the lead variant. Finally, Supplementary Figure 3 shows the robustness of the signal across both cohorts.
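The conditional analysis above amounts to re-testing a variant with the lead variant added as a covariate. The sketch below illustrates that idea with an ordinary logistic regression fitted by Newton-Raphson and a Wald test on simulated data. It is not REGENIE's step-2 test (which, among other things, uses leave-one-chromosome-out predictions from its step 1 and offers Firth and saddle-point corrections); `logistic_wald` and the simulated variables are hypothetical.

```python
import math

import numpy as np


def logistic_wald(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson; return (beta, se, Wald P-values).

    X: (n, p) design matrix that already includes an intercept column.
    y: (n,) array of 0/1 case-control labels.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))              # fitted probabilities
        w = mu * (1.0 - mu)                                # IRLS weights
        info = X.T @ (X * w[:, None])                      # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))       # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))
    se = np.sqrt(np.diag(cov))
    p = np.array([math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)])
    return beta, se, p


# Simulated data: only `lead` is causal; `proxy` copies it in ~95% of individuals.
rng = np.random.default_rng(0)
n = 20_000
lead = rng.binomial(2, 0.3, size=n)
proxy = np.where(rng.random(n) < 0.95, lead, rng.binomial(2, 0.3, size=n))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-3.0 + 0.3 * lead))))

ones = np.ones(n)
_, _, p_marginal = logistic_wald(np.column_stack([ones, proxy]), y)
_, _, p_conditional = logistic_wald(np.column_stack([ones, proxy, lead]), y)
print(f"proxy P: marginal {p_marginal[1]:.2e}, conditional on lead {p_conditional[1]:.2e}")
```

When a proxy's association is fully explained by the lead variant, its conditional P-value is expected to rise toward non-significance, which is the pattern reported for the two TLR1 missense variants.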

c) SCGB1D2 locus
Our meta-analysis revealed a novel genome-wide significant association with Lyme disease on chromosome 11 (rs2232950). This variant causes an amino acid change from proline to leucine, which several algorithms predict to be deleterious 5,6.

Supplementary Figure 4. Regional association at the SCGB1D2 locus
To examine the SCGB1D2 locus and the causal role of its genetic variation in Lyme disease in more detail, we fine-mapped this region using the Sum of Single Effects (SuSiE) model 7. The credible set contained six variants, including the common missense variant rs2232950.
Furthermore, SuSiE assigned a posterior inclusion probability of 0.22 to the lead variant rs2232950.
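With a single causal effect (L = 1), SuSiE reduces to single-effect Bayesian fine-mapping: each variant's posterior inclusion probability (PIP) is its Bayes factor divided by the sum over the region, and a credible set is the smallest group of top variants whose PIPs reach the target coverage. The sketch below assumes Wakefield approximate Bayes factors with a fixed prior effect SD of 0.2 and 95% coverage; the full SuSiE model (for example the susieR package) fits several effects iteratively, can estimate the prior variance and filters sets by purity, none of which is shown here. The z-scores are hypothetical.

```python
import math


def single_effect_credible_set(z_scores, ses, prior_sd=0.2, coverage=0.95):
    """Single-causal-variant fine-mapping using Wakefield approximate Bayes factors."""
    w = prior_sd ** 2
    log_bf = []
    for z, se in zip(z_scores, ses):
        r = w / (se ** 2 + w)                            # shrinkage factor W / (V + W)
        log_bf.append(0.5 * math.log(1.0 - r) + 0.5 * z * z * r)
    m = max(log_bf)
    bf = [math.exp(lb - m) for lb in log_bf]             # rescaled for numerical stability
    total = sum(bf)
    pips = [b / total for b in bf]                       # uniform prior over variants
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    credible_set, cumulative = [], 0.0
    for i in order:
        credible_set.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return pips, credible_set


# Hypothetical z-scores and SEs for six variants in one region (NOT study data).
z_scores = [8.0, 7.9, 7.8, 7.7, 7.6, 3.0]
ses = [0.02] * 6
pips, credible_set = single_effect_credible_set(z_scores, ses)
print([round(p, 3) for p in pips], credible_set)
```

When several variants in high LD have similar z-scores, the probability mass is spread among them, which is why a lead variant can sit in a credible set with a modest PIP such as 0.22.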
As in FinnGen, the locus in EstBB contains several significant variants. Fine-mapping the locus with HyPrColoc 8 showed that all the SNPs were in high LD with each other, and conditional analysis suggested that the lead SNP rs2232950 was the likely causal variant (Supplementary Figure 5).
Supplementary Figure 5. Conditional analysis at the SCGB1D2 locus. Adjusting for rs2232950 removes the association signal at the SCGB1D2 locus.
The figure shows a P-value (-log10 scale) plot of colocalization of the chr11-wide association study with Lyme disease as a binary trait, conditioned on rs2232950. Each dot represents a single nucleotide variant (SNV). The highlighted SNV (mahogany red triangle) is the candidate causal SNV. Dot colors indicate linkage disequilibrium with the candidate causal SNV. No other SNV reached independent chr11-wide significance.
Prolines are often the initiators of α-helix structures 9. Pro53 is the first residue of the SCGB1D2 H3-helix backbone; therefore, a Pro>Leu substitution at this position will likely destabilize the α-helical structure, as the downstream amino acids (Val56 and Ala57) would lose an anchor point (Supplementary Figure 6).
Supplementary Figure 6. Overall structure of the Lipophilin B protein dimer and evaluation using AlphaFold2 structure prediction.

PheWAS analysis
We performed a phenome-wide association analysis (PheWAS) to explore the association between the missense variant rs2232950 and 2,202 disease endpoints from FinnGen. FinnGen endpoints consist primarily of phenotypes derived from electronic health records. To complement this analysis, we also computed a PheWAS using the OpenTargets platform, which includes traits from publicly available GWASes and from other biobanks. These analyses did not reveal any significant associations for rs2232950 other than its association with Lyme disease in FinnGen. Therefore, this analysis did not provide additional insight into the function of SCGB1D2 (Supplementary Figure 7).
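The text does not state which significance threshold was applied across the 2,202 FinnGen endpoints. A common choice, assumed here only for illustration, is a Bonferroni correction that divides the family-wise alpha by the number of endpoints tested. The endpoint names and P-values below are hypothetical placeholders.

```python
def bonferroni_hits(p_values, alpha=0.05):
    """Return the Bonferroni threshold and the endpoints whose P-value passes it.

    p_values: mapping from endpoint name to P-value for a single variant.
    """
    threshold = alpha / len(p_values)      # controls the family-wise error rate
    hits = sorted(name for name, p in p_values.items() if p < threshold)
    return threshold, hits


# 2,202 hypothetical endpoints: one strong association, the rest null-like.
p_values = {f"ENDPOINT_{i:04d}": 0.5 for i in range(2201)}
p_values["LYME_DISEASE"] = 1e-20
threshold, hits = bonferroni_hits(p_values)
print(f"threshold = {threshold:.2e}; hits = {hits}")
```

Bonferroni is conservative when endpoints are correlated, as many FinnGen endpoints are; false discovery rate control would be a less strict alternative.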

Supplementary Figure 7. OpenTargets PheWAS for rs2232950
Phenome-wide associations (PheWAS) from publicly available data on the OpenTargets platform are shown by trait, with -log10(P-value) on the y-axis. Colors represent trait categories.
Furthermore, to explore a possible broader association of SCGB1D2 Pro53Leu with pathogens, we analyzed associations across 36 different disease categories in EstBB, including tick-mediated diseases (tick-borne encephalitis), general arthropod behavioral markers (scabies), members of the spirochaete bacterial phylum (syphilis) and other bacterial diseases (e.g. sepsis, scarlet fever and erysipelas), none of which was associated with the Lyme disease-associated SCGB1D2 Pro53Leu variant (Supplementary Figure 8).
Supplementary Figure 8. Results of a single-variant association study across infectious disease categories
Thirty-six infectious disease categories (ICD-10 classification system) were used to create study cohorts within EstBB, using electronic health records from the Estonian medical system between 2004 and 2021. Genetic relationship matrices and logistic regression for rs2232950 (SCGB1D2:Pro53Leu) were calculated for each cohort with REGENIE 8. The plot on the left displays associations by P-value (-log10 scale) and beta coefficient. The lead hit, category A69 ("other spirochaetal infections", of which A69.2 "Lyme disease" makes up ~97%), is marked in blue. The red horizontal line represents the threshold of statistical significance (P = 0.05). The table on the right lists the ICD-10 categories, cohort description, P-value, effect size and standard error, sorted by P-value (-log10 scale).
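The table in Supplementary Figure 8 pairs each category's beta and standard error with a P-value. The sketch below shows how a two-sided Wald P-value and its -log10 follow from beta and SE, then sorts categories by P as the table does, flagging the nominal P = 0.05 line. The ICD-10 codes are real category codes, but every number is hypothetical.

```python
import math


def wald_p(beta, se):
    """Two-sided Wald P-value from an effect estimate and its standard error."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


# Hypothetical rows (ICD-10 category, beta, SE); NOT the EstBB results.
rows = [("A69", 0.20, 0.05), ("B86", 0.02, 0.06), ("A46", -0.03, 0.04)]

# Attach P-values and sort from most to least significant.
table = sorted(((cat, b, se, wald_p(b, se)) for cat, b, se in rows), key=lambda r: r[3])
for cat, b, se, p in table:
    flag = "P < 0.05" if p < 0.05 else ""
    print(f"{cat}\tbeta={b:+.2f}\tSE={se:.2f}\tP={p:.2e}\t-log10P={-math.log10(p):.2f}\t{flag}")
```

With 36 categories tested, a nominal 0.05 line admits roughly one to two false positives by chance, so a Bonferroni-style line (0.05 / 36) would be the stricter reference.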

Expression profiling
a) GTEx
We examined RNA expression across tissue types using Genotype-Tissue Expression (GTEx) v8 data, specifically the fully processed and normalized gene expression matrices for each tissue.
These data contain RNA expression samples from 948 donors across 54 tissues; GTEx uses these same values for its eQTL calculations. We extracted the values for SCGB1D2 and plotted the normalized values per tissue (Supplementary Figure 9). The two skin tissues (sun-exposed and not sun-exposed) show the highest expression levels of SCGB1D2.
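The per-tissue comparison above can be sketched by summarizing each tissue's normalized SCGB1D2 values by their median and ranking tissues. The two skin tissue labels follow GTEx naming, while "Tissue X", "Tissue Y" and all values are hypothetical placeholders rather than GTEx data.

```python
from statistics import median

# Hypothetical normalized SCGB1D2 expression values per sample (NOT GTEx data).
expression = {
    "Skin - Sun Exposed (Lower leg)": [5.1, 4.8, 6.0, 5.5],
    "Skin - Not Sun Exposed (Suprapubic)": [4.6, 4.0, 5.2, 4.9],
    "Tissue X": [0.4, 0.2, 0.3, 0.5],
    "Tissue Y": [0.0, 0.1, 0.0, 0.2],
}

# Summarize each tissue by its median, then rank tissues from highest to lowest.
medians = {tissue: median(values) for tissue, values in expression.items()}
ranked = sorted(medians, key=medians.get, reverse=True)
for tissue in ranked:
    print(f"{medians[tissue]:6.2f}  {tissue}")
```

The median is used here because per-sample expression values are often skewed; a box plot per tissue conveys the same ranking together with the spread.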

Supplementary Figure 9. Tissue distribution of SCGB1D2 expression
Expression pattern of SCGB1D2 across human tissues. We obtained RNA expression data from the GTEx project 11.

b) Single cell analysis
To identify the cell types that express SCGB1D2 in the skin, we examined single-cell sequencing data from skin. We observed that SCGB1D2 was predominantly expressed by sweat gland cells (Supplementary Figure 10).
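A cell-type specificity claim like this one is usually read off two per-cell-type summaries: the fraction of cells with non-zero counts for the gene and the mean count per cell. The sketch below computes both from a list of cell-type labels and per-cell counts; apart from "sweat gland", the cell-type labels and all counts are hypothetical.

```python
from collections import defaultdict


def summarize_by_cell_type(cell_types, counts):
    """Per cell type: (fraction of cells with >0 counts, mean count per cell)."""
    n_cells = defaultdict(int)
    n_expressing = defaultdict(int)
    total = defaultdict(float)
    for cell_type, count in zip(cell_types, counts):
        n_cells[cell_type] += 1
        n_expressing[cell_type] += int(count > 0)
        total[cell_type] += count
    return {ct: (n_expressing[ct] / n_cells[ct], total[ct] / n_cells[ct]) for ct in n_cells}


# Hypothetical per-cell SCGB1D2 counts (NOT data from He et al.).
cell_types = ["sweat gland"] * 4 + ["keratinocyte"] * 4 + ["fibroblast"] * 2
counts = [5, 3, 0, 7, 0, 1, 0, 0, 0, 0]
summary = summarize_by_cell_type(cell_types, counts)
for ct, (frac, mean) in sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{ct:12s} fraction expressing={frac:.2f} mean count={mean:.2f}")
```

These two quantities correspond to the dot size and color commonly used in single-cell dot plots.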

Supplementary Figure 10. Expression of SCGB1D2 by cell type in the skin
Single-cell sequencing data show that SCGB1D2 expression is specific to sweat gland cells. Data from He et al. 12.

Functional analyses of Borrelia burgdorferi
The results below support our main findings in live Bb. We estimate the effect of SCGB1D2 P53L on Bb growth and examine the killing capability of recombinant SCGB1D2 protein on live Bb (Supplementary Figures 11-14).

Supplementary Figure 11. Borrelia burgdorferi growth inhibition by SCGB1D2 P53L
Timescale analysis of Borrelia burgdorferi growth inhibition by SCGB1D2 P53L over 72 hours. The y-axis represents green fluorescent protein (GFP) count per frame and the x-axis represents time. Concentrations tested are 2 to 16 μg/mL.

Supplementary Figure 12. Borrelia burgdorferi killing by SCGB1D2
Borrelia burgdorferi (Bb) spirochetes expressing green fluorescent protein (GFP) were incubated (a) with or without propidium iodide (PI), or (b) with either 8 μg/mL or 16 μg/mL of reference (Ref) or variant (Var) SCGB1D2 protein in the presence of PI to measure Bb killing by SCGB1D2. After 24 hours of incubation, an aliquot of each culture was analyzed by flow cytometry for loss of GFP. An overall ANOVA was used to compare Bb with PI and Bb with PI and SCGB1D2 proteins (F(4, 14) = 3.185, P = 0.0467). ****P < 0.0001; ns, not significant.

Supplementary Figure 13. Gating scheme to distinguish live and dead bacteria
The gating scheme is shown for GFP-positive and GFP-negative Borrelia burgdorferi (Bb), used to distinguish live and dead bacteria after 24 hours in BSK-H culture medium and exposure to varying levels of SCGB1D2 protein in a Bb growth inhibition assay, with the addition of 1.5 μL propidium iodide (Millipore Sigma) per well prior to incubation. After 24 hours of incubation, samples were fixed in 4% paraformaldehyde, resuspended in flow cytometry buffer (2% FBS, 1 mM EDTA, in PBS) and analyzed by flow cytometry for the presence of GFP.

Supplementary Figure 14. In vivo imaging system (IVIS) quantification of the prophylactic effect of SCGB1D2 on intradermal infection with N40D10/E9 Bb at day 0
IVIS imaging of 1 min exposures of mice at day 0 post-infection with Bb (N40D10/E9), where the Bb had been co-incubated and co-infected with SCGB1D2, SCGB3A1 or no protein control. Total flux (p/s) signals were quantified by gating individual mice at the injection site 15 min after injection with 277 mg/kg sterile-filtered D-luciferin dissolved in phosphate-buffered saline (PBS).

Understanding of genetic associations
a) HLA locus
HLA has a strong and established role in human immune defense. However, its contribution to Lyme disease has not been thoroughly explored. Our GWAS results showed a genome-wide significant HLA locus, and to study this finding in more detail we fine-mapped this region. We computed association statistics for each HLA allele of the HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1 and HLA-DPB1 genes and found the most significant association with HLA-DQB1*06:02. Consistently, our lead variant rs9276610 was in substantial linkage disequilibrium (LD) with HLA-DQB1*06:02 (r² = 0.558). As HLA-DQB1*06:02 is also in high LD with HLA-DRB1*15:01, we also estimated the pairwise LD for HLA-DRB1*15:01. The analysis supported HLA-DQB1*06:02 as the HLA allele most significantly associated with Lyme disease (Supplementary Table 3).

Supplementary Figure 6 (legend, continued): … 20-90 were used for generating this data. On the dimer model in the lower right, the location of the Pro53Leu point mutation is highlighted in orange. Molecular graphics and analyses were performed with UCSF ChimeraX (version 1.5rc202211080803), developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from National Institutes of Health R01-GM129325 and the Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases 10.
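The overall ANOVA reported in the Supplementary Figure 12 legend, F(4, 14), corresponds to five groups (Bb with PI, and Ref or Var protein at 8 or 16 μg/mL) and 19 observations in total. The sketch below computes a one-way ANOVA F statistic and its degrees of freedom from scratch; the per-well values and the 4/4/4/4/3 group sizes are hypothetical, chosen only to reproduce those degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f_stat, df_b, df_w)` gives the corresponding upper-tail P-value.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical % GFP-negative (dead) Bb per well (NOT study data).
groups = {
    "Bb + PI": [4.0, 5.0, 6.0, 5.0],
    "Ref 8 μg/mL": [9.0, 11.0, 10.0, 12.0],
    "Ref 16 μg/mL": [14.0, 13.0, 16.0, 15.0],
    "Var 8 μg/mL": [7.0, 8.0, 6.0, 9.0],
    "Var 16 μg/mL": [10.0, 9.0, 11.0],
}
f_stat, df_b, df_w = one_way_anova(list(groups.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The overall F test only indicates that group means differ somewhere; pairwise statements such as the ****P < 0.0001 markers require a separate post hoc comparison.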

Supplementary Table 5. HLA allele associations with Lyme disease in FinnGen
We show associations with P-value < 0.05. The tests were computed with two-sided logistic regression.

Supplementary Table 6. HLA allele associations with Lyme disease in Estonian Biobank
We show associations with P-value < 0.05. The tests were computed with two-sided logistic regression.

Supplementary Table 7. HLA amino acid associations with Lyme disease in FinnGen