Carbonic Anhydrase 6 Gene Variation influences Oral Microbiota Composition and Caries Risk in Swedish adolescents

Carbonic anhydrase VI (CA6) catalyses the reversible hydration of carbon dioxide in saliva with possible pH regulation, taste perception, and tooth formation effects. This study assessed effects of variation in the CA6 gene on oral microbiota and specifically the acidophilic and caries-associated Streptococcus mutans in 17-year old Swedish adolescents (n = 154). Associations with caries status and secreted CA6 protein were also evaluated. Single Nucleotide Polymorphisms (27 SNPs in 5 haploblocks) and saliva and tooth biofilm microbiota from Illumina MiSeq 16S rDNA (V3-V4) sequencing and culturing were analysed. Haploblock 4 (rs10864376, rs3737665, rs12138897) CCC associated with low prevalence of S. mutans (OR (95% CI): 0.5 (0.3, 0.8)), and caries (OR 0.6 (0.3, 0.9)), whereas haploblock 4 TTG associated with high prevalence of S. mutans (OR: 2.7 (1.2, 5.9)) and caries (OR: 2.3 (1.2, 4.4)). The TTG-haploblock 4 (represented by rs12138897(G)) was characterized by S. mutans, Scardovia wiggsiae, Treponema sp. HOT268, Tannerella sp. HOT286, Veillonella gp.1 compared with the CCC-haploblock 4 (represented by rs12138897(C)). Secreted CA6 in saliva was weakly linked to CA6 gene variation. In conclusion, the results indicate that CA6 gene polymorphisms influence S. mutans colonization, tooth biofilm microbiota composition and risk of dental caries in Swedish adolescents.

The primary aim of this study was to evaluate the effects of genetic variation in the CA6 gene region on oral microbiota and specifically the acidophilic and caries-associated S. mutans in Swedish adolescents. Secondary aims were to evaluate associations between variation in the CA6 gene region and secreted CA6 protein and caries status.

Results
Participant characteristics. Using next-generation sequencing (NGS), 75.3% of participants had detectable S. mutans, with significantly higher proportions among the caries-affected than the caries-free subjects (p < 0.001; Table 1). Overall, the median caries score (DeFS) was 2.0 tooth surfaces; 67% of participants were affected by caries (DeFS ≥ 1) and 33% were caries-free (DeFS = 0) (Table 1). There were no significant differences between those with and without detectable S. mutans, or between those who were caries-affected and caries-free, with respect to sex, smoking, sweet snacking, BMI, and number of tooth surfaces. The proportion who reported brushing their teeth twice a day did not differ between those with and without S. mutans, but it was significantly lower among caries-affected than caries-free participants (p < 0.001; Table 1).
CA6 single nucleotide polymorphic sites and gene structure. Of the 30 genotyped Single Nucleotide Polymorphisms (SNPs), three were removed during quality control. Two SNPs (rs2274333 and rs17032942) were excluded due to a sample call rate of 0%, and one additional SNP (rs6697763) was excluded for deviation from Hardy-Weinberg equilibrium (p = 0.033), leaving 27 SNPs for further analyses.
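The Hardy-Weinberg check used in the quality control step can be illustrated with a minimal sketch. It assumes a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts; the study ran this check in Haploview, whose own test may differ, and the function name and the genotype counts below are hypothetical.

```python
import math


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts (homozygote A, heterozygote,
    homozygote B). Returns (chi2 statistic, p-value).
    Assumption: a plain goodness-of-fit test, not necessarily the test
    variant applied in the study.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the first SNP is exactly in equilibrium,
# the second has no heterozygotes at all.
print(hwe_chi2_test(25, 50, 25))
print(hwe_chi2_test(50, 0, 50))
```

A SNP would be flagged when the p-value falls below the chosen threshold; the threshold itself is a study-specific choice.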
CA6 gene variation rs12138897 (G/C) and tooth biofilm microbiota. The association of the CCC-haploblock 4 and the TTG-haploblock 4 with S. mutans status (as well as caries status, as described in the next paragraph) led us to evaluate the association between the C and G alleles of rs12138897 (C representing the CCC haploblock and G the TTG haploblock) and overall tooth biofilm microbiota by multivariate Partial Least Squares (PLS) analysis of NGS-determined taxa. PLS modelling (cross-validated prediction Q² = 24%) displayed separation of participants carrying the G versus the C allele based on their tooth biofilm microbiota (Fig. 3a). The separation was explained by strong associations (Variable Importance in Projection (VIP) > 1.5) with the G allele for eight species, i.e., higher detection rate of Treponema sp. HOT268, Actinomyces sp. HOT896, Tannerella sp. HOT286, S. mutans, GN02[G-1] sp. HOT872, Scardovia wiggsiae, and species detected by Veillonella genus probe 1 (Fig. 3b). The C allele associated with higher detection rate of Bacteroidales [G-2] sp. HOT274, Prevotella sp. HOT315, Tannerella forsythia, Fretibacterium fastidiosum, Filifactor alocis, and species detected by Treponema genus probe 2 (Fig. 3c). Five of the eight species associating with the G allele, i.e., Treponema sp. HOT268, Tannerella sp.
CA6 gene variation and caries status. Because CA6 gene variation was associated with colonization by caries-associated bacteria, such as S. mutans and S. wiggsiae, the association between the 27 SNPs and caries status (yes or no) was tested in an allelic association model. The rs12021597 (A), rs10864376 (T), rs3737665 (T), and rs12138897 (G) alleles associated positively with having caries (Table 4). The strongest effect on having caries was observed for the rs10864376 (T) allele, with an odds ratio (95% CI) of 2.1 (1.2, 3.7) (p = 0.009). Of the 5 haploblocks, two (haploblocks 3 and 4) were associated with caries status (Table 4).
The GCA-haploblock 3 and TTG-haploblock 4 were associated with increased odds of having caries, whereas the CCC-haploblock 4 was associated with decreased odds of having caries. Comparing the TTG-haploblock 4 with the CCC-haploblock 4 gave an odds ratio of 2.6 (1.3, 5.2) (p = 0.008), covering 66.7% of the participants.
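For readers less familiar with the notation "OR (95% CI)" used throughout the results, the following minimal sketch shows how an odds ratio and a Wald-type confidence interval can be computed from a 2×2 table. The interval method is an assumption (the study reports using SPSS and Haploview without naming the interval type), and the function name and counts are hypothetical, not data from the study.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval).
    Assumption: Wald (log-scale) interval; all cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts: 20/10 caries-affected/caries-free carriers
# of one haploblock versus 10/20 for the comparison haploblock.
print(odds_ratio_wald_ci(20, 10, 10, 20))
```

An interval that excludes 1, as for the reported 2.6 (1.3, 5.2), corresponds to a significant association at the matching alpha level.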
CA6 in saliva (concentration or secreted amounts) was not associated with having S. mutans or not, but the secreted amount (µg/min) was significantly higher in caries-free than in caries-affected participants (p = 0.012) (Table 1). Accordingly, twice as many caries-free as caries-affected participants were found in the highest tertile of the distribution of secreted CA6 amounts (p between groups = 0.007), with an OR of being caries-free of 3.5 (1.4, 9.1).
Table 2. CA6 SNP and haploblock associations with S. mutans detection in both saliva and tooth biofilm by NGS. Alleles in bold indicate significant positive associations and in italic negative associations at FDR < 0.25 (bold underline for differences also significant at FDR ≤ 0.06). * indicates SNPs also associated with caries (see Table 4).
Table 5 together with comparisons between the allele variant groups as well as trends among the allele variants. There were no significant differences between the alleles with respect to sex, BMI, smoking, frequency of sweet snacking, or number of tooth surfaces. The proportions with S. mutans by NGS and culturing, respectively, differed significantly between the allele variant groups (p < 0.001), as did the trend from the TCG to CCC variant (p < 0.001). The association with caries was less strong, but the proportion of caries-affected participants tended to differ between the allele variant groups (p = 0.054) with a significant trend (p = 0.021), whereas caries prevalence (DeFS) did not differ between the groups though the trend did (p = 0.018).

Discussion
This study tested the hypothesis that variation in the CA6 gene is relevant to dental health and found that CA6 gene polymorphisms and haploblocks of CA6 were associated with S. mutans colonization, overall microbiota composition and dental caries in Swedish adolescents. Secreted CA6 in saliva tended to be linked to CA6 gene variation. Thereby, it is the first study to demonstrate effects of CA6 gene polymorphisms on the oral bacterial communities in a population-based setting.
Collectively, the present findings support a role of CA6 polymorphisms in oral health, i.e., balancing less, against more, aciduric species, e.g., cariogenic S. mutans and S. wiggsiae, and de-and remineralization in the caries process. The results are in line with previous studies reporting a role of CA6 in caries development in children [13][14][15][16][17] . Though several of the physiological effects of CA6 are relevant for both oral microbiota ecology and the caries process, the detailed mechanisms for the effects remain to be elucidated. The underlying mechanisms for oral health of CA6 and CA6 gene variations may, alone or in concert, include tooth formation, local pH regulation, bacteria adhesion and metabolism and taste induced food preferences.
A tooth formation mechanism appears plausible because CA6 is expressed in the bud, cup, and bell stage of early tooth development, and by pre- and early differentiated odontoblasts and dental papilla mesenchyme cells, but most avidly in late tooth development with formation and maturation of the enamel 2 . Differences in enamel quality affect the initial tooth-covering protein pellicle, and consequently the profile of attachment epitopes for bacteria 19 and possibly oral microbiota composition. Differences in enamel quality also affect the resistance of tooth tissue to destruction in an acidic environment.
Given the importance of pH homeostasis for the oral environment, local pH regulation by CA6 may be a contributory mechanism. Several of the most abundant bacteria in the mouth are glycolytic with avid acid formation, leading to pH fall and enrichment of aciduric, cariogenic species (dysbiosis) 20 . The CA6 protein is suggested to regulate pH in secreted fluids 1 , and saliva buffer capacity was reported to differ by CA6 gene variation at rs2274327 16 , and by CA6 protein activity in some 12 , but not other 5 , studies. This aspect could not be tested in the present study as neither saliva buffer capacity nor pH was analyzed, but some gene variants were associated with presence of aciduric and acidogenic species, such as S. mutans and S. wiggsiae, which are promoted at low pH. Since reported intake of sweets did not differ by CA6 gene variation, this may reflect less well-functioning pH regulation by some CA6 gene variants. Impaired pH regulation may also increase or decrease the risk of tooth de- and remineralisation.
Other important factors in oral biofilm ecology include access to attachment sites on host and partner bacteria structures, and exposure to oxygen and nutrients. CA6 incorporated in pellicle and tooth biofilm may influence bacterial ecology and metabolism through effects unrelated to its enzymatic activity. Such effects may be two-fold, i.e., by proteolytic release of bioactive peptides or by providing a bacteria binding site. Here, the TTG and CCC associated SNPs were linked with mixed panels of oral bacteria such as S. mutans and S. wiggsiae, but also Treponema sp. HOT268 and Tannerella sp. HOT286, which thrive at a neutral to slightly basic pH. However, these suggested mechanisms are theoretical and need to be evaluated in future studies.
Finally, CA6 gene variation has been suggested to link to bitter taste and smell perception [21][22][23][24][25][26][27] . To date, reported associations with food selection are inconsistent and the present study did not find any effect of CA6 polymorphisms on sweet snack intake. This may reflect a "true" lack of association or the inherent error in measuring sugar intake in general and especially in the dental office. This aspect as well as the reported association between bitter taste receptor stimulation and innate immunity function 28 should be addressed by a study design that is less prone to bias, such as using a biomarker or genetic marker reflecting sweet intake.
Compared to previous investigations, the present study included detailed genotyping of a relatively large number of SNPs within and near CA6, allowing fine-mapping of both single genetic variants and recombination blocks. The phenotypes were also detailed, including a range of salivary, microbiome, and host characteristics which may affect oral microbiota ecology and the caries process directly or indirectly. Additional strengths were that (i) all participants were of the same age (17 years old), (ii) participants were recruited in connection with their regular dental survey, which reduces the risk of selection bias, (iii) participants' permanent teeth had been exposed to the oral environment for years, allowing ecology stabilization and potential development of caries, and (iv) there was minimal confounding by diseases or medications. Further, the study design, i.e., consecutive inclusion of participants as they attended their regular visit for a dental examination, and that virtually all accepted to participate, limited the risk of a selection bias. There are some limitations in the study that should be acknowledged. The relatively low number of participants, including that only a subset was available for assessing CA6 amounts, limits the possibility of comparisons in subgroups. Further, identification of microorganisms by NGS is associated with potential errors at various steps, such as the PCR amplification, sequencing per se, and bioinformatic accuracy. It should also be noted that, in contrast to culturing, DNA-based identification by NGS identifies both dead and live bacteria, and that saliva detection, in contrast to tooth biofilm samples, reflects bacteria from epithelial and tooth surfaces. These differences contributed to the variation in S. mutans detection frequency between the different identification strategies. To reduce the impact of the sampling source, being S. mutans positive was based on presence in either saliva or tooth biofilm by NGS. In addition, self-reported information on lifestyle habits, here tooth brushing, tobacco use and intake of sweet snacks, is prone to bias, such as the well-known under-reporting of diet intake 29 .
Table 3. CA6 SNP and haploblock associations with detection of viable mutans streptococci by saliva culturing. Alleles in bold indicate significant positive associations and in italic negative associations at FDR < 0.25 (bold underline for significant differences at FDR ≤ 0.06). * indicates SNPs that also were associated with caries (see Table 4).
Scientific Reports | (2019) 9:452 | DOI:10.1038/s41598-018-36832-z
CA6 was originally found in the salivary glands, but has now been found to be expressed by the lacrimal, tracheobronchial, nasal, and mammary glands (with especially high content in colostrum) 30 , and in epithelial linings of the oesophagus, stomach, and large intestine. This has led to a wider search of its functions. Thereby, CA6 has been linked to acinic cell carcinoma 31 , colorectal cancer 27,[32][33][34] , and Sjögren's syndrome [35][36][37] . Based on the findings from the current study it may be hypothesized that CA6 and CA6 gene variations contribute to the microbial environment in both the nasopharynx and gastro-intestinal tracts. In conclusion, the results from the present study support that CA6 gene polymorphisms rs10864376 (T), rs3737665 (T), rs12138897 (G) and haploblock TTG of CA6 are associated with S. mutans colonization, overall microbiota composition and dental caries in Swedish adolescents. Further studies need to elucidate the mechanisms.

Material and Methods
Participants. In this study, 154 adolescents aged 17 years were recruited from three public dental health care clinics in the city of Umeå, Sweden 38 . Adolescents who had other chronic medical conditions, used medication regularly, had required antibiotic treatment during the preceding six months, did not consent, or were unable to communicate in Swedish or English were excluded. All participants had answered a questionnaire with information
Table 4. CA6 SNP and haploblock associations with being caries-affected or caries-free. Alleles in bold indicate significant positive associations and in italic negative associations at FDR < 0.25 (bold underline for significant differences at FDR ≤ 0.06).
CA6 genotyping. Single nucleotide polymorphisms in the region of the CA6 gene were selected and genotyped on DNA isolated from whole stimulated saliva. The criteria for SNP selection were a) minor allele frequency (MAF) ≥ 5% in the CEU population, b) not in complete linkage disequilibrium (r² ≤ 0.8), and c) located in the region from 10,000 bp upstream to 10,000 bp downstream of the CA6 gene. Using the NIH snptag resource (https://snpinfo.niehs.nih.gov/snpinfo/snptag.html), 30 SNPs met these criteria. Genotyping was performed using a multiplexed primer extension chemistry of the iPLEX assay with detection of the incorporated allele by mass spectrometry using a MassARRAY analyzer (Agena Bioscience, Hamburg, Germany). Raw data from the mass reader
illustrating the relation between the microbiota species and the G (red) and C (green) genotype of CA6. Red/green dots refer to taxa with a VIP value > 1.5, and grey dots to taxa with a VIP value < 1.5. (c) Correlation coefficients (mean (95% CI); shown on the left x-axis) for taxa with a VIP value > 1.5 (shown on the right x-axis) in plot B. Red bars show association with G and green bars with C genotypes of CA6. The dots on the lines represent median prevalence for each taxon and the stars (*) statistically significant differences between the two genotypes using a univariate Chi2 test and p-values < 0.05.
Sampling for microbiota analyses and CA6 determination. Whole stimulated saliva was collected for 3 min into ice-chilled sterile test tubes while participants chewed on a 1 g piece of paraffin wax. Supragingival biofilm was collected from all accessible tooth surfaces using sterile wooden toothpicks and pooled by participant into 100 µL of TE buffer (10 mM Tris, 1 mM ethylenediaminetetraacetic acid [EDTA], pH 7.6). All study participants refrained from oral hygiene on the morning of their visit to the dental clinic. All samples, except those for culturing, were stored at −80 °C until used.
Oral microbiota genotyping. Genomic DNA was extracted from supragingival biofilm samples, saliva and bacterial mock communities with the GenElute Bacterial Genomic DNA Kit (Sigma-Aldrich, Stockholm, Sweden) as previously described 38 . Briefly, samples were collected and lysed in buffer with lysozyme and mutanolysin (Sigma-Aldrich), treated with RNase and proteinase K, purified and washed. DNA quantity and quality were measured using a NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific, Uppsala, Sweden). Multiplex 16S rDNA amplicon sequencing was performed with the Illumina MiSeq platform (http://www.illumina.com) at the Forsyth Research Institute following the HOMINGS protocol 42 . Briefly, the V3-V4 hypervariable regions of the 16S rRNA gene were polymerase chain reaction (PCR) amplified with the 341F (ACGGGAGGCAGCAG) forward primer and the 806R (GGACTACHVGGGTWTCTAAT) reverse primer. Paired-end reads were fused using FLASH, and barcodes, primers, and ambiguous and chimeric sequences were removed within QIIME. Taxa were identified with the ProbeSeq customized BLAST program, which targeted recognition of 538 species (using 638 species-level probes) and 129 groups of closely related species (using 129 genus-level probes) 43 . Taxa targeted by the genus probes (gp) are described in Table S2. Based on sequencing results for the mock communities, taxa that were represented by at least 100 sequences with >98.5% similarity were included in further analyses.

Caries scoring and lifestyle information. Visual and radiographic dental examinations were performed as described previously 38 . The numbers of tooth surfaces that had caries in the enamel or into the dentine, had a filling, or were missing were recorded. The total numbers of decayed and filled tooth surfaces (DeFS) were calculated. Fissure sealants were not recorded. Missing surfaces were not considered to represent dental caries because tooth loss in the study group occurred for orthodontic reasons, tooth formation defects or aplasia.
Determination of CA6 concentration in saliva. For    specific antibody, then detected by a biotinylated secondary antibody, followed by an avidin-horseradish peroxidase (HRP) conjugate and tetramethylbenzidine (TMB) for colour development. The colour development reaction was terminated by sulfuric acid addition and the optical density (OD) measured at 450 nm using a MultiscanGO spectrophotometer (Thermo Scientific, Abninolab, Upplands-Väsby, Sweden). A standard curve using purified CA6 protein ranging from 0 to 1,000 pg/mL was run.
Determination of total saliva protein concentration. Total saliva protein concentration was determined using the Pierce Coomassie (Bradford) protein assay kit (ThermoFisher). Briefly, 5 µL of diluted (1:1) saliva was mixed with 250 µL protein dye solution and incubated for 30 min at room temperature. The optical density of each sample at 595 nm was measured using a MultiscanGO spectrophotometer (Thermo Scientific) and compared with a bovine serum albumin standard curve of known concentrations to determine the sample concentration.
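Reading a sample concentration off a standard curve, as done for both the CA6 ELISA and the Bradford assay, can be sketched as follows. This is an illustration only: it assumes a straight-line fit of optical density against concentration (the fitting model is not stated in the text, and real standard curves are often non-linear), and the standards, the sample reading, the dilution factor and the function names are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_od(od, slope, intercept, dilution_factor=1.0):
    """Invert the fitted line and correct for sample dilution."""
    return (od - intercept) / slope * dilution_factor


# Hypothetical standards: concentration (arbitrary units) vs. optical density.
standard_conc = [0.0, 250.0, 500.0, 1000.0]
standard_od = [0.05, 0.30, 0.55, 1.05]

slope, intercept = fit_line(standard_conc, standard_od)
# Assumption: a two-fold dilution, so the reading is multiplied by 2.
sample = concentration_from_od(0.45, slope, intercept, dilution_factor=2.0)
print(round(sample, 1))
```

Readings outside the range covered by the standards should not be extrapolated from such a fit; samples would normally be re-diluted and re-measured instead.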
Data handling and statistical inference. NGS sequencing was performed on 139 tooth biofilm samples and 154 saliva samples, of which 2 samples failed. Caries information was missing for one participant, who was excluded. All analyses with caries as outcome were restricted to the 129 subjects who reported brushing their teeth twice a day. SPSS software version 24.0 was used for descriptive statistics, including medians (10%, 90% percentiles), frequencies (n), proportions (%) and odds ratios (OR) with 95% confidence intervals (CI). Group comparisons of continuous variables were done using the Mann-Whitney U test and of categorical variables using the Chi2 or Fisher's exact test. All tests were two-tailed. Correction for multiple comparisons was done by the Benjamini-Hochberg false discovery rate (FDR). P-values were considered significant at FDR < 0.25 and ≤ 0.06. The higher FDR was applied to avoid missing potential CA6 SNPs of interest, as described earlier 44,45 .
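The Benjamini-Hochberg correction named above can be written out in a few lines. The sketch below computes BH-adjusted p-values (q-values) in pure Python; the function name and the example p-values are hypothetical, and the two thresholds in the usage lines simply mirror the FDR levels stated in the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four SNPs.
p_vals = [0.01, 0.04, 0.03, 0.20]
q_vals = benjamini_hochberg(p_vals)
lenient = [q < 0.25 for q in q_vals]   # FDR < 0.25
strict = [q <= 0.06 for q in q_vals]   # FDR <= 0.06
print(q_vals, lenient, strict)
```

Reporting at two FDR levels, as the study does, trades sensitivity (the lenient level keeps candidate SNPs in view) against specificity (the strict level marks the more robust findings).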
Haploview software (version 4.2) was used to evaluate characteristics of SNPs (observed heterozygosity, predicted heterozygosity, Hardy-Weinberg equilibrium, minor allele frequency, pairwise linkage disequilibrium), as well as evaluation of potential haploblocks using the algorithm described by Gabriel et al. 46 . Haploblocks with frequencies <1% were not included in the analyses. Association between genetic variation (SNPs and haploblocks) and phenotypes was evaluated using Haploview and Chi2 tests. The tests were two-tailed tests and corrected for multiple comparison by FDR as described above. Phenotype penetrance was assessed in common allele, recessive and dominant models 47 .
Clustering of participants by presence (0/1) of bacterial taxa in saliva and tooth biofilm, respectively, was modelled using partial least squares (PLS) regression. Saliva taxa did not yield a significant model and therefore PLS analyses employed tooth biofilm taxa. Variables with a Variable Importance in the Projection (VIP) > 1.5 were considered influential. SIMCA P+ version 14.1 (Umetrics AB) was used for these analyses.

Data Availability
NGS sequence data can be found at https://doi.org/10.6084/m9.figshare.5794989. Genotyping CA6 datasets generated and analysed during the current study are available from the corresponding author on reasonable request and with appropriate ethical approvals.