Evaluation of 16S rRNA primer sets for characterisation of microbiota in paediatric patients with autism spectrum disorder

Abstract intestinal microbiota is becoming a significant marker that reflects differences between health and disease status also in terms of gut-brain axis communication. Studies show that children with autism spectrum disorder (ASD) often have a mix of gut microbes that is distinct from the neurotypical children. Various assays are being used for microbiota investigation and were considered to be universal. However, newer studies showed that protocol for preparing DNA sequencing libraries is a key factor influencing results of microbiota investigation. The choice of DNA amplification primers seems to be the crucial for the outcome of analysis. In our study, we have tested 3 primer sets to investigate differences in outcome of sequencing analysis of microbiota in children with ASD. We found out that primers detected different portion of bacteria in samples especially at phylum level; significantly higher abundance of Bacteroides and lower Firmicutes were detected using 515f/806r compared to 27f/1492r and 27f*/1495f primers. So, the question is whether a gold standard of Firmicutes/Bacteroidetes ratio is a valuable and reliable universal marker, since two primer sets towards 16S rRNA can provide opposite information. Moreover, significantly higher relative abundance of Proteobacteria was detected using 27f/1492r. The beta diversity of sample groups differed remarkably and so the number of observed bacterial genera.

An advent of massively parallel sequencing (MPS) technologies has revolutionized an investigating of human microbiota 1 . MPS platforms has enabled unrevealing of the microbiota composition and bacterial identification in complex microbial samples which was not feasible before 2 . Relatively easily accessible results tempted to excitement for revealing yet unknown. However, the procedure is very complex. Obtaining the results truly reflecting the real microbiota state may require tedious optimization. All steps in DNA library preparation including sample collection and storage 3 or DNA isolation 4 may influence the results of analysis. Suitable protocol for DNA libraries preparation have to be chosen with regards to sequencing platform 5,6 as well as type of specimen origin 7,8 . In this respect, primer selection seems to be the most crucial task in preparation of DNA sequencing libraries 9 and so bacterial composition discovery. Via selection of specific amplification primers, investigating of more less any location and length of bacterial genes is enabled. Despite the fact that MPS technologies allow analysis of whole bacterial genomes, sequencing of 16S rRNA has become a gold standard in microbiota studies 10 . These genes are of suitable length and structure for phylogenetic analysis 11 . Huge advantage for 16S sequencing and subsequent bacterial identification are easily accessible and rapidly expanding databases of their sequences in specific organisms [12][13][14] .Various primer sets are being used for amplification of 16R rRNA genes and were considered to be universal. However, primers may differ in specificity and sensitivity. Moreover, amplification primer sets need to be chosen depending on type of biological sample as different type of microbiota can be present across various biological specimen, such as blood, saliva or stool samples 15 . New studies showed that amplification of different variable regions of 16S rRNA genes may result in different outcome and that results may be highly variable depending on primer set used. It is well-known that full length 16S rRNA gene sequence can provide the most specific phylogenetic analysis. On the other hand, due fact that currently used MPS platforms www.nature.com/scientificreports/ are producing much shorter reads than the length of 16S rRNA genes, as well as due to economic aspect, shorter fragments of 16S are often chosen for analysis. These may include one or more specific variable regions. However, short amplicons may reduce specificity and sensitivity of taxonomic classification 16 . In silico study comparing several hypervariable regions of 16S rRNA genes indicates that V4-V6 region reflect full length sequences of 16S rRNA genes the best, while V2 and V8 are the least reliable regions 17 . Sequencing analysis with results that truly mirror the real microbiota condition are of high importance as they may help to uncover connection between microbiota changes and various diseases. Recently, a lot of associations between changed microbiota conditions and health impairment such as allergic disease and asthma [18][19][20][21] , development of diabetes mellitus 1 22 , necrotizing enterocolitis 23 , inflammatory bowel disease 24 or obesity 25,26 have been revealed. Other studies indicate that specific microbiota composition may be associated with neurodevelopmental disorders, as ASD 27 and schizophrenia 28 , depression 28 or neurodegenrative disorders as Parkinson's 29 and Alzheimer's disease 30 . Many studies revealed behaviour 31 and autism spectrum disorder (ASD) particularly 32 are linked with microbiota changes via gut-brain axis connection. According to Ho et al. 33 , there are 26 published studies regarding the microbiome composition of paediatric ASD patients. Most of them found that children with ASD have distinct gut microbiota from that in children without the condition. Unfortunately, many of these studies showed inconsistent results crosslinking only in certain taxa, including Firmicutes at the phylum level, Clostridiales clusters including Clostridium perfringens, Bifdobacterium and Prevotella. Also, often used marker Firmicutes/Bacteroidetes (F/B) ratio provides opposed results. The studies of Strati et al. 34 and Williams et al. 35 declare a significant increase in the F/B ratio in ASD subjects, some studies declare no significant changes between ASD patients and controls [36][37][38] . There are also studies that mention Bacteroides/Firmicutes ratio 27,34,39,40 being increased. One of the reasons for discordances could be, different approaches applied for the ASD microbiome characterization, including 16S rRNA V3-V4 primer pair 41 , individual variable regions of 16S rRNA 42 , combined with short-read sequencing or also pyrosequencing 43 . Differences in high-throughput techniques can also implement significant changes in microbiome composition coming from the methodological approach itself.
In our study, we tested 3 different primer sets for investigating microbiota of ASD patients. For this purpose, stool was chosen as a suitable biological sample, as easily accessible in the children with ASD. The primer sets were chosen as these are standardly used for faecal microbiota investigation. Here we present the differences in microbial composition in stool samples of paediatric ASD patients that can be implemented only by using a specific primer set towards 16S rRNA. We show diametrically different low-abundance bacterial genera detected by either of the primer set. Differences in bacterial composition can be observed even within the analysis of the same 16S rRNA variable regions with alternative primer set. Furthermore, not only the quality but also the quantity of the detected bacterial genera differed between three primer sets.

Methods
Ethical approval. This study was approved by the Ethical committee of the Comenius University Faculty of Medicine, and the University hospital in Bratislava, Slovakia and it is consistent with the 1964 Helsinki declaration and its later amendments. Parents were aware of whole design of the study and the informed consent form was signed by both (or at least one if both are not available) parents or caregivers of the corresponding child.
The authors confirm that all methods and experimental protocols were performed in accordance with the relevant guidelines and regulations and were approved by the of the committee of the Ministry of Health of the Slovak republic.
Experimental design. In this experiment, intestinal microbiota composition of 10 children with ASD was examined (samples A-J), while 3 different ways of sample processing were used and compared. This study was approved by the Ethical committee of the Comenius University Faculty of Medicine, and the University hospital in Bratislava, Slovakia and it is consistent with the 1964 Helsinki declaration and its later amendments. Parents were aware of whole design of the study and the informed consent form was signed by both (or at least one if both are not available) parents or caregivers of the corresponding child.
Faecal samples. In the study, 10 boys with ASD in age 5.00 ± 0.20 (mean ± SEM), ranging from 2.8 to 9.2 years, were included. All subjects were medication-free. The diagnosis of ASD was determined by a clinical psychologist or a psychiatrist according to ICD-10 and DSM-5. The children also underwent behavioural testing by trained examiners at the Academic Research Centre for Autism, Institute of Physiology, Faculty of Medicine, Comenius University. The diagnostic tools involved: observation of a child by the Autism Diagnostic Observation Schedule-second revision (ADOS-2) 44 and the Autism Diagnostic Interview-Revised (ADI-R) 45 , a comprehensive interview administered to parents that provides a thorough assessment of individuals with ASD. All children enrolled in the study had to meet the criteria for ASD within both autism scales.
Stool specimens were collected by parents into sterile containers, after being given a detailed explanation of the procedure and kept at 4 °C and delivered to the laboratory within 4 h. Subsamples of 200 mg of each specimen were frozen at − 80 °C until DNA extraction. DNA isolation. Total

Results
The fragment size of DNA sequencing libraries was set to 500 bp for samples amplified by primer set#1 and from 150 to 1000 bp (median 500 bp) for primer sets 2 and 3. Number of reads obtained for samples amplified by different primer sets differed from 58,103 to 214,804 in average ( Table 2).

Correlation of used primer set with the bacterial composition of the gut at phylum level.
Microbial profiles of samples were analysed at phylum level. In general, microbiota of children with ASD was characterized by high relative abundance of Bacteroidetes followed by Firmicutes and Proteobacteria. In several samples, relatively high relative abundance of Verrucomicrobia was detected, however, this was observed only in samples amplified by set#1primers. Except of Bacteria, also low relative abundance (less than 1%) of Archaea was detected in samples amplified with set#1 primers (Table 3). Differences in microbial profiles of samples at phylum level were observed depending on set of primers used for DNA sequencing library preparation (Fig. 2). Significantly higher relative abundance of Bacteroidetes was detected in microbial profiles of samples amplified by set#1primers, compared to samples amplified by set#2 and set#3 primers (p < 0.0001 for both). Difference was observed between samples prepared by set#2 and set#3 primers as well, where samples prepared with set#3 primers had significantly higher Bacteroidetes (p = 0.0025). On the other hand, in samples amplified by set#2 and set#3 primers, significantly higher abundance of Firmicutes was detected (p < 0.0001 for both). This observation is in accordance with the ZymoBiomics Gut standard sequencing using primer set#2, where higher amount of Firmicutes compared to the original relative abundance defined by the supplier has been determined compared to primer set#1, but not set#3. Significantly higher relative abundance of Proteobacteria was detected in samples amplified by set#2 primers compared to samples prepared by set#1 and set#3 primers (p = 0.0107 and p = 0.0013, respectively).

Primer pair effect on detection of microbiome composition at genus level. Major differences
were observed also in number of bacterial genera detected in samples amplified by different sets of primers (Fig. 4). The most bacterial genera (1584.7 ± 122.52) were detected in samples amplified by set#2 primers compared to set#1 (1170 ± 145.58 OTUs; p = 0.0001) and set#3 primers (1283.6 ± 166.73 OTUs; p = 0.0007).
Furthermore, we found out that the microbial composition varied not only between different samples, but it differed also within the same sample amplified with different set of PCR primers. Pairwise analysis and Spearman Rank coefficients of the 244 bacterial genera present in all 10 analysed samples (Fig. 5) showed that the bacterial composition was the most similar between primer set#2 and set#3 (average r of 0. 33) followed by primer set#1 and set#3 (average r of 0.56) and finally set#1 and set#2 (average r of 0.62), regarding the quantity of individual bacterial genera. Within the group of samples amplified with the same primer set, the lowest variability in bacterial taxa abundance was observed with primer set#3 (average of r 0.22) followed by set#1 (average r of 0.32) and set#2 (average r of 0.34). www.nature.com/scientificreports/ We observed that detected structure of the microbiota depends directly on primers used for PCR amplification (Table 4). Microbial analysis at genus level showed, that Bacteroides was the most abundant genus in 80% of samples. In the remaining 20% of samples, Prevotella or Alloprevotella dominated. However, we observed differences depending on type of PCR primers used for library preparation. In samples amplified by set#1 primers relative abundance of Bacteroides ranged from 1.7 to 63.2% (37.5% ± 19.95%) and Bacteroides was the most abundant genus in 7 of 10 samples. We also found out, Prevotella to be the most abundant in 2 samples (relative abundance 22.6% and 33.7%) and Alloprevotella in 1 sample (relative abundance 42.9%). High relative abundance of Paraprevotella 10.9% ± 6.44% was observed in all except one sample prepared with set#1 primers. When set#2 and set#3 primers were used, Bacteroides was the most abundant in 9 of 10 samples for both, and its relative abundance was 22.7% ± 10.93% and 32.6 ± 16.21%, respectively. Faecalibacterium was highly abundant in all samples and the relative abundance only slightly differed depending on primer set used for library preparation. In most of samples prepared with primer set#2 and 3 Clostridium was highly abundant, however, only in 1 sample prepared by set#1 primers was more represented. High abundance of Salmonella and Escherichia/ Shigella was found in 3 and 2 samples correspondingly, when using set#2 primers, but this was not observed in the same samples prepared by different sets of primers ( Table 4).

Analysis of the least abundant bacterial genera.
Of the whole number of bacteria detected by individual primer sets (app.1500 genera per sample), the least abundant 500 genera of each primer set group were selected. Next, only genera significantly different (p < 0,005) between the sets of taxa obtained with different primer sets (71 genera) were considered for further analysis. Comparison of these groups revealed that primer set#2 (27f/1492r) is able to detect the most of unique bacterial genera representing less than 0.0001% of the detected bacterial community (in average 62 of 71) (Fig. 6). Furthermore, it detected the same group of bacterial genera like the primer set#3. This set of bacteria is represented also by Aggregatibacter, Pseudomonas, Mannheimia, Paraglaciecola, Actinobacillus, Lucibacterium, Otariodibacter, Pseudohongiella, Candidatus Nanosalina, Psychrobium, Natronobacillus, Modicisalibacter, Chromatocurvus, Rhodothalassium, Denitrobacterium, Amphiplicatus, Aquisalibacillus, Thermoactinospora, Marispirillum, Thiopseudomonas, Ciceribacter, Thermobispora, Hop-

Detection of ASD-associated bacterial genera. Several bacterial species have been observed in previ-
ous studies to have elevated or decreased abundance when compared samples of children with ASD and neurotypical controls 27,34,35,43,[47][48][49][50][51][52][53][54][55] . We analysed ability of 3 different primer sets to detect bacterial genera often associated with ASD. We observed significantly higher ability to detect Bacteroides by set#1 and set#3 rather than set#2 (p < 0.001 and p = 0.002 respectively). There was also a trend for higher abundance of Clostridium, Corynebacterium, Lactobacillus, Coprococcus and Dialister in set#2 and set#3 prepared samples. In contrast, samples amplified by set#1 showed higher relative abundance of Akkermansia, Bifidobacterium and Prevotella. For the detection of Colinsella and Alistipes rather primer set#1 and set#3 seem to be more suitable than set#2 (Fig. 7a,b).

Discussion
Intestinal microbiota analysis of an individual may be very variable since it is influenced by many factors. For example, false differences may be introduced during sample processing, different results may be obtained when different protocols for DNA library preparation or data analysis are used. Preparation procedures as well as sequencing procedures might be very important issue when comparing results of different experiments. Therefore, unification of procedures in studying of microbiota composition plays an important role. In our study, clear differences were observed depending on type of DNA sequencing library preparation method. In samples amplified with set#1 primers, Bacteroidetes were definitely the most abundant in microbiota of children with autism. This is in line with previously published study of Finegold et al. 55 who observed Bacteroidetes to be the most abundant in autistic microbiota, followed by Firmicutes, similarly to our study. In our primer pair evaluation the closest result was obtained by amplification of the shortest PCR product, only the V4 region of 16S rRNA gene compared to primer set used in Finegold´s analysis (amplification of V4-V6 region) 55 . The primer sets 2 and 3 gave rather opposite Bacteroidetes:Firmicutes ratio or even more abundant Firmicutes using the shot-gun sequencing of total 16S rRNA gene. The used ZymoBiomics Gut Microbiome Standard enabled us to validate not only the library preparation protocol, but also the DNA isolation method as well as to filter out wet-lab and dry-lab contamination from samples. In summary, shot-gun sequencing of pcr amplicon encompassed all the bacteria included within the recommended standard including Firmicutes, Bacteroidetes, Verrucomicrobia, Fusobacteria and Proteobacteria. The only genus Bifidobacterium (Actinobacteria) was detected by V1-V9 sequencing approach with 100-times lower abundance compared to the original mock community.
Here we could speculate that it is a matter of GC content (59.2%) of Bifidobacterium adolescentis. However, the Firmicutes represented by bacterial species with wide range of GC content (Faecalibacterium 57.8%; Veillonella Figure 2. Differences in relative abundance of bacterial phyla in paediatric ASD samples amplified with different primer sets. Primer set#1 (515f/806r) adhered significantly more to Bacteroidetes than primer set#2 (27f/1492r) and set#3 (27f*/1495r). For Firmicutes detection primer set#3 was the most favourable followed by set#2. Primer set#3 represented balanced ratio of detected OTUs between Bacteroidetes and Firmicutes phyla. The highest abundance of Proteobacteria was detected by primer set#2 compared to set#1 and 3. The level of significance ≤ 0.01 was applied.   www.nature.com/scientificreports/ 39.0%; Roseburia 48.7%; Lactobacillus 52.3%; Clostridioides 28.8%) were identified almost equally to original ratio within mock community and detected without any preference for the GC content. So, we suppose that the GC content of the 16S rRNA gene could be more balanced than of the whole genome, at least for Firmicutes. What can rather be influenced is the preference of primer set#2 and set#3 for Firmicutes than for Bacteroidetes, since the GC content of the represented bacteria (Bacteroides 43.3%; Prevotella 44.4%) was higher than of some of the Firmicutes representatives or similar. The ratio of Firmicutes to Bacteroidetes was favourable for Firmicutes what matched the premixed standard (50% Firmicutes, 26% Bacteroidetes) even though the relative abundance of Bacteroidetes was lower than expected (5-times). This is an important point for data validation using qPCR methods, since it is based on short amplicons amplification, and as such opposite results can be obtained compared to 16S sequencing based on full-length 16S rRNA sequence. Real-time PCR analysis belongs still to the most widely used methods for bacterial analysis also in terms of their quantification 27,56 so careful primer set selection needs to be taken into account according to further intents. At genus level, Bacteroides and Faecalibacterium were present across all samples with high relative abundance. This is in line with previously published studies 34,55 . Except these genera, in Finegold study, Clostridium, Eubacterium, Ruminococcus, Roseburia, Akkermansia, Parabacteroides and Alistipes, were also the most abundant 55 . In another study, where bacterial 16S rRNA genes were amplified using a primer set specific for V3-V5, genus level analysis showed different top ten most abundant genera Bifidobacterium, Unknown Lachnospiraceae, Blautia, Ruminococcus, Clostridium XI, Streptococcus, Gemmiger, and Lachnospiraceae incertae sedis 34 . On contrary with previously published study, we did not confirm high abundance of Bifidobacteria and Eubacterium. For other bacteria, we observed that relative abundance was depending on type of DNA library preparation and varied among samples. However, none of above mentioned species were highly abundant in more than half samples in our study, except Alistipes which was among 10 most abundant species in 6 samples prepared by set#1primers, Roseburia highly present in 7 samples prepared by 27f*/1495r primers and Lachnospiracea with high relative abundance in 6 samples prepared by 3 set of PCR primers. www.nature.com/scientificreports/ Higher levels of Clostridium, Suterrella and Ruminococcus are considered to be associated with pathogenesis of ASD. Some previously published studies, confirmed increased abundance of Clostridia compared to healthy controls 53,54 as well as increased Suterrella and Ruminococcus 51 . In our study, we observed differences in relative abundance of Clostridium depending on type of sequencing library preparation. Clostridium was detected with high abundance only in 1 sample amplified by 1 set of primers. However, Clostridia were present in high relative abundance in samples amplified by second and third set of PCR primers. On the other hand, Suterella was not found to be among the most abundant species in any sample and Ruminoccocus was detected to be among 10 most abundant species in half of samples amplified by 2 and 3 set of primers, and only in 1 sample amplified by 1 set of primers.
Previously published studies confirmed that many factors may influence results of sequencing analysis. These include storage conditions, DNA extraction method, type of primers used for DNA amplification and sequencing, sequencing platform and chemistry, as well as data analysis, which have an impact on detected microbiota composition [57][58][59][60] . The studies also indicate, that only suitable combination of several factors may result in findings that truly reflects reality. Although several primer sets are commonly used and considered to be universal, different results may be obtained. Study comparing V4, V6-V8 and V7-V8 57 , as well as other study comparing V4-V5, V1-V2 and V1-V2 degenerate primers 58 , both confirmed, that V4/V4-V5 primers give results that are the most comparable across platforms and the least biased. On the other hand, V4-V5 regions resulted in lowest number of analysable reads compared to others. Using V4 primers (515f/806r primers) in our study resulted significantly higher Bacteroides and lower Firmicutes than by two others primer sets, as well as significantly lower abundance of Proteobacteria than detected by 27f/1492r amplification primers. Table 4. The list of the most abundant bacterial genera in samples analysed using different type of 16S rRNA primer sets. *The number of samples in which bacteria belonged to the 10 most abundant. **Relative abundance of bacteria in total microbiota of samples in which bacteria belonged to the 10 most abundant (Data are presented as average ± standard deviation). In study of Farris and Olson universal primer sets (24F and 1492R original and modified versions of primers specific for Actinobacteria) for 16S rRNA genes amplification were used. These primers were not able to amplify 20%-50% of isolates. In this study, 1492R provided more effective identification of Actinobacteria than 1492-modif 61 . In our study, we did not observe significant differences in ability of 3 primer sets to identify Actinobacteria in samples. On the other hand, it is clear, that in our experiment, Actinobacteria presented only low percentage of microbial communities. This might be supported by fact, that some bacteria especially those whose sequences include high G + C content (also characteristic for Actinobacteria) are usually identified in lower abundance than in real microbial communities 62 .
Though only minor changes in sequences of 27f*/1495r primer set compared to standard set of 27f/1492r primers, in several phyla significant differences in microbiota composition were detected. We found out significantly higher relative abundance of Bacteroidetes and significantly lower relative abundance of Proteobacteria when samples were amplified by modified (27f*/1495r primers) compared to samples amplified by standard primers (27f/1492r primers). Interestingly, when it comes to microbial composition on genus level, only minor differences were observed between samples amplified by set#2 and set#3 of primers.

Conclusion
We have found out, that the primer set 27f/1492r (primer set#2) can provide much wider information regarding the diversity of samples from ASD children and can provide deeper insight into intra-individual variability. On the other side, the 16S rRNA V4 region can provide very homogenous information on bacterial composition typical for ASD patients what in comparative studies is an invaluable tool for discriminative analysis. Here we prove also that the selection of 16S rRNA variable regions for analysis matters more than the primer set itself in terms of bacterial semiquantitative analysis, based on the comparative analysis of primers 27f*/1495r (set#3) and primers 27f/1492 (set#2). However, the most interesting is the finding, that the Firmicutes/Bacteroidetes ratio  Figure 6. Comparison of the detection capability of three primer sets based on the least abundant bacterial genera with significantly different abundance among groups. For data analysis student´s t-test was performed. Bacterial genera with significantly altered abundance between compared datasets (p ≤ 0.005) were visualized using heatmap.