Tendentious effects of automated and manual metagenomic DNA purification protocols on broiler gut microbiome taxonomic profiling

Here, we developed protocols to improve sensitivity, rigor and comparability of 16S rRNA gene amplification-based next-generation sequencing (NGS) results. A thorough study was performed by evaluating extraction efficiency with respect to the yield, purity, fragmentation of the purified DNA, and sequencing metrics considering the number of quality reads, amplicon sequence variants (ASVs), community structure and biodiversity. We identified batch-effects that significantly bias broiler gastrointestinal tract (GIT) community compositions and made recommendations to improve sensitivity, consistency, and cross-study comparability. We found that the purity of the extracted nucleic acid had a strong effect on the success rate of downstream library preparations. The preparation of stool bacterial suspensions from feces showed a significant positive influence on community biodiversity by enriching Gram-negative bacteria and cataloguing low abundant taxa with greater success than direct processing of fecal material. Applications relying on the automated Roche MagNa Pure 24 magnetic-bead based method provided results with high consistency therefore it seems to be the optimal choice in large-scale studies for investigating broiler GIT microbiota.

General description of sequencing results. 16S rRNA gene based (V3-V4 region) amplicon sequencing was carried out on Illumina MiSeq platform generating a total of 13.257.362 reads with the mean read count of 138.097 ± 31.408 (mean ± SD) reads per sample in a range of 77.997-382.901 reads. Quality filtering with the dada2 software resulted an average denoised read count of 70.987 ± 17.676 per sample and after the merging process the read count dropped to an average of 63395 ± 16586 reads per sample. At the end, the average of non-chimeric reads was 45748 ± 11107 per sample.
DNA yield, quality, sequencing metrics and biodiversity. Several DNA extraction protocols have been tested for fecal microbiota community profiling. In 16.6% of the purified samples PCR amplification was inhibited. Importantly, in 87.5% (14 of 16) of these samples the phenol-chlorophorm method was applied (Fig. 2a). No apparent correlation was observed between DNA concentrations and the number of amplicon sequence variants (ASVs), whereas data rankings showed an obvious proportionality between the ASV numbers and the Shannon's diversity indexes (Fig. 2b, Table S2). Based on these results, MagNa Pure 24 -Pathogen 200 Protocol seems to be an ideal choice to extract community DNA from bacterial suspensions and achieve high resolution of community composition with rare taxa.
Sample homogenization and DNA purification have significant influence on DNA yield. The effects of sample homogenization, bacterial cell lysis and DNA purification methods on DNA concentrations are shown in Fig. 2c. For all tested parameters, the average concentration, quality and integrity of purified DNA were established. The highest DNA concentrations were obtained by feces homogenization by stirring (RS). In the case of RS, an average of ~1.5-fold increase in DNA yield was observed in comparison to bacterial cell suspensions (BS). Mixed lysis proved to be more effective than bead-beating resulting in ~1.57-times higher amount of nucleic acids. Conventional DNA precipitation methods resulted in the highest DNA yield (P < 0.0001) compared to the magnetic bead-based (3-fold increase) and silica column based (56-fold increase) methods. However, poor DNA quality (A 260 /A 230 1.3 ± 0.55) was observed in the majority of samples. The MagNa Pure 24 robotic platform managed to extract a large amount of good quality DNA, as indicated by optimal A 260 /A 280 1.8 ± 0.21-1.89 ± 0.2 absorbance ratios. The Qiagen kits were also able to generate acceptable values in the range of 1.8-2.2. No correlation was found between the fragmentation status of the metagenomic DNA (i.e. high molecular weight -HMW (DIN ≥ 4.5) versus fragmented low molecular weight -LMW (DIN < 4.5)) and yield or purity.
The quantity and integrity of extracted nucleic acids have no effect on library preparation.
Standard protocols strongly affect sequencing metrics and observed biodiversity. We evaluated the impact of 16 standard metagenome DNA isolation protocols [S1-S8 (RS), S9-S16 (BS)] on sequencing metrics (Supplementary Table S1). S1, S9, S2, S10 designate MagNa Pure 24 robotic methods, S3-S5, S11-S13 represent Qiagen kit-based manual methods, and S6-S8, S14-S16 denote conventional, phenol-chlorophorm DNA purification methods. Number of reads, non-chimeric reads, ASVs and community diversity metrics; Faith's PD, Chao1 index, Shannon's diversity indices, Simpson's evenness are shown in Fig. 3. Conventional approaches coupled with BS consistently resulted in the highest community diversity especially when applying chemical (S15) or mixed (S16) lysis. Regarding the RS samples, S7 surprisingly performed the worst, whereas the highest biodiversity was achieved by S1, which at the same time generated the lowest read counts. This paradox was also observable in the case of S16 providing the second highest biodiversity and the lowest read counts. In the case of the BS samples, conventional DNA extraction approaches and the MagNa Pure 24 method with Pathogen 200 protocol resulted in high diversity estimates, while the commercial QIAmp kits S11, S12 recovered the least diverse communities. The Qiagen DNeasy Power Soil Kit produced similar diversities regardless of the sample homogenization method.
Violating variations in overall taxonomic profiles. The effects of sample homogenization (RS vs. BS) standard metagenome isolation approaches (S1-S8 vs. S9-S16) on observed community composition are shown in Fig. 4. Differences of normalized abundance data were attained by considering 8 phyla, 15 classes, 29 orders, 47 . Direct lysis of the biologically diverse faces (raw specimen: RS) has the power to get access to both planktonic and sessile cells. Processing bacterial suspension: BS obtained by multiple washing steps free-floating bacteria can be separated from the indigestible compounds of the feces and manure particles. To compare the efficiency of lysis methods in disintegrating Gram-positive and Gram-negative cell boundaries, bead mill (L1), chemical cell disruption (L2) based techniques and the mixture of those (L3) were performed. To separate nucleic acids from sample lysates robotic magnetic bead (P4), manual silica column (P5) based commercial and manual nucleic acid precipitation based conventional (P6) techniques were used.
www.nature.com/scientificreports www.nature.com/scientificreports/ families, 72 genera and 22 species. Graphical representations were done by calculating the log2 differences in taxa abundances between RS -BS linked metagenome isolation approaches. Taxa with relative abundances lower than 1% were discarded from the analysis. Remarkably different taxonomic profiles were obtained when comparisons (a) On the Y1 axis of the DNA integrity (DIN) data are displayed for every sample elutes (A1-H12) sequestering high molecular weight (HMW) and low molecular weight (LMW) samples. Display of genomic DNA integrity (DIN) was done automatically by the TapeStation Analysis Software on the basis of the electropherogram intensity of the samples. Y2 axis represents total DNA yields (ng/µl). Total DNA extraction yields presented in ng/µl (bar-charts) of DNA per sample. Y3 axis shows the respective absorbance ratios A 260 /A 280 and A 260 / A 230 (line charts). Y4 axis presents sequencing metrics (number of identified genus, number of ASV). Y5 axis displays post-sequencing ecological inferences (Shannon's diversity and species richness). (b) Pyramid plots display the results of rankings of every single method considering DNA yield in relation with the number of ASVs and obtained Shannon's diversity indices. (c) Reports of DNA concentration and quality metrics from the point of view of the technical influential variables. Data shown are mean values with standard deviations. Violin plots represent the distribution of HMW vs. LMW DNA bands. (d) ROC represents the influence of extraction efficiency and suitability of the isolated nucleic acids quantities in downstream library preparation. (e) Significant differences observed in parameters between sample homogenizations. Asterisks report statistical significance; *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001, while "ns" stands for non-significance.
were made between subsets of RS vs. BS samples. This may be due in part to elevated concentrations of PCR inhibitors and nucleases in samples obtained from raw specimens compared to bacterial suspensions.

Impact of batch-effects on beta-diversities.
Principal coordinate analysis (PCoA) resulted in five cluster groups (cluster1 -cluster5) with different spatial ordinations. Different profiles were delineated in the case of BS (cluster1, 2, 3) in comparison to RS (cluster4, 5) samples (Fig. 5a). Dissimilarities in community compositions largely arise from the depletion of residents of sessile biofilm communities associated with solid organic particles in feces. In general, lysis protocols had no profound impact on the β-diversity profiles (Fig. 5b), except for MagNa Pure 24-based applications, where bead-beating based (cluster1) vs. mixed lysis techniques (cluster2, 4) showed a strong influence on differences in community compositions (Fig. 5b,c). Cluster3 displays a strong correlation between bacterial suspensions and DNA precipitation methods indicating a very high consistency. The choice of sample homogenization was not relevant in the case of silica membrane-based methods -P5 (Fig. 5a,c). Samples corresponding the robotic MagNa Pure 24 method were split into two subclasses with regards to sample homogenization (cluster1, 2 vs. cluster4), from which cluster2 exclusively represents bacterial suspensions where DNA was isolated with the Pathogen 200 protocol (Fig. 5c). Our observations based on PCoA plots were statistically strengthened by computing distance-based dissimilarity matrices to unravel protocol effects with significant influence on overall community variations (Fig. 5d). Bar lengths indicate within group compositional differences while between sample distances were calculated on the basis of quantitative (Bray-Curtis, weighted UniFrac) and qualitative (Jacquard, unweighted UniFrac) dissimilarity-based statistics.
Detailed evaluation of the bacterial cell lysis protocols on DNA yield, purity and community diversity. We compared bead mill based and chemical lysis methods and estimated their effects on DNA yield, purity, downstream PCR amplification and biodiversity. Alterations in mechanical lysis techniques did not produce statistically significant differences (Fig. 6a). A remarkable gain in DNA amount was observed when samples were lysed by bead mills (326 ± 110 ng/µl) relative to chemical lysis (101 ± 148 ng/µl). To achieve chemical lysis, ATL, BLB buffers and the inhibitor absorbent InhibitEX Tablet were used (Fig. 6b). We found a significant positive association between the application of the ATL buffer and the DNA purification efficiency. Pretreatment with InhibitEX Tablet resulted in purified nucleic acids with reasonably good quality showing an overall positive effect on the purity and success rate of downstream PCR reactions, while exerting a detrimental effect on DNA yield. One can assume that nucleic acids prone to adsorb to the particles may be eliminated by the centrifugation step. Stool samples subjected to InhibitEX Tablet showed significantly lower Shannon's diversity indexes, which may be associated with bacterial DNA loss. The use of both ATL and BLB buffers resulted in high quality and, yield of DNA. Strong positive correlation was observed between Shannon's diversity and the application of the BLB and ATL.
Distortions in the core microbiota. To assess the effects of technical variables, core microbiomes were constructed by considering taxa represented in at least 50% of all sample aliquots. Complex comparisons in core bacteria distributions were made on all taxonomic ranks except species (Fig. 8). Different methods of sample homogenization resulted in pronounced differences in core microbiota abundances. Although the majority of the core microbiota fall into a limited number of phyla and classes, their relative proportions showed remarkable alterations (Fig. 8a). Four phyla (Bacteroidetes, Firmicutes, Tenericutes, Proteobacteria) and four classes (Bacilli, Bacteroidia, Clostridia, Gammaproteobacteria) displayed large differences in relative abundance. Fold changes of log2 abundance ratios (RS/BS) were used to estimate over-and under-representations of taxa (  . Remarkable abundance variations were observed in community structures. Composite heat map was created to pronounce the distortions in taxonomic profiles on the basis of normalized abundance data associated to eight standard metagenome DNA isolation methods for comparison of the impact of two different sample homogenization strategies (RS vs. BS). Extents of differences are illustrated on the composite heatmap with gradient colors where red scale represents dominance of taxa owed to processing raw specimens homogenized due to stirring; log2(RS/BS) >0 whereas blue scale represents the overrepresentation of taxa associated to bacterial suspensions; log2(RS/BS) <0. Significant shifts in abundances of the 143 taxa affected by the batch-effects can be observed.
Technical artifacts and in silico taxonomic classification influence the proportion of quality reads and species resolution. In addition to DNA extraction methods, technical aspects of in silico bioinformatic analyses also contribute to biases. We performed a comparison of metabarcoding analyses by using two sequencing databases; GreenGenes (GG) and Silva (S). At the phylum, class and order taxonomic levels, more than 99% of the reads have been successfully classified irrespective of the reference datasets. On average, Silva was able to rank more reads 97.95 ± 2.03% at the family level in comparison to GreenGenes (90.28 ± 2.84%) (Supplementary Table S4). The difference was more pronounced at the genus level (S: 95.99 ± 1.42 vs. GG: 78.86 ± 5.8%). As shown in Fig. 9a, the choice of sample processing (BS vs. RS) did not have a significant effect on the proportion of ranked reads (family: 98.91 ± 1.19%, genus: 97.05 ± 2.95) of the two taxonomic ranks. With Silva, the choice of metagenomic DNA purification method did not affect ranked at the family and genus levels significantly (94.67 ± 5.33%). Remarkable differences were observed between species ranking capacity of the two databases. On average, 87.12 ± 3.86% of the species were not traced by Silva, whereas GreenGenes was able to identify 51.74 ± 8.22% of the reads. The higher proportions of ranked reads at the species level showed no correlation with the number of the identified species (Fig. 9b). Silva and GG were able to identify a total of 87 species, but only 13 were identified by both databases. In 85.05% of the cases S and GG identified different species. Figure 9c represents the 30 most abundant species, of which only 4 species (Lactobacillus vaginalis, Lactobacillus salivarius, Lactobacillus pontis, Bacteroides uniformis) were detected by both S and GG. In these 4 cases, similar abundance values were calculated with the two databases. The use of GG led to significantly higher abundance estimates for Streptococcus alactolyticus, Lactobacillus helveticus, Faecalibacterium prausnitzii, Escherichia coli, Barnesiella viscericola and Alistipes finegoldi.
Community taxonomy compositions suffer from biases of different kinds. We quantified the cumulative effects of protocol variables sample processing (H; homogenization), lysis (L) and DNA purification (P) by assessing alpha-diversity distributions (Fig. 10a). Chemical lysis (L2) combined with MagNA 24 Nucleic Acid Purification (P4) and conventional precipitation-based techniques (P6) followed by bead-beating (L1) proved to be relatively robust to sample processing (BS vs. RS), but chemical lysis (L2) coupled with conventional precipitation-based methods (P6) was highly sensitive to different of homogenization. www.nature.com/scientificreports www.nature.com/scientificreports/ To address the aggregate effects of technical variables on community taxonomy data heat trees were generated (Fig. 10b). Pairwise comparisons between parameters provide insights into the taxonomy shifts. To visualize differences between the massive data sets hierarchical structures of taxonomic classifications were developed, where the size of nodes and edges correlate with the abundance of clades. The taxonomic heat-trees reveal taxonomic lineages the detection of which may be challenging in the presence of a particular variable. Our analysis demonstrates, that the choice of sample preprocessing is of particular importance for the discrimination of lineages with relatively large nodes in orange (p_Bacteroidetes, p_Proteobacteria, c_Bacteroidia, c_Gammaproteobacteria, o_Bacteroidales, o_Enterobacteriales, f_Ruminococcaceae) and blue (p_Firmicutes, c_Bacilli, o_Lactobacillales, f_Lactobacillaceae, f_Lachnospiraceae). Grossly differences were observed between lineages from the aspect of the DNA preparation methods.
In the present study we evaluated and compared parameters of metagenome DNA purification protocols by focusing on some of the methodological aspects known as protocol batch-effects including sample www.nature.com/scientificreports www.nature.com/scientificreports/ homogenization, lysis and DNA purification. Of note, our study did not evaluate the impact of alternative stool collection procedures with regards to starting volume, nucleic acid stabilizers, transportation, etc. and the effects of commonly used storage preservatives. By obtaining multiple sample aliquots of the same biological specimen (Fig. 1) we carried out parallel metagenomic DNA extractions and performed paired-end sequencing of the bacterial V3-V4 hypervariable regions of the 16S rRNA gene.
Inflated diversity can arise from ignoring the significance of reagent microbiome ("kitome"), which can greatly impact both 16S rRNA gene surveys and shotgun metagenomics due to the presence of contaminating bacteria in DNA extraction kits [38][39][40][41] . Its effect is ubiquitous and can vary between different lots of the same kit 41 . The empirical assessment of background contamination can be especially relevant and challenging in the case of low biomass samples. However, as DNA extraction kits introduce biases that affect all the samples equally, these biases can be controlled and mitigated by the use of the same batch of reagents and by including proper negative ("blank-swab", "blank-extraction", "blank-library") and positive ("mock communities") technical controls 40,41 .
We observed a considerable variation in the amount and fragmentation of community DNA resulting from protocol variables (Fig. 2). DNA shearing was found to be inconsequential to library preparation likely due to the use of relatively short amplicons. Importantly, we observed opposite effects of DNA quantity and quality on  www.nature.com/scientificreports www.nature.com/scientificreports/ the success rate of downstream PCR amplification. On average, the conventional phenol-chlorophorm method yielded the highest gDNA concentrations and the worst purity resulting in a higher tendency for abortive amplicon PCR. This effect may be explained by elevated amounts of PCR inhibitors such as lipids, residual RNAs, polysaccharides, proteins, detergent contamination etc. To mitigate the effects of contaminants, additional bead-based DNA purification is strongly recommended. Our conclusion contrasts widely used metagenomic DNA library preparation protocols, which aim to maximize DNA concentration and maximize fragmentation 31 .
Based on the evaluation of eight standard metagenome DNA extraction methods we conclude that sample homogenization is the variable of greatest effect. We found that the method of choice has strong effects on A 260 / A 280 absorbance ratios, number of quality reads, ASVs, observed biodiversity, and is the source of remarkable variability in community taxonomic profiles at every taxonomic level (Fig. 3). Among the DNA isolation methods tested, MagNa Pure 24 robotic isolation is the best to ensure the highest community diversity.
Standard methods (commercial silica membrane-based platforms) based on the use raw feces specimens (RS) and bead-mill lysis (L1) introduce bias in favor of the Lactobacillales and have a detrimental effect on estimates of community biodiversity. We believe that this is because biodiversity is proportionally influenced by the equilibrium of ASVs, with the abundant taxa generally increasing Simpson's evenness thus decreasing Shannon entropy (Supplementary Fig. S2). Gram-positive Bacilli typically enrich biofilm forming clades, which show significantly higher abundance in RS samples likely due to physical rather than technical reasons. Sessile bacteria attached to manure particles have been depleted during multiple washing steps.
Efficient lysis of bacterial cells with the removal of exo, and endonucleases and PCR inhibitors are critical for the proper purification of total nucleic acids from Gram-positive and Gram-negative stool-bacteria with heterogeneous cell wall structures. In search of the optimal DNA extraction method, we tested common bead-based and chemical lysis techniques and their combination (Fig. 6). With respect to bead mill-based lysis our results conflict with previous studies reporting significant loss in nucleic acid molecules bounded to the particles 42,43 . Our study demonstrated that bead-based lysis did not reduce, but in fact significantly enhanced DNA yield. Furthermore, this lysis method was not associated with a significant increase in alpha-diversity. Finally, our analyses suggest that the application of InhibitEX tablet beneficially influenced DNA purity, while significantly (P ≤ 0.001) decreasing DNA yield 9,20,22,31,44 . Using raw feces specimens, we observed no differences between lysis methods with respect to Gram distribution (G-positive: 83% vs. G-negative: 17%). However, in the case of bacterial suspensions mixed lysis was associated with the highest relative Gram-positive abundance (66%).
We also quantified and compared the effects of variables on distortions between communities. Stirring-based homogenization resulted in the over-representation of Gram-positive and under-representation of Gram-negative communities (P ≤ 0.001) (Fig. 7). These distortions are likely related to the differences between the growth modes of sessile (in particular Enterococcus, Staphylococcus of the Gram-positive phyla Firmicutes) and motile planktonic bacteria, and their pili and fimbriae colonizing biotic and abiotic surfaces of the poultry manure. We conclude that there is a strong positive correlation between the fraction of Gram-negatives and observed community diversity. This conclusion is consistent by our observation that bacterial suspensions are associated with significantly higher (P < 0.001) community diversity considering Shannon's diversity, Faith's, Chao and Simpson's indexes, and allow better differentiation of Gram-negative taxa. However, our observations contrast with previous studies hypothesizing that lower relative abundance of Gram-positive bacteria would result in decreased diversity 31,45 . Nevertheless, handling with bacterial suspensions is a procedure which can be standardized, sequestering a remarkable amount of the biotic (plant debris) and abiotic contaminants of the fecal matter. In addition, cell counts can be easily estimated so making results more easily comparable being conducive in multi-center consortia studies. www.nature.com/scientificreports www.nature.com/scientificreports/ From the point of view of extraction quality and protocol transferability the robotic MagNa Pure 24 Pathogen 200 and Pathogen 1000 isolation protocols proved to be superb, showing on average the highest biodiversity metrics (Faith's PD 22.31 ± 2.12, Chao1: 439.98 ± 61.94, Shannon's index: 7.50 ± 0.32). Furthermore, the phenol-chlorophorm DNA extraction and the MagNa Pure 24 200 protocol yielded the more reproducible results revealing highly diverse estimates on the stool microbiota. To our knowledge this was the first case implementing the robotic Roche MagNA Pure 24 DNA purification system (P4) for metagenomic purposes. Silica membrane-based DNA purification methods were associated with the lowest DNA concentrations, but good quality. They obtained high number of classified reads (family 95.04%, genus 94.79%) transmitting predominantly the most abundant taxa contributing to a general decline in community biodiversity (Fig. 9).
We compared the power of the of two in silico metabarcoding approaches (GreenGenes vs. Silva) to understand complex and taxonomically diverse samples. One obvious virtue of SILVA is that it is regularly curated while GreenGenes has not been updated since 2013. Remarkable inconsistency (85.05%) was shown between the results of the two sequencing annotation databases. This can be explained by the disadvantage that the V3-V4 variable regions of the 16S rRNA phylogenetic marker genes used as phylogenetic markers are very similar in sequence between non-closely related species 44,45 . This means that the lack of resolution of the different taxonomy datasets at species level is a biological issue which either classify the 16S rRNA genes correctly (true positive), do not classify them (false negative) or classify them wrongly (false positive) [46][47][48] .
With this methodological study we also catalogued the GIT microbiome of Gallus gallus domesticus, Ross 308 hybrid. Based on our observations, the phylum Firmicutes was overrepresented (70.34%), followed by Bacteroides  Table S3). These observations were not in accordance with the results of Lu et al. 53 , revealing Clostridiaceae as the most dominant family in broiler fecal samples. Further considerable differences were found in the distribution of the genera in comparison to other studies. As an example, Mohd et al. found that 52 , the most abundant genus was Clostridium (47-70%) and Bacteroides (2-20%) with low relative occurrence of Lactobacillus (<4%). We obtained relatively high read counts referring to the genus Lactobacillus (35.46%) samples achieving also low read counts for the genus Clostridium (<2%). Bacteroides also was more abundant in RF (5.64%) and BS sample (14.1%). Beside the methodological aspects the revealed taxa abundance differences can originate from a number of environmental (location and diet, etc.) and host (genetics and age, etc.) factors.
The discovery of the complex microbial communities depends heavily on the choice of the batch-effects of metagenome purification protocols often with a trade-off between successful taxonomic resolution. We unraveled tendentious effects of protocol batch-effects associated to approaches frequently used in the microbiome field. a) While examining the yield and the quality of the DNA purified from stool, we found that A 260 /A 280 ratio proved to show a positive while the yield a negative correlation with successful downstream PCR amplification. We delineated that b) appreciable technical differences were generated by sample homogenization and DNA separation methods eventuating in solid β-diversity profiles while the lysis parameters represented only mild effects on compositional shifts. A strong positive c) association was also conveyed between community biodiversity and the application of robotic MagNa Pure24 or the conventional phenol-chlorophorm nucleic-acid based purification techniques. Across all other methods tested hereby the d) silica column-based applications seemed to be highly reproducible yielding the less diverse estimations. e) We demonstrated, that the species-level identification depends principally on the applied taxonomic database. f) We demonstrated that the cumulative bias exerted by the interaction of metagenomic protocol choices yields highly divergent results. g) We found that MagNa Pure 24 automated platform can be highly recommended for metagenomics solutions to achieve sufficient reproducibility while tracking large-scale microbiome shifts especially in multi-center consortia studies. Nevertheless, it proved to be prominent for the in-depth analysis of the microbiota communities as well.
All of our observations were taken by testing broiler stool samples on the basis of scarcity of standards to process livestock stool samples during metagenomic studies. We believe however, that some of our results can also serve as a guideline in optimizing DNA purification protocols for a wide range of samples.

Materials and Methods
Ethics approval. The study was approved and carried out in accordance with the local ethics committee's guidelines of the University of Debrecen under the registration number: (DEMAB/12-7/2015). Sample collection. Broiler (Gallus gallus domesticus, Ross 308 hybrid) fecal samples were collected from the premise of Nagisz Zrt., Hungary, Hajdú-Bihar County; 4181, Nádudvar; Fő út 119. Individual fecal samples were collected (5 g of each) from eight randomly selected poultry; 4 pullets (female) and 4 cockerels (male). The freshly collected stool samples were stored in specific, sterile, DNase free stool transportation bowls and stored at 4 °C for maximum 3 hours. A composite fecal sample was made by pooling the individually collected feces and thoroughly mixed.
Sample homogenization. Prior to lysis raw stool specimens (RS) were either homogenized via stirring manually with sterile stirrers or bacterial suspensions (BS) were obtained through multiple feces washing steps (Fig. 1). In the case of RS samples 250 mg raw specimens were further processed. To obtain BS 15-15 g specimens from different portions of the broiler stool specimens were transferred into 50 ml sterile, safe-lock-cap falcon tubes and 15-15 ml of PBS buffers were added. Samples were homogenized for 15 min (vortex at 350 RPM) and centrifuged for 5 min at 500 × g. These washing steps were repeated three times. Supernatants were collected and centrifuged at 13.000 × g for 20 minutes to pellet the bacterial cells. Finally, the supernatants were discarded and the pellets were dissolved in 3 ml PBS. And in every case 250 µl of BS was used for metagenomic DNA purification.
Cell lysis. In the case of bead-beating three different, commercially available bead mills (Fig. 1)  In the case of the conventional DNA purification technique (P6) bacterial DNA was isolated using phenol/chloroform/isoamyl alcohol method. Briefly, 800 µl of sample lysates was added to 800 µL of Ultrapure TM phenol:chloroform:isoamyl alcohol (25:24:1) mixture (Thermo Fisher Scientific, Maryland, USA), and vortexed thoroughly for approximately 20 seconds. After incubation at room temperature for 3 minutes the lysate was centrifuged for 10 minutes at 16.000 × g. The upper aqueous phase was carefully removed and transferred into a new Eppendorf tube. For ethanol (EtOH) precipitation, 1 µl of 20 µg/µl glycogen, 7.5 M NH 4 OAc (ammonium acetate in 0.5 × volume of the sample) and 2.5X volume of 100% EtOH were added to the supernatant. The sample was placed at −20 °C overnight to precipitate the DNA from the sample. The sample was then centrifuged for 30 minutes at 16.000 g at 4 °C to pellet the DNA. The supernatant was carefully removed without disturbing the pellet. 500 µl 70% EtOH was added to the sample and shaken by hand approximately for 20 seconds. The sample was centrifuged at 4 °C for 5 minutes at 16.000 x g and the supernatant was carefully removed. This washing step was repeated 2 times. The DNA pellet was dried at room temperature under laminar air flow, and then the DNA pellet was dissolved in 30 µl of nuclease free water.
Negative and positive controls. To minimize infections and laboratory contamination sterile surgical gloves and face masks (for collecting samples) were used and all DNA extraction steps were performed with sterile or sterilized equipments in a class II laminar air-flow cabinet. Negative isolation control (NIC) experiments were simultaneously conducted by substituting samples with PCR grade water. Elutes of the NIC samples were conveyed for V3-V4 amplicon -PCR and indexing performed under DNA free UV sterilized AirClean ® PCR workstations/cabinets. At each stage the PCR clean-ups of the library preparation NIC amplicons were also validated on 4200 Tape Station System (G2991AA, Agilent Technologies; Santa Clara, California, United States) by Agilent D1000 ScreenTapes (5067-5365) and Agilent Genomic DNA reagents. Host background nucleic acid contaminations were also detected by conducting real-time PCR reactions using GAPDH assay on eluted gDNAs. For measuring the overall quality of Illumina MiSeq paired-end (PE, 2 × 301 nt) sequencing runs 5% PhiX spike-in quality control (PhiX Control Kit v3 -FC-110-3001) was used.  Supplementary Fig. S1 with adaptor sequences (i7-N7xx-12 items, i5-S5xx-8 items)) was performed using the Illumina Nextera XT Index Kit (FC-131-1001/2). PCR cleanups and amplicon size selections were carried out with KAPA Pure Beads (KAPA Biosystems) based on the technical data sheet (KR1245 -v3.16) of the manufacturer resulting in final ~580-630 bp libraries. Every time, verifications were done with PCR Agilent D1000 screen tapes (5067-5582) and D1000 Reagents (5067-5583). The 16S amplicon libraries for each sample were quantified with qPCR, normalized with respect to amplicon sizes and pooled into a single library in equal molar quantities. Finally, 5 µl of pooled 4 nM DNA library pool was prepared for sequencing on Illumina MiSeq platform. The library pool was denatured with 0.2 M NaOH and diluted to 10 pM final concentration. Sequencing was carried out with MiSeq Reagent Kit v3-618 cycle (MS-102-3003) following manufacturer's protocols (Illumina, Inc., San Diego, CA, USA).
Sequencing read preparation for downstream analysis. Paired end reads were demultiplexed by the integrated software of the Illumina MiSeq sequencing machine 54 . The FastQ files were imported into the Qiime 2 (ver 2019.7) pipeline (https://qiime2.org/) according to the "Atacama Soil microbiome" tutorial 54 . The presence of adapter sequences (CTGTCTCTTATACACATCT) was checked and trimmed from the 3′ end of the reads with Cutadapt Software integrated in the Qiime 2 pipeline. Quality trimming was performed by using the DADA2 software 55 and the denoising parameters were set as follows: for the forward reads 13 bases were cropped from the start and the length was set to 290 bases; for the reverse reads 8 bases were cropped from the start of the reads and the length was set to 280 bases.
Alignment. Multiple sequence alignment was performed with the Mafft software 56 , and reads were taxonomically classified using Naïve Bayesian classifier trained with the Greengenes (ver13_8) and Silva (ver132) 57 reference databases by selecting mapping points according to the forward-reverse primer set that was used for amplifying the 16S V3-V4 regions of the bacterial community (341 F, 806 R). Phylogenetic tree was constructed with FastTree plugin 58 . Biodiversity analysis. Alpha and beta diversity tests were performed in the QIIME2 pipeline. For sample normalization a 25 000 read depth was set. However, alpha rarefaction analysis showed that 5 000 reads could be enough to maximize the diversity metrics. For alpha diversity Shannon's index 59 , Faith's phylogenetic diversity index 60 , Simpson evenness 61 , and Chao1 index 62 was calculated in the QIIME pipeline. Beta diversity analysis involved measuring weighted/unweighted UNIFRAC distances 63 and Bray-Curtis dissimilarities 64 . For visualization of beta diversity matrices PCoA plots were generated using the Emperor 65 plugin. Alpha diversity differences were measured using the Kruskal-Wallis test. Beta diversity group significances were calculated with Permutational multivariate analysis of variance (PERMANOVA) pseudo F statistical test 66 . Data visualization. For data mining (preparation) the QIIME2 pipeline was used. QIIME artifact files were exported from the pipeline resulting BIOM files that were converted to TSV files which were used with different visualization packages. Heatmaps were generated in Python (ver3.6.5) with Seaborn package 67 . Bar charts, boxplots, pie charts, polar plots, bubble plots were constructed with R programming language 68 using the ggplot2 package 69 . Differential heat tree was created with the metacoder R package 70 . In the case of community heat-trees significant differences were determined using a Wilcox rank-sum test followed by a Benjamin-Hochberg (FDR) correction for multiple comparisons 70 .

Data availability
All sequence data used in the analyses were deposited in the Sequence read Archive (SRA) (http://www.ncbi.nlm. nih.gov/sra) under PRJNA533250. Sample IDs, meta data and corresponding accession numbers are summarized in Supplementary Fig. S1.