DNA metabarcoding reveals the dietary composition in the endangered black-faced spoonbill

Extensive loss of natural wetlands caused by changes in land use largely diminishes the food resources essential for the survival of migratory waterbirds. Globally, the decline in waterbird populations in East Asia is the most serious, with 64% of these populations showing a decreasing trend. In this study, we applied DNA metabarcoding to examine the spatiotemporal variations and diversities in the dietary compositions of migratory waterbirds in a natural/artificial wetland complex in Asia. By investigating 110 fecal samples from the endangered black-faced spoonbill (Platalea minor) wintering in the wetland, our results show that P. minor had a broad dietary spectrum. The birds fed on at least 26 species in the classes Actinopterygii and Malacostraca, with Mugiliformes, Cichliformes, and Gobiiformes being the main taxa in their diets. Our results also demonstrated clear patterns of the spatiotemporal variations between the roosting groups and intraspecific variations between the individuals, which potentially reflect some of their feeding habits, and the probable usage of different habitat types in the wetland complex. Using high-throughput sequencing, we were able to elucidate the food resources that are critical to P. minor non-invasively, this method can also be used to provide invaluable information for the conservation of many other waterbird species.


Preparation of mock communities
We obtained eight specimens of aquatic species, including six species in class Actinopterygii, one species in class Malacostraca, and one species in class Gastropoda, from local food market (Supplementary Table S12). We used the E.Z.N.A. Tissue DNA Kit (Omega Bio-tek, USA) to extract gDNA from the fresh tissues of these species. Five mock communities (MC0-MC4) were prepared according to the Supplementary Table S12, which shows the species compositions of each mock community. MC0 was prepared by mixing 5mg (wet weight) of tissue from each of the included species for DNA extraction, whilst MC1-4 were prepared by mixing 15ng of gDNA from each of the included species. The number of species included in each of these mock communities ranged from five to eight, these numbers of species were in a similar range to the numbers of prey species detected in individual P. minor samples.

Preparation of libraries through 2-step PCRs
For 18S libraries, 1 st step PCRs were carried out in 25 μl reactions containing 5 μl of 5X GoTaq Flexi Buffer, 0.5 μl of 10 mM dNTP Mix, 3 μl of 25 mM MgCl2, 0.25 μl of 5 U/μl GoTaq G2 Flexi DNA Polymerase, 5 ng of extracted DNA, 0.5 μl of each assigned 10 μM forward and reverse primer uniquely tagged with heterogeneity spacer (Cruaud et al. 2017), 5 μl of 10 % DMSO (Sigma), 0.125 μl of 20 mg/ml BSA (NEB) and ultrapure water. Thermal cycling condition was 95 °C for 2 min; 25 cycles of 95 °C for 30 sec, 65 °C for 30 sec and 72 °C for 30 sec; and final extension at 72 °C for 5 min. We used the PureLink PCR Purification Kit (Invitrogen, Carlsbad, CA) to clean up all PCR products generated from the two-step PCRs. Each of the 1 st step PCR products was eluted in 20 μl elution buffer and used as the templates for the 2 nd step PCRs. 2 nd step PCRs were performed in 45 μl reactions comprising 9 μl of 5X GoTaq Flexi Buffer, 0.9 μl of 10 mM dNTP Mix, 5.4 μl of 25 mM MgCl2, 0.225 μl of 5 U/μl GoTaq G2 Flexi DNA Polymerase, 20 μl of 1 st step PCR products, 0.9 μl of each assigned 10 μM forward and reverse primer tagged with Illumina adapter and unique multiplexing index (Cruaud et al. 2017), 4.5 μl of 10 % DMSO (Sigma) and ultrapure water. PCR condition was 95 °C for 2 min; 10 cycles of 95 °C for 30 sec, 55 °C for 30 sec and 72 °C for 30 sec; and final extension at 72 °C for 5 min. For 12S libraries, we used a high fidelity polymerase for the 2step PCRs. The 1 st step PCRs were carried out in 25 μl reactions comprising 5 μl of 5X Phusion HF Buffer, 0.125 μl of 2 U/μl Phusion Hot Start Flex DNA Polymerase, 0.125 μl of 20 mg/ml BSA (all from NEB), 0.5 μl of 10 mM dNTP Mix (Promega), 5 ng of extracted DNA, 0.5 μl of each assigned 10 μM forward and reverse primer uniquely tagged with heterogeneity spacer, 5 μl of 10 % DMSO (Sigma) and ultrapure water. Thermal cycling condition was 98 °C for 2 min; 25 cycles of 98 °C for 30 sec, 53 °C for 30 sec and 72 °C for 30 sec; and final extension at 72 °C for 5 min. The 2 nd step PCRs were performed in 45 μl reactions containing 9 μl of 5X Phusion HF Buffer, 0.225 μl of 2 U/μl Phusion Hot Start Flex DNA Polymerase (all from NEB), 0.9 μl of 10 mM dNTP Mix (Promega), 20 μl of 1 st step PCR products, 0.9 μl of each assigned 10 μM forward and reverse primer tagged with Illumina adapter and unique multiplexing index (Cruaud et al. 2017), 4.5 μl of 10 % DMSO (Sigma) and ultrapure water. PCR condition was 98 °C for 2 min; 10 cycles of 98 °C for 30 sec, 55 °C for 30 sec and 72 °C for 30 sec; and final extension at 72 °C for 5 min.

Data pre-processing and filtering
We used negative controls to eliminate potential contaminant ASVs and mock communities to determine the thresholds for removing false positive reads. To make the read count comparable between samples and negative controls, total read count of each ASV from all samples was normalized by dividing it by its fold change compared to that of all negative controls. ASVs with read count in negative controls an order of a magnitude more than that in normalized sample read count were regarded as contaminants and discarded (Lee et al. 2015). All of the false-positives in mock communities of 18S and 12S libraries could be eliminated by applying threshold levels of 0.08% and 0.02%, respectively (Supplementary Tables S13 and S14). To ensure the removal of false-positives in all samples, we applied a more stringent threshold levels of 0.1% and 0.05% to 18S and 12S, respectively. Low threshold should be applied only when the requirement of high sequencing depth for each sample is fulfilled (>10,000 reads per sample) (Deagle et al. 2019). In this study, the lowest read count per sample was 49,651 and the low threshold used here is justifiable.
Sequencing of samples, mock communities and negative controls yielded quality-filtered paired-end raw reads 144,148,356 from 111 samples of 18S and 99,341,650 from 97 samples of 12S. After merging the paired-end sequences, we retained 70,608,252 and 48,311,373 reads for 18S and 12S, respectively. Followed by primer trimming, quality filtering and removal of chimera and singletons, 58,129,311 and 44,057,306 reads, accounted for 82% and 91% of total merged reads for 18S and 12S, respectively, were retained (mean number of reads per sample: 457,272 and 382,269 for 18S and 12S loci, respectively after pre-processing). The sample from Platalea leucorodia (Eurasian spoonbill) was excluded from later analyses. We removed falsepositives with thresholds based on mock communities and removed contaminant ASVs based on negative controls. After removal of false-positives and contaminants, we retained 553 from 933 ASVs and 184 from 233 taxa for 18S. For 12S rDNA, we kept 43 from 55 ASVs and 31 from 38 taxa for later analyses. Then, we excluded reads from non-target taxa, such as Homo sapiens and Platalea spp., leaving 551 ASVs and 183 taxa for 18S, and 40 ASVs and 29 taxa for 12S (Tables S15 and S16). The 143 non-metazoan 18S taxa in 7 categories including Eukaryota, Fungi, Algae, Protozoa, Plant, Fungus-like, and Fornicata were removed, leaving 40 taxa in the analyses (Fig. 2-4). Non-dietary category Platyhelminthes was excluded from the dietary diversity analysis of 18S ( Fig. 5-6). Aves (Platalea spp.) accounted for 14.0% and 29.3% of the sample reads for 18S and 12S (Supplementary Tables S17 and S18).

Diversity analyses of abundance-based data
While incidence-based data only takes into account the presence and absence of each taxon, abundance-based data includes the information of read abundance of each taxon. The alpha diversity of abundance-based dietary taxa of roosting groups was estimated by Shannon's diversity index (H), which takes into account both species abundance and evenness, and Simpson's dominance index (D), which determines whether the community is dominated by a few taxa by R package microbiome v1.6.0 (Lahti and Shetty 2017) and plotted with ggplot2. We used Tukey Honest Significant Difference (Tukey's HSD) analysis to estimate the pairwise differences of diversity indices between roosting groups.                Table S13 Read counts of taxa identified in individual 18S mock communities and negative controls. The species used in the mock communities were indicated by green color. The false positives with relative read abundance lower than 0.08%, 0.02%, and 0.002% were indicated by dark chocolate, orange and yellow colors, respectively. Amplicon sequence variances removed after treatment based on negative controls were labeled as 'Yes'. † ND, not detected; ‡ Incertae Sedis, the taxonomic group which broader relationships are unknown or undefined; § MAST-12A, a clade of marine stramenopiles; ¶ A31, an uncultured lineage in Alveolata; light green, used in the mock community; dark orange, false positive <0.08%; yellow, false positive <0.02%; light yellow, false positive <0.002%.   Table S14 Read counts of taxa identified in individual 12S mock communities and negative controls. The species used in the mock communities were indicated by green color. The false positives with relative read abundance lower than 0.08%, 0.02% and 0.002% were indicated by dark chocolate, orange and yellow colors, respectively. Amplicon sequence variances (ASVs) removed after treatment based on negative controls were labelled as 'Yes'. † Sillago spp. are the top hits in blastn search against NCBI nt database, but hits from multiple orders in the class of Actinopterygii were also detected. ‡ Protosalanx chinensis is the top hit but hits from multiple genera in the family of Salangidae were also detected. § ND, this non-fish taxa was not detected by using this marker. Please refer to Table S13 for the colorations.

Figure S2
Abundance-based Bray-Curtis dissimilarities of the dietary taxa compositions between different roosting groups of Platalea minor. Relative read abundance of each taxon was fourth root transformed before calculating the dissimilarity distance. (a, c) Nonmetric multidimensional scaling (NMDS) and (b, d) hierarchical clustering dendrograms showed the differences in dietary compositions of taxa (a, b) and fish species (c, d) between the roosting groups identified using 18S and 12S rDNA markers, respectively. Stress level of NMDS analysis and statistically significant differences examined by permutational ANOVA test (p<0.001) were indicated. Please see Figure 3 for abbreviations and Table S9b, S10b for SIMPER analysis.