Drivers of tropical soil invertebrate community composition and richness across tropical secondary forests using DNA metasystematics

Tropical forests are fundamental ecosystems, essential for providing terrestrial primary productivity, global nutrient cycling, and biodiversity. Despite their importance, tropical forests are currently threatened by deforestation and associated activities. Moreover, tropical regions are now mostly represented by secondary forest regrowth, with half of the remaining tropical forests as secondary forest. Soil invertebrates are an important component to the functioning and biodiversity of these soil ecosystems. However, it remains unclear how these past land-use activities and subsequent secondary forest developments have altered the soil invertebrate communities and any potential ecological consequences associated with this. DNA metabarcoding offers an effective approach to rapidly monitor soil invertebrate communities under different land-use practices and within secondary forests. In this study, we used DNA metabarcoding to detect community-based patterns of soil invertebrate composition across a primary forest, a 23-year-old secondary forest, and a 33-year-old secondary forest and the associated soil environmental drivers of the soil invertebrate community structure in the Maquenque National Wildlife Refuge of Costa Rica (MNWR). We also used a species contribution analysis (SIMPER) to determine which soil invertebrate groups may be an indication of these soils reaching a pre-disturbed state such as a primary forest. We found that the soil invertebrate community composition at class, order, family, and ESV level were mostly significantly different across that habitats. We also found that the primary forest had a greater richness of soil invertebrates compared to the 23-year-old and 33-year-old secondary forest. Moreover, a redundancy analysis indicated that soil moisture influenced soil invertebrate community structure and explained up to 22% of the total variation observed in the community composition across the habitats; whereas soil invertebrate richness was structured by soil microbial biomass carbon (C) (Cmic) and explained up to 52% of the invertebrate richness across the primary and secondary forests. Lastly, the SIMPER analysis revealed that Naididae, Entomobryidae, and Elateridae could be important indicators of soil and forest recuperation in the MNWR. This study adds to the increasing evidence that soil invertebrates are intimately linked with the soil microbial biomass carbon (Cmic) and that even after 33 years of natural regrowth of a forest, these land use activities can still have persisting effects on the overall composition and richness of the soil invertebrate communities.

to recover and/or fall short of returning to their previous biotic status 7 . Tropical forests in particular are critical zones for global biogeochemical cycling and hotspots of biodiversity, despite these important ecosystems only constituting 7-10% of the Earth's land surface 5,8,9 . The recuperation of soils in the tropics remains an important initiative for tropical forest recovery, as secondary forests serve the potential for soil carbon sequestration and refugia for biodiversity [10][11][12][13] . Thus, understanding the consequences these activities have had on the soil biotic ecological drivers in regenerating tropical forests warrants investigation.
The development of secondary forests in the tropics provides an avenue for the amelioration of degraded soils. However, previous land use histories are known to create persisting alterations to the plant communities, and also affect various components in the soil such as soil texture and soil nutrient content and availability 14 . These legacy effects often cause significant reductions of essential soil carbon (C) and nitrogen (N) through the loss of vegetation and through soil degradation. Through some of these secondary forest developments, it is believed that forest recovery, or afforestation, can help restore soil C and N; however, in the tropics it has been found that even landscapes of similar origin and past land use history can exhibit multiple successional pathways with differing routes of forest recovery 13,15 .
Soil microbial communities have been recently identified as key components to restoring ecological functions in degraded soils 16 . As with the soil microbial communities, the soil invertebrate communities are considered an important component of soil biodiversity and ecological function, and therefore are important to consider in the process of forest recovery following disturbance 17 . This is especially important when considering foodweb structure in soil and the position of invertebrates as an intermediate trophic layer between microbial taxa and higher-level animals. Despite their contribution towards soil biodiversity, extant studies regarding changes in the soil invertebrate community across regenerating secondary forests are substantially lacking 18 . Soil invertebrates play an essential role in the recycling of organic matter and are capable of enhancing decomposition and primary productivity for soil microbial communities, either directly or indirectly [19][20][21] . More specifically, soil invertebrate groups are capable of increasing soil microbial inoculation to the surrounding available soil nutrients through substrates rich in fecal material, and is typically a result of feeding methods and mobility characteristics 22 . For example, nitrogenous waste often in the form of ammonia (NH 3 ), is released through fecal material of soil invertebrates such as collembolans or nematodes which is readily available to plants, bacteria and fungi 20,21,[23][24][25] . Therefore, it is important to understand not only how the soil microbial communities change under these dynamics, but also how the soil invertebrate communities respond to land use change and emergence of secondary forest developments.
Soil invertebrates have complex relationships with the surrounding environment and are affected by variations in the microhabitat and fluctuations in the types and quality of resources [19][20][21]26 . For example, collembolan (springtail) richness is known to be negatively affected by soil disturbances, such that the number of collembolan individuals decreases as disturbance intensity increases 27 . Likewise, the soil environmental factors that can favor soil invertebrate richness are litter and soil quality (i.e. resource quality), soil C:N ratios, organic matter content, and soil microbial biomass 28 . Thus, soil invertebrates are good candidates for detecting and monitoring changes in terrestrial soil ecosystems. However, it is still unclear how soil environmental changes associated with changes in land management influence the soil invertebrate communities in tropical secondary forests. Identifying which soil abiotic factors most strongly drive soil invertebrate community richness it is therefore crucial to understand how to improve soil biomass development and secondary forest regeneration.
While DNA metabarcoding techniques have been widely applied to investigate soil microbial communities, the application of these techniques to survey bulk soil invertebrate communities in the tropics remains limited 29,30 . Traditional studies that use morphological strategies to identify invertebrate communities are typically based on samples collected from pitfall traps for ground dwelling arthropods and, Malaise traps for flying insects, with less focus on DNA metabarcoding from bulk soil samples 29,31,32 . Previous studies have used DNA metabarcoding to examine the diversity of specific invertebrate taxonomic groups such as collembolans, nematodes, and annelids; however, few studies have performed DNA metabarcoding of soil invertebrates for community-scale analyses and the associated soil abiotic drivers of the community 29,33-37 . Moreover, most work performed on soil analyses has been in temperate regions with limited focus on tropical areas 26,30 . With the advent of DNA metabarcoding, obtained sequences can be compared to a growing standard reference library of known organisms and the taxa present in an environmental sample such as soil can be identified with high confidence [38][39][40][41][42] . Here, we used DNA metabarcoding to analyze the soil invertebrate community composition from bulk-soil across a primary forest and two different ages of secondary forests in the Maquenque National Wildlife Refuge (MNWR) of Costa Rica.
In this study, we aimed to examine differences in the soil invertebrate composition at the community-scale (and at different taxonomic resolutions) and assess the associated soil abiotic drivers across a primary forest and two different ages of secondary forests. We asked three questions pertaining to the soil invertebrate community: (1) Are there differences in the soil CO1 community composition between a primary forest and two different secondary forests of different ages (at different taxonomic resolutions)? (2) Which soil taxa are contributing to the differences in the community composition? (3) Which soil abiotic factor(s) is (are) best explaining the differences in community composition? Here, we provide evidence that two different ages of secondary forest and a primary forest have relatively different soil invertebrate community compositions, and that even after 33 years of natural forest regrowth, secondary forest soils do not harbor as much soil invertebrate richness in comparison to the primary forest soils. Moreover, we provide evidence that soil microbial biomass C may be a strong determinant of soil invertebrate richness. www.nature.com/scientificreports/ Between the primary forest and young secondary forest, Entomobrymorpha (8.63%), Haplotaxida (6.58%), and Sarcoptiformes (5.48%) were the top 3 taxa that contributed the most to the percent dissimilarity between these two habitats (Table 3).
Family. Some of the dominant CO1 families included Enchytraeidea, Naididae, and Glossoscolecidae (Oligochaeta Haplotaxida), and Isotomidae (Collembola), (Table S4). The PERMANOVA results indicated that family community composition was significantly different between the primary forest and old secondary forest, and between the old and young secondary forest (p < 0.05) (Table 1c) (Fig. 1c). However, the family community composition was not different between the primary forest and young secondary forest (p > 0.05) (Table 1c). Family diversity across the habitats did not vary significantly between habitats (F 2,15 = 0.19, p = 0.826), however, family richness was significantly greatest in the primary forest (2.68 ± 0.62) in comparison to the old and young secondary forests (F 2,15 = 4.54, p = 0.03) (Fig. 2). The SIMPER results indicated that Enchytraeidae (9.14%), Naididae (8.85%), and Elateridae (Coleoptera) (7.93%) were the CO1 families that contribute the most to the percent dissimilarity between the primary forest  www.nature.com/scientificreports/  Table 3. The similarity of percentage analysis (SIMPER) using a one-way design with a 70% cut-off percentage to list only higher-contributing taxonomic groups for the soil COI order community composition between the three habitats in the MNWR (PF = primary forest, OS = 33-year-old secondary forest, YS = 23-year-old secondary forest). www.nature.com/scientificreports/ and old secondary forest (Table 4). Isotomidae (10.41%), Enchytraeidae (9.90%), and Onychiuridae (Collembola) (7.93%) contributed the most to the percent dissimilarity between the old and young secondary forest. Similarly, Naididae (10.30%), Isotomidae (9.40%), and Enchytraeidae (7.22%) contributed to the dissimilarity between the primary forest and young secondary forest (Table 4).

Exact sequence variants (ESVs). The CO1 ESV community composition was significantly different
between all pairwise habitat comparisons, indicating that ESV community composition is very distinct across the primary forest, old secondary forest, and young secondary forest (p < 0.05) (Table 1d) (Fig. 1d). The results of the one-way ANOVA indicated that the CO1 ESV richness was significantly greatest in the primary forest soils (37.75 ± 4.86) in comparison to the old and young secondary forest soils (F 2,15 = 59.51, p < 0.0001) (Fig. 2). In contrast, the CO1 ESV diversity was not significantly different across habitat soils (F 2,15 = 1.95, p = 0.18) (Fig. 2).

Drivers of soil invertebrate community composition and richness.
Soil moisture and pH were the best variables explaining the invertebrate class community composition, and together, explained 34% of the total variation observed ( (Table 5b,c). Soil moisture also explained the ESV community composition, however, the Dis-tLM sequential tests could not verify a best solution, as including or excluding moisture did not improve the overall model (Table 5d). In contrast to the community composition, richness at the class, order, family, and ESV levels were best structured by soil C mic and explained 30.06%, 51.73%, 23.41%, and 36.35%, respectively, of the total variation observed across the habitats (Table 6) (Fig. 4).

Discussion
Many soil invertebrate studies examining community structure typically rely on traditional morphological-based tools to assign identity to soil invertebrates captured using Berlese funnel extraction or pitfall traps. However, recent DNA metabarcoding advances have allowed us to use marker gene surveys as a tool to assess community ecology-based questions and is becoming increasingly popular for soil-based invertebrate studies 42 . In this study, Table 4. The similarity of percentage analysis (SIMPER) using a one-way design with a 70% cut-off percentage to list only higher-contributing taxonomic groups for the soil COI family community composition between the three habitats in the MNWR (PF = primary forest, OS = 33-year-old secondary forest, YS = 23-year-old secondary forest).

Invertebrate family
Proportion of reads (%)

Average dissimilarity Contribution (%) Cumulative (%) Primary
Old secondary www.nature.com/scientificreports/ we amplified 3 fragments of the standard 5′ end of the 658 bp mitochondrial CO1 region and DNA metabarcoding to assess soil invertebrate community composition and richness at four different taxonomic resolutions across a primary forest and two different ages of secondary forest, and the associated abiotic drivers with community composition and richness. Even though there are current reference database limitations in assigning invertebrate taxonomy, it is clear from this study that community-level differences in soil invertebrate composition and the associated soil abiotic drivers are detectable via DNA metabarcoding from bulk-soil across a land management gradient in the MWNR of Costa Rica.
Soil invertebrate community composition. Overall, the results showed that soil invertebrate community composition is different, and even 33 years of natural regrowth can have consequences on the soil invertebrate community that develops under these conditions and can result in community-level differences across primary and secondary forests. This is not surprising as previous evidence has shown that land use history or subsequent secondary growth can create changes and differences in the soil decomposer communities that develop under these conditions due to changes in resource input 42 . Moreover, a study using DNA metabarcoding of arthropod orders showed that certain groups of invertebrates were significantly different across land use types in southwest China 43 . In terms of taxonomic resolution, the soil invertebrate community composition at the ESV level was distinctly different across the primary and secondary forests, compared to differences at class, order, and family resolutions. This finding is most likely due to finer differences that can be detected at a greater taxonomic resolution in the data-set at the ESV level, and has been described elsewhere 44,45 . Past land use history is known to influence the vegetation type observed in secondary forests. Ultimately, the changes in vegetation type between pre-and post-disturbance and likely soil disturbance can affect the types of substrates entering the soil through decomposition of leaf litter, as well as changes in the soil chemistry and soil nutrient and organic matter content. Indeed, fluctuations in soil characteristics can affect the abundance and distribution of soil biotic communities and often depends on the individual tolerance limits of different soil invertebrates at the local and regional scale [46][47][48][49] . We found that changes in soil invertebrate community composition were best explained by soil moisture at the class, order, and family level. However, the percent soil moisture accounted for relatively little of the total variation (up to 22%) in the soil invertebrate community structure, indicating that there could be an additional unknown soil environmental variable driving the variation observed. Previous studies have shown certain groups of soft-bodied invertebrates, such as collembola and some groups of mites to be effected by varying levels of moisture or water content 47,50-53 . Even though percent soil moisture was not significantly different between the three habitats, it could be that subtle fluctuations in soil moisture  www.nature.com/scientificreports/  www.nature.com/scientificreports/ Onychiuridae and are invertebrate families that prefer moist conditions 48 . Thus, these families with the greatest representation across the primary and secondary forest soils are likely to be highly sensitive to changes in soil moisture, as demonstrated by differences in relative abundance and distribution of these groups across land-use. A general model of soil invertebrate responses to past and/or current land use history is difficult to define. However, by examining and detecting differences in the soil invertebrate community in conjunction with the associated soil abiotic drivers, we can begin to assess the influence of past and present land use on soil-dwelling invertebrates. We have demonstrated in this study that soil invertebrate community composition changes can be detected, even over the relatively short time frame of 33 years natural forest regrowth. The outcomes and trajectories of these altered invertebrate communities are still largely unknown and warrants further investigation. It is likely that similar transitions of soil invertebrate communities, depending on land use and stage of reforestation, are currently occurring across the globe, and we have demonstrated that DNA metabarcoding can be applied to determine both the extent of community shifts and the associated abiotic drivers of the soil invertebrate structure.
Soil invertebrate richness and diversity. While we did not find differences in soil invertebrate Shannon diversity at all taxonomic levels across the primary and two secondary forests, we did find that soil invertebrate richness was greatest in the primary forest soils with lower richness in both of the secondary forests. This pattern may indicate that secondary forest developments may not be able to sustain the soil invertebrate richness to the same extent as the primary forest, even after 33 years of regrowth. This is most likely due to a primary forest presumably having higher diversity, complexity and richness of vegetation than resulting secondary forest growth, which has been described previously at this site. A greater multiplicity of resources and spatial heterogeneity (such as greater quantity, quality, and complexity of plant litter) on the forest floor in primary forests would presumably create additional niches or a partitioning of resources, resulting in a greater richness of soil invertebrate communities present to consume these resources 55,69 .
Previous evidence suggests there is a strong link between soil arthropod biomass and soil microbial biomass, and that soil arthropod biomass is typically regulated by bottom-up forces 28 . To corroborate this, we found that as soil microbial biomass C increases, soil invertebrate richness also increases, and that soil C mic explained up to 52% of the total variation observed in the soil invertebrate richness. Soil C mic is a measure of the C contained within the living component of soil organic matter and as microbial organisms decompose soil organic matter, this releases carbon dioxide and plant nutrients that become available to the surrounding biota. Thus, as soil organic matter increases, so should soil C mic , and thus, more soil organic C available to biotic communities.  www.nature.com/scientificreports/ However, soil C mic is strongly correlated with the amount and availability of soil organic matter, and is also influenced by soil properties such as pH, soil type and texture (i.e. percent sand, silt, and clay) 48,54 . Therefore, land use activities that cause reductions of essential soil C and N through the loss in the amount and complexity of vegetation and leaf litter, thereby decreasing the availability of soil organic matter, will also decrease the amount of soil C mic , and consequently, will also cause a decrease in the soil invertebrate richness. As a previous study determined that at our study site, the soil texture was uniform across the three forest types 55 , we can therefore assume that shifts in microbial biomass, as a consequence of land use and afforestation status, are likely to affect soil fauna assemblages observed. We have demonstrated that previous land use histories and current practices that remove or reduce soil microbial biomass C could have major implications for the composition of soil invertebrate communities, which directly influences the extent of ecological functions being performed across different land management gradients.
Taxonomic contributions. The SIMPER analysis revealed that the taxonomic groups contributing the most to the dissimilarity between habitats were those invertebrates involved with processing plant and soil material. These invertebrates are microbivores that feed on bacteria and fungi, either directly or indirectly, and contribute to the available soil organic matter content. The most prevalent microbivore invertebrate orders were Haplotaxida and Entomobryomorpha, or more specifically, the families Enchytraeidae, Naididae, Onychiuridae, Elateridae, and Isotomidae. When interpreting SIMPER results, it is important to take into account the proportion data; for example, despite Enchytraeidae being one of the top taxonomic groups contributing to the percent dissimilarity across habitats, this invertebrate family also had the greatest representation in the 33-year-old secondary forest. This could reveal a more hump-backed or bell-shaped response in relation to succession. Ideally, a good soil invertebrate indicator should have an increasing trend in the representation of the invertebrate group across succession, to provide an indication if secondary forest soils are on a trajectory to that of a primary forest.
In accordance with the proportion data, Naididae, Entomobyridae, and Elateridae, show an increasing trend of representation from young and old secondary forest to primary forest. The observation that the primary forest soils had a greater representation of these three invertebrate families suggests the importance of these groups as potential terrestrial indicators in this area for understanding the future trajectory and recuperation of soil in these secondary forests. The larvae of the Elateridae are (known as 'wireworms') live in soil and/or under bark and feed and process decaying soil organic matter 56 . The wireworm larval stage can span from 2 to 6 years, and in some cases even 10 years [56][57][58] . During this larval period, wireworms specifically feed underground making them persistent residents of soil for some time. Therefore, the Elateridae detected in this study could most likely be in the larval stage and may be an over-looked group in tropical soils of this area. Although wireworms are often agricultural and garden pests by feeding on plant tissue and roots, in this study the wireworms may represent an important fraction of the soil invertebrate community involved in the recycling of decaying organic matter 57 . These decomposition-based activities can increase the amount and availability of resources to the surrounding soil biotic communities. Wireworms are known to be influenced by soil characteristics such as soil texture 57 . However, given that these forests are of the same soil texture, the greater representation of Elateridae in the primary forest than both of the secondary forests may be an indicator of more soil organic matter and decaying materials for substrates for the wireworms.
The greater representation of Naididae in primary forest soils could indicate a higher substrate availability to enable the build-up of soil organic matter for plant nutrient acquisition and growth of soil organisms. Naididae are important oligochaetes capable of increasing the availability of soil N through bioturbation and feeding activities [59][60][61] . For example, Naididae excrete NH 4 + by ingesting silt and clay particles saturated with microbial groups attached to the soil particles 60 . These feeding activities could be an important avenue for increasing soil N, particularly where soil N has been depleted following agricultural abandonment or the early stages of tropical lowland secondary forests [62][63][64][65] .
Most of the groups present in the SIMPER results consist of groups from the soil mesofauna. Entomobyridae are a group of collembola and make-up an important fraction of the mesofauna. Indeed, collembolan groups have been used as indicators of soil condition in other landscapes, but less is known about this group in secondary forests of the tropics. Different groups of collembolans have different feeding preferences, and can suggest different scenarios regarding their presence or absence from an ecosystem 48 . For example, collembolan groups Isotomidae and Onychiuridae were contributing taxonomic groups in the SIMPER results; however, Isotomidae and Onychiuridae had greater representation in the secondary forest soils than the primary forest soils. This suggests that different groups of collembolans might show greater representation during different periods of succession, but that collembolan groups such as Entomobryidae may be an indication of late succession or even a non-disturbed habitat such as primary forest soils. Entomobryidae collembolans could be a potential indication as to whether or not secondary forests and soils are approaching a state close to the primary forest soils.
Taken together, these findings indicate that previous land-use history and subsequent regrowth can influence particular groups of soil invertebrates that can be used as potential terrestrial indicators of succession in the tropics.

Conclusions
In the present study, we determined if soil invertebrate community composition at class, order, family, and ESV level was different between a primary forest, 33-year-old secondary forest, and a 23-year-old secondary forest with the same soil type and texture but with differing past land-use histories and which soil environmental variables were structing the community composition and richness. Our work suggests that past land-use history and associated forest regrowth can lead to shifts in the structure of soil invertebrate community, but differences are more prominent or detectable at finer resolutions such as the ESV level. Moreover, these past land-use histories Scientific Reports | (2020) 10:18429 | https://doi.org/10.1038/s41598-020-75452-4 www.nature.com/scientificreports/ and subsequent secondary forest developments can alter soil invertebrate richness but not diversity, such that the soil invertebrate richness has not reached or returned to pre-disturbed state. These findings were all detectable with DNA metabarcoding of the CO1 region, providing a useful and rapid approach in understanding soil invertebrate biodiversity across succession in the tropics. Nonetheless, using DNA metabarcoding to analyze soil invertebrates at the community-based scale, as opposed to specific lineages, it is possible to detect changes in the structure of the soil invertebrate community even across landscapes of the same soil texture but managed differently.
As global, and in particular tropical deforestation continues today, and as the recuperation of degraded soils is becoming increasingly important, it is crucial to understand which soil abiotic factors most strongly drive soil invertebrate community dynamics, with implications for how to best improve soil conditions. This study provides valuable knowledge and applicability in using DNA metabarcoding to detect community-level differences in soil invertebrate structure from bulk-soil across a land management gradient and subsequent forest development.
Using rapid techniques such as DNA metabarcoding will become increasingly invaluable and a potentially more powerful technique to capture changes in soil invertebrate communities under land use changes.

Methods
Due to legal and illegal extraction-based land management practices since the 1970's, the San Juan-La Selva Biological Corridor (SJLBC) (Fig. S1) (Fig. S1). The MNWR is the core nucleus of the SJLBC for biodiversity as it contains the highest percentage of forest cover and most valuable habitats for biodiversity in the region 66 . Mean annual temperature is 27 °C, mean annual rainfall is 4300 mm, and the dominant soil type are oxisols 67 .
This study was conducted in the humid Atlantic lowland rainforests of the MNWR in three upland habitat types that were at one point in time all a part of a single tract of primary forest, with the same soil type (oxisol), texture 55 , and topography, but have been managed differently in the past ~ 40 years (total area 500 hectares; personal communication). The resulting habitats consist of a primary forest that has not been harvested in the known history of the land; a 33-year-old secondary forest that was allowed to regenerate immediately following deforestation (harvested 33 years ago); and a 23-year-old secondary forest that was cut 33 years ago, used for cattle pasture for ~ 10 years, and then allowed to regenerate into secondary-growth. In July 2014, 6 independent replicate plots (40 m × 25 m) were established in each of the three habitats. Distance between forest sites are as followed: primary forest (10° 40′ 46.21″ N, 84° 10′ 42.10″ W) and young secondary forest (10° 41′ 7.92″ N, 84° 9′ 57.30″ W) is ~ 1.5 km, primary forest and old secondary forest (10° 41′ 12.56″ N, 84° 10′ 15.65″ W) is ~ 1.1 km, old and young secondary forest is ~ 0.55 km. In each 1000 m 2 plot, nine soil subsamples were collected using a soil core (7.5 cm × 15 cm × 1.25 cm) in a grid fashion (Fig. S2) from each plot and bulked to provide one composite soil sample for each replicate plot (6 plots × 3 habitats = 18 bulked independent samples). The soil core and soil collection gloves were sterilized with 70% ethanol between each 1000 m 2 replicate plot to minimize cross-contamination and ensure independence of soil samples. This approach to soil sampling was to reduce microsite variability given the heterogeneous properties in tropical soils 68 . The soil pH and percent moisture were recorded during the same time of soil sample collection at each soil profiler core sample location in each of the plots in triplicate readings (3 readings × 9 soil core locations = 27 readings per plot) (Fig. S1) with a Kelway Soil pH and Moisture meter (Kel Instruments Co., Inc., Wyckoff, NJ, USA). Elevation of these plots were measured with a Garmin Rino 650 GPS (Garmin International, Olathe, KS, USA). For homogenization, all soil samples were mixed and passed through a 4-mm sieve at field moist conditions while sterilizing the sieve and laboratory gloves with 70% ethanol between each of the 18 composite soil samples, prior to all downstream analyses.
Soil abiotic factors have been measured previously and described elsewhere 55 . The soil abiotic properties examined in this study for each of the three habitats included soil pH, % moisture, C, N, C:N ratio, NH 4 + , NO 3 − , C mic , and % sand, silt, and clay. As aforementioned, soil pH and % moisture were measured from each of the 18 plots during the period of soil subsample collection To estimate (ToC), (TN), NH 4 + , and NO 3 − , 200 g of soil from each of the 18 sieved composite soil samples were delivered to the Center for Tropical Agriculture Research and Education (CATIÉ) Laboratories in Turrialba, Costa Rica. To estimate levels of microbial biomass C (C mic ), substrate-induced respiration (SIR) was used following the methods of Höper (2006). Substrate-induced respiration for measuring C mic was preferred as it measures the amount of living microbial biomass C. All nutrient and microbial activity data presented were adjusted for dry weight of the soil.
Environmental DNA was extracted from each of the 18 composite soil samples using three 0.33 g replicate sub-samples for a total of 1 g for each composite soil sample using the MoBio PowerSoil DNA Isolation Kit (MO BIO Laboratories Inc., Carlsbad, CA, USA) and according to the manufacturer's protocol. The concentration and purity (A 260 /A 280 ratio) of extracted soil DNA were determined prior to downstream analyses using a NanoDrop 1000 spectrophotometer (ThermoFisher Scientific, Waltham, MA) and all eDNA was stored at − 80 °C, as has been done in previous studies 55,69,70 .
Three www.nature.com/scientificreports/ buffer (200 mM Tris-HCl, 500 mM KCl, pH 8.4), 1 µL 50 × MgCl 2 (50 mM), 0.5 µL of dNTPs mix (10 mM), 0.5 µL of forward primer (10 mM), 0.5 µL of reverse primer (10 mM), and 0.5 µL of Invitrogen's Platinum Taq polymerase (5 U/µL) in a total volume of 25 µL. The PCR conditions were initiated with heated lid at 95 °C for 5 min, followed by a total of 35 cycles of 94 °C for 40 s, 46 °C (for all primer sets) for 1 min, and 72 °C for 30 s, and a final extension at 72 °C for 5 min, and hold at 4 °C. PCR products were then purified using a Qiagen Min-Elute PCR purification kit (Qiagen, Valencia, CA, USA) and eluted in 30 µL of molecular biology grade water. A second PCR step was implemented using the purified 1st PCR product as a template and with Illumina adaptor tailed target specific primers. The 2nd PCR was made following the same protocol as aforementioned except for 30 cycles were used for PCR. All PCR products were visualized on 1.5% agarose gels to confirm successful amplification by the presence of fluorescent bands, as has been done previously 55,69,70 . All PCRs were done using Eppendorf Mastercycler ep gradient S thermalcyclers and negative control reactions (no DNA template) were included in all experiments, and all generated soil amplicons plates were dual indexed and pooled into a single tube and sequenced in several Illumina MiSeq runs using a V2 MiSeq sequencing kit (500 cycles-250 × 2) (FC-131-1002 and MS-102-2003).
The raw Illumina paired-end reads were processed using the SCVUC v2.3 pipeline available from https ://githu b.com/EcoBi omics -Zoobi ome/SCVUC _CO1_metab arcod e_pipel ine. The subsequent CO1 Illumina generated sequences were processed using semi-automated pipelines. The CO1 Illumina paired-end libraries generated forward (R1) and reverse (R2) reads. Raw sequence reads were paired with SeqPrep ensuring a minimum Phred score of 20 and minimum overlap of at least 25 bp 74 . Primer sequences were trimmed using CutAdapt v1.10 specifying a minimum Phred score of 20, a minimum fragment length of 150 bp after trimming, and no more than 3 N's allowed. Reads were dereplicated with VSEARCH v2.11.0 using the 'derep_fulllength' command and the 'sizein' and 'sizeout' options. Denoising was performed using the UNOISE3 algorithm in USEARCH v10.0.240 75 . This method corrects sequences with potential errors, removes putative chimeric sequences, as well as removes rare exact sequence variants (ESVs) 76 with only one or two reads (singletons and doubletons). Previous work has shown that rare sequence clusters may be particularly prone to sequence errors and add 'noise' to the dataset 77,78 . Here we defined rare reads to be ESVs containing only 1 or 2 reads (singletons and doubletons). An ESV x sample table was created with VSEARCH using the 'usearch_global' command, mapping reads to ESVs with 100% identity. The remaining ESVs were taxonomically assigned using the CO1 Classifier v3.2 41 .
All generated sequencing data have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database under the BioProject: PRJNA542652.
A total of 1,117,015 ESV reads were taxonomically assigned by the CO1 classifier 41 . After filtering out the reads identified as bacteria, fungi, plants, and non-metazoan, 534,185 reads were retained. Out of the 534,185 reads, 117,162 reads were correctly identified to class and order at the 0.0 bp cutoff for 100% accuracy as is recommended for 200 bp fragments 41 . At the family level, 38,690 reads were correctly identified at the 0.20 bp cutoff, and at genus level, only 4,753 reads were correctly assigned at the 0.30 bp cutoff for 99% accuracy 41 . Thus, the soil invertebrate reads used in further analyses were class, order, family, and ESV levels. All CO1 ESVs were organized taxonomically at the class (0.0 bp cutoff), order (0.0 bp cutoff), family (0.2 bp cutoff), and ESV rank, and converted to mean proportion of sequences per sample 79 . The relative proportion of CO1 sequences for class, order, family, and ESVs were calculated by summing the number of reads for each taxon in a sample, then dividing by the total number of reads from the sample (i.e. sample replicate). To examine the soil CO1 class, order, family, and ESV richness and diversity for each sample, Margalef 's richness (d = (S − 1)/Log(N)) and Shannon Index (H′) (H′ = −SUM(P i *ln(P i ))) (P i is the proportion of the total count (abundance) arising from the ith class; order; family; ESV) were calculated in PRIMER-E v6 80 . Soil CO1 class, order, family, and ESV community compositions were transformed using a square-root transformation to account for dominant taxa as well as rare taxa 80 . For a weighted approach, the square-root transformed CO1 data were then calculated into a Bray-Curtis dissimilarity matrix in PRIMER-E v6 80 . Statistical and multivariate analyses. To address question one, significant differences in the means of the alpha diversity indices for soil CO1 class, order, family, and ESVs across the primary forest and secondary forests were determined using a one-way ANOVA followed by Tukey's HSD post-hoc analyses in SPSS (v.25, Armonk, NY, USA). Prior to analyses, a Shapiro-Wilk test was performed to determine normality of all the data in SPSS (v.25, Armonk, NY, USA) and all data were p > 0.05 suggesting normality. To determine if there were differences in the Bray-Curtis dissimilarity soil CO1 class, order, family, and ESV community composition matrices between the three forests, a one-way permutational multivariate analysis of variance (PERMANOVA) with main and pair-wise tests based on unrestricted permutations (999 permutations) 81 . The PERMANOVA results of the soil CO1 community compositions across the primary and secondary forests were considered significant if p ≤ 0.05.
In addition, a Canonical Analysis of Principal Coordinates (CAP) was used to visualize the distinctness of the soil CO1 class, order, family, and ESV community compositions across the primary forest and secondary forest soils based on an a priori allocation success using the PERMANOVA + guidelines 80,82 . Strong differences between primary forest and secondary forest soils are represented by CAP axis 1 and CAP axis 2 squared canonical correlations greater than or equal to 0.7, and moderate differences are represented by squared canonical correlations greater than or equal to 0.5-0.69 80,82 . Furthermore, Cohen's d effect sizes were calculated for the PERMANOVA pairwise comparisons to assess if the differences were trivial or not, and used as indicators of biologically meaningful differences between mean values of the parameters measured, as recommended for analysis of small sample sizes 83  To address question two, a Similarity of Percentage analysis (SIMPER) was used to determine which CO1 class, order, and family taxonomic groups primarily contribute to the Bray-Curtis dissimilarity between the Scientific Reports | (2020) 10:18429 | https://doi.org/10.1038/s41598-020-75452-4 www.nature.com/scientificreports/ habitat pairwise groups using a one-way design with a 70% cutoff percentage to list only higher-contributing taxonomic groups 84 . The SIMPER analysis was performed to potentially indicate or elude to the higher contributing taxonomic groups whose changes in proportion of sequences may indicate change; not that these groups are responsible for the biological dissimilarity in community composition across these forests. To address question three, a Distance-Based Linear Model approach (a multivariate multiple regression) was implemented to determine which soil environmental variable is contributing the most to the dissimilarities between habitats. The DistLM was performed using a 'step-wise' selection procedure and an AICc (Akaike's Information Criterion Corrected) selection criterion with 999 permutations 80,85 . The Distance-Based Linear Modeling sequential tests were considered significant if p ≤ 0.05. These DistLM results were then visualized using a Distance-Based Redundancy Analysis (dbRDA) in which the ordination is plotted from the values of the given model that explains the greatest variation in the data cloud 86 .
All multivariate analyses were performed in PRIMER-E v6 84 and its add-on PERMANOVA+ 80,82 . Prior to multivariate analyses, draftsman plots (variable pair-wise scatter plots) were used to determine the homogeneity and multicollinearity of each soil abiotic variable (predictor variables). All soil abiotic variables were transformed using the log (x + 1) transformation to correct for skewness and the data standardized using the 'normalize' parameter in PRIMER-E v6 80,82,84 .