The importance of common and the irrelevance of rare species for partition the variation of community matrix: implications for sampling and conservation

In community ecology, it is important to understand how communities are distributed along environmental and spatial gradients. However, models investigating these relationships commonly leave a large share of the variation unexplained (residuals > 50%). Intrinsic characteristics of species, such as rarity, are thought to contribute to these large residuals. The objective of this study is to test the relationship between communities and environmental and spatial predictors by evaluating the relative contribution of common and rare species to the explanatory power of models. Our hypothesis is that the residual fraction of models that partition the variation of the community matrix (varpart) will decrease as rare species are removed. We used several environmental variables and spatial filters as varpart model predictors of fish and Zygoptera (Insecta: Odonata) communities in 109 and 141 Amazonian streams, respectively. We built an iterative procedure in which we gradually removed common and rare species independently. Our hypothesis was not corroborated: in all scenarios, removing up to 50% of rare species did not reduce model residuals. Common species are important and rare species are irrelevant for understanding the relationships between communities and environmental and spatial gradients using varpart. Therefore, our findings suggest that studies using varpart with single sampling events that do not detect rare species can efficiently assess general distributional patterns of communities along environmental and spatial gradients. However, when the objectives concern the conservation of biodiversity and functional diversity, rare species must be carefully assessed by complementary methods, since they are not well represented in varpart models.

The study of how biological communities are distributed along environmental and spatial gradients has seen several theoretical advances, such as the niche concept 1, neutral theory 2, and the metacommunity synthesis 3. Parallel to these advances, the development of methods that quantify the importance of spatial and environmental filters as community predictors 4-6 has been essential for interpreting the distribution patterns of species in communities 7.
Scientific Reports | (2020) 10:19777 | https://doi.org/10.1038/s41598-020-76833-5 www.nature.com/scientificreports/
Niche theory 1 predicts that species are distributed according to environmental conditions 8 and biotic interactions 9, which determine their distribution along the environmental gradient. Under this view, species distribution is explained solely by the conditions and resources present at the studied sites. In contrast, neutral theory posits that species distribution does not depend on environmental conditions or resources, and that spatially closer sites have more similar communities 10 because of historical processes such as vicariance and dispersal 2. Therefore, according to niche theory, the main filters for species distributions are the relationships between species and their environment, along with local extinctions where conditions and resources are inadequate 1. For neutral theory, the main filters are species dispersal and the probabilities (random events) of speciation and extinction 2. Given these different structuring forces, understanding the distribution patterns of species in a landscape requires understanding the environmental and spatial mechanisms associated with species 7. One way of quantifying the importance of those mechanisms is to use variation partitioning models 5 with environmental and spatial predictors 11. These models have four elements: the portion explained exclusively by the spatial component, the portion explained exclusively by the environmental component, the portion explained by the interaction between environmental and spatial predictors, and the residual portion, not explained by the model 12. Although this is a widely accepted method for investigating the effects of niche and neutral mechanisms in community ecology, a recent meta-analysis 7 revealed that the residual portion of the model is commonly larger than the explained portion (> 50%) 5.
This has been shown in various metacommunities, including fish 13, Odonata 14,15 and termites 16 in Amazonian streams, macroinvertebrates in the south 17 and southeast 18 of Brazil, and macrophytes 19 and beetles 20 in temperate regions. The high residual variation is associated with environmental or spatial predictors that were not included in the models 5,7 and/or with a subset of species that have antagonistic responses to the environmental and/or spatial gradients due to autoecological characteristics 21.
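The four fractions described above are usually obtained from three regression fits. As a sketch, write Y for the community matrix, E and S for the environmental and spatial predictor sets, and [a] to [d] for the fractions. This notation is assumed here for illustration; it follows the convention of implementations such as the varpart function in vegan, which reports adjusted R² values:

```latex
\begin{aligned}
{[a]} &= R^2_{\mathrm{adj}}(Y \mid E, S) - R^2_{\mathrm{adj}}(Y \mid S) && \text{(environment only)}\\
{[c]} &= R^2_{\mathrm{adj}}(Y \mid E, S) - R^2_{\mathrm{adj}}(Y \mid E) && \text{(space only)}\\
{[b]} &= R^2_{\mathrm{adj}}(Y \mid E) + R^2_{\mathrm{adj}}(Y \mid S) - R^2_{\mathrm{adj}}(Y \mid E, S) && \text{(shared by environment and space)}\\
{[d]} &= 1 - R^2_{\mathrm{adj}}(Y \mid E, S) && \text{(residual)}
\end{aligned}
```

By construction [a] + [b] + [c] + [d] = 1, so a residual above 50% means that [d] exceeds the three explained fractions combined.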
One potential reason for such antagonistic responses is that rare and common species tend to respond to environmental gradients in idiosyncratic ways 18,22. Consequently, analyzing all species in the community together may increase the residual portion of the model. In this context, common species with large spatial distributions are not strongly dispersal-limited and are generalists, with an environmental plasticity that allows them to survive under different conditions 23, some of which are considered adverse. Rare species, on the other hand, have more restricted distributions 24, and knowledge about their relationship with environmental variables is generally limited. For these reasons, rare species are often not considered in community analyses or have little statistical relevance to community patterns (e.g. in direct ordination analyses) 4.
To understand the mechanisms and patterns important for community assembly, a multitaxon approach is essential because it allows greater extrapolation power of the results 25. In stream ecology, jointly evaluating taxa that use different parts of the habitat (e.g. fish, which are exclusively aquatic, and Zygoptera adults, which live in the riparian vegetation 13,26) may provide robust results that can be extrapolated to other organisms of the aquatic biota living in similar habitats.
Thus, our objective is to test the relationship between communities and environmental and spatial predictors by evaluating the relative contribution of common and rare species to the explanatory power of the models. Our hypothesis is that the residual portion of variation partitioning models will decrease as rare species are removed from the analyses. Rare species would inflate the residual portion because they make the models less parsimonious, adding many species that are not relevant to the relationship between community patterns and spatial and environmental patterns.

Results
Together, environmental and spatial predictors explained 37% and 22% of the variation for Zygoptera and fish, respectively. For both fish (25%) and Zygoptera (12%), most of the explained variation was attributable to environmental predictors. The spatial component explained only 3% for both fish and Zygoptera (Fig. 1).
The gradual removal of rare species (up to 50% of total community richness) had little effect on the percentage of variation explained by environmental and spatial predictors for Zygoptera (ΔR² = −0.004, p = 0.053) and fish (ΔR² = −0.003, p = 0.060) communities. However, the gradual removal of common species (up to 50% of total community richness) strongly altered the percentage of variation explained by environmental and spatial predictors for fish communities (ΔR² = 0.135, p = 0.002), whereas for Zygoptera communities the change (ΔR² = 0.048, p = 0.536) did not differ significantly from that obtained when species were removed at random (Fig. 2A-D).

Discussion
Our hypothesis that the residual portion of the variation partitioning models would decrease as rare species were removed was not corroborated. In all scenarios, for both fish and Zygoptera, removing up to 50% of rare species did not significantly reduce residuals. However, for the most common species, the removal of only one species had notable effects on the residual portion of the models. This effect is more pronounced when a few species dominate the community, as in the fish community in our study (Fig. 3), since the removal of the most abundant species steeply reduces the percentage of variation explained by environmental and spatial predictors. Therefore, when analyzing general patterns of community distribution along spatial or environmental gradients using variation partitioning, the exclusion of rare species does not significantly increase the residual. Because of potential analytical problems associated with rare species in community analyses, for example the large number of zeros in species matrices 11 and having more species than samples 27, excluding rare species before data analysis is often recommended. Our results suggest this is possible without harming the visualization of general patterns of communities along spatial and environmental gradients 28.
In tropical countries, where species diversity is high 29, there are large knowledge shortfalls regarding biodiversity distribution 30 (known as the Wallacean shortfall 31). In addition, financial crises may reduce research investments 32, so biological samples commonly come from a single sampling event or field trip 13,26,33,34. These sampling strategies can result in a low representation of rare species and, consequently, can affect the detection of diversity patterns 35. However, despite these possible limitations, our results demonstrate that even if a considerable portion of rare species is not captured, it is still possible to evaluate the general distributional patterns of communities along spatial and environmental gradients.
Nonetheless, for conservation 24 or for assessing the functional role of species in ecosystems 36,37, the exclusion of rare species can bias the results. Functionally, rare species contribute considerably to patterns of richness, specialization, and functional originality of communities 36,37. Conservation assessment at the community level can disregard rare and important taxa, such as endemic species 38. Thus, conservation measures are commonly evaluated and proposed at the population level 39. From our findings, we conclude that common species are important, and rare species contribute little, to understanding the relationships of communities with spatial and environmental gradients using variation partitioning (Fig. 4). However, when considering other objectives such as conservation or the analysis of functional patterns, information about rare species continues to be essential. We further emphasize that studies with single sampling events may often be ineffective in detecting rare species but are efficient in identifying general community distribution patterns along environmental and spatial gradients. Therefore, when necessary, the exclusion of rare species does not harm the interpretation of community patterns in those circumstances.
Figure: Correlogram showing the R² values of the variation partitioning (pRDA) analyzing the effects of environmental predictors (black circles), spatial predictors (empty circles), the interaction term between environmental and spatial predictors (grey circles), and the residual portion (red circles) on fish and Zygoptera communities. A = fish communities with abundance data gradually removing rare species. B = Zygoptera communities with abundance data gradually removing rare species. C = fish communities with presence-absence data gradually removing rare species. D = Zygoptera communities with abundance data gradually removing common species.

Material and methods
Study area. We sampled Zygoptera and fish assemblages of small streams (up to 4 m wide and with a mean depth of 0.8 m) in the Eastern Amazon (Brazil). We sampled fish in 109 streams and Zygoptera in 141 streams. The sampling sites are in areas where natural forest remains but has undergone continuous landscape fragmentation due to the advance of agrosystems (e.g. agriculture, pasture and logging) into natural areas 13,26. The study area comprises three macroclimatic zones according to the Köppen classification 40: the Am zone (North), the Aw zone (South), and the Af zone (East). Am is a tropical rainy climate (also called tropical monsoon climate), Aw is a tropical climate with distinct dry and rainy seasons, and Af is a tropical rainforest climate (also called equatorial climate) 40.
Zygoptera sample. We sampled adult Zygoptera once in each stream. Sampling took place between 2009 and 2018, always between July and December, when the highest diversity of aquatic insects is collected in the Amazon. In each stream, we demarcated a 150 m linear transect and captured all adult Zygoptera specimens observed over one hour. We used an entomological net with a 40 cm diameter and 65 cm width. To minimize bias related to the different types of Zygoptera thermoregulation (thermal conformers, heliotherms and endotherms), specimens were collected between 10 a.m. and 2 p.m., when sunlight reaches the stream channel 14,41.
The specimens were stored according to the protocol proposed in Ref. 42. We identified the material using taxonomic keys and specialized guidebooks (e.g. Refs. 42-44) and, when necessary, sent the material to specialists. The specimens are deposited in the collection of the Zoology Museum of the Federal University of Pará, located in Belém, state of Pará, Brazil.
Fish sample. The fish were collected in 109 first and third-order streams, along 150 m transects, in the dry season, between July and December of 2012-2015. We selected this period to avoid seasonal variation in the fish assemblage structure 45,46 and to increase sampling efficiency, because sampling is easier under low streamflow.
The fish were captured using circular 55-cm-diameter dip nets with a 3 mm metallic mesh. This type of dip net is considered an efficient method to collect specimens, allowing the capture of fish in small stream microhabitats, including complex habitats (e.g. riparian vegetation, leaf banks and the inferior portion of wood trunks and branches 47 ). The 150 m transect was divided into 10 sections. Each section was sampled by 2 collectors for 18 min, and the total sampling time was 3 h/stream. The specimens were euthanized using anesthetics (Eugenol, following the Brazilian Civil Law nº 11.794/2008), fixed in a 10% formalin solution, and transferred to 70% ethanol after 48 h. In the laboratory, the specimens were identified using taxonomic keys 48,49 and with the aid of specialists. All specimens were deposited in the ichthyology collection of the Emílio Goeldi Museum (MPEG), Belém, State of Pará, Brazil.
Environmental predictors. We used environmental variables measured by the protocol of environmental assessment developed by the United States Environmental Protection Agency (US-EPA) 50,51 . We took measurements and observed the characteristics of the habitat in the same 150 m stretch where Zygoptera and fish sampling took place. In total, we obtained 186 measurements: 27 are related to channel morphology and hydraulics, 26 are related to substrate, 28 to organic debris, 60 are related to wood, 16 to the characteristics of the riparian forest, and 29 refer to human influence 13 . We took a series of steps to reduce the number of variables. Initially, we removed variables that had a low (< 40%) coefficient of variation (CV) and the ones that had high values (> 80%). We removed them because predictors that have little variation are not representative in the environmental gradients, and samples with too many zeroes (high coefficient of variation) indicate a sampling problem 13,26 .
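The coefficient-of-variation screen described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `filter_by_cv`, the use of the sample standard deviation, and the handling of zero-mean columns are assumptions made here, while the 40% and 80% bounds come from the text.

```python
import numpy as np

def filter_by_cv(X, names, low=40.0, high=80.0):
    """Keep predictors whose coefficient of variation (%) lies within [low, high].

    X     : sites x variables array of environmental measurements
    names : one label per column of X
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)  # sample standard deviation (assumption)
    with np.errstate(divide="ignore", invalid="ignore"):
        # CV is undefined for a zero mean; such columns become NaN and are dropped
        cv = np.where(mean != 0, 100.0 * sd / np.abs(mean), np.nan)
        keep = (cv >= low) & (cv <= high)
    kept_names = [name for name, flag in zip(names, keep) if flag]
    return X[:, keep], kept_names, cv
```

Returning the CV values alongside the filtered matrix makes it easy to inspect which variables fell just outside the bounds before they are discarded.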
After applying this criterion to the 186 initial environmental variables, 139 predictors remained for Zygoptera and 51 for fish. This difference arises because some of the collection sites differed between fish and Zygoptera. We then used those environmental matrices to perform a forward stepwise model selection against species composition (separately for Zygoptera and fish). We used this selection to eliminate predictors that were not associated with the species matrices and to avoid inflating residual values through the use of spurious variables 4. To avoid multicollinearity in further analyses, we performed Principal Component Analyses (PCA) with the original predictors and used the ordination axes as predictor variables. To avoid information loss, we retained axes until the cumulative explained variation (from the eigenvalues) was equal to or higher than 90% 52. For both Zygoptera and fish, seven axes were necessary to exceed 90% in the PCA (Supplementary Information).
Spatial predictors. Communities that are spatially closer are expected to have similar species composition 53. This happens because a migratory flow of species is more likely between environmental patches that are spatially closer 3. Consequently, when analyzing biological communities across space, it is necessary to understand how much of the similarity or dissimilarity in species composition relates only to the spatial distribution of the sampling sites. To analyze the importance of space, we used spatial filters 6 derived from Principal Coordinates of Neighbour Matrices (PCNM). We then used the set of vectors as the spatial predictors. To select which vectors are important predictors for the communities, we performed a forward selection using the PCNMs and the community matrices of Zygoptera and fish, respectively (Fig. 5A). We used two sets of spatial predictors (PCNMs), one for each taxonomic group.
Three PCNMs were selected as spatial predictors for Zygoptera and two for Fish.
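For readers unfamiliar with PCNM, the construction of the spatial vectors can be sketched as below. The text does not give implementation details, so several choices here are assumptions: planar (projected) site coordinates with Euclidean distances, a truncation threshold equal to the longest edge of the minimum spanning tree, and replacement of larger distances by four times the threshold, which is the usual PCNM recipe. The function name `pcnm_vectors` is hypothetical, and the forward selection step is not shown.

```python
import numpy as np

def pcnm_vectors(coords):
    """Spatial eigenvectors from site coordinates (sites x 2 array)."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances

    # Longest edge of the minimum spanning tree, found with Prim's algorithm
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()  # cheapest link from each site to the growing tree
    threshold = 0.0
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        threshold = max(threshold, float(candidates[j]))
        in_tree[j] = True
        best = np.minimum(best, D[j])

    # Truncate the matrix: distances beyond the threshold become 4 * threshold
    Dt = np.where(D > threshold, 4.0 * threshold, D)

    # Principal coordinate analysis of the truncated distance matrix
    J = np.eye(n) - np.ones((n, n)) / n
    G = J @ (-0.5 * Dt ** 2) @ J
    evals, evecs = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    keep = evals > 1e-8 * evals[0]  # retain axes with positive eigenvalues
    return evecs[:, keep] * np.sqrt(evals[keep]), evals[keep], threshold
```

The returned columns are ordered from broad to fine spatial scale; a subsequent selection step, as described above, would retain only the vectors related to the community matrix.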

Data analysis.
To test the hypothesis that the residual portion of the model would decrease as rare species were removed, we fitted the models separately for each taxonomic group (Zygoptera and fish). For each group, we used the abundance matrix to perform variation partitioning (varpart function from the vegan package in R) using the whole community and the environmental and spatial predictors (Fig. 5A). This was considered the reference value. Then, we built a function in which, at each iteration of a loop, one species was removed from the community matrix and the variation partitioning was performed again. Species were removed in two orders: from the rarest to the most common and from the most common to the rarest. Species were removed until the resulting richness was equal to 50% of the original community richness. To facilitate interpretation of the results, we created a correlogram relating the proportion of variation explained by the environmental, spatial, interaction, and residual components at each step and for each dataset to the number of species removed from the analysis (Fig. 5B).
Figure: The analytic procedure used to analyze the effects of environmental predictors (black circles), spatial predictors (empty circles), the interaction term between environmental and spatial predictors (grey circles), and the residual portion (red circles), while gradually removing the rarest and most common species. Env, environmental.
We tested our hypothesis that residual variation decreases when rare species are removed by contrasting the difference in residual variation before and after removing the rare species (ΔR² = R²_before − R²_after) against a null model. For the null model, we took a stratified sample of our original community to create subsamples that resemble the distribution of relative abundances in the original community.
Species were removed from these subsamples at random, and we calculated the difference in residual variation before and after removing 50% of the community (ΔR²). We repeated this procedure 999 times to create a distribution of ΔR² values. We then contrasted the values obtained when removing rare fish and Odonata species against this distribution to obtain a measure of significance for the effect of removing rare species on the proportion of residual variation.
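The analysis steps described in this section can be sketched in Python with numpy. This is an illustrative re-implementation under stated assumptions, not the authors' R code: the study used vegan's varpart, whereas the functions below (`adj_r2`, `varpart`, `ordered_removal`, `null_deltas`, `permutation_p`) are hypothetical names written for this sketch. Rarity is ranked by total abundance, the null model removes species at random without the stratified subsampling step, and the two-sided p-value formula is a common permutation convention; none of these details is specified in the text.

```python
import numpy as np

def adj_r2(Y, X):
    """Adjusted R2 of a multivariate least-squares fit of Y on X (as in RDA)."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    coef, _, rank, _ = np.linalg.lstsq(Xc, Yc, rcond=None)
    r2 = ((Xc @ coef) ** 2).sum() / (Yc ** 2).sum()
    n = Y.shape[0]
    return 1.0 - (1.0 - r2) * (n - 1) / (n - rank - 1)

def varpart(Y, env, spa):
    """Partition variation into pure environment, shared, pure space, residual."""
    abc = adj_r2(Y, np.hstack([env, spa]))
    ab, bc = adj_r2(Y, env), adj_r2(Y, spa)
    return {"env": abc - bc, "shared": ab + bc - abc,
            "spa": abc - ab, "residual": 1.0 - abc}

def ordered_removal(Y, env, spa, rare_first=True, fraction=0.5):
    """Residual fraction after removing 0, 1, 2, ... species in rank order."""
    order = np.argsort(Y.sum(axis=0), kind="stable")  # rarest first (assumed ranking)
    if not rare_first:
        order = order[::-1]
    n_remove = int(Y.shape[1] * fraction)
    curve = []
    for k in range(n_remove + 1):  # k = 0 is the reference model
        keep = np.sort(order[k:])
        curve.append(varpart(Y[:, keep], env, spa)["residual"])
    return np.array(curve)

def null_deltas(Y, env, spa, n_perm=999, fraction=0.5, seed=0):
    """Delta R2 (before minus after) when the removed species are chosen at random."""
    rng = np.random.default_rng(seed)
    n_species = Y.shape[1]
    n_keep = n_species - int(n_species * fraction)
    before = varpart(Y, env, spa)["residual"]
    deltas = np.empty(n_perm)
    for i in range(n_perm):
        keep = np.sort(rng.choice(n_species, size=n_keep, replace=False))
        deltas[i] = before - varpart(Y[:, keep], env, spa)["residual"]
    return deltas

def permutation_p(observed, null):
    """Two-sided permutation p-value (assumed convention)."""
    null = np.asarray(null)
    return (np.sum(np.abs(null) >= abs(observed)) + 1) / (null.size + 1)
```

With a sites-by-species matrix `Y` and predictor matrices `env` and `spa`, the observed effect of removing rare species would be `curve[0] - curve[-1]` from `ordered_removal(Y, env, spa)`, which can then be compared against `null_deltas(Y, env, spa)` using `permutation_p`.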