Linear filtering reveals false negatives in species interaction data

Species interaction datasets, often represented as sparse matrices, are usually collected through observational studies aimed at identifying species interactions. Because of the extensive sampling effort required, these datasets usually contain many false negatives, which often biases derived descriptors. We show that a simple linear filter can detect false negatives by scoring interactions based on the structure of the interaction matrix. On 180 datasets of various sizes, sparsities and ecological interaction types, a false negative interaction received a higher score than a true negative interaction in about 75% of the cases on average. Furthermore, we show that this filter is very robust, even when the interaction matrix contains a very large number of false negatives. Our results demonstrate that unobserved interactions can be detected in species interaction datasets, even without resorting to information about the species involved.

Biological data such as microscopy images, environmental sensor readings and species incidence counts are inherently noisy. Often a simple linear transformation can be applied to obtain a denoised re-estimation of the data 1. For instance, a noisy image can be rectified by applying a filter that exploits the fact that adjacent pixels in an image tend to have similar values 2. Similarly, species interaction values are not randomly distributed, but exhibit structures such as nestedness 3,4, modularity 5 or low-dimensional embedding 6. Since these interactions are largely determined by evolved traits of both partners 7-9, a filter for these types of data could take this information into account.
Machine learning methods, often based on kernels, have been applied with great success in similar cases, for example to predict interaction values between biomolecules based on sequence information 10-12, but seem to have remained absent from an ecological context. If no side information such as traits or phylogeny of the individual species is available, only the structure of the interaction dataset can be exploited. This can be realized by letting the filtered interaction values depend not only on the observed interaction, but also on the degree to which the two species are involved in other interactions. Let Y = [Y_ij] be the sparse n × m matrix of interaction values, either a binary matrix or a matrix of positive real numbers expressing interaction strength. We refer to the non-zero values, i.e. detected interactions, as positive interactions, and to the zero values, i.e. absent interactions, as negative interactions. In the ecological literature, 'positive interaction' often refers to an interaction in which both species benefit (e.g. symbiosis), while 'negative interaction' refers to an interaction in which one of the species is disadvantaged (e.g. parasitism). In this work, we use the term positive (resp. negative) interaction to refer to an observed (resp. unobserved) interaction, regardless of the nature of the interaction. This is more consistent with standard statistical terminology.
The filtered interaction matrix F = [F_ij] can be obtained as the following weighted average of averages:

$$F_{ij} = \alpha_1 Y_{ij} + \alpha_2 \frac{1}{n}\sum_{k=1}^{n} Y_{kj} + \alpha_3 \frac{1}{m}\sum_{l=1}^{m} Y_{il} + \alpha_4 \frac{1}{nm}\sum_{k=1}^{n}\sum_{l=1}^{m} Y_{kl}. \tag{1}$$

The first term is proportional to the interaction value, while the last term is proportional to the average of all interaction values in the matrix. The second (resp. third) term is proportional to the average of the values in the corresponding column (resp. row), i.e. it reflects the promiscuity of the individual species. The parameters α_1, α_2, α_3 and α_4 act as weighting coefficients. This filter is illustrated on a toy dataset in Fig. 1(a-c).
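The four-term filter described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `linear_filter`, the default weights and the 3 × 3 matrix `Y_toy` are assumptions made for the example (the matrix is not the toy dataset of Fig. 1).

```python
import numpy as np

def linear_filter(Y, alphas=(0.25, 0.25, 0.25, 0.25)):
    """Weighted average of the entry, its column mean, its row mean and the global mean."""
    Y = np.asarray(Y, dtype=float)
    a1, a2, a3, a4 = alphas
    col_mean = Y.mean(axis=0, keepdims=True)  # shape (1, m): average over the n rows
    row_mean = Y.mean(axis=1, keepdims=True)  # shape (n, 1): average over the m columns
    # Broadcasting expands the column, row and global means to the full n x m grid.
    return a1 * Y + a2 * col_mean + a3 * row_mean + a4 * Y.mean()

# Hypothetical binary interaction matrix, used only for illustration.
Y_toy = np.array([[1, 1, 0],
                  [1, 0, 0],
                  [0, 0, 1]])
F_toy = linear_filter(Y_toy)
print(np.round(F_toy, 3))
```

Every entry of the filtered matrix is non-zero as soon as the matrix contains at least one interaction, since each entry inherits part of the global mean.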
Usually, interaction datasets are sampled by monitoring one of the species types and observing the number of interactions with the species of the other type 13 (e.g. studying the fecal matter of predators to assess their prey or keeping track of pollinators landing on plants). As a consequence, these interaction matrices are often undersampled and some zeros might be false negatives rather than true negative interactions 14,15. This can lead to serious biases in descriptors derived from such matrices 13,16-18. To assess whether a particular interaction between species i and species j is likely to occur in reality according to the dataset, one should ideally not make use of the observed interaction value Y_ij. We therefore impute this interaction value, further on denoted as β, in such a way that it remains unchanged when it is passed through the filter. This embodies the rationale that the imputed interaction value should closely match the rest of the data according to the filter. Consider Eq. (1) using a copy of Y where Y_ij is replaced by β, then it should hold that:

$$\beta = \alpha_1 \beta + \frac{\alpha_2}{n}\Big(\beta + \sum_{k \neq i} Y_{kj}\Big) + \frac{\alpha_3}{m}\Big(\beta + \sum_{l \neq j} Y_{il}\Big) + \frac{\alpha_4}{nm}\Big(\beta + \sum_{(k,l) \neq (i,j)} Y_{kl}\Big). \tag{2}$$

This is illustrated in Fig. 1(d-f) for the toy dataset. Solving for β, we obtain

$$\beta = \frac{\frac{\alpha_2}{n}\sum_{k \neq i} Y_{kj} + \frac{\alpha_3}{m}\sum_{l \neq j} Y_{il} + \frac{\alpha_4}{nm}\sum_{(k,l) \neq (i,j)} Y_{kl}}{1 - \left(\alpha_1 + \frac{\alpha_2}{n} + \frac{\alpha_3}{m} + \frac{\alpha_4}{nm}\right)}, \tag{3}$$

or, expressed in terms of the filtered value F_ij of Eq. (1),

$$\beta = \frac{F_{ij} - \left(\alpha_1 + \frac{\alpha_2}{n} + \frac{\alpha_3}{m} + \frac{\alpha_4}{nm}\right) Y_{ij}}{1 - \left(\alpha_1 + \frac{\alpha_2}{n} + \frac{\alpha_3}{m} + \frac{\alpha_4}{nm}\right)}. \tag{4}$$

This imputation does not depend on the original value of Y_ij, as can be gleaned from Eq. (2). Only the other interaction values in the dataset contribute to the imputation. The process of imputing the interaction values one by one is known as leave-one-out (LOO) imputation. Equation (4) is a special case of the well-known LOO shortcut 19 and provides a computationally efficient way of performing LOO imputation.

As a simple method to detect false negatives in interaction matrices, we suggest scoring the negative interactions in a dataset using LOO imputation and ranking them according to this score. The last term in Eq. (1), i.e. the average interaction value, will not influence the ranking of interactions. However, if the goal is to impute the interaction value to some degree of accuracy, this term provides an essential contribution. Negative interactions that receive high scores during imputation are potential false negatives and should be examined more closely. In the experiments we will demonstrate, first, that imputations of positive interactions on average result in higher scores than those of negative interactions and, second, that false negatives in turn receive higher scores than true negatives, making this a suitable method for false negative discovery.

The proposed linear filter will be compared to a low-rank approximation of the interaction matrix, obtained through singular value decomposition (SVD), a popular method to impute missing values in collaborative filtering 20,21. The re-estimation using SVD is obtained by retaining only the leading singular values of the matrix Y after decomposition. Since the eigenvalue spectrum of the interaction dataset is related to the nestedness of the network 22, it seems sensible that this method could work well for nested interaction networks. Our filter works demonstrably better than SVD in most cases and continues to perform well even with very high rates of false negative interactions. Finally, we illustrate that when forbidden links (i.e. true negatives) are known, the performance can be increased slightly.
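The leave-one-out shortcut can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: the names `linear_filter`, `loo_shortcut` and `loo_fixed_point`, the symbol `c` for the combined self-weight α_1 + α_2/n + α_3/m + α_4/(nm), and the random test matrix are all introduced here. The shortcut is compared against a direct solution of the fixed-point condition, which uses the fact that the filtered value at (i, j) is an affine function of the value placed at (i, j).

```python
import numpy as np

ALPHAS = (0.25, 0.25, 0.25, 0.25)  # equal weights, as an example setting

def linear_filter(Y, alphas=ALPHAS):
    a1, a2, a3, a4 = alphas
    return (a1 * Y + a2 * Y.mean(axis=0, keepdims=True)
            + a3 * Y.mean(axis=1, keepdims=True) + a4 * Y.mean())

def loo_shortcut(Y, alphas=ALPHAS):
    """Impute every entry at once from the filtered matrix (closed-form shortcut)."""
    n, m = Y.shape
    a1, a2, a3, a4 = alphas
    c = a1 + a2 / n + a3 / m + a4 / (n * m)  # total weight of Y_ij in F_ij
    return (linear_filter(Y, alphas) - c * Y) / (1.0 - c)

def loo_fixed_point(Y, i, j, alphas=ALPHAS):
    """Solve beta = filter(Y with Y_ij := beta)[i, j] for a single entry."""
    Y0, Y1 = Y.copy(), Y.copy()
    Y0[i, j], Y1[i, j] = 0.0, 1.0
    g0 = linear_filter(Y0, alphas)[i, j]  # filtered value when the entry is 0
    g1 = linear_filter(Y1, alphas)[i, j]  # filtered value when the entry is 1
    return g0 / (1.0 - (g1 - g0))         # fixed point of beta -> g0 + (g1 - g0) * beta

rng = np.random.default_rng(0)
Y = (rng.random((6, 5)) < 0.3).astype(float)  # hypothetical sparse binary matrix
B = loo_shortcut(Y)
B_check = np.array([[loo_fixed_point(Y, i, j) for j in range(Y.shape[1])]
                    for i in range(Y.shape[0])])
print(np.allclose(B, B_check))
```

The shortcut needs a single pass of the filter for the whole matrix, whereas the entry-by-entry version applies the filter twice per entry.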

Material and Methods
In our experiments we used a series of species interaction datasets obtained from the Interaction Web DataBase (https://www.nceas.ucsb.edu/interactionweb/resources.html) and the Web of Life database (http://www.web-of-life.es/). We only retained datasets with at least ten rows and ten columns, leaving us with 180 datasets describing anemone-fish, host-parasite, plant-ant, plant-herbivore, plant-seed disperser, plant-pollinator and predator-prey interactions. We chose such a diverse catalogue of datasets to illustrate that the proposed method is broadly applicable. Some datasets contained only binary absence-presence information, others contained valued interactions, such as frequency of visits. Our method can be applied in either case. All datasets were quite sparse, with an average positive interaction density ρ of 0.15 ± 0.12 (average value ± standard deviation calculated over the different datasets).
In this work we investigate whether the scores of imputed interaction values can be used to discriminate between unobserved positive and negative interactions. As a performance metric, we use the area under the ROC curve (AUC), calculated as

$$\mathrm{AUC} = \frac{1}{|\mathcal{P}^{+}|\,|\mathcal{P}^{-}|} \sum_{(i,j) \in \mathcal{P}^{+}} \sum_{(k,l) \in \mathcal{P}^{-}} H(F_{ij} - F_{kl}),$$

with F_ij the imputed score, $\mathcal{P}^{+}$ (resp. $\mathcal{P}^{-}$) the set of the positive (resp. negative) interactions and H(·) the Heaviside step function. The AUC can be interpreted as the probability that a randomly chosen positive interaction receives a higher score than a randomly chosen negative interaction.
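The pairwise definition of the AUC translates directly into code. The sketch below is illustrative: the function name `auc` and the example scores are made up, and the convention H(0) = 0.5 for ties is an assumption, since the text does not state how ties are handled.

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]  # column vector
    neg = np.asarray(scores_neg, dtype=float)[None, :]  # row vector
    # np.heaviside(x, 0.5) is 1 for x > 0, 0 for x < 0 and 0.5 for ties (assumed convention).
    return float(np.mean(np.heaviside(pos - neg, 0.5)))

# Hypothetical imputed scores for two positive and three negative interactions.
example_auc = auc([0.9, 0.4], [0.5, 0.1, 0.4])
print(example_auc)  # 0.75: four wins and one tie out of six pairs
```

This version builds the full matrix of pairwise differences, which is fine for matrices of the size considered here; a rank-based computation would be preferable for very large datasets.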
The LOO imputations of the interaction datasets were computed using Eq. (4). Since we use AUC to evaluate the imputations, we are not interested in the exact values. Rather, positive interactions should on average receive higher imputed values than negative interactions. A small exploratory study on a couple of datasets showed that our ranking-based evaluation using AUC is quite insensitive to the exact values of the parameters of the filter. Hence, we set all parameters equal, i.e. (α_1, α_2, α_3, α_4) = (0.25, 0.25, 0.25, 0.25), meaning that each of the four averages in Eq. (1) has the same weight. The filter is thus reduced to a standard average. If the filter were used to estimate the probability of interaction or the interaction strength, we recommend tuning the parameters to the dataset at hand, for example using cross-validation to minimize the squared loss.

Results
First, we show that a positive interaction receives a higher score than a negative interaction. For each dataset, we calculated the LOO imputation and compared the scores of the positive and the negative interactions. The average AUC was found to be 0.77 ± 0.10, meaning that on average there is a chance of about 77% that a missing positive interaction will receive a higher score than a missing negative interaction. Intriguingly, we found that using the strength of the interactions tends to decrease the performance. When datasets containing strength of interactions were binarized by setting positive values to one, the performance increased on average by 3.5% ± 4.4%. A paired t-test showed that this increase in average AUC is significant at the 0.01 level (p < 10^-10, n = 94 datasets). This implies that in many cases the strength of interaction is too noisy to be exploited by the filter. This was to be expected, as quantitative interaction strength depends on local conditions 23,24, and is therefore more susceptible to noise. Hence, making the interaction matrix binary often leads to more robust filtering.
Four sizeable datasets representing different types of interactions 25-29 were studied in more detail, see Fig. 2. In Fig. 3(a) the ROC curves illustrate that usually a large fraction of the positive interactions can easily be detected without obtaining many false positives. This is important for practical applications, as these high-scoring interactions should be used to decide which interactions are promising for validation in the field. The top-scoring interactions are strongly enriched with positives, as illustrated in Fig. 3(b), which shows the precision (the fraction of positive interactions among the top-scoring interactions) as a function of the size of the top. Although the individual patterns vary with the density, distribution and sampling effort of the interaction datasets, a clear trend can again be observed: making the datasets binary results in higher precision. On average, for all datasets, the precision at the top-10 was 0.69 ± 0.27, which is substantially higher than the average density of 15%, the expected precision of a random scoring.
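Precision as a function of the size of the top can be computed as in the following sketch. The function name `precision_at_k` and the example scores and labels are hypothetical; in the setting above, `scores` would hold the LOO-imputed values and `labels` would indicate which interactions are positive.

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of positives among the k highest-scoring interactions."""
    scores = np.asarray(scores, dtype=float).ravel()
    is_positive = np.asarray(labels).ravel() > 0
    top = np.argsort(-scores, kind="stable")[:k]  # indices of the k largest scores
    return float(is_positive[top].mean())

# Hypothetical scores and labels for four interactions.
scores = [0.9, 0.8, 0.7, 0.1]
labels = [1, 0, 1, 0]
p_top2 = precision_at_k(scores, labels, 2)
p_top3 = precision_at_k(scores, labels, 3)
print(p_top2, p_top3)
```

A random scoring would give an expected precision equal to the density of positives, which is the baseline the reported top-10 precision is compared against.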
Since most species interaction datasets are obtained through observational studies, a negative interaction may indicate either that the species do not interact in practice or that their interaction was not observed during the study. To show that linear filtering can reveal false negatives, we created variants of each dataset, each with exactly one positive interaction made negative, and did this for every positive interaction. Subsequently, all negative interactions were scored using LOO imputation and the score of the false negative was compared with the scores of the true negatives (Fig. 4). The average AUC for detecting these false negatives was 0.78 ± 0.098, averaged over all 180 datasets. Again, when the interaction datasets containing strength of interaction were binarized, the performance increased on average by 4.0% ± 4.4%. Using a paired t-test, this increase in average AUC was also found to be significant at the 0.01 level (p < 10^-10, n = 94 datasets). Whereas the previous experiment showed that positive interactions receive higher scores than negative interactions, this experiment demonstrates that within the negative interactions, false negatives tend to receive higher scores than true negatives. Table 1 summarizes the AUC scores obtained for the two described experiments.
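The false negative recovery experiment can be sketched as follows. This is an illustration under assumptions rather than the exact protocol used for the reported numbers: the function names, the binarization step, the tie convention H(0) = 0.5, the averaging of one AUC per hidden interaction, and the small nested matrix are all choices made for this example.

```python
import numpy as np

def loo_impute(Y, alphas=(0.25, 0.25, 0.25, 0.25)):
    """LOO imputation of every entry via the closed-form shortcut."""
    n, m = Y.shape
    a1, a2, a3, a4 = alphas
    F = (a1 * Y + a2 * Y.mean(axis=0, keepdims=True)
         + a3 * Y.mean(axis=1, keepdims=True) + a4 * Y.mean())
    c = a1 + a2 / n + a3 / m + a4 / (n * m)
    return (F - c * Y) / (1.0 - c)

def false_negative_auc(Y):
    """Hide each positive in turn and compare its score with those of the true negatives."""
    Y = (np.asarray(Y) > 0).astype(float)   # binarize (assumed preprocessing)
    true_neg = Y == 0                       # zeros of the original matrix
    per_hidden = []
    for i, j in zip(*np.nonzero(Y)):
        Y_hidden = Y.copy()
        Y_hidden[i, j] = 0.0                # turn one positive into a false negative
        scores = loo_impute(Y_hidden)
        diff = scores[i, j] - scores[true_neg]
        per_hidden.append(np.mean(np.heaviside(diff, 0.5)))
    return float(np.mean(per_hidden))

# Hypothetical perfectly nested 4 x 4 network, used only for illustration.
Y_nested = np.array([[1, 1, 1, 1],
                     [1, 1, 1, 0],
                     [1, 1, 0, 0],
                     [1, 0, 0, 0]])
fn_auc = false_negative_auc(Y_nested)
print(round(fn_auc, 3))
```

Hiding an interaction also changes the row and column averages seen by the true negatives in the same row or column, which is why the matrix is re-scored for every hidden interaction.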
Even when many interactions are missing, our method continues to perform well. In an additional experiment, we first illustrate how the performance of the linear filter changes with larger fractions of false negatives and, second, compare the linear filter to a low-rank approximation of the interaction matrix Y obtained by SVD. SVD yields the closest approximation of a matrix for a given rank in terms of mean squared error. The rank was chosen as the lowest rank such that the approximated dataset retained at least 75% of the variance of the original dataset. The re-estimated matrix was evaluated in the same way as the matrix obtained by LOO imputation using the linear filter. Experiments using both the linear filter and the SVD approximation were performed on the four datasets in Fig. 2, by randomly setting 5%, 10%, 20%, 50% or 90% of the positive interaction values to zero. Using AUC, we assessed how well the re-estimated interaction values could be used to discriminate between true and false negatives. Re-estimation was done using both the original interaction datasets and versions of the datasets where the interaction values were binarized. Each experiment was repeated 100 times. The performances are listed in Table 2. For three datasets, the linear filter clearly shows a better performance. Interestingly, SVD seems to work really well on the predator-prey dataset, a large dataset with a visually strong structural pattern. Nevertheless, using the linear filter usually leads to a good performance, especially since most interaction matrices are rather small. The filter also still seems able to detect false negative interactions when the percentage of false negatives is very high, in contrast to the low-rank approximation. This indicates that our method is quite robust, even when the datasets contain many missing values.

Table 1. Average AUC, aggregated for different densities ρ and different total numbers of positive interactions in all the different datasets. The first part gives the results for the imputation experiments, the second part presents the results for the false negative recovery experiments. All datasets with interaction strengths were binarized for these experiments.

Table 2. Comparison of the linear filter with SVD for an increasing fraction of randomly assigned false negatives (FN) for four datasets. The AUC is given for both the original dataset and a binarized version. Each performance is an average of 100 repetitions. In most cases the linear filter is better than SVD. The performance of the latter deteriorates quickly with an increasing number of false negatives. The performance of the linear filter remains relatively high, even with 90% of false negatives.

Finally, we performed a small experiment where true negatives or forbidden links are known. To this end, we used the 25-by-25 seed-dispersal network of Olesen and coauthors 30. It consists of 156 observed positive interactions and 228 forbidden interactions due to phenological uncoupling or morphological constraints. We used the linear filter to perform LOO imputation on the interaction matrix. Figure 5 shows the distributions of the imputed values for the positive interactions, the true negative interactions and the negative interactions that are potential false negatives. The AUC for discriminating between positive and negative interactions (both true negatives and false negatives) using LOO imputation was found to be 0.8270. When only trying to discriminate between true positives and true negatives, the AUC was 0.7981. Upon removing the true negatives, the AUC improved slightly to 0.8543. For this dataset, it seems that the true negatives are somewhat harder to identify than the negatives in general. When true negatives are known, it is best to only search for false negatives within the potentially positive interactions.

Discussion
Evidently, the latent information in the interaction matrices can be used to detect unobserved (false negative) interactions.
We are convinced that techniques such as linear filtering can either directly improve an interaction dataset or suggest promising interactions that can subsequently be verified in the field. Making use of in silico predicted interaction scores to suggest experiments in vitro is already commonplace in domains such as drug discovery 31 and can be seen as part of the broader paradigm of recommender systems 32,33. Negative interactions with high scores are natural targets for increased sampling effort, as they are most likely to occur in reality. Standard algorithms for recommender systems make recommendations by exploiting structures in the data, e.g. low-rankness of the interaction matrix 34. This idea could be applied to predict the value of missing interactions. For example, it has been used successfully to predict the joint growth between heterotrophic and methanotrophic bacteria 35. Other methods for filtering a network could be based on different principles, for example the stochastic block model 36. In essence, the simple linear filter of Eq. (1) and the associated imputation formula (4) only use information on row and column counts to do an imputation. We can motivate the use of this filter in three ways. Firstly, it is a very simple first method to try when inferring false negatives. Although the filter has four parameters, their exact values matter little if one is only interested in ranking interactions, so not much tuning is required. Secondly, the filter is very robust and works demonstrably well on small datasets and with a very large fraction of false negatives. Finally, using the shortcut for LOO cross-validation, it is very easy and computationally efficient to get a realistic estimate of the performance of the filter for a given dataset. More complex methods are expected to yield better performance, but need to be tuned more carefully to the dataset at hand.
Often, one has information about the individual species, such as geographical location, morphology or phylogeny, which can also be incorporated to predict interactions 8,37,38. Using such side information, denoted as content-based filtering in recommender systems 32, can improve the accuracy of the prediction as well as explain the interactions based on species traits, if used in combination with model selection tools. As we have not incorporated such information in our method, the performances presented in this work can be seen as a lower bound for detecting missing interactions.