Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks

Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and channel of actuation, for sectors like advertising, marketing, or politics. To get the most out of OSNs, analysts need to identify influential users that can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), based on outlier detection, to support the identification of influential users. MUOD is scalable and can hence be used in large OSNs. Moreover, it labels each outlier as a shape, magnitude, or amplitude outlier, depending on its features. This allows classifying the outlier users into multiple classes, which are likely to include different types of influential users. Applying MUOD to a subset of roughly 400 million Google+ users, we have automatically identified and discriminated sets of outlier users that present features associated with different definitions of influential users, such as the capacity to attract engagement, the capacity to attract a large number of followers, or a high infection capacity.


Introduction
In this work we deal with real multivariate data describing d = 21 features of Google+ users (see Section 6 for a detailed description of the features). Note that this set of 21 features was selected after a preliminary analysis of the data.
When multivariate data have more than three dimensions, it is practically impossible to visualize the observations graphically using Cartesian orthogonal axes. A convenient alternative is given by a visualization tool known as parallel coordinates ([30]). Parallel coordinates substitute Cartesian orthogonal axes with parallel axes, represent a multivariate point as a series of points on these parallel axes, and then connect each pair of adjacent points by a line segment, forming a broken line.
As suggested by [19], once represented by means of parallel coordinates, observations x ∈ R^d can be seen as real functions defined on an arbitrary set of equally spaced domain points, e.g., {1, . . . , d}, and x can be expressed as x = {x(1), . . . , x(d)}. Observations that can be represented as curves can then be analyzed using the tools provided by the area of statistics known as functional data analysis (FDA). We use the same approach in this work: first, we represent multivariate data using parallel coordinates; second, we treat the transformed multivariate data as functional. Note that this approach provides a unified framework where both visualization and analysis tools become available.
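In practice, this re-interpretation requires no transformation of the data at all: a d-dimensional observation is simply read as a function evaluated at the domain points {1, . . . , d}. A minimal sketch (the feature values are made up, not taken from the Google+ data):

```python
import numpy as np

# One user with d = 5 features; viewed through parallel coordinates, the
# observation becomes a curve evaluated at the equally spaced points 1..d.
x = np.array([0.3, 1.2, 0.8, 2.1, 1.7])
domain = np.arange(1, len(x) + 1)          # domain points {1, ..., d}

# The functional representation x = {x(1), ..., x(d)}:
curve = dict(zip(domain.tolist(), x.tolist()))
print(curve[1], curve[5])  # → 0.3 1.7
```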
In the univariate or multivariate framework we assume that observations are generated by a univariate or multivariate random variable, respectively. Similarly, in the FDA framework it is common to assume that observations are generated by a functional random variable X ∈ F, where F is a functional space. An alternative way to interpret X is as a stochastic process {X(t), t ∈ I}, where I is an interval in R.
The accurate identification of outliers is an important aspect of any statistical data analysis, and scientists are interested in identifying such atypical observations because:
• a statistical analysis performed on a sample containing outliers may lead to misleading results;
• a non-statistical analysis of the detected outliers, done by experts in the field, may reveal interesting aspects of the phenomenon under study.
Outlier detection is an issue also in FDA, where an outlier can be defined as an observation generated by a functional random variable with a different distribution than the one generating the observations of a functional sample ([13]). Note that this definition covers both the case of isolated outliers, which exhibit outlying behavior during a very short domain interval, and the case of persistent outliers, which are outlying on a large part of the domain ([12]). In this work we focus on persistent outliers, and on the three types of persistent outliers defined by [12]: shift/magnitude outliers, which have the same shape as the majority but are moved away from it; amplitude outliers, curves that may have the same shape as the majority but differ in scale; and shape outliers, curves whose shape differs from the majority.
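The three types of persistent outliers can be illustrated on a synthetic base curve; the base curve and constants below are made up for illustration and do not come from the data sets in the figures:

```python
import numpy as np

# Sketch of the three persistent outlier types on a common base curve.
t = np.linspace(0, 2 * np.pi, 25)
base = np.sin(t)

magnitude_outlier = base + 5.0        # same shape and scale, shifted away
amplitude_outlier = 3.0 * base        # same shape, different scale
shape_outlier = np.cos(2 * t)         # different shape altogether

# Shifting preserves shape: correlation with the base stays 1 ...
assert np.isclose(np.corrcoef(base, magnitude_outlier)[0, 1], 1.0)
# ... and so does scaling ...
assert np.isclose(np.corrcoef(base, amplitude_outlier)[0, 1], 1.0)
# ... while a shape outlier is (close to) uncorrelated with the base.
assert abs(np.corrcoef(base, shape_outlier)[0, 1]) < 0.5
```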
We propose a statistical methodology for identifying outliers and classifying them as magnitude, amplitude and shape outliers, and we argue that influential users of Online Social Networks can be seen as outliers since they usually show behaviors and patterns that are different from those of non-influential users. Therefore, the new methodology can be used to search for potentially relevant Google+ users, whose identification will be based on a statistical criterion instead of directed arguments [10,8,11,9,4].
Detection of influencers may have a special relevance in digital marketing in order to maximize profits using social networks data [35,36]. Moreover, since the new procedure searches for three different types of outliers, it may reveal different types of relevant Google+ users. We refer to the proposed method as Massive Unsupervised Outlier Detection (MUOD).
Before formally defining MUOD, let us consider the illustrative example in Figure S1, where observations are curves with the following characteristic: at different but relatively common levels, they all share the same behavior in terms of magnitude, amplitude and shape. More precisely, the curves in Figure S1 represent the same equation, but at different levels defined by different constants, and we can state that the data set in Figure S1 does not contain any outlier. Figure S2 is a modification of Figure S1 where, using relatively large constants, we obtain several curves that can be seen as magnitude outliers, without being amplitude or shape outliers. One of the goals of MUOD is to detect this kind of outliers and identify them as magnitude outliers.
Figures S3 and S4 are two additional modifications of Figure S1, and they serve to illustrate examples of amplitude and shape contamination of a data set. Our proposal will also deal with these types of contamination scenarios, trying to solve their associated outlier detection problems. Finally, note that we also aim to satisfactorily solve outlier detection problems where magnitude, amplitude and shape outliers simultaneously contaminate a data set, as in Figure S5, or even where an outlier can belong to several classes.
Figure S5: A modification of Example 1 allowing magnitude, amplitude and shape outliers.

A New Outlier Detection Method Based on Indices of Magnitude, Amplitude and Shape
In this section we present three indices that can be used to analyze the type of data sets considered in this work, i.e., multivariate data sets represented by means of a parallel coordinates system. These indices can be interpreted as similarity measures of an observation with respect to a sample, and each of them focuses on a different feature of the data: magnitude, amplitude or shape. We start by introducing the shape index, which is based on the well-known Pearson correlation coefficient.
Let X = {x_1, . . . , x_n} be a set of n curves whose common discretized form is defined on a given set of d equally-spaced domain points, and let x be another curve defined on the same set. We define the shape index of x with respect to X as

I_S(x, X) = |(1/n) Σ_{j=1}^{n} ρ(x, x_j) − 1|,   (1)

where ρ(x, x_j) is the Pearson correlation coefficient between x and x_j, and it is the measure responsible for capturing the similarity between x and x_j in terms of shape. Note that, if we compute (1) for each curve in Figure S1 with respect to the whole data set, we always obtain an I_S equal to 0. The same happens with the data sets in Figures S2 and S3. However, if we compute (1) for one of the shape outliers from Figure S4, say x_s, with respect to the whole data set, we obtain an I_S(x_s, X) significantly greater than 0, while I_S(x_c, X) ≈ 0, where x_c is any of the "common" curves in Figure S4. Indeed, we have computed I_S(x, X) for all the curves in Figure S4 and sorted them from the lowest to the highest index, that is, we have ranked the curves using I_S(x, X) as criterion. Figure S6 is a scatter plot of the I_S(x, X)-based ranks (horizontal axis) and the I_S(x, X) values (vertical axis): there are 12 points that stand out because of their I_S(x, X) value, and they correspond exactly to the 12 shape outliers that we generated in Figure S4. Therefore, I_S seems to help in detecting shape outliers. Clearly, in real outlier detection problems the behavior of both non-outlying and outlying curves is less "regular" than in our illustrative examples. Hence, the corresponding scatter plots will show more differences among the index values of non-outlying curves, and likely smaller differences between the index values of non-outlying and outlying curves. Nevertheless, we maintain our claim: I_S is a useful tool to identify shape outliers.
The magnitude and amplitude indices are also based on well-known concepts in statistics. Let α̂_j and β̂_j be the estimated intercept and slope of a simple linear regression model where the discretized version of x represents the observed values of the dependent variable and the discretized version of x_j represents the observed values of the regressor.
More formally,

β̂_j = cov(x, x_j) / var(x_j)  and  α̂_j = x̄ − β̂_j x̄_j,   (2)

where x̄ and x̄_j are the sample means of the discretized versions of x and x_j, and cov(·, ·) and var(·) denote the sample covariance and variance. Then, we define the magnitude index of x with respect to X as

I_M(x, X) = |(1/n) Σ_{j=1}^{n} α̂_j|   (3)

and the amplitude index of x with respect to X as

I_A(x, X) = |(1/n) Σ_{j=1}^{n} β̂_j − 1|.   (4)

As for I_S, we compute I_M(x, X) for all the curves in Figure S2 and obtain the scatter plot of Figure S7. Also in this case we observe points that stand out because of their index values, and they correspond to the magnitude outliers from Figure S2. Similarly, we compute I_A(x, X) for all the curves in Figure S3 and obtain the scatter plot shown in Figure S8. Once more, the true outliers, i.e., the amplitude outliers from Figure S3, stand out in the scatter plot.
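A sketch of how these two indices can be computed for a single curve, assuming they aggregate the regression intercepts and slopes as |mean of the α̂_j| and |mean of the β̂_j − 1| (our reading of the construction; the toy sample below is synthetic):

```python
import numpy as np

# Magnitude and amplitude indices of one curve x against a sample X
# (rows of X = curves), assuming the |mean intercept| and |mean slope - 1|
# forms described in the lead-in.
def mag_amp_indices(x, X):
    # slope and intercept of the regression of x on each x_j
    betas = np.array([np.cov(x, xj, bias=True)[0, 1] / np.var(xj) for xj in X])
    alphas = x.mean() - betas * X.mean(axis=1)
    return abs(alphas.mean()), abs(betas.mean() - 1.0)

t = np.linspace(0, 2 * np.pi, 25)
X = np.array([np.sin(t) + c for c in np.linspace(-0.2, 0.2, 20)])  # toy sample

i_m, i_a = mag_amp_indices(np.sin(t) + 5.0, X)   # a shifted (magnitude) curve
print(round(i_m, 6), round(i_a, 6))  # → 5.0 0.0  (large I_M, I_A ~ 0)
```

The shifted curve has slope 1 against every sample curve (so I_A ≈ 0) but a large average intercept, which is exactly what the magnitude index is meant to capture.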
Therefore, I_M and I_A also seem to help in detecting the types of outliers for which they were designed, i.e., magnitude and amplitude outliers, respectively. Now we need to be able to detect and filter the values that stand out in the scatter plots. To do so, we can use a heuristic based on the first derivative of the (if necessary, normalized) indices curve, filtering out those values that fulfill a certain condition (e.g., slope ≥ 2). Alternatively, we can apply the tangent method described below. Let I_S(X) = {I_S(x_1, X), . . . , I_S(x_n, X)} be the set of shape indices, and let I_M(X) and I_A(X) analogously be the sets of magnitude and amplitude indices, respectively. Hereafter, we will use I(X) for any of these three sets of indices indistinctly. We sort (in ascending order) and plot the set I(X), as in, e.g., Figures S6 to S8. The tangent method then consists in taking, as the boundary between regular curves and outliers, the value I*(X) = I(x*, X) at the intersection point x* between the tangent at the furthest right point and the horizontal axis. Figure S9 illustrates this method. The tangent method is inspired by the classical scale determination under an assumption of exponential decay, as described in [14], where the authors also compare this method with the first-derivative one and find that the tangent method outperforms it.
Finally, we use the value I*(X) to identify the outstanding indices. The threshold I*(X) represents the maximum index value considered non-outlying; hence, any index value above this threshold is considered an outlier. Thus, we obtain the set of outliers as I_out(X) = {I(x_i, X) : I(x_i, X) > I*(X)}.
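A minimal sketch of the tangent-method threshold. Approximating the tangent at the right-most point by the slope between the last two sorted indices is an assumption of this sketch, since the text does not fix the numerical approximation; the sample indices are synthetic:

```python
import numpy as np

# Tangent-method cutoff: sort the indices, approximate the tangent at the
# right-most point, and use its intersection with the horizontal axis as
# the boundary between regular observations and outliers.
def tangent_cutoff(indices):
    s = np.sort(np.asarray(indices, dtype=float))
    n = len(s)
    slope = s[-1] - s[-2]                        # tangent slope, unit spacing
    x_star = (n - 1) - s[-1] / slope             # tangent crosses y = 0 here
    return np.interp(x_star, np.arange(n), s)    # threshold I*(X)

idx = np.concatenate([np.random.default_rng(0).uniform(0, 0.1, 95),
                      [2.0, 2.5, 3.0, 3.5, 4.0]])   # 5 clearly large indices
thr = tangent_cutoff(idx)
print((idx > thr).sum())  # at least the 5 large indices are flagged
```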

A New Outlier Detection Method: Implementation
We have built a scalable algorithm that implements the method. Now we are going to outline the implementation details of our algorithm and how it scales. The input data set X is stored as a matrix M_X ∈ R^{d×n}, where each column is the discretized version of a curve (user).
Figure S9: Tangent method applied to shape indices for our real dataset. The horizontal axis represents sample percentiles based on the shape index. The vertical axis represents shape index values. The green dot cuts the x-axis at x* = 0.9546, yielding 4.538% of outliers (the higher values on the right side). I*(X) = I(0.9546, X).
Partitioning. A key step to make our algorithm parallel and, consequently, scalable is to partition the matrix M_X. This way, we can compute the indices simultaneously, leveraging current multi-core machines. For this purpose, we define a partition Π = {π_j} of users, where the classes π_j ⊆ X are non-empty and mutually disjoint, and ∪_j π_j = X. The number of classes is k = |Π|. This allows us to compute the indices separately in each class. Note that different partitions may exist, ranging from the discrete partition (k = n: |π_j| = 1 ∀j) to the unit partition (k = 1: π_1 = X). The partitions considered hereinafter are of equal size and always contain consecutive elements (users), according to the matrix M_X. Our algorithm takes the matrix M_X and the corresponding set of users π_j as input. Note that we can see each π_j ∈ Π as a matrix π_j = [x_i . . . x_{i+|π_j|−1}] ∈ R^{d×|π_j|}, for every j, according to the partition Π.
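A sketch of the partitioning step under these conventions (equal-sized classes of consecutive columns of M_X):

```python
import numpy as np

# Split the n users (columns of M_X) into k classes of consecutive columns,
# as equal in size as possible, so each class can go to a different core.
def partition_columns(n_users, k):
    bounds = np.linspace(0, n_users, k + 1).astype(int)
    return [range(bounds[j], bounds[j + 1]) for j in range(k)]

parts = partition_columns(10, 3)
print([list(p) for p in parts])
# → [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

The classes are non-empty, mutually disjoint, and cover all columns, matching the definition of the partition Π above.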
We describe now the different sub-algorithms for the computation of the indices. Note that the algorithms will use any set π j as input, regardless of the partition chosen.

Shape Indices
We present the algorithm to compute I S (π j , X ). Algorithm 1 shows a possible implementation of Equation 1. The mean function in Line 3 is a column-wise mean on the matrix cor.
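Algorithm 1 itself is not reproduced here; the following sketch is one possible vectorized reading of it, assuming the shape index takes the form |mean correlation − 1| and that, as stated above, the mean over the correlation matrix cor is column-wise. The toy data set is synthetic:

```python
import numpy as np

# Shape indices for the curves in one class pi_j (given by its column
# indices): correlate each of its curves with every curve in M_X, take the
# column-wise mean of the correlation matrix, and measure the distance to 1.
def shape_indices(M_X, part_cols):
    Z = (M_X - M_X.mean(axis=0)) / M_X.std(axis=0)   # standardize each curve
    d = M_X.shape[0]
    cor = Z.T @ Z[:, part_cols] / d                  # n x |pi_j| correlations
    return np.abs(cor.mean(axis=0) - 1.0)            # column-wise mean

t = np.linspace(0, 2 * np.pi, 25)
M_X = np.column_stack([np.sin(t + 0.01 * i) for i in range(20)]
                      + [np.cos(2 * t)])             # 20 similar curves + 1
I_S = shape_indices(M_X, range(M_X.shape[1]))
print(I_S.argmax())  # → 20, the cos(2t) shape outlier stands out
```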

Time Complexity
The algorithm's complexity is dominated by the computation of the correlation matrix cor, whose complexity is O(|π_j|dn). Since this function is run for every set π_j, overall it is computed for all n curves/users. However, it may be computed in parallel on up to p cores, which yields a final time complexity of O(n²d/p).

Magnitude & Amplitude Indices
These two indices are jointly computed since they are estimated together from a linear regression model as explained in Section 2 (see equation (2)).
Algorithm 2 shows a possible implementation for computing I M (π j , X ) and I A (π j , X ).
The mean(·) functions in lines 3, 8, 10 and 12 are column-wise. Finally, the division and product operations in lines 7 and 9 are both column and element-wise, while the operation in line 11 is row-wise.

Algorithm 2 Get Magnitude & Amplitude Indices
1: function get_magamp_indices(M_X, π_j, means, vars)
2: Parameters:
Time Complexity. The algorithm's complexity is dominated by the computation of the covariance matrix cov, whose complexity is O(|π_j|dn). As in the previous algorithm, this function is run for every set π_j and hence, overall, it is computed for all n curves/users. However, it may be computed in parallel on up to p cores, which again yields a final time complexity of O(n²d/p).

Three Indices Simultaneously
We have already shown how to compute the three indices, each with time complexity O(n²d/p), so computing them separately gives the same total time complexity. However, we would like to compute the three indices simultaneously in order to reduce the actual time spent. For this purpose, we leverage the well-known relation between correlation, covariance and standard deviations:

ρ(x, x_j) = cov(x, x_j) / (sd(x) sd(x_j)).   (5)
Thus, we can compute the correlation matrix used in Algorithm 1 from the covariance matrix used in Algorithm 2. Basically, the resulting algorithm is similar to the previous one for the magnitude and amplitude indices, but with one more parameter and one more line. This extra Line 8 applies Equation (5), first dividing (column-wise) by the standard deviations of M_X, and second dividing (row-wise) by the standard deviations of the corresponding curves in π_j. Algorithm 3 below outlines our proposed method.
Algorithm 3 Get Shape, Magnitude and Amplitude Indices
1: function get_sma_indices(M_X, π_j, means, vars, sds)
2: Parameters:
Time Complexity. This final algorithm's complexity does not increase with respect to the previous one: O(n²d/p). Although the time complexity does not decrease, this last algorithm computes one more index with only a few additional instructions, saving the full computation of the correlation matrix (from scratch) in Algorithm 1. Additionally, each algorithm may be computed in parallel for each partition class π_j, concatenating the results afterwards. Observe that we can reduce the running time depending on how many partitions we use and how much computing power we have, i.e., p = 10 leads to a one-order-of-magnitude reduction, p = 100 to a two-order reduction, and so on.
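In the spirit of Algorithm 3, the following sketch derives correlations, slopes and intercepts from a single covariance block, so the three indices come out of one pass. The |·| forms of the indices are our reading of the text, and the toy data set is synthetic:

```python
import numpy as np

# One pass per class pi_j: build the covariance block, then reuse it for
# correlations (via the covariance/standard-deviation relation), slopes
# and intercepts, yielding the shape, magnitude and amplitude indices.
def sma_indices(M_X, part_cols):
    cols = np.asarray(part_cols)
    d = M_X.shape[0]
    means, sds = M_X.mean(axis=0), M_X.std(axis=0)
    C = M_X - means
    cov = C.T @ C[:, cols] / d                      # n x |pi_j| covariances
    cor = cov / sds[:, None] / sds[cols]            # correlation from cov
    beta = cov / (sds ** 2)[:, None]                # slopes of x on each x_j
    alpha = means[cols] - beta * means[:, None]     # matching intercepts
    return (np.abs(cor.mean(axis=0) - 1.0),         # shape index
            np.abs(alpha.mean(axis=0)),             # magnitude index
            np.abs(beta.mean(axis=0) - 1.0))        # amplitude index

t = np.linspace(0, 2 * np.pi, 25)
cols = [np.sin(t) + c for c in np.linspace(-0.2, 0.2, 20)]      # regulars
cols += [np.sin(t) + 5.0, 3.0 * np.sin(t), np.cos(2 * t)]       # 3 outliers
M_X = np.column_stack(cols)
I_S, I_M, I_A = sma_indices(M_X, range(M_X.shape[1]))
print(I_M.argmax(), I_A.argmax(), I_S.argmax())  # → 20 21 22
```

Each index peaks at the outlier type it was designed for: the shifted curve (column 20), the scaled curve (column 21), and the differently shaped curve (column 22).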

Outlier Detection in Functional Data
The aim of this section is to present the results of a simulation study where we compare MUOD with some competitors recently used in the functional data literature. Since some of them are based on measures known as functional depths, we briefly introduce a general definition. A functional depth is a measure that allows ordering and ranking the observations in a functional sample from the most to the least central: it gives high values to central observations and low values to non-central ones. In FDA, unlike in univariate statistics, where R provides a natural order criterion for observations, several criteria have been employed to order functional data, and therefore there exist different implementations of the notion of functional depth (see [37], where different functional depths are reviewed and used in classification problems).
Recall that we propose three type-oriented outlier detection methods to find, respectively, magnitude (MUOD_mag), amplitude (MUOD_amp) and shape (MUOD_sha) outliers; we refer to their simultaneous use on a single data set simply as MUOD. We will compare MUOD_mag, MUOD_amp, MUOD_sha and MUOD with a battery of competitors given by the outlier detection methods considered in [38]. In addition, we also consider a method proposed in [39]. Next, we briefly describe these competitors (for more details, see [38] and references therein, and [39]):
• B_tri and B_wei ([13]): depth-based outlier detection methods which select the threshold of a functional depth by means of two alternative bootstrap-based procedures. B_tri and B_wei differ in their resampling steps: in B_tri the resampling is done on a trimmed version of the original sample, that is, after deleting from the sample a given proportion of the least deep curves (trimmed resampling); in B_wei the resampling is done giving sample observations weights proportional to their depth values (weighted resampling). B_tri and B_wei work with any functional depth. Following their authors, we use them together with the h-modal depth ([40]).
• FBAG ([41]): procedure that reduces the outlier detection problem from functional to multivariate by means of functional principal component analysis. Once the first two functional principal component scores are obtained, FBAG orders the scores using the multivariate halfspace depth ([42]) and builds a non-outlying region. FBAG detects as outliers those observations whose scores fall outside the non-outlying region.
• FHDR ([41]): procedure that differs from FBAG after obtaining the first two functional principal component scores. FHDR performs a bivariate kernel density estimation on the scores and defines a high density region. FHDR detects as outliers those observations whose scores fall outside the high density region.
• FBPLOT ( [18]): procedure that extends the standard boxplot to the functional framework, allowing the identification of a non-outlying band defined by a specified percentage of the deepest curves, and therefore the detection of outliers. FBPLOT works with any functional depth. Following its authors, we use FBPLOT together with the modified band depth ( [19]).
• OG ([39]): depth-based outlier detection method based on a visualization tool known as the outliergram. The outliergram exploits the relation between the modified band depth ([19]) and the modified epigraph index ([43]) to help understand shape features of observations, and it expressly detects shape outliers.
• KFSD smo , KFSD tri and KFSD wei ( [38]): depth-based outlier detection methods which select the threshold of the kernelized functional spatial depth (KFSD, [37]) by means of a probabilistic procedure based on three alternative resampling techniques. KFSD smo , KFSD tri and KFSD wei differ in their resampling steps: in KFSD smo the resampling is simple and smoothed, that is, once an observation is sampled, a small perturbation is added to the observation to avoid the presence of repeated observations; in KFSD tri the resampling is trimmed and smoothed; in KFSD wei the resampling is weighted and smoothed.
We have tested all these methods on random samples of our dataset of Google+ users in order to observe their time performance. Experiments have been carried out on an AMD Opteron 6276 multicore (x64) @ 2.3 GHz with 512 GiB of RAM under Debian 7.9. Figure S10 shows the time performance of all these methods, including ours. We have run our method with one single partition (MUOD), and with 10 partitions (MUOD 10-way), in order to check the scalability and verify that the running time decreases by one order of magnitude (better time performance). It is observable that the parallel version needs a setup time which is amortized as the data size increases. FBPLOT has by far the best performance as we increase the data size, with MUOD and MUOD 10-way coming second. Observe that FBPLOT and MUOD are the only methods running in less than 100 seconds for a data size of 3·10^4 users. FBAG and FHDR behave similarly as the data size grows and perform similarly to MUOD, but they are not able to process a data size of 3·10^5 users. OG shows a behaviour similar to the former algorithms, but cannot process datasets bigger than 3·10^4 users. Finally, the KFSD family and the B family show poor performance, with a dataset size limit of 10^3 users.
We conclude that only FBPLOT and MUOD are able to scale to datasets bigger than 10^5 users, which is still far from the size of our real dataset from Google+ (5,619,786 users). For this reason, we only use FBPLOT and MUOD in our real-data case study (Section 6), where both algorithms are evaluated and compared.

Simulation Study
In this section we present the results of a simulation study where we compare our methods with the competitors described in the previous section. The goal is to show that our procedures are competitive in the usual framework used in the FDA literature.

Mixture Models 1, 2, and 3
In the first part of our simulation study we generate observations using the following model: where t ∈ {t j }, {t j } is a sequence of d equidistant points between 0 and 2π, and a 1 and a 2 are independent realizations of a continuous uniform random variable between 0.75 and 1.25. In Figure S11 we show a simulated data set obtained using (6), d = 25 and n = 100, where n is the size of the data set. To generate outliers with respect to model (6) we use three different models. They generate magnitude, amplitude, and shape outliers, respectively. The magnitude outliers' model is given by where k is a realization of a discrete random variable with P (k = −1) = P (k = 1) = 0.5.
In Figure S12 we show a simulated data set obtained using a mixture of models (6) and (7) (mixture model 1), with α = 0.05 being the probability that each curve is a magnitude outlier. The amplitude outliers' model is given by where b 1 and b 2 are independent realizations of a continuous uniform random variable between 1.30 and 1.50. In Figure S13 we show a simulated data set obtained using a mixture of models (6) and (8) (mixture model 2), with α = 0.05 being the probability that each curve is an amplitude outlier.
The shape outliers' model is given by where ε(t) ∈ {ε(t_j)} and {ε(t_j)} is a d-dimensional sample of independent realizations of a normal random variable with mean 0 and variance 1/9. In Figure S14 we show a simulated data set obtained using a mixture of models (6) and (9) (mixture model 3), with α = 0.05 being the probability that each curve is a shape outlier. The comparison among methods for mixture models 1, 2 and 3 is based on 300 generated data sets and is performed in terms of correct outlier detection percentages, false outlier detection percentages and F-measures. The F-measure is given by

F = 2PR / (P + R),

where R = TP/(TP + FN) is known as the recall measure, P = TP/(TP + FP) is known as the precision measure, and TP, FN and FP are the numbers of true positives, false negatives and false positives, respectively.
The F-measure is the harmonic mean of precision and recall, and it is widely used instead of accuracy when the data set is unbalanced, as in our setting ([44]). It gives a good trade-off for precision while preserving a high recall rate.
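As a quick numerical check of the definition (the TP/FN/FP counts below are made up):

```python
# F-measure as the harmonic mean of precision and recall, from counts of
# true positives (tp), false negatives (fn) and false positives (fp).
def f_measure(tp, fn, fp):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(tp=8, fn=2, fp=2), 6))  # → 0.8 (precision = recall = 0.8)
```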
The results obtained by our methods and their competitors are shown in Table S1, where we also report the F-measure-based rankings of the methods for each mixture model. Focusing on summary indices such as the F-measures, the results in Table S1 show that: • The proposed type-oriented methods are always among the four best methods. More in detail, MUOD_mag is the third best method in the presence of magnitude outliers, MUOD_amp is the best method in the presence of amplitude outliers, which are probably the most challenging outliers to detect, and MUOD_sha is the fourth best method in the presence of shape outliers. Therefore, one of the main novelties of our methods, their type-orientation, is supported by good and stable results for the considered mixture models.
• MUOD, which is the method obtained by the simultaneous use of MUOD_mag, MUOD_amp and MUOD_sha on a single data set, has a remarkably good performance, since it is always among the seven best methods. Clearly, for the considered mixture models, MUOD slightly suffers in terms of false outlier detection percentages, since it searches for all types of outliers while each simulated data set contains (at most) one type of outlier.
• Concerning our competitors: B_wei and KFSD_smo perform better than MUOD_mag in the presence of magnitude outliers, but these methods entail important computational limitations; the same methods and B_tri perform better than MUOD_sha in the presence of shape outliers, but the remark on the computational limitations still holds; FBAG is the best competitor of MUOD_amp in the presence of amplitude outliers, but it shows poor performance in the presence of magnitude and shape outliers.

Mixture Model 4
In Figure S15 we show a simulated data set obtained using a mixture of models (6), (7), (8) and (9) (mixture model 4), with α = 0.05 being the probability that each curve is an outlier and 1/3 being the probability that each outlier is a magnitude, amplitude or shape outlier, respectively.
For mixture model 4 the comparison among methods is based on 300 generated data sets and is performed as in Table S1. The results are shown in Table S2.
Note that for the type-oriented methods, i.e., OG, MUOD_mag, MUOD_amp and MUOD_sha, the correct outlier detection percentages of Table S2 are difficult to interpret, because these methods search for a specific type of outliers. Therefore, to interpret the performances of OG, MUOD_mag, MUOD_amp and MUOD_sha observed in Table S2, it is necessary to decompose the outlier detection problem into three parts. This decomposition is presented in Table S3. Note that this table has the same structure as Table S1, but it is associated with a fairly more complex outlier detection problem: in this case, the observations and the three types of outliers are generated from a unique mixture model. Clearly, since the non-type-oriented methods search for any type of outliers, their false outlier detection percentages in Table S3 are also difficult to interpret.
• MUOD has a good performance in terms of the F-measures of Table S2. It is the third best method, and it is slightly outperformed by two computationally intensive and non-type-oriented methods, B_wei and KFSD_smo.
• The good performance of MUOD is due to the good and complementary performances of MUOD_mag, MUOD_amp and MUOD_sha shown in Table S3: in particular, MUOD_mag and MUOD_amp detect very satisfactorily the outliers for which they were proposed, whereas the performance of MUOD_sha is slightly worse in terms of its F-measure.
Since mixture model 4 simultaneously allows magnitude, amplitude and shape outliers, it represents a model able to generate data sets that might resemble our real data application. For this reason, we use mixture model 4 to generate a data set having a sample size comparable to the sample size that we observe in our real data application and we perform outlier detection on it. We generate a data set having 6,000,000 observations and we compare the performances of FBPLOT with the ones of the proposed procedures. We restrict this comparison to these four procedures for computational reasons (see Figure S10).
The simulated data set contains 300,305 outliers (roughly 5%), divided in this way: 99,787 magnitude outliers, 100,166 amplitude outliers and 100,352 shape outliers. In Table S4 we report how many observations are detected as outliers by FBPLOT, MUOD_mag, MUOD_amp and MUOD_sha (in the diagonal, in bold), and we also describe the intersection among the outputs of these procedures (outside the diagonal). According to Table S4, MUOD_mag and MUOD_amp are the two procedures detecting the most outliers, and their outputs in practice do not overlap. MUOD_sha detects fewer outliers and shows an appreciable overlap with MUOD_mag (18.05% of the observations detected as outliers by MUOD_sha are also detected as outliers by MUOD_mag). FBPLOT is the procedure detecting the fewest outliers, and it shows appreciable overlaps with MUOD_mag and MUOD_sha (22.15% and 54.59% of the observations detected as outliers by FBPLOT are also detected as outliers by MUOD_mag and MUOD_sha, respectively). A graphical illustration of these results is given in Figure S16. Tables S5 and S6 replicate Tables S2 and S3, respectively, but now we only consider FBPLOT, MUOD, MUOD_mag, MUOD_amp and MUOD_sha as procedures, and their performances are evaluated on a unique big data set (with 6,000,000 observations) instead of on many small data sets (300 data sets with 100 observations). In these new tables we omit the F-measure-based rankings.
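The overlap percentages in Table S4 are plain set intersections over the flagged observation indices; a sketch with made-up index sets:

```python
# Overlap between two detectors' outputs: the shared observations, and the
# percentage of the first detector's output also flagged by the second.
# The index sets below are illustrative, not the paper's results.
shape_out = {1, 2, 3, 4, 10, 11}
magnitude_out = {4, 5, 6, 7, 8}

shared = shape_out & magnitude_out
overlap_pct = 100 * len(shared) / len(shape_out)
print(sorted(shared), round(overlap_pct, 2))  # → [4] 16.67
```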
Figure S16: Venn diagram of the true outliers and the sets detected by each algorithm (shape, magnitude, amplitude, FBPLOT). This plot visually summarizes the results in Tables S4 and S5.
The results in Tables S5 and S6, when compared to the results in Tables S2 and S3, show that:
• MUOD improves its performance in terms of F-measures when we compare Table S5 with Table S2. On the contrary, FBPLOT shows an important deterioration of its performance.
• Also MUOD_mag, MUOD_amp and MUOD_sha improve their performances in terms of F-measures when we compare Table S6 with Table S3.
Figure S17: Accuracy of the considered algorithms when sampling. The plot shows the average accuracy over 100 simulations using random samples of 100K observations. Each bar shows the accuracy as the intersection ratio between the outliers detected on a sample and the outliers in the total population (6M observations) for an algorithm. The bars show that shape, magnitude and amplitude are not affected by sampling (accuracy ∼ 90%), while FBPlot seems to be sensitive to the observations sampled.
Figure S18: Precision and recall of FBPlot against shape, magnitude and amplitude when sampling. The plot shows the average precision and recall over 100 simulations using random samples of 100K observations. The bars show that there is a considerable overlap between FBPlot and shape, magnitude and amplitude, yet in different ways. While FBPlot and shape overlap in each sample, FBPlot becomes a small subset of shape. On the other hand, FBPlot also overlaps with magnitude and amplitude, but in this case magnitude and amplitude become small subsets of FBPlot. To sum up, these plots extend Figures S16 and S17, and show the behavior of FBPlot compared with MUOD (shape, magnitude, and amplitude) when sampling.
We now explore the behavior of the different algorithms when applied to samples of a large population instead of to the whole population. To do so, we use the population of 6,000,000 observations described above and take random samples of 100,000 observations. Figures S17 and S18 present the accuracy, precision, and recall observed when performing this sampling.
This experiment shows that the outliers identified by MUOD in the samples match to a large degree those identified in the whole population. This is not the case for FBPlot.
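The sampling stability measured in Figure S17 can be sketched as follows, under our own assumptions (the function name, the generic `detect` interface, and the toy threshold detector are hypothetical, not the paper's implementation): detect outliers on the full population once, then repeatedly detect on random samples and measure the fraction of sample-level outliers that are also population-level outliers.

```python
import random

def sampling_accuracy(detect, population, sample_size, trials=100, seed=0):
    """detect: callable returning the set of positions it flags as outliers.
    Returns the average fraction of sample-detected outliers that are also
    detected on the full population."""
    rng = random.Random(seed)
    pop_out = detect(population)                      # population-level outliers
    accs = []
    for _ in range(trials):
        idx = rng.sample(range(len(population)), sample_size)
        local = detect([population[i] for i in idx])  # sample-level outliers
        flagged = {idx[j] for j in local}             # map back to population indices
        if flagged:
            accs.append(len(flagged & pop_out) / len(flagged))
    return sum(accs) / len(accs) if accs else float("nan")

# A toy, sample-invariant detector: its sample-level outliers always agree
# with the population-level ones, so the accuracy is exactly 1.0.
detect = lambda xs: {i for i, x in enumerate(xs) if x > 10}
print(sampling_accuracy(detect, list(range(100)), sample_size=20, trials=5))  # 1.0
```

A detector whose verdicts depend on the composition of the sample (as FBPlot appears to) would score below 1.0 here.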

Mixture Models 5, 6, and 7
We expand our simulation study by considering three additional mixture models that were used in [39] to evaluate OG. We refer to them as mixture models 5, 6 and 7, respectively. Since OG was proposed expressly to detect shape outliers, these mixture models generate different types of shape outliers, which are also the type of outliers that our procedures have so far had the most trouble detecting.
• In mixture model 5, the main model is given by

where t ∈ [0, 1] and ε(t) is a Gaussian process with zero mean and covariance function γ(s, t) = 0.3 exp{−|s − t|/0.3}, while the contamination model is given by

• In mixture model 6, the main model is given by

where t ∈ [0, 1] and ε(t) is a Gaussian process with zero mean and covariance function γ(s, t) = exp{−|s − t|}, while the contamination model is given by

where u follows a Bernoulli distribution with probability 1/2 and µ is uniformly distributed in [0.25, 0.75].
• In mixture model 7, the main model is given by (11), with t and ε(t) defined as in mixture model 6, while the contamination model is given by

with µ defined as in mixture model 6.
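The zero-mean Gaussian process ε(t) appearing in these mixture models can be simulated on a grid via a Cholesky factorization of its covariance matrix. A minimal sketch, assuming an exponential covariance γ(s, t) = a·exp(−|s − t|/b) (a = b = 0.3 reproduces the covariance of mixture model 5; a = b = 1 that of models 6 and 7); the function name and the jitter term are ours:

```python
import numpy as np

def gaussian_process(n, d, a, b, seed=0):
    """Simulate n curves of a zero-mean Gaussian process observed at d
    equidistant points in [0, 1], with covariance a * exp(-|s - t| / b)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, d)
    cov = a * np.exp(-np.abs(t[:, None] - t[None, :]) / b)
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(d))  # jitter for numerical stability
    return t, rng.standard_normal((n, d)) @ L.T      # rows: curves, cols: grid points

t, eps = gaussian_process(n=100, d=50, a=0.3, b=0.3)
print(eps.shape)  # (100, 50)
```

Adding these draws to the deterministic main or contamination curves yields simulated data sets like those in Figure S19.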
In Figure S19 we show three simulated data sets obtained using mixture models 5, 6 and 7 with d = 50 equidistant points between 0 and 1, n = 100 observations, and α = 0.05 as the probability that each curve is an outlier. The comparison among methods for mixture models 5, 6 and 7 is performed as for the previous four models, and it is reported in Table S7.
The results in Table S7 show that: • As expected, for the considered mixture models, MUOD sha is the most appropriate of the proposed methods. Moreover, it is always among the two best methods for these mixture models generating shape outliers.
• MUOD also performs well, since it is always among the six best methods; this is because both MUOD mag and MUOD amp detect few false outliers.
• The best competitors of MUOD sha and MUOD are B wei, the KFSD-based techniques and OG. For B wei and the KFSD-based techniques, the remarks made above hold. As for OG, its good performance in mixture models 5, 6 and 7 contrasts with its poor performance in mixture models 1, 2, 3 and 4, and therefore OG does not represent a real competitor for our proposals.
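The F-measure used to rank the procedures in these comparisons is the standard harmonic mean of precision and recall, computed here as a minimal sketch from the set of true (simulated) outliers and the set a procedure detects (the function name is ours):

```python
def f_measure(true_outliers, detected):
    """F-measure (harmonic mean of precision and recall) from index sets."""
    tp = len(true_outliers & detected)     # true positives
    if tp == 0:
        return 0.0                         # also covers an empty detection set
    precision = tp / len(detected)
    recall = tp / len(true_outliers)
    return 2 * precision * recall / (precision + recall)

# Precision 2/3, recall 1/2 -> F = 4/7:
print(f_measure({1, 2, 3, 4}, {3, 4, 5}))  # 0.5714...
```

A procedure that flags many false outliers loses precision, while one that misses true outliers loses recall; the F-measure penalizes both.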
After comparing our methods with a wide set of competitors in standard FDA simulated scenarios and showing their good performance, in the next section we present some supplementary details on our real data application. As for the run of mixture model 4 with 6,000,000 observations, also in this case we restrict the analysis to the proposed procedures and FBPLOT.

Outlier Detection in our Real Data Application
In this section we present some remarks on the outlier detection in our real data application.
Recall that our real data set describes 21 features of 5,619,786 Google+ users and that, after discarding 2 features, we treat it as a functional data set. Table S9 replicates Table S4: first, we report how many observations are detected as outliers by FBPLOT, MUOD mag, MUOD amp and MUOD sha (on the diagonal, in bold); second, we describe the intersections among the outputs of these procedures (off the diagonal). Note that all the observations in our real data set belong exclusively to one of the classes above. In Table S10 we describe how the observations are distributed among these classes (in absolute and relative terms).
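The mutually exclusive classes tabulated in Table S10 can be sketched as a simple rule on which MUOD components flag an observation. The class names are taken from the text; the function and the catch-all branch for combinations not observed in our data are our own assumptions:

```python
def muod_class(in_mag, in_amp, in_sha):
    """Assign an observation to exactly one class from the flags raised by
    MUOD mag, MUOD amp and MUOD sha."""
    if not (in_mag or in_amp or in_sha):
        return "no MUOD"
    if in_sha and not in_mag and not in_amp:
        return "only sha"
    if in_sha and in_mag and not in_amp:
        return "only mag+sha"
    if in_sha and in_amp and not in_mag:
        return "only amp+sha"
    if in_mag and in_amp and in_sha:
        return "mag+amp+sha"
    return "other"  # combinations without sha; not observed in our data set

print(muod_class(False, False, True))  # 'only sha'
```

Counting the class labels over all observations yields the absolute and relative frequencies of Table S10.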
In Figure S20 we represent the observations as functional data†.

† We use a logarithmic transformation that ensures the argument of the logarithm is positive.

Observing Figure S20, we notice that the four classes of observations detected as outliers by the proposed procedures would form functional data sets that differ from one another. This result seems to justify the detection of four different types of outliers. Moreover, thanks to Figure S21, where we represent 5,000 no MUOD observations as functional data, it is possible to appreciate another important fact: there are differences between the observations detected as outliers and the observations that are normal according to the proposed procedures. For more qualitative details on these differences, see the main article. Having defined the classes no MUOD, only sha, only mag+sha, only amp+sha and mag+amp+sha, and before closing this section, in Tables S11 and S12 we show their distributions against two classes related to the FBPLOT procedure: no FBPLOT and yes FBPLOT, i.e., the observations that are normal and outlying according to FBPLOT, respectively.
According to Tables S11 and S12, FBPLOT in practice does not detect as outliers observations that were not detected as outliers by the proposed procedures. Moreover, 99.81% and 97.52% of the only amp+sha and mag+amp+sha observations, respectively, are also detected as outliers by FBPLOT. Therefore, the main differences between the proposed procedures and FBPLOT can be observed in the classes only mag+sha and, especially, only sha.
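Cross-tabulations like Tables S11 and S12 can be sketched by pairing each observation's MUOD class with FBPLOT's verdict. The class labels come from the text; the function name and the toy inputs are our own illustration:

```python
from collections import Counter

def crosstab(classes, fb_flags):
    """classes: MUOD class label per observation.
    fb_flags: True where FBPLOT flags the observation as an outlier.
    Returns counts keyed by (MUOD class, FBPLOT verdict)."""
    return Counter(
        (c, "yes FBPLOT" if f else "no FBPLOT")
        for c, f in zip(classes, fb_flags)
    )

# Toy example with three observations:
tab = crosstab(["only sha", "no MUOD", "only sha"], [True, False, False])
print(tab[("only sha", "yes FBPLOT")])  # 1
print(tab[("only sha", "no FBPLOT")])   # 1
```

Row-normalizing these counts gives the relative distributions reported in Table S12.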
In Figures S22-S24 we represent as functional data: a. Figure S22: 1% of the 284,482 (only sha, no FBPLOT) observations (left) and 25% of the (only sha, yes FBPLOT) observations (right).
In Figures S22-S24 we also draw the upper bound of the non-outlying region defined by FBPLOT. Observing Figures S22-S24, we notice that the proposed procedures detect as outliers more observations than FBPLOT, especially as only sha outliers. These observations, when compared to some of the observations that are normal according to the proposed procedures (Figure S21), show a behavior that seems to justify their detection as outliers. For more qualitative details on the differences between MUOD and FBPLOT, see the main article.