Wisdom of crowds benefits perceptual decision making across difficulty levels

Decades of research on collective decision making have claimed that the aggregated judgement of multiple individuals is more accurate than expert individual judgement. A longstanding problem in this regard has been to determine how the decisions of individuals can be combined to form intelligent group decisions. Our study consisted of a random target detection task in natural scenes, in which human subjects (18 subjects, 7 female) detected the presence or absence of a randomly chosen target indicated by a cue word displayed before the stimulus. Neural activity (EEG signals) was recorded concurrently. A separate behavioural experiment was performed by different subjects (20 subjects, 11 female) on the same set of images to categorize the tasks according to their difficulty levels. We demonstrate that the weighted average of individual decision confidences/neural decision variables produces significantly better performance than the frequently used majority pooling algorithm. Further, the classification error rates of individual judgements were found to increase with increasing task difficulty. This error could be significantly reduced by combining the individual decisions using group aggregation rules. Using statistical tests, we show that combining all available participants is unnecessary to achieve the minimum classification error rate. We also explore whether the benefits of group aggregation depend on the correlation between the individual judgements of the group; our results suggest that, at a fixed difficulty level, reduced inter-subject correlation can improve collective decision making.


Supplementary Information
Reaction Time Analysis
Figure S1: Reaction time and accuracy. (A) RT distributions from stimulus offset, shown for all trials. (B) RT distributions shown separately for correct and incorrect trials. (C) RT distributions shown for trials of different confidence levels (1-5). In (A-C) the plotted RT distribution is the average over individual participants' RTs; probability distributions were fitted to the individual RT data using MATLAB's 'fitdist' function. (D) Accuracy and mean RT as a function of confidence (1-5). (E) 95% confidence intervals of logistic regression slopes (β) when using RT to predict single-trial accuracy at the individual level. The slopes of each participant's logistic regression are significantly below 0, i.e., longer RTs predict lower single-trial accuracy.
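The per-participant slope estimates and confidence intervals in (E) can be obtained with a standard logistic regression of single-trial accuracy on RT. Below is a minimal sketch, assuming trial-level RT and accuracy vectors for one subject; the function name, the statsmodels-based implementation, and the synthetic data are illustrative assumptions, not the original analysis code (which used MATLAB).

```python
# Hypothetical sketch of the per-participant RT -> accuracy logistic regression.
import numpy as np
import statsmodels.api as sm

def rt_accuracy_slope_ci(rt, correct, alpha=0.05):
    """Fit accuracy ~ intercept + RT; return the RT slope (beta) and its CI."""
    X = sm.add_constant(np.asarray(rt, dtype=float))   # [1, RT] design matrix
    fit = sm.Logit(np.asarray(correct, dtype=int), X).fit(disp=0)
    lo, hi = fit.conf_int(alpha=alpha)[1]              # row 1 = RT coefficient
    return fit.params[1], (lo, hi)

# Synthetic example: slower responses tend to be incorrect, so the
# recovered slope should be significantly below 0 (true beta = -2.5).
rng = np.random.default_rng(0)
rt = rng.gamma(shape=2.0, scale=0.4, size=500)         # RTs in seconds
p_correct = 1.0 / (1.0 + np.exp(-(2.0 - 2.5 * rt)))
correct = rng.random(500) < p_correct
beta, ci = rt_accuracy_slope_ci(rt, correct)
print(f"beta = {beta:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```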

Classwise Principal Component Analysis (CPCA)
Single-trial classification of EEG data is challenging because of the high-dimension, low-sample-size (HDLSS) nature of the data. Under such small-sample conditions, a large portion of the data space is sparse and carries almost no useful information. The non-informative subspace is usually discarded by extracting a small set of useful features (feature extraction). In this study we used a recently proposed non-linear, computationally efficient, locally adaptable pattern classification technique to tackle the HDLSS problem of EEG data. The algorithm exploits the strength of PCA as a dimensionality-reduction technique while preserving the class-specific information needed for subsequent classification. The main idea is to identify and discard an irrelevant subspace of the data by applying PCA to each class separately; classification is then carried out in the lower-dimensional residual space.
Without loss of generality, let us assume a binary classification problem.
• Step 1 (Feature Extraction): In this step a piecewise linear subspace is extracted from the training trials.
Let $w_i$ ($i = 1, 2$) denote the two classes with means $\mu_i$ and covariances $\Sigma_i$, and let $x^* \in \mathbb{R}^n$ be the unknown test data. To represent $x^*$ in the two subspaces $S_1$ and $S_2$ we use the transformation
$$x_i^* = F_i^\top x^*, \quad i = 1, 2, \qquad (1)$$
where $F_i$ is a matrix of order $n \times m_i$ whose columns are the basis vectors of $S_i$. The training trials of the two classes are transformed in the same way. In the simplest case, $F_i$ can be taken as $V_i \in \mathbb{R}^{n \times m_i}$, where $V_i$ consists of the $m_i$ ($m_i$ to be chosen) principal components of class $w_i$. To ensure that class differences arising from the two means are accounted for, $F_i$ is taken as
$$F_i = [\, V_i \mid V_b \,], \qquad V_b = \mu_1 - \mu_2,$$
so the subspace dimension increases by one. To make all projections orthonormal, we orthonormalize the columns of $F_i$ via the Gram-Schmidt process.
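A minimal sketch of this step is given below, assuming each trial is flattened into a vector of length n (channels × time points); the function and variable names are illustrative, not the authors' implementation.

```python
# Sketch of the CPCA feature-extraction step (Step 1) for one class.
import numpy as np

def class_subspace(X, mu_other, m):
    """Orthonormal basis F_i = orth([V_i | mu_i - mu_other]) for one class.

    X        : (trials, n) training data of this class
    mu_other : (n,) mean of the competing class
    m        : number of principal components m_i retained (to be chosen)
    """
    mu = X.mean(axis=0)
    # Principal components = right singular vectors of the centred data.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt[:m].T                              # n x m, top-m principal directions
    F = np.column_stack([V, mu - mu_other])   # append the mean-difference direction
    Q, _ = np.linalg.qr(F)                    # Gram-Schmidt orthonormalisation
    return Q                                  # orthonormal columns spanning S_i

# Per equation (1), a test trial x_star is represented in S_i as Q.T @ x_star.
```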
Classification accuracy can be further improved if simple feature extraction techniques are applied to the subspace $S_i$ directly. If a linear feature extraction technique (e.g., LDA) is used, with $T_i \in \mathbb{R}^{m_i \times m}$ as the feature extraction matrix, then $F_i$ can simply be taken as $[\, V_i \mid V_b \,] T_i$ and the rest of the mathematical formulation remains the same.
• Step 2 (Classification): In this step the unknown test trial is classified in one of the subspaces estimated during the feature extraction step. To complete the feature extraction process, one of the two subspaces has to be removed. This problem can be solved within a classification framework, so that feature extraction becomes a by-product of classification. We elaborate below; a code sketch of this step is given at the end of this section. For simplicity, the classes are taken to be Gaussian with prior probabilities $P(w_i)$. Applying the Bayes classifier in the first subspace gives
$$P(w_{i1} \mid x_1^*) = \frac{P(w_i)\, p(x_1^* \mid w_{i1})}{p(x_1^*)}, \quad i = 1, 2,$$
where $p(x_1^*) = \sum_{i=1}^{2} P(w_i)\, p(x_1^* \mid w_{i1})$ is the marginal probability of $x_1^*$ and $P(w_{i1} \mid x_1^*)$, $i = 1, 2$, are the posterior probabilities of the two classes in the first subspace. Note that the conditional distribution of $x_1^*$ given $w_{i1}$ remains Gaussian due to the linear transformation of $x^*$ (see equation (1)), for $i = 1, 2$. In particular,
$$p(x_1^* \mid w_{i1}) = \mathcal{N}\!\left(x_1^*;\; F_1^\top \mu_i,\; F_1^\top \Sigma_i F_1\right).$$
In a very similar manner, $x^*$ is projected into the second subspace, and the posterior probabilities of the two classes there are given by
$$P(w_{i2} \mid x_2^*) = \frac{P(w_i)\, p(x_2^* \mid w_{i2})}{p(x_2^*)}, \quad \text{where} \quad p(x_2^* \mid w_{i2}) = \mathcal{N}\!\left(x_2^*;\; F_2^\top \mu_i,\; F_2^\top \Sigma_i F_2\right).$$
Based on these posteriors we associate $x^*$ with one of the two classes. After projecting $x^*$ into the first subspace, it is associated with the class $w_k$ having the maximum posterior, $k = \arg\max_{i=1,2} P(w_{i1} \mid x_1^*)$; similarly, in the second subspace $x^*$ is assigned to class $w_\ell$ with $\ell = \arg\max_{i=1,2} P(w_{i2} \mid x_2^*)$. Finally, $x^*$ is associated with class $w_g$ by comparing the highest posterior probabilities obtained in the individual subspaces, i.e., $g = \arg\max_{k,\,\ell} \left[ P(w_{k1} \mid x_1^*),\; P(w_{\ell 2} \mid x_2^*) \right]$.

Discriminative Filter Weights
Although the features produced by CPCA are abstract, the corresponding filters have a physical interpretation. By filters we mean the columns of $F_i$. The filter coefficients, when arranged into a spatiotemporal array, indicate the brain areas and time scales engaged in encoding the differences between the two classes (target present/absent).
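Putting Steps 1 and 2 together, the classification rule described above can be sketched as follows. This is a minimal sketch assuming equal priors and full-rank projected covariances; the subspace bases F come from class_subspace() above, and all names are illustrative.

```python
# Sketch of the CPCA classification step (Step 2) under the Gaussian assumption.
import numpy as np
from scipy.stats import multivariate_normal

def cpca_classify(x_star, F, mu, Sigma, prior=(0.5, 0.5)):
    """F, mu, Sigma: length-2 lists of subspace bases, class means, covariances.
    Returns the index g of the winning class (0 or 1)."""
    best_post, g = -np.inf, None
    for Fj in F:                                   # project into each subspace S_j
        x_proj = Fj.T @ x_star                     # equation (1)
        # Unnormalised posteriors P(w_i) * N(x_proj; Fj' mu_i, Fj' Sigma_i Fj)
        num = np.array([prior[i] * multivariate_normal.pdf(
                            x_proj, mean=Fj.T @ mu[i], cov=Fj.T @ Sigma[i] @ Fj)
                        for i in range(2)])
        post = num / num.sum()                     # posteriors within this subspace
        if post.max() > best_post:                 # keep the most confident subspace
            best_post, g = post.max(), int(post.argmax())
    return g
```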