Adaptive Spontaneous Transitions between Two Mechanisms of Numerical Averaging

We investigated the mechanisms by which humans estimate numerical averages. Participants were presented with 4, 8 or 16 two-digit numbers, serially and rapidly (2 numerals/second), and were instructed to convey the sequence average. As predicted by a dual-component, but not a single-component, account, we found a non-monotonic influence of set-size on accuracy. Moreover, we observed a marked decrease in RT as set-size increased, and an RT-accuracy tradeoff in the 4-number, but not in the 16-number, condition. These results indicate that, in accordance with the normative directive, participants spontaneously employ analytic/sequential thinking in the 4-number condition and intuitive/holistic thinking in the 16-number condition. When the presentation rate is extreme (10 items/sec), performance remains high, but the estimations are now based on intuitive processing. The results are accounted for by a computational model postulating population-coding underlying intuitive-averaging and working-memory-mediated symbolic procedures underlying analytical-averaging, with flexible allocation between the two.

Averaging numerical information is essential in the formation of preferences about a variety of items, from stocks and grocery lists to participants in a contest, as well as for making decisions between alternatives characterized by numerical values [1][2][3][4]. While previous work has indicated that human observers can generate quite accurate estimations of numerical values 5,6, these studies mostly relied on group estimations. Here we set out to investigate the ability of individual participants to carry out such estimations, and the underlying mechanisms, using a psychophysical, within-participant approach.
In particular, we are interested in distinguishing between two potential estimation mechanisms: a) an analytic one (also termed the exact system) that is based on rule-governed serial operations, performed on values held in working-memory; and b) an intuitive one (also termed the approximate system) that is based on parallel processes operating on analog and fuzzy representations ([7][8][9][10][11]; but see 12,13). According to this schema, numerical intuition is considered to reflect reliance on perceptual-like mechanisms, such as those that operate in statistical estimations of the numerosity or size of visual elements [14][15][16], while analytic calculations are seen as a product of a symbolic pathway used for the sequential application of arithmetic operations or heuristics [17][18][19]. These two mechanisms differ in their functional properties: the intuitive system is automatic, rapid and high-in-capacity, yet capable only of an approximate (coarse) estimate at the single-item level, while conversely, the analytic/symbolic system is precise, but mediated by working memory and thus restricted in capacity. Importantly, these functional discrepancies render each system optimal under different task contingencies.
Consider, for example, a situation in which one must assess the average value (e.g., prices, quality-evaluations) of a certain quantity of items that are only briefly presented, without being able to take notes. When presented with only a few items (i.e., within working memory capacity), and assuming no time pressure, one would do better to apply an analytic solution, which involves a sequential application of numerical operations. However, as the sequence-length increases, a growing amount of information is excluded from the already occupied working memory, reducing the reliability of the analytic solution. On the other hand, since the intuitive system has a higher capacity than the analytic one [20][21][22], any additional information should theoretically improve its accuracy, since uncorrelated noise at the single-item level averages out (see Results). Therefore, from a normative point of view, when the amount of information reaches a certain threshold, or when the information is presented at a speed that exceeds the temporal capability of the serial analytical system 23, one should shift from analytic thinking to intuition.
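The normative argument can be made concrete with a back-of-the-envelope calculation. It is an illustrative sketch, not the model fitted in this paper, and it rests on assumptions that are not stated in the text: the n numbers are independent draws with variance s^2; the intuitive system encodes every item with independent noise of variance \sigma^2; and the analytic system computes a noise-free mean over only k \le n items held in working memory. The symbols s, \sigma and k are introduced here for illustration only.

```latex
% Expected squared deviation of each estimate from the true sequence mean \mu_n
\mathrm{E}\big[(\hat{\mu}_{\mathrm{intuitive}} - \mu_n)^2\big] = \frac{\sigma^2}{n},
\qquad
\mathrm{E}\big[(\hat{\mu}_{\mathrm{analytic}} - \mu_n)^2\big] = s^2\left(\frac{1}{k} - \frac{1}{n}\right).

% Intuition is the more accurate strategy when
\frac{\sigma^2}{n} < s^2\left(\frac{1}{k} - \frac{1}{n}\right)
\quad\Longleftrightarrow\quad
n > k\left(1 + \frac{\sigma^2}{s^2}\right).
```

Under these assumptions the intuitive error shrinks with n, whereas the analytic error grows with n for a fixed k, so a crossover set-size exists beyond which intuition is the better strategy.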
To investigate these issues in a controlled environment, we conducted four experiments in which participants were presented with sequences of two-digit numbers and required to produce an estimation of the average. The first two experiments (Exp. 1, N = 18, skewed distributions; Exp. 2, N = 18, normal distributions) used a moderate presentation rate of 500 ms per numeral and no RT pressure, allowing both strategies to operate, in order to probe for spontaneous strategy changes with set-size. In the last two experiments we probed reliance on intuition alone, by presenting the numbers at an extremely rapid rate (Exp. 3, N = 18, 100 ms per numeral), or by explicitly limiting the response time (RT) (Exp. 4; N = 18, RT limit of 2.5 sec; see Method section for full description; and Fig. 1 for an illustration of a typical trial).
Our aims are thus two-fold: (i) to demonstrate that individuals are able to discriminate between the averages of rapidly-presented number sequences, and test whether participants spontaneously adapt their strategy as a function of set-size to enhance performance; (ii) to offer a dual-component computational model that accounts for these abilities.

Results
We first present the qualitative predictions of the two components of the computational model that will allow us to characterize the behavioral signatures of each of the systems, separately.

Computational model: predictions for the intuitive and analytic systems. The model's intuitive-component is grounded on neurophysiological evidence, demonstrating that approximate numerosity and number representations are coded in the parietal cortex of primates 24,25 and humans 26,27. The model follows Dehaene and colleagues [28][29][30][31] to assume that symbolic numbers activate broad numerosity-tuned neural detectors. We assume that when presented with a sequence of numerosity displays or numbers, the sequence's average can be estimated from the number-tuned neural activation profile, by weighting the contribution of each neuron's activity according to its preferred number/quantity [i.e., extracting a population vector; 32; see Computational Model Section for additional description]. As we show in the simulations below, this model predicts that estimations of a sequence's average improve with increasing number of samples (Fig. 2 Lower Panel, right-hand; blue line), since intrinsic noise in the broad representation of each individual number averages out.
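The population-vector readout can be sketched in a few lines of Python. This is a minimal illustration that follows the verbal description in the Computational Model Section (Gaussian tuning curves, Poisson spike counts accumulated across the sequence, activity-weighted average of preferred numbers); it is not the authors' implementation. The set of preferred numbers, `TUNING_WIDTH`, `PEAK_RATE`, the uniform sampling of test sequences and all function names are hypothetical choices made for this example.

```python
import math
import random

# Hypothetical parameters, chosen for illustration only (not fitted values).
PREFERRED = list(range(-30, 131, 2))  # preferred numbers of the model units
TUNING_WIDTH = 10.0                   # SD of each Gaussian tuning curve
PEAK_RATE = 2.0                       # mean spike count at the tuning-curve peak


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def intuitive_estimate(sequence, rng):
    """Population-vector estimate of the average of a number sequence."""
    activity = [0] * len(PREFERRED)
    for number in sequence:
        for j, pref in enumerate(PREFERRED):
            lam = PEAK_RATE * math.exp(-0.5 * ((pref - number) / TUNING_WIDTH) ** 2)
            activity[j] += poisson(lam, rng)  # spikes accumulate across items
    total = sum(activity)
    if total == 0:  # no spikes at all: fall back to the middle of the range
        return 50.0
    # Activity-weighted average of the units' preferred numbers.
    return sum(a * p for a, p in zip(activity, PREFERRED)) / total


def simulated_rmsd(set_size, n_trials, rng):
    """RMSD between the population estimate and the true sequence mean."""
    squared = 0.0
    for _ in range(n_trials):
        seq = [rng.randint(10, 90) for _ in range(set_size)]
        squared += (intuitive_estimate(seq, rng) - sum(seq) / set_size) ** 2
    return math.sqrt(squared / n_trials)
```

Calling `simulated_rmsd(4, 150, random.Random(1))` and `simulated_rmsd(16, 150, random.Random(2))` lets one check that, in this sketch, the error of the intuitive readout falls as set-size grows, because more accumulated spikes make the weighted average less noisy.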
The analytic-component of the model assumes that procedural operations (such as multicolumn addition and division, or heuristic approximations of those operations) are serially performed on symbols available in working memory 33. As the working memory capacity is limited (about 4 ± 2 items; 34,35), the model predicts that as set-size increases, the distance between the subjective estimation and the true average increases, reflecting the lower relative number of samples used (Fig. 2 Lower Panel, right-hand; red line; see Computational Model Section for additional description). Furthermore, the model assumes that, in each sequence, the quantity of items available in working memory is subject to some trial-to-trial variability (e.g., 36), predicting a positive correlation between accuracy and RT, as they both increase with the number of samples that the system has used in the estimation.

Figure 1. Illustration of a typical trial in Exp. 1-2. Each trial begins with a 300 ms fixation cross, after which a sequence of two-digit numbers is presented (500 ms per numeral). The sequence set-size was 4, 8 or 16 (randomly between trials). The only instructions participants received were to convey as accurately as possible the sequence's average, by vertically sliding a mouse-controlled bar set on a number ruler between 0 and 100 (the number corresponding to the bar's location was concurrently displayed).
Scientific Reports | 5:10415 | DOI: 10.1038/srep10415

The combined model, which integrates both strategies to account for the performance of the participants, assumes that a set-size-dependent parameter determines which of the two strategies is utilized in each trial (exclusive dual-process). This model provides a quantitative account of the data, which is presented after the experimental results.

Experimental results
In Exp. 1 (skewed) and Exp. 2 (normal) the presentation rate was 500 ms per item. Our main interest is the dependency of the accuracy of the estimates (see quantification of accuracy below) and of their RT on set-size. As the dependency of these DVs on set-size did not differ between the two experiments [F(2, 34) = 0.95; p = 0.4 for the interaction of experiment-type with set-size on accuracy; F(2, 34) = 0.84; p = 0.44 for the interaction of experiment-type with set-size on RT], the analysis is reported collapsed across the two (unless mentioned explicitly).
Sensitivity to Numerical Averages. The participants exhibited above-chance sensitivity to the arithmetic averages of the presented number sequences, as evident from two different measures computed on a subject-by-subject basis: (1) the correlation across trials between the sequence's average and the participant's evaluations was high (chance involves a null-hypothesis according to which the participant's response in a given trial is uncorrelated with the specific values presented in that trial, but may reflect the general statistics across the experiment, thus predicting zero correlation) [Pearson correlation = 0.66; SD = 0.11; p < 0.0001; for all participants; left panel in Fig. 3 shows the performance . This remarkable sensitivity to the numerical average of the sequence was also found for the 16-number trials, separately: (1) participants' correlation was significant [r = 0.64 (SD = 0.12); p < 0.05, for all participants]; and (2) square deviations were smaller as compared to shuffled responses [Actual_16 = 58.1; Shuffled_16 = 139.64; p < 0.05, for all participants, except one, who nevertheless was not discarded from analysis]. In order to exclude the possibility that participants were merely picking a random number from the sequence, we compared square deviations between low-range and high-range sequences (median split of sequence range, defined as the difference between the maximal and minimal numerical values of the sequence). We found no difference in square deviations between low- and high-range sequences [Low = 60.66; High = 63.11; t(35) = −0.8; p = 0.43], suggesting that participants did not choose a random number from within the sequence.
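The two chance-level measures mentioned here, the across-trial correlation and the comparison of square deviations against shuffled responses, can be sketched as follows. This is one plausible reading of the shuffling procedure (responses are randomly re-paired with trials, which preserves the overall response statistics while breaking the trial-by-trial link); the exact procedure is not spelled out in the text, and the function names and the number of shuffles are hypothetical.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_square_deviation(responses, sequence_means):
    """Mean squared deviation of responses from the true sequence means."""
    return statistics.fmean(
        [(r - m) ** 2 for r, m in zip(responses, sequence_means)]
    )


def shuffled_baseline(responses, sequence_means, n_shuffles, rng):
    """Average MSD obtained when responses are randomly re-paired with trials."""
    shuffled = list(responses)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += mean_square_deviation(shuffled, sequence_means)
    return total / n_shuffles
```

A participant is sensitive to the trial-specific averages when `mean_square_deviation(responses, means)` is clearly smaller than `shuffled_baseline(responses, means, 1000, random.Random(0))` and `pearson(responses, means)` is reliably above zero.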
To test whether participants systematically under- or overestimate the means, we computed each participant's average signed deviation from the actual means. An unbiased observer should exhibit no such deviation. We found a small, yet significant, negative deviation [−0.79; t(35) = −2.37; p = 0.023, as compared to 0], suggesting that participants underestimate the sequences' average. Since in Exp. 1 we used sequences with a skewed distribution of numerical values (see Method), we were also able to distinguish between an estimation that is based on the arithmetic average and one based on the median of the sequence, despite the fact that no trial-by-trial feedback, based on the actual averages, was delivered (see Method). We compared the square of the deviations of each participant's evaluations from the sequence's mean and from its median, and found that the former was smaller than the latter [Mean = 59.58; Median = 98.4; p < 0.01 for all 18 participants]. We show in the Suppl. that participants' estimations are based on both digits, rather than only on the decimal digit of each number.
Taken together, these results suggest that participants are sensitive to the statistical average of multiple rapidly presented two-digit numbers.
Set-Size Effects. We quantify participants' accuracy by taking the square-root of the mean square deviations (hereafter RMSD) of the subjective estimations from the sequences' averages:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu_i\right)^2}$$

where, for each trial i, x_i is the subjective estimation, µ_i is the sequence's arithmetic mean, and N is the number of trials (note that higher values of RMSD imply lower accuracy). We found that accuracy changed as a function of set-size [repeated measure ANOVA with the within subject factor of set-size; F(2, 70) = 6.22; p = 0.003], indicating a non-monotonic dependency (see Fig. 3). This non-monotonic dependency is inconsistent with the predictions of either the intuitive or the analytical system operating alone, but is consistent with a dual-process account of numerical averaging (see Computational Model Section; and black line in Fig. 3, right panel). According to this account, participants rely on analytic/sequential thinking in set-size 4 and on intuitive/holistic estimations in set-sizes 8 and 16.
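For readers who wish to compute the accuracy measure on their own data, the RMSD defined in this paragraph amounts to the following. The function and argument names are illustrative and are not taken from the authors' analysis code.

```python
import math


def rmsd(estimates, sequence_means):
    """Root of the mean squared deviation of estimates from true sequence means.

    estimates[i] is the subjective estimate on trial i and sequence_means[i]
    is the arithmetic mean of the numbers shown on that trial.
    """
    if not estimates or len(estimates) != len(sequence_means):
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((x - mu) ** 2 for x, mu in zip(estimates, sequence_means))
    return math.sqrt(squared / len(estimates))
```

Higher return values mean lower accuracy, so a per-participant call for each set-size (4, 8, 16) yields the three points of the accuracy curve analysed in the ANOVA.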
Importantly, this non-monotonic effect is not the result of averaging two monotonic linear effects across participants, as the pattern is seen at the individual participant level (most participants, ~70%, exhibit this non-monotonic function of accuracy/set-size). Additional support for the hypothesis that participants employed an analytic strategy in the 4-number condition is obtained from the observation that the proportion of trials in which participants' evaluations were perfect (deviation = 0) was significantly higher in the 4-number condition, relative to the 16-number condition [Perfect_4 = 0.15; Perfect_16 = 0.11; t(35) = −2.37; p = 0.023; error rates were arcsine transformed prior to statistical analysis].

Response-times.
As the non-monotonic set-size-accuracy relation implies that participants rely on different mechanisms when evaluating the average of 4 and 16 numbers, we further hypothesized that participants' response times (RTs) would also differ between the set-size conditions, as the intuitive system is more rapid than the analytic system 22,23. Indeed, we found that RTs decrease monotonically [repeated measure ANOVA with the within subject factor of set-size; F(2, 70) = 32.86; p < 0.0001]. Moreover, we found that participants who were slower to answer the set-size 4 problems, as quantified by their ratio of mean-RT in the 4 and the 16 conditions, were more accurate in the 4-condition [Pearson correlation between RT_4/RT_16 and RMSD = −0.54; p < 0.001; see Figure S1 in Suppl.; no significant correlations were found in the 8- and 16-number conditions]. This suggests that the participants who "took more time" in the 4-number condition did so in order to perform more operations, resulting in better accuracy in that condition.
Temporal Bias. If indeed participants rely on analytic calculations of the sequences' average in the 4-number condition, we should observe a biased (non-flat) temporal weighting profile of the presented numbers. This is due to the fact that analytic (explicit) processes rely on content available in working memory. This suggests that the recency observed in the 4-number condition is most likely to stem from trials in which only a small subset of (mostly) recent items were maintained in WM, distorting the evaluation, while the smaller recency observed in the 16-number condition reflects the general temporal decay profile of the neurons' ensemble activity.
Furthermore, we found that the analytic/intuitive RT ratio (RT-4/RT-16) correlates negatively with the individual recency bias [Pearson correlation = −0.33; p = 0.049; no significant correlations were found in the 8- and 16-number conditions]. This indicates that participants who were slower in the 4-number condition based their estimation on most of the values shown, and thus suffered less from recency in that condition. Thus, it is likely that the RT ratio described above reflects individual differences in the number of items held in working memory.
Taken together, these results demonstrate that participants are able to spontaneously tap into the appropriate system in the two extreme conditions: when the information-load is roughly within the limits of working memory capacity (i.e., the 4-number condition), participants rely mostly on analytic operations to calculate the numerical average, as evident from slow RTs, a positive RT-accuracy correlation and a detrimental recency bias. Conversely, when the amount of information clearly overflows the analytic capacities (i.e., the 16-number condition), participants mostly rely on intuitive processes, as evident from their fast RTs, RT-accuracy invariability and relatively unbiased temporal weighting that does not impair accuracy.
In Exp. 1 and 2 we set no exogenous limitations on the participants' processing time, an advantage which afforded the employment of analytic thinking in set-size 4. In experiments 3 (N = 18) and 4 (N = 18; see Method section), we tested whether limiting the processing time, either by presenting the numbers at an extremely rapid rate (Exp. 3; 100 ms per numeral) or by setting a stringent RT limitation (Exp. 4; 2.5 sec), would shift the mechanism to the intuitive mode 39 at all set-sizes. This should lead to a monotonically increasing accuracy as a function of set-size. In addition, we wanted to probe the ability to make average estimations of even more rapid numerical sequences (Exp. 3).

Response-times.
As the monotonic set-size-accuracy relation implies that participants rely on a single intuitive component when response-time is limited, we further hypothesized that participants' RTs would differ little between the set-size conditions. While a small speedup with set-size was found (indicating the lack of a speed-accuracy tradeoff), the RT differences were much smaller compared with Exp. 1 and 2, especially in the set-size 4 condition (see cyan and red lines in Fig. 4, left panel). Also, unlike in Exp. 1 and 2, within subject RT-RMSD correlations were null for all set-size conditions, and there was no interaction between speed and accuracy in the different set-size conditions [ANOVA of the RT*RMSD interaction between set-sizes; Exp

The Intuitive Population Coding component. For the intuitive numerical averaging process, we adapted a population vector model 32. Each number (10-90) defines a distinct Gaussian distribution (SD/width w) over the neural network. Upon the presentation of a number, each unit/neuron responds probabilistically, by triggering a number of spikes that is sampled from a Poisson distribution with a mean, λ, determined by the corresponding numerical tuning-curve (see Fig. 2B). Each successive number presented triggers an additional, accumulated probabilistic neural activation. At the display's offset, a unit sums each neuron's firing-rate multiplied by its preferred number and divided by the sum of the overall network's activity (see eq. 1). The product is the neuron representing the activation weighted average. Finally, this neuron's preferred number (Gaussian's tuning curve peak) is perceived as the sequence's average.

calculations made on items available in working memory or on heuristic approximations of them (for example, a 'rough arithmetic' heuristic, in which each number is rounded. This introduces an additional noise parameter per operation. The model simulation is carried out with a zero value of this parameter, but the results are very similar with small heuristic-based noise). We assume that on each trial the working memory capacity is determined by sampling a value from a Gaussian distribution with a mean of 4 (34,35) and SD as a free parameter. The sampled value is rounded to a positive integer to represent a discrete item (see Fig. 2, left lower panel).
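The analytic component described here can be sketched as follows. The Gaussian capacity with a mean of 4, the rounding to a positive integer and the zero heuristic noise come from the text; the choice to retain the most recent items (motivated by the recency bias reported in the Results), the clipping rule `max(1, ...)` and all names are assumptions made for this illustration.

```python
import random

MEAN_CAPACITY = 4  # mean working-memory capacity, as stated in the text


def sample_capacity(capacity_sd, rng):
    """Per-trial capacity: a Gaussian sample rounded to a positive integer."""
    # ASSUMPTION: values that round below 1 are clipped to 1.
    return max(1, round(rng.gauss(MEAN_CAPACITY, capacity_sd)))


def analytic_estimate(sequence, capacity_sd, rng):
    """Exact mean of the items held in working memory on this trial."""
    k = min(sample_capacity(capacity_sd, rng), len(sequence))
    held = sequence[-k:]  # ASSUMPTION: the k most recent items are retained
    return sum(held) / k  # zero heuristic noise, as in the reported simulation
```

With `capacity_sd = 0` the sketch returns the exact mean for any 4-item sequence, while for 8 or 16 items it averages only the last four, which is why its deviation from the true mean grows with set-size.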
We fitted the two-component model to the data observed in Exp. 1-4 (see Suppl. for fitting procedure and results) and found that the model is able to account for the non-monotonic relationship between accuracy (RMSD) and set-size observed in Exp. 1-2 (see Fig. 3, black line), as well as for the monotonic improvement in accuracy with set-size in Exp. 3-4 (see Fig. 7). The model accounts for the non-monotonic accuracy with set-size in Exp. 1-2 as a result of its changing strategy: analytic for set-size 4 and intuitive for set-sizes 8 and 16. On the other hand, in Exp. 3-4, which involve either an extremely rapid presentation rate or a strict response deadline, the model relies solely on the intuitive component, which predicts a monotonic improvement with set-size.
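One way to see how an exclusive mixture of the two components can produce a non-monotonic curve is the identity below. It is offered as a sketch rather than as the fitting procedure (which is described in the Suppl.). The symbol p_n, introduced here for illustration, denotes the probability that the analytic strategy is used on a trial of set-size n; because exactly one strategy is used per trial, the expected squared deviations of the two components combine as a weighted average.

```latex
\mathrm{RMSD}_{\mathrm{dual}}(n)
  = \sqrt{\,p_n\,\mathrm{RMSD}_{\mathrm{analytic}}(n)^2
        + (1 - p_n)\,\mathrm{RMSD}_{\mathrm{intuitive}}(n)^2\,}
```

With p_4 close to 1 and p_8, p_16 close to 0, the curve follows the analytic component at set-size 4 and the intuitive component at set-sizes 8 and 16, in line with the strategy allocation described in this paragraph.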

Discussion
We found that human participants have a remarkable ability to estimate the average of rapid sequences of two-digit numbers, at presentation rates that range from 2 to 10 items/sec. Importantly, at moderate presentation rates (2/sec), the relation between set-size and accuracy is not monotonic: accuracy decreases from set-size 4 to set-size 8, yet increases from set-size 8 to set-size 16 (Fig. 3). This pattern is predicted by a dual-, but not a single-component account of numerical averaging, which is based on the distinction drawn between approximate/intuitive and exact/analytic numerical cognition 19. Under this scheme intuitive averaging is the result of perceptual-like population-based averaging, while analytic averaging relies on serial symbolic-based operations or heuristics, mediated by working memory. As we showed in our simulation study (Fig. 2 and Computational Model Section), the two components predict opposite effects of set-size on accuracy, allowing the dual system to account for the observed non-monotonic pattern. In particular, analytic processes become less accurate with set-size (due to the limited WM capacity, they work with a lower fraction of the total values that need to be averaged), while the intuitive system, modeled as a population-coding of analog quantity/numerosity, gains precision with increasing set-size, as uncorrelated noise at the single-item level averages out.
According to our dual component model, participants are able to adaptively select the strategy (analytical vs. intuitive) that fits the task contingencies and demands (set-size, presentation-rate or response deadline). In particular, they carry out analytical calculation with a small set-size and at a moderate presentation rate (2/sec), but they spontaneously switch to intuitive computations at a high set-size or high presentation rate (10/sec). Additional support for this interpretation is given by the estimation response time (RT). First, RTs in the 4-number condition were much slower than in the 16-number condition. Second, only in the 4-number condition was there a significant positive correlation between RT and accuracy (more operations undertaken improve the estimation). As a consequence of this adaptive strategy, participants are able to enhance overall performance, as compared to reliance on either intuition or analytic thinking alone. Third, we found that, when experimental limitations were set either on presentation time (Exp. 3) or on response time (Exp. 4), participants exhibited a monotonic improvement in accuracy as set-size increased (and little set-size difference in RTs; see Fig. 4 cyan line). This result is predicted by a single intuitive component and is in agreement with studies showing that the extraction of statistical properties, such as instance-frequency or the average size/diameter of circles, is more efficient (i.e., faster and more accurate) in larger set-sizes [40][41][42].
These results provide critical support in favor of the influential distinction drawn between approximate and exact numerical cognition 19, and we propose an explicit computational account, motivated by neurophysiological data. This computational model extends the processes of numerical cognition to the averaging of sequences of numbers, an operation which is crucial for the formation of preferences and decision-making [1][2][3][4]. The model provides further behavioral predictions regarding the expected RMSD at additional set-sizes: for example, it predicts that RMSD will be smaller (accuracy higher) for 2 samples as compared to 4, and that RMSD will be higher (accuracy lower) for 12 samples as compared to 16. At the neural level, the model predicts that in intuitive averaging tasks (e.g., Exp. 3-4), when presented with two extreme numerical values (e.g., 10 and 90), the most active neural representation in the parietal cortex will appear during the encoding of the sequence around the average (50) rather than at the specific values. Future studies are also needed to examine whether, as our model predicts, activation accumulates during numerical averaging, in the face of known adaptation effects taking place when participants view adjacent numbers passively 43. Future versions of the model may include a sub-additive summation of activation to account for adaptation effects.
Recent research provides complementary results supporting the distinction between intuitive/approximate and analytical/exact strategies of numerical averaging. By contrasting average-estimations under explicit instructions to rely either on intuition or on computation, it was found that while the computational strategy is more accurate than the intuitive one at low set-size, the situation reverses at high set-size (Rusou, Usher & Zakay, under review). The present results are consistent with this, but further suggest that reliance on each of these mechanisms is flexible and depends upon task contingencies such as the number of samples and the amount of available processing time. The extent to which these spontaneous transitions extend to other domains, such as decision making and perception, remains to be investigated. Our paradigm may also facilitate the study of dyscalculia, which has been shown to involve impairments in both analytic and approximate numerical processing (e.g., 7). For example, it may help establish whether dyscalculia patients suffer from an inability to spontaneously adapt to task-contingencies and to rely on the appropriate mechanism under different conditions.

Materials and Methods
Participants. Overall, 72 participants took part in the four experiments (N1 = 18 (Mean age = 23.8; SD = 2.5); N2 = 18 (Mean age = 24; SD = 2.2); N3 = 18 (Mean age = 22.8; SD = 1.9); N4 = 18 (Mean age = 23.1; SD = 1.5); different participants in each experiment). All participants were undergraduate students recruited through the Tel Aviv University Psychology Department's participant pool, were naive to the purpose of the experiment and had normal, or corrected-to-normal, vision. Informed consent was obtained from all subjects. Participants were awarded either course credit for their participation or a small financial compensation (40 NIS; equivalent to about $10). Participants received a performance-dependent bonus of an additional 10-20 NIS. All procedures and experimental protocols were approved by the ethics committee of the Psychology department of Tel Aviv University (Application 743/12). All experiments were carried out in accordance with the approved guidelines.
Stimulus Materials and Procedure. The basic set-up of a trial is depicted in Fig. 1. In Exp. 1 and 2, each trial began with a central fixation cross (300 ms) after which a sequence of two-digit numbers was presented (white Arabic numerals on black background; each number displayed for 500 ms; without blank ISIs). The sequence set-size (i.e., the quantity of displayed numbers) was 4, 8 or 16, varied randomly between trials. The only instructions participants received were to convey as accurately as possible the sequence's average, by vertically sliding a mouse-controlled bar set on a number ruler between 0 and 100 (the number corresponding to the bar's location was concurrently displayed) and pressing the left mouse button when reaching the desired number. In Exp. 1 and 2, we explained to the participants that their only objective was to be as accurate as possible and offered payoff for accuracy. After completing 20 practice trials, participants underwent 120 experimental trials divided into 6 blocks. Each block terminated with performance-feedback (block-average correlation) and a short, self-paced break. To generate each sequence of numbers in Exp. 1, 3 and 4 we predefined four triangular skewed-density distributions, ranging between 10 and 90, with means of 40, 46, 54, or 60. Each sequence was sampled from one of the four distributions (random between trials). In case two identical numbers were sampled successively, the entire sequence was shuffled in order to prevent successive presentation. In Exp. 2, we used normal (Gaussian) underlying distributions to generate each sequence of numbers (means of the distributions were randomly sampled between 35 and 65; SD of distributions was 30). All stimuli were generated using Matlab© and were presented on a gamma-corrected ViewSonic (Walnut, CA) 17-in. monitor viewed at a distance of 41 cm. The screen resolution was set to 1,024 × 768 pixels, and the monitor had a refresh rate of 60 Hz.
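The sequence-generation procedure for Exp. 1, 3 and 4 can be sketched as below. The range (10-90), the four distribution means and the reshuffling of sequences with successive identical numbers come from the text. The use of a continuous triangular distribution whose mode is derived from the mean, the rounding to integers, the resampling fallback and all names are assumptions for this example; the original stimuli were generated in Matlab and the exact shape of the distributions is not specified here.

```python
import random

LOW, HIGH = 10, 90
DISTRIBUTION_MEANS = (40, 46, 54, 60)


def has_adjacent_repeat(seq):
    """True if any two successive numbers in the sequence are identical."""
    return any(a == b for a, b in zip(seq, seq[1:]))


def generate_sequence(set_size, rng):
    """Sample one skewed sequence with no identical successive numbers."""
    mean = rng.choice(DISTRIBUTION_MEANS)
    # ASSUMPTION: a triangular density has mean (low + high + mode) / 3,
    # so the mode that yields the requested mean is 3 * mean - (low + high).
    mode = 3 * mean - (LOW + HIGH)
    while True:
        seq = [round(rng.triangular(LOW, HIGH, mode)) for _ in range(set_size)]
        for _ in range(100):  # reshuffle until no successive duplicates remain
            if not has_adjacent_repeat(seq):
                return seq
            rng.shuffle(seq)
        # ASSUMPTION: if shuffling cannot separate duplicates, draw anew.
```

For example, `generate_sequence(8, random.Random(0))` returns a list of eight integers between 10 and 90 in which no number immediately repeats.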
Data and Statistical Analysis. We obtained participants' evaluation and response time (RT; measured from the sequence's offset until the mouse button press) in each trial. All regression weights used in the different analyses or depicted in figures are unstandardized beta coefficients. We discarded data from one participant in Exp. 3 for being at chance performance on both chance-level measures (i.e., correlation and shuffled responses); no other data were discarded in any experiment.