## Introduction

Humans can generally count or estimate the number of objects in a scene quite easily, yet the perceptual mechanisms and the cognitive strategies underlying this ability are still little understood. Numerical judgments are extremely fast and virtually errorless up to four items, while they become slower or more approximate for larger numerosities1,2,3. This behavior suggests the existence of two independent systems for perception of very small and larger numerosities, the subitizing and the Approximate Number System (ANS) respectively4.

Interestingly, counting speed of larger numerosities also increases considerably if stimuli are grouped into smaller clusters5,6, a phenomenon that has been termed groupitizing7. Counting is particularly fast when the number of clusters and the number of items included in each cluster is very low (e.g. 8 = 4 + 4), falling within the subitizing range7. Two recent studies have generalized the groupitizing effect to non-spatial grouping cues, different numerosity tasks and formats. Ciccione and Dehaene8 showed a groupitizing advantage only when items were divided into clusters of the same number of items, irrespective whether the items were grouped spatially or by color alone. Anobile et al.9 went on to show that groupitizing can also boost sensory precision measured with an approximate numerosity estimation task, both for spatial arrays and temporal sequences. Starkey and McCandliss7 noticed that school-age children with higher arithmetical abilities took most advantage of groupitizing cues, while there was no groupitizing effect in preschoolers, suggesting that the ability to groupitize may reflect the use of arithmetical strategies (e.g. divide-and-sum).

A reasonable conclusion from these studies is that groupitizing arises from two independent factors: the ability to subitize small groups parsed from the larger set, and the ability to combine the group estimates through mental calculation. The first aspect implies that participants may recruit the subitizing system to estimate numerosities higher than the normal 4-item limit. This strategy would require considerable cross-talk between subitizing and the ANS, usually considered to be independent systems. However, there is some evidence for interconnection between the systems. Under dual task conditions, sensory thresholds for estimating numerosities in the subitizing range become comparable to those measured in the estimation range, suggesting that the estimation system works even within the subitizing range, but that performance for low numbers is normally augmented by the automatic deployment of visuo-spatial attentional resources10,11,12. The heavy reliance of subitizing on attention may therefore constitute a characteristic feature of this system and explain its higher precision. Thus, measuring performance under conditions of deprived attention may serve as a diagnostic test of whether groupitizing is based on the subitizing system.

Number estimation is not always veridical. The clearest example comes from numberline studies, which require participants to map number onto space. Under many conditions, including deprived attention, the mapping shows a strong compressive non-linearity13. While this has been described as reflecting a native logarithmic system of encoding number13, several recent studies explain the non-linearity as an example of “central tendency” or “regression to the mean”, a principle observed in almost all perceptual systems14. Regression to the mean is well described within the Bayesian framework, where the mean can be considered a Bayesian prior13,15,16. An important prediction from this approach is that the magnitude of the compressive non-linearity should vary with the precision of the numerosity judgments: the worse the precision (higher Weber fractions), the greater the non-linearity should be. If groupitizing is rooted in the subitizing system, which needs attention to boost precision10, we expect less regression to the mean for grouped than for ungrouped stimuli, and we expect this advantage to disappear under attentional deprivation.

In the current study we tested whether the grouping-induced improvements in precision and accuracy of number estimation are based on extending the subitizing system to larger numerosities. To this aim we measured precision and accuracy of numerosity estimation for grouped and ungrouped arrays while modulating attentional resources with dual tasks. If the groupitizing phenomenon is rooted in the subitizing system, attentional deprivation should affect precision more for grouped than ungrouped stimuli. We further explored whether groupitizing may rely on arithmetical computation, with a preliminary study correlating simple calculation skills with precision for estimating grouped or ungrouped numerosities.

## Methods

### Power analyses

Sample size was calculated with a power analysis using G*Power software (version 3.1). As the main goal of the current experiment was to detect a change in numerosity thresholds under attentional load, the analysis aimed to calculate the sample size required to reliably detect a difference between two dependent means: average Weber fractions in single and dual task conditions (two-tailed paired t-test). The effect size was estimated from Burr et al.10. With ⍺ = 0.05 and a power of 0.95, the analysis suggested a required sample size of 6.

### Participants

Twelve young adults (mean age = 26.1, standard deviation = 2.9, range = 22–32) participated in this study. Participants were all psychology students with no mathematical or other learning disorders and no unusually practiced calculation skills, and all had normal or corrected-to-normal vision.

### Materials and procedure

Stimuli were generated and presented with PsychToolbox17 routines for Matlab (ver. R2016b, 9.1.0.441655, The Mathworks, Inc., https://it.mathworks.com). Subjects sat 57 cm from a 19″ monitor (60 Hz), in a quiet and dimly lit room. One experimenter (P.A.M.M.) performed the tests throughout the study. The experimental procedures were approved by the local ethics committee (Comitato Etico Pediatrico Regionale, Azienda Ospedaliero-Universitaria Meyer, Florence). The research was performed in accordance with the Declaration of Helsinki, and informed consent was obtained from all participants prior to the experiment.

Participants each performed five sessions. In four sessions they estimated the numerosity of ungrouped or grouped arrays, in either single- or dual-task conditions; in the fifth they performed a mental calculation task. The conditions were tested separately, with the order counterbalanced across subjects. No feedback was provided, and participants were not informed about the numerosity range or about the different spatial structures of the arrays (ungrouped or grouped). They were free to choose any strategy to solve the task, and the possibility of performing mental calculation with the grouped stimuli was never mentioned.

### Numerosity stimuli and experimental paradigm

Stimuli were the same as those used by Anobile et al.9. The arrays were sets of white squares (0.4° × 0.4°) with black borders (in order to balance overall luminance) constrained within a square area of 6° × 6°. The only difference from Anobile et al.9 was that in each trial, one item was randomly selected and replaced with a different shape, either a diamond, a triangle or a circle (with a total area equal to that covered by the squares).

In the ungrouped conditions, the position of each item was randomly selected from 106 possible positions within the stimulus area, the centers of equally spread sectors within the 6° × 6° area (each grid cell 0.5° × 0.5°). For the spatially grouped condition, items were arranged within a maximum of 4 groups (Fig. 1). Each group (spanning a maximum area of 1° × 1.5°) was located in one quadrant centered at 3° from the central fixation point. Each group was randomly assigned to one quadrant (between 1 and 4), then the individual item positions were randomly selected out of the 12 possible locations in the selected quadrant. Within each quadrant, the maximum center-to-center distance between elements was 2° and the minimum was 0.5°.

Each trial started with a black central fixation point that turned white after 1 s and remained on screen for the entire experiment. After another 1 s an array of items was centrally displayed for 200 ms, followed by a blank screen. In the single tasks (performed separately with ungrouped and grouped stimuli), participants were asked to verbally estimate the numerosity of the array, disregarding the shape of the individual items. The response was entered by the experimenter on the numeric keypad, who also initiated the following trial. Participants were asked to respond quickly, but to concentrate on accuracy. In the dual-tasks (again, performed separately with ungrouped and grouped stimuli) participants were asked first to identify the oddly shaped item by pressing the appropriate arrow key (diamond: left arrow; triangle: down arrow; circle: right arrow), then to verbally estimate the numerosity of the array. The experimenter (blind to the stimuli) hit the spacebar as soon as the response was spelled out, then inserted the number on a numeric pad.

We tested all numerosities from 5 to 17. In the grouped conditions, each numerosity was organized into 2–4 clusters, each comprising a variable number of items (between 1 and 6), resulting in the following configurations: (2, 2, 1); (3, 3); (3, 3, 1); (2, 2, 2, 2); (4, 4); (3, 3, 3); (3, 3, 3, 1); (3, 3, 3, 2); (3, 3, 3, 3); (4, 4, 4); (5, 5, 3); (4, 4, 3, 3); (4, 4, 4, 3); (4, 4, 4, 4); (5, 5, 6); (5, 4, 4, 4). All configurations except three (13 = 5, 5, 3; 16 = 5, 5, 6; 17 = 5, 4, 4, 4) contained only clusters of 1 to 4 elements.
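The configuration list can be written out explicitly and checked against the stated constraints. The following is only an illustrative sketch: the names `CONFIGURATIONS` and `check_configurations` are assumptions and do not come from the original stimulus code.

```python
# Cluster configurations for the grouped condition, keyed by total numerosity.
# Values are transcribed from the list in the text; the names are illustrative.
CONFIGURATIONS = {
    5: [(2, 2, 1)],
    6: [(3, 3)],
    7: [(3, 3, 1)],
    8: [(2, 2, 2, 2), (4, 4)],
    9: [(3, 3, 3)],
    10: [(3, 3, 3, 1)],
    11: [(3, 3, 3, 2)],
    12: [(3, 3, 3, 3), (4, 4, 4)],
    13: [(5, 5, 3)],
    14: [(4, 4, 3, 3)],
    15: [(4, 4, 4, 3)],
    16: [(4, 4, 4, 4), (5, 5, 6)],
    17: [(5, 4, 4, 4)],
}


def check_configurations(configs):
    """Return the configurations that contain a cluster larger than 4 items."""
    beyond_subitizing = []
    for numerosity, patterns in configs.items():
        for pattern in patterns:
            assert sum(pattern) == numerosity  # clusters add up to the total
            assert 2 <= len(pattern) <= 4      # 2 to 4 clusters per array
            if max(pattern) > 4:
                beyond_subitizing.append(pattern)
    return beyond_subitizing


print(check_configurations(CONFIGURATIONS))
# [(5, 5, 3), (5, 5, 6), (5, 4, 4, 4)]
```

The three configurations returned are the ones the text singles out as containing clusters above the subitizing range.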

On every trial, numerosities and configuration patterns (e.g. 3, 3, 3, 1 or 3, 1, 3, 3) were randomly selected. Each participant completed 150 trials for each condition, with each numerosity presented about 12 times on average, for a total of 600 trials for the entire experiment. Trials with response times higher than 3 standard deviations were considered outliers and eliminated from the analysis (0.8% of the trials).
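A response-time exclusion rule of this kind might be sketched as follows. This is a hedged illustration: it assumes the cutoff is the mean plus 3 standard deviations of the response times passed in, and the function name `exclude_slow_trials` is hypothetical. The text does not specify whether the cutoff was computed per participant or per condition.

```python
from statistics import mean, stdev


def exclude_slow_trials(response_times, n_sd=3.0):
    """Drop trials whose response time exceeds mean + n_sd * SD.

    Assumption: the cutoff is computed over the list passed in, using the
    sample standard deviation.
    """
    cutoff = mean(response_times) + n_sd * stdev(response_times)
    return [rt for rt in response_times if rt <= cutoff]


# Hypothetical response times in seconds: 29 ordinary trials and one slow one.
rts = [1.0] * 29 + [10.0]
kept = exclude_slow_trials(rts)
print(len(rts) - len(kept), "trial(s) excluded")
```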

### Mental calculation test

Mental calculation proficiency was measured by a custom-made computerized test. Each trial started with a central fixation cross. As soon as the participant pressed the space bar, the stimuli (1° × 1.5° digits and a 1° × 1° operator, Arial font) were displayed. Each trial required the participant to mentally solve an arithmetic operation. Each participant solved 37 operations in total. Each operation was randomly selected trial-by-trial from: 3 + 3, 4 + 2, 2 + 5, 3 + 4, 4 + 4, 5 + 3, 3 + 6, 4 + 5, 2 × 3, 2 × 4, 2 × 5, 2 × 6, 2 × 7, 2 × 8, 2 × 9, 3 × 3, 3 × 4, 3 × 5, 3 × 6, 4 × 4, 4 × 5, 4 × 6, 6 − 3, 6 − 4, 7 − 3, 7 − 5, 8 − 3, 8 − 4, 9 − 4, 9 − 6, 2 + 1 + 2, 3 + 1 + 3, 3 + 3 + 3, 3 + 4 + 4, 5 + 3 + 5, 5 + 6 + 5, 6 + 5 + 6. Participants mentally calculated the result as fast as possible and responded verbally (no explicit time limit was provided). The experimenter (blind to the stimuli) hit the spacebar as soon as the participant spelled out the result (which recorded the response time), then entered the response on the numeric keypad. Trials with response times higher than 3 standard deviations were considered outliers and eliminated from the analysis (1.3% of trials).

### Data analysis

Data were separately analyzed for each subject. For the numerosity estimation task we calculated the average perceived numerosity (accuracy) and the response standard deviation (precision), separately for each numerosity and condition. Standard deviations were divided by the corresponding perceived numerosity, resulting in the Weber fraction (Wf), a dimensionless index of precision18. The Weber fractions calculated for each separate numerosity were also averaged across numerosity levels, in order to obtain a summary precision index.
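The per-numerosity computation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the data layout (a list of `(physical_numerosity, estimate)` pairs) and the function names are assumptions, and the sample standard deviation is used because the text does not say which estimator was applied.

```python
from collections import defaultdict
from statistics import mean, stdev


def weber_fractions(trials):
    """Map each physical numerosity to (mean estimate, SD, Weber fraction).

    `trials` holds (physical_numerosity, estimate) pairs for one participant
    in one condition (a hypothetical data layout).
    """
    by_numerosity = defaultdict(list)
    for numerosity, estimate in trials:
        by_numerosity[numerosity].append(estimate)
    summary = {}
    for numerosity, estimates in sorted(by_numerosity.items()):
        perceived = mean(estimates)  # accuracy: average perceived numerosity
        sd = stdev(estimates)        # precision: response standard deviation
        summary[numerosity] = (perceived, sd, sd / perceived)
    return summary


def average_weber_fraction(summary):
    """Summary precision index: Weber fraction averaged over numerosities."""
    return mean(wf for _, _, wf in summary.values())
```

Each numerosity needs at least two responses for the standard deviation to be defined.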

The magnitude of the attentional cost induced by grouped spatial structure was measured as the normalized difference between average Weber fractions calculated in the single (ST) and dual (DT) tasks, averaged across numerosity levels:

$$Attentional\;cost= \frac{{Wf}_{DT}- {Wf}_{ST}}{{Wf}_{DT}+{Wf}_{ST}}$$
(1)

where $${Wf}_{DT}$$ and $${Wf}_{ST}$$ are average Weber fractions for the dual and single tasks.

The threshold improvement induced by grouping in the single task was measured as the normalized difference between average Weber fractions calculated in the ungrouped (NG) and grouped (G) conditions, averaged across numerosity levels:

$$Groupitizing \;advantage= \frac{{Wf}_{NG}- {Wf}_{G}}{{Wf}_{NG}+{Wf}_{G}}$$
(2)

where $${Wf}_{NG}$$ and $${Wf}_{G}$$ are the average Weber fraction for the ungrouped and the grouped conditions in the single task.
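Equations 1 and 2 share the same normalized-difference form, which a short sketch makes explicit. The function names and the Weber fraction values in the example are hypothetical and chosen only for illustration.

```python
def normalized_difference(wf_a, wf_b):
    """(a - b) / (a + b): bounded between -1 and 1 for positive inputs."""
    return (wf_a - wf_b) / (wf_a + wf_b)


def attentional_cost(wf_dual, wf_single):
    """Eq. 1: positive when the dual task raises the Weber fraction."""
    return normalized_difference(wf_dual, wf_single)


def groupitizing_advantage(wf_ungrouped, wf_grouped):
    """Eq. 2: positive when grouping lowers the Weber fraction."""
    return normalized_difference(wf_ungrouped, wf_grouped)


# Hypothetical Weber fractions, not values from the study.
print(round(attentional_cost(wf_dual=0.15, wf_single=0.10), 2))
print(round(groupitizing_advantage(wf_ungrouped=0.15, wf_grouped=0.10), 2))
```

Normalizing by the sum makes both indices independent of the overall level of precision, so they can be compared across participants with different baseline thresholds.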

Weber fractions were analyzed with Repeated Measures ANOVA and Bonferroni corrected post-hoc t-tests. Effect sizes (η2 and Cohen’s d) are also reported when appropriate. The relation between attentional cost, total numerosity and number of groups was analyzed with zero-order (Spearman) and partial correlations. Log10 Bayes factors (LogBF) are reported alongside standard Rho (ρs) and p-values. Positive Log10 Bayes factors should be interpreted as lending substantial (0.5–1), strong (1–1.5), very strong (1.5–2) and decisive (> 2) support to the alternative hypothesis. Negative LogBF within these ranges is evidence for the null hypothesis.
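The interpretation scheme for Log10 Bayes factors can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, the handling of values exactly at a boundary is an assumption, and the "inconclusive" label for magnitudes below 0.5 is also an assumption, since the text gives no label for that range.

```python
def interpret_log_bf(log_bf):
    """Label a log10 Bayes factor using the cut-offs stated in the text."""
    strength = abs(log_bf)
    if strength > 2:
        label = "decisive"
    elif strength >= 1.5:
        label = "very strong"
    elif strength >= 1:
        label = "strong"
    elif strength >= 0.5:
        label = "substantial"
    else:
        return "inconclusive"  # assumption: no label is given below 0.5
    hypothesis = "alternative" if log_bf > 0 else "null"
    return f"{label} support for the {hypothesis} hypothesis"


print(interpret_log_bf(0.8))
# substantial support for the alternative hypothesis
```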

To evaluate non-linear compression of mean estimates of numerosity we fitted the data with power functions:

$$y=a{N}^{b}$$
(3)

where y is the average estimate of numerosity, N is the physical numerosity, and a and b are constants free to vary. The value of the exponent b is an index of non-linearity, with b = 1 implying a linear relationship, and b < 1 a compressive non-linearity (b = 0.5 implies a square-root relationship).
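One simple way to estimate a and b in Eq. 3 is a least-squares line fit on log-transformed data, since log y = log a + b log N. The sketch below uses that approach as a stand-in: the text does not state which fitting routine was used, and a direct non-linear fit would weight the data points differently. The function name is hypothetical.

```python
import math


def fit_power_function(numerosities, estimates):
    """Fit y = a * N**b by least squares on log-transformed data.

    Note: this log-log regression is a stand-in; the fitting routine used
    for Eq. 3 is not specified in the text.
    """
    log_n = [math.log(n) for n in numerosities]
    log_y = [math.log(y) for y in estimates]
    mean_x = sum(log_n) / len(log_n)
    mean_y = sum(log_y) / len(log_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_y))
    sxx = sum((x - mean_x) ** 2 for x in log_n)
    b = sxy / sxx                          # exponent: slope in log-log space
    a = math.exp(mean_y - b * mean_x)      # scale: exponentiated intercept
    return a, b


# Synthetic, noise-free data generated with a = 1.5 and b = 0.8.
ns = list(range(5, 18))
ys = [1.5 * n ** 0.8 for n in ns]
a, b = fit_power_function(ns, ys)
print(round(a, 3), round(b, 3))
```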

The Bayesian central tendency model assumed that the perceived numerosity y was given as a weighted average of the physical numerosity and the mean of the range:

$$y=N\left(1-{w}_{p}\right)+{w}_{p}\bar{N}$$
(4)

where $${w}_{p}$$ is the weight assigned to the prior, which for an optimal observer is proportional to the relative reliabilities (inverse variances) of the two sources of information. Under the simplifying assumption of Weber’s Law, this becomes:

$${w}_{p}= \frac{{\left({Wf}_{i}\cdot N\right)}^{2}}{{\left({Wf}_{i}\cdot N\right)}^{2}+{\sigma }_{P}^{2}}$$
(5)

where $${Wf}_{i}$$ is the Weber fraction for condition i, and $${\sigma }_{P}^{2}$$ is the variance of the prior, estimated to best fit all four conditions simultaneously.
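Equations 4 and 5 can be combined into a short forward model. The sketch below is illustrative only: the function names are hypothetical, and the prior width and Weber fractions in the example are made-up values rather than the fitted ones.

```python
def prior_weight(numerosity, weber_fraction, sigma_prior):
    """Eq. 5: weight of the prior under Weber's law (sensory SD = Wf * N)."""
    sensory_var = (weber_fraction * numerosity) ** 2
    return sensory_var / (sensory_var + sigma_prior ** 2)


def predicted_estimate(numerosity, weber_fraction, sigma_prior, prior_mean):
    """Eq. 4: weighted average of the stimulus and the mean of the range."""
    w_p = prior_weight(numerosity, weber_fraction, sigma_prior)
    return numerosity * (1 - w_p) + w_p * prior_mean


# Hypothetical parameter values, for illustration only.
PRIOR_MEAN = 11.0   # mean of the tested range, 5 to 17
SIGMA_PRIOR = 4.0   # assumed width of the prior
for wf in (0.10, 0.20):
    row = [round(predicted_estimate(n, wf, SIGMA_PRIOR, PRIOR_MEAN), 2)
           for n in (5, 11, 17)]
    print(wf, row)
```

Because the sensory variance grows with N, the prior pulls estimates at the top of the range more strongly than at the bottom, and a larger Weber fraction increases the pull everywhere except at the mean.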

For the mental calculation task, two separate z scores were calculated for each participant (using the mean and the standard deviation of the entire group), one for accuracy, the other for response speed. We then averaged the two z scores to yield a combined math performance index, following the procedure previously used by Anobile et al.18. Participants were categorized as belonging to the “low” or “high” math sample if the combined z-score for mental calculation was below or above the 50th percentile. To evaluate the relation between numerosity estimation and calculation skills we performed standard Pearson correlations, with correction for multiple comparisons.
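The combined math index and the median split might be computed along these lines. This is a hedged sketch with hypothetical function names. It assumes that response times are sign-flipped so that faster responses yield higher speed scores, a step the text does not spell out, and it uses the sample standard deviation of the group.

```python
from statistics import mean, median, stdev


def z_scores(values):
    """Standardize values against the group mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def math_index(accuracies, response_times):
    """Average of accuracy and speed z-scores, one value per participant.

    Assumption: response times are negated so that faster is better.
    """
    z_acc = z_scores(accuracies)
    z_speed = z_scores([-rt for rt in response_times])
    return [(a + s) / 2 for a, s in zip(z_acc, z_speed)]


def median_split(index):
    """Label each participant "low" or "high" relative to the group median."""
    cut = median(index)
    return ["high" if value > cut else "low" for value in index]


# Hypothetical data for four participants.
index = math_index([0.80, 0.90, 1.00, 0.95], [1.6, 1.3, 1.0, 1.2])
print(median_split(index))
# ['low', 'low', 'high', 'high']
```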

Statistical analyses were performed using JASP (version 0.12.2, The JASP Team 2020, https://jasp-stats.org) and Matlab (R2016b).

## Results

### Effect of grouping and attention on numerosity estimation thresholds

We used a dual-task paradigm to measure the effect of attentional deprivation on precision and accuracy of numerosity estimation for ungrouped and grouped spatial arrays. Participants estimated numerosity either during a concurrent visual search task (spotting the odd-shaped item), or with the visual distractor present but ignored (single-task). Figure 2a shows that when the distractor was ignored, leaving attentional resources for the numerosity task, there was a strong groupitizing advantage, about 20% on average. Depriving attention affected grouped but not ungrouped stimuli, annulling the groupitizing advantage. For ungrouped stimuli the small effect of attentional deprivation was similar at all numerosities (Fig. 2b), while for grouped stimuli it was clearly strongest at lower numerosities (Fig. 2c).

These effects were borne out by a three-way repeated measures ANOVA, with spatial structure (ungrouped or grouped), attentional load (single or dual task) and numerosity (13 levels) as factors. There were significant main effects for spatial structure (F(1,11) = 5.8, p = 0.034, η2 = 0.013, d = 0.23) and for attentional load (F(1,11) = 11.2, p = 0.006, η2 = 0.046, d = 0.44). Crucially, the interaction shown in Fig. 2a between attentional load and spatial structure was significant (F(1,11) = 5.4, p = 0.04, η2 = 0.011, d = 0.21). Post-hoc tests showed that with full attention, Weber fractions for grouped arrays were significantly lower than those for ungrouped arrays (t = 3.35, pbonf = 0.017, squares in Fig. 2a), while in the dual-task they were statistically indistinguishable (t = 0.11, pbonf = 1). Modulating attention did not alter Weber fractions for ungrouped arrays (t = 1.37, pbonf = 1), while for grouped arrays Weber fractions in the dual-task were higher than those in the single-task (t = 4.082, pbonf = 0.004). There was also a significant interaction between numerosity and attentional load, with the effect of attentional load being stronger at low numerosities (F(12,132) = 3.14, p < 0.0001, η2 = 0.04, d = 0.41). The triple interaction was not significant (F(12,132) = 0.9, p = 0.58, η2 = 0.012, d = 0.22). Yet, if groupitizing is based on a capacity-limited, subitizing-like system, depriving attention should most strongly impact the lowest grouped numerosities. Indeed, although the triple interaction did not reach significance, attention seems to affect estimation thresholds more for low numerosities, and only for grouped stimuli. Planned comparison t-tests confirmed that attentional deprivation did not significantly affect estimation thresholds of ungrouped stimuli for any of the numerosities tested (all p > 0.05, Fig. 2b). On the other hand, when the stimuli were spatially grouped, attention most strongly modulated estimation thresholds for the lowest numerosities (N5: t = 5.149, pbonf = 0.0007; N6: t = 3.913, pbonf = 0.158; N7: t = 4.48, pbonf = 0.015; pbonf > 0.05 for all the other numerosities, Fig. 2c, see also Fig. 3b).

To avoid a systematic association between total numerosity and number of groups, numerosities in the grouped condition were presented with different configurations, varying between 2 and 4 clusters. For example, the number eight was shown either with the (2, 2, 2, 2) or with the (4, 4) configuration. We tested whether the attentional modulation of thresholds was particularly marked for certain configurations, and whether it depended primarily on the number of groups, on the total numerosity, or on both. We correlated the attentional cost (defined as the normalized difference between Weber fractions in the single and dual conditions: Eq. 1) with the number of groups and total numerosity (Fig. 3). As larger numerosities were generally divided into more groups than lower numerosities (positive correlation between total numerosity and number of subgroups: ρs = 0.51, p = 0.02, LogBF = 0.8), we also calculated partial correlations, evaluating the variance independently explained by each of these factors (total numerosity or number of groups). Attentional cost negatively correlated with both the number of groups and total numerosity (both p < 0.001, LogBF > 1.7), suggesting that the detrimental effect of attentional deprivation was higher when both the number of groups and the total numerosity were lower, and tended to decrease for larger numerosities. The correlation between the attentional cost and total numerosity remained significant even when taking into account the effect of the number of groups (ρs = −0.53, p = 0.017, LogBF = 0.90). Similarly, the correlation between attentional cost and number of groups remained significant when controlling for the total numerosity (ρs = −0.62, p = 0.006, LogBF = 0.99). These results indicate that the effect of attentional deprivation depends on both the total numerosity and the number of groups: its negative impact on estimation thresholds was strongest for the lowest numerosities and for stimuli divided into fewer groups.
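The partial rank correlations reported here can be illustrated with the standard first-order partial correlation formula applied to Spearman coefficients. The sketch below is not the JASP implementation used for the reported statistics; the function names and the example data are hypothetical, and no p-values or Bayes factors are computed.

```python
import math


def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation between x and y with z partialled out."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: x, y and the covariate z to control for.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
z = [1, 3, 2, 5, 4]
print(round(spearman(x, y), 2), round(partial_spearman(x, y, z), 2))
```

In the analysis described above, x would be the attentional cost, y the total numerosity and z the number of groups (or the reverse for the second partial correlation).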

### Effect of spatial structure and attention on accuracy of estimating numerosity

Under many conditions, including deprived attention, the mapping of number onto space shows a strong compressive non-linearity13, considered by many to be an example of regression to the mean. If groupitizing is rooted in the subitizing system, which requires attention to boost numerical estimation precision, the effects of grouping and attentional deprivation should also be evident in estimation accuracy.

Figure 4a–d shows the average estimates of numerosity for the four conditions. In general, low numerosities were overestimated and high numerosities underestimated, both consistent with regression to the mean. However, as is usually observed, the regression to the mean was greater at high numerosities (where precision is lower), resulting in a strong compressive non-linearity. To measure the non-linearity created by these biases, we fitted each set of data with a power function (Eq. 3, methods), shown by the blue lines. The fits were all very good (total R2 over all conditions = 0.986).

Importantly, as predicted, the non-linearity was not the same in all four conditions, but was strongest for conditions with the highest Weber fractions. Figure 4e plots the exponent of the power function against average Weber fraction. The non-linearity clearly increases with Weber fractions, with the exponent falling from 0.89 for the grouped single task condition (an exponent of 1 means a linear function), to 0.80 for the ungrouped single task condition, to 0.72 for the two dual task conditions. Where performance is most precise, it is also most accurate. The correlation between the two measures was r = −0.983, p = 0.008, LogBF = 0.84.

To test the quantitative predictive power of the Bayesian model of central tendency, we fitted the data with the Bayesian prediction, given by Eq. (4) of methods. The equation essentially states that perceived numerosity will be a weighted average of the actual physical numerosity of the stimulus and the mean numerosity of the range tested (the prior). Relative weighting of the two is determined by their precision: the more precise the estimates, the higher their weighting (Eq. 5). This has two consequences. First, assuming constant Weber fractions implies that thresholds increase linearly with numerosity, so the regression effects will be more pronounced at higher than at lower numerosities, leading to the compressive non-linearity. Second, as the Weber fractions increase between conditions, the prior (which we assume to remain constant between conditions) will have a greater effect, resulting in the greater non-linearities that we observe (Fig. 4e).

The fits are shown by the red curves of Fig. 4a–d. The four fits share a single degree of freedom: the width of the prior ($${\sigma }_{P}$$ of Eq. 5) was held constant across all four conditions and selected to simultaneously minimize the residuals of all four fits. The resulting fits were excellent, with total R2 = 0.988 (compared with 0.986 for the power fits). Thus, the Bayesian central tendency model explains the data well, both qualitatively and quantitatively.

### Relation with arithmetical abilities

Despite the relatively small number of participants in this study (primarily designed to examine in detail the effects of attention on groupitizing), we also looked for possible correlations between groupitizing and math skills. Participants did a simple speeded calculation test described in methods, which was scored for both speed and accuracy. The average accuracy across participants was 90% ± 7%, and average speed was 1.3 ± 0.3 s. We combined z-scores of speed and accuracy (see methods) and correlated this index against Weber fractions for ungrouped and grouped stimuli.

For ungrouped stimuli, Weber fractions were uncorrelated with the math index (r = −0.18, p = 0.288, LogBF = −0.24; Fig. 5a), but for grouped stimuli the correlation was significant, and remained close to significance after correcting for multiple comparisons (α = 0.05/2: r = −0.56, p = 0.029, LogBF = 0.54; Fig. 5b). We also found that participants with higher arithmetical skills gained more from grouping of stimuli than less skilled participants (r = 0.58, p = 0.023, LogBF = 0.61; Fig. 5c). While these results should be taken with caution pending replication in future studies, they suggest the very interesting possibility that groupitizing could be a sensitive predictor of math skills.

## Discussion

The aim of the present study was to directly test whether the groupitizing phenomenon7 depends on subitizing, by measuring the consequences of depriving attentional resources on numerosity estimation thresholds of spatially grouped and ungrouped items. As previous studies9 have shown, numerosity thresholds for spatially grouped stimuli were lower than for randomly scattered stimuli. However, depriving attention with a concomitant dual task completely obliterated the groupitizing advantage, consistent with the suggestion that it relies on subitizing. We also explored the link between groupitizing and arithmetic, and showed that simple mental calculation skills in adult participants correlated with estimation thresholds for grouped but not ungrouped stimuli, and also with the advantage given by grouping.

Although subitizing was originally thought to be pre-attentive, dependence on attention has become a signature of the subitizing system. Many studies have shown that attention has a much stronger detrimental effect in the subitizing than estimation range, enough to equate subitizing precision and reaction times to those of higher numerosities during dual tasks10,12,19,20,21. The selective detrimental effect of attentional deprivation in the subitizing range was reinforced by a recent clinical single case study with a simultanagnosic patient (PA)22, who suffered a severe visual attentional deficit. PA showed no subitizing advantage for low numerosities, while his numerosity perception was relatively spared for intermediate numerosities, above the subitizing range. The subitizing advantage, at least in the visual domain, could thus emerge from the well-known capacity-limited attentive tracking system, that allows precise tagging of a few objects in space23. Other studies show that depriving auditory and haptic attentional resources also affects visual subitizing19. Future studies should investigate the effect of cross-modal attention deprivation on groupitizing.

The current study showed that performing a dual task completely eliminates the groupitizing advantage for estimation thresholds, in the same way that it eliminates the subitizing advantage for low numbers: estimation thresholds for grouped arrays in the dual task became like those measured with ungrouped arrays in the single task. Depriving attention during estimation of ungrouped arrays, on the other hand, did not affect estimation thresholds. Given that the numerosities tested were the same across the grouped and ungrouped conditions (in both cases well exceeding the subitizing range), the only factor driving the attentional modulation was the spatial configuration. We presume that ungrouped arrays were judged primarily by the estimation system, largely independently of attention, whereas grouped arrays trigger the additional intervention of the subitizing system, which boosts performance. However, as subitizing requires attentional resources, during the dual task only the estimation system could operate, bringing performance for grouped arrays down to that of ungrouped stimuli. In the grouped condition, the detrimental effect of the dual task scaled both with total numerosity and with the number of groups, with a stronger cost for low numerosities and fewer groups. The higher cost of attentional deprivation for low numerosities and fewer groups suggests that groupitizing acts on both these factors. With larger total numerosities and/or more groups, the attention-free estimation system is likely to kick in, even if items are spatially segregated, resulting in a weaker attentional modulation of estimation thresholds.

We also found that estimation biases differed across attentional and grouping conditions. All estimates departed from linearity and tended toward the center of the numerosity range, with the effect increasing when attention was deprived. The observed compressive non-linearity was well fitted by a Bayesian model of central tendency14,15,24,25. This effect has been described for a wide range of stimuli26,27,28,29,30,31, and is thought to maximize perceptual efficiency by exploiting contextual effects. An important prediction of the Bayesian model is that the magnitude of the non-linearity should depend on perceptual thresholds. This prediction was borne out, with a strong and significant correlation between magnitude of non-linearity and Weber fractions. Moreover, the Weber fractions predicted the form of the non-linearity well, with only one degree of freedom (strength of the prior, unchanged between conditions).