Rotation is visualisation, 3D is 2D: using a novel measure to investigate the genetics of spatial ability

Spatial abilities, defined broadly as the capacity to manipulate mental representations of objects and the relations between them, have been studied widely, but with little agreement reached concerning their nature or structure. Two major putative spatial abilities are “mental rotation” (rotating mental models) and “visualisation” (complex manipulations, such as identifying objects from incomplete information), but inconsistent findings have been presented regarding their relationship to one another. Similarly inconsistent findings have been reported for the relationship between two- and three-dimensional stimuli. Behavioural genetic methods offer a largely untapped means to investigate such relationships. A total of 1,265 twin pairs from the Twins Early Development Study completed the novel “Bricks” test battery, designed to tap these abilities in isolation. The results suggest substantial genetic influence unique to spatial ability as a whole, but indicate that dissociations between the more specific constructs (rotation and visualisation, in 2D and 3D) disappear when tested under identical conditions: they are highly correlated phenotypically, perfectly correlated genetically (indicating that the same genetic influences underpin performance), and are related similarly to other abilities. This has important implications for the structure of spatial ability, suggesting that the proliferation of apparent sub-domains may sometimes reflect idiosyncratic tasks rather than meaningful dissociations.


Figures
Trivariate Cholesky decomposition path estimates: g, Rotation, Visualisation.

Tables
Table S1. Descriptive statistics.
Table S2. Internal consistency and test-retest reliability of Bricks measures.
Table S3. Subtest intercorrelations.
Table S4. Subtest factor analysis.
Table S5. Bricks correlations with other measures.
Table S6. Subtest intercorrelations, regressed on verbal ability.
Table S7. Subtest intercorrelations, regressed on non-verbal ability.
Table S8. Subtest intercorrelations, regressed on g.
Table S9. Functional composite intercorrelations, regressed on verbal ability.
Table S10. Functional composite intercorrelations, regressed on non-verbal ability.
Table S11. Functional composite intercorrelations, regressed on g.
Table S12. Dimensional composite correlation, regressed on other measures.
Table S13. Subtest factor analysis, regressed on other measures.
Table S14. Twin correlations and approximated variance components.
Table S17. Proportions of Bricks subtest correlations due to common genetic influences.
Table S18. Proportions of Bricks subtest correlations due to common non-shared environmental influences.
Table S19. Proportions of correlations with other measures due to common genetic influences.

Rationale
As discussed in the main text, the literature on spatial abilities is inconsistent regarding the relationship between mental rotation and spatial visualisation, and between 2D and 3D stimuli. If rotation and visualisation were dissociable processes, it was reasoned that traditional 2D and 3D mental rotation stimuli may engage them differently. With 3D mental rotation stimuli, target objects commonly rotate freely in three dimensions, such that key identifiable features are out of view or disguised by foreshortening. However, with 2D stimuli, in which the object rotates only in the picture plane (i.e., as though rotating the whole image itself, rather than the object), full information about the object is always available, and there is no need to visualise missing or disguised features.
The Bricks battery was therefore developed to isolate rotation and visualisation cleanly, and to include stimuli depicting 2D objects with concealed features (as a closer match to common 3D stimuli), and 3D objects which do not obscure features (as with common 2D stimuli). In this way, the putative rotation and visualisation processes could be assessed both separately and together, equally in 2D and in 3D.

Design
Six subtests were conceived. Each consists of a series of items with a stimulus image containing a "target" object, and four multiple-choice response images, only one of which (the correct response) depicts the same object as the target, following a suitable transformation. Participants completed the subtests in the following order:
i) 2D Rotation: the most "natural" form of 2D rotation, in which the target (a two-dimensional object) is rotated only in the picture plane, and the target stimulus and correct response contain exactly the same information.
ii) 2D Rotation / Visualisation combined: to add the element of incomplete information commonly found in 3D stimuli, the target object is partially obscured behind an "occluder" (a square or circle quadrant partially obscuring the target). In the correct response, the target has rotated (in the picture plane) but the occluder is immobile.
iii) 2D Visualisation: the target remains entirely motionless and unchanged, but the occluder is in a different location in the correct response, thereby revealing a different portion of the target.
iv) 3D Rotation / Visualisation combined: the most "natural" form of 3D rotation, in which the target (a three-dimensional object, computer-generated and rendered with simple overhead "lighting") has been rotated freely in three dimensions in the correct response.
v) 3D Rotation: corresponding to 2D rotation but with an image of an apparently three-dimensional object; in the correct response, the target is rotated only in the picture plane (i.e., as though the whole image had rotated, or the "camera" showing the scene had rotated on the spot). As with 2D rotation, the target stimulus and correct response therefore contain invariant information, with even the lighting and shadows remaining unchanged.
vi) 3D Visualisation: to assess visualisation without rotation, the target stimulus depicts a wireframe drawing of an object, and the correct response shows the "solid" version, otherwise unchanged. The participant must therefore use the available information to determine how the solid will appear (e.g., which features are in view from the current perspective and which are obscured by others).
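The crossing of dimension (2D, 3D) with function (Rotation, Visualisation, combined) across the six subtests can be sketched as a small lookup table. The Python below is an illustrative sketch only: the dictionary keys and the `composites` helper are hypothetical names, and averaging subtest scores into composites is an assumption made for illustration (the actual composite procedure is described in the main text).

```python
from statistics import mean

# Each subtest is labelled by (dimension, function), following the design above.
SUBTESTS = {
    "2D Rotation": ("2D", "Rotation"),
    "2D Rotation / Visualisation": ("2D", "Rotation / Visualisation"),
    "2D Visualisation": ("2D", "Visualisation"),
    "3D Rotation / Visualisation": ("3D", "Rotation / Visualisation"),
    "3D Rotation": ("3D", "Rotation"),
    "3D Visualisation": ("3D", "Visualisation"),
}


def composites(scores):
    """Group subtest scores by dimension and by function.

    Assumption: composites are simple means of the relevant subtest scores.
    """
    by_dimension, by_function = {}, {}
    for name, score in scores.items():
        dimension, function = SUBTESTS[name]
        by_dimension.setdefault(dimension, []).append(score)
        by_function.setdefault(function, []).append(score)
    return {
        "dimensional": {k: mean(v) for k, v in by_dimension.items()},
        "functional": {k: mean(v) for k, v in by_function.items()},
        "overall": mean(list(scores.values())),
    }


# Hypothetical subtest scores for one participant (not study data).
example_scores = {
    "2D Rotation": 6,
    "2D Rotation / Visualisation": 4,
    "2D Visualisation": 5,
    "3D Rotation / Visualisation": 3,
    "3D Rotation": 7,
    "3D Visualisation": 8,
}
result = composites(example_scores)
```

Each subtest contributes to exactly one dimensional and one functional grouping, which is what allows rotation and visualisation to be compared equally in 2D and in 3D.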

Development
A JavaScript web application, "Building Bricks", was developed to enable appropriate stimuli to be created for each subtest. This allows the creation of images of "bricks" (rectangular blocks, either 2D or 3D) of variable size, including one or more "studs" (protrusions of arbitrary length emerging from the main body of the brick, from the "top", "bottom" or both). 2D bricks may be rotated in the picture plane, 3D bricks in any direction, and the camera may be rotated to simulate picture-plane rotation for 3D objects. Occluders (squares or circle quadrants) of arbitrary size may be added to any corner of the image. Various other options such as colours or camera distances may be altered as required, and bricks may be presented in wireframe or solid form.
This software is freely available online under the open-source MIT license, and researchers are welcome to experiment with it to see how the constructs were operationalised, or to create their own items. It is accessible via this page: https://www.forepsyte.com/resources/public
For each subtest, 12 items of varying difficulty were created and administered, but with a view to reducing this to 9 items post hoc before the calculation of scores. This allowed the final selection to be approximately equated for difficulty between subtests, and for 'experimental' items (e.g., those with potentially counterintuitive responses) to be included in the initial battery before being discarded on the basis of their psychometric properties. Examples of stimulus images and the corresponding correct responses are shown in Fig. 1.
Participants completed the Bricks battery online, via a website created for the purpose using the open-source "psy.js" JavaScript library, which was developed specifically for the administration of psychometric measures such as questionnaires and cognitive tests. This library is also freely available at the link above.

Procedure
For each subtest, participants read appropriate instructions, completed two simple practice items (which provided feedback and clarification of the subtest rules), and then completed the test items in a fixed sequence of approximately increasing difficulty (selected based on pilot work). A time limit of 20 seconds was allowed for each item; the time remaining was displayed to participants via a timer at the top right of the screen. If participants made four consecutive incorrect responses, they were discontinued from the current subtest and began the next. Including the time spent reading instructions and reviewing practice items, the battery typically took 20-25 minutes to complete.
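The discontinuation rule can be sketched as follows. This is a minimal Python illustration, not the code used by the test website: the function names are hypothetical, and treating any non-correct outcome (including a timeout) as counting towards the run of four is an assumption, since the text specifies only "incorrect responses".

```python
from typing import List, Optional

DISCONTINUE_AFTER = 4  # consecutive incorrect responses, as stated in the Procedure


def apply_discontinuation(responses: List[bool]) -> List[Optional[int]]:
    """Return 1/0 for each administered item and None for items never reached.

    `responses` holds whether the participant would answer each item correctly,
    in the fixed presentation order.
    """
    outcomes: List[Optional[int]] = []
    consecutive_errors = 0
    discontinued = False
    for is_correct in responses:
        if discontinued:
            outcomes.append(None)  # item skipped due to discontinuation
            continue
        outcomes.append(1 if is_correct else 0)
        consecutive_errors = 0 if is_correct else consecutive_errors + 1
        if consecutive_errors >= DISCONTINUE_AFTER:
            discontinued = True
    return outcomes


def subtest_score(outcomes: List[Optional[int]]) -> int:
    """Sum item scores, counting skipped items as 0."""
    return sum(o for o in outcomes if o is not None)


# Hypothetical 12-item response pattern: four errors in a row end the subtest.
pattern = [True, True, False, False, False, False] + [True] * 6
outcomes = apply_discontinuation(pattern)
```

The sketch sums over all items presented; the post hoc reduction from 12 to 9 items per subtest is not modelled here.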

Data cleaning and scoring
After the participant exclusions described in the main text (e.g., excluding those with relevant severe disabilities), and prior to the data preparation procedures described (outlier removal, etc.), additional exclusions were made on the basis of suspected random or thoughtless responding. Conservative cut-offs were used to identify participants with very low variability in their responses (3 SD below the mean, indicating that they had clicked on the same response option repeatedly for most or all items) or with mean reaction times of less than one second per item. Participants falling below these cut-offs were excluded from analysis.
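A minimal sketch of these two exclusion rules is given below. The function name and data layout are hypothetical, and measuring "variability" as the standard deviation of each participant's chosen option positions is an assumption, since the exact variability index is not specified here.

```python
from statistics import mean, pstdev, stdev


def flag_exclusions(option_choices, reaction_times, sd_cutoff=3.0, min_mean_rt=1.0):
    """Return the set of participant IDs flagged for exclusion.

    option_choices: dict of participant ID -> list of chosen option positions (1-4)
    reaction_times: dict of participant ID -> list of reaction times in seconds
    Assumption: response variability is the SD of the chosen option positions.
    """
    variability = {pid: pstdev(choices) for pid, choices in option_choices.items()}
    values = list(variability.values())
    # Cut-off: `sd_cutoff` standard deviations below the sample mean variability.
    threshold = mean(values) - sd_cutoff * stdev(values)
    flagged = set()
    for pid in option_choices:
        too_uniform = variability[pid] < threshold
        too_fast = mean(reaction_times[pid]) < min_mean_rt
        if too_uniform or too_fast:
            flagged.add(pid)
    return flagged


# Hypothetical toy data (not study data).
choices = {f"p{i}": [1, 2, 3, 4] for i in range(5)}
choices["same"] = [2, 2, 2, 2]   # always clicks the same option
choices["fast"] = [1, 2, 3, 4]
rts = {pid: [5.0, 6.0, 4.0, 7.0] for pid in choices}
rts["fast"] = [0.4, 0.6, 0.5, 0.5]  # mean below one second
```

In a sample this small a 3 SD cut-off cannot flag the uniform responder, so the toy data only trigger the variability rule when `sd_cutoff` is lowered; in a sample of the size analysed here the stated cut-off is meaningful.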
For each item, a score of 1 was awarded for a correct response, or 0 for incorrect responses, no response or the item being skipped due to discontinuation. Scores from the nine items in the final battery were summed to yield subtest scores. These individual subtest scores were then cleaned and combined into "functional", "dimensional" and "overall Bricks" composites, as described in the main text.
Path estimates (standardised) for the structure of additive genetic influences on g, Rotation and Visualisation (see Table S31 for more details). The paths in red indicate the genetic influences on Visualisation (the last variable in the model): i) those common to all three variables; ii) those shared only between Rotation and Visualisation but not with g (suggesting influences specific to spatial ability); and iii) those unique to Visualisation alone. The latter (italicised) is non-significant, i.e., all genetic influences on Visualisation are shared with Rotation.
Path estimates (standardised) for the structure of additive genetic influences on verbal ability, non-verbal ability, and the 2D and 3D Bricks composites (see Table S36 for more details). The paths in red indicate the genetic influences on 3D (the last variable in the model): i) those common to all four variables; ii) those shared between non-verbal ability, and the 2D and 3D Bricks composites, but not with verbal ability; iii) those shared only between 2D and 3D but not with verbal or non-verbal ability (suggesting influences specific to spatial ability); and iv) those unique to 3D alone. The latter (italicised) is non-significant, i.e., all genetic influences on 3D are shared with 2D.
Mean scores (standard deviations) for the whole sample, separately by sex, and for MZ and DZ twins, for the six Bricks subtests, the three functional and two dimensional composites, the single overall Bricks mean, and the other cognitive measures. N = sample size (the sample shown is fully independent, selecting one individual randomly per twin pair). ANOVA performed on cleaned, normality-transformed data to test effects of sex and zygosity. Results = F statistic; ** = p < 0.01; * = p < 0.05; R² = proportion of variance explained by sex, zygosity and their interaction.
Consistency (Cronbach's alpha) and test-retest reliability (Pearson's r) for the six Bricks subtests, the three functional and two dimensional composites, and the single overall mean. The consistency sample is fully independent, with one individual selected randomly from each twin pair. Test-retest reliability was assessed with a separate pilot sample.
Table S3. Subtest intercorrelations. Correlations (Pearson's r) between the six subtests, and between each subtest and the overall Bricks mean. The sample is fully independent, with one individual selected randomly from each twin pair. R = Rotation; R / V = Rotation and Visualisation; V = Visualisation. All correlations significant at p < 0.0001.
Correlations (Pearson's r) with other cognitive measures for the three functional and two dimensional Bricks composites, and the single overall mean. Mill Hill and Raven's Matrices correlate r = 0.31 with each other in this sample (N = 1420). All correlations significant at p < 0.0001.

N.B. The Rotation correlations with each other measure are significantly lower than those of Visualisation (all p < 0.01); however, since the 'Rotation / Visualisation combined' correlations do not differ significantly from those of Visualisation (despite the 'Rotation / Visualisation combined' conditions including both elements), this seems most likely to be related to the slightly lower reliability of one of the Rotation subtests (2D rotation) compared to the others, coupled with the highly-powered sample size, rather than representing a theoretically meaningful difference.
Correlations (Pearson's r) between the six subtest residuals after regression on verbal ability (Mill Hill scores), and between each subtest and the overall Bricks mean. R = Rotation; R / V = Rotation and Visualisation; V = Visualisation. All correlations significant at p < 0.0001.
Table S7. Subtest intercorrelations, regressed on non-verbal ability. Correlations (Pearson's r) between the six subtest residuals after regression on non-verbal ability (Raven's Matrices scores), and between each subtest and the overall Bricks mean. R = Rotation; R / V = Rotation and Visualisation; V = Visualisation. All correlations significant at p < 0.0001.
Correlations (Pearson's r) between the six subtest residuals after regression on g (the mean of verbal and non-verbal ability scores), and between each subtest and the overall Bricks mean. R = Rotation; R / V = Rotation and Visualisation; V = Visualisation. All correlations significant at p < 0.0001.
Correlations (Pearson's r) between the three functional Bricks composite residuals after regression on verbal ability (Mill Hill scores). All correlations significant at p < 0.0001.
Correlations (Pearson's r) between the three functional Bricks composite residuals after regression on g (the mean of verbal and non-verbal ability scores). All correlations significant at p < 0.0001.
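The "regressed on" correlations in these tables correlate what is left of each measure after removing the variance it shares with a covariate. A minimal Python sketch of that idea follows, assuming simple ordinary least squares regression on a single covariate; the function names and the toy vectors are hypothetical, not study data or the authors' analysis code.

```python
from math import sqrt
from statistics import mean


def residualise(y, x):
    """Residuals of y after simple least-squares regression on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


def residual_correlation(a, b, covariate):
    """Correlate a and b after regressing each on the covariate."""
    return pearson_r(residualise(a, covariate), residualise(b, covariate))


# Toy example: two scores that correlate only through a shared covariate.
covariate = [-1.0, -1.0, 1.0, 1.0]
score_a = [0.0, -2.0, 2.0, 0.0]
score_b = [0.0, -2.0, 0.0, 2.0]
raw_r = pearson_r(score_a, score_b)
adjusted_r = residual_correlation(score_a, score_b, covariate)
```

In the toy data the raw correlation vanishes once the covariate is removed; correlations that remain significant after regression, as reported in these tables, are therefore not attributable to the covariate alone.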
Correlation between 2D and 3D dimensional Bricks composites, after regression on verbal ability, non-verbal ability or g (their mean). All correlations significant at p < 0.0001.
Factor loadings of Bricks subtests on the first (and only) principal component produced by factor analysis of the six subtest scores, after regression on verbal ability, non-verbal ability or g.
Model-fitting estimates (95% confidence intervals) for additive genetic (A), shared environmental (C) and residual (E; i.e., non-shared environment and error) components of variance, for the six Bricks subtests and for verbal ability (Mill Hill), non-verbal ability (Raven's Matrices) and g (their mean). For Bricks composites, see Table 2. Italicised estimates are non-significant (their confidence intervals include zero).
Bivariate correlated factors solutions of four models: three between the functional Bricks composites, and one between the dimensional composites. Results indicate the phenotypic correlations between the two composites in each model, decomposed into proportions attributable to additive genetic (A), shared environmental (C) or non-shared environmental/error (E) components (with 95% confidence intervals). The proportions explained reflect the correlation between the traits for that component, weighted by the two univariate component estimates; for example, the proportion of the phenotypic correlation due to A equals the genetic correlation weighted by the product of the square roots of the two univariate heritabilities estimated by the model. Italicised estimates are non-significant (their CIs include zero). Totals may exceed 1.00 due to rounding.
Table S17. Proportions of Bricks subtest correlations due to common genetic influences. N.B. The figures shown are proportions of the total covariance, so the two lower and non-significant estimates in this table reflect the correspondingly higher non-shared environment components (Table S17) for those associations (and the wide CIs), rather than a meaningful distinction from the other correlations.
Table S18. Proportions of Bricks subtest correlations due to common non-shared environmental influences.
Path estimates (standardised and squared, with 95% confidence intervals) for bivariate ACE Cholesky decomposition. The influences on the first entered variable (Rotation) are as in the univariate model for that variable (precise estimates vary between models), but those on the second (Visualisation) are decomposed into influences shared with the first variable, and those unique to the second. Italicised estimates are non-significant (their CIs include zero).
Path estimates (standardised and squared, with 95% confidence intervals) for bivariate ACE Cholesky decomposition. The influences on the first entered variable (Rotation) are as in the univariate model for that variable (precise estimates vary between models), but those on the second (Rotation / Visualisation combined) are decomposed into influences shared with the first variable, and those unique to the second. Italicised estimates are non-significant (their CIs include zero).
Path estimates (standardised and squared, with 95% confidence intervals) for bivariate ACE Cholesky decomposition. The influences on the first entered variable (Visualisation) are as in the univariate model for that variable (precise estimates vary between models), but those on the second (Rotation / Visualisation combined) are decomposed into influences shared with the first variable, and those unique to the second. Italicised estimates are non-significant (their CIs include zero).
Path estimates (standardised and squared, with 95% confidence intervals) for bivariate ACE Cholesky decomposition. The influences on the first entered variable (2D) are as in the univariate model for that variable (precise estimates vary between models), but those on the second (3D) are decomposed into influences shared with the first variable, and those unique to the second. Italicised estimates are non-significant (their CIs include zero).
N.B. Two of these (3D Rotation's correlations with 2D Visualisation and with 3D Rotation/Visualisation) are technically non-significant, with CIs including zero; but given the high point estimates, and since this subtest's genetic correlations with other subtests have generally wider CIs than others, it seems likely that this reflects differences in the reliability of the subtests (or indeed chance differences) rather than a meaningful distinction from the other associations.
N.B. Most subtests have modest non-shared environmental influences in common. Some of these correlations are non-significant, but only barely (their 95% CIs are just below zero) and all the CIs overlap, so this is unlikely to reflect meaningful distinctions.
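The weighting described for the bivariate correlated factors solutions (each component's correlation weighted by the square roots of the two univariate estimates) can be illustrated with a short Python sketch. The function name is hypothetical and the input values are invented for illustration; they are not estimates from this study.

```python
from math import sqrt


def correlation_contributions(a2, c2, e2, r_a, r_c, r_e):
    """Decompose a phenotypic correlation into A, C and E proportions.

    a2, c2, e2: pairs of univariate variance components for the two traits.
    r_a, r_c, r_e: genetic, shared environmental and non-shared environmental
    correlations between the two traits.
    """
    contributions = {
        "A": sqrt(a2[0]) * r_a * sqrt(a2[1]),
        "C": sqrt(c2[0]) * r_c * sqrt(c2[1]),
        "E": sqrt(e2[0]) * r_e * sqrt(e2[1]),
    }
    r_phenotypic = sum(contributions.values())
    if r_phenotypic == 0:
        raise ValueError("phenotypic correlation is zero; proportions undefined")
    proportions = {k: v / r_phenotypic for k, v in contributions.items()}
    return r_phenotypic, proportions


# Invented values: heritabilities of 0.36 and 0.64, no shared environment,
# a genetic correlation of 1.0 and a non-shared environmental correlation of 0.25.
r_p, props = correlation_contributions(
    a2=(0.36, 0.64), c2=(0.0, 0.0), e2=(0.64, 0.36),
    r_a=1.0, r_c=0.0, r_e=0.25,
)
```

With these invented inputs, genetic influences account for most of the phenotypic correlation even though heritability is moderate, because the genetic correlation is far higher than the non-shared environmental correlation. The same logic explains why a proportion can look low where the non-shared environmental component is comparatively high.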

(The corresponding matrix for shared environment correlations is omitted, as there are no significant shared environmental influences on the Bricks measures.)
Comparison of univariate ACE models to fully saturated models. ep = estimated parameters; χ² = -2 log-likelihood; df = degrees of freedom; AIC = Akaike information criterion. The p-values indicate no significant deterioration in fit between the saturated and constrained models (i.e., the ACE models fit well).