Sociality and interaction envelope organize visual action representations

Humans observe a wide range of actions in their surroundings. How is the visual cortex organized to process this diverse input? Using functional neuroimaging, we measured brain responses while participants viewed short videos of everyday actions, then probed the structure in these responses using voxel-wise encoding modeling. Responses are well fit by feature spaces that capture the body parts involved in an action and the action's targets (i.e. whether the action was directed at an object, another person, the actor, or space). Clustering analyses reveal five large-scale networks that summarize the voxel tuning: one related to social aspects of an action, and four related to the scale of the interaction envelope, ranging from fine-scale manipulations directed at objects to large-scale whole-body movements directed at distant locations. We propose that these networks reveal the major representational joints in how actions are processed by visual regions of the brain.


Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided; only common tests should be described solely by name, describe more complex techniques in the Methods section
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters, including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted; give P values as exact values whenever suitable

- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code

Data collection

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Behavioural & social sciences study design
All studies must disclose on these points even when the disclosure is negative. Note that full information on the approval of the study protocol must also be provided in the manuscript.

Magnetic resonance imaging
Experimental design

Design type

Design specifications
The data reported here are quantitative measurements of brain activity using functional MRI, as well as behavioral ratings.
The research sample consisted of 13 members of the Harvard University community (5 male, 21-39 years). This sample is reasonably representative, particularly because we did not expect the visual system's organization to differ in another randomly selected group of participants.
Participants were sampled based on convenience, after screening for a history of claustrophobia, the ability to remain still during past fMRI experiments, and the absence of metal in their bodies. The sample size was selected based on common sample sizes used in the literature to investigate visual cortex representations. In addition, we performed a post-hoc analysis to investigate the reliability of these data across subjects and found that they were highly reliable within our chosen subset of voxels (average split-half correlation distance = 0.21, significantly lower than in scrambled data; t(119) = 229.4, p < 0.001). Therefore, we believe that this sample is reasonably representative of the human visual system and not unduly influenced by outliers.
Imaging data were collected using a 32-channel phased-array head coil on a 3T Siemens Prisma scanner at the Harvard Center for Brain Sciences. In addition, human ratings data were collected using Amazon Mechanical Turk, where responses took the form of clicks on a map of the human body and responses to yes-or-no questions.
Data were collected between July 2016 and December 2016.
No data were excluded from the analysis.
No participants dropped out of the study.
Participants were not allocated into experimental groups.
See above.
Participants were recruited from a registry of past fMRI participants and through word of mouth.
Harvard University Institutional Review Board.
The design of the study was "condition-rich," meaning that responses were collected during 5-second "mini-blocks," which are shorter than standard blocked designs but longer than standard event-related designs.
Each of the 8 imaging runs contained 60 5-second stimulus blocks. In addition, 4 15-second null blocks were interspersed throughout each run, with an additional 4 s at the beginning and 10 s at the end of each run. To assess subjects' alertness throughout the experiment, we recorded their performance on a simple task (detecting the presence of a red square frame around the videos). Specifically, we recorded each button press and then assessed whether subjects detected the majority of the probe items (15 per run); subjects were included as long as they missed fewer than 5 probe items.
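For reference, the stated timing implies the following total run duration, assuming the null blocks and the start/end padding were in addition to the 60 stimulus blocks (this figure is derived here for illustration, not reported above):

$$60 \times 5\,\text{s} + 4 \times 15\,\text{s} + 4\,\text{s} + 10\,\text{s} = 374\,\text{s} \approx 6.2\,\text{min per run.}$$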
BrainVoyager QX (version 2.8.4) was used to preprocess the anatomical and functional data. Functional preprocessing included slice scan-time correction, 3D motion correction, linear trend removal, temporal high-pass filtering (0.008 Hz cutoff), spatial smoothing (4 mm FWHM kernel), and transformation to Talairach coordinates. Whole-brain random-effects group GLMs were fit separately for each video set, as well as for the odd and even runs of each video set. In all cases, the design matrix included a regressor for each condition block, specified as a square-wave regressor spanning each 5-second stimulus presentation and convolved with a two-gamma function that approximated the idealized hemodynamic response. Across these GLMs, the average variance inflation factor across conditions of the design matrix was 1.03 (a value greater than 5 is considered problematic), and the average efficiency was 0.2 (Liu, Frank, Wong, & Buxton, 2001). Voxel time series were normalized within each run using a z-transform and corrected for temporal autocorrelations during GLM fitting. Beta weights extracted from these group-level random-effects GLMs were averaged across subjects for each voxel and taken as the primary measure of interest for all subsequent analyses. Each subject's cortical surface was reconstructed from the high-resolution T1-weighted anatomical scan using FreeSurfer software, and one subject was selected as the display brain for the group data.
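For illustration, a minimal Python sketch of how a square-wave condition regressor convolved with a two-gamma HRF can be constructed. The TR, HRF parameterization, and onset times below are placeholder assumptions, not values reported for this study:

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(tr=2.0, duration=32.0):
    """Canonical two-gamma HRF sampled at the TR (positive peak ~5 s,
    delayed undershoot ~15 s); normalized to unit sum."""
    t = np.arange(0, duration, tr)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return hrf / hrf.sum()

def block_regressor(onsets_s, block_dur_s, n_scans, tr=2.0):
    """Square-wave (boxcar) regressor for one condition, convolved with the HRF."""
    boxcar = np.zeros(n_scans)
    for onset in onsets_s:
        boxcar[int(onset / tr):int((onset + block_dur_s) / tr)] = 1.0
    return np.convolve(boxcar, double_gamma_hrf(tr))[:n_scans]

# One design-matrix column: a condition presented at 14 s and 134 s
# (placeholder onsets; 187 scans would correspond to a 374-s run at a 2-s TR)
x_col = block_regressor(onsets_s=[14, 134], block_dur_s=5, n_scans=187, tr=2.0)
```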
Data were spatially normalized by transformation into Talairach coordinates. In addition, voxel time series from functional runs were normalized within each run using a z-transform and corrected for temporal autocorrelations during GLM fitting.
We did not perform volume censoring.
We used an encoding-model approach (Mitchell et al., 2008; Huth et al., 2012) to model each voxel's response magnitude for each action video as a weighted sum of the elements in the video's feature vector (e.g., individual body parts), using L2-regularized ("ridge") regression. The regularization coefficient (λ) was selected for each voxel to minimize the mean-squared error of the fit in a 10-fold cross-validation procedure. Models were fit separately for the two video sets. To ensure that our models were not over-fit, we estimated their ability to predict out of sample using a leave-one-out cross-validation procedure: the model was trained iteratively on data from 59 of the 60 videos in each voxel, and the predicted response magnitude for the held-out video was calculated as the training model's beta weights multiplied by the held-out video's feature vector. After 60 iterations, the predicted and actual responses for the held-out videos were correlated to produce a single cross-validated r-value (rCV) for each voxel. All models were fit using responses from the group data, separately for video set 1 and video set 2.
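A minimal sketch of this fitting scheme using scikit-learn, with random placeholder data standing in for the real feature matrices and voxel responses (the nested 10-fold penalty selection and the leave-one-out loop mirror the procedure described above; the candidate penalty grid is an assumption):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import LeaveOneOut

# Placeholder data: 60 videos x 20 features, plus one voxel's 60 betas
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))   # feature vectors (e.g., body-part features)
y = rng.normal(size=60)         # voxel response magnitudes

alphas = np.logspace(-3, 3, 13)  # candidate ridge penalties (lambda)
preds = np.empty_like(y)
for train, test in LeaveOneOut().split(X):
    # Inner 10-fold CV over the 59 training videos selects the penalty
    # that minimizes mean-squared error
    model = RidgeCV(alphas=alphas, cv=10,
                    scoring="neg_mean_squared_error").fit(X[train], y[train])
    preds[test] = model.predict(X[test])

# Cross-validated prediction accuracy (rCV) for this voxel
r_cv = np.corrcoef(preds, y)[0, 1]
```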
See above.
Split-half reliability was calculated for each voxel by correlating the betas extracted from odd and even runs of the main task. Reliability across sets was calculated by correlating odd and even betas from GLMs computed over the two video sets. We then used a procedure from Tarhan & Konkle (under review,
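For illustration, a minimal sketch of the per-voxel split-half computation described above, assuming beta matrices of shape (n_conditions, n_voxels); all names are illustrative:

```python
import numpy as np

def split_half_reliability(betas_odd, betas_even):
    """Correlate each voxel's condition betas across odd and even runs.

    betas_odd, betas_even: (n_conditions, n_voxels) arrays of GLM betas.
    Returns one Pearson r per voxel; 1 - r gives the correlation
    distance reported in the sampling-strategy section above.
    """
    n_voxels = betas_odd.shape[1]
    return np.array([
        np.corrcoef(betas_odd[:, v], betas_even[:, v])[0, 1]
        for v in range(n_voxels)
    ])
```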