Audience preferences are predicted by temporal reliability of neural processing

Naturalistic stimuli evoke highly reliable brain activity across viewers. Here we record neural activity from a group of naive individuals while they view popular, previously broadcast television content for which the broad audience response is characterized by social media activity and audience ratings. We find that the level of inter-subject correlation in the evoked encephalographic responses predicts the expressions of interest and preference among thousands. Surprisingly, the ratings of the larger audience are predicted with greater accuracy than those of the individuals from whom the neural data are obtained. An additional functional magnetic resonance imaging study employing a separate sample of subjects shows that the level of neural reliability evoked by these stimuli covaries with the amount of blood-oxygenation-level-dependent (BOLD) activation in higher-order visual and auditory regions. Our findings suggest that the stimuli we judge favourably may be those to which our brains respond in a stereotypical manner shared by our peers.

Figure 1: Effect of temporal window size on the relationship of ISC with viewership. (A) When including the ads, the correlation of ISC with viewership peaks at a window size of 3-4 minutes; when considering programming only, the correlation of ISC with Nielsen ratings increases monotonically with window size over the range 1-6 minutes. (B) Same as (A), but computed after removing the downward trend in the viewership.

Figure 2: Population ratings are strongly correlated with those of the sample. The horizontal axis denotes the mean rating assigned to each advertisement by the N = 12 study participants, with error bars denoting the SEM across subjects. The vertical axis denotes the aggregated ratings of > 7000 online voters. The sample ratings explain 59% of the variance in the population ratings.

Figure 3: (A) The majority of tweets could be unambiguously linked to a single scene (median = 1 scene per tweet). When a tweet was ambiguous, the tweet count of every candidate scene was incremented, resulting in a mean of 6.88 scenes per tweet. (B) The corresponding cumulative density function (CDF).

| Advertisement (brand)              | Ad Meter score | Sample rating | ISC          | Tweets      |
| ---------------------------------- | -------------- | ------------- | ------------ | ----------- |
| …                                  | …              | …             | …            | 14.2 ± 5.9  |
| "Kid assembles team" (Hyundai)     | 6.65           | 5.83 ± 2.29   | 0.14 ± 0.089 | 24.7 ± 23.2 |
| "Love Ballad" (M&M's)              | 6.34           | 6.67 ± 1.92   | 0.14 ± 0.085 | 22.2 ± 21.5 |
| "Whispering in the library" (Oreo) | 5.88           | 6.58 ± 2.57   | 0.13 ± 0.088 | 15.9 ± 7.3  |
| "Jocks love Jared" (Subway)        | 4.86           | 3.83 ± 1.47   | 0.16 ± 0.13  | 25.0 ± 26.1 |

* scaled and offset to match mean and standard deviation of 2013 ratings

Supplementary Note 1
Effect of temporal window size on prediction of Nielsen ratings. To predict viewership from neural reliability, we chose a temporal window size of 3 minutes, as this value seemed to us a reasonable duration over which to average neural activity to arrive at an estimate of behaviour. Figure 1 displays the effect of varying the window size (from 1 to 6 minutes) on the resulting correlation coefficient between predicted and actual viewership. For all window sizes and all conditions (with and without ads, with and without the downward trend in viewership), bootstrapped 95% confidence intervals exclude r = 0. Moreover, for all window sizes and all conditions, the correlation coefficients are statistically significant (p < 0.01). When including ads, a broad peak is attained at a window size of 3-4 minutes, while for programming only the prediction accuracy increases monotonically across the selected range. The largest jump in the correlation coefficient occurs from 1 to 2 minutes, possibly reflecting the fact that viewership at time t is related to neural reliability over the interval (t − τ, t), with τ > 0; hence, matching the temporal window exactly to the window used to measure viewership is suboptimal. Note that for a window size of w minutes, the first w − 1 minutes of viewing lack sufficient data to compute our measure of reliability over a full window. To handle this, we assumed that the ISC at t < 0 was equal to that measured during the first minute of viewing; equivalently, for the first w − 1 minutes of viewing we averaged only over the available data.
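The boundary handling described above can be sketched as follows. This is an illustrative stand-in, not the authors' actual pipeline: it assumes ISC has already been computed minute by minute, and the function name `windowed_isc` and the sample values are ours.

```python
import numpy as np

def windowed_isc(isc_per_minute, w):
    """Trailing-window mean of per-minute ISC values.

    For the first w - 1 minutes, fewer than w samples exist,
    so the mean is taken over the data available so far (the
    boundary handling described in Supplementary Note 1).
    """
    isc = np.asarray(isc_per_minute, dtype=float)
    out = np.empty_like(isc)
    for t in range(len(isc)):
        start = max(0, t - w + 1)        # clip the window at t = 0
        out[t] = isc[start:t + 1].mean()
    return out

# Hypothetical per-minute ISC values, for illustration only.
isc = [0.10, 0.14, 0.12, 0.18, 0.16, 0.11]
smoothed = windowed_isc(isc, w=3)
```

With w = 3, the first output equals the first minute's ISC, the second averages minutes 1-2, and every later output averages a full 3-minute window.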

Supplementary Note 2
Effect of age category on prediction of Nielsen ratings. Nielsen provided viewership for two age categories: 18-49 and 25-54. Our subject pool (ages 19-32) overlapped with both categories, so we summed the viewership across the two categories to form the viewership time series used in the regression. Here we report the results when considering the categories separately. Table 1 lists the correlation coefficients, p-values, and bootstrapped 95% confidence intervals on r for each age category; we also include the results for programming only, as well as those obtained after detrending the viewership as described in the manuscript.

Supplementary Note 3
Facebook-USA Today Ad Meter scores.
For the combined analysis, the ratings of the 2012 commercials were scaled under the assumption that the quality was comparable to that of 2013. The 2012 ratings were scaled and offset such that the entire set of ratings from 2012 (N = 56) matched the ratings from 2013 (N = 54) in mean and standard deviation (5.61 ± 1.07) using the following: 2012 rescaled ratings = 2012 ratings × 1.6306 + 0.2483.
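The rescaling above is a linear moment-matching transform: the scale is the ratio of the target standard deviation to that of the 2012 ratings, and the offset aligns the means. A minimal sketch (the function name `match_moments` and the example ratings are ours; the note does not state whether the population or sample standard deviation was used, so the exact coefficients may differ slightly):

```python
import numpy as np

def match_moments(x, target_mean, target_sd):
    """Linearly rescale x so its mean and SD match the targets."""
    x = np.asarray(x, dtype=float)
    scale = target_sd / np.std(x)            # ratio of standard deviations
    offset = target_mean - scale * np.mean(x)
    return scale * x + offset

# Hypothetical 2012-style ratings; the targets are the 2013
# mean and SD reported above (5.61 and 1.07).
ratings_2012 = np.array([3.1, 2.8, 3.9, 3.3, 3.5])
rescaled = match_moments(ratings_2012, 5.61, 1.07)
```

By construction, `rescaled` has mean 5.61 and standard deviation 1.07 regardless of the input values.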
The drop in prediction performance of population ratings from neural reliability from 2012 to 2013 is driven by the Subway advertisement (2012: r = 0.90, 2013: r = 0.73). After excluding this ad, the correlation for 2013 increases to r = 0.88, p = 0.0018. This ad is distinctive in that it has repeated and jarring scene cuts throughout its 30-second duration, which may be the source of its relatively strong neural reliability (i.e., stronger than expected given the linear relationship with population rating).
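How strongly a single off-trend point can depress a Pearson correlation can be illustrated with synthetic data; the numbers below are made up and do not reproduce the reported r values.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return (xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym))

# Five points near a line plus one off-trend point (analogous
# to an ad with higher-than-expected neural reliability).
isc = np.array([0.10, 0.12, 0.13, 0.15, 0.16, 0.16])
rating = np.array([4.1, 5.0, 5.4, 6.2, 6.6, 3.8])  # last point off-trend

r_all = pearson_r(isc, rating)
r_excluded = pearson_r(isc[:-1], rating[:-1])  # leave out the off-trend point
```

Excluding the last point recovers a near-perfect linear fit, mirroring the jump reported above when the Subway ad is removed.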