Robust and replicable measurement for prepulse inhibition of the acoustic startle response

Measuring animal behavior in the context of experimental manipulation is critical for modeling and understanding neuropsychiatric disease. Prepulse inhibition of the acoustic startle response (PPI) is a behavioral phenomenon studied extensively for this purpose, but the results of PPI studies are often inconsistent. As a result, the utility of this phenomenon remains uncertain. Here, we deconstruct the phenomenon of PPI and confirm several limitations of the methodology traditionally utilized to describe PPI, including that the underlying startle response has a non-Gaussian distribution, and that the traditional PPI metric changes with different stimuli. We then develop a novel model that reveals PPI to be a combination of the previously appreciated scaling of the startle response, as well as a scaling of sound processing. Using our model, we find no evidence for differences in PPI in a rat model of Fragile-X Syndrome (FXS) compared with wild-type controls. These results in the rat provide a reliable methodology that could be used to clarify inconsistent PPI results in mice and humans. In contrast, we find robust differences between wild-type male and female rats. Our model allows us to understand the nature of these differences, and we find that both the startle-scaling and sound-scaling components of PPI are a function of the baseline startle response. Males and females differ specifically in the startle-scaling, but not the sound-scaling, component of PPI. These findings establish a robust experimental and analytical approach that has the potential to provide a consistent biomarker of brain function.

of negative scaling: 1) Some rat-conditions had a small negative sound-scaling to a very weak prepulse condition, such as the 2 dB prepulse sound. This was likely due to overfitting to the noise for conditions with little or no actual PPI, which could explain why the unbounded model was no better than the bounded model after cross-validation. 2) Some rat-conditions had a large negative startle-scaling to a strong prepulse condition, such as the 18 dB prepulse sound. In fact, these strong prepulse conditions had so much PPI that we did not have many data points close to startle saturation, a further indication of the presence of sound-scaling. Without a well-defined saturation point, the model was free to converge on less biologically plausible startle-scaling values to compensate for other parts of the fit. In this case, the model appeared to be using a negative startle-scaling to indirectly increase the slope of the prepulse curve to better fit the rising part of the curve. For these reasons, we chose to keep the more interpretable model, with the startle-scaling and sound-scaling parameters bounded between 0 and 1.

Standard PPIratio assumptions cannot describe the phenomenon of PPI
Here we derive that the PPIratio metric can never decrease as a function of increasing startle sound if PPI is just due to a scaling of the startle response, under the assumption that the acoustic startle response is well captured by any monotonically increasing function.
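One way to see why this holds is sketched below. It is an illustration under stated assumptions, not a reproduction of the original derivation. Suppose the measured baseline response is f(x) + c, where f is a non-negative, differentiable, monotonically increasing function of the startle sound level x, and c ≥ 0 is a hypothetical constant offset (for example, background movement). Suppose also that a prepulse only scales the startle response by a constant factor a, with 0 ≤ a ≤ 1. The symbols f, c, and a are illustrative names that do not appear in the original text.

```latex
\begin{align}
\mathrm{PPI}_{\mathrm{ratio}}(x)
  &= 1 - \frac{a\,f(x) + c}{f(x) + c}
   = \frac{(1 - a)\,f(x)}{f(x) + c}, \\
\frac{d\,\mathrm{PPI}_{\mathrm{ratio}}}{dx}
  &= \frac{(1 - a)\,c\,f'(x)}{\bigl(f(x) + c\bigr)^{2}} \;\ge\; 0
  \quad \text{because } f'(x) \ge 0 .
\end{align}
```

Under these assumptions the ratio is constant (equal to 1 - a) when c = 0 and increases with x otherwise, so a pure scaling of the startle response cannot produce a PPIratio that decreases with startle sound level.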
Protocol for measuring and fitting sound and startle-scaling PPI model.
1) Measure the acoustic startle response at many different stimuli.
   a) Vary the startle sound level across the full range of values over which the startle response changes. For example, we varied the startle sound level between 0 and 60 dB above background, as this was the range over which our rats' startle responses varied from zero to maximum startle.
   b) Vary the prepulse sound level and/or the delay time across the range of values over which PPI changes as a function of that parameter. For example, we varied the prepulse sound between 0 and 18 dB above background and the delay between 50 and 200 ms. To get a measurement of PPI, it is only necessary to pick a single prepulse sound level and delay. We sought to understand a large swath of the phenomenon and therefore utilized many different prepulse sound levels and delays.
   c) For each stimulus (i.e. each combination of prepulse sound, delay time, and startle sound), collect data from at least 50-100 trial repeats for every animal tested.
   d) For each trial, normalize the raw accelerometer data by a baseline accelerometer measure, e.g. by the data from times prior to the presentation of any stimulus (Fig. S2a&b).
   e) Take the log10 of all of the normalized accelerometer data.
   f) For each trial, find the maximum value of the log normalized data in a 100 ms window following the presentation of the startle sound.
2) Compute the average startle at each stimulus.
   a) For each rat, find the mean and standard error of the trial maxima from 1f at each stimulus.
   b) For every mean startle value from 2a, subtract the mean value across all of the control stimuli, i.e. the stimuli with startle sound level 0 dB above background across all prepulse conditions.
   c) We define the resulting values as the startle to a given stimulus for an animal, and we can plot these values as startle response versus startle sound (Fig. 2a).
3) Fit the PPI model to the average startle data for each rat. Python code to implement this step can be found at https://github.com/angevineMiller/ppi_model.
   a) Implement a sigmoid function with startle-scaling and sound-scaling parameters (Materials and Methods Eq. 3&4). This function should accept 5 parameters for each prepulse condition: the saturation, midpoint, and slope of the baseline sigmoid, and the startle-scaling and sound-scaling parameters that scale the baseline sigmoid for a given prepulse condition. Note that the two scaling parameters, but not the baseline sigmoid parameters, change across different prepulse conditions within the same animal.
   b) Implement an objective function that computes the total RMSE across every stimulus between the average startle response data from 2c and the model predictions at those stimuli. This RMSE is computed under a choice of model parameters for every prepulse condition (i.e. the three baseline sigmoid parameters plus the startle-scaling and sound-scaling parameters for all prepulse conditions). In Figure 2a, this can be seen as the total error across all of the data points and their corresponding model curves of the same color.
   c) Use a minimization algorithm (e.g. scipy.optimize) to find the model parameters that minimize the objective function against the average startle data for each individual rat. Initial conditions for the scaling parameters can be set to no scaling. Initial conditions for the baseline sigmoid can be set to anything that you think will optimize the chances of converging on the best fit (we chose the parameters that best fit the sigmoid to the startle values with no prepulse).
4) Evaluate group differences in the model parameters.
   a) Standardize the parameters so that they all range between 0 and 1, and subtract the means.
   b) For each prepulse condition, run a linear classifier such as linear discriminant analysis (LDA).
   c) Compute the mean absolute (unsigned) distance from the linear discriminant hyperplane.
   d) Compute LDA classification accuracy using leave-one-out cross-validation.
   e) Report group separability if the mean absolute distance and the cross-validated classification accuracy are significantly greater than expected from permutation tests on the group labels.
5) Find baseline threshold and saturation for each animal.
   a) The baseline saturation is defined as the saturation (maximum) parameter of the baseline sigmoid for a given animal.
   b) Compute the baseline threshold, defined as the startle sound level at which an animal's baseline startle curve reaches 5% of its baseline saturation.
6) Evaluate group differences in PPI.
   a) For each prepulse condition, fit two linear models per group: one for sound-scaling versus baseline threshold and one for startle-scaling versus baseline saturation, and plot these with 95% confidence intervals (Fig. 5&S6).
   b) For each prepulse condition, check for group differences in the baseline parameter. If there are significant group differences in the baseline parameter, an ANCOVA cannot be computed for that prepulse condition. You can run t-tests for group differences in the scaling parameters, but be aware that these differences could be caused by non-random group differences in the baseline startle.
   c) Assuming no/few group differences in the baseline parameters, compute two ANCOVAs for each prepulse condition: one for startle-scaling as a function of group and baseline saturation, and one for sound-scaling as a function of group and baseline threshold. Include group-by-baseline interaction terms in all ANCOVAs.
   d) If the baseline-by-group interaction terms are significant in any of the ANCOVAs, we cannot use those prepulse conditions because they break the homogeneity of slopes assumption.
   e) Assuming no/few significant interaction terms, recompute all of the ANCOVAs without interaction terms, and look for significant main effects of group.
   f) Control for multiple comparisons, where each of your prepulse conditions is a separate comparison, using a bootstrapped ratio test to determine the probability of seeing a given number of significant prepulse conditions by chance alone. Alternatively, control for multiple comparisons using Bonferroni correction or related methodology.
   g) Report group differences in PPI startle-scaling or sound-scaling if they hold up to the control for multiple comparisons.
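Step 3 of the protocol can be sketched as follows. This is a minimal illustration and not the code from the authors' repository. The functional form of `scaled_sigmoid` is an assumption (Eq. 3&4 are not reproduced here): a startle-scaling of 0 and a sound-scaling of 0 are taken to mean "no scaling", startle-scaling multiplies the response by (1 - startle_scale), and sound-scaling multiplies the effective sound level by (1 - sound_scale). All function names, the rough initial guess, and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def sigmoid(sound_db, s_max, midpoint, slope):
    """Baseline startle curve as a logistic function of startle sound level."""
    return s_max / (1.0 + np.exp(-slope * (sound_db - midpoint)))


def scaled_sigmoid(sound_db, s_max, midpoint, slope, startle_scale, sound_scale):
    """ASSUMED form of the prepulse curve; 0 means no scaling for both parameters."""
    effective_sound = (1.0 - sound_scale) * sound_db
    return (1.0 - startle_scale) * sigmoid(effective_sound, s_max, midpoint, slope)


def total_rmse(params, sound_db, baseline, prepulse_conditions):
    """RMSE across every stimulus for one rat.

    params holds the 3 baseline sigmoid values, followed by one
    (startle_scale, sound_scale) pair per prepulse condition.
    """
    s_max, midpoint, slope = params[:3]
    residuals = [sigmoid(sound_db, s_max, midpoint, slope) - baseline]
    for i, startle in enumerate(prepulse_conditions):
        a, b = params[3 + 2 * i: 5 + 2 * i]
        pred = scaled_sigmoid(sound_db, s_max, midpoint, slope, a, b)
        residuals.append(pred - startle)
    return float(np.sqrt(np.mean(np.concatenate(residuals) ** 2)))


def fit_rat(sound_db, baseline, prepulse_conditions):
    """Fit all parameters for one rat, with scaling bounded between 0 and 1."""
    n = len(prepulse_conditions)
    # Rough baseline guess (hypothetical) and no scaling as the starting point.
    x0 = np.array([baseline.max(), sound_db.mean(), 0.1] + [0.0, 0.0] * n)
    bounds = [(1e-6, None), (sound_db.min(), sound_db.max()), (1e-3, 5.0)]
    bounds += [(0.0, 1.0)] * (2 * n)
    result = minimize(total_rmse, x0,
                      args=(sound_db, baseline, prepulse_conditions),
                      method="L-BFGS-B", bounds=bounds)
    return x0, result


# Synthetic, noiseless example data (invented for illustration only).
sound_db = np.arange(0.0, 61.0, 5.0)
baseline = sigmoid(sound_db, 1.0, 30.0, 0.2)
prepulse = [scaled_sigmoid(sound_db, 1.0, 30.0, 0.2, 0.3, 0.2)]

x0, result = fit_rat(sound_db, baseline, prepulse)
init_rmse = total_rmse(x0, sound_db, baseline, prepulse)
fit_rmse = result.fun
```

Bounding the scaling parameters in the optimizer, rather than clipping them afterwards, mirrors the choice of a bounded model described above. In practice the initial baseline guess would come from a separate fit to the no-prepulse data, as the protocol suggests.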

Results generalize across different background sounds, ages, and manipulations
We separated the experiments with a 70 dB background sound level from those with a 77 dB background sound level and separately analyzed all of our primary findings for the two background sound levels. All of the results were consistent with our original findings. In addition, the animals used for the 70 dB and 77 dB experiments were different ages. In the 70 dB experiments, the animals were 3-7 months old at the time of experimentation, whereas in the 77 dB experiments the animals were 9-15 months old. Thus, this control also shows that our results generalize across these relative age groups. In particular, for both the 70 dB background (younger rats) and 77 dB background (older rats) experiments: (1) The startle response distribution is better described by a log-normal than a normal distribution within animals for all of the stimuli. (5) Startle-scaling was inversely correlated with baseline saturation, and sound-scaling was inversely correlated with baseline threshold. For the 70 dB background experiments, the mean Pearson's r for sound-scaling versus baseline threshold was -0.31 ± 0.15, and the r² values ranged from 0 to 0.83; the mean Pearson's r for startle-scaling versus baseline saturation was -0.47 ± 0.08, and the r² values ranged from 0.02 to 0.75. For the 77 dB background experiments, the mean Pearson's r for sound-scaling versus baseline threshold was -0.49 ± 0.08, and the r² values ranged from 0.01 to 0.57; the mean Pearson's r for startle-scaling versus baseline saturation was -0.60 ± 0.04, and the r² values ranged from 0.20 to 0.57. (6) For the Fmr1 KO versus WT male comparison, the 70 dB experiments were those that primarily varied the prepulse level and the 77 dB background experiments were those that primarily varied the delay time.
Furthermore, as we described in response to the previous comment, we never combine data from prepulse-varying and delay-varying experiments because we do not know the correspondence between manipulations of these variables. As such, the LDA and ANCOVA analyses are already split by background level, and since none of these comparisons were significant, we can conclude that we were not able to detect group differences to either of the background sounds. (7) All of the experiments comparing WT male and WT female rats used a 70 dB background sound, so these comparisons cannot generalize across background sound.

Animals did not startle prior to the startle sound
First, we computed the startle response in the window after the prepulse sound onset but before the startle sound onset. For 457/462 (98.9%) rat-conditions, the average startle response to the prepulse sound remained below the rat's 5% startle threshold. Thus, the vast majority of rats, in the vast majority of conditions, did not startle to any of the prepulse conditions. Next, we analyzed the startle response for all of the animals in the 20 ms window prior to the startle sound onset, i.e. the end of the delay interval. In this period, we found that the startle response was below threshold for all 462/462 rat-conditions. Thus, even in the rare cases where there were startles during the prepulse time window, the startle responses had all returned to below the 5% startle threshold before the onset of the startle sound for all rats and all prepulse conditions. Finally, we analyzed whether the model parameters were any different for the 9/462 rat-conditions that exceeded threshold during the time of the prepulse sound. Of these 9 rat-conditions, there were 8 unique rats since one rat exceeded threshold in two conditions. For these 8 rats, we compared the baseline model parameters (saturation, slope, and midpoint) with the parameters for the other rats in comparable experiments that didn't startle to a prepulse. The absolute Z-scores were less than 2.0 for all of the 8 rats' baseline parameters, and for the slope and midpoint parameters the absolute Z-scores were always less than 1.0.
We also looked at the PPI scaling parameters. One of the 9 rat-conditions exceeding the 5% threshold was to the 0 dB prepulse condition, which by definition has no scaling in our model. For each of the 8 rat-conditions with a nonzero prepulse sound, we compared the startle-scaling and sound-scaling parameters with all of the other rats at that prepulse condition. For sound-scaling, 6/8 rat-conditions had an absolute Z-score below 2.0, while for startle-scaling all 8/8 rat-conditions had an absolute Z-score below 2.0.
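The Z-score comparison described above can be sketched as follows. This is a minimal illustration that assumes the reference distribution is the set of other rats in comparable experiments and that the sample standard deviation is used. The function name and the example values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def abs_z_score(value, reference_values):
    """Absolute Z-score of one rat's parameter relative to a reference group."""
    return abs(value - mean(reference_values)) / stdev(reference_values)


# Invented example values for a baseline saturation parameter.
other_rats_saturation = [1.10, 0.95, 1.30, 1.05, 1.20, 0.90]
flagged_rat_saturation = 1.25

z = abs_z_score(flagged_rat_saturation, other_rats_saturation)
is_typical = z < 2.0  # same 2.0 cutoff as used in the text
```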

Dynamics between first and second halves of experiment
We fit the model separately to the animals' startle data from the first and last halves of an experiment (spanning a total of 12 sessions across several days) and then looked for changes in the model parameters for individual animals' baseline startle curves.
We detected small but significant changes in the baseline model parameters and startle threshold from the first to last halves of the experiments. Within animals, the startle saturation decreased by a mean of 6.10 ± 0.77% (p < 10 A&* , t-test), which is a mean of 26.4 ± 3.2% of the between-animals IQR for this parameter. The startle threshold increased by a mean of 5.57 ± 0.86% (p < 10 AB , t-test), which is a mean of 20.2 ± 3.1% of the between-animals IQR. The slope increased by a mean of 7.57 ± 2.71% (p < 0.03, t-test), which is a mean of 21.8 ± 9.6% of the between-animals IQR. The midpoint increased by a mean of 2.47 ± 0.40% (p < 10 AB , t-test), which is a mean of 12.6 ± 2.0% of the between-animals IQR.
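The two effect-size summaries used above (within-animal percent change, and that change expressed as a percentage of the between-animals IQR) can be sketched as follows. This is an illustration under assumptions: the IQR is taken across animals from the first-half values, the ± values are treated as standard errors of the mean, and the t-test itself is omitted. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, quantiles, stdev


def sem(values):
    """Standard error of the mean."""
    return stdev(values) / sqrt(len(values))


def summarize_change(first_half, second_half):
    """Within-animal change as a percent of the first-half value and of the IQR."""
    diffs = [b - a for a, b in zip(first_half, second_half)]
    pct = [100.0 * d / a for d, a in zip(diffs, first_half)]
    # Between-animals quartiles of the first-half values (assumed reference).
    q1, _, q3 = quantiles(first_half, n=4)
    pct_of_iqr = [100.0 * d / (q3 - q1) for d in diffs]
    return {
        "pct_mean": mean(pct), "pct_sem": sem(pct),
        "iqr_mean": mean(pct_of_iqr), "iqr_sem": sem(pct_of_iqr),
    }


# Invented saturation values for six animals (not data from the study).
first = [1.00, 1.20, 0.80, 1.10, 0.90, 1.30]
second = [0.95, 1.10, 0.78, 1.02, 0.85, 1.22]
summary = summarize_change(first, second)
```

Expressing the change relative to the between-animals IQR makes it easier to judge whether a statistically significant drift is large compared with ordinary differences between animals.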
We also looked for changes in the PPI startle-scaling and sound-scaling parameters between the first and second halves of the experiments. We detected increases in startle-scaling in 6/13 prepulse conditions (p < 0.05, t-test), which is more conditions than we expect by chance (p < 10 AC , bootstrap test for multiple comparisons), and 4/13 hold up to Bonferroni correction for multiple comparisons (p < 0.004). Furthermore, the biggest changes were in the stronger prepulse conditions (louder prepulse, shorter delay). The mean startle-scaling changes across animals within the 6 conditions ranged from 2.7% to 17.9%. However, the changes for individual animals were relatively small compared to differences between animals. The mean ratio of within-animal change in startle-scaling to the across-animal standard deviation was greater than 1 in only 1/13 conditions, and in no conditions was it greater than 2.
We also found a small decrease in sound-scaling in 2/13 conditions and a small increase in sound-scaling in 1 condition, which is more conditions than we expect by chance (p < 0.03, bootstrap test). However, none of the conditions held up to Bonferroni correction (p > 0.004), the changes were not in a consistent direction, and the overall magnitude of the changes was smaller than for startle-scaling.
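The multiple-comparisons logic above can be illustrated with a short calculation. The Bonferroni threshold follows directly from the 13 prepulse conditions. The chance probability shown here is a simplified binomial approximation that assumes the 13 tests are independent; it is a stand-in for illustration and is not the bootstrap test used in the analysis. Function and variable names are hypothetical.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """P(at least k of n independent tests are significant at level alpha)."""
    return sum(comb(n, j) * alpha**j * (1 - alpha) ** (n - j)
               for j in range(k, n + 1))


n_conditions = 13
alpha = 0.05

# Bonferroni-corrected per-condition threshold: 0.05 / 13, about 0.004.
bonferroni_threshold = alpha / n_conditions

# Chance of seeing at least 3 or at least 6 significant conditions out of 13.
p_three_or_more = prob_at_least(3, n_conditions, alpha)
p_six_or_more = prob_at_least(6, n_conditions, alpha)
```

A resampling test has the advantage of respecting correlations between prepulse conditions measured in the same animals, which the independence assumption in this sketch ignores.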

No evidence for hearing loss due to loudest startle sounds or older animals
As discussed above, our results hold true even if we limit ourselves to analyzing the experiments with a 70 dB background sound (and hence lower absolute sound levels), which were also the experiments with only younger animals. Thus, any potential hearing loss due to the louder sounds or older animals does not affect our main conclusions. Nevertheless, we did observe small but significant changes in the baseline parameters and startle threshold between the first and second halves of the experiments, and these changes are generally in the directions we might expect from hearing loss.
If these changes were due to hearing loss caused by the loudest sound levels or the age of the oldest animals, then we would expect to see larger magnitude changes in experiments with louder absolute sounds and older animals, compared to experiments with weaker absolute sounds and younger animals. To test this, we separately computed the changes in baseline model parameters between the first and second halves for experiments with 70 dB background sound and for experiments with 77 dB background sound. The experiments with 70 dB background sound always had lower maximal absolute sound level because the maximal relative sound level was 60 dB above background in both experiment types, and the 70 dB background experiments always had younger animals (see Supplementary Table 1).
We detected no differences in any of the baseline parameters nor in the startle threshold between the 70 dB experiments and the 77 dB experiments (p > 0.08, t-test), and the trend was actually toward greater changes in the 70 dB experiments (data not shown). These results indicate that the loudest startle sounds did not in and of themselves result in significant hearing loss, even among the older animals. It is still possible that the changes observed in both experiment types could have been caused by hearing loss not attributable to the loudest sounds alone, but it is not obvious how this would occur. We therefore suggest that these changes are likely due to habituation or other dynamics of the startle response not analyzed here.

1. One Fmr1 KO rat was euthanized after developing a tumor, and one WT male rat was not included due to experimental error.
2. Stimulus is defined as a unique combination of prepulse sound level, delay time, and startle sound level (e.g. 14 dB prepulse, 100 ms delay, 35 dB startle sound).

Supplementary Figure Legends:
Supplemental Figure 1 Characterization of the Fmr1 KO rat. (a) Two mutant models were generated (Fmr1-m2 and Fmr1-m4) with frame-shift indels in exon 7. The CRISPR-SpCas9 target site is underlined, with the protospacer adjacent motif (PAM) in bold. (b) FMR1 expression is absent in knockout male whole brain extracts. (c) Body weight trended larger, and testes/body weight ratio was greater, in m2 and m4 knockout males compared to wildtype littermates at 30 days of age. N = 3-5 per group; p-value determined by Student's t-test.
Supplemental Figure 2 Normalizing movement data for apparatus gain. (a) Raw accelerometer data for all trials (colors) in a single session for one rat. Inset shows accelerometer data in the first 100 ms of the trials. (b) Histogram of accelerometer readings in the first 100 ms across all trials in a single session for one rat (same rat as Fig. S2a). Solid curve shows the Gaussian fit to the histogram, and the legend shows the mean and standard deviation of this Gaussian. (c) Normalized movement of a single trial for one rat (same rat as Fig. S2a&b). Dashed vertical lines indicate the startle sound onset (left) and 100 ms after the startle sound onset (right). Arrows indicate startle sound onset and the maximum normalized movement in the 100 ms window.
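The per-trial processing illustrated in this figure can be sketched as follows. This is a minimal illustration and not the original analysis code. It assumes that normalization means the absolute deviation from the baseline mean divided by the baseline standard deviation, with the baseline taken from the first 100 ms of the trial. The function name, its parameters, and the synthetic trace are hypothetical.

```python
import math
from statistics import mean, stdev


def startle_magnitude(trace, sample_rate_hz, startle_onset_s,
                      baseline_s=0.1, window_s=0.1):
    """Log10 of the peak normalized movement in the window after startle onset.

    ASSUMPTION: normalized movement = |x - baseline mean| / baseline SD.
    """
    n_base = int(round(baseline_s * sample_rate_hz))
    base = trace[:n_base]
    mu, sd = mean(base), stdev(base)
    start = int(round(startle_onset_s * sample_rate_hz))
    stop = start + int(round(window_s * sample_rate_hz))
    normalized = [abs(x - mu) / sd for x in trace[start:stop]]
    # log10 is monotonic, so the max of the logs equals the log of the max.
    return math.log10(max(normalized))


# Synthetic trial sampled at 1 kHz: unit-amplitude noise plus one large
# deflection 20 ms after a startle sound at 0.3 s (invented values).
trace = [1.0 if i % 2 == 0 else -1.0 for i in range(400)]
trace[320] = 100.0
magnitude = startle_magnitude(trace, sample_rate_hz=1000, startle_onset_s=0.3)
```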

Supplemental Figure 3 Estimation precision of model parameters and their correlations. (a) Distribution of RMS errors for the model for each animal, compared to the distribution of errors that occur from swapping the model parameters for each rat with the parameter sets for all of the other rats in the same experiment (left), and the within-rat changes in error that occur from swapping parameter sets from other animals (right). Note that all values of the difference are greater than zero, indicating that the actual parameter set is always the better fit compared to all other animals' parameter sets within an experiment. (b) Scatter plot of sound-scaling versus threshold (left) and startle-scaling versus saturation (right) for the 14 dB, 100 ms condition from experiments that varied the prepulse level. Vertical and horizontal error bars for each point indicate 90% confidence intervals. The correlations between the x- and y-axis measures are evident even when taking into account the confidence intervals. (c) Range of correlation values that occur due to resampling the parameters from within their confidence intervals by refitting the model 10,000 times to jittered data and recomputing the correlations. For the correlations observed between sound-scaling and threshold (blue), 14/15 have p < 0.05 of including the value r = 0. For the correlations observed between startle-scaling and saturation (red), 10/15 have p < 0.05 of including the value r = 0. These data indicate that the majority of observed correlations are robust to noise in the data. (d) Correlation values between startle-scaling and baseline saturation (left) and between sound-scaling and baseline threshold (right) for prepulse-varying experiments (top) and delay-varying experiments (bottom). Points are the observed Pearson's r values across animals and within conditions (same data that make up the histograms in Fig. 4d). Boxes are the median and interquartile range (IQR) of the r values within animals and within conditions after jittering the parameters by the noise in the data (i.e. representing what could be expected for the correlations just due to compensatory effects in the parameters that would be exposed solely due to noise in the data). For panels c & d, whiskers extend to the last datum within 1.5 IQR beyond the first and third quartiles; points outside of this range are represented as outliers. Dotted horizontal line represents an r value of 0.

, and no group differences in startle-scaling or sound-scaling at any prepulse condition (p > 0.05, ANCOVA).