Stimulus-dependent representational drift in primary visual cortex

To produce consistent sensory perception, neurons must maintain stable representations of sensory input. However, neurons in many regions exhibit progressive drift across days. Longitudinal studies have found stable responses to artificial stimuli across sessions in visual areas, but it is unclear whether this stability extends to naturalistic stimuli. We performed chronic 2-photon imaging of mouse V1 populations to directly compare the representational stability of artificial versus naturalistic visual stimuli over weeks. Responses to gratings were highly stable across sessions. However, neural responses to naturalistic movies exhibited progressive representational drift across sessions. Differential drift was present across cortical layers, in inhibitory interneurons, and could not be explained by differential response strength or higher order stimulus statistics. However, representational drift was accompanied by similar differential changes in local population correlation structure. These results suggest representational stability in V1 is stimulus-dependent and may relate to differences in preexisting circuit architecture of co-tuned neurons.

Reviewer #1 (Remarks to the Author):
The study by Marks and Goard addresses the potential stability of visually-evoked responses in the mouse visual cortex, comparing activity evoked by either traditional drifting gratings or natural scene movies. They find that, especially for movies, neural responses exhibit "representational drift" across sessions. They also find changes in the correlational structure of local networks, although they do not see differences by cell type or layer. However, the overall conclusions are quite limited: there is little exploration of what aspects of the visual stimulus or behavioral state drive the changes, what the cellular or circuit mechanism might be, or what consequences for perception might follow. These gaps are particularly evident given the significant body of work that already exists on this topic. In addition, there are some analytical issues that make it difficult to fully interpret the results.
1. The question, the stability of sensory representations, is a well-examined one. Moreover, despite the lengthy introduction, the authors do not mention many earlier studies (Gavornik and Bear 2014, Makino and Komiyama, Puscian...Higley 2020) that explicitly address the relationship between repeated stimulus presentation and response properties. In particular, Puscian et al. show that repeated presentation of drifting gratings drives changes in response amplitude without alteration in sensory representation.
2. The authors use correlations in activity across entire trials as a metric for both reliability and responsiveness, as well as the readout of "representational drift". However, it is unclear how to interpret this signal, as it will be a mixture of both visually-evoked signals and spontaneous activity from inter-stimulus intervals. For example, relative increases in the amount of spontaneous fluctuations will necessarily reduce the correlation between trials in the absence of any change in the "visually evoked response". While this type of overall correlation is potentially helpful for calcium imaging data that is already a filtered version of the underlying spike train, the lack of similarity to other studies and the potential confound from inter-stimulus periods limits interpretation.
3. A related issue is the direct comparison of responses evoked by brief gratings versus natural scenes. The gratings have clear onset times, and therefore robust, discrete onset transients of large amplitude that are clearly seen in the data (Figure 1F). In contrast, natural scenes may or may not have such robust and repeatable onset transients, leading to "muddier" responses whose signal-to-noise may be more sensitive to spontaneous fluctuations (also apparent in Figure 1F). Thus, the selective "representational drift" may simply reflect the apples-to-oranges comparison of the two stimulus sets and an unconventional response metric rather than a fundamental difference in plasticity.
4. The authors provide some analysis of changes in arousal across training by measuring pupil diameter. However, the demonstration that average pupil diameter does not correlate over sessions with RDI seems cursory at best. It would be helpful to see that the pupil changes observed do provide within-session information that tracks visual responsiveness, as published previously by many groups. In addition, other behavioral state variables (e.g., whisking) might provide an alternative metric. This seems particularly critical, as the authors in the end find no convincing explanation for the RDI.
5. Finally, in the absence of a mechanism or function, it would be interesting to at least see greater exploration of the relationship between visual stimulation and RDI. For example, does RDI occur for movies if only gratings are presented from weeks 2-6? Does more frequent stimulus presentation drive faster or more robust RDI? Does the RDI persist if visual stimulation is stopped for a period of time?
Reviewer #2 (Remarks to the Author): The manuscript examines stability of visual cortex responses to drifting gratings and natural movies over the course of several weeks of recording. The stability of neuronal representations and how they change over time is an important question, as it constrains how these representations can influence downstream structures and guide behavior.
The primary claims of the paper are as follows: (i) responses to natural movies are less stable than responses to drifting gratings; (ii) this difference cannot be trivially explained by differences in the magnitude of response events; (iii) changes in responses cannot be explained by behavioral variables; (iv) representational drift appears to affect different layers as well as both excitatory and inhibitory neurons to a similar degree.
Overall, this is an interesting and well-executed study. The fact that natural movie responses exhibit a greater degree of plasticity is clear. What is less clear is what may be driving this difference and whether behavioral changes might contribute to apparent representational drift, contrary to the claims of the manuscript.
1. It is clear that natural movie responses are less stable than responses to drifting gratings, but it is not clear what it is about natural movies that underlies this difference. As a result, while this is an interesting observation, the reader is left wondering what it might imply about neural circuit function. Are MOV stimuli simply richer and more diverse, and therefore more likely to reveal representational drift over sessions? If that were the case, it would not decrease my enthusiasm for the paper, as it would suggest that simple parametric stimuli that are typically used overestimate stability of responses.
The authors speculate that differences in connectivity of ensembles recruited by gratings vs movies may be responsible. However, the speculation that neurons that happen to respond at the same moment during presentation of the MOV stimulus are less likely to be integrated into the same synaptic subnetwork than those that respond to the same grating orientation is at odds with prior work. Cossell et al. (manuscript ref. 72) showed that neurons' responses to natural images were highly predictive of their synaptic connectivity. While that study did not use natural movies, it would be surprising if the results did not also extend to movie stimuli. Therefore, it seems likely that neuronal ensembles that are co-active during natural movie presentation are also recurrently connected.
2. The data supporting the claim that behavioral variables cannot explain differential stability of PDG and MOV responses are weak. Firstly, the authors do show that pupil diameter decreases over time. The corresponding decrease in arousal may cause the overall reduction in the strength of response events for both PDG and MOV stimuli (Figure 2D). Could MOV responses be more sensitive to the animal's level of arousal?
The authors state that "pupil size stabilized after the third session, and is unlikely to contribute to progressive drift observed in later sessions". The only corresponding data figure is Supplemental Figure 8C. It would be much easier to relate this to RDI measurements if pupil size was plotted for each recording week (as RDI values in the main figures). Since there is a lot of mouse-to-mouse variability in RDI curves and pupil size was only measured in 4 of the mice, it would also be helpful to compare pupil measurements and response stability for those specific animals.
Secondly, pupil diameter and position are the only aspects of behavior quantified. No videography or other data is presented. Without additional behavioral measurements, I would encourage the authors to carefully consider the claims they make about potential behavior-related confounds and perhaps focus on arousal (based on pupillometry) rather than behavior in general.
3. I am not sure what additional insights are gained from the signal correlation analysis in Figure 6. Surely, if single cell trial-average responses change, signal correlations are likely to change too. It would have been quite surprising if single cell responses changed in such a way as to preserve signal correlations. Am I missing something here?
4. An important question that is not discussed in the manuscript is whether repeated presentation of the same stimulus may be contributing to representational drift. Would the same amount of drift be observed between week 1 and week 7 if the stimulus was not presented on weeks 2-6, or if it was presented with fewer repetitions? If the authors' data can shed light on this question, it would strengthen the manuscript. Otherwise, the authors ought to consider the potential role of repeated exposure to the stimulus in the discussion.
5. Animals of a wide range of ages (12-30 weeks) are used in the study. Does animal age explain any of the variability in the extent of representational drift?
Reviewer #3 (Remarks to the Author): The manuscript by Marks and Goard titled 'Stimulus-dependent representational drift in primary visual cortex' examines the stability of natural scene representation in primary visual cortex (V1) relative to the stability of simpler grating stimuli. This is an important issue to address because although it is well recognized that orientation tuning to classic Cartesian stimuli in the upper layers of primary cortex is stable across days, it is completely unknown whether responses to complex, naturalistic stimuli are stable. The study is comprehensive and uses the micro-prism technique to capture neural responses from layers 2/3, 4, and 5. The scholarship of the manuscript is excellent; the manuscript is well-written and was a pleasure to read. The quality of the data set is superb and will likely serve as reference material for years to come. Associating a metric of quality of tracking single neurons across sessions is extremely useful. The discovery that the representation of naturalistic movies in V1 is less stable than classic oriented stimuli is of interest to the fields of mammalian vision and sensory systems in general.
Major
1. The authors include a results section on the topic of whether the greater instability observed for MOV can be accounted for by behavioral state. This is important. It would be useful to examine this issue in further detail. Eye movements: do the eyes move more during MOV than PDG for the four mice examined? Does this change with session number? A cumulative distribution of pupil position for each individual mouse for all sessions could address this issue (ideally in degrees, but relative to the within-session median is ok).
2. If possible, it would be valuable to perform additional analysis on the inhibitory neuron population, specifically, to sub-divide the inhibitory neurons into functional classes. For example, are the inhibitory neurons broadly tuned for orientation more or less stable than inhibitory neurons sharply tuned for orientation?
3. The authors' justification for concluding that their data suggest local network connectivity may influence stability is tenuous. Do the authors have any data demonstrating that instability of naturalistic images is not the result of influence arising from top-down input? ...if not, fine, then remove this claim. Neither the main conclusion, nor the significance of the manuscript is altered if this statement is removed or moved to the Discussion.
4. It would be useful if the description of the segmentation included more details. (I) What parameters and range are used to modify the kurtosis measure? What is the median adaptive threshold, is the spatial extent of the localization fixed, and what range of thresholds is typically used? (II) During the manual check (which is a good idea), what are the criteria for rejection of a segment? Inclusion criteria? (The graphical interface is a nice touch that should help ensure consistency across users.)

Minor
Please report the luminance of each frame in the movie sequence.
Line 75. The work cited is not peer reviewed, so using this work as a rationale for further study is problematic. Furthermore, the rationale is sufficiently strong without including a reference to results uploaded to BioArchive.
Line 239. The term 'highly salient' features should be avoided when interpreting the data generated here. There is no salience to the stimuli; if anything, the animal may be learning to ignore these stimuli, given the presented stimuli do not have behavioral significance. The authors' point being made here is unclear.

Figure 1
Legend. Please do not use the term 'watching'.
Line 629. How many pixels typically went into calculating the neuropil per neuron? What are the spatial dimensions of a pixel?

Response to Reviewers
We would like to thank the reviewers for their insightful comments on the manuscript. We have added several additional experiments and revised the manuscript to address the concerns raised in the reviews. Indicated changes to the manuscript are marked in red and line numbers are specified in the response.

Reviewer #1 (Remarks to the Author):
The study by Marks and Goard addresses the potential stability of visually-evoked responses in the mouse visual cortex, comparing activity evoked by either traditional drifting gratings or natural scene movies. They find that, especially for movies, neural responses exhibit "representational drift" across sessions. They also find changes in the correlational structure of local networks, although they do not see differences by cell type or layer. However, the overall conclusions are quite limited: there is little exploration of what aspects of the visual stimulus or behavioral state drive the changes, what the cellular or circuit mechanism might be, or what consequences for perception might follow. These gaps are particularly evident given the significant body of work that already exists on this topic. In addition, there are some analytical issues that make it difficult to fully interpret the results.
We thank the reviewer for their comments, though we are not sure if we understand the primary criticism. A significant portion of our manuscript concentrated on aspects of the stimulus (Figures 2, 5, S9), behavioral state (Figures S8-9), and cell types/layers (Figures 3, 4) that might be responsible for the differential representational drift. Admittedly, a number of the findings were negative results (e.g., the differential drift could not be explained by differences across layers or higher order spatial correlations). However, checking these possibilities is an important part of investigating the mechanism. We also provide a hypothesis for explaining the differential drift, namely that subnetworks of highly connected neurons with similar stimulus tuning stabilize responses (Figures 6, S10). This allows the same neurons to reliably encode particular visual features such as orientation, while responses to other visual features are more malleable.
We agree that there is already a significant body of work on representational drift, but this is the first manuscript that describes differential representational drift within the same neurons for different stimuli. Although there remain questions on the underlying mechanism, this finding alone changes our understanding of representational drift. This is an important distinction: these studies (and a number of others) are focused on perceptual learning rather than on representational drift. We know that perceptual learning can change cortical representations (even for orientation), but this is very different from spontaneous changes due to representational drift. We have clarified this point in the manuscript (lines 208-210).
That said, it is possible that the repeated stimulus alone induces some perceptual learning, despite the lack of task or reward (this point was also raised by Reviewer #2). Indeed, this was found in the Gavornik & Bear 2014 paper upon repeated passive presentation, albeit with a simpler stimulus sequence and more repetitions (800 repeats) presented on a daily basis.
To address this possibility, we have run experiments to test representational drift across the same time span, but without the intervening stimulus presentations (Figure S8; lines 210-218; specific experiments explained in point 5 below).
2. The authors use correlations in activity across entire trials as a metric for both reliability and responsiveness, as well as the readout of "representational drift". However, it is unclear how to interpret this signal, as it will be a mixture of both visually-evoked signals and spontaneous activity from inter-stimulus intervals. For example, relative increases in the amount of spontaneous fluctuations will necessarily reduce the correlation between trials in the absence of any change in the "visually evoked response". While this type of overall correlation is potentially helpful for calcium imaging data that is already a filtered version of the underlying spike train, the lack of similarity to other studies and the potential confound from inter-stimulus periods limits interpretation.
Our metric is based on between-trial correlations rather than tuning curve similarity because we do not know exactly which features induce reliable responses in the natural movies. With the PDG stimulus, we do show equivalence between our between-trial metric and orientation tuning metrics (Figure S2). However, with the MOV stimulus, we believe our representational drift metric is the best method for determining how responses change over time. Note that our event-based metric (Figure 2) also revealed similar changes despite analyzing changes in single response events.
However, we agree with the reviewer that inter-stimulus intervals may influence our metric. To address the concern about inter-stimulus intervals, we carried out additional experiments and analyses. Specifically, we checked the stability of responses to PDG and MOV stimuli using matched temporal structures, and found that eliminating the inter-stimulus intervals from the PDG stimulus (or adding inter-stimulus intervals to the MOV stimulus) did not affect the differential representational drift (Figure S7; lines 187-208; specific experiments described in point 3 below).
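The reviewer's concern about spontaneous fluctuations can be illustrated with a small simulation. The following is a minimal sketch only, not the analysis used in the manuscript: the response trace, noise levels, trial counts, and the helper names `simulate_trial`, `pearson`, and `mean_between_trial_correlation` are all hypothetical. It holds the evoked response fixed and shows that the average between-trial Pearson correlation still falls when independent "spontaneous" noise grows.

```python
import math
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length traces."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_trial(evoked, noise_sd, rng):
    """One trial: the fixed evoked trace plus independent Gaussian noise."""
    return [x + rng.gauss(0.0, noise_sd) for x in evoked]


def mean_between_trial_correlation(trials):
    """Average Pearson correlation over all distinct pairs of trials."""
    pairs = [(a, b) for i, a in enumerate(trials) for b in trials[i + 1:]]
    return mean(pearson(a, b) for a, b in pairs)


rng = random.Random(0)

# Hypothetical evoked response: 20 silent frames (an inter-stimulus
# interval) followed by 20 frames of stimulus-driven activity.
evoked = [0.0] * 20 + [1.0] * 20

# Same evoked response in both conditions; only the noise level differs.
low_noise_trials = [simulate_trial(evoked, 0.2, rng) for _ in range(10)]
high_noise_trials = [simulate_trial(evoked, 1.0, rng) for _ in range(10)]

r_low = mean_between_trial_correlation(low_noise_trials)
r_high = mean_between_trial_correlation(high_noise_trials)
print(f"low noise: {r_low:.2f}, high noise: {r_high:.2f}")
```

In this toy setting the correlation drops with noise alone, which is why a control with matched temporal structure is informative.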

3. A related issue is the direct comparison of responses evoked by brief gratings versus natural scenes. The gratings have clear onset times, and therefore robust, discrete onset transients of large amplitude that are clearly seen in the data (Figure 1F). In contrast, natural scenes may or may not have such robust and repeatable onset transients, leading to "muddier" responses whose signal-to-noise may be more sensitive to spontaneous fluctuations (also apparent in Figure 1F). Thus, the selective "representational drift" may simply reflect the apples-to-oranges comparison of the two stimulus sets and an unconventional response metric rather than a fundamental difference in plasticity.
4. The authors provide some analysis of changes in arousal across training by measuring pupil diameter. However, the demonstration that average pupil diameter does not correlate over sessions with RDI seems cursory at best. It would be helpful to see that the pupil changes observed do provide within-session information that tracks visual responsiveness, as published previously by many groups. In addition, other behavioral state variables (e.g., whisking) might provide an alternative metric. This seems particularly critical, as the authors in the end find no convincing explanation for the RDI.
We have divided our previous supplementary figure on eye movements and pupil dilation (formerly Figure S8) into two supplementary figures to address this and other concerns raised by the reviewers (Figures S10 and S11). We have added analysis showing that pupil dilation is moderately correlated with response magnitude, as shown by previous papers, and that neither eye movements nor pupil size appreciably differ across stimulus conditions (lines 290-292; 767-770). We also analyzed eye movements and pupil dilation as a function of session number, and added all results to the paper (lines 284-302, Figures S10 and S11).
However, our principal finding is just what the reviewer described -while we observed some eye movements and changes in pupil dilation, since the changes are similar across stimulus sets, the behavioral variables are not sufficient to explain the differential drift between PDG and MOV stimuli.
5. Finally, in the absence of a mechanism or function, it would be interesting to at least see greater exploration of the relationship between visual stimulation and RDI. For example, does RDI occur for movies if only gratings are presented from weeks 2-6? Does more frequent stimulus presentation drive faster or more robust RDI? Does the RDI persist if visual stimulation is stopped for a period of time?
To address this issue, we imaged at Day 0 to record baseline responses to PDG and MOV stimuli, then imaged during a single session on Day 42 without any visual stimulus presentations in the interim. We chose not to present either stimulus in the interim in order to be able to directly compare the representational drift across stimuli. We found that even in the absence of extensive repeated stimulus presentation, a period of 42 days produces differential representational drift between the stimuli. We have added these results to the manuscript (lines 210-218; Figure S8, reprinted below). The difference in RDI between stimuli qualitatively appeared smaller for the spaced imaging sessions, though it is difficult to draw any firm conclusions given the smaller sample size of mice. Regardless, the repeated presentation cannot explain the difference in representational drift we see across visual stimuli.
Figure S8 (caption, beginning not recovered): ... Fig. 1i (desaturated curves). Error bars are ± s.e.m. MOV RDI is significantly different from PDG RDI (F(1,228) = 11.6, ***p < 0.001; linear mixed-effects model, fixed effect for stimulus, random effect for mouse).

Reviewer #2 (Remarks to the Author):
The manuscript examines stability of visual cortex responses to drifting gratings and natural movies over the course of several weeks of recording. The stability of neuronal representations and how they change over time is an important question, as it constrains how these representations can influence downstream structures and guide behavior.
The primary claims of the paper are as follows: (i) responses to natural movies are less stable than responses to drifting gratings; (ii) this difference cannot be trivially explained by differences in the magnitude of response events; (iii) changes in responses cannot be explained by behavioral variables; (iv) representational drift appears to affect different layers as well as both excitatory and inhibitory neurons to a similar degree.
Overall, this is an interesting and well-executed study. The fact that natural movie responses exhibit a greater degree of plasticity is clear. What is less clear is what may be driving this difference and whether behavioral changes might contribute to apparent representational drift, contrary to the claims of the manuscript.
We thank the reviewer for their positive comments and have addressed their concerns about the behavioral changes in the revised manuscript.

1. It is clear that natural movie responses are less stable than responses to drifting gratings, but it is not clear what it is about natural movies that underlies this difference. As a result, while this is an interesting observation, the reader is left wondering what it might imply about neural circuit function. Are MOV stimuli simply richer and more diverse, and therefore more likely to reveal representational drift over sessions? If that were the case, it would not decrease my enthusiasm for the paper, as it would suggest that simple parametric stimuli that are typically used overestimate stability of responses.
We also suspected that the difference may be due to the "complexity" of the MOV stimulus in comparison to the PDG stimulus. However, we were surprised that phase scrambling of the MOV stimuli had no effect on the progressive representational drift (Figure 5). Even when the stimulus was 100% phase scrambled (preserving only spatial and temporal frequency distribution), the representational drift was as pronounced as for the unscrambled movie (Figure 5b). This indicates that higher order spatial features alone are not responsible for the difference between stimuli.
The authors speculate that differences in connectivity of ensembles recruited by gratings vs movies may be responsible. However, the speculation that neurons that happen to respond at the same moment during presentation of the MOV stimulus are less likely to be integrated into the same synaptic subnetwork than those that respond to the same grating orientation is at odds with prior work. Cossell et al. (manuscript ref. 72) showed that neurons' responses to natural images were highly predictive of their synaptic connectivity. While that study did not use natural movies, it would be surprising if the results did not also extend to movie stimuli. Therefore, it seems likely that neuronal ensembles that are co-active during natural movie presentation are also recurrently connected.
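For readers unfamiliar with phase scrambling, the idea can be sketched on a one-dimensional signal. This is an illustrative simplification under stated assumptions, not the procedure used to generate the stimuli in Figure 5: the actual stimuli are movies with two spatial dimensions plus time, and the function names `dft`, `idft`, and `phase_scramble`, the test signal, and the random seed are hypothetical. The sketch randomizes Fourier phases while keeping each frequency's amplitude, so the frequency content is preserved and the higher-order structure is destroyed.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def phase_scramble(signal, rng):
    """Randomize Fourier phases while keeping every amplitude unchanged."""
    n = len(signal)
    spectrum = dft(signal)
    scrambled = list(spectrum)
    # Leave the DC term (and the Nyquist term when n is even) untouched.
    # Each positive frequency gets a random phase, and its mirror bin gets
    # the complex conjugate so that the output stays real-valued.
    for k in range(1, (n - 1) // 2 + 1):
        phase = rng.uniform(-math.pi, math.pi)
        scrambled[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        scrambled[n - k] = scrambled[k].conjugate()
    return [value.real for value in idft(scrambled)]


rng = random.Random(1)
n = 16
# Hypothetical test signal with two frequency components.
original = [math.sin(2 * math.pi * t / n) + 0.5 * math.cos(2 * math.pi * 3 * t / n)
            for t in range(n)]
scrambled = phase_scramble(original, rng)

# The amplitude spectrum should match to within rounding error.
amp_before = [abs(c) for c in dft(original)]
amp_after = [abs(c) for c in dft(scrambled)]
max_amp_diff = max(abs(a - b) for a, b in zip(amp_before, amp_after))
max_signal_diff = max(abs(a - b) for a, b in zip(original, scrambled))
```

Here `max_amp_diff` stays at rounding-error level while `max_signal_diff` does not, which is the property the scrambling control relies on.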
Yes, this is an important point that we should have explained more clearly. We are not arguing that neurons with similar responses to MOV stimuli are never strongly connected. However, with PDG stimuli, neurons responding to a particular grating are all highly connected within a local subnetwork of iso-oriented neurons, which constrains representational drift. On the other hand, two neurons responding to the same time point in the MOV stimuli may be highly connected, but they are not necessarily reciprocally connected with other neurons in a local subnetwork (since all of the responsive neurons may be responding to different spatial and temporal aspects of the stimulus). As a result, we expect the responses of the neurons will be less constrained by local connectivity. We have discussed this point in more detail in the discussion (lines 481-484).
2. The data supporting the claim that behavioral variables cannot explain differential stability of PDG and MOV responses are weak. Firstly, the authors do show that pupil diameter decreases over time. The corresponding decrease in arousal may cause the overall reduction in the strength of response events for both PDG and MOV stimuli (Figure 2D). Could MOV responses be more sensitive to the animal's level of arousal?
Good point. One possibility is that the different stimuli actually cause differential changes in arousal. To test this, we analyzed the change in pupil diameter separately for each stimulus, and found that changes in average pupil size were not statistically different across stimulus conditions. The results have been added to the manuscript (lines 290-302, Figure S11). Furthermore, we observe differential representational drift between stimuli even when we only show the mice the stimuli during two sessions spaced by 42 days (Figure S8), when changes in arousal due to repeated stimuli are unlikely to influence responses.
The authors state that "pupil size stabilized after the third session, and is unlikely to contribute to progressive drift observed in later sessions". The only corresponding data figure is Supplemental Figure 8C. It would be much easier to relate this to RDI measurements if pupil size was plotted for each recording week (as RDI values in the main figures). Since there is a lot of mouse-to-mouse variability in RDI curves and pupil size was only measured in 4 of the mice, it would also be helpful to compare pupil measurements and response stability for those specific animals.
We have plotted RDI curves and pupil size measurements for both stimuli for each of the four mice in which these were simultaneously recorded (Figures S10, S11). We found no clear relationship between either changes in eye movements or changes in pupil size and the stability of a mouse's responses to either stimulus.
Secondly, pupil diameter and position are the only aspects of behavior quantified. No videography or other data is presented. Without additional behavioral measurements, I would encourage the authors to carefully consider the claims they make about potential behavior-related confounds and perhaps focus on arousal (based on pupillometry) rather than behavior in general.
4. An important question that is not discussed in the manuscript is whether repeated presentation of the same stimulus may be contributing to representational drift. Would the same amount of drift be observed between week 1 and week 7 if the stimulus was not presented on weeks 2-6, or if it was presented with fewer repetitions? If the authors' data can shed light on this question, it would strengthen the manuscript. Otherwise, the authors ought to consider the potential role of repeated exposure to the stimulus in the discussion.
We agree that this is a potential concern (this was also pointed out by reviewer #1). To address this concern, we carried out a new set of experiments in which mice were only imaged at D0 and D42 without any intervening stimulus presentations. We found that even in the absence of extensive repeated stimulus presentation, a period of 42 days produces differential representational drift between the stimuli. We have added these results to the manuscript (lines 208-218, Figure S8).
5. Animals of a wide range of ages (12-30 weeks) are used in the study. Does animal age explain any of the variability in the extent of representational drift?
To check whether animal age had any effect on representational drift, we analyzed RDI as a function of animal age at the first imaging session. We found a trend toward lower RDI in older animals, but no significant correlation for either stimulus. We have added this finding to the manuscript (lines 270-271, Figure S5d).

Reviewer #3 (Remarks to the Author):
The manuscript by Mark and Goard titled 'Stimulus-dependent representational drift in primary visual cortex' examines the stability of natural scene representation in primary visual cortex (V1) relative to the stability of simpler grating stimuli. This is an important issue to address because although it is well recognized that orientation tuning to classic Cartesian stimuli in the upper layers of primary cortex is stable across days, it is completely unknown whether responses to complex, naturalistic stimuli are stable. The study is comprehensive and uses the micro-prism technique to capture neural responses from layers 2/3, 4, and 5. The scholarship of the manuscript is excellent; the manuscript is well-written and was a pleasure to read. The quality of the data set is superb and will likely serve as reference material for years to come. Associating a metric of quality of tracking single neurons across sessions is extremely useful. The discovery that the representation of naturalistic movies in V1 is less stable than classic oriented stimuli is of interest to the fields of mammalian vision and sensory systems in general.
We thank the reviewer for the kind comments on the manuscript.
Major
1. The authors include a results section on the topic of whether the greater instability observed for MOV can be accounted for by behavioral state. This is important. It would be useful to examine this issue in further detail. Eye movements: do the eyes move more during MOV than PDG for the four mice examined? Does this change with session number? A cumulative distribution of pupil position for each individual mouse for all sessions could address this issue (ideally in degrees, but relative to the within-session median is OK).
This is a good question (related to a question raised by reviewer #2). We analyzed the change in eye movements and pupil diameter separately for each stimulus, and found that changes in these factors were not significantly different between stimuli. We also analyzed changes across session number and found no clear relationship between changes in either eye movements or pupil size and the stability of a mouse's responses to either stimulus. The results have been added to the manuscript (lines 284-302, Figures S10 and S11).
2. If possible, it would be valuable to perform additional analysis on the inhibitory neuron population, specifically, to sub-divide the inhibitory neurons into functional classes. For example, are the inhibitory neurons broadly tuned for orientation more or less stable than inhibitory neurons sharply tuned for orientation?
To address this, we have divided neurons into sharply-tuned and broadly-tuned groups (threshold OSI = 0.4 based on distribution, see Reviewer Figure 1 inset). We found that RDI may be slightly higher for sharply-tuned interneurons compared to broadly-tuned interneurons (see Reviewer Figure 1 below). These findings are potentially interesting, as they indicate that subsets of neurons (presumably the more broadly-tuned PV+ interneurons) may have lower levels of representational drift. However, although differences in tuning selectivity have been observed across genetically-defined inhibitory subtypes, we do not feel confident classifying inhibitory neuron subtypes based only on tuning selectivity without confirming using subtype-specific driver lines, so we decided to leave this as a reviewer figure.
3. The authors' justification for concluding that their data suggest local network connectivity may influence stability is tenuous. Do the authors have any data demonstrating that instability of naturalistic images is not the result of influence arising from top-down input? ...if not, fine, then remove this claim. Neither the main conclusion, nor the significance of the manuscript is altered if this statement is removed or moved to the Discussion.
We agree: although we observe local changes in signal correlations, more distal top-down inputs could also be influencing responses. We have removed this statement from the manuscript and discussed it with more care in the Discussion (lines 484-487).
We have added more details to the Methods to describe the ROI segmentation (lines 672-673; 691-693; 702-707). Briefly: (I) We filtered the individual pixels with a 3 x 3 pixel window before calculating kurtosis to reduce outlier values. (II) The scoring was based on the clarity of the cellular structure in the average fluorescence and activity map (e.g., is the cell structure visible in all weeks and clearly separated from nearby cells). Although the activity map was not required to be constant across weeks since cells can change their activity, we were particularly wary of correlated changes between the average fluorescence and activity map that would suggest movement of the z-plane. This process was difficult to accomplish algorithmically, so we used subjective assessments. However, in all cases we defined ROIs using data spanning both stimulus conditions to eliminate systematic bias. In addition, we checked that our quality score inclusion threshold did not affect the key results (Figure S1f).
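Step (I) above can be illustrated with a minimal sketch. It assumes the 3 x 3 window is a spatial mean filter applied to each frame and that kurtosis is then computed per pixel across time; the response does not state the filter type or the kurtosis convention, so both choices, along with the function names `smooth_3x3`, `kurtosis`, and `kurtosis_map`, are hypothetical and not taken from the authors' pipeline.

```python
from statistics import fmean


def smooth_3x3(frame):
    """Mean-filter one 2D frame with a 3 x 3 window.

    Assumption: a mean filter; border pixels average only the
    neighbours that fall inside the frame.
    """
    rows, cols = len(frame), len(frame[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                frame[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            ]
            out[r][c] = fmean(window)
    return out


def kurtosis(values):
    """Population excess kurtosis of a 1D sequence (0 for a Gaussian)."""
    mu = fmean(values)
    m2 = fmean([(v - mu) ** 2 for v in values])
    if m2 == 0:
        return 0.0  # a constant trace has no defined kurtosis; report 0
    m4 = fmean([(v - mu) ** 4 for v in values])
    return m4 / (m2 ** 2) - 3.0


def kurtosis_map(movie):
    """Per-pixel kurtosis over time, computed after spatial smoothing.

    movie: list of 2D frames, indexed as movie[t][row][col].
    """
    smoothed = [smooth_3x3(frame) for frame in movie]
    rows, cols = len(movie[0]), len(movie[0][0])
    return [
        [kurtosis([frame[r][c] for frame in smoothed]) for c in range(cols)]
        for r in range(rows)
    ]
```

Smoothing first means a single-pixel outlier in one frame is averaged with its neighbours before the fourth moment is taken, which is one plausible reading of how the filter would "reduce outlier values".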

Minor
Please report the luminance of each frame in the movie sequence.
We have added information on the luminance to the Methods (line 586).
Line 75. The work cited is not peer reviewed, so using this work as a rationale for further study is problematic. Furthermore, the rationale is sufficiently strong without including a reference to results uploaded to bioRxiv.
We have deleted this sentence from the rationale.
Line 239. The term 'highly salient' features should be avoided when interpreting the data generated here. There is no salience to the stimuli; if anything, the animal may be learning to ignore these stimuli, given the presented stimuli do not have behavioral significance. The authors' point being made here is unclear.
Sorry, we should have clarified that we were referring to bottom-up salience (e.g., sudden movements or changes in luminance, agnostic to behavioral significance). We wanted to investigate whether movie frames with high bottom-up salience were driving high magnitude / stable responses in substantial subsets of cells, but our data indicated the opposite. We have clarified this point in the manuscript (lines 272-273).

Figure 1
Legend. Please do not use the term 'watching'.
Thank you, corrected.
Line 629. How many pixels typically went into calculating the neuropil per neuron? What are the spatial dimensions of a pixel?
The neuropil annulus varied depending on the local cell density (we excluded other ROIs from the neuropil mask), but the average size was approximately 30 pixels. The spatial dimension of each pixel is 0.545 x 0.545 µm for a typical recording (for the 2x magnification used in most experiments). We have added the details on the pixel size of the images to the Methods (lines 641-643).
Note on change to statistical tests: During the revision, we realized that in some cases our data had a nested structure (a large sample of neurons from a smaller number of mice). Since there appear to be between-mouse differences in RDI effects (Figure S5), counting each neuron as an independent sample could inflate Type I errors (Aarts et al., Nat. Neurosci., 2014). To address this, nested data were compared using a linear mixed-effects model (fixed effect for stimulus, random effect for mouse; see lines 156-157; 783-784). This did not affect the statistical significance of any of the key results.
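For readers unfamiliar with this model class, one common form of such a model can be written out as below. This is a sketch under stated assumptions: it takes RDI as the response variable and models the mouse effect as a random intercept, neither of which is specified in the note above. The symbols (i for neuron, j for mouse, s for the stimulus indicator) are introduced here for illustration only.

```latex
% Assumed random-intercept form; i indexes neurons, j indexes mice.
% s_{ij} is a stimulus indicator (e.g., 0 = PDG, 1 = MOV).
\mathrm{RDI}_{ij} = \beta_0 + \beta_1 s_{ij} + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this form the stimulus effect is tested through the fixed coefficient \beta_1, while the per-mouse term u_j absorbs between-mouse differences, so neurons from the same mouse are no longer treated as independent samples.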