Abstract
Many behaviors rely on predictions derived from recent visual input, but the temporal evolution of those inputs is generally complex and difficult to extrapolate. We propose that the visual system transforms these inputs to follow straighter temporal trajectories. To test this ‘temporal straightening’ hypothesis, we develop a methodology for estimating the curvature of an internal trajectory from human perceptual judgments. We use this to test three distinct predictions: natural sequences that are highly curved in the space of pixel intensities should be substantially straighter perceptually; in contrast, artificial sequences that are straight in the intensity domain should be more curved perceptually; finally, naturalistic sequences that are straight in the intensity domain should be relatively less curved. Perceptual data validate all three predictions, as do population models of the early visual system, providing evidence that the visual system specifically straightens natural videos, offering a solution for tasks that rely on prediction.
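As a concrete illustration of the central quantity, the following sketch (our own hypothetical example; not the authors' estimator, which is inferred from perceptual judgments) computes the discrete curvature of a trajectory of video frames in the pixel-intensity domain, defined as the average angle between successive displacement vectors:

```python
import numpy as np

def trajectory_curvature(frames):
    """Mean discrete curvature (in degrees) of a sequence of frames.

    Each frame is flattened to a vector x_t; the curvature at frame t
    is the angle between the displacement vectors (x_t - x_{t-1}) and
    (x_{t+1} - x_t). A perfectly straight trajectory has curvature 0.
    """
    x = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    d = np.diff(x, axis=0)                         # displacement vectors
    d /= np.linalg.norm(d, axis=1, keepdims=True)  # normalize to unit length
    cos = np.clip(np.sum(d[:-1] * d[1:], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()

# A sequence that brightens uniformly traces a straight line in
# pixel-intensity space:
line = [t * np.ones((4, 4)) for t in range(5)]
print(trajectory_curvature(line))  # → 0.0
```

A straight trajectory has curvature 0°, an abrupt right-angle turn 90°; per the abstract, natural videos typically lie far from 0° in the intensity domain.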
Data availability
The data supporting the findings of this study are available from the corresponding author on reasonable request.
Code availability
The code used to analyze the data of this study is available from the corresponding author on reasonable request.
Change history
15 May 2019
The original and corrected figures are shown in the accompanying Author Correction.
References
Barlow, H. B. Possible principles underlying the transformation of sensory messages. Sensory Communication (ed. Rosenblith, W.) 217–234 (M.I.T. Press, 1961).
Atick, J. J. & Redlich, A. N. Towards a theory of early visual processing. Neural Comput. 2, 308–320 (1990).
van Hateren, J. H. A theory of maximizing sensory information. Biol. Cybern. 68, 23–29 (1992).
Meister, M., Lagnado, L. & Baylor, D. A. Concerted signaling by retinal ganglion cells. Science 270, 1207–1210 (1995).
Balasubramanian, V. & Berry, M. J. A test of metabolically efficient coding in the retina. Network 13, 531–552 (2002).
Puchalla, J. L., Schneidman, E., Harris, R. A. & Berry, M. J. Redundancy in the population code of the retina. Neuron 46, 493–504 (2005).
Doi, E. et al. Efficient coding of spatial information in the primate retina. J. Neurosci. 32, 16256–16264 (2012).
Hubel, D. H. & Wiesel, T. N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154 (1962).
Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
Bell, A. J. & Sejnowski, T. J. The ‘independent components’ of natural scenes are edge filters. Vision Res. 37, 3327–3338 (1997).
Goris, R. L. T., Simoncelli, E. P. & Movshon, J. A. Origin and function of tuning diversity in macaque visual cortex. Neuron 88, 819–831 (2015).
Rust, N. C. & DiCarlo, J. J. Selectivity and tolerance (‘invariance’) both increase as visual information propagates from cortical area V4 to IT. J. Neurosci. 30, 12978–12995 (2010).
Le Gall, D. MPEG: a video compression standard for multimedia applications. Commun. ACM 34, 46–58 (1991).
Tishby, N., Pereira, F. C. & Bialek, W. The information bottleneck method. In Proc. Allerton Conference on Communication, Control and Computing 37, 368–377 (1999).
Wiskott, L. & Sejnowski, T. J. Slow feature analysis: unsupervised learning of invariances. Neural Comput. 14, 715–770 (2002).
Richthofer, S. & Wiskott, L. Predictable feature analysis. In Proc. IEEE 14th International Conference on Machine Learning and Applications (2016).
Palmer, S. E., Marre, O., Berry, M. J. & Bialek, W. Predictive information in a sensory population. Proc. Natl Acad. Sci. USA 112, 6908–6913 (2015).
DiCarlo, J. J. & Cox, D. D. Untangling invariant object recognition. Trends Cogn. Sci. 11, 333–341 (2007).
Noreen, D. L. Optimal decision rules for some common psychophysical paradigms. Proc. of the Symposium in Applied Mathematics of the American Mathematical Society and the Society for Industrial and Applied Mathematics 13, 237–279 (1981).
Tenenbaum, J. B., De Silva, V. & Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000).
Roweis, S. T. & Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000).
Poole, B., Lahiri, S., Raghu, M., Sohl-Dickstein, J. & Ganguli, S. Exponential expressivity in deep neural networks through transient chaos. Advances in Neural Information Processing Systems 29, 3360–3368 (2016).
Mante, V., Bonin, V. & Carandini, M. Functional mechanisms shaping lateral geniculate responses to artificial and natural stimuli. Neuron 58, 625–638 (2008).
Berardino, A., Ballé, J., Laparra, V. & Simoncelli, E. P. Eigen-distortions of hierarchical representations. Advances in Neural Information Processing Systems 30, 3530–3539 (2017).
Adelson, E. H. & Bergen, J. R. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2, 284 (1985).
Carandini, M. & Heeger, D. J. Normalization as a canonical neural computation. Nat. Rev. Neurosci. 13, 51–62 (2012).
Mallat, S. Group invariant scattering. Commun. Pure Appl. Math. 65, 1331–1398 (2012).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Yamins, D. L. K. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).
Khaligh-Razavi, S. M. & Kriegeskorte, N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput. Biol. 10, e1003915 (2014).
Tacchetti, A., Isik, L. & Poggio, T. Invariant recognition drives neural representations of action sequences. PLoS Comput. Biol. 13, e1005859 (2017).
Hong, H., Yamins, D. L. K., Majaj, N. J. & DiCarlo, J. J. Explicit information for category-orthogonal object properties increases along the ventral stream. Nat. Neurosci. 19, 613–622 (2016).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, 1–9 (2012).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proc. International Conference on Learning Representations 3, 1–14 (2015).
Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. International Conference on Machine Learning 7, 1–9 (2015).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. Conference on Computer Vision and Pattern Recognition 29, 770–778 (2016).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proc. Conference on Computer Vision and Pattern Recognition 30, 2261–2269 (2017).
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
Barlow, H. Redundancy reduction revisited. Network 12, 241–253 (2001).
Machens, C. K., Gollisch, T., Kolesnikova, O. & Herz, A. V. M. Testing the efficiency of sensory coding with optimal stimulus ensembles. Neuron 47, 447–456 (2005).
Geisler, W. S. Visual perception and the statistical properties of natural scenes. Annu. Rev. Psychol. 59, 167–192 (2008).
Bialek, W., De Ruyter Van Steveninck, R. R. & Tishby, N. Efficient representation as a design principle for neural coding and computation. In Proc. International Symposium on Information Theory, 659–663 (2006).
Fukushima, K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybernet. 36, 193–202 (1980).
Serre, T., Oliva, A. & Poggio, T. A feedforward architecture accounts for rapid categorization. Proc. Natl Acad. Sci. USA 104, 6424–6429 (2007).
Bai, Y., et al. Neural straightening of natural videos in macaque primary visual cortex. Soc. Neurosci. Abstr. 485.07 (2018).
Hénaff, O. J. & Simoncelli, E. P. Geodesics of learned representations. In Proc. International Conference on Learning Representations 4, 1–10 (2016).
Hénaff, O. J., Goris, R. L. T. & Simoncelli, E. P. Perceptual evaluation of artificial visual recognition systems using geodesics. Cosyne Abstr. II-72 (2016).
Li, N. & DiCarlo, J. J. Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science 321, 1502–1507 (2008).
Li, N. & DiCarlo, J. J. Unsupervised natural visual experience rapidly reshapes size-invariant object representation in inferior temporal cortex. Neuron 67, 1062–1075 (2010).
Cox, D. D., Meier, P., Oertelt, N. & DiCarlo, J. J. ‘Breaking’ position-invariant object recognition. Nat. Neurosci. 8, 1145–1147 (2005).
Seshadrinathan, K., Soundararajan, R., Bovik, A. C. & Cormack, L. K. Study of subjective and objective quality assessment of video. IEEE Trans. Image Process. 19, 1427–1441 (2010).
Seshadrinathan, K., Soundararajan, R., Bovik, A. C. & Cormack, L. K. A subjective study to evaluate video quality assessment algorithms. In SPIE Proceedings Human Vision and Electronic Imaging, 1–10 (2010).
Wichmann, F. A. & Hill, N. J. The psychometric function: I. Fitting, sampling, and goodness of fit. Percept. Psychophys. 63, 1293–1313 (2001).
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S. & Saul, L. K. Introduction to variational methods for graphical models. Mach. Learn. 37, 183–233 (1999).
Kingma, D. P. & Welling, M. Auto-encoding variational bayes. In Proc. International Conference on Learning Representations 2, 1–14 (2014).
Simoncelli, E. P. & Freeman, W. T. In Proc. 2nd IEEE International Conference on Image Processing, 444–447 (1995).
Green, D. G. Regional variations in the visual acuity for interference fringes on the retina. J. Physiol. 207, 351–356 (1970).
Acknowledgements
We thank S. Palmer and J. Salisbury for making available the video sequences in their Chicago Motion Database. We are also grateful to Y. Bai for helpful comments on the manuscript. This work was supported by the Howard Hughes Medical Institute (O.J.H., R.L.T.G., E.P.S.).
Author information
Contributions
O.J.H., R.L.T.G. and E.P.S. conceived the project and designed the experiments. O.J.H. designed the analysis and performed the experiments. O.J.H., R.L.T.G. and E.P.S. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Journal peer review information: Nature Neuroscience thanks Konrad Kording and other anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
Supplementary Fig. 1 Recovery analysis for curvature estimation.
We simulated 4 observers with different sensitivities, viewing 21 different sequences with varying perceptual curvature, and evaluated our ability to estimate the perceptual curvature from the same amount of data we use in our experiment. Simulated observers’ sensitivities span the range of human sensitivities, and perceptual curvatures vary from 0° to 180°. (a) Greedy two-step estimation, which first estimates the most likely perceptual trajectory and then measures its curvature, is plagued by substantial bias. (b) Our method, which estimates the most likely perceptual curvature given many plausible perceptual trajectories, is largely unbiased.
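The bias in (a) can be reproduced in a toy simulation (a hypothetical sketch, not the estimator used in the paper): measuring the curvature of a noisy estimate of a perfectly straight trajectory yields large spurious curvature, because noise inflates the angles between successive displacements.

```python
import numpy as np

def mean_curvature_deg(x):
    """Mean angle (degrees) between successive displacement vectors."""
    d = np.diff(np.asarray(x, dtype=float), axis=0)
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    cos = np.clip(np.sum(d[:-1] * d[1:], axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()

rng = np.random.default_rng(1)
# A perfectly straight 11-point trajectory in a 50-dimensional space,
# plus observation noise, mimicking a single noisy trajectory estimate.
straight = np.linspace(0.0, 1.0, 11)[:, None] * np.ones(50)
noisy = straight + 0.1 * rng.normal(size=straight.shape)

print(mean_curvature_deg(straight))  # ≈ 0°: truly straight
print(mean_curvature_deg(noisy))     # ≫ 0°: spurious curvature from noise
```

Estimating the curvature directly, marginalizing over plausible trajectories rather than committing to a single noisy one, avoids this upward bias.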
Supplementary Fig. 2 Initial, middle and final frames from the first six natural and artificial sequences used in our experiment.
Natural image sequences follow the top (blue) path, whereas artificial sequences follow the bottom (green) path between the same end-points.
Supplementary Fig. 3 Initial, middle and final frames from the last five natural and artificial sequences used in our experiment.
Natural image sequences follow the top (blue) path, whereas artificial sequences follow the bottom (green) path between the same end-points.
Supplementary Fig. 4 Predictability of natural, artificial and naturalistic sequences, for first, second, third and fourth-order predictors in the intensity and perceptual domains.
Each predictor is fit independently to a sequence in the pixel-intensity and perceptual domains by regressing the previous 2 (first-order), 3 (second-order), 4 (third-order) or 5 (fourth-order) samples onto the next one. We then compare the errors of these predictors in each domain. As expected, higher-order predictors are more accurate than lower-order ones, but all show the same pattern of errors across domains and sequence types. Circles indicate the median across sequences, error bars (where visible) represent the 68% confidence interval. Left: natural sequences (experiment 1, n = 12 sequences). Middle: artificial sequences (experiment 2, n = 12 sequences). Right: naturalistic ‘contrast’ sequences (experiment 3, n = 9 sequences). The ‘control’ trajectories show the same curvature as in the intensity domain, but are otherwise identical to the human observers’ perceptual trajectories.
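The regression described above can be sketched as follows (a minimal least-squares version with hypothetical names; the authors' exact fitting procedure may differ in detail):

```python
import numpy as np

def predictor_error(x, order):
    """Relative error of an order-k linear predictor fit to trajectory x.

    As described above, an order-k predictor regresses the previous
    k + 1 samples onto the next one (first-order uses 2 past samples,
    second-order uses 3, and so on); here it is fit by least squares.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    past = order + 1                       # number of past samples used
    # Build (past-samples, next-sample) pairs from the sequence.
    X = np.stack([x[i:i + past].ravel() for i in range(len(x) - past)])
    y = x[past:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.linalg.norm(X @ w - y) / np.linalg.norm(y)

# A constant-velocity (straight) trajectory is predicted essentially
# perfectly even by a first-order predictor:
straight = np.arange(10.0)[:, None] * np.ones(3)
print(predictor_error(straight, order=1))  # ≈ 0 (machine precision)
```

This makes the link between straightness and predictability concrete: straighter trajectories are better captured by low-order linear extrapolation.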
Supplementary Fig. 5 Changes in curvature in contemporary deep convolutional neural network architectures.
Despite their strong performance in object recognition, none of these architectures straighten natural videos. Circles indicate the median across sequences, error bars representing the 68% confidence interval are smaller than these circles (n = 12 sequences for natural and artificial stimuli, n = 9 sequences for naturalistic ‘contrast’ stimuli). (a) 19-layer VGG architecture34. (b) 19-layer VGG architecture with batch normalization35. (c) 152-layer Residual Network architecture36. (d) 121-layer Dense Network architecture37.
Supplementary Fig. 6 Recovery analysis for multiple simulated populations.
In Figs. 3c, 4c and 5c we show a single, typical, population of simulated controls (whose median change in curvature is depicted here by a gray arrow). The simulation process is inherently variable, as is the subsequent recovery, due to finite numbers of subjects and trials. Here we evaluate the dispersion of the median curvature change across repetitions of the simulation and recovery procedure (gray histogram, n = 5 independent repetitions). Experiments 1 and 2: the curvature change for human observers is much larger than for any of the simulated controls (p < 0.001, two-tailed Z-test). Experiment 3: human observers show increased curvature relative to controls, but much less so than in experiment 2 (p = 0.02, two-tailed Z-test). Together, these experiments show that the typical simulated populations shown in Figs. 3–5 are representative of the distribution across simulated populations.
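The comparison reported above can be sketched as a standard two-tailed Z-test of the observed value against the distribution of simulated control medians (illustrative numbers only, not the paper's data):

```python
import numpy as np
from math import erf, sqrt

def two_tailed_z_test(observed, null_samples):
    """Z score and two-tailed p-value of `observed` against a null sample.

    Standardizes the observed value by the mean and standard deviation of
    the null distribution, then converts |z| to a two-tailed p-value
    under a normal approximation: p = 2 * (1 - Phi(|z|)).
    """
    null_samples = np.asarray(null_samples, dtype=float)
    z = (observed - null_samples.mean()) / null_samples.std(ddof=1)
    p = 1.0 - erf(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Illustration with made-up numbers (not the paper's data): an observed
# median curvature change far outside the simulated control distribution.
rng = np.random.default_rng(0)
controls = rng.normal(loc=0.0, scale=2.0, size=1000)
z, p = two_tailed_z_test(-30.0, controls)
print(z < 0 and p < 0.001)  # → True
```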
About this article
Cite this article
Hénaff, O.J., Goris, R.L.T. & Simoncelli, E.P. Perceptual straightening of natural videos. Nat Neurosci 22, 984–991 (2019). https://doi.org/10.1038/s41593-019-0377-4