Sighted animals extract motion information from visual scenes by processing spatiotemporal patterns of light falling on the retina. The dominant models for motion estimation exploit intensity correlations only between pairs of points in space and time. Moving natural scenes, however, contain more complex correlations. We found that fly and human visual systems encode the combined direction and contrast polarity of moving edges using triple correlations that enhance motion estimation in natural environments. Both species extracted triple correlations with neural substrates tuned for light or dark edges, and sensitivity to specific triple correlations was retained even as light and dark edge motion signals were combined. Thus, both species separately process light and dark image contrasts to capture motion signatures that can improve estimation accuracy. This convergence argues that statistical structures in natural scenes have greatly affected visual processing, driving a common computational strategy over 500 million years of evolution.
Field, D.J. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4, 2379–2394 (1987).
Ruderman, D.L. & Bialek, W. Statistics of natural images: scaling in the woods. Phys. Rev. Lett. 73, 814–817 (1994).
Simoncelli, E.P. & Olshausen, B.A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
Hassenstein, B. & Reichardt, W. Systemtheoretische Analyse der Zeit-, Reihenfolgen-und Vorzeichenauswertung bei der Bewegungsperzeption des Rüsselkäfers Chlorophanus. Z. Naturforsch. 11, 513–524 (1956).
Adelson, E.H. & Bergen, J. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2, 284–299 (1985).
Fitzgerald, J.E., Katsov, A.Y., Clandinin, T.R. & Schnitzer, M.J. Symmetries in stimulus statistics shape the form of visual motion estimators. Proc. Natl. Acad. Sci. USA 108, 12909–12914 (2011).
Anderson, J.M.M. & Giannakis, G.B. Image motion estimation algorithms using cumulants. IEEE Trans. Image Process. 4, 346–357 (1995).
Sayrol, E., Gasull, A. & Fonollosa, J.R. Motion estimation using higher order statistics. IEEE Trans. Image Process. 5, 1077–1084 (1996).
Rust, N.C., Mante, V., Simoncelli, E.P. & Movshon, J.A. How MT cells analyze the motion of visual patterns. Nat. Neurosci. 9, 1421–1431 (2006).
Borst, A., Haag, J. & Reiff, D.F. Fly motion vision. Annu. Rev. Neurosci. 33, 49–70 (2010).
van Santen, J.P.H. & Sperling, G. Elaborated Reichardt detectors. J. Opt. Soc. Am. A 2, 300–321 (1985).
Theobald, J.C., Duistermars, B.J., Ringach, D.L. & Frye, M.A. Flies see second-order motion. Curr. Biol. 18, R464–R465 (2008).
Zanker, J.M. Theta motion: a paradoxical stimulus to explore higher order motion extraction. Vision Res. 33, 553–569 (1993).
Hu, Q. & Victor, J.D. A set of high-order spatiotemporal stimuli that elicit motion and reverse-phi percepts. J. Vis. 10, 9.1–9.16 (2010).
Lu, Z.L. & Sperling, G. Three-systems theory of human visual motion perception: review and update. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 18, 2331–2370 (2001).
Victor, J.D. & Conte, M.M. Evoked potential and psychophysical analysis of Fourier and non-Fourier motion mechanisms. Vis. Neurosci. 9, 105–123 (1992).
Geisler, W.S. Visual perception and the statistical properties of natural scenes. Annu. Rev. Psychol. 59, 167–192 (2008).
Dror, R.O., O'Carroll, D.C. & Laughlin, S.B. Accuracy of velocity estimation by Reichardt correlators. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 18, 241–252 (2001).
van Hateren, J.H. & van der Schaaf, A. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. Biol. Sci. 265, 359–366 (1998).
Stocker, A.A. & Simoncelli, E.P. Noise characteristics and prior expectations in human visual speed perception. Nat. Neurosci. 9, 578–585 (2006).
Katsov, A.Y. & Clandinin, T. Motion processing streams in Drosophila are behaviorally specialized. Neuron 59, 322–335 (2008).
Joesch, M., Schnell, B., Raghu, S., Reiff, D. & Borst, A. ON and OFF pathways in Drosophila motion vision. Nature 468, 300–304 (2010).
Clark, D.A., Bursztyn, L., Horowitz, M.A., Schnitzer, M.J. & Clandinin, T.R. Defining the computational structure of the motion detector in Drosophila. Neuron 70, 1165–1177 (2011).
Silies, M. et al. Modular use of peripheral input channels tunes motion-detecting circuitry. Neuron 79, 111–127 (2013).
Buchner, E. Elementary movement detectors in an insect visual system. Biol. Cybern. 24, 85–101 (1976).
Tuthill, J.C., Chiappe, M.E. & Reiser, M.B. Neural correlates of illusory motion perception in Drosophila. Proc. Natl. Acad. Sci. USA 108, 9685–9690 (2011).
Eichner, H., Joesch, M., Schnell, B., Reiff, D.F. & Borst, A. Internal structure of the fly elementary motion detector. Neuron 70, 1155–1164 (2011).
Tuthill, J.C., Nern, A., Holtz, S.L., Rubin, G.M. & Reiser, M.B. Contributions of the 12 neuron classes in the fly lamina to motion vision. Neuron 79, 128–140 (2013).
Westheimer, G. The ON-OFF dichotomy in visual processing: From receptors to perception. Prog. Retin. Eye Res. 26, 636–648 (2007).
Ales, J.M. & Norcia, A.M. Assessing direction-specific adaptation using the steady-state visual evoked potential: results from EEG source imaging. J. Vis. 9, 8 (2009).
Moulden, B. & Begg, H. Some tests of the Marr-Ullman model of movement detection. Perception 15, 139 (1986).
Mather, G., Moulden, B. & O'Halloran, A. Polarity-specific adaptation to motion in the human visual system. Vision Res. 31, 1013–1019 (1991).
Emerson, R.C., Bergen, J.R. & Adelson, E.H. Directionally selective complex cells and the computation of motion energy in cat visual cortex. Vision Res. 32, 203–218 (1992).
Nishida, S.y. Advancement of motion psychophysics: review 2001–2010. J. Vis. 11, 11 (2011).
Quenzer, T. & Zanker, J. Visual detection of paradoxical motion in flies. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 169, 331–340 (1991).
Ilg, U.J. & Churan, J. Motion perception without explicit activity in areas MT and MST. J. Neurophysiol. 92, 1512–1523 (2004).
Hedges, J.H. et al. Dissociation of neuronal and psychophysical responses to local and global motion. Curr. Biol. 21, 2023–2028 (2011).
Ratliff, C.P., Borghuis, B.G., Kao, Y.-H., Sterling, P. & Balasubramanian, V. Retina is structured to process an excess of darkness in natural scenes. Proc. Natl. Acad. Sci. USA 107, 17368–17373 (2010).
Barlow, H.B. & Levick, W.R. The mechanism of directionally selective units in rabbit's retina. J. Physiol. (Lond.) 178, 477 (1965).
Ibn-elhaj, E., Aboutajdine, D., Pateux, S. & Morin, L. HOS-based method of global motion estimation for noisy image sequences. Electron. Lett. 35, 1320–1322 (1999).
Schiller, P.H., Finlay, B.L. & Volman, S.F. Quantitative studies of single-cell properties in monkey striate cortex. I. Spatiotemporal organization of receptive fields. J. Neurophysiol. 39, 1288 (1976).
Potetz, B. & Lee, T.S. Statistical correlations between two-dimensional images and three-dimensional structures in natural scenes. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 20, 1292–1303 (2003).
Wiederman, S.D., Shoemaker, P.A. & O'Carroll, D.C. A model for the detection of moving targets in visual clutter inspired by insect physiology. PLoS ONE 3, e2784 (2008).
Poggio, T. & Reichardt, W. Visual control of orientation behaviour in the fly. Q. Rev. Biophys. 9, 377–438 (1976).
Card, G. & Dickinson, M.H. Visually mediated motor planning in the escape response of Drosophila. Curr. Biol. 18, 1300–1307 (2008).
Harris, J.M. & Parker, A.J. Independent neural mechanisms for bright and dark information in binocular stereopsis. Nature 374, 808–811 (1995).
Samonds, J.M., Potetz, B.R. & Lee, T.S. Relative luminance and binocular disparity preferences are correlated in macaque primary visual cortex, matching natural scene statistics. Proc. Natl. Acad. Sci. USA 109, 6313–6318 (2012).
Bialek, W. Physical limits to sensation and perception. Annu. Rev. Biophys. Biophys. Chem. 16, 455–478 (1987).
Laughlin, S.B. Energy as a constraint on the coding and processing of sensory information. Curr. Opin. Neurobiol. 11, 475–480 (2001).
Laughlin, S.B. & Sejnowski, T.J. Communication in neuronal networks. Science 301, 1870–1874 (2003).
Potters, M. & Bialek, W. Statistical mechanics and visual signal processing. J. Phys. I 4, 1755–1775 (1994).
Juusola, M., Uusitalo, R. & Weckström, M. Transfer of graded potentials at the photoreceptor-interneuron synapse. J. Gen. Physiol. 105, 117 (1995).
Gohl, D.M. et al. A versatile in vivo system for directed dissection of gene expression patterns. Nat. Methods 8, 231–237 (2011).
Kitamoto, T. Conditional modification of behavior in Drosophila by targeted expression of a temperature-sensitive shibire allele in defined neurons. J. Neurobiol. 47, 81–92 (2001).
Stavenga, D.G. Angular and spectral sensitivity of fly photoreceptors. II. Dependence on facet lens F-number and rhabdomere type in Drosophila. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 189, 189–202 (2003).
Palomares, M., Ales, J.M., Wade, A.R., Cottereau, B.R. & Norcia, A.M. Distinct effects of attention on the neural responses to form and motion processing: A SSVEP source-imaging study. J. Vis. 12, 15 (2012).
Berens, P. CircStat: A MATLAB toolbox for circular statistics. J. Stat. Softw. 31, 1–21 (2009).
Ales, J.M., Farzin, F., Rossion, B. & Norcia, A.M. An objective method for measuring face detection thresholds using the sweep steady-state visual evoked response. J. Vis. 12, 18 (2012).
Gardner, M. Mathematical games: the fantastic combinations of John Conway's new solitaire game, 'life'. Sci. Am. 223, 120–123 (1970).
We thank J. Burge, P. Varghese, B. Wandell and members of the Clandinin laboratory for helpful comments on this manuscript. D.A.C. was supported by a US National Institutes of Health T32 Vision Research Training grant and a postdoctoral fellowship from the Jane Coffin Childs Foundation. J.E.F. was supported by a National Science Foundation Graduate Research Fellowship and by NSF-0801700. J.M.A. and A.M.N. were supported by a grant from the US National Institutes of Health (EY015790). D.M.G. was supported by a postdoctoral fellowship from the US National Institutes of Health, and M.A.S. was supported by a postdoctoral fellowship from the Jane Coffin Childs Foundation. In T.R.C.'s laboratory, this work was supported by a US National Institutes of Health Director's Pioneer Award (DP1 OD003530) and by R01 EY022638.
The authors declare no competing financial interests.
Integrated supplementary information
(a) The classical Hassenstein-Reichardt Correlator (HRC). In this paper, h1(x) and h2(x) were modeled as Gaussian spatial acceptance filters (centered on different points in visual space), f(t) was a low-pass filter, and g(t) was a high-pass filter. In essence, the HRC multiplies delayed values of the contrast with current contrast values across two spatial points. The subtraction stage results in mirror symmetry, thereby enabling responses to both rightward and leftward motion. (b) The classical motion energy (ME) model applies several oriented spatiotemporal filters to the visual input. These filtered signals are subsequently squared and linearly combined to compute the 'motion energy.' (c) To compute the 2- and 3-point correlation images, we computed the product with the rightward orientation (left) and then subtracted the mirror symmetric product with a leftward orientation (center). This results in the images on the right, which are also displayed in Figure 1b. (d) HRC-like diagrams for the diverging and converging 3-point correlators used in Figure 1d.
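As a rough illustration of the correlator described in panel (a), the following Python sketch multiplies a delayed contrast signal at one point with the undelayed signal at a neighboring point and subtracts the mirror-symmetric product. It is a simplified sketch rather than the model used in the paper: the Gaussian spatial filters h1 and h2 are replaced by two point samples, the high-pass filter g(t) is omitted, f(t) is assumed to be a first-order low-pass filter, and the function names and parameter values (time constant, point spacing, grating wavelength) are hypothetical.

```python
import numpy as np

def lowpass(signal, tau, dt):
    """First-order low-pass filter; stands in for the delay filter f(t) (assumed form)."""
    alpha = dt / (tau + dt)
    out = np.zeros(len(signal), dtype=float)
    for i in range(1, len(signal)):
        out[i] = out[i - 1] + alpha * (signal[i] - out[i - 1])
    return out

def hrc_output(c1, c2, tau=0.05, dt=0.001):
    """Opponent correlator: (delayed c1 * c2) minus the mirror-symmetric (c1 * delayed c2)."""
    return lowpass(c1, tau, dt) * c2 - c1 * lowpass(c2, tau, dt)

def mean_hrc_response(velocity, spacing=5.0, wavelength=30.0, duration=2.0, dt=0.001):
    """Time-averaged correlator output for a sinusoidal grating drifting at `velocity`.

    Point 1 sits at x = 0 and point 2 at x = spacing (hypothetical values, in degrees);
    positive velocity means motion from point 1 toward point 2.
    """
    t = np.arange(0.0, duration, dt)
    k = 2.0 * np.pi / wavelength
    c1 = np.sin(k * (0.0 - velocity * t))
    c2 = np.sin(k * (spacing - velocity * t))
    return hrc_output(c1, c2, dt=dt).mean()

if __name__ == "__main__":
    for v in (30.0, 0.0, -30.0):
        print(v, mean_hrc_response(v))
```

Under these assumptions, the time-averaged output changes sign with the direction of motion and vanishes for a stationary grating, which is the mirror antisymmetry that the subtraction stage provides.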
Supplementary Figure 2 Triple correlations only signify motion when the stimulus is light-dark asymmetric.
Each row presents a comparison between correlational motion signatures. Columns present: (i) context for each comparison; (ii) properties of pairwise motion estimators; (iii) properties of diverging 3-point estimators; and (iv) properties of converging 3-point estimators. (ai) Motion is approximated by the rigid translation of images. (aii-aiv) Cartoon of the correlation structure that each estimator detects. (bi) Example sinusoidal grating. (bii) Pair correlations signified motion across the image. (biii-biv) Triple correlations depended on the local phase of the sinusoidal grating and spatially averaged to zero. (ci) Example asymmetric grating. The luminance at each point in space was the luminance of the example sinusoidal grating raised to the tenth power. (cii) Pair correlations varied across the image. (ciii-civ) Triple correlations still depended on the local phase of the grating, but their spatial average was nonzero. (di) Cartoon of an ensemble of sinusoidal gratings that vary in period and phase. (dii-div) The accuracy with which correlations convey motion was examined across this ensemble. The performance of each estimator was quantified through the Pearson correlation between the estimator output and the simulated velocity. We linearly combined estimators to quantify the improvements afforded by multiple correlational signals. Neither spatial averaging nor triple correlations improved the motion estimate. (e) Same as (d), but for asymmetric gratings. In this case, both spatial averaging and triple correlations improved the accuracy of motion estimation. The numbers above each bar denote the fractional increase with respect to the 2-point estimate. Error bars are standard deviations over cross-validating trials (see Online Methods).
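The contrast between rows (b) and (c) can be reproduced in a small numerical sketch. The code below translates a sinusoidal grating and its tenth-power counterpart rigidly by one pixel per frame, then spatially averages a mirror-subtracted pair correlator and a mirror-subtracted 3-point correlator. Several details are assumptions made for illustration and are not taken from the paper: the grating wavelength, the one-pixel-per-frame speed, the periodic boundary, the contrast definition (luminance divided by its mean, minus one), and in particular the 3-point geometry, which is taken here to be c(x,t)·c(x,t+1)·c(x+1,t+1) for the rightward element.

```python
import numpy as np

def contrast_from_luminance(lum):
    """Contrast relative to the mean luminance (assumed definition)."""
    return lum / lum.mean() - 1.0

def translate(pattern, n_frames):
    """Frames c[t, x] of a pattern moving rightward by 1 pixel per frame (periodic)."""
    return np.stack([np.roll(pattern, t) for t in range(n_frames)])

def net_pair(c):
    """Spatiotemporal average of c(x,t)c(x+1,t+1) minus its mirror image."""
    now, nxt = c[:-1], c[1:]
    right = now * np.roll(nxt, -1, axis=1)
    left = np.roll(now, -1, axis=1) * nxt
    return (right - left).mean()

def net_diverging(c):
    """Average of c(x,t)c(x,t+1)c(x+1,t+1) minus its mirror image (assumed geometry)."""
    now, nxt = c[:-1], c[1:]
    right = now * nxt * np.roll(nxt, -1, axis=1)
    left = np.roll(now, -1, axis=1) * np.roll(nxt, -1, axis=1) * nxt
    return (right - left).mean()

def grating_signals(power, wavelength=32, n_periods=4, n_frames=16):
    """Return (pair, triple) net signals for a sinusoidal luminance raised to `power`."""
    x = np.arange(wavelength * n_periods)
    lum = ((1.0 + np.sin(2.0 * np.pi * x / wavelength)) / 2.0) ** power
    c = translate(contrast_from_luminance(lum), n_frames)
    return net_pair(c), net_diverging(c)

if __name__ == "__main__":
    print("sinusoidal grating:", grating_signals(power=1))
    print("asymmetric grating:", grating_signals(power=10))
```

In this sketch the pair signal is positive for both gratings, whereas the spatially averaged triple signal is zero (up to floating-point error) for the sinusoid and nonzero only for the light-dark asymmetric grating.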
Rows a-c present a comparison between correlational motion signatures, when (a) the contrast gain is set locally by considering the average luminance over one degree squares of pixels, (b) the contrast gain is set locally by considering the average luminance over five degree squares of pixels, (c) the contrast gain is set globally by considering the average luminance over the full image. Columns apply to rows a-c and present: (i) example local luminance average, which sets the contrast gain; (ii) accuracy of pairwise motion estimators; (iii) accuracy of diverging 3-point estimators; and (iv) accuracy of converging 3-point estimators. Columns (ii)-(iv) are of the same format as in Figs. 1 and S2, and Figs. S3cii-civ are identical to Figs. 1cii-iv. Error bars are standard deviations over cross-validating trials (see Online Methods). (d) Contrast histograms when the contrast gain was determined by averaging over one degree squares of pixels (left), five degree squares of pixels (center), or the full visual field (right). The mean (i.e. c̄ = 〈c〉), variance (i.e. 〈(c−c̄)²〉), and third central moment (i.e. 〈(c−c̄)³〉) are shown alongside each histogram.
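The three histogram statistics in panel (d) can be computed as in the following sketch, which also shows one way to set the contrast gain locally or globally. The block-averaging scheme is an assumption: luminance is averaged over non-overlapping square blocks of pixels, whereas the paper specifies only that averages were taken over squares of a given angular size. The function names, the block size in pixels, and the synthetic test image are hypothetical.

```python
import numpy as np

def block_mean(lum, block):
    """Average luminance over non-overlapping block x block squares (assumed scheme)."""
    h, w = lum.shape
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    means = lum.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return np.repeat(np.repeat(means, block, axis=0), block, axis=1)

def contrast(lum, block=None):
    """Contrast relative to a local (block) or global (block=None) mean luminance."""
    ref = np.full(lum.shape, lum.mean()) if block is None else block_mean(lum, block)
    return (lum - ref) / ref

def moments(c):
    """Mean, variance and third central moment of a contrast array."""
    mean = c.mean()
    dev = c - mean
    return mean, (dev ** 2).mean(), (dev ** 3).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lum = np.exp(rng.normal(size=(64, 64)))  # synthetic, positively skewed luminance
    print("global gain:", moments(contrast(lum)))
    print("local gain: ", moments(contrast(lum, block=8)))
```

A nonzero third central moment indicates a light-dark asymmetric contrast distribution, which is the property that the triple correlations rely on.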
(a) Diagrams of the update rules that generate glider stimuli (see also Online Methods). Given a seed row and a seed column of pixel contrasts (upper left), the glider update rules fill in all remaining pixels, one row at a time, to generate an instantiation of the glider. The red points in the diagrams exemplify the update rule for each glider. The illustrative choices are not special, as any such pixel combinations will obey the update rule by construction. (b) Within a 2-point glider, all 3-point correlations average to 0. Similarly, within a 3-point glider, 2-point correlations (and the other 3-point correlations) average to 0. (c) Example space-time plots of the glider stimuli. (d) The ON/OFF model proposed in Eichner et al. (ref. 27) correctly predicted the signs of the 2-point glider responses but did not predict the observed 3-point glider responses. Error bars are SEM, as in Fig. 3.
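The fill-in procedure of panel (a) can be sketched in Python for binary contrasts of ±1. The exact pixel geometry of each update rule is shown in the figure and is not restated in the caption, so the rules below are assumptions chosen for illustration: the 2-point rule links c(x−1,t−1) to c(x,t), the "diverging" rule links one pixel at the earlier time to two at the later time, and the "converging" rule links two pixels at the earlier time to one at the later time. The function name, array sizes, and random seed are hypothetical.

```python
import numpy as np

def make_glider(kind, parity, n_t=64, n_x=64, seed=0):
    """Fill a space-time array c[t, x] of +/-1 contrasts from a seed row and seed column.

    `parity` (+1 or -1) is the value that the product of the linked pixels must take.
    The geometries below are illustrative assumptions, not the paper's specification.
    """
    rng = np.random.default_rng(seed)
    c = np.zeros((n_t, n_x), dtype=int)
    c[0, :] = rng.choice([-1, 1], size=n_x)  # seed row (first time point)
    c[:, 0] = rng.choice([-1, 1], size=n_t)  # seed column (first spatial position)
    for t in range(1, n_t):                  # one row at a time
        for x in range(1, n_x):
            if kind == "pair":
                c[t, x] = parity * c[t - 1, x - 1]
            elif kind == "diverging":
                c[t, x] = parity * c[t - 1, x - 1] * c[t, x - 1]
            elif kind == "converging":
                c[t, x] = parity * c[t - 1, x - 1] * c[t - 1, x]
            else:
                raise ValueError(f"unknown glider kind: {kind}")
    return c

def pair_product(c):
    """c(x-1,t-1) * c(x,t) at every interior pixel."""
    return c[:-1, :-1] * c[1:, 1:]

def diverging_product(c):
    """c(x-1,t-1) * c(x-1,t) * c(x,t) at every interior pixel."""
    return c[:-1, :-1] * c[1:, :-1] * c[1:, 1:]

def converging_product(c):
    """c(x-1,t-1) * c(x,t-1) * c(x,t) at every interior pixel."""
    return c[:-1, :-1] * c[:-1, 1:] * c[1:, 1:]

if __name__ == "__main__":
    g = make_glider("diverging", parity=-1)
    print("enforced 3-point average:", diverging_product(g).mean())
    print("2-point average in the same glider:", pair_product(g).mean())
```

Because every contrast is ±1, solving the parity constraint for the newest pixel yields a deterministic update, so the enforced correlation equals the chosen parity at every interior pixel by construction, as the caption notes.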
Supplementary Figure 5 Drosophila respond to several triple correlations involving two points in space and three points in time.
The correlation structures for each glider and sample space-time intensity plots are shown at top, and the behavioral responses (relative to the positive 2-pt glider response) are shown below. We found that flies respond less strongly to these gliders than to the diverging and converging gliders (compare to Fig. 3d). We measured statistically significant responses in only 2 of 6 cases (two-tailed t-test, '*' corresponds to p = 2.0×10⁻² (t₁₅ = 2.6) and '**' corresponds to p = 3.7×10⁻³ (t₁₅ = 3.4)). Error bars are SEM and N = 16.
We first counted the number of times that each correlational element appeared in right and left-moving edges of the same polarity. The difference between the rightward and leftward counts provided an edge-specific directional signal (denoted “Net” in the figure). As expected, these directional signals depended on edge type for 3-point correlational elements but not for 2-point correlational elements. In particular, the sign of each 3-point directional signal inverted when the edge polarity inverted. In Figure 5, we used these directional signals as linear weighting coefficients to predict Drosophila behavioral responses to moving edges from measured glider responses.
(a) Space-time plots of the two mirror-symmetric opposing edge adapters. The light and dark edges moved in opposite directions and are highlighted by the green and purple lines, respectively. Each presentation of the adapters had a random spatial phase. See also movie M1 for examples of the adapters. The probe temporally alternated between adapter A and adapter B. The spatial period of the adapting stimuli was doubled in the probe stimulus, but the speed of the edges remained the same. Green and purple lines highlight moving edges and show that the directions of the light and dark edges invert during the two halves of the probe. The probe was always presented with the same spatial phase. See also movie M2 for an example of the probe. (b) Evoked response waveforms for the probe stimuli under unadapted and adapted conditions. Data were only from the time interval shown in Figure 6a (and in panel c below). The data show that responses to identical stimuli were differentially affected by the identity of the adapter. (c) Strength of the first harmonic response as a function of time after the end of the adapting period. Both adapted responses were above baseline for approximately 5 seconds. Error patches on lines represent 1 SEM. (d) Phase-amplitude plots for the adapted EEG responses shown in Figure 6c. Ellipses represent 1 SEM.
(a-b) Space-time intensity plots of the adapters and gliders. (a) Subjects' visual systems were adapted with a static adapter, with opposing edge motion, with light edge motion, and with dark edge motion. Each presentation of the adapters had a random spatial phase. See also movie M1 for an example opposing edge stimulus. (b) Space-time plots corresponding to a single row of each glider stimulus (shown after adaptation). All centroids move to the right in the top row and to the left in the bottom row. See also movie M3 for example glider stimuli. (c) Individual subject responses to the glider stimuli following adaptation to the static adapter. Individual subjects' responses are coded by color. The underlying bar plot shows subject means and SEMs for each glider. (d) Two out of nine subjects perceived an overwhelming motion after-effect after adaptation to light or dark edge motion. Shown here are one of these subjects' responses to the gliders after adaptation to rightward-moving light edges. All responses were to the left, a result that we interpreted as a motion after-effect resulting from net motion in the adapter. The opposing edge adapter avoided this problem.
Supplementary Figures 1–8, Supplementary Table 1, Supplementary Modeling, Supplementary Movies 1–3 (PDF 13460 kb)
This stimulus was designed to be equiluminant in time. Light edges move to the right, and dark edges move to the left. In the EEG experiment, this stimulus was full screen. See Fig. S7 for a space-time diagram of this stimulus. (MOV 2524 kb)
This stimulus consisted of two alternating versions of the opposing edge stimuli. Light edges moved to the right for half of the probe and to the left for the other half. Dark and light edges always moved in opposite directions. See Fig. S7 for a space-time diagram of this stimulus. (MOV 1031 kb)
Here each 3-pt glider stimulus is presented sequentially: positive diverging, negative diverging, positive converging, and negative converging. All glider centroids moved to the right. Most subjects perceived leftward motion in the second glider and rightward motion in the first, third, and fourth gliders (see Figure 7 and S8). (MOV 12452 kb)
Clark, D., Fitzgerald, J., Ales, J. et al. Flies and humans share a motion estimation strategy that exploits natural scene statistics. Nat Neurosci 17, 296–303 (2014). https://doi.org/10.1038/nn.3600