Edge co-occurrences can account for rapid categorization of natural versus animal images

Perrinet, Laurent U.; Bednar, James A.

doi:10.1038/srep11400

Article
Open access
Published: 22 June 2015

Scientific Reports volume 5, Article number: 11400 (2015)

Abstract

Making a judgment about the semantic category of a visual scene, such as whether it contains an animal, is typically assumed to involve high-level associative brain areas. Previous explanations require progressively analyzing the scene hierarchically at increasing levels of abstraction, from edge extraction to mid-level object recognition and then object categorization. Here we show that the statistics of edge co-occurrences alone are sufficient to perform a rough yet robust (translation, scale and rotation invariant) scene categorization. We first extracted the edges from images using a scale-space analysis coupled with a sparse coding algorithm. We then computed the “association field” for different categories (natural, man-made, or containing an animal) by computing the statistics of edge co-occurrences. These differed strongly, with animal images having more curved configurations. We show that this geometry alone is sufficient for categorization and that the pattern of errors made by humans is consistent with this procedure. Because these statistics could be measured as early as the primary visual cortex, the results challenge widely held assumptions about the flow of computations in the visual system. The results also suggest new algorithms for image classification and signal processing that exploit correlations between low-level structure and the underlying semantic category.

Introduction

Oriented edges in images of natural scenes tend to be aligned in co-linear or co-circular arrangements, with lines and smooth curves more common than other possible arrangements of edges. The visual system appears to take advantage of this prior knowledge about natural images, with human contour detection and grouping performance well predicted by such an “association field”¹ between edge elements. Geisler et al.² have estimated this prior information available to the visual system by extracting contours from a database of natural images and showed that these statistics could predict behavioral data from humans in a line completion task.

One possible candidate substrate for implementing an association field in mammals is the set of long-range lateral connections between neurons in the primary visual cortex (V1), which could act to detect contours matching the association field³. To fill this role, lateral connections would need to be orientation specific and aligned along contours⁴ and indeed such an arrangement has been found in V1 of the tree shrew³,⁵ and the monkey⁶.

In this paper, we show that an association field of this type can be used for image categorization. Human performance in tasks like determining whether an image contains an animal is surprisingly accurate even when images are presented very briefly⁷. To explain this rapid process, previous researchers have investigated whether low-level features could account for human performance in rapid categorization tasks, but they concluded with general claims that “it is very unlikely that human observers could rely on low-level cues” (SI Table 2)⁸ and that “low-level information [alone] cannot explain human performance” (p. 19)⁹. Here we show that alternative low-level cues, namely the association field between edges represented as early as V1, can achieve image categorization performance comparable to humans and to previous hierarchical models of the visual system. We also show that the images falsely reported as having animals by humans have association fields strongly resembling those of animal images, suggesting that humans are making use of this co-occurrence information when performing rapid image categorizations.

Results

To determine what information edge co-occurrences could provide about image category, we examined how the statistics of edge co-occurrence vary across three image categories. The first two consist of the image databases (600 images each) used by Serre et al.⁸, which contain either animals at different close-up views in a natural setting (which we call “animal images”), or natural images without animals (which we call “non-animal natural images”). A third database consists of self-acquired images from a biology laboratory setting, containing 600 views of a man-made indoor environment in which animals are reared (which we call “man-made images”). These images also do not contain animals, but provide a novel set for control purposes. From a sparse representation of oriented edges at different scales, we define the association field as the four-dimensional histogram of edge co-occurrences p(d, ψ, θ, σ) (see definitions in Fig. 1).
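
As an illustration only, a four-dimensional histogram of this kind could be assembled along the following lines. The sketch assumes that each edge is given as a tuple (x, y, orientation, scale). The exact definitions of d, ψ, θ and σ are those of Fig. 1; the ones used here (distance normalized by the geometric mean of the two scales, azimuth measured relative to the reference edge, difference of orientations, and log ratio of scales) are assumptions, as are the function names, the bin counts and the d_max cutoff.

```python
import math
from collections import Counter
from itertools import permutations


def wrap_half_pi(angle):
    """Wrap an angle to [-pi/2, pi/2): edge orientations are undirected."""
    return (angle + math.pi / 2) % math.pi - math.pi / 2


def pair_features(a, b):
    """Relative geometry (d, psi, theta, sigma) of edge b as seen from edge a.

    Each edge is a tuple (x, y, orientation_in_radians, scale).
    """
    xa, ya, ta, sa = a
    xb, yb, tb, sb = b
    dx, dy = xb - xa, yb - ya
    d = math.hypot(dx, dy) / math.sqrt(sa * sb)  # assumed scale normalization
    psi = wrap_half_pi(math.atan2(dy, dx) - ta)  # azimuth relative to edge a
    theta = wrap_half_pi(tb - ta)                # difference of orientations
    sigma = math.log2(sb / sa)                   # log ratio of scales
    return d, psi, theta, sigma


def angle_bin(angle, n_bins):
    """Index of the bin containing an angle from [-pi/2, pi/2)."""
    return min(int((angle + math.pi / 2) / math.pi * n_bins), n_bins - 1)


def cooccurrence_histogram(edges, n_angle=8, n_dist=4, d_max=32.0, n_scale=3):
    """Normalized 4-D histogram over all ordered pairs of distinct edges."""
    counts = Counter()
    for a, b in permutations(edges, 2):
        d, psi, theta, sigma = pair_features(a, b)
        d_bin = min(int(d / d_max * n_dist), n_dist - 1)
        s_bin = min(max(round(sigma) + n_scale // 2, 0), n_scale - 1)
        key = (d_bin, angle_bin(psi, n_angle), angle_bin(theta, n_angle), s_bin)
        counts[key] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


# Toy edge list (made-up values): three edges along a gentle curve.
edges = [(0.0, 0.0, 0.0, 1.0), (4.0, 1.0, 0.5, 1.0), (7.0, 4.0, 1.0, 2.0)]
histogram = cooccurrence_histogram(edges)
```

Because every quantity is computed relative to a reference edge, translating, rotating or uniformly rescaling all edges leaves the pair features unchanged under these assumed definitions.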

Computing the Kullback-Leibler (KL) divergence between this four-dimensional function and various possible factorizations suggests that we can consider p(d, σ) separately from p(ψ, θ) (see SI Table 1). The distribution of edge distances and relative scales p(d, σ) proved to be quite similar across the different classes of images (see Fig. 3), because these variables depend primarily on the viewpoint and location of the observer rather than on the objects in the scene. The remaining two angular parameters p(ψ, θ) can be visualized using a “chevron map” (see Fig. 2), which indicates the probability of each possible angular configuration between edges. As found in previous work², collinear, parallel and (to some extent) orthogonal configurations appear more commonly than chance. Results for other datasets are broadly similar, but with systematic differences. Figure 3 shows chevron maps for the difference between the non-animal natural image dataset and the other two datasets. In particular, images of man-made environments have more collinear configurations, while images with animals have more highly curved and converging angles and fewer collinear or orthogonal configurations.
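
The factorization test mentioned at the start of this paragraph can be sketched as follows. This is a minimal illustration, assuming the joint histogram is stored as a dictionary mapping bin indices (d, ψ, θ, σ) to probabilities; the function name and the storage format are hypothetical, and the values actually obtained are those reported in SI Table 1.

```python
import math
from collections import defaultdict


def factorization_kl(joint):
    """KL divergence (in bits) between a joint histogram p(d, psi, theta, sigma)
    and the factorized model p(d, sigma) * p(psi, theta).

    `joint` maps (d_bin, psi_bin, theta_bin, sigma_bin) to a probability.
    """
    p_ds = defaultdict(float)  # marginal over (d, sigma)
    p_pt = defaultdict(float)  # marginal over (psi, theta)
    for (d, psi, theta, sigma), p in joint.items():
        p_ds[(d, sigma)] += p
        p_pt[(psi, theta)] += p

    kl = 0.0
    for (d, psi, theta, sigma), p in joint.items():
        if p > 0:
            q = p_ds[(d, sigma)] * p_pt[(psi, theta)]
            kl += p * math.log2(p / q)
    return kl


# Made-up joint in which distance/scale and the angles are fully coupled.
coupled = {(0, 0, 0, 0): 0.5, (1, 1, 1, 1): 0.5}
coupled_cost = factorization_kl(coupled)
```

A value close to zero indicates that little is lost by treating p(d, σ) and p(ψ, θ) separately, whereas a large value indicates that the two pairs of variables are strongly coupled.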

To assess these differences quantitatively, we built a simple classifier to test whether this representation is sufficient to categorize different image categories reliably. For each individual image, we constructed a vector of features as either (FO) the first-order statistics, i.e., the histogram of edge orientations; (CM) the “chevron map” subset of the second-order statistics, i.e., the two-dimensional histogram of relative orientation and azimuth (as in Fig. 2); or (SO) the full, four-dimensional histogram of second-order statistics, i.e., all four parameters of the edge co-occurrences. We gathered these vectors for each class of images and tested a standard Support Vector Machine (SVM) classification algorithm. To allow a direct comparison with Serre et al.⁸, the results of the SVM classifier are reported as an F1 score, which weights false positive and false negative errors equally. This process was cross-validated 20 times by drawing new training and testing sets, which let us measure the variability of the F1 score across trials; this variability was always below about 4%. Results are summarized in Fig. 4.
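
The evaluation protocol (F1 score over 20 random train/test splits) can be sketched as below. To keep the example self-contained, the classifier is a toy nearest-centroid rule, which is explicitly not the SVM used in the paper; an SVM would be passed in through the same fit_predict argument. The function names, the test fraction and the toy data are all assumptions made for illustration.

```python
import random
import statistics


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN): both error types weigh equally."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def nearest_centroid(train_x, train_y, test_x):
    """Toy stand-in classifier (NOT an SVM): label of the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in test_x]


def repeated_f1(features, labels, fit_predict, n_splits=20,
                test_fraction=0.5, seed=0):
    """Mean and standard deviation of F1 over fresh random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        idx = list(range(len(labels)))
        rng.shuffle(idx)
        n_test = int(len(idx) * test_fraction)
        test, train = idx[:n_test], idx[n_test:]
        pred = fit_predict([features[i] for i in train],
                           [labels[i] for i in train],
                           [features[i] for i in test])
        scores.append(f1_score([labels[i] for i in test], pred))
    return statistics.mean(scores), statistics.stdev(scores)


# Made-up, well-separated feature vectors standing in for per-image histograms.
features = [[0.0, 0.1 * i] for i in range(20)] + [[10.0, 0.1 * i] for i in range(20)]
labels = [0] * 20 + [1] * 20
mean_f1, spread = repeated_f1(features, labels, nearest_centroid)
```

The spread returned by repeated_f1 plays the role of the variability across trials quoted above.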

Performance is almost perfect for distinguishing non-animal natural versus laboratory (man-made) images and is still quite high for classifying animal images versus non-animal natural images, a much more subtle distinction. This high level of performance is very surprising, given the explicit claims from Serre et al.⁸ and others that no low-level cue was likely to work well. Results for the chevron map confirm that performance of the classifier comes primarily from a geometrical feature rather than a viewpoint-dependent feature. Note that by definition, our measure of the statistics of edge co-occurrence is invariant to translations, scalings and rotations in the plane of the image. This last property, which is not shared by first-order statistics of edges, makes it possible to explain the rather unintuitive result that categorization in humans is relatively independent of rotations¹⁰. We also performed the same classification where images’ signal-to-noise ratio was halved (“Noise” condition above). Results are degraded but qualitatively similar, as was also observed in psychophysical experiments¹¹.

Finally, if humans use edge co-occurrences to do rapid image categorization, images falsely detected by humans as containing animals should have second-order statistics similar to those of images that do contain animals. Figure 5 shows that the set of the most common false-alarm images does have a chevron map strikingly similar to that for images that do have animals and that on average the false alarm images have second-order histograms closer to the animal than to the non-animal datasets.
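
One way to make “closer” concrete is a divergence between normalized histograms; the Methods Summary names the Jensen–Shannon divergence for this purpose. The sketch below assumes flattened histograms with identical binning, and all counts are made-up toy numbers rather than data from the study; the helper names are hypothetical.

```python
import math


def normalize(counts):
    """Turn raw bin counts into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; bins with p == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and bounded by 1 bit."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def closest_category(hist, references):
    """Name of the reference histogram with the smallest divergence to hist."""
    return min(references, key=lambda name: js_divergence(hist, references[name]))


# Toy flattened histograms (illustrative values only).
references = {
    "animal": normalize([1, 2, 4, 2, 1]),
    "non-animal": normalize([4, 2, 1, 2, 4]),
}
false_alarm = normalize([1, 3, 4, 2, 1])
label = closest_category(false_alarm, references)
```

Mixing the two histograms before taking the Kullback-Leibler terms keeps the measure finite even when one histogram has empty bins that the other does not.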

Discussion

These results call into question previous claims that a hierarchical analysis of the visual scene is necessary for classification into high-level categories⁸. We speculate that the observed differences in second-order statistics for images with animals have an underlying basis in the physical constraints governing the shapes of animals. Specifically, animals typically have compact shapes, constrained by their capacity to move, unlike plants rooted in one location that must stretch rather than move towards resources. Conversely, man-made objects tend to have long, straight lines due to their methods of manufacture. We would expect that other categories of objects could similarly be distinguished by their second-order statistics, assuming that their form follows their function in ways analogous to the categories tested here. Thus we expect that the second-order statistics will be useful as a rough but general and fast method for distinguishing a wide range of scene and object categories. Similar observations apply to other sensory systems, where we would expect co-occurrence between primary sensory elements (such as stimulation of a patch of skin, presence of a specific auditory frequency, or activation of a taste or smell receptor) to differ between ecologically important classes of stimuli.

In this study, we showed that edge co-occurrences were sufficient to distinguish between the animal/non-animal datasets from Serre et al.⁸ and Kirchner and Thorpe¹², with performance comparable to that of humans in rapid categorization tasks. We also showed that these statistics can distinguish between these datasets and scenes of various man-made environments. How well will this approach generalize to other datasets, such as different combinations of animal/non-animal datasets? Our analysis suggests that similar performance should be found for all dataset pairs that have a statistically significant difference in the “roundness” of the contours extracted in the images. Although we expect such differences to be found reliably across the animal/non-animal datasets currently in use, it should be possible to find or construct a non-animal dataset that has edge co-occurrence statistics similar to those of an animal dataset. For such comparisons, the model predicts that human observers would also have trouble rapidly making such a distinction (as suggested by the similar patterns of errors in Fig. 5). Selecting or constructing such image sets and testing them with human observers will be an important way that the performance of this approach can be tested in future studies; even though humans should be able to categorize the images reliably when given enough time for study, the model predicts that they will be unable to do so under the constraints of rapid categorization.

In addition, our results predict that animal measurements of p(ψ, θ) should be dynamically adaptive, as recently reported by McManus et al.¹³ for macaque V1, since p(ψ, θ) varies strongly across environments. The statistics of the dense network of lateral connections in V1 analyzed by Bosking et al.⁵ and Hunt et al.³ suggest that a local representation of these probabilities is available for supporting such computations. We predict that if these patterns of connectivity are built by adapting to the visual statistics, e.g., through Hebbian learning¹⁴, lab-reared animals will have much stronger connections between neurons with collinear preferences than will wild-raised animals. Finally, a straightforward prediction is that neural activity in early visual areas contributes directly to making even seemingly high-level judgments. Indeed, the model suggests that this set of features could be computed locally in these areas and projected to cortical or subcortical areas that mediate fast behavioral responses¹⁵. This prediction could be tested using methods similar to those in Michel et al.¹⁶, by recording neural activity in V1 in animals performing decision-making tasks with images whose curvature distribution has been synthetically modified.

Methods Summary

The first step of our method involves defining the dictionary of templates or filters for detecting edges. We use a log-Gabor representation, which is well suited to represent a wide range of natural images¹⁷ (animal or non-animal). This representation gives a generic model of edges as defined by their shape, orientation and scale. We set the parameters to match what has been reported for simple-cell responses in macaque area V1. This architecture is similar to that used by Geisler et al.² and is detailed in the supplementary material.
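
For orientation, the frequency-domain envelope of a log-Gabor filter in its commonly used form (a Gaussian on a logarithmic frequency axis multiplied by a Gaussian in orientation) can be written as follows. This is a generic sketch, not the parameterization used in the paper: the function name and every default value below are placeholders, and the actual settings are those detailed in the supplementary material.

```python
import math


def log_gabor_envelope(f, phi, f0=0.1, theta0=0.0,
                       bandwidth_ratio=0.65, sigma_theta=math.pi / 8):
    """Amplitude of a log-Gabor filter at radial frequency f and angle phi.

    f0              preferred radial frequency (placeholder value)
    theta0          preferred orientation in radians (placeholder value)
    bandwidth_ratio ratio that sets the radial bandwidth (placeholder value)
    sigma_theta     angular width in radians (placeholder value)
    """
    if f <= 0:
        return 0.0  # a log-Gabor filter has no DC component
    radial = math.exp(-math.log(f / f0) ** 2
                      / (2 * math.log(bandwidth_ratio) ** 2))
    # Angular distance wrapped to (-pi, pi].
    dphi = math.atan2(math.sin(phi - theta0), math.cos(phi - theta0))
    angular = math.exp(-dphi ** 2 / (2 * sigma_theta ** 2))
    return radial * angular


# A small bank: the same envelope at four orientations and three scales.
bank = [(f0, k * math.pi / 4) for f0 in (0.05, 0.1, 0.2) for k in range(4)]
peak_responses = [log_gabor_envelope(f0, th, f0=f0, theta0=th) for f0, th in bank]
```

Varying f0 and theta0 over a grid, as in the small bank above, yields a family of filters indexed by scale and orientation.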

The resulting dictionary of edge filters is over-complete. The linear representation would thus give an inefficient representation of the distribution of edges (and thus of the statistics of edge co-occurrences). Therefore, starting from this linear representation, we searched for the sparsest representation. Because this search is combinatorial and thus very computationally expensive, we approximated a solution using a greedy approach first proposed by Perrinet et al.¹⁸. To validate the categorization performance, we used the standard SVM library as implemented by Pedregosa et al.¹⁹. We used the Jensen–Shannon divergence as a distance measure between histograms²⁰. The results of the SVM classifier are given as the F1 score to directly compare our method to that of Serre et al.⁸.
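
The title of reference 18 mentions matching pursuit, so a generic matching pursuit loop is sketched below to illustrate what a greedy sparse approximation over an over-complete dictionary looks like. It is a textbook version on a toy two-dimensional dictionary, not the authors' implementation, and it assumes unit-norm atoms; the function names and the toy signal are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse approximation of `signal` with unit-norm `dictionary` atoms.

    At each step, pick the atom most correlated with the residual, record its
    coefficient, and subtract its contribution from the residual.
    """
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_atoms):
        correlations = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(correlations[k]))
        c = correlations[best]
        coeffs[best] += c
        residual = [r - c * a for r, a in zip(residual, dictionary[best])]
    return coeffs, residual


# Over-complete toy dictionary in 2-D: two axes plus one diagonal atom.
s = 1 / math.sqrt(2)
dictionary = [[1.0, 0.0], [0.0, 1.0], [s, s]]
signal = [3 * s, 3 * s]  # lies exactly along the diagonal atom
coeffs, residual = matching_pursuit(signal, dictionary, n_atoms=1)
```

In this toy case the linear representation has three non-zero correlations, whereas the greedy search explains the signal with a single coefficient.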

Additional Information

How to cite this article: Perrinet, L. U. and Bednar, J. A. Edge co-occurrences can account for rapid categorization of natural versus animal images. Sci. Rep. 5, 11400; doi: 10.1038/srep11400 (2015).

References

1. Field, D. J., Hayes, A. & Hess, R. F. Contour integration by the human visual system: evidence for a local “association field”. Vision Res. 33, 173–193 (1993) doi: 10.1016/0042-6989(93)90156-Q.
2. Geisler, W., Perry, J., Super, B. & Gallogly, D. Edge co-occurrence in natural images predicts contour grouping performance. Vision Res. 41, 711–724 (2001) doi: 10.1016/s0042-6989(00)00277-7.
3. Hunt, J. J., Bosking, W. H. & Goodhill, G. J. Statistical structure of lateral connections in the primary visual cortex. Neural Sys. & Cir. 1, 3+ (2011) doi: 10.1186/2042-1001-1-3.
4. Hess, R. F., Hayes, A. & Field, D. J. Contour integration and cortical processing. J. Physiol.-Paris. 97, 105–119 (2003) doi: 10.1016/j.jphysparis.2003.09.013.
5. Bosking, W. H., Zhang, Y., Schofield, B. & Fitzpatrick, D. Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. J. Neurosci. 17, 2112–2127 (1997).
6. Sincich, L. C. & Blasdel, G. G. Oriented axon projections in primary visual cortex of the monkey. J. Neurosci. 21, 4416–4426 (2001).
7. Thorpe, S., Fize, D. & Marlot, C. Speed of processing in the human visual system. Nature. 381, 520–522 (1996) doi: 10.1038/381520a0.
8. Serre, T., Oliva, A. & Poggio, T. A feedforward architecture accounts for rapid categorization. PNAS. 104, 6424–6429 (2007) doi: 10.1073/pnas.0700622104.
9. Drewes, J., Trommershauser, J. & Gegenfurtner, K. R. Parallel visual search and rapid animal detection in natural scenes. J. of Vision. 11, (2011) doi: 10.1167/11.2.20.
10. Crouzet, S. M. & Serre, T. What are the visual features underlying rapid object recognition? Front. Psycho. 2, 326+ (2011) doi: 10.3389/fpsyg.2011.00326.
11. Wichmann, F. A., Braun, D. I. & Gegenfurtner, K. R. Phase noise and the classification of natural images. Vision Res. 46, 1520–1529 (2006) doi: 10.1016/j.visres.2005.11.008.
12. Kirchner, H. & Thorpe, S. J. Ultra-rapid object detection with saccadic eye movements: Visual processing speed revisited. Vision Res. 46, 1762–1776 (2006).
13. McManus, J. N. J., Li, W. & Gilbert, C. D. Adaptive shape processing in primary visual cortex. PNAS. 108, 9739–9746 (2011) doi: 10.1073/pnas.1105855108.
14. Bednar, J. A. Building a mechanistic model of the development and function of the primary visual cortex. J. Physiol.-Paris. 106, 194–211 (2012) doi: 10.1016/j.jphysparis.2011.12.001.
15. Rice, G. E., Watson, D. M., Hartley, T. & Andrews, T. J. Low-Level image properties of visual objects predict patterns of neural response across Category-Selective regions of the ventral visual pathway. J. Neurosci. 34, 8837–8844 (2014) doi: 10.1523/jneurosci.5265-13.2014.
16. Michel, M. M., Chen, Y., Geisler, W. S. & Seidemann, E. An illusion predicted by V1 population activity implicates cortical topography in shape perception. Nat. Neurosci. 16, 1477–1483 (2013) doi: 10.1038/nn.3517.
17. Fischer, S., Redondo, R., Perrinet, L. U. & Cristobal, G. Sparse approximation of images inspired from the functional architecture of the primary visual areas. EURASIP J. A. S. P. 2007, 090727–122 (2007) doi: 10.1155/2007/90727.
18. Perrinet, L. U., Samuelides, M. & Thorpe, S. J. Sparse spike coding in an asynchronous feed-forward multi-layer neural network using matching pursuit. Neurocomputing. 57, 125–134 (2004) doi: 10.1016/j.neucom.2004.01.010.
19. Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. M. L. R. 12, 2825–2830 (2011).
20. Cha, S. H. & Srihari, S. N. On measuring the distance between histograms. Pattern Recogn. 35, 1355–1370 (2002) doi: 10.1016/s0031-3203(01)00118-2.

Acknowledgements

L.U.P. was supported by EC IP project FP7-269921, “BrainScaleS” and ANR project ANR-13-BSV4-0014-02 “ANR BalaV1”. Thanks to David Fitzpatrick for allowing J.A.B. access to the laboratories in which the man-made images were taken.

Author information

Authors and Affiliations

Institut de Neurosciences de la Timone, CNRS / Aix-Marseille Université,
Laurent U. Perrinet
Institute for Adaptive and Neural Computation, University of Edinburgh,
James A. Bednar

Contributions

L.U.P. and J.A.B. wrote the main manuscript text and L.U.P. prepared figures. All authors reviewed the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Received: 08 October 2014
Accepted: 13 May 2015
