Letter perception emerges from unsupervised deep learning and recycling of natural image features

Testolin, Alberto; Stoianov, Ivilin; Zorzi, Marco

doi:10.1038/s41562-017-0186-2

Letter
Published: 21 August 2017

Nature Human Behaviour volume 1, pages 657–664 (2017)

A Correction to this article was published on 02 November 2017

Abstract

The use of written symbols is a major achievement of human cultural evolution. However, how abstract letter representations might be learned from vision is still an unsolved problem^1,2. Here, we present a large-scale computational model of letter recognition based on deep neural networks^3,4, which develops a hierarchy of increasingly complex internal representations in a completely unsupervised way by fitting a probabilistic, generative model to the visual input^5,6. In line with the hypothesis that learning written symbols partially recycles pre-existing neuronal circuits for object recognition^7, earlier processing levels in the model exploit domain-general visual features learned from natural images, while domain-specific features emerge in upstream neurons following exposure to printed letters. We show that these high-level representations can be easily mapped to letter identities even for noise-degraded images, producing accurate simulations of a broad range of empirical findings on letter perception in human observers. Our model shows that by reusing natural visual primitives, learning written symbols only requires limited, domain-specific tuning, supporting the hypothesis that their shape has been culturally selected to match the statistical structure of natural environments^8.
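As a rough illustration of what fitting a probabilistic, generative model to visual input without supervision can look like, the sketch below trains a single restricted Boltzmann machine with one-step contrastive divergence. The abstract does not name the learning algorithm, so the choice of an RBM with CD-1 is an assumption, suggested only by the contrastive divergence paper in the reference list. The class `TinyRBM`, the two 6-pixel patterns, and all hyperparameters are hypothetical, and plain Python is used so the example runs without extra libraries.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class TinyRBM:
    """Binary RBM trained with one-step contrastive divergence (illustrative only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        # Small random weights; biases start at zero.
        self.w = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.bv = [0.0] * n_visible
        self.bh = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.bh[j] + sum(v[i] * self.w[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_probs(self, h):
        return [sigmoid(self.bv[i] + sum(h[j] * self.w[i][j] for j in range(self.nh)))
                for i in range(self.nv)]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        # Positive phase: hidden activity driven by the data.
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)
        # Negative phase: one reconstruction step from the sampled hidden state.
        v1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(v1)
        # Move toward data correlations and away from reconstruction correlations.
        for i in range(self.nv):
            for j in range(self.nh):
                self.w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        for i in range(self.nv):
            self.bv[i] += lr * (v0[i] - v1[i])
        for j in range(self.nh):
            self.bh[j] += lr * (ph0[j] - ph1[j])

    def reconstruction_error(self, data):
        # Mean squared error of a deterministic up-down pass.
        total = 0.0
        for v in data:
            recon = self.visible_probs(self.hidden_probs(v))
            total += sum((a - b) ** 2 for a, b in zip(v, recon))
        return total / (len(data) * self.nv)


# Toy "images": two 6-pixel binary patterns (hypothetical stand-ins for image patches).
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
rbm = TinyRBM(n_visible=6, n_hidden=2)
error_before = rbm.reconstruction_error(data)
for _ in range(1000):
    for v in data:
        rbm.cd1_update(v)
error_after = rbm.reconstruction_error(data)
print(f"mean squared reconstruction error: {error_before:.3f} -> {error_after:.3f}")
```

No labels are used at any point: the weights change only so that the network reconstructs its input better. A hierarchy like the one described in the abstract could be approximated by training a second such layer on the hidden activities of the first, and a simple supervised read-out could then map the top-level activities to letter identities; neither step is shown here.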

**Fig. 1: Deep learning architecture and examples of natural image and printed letter data.**

**Fig. 2: Emergent neuronal receptive fields, representational selectivity, and letter identification accuracy in the model.**

**Fig. 3: Simulations of human psychophysical studies.**

**Fig. 4: Spatial-frequency analysis of perceptual channel mediating letter identification.**

Change history

02 November 2017
In the version of this Letter originally published, in the sentence beginning “Written symbols are culture specific...”, in the second example, ‘Φ’ was used instead of ‘F’; it should have read ‘(for example, ℱ versus F)’. This has now been corrected in all versions of the Letter.

References

Grainger, J., Rey, A. & Dufau, S. Letter perception: from pixels to pandemonium. Trends Cogn. Sci. 12, 381–387 (2008).
Finkbeiner, M. & Coltheart, M. Letter recognition: from perception to representation. Cogn. Neuropsychol. 26, 1–6 (2009).
LeCun, Y., Bengio, Y. & Hinton, G. E. Deep learning. Nature 521, 436–444 (2015).
Hinton, G. E. & Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
Zorzi, M., Testolin, A. & Stoianov, I. Modeling language and cognition with deep unsupervised learning: a tutorial overview. Front. Psychol. 4, 515 (2013).
Hinton, G. E. Learning multiple layers of representation. Trends Cogn. Sci. 11, 428–434 (2007).
Dehaene, S. & Cohen, L. Cultural recycling of cortical maps. Neuron 56, 384–398 (2007).
Changizi, M. A., Zhang, Q. & Ye, H. The structures of letters and symbols throughout human history are selected to match those found in objects in natural scenes. Am. Nat. 167, 117–139 (2006).
Dehaene, S. Reading in the Brain: The New Science of How We Read (Penguin, London, 2009).
Dehaene, S. & Cohen, L. The unique role of the visual word form area in reading. Trends Cogn. Sci. 15, 254–262 (2011).
Grainger, J., Dufau, S., Montant, M., Ziegler, J. C. & Fagot, J. Orthographic processing in baboons (Papio papio). Science 336, 245–248 (2012).
Grainger, J., Dufau, S. & Ziegler, J. C. A vision of reading. Trends Cogn. Sci. 1529, 1–9 (2016).
Dehaene, S., Cohen, L., Morais, J. & Kolinsky, R. Illiterate to literate: behavioural and cerebral changes induced by reading acquisition. Nat. Rev. Neurosci. 16, 234–244 (2015).
Riesenhuber, M. & Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025 (1999).
Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).
DiCarlo, J. J., Zoccolan, D. & Rust, N. C. How does the brain solve visual object recognition? Neuron 73, 415–434 (2012).
Dehaene, S., Cohen, L., Sigman, M. & Vinckier, F. The neural code for written words: a proposal. Trends Cogn. Sci. 9, 335–341 (2005).
Fiset, D. et al. Features for identification of uppercase and lowercase letters. Psychol. Sci. 19, 1161–1168 (2008).
Polk, T. A. & Farah, M. J. A simple common contexts explanation for the development of abstract letter identities. Neural Comput. 9, 1277–1289 (1997).
Testolin, A., Stoianov, I., Sperduti, A. & Zorzi, M. Learning orthographic structure with sequential generative neural networks. Cogn. Sci. 40, 579–606 (2016).
Carreiras, M., Armstrong, B. C., Perea, M. & Frost, R. The what, when, where, and how of visual word recognition. Trends Cogn. Sci. 18, 90–98 (2014).
Pelli, D. G., Farell, B. & Moore, D. C. The remarkable inefficiency of word recognition. Nature 423, 752–756 (2003).
Ziegler, J. C., Perry, C. & Zorzi, M. Modelling reading development through phonological decoding and self-teaching: implications for dyslexia. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 369, 20120397 (2014).
Harm, M. W. & Seidenberg, M. S. Phonology, reading acquisition, and dyslexia: insights from connectionist models. Psychol. Rev. 106, 491–528 (1999).
Thesen, T. et al. Sequential then interactive processing of letters and words in the left fusiform gyrus. Nat. Commun. 3, 1284 (2012).
McClelland, J. L. & Rumelhart, D. E. An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychol. Rev. 88, 375–407 (1981).
Rey, A., Dufau, S., Massol, S. & Grainger, J. Testing computational models of letter perception with item-level event-related potentials. Cogn. Neuropsychol. 26, 7–22 (2009).
Di Bono, M. G. & Zorzi, M. Deep generative learning of location-invariant visual word recognition. Front. Psychol. 4, 635 (2013).
Chang, L.-Y., Plaut, D. C. & Perfetti, C. A. Visual complexity in orthographic learning: modeling learning across writing system variations. Sci. Stud. Read. 8438, 1–22 (2015).
Friston, K. J. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
Testolin, A. & Zorzi, M. Probabilistic models and generative neural networks: towards an unified framework for modeling normal and impaired neurocognitive functions. Front. Comput. Neurosci. 10, 73 (2016).
Stoianov, I. & Zorzi, M. Emergence of a ‘visual number sense’ in hierarchical generative models. Nat. Neurosci. 15, 194–196 (2012).
Anderson, M. L. Neural reuse: a fundamental organizational principle of the brain. Behav. Brain Sci. 33, 245–313 (2010).
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
Bell, A. J. & Sejnowski, T. J. The ‘independent components’ of natural scenes are edge filters. Vision Res. 37, 3327–3338 (1997).
Rao, R. P. N. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).
Snavely, N., Seitz, S. M. & Szeliski, R. Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25, 835–846 (2006).
Hubel, D. H. & Wiesel, T. N. Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1968).
Candès, E. & Donoho, D. Ridgelets: a key to higher-dimensional intermittency? Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 357, 2495–2509 (1999).
Olshausen, B. A. Highly Overcomplete Sparse Coding in Proceedings of SPIE Electronic Imaging 8651 (2013).
Hyvärinen, A., Hurri, J. & Hoyer, P. O. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision. (Springer, London, 2009).
Liu, L. et al. Spatial structure of neuronal receptive field in awake monkey secondary visual cortex (V2). Proc. Natl Acad. Sci. USA 113, 1913–1918 (2016).
Chang, C. H. C. et al. Adaptation of the human visual system to the statistics of letters and line configurations. Neuroimage 120, 428–440 (2015).
Hutzler, F., Ziegler, J. C., Perry, C., Wimmer, H. & Zorzi, M. Do current connectionist learning models account for reading development in different languages? Cognition 91, 273–296 (2004).
Mueller, S. T. & Weidemann, C. T. Alphabetic letter identification: effects of perceivability, similarity, and bias. Acta Psychol. (Amst.) 139, 19–37 (2012).
Pelli, D. G., Burns, C. W., Farell, B. & Moore, D. C. Feature detection and letter identification. Vision Res. 46, 4646–4674 (2006).
Moret-Tatay, C. & Perea, M. Do serifs provide an advantage in the recognition of written words? J. Cogn. Psychol. 23, 619–624 (2011).
Parish, D. H. & Sperling, G. Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination. Vision Res. 31, 1399–1415 (1991).
Solomon, J. A. & Pelli, D. G. The visual filter mediating letter identification. Nature 369, 395–397 (1994).
Majaj, N. J., Pelli, D. G., Kurshan, P. & Palomares, M. The role of spatial frequency channels in letter identification. Vision Res. 42, 1165–1184 (2002).
Bengio, Y. Deep Learning of Representations for Unsupervised and Transfer Learning in Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop 27, 17–36 (2012).
Cottrell, G. W. Looking Around the Backyard Helps to Recognize Faces and Digits. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2008).
Larsen, A. & Bundesen, C. A template-matching pandemonium recognizes unconstrained handwritten characters with high accuracy. Mem. Cognit. 24, 136–143 (1996).
Zorzi, M. et al. Extra-large letter spacing improves reading in dyslexia. Proc. Natl Acad. Sci. USA 109, 11455–11459 (2012).
Zachrisson, B. Studies in the Legibility of Printed Text (Almqvist & Wiksell, Stockholm, Sweden, 1965).
Legge, G. E. Psychophysics of Reading: Normal and Low Vision (Lawrence Erlbaum Associates, Mahwah, NJ, 2007).
Wiley, R. W., Wilson, C. & Rapp, B. The effects of alphabet and expertise on letter perception. J. Exp. Psychol. Hum. Percept. Perform. 42, 1186–1203 (2016).
Snow, C., Burns, S. & Griffin, P. Preventing Reading Difficulties in Young Children (National Academies Press, Washington, DC, 1998).
Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002).
Hertz, J. A., Krogh, A. S. & Palmer, R. G. Introduction to the Theory of Neural Computation (Westview Press, Boulder, CO, 1991).
Townsend, J. T. Theoretical analysis of an alphabetic confusion matrix. Percept. Psychophys. 9, 40–50 (1971).
Gilmore, G. C., Hersh, H., Caramazza, A. & Griffin, J. Multidimensional letter similarity derived from recognition errors. Percept. Psychophys. 25, 425–431 (1979).
Phillips, J. R., Johnson, K. O. & Browne, H. M. A comparison of visual and two modes of tactual letter resolution. Percept. Psychophys. 34, 243–249 (1983).
Loomis, J. M. Analysis of tactile and visual confusion matrices. Percept. Psychophys. 31, 41–52 (1982).
Van Der Heijden, A. H. C., Malhas, M. S. M. & van den Roovaart, B. P. An empirical interletter confusion matrix for continuous-line capitals. Percept. Psychophys. 35, 85–88 (1984).
LeBlanc, R. S. & Muise, J. G. Alphabetic confusion: a clarification. Percept. Psychophys. 37, 588–591 (1985).
Courrieu, P., Farioli, F. & Grainger, J. Inverse discrimination time as a perceptual distance for alphabetic characters. Vis. Cogn. 11, 901–919 (2004).
Simpson, I. C., Mousikou, P., Montoya, J. M. & Defior, S. A letter visual-similarity matrix for Latin-based alphabets. Behav. Res. Methods 45, 431–439 (2012).
Boles, D. B. & Clifford, J. E. An upper- and lowercase alphabetic similarity matrix, with derived generation similarity values. Behav. Res. Meth. Instrum. Comput. 21, 579–586 (1989).
Podgorny, P. & Garner, W. R. Reaction time as a measure of inter- and intraobject visual similarity: letters of the alphabet. Percept. Psychophys. 26, 37–52 (1979).
Pelli, D. G. & Bex, P. Measuring contrast sensitivity. Vision Res. 90, 10–14 (2013).
Ziskind, A., Henaff, O., LeCun, Y. & Pelli, D. G. The Bottleneck in Human Letter Recognition: a Computational Model in Vision Sciences Society Annual Meeting 2014 (2014).
Testolin, A., Stoianov, I., De Filippo De Grazia, M. & Zorzi, M. Deep unsupervised learning on a desktop PC: a primer for cognitive scientists. Front. Psychol. 4, 251 (2013).

Acknowledgements

This work was supported by grants from the European Research Council (no. 210922) and University of Padova (Strategic Grant NEURAT) to M.Z. I.S. was supported by a Marie Curie Intra European Fellowship PIEF-GA-2013-622882 within the 7th Framework Programme. We thank J. McClelland for useful discussions and K. Friston for suggestions on the simulation of the neuroimaging data. No funders had any role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Department of General Psychology and Padova Neuroscience Center, University of Padova, via Venezia 8, Padova, 35131, Italy
Alberto Testolin & Marco Zorzi
Laboratoire de Psychologie Cognitive - UMR7290, Centre National de la Recherche Scientifique, Aix-Marseille Université, 3, place Victor Hugo, Marseille, 13331, CEDEX 3, France
Ivilin Stoianov
Institute of Cognitive Sciences and Technologies (ISTC), National Research Council (CNR), Via Martiri della Libertà 2, Padova, 35137, Italy
Ivilin Stoianov
IRCCS San Camillo Hospital Foundation, via Alberoni 70, Venice-Lido, 30126, Italy
Marco Zorzi

Authors

Alberto Testolin
Ivilin Stoianov
Marco Zorzi

Contributions

A.T., M.Z. and I.S. conceived the experiments, discussed the results and wrote the paper. A.T. wrote the code and ran the simulations. A.T. and I.S. analysed the data.

Corresponding author

Correspondence to Marco Zorzi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A correction to this article is available online at https://doi.org/10.1038/s41562-017-0253-8.

Electronic supplementary material

Supplementary Information

Supplementary Figures 1–5, Supplementary Table 1, Supplementary Methods, Supplementary Results, Supplementary References

Life Sciences Reporting Summary

About this article

Cite this article

Testolin, A., Stoianov, I. & Zorzi, M. Letter perception emerges from unsupervised deep learning and recycling of natural image features. Nat Hum Behav 1, 657–664 (2017). https://doi.org/10.1038/s41562-017-0186-2

Received: 09 September 2016
Accepted: 21 July 2017
Published: 21 August 2017
Issue Date: September 2017
DOI: https://doi.org/10.1038/s41562-017-0186-2

This article is cited by

On the ability of standard and brain-constrained deep neural networks to support cognitive superposition: a position paper
Max Garagnani
Cognitive Neurodynamics (2024)

A Developmental Approach for Training Deep Belief Networks
Matteo Zambra, Alberto Testolin & Marco Zorzi
Cognitive Computation (2023)

Unsupervised learning predicts human perception and misperception of gloss
Katherine R. Storrs, Barton L. Anderson & Roland W. Fleming
Nature Human Behaviour (2021)

Visual sense of number vs. sense of magnitude in humans and machines
Alberto Testolin, Serena Dolfi & Marco Zorzi
Scientific Reports (2020)

Learning representation hierarchies by sharing visual features: a computational investigation of Persian character recognition with unsupervised deep learning
Zahra Sadeghi & Alberto Testolin
Cognitive Processing (2017)