The use of written symbols is a major achievement of human cultural evolution. However, how abstract letter representations might be learned from vision is still an unsolved problem1,2. Here, we present a large-scale computational model of letter recognition based on deep neural networks3,4, which develops a hierarchy of increasingly more complex internal representations in a completely unsupervised way by fitting a probabilistic, generative model to the visual input5,6. In line with the hypothesis that learning written symbols partially recycles pre-existing neuronal circuits for object recognition7, earlier processing levels in the model exploit domain-general visual features learned from natural images, while domain-specific features emerge in upstream neurons following exposure to printed letters. We show that these high-level representations can be easily mapped to letter identities even for noise-degraded images, producing accurate simulations of a broad range of empirical findings on letter perception in human observers. Our model shows that by reusing natural visual primitives, learning written symbols only requires limited, domain-specific tuning, supporting the hypothesis that their shape has been culturally selected to match the statistical structure of natural environments8.
Subscribe to Journal
Get full journal access for 1 year
only $8.25 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Tax calculation will be finalised during checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
Grainger, J., Rey, A. & Dufau, S. Letter perception: from pixels to pandemonium. Trends Cogn. Sci. 12, 381–387 (2008).
Finkbeiner, M. & Coltheart, M. Letter recognition: from perception to representation. Cogn. Neuropsychol. 26, 1–6 (2009).
LeCun, Y., Bengio, Y. & Hinton, G. E. Deep learning. Nature 521, 436–444 (2015).
Hinton, G. E. & Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
Zorzi, M., Testolin, A. & Stoianov, I. Modeling language and cognition with deep unsupervised learning: a tutorial overview. Front. Psychol. 4, 515 (2013).
Hinton, G. E. Learning multiple layers of representation. Trends Cogn. Sci. 11, 428–434 (2007).
Dehaene, S. & Cohen, L. Cultural recycling of cortical maps. Neuron 56, 384–398 (2007).
Changizi, M. A., Zhang, Q. & Ye, H. The structures of letters and symbols throughout human history are selected to match those found in objects in natural scenes. Am. Nat. 167, 117–139 (2006).
Dehaene, S. Reading in the Brain: The New Science of How We Read (Penguin, London, 2009).
Dehaene, S. & Cohen, L. The unique role of the visual word form area in reading. Trends Cogn. Sci. 15, 254–262 (2011).
Grainger, J., Dufau, S., Montant, M., Ziegler, J. C. & Fagot, J. Orthographic processing in baboons (Papio papio). Science 336, 245–248 (2012).
Grainger, J., Dufau, S. & Ziegler, J. C. A vision of reading. Trends Cogn. Sci. 1529, 1–9 (2016).
Dehaene, S., Cohen, L., Morais, J. & Kolinsky, R. Illiterate to literate: behavioural and cerebral changes induced by reading acquisition. Nat. Rev. Neurosci. 16, 234–244 (2015).
Riesenhuber, M. & Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025 (1999).
Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).
DiCarlo, J. J., Zoccolan, D. & Rust, N. C. How does the brain solve visual object recognition? Neuron 73, 415–434 (2012).
Dehaene, S., Cohen, L., Sigman, M. & Vinckier, F. The neural code for written words: a proposal. Trends Cogn. Sci. 9, 335–341 (2005).
Fiset, D. et al. Features for identification of uppercase and lowercase letters. Psychol. Sci. 19, 1161–1168 (2008).
Polk, T. A. & Farah, M. J. A simple common contexts explanation for the development of abstract letter identities. Neural Comput. 9, 1277–1289 (1997).
Testolin, A., Stoianov, I., Sperduti, A. & Zorzi, M. Learning orthographic structure with sequential generative neural networks. Cogn. Sci. 40, 579–606 (2016).
Carreiras, M., Armstrong, B. C., Perea, M. & Frost, R. The what, when, where, and how of visual word recognition. Trends Cogn. Sci. 18, 90–98 (2014).
Pelli, D. G., Farell, B. & Moore, D. C. The remarkable inefficiency of word recognition. Nature 423, 752–756 (2003).
Ziegler, J. C., Perry, C. & Zorzi, M. Modelling reading development through phonological decoding and self-teaching: implications for dyslexia. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 369, 20120397 (2014).
Harm, M. W. & Seidenberg, M. S. Phonology, reading acquisition, and dyslexia: insights from connectionist models. Psychol. Rev. 106, 491–528 (1999).
Thesen, T. et al. Sequential then interactive processing of letters and words in the left fusiform gyrus. Nat. Commun. 3, 1284 (2012).
McClelland, J. L. & Rumelhart, D. E. An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychol. Rev. 88, 375–407 (1981).
Rey, A., Dufau, S., Massol, S. & Grainger, J. Testing computational models of letter perception with item-level event-related potentials. Cogn. Neuropsychol. 26, 7–22 (2009).
Di Bono, M. G. & Zorzi, M. Deep generative learning of location-invariant visual word recognition. Front. Psychol. 4, 635 (2013).
Chang, L.-Y., Plaut, D. C. & Perfetti, C. A. Visual complexity in orthographic learning: modeling learning across writing system variations. Sci. Stud. Read. 8438, 1–22 (2015).
Friston, K. J. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
Testolin, A. & Zorzi, M. Probabilistic models and generative neural networks: towards an unified framework for modeling normal and impaired neurocognitive functions. Front. Comput. Neurosci. 10, 73 (2016).
Stoianov, I. & Zorzi, M. Emergence of a ‘visual number sense’ in hierarchical generative models. Nat. Neurosci. 15, 194–196 (2012).
Anderson, M. L. Neural reuse: a fundamental organizational principle of the brain. Behav. Brain Sci. 33, 245–313 (2010).
Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
Bell, A. J. & Sejnowski, T. J. The ‘independent components’ of natural scenes are edge filters. Vision Res. 37, 3327–3338 (1997).
Rao, R. P. N. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).
Snavely, N., Seitz, S. M. & Szeliski, R. Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25, 835–846 (2006).
Hubel, D. H. & Wiesel, T. N. Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1968).
Candès, E. & Donoho, D. Ridgelets: a key to higher-dimensional intermittency? Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 357, 2495–2509 (1999).
Olshausen, B. A. Highly Overcomplete Sparse Coding in Proceedings of SPIE Electronic Imaging 8651 (2013).
Hyvärinen, A., Hurri, J. & Hoyer, P. O. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision. (Springer, London, 2009).
Liu, L. et al. Spatial structure of neuronal receptive field in awake monkey secondary visual cortex (V2). Proc. Natl Acad. Sci. USA 113, 1913–1918 (2016).
Chang, C. H. C. et al. Adaptation of the human visual system to the statistics of letters and line configurations. Neuroimage 120, 428–440 (2015).
Hutzler, F., Ziegler, J. C., Perry, C., Wimmer, H. & Zorzi, M. Do current connectionist learning models account for reading development in different languages? Cognition 91, 273–296 (2004).
Mueller, S. T. & Weidemann, C. T. Alphabetic letter identification: effects of perceivability, similarity, and bias. Acta Psychol. (Amst.) 139, 19–37 (2012).
Pelli, D. G., Burns, C. W., Farell, B. & Moore, D. C. Feature detection and letter identification. Vision Res. 46, 4646–4674 (2006).
Moret-Tatay, C. & Perea, M. Do serifs provide an advantage in the recognition of written words? J. Cogn. Psychol. 23, 619–624 (2011).
Parish, D. H. & Sperling, G. Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination. Vision Res. 31, 1399–1415 (1991).
Solomon, J. A. & Pelli, D. G. The visual filter mediating letter identification. Nature 369, 395–397 (1994).
Majaj, N. J., Pelli, D. G., Kurshan, P. & Palomares, M. The role of spatial frequency channels in letter identification. Vision Res. 42, 1165–1184 (2002).
Bengio, Y. Deep Learning of Representations for Unsupervised and Transfer Learning in Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop 27, 17–36 (2012).
Cottrell, G. W. Looking Around the Backyard Helps to Recognize Faces and Digits. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2008).
Larsen, A. & Bundesen, C. A template-matching pandemonium recognizes unconstrained handwritten characters with high accuracy. Mem. Cognit. 24, 136–143 (1996).
Zorzi, M. et al. Extra-large letter spacing improves reading in dyslexia. Proc. Natl Acad. Sci. USA 109, 11455–11459 (2012).
Zachrisson, B. Studies in the Legibility of Printed Text (Almqvist & Wiksell, Stockholm, Sweden, 1965).
Legge, G. E. Psychophysics of Reading: Normal and Low Vision (Lawrence Erlbaum Associates, Mahwah, NJ, 2007).
Wiley, R. W., Wilson, C. & Rapp, B. The effects of alphabet and expertise on letter perception. J. Exp. Psychol. Hum. Percept. Perform. 42, 1186–1203 (2016).
Snow, C., Burns, S. & Griffin, P. Preventing Reading Difficulties in Young Children (National Academies Press, Washington, DC, 1998).
Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002).
Hertz, J. A., Krogh, A. S. & Palmer, R. G. Introduction to the Theory of Neural Computation (Westview Press, Boulder, CO, 1991).
Townsend, J. T. Theoretical analysis of an alphabetic confusion matrix. Percept. Psychophys. 9, 40–50 (1971).
Gilmore, G. C., Hersh, H., Caramazza, A. & Griffin, J. Multidimensional letter similarity derived from recognition errors. Percept. Psychophys. 25, 425–431 (1979).
Phillips, J. R., Johnson, K. O. & Browne, H. M. A comparison of visual and two modes of tactual letter resolution. Percept. Psychophys. 34, 243–249 (1983).
Loomis, J. M. Analysis of tactile and visual confusion matrices. Percept. Psychophys. 31, 41–52 (1982).
Van Der Heijden, A. H. C., Malhas, M. S. M. & van den Roovaart, B. P. An empirical interletter confusion matrix for continuous-line capitals. Percept. Psychophys. 35, 85–88 (1984).
LeBlanc, R. S. & Muise, J. G. Alphabetic confusion: a clarification. Percept. Psychophys. 37, 588–591 (1985).
Courrieu, P., Farioli, F. & Grainger, J. Inverse discrimination time as a perceptual distance for alphabetic characters. Vis. Cogn. 11, 901–919 (2004).
Simpson, I. C., Mousikou, P., Montoya, J. M. & Defior, S. A letter visual-similarity matrix for Latin-based alphabets. Behav. Res. Methods 45, 431–439 (2012).
Boles, D. B. & Clifford, J. E. An upper- and lowercase alphabetic similarity matrix, with derived generation similarity values. Behav. Res. Meth. Instrum. Comput. 21, 579–586 (1989).
Podgorny, P. & Garner, W. R. Reaction time as a measure of inter- and intraobject visual similarity: letters of the alphabet. Percept. Psychophys. 26, 37–52 (1979).
Pelli, D. G. & Bex, P. Measuring contrast sensitivity. Vision Res. 90, 10–14 (2013).
Ziskind, A., Henaff, O., LeCun, Y. & Pelli, D. G. The Bottleneck in Human Letter Recognition: a Computational Model in Vision Sciences Society Annual Meeting 2014 (2014).
Testolin, A., Stoianov, I., De Filippo De Grazia, M. & Zorzi, M. Deep unsupervised learning on a desktop PC: a primer for cognitive scientists. Front. Psychol. 4, 251 (2013).
This work was supported by grants from the European Research Council (no. 210922) and University of Padova (Strategic Grant NEURAT) to M.Z., I.S. was supported by a Marie Curie Intra European Fellowship PIEF-GA-2013-622882 within the 7th Framework Programme. We thank J. McClelland for useful discussions and K. Friston for suggestions on the simulation of the neuroimaging data. No funders had any role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A correction to this article is available online at https://doi.org/10.1038/s41562-017-0253-8.
About this article
Cite this article
Testolin, A., Stoianov, I. & Zorzi, M. Letter perception emerges from unsupervised deep learning and recycling of natural image features. Nat Hum Behav 1, 657–664 (2017). https://doi.org/10.1038/s41562-017-0186-2
Nature Human Behaviour (2021)
PLOS ONE (2021)
Scientific Reports (2020)
Applied Sciences (2020)
Combining Denoising Autoencoders and Dynamic Programming for Acoustic Detection and Tracking of Underwater Moving Targets