Letter perception emerges from unsupervised deep learning and recycling of natural image features

A Correction to this article was published on 02 November 2017

This article has been updated

Abstract

The use of written symbols is a major achievement of human cultural evolution. However, how abstract letter representations might be learned from vision is still an unsolved problem [1,2]. Here, we present a large-scale computational model of letter recognition based on deep neural networks [3,4], which develops a hierarchy of increasingly more complex internal representations in a completely unsupervised way by fitting a probabilistic, generative model to the visual input [5,6]. In line with the hypothesis that learning written symbols partially recycles pre-existing neuronal circuits for object recognition [7], earlier processing levels in the model exploit domain-general visual features learned from natural images, while domain-specific features emerge in upstream neurons following exposure to printed letters. We show that these high-level representations can be easily mapped to letter identities even for noise-degraded images, producing accurate simulations of a broad range of empirical findings on letter perception in human observers. Our model shows that by reusing natural visual primitives, learning written symbols only requires limited, domain-specific tuning, supporting the hypothesis that their shape has been culturally selected to match the statistical structure of natural environments [8].
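
To make the idea of unsupervised generative learning concrete, the following is a minimal sketch of one way a single layer of such a hierarchy could be trained. The abstract does not name the learning algorithm; the sketch assumes a binary restricted Boltzmann machine updated with one-step contrastive divergence, the approach described in the cited works by Hinton. All function names, the toy data and the hyperparameters are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, rng, lr=0.05):
    """One contrastive-divergence (CD-1) step for a binary RBM on a mini-batch.

    Assumed setup (not from the article): v0 has shape (batch, n_visible),
    W has shape (n_visible, n_hidden).
    """
    # Positive phase: hidden probabilities and a binary sample given the data.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: reconstruct the visible units, then re-infer the hiddens.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Update: difference between data-driven and reconstruction-driven statistics.
    n = v0.shape[0]
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis = b_vis + lr * (v0 - v1_prob).mean(axis=0)
    b_hid = b_hid + lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid

def train_rbm(data, n_hidden, epochs=200, seed=0):
    """Train one RBM layer on `data` without any labels (full-batch for brevity)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        W, b_vis, b_hid = cd1_update(data, W, b_vis, b_hid, rng)
    return W, b_vis, b_hid

def reconstruct(v, W, b_vis, b_hid):
    """Deterministic up-down pass: visible -> hidden -> visible probabilities."""
    return sigmoid(sigmoid(v @ W + b_hid) @ W.T + b_vis)

# Toy usage: two binary "images" of four pixels each, repeated four times.
toy = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 4, dtype=float)
W, b_vis, b_hid = train_rbm(toy, n_hidden=3)
```

In a layered model of this kind, the hidden probabilities of one trained layer would serve as the input to the next layer, and a separate supervised read-out could then map the top-level activations to letter identities.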

Fig. 1: Deep learning architecture and examples of natural image and printed letter data.
Fig. 2: Emergent neuronal receptive fields, representational selectivity, and letter identification accuracy in the model.
Fig. 3: Simulations of human psychophysical studies.
Fig. 4: Spatial-frequency analysis of perceptual channel mediating letter identification.

Change history

  • 02 November 2017

    In the version of this Letter originally published, in the sentence beginning “Written symbols are culture specific...”, in the second example, ‘Φ’ was used instead of ‘F’; it should have read ‘(for example, versus F)’. This has now been corrected in all versions of the Letter.

References

  1. Grainger, J., Rey, A. & Dufau, S. Letter perception: from pixels to pandemonium. Trends Cogn. Sci. 12, 381–387 (2008).
  2. Finkbeiner, M. & Coltheart, M. Letter recognition: from perception to representation. Cogn. Neuropsychol. 26, 1–6 (2009).
  3. LeCun, Y., Bengio, Y. & Hinton, G. E. Deep learning. Nature 521, 436–444 (2015).
  4. Hinton, G. E. & Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
  5. Zorzi, M., Testolin, A. & Stoianov, I. Modeling language and cognition with deep unsupervised learning: a tutorial overview. Front. Psychol. 4, 515 (2013).
  6. Hinton, G. E. Learning multiple layers of representation. Trends Cogn. Sci. 11, 428–434 (2007).
  7. Dehaene, S. & Cohen, L. Cultural recycling of cortical maps. Neuron 56, 384–398 (2007).
  8. Changizi, M. A., Zhang, Q. & Ye, H. The structures of letters and symbols throughout human history are selected to match those found in objects in natural scenes. Am. Nat. 167, 117–139 (2006).
  9. Dehaene, S. Reading in the Brain: The New Science of How We Read (Penguin, London, 2009).
  10. Dehaene, S. & Cohen, L. The unique role of the visual word form area in reading. Trends Cogn. Sci. 15, 254–262 (2011).
  11. Grainger, J., Dufau, S., Montant, M., Ziegler, J. C. & Fagot, J. Orthographic processing in baboons (Papio papio). Science 336, 245–248 (2012).
  12. Grainger, J., Dufau, S. & Ziegler, J. C. A vision of reading. Trends Cogn. Sci. 1529, 1–9 (2016).
  13. Dehaene, S., Cohen, L., Morais, J. & Kolinsky, R. Illiterate to literate: behavioural and cerebral changes induced by reading acquisition. Nat. Rev. Neurosci. 16, 234–244 (2015).
  14. Riesenhuber, M. & Poggio, T. Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025 (1999).
  15. Felleman, D. J. & Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991).
  16. DiCarlo, J. J., Zoccolan, D. & Rust, N. C. How does the brain solve visual object recognition? Neuron 73, 415–434 (2012).
  17. Dehaene, S., Cohen, L., Sigman, M. & Vinckier, F. The neural code for written words: a proposal. Trends Cogn. Sci. 9, 335–341 (2005).
  18. Fiset, D. et al. Features for identification of uppercase and lowercase letters. Psychol. Sci. 19, 1161–1168 (2008).
  19. Polk, T. A. & Farah, M. J. A simple common contexts explanation for the development of abstract letter identities. Neural Comput. 9, 1277–1289 (1997).
  20. Testolin, A., Stoianov, I., Sperduti, A. & Zorzi, M. Learning orthographic structure with sequential generative neural networks. Cogn. Sci. 40, 579–606 (2016).
  21. Carreiras, M., Armstrong, B. C., Perea, M. & Frost, R. The what, when, where, and how of visual word recognition. Trends Cogn. Sci. 18, 90–98 (2014).
  22. Pelli, D. G., Farell, B. & Moore, D. C. The remarkable inefficiency of word recognition. Nature 423, 752–756 (2003).
  23. Ziegler, J. C., Perry, C. & Zorzi, M. Modelling reading development through phonological decoding and self-teaching: implications for dyslexia. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 369, 20120397 (2014).
  24. Harm, M. W. & Seidenberg, M. S. Phonology, reading acquisition, and dyslexia: insights from connectionist models. Psychol. Rev. 106, 491–528 (1999).
  25. Thesen, T. et al. Sequential then interactive processing of letters and words in the left fusiform gyrus. Nat. Commun. 3, 1284 (2012).
  26. McClelland, J. L. & Rumelhart, D. E. An interactive activation model of context effects in letter perception: I. An account of basic findings. Psychol. Rev. 88, 375–407 (1981).
  27. Rey, A., Dufau, S., Massol, S. & Grainger, J. Testing computational models of letter perception with item-level event-related potentials. Cogn. Neuropsychol. 26, 7–22 (2009).
  28. Di Bono, M. G. & Zorzi, M. Deep generative learning of location-invariant visual word recognition. Front. Psychol. 4, 635 (2013).
  29. Chang, L.-Y., Plaut, D. C. & Perfetti, C. A. Visual complexity in orthographic learning: modeling learning across writing system variations. Sci. Stud. Read. 8438, 1–22 (2015).
  30. Friston, K. J. The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138 (2010).
  31. Testolin, A. & Zorzi, M. Probabilistic models and generative neural networks: towards an unified framework for modeling normal and impaired neurocognitive functions. Front. Comput. Neurosci. 10, 73 (2016).
  32. Stoianov, I. & Zorzi, M. Emergence of a ‘visual number sense’ in hierarchical generative models. Nat. Neurosci. 15, 194–196 (2012).
  33. Anderson, M. L. Neural reuse: a fundamental organizational principle of the brain. Behav. Brain Sci. 33, 245–313 (2010).
  34. Simoncelli, E. P. & Olshausen, B. A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
  35. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
  36. Bell, A. J. & Sejnowski, T. J. The ‘independent components’ of natural scenes are edge filters. Vision Res. 37, 3327–3338 (1997).
  37. Rao, R. P. N. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 2, 79–87 (1999).
  38. Snavely, N., Seitz, S. M. & Szeliski, R. Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25, 835–846 (2006).
  39. Hubel, D. H. & Wiesel, T. N. Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1968).
  40. Candès, E. & Donoho, D. Ridgelets: a key to higher-dimensional intermittency? Philos. Trans. R. Soc. Lond. A Math. Phys. Eng. Sci. 357, 2495–2509 (1999).
  41. Olshausen, B. A. Highly Overcomplete Sparse Coding in Proceedings of SPIE Electronic Imaging 8651 (2013).
  42. Hyvärinen, A., Hurri, J. & Hoyer, P. O. Natural Image Statistics: A Probabilistic Approach to Early Computational Vision (Springer, London, 2009).
  43. Liu, L. et al. Spatial structure of neuronal receptive field in awake monkey secondary visual cortex (V2). Proc. Natl Acad. Sci. USA 113, 1913–1918 (2016).
  44. Chang, C. H. C. et al. Adaptation of the human visual system to the statistics of letters and line configurations. Neuroimage 120, 428–440 (2015).
  45. Hutzler, F., Ziegler, J. C., Perry, C., Wimmer, H. & Zorzi, M. Do current connectionist learning models account for reading development in different languages? Cognition 91, 273–296 (2004).
  46. Mueller, S. T. & Weidemann, C. T. Alphabetic letter identification: effects of perceivability, similarity, and bias. Acta Psychol. (Amst.) 139, 19–37 (2012).
  47. Pelli, D. G., Burns, C. W., Farell, B. & Moore, D. C. Feature detection and letter identification. Vision Res. 46, 4646–4674 (2006).
  48. Moret-Tatay, C. & Perea, M. Do serifs provide an advantage in the recognition of written words? J. Cogn. Psychol. 23, 619–624 (2011).
  49. Parish, D. H. & Sperling, G. Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination. Vision Res. 31, 1399–1415 (1991).
  50. Solomon, J. A. & Pelli, D. G. The visual filter mediating letter identification. Nature 369, 395–397 (1994).
  51. Majaj, N. J., Pelli, D. G., Kurshan, P. & Palomares, M. The role of spatial frequency channels in letter identification. Vision Res. 42, 1165–1184 (2002).
  52. Bengio, Y. Deep Learning of Representations for Unsupervised and Transfer Learning in Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop 27, 17–36 (2012).
  53. Cottrell, G. W. Looking Around the Backyard Helps to Recognize Faces and Digits in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2008).
  54. Larsen, A. & Bundesen, C. A template-matching pandemonium recognizes unconstrained handwritten characters with high accuracy. Mem. Cognit. 24, 136–143 (1996).
  55. Zorzi, M. et al. Extra-large letter spacing improves reading in dyslexia. Proc. Natl Acad. Sci. USA 109, 11455–11459 (2012).
  56. Zachrisson, B. Studies in the Legibility of Printed Text (Almqvist & Wiksell, Stockholm, Sweden, 1965).
  57. Legge, G. E. Psychophysics of Reading: Normal and Low Vision (Lawrence Erlbaum Associates, Mahwah, NJ, 2007).
  58. Wiley, R. W., Wilson, C. & Rapp, B. The effects of alphabet and expertise on letter perception. J. Exp. Psychol. Hum. Percept. Perform. 42, 1186–1203 (2016).
  59. Snow, C., Burns, S. & Griffin, P. Preventing Reading Difficulties in Young Children (National Academies Press, Washington, DC, 1998).
  60. Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002).
  61. Hertz, J. A., Krogh, A. S. & Palmer, R. G. Introduction to the Theory of Neural Computation (Westview Press, Boulder, CO, 1991).
  62. Townsend, J. T. Theoretical analysis of an alphabetic confusion matrix. Percept. Psychophys. 9, 40–50 (1971).
  63. Gilmore, G. C., Hersh, H., Caramazza, A. & Griffin, J. Multidimensional letter similarity derived from recognition errors. Percept. Psychophys. 25, 425–431 (1979).
  64. Phillips, J. R., Johnson, K. O. & Browne, H. M. A comparison of visual and two modes of tactual letter resolution. Percept. Psychophys. 34, 243–249 (1983).
  65. Loomis, J. M. Analysis of tactile and visual confusion matrices. Percept. Psychophys. 31, 41–52 (1982).
  66. Van Der Heijden, A. H. C., Malhas, M. S. M. & van den Roovaart, B. P. An empirical interletter confusion matrix for continuous-line capitals. Percept. Psychophys. 35, 85–88 (1984).
  67. LeBlanc, R. S. & Muise, J. G. Alphabetic confusion: a clarification. Percept. Psychophys. 37, 588–591 (1985).
  68. Courrieu, P., Farioli, F. & Grainger, J. Inverse discrimination time as a perceptual distance for alphabetic characters. Vis. Cogn. 11, 901–919 (2004).
  69. Simpson, I. C., Mousikou, P., Montoya, J. M. & Defior, S. A letter visual-similarity matrix for Latin-based alphabets. Behav. Res. Methods 45, 431–439 (2012).
  70. Boles, D. B. & Clifford, J. E. An upper- and lowercase alphabetic similarity matrix, with derived generation similarity values. Behav. Res. Meth. Instrum. Comput. 21, 579–586 (1989).
  71. Podgorny, P. & Garner, W. R. Reaction time as a measure of inter- and intraobject visual similarity: letters of the alphabet. Percept. Psychophys. 26, 37–52 (1979).
  72. Pelli, D. G. & Bex, P. Measuring contrast sensitivity. Vision Res. 90, 10–14 (2013).
  73. Ziskind, A., Henaff, O., LeCun, Y. & Pelli, D. G. The Bottleneck in Human Letter Recognition: a Computational Model in Vision Sciences Society Annual Meeting 2014 (2014).
  74. Testolin, A., Stoianov, I., De Filippo De Grazia, M. & Zorzi, M. Deep unsupervised learning on a desktop PC: a primer for cognitive scientists. Front. Psychol. 4, 251 (2013).

Acknowledgements

This work was supported by grants from the European Research Council (no. 210922) and the University of Padova (Strategic Grant NEURAT) to M.Z. I.S. was supported by a Marie Curie Intra-European Fellowship (PIEF-GA-2013-622882) within the 7th Framework Programme. We thank J. McClelland for useful discussions and K. Friston for suggestions on the simulation of the neuroimaging data. No funders had any role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

A.T., M.Z. and I.S. conceived the experiments, discussed the results and wrote the paper. A.T. wrote the code and ran the simulations. A.T. and I.S. analysed the data.

Corresponding author

Correspondence to Marco Zorzi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A correction to this article is available online at https://doi.org/10.1038/s41562-017-0253-8.

Electronic supplementary material

Supplementary Information

Supplementary Figures 1–5, Supplementary Table 1, Supplementary Methods, Supplementary Results, Supplementary References

Life Sciences Reporting Summary

About this article

Cite this article

Testolin, A., Stoianov, I. & Zorzi, M. Letter perception emerges from unsupervised deep learning and recycling of natural image features. Nat Hum Behav 1, 657–664 (2017). https://doi.org/10.1038/s41562-017-0186-2
