Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Learning as the unsupervised alignment of conceptual systems

A preprint version of the article is available at arXiv.


Concept induction requires the extraction and naming of concepts from noisy perceptual experience. For supervised approaches, as the number of concepts grows, so does the number of required training examples. Philosophers, psychologists and computer scientists have long recognized that children can learn to label objects without being explicitly taught. In a series of computational experiments, we highlight how information in the environment can be used to build and align conceptual systems. Unlike supervised learning, the learning problem becomes easier the more concepts and systems there are to master. The key insight is that each concept has a unique signature within one conceptual system (for example, images) that is recapitulated in other systems (for example, text or audio). As predicted, children’s early concepts form readily aligned systems.

Your institute does not have access to this article

Access options

Buy article

Get time limited or full article access on ReadCube.


All prices are NET prices.

Fig. 1: Different modes of learning.
Fig. 2: Unsupervised linking of conceptual systems via conceptual alignment.
Fig. 3: Alignment correlation versus mapping accuracy for three real-world datasets.
Fig. 4: The impact of multiple conceptual systems on alignment strength.

Data availability

The different source datasets used in this work are publicly available for download. The ImageNet images are available from The OpenImages V4 Boxes dataset is available from The AudioSet dataset is available from The pretrained GloVe embedding (Common Crawl 840B word tokens) is available from The age-of-acquisiton ratings are available from Finally, the concept intersections used in the analysis can be downloaded from

Code availability

The Python code used to perform the analysis in this work can be downloaded from and is licensed under Apache License 2.0. The code repository for computing image embeddings using the DeepCluster algorithm is located at DeepCluster is licensed under a Creative Commons Attribution-NonCommercial 4.0 International Public Licence.


  1. Fenson, L. et al. Variability in early communicative development. Monographs Soc. Res. Child Dev. 59, 1–185 (1994).

    Article  Google Scholar 

  2. Quine, W. V. O. Word and Object (MIT Press, 1960).

  3. McMurray, B., Horst, J. S. & Samuelson, L. K. Word learning emerges from the interaction of online referent selection and slow associative learning. Psychol. Rev. 119, 831–877 (2012).

    Article  Google Scholar 

  4. Yu, C. & Smith, L. B. Modeling cross-situational word-referent learning: prior questions. Psychol. Rev. 119, 21–39 (2012).

    Article  Google Scholar 

  5. Bell, A. J. & Sejnowski, T. J. An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7, 1129–1159 (1995).

    Article  Google Scholar 

  6. Chambers, K. E., Onishi, K. H. & Fisher, C. Infants learn phonotactic regularities from brief auditory experience. Cognition 87, B69–B77 (2003).

    Article  Google Scholar 

  7. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).

    Article  Google Scholar 

  8. Younger, B. A. & Cohen, L. B. Developmental change in infants’ perception of correlations among attributes. Child Dev. 57, 803–815 (1986).

    Article  Google Scholar 

  9. Caron, M., Bojanowski, P., Joulin, A. & Douze, M. Deep clustering for unsupervised learning of visual features. In Proceedings of the 15th European Conference on Computer Vision 132–149 (Springer, 2018).

  10. Pennington, J., Socher, R. & Manning, C. D. GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing 1532–1543 (Association for Computational Linguistics, 2014).

  11. Tyler, L. K. & Moss, H. E. Towards a distributed account of conceptual knowledge. Trends Cogn. Sci. 5, 244–252 (2001).

    Article  Google Scholar 

  12. Martin, C. B., Douglas, D., Newsome, R. N., Man, L. L. & Barense, M. D. Integrative and distinctive coding of visual and conceptual object features in the ventral visual stream. eLife 7, e31873 (2018).

    Article  Google Scholar 

  13. de Beeck, H. P. O., Pillet, I. & Ritchie, J. B. Factors determining where category-selective areas emerge in visual cortex. Trends Cogn. Sci. 23, 784–797 (2019).

    Article  Google Scholar 

  14. Marks, L. E. The Unity of the Senses: Interrelations among the Modalities (Academic Press, 1978).

  15. de Sa, V. R. & Ballard, D. H. Category learning through multimodality sensing. Neural Comput. 10, 1097–1117 (1998).

    Article  Google Scholar 

  16. Fazly, A., Alishahi, A. & Stevenson, S. A probabilistic computational model of cross-situational word learning. Cogn. Sci. 34, 1017–1063 (2010).

    Article  Google Scholar 

  17. Goodman, N., Tenenbaum, J. B. & Black, M. J. A Bayesian framework for cross-situational word-learning. In Advances in Neural Information Processing Systems 457–464 (NIPS Foundation, 2008).

  18. Smith, L. & Yu, C. Infants rapidly learn word-referent mappings via cross-situational statistics. Cognition 106, 1558–1568 (2008).

    Article  Google Scholar 

  19. Kiela, D. & Bottou, L. Learning image embeddings using convolutional neural networks for improved multi-modal semantics. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing 36–45 (Association for Computational Linguistics, 2014).

  20. Lazaridou, A., Pham, N. T. & Baroni, M. Combining language and vision with a multimodal skip-gram model. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 153–163 (Association for Computational Linguistics, 2015).

  21. Ngiam, J. et al. Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning 689–696 (ACM, 2011).

  22. Ororbia, A., Mali, A., Kelly, M. & Reitter, D. Like a baby: visually situated neural language acquisition. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 5127–5136 (Association for Computational Linguistics, 2019).

  23. Lewis, M., Zettersten, M. & Lupyan, G. Distributional semantics as a source of visual knowledge. Proc. Natl Acad. Sci. USA 116, 19237–19238 (2019).

    Article  Google Scholar 

  24. Marr, D. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information (Henry Holt, 1982).

  25. Amodio, M. & Krishnaswamy, S. MAGAN: aligning biological manifolds. In Proceedings of the 35th International Conference on Machine Learning 215–223 (PMLR, 2018).

  26. Ham, J., Lee, D. D. & Saul, L. K. Semisupervised alignment of manifolds. In Proceedings of the 10th International Workshop Artificial Intelligence and Statistics 120–127 (Society for Artificial Intelligence and Statistics, 2005).

  27. Wang, C. & Mahadevan, S. Manifold alignment using procrustes analysis. In Proceedings of the 25th International Conference on Machine Learning 1120–1127 (ACM, 2008).

  28. Wang, C. & Mahadevan, S. Heterogeneous domain adaptation using manifold alignment. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence 1541–1546 (AAAI Press, 2011).

  29. Shepard, R. N. & Chipman, S. Second-order isomorphism of internal representations: shapes of states. Cogn. Psychol. 1, 1–17 (1970).

    Article  Google Scholar 

  30. Kuznetsova, A. et al. The Open Images dataset V4: unified image classification, object detection, and visual relationship detection at scale. Preprint at (2018).

  31. Gemmeke, J. F. et al. AudioSet: an ontology and human-labeled dataset for audio events. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing 776–780 (IEEE, 2017).

  32. Kuperman, V., Stadthagen-Gonzalez, H. & Brysbaert, M. Age-of-acquisition ratings for 30,000 English words. Behav. Res. Methods 44, 978–990 (2012).

    Article  Google Scholar 

  33. Goldfield, B. A. & Reznick, J. S. Early lexical acquisition: rate, content and the vocabulary spurt. J. Child Language 17, 171–183 (1990).

    Article  Google Scholar 

  34. Samuelson, L. K. Statistical regularities in vocabulary guide language acquisition in connectionist models and 15–20-month-olds. Dev. Psychol. 38, 1016–1037 (2002).

    Article  Google Scholar 

  35. Mervis, C. B. in Emory Symposia in Cognition, 1. Concepts and Conceptual Development: Ecological and Intellectual Factors in Categorization 201–233 (Cambridge Univ. Press, 1987).

  36. Jones, S. S., Smith, L. B. & Landau, B. Object properties and knowledge in early lexical learning. Child Dev. 62, 499–516 (1991).

    Article  Google Scholar 

  37. Samuelson, L. K. & Smith, L. B. Early noun vocabularies: do ontology, category structure and syntax correspond? Cognition 73, 1–33 (1999).

    Article  Google Scholar 

  38. Frank, M. C., Slemmer, J. A., Marcus, G. F. & Johnson, S. P. Information from multiple modalities helps 5-month-olds learn abstract rules. Dev. Sci. 12, 504–509 (2009).

    Article  Google Scholar 

  39. Spelke, E. S. & Kinzler, K. D. Core knowledge. Dev. Sci. 10, 89–96 (2007).

    Article  Google Scholar 

  40. Ullman, S., Harari, D. & Dorfman, N. From simple innate biases to complex visual concepts. Proc. Natl Acad. Sci. USA 109, 18215–18220 (2012).

    Article  Google Scholar 

  41. Gentner, D. Structure-mapping: a theoretical framework for analogy. Cogn. Sci. 7, 155–170 (1983).

    Article  Google Scholar 

  42. Holyoak, K. J. & Thagard, P. Analogical mapping by constraint satisfaction. Cogn. Sci. 13, 295–355 (1989).

    Article  Google Scholar 

  43. Larkey, L. B. & Love, B. C. CAB: connectionist analogy builder. Cogn. Sci. 27, 781–794 (2003).

    Article  Google Scholar 

  44. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at (2015).

  45. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

Download references


This work was supported by NIH grant no. 1P01HD080679, Wellcome Trust Investigator Award no. WT106931MA and a Royal Society Wolfson Fellowship 183029 to B.C.L.

Author information

Authors and Affiliations



B.D.R. and B.C.L. conceived the study and analyses. B.D.R. implemented the computational workflow and analyses. B.D.R. and B.C.L. interpreted the results and wrote the manuscript.

Corresponding author

Correspondence to Brett D. Roads.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Roads, B.D., Love, B.C. Learning as the unsupervised alignment of conceptual systems. Nat Mach Intell 2, 76–82 (2020).

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI:

Further reading


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing