Learning as the unsupervised alignment of conceptual systems


Concept induction requires the extraction and naming of concepts from noisy perceptual experience. For supervised approaches, as the number of concepts grows, so does the number of required training examples. Philosophers, psychologists and computer scientists have long recognized that children can learn to label objects without being explicitly taught. In a series of computational experiments, we highlight how information in the environment can be used to build and align conceptual systems. Unlike supervised learning, the learning problem becomes easier the more concepts and systems there are to master. The key insight is that each concept has a unique signature within one conceptual system (for example, images) that is recapitulated in other systems (for example, text or audio). As predicted, children’s early concepts form readily aligned systems.
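The key insight can be made concrete with a minimal sketch. It rests on illustrative assumptions, not on the authors' published pipeline:

- Each conceptual system is a matrix of embeddings with one row per concept.
- A concept's signature is its row of cosine similarities to the other concepts.
- Two systems are compared by the Spearman correlation between their unique pairwise similarities under a candidate mapping. This score is called the alignment correlation here.

The function names and the synthetic data are hypothetical.

```python
import numpy as np


def similarity_matrix(emb):
    """Cosine similarity between every pair of concepts (rows of emb)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def rank(values):
    """Rank-transform a 1-D array (assumes no ties, true for continuous data)."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def alignment_correlation(emb_a, emb_b, mapping):
    """Spearman correlation between the similarity structures of two systems.

    mapping[i] is the row of emb_b hypothesized to name concept i of emb_a.
    """
    sim_a = similarity_matrix(emb_a)
    sim_b = similarity_matrix(emb_b[mapping])  # reorder system B to match A
    upper = np.triu_indices(len(emb_a), k=1)   # each unordered pair once
    return np.corrcoef(rank(sim_a[upper]), rank(sim_b[upper]))[0, 1]


# Synthetic demo (assumption): system B is a rotated, noisy copy of system A,
# so each concept keeps the same relations to the other concepts in both spaces.
rng = np.random.default_rng(0)
n_concepts, dim = 50, 20
emb_a = rng.normal(size=(n_concepts, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
emb_b = emb_a @ rotation + 0.3 * rng.normal(size=(n_concepts, dim))

aligned = alignment_correlation(emb_a, emb_b, np.arange(n_concepts))
scrambled = alignment_correlation(emb_a, emb_b, rng.permutation(n_concepts))
print(f"correct mapping: {aligned:.2f}, random mapping: {scrambled:.2f}")
```

The two systems share no coordinates, yet the correct mapping scores far higher than a random one. In this toy sense, each concept's signature in one system is recapitulated in the other.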

A preprint version of the article is available at ArXiv.

Fig. 1: Different modes of learning.
Fig. 2: Unsupervised linking of conceptual systems via conceptual alignment.
Fig. 3: Alignment correlation versus mapping accuracy for three real-world datasets.
Fig. 4: The impact of multiple conceptual systems on alignment strength.
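Fig. 3 plots alignment correlation against mapping accuracy. The hypothetical sketch below reproduces that kind of relationship on synthetic data. It starts from the correct concept-to-concept mapping between two systems and scrambles a growing number of concepts. For each mapping it records the fraction of concepts still mapped correctly alongside the alignment correlation. The sketch makes three assumptions:

- similarity is measured by cosine similarity;
- the alignment correlation is a Spearman correlation over unique concept pairs;
- the second system is a rotated, noisy copy of the first.

None of these choices, nor the helper names, come from the article.

```python
import numpy as np


def pairwise_similarity(emb):
    """Cosine similarity between all pairs of concepts (rows of emb)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def spearman(x, y):
    """Spearman correlation via double-argsort ranks (assumes no ties)."""
    return np.corrcoef(np.argsort(np.argsort(x)), np.argsort(np.argsort(y)))[0, 1]


def alignment(sim_a, sim_b, mapping):
    """Alignment correlation of two similarity matrices under `mapping`."""
    upper = np.triu_indices(len(mapping), k=1)
    return spearman(sim_a[upper], sim_b[np.ix_(mapping, mapping)][upper])


def partial_mapping(n, n_scrambled, rng):
    """Identity mapping with `n_scrambled` randomly chosen concepts permuted."""
    mapping = np.arange(n)
    idx = rng.permutation(n)[:n_scrambled]
    mapping[idx] = idx[rng.permutation(n_scrambled)]
    return mapping


rng = np.random.default_rng(0)
n_concepts, dim = 40, 20
emb_a = rng.normal(size=(n_concepts, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
emb_b = emb_a @ rotation + 0.3 * rng.normal(size=(n_concepts, dim))
sim_a, sim_b = pairwise_similarity(emb_a), pairwise_similarity(emb_b)

accuracies, scores = [], []
for n_scrambled in range(0, n_concepts + 1, 4):
    for _ in range(20):
        mapping = partial_mapping(n_concepts, n_scrambled, rng)
        accuracies.append(np.mean(mapping == np.arange(n_concepts)))
        scores.append(alignment(sim_a, sim_b, mapping))

# Higher mapping accuracy should go with higher alignment correlation.
trend = np.corrcoef(accuracies, scores)[0, 1]
print(f"correlation between mapping accuracy and alignment: {trend:.2f}")
```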

Data availability

The different source datasets used in this work are publicly available for download. The ImageNet images are available from http://www.image-net.org/. The OpenImages V4 Boxes dataset is available from https://storage.googleapis.com/openimages/web/download_v4.html. The AudioSet dataset is available from https://research.google.com/audioset/download.html. The pretrained GloVe embedding (Common Crawl 840B word tokens) is available from https://nlp.stanford.edu/projects/glove/. The age-of-acquisition ratings are available from http://crr.ugent.be/papers/AoA_ratings_Kuperman_et_al_BRM.zip. Finally, the concept intersections used in the analysis can be downloaded from https://doi.org/10.17605/OSF.IO/NDRMG.

Code availability

The Python code used to perform the analysis in this work can be downloaded from https://doi.org/10.17605/OSF.IO/NDRMG and is licensed under Apache License 2.0. The code repository for computing image embeddings using the DeepCluster algorithm is located at https://github.com/facebookresearch/deepcluster. DeepCluster is licensed under a Creative Commons Attribution-NonCommercial 4.0 International Public Licence.



Acknowledgements

This work was supported by NIH grant no. 1P01HD080679, Wellcome Trust Investigator Award no. WT106931MA and a Royal Society Wolfson Fellowship 183029 to B.C.L.

Author information

B.D.R. and B.C.L. conceived the study and analyses. B.D.R. implemented the computational workflow and analyses. B.D.R. and B.C.L. interpreted the results and wrote the manuscript.

Correspondence to Brett D. Roads.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2.

About this article

Cite this article

Roads, B.D., Love, B.C. Learning as the unsupervised alignment of conceptual systems. Nat Mach Intell 2, 76–82 (2020). https://doi.org/10.1038/s42256-019-0132-2