Real-world face recognition requires us to perceive the uniqueness of a face across variable images. Deep convolutional neural networks (DCNNs) accomplish this feat by generating robust face representations that can be analysed in a multidimensional ‘face space’. We examined the organization of viewpoint, illumination, gender and identity in this space. We found that DCNNs create a highly organized face similarity structure in which identities and images coexist. Natural image variation is organized hierarchically, with face identity nested under gender, and illumination and viewpoint nested under identity. To examine identity, we caricatured faces and found that identification accuracy increased with the strength of identity information in a face, and caricature representations ‘resembled’ their veridical counterparts—mimicking human perception. DCNNs therefore offer a theoretical framework for reconciling decades of behavioural and neural results that emphasized either the image or the face in representations, without understanding how a neural code could seamlessly accommodate both.
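The caricaturing operation described above can be sketched numerically: in a face space built from DCNN embeddings, a caricature exaggerates a face's deviation from the average face, leaving its identity direction unchanged. The code below is a minimal illustration under assumed names and dimensions (the 512-dimensional embedding, the `caricature` helper, and the 1.5/0.5 exaggeration levels are illustrative assumptions, not the paper's implementation).

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 512                                     # assumed DCNN embedding size
mean_face = rng.normal(size=dim)              # stand-in for the average face embedding
veridical = mean_face + rng.normal(size=dim)  # one identity's embedding

def caricature(face, mean, level):
    """Scale the identity direction (face - mean) by `level`.

    level > 1 exaggerates identity information (a caricature);
    level < 1 attenuates it (an anti-caricature)."""
    return mean + level * (face - mean)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cari = caricature(veridical, mean_face, 1.5)
anti = caricature(veridical, mean_face, 0.5)

# The caricature 'resembles' its veridical counterpart: scaling the
# identity direction does not rotate it, so the identity vectors of the
# caricature and the veridical face have cosine similarity 1.
sim = cosine(cari - mean_face, veridical - mean_face)
```

In this sketch, identification strength varies with the `level` parameter while the direction in face space that codes identity stays fixed, which is one simple way to picture how caricatures can be more identifiable yet still resemble the original face.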
All data and code used for analysis and plotting are available via the Open Science Framework at https://osf.io/ebvys/.
This work was supported by the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA) via R&D contract no. 2014-14071600012. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA or the US Government.
University of Maryland has filed a US patent application that covers portions of network A. R.R. and C.D.C. are co-inventors on this patent.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Hill, M.Q., Parde, C.J., Castillo, C.D. et al. Deep convolutional neural networks in the face of caricature. Nat Mach Intell 1, 522–529 (2019). https://doi.org/10.1038/s42256-019-0111-7