Introduction

We live in a world in which our survival depends critically on successful interactions with objects. This requires inferring an object’s properties—such as its material, potential uses, dangerousness, and so on. We mostly infer these properties from our previous experiences with other objects from the same or a similar class. For example, we can use what we know about a peacock butterfly (e.g., fragile, able to fly, nectar eater, harmless) to make inferences about other butterfly varieties. Only by constantly making such inferences can we interact with objects in our environment without having to learn the properties of each newly encountered object de novo1,2,3,4,5. As object shape is arguably the most important cue for object recognition and concept learning (e.g.,4,6,7,8), shape presumably plays a major role in this generalization. In other words, we assume that peacock and lemon butterflies have similar properties because they have broadly similar shapes. Here, we consider a specific measure of the relationships between shapes: our striking ability to identify point-to-point correspondences between objects (Fig. 1;9,10).

Figure 1

Correspondence problem and possible solutions. A major computational challenge is establishing correspondence between (A) and (B)–(C) across changes in viewpoint, object pose or non-rigid re-configuration. Observers generally have strong intuitions about the ‘correct’ solution, often with high agreement between observers. Drawings by Robert Marzullo (2017, https://ramstudioscomics.com).

Previous studies showed that perceived correspondence between object shapes is to some extent robust against changes in viewpoint and more complex transformations11,12,13,14,15,16,17. This is also true at the level of point-to-point correspondence, that is, when identifying corresponding points on the surfaces of two objects (e.g.,18,19,20,21,22). In previous work, we found that humans were very good at solving the point-to-point correspondence problem for 2D shapes across classes of rigid (e.g., rotation) and non-rigid transformations (e.g., growing new limbs)9,10, with high levels of agreement with the ground truth and with other observers.

A simple heuristic model based on contour curvature, however, predicted human responses better than the ground truth did9. The heuristic model assumes observers identify salient locations on the original contour (e.g., a spike and a bump on an otherwise smooth contour) and then find the corresponding salient regions on the transformed contour (e.g., a spike and a bump on the rotated contour). Finally, observers establish point-to-point correspondence for intermediate locations on the shapes relative to these salient points. For example, if a particular location on the original contour lies halfway between the spike and the bump, they choose the location halfway between the spike and the bump along the rotated object contour.

Such image-based heuristic approaches cannot, however, explain all point-to-point correspondences between objects. Shapes will often be so different that it is nearly impossible to establish correspondence based on their geometrical features alone (e.g., curvature profiles). An alternative that we hypothesize here is that humans can also solve correspondence tasks by combining shape and semantic information. For example, in Fig. 1, it is hard to reconcile our intuitions about correspondences between the hands in A, B, and C with a correspondence based on geometrical features alone. Rather, we seem to use our knowledge about the semantic organization of the hand (‘the point lies on the knuckle of the thumb’) to guide our responses. Previous studies with unfamiliar objects show that correspondence can be established without such semantic information. In other words, semantics are not necessary for determining correspondence. Yet, it seems plausible that—if available—high-level semantic information facilitates spatial correspondence judgements. Indeed, here we test whether semantic cues are sufficient to override geometrical similarities between objects.

It is well known that objects are not only perceived in terms of their overall shape, but also in terms of their parts23,24,25. Accordingly, observers might establish correspondence between very different shapes by relying on semantically labelled parts (e.g., the wings or legs of a butterfly). Specifically, we might segment objects into recognizable parts, such as legs, wings or tails, that can be matched across objects, and then use broadly the same heuristic as described above to interpolate between these sparse correspondences9. The key difference from previous work is that, rather than defining the salient regions that form the anchor points for correspondence by local geometrical features alone, those regions are instead defined by the semantic parts. This would allow observers to identify point-to-point contour correspondences even if contour shapes differ wildly. For example, if presented with an elephant and an anteater (Fig. 2A), whose outlines are geometrically very different, observers would be able to match up the elephant’s trunk with the anteater’s snout and work out the correspondence for any given point based on its relative position along the trunk’s outline.

Figure 2

Stimuli of Experiment 1. Each pair was presented on the screen simultaneously, with the base shape to the left and the test shape to the right (arrangement the same as here in the figure). Images were obtained from different online databases and are reprinted with permission. (A) Elephant–Anteater (‘Elephant’ by depositphotos.com/bojanovic; ‘Anteater’ by depositphotos.com/160,377), (B) Ostrich–Flamingo (‘Ostrich’ by depositphotos.com/kaludov; ‘Flamingo’ by shutterstock/Yaroslavna Zemtsova), (C) Antelope–Giraffe (‘Antelope’ by shutterstock.com/Momo0607; ‘Giraffe’ by freedesignfile.com/Starder), (D) Lama–Fox (‘Lama’ by vecteezy.com; ‘Fox’ by shutterstock.com/Rey Kamensky), (E) Butterfly–Owl (‘Butterfly’ by shutterstock.com/ntnt; ‘Owl’ by yayimages.com/Perysty), and (F) Lizard–Whale (‘Lizard’ by shutterstock.com/angelp; ‘Whale’ by depositphotos.com/ktinte). All stimuli are available at https://doi.org/10.5281/zenodo.4304299.

To test the hypothesis that humans establish correspondence between very different shapes based on the perceptual organization of the shapes together with previous knowledge about semantic parts, we obtained point-to-point correspondence judgments for contours with different shapes but similar part organization (6 pairs of animal shapes, Experiment 1; Fig. 2) as well as for contours with the same shapes but different part organization (5 animal shapes with ambiguous interpretations, Experiment 2; Fig. 5). For the first set of contours (Experiment 1), it is very difficult to establish point-to-point correspondence based on shape features alone because the shapes are very different. For the second set of contours (Experiment 2), it is impossible to use shape features at all because the contours are geometrically identical and differ only in their interpretation. Thus, Experiment 2 is designed to test the role of semantic part organization in establishing correspondence in the extreme: by holding shape constant, it tests whether semantics are sufficient to override purely geometrical factors in determining correspondence between shapes. Across both experiments, we can test to what extent observers agree in their correspondence responses under these challenging conditions, and whether we can explain their responses with a model based on part organization and semantic correspondence. For comparison, we contrast this model with a simple model based on uniform sampling around the contour as well as with a model based on shape features.

Together with previous studies illustrating the role of shape features for correspondence between unfamiliar stimuli with no semantic part organization, this would show that, depending on the available information, human observers flexibly rely on either perceptual or cognitive processes to establish correspondences. Specifically, in this paper we aim to demonstrate how we establish correspondence between very different objects by evaluating similarity between semantic parts, combining perceptual organization and cognitive processes.

Experiments

Experiment 1: Different geometry, similar parts

Participants

15 students from Justus-Liebig-University Giessen, Germany, with normal or corrected-to-normal vision participated in the experiment for financial compensation (11 w, 4 m, mean age = 22.5 years, SD = 2.9). This number is based on our previous work using the same paradigm10. All participants gave informed consent, were debriefed after the experiment, and were treated according to the ethical guidelines of the American Psychological Association. All testing procedures were approved by the ethics board at Justus-Liebig-University Giessen and were carried out in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki).

Stimuli

Stimuli were 6 pairs of 2D contours (Fig. 2) that were chosen to have different shapes but similar part organization (e.g., same number of legs or wings). For each of the base shapes (left shapes in Fig. 2A–F), we defined 50 probe locations by sampling the contour at equidistant intervals starting from a random position on the contour.
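
To make the sampling of probe locations concrete, the following is a minimal sketch of how 50 equidistant points can be placed along a closed 2D contour starting from a random position. The function name and the circular example contour are our own illustrations, not the original stimulus-generation code.

```python
import numpy as np

def sample_equidistant(contour, n_probes=50, rng=None):
    """Sample n_probes points at equal arc-length intervals along a closed
    contour (an (N, 2) array of x/y vertices), starting at a random position.
    Hypothetical helper, for illustration only."""
    rng = np.random.default_rng(rng)
    closed = np.vstack([contour, contour[:1]])              # close the contour
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)   # segment lengths
    arc = np.concatenate([[0.0], np.cumsum(seg)])           # cumulative arc length
    perimeter = arc[-1]
    # Equidistant arc-length positions, offset by a random starting point.
    start = rng.uniform(0.0, perimeter)
    targets = (start + np.arange(n_probes) * perimeter / n_probes) % perimeter
    # Linear interpolation of x and y as functions of arc length.
    xs = np.interp(targets, arc, closed[:, 0])
    ys = np.interp(targets, arc, closed[:, 1])
    return np.column_stack([xs, ys])

# Example with a circular contour approximated by 200 vertices.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
probes = sample_equidistant(circle, n_probes=50, rng=1)
print(probes.shape)  # (50, 2)
```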

Procedure

Before the start of the experiment, participants were handed written instructions with an outline of the experiment and the verbatim task description ‘Your task is to find the correspondence between the red dot on the left contour and the dot on the right contour. You can move a green dot with the mouse and confirm your selection with a mouse click. Please do not take any breaks while responding to a pair! You can take as much time for each decision as you need for a confident choice. Make sure to work thoroughly!’ We also presented the Elephant–Anteater pair as an example (Fig. 3A), with a red dot on the elephant contour and an unplaced green dot in the white space next to the anteater. In response to questions about how to perform the task, the experimenter replied that there was no right or wrong answer.

Figure 3

Overview of experimental paradigms. (A) Example response in the dot-matching task (Experiments 1 and 2), where participants see a probe point (red) on the contour of the base shape (e.g., elephant) and are instructed to place a bullseye (green) on the corresponding location on the contour of the test shape (e.g., anteater). (B) Example response for identifying and labeling parts (Experiment 3A). For each shape, participants first identify part cuts by using the computer’s mouse to choose two locations on the contour (Task 1). After identifying as many parts as they like, they assign labels to these parts by selecting them and choosing a label from a list (Task 2). (C) Example response for establishing correspondence between semantic part labels (Experiment 3B). Participants sorted together color-coded names of animal parts identified in Experiment 3A. In the end, each part label of one animal has to be sorted together with a particular part label of the other animal. This includes cases in which several part labels are assigned to the same part label (e.g., neck and body of the lizard are both sorted together with the body of the whale).

In each trial of the experiment, participants were presented with one of the shape pairs (Fig. 2A–F). The base shape (e.g., elephant) was presented on the left side and the test shape (e.g., anteater) on the right side of the screen. We successively presented probe points on the contour of the base shape and asked participants to use the mouse to place a small bullseye marker ‘at the corresponding location’ on the contour of the test shape (Fig. 3A). The probe point was a small red dot (0.10° radius) and the bullseye was a small green dot (0.10° radius) surrounded by a ring (0.75° radius). After participants confirmed their choice with a mouse click, the probe point was replaced by a probe at a different location, drawn from the 50 preselected locations. The probe points were presented one at a time and in random order, to minimize the influence of a participant’s previous decisions for nearby locations. Each pair of shapes remained on screen until participants had responded to all 50 probe points, enabling a dense mapping of perceived correspondences. Each participant responded to all stimulus pairs and to the same probe points; across participants, pairs were presented in random order. Finally, base and test shapes were presented either in the same orientation (as in Fig. 2) or in different orientations (e.g., elephant and anteater facing each other), with orientation counterbalanced across pairs and participants.

Stimuli were presented on a white background on an EIZO CG277 monitor at a resolution of 2560 × 1440 pixels and a monitor refresh rate of 59 Hz, controlled by MATLAB2018a (The MathWorks, Inc., Natick, Massachusetts, United States) using the Psychophysics Toolbox extension26. The two shapes of each pair were uniformly scaled so that their bounding boxes had the same area. The width of the resulting shapes varied between 15.33° and 22.99° of visual angle, and the height varied between 4.42° and 27.81° of visual angle (with a distance between participants and monitor of about 50 cm).
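
As an illustration of the scaling step, a uniform scale factor that equates bounding-box areas can be computed as follows (a hypothetical helper, not the original presentation code).

```python
import numpy as np

def scale_to_bbox_area(contour, target_area):
    """Uniformly scale an (N, 2) contour so that its axis-aligned bounding
    box has the given area. Illustrative sketch only."""
    w = contour[:, 0].max() - contour[:, 0].min()
    h = contour[:, 1].max() - contour[:, 1].min()
    s = np.sqrt(target_area / (w * h))     # uniform scale factor
    center = contour.mean(axis=0)
    return (contour - center) * s + center
```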

Analysis

Note that for our stimuli there is no ground truth or mathematically correct solution to map the base to the test shape. Consequently, we analyze the results with respect to the extent to which participants agree with each other, and test how well this agreement can be explained by different models (random, shape-based and semantic-based models).

Results and discussion

Results of Experiment 1 are plotted in Fig. 4. Responses of participants are highly systematic with generally (i) high agreement between participants, (ii) well-preserved ordering of corresponding locations, and (iii) corresponding locations on similar semantic parts of base and test shape (e.g., probe points on the elephant’s trunk are matched with locations on the anteater’s snout; see Fig. 4A and Sect. Modeling).

Figure 4

Overview of results of Experiment 1. On each base shape (left, e.g., elephant), we plot the 50 equidistant probe points. On each test shape (right, e.g., anteater), we plot the corresponding participant responses for each of the points. Participant responses are summarized by determining the median position along the length of the contour. (A–D) For most pairs and probe points, the order of points on the test shapes is the same as that on the base shape even though probe points were queried one at a time. (E) For the Butterfly–Owl pair, we also show results separately for participants arranging their responses in the same (inset 1) or reversed order (inset 2) as the probe points. Data are available at https://doi.org/10.5281/zenodo.4304299.

With respect to (i), we quantify agreement between participants as response congruity on a scale from 0, indicating congruity of random responses, up to 1, indicating perfect congruity. Specifically, for each probe location on the base shape, we calculated the average of the distances between all participants’ responses along the contour of the test shape (distances expressed as a percentage of test shape perimeter). Thus, congruity refers to the spatial proximity between all responses to the same probe point. We calculate the grand mean of these average distances across all probe points, obtaining a single congruity score for each pair of shapes. Finally, we project that score onto a continuum between random and perfect congruity by dividing it by the mean distance obtained from randomly placed responses on the test shape and subtracting the resulting ratio from 1; a value of 1 thus indicates perfect congruity, with all responses at the very same location (i.e., zero distance between responses of different participants), and a value of 0 indicates random-level congruity. The results showed that for all pairs, participants are significantly more congruent than the random model (pairs A–F: 0.90, 0.86, 0.90, 0.88, 0.48, 0.71; Wilcoxon signed rank test: − 6.15 < Z < − 5.61, all p < 0.001).
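
The congruity score described above can be sketched as follows. This is an illustrative Python implementation with hypothetical function names; the random baseline is approximated here by Monte Carlo sampling, which may differ from the exact procedure used in the paper.

```python
import numpy as np
from itertools import combinations

def contour_distance(a, b):
    """Circular distance between positions given as % of perimeter."""
    d = np.abs(a - b)
    return np.minimum(d, 100.0 - d)

def mean_pairwise_distance(responses):
    """responses: (n_participants, n_probes) positions in % of perimeter.
    Grand mean of pairwise along-contour distances across probe points."""
    dists = [contour_distance(responses[i], responses[j]).mean()
             for i, j in combinations(range(responses.shape[0]), 2)]
    return np.mean(dists)

def congruity(responses, n_random=200, rng=None):
    """Congruity score: 0 = random-level agreement, 1 = perfect agreement."""
    rng = np.random.default_rng(rng)
    observed = mean_pairwise_distance(responses)
    random_resp = rng.uniform(0, 100, size=(n_random,) + responses.shape)
    random_mean = np.mean([mean_pairwise_distance(r) for r in random_resp])
    return 1.0 - observed / random_mean

# 15 participants, 50 probe points, responses clustered around common positions.
rng = np.random.default_rng(0)
true_positions = rng.uniform(0, 100, 50)
responses = (true_positions + rng.normal(0, 2, (15, 50))) % 100
print(round(congruity(responses, n_random=50, rng=0), 2))  # high agreement, close to 1
```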

Why are participants considerably less consistent for the Butterfly–Owl pair (0.48) than for the other tested pairs? We reasoned that this resulted from the ambiguous 3D orientation that allows seeing the animals as viewed from the front or back, rendering correspondence ambiguous too. In line with this idea, individual participants arranged their responses on the owl either in the same (n = 8; median responses of this group: inset 1 in Fig. 4E) or in reversed order (n = 7; inset 2 in Fig. 4E) as the probe points on the butterfly, with congruities of 0.76 and 0.81 within the two groups, respectively. This pattern of responses might be explained to some extent by the presentation of butterfly and owl heading either in the same direction (as depicted in Fig. 2E) or in different directions. Indeed, the majority of participants presented with butterfly and owl heading in the same direction also arranged responses in the same order (5 out of 7), while different heading directions tended to produce reversed orderings (5 out of 8).

With respect to (ii), we quantify the extent to which responses preserved the ordering of points on the test shape. For this, we calculated how often the ordering of the median responses reversed (i.e., how often a given dot was flanked by different dots on the test shape than on the base shape). We compared the number of reversals with the mean number of reversals that occur with the same number of random responses. The results of this analysis revealed that ordering was preserved significantly more in the participants’ median responses than in the random model (preserved order in pairs A–F: 100%, 80%, 94%, 92%, 16%, 62%; Wilcoxon signed rank test: − 6.94 < Z < − 2.23, all p < 0.026). Again, the preservation of ordering is much higher for the Butterfly–Owl pair when considering the two groups separately based on their perceived orientation (Fig. 4E, inset 1: 56%; inset 2: 68%). Regarding (iii), a formal quantification of corresponding locations on similar semantic parts is presented in the Results and Discussion of Experiment 3.
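
A minimal sketch of the order-preservation measure follows, under the assumption that the ‘flanking dots’ are the two circular neighbours of each probe in the response ordering; the exact implementation used in the paper may differ.

```python
import numpy as np

def fraction_order_preserved(median_positions):
    """median_positions: (n_probes,) median response positions along the test
    contour (% of perimeter), indexed in the base-shape probe order. For each
    probe we check whether its two neighbours in the circular ordering on the
    test shape are the same probes as on the base shape. Illustrative sketch."""
    n = len(median_positions)
    order = np.argsort(median_positions)      # probe indices in test-shape order
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)                # position of each probe in that order
    preserved = 0
    for i in range(n):
        base_neighbours = {(i - 1) % n, (i + 1) % n}
        test_neighbours = {order[(rank[i] - 1) % n], order[(rank[i] + 1) % n]}
        preserved += base_neighbours == test_neighbours
    return preserved / n

# Perfectly preserved ordering (monotone positions) gives 1.0.
print(fraction_order_preserved(np.linspace(0, 99, 50)))
# Shuffled positions give a low value, near chance.
print(fraction_order_preserved(np.random.default_rng(0).permutation(50).astype(float)))
```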

Together, these analyses provide an initial indication that participants can consistently identify correspondences across quite widely divergent shapes. This consistency between participants in establishing correspondences might be explained by common strategies based on contour curvature9 or based on semantic part organization (e.g., elephant’s trunk and anteater’s snout). In Experiment 2, we sought to further test to what extent semantic part organization can be used to establish correspondence, by testing stimulus pairs of ambiguous shapes that were physically identical but could be interpreted with different semantic part organizations.

Experiment 2: Identical geometry, similar parts

Participants

15 students from Justus-Liebig-University Giessen, Germany, with normal or corrected-to-normal vision participated in the experiment for financial compensation (12 w, 3 m, mean age = 23.7 years, SD = 4.1). Again, the number of participants is based on our previous work10. All other details and participant procedures were the same as in Experiment 1.

Stimuli

Stimuli were 5 pairs of shapes (Fig. 5A–E) that were chosen to have the same shapes but different part organization. Specifically, we used ambiguous figures to disentangle shape from part organization. For example, the Swan–Squirrel (Fig. 5A) can be seen either as a swimming swan oriented to the left, or a crouching squirrel oriented to the right. To measure baseline performance, we also added a condition with the same shape and the same label for two additional shapes: the whale (Fig. 5F) and the antelope (Fig. 2C) from Experiment 1. In the interest of keeping participant sessions below an hour, the data for the second shape (antelope) were obtained from a different set of participants who took part in a separate experiment (n = 15, 7 w, 8 m, mean age = 24.5 years, SD = 3.1).

Figure 5

Stimuli of Experiment 2. Each pair was presented on the screen simultaneously, with the base shape to the left and the test shape to the right (arrangement the same as here in the figure). Images were drawings by author J. K. inspired by or copying stimuli from previous papers showing ambiguity in visual perception. (A) Swan–Squirrel from Fisher27, (B) Parrot–Goose by Tinbergen28, (C) Whale–Snail by Bernstein and Cooper29, (D) Duck–Rabbit by Jastrow30, and (E) Swan–Cat by Bernstein and Cooper29. (F) Example of the baseline condition with the same shape and label. All stimuli are available at https://doi.org/10.5281/zenodo.4304299.

Procedure

The written instructions were the same as in Experiment 1, with the Duck–Rabbit pair as an example: a red dot on the duck contour and an unplaced green dot in the white space next to the rabbit (both shapes with semantic labels presented below them).

The procedure was otherwise the same as in Experiment 1, again with semantic labels presented below each shape (different labels for the ambiguous stimuli, Fig. 5A–E; identical labels for the stimuli used as the baseline condition, Fig. 5F).

Presentation details were the same as in Experiment 1; after scaling, the width of the resulting shapes varied between 15.44° and 28.22° of visual angle, and the height varied between 11.39° and 13.63° of visual angle (with a distance between participants and monitor of about 50 cm).

Results and discussion

Figure 6 shows the results of Experiment 2. Again, responses of participants are highly systematic; however, they show somewhat lower (i) agreement between participants and (ii) preservation of the ordering of corresponding locations, but still show (iii) corresponding locations on similar semantic parts of base and test shapes (e.g., probe points on the swan’s head are matched with locations on the squirrel’s head; Fig. 6A).

Figure 6

Overview of results of Experiment 2. On each base shape (left, e.g., swan), we plot the 50 equidistant probe points. On each test shape (right, e.g., squirrel), we plot the corresponding participant responses for each of the points. Participant responses are summarized by determining the median position along the length of the contour. (A,C,E,F) For most pairs and probe points, the order of points on the test shapes is the same as that on the base shape even though probe points were queried one at a time. (B) For the Parrot–Goose pair, we also show results separately for participants arranging their responses in the same (inset 1) or reversed order (inset 2) as the probe points. (D) For the Duck–Rabbit pair, we also show results separately for participants interpreting the rabbit as looking to the right (inset 1) or looking down (inset 2) (eyes added for illustration purposes). Data are available at https://doi.org/10.5281/zenodo.4304299.

With respect to (i), we quantify agreement between participants as in Experiment 1; participants are less congruent than in Experiment 1 but still more than predicted by the random model (pairs A–E: 0.75, 0.28, 0.40, 0.37, 0.76; Wilcoxon signed rank test: − 6.16 < Z < − 5.85, all p < 0.001). We suggest that the lower congruity compared to Experiment 1 can be explained by two factors. First, the semantic part organization was less clear for the ambiguous shapes of Experiment 2 compared to the unambiguous shapes of Experiment 1—as unambiguous shapes have more contour details and are more prototypical animal contours. Second, the correspondence between semantic parts was less clear for the ambiguous shapes of Experiment 2—as the part organization (i.e., the viewpoint and pose of the objects) of unambiguous base and test shapes in Experiment 1 was more similar (with the exception of the Lizard–Whale pair).

Again, 3D orientation is ambiguous for the parrot and goose and, indeed, individual participants arranged their responses on the goose either in the same (n = 5; median responses of this group: inset 1 in Fig. 6B) or in reversed order (n = 10; inset 2 in Fig. 6B) as the probe points on the parrot, with congruities of 0.86 and 0.51 within the two groups, respectively. Again, this corresponds to some extent to observers’ perception of the same (cf. Fig. 5B) or different orientation of parrot and goose: the majority of participants presented with both heading in the same direction also arranged responses in the same order (n = 5 out of 8), while different heading directions produced reversed orderings (n = 7 out of 7).

Interestingly, some participants also reported having interpreted the rabbit (Fig. 6D) differently from the classical interpretation: while most participants reported seeing a rabbit looking to the left or right (n = 12; see illustration and median responses in inset 1 in Fig. 6D), a few participants reported seeing a rabbit looking down (n = 3; see illustration and median responses in inset 2 in Fig. 6D), with congruities of 0.50 and 0.70 within these groups. This was not explained by different heading directions, as half of the participants who saw the rabbit looking to the left or right (n = 6) were presented with duck and rabbit heading in the same direction and the other half (n = 6) with different directions.

With respect to (ii), we quantify the preservation of ordering as in Experiment 1, showing that ordering was preserved significantly more in the participants’ median responses than in the random model (preserved ordering in pairs A–E: 82%, 90%, 82%, 58%, 82%; Wilcoxon signed rank test: − 6.94 < Z < − 2.23, all p < 0.026). The preservation of ordering is similar for the Parrot–Goose pair (‘1’: 88%; ‘2’: 94%) and the Duck–Rabbit pair (‘1’: 56%; ‘2’: 68%) when considering the two groups separated by their perceived orientation.

As for Experiment 1, regarding (iii) we present a formal quantification of corresponding locations on similar semantic parts in the Results and Discussion of Experiment 3. However, in contrast to Experiment 1, the effect of semantic interpretation is already evident from the fact that participants’ responses on the test shape are not simply identical to the probe locations on the base shape—as the semantic interpretation was the only difference between the two shapes.

These analyses indicate that even identical shapes yield different correspondences when they have a different semantic part organization. Taken together, Experiments 1 and 2 suggest that semantic features affect how corresponding points are placed. To test this hypothesis directly, we conducted a third experiment to obtain local semantic labels for the stimuli in Experiments 1 and 2, using a part segmentation and labeling task, and subsequently tested different models to explain human correspondence responses.

Experiment 3: Semantic labeling of part structures

In Experiment 3, we identified parts and semantic part labels for all shapes of Experiments 1 and 2. In Experiment 3A, participants identified and labeled parts; in Experiment 3B, participants established correspondence between semantic part labels. We used this information to build a model based on semantic part organization for predicting participants’ responses in Experiments 1 and 2.

Experiments 3A and 3B: Participants

21 students from Justus-Liebig-University Giessen, Germany, with normal or corrected-to-normal vision participated in the experiments for financial compensation. A group of 12 students (7 w, 5 m, mean age = 23.7 years, SD = 3.8) participated in Experiment 3A (Identifying and labeling parts) and a group of 9 students (6 w, 3 m, mean age = 24.2 years, SD = 2.5) participated in Experiment 3B (Correspondence between semantic part labels). All other details and participant procedures were the same as in Experiment 1.

Experiment 3A: Stimuli

Stimuli were all individual shapes from Experiments 1 and 2.

Experiment 3A: Procedure

In each trial, participants were presented with one of the shapes together with 16 part labels (‘Headʼ, ‘Bodyʼ, ‘Eye/sʼ, ‘Neckʼ, ‘Front leg/sʼ, ‘Hind leg/sʼ, ‘Foot/Feetʼ, ‘Ear/sʼ, ‘Trunkʼ, ‘Mouthʼ, ‘Antennaʼ, ‘Horn/sʼ, ‘Beakʼ, ‘Wing/sʼ, ‘Tailʼ, and ‘Fin/sʼ) and the additional option ‘None of theseʼ on the right side of the screen (Fig. 3B). Using the point and click operations of the computer’s mouse, participants completed two tasks for each shape. In the first task (inspired by31), they defined part boundaries by selecting two locations on the contour (with the restriction that the resulting boundary would not intersect existing boundaries or the contour). After defining as many parts as they wanted, participants proceeded to the second task, where they assigned labels by selecting each part in sequence and choosing a label from those on the right side of the screen. Each part was assigned only one label, but labels could be used for more than one part. After labelling all of the parts, participants continued with the next shape. Each participant responded to each of the shapes, presented in random order. Presentation details and size of stimuli were the same as in Experiments 1 and 2.

Experiment 3A: Results

To determine to what extent participants thought that the provided labels were insufficient to name the parts, we calculated the average percentage of contour sections labelled ‘None of theseʼ. As these percentages were very low (Experiment 1: 0.12%; Experiment 2: 3.64%), we conclude that participants considered the labels sufficient to name the great majority of contour parts.

We then identified the most frequent label for each point on the contour of each individual shape. For each point, we counted the number of participants who assigned each label to that point, and then assigned the most frequent label to that point (Fig. 7).
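
This aggregation amounts to taking the per-point mode across participants; a minimal, illustrative sketch (with labels encoded as integers) is given below.

```python
import numpy as np

def modal_labels(label_matrix):
    """label_matrix: (n_participants, n_points) array of integer label codes
    (one row per participant, one column per contour point). Returns the most
    frequent label per point. Illustrative sketch of the aggregation step."""
    n_points = label_matrix.shape[1]
    modal = np.empty(n_points, dtype=int)
    for p in range(n_points):
        codes, counts = np.unique(label_matrix[:, p], return_counts=True)
        modal[p] = codes[np.argmax(counts)]
    return modal

# 12 participants, 5 contour points, labels 0 = 'Head', 1 = 'Body' (toy data).
labels = np.array([[0, 0, 1, 1, 1]] * 10 + [[0, 1, 1, 1, 0]] * 2)
print(modal_labels(labels))  # [0 0 1 1 1]
```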

Figure 7

Overview of results of Experiment 3A. (A–J) On each shape, we plot for each point on the contour the most frequent label assigned to that point. (L) Labels and colors. Data are available at https://doi.org/10.5281/zenodo.4304299.

Experiment 3B: Stimuli

Stimuli were printed paper cards, with one card for each of the most frequent labels obtained in Experiment 3A. Cards were color-coded by base shape (yellow) and test shape (green) (Fig. 3C).

Experiment 3B: Procedure

Participants were handed all cards belonging to a pair of shapes (e.g., all of the most frequent part labels for ‘Elephantʼ on yellow cards, and all of the most frequent part labels for ‘Anteaterʼ on green cards), together with two cards to identify which color belonged to which animal (e.g., a yellow ‘Elephantʼ and a green ‘Anteaterʼ card; Fig. 3C). No visual shapes were presented, only words (e.g., for ‘Elephantʼ the most frequent labels were ‘Headʼ, ‘Bodyʼ, ‘Front leg/sʼ, ‘Hind leg/sʼ, ‘Foot/Feetʼ, ‘Trunkʼ, and ‘Tailʼ, and exactly the same labels for ‘Anteaterʼ; see Fig. 7A). Participants were then asked to sort cards together so that each part label of one shape was assigned to a particular label of the other shape; many-to-one, but not many-to-many, assignments were allowed (e.g., antenna and head of the butterfly might both be sorted together with the head of the owl but could not also be sorted with its wing; Fig. 7E). Note that for most pairs this was a trivial task as base and test shape were described by the very same set of semantic labels (as for Elephant–Anteater; Fig. 7). Each participant sorted together the part labels for all of the pairs, presented successively in random order.

Experiment 3B: Results

For each pair of shapes and each part label, we identified the most frequent correspondences (i.e., the labels sorted together by the most participants). This provides us with one-to-one correspondences between parts, based on semantic information alone. Consequently, we can build a model based on semantic part organization that can also predict participantsʼ responses for parts with non-identical labels (e.g., most participants sorted together the beak of the duck and the mouth of the rabbit; Fig. 7J). Data are available at https://doi.org/10.5281/zenodo.4304299.

Modeling

In this section, we present our semantic organization model to predict human correspondence responses, and compare it to plausible alternative models. Note that the model is not purely image-computable: it relies on semantic part labels derived from participant data. However, the model does provide quantitative predictions at a finer spatial scale (the position of corresponding points) than the raw data from which the predictions are derived (the semantic label data from Experiment 3).

For comparisons between model predictions and human responses we report T-tests and, as a measure of effect size, corresponding scaled JZS Bayes factors (BF10), using a Jeffreys–Zellner–Siow prior (Cauchy distribution on effect size) with a default scale factor of 0.70732. BF10 expresses the probability of the data given H1 relative to H0 (i.e., BF10 > 1 favors H1). BF10 > 3 can be considered ‘some evidence,’ BF10 > 10 ‘strong evidence,’ and BF10 > 30 ‘very strong evidence’ for H1, whereas BF10 < 0.33 can be considered ‘some evidence,’ BF10 < 0.1 ‘strong evidence,’ and BF10 < 0.03 ‘very strong evidence’ for H033.
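
For reference, a scaled JZS Bayes factor of this kind can be obtained by numerical integration of the Rouder et al. (2009) formulation for a one-sample/paired t-test; the sketch below is our own illustration of that calculation, not necessarily the software used for the analyses in the paper.

```python
import numpy as np
from scipy.integrate import quad

def jzs_bf10(t, n, r=0.707):
    """Scaled JZS Bayes factor (BF10) for a one-sample / paired t-test with
    t statistic `t`, sample size `n`, and Cauchy prior scale `r` on effect
    size, following Rouder et al. (2009). Illustrative sketch only."""
    v = n - 1  # degrees of freedom
    def integrand(g):
        a = 1.0 + n * g * r**2
        return (a**-0.5
                * (1.0 + t**2 / (a * v)) ** (-(v + 1) / 2.0)
                * (2.0 * np.pi) ** -0.5 * g**-1.5 * np.exp(-1.0 / (2.0 * g)))
    numerator, _ = quad(integrand, 0, np.inf)
    denominator = (1.0 + t**2 / v) ** (-(v + 1) / 2.0)
    return numerator / denominator

# Example: BF10 for t = 3.0 with N = 15 participants and the default scale.
print(round(jzs_bf10(t=3.0, n=15), 2))
```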

Semantic organization model

Our previous work9 suggested that the visual system identifies and establishes correspondence for a few salient landmarks on the shape, and infers the position of other points on the contour relative to these. This provides a robust method to infer correspondence but is only possible if shapes are similar enough to enable correspondence between the landmarks to be established. Here, we extend this model by suggesting that this correspondence can also be established based on semantic part organization. If correspondence for salient landmarks is difficult or not possible to establish, observers might refer to semantic part organization to establish point-to-point correspondence. In line with our previous model9, they would infer the position of points relative to identified semantic correspondences. Specifically, the semantic organization model (Fig. 8) generates predictions for each probe point on the base shape by identifying its location on a semantic part (e.g., on the Elephant’s trunk) and finding the same relative position on the corresponding semantic part of the test shape (e.g., on the Anteater’s snout). By this, we can compare the predicted locations to the median human responses for every point that we tested.

Figure 8

Illustration of our extended model. The model is based on the identified parts, their semantic labels and the semantic correspondences from Experiment 3. (A) First, for each probe point, we identify the part on the base shape it is located on. Second, we find the corresponding semantic part on the test shape (using labels and semantic correspondences). (B) Third, we define the position of the probe point relative to the start and end of the semantic part by calculating its proportion of the length of the semantic part contour (p = ∆b/∆B) along the heading direction (as determined by the similarity in ordering of corresponding semantic parts). Fourth, we predict the test point by using that proportion to define its position relative to the start and end of the corresponding semantic part on the test shape (∆t = ∆T·p). (C) From this we obtain a single predicted location on the test shape for each probe point. By comparing that prediction to the median human response and averaging across all points, we get a prediction accuracy score for each pair of shapes.
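
A minimal sketch of this prediction step, using the notation of the caption (p = ∆b/∆B, ∆t = ∆T·p), is given below. The part spans, labels, and example numbers are hypothetical, and wrap-around at the contour start is ignored for simplicity.

```python
import numpy as np

def predict_correspondence(probe_pos, base_parts, test_parts, part_map):
    """Semantic organization model, minimal sketch (not the authors' code).

    probe_pos  : probe position on the base contour, in % of perimeter.
    base_parts : dict label -> (start, end) arc-length span on the base contour.
    test_parts : dict label -> (start, end) arc-length span on the test contour.
    part_map   : dict mapping base part labels to corresponding test part labels
                 (as obtained in Experiment 3B).
    Spans are assumed not to wrap around the 0%/100% point.
    """
    for label, (b_start, b_end) in base_parts.items():
        if b_start <= probe_pos <= b_end:
            # Relative position within the base part (p = Δb / ΔB).
            p = (probe_pos - b_start) / (b_end - b_start)
            # Same relative position within the corresponding test part (Δt = ΔT · p).
            t_start, t_end = test_parts[part_map[label]]
            return t_start + p * (t_end - t_start)
    raise ValueError("probe_pos does not fall on any labelled part")

# Toy example: the elephant's trunk spans 10-30% of its contour, the anteater's
# snout spans 5-45% of its contour (hypothetical numbers for illustration).
base_parts = {"trunk": (10, 30), "body": (30, 100)}
test_parts = {"snout": (5, 45), "body": (45, 100)}
part_map = {"trunk": "snout", "body": "body"}
print(predict_correspondence(20.0, base_parts, test_parts, part_map))  # 25.0
```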

An overview of the results of Experiment 1 is presented in Fig. 9, with human responses and predictions from the semantic organization model plotted with respect to their congruity with (other) human responses, on a continuum between random and perfect congruity (for details on calculating congruity see Results and discussion of Experiment 1). The figure shows that congruity between participants is generally high and much closer to perfect congruity than to the congruity of random responses (blue horizontal bars; Fig. 9). Also, when calculating the congruity between the predictions of the semantic organization model and human responses, we see that the model is often well within the range of human responses (green horizontal bars; Fig. 9): in other words, it often explains human responses as well as other human responses do. This suggests that the model captures the relevant aspects of the observed human response behavior.

Figure 9

Human results and semantic organization model for Experiment 1. We plot response congruity on a continuum between random (0.0: congruity between responses of random model) and perfect congruity to human responses (1.0: all responses on top of each other). Horizontal bars plot congruity and standard errors for human responses (dark blue) and for the semantic organization model (green), separately for each pair of shapes. For the Butterfly–Owl pair, we also report congruity when measured across the two response types (light blue bar, see text for explanation). Finally, we plot the congruity of human responses in the baseline condition with the two pairs of identical shapes (Whale–Whale and Antelope–Antelope) as reference (grey bar).

For statistical testing, we calculated the distances of human responses (and model predictions) to the median human response for each pair of shapes and sample points, expressed as a percentage of contour perimeter9,10. The average distances for human responses to the median responses for each pair of shapes were 2.6% (Elephant–Anteater), 3.6% (Ostrich–Flamingo), 2.5% (Antelope–Giraffe), 5.5% (Butterfly–Owl; across two types: 13.0%), and 7.2% (Lizard–Whale) (Table 1; grand average: Fig. 12). The average distances for the semantic organization model predictions were 2.7%, 6.5%, 2.1%, 3.9%, 4.9% (12.1%), and 5.2% (Table 1; grand average: Fig. 12). T-tests for each pair (across the 50 sample points) showed no significant difference between the distances of human results and model predictions to median responses (Table 1).

Table 1 Overview of modeling results and statistics for Experiment 1.

For Experiment 2, we again see that congruity is generally closer to perfect congruity than to the congruity of random responses (exception: Whale–Snail; blue horizontal bars; Fig. 10). However, it is also significantly lower than in Experiment 1. Before generating predictions from the semantic organization model, we reversed the order of contour points, as the two interpretations of each shape (e.g., swan and squirrel) were always heading in different directions (e.g., swan to the left and squirrel to the right; Fig. 5A). When calculating the congruity between the resulting model predictions and all individual human responses, we again see that even though congruity is generally lower than in Experiment 1, the pattern of human responses is well replicated in the predictions (green horizontal bars in Fig. 10), again suggesting that the semantic organization model is a good model of participant behavior.

Figure 10

Human results and semantic organization model for Experiment 2. We plot response congruity on a continuum between random (0.0: congruity between responses of random model) and perfect congruity to human responses (1.0: all responses on top of each other). For the Parrot–Goose pair, we also report congruity when measured across the two response types (light blue bar, see text for explanation). For the Duck–Rabbit pair, we also report congruity when measured across the two semantic interpretations (light blue bar, see text for explanation). For other details see Fig. 9.

For statistical testing, we again calculated average distances to the median responses. For human responses, distances for each pair of shapes were 6.2% (Swan–Squirrel), 7.9% (Parrot–Goose; across two types: 18.0%), 15.0% (Whale–Snail), 10.0% (Duck–Rabbit; across two types: 15.8%), and 6.0% (Swan–Cat) (Table 2; grand average: Fig. 12). For the semantic organization model predictions, they were 4.9%, 5.3% (6.8%), 16.5%, 6.2% (6.6%), and 4.8% (Table 2; grand average: Fig. 12). The distances of human results and model predictions to the median responses differed significantly only for one pair (Duck–Rabbit) and not for the other four pairs (Table 2).

Table 2 Overview of modeling results and statistics for Experiment 2.

Overall, this suggests that the semantic organization model is a good approximation of human behavior in both experiments (Fig. 11). In the following, we use the same distance metric to test plausible alternative models for predicting human responses.

Figure 11

Congruity of the semantic organization model plotted against the congruity of human responses, separately for each stimulus pair in Experiments 1 and 2. As in Figs. 9 and 10, we plot response congruity on a continuum between random (0.0: congruity between responses of random model) and perfect congruity to human responses (1.0: all responses on top of each other).

Testing plausible alternative models

First, we test a uniform sampling model, which assumes that participants distribute their responses at equidistant intervals around the perimeter of the test shape while replicating the order of the probe points. Second, we test a curvature-based model, which assumes that participants choose corresponding points with respect to correspondences between the curvature profiles of base and test shape9. Finally, we test whether the semantic organization model can be improved by including curvature-based information. To give the first two models the best possible chance, we first searched for the ‘starting point’ of the leftmost semantic part (e.g., the tails of elephant and anteater; the tail of the parrot and the head of the goose) and re-sampled contour points in a clockwise direction from that starting position. Without this step (e.g., if we simply sampled from the leftmost point of every shape), predictions based on curvature profiles (or uniform sampling) would be markedly less similar to human correspondence judgments; consequently, this represents a rather strict test of whether the semantic organization model can better explain our data.

Uniform sampling model

For Experiment 1, average distances of the uniform sampling model predictions to human median responses were 8.8%, 14.8%, 3.8%, 7.7%, 5.2% (13.6%), and 8.9% for the pairs (Table 1; grand average: Fig. 12). These distances were generally considerably larger than the distances between individual human responses and the median responses (Table 1). To obtain predictions from the uniform sampling model for Experiment 2, we again reversed the order of contour points because of the different heading directions of the two interpretations of each shape (Fig. 5A). The resulting average distances to human median responses were 32.7%, 33.0% (32.0%), 24.0%, 11.9% (12.0%), and 35.4% for the pairs (Table 2; grand average: Fig. 12), all of which were also considerably larger than the distances between human responses and the median responses (Table 2).
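
For completeness, the uniform sampling baseline reduces to placing equidistant predictions in probe order (or in reversed order, as for Experiment 2). The sketch below is illustrative; the 'start' parameter is a hypothetical stand-in for the aligned starting position described above.

```python
import numpy as np

def uniform_sampling_predictions(n_probes=50, start=0.0, reverse=False):
    """Uniform sampling baseline: predicted positions (in % of the test-shape
    perimeter) at equidistant intervals, in the same order as the probe points
    or, if reverse=True, traversing the contour in the opposite direction."""
    step = 100.0 / n_probes
    offsets = np.arange(n_probes) * step
    positions = (start - offsets) % 100.0 if reverse else (start + offsets) % 100.0
    return positions

print(uniform_sampling_predictions(n_probes=5))                 # [ 0. 20. 40. 60. 80.]
print(uniform_sampling_predictions(n_probes=5, reverse=True))   # [ 0. 80. 60. 40. 20.]
```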

Figure 12

Average distances between human responses and predictions for (A) Experiment 1 and (B) Experiment 2. We plot the distance between the human response or model predictions and the human median response, expressed as percentage of the perimeter of the contour, and averaged across all stimulus pairs and sample points. The semantic organization model is well within the range of human responses, while the uniform sampling and curvature-based models perform much worse. The combined semantic organization plus curvature-based model does not yield better performance than the semantic organization model alone.

Curvature-based model

As in previous work9, we express contours in terms of their ‘surprisal’—an information theoretic measure, related to curvature, which quantifies how much each point on a contour ‘stands out’ with respect to its local neighborhood34,35. The basic assumption is that contours are likely to continue along their current tangent direction; the more a point on the contour diverges from this direction, the less predictable and therefore the more informative it is. This can be formalized by a continuous probability (von Mises) distribution on the turning angles centered on 0, which produces monotonically decreasing probabilities (p) with increasing divergence from the current tangent direction. Surprisal is then formalized as u = − log(p), increasing with turning angle34. Based on previous work, we calculated turning angles using an integration window of 5% of the contour perimeter9,10, treated positive and negative curvature (i.e., convex and concave contour segments) symmetrically (unsigned;35,36), and normalized the surprisal for each point on a contour with respect to the maximum surprisal on that contour. In contrast to previous work, where we had to refer to ground truth transformations to find corresponding salient landmarks between base and test shapes9, ground truth is not available for the current stimulus set. Therefore, we used MATLAB’s (The MathWorks, Inc., Natick, Massachusetts, United States) dynamic time warping algorithm (e.g.,37) to find the optimal alignment between the contour surprisal profiles of the base and test shapes based on their Euclidean distance. We used this alignment to project probe points from the base shape to the corresponding locations on the test shape. For Experiment 1, these predictions are quite similar to human responses for the Elephant–Anteater and Lama–Fox pairs, but much less so for the other pairs: average distances of the curvature-based model predictions to human median responses were 1.4%, 16.7%, 6.2%, 3.0%, 8.9% (17.0%), and 15.0% (Table 1; grand average: Fig. 12). As a result, the model’s congruity was significantly lower than human congruity for four of the six pairs (Table 1). For Experiment 2, average distances to median responses were considerably higher: 23.2%, 25.7% (27.4%), 24.9%, 24.1% (24.1%), 23.4% (Table 2; grand average: Fig. 12). Overall, this shows that the curvature-based model is not a good fit for the human data, in contrast to previous experiments in which we tested shapes that were more similar or novel (i.e., with little semantic meaning;9).
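
The two ingredients of this model, a surprisal profile and a dynamic-time-warping alignment, can be sketched as follows. The von Mises concentration parameter kappa and the plain O(nm) DTW implementation are our own illustrative choices and not necessarily those of the original MATLAB code; the contour is assumed to consist of equally spaced vertices.

```python
import numpy as np
from scipy.stats import vonmises

def surprisal_profile(contour, window_frac=0.05, kappa=1.0):
    """Normalized surprisal along a closed contour ((N, 2) array of equally
    spaced vertices). Turning angles are measured between incoming and
    outgoing directions over a window of window_frac of the perimeter;
    kappa is a hypothetical concentration parameter. Illustrative sketch."""
    n = len(contour)
    w = max(1, int(round(window_frac * n)))           # window size in vertices
    prev_pts = np.roll(contour, w, axis=0)
    next_pts = np.roll(contour, -w, axis=0)
    v_in = contour - prev_pts
    v_out = next_pts - contour
    ang_in = np.arctan2(v_in[:, 1], v_in[:, 0])
    ang_out = np.arctan2(v_out[:, 1], v_out[:, 0])
    turn = np.abs(np.angle(np.exp(1j * (ang_out - ang_in))))  # unsigned turning angle
    u = -np.log(vonmises.pdf(turn, kappa))             # surprisal u = -log(p)
    return u / u.max()                                  # normalize to the maximum

def dtw_path(a, b):
    """Plain dynamic time warping between two 1-D profiles, returning the
    optimal alignment path as (index_in_a, index_in_b) pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# A probe at base-contour index k would be projected to the test-contour index
# that the DTW path pairs with k (taking, e.g., the first match if several).
```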

Combined semantic organization plus curvature-based model

To test whether the semantic organization model can be improved by adding curvature information, we tested a model in which predictions within each semantic part were distributed relative to salient landmarks within that part. For example, predictions for probe points on the trunk of the elephant would all be placed onto the snout of the anteater, but their exact position would depend on their relative position with respect to salient landmarks of both parts (e.g., their tips). We used the method established in9 to identify salient landmarks by (i) calculating the normalized distribution of surprisal values along the contour of the base shape (− 1, 1) and (ii) finding local minima/maxima that are surrounded by values that are higher/lower by 0.05 on both sides and have absolute values > 0.02. If the two corresponding semantic parts contained the same number of local maxima, and therefore allowed for an unequivocal assignment, we used the relative distance to those landmarks to predict the position of responses; if they contained different numbers of local maxima and no unequivocal assignment was possible, we kept the predictions from the semantic organization model.
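
As an illustration, scipy's peak finder can serve as a stand-in for the landmark criteria above, with the prominence argument approximating the 'higher/lower by 0.05 on both sides' rule and an additional magnitude threshold of 0.02. This is a sketch, not the original implementation.

```python
import numpy as np
from scipy.signal import find_peaks

def salient_landmarks(signed_surprisal, prominence=0.05, min_abs=0.02):
    """Indices of salient landmarks on a signed, normalized surprisal profile
    (values in (-1, 1)): local maxima and minima that stand out from their
    neighbourhood by at least `prominence` and exceed `min_abs` in magnitude."""
    maxima, _ = find_peaks(signed_surprisal, prominence=prominence)
    minima, _ = find_peaks(-signed_surprisal, prominence=prominence)
    candidates = np.sort(np.concatenate([maxima, minima]))
    return candidates[np.abs(signed_surprisal[candidates]) > min_abs]

# Toy profile with one clear bump and one clear dip.
profile = np.concatenate([np.zeros(20), [0.5], np.zeros(20), [-0.4], np.zeros(20)])
print(salient_landmarks(profile))  # [20 41]
```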

For Experiment 1, the average distances of model predictions to human median responses were about the same as those of the semantic organization model (3.1%, 6.5%, 2.2%, 4.0%, 4.9% (12.1%), and 4.9%; Table 1; grand average: Fig. 12); when directly comparing the two models, only performance for the Lizard–Whale pair was slightly better (Table 1). This demonstrates that for all pairs tested in Experiment 1, the semantic organization model could not be meaningfully improved by adding curvature-based information. For Experiment 2, predictions were again very similar to those of the semantic organization model (4.9%, 11.3% (12.4%), 16.5%, 6.2% (6.6%), and 7.1%; Table 2; grand average: Fig. 12); when testing the two models against each other, the only significant difference was a poorer performance for the Parrot–Goose pair (Table 2).

Together, this shows that curvature information could not significantly improve the semantic organization model in explaining human responses—at least when considering our implementation of a combined model, and the shapes used in the current experiments.

General discussion

Object shape is arguably the most important cue for object recognition and concept learning4,5,6,7,8. Here, we investigate our striking ability to identify point-to-point correspondences between object shapes (e.g.,18,19,20,21) with a focus on the contribution of cognitive processing (i.e., effects of previous knowledge about semantic organization). What is the role of shape correspondence in visual perception and cognition?

Perceptual and cognitive functions of point-to-point correspondence

First, we can use shape correspondence to generalize across classes and, for example, predict potential object behavior. Having established correspondence between one animal with a trunk (elephant) and one with a long snout (anteater), or between two animals with pincers (crab and scorpion), we can make informed inferences about joint location and limb flexibility that will help us to predict animal or limb motion trajectories. More broadly, inferences based on these correspondences can potentially inspire new innovations, such as robot arms modeled on snakes or elephant trunks38. Second, establishing correspondence between different retinal projections of the same or similar object helps in preserving object constancy—the ability to identify objects across diverse viewing conditions or organisms across growth. For example, depending on viewing angle, an elephant’s trunk might be visible or not; depending on the age of an elephant, the elephant’s head will be larger or smaller relative to the body39.

Taking semantic organization into account, shape correspondence can facilitate and refine the organization of objects in a similarity space. For example, if two animals both have a head, horns and four legs, comparisons between those parts of the body will help us to decide how similar (i.e., closely related) the animals are: by comparing the shape of the head, horns and legs, we will tend to infer that a cow is more closely related to a bison than to a rhino40. Finally, we can also use correspondences to build sparse memory representations. By storing relevant information about an animal in terms of its main semantic parts together with its salient shape features, we can identify members of this animal class, or generate (e.g., imagine) new members of the same class (e.g., drawing an elephant to be recognized by others should include a bulky body, plump legs, flapping ears, a curved trunk and a narrow tail;41,42).

Relative importance of semantics and geometrical features

Here, we were interested in the particular role of semantic organization in establishing correspondences. In previous work9, we showed that observers’ point-to-point correspondences across object transformations (such as rotation or growth) of unfamiliar objects were well explained by a model based on shape—with corresponding locations chosen relative to corresponding salient shape features on both shapes. In the current paper, we used stimuli designed to impede the potential of shape to guide correspondence judgments. The first experiment employed objects of very different shapes but similar semantic part organization; the second experiment employed objects of identical shape but different semantic part organization (where a response strategy based purely on shape would simply replicate all probe points on the test shape).

In line with our hypotheses and in contrast to previous work, we show that (i) observers agree with each other in establishing point-to-point correspondence between very different objects—suggesting that they follow the same or similar strategies; (ii) responses are affected by semantic part organization—which is most obvious in Experiment 2 where correspondences follow semantic interpretation rather than shape. We introduce a model that extends our previous modeling by predicting corresponding locations on the test shape not relative to corresponding salient geometrical features of two shapes but relative to their corresponding semantic parts. And, indeed, for almost all tested shape pairs, our model predicts median human responses as well as individual human observers do. At the same time a model purely based on shape performs considerably worse.

Together this suggests that humans use a straightforward approach to establish correspondences between objects of very different shapes. When two objects are novel (i.e., unfamiliar) but similar—so that we can easily determine the transformation between them—we find corresponding locations relative to corresponding salient geometrical features9. However, when objects are not similar enough for that—but we identify familiar elements (e.g., different body parts or extremities of animals)—we find corresponding locations relative to those parts (as illustrated in the current experiments).

In many cases, humans will use a combination of both approaches, depending on the availability of cues—that is, on the similarity of salient shape features9,10 or on the extent of knowledge about semantic part organization (Experiments 1 and 2). Accordingly, for unfamiliar shapes they would rather base their judgments on salient shape features; for familiar shapes with (sufficiently) similar part organization they would rather base their judgments on semantic information. As our stimuli were specifically designed to impede a strategy based on shape, we did not find any additional explanatory power of a model combining semantic part organization and shape information.

Application to shape morphing

A promising avenue for future research is the combination of our model with recent advances in machine learning. Deep neural networks trained for image segmentation can potentially provide pixel-by-pixel semantic labels43,44. Through hierarchical segmentation, this information could be translated into semantic part organization (i.e., by identifying overlapping image regions corresponding to ‘bird’, ‘leg’ and ‘wing’). This information could be used to predict point-to-point correspondences on a large scale45, and to morph objects into each other in a way that accords with human perception. Figuring out perceptually sensible morphs without detailed user input (i.e., manual definition of anchor points in both objects) has long been an endeavor in computer graphics and image processing (e.g.,46,47). As a proof of concept, we show how our model can be used to create perceptually sensible morphs between shapes with known semantic part organization. In Fig. 13, we show resulting morph examples for two of our shape pairs from Experiment 1.

Figure 13

Examples of morphing based on the correspondence predictions of the semantic organization model for (A) Elephant–Anteater and (B) Antelope–Giraffe pairs of Experiment 1. Morphs are based on correspondences predicted by the semantic organization model with linear interpolation of in-between contour segments. Contours are colored according to their correspondence across morph levels. To avoid intersections, before building the morphs we excluded all predicted points on the test shape that showed order reversals with respect to the corresponding probe points on the base shape. Note that this straightforward approach will work less well for shapes with holes (e.g., Ostrich–Flamingo pair) or many order reversals (e.g., Lizard–Whale pair); for generalization the morphing procedure would need to be adapted.
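
A minimal sketch of the linear interpolation underlying such morphs, given corresponding point sets on the two contours, follows; it is illustrative only, and the exclusion of order-reversing points mentioned in the caption is omitted.

```python
import numpy as np

def morph_contours(base_points, test_points, n_levels=5):
    """Linear morph between corresponding points on base and test contours
    (both (N, 2) arrays in which row i of base_points corresponds to row i of
    test_points, e.g. probe points and their predicted correspondences).
    Returns a list of n_levels contours from base (level 0) to test (level 1)."""
    levels = np.linspace(0.0, 1.0, n_levels)
    return [(1.0 - a) * base_points + a * test_points for a in levels]

# Toy example: morphing a square into a larger, shifted square.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
target = square * 2.0 + np.array([0.5, 0.0])
for level in morph_contours(square, target, n_levels=3):
    print(level.tolist())
```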

In the future, such models combined with machine learning techniques will allow for advanced and potentially more human-like descriptions of objects and their shape. Current computer vision models compare shapes in terms of their geometric attributes (often in combination with manual user input, e.g.,48) and objects in terms of their texture properties (e.g.,7). In addition to these attributes, however, human observers also compare objects and shapes in terms of their deeper generative aspects, like their semantic attributes. Thus, to achieve more sophisticated and potentially human-like comparisons between shapes, we need models that incorporate a lexicon of simple geometric transformations (e.g., ‘rotate’, ‘stretch’, ‘shorten’, ‘bend’, ‘bloat’, ‘shrink’ and ‘enlarge’) with corresponding semantic labels (e.g., neck, legs, tail, body and horns). Such models would be able to compare two shapes by the differences between their corresponding parts. For example, the difference between an antelope and a giraffe might be summarized by a ‘stretching’ of neck, legs and tail, and a ‘shortening’ of body and horns (Fig. 2C). Thus, by identifying the transformations from the lexicon that best describe the relationship between two corresponding parts (i.e., that produce the smallest error), models will be able to describe more complex, non-linear transformations between objects in terms of simple transformations of their parts. Whether this leads to more human-like complex shape judgments by bringing together inferences about correspondence across transformations9,10 with those about past transformations from visual depictions of objects (causal history; e.g.,49,50,51,52,53,54) is an interesting area for future investigation.

Limitations and future directions

As our model depends on the detail as well as on the quality of the available semantic part information, both will affect the accuracy of the resulting predictions. For example, it is not clear at what level of detail observers typically operate (e.g., ‘Headʼ versus a distinction between ‘Headʼ, ‘Earsʼ and ‘Mouthʼ). Presumably, that level varies between participants but also between different stimuli or stimulus pairs. For example, if only one of the two shapes exhibits a particular semantic feature (e.g., ‘Earsʼ), observers might simply ignore this feature when establishing correspondences.

Our model cannot predict ambiguities and inter-individual differences in the interpretation of object shapes. For example, the wings of the Butterfly–Owl or the Parrot–Goose pair are ambiguous in their orientation (or viewing direction) and therefore ambiguous in their correspondence. Similarly, the rabbit in the Duck–Rabbit pair is ambiguous in its direction of heading (downwards versus right).

One way to resolve these ambiguities would be to simultaneously consider different predictions, with their probability derived from the frequency of the different semantic labeling responses. This might also help to resolve cases with less clear correspondences between semantic parts, for example, when asking participants to match locations on the four legs of a dog to a two-legged ostrich.

Another way to resolve these ambiguities would be to collect data and model predictions for 3D shapes rather than 2D contours. Of course, this would be technically more challenging—in terms of the dot matching procedure as well as in terms of the modeling. For example, a good model for predicting corresponding locations on a 3D surface would have to consider all ‘surrounding’ semantic parts of a particular probe point (e.g., ‘the probe location on the bird is slightly above the longitudinal axis of its right wing, halfway between head and tip of the tail’). Yet it is highly likely that observers can indeed exploit semantics to identify such correspondences in 3D.

Another avenue for this work is its potential use as a tool to reveal the interplay between perceptual and cognitive processes in bidirectional hierarchical neural networks. One might expect cognitive processes to be spatially imprecise and to operate on abstract representations. However, our findings illustrate how humans rely flexibly on shape or semantic information in establishing local physical correspondence. Specifically, in solving the dot matching task, bottom-up perceptual organization is combined with top-down cognitive processes. How the two are combined in bidirectional neural networks is still an open question. In terms of neural mechanisms, there has long been speculation about forward and backward pathways in the cortex (e.g.,55,56,57,58). Murray et al.59, for example, used fMRI to show that when local visual information is perceptually organized into whole objects, activity in lower areas (e.g., V1) decreases over the same period that activity in higher areas (e.g., lateral occipital cortex or LOC) increases. The results were interpreted in terms of high-level hypotheses that compete to explain away the lower-level retinal information. In the same manner, the edges and contours of shapes are activated in lower layers based on retinal information, and higher-level cognition (i.e., semantic judgements) can turn down the activity of these lower areas by explaining away the causal factors of these edges at the moment of perception. In effect, the present methods can be used to test how such cognitive processes are merged with perception in the brain.

Finally, our findings are another building block in the description of shape understanding (e.g.,60,61): when we look at an object, we not only work out what shape it has, but also how or why it has that shape by parsing and interpreting its geometrical structure to identify its most important features and their relations to one another and to features of other objects. In the current work, we highlighted the interpretation component of shape understanding by biasing participants towards higher-level cognitive processes when establishing correspondences.