Forming cognitive maps for abstract spaces: the roles of the human hippocampus and orbitofrontal cortex

How does the human brain construct cognitive maps for decision-making and inference? Here, we conduct an fMRI study on a navigation task in multidimensional abstract spaces. Using a deep neural network model, we assess learning levels and categorize paths into exploration and exploitation stages. Univariate analyses show higher activation in the bilateral hippocampus and lateral prefrontal cortex during exploration, positively associated with learning level and response accuracy. Conversely, the bilateral orbitofrontal cortex (OFC) and retrosplenial cortex show higher activation during exploitation, negatively associated with learning level and response accuracy. Representational similarity analysis shows that the hippocampus, entorhinal cortex, and OFC represent destinations more accurately in exploitation than in exploration stages. These findings highlight the collaboration between the medial temporal lobe and prefrontal cortex in learning abstract space structures. The hippocampus may be involved in spatial memory formation and representation, while the OFC integrates sensory information for decision-making in multidimensional abstract spaces.

were assigned as the x-, y-, and z-axes, respectively. The coordinate of a location was represented as (x) in a 1D space, (x, y) in a 2D space, and (x, y, z) in a 3D space. For example, the coordinate of the first location was (1) in 1D space, (1, 1) in 2D space, and (1, 1, 1) in 3D space.
Working memory. The spatial n-back task 4,5 was used to measure the subjects' working memory. In each trial, a square was presented in one of eight positions (Fig. S1-1b) on the screen for 500 ms. The subjects were asked to judge, within 2,500 ms, whether the position of the square was the same as that presented n trials earlier (Fig. S1-1a). Four levels of the n-back task (1-, 2-, 3-, and 4-back) were used following the procedure shown in Fig. S1-1c. In each level, the subjects first practiced until they thought they were ready and then completed 30 formal judgements. They entered the next level of practice if the response accuracy (RA) was higher than 60%. Otherwise, the task ended.
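The scoring and promotion rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and excluding the first n trials from scoring (because they have no reference position) is an assumption the text does not confirm.

```python
from typing import Dict, List, Sequence

PASS_THRESHOLD = 0.60  # advance only if RA is higher than 60% (from the text)
MAX_LEVEL = 4          # 1- to 4-back (from the text)


def nback_targets(positions: Sequence[int], n: int) -> List[bool]:
    """True where the square's position matches the one shown n trials earlier."""
    return [i >= n and positions[i] == positions[i - n]
            for i in range(len(positions))]


def response_accuracy(responses: Sequence[bool],
                      positions: Sequence[int], n: int) -> float:
    """Proportion of correct same/different judgements.

    Assumption: the first n trials are not scored, since no position
    exists n trials before them.
    """
    targets = nback_targets(positions, n)
    scored = list(zip(responses[n:], targets[n:]))
    return sum(r == t for r, t in scored) / len(scored)


def levels_attempted(accuracy_by_level: Dict[int, float]) -> List[int]:
    """Levels a subject reaches: stop after the first level with RA <= 60%."""
    attempted = []
    for level in range(1, MAX_LEVEL + 1):
        attempted.append(level)
        if accuracy_by_level[level] <= PASS_THRESHOLD:
            break
    return attempted
```

For example, a subject who scores above 60% at the 1- and 2-back levels but not at the 3-back level never attempts the 4-back level.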
Discriminability tests. Following the procedure of Psychophysics Measure 6, we assessed the subjects' ability to discriminate between the sizes of geometrical shapes (circle or oval), the lengths of lines (vertical or horizontal), and the angles between two crossing lines (Fig. S1-2a). In each trial, two stimuli (either two circles, ovals, vertical lines, horizontal lines, or angles) were displayed on the two sides of the screen. The subjects were asked to select the one with the larger parameter (larger size, longer length, or larger angle). The parameter of the left-side stimulus was fixed, whereas the parameter of the right-side stimulus changed according to the following rules. Two sequences, ascending and descending, were set. In the ascending sequence, the size of the right-side stimulus was first set smaller than the left-side stimulus and then increased in each of the following trials until the subjects thought the right-side stimulus was equal to or larger than the left-side one. In the descending sequence, the size of
Rule learning. The rule learning task was used to measure the subjects' ability to learn from feedback. The subjects were instructed to learn the rank of six colors from feedback (Fig. S1-2b). In each trial, two squares with different colors were shown on the screen. The subjects had to guess and select the one with the higher rank. Feedback on whether their choice was correct or wrong was displayed on the screen after each selection. When the subjects thought that they had acquired the ranks of the six colors, or after they had completed 100 trials of learning, they stopped the trials and sorted the colors in sequence.
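The feedback loop of the rule learning task can be illustrated with a short sketch. Several details are assumptions, because the text does not specify them: the color names are placeholders, a larger number stands for a higher rank, pairs are drawn uniformly at random, and the `choose` callback stands in for the subject. Early stopping by the subject is not modelled.

```python
import random
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Placeholder color names; the actual colors are shown in Fig. S1-2b.
COLORS = ["c1", "c2", "c3", "c4", "c5", "c6"]
MAX_TRIALS = 100  # learning stops after at most 100 trials (from the text)

Trial = Tuple[str, str, bool]  # (chosen color, other color, feedback)


def feedback(rank: Dict[str, int], chosen: str, other: str) -> bool:
    """True if the chosen color has the higher rank (assumed: larger number)."""
    return rank[chosen] > rank[other]


def run_trials(rank: Dict[str, int],
               choose: Callable[[str, str, List[Trial]], str],
               max_trials: int = MAX_TRIALS,
               seed: int = 0) -> List[Trial]:
    """Present random color pairs and record the choice and its feedback."""
    rng = random.Random(seed)
    pairs = list(combinations(rank, 2))  # assumption: uniform pair sampling
    history: List[Trial] = []
    for _ in range(max_trials):
        a, b = rng.choice(pairs)
        chosen = choose(a, b, history)
        other = b if chosen == a else a
        history.append((chosen, other, feedback(rank, chosen, other)))
    return history
```

A simulated subject is any function that returns one of the two colors it is shown, optionally using the feedback history collected so far.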
chose the optimal option within 3,000 ms (Fig. S2-2). In addition, to save time, the interval between the presentation of the destination and the options was shortened to 2 s, and the fixation cross period between each path and the ITI was fixed at 1 s.
On Day 1, all the subjects were trained to perform a navigation task in 1D and 3D spaces. First, we described the experimental procedure to the subjects with the assistance of slides. The description covered only the task procedure, including the items displayed on the screen and the operations that the subjects needed to perform. The subjects did not receive any information about the structure of the abstract spaces. Second, the subjects were asked to complete a navigation task including ten paths in the 1D abstract space (Fig. S2-1) and in the 3D space (Fig. S2-2). After the task, we asked the subjects to describe the task procedure and the strategy they used in the task. The subjects were asked to repeat the task if they reached fewer than six goals (i.e., obtained fewer than six coins) in either space. One subject misunderstood the task instructions and repeated the navigation task in the 3D space once. All of the other subjects met the requirement after the first task in the two spaces. In general, the total practice time for each subject was about 4-8 minutes.
On Day 2, the subjects completed a navigation task in the 3D space before the fMRI scanning. After the task, we asked the subjects to describe the task procedure to ensure that they fully understood the task instructions. The subjects were required to complete the task again if they attained fewer than eight goals (i.e., eight coins). All the subjects described the task procedure correctly after the training task. None of the subjects collected fewer than eight coins in this training task. In general, the practice time for each subject was about 3-6 minutes. After the subjects fully understood the experimental procedure and the instructions, we asked them to take a 5-minute break and then invited them to participate in the subsequent MRI scans.
Fig. S7, the subjects can reach the goal location (the circle in orange) with the fewest steps by two possible paths (the yellow and the green lines) from the current location (the circle in blue). Therefore, in this trial, locations 6 and 7 were both optimal locations. First, a location was randomly chosen from the two locations (6 and 7) as the predefined optimal choice. Second, three other locations were chosen from the locations around the current location (within the gray dotted frame). It is possible that both locations 6 and 7 could be chosen as the options. In this situation, whether the subjects chose location 6 or 7, their response was recorded as correct.
The selected options were arranged according to the sequence shown in Fig. S7, with the location below the current location as the first option and the location on the right below the current location as the last option. The other locations were arranged in a clockwise direction. When the subjects navigated to one of the four corners of the 2D space, the current location was surrounded by only three other locations. In this case, the three locations, along with a non-selectable blank image, were set as the options.
The selection and arrangement of the options in the 3D space were the same as those in the 2D space, except that, when the subjects navigated to the corners of the space, there were still more than four locations available as potential options.
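The option-selection rule for the 2D space can be sketched as follows. This is an illustrative reading of the description, with stated assumptions: a move to any of the up to eight surrounding locations counts as one step (so the step count is the Chebyshev distance), coordinates run from 1 to `size`, and all function names are hypothetical. The clockwise on-screen ordering of the options is not modelled, and the current location is assumed to differ from the goal.

```python
import random
from itertools import product
from typing import List, Optional, Tuple

Loc = Tuple[int, int]


def neighbors(cur: Loc, size: int) -> List[Loc]:
    """Locations surrounding `cur` on a size x size grid (3 at a corner)."""
    x, y = cur
    return [(x + dx, y + dy)
            for dx, dy in product((-1, 0, 1), repeat=2)
            if (dx, dy) != (0, 0)
            and 1 <= x + dx <= size and 1 <= y + dy <= size]


def steps(a: Loc, b: Loc) -> int:
    """Fewest steps between two locations (assumed: Chebyshev distance)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def optimal_next(cur: Loc, goal: Loc, size: int) -> List[Loc]:
    """All neighbors that lie on some shortest path to the goal."""
    best = steps(cur, goal) - 1
    return [n for n in neighbors(cur, size) if steps(n, goal) == best]


def pick_options(cur: Loc, goal: Loc, size: int, rng: random.Random,
                 n_options: int = 4) -> List[Optional[Loc]]:
    """One predefined optimal location plus up to three other neighbors.

    None stands for the non-selectable blank image used at corners.
    """
    predefined = rng.choice(optimal_next(cur, goal, size))
    others = [n for n in neighbors(cur, size) if n != predefined]
    chosen = [predefined] + rng.sample(others, min(n_options - 1, len(others)))
    return chosen + [None] * (n_options - len(chosen))


def is_correct(choice: Loc, cur: Loc, goal: Loc, size: int) -> bool:
    """Any optimal location counts as correct, not only the predefined one."""
    return choice in optimal_next(cur, goal, size)
```

Under these assumptions, a second optimal location can appear among the three "other" options, in which case either choice is scored as correct, matching the rule stated for locations 6 and 7.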

Fig. S2. Schematic diagram of the coordinate assignment to the locations in an

Fig. S3. An example navigational path in a 2D abstract space. The circle in blue

Fig. S4. Behavioral performance corresponding to the three LMM analyses. (a) The

Fig. S5. Brain activation significantly associated with the learning level. Regions

Fig. S1-1. Setting and procedure of the spatial n-back task used in the experiment on Day 1. (a) Procedure of the n-back task. In each trial, a square was displayed on the screen for 250 ms and the subjects needed to respond within 2,500 ms before the next trial began. (b) The eight potential positions of the square. (c) Working memory test. We included four levels of the n-back task, n ∈ (1, 2, 3, 4). For each level of the n-back task, the subjects practiced before completing 30 judgements. They were promoted to the next level of the n-back task if the response accuracy (RA) > 60%. The highest level of the n-back task was set to four.
the right-side stimulus was first set larger than the left-side stimulus and then decreased in each of the following trials until the subjects thought the right-side stimulus was equal to or smaller than the left-side one. Each time the subjects switched their choice, the next sequence began and was randomly set to be ascending or descending. If the subjects successfully judged the size of the stimuli in a sequence, the amount of change (the change of size in each trial within a sequence) was reduced in the next sequence. The task ended when the subjects failed to discriminate the size of the two stimuli in three consecutive sequences. The last stride (amount of change) that the subjects successfully discriminated was recorded as the discriminability score. A lower score indicated better discriminability.
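The adaptive procedure can be illustrated with a small simulation. It is a sketch under several assumptions that the text does not state: the observer is simulated with a fixed just-noticeable difference (`jnd`), a sequence counts as successful only if the observer switches exactly when the two stimuli become equal, the stride is halved after a success and left unchanged after a failure, and each sequence starts three strides away from equality. All names are hypothetical.

```python
import random
from typing import Optional


def run_sequence(standard: float, stride: float, ascending: bool,
                 jnd: float, n_start: int = 3) -> bool:
    """Run one sequence; True if the observer switches exactly at equality.

    The right-side stimulus starts n_start strides below (ascending) or
    above (descending) the fixed left-side stimulus and moves toward it.
    """
    direction = 1 if ascending else -1
    for k in range(n_start, -1, -1):  # k strides away from equality
        right = standard - direction * k * stride
        if abs(standard - right) < jnd:  # difference no longer perceived
            return k == 0                # success only if truly equal
    return False


def discriminability_score(standard: float, jnd: float, first_stride: float,
                           shrink: float = 0.5, max_fails: int = 3,
                           seed: int = 0) -> Optional[float]:
    """Last stride discriminated before three consecutive failed sequences."""
    rng = random.Random(seed)
    stride, fails, score = first_stride, 0, None
    while fails < max_fails:
        ascending = rng.random() < 0.5   # sequence type is chosen at random
        if run_sequence(standard, stride, ascending, jnd):
            score, fails = stride, 0
            stride *= shrink             # assumed reduction rule: halve
        else:
            fails += 1
    return score
```

With this simulated observer the score converges to the smallest tested stride that is still at least as large as the observer's threshold, which is consistent with a lower score indicating better discriminability.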

Fig. S1-2. Experimental materials for discriminability tests and rule learning used in the experiment on Day 1. (a) Materials including circles, ovals, vertical lines, horizontal lines, and angles. (b) The color sequence used in rule learning.

Fig. S2-1. Navigation task in the 1D abstract space for the behavioral training. (a) The 1D abstract space. (b) Procedure of the navigation task. This was the same as the procedure shown in Fig. 2a except for the following three settings. First, feedback was given to the subjects after each trial. Second, the interval between the presentation of the destination and the options was reduced to 2 s. Third, only two options were provided in each trial. The symbols were created by the authors. Abbreviations: g, navigational goal or destination; cn, the current location of the nth step; RT, response time.

Fig. S2-2. Navigation task in the 3D abstract space used in the behavioral training. (a) The 3D abstract space. (b) Procedure of the navigation task. This was the same as the procedure shown in Fig. 2a except for the following two settings. First, feedback was given to the subjects after each trial. Second, the interval between the presentation of the destination and the options was reduced to 2 s. The symbols were created by the authors. Abbreviations: F1, F2, and F3 indicate the three features; g, navigational goal or destination; c0, the current location of the first step; RT, response time.

Table S1. Behavioral performances of the subjects in the navigation task during the fMRI scanning in the five abstract spaces.