Humans are involved in various real-life networked systems. The most obvious examples are social and collaboration networks but the language and the related mental lexicon they use, or the physical map of their territory can also be interpreted as networks. How do they find paths between endpoints in these networks? How do they obtain information about a foreign networked world they find themselves in, how they build mental model for it and how well they succeed in using it? Large, open datasets allowing the exploration of such questions are hard to find. Here we report a dataset collected by a smartphone application, in which players navigate between fixed length source and destination English words step-by-step by changing only one letter at a time. The paths reflect how the players master their navigation skills in such a foreign networked world. The dataset can be used in the study of human mental models for the world around us, or in a broader scope to investigate the navigation strategies in complex networked systems.
Background & Summary
The way people navigate among the world's physical objects is of crucial importance from the perspective of their survival. Due to their crucial function, the navigation strategies of humans have been extensively studied in the past1,
Our database has been collected by a smartphone application called “fit-fat-cat”13. After the users install the game (see Figure 1), they are asked to transform a three-letter English source word into a three-letter target word through meaningful intermediate three-letter English words by changing only a single letter at a time. The word chain fit-fat-cat is a good solution of a game with source word ‘fit’ and target word “cat”. These word chains are collected anonymously, with some other optional metadata about the players (gender, language, year of birth). The word chains in the database can be considered as the footprints of human navigation over the word morph network of the English language and may be good sources for studying human navigation strategies in the future.
Here we identify two possible future directions where the shared data could be used. The word morph network is almost surely unknown for most people before playing the game. One may reason that native English speakers have significant advantages but it seems from the data that this is not the case. In fact the learning curve for native and non-native speakers seem to be very similar. Since the game records are timestamped, the dataset may be a good candidate source for analyzing how people find their way in a foreign networked system, how fast they build up a mental model for a basically unknown networked world, what characteristic features such models may have and how that model changes in time. Secondly, the in-depth analysis of the paths used by people after mastering the game can give valuable information about other systems navigated by humans day-by-day. Cognitive psychology, neuroscience, public transportation services, social and organizational networks are all imaginable areas that could possibly benefit from such information.
The dataset reported in this paper is collected via a smartphone application called “fit-fat-cat” running on most Android phones and tablets. The application is available from the Google Play store13 and the installation is as easy as any other apps. After installing and launching the application, the word morph game is ready to play. By playing the game, the players give their consent to openly use their anonymized data without restrictions.
The screenshots in the “Play games” box of Figure 1 show the main screen of the application. When a player starts a game the source and target words are randomly picked from the all possible three-letter English words. The source and target words are displayed in a box with an arrow pointing from the source (DEE in the example of Figure 1) to the target word (COP). Below this box one can see one's chain entered so far. When starting a new game the chain contains only the source word (DEE). The player can enter the consecutive words in a user friendly manner by using a virtual keyboard of the phone. First they select which letter they want to change (D in the example) than they choose the new letter with the keyboard (see the left screenshot in the “Play games” box of Figure 1). After changing a letter, the app automatically adds the new word to the chain. This way the players can see which words they have already tackled when solving a word puzzle. The game may end in three ways. If the player can reach the target word through such one-letter transformations the puzzle is solved. In this case, the word becomes green-colored to show the end of the game and the player gets some motivating messages at the bottom of the screen (see the middle screenshot in the “Play games” box of Figure 1). Secondly, the player can give up the game by pressing the “new game” button represented by a circle arrow to the left of the + and − buttons at the top of the screen. In this case the player gets the next puzzle automatically. Finally, the player can press the “magic wand” button to the left of the “new game”. In this case a possible (optimal, i.e. shortest path) solution of the puzzle is shown before starting a new game (see the right screenshot in the “Play games” box of Figure 1). No matter how the game is ended, the chain field (along with some other metadata, see the Data Records section) is anonymously submitted to our database stored in the cloud. Thus the application collects also the chains which are not completed, i.e. their last word is not the target word.
The application is capable of generating puzzles with longer words and also in other languages. The word length can be increased and decreased by pressing the + and − buttons respectively in the top left corner of the screen. The language indicator in the top right corner (displaying EN in the screenshots of Figure 1) shows the current language, which can be changed by pressing the button. These buttons are activated only after playing fifty successful games with the three-letter English version. Note that this manuscript describes the three-letter data, but the games with longer words are also shared in the dataset.
For providing a better gaming experience to the players, they can obtain badges for their achievements and can write themselves up to leaderboards. Badges are given for playing certain number of games, for optimal solutions or for very fast solutions. The achievements and leaderboards can be accessed by pressing the “Menu” button represented by ⋮ (vertical dots) in the top left corner of the application.
In addition to collecting the chains, the application also collects optional data about the players. Gender, language and year of birth can be given in a form (see the screenshot in the “Add optional player data” box of Figure 1) and players willing to disclose these pieces of information earn the “Open person” badge. The application is collecting data since October 2016. The collected dataset is available at Data Citation 1: Open Science Framework http://dx.doi.org/10.17605/OSF.IO/JTYVD.
Interpretation of the data
The collected chains are footprints of the process how people master their navigational skills in the network lying behind the game. The word morph network is a network of three-letter English words, in which two words are connected by a link if they differ in only a single letter at the same position (see Figure 2a). For example the word “FIT” is connected to the word “FAT” as they differ only in their middle letter. “FAT” is linked to “CAT” as they differ in their first letter, but “FIT” and “CAT” are not connected in this network since they differ in more than one letter. The chains collected from players are paths in this network and reflect valuable information about how people try to navigate between nodes. Figure 2b shows a subgraph of the word morph network and illustrates two solutions for a game between source and target words “YOB” and “WAY”.
The fit-fat-cat application collecting the data is available at the Google Play store13. The source code of the app can be requested via e-mail to the corresponding author.
Our dataset (available at Data Citation 1: Open Science Framework http://dx.doi.org/10.17605/OSF.IO/JTYVD) is organized into three files. The file named “three_letter_scrabble_words.txt” contains the 3-letter long official word list for Scrabble as of 2016. From this list one can reconstruct the word morph network (see Figure 2) containing the 3-letter English words as nodes that are connected if they differ in exactly one letter. The graph representation of this network is provided under the file “word_morph_network.gml” in the GML format14.
The third file named “word_navigation_game_export.json” is the dataset containing information about the word games as played by our players along with some additional meta data about the players themselves the provision of which is optional. All these data is organized into one JSON15 formatted file containing 19828 game records gathered from our players. Game records are stored under the object name “GameLogs” and grouped by players using anonymous player identifiers (28 character long hash strings). Each game record is identified by a 20 letter long hash code and contains eight fields as detailed in Table 1. The player records are stored under the object name “Users”, each player is identified by a 28 letter long hash string. Gender, language and date of birth are collected from players who are willing to share them (not mandatory for playing the game). The corresponding fields in the dataset are described in Table 2.
The collection of the game data is fully automated by the “fit-fat-cat” application. After the players finish the game, the application automatically uploads the game record to our server. Only the “chain” field is collected directly from the player, all other information (date, time taken to complete the game, source word, target word, chain length etc.) are filled out automatically by the application. The collection process of the optional data about the players uses standard techniques to increase the quality of the data (e.g. gender and language information can be chosen from lists).
The application chooses source and target words in a randomized fashion, which ensures that the data is not biased towards specific words and contains navigation paths (i.e. game solutions) covering all regions of the word morph network. The dataset is fully anonymized, players and games are identified by hash codes, which shades the identity of the players but eases the processing of the data.
The shared dataset is easy to read and process by basic software tools. The file “three_letter_scrabble_words.txt” holding the list of 3-letter long official words for Scrabble can be read by an arbitrary text editor. The word morph network provided under the file “word_morph_network.gml” can be parsed by ordinary GML parsers (e.g. by using igraph's16 GML interface either from Python or R codes). The game data shared under the file named “word_navigation_game_export.json” can be parsed by standard JSON15 parsers. Exemplary code snippets are displayed in Supplementary Table 1 to show how to parse and use the database from Python or R codes. Some basic statistics (number of records and the mean value of the word chain lengths entered by the players) are also computed to illustrate the usage of the data.
How to cite this article: Kőrösi, A. et al. A dataset on human navigation strategies in foreign networked systems. Sci. Data 5:180037 doi: 10.1038/sdata.2018.37 (2018).
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Kőrösi, A. et al. Open Science Framework http://dx.doi.org/10.17605/OSF.IO/JTYVD (2017)
Project no. 123957 and 124171 has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the FK_17 and K_17 funding schemes respectively. A. Gulyás was supported by the János Bolyai Fellowship of the Hungarian Academy of Sciences.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.