Coordination over a unique medium of exchange under information scarcity

Nioche, Aurélien; Garcia, Basile; Lefebvre, Germain; Boraud, Thomas; Rougier, Nicolas P.; Bourgeois-Gironde, Sacha

doi:10.1057/s41599-019-0362-2


Article
Open access
Published: 03 December 2019


Aurélien Nioche ORCID: orcid.org/0000-0002-0567-2637^1,2,3,4,5^na1,
Basile Garcia^4,5,6,7,8^na1,
Germain Lefebvre^6,7,9,10,
Thomas Boraud^4,5,11,
Nicolas P. Rougier ORCID: orcid.org/0000-0002-6972-589X^4,5,8,12^na1 &
…
Sacha Bourgeois-Gironde ORCID: orcid.org/0000-0003-1767-0919^2,3,9^na1

Palgrave Communications volume 5, Article number: 153 (2019)


Abstract

Several micro-founded macroeconomic models with rational expectations address the issue of money emergence by characterizing it as a coordination game. These models share the use of agents who have perfect or near-perfect information on the global state of the economy and who display full-fledged computational abilities. Several experimental studies have shown that a simple trial-and-error learning process could explain how agents coordinate on a single medium of exchange. However, these studies provide subjects with full information regarding the state of the economy while restricting the number of goods in circulation to three. In this study, by means of multi-agent simulations and human experiments, we test the hypothesis that coordination over a unique medium of exchange is possible in the context of information scarcity. In our experimental design, subjects and artificial agents are only aware of the outcome of their own decisions. We provide results for economies with 3 and 4 goods to evaluate to what extent results obtained with 3 goods can be generalized to $n$ goods. Our findings show that in an economy à la Iwai, commodity money can emerge under drastic information restrictions with three goods in circulation, but generalization to four or more goods is not guaranteed.


Introduction

In recent decades, monetary economics has shifted from a purely macroeconomic understanding of money to an analysis of its micro-foundations, in both their game-theoretical and behavioral dimensions. Following the intuitions of Karl Menger (1892) and starting with Jones’s model in the mid-1970s (Jones, 1976), several search-theoretic models have been proposed in order to identify the conditions for money emergence (Diamond, 1984; Kiyotaki and Wright, 1989, 1991; Oh, 1989; Aiyagari and Wallace, 1991; Kiyotaki and Wright, 1993; Shi, 1995; Iwai, 1996; Kehoe et al., 1993; Wright, 1995; Luo, 1998). They are considered search-theoretic models in the sense that they describe situations where agents need to search for a trading partner before transacting (Nosal and Rocheteau, 2011). In addition, these models belong to the class of micro-founded macroeconomic models with rational expectations. Agents with rational expectations can take advantage of all the available information to form their expectations and decide which action is optimal, on the belief that every other agent in the economy has a similar ability (Muth, 1961).

The first advantage of these models is that they explain a macroeconomic phenomenon (money emergence) from individual decision-making processes. Their second advantage is that they explain money emergence without requiring the economy to be centralized: they do not need to assume a monetary authority for the agents to coordinate over a unique medium of exchange. Focusing on the function of a medium of exchange, these models highlight the key role that money can play in limiting frictions in exchange processes (i.e., the difficulty of finding an exchange partner). However, these models are based on three unrealistic assumptions: the omniscience of economic agents, infinite time, and an unbounded number of agents.

A question that immediately arises is whether money emergence without a monetary authority is possible in an economy populated by agents with restricted abilities and having limited access to information. More precisely, we want to know whether coordination over a unique medium of exchange is possible when agents proceed by trial and error and have access to local information only.

A partial answer to this question has been provided by agent-based simulations with artificial agents using a reinforcement learning process (Marimon et al., 1990; Duffy and Ochs, 1999; Kindler et al., 2017) in a Kiyotaki-and-Wright environment (Kiyotaki and Wright, 1989, 1993). In these simulations, reinforcement learning agents have by construction limited computational abilities, and their only informational inputs are the successes and failures of their exchange attempts. In contrast to Kiyotaki and Wright’s theoretical agents, they are completely blind to the global state of the economy, and the tuning of their preferences does not rely on knowledge of it. Yet the results report that monetary equilibria are reached, indicating that fully rational agents are not required for money to emerge. In a similar vein, other work considers money emergence under heterogeneous beliefs, where some agents are rational and the remaining fraction learns through an adaptive learning rule, showing that coordination is eventually possible in this setting as well (Branch and McGough, 2016).

The Kiyotaki and Wright model (Kiyotaki and Wright, 1989, 1993) has been tested experimentally to determine whether results obtained analytically or by numerical simulation could be reproduced with human subjects. It has been shown that human subjects evolving in a search-theoretic environment can reach a monetary equilibrium (Brown, 1996; Duffy and Ochs, 1999; Duffy, 2001), or at least a high proportion of speculators (Lefebvre et al., 2018). Interestingly, a reinforcement model fits well the experimental data obtained in a Kiyotaki-and-Wright environment (Kiyotaki and Wright, 1989, 1993), suggesting that although more sophisticated behavior rules were available, subjects tended to favor immediate past feedback (Duffy and Ochs, 1999; Duffy, 2001).

A first criticism that can be addressed to the aforementioned computational and experimental studies is that, although they succeeded in demonstrating that a monetary equilibrium can be reached, they mainly considered the fundamental equilibrium of Kiyotaki and Wright (Kiyotaki and Wright, 1989, 1993). Indeed, Kiyotaki and Wright (Kiyotaki and Wright, 1989, 1993) consider two types of equilibrium: (i) fundamental, where the monetized good is less costly to store than the other goods in circulation, which readily explains why it is preferred; (ii) speculative, where some agents are required to incur supplementary costs at first (i.e., to speculate). The speculative equilibrium is particularly interesting, as it provides insight into a specific cognitive ability that could sustain money emergence (i.e., the ability to incur a short-term cost with a view to distant goals), and yet it is the one for which results are the scarcest (Brown, 1996; Duffy and Ochs, 1999; Duffy, 2001; Kindler et al., 2017; Lefebvre et al., 2018). Secondly, in contrast with virtual agents learning by reinforcement, which are only provided with scarce information, human subjects had access to information about the global state of the economy in studies combining artificial agents and human subjects (Duffy and Ochs, 1999; Duffy, 2001; Lefebvre et al., 2018). Thirdly, to our knowledge, these computational and experimental studies are based on search-theoretic models involving only three goods (Brown, 1996; Duffy and Ochs, 1999; Duffy, 2001; Kindler et al., 2017; Lefebvre et al., 2018). In this case, only one type of agent genuinely uses the monetary good as a medium of exchange. It remains to be seen whether their conclusions hold when more than three goods are in circulation.

Let us note that in the recent literature, numerous questions have been addressed through an experimental money-emergence paradigm: whether convergence on a monetary equilibrium is preferred to a gift-exchange equilibrium, where an agent has the possibility to give a good in the hope of obtaining another later (Duffy and Puzzello, 2014); how an inflation tax affects economic activity (Anbarci et al., 2015); how a foreign money may be accepted by agents in an international framework (Jiang and Zhang, 2018; Ding et al., 2018); how a monetary equilibrium can be reached under the assumption of a finite horizon (Davis et al., 2019); or even how a second money may emerge when a first money already circulates in the economy (Rietz, 2019). However, either these studies assume a central authority that injects money (Anbarci et al., 2015; Ding et al., 2018), or money does not emerge endogenously, as a fraction of agents is first provided with tokens (worthless goods that no agent consumes) that they are compelled to exchange to obtain their consumption good (Duffy and Puzzello, 2014; Jiang and Zhang, 2018; Davis et al., 2019; Rietz, 2019). In these experiments, the cognitive requirements for money emergence as an endogenous process are thus never explicitly tested.

The purpose of this study is to determine whether economies populated with human subjects can reach a monetary state in the context of information scarcity, that is, in a case of extremely incomplete information in the game-theoretic sense, forcing the subjects to make their decisions under a strong form of ambiguity. More precisely, this study aims to investigate whether coordination over a unique medium of exchange can occur with subjects who only experience the direct outcome of their decisions, learning by trial and error and without any additional information.

Hence, the question is whether results obtained with virtual agents that combine restricted computational abilities and restricted informational input can be generalized to economies populated with humans. To assess the reliability of these results and to broaden our conclusions, we also included economies with four goods in circulation. To meet these goals, we borrowed certain elements from previous search-theoretic models to define the structure of our economies, such as the production-consumption specialization and the absence of double coincidence of wants (i.e., if an agent produces $i$ and consumes $j$, no agent produces $j$ and consumes $i$, so that pure bartering is not an effective solution). However, instead of using an environment à la Kiyotaki and Wright (Kiyotaki and Wright, 1989, 1993), we decided to use a more general search-theoretic structure based on Iwai’s model (Iwai, 1996). Iwai’s model differs from Kiyotaki and Wright’s model (Kiyotaki and Wright, 1989, 1993) in two fundamental ways: (i) the exchange technology consists of random pairing inside markets specialized in a pair of goods, whereas in Kiyotaki and Wright (1989, 1993) agents are randomly matched regardless of any other characteristic; (ii) there are no storage costs, so that storing a good $i$ is not costlier than storing a good $j$. We therefore adopted an Iwai-like environment with indistinguishable goods, so that money emergence does not rest on intrinsic features of the goods, as it does in Kiyotaki and Wright’s fundamental equilibrium.

We began by conducting a series of simulations. In the simulated economies, agents produce a certain good and seek to obtain another one through exchanges, and have little knowledge about the environment in which they operate: they only know whether their exchange attempt was a success or a failure. They learn using a basic reinforcement mechanism, associating a value to each choice option available to them and updating by trial and error their estimate of the efficiency of each type of exchange. We used the results of these simulations to identify the experimental conditions that would promote coordination over a single medium of exchange. Subsequently, we observed the behavior of human subjects under similar informational constraints and compared the theoretical and experimental results. To conclude, we discuss the possibility of coordination over a unique medium of exchange in the context of information scarcity, in three-good and four-good settings.

Materials and methods

Model

General framework

Each economy is composed of different types of agents. A type of agent is defined by what agents of this type produce and consume. The goal of each agent is to obtain its consumption good. Agents trade with one another to achieve this goal. They receive feedback only about their own exchange attempts and learn by reinforcement the efficiency of each type of exchange. We vary the distribution of agents among the existing types across simulations. By construction, if a good $m$ becomes money, an agent that produces it or consumes it should try to exchange its production good directly against its consumption good. Otherwise, the agent is supposed to use $m$ as a medium of exchange, that is, to exchange its production good against $m$, and then $m$ against its consumption good.

Production-consumption specialization

We consider an economy with $G$ goods in circulation, with $G\ \ge \ 3$. We denote these goods $1,2,\ldots,G$. Each agent is specialized in production and consumption. An agent of type $(i,j)$ produces good $i$ and consumes good $j$ (with $j\,\ne\, i$). We assume the absence of double coincidence of wants: if an agent of type $(i,j)$ exists, then an agent of type $(j,i)$ does not exist. We use a minimally connected endowment-need distribution (Iwai, 1996), such that the existing agent types are: $(G,1),(1,2),\ldots ,(G-1,G)$. The number of agents of each type is exogenously set. We designate by ${x}_{G1}$ the number of agents of type $(G,1)$, ${x}_{12}$ the number of agents of type $(1,2)$, ..., ${x}_{G-1G}$ the number of agents of type $(G-1,G)$. Each agent enters the economy equipped with a unit of its production good. Each time an agent receives its consumption good, it consumes it and immediately afterwards produces a new unit of its production good (each agent owns a single storage unit).
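As an illustration of the minimally connected type structure described above, the following minimal sketch enumerates the agent types for a given number of goods and checks that no two types could barter directly. The function names are hypothetical and are not taken from the authors' code.

```python
def agent_types(n_goods):
    """Return the (production, consumption) pairs (G,1), (1,2), ..., (G-1,G)."""
    if n_goods < 3:
        raise ValueError("the model assumes at least 3 goods")
    # Goods are labelled 1..G; the type (G, 1) closes the cycle.
    return [(n_goods, 1)] + [(i, i + 1) for i in range(1, n_goods)]


def has_double_coincidence(types):
    """True if some type (i, j) coexists with its mirror type (j, i)."""
    return any((j, i) in types for (i, j) in types)


if __name__ == "__main__":
    for g in (3, 4):
        types = agent_types(g)
        print(g, types, has_double_coincidence(types))
```

With three goods this yields the types $(3,1)$, $(1,2)$ and $(2,3)$, none of which can satisfy another by a single direct swap of production goods.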

Exchange technology

The exchange technology relies on a trading-post mechanism (Iwai, 1996). At each time step, each agent chooses the type of exchange it wants to perform, depending on the good it has in hand. This choice determines to which market it goes. There is an equal number of markets and goods in circulation. Each market is specialized in a pair of goods $(i,j)$, such that in the $ij$-market it is possible to exchange $i$ against $j$, and $j$ against $i$. Our trading technology works synchronously (i.e., all exchanges occur simultaneously). Thus, in each $ij$-market, we randomly associate each $i$-seller/$j$-buyer with a $j$-seller/$i$-buyer, as long as there is a sufficient number of $j$-sellers/$i$-buyers. Therefore, in each $ij$-market, the probability of successfully exchanging a good $i$ against a good $j$ depends on the respective numbers of $i$-sellers/$j$-buyers and $j$-sellers/$i$-buyers. For example, if at time $t$ the $ij$-market contains 4 $i$-sellers/$j$-buyers and 8 $j$-sellers/$i$-buyers, 4 $ij$-exchanges will take place: each $i$-seller/$j$-buyer, being on the short side of the market, will carry out the desired exchange with certainty, while the probability of success for a $j$-seller/$i$-buyer is 0.5.
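The matching inside a single market can be sketched as follows. This is a minimal illustration under the assumption that the long side of the market is rationed uniformly at random; the function names and the agent identifiers are hypothetical.

```python
import random


def success_probabilities(n_i_sellers, n_j_sellers):
    """Per-agent success probability on each side of an ij-market."""
    n_exchanges = min(n_i_sellers, n_j_sellers)
    p_i = n_exchanges / n_i_sellers if n_i_sellers else 0.0
    p_j = n_exchanges / n_j_sellers if n_j_sellers else 0.0
    return p_i, p_j


def match_market(i_sellers, j_sellers, rng=None):
    """Randomly pair both sides; return the set of agents whose exchange succeeds."""
    rng = rng or random.Random()
    n_exchanges = min(len(i_sellers), len(j_sellers))
    # Everyone on the short side trades; the long side is drawn at random.
    lucky_i = rng.sample(list(i_sellers), n_exchanges)
    lucky_j = rng.sample(list(j_sellers), n_exchanges)
    return set(lucky_i) | set(lucky_j)


if __name__ == "__main__":
    i_side = [("i", k) for k in range(4)]
    j_side = [("j", k) for k in range(8)]
    print(success_probabilities(len(i_side), len(j_side)))
    print(sorted(match_market(i_side, j_side, random.Random(0))))
```

With 4 agents on one side and 8 on the other, 4 exchanges take place: the 4 agents on the short side all succeed, while each of the 8 agents on the long side succeeds with probability 0.5.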

Information scarcity

An agent does not know other agents’ choices, nor the probabilities of success of each exchange: the only information it has access to is whether or not it succeeded in the desired exchange.

Strategies

The goal of each agent is to obtain its consumption good as quickly as possible.

We will specifically consider:

The direct exchange strategy. For a type-$ij$ agent with $i$ in hand (his production good), it consists of trying an exchange against $j$ (his consumption good).
The indirect exchange strategy with $k$ as a medium of exchange. For a type-$ij$ agent with $i$ in hand (his production good), it consists of trying an exchange against the good $k$ (with $k\,\ne\, i,j$). With $k$ in hand, it consists of trying an exchange against $j$ (his consumption good).
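The two strategies listed above can be expressed as a small decision function. This is only a sketch: the function name `next_target` and the convention of passing `medium=None` for the direct strategy are assumptions made for illustration.

```python
def next_target(agent_type, in_hand, medium=None):
    """Good that the agent tries to obtain, given its strategy.

    agent_type: (production good i, consumption good j)
    in_hand:    good currently held
    medium:     None for the direct strategy, or the good k used as a medium
    """
    production, consumption = agent_type
    if medium is not None and medium in (production, consumption):
        raise ValueError("the medium k must differ from both i and j")
    if medium is None or in_hand == medium:
        # Direct exchange, or second leg of the indirect exchange.
        return consumption
    # First leg of the indirect exchange: production good against k.
    return medium


if __name__ == "__main__":
    print(next_target((1, 2), in_hand=1))            # direct strategy
    print(next_target((1, 2), in_hand=1, medium=3))  # indirect, first leg
    print(next_target((1, 2), in_hand=3, medium=3))  # indirect, second leg
```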

Simulations

Decision-making process

Each agent learns to estimate the success rate of each type of exchange. This allows it to estimate the time needed to get its consumption good depending on the choice made.

Success rate estimates for each exchange type are based on a reinforcement learning process. At time step $t$, when an agent attempts to exchange $i$ against $j$, it updates the success rate estimate associated with the exchange of type $(i,j)$, denoted ${e}_{ij}$, according to:

$${e}_{ij}^{t+1}={e}_{ij}^{t}+\alpha \cdot (s-{e}_{ij}^{t}) \tag{1}$$

with $\alpha \in [0,1]$, a free parameter, and $s$, a binary variable such that $s=1$ if the agent succeeded in its exchange and $s=0$ otherwise. $\alpha$ is a learning rate which defines to what extent an agent takes into account its latest attempted exchange. If $\alpha =1$, the agent considers only its latest attempted exchange. If $\alpha =0$, the agent ignores the success or failure of its latest attempted exchange and its estimate never changes.
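The update rule of Eq. 1 can be sketched in a few lines. The function name is hypothetical; the starting estimate of 1 and the value $\alpha = 0.175$ used in the example are the ones reported in the parametrization of this study.

```python
def update_estimate(e_ij, success, alpha):
    """One step of Eq. 1: move the estimate towards the latest outcome."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    s = 1.0 if success else 0.0
    return e_ij + alpha * (s - e_ij)


if __name__ == "__main__":
    e = 1.0  # initial estimate
    for outcome in (False, False, True):
        e = update_estimate(e, outcome, alpha=0.175)
        print(round(e, 4))
```

A failure pulls the estimate towards 0 by a fraction $\alpha$ of the current gap, and a success pulls it back towards 1 in the same way.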

When making a decision, each agent considers the expected temporal interval between the time of choice and the time he gets his consumption good. It is assumed that the longer the time interval, the lower the value for the agent. Let $v(ij)$ be the value associated to the choice $ij$ (i.e., exchange $i$ against $j$) and ${\Delta }_{ij}$ the estimation by the agent of the time that will be spent before consumption if he chooses $ij$:

$$v(ij)=1/{(1+\beta )}^{{\Delta }_{ij}} \tag{2}$$

with $\beta \ >\ 0$, a free parameter. $\beta$ is a discounting parameter: the larger it is, the more subjective values are discounted with time, and the closer it is to $0$, the less they are discounted (Osborne, 2016). Since it takes at least one unit of time for the agent to get its consumption good, the value function $v$ is bounded between 0 and 1.
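A direct transcription of Eq. 2 makes the role of $\beta$ easy to check numerically. This is a minimal sketch with a hypothetical function name; the $\beta$ values in the example are taken from the range explored in the simulations.

```python
def value(delta, beta):
    """Eq. 2: subjective value of a choice whose expected waiting time is delta."""
    if beta <= 0:
        raise ValueError("beta must be strictly positive")
    return 1.0 / (1.0 + beta) ** delta


if __name__ == "__main__":
    for beta in (0.8, 1.0, 1.2):
        print(beta, [round(value(d, beta), 4) for d in (1, 2, 3)])
```

For a fixed waiting time, the value decreases as $\beta$ increases, and for a fixed $\beta$ it decreases as the waiting time grows.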

We assume that for each exchange of type $(i,j)$, the agent has an estimation of the success rate associated to this type of exchange (${e}_{ij}$). The higher the estimated success rate, the lower the estimated time to succeed in this exchange. Let ${\delta }_{ij}$ be the estimated time to achieve a type-$ij$ exchange:

$${\delta }_{ij}=1/{e}_{ij} \tag{3}$$

For a type-$ij$ agent, ${\Delta }_{ij}={\delta }_{ij}$. For a type-$ik$ agent (with $k\,\ne\, j$), the value of ${\Delta }_{ij}$ depends on the action policy chosen by the agent, as ${\Delta }_{ij}$ is in this case equal to the sum of the $\delta$-values of each intermediary exchange planned by the agent. For instance, for a type-$ik$ agent following an indirect exchange strategy with good $j$, ${\Delta }_{ij}={\delta }_{ij}+{\delta }_{jk}$. An exhaustive description of the valuation functions for the specific case where $G=3$ is given in the supplementary section.
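The computation of the expected time before consumption can be sketched as a sum of $\delta$-values along a planned sequence of exchanges. The function names, the representation of a plan as a tuple of goods, and the treatment of a zero estimate as an infinite waiting time are assumptions made for this illustration; the numerical estimates in the example are invented.

```python
def expected_time(e):
    """Eq. 3: expected number of attempts for an exchange with success rate e."""
    # Assumption: an estimate of 0 is treated as an infinite waiting time.
    return float("inf") if e <= 0 else 1.0 / e


def time_before_consumption(plan, estimates):
    """Sum the delta-values over consecutive exchanges of a plan.

    plan:      goods held in sequence, e.g. (i, k) or (i, j, k)
    estimates: dict mapping (good given, good received) to a success rate
    """
    return sum(expected_time(estimates[(a, b)]) for a, b in zip(plan, plan[1:]))


if __name__ == "__main__":
    # Hypothetical estimates for an agent producing good 1 and consuming good 2.
    e = {(1, 2): 0.25, (1, 3): 0.5, (3, 2): 1.0}
    print(time_before_consumption((1, 2), e))     # direct exchange
    print(time_before_consumption((1, 3, 2), e))  # indirect exchange via good 3
```

With these invented estimates the indirect route has a shorter expected waiting time than the direct one, which is the situation in which an agent would come to prefer using good 3 as a medium of exchange.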

Agents make decisions using a probabilistic decision rule. The standard approach is to use a softmax function to introduce stochasticity in choice (Sutton and Barto, 1998). However, Apesteguia and Ballester (2018) show that the combination of a softmax rule with either a risk-sensitivity or a temporal discounting model can be problematic, as the parameter describing the risk-sensitivity or discounting effect can have a non-monotonic effect on the variable of interest. For this reason, the rule implemented is a simple $\epsilon$-rule (Sutton and Barto, 1998). Let $v(ij)$ be the value associated with choice $ij$ and $p(ij)$ the probability of choosing to exchange $i$ against $j$. $p(ij)$ is computed as follows:

$$p(ij)=\begin{cases}1-\gamma & \text{if}\ \forall k:v(ij)\ >\ v(ik),\\ \gamma /(G-1) & \text{otherwise}.\end{cases} \tag{4}$$

with $\gamma \in [0,1]$, a free parameter. $\gamma$ is an exploitation-exploration rate (Sutton and Barto, 1998): the lower the $\gamma$-value, the more prone the agent will be to choose the option with the highest subjective value. On the contrary, the higher the $\gamma$-value, the more the agent will be prone to choose another option.
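A choice rule of this kind can be sketched as follows. Note the assumptions: this illustration spreads the exploration probability $\gamma$ uniformly over the options that are not the best one, so that the probabilities sum to 1, which may differ from the exact denominator of Eq. 4; ties are broken in favor of the first option encountered. The function names and the option labels are hypothetical.

```python
import random


def choice_probabilities(values, gamma):
    """Map each option to its choice probability under an epsilon-type rule.

    values: dict mapping an option to its subjective value v
    gamma:  exploration rate in [0, 1]
    """
    best = max(values, key=values.get)  # ties: first option encountered
    others = [option for option in values if option != best]
    if not others:
        return {best: 1.0}
    # Assumption: gamma is shared equally among the non-best options.
    probabilities = {option: gamma / len(others) for option in others}
    probabilities[best] = 1.0 - gamma
    return probabilities


def choose(values, gamma, rng=None):
    """Draw one option according to choice_probabilities."""
    rng = rng or random.Random()
    probabilities = choice_probabilities(values, gamma)
    options = list(probabilities)
    weights = [probabilities[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]


if __name__ == "__main__":
    v = {"1->2": 0.5, "1->3": 0.25, "1->4": 0.1}
    print(choice_probabilities(v, gamma=0.125))
    print(choose(v, gamma=0.125, rng=random.Random(0)))
```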

Protocol and parametrization

We ran $10,800$ simulations with $G=3$ and $10,800$ simulations with $G=4$. Each simulation lasted $100$ time-steps. The exploration parameter ($\gamma$) was varied between $0.10$ and $0.15$. The learning rate ($\alpha$) was varied between $0.10$ and $0.25$. The discounting parameter ($\beta$) was varied between $0.80$ and $1.20$. The initial values of the success rate estimates for all types of exchange were set to $1$. Setting all initial values to $1$ precluded any initial bias in preferences (such as a bias that would make the appearance of a commodity money more likely). With these values, the value associated with exchanging the production good against the consumption good was higher than the value of any other exchange for all agents, implying that all agents preferred the direct exchange strategy at the first time-step.

When $G=3$, ${x}_{31}$ was set to $50$ while ${x}_{12}$ and ${x}_{23}$ were varied between $10$ and $200$.

When $G=4$, ${x}_{41}$ and ${x}_{12}$ were set to $50$ (following results from simulations with $G=3$) while ${x}_{23}$ and ${x}_{34}$ were varied between $10$ and $200$.

Artificial experiments

We ran $4$ separate simulations before the experiments, using the same distribution of agents as in the experiments ($2$ matching the conditions of Experiment I and $2$ matching the conditions of Experiment II). The cognitive parametrization of the agents was: $\alpha =0.175$, $\beta =1.000$ and $\gamma =0.125$ (these values correspond to the average value of each parameter used for the simulations).

Post-hoc simulations

We fitted the decision-making model to our behavioral data using SciPy’s (Jones et al., 2001) differential evolution algorithm (provided by the optimize module). We optimized the model parameters by minimizing the negative log-likelihood of the model for each subject individually.
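The fitting procedure can be sketched as follows. This is an illustrative reconstruction and not the authors' code: the trial format `(good in hand, good chosen, success)`, the normalized exploration rule, the numerical floors, and the bounds on $\beta$ are all assumptions. Only `scipy.optimize.differential_evolution` is an actual SciPy function; it is imported inside `fit_subject` so that the likelihood itself runs without SciPy.

```python
import math


def negative_log_likelihood(params, trials, consumption, n_goods):
    """Negative log-likelihood of one subject's choices under the model."""
    alpha, beta, gamma = params
    goods = range(1, n_goods + 1)
    # Success rate estimates start at 1 for every type of exchange.
    e = {(a, b): 1.0 for a in goods for b in goods if a != b}
    nll = 0.0
    for in_hand, chosen, success in trials:
        values = {}
        for target in goods:
            if target == in_hand:
                continue
            delta = 1.0 / max(e[(in_hand, target)], 1e-9)  # floor avoids 1/0
            if target != consumption:
                # Indirect route: add the expected time of the second exchange.
                delta += 1.0 / max(e[(target, consumption)], 1e-9)
            # Same quantity as 1 / (1 + beta) ** delta, without overflow.
            values[target] = math.exp(-delta * math.log1p(beta))
        best = max(values, key=values.get)
        p = 1.0 - gamma if chosen == best else gamma / (len(values) - 1)
        nll -= math.log(max(p, 1e-12))  # floor avoids log(0)
        s = 1.0 if success else 0.0
        e[(in_hand, chosen)] += alpha * (s - e[(in_hand, chosen)])
    return nll


def fit_subject(trials, consumption, n_goods):
    """Minimize the negative log-likelihood over (alpha, beta, gamma)."""
    from scipy.optimize import differential_evolution  # third-party dependency

    bounds = [(0.0, 1.0), (1e-3, 2.0), (0.0, 1.0)]  # hypothetical bounds
    result = differential_evolution(
        negative_log_likelihood, bounds, args=(trials, consumption, n_goods)
    )
    return result.x, result.fun


if __name__ == "__main__":
    # Invented trials for a subject producing good 1 and consuming good 2.
    demo = [(1, 2, False), (1, 3, True), (3, 2, True)]
    print(negative_log_likelihood((0.175, 1.0, 0.125), demo, 2, 3))
```

Calling `fit_subject(trials, consumption, n_goods)` on one subject's trial sequence would return the best-fit parameter vector and the corresponding negative log-likelihood, one fit per subject as described above.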

Using the best-fit parameter values of the subjects to parametrize the artificial agents (the distribution of the best-fit parameter values is given in Fig. S18A of the Supplementary Section), we ran $4$ post-hoc simulations ($2$ matching the conditions of Experiment I and $2$ matching the conditions of Experiment II).