A network approach to cartel detection in public auction markets

Competing firms can increase profits by setting prices collectively, imposing significant costs on consumers. Such groups of firms are known as cartels, and because this behavior is illegal, their operations are secretive and difficult to detect. Cartels face a significant internal obstacle: members have short-run incentives to cheat. Here we present a network-based framework to detect potential cartels in bidding markets, based on the idea that the chance a group of firms can overcome this obstacle and sustain cooperation depends on the patterns of its interactions. We create a network of firms based on their co-bidding behavior, detect interacting groups, and measure their cohesion and exclusivity, two group-level features of their collective behavior. Applied to a market for school milk, our method detects a known cartel and calculates that it has high cohesion and exclusivity. In a comprehensive set of nearly 150,000 public contracts awarded by the Republic of Georgia from 2011 to 2016, detected groups with high cohesion and exclusivity are significantly more likely to display traditional markers of cartel behavior. We replicate this relationship between group topology and the emergence of cooperation in a simulation model. Our approach offers a scalable, unsupervised method to find groups of firms in bidding markets ideally positioned to form lasting cartels.

In the Prisoner's Dilemma (PD) and many other games in which collectively optimal actions are personally costly to players, altruistic cooperation emerges under a variety of conditions through mechanisms such as reciprocity and the altruistic punishment of defectors or cheaters 14.
Just as certain market conditions are known to favor cartels, researchers have observed that when players of the PD are arranged in some space or network which restricts their potential interactions, the potential for the emergence of stable cooperation crucially depends on the structure of the space [15][16][17][18]. For example, correlations in the spatial distribution of agents have been shown to facilitate cooperation 19. To the best of our knowledge, the observation that local correlations of interactions significantly influence the emergence of cooperation has not been applied to the problem of detecting cartels. This paper proposes focusing the search for cartels on groups occupying ideal positions in the competitive landscape for the emergence of cooperation.
We propose to apply network science methods to identify groups of intensely interacting firms in a market and to screen them for collusive potential, based on their network topology and how it may facilitate collusion. We focus on the specific case of collusion in public contracting markets, in which public bodies buy goods and services from private firms. These markets are vulnerable to collusion because of the inelasticity and regularity of government demand for certain goods 20,21. They are also large, accounting for between 10% and 20% of GDP in the OECD 22. Contracts are commonly awarded via auction to the lowest bidder. In these markets cartels often engage in bid-rigging, coordinating their bids to mask their agreement to avoid competition 23. The US Department of Justice highlights bid-rigging as one of the primary modes of anti-competitive behavior in public markets, alongside collective price-setting and market allocation 24.
Specifically, we use data on firms bidding for contracts to map public contracting markets as networks of competing firms. We argue that such a network represents an embedding of the firms into a space which describes the competitive landscape of their industry or location, including its geography, technology, and scale. Within such co-bidding networks, we detect groups of firms whose local network topology is naturally conducive to sustaining collusion. Our findings suggest that certain topological features may be necessary for cartels to operate successfully in the long run.
Previous work on cartels has considered co-bidding networks of firms [25][26][27], but using network topology to detect groups of firms within markets is a relatively new idea. For example, Conley and Decarolis 28 use an agglomerative clustering method to group firms based on their bidding behavior. In recent work on cartel screening, Imhof et al. have used patterns of bidding interactions between firms to study cartels 29, though they do not consider interactions between cartel and non-cartel firms. In general, data on bids in public procurement are not made public, presenting a major obstacle to research on bid-rigging. This information is kept secret because it can be useful to firms engaged in collusion 30. In fact, the OECD recommends that issuers of public procurement contracts do not publish information on losing bidders and bids for this reason 31. The responsible authorities, however, do have access to such information.
More broadly, network methods have been fruitfully applied to problems in criminology including corruption 32-34 , the mafia 35 , and the evolution of criminal behavior in society 36 .
We first map the co-bidding market of the suppliers of public school milk in 1980s Ohio, which contains a known cartel case 37. We note that the cartel firms occupy a distinguished position in the network: they frequently interact with one another, forming coherent links, and are relatively isolated from outside firms, forming an exclusive group. We then turn to a dataset of bids on nearly 150,000 contracts awarded in the Republic of Georgia from 2011 to 2016, worth roughly 5 billion US dollars. Using a greedy, bottom-up algorithm to detect overlapping groups of interacting nodes, we find that groups with cohesive and exclusive interactions have higher prices, are less likely to sue each other, and are more likely to have low variance in their bids and prices, all classic screens for cartel behavior used by competition authorities around the world 38. Finally, we simulate a market in which firms compete for randomly placed contracts that are close to them, introducing spatial correlations to interactions. Firms see their competitors for each contract and decide to compete or collude based on the previous actions of their partners and the frequency with which they meet them. In the resulting co-bidding network, detected groups with coherent and exclusive links successfully collude with much higher frequency.

Results
Our framework to find groups of firms that may be engaging in collusion consists of several steps. First, we extract the co-bidding network of firms in a market, connecting two firms by an edge with a weight that increases as they more frequently bid for the same contracts. We then identify groups of firms which frequently bid for the same contracts using a modified version of a popular overlapping community detection algorithm 39. The method is greedy, and the function used to merge nodes into groups has a penalty term for the number of nodes included, ensuring that the detected groups remain small relative to the size of the market. Finally, we calculate two topological features of the groups: their coherence and exclusivity. We suggest that sustained collusion is more likely to emerge among groups with high coherence and exclusivity because they offer ideal conditions for firms to learn to cooperate and trust one another. We find evidence of this phenomenon in three settings: a dataset of school milk contracts with a known cartel, a dataset of virtually all contracts awarded in the Republic of Georgia over several years, and a simulation model of contracting markets with spatial correlations.
The 1980s Ohio school milk market. We first analyze bidding data from the market for public school milk in 1980s Ohio 37. Every summer, school districts called for bids from dairies to provide school milk for the following academic year. Firms submitted sealed bids quoting a price in cents per pint. In 1993, representatives from two firms confessed to colluding with a third firm to rig bids for contracts in the Cincinnati area as part of a settlement. The third firm eventually settled out of court, paying significant civil penalties.
Previous work by Porter and Zona highlights irregularities in the bidding behavior of the suspected cartel firms compared to the rest of the market 37. Exploiting specific features of the market for school milk, the authors created an econometric model to predict the bids of firms on contracts, including information on the capacity of firms, the specifications of the bids (e.g. whether drinking straws were required), and the physical distance between the firm and the school. They found that the bids submitted by cartel members were often decreasing in distance, a highly suspicious fact given that a major cost in the supply of school milk is its transportation.
For each year from 1981 to 1990, inclusive, we created the co-bidding network of firms, connecting two firms based on the similarity of their bidding behavior. We apply our method to detect overlapping groups of interacting firms. We use a force layout algorithm to visualize the network in subplot A of Fig. 1, highlighting the cartel firms in red and outlining the detected groups. For each group we calculate its coherence, the ratio of the geometric to arithmetic means of its edge weights 40, and its exclusivity, the ratio of strength within the group to the total strength of nodes in the group (including edges leaving the group). As features of groups of firms, coherence captures the consistency and intensity of interactions among firms in the group, while exclusivity quantifies the extent to which group interactions happen in isolation from the rest of the firms in the broader market.
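To make the two group features concrete, here is a minimal Python sketch, assuming the co-bidding network is stored as a dict mapping `frozenset({a, b})` to an edge weight. Two choices are ours rather than the paper's: coherence averages only over pairs of group members that actually share an edge (a missing edge would otherwise force the geometric mean to zero), and exclusivity counts each internal edge once per endpoint, so that the denominator equals the summed strength of the group's nodes. The function names are hypothetical.

```python
import math
from itertools import combinations

def coherence(group, weights):
    """Geometric mean / arithmetic mean of the edge weights among group members.

    Only member pairs with a positive-weight edge are included (an assumption).
    """
    w = [weights.get(frozenset(pair), 0.0) for pair in combinations(group, 2)]
    w = [x for x in w if x > 0]
    if not w:
        return 0.0
    geometric = math.exp(sum(math.log(x) for x in w) / len(w))
    arithmetic = sum(w) / len(w)
    return geometric / arithmetic

def exclusivity(group, weights):
    """Strength inside the group divided by the total strength of its nodes."""
    s_in = s_total = 0.0
    for pair, w in weights.items():
        endpoints_inside = len(pair & group)  # 0, 1 or 2
        if endpoints_inside == 2:
            s_in += 2 * w  # an internal edge adds to both members' strength
        s_total += endpoints_inside * w
    return s_in / s_total if s_total else 0.0

# Usage: a triangle of firms A, B, C with one weak link from C to outsider D.
weights = {frozenset("AB"): 0.5, frozenset("BC"): 0.5,
           frozenset("AC"): 0.5, frozenset("CD"): 0.2}
group = {"A", "B", "C"}
print(coherence(group, weights))    # equal weights give coherence of about 1.0
print(exclusivity(group, weights))  # 3.0 / 3.2, i.e. about 0.9375
```

A perfectly balanced group has coherence 1, and any spread in edge weights lowers it; exclusivity drops toward 0 as more of the members' strength points outside the group.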
We plot the distribution of groups across all ten years in the coherence-exclusivity space in subplot B of Fig. 1. In the first plot we show the distribution of groups detected in 100 null models for each year in which bidding behavior was randomized. Specifically, the null model shuffles bidders between contracts, such that each firm bids on the same number of contracts and each contract receives the same number of bids. In the second plot we show the observed distributions, indicating the position of the cartel firms (which our group detection algorithm identified as a group in each year) with white circles. We note two phenomena: first, groups in the empirical network have significantly higher exclusivity and coherence than would be expected if the bids were random; second, the high coherence and exclusivity regime is sparsely populated in both the empirical data and the null model.
Georgian public procurement markets. We now turn to data from a much larger procurement market covering a wide range of goods and services. Specifically, we collected virtually all public contracts from the Republic of Georgia from 2011 to 2016. The data consist of nearly 150,000 contracts bid on by nearly 15,000 unique firms, with a total value of roughly five billion US dollars. As with the Ohio dataset, we observe the bids and bidder identities for each contract. Rather than cluster our data by contract-level product type, we analyze the whole network, arguing that firms participate in many markets and that any categorization of firms into markets on the basis of firm or contract metadata would exclude many interactions between firms in adjacent markets. We visualize the co-bidding network for one year of data in the supplementary information, observing that most contracts are awarded to firms in a densely connected giant component of competitive activity.
We apply our method to detect overlapping groups in the whole market and calculate their coherence and exclusivity for each year. In the analysis that follows, we consider only groups of firms from the co-bidding network whose members were the sole bidders on at least 30 contracts in a given year, in order to focus on significantly interacting firms. Our findings are robust to a range of cutoffs, which we report in the SI.
To compare our data against a plausible null model, we created randomized networks from the data by shuffling the contracts firms bid on within specific product classes. This ensures that firms bidding exclusively on school milk contracts do not bid on software consulting contracts in our null model. We use the resulting distributions of group cohesion and exclusivity from the null model to create thresholds for labeling groups from the empirical network as suspicious. We consider a group from the empirical co-bidding network suspicious if both its coherence and its exclusivity exceed the 80th percentile of coherence and exclusivity, respectively, of groups in the null model in the same year. In the supplementary information we apply the same threshold to classify Ohio groups as suspicious and find that it consistently detects the cartel group with a low false positive rate.
www.nature.com/scientificreports/
We visualize the distributions of groups in the coherence and exclusivity space for the randomized and empirical data in subplot A of Fig. 2. We plot the data from all years in the same visualization and highlight the suspicious zone of high coherence and exclusivity calculated for 2016.
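The labeling rule can be sketched as follows, assuming each group is summarized by a `(coherence, exclusivity)` pair. The percentile uses linear interpolation between order statistics (the same convention as NumPy's default); the text does not say which percentile convention was used, so treat this choice, and the function names, as assumptions.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def label_suspicious(groups, null_groups, q=80):
    """Flag empirical groups whose coherence AND exclusivity exceed the null thresholds.

    groups, null_groups: lists of (coherence, exclusivity) pairs for the same year.
    """
    coh_thr = percentile([c for c, _ in null_groups], q)
    exc_thr = percentile([e for _, e in null_groups], q)
    return [c > coh_thr and e > exc_thr for c, e in groups]

null = [(0.0, 0.0), (0.25, 0.25), (0.5, 0.5), (0.75, 0.75), (1.0, 1.0)]
print(label_suspicious([(0.9, 0.9), (0.9, 0.5), (0.7, 0.95)], null))
# both thresholds are 0.8 here, so this prints [True, False, False]
```

Requiring both features to exceed their thresholds is what restricts the suspicious zone to the sparsely populated upper-right corner of the coherence-exclusivity plane.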
As there are no confirmed ground truth cartels in our dataset (the Competition Authority of Georgia confirmed to us via direct correspondence that there have been no confirmed cartel cases in public procurement markets in Georgia 41), we validate our claim that groups in the suspicious zone are operating under conditions that facilitate collusion using four measures. First, we consider the average cost of contracts won by the group when its members were the only participants. As contracts are announced with a reserve price, we can scale each contract's cost outcome by its reserve price to enable comparisons between contracts. We plot the distribution of relative prices for contracts won by suspicious groups versus all other groups in subplot B of Fig. 2. Groups in the suspicious zone win more expensive contracts, a difference confirmed by a Mann-Whitney U test shown in Table 1.
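A minimal sketch of this first validation step: prices are normalized by the reserve price, and the two sets of contracts are compared with the Mann-Whitney U statistic. The stdlib function below only computes U (ties count one half); in practice `scipy.stats.mannwhitneyu` returns U together with a p-value. The input layout, `(winning_price, reserve_price)` pairs, is an assumption for illustration.

```python
def relative_prices(contracts):
    """contracts: iterable of (winning_price, reserve_price) pairs."""
    return [price / reserve for price, reserve in contracts]

def mann_whitney_u(x, y):
    """U statistic for sample x against y: pairs with x > y, ties counting 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

suspicious = relative_prices([(95, 100), (90, 100)])
others = relative_prices([(60, 100), (90, 100)])
u = mann_whitney_u(suspicious, others)
print(u, len(suspicious) * len(others))  # U close to n1*n2 means x tends to be larger
```

Because U depends only on ranks, the test needs no assumption about the shape of the relative price distribution, which is why it suits skewed procurement prices.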
Next, we calculated two price- and bid-based screens for collusion from the literature. The first is the price coefficient of variation of contracts won by the group 38, measuring the extent to which a group's prices are both high and stable. This screen is based on the theoretical observation that when prices are set collectively, it is costly to coordinate price changes 42. It aligns with empirical observations of real cartels 43 and has been used extensively by competition authorities 3. Specifically, the price coefficient of variation $CV^{\mathrm{price}}_G$ of a group $G$ is defined in terms of the average cost $\mu_{C_G}$ of the contracts $C_G$ cornered by the group and its standard deviation $\sigma_{C_G}$:
$$CV^{\mathrm{price}}_G = \frac{\sigma_{C_G}}{\mu_{C_G}}.$$
The second cartel screen we apply is the average coefficient of variation of bids on each contract for which only group firms submitted bids 44. Previous research has shown that the fake bids submitted by losing members of a cartel tend to closely hug the winning bid. For each contract $c$ bid on exclusively by members of a group, we calculate the coefficient of variation of the bids, where $\mu_c$ and $\sigma_c$ are the mean and standard deviation of the bids on $c$:
$$CV^{\mathrm{bidding}}_c = \frac{\sigma_c}{\mu_c}.$$
We average over all contracts $C_G$ cornered by a group $G$ to obtain its bidding coefficient of variation:
$$CV^{\mathrm{bidding}}_G = \frac{1}{|C_G|}\sum_{c \in C_G} CV^{\mathrm{bidding}}_c.$$
We say that a group of firms has a low bidding coefficient of variation if it lies more than one standard deviation below the market average. A Mann-Whitney U test, shown in Table 1, indicates that groups in the suspicious zone are significantly more likely to have lower $CV^{\mathrm{bidding}}_G$ and $CV^{\mathrm{price}}_G$.
We carry out one more test of our method using data on bid protests. Bid protests are legal actions by firms against contracts awarded by procurement authorities. Firms can protest, for example, if the contract was not advertised in the proper venue, or if they believe criteria to participate in an auction unfairly excluded them. We collected data on which firms protested which contracts, including the firm to which the contract was awarded.
We argue that colluding firms would never protest the contracts won by their cartel partners, while one might expect intensely competing firms to frequently protest each other's winnings. For each group we check whether any contract awarded to a group member was protested by another group member that year. We find that suspicious groups are half as likely to have such internal protests, a statistically significant difference reported in Table 1.
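The two coefficient-of-variation screens and the internal-protest check can be sketched in a few lines of Python. We assume population standard deviations (`statistics.pstdev`), since the text does not specify population versus sample; the input formats and function names are hypothetical.

```python
from statistics import mean, pstdev

def cv(values):
    """Coefficient of variation: standard deviation over mean."""
    return pstdev(values) / mean(values)

def price_cv(cornered_costs):
    """CV^price_G over the costs of the contracts cornered by the group."""
    return cv(cornered_costs)

def bidding_cv(cornered_bids):
    """CV^bidding_G: mean per-contract bid CV; one list of bids per cornered contract."""
    return mean(cv(bids) for bids in cornered_bids)

def is_low(value, market_values):
    """True if value lies more than one standard deviation below the market mean."""
    return value < mean(market_values) - pstdev(market_values)

def has_internal_protest(group, protests):
    """protests: (protesting_firm, winning_firm) pairs for one year."""
    return any(p in group and w in group and p != w for p, w in protests)

print(price_cv([90, 110]))                 # 10 / 100 = 0.1
print(bidding_cv([[100, 102], [50, 50]]))  # mean of 1/101 and 0
print(has_internal_protest({"A", "B"}, [("C", "A"), ("B", "A")]))  # True
```

The protest check only looks at pairs in which both the protester and the winner belong to the same group, matching the notion of an internal protest above.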
Suspicious groups detected by our method are more likely to manifest the four collusive markers we have measured than their non-suspicious counterparts. Though this is no proof of collusion, it does indicate that many of the groups of firms that competition authorities might be interested in investigating based on their behavioral patterns lie in the same high coherence and exclusivity zone as the Ohio school milk cartel. In the next section we present a simple simulation model of a procurement market with spatial correlations which replicates our observation that collusion is more common among cohesive and exclusive groups.
Simulation model. We simulated a market of interacting firms placed uniformly at random in the unit square. The location of firms can be interpreted as their physical location or as a more abstract position in a space of product similarities (for instance, firms supplying computer hardware might be closer to one another). Contracts, also located randomly, attract bids from nearby firms, introducing spatial correlation to the interactions between firms. We assume that firms participating in an auction know the other participants. Each firm must decide whether to cooperate or compete for the contract using two factors: the firm's memory of the previous actions of the other firms, and the frequency with which it has met the same firms in the recent past.
For the first factor, the focal firm recalls the previous decision made by the other firms it is meeting, using a proportional tit-for-tat strategy 12. The second factor increases the likelihood of cooperation when the focal firm has frequently met the other firms in the recent past. The decision to collude depends on the product of these two factors: familiarity and experiences of reciprocity are essential to start and sustain collusion 45. In order to keep the model as simple as possible, we do not introduce a price mechanism or consider who wins a given auction. We seek to demonstrate that random spatial correlations can create environments with locations heterogeneously favorable to collusion. We present the precise parameters and initial conditions in the Methods and Data section.
We simulated 5,000 instances of our model, each time initializing a new market with randomly placed firms and contracts. In each instance we award 2,000 contracts, discarding the data from the first 1,000 contracts as burn-in. As before, we constructed the co-bidding networks of firms, detected groups in them, and plotted their distributions in Fig. 3, subplot A.
For each group, we calculated the rate at which members unanimously cooperated on a contract, in other words the relative frequency of successful collusion among the group. We plot the distribution of this frequency across the coherence-exclusivity space in subplot B. In agreement with our empirical evidence, we find that collusion is significantly more likely to emerge among groups in the region of high coherence and exclusivity. These findings are robust to a range of threshold values for agents to cooperate, which we report in the supplementary information.
As discussed earlier in the article, economic theory and empirical observation suggest that there are certain environments in which cartels are more likely to emerge 7. Inspired by the literature on evolutionary game theory, we considered a simple model of cooperation based on games played between agents embedded in space 19. Our findings support the notion that the co-bidding network captures localized market conditions, which in turn govern the likelihood and effectiveness of emergent cooperation. This indicates that cartels may only be able to survive in the high coherence and high exclusivity zone. Of course, this does not imply that any group in this zone will engage in cartel behavior: we present this finding as a necessary rather than a sufficient condition for collusion. Interestingly, when a market is significantly governed by the locations of firms in physical space, as in the Ohio milk market, our model could potentially be calibrated with geographical data.

Discussion
In this paper we developed a framework to find groups of firms in public contracting markets and to screen them for collusive markers. Testing our method on a ground truth case, a large-scale market without known collusion, and a simple model of such markets, we find that collusion seems more likely to emerge among groups of firms with cohesive and exclusive interactions. Groups occupying such distinguished places in the broader market have found a niche with conditions ripe for the emergence of cooperation.
We must acknowledge that our approach to cartel detection is only suggestive: it cannot prove that a group of firms is engaged in collusion. Our features describe necessary rather than sufficient conditions for cartel behavior, and patterns of interaction alone cannot conclusively prove collusion. Rather, we propose that our method be used to narrow down a large space of possibilities into a shorter list of candidates for investigation. Authorities can then apply classical screens for evidence of illegal cooperation 46,47, for example by observing abnormal stability in prices or market shares 38,44, or by comparing observed behavior against a model of competitive behavior 20,48. The more granular data required for these tests can be collected, at significantly lower cost, once a key subset of firms is identified. Other data-driven studies develop screens for particular exotic auction formats such as average-price auctions or multi-round auctions 49; our approach can also complement these context-specific screens.
Another advantage of our approach is that it does not rely on information from whistleblowers to highlight a candidate group of firms, avoiding a potential source of bias in the cartel literature 50. We also acknowledge that there are other cartel strategies in public contracting markets besides bid-rigging, for instance when firms agree to stay out of each other's markets entirely.
In our model we do not consider the idea that some firms might simply be honest and refuse to form a cartel even in optimal conditions, nor do we consider how fear of prosecution might influence the choice to collude. Though the illegality of collusion adds an additional obstacle to the emergence of cooperation among firms, the empirical observation that cartel life spans are heterogeneous suggests that many firms are willing to collude, but that only certain environments are conducive to cartels 7 .
In spite of these limitations, we note that our method can be applied to other questions about cartels. For example, what does the co-bidding network look like near a cartel when it is born compared to when it dies? The inner workings of potential cartels would surely be reflected in the network structure of the market. Observed cartels in other contexts have operated by methods such as rotating the winner 51 or making side payments to losing firms 20, and some even run internal auctions to optimize their profits 52. Observing the relationship between network structure and the procedure by which contracts are awarded, for example to the trimmed-average bidder in Italian road contracts 28 or by randomly chosen open or sealed bid procedures in timber auctions 53, may also reveal whether firms are competing or colluding. The specifics of a market and the manner in which contracts are awarded matter a great deal to how collusion might evolve 54. Certain rules make it easier for firms to collude or easier to detect collusion 28,53.
We are confident that our approach can be applied to these cases in which we have extra information about the rules of a market. It is likely still the case that certain patterns of interaction are effective markers of collusion and that networks provide a useful map of such interactions.

Methods and Data
Co-bidding networks, group detection, and group features. We define a public contracting market's co-bidding network as a projection of a bipartite network onto the set of firms active in the market. Specifically, we form a bipartite network of contracts and the firms bidding on them, then create a network of firms which bid for the same contracts. We weight the connections by the similarity of co-bidding behavior between firms using Jaccard similarity: firm A and firm B are connected by a link with weight equal to the overlap of the contracts they bid on,
$$w_{AB} = \frac{|c_A \cap c_B|}{|c_A \cup c_B|},$$
where $c_A$ ($c_B$) is the set of contracts of A (B) with at least one other bidder and $|\cdot|$ is the cardinality of a set. Given a co-bidding network, our aim is to extract groups of nodes which may be analyzed for cartel activity. Groups should be communities in the network sense: there should be more interactions within the group than leaving the group. The problem at hand suggests several other criteria for our algorithm. Groups should be small, as cooperation becomes more difficult to sustain with more participants. Firms might be present in more than one part of the market, so we should consider overlapping groups.
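The projection and Jaccard weighting can be sketched as follows, assuming bids arrive as `(contract_id, firm_id)` pairs. Contracts with a single bidder are left out of each firm's contract set $c_A$, as in the definition above; the output maps `frozenset({a, b})` to $w_{AB}$. Names are illustrative, not taken from the authors' code.

```python
from collections import defaultdict
from itertools import combinations

def cobidding_network(bids):
    """bids: iterable of (contract_id, firm_id). Returns {frozenset({a, b}): w_ab}."""
    bidders = defaultdict(set)
    for contract, firm in bids:
        bidders[contract].add(firm)
    # c_A: contracts of firm A that had at least one other bidder
    contracts_of = defaultdict(set)
    for contract, firms in bidders.items():
        if len(firms) > 1:
            for f in firms:
                contracts_of[f].add(contract)
    weights = {}
    for firms in bidders.values():
        for a, b in combinations(sorted(firms), 2):  # firms that co-bid at least once
            key = frozenset((a, b))
            if key not in weights:
                ca, cb = contracts_of[a], contracts_of[b]
                weights[key] = len(ca & cb) / len(ca | cb)  # Jaccard similarity
    return weights

bids = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (2, "C"), (3, "C")]
print(cobidding_network(bids))
# A and B share both of their contracts (weight 1.0); C shares one of two with each
```

Normalizing by the union, rather than counting shared contracts, keeps very active firms from dominating the network simply because they bid often.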
We adapt a bottom-up method for community detection which merges nodes into groups by local optimization of a fitness function, from previous work by Lancichinetti, Fortunato, and Kertész (hence: LFK) 39. We define the fitness $f_G$ of a group of nodes $G$ in a network as:
where $s^{\mathrm{in}}_G$ and $s^{\mathrm{out}}_G$ denote the strength (the sum of weights) of edges within the group and adjacent to the group, respectively, $|G|$ is the size of the group, and $\alpha$ and $\beta$ are free parameters which control the size of the groups found. When $\alpha$ is increased, additional strength is penalized, while $\beta$ penalizes the number of group members independently of their strength. In the paper we set both parameters to 1.5. Increasing $\alpha$ ensures that new nodes added to a group interact primarily within the group, while increasing $\beta$ restricts the size of the groups we detect, in line with the stylized facts about cartels from the economics literature that lasting cartels are small and frequently interacting 7. We report the sizes of the groups found in the empirical cases in the SI.
Given such a fitness function of a group of nodes in a co-bidding network, we can define the fitness of a node $n$ relative to a group $G$ by calculating the difference in fitness of the group with $n$ and without it:
$$f^n_G = f_{G \cup \{n\}} - f_{G \setminus \{n\}}.$$
With this node-level measure of fitness we can define our group detection algorithm. For each node in the network:
• select a node $n$ and initialize a group containing only $n$;
• select the neighbor of the group with the largest fitness and, if its fitness is positive, add it to the group;
• repeat until no nodes adjacent to the group have positive fitness.
In this way we find groups in the network which are overlapping, small (tuned by the parameters), with more weight among themselves than with non-group members. It is possible to save significant computational time by initializing new groups only for nodes that have not been included in a group before. In contrast with the LFK method we do not recalculate the individual fitness of all nodes in the group following the inclusion of a new node. In this sense our adaptation is greedy and not iterative, saving computational time.
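The greedy procedure can be sketched in Python as below. Because the exact fitness formula with the size penalty $\beta$ is not reproduced in this text, the sketch deliberately uses only the original LFK form $f_G = s^{\mathrm{in}}_G / (s^{\mathrm{in}}_G + s^{\mathrm{out}}_G)^{\alpha}$, computed on strengths with internal edges counted once per endpoint; a size penalty would be added inside `fitness`. Seeding only at nodes not yet covered follows the time-saving shortcut described above. All names are hypothetical.

```python
from collections import defaultdict

def strengths(group, weights):
    """Internal strength (internal edges counted once per endpoint) and outgoing strength."""
    s_in = s_out = 0.0
    for pair, w in weights.items():
        inside = len(pair & group)
        if inside == 2:
            s_in += 2 * w
        elif inside == 1:
            s_out += w
    return s_in, s_out

def fitness(group, weights, alpha=1.5):
    """LFK-style fitness; the paper's additional beta size penalty is omitted here."""
    s_in, s_out = strengths(group, weights)
    total = s_in + s_out
    return s_in / total ** alpha if total > 0 else 0.0

def detect_groups(weights, alpha=1.5):
    neighbors = defaultdict(set)
    for pair in weights:
        a, b = tuple(pair)
        neighbors[a].add(b)
        neighbors[b].add(a)
    covered, groups = set(), []
    for seed in sorted(neighbors):
        if seed in covered:
            continue  # shortcut: only seed groups at nodes not yet in any group
        group = {seed}
        while True:
            base = fitness(group, weights, alpha)
            frontier = set().union(*(neighbors[n] for n in group)) - group
            # node fitness: change in group fitness when the node is added
            gains = {n: fitness(group | {n}, weights, alpha) - base for n in frontier}
            if not gains or max(gains.values()) <= 0:
                break
            group.add(max(gains, key=gains.get))
        groups.append(frozenset(group))
        covered |= group
    return groups

# Two triangles of firms joined by a weak bridge C-D.
w = {frozenset(p): 1.0 for p in ["AB", "BC", "AC", "DE", "EF", "DF"]}
w[frozenset("CD")] = 0.1
print(detect_groups(w))
```

On this toy network each triangle is recovered as its own group: adding the bridge firm would bring in more outgoing strength than internal strength, so its node fitness is negative and growth stops.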
Once groups have been extracted from a market's co-bidding network, we define topological features of each group that may suggest that the firms could form a cartel. The first measure is the coherence 40 of a group, $C_G$, the ratio of the geometric and arithmetic means of the edge weights among group members, measuring the balance and overall frequency of interactions among the group members:
$$C_G = \frac{\left(\prod_{(i,j) \in \mathcal{L}_G} w_{ij}\right)^{1/|\mathcal{L}_G|}}{\frac{1}{|\mathcal{L}_G|}\sum_{(i,j) \in \mathcal{L}_G} w_{ij}},$$
where $\mathcal{L}_G$ is the set of links among members of $G$. The second measure is exclusivity, the ratio of strength within the group over the total strength of the group, including edges to non-group members, measuring the group's relative isolation in the broader market:
$$E_G = \frac{s^{\mathrm{in}}_G}{s^{\mathrm{in}}_G + s^{\mathrm{out}}_G}.$$
Null models. In both empirical cases, we created null models of the market to capture the extent to which groups of certain cohesion and exclusivity emerge by chance. For the Ohio school milk data we shuffled the bidders across all contracts, preserving the number of bidders each contract received and the number of contracts each firm bid on. In Georgia we repeated the same procedure with an additional restriction: firm bids were only shuffled among contracts with the same 2-digit Common Procurement Vocabulary (CPV) code 55. CPV codes describe the type of good or service being contracted, from road repair to medicine. By restricting the random shuffling of bids by CPV code, we create a randomized version of the broader market which preserves the tendency of firms providing similar products to interact.
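One standard way to implement such a degree-preserving shuffle is repeated edge swapping in the bipartite contract-firm network, restricted to contracts within the same CPV class; the text does not say which randomization algorithm was used, so this is an assumption. Each accepted swap exchanges the firms of two bids, which keeps every firm's number of bids and every contract's number of bidders unchanged. For the Ohio case, map every contract to a single class. The number of attempted swaps per class is an arbitrary choice here.

```python
import random
from collections import Counter, defaultdict

def shuffle_bids(bids, cpv, n_swaps=10_000, seed=None):
    """Degree-preserving shuffle of (contract, firm) bids within CPV classes.

    bids: list of (contract, firm) pairs; cpv: dict mapping contract -> class code.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for contract, firm in bids:
        by_class[cpv[contract]].append((contract, firm))
    shuffled = []
    for edges in by_class.values():
        present = set(edges)
        for _ in range(n_swaps if len(edges) > 1 else 0):
            i, j = rng.sample(range(len(edges)), 2)
            (c1, f1), (c2, f2) = edges[i], edges[j]
            new1, new2 = (c1, f2), (c2, f1)
            if c1 == c2 or f1 == f2 or new1 in present or new2 in present:
                continue  # swap would be a no-op or create a duplicate bid
            present -= {edges[i], edges[j]}
            present |= {new1, new2}
            edges[i], edges[j] = new1, new2
        shuffled.extend(edges)
    return shuffled

bids = [(1, "A"), (1, "B"), (2, "B"), (2, "C"), (3, "A"), (3, "C"), (4, "D"), (4, "E")]
cpv = {1: "15", 2: "15", 3: "15", 4: "72"}
null = shuffle_bids(bids, cpv, n_swaps=500, seed=1)
print(Counter(f for _, f in null) == Counter(f for _, f in bids))  # bids per firm kept
print(Counter(c for c, _ in null) == Counter(c for c, _ in bids))  # bidders per contract kept
```

Rejecting swaps that would duplicate an existing bid keeps the randomized network simple, so no firm ends up bidding twice on the same contract.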
Agent based model. In this section we describe the specific parameters of our simulated model of a spatially embedded contracting market. Each simulated market was initialized with 50 firms and 75 issuers of contracts (analogous to school districts in the Ohio milk market) placed uniformly at random in the unit square. We then play 2,000 rounds corresponding to contract auctions. In each round a randomly selected issuer releases a contract C placed nearby (at a position drawn from a 2-d normal distribution centered on the issuer with standard deviation 0.3). Firms participate in the competition for the contract if they are within 0.1 distance of the contract (if no firms are close enough, the distance for inclusion is extended by 0.1 repeatedly until at least one firm participates). The set of firms participating, F, is known to all firms.
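The market set-up and the rule for selecting participants can be sketched as below, using the parameters stated above (50 firms, 75 issuers, standard deviation 0.3, radius 0.1 extended in steps of 0.1). The text does not say whether contract positions falling outside the unit square are clipped; this sketch leaves them unclipped. Function names are hypothetical.

```python
import math
import random

def make_market(n_firms=50, n_issuers=75, seed=None):
    """Place firms and contract issuers uniformly at random in the unit square."""
    rng = random.Random(seed)
    firms = [(rng.random(), rng.random()) for _ in range(n_firms)]
    issuers = [(rng.random(), rng.random()) for _ in range(n_issuers)]
    return rng, firms, issuers

def draw_contract(rng, issuers, sd=0.3):
    """A randomly chosen issuer releases a contract at a 2-d normal offset."""
    x, y = rng.choice(issuers)
    return rng.gauss(x, sd), rng.gauss(y, sd)

def participants(contract, firms, radius=0.1, step=0.1):
    """Indices of firms within `radius`; widen by `step` until someone qualifies."""
    r = radius
    while True:
        close = [i for i, pos in enumerate(firms) if math.dist(pos, contract) <= r]
        if close:
            return close
        r += step

rng, firms, issuers = make_market(seed=42)
contract = draw_contract(rng, issuers)
print(participants(contract, firms))
```

Because contracts cluster around issuers and firms only bid nearby, the same subsets of firms repeatedly meet, which is the source of the spatial correlations in the co-bidding network.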
Each firm must then decide to collude or compete. Collusion is successful if all firms collude. Each firm $f$ considers two pieces of information about the other firms in its decision-making process: its memory of previous interactions with each other firm, and the relative frequency with which it meets the others. It recalls the decision made by each other firm the previous time they met (initialized randomly) and calculates the share of previous-round cooperators:
$$f_{\mathrm{prev}} = \frac{1}{|F| - 1}\sum_{f' \in F \setminus \{f\}} \delta^{\mathrm{prev}}_{f'},$$
where $\delta^{\mathrm{prev}}_{f'}$ equals 1 if $f'$ cooperated the last time it encountered $f$ and 0 otherwise. This is the proportional (as opposed to the absolute) generalization of the tit-for-tat strategy to multi-agent games 12. Next, $f$ considers how often, in the last $k$ contracts it participated in, the current other firms were a subset of the participating firms:
$$f_{\mathrm{frequency}} = \frac{1}{k}\sum_{i=1}^{k} \mathbb{1}\left[F \setminus \{f\} \subseteq F^f_i\right],$$
where $F^f_i$ denotes the firms participating in the $i$'th previous contract of firm $f$. $f_{\mathrm{frequency}}$ increases as $f$ tends to meet the same firms. If the other firms were present at least two-thirds of the time, the firm considers the other firms it meets as familiar. We set $k$ to 10 in the paper.
The focal firm's decision to collude or compete depends on the product of these two factors: it colludes if $f_{\mathrm{prev}} \cdot f_{\mathrm{frequency}}$ exceeds a cooperation threshold. Finally, we add noise to the system by allowing a 0.1% chance that a firm spontaneously colludes. In our model agents do not learn or track the outcome of their actions: they only react to their most recent memory of other firms and the frequency with which they meet them. After 2,000 contracts are awarded, we end the simulation and discard the outcomes of the first 1,000 contracts as burn-in.
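The decision rule of a single firm can be sketched as follows. It computes the share of the other firms that cooperated at their last meeting with the focal firm ($f_{\mathrm{prev}}$) and the fraction of the focal firm's last $k$ contracts in which all current other firms were present ($f_{\mathrm{frequency}}$, taken over however many such contracts exist so far), then adds the 0.1% noise. The cooperation threshold is a free, hypothetical parameter, since neither its value nor its relation to the two-thirds familiarity criterion is specified in this section; the data layout for memory and history is also an assumption.

```python
import random

def share_prev_cooperators(others, last_cooperated):
    """f_prev: share of the other firms that cooperated when last met.

    last_cooperated: dict other_firm -> bool, the focal firm's memory.
    """
    return sum(last_cooperated[o] for o in others) / len(others) if others else 0.0

def meeting_frequency(others, history, k=10):
    """f_frequency: fraction of the last k contracts whose bidders included all `others`."""
    recent = history[-k:]
    if not recent:
        return 0.0
    return sum(set(others) <= set(past) for past in recent) / len(recent)

def decides_to_collude(others, last_cooperated, history, threshold, rng,
                       k=10, noise=0.001):
    """Collude if f_prev * f_frequency exceeds the threshold, or spontaneously (noise)."""
    score = (share_prev_cooperators(others, last_cooperated)
             * meeting_frequency(others, history, k))
    return score > threshold or rng.random() < noise

memory = {"B": True, "C": False}
history = [{"A", "B", "C"}] * 3 + [{"A", "B"}]
rng = random.Random(0)
# f_prev = 0.5 and f_frequency = 0.75, so the score 0.375 exceeds 0.3
print(decides_to_collude(["B", "C"], memory, history, threshold=0.3, rng=rng))
```

Multiplying the two factors means that neither reciprocity nor familiarity alone is enough: a firm that has never met its current rivals will not collude however cooperative they were, apart from the small noise term.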
Datasets. The Ohio school milk data was generously provided by Porter and Zona 37 . The data consists of a significant share of all school-milk procurement contracts from 1980s Ohio provided to Porter and Zona by the State of Ohio. Porter and Zona served as expert witnesses in a trial against the suspect cartel. There are several other significant examples of cartels in public school milk markets in the US during the 1980s, for example in Florida and Texas 20,56 .
We collected the Georgian contracts dataset from the centralized procurement portal of the State Procurement Agency (SPA) of Georgia (https://tenders.procurement.gov.ge/), including all contracts awarded through the portal between 2011 and 2016. Contracts are awarded to the lowest bidder in a sealed-bid auction. Each contract includes a product category (CPV code 55), which we use for the null model, and a reserve price, the maximum price that the public buyer would pay for the good or service, which we use to normalize prices. The procurement portal also reports bid protests: these are legal disputes brought by participants in the procurement process against the agency issuing a contract. For example, a firm may protest that it was unfairly excluded from the competition for a contract.