Problem statement

Data is a resource that has asset value (Fisher, 2009) and can be economically capitalized (Mayer-Schönberger and Ramge, 2018). Meanwhile, data protection and privacy regulations are rapidly evolving across the globe (European Union, 2016; China, 2017; Monteiro, 2018). Data sharing agreements are increasingly complex yet often insufficient, which has in turn left data producers unable to capitalize on the full value of their data resources (OECD Publishing, 2011). Further, trade in online data-oriented services is facing obstacles with increasing frequency due to political meddling via antitrust probes or antiterrorism measures like the US Patriot Act. Political interventions tend to undermine individual self-determination and autonomy as they disrupt the natural progression of a global data economy. We propose defining the term “data sovereign” [noun] as a person or entity with the ability to possess and protect the data. Here, the word “sovereign” is borrowed from the fundamental economic notion of “consumer sovereignty (Treasury Board of Canada Secretariat, 2006)” that William H. Hutt coined to denote “… the market economy, the production decisions of entrepreneurs are rigidly governed by the freely expressed spending decisions of consumers (Salerno, 2009).” The term data sovereign fills an existing terminology gap as it, for example, enables rights to data portability, is conducive to the free flow of cross-border data, and assists in the economic agglomeration of cyberspace.

Data ownership is initially assigned by default to the person providing the data (e.g., a social media user) to a particular entity (e.g., organization, State, country) (Fig. 1). As Zuboff observed in her Surveillance Capitalism (Zuboff, 2019): “The idea of ‘data ownership’ is often championed as a solution. But what is the point of owning data that should not exist in the first place? It’s like negotiating how many hours a day a seven-year-old should be allowed to work, rather than contesting the fundamental legitimacy of child labor.” In contrast, a data producer is any agent who has the ability to add value to raw data or other primary data products, rather than the individual providing their personal data. Existing terms that describe the global flow of data, such as data residency and data localization, were developed for cloud infrastructure technologies. Data residency refers to the geographic locale in which the organization physically stores its data. Data localization differs from data residency in that data is additionally subject to the laws of the country in which it is physically stored. In lay use, data residency and data localization are used so interchangeably that their individual meanings have become indistinguishable.

Fig. 1. Lexicon of terms relevant to data disputes.

Proposed concept

In contrast to the above terms, we propose that the data sovereign is initially appointed as the entity possessing the data (e.g., the social media platform, government entity). In our formulation, data sovereign status is achieved when one both possesses the data and can defend against any attack on that data. Examples of an attack on the data might be a data breach scandal (Snider and Baig, 2019) or an infiltration by a State actor. Weber defines “power” as the ability to possess any resource (Trans. Waters, Waters, et al., 2010). Following Weber’s formulations, data as an economic resource should be preserved through “the sole grantor of the right to physical force (Waters, 2015)” with the legitimacy of domination. Such a willingness of enterprises to convey power may effectively align incentives for data innovations towards enhancing data resource protection capabilities. Using “force” to protect data does not imply an abandonment of data sharing. Rather, it should be easy for an organization (e.g., corporations, industry bodies, government entities) to enable sharing of data and data products internally or with trusted partners. From an operational standpoint, a data force belongs to a specific data sovereign or an alliance of data sovereigns and functions like a guild of mercenaries who conduct data activities. The core function of the data force is the defense and exploitation of the data sovereign’s data and data products. In essence, the data force aims to enhance data and data products by repairing or mitigating vulnerabilities through the use of data innovations. Wargaming, the simulation of different confrontations, is a strategy used by data forces to identify and patch vulnerabilities, and may include simulation of attacks on other data sovereigns as pre-emptive strikes.

The results of a COVID-19 screening test are owned by the patient, who often takes possession of them through a patient portal into the electronic health record system or via a phone call from the healthcare facility. These data are of use to public health officials for contact tracing of positive cases to identify those with whom the patient came in contact while contagious. The lab is (likely) mandated by regulations to report positive cases to the local public health office, which then becomes the data sovereign, as it is the one compiling statistics from multiple labs in the locale and acting on the data. Data sharing agreements with the Centers for Disease Control may exist with some State and local governments, for example, where their assistance has been requested. Healthcare organizations and public health officials may work together to create models that simulate the spread of the pandemic, which is useful for resource planning and health policy purposes.

Similarly, the fast-growing global online food delivery services market, of which the most significant segment is Restaurant-to-Consumer Delivery, tracks the food preferences and habits of consumers to improve service, expand the customer base through targeted advertising, and optimize menus to fit the food preferences of the community. Delivery orders, owned by the consumer, are placed through a mobile app (e.g., Uber Eats). These data can later be used to shorten the distance between farm and table, which is particularly helpful to dairy farmers (Waters et al., 2010) in the COVID-19 era. The person must provide a mobile phone number and geolocation to the app when buying food for delivery, resulting in the appointment of the data producer (i.e., the organization) controlling the app as a data sovereign over the shared data. Local, State, and/or Federal regulations govern how personal information must be protected and how/when it can be used by the organization. For example, some data may require data localization to prevent the data from being stored or transferred internationally, and the ability of law enforcement to utilize geolocation tracking of cell phones varies by locale in terms of privacy protections and warrant requirements.

There are legitimate reasons to keep personal data private and to not allow it to be part of the global data economy. A malicious actor such as an identity thief cannot be a data sovereign, as illegitimate make-money-fast schemes require the divestment of collected data or operationalization of the collected data in an attack on legitimate data sovereigns. Similarly, data that is of national security interest has a State or Federal agency as its data sovereign. These data sovereigns must defend this sensitive data from data piracy and data terrorism threats.

Social media networks and adult websites are often criticized for profiting from revenge porn, deep fakes (i.e., AI-generated videos that realistically place new faces onto existing faces in the videos), or content published without the consent or permission of the person depicted or the copyright holder. While these organizations do remove videos with content that violates their terms of service when requested or to comply with a court order, it becomes a game of whack-a-mole unless data tracking technology is put in place at content upload to automatically delete content previously flagged or removed. YouTube’s Content ID (Shinn, 2015) is an example of such data tracking technology, as it creates a digital fingerprint to allow copyright holders to identify and manage their copyrighted content by blocking new videos, monetizing those videos, or tracking the video’s viewership statistics. An alternate strategy to fend off data attacks would be to apply Snapchat’s idea of making data vanish at the source after transference.
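
As a rough illustration of the upload-time screening described above, the following Python sketch rejects uploads whose fingerprint matches content that was previously flagged or removed. It is a simplification under stated assumptions: it uses an exact SHA-256 digest, which only catches byte-identical re-uploads, whereas a production system such as Content ID would need fingerprints that survive re-encoding or cropping. The names `fingerprint`, `UploadScreen`, `flag`, and `accept` are hypothetical and do not correspond to any real platform's API.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in fingerprint."""
    return hashlib.sha256(content).hexdigest()


class UploadScreen:
    """Hypothetical upload filter that remembers flagged content."""

    def __init__(self) -> None:
        # Fingerprints of content that was flagged or removed earlier.
        self._flagged: set[str] = set()

    def flag(self, content: bytes) -> None:
        """Record content that was removed or reported."""
        self._flagged.add(fingerprint(content))

    def accept(self, content: bytes) -> bool:
        """Return True if the upload does not match any flagged content."""
        return fingerprint(content) not in self._flagged


if __name__ == "__main__":
    screen = UploadScreen()
    screen.flag(b"previously removed video bytes")
    # A byte-identical re-upload is rejected; unrelated content passes.
    print(screen.accept(b"previously removed video bytes"))
    print(screen.accept(b"unrelated video bytes"))
```

Storing only digests means the screening service need not retain the removed content itself, which is in keeping with the goal of protecting rather than redistributing the data.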

Economic benefits

Finding a dedicated term might help resolve a long-standing debate over various data disputes (Fig. 1), such as where and how to set data resource boundaries in cyberspace (Kalir and Maxwell, 2002). Our goal in coining “data sovereign” is to eliminate some misunderstandings about data resources rightfully belonging to data producers rather than to the person providing the data or the State. Regulations in certain countries assign data ownership to the State, which has the potential to significantly disrupt global “data” economic paths to prosperity.

For example, Russia’s On Personal Data Law (OPD-Law) (Bogaerts and Segers, 2018) requires the storage, update, and retrieval of data on its citizens to be limited to data center resources within the Russian Federation. While laws like this do tend to enhance citizens’ privacy from other nation States, they tend to be motivated by national protectionism and contribute to the creation of border-based data silos that obstruct businesses and governments from realizing the full potential of global cross-border data flows. In negotiating a trade agreement, the United States would prefer that (The United States Trade Representative, 2019) the European Union “does not impose measures (e.g., customs duties) that restrict cross-border data flows” and “does not require the use or installation of local computing facilities.” The United States takes a different tack with China, blocking both TikTok and a US$1.2 billion deal between MoneyGram and Ant Financial (an affiliate of Alibaba Group) (StraitsTimes.Com, 2018) in order to keep personal information and identifying data out of the hands of a foreign entity that it believes shares information freely with a foreign government. Our definition stands contrary to digital factionalism and the “splinternet (Box, 2019)” and may thus cause controversy in both academic and industrial circles. We recommend using the power of decentralized markets to protect data via non-legislative acts, namely production and protection from attack.

When it comes to data, “accessing data trumps owning it (Ogilvy, 2017)”; thus, data sharing and portability rights are often the main commodity sold or traded to external entities. Like the transfer of State sovereignty in international relations (Taylor, 1997), utilizing the concept of a data sovereign enables the transfer of rights for data sharing and/or portability purposes. The data sovereign can transfer partial rights to governments or other industrial competitors through a modest negotiation. Such a transfer is proactive and strategic when developed by anticipating or analyzing trends and reviewing the organization’s past performance concerning externally induced crises or threats. When it comes to data resources, it is often too late to execute a reactive transfer when opportunities arise, as these defensive transfers often stem from or result in a considerable loss of data asset valuation.

The Internet Archive was founded by Brewster Kahle to preserve large quantities of the World Wide Web. A recent attack on the Internet Archive was conducted by the US National Writers Union and several digital publishing companies (Hasbrouck, 2020). They listed five distribution channels that indicate how authors (and publishers) were harmed by the presence of an archive, including: (1) downloads via OpenLibrary.org of e-books assembled from page images, (2) audiobooks generated from images of scanned pages, (3) viewing of page images on OpenLibrary.org, (4) viewing of page images on Archive.org, and (5) APIs for automated downloads of page images. Kahle responded to their attack on the Internet Archive by noting that “many of these books are no longer available for sale in the original book form.” Books that are no longer available for purchase in their original format don’t collect royalties, and thus publishers and authors lack a valid economic argument for harm from the Internet Archive collating this material.

A data sovereign that is inherently concerned with the movement of data may be amenable to the free flow of data across geopolitical borders. We often take for granted that we live in an interconnected world. With the adoption of “data sovereign,” boundaries in cyberspace are appropriately defined. For example, Europe’s “right to be forgotten,” which gives European Union citizens the power to demand that data about them be deleted or restricted, applies only within the European Union’s borders, as only this territory is subject to European regulations (unless otherwise negotiated in a trade agreement) (Court of Justice of the European Union, 2019).

The concept of data sovereign enables economic agglomeration in cyberspace. Utilizing data as an economic resource usually carries the threat of new entrants and substitute products. When a data sovereign cannot effectively protect its own data resources, it may be prudent to join an alliance of data sovereigns for added protection. This concept of a “data sovereign” may help form data industry clusters in cyberspace that can grow the global data economy through their collaborations. Collective proactive defense and transfer reflect the data sovereign’s willingness to use the power of decentralized markets to protect data via non-legislative acts. For example, the co-location of computer servers from multiple organizations in a secure data center provides improved security and economic benefits in terms of rent, personnel, and more efficient electricity consumption through shared large air conditioning systems. Note that such an agglomeration no longer aims to optimize the internal flow of data but seeks to form a strategic defense in the form of physical and virtual fortifications built by the data sovereign conglomerate.

Conclusion

The history of resource allocation has always been deeply political, and data is now viewed by governments as a valuable resource. An inequity in the distribution of profits from data continues to exist, partially due to government constraints or meddling, which potentially threatens the global data economy and the values of social justice on which it is based. One should be wary of any political determinism or interventions regarding inequalities in the exploitation of data resources. Instead of antitrust breakups or mandated takeovers by a non-foreign entity (e.g., Oracle’s potential acquisition of TikTok), designating data sovereigns may solve the root problem by relying on the rule of “survival of the fittest,” which will likely produce smaller tech firms and open the field to more competitors. Approaches to deriving profits from data remain unclear if data producers cannot be confirmed as data sovereigns. Being designated as a data sovereign is, to some extent, an incentive for data producers (e.g., organizations, corporations, industry bodies, or government entities) to engage in appropriate management of data as a resource. In sum, “data sovereign” is useful from both a data science perspective and an economic perspective. The definition and meaning of “data sovereign” are apposite and may help to uniquely recognize the sovereign affiliation of a data resource and its possible future capitalization.