YSTAFDB, a unified database of material stocks and flows for sustainability science

Myers, Rupert J.; Reck, Barbara K.; Graedel, T. E.

doi:10.1038/s41597-019-0085-7

Data Descriptor
Open access
Published: 07 June 2019

Scientific Data volume 6, Article number: 84 (2019)

Abstract

We present the Yale Stocks and Flows Database (YSTAFDB), which comprises most of the material stocks and flows (STAF) data generated at the Center for Industrial Ecology at Yale University since the early 2000s. These data describe material cycles, criticality, and recycling in terms of 62 elements and various engineering materials, e.g., steel, on spatial scales and timeframes ranging from cities to global and the 1800s to ca. 2013. YSTAFDB integrates this diverse collection of STAF data, previously scattered across various non-uniformly formatted electronic files, into a single data structure and file format. Here, we discuss this data structure as well as the usage and formatting of data records in YSTAFDB. YSTAFDB contains 100,000+ data records that are all situated in their systems contexts, with additional metadata included as available. YSTAFDB offers a comprehensive basis upon which STAF data can be accumulated, integrated, and exchanged, and thereby improves their accessibility. Therefore, YSTAFDB facilitates deeper understanding of sustainable materials use and management, which are key goals of contemporary sustainability science.

Design Type(s)	data integration objective • source-based data analysis objective
Measurement Type(s)	reference material
Technology Type(s)	digital curation
Factor Type(s)	Material • temporal_interval • geographic location
Sample Characteristic(s)	Earth (Planet) • anthropogenic habitat

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Background & Summary

Sustainability science studies are becoming increasingly data intensive. Concurrently, the need for these studies is growing amid heightened concerns for issues such as material scarcity, climate change, waste reduction, and equitable economic growth and development. A sustainability science study relies upon material stocks and flows (STAF) data to describe how materials and related properties such as mass, energy, and money are used in its system of interest. Such systems may be anthropogenic or natural, and may describe, e.g., the supply chain of a company, a food web comprising endangered marine species, or the environmental emissions of one or more energy generation technologies. STAF data may describe full or partial (life) cycle(s) of one or more reference material(s) in these systems, e.g., iron (Fe) in a study involving static material flow analysis (MFA)¹; transport-related goods and elements in a study involving dynamic MFA²; and battery products and components in a study involving life cycle assessment (LCA)³. Analyses of material efficiency⁴, criticality (i.e., the risk of material unavailability)⁵, and recycling⁶ may use these data directly to describe material systems; alternatively, STAF data may be applied to characterize impacts and assess environmental damage (or benefit) of product systems⁷,⁸.

Sustainability science studies are constrained by the limited availability of STAF data and by the difficulty of (re)using them. This limited availability of usable STAF data is compounded by multiple factors, including: (1) the high and increasing interconnectedness and complexity of anthropogenic and natural systems; (2) the relatively recent development and usage of computational approaches to sustainability science⁹; and (3) the ongoing establishment of infrastructure to support these approaches. Current efforts to develop research practices that enable higher data accessibility, transparency, and efficient re-use of data¹⁰,¹¹ may eventually alleviate some of the current challenges. Presently, STAF data are obtained from diverse (public and confidential) sources, structured in different formats, described with different terminology, and produced using different methodologies. Consequently, results from apparently similar sustainability science studies may vary significantly. These issues make it challenging to unify and build upon STAF data, and also to verify the reliability of the studies that use them. Therefore, a comprehensive and openly accessible STAF database would be a highly desirable resource for sustainability science. It would facilitate re-use of STAF data and lead to more reliable and higher quality sustainability science analyses and assessments.

This paper presents the Yale STAF Database (YSTAFDB), which contains most of the STAF data associated with studies of material cycles, recycling, and criticality conducted by Graedel and colleagues at the Center for Industrial Ecology at Yale University since the early 2000s. These 100,000+ data records were previously reported in various formats across 60 publications (e.g.⁵,¹²,¹³,¹⁴,¹⁵). YSTAFDB is unique in its diversity of STAF data, which cover ~75% of the periodic table of elements excluding those that are non-primordial and radioactive (e.g., polonium [Po]), various engineering materials, spatial scales ranging from local to global, and timescales from the early 1800s to ca. 2013. The data are recorded in a consistent manner within a material cycle ‘systems context’¹⁶. Therefore, YSTAFDB presents a step toward overcoming the limited accessibility that has resulted from the incomplete availability of the STAF data, by integrating them into a single data structure and database format.

It is useful to indicate the broader context of the material cycles, recycling, and criticality studies from which data records in YSTAFDB originate. These data result from an approximately two-decade-long ‘Stocks and Flows (STAF) Project’ that sought to quantitatively describe anthropogenic material cycles. However, the STAF Project was conducted within a wider industrial ecology research community effort to understand material systems: this community notably applied MFA as a basis for analysis of environmental and policy issues. Other exemplars of this community effort include: work done by Baccini, Brunner, and colleagues, who were key players in defining MFA methodology in a systematic way and applying it to understand the metabolism of anthropogenic systems such as cities and local regions¹⁷,¹⁸; and also coordinated studies conducted at institute (e.g., Wuppertal Institute) to international (e.g., Organisation for Economic Cooperation and Development) levels to improve material efficiency, particularly at the national economy scale. For example, the Japanese Government initiative to develop a ‘sound material-cycle society’¹⁹ developed and applied MFA indicators to measure and drive its resource productivity agenda²⁰,²¹. This history indicates that the STAF Project and YSTAFDB comprise one key part of a landscape of MFA studies and STAF data.

The comprehensive and consistent nature of data records in YSTAFDB facilitates its use as a key STAF data resource. For example, YSTAFDB may be used to accumulate, structure, and enhance STAF data in the future to facilitate sustainability analyses and assessments, and thus to help identify and approach sustainable development. These additional STAF data may be sourced from historical work such as that described above, as well as (more) contemporary and future studies. We are working toward this goal through this initial release of YSTAFDB, which we provide and discuss as a set of comma separated value (csv) files. The following sections of this paper describe the methods used to create YSTAFDB, its properties and the data records in the csv files as released, and its usage.

Methods

YSTAFDB contains data from 60 publications that are broadly grouped into three categories: (1) material cycles; (2) criticality; and (3) recycling. Brief descriptions of the methods used to produce these data are provided here; complete descriptions of these methods are available elsewhere¹⁶,²². Our approach is to record the material cycles data within their systems contexts, utilizing the Unified Materials Information System (UMIS)²³ as a data structure, also described here.

Material cycles

The first step to produce a material cycle (Fig. 1) is to define its goal and scope. This may be, e.g., to quantify how much and in what form Cu is used across all major anthropogenic activities in 2018. A system of interest, i.e., a system that corresponds to the goal and scope, is defined by a system boundary that comprises a reference material, a reference timeframe, and a reference space. For the example described here, the system boundary may be represented by Cu (reference material), the year 2018 (reference timeframe), and a geographic entity such as North America (reference space). Reference materials may be elements (e.g., copper [Cu]), engineering materials (e.g., brass), specific products (e.g., a Ford Focus), product groups (e.g., cars), etc.

The system is then populated with processes, which are linked together by flows in sufficient detail to satisfy the project goal and scope. Processes perform one or more of the following functions: (1) transformation, to transform material from one type to another; (2) distribution, to distribute material from one location to another; and/or (3) storage, to withdraw material from or deposit material into a stock. Distributive processes may be used as modelling constructs or conceptual tools in material cycles to simplify the underlying calculations and data visualisations. They are sometimes depicted as ‘market processes’ (e.g.¹,²⁴). Processes and flows are often specified to cumulatively represent a (life) cycle of a reference material of interest, e.g., the global socioeconomic metabolism with a reference material of ‘all materials’²⁵. In the Cu example used in the previous paragraph, processes and flows would be specified to describe the anthropogenic Cu cycle, including the production of engineering materials (e.g., Cu metal), fabrication and manufacturing (e.g., of Cu wire), use (e.g., in buildings), waste management and recycling (e.g., of old wire scrap), and the relationships that these processes have with the environment (e.g., mining of chalcopyrite ore) and Cu stocks (e.g., unmined chalcopyrite ore). Quantitative data are then collected from various sources to describe as many processes, stocks, and flows as possible. The processes, stocks, and flows without quantitative data are termed ‘data gaps’.

Data gaps can be reduced or filled by applying mass and energy conservation, assumptions, or estimations to the reference material cycle. This is typically done in one of two ways:

(1) By re-assessing and re-specifying the processes and flows initially used to define and populate the system of interest, and obtaining additional quantitative data for these re-specified processes and flows¹⁶. This procedure may be iterated many times before the data gaps are sufficiently reduced and the system is described to the level of detail desired by the analyst, i.e., until it fulfills the project goal and scope. Alternatively, the project goal and scope may be redefined to accommodate the prevailing availability of STAF data. This method leads to material cycles specified in terms of user-defined processes and flows, and thus often requires reported data to be reinterpreted by the analyst.
(2) By estimating data gaps, including the uncertainties of these estimates, without re-specification of processes and flows²⁶. This method may lead to material cycle models with relatively poorer initial accuracy, although ‘incremental’ refinement of data used to fill data gaps through use of additional data sources would eventually lead to more reliable, accurate, and transparent models than those produced using the former method. The recently developed incremental method²⁶,²⁷ facilitates closer comparison among material cycle models and to data reported by data providers, e.g.²⁸.

YSTAFDB contains STAF data for material cycles that were produced using the first method only. However, STAF data generated using both methods can be stored in YSTAFDB – they are distinguished here only to clarify how material cycles may be produced.
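
The mass conservation step mentioned above can be illustrated with a minimal Python sketch. It assumes a single process for which all inflows, outflows, and the net stock change are known except for one flow; the function name, the flow names, and the numbers are hypothetical and are not taken from YSTAFDB or from the methods cited here.

```python
def fill_gap_by_mass_balance(inflows, outflows, stock_change):
    """Fill the single unknown flow (marked None) of one process.

    Mass conservation: sum(inflows) = sum(outflows) + stock_change.
    Returns new (inflows, outflows) dicts; the inputs are not modified.
    """
    missing = [(side, name)
               for side, flows in (("in", inflows), ("out", outflows))
               for name, value in flows.items() if value is None]
    if len(missing) != 1:
        raise ValueError("exactly one flow must be unknown")
    side, name = missing[0]
    known_in = sum(v for v in inflows.values() if v is not None)
    known_out = sum(v for v in outflows.values() if v is not None)
    filled_in, filled_out = dict(inflows), dict(outflows)
    if side == "in":
        filled_in[name] = known_out + stock_change - known_in
    else:
        filled_out[name] = known_in - stock_change - known_out
    return filled_in, filled_out


# Hypothetical fabrication process (values in Gg Cu per year).
inflows = {"refined_cu": 100.0, "new_scrap": 20.0}
outflows = {"cu_wire": 90.0, "fabrication_scrap": None}  # data gap
filled_in, filled_out = fill_gap_by_mass_balance(inflows, outflows, stock_change=5.0)
print(filled_out["fabrication_scrap"])
```

A balance of this kind can only close one gap per process; where several flows are unknown, additional data, assumptions, or estimates are needed, as described in the two approaches above.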

Unified Materials Information System (UMIS)

Material cycles data were structured using the Unified Materials Information System (UMIS)²³ in YSTAFDB. UMIS is a data structure that can be used to integrate STAF data from various sources into a consistently formatted, flexible, and generalizable system context without loss of information. UMIS does this by labeling STAF data with their positions in their respective systems. These labels uniquely index subsystems, their constituent processes and flows, and also stocks and metadata associated with these processes and flows. In a tree-type hierarchy of processes arranged by specificity, we term the parent a subsystem and its child a process. In doing so, we adopt common informatics terminology (‘parent’, ‘child’, ‘tree’, ‘tables’, ‘mapping tables’, etc.) that is relevant to describing the same types of data systems (e.g., databases) in sustainability science.

Process labels take the form a.b.c.d.e, where a is the reference material, b is the aggregate subsystem module abbreviation (representing the material (life) cycle stage), c is the subsystem code, d is the type of process (transformative (T) or distributive (D)), e is a process code that is unique to each process in each subsystem for reference material a, and dots (.) demarcate these five components of process labels. An example process label is 58.USE.3.T.1;1. Flow labels take the form origin_destination, where origin and destination refer to initial and terminal processes for the flow. An example flow label is 1.ENV.5;1.D.12;12_1.PEM.1.T.1;1. These UMIS structured data, within their material cycle and systems context, may be visualized completely in UMIS ‘elicitation’ diagrams (Fig. 2).
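
The label formats described above can be split into their components with a few lines of Python. This is a minimal sketch, not part of YSTAFDB: the class and function names are hypothetical, and it assumes that no label component contains a dot or an underscore. Codes that contain a semicolon (e.g., ‘5;1’) are kept as opaque strings because their internal structure is not described here.

```python
from typing import NamedTuple, Tuple


class ProcessLabel(NamedTuple):
    reference_material: str  # a
    module: str              # b: aggregate subsystem module abbreviation
    subsystem_code: str      # c
    process_type: str        # d: "T" (transformative) or "D" (distributive)
    process_code: str        # e


def parse_process_label(label: str) -> ProcessLabel:
    """Split a process label of the form a.b.c.d.e into its five components."""
    parts = label.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated components: {label!r}")
    if parts[3] not in ("T", "D"):
        raise ValueError(f"process type must be 'T' or 'D': {label!r}")
    return ProcessLabel(*parts)


def parse_flow_label(label: str) -> Tuple[ProcessLabel, ProcessLabel]:
    """Split a flow label of the form origin_destination into two process labels."""
    origin, separator, destination = label.partition("_")
    if not separator:
        raise ValueError(f"expected 'origin_destination': {label!r}")
    return parse_process_label(origin), parse_process_label(destination)


# The two example labels given in the text.
process = parse_process_label("58.USE.3.T.1;1")
origin, destination = parse_flow_label("1.ENV.5;1.D.12;12_1.PEM.1.T.1;1")
print(process.module, origin.process_type, destination.module)
```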

Figure 2a,d each present two generic material cycles in conventional block flow diagram format (four in total), two with flows omitted (a), and the same two diagrams but with flows included (d). Two generic material cycles are presented in each figure panel (a) and (d) to illustrate how ‘divergent’ disaggregation is conceptually treated in UMIS²³. The reader can observe that the anthropogenic processes represented in the block flow diagrams within a single figure panel, (a) or (d), are disaggregated differently: in one, a generic ‘anthropogenic’ process is disaggregated into ‘production & use’ and ‘recycling & disposal’ processes; in the other, a generic ‘anthropogenic’ process is disaggregated into ‘production & alloying’ and ‘use & waste management’ processes. We illustrate how UMIS reconciles this case of divergent disaggregation in Fig. 2b,c,e,f, where we equivalently represent the data in these two block flow diagrams into one tree-type (b-c) UMIS diagram and one matrix-type UMIS diagram (e-f). We provide these alternative diagrams to conceptually show how data can be comprehensively and consistently structured in UMIS at the whole system level, in a generic yet unified manner. They are complementary to traditional STAF data visualizations such as block flow and Sankey diagrams.

We purposely distinguish processes and flows data, stocks data, and metadata in order to conceptualize them as three distinct layers in UMIS that together completely describe a material system (e.g., a material cycle). These layers are termed the ‘processes and flows layer’, the ‘virtual reservoir’, and the ‘metadata layer’ in UMIS (see Fig. 4b in ref. 23; Fig. 2 here shows the ‘processes and flows layer’ only). We distinguish the processes and flows layer, which has a direct physical meaning analogous to an input-output table, from the latter two layers that have indirect physical meanings such as uncertainty determinations²³. This distinction is useful to enhance the comparability of the processes and flows layer in UMIS to the flow-based input-output tables and process matrices that are used in input-output analysis and life cycle assessment.

Tree-type UMIS diagrams are useful for visualizing STAF data within databases such as YSTAFDB. These diagrams depict ‘trees’ (Figs 2b,e and 3) that may cumulatively represent the entire material cycle (and thus potentially also the whole system). In a previous contribution²³ we termed these trees ‘material trees’; however, we henceforth refer to these trees as ‘process trees’ because this term is better aligned with their purpose, which is to represent tree-type process hierarchies such as those shown in Fig. 3. Branches in these process trees represent non-overlapping parts of a material cycle, which are often represented in terms of (life cycle) stages (e.g., fabrication and manufacturing). Infinite disaggregation of data is possible in UMIS by specifying child, grandchild, etc., nodes (processes and/or subsystems) in each process tree. Disaggregation is termed ‘consistent’ if each disaggregated node is more specific than its parent node. However, nodes can be disaggregated by specificity in more than one way: e.g., a cars process may be disaggregated into red cars and blue cars processes, or alternatively, that same cars process may be disaggregated into big cars and small cars processes. Here, red/blue and big/small cars occur on the same disaggregation level in the process tree and are not additive. This type of disaggregation is termed ‘divergent’. UMIS uniquely labels consistent and divergent disaggregation such that double counting of data can be avoided in computational modeling of UMIS structured STAF data. Figure 2b,e show tree-type UMIS diagrams for a material system containing divergent disaggregation of data in subsystem ANT.1 (in aggregate subsystem module ANT). Figure 3 shows consistent disaggregation.
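
The double counting risk of divergent disaggregation can be made concrete with the cars example from the paragraph above. The following Python sketch is only an illustration under assumed numbers; the dictionary layout and function names are hypothetical and do not reproduce the UMIS labeling scheme.

```python
# One parent process ("cars") disaggregated in two divergent ways.
# Each scheme on its own adds up to the parent; the schemes are not additive.
cars = {
    "by_colour": {"red cars": 40.0, "blue cars": 60.0},
    "by_size": {"big cars": 30.0, "small cars": 70.0},
}


def parent_total(disaggregations, scheme):
    """Sum the children of exactly one disaggregation scheme."""
    return sum(disaggregations[scheme].values())


def naive_total(disaggregations):
    """Sum every child of every scheme, which double counts the parent."""
    return sum(value
               for children in disaggregations.values()
               for value in children.values())


print(parent_total(cars, "by_colour"))  # parent total from one scheme
print(parent_total(cars, "by_size"))    # same parent total from the other scheme
print(naive_total(cars))                # twice the parent total
```

Keeping track of which scheme each child belongs to is what allows a model to aggregate one scheme at a time rather than all children at once.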

The tree-type UMIS diagram in Fig. 3 is simplified by showing labeled subsystems, several flows, and omitting processes. However, all processes and flows can be shown explicitly in tree-type UMIS elicitation diagrams (Fig. 2c,f). Tree-type UMIS elicitation diagrams that include all labeled processes are thus able to concisely describe and visualize classified and labeled STAF data. Therefore, they may be used to comprehensively query data records in STAF databases such as YSTAFDB (processes, flows, stocks, metadata).

UMIS labels of processes and flows in YSTAFDB data records can be parsed to identify and update their locations in material cycles. Parsing of UMIS labels in data records would be needed to accommodate changes to the STAF data structure, e.g., to update them as more data are added to YSTAFDB and/or as subsystem and/or process disaggregation changes. Therefore, we envisage that integrating additional data into YSTAFDB, and use of existing data in YSTAFDB, will be facilitated through the development and application of an internationally standardized classification system for processes and flows (i.e., materials, products, energy, etc. that are distributed among processes). The Harmonized System (HS)²⁹ is one such classification system that may be used for this purpose. The North American Industry Classification System (NAICS) is another. The use of a classification system will thus also reduce reinterpretation of reported data.

Criticality

A criticality assessment characterizes the risk of material unavailability in a system of interest. Criticality assessments are produced using STAF data for material cycles and supplemented by additional data as established by the methodology used (e.g., political stability indicator values⁵).

Criticality data in YSTAFDB were produced using the methodology developed by Graedel and colleagues⁵,²² (Fig. 4). This methodology defines criticality along three dimensions: (1) supply risk (sr); (2) environmental implications (ei); and (3) vulnerability to supply restriction (vsr). The overall criticality indicator is a linear combination of scores along these three dimensions.

Supply risk characterizes the chance that material supply, from both virgin and secondary resources, may not meet demand. It is characterized as medium-term or long-term depending on the assessment scope. Medium-term is particularly relevant to corporations and nations, and timescales of 5–10 years, whereas long-term is for global assessments and timescales of decades or longer. Medium- and long-term supply risk includes a geological, technological, and economic (gte) indicator, which comprises depletion time (dt) and companion metal fraction (cf) factors. Depletion time represents several combined effects: reserves; mining production; demand; output from the use phase; quantity landfilled; secondary (scrap) supply; net loss to tailings, slag, and other by-products; lifetime; and end-of-life recycling rate. Medium-term supply risk additionally comprises a social and regulatory (s_r) indicator, containing policy potential index (ppi) and human development index (hdi) factors, and a geopolitical (gp) indicator, containing worldwide governance indicators – political stability & absence of violence/terrorism (wgi_pv), and global supply concentration (gsc) factors.

Environmental implications represent the potential burdens that materials place on the environment throughout their (life) cycles, e.g., damage to ecosystems caused by toxic emissions from metal production, which may limit their availability. The environmental implications indicator for a material is determined by grouping damage to human health and to ecosystems, which are produced through a cradle-to-gate life cycle assessment for that material. Environmental implications data in YSTAFDB utilize a functional unit of 1 kg material at the factory gate, inventory data from Ecoinvent 2.2³⁰, and the ReCiPe v1.10 method with world normalization and hierarchist weighting³¹.

Vulnerability to supply restriction characterizes the importance of a material to society, e.g., iron (Fe) is globally relied upon in infrastructure, housing, vehicles, etc., so is relatively important. It is determined differently on corporate, national, and global levels. Although included in the criticality methodology developed by Graedel and colleagues, values at the corporate level were not calculated; readers are directed to refs 5 and 22 for methodological details at this level. National level vulnerability to supply restriction contains an importance (i) indicator comprised of material assets (ma) and national economic importance (ne) factors, a substitutability (s) indicator comprised of substitute performance (sp), substitute availability (sa), environmental impact ratio (er), and net import reliance ratio (irr) factors, and a susceptibility (su) indicator comprised of global innovation index (gii) and net import reliance (ir) factors. Global vulnerability to supply restriction contains an importance (i) indicator comprised of a single material assets (ma) factor, and a substitutability (s) indicator, comprised of substitute performance (sp), substitute availability (sa), and environmental impact ratio (er) factors.

Recycling

Material cycles also characterize material recycling. Recycling related properties of material cycles reported by Graedel and colleagues were summarized in a few key publications⁶,¹⁴,³². These properties include:

(1) in-use dissipation, indicating unrecoverable material lost during use;
(2) rates of currently unrecyclable and potentially recyclable material, and of end-of-life recycling;
(3) market shares of key material applications, such as construction, machinery, packaging, etc.; and
(4) unspecified recycling potential, which is used in the absence of data.

Currently unrecyclable material is material that is prevented from recycling due to prevailing technological and economic barriers. Potentially recyclable material may be functionally recycled, non-functionally recycled, or not recovered. End-of-life recycling rates in YSTAFDB correspond to functionally recycled material only, i.e., percentages of material sent back to production, fabrication, etc. (in the same material cycle) from waste management relative to the amount of material sent to waste management from use at end-of-life. Various recycling related characteristics of material cycles are shown in Fig. 5.
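
The end-of-life recycling rate defined above is a simple ratio of two flows, which the following Python sketch makes explicit. The function name, argument names, and numbers are hypothetical; only the definition itself (functionally recycled material relative to material sent to waste management from use, as a percentage) is taken from the text.

```python
def end_of_life_recycling_rate(functionally_recycled, sent_to_waste_management):
    """Return the end-of-life recycling rate in percent.

    functionally_recycled: material sent back from waste management to
        production, fabrication, etc. in the same material cycle.
    sent_to_waste_management: material sent from use to waste management
        at end-of-life. Both quantities must share the same unit.
    """
    if sent_to_waste_management <= 0:
        raise ValueError("flow to waste management must be positive")
    return 100.0 * functionally_recycled / sent_to_waste_management


# Hypothetical flows, in Gg per year.
rate = end_of_life_recycling_rate(functionally_recycled=30.0,
                                  sent_to_waste_management=75.0)
print(f"{rate:.1f} %")
```

Non-functionally recycled material is deliberately left out of the numerator, in line with the definition given above.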

Data input and templates

Numerous differently formatted STAF data were input into YSTAFDB. Many of these data existed in graphical form and in portable document format (pdf), and therefore required significant manual effort to extract. Some data were present in spreadsheet format: these data were typically sorted by system boundary properties (reference material, reference timeframe, reference space), and by process or flow name. We consistently reformatted STAF data from material cycles into uniformly formatted spreadsheets, hereafter ‘templates’, which were then consistently parsed in a later step. The template structure was specified to allow sufficient annotation of key STAF data properties, e.g., material name, units, etc. We included these key properties in YSTAFDB as metadata. STAF data for a single material cycle publication were used to fill one template, such that ~60 filled templates were used to develop the material cycles database tables in YSTAFDB. An example template is provided along with this contribution³³.

Data Records

YSTAFDB contains the fifteen core tables shown in Table 1 and Fig. 6, which are each provided as csv files³³. Headings of tables in YSTAFDB are hereafter written in italics. A complete list of core tables and fields in YSTAFDB, including descriptions and examples, can be viewed in Table S1 of the Supplementary Information. These tables are supplemented by 63 hierarchy tables (Table 1 and Fig. 6), which are also each provided as csv files³³. Each hierarchy table represents the complete process/subsystem hierarchy of a reference material cycle for which data are available in YSTAFDB. Excerpts from the flows data table and flows_citations mapping table are shown in Tables 2 and 3, respectively, to indicate their nature. YSTAFDB contains a total of 115,829 data records in the flows, cross_boundary_flows, processes, recycling, criticality, criticality_sr, criticality_ei, and criticality_vsr data tables.

Table 1 Descriptions of the core and supplementary tables in YSTAFDB.

Table 2 An excerpt from the flows table in YSTAFDB. ‘\N’ indicates a ‘null’ data entry (i.e., an empty cell).

Table 3 An excerpt from the flows_citations table in YSTAFDB. ‘\N’ indicates a ‘null’ data entry (i.e., an empty cell).
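
Because null entries in the csv files are written as ‘\N’, a reader needs to map that marker to a missing value when loading a table. The Python sketch below shows one way to do this with the standard library; the function name, the column names, and the sample rows are hypothetical and do not reproduce the actual YSTAFDB fields, which are listed in Table S1.

```python
import csv
import io

NULL_MARKER = r"\N"  # backslash followed by N, as used for null entries


def read_table(csv_text):
    """Parse csv text into a list of dicts, mapping the null marker to None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {field: (None if value == NULL_MARKER else value)
         for field, value in row.items()}
        for row in reader
    ]


# Hypothetical excerpt; real files would be read from disk instead.
sample = "flow_id,quantity,unit\n1,12.5,Gg\n2,\\N,Gg\n"
rows = read_table(sample)
print(rows[1]["quantity"] is None)
```

To load a released file, the same function can be applied to the file contents, e.g., `read_table(open(path, newline="", encoding="utf-8").read())`, where `path` points to one of the csv files; the encoding is an assumption.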

Publications and citations

Material cycles, recycling, and criticality data from 60 published studies were stored in YSTAFDB. These data were themselves produced through the collection, interpretation, and analysis of data from peer-reviewed journal papers³⁴, government reports³⁵, metal association and study group reports³⁶, industry consultations, and other sources²⁸,³⁷. The sources of these data are defined as citations in YSTAFDB (and available in the citations table). The published studies that analyzed these data are referred to as publications in YSTAFDB (and available in the publications table). Information such as author, title, journal, year, doi, etc. is recorded in these tables, as well as unique identifiers (hereafter ‘ids’) for each data record. Data records in the publications and citations tables are referred to using their ids in other YSTAFDB tables.
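
The way ids link data records to their sources through a mapping table such as flows_citations can be sketched in Python as follows. All field names, ids, and entries below are hypothetical placeholders; the actual fields are described in Table S1 of the Supplementary Information.

```python
# Hypothetical citations table: citation id -> description of the source.
citations = {
    10: "Government report (placeholder)",
    11: "Journal paper (placeholder)",
}

# Hypothetical flows_citations mapping table: (flow id, citation id) pairs.
flows_citations = [(1, 10), (1, 11), (2, 11)]


def citations_for_flow(flow_id, mapping, citation_table):
    """Return the sources linked to one flow record via the mapping table."""
    return [citation_table[citation_id]
            for mapped_flow_id, citation_id in mapping
            if mapped_flow_id == flow_id]


sources = citations_for_flow(1, flows_citations, citations)
print(sources)
```

A mapping table of this kind lets one data record point to several sources, and one source support several data records, without duplicating either.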