Publication, funding, and experimental data in support of Human Reference Atlas construction and usage

Experts from 18 consortia are collaborating on the Human Reference Atlas (HRA), which aims to map the 37 trillion cells in the healthy human body. Information relevant for HRA construction and usage is held by experts, published in scholarly papers, and captured in experimental data. However, these data sources use different metadata schemas and cannot be cross-searched efficiently. This paper documents the compilation of a dataset, named HRAlit, that links the 136 HRA v1.4 digital objects (31 organs with 4,279 anatomical structures, 1,210 cell types, 2,089 biomarkers) to 583,117 experts; 7,103,180 publications; 896,680 funded projects; and 1,816 experimental datasets. The resulting HRAlit has 22 tables with 20,939,937 records, including 6 junction tables with 13,170,651 relationships. HRAlit can be mined to identify leading experts, major papers, funding trends, or alignment with existing ontologies in support of systematic HRA construction and usage.

The year in which the author's first publication appeared.

involved_funding int8
An integer indicating the number of funding opportunities the author has been involved in.

Notes
The "involved_funding" field indicates the number of funded projects that have supported the author's publications as of August 2023. This field will need to be updated as the author secures new funding.

Notes
The "pmid" field is a unique identifier based on the PubMed system, commonly used in biomedical literature.
The "doi" field is another unique identifier that provides a permanent link to the article.

hralit_institution
Purpose To store detailed information about research institutions.

Notes
The "acronym" field can be used for easier identification and referencing of funding sources.

hralit_funder_cleaned
Purpose To store cleaned and standardized information about entities that fund research.

Notes
The data for this table is sourced from the 5th release of the Anatomical Structures, Cell Types, and Biomarker (ASCT+B) Tables.

hralit_other_publication
Purpose To store information about publications associated with CZ CELLxGENE and CellMarker.

Notes
The data for this

Notes
This table serves as a junction table to establish many-to-many relationships among anatomical_structure entities, cell type entities, and biomarker entities.
The data for this table is sourced from the 5th release of the Anatomical Structures, Cell Types, and Biomarker (ASCT+B) Tables.

Table S2. Record count statistics for 22 HRAlit database tables.

hralit_publication
Purpose To store detailed information about individual publications.

Fields
Column Data type Description
pmid varchar Unique identifier for each publication based on the PubMed ID.
doi varchar Digital Object Identifier for the publication.
pubyear int8 The year in which the publication appeared.
article_title varchar Title of the article.
journal_title varchar Journal in which the article was published.
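
The table descriptions state that "hralit_other_publication" (columns pmid, doi, source) is linked to "hralit_publication" via the "pmid" field. The following is a minimal sketch of how that link could be used to count publications per source. It assumes an in-memory SQLite database as a stand-in for the actual HRAlit database, maps varchar and int8 to SQLite's TEXT and INTEGER, and uses placeholder rows; the helper name `publications_by_source` is hypothetical and not part of HRAlit.

```python
import sqlite3

# Assumption: SQLite stands in for the real HRAlit database engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hralit_publication (
    pmid TEXT PRIMARY KEY,   -- varchar in the documented schema
    doi TEXT,
    pubyear INTEGER,         -- int8 in the documented schema
    article_title TEXT,
    journal_title TEXT
);
CREATE TABLE hralit_other_publication (
    pmid TEXT,
    doi TEXT,
    source TEXT              -- documented values: cxg, cellmarker
);
""")

# Placeholder rows for illustration only; these are not real HRAlit records.
conn.executemany(
    "INSERT INTO hralit_publication VALUES (?, ?, ?, ?, ?)",
    [
        ("1001", "10.0000/example.1", 2020, "Placeholder title A", "Placeholder Journal"),
        ("1002", "10.0000/example.2", 2021, "Placeholder title B", "Placeholder Journal"),
    ],
)
conn.executemany(
    "INSERT INTO hralit_other_publication VALUES (?, ?, ?)",
    [
        ("1001", "10.0000/example.1", "cxg"),
        ("1002", "10.0000/example.2", "cellmarker"),
        ("1002", "10.0000/example.2", "cxg"),
    ],
)


def publications_by_source(connection):
    """Count distinct publications per source by joining on pmid (hypothetical helper)."""
    rows = connection.execute("""
        SELECT o.source, COUNT(DISTINCT p.pmid)
        FROM hralit_other_publication AS o
        JOIN hralit_publication AS p ON p.pmid = o.pmid
        GROUP BY o.source
        ORDER BY o.source
    """).fetchall()
    return dict(rows)


counts = publications_by_source(conn)
print(counts)
```

Joining on "pmid" rather than "doi" follows the documented relationship; records without a PubMed ID would need a separate DOI-based match.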
for the organ related to the dataset.
donor_id varchar Identifier for the donor associated with the dataset.
individual_id varchar Identifier for the individual from whom the sample was taken.
protocols_used varchar Protocols used in the generation or analysis of the dataset.
rin_score_from_paxgene varchar RIN score obtained from the PAXgene process.
rin_score_from_frozen varchar RIN score obtained from the frozen sample.
organ varchar Organ from which the sample was taken.
autolysis_score varchar Score indicating the level of tissue autolysis.
sample_ischemic_time varchar Time for which the sample underwent ischemia.
sample_type varchar Type of sample (e.g., normal).
pathology_notes varchar Notes related to pathology findings.
source varchar Source from which the dataset was obtained (i.e., hubmap, cxg, gtex).
dataset_hubmap_id varchar HuBMAP identifier for the dataset.
dataset_status varchar Current status of the dataset (e.g., published).
dataset_date_time_created varchar Date and time the dataset was created.
dataset_date_time_modified varchar Date and time the dataset was last modified.
dataset_data_types varchar Types of data included in the dataset.
dataset_portal_url varchar URL to access the dataset in a portal.
the collection to which the dataset belongs.
collection_name varchar Name of the collection.
publication_doi varchar DOI for publications related to the dataset.
organ_ontology varchar Ontological classification for the organ.
anatomical_structure varchar Description of the anatomical structure from which the sample was taken.
anatomical_structure_ontology varchar Ontological classification for the anatomical structure.
suspension_type varchar Type of suspension used in the sample.

Notes
The data for this table is sourced from the CZ CELLxGENE API and CellMarker. CellMarker data is available at http://xteam.xbio.top/CellMarker/download/Human_cell_markers.txt

hralit_creator
Purpose To store information about the creators of various digital objects for HRA.

Column Data type Description
pmid varchar Unique identifier for each publication based on the PubMed ID.
doi varchar Digital Object Identifier for the publication.
source varchar Source from which the publication information was obtained (i.e., cxg, cellmarker).

Relationships
Linked to "hralit_publication" table via "pmid" field.

Notes
This table serves as a junction table to establish many-to-many relationships among creators, authors, and digital objects.
The data for this table is available at https://hubmapconsortium.github.io/ccfreleases/v1.4/docs.

hralit_reviewer
Purpose To store information about the reviewers of various digital objects for HRA.

structure, used to reference a specific entity, such as "http://purl.obolibrary.org/obo/UBERON_0006082".
pref_label varchar Preferred label for the anatomical structure.
type varchar The type of the anatomical structure.

Relationships
Linked to "hralit_asctb_linkage" table via "iri" field.
Linked to "hralit_triple" table via "iri" field.

Notes
The data for this table is sourced from the 5th release of the Anatomical Structures, Cell Types, and Biomarker (ASCT+B) Tables.

death_event varchar Event or circumstance leading to the donor's death.
source varchar Source from which the donor information was obtained (i.e., hubmap, cxg, gtex).
age_unit varchar Unit in which the age is measured (e.g., years).

Relationships
Linked to "hralit_datasets" table via "donor_id".

Notes
The data for this table is sourced from the Human BioMolecular Atlas Program (HuBMAP) portal, CZ CELLxGENE portal, and Genotype-Tissue Expression (GTEx) portal.

hralit_dataset
Purpose To store detailed information about datasets, including associated donor and sample details, metadata, and dataset status.
table is sourced from the Human BioMolecular Atlas Program (HuBMAP) portal, CZ CELLxGENE portal, and Genotype-Tissue Expression (GTEx) portal.
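
The donor records are described as linked to the dataset table via "donor_id". As an illustration only, the sketch below counts datasets per donor through that key. Several things here are assumptions: the donor table name `hralit_donor` does not appear in the text above and is hypothetical, only a subset of the documented dataset columns is included, SQLite stands in for the actual database engine, and all rows are placeholders rather than HRAlit data.

```python
import sqlite3

# Assumption: in-memory SQLite database with a reduced set of documented columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hralit_donor (          -- hypothetical table name
    donor_id TEXT,
    death_event TEXT,
    source TEXT,                     -- documented values: hubmap, cxg, gtex
    age_unit TEXT
);
CREATE TABLE hralit_dataset (
    dataset_hubmap_id TEXT,
    donor_id TEXT,
    organ TEXT,
    source TEXT,
    dataset_status TEXT
);
""")

# Placeholder rows for illustration only.
conn.executemany(
    "INSERT INTO hralit_donor VALUES (?, ?, ?, ?)",
    [
        ("D1", None, "hubmap", "years"),
        ("D2", None, "gtex", "years"),
    ],
)
conn.executemany(
    "INSERT INTO hralit_dataset VALUES (?, ?, ?, ?, ?)",
    [
        ("DS1", "D1", "kidney", "hubmap", "published"),
        ("DS2", "D1", "heart", "hubmap", "published"),
        (None, "D2", "lung", "gtex", "published"),
    ],
)


def datasets_per_donor(connection):
    """Return (donor_id, source, dataset count) tuples, joined on donor_id."""
    return connection.execute("""
        SELECT d.donor_id, d.source, COUNT(*)
        FROM hralit_donor AS d
        JOIN hralit_dataset AS s ON s.donor_id = d.donor_id
        GROUP BY d.donor_id, d.source
        ORDER BY d.donor_id
    """).fetchall()


per_donor = datasets_per_donor(conn)
print(per_donor)
```

If donor identifiers are not unique across HuBMAP, CZ CELLxGENE, and GTEx, the join would also need to match on "source"; the text above does not say whether that is the case.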

Table S3. Node count statistics for HRAlit database tables.
Note that in CellMarker, CxG, and GTEx publications, the number refers to the publications with DOIs.

Table S4. Linkage count statistics for HRAlit database for 22 relationship types.