SARS-ANI: a global open access dataset of reported SARS-CoV-2 events in animals

Nerpel, Afra; Yang, Liuhuaying; Sorger, Johannes; Käsbohrer, Annemarie; Walzer, Chris; Desvars-Larrive, Amélie

doi:10.1038/s41597-022-01543-8

Download PDF

Data Descriptor
Open access
Published: 23 July 2022

SARS-ANI: a global open access dataset of reported SARS-CoV-2 events in animals

Scientific Data volume 9, Article number: 438 (2022) Cite this article

6211 Accesses
19 Citations
312 Altmetric
Metrics details

Subjects

Abstract

The zoonotic origin of SARS-CoV-2, the etiological agent of COVID-19, is not yet fully resolved. Although natural infections in animals are reported in a wide range of species, large knowledge and data gaps remain regarding SARS-CoV-2 in animal hosts. We used two major health databases to extract unstructured data and generated a global dataset of SARS-CoV-2 events in animals. The dataset presents harmonized host names, integrates relevant epidemiological and clinical data on each event, and is readily usable for analytical purposes. We also share the code for technical and visual validation of the data and created a user-friendly dashboard for data exploration. Data on SARS-CoV-2 occurrence in animals is critical to adapting monitoring strategies, preventing the formation of animal reservoirs, and tailoring future human and animal vaccination programs. The FAIRness and analytical flexibility of the data will support research efforts on SARS-CoV-2 at the human-animal-environment interface. We intend to update this dataset weekly for at least one year and, through collaborations, to develop it further and expand its use.

Measurement(s)	SARS-CoV-2 event in animal hosts
Technology Type(s)	Manual data collection from text sources
Factor Type(s)	ID • primary_source • archive_event_number • link_web • secondary_source • secondary_source_ID • secondary_source_web • host_com_orig • host_sci_orig • host_com_res • host_sci_res • host_colloq • host_sci_spec_res • family • epidemiological_unit • number_cases • number_susceptible • number_tested • number_deaths • age • sex • country_iso3 • country_name • subnational_administration • city • location_detail • date_confirmed • date_reported • date_published • related_to_other_entries • related_ID • test • sampling_type • test_2 • sampling_type_2 • test_3 • sampling_type_3 • negative_test • negative_sampling_type • negative_test_2 • negative_sampling_type_2 • reason_for_testing • symptoms • outcome • living_conditions • source_of_infection • variant • control_measures • original_source • link_original_source
Sample Characteristic - Organism	animal host(s)
Sample Characteristic - Environment	domestic • wild• captive• farmed
Sample Characteristic - Location	Global

A comprehensive dataset of animal-associated sarbecoviruses

Article Open access 07 October 2023

A geopositioned and evidence-graded pan-species compendium of Mayaro virus occurrence

Article Open access 14 July 2023

A real-time spatio-temporal syndromic surveillance system with application to small companion animals

Article Open access 28 November 2019

Background & Summary

Although the coronavirus disease 2019 (COVID-19) pandemic is driven by human-to-human transmission, its causative agent, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is of zoonotic origin¹. Emergence in humans most likely involved at least two independent zoonotic spillover (i.e. animal-to-human transmission) events from wild animals kept at the Huanan Seafood Market in Wuhan, China^2,3. The first officially confirmed case in an animal was reported in February 2020, when a dog in Hong Kong tested positive for the virus shortly after its owner was diagnosed with COVID-19⁴. SARS-CoV-2 is recognized as a generalist coronavirus¹, showing a great capacity for infecting multiple animal species⁵, including pets, e.g. dogs^4,6,7, cats^6,8,9, and Syrian hamsters¹⁰; zoo animals¹¹, e.g. gorillas (https://www.science.org/content/article/captive-gorillas-test-positive-coronavirus), lions^12,13,14, tigers^12,14, mountain lion¹³, and Asian small-clawed otter (https://www.aphis.usda.gov/aphis/newsroom/stakeholder-info/sa_by_date/sa-2021/sa-04/covid-georgia-otters); farmed animals, e.g. mink^15,16,17; and free-ranging wildlife¹¹, e.g. white-tailed deer^18,19,20 and leopard²¹. SARS-CoV-2-infected animals may exhibit subclinical to severe signs of infection^4,14,17 and implemented interventions range from individual treatment to preventive culling.

Animal infections mostly result from human-to-animal transmission (“spillback”) and can, in certain cases, lead to onward epizootic circulation of the virus, among conspecifics, e.g. in hamsters²², mink¹⁶, tigers²³, and white-tailed deer^18,19,20, or between species, e.g. from mink to cats²⁴. Additional spillback events from humans to animals may have occurred and gone undetected²⁵. Recently, animal-to-human transmissions were observed from farmed mink¹⁶, pet hamster¹⁰, cat²⁶, and possibly free-ranging white-tailed deer²⁷. These secondary spillovers resulted in the mass culling of mink²⁸ and occasionally led to subsequent human-to-human transmission¹⁰. Not only does SARS-CoV-2 represent a risk for public health, but it is also a threat of unknown magnitude for animal health and welfare as well as conservation^29,30,31.

Reports on SARS-CoV-2 in animals are primarily available from the Program for Monitoring Emerging Diseases (ProMED-mail) (https://promedmail.org/) and the World Animal Health Information System (WAHIS) of the World Organisation for Animal Health (WOAH, formerly OIE) (https://wahis.woah.org/). However, this data is unstructured (narrative text) and/or available in multiple excel sheets or PDF files, therefore not usable without preliminary, time-consuming curation and formatting steps. The Animal and Plant Health Inspection Service (APHIS) of the U.S. Department of Agriculture (USDA) publishes a dashboard of confirmed cases of SARS-CoV-2 in animals in the United States (https://www.aphis.usda.gov/aphis/dashboards/tableau/sars-dashboard) while the Canadian Animal Health Surveillance System (CAHSS) has a dashboard for Canada (https://cahss.ca/cahss-tools/sars-cov-2-dashboard). Data can be downloaded from the APHIS-USDA dashboard, but only as an image, PDF, or PowerPoint (not machine-readable) whereas underlying data is not accessible from the CAHSS dashboard. Both the Danish and Dutch governments maintain a website providing information related to SARS-CoV-2 in mink farms in the respective country (https://www.foedevarestyrelsen.dk/Dyr/Dyr-og-Covid-19/Mink-og-COVID-19 and https://www.rijksoverheid.nl/actueel/nieuws, respectively), without any possibility of accessing the raw data.

Although the WOAH, Food and Agriculture Organization (FAO), and World Health Organization (WHO) recently published a statement on “the prioritization of monitoring SARS-CoV-2 infection in wildlife and preventing the formation of animal reservoirs” (https://www.who.int/news/item/07-03-2022-joint-statement-on-the-prioritization-of-monitoring-sars-cov-2-infection-in-wildlife-and-preventing-the-formation-of-animal-reservoirs), there is currently no comprehensive global dataset on SARS-CoV-2 events in animals that can be easily imported, processed, and analysed.

In the context of the current COVID-19 pandemic, availability of FAIR (Findable, Accessible, Interoperable, and Reusable) data³² is critical to understand the current and developing epidemiology of SARS-CoV-2 at human-animal interfaces and mitigate the impacts of this and future pandemics. In this paper, following the Open Science Principles^33,34, we document and share the methods used to produce SARS-ANI, a comprehensive curated global dataset of SARS-CoV-2 events in animals. We provide a detailed description of the dataset and supplement it with user-friendly documentation and materials (code and archived reports) to enhance data comprehension and use. We also present usage examples that shed light on the epidemiological and clinical patterns of SARS-CoV-2 animal infections globally, but also at the country and species level. The generated dataset is publicly available and readily usable for analytical purposes. Using harmonized taxonomic names, the SARS-ANI Dataset greatly facilitates access to and re-use of data on SARS-CoV-2 events in animals. The dashboard also allows non-experts to access and view SARS-CoV-2 animal events. The continuous analysis of SARS-CoV-2 occurrence data in animals is especially critical to adapting monitoring, surveillance and vaccination programs for animals and humans in a timely manner and evaluating the developing threat SARS-CoV-2 represents for public and animal health as well as biodiversity and conservation.

Methods

Data for this dataset was collected and integrated from two major animal health databases: i) the Program for Monitoring Emerging Diseases (ProMED-mail) (https://promedmail.org/), which is a program of the International Society for Infectious Diseases (ISID, https://isid.org/), and ii) the World Animal Health Information System (WAHIS) of the World Organisation for Animal Health (WOAH, formerly OIE) (https://wahis.woah.org/).

Step 1: integrating ProMED-mail reports

ProMED-mail (https://promedmail.org/) is the largest publicly available system reporting global infectious disease outbreaks (outbreak denotes the occurrence of one or more cases in an epidemiological unit). It provides reports (called “posts”) on outbreaks and disease emergence. The information flow leading to publication of ProMED-mail reports is as follows: a disease event to be dispatched is selected from daily notifications of outbreaks received via emails, searching through the Internet and traditional media, and scanning of official and unofficial websites. All incoming information is reviewed and filtered by an editor or associate editor who, subsequently sends them to a multidisciplinary team of subject matter expert moderators who assess the accountability and accuracy of the information, interpret it, provide commentary, and give references to previous ProMED-mail reports and the scientific literature³⁵. One ProMED-mail report, identified via a unique report identifier, may depict one single or several health events.

The integration of the ProMED-mail reports of interest followed two steps:

i) Selection of ProMED-mail reports

We identified reports describing SARS-CoV-2 events in animals, i.e. presenting at least one individual case of SARS-CoV-2 in an animal, via the “Search Posts” function provided on the ProMED-mail website. We used the keywords “animal” and “COVID-19” (which are consistently used in the “Subject” of the ProMED-mail posts to report information related to SARS-CoV-2 in animals) to retrieve the reports pertaining to natural and experimental infections or vaccine assays in animals as well as general discussions on SARS-CoV-2 in animals (note: although COVID-19 refers to the disease caused by SARS-CoV-2 in humans and should not be used for animals, ProMED-mail conveniently uses this keyword for both humans and animals). Reports describing naturally occurring infection (i.e. the presence of the virus is demonstrated through laboratory method(s)) or exposure (i.e. the presence of antibodies against SARS-CoV-2 is evidenced through laboratory method(s)) of a single individual or group of individuals were filtered manually and considered for data extraction. As of date of submission (22 June 2022), the ProMED-mail database included 232 reports on SARS-CoV-2 in animals.

ii) Link to previous reports

When a health event is continuing, ProMED-mail publishes follow-up reports, which provide references to previous ProMED-mail reports (at the end of the report or in the section “See Also” at the end of the post). We used this information to identify the potential relationship of each reported event to a previous one (e.g. clinical follow-up, further spread of the virus, and treatment outcome) and entered this data into the final dataset.

Step 2: integrating WAHIS reports

WAHIS (https://wahis.woah.org/) is a Web-based computer system that processes data on animal diseases in real-time. WAHIS data reflects the information gathered by the Veterinary Services from WOAH (formerly OIE) Members and non-Members Countries and Territories on WOAH-listed diseases in domestic animals and wildlife, as well as on emerging and zoonotic diseases. In accordance with the WOAH Terrestrial Animal Health Code³⁶, the detection of infection with SARS-CoV-2 in animals meets the criteria for reporting to the WOAH as an emerging infection (https://www.woah.org/app/uploads/2021/03/a-reporting-sars-cov-2-to-the-oie.pdf). Only authorised users, i.e. the Delegates of WOAH Member Countries and their authorised representatives, can enter data into the WAHIS platform to notify the WOAH of relevant animal disease information.

One WAHIS report, denominated via a unique report identifier, may contain one single or several outbreaks, each identified via a unique outbreak identifier. All information is publicly accessible on the WAHIS interface.

The integration of the WAHIS reports of interest was performed in two steps:

i) Selection of WAHIS reports

We used the WAHIS dashboard of animal disease events (https://wahis.woah.org/#/events) to extract cases of SARS-CoV-2 infection in animals notified by WOAH Member and non-Members States. WAHIS publishes immediate notifications (INs) and follow-up reports (FURs), identifiable through the prefix “IN” and “FUR” in their respective names. Immediate notifications dispense information on newly notified events while FURs generally provide updates on previously notified, ongoing events (e.g. number of newly infected animals and new deaths, newly implemented control measures).

We applied filters to the field “DISEASE” (“SARS-CoV-2 in animals (inf. with)”) and “REPORT DATE” to select reports related to SARS-CoV-2 events from 1^rst December 2019 until today. The reports can be consulted online or downloaded as an individual PDF or Excel file, each file corresponding to one country report (i.e. several outbreaks can be included in one report). As of date of submission (22 June 2022), the WAHIS dashboard included 311 reports on SARS-CoV-2.

ii) Identification of gaps and dataset completion

ProMED-mail screens a large range of information sources including WAHIS reports. The ProMED-mail posts mention the event ID of the WAHIS report(s) used as information source, which makes it possible to consult the original source on the WAHIS dashboard. Therefore, we chose to first identify SARS-CoV-2 events in animals in the ProMED-mail database. In a second step, we used the WAHIS dashboard to identify gaps, i.e. complete the previously entered SARS-CoV-2 events (hereinafter referred to as sibling events) and find additional events not reported in ProMED-mail (Fig. 1).

For each country (using the filter “COUNTRY/TERRITORY” on the WAHIS dashboard), we identified sibling events by comparing the WAHIS reports against all the previously entered ProMED-mail reports of the country, using the information on species, subnational administration, and date of laboratory confirmation (a buffer of ±7 days was considered due to possible discrepancies related to confirmation by different laboratories) or date of publication when date of laboratory confirmation was missing (in this case a buffer of 30 days was considered because date of publication is strongly database-dependent). We did not use information about the city here because reports may inconsistently refer to city/village of outbreak occurrence due to data privacy.

This strategy, although time consuming, was consistently applied throughout the data extraction process, ensuring a comprehensive collection of information for each outbreak, accuracy of the data, and reproducibility of the method.

Data extraction

ProMED-mail provides detailed, text-based (narrative) reports of health events. This data is unstructured whereas WAHIS uses both semi-structured (.pdf file organized into sections, including free text) and structured data (.xlsx format) to display the reports. Each selected report underwent manual review by a veterinarian, guaranteeing a full understanding of the content and context. Information was manually extracted and hand-coded.

The following event information was extracted (when available) and entered into a structured template within a dedicated .csv file:

- Animal host: common name (i.e. most specific designation provided by the source(s), in English) and scientific name as mentioned in the source(s) (scientific names are harmonized so that only the first letter of the genus is capitalized);
- Geographic location: country, subnational administration, city;
- SARS-CoV-2 variant;
- Dates: when the case was laboratory confirmed, reported by WAHIS, and published;
- Metrics: number of cases, number of deaths, number of susceptible animals.

Moreover, the following information on animal patient(s)/case(s) were extracted to populate the dataset:

- Age;
- Sex;
- Living conditions;
- Main reason for testing;
- Suspected source of infection;
- Symptoms: main reported clinical signs allegedly associated to SARS-CoV-2 were summarized with one to several keywords mentioned in the text. Multiple symptoms were separated by the operator “and”.

Extracted data described above was entered into the dataset as mentioned in the report and no information was subjected to any interpretation before entry. In addition, to facilitate understanding of the data, integration with other sources, and analysis, we have added the five following patient attributes:

- The common and scientific name (resolved to species or subspecies level, depending on the available information) of the animal host, harmonized against the National Center for Biotechnology Information (NCBI) taxonomic backbone³⁷;
- The colloquial name of the host, i.e. the name commonly used to identify the animal in non-specialist language (e.g. “tiger” for “Sumatran tiger”);
- The scientific name of the host resolved to the species level;
- The higher taxonomy (i.e. family) of the animal host, retrieved from the report, expert knowledge, or the literature.

Finally, for each SARS-CoV-2 event recorded in the dataset, we reported the primary and secondary source of information, i.e. source name (ProMED-mail or WAHIS) and link to the online report, as well as the original information source as referred by the primary source. A copy of each report used during the data extraction process was downloaded and saved as a PDF file. We inserted a timestamp on the saved file (ProMED-mail reports) or the download date was specified within the file name (it was not possible to insert a timestamp on WAHIS reports).

Data documenting each event corresponds to information available in the ProMED-mail and/or WAHIS report when consulting the report (see timestamp or download date). Potential subsequent editions or modifications of the report by ProMED-mail and/or WAHIS was not considered.

Disclaimer

Use of the data from the WAHIS platform requires mentioning the following statement: “The World Organisation for Animal Health (WOAH) bears no responsibility for the integrity or accuracy of the data contained herein, in particular due, but not limited to, any deletion, manipulation, or reformatting of data that may have occurred beyond its control”.

Data Records

Each row of the dataset represents a SARS-CoV-2 event in animal(s), identified by a unique identifier (field ID). We consider as an event when one single case or several epidemiologically related cases were identified by the presence of viral RNA (proof of infection) and/or antibodies (proof of exposure) in an animal. Epidemiologically related cases include e.g. animals belonging to the same farm, captive animals housed together, pets belonging to the same household, or animals sampled within the same (generally transversal) study, featuring similar event and patient attributes, i.e. they underwent the same laboratory test(s) and showed the same results (including variant), exhibited the same symptoms and disease outcome, and were confirmed, reported (when applicable), and published on the same date (e.g. when pets of the same species, sharing the same household, showed different symptoms, they are reported as two distinct events). Events include follow-up history reports of outbreaks (e.g. follow-up on the clinical status of the animal, variant identification after case confirmation).

Each unique SARS-CoV-2 event is characterized by 50 quantitative and qualitative event and patient attributes (columns) that structure the dataset. Therefore, the dataset comprises five numeric fields (number_cases, number_susceptible, number_tested, number_deaths, and age), three date fields (date_confirmed, date_reported, and date_published), one character field (sex), and 33 string data fields, including the event unique identifier (ID), seven fields relevant to host names and taxonomy, and two data fields requiring pre-defined string values (epidemiological_unit and related_to_other_entries). Eight metadata fields are dedicated to the information sources (e.g. names, links).

The field related_to_other_entries specifies potential relationship between events, and thereby allows identifying events that are related in space or time as well as follow-up reports (e.g. when animals described in two or more events are living together or when a follow-up report presents an update of another event, itself referred as updated by). Online-only Table 1 describes the 50 fields presented in the final dataset and their format. Supplementary File 1 provides three examples illustrating the structure and coding scheme of the SARS-ANI Dataset.

We have considered the two following values throughout the dataset:

NA (not applicable): when the field does not apply to the event. For example when only one laboratory test is conducted (field test), NA is reported for the second and third tests (test_2 and test_3, respectively).
NS (not specified): when the information is relevant for the event but has not been specified in the report(s). For example, when a PCR test is performed for diagnostic purpose but the report(s) does not mention which sample was used, the sample type (sampling_type) is NS.

Accompanying the dataset, we publish the R code to perform technical and visual validation of the data as well as the ProMED-mail and WAHIS reports used as information sources (n = 364 reports as of date of submission). We also share the list of ProMED-mail and WAHIS reports that have not been included in the dataset and main reason for exclusion. This strategy, in line with the Open Science Principles³⁴, aims to ensure that the data being reported is accurate and that all information can be accessed by researchers, policymakers, and the public. This also guarantees reproducibility of the data collection, may motivate further external validation processes as well as a large re-use of the data. The SARS-ANI files and products are summarized in Table 1. Event displays are also freely available on the SARS-ANI dashboard (https://vis.csh.ac.at/sars-ani/, see Usage Notes).

Table 1 Details of the SARS-ANI files and products.

Full size table

Static dataset

A static copy of the dataset in .csv file format is deposited on Zenodo³⁸ (first version, v1.0, was uploaded on 11 April 2022) and all versions are publicly available at https://doi.org/10.5281/zenodo.6442730, together with the SARS-ANI related files (metadata, R code, archived reports) described in Table 1. As of date of submission, the last uploaded static copy of the dataset (v1.1, 20 June 2022) encompasses all SARS-CoV-2 events published between 29 February 2020 and 7 June 2022. Version 1.1 displays 696 records of SARS-CoV-2 events in animals, representing 1,947 documented cases (infections and/or exposures), in 25 farmed, captive, wild, and domestic taxonomically-resolved animal species belonging to 14 families, from 39 countries worldwide (covering 150 subnational administrative areas). The number of cases was not reported by ProMED-mail and/or WAHIS in 121 events. Figure 2 shows the geographic distribution of the reported SARS-CoV-2 outbreaks included in the dataset. Table 2 summarizes the number of SARS-CoV-2 cases reported globally in each animal host.

Table 2 Number of globally reported SARS-CoV-2 cases (infections or exposures) per animal host (as of date of submission, 22 June 2022).

Full size table

Dynamic dataset

A live version of the SARS-ANI Dataset in .csv file format is publicly available on the GitHub repository, accessible at https://github.com/amel-github/sars-ani, together with the related SARS-ANI files described in Table 1. We plan to update the dataset weekly for at least the next 12 months. Then, depending on resources, the dataset will be updated at least half-yearly. The same technical procedures for data extraction and validation will be applied to any new event added to the dataset.

The GitHub interface allows users to flag potential inaccurate records in the dataset, which will trigger an error correction scheme, mainly consisting in processing the flagged record through another validation loop to check and replace the erroneous field when needed. Through the SARS-ANI GitHub repository and dashboard (https://vis.csh.ac.at/sars-ani/, see Usage Notes), we also expect to motivate experts in animal health, epidemiology, and conservation to support us filling up potential gaps in the records. The GitHub interface allows anybody to suggest changes to the dataset and code via the Issues Tracker (e.g. reporting an error and submitting new data). Contribution to the code can be implemented via a pull request (see Contributing.md, Table 1).

Limitations

The dataset includes only SARS-CoV-2 events that have been published in ProMED-mail and/or WAHIS. Therefore, the integration of an event in the dataset strongly depends on the reporting strategy of the country to the WOAH, the intensity of the research and surveillance strategy in the different animal species (e.g. whether pets from infected households are systematically investigated or not), the media coverage on the diagnosed cases, and the uptake of the reported event by the ProMED-mail team.

Moreover, we have identified five minor limitations in the dataset:

1.
Some SARS-CoV-2 events in animals were not depicted in detail across the information sources, especially those related to infections in mink farms, for which the number of confirmed cases and deaths is often missing. Additionally, it was not always possible to discern from the report whether mink belonged to the same farm unit or not. In several reports on SARS-CoV-2 infections in mink farms, the number of sampled animals is specified whereas the number of positive animals is not (e.g. https://wahis.oie.int/#/report-info?reportId=16733), which makes it impossible to infer whether the samples were pooled or not. Furthermore, SARS-CoV-2 infections in mink from Denmark and the Netherlands that occurred before 12 November 2020 (https://www.woah.org/en/oie-statement-on-covid-19-and-mink/) do not appear on the WAHIS dashboard of animal disease events while both countries endured a considerable burden in the fur-farming system³⁹. Therefore, missing values (e.g. number of cases) for those events, reported in ProMED-mail, could not be completed. For these reasons, the dataset does not allow an accurate estimation of the economic and health burden of SARS-CoV-2 in mink.
2.
Reports on mink and white-tailed deer often report the number of dams (sometimes dams and young) as the number of susceptible animals in the farm or herd, omitting adult males. We have reported the number as given in the information source(s), therefore, for these species, the number of susceptible animals in each event may be underestimated.
3.
Although very uncommon, some errors were found in the ProMED-mail or WAHIS reports. For example, when date of laboratory results precedes date of sampling or when sequencing is mentioned as first diagnostic test, although a PCR was most likely performed beforehand. Since we did not have any mean to retrieve the correct information, we entered the data as mentioned in the reports.
4.
To respect and protect the privacy of the animal owners, outbreak location (i.e. city or village), as provided in the ProMED-mail and WAHIS reports, may not represent the exact location of the outbreak and should be interpreted with caution.
5.
When multiple events are related (e.g. animals were living together: related_to_other_entries = living together), the number of susceptible animals (number_susceptible) from the same species is identical in both events and is therefore redundant. Similarly, when animals belonging to the same species and living together exhibited different symptoms or underwent different laboratory tests (therefore reported as distinct events) but were culled as part of a control strategy (i.e. related_to_other_entries = living together AND control_measures = culling OR selective culling), the number of reported deaths (number_deaths) is identical for these events (and is therefore redundant). Although very few records correspond to those cases, this may lead to a certain degree of over-counting the number of susceptible or dead animals if filtering of the events on the fields related_to_other_entries and control_measures is not performed accordingly (e.g. number of deaths should be counted once for farm animals living together). The code provided to explore the dataset presents examples of how using filters.

Technical Validation

Validation of the collected data followed several steps (Fig. 1).

Quality control & data cleaning

First, the data underwent a quality control and cleaning procedures where the unique values of each field were checked to search for inaccurate (e.g. containing typographical errors or not belonging to a pre-defined list of entities), unreliable (e.g. the value was not specific), incorrectly formatted (e.g. date was formatted as dd/mm/yyyy instead of yyyy-mm-dd), or missing data in the dataset. This step was performed in R⁴⁰ using the base function unique(). Events containing detected errors were manually inspected against original reports. When necessary, the erroneous values were modified, replaced, or removed.

Taxonomic validation

Misspelled animal names and errors in taxonomy can lead to incorrect scientific conclusions and poor policy design. Moreover, harmonization of host names aids with integrating other datasets (e.g. data on host biological traits, geographic distribution, or association with other pathogens). Therefore, for each event, we performed taxonomic validation of the common and scientific name of the animal host, using the R⁴⁰ package taxize⁴¹. We checked if the scientific names were up to date and if those names were spelled correctly using the function gnr_resolve() (names with a score ≤0.98 were manually inspected and, when needed, corrected). In addition, the scientific name for each common name (and vice versa) was resolved against the NCBI taxonomic backbone³⁷ using the function comm2sci() (sci2comm()). Finally, the higher taxonomy (i.e. family) for species names was validated by querying the NCBI database, using the function tax_name().

Search for duplicate events

We have identified duplicate events, defined as unique event reported more than once in the dataset. Events were flagged as duplicates if the geolocation information (i.e. country, subnational_administration, city, and location_detail), resolved animal host denomination (host_com_res and host_sci_res), sex, age, symptoms, date (date_confirmed or date_reported if date_confirmed was missing or date_published if the two other dates were missing), number of cases, number of deaths, number of susceptible, tests conducted, outcome, and relationship to another event (related_ID) were identical. This step was executed in R⁴⁰, using the dplyr package⁴². Flagged events were manually inspected against original information source(s) to confirm redundancy. Duplicate events were removed. Events that were incorrectly flagged as duplicates were corrected. The code to reproduce the three steps (described above) of the technical validation into R⁴⁰ (Table 1) is freely accessible at https://github.com/amel-github/sars-ani and at https://doi.org/10.5281/zenodo.6442730.

Visual validation

The data was visually inspected through different graphical displays. Figures and maps were produced in R⁴⁰, using the packages ggplot2⁴³, webr⁴⁴, and visNetwork⁴⁵ for graphical visualizations. The code to visually summarize the data (through maps, figures, and interactive network) is provided as an R Markdown document (Table 1) and is publicly accessible at https://github.com/amel-github/sars-ani and at https://doi.org/10.5281/zenodo.6442730.

Expert validation

“Expert judgment is defined as an informed opinion of people with experience in the subject, who are recognized by others as qualified experts in it, and who can provide information, evidence, judgments and assessments” ⁴⁶. First, the project leader reviewed unresolved issues met during data collection, conducted random verification of the events recorded in the dataset against the original reports, and randomly checked entry of ProMED-mail and WAHIS reports in the dataset. In a second step, two experts in our team examined the data as well as its graphical displays and subsequently provided feedback. These expert validation steps enabled to clarify questions and further resolve potential omissions in the dataset.

Usage Notes

The SARS-ANI Dataset is currently the most comprehensive global dataset of SARS-CoV-2 events reported in animals. Information displays in the dataset aims to facilitate a wide range of analyses of SARS-CoV-2 infections/exposures in animals that will further our understanding of the epidemiology and impact of SARS-CoV-2 in the different animal hosts, at different scales: international, national, or subnational. The data format, the standardized coding of the health events, and the harmonized host names make the dataset intelligible (also for non-experts) while they allow for a great analytical flexibility and interesting integration potential. We hope that these qualities will enhance the reuse and combination of the data across sectors and disciplines.

Addressing SARS-CoV-2 events in animals

Figures 3, 4, and 5 visually display answers to some of the many questions that can be investigated using the data; other questions/visualizations can be computed from the published code (https://github.com/amel-github/sars-ani and https://doi.org/10.5281/zenodo.6442730):

What is the case fatality rate of SARS-CoV-2 per animal host and country? (Fig. 3)
Which SARS-CoV-2 variants have been identified in the different animal hosts? (Fig. 4)
Why were animals tested for SARS-CoV-2? (Fig. 5)

The dataset can become an essential tool to estimate the realised and potential threat and impact of SARS-CoV-2 on animal husbandry (specifically mink/fur animals), pets, wildlife (including captive wild animals), and conservation programmes^30,47. Coupled with economic data on the cost of testing, implemented control measures (e.g. mass culling) and value of farmed and captive animals, it can facilitate research in animal health economics, e.g. assessing the economic burden of SARS-CoV-2 infections on animal production systems and conservation programmes or supporting cost-benefit analysis of prevention of zoonotic-origin pandemics⁴⁸. The dataset can also assist risk-based veterinary surveillance by identifying surveillance needs to protect animal health and efficiently prioritizing resource allocation, especially in resource-limited contexts.

Additionally, the SARS-ANI Dataset can be expanded through the integration of other datasets (e.g. VIRION⁴⁹) or be ported and integrated into existing platforms on animal diseases (e.g. the Wildlife Health Information Sharing Partnership: https://whispers.usgs.gov/home or the WildHealthNet initiative: https://oneworldonehealth.wcs.org/Initiatives/WildHealthNet.aspx).

Finally, the procedure and the standardized reporting format developed in SARS-ANI are applicable to other infectious threats; likewise, the flexible and well-documented analytical tools can be adapted and used for descriptive epidemiology of other diseases.

One Health surveillance strategy

Because surveillance of zoonotic-origin diseases at the human-animal-environment interface is extremely challenging, there is a need for comprehensive One Health approaches to monitor SARS-CoV-2 carriage and infections in animals and humans globally¹⁸. One Health tools that enable the integrative analysis and visualization of SARS-CoV-2 events are critical. The SARS-ANI Dataset can be combined with other data across sectors (e.g. data on COVID-19 cases in humans, land-use and environmental data) to support research efforts on SARS-CoV-2 at the human-animal-environment interface⁵⁰, e.g. identifying hotspots of circulation and spillover, developing and adapting integrated One Health surveillance systems of SARS-CoV-2 events, and elucidating the natural ecology of SARS-CoV-2. We believe the SARS-ANI Dataset, with timely and reliable information, can assist inter-professional and multi-sectoral SARS-CoV-2 prevention and control activities, including the development of relevant national and international regulations and agreements to improve preparedness and reduce the risk of transmission between humans and animals. Information sharing on SARS-CoV-2 infections/exposures in animal living close to humans will benefit veterinary and public health professionals in their investigations of SARS-CoV-2 cases at the human-animal interface (e.g. need for testing pets/captive animals in contact with COVID-19 diagnosed owners/caretakers; types of samples and tests to be performed). The data can also provide background information to develop practical advice for the international trade in domestic and farmed species, for which a role in human infections has been recognized¹⁰. Furthermore, the data can be used to develop and adapt national or global One Health prevention, preparedness and response plans for emerging coronavirus diseases and assist public health officers in their task.

Finally, leveraging the experience from SARS-ANI and other projects^31,49, further research may allow the development of a global dataset of known spillback events that will enhance our understanding of the barriers and facilitators of zoonotic transmission as well as the training of retrospective and predictive models to identify patterns and predict future emergence³¹.

SARS-ANI VIS: informing scientists, stakeholders, and the public

The SARS-ANI dashboard, publicly accessible at https://vis.csh.ac.at/sars-ani/, provides intuitive insights into specific aspects of SARS-CoV-2 events in animals at-a-glance (Fig. 6). The visual representations of the data are displayed in a narrative format, where information is conveyed through vertically connected segments. Each segment features a selected topic, starting from a general overview and leading to more specific questions, such as variants across species or reported clinical signs. The partition into segments aims to ease the understanding of the information and facilitate webpage navigation. Each segment is carefully designed to exhibit the intended information in a fashion that can be comprehended by scientists and the general public alike. The dashboard thus will facilitate access to the data, favour animal health information sharing, and foster global understanding of the data among the scientific community, stakeholders, and the public. The dashboard intends to support public education about the risk of SARS-CoV-2 transmission between humans and animals and raise public awareness about possible wildlife conservation issues posed by the SARS-CoV-2 pandemic. The dashboard is linked to the live dataset available on GitHub (https://github.com/amel-github/sars-ani) and will therefore be subjected to continuous updates.

Code availability

A static version of the SARS-ANI Dataset and related files that accompany the dataset (metadata, R code, archived reports) have been deposited on Zenodo³⁸ and are available at https://doi.org/10.5281/zenodo.6442730, for public access. A live version of the dataset (together with the related files) is accessible on GitHub at https://github.com/amel-github/sars-ani. Please refer to the README file in the code release for further instructions.

References

Holmes, E. C. COVID-19 - lessons for zoonotic disease. Science 375, 1114–1115 (2022).
Article ADS CAS Google Scholar
Worobey, M. et al. The Huanan market was the epicenter of SARS-CoV-2 emergence. Preprint at https://doi.org/10.5281/zenodo.6299600 (2022).
Pekar, J. E. et al. SARS-CoV-2 emergence very likely resulted from at least two zoonotic events. Preprint at https://doi.org/10.5281/zenodo.6291628 (2022).
Sit, T. H. C. et al. Infection of dogs with SARS-CoV-2. Nature 586, 776–778 (2020).
Article ADS CAS Google Scholar
Mahdy, M. A. A., Younis, W. & Ewaida, Z. An overview of SARS-CoV-2 and animal infection. Front. Vet. Sci. 7 (2020).
Hamer, S. A. et al. SARS-CoV-2 infections and viral isolations among serially tested cats and dogs in households with infected owners in Texas, USA. Viruses 13, 938 (2021).
Article CAS Google Scholar
Zambrano-Mila, M. S., Freire-Paspuel, B., Orlando, S. A. & Garcia-Bereguiain, M. A. SARS-CoV-2 infection in free roaming dogs from the Amazonian jungle. One Health https://doi.org/10.1016/j.onehlt.2022.100387, 100387 (2022).
Zhang, Q. et al. A serological survey of SARS-CoV-2 in cat in Wuhan. Emerg. Microbes Infect. 9, 2013–2019 (2020).
Article CAS Google Scholar
Miró, G. et al. SARS-CoV-2 Infection in one cat and three dogs living in COVID-19-positive households in Madrid, Spain. Front. Vet. Sci. 8 (2021).
Yen, H.-L. et al. Transmission of SARS-CoV-2 delta variant (AY.127) from pet hamsters to humans, leading to onward human-to-human transmission: a case study. Lancet 399, 1070–1078 (2022).
Article CAS Google Scholar
Lecu, A., Bertelsen, M. F. & Walzer, C. in Science-based facts & knowledge about wild animals, zoos and SARS-CoV-2 virus Fifth (v9) edn Ch. 4.4 (ed Alexis Lécu and Rafaela Fiuza) https://www.eazwv.org/page/inf_handbook (European Association of Zoo and Wildlife Veterinarians, 2022).
Bartlett, S. L. et al. SARS-CoV-2 infection and longitudinal fecal screening in Malayan tigers (Panthera tigris jacksoni), Amur tigers (Panthera tigris altaica), and African lions (Panthera leo krugeri) at the Bronx zoo, New York USA. J. Zoo Wildl. Med. 51, 733–744 (2021).
Article Google Scholar
Koeppel, K. N. et al. SARS-CoV-2 reverse zoonoses to pumas and lions, South Africa. Viruses 14, 120 (2022).
Article CAS Google Scholar
McAloose, D. et al. From people to Panthera: Natural SARS-CoV-2 infection in tigers and lions at the Bronx zoo. mBio 11, e02220–02220 (2020).
Article CAS Google Scholar
Lu, L. et al. Adaptation, spread and transmission of SARS-CoV-2 in farmed minks and associated humans in the Netherlands. Nat. Commun. 12, 6802 (2021).
Article ADS CAS Google Scholar
Munnink, B. B. O. et al. Transmission of SARS-CoV-2 on mink farms between humans and mink and back to humans. Science 371, 172–177 (2021).
Article ADS Google Scholar
Oreshkova, N. et al. SARS-CoV-2 infection in farmed minks, the Netherlands, April and May 2020. Euro Surveill. 25, 2001005 (2020).
Article Google Scholar
Hale, V. L. et al. SARS-CoV-2 infection in free-ranging white-tailed deer. Nature 602, 481–486 (2022).
Article ADS CAS Google Scholar
Chandler, J. C. et al. SARS-CoV-2 exposure in wild white-tailed deer (Odocoileus virginianus). PNAS 118, e2114828118 (2021).
Article CAS Google Scholar
Kuchipudi, S. V. et al. Multiple spillovers from humans and onward transmission of SARS-CoV-2 in white-tailed deer. PNAS 119, e2121644119 (2022).
Article CAS Google Scholar
Mahajan, S. et al. Systemic infection of SARS-CoV-2 in free ranging leopard (Panthera pardus fusca) in India. Preprint at https://www.biorxiv.org/content/10.1101/2022.01.11.475327v1 (2022).
Sia, S. F. et al. Pathogenesis and transmission of SARS-CoV-2 in golden hamsters. Nature 583, 834–838 (2020).
Article ADS CAS Google Scholar
Grome, H. et al. SARS-CoV-2 outbreak among Malayan tigers and humans, Tennessee, USA, 2020. Emerg. Infect. Dis. 28, 833 (2022).
Article CAS Google Scholar
van Aart, A. E. et al. SARS-CoV-2 infection in cats and dogs in infected mink farms. Transbound. Emerg. Dis. https://doi.org/10.1111/tbed.14173 (2021).
Mallapaty, S. The search for animals harbouring coronavirus — and why it matters. Nature 591, 26–28 (2021).
Article ADS CAS Google Scholar
Sila, T. et al. Suspected cat-to-human transmission of SARS-CoV-2, Thailand, July-September 2021. Emerg. Infect. Dis. 28 (2022).
Pickering, B. et al. Highly divergent white-tailed deer SARS-CoV-2 with potential deer-to-human transmission. Preprint at https://www.biorxiv.org/content/10.1101/2022.02.22.481551v1 (2022).
Frutos, R. & Devaux, C. A. Mass culling of minks to protect the COVID-19 vaccines: is it rational? New Microbes New Infect. 38, 100816 (2020).
Article CAS Google Scholar
Akinsorotan, O. A., Olaniyi, O. E., Adeyemi, A. A. & Olasunkanmi, A. H. Corona virus pandemic: Implication on biodiversity conservation. Front. water 3 (2021).
Bang, A. & Khadakkar, S. Biodiversity conservation during a global crisis: Consequences and the way forward. PNAS 117, 29995–29999 (2020).
Article ADS CAS Google Scholar
Fagre, A. C. et al. Assessing the risk of human-to-wildlife pathogen transmission for conservation and public health. Ecol. Lett. 25, 1534–1549 (2022).
Article Google Scholar
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
Article Google Scholar
Vicente-Saez, R. & Martinez-Fuentes, C. Open Science now: A systematic literature review for an integrated definition. J. Bus. Res. 88, 428–436 (2018).
Article Google Scholar
Molloy, J. C. The open knowledge foundation: open data means better science. PLoS Biol. 9, e1001195 (2011).
Article CAS Google Scholar
Yu, V. L. & Madoff, L. C. ProMED-mail: An early warning system for emerging diseases. Clin. Infect. Dis. 39, 227–232 (2004).
Article Google Scholar
World Organisation for Animal Health. Terrestrial Animal Health Code https://www.woah.org/en/what-we-do/standards/codes-and-manuals/terrestrial-code-online-access/ (2021).
Schoch, C. L. et al. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database (Oxford) 2020 (2020).
Nerpel, A. et al. SARS-ANI: a global open access dataset of reported SARS-CoV-2 events in animals. Zenodo https://doi.org/10.5281/zenodo.6442731 (2022).
Koopmans, M. SARS-CoV-2 and the human-animal interface: outbreaks on mink farms. Lancet Infect. Dis. 21, 18–19 (2021).
Article CAS Google Scholar
R Core Team. R: A language and environment for statistical computing https://www.R-project.org/ (R Foundation for Statistical Computing, Vienna, Austria, 2021).
Chamberlain, S. A. & Szöcs, E. taxize: taxonomic search and retrieval in R. F1000Res. 2, 191 (2013).
Article Google Scholar
Wickham, H., François, R., Henry, L. & Müller, K. dplyr: A grammar of data manipulation https://CRAN.R-project.org/package=dplyr (2022).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag New York, 2016).
Keon-Woong, M. webr: Data and functions for web-based analysis https://CRAN.R-project.org/package=webr (2020).
Almende, B. V., Thieurmel, B. & Robert, T. visNetwork: Network Visualization using ‘vis.js’ Library https://CRAN.R-project.org/package=visNetwork (2021).
Escobar-Pérez, J. & Martínez, A. Validez de contenido y juicio de expertos: Una aproximación a su utilización. Avances en Medición 6, 27–36 (2008).
Google Scholar
Fagre, A. C. et al. Assessing the risk of human-to-wildlife pathogen transmission for conservation and public health. Ecol. Lett. 0, 1–16 (2022).
Google Scholar
Bernstein, A. S. et al. The costs and benefits of primary prevention of zoonotic pandemics. Sci. Adv. 8, eabl4183 (2022).
Article ADS CAS Google Scholar
Carlson, C. J. et al. The Global Virome in One Network (VIRION): an atlas of vertebrate-virus associations. mBio 13, e02985–02921 (2022).
Article Google Scholar
Amuasi, J. H. et al. Calling for a COVID-19 One Health Research Coalition. Lancet 395, 1543–1544 (2020).
Article CAS Google Scholar

Download references

Acknowledgements

We thank our colleagues of the Unit Veterinary Public Health and Epidemiology, University of Veterinary Medicine Vienna, Austria, for their inputs on the different visual displays of the data. The SARS-ANI Team is grateful to the Legal Department of the University of Veterinary Medicine Vienna, Austria, for its advice regarding data (re)use, and especially to C. Ruckenbauer. We also thank W. Schueller for his support with the GitHub interface and Prof. W. H. M. van der Poel for his helpful information on the WAHIS reporting.

Funding

Open Access funding for this article was provided by the University of Veterinary Medicine Vienna (Vetmeduni Vienna).

Author information

Authors and Affiliations

Unit of Veterinary Public Health and Epidemiology, University of Veterinary Medicine Vienna, Veterinaerplatz 1, 1210, Vienna, Austria
Afra Nerpel, Annemarie Käsbohrer & Amélie Desvars-Larrive
Complexity Science Hub Vienna, Josefstaedter Strasse 39, 1080, Vienna, Austria
Liuhuaying Yang, Johannes Sorger & Amélie Desvars-Larrive
Wildlife Conservation Society, 2300 Southern Blvd, Bronx, NY, 10460, USA
Chris Walzer
Research Institute of Wildlife Ecology, University of Veterinary Medicine Vienna, Savoyenstrasse 1, 1160, Vienna, Austria
Chris Walzer
VetFarm, University of Veterinary Medicine Vienna, Kremesberg 13, 2563, Pottenstein, Austria
Amélie Desvars-Larrive

Authors

Afra Nerpel
View author publications
You can also search for this author in PubMed Google Scholar
Liuhuaying Yang
View author publications
You can also search for this author in PubMed Google Scholar
Johannes Sorger
View author publications
You can also search for this author in PubMed Google Scholar
Annemarie Käsbohrer
View author publications
You can also search for this author in PubMed Google Scholar
Chris Walzer
View author publications
You can also search for this author in PubMed Google Scholar
Amélie Desvars-Larrive
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

A.N. developed the structured template and the coding scheme, extracted the data, investigated flagged errors and redundancies, updated the dataset, archived the information sources, and helped write the manuscript. A.D.L. conceived the study, coordinated the production of the dataset, developed the code for technical and visual validation, created the figures and tables, and helped write the manuscript. L.Y. created the dashboard, including conceptualisation, developing the data story and data automation process. J.S. supported the development of the dashboard and helped write the manuscript. A.K. and C.W. contributed to the data review process and helped write the manuscript.

Corresponding author

Correspondence to Amélie Desvars-Larrive.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary File 1

Online-only Table

Online-only Table 1 Description of the fields presented in the final dataset and format.

Full size table

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Nerpel, A., Yang, L., Sorger, J. et al. SARS-ANI: a global open access dataset of reported SARS-CoV-2 events in animals. Sci Data 9, 438 (2022). https://doi.org/10.1038/s41597-022-01543-8

Download citation

Received: 15 April 2022
Accepted: 08 July 2022
Published: 23 July 2022
DOI: https://doi.org/10.1038/s41597-022-01543-8

This article is cited by

Using drivers and transmission pathways to identify SARS-like coronavirus spillover risk hotspots
- Renata L. Muylaert
- David A. Wilkinson
- David T. S. Hayman
Nature Communications (2023)
Survey of white-footed mice (Peromyscus leucopus) in Connecticut, USA reveals low SARS-CoV-2 seroprevalence and infection with divergent betacoronaviruses
- Rebecca Earnest
- Anne M. Hahn
- Nathan D. Grubaugh
npj Viruses (2023)

Subjects

Abstract

Similar content being viewed by others

A comprehensive dataset of animal-associated sarbecoviruses

A geopositioned and evidence-graded pan-species compendium of Mayaro virus occurrence

A real-time spatio-temporal syndromic surveillance system with application to small companion animals

Background & Summary

Methods

Step 1: integrating ProMED-mail reports

Step 2: integrating WAHIS reports

Data extraction

Disclaimer

Data Records

Static dataset

Dynamic dataset

Limitations

Technical Validation

Quality control & data cleaning

Taxonomic validation

Search for duplicate events

Visual validation

Expert validation

Usage Notes

Addressing SARS-CoV-2 events in animals

One Health surveillance strategy

SARS-ANI VIS: informing scientists, stakeholders, and the public

Code availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplementary File 1

Online-only Table

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Using drivers and transmission pathways to identify SARS-like coronavirus spillover risk hotspots

Survey of white-footed mice (Peromyscus leucopus) in Connecticut, USA reveals low SARS-CoV-2 seroprevalence and infection with divergent betacoronaviruses

Search

Quick links