SARS-ANI: a global open access dataset of reported SARS-CoV-2 events in animals

The zoonotic origin of SARS-CoV-2, the etiological agent of COVID-19, is not yet fully resolved. Although natural infections in animals are reported in a wide range of species, large knowledge and data gaps remain regarding SARS-CoV-2 in animal hosts. We used two major health databases to extract unstructured data and generated a global dataset of SARS-CoV-2 events in animals. The dataset presents harmonized host names, integrates relevant epidemiological and clinical data on each event, and is readily usable for analytical purposes. We also share the code for technical and visual validation of the data and created a user-friendly dashboard for data exploration. Data on SARS-CoV-2 occurrence in animals is critical to adapting monitoring strategies, preventing the formation of animal reservoirs, and tailoring future human and animal vaccination programs. The FAIRness and analytical flexibility of the data will support research efforts on SARS-CoV-2 at the human-animal-environment interface. We intend to update this dataset weekly for at least one year and, through collaborations, to develop it further and expand its use. 
Measurement(s): SARS-CoV-2 event in animal hosts
Technology Type(s): Manual data collection from text sources
Factor Type(s): ID • primary_source • archive_event_number • link_web • secondary_source • secondary_source_ID • secondary_source_web • host_com_orig • host_sci_orig • host_com_res • host_sci_res • host_colloq • host_sci_spec_res • family • epidemiological_unit • number_cases • number_susceptible • number_tested • number_deaths • age • sex • country_iso3 • country_name • subnational_administration • city • location_detail • date_confirmed • date_reported • date_published • related_to_other_entries • related_ID • test • sampling_type • test_2 • sampling_type_2 • test_3 • sampling_type_3 • negative_test • negative_sampling_type • negative_test_2 • negative_sampling_type_2 • reason_for_testing • symptoms • outcome • living_conditions • source_of_infection • variant • control_measures • original_source • link_original_source
Sample Characteristic - Organism: animal host(s)
Sample Characteristic - Environment: domestic • wild • captive • farmed
Sample Characteristic - Location: Global

Step 1: integrating ProMED-mail reports. ProMED-mail (https://promedmail.org/) is the largest publicly available system reporting global infectious disease outbreaks (outbreak denotes the occurrence of one or more cases in an epidemiological unit). It provides reports (called "posts") on outbreaks and disease emergence. The information flow leading to publication of ProMED-mail reports is as follows: a disease event to be dispatched is selected from daily notifications of outbreaks received via emails, searches of the Internet and traditional media, and scanning of official and unofficial websites. All incoming information is reviewed and filtered by an editor or associate editor, who subsequently sends it to a multidisciplinary team of subject matter expert moderators who assess the accountability and accuracy of the information, interpret it, provide commentary, and give references to previous ProMED-mail reports and the scientific literature 35 . One ProMED-mail report, identified via a unique report identifier, may describe a single health event or several.
The integration of the ProMED-mail reports of interest followed two steps:
i) Selection of ProMED-mail reports. We identified reports describing SARS-CoV-2 events in animals, i.e. presenting at least one individual case of SARS-CoV-2 in an animal, via the "Search Posts" function provided on the ProMED-mail website. We used the keywords "animal" and "COVID-19" (which are consistently used in the "Subject" of the ProMED-mail posts to report information related to SARS-CoV-2 in animals) to retrieve the reports pertaining to natural and experimental infections or vaccine assays in animals as well as general discussions on SARS-CoV-2 in animals (note: although COVID-19 refers to the disease caused by SARS-CoV-2 in humans and should not be used for animals, ProMED-mail conveniently uses this keyword for both humans and animals). Reports describing naturally occurring infection (i.e. the presence of the virus is demonstrated through laboratory method(s)) or exposure (i.e. the presence of antibodies against SARS-CoV-2 is evidenced through laboratory method(s)) of a single individual or group of individuals were filtered manually and considered for data extraction. As of date of submission (22 June 2022), the ProMED-mail database included 232 reports on SARS-CoV-2 in animals.
ii) Link to previous reports. When a health event is continuing, ProMED-mail publishes follow-up reports, which provide references to previous ProMED-mail reports (at the end of the report or in the section "See Also" at the end of the post). We used this information to identify the potential relationship of each reported event to a previous one (e.g. clinical follow-up, further spread of the virus, and treatment outcome) and entered this data into the final dataset.
www.nature.com/scientificdata
Step 2: integrating WAHIS reports. WAHIS (https://wahis.woah.org/) is a Web-based computer system that processes data on animal diseases in real-time. WAHIS data reflects the information gathered by the Veterinary Services from WOAH (formerly OIE) Member and non-Member Countries and Territories on WOAH-listed diseases in domestic animals and wildlife, as well as on emerging and zoonotic diseases. In accordance with the WOAH Terrestrial Animal Health Code 36 , the detection of infection with SARS-CoV-2 in animals meets the criteria for reporting to the WOAH as an emerging infection (https://www.woah.org/app/uploads/2021/03/a-reporting-sars-cov-2-to-the-oie.pdf). Only authorised users, i.e. the Delegates of WOAH Member Countries and their authorised representatives, can enter data into the WAHIS platform to notify the WOAH of relevant animal disease information.
One WAHIS report, identified via a unique report identifier, may contain a single outbreak or several, each identified via a unique outbreak identifier. All information is publicly accessible on the WAHIS interface.
The integration of the WAHIS reports of interest was performed in two steps:
i) Selection of WAHIS reports. We used the WAHIS dashboard of animal disease events (https://wahis.woah.org/#/events) to extract cases of SARS-CoV-2 infection in animals notified by WOAH Member and non-Member States. WAHIS publishes immediate notifications (INs) and follow-up reports (FURs), identifiable through the prefix "IN" and "FUR" in their respective names. Immediate notifications provide information on newly notified events while FURs generally provide updates on previously notified, ongoing events (e.g. number of newly infected animals and new deaths, newly implemented control measures).
We applied filters to the fields "DISEASE" ("SARS-CoV-2 in animals (inf. with)") and "REPORT DATE" to select reports related to SARS-CoV-2 events from 1 December 2019 to date. The reports can be consulted online or downloaded as an individual PDF or Excel file, each file corresponding to one country report (i.e. several outbreaks can be included in one report). As of date of submission (22 June 2022), the WAHIS dashboard included 311 reports on SARS-CoV-2.
ii) Identification of gaps and dataset completion. ProMED-mail screens a large range of information sources including WAHIS reports. The ProMED-mail posts mention the event ID of the WAHIS report(s) used as information source, which makes it possible to consult the original source on the WAHIS dashboard. Therefore, we chose to first identify SARS-CoV-2 events in animals in the ProMED-mail database. In a second step, we used the WAHIS dashboard to identify gaps, i.e. complete the previously entered SARS-CoV-2 events (hereinafter referred to as sibling events) and find additional events not reported in ProMED-mail (Fig. 1).
For each country (using the filter "COUNTRY/TERRITORY" on the WAHIS dashboard), we identified sibling events by comparing the WAHIS reports against all the previously entered ProMED-mail reports of the country, using the information on species, subnational administration, and date of laboratory confirmation (a buffer of ±7 days was considered due to possible discrepancies related to confirmation by different laboratories) or date of publication when date of laboratory confirmation was missing (in this case a buffer of 30 days was considered because date of publication is strongly database-dependent). We did not use information about the city here because reports may inconsistently refer to city/village of outbreak occurrence due to data privacy.
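The sibling-matching rule described above can be sketched in Python. This is only an illustration of the stated criteria, not the authors' procedure, which was carried out manually on the WAHIS dashboard. The record layout, the function name `is_sibling` and the example values are hypothetical, and the 30-day publication buffer is assumed to be symmetric.

```python
from datetime import date

CONFIRMATION_BUFFER_DAYS = 7   # buffer stated in the text (plus or minus 7 days)
PUBLICATION_BUFFER_DAYS = 30   # buffer stated in the text, assumed symmetric here


def is_sibling(promed, wahis):
    """Return True when two records from the same country may describe one event."""
    # Species and subnational administration must match; city is deliberately ignored.
    if promed["host"] != wahis["host"]:
        return False
    if promed["subnational_administration"] != wahis["subnational_administration"]:
        return False
    # Preferred criterion: date of laboratory confirmation.
    if promed["date_confirmed"] is not None and wahis["date_confirmed"] is not None:
        gap = abs((promed["date_confirmed"] - wahis["date_confirmed"]).days)
        return gap <= CONFIRMATION_BUFFER_DAYS
    # Fallback criterion: date of publication.
    gap = abs((promed["date_published"] - wahis["date_published"]).days)
    return gap <= PUBLICATION_BUFFER_DAYS


# Hypothetical records, for illustration only.
promed_record = {
    "host": "mink",
    "subnational_administration": "Region A",
    "date_confirmed": date(2020, 8, 17),
    "date_published": date(2020, 8, 18),
}
wahis_record = {
    "host": "mink",
    "subnational_administration": "Region A",
    "date_confirmed": date(2020, 8, 20),
    "date_published": date(2020, 8, 25),
}
print(is_sibling(promed_record, wahis_record))
```

A match found this way is only a candidate: in the workflow described here, each pairing was still confirmed by reading both reports.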
This strategy, although time consuming, was consistently applied throughout the data extraction process, ensuring a comprehensive collection of information for each outbreak, accuracy of the data, and reproducibility of the method.

Data extraction.
ProMED-mail provides detailed, text-based (narrative) reports of health events. This data is unstructured whereas WAHIS uses both semi-structured (.pdf file organized into sections, including free text) and structured data (.xlsx format) to display the reports. Each selected report underwent manual review by a veterinarian, guaranteeing a full understanding of the content and context. Information was manually extracted and hand-coded.
The following event information was extracted (when available) and entered into a structured template within a dedicated .csv file:
- Animal host: common name (i.e. most specific designation provided by the source(s), in English) and scientific name as mentioned in the source(s) (scientific names are harmonized so that only the first letter of the genus is capitalized);
- Geographic location: country, subnational administration, city;
- SARS-CoV-2 variant;
- Dates: when the case was laboratory confirmed, reported by WAHIS, and published;
- Metrics: number of cases, number of deaths, number of susceptible animals.
Moreover, the following information on animal patient(s)/case(s) was extracted to populate the dataset:
- Age;
- Sex;
- Living conditions;
- Main reason for testing;
- Suspected source of infection;
- Symptoms: the main reported clinical signs presumed to be associated with SARS-CoV-2 were summarized with one to several keywords mentioned in the text. Multiple symptoms were separated by the operator "and".
Extracted data described above was entered into the dataset as mentioned in the report and no information was subjected to any interpretation before entry. In addition, to facilitate understanding of the data, integration with other sources, and analysis, we have added the five following patient attributes:
- The common and scientific name (resolved to species or subspecies level, depending on the available information) of the animal host, harmonized against the National Center for Biotechnology Information (NCBI) taxonomic backbone 37 ;
- The colloquial name of the host, i.e. the name commonly used to identify the animal in non-specialist language (e.g. "tiger" for "Sumatran tiger");
- The scientific name of the host resolved to the species level;
- The higher taxonomy (i.e. family) of the animal host, retrieved from the report, expert knowledge, or the literature.
Finally, for each SARS-CoV-2 event recorded in the dataset, we reported the primary and secondary source of information, i.e. source name (ProMED-mail or WAHIS) and link to the online report, as well as the original information source as referred to by the primary source. A copy of each report used during the data extraction process was downloaded and saved as a PDF file. We inserted a timestamp on the saved file (ProMED-mail reports) or specified the download date within the file name (it was not possible to insert a timestamp on WAHIS reports).
Data documenting each event corresponds to the information available in the ProMED-mail and/or WAHIS report at the time the report was consulted (see timestamp or download date). Potential subsequent editions or modifications of the report by ProMED-mail and/or WAHIS were not considered.
Disclaimer. Use of the data from the WAHIS platform requires mentioning the following statement: "The World Organisation for Animal Health (WOAH) bears no responsibility for the integrity or accuracy of the data contained herein, in particular due, but not limited to, any deletion, manipulation, or reformatting of data that may have occurred beyond its control".

Data Records
Each row of the dataset represents a SARS-CoV-2 event in animal(s), identified by a unique identifier (field ID). We define an event as a single case, or several epidemiologically related cases, identified by the presence of viral RNA (proof of infection) and/or antibodies (proof of exposure) in an animal. Epidemiologically related cases include e.g. animals belonging to the same farm, captive animals housed together, pets belonging to the same household, or animals sampled within the same (generally transversal) study, featuring similar event and patient attributes, i.e. they underwent the same laboratory test(s) and showed the same results (including variant), exhibited the same symptoms and disease outcome, and were confirmed, reported (when applicable), and published on the same date (e.g. when pets of the same species, sharing the same household, showed different symptoms, they are reported as two distinct events). Events include follow-up history reports of outbreaks (e.g. follow-up on the clinical status of the animal, variant identification after case confirmation).
The field related_to_other_entries specifies the potential relationship between events, and thereby makes it possible to identify events that are related in space or time as well as follow-up reports (e.g. when animals described in two or more events are living together or when a follow-up report presents an update of another event, itself referred to as updated by). Online-only Table 1 describes the 50 fields presented in the final dataset and their format. Supplementary File 1 provides three examples illustrating the structure and coding scheme of the SARS-ANI Dataset.
We have considered the two following values throughout the dataset:
• NA (not applicable): when the field does not apply to the event. For example, when only one laboratory test is conducted (field test), NA is reported for the second and third tests (test_2 and test_3, respectively).
• NS (not specified): when the information is relevant for the event but has not been specified in the report(s). For example, when a PCR test is performed for diagnostic purposes but the report(s) does not mention which sample was used, the sample type (sampling_type) is NS.
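Because NA and NS carry different meanings, an analysis usually needs to keep them apart rather than merge them into a single missing value. The following Python sketch shows one way to do so when summing case numbers. The three-row excerpt and the helper name `summarize_cases` are hypothetical; only the field names and the NA/NS convention come from the dataset description.

```python
import csv
import io

# Hypothetical three-row excerpt using field names from the dataset.
SAMPLE = """ID,test,test_2,sampling_type,number_cases
1,PCR,NA,NS,2
2,PCR,sequencing,nasal swab,NS
3,ELISA,NA,blood,5
"""


def summarize_cases(rows):
    """Return (sum of reported cases, number of events with unspecified cases)."""
    total = 0
    unspecified = 0
    for row in rows:
        value = row["number_cases"]
        if value == "NS":
            # Relevant but not stated in the report: count the gap, do not guess.
            unspecified += 1
        elif value != "NA":
            total += int(value)
    return total, unspecified


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
total, unspecified = summarize_cases(rows)
print(total, unspecified)
```

Reporting the number of NS events alongside the total makes explicit how many events contribute no case count to the sum.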
Accompanying the dataset, we publish the R code to perform technical and visual validation of the data as well as the ProMED-mail and WAHIS reports used as information sources (n = 364 reports as of date of submission). We also share the list of ProMED-mail and WAHIS reports that have not been included in the dataset and main reason for exclusion. This strategy, in line with the Open Science Principles 34 , aims to ensure that the data being reported is accurate and that all information can be accessed by researchers, policymakers, and the public. This also guarantees reproducibility of the data collection, may motivate further external validation processes as well as a large re-use of the data. The SARS-ANI files and products are summarized in Table 1. Event displays are also freely available on the SARS-ANI dashboard (https://vis.csh.ac.at/sars-ani/, see Usage Notes).

Static dataset.
A static copy of the dataset in .csv file format is deposited on Zenodo 38 (first version, v1.0, was uploaded on 11 April 2022) and all versions are publicly available at https://doi.org/10.5281/zenodo.6442730, together with the SARS-ANI related files (metadata, R code, archived reports) described in Table 1. As of date of submission, the last uploaded static copy of the dataset (v1.1, 20 June 2022) encompasses all SARS-CoV-2 events published between 29 February 2020 and 7 June 2022. Version 1.1 displays 696 records of SARS-CoV-2 events in animals, representing 1,947 documented cases (infections and/or exposures), in 25 farmed, captive, wild, and domestic taxonomically-resolved animal species belonging to 14 families, from 39 countries worldwide (covering 150 subnational administrative areas). The number of cases was not reported by ProMED-mail and/or WAHIS in 121 events. Figure 2 shows the geographic distribution of the reported SARS-CoV-2 outbreaks included in the dataset. Table 2 summarizes the number of SARS-CoV-2 cases reported globally in each animal host.

Dynamic dataset.
A live version of the SARS-ANI Dataset in .csv file format is publicly available on the GitHub repository, accessible at https://github.com/amel-github/sars-ani, together with the related SARS-ANI files described in Table 1. We plan to update the dataset weekly for at least the next 12 months. Then, depending on resources, the dataset will be updated at least half-yearly. The same technical procedures for data extraction and validation will be applied to any new event added to the dataset.
The GitHub interface allows users to flag potentially inaccurate records in the dataset, which will trigger an error correction scheme, mainly consisting of processing the flagged record through another validation loop to check and replace the erroneous field when needed. Through the SARS-ANI GitHub repository and dashboard (https://vis.csh.ac.at/sars-ani/, see Usage Notes), we also expect to motivate experts in animal health, epidemiology, and conservation to help us fill potential gaps in the records. The GitHub interface allows anybody to suggest changes to the dataset and code via the Issues Tracker (e.g. reporting an error and submitting new data). Contributions to the code can be made via a pull request (see Contributing.md, Table 1).

Limitations.
The dataset includes only SARS-CoV-2 events that have been published in ProMED-mail and/or WAHIS. Therefore, the integration of an event in the dataset strongly depends on the reporting strategy of the country to the WOAH, the intensity of the research and surveillance strategy in the different animal species (e.g. whether pets from infected households are systematically investigated or not), the media coverage on the diagnosed cases, and the uptake of the reported event by the ProMED-mail team.
Moreover, we have identified five minor limitations in the dataset:
1. Some SARS-CoV-2 events in animals were not depicted in detail across the information sources, especially those related to infections in mink farms, for which the number of confirmed cases and deaths is often missing. Additionally, it was not always possible to discern from the report whether mink belonged to the same farm unit or not. In several reports on SARS-CoV-2 infections in mink farms, the number of sampled animals is specified whereas the number of positive animals is not (e.g. https://wahis.oie.int/#/report-info?reportId=16733), which makes it impossible to infer whether the samples were pooled or not. Furthermore, SARS-CoV-2 infections in mink from Denmark and the Netherlands that occurred before 12 November 2020 (https://www.woah.org/en/oie-statement-on-covid-19-and-mink/) do not appear on the WAHIS dashboard of animal disease events while both countries endured a considerable burden in the fur-farming system 39 . Therefore, missing values (e.g. number of cases) for those events, reported in ProMED-mail, could not be completed. For these reasons, the dataset does not allow an accurate estimation of the economic and health burden of SARS-CoV-2 in mink.
2. Reports on mink and white-tailed deer often report the number of dams (sometimes dams and young) as the number of susceptible animals in the farm or herd, omitting adult males. We have reported the number as given in the information source(s); therefore, for these species, the number of susceptible animals in each event may be underestimated.
3. Although very uncommon, some errors were found in the ProMED-mail or WAHIS reports. For example, the date of laboratory results sometimes precedes the date of sampling, or sequencing is mentioned as the first diagnostic test although a PCR was most likely performed beforehand. Since we did not have any means to retrieve the correct information, we entered the data as mentioned in the reports.
4. To respect and protect the privacy of the animal owners, outbreak location (i.e. city or village), as provided in the ProMED-mail and WAHIS reports, may not represent the exact location of the outbreak and should be interpreted with caution.
5. When multiple events are related (e.g. animals were living together: related_to_other_entries = living together), the number of susceptible animals (number_susceptible) from the same species is identical in both events and is therefore redundant. Similarly, when animals belonging to the same species and living together exhibited different symptoms or underwent different laboratory tests (therefore reported as distinct events) but were culled as part of a control strategy (i.e. related_to_other_entries = living together AND control_measures = culling OR selective culling), the number of reported deaths (number_deaths) is identical for these events (and is therefore redundant). Although very few records correspond to those cases, this may lead to a certain degree of over-counting of the number of susceptible or dead animals if the events are not filtered on the fields related_to_other_entries and control_measures accordingly (e.g. number of deaths should be counted once for farm animals living together). The code provided to explore the dataset presents examples of how to use filters.
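The over-counting issue in the fifth limitation can be illustrated with a short Python sketch. It is not the filtering code shipped with the dataset. It assumes, for illustration only, that related_ID holds the ID of one other event, that linked events concern the same species, and that number_deaths has already been converted to an integer; the function name `total_deaths` and the example records are hypothetical.

```python
CULLING = ("culling", "selective culling")


def total_deaths(events):
    """Sum number_deaths, counting each culled co-housed pair of events once."""
    seen_pairs = set()
    total = 0
    for event in events:
        linked = event["related_to_other_entries"] == "living together"
        if linked and event["control_measures"] in CULLING:
            # The same pair appears twice (once per event), in either order.
            pair = frozenset((event["ID"], event["related_ID"]))
            if pair in seen_pairs:
                continue  # deaths for this pair were already counted
            seen_pairs.add(pair)
        total += event["number_deaths"]
    return total


# Hypothetical records: events 1 and 2 describe animals culled on the same farm.
events = [
    {"ID": 1, "related_ID": 2, "related_to_other_entries": "living together",
     "control_measures": "culling", "number_deaths": 100},
    {"ID": 2, "related_ID": 1, "related_to_other_entries": "living together",
     "control_measures": "culling", "number_deaths": 100},
    {"ID": 3, "related_ID": "NA", "related_to_other_entries": "NA",
     "control_measures": "NA", "number_deaths": 1},
]
print(total_deaths(events))
```

A naive sum over these three records would give 201 deaths, whereas counting the culled pair once gives 101. Groups of more than two related events would need a more general grouping step than the pairwise key used here.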

Technical Validation
Validation of the collected data followed several steps (Fig. 1).

Quality control & data cleaning. First, the data underwent a quality control and cleaning procedure in which the unique values of each field were checked to search for inaccurate (e.g. containing typographical errors or not belonging to a pre-defined list of entities), unreliable (e.g. the value was not specific), incorrectly formatted (e.g. date was formatted as dd/mm/yyyy instead of yyyy-mm-dd), or missing data in the dataset. This step was performed in R 40 using the base function unique(). Events containing detected errors were manually inspected against original reports. When necessary, the erroneous values were modified, replaced, or removed.
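The checks described above were run by the authors in R with unique(). As a rough Python analogue, the sketch below lists the unique values of a field, flags values outside a pre-defined list, and flags dates that are not written as yyyy-mm-dd. The helper names, the allowed list for sex and the example rows are hypothetical.

```python
import re

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
PLACEHOLDERS = {"NA", "NS"}  # dataset conventions, never treated as errors


def unique_values(rows, field):
    """Sorted unique values of a field, for visual inspection."""
    return sorted({row[field] for row in rows})


def outside_allowed(rows, field, allowed):
    """IDs of events whose value is neither allowed nor a placeholder."""
    return [row["ID"] for row in rows
            if row[field] not in allowed and row[field] not in PLACEHOLDERS]


def malformed_dates(rows, field):
    """IDs of events whose date does not have the shape yyyy-mm-dd."""
    return [row["ID"] for row in rows
            if row[field] not in PLACEHOLDERS and not ISO_DATE.fullmatch(row[field])]


# Hypothetical rows: event 2 has a typo and a dd/mm/yyyy date.
rows = [
    {"ID": "1", "sex": "female", "date_confirmed": "2021-03-05"},
    {"ID": "2", "sex": "femal", "date_confirmed": "05/03/2021"},
    {"ID": "3", "sex": "NS", "date_confirmed": "NS"},
]
print(unique_values(rows, "sex"))
print(outside_allowed(rows, "sex", {"female", "male"}))
print(malformed_dates(rows, "date_confirmed"))
```

The pattern only checks the shape of the date, not whether it is a valid calendar day, so flagged and unflagged events would still be checked against the original reports as described above.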

Taxonomic validation. Misspelled animal names and errors in taxonomy can lead to incorrect scientific conclusions and poor policy design. Moreover, harmonization of host names aids with integrating other datasets (e.g. data on host biological traits, geographic distribution, or association with other pathogens). Therefore, for each event, we performed taxonomic validation of the common and scientific name of the animal host, using the R 40 package taxize 41 . We checked if the scientific names were up to date and if those names were spelled correctly using the function gnr_resolve() (names with a score ≤0.98 were manually inspected and, when needed, corrected). In addition, the scientific name for each common name (and vice versa) was resolved against the NCBI taxonomic backbone 37 using the function comm2sci() (sci2comm()). Finally, the higher taxonomy (i.e. family) for species names was validated by querying the NCBI database, using the function tax_name().

SARS-ANI VIS Dashboard
Visual interactive displays of some selected data of the SARS-ANI Dataset, enabling the monitoring of SARS-CoV-2 events in animals in near real-time (https://vis.csh.ac.at/sars-ani).

Search for duplicate events. We identified duplicate events, defined as a single event reported more than once in the dataset. Events were flagged as duplicates if the geolocation information (i.e. country, subnational_administration, city, and location_detail), resolved animal host denomination (host_com_res and host_sci_res), sex, age, symptoms, date (date_confirmed, or date_reported if date_confirmed was missing, or date_published if the two other dates were missing), number of cases, number of deaths, number of susceptible animals, tests conducted, outcome, and relationship to another event (related_ID) were identical. This step was executed in R 40 , using the dplyr package 42 . Flagged events were manually inspected against original information source(s) to confirm redundancy. Duplicate events were removed. Events that were incorrectly flagged as duplicates were corrected. The code to reproduce the three steps (described above) of the technical validation in R 40 (Table 1) is freely accessible at https://github.com/amel-github/sars-ani and at https://doi.org/10.5281/zenodo.6442730.
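The duplicate-flagging rule can be sketched as follows. The authors performed this step in R with dplyr; the Python version below is only an illustration of the logic. It assumes that country is compared through country_iso3, that "tests conducted" covers test, test_2 and test_3, and that missing dates are coded as NA, NS or an empty string. The example events are hypothetical.

```python
from collections import defaultdict

KEY_FIELDS = (
    "country_iso3", "subnational_administration", "city", "location_detail",
    "host_com_res", "host_sci_res", "sex", "age", "symptoms",
    "number_cases", "number_deaths", "number_susceptible",
    "test", "test_2", "test_3", "outcome", "related_ID",
)
DATE_FIELDS = ("date_confirmed", "date_reported", "date_published")
MISSING = {"", "NA", "NS"}


def event_date(event):
    """First available date, in the order confirmed, reported, published."""
    for field in DATE_FIELDS:
        if event[field] not in MISSING:
            return event[field]
    return None


def flag_duplicates(events):
    """Return lists of IDs whose comparison key is identical."""
    groups = defaultdict(list)
    for event in events:
        key = tuple(event[field] for field in KEY_FIELDS) + (event_date(event),)
        groups[key].append(event["ID"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical events: "a" and "b" are identical, "c" differs in sex.
base = {field: "NS" for field in KEY_FIELDS + DATE_FIELDS}
base.update(country_iso3="XXX", host_com_res="cat", date_published="2021-01-10")
events = [
    dict(base, ID="a"),
    dict(base, ID="b"),
    dict(base, ID="c", sex="male"),
]
print(flag_duplicates(events))
```

As in the procedure described above, a flagged group is only a candidate for removal; each one would still be inspected against the original reports.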

Visual validation. The data was visually inspected through different graphical displays. Figures and maps were produced in R 40 , using the packages ggplot2 43 , webr 44 , and visNetwork 45 for graphical visualizations. The code to visually summarize the data (through maps, figures, and interactive network) is provided as an R Markdown document (Table 1) and is publicly accessible at https://github.com/amel-github/sars-ani and at https://doi.org/10.5281/zenodo.6442730.

Expert validation. "Expert judgment is defined as an informed opinion of people with experience in the subject, who are recognized by others as qualified experts in it, and who can provide information, evidence, judgments and assessments" 46 . First, the project leader reviewed unresolved issues met during data collection, conducted random verification of the events recorded in the dataset against the original reports, and randomly checked entry of ProMED-mail and WAHIS reports in the dataset. In a second step, two experts in our team examined the data as well as its graphical displays and subsequently provided feedback. These expert validation steps made it possible to clarify questions and further resolve potential omissions in the dataset.

Usage Notes
The SARS-ANI Dataset is currently the most comprehensive global dataset of SARS-CoV-2 events reported in animals. The information presented in the dataset aims to facilitate a wide range of analyses of SARS-CoV-2 infections/exposures in animals that will further our understanding of the epidemiology and impact of SARS-CoV-2 in the different animal hosts, at different scales: international, national, or subnational. The data format, the standardized coding of the health events, and the harmonized host names make the dataset intelligible (also for non-experts) while allowing for great analytical flexibility and interesting integration potential. We hope that these qualities will enhance the reuse and combination of the data across sectors and disciplines.
Addressing SARS-CoV-2 events in animals. Figures 3, 4, and 5 visually display answers to some of the many questions that can be investigated using the data; other questions/visualizations can be computed from the published code (https://github.com/amel-github/sars-ani and https://doi.org/10.5281/zenodo.6442730):
• What is the case fatality rate of SARS-CoV-2 per animal host and country? (Fig. 3)
• Which SARS-CoV-2 variants have been identified in the different animal hosts? (Fig. 4)
• Why were animals tested for SARS-CoV-2? (Fig. 5)
The dataset can become an essential tool to estimate the realised and potential threat and impact of SARS-CoV-2 on animal husbandry (specifically mink/fur animals), pets, wildlife (including captive wild animals), and conservation programmes 30,47 . Coupled with economic data on the cost of testing, implemented control measures (e.g. mass culling) and value of farmed and captive animals, it can facilitate research in animal health economics, e.g. assessing the economic burden of SARS-CoV-2 infections on animal production systems and conservation programmes or supporting cost-benefit analysis of prevention of zoonotic-origin pandemics 48 . The dataset can also assist risk-based veterinary surveillance by identifying surveillance needs to protect animal health and efficiently prioritizing resource allocation, especially in resource-limited contexts.
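As an illustration of the first question, the case fatality rate per host and country can be computed as total deaths divided by total cases. The Python sketch below is not the published R code and makes several assumptions: events with NA or NS in number_cases or number_deaths are skipped, hosts are grouped by host_com_res, and the example events are hypothetical.

```python
from collections import defaultdict


def case_fatality_rate(events):
    """Return {(host, country): deaths / cases} over events with both counts."""
    cases = defaultdict(int)
    deaths = defaultdict(int)
    for event in events:
        try:
            n_cases = int(event["number_cases"])
            n_deaths = int(event["number_deaths"])
        except ValueError:
            continue  # NA or NS: the event cannot contribute to the ratio
        key = (event["host_com_res"], event["country_iso3"])
        cases[key] += n_cases
        deaths[key] += n_deaths
    return {key: deaths[key] / cases[key] for key in cases if cases[key] > 0}


# Hypothetical events, for illustration only.
events = [
    {"host_com_res": "cat", "country_iso3": "XXX",
     "number_cases": "2", "number_deaths": "0"},
    {"host_com_res": "cat", "country_iso3": "XXX",
     "number_cases": "2", "number_deaths": "1"},
    {"host_com_res": "dog", "country_iso3": "XXX",
     "number_cases": "NS", "number_deaths": "0"},
]
print(case_fatality_rate(events))
```

Related events that share the same deaths, as described under Limitations, would need to be filtered before a computation of this kind; and because number_deaths may include culled animals, the result should be interpreted with that in mind.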
Additionally, the SARS-ANI Dataset can be expanded through the integration of other datasets (e.g. VIRION 49 ) or be ported and integrated into existing platforms on animal diseases (e.g. the Wildlife Health Information Sharing Partnership: https://whispers.usgs.gov/home or the WildHealthNet initiative: https://oneworldonehealth.wcs.org/Initiatives/WildHealthNet.aspx).
Therefore, the number of diagnosed cases is largely under-estimated in mink. This is also true for deer, but to a lesser extent.
**NS: Not specified in the reports. The hamster species was specified in neither the ProMED-mail nor the WAHIS report.
Finally, the procedure and the standardized reporting format developed in SARS-ANI are applicable to other infectious threats; likewise, the flexible and well-documented analytical tools can be adapted and used for descriptive epidemiology of other diseases.
One Health surveillance strategy. Because surveillance of zoonotic-origin diseases at the human-animal-environment interface is extremely challenging, there is a need for comprehensive One Health approaches to monitor SARS-CoV-2 carriage and infections in animals and humans globally 18 . One Health tools that enable the integrative analysis and visualization of SARS-CoV-2 events are critical. The SARS-ANI Dataset can be combined with other data across sectors (e.g. data on COVID-19 cases in humans, land-use and environmental data) to support research efforts on SARS-CoV-2 at the human-animal-environment interface 50 , e.g. identifying hotspots of circulation and spillover, developing and adapting integrated One Health surveillance systems of SARS-CoV-2 events, and elucidating the natural ecology of SARS-CoV-2. We believe the SARS-ANI Dataset, with timely and reliable information, can assist inter-professional and multi-sectoral SARS-CoV-2 prevention and control activities, including the development of relevant national and international regulations and agreements to improve preparedness and reduce the risk of transmission between humans and animals. Information sharing on SARS-CoV-2 infections/exposures in animals living close to humans will benefit veterinary and public health professionals in their investigations of SARS-CoV-2 cases at the human-animal interface (e.g. need for testing pets/captive animals in contact with COVID-19 diagnosed owners/caretakers; types of samples and tests to be performed). The data can also provide background information to develop practical advice for the international trade in domestic and farmed species, for which a role in human infections has been recognized 10 . Furthermore, the data can be used to develop and adapt national or global One Health prevention, preparedness and response plans for emerging coronavirus diseases and assist public health officers in their task.
Finally, leveraging the experience from SARS-ANI and other projects 31,49, further research may allow the development of a global dataset of known spillback events that will enhance our understanding of the barriers and facilitators of zoonotic transmission as well as the training of retrospective and predictive models to identify patterns and predict future emergence 31.
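The cross-sector combination described in this subsection (joining SARS-ANI events with, for example, data on COVID-19 cases in humans) can be sketched as follows. This is a minimal illustration, not the authors' R code. It assumes that each event is available as a mapping keyed by the dataset's field names (here `country_iso3`, `host_com_res` and `number_cases`), that counts arrive as text, and that missing counts are encoded as non-numeric strings such as "NA". The function names, the human case table, and all rows shown are hypothetical placeholders, not real records.

```python
from collections import defaultdict


def animal_cases_by_country(events):
    """Sum number_cases per country_iso3.

    Rows with a missing country code or a non-numeric count are skipped
    (the "NA" encoding of missing values is an assumption of this sketch).
    """
    totals = defaultdict(int)
    for row in events:
        iso3 = (row.get("country_iso3") or "").strip().upper()
        raw = (row.get("number_cases") or "").strip()
        if not iso3 or not raw.isdigit():
            continue
        totals[iso3] += int(raw)
    return dict(totals)


def join_with_human_cases(animal_totals, human_cases):
    """Inner join on the ISO3 country code: keep countries present in both tables."""
    return {
        iso3: {"animal_cases": count, "human_cases": human_cases[iso3]}
        for iso3, count in sorted(animal_totals.items())
        if iso3 in human_cases
    }


# Placeholder rows for illustration only; these are NOT records from the dataset.
events = [
    {"country_iso3": "AAA", "host_com_res": "cat", "number_cases": "2"},
    {"country_iso3": "AAA", "host_com_res": "dog", "number_cases": "1"},
    {"country_iso3": "BBB", "host_com_res": "mink", "number_cases": "NA"},
    {"country_iso3": "CCC", "host_com_res": "lion", "number_cases": "3"},
]

# Hypothetical human case counts keyed by the same ISO3 codes.
human_cases = {"AAA": 1000, "CCC": 500}

combined = join_with_human_cases(animal_cases_by_country(events), human_cases)
print(combined)
```

Joining on `country_iso3` rather than on `country_name` avoids mismatches caused by spelling variants of country names across data sources. A finer join (for instance by subnational unit or reporting date) would follow the same pattern with a composite key.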
SARS-ANI VIS: informing scientists, stakeholders, and the public. The SARS-ANI dashboard, publicly accessible at https://vis.csh.ac.at/sars-ani/, provides intuitive, at-a-glance insights into specific aspects of SARS-CoV-2 events in animals (Fig. 6). The visual representations of the data are displayed in a narrative format, where information is conveyed through vertically connected segments. Each segment features a selected topic, starting from a general overview and leading to more specific questions, such as variants across species or reported clinical signs. The partition into segments aims to ease the understanding of the information and facilitate webpage navigation. Each segment is carefully designed to present the intended information in a way that can be understood by scientists and the general public alike. The dashboard will thus facilitate access to the data, favour animal health information sharing, and foster global understanding of the data among the scientific community, stakeholders, and the public. The dashboard is intended to support public education about the risk of SARS-CoV-2 transmission between humans and animals and to raise public awareness about possible wildlife conservation issues posed by the SARS-CoV-2 pandemic. The dashboard is linked to the live dataset available on GitHub (https://github.com/amel-github/sars-ani) and will therefore be continuously updated.

Code availability
A static version of the SARS-ANI Dataset and the related files that accompany it (metadata, R code, archived reports) have been deposited on Zenodo 38 and are publicly available at https://doi.org/10.5281/zenodo.6442730. A live version of the dataset (together with the related files) is accessible on GitHub at https://github.com/amel-github/sars-ani. Please refer to the README file in the code release for further instructions.

Acknowledgements
We thank our colleagues of the Unit Veterinary Public Health and Epidemiology, University of Veterinary Medicine Vienna, Austria, for their input on the different visual displays of the data. The SARS-ANI Team is grateful to the Legal Department of the University of Veterinary Medicine Vienna, Austria, and especially to C. Ruckenbauer, for its advice regarding data (re)use. We also thank W. Schueller for his support with the GitHub interface and Prof. W. H. M. van der Poel for his helpful information on the WAHIS reporting.

Author contributions
A.N. developed the structured template and the coding scheme, extracted the data, investigated flagged errors and redundancies, updated the dataset, archived the information sources, and helped write the manuscript. A.D.L. conceived the study, coordinated the production of the dataset, developed the code for technical and visual validation, created the figures and tables, and helped write the manuscript. L.Y. created the dashboard, including conceptualisation, developing the data story and data automation process. J.S. supported the development of the dashboard and helped write the manuscript. A.K. and C.W. contributed to the data review process and helped write the manuscript.