A global dataset of microbial community in ticks from metagenome study

Ticks are important vectors of various zoonotic pathogens that can infect animals and humans, and the documentation of tick-borne pathogens is strongly biased towards microorganisms with pronounced disease phenotypes. The recent development of next-generation sequencing (NGS) has enabled the study of whole microbial communities, referred to as the microbiome. Herein, we undertake a systematic review of the published literature to build a comprehensive global dataset of the microbiome determined by NGS in field-collected ticks. The dataset comprises 4418 records from 76 publications, involving geo-referenced occurrences for 46 species of ticks and 219 microorganism families, and reveals a total of 83 emerging viruses identified since 1980 from 24 tick species belonging to 6 tick genera. The viral, bacterial and eukaryotic composition can be compared by tick species, life stage, specimen type, or geographic location. The data can assist further investigation of the ecological, biogeographical and epidemiological features of tick-borne diseases.

Measurement(s): microbial community; Technology Type(s): Next Generation Sequencing; Factor Type(s): tick; Sample Characteristic - Organism: tick; Sample Characteristic - Environment: microbial community; Sample Characteristic - Location: Whole world


Background & Summary
Ticks are important vectors and reservoirs of a broad range of pathogens capable of causing disease in humans, livestock, and wild animals. Worldwide, more than 800 tick species have been documented, including over 700 species in the family Ixodidae (hard ticks) and 193 species in the family Argasidae (soft ticks) 1,2 . At least 30 tick species are reported to feed on humans, and at least 103 known pathogens are transmitted by ticks [3][4][5][6] . Tick-borne pathogens co-evolve with their vectors and hosts, and they survive, multiply and circulate because of their adaptation to these different biological systems. Some are significant threats to human and animal health, for example, species of Anaplasma, Babesia, spotted fever group Rickettsiae, Borrelia, and viruses 3,[6][7][8] . Ixodidae is the largest tick family and has 3 active life cycle stages, including a single nymphal stage 9,10 . Argasidae also has 3 active life stages, but most species pass through multiple nymphal stages before developing into adults 4 .
Emerging and re-emerging tick-borne infectious diseases pose a continuing threat to human health. Over the past three decades, the application of molecular technologies has assisted in discovering new tick-borne pathogens and in establishing the pathogenicity of microorganisms previously detected in ticks. An increasing number of tick-borne pathogens have been reported; Heartland virus 11 , tick-borne encephalitis virus 12 , Borrelia burgdorferi sensu lato 13 , Rickettsia rickettsii 14 , and severe fever with thrombocytopenia syndrome virus 8 are just a few examples of important pathogens that threaten human health. However, the diversity of tick-borne infectious agents remains underestimated, since investigations have been heavily biased toward microorganisms that infect humans or animals of economic and social importance 15,16 . The advent of advanced technologies such as high-throughput sequencing, metagenomics and metatranscriptomics has enabled a systematic understanding of the wide variety of pathogenic or non-pathogenic, known or unknown, endogenous or exogenous microorganisms carried by ticks 9,15,16 . Several large-scale microbiome datasets derived from tick samples collected across wide geographic regions have become publicly available in recent years. A number of novel tick-associated pathogens have been discovered by NGS, such as Bole Tick Virus 1, Changping Tick Virus 1, Dabieshan Tick Virus, and Wuhan Tick Virus 2 17 . However, a systematic account of these microbiome data is still lacking, and current knowledge is far from adequate for a complete understanding of the diversity of the tick-associated microbiome 15,16,18,19 .
Herein, we performed a systematic review of the published literature to build a comprehensive global dataset on the diversity and distribution of the microbiome determined by NGS in field-collected ticks. Data on the viral, bacterial and eukaryotic microbiome were assembled separately to identify all the viruses, bacteria and eukaryotes present in a tick sample, and to identify novel pathogens that can be carried by ticks.

Methods
Data collection. The study was performed according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement 20 . To obtain an exhaustive review of the published literature on microbiome diversity determined by NGS in field-collected ticks, a literature search was conducted in Chinese and English databases using a set of terms and Boolean operators, mainly through PubMed, Web of Science (WOS), China National Knowledge Infrastructure (CNKI) and the WanFang databases, up to 1 April 2022, without language or publication-type restrictions. At the first step, general search terms were applied. For the English literature databases these were: "tick", "Amblyomma", "Archaeocroton", "Bothriocroton", "Dermacentor", "Haemaphysalis", "Hyalomma", "Ixodes", "Nosomma", "Rhipicephalus", "Rhipicentor", "Robertsicus", "Antricola", "Argas", "Carios", "Nothoaspis", "Ornithodoros", "next-generation sequencing", "high-throughput sequencing", "deep sequencing", "Roche 454", "Illumina", "Ion Torrent", and "SOLiD". For the Chinese literature databases the keywords were: "tick", "virome", "microbiome", "metagenome", "high throughput sequencing", "deep sequencing", and "next generation sequencing". Data on all types of microorganisms, including viruses, bacteria, and eukaryotes, were included. Emerging pathogens were defined as those first isolated or discovered after 1980. Because ticks can feed on a wide range of vertebrates, and in order to highlight the microorganisms carried by the ticks themselves, we included only data obtained from field-collected free-living ticks and did not include data from detached ticks, since the latter might represent a mixed microbiome derived from both the tick and its animal host.
We excluded the following studies: (i) studies with data obtained from experimentally fed ticks or from detached ticks collected from animals; (ii) studies on the evaluation of methods or on the isolation and propagation of laboratory strains; (iii) review papers; and (iv) studies that tested only for specific microorganisms in ticks (Fig. 1a).
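The English-language search described under Data collection combines tick terms with sequencing terms. The sketch below shows one way such a Boolean query could be assembled. The text lists the terms but not the final query syntax, so the grouping (tick terms joined by OR, sequencing terms joined by OR, the two groups joined by AND) and the helper names `or_group` and `build_query` are assumptions made for illustration.

```python
# Illustrative only: the paper lists the search terms but not the exact
# query string, so the OR/AND grouping below is an assumption.

TICK_TERMS = [
    "tick", "Amblyomma", "Archaeocroton", "Bothriocroton", "Dermacentor",
    "Haemaphysalis", "Hyalomma", "Ixodes", "Nosomma", "Rhipicephalus",
    "Rhipicentor", "Robertsicus", "Antricola", "Argas", "Carios",
    "Nothoaspis", "Ornithodoros",
]

NGS_TERMS = [
    "next-generation sequencing", "high-throughput sequencing",
    "deep sequencing", "Roche 454", "Illumina", "Ion Torrent", "SOLiD",
]


def or_group(terms):
    """Quote each term and join the terms into one parenthesised OR group."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(tick_terms, ngs_terms):
    """Require at least one tick term AND at least one sequencing term."""
    return f"{or_group(tick_terms)} AND {or_group(ngs_terms)}"


query = build_query(TICK_TERMS, NGS_TERMS)
print(query)
```

Field tags and truncation symbols differ between PubMed, Web of Science, CNKI and WanFang, so a query of this shape would need to be adapted to each database.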
A total of 2797 studies were retrieved for screening, comprising 2070 from the English databases and 727 from the Chinese databases. At the second step, the titles and abstracts of the retrieved studies were screened independently by three reviewers (MC L, JT Z, and Y Z) to identify studies potentially eligible for inclusion, which narrowed the set down to 362 studies. At the third step, the full texts of the remaining studies were retrieved and independently assessed for eligibility by two reviewers (ZY H and BK F). Finally, a total of 7 Chinese and 69 English studies were eligible for data extraction (Fig. 1a). The earliest study was published in 2011, and the number of publications increased over the years, with a marked increase from 2017 onwards (Fig. 1b). Of all selected studies, 69 (90.8%) used the Illumina sequencing platform, and 5.3% used the Ion Torrent sequencing platform (Fig. 1c). Data were from 46 species of ticks in 7 genera collected from 24 countries on 6 continents, and the geographical distribution of the tick genera is shown in Fig. 2a. The viral, bacterial and eukaryotic microbiome profiles corresponding to the various tick genera are displayed by country (Fig. 2b,c).
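The screening counts reported above can be cross-checked with simple arithmetic. The sketch below recomputes the totals and the Illumina share from the figures in the text. The two "excluded" values are derived here by subtraction and are not reported in the text; they assume that every study dropped between stages was excluded at that stage. The Ion Torrent count is likewise only implied by the reported 5.3%.

```python
# Counts taken from the text; variable names are illustrative.
retrieved = {"english": 2070, "chinese": 727}
after_title_abstract_screen = 362
included = {"english": 69, "chinese": 7}
illumina_studies = 69

total_retrieved = sum(retrieved.values())   # reported as 2797
total_included = sum(included.values())     # reported as 76

# Derived by subtraction (not reported in the text).
excluded_at_title_abstract = total_retrieved - after_title_abstract_screen
excluded_at_full_text = after_title_abstract_screen - total_included

# Share of included studies that used Illumina, in percent.
illumina_share = round(100 * illumina_studies / total_included, 1)

# Study count implied by the reported 5.3% Ion Torrent share (an inference).
ion_torrent_implied = round(0.053 * total_included)

print(total_retrieved, total_included, illumina_share, ion_torrent_implied)
```

The recomputed total (2797), the number of included studies (76) and the Illumina share (90.8%) agree with the values stated in the text.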
The full texts of all selected papers were reviewed, and data were extracted into a standardized dataset in Microsoft Excel 2019 that mainly includes: (i) identification of the tested ticks at the family, genus, and species levels, (ii) methods for tick species identification, (iii) life cycle stages of the tested ticks, (iv) the geographic location of the ticks at the country and province levels, (v) taxonomic annotations of microorganisms at the family, genus, and species levels, and (vi) the platforms used for NGS. A re-check by two persons (MC L and JT Z) was performed to correct errors and remove duplicates. All conflicts of opinion and uncertainties were discussed and resolved by consensus with a third reviewer (JJ C). The main variable of interest was the viral, bacterial or eukaryotic component of the microbiome, determined for a specific tick species at a specific site over time. All data were entered into the resultant dataset by trained coauthors.
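To make the extraction schema concrete, the sketch below models one record with fields that mirror items (i) to (vi) above and removes exact duplicates. The class name, the field names and the placeholder values are hypothetical and do not reproduce the column names of the published dataset. The authors' duplicate check was a manual re-check, so the automatic exact-match rule used here is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TickRecord:
    """One extracted record; field names are hypothetical."""
    tick_family: str            # (i) tick taxonomy
    tick_genus: str
    tick_species: str
    identification_method: str  # (ii) morphology, molecular, or both
    life_stage: str             # (iii) larva, nymph, or adult
    country: str                # (iv) location
    province: str
    microbe_family: str         # (v) microorganism taxonomy
    microbe_genus: str
    microbe_species: str
    platform: str               # (vi) NGS platform
    source_study: str           # reference the record was extracted from


def deduplicate(records):
    """Drop exact duplicate records while keeping first-seen order."""
    return list(dict.fromkeys(records))


# Placeholder values only; these are not rows from the real dataset.
a = TickRecord("family_X", "genus_X", "species_X", "both", "adult",
               "country_1", "province_1", "microbe_family_A",
               "microbe_genus_A", "microbe_species_A", "platform_P", "ref_1")
b = TickRecord("family_X", "genus_X", "species_X", "both", "nymph",
               "country_1", "province_1", "microbe_family_A",
               "microbe_genus_A", "microbe_species_A", "platform_P", "ref_1")

unique = deduplicate([a, a, b])
print(len(unique))
```

Declaring the dataclass as frozen makes each record hashable, which is what allows `dict.fromkeys` to detect repeated rows.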
Geo-positioning. The location of each tick-collection site was extracted at the province level from the selected publications. If no longitude or latitude was reported, or if the location was given only at a large scale such as a scenic area or mountainous region, ArcGIS 10.7 software was used to extract the geographical coordinates of the center point of the corresponding administrative area from digital maps obtained from GADM (Database of Global Administrative Areas) and the Standard Map Service System. If the collection site could not be determined by any of these means, the authors were contacted for further information. We used R version 4.1.2 (via RStudio) and ArcGIS 10.7 software to statistically analyze and visualize the obtained geographic data.
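The fallback rule described above (use the reported coordinates when available, otherwise the center point of the administrative area) can be sketched as follows. The authors used ArcGIS for this step. The pure-Python planar centroid below is a stand-in technique chosen for illustration, and the function names and the example polygon are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def polygon_centroid(ring: Sequence[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula).

    Coordinates are treated as planar, which is an approximation for
    geographic data and may differ from the center point ArcGIS returns.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon: zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * twice_area.
    return (cx / (3 * twice_area), cy / (3 * twice_area))


def resolve_location(reported: Optional[Point],
                     admin_ring: Sequence[Point]) -> Point:
    """Prefer reported coordinates; otherwise use the area's center point."""
    if reported is not None:
        return reported
    return polygon_centroid(admin_ring)


# Hypothetical square "province" used only to exercise the functions.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(resolve_location(None, square))
print(resolve_location((10.0, 20.0), square))
```

For irregular or multi-part administrative areas the centroid can fall outside the boundary, which is one reason to check the resulting points against a map, as described under Technical Validation.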

Data records
The dataset of the microbiome in field-collected ticks, as determined by NGS, is available on figshare 21 . The columns contained in the dataset are shown as follows:

Technical Validation
This dataset contains 4418 records that were extracted from 7 Chinese references and 69 English references. All recorded data were cross-checked by trained coauthors, and all uncertainties and discrepancies were discussed and resolved by consensus with a third reviewer. The first authors of the source publications were also contacted to clarify missing or ambiguous data.
The methods used to identify tick species are critical to the credibility of the data, particularly for juvenile stages, for which morphological identification at the species level is difficult. Whether ticks were identified by morphology, by molecular diagnosis based on 18S rRNA sequencing, or by a combination of both methods was recorded. For studies that used morphological identification only, users should be aware of the risk of misidentified tick species.
To verify the geographic location of tick collection, an independent third party was designated to re-check the information, following the same standard as that used during data entry. To standardize location information that was not reported uniformly at the province level, ArcGIS software was used to determine the coordinates of the central points of the provinces, which were then marked on Baidu Map to ensure that each coordinate point corresponded to the correct administrative region. The geographic distributions of the tick genera that were tested for microbiome (Amblyomma, Dermacentor, Haemaphysalis, Hyalomma, Ixodes, Rhipicephalus, and Ornithodoros) are displayed separately (Fig. 2a). The five viral families reported in the highest number of studies in relation to the tested ticks (Flaviviridae, Nairoviridae, Parvoviridae, Phenuiviridae, and Rhabdoviridae) were mapped (Fig. 2b). The five bacterial families reported in the highest number of studies in relation to the tested ticks (Anaplasmataceae, Coxiellaceae, Moraxellaceae, Pseudomonadaceae, and Rickettsiaceae) were mapped, together with Borreliaceae, which includes important tick-borne pathogens that infect a variety of vertebrate hosts and cause Lyme borreliosis, the most common tick-borne disease in the Northern Hemisphere 22,23 (Fig. 2c). The two eukaryotic families reported in the highest number of studies (Babesiidae and Schistosomatidae), as well as Fungi, were also mapped (Fig. 2c). The viral and bacterial composition at the phylum or family level, as well as the number of records corresponding to each tick genus, are illustrated in Fig. 3. Information about tick life cycle stages, tick genera, and sex is shown in Table 1. Of the 76 publications, adult ticks, nymphs, and larvae were tested in 53, 23 and 9 publications, respectively.
Ixodes was the most frequently tested tick genus (43 studies in total that underwent NGS), followed by Dermacentor (24). Female ticks were recorded in 53 studies, and male ticks in 47 studies.

Usage Notes
Investigating potential tick-borne pathogens remains an important part of the source tracing and early warning of infectious diseases and emerging infections. To the best of our knowledge, this study represents the first attempt to comprehensively characterize the microbial communities present in tick species as determined by NGS. NGS data from a total of 76 publications that recorded 46 species of ticks from 24 countries between 2011 and 2021 were compiled in the dataset. For each record, tick species were paired with the relevant geo-positioning, timeline variables, microbial composition, number of records, and sequencing platform. The dataset reveals the fundamental structure of the viral, bacterial and eukaryotic microbiome in tick species, which allows for further comparative study. For example, the bacterial and viral composition of the NGS data could be compared by tick species, life stage and specimen type, or by geographic location or collection season. The abundance of viruses or bacteria grouped at the family/genus/species level