Mapping ticks and tick-borne pathogens in China

Understanding ecological niches of major tick species and prevalent tick-borne pathogens is crucial for efficient surveillance and control of tick-borne diseases. Here we provide an up-to-date review on the spatial distributions of ticks and tick-borne pathogens in China. We map at the county level 124 tick species, 103 tick-borne agents, and human cases infected with 29 species (subspecies) of tick-borne pathogens that were reported in China during 1950−2018. Haemaphysalis longicornis is found to harbor the highest variety of tick-borne agents, followed by Ixodes persulcatus, Dermacentor nutalli and Rhipicephalus microplus. Using a machine learning algorithm, we assess ecoclimatic and socioenvironmental drivers for the distributions of 19 predominant vector ticks and two tick-borne pathogens associated with the highest disease burden. The model-predicted suitable habitats for the 19 tick species are 14‒476% larger in size than the geographic areas where these species were detected, indicating severe under-detection. Tick species harboring pathogens of imminent threats to public health should be prioritized for more active field surveillance.

T icks are hematophagous arthropods that parasitize vertebrates including livestock, wild animals, and human beings throughout the world 1 . The major threat that ticks impose on human and animal health lies in their vector role in the transmission of a variety of pathogens, and their epidemiological and epizootic significance is considered only second to mosquitoes 2,3 . Pathogenic organisms harbored by ticks mainly encompasses viruses, bacteria (in particular rickettsiae and spirochetes), protozoa, and helminth, with increasing diversity over the past 30 years 4,5 . The ongoing geographic expansion of tick species, possibly driven by climatic and environmental changes, has drawn global attention 6 . A typical example is Haemaphysalis (Ha.) longicornis, which was originally native to East Asia, spread to Australia, New Zealand, and several Pacific Islands since 1983 7 , and was recently found in the eastern U.S. The expansion of Ha. longicornis has raised public health and animal health concerns due to its capability of transmitting tick-borne agents, e.g., severe fever with thrombocytopenia syndrome virus (SFTSV), spotted fever group rickettsiae, and A. phagocytophilum [8][9][10][11] . Another example is the increasing incidence of tick-borne encephalitis (TBE) across the Euro-Asia in the past three decades, which was linked to the expansion of its competent vectors, Ixodes (I.) ricinus and I. persulcatus ticks 12,13 . This globalization trend of ticks and increasing variety of tick-borne diseases (TBD) are calling for an extensive and in-depth research on the spatial distributions of both ticks and tick-borne pathogens, as well as their underlying risk determinants.
In China, the growing awareness of emerging tick-borne pathogens has greatly inspired investigations on ticks and TBDs in recent years 4,14 . One study compiled a data set with regard to tick distribution and diversity up to the county level in China from peer-reviewed literature published between 1960 and 2017 15 . Another study reviewed the geographic distribution of tick species at the province level together with the diversity and specificity of animal hosts of ticks 16 . Yu et al. 17 reviewed the association between pathogenic microorganisms and tick vectors throughout China based on the literature up to 2014. However, none of the studies provided high resolution spatial distribution of tick-borne pathogens, nor did they investigate systematically ecological niches of either major tick species or prevalent tickborne pathogens.
Here we conduct an up-to-date review on the spatial distribution of predominant tick species, tick-borne agents, and human cases of TBDs in China, based on which we build predictive models to assess the contributions of relevant socioenvironmental factors to the ecological suitability of selected 19 ticks and two tick-borne pathogens, and map model-projected risks to inform future surveillance and control efforts. Ha. longicornis is found to harbor the highest variety of tick-borne agents, followed by Ixodes persulcatus, Dermacentor nutalli and Rhipicephalus microplus. The top five tick-borne agents that parasitize the largest number of tick species are Anaplasma phagocytophilum, Borrelia burgdorferi sensu stricto, Borrelia garinii, Ehrlichia chaffeensis, and Rickettsia raoultii. The model predicted suitable habitats for the 19 tick species are extensive, 14-476% larger in size than the geographic areas where these species have been observed. Tick species that are severely underdetected but harboring pathogens of imminent threats to public health should be prioritized for field surveillance.
The abundance of tick species varies substantially across the seven biogeographic zones which are defined by climatic and ecological characteristics ( Fig. 1) (Supplementary Tables 1-5), indicating decent predictive power. The ecoclimatic and environmental variables that were predictive of the geographic distribution of the ticks differed among the species, even for those in the same genus (Fig. 2f, Supplementary Tables 1-5). Temperature seasonality and mean temperature in the driest quarter were the two most important drivers, contributing ≥5% to the ensemble of models for 14-and 12-tick species, respectively, followed by elevation contributing ≥5% to the models for seven tick species (Fig. 2f, Supplementary Tables 1-5). The same predictor, however, may drive the risk in different directions for different tick species . For example, a high temperature in the driest quarter was associated with a high probability of presence for I. granulatus and R. haemaphysaloide but with a low probability for I. persulcatus and Ha. longicornis (Supplementary Figs. 11,13,16,22).
The model-predicted high-risk areas of the 19 tick species were much more extensive than have been observed, 31-520% greater in the number of affected counties, 14-476% larger in the size of affected geographic area, and 25-556% larger in the affected population size (  (Table 1).
Ecological clustering of tick species. Based on the ecological similarity represented by the environmental and ecoclimatic predictors, the 19 tick species were grouped into five clusters with clear patterns of spatial aggregation (Fig. 2). D. nuttalli and D. silvarum constituted cluster I that covered the vast region in northern (including northeastern and northwestern) China. This cluster stretches over biogeographic zones I-IV characterized by middle to high elevations, shrub grassland, strong seasonality in temperature, relatively low temperature in the wettest quarter (often also the warmest quarter), and low precipitation in the driest month ( Fig. 2 and Supplementary Figs. 23,24). Ha. longicornis, Hy. scupense, and R. sanguineus were grouped into Cluster II which was mainly found in biogeographic zones II, III, and VI, featuring the landscape of shrub grassland and irrigated or rainfed croplands at low-middle elevations (<1600 m) in central, eastern, and northwestern China ( Fig. 2 and Supplementary  Figs. 16,20,27). R. microplus, R. haemaphysaloides, I. granulatus and Ha. hystricis were grouped into Cluster III that was mainly distributed in biogeographic zones V-VII covered by coniferous or broad-leaved woods at low elevations in southern and central China where the weather is warm and humid with low seasonality in temperature ( Fig. 2 and Supplementary Figs. 13,19,21,22). Cluster IV, composed of I. persulcatus, Ha. concinna and Ha. japonica, ecologically fits biogeographic zones I and III in northwestern and northeastern China covered by coniferous or broad-leaved forests as well as cropland, featuring strong seasonality in temperature and low temperatures in the driest season ( Fig. 2 and Supplementary Figs. 11, 17, 18). Cluster V comprises Distribution of tick-borne agents. Among the 103 tick-borne agents detected in China, 65 were newly identified in the past two decades (Fig. 3). Ha. longicornis is the tick species harboring the highest variety of tick-borne agents, as many as 44 known species including seven Rickettsia species, seven Babesia, 12 Anaplasmataceae, four Theileria, four Borrelia, nine viruses, and Francisella tularensis (F. tularensis) (Fig. 3). Other competent tick species that carry 20 or more agents are I. persulcatus (36 agents), D. nutalli (32 agents), R. microplus (31), D. silvarum (30), Ha. concinna (24), and Hy. asiaticum (23). Agents that parasitize more than ten tick species are R. raoultii (in 15 tick species), R. heilongjiangensis (14), Anaplasma (A.) phagocytophilum (22), Ehrlichia (E.) chaffeensis (16), A. bovis (ten), B. burgdorferi sensu stricto (20), B. garinii (18), B. afzelii (11), Coxiella (C.) burnetii (14), Jingmen tick virus (12), and Theileria (T.) annulata (11) (Fig. 3).
As the most commonly recorded agent in the Anaplasmataceae, A. phagocytophilum was scattered over the whole nation except for the southwest ( Cases were also seen in the northeast and the southeast. E. chaffeensis had a comparably wide geographic scope, except that it was also detected in the southwest (Yunnan Province), and sporadic human cases were reported in Inner Mongolia, Beijing, Tianjin, Shandong, and Guangdong provinces. A. capra was the third most commonly detected agent in humans in the family with a total of about 29 reported case in the northeast, although it was also found in ticks in the central and the west. Another widely distributed agent was E. canis, found in the east, the south, and the northwest.
As to Borrelia burgdorferi sensu lato complexes in the genus Borrelia, which are the etiological agent of Lyme disease, ticks carrying B. garinii, B. afzelii, and B. burgdorferi sensu stricto shared similar distributions across northwestern, northern, northeastern, and southern China, although B. garinii was more widely detected in ticks (  Table 1 The average testing areas under the curve (AUC) of the BRT models at the county level and model-predicted numbers, areas and population sizes of affected counties for the 19  The distributions of Babesia spp. species were mostly focal, but Ba. microti was found in the north, the northeast, the east, the southeast, and the southwest of the country ( Fig. 4d; Supplementary Fig. 38; Supplementary Note 2). Human infections with Ba. microti occurred in the east near Shanghai and in Yunnan Province of the southwest. Ba. divergens was found in human cases in Xinjiang, Gansu, and Shandong provinces, extending from the northwest to the central and to the east of the country, but was detected in ticks only in the northeast. Ticks harboring Ba. venatorum were only found in the northeast, but human infections were reported in both the northeast and the northwest. Ba. crassa-like agents parasitized ticks and infected humans in the northeast, mainly in Heilongjiang Province. A human infection with Ba. spp. XXB/Hangzhou was recorded in Zhejiang Province, but detection of this agent in ticks has not been reported yet. Human infections with uncharacterized Babesia spp. were mostly reported in Inner Mongolia in the north, Zhejiang Province in the east, and Yunnan Province in the southwest.
T. annulata was the most widely distributed agent in the Theileria genus, followed by T. sergenti and T. luwenshuni. All three agents were reported in the western, northern, northeastern, and central parts of China, with T. annulata also detected in the south (Supplementary Fig. 39; Supplementary Note 2). Other Theileria agents, including T. sinensis, T. ovis, T. equi, and T. uilenbergi, were only found in Xinjiang in the northwest or Gansu in the central west. Thus far, Theileria agents have not been associated with human infection in China.
Of the seven known tick-borne bacteria in China, C. bumetii, the causative agent for Q fever, was the most widely distributed, found in either ticks or humans across the country except for the  (Fig. 4f). Another recently discovered agent, Alongshan virus (ALSV), was detected in ticks in the northern tips of Inner Mongolia and Heilongjiang, as well as in human cases across the two provinces.
Risk mapping and risk factors for pathogens associated with major TBDs. The majority of human SFTS cases during 2010-2018 were diagnosed in Liaoning Province in the northeast, Shandong, Jiangsu, and Zhejiang provinces on the east coast, and Henan, Hubei and Anhui provinces in central China (Fig. 5a). The etiological virus, SFTSV, was detected primarily in Ha. longicornis, but also in D. nuttalli in northern Xinjiang. The model-predicted risk areas resembled the current reporting regions (Fig. 5b). Approximately 251.5 million people reside in high-risk areas where the model-predicted probability of SFTSV presence exceeds 50%. Temperature seasonality, mean temperatures during the wettest quarter, elevation, annual temperature range, closed woodland, mean temperatures during the driest quarter 24 were the leading risk determinants for the presence of SFTSV with RC >7% (Table 2). SFTSV ecologically prefers regions at low to moderate elevations (<1000 m) and with strong seasonality and wide annual variation range in temperature, a low mean temperature in the winter (driest quarter), and a high mean temperature and precipitation in the summer (wettest quarter or month) (Supplementary Fig. 42).
Human cases of TBEV primarily clustered in northeastern China, coinciding with the model predicted high-risk areas (Fig. 5c). The northwestern region where the virus was found only in ticks was classified as having mild to moderate risks (Fig. 5d). In total, about 94.5 million residents live in the high-risk areas. Temperature seasonality was by far the most influential predictor with RC = 54.0% (Table 2), which was also a leading predictor for the presence of I. persulcatus, a major carrier of TBEV (Supplementary Table 1). Additional important contributors included mean temperature in the driest quarter, elevation, and closed woodland (RC > 7%). High risks of TBEV were flagged by low to medium elevations (<1000 m), strong seasonality in both temperature and precipitation, and low temperature in both winter (driest quarter) and summer (warmest month) (Supplementary Fig. 43).

Discussion
We assembled the most comprehensive, if not all, records of tick species and tick-borne pathogens in China that cover a time span of 70 years. We reported detection locations of some tick species not covered by existing compiled data sets 16 , e.g., H. longicornis in Xinjiang and I. persulcatus in Jiangsu (Supplementary Figs. 2, 3). Our cross tabulation of tick species and associated pathogens is also more complete than previous studies, e.g., listing ten known pathogens detected in H. japonica and nine tick species harboring SFTSV, compared to only two and one, respectively, in a previous review 17 . Using a robust machine-learning algorithm, we found that the geographic scopes of major tick species (particularly Ha. longicornis, I. sinensis, R. microplus, R. sanguineus, and  (Table 1). However, it is also possible that our models were underspecified and thus overestimated the scopes.
The ecological niches for ticks are complex, and the key predictors differ even within the same genus. For example, Ha. concinna, Ha. japonica and Ha. hystricis prefer places covered by coniferous or broad-leaved trees, whereas Ha. longicornis thrives in shrub grasslands. It is therefore meaningful to group tick species by their ecological characteristics, in addition to their genera, to better understand the overall risk of tick exposure at any given place. We found five clusters of tick species that share comparable ecological niches and geographic distributions. Such clustering offers additional information for risk assessment and field investigation. For instance, despite the low detection and low risk of Ha. longicornis in northwestern China ( Supplementary  Figs. 3, 31), it should be targeted for survey in this region where both R. sanguineus (Supplementary Figs. 5, 32) and Hy. scupense (Supplementary Figs. 7, 34), which are in the same ecological cluster as Ha. longicornis, have high prevalence of field detection and model-predicted risks.
Ha. longicornis is by far the most widely distributed and influential tick species, exposing over 40% of the nation's population in 1140 counties of eastern and northeastern China. The underlying implication for public health is enormous, as Ha. longicornis harbors 44 tick-borne pathogens and is a competent vector for SFTSV that was associated with a case fatality ratio of 12-50% 25 . Native to East Asia, Ha. longicornis was thought to be imported from Japan to Australia and New Zealand in the 19 th century 7 . The recent emergence of Ha. longicornis in eight states of the U.S. suggests a global spread and flags the need for close monitoring of the ticks and related pathogens in this regions 8,26 .
The genus of Ixodes, mainly comprising I. persulcatus, I. sinensis and I. granulatus, also imposes serious threats to public health. I. persulcatus is a major carrier for both TBEV and B. burgdorferi sensu lato (Borrelia), the latter also being carried and transmitted by two other Ixodes species 27,28 . The three tick species have distinct ecological habitats that jointly cover many Among the ecoclimatic factors, the most influential for tick ecology are temperature seasonality and the mean temperature in the driest quarter. The driest quarter as well as the coldest quarter overlap with the winter season in most areas. Consequently, the survivability in the winter season is a key to the ecology of ticks. While grasslands, croplands, and woods constitute the natural habitat for ticks, our analyses indicate the nontrivial role of human settlements. For example, both Ha. longicornis and Ha. japonica had a positive association with the coverage of rural settlement, indicating their tendency of parasitizing domestic animals 29,30 . In contrast, both Hy. asiaticum and D. marginatus seem to prefer grasslands and croplands with fewer rural settlements (Supplementary Tables 4, 5). The potential preference of I. sinensis for a high coverage of urban constructions warrant further investigation given the fast urbanization in China (Supplementary Table 1).
Our findings on ecological drivers for SFTSV and TBEV, two notifiable tick-borne pathogens in China, bear some similarity with our previous ecological studies on SFTS and TBE diseases in human 27,31,32 . For example, elevation, close-canopy woodland, shrubland, and precipitation in the driest quarter were also drivers for SFTS 32 , and elevation and close-canopy woodland were drivers for TBE 27 . However, the current work identified more climatic conditions that are important for the ecology of the pathogens, e.g., temperature seasonality and mean temperature in the driest quarter for both SFTSV and TBEV, and even more climatic drivers for SFTSV (Table 2). In addition, the current study identified a wider geographic area of 393 counties (522,000 km 2 ) with a larger population (251.5 million people) at risk of potential SFTSV occurrence, compared to 222 counties (324,000 km 2 ) covering 142.2 million people in one previous study 31 , and 384 counties (579,000 km 2 ) covering 226.5 million people in another 32 . In particular, we found that the Xinjiang Uygur Autonomous Region in northwestern China and areas in central and northern China are subject to an increasing yet largely neglected risk of SFTSV infection 31,32 . The model-predicted area and population at risk of TBEV occurrence are also larger than those shown in a previous study (278 vs. 214 counties, 1,609,000 vs. 1,430,000 km 2 , and 94.5 vs. 68.4 million persons) 27 . The additional at-risk areas regarding TBEV found by the current study are mainly distributed in the south of northeastern China which is densely populated 27 . The differences in the results may come from differences in data and methodology, e.g., our TBE data are more up to date, and our choice of control sites for ecological modeling weighed in existence probability of the competent ticks. Most importantly, the current study focuses on the ecology of the pathogens rather than human diseases, fundamentally different from the previous studies.
Although the risk of SFTSV occurrence is low to moderate in the northwestern area (northern Xinjiang), more active surveillance of both human cases and SFTSV-carrying ticks are recommended for several reasons: (1) SFTS was reported in a tourist to this area recently 33 , (2) SFTSV was detected from both D. nuttalli and Hy. asiaticum, two prevalent tick species in the area 33,34 , and (3) this area lies in the heart of the Central Asia Flyway where ticks may exchange among distant regions via migrant birds 35 . For TBEV, we also recommend surveillance and prevention in northern Xinjiang, although no human cases have been reported there. Our models were built on data only from China and need to be further validated using data from other countries, e.g., South Korea and Japan where SFTS is also endemic.
As Lyme disease is not a notifiable disease in China, many human infections with B. burgdorferi sensu lato can only be located to provinces rather than counties. Some human cases were reported in the central north and central east of China, e.g., Shandong and Henan provinces, where model-predicted risks for the Ixodes are low (Supplementary Figs. 3, 38). On the other hand, these areas have moderate to high risks for R. sanguineus, R. microplus and Hy. scupense, which are also known to harbor Borrelia (Fig. 3, Supplementary Figs. 5, 7) 36,37 . More active field surveillance of ticks capable of transmitting Lyme disease and related pathogens is recommended for these regions.
Our study is subjected to a few limitations. Firstly, the survey locations of ticks were unlikely sampled randomly. Consequently, a certain level of bias could exist in the modeling results. Most likely, biased sampling might have biased our analyses towards the null, i.e., contributions of some important factors might have been underestimated if their distributions had influenced the sampling of survey sites. However, the surveyed 1134 counties (39% of all counties in China) cover all biogeographical zones of China ( Supplementary Fig. 1), implying a low possibility of geographic bias. Secondly, the land cover data used for ecological modeling of ticks were collected in 2005, which may not be appropriate given the rather rapid landscape change in China in the recent decade. On the other hand, while some tick detection records were obtained post 2010, the establishment of their habitat likely had evolved much longer. Finally, while the AUC values are high for all the models we fitted, AUC does not necessarily reflect the goodness of fit and could be misleading when the absence data are associated with high uncertainty 38 . Such uncertainty exists as most surveys are cross sectional.
In conclusion, our study found it necessary to expand current field survey efforts for ticks, especially those harboring pathogens implicative of imminent threats to public health. Meanwhile, it is wise to strengthen surveillance for TBDs by increasing diagnosis and treatment capacities in areas where human cases have emerged or the model-predicted risk level is high. As urbanization and the Grain-to-Green Program (restoring forests from croplands) are ongoing in parallel in China, the dynamics of ticks and tick-borne pathogens as well as diseases will be complex and need close monitoring.

Methods
Data on ticks and tick-borne pathogens. We assembled a comprehensive database of ticks and tick-borne pathogens (the database is available upon request) from a variety of sources, including (1)  were searched for studies published between January, 1950 and December, 2018, using the following keywords: ("Tick" or "Ticks") and "China". We also checked the references in retrieved articles to reach more relevant articles. Each article was carefully reviewed by two team members independently to collect the following information using a standard form: study date, study location, spatial resolution, tick species identified, laboratory methods, and detection results for tick-borne pathogens. Any disagreement between the two staff members was resolved by discussion and consensus among the reviewers and other co-authors. Only studies with clearly identifiable results, i.e., presence or absence, time and location of tick species or tick-borne pathogens were included in our database (Supplementary Table 6). If a tick species or tick-borne pathogen was reported more than once in the same county (e.g., through seasonal collections or by different study groups or in more than one townships within a county) during the study period, it was counted only once in our analyses. The tick-borne pathogens included in our database were detected from ticks by any of the following laboratory tests: real-time polymerase chain reaction (RT-PCR), PCR, isolation or culture, or smear microscope. If more than one pathogen were found in the same study or isolated from the same tick, a record was created for each pathogen in our database. For articles containing ambiguous data, the original authors were contacted for clarification; if the ambiguity was not clarified, the data in question were excluded from our database. These literature-extracted data and data from other sources were integrated to form one database at the county level for final analyses.
Data on socioenvironmental and ecoclimatic factors. We collected a variety of environmental and climatic variables that are commonly used in ecological studies on the spatial distribution of tick species and tick-borne pathogens 15,31,43,44 . The choice of variables is mainly based on empirical ecological evidence in the literature and their spatial variability. In addition, we focus on ecological variables that are potentially shared by multiple species so that the results can be compared across species. For example, although our previous study has shown that the land cover of tea farms contributes to the distribution of SFTS patients and hence is a potential predictor for the presence of Ha. longicornis, we did not include this variable because there is no evidence for the association of tea farms with most other tick species.
The 38 years (1981 to 2018) of climatic data were collected from 2006 weather surveillance stations in mainland China, covering 71.3% of 1134 surveyed counties (http://cdc.nmic.cn/home.do). The climatic data include average monthly meteorological variables such as temperature, maximum temperature, minimum temperature, relative humidity, and rainfall during the 38 years. For the 877 counties (326 with tick presence) without meteorological stations, the mean values of the nearest five surveillance stations were used as a proxy for their meteorological variables. From these longitudinal meteorological variables, 19 cross-sectional ecoclimatic variables (BIO01-19, also called bioclimatic variables recommended by the U.S. Geological Survey) were created and their yearly averages were used as predictors in our risk models 24,45 . These ecoclimatic variables better capture the seasonal trends of different species related to their physiological constraints than traditional meteorological variables and have been widely used in ecological studies 24 .
China updated its land cover data every 5-10 years since 1995. Data sets from different years may not be directly comparable or combinable as land cover categories and classification criteria often changed. Raster-type land cover data of China in the year 2005 and 2015 with a resolution of one square kilometer were obtained from the National Earth System Science Data Sharing Infrastructure (http://www.geodata.cn). Our tick records span over the past five decades. While many tick surveys were conducted after 2010, the establishment of the ticks' habitats likely has evolved much longer. Therefore, we used the 2005 land cover data to model the distributions of tick species. We used the more recent 2015 land cover data for modeling the distributions of SFTSV and TBEV because a large portion of the data was generated over the recent decade, in particular for SFTSV (after 2010). Elevation data were obtained from the Shuttle Radar Topography Mission (SRTM) archives (http://www.srtm.csi.cigar.org).
Given that many ticks feed on domestic animals and the majority of patients affected by tick-borne diseases in China were rural residents, we extracted demographic data in the form of the proportion and density of rural population from the 2010 census data obtained from the National Bureau of Statistics of China (http://www.stats.gov.cn/). We strictly limited the number of demographic and socioeconomic variables to (1) avoid model overfitting as the survey data of tick species are rather limited, and (2) focus on variables with a direct rather than indirect link to the ecology of tick species and tick-borne pathogens so that the results are more generalizable to outside China.
In total, 45 socioenvironmental and ecoclimatic variables at the county level were extracted from these data using the ArcGIS 10.0 software (ESRI Inc., Redlands, CA, USA) (Supplementary Table 7). Data cleaning and reorganization with regard to these variables, such as the calculation of the proxy climatic variables for counties not covered by meteorological stations, were performed in the statistical software R (ver. 3.6.0).
Data on clinical cases of tick-borne pathogens. Clinically diagnosed or laboratory-confirmed TBE and SFTS cases at clinics and hospitals are reported, as mandated by the Ministry of Health, to the Chinese Information System for Diseases Control and Prevention. The county-level data of human cases of TBE and SFTS during 2005-2018 were collected from the Chinese Scientific Data Center for Public Health (http://www.phsciencedata.cn) and were used in the ecological models for tick-borne pathogens.
Spatial mapping. Recorded occurrences of ticks, tick-borne agents, and human cases were geo-referenced at the county level when data permit or at the prefecture or province level otherwise. In total, 78.8% of tick species, 63.3% of tick-borne pathogens (100% for SFTSV and TBEV), and 60.6% of TBDs were geo-referenced at the county level. All maps were produced using the ArcGIS 10.0 software.
Ecological modeling. For each of the 19 major tick species, a case-control study design was used to build predictive machine-learning models at the county level. Records that could not be geo-referenced at the county level were excluded. Briefly, for each given tick species, counties with at least one record of occurrence were considered as "cases", and those surveyed but lacking any evidence of occurrence were considered as "controls" 46 . The numbers of "cases" and "controls" for each tick species were listed in Supplementary Data 1. The remaining counties where tick surveys have not been conducted or have not yielded conclusive findings were excluded from model building but were included for risk mapping. For example, among a total of 1134 counties where ticks were surveyed and tick species were determined, 382 counties recorded occurrence of Dermacentor nuttalli, and were thus considered "cases", and the other 752 counties were considered "control" sites, for modeling the distribution of Dermacentor nuttalli. A Boosted Regression Trees (BRT) model at the county level was fitted to the training set to assess the contributions of ecoclimatic and socio-environmental predictors to the geographic distribution of the given tick species. The BRT model is a popular approach to ecological studies and has been widely used for risk mapping of infectious diseases such as avian influenza, rabies, and helminth [47][48][49][50] . The BRT model couples the advantages of two algorithms, regression trees and machine learning techniques, and allows nonlinear relationships between outcomes and covariates and multicollinearity among covariates 51 . For each BRT model, 43 variables including 24 environmental and 19 ecoclimatic factors were used as potential predictors (Supplementary Table 7) for the presence and absence of the tick species in each county. The fitted model was used to project risk levels in counties without tick surveys 52,53 . To counterbalance potential sampling bias of survey counties, we built a logistic regression model for the selection of tick survey counties with all ecoclimatic and socio-environmental variables as predictors (Supplementary Table 7). The response of this model was one for all tick-surveyed counties and zero for unsurveyed counties. The predictors were chosen using a backward procedure at the significance level of 0.05 (Supplementary Table 7). The reciprocals of predicted sampling probabilities of all surveyed counties were first rescaled to have a mean of one and then used as weights in the BRT models for the 19 major tick species [54][55][56] .
A tree complexity of five, a learning rate of 0.005 and a bagging fraction of 75% were used for the primary analysis based on their satisfactory performance in our previous research 57,58 . Bagging is a procedure that resamples data points to fit sequential trees so to improve predictive performance. A 10-fold cross validation was used to identify the optimal number of trees using the gbm.step function in the R dismo package. The output of a BRT model consists of both predicted probabilities of occurrence and relative contributions (or influences) of predictors. The relative contribution is calculated based on how many times a predictor is chosen for splitting and how much each split improves the objective function, averaging over all trees. These relative contributions of all predictors are standardized so that they sum to one 51 . A two-stage bootstrapping procedure was employed to provide a more robust and parsimonious estimation of model parameters. In each stage, the following split-and-fit step was repeated for a certain number of times. A training set with 75% of data points was randomly selected by bootstrapping without replacement, and the remaining 25% served as a test set. A BRT model was built using the training set, and then applied to the test set for validation if needed. In the first stage, the split-and-fitting step was repeated for ten times to screen important predictors. Validation of the trained model using the test set was not performed in this stage. Predictors that had a relative contribution <2% for all bootstrap training sets were excluded from the next stage. In the second stage, the split-and-fitting step was repeated for 100 times using the remaining predictors. As no variable selection was performed in this stage, all 100 models had the same predictors but yielded different contribution estimates. The relative contributions of the predictors were averaged over the 100 BRT models to represent their final relative contributions. The ROC curves and areas under the curve (AUC) based on the test sets were also averaged to represent the final predictive performance. The standard deviations and 95% percentiles of the relative contributions and AUCs across the 100 models were used to quantify the uncertainty in the estimation. Considering that there could be false negative and false positive counties in the observed data, we also calculated partial area AUC with a tolerance level of 0.2 for omission error 59 . For partial area AUC, the horizontal axis is the total rate of positives rather than false positives. we presented the ratio of the partial AUC to the area under the random selection line (diagonal line) as suggested by Peterson et al. 59 . Finally, the predicted probabilities were averaged over the 100 models to represent the final estimates of the county-specific probabilities of presence, which were mapped for the 19 main tick species 47,48,52,53 . BRT Modeling was conducted using the R packages dismo and gbm, and predictive performance was assessed using ROCR and pROC in the R v3.6.0 environment (https://www.r-project.org). We also performed a sensitivity analysis using a learning rate of 0.01 for selected tick species but found no substantial difference in the contribution estimates. Due to both the data size (45 predictors) and the number of models runs ([19 ticks + 2 pathogens] × 100), we cannot afford a full cross-validation optimization for all model configuration parameters.
To determine model-predicted high-risk counties for each tick species, we chose a cut-off value that maximizes sensitivity + specificity along the ROC curve for each final BRT model 60,61 . Counties with predicted probabilities above the cut-off value for a given model were considered as having a high risk of harboring the corresponding tick species. For each tick species, the number, area, and population size of model-predicted high-risk counties were compared to the quantities of counties with observed occurrence (Table 1).
Clustering ticks with similar ecological niches and their spatial distribution. To explore similarity in ecological niches among the 19 predominant tick species, a hierarchical cluster analysis based on the weighted-average linkage method was performed 62 . Features used for clustering were formed as the following. We first excluded predictors that are not influential (excluded from final models) for all 19 ticks. For each tick species, three quantities associated with each remaining ecological predictor were calculated as features for clustering. One is the average relative contribution of this predictor in the final 100 BRT models. If the predictor was not included in the final models for this tick species, its relative contribution was set to zero. The second quantity is a measure for the difference in this predictor between case counties (positive for the given tick species) and all counties. We first calculated the median value of this predictor among all case counties and quartile intervals of the predictor among all counties in the nation. We then assigned one of the numbers 1-4 according to which quartile interval the median lies in, e.g., assign 1 (4) if the median lies in the lowest (highest) quartile. The third quantity is the linear correlation between the predictor and model-predicted presence probabilities of the given tick species among all counties (averaged over the 100 models). These three quantities of all ecological predictors jointly serve as features for clustering. A dendrogram was created to demonstrate the clustering pattern of these 19 tick species, together with a thematic matrix illustrating the features (Fig. 2). This matrix has tick species as rows and predictors as columns. The color of each cell in the matrix shows the average relative contribution and the number shows the quartile (1-4 for 1 st -4 th quartiles) location of the median of cases. To map geographic distributions of the identified clusters of tick species at the county level, we define the presence of each cluster as the presence of any tick species in that cluster.
Population at risk for emerging tick-borne pathogens. BRT models were also used to evaluate the risk and risk drivers for the presence of the etiological pathogens for SFTS and TBE, the two most commonly reported tick-borne diseases (TBD) with mandated surveillance in China. For each pathogen, the corresponding model considers the same 45 potential environmental and ecoclimatic predictors used for modeling tick species. All counties where the pathogen was detected in ticks according to literature or human cases of the associated TBD were reported by surveillance were regarded as "cases". For a county to be assigned as a "control", the following conditions must be satisfied: (1) No human cases of the associated TBD and any other evidence for the presence of the pathogen were reported by surveillance or literature; (2) The primary tick vector (Ha. longicornis for SFTSV and I. persulcatus for TBEV) was surveyed but not found; or the primary tick vector was not surveyed and the probability of the existence of this tick species, as predicted by the corresponding BRT model, is smaller than the cutoff that yields the best predictive performance of the model (represented by the Youden's index).
We made these stringent criteria for controls because it is much harder to exclude the possibility of the existence of any pathogen. Note that a county where the tick vector was detected but tested negative for the pathogen does not qualify this county as a control, because even in "case" counties, detection rates of a pathogen among field-collected ticks are usually low. Our choice of control counties may result in a certain level of overestimation of high-risk counties, yet overestimation is preferred to underestimation from the perspective of disease prevention. A two-stage bootstrap procedure was also used to generate 100 BRT models for each TBD, based on which the average relative contributions were calculated and the average predicted probabilities of TBD were mapped.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Model results generated as part of this study and the raw data that support the findings of this study are available within the paper and its supplementary information files. Source data are provided with the paper.