Abstract
Controlling foodborne diseases requires robust outbreak detection and a comprehensive understanding of outbreak dynamics. Here, by integrating large-scale phylogenomic analysis of 3,642 isolates with epidemiological data, we performed ‘data-driven’ outbreak detection and described the long-term outbreak dynamics of the leading seafood-associated pathogen, Vibrio parahaemolyticus, in Shenzhen, China, over a 17-year period. Contrary to the widely accepted notion that sporadic cases and independent point-source outbreaks dominate foodborne infections, we found that 71% of isolates from patients grouped into within-1-month clusters that differed by ≤6 single nucleotide polymorphisms, indicating putative outbreaks. Furthermore, we showed that despite the long time spans between clusters, 70% of them were genomically closely related and were inferred to arise from a small number of common sources, which provides evidence that hidden persistent reservoirs, rather than independent point sources, generated most of the outbreaks. Phylogeographical analysis further revealed the geographical heterogeneity of outbreaks and identified a coastal district as the potential hotspot of outbreaks and as the hub and major source of cross-district spread events. Our findings provide a comprehensive picture of the long-term spatiotemporal dynamics of foodborne outbreaks and present a different perspective on the major source of foodborne infections, which will inform the design of future disease control strategies.
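To make the clustering criterion in the abstract concrete, the following is a minimal sketch of how isolates might be grouped when they differ by ≤6 SNPs and were sampled within 1 month of each other. It is not the authors' pipeline. The single-linkage rule, the 30-day reading of "1 month", the toy sequences and the function names `snp_distance` and `cluster_isolates` are all assumptions made for illustration.

```python
from datetime import date
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_isolates(isolates, max_snps=6, max_days=30):
    """Group isolates by single linkage (an assumption, not the paper's stated rule).

    isolates: dict mapping isolate id -> (aligned_sequence, sampling_date).
    Two isolates are linked when they differ by <= max_snps SNPs and were
    sampled <= max_days apart. Returns clusters with at least two members.
    """
    parent = {name: name for name in isolates}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        seq_a, day_a = isolates[a]
        seq_b, day_b = isolates[b]
        close_in_time = abs((day_a - day_b).days) <= max_days
        if close_in_time and snp_distance(seq_a, seq_b) <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy data: 10-bp "genomes" stand in for core-genome alignments.
toy = {
    "iso1": ("ACGTACGTAC", date(2015, 7, 1)),
    "iso2": ("ACGTACGTAT", date(2015, 7, 10)),  # 1 SNP from iso1, 9 days apart
    "iso3": ("ACGTACGTAC", date(2016, 7, 1)),   # identical to iso1, a year later
    "iso4": ("TTTTTTTTTT", date(2015, 7, 5)),   # genomically distant
}
clusters = cluster_isolates(toy)
print(clusters)
```

In this toy example only `iso1` and `iso2` satisfy both thresholds. `iso3` is genomically identical to `iso1` but falls outside the time window, which mirrors the abstract's distinction between within-1-month clusters and closely related clusters separated by long time spans.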
Data availability
The sequencing data have been deposited in the NCBI Sequence Read Archive under accession number PRJNA745505. Background information on the sequenced isolates is listed in Supplementary Table 1. Source data are provided with this paper.
Acknowledgements
We thank D. Falush, B. Kan, B. Pang and E. Tourrette for valuable comments, and the personnel from 16 sentinel hospitals and 10 district Centers for Disease Control and Prevention in Shenzhen for their participation in and contribution to our surveillance work. This study was funded by the National Key Research and Development Program of China (No. 2018YFC1603902 and 2017YFC1601500, Y.C.), the Sanming Project of Medicine in Shenzhen (No. SZSM201811071, Q.H.), the China National Science and Technology Major Projects Foundation (No. 2017ZX10303406, Q.H.), the National Natural Science Foundation of China (No. 32000008 to C.Y. and No. 31770001 to Y.C.), the China Postdoctoral Science Foundation (No. 2020M672836, C.Y.), the Natural Science Foundation of Guangdong Province (No. 2019A1515011523, Y.L.), the Youth Innovation Promotion Association, CAS (No. 2022278, C.Y.), the Shenzhen Key Medical Discipline Construction Fund (No. SZXK064, Q.H.), the Key scientific and technological project of Shenzhen Science and Technology Innovation Committee (No. KCXFZ202002011006190, Q.H.), and the Non-profit Central Research Institute Fund of the Chinese Academy of Medical Sciences (No. 2020-PT330-006, Q.H.).
Author information
Contributions
Y.C., Q.H., R.Y. and C.Y. designed, initiated and coordinated the study. Y.L., L.L., L.Z., L.W., Y.J., Q.C., L.H., M.J., X.S., L.H., R.C., S.W., C.W. and Y.Q. contributed to data collection and management. C.Y., Y.L. and Y.W. analysed the data. All authors contributed to interpretation of the data. C.Y. wrote the first draft of the paper and Y.L., J.M.-U., R.Y., Y.C. and Q.H. reviewed and revised the paper. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks the anonymous reviewers for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Pairwise SNP distance distribution between all the 3,642 isolates.
Extended Data Fig. 2 Temporal dynamics of the number of isolates from patients.
Years (2002–2005) with <50 patient isolates were merged. PCG, pathogenic clonal group.
Extended Data Fig. 3 Spatiotemporal distribution of the number of patient isolates.
(A) Temporal distribution of the number of patient isolates from PCG-others. Point sizes are scaled to the number of patient isolates in a PCG. (B) Geographical distribution of the number of patient isolates from all the PCGs. The numbers in the heatmap indicate the number of patients.
Extended Data Fig. 4 Pairwise SNP distance distribution between isolates from outbreaks.
(A) Pairwise SNP distance distribution (bars, left Y-axis) between Foodborne Disease Outbreak Surveillance (FDOS)-outbreak isolates of different pathogenic clonal groups (PCGs), and the proportion of clustered isolates under different SNP cutoffs (lines and points, right Y-axis). Blue and red indicate all the SNPs and non-recombined SNPs, respectively. (B) Pairwise SNP distance distribution between 34 isolates from four outbreaks of external independent datasets.
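As a rough illustration of the right-axis quantity described in this caption, the sketch below computes the proportion of isolates that count as clustered under several SNP cutoffs. The definition used here (an isolate is clustered if at least one other isolate lies within the cutoff) is an assumption, as are the toy distance matrix and the function name `proportion_clustered`. The cutoffs of 3, 6 and 10 SNPs are the ones named in the caption of Extended Data Fig. 5.

```python
def proportion_clustered(dist, cutoffs):
    """Return {cutoff: fraction of isolates with a neighbour within that cutoff}.

    dist: symmetric square matrix (list of lists) of pairwise SNP distances.
    An isolate is treated as clustered when any other isolate is <= cutoff
    SNPs away (an assumed definition, for illustration only).
    """
    n = len(dist)
    if n == 0:
        return {cutoff: 0.0 for cutoff in cutoffs}
    result = {}
    for cutoff in cutoffs:
        clustered = sum(
            1
            for i in range(n)
            if any(dist[i][j] <= cutoff for j in range(n) if j != i)
        )
        result[cutoff] = clustered / n
    return result


# Hypothetical pairwise SNP distances between four isolates.
toy_dist = [
    [0, 2, 5, 40],
    [2, 0, 8, 41],
    [5, 8, 0, 39],
    [40, 41, 39, 0],
]
props = proportion_clustered(toy_dist, [3, 6, 10])
print(props)
```

With these toy distances, raising the cutoff from 3 to 6 SNPs pulls a third isolate into a cluster, while raising it further to 10 changes nothing. A plateau of that kind is one way a cutoff might be judged stable.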
Extended Data Fig. 5 Number and size distribution of Foodborne Disease Outbreak Surveillance (FDOS) outbreaks, P-clusters, and Ob-clusters under different SNP cutoffs (3, 6, and 10 SNPs) and time intervals (1 week and 1 month).
Extended Data Fig. 6 Size distribution of Ob-clusters detected/not detected by the Foodborne Disease Outbreak Surveillance (FDOS).
Extended Data Fig. 7 BEAST maximum clade credibility (MCC) trees of two representative P-clusters, PC052 (A) and PC176 (B).
The colors of circles in the tips indicate different Ob-clusters or non-Ob-clusters within a P-cluster.
Extended Data Fig. 8 Inferred source district distribution (posterior probability >0.7) of cross-district Ob-clusters before and after subsampling.
Extended Data Fig. 9 SNP distance distribution over different geographical distances to the hotspot district between isolates from Ob-clusters sourced from the hotspot.
The bold black horizontal line indicates the mean SNP distance.
Extended Data Fig. 10 Genome quality assessment flowchart (A) and characteristics of the high-quality genomes (B).
Supplementary information
Supplementary Information
Supplementary Table 1.
Source data
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 1
Statistical source data.
Source Data Extended Data Fig. 2
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 7
BEAST maximum clade credibility (MCC) trees (nexus format).
Source Data Extended Data Fig. 8
Statistical source data.
Source Data Extended Data Fig. 9
Statistical source data.
Source Data Extended Data Fig. 10
Statistical source data.
Cite this article
Yang, C., Li, Y., Jiang, M. et al. Outbreak dynamics of foodborne pathogen Vibrio parahaemolyticus over a seventeen year period implies hidden reservoirs. Nat Microbiol 7, 1221–1229 (2022). https://doi.org/10.1038/s41564-022-01182-0
DOI: https://doi.org/10.1038/s41564-022-01182-0