The seventh pandemic of cholera in Europe revisited by microbial genomics

In 1970, the seventh pandemic of cholera (7 P) reached both Africa and Europe. Between 1970 and 2011, several European countries reported cholera outbreaks of a few to more than 2,000 cases. We report here a whole-genome analysis of 1,324 7 P V. cholerae El Tor (7 PET) isolates, including 172 from autochthonous sporadic or outbreak cholera cases occurring between 1970 and 2011 in Europe, providing insight into the spatial and temporal spread of this pathogen across Europe. In this work, we show that the 7 PET lineage was introduced at least eight times into two main regions: Eastern and Southern Europe. Greater recurrence of the disease was observed in Eastern Europe, where it persisted until 2011. It was introduced into this region from Southern Asia, often circulating regionally in the countries bordering the Black Sea, and in the Middle East before reaching Eastern Africa on several occasions. In Southern Europe, the disease was mostly seen in individual countries during the 1970s and was imported from North and West Africa, except in 1994, when cholera was imported into Albania and Italy from the Black Sea region. These results shed light on the geographic course of cholera during the seventh pandemic and highlight the role of humans in its global dissemination.

C holera is a life-threatening epidemic diarrheal disease caused by Vibrio cholerae. This bacterium is a natural inhabitant of coastal, estuarine and brackish water environments globally 1 . However, a limited number of V. cholerae populations are pathogenic for humans, only one of which is associated with the current cholera pandemic 1 . This choleragenic population possesses particular features: it belongs to serotype O1 or, more rarely, serotype O139, and it contains a repertoire of pathogenicity islands and virulence genes, including the CTXϕ prophage, which encodes the cholera toxin (CT) responsible for most of the symptoms of the cholera diarrheal syndrome (i.e., a rapid and massive loss of body fluids leading to severe dehydration), and VPI-1, which contains the genes encoding the toxin-coregulated pilus (TCP) required for adhesion to the intestinal wall and the regulatory gene toxT promoting expression of both CT and TCP 2 . In 1970, nine years after its onset in Indonesia, the seventh pandemic of cholera (7P) reached both Africa and Europe 3 . However, little is known about the routes of cholera transmission in Europe during the 7P, beyond the spatiotemporal distribution of cases notified by individual countries to the World Health Organization (WHO) 4 . Furthermore, the available data must be analyzed with care, due to possible underreporting, or even a total absence of reporting, in some countries. For example, in Romania, due to the political situation prevailing before the revolution, no cases of cholera were officially reported to the WHO until 1989, whereas 712 cases in total were reported from 1990 to 1995 (ref. 4 ). In the pre-genomic era, only a few attempts were made to understand the population structure, relatedness, and origins of the V. cholerae O1 strains implicated in outbreaks in Europe, and even then, only at the level of an individual country or region [5][6][7] . The typing methods used had a low resolution and were unsuitable for phylogenetic inference. Unsurprisingly, they therefore proved inadequate or even yielded confusing results, particularly for studies of isolates collected over long periods of time. Hence, 51% of the 546 V. cholerae O1 isolates collected in Romania between 1977 and 1995 were of the same phage-type, M4, which was identified throughout the study period 5 . By contrast, a DNA-based method, ribotyping (ribosomal RNA gene restriction pattern analysis), performed on 94 toxigenic V. cholerae O1 isolates from Romania and Moldova collected between 1977 and 1994, showed these isolates to belong to up to 18 different ribotypes 6 .
Large-scale whole-genome sequencing studies have greatly improved our understanding of the population structure, evolution, and spread of Vibrio cholerae O1 (refs. 1,[8][9][10][11] ). It has been clearly established that a single lineage, "7PET", is responsible for the current pandemic 1,8 . This lineage has become established in the Bay of Bengal, whence it can spread to the rest of the world 8 . Long-range spread to China 9 , Africa 10 or America 1 is well documented by genomic studies. However, these recent studies included only a very small number of European isolates. As a means of elucidating the spatiotemporal spread of cholera in Europe and shedding light on the dynamics of its acquisition of antibiotic resistance through microbial genomics, we selected and sequenced additional V. cholerae O1 isolates representative of the widest possible temporal and geographic distribution of European autochthonous cholera cases reported to the WHO 4 or in published studies (Table 1). By contrast with previous findings, we show unambiguously that the European cholera outbreaks studied were caused by repeatedly introduced 7PET populations originating from South Asia and not by local long-term persistent 7PET or non-7PET populations. We also show that the local environment was not involved in the accumulation of antibiotic resistance determinants in cholera outbreak strains over time.

Results and discussion
We selected 156 V. cholerae O1 isolates from European autochthonous cholera cases (Supplementary Data 1). We subsequently found that 12 of these isolates were actually from imported cases, or even from cases occurring in Africa and Asia. They were nevertheless included, to enrich the global phylogenetic context. We contextualized these sequenced V. cholerae O1 isolates within a global collection of 1168 7PET genomic sequences (Supplementary Data 1) and constructed a maximum likelihood phylogeny of 1324 genomes-including 172 European genomes-using 13,544 single-nucleotide variants (SNVs) evenly distributed over the non-repetitive, non-recombinant core genome ( Supplementary Fig. 1). A time-scaled phylogeny was also constructed from 1283 of these 7PET genomic sequences (Fig. 1).
Phylogenomics of the 7PET European isolates. Our phylogenetic analyses clearly showed that eight 7PET sublineages were imported into Europe between 1970 and 2011, leading to the establishment of local transmission chains (Fig. 1). These 7PET sublineages, encompassing the three genomic waves described by Mutreja et al. 8 , were named in accordance with a standardized nomenclature ( Table 2). In 1970, an "Ogawa strain"-named at the time according to its serotype-was first reported between July and October 1970, in cholera cases in the Black Sea region (Odessa and Kerch) of the former USSR, the former Czechoslovakia, the Middle East (Turkey, Lebanon, and Israel), North Africa (Libya and Tunisia), and West Africa (Guinea, Sierra Leone, Liberia, Ghana, and Côte d'Ivoire) 3 . This "Ogawa strain" corresponds to our sublineage EUR1/AFR1. This sublineage continued to circulate in Africa until the mid-1990s 10 . All but one of the 22 European isolates collected in the early 1970s and studied here belonged to sublineage EUR1/AFR1. These isolates include four Eastern European isolates collected in 1970 that clustered together. In particular, all these isolates had a specific synonymous SNV (T → C) at nucleotide position 2955669 according to GenBank accession number LT907989. One was collected from Odessa in the Black Sea region (now Ukraine) and three were collected from the eastern part of former Czechoslovakia (now Slovakia). The first of these isolates is a representative of the cholera outbreak that began in Odessa in August 1970, with more than 300 cases reported for this year in Ukraine 12 . The other three isolates are representative of a small outbreak (four cases and one death were reported to the WHO but underreporting by the authorities is suspected) that occurred in late October around Vojany, among workers involved in the building of a power plant 13 . Our genomic analysis suggests that the disease was probably exported from Ukraine to Vojany, which is located only 15 km from the Western Ukrainian border, rather than being caused by the effluent of aircraft flying from regions of endemic cholera in India to Europe, as proposed by Rondle et al. 14 . One hypothesis put forward to explain the origin of an unexpected cholera outbreak in Guinea, West Africa, in the summer of 1970 was the return of Guinean students from the Black Sea coast 15 . The presence of a specific synonymous SNV (T → C) at nucleotide position 2955669 in our Ukrainian and Slovakian isolates (but not in the African isolates) is not consistent with this hypothesis and instead suggests the involvement of students, pilgrims or soldiers flying back from the Middle East 3 . The five isolates collected in 1971 in Portugal and Spain, which reported 64 and 22 non-imported cases, respectively, for this year, did not group together by country of isolation in the phylogenetic analysis. They belonged to the upper part of the EUR1/AFR1 sublineage, in which most of the isolates collected after 1970 were from North Africa. This suggests that the relatively small number of scattered cholera cases reported by these two countries in 1971 may have been due to multiple importations of the bacterium at a time at which the disease was active in the southern part of the Mediterranean basin. Algeria, for example, reported 1332 cholera cases in 1971 (refs. 4,16 ). Unfortunately, no Algerian isolates from 1971 were available for this study. Despite being speculative, our hypothesis may be supported by the occurrence, in the summer of 1971, of a cluster of seven cholera cases in two villages of Zaragoza province, located on a route used by immigrant workers from North Africa to cross Spain 17 . In 1973, Italy reported 278 cholera cases and 23 deaths from this disease between August and October. Explosive outbreaks occurred around major ports in three Southern provinces: Campania, Apulia, and Sardinia 18 . The two isolates studied here, one from Campania and one from Apulia, clustered together and with Tunisian isolates collected in 1973 and 1974 in  the phylogenetic analysis. All these isolates had a specific synonymous SNV (T → C) at nucleotide position 1462272. In 1973, Tunisia, which is 85 miles away from Sicily, reported 656 cholera cases, up from four in the preceding year 4 . The largest European outbreak during the seventh pandemic was that in Portugal. This country reported 2467 cases and 48 deaths from cholera between April and October 1974 (refs. 4,19 ). The disease spread from the south to the north of the country over a period of 20 days, with 17 of the 18 administrative areas in Portugal eventually reporting cases of cholera caused by V. cholerae O1 serotype Inaba. We found that the Portuguese isolates from 1974 belonged to sublineage EUR1/AFR1, but were not derived from the strains circulating in this country in 1971. Instead, the 1974 isolates were derived from Angolan strains isolated in 1971 and 1972. Interestingly, all but two of the 10 Portuguese and Angolan isolates sequenced had a non-synonymous SNV (G103A; alteration C03 according to the nomenclature of ref. 10 ) in the wbeT gene associated with the Inaba serotype (Supplementary Data 1). The other two isolates, obtained from two members of the same family (one case and one carrier) contaminated in Portugal in 1974, belonged to serotype Ogawa (wildtype wbeT gene). Both could be classified as true Ogawa revertants of unknown significance. Our phylogeny is consistent with the stationing of colonial troops in the city of Tavira, where the outbreak began, in 1974 (ref. 19 ). These troops had been traveling back and forth between Portugal and its African colonies, such as Angola, where 934 cholera cases were reported in 1974 (more than three times the number of cases observed in 1972 and 1973). In July 1975, the cholera epidemic resumed in Portugal, with 1,066 cases and eight deaths 4 . We were able to identify the strain concerned as the same 7PET strain responsible for the outbreak in the preceding year, but were unable to determine whether this strain had persisted from the 1974 outbreak or had been reimported by the thousands of refugees from Angola that entered Portugal in Summer 1975. The robust phylogenetic framework developed here also enabled us to trace the origins of three sporadic autochthonous cases of cholera reported in France in the early 1970s. The three isolates concerned-which are detailed in the Supplementary Notes section "French autochthonous sporadic cases"-belonged to EUR1/AFR1 and probably originated from North Africa (n = 2) or West Africa (n = 1).
In 1970, a second cholera strain was identified, contemporary to the introduction into Europe of the "Ogawa strain" (EUR1/ AFR1 sublineage). This "Inaba strain"-also named at the time according to its serotype-was first described in Astrakhan (former USSR) in August, and then in the Middle East and Eastern Africa, before the end of 1970 (ref. 3 ). This "Inaba strain" corresponds to our EUR2/AFR3 sublineage, the isolates of which belong to the Inaba serotype, with a premature stop codon in the wbeT gene (mutation C157T; alteration A05). This sublineage represents the second introduction of cholera into Europe. The small outbreak (less than 100 cases) of cholera caused by V. cholerae O1 serotype Inaba reported by Ukraine 12 in 1974 was found to be caused by EUR2/AFR3 (Supplementary Data 1).
We subsequently identified six other introductions of cholera from South Asia into Europe, all in Eastern Europe, including five involving Romania (Figs. 1 and 2, Tables 1 and 3, and Supplementary Data 1). The last introduction occurred in 2011. In three instances, the introductions of 7PET sublineages into Europe (EUR3/AFR4, EUR4/AFR6, and EUR6/AFR8) involved Romania, but subsequently displayed a broader geographic expansion, with similar patterns of spread. EUR3/AFR4 was the sublineage involved in the third introduction of cholera into Europe, and the first into Romania, where it was found in 1977 and 1981. As no cholera cases were officially reported by Romania during this period (Table 1), we estimated the burden of disease due to this sublineage using available data from Tulcea 20 (Table 4), a county encompassing the Danube delta, which was one of the most severely affected regions from 1990 to 1995, when data were reported to the WHO. These local data indicate that 115 cases were identified in Sulina, a port at the mouth of the Danube on the Black Sea in 1977. In 1981, 154 cholera cases were identified along the Danube River (including its delta), and the disease was thought to have been imported from Constanţa, one of the largest cities in Romania and the most important harbor on the Black Sea. EUR3/AFR4 also comprised isolates from Turkey (1976)(1977)(1978)(1979)(1980) and was found later in the Horn of Africa in 1985. The EUR4/AFR6 sublineage was responsible for the fourth introduction of cholera into Europe and the second into Romania. This sublineage was found in Romania in both 1987 and 1990. In 1987, the health authorities of Tulcea 20 reported 149 cases of cholera at 37 sites, mostly along the Danube river and its branches. In 1990, Romania provided nationwide data for the first time, with 270 cases of cholera and one death from the disease in July, in the Danube delta area. EUR4/AFR6 was also detected in Russia (Stavropol) in 1990, Turkey in 1991 and in East Africa from 1993 to 2003. One EUR4/AFR6 isolate from Ukraine was obtained from a water sample collected from the Danube in 1990, rather than from a human case. EUR6/AFR8 was responsible for the sixth introduction of cholera into Europe and the fourth into Romania. It was identified in Romania in 1993, when the country reported 15 cholera cases, including one imported case, and again in 1994 (80 reported cases, including 46 imported cases). This sublineage was widespread ( Supplementary Fig. 2) and caused relatively large outbreaks in neighboring countries, such as Ukraine in 1994 (813 cases and 20 deaths) and 1995 (525 cases and 10 deaths), and the Republic of Moldova in 1995 (240 cases and 5 deaths). Interestingly, EUR6/AFR8 was detected in a fish isolate collected in Odessa in October 1993, one year before the start of the Ukrainian outbreak (September 1994) 12,21 . In 1994, EUR6/AFR8 spread to Southern Europe: Albania (626 cases and 20 deaths) in September 4,7,22 , then Italy in October, with a small cluster of 12 cases (no deaths) in Apulia, close to Albania 4,7,23 .  10 Domman et al. 1  Comparison of genomic data with other typing schemes. Ribotyping was used in previous studies investigating the global population structure of V. cholerae O1 (refs. 26,27 ) or to investigate 7PET European outbreak isolates 6,7 . In our dataset, ribotyping data were available for 51 isolates from the five sublineages introduced into Romania (Table 5 and Supplementary Data 1). Up to five ribotypes were observed among isolates from the same sublineage (EUR4/AFR6), whereas some ribotypes, such as B5a and B21a, were common to isolates from two or even three sublineages (B5a in EUR3/AFR4 and EUR4/AFR6 and B21a in EUR3/AFR4, EUR5, and EUR6/AFR8). This confirms the irrelevance of non-genomic methods for inferring true genetic relationships between 7PET isolates and for accurate studies of the spatiotemporal spread of 7PET.
Evolution of antimicrobial resistance in European 7PET isolates. We show here that even though each sublineage is well characterized by its antimicrobial resistance gene content at genomic level (Table 3), diverse antimicrobial resistance phenotypes are observed within each sublineage (Fig. 3, Table 6 and Supplementary Data 1). Several explanations for this observation can be proposed. Firstly, it may be due to genetic rearrangements involving AMR determinants and affecting a proportion of isolates from a given 7PET sublineage. For example, some isolates from the EUR5 and EUR6/AFR8 sublineages lost their GI-15 genomic island conferring resistance to streptomycin and sulfamethoxazole, and some isolates from the EUR6/AFR8 and EUR7 sublineages presented an internal deletion of their ICEVChMoz10/ICEVchBan9 and ICEVchInd5/ICEVchBan5 genomic islands, respectively ( should be used with caution for distinguishing and tracking 7PET strains over time. Antimicrobial resistance susceptibility testing was previously performed on hundreds of V. cholerae O1 isolates collected between 1977 and 1995 in Romania 5,31 . These studies documented a gradual increase in antibiotic resistance, with the acquisition of resistance to nitrofurans in 1990-1991, tetracycline in 1993-1994, cotrimoxazole in 1993-1995 and nalidixic acid in 1995. Our genomic and phenotypic data are largely consistent with both these previous studies. However, in our dataset, the first isolates resistant to nitrofurans-caused by point mutations in the VC_0715 and VCA0637 nitroreductase genes 11 -were isolated in 1991 (EUR5 sublineage) and not in 1990 (EUR4/AFR6). Israil et al. 31 suggested that isolates resistant to tetracycline (TET), cotrimoxazole AMR antimicrobial resistance, WT wild type,absence. a According to Mutreja et al. 8 . b These non-silent point mutations in chromosomal genes VC_0715 and VC_A0637, and gyrA and parC are associated with resistance to nitrofurans, and to quinolones, respectively; the GI-15 genomic island harbors the genes responsible for resistance to streptomycin (aadA-like) and sulfonamides (sul1); the ICEVchBan9/ICEVchMoz10 genomic island harbors the genes responsible for resistance to streptomycin (strAB), tetracycline (tetA), chloramphenicol (floR), sulfonamides (sul2), trimethoprim/vibriostatic agent O/129 (dfrA1), and cotrimoxazole (sul2 and dfrA1); the ICEVchInd5/ICEVchBan5 genomic island harbors the genes responsible for resistance to streptomycin (strAB), chloramphenicol (floR), sulfonamides (sul2), trimethoprim/vibriostatic agent O/129 (dfrA1), and cotrimoxazole (sul2 and dfrA1). c GI-15 is absent in rare isolates. d Frequent ICEVchBan9 internal deletion encompassing the strAB, tetA, floR, and sul2 genes. e Rare ICEVchInd5 internal deletion encompassing the strAB, floR, and sul2 genes. (SXT), and nalidixic acid (NAL) emerged in Romania following the widespread use of TET and SXT in 1990and of NAL in 1993-1994 in Romania. Our phylogenomic analysis on contextual global 7PET isolates clearly shows that the main changes in AMR were driven by the introduction of sublineages into Europe, which already carried AMR determinants in South Asia before their introduction into Europe (Supplementary Data 1). This finding makes it possible to reject the hypothesis 31 that the local use of antibiotics in Romania was responsible for the emergence of AMR V. cholerae O1 strains.
Genomic epidemiology and sources of cholera. Even though it did not cover all notified outbreaks (Supplementary Notes section "7PET outbreak isolates not studied"), our genomic analysis identified two different patterns in Europe during the seventh pandemic of cholera. The first pattern concerned the Southern European countries, which were hit during the 1970s, when the disease was very active in Africa, particularly in North Africa, which is geographically close to Southern Europe, and West Africa, due to its colonial ties with European countries. The disease was easily brought back to Europe by colonial troops, migrant workers, refugees or tourists. Studies on imported cholera cases in Europe between the mid-1970s and late 1980s showed that most of these cases occurred in citizens of nations in which cholera was endemic (migrant workers or refugees), mostly in North Africa 32,33 . The outbreaks in Southern Europe were of limited size, with the exception of the 1974-1975 outbreak in Portugal. Good hygiene, access to safe water, and good sanitation conditions were probably responsible for the containment of the disease. Secondary cases of cholera are, thus, extremely rare in high-income countries 32,34 . In several of these outbreaks, epidemiological investigations identified raw fish and seafood as the vehicle of transmission 18,19,23 . The larger extent of the cholera outbreak in Portugal in 1974 may be partly due to the contamination of one brand of commercially bottled water 19  (1) Timed phylogeny, identical to that in Fig. 1, but represented vertically. (2) The seventh pandemic V. cholerae O1 El Tor (7PET) sublineages are indicated. The names of the European and European/African sublineages are indicated by a one-digit number within or next to the blue line and corresponding to that found in the sublineage name (e.g., 1 for EUR1, 2 for EUR2/AFR3, etc). The antimicrobial resistance (AMR) phenotype for 10 antibiotics, the presence of chromosomal AMR genes (4), and genomic islands encoding AMR (5) are shown for each isolate, according to its position in the phylogeny. More detail about the genes and genomic islands encoding AMR is provided in the footnotes of   (7) a According to Mutreja et al. 8 . b According to Damian et al. 6 and Koblavi et al. 26 .
The second pattern was seen in Eastern European countries bordering the Black Sea, which were recurrently affected, over a longer period of time, by cholera strains originating from South Asia. For example, Romania experienced five introductions of cholera, three of which occurred during a 4-year period in the 1990s (Supplementary Fig. 3). However, most of these introductions were linked to a more global spread of these sublineages involving the Middle East and eventually reaching Africa, including East Africa in particular. Another particular feature of cholera in Eastern Europe was that most of the cases occurred during warmer seasons (July to the end of October) in the Danube delta and along the Black Sea Coast, where the main risk factors for cholera were the drinking of surface waters directly from the Danube (particularly in remote locations or during disruptions at water treatment plants), a lack of food hygiene and the consumption of uncooked or undercooked fish 5,12,19 .
Understanding the balance between the role of the aquatic environment as a long-term reservoir of toxinogenic V. cholerae and the role played by humans in the spread and maintenance of the pathogen-through direct (human-to-human) or indirect (due to transient pollution of the environment with feces from cholera patients) transmission-is of particular importance if we are to combat this scourge effectively. The "cholera paradigm" theory, which has prevailed for the last two decades, suggests that coastal environmental factors influenced by climate and weather conditions are the key elements underlying cholera epidemiology 36 . This theory, based on the clear association between V. cholerae populations and plankton communities in coastal ecosystems, was developed on the basis of observations made primarily in the Bay of Bengal 36,37 , the global hub and natural habitat of the 7P cholera agent. However, outside Asia, genomic reconstruction of the spread of 7PET over several decades was not consistent with perennial aquatic environmental reservoirs acting as the primary source of epidemic cholera in Africa 10 and the Americas 1 . The data presented here-showing that 7PET sublineages, recurrently introduced from Asia, ultimately became extinct after a few years of circulation in Europe-also clearly rule out the long-term establishment of an aquatic reservoir of the disease in Europe, particularly in the countries bordering the Black Sea. Instead, our data suggest that the dynamics of cholera in this region, which houses a number of major ports, may be accounted for by the recurrent importation of 7PET strains via the movement of populations, such as the arrival of migrant workers from areas of endemic cholera during the 1990s 5 . Direct (human-to-human) or indirect (pollution of the environment with feces from cholera patients) secondary transmission can then cause cholera outbreaks in situations of poor hygiene, inadequate sanitation, the unsanitary harvesting and marketing of seafood, and risky dietary and drinking habits. As the movement of populations from regions in which cholera is endemic has not  ( ceased, and, indeed, has even increased over time, improvements in water, sanitation and hygiene (WASH) services and in health infrastructures almost certainly prevented the occurrence of cholera outbreaks in Europe after 2011.

Methods
Bacterial isolates. The 156 7PET isolates sequenced in this study are listed in Supplementary Data 1. They were obtained from the collections of the French National Reference Center for Vibrios and Cholera, Institut Pasteur, Paris, France; the Cantacuzino NMMIRD; and the Public Health Agency of Canada. The isolates were characterized by standard biochemical, culture, and serotyping methods 38 .
Antibiotic susceptibility testing. Antibiotic susceptibility was determined by the disk diffusion method, on Mueller-Hinton (MH) agar, in accordance with the guidelines of the Antibiogram Committee of the French Society for Microbiology (CA-SFM 2006) 39  Whole-genome sequencing. Whole-genome sequencing was carried out at the PF1 sequencing platforms of the Institut Pasteur, the genotyping and sequencing core facility of the Institut du Cerveau et de la Moëlle Epinière (Paris, France), or at the Public Health Agency of Canada, on Illumina platforms generating 70-300 bp paired-end reads, yielding a mean of 99-fold coverage (minimum 16-fold, maximum 1260-fold).
Short-read sequence data were submitted to the European Nucleotide Archive (ENA), under study accession number PRJEB38484; genome accession numbers are provided in Supplementary Data 1.
Additional genomic data. Raw sequence files or assembled genomes from 1,168 7PET genomes described by Weill et al. 10,11 or unpublished were downloaded from the ENA or GenBank and included in this study (Supplementary Data 1). We generated 100 bp overlapping simulated reads from the assembled genomes with fas-ta_to_fastq.pl (https://github.com/ekg/fasta-to-fastq/blob/master/fasta_to_fastq.pl).
The GenBank accession numbers used to perform these analyses are shown in Supplementary Table 1 Phylogenetic analysis. Repetitive (insertion sequences and the TLC-RS1-CTX region) and recombinogenic (VSP-II) regions in the alignment were masked 10 . Putative recombinogenic regions were detected and masked with Gubbins 50 version 2.3.4. A maximum likelihood (ML) phylogenetic tree was built from an alignment of 13,544 chromosomal SNVs, with RAxML 51 version 8.0.20, under the GTR model, with 200 bootstraps. The final tree was rooted on the A6 genome-the earliest and most ancestral 10 seventh pandemic isolate, collected in Indonesia in 1957-and visualized with iTOL 52 version 5 (https://itol.embl.de) ( Supplementary  Fig. 1). The rooted tree was time-scaled with LSD2 (ref. 53 ) version 1.6.5, using a strict molecular clock model with temporal outlier detection. LSD2 detected 41 temporal outliers (listed in Supplementary Data 1), which were removed from the time-scaled tree. The ancestral characters for geographic regions were reconstructed on the time-scaled tree with PastML 54 version 1.9.29.2, with the MPPA method and model F81. We ensured that the geographic predictions of the most recent common ancestors (MRCAs) of European clades were not disturbed by biased sampling, by performing reconstruction under the same settings for five trees obtained by pruning some of the leaves corresponding to overrepresented regions from the full time-tree. For each pruned subtree, we retained two (randomly selected) of 36, two of 20, and two of 40 leaves from those sampled in Romania in 1990Romania in , 1991Romania in , and 1993Romania in -1995, respectively, four of 18 leaves from those sampled in Ukraine or Moldova in 1993-1995, all the leaves sampled in India in 1990 (n = 3) and 1991 (n = 3), and all the leaves sampled from India and Pakistan in 1993-1995 (n = 9). This procedure made it possible to bring the sampling proportions for these countries closer to the proportions of reported cases. However, these sampling variations did not change the geographic prediction for the MRCAs of EUR4/AFR6, EUR5, EUR6/AFR8. The time-scaled tree with ancestral geographic reconstruction was visualized with iTOL ( Fig. 1).
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Short-read sequence data were submitted to the European Nucleotide Archive (ENA), under study accession number PRJEB38484, and the genome accession numbers are provided in Supplementary Data 1