Viral genetic clustering and transmission dynamics of the 2022 mpox outbreak in Portugal

Pathogen genome sequencing during epidemics enhances our ability to identify and understand suspected clusters and investigate their relationships. Here, we combine genomic and epidemiological data of the 2022 mpox outbreak to better understand early viral spread, diversification and transmission dynamics. By sequencing 52% of the confirmed cases in Portugal, we identified the mpox virus sublineages with the highest impact on case numbers and fitted them into a global context, finding evidence that several international sublineages probably emerged or spread early in Portugal. We estimated a 62% infection reporting rate and that 1.3% of the population of men who have sex with men in Portugal were infected. We infer the critical role played by sexual networks and superspreader gatherings, such as sauna attendance, in the dissemination of mpox virus. Overall, our findings highlight genomic epidemiology as a tool for the real-time monitoring and control of mpox epidemics, and can guide future vaccine policy in a highly susceptible population.


Cases and sequence sampling
The first mpox cases in Portugal were laboratory-confirmed on 17 May 2022, and the epidemic peaked around 17 July 2022, followed by a decreasing trend in the number of cases. A total of 951 cases were confirmed in Portugal, as of 10 January 2023 (Fig. 1). To understand the genomic epidemiology of MPXV in Portugal during the 2022 multi-country outbreak, viral genome sequences were obtained from 495 individuals with a PCR-positive test. The dates of sample collection of the 495 genotyped cases spanned from 4 May (ISO week 18) to 16 September (ISO week 37) 2022, representing a large sequence sampling of 54.2% (495 out of 914) of all confirmed mpox cases in Portugal during this period (corresponding to 52.1% of total confirmed genomic analyses of these genomes revealed that the MPXV causing the 2022 multi-country outbreak shares a recent common ancestor with MPXV sequences connected to Nigeria (A.1 lineage) 24 , where a large outbreak occurred in 2017 and 2018 (ref. 12,16,28). More recently, other hMPXV1 lineages (for example, A.2.1, A.2.2, A.2.3 or A.3) have been sporadically detected in countries without endemic mpox from a wide latitude (for example, the United States, Vietnam, Egypt and the United Kingdom), supporting the aforementioned link to the epidemic in Nigeria and that other human-to-human transmission chains are probably ongoing besides the recognized large B.1 outbreak 27,29 . However, the number of sequences reported throughout the years (until 2022) is still very limited, hindering the establishment of precise origins, evolutionary routes and clone dissemination. The sequences collected during the 2022 outbreak diverge by around 50 single-nucleotide polymorphisms (SNPs) from pre-outbreak sequences 24 . This indicates an unexpectedly high substitution rate for the slow-evolving double-stranded DNA Orthopoxvirus. In-depth mutational analysis suggested the action of host APOBEC3 (apolipoprotein B mRNA-editing catalytic polypeptide-like 3) enzymes in viral evolution (owing to the mutating bias of these SNPs: GA>AA and TC>TT base substitutions) and enabled us to estimate that the introduction of hMPXV1 into the human population occurred around 2016, probably followed by 'silent' human-to-human transmission 27 . Signs of potential MPXV-human adaptation in the microevolution of the current outbreak were also investigated early on 24,30,31 , but the adaptive value of the APOBEC3 hypermutation in the long term is still uncertain 27 .
Since the beginning of the 2022 outbreak, and mirroring the open science practices observed in the case of SARS-CoV-2, researchers from several countries immediately started to share MPXV genomes in public databases (for example, GISAID, GenBank), and existing platforms (such as Nextstrain (https://nextstrain.org/monkeypox/hmpxv1) and GenSpectrum (https://mpox.genspectrum.org/)) were rapidly adapted. This collaborative environment facilitated the execution of phylogenetics-based spatiotemporal studies and the surveillance of the genetic variability of MPXV by tracking its spread and assessing evolutionary traits potentially linked to its adaptation to the human host or   Table 1).

Transmission dynamics
To understand the transmission intensity and estimate key epidemiological parameters of this outbreak in Portugal, we used case incidence data to fit a time-discrete SEIR model (Fig. 2 (Fig. 2a).
We estimated a subsequent decline in R t , falling below 1 at the end of June 2022. The model accounted for travel history and was able to reconstruct well the true number of cases (Fig. 2b), with an estimated reporting rate of 0.62 (95% CrI, 0.43-0.83). We also estimated the proportions of the MSM population in Portugal that were susceptible (Fig. 2c), infected (Fig. 2d) and recovered (Fig. 2e) Tables 1 and 2).

Size, timespan and international 'linkage' of Portuguese subclusters
The Portuguese subclusters showed considerable diversity in terms of size, timespan and inclusion of sequences from other countries (Fig. 3a). Seven Portuguese subclusters contained at least ten sequences from Portugal, with cluster 172 (corresponding to the internationally designated sublineage B.  Tables 1 and 2). In addition, we assessed the timespan of the subclusters, that is, the time (in days) between the earliest and latest detection of a given sublineage at national and international levels ( Fig. 3a,b and Supplementary Table 2). At the international level, as of 2 November 2022, the largest time interval (172 days) was observed for sublineage B.1.3, which has been detected in 13 countries since 28 April 2022 (Fig. 3b). It is noteworthy that the earliest occurrence of several Portuguese subclusters that involved international sequences was observed in a Portuguese case, suggesting that the emergence of some outbreak sublineages (or, at least, the early dissemination of some sublineages at a global level) occurred in Portugal (Fig. 3a). At the national level, 24 out of the 52 Portuguese subclusters included mpox cases detected in a time interval equal to or above 30 days ( Fig. 3a and Supplementary Table 2). The largest time interval between two Portuguese cases of the same cluster was observed for cluster 174 (corresponding to the international sublineage B.1.5), with a total of 109 days (between ISO weeks 21 and 36, 2022) (Fig. 3b).
Given the estimated substitution rate of the outbreak-causing MPXV 27 and its spread through sexual networks 1,22 , it is worth noting that a genetic subcluster does not necessarily represent a sustained transmission chain. To address this challenge of detecting multiple introductions, we assessed whether individuals within the same subcluster reported travel history (within the incubation period of MPXV) to different countries and then assessed whether those countries were represented by sequences in the same subcluster. Travel history was reported for 85 out of 494 cases (17.2%), with 43 of them belonging to subclusters with international sequences (Supplementary Tables 1  and 2). The names of travel-related countries (that is, countries from the travel history inquiry, known for 84 out of 85 cases) were reported by at least one case for 25 Portuguese subclusters (17 having international sequences) and 6 international subclusters with a single Portuguese sequence. Not unexpectedly, for 14 Portuguese subclusters, more than one country of travel was reported by different individuals whose sequences were in the same subcluster, thus providing strong evidence for more than one introduction of those 14 sublineages into the country ( Fig. 3c and Supplementary Table 2). For instance, this scenario was identified for the four internationally designated sublineages showing the highest circulation in Portugal (B.1.1, B.1.5, B.1.7 and B.1.9). Most notably, for 11 subclusters, we found at least one match between a travel-related country and a country with sequences in those subclusters, with that country being the most or second most represented in terms of the number of sequences for seven subclusters ( Fig. 3c and Supplementary Table 2).
In addition to the inclusion of travel history, epidemiological data on sexual contact with tourists, travelers or visitors was reported in 101 out of 494 cases (20%) (Supplementary Table 1) and could indirectly inform about potential introductions or reintroductions of the virus into Portugal. Most of these cases, 50 out of 101 (49.5%), belonged to subclusters with international sequences, namely 20 Portuguese subclusters and 6 international subclusters with a single Portuguese sequence. The name of the tourism-related country (that is, countries linked to sexual contacts with tourists, travelers or visitors) was specified by 50 individuals. In total, sexual contact with tourists from a known country was confirmed by at least one Portuguese individual for 21 subclusters (13 with international sequences). Similarly to the scenario observed after the inclusion of travel history data, we found subclusters with more than one tourism-related country (reported by different individuals) in the same cluster (potentially reflecting additional independent introductions), as well as subclusters in which a given tourism-related country matched a country with sequences in that cluster ( Fig. 3c and Supplementary Table 2).

Transmission routes and exposure context of Portuguese subclusters
Detailed epidemiological data on the most probable transmission routes and on possible exposure in the 21 days before symptom onset were available for most of the studied cases (399 and 318 out of the 494 cases, respectively) (Supplementary Table 1). Sexual contact was reporting travel history are highlighted in black or bright green when the travelrelated country matches or does not match a country with sequences in the same cluster, respectively. c, Characterization of the Portuguese genetic subclusters according to the most frequently reported transmission routes and exposure context (excluding cases with no data available), and to potential international linkage based on travel history and/or sexual contact with tourists, travelers or visitors in the 21 days before symptom onset.   Table 1). Within each Portuguese subcluster, small events involving sexual contacts were reported as the most frequent exposure context in 34 subclusters, followed by household exposure (most frequently reported in ten small subclusters with two to seven cases per subcluster) (Fig. 3c and Supplementary Table 2). The in-depth epidemiological investigation also recorded sauna attendance in the 21 days before symptom onset for 76 out of the 494 studied individuals. Notably, 71 out of the 76 cases reporting 'sauna' (93%) were phylogenetically placed at the outbreak basal level ('root') (n = 22) or were part of subclusters (n = 49), with 33 belonging to subclusters with international sequences (Supplementary Table 1). In total, 23 out of the 52 Portuguese subclusters included at least one case reporting 'sauna' (Fig. 3c and Supplementary Table 2). In particular, sauna attendance was reported in at least one case for the top six subclusters in terms of the number of Portuguese sequences. We could trace back the attendance of 28 of the 76 individuals who reported sauna attendance to one particular sauna ('sauna1') located in the Lisbon and Tagus Valley geographic region. These cases were found across 11 Portuguese subclusters, eight of which included international sequences (Fig. 3c). Sexual contacts with multiple or anonymous individuals were also reported by 283 out of 494 cases (57.3%), with nine subclusters exclusively involving cases that reported this variable ( Fig. 3c and Supplementary Table 2).

Discussion
The estimated substitution rate of the MPXV that caused the 2022 outbreak (around six substitutions per genome per year) 27 and its transmission characteristics, associated with sexual networks and superspreader events providing the opportunity for rapid outbreak dissemination [21][22][23] , challenge the identification and tracking of singular transmission chains, as well as the distinction between continuous within-country transmission from multiple introductions. Still, our transmission model and genetic clustering analysis with extensive epidemiological data integration provide evidence regarding the transmission dynamics of MPXV.
Despite the complexity of identifying the epidemiological parameters of MPXV in this outbreak, we fitted an SEIR model accounting for travel history and obtained a reasonable fit. Owing to the long infectious period of MPXV 2 , we expected that R t would fall below 1 before cases start declining, as observed in this outbreak. The fast decline of the R t in Portugal can be explained by three main factors: contact tracing procedures, behavioral changes and vaccination. Contact tracing procedures were implemented as early as May 2022, when the first cases were reported. Intimate contacts and sexual partners of cases were identified, when possible, and were advised to stay in isolation for the maximum duration of the incubation period. During the pre-smallpox-vaccination period, in June 2022, the public health strategy focused on mobilizing the MSM population for prevention and control measures and on identifying groups with high risk of MPXV transmission, particularly those who would have a high number of sexual partners. This particular group of the MSM population might have played a key role in viral transmission in this outbreak 36,37 , as supported by the present study. Targeted public health messages to the MSM population and community stakeholders working in the field of sexually transmitted diseases (STDs) probably led to behavior changes, as seen by the rapid decrease of the R t . Following the approval of the smallpox vaccine for emergency use in the mpox outbreak, ring vaccination started in Portugal in July 2022, consolidating the decrease in transmission risk. The difference between the estimated and the observed epidemic peaks probably translates to a lag of reporting. One of the unique features of this outbreak is that most cases self-identify as MSM. We estimate that only 1.3% of the MSM population in Portugal 38 was infected with the virus, leaving an almost entirely susceptible MSM population with the potential for future epidemics. This could be corroborated by a targeted seroprevalence study in Portugal, as well as in other high incidence-reporting countries. Recent studies have suggested the role of both cryptic and asymptomatic transmission, which could explain, partly, the proportion of cases not observed and not reported 33,[39][40][41] . This model emphasizes the value of transmission models, which enables us to estimate the true underlying size of the epidemic, accounting for reporting probabilities as well as susceptible population proportions.
Our transmission model has potential limitations in the assumptions adopted, which are important to consider when interpreting the results. Firstly, we assumed a homogeneous mixing of the entire MSM population living in Portugal, and we did not include network data among the MSM population because of the incompleteness of these data in the inquiries. Secondly, we relied on laboratory-confirmed case data as an indicator of epidemic size and used travel history as an approximation for the imported cases. Lastly, the model does not account for the possibility that some exposed individuals are perhaps less susceptible to the MPXV (for example, smallpox-vaccinated individuals), and the low number of vaccinated individuals during the period of this time series (less than 500) hampered the inclusion of the effect of vaccination on these dynamics.
Regarding the genomic epidemiological analysis, we were able to identify the B.1 sublineages (that is, genetic subclusters) with the highest impact on the mpox epidemic in Portugal and assess whether their circulation seemed to be higher in Portugal (owing to the lack of international sequences) or revealed multi-country dissemination (Fig. 3). Moreover, the inclusion of travel data, as well as data on sexual contacts with tourists, travelers or visitors, complemented the phylogenetic data to support the existence of multiple introductions of the same sublineage, even though different scenarios were observed. For example, sublineage B.1.9, which had intense circulation in Portugal since the beginning of the epidemic (Fig. 3), was found to have been reintroduced at a later stage after the epidemic peak. By contrast, other sublineages seemed to be introduced multiple times throughout the epidemic, although each introduction probably seeded limited transmission chains (for example, B.1.5 and B.1.7). Supporting the reliability of the disclosed international linkage, we found a link between travel-related or tourism-related countries and the countries with (a high proportion of) sequences in those subclusters. For instance, we found one travel-related introduction of sub-lineage B.1.2 from the United States, and this country represented around half (105 out of 214) of all B.1.2 sequences reported worldwide during the study period. In another example, Germany was the most frequently represented country (114 out of 161 sequences) within cluster 181 (corresponding to sublineage B.1.1), which was consistent with the travel history reported by one Portuguese case linked to that cluster. Still, it is noteworthy that travel history data also allowed the identification of other potential independent B.1.1 introductions from Brazil, Canada and Italy, which corroborates the complexity of such inferences, even with the vast epidemiological data and high sequence sampling rates ( Supplementary Tables 1 and 2). Also, in a few instances, the lack of matches could have been underestimated because of the low number of sequences available in GenBank from some countries. For example, several cases across different subclusters reported travel history to Spain and Brazil, but only a few matches were found. Article https://doi.org/10.1038/s41591-023-02542-x Attendance at small venues involving sexual contacts, namely saunas (in particular 'sauna1'), was found to be among the most reported exposure contexts within the studied population, which is consistent with previous reports 1, 21,22,32,42,43 . We cannot directly point to saunas as the main triggers for the extensive dissemination of MPXV, as it is not possible to ascertain whether these locations were the actual settings of transmission for all of these cases. However, our study clearly supports the important role of such potential superspreader events in outbreak dissemination 21,22,38 , as these cases were frequently integrated into large subclusters (for example, they were reported for the six largest subclusters) and were commonly found across subclusters with international sequences (for example, almost half of the cases reporting sauna attendance were linked to these subclusters). From another perspective, the potential epidemiological linkage of a particular setting (for example, 'sauna1') to several subclusters clearly shows the difficulties of untying the complex transmission networks associated with this multi-country outbreak. These challenges are highlighted not only by the considerably high number of cases reporting sexual contact with multiple or anonymous individuals, but also because almost all subclusters (49 out of 52) involved cases attending different hospitals or STD clinics. For instance, each of the five largest subclusters involved cases from at least 11 different hospitals or STD clinics, with potential negative impact on epidemiological data collection, despite the existence of standardized case investigation forms. Furthermore, these challenges cannot be disconnected from the intrinsic subjectivity associated with the interviews, which are dependent on both recall bias and the participants' willingness to provide the requested information, considering the exposure context (sexual networks). Future efforts to develop methods to improve the collection of this information while protecting individuals' privacy are warranted to increase the added value of genomic epidemiology. This is well reflected by the fact that the epidemiological inquiries were able to collect information about person-to-person contact for only 13 pairs of individuals among the 494 studied cases. This information is pivotal to complement the genomic data to track transmission. Indeed, the phylogenetic data clearly discard a direct or indirect epidemiological connection between the cases only for two out of the 13 pairs (Supplementary Table 1). This concordance (that is, most known contacts were confirmed in the same genetic subclusters) suggests that our clustering analysis might provide a linkage between most cases within the same cluster. We are aware that because of the large size of the dataset and MPXV genome, occasional sequence or phylogenetic artifacts (for example, reversions, spurious mutations or false homoplasies) might insert inconsistencies into the clustering. Still, our conservative clustering approach and inspection of the Portuguese data suggests that this effect may be minimal, reinforcing the presented global clustering scenario, which would be difficult to uncover with only epidemiological data. An unexpected finding was the detection of a non-outbreak-related sequence (PT0428) from sublineage A.2.3. This case, also self-identified as MSM, could be linked to the West Africa region and parallels other recent detections of non-B.1 lineages across the world 27 . By unveiling concurrent human-to-human transmission and broad dissemination of multiple hMPXV-1 lineages (outside the B.1 outbreak), these recent findings corroborate the idea that the historical paradigm of MPXV ecology, evolution and epidemiology has changed, posing new challenges for the prevention and control of mpox 24,27,29,30 .
By taking advantage of a large sequence sampling and the collection of vast epidemiological data, we shed light on the introductions and transmission dynamics of MPXV in Portugal. As one of the countries that reported the first cases and, similarly to the United Kingdom 42 and Spain 38 , one of the most affected countries during the early stages of the 2022 outbreak 24,33 , it is very likely that Portugal's epidemics played an important role in the early and widespread dissemination of MPXV worldwide. This is supported by our study, as the emergence of some outbreak sublineages, or their early global dissemination, most probably occurred in Portugal. Our study also estimates that only 62% of the true case incidence was observed and that 1.3% of the MSM population was infected during this period, leaving the possibility for similar epidemics to occur. We also consolidated the key role of events or venues with superspreader potential and sexual networks in MPXV transmission and identified the sublineages with the highest impact on outbreak spread in Portugal, while emphasizing their spatiotemporal landscape and international linkage. Our study is an integrative genomic epidemiology analysis in the context of the 2022 worldwide MPXV outbreak. Despite lacking individual network contact data to associate the corresponding genetic subclusters and transmission chains, our combined approach may prove to be crucial to untangle and respond to novel and emerging viral threats, even in contexts as complex as the one faced during the 2022 multi-country mpox outbreak. This study leverages evidence about mpox transmission that can support and guide future public health interventions, including vaccine strategies.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41591-023-02542-x.

Methods
This research complies with all relevant ethical regulations. The planning, conduct and reporting of this study was in accordance with the Declaration of Helsinki, as revised in 2013. Ethical approval for the use of surveillance data was not required due to the National Health Authority (Directorate-General of Health) permit to access and use surveillance data for communicable disease outbreak investigations in the public interest. At laboratory level, the Portuguese National Institute of Health (INSA) is the national reference laboratory, being the Portuguese laboratory authorized by the Directorate-General of Health (through the Technical orientation no. 004/2022 of 31 May 2022) to process the samples for identification and genetic characterization of MPXV. All samples subjected to viral genetic characterization were processed in an anonymized fashion. This study was approved by the INSA's ethical committee 'Comissão de Ética para Saúde'.

Epidemiological data collection
Demographic and epidemiological data were collected by the Directorate-General of Health, through SINAVE and the main STD clinics. Data were collected by the attending physician by conducting both face-to-face and phone interviews using the surveillance-standardized case investigation form. Demographic variables that were included were sex and age. Risk practice variables included travel history, contact with a confirmed case, self-identifying as MSM, number of sexual partners, having anonymous and/or multiple sexual partners, engaging in sexual activities with tourists and attending sex venues (sauna, or public or private parties). Countries reported in the travel history inquiry or linked to sexual contacts with tourists, travelers or visitors were referred to as 'travel-related' or 'tourism-related' countries throughout the manuscript, respectively. Details are presented in Supplementary Table 1.

Statistical analyses
Transmission dynamics. We fitted a discrete-time SEIR model to the reported case data using a daily time step. We assumed an incubation period of 5.6 days (ref. 44) and an infectious period of 21 days (ref. 2,3). For the number of susceptible cases, we used the estimates for the MSM Portuguese population 45 . The population was assumed to be completely susceptible, and the model was seeded with ten infections at the beginning of the case time series. We estimated a time-varying reproduction number R(t) as a random walk function with a fixed standard deviation of 0.1. The rates of change of each compartment are described in equations (1)(2)(3)(4). Here, S, E, I and R represent the susceptible, exposed, infectious and recovered proportions of the population. Additionally, α is the rate of onset of infectiousness, σ is the rate of recovery from infection and β (t) is the time-varying transmission rate, calculated as σR(t).
The expected number of reported cases at day t, C(t) is calculated as shown in equation (5), where ρ is the estimated time-constant reporting rate and N is the size of the population. The model was fit using a negative binomial likelihood with estimated over-dispersion parameter, φ.
The model was fitted in a Bayesian framework using Hamiltonian Monte Carlo No-U-Turn sampling in the CmdStanR package. The model was run for 2,000 iterations with two chains and a warm-up period of 1,000 iterations. Sensitivity results across different infectious periods of 14, 21 and 28 days (σ) ∈ {0.071, 0.048, 0.036}, with different infection seeds and without accounting for the importations are shown in Supplementary Figs. 1, 2 and 3. We assessed convergence using the R-hat statistic for each parameter.

DNA Extraction, sequencing and genome consensus generation
All biological samples were received by the Emergency Response and Biopreparedness Unit at INSA and were screened for MPXV using real-time PCR targeting the rpo18 gene 46 , on a CFX Opus Real-Time PCR System (Biorad), with viral genome sequencing being attempted for available samples with real-time PCR threshold cycle (Ct) ≤ 30. The first ten mpox genome sequences (PT0001 to PT0010) were generated as described in a previous publication 24 . For samples PT0011 to PT0048, with the exception of the DNA extraction (which was conducted using the MagMAX Viral/Pathogen Nucleic Acid Isolation kit in a KingFisher Extractor), the same procedure (Nextera XT library preparation and shotgun metagenomics by 2 × 150 bp paired-end sequencing on an Illumina NextSeq 2000 apparatus) was applied. After the development of an amplicon-based sequencing (PrimalSeq) approach 47 , all subsequent samples (PT0049-PT0595) were processed using an adapted version of this protocol and primer scheme (https://doi.org/10.17504/protocols. io.5qpvob1nbl4o/v2). Specifically, DNA amplification was performed using 12.5 µl of NEBNext Q5 Hot Start HiFi PCR Master Mix (New England Biolabs), 3.7 µl of each primer pool (1.5 µM per pool in the final reaction) (in separate reactions), 3.8 µl of nuclease-free water and 5 µl of template DNA in a final reaction volume of 25 µl. PCR amplification conditions were 3 min at 98 °C, followed by 35 cycles of 15 s at 98 °C and 5 min at 63 °C. PCR products of each sample were then pooled and subjected to clean-up with Agencourt AMPure XP (Beckman Coulter, catalog no. A63880) using a 1:1 volume ratio.
Dual-indexed libraries were constructed according to the Nextera XT library preparation guide (Illumina) with minor modifications. Normalization of libraries was performed using a bead-based procedure or a standard procedure according to the concentration values of each library following parallel capillary electrophoresis in the Fragment Analyzer instrument (Agilent). Library pools were denatured and diluted before loading according to the manufacturer's protocols. Paired-end sequencing (2 × 150 bp) was performed on Illumina NextSeq 550 or NextSeq 2000 instruments. Reference-based genome consensus sequences were obtained using the INSaFLU pipeline v.1.5.2 (https://insaflu.insa.pt/; https://github.com/INSaFLU; https:// insaflu.readthedocs.io) 48 , following the same procedure as previously described 24 . Sample details are presented in Supplementary Table 1.

Phylogenetics and identification of outbreak subclusters
To identify and characterize the subclusters of outbreak-related Portuguese sequences within the framework of the international MPXV genetic diversity, a global IQ-TREE phylogenetic tree was built using the Nextstrain 49 hMPXV-1 build (https://github.com/ nextstrain/monkeypox; commit https://github.com/nextstrain/ monkeypox/tree/06ae223e34aff74402e9e4caf5d4322c7c99aad3) over curated genome sequences retrieved from the National Center for Biotechnology Information (NCBI) GenBank Virus collection (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#). Sequence and metadata curation involved Nextclade, Augur and custom scripts (https://github.com/nextstrain/monkeypox/tree/06ae223e34aff744 02e9e4caf5d4322c7c99aad3/ingest). MPXV sublineage classification Nature Medicine Article https://doi.org/10.1038/s41591-023-02542-x followed the international nomenclature proposed in https://github. com/mpxv-lineages. As of 2 November 2022, the 'big' MPXV Nextstrain public dataset (available for navigation at https://nextstrain.org/ groups/neherlab/PT-MPXV-transmission/2022-11-02) included 2,774 sequences, with the B.1 subbranch including 2,721 outbreak-related sequences collected in 28 countries across Europe, Asia, Africa, Oceania, South America and North America (Supplementary Table 1). This 2022 outbreak subtree included 502 Portuguese MPXV genome sequences, of which 494 (representing 18.2% (494 out of 2,714) of the global B.1 sequence dataset, as of 2 November 2022) were used in this study (Supplementary Table 1). Eight Portuguese sequences were excluded from the clustering analysis because of same-patient redundancy and/or uncertainty about patient identification and/or sample collection date). The global outbreak B.1 subtree and respective metadata (Supplementary Dataset 1), including enriched epidemiological data collected for the Portuguese samples (Supplementary Table 1), as described above, were used to identify and characterize all genetic subclusters (defined as any subbranch with at least two sequences diverging from the outbreak basal level by at least one SNP) using ReporTree v.1.0.1 (https://github.com/insapathogenomics/ ReporTree) 50 . Specifically, ReporTree was requested to cut the tree at one single threshold level from the outbreak basal level ('root') using the 'root-dist' method available through the 'TreeCluster' analysis mode 51 , and setting a distance unit ('-d' argument) corresponding to less than one SNP. ReporTree was also asked to report statistics for all derived subclusters and key metadata variables. A simplified command line string is as follows: reportree.py -m metadata.tsv -t out-break_subtree.nwk -analysis treecluster --method-threshold root_dist-1 -d 0.000002 --columns_summary_report metadata_1,metadata_n --metadata2report metadata_1,metadata_n -out reportree. Advanced and integrative visualization and assessment of the phylogenetic tree, together with genomic, epidemiological and spatiotemporal data was performed with Auspice (https://nextstrain.org/groups/neherlab/ PT-MPXV-transmission/2022-11-02). In addition, https://auspice.us/ was used to interactively explore and interpret the identified subclusters (the JSON file, the 'divergence' outbreak B.1 tree and metadata with subclusters are available for navigation in Supplementary Dataset 1). In particular, genetic subclusters including at least two Portuguese sequences ('Portuguese subclusters') were thoroughly inspected, with the cluster-defining SNP being identified and checked for main homoplasies. The 'mutation_profile' python script (https://github.com/ insapathogenomics/mutation_profile) was applied to screen whether the cluster-defining SNPs follow signatures potentially compatible with APOBEC3-mediated viral genome editing (namely, GA>AA and TC>TT replacements, which were observed in 86% of those SNPs) (Supplementary Table 2). No clustering inferences were taken for the sequences categorized in the 'outbreak basal level' ('root'), because of the outbreak characteristics (see Results) and because one could not exclude the possibility that such 'under-divergence' may be due to the lack of genome sequence completeness of some Portuguese and non-Portuguese sequences and/or bioinformatics artefacts (for example, reversions) across the multiple pipelines used worldwide. The identified sublineages were also characterized in terms of timespan; that is, the time (in days) between the earliest and latest detection at both national and international levels. For 21 Portuguese patients, the genotyped sample did not correspond to the patient's earliest MPXV-positive sample (see Supplementary Table 1). Therefore, to avoid bias in the evaluation of the timespan between the first and latest detection of a given sublineage in Portugal, we used the date of collection of the first PCR-positive sample instead of the date of collection of the first genotyped sample for the timespan analysis (reflected in Fig. 3).

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
MPXV reads mapping to the reference sequence MPXV-UK_P2 (GenBank accession no. MT903344.1) were deposited in the European Nucleotide Archive (ENA) (BioProject accession no. PRJEB53055). Assembled consensus sequences were deposited in the NCBI under the accession numbers detailed in Supplementary Table 1.