Multiple early introductions of SARS-CoV-2 into a global travel hub in the Middle East

International travel played a significant role in the early global spread of SARS-CoV-2. Understanding transmission patterns from different regions of the world will further inform global dynamics of the pandemic. Using data from Dubai in the United Arab Emirates (UAE), a major international travel hub in the Middle East, we establish SARS-CoV-2 full genome sequences from the index and early COVID-19 patients in the UAE. The genome sequences are analysed in the context of virus introductions, chain of transmissions, and possible links to earlier strains from other regions of the world. Phylogenetic analysis showed multiple spatiotemporal introductions of SARS-CoV-2 into the UAE from Asia, Europe, and the Middle East during the early phase of the pandemic. We also provide evidence for early community-based transmission and catalogue new mutations in SARS-CoV-2 strains in the UAE. Our findings contribute to the understanding of the global transmission network of SARS-CoV-2.


Scientific Reports | (2020) 10:17720 | https://doi.org/10.1038/s41598-020-74666-w | www.nature.com/scientificreports/

destinations from 18 March 2020, and Dubai airport was closed to passenger flights on 25 March 2020; hence, infections detected after 18 March 2020 were more likely to result from community transmission than from importation. The index patient in the UAE was a 63-year-old female Chinese tourist travelling from Wuhan with other family members to visit her son in Dubai. The family arrived in Dubai on 16 January 2020 and tested positive on 29 January 2020 (Table 1). Over the next seven weeks, there were multiple new cases among tourists and residents with travel history (44.9% had travel history from Europe) (Table 1). Nearly two-thirds (63.3%) of patients were male and 61.2% were aged between 20 and 44 years, reflecting the young age structure of the UAE population 5. The majority of patients (88%) were asymptomatic or had mild symptoms, and only four required intensive care with invasive ventilation (one death; Table 1).

SARS-CoV-2 whole genome sequencing was performed on all 49 COVID-19 patient samples. Only genomes with almost complete coverage (n = 25, "Methods" section) were used for phylogenetic analysis. The 25 genomes were obtained from cases with disease onset in late January (n = 1), early February (n = 1), late February (n = 6), early March (n = 8), and late March (n = 9). Of those, approximately two-thirds were male and aged between 10 and 40 years (Table 1).

Phylogenetic analysis.
To understand early viral transmission in Dubai in the global context, we performed phylogenetic analysis on the 25 novel viral genomes sequenced in this study from early patients in the UAE (Table 1; "Methods" section), along with 157 largely complete SARS-CoV-2 genomes deposited in GISAID from different countries between December 2019 and early March 2020 6,7 (Supplementary Table S1).
Consistent with multiple independent introductions, the UAE SARS-CoV-2 isolates were distributed across the phylogenetic tree (Fig. 1). The majority (76%) clustered with clades A2a (48%) and A3 (28%), which are largely composed of isolates from COVID-19 patients in Europe and Iran, respectively. This suggests that the major introductions into the UAE during the early phase of the pandemic originated from Europe and the Middle East/Iran. Supporting a European origin, individuals with A2a clade isolates were mostly European and/or had a recent travel history to a European country, mainly Italy (n = 4), Germany (n = 3), United Kingdom (n = 2), Spain (n = 1), and Norway (n = 1) (Table 1 and Fig. 2). Onset of symptoms in this group was reported within or after the second week of March (Table 1), suggesting that the viral infections in this group could have occurred during late February to early March. Of note, a SARS-CoV-2 isolate submitted from Mexico (GISAID ID: EPI_ISL_412972) was 100% identical to that from an Italian expatriate working in the UAE (L0881), while another submitted in Germany (GISAID ID: EPI_ISL_412912) differed by a single mutation (Fig. 1). All three individuals had a recent travel history to Italy and overlapping infection time frames (late February to early March). Within this group, isolates from patients L1758, L0484, and L2185 were identical (Fig. 2), suggesting a possible common direct source of transmission.
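Statements such as "100% identical" or "differed by a single mutation" come down to counting mismatches between aligned genomes. The following is a minimal sketch of such a count, assuming the sequences have already been aligned to the same length; the function name `snp_distance` and the toy fragments are hypothetical illustrations and are not the pipeline or data used in this study.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ.

    Positions where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


# Toy aligned fragments (hypothetical, not real isolate sequences).
toy = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAC",  # identical to isolate_1
    "isolate_3": "ACGTATGTAC",  # one substitution relative to isolate_1
}

identical = snp_distance(toy["isolate_1"], toy["isolate_2"])
one_apart = snp_distance(toy["isolate_1"], toy["isolate_3"])
```

Ignoring ambiguous positions is a design choice: it avoids counting low-coverage sites as differences, at the cost of possibly understating the true distance.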
Isolates in the A3 clade were obtained from five individuals with travel history to Iran (L2409, L6627, L0904, L0184, and L4682), one Indian resident (L0231), and one Indian tourist (L0068) (Fig. 2). Onset of symptoms for the five individuals with travel history in this group was reported to be around 21-24 February (Table 1). Patient L0231 had no travel history and reported symptom onset on 7 March, suggesting a possible community-based transmission event. Interestingly, all isolates except the one obtained from patient L4682, the only patient in this group with severe clinical presentation, shared a common ancestral strain identical to that obtained from patient L2409. The SARS-CoV-2 isolate from L4682 had two unique missense variants in the ORF1ab gene (Supplementary Table S3), which might be worth investigating for any possible biological effect(s). Consistent with an Iranian origin, a SARS-CoV-2 sequence submitted by the University of Sydney (GISAID ID: EPI_ISL_412975) on 28 February 2020 differed by only two mutations from that of L2409, and both L2409, an Iranian male tourist, and the Australian male patient had a recent travel history to Iran. We speculate that individuals with travel history to Iran around this time frame (L8386, L6867, and L3280), for whom a full viral genome sequence could not be obtained, were also very likely to cluster within the A3 clade.
Only one viral strain, obtained from L5630, a family member of the Chinese index patient, belonged to the B2 clade. Although we did not obtain full viral genome sequences from the other members of that family, we expect that all had a similar strain to L5630. Interestingly, our data do not suggest any transmission of this clade, at least among the earliest patients included in this study (Fig. 2), which is consistent with the reported early detection and isolation of this family. This finding also supports the notion of secondary source(s) for the ongoing local transmission.
The remaining five isolates did not belong to A2a, A3, B2, or any of the clades on nextstrain.org as of 12 May 2020, suggesting earlier introduction(s). Those isolates were obtained from four Asian patients (two residents, L4280 and L6599, and two tourists, L4184 and L9766) and one Czech resident (L1014) working as airline cabin crew with travel history to Austria (Table 1). Consistent with the Asian predominance among this patient group and the few (1 or 2) mutations in most of their isolates (4 out of 5) relative to the Wuhan reference genome (Fig. 2), several early viral strains submitted in Asia clustered very closely with this group (Fig. 1). L4280 was the first sequenced patient without travel history and became infected after transporting a work colleague, L0826, to hospital. Patient L0826 reported symptom onset on 22 January, suggesting that community-based transmission started in the UAE in early-to-mid January. L6599 was an Indian expatriate living with three other Filipino and Sri Lankan expatriates (L3715, L2771, L8480) (Table 1). None of the four individuals had a documented recent travel history, suggesting local transmission, and although a full viral genome sequence could only be obtained from one patient (L6599), it is very likely that all have related isolates.
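The clade shares reported above can be cross-checked with simple arithmetic. The sketch below assumes per-clade counts derived from the text: 12 A2a isolates (48% of 25), 7 A3 isolates (the seven patients listed), 1 B2 isolate, and 5 isolates outside any named clade. The label `unassigned` is a hypothetical name used here only for illustration.

```python
from collections import Counter

# Counts derived from the percentages and patient lists in the text
# (an assumption of this sketch, not a table from the study).
clade_counts = Counter({"A2a": 12, "A3": 7, "B2": 1, "unassigned": 5})

total = sum(clade_counts.values())

# Percentage share of each clade among the sequenced genomes.
shares = {clade: round(100 * n / total) for clade, n in clade_counts.items()}

# Combined share of the two dominant clades (A2a and A3).
major_share = shares["A2a"] + shares["A3"]

for clade, pct in shares.items():
    print(f"{clade}: {clade_counts[clade]}/{total} = {pct}%")
```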
In aggregate, we identified 70 variants relative to the reference GenBank SARS-CoV-2 sequence NC_045512.2. The majority of these variants were missense (n = 41), with the most frequent nucleotide change being C > T (n = 33), and more than half (38/70) were localized in the ORF1ab gene (Supplementary Table S3). Two out of the 70 variants were novel, as they were not identified in the Chinese National Center for Bioinformation Database (https://bigd.big.ac.cn/ncov/variation/annotation; last accessed August 13, 2020). The novel variants were a coding missense variant and a synonymous variant in the N and ORF1ab genes, respectively. In addition, 9 variants were very rare (i.e. seen fewer than 4 times out of 81,625 genomes), including one missense variant (F850I) in the S gene (Supplementary Table S3).
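A tally such as "C > T (n = 33)" can be produced by comparing each genome with the reference and counting the substitution types. The following is a minimal sketch, assuming the query has already been aligned to the reference with no insertions or deletions; the function names and the toy sequences are hypothetical and do not reproduce the variant-calling procedure used in this study.

```python
from collections import Counter


def call_substitutions(ref: str, query: str):
    """Return (1-based position, ref base, alt base) for each substitution.

    Assumes `query` is aligned to `ref` position by position; sites where
    either base is not A, C, G or T are skipped.
    """
    if len(ref) != len(query):
        raise ValueError("query must be aligned to the reference")
    valid = set("ACGT")
    return [
        (pos, r, q)
        for pos, (r, q) in enumerate(zip(ref.upper(), query.upper()), start=1)
        if r in valid and q in valid and r != q
    ]


def substitution_spectrum(substitutions):
    """Count substitutions by type, e.g. 'C>T'."""
    return Counter(f"{r}>{q}" for _, r, q in substitutions)


# Toy example (hypothetical sequences, not real genome data).
toy_ref = "ACCGTCA"
toy_query = "ATCGTTG"

subs = call_substitutions(toy_ref, toy_query)
spectrum = substitution_spectrum(subs)
```

Classifying a substitution as missense or synonymous would additionally require gene coordinates and codon translation, which this sketch deliberately leaves out.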

Discussion
Our findings suggest multiple independent spatiotemporal introductions of SARS-CoV-2 into the UAE, where the majority of introductions (76%) were from Iran and Europe during two different time frames (mid-to-late February and early March, respectively). Although we show evidence for possible local transmission within the Middle Eastern/Iranian isolates, it will be important to sequence further isolates at subsequent dates to determine whether these introductions succeeded in seeding more clustering and whether such clustering was affected by proactive and vigilant public health measures, such as transitioning to online learning for schools and universities, implementing work-from-home protocols across all sectors, and nationwide disinfection campaigns. Six isolates (24%) did not cluster with the European or Iranian groups and represented earlier introductions which did not appear to seed larger clusters in our sampled cohort. However, additional sequencing is needed to determine the extent of community transmission, especially given that our data strongly suggest that the earliest patient (early to mid-January) in the UAE could have been a secondary infection from one of those introductions.
The new SARS-CoV-2 mutations identified in the UAE warrant further investigation to explore whether they influence viral characteristics, especially pathogenicity, or provide important information for vaccine development. One of the major strengths of the study was the non-biased, representative sample of early cases in Dubai, including the index family cluster, from the only central testing lab, along with detailed demographic and clinical information. Limitations included the inability to obtain full whole genome sequences from more samples, most likely due to low viral load, although we were able to deduce the origin of transmission in most of those individuals based on travel history. Regardless, this study contributes important molecular epidemiological data that can be used to further understand the global transmission network of SARS-CoV-2 8.

Methods
Human subjects and ethics approval. Sociodemographic and clinical data were extracted from the electronic medical records of the earliest 49 patients with laboratory-confirmed SARS-CoV-2 from 29 January to 18 March 2020 using the WHO case report form. Cases were categorized into three groups based on disease severity: asymptomatic and mild cases with either no symptoms or mild non-life-threatening symptoms (e.g. dry cough, mild fever); moderate cases with symptoms (e.g. breathlessness, persistent fever) requiring hospitalization and medical attention (e.g. supplementary oxygen therapy, intravenous fluids); and severe/critical cases with advanced disease and pneumonia requiring admission to intensive care units and specialized life-support treatment (e.g. mechanical ventilation). This study was approved by the Dubai Scientific Research Ethics Committee-Dubai Health Authority (approval number #DSREC-04/2020_02). The requirement for informed consent was waived as this study was part of a public health surveillance and outbreak investigation in the UAE. Nonetheless, all patients treated at a healthcare facility in the UAE provide written consent for their deidentified data to be used for research, and this study was performed in accordance with the relevant laws and regulations that govern research in the UAE.

Germany) was amplified using 26 overlapping primer sets covering most of the SARS-CoV-2 genome as recently described by our group 9. PCR products were then sheared by ultra-sonication (Covaris LE220-plus series, MA, USA) and prepared for sequencing using the SureSelectXT Library Preparation kit (Agilent, CA, USA). This library was sequenced using the Illumina MiSeq Micro Reagent Kit, V2 (2 × 150 cycles).

Fig. S1). Assembled genomes with at least 20X average coverage across most nucleotide positions (56-29,797) were used for subsequent phylogenetic analysis (Supplementary Table S1).
A total of 25 viral genomes (24 by shotgun and 1 by target enrichment) met this inclusion criterion and were submitted to the Global Initiative on Sharing All Influenza Data (GISAID) database under accession IDs: EPI_ISL_435119-435142 (Supplementary Table S2).
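The inclusion criterion can be illustrated with a short sketch. It assumes that "20X average coverage" means a mean per-base read depth of at least 20 over positions 56-29,797, and that depth is available as one value per reference position; the function names and the toy depth profiles are hypothetical, and the actual tooling used to compute coverage is not shown here.

```python
from statistics import mean

# Region and threshold taken from the inclusion criterion in the text.
REGION_START = 56      # 1-based, inclusive
REGION_END = 29797     # 1-based, inclusive
MIN_MEAN_DEPTH = 20


def mean_depth(depths, start=REGION_START, end=REGION_END):
    """Mean read depth over [start, end]; depths[i] is the depth at position i + 1."""
    window = depths[start - 1:end]
    if not window:
        raise ValueError("no depth values fall inside the requested region")
    return mean(window)


def passes_inclusion(depths, threshold=MIN_MEAN_DEPTH):
    """True if the genome meets the assumed mean-depth threshold."""
    return mean_depth(depths) >= threshold


# Toy depth profiles over a 29,903-base reference (hypothetical values).
GENOME_LENGTH = 29903
well_covered = [50] * GENOME_LENGTH
poorly_covered = [5] * GENOME_LENGTH

kept = passes_inclusion(well_covered)
dropped = passes_inclusion(poorly_covered)
```

A stricter reading of "across most nucleotide positions" would require a minimum fraction of positions to reach the threshold individually; a mean-based check is the simpler of the two interpretations.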

Data availability
All data generated or analysed during this study are included in this published article (and its Supplementary Information files) and the sequences are available on the GISAID database under the corresponding accession numbers.