Introduction

African swine fever virus (ASFV), the etiological agent of African swine fever (ASF), is currently causing a pandemic in swine that has been mostly confined to the Eastern Hemisphere. However, in 2021 the pandemic strain of ASFV was detected in the Dominican Republic for the first time, and it has since caused continued outbreaks on the island of Hispaniola1. Only one vaccine is currently approved for commercial use against ASF, and its use is limited to Vietnam; several other experimental vaccines are at different stages of development2. In the rest of the world, control of ASF is restricted to culling infected farms and animals.

African swine fever virus (ASFV) is a large dsDNA virus that encodes a large number of genes. The exact number of genes and the genome length (ranging from 170 to 190 kb) vary among virus isolates. The ASFV genome encodes more than 150 different proteins. Variation between isolates occurs particularly in the multigene family (MGF) genes and in the genes that have been used to genotype ASFV: P54, P72, and the central variable region (CVR) of pB602L3. Until recently, sequencing the full-length ASFV genome was costly and difficult, so very few historical ASFV isolates have been fully sequenced. The lack of fully sequenced historical ASFV isolates has been identified as a major gap in ASFV research4. These genomes are important for understanding historical ASFV outbreaks and how the ASFV genome, particularly its variable regions, has evolved over time.

The 2021 outbreak in the Dominican Republic was caused by an isolate that has been determined to be a derivative of the pandemic strain currently circulating in Asia and Europe and is a genotype II ASFV5. Here we report, for the first time, the full-length sequence of the ASFV strain that caused the 1978–1980 outbreak in the Dominican Republic, and show that it is closely related to the sequenced Sardinian viruses that were causing outbreaks at that time6,7.

Results

ASFV full genome alignment

Forty curated ASFV sequences were aligned to DR-1980 using the Genome Alignment tool in CLC Workbench with default parameters. A cladogram was created from representative sequences using the neighbor-joining method and the Jukes-Cantor model, with phylogeny inferred from 1,000 bootstrap replicates. The tree shows that the DR-1980 isolate is most closely related to the ASFV genomes sequenced from outbreaks that occurred in Sardinia at that time (Fig. 1).
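To make the tree-building step concrete, the sketch below shows the Jukes-Cantor (JC69) distance, d = -(3/4) ln(1 - 4p/3) with p the proportion of differing sites, together with the neighbor-joining algorithm and one bootstrap resampling step. This is not the CLC Workbench implementation. It is a minimal illustration that assumes a pre-computed multiple alignment held in a Python dict, and it drops gap or ambiguous columns pairwise. The names `jc69_distance`, `neighbor_joining`, and `bootstrap_alignment` are hypothetical.

```python
import math
import random
from itertools import combinations


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor (JC69) distance between two aligned nucleotide sequences.

    Columns with a gap or ambiguous base in either sequence are skipped
    (pairwise deletion; an assumption of this sketch).
    """
    valid = set("ACGT")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared            # proportion of differing sites
    if p >= 0.75:
        return math.inf                  # JC69 distance is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def neighbor_joining(names, dist):
    """Build a Newick tree with the neighbor-joining algorithm.

    `dist` maps frozenset({a, b}) -> distance for every pair of names.
    """
    d = dict(dist)
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: total distance from node i to all other current nodes
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Q-criterion: join the pair minimizing (n - 2) * d(i, j) - r[i] - r[j]
        i, j = min(combinations(nodes, 2),
                   key=lambda pair: (n - 2) * d[frozenset(pair)] - r[pair[0]] - r[pair[1]])
        dij = d[frozenset((i, j))]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))   # branch length to i
        lj = dij - li                                    # branch length to j
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    return f"({a}:{d[frozenset((a, b))]:.4f},{b}:0.0000);"


def bootstrap_alignment(aln, rng):
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}
```

If the tree construction is repeated on, for example, 1,000 column-resampled replicates, the bootstrap value for a branch is the fraction of those trees that recover the same grouping. Counting clades across replicates is left out here to keep the sketch short.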

Figure 1

Full-length genome tree. Forty curated ASFV sequences were aligned to DR-1980, and a cladogram was created using the neighbor-joining method and the Jukes-Cantor model, with phylogeny inferred from 1,000 bootstrap replicates. Bootstrap values are shown at each branch. Genomes are labeled as GenBank accession number_region of isolation_date of isolation_p72 genotype. DR-1980 is highlighted in blue font.

Genotyping of DR-1980

DR-1980 was genotyped using the three commonly used methods3 (Table 1). ASFV isolate 56/Ca/1978 (Sardinia: Cagliari-1978), which circulated during the same period, was the closest match to DR-1980 by whole-genome sequence identity (99% identical). Of note, very few full-length ASFV genomes from this period have been sequenced; therefore, conclusions about the origin of DR-1980 should not be overstated. DR-1980 could have originated from any region with outbreaks at the time, especially considering that the two viruses are not genetically identical.
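Whole-genome identity figures depend on how the denominator is chosen. The sketch below counts identical columns over all aligned columns and ignores columns in which both sequences are gapped. This is one common convention and an assumption here: other tools divide by the shorter ungapped length, and the source does not state which convention was used. The function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two equal-length aligned sequences.

    Identity = identical columns / aligned columns, where columns in which
    both sequences have a gap are ignored and a gap never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns")
    identical = sum(a == b and a != gap for a, b in columns)
    return 100.0 * identical / len(columns)
```

At the roughly 183 kb genome length reported in the Methods, each percentage point of non-identity corresponds to about 1,800 aligned positions. A 99% match therefore still leaves many differences between two genomes.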

Table 1 Genotyping of DR-1980.

Analysis of individual DR-1980 proteins

The 165 annotated genes of DR-1980 were translated and extracted in CLC Genomics Workbench. ASFV (Taxonomy ID: 10497) nucleotide sequences were downloaded from the International Nucleotide Sequence Database Collaboration (INSDC) via the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov) in GenBank and FASTA formats (accessed March 14, 2022). All protein coding sequences (CDS) were extracted and combined with ASFV protein coding sequences downloaded from the Viral Bioinformatics Research Centre (VBRC) website (https://4virology.net) (accessed April 13, 2022) to create a local ASFV database. The DR-1980 translations were then compared against this database using blastp with default parameters.
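With the BLAST+ command-line tools, a comparison like this could be run as `makeblastdb -in asfv_proteins.fasta -dbtype prot -out asfv_local` followed by `blastp -query dr1980_proteins.fasta -db asfv_local -outfmt 6 -out hits.tsv`. The file and database names are hypothetical, and the source does not say whether BLAST+ or a GUI front end was used. The sketch below parses the standard 12-column tabular output (`-outfmt 6`), keeps the highest-bitscore hit per DR-1980 protein, and flags proteins whose best hit is below 100% identity as candidates for closer inspection. The names `best_hits` and `flag_divergent` are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_COLS = ("pident", "evalue", "bitscore")
INT_COLS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def best_hits(tabular_text):
    """Return the highest-bitscore hit for each query protein."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                      # skip blank and comment lines
        hit = dict(zip(OUTFMT6, row))
        for key in FLOAT_COLS:
            hit[key] = float(hit[key])
        for key in INT_COLS:
            hit[key] = int(hit[key])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def flag_divergent(best, threshold=100.0):
    """Queries whose best-scoring hit is below `threshold` percent identity."""
    return sorted(q for q, h in best.items() if h["pident"] < threshold)
```

Ranking by bitscore rather than by percent identity avoids favoring short, perfectly matching fragments. A flagged protein still needs to be checked against its alignment coverage, for example to tell a truncation such as the one in KP93L apart from point substitutions.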

The following seven genes encoded sequences unique to DR-1980: KP93L (a truncated form of the protein, 85% identical to the closest sequenced isolate), DP93R (unique amino acid P45A), L60L (unique amino acid D3N), I215L (D203del), MGF 360-12L (unique amino acid combination of R101Q or M151L or N191S & S176N), G1211R (K505R), and CP2475L (unique amino acid P204A). The amino acid sequences of these seven genes were aligned to those of other sequenced isolates (supplemental Fig. 1), with the amino acids unique to DR-1980 highlighted.
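Changes written in the form P45A (residue in other isolates, position, residue in DR-1980) and D203del can be derived from a protein multiple alignment by checking, column by column, whether the DR-1980 residue occurs in any other isolate. The sketch below assumes the following conventions, none of which are confirmed by the source: substitutions are numbered on the ungapped DR-1980 sequence, deletions are numbered on the first comparison sequence, and the "reference" residue is the most common one among the other isolates. The name `unique_changes` is hypothetical.

```python
from collections import Counter


def unique_changes(query, others, gap="-"):
    """Residues in `query` that no other aligned sequence has at that column.

    All sequences must come from one multiple alignment (equal length).
    Substitutions are written <consensus residue><query position><query residue>
    (e.g. 'P45A'); deletions in the query as <residue><position>del, numbered
    on the first comparison sequence. Insertions in the query are not reported.
    """
    if any(len(s) != len(query) for s in others):
        raise ValueError("all sequences must come from the same alignment")
    changes = []
    qpos = rpos = 0   # ungapped positions in query and in others[0]
    for col, q in enumerate(query):
        if q != gap:
            qpos += 1
        if others[0][col] != gap:
            rpos += 1
        column = [s[col] for s in others]
        if q in column:
            continue            # residue (or gap) shared with another isolate
        residues = [x for x in column if x != gap]
        if not residues:
            continue            # insertion relative to all others: skipped
        consensus = Counter(residues).most_common(1)[0][0]
        if q == gap:
            changes.append(f"{consensus}{rpos}del")
        else:
            changes.append(f"{consensus}{qpos}{q}")
    return changes
```

For example, `unique_changes("MKPAL", ["MKPPL", "MKPPL", "MRPPL"])` reports only `P4A`. The K at position 2 is not reported because one other isolate already has R there and the rest share K.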

Discussion

The island of Hispaniola is currently facing a continued outbreak caused by a descendant of the Georgia 2007 strain, which is also causing ongoing outbreaks in Asia and Europe. Here we report, for the first time, a full-length sequence from the 1978–1980 outbreak in the Dominican Republic, addressing the historical gap in available full-length ASFV genomes4. At the time of that outbreak, no vaccine was available for ASFV, and control was limited to culling infected animals. In 1980, the decision was made to cull all swine on the island of Hispaniola to stop continued ASF outbreaks, and the island was then repopulated. Recently, the first commercial ASF vaccine was approved; it is based on the backbone of the current genotype II pandemic strain and contains a single deletion in the I177L gene8. This vaccine could be considered as an aid in controlling the current outbreak on the island of Hispaniola, an option that was not available in 1980. The full-length sequence of the 1980 strain is valuable information that could help inform different decisions for the current outbreak in the Dominican Republic.

Methods

Viral DNA isolation and next generation sequencing

ASFV isolate DR-1980 was passaged once in primary swine macrophage cultures derived from blood, as previously described9. Viral DNA was sequenced as previously described, using the Illumina NextSeq50010 and Oxford Nanopore MinION11 sequencing platforms. In brief, viral DNA was extracted from infected macrophage cultures using the MagMAX Pathogen RNA/DNA Kit (Applied Biosystems). For nanopore sequencing, subtractive hybridization of methylated genomic DNA was performed, and the DNA was barcoded using the Rapid PCR Barcoding Kit and loaded onto the GridION according to the manufacturer's instructions. For Illumina sequencing, the Nextera XT v2 kit (Illumina, San Diego, CA, USA) was used following the manufacturer's protocol. Sequence analysis was performed using CLC Genomics Workbench software (CLCBio, Waltham, MA, USA). Coverage plots of reads mapped back to the consensus sequence are shown in supplemental Fig. 2.
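A coverage plot such as the one in supplemental Fig. 2 shows, for each position, how many mapped reads overlap that base. Its mean is the "average depth of coverage" figure. The sketch below computes both from read intervals using a difference array, so each read costs constant time rather than time proportional to its length. It does not reproduce how CLC Genomics Workbench computes coverage internally. The function names and the 0-based, end-exclusive interval convention are assumptions.

```python
def depth_profile(genome_length, alignments):
    """Per-base read depth from mapped read intervals.

    `alignments` holds (start, end) pairs, 0-based and end-exclusive
    (BED-style; an assumption), each the reference span of one mapped read.
    """
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"interval ({start}, {end}) outside the genome")
        diff[start] += 1        # a read begins covering at `start`
        diff[end] -= 1          # and stops covering at `end`
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]      # prefix sum turns boundary marks into depth
        depth.append(running)
    return depth


def average_depth(depth):
    """Mean depth across all positions of the profile."""
    return sum(depth) / len(depth)
```

Plotting `depth_profile(...)` against genome position gives a coverage plot. Positions with zero depth point to regions the consensus does not support with reads.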

Genome assembly

All steps were performed using CLC Genomics Workbench (version 21). Illumina reads were trimmed for quality (limit = 0.05), ambiguous bases (max = 2), adapters, and minimum length (min = 50), and were trimmed by 20 nucleotides at the 5′ terminal end and 5 nucleotides at the 3′ terminal end. To remove reads derived from host sequence, MinION and Illumina reads were mapped to ASFV strain Nu1979 (Accession: MW723481.1) and the mapped reads were collected, resulting in 377 MinION reads, 17 million paired-end Illumina reads, and 7,885 orphaned Illumina reads. All reads were then entered into the “De Novo Assemble Long Reads and Polish with Short Reads” pipeline with default parameters, resulting in four contigs. De novo assembly was then performed using only the Illumina reads, with the four polished contigs serving as guidance-only reads; this produced five contigs. The two assemblies were combined, and a single 183,687 bp contig was constructed based on homologous overlaps between the contigs. Illumina reads were mapped back to the genome using default parameters, resulting in an average depth of coverage of 10,529 reads.
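The quality limit of 0.05 refers to a per-base error-probability threshold. The sketch below applies the trimming settings listed above to a single read. The quality step follows the modified-Mott approach: each base scores `limit - P_error`, where `P_error = 10 ** (-Q / 10)`, and the read is cut to its highest-scoring contiguous run. This is an illustration only. CLC's exact implementation may differ, adapter trimming is omitted, and two details are assumptions: the order of the steps, and the choice to discard reads with more than two ambiguous bases. The names `mott_trim` and `trim_read` are hypothetical.

```python
def mott_trim(quals, limit=0.05):
    """Return (start, end) of the read region kept by a modified-Mott style trim.

    Each base scores limit - P_error, where P_error = 10 ** (-Q / 10) for
    Phred score Q. The kept region is the highest-scoring contiguous run,
    found with a running sum that is reset to zero whenever it drops to
    zero or below. Returns (0, 0) if no base passes.
    """
    best_sum, best = 0.0, (0, 0)
    running, run_start = 0.0, 0
    for i, q in enumerate(quals):
        running += limit - 10 ** (-q / 10)
        if running <= 0:
            running, run_start = 0.0, i + 1     # start a new candidate run
        elif running > best_sum:
            best_sum, best = running, (run_start, i + 1)
    return best


def trim_read(seq, quals, limit=0.05, trim5=20, trim3=5, max_ambiguous=2, min_len=50):
    """Apply the listed trimming settings to one read (adapter trimming omitted).

    Returns the trimmed sequence, or None if the read is discarded.
    """
    start, end = mott_trim(quals, limit)
    seq = seq[start:end]
    # Fixed terminal trims: 20 nt from the 5' end, 5 nt from the 3' end
    seq = seq[trim5:len(seq) - trim3] if len(seq) > trim5 + trim3 else ""
    if seq.upper().count("N") > max_ambiguous or len(seq) < min_len:
        return None
    return seq
```

For a 100 nt read with uniformly high quality, the quality step keeps all bases, and the terminal trims leave 75 nt. A 60 nt read ends up at 35 nt, which is below the 50 nt minimum, so it is discarded.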

Annotation of the genome

All 165 translated protein sequences were extracted from the GenBank file of ASFV strain 56/Ca/1978 (Accession: MN270969.1) and compared against the DR-1980 consensus sequence using tblastn with default parameters. The strand and the start and end nucleotide positions of each gene were extracted from the output file and, when required, manually extended to include the correct start and stop codons. Annotations based on the ASFV reference genomes were then entered onto the DR-1980 genome in CLC Genomics Workbench, resulting in 165 annotations.
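The extension to start and stop codons was done manually. The sketch below shows one simple automated version. It takes a tblastn hit (1-based subject coordinates, with `sstart > send` indicating a minus-strand hit), walks upstream in frame to the nearest ATG, and then walks downstream to the first in-frame stop codon. Choosing the nearest ATG and ignoring alternative start codons are simplifying assumptions, not the curation rules used for DR-1980. The names `revcomp` and `extend_to_orf` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_to_orf(genome, sstart, send):
    """Extend a tblastn hit to an in-frame ATG start and stop codon.

    sstart/send are 1-based subject coordinates as tblastn reports them
    (sstart > send for a minus-strand hit). Returns (strand, start, end)
    in 1-based genome coordinates with start < end and the stop codon
    included, or None if no start/stop can be found.
    """
    strand = "+" if sstart <= send else "-"
    seq = genome.upper() if strand == "+" else revcomp(genome.upper())
    n = len(seq)
    begin = sstart - 1 if strand == "+" else n - sstart   # 0-based hit start on `seq`

    # Upstream: nearest in-frame ATG at or before the hit start,
    # giving up if an in-frame stop codon is met first.
    i = begin
    while i >= 0 and seq[i:i + 3] != "ATG":
        if seq[i:i + 3] in STOP_CODONS:
            return None
        i -= 3
    if i < 0:
        return None

    # Downstream: first in-frame stop codon after the start.
    j = i
    while j + 3 <= n and seq[j:j + 3] not in STOP_CODONS:
        j += 3
    if j + 3 > n:
        return None

    start0, end0 = i, j + 3          # 0-based, end-exclusive on `seq`
    if strand == "+":
        return strand, start0 + 1, end0
    return strand, n - end0 + 1, n - start0
```

Working on the reverse complement for minus-strand hits lets the same codon walk handle both strands; only the final coordinate conversion differs. An in-frame stop inside the hit ends the ORF early, which is how a truncated protein such as KP93L would show up.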