Lung microbiome of stable and exacerbated COPD patients in Tshwane, South Africa

Chronic obstructive pulmonary disease (COPD) is characterised by the occurrence of exacerbations triggered by infections. The aim of this study was to determine the composition of the lung microbiome and lung virome in patients with COPD in an African setting and to compare their composition between the stable and exacerbated states. Twenty-four adult COPD patients were recruited from three hospitals. Sputum was collected and bacterial DNA was extracted. Targeted metagenomics was performed to determine the microbiome composition. Viral DNA and RNA were extracted from selected samples, and the RNA was converted to cDNA. Shotgun metagenomic sequencing was performed on the pooled DNA and RNA-derived cDNA. The most abundant phyla across all samples were Firmicutes and Proteobacteria, and the most prevalent genera were Haemophilus and Streptococcus. There were no considerable differences in alpha or beta diversity measures between the disease states. However, differences in abundance between disease states were observed for: (i) Serratia (3 percentage points lower in the exacerbated state), (ii) Granulicatella (2.2 percentage points higher in the exacerbated state), (iii) Haemophilus (5.7 percentage points higher in the exacerbated state) and (iv) Veillonella (2.5 percentage points higher in the exacerbated state). Virome analysis showed a high abundance of the BeAn 58058 virus, a member of the Poxviridae family, in all six samples sequenced (90% to 94%). This study is among the first to report lung microbiome composition in COPD patients from Africa. In this small sample set, no differences in alpha or beta diversity were observed between the stable and exacerbated disease states, but an unexpectedly high frequency of the BeAn 58058 virus was observed. These observations highlight the need for further research on the lung microbiome of COPD patients in African settings.


Targeted and shotgun metagenomics approach. Targeted metagenomics was performed at Inqaba Biotechnical Industries (Pretoria, South Africa), a commercial NGS service provider. Briefly, the extracted bacterial DNA was amplified by targeting the V1-V3 region of the 16S rRNA gene (using the 27F and 518R primers). Paired-end libraries (2 × 300 bp) were prepared using the NEBNext® Ultra™ II DNA library prep kit for Illumina® (New England Biolabs, USA), and sequencing was performed on an Illumina MiSeq instrument (Illumina, USA). After targeted sequencing, a subset of six samples was selected for virome sequencing according to the following criteria: (i) the samples had to represent both disease states and (ii) the samples had to be representative of the diversity in the sample set (one with low, one with intermediate and one with high diversity). For shotgun metagenomics of the amplified and pooled virome samples, paired-end libraries (2 × 300 bp) were prepared with the Nextera DNA Flex library preparation kit (Illumina, San Diego, CA, USA) and sequenced on an Illumina MiSeq instrument by the National Institute for Communicable Diseases Sequencing Core Facility, South Africa. The 16S rRNA gene fragment sequences were analysed using QIIME2 version 2019.1 (1548866877) and the Greengenes database version 13.8 [47][48][49]. Human DNA was removed from the virome sequences using Bowtie2 (Galaxy version 2.3.4.3) with the hg38 genome as the reference genome 50,51. The virome sequences were analysed using Kraken 2 (Galaxy version 2.1.1) on the Galaxy platform with the 2019 virome database 52,53. The viral sequencing results were compared to the Virus-Host Database (https://www.genome.jp/virushostdb/view/) to determine the hosts of the viruses identified 54.
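The diversity-based selection criteria can be read as a simple rule: within each disease state, take the lowest-, middle- and highest-diversity samples. The Python sketch below is one possible reading under stated assumptions, namely three picks per state and a single alpha-diversity value per sample; the function name `select_virome_subset`, the sample IDs and the diversity values are hypothetical and not taken from the study.

```python
# Hypothetical records: (sample_id, disease_state, alpha_diversity).
# IDs and diversity values are illustrative placeholders, not study data.


def select_virome_subset(samples):
    """Pick a low-, intermediate- and high-diversity sample for each disease state.

    Assumption: three picks per state (six in total); 'intermediate' is the
    middle element of the state's diversity-sorted list.
    """
    selected = {}
    for state in sorted({state for _, state, _ in samples}):
        # Sort this state's samples from lowest to highest diversity.
        group = sorted((s for s in samples if s[1] == state), key=lambda s: s[2])
        if len(group) < 3:
            raise ValueError(f"need at least three samples for state {state!r}")
        selected[state] = {
            "low": group[0][0],
            "intermediate": group[len(group) // 2][0],
            "high": group[-1][0],
        }
    return selected


if __name__ == "__main__":
    demo = [
        ("S01", "stable", 1.2), ("S02", "stable", 2.8), ("S03", "stable", 2.1),
        ("S04", "stable", 3.5), ("E01", "exacerbated", 0.9),
        ("E02", "exacerbated", 2.4), ("E03", "exacerbated", 3.9),
    ]
    for state, picks in select_virome_subset(demo).items():
        print(state, picks)
```

Taking the middle element of the sorted list, rather than the sample nearest the median, avoids ties when a state has an even number of samples.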
Statistical analysis and data visualisation. The data were analysed in R using the following packages: (i) qiime2R version 0.99.21 (to import QIIME2 data), (ii) phyloseq version 1.30.0 (alpha diversity, beta diversity, statistical tests, principal coordinates analysis (PCoA), hierarchical clustering and relative abundance of the taxa), (iii) ggplot2 version 3.3.2 (for plotting all graphs), (iv) DESeq2 version 1.26.0 (to test for log2 fold-change differences in abundance) and (v) ALDEx2 version 1.20.0 (differential abundance analysis) [55][56][57][58][59]. A p-value less than 0.05 was considered significant (for all statistical tests unless otherwise specified). The Wilcoxon rank-sum test was used to compare the alpha diversity measures.
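For readers unfamiliar with the test used for alpha diversity, the following is a minimal, standard-library Python sketch of a two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal approximation with a tie correction. It is illustrative only and the function names are hypothetical: R's `wilcox.test` computes an exact p-value for small samples without ties and applies a continuity correction otherwise, so its p-values can differ slightly from this sketch.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, two-sided p-value). No continuity correction.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

In the study the two groups would be the per-sample Chao1 (or Simpson) values for the stable and exacerbated samples.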
Ethics approval and consent to participate. Ethics approval was obtained from the Research Ethics Committee, Faculty of Health Sciences, University of Pretoria (REC no: 237/2017). Written informed consent was received from all participants. All methods were performed in accordance with the guidelines and regulations stipulated by the REC.

Consent for publication.
All authors consent to the publication.

Results
Patient demographics. A total of 24 participants were enrolled in the study: 18 males and six females aged 50 to 82 years (median age 60 years). Only one of the participants was HIV-infected. Participants were distributed across the three hospitals as follows: (i) Hospital A (Tertiary Academic Hospital): 16 …
Comparison of exacerbation and stable states of disease for the microbiome. The relative abundance of the Actinobacteria, Bacteroidetes, Firmicutes, Fusobacteria and Proteobacteria phyla differed across the disease states, with a higher abundance of Firmicutes (63% in the exacerbated state and 61% in the stable state) and a lower abundance of Actinobacteria (5% in the exacerbated state and 5% in the stable state), Bacteroidetes (11% in the exacerbated state and 9% in the stable state) and Proteobacteria (19% in the exacerbated state and 17% in the stable state) during the exacerbated state (Figure S1). At the genus level (Figure S2), the exacerbated state showed changes in 75 genera: 49 genera had a lower relative abundance and 26 genera had a higher relative abundance. Key genera that showed a lower relative abundance during the exacerbated state included Porphyromonas (0.19% in the exacerbated state and 3.92% in the stable state), Serratia (0.00% in the exacerbated state and 2.99% in the stable state), Staphylococcus (0.00% in the exacerbated state and 1.02% in the stable state) and Streptococcus (47.88% in the exacerbated state and 49.61% in the stable state). Genera that showed a higher relative abundance in the exacerbated state included Granulicatella (5.30% in the exacerbated state and 3.06% in the stable state), Haemophilus (16.82% in the exacerbated state and 11.08% in the stable state), Prevotella (10.02% in the exacerbated state and 7.87% in the stable state) and Veillonella (6.92% in the exacerbated state and 4.44% in the stable state).
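The genus-level differences summarised in the abstract (for example, 5.7% higher for Haemophilus) correspond to differences in mean relative abundance expressed in percentage points. The Python sketch below reproduces that arithmetic from the percentages reported above and, for contrast, computes a naive log2 ratio. The helper names and the pseudocount are illustrative assumptions, and the naive ratio is not the DESeq2 log2 fold-change, which DESeq2 estimates from normalised counts with a negative binomial model.

```python
import math

# Mean relative abundances (%) reported above: (exacerbated, stable).
REPORTED = {
    "Granulicatella": (5.30, 3.06),
    "Haemophilus": (16.82, 11.08),
    "Veillonella": (6.92, 4.44),
    "Serratia": (0.00, 2.99),
}


def relative_abundance(counts):
    """Convert read counts per taxon into percentages of the sample total."""
    total = sum(counts.values())
    return {taxon: 100 * n / total for taxon, n in counts.items()}


def compare_states(exacerbated_pct, stable_pct, pseudocount=0.01):
    """Return (percentage-point difference, naive log2 ratio), exacerbated vs stable.

    The pseudocount (an assumption) avoids log(0) for taxa absent in one state.
    """
    diff = exacerbated_pct - stable_pct
    ratio = math.log2((exacerbated_pct + pseudocount) / (stable_pct + pseudocount))
    return diff, ratio


if __name__ == "__main__":
    for genus, (exa, sta) in REPORTED.items():
        diff, lfc = compare_states(exa, sta)
        print(f"{genus}: {diff:+.2f} percentage points, naive log2 ratio {lfc:+.2f}")
```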
Although the relative abundance differed across the disease states, neither the DESeq2 nor the ALDEx2 analysis showed significant differences when a false discovery rate (FDR) of 0.05 was used. When an FDR of 0.2 was used, significant differences between the disease states were observed with DESeq2 (Fig. 2) but not with ALDEx2.
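By default, DESeq2 reports p-values adjusted with the Benjamini-Hochberg procedure, so whether a genus counts as significant depends on the FDR cut-off chosen. The minimal Python sketch below implements the Benjamini-Hochberg adjustment and applies the two cut-offs used here (0.05 and 0.2) to hypothetical p-values. DESeq2 also applies independent filtering before adjustment, so its adjusted values will not match this plain version exactly; the function names are illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, fdr):
    """Indices of tests whose adjusted p-value falls below the FDR cut-off."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < fdr]


if __name__ == "__main__":
    hypothetical = [0.004, 0.012, 0.06, 0.3, 0.8]  # illustrative, not study data
    print("FDR 0.05:", significant(hypothetical, 0.05))
    print("FDR 0.2:", significant(hypothetical, 0.2))
```

With these hypothetical values, the third test (adjusted p = 0.1) passes only the more lenient 0.2 cut-off, which mirrors the situation described above.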
There was no significant difference in alpha diversity of the microbiome between disease states (Fig. 3) using the Wilcoxon rank-sum test, for either the Chao1 (p-value = 0.58) or the Simpson (p-value = 0.72) diversity measure. Beta-diversity measures showed no clustering for any of the variables using PCoA and weighted UniFrac (for the microbiome) (Fig. 4).
Figure 2. DESeq2 analysis showing the log2 fold-change in abundance of the different genera between the exacerbated and stable states of disease (n = 24) in the sputum microbiome of COPD participants. Differences were considered significant at a p-value cut-off of 0.2 (adjusted for false discovery rate using the Benjamini-Hochberg correction). Log2 fold-changes greater than zero indicate an increase in the relevant genus, whereas log2 fold-changes less than zero indicate a decrease. All genera shown below the zero line had a decreased relative abundance in the stable state of disease, i.e. these genera were increased during the exacerbated state of disease. Error bars correspond to the calculated lfcSE (standard error of the log2 fold-change).
… 5). The most prevalent species was BeAn 58058, a member of the Poxviridae family that was detected in all specimens sent for virome sequencing, followed by bacteriophages (associated with both Gram-positive and Gram-negative bacteria). Most of the viruses identified were dsDNA viruses (97.23% to 98.15%).
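The beta-diversity analysis used weighted UniFrac distances, which require a phylogenetic tree. To show what that distance measures, the Python sketch below computes weighted UniFrac between two samples on a toy tree: each branch contributes its length multiplied by the absolute difference in the proportion of each sample's reads that descend from it, with an optional normalisation by abundance-weighted root-to-leaf distances. The tree, taxon labels and counts are hypothetical, and this is a from-scratch illustration of the standard definition, not the implementation used by QIIME2 or phyloseq.

```python
from collections import defaultdict

# Hypothetical rooted tree: node -> (parent, length of branch to parent).
TREE = {
    "Strep": ("Firm", 1.0),
    "Veil": ("Firm", 1.0),
    "Haem": ("Prot", 1.0),
    "Firm": ("root", 0.5),
    "Prot": ("root", 0.5),
}


def branch_proportions(counts, tree):
    """Proportion of a sample's reads descending from each branch (node)."""
    total = sum(counts.values())
    props = defaultdict(float)
    for leaf, n in counts.items():
        node = leaf
        while node in tree:  # climb until the root, which has no parent entry
            props[node] += n / total
            node = tree[node][0]
    return props


def weighted_unifrac(a, b, tree, normalized=False):
    """Weighted UniFrac between count dictionaries a and b on the given tree."""
    pa, pb = branch_proportions(a, tree), branch_proportions(b, tree)
    u = sum(length * abs(pa[node] - pb[node]) for node, (_, length) in tree.items())
    if not normalized:
        return u

    def depth(leaf):
        # Sum of branch lengths from a leaf up to the root.
        d, node = 0.0, leaf
        while node in tree:
            d += tree[node][1]
            node = tree[node][0]
        return d

    ta, tb = sum(a.values()), sum(b.values())
    denom = sum(
        depth(leaf) * (a.get(leaf, 0) / ta + b.get(leaf, 0) / tb)
        for leaf in set(a) | set(b)
    )
    return u / denom
```

Two samples with identical composition have a distance of 0, and samples sharing no lineages have a normalised distance of 1; PCoA is then applied to the matrix of pairwise distances.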

Discussion
In this study, the composition of the sputum microbiome of COPD participants was investigated and compared between the stable and exacerbated states of disease. Two phyla predominated, Firmicutes and Proteobacteria, with Streptococcus and Haemophilus being the most prevalent genera. However, no significant differences in alpha or beta diversity of the sputum microbiome were observed between the exacerbated and stable states of disease. When an FDR of 0.2 was used for the DESeq2 analysis, significant differences in relative abundance were observed between the two disease states. In the virome, a high prevalence of the BeAn 58058 virus was observed. Recruiting HIV-positive individuals with COPD proved difficult, and as a result only a single HIV-positive participant was included in this study.
In both disease states, four phyla dominated: Firmicutes (ranging from 41 to 91%), Proteobacteria (ranging from 3 to 62%), Bacteroidetes (ranging from 3 to 22%) and Actinobacteria (ranging from 1 to 22%). This is in agreement with other studies of the lung microbiome in healthy individuals and in other lung diseases 60,61. Even though some of these studies had different patient groups (e.g. asthmatics and smokers), used different specimen types [e.g. bronchoalveolar lavage (BAL)] and used different sequencing technologies (e.g. 454 sequencing), these four phyla were always found to be dominant in the lung microbiome [60][61][62][63]. However, the most prevalent phylum has been found to differ between diseases, e.g. in severe COPD Proteobacteria is more prevalent, whereas in the healthy lung Firmicutes is more prevalent [60][61][62][63][64][65]. This study showed a higher prevalence of Firmicutes; previous studies have shown that the microbiome in mild COPD is similar to that of the healthy lung 64,65. When the stable and exacerbated states of disease were compared in this study, a higher abundance of the Firmicutes phylum (2% higher in the exacerbated state) and lower abundances of the Proteobacteria (2% higher in the exacerbated state), Actinobacteria (3% higher in the exacerbated state) and Bacteroidetes phyla (2% higher in the exacerbated state) were observed. Previous studies (all conducted using sputum specimens) that compared the two disease states in COPD observed the same trend, in which one phylum showed an increased prevalence and the other phyla showed a decreased prevalence in the exacerbated state; however, none of these studies reported the percentage increase [66][67][68][69][70][71].
www.nature.com/scientificreports/
The genera that showed the highest frequency in this study (in both disease states) were Granulicatella (Firmicutes), Haemophilus (Proteobacteria), Prevotella (Bacteroidetes), Streptococcus (Firmicutes) and Veillonella (Firmicutes).
Previous studies of the lung microbiome of healthy individuals and COPD patients have observed these genera in high abundances, along with Pseudomonas and Porphyromonas 72,73. Most of these studies were conducted in the USA or Europe using 454 sequencing and a variety of different specimens. A study by Wang et al. (2016) showed the most similarity to this study, with one key difference: the changes in the abundance of genera during the exacerbated state differed from those in this study. In this study, Haemophilus had a higher abundance … 13,74. Additionally, seasonal variation may play a role in the bacteria identified 75. Most of the exacerbation samples in this study were collected in either autumn or winter. In Pretoria, the dry season is in winter, in contrast to the United Kingdom, where the dry season generally falls in summer. Furthermore, the bacteria that showed a higher prevalence (2 to 6 percentage points higher) during the exacerbated state of disease, i.e. Granulicatella, Haemophilus, Prevotella and Veillonella, have been associated with gastro-oesophageal reflux disease (GERD) 76. Because cough is common among COPD patients, GERD is associated with COPD and is considered a comorbidity 77. In fact, GERD has been observed to be a predictor of exacerbations in COPD, which suggests that a higher prevalence of these bacteria could serve as a potential indicator of COPD exacerbations 77,78. In this study, bacterial alpha-diversity and beta-diversity analyses showed no difference between disease states. This observation is in agreement with previous COPD studies, except for a study by Jubinville et al. (2018), who observed a difference in alpha diversity when comparing paired samples, i.e. the diversity in the paired samples differed across disease states, with most exacerbated samples showing a higher diversity [66][67][68]70.
All these studies were conducted in Europe (the UK and Spain) or North America (Canada and the USA) using sputum specimens, with most studies having fewer than 30 participants and using 454 sequencing. The only difference between these studies and the study by Jubinville et al. (2018) was the diversity measure used: most of the other studies used the Shannon index (often combined with the Chao1 and Faith PD diversity measures), whereas Jubinville et al. (2018) used the Simpson index. Unlike the Shannon index, the Simpson index is more affected by the relative abundances (i.e. evenness) of the species in a sample; this suggests that during the exacerbated state of disease, the abundances of species/OTUs change but not the number of species/OTUs (richness) 79.
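The contrast between the Shannon and Simpson indices can be made concrete with a small calculation. The Python sketch below computes the Shannon index, the Gini-Simpson index (1 - Σp²) and a bias-corrected Chao1 estimate for toy communities: one pair with the same richness but different evenness, and one pair that differs only by a few rare taxa. The formulas are standard textbook forms chosen as assumptions; the exact variants used by phyloseq or by the cited studies may differ, and the community counts are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2); dominated by the most abundant taxa."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    communities = {
        "even": [25, 25, 25, 25],
        "uneven": [85, 5, 5, 5],  # same richness, one dominant taxon
        "two_taxa": [50, 50],
        "two_taxa_plus_rare": [50, 50, 1, 1, 1],  # three rare taxa added
    }
    for name, counts in communities.items():
        print(f"{name}: Shannon={shannon(counts):.3f} "
              f"Simpson={simpson(counts):.3f} Chao1={chao1(counts):.1f}")
```

With these toy communities, adding three singleton taxa raises Chao1 from 2 to 8 and increases the Shannon index proportionally more than the Gini-Simpson index, which changes little because it is weighted towards the dominant taxa.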
In this study, the most prevalent viral family was Poxviridae, followed by Siphoviridae and Myoviridae. When compared to the only two other studies that have focused on the COPD lung virome, this study differed in the relative abundance of the key families 40,41,87.
A high abundance of Poxviridae was observed in this study, particularly the BeAn 58058 virus (BAV). Poxviridae is a family of complex double-stranded DNA (dsDNA) viruses that are often zoonotic and are known to cause skin lesions; the best-known member is variola virus, the causative agent of smallpox (now eradicated) 87. Only two other virome studies, one of human body fluids (conducted in Spain) and one of the ocular adnexa (conducted in Denmark on samples collected between 2005 and 2014), have detected the BeAn 58058 virus in humans 88,89. This virus was originally isolated from rodents (Oryzomys sp.) in Brazil in 1963 90. According to the Virus-Host Database, the only known host of the BeAn 58058 virus is Oryzomys sp.; however, other Poxviridae are known to infect a wide variety of hosts, including humans 54. The BeAn 58058 virus is considered a variant of Vaccinia virus, a close relative of the smallpox virus that was used as the smallpox vaccine until 1970 91,92. There are four possible explanations for the high abundance of the BeAn 58058 virus detected in this study. The first theory is that the BeAn 58058 virus is an ancient virus that has, over time, become incorporated into the human genome. This theory is supported by (i) a study by Mollerup et al. (2019) of the ocular adnexa virome, which showed that the viral reads identified (i.e. the BeAn 58058 virus) had high sequence homology to sequences of human origin, (ii) a study of structural variants in the human genome that identified the BeAn 58058 virus as part of the genome and (iii) the fact that Poxviridae are dsDNA viruses and can easily integrate into the double-stranded human genome 93. The second theory is that BeAn 58058 is a DNA artefact of the smallpox vaccine (a live attenuated vaccine) received years earlier. Evidence supporting this theory includes: (i) the participants in this study were all aged 50 years or older and would have received the smallpox vaccine before the smallpox vaccination programme was terminated in South Africa (in 1970) and (ii) Vaccinia virus, which was used for the smallpox vaccine, shows high homology with the BeAn 58058 virus 91,92,94. The third theory is that the participants encountered an environmental exposure through which the virus was contracted, e.g. rats, given the virus's similarity to Cotia virus, which can infect human cells 95. The fourth theory is that BeAn 58058 is a contaminant (i.e. a sequence not truly present in the sample) originating from the extraction kit, animal cells, the reagents used or even a previous sequencing run 96,97. Further analysis of the lung virome, as well as of the human genome, of healthy individuals (i.e. not suffering from any lung disease) across different geographical regions and age groups should provide insight into this question in the future.
This study had several limitations. First, the study population was small and there were no paired samples for the different disease states. Second, sputum was chosen as the specimen for this study (instead of BAL, which has been used by most studies on the COPD microbiome) as it is the most patient-friendly option, i.e. it is non-invasive 98. The sputum microbiome is a mixture of the microbiomes of the upper and lower respiratory tracts [98][99][100][101]. Additionally, sputum specimens have higher bacterial loads and are better suited to longitudinal studies (as they are non-invasive) 99. Third, as only a single HIV-positive participant could be recruited, no comparison of the sputum microbiome between HIV-positive and HIV-negative COPD patients could be performed; this aspect therefore requires further research. Lastly, no controls were included in the study; the lack of negative controls for the extraction procedure (conducted in a Biosafety Level 2 cabinet with DNase away and RNase away) means that laboratory contamination from extraction reagents, a previous sequencing run, etc. cannot be ruled out 96,97,101. However, a strength of this study was that it provided a good pilot overview of the sputum microbiome and the sputum virome of the COPD lung in a South African setting. A diverse microbiome was observed in both the stable and the exacerbated states of disease, with Proteobacteria predominating in the exacerbated state of disease. Conversely, the virome (both DNA and RNA viruses were studied) was dominated by a single virus, the BeAn 58058 virus (a dsDNA virus). Most viruses previously found in the respiratory tract have been RNA viruses, such as influenza viruses; however, most shotgun metagenomics approaches favour DNA viruses, such as members of the Siphoviridae.
As a result, members of the Siphoviridae family and other DNA viruses, such as BeAn 58058, dominate the sequenced lung virome. However, the origin of the BeAn 58058 virus and its possible clinical relevance are unknown. Future studies of the virome should investigate this virus further by studying the lung virome of healthy individuals and of patients with other lung diseases, in both South African and international contexts. Future studies of the COPD lung microbiome should include longitudinal studies that compare the stable and exacerbated states of disease at several time points in the same individuals.

Conclusions
This study is among the first to report lung microbiome composition in COPD patients from Africa. No statistically significant differences in the microbiome of COPD patients between the disease states were observed in this study. However, this study did note differences in the frequencies of key phyla and genera when compared to other studies from Europe and the USA; the reason for this differing microbial profile is unknown and warrants further research. In the virome, a high frequency of the BeAn 58058 virus was observed in all six samples; the explanation for this observation is unclear. In conclusion, the sputum microbiome in South African COPD patients is diverse regardless of disease state, while the sputum virome warrants further research.

Data availability
The sequencing data from this study are available in the NCBI Sequence Read Archive (SRA) database (https://www.ncbi.nlm.nih.gov/sra) under BioProject PRJNA683885 (accession numbers SAMN17041381 to SAMN17041404 and SAMN17065738 to SAMN17065743). The scripts used in R and QIIME2 are available in a GitHub repository at https://github.com/tgmahomed/COPDMicrobiome.