Environmental prospecting of black yeast-like agents of human disease using culture-independent methodology

Melanized fungi and black yeasts in the family Herpotrichiellaceae (order Chaetothyriales) are important agents of human and animal infectious diseases such as chromoblastomycosis and phaeohyphomycosis. The oligotrophic nature of these fungi enables them to survive in adverse environments where common saprobes are absent. Because of their slow growth they are outcompeted by common saprobes, and isolation studies have therefore yielded low frequencies of clinically relevant species in the environmental habitats from which humans are thought to be infected. This problem can be addressed with metagenomic techniques, which allow recognition of microorganisms independent of culture. The present study aimed to identify species of the family Herpotrichiellaceae that are known to occur in Brazil by using molecular markers to screen public environmental metagenomic datasets from Brazil available in the Sequence Read Archive (SRA). Species characterization was performed by BLAST comparison with previously described barcodes and padlock probe sequences. A total of 18,329 sequences was collected, comprising the genera Cladophialophora, Exophiala, Fonsecaea, Rhinocladiella and Veronaea, with a focus on species related to chromoblastomycosis. The data obtained in this study demonstrated the presence of these opportunists in the investigated datasets. The techniques used contribute to our understanding of the environmental occurrence and epidemiology of black fungi.

Metagenomics comprises culture-independent methods for the study of microbial diversity that are based on next generation sequencing (NGS) and allow characterization of fungi in complex environmental systems, using specific molecular markers for identification 15 . Abundant metagenomic data are available in public databases such as Sequence Read Archive (SRA 16 ), Rast Server (MG-RAST 17 ), and EBI metagenomics (EMG 18 ). Likewise, sequences are available for several molecular markers that are in use for taxonomy and routine molecular identification of species in Herpotrichiellaceae, i.e. ITS, TEF1, BT2, and ACT1 19 . Alternatively, padlock probes, which are specific oligonucleotides with the ability to identify single nucleotide polymorphisms (SNPs), have been proposed for the recognition of several groups of black agents [20][21][22][23][24][25] . DNA barcoding, based on the ITS region and applying short nucleotide sequences (25-41 bp) specific for a single taxonomic species 26 , can additionally be used to recognize herpotrichiellaceous species by variable regions in the ribosomal operon.
The present study aims to explore the occurrence of chromoblastomycosis agents of the family Herpotrichiellaceae in environmental samples from tropical areas of Brazil. We compare metagenomic data present in public databases, using barcodes and padlock probes for species identification. This approach should lead to a better understanding of the sources and routes of infection of patients with chromoblastomycosis.

Results
Datasets containing herpotrichiellaceous fungi. In total, 169 large datasets comprising 3,786 samples from Brazil were analyzed (Table S1). Of these, only 11 large datasets comprising 179 samples contained sequences of members of Herpotrichiellaceae, originating from five states and representing environmental samples from different geographic areas (Fig. 1A).
The data generated depended on the scope of each metagenome project evaluated, which resulted in a large variation in the size of the datasets. The read number ranged from 14,293 to 1,394,769,476, with the rhizosphere dataset (PRJNA362455) having the highest number of reads (Table 1; Fig. 1B). Within each read pool, the number of reads matching Herpotrichiellaceae ranged from 4 to 14,821, with the highest concentration in the plant dataset (PRJNA522264). All results are based on normalized data (Table 1; Fig. 1C).
The total number of reads matching herpotrichiellaceous fungi was 18,329. Of this data pool, about 84.7% (15,526 reads) were identified by barcode markers only, and only around 5.6% (1,032 reads) exclusively by padlock probe markers. The number of sequences identified simultaneously by both markers was 1,771 (about 9.7%; Table 1), which underlines the need to use more than a single tool for in silico identification.
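The proportions above follow directly from the reported read counts. The short sketch below recomputes them; the counts are those given in the text, while the variable and function names are illustrative only.

```python
# Read counts reported in the Results (Table 1).
TOTAL_READS = 18_329
reads_by_marker = {
    "barcode only": 15_526,
    "padlock probe only": 1_032,
    "both markers": 1_771,
}


def marker_shares(counts, total):
    """Return the percentage of `total` contributed by each category."""
    return {name: 100.0 * n / total for name, n in counts.items()}


# The three categories partition the full pool of matching reads.
assert sum(reads_by_marker.values()) == TOTAL_READS

shares = marker_shares(reads_by_marker, TOTAL_READS)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Because the three categories add up to the total, each read is counted in exactly one of them.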

Discussion
In this study we investigated the presence of sequences of herpotrichiellaceous fungi in metagenomic datasets that were generated from divergent environmental sources, using molecular markers for in silico identification of causal agents of chromoblastomycosis and phaeohyphomycosis. The tools used as reference were padlock probes developed for rapid detection of pathogenic Fonsecaea species in clinical samples (F. pedrosoi, F. nubica, F. monophora and F. pugnacius 20,24 ), the agent of neurotropic phaeohyphomycosis Cladophialophora bantiana 23 , and other opportunistic species with variable pathology 21,22,25 . ITS rDNA barcoding sequences had previously been recommended for rapid identification of clinical and environmental sequences 27 , and were suggested for taxonomic identification in metagenomic data 26 . The results indicated that this methodology provides data complementary to studies on direct isolation via culture [9][10][11][12][13][14]28 , which all reported low frequency of these agents in the environment. Judging from the number of sequences present in the evaluated datasets, the low frequency of herpotrichiellaceous fungi, compared to the total number of fungal sequences, was confirmed (Table 1). For example, Fonsecaea pedrosoi, a major agent of chromoblastomycosis in Brazil 2 , was detected in metagenomic data from plant- and soil-associated materials. This habitat is in line with the hypothesis of chromoblastomycosis as an implantation disease acquired through inoculation of plant-derived material. This demonstrates that in silico identification can be used as a new tool to uncover the natural habitat of agents of opportunistic diseases and assists in elucidating the environmental occurrence and the route of infection of causative species.
The infection route of agents of chromoblastomycosis nevertheless remains controversial. Their occurrence in living plants has been discussed extensively. Previous studies have shown that Fonsecaea species occurring in living plant material mostly belong to species other than those repeatedly encountered on the human host 13,29 . In our study, the non-pathogenic Fonsecaea species were not detected. One study presented an in vitro plant infection model showing that the agents of human chromoblastomycosis have a certain degree of plant-invasive ability 30 , suggesting that those species occur on plants as well. We may hypothesize that both strictly saprobic and opportunistic species are very rare and thus both have a low chance of being detected in non-optimal datasets using unbiased methodology. Differences in habitat choice, even when minute, may influence species-specific population dynamics and representation in metagenomic datasets, with slight differences determining presence or absence.
Species of the genus Rhinocladiella have been described as less common agents of chromoblastomycosis 31,32 , i.e. R. aquaspersa, R. similis and R. tropicalis 3 . The extremely rare agent Rhinocladiella similis has also been isolated from dialysis water and from babassu coconuts 14,33 , while in our in silico data, R. similis was observed in the rhizosphere of maize. The human host thus is unlikely to be the prime habitat of R. similis. The saprobe R. atrovirens was identified in plant- and soil-associated habitats. In addition, Veronaea botryosa, an extremely rare agent of disseminated infections in patients with CARD9 immune disorders 34,35 , had previously been isolated from babassu coconuts 14 and from creosote-treated railway ties 10 . In this study, the species was identified in mangrove, maize rhizosphere and in sugarcane filter cake, indicating a wider saprobic occurrence.
Presence of herpotrichiellaceous opportunists in the environment has been shown by several authors [8][9][10][11][12][13][14]28 . Our in silico data showed that the most common sequences in metagenomic databases belonged to the genus Exophiala. This is the largest genus in the family Herpotrichiellaceae, containing numerous species, many of which are opportunistic pathogens of cold- and warm-blooded animals 19,36 . We detected species reported from various types of disease other than chromoblastomycosis, i.e. E. bergeri, E. dermatitidis, E. jeanselmei, E. heteromorpha, E. mesophila, E. spinifera, E. oligosperma and E. xenobiotica 37 . Also E. angulospora, E. pisciphila and E. equina, associated with infections of cold-blooded animals such as frogs, toads and fish 36,38 , were detected. Exophiala cancerae was first described from the Lethargic crab disease (LCD) occurring along the Brazilian coast 36,39 . This species had hitherto only been found in endemic coastal areas. However, in our study it was identified in soil, plant roots and in a sugar filter cake, indicating a wider environmental occurrence. Other unexpected encounters were E. castellanii, previously isolated from water 40 but in our data found among mycorrhizal fungi; E. brunnea, known from litter 36 but here found in association with mycorrhizal fungi, rhizosphere and plant; E. sideris, known from hydrocarbon-polluted environments 41 but here found in plant- and soil-associated materials; and finally E. exophialae, known from straw in a burrow of Dasypus septemcinctus, but here found in river water, rhizosphere and associated with ants.
The genus Cladophialophora was represented by two opportunistic species, C. arxii and C. immunda. Cladophialophora arxii was originally reported from a disseminated infection 42 and C. immunda from a patient with a subcutaneous ulcer 43 . The latter species was later detected in sites polluted with hydrocarbons 44 , which matches its presence in the crude oil-contaminated soils analyzed in this study. The environmental saprobe C. chaetospira is known to occur in plant litter 10,43 , while in our study it was found in mangroves and in soil contaminated with crude oil.

Conclusions
The methodology presented in this study was shown to be a reliable and quick alternative for identifying the presence of agents of clinical interest in environmental samples, which is particularly valuable for fungi that are difficult to culture, such as black yeasts and other opportunistic agents of human disease. The use of molecular markers as tools for the identification of Herpotrichiellaceae in metagenomic datasets proved to be an effective way to study microhabitats of these fungi, demonstrating the importance of mining databanks for tracking fungal agents. Although only Brazilian datasets were used, the investigated fungi have global distributions, and results are likely to be similar elsewhere. However, data availability is still limited, since the barcode sequences and padlock probes described in the literature are restricted to relatively few species. This may explain why in a number of cases our data differ markedly from the existing literature, in that common saprobic relatives were not detected, while species with supposedly limited distribution were found in remote, variable habitats, suggesting a low degree of host- or habitat-specificity. Expansion of databases may provide a more balanced picture in the future.

Materials and methods
Database construction. The metagenomic database was created from projects available in the Sequence Read Archive (SRA) (https://www.ncbi.nlm.nih.gov/sra). To search the projects, the term "metagenomic Brazil" was used and all projects were downloaded. This dataset contained a total of 3,786 samples amounting to approximately 2 terabytes (Table S1). The database was assembled only with metagenomes that complied with four criteria: (1) DNA sequences; (2) Brazilian projects to narrow down the selection; (3) environmental link (arthropods and other animals, aquatic bodies, hostile environments including rocks, decomposing materials with plant debris and soil), since within the geographic area the actual habitat is unknown; (4) public data available for download in the SRA. The datasets were rearranged according to eight types of sources, i.e. rhizosphere (PRJNA379918, PRJNA362455, PRJEB24131), ant (PRJNA321130), aquatic (PRJNA237344), biotechnological (PRJNA285006, PRJEB5245), mycorrhizal (PRJNA339563), plant (PRJNA522264), mangrove (PRJNA478407), and soil (PRJNA421085) (Table 3).
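The grouping of projects by source type can be written down as a simple lookup table. The sketch below uses the BioProject accessions and source types listed above; the dictionary and function names are illustrative and not part of the original workflow.

```python
from collections import defaultdict

# BioProject accessions and their source types, as listed in the text.
PROJECT_SOURCE = {
    "PRJNA379918": "rhizosphere",
    "PRJNA362455": "rhizosphere",
    "PRJEB24131": "rhizosphere",
    "PRJNA321130": "ant",
    "PRJNA237344": "aquatic",
    "PRJNA285006": "biotechnological",
    "PRJEB5245": "biotechnological",
    "PRJNA339563": "mycorrhizal",
    "PRJNA522264": "plant",
    "PRJNA478407": "mangrove",
    "PRJNA421085": "soil",
}


def group_by_source(project_source):
    """Invert the accession -> source mapping into source -> list of accessions."""
    groups = defaultdict(list)
    for accession, source in project_source.items():
        groups[source].append(accession)
    return dict(groups)


groups = group_by_source(PROJECT_SOURCE)
for source, accessions in sorted(groups.items()):
    print(f"{source}: {', '.join(accessions)}")
```

The table holds 11 projects in eight source types, matching the 11 datasets reported in the Results as containing Herpotrichiellaceae.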
Identification tools. The molecular markers for members of the family Herpotrichiellaceae described in the literature (Table S2) were used for species identification in the metagenome datasets. A total of 97 barcode identifiers of 25-41 bp 26 and 25 padlock probe sequences of 28-42 bp with different SNPs were collected from the rDNA internal transcribed spacer (ITS2 [20][21][22][23][24][25] ).

Identification in silico. Comparison of metagenomes with molecular marker sequences was performed with local BLASTn (v2.6.0+). For the data mining, only alignments with coverage and identity of 100% (perfect match) were considered (Fig. 2). Matches with values below the cutoff were excluded. Because padlock probes and barcodes are extremely specific for species identification, hits with slight misalignment or imperfect sequence identity were not taken as evidence for the presence of the fungus (Fig. 2). Metagenome reads from double-strand sequencing were counted once in the final read count.
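The perfect-match filter can be illustrated with a minimal sketch that parses BLAST tabular output. Several points are assumptions rather than details given above: that the markers were used as queries against the reads, that the output was requested as `-outfmt "6 qseqid sseqid pident length qlen"`, and that mates of a read pair carry a trailing `/1` or `/2`. The function names and the example hits are hypothetical.

```python
import csv
import io

# Assumed column layout: BLAST+ tabular output requested with
#   -outfmt "6 qseqid sseqid pident length qlen"
# where the query is a marker (barcode or padlock probe) and the subject a read.
COLUMNS = ("qseqid", "sseqid", "pident", "length", "qlen")


def perfect_hits(handle):
    """Yield (marker, read) pairs with 100% identity and 100% query coverage."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        full_identity = float(hit["pident"]) == 100.0
        full_coverage = int(hit["length"]) == int(hit["qlen"])
        if full_identity and full_coverage:
            yield hit["qseqid"], hit["sseqid"]


def base_read_id(read_id):
    """Strip a trailing /1 or /2 mate suffix (an assumed naming convention)."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def count_reads_per_marker(handle):
    """Count distinct reads per marker, counting both mates of a pair once."""
    seen = set()
    counts = {}
    for marker, read in perfect_hits(handle):
        key = (marker, base_read_id(read))
        if key not in seen:
            seen.add(key)
            counts[marker] = counts.get(marker, 0) + 1
    return counts


# Hypothetical hits: a read pair, an imperfect match, and a partial alignment.
example = io.StringIO(
    "marker_A\tread1/1\t100.000\t30\t30\n"
    "marker_A\tread1/2\t100.000\t30\t30\n"
    "marker_A\tread2/1\t96.667\t30\t30\n"
    "marker_B\tread3/1\t100.000\t25\t28\n"
    "marker_B\tread4/1\t100.000\t28\t28\n"
)
counts = count_reads_per_marker(example)
print(counts)
```

In the example, the hit at 96.667% identity and the hit covering only 25 of 28 query bases are discarded, and the two mates of `read1` contribute a single count.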