Phylogenomic and biochemical analysis reassesses temperate marine yeast Yarrowia lipolytica NCIM 3590 to be Yarrowia bubula

The Yarrowia clade contains yeast species that are morphologically, ecologically, physiologically and genetically diverse. Yarrowia lipolytica NCIM 3590 (NCIM 3590), a biotechnologically important strain isolated from Scottish sea waters, was reinvestigated for its phenotypic, biochemical, molecular and genomic properties because it exhibited characteristics unlike Y. lipolytica, namely, absence of extracellular lipolytic activity, growth at lower temperatures (less than 20 °C) and growth in high salt concentrations (10% NaCl). Molecular identification using ITS and D1/D2 sequences showed NCIM 3590 to be 100% identical with the reference strain Yarrowia bubula CBS 12934 rather than Y. lipolytica CBS 6124 (87% identity), while phylogenetic analysis revealed that it clustered with Y. bubula in a separate clade. Further, whole genome sequencing of NCIM 3590 was performed using Illumina NextSeq technology and the draft genome is reported here. The overall genome relatedness values obtained by dDDH (94.1%), ANIb/ANIm (99.41/99.42%) and OrthoANI (99.47%) indicated that NCIM 3590 is closer to CBS 12934 than to the reference strain Y. lipolytica. No extracellular lipase activity could be detected in NCIM 3590, while TBLASTN analysis of the LIP2 gene suggests a low 42% identity with e value 2e−77 and 62% coverage. Hence molecular, phylogenetic, genomic, biochemical and microbial analyses suggest that it belongs to Yarrowia bubula.


Results and discussion
ITS and D1/D2 rDNA identification and phylogenetic analysis. The sequenced ITS and D1/D2 regions of NCIM 3590 consisted of 311 and 446 nucleotides, respectively, and have been submitted to GenBank (NCBI, USA) with accession numbers MK411246 and MK411222, respectively. BLAST analysis of the ITS region of NCIM 3590 showed 100% identity with the reference strain Y. bubula CBS 12934 (KY105958.1) with query coverage of 95%. The ITS sequence of NCIM 3590 when compared with Y. bubula CBS 12934 (consensus length 290 nt) showed a 100% similarity index, with gap number and gap length of 0 and divergence of 0%. In contrast, comparison with the reference Y. lipolytica CBS 6124 (consensus length 311 nt) showed 83% similarity, with gap number 5, gap length 11, divergence 14.2% and sequence identity 87%. For D1/D2 sequences, NCIM 3590 showed 100% identity, query coverage and sequence similarity, with 0% divergence and gap length and gap number of 0, with the reference Y. bubula CBS 12934 (NG_059943.1). Sequence comparison with Y. lipolytica CBS 6124 showed a similarity index of 89.5% with gap number and gap length of 1, divergence of 13% and identity of 87.5%. Generally, the ITS region is selected as the standard fungal barcode for identification 31, but for yeasts it is recommended to use D1/D2 together with the ITS region for identification and establishment of evolutionary relationships 32. It has also been reported that strains showing more than 1% difference, or changes in more than 3 nucleotides, in the ITS and D1/D2 domains are likely to represent different species 4. Thus, the molecular identification using ITS and D1/D2 sequences suggests that NCIM 3590 could belong to Y. bubula. Further, the phylogenetic analysis of ITS and D1/D2 sequences from GenBank, NCBI showed that Yarrowia sp. could be separated into 13 clades (Fig. 1a,b and Table S1). All seven Y. lipolytica strains used in the study were grouped together in a single clade with 100% bootstrap support, while the ITS and D1/D2 sequences of NCIM 3590 clustered with CBS 12934, suggesting its close relatedness. Since NCIM 3590 was closely similar to Y. bubula and distinctly different from the Y. lipolytica strains, the sequences of the strains from these clades/groups were compared, and the results for ITS and D1/D2 are given in the supplementary information (Table S2). The ITS and D1/D2 genetic distances of NCIM 3590 from the Y. lipolytica strains ranged from 0.392 to 0.407 and ~0.21 base substitutions per site, respectively (Table 1). The percent divergence over sequence pairs between NCIM 3590 and the different Yarrowia groups was calculated using the p-distance method, and the results are given in the supplementary information (Table S4). NCIM 3590 was treated as a separate group for comparison and for establishing the evolutionary divergence. The base differences per site in ITS and D1/D2 sequences between each Yarrowia species were ascertained and averaged, forming a dataset of 14 groups (Table S4). The divergence of the ITS and D1/D2 rDNA sequences from Y. lipolytica was 13.35 and 9.19%, respectively, and no divergence from Y. bubula was seen (Table S4). The variations found in their D1/D2 and ITS sequences are adequate to separate the Yarrowia species from one another 4,11. Thus, as per the species delineation criteria, since less than 1% difference and no nucleotide difference was observed between NCIM 3590 and Y. bubula CBS 12934, the phylogenetic analysis suggests that they belong to the same species and are distinctly separated from the Y. lipolytica clade.
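The p-distance used above is the proportion of aligned sites at which two sequences differ. A minimal Python sketch follows; it assumes the two sequences are already aligned to equal length and that sites with a gap in either sequence are skipped (pairwise deletion). The gap treatment actually chosen in MEGA X is not stated here, so that choice, the function name and the toy sequences are illustrative assumptions only.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: sites with a gap ('-') in either sequence are ignored
    (pairwise deletion); other gap treatments would give other values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped sites
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy aligned sequences (hypothetical, not the NCIM 3590 data)
print(f"{100 * p_distance('ACGTACGT', 'ACGTACGA'):.2f}% divergence")
```

Multiplying the returned proportion by 100 gives a percent divergence of the kind reported in Table S4.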
NGS and de novo genome assembly. In order to validate the above results, whole genome sequencing of NCIM 3590 was undertaken. The initial de novo genome assembly was carried out using the SPAdes assembler (optimized with 10 million reads; k-mers: 21, 33, 55 and 77), which assembled 2485 contigs with an N50 of 39,245 bp (Table 2). A total of 521 contigs were longer than 10 kb, and the average contig length was 8592 bp. The SPAdes 33 assembler relies on the "paired de Bruijn graph" (PDBG) approach, which utilises a k-bimer based adjustment strategy for creating the de Bruijn graph from paired-end reads. The genome is assembled using multiple k-mer sizes, eventually combining the reads into contigs. The assembler was initially designed for assembling prokaryotic genomes but was later developed to accommodate large eukaryotic genomes. Assembled contigs were further scaffolded de novo using SSPACE, which resulted in 2074 scaffolds with an N50 of 69,096 bp. The SSPACE (SSAKE-based Scaffolding of Pre-Assembled Contigs after Extension) program was used to scaffold the pre-assemblies produced by SPAdes. SSPACE 34 requires paired-end data from next-generation sequencing technology, read orientation information, and the mean values and standard deviations of the insert sizes used in library preparation. Using paired-read sequencing data, the assembler assesses the order, distance and orientation of contigs and combines them into scaffolds. Based on the alignments, the contigs are linked into scaffolds and N-characters (gaps) are placed between the connected contigs. As per the assembly statistics, the 730 scaffolds greater than 1 kb were considered for super-scaffolding/gap closing along with the paired-end data using SOAPdenovo2 with the asm flags (3,4) parameter (https://github.com/aquaskyline/SOAPdenovo2). SOAPdenovo2 35 utilizes six modules, namely, read error correction, de Bruijn graph (DBG) construction, contig assembly, paired-end (PE) reads mapping, scaffold construction and gap closure. As a result of the super-scaffolding applied to scaffolds greater than 1 kb, the total number of scaffolds and the non-ATGC count decreased. The minimum scaffold length increased from 1000 to 1032 bases (Table 2). The de novo genome assembly strategy was selected as an unbiased approach since it does not consider prior knowledge of the source DNA sequence length,
The availability of a completely annotated genome assembly is a significant advantage for the study of any organism. CLIB 122 36 was the first Y. lipolytica genome reported to be fully sequenced and annotated. Additionally, WSH-Z06, H222 (CLIB 80), W29 (CLIB 89), IBT 446 and PO1f were fully sequenced and annotated at the chromosome level, while a few strains were assembled up to the contig and scaffold level (https://www.ncbi.nlm.nih.gov/genome/browse#!/eukaryotes/yarrowia). The reference genomes, namely, Y. bubula CBS 12934 37 and two Y. lipolytica strains, CLIB 122 and CLIB 89 (W29) 38, were selected for comparison of whole genome sequencing data, as these strains not only had a genome size similar to that of NCIM 3590 for assembly purposes but also had complete genomic and annotation information available. In general, Y. lipolytica has a GC content reported to lie between 49 and 50%. The two reference strains CLIB 122 and CLIB 89 exhibited a GC content of 49.1 and 49.0%, while that of NCIM 3590 and Y. bubula CBS 12934 was 47 and 46.2%, respectively, which was about 2% lower than that of the Y. lipolytica strains (Table S5).
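The N50 and GC content statistics quoted above can be computed from an assembly in a few lines. The following Python sketch is illustrative only: the function names and the toy contig lengths are hypothetical, and the GC calculation assumes that ambiguous bases (such as the N characters inserted during scaffolding) are excluded from the denominator, which may differ from the convention used for Table S5.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half
    of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths supplied")


def gc_percent(sequence: str) -> float:
    """GC content in percent, counting only unambiguous A/C/G/T bases
    (assumption: N and other ambiguity codes are ignored)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * gc / (gc + at)


# Toy values, not the NCIM 3590 assembly
print(n50([8, 8, 4, 3, 3, 2, 2, 2]))
print(gc_percent("ATGCNN"))
```

Applied to the contig and scaffold lengths before and after scaffolding, the same function would show the increase in N50 summarised in Table 2.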
Overall genome relatedness index (OGRI) and genome comparison. The digital DDH (dDDH) tools calculate inter-genomic distances using three formulas and convert these distances into percent-wise dDDH similarities. For distance calculation, Formula 1 utilizes the HSP (High-scoring Segment Pairs) length and the total lengths of the genomes; Formula 2 uses identities and HSP length, while Formula 3 uses identities and total lengths [39][40][41]. The calculated dDDH values for CBS 12934 with NCIM 3590 were found to be 98.8, 94.1 and 99.1% with Formulas 1, 2 and 3, respectively (Table 3), so all three formulas indicate a high degree of similarity or relatedness. In contrast, the dDDH similarity values for NCIM 3590 with the Y. lipolytica reference strains were found to be ~17, 26 and 17% with Formulas 1, 2 and 3, respectively, suggesting a low level of relatedness. A dDDH similarity score of greater than or equal to 70% is a criterion for assigning two strains to the same species 40,42 and hence the values obtained   (Table 3). Two genomes are considered to belong to the same species when they have a genome distance value (GD) of less than 0.0284 39,41 and thus, based on the distance values also, the genomes of NCIM 3590 and Y. bubula CBS 12934 seem to be closely related. Another parameter for relatedness reported with dDDH is the difference in G + C content (%), which was found to be 0.32% between NCIM 3590 and the reference Y. bubula CBS 12934, while it was 1.94% and 1.87% for CLIB 122 and CLIB 89, respectively (Table 3). Meier-Kolthoff et al. 43 suggested that a value greater than 1.0% indicates distinct species. Since the difference in G + C content between NCIM 3590 and both CLIB 122 and CLIB 89 is greater than 1%, NCIM 3590 does not appear to belong to the same species as CLIB 122 and CLIB 89. Though the GGDC tool for dDDH has been effectively used for species delineation in prokaryotes, very few reports exist on its application in fungal and yeast systems. Mefteh et al. 44 used this tool with Formula 2 to show that two strains of Penicillium citrinum belong to the same species (DDH 97.3% and distance 0.004), whereas two strains of Geotrichum candidum are genetically distant (DDH 18.3% and distance 0.236). Similarly, relatedness between Saccharomyces cerevisiae S288c and four Candida species, namely C. auris 6684, C. albicans (SC-5314 and WO-1), C. lusitaniae ATCC 42720 and C. glabrata CBS-138, has been studied 45. Genomic relatedness was also determined using Average Nucleotide Identity (ANI) and OrthoANI (OANI), which are means of nucleotide identity values between two organisms and are widely used OGRI indices 46. The ANIb/ANIm/OANI values between CLIB 122 and CLIB 89 are 99.71%/99.70%/99.72%, indicating that they belong to the same species (Fig. 2a). The ANIb/ANIm/OANI values between NCIM 3590 and CBS 12934 are ~99.41%, while those between NCIM 3590 and the Y. lipolytica strains CLIB 122 and CLIB 89 were ~80.00% (Table 3). The inter-genomic distance between NCIM 3590 and Y. bubula CBS 12934 was calculated to be 0.01 and that between the two Y. lipolytica strains was 0, while that between NCIM 3590 and the Y. lipolytica strains was 0.17 (Fig. 2b). Values greater than 96% and a genome distance close to 0 indicate that strains belong to the same species 46. The data obtained for NCIM 3590 and CBS 12934 using ANI, OrthoANI and dDDH suggest that both strains are genomically related to each other and agree with the established boundaries for genomic species delineation (95-96% for ANI and OANI, 70% for digital DDH) [46][47][48]. Our results suggest that NCIM 3590 and Y. bubula CBS 12934 are closely related and belong to the same species.
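The species delineation boundaries quoted above (dDDH of at least 70% and ANI/OANI of at least 95-96%) amount to a simple two-threshold check. The sketch below applies them to the Formula 2 dDDH and ANIb values reported in this section; the function name is hypothetical, and taking 95% (the lower end of the 95-96% range) as the ANI cut-off is an assumption made for illustration.

```python
DDDH_MIN = 70.0  # percent; dDDH boundary quoted in the text
ANI_MIN = 95.0   # percent; lower end of the 95-96% range (assumed cut-off)


def same_species(dddh: float, ani: float,
                 dddh_min: float = DDDH_MIN, ani_min: float = ANI_MIN) -> bool:
    """True when both indices meet their species-level boundaries."""
    return dddh >= dddh_min and ani >= ani_min


# (dDDH Formula 2 %, ANIb %) as reported above; the Y. lipolytica
# values are the approximate figures given in the text.
comparisons = {
    "NCIM 3590 vs Y. bubula CBS 12934": (94.1, 99.41),
    "NCIM 3590 vs Y. lipolytica (CLIB 122/CLIB 89)": (26.0, 80.0),
}

for pair, (dddh, ani) in comparisons.items():
    verdict = "same species" if same_species(dddh, ani) else "different species"
    print(f"{pair}: {verdict}")
```

Requiring both indices to pass is a conservative choice; either index alone leads to the same assignment for the values reported here.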
Based on ITS and D1/D2 rDNA sequencing, whole genome comparison and OGRI, NCIM 3590 is likely to be Y. bubula. The strain has tremendous potential for biotechnological applications, but little information on its basic microbiological aspects is available; hence, it was deemed necessary to investigate these aspects.
Colony morphology. As no information regarding the phenotypic characterization of NCIM 3590, namely, size, form, elevation, margin, edge, surface, opacity and chromogenesis, is available in the literature, the growth of the yeast on different culture media was studied 49. The marine yeast NCIM 3590 was able to grow on all eight growth media studied (Fig. S1; Table S6), and was compared with the reference strains Y. lipolytica CBS 6124 14 and Y. bubula CBS 12934 3 (Table 4). On five of these media (MEA, MGYP, YPG, PDA and TA), the colonies exhibited similar characteristics: they were opaque, white, circular and 2-5 mm in diameter, with entire margins and edge, an umbonate, dome-shaped elevation and a shiny, wrinkled surface. In contrast, the reference Y. lipolytica strain when grown on MEA (5%, w/v) for 72 h showed tannish-white butyrous colonies 14, whereas Y. bubula showed cream coloured butyrous colonies 3. The yeast when grown on YLDM, YES and YPO showed varied colony morphologies (Table S6). YLDM is a selective medium used to differentiate Y. lipolytica from other yeasts, as its colonies produce a unique deep brown colour after 24 h due to the presence of pigments 49,50. On YLDM, NCIM 3590 showed circular, white, hat-shaped colonies (Fig. S1f), in contrast to the reference Y. lipolytica strain. No growth was seen on the minimal medium YNB-sucrose (1%, w/v), suggesting an inability to utilize the sugar, whereas the yeast was able to assimilate sucrose when grown on the complete medium YES, where it showed a white wrinkled surface, erose margin and fuzzy growth (Fig. S1g). YES agar is routinely used for sporulation tests in yeasts 3. No sporulation was seen in NCIM 3590 for up to 15 days. Most Yarrowia strains are haploid (the only known exception is CBS 6124) and therefore cannot sporulate unless mated. On YPO, the colony morphology is given in Table S6, and no zone of clearance was observed in 72 h, suggesting the absence of extracellular lipolytic activity (Fig. S1h). This is unlike Y. lipolytica, which is a known producer of extracellular lipase and shows a clear zone of clearance when grown on YPO. Hence, the growth patterns of NCIM 3590 were found to be different from those of the reference Y. lipolytica strain.
Cellular morphology. Light microscopy (Fig. 3a) and scanning electron micrographs (Fig. 3b,c) showed that cells grown for 72 h in YNBG liquid medium were spherical to ellipsoidal, 3-6 µm in size, and displayed unipolar or bipolar budding. Elongated yeast forms were observed after 5 days of incubation, and no filaments were observed even when grown for more than 7 days. Yarrowia sp. are dimorphic, exhibiting yeast cells, pseudohyphae and hyphae depending on growth conditions. A comparison of the cellular morphology and budding pattern between different Yarrowia sp. is given in Table 4. It is to be noted that while most Yarrowia sp. reported so far show multilateral budding, NCIM 3590 shows a bilateral budding pattern. In YNB medium, Y. lipolytica grows in the yeast form with polar budding, while hyphal growth can be induced either by N-acetyl-d-glucosamine (NAG) or by adding serum to the culture medium 51. No growth of NCIM 3590 could be seen for up to 5 days in vitamin free media and serum (1%, v/v). In NAG (1%, w/v) and serum (10%, v/v) the cells grew in yeast form and no transition to the filamentous form was observed. This is in agreement with the earlier report by Bankar et al. 25 wherein only yeast forms were observed for NCIM 3590 24.

Effect of temperature and salt on growth of NCIM 3590. The effect of different temperatures and salt concentrations on the growth of NCIM 3590 was evaluated. The yeast grew at temperatures between 10 and 28 °C with optimal growth at 20 °C, and no growth on YNBG was observed at 30 °C and beyond, as shown in Fig. 4a. As NCIM 3590 is a marine isolate, the effect of salt on its growth was studied for up to 72 h (Fig. 4b). The   4 were unable to grow at this concentrations. Thus, a variation amongst Yarrowia species was observed, as they could have adapted to the diverse ecological niches they were isolated from. Hence, growth at low temperature and high salt concentrations suggests that NCIM 3590 clearly differs phenotypically from Y. lipolytica CBS 6124.

Biochemical characterization. Sugar assimilation studies. The assimilation and fermentation studies of NCIM 3590 were carried out on different sugars and sugar alcohols (Table 4). The test for glucose fermentation was negative, as no gas production could be detected, which corroborates the results obtained with other Yarrowia sp. [1][2][3][4][5][6][7][8]. Assimilation tests for NCIM 3590 at 72 h were analysed using the metabolic fingerprinting database in Biolog. Carbon compounds strongly assimilated by NCIM 3590 were d-glucose, i-erythritol, d-gluconic acid, 2-keto d-gluconic acid and N-acetyl-d-glucosamine (NAG). Weak assimilation was seen for d-galactose, salicin, l-sorbose, d-xylose, l-arabinose, d-arabinose, d-ribose, glycerol, d-mannitol and succinate, while no assimilation could be seen for inulin, sucrose, d-raffinose, d-melibiose, d-trehalose, maltose, α-methyl-d-glucoside, d-cellobiose, l-rhamnose and d-glucosamine (Table S7). The results from Biolog suggest NCIM 3590 to be Y. lipolytica with a probability score of 1.0, a similarity value (SIM) of 0.896 and a distance of 1.543. A species to be considered acceptable for identification must have a distance value greater than 5.0 and a SIM of greater than 0.75. Based on the probability score and SIM value, NCIM 3590 can be identified as Y. lipolytica. However, the distance value is much lower than the acceptable value of greater than 5. It should also be noted that the Biolog database lists only Y. lipolytica and no other Yarrowia species, and hence Y. lipolytica is returned as the best hit from the available database.
Sugar assimilation and utilization patterns also illustrate the diversity and adaptation amongst strains, thereby offering a convenient key for yeast identification. Upon comparison (Table 4), all Yarrowia species assimilate glucose, glycerol and i-erythritol, while strains with differential assimilation of sugars such as NAG, d-mannitol and d-galactose have been reported. NAG could not be assimilated by Y. galli CBS 9722 2 , Y. yakushimensis CBS 10254 5 (Table S8). Y. lipolytica PO1d, a known producer of extracellular lipase, produced 20 and 50 U/mL of lipase activity in YNBO (yeast nitrogen base containing 1% (w/v) olive oil) and YPGO, respectively, while the Y. lipolytica strains ATCC20460 and IMUFRJ 50682 produced up to 30 U/L on unsupplemented olive mill wastewater (OMW) 52. Y. lipolytica CECT 1240 (ATCC 18942) 53 and Y. lipolytica W29 (ATCC 20255) 54 showed higher lipolytic activity of 700 and 770 U/L with YNBO and YPDO. Thus, though the levels of extracellular lipase vary among Y. lipolytica strains, all of them produced it, unlike NCIM 3590, which did not show any significant level of activity.
LIP2 gene analysis. As no significant extracellular lipase activity was detected in the crude supernatant, bioinformatic analysis was carried out to determine the presence of the LIP2 gene in NCIM 3590. The extracellular lipase Lip2, encoded by the LIP2 gene (Gene Id YALI0A20350g), is a 334-amino acid precursor protein containing a putative 13-amino acid signal sequence 13. The TBLASTN of the LIP2 gene showed 42% identity and 60% coverage with e value 2e−77 against scaffold 57 of NCIM 3590 (Table S9). Further, the matched coordinates 100,268 to 101,158 from scaffold 57 (NKYT01000426.1) were used to identify ORFs using the NCBI ORF finder. The 8 ORFs generated were used as queries to carry out BLASTP (reverse BLAST) against the LIP2 gene. Of the 8 ORFs, only one showed 44% identity (113/254 aa) with e value 9e−79 and 62% coverage (158/254 aa). To validate the results, a similar TBLASTN for the LIP2 gene was carried out with the references CLIB 122 and CLIB 89, which showed 100% identity and coverage, while in the reverse BLAST one ORF out of 9 gave 100% identity and coverage with LIP2. Thus, the ORF obtained from NCIM 3590 appears to show low homology to LIP2. According to Meunchan et al. 14, while the LIP2 gene is likely to be present in all members of the clade, it has been suggested that the gene has undergone a number of evolutionary events with a high number of duplications. Differing degrees of homology exist amongst them, and 11 lipases homologous to YlLip2 have been seen, of which many were found to be transcriptionally inactive while others were actively expressed, as in Y. lipolytica, Y. galli and Y. phangngensis 13. Hence, the e value, low percent identity and low coverage obtained for NCIM 3590 suggest that the lipase from NCIM 3590 may be transcriptionally inactive or may not belong to the LIP2 family.
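The identity and coverage percentages quoted for the reverse BLAST follow directly from the residue counts given in parentheses. As a minimal arithmetic check in Python (the helper name is hypothetical, and the counts are simply those reported above):

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


aligned_length = 254      # residues in the reported alignment (aa)
identical_residues = 113  # residues reported as identical
covered_residues = 158    # residues reported under coverage

print(f"identity: {percent(identical_residues, aligned_length):.1f}%")
print(f"coverage: {percent(covered_residues, aligned_length):.1f}%")
```

Both ratios round to the 44% and 62% stated in the text.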
In conclusion, this study reassesses the strain NCIM 3590 based on molecular, phylogenetic, genomic, biochemical and microbiological data. Based on this, we suggest that NCIM 3590 and Y. bubula CBS 12934 belong to the same species. The availability of the NCIM 3590 genome will provide a platform for elucidating its potential applications and contribute to the understanding of this unusual Yarrowia strain with an optimum growth temperature of 20 °C.
Phylogenetic analysis was performed with ITS and D1/D2 sequences of different Yarrowia species taken from the NCBI database. All sequences were aligned separately using Clustal W 57 with default parameters in MEGA X 58 (Molecular Evolutionary Genetics Analysis) software. For both the ITS and D1/D2 rDNA sets, the best fit nucleotide substitution model was determined using the Maximum likelihood (ML) criterion. The ML tree was constructed using the general time-reversible model with gamma-distributed rates of variation among sites and a proportion of invariable sites (GTR + G + I model). The reliability of the trees was tested by bootstrapping with 1000 replicates. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances (number of base substitutions per site) used to infer the phylogenetic tree. Percent divergence at the nucleotide level was calculated using the p-distance method in MEGA X (www.megasoftware.net).
NGS and de novo genome assembly. The genomic DNA from NCIM 3590 was extracted by the CTAB method, followed by NEXTFlex DNA library preparation and Illumina NextSeq 500 paired-end sequencing according to the manufacturer's instructions. In total, ~41 million paired-end reads (150 bp) were generated (estimated coverage ~600×). The generated paired-end reads of 150 bp were processed further for de novo genome assembly. Read quality was checked with FastQC v2.2 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and trimming was carried out using an in-house perl script. Reads were assembled into contigs using SPAdes v3.1 33 and were further assembled into scaffolds with SSPACE 34. Scaffolds were checked for length, and those longer than 1000 bases were used for super-scaffolding and gap closure using the SOAPdenovo2 tool 35. The reads obtained were submitted to the Sequence Read Archive (SRA) and the scaffolds to GenBank in the NCBI repository.
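The estimated coverage quoted above follows from the usual relation: coverage = (number of reads × read length) / genome size. The sketch below reproduces a figure of about 600× under two assumptions that are not stated in this section: that "41 million paired-end reads" means 41 million read pairs (82 million reads), and that the genome is about 20.5 Mb, a size typical of sequenced Y. lipolytica genomes. Both values are placeholders for illustration.

```python
def sequencing_coverage(read_pairs: int, read_length_bp: int,
                        genome_size_bp: int) -> float:
    """Expected depth of coverage for paired-end data (two reads per pair)."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    total_bases = read_pairs * 2 * read_length_bp
    return total_bases / genome_size_bp


# Assumed inputs: 41 million pairs of 150 bp reads, ~20.5 Mb genome
estimate = sequencing_coverage(41_000_000, 150, 20_500_000)
print(f"estimated coverage: ~{estimate:.0f}x")
```

If the 41 million figure instead counted individual reads, the same formula would give roughly half this depth, so the interpretation of the read count matters when comparing against the reported ~600×.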
Overall genome relatedness index (OGRI) and genome comparison. The OGRI between the genome of NCIM 3590 and the reference genomes of CLIB 122, W29 (CLIB 89) and CBS 12934 was determined using digital DNA-DNA hybridization (dDDH) and Average Nucleotide Identity (ANI). The dDDH was calculated using the web-based DSMZ service (http://ggdc.dsmz.de), the Genome-to-Genome Distance Calculator (GGDC 2.0), with the BLAST method. The ANI was determined using JSpeciesWS 48, while OrthoANI was calculated using the Orthologous Average Nucleotide Identity Tool (OAT) 46.
Colony morphology of NCIM 3590. The colony morphology was studied by growing the yeast on different media as described by Kurtzman et al. 49
Effect of temperature and salt on growth of NCIM 3590. For the pre-inoculum, the yeast cells were grown in liquid YNBG and incubated on a rotary shaker (120 rpm) at 20 °C for 48 h; the cells were harvested, washed twice with autoclaved distilled water and centrifuged at 10,000g for 10 min. The cell pellet was re-suspended in water and 1 OD cells (1 OD ~ 2 × 10^7 CFU/mL) were inoculated per 100 mL media as mentioned below. To study the effect of temperature on the growth of NCIM 3590, batch experiments were carried out. Cells (0.5 OD per 50 mL media) were inoculated into YNBG and incubated at different temperatures (5, 10, 15, 20, 25, 28, 30 and 35 °C) on a rotary shaker (120 rpm) for varying time intervals (24-120 h), and growth was assessed every 24 h. The cells were spun at 10,000g for 10 min at 4 °C, the pellet was washed twice with sterile distilled water and vortex mixed to separate the cells, the OD600 was taken and the cell dry weight was determined by freeze-drying the cells. Similarly, the effect of different concentrations of sodium chloride (0, 0.1, 0.25, 0.5, 1, 2, 4, 5, 7.5, 10, 15 and 20%, w/v NaCl) in 5% glucose on NCIM 3590 growth was determined as mentioned above.
Sugar assimilation. The sugar assimilation profile of NCIM 3590 was determined using the Biolog system (Biolog MicroStation with Microlog System, Release 4.20, Biolog, Hayward, CA, USA; http://www.biolog.com/microID.html). The culture was grown on YPD agar for 72 h at 20 °C and the yeast suspension was prepared in 15 mL sterile water at an inoculum density transmittance level of 47 ± 2% (corresponding to an optical density (OD 600 nm) of 0.33 or 6 × 10^6 colony forming units per mL (CFU/mL)). The Biolog YT MicroPlate was inoculated with 100 µL of cell suspension and incubated for 24, 48, 72 and 96 h at 20 °C. The colorimetric change in each well was referenced against control wells and scored as mentioned in the protocol.
The fermentation test was carried out manually as described by Kurtzman et al. 49, wherein 0.1 OD cells were inoculated in YNB containing 1% (w/v) glucose in medium-sized tubes containing an inverted Durham tube. Bromothymol blue was used as an indicator, which on acid production changes the medium colour from blue to green or yellow. Gas production is evidenced by the presence of visible air bubbles trapped inside the Durham tube.
Glucose Tween 80 (YPGTw). Stock solutions of fatty acid (10% olive oil and 10% tributyrin) were subjected to sonication three times for 1 min each on ice, autoclaved separately and added into liquid media buffered with 50 mM phosphate buffer, pH 6.8. The samples were removed after 72 h to determine lipase activity, soluble protein content and cell wet weight. The cells were harvested by centrifugation at 5000g for 10 min at 4 °C, and the supernatant obtained was used as the extracellular lipase source. A spectrophotometric method using p-nitrophenyl palmitate (p-NPP) as a substrate was used with slight modification for measurement of lipase activity 12,61. Protein concentration was estimated by the BCA method with bovine serum albumin as a standard 62.

Data availability
The whole genome sequencing data can be accessed through BioProject Accession Number PRJNA328405. The corresponding BioSample Accession Number is SAMN05170375. The SRA reference number of the whole genome sequencing is SRX1850030 (Illumina NextSeq500 short paired-end reads). This whole genome sequencing data has been deposited at DDBJ/EMBL/GenBank under the Accession NKYT00000000. The version described in this paper is version NKYT01000000. The datasets generated and analysed during the current study for ITS and D1/D2 rDNA sequences are available at NCBI with GenBank Accession Nos. MK411246 and MK411222, respectively.