Network-assisted genetic dissection of pathogenicity and drug resistance in the opportunistic human pathogenic fungus Cryptococcus neoformans

Cryptococcus neoformans is an opportunistic human pathogenic fungus that causes meningoencephalitis. Due to the increasing global risk of cryptococcosis and the emergence of drug-resistant strains, the development of predictive genetics platforms for the rapid identification of novel genes governing pathogenicity and drug resistance of C. neoformans is imperative. The analysis of functional genomics data and genome-scale mutant libraries may facilitate the genetic dissection of such complex phenotypes but with limited efficiency. Here, we present a genome-scale co-functional network for C. neoformans, CryptoNet, which covers ~81% of the coding genome and provides an efficient intermediary between functional genomics data and reverse-genetics resources for the genetic dissection of C. neoformans phenotypes. CryptoNet is the first genome-scale co-functional network for any fungal pathogen. CryptoNet effectively identified novel genes for pathogenicity and drug resistance using guilt-by-association and context-associated hub algorithms. CryptoNet is also the first genome-scale co-functional network for fungi in the basidiomycota phylum, as Saccharomyces cerevisiae belongs to the ascomycota phylum. CryptoNet may therefore provide insights into pathway evolution between two distinct phyla of the fungal kingdom. The CryptoNet web server (www.inetbio.org/cryptonet) is a public resource that provides an interactive environment of network-assisted predictive genetics for C. neoformans.


Sequence and functional annotation data for Cryptococcus neoformans
CryptoNet was constructed for the all protein coding genes of C. neoformans var. grubii H99 (serotype A) which lists 6,962 genes validated by deep coverage RNA-sequences 14 . We also added 13 validated mitochondria coding genes for further coverage (C. neoformans var. grubii H99 Sequencing Project, Broad Institute of Harvard and MIT, http://www.broadinstitute.org). As a result, a total of 6,975 protein coding genes were used for the construction of a C. neoformans functional network. For functional analysis of C. neoformans genes, we used Gene Ontology (GO) annotations for Saccharomyces cerevisiae orthologs, obtained from Saccharomyces Genome Database (SGD) 15 . Orthologous genes between two species are assigned by BLASTP bidirectional best hits.

The gold standard co-functional gene links
Machine learning approaches for network construction requires gold standard linkage data. Pathway annotation databases are credible resources to collect them, yet current annotations for C. neoformans var. grubii (serotype A) are not sufficient. Therefore, we decided to use pathway annotations of C. neoformans var. neoformans (serotype D) because C. neoformans var. neoformans shares ~90% coding genome with C. neoformans var. grubii 16 . Pathway annotations by Kyoto Encyclopedia of Genes and Genomes (KEGG) were downloaded on August, 2013 17 , A total of 206,624 positive gold standard gene pairs were generated from 105 annotated KEGG pathways. To avoid network training bias toward a few dominant pathways, we excluded gene pairs derived from three over-dominant pathways 18 : metabolic pathways (cne:01100), biosynthesis of secondary metabolites (cne:01110), aminoacyl-tRNA biosynthesis (cne:00970). These three KEGG terms account for ~78 % of gold standard positive gene pairs. For orthology mapping between C. neoformans var. grubii and C. neoformans var. neoformans, we used an inparalog score threshold of 1 by Inparanoid (v4.1) 19 . The final gold standard set consists of 43,524 positive and 955,467 negative gene pairs among 1,414 C. neoformans genes.

Benchmarking and integrating networks
The likelihood of co-functional link supported by given experimental or computational data was measured by Bayesian statistics approach 20 . Log likelihood score (LLS) was calculated by following equation: | and | are probability of gold standard positive and negative link for given experimental data, respectively, and and are probability of gold standard positive and negative link before the experimental data provided, respectively. To avoid overtraining, we calculate LLS with 0.632 bootstrapping as described in 21 .
For the links with multiple LLSs, we integrated the scores using weighted sum (WS) method as described in 21 : , where represents LLS ( is the maximum LLS of a given functional link), and i is the index number for all other LLS by ranked order.
is a free parameter used as a weight factor, and T is a minimum threshold of LLS.

Co-functional links inferred from co-citation of genes (CC)
The original co-citation algorithm was based on an idea that functionally related two genes tend to be cited at the same research article abstract 22 . However, some articles have names of genes in the main text. To improve sensitivity of search, we scanned full text rather than just abstract for co-citation analysis. We searched PubMed Central (PMC, http://www.ncbi.nlm.nih.gov/pmc/) for articles containing "Cryptococcus neoformans" in abstract and any C. neoformans gene name in full text. As a result, we found a total of 609 articles containing C. neoformans gene names, and then assign probability of association between genes by Fisher's exact test.

Co-functional links inferred from co-expression of genes (CX)
We downloaded 12 microarray data sets containing no less than 8 samples of gene expression from Gene Express Omnibus (GEO) 23 on August, 2013. Pearson correlation coefficient was measured between all pairs of gene vectors of expression values to infer functional association between two genes. We tested a total of 12 microarray data sets and were able to infer functional links from seven of them: GSE31911 and 6 published data sets [24][25][26][27][28][29] including 191 samples (Table S8).

Co-functional links inferred from domain co-occurrence between proteins (DC)
Domains recur functional regions of proteins. Because domains are functional subunits of proteins, proteins that share a similar set of domains may contribute to a similar function. Using profiles of domain occurrence for proteins by InterPro database 30 , we measure likelihood of functional association for given tendency of domain co-occurrence (DC) between two proteins. To learn a more informative co-occurrence pattern, we used a weighted version of the mutual information score, in which higher weights were given to rarer domains under the assumption that rarer domains harbor more specific pathway information.

Co-functional links inferred from genomic contexts (PG, GN)
We used genomic context information of C. neoformans proteins to discover functional associations between genes with two different methods, phylogenetic profiles [31][32][33] and gene neighbors [34][35][36] . The similarity of phylogenetic profiles between two C. neoformans genes reflects the degree of co-inheritance of two genes during speciation, because functional constraints between functionally coupled genes mainly determine co-inheritance pattern. We first ran BLASTP to compare all C. neoformans protein sequences against all protein sequences from 1,626 Bacteria genomes, 122 Archaea genomes, and 396 Eukaryota genomes. Phylogenetic profile matrices of the blast hit scores were constructed, and the similarity between profiles was measured by mutual information scores as described in 37 . For C. neoformans genes, we found that the similarity of phylogenetic profiles for each of the two domains of life (Archaea and Bacteria) performed better than that for all 2,144 genomes in retrieving gold-standard functional links. Therefore, we integrated the two domain-specific networks into a single network by phylogenetic profiles. For network inference by genomic neighborhood across 1,746 prokaryote genomes, we used two approaches to measure the genomic neighborhood: chromosomal distance between neighboring genes 35,36,38 and probability of observed neighborhood 34 . Because we previously found complementarity between these two methods 39 , we integrated them into a single network by genomic neighborhood.

Co-functional links by orthology-based transfer from yeast and human networks
Associalogs are conserved functional associations transferred from different species by orthology 21 . We transferred conserved functional links between C. neoformans genes from YeastNet v3 40  ,where A and B are C. neoformans genes and Aʹ and Bʹ are orthologous genes from S. cerevisiae or human, and the transferred functional association of Aʹ-Bʹ from that of A-B obtains weighted values as how likely A-Aʹ and B-Bʹ are orthologous by Inparanoid. We transferred 7 and 2 co-functional associations of YeastNet v.3 and HumanNet, respectively (summarized in Table S1).

Assessment of prediction power of networks for pathogenicity
To assess predictive power of CryptoNet for C. neoformans pathogenicity, we used a total of 73 genes involved in three different virulence phenotypes were collected from literatures: 28 genes for capsule formation, 23 genes for melanin production, and 22 genes for thermotolerance. (Table S2). Predictions were performed for each of virulence phenotype. Ranks of virulence genes were assigned by sum of LLS by CryptoNet links from a gene to all other genes for the same virulence phenotype. If the known virulence genes are well interconnected by a given network, they are highly ranked by this network-assisted prioritizing method. The performance of network-assisted ranking was then assessed by the receiver operating characteristic (ROC) curve, which is usually summarized as a single score, area under the ROC curve (AUC). The ROC curve in general represents true positive (TP) rate for the given cost of false positive (FP) rate.

⁄ ⁄
, where TP is a correctly predicted virulence gene, FP is an incorrectly predicted virulence gene, true negative (TN) is a correctly predicted non-virulence gene, and false negative (FN) is an incorrectly predicted non-virulence gene. AUC score ranges between 0.5 and 1, which indicate random prediction and perfect prediction, respectively. Visualization of networks for three virulence phenotypes were conducted by Cytoscape 43 .

Prediction of novel candidate genes for anti-fungal drug resistance
To predict candidate genes for antifungal drug resistance, we devised a method of searching for context-associated hub genes. This method requires two components: subnetworks and an expression signature. Each subnetwork is composed of a hub gene having no less than 50 connected neighbors by CryptoNet and its neighbors. We predefined 2,135 subnetworks for the given number of neighbors as a threshold. An expression signature represents a cellular context such as drug stress condition. To construct an expression signature for antifungal drug stress, we collected 230 C. neoformans genes that were up-regulated by >2-fold upon treatment of fluconazole 44 (Table S4). Normalization of the expression data was conducted by Limma 45 . For the given pair of gene sets, one for 230 up-regulated genes by drug treatment and the other for neighbor genes of each of 2,135 hubs, we measured significance of enrichment using Fisher's exact test. If neighbors for a hub gene are significantly enriched among the up-regulated genes by fluconazole treatment, the corresponding hub gene is considered to be associated with the context of fluconazole treatment. We found 94 hub genes are significantly associated with gene expression response to fluconazole treatment (pvalue < 0.05).

Construction of ypk1Δ, sho1Δ and kin1Δ mutants
The genes were deleted with a disruption cassette containing split Nat r dominant selectable marker generated by double joint PCR as described before 46 . The gene disruption cassette was introduced into the H99S strain through the biolistic transformation method, as previously described 47 . The correct genotype of these mutants was confirmed by Southern blot analysis as described before 48 . Primers for this disruption and Southern blot analysis were described in Table S9.

Assay for virulence factor production, thermotolerance, and antifungal drug resistance
The C. neoformans Madhani collection strains used in this study were provided by Fungal Genetics Stock Center. Before tests, we confirmed the correct genotype of each strain by diagnostic PCR to check potential cross-contamination during strain storage and recovery. Each strain was cultured in a liquid YPD (yeast extract-peptone-dextrose) medium for 16 hours at 30C. The agar-based DME (Dulbecco's modified Eagle's) medium (Invitrogen, Carlsbad, CA) for capsule production, and agar-based Niger seed medium, which contains indicated concentration of glucose (0.1% and 0.5%), for melanin biosynthesis were prepared as previously described 49 . The capsule production of strains showing growth defect at 37C were determined at 30C. The relative capsule size of each cell was quantitatively measured, previously described 49 . Melanin production was monitored and photographed daily. For thermotolerance test, strains were incubated overnight at 30°C in a liquid YPD medium, washed, serially diluted (1 to 10 4 dilutions) with dH 2 O, and spotted onto a solid YPD medium. Each plate was incubated at 37C or 39C for 2-4 days and photographed for incubation period. For antifungal drug resistant test, each serially diluted strain (1 to 10 4 dilutions) was spotted onto a solid YPD medium containing the indicated concentration of azoles (fluconazole, itraconazole, and ketoconazole) or amphotericin B and incubated at 30C and photographed for 2 to 5 days.

Galleria mellonella infection assay and in vivo mouse study
G. mellonella in the final instar larval stage (15 larvae per strain) was infected through prolegs with 800,000 Cryptococcus cells in 4 μl of PBS by the Hamilton syringe. After injection, larvae were incubated at 30 o C or 37 o C for 14 days and then larvae showing signs, such as changes in body color and no movement in response to touch, were considered dead.
Larvae transforming into pupa were censored. The Kaplan-Meier survival curves were constructed by Prism 5.01 (GraphPad Software) and p-values were calculated from Gehan-Breslow-Wilcoxon test.
Four-to six-week-old female A/Jcr mice (National Cancer Institute, 18-20 g) were utilized in this study. For infection, strains were cultured in YPD medium overnight at 30°C, washed twice with phosphate buffered saline (PBS), and resuspended in PBS at 2×10 6 cells per ml. Serially diluted cells were plated onto YPD medium and incubated at 24°C for 72 hr to determine viability and CFU. Ten mice per strain (except 9 mice for strain YSB1720 due to one death after pentobarbital treatment) were anesthetized with pentobarbital (Lundbeck Inc.) and infected via intranasal instillation with 10 5 cells (in 50 µl). Survival was monitored twice daily, and moribund mice were CO 2 -euthanized. The Kaplan-Meier survival curves were generated with Prism 5.02 (GraphPad Software), and P values were evaluated from a Logrank (Mantel-Cox) test.

Ethics statement
The animal studies at Duke University Medical Center were in full compliance with the guidelines of the Duke University Medical Center Institutional Animal Care and Use Committee (IACUC) and the United States Animal Welfare Act (Public Law 98-198). The Duke University Medical Center IACUC approved all of the animal studies. The studies were conducted under protocol number A217-11-08 in Division of Laboratory Animal Resources (DLAR) facilities that are accredited by the Association for Assessment and Accreditation of Laboratory Animal Care (AAALAC).