A self-consistent probabilistic formulation for inference of interactions

Large molecular interaction networks are now being assembled in biomedical research, driven by major technological advances. Diverse interaction measures, whose input consists solely of the incidence of causal factors and the corresponding outcome of an inquired effect, are formulated without an obvious mathematical unity. Consequently, conceptual and practical ambiguities arise. We identify here a probabilistic requirement consistent with that input and find, by the rules of probability theory, that it leads to a model multiplicative in the complement of the effect. These theoretical derivations also reveal important practical properties that have not been noticed before.


INTRODUCTION
The underlying mechanisms of complex functional networks are typically poorly understood or inaccessible to measurement (1,2). The information available for wiring these interaction networks typically consists of incidence patterns of factors at one end of the causation process and the outcome of an inquired effect at the other, far end. These patterns can indicate the presence of functionally interconnected events, although they rarely provide cues about when, where and how the intermediate events interconnect. However, when the immediate consequences of factors are separated in space-time, the factors are often regarded as non-interacting, even if their farther consequences interconnect before reaching the effect. Evading the notion of "distant" interactions leads to the recommendation of multiple models of no interaction (null models) within the same study (3-9). For example, besides the risk-additive null model vehemently advocated in epidemiology, a risk-multiplicative null model has been invoked for factors acting separately on different steps of a multistage pathogenesis process (3,5), and has also been prescribed for independence of protecting factors (4,6-9).
A multiplicity of no-interaction models conveys unnecessary conceptual and pragmatic difficulties. How many null models of no interaction can be conceived within the same study? Which model is appropriate for each tested pair? Can an interaction model prescribed for one case be mathematically equivalent to a null model of no interaction prescribed for another case? How can one be distinguished from the other in each particular case? Space-time resolution of connected events is hardly possible from incidence data of factors and effect alone.
A more practical perspective has been tacitly adopted in large-scale interaction studies by relying on a single non-interaction model for the whole analysis (10-12). We learn as much as the evidence permits from the sole diagnosis of the presence of interactions, disregarding their space-time separation. However, further conceptual ambiguities hinder the identification of a unique appropriate model. It is usually agreed that factors interact when the effect of their combination deviates from what would be expected from their individual, separate effects (4,13). Though intuitive, this consensus does not lead to a definite physical meaning; it merely defers the issue to what a reasonable expectation of the effect of joint independent factors is (2,14,15), leaving the concept unresolved. Indeed, diverse and inconsistent measures of interaction pervade the literature (7,10-13,16-20).
Here we undertake a novel resolution of the concept of interaction from basic probabilistic principles and elementary requirements, with a definition that embraces distant interactions. The stated requirements are sufficient to determine a unique model. Our derivation reveals theoretical properties unnoticed so far, which translate into superior performance when analyzing the data of the largest wiring experiment on Saccharomyces cerevisiae to date (12).

RESOLUTION OF THE CONCEPT OF INTERACTION
Probabilistic Model
We are interested in classifying the interaction or non-interaction relationships between factors with respect to an effect. The contextual background is generally unknown, and the effects are not fully controlled by the status of the factors but respond stochastically. A probabilistic framework is required to account for this uncertainty. The occurrence of $Y$ can be classified into three manifestations $Y_A$, $Y_B$ and $Y_C$, corresponding to the effect of exposure to factors $A$, $B$ and $C$, respectively (Figure 1). This correspondence can be represented by the logical expression $Y \equiv Y_A \vee Y_B \vee Y_C$, where $\vee$ denotes logical OR. Note that $Y_A$, $Y_B$ and $Y_C$ are not necessarily mutually exclusive. Generally, our data do not allow us to discern between these manifestations, which are indistinguishably observed as $Y$. This decomposition is only a temporary construct. As we will see, the variants $Y_A$, $Y_B$ and $Y_C$ cancel in our derivations, leaving equations that depend only on observables.

Our elementary assumption is that, for non-interacting factors $A$ and $B$, the probability $\Pr(\bar{Y} \mid ab) = 1 - \Pr(Y \mid ab)$ must satisfy the factorization
$$\Pr(\bar{Y} \mid ab) = \Pr(\bar{Y}_A \mid a)\,\Pr(\bar{Y}_B \mid b)\,\Pr(\bar{Y}_C), \qquad \forall a \in A,\ b \in B, \tag{1}$$
for all realizations $a$, $b$ of the factors $A$, $B$. The overbar denotes logical negation, henceforth indicating the non-occurrence of an effect or the non-exposure to a factor. Concatenation denotes logical AND. In this case $\bar{Y} = \bar{Y}_A \bar{Y}_B \bar{Y}_C$, since none of $Y_A$, $Y_B$ or $Y_C$ is realized when the effect $Y$ is not realized. A particular logical framework yielding (1) is shown in Methods. Everything that follows is derived from (1), which is the minimal requirement of no interaction.
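A minimal numerical sketch of requirement (1), under the illustrative assumption (ours) that the manifestations $Y_A$, $Y_B$ and $Y_C$ are realized independently with the hypothetical probabilities below. The exact product of complements is compared with a Monte Carlo estimate of $\Pr(\bar{Y} \mid ab)$ obtained from $Y \equiv Y_A \vee Y_B \vee Y_C$; the function names are hypothetical.

```python
import random


def exact_no_effect(p_ya, p_yb, p_yc):
    """Pr(not Y | ab) under factorization (1): product of the three complements.

    p_ya = Pr(Y_A | a), p_yb = Pr(Y_B | b), p_yc = Pr(Y_C) (hypothetical values).
    """
    return (1 - p_ya) * (1 - p_yb) * (1 - p_yc)


def simulated_no_effect(p_ya, p_yb, p_yc, n=200_000, seed=1):
    """Monte Carlo estimate of Pr(not Y | ab), with Y = Y_A OR Y_B OR Y_C and
    the three manifestations drawn independently (an illustrative assumption)."""
    rng = random.Random(seed)
    no_effect = 0
    for _ in range(n):
        y_a = rng.random() < p_ya
        y_b = rng.random() < p_yb
        y_c = rng.random() < p_yc
        if not (y_a or y_b or y_c):  # Y not realized: none of Y_A, Y_B, Y_C occurred
            no_effect += 1
    return no_effect / n


print(exact_no_effect(0.2, 0.3, 0.1))      # 0.8 * 0.7 * 0.9, about 0.504
print(simulated_no_effect(0.2, 0.3, 0.1))  # close to the exact value
```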

Neutral model
Factorization (1) has practical limitations, since it is expressed in terms of the unobservables $Y_A$, $Y_B$ and $Y_C$. It is shown in Methods that the probability $\Pr(\bar{Y} \mid ab)$ of non-interacting factors $A$ and $B$ must satisfy an equality (2) involving a neutrality function of $(a, b)$ defined in (3). Equality (2) is a requirement of the neutral model resulting from factorization (1), but expressed in terms of the observables $A$, $B$ and $Y$ only.

Abstraction from mechanistic details
The notation $Y_A$, $Y_B$, $Y_C$ and $C$ encapsulates the unobserved mechanistic structure in our logical framework. All these structures cancel in the derivation of the neutral model (2)-(3) (see Methods). Therefore, departure from (2) implies that, for any resolution of the effect $Y$ into a disjunction $Y_A \vee Y_B \vee Y_C$, requirement (1) cannot hold. In other words, there is no possible separation of the pathways leading to effect $Y$ that can be independently associated with the factors $A$ and $B$. In this case, information on the effects of independent exposure is not enough to predict their joint effect. Fortunately, because of this structural cancellation, searching for a particular splitting $Y_A$, $Y_B$ and $Y_C$ of the effect $Y$ satisfying (1), which might not even exist, is not required for the diagnosis of interaction. This is of crucial importance in practice. For example, large-scale interaction studies are nowadays performed to build interconnected maps of simpler organisms (12). Millions of gene pairs are tested, but only a very small fraction interacts. This already daunting task would be impossible if the molecular mechanisms mediating each pair had to be known a priori to detect interaction. Instead, the presence of interactions can be diagnosed by equality (2), and subsequent experiments to discern interaction mechanisms can then be specifically targeted at promising pairs, without wasting resources on the rejected pairs.

Safety device against spurious susceptibility
If a factor not mechanistically associated with an effect is correlated with a causal factor, it can appear coincidentally associated with the effect without any actual causal connection. Such is the case of a "neutral" locus in close proximity (linkage disequilibrium) to a locus causally connected to a given disease: it mimics the frequencies and correlations of the causal locus and displays a spurious association with the disease.
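Suppose $\Pr(Y_A \mid a)$ is the risk of a cancer $Y_A$ given a mutation $a$ of a gene $A$ causally associated with that cancer. Let $B$ be a gene not causally associated with that cancer, such that $\Pr(Y_A \mid ab) = \Pr(Y_A \mid a)$ for every allele $b \in B$. Then
$$\Pr(Y_A \mid b) = \sum_{a \in A} \Pr(Y_A \mid a)\,\Pr(a \mid b), \tag{4}$$
which associates gene $B$ with the effect $Y_A$, even when the molecular machinery involved in the disease is not perturbed by this gene. The right side of (4) can be interpreted as the expected risk originating from the variants of the causal factor $A$, in the proportions in which they co-occur with the variant $b$ of factor $B$. The structure $\Pr(ab)$ of the population spuriously "propagates" the susceptibility of factor $A$ to factor $B$, and that is why neutral genes are often erroneously associated with diseases.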
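The propagation described by (4) can be sketched numerically. All numbers below are hypothetical: gene $A$ is causal for $Y_A$, gene $B$ is not ($\Pr(Y_A \mid ab) = \Pr(Y_A \mid a)$), but the alleles of $B$ co-occur with those of $A$, as under linkage disequilibrium.

```python
# Hypothetical risks conferred by the causal gene A: Pr(Y_A | a).
RISK_GIVEN_A = {1: 0.30, 0: 0.05}

# Hypothetical co-occurrence of A alleles with B alleles: Pr(a | b).
A_GIVEN_B = {
    1: {1: 0.9, 0: 0.1},
    0: {1: 0.1, 0: 0.9},
}


def propagated_risk(b):
    """Right side of (4): sum over a of Pr(Y_A | a) * Pr(a | b)."""
    return sum(RISK_GIVEN_A[a] * A_GIVEN_B[b][a] for a in RISK_GIVEN_A)


for b in (1, 0):
    print(f"Pr(Y_A | b={b}) = {propagated_risk(b):.3f}")
# Pr(Y_A | b=1) = 0.275
# Pr(Y_A | b=0) = 0.075
```

Gene $B$ thus appears to carry an allele-dependent risk for $Y_A$, although by construction it plays no causal role.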
The marginal terms $\Pr(\bar{Y} \mid a)$ and $\Pr(\bar{Y} \mid b)$ introduce spurious susceptibilities $\Pr(\bar{Y}_B \mid a)$ and $\Pr(\bar{Y}_A \mid b)$ in the numerator of (3) (see also equation (10) in Methods). These fake associations cancel with the denominator, and the neutrality function ends up depending only on the pure susceptibility carriers $\Pr(\bar{Y}_A \mid a)$ and $\Pr(\bar{Y}_B \mid b)$ (see equations (11)-(12) in Methods).

Invariance to population structure
Along with the cancellation of the propagated association, the dependency on the structure $\Pr(ab)$ of the joint distribution of factor variants, which appears in the denominator of (3), also disappears (see equation (12) in Methods). Hence, the neutrality function derived here is invariant to the marginal distribution of factors.
Correlated structure of $\Pr(ab)$ is ubiquitous in populations, controls and patients, and its impact on inferences has long been recognized in population genomics and genetics. However, as far as we know, no such built-in cleaning device has been revealed in previous approaches to interaction. This issue has been ignored or, at best, patched a posteriori. The correlation cleaning can be particularly relevant for experimental design and data analysis in big-data scenarios, like cancer genome projects.

Interaction measure
The background factor distribution $\Pr(ab)$ is required in the denominator of (3). However, the structure of $\Pr(ab)$ is often unknown, and gathering related information usually requires expensive experimental designs. One way to get rid of these terms is to take ratios:
$$J = \frac{\Pr(\bar{Y} \mid ab')\,\Pr(\bar{Y} \mid a'b)}{\Pr(\bar{Y} \mid ab)\,\Pr(\bar{Y} \mid a'b')}, \tag{5}$$
where $a, a' \in A$ and $b, b' \in B$. Note that $J$ is expressed in terms of observables only, without requiring the distribution $\Pr(ab)$ to be computed. Replacing each probability term in (5) by (1) reduces the non-interaction hypothesis to $J = 1$ or, taking logarithms, to
$$\log J = 0. \tag{6}$$
The magnitude of interaction is an increasing function of $|\log J|$, and $\log J > 0$ or $\log J < 0$ indicates positive or negative interaction, respectively (see the demonstration in Methods, equations (16)-(20)). Equations (5) and (6) lead to a model multiplicative in the complement of the effect,
$$\Pr(\bar{Y} \mid ab) = \frac{\Pr(\bar{Y} \mid ab')\,\Pr(\bar{Y} \mid a'b)}{\Pr(\bar{Y} \mid a'b')}, \tag{7}$$
which can also be verified directly from (1).
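A short sketch of (5)-(7) for binary factors, with 1 denoting exposure and 0 non-exposure ($a = b = 1$, $a' = b' = 0$). The ratio is oriented, as assumed here, so that $\log J > 0$ marks positive interaction. Both 2x2 tables are hypothetical: the first is built from factorization (1), so $\log J$ should vanish; the second lowers $\Pr(\bar{Y} \mid ab)$ below the neutral prediction (7), so $\log J > 0$.

```python
import math


def interaction_ratio(p_no):
    """J from (5); p_no[(a, b)] = Pr(not Y | ab), with (1, 1) the doubly exposed cell."""
    return (p_no[(1, 0)] * p_no[(0, 1)]) / (p_no[(1, 1)] * p_no[(0, 0)])


def neutral_prediction(p_no):
    """Model (7), multiplicative in the complement of the effect."""
    return p_no[(1, 0)] * p_no[(0, 1)] / p_no[(0, 0)]


# Hypothetical pathway probabilities Pr(Y_A | a), Pr(Y_B | b), Pr(Y_C) plugged into (1).
p_ya = {1: 0.2, 0: 0.0}
p_yb = {1: 0.3, 0: 0.0}
p_yc = 0.1
neutral = {(a, b): (1 - p_ya[a]) * (1 - p_yb[b]) * (1 - p_yc)
           for a in (0, 1) for b in (0, 1)}

# Same table, but the effect occurs more often than (7) predicts in the (1, 1) cell.
synergy = dict(neutral)
synergy[(1, 1)] = 0.40

print(math.log(interaction_ratio(neutral)))  # approximately 0 (rounding only)
print(math.log(interaction_ratio(synergy)))  # positive: positive interaction
```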
Although this model has been advocated in toxicology (21,22), it is mostly ignored in the dominant epidemiology and genetic-network literature (1,13,23).

DETECTING GENETIC INTERACTIONS
Interaction and fitness
Several genome-scale interaction studies have been conducted in Saccharomyces cerevisiae (13), and the recent literature illustrates the quandary in the resolution of the concepts of interaction and fitness (see Methods). Can the measures used in these studies be explained in terms of the model multiplicative in the complement of the effect? How do these models perform in the detection of interactions? To address these questions in a unified manner, the problem, originally contextualized in terms of genes and fitness, is reformulated here in terms of factors and effect.
Let $r$ denote the average rate of cell duplication per unit of time. The probability for a strain $ab$ that a cell duplicates at least once in a lapse of time $t$ (i.e. $Y \equiv n > 0$, with $n$ the number of duplications) can be modeled according to (24) by
$$\Pr(Y \mid ab) = 1 - e^{-r_{ab} t}, \qquad \Pr(\bar{Y} \mid ab) = e^{-r_{ab} t}. \tag{8}$$
In this case $a$ and $b$ might denote gene variants at two loci $A$ and $B$. We choose, without loss of generality, the average duplication time of the wild-type strain as the unit of time, i.e. $r_{00} = 1$. Substituting (8) in (5) and taking logarithms, the measure (6) for the genetic interaction case yields
$$\log J = t\,(r_{11} + r_{00} - r_{01} - r_{10}). \tag{9}$$
In experiments where cells grow at a constant rate, the parameter $r$ is not the duplication rate but the growth rate, and $n$ is not the number of duplications but the number of cells. Derivations (8)-(9) follow faithfully.
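A sketch checking that (5), evaluated with the Poisson form of (8), reduces to rates that add up as in (9). The assumptions are ours: $t = 1$ by default, and the rates below are hypothetical values relative to wild type.

```python
import math


def no_duplication(r, t=1.0):
    """Pr(no duplication in time t) for Poisson duplications at rate r, as in (8)."""
    return math.exp(-r * t)


def log_j_from_rates(r00, r01, r10, r11, t=1.0):
    """log J of (5) evaluated with (8); by construction it equals t*(r11 + r00 - r01 - r10)."""
    numerator = no_duplication(r10, t) * no_duplication(r01, t)
    denominator = no_duplication(r11, t) * no_duplication(r00, t)
    return math.log(numerator / denominator)


# Hypothetical rates relative to wild type (r00 = 1), with t = 1.
score = log_j_from_rates(r00=1.0, r01=0.8, r10=0.7, r11=0.4)
print(round(score, 6))  # -0.1, i.e. 0.4 + 1.0 - 0.8 - 0.7
```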
By posing the genetic interaction problem in terms of probabilities of factors and effect, we have shown that the model multiplicative in the complement of the effect implies additivity of duplication rates (9), demonstrating along the way that the measure of interaction used by Jasnos et al. (11) can be justified from the basic principles appealed to here.
In what follows, the performance of the derived measure is evaluated with data from the largest global interaction network study in Saccharomyces cerevisiae (12). We compare the measure (9) with the measure multiplicative on rates, $\mathcal{M} = f_{11} - f_{01} f_{10}$, used in the original study, where $f$ denotes the growth rates, relative to wild type, of single-mutant (01 and 10) and double-mutant (11) isogenic cell populations (see Data source in Methods for more details).

Interaction networks
The overall impact on the interaction network is shown in the $\log J$ vs. $\mathcal{M}$ plot in Figure 2A for the Essential x Essential (ExE) gene-pair dataset. A similar plot is obtained for Nonessential x Nonessential (NxN) pairs (Figure S1A). The initial correlation between the most extreme negative interactions (lower-left quadrant) progressively deteriorates, owing to a remarkable propensity of the measure $\log J$ to score positive interactions relative to $\mathcal{M}$.
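This propensity can be traced with a little algebra, under the assumptions (ours) that $t = 1$ in (9) and that relative rates are identified with the fitness values $f$, so that $\log J = f_{11} + 1 - f_{01} - f_{10}$. Then $\log J - \mathcal{M} = (1 - f_{01})(1 - f_{10})$, which is non-negative for two deleterious single mutants and grows as their fitness drops. The sketch checks this identity on hypothetical fitness triples.

```python
def m_score(f01, f10, f11):
    """Multiplicative measure of the original study: M = f11 - f01 * f10."""
    return f11 - f01 * f10


def j_score(f01, f10, f11):
    """log J of (9), assuming t = 1 and rates equal to fitness relative to wild type."""
    return f11 + 1.0 - f01 - f10


# Hypothetical fitness triples (f01, f10, f11).
for f01, f10, f11 in [(0.9, 0.9, 0.81), (0.5, 0.5, 0.25), (0.5, 0.4, 0.3)]:
    m, j = m_score(f01, f10, f11), j_score(f01, f10, f11)
    gap = (1 - f01) * (1 - f10)  # predicted difference log J - M
    print(f"f=({f01}, {f10}, {f11})  M={m:+.3f}  logJ={j:+.3f}  gap={gap:.3f}")
```

For example, two single mutants at fitness 0.5 with a double mutant at 0.25 are neutral for $\mathcal{M}$ but give $\log J = 0.25$ under these assumptions.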
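We adopted from the original work (12) the regions $|\mathcal{M}| > 0.08$ to classify interactions at p-value < 0.05. The symmetric regions $|\log J| > 0.0886$ contain the same number of interacting pairs at p-value < 0.05 for the ExE data. An overbar indicates no interaction: $n_{\bar{M}J}$ denotes the number of pairs regarded as no interaction by $\mathcal{M}$ and as interaction by $J$; $n_{M\bar{J}}$ those regarded as interaction by $\mathcal{M}$ and no interaction by $J$; $n_{MJ}$ those regarded as interaction by both $\mathcal{M}$ and $J$; and $n_{\bar{M}\bar{J}}$ those regarded as no interaction by both $\mathcal{M}$ and $J$. About 18% of the interactions assigned by one measure are missed by the other. Hence, if one of the measures were correct, the other would miss 18% of the interactions, while 18% of its reported interactions would be spurious. The relative frequency of negative genetic interactions detected by $\mathcal{M}$ exceeds that of positive interactions in both the ExE and NxN libraries (Figure S1B, and Figure 1A in (12)). However, with $J$ the pattern reverses, and positive interactions now prevail in the ExE library. Within this library, $J$ scored more than 20 000 new positive interactions, whereas the number of negative interactions decreased to a similar extent (Figure 2A). An even distribution of positive and negative interactions is produced by $J$ with the NxN library (Figure S1B).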
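The four agreement classes can be sketched as follows. The thresholds are the ones quoted above; the p-value < 0.05 requirement of the original analysis is omitted (an assumption of this sketch), the scores are hypothetical, and the labels are our own spelling of $n_{MJ}$, $n_{M\bar{J}}$, $n_{\bar{M}J}$ and $n_{\bar{M}\bar{J}}$ with "~" for the overbar.

```python
from collections import Counter

M_CUT = 0.08    # |M| > 0.08, adopted from the original study
J_CUT = 0.0886  # |log J| > 0.0886, matched to the same number of ExE interactions


def agreement_class(m, log_j):
    """Return 'MJ', 'M~J', '~MJ' or '~M~J'; '~' marks no interaction for that measure."""
    m_part = "M" if abs(m) > M_CUT else "~M"
    j_part = "J" if abs(log_j) > J_CUT else "~J"
    return m_part + j_part


# Hypothetical (M, log J) scores for four gene pairs.
pairs = [(-0.20, -0.25), (0.05, 0.30), (-0.09, -0.05), (0.01, 0.02)]
counts = Counter(agreement_class(m, j) for m, j in pairs)
print(counts)
```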
Major discrepancies between $\mathcal{M}$ and $J$ in detecting positive genetic interactions arise for double-mutant fitness values below 0.5, spanning interaction J-scores from 0.2 to 0.7 in both libraries (empty triangle on the positive side of Figure 2B, filled by $J$ in Figure 2C; similarly in Figure S1C, dark-gray dots). This suggests that the $\mathcal{M}$ measure is particularly insensitive to low double-mutant fitness values. On the other hand, negative genetic interactions are consistently captured by both measures across different magnitudes of interaction and fitness (lower-left quadrant in Figure 2A, left side of Figures 2B-C, and Figures S1A-C).

Functional Annotations
To evaluate how the interacting gene pairs segregate into functional categories with each measure, we computed the numbers $n_{MJ}$, $n_{\bar{M}J}$ and $n_{M\bar{J}}$ of pairs co-annotated to the same biological process (GO BP), pathway (KEGG) or molecular complex (Data S1-S3). On average, $\mathcal{M}$ missed 31% to 44% of the positive interactions annotated by $J$, most notably within molecular complexes (Figure S2A), whereas the positive interactions missed by $J$ did not exceed 6%. On the other hand, $J$ missed 1.6% to 25% of the negative interactions annotated by $\mathcal{M}$ across the three functional categories. A similar global pattern was obtained from the interaction analysis of the NxN library (Data S5-S7; Figure S3A).
Subsequently, we highlighted processes, pathways and molecular complexes with higher rates of annotated interactions missed by either measure within the ExE library (Figure S

Biological Processes
More than 25% of the positive interactions reported by $J$, spanning 140 of 189 GO_BP terms with 500 or more interactions, are not reported by $\mathcal{M}$ (Data S1). In particular, the highest rates (> 50%) pertained to "ribonucleoprotein complex subunit organization" and "ribonucleoprotein complex assembly", where $J$ detected 567 and 545 positive interactions ($n_{MJ} + n_{\bar{M}J}$), respectively, while $\mathcal{M}$ detected only 267 and 257 of them. Comparatively, $J$ missed 0.5% to 13% of the positive interactions reported by $\mathcal{M}$.
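The "> 50%" figure follows directly from the counts quoted above, as this small check shows; the helper name is ours.

```python
def missed_fraction(detected_by_j, also_detected_by_m):
    """Share of J's positive interactions, n_MJ + n_~MJ, that M does not report."""
    return (detected_by_j - also_detected_by_m) / detected_by_j


# Counts quoted in the text for two GO_BP terms: (n_MJ + n_~MJ, n_MJ).
TERMS = {
    "ribonucleoprotein complex subunit organization": (567, 267),
    "ribonucleoprotein complex assembly": (545, 257),
}

for term, (by_j, by_both) in TERMS.items():
    print(f"{term}: {missed_fraction(by_j, by_both):.1%} missed by M")
# ribonucleoprotein complex subunit organization: 52.9% missed by M
# ribonucleoprotein complex assembly: 52.8% missed by M
```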
On the other hand, 5% to 36% of the negative interactions reported by $\mathcal{M}$ are missed by $J$. In 71 of 234 GO_BP categories, the number of interactions missed by $J$ (i.e. $n_{M\bar{J}}$) represents more than 25% of the total number of interactions reported by $\mathcal{M}$ ($n_{MJ} + n_{M\bar{J}}$). Among the top 10 GO_BP categories with the highest rates of missed annotations for both positive and negative interactions, five are related to ribosome biogenesis (Figure S2B).

Pathways
In 20 of the 35 pathways comprising more than 10 interactions, the $\mathcal{M}$ measure would miss between 25% and 100% of the positive genetic interactions scored by $J$ (Data S2, Figure S2A, Figure S4A-B). Of note, $\mathcal{M}$ missed more than 40% of the interactions reported by $J$ in the pathways "Basal transcription factors", "Ribosome biogenesis in eukaryotes" and "Proteasome", which together comprise more than 300 genetic interactions. In particular, positively interacting gene pairs co-annotated to the pathway "Ribosome biogenesis in eukaryotes" emerged as under-annotated by $\mathcal{M}$.
By contrast, negative genetic interactions were almost equally scored by both measures (Figure S4B). The largest divergence among the negative interactions was found within the pathway "Ribosome biogenesis in eukaryotes", with 26 of 243 interacting pairs (10.7%) classified as negative by only one of the two measures. Altogether, roughly 120 genetic interaction pairs co-annotated to the pathway "Ribosome biogenesis in eukaryotes" are differentially scored by $\mathcal{M}$ and $J$ (56.9%).

Molecular complexes
Among the molecular complexes defined by Costanzo et al. 2016 containing more than 10 interaction pairs, 30 and 54 comprised positive and negative interactions, respectively (Data S3). Overall, negative interactions were roughly 15-fold more frequent than positive interactions across the molecular complexes. Two major and related complexes, the "preribosome-large subunit precursor" and the "90S preribosome", comprised a large number of divergent annotations between $\mathcal{M}$ and $J$ (Figure S2C). More than 50% of positive genetic interactions are missed by $\mathcal{M}$, whereas 30-40% of negative interactions are missed by $J$.

Missing hubs
$J$ and $\mathcal{M}$ agree on more than 80% of the detected interactions, and several hubs are co-identified within the ExE and NxN libraries (Data S8). Both measures produce hubs with similar degree, identity and magnitude of genetic interactions (Figure S5A-B, left). However, the spread above the diagonal in the plots of connection degree suggests the detection by $J$ of highly connected genes not detected by $\mathcal{M}$ in both libraries (Figure S5A-B, right). The candidate hubs missed by $\mathcal{M}$ are listed in Table 1, and the distribution and magnitude of their interactions can be appreciated in Figure 3A and B for both measures. Notably, the few connections detected only by $\mathcal{M}$ lie close to the borderline $\mathcal{M} = -0.08$.
The five candidate hubs surfaced by $J$ within the ExE library are annotated with 27 to 81 physical interactions and 10 to 41 genetic interactions (SGD; http://www.yeastgenome.org/), and their temperature-sensitive (TS) alleles significantly decrease fitness (between 0.2037 and 0.3420), as expected for a hub protein (Table S1). Moreover, the five genes are pleiotropic, as verified by their multiple Gene Ontology annotations using GO-Slim terms (25).
The candidate hubs tif35 and tim17 are essential components of molecular complexes involved in protein translation and in the mitochondrial import channel structure (26,27). The genes trm112, noc4 and rrp7 are involved in ribosome biogenesis and export, and are located in the nuclear compartment of the cell (28-30). Accordingly, these genes displayed expression correlation with a set of 20 genes enriched for the GO_BP ribosome biogenesis (Data S4: SPELL analysis ACS > 5.3, p-value = 7.31e-23) (31). Moreover, trm112, noc4 and rrp7 are included in the GO_BP categories "ncRNA processing" and "ribonucleoprotein complex biogenesis", which comprised a large number of interactions missed by $\mathcal{M}$ (Figure S2B; Data S1). Noc4 is also a member of the protein complex "90S preribosome", a category where $\mathcal{M}$ missed about 70% of the positive interactions reported by $J$ (Figure S2C; Data S3). Altogether, the five candidate hubs displayed more than 700 previously unnoticed positive interactions, spanning at least three different biological processes and two cellular compartments (Table 1 and Table S1). The extensive rewiring of the local genetic interaction network of the hubs trm112, noc4 and rrp7 is illustrated in Figure 3C-E. Although no direct genetic interaction was detected among them, $J$ identified 123 intermediary connectors (distance 1). Notably, this set of connecting genes was enriched for biological processes such as RNA processing (40/123 genes, p < 1.34e-6), ribonucleoprotein complex subunit organization (21/123, p < 1.47e-5) and ribonucleoprotein complex biogenesis (34/123, p < 1.47e-4) (see Methods). In contrast, $\mathcal{M}$ found nop14 as the unique intermediary connector between two of these hubs, and its interaction network is rather sparse (Figure 3E).

Figure 4: Double mutant fitness (f11) vs J/M interaction score comprising positive interactions (ExE library). Black and dark-gray dots indicate MJ co-identified interactions, the former with coordinate M and the latter with coordinate J. Red and green dots indicate $\bar{M}J$ and $M\bar{J}$ interactions. The newly identified hubs are highlighted in magenta. A) All positive interactions. B) Masking interactions. C) Suppressor interactions. D-F) Distribution of double mutant fitness f11 of the interactions detected by M. G-I) Distribution of double mutant fitness f11 of the interactions detected by J.
Four candidate hubs surfaced from the NxN library after a similar analysis, accounting for more than 490 new positive interactions (Table 1). These hubs are annotated with several physical and genetic interactions, and their deletion in the yeast genome significantly decreased cell fitness (Table S1).
A symmetric analysis was performed to detect candidate hubs with more than 10 connections detected only by $\mathcal{M}$ (i.e. $n_{M\bar{J}} > 10$), of which $J$ missed more than 90% (i.e. $n_{MJ} + n_{\bar{M}J} < 0.1\, n_{M\bar{J}}$). No candidates were found in either library (ExE or NxN), not even with the less restrictive filter $n_{MJ} + n_{\bar{M}J} < n_{M\bar{J}}$, suggesting that the $J$ measure is able to detect meaningful interactions beyond the reach of $\mathcal{M}$.

Masked and suppressor interactions
Considering that the major divergences between the interaction measures comprise positive genetic interactions surfaced only by $J$, we further classified them into suppressor interactions, $f_{11} > \min(f_{01}, f_{10})$, and masking interactions (the remaining positive interactions). Moreover, we analyzed the distribution of the emerging hubs listed in Table 1 across these types of alleviating genetic interactions (Figure 4 and Figure S7). Most of the newly detected positive interactions were masking interactions (19703 of 22622, 87%), resulting in double-mutant fitness values $f_{11}$ within 0.1-0.8 (Figure 4G-I). By contrast, masking interactions producing $f_{11} < 0.4$ are poorly detected by $\mathcal{M}$ (Figure 4D-F). The number of J-surfaced masking interactions matches the number of suppressor interactions detected in the ExE library (Figure 4I).
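The suppressor/masking split reduces to a single comparison, sketched below; the function name and fitness values are hypothetical, and the pair is assumed to have already been called a positive interaction.

```python
def positive_type(f01, f10, f11):
    """Split a positive interaction: suppressor if f11 > min(f01, f10), otherwise masking.
    Assumes the pair has already been called positive by the chosen measure."""
    return "suppressor" if f11 > min(f01, f10) else "masking"


print(positive_type(0.3, 0.6, 0.25))  # masking: double mutant no fitter than the sicker single
print(positive_type(0.3, 0.6, 0.50))  # suppressor
print(f"{19703 / 22622:.0%} masking among newly detected positives")  # 87%
```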
In turn, the five ExE hubs take part in 643 masking (89%) and 81 suppressor (11%) interactions. Such hubs are surfaced by $J$ at fitness values $f_{11} < 0.4$ (Figure 4B-C). In particular, trm112, noc4 and rrp7, which are involved in ribosome biogenesis, displayed 78 positive interactions within this process: 70 masking and eight suppressor.

DISCUSSION
The resolution of the concept of interaction accomplished here from first principles recognizes the inclusion of multistage distant interactions as a more realistic approach. The implied neutrality function leads to the model of no interaction multiplicative in the complement of the effect. The derivations revealed theoretical properties unnoticed so far, which turned out to be particularly relevant for genetic interaction mapping in large-scale assays, where the distribution of gene variants differs between populations and genetic susceptibilities are spuriously propagated by linkage disequilibrium. The warranted abstraction from the intricacies of the molecular mechanisms mediating the interactions is of crucial practical importance for global genetic interaction mapping and other big-data scenarios.
The choice of the interaction measure determines the identity, sign, strength and distribution of genetic interactions across functional modules and subcellular compartments of the cell, conditioning the functional mapping. Genetic interaction profiles have been used to assemble hierarchical models of cell function in Saccharomyces cerevisiae (12). The gene-fitness interaction question, reformulated in terms of factors and effect, leads to a simple additive formula that also benefits from the theoretical properties revealed here, which other interaction measures lack. The comparative performance shown on the genetic interaction data implies:
• Partial rewiring of regions of the currently known genetic interaction landscape in yeast, with functional implications and relevance to comparative genomics studies.
• Less aversion to positive interactions, re-assessing their roles in biological processes, pathways and molecular complexes, and now suggesting that positive interactions prevail among essential genes.
• Capture of experimentally supported hubs, mostly enriched in positive interactions, that are otherwise missed by the current multiplicative measure.
• A more important role of masking interactions in ribosome biogenesis than previously conceived.
It has previously been considered that interaction models become unreliable for low-fitness double mutants (15). Precisely, the major divergence between $\mathcal{M}$ and $J$ comprises positive genetic interactions of the masking type resulting in low-fitness phenotypes (< 0.3) and spanning a wide range of interaction scores ($0.2 < J < 0.7$).
Costanzo et al. gauge the functional similarity between genes with the Pearson correlation coefficient (PCC) of their genetic interaction profiles, and create similarity networks that organize genes into clusters highlighting biological modules (12). Figure 2E emphasizes the discrepancies between the PCC computed with the $J$ and $\mathcal{M}$ measures (ExE library). Of note, more than 30% of the pairs reported with similarity PCC > 0.2 by one measure are missed by the other. Since the genetic interaction profile of a particular gene is composed of its specific array of negative and positive interactions, a dissimilar annotation of such interactions is expected to redefine some functional domains, as well as to modify other profile-similarity-based derivations, such as prediction of gene function, pleiotropy and network connectivity (12).
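A sketch of the profile-similarity step: the PCC of two interaction profiles is computed once with each measure. The profiles are invented for illustration and the threshold is the one quoted above; the original pipeline (12) may handle missing values and profile construction differently.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length interaction profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


SIMILARITY_CUT = 0.2  # PCC > 0.2 threshold quoted in the text

# Hypothetical profiles of two query genes against the same five array genes,
# scored once with M and once with log J.
profiles_m = ([-0.20, 0.00, 0.05, -0.10, 0.00], [-0.15, 0.02, 0.00, -0.12, 0.01])
profiles_j = ([-0.15, 0.25, 0.10, -0.05, 0.30], [-0.10, 0.05, 0.28, -0.08, 0.02])

for label, (g1, g2) in (("M", profiles_m), ("log J", profiles_j)):
    r = pearson(g1, g2)
    print(f"{label}: PCC = {r:.2f}, similar = {r > SIMILARITY_CUT}")
```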
The architectural features of complex systems found in social and technological networks are shared by molecular interaction networks within a cell. This universality suggests that similar laws govern most complex networks in nature. Nothing in the derivation of the interaction model restricts its scope to network biology or biomedical applications. The unity of the arguments elaborated here permits dissimilar interaction models and experimental data to be analyzed within a common framework. We hope this unified view contributes to ending a vivid controversy and the complacency with the coexistence of different models of interaction that plague the literature with inconsistent results.
Systematic analyses of tripartite or more complex genetic interaction networks are beginning to emerge (34,35). Hence, theoretical research and experimental designs remain critical for a more accurate construction of the higher-order interaction landscape.

Log-linearity of the neutral model
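The logarithm of $\Pr(\bar{Y} \mid ab)$ in non-interaction or interaction scenarios can be expressed in the form
$$\log \Pr(\bar{Y} \mid ab) = \lambda + \lambda_a + \lambda_b - \lambda_{ab}, \qquad \forall a \in A,\ b \in B, \tag{14}$$
where a generally valid assignment can be conveniently obtained by fixing $\lambda$, $\lambda_a$ and $\lambda_b$ and setting
$$\lambda_{ab} = \lambda_a + \lambda_b - \log \frac{\Pr(\bar{Y} \mid ab)}{\Pr(\bar{Y} \mid \bar{a}\bar{b})}. \tag{17}$$
In the particular non-interaction case, the minimal requirement (1) holds, and the summation terms cancel in the numerator and denominator of (13), yielding the condition (2) of no interaction.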
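A sketch of (14) and (17) for binary factors, writing the parameters as $\lambda$, $\lambda_a$, $\lambda_b$, $\lambda_{ab}$ and taking the unexposed cell $\bar{a}\bar{b}$ as reference. The choices of $\lambda$, $\lambda_a$ and $\lambda_b$ below are assumptions of this sketch, not the Methods equations. $\lambda_{ab}$ then follows from (17), the cell $ab$ is recovered exactly by (14), and for a table obeying factorization (1) $\lambda_{ab}$ vanishes, so that $e^{\lambda_a + \lambda_b}$ equals $\Pr(\bar{Y} \mid ab) / \Pr(\bar{Y} \mid \bar{a}\bar{b})$.

```python
import math


def log_linear_params(p_no):
    """Parameters of (14) for a 2x2 table p_no[(a, b)] = Pr(not Y | ab), (0, 0) = unexposed.

    Assumed assignment: lam = log Pr(~Y|00), lam_a = log[Pr(~Y|10) / Pr(~Y|00)],
    lam_b = log[Pr(~Y|01) / Pr(~Y|00)]; lam_ab follows from (17).
    """
    lam = math.log(p_no[(0, 0)])
    lam_a = math.log(p_no[(1, 0)] / p_no[(0, 0)])
    lam_b = math.log(p_no[(0, 1)] / p_no[(0, 0)])
    lam_ab = lam_a + lam_b - math.log(p_no[(1, 1)] / p_no[(0, 0)])  # (17)
    return lam, lam_a, lam_b, lam_ab


# Hypothetical non-interacting table built from factorization (1).
table = {(0, 0): 0.9, (1, 0): 0.72, (0, 1): 0.63, (1, 1): 0.504}
lam, lam_a, lam_b, lam_ab = log_linear_params(table)
print(lam_ab)  # approximately 0
print(math.exp(lam_a + lam_b), table[(1, 1)] / table[(0, 0)])  # equal when lam_ab = 0
```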

Meaning of log-linear parameters
Far from being merely a convenient mathematical approximation valid within a limited subset of the neutral functions, the log-linear form of the neutral functions is a generally valid model of no interaction (21).
Suitable meanings of the parameters $\lambda$, $\lambda_a$, $\lambda_b$ can be suggested from the probability terms in (16) and (18), explicitly relating the unobserved structure of the inner mechanism to observed ones. In the particular non-interaction scenario, according to the neutral function (19), the exponential of $\lambda_a + \lambda_b$ is the probability of $\bar{Y}$ relative to no exposure.

Genes and fitness
We inspected various interaction network studies in Saccharomyces cerevisiae from the recent literature that collected growth measurements of wild-type (00), single-mutant (01 and 10) and double-mutant (11) isogenic cell populations (13). Even though they addressed the same interaction question and advocated the same null multiplicative model $w_{11} w_{00} = w_{01} w_{10}$ on fitness $w$, their definitions of fitness differ, and so do their predictions (13).
Jasnos et al. assayed growth curves of the progeny of 639 randomly crossed pairs of isogenic individuals carrying deletions, conferring slow growth, in one of 758 genes (11). These authors defined fitness $w$ by the factor $e^{rt}$ of a population growing continuously at a rate $r$, and chose the null model on the log-fitness scale, which becomes additive on rates: $\varepsilon = (r_{00} + r_{11}) - (r_{01} + r_{10})$.
Onge et al. studied the interaction of 650 double-deletion strains, corresponding to pairings of 26 nonessential genes that confer resistance to the DNA-damaging agent methyl methanesulfonate (MMS). These authors defined the fitness of each deletion strain directly by its duplication rate $r$, relative to that of wild type (10)

Data source
We downloaded the normalized interaction data files SGA_ExE and SGA_NxN from http://thecellmap.org/costanzo2016/ (12). The normalization removed systematic biases in colony size arising from experimental factors, and a model of fitness and genetic interactions for each double mutant was fit to the normalized colony sizes (12). For our purpose, entries with NaN in any of the numerical fields or with negative fitness values were ignored. The columns named "Query single mutant fitness (SMF)", "Array SMF" and "Double mutant fitness" are denoted here $f_{01}$, $f_{10}$ and $f_{11}$, respectively.
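A minimal sketch of the filtering step using only the Python standard library. The column names are those quoted above; the tab delimiter and the restriction of the NaN check to these three columns are assumptions of this sketch, and the in-memory sample merely stands in for SGA_ExE / SGA_NxN.

```python
import csv
import io
import math

# Column names as quoted in the text, mapped to the symbols f01, f10, f11.
COLUMNS = {
    "Query single mutant fitness (SMF)": "f01",
    "Array SMF": "f10",
    "Double mutant fitness": "f11",
}


def load_fitness(handle, delimiter="\t"):
    """Yield {'f01', 'f10', 'f11'} dicts, skipping rows with NaN, empty or negative fitness.

    Assumptions of this sketch: tab-delimited text with a header row, and only
    these three numerical columns are checked for NaN.
    """
    for row in csv.DictReader(handle, delimiter=delimiter):
        try:
            values = {short: float(row[name]) for name, short in COLUMNS.items()}
        except (KeyError, TypeError, ValueError):
            continue  # missing column or unparsable entry
        if any(math.isnan(v) or v < 0 for v in values.values()):
            continue
        yield values


# Tiny in-memory example standing in for the downloaded files.
sample = (
    "Query single mutant fitness (SMF)\tArray SMF\tDouble mutant fitness\n"
    "0.9\t0.8\t0.7\n"
    "NaN\t0.8\t0.7\n"
    "0.9\t-0.1\t0.5\n"
)
print(list(load_fitness(io.StringIO(sample))))  # [{'f01': 0.9, 'f10': 0.8, 'f11': 0.7}]
```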
For the sake of fairness, we do not introduce further data processing. The plots and analyses are purposely kept at a factual level, as close as possible to the normalized data of the original source. In doing so, the comparison fits well with the exposition of the original work.

Functional Annotations
Biological Processes
The Gene Ontology (GO) biological process data and the yeast gene association files were downloaded from http://geneontology.org/ on August 19, 2014 (Ashburner et al., 2000). We considered only those BP terms with more than 500 interaction pairs scored by both measures, regardless of interaction sign (TP). Then, we computed the fraction of gene pairs that displayed positive interaction according to J but

Figure 1: Sketch of interaction and no-interaction scenarios. $A$ and $B$ are observed factors, $C$ accounts for unobserved processes, and $Y$ is the effect of interest. The effect is classified into variants $Y_A$, $Y_B$ and $Y_C$, related to each factor. a) No-interaction scenario. b) Interaction scenario (there is a crossover of the pathways leading from the factors to the effect).

Figure 1 sketches response patterns of mechanisms perturbed by two binary observed factors $A$ and $B$ leading to a dichotomous effect $Y$. Unknown background agents are considered in the unobservable $C$.

Figure 2: J vs M differential scoring of interacting gene pairs retrieved from the Essential × Essential (ExE) SGA library constructed in Saccharomyces cerevisiae (12). Black dots indicate MJ co-identified interactions, red dots   , green dots    and light-gray dots   . A) Comparison of the magnitude of interaction computed from the multiplicative measure ℳ (X axis) and the additive measure J (Y axis). The log bar-chart inset represents the  ,   ,    and    frequencies. Vertical and horizontal lines delimit the no-interaction region enclosed by -0.08 < ℳ < 0.08 and -0.0886 < J < 0.0886. The dashed line indicates the over-conservative interaction threshold J > 0.34, used to illustrate strong interactors in candidate hubs missed by ℳ. B) Double mutant fitness f11 vs. M interaction score for the interacting gene pairs detected within the ExE library. C) Like B), but f11 vs. J. D)    (red dots) and    (green dots) with coordinates M and J, respectively. E) Functional similarity (PCC). Vertical and horizontal lines at PCC = 0.2 delimit the similarity threshold.
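The grouping of gene pairs in Figure 2 can be sketched as follows, using the no-interaction bounds quoted in the caption (open intervals, so a score exactly on a bound counts as an interaction). This is a hypothetical helper, not the authors' code: it takes the M and J scores as given inputs rather than recomputing them, and the category labels are our own. The last function computes, from the category counts, the share of each measure's calls that the other measure misses, which is the kind of quantity behind the ~18% overlap discussed in the text.

```python
from collections import Counter

# No-interaction bounds from the Figure 2 caption:
# -0.08 < M < 0.08 and -0.0886 < J < 0.0886 (open intervals).
M_BOUND = 0.08
J_BOUND = 0.0886

def category(m, j, m_bound=M_BOUND, j_bound=J_BOUND):
    """Label a pair by which measure(s) place it outside the no-interaction region."""
    by_m = not (-m_bound < m < m_bound)
    by_j = not (-j_bound < j < j_bound)
    if by_m and by_j:
        return "MJ"  # co-identified by both measures (black dots)
    if by_j:
        return "J only"
    if by_m:
        return "M only"
    return "none"

def category_counts(scores):
    """Count pairs per category (cf. the bar-chart inset); scores holds (m, j) tuples."""
    return Counter(category(m, j) for m, j in scores)

def missed_fractions(counts):
    """Return (share of M calls missed by J, share of J calls missed by M)."""
    called_by_m = counts["MJ"] + counts["M only"]  # Counter returns 0 for absent keys
    called_by_j = counts["MJ"] + counts["J only"]
    m_missed = counts["M only"] / called_by_m if called_by_m else 0.0
    j_missed = counts["J only"] / called_by_j if called_by_j else 0.0
    return m_missed, j_missed
```

When both measures call a similar number of interactions, the two shares are close, so a single percentage describes both directions.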

Figure 3: Candidate hubs missed by M, listed in Table 1. A)   vs.   of the ExE library. B)   vs.   of the NxN library. C) Trm112, noc4 and rrp7 interacting networks according to J. E) The same networks according to M. D) zap1, vma7, msm1 and rpb4 interacting networks according to J. F) The same networks according to M.

FIGURES S1 TO S8
Pr(  |) is the risk of a mutation  of a gene  causally associated with cancer   . Let  be a gene not causally associated with that cancer, such that Pr(  | ) = Pr(  |) for every allele  ∈ .
About 18% of the interactions assigned by one measure are missed by the other. Hence, if one of the measures were correct, the other would miss 18% of the interactions, while 18% of the interactions it reports would be spurious, i.e.
The relative frequency of negative genetic interactions detected by ℳ exceeds that of positive interactions in both the ExE and NxN libraries (Figure S1B, and Figure 1A in

Table 1: Candidate hubs obtained with the J measure that are missed by the M measure, identified from the ExE and NxN libraries.
Thus, we explore genes that, according to M, have less than 10% of the number of interactions captured only by J, i.e. satisfying   +   < 0.1. The candidate hubs so captured from the ExE and NxN libraries are listed in Table 1.

… suppose that the logarithm of Pr(| ) can be expressed in the linear form  +   +  , where  is a constant,   depends only on  ∈ , and   depends only on  ∈ . The numerator in (13) becomes … (36,37). The null model takes the form $\mathcal{M} = f_{11} - f_{01} f_{10}$, where $f_{00} = 1$.

Costanzo et al. wired the most extensive global genetic interaction network in Saccharomyces cerevisiae, with over 23 million double mutants involving 5 416 different genes (12), including the first large-scale interaction network comprising ~120 000 pairs of essential genes. These authors modeled colony growth rate  based on the empirical observation that colony area  scaled linearly with time (36,37), i.e.   = , and defined fitness as proportional to the rate of change of colony area relative to that of wild type, i.e.  ∝ . As in Onge et al., the null model takes the form $\mathcal{M} = f_{11} - f_{01} f_{10}$, although their $f$ denotes growth rate instead of duplication rate.
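The definitions in the last paragraph (linear colony growth, fitness as the growth rate relative to wild type, and the multiplicative score) can be illustrated with a short Python sketch. It is only a toy under stated assumptions: colony areas are noise-free and linear in time, the rate is estimated with an ordinary least-squares slope (our choice), and it does not reproduce the normalization and model fitting of Costanzo et al. The helper names `slope`, `fitness` and `multiplicative_score` are hypothetical.

```python
def slope(times, areas):
    """Least-squares slope of colony area vs. time (linear growth assumed)."""
    n = len(times)
    t_mean = sum(times) / n
    a_mean = sum(areas) / n
    num = sum((t - t_mean) * (a - a_mean) for t, a in zip(times, areas))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

def fitness(times, areas, wt_rate):
    """Growth rate relative to wild type, so the wild type itself has f00 = 1."""
    return slope(times, areas) / wt_rate

def multiplicative_score(f01, f10, f11):
    """Deviation from the multiplicative null model: M = f11 - f01 * f10."""
    return f11 - f01 * f10

# Toy colonies (made-up areas measured at four time points).
times = [0.0, 1.0, 2.0, 3.0]
wt_rate = slope(times, [0.0, 2.0, 4.0, 6.0])          # wild-type rate 2.0
f01 = fitness(times, [0.0, 1.6, 3.2, 4.8], wt_rate)   # 0.8
f10 = fitness(times, [0.0, 1.2, 2.4, 3.6], wt_rate)   # 0.6
f11 = fitness(times, [0.0, 0.6, 1.2, 1.8], wt_rate)   # 0.3
score = multiplicative_score(f01, f10, f11)           # 0.3 - 0.48 = -0.18
```

In this toy case the double mutant grows more slowly than the product of the single-mutant fitnesses predicts, so the score is negative.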

Table S1: Features of J-exclusive candidate hubs within the ExE and NxN libraries (J > 0 and M > 0).
Data S1: Segregation of J vs M interacting gene pairs (ExE library) across Biological Processes
Data S2: Segregation of J vs M interacting gene pairs (ExE library) across Pathways
Data S3: Segregation of J vs M interacting gene pairs (ExE library) across Molecular Complexes defined by Costanzo et al. 2016
Data S4: Genes with expression profiles most similar to the query gene set rrp7, noc4 and trm112
Data S5: Segregation of J vs M interacting gene pairs (NxN library) across Biological Processes
Data S6: Segregation of J vs M interacting gene pairs (NxN library) across Pathways
Data S7: Segregation of J vs M interacting gene pairs (NxN library) across Molecular Complexes defined by Costanzo et al. 2016
Data S8: Hubs equally identified (shared) by J and M within the ExE and NxN libraries.