Abstract
Statistical models that analyse (pairwise) relations between variables encompass assumptions about the underlying mechanism that generated the associations in the observed data. In the present paper we demonstrate that three Ising model representations exist that, although each proposes a distinct theoretical explanation for the observed associations, are mathematically equivalent. This equivalence allows the researcher to interpret the results of one model in three different ways. We illustrate the ramifications of this by discussing concepts that are conceived as problematic in their traditional explanation, yet when interpreted in the context of another explanation make immediate sense.
Introduction
Scientific advances can be achieved by two types of theories: those that simply seek to identify correlations between observable events without regard to linking mechanisms; and those that specify the mechanisms governing the relations between observable events (Bandura, p. 21)^{1}.
Examining the structure of observed associations between measured variables is an integral part of many branches of science. At face value, associations (or their quantification in the form of correlations) inform about a possible relation between two variables, yet contain no information about the nature and direction of this relation. Making causal inferences from associations requires the specification of a mechanism that explains the emergence of the associations^{2,3}. By constructing an explanatory model that accounts for the associations in the data and has testable consequences at the level of the joint distribution of variables, it is possible to test the adequacy of the model against the data. When the model is deemed sufficiently adequate with respect to the data, this is often perceived as justification for the proposed causal interpretation^{4}.
We can discern (at least) three general frameworks, each representing a different mechanism to explain the emergence of associations between variables, with their own collection of corresponding explanatory models. These frameworks and their corresponding statistical models all originate from different disciplines and have received considerable attention in fields as diverse as physics, mathematics, statistical mechanics, causality, biology, epidemiology and the social sciences. Although both authors originate from psychology and primarily illustrate their findings with examples from this field, the frameworks and models discussed in this paper clearly transcend the domain of psychology and as such have a multidisciplinary relevance. In the current paper we refer to these frameworks as, respectively, the common cause, reciprocal affect and common effect framework.
The common cause framework explains the observed associations through a latent (unobserved) variable acting as a common cause with respect to the manifest (observed) variables^{5}. Causal models that propose a common cause mechanism as generating the associations between manifest variables are also known as reflective models^{6,7}; the manifest variables are indicators of the latent variable and reflect its current state. In the statistical literature, models in this framework are therefore often referred to as latent variable models. Latent variable models have proven to be extremely successful at fitting observed multivariate distributions; at the same time, their theoretical and philosophical underpinning remains problematic. Latent variables are both a powerful and a controversial concept. In psychology, for example, the idea that psychological constructs such as intelligence^{8,9} and personality^{10} are latent variables has been the subject of much intense debate, in particular about the question whether one should take a realist interpretation of latent variables, that is, the latent variable signifying a real but hidden entity, to justify the use of latent variable analysis^{11,12}. An important reason for this is that a latent cause is never observed and, similar to physics around the turn of the 20th century, there was a need for…
… an epistemological programme: to rid the theory of ‘unobservables’, that is, of magnitudes inaccessible to experimental observation; to rid it, one might say, of metaphysical elements (Popper, p. 211)^{13}.
It is in the reciprocal affect framework that we find such a programme without ‘unobservables’. In this framework the relations between variables are represented as is, in that observed associations between manifest variables are explained as a consequence of mutualistic relations between these variables^{14}. The idea that variables are associated as a consequence of reciprocal affect has been formalised in the field of network analysis^{15} and has been studied extensively in diverse fields of science such as mathematics, physics and biology^{16,17,18,19}. The Ising model is a suitable example of a statistical model in this framework, as it captures all main effects and pairwise interactions between variables^{20,21}. While originally a model of ferromagnetism from statistical physics, in the last decade the Ising model has been adopted within the social sciences, where the network perspective has been gaining much popularity^{22,23,24,25}. As an alternative to the latent variable perspective, the network approach has led to valuable new insights about, for example, psychopathology and the aforementioned concept of intelligence^{26}.
In the final, common effect framework, observed associations between manifest variables are explained as arising from (unknowingly) conditioning on a common effect of these variables; the manifest variables are marginally independent with respect to each other, yet their collective state leads to the occurrence (or absence) of the effect^{27,28}. Variables can act as the collective cause towards an effect in (at least) two ways, either as the separate indicators of an artificial compound score (e.g. socioeconomic status, SAT)^{29}, or as determinants of a naturally occurring phenomenon, such as agents culminating in the outbreak of an epidemic. In the statistical literature, the term collider variable models is used for this framework, as the collective state of the independent variables collides into the effect^{4,30}. Because of the independence, one would naturally expect not to find any associations between these variables. However, from the literature on causality it is known that conditioning on a collider variable introduces (spurious) correlations among the variables functioning as the collective cause. This phenomenon is known as endogenous selection bias and will result in the observation of associations between the manifest variables^{30,31,32,33,34}.
It is clear that each of these frameworks proposes a radically different explanation for the emergence of associations between a set of manifest variables. In this paper we argue that these differences only exist with respect to the theoretical interpretation of these frameworks. Specifically, we demonstrate that the prototypical statistical models for binary data in each framework are mathematically equivalent and that this equivalence extends to more realistic models that capture all main effects and pairwise interactions between the observed variables. Through this we obtain three, statistically equivalent, representations of the Ising model that each explain the occurrence of associations between binary variables by a theoretically very distinct mechanism.
Results
Prototypical models
To enhance the readability of this section we start by introducing the variables that return in all discussed models and clarify the mathematical notation used in the text and equations for the distribution functions. We will denote random variables with capital letters and possible realisations of random variables with lower case letters. We represent vectors with boldfaced letters and use boldfaced capital letters to indicate matrices for parameters. Manifest variables are denoted with Roman letters, whereas we use Greek letters to indicate latent variables and parameters that need to be estimated.
In the context of the paper we are primarily interested in the vector X = [X_{1}, X_{2}, …, X_{N}], consisting of N binary random variables that can take +1 and −1 as values, as we look to the mechanism by which the three frameworks explain the observed associations between the realisations of this vector, denoted by x = [x_{1}, x_{2}, …, x_{N}]. Furthermore, each of the models we discuss includes a vector containing the main effect parameters δ = [δ_{1}, δ_{2}, …, δ_{N}], consisting of N numbers in ℝ. Except for equation (1), which we write out in full, we use ∑_{i} and ∏_{i} to denote respectively ∑_{i=1}^{N} and ∏_{i=1}^{N} for the remainder of the equations. Additionally, we use p(x) to denote p(X = x), which extends to all variables in both conditional and joint probability distributions, such that we can read it as the probability of observing some realisation of the random vector X, optionally, conditional on, or together with, the realisation of some other variable.
We consider the Rasch model^{35}, an Item Response Theory (IRT) model from the field of psychometrics, as the prototypical model for binary data in the common cause framework. Historically, the Rasch model has been developed for modelling the responses of persons to binary scored items on a test. The model is graphically represented in Fig. 1(a) as a Directed Acyclic Graph (DAG)^{36}, where the latent random variable Θ acts as the common cause of the manifest random variables X. The Rasch model is characterised by the following distribution for the manifest variables (X), conditional on the latent variable (Θ):
The marginal probabilities for X can be obtained by endowing Θ with a distribution as shown in equation (2). While this gives us an expression for the manifest probabilities of the Rasch model, for almost all choices of the distribution f(θ) this expression becomes computationally intractable.
In the traditional interpretation of the Rasch model, x_{i} indicates whether the response to item i is correct (x_{i} = +1) or incorrect (x_{i} = −1). In this context, the continuous random variable (Θ) represents the latent ability being examined by the set of items (X). The vector δ contains the item main effects, where δ_{i} represents the easiness of item i, such that −δ_{i} represents the difficulty of item i with respect to the measured ability. The response of an individual on item i is a tradeoff between the ability of the person (θ) and the item difficulty (−δ_{i}). When the ability of the person is greater than the difficulty of the item, the probability for a correct response will be higher than for an incorrect response (θ > −δ_{i} ⇒ p(x_{i} = 1) > p(x_{i} = −1)); if the ability of the person is lower than the item difficulty the reverse holds (θ < −δ_{i} ⇒ p(x_{i} = 1) < p(x_{i} = −1)). As such, persons with a greater ability will always have a higher probability of giving a correct response, and a person always has a higher probability of a correct response on an easy item than on a more difficult item. A key property of the Rasch model is that of local independence, which entails that only variation in Θ determines the probability for a response on an item. That is, conditionally on the state of the latent variable all manifest variables are independent, whereas marginally (with respect to the latent variable) they are dependent. Consequently, any observed associations between the manifest variables can be traced back to the influence of the latent variable. It is the latent variable that causes the associations between the manifest variables, which is why the Rasch model falls within the common cause framework.
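The tradeoff between ability and difficulty described above can be illustrated with a minimal sketch. It assumes the usual logistic form of the Rasch model for ±1 coded responses, p(x_{i} | θ) = exp(x_{i}[θ + δ_{i}]) / (exp(θ + δ_{i}) + exp(−[θ + δ_{i}])); the function names and the parameter values are hypothetical and serve only as illustration.

```python
import itertools
import math

def rasch_item_prob(x_i, theta, delta_i):
    """P(X_i = x_i | theta) for x_i in {+1, -1}, assuming the logistic Rasch form."""
    eta = theta + delta_i
    return math.exp(x_i * eta) / (math.exp(eta) + math.exp(-eta))

def rasch_pattern_prob(x, theta, delta):
    """P(X = x | theta): a product over items, which is local independence."""
    p = 1.0
    for x_i, delta_i in zip(x, delta):
        p *= rasch_item_prob(x_i, theta, delta_i)
    return p

# Hypothetical values: delta_i is the easiness, -delta_i the difficulty.
delta = [0.5, -1.0]
theta = 0.0

p_easy = rasch_item_prob(+1, theta, delta[0])  # theta > -delta_1, so above 0.5
p_hard = rasch_item_prob(+1, theta, delta[1])  # theta < -delta_2, so below 0.5

# The conditional probabilities of all response patterns sum to one.
total = sum(rasch_pattern_prob(x, theta, delta)
            for x in itertools.product([+1, -1], repeat=len(delta)))
```

With these values the person is more likely than not to answer the easy item correctly and more likely than not to answer the hard item incorrectly, in line with the inequalities given in the text.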
For the reciprocal affect framework we examine the Curie-Weiss model from statistical physics^{37,38,39}, originally used to model the state of a set of magnetic moments, for which the thermodynamical properties correspond to those of the classical Curie-Weiss theory of magnetism and where the pairwise interactions between the magnetic moments are replaced by the mean magnetisation. Graphically, the Curie-Weiss model can be represented as an undirected graph wherein the manifest variables (X), representing the set of magnetic moments, are fully connected with each other and all connections are of equal strength, as illustrated in Fig. 1(b). The distribution of the manifest variables (X) in the Curie-Weiss model is given by:
In the conventional interpretation of the Curie-Weiss model, x_{i} indicates that the magnetic spin of moment i is upward (x_{i} = +1) or downward (x_{i} = −1), whereas the main effect for each moment (δ_{i}) indicates the natural preference of moment i to be in an upward (δ_{i} > 0) or downward (δ_{i} < 0) spin position, due to the external magnetic field not present in X. In equation (3) we use Z to represent the normalising constant, in thermodynamical systems often referred to as the partition function, that makes the distribution sum to one. In the Curie-Weiss model, this partition function sums over all 2^{N} possible configurations of the vector X, which we denote in this paper as ∑_{x}, and is given by the following expression:
As the pairwise interactions between the magnetic moments are replaced by the mean magnetisation, all interactions are captured by the squared sum of the set of moments in the exponential of the Curie-Weiss distribution. By averaging over individual interactions between magnetic moments, the Curie-Weiss model is the simplest nontrivial model that exhibits a phase transition in statistical mechanics. However, because the model violates fundamental principles of statistical physics and its predictions are only partially verified by experiments, it is considered to be mainly of theoretical interest^{37}. Nonetheless, due to its simplicity, the Curie-Weiss model has been useful in understanding the dynamics of equivalent phenomena in more realistic systems, such as the Ising model^{39}. Still, it is clear that as the observed associations between magnetic moments are presumed to emerge due to the magnetic interaction between these moments themselves, the Curie-Weiss model falls within the reciprocal affect framework.
Whereas in the Rasch model, from the common cause framework, associations between manifest variables are explained by the latent variable Θ, in the Curie-Weiss model, from the reciprocal affect framework, these associations are captured in the squared sum of the set of moments in the exponential of the distribution. The key ingredient for establishing the connection between these two models has been known for a long time and has also been rediscovered quite a few times in quite diverse fields of science^{37,40,41,42,43,44,45}. It was in his Brandeis lecture that Mark Kac^{37} established the relation between the Curie-Weiss model and the Rasch model through an ingenious use of the following Gaussian integral:
What Kac realised is that whenever you see the exponential of a square, you can replace it with the right-hand side integral from equation (5). In the Methods section we demonstrate that applying this Gaussian identity to the Curie-Weiss distribution from equation (3) linearises the squared sum in the exponential and introduces a random variable Θ, such that we obtain a latent variable representation of the Curie-Weiss model. We then show that, because the square in the exponential is gone, we can rewrite the expression for the latent variable representation of the Curie-Weiss model such that both the marginal distribution of the manifest variables and that of the manifest variables conditional on θ are identical to those of the Rasch model from equation (1) and equation (2). Having established the relation between the prototypical models from the common cause and reciprocal affect framework, we turn to our third framework.
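As an illustration of how the partition function normalises the distribution, the following sketch enumerates all 2^{N} configurations for a small system. It assumes that the Curie-Weiss distribution has the form p(x) = exp(∑_{i} x_{i}δ_{i} + [∑_{i} x_{i}]^{2}) / Z, that is, main effects plus the squared sum of the moments with unit coupling; the function names and parameter values are hypothetical.

```python
import itertools
import math

def cw_weight(x, delta):
    """Unnormalised Curie-Weiss weight: main effects plus the squared sum of spins."""
    spin_sum = sum(x)
    main = sum(x_i * d_i for x_i, d_i in zip(x, delta))
    return math.exp(main + spin_sum ** 2)

def curie_weiss(delta):
    """Return {configuration: probability} by brute-force enumeration."""
    configs = list(itertools.product([+1, -1], repeat=len(delta)))
    z = sum(cw_weight(x, delta) for x in configs)  # partition function Z
    return {x: cw_weight(x, delta) / z for x in configs}

delta = [0.2, -0.1, 0.0]  # hypothetical external-field preferences
p = curie_weiss(delta)

aligned = p[(+1, +1, +1)] + p[(-1, -1, -1)]  # mass on fully aligned states
```

Brute-force enumeration is only feasible for small N, since the number of terms in Z doubles with every added moment. Even for N = 3 the squared sum places almost all probability mass on the two fully aligned configurations.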
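The Gaussian identity can be checked numerically. The sketch below assumes that it takes the form exp(a^{2}) = ∫ (1/√π) exp(2aθ − θ^{2}) dθ, with the integral running over the whole real line; the integration limits, the step count and the function name are choices made here for illustration only.

```python
import math

def kac_rhs(a, lo=-20.0, hi=20.0, steps=40000):
    """Trapezoidal approximation of the integral of exp(2*a*theta - theta^2) / sqrt(pi)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        theta = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(2.0 * a * theta - theta * theta)
    return total * h / math.sqrt(math.pi)

# For N = 3 spins the sum of the x_i lies between -3 and 3.
rel_errors = {a: abs(kac_rhs(a) - math.exp(a * a)) / math.exp(a * a)
              for a in range(-3, 4)}
```

The quadratic term a^{2} on the left is traded for a term that is linear in a on the right, at the cost of introducing the integration variable θ. This is the step that later allows the exponential to factorise over the individual x_{i}.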
For the common effect framework we consider X as a set of independent random variables, which we will collectively call the cause, together with a single dependent binary random variable (E), which we will call the effect. Their joint distribution, given in equation (6), is a collider structure and can be graphically represented in a DAG as illustrated in Fig. 1(c).
In this collider distribution, x_{i} indicates whether cause i is active (x_{i} = +1) or inactive (x_{i} = −1), whereas e indicates whether the effect was present (e = 1) or absent (e = 0) at that time. The main effect for each cause (δ_{i}) denotes the natural predisposition for cause i to be active (δ_{i} > 0) or inactive (δ_{i} < 0) at any given time. As mentioned, and as shown by equation (9) in the Methods section, the individual causes are independent of each other in the marginal distribution of X. As a consequence, when we marginalise X with respect to E, the causes will not show any associations among each other. From the literature on causality it is however known that selection with respect to a common effect variable will introduce (spurious) correlations among the causes. That is, by using only observations of x where the common effect is present (e = 1), this set of observations will show a pattern of associations among the causes. This is known as endogenous selection bias with respect to a collider variable and can be mathematically represented in the distribution of the causes conditionally on the effect. In the Methods section we demonstrate that when we apply this selection bias mechanism to the collider structure from equation (6), the distribution of the collective cause (X) conditionally on the effect exactly gives the Curie-Weiss model from statistical physics and, hence, the Rasch model.
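The selection mechanism can be made concrete with a small exact computation. The sketch assumes independent causes with p(x_{i}) = exp(x_{i}δ_{i}) / (exp(δ_{i}) + exp(−δ_{i})) and an effect whose probability of being present is p(e = 1 | x) = exp([∑_{i} x_{i}]^{2} − N^{2}), which is one way of scaling the squared sum so that it is a valid probability; both forms, the function names and the parameter values are assumptions made for illustration.

```python
import itertools
import math

def cause_prob(x, delta):
    """Marginal probability of the causes: a product of independent terms."""
    p = 1.0
    for x_i, d_i in zip(x, delta):
        p *= math.exp(x_i * d_i) / (math.exp(d_i) + math.exp(-d_i))
    return p

def effect_prob(x):
    """Assumed P(e = 1 | x): grows with the squared sum, at most 1."""
    n = len(x)
    return math.exp(sum(x) ** 2 - n ** 2)

def covariance(dist, i, j):
    """Covariance of x_i and x_j under a {configuration: probability} dict."""
    mean_i = sum(p * x[i] for x, p in dist.items())
    mean_j = sum(p * x[j] for x, p in dist.items())
    return sum(p * x[i] * x[j] for x, p in dist.items()) - mean_i * mean_j

delta = [0.3, -0.2, 0.1]  # hypothetical predispositions
configs = list(itertools.product([+1, -1], repeat=len(delta)))

marginal = {x: cause_prob(x, delta) for x in configs}
joint_e1 = {x: marginal[x] * effect_prob(x) for x in configs}
p_e1 = sum(joint_e1.values())
selected = {x: w / p_e1 for x, w in joint_e1.items()}  # p(x | e = 1)

cov_marginal = covariance(marginal, 0, 1)  # zero up to rounding
cov_selected = covariance(selected, 0, 1)  # clearly positive
```

Marginally the first two causes are uncorrelated, whereas among the observations in which the effect is present they are strongly positively associated, even though no cause influences another.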
Realistic models
Having studied the three statistical explanations in their simplest nontrivial form, we conclude that, although their theoretical interpretation is radically different, the three models are mathematically indistinguishable. Still, the simplicity of the prototypical models for each framework also makes them often unrealistic with respect to the observed reality. Specifically, the Rasch model and simple collider model only allow for main effects between the observations and do not consider possible pairwise interactions. The Curie-Weiss model does allow some crude form of interaction; however, as the individual interactions between nearest neighbours are replaced by the mean interaction, one must make the (often) unrealistic assumption that all observations are interconnected with the same strength. Fortunately, we can swiftly generalise all three prototypical models to more realistic forms.
We start with the Ising model, of which the Curie-Weiss model is the simplest form, from the reciprocal affect framework. Like the Curie-Weiss model, the Ising model was originally introduced in statistical physics as a model for magnetism, with the same possible values and interpretation for X and its possible realisations. However, instead of only considering the mean magnetisation, the Ising model captures all pairwise interactions between the set of manifest variables (X). The distribution of the Ising model, where ∑_{<i,j>} is the sum over all distinct pairs of magnetic moments, is commonly written as follows:
The pairwise interactions are represented in the Ising model distribution by the symmetric N × N connectivity matrix Σ in ℝ^{N×N}. In this connectivity matrix, σ_{ij} modulates the reciprocal affect relation between x_{i} and x_{j}, indicating if moments i and j prefer to be in identical (σ_{ij} > 0), or opposing (σ_{ij} < 0) spin positions, wherein the higher the absolute value of σ_{ij}, the stronger this preference. Under the condition that all off-diagonal entries of Σ are equal, the Ising model reduces to the prototypical Curie-Weiss model. Because the diagonal values of the connectivity matrix in the Ising model are arbitrary, i.e., the probability of X is independent of these values, we can choose the values for the diagonal in such a way that the connectivity matrix becomes positive (semi) definite. As a consequence, all eigenvalues in the eigenvalue decomposition of the matrix will be nonnegative. As clarified in the Methods section, by applying this transformation to the Ising model distribution from equation (7) we obtain an eigenvalue representation of the Ising model:
where λ_{r} is the r^{th} entry of the vector of nonnegative eigenvalues Λ = [λ_{1}, λ_{2}, …, λ_{N}] and q_{ir} the value of the i^{th} row and r^{th} column of the N × N eigenvector matrix Q. In the equation above ∑_{r} should be read as ∑_{r=1}^{N}; we continue this practice in the notation of the applicable equations in the Methods section. In the Methods section we demonstrate how this eigenvalue representation allows us to connect the Ising model from the reciprocal affect framework to the more realistic models in both the other frameworks. First, by applying the Gaussian identity from Kac to the squared sum in the exponent for each eigenvalue in equation (8), we obtain a latent variable representation of the Ising model^{46}, with as many latent dimensions as there are nonzero eigenvalues. This latent variable representation of the Ising model is then shown to be the multidimensional IRT model^{47} from the common cause framework, of which the Rasch model is the simplest instance, but which allows for more than one latent variable to explain the observed associations between the manifest variables. Similarly, we can introduce (independent) effect variables for each eigenvalue in equation (8), such that we obtain a collider representation of the Ising model where endogenous selection bias has taken place. We then show that this distribution is a version of the common effect model as seen in equation (6), extended such that the collective cause can collide into more than one common effect.
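The step from the conventional to the eigenvalue representation can be verified numerically for a small network. The sketch below assumes the pairwise form p(x) ∝ exp(∑_{i} x_{i}δ_{i} + ∑_{<i,j>} σ_{ij}x_{i}x_{j}) and the eigenvalue form p(x) ∝ exp(∑_{i} x_{i}δ_{i} + ½ ∑_{r} λ_{r}[∑_{i} q_{ir}x_{i}]^{2}); the factor ½, the choice of the diagonal shift c and the hand-written Jacobi routine (used here only to avoid external dependencies) are assumptions of this illustration.

```python
import itertools
import math

def jacobi_eigh(matrix, sweeps=50, tol=1e-24):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, q) such that matrix = q * diag(eigenvalues) * q^T.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if abs(a[p][r]) < 1e-15:
                    continue
                d = a[r][r] - a[p][p]
                # Rotation angle that zeroes a[p][r], kept within [-pi/4, pi/4].
                if d >= 0:
                    angle = 0.5 * math.atan2(2 * a[p][r], d)
                else:
                    angle = 0.5 * math.atan2(-2 * a[p][r], -d)
                c, s = math.cos(angle), math.sin(angle)
                for k in range(n):  # update columns p and r of a and q
                    a[k][p], a[k][r] = c * a[k][p] - s * a[k][r], s * a[k][p] + c * a[k][r]
                    q[k][p], q[k][r] = c * q[k][p] - s * q[k][r], s * q[k][p] + c * q[k][r]
                for k in range(n):  # update rows p and r of a
                    a[p][k], a[r][k] = c * a[p][k] - s * a[r][k], s * a[p][k] + c * a[r][k]
    return [a[i][i] for i in range(n)], q

def normalise(weights):
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}

delta = [0.1, -0.2, 0.3]                    # hypothetical main effects
sigma = [[0.0, 0.5, -0.3],
         [0.5, 0.0, 0.2],
         [-0.3, 0.2, 0.0]]                  # hypothetical connectivity matrix
n = len(delta)

# Shift the (arbitrary) diagonal so the matrix is diagonally dominant, hence PSD.
c = max(sum(abs(v) for v in row) for row in sigma)
shifted = [[sigma[i][j] + (c if i == j else 0.0) for j in range(n)] for i in range(n)]
lam, q = jacobi_eigh(shifted)

pair, eig = {}, {}
for x in itertools.product([+1, -1], repeat=n):
    main = sum(x[i] * delta[i] for i in range(n))
    pairs = sum(sigma[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    squares = sum(lam[r] * sum(q[i][r] * x[i] for i in range(n)) ** 2 for r in range(n))
    pair[x] = math.exp(main + pairs)
    eig[x] = math.exp(main + 0.5 * squares)

p_pair, p_eig = normalise(pair), normalise(eig)
max_diff = max(abs(p_pair[x] - p_eig[x]) for x in p_pair)
```

Because x_{i}^{2} = 1, the diagonal shift only adds a constant to the exponent, which cancels in the normalisation. Both representations therefore assign the same probability to every configuration, while every eigenvalue in the second representation is nonnegative.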
Discussion
We have shown that the mathematical equivalence of the simple prototypical models from the common cause, reciprocal affect and common effect framework, extends to the more realistic counterparts of these models. That is, there exist three, statistically indistinguishable, representations of the Ising model that explain observed associations either through marginalisation with respect to latent variables, through reciprocal affect between variables, or through conditioning on common effect variables. We therefore argue that these are not three different models, but just one model for which three distinct theoretical interpretations have been developed in different fields of science. Consequently, any set of associations between variables that is sufficiently described by a model in one framework, can be explained as emerging from the mechanism represented by any of the three theoretical frameworks. We illustrate the implications of this by considering one of the most controversial topics in the common cause framework, differential item functioning (DIF)^{48} and discuss it in the context of the three possible interpretations.
In its traditional (common cause) framework, DIF indicates that, conditional on the level of the latent variable, the probability for some response is dependent on group membership. For items that exhibit DIF it is not only variation in Θ that determines the probability for a response on an item. From a common cause perspective the occurrence of DIF is a violation of local independence and, as such, of measurement invariance. In the context of ability testing DIF is often perceived as an indication of item bias^{49}. As a fictitious example, consider the situation where on certain items from the Revised NEO Personality Inventory (NEO PI-R)^{50}, which is intended to measure the Big Five personality traits^{51}, we find that for a group of subjects with the same latent trait score, those that listed their occupation as being a manager always have a higher probability of giving a correct response on these items, compared to subjects that do not work as a manager.
Needless to say, in this context items that exhibit DIF are seen as bad because they pose a problem for both the reliability and validity of a test. From a reciprocal affect perspective, DIF would exhibit itself in the form of differences in the estimated pairwise associations between items depending on group membership. As such the appearance of DIF in the NEO PI-R example would also be viewed as troublesome. However, in contrast to the common cause framework, the appearance of DIF in a network model is at least informative in that our model might be incomplete, i.e., the network is missing a node. The interpretation of DIF in a common effect framework is best understood, in the context of the current example, by considering the answer to the question: What causes people to obtain a managerial position as occupation? It is safe to say that in most cases a person's personality is an important factor in this process. In other words, people that are selected to become manager get this position because they possess a certain set of personality traits associated with being a successful manager^{52}. As such, the items in the NEO PI-R that show DIF in this case measure those personality traits that are most sought after in managers. More broadly, in the context of the common effect framework, the occurrence of DIF indicates how well an item predicts differences in the effect. In contrast to the disruptive interpretation of DIF in both the common cause and reciprocal affect frameworks, the occurrence of DIF within the context of the common effect framework is actually both sensible and informative.
The previous example clearly demonstrates how fundamental concepts that are firmly established in their traditional framework as being problematic can be perceived as neutral and informative, or even desirable, in another context. Having multiple possible interpretations for the same model allows for more plausible explanations when it comes to the theoretical concepts and the causal inferences we obtain from the measurement model applied to our data. For example, in the context of psychopathology, depression has habitually been treated as a common cause variable for which its symptoms are the interchangeable indicators. Measures of these symptoms with the popular Beck Depression Inventory^{53} have been shown to be fitted well by a latent variable model with one underlying general depression factor and three highly intercorrelated subfactors, or by a two-factor solution^{54}. However, in a common cause framework depression symptoms, such as sleep problems, loss of energy and trouble concentrating, are assumed independent of one another, as they are purely caused by the latent variable interpreted as depression. More recently it has been shown that a network model can also give an accurate description of data on depression symptoms^{26}. The reciprocal affect representation of depression, where symptoms can directly influence each other, is as an explanation more in line with our perceived reality. Furthermore, the historical success of theoretically very implausible models, such as the latent variable model, can thus in retrospect arguably be explained by the equivalence of these three models.
Being able to interpret the outcome of an applied measurement model from theoretically very distinct perspectives, instead of only the perspective as traditionally assumed by the model, is great progress, as it allows for novel explanations that might be a better reflection of our perceived reality. Furthermore, in their different fields of application different aspects of these models have been studied and different methodology has been developed. Through their connection many of these developments become available to all fields of application.
Methods
In this section we clarify the mathematics involved in connecting the simple prototypical models, as well as the more realistic Ising model representations, for the three different frameworks. In the first proof we demonstrate the equivalence between the simple collider, Curie-Weiss and Rasch models, the prototypical (yet unrealistic) models for respectively the common effect, reciprocal affect and common cause explanation for observed associations between a set of binary variables.
Proof for the equivalence of the simple prototypical models
Collider to Curie-Weiss
The simple collider model from the common effect framework is characterised by the following joint probability distribution p(x, e):
In order to connect this collider model to the Curie-Weiss model we introduce endogenous selection bias on the set of manifest variables forming the collective cause, by conditioning on the effect being present. This is mathematically represented as the conditional distribution p(x | e = 1), proportional to the product of the marginal distribution for the cause p(x) and the probability of observing the effect given the cause p(e = 1 | x), defined by:
We can simplify the expression for p(x | e = 1) by recognising that the product of exponentials in the numerator can be rewritten as a sum within the exponential. Furthermore, the denominator of the expression is only dependent on the sum of X and thus independent of the specific pattern that the realisation of X takes. As a consequence p(x | e = 1) is proportional to the numerator of equation (10), such that we can write:
In order to obtain a valid probability distribution we have to add the appropriate normalising constant that makes the probabilities sum to one again. In this case this translates to dividing the expression in equation (11) for a certain realisation of X by the sum of this expression for all possible configurations of X:
It can quickly be verified that the resulting expression in equation (12) is identical to the distribution for the Curie-Weiss model introduced in equation (3), with the same normalising constant as given in equation (4). This proves that, conditional on the effect being present, the distribution of the collective cause in the collider model is equivalent to the distribution of a set of directly interacting magnetic moments in the Curie-Weiss model.
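The equality can also be confirmed by exhaustive enumeration for a small N. The sketch below assumes independent causes with p(x_{i}) ∝ exp(x_{i}δ_{i}), an effect probability p(e = 1 | x) = exp([∑_{i} x_{i}]^{2} − N^{2}) and a Curie-Weiss distribution p(x) ∝ exp(∑_{i} x_{i}δ_{i} + [∑_{i} x_{i}]^{2}); these forms, the function names and the parameter values are assumptions made for illustration.

```python
import itertools
import math

def collider_given_effect(delta):
    """p(x | e = 1): marginal of the causes times P(e = 1 | x), renormalised."""
    n = len(delta)
    weights = {}
    for x in itertools.product([+1, -1], repeat=n):
        p_x = 1.0
        for x_i, d_i in zip(x, delta):
            p_x *= math.exp(x_i * d_i) / (math.exp(d_i) + math.exp(-d_i))
        p_e1_given_x = math.exp(sum(x) ** 2 - n ** 2)  # assumed effect model
        weights[x] = p_x * p_e1_given_x
    total = sum(weights.values())  # equals p(e = 1)
    return {x: w / total for x, w in weights.items()}

def curie_weiss(delta):
    """Curie-Weiss probabilities with the partition function computed by enumeration."""
    weights = {}
    for x in itertools.product([+1, -1], repeat=len(delta)):
        main = sum(x_i * d_i for x_i, d_i in zip(x, delta))
        weights[x] = math.exp(main + sum(x) ** 2)
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}

delta = [0.4, -0.3, 0.1, 0.0]  # hypothetical main effects
p_collider = collider_given_effect(delta)
p_cw = curie_weiss(delta)
max_diff = max(abs(p_collider[x] - p_cw[x]) for x in p_cw)
```

All factors that do not depend on the pattern x, such as the denominators of the independent causes and the constant exp(−N^{2}), cancel in the normalisation, so the two distributions agree up to floating-point rounding.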
Curie-Weiss to Rasch
Next, we will connect the Curie-Weiss model from statistical physics to the Rasch model from psychometrics. We start from the distribution function of the Curie-Weiss model, where we use Z to denote the appropriate normalising constant:
Next we use Kac’s Gaussian identity from equation (5) to linearise the quadratic sum in the exponential of the Curie-Weiss distribution. To that end let a = ∑_{i} x_{i}, so we can rewrite it in the following way:
By incorporating this transformation we obtain a latent variable representation of the Curie-Weiss model:
Which we can simplify further by merging the two sums in the exponential:
Where we will use to denote the appropriate normalising constant. For the next step towards our goal we multiply both the numerator and denominator of equation (16) by , such that we obtain the equivalent expression:
Next we rearrange the expression in equation (17) by switching the denominators of both factors, taking the sum in the first numerator out of the exponential so it becomes a product, and transferring exp(−θ^{2}) out of the numerator of the first factor and into the numerator of the second factor:
The resulting expression can be recognised as , the marginal probability for some realisation of X where the latent variable Θ is integrated out. Let us denote the second factor of the expression in equation (18) as the distribution of the latent variable (f(θ)), which gives us:
Such that the distribution of the set of binary random variables (X), conditionally on the latent variable (Θ), is:
Again, it is readily seen that the resulting latent variable expression of the Curie-Weiss model in equation (20) is identical to the distribution of the Rasch model from equation (1). This completes our first proof, in which we demonstrated that by conditioning on the effect in the collider model from the common effect framework, the distribution of the set of binary random variables (X) is equivalent to that of the Curie-Weiss model from the reciprocal affect framework. Furthermore, when we linearise the quadratic sum in the exponential of the Curie-Weiss model, we obtain a latent variable representation of this model where the distribution of the manifest random variables (X) given the latent variable (Θ) is equivalent to that of the Rasch model from the common cause framework. Consequently, given the equivalence of the collider model and the Curie-Weiss model and that of the Curie-Weiss model and the Rasch model, we can conclude that the collider model and the Rasch model are also equivalent.
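The second half of the proof can be checked numerically by integrating the latent variable out again. The sketch assumes the Gaussian identity in the form exp(a^{2}) = ∫ (1/√π) exp(2aθ − θ^{2}) dθ, so that the latent variable enters the Rasch-type conditional as 2θ, which amounts to a rescaling of Θ. Under that assumption the latent density is f(θ) = exp(−θ^{2}) ∏_{i} [exp(2θ + δ_{i}) + exp(−2θ − δ_{i})] / (Z√π). The function names, integration limits and parameter values are hypothetical.

```python
import itertools
import math

def cw_weight(x, delta):
    """Unnormalised Curie-Weiss weight (assumed form)."""
    return math.exp(sum(x_i * d_i for x_i, d_i in zip(x, delta)) + sum(x) ** 2)

def rasch_conditional(x, theta, delta):
    """p(x | theta): product of logistic terms in x_i * (2*theta + delta_i)."""
    p = 1.0
    for x_i, d_i in zip(x, delta):
        eta = 2.0 * theta + d_i
        p *= math.exp(x_i * eta) / (math.exp(eta) + math.exp(-eta))
    return p

def latent_density(theta, delta, z):
    """f(theta) implied by the Curie-Weiss model under the assumed identity."""
    dens = math.exp(-theta * theta) / (z * math.sqrt(math.pi))
    for d_i in delta:
        eta = 2.0 * theta + d_i
        dens *= math.exp(eta) + math.exp(-eta)
    return dens

def integrate(func, lo=-15.0, hi=15.0, steps=6000):
    """Trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h

delta = [0.2, -0.1, 0.3]  # hypothetical main effects
configs = list(itertools.product([+1, -1], repeat=len(delta)))
z = sum(cw_weight(x, delta) for x in configs)

p_cw = {x: cw_weight(x, delta) / z for x in configs}
p_latent = {x: integrate(lambda t, x=x: rasch_conditional(x, t, delta)
                         * latent_density(t, delta, z))
            for x in configs}

density_mass = integrate(lambda t: latent_density(t, delta, z))
max_diff = max(abs(p_cw[x] - p_latent[x]) for x in configs)
```

The implied latent density integrates to one and, for these parameter values, is bimodal rather than normal, which reflects the concentration of the Curie-Weiss model on its two aligned states. The marginal probabilities obtained from the latent variable representation match the Curie-Weiss probabilities up to numerical integration error.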
In the next proof we demonstrate that this equivalence relation between the three frameworks extends to the more realistic models of these frameworks, as those allow pairwise interactions between the random variables in the set X. We start with the conventional representation of the full Ising model from the reciprocal affect framework and rewrite this into an equivalent eigenvalue representation. Subsequently we connect this to both a latent variable representation equivalent to the multidimensional IRT model from the common cause framework and a collider representation from the common effect framework.
Three representations of the Ising model
Conventional to Eigenvalue representation
The distribution of the Ising model is commonly written as follows:
Where the partition function Z, that makes the distribution sum to one, is given by:
In order to connect the Ising model from the reciprocal affect framework to the models from both other frameworks, we first have to rewrite it into matrix notation such that we can obtain the eigenvalue representation of the Ising model. To that end we first rewrite the sum over all distinct i, j pairs in the exponent, as a function of the sum over i and the sum over j:
Such that we may rewrite the Ising model in matrix notation:
All parameters, except for the entries on the diagonal of the connectivity matrix, are identifiable from the data. However, as x_{i} x_{j} = 1 when i = j, any diagonal entry of the connectivity matrix only adds a constant to the exponent, which is cancelled out by the partition function. With the observation that the diagonal values of Σ are thus arbitrary (i.e., do not change the probabilities), we can shift them in such a way that the connectivity matrix Σ becomes positive (semi) definite and hence all of its eigenvalues nonnegative. This allows for the transformation Σ + cI = QΛQ^{T}, where c contains the chosen values for the diagonal of the connectivity matrix, that when implemented gives:
By taking the expression out of its matrix notation we obtain the eigenvalue representation of the Ising model:
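The steps from the pairwise form to the eigenvalue representation can be verified numerically. The following sketch assumes −1/+1 coding, a symmetric connectivity matrix with zero diagonal, and a factor 1/2 in front of the quadratic form so that each pair is counted once; the parameter values are random and purely hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4
mu = rng.normal(size=n)                        # hypothetical main effects
upper = np.triu(rng.normal(size=(n, n)), k=1)  # interactions for pairs i < j
sigma = upper + upper.T                        # symmetric, zero diagonal
states = np.array(list(itertools.product([-1, 1], repeat=n)))


def normalise(log_w):
    """Turn unnormalised log-weights into probabilities."""
    w = np.exp(log_w - log_w.max())
    return w / w.sum()


# Conventional form: sum over distinct pairs i < j.
pair_term = np.einsum("si,ij,sj->s", states, upper, states)
p_pairs = normalise(states @ mu + pair_term)

# Matrix form: (1/2) x' Sigma x counts every pair twice, hence the factor 1/2.
quad = 0.5 * np.einsum("si,ij,sj->s", states, sigma, states)
p_matrix = normalise(states @ mu + quad)

# Shift the diagonal so that the matrix becomes positive semi-definite.
c = -np.linalg.eigvalsh(sigma).min()
lam, Q = np.linalg.eigh(sigma + c * np.eye(n))  # Sigma + cI = Q diag(lam) Q'

# Eigenvalue form: (1/2) sum_r lam_r (sum_i q_ir x_i)^2.
proj = states @ Q                               # proj[s, r] = sum_i q_ir x_i
p_eigen = normalise(states @ mu + 0.5 * (proj ** 2 @ lam))

print(np.allclose(p_pairs, p_matrix), np.allclose(p_pairs, p_eigen), lam.min())
```

The diagonal shift adds the same constant to the exponent of every configuration, so it drops out after normalisation; this is why the three probability vectors coincide.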
Eigenvalue to Latent Variable representation
We obtain a latent variable representation of the Ising model by applying Kac’s Gaussian identity to the squared sum in the exponent of equation (26). To that end let a^{2} be the squared-sum term for each of the N eigenvalues in the Ising model and replace this with the right-hand side integral from equation (5):
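Written out for a single eigenvalue, this step could look as follows. This is a sketch under assumptions: it takes the Gaussian identity in the form exp(a^2) = ∫ π^{−1/2} exp(2aθ − θ^2) dθ and a squared-sum term of the form λ_r (Σ_i q_{ir} x_i)^2, with any constant factor absorbed into λ_r; the notation in the original equations may differ.

```latex
\begin{align*}
a_r &= \sqrt{\lambda_r}\,\sum_{i} q_{ir} x_i , \\
\exp\!\left(a_r^{2}\right)
  = \exp\!\left(\lambda_r \Big(\sum_{i} q_{ir} x_i\Big)^{2}\right)
  &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}}
     \exp\!\left(2\sqrt{\lambda_r}\,\sum_{i} q_{ir} x_i\,\theta_r - \theta_r^{2}\right)
     \mathrm{d}\theta_r .
\end{align*}
```

The square root is real because the shifted connectivity matrix has only nonnegative eigenvalues, which is what the diagonal shift was introduced for.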
Incorporating this transformation into the Ising model and letting , we get the latent variable representation of the Ising model, where the number of nonzero eigenvalues represents the number of latent dimensions in the model:
In order to connect this latent variable representation to the multidimensional IRT model we multiply both the numerator and denominator of equation (28) by , such that we obtain the equivalent expression:
Next we rearrange and simplify the expression in equation (29). To that end let us merge the sums over x_{i} in the exponential and denote λ_{r}q_{ir} as α_{ir}, where α_{ir} is the value of the i^{th} row and r^{th} column of the N × N matrix A in . We can then rewrite the sum over r as a product of the vector , containing the i^{th} row of the matrix A and the vector θ. Furthermore, we switch the denominators of both factors and transfer out the numerator of the first factor, to the numerator of the second factor:
In the resulting expression we can again recognise a function for the marginal probability of X where all latent dimensions (Θ) are integrated out. Finally we take the sum over i out of the exponential such that we obtain:
We can recognise this particular latent variable representation of the Ising model as a multidimensional IRT model^{47} from the common cause framework, where the vector Θ represents the set of latent abilities measured by the items in X. In addition to this vector, we also find the matrix A in the model, where the i^{th} row contains the discrimination parameters for all latent variables on item i. In the traditional interpretation of the IRT framework, the discrimination parameter quantifies how well the item measures the corresponding latent variable, or in model terms, the degree to which the probability for item responses varies with respect to each latent variable in Θ. We obtain the following expression for the conditional probabilities of X given the vector of latent variables (Θ):
Note that, as the number of nonzero eigenvalues represents the number of latent dimensions in the model, under the condition that only the first eigenvalue is nonzero and the discrimination parameters with respect to the single resulting latent variable are 1 for all the items, the model reduces to the Rasch model from equation (1).
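A small numerical check of the latent variable representation is sketched below. It is not the authors' parametrisation: it assumes −1/+1 coding, an eigenvalue representation with exponent Σ_i μ_i x_i + Σ_r λ_r (Σ_i q_{ir} x_i)^2, and a hypothetical scaling of the discrimination matrix, `A[i, r] = 2 * sqrt(lam[r]) * Q[i, r]`, which may differ from the α_{ir} used in the text by a rescaling of θ. The values of `mu`, `Q` and `lam` are made up.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 3
mu = rng.normal(scale=0.5, size=n)            # hypothetical main effects
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # orthonormal eigenvectors (columns)
lam = np.array([0.0, 0.2, 0.5])               # nonnegative eigenvalues, one is zero
states = np.array(list(itertools.product([-1, 1], repeat=n)))

# Eigenvalue representation (assumed form).
proj = states @ Q                             # proj[s, r] = sum_i q_ir x_i
w_eigen = np.exp(states @ mu + proj ** 2 @ lam)
p_eigen = w_eigen / w_eigen.sum()

# Latent variable representation: one Gaussian integral per eigenvalue.
theta = np.linspace(-12.0, 12.0, 4801)
d_theta = theta[1] - theta[0]
a = proj * np.sqrt(lam)                       # a[s, r] = sqrt(lam_r) sum_i q_ir x_i
integrals = np.exp(2.0 * a[:, :, None] * theta - theta ** 2).sum(axis=2)
integrals *= d_theta / np.sqrt(np.pi)         # approximately exp(a[s, r]^2)
w_latent = np.exp(states @ mu) * integrals.prod(axis=1)
p_latent = w_latent / w_latent.sum()

# Discrimination matrix under the scaling assumed in this sketch.
A = 2.0 * Q * np.sqrt(lam)                    # column r scaled by 2 sqrt(lam_r)


def mirt_conditional(theta_vec):
    """p(x | theta) for all configurations: a product of item-wise logistic terms."""
    eta = mu + A @ theta_vec
    return np.exp(states @ eta) / np.prod(2.0 * np.cosh(eta))


n_dimensions = int(np.count_nonzero(np.abs(A).sum(axis=0)))
print(np.allclose(p_eigen, p_latent), n_dimensions)
```

The column of `A` that belongs to the zero eigenvalue vanishes, so that latent dimension has no influence on the items, in line with the remark that the number of nonzero eigenvalues gives the number of latent dimensions.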
Eigenvalue to Collider representation
To acquire a collider representation of the Ising model we start again from the eigenvalue representation of the Ising model:
By taking the partition function out of the expression we obtain the following proportionality relation:
Next we can introduce a set of (independent) effect variables (E = [E_{1}, E_{2}, …, E_{m}]), one for each eigenvalue, such that we obtain a collider representation of the Ising model where endogenous selection bias has taken place for multiple effect variables. To that end, we recognise p(x) from equation (34) as p(x | e = 1), the conditional probability of the collective cause (X) given that all effects are present (E = 1), proportional to the product of the marginal distribution for the collective cause and the probability of observing the effects given the collective cause. Taking the sum over i in p(x) and the sum over r in p(e = 1 | x) out of their respective exponential and adding the appropriate normalising constant to make the probabilities sum to one we obtain the following expression:
Such that we can write the joint distribution of causes and effect variables as a common effect representation of the Ising model:
We can quickly recognise a collider model in this distribution that is extended for as many common effect variables as there are nonnegative eigenvalues. With this we have completed our second and final set of proofs, showing that three statistically equivalent representations of the Ising model exist that explain observed associations between binary variables as arising either through marginalisation with respect to latent variables, through reciprocal affect between variables, or through conditioning on common effect variables.
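The collider reading can be illustrated by exact enumeration on a small example. The sketch below makes several assumptions that are not taken from the text: −1/+1 coding, an eigenvalue representation with exponent Σ_i μ_i x_i + Σ_r λ_r (Σ_i q_{ir} x_i)^2, and a hypothetical way of turning each eigenvalue term into a valid probability, namely P(E_r = 1 | x) = exp(λ_r [(Σ_i q_{ir} x_i)^2 − max_x (Σ_i q_{ir} x_i)^2]). Any other normalisation that keeps this probability proportional to exp(λ_r (Σ_i q_{ir} x_i)^2) would serve equally well.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 3
mu = rng.normal(scale=0.5, size=n)            # hypothetical main effects
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))  # orthonormal eigenvectors (columns)
lam = np.array([0.0, 0.2, 0.5])               # nonnegative eigenvalues
states = np.array(list(itertools.product([-1, 1], repeat=n)))
proj_sq = (states @ Q) ** 2                   # proj_sq[s, r] = (sum_i q_ir x_i)^2

# Target: eigenvalue representation of the Ising model (assumed form).
w = np.exp(states @ mu + proj_sq @ lam)
p_ising = w / w.sum()

# Causes: independent binary variables with marginal exp(mu_i x_i) / (2 cosh mu_i).
p_causes = np.exp(states @ mu) / np.prod(2.0 * np.cosh(mu))

# Effects: P(E_r = 1 | x), scaled so that it never exceeds one (assumption).
p_effect_on = np.exp(lam * (proj_sq - proj_sq.max(axis=0)))

# Joint distribution over causes and all effect patterns.
effects = np.array(list(itertools.product([0, 1], repeat=len(lam))))
joint = np.empty((len(states), len(effects)))
for k, e in enumerate(effects):
    p_e_given_x = np.where(e == 1, p_effect_on, 1.0 - p_effect_on).prod(axis=1)
    joint[:, k] = p_causes * p_e_given_x

# Condition on all effects being present (selection on the colliders).
k_all_on = int(np.flatnonzero(effects.all(axis=1))[0])
p_given_all_on = joint[:, k_all_on] / joint[:, k_all_on].sum()

print(np.allclose(p_given_all_on, p_ising), np.allclose(joint.sum(axis=1), p_causes))
```

Marginally the causes are independent (`joint.sum(axis=1)` equals `p_causes`), while conditioning on E = 1 reproduces the Ising probabilities, which is how endogenous selection induces the pairwise associations.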
Additional Information
How to cite this article: Kruis, J. and Maris, G. Three representations of the Ising model. Sci. Rep. 6, 34175; doi: 10.1038/srep34175 (2016).
References
Bandura, A. Social cognitive theory: An agentic perspective. Asian J. Soc. Psychol. 2, 21–41 (1999).
Costner, H. L. Theory, deduction and rules of correspondence. Am. J. Sociol. 75, 245–263 (1969).
Edwards, J. R. & Bagozzi, R. P. On the nature and direction of relationships between constructs and measures. Psychol. Methods 5, 155–174 (2000).
Pearl, J. Causality: models, reasoning and inference. Economet. Theor. 19, 675–685 (2003).
Reichenbach, H. The Direction of Time (University of California Press, Berkeley, 1956).
Fornell, C. & Bookstein, F. L. Two structural equation models: LISREL and PLS applied to consumer exit-voice theory. J. Mark. Res. 19, 440–452 (1982).
Markus, K. A. & Borsboom, D. Reflective measurement models, behavior domains and common causes. New Ideas Psychol. 31, 54–64 (2013).
Spearman, C. ‘General intelligence’ objectively determined and measured. Am. J. Psychol. 15, 201–293 (1904).
Jensen, A. R. The g factor: The science of mental ability (Praeger, Westport, 1998).
McCrae, R. R. & Costa, P. T. Empirical and theoretical status of the five-factor model of personality traits in The SAGE Handbook of Personality Theory and Assessment: Volume 1 – Personality Theories and Models 273–294 (SAGE, London, 2008).
Borsboom, D., Mellenbergh, G. J. & van Heerden, J. The concept of validity. Psychol. Rev. 111, 1061–1071 (2004).
Borsboom, D. Psychometric perspectives on diagnostic systems. J. Clin. Psychol. 64, 1089–1108 (2008).
Popper, K. R. The Logic of Scientific Discovery (Hutchinson, London, 1959).
Van Der Maas, H. L. et al. A dynamical model of general intelligence: the positive manifold of intelligence by mutualism. Psychol. Rev. 113, 842–861 (2006).
Epskamp, S., Maris, G., Waldorp, L. J. & Borsboom, D. Network Psychometrics in Handbook of Psychometrics. (Wiley, New York, 2015).
Lee, T.-D. & Yang, C.-N. Statistical theory of equations of state and phase transitions II. Lattice gas and Ising model. Phys. Rev. 87, 410–419 (1952).
Besag, J. Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. Series B (Methodological) 36, 192–236 (1974).
Besag, J. Statistical analysis of non-lattice data. Statistician 24, 179–195 (1975).
Barzel, B. & Barabási, A.-L. Universality in network dynamics. Nat. Phys. 9, 673–681 (2013).
Ising, E. Beitrag zur theorie des ferromagnetismus. Zeit. Phys. 31, 253–258 (1925).
Jaynes, E. T. Information theory and statistical mechanics. Phys. Rev. 106, 620–630 (1957).
Barabási, A.-L. The network takeover. Nat. Phys. 8, 14–16 (2012).
Cramer, A. O., Waldorp, L. J., van der Maas, H. L. & Borsboom, D. Comorbidity: A network perspective. Behav. Brain Sci. 33, 137–150 (2010).
Borsboom, D. & Cramer, A. O. Network analysis: an integrative approach to the structure of psychopathology. Annu. Rev. Clin. Psychol. 9, 91–121 (2013).
Schmittmann, V. D. et al. Deconstructing the construct: A network perspective on psychological phenomena. New Ideas Psychol. 31, 43–53 (2013).
van Borkulo, C. D. et al. A new method for constructing networks from binary data. Sci. Rep. 4, 5918 (2014).
Blalock, H. M. Causal models involving unmeasured variables in stimulus-response situations in Causal Models in the Social Sciences 335–347 (Aldine-Atherton, Chicago, 1971).
Bollen, K. & Lennox, R. Conventional wisdom on measurement: A structural equation perspective. Psychol. Bull. 110, 305–314 (1991).
Hauser, R. M. Disaggregating a social-psychological model of educational attainment. Soc. Sci. Res. 1, 159–188 (1972).
Greenland, S., Pearl, J. & Robins, J. M. Causal diagrams for epidemiologic research. Epidemiology 10, 37–48 (1999).
Heckman, J. J. Sample selection bias as a specification error. Econometrica 47, 153–161 (1979).
Greenland, S. Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology 14, 300–306 (2003).
Hernán, M. A., Hernández-Díaz, S. & Robins, J. M. A structural approach to selection bias. Epidemiology 15, 615–625 (2004).
Elwert, F. & Winship, C. Endogenous selection bias: The problem of conditioning on a collider variable. Annu. Rev. Sociol. 40, 31–53 (2014).
Rasch, G. Probabilistic Models for some Intelligence and Attainment Tests (The Danish Institute of Educational Research, Copenhagen, 1960).
Christofides, N. Graph Theory: An Algorithmic Approach (Academic Press Inc, New York, 1975).
Kac, M. Statistical physics: Phase transitions and superfluidity in Brandeis University Summer Institute in Theoretical Physics, Vol. 1 (eds Chretien, M., Gross, E. & Deser, S. ) 241–305 (Gordon and Breach Science Publishers, New York, 1968).
Stanley, H. E. Introduction to Phase Transitions and Critical Phenomena. (Oxford University Press, New York, 1971).
Ellis, R. S. & Newman, C. M. The statistics of Curie-Weiss models. J. Stat. Phys. 19, 149–161 (1978).
Olkin, I. & Tate, R. F. Multivariate correlation models with mixed discrete and continuous variables. Ann. Math. Stat. 32, 448–465 (1961).
Emch, G. G. & Knops, H. J. Pure thermodynamical phases as extremal KMS states. J. Math. Phys. 11, 3008–3018 (1970).
Cox, D. R. & Wermuth, N. A note on the quadratic exponential binary distribution. Biometrika 81, 403–408 (1994).
McCullagh, P. Exponential mixtures and quadratic exponential families. Biometrika 81, 721–729 (1994).
Lauritzen, S. L. Graphical models (Oxford University Press, USA, 1996).
Anderson, C. J. & Yu, H.-T. Log-multiplicative association models as item response models. Psychometrika 72, 5–23 (2007).
Marsman, M., Maris, G., Bechger, T. & Glas, C. Bayesian inference for low-rank Ising networks. Sci. Rep. 5, 9050 (2015).
Reckase, M. Multidimensional Item Response Theory (Springer, 2009).
Mellenbergh, G. J. Item bias and item response theory. Int. J. Educ. Res. 13, 127–143 (1989).
Meredith, W. Measurement invariance, factor analysis and factorial invariance. Psychometrika 58, 525–543 (1993).
Costa, P. T. & McCrae, R. R. Revised NEO personality inventory (NEO PI-R) and NEO five-factor inventory (NEO-FFI): Professional manual (Psychological Assessment Resources, 1992).
Goldberg, L. R. An alternative “description of personality”: the big-five factor structure. J. Pers. Soc. Psychol. 59, 1216–1229 (1990).
Barrick, M. R. & Mount, M. K. The big five personality dimensions and job performance: a meta-analysis. Pers. Psychol. 44, 1–26 (1991).
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J. & Erbaugh, J. An inventory for measuring depression. Arch. Gen. Psychiatry 562, 53–63 (1961).
Beck, A. T., Steer, R. A. & Carbin, M. G. Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clin. Psychol. Rev. 8, 77–100 (1988).
Acknowledgements
We thank Dr. Maria A. Bolsinova and Dr. Maarten Marsman for proofreading the manuscript. This work was supported by NWO (The Netherlands Organisation for Scientific Research), No. 022.005.0 (JK) and No. CI112S037 (GM).
Author information
Contributions
J.K. and G.M. wrote the main manuscript. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/