Introduction

Questions and assumptions about mind-brain supervenience go back at least as far as Plato's dialogues circa 400 BCE1. While there are many different notions of supervenience, we find Davidson's canonical description particularly illustrative2:

[mind-brain] supervenience might be taken to mean that there cannot be two events alike in all physical respects but differing in some mental respect, or that an object cannot alter in some mental respect without altering in some physical respect.

Colloquially, supervenience means “there cannot be a mind-difference without a physical-difference.” This philosophical conjecture has potentially widespread implications. For example, neural network theory and artificial intelligence often implicitly assume a local version of mind-brain supervenience3,4. Cognitive neuroscience similarly seems to operate under such assumptions5. Philosophers continue to debate and refine notions of supervenience6. Yet, to date, relatively scant attention has been paid to what might be empirically learned about supervenience.

In this work we attempt to bridge the gap between philosophical conjecture and empirical investigation by casting supervenience in a probabilistic framework amenable to hypothesis testing. We then use the probabilistic theory of pattern recognition to determine the limits of what one can and cannot learn about supervenience through data analysis. The implications of this work are varied. It provides a probabilistic framework for converting philosophical conjectures into statistical hypotheses that are amenable to experimental investigation, which allows the philosopher to gain empirical support for her rational arguments. It also leads to what is, to our knowledge, the first explicit construction of a universally consistent classifier on graphs and the first demonstration that supervenience questions can be tractably addressed. Supervenience therefore seems to be a useful but under-utilized concept for neuroscientific investigations. This work should provide further motivation for cross-disciplinary efforts across three fields (philosophy, statistics and neuroscience) with shared goals but mostly disjoint jargon and methods of analysis.

Results

Statistical supervenience: a definition

Let $\mathcal{M}$ be the space of all possible minds and let $\mathcal{B}$ be the set of all possible brains. $\mathcal{M}$ includes a mind for each possible collection of thoughts, memories, beliefs, etc. $\mathcal{B}$ includes a brain for each possible position and momentum of all subatomic particles within the skull. Given these definitions, Davidson's conjecture may be concisely and formally stated thus: $m \neq m' \Rightarrow b \neq b'$, where $(m, b), (m', b') \in \mathcal{M} \times \mathcal{B}$ are mind-brain pairs. This mind-brain supervenience relation does not imply an injective relation, a causal relation, or an identity relation (see Appendix 1 for more details and some examples). To facilitate both statistical analysis and empirical investigation, we convert this local supervenience relation from a logical to a probabilistic relation.

Let $F_{MB}$ indicate a joint distribution of minds and brains. Statistical supervenience can then be defined as follows:

Definition 1. $\mathcal{M}$ is said to statistically supervene on $\mathcal{B}$ for distribution $F = F_{MB}$, denoted $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$, if and only if $P[M \neq M' \mid B = B'] = 0$, or equivalently $P[M = M' \mid B = B'] = 1$, where $(M, B)$ and $(M', B')$ are drawn iid from $F$.

Statistical supervenience is therefore a probabilistic relation on sets which could be considered a generalization of correlation (see Appendix 1 for details).
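To make Definition 1 concrete, the following sketch brute-forces the condition $P[M \neq M' \mid B = B'] = 0$ for two toy joint distributions over two minds and two brains; all names and probabilities are invented purely for illustration.

```python
# Sketch: check Definition 1 by enumeration on hypothetical finite
# joint distributions F, where F[(m, b)] = P[M = m, B = b].
from itertools import product

F_supervenient = {("m0", "b0"): 0.5, ("m1", "b1"): 0.5}          # each brain determines one mind
F_not_supervenient = {("m0", "b0"): 0.25, ("m1", "b0"): 0.25,    # brain b0 is shared by two minds
                      ("m0", "b1"): 0.25, ("m1", "b1"): 0.25}

def statistically_supervenes(F):
    """Check P[M != M' | B = B'] == 0 for (M,B), (M',B') drawn iid from F."""
    p_same_brain = 0.0            # P[B = B']
    p_diff_mind_same_brain = 0.0  # P[M != M', B = B']
    for (m, b), (m2, b2) in product(F, F):
        if b == b2:
            p_same_brain += F[(m, b)] * F[(m2, b2)]
            if m != m2:
                p_diff_mind_same_brain += F[(m, b)] * F[(m2, b2)]
    return p_diff_mind_same_brain / p_same_brain == 0.0

print(statistically_supervenes(F_supervenient))      # True
print(statistically_supervenes(F_not_supervenient))  # False
```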

Statistical supervenience is equivalent to perfect classification accuracy

If minds statistically supervene on brains, then if two minds differ, there must be some brain-based difference to account for the mental difference. This means that there must exist a deterministic function $g^*$ mapping each brain to its supervening mind. One could therefore, in principle, know this function. When the space of all possible minds is finite (that is, $|\mathcal{M}| < \infty$), any function $g: \mathcal{B} \to \mathcal{M}$ mapping from brains to minds is called a classifier. Define the misclassification rate, the probability that $g$ misclassifies $B$ under distribution $F = F_{MB}$, as

$$L_F(g) = P[g(B) \neq M] = \mathbb{E}_F[\mathbb{I}\{g(B) \neq M\}], \quad (1)$$

where $\mathbb{I}\{\cdot\}$ denotes the indicator function taking value unity whenever its argument is true and zero otherwise. The Bayes optimal classifier $g^*$ minimizes $L_F(g)$ over all classifiers: $g^* = \operatorname{argmin}_g L_F(g)$. The Bayes error, or Bayes risk, $L^* = L_F(g^*)$, is the minimum possible misclassification rate.
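When $F_{MB}$ is known and both spaces are finite, $g^*$ and $L^*$ can be computed directly from the joint probabilities; a minimal sketch with an invented two-mind, two-brain distribution:

```python
# Sketch: Bayes classifier and Bayes error for a known, finite F_MB.
# The joint probability table is hypothetical.
import numpy as np

# F[b, m] = P[M = m, B = b]; rows index brains, columns index minds.
F = np.array([[0.30, 0.10],    # brain b0
              [0.05, 0.55]])   # brain b1

g_star = F.argmax(axis=1)            # g*(b) = argmax_m P[M = m | B = b]
L_star = 1.0 - F.max(axis=1).sum()   # L* = P[g*(B) != M] = 0.15 here
print(g_star, L_star)
```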

The primary result of casting supervenience in a statistical framework is the following theorem, which follows immediately from Definition 1 and Eq. (1):

Theorem 1. $\mathcal{M} \overset{S}{\sim}_F \mathcal{B} \iff L^* = 0$.

The above argument shows (for the first time to our knowledge) that statistical supervenience and zero Bayes error are equivalent. Statistical supervenience can therefore be thought of as a constraint on the possible distributions on minds and brains. Specifically, let $\mathcal{F}$ indicate the set of all possible joint distributions on minds and brains and let $\mathcal{F}_S \subset \mathcal{F}$ be the subset of distributions for which supervenience holds. Theorem 1 implies that $\mathcal{F}_S = \{F \in \mathcal{F} : L_F(g^*) = 0\}$. Mind-brain supervenience is therefore an extremely restrictive assumption about the possible relationships between minds and brains. It seems that such a restrictive assumption begs for empirical evaluation via, for instance, a hypothesis test.
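The proof of Theorem 1 is essentially a chain of equivalences; a sketch in the notation above:

```latex
L^* = 0
  \iff \min_{g} P[g(B) \neq M] = 0
  \iff M = g^*(B) \ \text{almost surely}
  \iff P[M \neq M' \mid B = B'] = 0.
```

The final step holds because if $M$ is almost surely a deterministic function of $B$, two independent draws agreeing on the brain must agree on the mind; conversely, if mind-differences given brain-agreement have probability zero, the conditional distribution of $M$ given $B$ is degenerate, and that degenerate value defines a zero-error $g^*$.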

The non-existence of a viable statistical test for supervenience

The above theorem implies that if we desire to know whether minds supervene on brains, we can check whether L* = 0. Unfortunately, L* is typically unknown. Fortunately, we can approximate L* using training data.

Assume that training data $\mathcal{T}_n = \{(M_i, B_i)\}_{i=1}^{n}$ are sampled independently and identically (iid) from the true (but unknown) joint distribution $F = F_{MB}$. Let $g_n$ be a classifier induced by the training data, $g_n : \mathcal{B} \times (\mathcal{M} \times \mathcal{B})^n \to \mathcal{M}$. The misclassification rate of such a classifier is given by

$$L_F(g_n) = P[g_n(B; \mathcal{T}_n) \neq M \mid \mathcal{T}_n],$$

which is a random variable due to its dependence on the randomly sampled training set $\mathcal{T}_n$. Calculating the expected misclassification rate is often intractable in practice because it requires a sum over all possible training sets. Instead, the expected misclassification rate can be approximated by the “hold-out” error. Let $\mathcal{T}_{n'} = \{(M_i, B_i)\}_{i=n+1}^{n+n'}$ be a set of $n'$ hold-out samples, each sampled iid from $F_{MB}$. The hold-out approximation to the misclassification rate is given by

$$\hat{L}_{n'}(g_n) = \frac{1}{n'} \sum_{i=n+1}^{n+n'} \mathbb{I}\{g_n(B_i; \mathcal{T}_n) \neq M_i\}.$$

By definition of $g^*$, the expectation of $\hat{L}_{n'}(g_n)$ (with respect to both $\mathcal{T}_n$ and $\mathcal{T}_{n'}$) is greater than or equal to $L^*$ for any $g_n$ and all $n$. Thus, we can construct a hypothesis test for $L^*$ using the surrogate $\hat{L}_{n'}(g_n)$.
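A minimal sketch of the hold-out estimator (the classifier and the data arrays are hypothetical stand-ins for a trained $g_n$ and $\mathcal{T}_{n'}$):

```python
# Sketch: the hold-out estimate of misclassification rate is the
# fraction of n' held-out mind/brain pairs that g_n gets wrong.
import numpy as np

def holdout_error(g_n, brains_heldout, minds_heldout):
    """Average 0/1 loss of the trained classifier on the hold-out set."""
    predictions = np.array([g_n(b) for b in brains_heldout])
    return float(np.mean(predictions != np.asarray(minds_heldout)))
```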

A statistical test proceeds by specifying the allowable Type I error rate α > 0 and then calculating a test statistic. The p-value is the probability, under the least favorable null distribution (the simple hypothesis within the potentially composite null which is closest to the boundary with the alternative hypothesis), of observing a result at least as extreme as the one actually observed. In other words, the p-value is the cumulative distribution function of the test statistic, evaluated at the observed test statistic, with parameter given by the least favorable null distribution. We reject the null if the p-value is less than α. A test is consistent whenever its power (the probability of rejecting the null when it is indeed false) goes to unity as n → ∞. For any statistical test, if the p-value converges in distribution to $\delta_0$ (point mass at zero) under the alternative, then for any α > 0, power goes to unity.

Based on the above considerations, we might consider the following hypothesis test: $H_0: L^* > 0$ versus $H_A: L^* = 0$; rejecting the null would indicate that $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$. Unfortunately, the alternative hypothesis lies on the boundary of the null, so the p-value is always equal to unity7. From this, Theorem 2 follows immediately:

Theorem 2. There does not exist a viable test of $\mathcal{M} \overset{S}{\sim}_F \mathcal{B}$.

In other words, we can never reject L* > 0 in favor of supervenience, no matter how much data we obtain.

Conditions for a consistent statistical test for ε-supervenience

To proceed, therefore, we introduce a relaxed notion of supervenience:

Definition 2. $\mathcal{M}$ is said to ε-supervene on $\mathcal{B}$ for distribution $F = F_{MB}$, denoted $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$, if and only if $L^* < \varepsilon$ for some $\varepsilon > 0$.

Given this relaxation, consider the problem of testing for ε-supervenience:

$$H_0: L^* \geq \varepsilon \quad \text{versus} \quad H_A: L^* < \varepsilon.$$

Let $T = n' \hat{L}_{n'}(g_n)$, the number of misclassified hold-out samples, be the test statistic. The distribution of $T$ is available under the least favorable null distribution, $L^* = \varepsilon$. For this hypothesis test, the p-value is therefore the binomial cumulative distribution function with parameter ε evaluated at the observed statistic; that is, p-value $= \sum_{k=0}^{T} \binom{n'}{k} \varepsilon^{k} (1 - \varepsilon)^{n'-k}$, where $T \sim \mathrm{Binomial}(n', \varepsilon)$ under the least favorable null. We reject whenever this p-value is less than α; rejection implies that we are 100(1 − α)% confident that $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$.
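In code, the p-value is a single call to the binomial cumulative distribution function; a sketch using scipy (the example numbers are invented):

```python
# Sketch: p-value for H0: L* >= epsilon given T = n' * Lhat hold-out
# errors; under the least favorable null, T ~ Binomial(n', epsilon).
from scipy.stats import binom

def epsilon_supervenience_pvalue(n_errors, n_heldout, epsilon):
    """P[Binomial(n_heldout, epsilon) <= n_errors]."""
    return float(binom.cdf(n_errors, n_heldout, epsilon))

# e.g., 75 errors on n' = 1000 held-out samples with epsilon = 0.1:
print(epsilon_supervenience_pvalue(75, 1000, 0.1) < 0.01)  # True: reject H0
```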

For the above ε-supervenience statistical test, if $L_F(g_n) \to L_F(g^*)$ as $n \to \infty$, then $\hat{L}_{n'}(g_n) \to L^*$ as $n, n' \to \infty$. Thus, if $L^* < \varepsilon$, power goes to unity. The definition of ε-supervenience therefore admits, for the first time to our knowledge, a viable statistical test of supervenience, given a specified ε and α. Moreover, this test is consistent whenever $g_n$ converges to the Bayes classifier $g^*$.

The existence and construction of a consistent statistical test for ε-supervenience

The above considerations indicate the existence of a consistent test for ε-supervenience whenever the classifier used is consistent. To actually implement such a test, one must be able to (i) measure mind/brain pairs and (ii) have a consistent classifier $g_n$. Unfortunately, we do not know how to measure the entirety of one's brain, much less one's mind. We therefore must restrict our interest to a mind/brain property pair. A mind (mental) property might be a person's intelligence, psychological state, current thought, gender identity, etc. A brain property might be the number of cells in a person's brain at some time t, or the collection of spike trains of all neurons in the brain during some time period t to t′. Regardless of the details of the specifications of the mental property and the brain property, given such specifications, one can assume a model $\mathcal{F}$, a set of candidate joint distributions $F_{MB}$. We desire a classifier $g_n$ that is guaranteed to be consistent, no matter which of the possible distributions $F \in \mathcal{F}$ is the true distribution. A classifier with such a property is called a universally consistent classifier. Below, under a very general mind-brain model $\mathcal{F}$, we construct a universally consistent classifier.

Gedankenexperiment 1. Let the physical property under consideration be brain connectivity structure, so $b$ is a brain-graph (“connectome”) with vertices representing neurons (or collections thereof) and edges representing synapses (or collections thereof). Further let $\mathcal{B}$, the brain observation space, be the collection of all graphs on a given finite number of vertices and let $\mathcal{M}$, the mental property observation space, be finite. Now, imagine collecting very large amounts of very accurate independently and identically sampled brain-graph data and associated mental property indicators from $F_{MB}$. A $k_n$-nearest neighbor classifier using the Frobenius norm is universally consistent (see Methods for details). The existence of a universally consistent classifier guarantees that eventually (in $n, n'$) we will be able to conclude $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ for this mind-brain property pair, if indeed ε-supervenience holds. This logic holds for directed graphs or multigraphs or hypergraphs with discrete edge weights and vertex attributes, as well as unlabeled graphs (see ref. 8 for details). Furthermore, the proof holds for other matrix norms (which might speed up convergence and hence reduce the required $n$) and for the regression scenario, where $|\mathcal{M}|$ is infinite (again, see Methods for details).
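As a concrete illustration, here is a minimal sketch of such a classifier: adjacency matrices are compared in Frobenius norm and the label is decided by plurality vote. The default $k_n = \lfloor\sqrt{n}\rfloor$ is merely one schedule satisfying the consistency conditions given in Methods; it is our choice, not mandated by the text.

```python
# Sketch: k_n-nearest-neighbor classification of labeled graphs under
# the Frobenius norm. Inputs are adjacency matrices (numpy arrays).
import numpy as np
from collections import Counter

def knn_graph_classify(A_test, A_train, labels, k=None):
    """Predict the class of A_test by plurality vote among the k
    training graphs nearest in Frobenius norm."""
    n = len(A_train)
    k = k or max(1, int(np.sqrt(n)))          # k_n -> inf, k_n / n -> 0
    dists = [np.linalg.norm(A_test - A, "fro") for A in A_train]
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```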

Thus, under the conditions stated in the above Gedankenexperiment, universal consistency yields:

Theorem 3. Under the conditions of Gedankenexperiment 1, if $L^* < \varepsilon$, then power $\beta \to 1$ as $n, n' \to \infty$.

Unfortunately, the rate of convergence of $L_F(g_n)$ to $L_F(g^*)$ depends on the (unknown) distribution $F = F_{MB}$9. Furthermore, arbitrarily-slow-convergence theorems for this rate demonstrate that there is no universal $n, n'$ which will guarantee that the test has power greater than any specified target β > α10. For this reason, the test outlined above can provide only a one-sided conclusion: if we reject $H_0$, we can be 100(1 − α)% confident that $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ holds, but we can never be confident in its negation; rather, it may be the case that the evidence in favor of $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ is insufficient simply because we have not yet collected enough data. This leads immediately to the following theorem:

Theorem 4. For any target power $\beta_{\min} > \alpha$, there is no universal $n, n'$ that guarantees $\beta \geq \beta_{\min}$.

Therefore, even ε-supervenience does not satisfy Popper's falsifiability criterion11.

The feasibility of a consistent statistical test for ε-supervenience

Theorem 3 demonstrates the availability of a consistent test under certain restrictions. Theorem 4, however, demonstrates that convergence rates might be unbearably slow. We therefore provide an illustrative example of the feasibility of such a test on synthetic data.

Caenorhabditis elegans is a species whose nervous system is believed to consist of the same 302 labeled neurons for each organism12. Moreover, these animals exhibit a rich behavioral repertoire that seemingly depends on circuit properties13. These findings motivate the use of C. elegans for a synthetic data analysis14. Conducting such an experiment requires specifying a joint distribution $F_{MB}$ over brain-graphs and behaviors. The joint distribution decomposes into the product of a class-conditional distribution (likelihood) and a prior, $F_{MB} = F_{B|M} F_M$. The prior specifies the probability of any particular organism exhibiting the behavior. The class-conditional distribution specifies the brain-graph distribution given that the organism does (or does not) exhibit the behavior.

Let $A_{uv}$ be the number of chemical synapses between neuron $u$ and neuron $v$ according to ref. 15. Then, let $\mathcal{E}$ be the set of edges deemed responsible for odor-evoked behavior according to ref. 16. If odor-evoked behavior is supervenient on this signal subgraph $\mathcal{E}$, then the distribution of edges in $\mathcal{E}$ must differ between the two classes of odor-evoked behavior17. Let $E_{uv|j}$ denote the expected number of edges from vertex $v$ to vertex $u$ in class $j$. For class $m_0$, let $E_{uv|0} = A_{uv} + \eta$, where $\eta = 0.05$ is a small noise parameter (it is believed that the C. elegans connectome is similar across organisms12). For class $m_1$, let $E_{uv|1} = A_{uv} + z_{uv}$, where the signal parameter $z_{uv} = \eta$ for all edges not in $\mathcal{E}$ and $z_{uv}$ is sampled uniformly from $[-5, 5]$ for all edges within $\mathcal{E}$. For both classes, let each edge count be Poisson distributed, $A_{uv|j} \sim \mathrm{Poisson}(E_{uv|j})$.
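A sketch of this generative model in Python. The true $A_{uv}$ (ref. 15) and signal subgraph $\mathcal{E}$ (ref. 16) are replaced by hypothetical stand-ins, and we clip rates at zero, which is our assumption since Poisson rates must be nonnegative:

```python
# Sketch: class-conditional distributions over 279-neuron multigraphs.
# A and `signal` are placeholder stand-ins for the published data.
import numpy as np

rng = np.random.default_rng(0)
V = 279
A = rng.poisson(1.0, size=(V, V)).astype(float)   # stand-in for the real A_uv
signal = np.zeros((V, V), dtype=bool)
signal[:12, :12] = True                           # stand-in signal subgraph E
eta = 0.05

z = np.where(signal, rng.uniform(-5, 5, size=(V, V)), eta)  # drawn once, then fixed
E0 = A + eta                                      # class m0 rates
E1 = np.clip(A + z, 0.0, None)                    # class m1 rates, clipped at zero

def sample_graph(j, rng=rng):
    """One multigraph draw: each directed edge count is Poisson(E_uv|j)."""
    return rng.poisson(E0 if j == 0 else E1)
```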

We consider $k_n$-nearest neighbor classification of labeled multigraphs (directed, with loops) on the 279 vertices under the Frobenius norm (the C. elegans somatic nervous system has only 279 neurons that make synapses with other neurons). The $k_n$-nearest neighbor classifier used here satisfies $k_n \to \infty$ as $n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$, ensuring universal consistency. (Better classifiers can be constructed for the particular joint distribution $F_{MB}$ used here; however, we demand universal consistency.) Figure 1 shows that for this simulation, rejecting the null in favor of (ε = 0.1)-supervenience at α = 0.01 requires only a few hundred training samples.
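Combining the sketches above (the model sampler, the graph classifier, and the binomial p-value) gives an end-to-end version of this simulation; the sample sizes mirror the figure, but this is an illustrative sketch rather than the authors' exact code.

```python
# Sketch: end-to-end simulation, reusing sample_graph, knn_graph_classify,
# and epsilon_supervenience_pvalue from the sketches above.
n_j, n_prime, eps, alpha = 180, 1000, 0.1, 0.01

train = [(sample_graph(j), j) for j in (0, 1) for _ in range(n_j)]
A_train = [a for a, _ in train]
labels = [y for _, y in train]

errors = 0
for _ in range(n_prime):
    y = int(rng.random() < 0.5)   # classes equally likely a priori (our assumption)
    errors += knn_graph_classify(sample_graph(y), A_train, labels) != y

p = epsilon_supervenience_pvalue(errors, n_prime, eps)
print(p < alpha)   # with these well-separated stand-in classes, p should be tiny
```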

Figure 1

C. elegans graph classification simulation results.

The estimated hold-out misclassification rate (with n′ = 1000 testing samples) is plotted as a function of class-conditional training sample size $n_j = n/2$, suggesting that for ε = 0.1 we can determine that $\mathcal{M} \overset{\varepsilon}{\sim}_F \mathcal{B}$ holds with 99% confidence with just a few hundred training samples generated from $F_{MB}$. Each dot depicts $\hat{L}_{n'}(g_n)$ for some $n$; standard errors are $\sqrt{\hat{L}_{n'}(g_n)(1 - \hat{L}_{n'}(g_n))/n'}$. For example, at $n_j = 180$ we have $k_n = \lfloor \sqrt{n} \rfloor$ (where $\lfloor \cdot \rfloor$ indicates the floor operator) and standard error less than 0.01. We reject $H_0: L^* \geq 0.1$ at α = 0.01. Note that $L^* \approx 0$ for this simulation.

Importantly, conducting this experiment in actu is not beyond current technological limitations. 3D superresolution imaging18 combined with neurite tracing algorithms19,20,21 allows the collection of a C. elegans brain-graph within a day. Genetic manipulations, laser ablations and training paradigms can each be used to obtain a non-wild-type population for use as class $m_1$ (ref. 13), and the class of each organism ($m_0$ vs. $m_1$) can also be determined automatically22.

Discussion

This work makes the following contributions. First, we define statistical supervenience based on Davidson's canonical statement (Definition 1). This definition makes it apparent that supervenience implies the possibility of perfect classification (Theorem 1). We then prove that there is no viable test for supervenience, so one can never reject a null hypothesis in favor of supervenience, regardless of the amount of data (Theorem 2). This motivates the introduction of a relaxed notion called ε-supervenience (Definition 2), for which consistent statistical tests are readily available. Under a very general brain-graph/mental property model (Gedankenexperiment 1), a consistent statistical test for ε-supervenience is always available, no matter the true distribution $F_{MB}$ (Theorem 3). In other words, the proposed test is guaranteed to reject the null whenever the null is false, given sufficient data, for any possible distribution governing mental property/brain property pairs.

Alas, arbitrarily-slow-convergence theorems demonstrate that there is no universal $n, n'$ for which convergence is guaranteed (Theorem 4). Thus, a failure to reject is ambiguous: even if the data satisfy the above assumptions, the failure to reject may be due either to (i) an insufficient amount of data or to (ii) $\mathcal{M}$ not being ε-supervenient on $\mathcal{B}$. Moreover, the data will not, in general, satisfy the above assumptions. In addition to dependence (because each human does not exist in a vacuum), the mental property measurements will often be “noisy” (for example, accurately diagnosing psychiatric disorders is a sticky wicket23). Nonetheless, the synthetic data analysis suggests that under somewhat realistic assumptions, convergence obtains with an amount of data one might conceivably collect (Figure 1 and ensuing discussion).

Thus, given measurements of mental and brain properties that we believe reflect the properties of interest, and given a sufficient amount of data satisfying the independent and identically sampled assumption, a rejection of $H_0: L^* \geq \varepsilon$ in favor of $H_A: L^* < \varepsilon$ entails that we are 100(1 − α)% confident that the mental property under investigation is ε-supervenient on the brain property under investigation. Unfortunately, failure to reject is more ambiguous.

Interestingly, much of contemporary research in neuroscience and cognitive science could be cast as mind-brain supervenience investigations. Specifically, searches for “engrams” of memory traces24 or “neural correlates” of various behaviors or mental properties (for example, consciousness25) may be more aptly called searches for the “neural supervenia” of such properties. Letting the brain property be a brain-graph is perhaps especially pertinent in light of the advent of “connectomics”26,27, a field devoted to estimating whole-organism brain-graphs and relating them to function. Testing supervenience of various mental properties on these brain-graphs will perhaps therefore become increasingly compelling; the framework developed herein could be fundamental to these investigations. For example, the question of whether connectivity structure alone is sufficient to explain a particular mental property is one possible mind-brain ε-supervenience investigation. The above synthetic data analysis demonstrates the feasibility of testing ε-supervenience on small brain-graphs. Note that ε-supervenience tests need not investigate seemingly intractable problems, like consciousness. For example, aspects of visual perception appear to supervene on visual cortical activity (for example, binocular rivalry28). Moreover, an inability to reject ε-supervenience for small ε is also potentially meaningful. For example, perhaps auditory localization precision supervenes on a rate code only to some ε > c, the rest supervening on a spike-timing code29. Similar supervenience tests on increasingly complex mental properties will potentially benefit from either higher-throughput imaging modalities30,31, more coarse brain-graphs32,33, or both.

Methods

The 1-nearest neighbor (1-NN) classifier works as follows. Compute the distance between the test brain $b$ and all $n$ training brains, $d_i = d(b, b_i)$ for all $i \in [n]$, where $[n] = \{1, 2, \ldots, n\}$. Then, sort these distances, $d_{(1)} < d_{(2)} < \cdots < d_{(n)}$, and consider their corresponding minds, $m_{(1)}, m_{(2)}, \ldots, m_{(n)}$, where parenthetical indices indicate rank order among $\{d_i\}_{i \in [n]}$. The 1-NN algorithm predicts that the unobserved mind is of the same class as the closest brain's class: $\hat{m} = m_{(1)}$. The $k_n$-nearest neighbor classifier is a straightforward generalization of this approach: it predicts that the test mind is in whichever class is the plurality class among the $k_n$ nearest neighbors, $m_{(1)}, \ldots, m_{(k_n)}$. Given a particular choice of $k_n$ (the number of nearest neighbors to consider) and a choice of $d(\cdot, \cdot)$ (the distance metric used to compare the test datum and training data), one has a relatively simple and intuitive algorithm.
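The following sketch transcribes this description directly (a hypothetical helper, not the authors' code); `d` can be any distance, for example the Frobenius norm used in the main text.

```python
# Sketch of the k_n-NN rule exactly as described: sort training brains
# by distance to the test brain, then take the plurality mind among
# the k nearest (k = 1 recovers the 1-NN rule).
from collections import Counter

def knn_predict(b, training_pairs, k, d):
    """training_pairs: iterable of (m_i, b_i) mind/brain pairs."""
    ranked = sorted(training_pairs, key=lambda mb: d(b, mb[1]))
    votes = Counter(m for m, _ in ranked[:k])
    return votes.most_common(1)[0][0]
```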

Let $g_n$ be the $k_n$-nearest neighbor ($k_n$NN) classifier when there are $n$ training samples. A collection of such classifiers $\{g_n\}$, with $k_n$ increasing with $n$, is called a classifier sequence. A universally consistent classifier sequence is any classifier sequence that is guaranteed to converge to the Bayes optimal classifier regardless of the true distribution from which the data were sampled; that is, a universally consistent classifier sequence satisfies $L_F(g_n) \to L_F(g^*)$ as $n \to \infty$ for all $F_{MB}$. In the main text, we refer to the whole sequence as a classifier.

The $k_n$NN classifier is universally consistent if (i) $k_n \to \infty$ as $n \to \infty$ and (ii) $k_n/n \to 0$ as $n \to \infty$34. In Stone's original proof34, $b$ was assumed to be a $q$-dimensional vector, and the $L_2$ norm ($\|x\|_2 = (\sum_j x_j^2)^{1/2}$, where $j$ indexes elements of the $q$-dimensional vector) was shown to satisfy the constraints on a distance metric for this collection of classifiers to be universally consistent. Later, others extended these results to apply to any $L_p$ norm9. When brain-graphs are represented by their adjacency matrices, one can stack the columns of the adjacency matrices, effectively embedding graphs into a vector space, in which case Stone's theorem applies. Stone's original proof also applied to the scenario where $|\mathcal{M}|$ is infinite, resulting in a universally consistent regression algorithm as well.
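A small sanity check of this embedding, showing that column-stacking (the vec operator) turns the Frobenius norm on graphs into the $L_2$ norm on vectors, so Stone's theorem applies; the matrices below are arbitrary toys.

```python
# Column-stack adjacency matrices to embed graphs in R^(V*V); the L2
# distance between embeddings equals the Frobenius distance between
# the original matrices.
import numpy as np

def embed(adjacency):
    return adjacency.flatten(order="F")   # column stacking (vec)

A1 = np.arange(9.0).reshape(3, 3)
A2 = np.ones((3, 3))
assert np.isclose(np.linalg.norm(embed(A1) - embed(A2)),
                  np.linalg.norm(A1 - A2, "fro"))
```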

Note that the above extension of Stone's original theorem to the graph domain implicitly assumed that vertices were labeled, such that elements of the adjacency matrices could easily be compared across graphs. In theory, when vertices are unlabeled, one could first map each graph to a quotient space invariant to isomorphisms and then proceed as before. Unfortunately, there is no known polynomial-time algorithm for graph isomorphism35, so in practice, dealing with unlabeled vertices will likely be computationally challenging8.