Universal principles justify the existence of concept cells

A widespread consensus holds that the emergence of abstract concepts in the human brain, such as a “table”, requires a complex, perfectly orchestrated interaction of myriads of neurons. However, this is not what converging experimental evidence suggests. Single neurons, the so-called concept cells (CCs), may be responsible for complex tasks performed by humans. This finding, with deep implications for neuroscience and the theory of neural networks, has so far lacked solid theoretical grounds. Our recent advances in the stochastic separability of high-dimensional data provide a basis for validating the existence of CCs. Here, starting from a few first principles, we lay out biophysical foundations showing that CCs are not only possible but highly likely in brain structures such as the hippocampus. Three fundamental conditions, fulfilled by the human brain, ensure high cognitive functionality of single cells: a hierarchical feedforward organization of large laminar neuronal strata, a suprathreshold number of synaptic entries to principal neurons in the strata, and a magnitude of synaptic plasticity adequate for each neuronal stratum. We illustrate the approach with a simple example of acquiring “musical memory” and show how the concept of musical notes can emerge.


Central Limit Theorem
Let {X_i}_{i=1}^n be n independent random variables with zero means and standard deviations {σ_i}_{i=1}^n. We introduce the new random variable

S_n = (X_1 + ⋯ + X_n)/B_n,  B_n = (σ_1² + ⋯ + σ_n²)^{1/2},

with a certain cdf F_n(·). Then, we have [2]:

sup_x |F_n(x) − Φ(x)| ≤ (C/B_n³) Σ_{i=1}^n ρ_i,

where Φ is the cdf of the standard normal distribution and ρ_i = E[|X_i|³]. Moreover, the constant C is bounded by [3,4]:

0.4097 ≤ C ≤ 0.5600.

Property (3), being a version of the Central Limit Theorem, implies that empirical averages of independent random variables with zero means and finite second and third moments are asymptotically normally distributed as n → ∞. If no further assumptions are imposed, the convergence rate is O(n^{−1/2}).
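As an illustrative sanity check (not part of the derivation), the convergence in (3) can be probed numerically. The sketch below, assuming uniform variates on [−1, 1] purely for illustration, estimates the sup-norm gap between the empirical cdf of the normalized sum and Φ:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(42)

def phi(x):
    # Standard normal cdf via the error function.
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def max_cdf_gap(n, trials=20_000):
    # Normalized sum of n iid uniforms on [-1, 1] (zero mean, variance 1/3),
    # so that the sum has unit variance, as in the statement of (3).
    x = rng.uniform(-1.0, 1.0, size=(trials, n))
    s = x.sum(axis=1) / np.sqrt(n / 3.0)
    grid = np.linspace(-3.0, 3.0, 201)
    emp = (s[:, None] <= grid).mean(axis=0)          # empirical cdf F_n
    return np.abs(emp - np.array([phi(g) for g in grid])).max()

# The gap shrinks as n grows, consistent with the O(n^{-1/2}) rate.
print(max_cdf_gap(1), max_cdf_gap(64))
```

For a single uniform variate the gap is visibly large (about 0.06), while for n = 64 it is already at the level of Monte Carlo noise.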

Decay of tails of the membrane potential
Employing (3) with a random stimulus x (see the main text), and noting that the membrane potential is a normalized sum of independent terms with, as above, F_n(·) the corrected distribution, we obtain from inequalities (1) an estimate of the firing probability in response to a random stimulus x. By employing this concentration inequality together with (5), we find the bounds p_low(θ, n) ≤ p ≤ p_up(θ, n). For high n, these bounds converge to the probability value 1 − Φ(√3 θ), provided in the main text. We also note the exponential convergence of p_up(θ, n) to zero as a function of θ; this is a direct consequence of measure concentration effects.
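The limiting value 1 − Φ(√3 θ) is easy to cross-check by Monte Carlo. A minimal sketch, assuming (for illustration only) stimulus components drawn uniformly from [−1, 1] and a random unit-norm weight vector:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def firing_probability(n, theta, trials=50_000):
    # Random unit-norm weight vector and uniform stimuli on [-1, 1]^n.
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)
    x = rng.uniform(-1.0, 1.0, size=(trials, n))
    v = x @ w                      # membrane potentials, Var(v) = 1/3
    return (v > theta).mean()      # fraction of firing events

theta, n = 1.0, 200
p_mc = firing_probability(n, theta)
# CLT limit: v ~ N(0, 1/3), so P(v > theta) -> 1 - Phi(sqrt(3) * theta).
p_clt = 1.0 - 0.5 * (1.0 + erf(sqrt(3.0) * theta / sqrt(2.0)))
print(p_mc, p_clt)
```

Already at n = 200 the empirical firing probability sits within Monte Carlo noise of 1 − Φ(√3) ≈ 0.042.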
Conditions of neuronal firing in forward time: Selection of β_sl

We set the firing threshold small enough, e.g., θ = 1. Then, with high probability, all neurons are active, i.e., d_j ≥ 1, and there are no lost stimuli (Fig. 2(a) in the main text).
For convenience, we denote by h = √(3/n) x_i the first stimulus activating the j-th neuron at t* ≥ 0, i.e., y_j(t < t*) = 0, y_j(t*) > 0. Let us now find the condition under which the neuron keeps "firing" for t > t*.
We decompose w_j into components parallel and orthogonal to h (omitting the index j): w = w_∥ + w_⊥, where w_∥ := q(t) h/‖h‖ and ⟨w_∥, w_⊥⟩ = 0. Then, Eq. (2c) from the main text yields the dynamics of q and w_⊥. By construction, at t = t* the neuron fires, i.e., v(t*) = q(t*)‖h‖ > θ. Note that q(t ≥ t*) > 0; otherwise y = 0 and there is no dynamics. Selecting β > θ/‖h‖, we ensure the firing condition y(t ≥ t*) > 0. Then, w_⊥(t) → 0 and q → β, which implies the asymptotic weight configuration provided in the main text. Note that the value of β should not be too high, since this can diminish the neuronal selectivity (see below). Choosing β = θ/‖h‖ + ε, where 0 < ε ≪ 1, ensures activity of the neuron, but it requires knowledge of ‖h‖, inaccessible a priori. Then, by using ‖h‖² ∼ N(1, 2/√(5n)) (which follows directly from Section 1 for n high enough) and requiring P(‖h‖² > δ²) = p_sl, where δ ∈ (0, 1) is a lower bound on ‖h‖, we can set β_sl = θ/δ. This guarantees firing of the neuron in response to the stimulus h in forward time with a probability no smaller than p_sl. Note that the higher the neuronal dimension n, the higher p_sl can be chosen.
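The resulting rule can be sketched numerically. The code below assumes the form β_sl = θ/δ, with δ obtained from the quantile condition P(‖h‖² > δ²) = p_sl under the normal approximation ‖h‖² ∼ N(1, 2/√(5n)); the stdlib NormalDist supplies the quantile:

```python
import numpy as np
from statistics import NormalDist

def beta_sl(n, theta=1.0, p_sl=0.99):
    # Solve P(||h||^2 > delta^2) = p_sl with ||h||^2 ~ N(1, 2/sqrt(5 n)).
    sigma = 2.0 / np.sqrt(5.0 * n)
    delta2 = NormalDist(mu=1.0, sigma=sigma).inv_cdf(1.0 - p_sl)
    delta = np.sqrt(delta2)
    return theta / delta, delta

# Empirical check that the fraction of stimuli with ||h|| > delta is ~ p_sl,
# sampling h = sqrt(3/n) x with x uniform on [-1, 1]^n.
rng = np.random.default_rng(1)
n, theta, p_sl = 2_000, 1.0, 0.99
beta, delta = beta_sl(n, theta, p_sl)
h2 = (3.0 / n) * (rng.uniform(-1.0, 1.0, size=(5_000, n)) ** 2).sum(axis=1)
frac = (h2 > delta ** 2).mean()
print(beta, delta, frac)
```

Since δ < 1, the prescription yields β_sl slightly above θ, consistent with keeping β as small as selectivity allows.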

Selectivity after learning
We assume that a neuron has learnt an arbitrary stimulus, which we denote by h = √(3/n) x_i. Then, after learning, w = β h/‖h‖. We now estimate the probability that the neuron stays silent in response to another arbitrary stimulus, given h. This can be done in several ways.

Probability by normal distribution
By employing the normal distribution, from (12) we obtain the silence probability for a fixed h. We then extend it to arbitrary h as above by averaging over κ(·; µ, σ), the normal pdf with mean µ = 1 and standard deviation σ = 2/√(5n). Equation (18) corresponds to Eq. (6) in the main text.

Comparison of two approaches
The neuronal selectivity is given by [Eq. (7) in the main text] S(n, L) = P^{L−1}, where P can be taken either from (16) or from (18). Figure 1 shows the neuronal selectivity estimated by the two methods. The lower bound obtained from inequalities (16) (blue curve) is too conservative, while Eq. (18) matches the numerical results well (see Fig. 3(a) in the main text).
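A direct Monte Carlo cross-check of S(n, L) = P^{L−1} is straightforward. A sketch, assuming the post-learning weights w = β h/‖h‖ with the minimal β = θ/‖h‖ and uniform stimuli as above:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(7)

def silence_probability(n, theta=1.0, trials=50_000):
    # Learn one stimulus h = sqrt(3/n) x_1 and set w = beta * h / ||h||.
    x1 = rng.uniform(-1.0, 1.0, size=n)
    h = sqrt(3.0 / n) * x1
    beta = theta / np.linalg.norm(h)     # minimal beta keeping the neuron active
    w = beta * h / np.linalg.norm(h)
    # Fraction of fresh random stimuli that leave the neuron silent.
    x = rng.uniform(-1.0, 1.0, size=(trials, n))
    return ((x @ w) <= theta).mean()

n, L, theta = 200, 10, 1.0
P = silence_probability(n, theta)
S = P ** (L - 1)
# Normal approximation: v ~ N(0, beta^2 / 3), so P ~ Phi(sqrt(3) theta / beta);
# with beta ~ theta this gives Phi(sqrt(3)) ~ 0.958.
P_pred = 0.5 * (1.0 + erf(sqrt(3.0) / sqrt(2.0)))
print(P, S, P_pred)
```

Raising L multiplies the exponent and rapidly degrades S, which is why β (and hence the firing region) must stay small.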

Learning condition
At t = 0, we assume that the neuron detects the first stimulus h_1 = y_1, i.e., ⟨w(0), h_1⟩ > θ_cn, which is equivalent to q(0) > θ_cn/‖h_1‖ in Eq. (9). Thus, to keep firing, we require this condition to hold throughout the first interval. By using Eq. (9) we get that, at the end of the first interval ∆, w → β_cn h_1/‖h_1‖. In general, the initial condition for the k-th interval is q_k0, determined by the weights accumulated over the previous intervals. To meet the firing condition at t = (k − 1)∆, we require q_k0 > θ_cn/‖h_1‖, where we used ⟨y_i, y_j⟩ = 0 for j ≠ i (see the main text). Thus, given that α is big enough, the neuron will fire during the whole process of learning.
Once the learning is finished, the weight vector attains the magnitude β_cn along the learned direction.

Estimate of β cn
For convenience, let us introduce shorthand notation. We then set β_cn = θ_cn Ψ, where Ψ satisfies [Eq. (24)] the condition that the concept stratum learns at least K inputs with a probability not smaller than p_cn, the lower probability bound.

For further calculations, we assume that z := ‖y‖² is exponentially distributed, f_z(z) = λ e^{−λz}, for some constant λ > 0. Then, S, being a sum of such independent terms, follows the Erlang distribution. To find the distribution of M, we express it through F_z, the cdf of z. We can now assume that S and M are independent, and hence f(m, s) = f_M(m) f_S(s). Then, Eq. (26) yields an integral of the joint density.

By using (28), (30), (31), and operating, we arrive at the integral (32). We now note that a is a small parameter. Thus, we can approximate e^{−u − a√u} ≈ e^{−u}(1 − a√u) and evaluate the integral (32). This provides the estimate (34).

We now note that λ = 1/E[z] and assume that all neurons in the selective stratum have learnt stimuli, where b is a binary vector representing the neurons activated by the stimulus h. Then, we note that ‖b‖² ∼ B(m, p) and hence E[‖b‖²] = mp. In the case that all L stimuli have been learnt, we have p = L^{−1}. Now, we have E[(‖h‖ − δ)²] = 1 − 2δ E[‖h‖] + δ². In the first-order approximation, E[‖h‖] ≈ 1, which gives approximation (37). Substituting approximation (37) into Eq. (34), we obtain Eq. (10) provided in the main text.
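The small-parameter approximation e^{−u − a√u} ≈ e^{−u}(1 − a√u) used to evaluate (32) can be checked numerically; integrating both sides over [0, ∞) gives ∫ e^{−u}(1 − a√u) du = 1 − a√π/2, while the exact integral is computed by quadrature (a sketch, with a chosen small as in the derivation):

```python
import numpy as np

def exact_integral(a, upper=40.0, steps=400_000):
    # Trapezoidal quadrature of \int_0^inf exp(-u - a sqrt(u)) du
    # (the tail beyond `upper` is negligible, ~ e^{-40}).
    u = np.linspace(0.0, upper, steps)
    f = np.exp(-u - a * np.sqrt(u))
    return (((f[:-1] + f[1:]) / 2.0) * np.diff(u)).sum()

def first_order(a):
    # \int_0^inf e^{-u} (1 - a sqrt(u)) du = 1 - a * Gamma(3/2)
    return 1.0 - a * np.sqrt(np.pi) / 2.0

a = 0.05
I_exact, I_approx = exact_integral(a), first_order(a)
print(I_exact, I_approx)
```

The discrepancy is of order a²/2 (the next term of the expansion of e^{−a√u}), confirming that the first-order evaluation of (32) is adequate for small a.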