Multiplicative modulations in hue-selective cells enhance unique hue representation

There is still much to understand about the color processing mechanisms in the brain and the transformation from cone-opponent representations to perceptual hues. Moreover, it is unclear which area(s) in the brain represent unique hues. We propose a hierarchical model inspired by the neuronal mechanisms in the brain for local hue representation, which reveals the contributions of each visual cortical area in hue representation. Local hue encoding is achieved through incrementally increasing processing nonlinearities beginning with cone input. Besides employing nonlinear rectifications, we propose multiplicative modulations as a form of nonlinearity. Our simulation results indicate that multiplicative modulations contribute significantly to the encoding of hues along intermediate directions in the MacLeod-Boynton diagram and that model V4 neurons have the capacity to encode unique hues. Additionally, responses of our model neurons resemble those of biological color cells, suggesting that our model provides a novel formulation of the brain's color processing pathway.

The color processing mechanisms in the primary visual cortex and later processing stages are a target of debate among color vision researchers. What is also unclear is which brain area represents unique hues, those pure colors unmixed with other colors. In spite of a lack of agreement on color mechanisms in higher visual areas, human visual system studies confirm that color encoding starts with three types of cones forming the LMS color space. Cones send opponent feedforward signals to LGN cells with single-opponent receptive fields [1]. Cone-opponent mechanisms such as those in LGN were the basis of the "Opponent Process Theory" of Hering [2], where he introduced unique hues. The four unique hues, red, green, yellow and blue, were believed to be encoded by cone-opponent processes. Later studies [3,4,5], however, confirmed that the cone-opponent mechanisms of earlier processing stages do not correspond to Hering's red vs. green and yellow vs. blue opponent processes. In fact, they observed that the color coding in early stages is organized along the two dimensions of the MacLeod and Boynton (MB) [6] diagram, that is, along the L vs. M and S vs. LM axes. (In the MB diagram, the horizontal and vertical axes represent excitations of cones in an equi-luminance plane. The horizontal axis, often referred to as the L vs. M axis, corresponds to opposing signals from L and M cones. The vertical axis, referred to as S vs. LM, represents the opposing signal from S cones against the combination of signals from L and M cones.) Beyond LGN, studies on multiple regions in the ventral stream show an increase in nonlinearity with respect to the three cone types from LGN to higher brain areas [7] and also a shift of selectivity toward intermediate hues in the MB diagram with more diverse selectivity in later processing stages [8]. Specifically, in V1, some suggested that neurons representing local hue have single-opponent receptive fields similar to LGN cells [9,10] with comparable chromatic selectivities obtained by a rectified sum of the three cone types [11] or by combining LGN activations in a nonlinear fashion [4]. In contrast, Wachtler et al. [12] found that the tunings of V1 neurons are different from LGN and that the responses in V1 are affected by context.
Although Namima et al. [13] found neurons in V4, AIT and PIT to be luminance-dependent and that the effect of luminance on neuronal responses varies from one stimulus color to another, others reported that in the macaque extrastriate cortex, millimeter-sized neuron modules called globs have luminance-invariant color tunings [14]. Within globs in V4, clusters of hue-selective patches with sequential representation following the color order in the HSL space were identified, which were called "rainbows of patches" [15] (see Figure 6(a) for an example). A similar observation was noted by Conway and Tsao [16], who suggested that cells in a glob are clustered by color preference and form the hypothesized color columns of Barlow [17]. Patches in each cluster have the same visual field location, with a great overlap in their visual field with their neighboring patches [15,14]. Moreover, each color activates 1-4 overlapping patches and neighboring patches are activated for similar hues. Comparable findings in V2 were reported in [18]. Following these observations, Li et al. [15] suggested that different multi-patch patterns represent different hues, and such a distributed and combinatorial color representation could encode the large space of physical colors, given the limited number of neurons in each cortical color map. Other studies also suggested that glob populations uniformly represent color space [19] with narrow tunings for glob cells [19,20,21].
Not only is there disagreement about the color processing mechanisms in the visual cortex, but also which region in the brain represents unique hues. Furthermore, transformation mechanisms from cone-opponent responses to unique hues are unclear. While unique red is found to be close to the +L axis in the MB diagram, unique green, yellow and blue hues cluster around intermediate directions [22], not along cone-opponent axes. Perhaps clustering of the majority of unique hues along intermediate directions could describe the suggestion by Wuerger et al. [23] who proposed that the encoding of unique hues, unlike the tuning of LGN neurons, needs higher order mechanisms such as a piecewise linear model in terms of cone inputs. The possibility of unique hue representations in V1 and V2 was rejected in [24], who like others [21,25] observed neurons in PIT show selectivities to all hue angles 2 and that there are more neurons selective to those close to unique hues. The choice of stimuli for recordings in [24] was then challenged in [26] commenting that it is still unclear whether or not unique hues are represented in IT. Similarly, Zaidi et al. [27] observed no significance of unique hues in human subjects and responses of IT neurons.
Among the attempts to understand the neural processes for the transformation from cone-opponency to perceptual colors, a number of computational models have suggested mechanisms for this problem and for other aspects of color representation in higher areas [28,29,30,31,32]. These models, however, are one-layer formulations of perceptual hue encoding; in other words, the totality of processing in these models is compressed into a single-layer process. The end result may indeed provide a suitable model in the sense of its input-output characterization. However, such a model does not make an explicit statement about what each of the processing areas of the visual cortex contributes to the overall result, and it does not shed light on the color representation mechanisms in the brain.
In this article, we introduce a computational color processing model that, as Brown [33] argues, helps in "understand[ing] how the elements of the brain work together to form functional units and ultimately generate the complex cognitive behaviors we study". For this purpose, we build a hierarchical framework, inspired by neural mechanisms in the visual system, that explicitly models neurons in each of the LGN, V1, V2, and V4 areas and reveals how each visual cortical area participates in the process. In this model, nonlinearity is gradually increased in the hierarchy as observed by [7]. In particular, while a half-wave rectifier unit keeps the V1 tunings similar to those of LGN [11], it makes V1 neurons nonlinear in terms of cone inputs. In V2, in addition to single-opponent cells, we propose employing neurons with multiplicative modulations, which not only introduce another form of nonlinearity but also allow neuronal interactions in the form of mixing of color channels as well as a decrease in the tuning bandwidths. De Valois et al. [31] suggested that additive or subtractive modulation of cone-opponent cells with S-opponent cell responses rotates the cone-opponent axes to red-green and blue-yellow directions. Here, we achieved this rotation with multiplicative modulations of V1 L- and M-opponent cell activations with V1 S-opponent neuron responses. We call these cells "multiplicative V2" neurons. Finally, V4 responses are computed by linearly combining V2 activations with weights determined according to tuning peak distances of V2 cells to the desired V4 neuron tuning peak. Figure 1(b) demonstrates our network in action. We found that the tuning peak of multiplicatively modulated V2 cells shifts toward hues along intermediate directions in the MB space. Consequently, compared to single-opponent V2 cells, these neurons have substantial input weights to V4 neurons selective to hues along intermediate directions. Moreover, we observed a gradual decrease in distance of tuning peaks to unique hue angles reported by [34] from our model LGN cells to V4 neurons. Our simulation results demonstrate that responses of our network neurons resemble those of biological color cells.
In what follows, we will make a distinction between our model and brain areas by referring to those as layers and areas, respectively. That is, a set of model neurons implementing cells in a brain area will be referred to as a model layer. For example, our model layer V2 implements cells in brain area V2.
Figure 1: Each layer of this model implements neurons in a single brain area. Each map within a layer consists of neurons of a single type with receptive fields spanning the visual field of the model, for example, a map of neurons selective to red hue in model layer V4. The leftmost layer in this figure shows the input to the network with the LMS cone activations. Note that the color employed for each feature map here is figurative and not a true representation of the hue-selectivity of its comprising neurons. In model layer V4, an example of a model cluster is shown in a larger view, similar to the clusters found in monkey V4 [15]. Each model cluster corresponds to a column of the three-dimensional matrix obtained by stacking V4 maps. Each element of a model cluster is called a model patch. (b) An example showing each layer of the hierarchical color model on an image of a hue wheel. Each layer of the network is shown by a bounding box, with a number of neuronal maps inside the box. Next to each map, the selectivity of its neurons is written. The receptive field of each neuron in these maps is centered at the corresponding pixel location. The neuron responses are shown in grayscale, with a minimum response as black, and maximum activation as white. For example, in the map for neuron type L-on in the LGN layer, strong activities are observed for neurons with receptive fields around the red hue region. The dark border around each feature map is shown only for the purpose of this figure and is not part of the activity map.

Results
In this section, we explain our simulation experiments, designed to make two important aspects of our model clear:

Model Neuron Tunings
In order to test the effectiveness of our approach for modeling local hues, we examined the tuning of each hueselective neuron in the individual layers of our network. For this purpose, we sampled the hue dimension of the HSL space. We keep saturation and lightness values constant and set to 1 and 0.5 respectively, following [15]. Our sampling consists of 60 different hues in the range of [0, 360) degrees, separated by 6 degrees. When these HSL hue angles are mapped to a unit circle in the MB space, they are not uniformly spaced on the circle and are rotated. For example, the red hue in the HSL space at 0 deg corresponds to the hue at about 18 deg in the MB space. The mapping of the 60 sampled hues on a unit circle in the MB diagram are shown on the unit circles in Figure 2. The color of each dot corresponds to the hue it represents on the unit circle. Note that the positive vertical direction in the tuning plots corresponds to lime hues, following the plots from Conway and Tsao [16] , their Figure 1. We present each of these 60 hues to the model and record the activities of model LGN, V1, V2 and V4 neurons. Plots in Figures 2 and 3 show model neuron activities to each of the sampled hues. In each plot, the circular dimension represents the hue angle in the MB diagram, and the radial dimension represents the response level of the neuron. We found the tunings of our model LGN and V1 neurons look relatively similar with differences due to the nonlinearity of V1 neurons imposed by the rectifier. We plotted the responses of V1 neurons in both negative and positive ranges for comparison purposes with those of LGN, and in V2 and V4, the responses are shown in the positive range. Although it might not be evident from tunings of V1 cells and those of single-opponent V2 neurons in Figure 2, due to the plotted range of responses in these figures, we emphasize that the tunings of these cells look identical when plotted in the same range. 
The average difference of responses between pairs of model V1 and their corresponding single-opponent V2 cells is on the order of 10^-6. An example of similar tunings for these cells is shown in Figures 2(27) and 2(28). Comparing the tunings of model single-opponent and multiplicative V2 cells gives a clear picture of narrower tunings in the S-modulated V2 cells. Not only do these cells have narrower tunings, but they also generally peak close to intermediate directions. See, for example, the tuning of the V2 L-off × S-off cell, with a narrow tuning and a peak close to the unique green hue angle.
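The hue sampling described above can be sketched as follows. This is a minimal illustration using Python's standard `colorsys` module, not the code used for the simulations; the function name `sample_hsl_hues` is hypothetical, and the subsequent conversion from RGB to LMS and to MB hue angles [37] is not reproduced here.

```python
import colorsys

def sample_hsl_hues(n_hues=60, saturation=1.0, lightness=0.5):
    """Return (hue_deg, (r, g, b)) pairs for evenly spaced HSL hues."""
    step = 360.0 / n_hues  # 6 degrees for 60 hues
    samples = []
    for k in range(n_hues):
        hue_deg = k * step
        # colorsys expects (hue, lightness, saturation) with hue in [0, 1).
        rgb = colorsys.hls_to_rgb(hue_deg / 360.0, lightness, saturation)
        samples.append((hue_deg, rgb))
    return samples

samples = sample_hsl_hues()
```

Each entry pairs an HSL hue angle with its RGB triplet, so the first entry is the red hue at 0 deg.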
In our model layer V4, we implemented six different neuron types according to distinct red, yellow, green, cyan, blue and magenta in the HSL space. The chosen hues are 60 deg apart on the hue circle of HSL and the weights from model V2 cells to model V4 neurons were determined according to the distance between mean peak activations of model V2 neurons to the desired hue in a model V4 cell. Tunings of our model V4 neurons, depicted in Figure 3, show a clear peak for each cell close to its desired selectivity, with narrower tunings compared to single-opponent V2 cells.

Tuning bandwidths
In order to obtain a quantitative evaluation of the above observation regarding narrower tunings due to multiplicative modulations, we computed the bandwidth of all neurons in model layers V2 and V4, following [35]. In case of peaks at more than one hue, we take the mean peak hue as the representative peak response for the computation of bandwidth and, later, for peak tunings. Note that the goal of this analysis is to verify whether multiplicative modulations result in neurons with smaller bandwidths, i.e., narrower tunings. In this work, we did not hypothesize a model as did Kiper et al. [35], who computed the bandwidth threshold analytically for linear and nonlinear tunings, nor did we have a population of neurons to report the percentage of linear/nonlinear cells. Instead, we simply computed the bandwidth for each cell type in model layers V2 and V4 with respect to its responses at the discrete sampled hues and plotted the distribution of tuning bandwidths. Specifically, we computed the bandwidths of the 6 single-opponent V2 cells, 8 multiplicatively modulated V2 neurons, and 6 hue-selective V4 cells whose tunings were shown above.
Figure 2 (caption, partially recovered): ... and V2 to 60 hues sampled from the hue dimension in the HSL space. Each sampled hue is mapped to its corresponding hue angle in the MB space and is shown by a colored dot corresponding to the sampled hue on the circumference of a unit circle in the MB space. In each plot, the circular dimension represents the hue angle in the MB space. The level of responses is shown in the radial dimension in these plots. In each row, the model layer the neurons belong to is specified on the left edge of the row. The neuron type is mentioned below each plot. Tunings in (27) and (28) ...
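As an illustration of the mean-peak and bandwidth computation on discretely sampled tunings, consider the sketch below. It rests on assumptions: bandwidth is taken as the half-width at half-maximum, which is our reading and not a definition restated from [35]; hues are assumed uniformly spaced, whereas the sampled hues are not uniform in MB space; and the tuning is assumed to have a single lobe. All function names are hypothetical.

```python
import math

def circular_mean_deg(angles_deg):
    """Circular mean of angles in degrees, returned in [0, 360)."""
    x = sum(math.cos(math.radians(a)) for a in angles_deg)
    y = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(y, x)) % 360.0

def mean_peak_deg(hues_deg, responses, tol=1e-12):
    """Mean hue angle over all samples that reach the maximum response."""
    peak = max(responses)
    peak_hues = [h for h, r in zip(hues_deg, responses) if r >= peak - tol]
    return circular_mean_deg(peak_hues)

def half_width_deg(hues_deg, responses):
    """Half-width at half-maximum (assumed definition), estimated by
    counting uniformly spaced samples at or above half of the peak."""
    step = hues_deg[1] - hues_deg[0]
    half = 0.5 * max(responses)
    n_above = sum(1 for r in responses if r >= half)
    return 0.5 * n_above * step

# Toy tuning curve (not a model neuron): squared rectified cosine at 90 deg.
hues = [6.0 * k for k in range(60)]
toy = [max(0.0, math.cos(math.radians(h - 90.0))) ** 2 for h in hues]
toy_peak = mean_peak_deg(hues, toy)
toy_half_width = half_width_deg(hues, toy)
```

Counting samples gives only a coarse estimate with a resolution of one sampling step; interpolating the half-maximum crossings would be a natural refinement.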

Tuning mean peaks
Wachtler et al. [12] observed that most neurons in V1 peak around non-opponent directions, while Kiper et al. [35] reported that cells in V2 exhibited no preference for any particular color direction and no obvious bias to unique hues. We tested the mean tuning peaks of our model neurons to examine cone-opponent vs. intermediate selectivities. Figure 4(b) shows a polar histogram of tuning mean peaks of neurons in all layers of our model, where each sector of the circle represents a bin of the polar histogram. This figure shows that the majority of LGN, V1, and single-opponent V2 cells (5 out of 6 neuron types) peak close to cone-opponent axes. In contrast, our model multiplicative V2 cells and hue-sensitive V4 neurons peak at both cone-opponent and intermediate hues, as reported in [35] and [19]. In other words, with an increase in nonlinearity, the representation of hues along intermediate directions starts to develop.

Unique hue representation
In Figure 4(b), we observed that each V4 bar in the polar histogram was paired with a multiplicative V2 bar. We wondered about the contribution of multiplicative V2 cells to the responses of V4 neurons. Therefore, we compared the sum of single-opponent V2 cell weights against that of multiplicative V2 neurons, depicted in Figure 5(a). Interestingly, V4 cells selective to magenta, blue, and green hues, which are off the cone-opponent directions, have significant contributions from multiplicative V2 cells. In green and magenta, in particular, multiplicative cells make up more than 75% of V2-V4 weights. The rest of hues, which are close to the cone-opponent directions in the MB space, receive a relatively large feedforward input from the single-opponent V2 cells. In short, multiplicative cells play a significant role in the representation of hues in intermediate directions, while single-opponent cells have more substantial contributions to hues along cone-opponent axes. Next, we asked the question of how well neurons in each of our model layers represent unique hues? To answer this, we computed the distance of mean peak angles for our model neurons to the unique hue angles reported by Miyahara et al. [34]. For each unique hue, in each layer of our model, we report the distance of a model neuron with a peak closest to the distinct unique hue angle. That is, the minimum distance of mean peak angle among all neuron types in each layer to a given unique hue angle. The distances, layer by layer, are shown in Figure 5(b). In this figure, the distances of our model neurons to unique red and yellow are relatively small for all the four model layers, at less than 5 deg. This could be due to the fact that unique red and yellow reported in [34] are close to cone-opponent axes in the MB space, in agreement with findings of [22] for unique red hue. For unique green and blue, the V2 and V4 distances are far smaller than those of earlier layers. 
Specifically, there is a 44 deg drop in the V2 and V4 distances to unique green compared to that of V1. Also, the distance to the unique blue hue is even smaller in V4 than in V2. In summary, we observed a gradual development of selectivity to unique green and blue hues, while selectivity to unique red and yellow was observed in early as well as higher layers. Moreover, these results suggest that V4 cells with peak selectivities at less than 5 deg distance to unique hue angles, and consequently, neurons in higher layers, have the capacity to encode unique hues.
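The per-layer distance measure described above, the minimum circular distance between any mean peak angle in a layer and a given unique hue angle, can be sketched as follows. The function names and the example angles are hypothetical placeholders, not values from the model or from [34].

```python
def circular_distance_deg(a, b):
    """Smallest angular distance between two hue angles, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def min_distance_to_unique_hue(layer_peaks_deg, unique_hue_deg):
    """Distance from a unique hue to the closest mean peak in one layer."""
    return min(circular_distance_deg(p, unique_hue_deg) for p in layer_peaks_deg)

# Hypothetical mean peak angles for one layer and a hypothetical unique hue.
example_peaks = [2.0, 95.0, 181.0, 268.0]
example_unique_hue = 350.0
example_distance = min_distance_to_unique_hue(example_peaks, example_unique_hue)
```

Using a circular distance matters for hues near the 0/360 deg boundary: here the peak at 2 deg is the closest to the hue at 350 deg.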

Hue Distance correlation
In their study, Li et al. [15] found a correlation between pairs of stimulus hue distances and the cortical distances of maximally activated patches in each cluster. Figure 6(a) illustrates an example of an identified map of three clusters of patches in V4 reported in [15], with one cluster shown in a larger view in Figure 6(b). Figure 6(c) depicts the cortical distances of activated patches for pairs of hues as a function of the stimulus hue distances. For this analysis, Li et al. [15] employed an ordered representation for hues, according to the sequence ordering of patches witnessed in clusters, with 0 for magenta, 1 for red, 2 for yellow, and so on. They defined the hue distances as the difference of these assigned values.
In order to test for a similar relationship between hue distances and the pattern of activities of model V4 neurons, we stacked our model V4 maps, in the order shown in Figure 1(a), beginning with magenta, red, and so on. Stacking these maps results in a three-dimensional array, each column of which can be interpreted as a cluster of hue-selective patches, with neighboring patches sensitive to related hues, similar to those observed in V4 of monkeys [15]. We call each column of our stacked maps a model cluster and each element of the column a model patch. An example of a model V4 cluster in a larger view is shown in Figure 1(a). For a given model cluster and a pair of stimulus hues, we compute the distance of the two model patches within the cluster that are maximally activated by those hues. For example, the distance of the red patch from the cyan patch as shown in Figure 1(a) is 4. In this experiment, we employed our sampled hues from the HSL space, starting from red at 0 deg, separated by 30 deg. Analogous to the ordering assigned to stimulus hues employed in [15], we assigned values in the range [0, 5.5] at 0.5 steps starting with 0 for magenta. The plot in Figure 6(d) demonstrates our model patch distances as a function of hue distances with a clear correlation. The correlation coefficient was r = 0.93, p = 2.09 × 10^-29. In other words, similar to the biological V4 cells, the pattern of responses in our model V4 neurons is highly correlated with the ordering of hues in the HSL space.
Figure 5(b) (caption, partially recovered): ... [34]. For each given model layer and a unique hue, the minimum distance from mean peak hues of all neurons in the layer to the unique hue is reported. As the plot shows, unique hue representation develops in the hierarchy and the distances decrease gradually from LGN to V4. Unique red and yellow representations develop in earlier stages compared to unique green and blue. Note the significant drop in the mean peak distance for V2 and V4 neurons to unique green, achieved by increasing the nonlinearity in those layers.
Figure 6: (a) Color map of V4 neurons in three clusters of patches (adapted from [15]). (b) Larger view of a cluster of patches in V4 (adapted from [15]). (c) Cortical distance of activated patches in 6(b) as a function of hue distances (adapted from [15]). (d) Correlation analysis between the hue distances and model patch distances in each model cluster (axis label: Maximum Activation Distance).
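A sketch of this correlation analysis is given below, assuming that hue distance is the absolute difference of the assigned ordering values and that patch distance is the absolute difference of the indices of the maximally activated patches in the stacked maps. The patch indices used here are a toy assignment, not model outputs, and the function names are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def distance_correlation(order_values, patch_indices):
    """Correlate hue distances with patch distances over all hue pairs."""
    pairs = list(combinations(range(len(order_values)), 2))
    hue_d = [abs(order_values[i] - order_values[j]) for i, j in pairs]
    patch_d = [abs(patch_indices[i] - patch_indices[j]) for i, j in pairs]
    return pearson_r(hue_d, patch_d)

# Ordering values 0, 0.5, ..., 5.5 for 12 hues, starting with 0 for magenta.
order_values = [0.5 * k for k in range(12)]
# Toy assumption: each hue maximally activates the patch at the floor of its
# ordering value (magenta = 0, red = 1, yellow = 2, and so on).
toy_patches = [int(v) for v in order_values]
r = distance_correlation(order_values, toy_patches)
```

In the experiment itself, the patch indices would come from the maximally activated model patch in each model cluster rather than from this toy rule.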

Hue Reconstruction
Li et al. [15] showed that in monkeys, 1-4 patches are needed to represent any hue in the visual field. Moreover, they showed that different hues were encoded with different multi-patch patterns. Then, they suggested that a combination of these activated patches can form a representation for the much larger space of physical colors. Along this line, we show, through a few examples, that for a given hue, a linear combination of model V4 neurons can be learned and used for representing that particular hue. It is important to note that it would be impossible to learn weights for the infinitely many possible physical hues. Hence, we show only a few examples here. However, our experiment is an instance of the possible mechanism for color representation suggested by Li et al. [15].
In this experiment, for a given hue value, we independently sampled the saturation and lightness dimensions at 500 points. The samples were uniformly distributed along each dimension. As a result, we have 500 colors of the same hue. The goal is to compute a linear combination of model V4 neurons, which can reconstruct the groundtruth hue.
The hues in this experiment were represented as a number in the (0, 2π] range. For numerical reasons, red is represented as 2π, not 0. We performed an L1-regularized least squares minimization, using the "l1_ls" function described in [36]. Table 1 shows some of the results for this experiment. Interestingly, in all cases, no more than four neuron types have large weights compared to the rest of the neurons. This is in agreement with the findings of [15]. Specifically, in the case of red and yellow hues, about 99% of the contribution is from only a single cell, the red and yellow neurons respectively. The last row in Table 1 is most insightful. It presents the weights for a lavender hue equidistant from blue (240 deg) and magenta (300 deg). The weights for this example seem counterintuitive as they include green, cyan and magenta with positive contributions. In addition, blue is absent. However counterintuitive the weights seem, careful scrutiny of the mean peak angles for V4 hues reveals that the lavender hue at 270 deg is somewhere between the peaks for V4 cyan (at 193 deg) and magenta (at 300 deg), and closer to magenta. This hue is mainly reconstructed from magenta, with more than 70% contribution, while the small weight for cyan is compensated with that of green. In other words, in this case, the green cell plays the role of shifting the reconstruction from magenta toward lavender.
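To make the sparse regression step concrete, the following sketch solves an L1-regularized least squares problem by cyclic coordinate descent. This is a swapped-in technique for illustration and is not the solver of [36]; the objective scaling (a factor of 0.5 on the squared error) and the function names are our assumptions. In the setting above, each row of X would hold the six model V4 responses to one color and y would hold the target hue in (0, 2π].

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (proximal operator of the L1 norm)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1 over w."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot contribute
            # Correlation of column j with the residual that excludes column j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            )
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

# Toy check with an orthonormal design: small coefficients are zeroed out.
w_toy = lasso_coordinate_descent([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

The L1 penalty is what drives most weights to zero, which is consistent with only a few neuron types carrying large weights in Table 1.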
Once again, it must be stressed that this experiment was performed to examine the possibility of combinatorial representation mechanisms, and a thorough investigation of this mechanism in the computational sense is left for future work. The examples shown here suggest that the intermediate hues encoded by model V4 neurons can span the space of physical hues and can be used to reconstruct arbitrary hues from this space.

Discussion
Our goal was to further understanding of the color processing mechanisms in the brain and to begin to assign color representational roles to specific brain areas. We investigated the contributions of each visual area LGN, V1, V2, and V4 in local hue representation by proposing a mechanistic computational model inspired by neural mechanisms in the visual system. Through a gradual increase in nonlinearity in terms of cone inputs, we observed a steady decrease in tuning bandwidths with a gradual shift in peak selectivities toward intermediate hue directions. Although one might be able to model the end result with a mathematical model in a single-layer fashion, such models do not lend insight to the neuronal mechanisms of color processing in the brain. In contrast, not only do our model neurons in each individual layer exhibit behavior similar to those of biological cells, but also at the system level, our hierarchical model as a whole provides a plausible process for the progression of local hue representation in the brain. The main difference in terms of potential insight provided by a single-layer mathematical model and our work is that our model can make predictions about real neurons that can be tested. A model whose contributions are of the input-output behavior kind cannot (see also [33]).
We proposed multiplicative modulations in V2 as a means to increase nonlinearity in the hierarchy. We demonstrated that such modulations could rotate the cone-opponent axes to intermediate directions of perceptual red-green and yellow-blue hues and shift the tuning peaks toward unique hue angles. In short, our model predicts that multiplicative modulations are key operations in the encoding of hues in intermediate directions and unique hue representation.
Our experimental results demonstrated that hue selectivity for model V4 neurons similar to that of neurons in area V4 of the monkey visual system was successfully achieved. Besides, our observations from the hue reconstruction experiment clearly confirmed the possibility of reconstructing the whole hue space using a combination of the hue-selective neurons in the model V4 layer. How this is achieved in the brain, for the infinitely many possible hues, remains to be investigated.
Finally, our hierarchical network of neurons provides an important implication with regards to unique hue representations. Specifically, our computational experiments showed that as the visual signal moves through the hierarchy, responses with peaks close to unique hues start to develop. For unique red, our model single-opponent LGN cells peaked at less than a degree distance from this hue, while for unique green more complicated computations, or higher order mechanisms as put by others, were required and reaching such a close peak was delayed until model layer V2. Putting these together, we believe the answer to the question "which region in the brain represents unique hues?" is not limited to a single brain area, which in turn could be the source of disagreement among color vision researchers. Instead, our findings suggest that this question must be asked for each individual unique hue and that the answer will consist of an assorted set of brain regions for all four unique hues.
In our model, adding a variety of neurons such as concentric and elongated double-opponent color cells would result in a more inclusive system. However, we did not intend to make predictions about all aspects of color processing but only hue encoding mechanisms. We found that concentric double-opponent color cells, for example, have a tuning bandwidth distribution similar to that of single-opponent neurons and tuning peaks along cone-opponent axes. This finding suggests that the contributions of concentric double-opponent cells are comparable to those of single-opponent neurons for hue representation, but we did not investigate those contributions in other color representations.
Our hierarchical model can be further extended to encode saturation and lightness. In the future, we would like to also address the problem of learning weights from V2 to V4. Furthermore, the experiment on hue reconstruction was performed with a simple linear regression model. A more sophisticated learning algorithm might result in more insightful weights. Lastly, in order to keep our model simple and avoid second-order equations, we skipped lateral connections between neuron types. However, these are part of the future development of a second-order model for our network.
In this work, the input to our model is LMS channels. In the event that the presented stimulus was available in RGB, we first performed a conversion into LMS channels using the transformation algorithm proposed by [37] (we used the C code provided by the authors). As a result, one can think of the presented stimulus to the network as the activations of three cone types. These cone activations are fed to single-opponent LGN cells, which in turn feed single-opponent V1 cells with nonlinear rectification. In the V2 layer, single-opponent neurons replicate the activations of those of V1, but with larger receptive fields. Later, we refer to these single-opponent cells as "additive V2 neurons". "Multiplicative V2" neurons form when single-opponent V1 cells with L and M cone inputs are multiplicatively modulated by V1 neurons with S-cone input. This approach is in a sense similar to the S-modulations proposed by De Valois et al. [31], but in a multiplicative manner and not additive or subtractive. Finally, the hue-sensitive neurons in V4 receive feedforward signals from additive and multiplicative V2 cells.
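The effect of a multiplicative modulation on tuning can be illustrated with a toy example. The rectified-cosine tunings below are assumptions chosen for illustration (an L-on-like cell peaking on the +L axis at 0 deg and an S-on-like cell peaking on the +S axis at 90 deg); they are not the tunings of the model neurons, and the helper names are hypothetical.

```python
import math

def rectified_cos(theta_deg, peak_deg):
    """Toy half-wave rectified cosine tuning with the given peak angle."""
    return max(0.0, math.cos(math.radians(theta_deg - peak_deg)))

angles = list(range(360))
l_on = [rectified_cos(a, 0.0) for a in angles]   # toy V1 L-on tuning
s_on = [rectified_cos(a, 90.0) for a in angles]  # toy V1 S-on tuning
mult = [l * s for l, s in zip(l_on, s_on)]       # toy multiplicative V2 cell

def peak_angle(responses):
    """Hue angle (deg) of the first maximal response."""
    return angles[responses.index(max(responses))]

def samples_above_half_max(responses):
    """Number of 1-degree samples exceeding half of the peak response."""
    half = 0.5 * max(responses)
    return sum(1 for r in responses if r > half)
```

In this toy case the product peaks midway between the two input peaks, at an intermediate direction, and fewer samples exceed half of its maximum than for either input, which mirrors the peak shift and tuning narrowing attributed to multiplicative V2 cells.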
Our model was implemented in TarzaNN [38]. The neurons in all layers are linearly rectified. The rectification, φ, was performed using Eq. 1, where P is the neuron activity, m and b are the slope and base spike rate respectively, τ is a lower threshold of activities, and s represents the saturation threshold. This rectifier maps responses to [τ, 1]. Depending on the settings of the parameters τ and s, and the range of activations for the model neurons, the rectifier might vary from being linear to nonlinear. Wherever this rectifier is employed in the rest of the paper, we mention the settings of the parameters, and whether these settings make the neuron activations linear or nonlinear in terms of their input. The input to the hierarchical network was always resized to 256 × 256 pixels. The receptive field sizes, following [39], double from one layer to the one above. Specifically, the receptive field sizes we employed were 19 × 19, 38 × 38, 76 × 76, and 152 × 152 pixels for the LGN, V1, V2, and V4 layers respectively.
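Since the rectifier equation itself is not reproduced in this text, the sketch below shows one possible form that is consistent with the description: a linear response m·P + b, clipped below at τ and above at s. This form is an assumption for illustration and may differ from the equation used in the implementation; the function name `rectify` is hypothetical.

```python
def rectify(p, m=1.0, b=0.0, tau=0.0, s=1.0):
    """Assumed rectifier: linear response clipped to the range [tau, s]."""
    y = m * p + b
    return min(max(y, tau), s)

# With tau = -1 and s = 1, inputs in [-1, 1] pass through unchanged (linear).
linear_case = rectify(-0.5, tau=-1.0, s=1.0)
# With tau = 0 and s = 1, negative inputs are cut off (half-wave rectification).
nonlinear_case = rectify(-0.5, tau=0.0, s=1.0)
```

Under this assumed form and with s = 1, the output range is [τ, 1], and the same function behaves linearly or nonlinearly depending only on τ and the input range.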

Model LGN cells
The first layer of the hierarchy models single-opponent LGN cells. The LGN cells are characterized by their opponent inputs from cones. For example, LGN cells receiving excitatory input from L cones and inhibitory signals from M cones are known as L-on cells. Model LGN cell responses were computed by Eq. 2 [40], where * represents convolution. In this equation, the model LGN response, $R_{LGN}$, is computed by first linearly combining the cone activities, $R_L$, $R_M$, and $R_S$, each convolved with a normalized Gaussian kernel, G, of a different standard deviation, σ, and then applying a linear rectification, φ. For model LGN neurons, we set τ = −1 and s = 1 to ensure that the responses of these neurons are linear combinations of the cone responses [4,11]. The differences in standard deviations of the Gaussian kernels ensure different spatial extents for each cone, as described in [1]. Each weight in Eq. 2 determines the presence or absence and the excitatory or inhibitory effect of the corresponding cone. The weights used for model LGN cells were set following [1] and [9]. In total, we modeled six different LGN neuron types: L-on, L-off, M-on, M-off, S-on, and S-off. As an example, consider the weights for M-on cells: −1.0, 1.1, and 0 from L, M, and S cones, respectively. These neurons receive opposite contributions from L and M cones, while S cones, with weight 0, make no contribution. That is, M and L cones have excitatory and inhibitory effects, respectively, while S cones are absent. This type of neuron is known to respond best to cyan-like hues [41]. A relatively similar hue selectivity is observed in L-off cells. In what follows, whenever we refer to a cell as L, M, or S in layers LGN and higher, we will be referring to the pair of on and off neurons in that layer. For instance, M-on and M-off neurons in LGN might be called M neurons in this layer, for brevity.
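Written out from the description above, the LGN computation plausibly takes the following form. The weight symbols $a_L$, $a_M$, $a_S$ and the per-cone subscripts on σ are notation assumed here for illustration.

```latex
\[
R_{LGN} = \phi\Big( a_L \,\big(G(\sigma_L) * R_L\big)
                  + a_M \,\big(G(\sigma_M) * R_M\big)
                  + a_S \,\big(G(\sigma_S) * R_S\big) \Big)
\]
% Example from the text: an M-on cell uses (a_L, a_M, a_S) = (-1.0, 1.1, 0),
% and phi is applied with tau = -1 and s = 1, so R_LGN stays linear in the cone responses.
```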

Model V1 cells
Local hue in V1, as suggested in [10] and [9], can be encoded by single-opponent cells. To obtain such a representation in the model V1 layer, the responses are determined by convolving input signals with a Gaussian kernel. Note that since single-opponency is implemented in the model LGN layer, simply convolving model LGN signals with a Gaussian kernel also yields single-opponency in V1. The local hue responses of V1 were obtained by Eq. 3, where φ is the rectifier in Eq. 1. With τ = 0 and s = 1 for the rectifier, our model V1 neurons are nonlinear functions of cone activations. In Eq. 3, substituting $R_{LGN}$ with any of the six model LGN neuron type responses results in a corresponding V1 neuron type. Therefore, there are six neuron types in layer V1, corresponding to L-on, L-off, M-on, M-off, S-on, and S-off. The size of the Gaussian kernel for each of these neurons determines its receptive field size. In our implementation, the receptive field size doubles from one layer to the next, following similar observations in the ventral stream [39].
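The V1 step described above (Gaussian smoothing of an LGN map followed by rectification with τ = 0 and s = 1) can be sketched in one dimension. This is a simplified illustration, not the model code: the 1-D signal, the kernel size, the edge handling, and all function names are assumptions, and the model itself operates on 2-D maps.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2*radius + 1 taps."""
    taps = [math.exp(-(x * x) / (2.0 * sigma * sigma))
            for x in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def convolve_same(signal, kernel):
    """Same-size convolution with edge replication (kernel is symmetric)."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def rectify(p, tau=0.0, s=1.0):
    """Clamp p to [tau, s] (assumed rectifier form with m = 1, b = 0)."""
    return min(max(p, tau), s)


# Hypothetical slice of a linear LGN map: negative on the left, positive on the right.
r_lgn = [-0.8] * 10 + [0.6] * 10
kernel = gaussian_kernel(sigma=2.0, radius=4)
r_v1 = [rectify(v) for v in convolve_same(r_lgn, kernel)]
print([round(v, 3) for v in r_v1])
```

In this toy signal, the negative LGN responses are removed by the rectifier, which is what makes the V1 output nonlinear in the cone inputs.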

Model V2 cells
In our network, the V2 layer consists of two types of hue-selective cells: single-opponent and multiplicative. The single-opponent neurons are obtained by Eq. 4, where φ is the rectifier in Eq. 1. With τ = 0 and s = 1 for the rectifier, the single-opponent V2 cells are nonlinear functions of cone activations. In Eq. 4, substituting $R_{V1}$ with each of the six model V1 neuron type responses yields a model V2 neuron type with similar selectivity but a larger receptive field. More specifically, the responses of single-opponent V2 neurons can be considered a linear combination of V1 activations. To increase the nonlinearity as a function of cone activations in V2, as observed by Hanazawa et al. [7], and to nudge the selectivities further toward intermediate hues, as found by Kuriki et al. [8], we introduce multiplicative V2 neurons. These cells not only add another form of nonlinearity to the model, beyond that obtained by the rectifier in V1, but also mix the different color channels from V1 and exhibit narrower tuning bandwidths. In their model, De Valois et al. [31] suggested that S-modulated neurons rotate the cone-opponent axes to perceptual-opponent directions. Their modulations with S activations took the form of additions and subtractions, which do not add to the nonlinearity of neuron responses. We leverage their observation, but use multiplicative modulations for additional nonlinearity. That is, each multiplicative V2 cell response is the result of multiplying the activations of an L or M neuron from V1 by those of a V1 S cell. For example, in Figure 1(b), "L-off × S-off" denotes a cell obtained by modulating the responses of a V1 L-off cell by the activations of a V1 S-off neuron. In our model, the multiplicative V2 neurons are computed as in Eq. 5, where × represents multiplicative modulation, and $R_{V1\{L,M\}}$ and $R_{V1\{S\}}$ are the responses of an L or M cell and of an S neuron from V1, respectively. As before, φ is the rectifier from Eq. 1 with the same parameters as those of the additive V2 cells. Multiplicative V2 cells are nonlinear with respect to cone inputs and bilinear with respect to V1 activations.
Multiplicative V2 neurons have narrower bandwidths than additive V2 cells, which we showed quantitatively earlier. For a brief qualitative explanation, however, consider the multiplicative V2 maps in Figure 1(b). With the hue wheel as input, relatively high responses of the single-opponent V2 cells span a larger region of their maps than those of multiplicative V2 cells. This indicates that multiplicative V2 cells are selective to a narrower range of hue angles. As an example, both L-off and S-off V1 cells have high activations over relatively large regions of the input. However, when multiplied, the resulting neuron, i.e., the "L-off × S-off" cell, has strong responses for regions with greenish color, and the activation of the L-off V1 cell to bluish regions is suppressed to the extent that the L-off × S-off cell shows close to no response there.
In summary, a total of 14 neuron types are implemented in model layer V2: 6 additive and 8 multiplicative cell types.
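The narrowing effect of multiplicative modulation can be illustrated with a toy tuning model. The sketch below assumes half-wave rectified cosine tuning and idealized preferred directions in the MB plane (L-off at 180 deg, S-off at 270 deg); these shapes and angles are assumptions for illustration, not measured properties of the model neurons. The list of eight pairings is likewise inferred from the count given above (four L and M cell types, each paired with S-on or S-off).

```python
import math
from itertools import product


def tuning(theta_deg, preferred_deg):
    """Half-wave rectified cosine tuning (assumed shape) around a preferred hue angle."""
    return max(0.0, math.cos(math.radians(theta_deg - preferred_deg)))


def bandwidth(responses, eps=1e-9):
    """Number of integer hue angles with a response above eps."""
    return sum(1 for r in responses if r > eps)


def peak(responses):
    """Hue angle (in degrees) of the maximum response."""
    return max(range(len(responses)), key=lambda t: responses[t])


# Inferred enumeration of the multiplicative V2 cell types.
lm_cells = ["L-on", "L-off", "M-on", "M-off"]
s_cells = ["S-on", "S-off"]
pairs = [f"{lm} x {s}" for lm, s in product(lm_cells, s_cells)]

# Toy V1 tuning curves over the hue circle, sampled at integer degrees.
angles = range(360)
l_off = [tuning(t, 180.0) for t in angles]  # assumed preferred direction
s_off = [tuning(t, 270.0) for t in angles]  # assumed preferred direction

# Multiplicative modulation: pointwise product of the two V1 responses.
l_off_x_s_off = [a * b for a, b in zip(l_off, s_off)]

print(len(pairs), "multiplicative cell types")
print("L-off bandwidth:", bandwidth(l_off), "peak:", peak(l_off))
print("L-off x S-off bandwidth:", bandwidth(l_off_x_s_off), "peak:", peak(l_off_x_s_off))
```

In this toy setting, the product responds only where both inputs are active, so its tuning is narrower than either input and peaks at a direction between the two cone-opponent axes.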

Model V4 cells
We modeled V4 neurons representing local hue using a weighted sum of convolutions over model V2 neuron outputs. More specifically, the responses of the i-th V4 neuron, $R_{V4,i}$, are computed as in Eq. 6, where $R_{V2,j}$ represents the responses of the j-th V2 neuron, and φ is the rectifier introduced in Eq. 1, with τ = 0 and s = 1. As a result of this parameter setting for the rectifier, each V4 cell is a linear combination of V2 cell responses and hence nonlinear in terms of cone inputs. The set of weights $\{w_{ij}\}_{j=1,\ldots,14}$ determines the hue to which the i-th model V4 neuron is selective. In model layer V4, we implemented six different neuron types according to distinct hues: red, yellow, green, cyan, blue, and magenta. The chosen hues are 60 deg apart on the HSL hue circle, with red at 0 deg. These hues were also employed in the V4 color map study [15], and we use them here for comparison purposes. When the six V4 colors are mapped to the MB space, the hue angles are shifted with respect to those of HSL, with red, yellow, and cyan hues close to cone-opponent directions in the MB space, and green, blue, and magenta along off-opponent axes. From here on, we will refer to V4 neurons by their selectivities, e.g., model V4 red or model V4 cyan neurons. Although here we limit the number of modeled neuron types in this layer to six, we would like to emphasize that changes in the combination weights will lead to neurons with various hue selectivities in this layer. Modeling neurons with selectivities to a wide variety of hues and with yet narrower tunings could be accomplished in higher layers, such as IT, by combining hue-selective model neurons in V4.
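Read literally, "a weighted sum of convolutions over model V2 neuron outputs" followed by rectification suggests the form below. The kernel symbol $G(\sigma_{V4})$ and its placement inside the sum are assumptions made for illustration.

```latex
\[
R_{V4,i} = \phi\Big( \sum_{j=1}^{14} w_{ij}\,\big(G(\sigma_{V4}) * R_{V2,j}\big) \Big)
\]
% phi is the rectifier of Eq. 1 with tau = 0 and s = 1;
% j runs over the 6 additive and 8 multiplicative V2 cell types.
```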
To determine the weights from V2 to V4 neurons, the $w_{ij}$ in Equation 6, we considered the distance between the mean peak activation of each model V2 neuron and the desired hue of a model V4 cell. The hue angle between these two hues on the hue circle is denoted by $d_{ij}$. Then, the weight $w_{ij}$ from model V2 neuron j to model V4 neuron i is determined by Eq. 7, where $\mathcal{N}(\cdot; 0, \sigma)$ represents a normal distribution with zero mean and standard deviation σ, and $Z_i$ is a normalizing constant. The weights used for each of the V4 neuron types are summarized in Figure 5(c). In this figure, each row represents the weights for a single V4 cell, and the columns correspond to model V2 cells. Note that all V2 to V4 weights are normalized to sum to 1.0. That is, the sum of the weights in each row is 1. In this figure, dark red shows a large contribution, while dark blue represents close to no input from the relevant V2 neuron. Consider, for example, the weights for the red V4 cell. This neuron has relatively large weights from V2 L-on, M-off, and M-off × S-off cells, in other words, from cells with large contributions from L cones. This observation is not surprising, as previous research by Webster et al. [22] found that unique red in human subjects has its largest contributions from L cones. In Figure 1(b), at the V4 layer, the neurons selective to magenta, red, yellow, green, cyan, and blue are displayed from top to bottom. As expected, model V4 yellow neurons, for instance, show activations across the red, yellow, and green regions of the stimulus, with stronger activations in the yellow segment.
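The weighting scheme described above can be sketched as follows: each weight is a zero-mean normal density evaluated at the hue-angle distance, divided by the sum over all V2 cells so that each row sums to 1. The peak hue angles, the value of σ, and the function names below are hypothetical and chosen only for illustration.

```python
import math


def hue_distance(a_deg, b_deg):
    """Smallest angle between two hues on the hue circle, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def normal_pdf(x, sigma):
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def v4_weights(v2_peaks_deg, target_deg, sigma):
    """Weights from V2 cells to one V4 cell, normalized to sum to 1."""
    raw = [normal_pdf(hue_distance(p, target_deg), sigma) for p in v2_peaks_deg]
    z = sum(raw)  # normalizing constant for this V4 cell
    return [r / z for r in raw]


# Hypothetical peak hue angles (deg) of four V2 cells and a hypothetical sigma.
v2_peaks = [10.0, 100.0, 190.0, 350.0]
weights = v4_weights(v2_peaks, target_deg=0.0, sigma=30.0)
print([round(w, 4) for w in weights])
```

V2 cells whose peak hue lies close to the target hue dominate the sum, and the circular distance ensures that peaks at 10 deg and 350 deg are treated as equally close to a target at 0 deg.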

Choice of the model
Looking back at our network architecture in Figure 1(a), and at the computational operations for each layer of our model, the reader might wonder why we did not employ a convolutional neural network (CNN) for hue representation. After all, our network architecture is similar to that of a CNN: the responses of neurons in each layer of the model are computed by a convolution followed by a rectification, similar to the operations in a CNN. We emphasize here that our choice of model differs from a CNN for the following reasons:
1. Our goal was to introduce a biologically inspired model that would help in understanding hue encoding mechanisms in the brain. In doing so, we designed each neuron in our network according to existing findings about the brain. For example, the receptive field profile and the weights from cones to single-opponent cells in our model LGN layer were set based on the reported findings of Reid et al. [1]. In a CNN, these parameters are learned from data, and as a result, any receptive field profile and any setting of weights might be learned, which could differ from those of biological color neurons. As in our discussion of one-layer models, CNNs trained end-to-end might succeed in hue representation and specifically in encoding unique hues. However, the individual neurons in such models might not match those of the brain and hence would not demystify color processing in the brain.
2. One challenge in convolutional neural networks is interpreting the learned features in the hidden layers. Often, the learned features in the first hidden layer are compared with biological V1 neurons. However, learned features in deeper layers are difficult to explain. There have been attempts to understand and interpret hidden layer features [42,43]. However, a clear understanding of the learned features, and of why the network learns them, is yet to be achieved. As a result, had we employed a CNN model, we would not have been able to explain the learned features in all layers of our model, which would have been at odds with our goal of assigning representational roles to each brain area from LGN to V4.
3. In this work, we did not have access to any cell recording data. Nonetheless, even with such data accessible to us, we would not have been able to use a CNN model. Often, cell recording data is limited and sparse and not enough for learning the massive number of parameters in a CNN.
We acknowledge that some parameters in our model were set according to biological findings, while the remaining parameters, such as the weights from model layer V2 to V4, were set heuristically. A learning algorithm might, in this case, help make predictions about these connections in the brain. This step, as described in the Discussion section, is left for future work.