Main

Cephalopod camouflage consists of matching the animal’s appearance to that of its substrate and typically contains two-dimensional (2D) and three-dimensional (3D) components. Although both components are technically textural4,15,16, in this field the term ‘texture’ is often applied only to 3D features, caused, for example, by the contraction of skin papillae5,17. We studied here the 2D features of camouflage and therefore refer to them as skin patterns and to the process as pattern matching. Pattern matching does not consist of a faithful reproduction of the substrate’s appearance but, rather, of the visually initiated statistical estimation and generation of that appearance5. These sophisticated operations are carried out instinctively18 by the brain of animals that diverged from us more than 550 million years ago19, well before large brains existed. The generation of 2D skin patterns relies on a motor system that controls the expansion state of up to several million pigment cells (chromatophores) embedded in the animal’s skin5, among other specialized cell types17,20. The expansion state of each chromatophore depends on a radial array of muscles controlling the size of a central pigment sac21 and, therefore, on the activity of one to a few motoneurons, the dendrites and somata of which lie in the animal’s central brain6,7. The generation of a skin pattern therefore results from the appropriate coordination and control of tens of thousands of motoneurons by a system that interprets complex visual scenes2,3,18.

We recently developed methods to track the instantaneous expansion state of tens of thousands of chromatophores in the behaving cuttlefish S. officinalis—a master of camouflage14. Here we improve on these techniques and report a new complementary analysis to describe quantitatively the space, dynamics and reliability of camouflage patterns and, through this, gain insights into its control system. To this end, objective measurements are critical because camouflage evolved to exploit perceptual clustering by observers, so as to fool them22,23. Earlier efforts to categorize camouflage patterns suggested that they belong to a small number of classes8,9,10,11, a surprising result given the size of this system. However, a recent study using artificial backgrounds suggested that patterns, quantified as the differential expression of a set of pattern components, do not readily cluster in a low-dimensional projection24. Using natural and artificial 2D backgrounds (Methods and Extended Data Figs. 1 and 2), we acquired a dense videographic sampling of the animal’s generative pattern repertoire and analysed motion within skin-pattern space (Fig. 1a).

Fig. 1: Camouflage-pattern space.
figure 1

a, A path to camouflage, from starting (s) to ending (e) skin patterns, could be direct and ballistic (dashed line) or meandering, with successive accelerations and decelerations (grey). b, Two examples of camouflage skin patterns, typically classified as disruptive (left) and mottled (right). Insets: magnification of the area on the mantle indicated by a red square. Bottom insets: high-resolution segmented images. Top insets: lower-resolution wide-field images. The apparatus and stimuli are shown in Extended Data Figs. 1 and 2. Scale bars, 10 mm (main images), 20 mm (top insets) and 0.5 mm (bottom insets). A, anterior; P, posterior. c, Skin-pattern space was visualized using a 2D UMAP embedding of skin patterns produced by one representative animal of ten analysed. n = 215,577 images. Naturalistic and artificial backgrounds are shown in Extended Data Fig. 2b,c. d, Nine representative images were taken from each of the eight regions of skin-pattern space in c.

Skin-pattern space is high-dimensional

To quantitatively assess camouflage pattern space, we presented a series of natural images to cuttlefish using printed fabric, filming cuttlefish skin at both high and low resolution. Figure 1b shows the processed high-resolution images of one cuttlefish on two backgrounds. These images were acquired using an array of 17 high-resolution cameras, synchronized with a single low-resolution camera for a global view (Fig. 1b (colour insets)). Low-resolution and high-resolution image sets were used to generate what we name skin-pattern-representation and chromatophore-representation spaces, respectively. We gathered over 200,000 low-resolution cuttlefish images from 27 h of behavioural videos of an animal on our background set (Extended Data Fig. 2b,c). We then used a pretrained neural network to parameterize skin patterns (Methods and Extended Data Figs. 1 and 2). The skin-pattern space is displayed in a 2D uniform manifold approximation and projection (UMAP) embedding (Fig. 1c), and selected patterns corresponding to different regions of that space (i–viii) are illustrated in Fig. 1d. Whereas patterns within each window seemed of a kind, their precise realizations differed. The smallest variations were due to chromatophore flickering (detected in high-resolution data; Supplementary Video 3) and small local fluctuations. However, larger variations represented different instantiations of a skin pattern (Fig. 1d). Having tested the explanatory power of linear and nonlinear methods for dimensionality estimation (Methods), we opted for a linear method—parallel analysis25,26. Parallel analysis reports the number of principal components (PCs) with statistically significant explanatory power (versus a null distribution based on independently shuffled data). This approach indicated 59.4 ± 1.23 relevant dimensions (Extended Data Fig. 3a–e), although parallel analysis often underestimates the true dimensionality of a linear space above 20 dimensions27 (Methods).

The apparent high dimensionality of camouflage patterns hinted that a reasonably close relationship might exist between backgrounds and skin patterns. As natural backgrounds themselves are difficult to parametrize simply28, we tested this hypothesis in several ways. In the first, we used a set of 30 natural images (Extended Data Fig. 2a) and measured the correlation between background and final skin pattern along the PCs of skin-pattern space (Fig. 2a). They were significantly correlated (PC1–3, P < 10−10) in all of the animals tested. In the second, we tested spatial frequency, a simple texture metric in image analysis. Using checkerboards as backgrounds29 (Extended Data Fig. 2c), we observed, as others had previously8,11,29, that a coarse sampling of spatial frequencies (half-periods, 0.04–20 cm) led to only a few clusters of correlated skin patterns. Observing that this sampling of spatial frequencies was too sparse, we added 16 checkerboard sizes in an intermediate range (Fig. 2b). A clear trend then emerged, linking monotonically background and skin-response spatial frequencies. Decomposition of chromatophore space using Leiden clustering identified groupings of chromatophores (components; Methods) of which the expansion was positively or negatively related to background spatial frequency (Fig. 2c and Extended Data Fig. 4a–c). Sepia camouflage can therefore smoothly and predictably transition from one pattern to another, when challenged with appropriate sets of backgrounds. This sensitivity was expressed differently over individual pattern components, resulting in an elaborate relationship between visual stimulus and skin patterning. We examined other metrics of pattern matching, as well as low-level image statistics (including Fourier, Weibull30, contrast and skewness) and their combinations. None (spatial frequency included) matched the predictive power of a high-dimensional visual texture parameterization (Extended Data Fig. 4d–f).

Fig. 2: The relationship between camouflage and natural or checkerboard backgrounds.
figure 2

a, The correlation between camouflage patterns and natural background images in skin-pattern space (stimuli N0–N29; n = 3 animals, >8 trials per stimulus; Methods). PC1 (accounting for 17.5 ± 0.8% of the variance) shows significant stimulus–response correlation (Pearson’s r = 0.62, 0.64, 0.54; P < 10−22). In the three analysed animals, the first 3 (animal S205), 3 (animal S206) and 2 (animal S207) PCs are significantly correlated (35.8 ± 5.1% variance, Pearson’s r = 0.56 ± 0.05, P < 10−15). b, Skin patterns evoked by checkerboards of different spatial frequencies (square sizes, 0.04–20 cm, only 0.08–10 cm shown) reveal a monotonic gradient of intermediate responses. PC1 shows a statistically significant stimulus–response relationship within the shaded region (0.31–1.25 cm; linear regression r2 = 0.50 ± 0.04, P ≤ 0.0001; n = 3 animals, 4–8 trials per stimulus). In the three analysed animals, the first 4 (animal 1), 2 (animal 2) and 4 (animal 3) of the top 50 PCs are statistically significant (r2 = 0.40 ± 0.03, P ≤ 0.0001). The error bars show the 95% confidence intervals. c, Four clusters of co-varying chromatophores (components), of which the state depends positively (red) or negatively (blue) (P ≤ 0.05) on the stimulus, in one representative animal of three analysed. n = 4–8 trials per stimulus. Each point represents the mean steady-state response sampled at 25 Hz over 46 s. Top, cluster locations. Bottom, correlations between the mean chromatophore area and checkerboard period.

Transitions are tortuous and intermittent

We examined the paths taken through skin-pattern space when an animal changed camouflage in response to changes between three backgrounds (N13, N26, N29; Extended Data Fig. 2b). Background changes occurred every 5–10 min (Methods). In some trials, the animal lay still during background changes (Supplementary Videos 1 and 2). In other trials, the background change induced the animal to swim to a new position, while adopting a new camouflage. Camouflage-trajectory durations were equally distributed in the two conditions (Extended Data Fig. 3d,e).

Three trajectories through skin-pattern space (pattern transitions), taken from the same animal in response to the same background change, are shown as projections into the PC1–2 plane (Fig. 3a)—they were tortuous and differed across trials, typical of our results. The instantaneous velocity of pattern change (Methods) also varied along each path (Fig. 3b and Extended Data Fig. 5). In regions of greatest tortuosity (but not only there), the speed of pattern change decreased, to pick up again until a next deceleration, before eventually converging to a stable camouflage. The direction of motion at the exit of each low-velocity region pointed towards the final camouflage pattern (Fig. 3c (left); Rayleigh test, P = 1.1 × 10−54, n = 85 trials, 3 animals, 3 backgrounds; Methods), rather than in a direction parallel to the direct path linking the starting and final camouflage patterns (Fig. 3c (right)). This indicates that the animal updated its heading on its course through pattern space. The number of successive low-velocity regions increased as the animal skin approached its target pattern (Fig. 3d (grey)), as did the dwell time in each such region (Fig. 3d (red)). These results suggested that the path to a camouflage contained successive error-correction steps, as confirmed by direct measurements (Fig. 3e).

Fig. 3: Dynamics of camouflage transitions.
figure 3

a, Exploratory trajectories (lengths, 40–126 s) in skin-pattern space (PC1, 15.2%; PC2, 13.1%) in response to the same background switches (N13 to N29). The stars represent the three background textures (N13 (orange), N26 (green) and N29 (red)). b, Speed profile of pattern change (colour) of one trajectory (length: 220 s) in response to background switch from N26 to N29. c, Test of two motion-direction models (update and memory) for motion in skin-pattern space. The dark green vectors point to the end goal from the starting point; the light green vectors point to the end goal from each intermediate slow point; the blue vectors show the actual motion direction when exiting each slow point. Data support the update model: the distribution of α is significantly biased to 0 (Rayleigh test, P < 10−10), but not that for β (Rayleigh test, P > 0.01). n = 85 trajectories, 3 animals on 3 backgrounds (N13, N26 and N29). d, The number of transitions (steps) between slow points per trial (grey) and the dwell time at slow points (red) increase as the skin pattern becomes more similar to the background. n = 868 slow points from 85 trajectories in 3 animals. Data are mean ± s.e.m. The x axis shows the distance (in top two PCs) from the skin pattern at each slow point to the background pattern (bins of 285 arbitrary units (a.u.)). e, The correlation (corr.) between skin and background patterns increases as the number of transitions (steps) between slow points increases (Methods). Ordinate plots change (Δ) in correlation between the skin and background compared with at behaviour onset. n = 85 trajectories, 3 animals.

Variable composition of camouflage patterns

We next used high-resolution imaging to identify large pattern components18 that might reflect the higher levels of a hypothesized control hierarchy in the chromatophore system. Using Leiden community detection (Methods and Extended Data Fig. 6) over the pattern-motion segments of a camouflage change (that is, in between low-velocity regions), we identified clusters of co-varying chromatophores (Fig. 4a,b (colour coded)). The identified components were neither trivial nor did they match manually annotated components identified from static images5,18. The two components in Fig. 4c (red and blue, left) overlapped within the central square but differed from one another in their activities (Fig. 4c (right)), indicating that a seemingly singular feature—the dorsal square, characteristic of many disruptive patterns—is composed of interspersed subcomponents, each capable of independent control. Generally, the degree of pairwise correlation between components was independent of their spatial overlap (Fig. 4d; Pearson’s r56,928 = −0.043, P = 4.34 × 10−25, 3 animals). Individual components could be tight and clumpy, or loose and distributed. Our pattern decomposition had high explanatory power only if the components had been derived from the same trajectories (Fig. 4e). Performance declined when components were extracted from different trajectories or patterns (Fig. 4e and Extended Data Fig. 6), hinting that each trajectory in chromatophore space (that is, each realization of a camouflage) uses a different arrangement of components.

Fig. 4: Organization and reorganization of chromatophore groupings during pattern transitions.
figure 4

a, Speed profile of a transition (background: N13 to N29) in chromatophore space (PC1–2: 15.6%, 9.8%) contained five slow points (blue, top). Groupings of chromatophores (coloured, bottom) that changed together (pattern components) during transitions between these points (1 of 3 analysed animals). b, Chromatophores (chrom.) in pattern components shrank (top row) or expanded transiently (bottom) during the transition in a. The pink shading shows the time of motion between slow points. c, Interdigitated groups of 1,736 (red) and 3,903 (blue) chromatophores, located in left half of the dorsal square, show different activity (right; average in b, top). The heat maps show the size of individual chromatophores (rows, z-scored). d, The correlation of activity between pattern components is not linked to their physical separation (Pearson’s r56,928 = −0.043, P = 4.34 × 10−25; 3 animals; Wasserstein distance; Extended Data Fig. 7c–e). e, The variance explained (200 PCs) by the dataset in which PCs are defined. Decomp., same dataset; nearest, the most similar transition; all, all transitions, downsampled; static, activity at static patterns; shuffled, randomized groupings. n = 21, 18 and 21 trajectories (traj.) from 3 animals, 3 backgrounds (N13, N26 and N29). f, Trajectories (the same PCs as in a) and images for two similar transitions (teal, 87.2 s; pink, 87.1 s; backgrounds are the same as in a). g, Chromatophores (n = 1,123) that co-varied in the pink trial (purple cluster, g1, left) split into many clusters in the teal trial. Chromatophores (n = 1,532) that co-varied in the teal trial (teal cluster, g2, right) split in the pink trial. The heat maps show the size of individual chromatophores (rows, z-scored). h, Pattern-component reorganization. Groupings are based on activity in the pink (left) and teal (right) trials in f and g. The line thickness is proportional to number of shared chromatophores. i, The fractions of chromatophores that grouped consistently across pairs of trials. The mean intersection over union (IoU) of chromatophore groupings decreases as the distance between the transition pairs increases (n = 32.3 ± 0.5, 31.2 ± 0.9, 33.2 ± 0.4 clusters; 44, 32, 30 transitions; animals and backgrounds are as described in e; Extended Data Fig. 7f,g). The lines show shuffled groupings.

We examined these arrangements in more detail by tracking, at the chromatophore resolution, two camouflage-pattern trajectories from the same animal, initiated after the same background switch (from N13 to N29) and that looked similar to the naked eye (top and bottom image rows). These trajectories have neighbouring starting and ending points and occupy overlapping regions in chromatophore space (pink and teal) (Fig. 4f). A set of about 1,100 chromatophores, defined by their covariation in the pink trajectory, formed one component (Fig. 4g (top left, purple)). The same chromatophores, analysed again but over the teal trajectory (right), now defined over 15 components. This analysis was repeated with a different chromatophore set, this time chosen from the teal trajectory (Fig. 4g (bottom right, cyan)). Here also, this component split into smaller ones in the other trajectory (left). The intricacy of this reorganization is summarized in Fig. 4h, in which the left and right margins represent the components generated by analysing one or the other trajectory. Subsets of chromatophores that belonged to one component joined a different component a few moments later, even (as here) when the camouflage changes were not distinguishable by eye. Across pairs of trajectories, the fraction of chromatophores classified as belonging to the same components decreased as the distance between trajectories increased (Fig. 4i and Extended Data Fig. 7; Pearson’s r3,882 = −0.619, P = 0.0; 3 animals). Thus, camouflage-pattern components are not stable entities and can be defined only over specific segments of activity.

Pattern trajectories during blanching

Cephalopods often turn pale (blanch18,31) when they perceive a threat. These changes appear to be for conspicuous ‘deimatic’ display rather than camouflage, because they converge to similar patterns whatever the background (Supplementary Video 4). We therefore used blanching as a comparison for pattern-change dynamics during camouflage. Figure 5a shows 3 out of 17 blanching responses to a looming visual stimulus (Methods) in one animal displaying two different initial camouflages (Extended Data Fig. 8). In this PC projection of chromatophore space, the three trajectories converged from their starting states to the same neighbourhood of chromatophore space, a blanched deimatic pattern (B), before typically returning to their initial camouflage (blue and green trajectories) or (only once in these 17 trials) to a different one (red). The blanching motion was fast; recovery was slower (Fig. 5b and Extended Data Fig. 8b) with gradual deceleration. We compared the curvature of camouflage and blanching trajectories in 2–200 PC dimensions; blanching paths were always more direct than those taken in camouflage, and required fewer dimensions to account for the same variance (Fig. 5c and Extended Data Fig. 9).

Fig. 5: Transition to and from the blanched state.
figure 5

a, Chromatophore-space trajectories from camouflages (disruptive (D), background N29; mottled (M), background N13) to blanched (B) in response to approaching visual stimuli (sepia219 is shown throughout, except in c, k and l; other animals are shown in Extended Data Figs. 810). The solid and dashed lines show the motion to and from the blanched state, respectively. b, The outwards and return paths of all trajectories (n = 17, sepia219). Note the slower returns. The colour shows the speed of pattern change. c, Blanching (blanch) trajectories are straighter (lower mean curvature, 75-PC space) than camouflage (camo.) transitions (Student’s t-test, camo. versus blanch, 3 animals each, P = 0.0017). d, Whole-mantle, ‘disruptive’ camouflage. Scale bar, 10 mm. e, Chromatophore segmentation in a cropped region (indicated by the yellow box in d) at the trial start, at maximum blanching and at the trial end, with corresponding marginal histograms (red). D and M, camouflage at trial onset as in a. Note that traces of the start and end patterns can be seen at the blanch timepoint. f, Hierarchical clustering of whole-mantle patterns during (top) and after (bottom) blanching reveals conserved subtrees (colours). The open circles show the trials in e (D (black); M (blue); n = 17 trials, cophenetic correlation = 0.26; P = 0.015, Mantel test). g, Chromatophore size over time (single trial), ordered by the time of recruitment during return from blanching. Scale bar, 4 s. h, Chromatophores coloured by recruitment-time rank (as in g), suggesting a non-random, compartmentalized sequence. i, Leiden clustering (17 trials) reveals six components (colours). j, The density distribution of the chromatophore mean rank over trials for each component (on the basis of i), showing a reliable sequence (Kruskal–Wallis, H = 67.3, P = 3.7 × 10−13), all pairs being significantly different (post hoc multiple hierarchical permutation tests, P = 0.001). The shading shows the binned s.d. k, The explained variance after reduction to components derived from the 200-PC baseline (unclustered), the individual trajectories (cluster by trial), the whole dataset (cluster all trials) and a shuffled dataset (shuffle). n = 3 animals per dataset. l, The ratio of explained variance between cluster by trial and cluster by all trials. Blanching components are more generalizable across trials (two-tailed permutation t-test, camouflage versus blanching, 3 animals each, P = 0.0475).

In the experiment in Fig. 5a,b, the animal returned to the neighbourhood of its pre-blanching state in 16 out of 17 trials, suggesting that information about its initial state remained (Extended Data Fig. 8). Indeed, although blanching trajectories converged towards the same state, they remained separable near B—their point of convergence. This is illustrated in a magnified view of the edge of the dorsal square (Fig. 5d,e): in the blanched state, the edge of the square was detectable (with reduced contrast) in the D (disruptive) trial, but not in the M (mottled) trial, consistent with their respective starting and ending patterns. The predictability of the return pattern from the blanched state is illustrated in the tanglegrams in Fig. 5f and Extended Data Fig. 8d, based on the correlations between blanched and ending patterns. The rapid chromatophore shrinking (blanching) followed by slower expansion is shown in Fig. 5g, in which chromatophores are ordered by expansion onset. By mapping the ranked chromatophores back onto the mantle, we observed that they formed reliable non-random patterns (Fig. 5h), confirmed as reliable components by community-detection clustering (Fig. 5i,j and Extended Data Fig. 10). The contrast between the repeatability of skin-pattern restoration after blanching (Fig. 5k,l) and the variability of camouflage pattern composition (Fig. 4f–i) supports the hypothesis that camouflage and blanching are under differential control.

Discussion

Our results paint a complex picture of camouflage control. First, possibly consistent with the high resolution of chromatophore motor control5,14, skin-pattern space is high-dimensional—the same backgrounds led to many different instantiations of a given skin pattern that are difficult to distinguish by eye. Second, camouflage smoothly covaries with ranges of natural or artificial visual textures. Skin patterns were composed of components, or chromatophore clusters, independently recruited24, and displaying different sensitivities and responses. The Sepia visual system must therefore represent visual textures in some detail, probably in the optic lobes32, and the animal’s camouflage strategy is adapted to matching high-dimensional background targets. Third, the paths (in skin-pattern space) taken during a camouflage change are tortuous, intermittent—consisting of alternating pattern motion and relative stability—and not stereotyped. The number of pauses and their duration increased as convergence neared. The correlation between skin and background patterns increased as the number of pattern-motion steps increased. At each intermittent motion onset, pattern motion aimed towards the target camouflage, reflecting knowledge of the animal’s instantaneous state rather than the memorization of its initial motion direction at the onset of the behaviour. Together these results suggest that camouflage relies on feedback during the approach to an adaptive pattern, more akin to correction of hand reaching movements in primates33,34 or of tongue reaching in rodents35 than to ballistic motion towards a memorized target. Fourth, trajectories between camouflages involve pattern components defined by chromatophore co-variation; these components could be large or small, tight or loose, suggesting a multiscale control system. However, different trajectories between similar pairs of camouflages invoked different (in numbers and composition) pattern components, suggesting control flexibility. Owing to such flexibility, describing body pattern as the combination of around 30 fixed pattern components24 may underestimate the complexity and dimensionality of camouflage pattern space. Identifying the smallest consistent components of camouflage patterns was not possible and will probably require very large datasets. Fifth, blanching evoked by threats to camouflaging animals retained a trace (at chromatophore resolution) of the initial camouflage. The animal usually returned to its initial state after withdrawal of the threat, through paths decomposable into reliable components. This suggests that blanching co-occurs with camouflage. Blanching represents the shrinking of chromatophores caused by the relaxation of the chromatophore muscles. By contrast, the return to a camouflage pattern requires the differential expansion of chromatophores by the contraction of those same muscles. Thus, blanching could be generated by a transient and general inhibition of the chromatophore motor drive, downstream of the camouflage control level; however, because recovery from blanching reveals components with different dynamics, this putative inhibition probably acts upstream of the motoneurons (at an intermediate level of chromatophore control) rather than directly on them.

In conclusion, camouflage in Sepia appears to be both very flexible and to follow non-stereotypical paths when analysed at cellular resolution. The dynamics of its output suggest the use of feedback to converge onto a chosen camouflage. Regarding where such feedback could originate from, a first possibility is proprioceptors in or around each chromatophore. Evidence for such proprioceptors around cephalopod chromatophores is lacking5. A second possibility is that cuttlefish use vision to assess the match between their immediate skin-patterning output and the background, for example, during each low-velocity segment in pattern-space motion. This could be tested by masking the animal’s skin during camouflaging. A third possibility is efference copy of the motor command to the chromatophore array. This would require the existence of appropriate motor collaterals (not described to date), some calibration of the copy and some form of integrator, such that the copy accurately represents the true generated output. Our results will inform mechanistic studies required to understand this remarkable system.

Methods

Experimental animals

All research and animal care procedures were carried out in accordance with the institutional guidelines that are in compliance with national and international laws and policies (DIRECTIVE 2010/63/EU; German animal welfare act; FELASA guidelines). The study was approved by the appropriate animal welfare authority (E. Simon, Regierungspräsidium Darmstadt) under approval number V54-19c20/15-F126/1025. European cuttlefish S. officinalis were hatched from eggs collected in the English Channel and the North Atlantic and reared in a seawater system at 20 °C. The closed system contains 4,000 l of artificial seawater (ASW; Instant Ocean) with a salinity of 3.3% and pH of 8–8.5. Water quality was tested weekly and adjusted as required. Trace elements and amino acids were supplied weekly. Marine LED lights above each tank provided a 12 h–12 h light–dark cycle with gradual on- and off-sets at 07:00 and 19:00. The animals were fed live food (either Hemimysis spp. or small Palaemonetes spp.) ad libitum twice per day. Experimental animals of unknown sex, 4 to 10 months after hatching, ranging from 42 to 90 mm in mantle length, were selected for healthy appearance and calm behaviour. The animals were housed together in 120 l glass tanks with a constant water through-flow resulting in five complete water exchanges per hour. Enrichment consisted of natural fine-grained sand substrate, seaweed (Caulerpa prolifera), rocks of different sizes, and various natural and man-made 3D objects.

Data acquisition

Experiments were performed in a 700 mm × 700 mm × 135 mm live-in filming tank in a separate 800 l system with its own water exchange, filtration and environmental enrichment (Extended Data Fig. 1a–c). At least 2 days before experiments, animals were moved from their home aquarium into the filming tank for acclimatization; they remained there throughout the days or weeks of filming. During experiments, a black frame was placed into the middle of the arena, restricting animals to a 400  mm × 400 mm area, keeping tank enrichment temporarily out of sight. During filming, an acrylic lid was placed onto the water surface to remove optical distortions caused by water ripples, and the arena was lit by four LED strip lights with diffusers, mounted 15 cm above the acrylic lid (SAW4 white, 698 cm length, Polytec), providing an illuminance of 3,400 lx measured at the lid centre). Background images were presented to the animal as prints on a 400-mm-wide fabric roll (210 g m−2, 75 d.p.i.), moved over the arena floor gently with a manual crank. For experiments with natural backgrounds, a 2-mm-thick transparent acrylic sheet was placed on top of the fabric to provide extra stability for some of the animals. This increased the chance of capturing in-focus high-resolution frames during pattern transition.

We presented a set of 30 natural images with diverse visual statistics in at least five random orders (private collection; Extended Data Fig. 2a). Three background images were selected for further experiments (Extended Data Fig. 2b) on the basis of reliably eliciting distinct camouflage patterns in multiple animals. Checkerboard backgrounds were logarithmic series in three ranges of square sizes, one coarsely sampled from 0.04 to 20 cm, and two with denser sampling, from 0.63 to 2.5 cm and from 0.18 to 0.63 cm (Extended Data Fig. 2c). The coarse series was repeatedly presented in two random orders, and the finer series in one random order each. In three animals, these frozen random sequences were additionally broken in a portion of the trials by skipping through the fabric roll. The four sets of frozen random series (and their respective reverses) were not presented in a defined order. The evoked behaviours were comparable, and therefore combined for analysis.

For experiments with looming visual stimuli, the effective size of the arena was reduced to 150 × 400 mm by inserting a transparent plexiglass wall. An LCD monitor (Dell U2412M, size 52 × 32.5 cm, 50 × 22 cm visible to the animal, 60 Hz refresh rate) was suspended along the long arena edge at 40° from horizontal and maintained at a constant luminance (300 cd m−2).

Visual stimuli were (1) manual presentations of the experimenter’s hand approaching the animal at approximately 45°, stopping 20 cm away from the animal with fingers outstretched (hand looms); or (2) single presentations of a dark expanding circle on the monitor, subtending a visual angle of about 1.5° at onset, before expanding to simulate an object approaching at constant speed, according to the equation:

$$\frac{r\left(t\right)}{d}=\tan \left(\frac{\theta \left(t\right)}{2}\right)=\frac{l}{vt}$$

where r(t) is the radius of the circle on the screen, d is the distance between screen and animal, θ(t) is the angular size, l is the half-width of an approaching object and v is the approach velocity. Stimuli were generated using PsychoPy36 and presented at six different values of l/v corresponding to collision times of 2.3, 5.7, 9.2, 17.0, 25.0 and 34.4 s. The spot was located on the screen directly above the animal at a constant x-coordinate, with the y-coordinate varied to match the position of the animal’s head, approximately 45° from its zenith. The minimum interstimulus interval was 2 min, and the background was changed after 1–5 stimuli. Sessions contained either one or both stimulus types; in sessions with hand looms only, the monitor was removed. In the first 30 min of session 1 for each animal, several stimuli of different l/v values and hand looms were presented to find a stimulus that elicited vigorous blanching behaviour for a given animal; subsequently, this stimulus was over-represented in the stimulus order.

For high-resolution filming, 17 calibrated cameras (Basler ace acA4112-30uc) were arranged in a planar array, each recording a 3,000 × 3,000 pixel video at 25 fps. A camera’s field of view was 52.4 mm × 52.4 mm (17.4 µm per pixel, 1 chromatophore occupying 54 pixels on average), with approximately 20% (20.1 ± 2.0) of pixels overlapping in neighbouring cameras. An additional low-magnification camera was mounted next to, and synchronized with, the high-resolution array, with a low-resolution field of view of 360 × 360 mm (119.8 µm per pixel, 1 pixel containing 2.4 chromatophores on average). All of the cameras were mounted onto a 2D rail system moved by stepper motors. To deal with high bandwidths, all video data were directly hardware-encoded to h264 format in real-time during the experiment. For this purpose, we used three computers running Ubuntu (v.18.04), each equipped with two graphics cards (NVIDIA Quadro M4000) providing a maximum number of eight encoding streams on each computer. We developed PylonRecorder237, a multi-threaded C++ acquisition software. Each instance of this software was used to retrieve the signal from one camera through USB3, encode it to h264 through libnvenc/FFmpeg and write it to one dedicated solid-state drive. A fourth computer equipped with a PCAN-USB interface (PEAK-System) running PylonRecorder2 with an additional plugin38 was used to control and monitor the entire experiment. An Arduino Mega 2560 equipped with a CAN bus shield was used as a central hardware trigger source for all of the cameras39. A tracking camera was placed outside the array to view the entire experimental arena. After calibrating the tracking camera to rail positions, the experimenter could position the camera array over the animal as it moved by selecting it in the tracking view.

Skin-pattern representation from low-resolution data

Low-resolution imaging data were processed to generate a representation of the skin pattern (Extended Data Fig. 1d). In this study, ‘skin-pattern representation’ refers to 2D visual textures15,16,40. Cuttlefish can produce different 2D textures through chromatophore activity, and also alter their 3D appearance through postural motion and contraction of papillae17. These 3D alterations have effects on camouflage and alter the 2D visual patterning of the cuttlefish skin. These were detected by and incorporated into our low-resolution analysis.

Segmentation and alignment

For each frame, the cuttlefish was segmented from the background with the Detectron2 platform41 using a pretrained baseline model (COCO Instance Segmentation with Mask R-CNN, R50-FPN, 3x), fine-tuned with a cuttlefish training dataset. The cuttlefish images were then aligned by one of two ways: (1) aligning the long axis of an ellipse fitted to the cuttlefish segmentation mask, with the anterior-posterior orientation determined by a model similar to the one above, but trained from a different baseline (COCO Person Keypoint Detection Keypoint R-CNN, R50-FPN, 3x); or (2) maximizing image cross-correlation from one frame to another. Erroneously segmented frames were detected with a threshold on the area of the segmentation masks at 2 s.d. As a result, about 3% of all frames were removed from the subsequent analyses.

Texture representation

The texture representation used in our low-resolution imaging (Figs. 13) was the max-pooled fifth layer activations (conv5_1) of the VGG-19 neural network with weights pretrained with the ImageNet dataset in an object-recognition task, accessed through the Keras platform42. The choice of layer and model was informed by findings from psychophysics experiments on visual textures synthesized using Gram matrices of different layers of the model43, and more broadly by the visual texture literature44,45. To our knowledge, this method has not been previously used to study cuttlefish camouflage.

The inputs to the neural network were preprocessed as follows: cuttlefish images were converted into 8-bit greyscale and histogram-equalized using OpenCV 4 (ref. 46). The background, as detected in the segmentation step, was replaced by middle grey. The images were cropped and/or padded into a square such that the cuttlefish body length was half of the image length. The cuttlefish body length was estimated for each video by taking the mean lengths of the fitted ellipses from 5–10 randomly selected frames. Finally, the images were downscaled to 224 × 224 px, and zero-centred using the VGG-19/ImageNet-compliant input preprocessing function in Keras.

The max-pooled representation used in this study is a vector of length 512, where each element is the maximum value of one of 512 feature maps (each of size 14 × 14). The Gram matrix representation mentioned above43,47 is a vector of length 262,144, vectorized from the Gram matrix of size 512 × 512 (symmetric), where each element is the scalar product between a pair of the 512 feature maps (each vectorized to a vector of length 196). The pairwise Euclidean distances of a random sample of 300 data points computed in the max-pooled representation space showed high correlation with the same computed in the Gram matrix space, despite being summarized by relatively few parameters. (Extended Data Fig. 3a)

The 512-dimensional pattern representation was further compared using the Portilla–Simoncelli16 visual texture model (Extended Data Figs. 3e and 4d,e). Inputs to the Portilla–Simoncelli model were preprocessed similarly, with the only differences being (1) the 224 × 224 images were padded up to 256 × 256 and (2) zero-centring was not performed. Using the standard configuration of 4 scales and orientations respectively and a neighbourhood size of 7 px, this representation consists of about 800 unique parameters.

This skin-pattern representation can be interpreted as a metric that captures textural information using 512 variables derived objectively from the visual world. It was used to construct the UMAP visualization, estimate the dimensionality of camouflage pattern space and study camouflage pattern dynamics.

Data selection

Full-length videos were subsampled every 10 frames to generate the entire skin-pattern space of an animal (Fig. 1c), and every 100 frames to identify the time windows of skin pattern transitions46 (Figs. 3 and 4). Transition periods were identified at timepoints at which (1) there was a jump between the 2–4 clusters (k-means) in the estimated pattern space (see below); or (2) the speed of change in pattern representation exceeded 1 s.d. Before and after each selected timepoint of pattern transition, the period between the times when the speed of pattern change exceeded and then returned to the baseline (mean) was designated as a chunk of pattern transition. Two consecutive chunks were merged into one if the interval between them was less than 20 s and did not contain a background switch. After identification using subsampled data, transition periods were processed at the full frame rate (25 Hz). To study static camouflage matching (Fig. 2), the last 30–60 s (depending on the animal) of each stimulus trial (5–10 min each) were considered to be stabilized camouflage response (Extended Data Fig. 3d), and were processed for subsequent analyses.

Visualization of skin-pattern space

Skin-pattern space was visualized using a UMAP model (min_dist=0.8, n_neighbours=100)48, which embeds the 512-dimensional pattern representation into two dimensions nonlinearly. The UMAP model was trained with a geometry-preserving sample of 20,000 data points selected using the geosketch algorithm on the top PCs accounting for 80% of dataset variance49. Misoriented frames were identified with a preliminary round of clustering and withheld during the training, but later embedded (Fig. 1c). Visual inspection found the above processing to be robust against the occasional upstream misorientation. For visualization, 3 × 3 grid points were laid onto each of the selected regions in the 2D UMAP space, the nearest data point with a distance of ≤0.1 was selected and the corresponding skin pattern was plotted (Fig. 1d).

Skin-pattern space analysis

Dimensionality

To estimate the dimensionality of skin-pattern space, we followed a previously proposed pipeline27. We first standardized features by removing mean and scaling to unit variance. We obtained an upper-bound dimensionality estimate using parallel analysis—a linear method that was found to be the most accurate among the tested linear methods for both linearly and nonlinearly embedded simulated data. We next fitted a linear (principal component analysis (PCA), 90% variance cut-off) and a nonlinear (Joint Autoencoder) model, respectively, to the data with the same number of latent dimensions as determined by parallel analysis in the previous step. For the Joint Autoencoder, we increased the size of the dense layer from 36 to 240, and the number of training epochs from 1,000 to 2,000 to reflect the increase in the number of input features (from 96 to 512). We found that a nonlinear model (variance explained, 60.0 ± 0.58) did not perform significantly better than a linear model (variance explained, 74.0 ± 0.65), suggesting that skin-pattern space in our data was largely linear (Extended Data Fig. 3a). We therefore chose parallel analysis, a linear method, to estimate the dimensionality of skin-pattern space. In brief, parallel analysis reports the number of PCs with statistically significant explanatory power compared with a null distribution defined by a parallel PCA in which the data points of each feature are independently shuffled. It should be noted that parallel analysis tends to underestimate the true dimensionality of a linear space above 20, although to a lesser extent than nonlinear methods. The above analysis was performed using 20,000 randomly sampled data points (frames) from each animal, as the estimation tends to stabilize beyond that sample size (Extended Data Fig. 3c).

Pattern matching

Pattern matching was studied using two stimulus sets: natural images and checkerboard series. To study visual features of natural image backgrounds (Fig. 2a and Extended Data Fig. 4d–f), the backgrounds were sampled by random selections of patches corresponding to animal size (>6 patches) near the animal from low-resolution imaging data. The background patches were then masked by the contour of the animal, processed through the same VGG-19 network for the pattern representation and further used to extract low-level statistical visual features.

Four parameters were derived from Fourier statistics28. The image was transformed to a power spectrum by fast Fourier transform (FFT). The 2D power spectrum was radially averaged and fitted with a line in log–log scale. FFT-α and FFT-β were the slope and intercept, respectively, of the fitted line. The third FFT parameter was the peak of the residual of the 1D power spectrum from the 1/fα fit. The fourth parameter, FFT-iso, was calculated as the ratio of the contour at 60% of the energy to a fitted isotropic ellipse in the 2D power spectrum. From the 2D power spectrum, the spatial autocorrelation was computed by inverse FFT (Wiener–Khinchin theorem). The Auto-freq parameter was the frequency at 50% of maximal auto-correlation. Two Weibull parameters, CE (contrast energy) and SC (spatial coherence), represent the width and the shape of the Weibull fits for the local contrast histogram, derived from multiple filters with different spatial scales50. The kurtosis and skewness of the contrast-value distribution were measured after using a first-order difference-of-Gaussians filter (size = 5) to extract contrast values.

To link visual statistics to an animal’s camouflage pattern, we calculated the correlation between animal patterns and background images (Fig. 2a and Extended Data Fig. 4e). To enable the direct comparison between the body patterns and backgrounds, the 512-dimensional pattern representations of both body patterns and backgrounds (755 pairs from 3 animals) were first transformed by PCA. The first 50 PCs were then used for canonical correlation analysis to identify the linear combination of PCs best able to correlate body patterns and backgrounds. The Pearson correlation was calculated for each PC between body patterns and backgrounds, by animal. Second, different general linear models were trained to predict the camouflage patterns using individual or combinations of the visual statistics described above (Extended Data Fig. 4d). For each animal, we performed threefold cross validation (2/3 training, 1/3 test) on animal–background image pairs. For the training set, 13 general linear models were fitted separately on two visual texture representations (VGG-19 and Portilla & Simoncelli texture model), nine low-level image features, the combination of these nine features and downsized images. Model prediction residuals were calculated using the test dataset. The relative reduction of such residuals from the residual by the null model (fitted only using the intercept) were calculated as deviance reduction. The averaged deviance reduction, computed from 1,000 repetitions of fitting and cross-validation, was used to compare the performance of different visual features in predicting the animal’s responses. Similarly, for the checkerboard dataset (Fig. 2b), the skin-pattern representation was first transformed by PCA on all animals collectively (50 components). Then linear regression was performed on each of the PCs per animal.

Dynamics

The speed of skin-pattern change was calculated as the time derivative (dt = 0.04 s (Figs. 3 and 4) and dt = 0.4 s (Extended Data Fig. 3d–f)) of the Euclidean distance of the first 200 PCs in skin-pattern space, smoothed with a 2 s window.

To compare the dynamics associated with animal locomotion and background transition (Extended Data Fig. 3d–f), the speed profiles were aligned (t = 0) to the peak in motion speed (where the background remained unchanged), or the trough in background correlation (corresponding to a background transition, which were occasionally followed by motion of the animal). The aligned speed profiles were resampled at 1 s intervals. Periods during which the background remained unchanged were identified as ones where the frame-to-frame image correlation remained above 0.9 for at least 10 s. Motion epochs were detected during these constant-background periods by thresholding the 2D speed of the centre-of-mass of the cuttlefish mask at 2 s.d. above the mean. A background transition is defined as a period between two constant-background periods of different background identities. The background identity of each constant-background period was determined by the following procedure: first, 4 patches of the first frame around (but not containing) the animal were combined into a composite. Then, the third-layer (conv3_1) activation of the VGG-19 model (see above) of each composite was max-pooled and then classified (k-means, 3 classes, with manual cluster sorting). The motion- and background transition-triggered speed profiles were built for each animal. We measured the duration of skin-pattern change starting at the time at which the motion speed (in pattern space) exceeded 10% of the peak motion speed above the baseline.

To characterize the dynamics of skin pattern change during camouflage transitions (Figs. 3 and 4), low-velocity regions of each trajectory were identified as local minima after 2 s window smoothing. Before and after each slow point, that is, during deceleration (from local speed maximum to local minimum) and acceleration (from speed local minimum to local maximum), the speed quartiles were used to separate fast (Fig. 4b (red)) from slow phases. The duration of each slow phase was defined as the dwell time at that slow point (Fig. 3d). Each step between the fast and slow phases along the trajectory was considered a step in camouflage refinement. In skin-pattern space, the distance from skin pattern to background pattern was measured in the top two PCs. For the histogram of steps per trial (Fig. 3d), the distance to the background at slow points was used as the distance for each step. The histogram was plotted for each trajectory and averaged across all trajectories (n = 85, from 3 animals). The dwell time was bin-averaged (bin = 55) along the distance (Fig. 3d (red curve)). For Fig. 3c, two motion-direction models were distinguished by measuring two angles, α and β, as an animal’s skin pattern moved from a starting pattern (start), through intermediate slow points towards an eventual steady-state pattern (goal). α is the angle between the vector connecting point n − 1 to point n, and the vector connecting point n − 1 to the goal. β is the angle between the vector connecting point n − 1 to point n, and connecting the start to the goal. In the memory model, the animal follows the initial direction from the start to the goal, resulting in both α and β values of near 0. In the update model, the animal updates the direction that it must move to reach the goal in every step, resulting in α values of near 0, but not β. The angle was measured as the arctan of the cross product and dot product of the two vectors in the top two PCs. In Fig. 3e, we calculated after each step (that is, at each local minimum of pattern motion velocity) the correlation between the skin pattern at that time and the background, in the space defined by PCs 1–50. The difference between this instantaneous correlation and that measured at behaviour onset was then averaged across all of the trials analysed above.

Chromatophore segmentation and tracking

High-resolution imaging data were processed to extract chromatophore population activity using a computational pipeline14 that was modified to accommodate camera-array data, designed to film larger animals (Extended Data Fig. 1e).

Data selection

We filtered images over all the cameras with a difference-of-Gaussians (DoG) filter that was tuned to detect chromatophore-size features (2 and 1 s.d.). The sum of all pixels over all cameras was taken as a focus statistic. We placed a dataset-specific threshold on this statistic to select a series of in-focus time periods (chunks) for the different experiments:

Checkerboard datasets (Fig. 2): as described above, the last 30–60 s of each 5–10 min trial was selected as the stabilized camouflage response for subsequent analyses. All chunks were confirmed visually for lack of animal locomotion.

Pattern transition datasets (Figs. 3 and 4): analysis of low-resolution video (above) revealed pattern transition timepoints. The subset of these transitions that were also in focus of the high-resolution camera array (~50%,) were taken for chromatophore analysis.

Threatening stimulus datasets (Fig. 5): all trials in which animals displayed a decrease in mean chromatophore size to less than 90% of the mean starting size in the first 2 s of the trial and remained in focus, were used to calculate the Spearman R for blanching time versus return speed. For all of the other analyses, we discarded low-vigour blanching responses in which the mean chromatophore size during blanching remained above 50% of the mean starting size.

Panorama construction

For the first timepoint in every chunk, we next determined which cameras in the array contained a view of the cuttlefish. We constructed a rough panorama view over all cameras in the array using our extrinsic camera calibration. This image was filtered using the same DoG filter as introduced above, and smoothed with a Gaussian filter (s.d., 25 pixels). We then thresholded this image, taking the largest contour as a cuttlefish mask. Images containing mask pixels were taken as the relevant cameras for that chunk.

Depending on the animal’s size and position relative to the array, 1–7 cameras were typically relevant for a given chunk in our datasets. For these cameras, and taking the first image in every chunk, we next used parallax-tolerant nonlinear stitching51 to form a single panorama view. Prominent greyscale image features were detected using SURF52, and features were matched across cameras with overlapping field of views. An affine transform was estimated from these matched feature points, and outliers were removed using the M-estimator SAmple Consensus (MSAC) algorithm53. Noisy image pairs containing few (10–150 depending on the dataset) matched features were removed. We refined our initial camera extrinsic parameters using these matched features. We performed bundle adjustment using the Levenberg–Marquardt algorithm54,55, optimizing the similarity transform between all sets of cameras. Finally, robust elastic warping56 was performed to remove parallax effects. We saved the nonlinear transformations mapping each camera’s image into the resulting panorama.

Chromatophore segmentation

In parallel with the above panorama construction, we segmented chromatophores on the relevant cameras (see above) over all of the images within usable chunks. In this study, we refer to the pigmented chromatophore proper as ‘chromatophore’, and ‘chromatophore size’ as the size of the pigment cell that we track. We trained convolutional neural networks (U-Net57) to perform semantic segmentation, classifying a cuttlefish’s dark chromatophores. We used the prediction score as a probabilistic readout of the expansion state, allowing for sub-pixel resolution and improved signal-to-noise ratio. At our imaging resolution, light chromatophores14 were not detected reliably enough for robust segmentation. Classifiers were trained on 64 × 64 cropped images of cuttlefish skin, manually labelled using a custom GUI (pyQt). To increase classifier robustness, we used image augmentation58, randomly rotating, reflecting, scaling brightness, Gaussian blurring and applying piecewise affine transformations.

Aligning segmented panorama within a chunk

To track chromatophore expansion states, we modified our strategy14 of fixing their pixel locations over the images in a dataset. We did this in two steps, removing animal and breathing movements to register all images within a chunk, followed by alignment over chunks, described below. For every frame in a chunk, we used our nonlinear transformations, calculated above (see the ‘Panorama construction’ section), to form panoramas of segmented images. During panorama construction, images were sequentially mapped into a unified reference frame56. Notably, during this process, we updated the panorama only at pixel locations where no image had yet been mapped to. Overlapping fields of view were thus not averaged together. This method helped us deal with errors in panorama mapping coming from slight animal movements. On the first panorama image of a chunk, we selected a random set of chromatophores distributed over the animal for tracking. For subsequent frames we used Lukas–Kanade optical flow and moving-least-squares interpolation14 to track animal movements and align all images to the first image of a chunk.

Stitching over chunks

We mapped all chunks, separated in time by intervals as long as several days, into a common reference frame. We call this process ‘stitching’. Coarse-to-fine grid alignment was performed as described previously14, with four changes to increase accuracy. First, we stitched together the first segmented panorama image from each chunk, rather than the average segmented image over a chunk. Second, we used a 128 × 128 pixel grid for coarse alignment, rather than 256 × 256 pixels. Third, we introduced a manual refinement step, in which poorly matched coarse grid points and images in cases in which registration failed were removed using a custom GUI (pyQt). Finally, grid alignment was followed by an additional alignment step: we used the SyN algorithm59 (sigma_diff = 7, radius = 32) to register image pairs precisely, with a scale space of three levels (50, 25 and 5 pixels). The image with the lowest average reprojection error before manual refinement was selected as the dataset reference frame.

Chromatophore extraction

To extract chromatophore expansion states (areas) over time throughout a dataset, we mapped the first segmented panorama from all chunks into the dataset reference frame, and averaged the resulting image. We then applied the watershed transformation to this average aligned frame to determine chromatophore regions. The chromatophore expansion state was determined by mapping segmented panorama images from a chunk’s reference frame into the dataset’s common reference frame, and calculating the sum within every chromatophore region.

Imaging artefacts due to compression during video recording occurred about every 250 frames. Around such artefacts, detectable as periodic sharp peaks in PCA speed, 10 frames were removed and remained blank. For analyses concerning sequence of activation of individual chromatophores (Fig. 5), these artefacts were instead removed with a median filter with a 1 s window. A mask was constructed on the average aligned frame with DoG filtering to remove chromatophores with low probability of detection due to imprecise alignment. These chromatophores were generally located around papillae. This also removed regions at the curved edges of the mantle, resulting in the tracking of 76%, 75% and 70% of the pixels on the mantle in Fig. 2, and 78%, 75% and 79% in Fig. 4. In Fig. 5, a rectangular mask was used to restrict the analysis to the dorsal part of the mantle (Extended Data Fig. 10), of which 98%, 92% and 69% (in sepia218, sepia219, sepia221, respectively) of pixels remained after DoG filtering.

Pipeline implementation

The chromatophore-tracking pipeline was implemented using OIST’s Deigo and Saion HPC systems. Deigo performed all steps except for chromatophore segmentation, processing jobs in parallel on single nodes with up to 128 cores and 512 GB RAM per node. Chromatophore segmentation was performed on Saion GPU nodes using up to 32 GPUs (Nvidia V100 and P100s). Datasets for which the animal was small enough to fit in a single camera view were processed without panorama construction on CPU nodes of MPIBR’s computing cluster (24–32 cores, 192–512 GB RAM per node). Data management and parallel computation was performed as described previously14.

Chromatophore space analysis

Dynamics

Speed in chromatophore space (Fig. 5 and Extended Data Figs. 5 and 8) was calculated as the time derivative (dt = 0.04 s) of Euclidean distance in 200 PCs, and was then smoothed by a 2 s window.

Component analysis

Tens of thousands of chromatophores (60,884; s.d., 679) were grouped into 32 ± 3 pattern components on the basis of their covariation during pattern changes. Chromatophore areas over time during pattern transitions were transformed by PCA. The top 50 PCs were then used to define co-varying chromatophores as connected nodes (n_neighbors = 10). The Leiden algorithm (resolution = 2 (Figs. 2 and 4) and 0.5 (Fig. 5)) was used to detect non-overlapping communities from the network of chromatophores60 (scanpy package61). These communities of chromatophores were taken as pattern components. Components of which the mean expansion state was significantly changed (>1 s.d.) during the whole or specific phases of the pattern transition were considered to be active components (Fig. 4a).

To compare the degree of pairwise correlation of chromatophore activity between pattern components and their physical separation, multiple metrics were used to measure how two pattern components are interdigitated in space. We measured spatial overlap after spatial binning (Extended Data Fig. 7a,b), pairwise distance (Extended Data Fig. 7c) and Wasserstein distance (Fig. 4d and Extended Data Fig. 7d). To estimate how well community-based clustering could capture overall chromatophore activity (Fig. 4e), we substituted chromatophore activity for the mean activity of all chromatophores within a pattern component. This simplified chromatophore state matrix was then transformed by the same PCA model previously fit to the original chromatophore state matrix. Percentages of explained variance were compared between simplified and original trajectories in the same space defined by the first 200 PCs. To compare different component clusterings on the basis of the covariation in different pattern transitions (Fig. 4i), we paired components sharing the largest proportions of chromatophores among all possible pairs. For all transition pairs, the mean intersection over union of chromatophore groupings (also known as the Jaccard index62) was used as a similarity metric between different partitions. We tested the following additional metrics of clustering similarity: Wallace coefficient63 and adjusted rand index64 (Extended Data Fig. 7f,g). In all cases, the clustering similarity metrics were plotted against the distance between transitions pairs, which was defined by the mean pairwise distance between two transitions: for two trajectories of length M and N, we calculated the average of the M × N matrix of distances. This distance was normalized by the s.d. of all the dataset of each animal.

For checkerboard datasets (Fig. 2c), log-linear regression was done on the mean area of the chromatophores in a given component over stimulus square sizes ranging from 0.1625 to 2.5 cm.

For threatening stimuli datasets (Fig. 5i and Extended Data Fig. 10), clustering was performed on the fast phase of the outwards and return trajectories. The fast phase was defined as the time when the instantaneous mean chromatophore speed (smoothed with a 4 s Hann time window) was above 10% of the peak speed in the respective outwards and return trajectories. Chromatophore area time-series were centred using only these fast phases before performing PCA and community detection (using the top 50 PCs). Such trimming was performed to isolate the behaviours of interest (pattern changes) in response to threatening stimuli, and thus exclude timepoints when the animal was set on a static pattern.

To compare chromatophore components between camouflage and blanching datasets (Fig. 5k–l), clustering was performed on all trials (all trials), and also for each individual trial (by trial). Explained variance for each trial and condition was computed as above, and the ratio obtained by dividing the by-trial-explained variance by the all-trial-explained variance. The shuffled dataset was generated by shuffling chromatophore-to-component assignment after by-trial clustering. We used all trials for blanching datasets (see the ‘Data selection’ section above) (sepia218, n = 11; sepia219, n = 17; sepia221, n = 4). For camouflage datasets, we selected trajectories that were close in PC space; they were selected by hierarchical clustering (Ward’s linkage), performed on the pairwise Hausdorff distances (in the first two PCs) between all pairs of camouflage trajectories. We selected the largest trial cluster after cutting the hierarchy at a cophenetic distance of d = 100 (sepia213, n = 7; sepia218, n = 8; sepia219, n = 10). Analysis was performed across a range of resolution parameter values (0.25 to 4, with 0.25 steps) to check for the robustness of the results across different scales of component decompositions (Extended Data Fig. 9a). For Fig. 5k, the resolution parameter for each dataset was chosen to match the number of components extracted on individual trajectories.

Experiments with threatening visual stimuli

To compute correlations between start, blanched and end pattern, we first took a 10-frame (0.4 s) average of each chromatophore area around each event timepoint per trial. Hierarchical clustering was performed using correlation distance and complete linkage, and tanglegrams plotted with the R package dendextend, using the ‘step2side’ algorithm for untangling65.

To identify component recruitment sequences, we first used median-filtered normalized (minimum–maximum) chromatophore area time series and, for each trial, selected chromatophores that had a minimum size increase of 0.15 during the return to camouflage. Times of chromatophore recruitment during the return trajectory were obtained after smoothing with a 1 s Hann time window and trial-wise minimum–maximum normalization. The time of recruitment was defined as the time of upwards crossing of a 0.1 threshold. Choosing the time of peak speed yielded similar results. Times of recruitments were then ranked using the average method to resolve identical ranks.

The density of recruitment ranks was analysed by histogram binning over trials (50 equally sized bins). The distribution over trials was plotted similarly by first computing, for each chromatophore, their mean rank over all of the trials. The s.d. of the bin density was computed as:

$$\sigma =\sqrt{{\sum }_{i}{p}_{i}\left(1-{p}_{i}\right)}$$

for all chromatophores i, with pi the probability that a chromatophore with a normally distributed rank N(μi,σi) falls into that bin, where μi and σi are the observed mean rank and s.d., respectively. A Kruskal–Wallis test was performed on component-wise chromatophore-averaged mean ranks. Post hoc multiple hierarchical permutation tests were performed with the Python package Hierarch66 using 100 permutations, 10 bootstrap samples and Benjamini–Hochberg correction.

Quantification of tortuosity

To compute curvature along pattern trajectories, we reparameterized trajectories by their arclength. This enabled us to measure curvature homogeneously along the trajectories, independently of their speed. We first applied PCA (2–200 PCs tested; Extended Data Fig. 9b) on individual trials and then used the CubicSpline function (sciPy) to fit piecewise cubic polynomials to the trajectory coordinates along each principal component, using arclength as the independent variable. We then interpolated along these trajectories such that they were traversed at unit speed. The curvature at each point n along these trajectories was computed as ||Tn + 1 − Tn||, where T is the local tangent vector. For threatening visual stimulus datasets, reparameterization and curvature were computed only over the fast phases (see the ‘Component analysis’ section) to include only dynamics in response to the stimuli.

Statistics and reproducibility

Unless stated otherwise, data are mean ± s.e.m. Box plots show the median and upper and lower quartiles, with whiskers extending to 1.5× the interquartile range and outliers are shown as individual points. Experiments were repeated independently several times with similar results. The numbers of repetitions were as follows:

Skin-pattern space analysis (Fig. 1 and Extended Data Fig. 3a–c) was carried out in 12 animals, 6 of which (each with at least 20 analysable trials of swift background change) were included in the analysis of background change (Extended Data Fig. 3d–f). Sample sizes were not predetermined, but chosen based on experience with similar experiments and on animal availability. Natural-image experiments (Fig. 2a and Extended Data Fig. 4d–f) were carried out in 3 animals with 8 to 12 repetitions each. Checkerboard experiments with dense sampling (Fig. 2b,c and Extended Data Fig. 4a–c) were carried out in 3 animals with 4 to 14 repetitions per stimulus in each animal. Three animals (14, 30 and 29 repetitions, respectively, for 6 types of background changes) with high-quality high-resolution data were included in the analyses of chromatophore space (Figs. 34 and Extended Data Figs. 57). For each animal, experiments were conducted in two to three experimental sessions on separate days. Threatening visual stimulation (moving hand or looming image display) experiments (Fig. 5 and Extended Data Figs. 810) were carried out with 4 animals in 1 to 4 experimental sessions on separate days, yielding 11, 22, 19 and 9 trials with high-quality high-resolution data. All filming experiments were repeated by two to three experimenters, on different days, with the same animals, with comparable results.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.