Introduction

Growing attention is being devoted to analysing physical systems with machine learning (ML) techniques, driven by ground-breaking advances in artificial intelligence1,2. With prominent examples in generative modelling3, recommendation systems4, natural language processing5, decision processes and disease detection6, ML provides the means to grasp data features that can escape the eyes of a trained professional. It has also initiated efforts in quantum ML, where learning is performed by quantum devices7,8. For classification tasks, ML has become a useful tool to reveal phase transition boundaries in spin systems9,10,11,12,13,14,15, topological models16,17,18,19,20,21, photonic condensates22, and strongly correlated fermionic systems23,24,25. In quantum chemistry it is used to predict properties of organic compounds and perform high-throughput calculations26,27. In nanophotonics, ML techniques are widely used for inverse design28,29. Other examples include detection of Wigner function negativity in multimode quantum states30 and automatic learning of topological photonic phase transitions31,32. In many cases ML gives greater insight into non-equilibrium systems33,34, which are well known to host numerous nontrivial solutions35. Notably, many fundamental features in nature, such as the complicated patterns appearing on animal coats36 and the proliferation of defects in the Higgs field37, are linked to non-equilibrium analogues of phase transitions. Such transitions have been investigated in optical systems, notably in the context of cooperative phenomena and self-organisation during lasing38,39,40. Their nature was also studied in non-reciprocal systems41, i.e., systems with gain and loss. Similar physics can be explored in condensed matter systems, such as superfluids and Bose–Einstein condensates, which offer an experimentally friendly setting for pattern formation and spontaneous self-organisation42 and can benefit from ML techniques.

Semiconductor microcavities43 in the strong light–matter coupling regime show increasing promise for studying novel nonlinear low-dimensional optical phenomena. The normal modes in this regime are exciton–polaritons44, quasiparticles coherently composed of excitonic resonances in embedded quantum wells and trapped photonic cavity modes. They benefit from picosecond-scale response times and high nonlinearity (particle interactions) coming from their photonic and excitonic parts, respectively. To date, various nonlinear effects have been studied, including polariton condensation (or lasing)45,46,47, spin pattern formation48, solitons49, vortices50,51, and quantum correlations52, among many others44.

Perhaps the most exciting advancement is the emergence of lattices of polariton condensates, which offer a promising way to create extended systems of trapped nonlinear light53. They can be realised using a variety of techniques, such as lithographically patterned inorganic54 and organic55 cavities which act on the photonic mode, or sculpted nonresonant lasers which act on the exciton mode56. The latter case offers the interesting option of creating either ballistic gain-guided57,58 or optically trapped59,60,61 polariton condensates through the repulsive interactions between polaritons and photoexcited background excitons. Today, polariton lattices have enabled studies of topological properties57,62,63,64,65 and dispersionless bands66,67, and serve as analogue simulators of the XY-model68,69 and oscillatory networks70, and as optimisers for NP-hard problems71,72,73.

With rapid improvements in the abovementioned techniques, the coherence length of polariton condensate lattices now greatly exceeds the typical unit cell size58,61,74, which raises the prospect of studying new and interesting phases of dissipative bosonic matter determined by the coherent flow of polaritons across the lattice sites. Indeed, in contrast to lattices, spatially uniform condensates are notoriously difficult to realise because cavity disorder fragments the polariton fluid. Nonetheless, this idealised scenario has attracted theoretical work in recent years focused on dissipative Kibble–Zurek mechanisms through proliferation of vortices due to modulational instability75, spontaneous Turing patterns in resonantly driven systems76, the non-equilibrium Berezinskii–Kosterlitz–Thouless phase transition in the optical parametric oscillator77 and incoherent pumping78 regimes, and critical exponent universality at long times79. Formation of polarisation domain walls through the condensation (phase transition) quench80 and XY spin phases81 were reported in lattice chains, as well as vortex street formation due to snaking instabilities in both resonantly82 and nonresonantly83 driven polariton fluids. It is therefore of interest to develop and apply ML strategies for these driven-dissipative systems to facilitate understanding of how different phases are separated in this zoo of possibilities, especially in the state-of-the-art condensate lattices.

In this paper, we use ML to classify phases of spinor exciton–polariton condensate lattices. We focus on recent experimental findings demonstrating highly nontrivial polarisation behaviour between optically trapped condensates resulting in both spontaneous and random pattern formation of the condensate polarisation (polariton pseudospin orientation)60,84, a so-called spin-bifurcation regime. We have chosen this system since it offers a relatively simple experimental method to verify our findings through full Stokes polarimetry measurements on the emitted cavity light which carries information on the polariton pseudospin (or spin for short). We use ML to distinguish polarisation patterns across our lattice. This provides an efficient method to map out non-equilibrium phase boundaries. We sketch out the clustering of our multidimensional data and, using learning by confusion13, we refine the boundaries between different phases. Our results are applicable to other observables across different driven-dissipative oscillatory systems such as coupled laser arrays and photonic condensates.

Results

Model

We consider a square lattice of optical cavities typically represented by coupled micropillars [see the sketch in Fig. 1a]. We consider the regime where the ground state mode of each pillar becomes macroscopically occupied by the polariton condensate. Each condensate is described by a coherent spinor wave function \(\Psi_{n}=(\psi_{n+},\psi_{n-})^{\mathrm{T}}\) for the nth lattice site. The two spinor components ψ± correspond explicitly to the circular polarisation of the cavity light σ±. The whole lattice is incoherently pumped by off-resonant linearly polarised light at high energy such that no phase or polarisation information is transferred from the laser source into the condensates. Such a system can be modelled using a set of coupled generalised spinor Gross–Pitaevskii equations44,

$$i\frac{\mathrm{d}\Psi_{n}}{\mathrm{d}t}=\frac{i}{2}\left(W_{t}(t)-\eta S_{n}\right)\Psi_{n}-\frac{1}{2}(\epsilon+i\gamma)\hat{\sigma}_{x}\Psi_{n}+\frac{1}{2}\left(\bar{\alpha}S_{n}+\alpha S_{n}^{z}\hat{\sigma}_{z}\right)\Psi_{n}-(1-i\Lambda)\frac{J}{2}\sum_{\langle nm\rangle}\Psi_{m},$$
(1)

where we have introduced the condensate pseudospin to describe the polarisation (magnetisation) of the lattice,

$$\mathbf{S}_{n}=(S_{n}^{x},S_{n}^{y},S_{n}^{z})^{\mathrm{T}}=\frac{1}{2}\Psi_{n}^{\dagger}\hat{\boldsymbol{\sigma}}\Psi_{n}.$$
(2)

Here \(\hat{\boldsymbol{\sigma}}=(\hat{\sigma}_{x},\hat{\sigma}_{y},\hat{\sigma}_{z})\) is the standard Pauli vector, and the magnitude of the spin of the nth condensate is \(S_{n}=(|\psi_{n+}|^{2}+|\psi_{n-}|^{2})/2\). The factor 1/2 is conventional. When presenting pseudospin patterns for the lattice we use normalised intensities at each site, defined as \(\mathbf{s}_{n}=\mathbf{S}_{n}/S_{n}\). The parameters in the first line of Eq. (1) include: Wt(t), the time-dependent incoherent pump rate (gain) with subtracted linear losses (i.e., we have absorbed the conventional linear polariton loss parameter Γ, corresponding to the cavity photon escape rate, into our net gain parameter W); η, a gain clamping (saturation) parameter describing isotropic nonlinear losses; and ϵ and γ, the energy and linewidth (loss) splitting between the linearly polarised modes \(\psi_{x,y}=(\psi_{+}\pm\psi_{-})/\sqrt{2}\). Physically, the complex-valued linear polarisation splitting appears due to cavity strain85, leading to non-Hermitian coupling between circular polarisation components and defining the effective spin properties. The first term in the second line of Eq. (1) describes the nonlinear shift of polariton energy due to polariton–polariton interactions between same-spin (α1) and opposite-spin (α2) components. Specifically, in the circular polarisation basis we use the combinations α = α1 − α2 and \(\bar{\alpha}=\alpha_{1}+\alpha_{2}\). Finally, the last term in Eq. (1) describes the Josephson-type coupling between lattice sites, J, and Λ is an energy dampening parameter according to the Landau–Khalatnikov approach86. The sum runs over nearest lattice neighbours.

Fig. 1: Lattice of coupled polariton condensates.

a Sketch of a square-arranged polariton lattice based on coupled micropillars. J denotes the tunnelling between sites and W corresponds to the gain coming from the incoherent pump. b, c State-space flow diagrams showing the evolution of a single condensate for several different initial conditions (here θ and ϕ are the polar and azimuthal angles that parametrise the pseudospin direction). They reveal the change from a single dominant fixed-point attractor at sz = 0 into two attractors of broken symmetry between spin-up and spin-down polaritons, sz ≠ 0.

The system of equations (1) was found to successfully describe experiments on trapped polariton condensates60,85,87. To study the condensate polarisation patterns, the incoherent pump is increased slowly and linearly in time until the target value W is reached at time tf,

$$W_{t}(t)=W\left(\Theta[t_{f}-t]\,\frac{t}{t_{f}}+\Theta[t-t_{f}]\right),$$
(3)

where Θ[t] is the Heaviside step function. Starting from a noisy background (stochastic initial conditions), the polaritons condense (i.e., a Sn > 0 solution forms) when a critical threshold pump power Wcond is reached. The condensation threshold is found by linearising Eq. (1) around the Sn = 0 solution: condensation occurs when a single eigenvalue goes from having a negative to a positive imaginary part with increasing pump power Wt. This crossover takes place at Wcond = −(γ + ZΛJ), where Z = 4 is the number of nearest neighbours, and belongs to a linearly polarised solution with \(S_{n}=-S_{n}^{x}\) (because γ increases the gain for vertically polarised polaritons). Throughout the paper we refer to this linear polarisation regime as the XY phase in our ML analysis, since the pseudospin lies on the equatorial plane of the Poincaré sphere. In terms of amplitude oscillator models, the condensation point is also a bifurcation point marking the departure of the condensate (the oscillator) from the stable Sn = 0 solution. We note that Wcond < 0, which may seem counterintuitive from the perspective of “negative power”, but arises naturally since our parameter W describes the difference between pump gain and linear cavity losses.

When we further increase the pump power, the system becomes spontaneously circularly polarised at a second critical power value Wbif, even though the gain and saturation are spin isotropic and Eq. (1) does not favour one sz spin projection over the other85. This phenomenon was labelled a spin bifurcation. It allows for the observation of spontaneous magnetic ordering between interacting condensates60, and can give rise to topologically protected elementary excitations64. Spin bifurcation can be demonstrated in the simplest case of a single condensate (i.e., J = 0). Parametrising the polariton pseudospin on the Poincaré sphere by the polar and azimuthal angles θ and ϕ, we can write \(\mathbf{s}=(\sin\theta\cos\phi,\,\sin\theta\sin\phi,\,\cos\theta)^{\mathrm{T}}\). Solving the generalised Gross–Pitaevskii equation numerically for W = 0 and W = 5/3 with random initial conditions, we observe how the phase-space flow transforms from one dominant fixed-point attractor into two fixed-point attractors just by increasing the pump [Fig. 1b, c]. This corresponds to spontaneous symmetry breaking for the sz spin projection, known as the polariton spin bifurcation85. Time t is measured in units of ϵ−1 and we used γ = 0.2, η = α1 = 0.083, α2 = −0.1α1, and Λ = 0.25, similar to previous studies where the model was fitted to experimental observations85.
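For illustration, the single-condensate limit can be reproduced with a few lines of code. The sketch below (not the authors' implementation) integrates Eq. (1) with J = 0 using the parameters quoted above (ϵ = 1 sets the units of energy and inverse time); the integration time, random seed and initial-noise amplitude are arbitrary choices. Below the bifurcation threshold the normalised sz component decays to zero, while above it the condensate settles into one of the two circularly polarised attractors of Fig. 1c.

```python
# Minimal sketch of the single-condensate limit of Eq. (1) (J = 0); not the authors' code.
import numpy as np
from scipy.integrate import solve_ivp

eps, gam, eta, Lam = 1.0, 0.2, 0.083, 0.25        # parameters quoted in the text
alpha1 = 0.083
alpha2 = -0.1 * alpha1
alpha, alpha_bar = alpha1 - alpha2, alpha1 + alpha2
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

def rhs(t, y, W):
    """dPsi/dt for a single condensate; the complex spinor is packed as (Re, Im)."""
    psi = y[:2] + 1j * y[2:]
    S = 0.5 * np.vdot(psi, psi).real               # total occupation S_n
    Sz = 0.5 * (abs(psi[0])**2 - abs(psi[1])**2)   # z pseudospin component
    dpsi = (0.5 * (W - eta * S) * psi
            + 0.5j * (eps + 1j * gam) * (sigma_x @ psi)
            - 0.5j * (alpha_bar * S * psi + alpha * Sz * (sigma_z @ psi)))
    return np.concatenate([dpsi.real, dpsi.imag])

rng = np.random.default_rng(1)
for W in (0.0, 5.0 / 3.0):                          # below and above the spin bifurcation
    psi0 = 0.1 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
    sol = solve_ivp(rhs, (0.0, 500.0), np.concatenate([psi0.real, psi0.imag]),
                    args=(W,), rtol=1e-8, atol=1e-10)
    psi = sol.y[:2, -1] + 1j * sol.y[2:, -1]
    S = 0.5 * np.vdot(psi, psi).real
    sz = 0.5 * (abs(psi[0])**2 - abs(psi[1])**2) / S if S > 1e-12 else 0.0
    print(f"W = {W:.3f}: normalised s_z at late times = {sz:+.3f}")
```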

In order to determine the spin bifurcation pump power Wbif we need to consider the stationary solutions of Eq. (1) where each node has the same particle population, Sn = Sn+1, and the same magnitude of spin polarisation, \(|S_{n}^{z}|=|S_{n+1}^{z}|\). It can be shown that the solutions which satisfy the above requirements and minimise the bifurcation threshold are of the form84

$$\Psi_{n}=\begin{cases}\Psi_{n+1}, & \text{if}\quad S_{n}^{z}=S_{n+1}^{z},\\ -\hat{\sigma}_{x}\Psi_{n+1}, & \text{if}\quad S_{n}^{z}=-S_{n+1}^{z}.\end{cases}$$
(4)

These trivial solutions characterise ferromagnetic and antiferromagnetic states, where two condensates are spin parallel with zero phase slip between them, or spin antiparallel with a π phase slip between them, respectively. The bifurcation threshold is dictated by the parameters of the system and the possible spin arrangement between nearest neighbours,

$$W_{\text{bif}}=W_{\text{cond}}+\eta\,\frac{(\epsilon-Z_{\uparrow\downarrow}J)^{2}+(\gamma+Z_{\uparrow\downarrow}\Lambda)^{2}}{\alpha(\epsilon-Z_{\uparrow\downarrow}J)}.$$
(5)

Here, \(Z_{\uparrow\uparrow}\) and \(Z_{\uparrow\downarrow}\) are the numbers of nearest-neighbour ferromagnetic and antiferromagnetic bonds for a condensate in the lattice (equal for all nodes). In general, Eq. (5) states that a stationary polarisation pattern of certain parallel and antiparallel nearest-neighbour spins may arise when Wt is increased to Wbif. However, it is not known beforehand what determines the exact outcome of Eq. (1) starting from some initial state vector. For example, \(Z_{\uparrow\uparrow}=Z_{\uparrow\downarrow}=2\) patterns have many different possible configurations for a given lattice size which all have the same bifurcation point Wbif. We also do not know the stability of these steady-state solutions and what other solutions might exist. Apart from ferromagnetic and antiferromagnetic bonding configurations between nearest-neighbour condensates one can expect more complex states to appear, which can be categorised broadly as stationary, cyclic, and chaotic with condensate patterns of varying spin and magnitude. Our goal is to use ML to characterise and cluster these patterns.
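As a quick numerical check, the two thresholds can be tabulated for each admissible number of antiparallel nearest neighbours. The snippet below is a sketch using the illustrative single-condensate parameters from Fig. 1; the chosen value of J is arbitrary.

```python
# Evaluate the condensation threshold and the bifurcation threshold of Eq. (5)
# for each possible number of antiparallel nearest-neighbour bonds Z_ud (Z_uu + Z_ud = 4).
eps, gam, eta, Lam = 1.0, 0.2, 0.083, 0.25
alpha = 0.083 - (-0.1 * 0.083)     # alpha = alpha1 - alpha2
Z = 4                              # nearest neighbours per site on the square lattice

def w_cond(J):
    return -(gam + Z * Lam * J)

def w_bif(J, Z_ud):
    return w_cond(J) + eta * ((eps - Z_ud * J)**2 + (gam + Z_ud * Lam)**2) \
                           / (alpha * (eps - Z_ud * J))

J = 0.2                            # illustrative tunnelling rate
for Z_ud in range(Z + 1):
    print(f"Z_ud = {Z_ud}: W_bif = {w_bif(J, Z_ud):+.3f} (W_cond = {w_cond(J):+.3f})")
```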

Next, we present our numerical results. Specifically, we describe: (1) the numerical procedure for generating the dataset of polariton polarisation patterns; (2) the details of the data analysis and visualisation; (3) the mapping of coarse-grained phase boundaries and a qualitative description of the zoo of phases; (4) the unsupervised ML methods; and (5) the resulting phase diagram of the polariton lattice spin phases.

Numerical simulations

We consider an 8 × 8 polariton lattice and numerically solve generalised Gross–Pitaevskii equations (see details in the “Numerical modelling” subsection in “Methods”).

In Fig. 2 we show an example of four simulations of the full lattice polarisation. In Fig. 2a, c, e, g we plot the normalised \(s_{n}^{z}(t)\) spin components for all sites as a function of time. In Fig. 2b, d, f, h we plot the final polarisation patterns measured at tf, where the colour bars encode the magnitude of the spin component \(s_{n}^{z}(t_{f})\). The four examples shown in Fig. 2 are picked from a set of 100 unique simulations with random gain W and coupling strength J to illustrate the plethora of phases appearing in our system. Specifically, Fig. 2a, c, e, g correspond to W = {0.77, 0.005, 0.69, 0.12} and J = {0.13, 0.48, 0.24, 0.48}, respectively. To model experimental conditions, we also use stochastic initial conditions.

Fig. 2: Polariton lattice dynamics.

In the left column we show examples of dynamical trajectories \(s_{n}^{z}(t)\) for an 8 × 8 lattice of condensates for different values of J and W. Overlaid black lines correspond to different condensates in the lattice. In the right column we show the corresponding normalised magnetisation \(s_{n}^{z}(t_{f})\) at the final time tf = 480. Depending on W and J, distinct polarisation patterns appear, with hints of antiferromagnetic order (a, b), weak circular polarisation (c, d), two spin-down and two spin-up neighbours (e, f), and a striped pattern (g, h). Note the strongly non-convergent character of the dynamics in (e).

The resulting dynamics can correspond to both stationary [Fig. 2b, d, h] and nonstationary patterns [Fig. 2f]. The latter emerge due to the interplay of drive, decay and nonlinearity in the system. Our goal is to find stationary states with distinct polarisation pattern formation that can be seen as phases of matter for polaritons, which we refer to as polaritonic phases in the following. We observe that various polaritonic phases can emerge as analogues of spin phases, albeit in the driven-dissipative setting. For instance, in Fig. 2b we observe a spin pattern that resembles antiferromagnetic ordering with \([Z_{\uparrow\uparrow},Z_{\uparrow\downarrow}]=(0,4)\).

Having observed qualitatively different behaviour for the polarisation of the nonlinear polaritonic lattice, we may ask: how do we classify and draw boundaries between different polaritonic spin phases? Unlike the case of thermodynamic equilibrium, in the driven-dissipative case we do not have an established theory of phase transitions81. We therefore take a data-driven approach and use ML for unsupervised clustering of polaritonic phases.

Data visualisation

The prepared dataset of polarisation patterns contains {Sn} lists with 192 entries for each point on the equally spaced grid \(\{J_{j},W_{k}\}_{j,k}\). We set tf = 3000 in Eq. (3) such that all other timescales are surpassed. While the full dynamics is obtained by numerical propagation of Eq. (1), in practice we only retain data at the last timesteps \({\mathbb{T}}=\{t_{f}+i\,\delta t\}_{i=0}^{10}\) with δt = 1. Next, we pre-process the raw data to ensure that only relevant configurations are studied. For this, we discard nonstationary data points, where the variance (difference) between spin patterns in the time series \({\mathbb{T}}\) is greater than a sensibly chosen tolerance, and concentrate only on stationary states. For convenience, we also filter out redundant configurations that differ only through the trivial symmetry operation flipping the sign of \(s_{n}^{z}\) (for example, the two equivalent lattices in an antiferromagnetic arrangement). This is done by performing a rotation, which corresponds to changing the signs of the \(s_{n}^{y}\) and \(s_{n}^{z}\) pseudospin components in cases where the \(s_{n}^{z}\) have the same magnitude.
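A minimal sketch of these two filtering steps is shown below. The array layout, tolerance value and sign convention (flipping the pattern when the first site has negative sz) are our own assumptions for illustration; the paper does not specify them.

```python
import numpy as np

def is_stationary(spins_t, tol=1e-3):
    """spins_t: array of shape (T, N, 3) holding the pseudospins of N sites at the last
    T retained time steps.  Keep the sample only if the pattern barely changes."""
    return np.var(spins_t, axis=0).max() < tol

def canonical_sign(spins):
    """Remove the trivial (s_y, s_z) -> (-s_y, -s_z) degeneracy.
    spins: array of shape (N, 3); the convention used here (non-negative s_z on the
    first site) is one possible choice, assumed for illustration."""
    out = spins.copy()
    if out[0, 2] < 0:
        out[:, 1] *= -1.0
        out[:, 2] *= -1.0
    return out
```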

We then proceed by analysing the high-dimensional data. The starting point is data visualisation through dimensionality reduction. We employ two methods: t-distributed stochastic neighbour embedding (t-SNE)88 and principal component analysis (PCA). These techniques allow datasets to be plotted in a low-dimensional feature space (two or three dimensions).

Performing PCA on the dataset, we can potentially identify the most important features of the condensate spin lattice. PCA projects the data points onto a set of sequential orthogonal components, each chosen to maximise the remaining sample variance. This can be used as an additional pre-processing step before t-SNE analysis (choosing the most relevant features), or for two- and three-dimensional visualisation. For the specific problem we consider, however, PCA did not prove useful for the visualisation of the polaritonic dataset: complex polarisation patterns cannot be easily distinguished by the dominant principal component (e.g. the total magnetisation \(M_{z}=\sum_{n}S_{n}^{z}\)). This prompts us to use t-distributed stochastic neighbour embedding instead.
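In practice, the limited usefulness of PCA can be checked directly from the explained-variance spectrum. The snippet below is a sketch; the file name and the matrix X of flattened pseudospin patterns (one row per (J, W) point, 192 columns) are hypothetical placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.load("polariton_patterns.npy")   # hypothetical file: (n_samples, 192) patterns
pca = PCA(n_components=5)
X5 = pca.fit_transform(X)               # compressed feature vectors, reused later for clustering
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))
```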

We use t-SNE as a tool for finding points in the parameter space that share similar behaviour (see details in the “Visualisation” subsection in “Methods”). In the reduced space, t-SNE places similar patterns close together, while distinct patterns are mapped to distant points (with high probability). This property is useful for mapping the hypothetical phase boundaries, as t-SNE offers a visualisation of clusters of points with qualitatively similar behaviour. We note, however, that t-SNE does not preserve the distances between points, and can only help draw qualitative conclusions.

In Fig. 3a we show the two-dimensional t-SNE data visualisation for the dataset with the \(s_{n}^{x}\) and \(s_{n}^{z}\) components. Specifically, we use a medium perplexity of 100 and a learning rate of 200. We note that the resulting t-SNE diagram does not change qualitatively when the hyperparameters are varied, and similar results can be obtained in a broad range of perplexities and learning rates. Each sample is represented by a thin grey dot. Additionally, we inspect the polarisation patterns (examples) from a sparse set of \(\{J_{j},W_{k}\}_{j,k}\) values, shown as large coloured dots in Fig. 3a. Qualitatively similar patterns are drawn in the same colour. In Fig. 3b we present these examples of polaritonic phases, forming the qualitative map and giving them tentative names. Specifically, we identified: an XY phase, where \(\mathbf{s}_{n}=(-1,0,0)^{\mathrm{T}}\); chequerboard antiferromagnetic patterns corresponding to the two-dimensional antiferromagnet (AFM); cluster AFM patterns with zero total z-magnetisation, where two nearest neighbours are spin-aligned and two are anti-aligned, \([Z_{\uparrow\uparrow},Z_{\uparrow\downarrow}]=(2,2)\); a stripe phase with zero total z-magnetisation and \([Z_{\uparrow\uparrow},Z_{\uparrow\downarrow}]=(3,1)\); and a ferromagnetic phase with uniform spin values \(s_{n}^{z}\approx\pm 1\). We find that configurations with \([Z_{\uparrow\uparrow},Z_{\uparrow\downarrow}]=(1,3)\) (1P-3A) are rare and generally unstable. Additionally, we observe patterns with non-homogeneous polarisation distributions. We label them as a hypothetical wave phase (similar patterns occupying the high-J, low-W region); a glassy phase with emergent domains of reversed polarisation on the dominant background; and a diagonal stripe phase with a continuous change of \(s_{n}^{z}\) along the diagonals (distinct from the horizontal/vertical stripe phase).
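The embedding of Fig. 3a can be reproduced along the following lines. This is a sketch, not the authors' script: the random seed, initialisation and plotting details are assumptions, while the perplexity and learning rate match the values quoted above; X denotes the matrix of pseudospin patterns introduced earlier.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# X: (n_samples, n_features) array built from the s^x and s^z components of every sample
emb = TSNE(n_components=2, perplexity=100, learning_rate=200,
           init="pca", random_state=0).fit_transform(X)

plt.scatter(emb[:, 0], emb[:, 1], s=4, c="grey")
plt.xlabel("t-SNE component 1")
plt.ylabel("t-SNE component 2")
plt.show()
```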

Fig. 3: Visualisation of tentative polariton phases.

a Data visualisation of the polarisation patterns using the dimensionality reduction method. Results are obtained using the t-distributed stochastic neighbour embedding (t-SNE) approach on the raw dataset with {sn}. We observe qualitative clustering of characteristic polarisation patterns, and label them by checking the lattice magnetisation for the marked points (solid coloured dots). b Qualitative map of polariton phases, shown by the corresponding coloured dots in the pump–tunnelling coordinates (central plot). The map is extracted from the t-SNE data and the performed pattern analysis. We provide typical instances of the lattice polarisation \(s_{n}^{z}\) (insets) for each hypothetical phase, where the top colour bar is the same for all lattices. The labels correspond to: AFM antiferromagnet (red dots), FM ferromagnet (yellow dots), 1P-3A one parallel three antiparallel configuration (dark green dots), cluster AFM (green dots), stripe phase (blue dots), XY phase (purple dots). The wave, glassy and diagonal stripe phases are also shown, with rather self-explanatory patterns. The grey region (bottom left) corresponds to lattices where polaritons are not condensed (i.e., Sn = 0).

This zoo of polaritonic phases serves as our base hypothesis. The question is: do these labels indeed correspond to distinct driven-dissipative phases defined by unique polarisation patterns in the condensate lattice, or are they simply mixed and frozen patterns between conventional FM and AFM configurations? Next, we test the hypothesis using unsupervised clustering and neural network (NN)-based learning by confusion approaches.

Unsupervised learning

We now have a map of the polariton condensate lattice phases. Our next step is to perform unsupervised clustering. This procedure analyses the underlying structure of an unlabelled dataset. The goal is to provide labels for the data points, separating them into distinct groups that share similar properties, in our case stable and stationary spin patterns. We recall that each data point (associated with specific J and W) corresponds to a high-dimensional vector v describing raw polarisation components {sn} or compressed feature vectors {pi}.

In the polariton dataset analysis we use agglomerative and K-means clustering from the sklearn library (see details in the “Clustering” subsection in “Methods”). The clustering algorithms are applied both to the raw dataset and to the pre-processed dataset with a chosen number of principal components. Other possible choices concern the selection of metric and distance types. To choose a high-performing setting, we develop a quality score, where good choices consistently assign the same labels to data points in the three well-known phases (XY, AFM and FM). We achieve the best results for {sn} data pre-processed with PCA, keeping five principal components. We identified the optimal distance choice as the complete distance with the Manhattan metric. Applying first the agglomerative clustering, and labelling each data point (associated with one cluster) by a different colour, we plot the resulting phase diagram in Fig. 4a. Comparison with the qualitative map inferred from t-SNE [Fig. 3b] allows us to assess the quality of the clustering. We observe phase boundaries in certain parameter regions; in particular, boundaries between the XY phase, antiferromagnetic ordering and ferromagnetic ordering are visible. At the same time, while we see that several qualitatively different antiferromagnetic patterns appear at small J and high pump W, the boundaries within this region are difficult to establish. Finally, the region 0.75 < J < 1.5 and −0.5 < W < 0.5 with diagonal stripes and spin-glass patterns does cluster out, but contains varying labels corresponding to antiferromagnetic orderings. Performing K-means clustering, we observe qualitatively the same performance for K = 6, thus suggesting that some of the patterns previously identified in Fig. 3 cannot be categorised as unique phases. To get more quantitative insight into the polariton lattice physics, we apply NN-based methods and further test and refine the phase boundaries.
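The clustering pipeline described above can be sketched as follows (again an illustration, not the published code). The number of clusters used for the agglomerative step is our assumption, since the text only quotes K = 6 for K-means, and X is the matrix of pseudospin patterns.

```python
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering, KMeans

# Compress each pattern to the five leading principal components, as quoted in the text.
X5 = PCA(n_components=5).fit_transform(X)

# Agglomerative clustering with the Manhattan metric and complete linkage
# (older sklearn versions use the keyword 'affinity' instead of 'metric').
agg = AgglomerativeClustering(n_clusters=6, metric="manhattan", linkage="complete")
labels_agg = agg.fit_predict(X5)        # one phase label per (J, W) grid point

# K-means comparison with K = 6 clusters.
labels_km = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X5)
```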

Fig. 4: Polariton phase diagrams.

a Polariton phases separated using the agglomerative clustering. The diagram is built with the Manhattan metric, complete distance, for \({s}_{n}^{x},{s}_{n}^{y},{s}_{n}^{z}\) components and principal component analysis with five principal components. b Polariton phase boundaries obtained using a learning by confusion algorithm. We separated six distinct regions (labelled by I–VI) in the pump vs tunnelling rate coordinates. Comparing results with the qualitative map we confirm the presence of: I—XY ordering, II—antiferromagnet (AFM) phase, III—clustered AFM, IV—stripe ordering, V—ferromagnet (FM) phase, VI—diagonal stripe ordering. The black region shows the range of parameters below the condensation point.

Learning by confusion

While unsupervised learning methods allow one to screen datasets and mine qualitative results, they are typically not suitable for determining phase boundaries. In contrast, supervised learning has shown great potential in determining phase boundaries using the power of NNs1,9. It assumes that representative candidates for the phases are known, for instance, defined by the zero- and infinite-temperature limits in classical spin systems. Typically, the datasets of spin patterns are generated by Monte Carlo procedures, where each point in the parameter space (temperature, interactions, etc.) is assigned a collection of similar patterns. Training the NN as a classifier then allows for identifying the boundary between distinct collections (or phases in the physical sense). In the absence of prior labelling, and with multiple phases present, the direct application of supervised training is infeasible. In the following, we use an NN-based technique that allows us to determine the phase boundaries without prior knowledge of the phases (no phase labels are provided). This corresponds to the learning by confusion (LbC) approach proposed in ref. 13.

The main idea of LbC is to provide hypothetical labelling and then use supervised training to identify regions where the hypothesis is justified. For simplicity, we will discuss a one-dimensional phase boundary determination, where one of the parameters is fixed. A full phase diagram is then obtained by consecutive line-by-line scanning (both J and W can be fixed and scanned interchangeably). First, let us describe the details of the LbC approach. For this, consider a system that shows qualitatively different behaviour as a function of the parameter W. This corresponds to two phases separated by a critical point located at a certain (unknown) pump power Wcrit [see the sketch in Fig. 5a, insets]. To infer the critical point we train an NN with hypothetical (fictitious) labels, where a candidate for the critical point W0 is chosen on the interval from W1 to W2. All points with W < W0 are considered to be in the first phase (labelled “yellow”), and points with W > W0 are in the second phase (labelled “blue”). This constitutes the hypothesis that needs to be tested for a set of candidate critical points. Note that the labelling is applied to both the training and testing sets used in the variational NN optimisation. We start by setting the critical point at one end of the interval, W0 = W1. In this case all data points are assigned to the group “blue”. Next, we test the accuracy of the trained network, defined as the probability that predictions match the provided labels. We obtain 90% accuracy, as the test data also contains only examples with a single label. The same situation holds at the other end of the interval, W0 = W2. However, the situation changes when W0 is placed between W1 and W2 and two labels are present. In this case, unless W0 corresponds to the true critical point Wcrit, we are training the network to put qualitatively different data points (feature vectors) in the same phase, leading to confusion and reduced accuracy. The accuracy approaches unity when the labelling is performed correctly, meaning W0 = Wcrit, because the inner structure of the data then matches the imposed labels. The overall behaviour of the accuracy thus resembles a W-shape13 (not to be associated with the parameter W), and is symmetric if Wcrit is located in the middle of the [W1, W2] interval. In other words, the phase transition corresponds to the interior point where the first derivative of the accuracy changes sign from plus to minus, i.e., to the interior maximum of the accuracy curve.
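The scan over candidate critical points can be written compactly as below. This is a sketch under our own assumptions: sklearn's MLPClassifier (with an 80-unit logistic hidden layer) stands in for the feed-forward network of Fig. 5b, and the train/test split, regularisation weight and iteration budget are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def confusion_scan(X, W_values, W_candidates):
    """Learning-by-confusion scan along a 1D cut at fixed J.
    X: (n_samples, n_features) pseudospin patterns; W_values: pump power of each sample.
    Returns the test accuracy for every candidate critical point W0."""
    accuracies = []
    for W0 in W_candidates:
        y = (W_values > W0).astype(int)            # hypothetical two-phase labelling
        if y.min() == y.max():                     # degenerate labelling at the interval ends
            accuracies.append(1.0)
            continue
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
        clf = MLPClassifier(hidden_layer_sizes=(80,), activation="logistic",
                            alpha=1e-3, max_iter=2000, random_state=0)
        clf.fit(X_tr, y_tr)
        accuracies.append(clf.score(X_te, y_te))   # W-shaped curve, peaked at W_crit
    return np.array(accuracies)
```

The candidate W0 that maximises the interior accuracy is then taken as the estimate of the critical pump power along that cut.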

Fig. 5: Learning by confusion.

a An example of the W-shaped accuracy curve obtained during neural network training in learning by confusion. We fix the tunnelling rate J and vary the lattice gain parameter W, observing peak accuracy in training when the hypothetical labelling coincides with the genuine one. The insets show cartoons of possible types of labelling. Circles show the genuine labelling corresponding to two phases (yellow and blue), with the true critical point placed in the middle. The hypothetical labelling is shown by stars. b The structure of the neural network used in learning by confusion. It contains three layers: an input layer (64 × 3 neurons), a hidden dense layer (80 neurons) and an output dense layer (2 outputs).

To perform LbC we construct a feed-forward NN with three layers [see the NN structure in Fig. 5b]. The input layer consists of 192 neurons such that the raw data {sn} can be analysed. The input leads to a fully connected hidden layer of 80 neurons with sigmoid activation functions, for which we use L2 regularisation with a weight of l2 = 0.001. The output layer is also fully connected and has two outputs for learning the effective probabilities of being in the two phases. Here ReLU (rectified linear unit) activation functions are applied, also with l2 = 0.001. An example of the W-shaped accuracy curve obtained during the learning stage for fixed J and varying W is shown in Fig. 5a.
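For completeness, the architecture of Fig. 5b can be expressed, for instance, in Keras as below. The layer sizes, activations and L2 weights follow the description above, while the optimiser and loss function are not specified in the text and are assumptions on our part.

```python
import tensorflow as tf

l2_reg = tf.keras.regularizers.l2(1e-3)
model = tf.keras.Sequential([
    # Hidden dense layer: 80 sigmoid units; the 192 inputs (8 x 8 sites x 3 spin
    # components) are set implicitly by the shape of the training data.
    tf.keras.layers.Dense(80, activation="sigmoid", kernel_regularizer=l2_reg),
    # Output dense layer: 2 units with ReLU activation, one per hypothetical phase.
    tf.keras.layers.Dense(2, activation="relu", kernel_regularizer=l2_reg),
])
model.compile(optimizer="adam",                     # assumed optimiser
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=..., validation_data=(X_test, y_test))
```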

We apply LbC to the polariton lattice data to refine (and test) the boundaries of the phase diagram previously obtained from unsupervised learning. We concentrate on the parameter intervals where a phase transition is potentially expected, and use LbC either to find the critical point of a transition, or to merge phases if no W-shape dependence is observed. To train the NN we need a large sample of polarisation patterns. This is achieved by taking a patch of parameters (working with the coarse-grained grid of J and W) and generating multiple patterns by numerically solving Eq. (1) for different initial conditions. The final phase diagram is shown in Fig. 4b, which can be compared to the agglomerative clustering results in Fig. 4a. At small J and W we reveal the region of linearly polarised condensates, \(\mathbf{s}_{n}=(-1,0,0)^{\mathrm{T}}\), which we refer to as the XY phase (labelled I and coloured yellow). We note that this phase corresponds to the type of attractor shown in Fig. 1b. As the pump gain increases, we approach three phases with different mixtures of ferromagnetic and antiferromagnetic ordering. We identify them in Fig. 4b as the chequerboard AFM (II), the cluster AFM (III), and the stripe phase (IV). At high W and J the system clearly enters the ferromagnetic phase (V). At low W, however, the LbC scans revealed only two phase boundaries where the W-shape emerges. We associate this region with the diagonal stripe phase (VI) at −0.5 < W < 0.5, J > 0.6. At the same time, we did not identify distinct clusters for the conjectured glassy and wave phases sketched in Fig. 3b. We conclude that they likely correspond to transition regions between phases (crossovers); such regions are often observed in finite-sized spin systems and manifest as domain walls and domain structures. We note that the performance of the LbC procedure does not depend significantly on the structure of the NN, as long as it has high expressivity. At the same time, the procedure is limited by the non-convex optimisation (and possible trapping in local minima), as well as by the fixed number of training samples, which coarse-grains the phase boundaries.

In this study we have made the first steps towards mapping spin phases in polaritonic lattices. Exploiting a data-driven approach, we concentrated on clustering the polarisation patterns, and did not dive into the physics of the identified phases. The next steps can include studying the identified diagonal stripe phase, highlighting its differences with respect to the horizontal/vertical stripe phase and other phases, and studying the crossover to the FM phase. There are also potential ways to enhance the clustering. One route may be the analysis of data in the latent feature space obtained by variational autoencoders. Finally, the learning by confusion approach can be further improved if deep NNs or more complex convolutional NNs are used.

Conclusions

We have studied polarisation patterns that emerge as steady states in nonlinear polaritonic lattices. For different values of pump gain and lattice tunnelling rate, we see qualitatively distinct patterns that correspond to polariton phases with mixtures of ferromagnetic and antiferromagnetic bonding of chequerboard, stripe, diagonal and cluster types. Using data analysis and machine learning techniques we classified these patterns and identified their phase boundaries. First, a qualitative phase map was developed using t-distributed stochastic neighbour embedding as a data visualisation tool. Next, unsupervised learning based on agglomerative clustering was used to sketch the phase diagram of polariton phases as a function of tunnelling rate and pump gain. Finally, a neural network-based learning by confusion approach was used to mark and refine the boundaries between polariton phases. This work describes a path for studying phase transitions in nonlinear optical systems, and highlights the use of data-driven approaches in polaritonic systems.

Methods

Numerical modelling

To describe the dynamics of polaritonic lattices we solve Eq. (1) on a square geometry with 8 × 8 sites and periodic boundary conditions. These are chosen to suppress boundary effects, so that physics close to the thermodynamic limit can be studied. One can also consider soft-open boundary conditions, or damped boundary conditions, where dissipation grows as one approaches the boundary. In the simulations, we vary the two easily tunable experimental parameters Wt and J in the relevant range to generate a dataset of possible polarisation patterns accessible in experiment. The target nonresonant pump power W can be readily tuned in time, and the Josephson coupling strength J can be tuned by changing the overlap between adjacent lattice sites at the lithography stage (micropillars), or by shaping the lattice potential optically.
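A sketch of the lattice right-hand side of Eq. (1), including the periodic-boundary nearest-neighbour sum, is given below; the parameter values reuse the illustrative numbers quoted for Fig. 1 and the array layout is our own choice. Combined with a standard time stepper (e.g. Runge–Kutta), the pump ramp of Eq. (3) and stochastic initial conditions, it generates one polarisation pattern per (J, W) point.

```python
import numpy as np

L = 8  # 8 x 8 lattice with periodic boundary conditions

def neighbour_sum(psi):
    """Sum of the spinor field over the four nearest neighbours of every site.
    psi has shape (L, L, 2); periodic boundaries are implemented with np.roll."""
    return (np.roll(psi, 1, axis=0) + np.roll(psi, -1, axis=0)
            + np.roll(psi, 1, axis=1) + np.roll(psi, -1, axis=1))

def gpe_rhs(psi, Wt, J, eps=1.0, gam=0.2, eta=0.083, Lam=0.25,
            alpha1=0.083, alpha2=-0.0083):
    """Right-hand side dPsi/dt of Eq. (1) for the whole lattice (illustrative parameters)."""
    alpha, alpha_bar = alpha1 - alpha2, alpha1 + alpha2
    S = 0.5 * (np.abs(psi[..., 0])**2 + np.abs(psi[..., 1])**2)   # occupation per site
    Sz = 0.5 * (np.abs(psi[..., 0])**2 - np.abs(psi[..., 1])**2)  # z pseudospin per site
    sx_psi = psi[..., ::-1]                        # sigma_x swaps the two spinor components
    sz_psi = psi * np.array([1.0, -1.0])           # sigma_z flips the sign of psi_-
    return (0.5 * (Wt - eta * S)[..., None] * psi
            + 0.5j * (eps + 1j * gam) * sx_psi
            - 0.5j * (alpha_bar * S[..., None] * psi + alpha * Sz[..., None] * sz_psi)
            + 0.5j * (1 - 1j * Lam) * J * neighbour_sum(psi))
```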

Visualisation

We analyse the dataset using the open-source Python library sklearn. We perform t-SNE with the perplexity and the learning rate as adjustable hyperparameters. The perplexity corresponds to the effective number of nearest neighbours (data points) taken into account during the learning process, and generally sets the statistical certainty in separating two points. The learning rate sets the step size used when minimising the loss function, i.e., the rate at which the embedding positions are updated. The hyperparameters can be tuned to balance the capture of local and global structure in the dataset. A good choice of hyperparameters can additionally be tested by confirming the effective clustering of known polaritonic phases with ferromagnetic and antiferromagnetic patterns (for instance, by labelling known configurations and checking their positions on the t-SNE diagram).

Clustering

We use the K-means and agglomerative clustering approaches2. Both methods generally search for mean values of K clusters, and adjust those means such that the chosen distance between the means and the data points is minimised. K-means clustering requires defining the number of clusters K in advance. In contrast, agglomerative clustering belongs to the hierarchical methods. At first, each data point is assigned to its own cluster (labelled from 1 up to the total number of data points). Next, using a pre-defined distance metric for two data points v and w from different clusters, the difference between clusters is evaluated. Clusters whose difference is below the threshold value are merged iteratively. The linkage distance can be of four distinct types: complete, single, average and Ward's. The complete distance type relies on the maximum distance between two data points in different clusters. The single distance type uses the minimum distance between two points from different clusters. The average distance type relies on the average distance between all pairs of points from the two clusters being compared. The Ward distance type relies on the sum of squared distances to the centre of the cluster. Popular distance metrics are: (1) the Euclidean distance d(v, w) = ‖v − w‖, the L2 norm of the difference of two vectors; (2) the cosine distance d(v, w) = 1 − v · w/(‖v‖‖w‖); (3) the Manhattan (L1) distance d(v, w) = ∑i|vi − wi|; among others.
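The three metrics can be evaluated with standard library calls; the two short vectors below are arbitrary examples.

```python
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, cosine

v = np.array([1.0, 0.0, -1.0])
w = np.array([0.5, 0.5, -1.0])
print("Euclidean (L2):", euclidean(v, w))   # square root of the sum of squared differences
print("Manhattan (L1):", cityblock(v, w))   # sum of absolute differences
print("cosine distance:", cosine(v, w))     # 1 minus the cosine of the angle between v and w
```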