Introduction

Maxwell’s demon, a thought experiment, demonstrates that if one can access information about the state of a system through a classical measurement, one can exploit this information to extract mechanical work or energy from the system through classical control over it. This thought experiment led to a generalization of the second law of thermodynamics that emphasizes the possibility of converting information into work. It is one of the key principles for rectifying thermal fluctuations without strong nonlinearity, simply by measurement and classical control1. In the classical Maxwell’s demon, the measurement is ideally arbitrarily precise, and its back action on the system need not be considered, because the measured quantity is treated as a hidden variable. The situation changes dramatically when the measurement cannot give full information about the system’s state. The measurement outcome then determines the new state of the system through inference from the gained information, so that different couplings to a probe, followed by measurement of the probe, prepare different new states of the system. Such events often turn out to be destructive; however, they can sometimes conditionally distill the system into a more useful resource2,3, as broadly explored in quantum information, especially in resource theories. If the distillation fails, it can be repeated until the resource is obtained. In this way, the probability of failure can, in principle, be reduced to zero at the cost of speed or of protocol multiplexing. The initial thermal fluctuations then reach the output only with negligible probability and, in practice, do not significantly influence the generated resource. The quantum transformation into a better resource therefore depends on the coupling and on the measurement back action4,5,6. First, we need to define free states, free couplings, and free measurements7,8. The free states are naturally thermal equilibrium states, which are diagonal in the energy eigenbasis.
The essential free unitary coupling of the system to the probe is then energy-conserving, and the free energy measurement of the probe commutes with the probe’s Hamiltonian. Free controlled operations are likewise energy-conserving couplings to an ancilla in a thermal state whose energy is at most that of the input. This framework defines the most basic, yet nontrivial, playground in which to explore and compare Maxwell’s-demon methods with one another and with other noise-rectification strategies.

In resource theories, the usefulness of a quantum state is characterized operationally by a class of physically implementable processes that cannot generate the given resource, such as local operations and classical communication (LOCC), which identify entanglement as a resource7,9. Quantifiers that are monotonically non-increasing under such processes are called resource monotones. For continuous-variable systems, one can consider state transformations under Gaussian thermal operations to identify resource monotones, such as temperature-like quantities that generalize the equilibrium temperature10. The work of Serafini et al.11, on the other hand, provides a full characterization of the single-mode transformations achievable by Gaussian thermal operations, leading to a no-go theorem that prevents algorithmic cooling from lowering the entropy of a single mode below that of the background.

The quantum Maxwell’s demon is more involved and diverse for a bosonic system representing a single mode of photons, phonons, or other bosonic particles. Here, the simplest free coupling is a resonant, energy-conserving coupling of the beam-splitter type. After this beam-splitter coupling, a macroscopic measurement of the integrated energy already allows conditional manipulation based on continuous energy statistics, as used, for example, in the work of Iskhakov et al.12,13. Microscopic detection of single quanta enables the conditional subtraction of individual energy quanta14,15,16,17,18,19,20,21, even from macroscopic thermal states, and the charging of a macroscopic battery by the average energy22,23.

However, for microscopic phononic states with only a few quanta on average, the statistics after subtraction become crucial for charging a microscopic battery, represented by a two-level system coupled to phonons, light, or microwave fields. Multiple subtractions increase the mean energy and reduce the autocorrelation between quanta, making them more statistically independent17,24. Above all, they increase the mean-to-deviation ratio of the system’s energy, which is essential in information theory and thermodynamics25. Moreover, as recently demonstrated, correlations between two thermal baths allow a Maxwell’s-demon-based protocol to extract more work26, and measurement strategies have been used in quantum memristors27.

In this work, we propose a nonlinear bosonic Maxwell’s demon operating at the quantum level through a simple, deterministic protocol that should be straightforward to realize on various quantum platforms. We first investigate this deterministic Maxwell’s-demon method for a broadly feasible energy-conserving coupling, the linear Jaynes–Cummings (JC) coupling28, in which a bosonic system is probed sequentially by two-level systems to reach an out-of-equilibrium state. We then prove that the output state can excite another two-level system better than any thermal state. In contrast to photonic Maxwell’s demons, we consider phononic systems, represented, for example, by the extensively used mechanical modes of a single atom29 or, more recently, of a macroscopic oscillator30. Superconducting microwave experiments can also be considered for experimental tests31. In these systems, the mean thermal occupation per mode is usually much higher than that of thermal light sources. Despite the low dimension of the probes, the deterministic linear subtraction increases both the mean-to-deviation ratio of the energy and the probability of exciting a two-level system beyond the values attainable with thermal states. This demonstrates the power of such operations beyond conventional Fock-state lowering32. The remaining noise can be further suppressed by a sharper measurement using a nonlinear, still energy-conserving, JC coupling33,34,35, available on trapped-ion platforms36,37, in cavity quantum electrodynamics38, and in the superconducting circuits39 mentioned above, which subtracts several quanta at once. Trilinear interactions can also be considered as alternatives40,41,42,43.

We prove that nonlinear subtractions, optimally implemented after the linear ones, further increase both the mean-to-deviation ratio of the energy and the probability of exciting atoms. This improvement becomes distinctly significant after only a few nonlinear subtractions, exceeding a factor of 10 in the probability of successfully exciting hundreds of qubits. To our knowledge, this is the first example showing that two-quanta processes can bring the statistics of a phononic mode closer to Poissonian without any classical external drive or the intense nonlinear saturation typical of such processes in lasers37. It demonstrates that Maxwell’s demons based on available nonlinear energy-conserving couplings can open new territory for quantum statistical and thermodynamic investigations.

Results

Overview

To convey the overall picture, this subsection gives an overview of the proposed protocol, which gradually shapes the probability distribution of a harmonic oscillator toward a Poissonian in order to increase the probability of exciting, via the JC interaction, a two-level system regarded as a quantum battery. Figure 1 displays the overall scheme. A harmonic oscillator in thermal equilibrium with a bath first undergoes several linear excitation subtractions via a linear JC coupling; further excitations are then extracted through the nonlinear interaction, which narrows its probability distribution further from both sides, at low and high quanta. The output oscillator is then used to charge a two-level battery through the linear JC coupling, demonstrating that it outperforms both the thermal bound (the maximum probability of exciting a qubit with a thermal state ρth) and the initial state from which it was prepared. This improvement originates from the successive changes in the population distribution of the oscillator caused by the sequential subtraction of its excitations. For a high initial mean excitation \(\bar{n}\), linear subtraction alone cannot reach the charging performance of combined linear and nonlinear subtractions. For example, for \(\bar{n}=70\), a sequence of linear subtractions alone reaches its highest charging performance, Pe = 0.9745, after 42 subtractions, whereas a shorter sequence of 15 linear and 5 nonlinear subtractions reaches a higher Pe = 0.9784. The difference becomes more visible when a hundred atoms must all be excited independently: 42 linear subtractions give a success probability of 0.076, compared with 0.11 for 15 linear and 5 nonlinear subtractions.
Including nonlinear subtractions therefore achieves a sufficiently high charging performance with significantly fewer subtractions.
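The many-qubit figures above follow from raising a single-qubit charging probability to the number of qubits. A minimal Python sketch of that arithmetic is given below, assuming that every qubit is charged independently with the same probability Pe; the helper name `all_excited_probability` is hypothetical, and the tenfold threshold it prints is an illustrative estimate from the two quoted single-qubit values only, not a result of the full simulations.

```python
import math

def all_excited_probability(p_e: float, num_qubits: int) -> float:
    """Probability that num_qubits independently charged qubits are all excited."""
    return p_e ** num_qubits

p_linear = all_excited_probability(0.9745, 100)  # 42 linear subtractions
p_mixed = all_excited_probability(0.9784, 100)   # 15 linear + 5 nonlinear subtractions
print(f"{p_linear:.3f} {p_mixed:.3f}")           # 0.076 0.113

# Smallest number of qubits for which the mixed sequence is at least ten times
# more likely to excite all of them, using only the two probabilities above.
gain_per_qubit = 0.9784 / 0.9745
tenfold_qubits = math.ceil(math.log(10) / math.log(gain_per_qubit))
print(tenfold_qubits)                            # 577
```

With these two inputs, the tenfold advantage appears at a few hundred qubits, consistent with the scale mentioned in the Introduction.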

Fig. 1: The scheme of the overall protocol.

The diagram gives an overview of the deterministic nonlinear bosonic protocol for resonantly charging a quantum battery represented by a two-level system. The protocol consists of two parts: phonon subtraction and the charging of the two-level system. The excitation subtraction itself consists of linear (green) and nonlinear (blue) subtractions using protocol II, illustrated in the bottom inset and described in detail in the latter part of the subsection “Linear subtraction” in Results. Several linear subtractions gradually shape the probability distribution of a harmonic oscillator, initially in a thermal state ρth, into a nearly Gaussian distribution, before nonlinear subtractions trim and narrow the distribution further. The resulting oscillators are then used to charge quantum batteries, initially in the ground state \(\left\vert \mathrm{g}\right\rangle\), through the Jaynes–Cummings interaction. The excitation probability pe,out is expected to exceed the corresponding thermal bound.

The lower inset depicts the excitation subtraction procedure for both linear and nonlinear subtractions. A harmonic oscillator in a motional state \(\rho_{N}^{\rm mo}\) resonantly interacts with a two-level system through either a JC interaction, whose Hamiltonian is linear in the oscillator variables, or a nonlinear JC interaction34, whose Hamiltonian contains higher powers of the oscillator variables, depending on which type of excitation subtraction is performed at that stage. After an optimally chosen interaction time, the state of the qubit is measured. If the excited state is detected, the subtraction has succeeded; otherwise, it has failed. The outcome is then fed forward to decide whether the oscillator is kept or replaced by its previously successful version before all steps are repeated in the next round of subtraction.

Linear subtraction

The linear subtraction is performed using a resonant JC interaction whose Hamiltonian in the interaction picture can be written as

$${\hat{H}}_{{{{{{{{\rm{int}}}}}}}}}=\hslash \lambda \left({\hat{\sigma }}_{+}\hat{a}+{\hat{\sigma }}_{-}{\hat{a}}^{{{{\dagger}}} }\right),$$
(1)

where λ is the coupling strength, \(\hat{\sigma}_{+}\) (\(\hat{\sigma}_{-}\)) is the raising (lowering) operator of the qubit, and \(\hat{a}^{\dagger}\) (\(\hat{a}\)) is the creation (annihilation) operator of the oscillator. The unitary operator generated by this coupling acting for a time t can then be expressed as

$$\begin{aligned}\hat{U}_{\rm JC}(t) ={}& \left\vert \mathrm{e}\right\rangle \left\langle \mathrm{e}\right\vert \cos\left(\lambda t\sqrt{\hat{n}+1}\right)+\left\vert \mathrm{g}\right\rangle \left\langle \mathrm{g}\right\vert \cos\left(\lambda t\sqrt{\hat{n}}\right)\\ & -\mathrm{i}\,\hat{\sigma}_{-}\hat{a}^{\dagger}\frac{\sin\left(\lambda t\sqrt{\hat{n}+1}\right)}{\sqrt{\hat{n}+1}}-\mathrm{i}\,\hat{\sigma}_{+}\hat{a}\frac{\sin\left(\lambda t\sqrt{\hat{n}}\right)}{\sqrt{\hat{n}}},\end{aligned}$$
(2)

where \(\left\vert {{{{{{{\rm{g}}}}}}}}\right\rangle \left\langle {{{{{{{\rm{g}}}}}}}}\right\vert\) and \(\left\vert {{{{{{{\rm{e}}}}}}}}\right\rangle \left\langle {{{{{{{\rm{e}}}}}}}}\right\vert\) denote the projections on the ground and excited states of the qubits, respectively, and \(\hat{n}={\hat{a}}^{{{{\dagger}}} }\hat{a}\) is the number operator of the harmonic oscillators.
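Equation (2) can be checked against a direct exponentiation of the interaction Hamiltonian (1) in a truncated Fock space. The Python sketch below is an illustrative check, not the authors’ code: it assumes NumPy, sets ħ = 1, works with the dimensionless product λt, orders the qubit basis as (|e⟩, |g⟩), and uses hypothetical helper names. Truncation at `dim` levels only affects the state |e, dim-1⟩, whose partner |g, dim⟩ is missing, so that column is excluded from the comparison.

```python
import numpy as np

SIGMA_PLUS = np.array([[0.0, 1.0], [0.0, 0.0]])  # |e><g| with basis order (|e>, |g>)
SIGMA_MINUS = SIGMA_PLUS.T                        # |g><e|
PROJ_E = np.diag([1.0, 0.0])                      # |e><e|
PROJ_G = np.diag([0.0, 1.0])                      # |g><g|

def annihilation(dim):
    """Truncated annihilation operator on Fock states |0>, ..., |dim-1>."""
    return np.diag(np.sqrt(np.arange(1, dim)), k=1)

def jc_closed_form(lam_t, dim):
    """U_JC(t) of Eq. (2) assembled from diagonal functions of n."""
    a = annihilation(dim)
    n = np.arange(dim)
    sin_np1 = np.diag(np.sin(lam_t * np.sqrt(n + 1)) / np.sqrt(n + 1))
    # sin(lam_t sqrt(n))/sqrt(n); its n = 0 entry never matters because a|0> = 0
    sin_n = np.diag(np.sin(lam_t * np.sqrt(n)) / np.sqrt(np.maximum(n, 1)))
    return (np.kron(PROJ_E, np.diag(np.cos(lam_t * np.sqrt(n + 1))))
            + np.kron(PROJ_G, np.diag(np.cos(lam_t * np.sqrt(n))))
            - 1j * np.kron(SIGMA_MINUS, a.T @ sin_np1)
            - 1j * np.kron(SIGMA_PLUS, a @ sin_n))

def jc_numerical(lam_t, dim):
    """exp(-i lam_t (sigma_+ a + sigma_- a^dag)) via eigendecomposition of the Hermitian H."""
    a = annihilation(dim)
    h = np.kron(SIGMA_PLUS, a) + np.kron(SIGMA_MINUS, a.T)
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(-1j * lam_t * w)) @ v.conj().T

dim, lam_t = 12, 0.7
u_closed = jc_closed_form(lam_t, dim)
u_num = jc_numerical(lam_t, dim)
keep = [k for k in range(2 * dim) if k != dim - 1]  # drop the column of |e, dim-1>
print(np.allclose(u_closed[:, keep], u_num[:, keep]))  # True
```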

The first proposed protocol for linearly subtracting the motional excitations, denoted protocol I, is illustrated schematically in Fig. 2. Initially, the harmonic oscillators are in thermal equilibrium with a bath at temperature T, while the probes, two-level systems, are prepared in the ground state \(\left\vert \mathrm{g}\right\rangle\). The initial composite state can then be expressed as

$${\rho }_{0}=\left\vert {{{{{{{\rm{g}}}}}}}}\right\rangle \left\langle {{{{{{{\rm{g}}}}}}}}\right\vert \otimes {\rho }_{{{{{{{{\rm{th}}}}}}}}},$$
(3)

where ρth denotes the state of a harmonic oscillator in thermal equilibrium with mean excitation number \(\bar{n}\),

$${\rho }_{{{{{{{{\rm{th}}}}}}}}}=\mathop{\sum }\limits_{m=0}^{\infty }\frac{{\bar{n}}^{m}}{{(\bar{n}+1)}^{m+1}}\left\vert m\right\rangle \left\langle m\right\vert \equiv {\rho }_{0}^{{{{{{{{\rm{mo}}}}}}}}},$$
(4)

and is also regarded as the initial motional state of the oscillators.

Fig. 2: The scheme of protocol I.

The diagram illustrates the proposed phonon subtraction protocol. In the Nth round of subtraction, an oscillator in the state \(\rho_{N}^{\rm mo}\) couples to a two-level system in its ground state for the optimal interaction time \(t_{N}^{\rm op}\). The qubit is then measured in its energy basis. If the excited state \(\left\vert \mathrm{e}\right\rangle\) is obtained, the energy has been successfully subtracted and the oscillator (blue) is kept for the next round; otherwise, it is replaced with a new oscillator in the initial thermal state.

The mean number of motional excitations \(\bar{n}\) is related to the temperature T by \(\bar{n}=(\exp(\hslash\omega/k_{\rm B}T)-1)^{-1}\), where ω is the angular frequency of the oscillators. The qubits and the oscillators interact for the optimal time \(t_{0}^{\rm op}\), chosen to maximize the probability of exciting the qubits and approximately given by \(\lambda t_{0}^{\rm op}\approx \pi/(2\sqrt{\bar{n}+1})\) (see Methods for more details). The qubits are then measured in the eigenbasis \(\{\left\vert \mathrm{g}\right\rangle,\left\vert \mathrm{e}\right\rangle\}\), and only those oscillators whose probes are found in the excited state \(\left\vert \mathrm{e}\right\rangle\) are postselected for the further steps of the protocol. The measurement and postselection project the joint state of the qubits and the harmonic oscillators onto

$$\begin{aligned}\rho_{1}^{\prime}(t_{0}^{\rm op}) &= \frac{1}{P_{\rm e}^{(0)}(t_{0}^{\rm op})}\left(\hat{\sigma}_{+}\hat{a}\frac{\sin(\lambda t_{0}^{\rm op}\sqrt{\hat{n}})}{\sqrt{\hat{n}}}\rho_{0}\frac{\sin(\lambda t_{0}^{\rm op}\sqrt{\hat{n}})}{\sqrt{\hat{n}}}\hat{a}^{\dagger}\hat{\sigma}_{-}\right)\\ &= \left\vert \mathrm{e}\right\rangle\left\langle \mathrm{e}\right\vert \otimes \sum_{m=0}^{\infty}\frac{\bar{n}^{m+1}}{(\bar{n}+1)^{m+2}}\frac{\sin^{2}(\lambda t_{0}^{\rm op}\sqrt{m+1})}{P_{\rm e}^{(0)}(t_{0}^{\rm op})}\left\vert m\right\rangle\left\langle m\right\vert\\ &= \left\vert \mathrm{e}\right\rangle\left\langle \mathrm{e}\right\vert \otimes \rho_{0}^{\rm mo,e},\end{aligned}$$
(5)

where \(P_{\rm e}^{(0)}(t_{0}^{\rm op})\) is the probability of observing the excited state \(\left\vert \mathrm{e}\right\rangle\) at the optimal time, which acts as the normalization factor of the term in brackets, and \(\rho_{0}^{\rm mo,e}\) is the state of the oscillators after postselection. The qubits of the postselected systems are then reset to their ground state \(\left\vert \mathrm{g}\right\rangle\) by dissipating their energy into the environment. We assume that the qubit dissipation is much slower than the JC interaction and therefore does not decohere the process. Note that, after the measurement, the excited qubits are no longer coupled to the harmonic oscillators but to radiation modes instead, which erase their excitation. Each qubit thus relaxes to the cold optical environment by spontaneous photon emission. This mechanism is essential for ground-state cooling of the qubits of trapped atoms (and of superconducting qubits), which reaches nearly 100%. We also emphasize that, like other Maxwell’s demons, the proposed protocol does not violate the second law of thermodynamics, as it requires a resource to reset the probe to its ground state44.
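The first round can be reproduced numerically from Eqs. (4), (5), and (11). The sketch below assumes NumPy, truncates the thermal distribution at `m_max` quanta, and finds the optimal λt by a simple grid search rather than by the analytical approximation; the grid range, `m_max`, and the function names are illustrative choices, not the authors’ implementation. The numerically optimal λt need not coincide exactly with \(\pi/(2\sqrt{\bar{n}+1})\), since that relation is only approximate.

```python
import numpy as np

HBAR = 1.054571817e-34  # reduced Planck constant (J s)
K_B = 1.380649e-23      # Boltzmann constant (J/K)

def mean_occupation(omega, temperature):
    """nbar = 1/(exp(hbar*omega/(k_B*T)) - 1) for angular frequency omega (rad/s)."""
    return 1.0 / np.expm1(HBAR * omega / (K_B * temperature))

def thermal_populations(nbar, m_max):
    """p_m = nbar^m/(nbar+1)^(m+1) of Eq. (4), truncated at m_max and renormalized."""
    m = np.arange(m_max + 1)
    p = (nbar / (nbar + 1.0)) ** m / (nbar + 1.0)  # avoids overflow of nbar**m
    return p / p.sum()

def excitation_probability(p, lam_t):
    """Eq. (11): sum_m p_m sin^2(lambda t sqrt(m)) for each value in lam_t."""
    m = np.arange(p.size)
    return np.sin(np.outer(lam_t, np.sqrt(m))) ** 2 @ p

def postselect_excited(p, lam_t):
    """Oscillator populations after the probe is found in |e>, second line of Eq. (5)."""
    m = np.arange(p.size)
    weighted = p * np.sin(lam_t * np.sqrt(m)) ** 2  # p_m sin^2(lambda t sqrt(m))
    p_out = np.zeros_like(p)
    p_out[:-1] = weighted[1:]                       # one quantum removed: |m+1> -> |m>
    return p_out / weighted.sum()                   # divide by P_e^(0)(t)

nbar = 30.0
p0 = thermal_populations(nbar, m_max=1000)
grid = np.linspace(1e-3, np.pi / np.sqrt(nbar), 2000)  # candidate values of lambda*t
pe = excitation_probability(p0, grid)
lam_t_opt = grid[np.argmax(pe)]
lam_t_approx = np.pi / (2 * np.sqrt(nbar + 1))
p1 = postselect_excited(p0, lam_t_opt)
m = np.arange(p0.size)
print(f"lambda*t: grid optimum {lam_t_opt:.4f}, approximation {lam_t_approx:.4f}")
print(f"P_e^(0) = {pe.max():.4f}; <n> = {m @ p0:.2f} before, {m @ p1:.2f} after postselection")
```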

Next, the failed systems are replaced with new systems in the initial state ρ0, and all the processes described above are repeated. This time, however, the initial motional state of the ensemble differs from \(\rho_{0}^{\rm mo}\) because of the measurement back action of the first subtraction. After the Nth round of repeat-until-success subtraction by the JC interaction, the state of the ensemble can be expressed as

$${\rho }_{N}^{{{{{{{{\rm{mo}}}}}}}}}=\hat{a}\frac{\sin (\lambda {t}_{N-1}^{{{{{{{{\rm{op}}}}}}}}}\sqrt{\hat{n}})}{\sqrt{\hat{n}}}{\rho }_{N-1}^{{{{{{{{\rm{mo}}}}}}}}}\frac{\sin (\lambda {t}_{N-1}^{{{{{{{{\rm{op}}}}}}}}}\sqrt{\hat{n}})}{\sqrt{\hat{n}}}{\hat{a}}^{{{{\dagger}}} }+(1-{P}_{{{{{{{{\rm{e}}}}}}}}}^{(N-1)}({t}_{N-1}^{{{{{{{{\rm{op}}}}}}}}})){\rho }_{0}^{{{{{{{{\rm{mo}}}}}}}}},$$
(6)

where \(\rho_{N-1}^{\rm mo}\) is the motional state obtained in the previous subtraction, \(P_{\rm e}^{(N-1)}\) is the probability of obtaining the excited state \(\left\vert \mathrm{e}\right\rangle\) in the previous round, and \(t_{N-1}^{\rm op}\) is the interaction time that maximizes \(P_{\rm e}^{(N-1)}\). Note that the motional state in Eq. (6) for N = 1 differs from the state in Eq. (5) by the additional term associated with the repeat-until-success strategy studied by Marek et al.45. The probability \(P_{\rm e}^{(N)}\) of obtaining the excited state at the optimal interaction time \(t_{N}^{\rm op}\) can be written as

$$P_{\rm e}^{(N)}(t_{N}^{\rm op})=\sum_{m=0}^{\infty}p_{m}^{(N)}\sin^{2}(\lambda t_{N}^{\rm op}\sqrt{m}),$$
(7)

where the probability distribution \(p_{m}^{(N)}\) of the harmonic oscillators in the state \(\rho_{N}^{\rm mo}\) can be expressed as

$$p_{m}^{(N)}=p_{m+1}^{(N-1)}\sin^{2}(\lambda t_{N-1}^{\rm op}\sqrt{m+1})+\left(1-P_{\rm e}^{(N-1)}(t_{N-1}^{\rm op})\right)p_{m}^{(0)}.$$
(8)

Here, \(p_{m}^{(N-1)}\) is the probability distribution obtained in the previous round, and \(p_{m}^{(0)}=\bar{n}^{m}/(\bar{n}+1)^{m+1}\) is the initial thermal distribution. This equation describes how each subtraction gradually shapes the probability distribution of the oscillators.

Let us consider the semiclassical case of a very large average excitation, \(\bar{n}\gg 1\). The squared sine in the first term of Eq. (8) acts as a population filter. After the first subtraction (N = 1) with the optimal time \(\lambda t_{0}^{\rm op}\approx \pi/(2\sqrt{\bar{n}+1})\), the probabilities \(p_{m}^{(1)}\) with m far from the initial average excitation \(\bar{n}\) are suppressed, because \(\sin^{2}(\pi\sqrt{m+1}/(2\sqrt{\bar{n}+1}))\) is then considerably smaller than unity, while the probabilities \(p_{m}^{(1)}\) with m close to \(\bar{n}\) dominate the new distribution. This filtering effect persists in subsequent subtractions, and it also keeps the optimal interaction times of the next several rounds approximately equal to the first one: \(\lambda t_{N}^{\rm op}\approx \lambda t_{0}^{\rm op}\approx \pi/(2\sqrt{\bar{n}+1})\) for \(N\ll \bar{n}\). Because the probability of finding the qubit in the excited state grows progressively with the number of subtractions (see the subsection “Charging performance” in Results), the contribution of the last term in Eq. (8) gradually decreases. Each subtraction of a motional excitation with optimized coupling thus gradually reshapes the distribution \(p_{m}^{(N)}\) into a Gaussian distribution centered around \(\bar{n}\). The center of the distribution shifts noticeably toward the motional ground state only when the number of subtractions becomes comparable with the initial average excitation number \(\bar{n}\), indicating that each subtraction decreases the average excitation number only slightly.
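Equation (8) can be iterated directly on the populations. The sketch below is a minimal NumPy implementation of protocol I under stated assumptions (Fock space truncated at `m_max`, each optimal λt found by grid search, hypothetical function names); it records \(\lambda t_{N}^{\rm op}\) and \(P_{\rm e}^{(N)}\) for every round, so the statement that the optimal times stay close to \(\pi/(2\sqrt{\bar{n}+1})\) for \(N\ll\bar{n}\) can be checked.

```python
import numpy as np

def thermal_populations(nbar, m_max):
    """Truncated, renormalized thermal populations of Eq. (4)."""
    m = np.arange(m_max + 1)
    p = (nbar / (nbar + 1.0)) ** m / (nbar + 1.0)
    return p / p.sum()

def optimal_excitation(p, grid):
    """Grid maximum of P_e(t) = sum_m p_m sin^2(lambda t sqrt(m)); returns (lambda*t, P_e)."""
    m = np.arange(p.size)
    pe = np.sin(np.outer(grid, np.sqrt(m))) ** 2 @ p
    k = int(np.argmax(pe))
    return grid[k], pe[k]

def subtraction_branch(p, lam_t):
    """First term of Eq. (8): p_{m+1} sin^2(lambda t sqrt(m+1)), not normalized."""
    m = np.arange(p.size)
    weighted = p * np.sin(lam_t * np.sqrt(m)) ** 2
    out = np.zeros_like(p)
    out[:-1] = weighted[1:]
    return out

def protocol_one(nbar, rounds, m_max=1000, grid_size=2000):
    """Iterate Eq. (8): failed runs are refilled with the thermal populations p^(0)."""
    p_thermal = thermal_populations(nbar, m_max)
    grid = np.linspace(1e-3, np.pi / np.sqrt(nbar), grid_size)
    p, times, probs = p_thermal, [], []
    for n in range(rounds + 1):
        lam_t, pe = optimal_excitation(p, grid)  # lambda t_n^op and P_e^(n)
        times.append(lam_t)
        probs.append(pe)
        if n < rounds:
            p = subtraction_branch(p, lam_t) + (1.0 - pe) * p_thermal
    return np.array(times), np.array(probs), p

nbar = 30.0
times, probs, p_final = protocol_one(nbar, rounds=5)
approx = np.pi / (2 * np.sqrt(nbar + 1))
for n, (lam_t, pe) in enumerate(zip(times, probs)):
    print(f"N={n}: lambda*t_N^op = {lam_t:.4f} (approx. {approx:.4f}), P_e^(N) = {pe:.4f}")
```

Because the success branch sums to \(P_{\rm e}^{(N-1)}\) and the refill carries weight \(1-P_{\rm e}^{(N-1)}\), each iterate stays normalized without an explicit division.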

Protocol I can be further improved if, in the last step of each subtraction, the failed systems are replaced not with systems in thermal equilibrium, \(\rho_{0}^{\rm mo}\), but with the successfully achieved systems of the previous round. The diagram of this second protocol, named protocol II, is depicted in Fig. 3. The state of the ensemble after the Nth round of subtraction becomes

$${\rho }_{N}^{{{{{{{{\rm{mo}}}}}}}}}=\hat{a}\frac{\sin (\lambda {t}_{N-1}^{{{{{{{{\rm{op}}}}}}}}}\sqrt{\hat{n}})}{\sqrt{\hat{n}}}{\rho }_{N-1}^{{{{{{{{\rm{mo}}}}}}}}}\frac{\sin (\lambda {t}_{N-1}^{{{{{{{{\rm{op}}}}}}}}}\sqrt{\hat{n}})}{\sqrt{\hat{n}}}{\hat{a}}^{{{{\dagger}}} }+(1-{P}_{{{{{{{{\rm{e}}}}}}}}}^{(N-1)}({t}_{N-1}^{{{{{{{{\rm{op}}}}}}}}})){\rho }_{N-1}^{{{{{{{{\rm{mo}}}}}}}}},$$
(9)

where the population distribution of the harmonic oscillators is modified as

$${p}_{m}^{(N)}={p}_{m+1}^{(N-1)}{\sin }^{2}(\lambda {t}_{N-1}^{{{{{{{{\rm{op}}}}}}}}}\sqrt{m+1})+(1-{P}_{{{{{{{{\rm{e}}}}}}}}}^{(N-1)}({t}_{N-1}^{{{{{{{{\rm{op}}}}}}}}})){p}_{m}^{(N-1)}.$$
(10)

This modification suppresses both tails of the probability distribution faster and more effectively than protocol I, at the cost of collecting and storing the outcomes of the previous steps. The nearly Gaussian phonon statistics produced by protocols I and II are illustrated and compared in Fig. 4.
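Protocols I and II differ only in the refill term of Eqs. (8) and (10), so a single routine can run both. The NumPy sketch below (hypothetical names, truncated Fock space, grid-searched interaction times) compares the resulting \(P_{\rm e}^{(N)}\) and the mean-to-deviation ratio of the final populations, as defined in Eq. (13). It is an illustration under these assumptions, not the code behind Figs. 4 and 5.

```python
import numpy as np

def run_protocol(nbar, rounds, refill, m_max=1000, grid_size=2000):
    """Return P_e^(N) for N = 0..rounds and the final populations.

    refill="thermal"  iterates Eq. (8)  (protocol I);
    refill="previous" iterates Eq. (10) (protocol II).
    """
    m = np.arange(m_max + 1)
    p_thermal = (nbar / (nbar + 1.0)) ** m / (nbar + 1.0)
    p_thermal /= p_thermal.sum()
    grid = np.linspace(1e-3, np.pi / np.sqrt(nbar), grid_size)
    filters = np.sin(np.outer(grid, np.sqrt(m))) ** 2  # sin^2(lambda t sqrt(m)) on the grid
    p, probs = p_thermal, []
    for n in range(rounds + 1):
        pe_grid = filters @ p
        k = int(np.argmax(pe_grid))                     # optimal lambda*t of this round
        probs.append(pe_grid[k])
        if n == rounds:
            break
        success = np.zeros_like(p)
        success[:-1] = (p * filters[k])[1:]             # p_{m+1} sin^2(lambda t sqrt(m+1))
        spare = p_thermal if refill == "thermal" else p
        p = success + (1.0 - pe_grid[k]) * spare
    return np.array(probs), p

def mean_to_deviation(p):
    """R = <n>/sqrt(<n^2> - <n>^2), Eq. (13)."""
    m = np.arange(p.size)
    mean = m @ p
    return mean / np.sqrt((m ** 2) @ p - mean ** 2)

probs_one, p_one = run_protocol(30.0, 5, refill="thermal")
probs_two, p_two = run_protocol(30.0, 5, refill="previous")
print("protocol I :", np.round(probs_one, 4), "R =", round(float(mean_to_deviation(p_one)), 2))
print("protocol II:", np.round(probs_two, 4), "R =", round(float(mean_to_deviation(p_two)), 2))
```

The two runs coincide for N = 0 and N = 1, because in the first round the previous state is the thermal state itself; they separate from N = 2 onward.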

Fig. 3: The scheme of protocol II.

The diagram of the improved protocol is similar to that of protocol I shown in Fig. 2, but in this case, the failed systems are replaced by the successful systems obtained in the previous round instead of using systems in the initial thermal state.

Fig. 4: The phonon statistics after linear subtractions.

Population distributions after five linear subtractions using (a) protocol I and (b) protocol II. The black solid lines represent the initial population distribution of the oscillators in thermal equilibrium with mean excitation number \(\bar{n}=30\). Sequential subtractions gradually form nearly Gaussian probability distributions (orange), with peaks located slightly below the initial mean excitation \(\bar{n}=30\). The distribution obtained from protocol II, in (b), is noticeably narrower than that obtained from protocol I but has a slightly smaller mean excitation, \(\langle \hat{n}\rangle\). Both distributions are compared with Poisson distributions (blue) with the same average phonon numbers. The relevant quantities are also given: the mean excitation \(\langle \hat{n}\rangle\), the second-order correlation function g2(0), the mean-to-deviation ratio of excitation \(\mathcal{R}\), defined in Eq. (13), and the Fano factor F.

Two conditions are required for the population statistics to approach a Gaussian distribution: a sufficiently high initial excitation \(\bar{n}\), i.e., a high temperature, and a sufficient number of subtractions by the linear Jaynes–Cummings coupling in protocol II. A few rounds of such subtractions are insufficient to make the distribution symmetric. A low \(\bar{n}\), on the other hand, does not allow enough rounds of linear subtraction, as they would deplete the excitations of the oscillator and bring it close to the ground state instead.

We assume that the thermalization time of the motional state is much longer than the total duration of all processes in protocols I and II, so that the heat transferred from the thermal bath to the oscillators is negligible. Thermalization can therefore be ignored.

Charging performance

A population inversion of the qubits occurs when the probability Pe of finding them in the excited state \(\left\vert \mathrm{e}\right\rangle\) exceeds the probability Pg of finding them in the ground state \(\left\vert \mathrm{g}\right\rangle\), i.e., Pe − Pg > 0 or Pe > 1/2. This subsection demonstrates and explains the charging performance of sequential linear subtractions. Let us first discuss the relation between the population distribution of a harmonic oscillator and the maximum excitation probability of a two-level system. For a qubit initially in the ground state \(\left\vert \mathrm{g}\right\rangle\), the probability Pe of finding it in the excited state \(\left\vert \mathrm{e}\right\rangle\) after it has been coupled to an oscillator via the JC interaction for a time t is

$${P}_{{{{{{{{\rm{e}}}}}}}}}(t)=\mathop{\sum }\limits_{m=0}^{\infty }{p}_{m}{\sin }^{2}(\lambda t\sqrt{m}),$$
(11)

where pm is the population distribution of the oscillator. Because an oscillator in its motional ground state \(\left\vert 0\right\rangle\) cannot exchange an excitation with a qubit in the ground state \(\left\vert \mathrm{g}\right\rangle\), the probability Pe cannot exceed 1 − p0, where p0 is the probability of finding the oscillator in its ground state. The desired probability distribution should therefore have a small p0. In the weak-coupling case λt ≪ 1, where \(p_{m}\sin^{2}(\lambda t\sqrt{m})\approx p_{m}(\lambda t)^{2}m\ll 1\), the probability follows the linear rule \(P_{\rm e}\approx(\lambda t)^{2}\langle \hat{n}\rangle\ll 1\), and, in this classical excitation limit, the statistics of the oscillator do not matter. For stronger coupling, however, this simple approximation breaks down. Each term in the summation oscillates in time differently, depending on its index m, owing to the sine function. Narrower probability distributions therefore make the terms in the summation add up more constructively, as they reduce the mismatch between the oscillating time dependences of the dominant probabilities pm, and thus provide a higher chance of obtaining the excited state \(\left\vert \mathrm{e}\right\rangle\). For example, the excited state \(\left\vert \mathrm{e}\right\rangle\) is obtained with certainty via the JC interaction when the oscillator is in a Fock state \(\left\vert n\right\rangle\) with n ≥ 1, because the interaction time t can be chosen precisely to satisfy \(\lambda t\sqrt{n}=\pi/2\), leading to Pe = 1. Another factor to consider for a Gaussian-like distribution, whose probabilities pm fall sharply away from the peak, is the mean excitation number \(\langle \hat{n}\rangle\). If two distributions have an identical Gaussian-like shape but different mean excitation numbers, the one with the larger mean excitation can give a larger Pe.
A larger \(\langle \hat{n}\rangle\) reduces the mismatch between the oscillating time dependences \(\lambda t\sqrt{m}\) of the sine functions in the summation. When m differs from the mean excitation number \(\langle \hat{n}\rangle\) by Δm, such that \(m=\langle \hat{n}\rangle +\Delta m\) and \(|\Delta m|\ll \langle \hat{n}\rangle\), the oscillation frequency associated with the probability pm can be approximated as

$$\begin{array}{rcl}\lambda \sqrt{m}&=&\lambda \sqrt{\langle \hat{n}\rangle +\Delta m}\\ &\approx &\lambda \sqrt{\langle \hat{n}\rangle }+\frac{\lambda \Delta m}{2\sqrt{\langle \hat{n}\rangle }}.\end{array}$$
(12)

For a given Δm, the difference between the oscillation frequencies of the dominant probabilities in the summation of Eq. (11) is therefore inversely proportional to \(\sqrt{\langle \hat{n}\rangle}\). Of course, the mean excitation is irrelevant for a Fock state: as shown above, the summation then contains a single oscillating term, and Pe = 1 is obtained with certainty regardless of the mean excitation number. However, the statistics play a crucial role as soon as the state deviates even slightly from a Fock state. Therefore, among the parameters commonly used to analyze excitation statistics, such as the second-order correlation function \(g^{2}(0)=\langle \hat{a}^{\dagger 2}\hat{a}^{2}\rangle/\langle \hat{a}^{\dagger}\hat{a}\rangle^{2}\) and the Fano factor \(F=\langle(\Delta\hat{n})^{2}\rangle/\langle \hat{n}\rangle\), the appropriate parameter for indicating the desired phonon statistics, motivated by Eq. (12), is the mean-to-deviation ratio (MDR) of the population, denoted by \(\mathcal{R}\) and defined as

$${{{{{{{\mathcal{R}}}}}}}}=\frac{\langle \hat{n}\rangle }{\sqrt{\langle {(\Delta \hat{n})}^{2}\rangle }}$$
(13)

where \(\langle(\Delta\hat{n})^{2}\rangle=\langle \hat{n}^{2}\rangle-\langle \hat{n}\rangle^{2}\) is the phonon-number variance. A phonon distribution with a larger \(\mathcal{R}\) is therefore more likely to excite the atom.
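The statistics discussed above can be evaluated directly from the populations pm. The NumPy sketch below (hypothetical function names, Fock space truncated at `m_max`) computes ⟨n̂⟩, g2(0), F, and \(\mathcal{R}\) for thermal, Poissonian, and Fock distributions with the same mean, together with the maximum of Eq. (11) found on a grid of λt. Its comments use Eq. (12): at \(\lambda t\sqrt{\langle \hat{n}\rangle}=\pi/2\), an offset Δm of one standard deviation shifts the phase by approximately \(\pi/(4\mathcal{R})\), which is one way to see why \(\mathcal{R}\) is the relevant figure of merit. The last lines check the weak-coupling rule \(P_{\rm e}\approx(\lambda t)^{2}\langle \hat{n}\rangle\).

```python
import numpy as np

def phonon_statistics(p):
    """<n>, g2(0) = <n(n-1)>/<n>^2, Fano factor F and MDR R of Eq. (13)."""
    m = np.arange(p.size)
    mean = m @ p
    var = (m ** 2) @ p - mean ** 2
    g2 = (m * (m - 1)) @ p / mean ** 2
    mdr = mean / np.sqrt(var) if var > 1e-12 else np.inf  # a Fock state has zero variance
    return mean, g2, var / mean, mdr

def excitation_probability(p, lam_t):
    """Eq. (11) evaluated for each value in lam_t."""
    m = np.arange(p.size)
    return np.sin(np.outer(lam_t, np.sqrt(m))) ** 2 @ p

nbar, m_max = 30.0, 1000
m = np.arange(m_max + 1)
distributions = {
    "thermal": (nbar / (nbar + 1)) ** m / (nbar + 1),
    "Poisson": np.exp(-nbar) * np.cumprod(np.concatenate(([1.0], nbar / m[1:]))),
    "Fock": (m == int(nbar)).astype(float),
}
grid = np.linspace(1e-3, np.pi / np.sqrt(nbar), 2000)
results = {}
for name, p in distributions.items():
    mean, g2, fano, mdr = phonon_statistics(p)
    pe_max = excitation_probability(p, grid).max()
    # Eq. (12): at lambda t sqrt(<n>) = pi/2, a one-sigma offset in m shifts the phase by ~pi/(4R)
    phase_spread = np.pi / (4 * mdr)
    results[name] = (mean, g2, fano, mdr, pe_max)
    print(f"{name:8s} <n>={mean:.2f} g2={g2:.3f} F={fano:.3f} R={mdr:.3f} "
          f"phase spread={phase_spread:.3f} max P_e={pe_max:.4f}")

# Weak coupling, lambda*t << 1: P_e is close to (lambda t)^2 <n>.
lam_t_small = 0.005
weak_ratio = excitation_probability(distributions["thermal"], lam_t_small)[0] / (lam_t_small ** 2 * nbar)
print(f"weak-coupling ratio P_e / ((lambda t)^2 <n>) = {weak_ratio:.4f}")
```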

Figure 5 shows that the maximum probability of obtaining the excited state \(\left\vert \mathrm{e}\right\rangle\) increases after each linear subtraction. This increase originates from the narrowing of the population distribution after each subtraction. However, for a small initial mean excitation number, around \(\bar{n}\sim 1\)–2, the subtractions do not always increase \(P_{\rm e}^{(N)}\), as the excitations of the oscillators are almost exhausted, i.e., most oscillators are in their motional ground state and no longer couple to the two-level systems. As expected, since protocol II gives a smaller probability p0 of the motional ground state and a narrower probability distribution, it yields higher probabilities \(P_{\rm e}^{(N)}\) for N ≥ 2. The figure also shows that the increase of \(P_{\rm e}^{(N)}\) gradually saturates, as \(\Delta P_{\rm e}^{(N)}=P_{\rm e}^{(N)}-P_{\rm e}^{(N-1)}\) becomes smaller; further subtractions barely increase the excitation probability. The saturated value of \(P_{\rm e}^{(N)}\) obtained from protocol II is slightly greater than that from protocol I, because protocol II shapes the distribution such that the probability p0 and its neighborhood become very small, as shown in Fig. 4b, compared with the distribution obtained from protocol I, depicted in Fig. 4a.

Fig. 5: The relation between the charging performance and the initial phonon number \(\bar{n}\) of a thermal oscillator after subtractions.
figure 5

The figure shows the increase of the maximum probability \({P}_{\rm e}^{(N)}\) of obtaining the excited state \(\left\vert {\rm e}\right\rangle\) through the JC coupling after the Nth subtraction, using (a) protocol I, depicted in Fig. 2, and (b) protocol II, shown in Fig. 3. The probability \({P}_{\rm e}^{(N)}\) increases sharply with the initial mean excitation \(\bar{n}\) and reaches a plateau around \(\bar{n} \sim 5\). Protocol II provides a better chance of exciting a two-level system than protocol I. The dotted lines at 0.5 mark population inversion, \({P}_{\rm e}^{(N)} > 0.5\), while the dash-dotted lines show the excitation probability if the oscillator were in a coherent state. A thermal state with a very large mean excitation, \(\bar{n}\gg 1\), can excite a qubit with a probability of Pe ≈ 0.6411.

For large \(\bar{n}\), the charging performance in the Nth round of operation, for N ≤ 20, can be roughly approximated as \({P}_{\rm e,I}^{(N)}\approx 0.84-0.20{e}^{-0.5N}+0.003N\) for protocol I and \({P}_{\rm e,II}^{(N)}\approx 0.93-0.29{e}^{-0.4N}+0.002N\) for protocol II, where \({P}_{\rm e,I}^{(N)}\) (\({P}_{\rm e,II}^{(N)}\)) is the charging performance in the Nth round using protocol I (protocol II). The charging performance of both protocols in this regime is displayed in Fig. 6. These approximations explicitly quantify how the performance of the proposed protocols improves with the number of operation rounds N, so that the gain in each round can be weighed against the resources used to run the protocols.
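The empirical fits above can be tabulated with a few lines of Python, which makes the diminishing per-round gain explicit. This sketch only re-evaluates the quoted formulas; `pe_fit` and `gain` are hypothetical helper names, and the range check reflects the stated validity N ≤ 20.

```python
import math

def pe_fit(N, protocol):
    """Empirical large-n̄ fits quoted in the text; only meant for 1 <= N <= 20."""
    if not 1 <= N <= 20:
        raise ValueError("the fits are only quoted for 1 <= N <= 20")
    if protocol == "I":
        return 0.84 - 0.20 * math.exp(-0.5 * N) + 0.003 * N
    if protocol == "II":
        return 0.93 - 0.29 * math.exp(-0.4 * N) + 0.002 * N
    raise ValueError("protocol must be 'I' or 'II'")

def gain(N, protocol):
    """Per-round improvement ΔP_e^(N) = P_e^(N) - P_e^(N-1) predicted by the fit (N >= 2)."""
    return pe_fit(N, protocol) - pe_fit(N - 1, protocol)

print(" N   P_I     P_II    ΔP_I    ΔP_II")
for N in (2, 5, 10, 20):
    print(f"{N:2d}  {pe_fit(N, 'I'):.4f}  {pe_fit(N, 'II'):.4f}  "
          f"{gain(N, 'I'):.4f}  {gain(N, 'II'):.4f}")
```

Because of the linear terms 0.003N and 0.002N, the fits would eventually exceed 1, so they should not be extrapolated beyond N = 20.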

Fig. 6: The comparison of the charging performance using protocols I and II.
figure 6

The charging performance using protocol I (blue) and protocol II (red) is displayed for initial mean excitations \(\bar{n}=30\) (empty circles) and \(\bar{n}=70\) (filled circles) after N ≤ 20 linear subtractions. The solid lines represent the asymptotic approximations for \(\bar{n}\gg 1\). Because the two protocols do not yet differ for N < 2, the blue circles there are hidden behind the red circles.

Nonlinear subtraction

As shown in Fig. 4, the distribution still has a long decaying tail at large populations, resembling thermal statistics. To remove this limitation and shape the population distribution faster and more effectively, linear subtractions alone are no longer sufficient. As Fig. 6 shows, the performance of linear subtractions eventually saturates, but this limit can be overcome with a nonlinear interaction whose interaction Hamiltonian has the form

$${\hat{H}}_{{{{{{{{\rm{non}}}}}}}}}=\hslash {\lambda }^{{\prime} }\left({\hat{\sigma }}_{+}{\hat{a}}^{2}+{\hat{\sigma }}_{-}{\left({\hat{a}}^{{{{\dagger}}} }\right)}^{2}\right),$$
(14)

where \({\lambda }^{{\prime} }\) denotes the coupling strength of the interaction. This interaction Hamiltonian would have the same form as the Hamiltonian in Eq. (1) if the annihilation and creation operators in Eq. (1) were replaced by their squares, \(\hat{a}\to {\hat{a}}^{2}\) and \({\hat{a}}^{{{{\dagger}}} }\to {\left({\hat{a}}^{{{{\dagger}}} }\right)}^{2}\). The unitary operator describing the time evolution of this nonlinear coupling is given as

$$\begin{aligned}{\hat{U}}_{\rm non}(t) ={}& \left\vert {\rm e}\right\rangle \left\langle {\rm e}\right\vert \cos \left({\lambda }^{\prime }t\sqrt{(\hat{n}+1)(\hat{n}+2)}\right)+\left\vert {\rm g}\right\rangle \left\langle {\rm g}\right\vert \cos \left({\lambda }^{\prime }t\sqrt{\hat{n}(\hat{n}-1)}\right)\\ &-{\rm i}\,{\hat{\sigma }}_{-}{\left({\hat{a}}^{\dagger }\right)}^{2}\frac{\sin \left({\lambda }^{\prime }t\sqrt{(\hat{n}+1)(\hat{n}+2)}\right)}{\sqrt{(\hat{n}+1)(\hat{n}+2)}}\\ &-{\rm i}\,{\hat{\sigma }}_{+}{\hat{a}}^{2}\frac{\sin \left({\lambda }^{\prime }t\sqrt{\hat{n}(\hat{n}-1)}\right)}{\sqrt{\hat{n}(\hat{n}-1)}}.\end{aligned}$$
(15)
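As a consistency check, Eq. (15) can be compared numerically with the exact propagator \(\exp (-{\rm i}{\hat{H}}_{\rm non}t/\hbar )\) of Eq. (14). The sketch below is our own and assumes ħ = 1 and λ′ = 1, so that the argument `lam_t` stands for λ′t, together with a Fock-space cutoff `dim`. Columns for \(\left\vert {\rm e},n\right\rangle\) with n ≥ dim − 2 are excluded because the cutoff removes the states \(\left\vert {\rm g},n+2\right\rangle\) they couple to. The function name is hypothetical, and only NumPy is used.

```python
import numpy as np

def eq15_deviation(lam_t, dim=30):
    """Max |exp(-iHt) - Eq. (15)| for H of Eq. (14), with hbar = 1 and lambda' = 1.
    Basis: qubit (index 0 = |e>, 1 = |g>) tensor Fock states |0>, ..., |dim-1>."""
    a = np.diag(np.sqrt(np.arange(1.0, dim)), k=1)   # truncated annihilation operator
    ad = a.T
    n = np.arange(dim, dtype=float)
    sp = np.array([[0.0, 1.0], [0.0, 0.0]])          # sigma_+ = |e><g|
    sm = sp.T                                        # sigma_- = |g><e|
    proj_e, proj_g = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])

    # Exact propagator from the spectral decomposition of the real symmetric H
    H = np.kron(sp, a @ a) + np.kron(sm, ad @ ad)
    w, V = np.linalg.eigh(H)
    U_exact = V @ np.diag(np.exp(-1j * w * lam_t)) @ V.conj().T

    def sin_over(c):
        # sin(lam_t c) / c; where c = 0 use the limit lam_t (those entries are
        # multiplied by a vanishing matrix element of a^2 anyway)
        safe = np.where(c > 0, c, 1.0)
        return np.where(c > 0, np.sin(lam_t * c) / safe, lam_t)

    c_up = np.sqrt((n + 1) * (n + 2))
    c_dn = np.sqrt(np.clip(n * (n - 1), 0, None))
    U_eq15 = (np.kron(proj_e, np.diag(np.cos(lam_t * c_up)))
              + np.kron(proj_g, np.diag(np.cos(lam_t * c_dn)))
              - 1j * np.kron(sm, ad @ ad @ np.diag(sin_over(c_up)))
              - 1j * np.kron(sp, a @ a @ np.diag(sin_over(c_dn))))

    # |e, n> with n >= dim - 2 would couple to |g, n + 2>, removed by the cutoff
    valid = [k for k in range(2 * dim) if not dim - 2 <= k < dim]
    return np.max(np.abs(U_exact[:, valid] - U_eq15[:, valid]))

print(eq15_deviation(0.37), eq15_deviation(1.2))
```

The deviation should sit at the level of floating-point round-off, since each pair \(\{\left\vert {\rm e},n\right\rangle ,\left\vert {\rm g},n+2\right\rangle \}\) forms an invariant two-level block of Eq. (14).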

As mentioned, nonlinear repeat-until-success subtractions are performed in the same way as the linear subtractions explained in subsection “Linear subtraction” in Results, except that the interaction is now nonlinear. However, unlike a linear subtraction, a nonlinear subtraction cannot effectively trim the tails of the population distribution; see Supplementary Note 1 for details. Therefore, the prior population distribution should already be roughly Gaussian, and the probabilities pm associated with high-energy levels, \(m\gg \bar{n}\), must already be very small. Otherwise, the nonlinear subtraction forms an undesired ripple in the probability distribution of the harmonic oscillators. Because of the normalization condition, a probability distribution with a ripple is more dispersed than one without it, and the probability Pe is consequently smaller than it could be.

Consequently, to use nonlinear subtractions properly, we first perform several linear subtractions using protocol II, displayed in Fig. 3, so that the population distribution becomes close to a Gaussian with sufficiently short tails. Nonlinear subtractions following the procedure of protocol II can then be performed. The state obtained after a nonlinear subtraction can be expressed as

$${\rho }^{\rm f}={\hat{a}}^{2}S(\hat{n}){\rho }^{\rm i}S(\hat{n}){\left({\hat{a}}^{\dagger }\right)}^{2}+\left(1-{P}_{\rm e}^{\prime }({\tau }^{\rm op})\right){\rho }^{\rm i},$$
(16)

with

$$S(\hat{n})=\frac{\sin \left({\lambda }^{{\prime} }{\tau }^{{{{{{{{\rm{op}}}}}}}}}\sqrt{\hat{n}(\hat{n}-1)}\right)}{\sqrt{\hat{n}(\hat{n}-1)}}$$
(17)

where ρi is the state of the oscillators before the nonlinear subtraction and τop is the interaction time that gives the first local maximum of the probability of successful subtraction. This success probability reads

$${P}_{{{{{{{{\rm{e}}}}}}}}}^{{\prime} }(t)=\mathop{\sum }\limits_{m=2}^{\infty }{p}_{m}^{{{{{{{{\rm{i}}}}}}}}}{\sin }^{2}\left({\lambda }^{{\prime} }t\sqrt{m(m-1)}\right),$$
(18)

where \({p}_{m}^{\rm i}\) is the prior probability distribution of the harmonic oscillators, i.e., the diagonal elements of ρi. In contrast to linear subtractions, the appropriate interaction time for nonlinear subtractions is not the time that globally maximizes \({P}_{\rm e}^{\prime }\). Because \(\sqrt{m(m-1)}\approx m-1/2\) for m ≥ 2, the global maximum lies approximately at \({\lambda }^{\prime }t=\pi\), where \({\sin }^{2}\left({\lambda }^{\prime }t\sqrt{m(m-1)}\right)\approx 1\) for all m ≥ 2. With this interaction time, however, a nonlinear subtraction barely modifies the probability distribution, since the sine factor is then nearly the same for every m. We instead choose the interaction time τop that gives the first local maximum of \({P}_{\rm e}^{\prime }\), which occurs much earlier, at \({\lambda }^{\prime }{\tau }^{\rm op} < \pi /2\). This interaction time is approximately inversely proportional to the mean phonon number \(\langle \hat{n}\rangle\):

$${\lambda }^{{\prime} }{\tau }^{{{{{{{{\rm{op}}}}}}}}}\approx \frac{\pi }{2\langle \hat{n}\rangle }.$$
(19)

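Comparing Eq. (19) with the linear case, \(\lambda {t}_{N}^{\rm op}\approx \pi /(2\sqrt{\langle \hat{n}\rangle +1})\) (see Methods), shows that for comparable coupling strengths, \({\lambda }^{\prime }\approx \lambda\), a nonlinear interaction can be run roughly \(\sqrt{\langle \hat{n}\rangle }\) times faster than a linear one. A nonlinear subtraction using protocol II changes the probability distribution as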

$${p}_{m}^{{{{{{{{\rm{f}}}}}}}}}={p}_{m+2}^{{{{{{{{\rm{i}}}}}}}}}{\sin }^{2}\left({\lambda }^{{\prime} }{\tau }^{{{{{{{{\rm{op}}}}}}}}}\sqrt{(m+2)(m+1)}\right)+(1-{P}_{{{{{{{{\rm{e}}}}}}}}}^{{\prime} }({\tau }^{{{{{{{{\rm{op}}}}}}}}})){p}_{m}^{{{{{{{{\rm{i}}}}}}}}}.$$
(20)
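The density-matrix map of Eq. (16) and its diagonal form, Eq. (20), can be checked against each other numerically. The sketch below assumes a hypothetical near-Gaussian prior with mean 20 and standard deviation 3 (not a distribution taken from the figures), a cutoff of 60 Fock states, and the interaction time of Eq. (19); the function names are illustrative.

```python
import numpy as np

def nonlinear_step_rho(rho_i, lam_tau):
    """Eq. (16): oscillator state after one nonlinear subtraction with protocol II."""
    dim = rho_i.shape[0]
    a2 = np.linalg.matrix_power(np.diag(np.sqrt(np.arange(1.0, dim)), k=1), 2)  # a^2
    n = np.arange(dim, dtype=float)
    c = np.sqrt(np.clip(n * (n - 1), 0, None))
    # S(n) of Eq. (17); entries with n < 2 are irrelevant because a^2 annihilates them
    S = np.diag(np.where(c > 0, np.sin(lam_tau * c) / np.where(c > 0, c, 1.0), 0.0))
    p_i = np.real(np.diag(rho_i))
    pe_prime = float(np.sum(p_i * np.sin(lam_tau * c) ** 2))          # Eq. (18)
    rho_f = a2 @ S @ rho_i @ S @ a2.T + (1 - pe_prime) * rho_i
    return rho_f, pe_prime

def nonlinear_step_p(p_i, lam_tau):
    """Eq. (20): the same map written directly for the diagonal p_m."""
    m = np.arange(len(p_i), dtype=float)
    pe_prime = np.sum(p_i * np.sin(lam_tau * np.sqrt(np.clip(m * (m - 1), 0, None))) ** 2)
    shifted = np.zeros_like(p_i)
    k = m[:-2]
    shifted[:-2] = p_i[2:] * np.sin(lam_tau * np.sqrt((k + 2) * (k + 1))) ** 2
    return shifted + (1 - pe_prime) * p_i

dim = 60
m = np.arange(dim)
p_i = np.exp(-(m - 20.0) ** 2 / (2 * 3.0 ** 2))
p_i /= p_i.sum()                               # hypothetical near-Gaussian prior
lam_tau = np.pi / (2 * np.dot(m, p_i))         # Eq. (19)
rho_f, pe_prime = nonlinear_step_rho(np.diag(p_i), lam_tau)
print(pe_prime, np.trace(rho_f))
```

The trace of ρf stays equal to 1 because the success branch carries total weight \({P}_{\rm e}^{\prime }\) and the refilled branch carries weight \(1-{P}_{\rm e}^{\prime }\).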

Figure 7 compares the success probabilities, and their change after each subtraction, for two schemes: the scheme that employs only linear subtractions (upper bar chart) and the scheme that uses linear subtractions followed by nonlinear subtractions (lower bar chart) to boost the performance before charging the quantum battery. The upper bar chart shows that, when only linear subtractions are used, the success probability gradually saturates. Additional linear subtractions therefore barely improve the success probability and the charging performance, denoted by the last green bar. The lower bar chart, in contrast, demonstrates that the saturated performance can be boosted further with nonlinear subtractions, whose success probabilities are represented by the two red bars. The success probability of a nonlinear subtraction is noticeably lower than that of the preceding linear subtractions, but still large enough for the nonlinear subtraction protocol to be practical. After six linear and two nonlinear subtractions, the charging performance, i.e., the probability of obtaining the excited state through the JC coupling, denoted by the green bar, exceeds that obtained when all eight subtractions are linear. This increase stems from the change in the shape of the population distribution after the nonlinear subtractions: as shown in the inset of Fig. 7, the distribution becomes more squeezed, which is more favorable for exciting a two-level system.

Fig. 7: The improved charging performance after using nonlinear subtractions.
figure 7

a The bar chart shows the success probabilities of obtaining the excited state in the Nth subtraction using protocol II with an initial average phonon number of \(\bar{n}=30\). b The same probabilities, but with the 7th and 8th subtractions performed as two consecutive nonlinear subtractions, whose success probabilities are denoted by the red bars. The actual success probabilities, Pe, are written in white on the bars so that their fractional differences can be seen. Above each bar, the probability distribution of the harmonic oscillator contributing to the success probability is displayed together with the mean-to-deviation ratio (MDR) \(\mathcal{R}\) of the phonons. The last green bars of both charts represent the probabilities of obtaining the excited state through the JC coupling after eight subtractions, which can be regarded as the charging performance. To compare the performance of both schemes, the height of the upper green bar is marked on the lower one by the white dashed line. In the inset, the probability distributions of the two subtraction schemes are compared. The solid black line represents the distribution obtained from eight linear subtractions, while the orange histogram shows the distribution obtained when the two nonlinear subtractions are included. The latter is compared with a Poissonian distribution of the same mean excitation \(\langle \hat{n}\rangle\), displayed as the blue histogram.

We note here that a nonlinear subtraction cannot be used with the procedure of protocol I because, at the end of that protocol, the failed systems are replaced by systems in a thermal state, so the high-energy populations are not sufficiently small. As a result, nonlinear subtractions would make the population distribution even more dispersed and form a small ripple in it.

Discussion

We previously compared the charging performance obtained from two subtraction strategies: linear subtractions only, and a combination of linear and nonlinear subtractions. The latter turned out to provide better charging performance than the former. The remaining question is whether nonlinear subtractions should be performed at an early stage, at the very end of a subtraction sequence, or somewhere in between to obtain the optimal charging performance. To answer this, we examined different combinations of linear and nonlinear subtractions, with the coupling optimized for each subtraction depending on the previous measurement outcomes. The result is that the later the nonlinear subtractions take place, the better the performance. The probabilities of successfully charging an individual battery with these subtraction strategies may at first appear nearly identical, but the differences are magnified at the scale of mass production. Fig. 8 demonstrates this by comparing the probabilities of successfully charging all of a hundred quantum batteries using different combinations of linear and nonlinear subtractions. To relate this to the result in subsection “Nonlinear subtraction” in Results, we consider only cases in which eight subtractions, six linear and two consecutive nonlinear ones, are performed before the charging stage, and compare their charging performances with that of the linear-subtractions-only strategy. The best performance is obtained when the last two subtractions are nonlinear; its increase is several times larger than the value of \({({P}_{\rm e})}^{100}\) obtained in the linear-subtractions-only case.
The underlying reason is that nonlinear subtractions are better at squeezing the phonon population but unsuitable for trimming its tails. Using nonlinear subtractions earlier therefore reduces the mean phonon number of the final motional state and leads to a lower mean-to-deviation ratio \(\mathcal{R}\). Several linear subtractions thus prepare a properly trimmed phonon distribution that the subsequent nonlinear subtractions can squeeze.
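The magnification can be seen with a short calculation: for N independently charged batteries the joint success probability is \({({P}_{\rm e})}^{N}\), so a ratio r between two single-battery probabilities becomes \({r}^{N}\). The values 0.92 and 0.94 below are hypothetical and are not read off Fig. 8; the function name is ours.

```python
def batch_success(pe, n_qubits=100):
    """Probability that n_qubits independently charged qubits are all excited."""
    return pe ** n_qubits

# Hypothetical single-qubit charging probabilities, for illustration only
pe_linear_only, pe_with_nonlinear = 0.92, 0.94
ratio = batch_success(pe_with_nonlinear) / batch_success(pe_linear_only)
print(f"single-qubit ratio:  {pe_with_nonlinear / pe_linear_only:.3f}")
print(f"hundred-qubit ratio: {ratio:.1f}")
```

In this example a relative difference of about 2% for one battery grows into a factor of roughly 8.6 for a hundred batteries.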

Fig. 8: The performance of charging a hundred qubits.
figure 8

The figure compares the probability \({({P}_{\rm e})}^{100}\) of successfully charging a hundred qubits after eight sequential phonon subtractions on a thermal state with the initial phonon number \(\bar{n}=30\), using different combinations of linear and nonlinear subtractions. The blue bar denotes the charging performance after eight linear subtractions, while the others show the performance after six linear and two consecutive nonlinear subtractions in different orders. Under these bars, the labels nth+(n + 1)th identify the positions of the two nonlinear subtractions in the subtraction sequence. The charging performance improves most when the nonlinear subtractions are performed last. The numbers above the bars are the actual values of \({({P}_{\rm e})}^{100}\).

We add here that, as the third law of quantum thermodynamics forbids the preparation of a perfectly pure state, an ideal projective measurement cannot be achieved with a finite amount of time and resources44. This emphasizes the benefit of the nonlinear interaction, as it reduces the number of subtractions and measurements required to achieve a sufficiently high charging performance. For example, for \(\bar{n}=30\), 14 linear subtractions with protocol II are needed to reach a charging performance of Pe = 0.94, whereas the same performance can be reached with 8 subtractions, six linear and two nonlinear, as depicted in Fig. 7. Moreover, in contrast to previously proposed Maxwell’s demons46,47, where the measurement is used to ensure energy transfer from cold to hot reservoirs, our protocols use measurement to conditionally change the state so that the population distribution is transformed into a less noisy out-of-equilibrium distribution sufficient to excite qubits.

Conclusions

The idea of the classical Maxwell’s demon initiated the reformulation of classical thermodynamics, established a connection between information and thermodynamic work, and provided the fundamental idea of conversion between these two quantities. With information about a system obtained through measurements and precise control, the system can be manipulated into an out-of-equilibrium state, from which energy can in turn be extracted. The idea of such conversion carries over to the quantum version, with some fundamental differences. In contrast to the classical case, in which a measurement is treated as arbitrarily sharp and without back action on the measured system, in the quantum domain both measurement outcomes and their back actions unavoidably affect how we conditionally control and manipulate the system to generate a useful resource. Each measurement not only extracts information about the system but also transforms its state accordingly.

We have proposed a simple but deterministic protocol to realize a bosonic Maxwell’s demon at the quantum level, exploiting the free coupling between the mechanical modes of a single atom and its internal electronic state. A measurement of the qubit whose outcome implies absorption of phonons by the qubit is regarded as a phonon subtraction. We have shown that linear subtractions in both protocols I and II transform the phonon state from an initial thermal state into an out-of-equilibrium state with a nearly Gaussian phonon distribution. This transformed motional state can then be used to charge a microscopic battery, another qubit, by exciting it through a linear JC coupling. The charging performance of such out-of-equilibrium states is higher than that of the initial thermal state, as indicated by their increased mean-to-deviation ratio \(\mathcal{R}\). The performance improves with each subtraction but eventually saturates. To overcome this limitation, a nonlinear subtraction, which uses a nonlinear JC coupling to absorb two phonons at once, must be employed following the procedure of subtraction protocol II. The nonlinear interaction can boost the charging performance further, at the cost of speed in the repeat-until-success protocols. It squeezes the phonon distribution more effectively than the linear version, which increases \(\mathcal{R}\). However, it cannot properly trim the tails of the phonon population, so it is best used as a final performance booster. Including nonlinear subtractions reduces the number of subtractions required to achieve a sufficiently high charging performance.
Although we use a trapped ion as an example quantum platform, this proposed protocol can also be realized easily in other platforms in which a nonlinear JC coupling is available, such as superconducting circuits and cavity quantum electrodynamics.

Using the states prepared by nonlinear subtractions to fully and independently excite a hundred two-level systems in parallel can yield success rates more than ten times higher than those obtained with linear subtractions alone. We therefore believe that a nonlinear Maxwell’s demon can pave the way for new theoretical and experimental research in quantum statistics and thermodynamics.

Methods

Optimal interaction times

In the semi-classical regime, \(\bar{n}\gg 1\), the probability that a qubit initially in its ground state is excited by interacting with a thermal oscillator can be approximated as48

$${P}_{{{{{{{{\rm{e}}}}}}}}}(t) \, \approx \, \lambda t\sqrt{\bar{n}}D(\lambda t\sqrt{\bar{n}})$$
(21)

where λ is the coupling strength between the qubit and phonons, and D(x) is Dawson’s integral, defined by

$$D(x)={e}^{-{x}^{2}}\int\nolimits_{0}^{x}{e}^{{x}^{{\prime} 2}}{{{{{{{\rm{d}}}}}}}}{x}^{{\prime} }.$$
(22)

Before the quantum-revival region is reached, this semi-classical approximation agrees well with the rigorous quantum calculation. The probability Pe is maximized when \(\lambda t\sqrt{\bar{n}}\approx 1.502\), where \(xD(x)\) reaches its maximum.
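Equations (21) and (22) can be checked with a short, dependency-free sketch: Dawson's integral is evaluated with composite Simpson's rule and \(xD(x)\) is maximized by golden-section search on [1, 2]. The quadrature and search method are our own choices (SciPy users could call `scipy.special.dawsn` instead), and the helper names are hypothetical.

```python
import math

def dawson(x, steps=2000):
    """Dawson's integral of Eq. (22), D(x) = exp(-x^2) * int_0^x exp(t^2) dt,
    evaluated with composite Simpson's rule (steps must be even)."""
    if x == 0:
        return 0.0
    h = x / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.exp(t * t - x * x)   # combined exponent avoids overflow
    return total * h / 3

def pe_semiclassical(x):
    """Eq. (21) with x = lambda * t * sqrt(nbar)."""
    return x * dawson(x)

def golden_section_max(f, lo, hi, iters=60):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    for _ in range(iters):
        if f(c) > f(d):          # maximum lies in [lo, d]
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:                    # maximum lies in [c, hi]
            lo, c = c, d
            d = lo + g * (hi - lo)
    return (lo + hi) / 2

x_opt = golden_section_max(pe_semiclassical, 1.0, 2.0)
print(f"x_opt = {x_opt:.4f}, max P_e = {pe_semiclassical(x_opt):.4f}")
```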

On the other hand, a thermal oscillator with a small mean phonon number, \(\bar{n}\ll 1\), predominantly occupies its ground and first excited states, \(\left\vert 0\right\rangle\) and \(\left\vert 1\right\rangle\); the populations of all other motional states are negligible in comparison. It is mainly the phonon in the Fock state \(\left\vert 1\right\rangle\) that excites a qubit. The probability of successfully exciting a qubit then becomes

$${P}_{{{{{{{{\rm{e}}}}}}}}}(t)=\mathop{\sum }\limits_{m=0}^{\infty }{p}_{m}{\sin }^{2}(\lambda t\sqrt{m})\approx \bar{n}{\sin }^{2}(\lambda t\sqrt{1})+{{{{{{{\mathcal{O}}}}}}}}({\bar{n}}^{2}).$$
(23)

This implies that the maximum Pe occurs at λt ≈ π/2. To interpolate between the two limits, \(\bar{n}\gg 1\) and \(\bar{n}\ll 1\), we approximate the optimal interaction time as \(\lambda {t}_{0}^{\rm op}\approx \pi /(2\sqrt{\bar{n}+1})\), which approaches π/2 for small \(\bar{n}\) and still agrees well with the semi-classical treatment for large \(\bar{n}\). This approximation also holds for intermediate values of \(\bar{n}\): for example, for \(\bar{n}=2\), it differs from the exact optimum by only ~1%.
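The ~1% figure for \(\bar{n}=2\) can be reproduced by locating the first maximum of the full thermal sum in Eq. (23) on a grid and comparing it with \(\pi /(2\sqrt{\bar{n}+1})\). This is a sketch under our own assumptions: the grid step of 5 × 10⁻⁴ and the cutoff `mmax` are illustrative choices, and the helper names are hypothetical.

```python
import math

def pe_thermal(lam_t, nbar):
    """Eq. (23) without the small-nbar truncation: sum_m p_m sin^2(lam_t sqrt(m))."""
    q = nbar / (nbar + 1)                # thermal p_m = (1 - q) q^m
    mmax = int(40 * (nbar + 1))          # tail q**mmax ~ exp(-40) is negligible
    return sum((1 - q) * q ** m * math.sin(lam_t * math.sqrt(m)) ** 2 for m in range(mmax))

def first_local_max(f, step=5e-4, upper=math.pi / 2):
    """Return the first grid point at which f has a local maximum."""
    prev, cur, k = f(step), f(2 * step), 2
    while (k + 1) * step < upper:
        nxt = f((k + 1) * step)
        if cur >= prev and cur >= nxt:
            return k * step
        prev, cur, k = cur, nxt, k + 1
    raise ValueError("no local maximum below the upper bound")

nbar = 2
t_exact = first_local_max(lambda x: pe_thermal(x, nbar))
t_approx = math.pi / (2 * math.sqrt(nbar + 1))
print(t_exact, t_approx, abs(t_exact - t_approx) / t_exact)
```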

For the semi-classical case, \(\bar{n}\gg 1\), the phonon population distribution after several linear subtractions becomes nearly Gaussian, with its peak centered around the average phonon number \(\langle \hat{n}\rangle\), and can therefore be qualitatively approximated by a Gaussian distribution,

$${p}_{m} \sim \frac{1}{\sqrt{2\pi }\sigma }\exp \left(-\frac{{(m-\langle \hat{n}\rangle )}^{2}}{2{\sigma }^{2}}\right),$$
(24)

where σ2 is the variance of the distribution. The optimal interaction times for linear subtractions with this approximate Gaussian distribution are then \(\lambda {t}_{N}^{\rm op}\approx \pi /(2\sqrt{\langle \hat{n}\rangle +1})\), similar to the previous result. For this nearly Gaussian distribution, the probability of a successful nonlinear subtraction becomes

$${P}_{{{{{{{{\rm{e}}}}}}}}}^{{\prime} }\approx \frac{1-{e}^{-2{\lambda }^{{\prime} 2}{t}^{2}{\sigma }^{2}}\cos (2{\lambda }^{{\prime} }t(\langle \hat{n}\rangle ))}{2}.$$
(25)

For a sufficiently small variance σ2, the optimal interaction time for a nonlinear subtraction can therefore be qualitatively approximated as \({\lambda }^{\prime }{\tau }^{\rm op}\approx \pi /(2\langle \hat{n}\rangle )\), as given in Eq. (19).
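The Gaussian-regime estimates can be checked together for a hypothetical near-Gaussian state with ⟨n⟩ = 20 and σ = 3 (illustrative values, not taken from the figures). The sketch below locates the first maxima of the linear excitation probability and of Eq. (18) on a grid, compares them with \(\pi /(2\sqrt{\langle \hat{n}\rangle +1})\) and Eq. (19), and evaluates Eq. (25) against the direct sum of Eq. (18). The helper names are ours.

```python
import math

def gaussian_prior(mean, sigma, mmax):
    """Discretized, normalized Gaussian phonon distribution, Eq. (24)."""
    w = [math.exp(-(m - mean) ** 2 / (2 * sigma ** 2)) for m in range(mmax)]
    total = sum(w)
    return [x / total for x in w]

def pe_linear(p, x):
    """Linear JC excitation probability sum_m p_m sin^2(x sqrt(m)), with x = lambda t."""
    return sum(pm * math.sin(x * math.sqrt(m)) ** 2 for m, pm in enumerate(p))

def pe_nonlinear(p, x):
    """Eq. (18) with x = lambda' t."""
    return sum(p[m] * math.sin(x * math.sqrt(m * (m - 1))) ** 2 for m in range(2, len(p)))

def pe_nonlinear_eq25(mean, sigma, x):
    """Closed-form Gaussian approximation, Eq. (25)."""
    return (1 - math.exp(-2 * x ** 2 * sigma ** 2) * math.cos(2 * x * mean)) / 2

def first_local_max(f, step=1e-4, upper=math.pi / 2):
    """Return the first grid point at which f has a local maximum."""
    prev, cur, k = f(step), f(2 * step), 2
    while (k + 1) * step < upper:
        nxt = f((k + 1) * step)
        if cur >= prev and cur >= nxt:
            return k * step
        prev, cur, k = cur, nxt, k + 1
    raise ValueError("no local maximum below the upper bound")

mean, sigma = 20, 3                       # hypothetical near-Gaussian state
p = gaussian_prior(mean, sigma, 60)
n_avg = sum(m * pm for m, pm in enumerate(p))

t_lin = first_local_max(lambda x: pe_linear(p, x))
t_nonlin = first_local_max(lambda x: pe_nonlinear(p, x))
print("linear:    numeric", t_lin, " approx", math.pi / (2 * math.sqrt(n_avg + 1)))
print("nonlinear: numeric", t_nonlin, " Eq. (19)", math.pi / (2 * n_avg))
for x in (0.25 * t_nonlin, 0.5 * t_nonlin, t_nonlin):
    print(f"x={x:.4f}  Eq. (18): {pe_nonlinear(p, x):.4f}  "
          f"Eq. (25): {pe_nonlinear_eq25(n_avg, sigma, x):.4f}")
```

Any residual between Eqs. (18) and (25) appears to stem mainly from the replacement \(\sqrt{m(m-1)}\approx m\) implicit in Eq. (25).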