Modular quantum computation in a trapped ion system

Modern computation relies crucially on modular architectures, breaking a complex algorithm into self-contained subroutines. A client can then call upon a remote server to implement parts of the computation independently via an application programming interface (API). Present APIs relay only classical information. Here we implement a quantum API that enables a client to estimate the absolute value of the trace of a server-provided unitary operation \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$U$$\end{document}U. We demonstrate that the algorithm functions correctly irrespective of what unitary \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$U$$\end{document}U the server implements or how the server specifically realizes \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$U$$\end{document}U. Our experiment involves pioneering techniques to coherently swap qubits encoded within the motional states of a trapped \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${}^{171}{{\rm{Yb}}}^{+}\,$$\end{document}171Yb+ ion, controlled on its hyperfine state. This constitutes the first demonstration of modular computation in the quantum regime, providing a step towards scalable, parallelization of quantum computation.

W hen Google upgrades their hardware, applications that make use of Google services continue to function without needing to update. This modular architecture allows a client, Alice, to leverage computations done by a third party, Bob, without knowing any details regarding how these computations were executed. Modularity is enabled by an interface-an established set of rules that specify how Alice delivers input to Bob, and how Bob returns relevant output to Alice. Once agreed, Alice can design technology that makes use of the Bob's service as subroutines, while remaining blissfully ignorant of their implementation. Known as APIs (application programming interfaces), such interfaces are now industry standard. Their adoption is almost universal-from specifying how we leverage pre-built software packages as subroutines to how we interface remotely with present-day quantum computers.
Present interfaces assume only classical information is exchanged, limiting the scope of collaborative quantum computing. What happens when this information exchange is allowed to be quantum? Consider the scenario where Bob offers a service to implement some unitary operation U. A client, Alice wishes to evaluate the normalized trace TðUÞ ¼ trðUÞ=2 n by calling on Bob's service as a subroutine. If this can be achieved, the benefits are two-fold. Alice can treat Bob's service as a black-box. She need not know anything about the quantum circuits that synthesize U. In addition, Alice can use the same device to evaluate the normalized trace of a different unitary U ′ , by exchanging Bob's service for another. This is, in fact, impossible. To see this, note that TðUÞ depends on the global phase of U-a quantity that is unphysical. Therefore, its determination would enable Alice to measure an unphysical quantity. Thus, the standard quantum algorithm for estimating TðUÞ, known as DQC1 1 , cannot operate by offloading synthesis of U to a third party (see Fig. 1a). Indeed, the design of devices that realize complex U-dependent processes, given some unknown U, has received considerable attention [2][3][4][5][6][7][8][9][10] . In this context, several no-go results have been established [11][12][13][14][15] , motivating recent works in identifying what sacrifices or restrictions are necessary to circumvent these no-go theorems [14][15][16][17][18][19][20] .
Here, we report on the experimental implementation of a workaround for the DQC1 algorithm. The key observation is that while TðUÞ depends on the global phase, its modulus does not. The resulting protocol-modular DQC1-enables us to evaluate jTðUÞj by outsourcing implementation of U to a third party 15 . We successfully use it to evaluate jTðUÞj for 19 different unitary operations. The quantum circuit for the client remains the same for each U-guaranteeing true modular architecture. The physical implementation involves a new implementation of the controlled swap (CSWAP) gate-coherently swap two motional modes of an ion trapped in a three-dimensional harmonic oscillator, controlled on the internal levels of the trapped ion. Our experimental techniques are scalable, resilient to noise on part of the client, and chaining multiple iterations enables a modular variant of Shor's factoring algorithm that requires fewer entangling gates 21 . This presents the first demonstration of a modular quantum algorithm and provides an important step towards collaborative quantum computing.

Results
Framework. The modular DQC1 algorithm can be understood by dividing its actions into two separate parties, which we refer to here as server and client. The server, Bob, offers the service of implementing an n-qubit unitary process. Interaction with a client, Alice, is enabled by a publicly announced quantum interface. The interface specifies a designated Hilbert space of a designated quantum system S in which client and server are to exchange quantum information. For simplicity, we assume that the agreed system and Hilbert space used by client to send input quantum states to the server in the same as that used by the server to deliver output quantum states to the client. In principle, this need not be the case (see methods for formal definition). Bob is not constrained to preserve information stored in any other degrees of freedom within S. This is an important point. If Alice is guaranteed that Bob will preserve certain additional degrees of freedom, she is able to synthesizes certain U-dependent process that would otherwise be impossible 14,15 . Our goal is to take on the  Fig. 1 The DQC1 and modular DQC1 algorithms. a The standard DQC1 algorithm operates by applying U on an n-qubit register controlled by a pure qubit initialized in state jþi. TðUÞ can then be estimated through appropriate measurements on the control qubit. This algorithm cannot leverage a third party to implement U as it is impossible to add a control to an unknown unitary 11 . b The modular DQC1 algorithm evaluates jTðUÞj in a way in which U can be outsourced to a third party. Here, Alice introduces a second n-qubit register. She then sends the server one of the n-qubit registers via a specified interface (this could be the original register, or involve first mapping the register into a medium suitable for communication via a SWAP gate). On the proviso that the server applies U and return the result via the specified interface, Alice is able to estimate jTðUÞj by performing a σ 1 measurement on the control qubit role of Alice, and build a device that employs Bob's service as a subroutine to evaluate jTðUÞj.
To do this, Alice begins with a bipartite system, consisting of S to be delivered to Bob and some A that she retains for the duration of the protocol. The protocol then contains two distinct tasks (see Fig. 1b): Preprocessing-representing Alice's necessary actions of preparing some ρ 1 on the joint system A S before delivery of S to Bob; Postprocessing-representing Alice's actions to retrieve jTðUÞj from the state ρ 2 ¼ Uρ 1 U y after receiving Bob's output. Here, U represents the unitary process on S implemented by Bob.
Alice can achieve this by taking a single pure qubit initialized in state jþi ¼ ðj0i þ j1iÞ= ffiffi ffi 2 p , together with two maximally mixed n-qubit registers. In the preprocessing stage, she coherently swaps the two registers, controlled on the pure qubit to obtain ρ 1 . Alice then forwards one of the registers to Bob via the specified interface and awaits the result of Bob's computation. Upon receipt of this result, Alice enters the postprocessing stage. This involves a second application of the control swap gate. Measurement of the ancilla in the σ 1 ¼ 0 j i 1 h j þ 1 j i 0 h j basis then has expectation value of jTðUÞj 2 , enabling efficient estimation of jTðUÞj. Further details are shown in Fig. 1b.
The combination of preprocessing and postprocessing constitutes the modular DQC1 protocol. Critically, neither procedure depends on the physical means that Bob chooses to realize U. For instance, Bob could initially implement U by applying physical operations directly on the system S. Alternatively, Bob could map the received quantum state to a more efficient physical platform for information processing, and implement U on that platform. Alice's modular DQC1 protocol would function regardless. Moreover, Alice's preprocessing and postprocessing procedures are independent of matrix elements of U. This becomes pertinent in cases where U could represent some unknown environmental process. The protocol then functions as a probe, able to efficiently estimate jTðUÞj 2 for any such process without the need for full tomography.
Implementation. We demonstrate a proof of principle realization of modular DQC1 using a trapped 171 Yb þ ion in a harmonic potential when n ¼ 1. In this special case, the protocol involves a system of three qubits. Qubit C represents the control, which is encoded into the internal states of 171 Yb þ . Two registers, denoted as qubits X and Y, are encoded into the external motional levels of 171 Yb þ . Unitary operations on qubit C are performed by applying resonant microwaves 22,23 . Meanwhile entangling gates between the ancilla and the two registers are realized by applying counterpropagating Raman laser beams with appropriate frequency differences and phases [24][25][26][27][28] .
The theoretical circuit for modular DQC1 has also been further tailored for the ion trap system. Notably, during both preprocessing and postprocessing, Alice has inserted an extra SWAP gate between qubit C and qubit X. The actions of these SWAP gates have no effect on the algorithms output, but benefit this particular set-up, as the control qubit in the ion trap system is most directly accessible-and thus the most practical one for outsourcing operations to a third party. For the proof-of-principle experiment, we simulate the scenario where Bob operates on C directly-with understanding that in more realistic scenarios, information within X is likely first mapped to some flying qubit to be delivered to Bob. Fig. 2 illustrates further details.
During preprocessing, the standard circuit design for synthesizing ρ 1 involves application of a CSWAP gate on registers X and Y with qubit C as the control. Since ρ 1 is input-independent, any means of preparing this state is equally valid. Here, we perform preprocessing without explicitly using the CSWAP gate, with no impact on practical usages and scalability (see Supplementary  Notes 1 and 2). Subsequently, a second SWAP is then used to interchange information between X and C-enabling C to be used as the interface qubit.
During postprocessing, a second CSWAP gate is necessary for extracting information about jTðUÞj from ρ 2 . This ρ 2 is inputdependent, thus the CSWAP gate needs to be synthesized online. In our experiment, we pioneer a technique to achieve this involving motional qubits and a sequence of Raman laser beams together with microwaves (see Fig. 3. The full implementation of CSWAP is illustrated in Supplementary Note 1). Appropriately configured measurements on qubit C via fluorescence detection will then have measurement outcomes with an expectation value of jTðUÞj 2 . Repeated applications of the protocol thus efficiently estimate jTðUÞj to any specified accuracy.
To characterize the faithfulness of our CSWAP operation, we find its 8 8 truth table. The implementation achieves a fidelity (classical gate fidelity 29 ) of 0:85 ± 0:02 (see Supplementary Note 3 for details). In methods, we illustrate that effects of these imperfections can be mitigated-such that use of our CSWAP operations does not impact Alice's capability to efficiently estimate jTðUÞj to any fixed error.  Fig. 2 Modular DQC1 on a trapped ion. The modular DQC1 algorithm redesigned to function on a 171 Yb þ ion. Here, the control qubit C is encoded within two hyperfine levels of the S 1=2 manifold in the ion. Denote these by where F is the quantum number of total internal angular momentum and m F is the magnetic quantum number. The transition frequency between j0 C i and j1 C i is 12642.826 MHz. Qubits X and Y are encoded within the ground and first excited states of two radial motional modes in 171 Yb þ , denoted as j0 X i, j1 X i and j0 Y i, j1 Y i. The trap frequencies of modes X and Y are given by 2.53 and 2.00 MHz. After suitable preprocessing, information encoded within the control qubit can be forwarded to an external server via a suitable interface where the action of U is out-sourced NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-12643-2 ARTICLE Experimental benchmarks. To benchmark Alice's protocol, our experiment also needs to simulate the actions of the server Bob. Critically, we ensure that our experimental procedure for enacting Alice's modular DQC1 protocol in both preprocessing and postprocessing remains invariant regardless of which U is implemented. Operationally, this enables us to simulate the following scenario: 1. Alice performs relevant preprocessing and walks away from her lab. 2. Bob then implements U on C in her absence. 3. Alice can then return to perform estimation of jTðUÞj without specific knowledge, which U was implemented, or what methodology Bob used.
This enables us to illustrate the core tenet of modularity-that the client's circuit does not need to change depending on U.
During benchmarking, we assess Alice's performance for a wide range of unitaries U. Specifically, these include unitaries of the form U σ ðχÞ ¼ expðÀiχσ=2Þ, where σ 2 fσ 1 ; σ 2 ; σ 3 g involves all three possible Pauli operators, and χ 2 f0; π=6; π=3; π=2; 2π=3; 5π=6; πg as shown in Fig. 4. For each choice of U, the protocol was executed 1000 times to obtain an estimation of jTðUÞj 2 -denoted as MðUÞ-with standard error of 0.02. We compare these experimental results to their theoretical predictions in Fig. 4a. As we can see, the experimental estimations of jTðUÞj 2 are significantly lower than their true values. Fortunately, Alice is able to calibrate her device to account for these errors. To do this, she assumes the resulting estimations M are offset by a scaling factor of λ, i.e., MðUÞ ¼ λjTðUÞj 2 . Alice can determine λ by first benchmarking her device against an 'identity server' (e.g., preforming preprocessing and postprocessing without calling on the services of Bob). The effectively evaluates MðIÞ-which should output 1 under ideal conditions, and thus enables immediate estimation of λ. She can then scale all results by a factor of 1=λ. As this scaling is independent of how the server implements U, this form of error correction does not impact modularity of the procedure.
In our experiment, the value λ is determined to be 0:69 ± 0:02 to a confidence level of 95%. The re-calibrated estimations for jTðUÞj 2 are plotted with theoretical predictions in Fig. 4b. As we can see, Alice's estimations of jTðUÞj 2 are now in good agreement with their true values. As such, we illustrates that modular DQC1 can continue to operate in today's experimental conditions.

Discussion
Here, we experimentally demonstrated the first modular quantum protocol-a variation of the standard DQC1 protocol that allows a device to determine the normalized trace of a completely unknown unitary process U. The experiment illustrates how Alice can outsource part of the computation to Bob-namely the realization of U. Alice needs no knowledge of how Bob chooses to realize U. The only information Alice and Bob need to share is an agreement on how to communicate quantum information to each other. Modular architecture has been critical in distributed classical computing. Our experiment presents its analog in the quantum regime. Fig. 3 Implementation of the control SWAP gate. A CSWAP operation on X and Y represents coherently interchanging the populations of j1 C 0 X 1 Y i and j1 C 1 X 0 Y i. In experiment this is achieved as follows: first, we temporarily shelve j0 C i into an ancillary Zeeman level j ± Z C i ¼ jF ¼ 1; m F ¼ ± 1i by microwave pulses. The two Zeeman levels jZ C i and j À Z C i are employed sequentially with equal duration, so that the AC stark shift and energy level jittering of both Zeeman levels cancel. The transition between j0 C i and j ± Z C i ¼ jF ¼ 1; m F ¼ ± 1i is realized by a microwave pulse with frequency 12642:819 ± 9:507m F MHz. Meanwhile the SWAP operation that interchanges j1 C 0 X 1 Y i and j1 C 1 X 0 Y i is realized by three sequential Raman pulses (see Supplementary Note 1 and 2). Each Raman process is represented by a cube in the figure, where an arrow indicates that population is transferred, and a dot shows population is not transferred. Subsequently, the shelved j0 C i is restored by a second microwave pulse Our implementation involved the design and realization a coherent quantum controlled swap gate, swapping two motional modes of a trapped ion depending on its internal hyperfine states. This technique presents a more favorable means of scaling than encoding qubits only within the internal states of ions. In Supplementary Note 1, we illustrate that our techniques can be adapted to efficiently swap two registers containing many motional modes, controlled on the hyperfine states of single ion. Meanwhile employing higher-energy excitations of the motional modes can enable potential coherent swaps of continuous variable degrees of freedom. These techniques provide possible means of realizing a number of interesting quantum protocols, including quantum anomaly detection 30 , and quantum computing with continuous variable encodings 31 .

Methods
Formal framework. A quantum application programming interface (API) specifies a public agreement between a client and server in how to communicate quantum information 15 . In particular, an interface I involves two tuples: Þ consisting of the physical system S in , and precise Hilbert space H in that Alice promises to use to deliver information to the server, as well as the computational basis B in , which Alice will use to encode this information. 2. I out ¼ ðS out ; H out ; B out Þ consisting of the exact physical system S out , Hilbert space H out , and computational basis B out , which the server will use to return output quantum information to Alice. We then say that a server, Bob, implements U via interface I if on delivery of jϕi encoded within I in , Bob will return Ujϕi encoded within I out . Note that in many settings, our experiment included, I in ¼ I out .
Once an interface is agreed. Alice can then design modular algorithms that take advantage of Bob's service. Formally, we define two possible classes of elementary actions 1. Implement some elementary circuit elements (e.g., a elementary quantum gate, a single-qubit measurement) 2. Call upon the server to act on S in and wait for reception of S out A modular quantum algorithm is then defined as a U-independent sequence of elementary actions that enable Alice to realize a quantum process P½U whenever Bob implements U. In our experiment, P½U was a quantum process whose output allowed efficient estimation of trðUÞ. The key advantages of this modular architecture is that it ensures • Independence of realization-Alice's algorithm realizes P½U, irrespective of what sequence of physical operations Bob uses to implement U.
• Independence of function-If Alice wishes to realizes P½V, she does not need to modify her algorithm. She just needs to find a server that implements V instead of U via interface I .
We note also that while in many practical scenarios, client and server would be spatially separated, this need not be the case. An examples of local APIs in the classical setting are software packages, where certain functions can be invoked as subroutines without needing to know their details.
Client error calibration. Here, we illustrate details of how Alice can calibrate her device to account for experimental noise in her set-up. Specifically, the expected output state of the circuit immediately prior to the measurement is Measurement in Pauli-X basis then yields the desired expectation value of hσ 1 i ideal ¼ jTðUÞj 2 . By the central limit theorem, she can thus estimate jTðUÞj 2 to any specified accuracy ϵ by repeating the procedure Oð1=ϵ 2 Þ times. In our actual experiment, Alice's device is not ideal. The dominant noise occurs during the implementation of CSWAP gate, caused by fluctuations in the magnetic field, trap frequencies, polarization, and intensity of the Raman lasers. This introduces decoherence, such that Alice obtains in place of ρ ideal , where 0 1 À λ 1 benchmarks the level of effective decoherence. Subsequent Pauli-X yields expectation values hσ 1 i exp ¼ λjTðUÞj 2 .
Alice can estimate the value of λ by effectively running modular DQC1 using U ¼ I, without making use of a third party service. Once λ is determined, Alice can mitigate the effects of noise by setting her estimation to be jTðUÞj 2 est ¼ hσ 1 i exp =λ, enabling an estimation of accuracy ϵ with Oðλ À2 ϵ À2 Þ server calls. Therefore, our modular DQC1 algorithm is resilient to experimentally dominant sources of noise on the part of client.
We note that in this entire procedure, Alice's actions does not depend on which unitary Bob implements, or how he chooses to implement this unitary. Thus, the noise-corrections do not affect the modular nature of the algorithm. Benchmarking results for modular DQC1 with 19 different server supplied unitary operations U k;χ ¼ expðÀiχσ k =2Þ, where χ ranges over f0; π=6; π=3; π=2; 2π=3; 5π=6; πg and σ k ¼ σ 1 ; σ 2 ; σ 3 ranges over all three standard Pauli directions. a displays resulting experimental estimations of TðU k;χ Þ (blue bars), as compared to theoretic predictions (black lines). The disparity is due to decoherence. b Calibrating to account for this decoherence enables agreement between theory and experiment. Error bars in both a and b means a confidential interval with 95% confidence

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.