Implementing efficient selective quantum process tomography of superconducting quantum gates on IBM quantum experience

The experimental implementation of selective quantum process tomography (SQPT) involves computing individual elements of the process matrix with the help of a special set of states called quantum 2-design states. However, the number of experimental settings required to prepare input states from quantum 2-design states to selectively and precisely compute a desired element of the process matrix is still high, and hence constructing the corresponding unitary operations in the lab is a daunting task. In order to reduce the experimental complexity, we mathematically reformulated the standard SQPT problem, which we term the modified SQPT (MSQPT) method. We designed the generalized quantum circuit to prepare the required set of input states and formulated an efficient measurement strategy aimed at minimizing the experimental cost of SQPT. We experimentally demonstrated the MSQPT protocol on the IBM QX2 cloud quantum processor and selectively characterized various two- and three-qubit quantum gates.

In the quest to build a real quantum computer, several difficulties need to be overcome, which include pure state initialization, implementing high fidelity quantum operations, performing efficient and noise-free measurements and protecting the quantum state against decoherence. Quantum state tomography (QST) 1 and quantum process tomography (QPT) 2 are standard tools that are extensively used for the characterization and benchmarking of quantum information processing devices and protocols.
Resource requirements for standard QST and QPT methods grow exponentially with increasing system size, and hence several novel methods have been designed that focus on simplifying and reducing experimental complexity such as maximum likelihood estimation 3 , adaptive quantum tomography 4 , self-guided tomography 5 , ancilla-assisted tomography 6 , compressed sensing tomography 7,8 , and least square optimization based tomography 9,10 . These novel tomography protocols have been experimentally demonstrated on various physical configurations such as NMR 11,12 , linear-optics 13 , NV-centers 14 , ion-trap based quantum processors 15 , photonic qubits 16 , and superconducting qubits [17][18][19][20] . It has been shown that sequential weak value measurement can be used to perform direct QPT of a qubit channel 21,22 . A unitary 2-design and a twirling QPT protocol have been used to certify a seven-qubit entangling gate on an NMR quantum processor 23 .
In recent years, researchers across the globe have been engaged in building quantum systems of larger register size, termed noisy intermediate-scale quantum (NISQ) processors, such as the IBM quantum processor based on superconducting technology with 32 qubits, and NMR, ion-trap based quantum computers and linear optical photonic quantum processors which have achieved register sizes of 12, 10 and 14 qubits, respectively [24][25][26] . In some cases on such NISQ devices, instead of the complete characterization of the large-scale quantum process, one is only interested in a specific part. The method used is termed selective quantum process tomography (SQPT) 27 , which allows us to perform partial process tomography. Specifically, SQPT allows us to estimate single, selected elements of the process matrix, and provides the desired partial information about the system dynamics. The first implementation of SQPT was reported using optics 28 , which involved the preparation of special quantum states called quantum 2-design states. Further developments in SQPT include the generalization of the SQPT protocol for arbitrary dimensions 29 and an efficient protocol using an NMR quantum processor 30 . However, the experimental complexity involved in performing SQPT is still high, and better strategies to implement SQPT need to be designed.
In this work we demonstrate a modified SQPT (MSQPT) protocol on a five-qubit IBM QX2 quantum information processor and use it to characterize several two- and three-qubit superconducting quantum gates. We propose a general quantum circuit for initial input state preparation to efficiently implement the MSQPT protocol. We implement an efficient measurement framework wherein detection is performed on only a single qubit. Our experimental results show that one can use the modified SQPT protocol to efficiently and selectively characterize the desired quantum process.
We demonstrate that the MSQPT results can be further refined to construct the underlying true quantum process by solving a constrained convex optimization problem.

Preliminaries
Standard selective quantum process tomography. A quantum process denoted by the superoperator $\Lambda$ can be described using the Kraus operator representation 31 :

$$\Lambda(\rho) = \sum_{m,n} \chi_{mn}\, E_m \rho E_n^{\dagger}, \tag{1}$$

with $\{E_i\}$ being a fixed set of basis operators, and $\rho$ being the quantum state evolving under $\Lambda$. The matrix $\chi$ with elements $\chi_{mn}$ characterizes the given quantum process $\Lambda$. Estimating the complete matrix $\chi$ is referred to as performing QPT of $\Lambda$. Full QPT is achieved by preparing a complete set of linearly independent quantum states $\{\rho_i\}$ and then letting them evolve under the quantum process under consideration 32 . However, sometimes it suffices to estimate specific elements of the $\chi$ matrix, a procedure referred to as selective QPT (SQPT), with an experimental complexity which is lower than that of the full QPT protocol 27 .
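A specific element $\chi_{mn}$ of the process matrix can be determined by computing 'average survival probabilities' $F_{mn}$ as 28 :

$$F_{mn} \equiv \frac{1}{K}\sum_{j=1}^{K} \langle\phi_j|\,\Lambda\!\left(E_m^{\dagger}|\phi_j\rangle\langle\phi_j|E_n\right)|\phi_j\rangle = \frac{D\chi_{mn} + \delta_{mn}}{D+1}, \tag{2}$$

where $\{|\phi_j\rangle\}$ are a set of quantum 2-design states 30 , $K$ is their cardinality and $D$ is the dimension of the Hilbert space.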
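As an illustrative numerical check (not part of the original experiment), the relation $F_{mn} = (D\chi_{mn} + \delta_{mn})/(D+1)$ can be verified for a single qubit. The sketch below assumes the six eigenstates of the Pauli operators as the 2-design ($K = 6$, $D = 2$) and, purely as an example, takes the process to be a Hadamard gate. The helper names `survival_probability` and `chi_element` are hypothetical.

```python
import math

# Single-qubit Pauli basis E_0..E_3 (I, X, Y, Z) as nested lists.
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[c][r]).conjugate() for c in range(n)] for r in range(n)]

s = 1 / math.sqrt(2)
# Eigenstates of sigma_z, sigma_x and sigma_y: three mutually unbiased bases.
DESIGN = [[1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s]]

# Example process (an assumption for illustration): the Hadamard gate.
U = [[s, s], [s, -s]]

def process(op):
    """Lambda(op) = U op U^dagger, extended linearly to any operator."""
    return matmul(matmul(U, op), dagger(U))

def survival_probability(m, n):
    """F_mn of Eq. (2), averaged over the 2-design states."""
    total = 0
    for phi in DESIGN:
        proj = [[phi[r] * complex(phi[c]).conjugate() for c in range(2)]
                for r in range(2)]
        out = process(matmul(matmul(dagger(PAULIS[m]), proj), PAULIS[n]))
        total += sum(complex(phi[r]).conjugate() * out[r][c] * phi[c]
                     for r in range(2) for c in range(2))
    return total / len(DESIGN)

def chi_element(m, n, dim=2):
    """Invert F_mn = (D chi_mn + delta_mn) / (D + 1) for chi_mn."""
    delta = 1 if m == n else 0
    return ((dim + 1) * survival_probability(m, n) - delta) / dim

# H = (X + Z)/sqrt(2), so chi is 1/2 on the {X, Z} block and 0 elsewhere.
chi_xz = chi_element(1, 3)
chi_xx = chi_element(1, 1)
chi_ii = chi_element(0, 0)
```

Here the non-physical operator $E_m^{\dagger}|\phi_j\rangle\langle\phi_j|E_n$ is handled by linearity on a classical computer; in an experiment it cannot be prepared directly, which is the difficulty discussed next.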
From Eq. (2), it can be seen that in order to compute $F_{mn}$, one has to prepare the system in the state $E_m^{\dagger}|\phi_j\rangle\langle\phi_j|E_n$, let it pass through the given quantum process and then calculate the overlap with the original state $|\phi_j\rangle\langle\phi_j|$ for all quantum 2-design states. However, the operator $E_m^{\dagger}|\phi_j\rangle\langle\phi_j|E_n$ is in general not a valid density operator. Previous implementations 28 have proposed an alternative way to resolve this issue: instead of the operator $E_m^{\dagger}|\phi_j\rangle\langle\phi_j|E_n$, the quantum system is prepared in the state $(E_m \pm E_n)^{\dagger}|\phi_j\rangle\langle\phi_j|(E_m \pm E_n)$ (which has to be divided by its trace for normalization) and passed through the given quantum channel, the overlap with the original state is measured, the modified average fidelities $F^{\pm}_{mn}$ are computed, and finally the real part of $F_{mn}$ is determined from them. This approach is not experimentally viable, as the number of experiments is quadrupled: to estimate a single element of the $\chi$ matrix we need to construct a large number of unitary operations corresponding to $(E_m + E_n)^{\dagger}|\phi_j\rangle\langle\phi_j|(E_m + E_n)$ and $(E_m - E_n)^{\dagger}|\phi_j\rangle\langle\phi_j|(E_m - E_n)$ for the real part, and $(E_m + iE_n)^{\dagger}|\phi_j\rangle\langle\phi_j|(E_m + iE_n)$ and $(E_m - iE_n)^{\dagger}|\phi_j\rangle\langle\phi_j|(E_m - iE_n)$ for the imaginary part of the process matrix, for all $|\phi_j\rangle$. For different values of $m$ and $n$, we would again need to experimentally construct different sets of unitary operations, which is a hard task to perform. Although the SQPT protocol is computationally less resource-intensive than the standard QPT method, the number of experimental settings required to prepare the input states for computing a selected element of the process matrix is still quite high.

Protocol for modified selective quantum process tomography.
We propose a generalization of the SQPT method, namely the MSQPT protocol, which considerably reduces the experimental complexity of computing a desired element of the process matrix with high precision. We have designed a more efficient way of performing SQPT on an IBM quantum processor. We rewrite the operators $E_m^{\dagger}|\phi_j\rangle\langle\phi_j|E_n$ and $\Phi_j = |\phi_j\rangle\langle\phi_j|$ in Eq. (2) in terms of the fixed basis operators $\{E_i\}$ as $E_m^{\dagger}\Phi_j E_n = \sum_i {}^{j}c^{mn}_{i} E_i$ and $\Phi_j = \sum_k {}^{j}e_k E_k$ (with ${}^{j}e_k \in \mathbb{R}$), which leads to the compact form:

$$F_{mn} = \frac{1}{K}\sum_{j}\sum_{i,k} {}^{j}\beta^{mn}_{ki}\, \bar{E}^{i}_{k}, \qquad \bar{E}^{i}_{k} = \mathrm{Tr}\left(E_k \Lambda(E_i)\right), \tag{4}$$

where the complex scalar quantities ${}^{j}\beta^{mn}_{ki} = {}^{j}e_k\, {}^{j}c^{mn}_{i}$ can be computed analytically and do not depend upon the quantum process. It turns out that if we choose Pauli matrices as basis operators, then for given values of $m$ and $n$, the tensor ${}^{j}\beta^{mn}_{ki}$ is sparse and most of its values are zero. We hence only need to compute $\bar{E}^{i}_{k}$ for those values of $i$ and $k$ for which ${}^{j}\beta^{mn}_{ki} \neq 0$. The sparsity of ${}^{j}\beta^{mn}_{ki}$ is directly connected to the experimental complexity in terms of the number of coefficients $\bar{E}^{i}_{k}$ that need to be estimated. The question now arises about the estimation of the coefficients $\bar{E}^{i}_{k}$. Given a set of operators $E_i$ (n-qubit Pauli operators), one can associate a well-defined (positive and unit trace) density operator $\rho_i$ with each of them as follows:

$$\rho_i = \frac{I + E_i}{D}. \tag{5}$$

It is easy to see that

$$\bar{E}^{i}_{k} = \mathrm{Tr}\left(E_k \Lambda(E_i)\right) = D\,\mathrm{Tr}\left(E_k \Lambda(\rho_i)\right) - \mathrm{Tr}(E_k). \tag{6}$$

Equation (6) hinges on the fact that the identity operator does not evolve under the process $\Lambda$. This provides us with a way to experimentally estimate the desired coefficients $\bar{E}^{i}_{k}$: we need to prepare the system in the states $\rho_i$, let it evolve under the process $\Lambda$ and then measure $E_k$.
We note here that the identity operator is not preserved under the action of non-unital maps. In such cases, in order to perform modified SQPT of a given non-unital map, one needs to prepare the system in the state corresponding to the identity operator $E_0$ as well, which is the maximally mixed state denoted by $\rho_0$. Hence, for non-unital maps, Eq. (6) is modified as:

$$\bar{E}^{i}_{k} = D\,\mathrm{Tr}\left(E_k \Lambda(\rho_i)\right) - \bar{E}^{0}_{k}, \qquad \bar{E}^{0}_{k} = \mathrm{Tr}\left(E_k \Lambda(E_0)\right) = D\,\mathrm{Tr}\left(E_k \Lambda(\rho_0)\right), \tag{7}$$

where $\bar{E}^{0}_{k}$ can be experimentally computed by preparing the system in the $\rho_0$ state, passing it through the given non-unital quantum channel and then measuring the observables $E_k$. Once $\bar{E}^{0}_{k}$ is determined, the other desired coefficients $\bar{E}^{i}_{k}$ can be experimentally determined using Eq. (7). The efficiency of the MSQPT protocol is based on the fact that the total number of input states required to calculate the average survival probabilities (Eq. 2) is much smaller than in the SQPT method, as a single unitary operator is applied simultaneously on all system qubits to prepare the input state. Furthermore, only a single detection is required at a time, which reduces the number of readouts required to determine a specific element of the process matrix, further reducing the experimental complexity of the protocol.
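The role of the identity contribution can be illustrated numerically. The following is a minimal sketch, assuming a single-qubit amplitude-damping channel as the non-unital map (the channel and the damping strength `GAMMA` are illustrative choices, not taken from the experiment). It compares the coefficient $\mathrm{Tr}(E_k\Lambda(E_i))$ computed directly with the values obtained from the states $\rho_i$ with and without the $\rho_0$ correction.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
PAULIS = [I2, X, Y, Z]
D = 2

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def dagger(a):
    n = len(a)
    return [[complex(a[c][r]).conjugate() for c in range(n)] for r in range(n)]

def trace(a):
    return sum(a[r][r] for r in range(len(a)))

def lincomb(ca, a, cb, b):
    """Return ca * a + cb * b for square matrices a and b."""
    n = len(a)
    return [[ca * a[r][c] + cb * b[r][c] for c in range(n)] for r in range(n)]

GAMMA = 0.3  # hypothetical damping strength
K0 = [[1, 0], [0, math.sqrt(1 - GAMMA)]]
K1 = [[0, math.sqrt(GAMMA)], [0, 0]]

def channel(op):
    """Amplitude damping, sum_a K_a op K_a^dagger; it does not preserve I."""
    t0 = matmul(matmul(K0, op), dagger(K0))
    t1 = matmul(matmul(K1, op), dagger(K1))
    return lincomb(1, t0, 1, t1)

def rho(i):
    """rho_0 = I/D (maximally mixed); rho_i = (I + E_i)/D for i > 0."""
    if i == 0:
        return lincomb(1 / D, I2, 0, I2)
    return lincomb(1 / D, I2, 1 / D, PAULIS[i])

def expval(k, state):
    """Tr(E_k Lambda(state)), the quantity a measurement of E_k estimates."""
    return trace(matmul(PAULIS[k], channel(state)))

def coeff_direct(i, k):
    return trace(matmul(PAULIS[k], channel(PAULIS[i])))

def coeff_unital(i, k):
    """Valid only if Lambda(I) = I."""
    return D * expval(k, rho(i)) - trace(PAULIS[k])

def coeff_nonunital(i, k):
    """Subtracts the measured identity contribution D Tr(E_k Lambda(rho_0))."""
    return D * expval(k, rho(i)) - D * expval(k, rho(0))

err_unital = abs(coeff_unital(1, 3) - coeff_direct(1, 3))
err_nonunital = abs(coeff_nonunital(1, 3) - coeff_direct(1, 3))
```

For this channel the uncorrected estimate of the $(i, k) = (\sigma_x, \sigma_z)$ coefficient is off by $\mathrm{Tr}(\sigma_z \Lambda(I)) = 2\gamma$, while the corrected estimate agrees with the direct value.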
The quantum circuit to implement the n-qubit MSQPT protocol is given in Fig. 1. The symbol '/' through the input wire represents a multiqubit quantum register. The first quantum register contains a single qubit while the second and third quantum registers comprise $n-1$ qubits each. The first and the second quantum registers collectively represent the system qubits, denoted by $|0\rangle_s$, while the third quantum register represents the ancilla qubits, denoted by $|0\rangle_a$. The first block prepares the desired pure input state $|\Psi_i\rangle$, where $H^{\otimes(n-1)}$ is applied on the second register followed by $n-1$ CNOT gates, with the controls being in the second quantum register and the targets in the third quantum register. The unitary gate $R_i$ is then applied on the system qubits, where the columns of the unitary operation $R_i$ are the normalized eigenvectors of the density matrix $\rho_i$. For non-unital quantum channels, in order to prepare the n-qubit system in the maximally mixed state $\rho_0$, one needs an extra ancillary qubit as compared to preparing the other $\rho_i$ ($i > 0$). One has to prepare the joint system (main system qubits + ancilla qubits) in the state:

$$|\Psi_0\rangle = \frac{1}{\sqrt{2^n}}\sum_{i=1}^{2^n} |e_i\rangle |e'_i\rangle,$$

where $|e_i\rangle$ and $|e'_i\rangle$ are computational basis vectors of the system qubits and the ancilla qubits, respectively.
It is to be noted that $|\Psi_0\rangle$ has a Schmidt rank greater than 1, which demonstrates that the state is entangled. For example, for a two-qubit system, the state of the combined system (system + ancilla) is:

$$|\Psi_0\rangle = \frac{1}{2}\left(|00\rangle_s|00\rangle_a + |01\rangle_s|01\rangle_a + |10\rangle_s|10\rangle_a + |11\rangle_s|11\rangle_a\right).$$

For an n-qubit system, the state $\rho_0$ corresponding to the identity operator can be prepared by applying $n$ Hadamard gates on the $n$ system qubits and then applying $n$ CNOT gates with the system qubits being the controls and the ancilla qubits being the targets. The unitary gate $R_0$ is an n-qubit identity operation. The rest of the protocol and the quantum circuit given in Fig. 1 remain unaltered. The second block represents the unknown quantum process which is to be characterized and the last block represents the measurement settings to compute the expectation values of the desired observables. Note that in the third block, after the appropriate quantum mapping, only a single detection is performed at a time, to measure a desired observable.

Figure 1. The quantum circuit to acquire data to perform an n-qubit MSQPT. The symbol '/' through the input wire represents a multiqubit quantum register. The first and the second quantum registers collectively represent the system qubits (denoted by $|0\rangle_s$), and the third quantum register represents the ancilla qubits (denoted by $|0\rangle_a$). The first block prepares the desired pure input state $|\Psi_i\rangle$. The unitary gate $R_i$ is then applied on the system qubits. The second block represents the unknown quantum process $\Lambda$ which is to be applied to the system qubits and the last block represents the measurement settings to compute the expectation value of the desired observable.
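In order to represent a valid quantum map, the $\chi$ matrix should satisfy the following conditions 33 :

$$\chi = \chi^{\dagger}, \qquad \chi \geq 0, \qquad \sum_{m,n}\chi_{mn} E_n^{\dagger} E_m = I.$$

Using the MSQPT method, the $\chi$ matrix is Hermitian by construction; however, there is no guarantee that it will satisfy the last two conditions. One can use the constrained convex optimization (CCO) technique 10 to obtain a valid $\chi_{cco}$ matrix from $\chi_{msqpt}$ as follows:

$$\min_{\chi_{cco}} \ \left\|\chi_{cco} - \chi_{msqpt}\right\|_{l_2} \tag{9a}$$

$$\text{subject to} \quad \chi_{cco} \geq 0, \tag{9b}$$

$$\sum_{m,n}\chi^{cco}_{mn} E_n^{\dagger} E_m = I, \tag{9c}$$

where $\chi_{msqpt}$ is the experimentally obtained process matrix using the MSQPT protocol and $\chi_{cco}$ is the variable process matrix which represents the underlying true quantum process.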
State preparation and unitary operator construction. We note here that for an n-qubit system, all density operators ρ i in Eq. (5) represent mixed states (except for n = 1 ). We hence require ancillary qubits to experimentally prepare the quantum system in the state ρ i .
It turns out that for an n-qubit system, all non-zero eigenvalues of the operator $\rho_i$ in Eq. (5) are the same and are equal to $1/2^{n-1}$. Let $\{|u^i_1\rangle, |u^i_2\rangle, |u^i_3\rangle, \ldots, |u^i_{2^{n-1}}\rangle\}$ represent the complete set of normalized eigenvectors of the operator $\rho_i$ corresponding to its non-zero eigenvalues. The state of the combined system (system + ancilla) we need to prepare is given by:

$$|\Psi_i\rangle = \frac{|u^i_1\rangle|a_1\rangle + |u^i_2\rangle|a_2\rangle + \cdots + |u^i_{2^{n-1}}\rangle|a_{2^{n-1}}\rangle}{\sqrt{2^{n-1}}}, \tag{10}$$

where $|a_i\rangle$ are the basis states of the ancilla qubits. Note that in general $|\Psi_i\rangle$ represents an entangled state. After tracing over the ancillary qubits, the system will be in the desired state $\rho_i$.
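The eigenvalue statement can be checked without an eigensolver. For $\rho_i = (I + E_i)/D$ with $E_i^2 = I$, one has $\rho_i^2 = (2/D)\rho_i$, so every eigenvalue is either 0 or $2/D = 1/2^{n-1}$, and unit trace then fixes the rank to $D/2$. The sketch below verifies this for the 15 two-qubit Pauli operators; the ordering of `PAULIS` (I, X, Y, Z on each qubit, counted from zero) is an assumption, chosen so that index 13 is $\sigma_z \otimes \sigma_x$ as in the text.

```python
def kron(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    nb = len(b)
    n = len(a) * nb
    return [[a[r // nb][c // nb] * b[r % nb][c % nb] for c in range(n)]
            for r in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
SINGLE = [I2, X, Y, Z]

# Two-qubit Pauli operators E_0..E_15, with E_0 the identity.
PAULIS = [kron(p, q) for p in SINGLE for q in SINGLE]
D = 4

def rho(i):
    """rho_i = (I + E_i) / D for i > 0."""
    return [[(PAULIS[0][r][c] + PAULIS[i][r][c]) / D for c in range(D)]
            for r in range(D)]

def has_flat_half_rank_spectrum(i, tol=1e-12):
    """True if Tr(rho_i) = 1 and rho_i^2 = (2/D) rho_i."""
    r = rho(i)
    unit_trace = abs(sum(r[d][d] for d in range(D)) - 1) < tol
    sq = matmul(r, r)
    scaled = all(abs(sq[a][b] - (2 / D) * r[a][b]) < tol
                 for a in range(D) for b in range(D))
    return unit_trace and scaled

checks = [has_flat_half_rank_spectrum(i) for i in range(1, 16)]
```

Because each $\rho_i$ has rank $2^{n-1}$, its purification needs $n-1$ ancilla qubits, which matches the register sizes used in Fig. 1.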
The unitary operator $U_i$, such that $U_i|0\rangle_{sys}|0\rangle_{ancilla} = |\Psi_i\rangle$, can be constructed as follows:
1. Apply a Hadamard gate on each of $n-1$ system qubits; these qubits will then be in an equal superposition of $2^{n-1}$ computational basis states, while the ancilla qubits will be in the state $|0\rangle_{ancilla}$.
2. Apply CNOT gates with the system qubits being the controls and the ancilla qubits being the targets. We hence have $|0\rangle_{ancilla} \rightarrow |a_1\rangle$, $|0\rangle_{ancilla} \rightarrow |a_2\rangle$, and so on.
3. Map the computational basis states of the system qubits to the eigenvectors of $\rho_i$ using the unitary gate $R_i$, where the columns of $R_i$ are the normalized eigenvectors of $\rho_i$ (Eq. 5). Note that the column position of the eigenvectors depends on which computational basis vector we want to map onto which eigenvector. The combined system (system + ancilla qubits) will be in the $|\Psi_i\rangle$ state.
4. Repeat steps 1-3 to prepare the other states $\rho_i$.
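The steps above can be sketched with a small state-vector simulation for $n = 2$. As an illustrative case (an assumption, not one of the circuits reported in the paper), take $E_i = \sigma_z \otimes \sigma_z$, for which $\rho_i = (I + \sigma_z\otimes\sigma_z)/4$ has the non-zero-eigenvalue eigenvectors $|00\rangle$ and $|11\rangle$, so that $R_i$ reduces to a CNOT between the two system qubits. Qubits 0 and 1 are the system and qubit 2 is the ancilla; the helper names are hypothetical.

```python
import math

def apply_h(state, qubit, n):
    """Hadamard on `qubit` (0 = leftmost) of an n-qubit state vector."""
    shift = n - 1 - qubit
    s = 1 / math.sqrt(2)
    new = [0.0] * len(state)
    for idx, amp in enumerate(state):
        sign = -1 if (idx >> shift) & 1 else 1
        new[idx] += sign * s * amp
        new[idx ^ (1 << shift)] += s * amp
    return new

def apply_cnot(state, control, target, n):
    """CNOT: flip `target` on basis states where `control` is 1."""
    c_shift, t_shift = n - 1 - control, n - 1 - target
    new = [0.0] * len(state)
    for idx, amp in enumerate(state):
        out = idx ^ (1 << t_shift) if (idx >> c_shift) & 1 else idx
        new[out] += amp
    return new

def trace_out_ancilla(state, n_sys, n_anc):
    """Reduced system density matrix (amplitudes are real in this example)."""
    dim, anc = 2 ** n_sys, 2 ** n_anc
    return [[sum(state[r * anc + a] * state[c * anc + a] for a in range(anc))
             for c in range(dim)] for r in range(dim)]

N = 3
state = [0.0] * (2 ** N)
state[0] = 1.0                       # |0 0>_sys |0>_ancilla

state = apply_h(state, 1, N)         # step 1: Hadamard on n - 1 = 1 system qubit
state = apply_cnot(state, 1, 2, N)   # step 2: system qubit controls the ancilla
state = apply_cnot(state, 1, 0, N)   # step 3: R_i maps |00> -> |00>, |01> -> |11>

rho_sys = trace_out_ancilla(state, 2, 1)

# Target: (I + Z x Z) / 4 = diag(1/2, 0, 0, 1/2).
target = [[0.5, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0.5]]
```

For a general $E_i$ the gate $R_i$ is not a single CNOT and has to be decomposed into the native gate set, which is the harder part of the construction.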

Results and discussion
The IBM quantum processor is based on superconducting qubits and is freely available through the cloud [34][35][36] , and has been used to demonstrate various quantum protocols 37,38 . More details about the architecture of the IBM QX2 processor and the topology of superconducting qubits are given in 39 and information about the form of the Hamiltonian and important relaxation parameters can be found in 40,41 . We use the five-qubit IBM QX2 processor to demonstrate the MSQPT protocol described in the previous section. The system is prepared in an input state corresponding to all qubits being in the $|0\rangle$ state. After the gate implementation, projective measurements are performed in the Pauli $\sigma_z$ basis and the quantum circuit is implemented multiple times to compute the Born probabilities. The IBM quantum architecture requires a pure quantum state as an input state and only allows the implementation of unitary operations. We hence utilize ancillary qubits to prepare the system in a mixed state and to simulate non-unitary evolution.
We implement the MSQPT protocol for two- and three-qubit gates and construct the corresponding full $\chi$ matrices element-wise. In all the cases considered, we use the experimentally constructed $\chi_{msqpt}$, solve the CCO problem (Eq. 9a) and obtain $\chi_{cco}$, which represents the underlying true quantum process. The fidelity of the experimentally implemented quantum gates is computed using the measure of ref. 14 (Eq. 11). To validate our circuits, we also theoretically simulate the MSQPT protocol on the IBM processor and obtain $\chi_{sim}$. The fidelity of the simulated quantum gates is computed by using a similar measure as given in Eq. (11).

MSQPT of two-qubit quantum gates.
For two qubits, we need to prepare 15 input (mixed) states $\rho_i$ (Eq. 5) corresponding to all the Pauli operators $E_i$. For all $\rho_i$, it turns out that out of four eigenvalues, only two are non-zero ($\lambda_1 = \lambda_2 = 1/2$). Let $|v^i_1\rangle$ and $|v^i_2\rangle$ represent the normalized eigenvectors of the operator $\rho_i$ corresponding to $\lambda_1$ and $\lambda_2$, respectively. To perform MSQPT of two qubits on the IBM computer, we use one ancillary qubit and prepare three-qubit input (pure) states of the form:

$$|\psi_i\rangle = \frac{1}{\sqrt{2}}\left(|v^i_1\rangle|0\rangle + |v^i_2\rangle|1\rangle\right).$$

There are 15 such three-qubit pure input states $|\psi_i\rangle$, one for each $E_i$. As an illustration, the IBM quantum circuit for implementing MSQPT of a two-qubit SWAP gate, corresponding to the quantum state $|\psi_6\rangle$ and the observable $E_{13} = \sigma_z \otimes \sigma_x$, is given in Fig. 2. The system qubits are denoted by q[0] and q[1] (the first and second qubit, respectively) while the ancilla qubit is denoted by q[2]. To prepare the system in the pure state $|\psi_6\rangle$, the unitary operation $U_6 = S_2 \cdot \mathrm{CNOT}_{12} \cdot \mathrm{CNOT}_{23} \cdot H_2 \cdot H_1$ is applied on the initial state $|000\rangle$ in the first block. In the second block, the quantum process ($\Lambda_{system} \otimes I_{ancilla}$) corresponding to a two-qubit SWAP gate is implemented on the system qubits. In the last block, the quantum map corresponding to the unitary operation $U_{13} = \mathrm{CNOT}_{12} \cdot R_y(-\frac{\pi}{2})$ is used to transform the output state and determine $\langle E_{13}\rangle = \langle\sigma_z \otimes \sigma_x\rangle$ by measuring the second qubit in the $\sigma_z$ basis 30 . The quantity corresponding to $\mathrm{Tr}(\sigma_z \otimes \sigma_x\, \Lambda(\rho_6))$ is experimentally computed, which is equivalent to $\mathrm{Tr}(\sigma_{2z}\, U_{13}\Lambda(\rho_6)U_{13}^{\dagger})$. Using Eq. (5), we then obtain the coefficient $\bar{E}^{6}_{13}$ from this measured quantity. One can thus efficiently compute all the $\bar{E}^{i}_{k}$ (Eq. 4) and estimate the corresponding average survival probabilities $F_{mn}$. The list of all unitary operations $U_i$ corresponding to all quantum maps which transform output states in order to determine $\bar{E}_k$ by detecting either of the system qubits in the $\sigma_z$ basis (i.e. by measuring either $\sigma_{1z}$ or $\sigma_{2z}$) is given in 30 .

Figure 2. (a) The IBM quantum circuit to perform MSQPT of a two-qubit SWAP gate. The first block prepares the three-qubit input state $|\psi_6\rangle$. The quantum process corresponding to the two-qubit SWAP gate is applied in the second block and in the last block, the quantum map $U_{13} = \mathrm{CNOT}_{12} \cdot R_y(-\frac{\pi}{2})$ is applied to compute $\mathrm{Tr}(\sigma_z \otimes \sigma_x\, \Lambda(\rho_6))$ by detecting the second qubit in the $\sigma_z$ basis. (b) Probability histogram, with bars of 45.020% and 54.980% over the outcomes $|00000\rangle$ and $|00010\rangle$.
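The measurement map in this example can be checked numerically. The sketch below assumes that $R_y(-\pi/2)$ acts on the second system qubit and is applied before $\mathrm{CNOT}_{12}$ (the text does not state the target qubit explicitly), and verifies that $U_{13}^{\dagger}(I \otimes \sigma_z)U_{13} = \sigma_z \otimes \sigma_x$, so that reading out $\sigma_{2z}$ after $U_{13}$ yields $\langle\sigma_z \otimes \sigma_x\rangle$. It then converts the two histogram probabilities of Fig. 2(b) into an expectation value, assuming that 45.020% belongs to the outcome with the measured qubit in 0 and 54.980% to the outcome with it in 1.

```python
import math

def kron(a, b):
    nb = len(b)
    n = len(a) * nb
    return [[a[r // nb][c // nb] * b[r % nb][c % nb] for c in range(n)]
            for r in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def dagger(a):
    n = len(a)
    return [[complex(a[c][r]).conjugate() for c in range(n)] for r in range(n)]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

theta = -math.pi / 2
# R_y(theta) = exp(-i theta sigma_y / 2)
RY = [[math.cos(theta / 2), -math.sin(theta / 2)],
      [math.sin(theta / 2), math.cos(theta / 2)]]

# CNOT with qubit 1 as control and qubit 2 as target, basis |q1 q2>.
CNOT12 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# Assumed ordering: R_y on qubit 2 first, then CNOT_12.
U13 = matmul(CNOT12, kron(I2, RY))

mapped = matmul(matmul(dagger(U13), kron(I2, Z)), U13)
target = kron(Z, X)
max_dev = max(abs(mapped[r][c] - target[r][c])
              for r in range(4) for c in range(4))

# Histogram of Fig. 2(b); the outcome assignment is an assumption (see above).
p0, p1 = 0.45020, 0.54980
expval = p0 - p1      # estimate of Tr(sigma_z x sigma_x Lambda(rho_6))

# For a unital process and a traceless E_k, Eq. (5) gives a factor D = 4.
coefficient = 4 * expval
```

Under these assumptions the measured expectation value is about $-0.10$; for an ideal SWAP gate acting on a state built from $\sigma_x \otimes \sigma_y$ (index 6 in the I, X, Y, Z ordering) the corresponding coefficient would vanish, so a small non-zero value is consistent with experimental noise.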
The 16 × 16 grid matrix plots in Fig. 3a represent the $\chi$ matrix corresponding to the two-qubit SWAP gate, where the position of a specific grid square represents the corresponding element of the $\chi$ matrix, while its color represents its value. For instance, the first yellow square in the matrix plot in Fig. 3a denotes the element $\chi_{11} = 0.25$ of the theoretically constructed process matrix $\chi_{the}$. Only 16 yellow squares have non-zero values in the theoretically constructed matrix plot for the SWAP gate. The second and third columns represent matrix plots corresponding to $\chi_{msqpt}$ and $\chi_{cco}$, respectively, obtained by implementing the MSQPT protocol on the IBM QX2 processor. The differences between the theoretically computed and experimentally obtained matrix plots reflect errors due to decoherence as well as statistical and systematic errors in preparing the initial input state. The color grids in the matrix plots in Fig. 3 in the third column (CCO experimental) show a smaller deviation than the matrix plots in the second column (MSQPT experimental). This improved fidelity implies that one can use the MSQPT data to solve the CCO problem and reconstruct the full process matrix more accurately. The experimental fidelity of $\chi_{msqpt}$ for the SWAP gate (Fig. 3a) turned out to be 0.799, while the improved fidelity of $\chi_{cco}$ turned out to be 0.929. We also computed the process matrices for the two-qubit CNOT gate and the corresponding matrix plots are shown in Fig. 3b. The experimental fidelity of $\chi_{msqpt}$ for the CNOT gate turned out to be 0.828, while the improved fidelity of $\chi_{cco}$ turned out to be 0.953. We obtained fidelities of $F(\chi_{sim}) \geq 0.99$ for all the quantum gates, which ensures that all the quantum circuits are correct. The fidelity values of $F(\chi_{cco}) \geq 0.9$ show that one can retrieve the full dynamics of the quantum process with considerably high precision by solving the optimization problem (Eq. 9a) using the experimentally constructed full $\chi_{msqpt}$ matrix.
The standard QPT protocol is based on the linear inversion method and requires the preparation of 15 linearly independent input states and further requires state tomography of each output state. Hence the total number of readouts to determine a specific element of the two-qubit process matrix with high precision, using the standard QPT protocol, is 15 × 15 = 225. The SQPT protocol uses quantum 2-design states as initial input states and further requires a quantum operation to prepare the system in the desired state. Determining the real and imaginary parts of $F_{mn}$ requires a total of 80 state preparations. Further, to estimate the overlap with the original state $|\phi_j\rangle\langle\phi_j|$ (Eq. 2), three readouts need to be performed (as there are three non-zero coefficients in the decomposition of $\Phi_j$). Hence the total number of readouts to determine a specific element of the two-qubit process matrix with high precision, using the SQPT protocol, is 80 × 3 = 240. In the MSQPT protocol, the average survival probabilities can be computed quite efficiently as the total number of states that need to be prepared is only 15, there are only 12 readouts per mutually unbiased basis (MUB) set, and there are 5 MUB sets which form a complete set of quantum 2-design states. Hence the total number of readouts to determine a specific element of the two-qubit process matrix with high precision, using the MSQPT protocol, is only 12 × 5 = 60. The experimental complexity and number of ancilla qubits required to determine a specific element of the process matrix with high precision for a two-qubit system, using the MSQPT method, are compared with the standard QPT and SQPT methods in Table 1.

Figure 3. Matrix plots corresponding to the real and imaginary parts of the (a) $\chi$ matrix for the SWAP gate and (b) $\chi$ matrix for the CNOT gate. The first column represents the theoretically constructed process matrix $\chi_{the}$, while the second and third columns represent $\chi_{msqpt}$ and $\chi_{cco}$, respectively. The top row represents the real part of the process matrix while the bottom row represents the imaginary part of the process matrix. The matrix plots were generated using MATLAB 42 .

MSQPT of three-qubit quantum gates.
To perform MSQPT on a three-qubit system, we need to prepare 63 input (mixed) states $\rho_i$ corresponding to all the three-qubit Pauli operators $E_i$. It turns out that for all $\rho_i$, out of 8 eigenvalues only 4 are non-zero and are equal to 1/4. Let $|u^i_1\rangle$, $|u^i_2\rangle$, $|u^i_3\rangle$ and $|u^i_4\rangle$ be the 4 eigenvectors of $\rho_i$ with non-zero eigenvalues. In order to prepare the system in any of the $\rho_i$ states, we first need to prepare a five-qubit pure state:

$$|\Psi_i\rangle = \frac{1}{2}\left(|u^i_1\rangle|00\rangle + |u^i_2\rangle|01\rangle + |u^i_3\rangle|10\rangle + |u^i_4\rangle|11\rangle\right).$$

After tracing out the two ancilla qubits, the three system qubits are in the state $\rho_i$, i.e., $\mathrm{Tr}_{ancilla}(|\Psi_i\rangle\langle\Psi_i|) = \rho_i$. The list of all five-qubit pure input states $\{|\Psi_i\rangle\}$ is given in the Supplementary Information. Preparation of the five-qubit input state requires finding the correct decomposition of the unitary operator $R_i$ (Fig. 1) in terms of CNOT gates and single-qubit rotations. We note here that finding the decomposition of a general unitary operation $R_i$ is not an easy task. The complexity of the implementation of a given unitary operation primarily depends on the limitations of the quantum hardware being used. The IBM processor that we have used allows only a limited number of quantum gates to be implemented directly. Thus the implementation of a general unitary operation on the IBM processor involves its efficient decomposition into a sequence of the available quantum gates, followed by its implementation. Particularly in the context of MSQPT, the construction of the unitary operators $R_i$ is system-specific, and finding a general algorithm to experimentally implement $R_i$ on a given physical system is a research direction that requires more effort. There are several techniques available to decompose a given unitary operation into a universal set of quantum gates [44][45][46][47] . In this study we have used the Mathematica package UniversalQCompiler 46,47 as an optimization tool to prepare the input state $|\Psi_i\rangle$ from the initial state $|00000\rangle$.
The quantum circuit to perform MSQPT of a three-qubit Toffoli gate is given in Fig. 4, corresponding to the five-qubit pure input state $|\Psi_{50}\rangle$ and the observable $E_{15} = I \otimes \sigma_z \otimes \sigma_y$. The system qubits are denoted by q[0], q[1] and q[2], while the ancilla qubits are denoted by q[3] and q[4], respectively. The first block in Fig. 4 prepares the five-qubit pure input state $|\Psi_{50}\rangle$, the second block represents the action of the Toffoli gate on the system qubits and the last block represents the action of the quantum map corresponding to the unitary operation $U_{15} = \mathrm{CNOT}_{23}$. A measurement is made on the third qubit in the $\sigma_z$ basis to compute the quantity $\mathrm{Tr}(\sigma_{3z}\, U_{15}\Lambda(\rho_{50})U_{15}^{\dagger})$, from which the coefficient $\bar{E}^{50}_{15}$ is obtained. All the $\mathrm{Tr}(E_k \Lambda(E_i)) = \bar{E}^{i}_{k}$ corresponding to the desired average survival probability $F_{mn}$ can be computed in a similar fashion. The list of all unitary operations $U_i$ corresponding to all quantum maps for the three-qubit system can be found in 48 . The experimentally obtained 64 × 64 dimensional $\chi$ matrix corresponding to the three-qubit Toffoli gate is depicted in Fig. 5 as a bar plot, where the first and second columns represent the real and imaginary parts of the $\chi$ matrix, respectively. The first row denotes the theoretically constructed process matrix $\chi_{the}$, while the second and third rows represent the experimentally constructed process matrices $\chi_{msqpt}$ and $\chi_{cco}$, respectively. The experimental gate fidelity for $\chi_{msqpt}$ turns out to be 0.589, while the much improved experimental gate fidelity obtained for the case of $\chi_{cco}$ turns out to be 0.946. To ensure the correctness of the circuits, we also simulated all the circuits on the IBM simulator, with a simulation fidelity of 0.98.
The total number of readouts to determine a specific element of the three-qubit process matrix with high precision, using the standard QPT protocol, is 63 × 63 = 3969. For three qubits, the cardinality of the set of quantum 2-design states is 72 (9 MUB sets each having a cardinality of 8). Determining the real and imaginary parts of $F_{mn}$ for three qubits requires a total of 288 state preparations using the SQPT protocol. Further, to estimate the overlap with the original state $\Phi_j$, 7 readouts need to be performed. Hence the total number of readouts required to determine a specific element of the three-qubit process matrix with high precision, using the SQPT method, is 288 × 7 = 2016. For the MSQPT method, the total number of states we need to prepare is 63 (corresponding to the complete set of basis operators) and the total number of readouts required is 504 (for each MUB set we need to perform 56 readouts, so the total number of readouts is 9 × 56 = 504). This makes the MSQPT method considerably more efficient than the standard QPT and SQPT protocols. The experimental complexity and number of ancilla qubits required to determine a specific element of the process matrix with high precision for a three-qubit system, using the MSQPT method, are compared with the standard QPT and SQPT methods in Table 2.
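The readout counts quoted for two and three qubits follow a common pattern, which the short sketch below reproduces. The closed forms used here ($D + 1$ MUB sets of $D$ states each, four SQPT preparations per design state, $D - 1$ readouts per overlap, and $D(D-1)$ MSQPT readouts per MUB set) are inferred from the two cases in the text and should be read as an assumption for other register sizes, not as a result stated in the paper. The function name `readout_counts` is hypothetical.

```python
def readout_counts(n_qubits):
    """Readouts needed for one process-matrix element, per protocol."""
    d = 2 ** n_qubits
    n_mub = d + 1                      # number of MUB sets
    design_size = n_mub * d            # cardinality K of the 2-design
    sqpt_preparations = 4 * design_size
    return {
        "design_size": design_size,
        "qpt_readouts": (d * d - 1) ** 2,
        "sqpt_preparations": sqpt_preparations,
        "sqpt_readouts": sqpt_preparations * (d - 1),
        "msqpt_states": d * d - 1,
        "msqpt_readouts": n_mub * d * (d - 1),
    }

two_qubit = readout_counts(2)
three_qubit = readout_counts(3)

for label, counts in (("two-qubit", two_qubit), ("three-qubit", three_qubit)):
    print(label, counts["qpt_readouts"], counts["sqpt_readouts"],
          counts["msqpt_readouts"])
```

For both register sizes the MSQPT count is a quarter of the SQPT count, which reflects the fourfold overhead of preparing the $(E_m \pm E_n)$ and $(E_m \pm iE_n)$ states in SQPT.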

Conclusions
We proposed a quantum circuit to efficiently implement the MSQPT protocol, which reduces the experimental cost of performing standard SQPT. We implemented the MSQPT protocol on the IBM quantum processor. The system was prepared in mixed states corresponding to all Pauli operators, and the MSQPT protocol to perform element-wise process tomography of two- and three-qubit quantum gates was successfully implemented. Our experimental results indicate that MSQPT is substantially more efficient than SQPT and standard methods when estimating specific elements of the process matrix with high precision. We also showed that one can utilize the full process matrix obtained experimentally via MSQPT to solve the $l_2$-norm minimization problem and reconstruct the underlying true quantum process. The MSQPT method opens up several avenues for future applications such as finding an optimal set of basis operators, developing generalized algorithms to find all sets of quantum maps to perform efficient measurements, and finding efficient decompositions of unitaries using the set of available quantum gates for easy experimental implementation.

Figure 5. Tomographs corresponding to the three-qubit Toffoli gate, with the first and second columns representing the real and imaginary parts of the $\chi$ matrix, respectively. The first row represents the theoretically constructed $\chi$ matrix while the second and third rows represent the experimentally constructed $\chi$ matrix obtained by implementing the MSQPT and the CCO protocols, respectively. The tomographs were generated using Mathematica 43 .