Introduction

Private information retrieval (PIR), introduced by Chor et al.1, allows a user (say Alice) to retrieve a bit xi from a database held by a server (say Bob), without revealing the retrieval address i (user privacy). A symmetrically private information retrieval (SPIR)2 scheme is a PIR scheme satisfying an additional requirement named “database security”, that is, Alice should not get more information than xi from database. In recent years, many SPIR protocols have been proposed in classical cryptography. However, the security of most classical cryptosystems is based on the assumptions of computational complexity which might be vulnerable to quantum computation3,4. One may want to know whether this drawback can be overcome by quantum protocols as that in quantum key distribution (QKD)5.

In fact, since user privacy and database security appear to be conflicting, the task of SPIR cannot be realized ideally even in quantum cryptography6. More practically, the quantum scheme for SPIR problem, called quantum private query (QPQ)7, loosens the security into the following. (1) (Database security) Alice can elicit a few more bits than the ideal requirement (i.e., just 1 bit) from database and (2) (user privacy) user privacy is guarded in the sense of cheat-sensitivity (that is, Bob's attack will be discovered by Alice with a nonzero probability if he tries to obtain Alice's retrieval address) and it would be better if the probability for Bob to reveal the address Alice queries can be kept small meanwhile.

Most earlier QPQ protocols7,8,9 utilizing unitary operations, show great significance in theory, but are difficult to be implemented when a large database is concerned. In 2011, Jakobi et al.10 proposed a QPQ protocol (J-protocol) based on SARG04 QKD scheme11. As many QKD protocols have been realized experimentally, QKD-based quantum private query is more practical and hence has attracted a great deal of concern. In 2012, Gao et al.12 presented a flexible generalization of J-protocol. Afterwards, Panduranga Rao et al.13 also gave two modifications of J-protocol's postprocessing. Recently, Zhang et al.14 designed a QPQ protocol based on a novel counterfactual QKD protocol.

However, the queried message is generally supposed to be a single bit in the above practical QPQ protocols, which is not the fact in a real implementation. In fact, meaningful message is generally composed of multiple adjacent bits, hence Alice has to query many times from database to obtain the entire message in the bit-by-bit way. Here, we turn to a more realistic model called “QPQ of blocks” (QPQB), which allows the user to retrieve a multi-bit block (i.e., multiple adjacent bits) from database in one query. In our QPQB model, for the sake of simplicity, the database X is partitioned into entries (blocks) with the same length l. Concretely, and each entry Xk (1 ≤ kN) is an l-bit message. Here, N is the number of entries in database and k is the address of the entry Xk.

It is worth noting that, the idea of “QPQ of blocks” is natural but nontrivial, since the security of single-bit QPQ cannot be achieved as ideally as that of QKD with the composable security definition15,16 (e.g., Bob always has a nonzero probability to reveal Alice's retrieval address). Concretely, suppose the database stores blocks of information with the same length of 100 bits and the total number of blocks is 100, then there are 10, 000 bits information in total. If Alice wants the information of the 14th block which contains the bits from 1401st to 1500th, then she has to make 100 queries to obtain these bits in the single-bit QPQ scenario. While as we know, Bob always has a nonzero probability p (though it might be very small) to reveal the retrieval address in each bit query. Obviously, once Bob obtains the address of the queried bit in any one of the 100 queries, he can infer which block Alice is retrieving. That is, the probability with which user privacy keeps secret is only (1 − p)100 in this condition. Apparently, the security degrades very fast with the size of blocks, which is a significant problem for QPQ in real-world applications. Luckily in QPQ of blocks, Alice can obtain the entire block in one query and as to user privacy, it only needs to hide the address of the block instead of the addresses of its bits. Hence, similar to that pointed out by Chor et al.17, remarkable saving is possible by utilizing the block structure and the research on QPQB may be an interesting and worthwhile work.

To fulfill the task of QPQB, we first review the idea for realizing QKD-based QPQ of single bit. As we know, distributing an oblivious key is of vital importance to achieve it10. That is, Alice and Bob should share a raw key Kr in the way that (1) Bob knows Kr entirely, (2) Alice knows only part of its bits and (3) Bob does not know which bits are known to Alice. After some classical postprocessing on the raw key, Alice only knows roughly one bit in the final key Kf and Bob still does not know which bit is known to Alice. Then, the final key is used to encrypt database so that (1) Alice can subsequently recover the bit she queries from the encrypted database with her known bit in Kf and (2) both user privacy and database security are well protected.

Following the above idea, each l-bit entry in QPQB needs to be encrypted by an l-bit string (i.e., l adjacent bits) which should be (1) completely known or unknown to Alice and (2) completely known to Bob while he does not know whether it is known to Alice. Intuitively, we need to design a d( = 2l)-level oblivious QKD protocol in which transmitting one qudit can provide l adjacent bits satisfying the above two requirements. Naturally, we expect that it can be achieved by generalizing the SARG04 protocol11 on which J-protocol10 is based to its d-level version. However, it is scarcely possible. Concretely in the SARG04 protocol, the fact that (1) each key bit is encoded on the basis of the qubit (that is, |0〉 and |1〉 represent bit 0, while |+〉 and |−〉 represent bit 1) and (2) only two complementary bases can be exploited owing to its decipher method, makes it can only generate one bit in the raw key by transmitting one carrier state of any dimension. That is, SARG04 protocol which can be used to generate an oblivious key, cannot be generalized to the high-dimension version. Oppositely, as we know, BB84 protocol18 can be generalized to the high-dimension versions19,20,21, but they cannot be used to distribute oblivious key since they are vulnerable to the quantum memory attack10. Then, how to overcome this barrier?

In this paper, via an unbalanced-state technique, we design a new QKD scheme which is indeed an intermediate of BB84 and SARG04 protocols. It can not only be used to generate oblivious key, but also be generalized to its high-dimension version (detailed analysis is shown in Methods). On this basis, we propose a quantum protocol for QPQ of blocks, in which the database security is guarded by the impossibility of reliably distinguishing non-orthogonal states, while user privacy is protected by the fact that the states with identical support cannot be unambiguously discriminated. Moreover, our protocol is cheat-sensitive and loss-tolerant.

Results

Here, we give a quantum protocol for QPQ of blocks, which allows the user to retrieve an l-bit entry from database in one query. Our protocol is based on a variant of multi-level BB84 protocol in which the carrier states are transmitted with different probabilities.

Proposed protocol for QPQ of blocks

Let d = 2l, then and are two complementary orthogonal bases for d-level quantum system, where . The carrier states adopted in our protocol are chosen from the bases B1 and B2 and represents an l-bit string, i.e., the binary representation of j. A detailed description of the protocol is as follows:

(R1) Alice sends Bob a long sequence of qudits which are chosen from basis B1 or B2 and among them, each state in is prepared with probability , while each in is prepared with probability . Here, .

(R2) Bob measures each received qudit in basis B1 or B2 randomly.

(R3) Bob announces in which instances he has successfully detected the qudits; the ones which are not detected are discarded.

(R4) Bob chooses some positions randomly and requires Alice to announce the states of the transmitted qudits there. Then he discards his outputs which are obtained by measuring qudits in incompatible bases and compares the remaining ones with Alice's announcement. If the error rate is higher than a certain threshold value, or the proportions of the states do not coincide with the corresponding probabilities with which they should be prepared in step (R1), the protocol terminates.

(R5) Bob announces all measurement bases he chose in step (R2).

(R6) After dropping the checking qudits, Alice and Bob share an oblivious raw key Kr successfully. Concretely, each element in Kr is corresponding to one measurement result of Bob and hence is an l-bit string entirely known to Bob. Apparently, Alice would know half of the elements in Kr by checking the measurement bases announced by Bob. It is worth noting that the raw key is determined by the receiver Bob's measurement outputs rather than Alice's state preparation, which is quite different from previous protocols.

(R7) Enough qudits should be transmitted so that the number of elements in Kr equals to kN (k is a security parameter and we will discuss its value later). The raw key is cut into k substrings in the way that each substring has N elements. These substrings are added bitwise (see Fig. 1) to obtain the final key Kf and Alice's information on Kf is reduced to roughly one element after that. This process is similar to that in Ref. 10.

Figure 1
figure 1

(Gao): Bitwise adding (taking l = 3 for example) — an adequate classical postprocessing to reduce Alice's knowledge on the final key.

Clearly, Alice's information on the sum string is lower than that on the initial strings. Question marks symbolize Alice's unknown bits.

(R8) If Alice does not know any element in Kf finally, the protocol fails.

(R9) Suppose that Alice knows the mth element in Kf and wants the nth entry Xn in database, she announces the number s = mn. Then Bob encrypts the database by bitwise adding Kf, shifted by s elements and sends the encrypted database to Alice. Obviously, Xn is encrypted by and consequently can be correctly recovered by Alice.

Features of our protocol

Our protocol is somewhat like the high-dimension version of J-protocol, but some nontrivial modifications are necessary. On one hand, the oblivious raw key in J-protocol is determined by the qudit sender's state preparation, but in our protocol it is determined by the receiver Bob's measurement results (see step (R6)) and hence is entirely known to Bob. On the other hand, the raw key bits are encoded onto the states of the qudits in our protocol while they are encoded onto the bases of the qudits in J-protocol. For these reasons, our protocol can not only resist the quantum memory attack by Bob, but also distribute l adjacent bits in Kr by transmitting one qudit, which ensures the realization of “QPQ of blocks”.

Our protocol is loss-tolerant. Note that the qudits in are linearly dependent and cannot be unambiguously discriminated by Bob22,23. Furthermore, Alice never declares the correct measurement bases in our protocol. It means that Bob cannot make sure the state (or basis) of the qudit by any method. Therefore even in the shield of channel loss, the information Bob could obtain is inconclusive and it would be subsequently compressed in the bitwise-adding phase. Hence, Bob cannot cheat by lying in step (R3) (i.e., announcing that a qudit is lost when he gets an unwanted result) to obtain virtual benefit.

Following the protocol, Alice will know on average elements in Kf after step (R7). And P0, the probability that she does not know any element at all and the protocol fails, is . By choosing an appropriate value of k, we can ensure both and small P0 (see Table 1), which implies a successful execution of the protocol. For example, for a database with 105 entries, k = 15 is an appropriate choice which provides Alice with known elements in the final key on average, whereas the probability of failure is only about 4.7%. On the other hand, even if Alice knows elements in Kf, she can only obtain one chosen entry of the database, because the other entries known to her will be at random positions in the database.

Table 1 Possible choices of k for different database sizes N, as well as the failure probability P0 and expected number of entries an honest Alice would gain from database

Now, we study some general attacks and analyze the security of our protocol.

Database security

To elicit more entries from database, Alice has to know more elements (i.e., Bob's measurement outputs) in the raw key Kr. For this purpose, Alice generally prepares bipartite entangled states |Ψ〉AB, keeps systems A by herself and sends systems B to Bob in step (R1). Then after Bob announces the measurement bases, Alice infers Bob's measurement results by measuring corresponding systems A. Without loss of generality, we assume that|Ψ〉AB can be expressed as

where |j B1 and .

Let's discuss the conditions for Alice to pass Bob's checking. When being requested to declare the state of one qudit in step (R4), Alice first measure corresponding system A, i.e., discriminating or randomly. If the measurement result is |βj〉 (|γk〉), she announces to Bob. To give correct qudit state, system A need to be discriminated perfectly no matter which basis Bob chooses, that is, the following conditions must hold.

(i) 〈βjk〉 = δjk, for .

(ii) 〈γjk〉 = δjk, for .

Meanwhile, to satisfy the required proportions of the qudits, the following conditions must hold.

(iii) , .

(iv) , .

Since , equation (2) can be written as

If conditions (ii) and (iv) hold, by comparing equation (1) with equation (3), we have

It is clearly contradict with condition (iii). In other words, entangled state which satisfies the above four conditions simultaneously is nonexistent. To avoid being detected, at least two entangled states are needed. One satisfies conditions (i) and (iii) (corresponding to the situation that the carrier states are chosen from B1), the other satisfies conditions (ii) and (iv).

Therefore, Alice can prepare a long sequence of entangled states which are randomly in state

or

where 〈ϕjk〉 = δjk and sends systems B to Bob in step (R1) while keeping systems A by herself. To announce the state of one qudit correctly in step (R4), she first measures corresponding system A in basis . If the measurement result is |ϕj〉 and she prepares |Ψ1〉 (|Ψ2〉) in this position, she announces to Bob. Clearly, this kind of attack cannot be detected by Bob.

Now, we discuss the maximal information Alice could gain by this attack. Without loss of generality, we suppose that Alice prepares |Ψ2〉 in some position. Then, she can select different strategies to obtain Bob's measurement result after step (R5). If the basis Bob announced in step (R5) is B2 (which appears with probability ), Alice measures system A in basis and hence gets Bob's output completely (see equation(6)). If the basis announced by Bob is B1 (which also appears with probability ), since , we have

where . Hence, system A would randomly collapse to one of the linearly independent symmetric states when Bob measures system B in basis B1. Therefore, to infer Bob's measurement result, Alice need to make an unambiguous discrimination on the symmetric states 24 with the maximum average success probability being , i.e., 2α. Consequently, Alice can obtain at most entries from database by this means. When α is small, Alice's advantage decreases distinctly. Moreover, with the growth of database size N, the ratio which represents the percentage of the entries Alice would obtain from database, decreases rapidly (see Table 2). Take α = 0.1, N = 105 for example, dishonest user can get at most 40 entries which occupy only 0.05% of the total entries. It is very little relation to database security for such a complex attack.

Table 2 Alice's advantages for database of different sizes. Here, α = 0.1

Now, we consider a more general attack. For those positions where Alice prepares |Ψ1〉 (|Ψ2〉) while Bob's measurement basis is B2 (B1), Alice can postpone the measurement on corresponding systems A held by herself until the very end of the protocol, so that she can know which of them contribute to an element in the final key Kf. Then she can perform a joint measurement on them to guess the final added value in Kf in the way similar to that in Refs. 10, 12. The maximal success probability of Alice's joint unambiguous state discrimination (USD) measurement on m systems declines rapidly with the increase of m even in the simplest situation when d = 2 (see Fig. 2), which means a high security degree for the database security under this kind of attack.

Figure 2
figure 2

(Gao): For d = 2, the maximal success probability of Alice's joint unambiguous state discrimination (USD) on m systems declines rapidly with the increase of m.

User privacy

If Bob is dishonest and wants to reveal the address Alice is retrieving, he has to make clear the question whether the measurement basis announced by himself is coincide with the basis of the qudit (i.e., whether the corresponding element in Kr is conclusive in Alice's view) for each received qudit. Therefore, he has to devote himself to judging which basis the qudit is chosen from, i.e., discriminating two equally likely mixed states

and

ρ1 and ρ2 cannot be unambiguously discriminated because they have the same support22,23,25. However, the protocol is not perfectly concealing because ρ1 ≠ ρ2. Bob can make a minimal error discrimination (MED) on them, with the minimal error probability PE26 being

By simple computation, we find that qst, the element in the sth row and tth column of matrix ρ21, satisfies

for . To keep things straightforward, we depict the relationship between PE and α, l in Fig. 3. Obviously, the minimal error probability PE increases with the growth of l and α. Even in the most favorable situation to Bob where l = 1 and α is very close to zero, he would make a mistake in the MED measurement on each received qudit with a probability no less than 14.64%. Obviously, it is very difficult for Bob to get Alice's privacy after the bitwise adding phase in step (R7), thus assuring the user privacy in our protocol.

Figure 3
figure 3

(Gao): The influence of parameter α and block length l on PE.

Here, PE is the minimal error probability of Bob's minimal error discrimination on each qudit.

It is worth noting that Bob's attack would be discovered by Alice, because the qudit would be disturbed inevitably in the MED measurement and subsequently Bob could not always output correct value in Kr. Take d = 2 for example, the carrier states are chosen from {|0〉, |1〉, |+〉, |−〉}. Here, both |0〉 and |−〉 are prepared with probability , while both |1〉 and |+〉 are prepared with probability . Hence, ρ1 = α|0〉〈0| + (1 − α)|1〉〈1|, ρ2 = (1 − α)|+〉〈 + | + α|−〉〈−|. The minimal error probability PE is , which is larger than 14.64% for all and the MED measurement operators26 are Π1 = |ξ1〉〈ξ1|, Π2 = |ξ2〉〈ξ2|, where

Therefore, the minimal error discrimination of ρ1 and ρ2 is equivalent to measuring the received qubit in basis {|ξ1〉, |ξ2〉}. Without loss of generality, we suppose that the qubit sent by Alice is |0〉 and corresponding measurement basis announced by Bob is {|0〉, |1〉}, then Bob should output 0 in the generation of raw key to avoid being detected. However, since both |0〉 and |1〉 can collapse to |ξ1〉 or |ξ2〉 in the MED measurement (see equations (12,13)), Bob could not output correct result all the time after making the MED measurement on it. His attack would be discovered afterwards when offering false entry to Alice. It indicates that our protocol is also cheat-sensitive.

Discussion

Compared to QPQ of single bit, QPQ of blocks is not only a more realistic model for application but also a nontrivial idea in security. In this paper, based on a variant of high-dimension BB84 scheme, we propose a protocol to realize QPQ of blocks. Our protocol is cheat-sensitive and loss-tolerant. Besides, the security of our protocol is well protected and the advantages of both sides are strictly limited by α. Furthermore, parameter α can be changed to balance the advantage between user privacy and database security to satisfy different application requirements. Concretely, in the scenario where the user privacy is emphasized, α should be given a larger value; if the database security is more concerned, α should be assigned a smaller one. Moreover, in the situation where “fairness to both sides” is pursued, by making a trade-off between user privacy and database security, we can roughly estimate a proper value for α (see Supplementary information). From an experimental viewpoint, the d-dimension carrier state in our protocol can be prepared with current technology, e.g., a single photon distributed over d orthogonal modes as considered in Refs. 27, 28. Recently, some high-dimension BB84-like quantum key distribution protocol has been demonstrated29, which also provides fundamental assurance to the application of our protocol.

Methods

By using a special technique in BB84 protocol, i.e., transmitting the carrier states |0〉, |1〉,|+〉, |−〉 with different probabilities , , , respectively, we obtain an intermediate of BB84 and SARG04 protocols, which can be called unbalanced-state BB84 (US-BB84) protocol (see Table 3). Similar to BB84 protocol, the key bit in US-BB84 protocol is encoded on the state rather than the basis of the qubit. Obviously, the US-BB84 protocol can be generalized to its high-dimension version in the same way as BB84 does19,20,21. Now, we show that it can also be used to distribute an oblivious key as follows:

  • (S1) Alice sends Bob a long sequence of qubits, in which both |0〉 and |−〉 are prepared with probability, while both |1〉 and |+〉 are prepared with probability. Here the parameter . |0〉 and |+〉 represent bit 0, while |1〉 and |−〉 represent 1.

  • (S2) Bob measures the received qubits in basis {|0〉, |1〉} or {|+〉, |−〉} randomly.

  • (S3) Bob randomly chooses some positions and requires Alice to announce the states of the transmitted qubits there. Then he discards his outputs which are obtained by measuring the qubits in incompatible bases and compares the remaining ones with Alice's announcement. If the error rate is higher than a certain threshold value, or the proportions p(|0〉), p(|1〉), p(|+〉), p(|−〉) do not coincide with the probabilities, , , , the protocol terminates. Here p(|0〉), p(|1〉), p(|+〉) and p(|−〉) represent the proportions of the states |0〉, |1〉, |+〉, |−〉 in Bob's remaining outputs.

  • (S4) Bob announces all measurement bases he chose.

  • (S5) After dropping the checking qubits, Alice and Bob successfully share an oblivious key Kr, which is composed of Bob's measurement outputs and hence is entirely known to Bob. Obviously, Alice would know half of the bits in Kr by checking the measurement bases announced by Bob.

Table 3 Comparison of BB84, SARG04 and US-BB84 protocols. The last two columns show that only the high-dimension US-BB84 protocol can be used to realize QPQ of blocks

As shown above, Bob knows the oblivious key entirely, but he cannot reliably infer which bits are known to Alice, because the carrier states are linearly dependent and cannot be unambiguously discriminated. On the other hand, we now show that, owing to the checking of the proportions of carrier states in step (S3), Alice could not obtain the whole key even using entanglement-measurement attack. Generally, Alice can prepare bipartite entangled states in the following forms

and sends systems B to Bob in step (S1). When being requested to declare the state of one qubit in step (S3), Alice first measures system A, i.e., discriminating the states or randomly. Then if the measurement result is |ϕ0〉 (|ϕ1〉), she announces |0〉 (|1〉) to Bob; if the measurement result is |γ0〉 (|γ1〉), she announces |+〉 (|−〉) to Bob. Note that in step (S3), Bob's measurement result would be thrown away if he measures the qubit in basis {|0〉, |1〉} ({|+〉, |−〉}) while Alice announces the state |+〉 or |−〉 (|0〉 or |1〉).

Take α = 0.1 for example, to pass Bob's checking in step (S3), Alice has to ensure that (1) she can always declare the states of transmitted qubits correctly, which means that 〈ϕ01〉 = 0 and 〈γ01〉 = 0 and (2) after Bob discards the results which are obtained by measuring qubits in incompatible bases, the states |0〉, |1〉, |+〉, |−〉 should occupy 10%, 40%, 40%, 10% of the remaining ones respectively. Therefore, |Ψ〉 must have the following forms

By simple computation, we can find that equations (16) and (17) cannot be satisfied simultaneously. That is, there is no such an entangled state that Alice could pass Bob's checking in step (S3).

However, Alice can prepare a long sequence of entangled states randomly in

or

and sends systems B to Bob in step (S1). When being asked to announce the state of one qubit in step (S3), if Alice prepared |Ψ〉1 there, she measures system A in basis {|ϕ0〉, |ϕ1〉}, then announces |0〉 (|1〉) to Bob when the measurement output is |ϕ0〉 (|ϕ1〉). In this case, Bob would discard his measurement result if he measures this qubit in basis {|+〉, |−〉}. The situation is similar when Alice prepares |Ψ〉2. Obviously, Alice can pass Bob's checking in this way.

Then, how many bits can be gained by Alice via this attack? Without loss of generality, we assume that Alice prepares |Ψ〉1 and sends system B to Bob. If the measurement basis anouced by Bob in step (S4) is {|0〉, |1〉} (which occurs with probability ), Alice can undoubtedly obtain this key bit by measuring system A in basis (see equation (18)); if the announced basis is {|+〉, |−〉} (which also occurs with probability ), note that |Ψ〉1 can also be written as

Alice can infer the key bit by unambiguously discriminating the non-orthogonal states and with maximal average success probability being 0.2. That is, Alice can obtain 60% (a little larger than 50% an honest Alice could obtain) of the key bits at most. In fact, Alice could not obtain the whole key as long as .