Quantum private set intersection cardinality based on Bloom filter

Private set intersection cardinality (PSI-CA) enables multiple parties to privately compute the cardinality of the intersection of their sets without disclosing their own information. It is equivalent to a secure, distributed database query and has many practical applications in privacy preservation and data sharing. In this paper, we propose a novel quantum private set intersection cardinality protocol based on the Bloom filter, which can resist quantum attacks. It is a completely new constructive protocol for computing the intersection cardinality by using a Bloom filter. The protocol uses single photons, so it only needs simple single-photon operations and tests, and is therefore more likely to be realized with present technologies. The validity of the protocol is verified by comparison with other protocols. The protocol achieves privacy protection without increasing the computational and communication complexities, which are independent of the data scale. Therefore, the protocol has good prospects for big data, privacy protection and information sharing, for example in tracing patient contacts for COVID-19.

multi-party summation 29, quantum digital signature 30, identity-based quantum signature 31 and quantum private query protocols 32,33. Some quantum protocols for PSI cardinality have also been proposed recently 34,35. However, more practical and efficient PSI cardinality protocols are needed for real-world applications.
Contributions: In this paper, we propose a novel quantum private set intersection cardinality protocol based on the Bloom filter, which can resist quantum attacks. It is a completely new constructive protocol for computing the intersection cardinality by using a Bloom filter. First, the elements of the two data sets are filtered with a Bloom filter and then transmitted by using the BB84 protocol. Finally, the intersection cardinality of the private sets is calculated. The protocol uses single photons, so it only needs simple single-photon operations and tests, and is therefore more likely to be realized with present technologies. Compared with other protocols, it achieves privacy preservation without increasing the computational and communication complexities, which are independent of the data scale. Therefore, the protocol has good prospects for big data, privacy protection and information sharing, for example in tracing patient contacts for COVID-19. In short, we present a practical and feasible quantum private set intersection cardinality protocol, which can privately compute the intersection cardinality.
The paper is organized as follows. The second section gives the basic knowledge about the BB84 protocol and the Bloom filter that is used in the protocol. We present the novel quantum private set intersection cardinality protocol based on the Bloom filter in the "Quantum private set intersection cardinality" section. The security and correctness analyses are given in the "Performance" section. Finally, the "Conclusion" section concludes the paper.

Preliminaries
BB84 protocol. The BB84 protocol 36 encodes information in four polarization states of photons. Let us label these four polarization states as {↔, ↕, ↗, ↖}. In the two-dimensional Hilbert space, X = {↔, ↕} and Z = {↗, ↖} form two different orthogonal bases.
Based on the uncertainty principle, the basis X can distinguish the ↔ and ↕ states, and the basis Z can distinguish the ↗ and ↖ states.
The BB84 protocol consists of the following four steps.
(1) Coding and transmission. The sender, Alice, randomly selects a basis from X and Z and encodes the information in it. Alice then records the basis that she has selected.
(2) Reception and test. The receiver, Bob, randomly selects a basis from X and Z and tests the state he receives in it. Bob then records the basis.
(3) Comparison and selection. Bob tells Alice the bases he has chosen, and Alice responds with the positions where they selected the same basis. They then discard the bits at the other positions. In this way they share a key, which is called the raw key.
(4) Test for eavesdropping. Alice and Bob randomly select some bits of the raw key and compare them over the classical channel. If there are error bits, the key is not secure and an eavesdropper is present.
The probability that Alice and Bob select the same basis is 1/2, so the efficiency is 50%. If an eavesdropper tests a state in a randomly chosen basis, he selects the correct basis with probability 1/2. If he selects the incorrect basis, he alters the state, and Bob, even when he chooses the correct basis, then gets an incorrect bit with probability 1/2. So each time the eavesdropper tests a photon, he causes a wrong bit with probability 1/4. When Alice and Bob compare n bits to test whether there is an eavesdropper, the eavesdropper is detected with probability $1 - (3/4)^n$.
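The 1/4 error rate and the detection probability can be checked with a short simulation. The following is a minimal Python sketch under a simplified classical model of an intercept-and-resend eavesdropper; the function names and the model itself are illustrative assumptions, not part of the protocol description.

```python
import random


def detection_probability(n: int) -> float:
    """Probability that comparing n raw-key bits reveals the eavesdropper."""
    return 1.0 - (3.0 / 4.0) ** n


def simulate_error_rate(trials: int, seed: int = 0) -> float:
    """Estimate the error rate on sifted bits under intercept-and-resend.

    Simplified model (assumption): measuring in the wrong basis yields a
    uniformly random bit and re-prepares the photon in that wrong basis.
    """
    rng = random.Random(seed)
    errors = 0
    sifted = 0
    for _ in range(trials):
        bit = rng.randint(0, 1)
        alice_basis = rng.choice("XZ")
        eve_basis = rng.choice("XZ")
        bob_basis = rng.choice("XZ")
        # Eve reads the bit correctly only if she guessed Alice's basis.
        eve_bit = bit if eve_basis == alice_basis else rng.randint(0, 1)
        # Bob measures the photon that Eve re-sent in her own basis.
        bob_bit = eve_bit if bob_basis == eve_basis else rng.randint(0, 1)
        # Only positions where Alice and Bob used the same basis are kept.
        if bob_basis == alice_basis:
            sifted += 1
            errors += int(bob_bit != bit)
    return errors / sifted


if __name__ == "__main__":
    print(simulate_error_rate(200_000))   # close to 0.25
    print(detection_probability(10))      # close to 0.9437
```

With n = 10 compared bits the eavesdropper already escapes detection with probability below 6%.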
Bloom filter. The Bloom filter 37 is a space- and time-efficient method for testing whether an element is in a set. An initial Bloom filter b consists of m bits, all initialized to 0, and has k hash functions $h_i$ $(0 \le i < k)$. Here the k hash functions can be obtained from random oracles, and $b_j$ $(0 \le j < m)$ is the j-th bit of the Bloom filter b. The Bloom filter has two kinds of operations, Add(x) and Test(x). Add(x) adds the element x to the set. Test(x) tests whether the element x is in the set.
Create(m): All m bits $b_j$ $(0 \le j < m)$ are set to 0.
Add(x): Hash the element x with the k hash functions $h_i$ and set the corresponding k bits $g_i$ to 1.
Test(x): Hash the element x with all k hash functions $h_i$ and check the corresponding k bits $g_i$; if all of them are set, the test function returns 1 (true).
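The three operations can be sketched in Python as follows. This is an illustrative sketch only: the class name and the use of SHA-256 with an index prefix as a stand-in for the k random-oracle hash functions $h_i$ are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions."""

    def __init__(self, m: int, k: int):
        # Create(m): all m bits start at 0.
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, x):
        # Assumed stand-in for h_0, ..., h_{k-1}: SHA-256 of "i:x" reduced mod m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, x) -> None:
        # Add(x): set the k bits selected by the hash functions to 1.
        for j in self._positions(x):
            self.bits[j] = 1

    def test(self, x) -> bool:
        # Test(x): true only if all k selected bits are 1 (may be a false positive).
        return all(self.bits[j] == 1 for j in self._positions(x))


if __name__ == "__main__":
    bf = BloomFilter(m=64, k=3)
    bf.add(5)
    print(bf.test(5))  # True: an added element is never reported as absent
```

An added element always tests true; an element that was never added can also test true if other insertions happen to set all of its k bits.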
However, because hash functions can collide, an element is not guaranteed to be in the set even when the element's bits $b_i$ are all 1. So the Bloom filter has a certain false positive probability, namely the false positive rate; i.e., Test(x) may return true although x has not been added to the set. The more data is added into the Bloom filter, the higher the false positive rate becomes.
Here, we introduce a third party (Charlie) to assist the clients Alice and Bob in calculating the intersection cardinality of their private input sets, and then propose a novel QPSI protocol based on the Bloom filter with the help of Charlie. Charlie may be dishonest but never colludes with the other parties.

Quantum private set intersection cardinality
System model. Based on quantum key distribution, the BB84 protocol and the Bloom filter, we propose a novel QPSI protocol to calculate the intersection cardinality of the private input sets. We assume that the system model has a third party (Charlie) and two clients (Alice and Bob), and that the sets A and B are the private sets of Alice and Bob. The elements of A and B lie in $\mathbb{Z}_N$, where $\mathbb{Z}_N = \{0, 1, 2, \ldots, N-1\}$ and $N = 2^n$ (i.e., $n = \log N$). Moreover, assume that $\sum_{i=1}^{n} n_{c_i} < \frac{N}{2}$, and that N and $n_{c_i}$ are public. In the protocol, we suppose that all the clients and the third party are semi-honest: they are curious about the privacy of others, but honestly carry out the operations of the scheme. The system model is shown in Fig. 1.

Protocol.
The protocol consists of thirteen steps, as follows.
Step 1. Alice initializes the Bloom filter: she generates the Bloom filter(N) and k hash functions.
Step 2. By running the BB84 QKD protocol, Alice shares the k hash functions $h_i$ and N with Bob.
Step 3. Alice and Bob use the k hash functions $h_i$ to hash their private sets A and B into the corresponding private vectors $(x_0, x_1, \ldots, x_{N-1})$ and $(y_0, y_1, \ldots, y_{N-1})$, respectively.
Alice generates the private vector $(x_0, x_1, \ldots, x_{N-1}) \in \mathbb{F}_2^N$ from her private set A, where each element of the set determines one component of the vector. Similarly, Bob generates the private vector $(y_0, y_1, \ldots, y_{N-1}) \in \mathbb{F}_2^N$ from his private set B.
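As an illustration of Step 3, the sketch below maps two small sets to N-bit vectors with shared hash functions and counts, in the clear, the positions where both vectors hold a 1. These are the positions at which the quantum part of the protocol later increments Charlie's counter t. The hash construction, the helper names and the example sets are hypothetical; in the protocol itself neither vector is ever revealed.

```python
import hashlib


def positions(element: int, n_bits: int, k: int) -> list:
    """Hypothetical stand-in for the k shared hash functions h_i."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{element}".encode()).digest(), "big") % n_bits
        for i in range(k)
    ]


def to_vector(private_set, n_bits: int, k: int) -> list:
    """Hash a private set into an n_bits-long 0/1 vector (a Bloom filter)."""
    vector = [0] * n_bits
    for element in private_set:
        for j in positions(element, n_bits, k):
            vector[j] = 1
    return vector


def both_one_count(x, y) -> int:
    """Number of positions i with x_i = y_i = 1."""
    return sum(xi & yi for xi, yi in zip(x, y))


if __name__ == "__main__":
    x = to_vector({1, 2, 3}, n_bits=32, k=2)   # Alice's vector
    y = to_vector({2, 3, 4}, n_bits=32, k=2)   # Bob's vector
    print(both_one_count(x, y))
```

Because the common elements 2 and 3 set the same positions in both vectors, the count is at least 1 in this example; hash collisions may add further shared positions.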
Step 4. Charlie chooses N groups of single-photon sequences, each group consisting of m single photons chosen randomly from the four states $\{|0'\rangle, |1'\rangle, |+'\rangle, |-'\rangle\}$. The sequences are denoted by $\{s^1_1, s^1_2, \ldots, s^1_m; \ldots; s^N_1, s^N_2, \ldots, s^N_m\}$. In addition, Charlie records the initial states of the N groups of photon sequences that he has chosen.
Step 5. Charlie then chooses $m^*$ $(m^* \le m)$ additional photons, each in one of the four states $\{|0\rangle, |1\rangle, |+\rangle, |-\rangle\}$, and inserts them randomly into each group of the single-photon sequences. We call these photons puppet photons; they prevent attacks from a participant (such as Bob). For example, a group becomes $\{s^i_1, s^{*i}_1, s^i_2, s^{*i}_2, \ldots, s^i_m, s^{*i}_m\}$, where the $s^{*i}_j$ are the puppet photons. Correspondingly, we use $S^*$ to denote the sequence of all $(m + m^*)N$ photons, which includes $m^*N$ puppet photons and $mN$ signal photons. Charlie records the positions where the puppet photons have been inserted.
Step 6. Charlie chooses q decoy photons randomly from the four states $\{|0\rangle, |1\rangle, |+\rangle, |-\rangle\}$. While the photon sequence is transmitted, these decoy photons are used to check whether there is an eavesdropper. Charlie randomly inserts the q decoy photons into the sequence $S^*$ and calls the new sequence $S^*_C$. Charlie then records the positions and states of the q decoy photons, so only Charlie knows their initial states and positions. Finally, Charlie sends the new sequence $S^*_C$, which includes signal photons, puppet photons and decoy photons, to Alice in order through the quantum channel.
Step 7. When Alice receives the sequence $S^*_C$ from Charlie, she asks Charlie to announce the positions of the q decoy photons in $S^*_C$ and the corresponding test bases. Alice then tests the decoy photons in the right bases and publishes the test results. Charlie compares the initial states of the decoy photons that he has recorded with Alice's test results. Finally, the error rate is compared with a threshold value that is determined in advance from the channel noise: if the error rate is higher than the threshold, the protocol is aborted; otherwise, it proceeds to the next step.
Step 8. Alice deletes the decoy photons from $S^*_C$ and obtains the photon sequence $S^*$, which includes N groups, each with $(m + m^*)$ photons. Alice then performs a unitary operation on the signal photons and the puppet photons, i.e., on $s^i_j$ $(j = 1, 2, \ldots, m)$ and $s^{*i}_j$ $(j = 1, 2, \ldots, m^*)$, according to the following strategy: if $x_{i-1} = 0$, Alice performs the local unitary operation $I$ on each signal photon $s^i_j$ and puppet photon $s^{*i}_j$; if $x_{i-1} = 1$, Alice performs the local unitary operation $\sigma_x$ on each signal photon $s^i_j$ and puppet photon $s^{*i}_j$.
Step 9. Alice then chooses q decoy photons randomly from the four states $\{|0\rangle, |1\rangle, |+\rangle, |-\rangle\}$ to guard against eavesdropping. Similarly, Alice randomly inserts the q decoy photons into the sequence $S^*$, and we call the new sequence $S^*_A$. Alice records the positions and states of the decoy photons. Finally, Alice sends the new photon sequence $S^*_A$ to Bob in order through the quantum channel.
Step 10. Analogously, when Bob receives the photon sequence $S^*_A$ from Alice, he asks Alice to announce the positions of the q decoy photons in $S^*_A$ and the corresponding test bases. Bob then tests the decoy photons in the right bases and publishes the test results. Alice compares the initial states of the q decoy photons that she has recorded with Bob's test results. The error rate is compared with a threshold value that is determined in advance from the channel noise: if the error rate is higher than the threshold, the protocol is aborted; otherwise, it proceeds to the next step.
Step 11. Bob deletes the q decoy photons from $S^*_A$ and gets $S^*$, which includes N groups, each with $(m + m^*)$ photons. Bob performs a unitary operation in the same way as Alice on the $(m + m^*)$ photons: $s^i_j$ for $j = 1, 2, \ldots, m$ and $s^{*i}_j$ for $j = 1, 2, \ldots, m^*$.
Here, all m signal photons in each of the N groups are initially chosen in the state $|0'\rangle = \cos\theta|0\rangle + \sin\theta|1\rangle$. Table 1 lists all possible cases of Charlie's test. For instance, if $x_i = 1$, Alice performs the unitary operator $\sigma_x$ on the group of m signal photons, so the state of each signal photon in this group changes to $\sin\theta|0\rangle + \cos\theta|1\rangle$. Likewise, if $y_i = 1$, Bob performs the unitary operator $\sigma_z$ on the group of m signal photons, and each signal photon ends up in the state $\sin\theta|0\rangle - \cos\theta|1\rangle$. Therefore, the final test result for this group of signal photons must be $|1'\rangle$, i.e., the final state is orthogonal to the initial state $|0'\rangle$, and the counter is updated as $t = t + 1$. In the other three cases (see Table 1), Charlie gets the final state $|0'\rangle$ with probabilities 1, $(\cos^2\theta - \sin^2\theta)^2$ and $4\cos^2\theta\sin^2\theta$.
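The four cases of Charlie's test on one signal photon can be checked numerically. The following Python sketch applies $\sigma_x$ (when $x_i = 1$) and then $\sigma_z$ (when $y_i = 1$) to $|0'\rangle$ and returns the probability that a test in the basis $\{|0'\rangle, |1'\rangle\}$ yields $|0'\rangle$. The function names are illustrative, and real amplitudes are assumed throughout.

```python
import math


def apply(op: str, state):
    """Apply I, sigma_x ("X") or sigma_z ("Z") to a real qubit state (a, b)."""
    a, b = state
    if op == "I":
        return (a, b)
    if op == "X":
        return (b, a)      # sigma_x swaps the amplitudes of |0> and |1>
    if op == "Z":
        return (a, -b)     # sigma_z flips the sign of the |1> amplitude
    raise ValueError(f"unknown operator: {op}")


def prob_initial_state(theta: float, x_i: int, y_i: int) -> float:
    """Probability that Charlie's test returns |0'> for the bits x_i and y_i."""
    initial = (math.cos(theta), math.sin(theta))       # |0'>
    state = apply("X" if x_i else "I", initial)        # Alice's operation
    state = apply("Z" if y_i else "I", state)          # Bob's operation
    overlap = initial[0] * state[0] + initial[1] * state[1]
    return overlap ** 2


if __name__ == "__main__":
    theta = math.pi / 8
    for x_i in (0, 1):
        for y_i in (0, 1):
            print(x_i, y_i, round(prob_initial_state(theta, x_i, y_i), 6))
```

For any θ the case $x_i = y_i = 1$ gives probability 0, which is why a group in which no signal photon returns $|0'\rangle$ is counted in t.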
For the first row of Table 1, the probability that Charlie's test result is identical to the initial state is 100%, so in this group Charlie needs only one test on any signal photon and the counter t is not incremented. For the second and third rows of Table 1, the best choice in our protocol is $\theta = \frac{\pi}{8}$, so that $(\cos^2\theta - \sin^2\theta)^2 = \cos^2 2\theta = \frac{1}{2}$ and $4\cos^2\theta\sin^2\theta = \sin^2 2\theta = \frac{1}{2}$. This means that the probability of Charlie getting the state $|0'\rangle$ is $\frac{1}{2}$ in both the second and the third rows. Moreover, in such a group the probability that none of the m test results is identical to the initial state is $\frac{1}{2^m}$. Hence, if $x_i = 0$ or $y_i = 0$, Charlie finds at least one test result in this group identical to the initial state with probability $1 - \frac{1}{2^m}$. This means that the error probability (i.e., "$x_i = 0$ or $y_i = 0$" is judged as "$x_i = 1$ and $y_i = 1$") is $\frac{1}{2^m}$. Therefore, if there are r errors, the error probability is $\left(\frac{1}{2^m}\right)^r$.
Security. We now analyze the security. The protocol is implemented with the help of Charlie (TP), who may be dishonest but never colludes with any other party 34. First, we consider attacks by Charlie (TP). To obtain part or all of the private information of Alice or Bob, a dishonest Charlie may initially use entangled photon pairs (e.g., EPR pairs) in place of the initial single photons. Charlie then keeps one photon of each entangled pair and sends the other to Alice or Bob. When Alice or Bob receives the entangled photon, they perform their private operations ($I$, $\sigma_x$ or $\sigma_z$) on it, and Charlie tries to find out which operation Alice or Bob performed while the photon was in their hands. In fact, no matter what operation Alice or Bob has performed, the reduced density matrix of the subsystem that Charlie holds does not change.
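The invariance of Charlie's reduced density matrix can be verified directly. The Python sketch below assumes, purely for illustration, the entangled state $\frac{1}{\sqrt{3}}|00\rangle + \sqrt{\frac{2}{3}}|11\rangle$ with real amplitudes; the state and the helper names are assumptions, and any other entangled pair could be substituted.

```python
import math

IDENTITY = [[1, 0], [0, 1]]
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Z = [[1, 0], [0, -1]]

# psi[a][b] is the amplitude of |a> (Charlie's photon) |b> (photon sent out).
# Assumed example state: sqrt(1/3)|00> + sqrt(2/3)|11>.
PSI = [[math.sqrt(1 / 3), 0.0], [0.0, math.sqrt(2 / 3)]]


def apply_to_second(op, psi):
    """Apply a 2x2 operator to the second photon only."""
    return [
        [sum(op[b][c] * psi[a][c] for c in range(2)) for b in range(2)]
        for a in range(2)
    ]


def reduced_first(psi):
    """Trace out the second photon (real amplitudes assumed)."""
    return [
        [sum(psi[a][b] * psi[a2][b] for b in range(2)) for a2 in range(2)]
        for a in range(2)
    ]


if __name__ == "__main__":
    for name, op in (("I", IDENTITY), ("sigma_x", SIGMA_X), ("sigma_z", SIGMA_Z)):
        print(name, reduced_first(apply_to_second(op, PSI)))
```

Each of the three operations leaves Charlie's matrix at diag(1/3, 2/3), so a measurement on his own photon carries no information about which operation was applied.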
For instance, suppose Charlie prepares an entangled photon pair for which the reduced density matrix of the first photon is $\frac{1}{3}|0\rangle\langle 0| + \frac{2}{3}|1\rangle\langle 1|$, keeps the first photon and sends the second photon to the parties. After the parties perform their operations, the reduced density matrix of the photon that Charlie holds is still $\frac{1}{3}|0\rangle\langle 0| + \frac{2}{3}|1\rangle\langle 1|$, no matter what operations the parties perform. That is, the private operations of Alice or Bob cannot affect the reduced density matrix of Charlie's subsystem. So even if Charlie prepares an entangled quantum resource to replace the single photons, he cannot extract any of Alice's or Bob's private information.
In addition, a fraudulent Charlie may want to intercept all photons of the sequence $S^*_A$ sent from Alice to Bob, including signal photons, puppet photons and decoy photons, in order to get some or all of the information about Alice's private operations ($I$, $\sigma_x$ or $\sigma_z$), which are tied to Alice's private vector ($x_i = 0$ or $x_i = 1$). To avoid detection, he might pick just one particular photon from each group and replace it with a fake photon. Suppose further that Charlie correctly guesses that the selected photon is one whose initial state he knows and not a decoy photon; then Charlie can use the optimal unambiguous state discrimination (USD) test 38. With USD, Charlie can determine which of the two possible states the selected photon is actually in. The success probability of USD is $p_{USD} = 1 - F(\rho_0, \rho_1)$, where $F(\rho_0, \rho_1)$ is the fidelity between the two quantum states that Charlie is trying to distinguish. Assume that the initial state that Charlie sends is $|0'\rangle = \cos\theta|0\rangle + \sin\theta|1\rangle$ and that Alice returns the state $|0''\rangle = \sin\theta|0\rangle + \cos\theta|1\rangle$ (i.e., $x_i = 1$); the success probability of USD is then $p_{USD}$.
When $\theta = \frac{\pi}{8}$, we get $p_{USD} = 1 - |\langle 0'|0''\rangle| = 1 - \sin 2\theta = 1 - \frac{\sqrt{2}}{2} \approx 0.29$. So, according to the optimal unambiguous state discrimination, Charlie can successfully infer whether $x_i = 0$ or $x_i = 1$ with probability 0.29. However, Charlie still cannot get the values of any $x_i$ without the hash functions $h_k$.
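The value 0.29 can be reproduced with a few lines of Python. The sketch assumes the standard optimal USD bound for two equally likely pure states, $1 - |\langle\psi_0|\psi_1\rangle|$, with $\psi_0 = |0'\rangle$ and $\psi_1 = \sigma_x|0'\rangle$; the function name is illustrative.

```python
import math


def usd_success_probability(theta: float) -> float:
    """Optimal USD success probability for |0'> versus sigma_x|0'>.

    Assumes two equally likely pure states, for which the bound is
    1 - |<psi_0|psi_1>|.
    """
    psi0 = (math.cos(theta), math.sin(theta))   # |0'>
    psi1 = (math.sin(theta), math.cos(theta))   # state after Alice applies sigma_x
    overlap = abs(psi0[0] * psi1[0] + psi0[1] * psi1[1])   # equals |sin(2*theta)|
    return 1.0 - overlap


if __name__ == "__main__":
    print(round(usd_success_probability(math.pi / 8), 2))  # 0.29
```

At θ = π/8 the overlap is sin(π/4) ≈ 0.707, so roughly 71% of Charlie's discrimination attempts on a substituted photon are inconclusive.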
Since $(x_0, x_1, \ldots, x_{N-1})$ corresponds to Add(x) with $h_i(A)$, Charlie cannot correctly guess whether i belongs to Alice's private set A without the information of $h_i(A)$.
Even if Charlie directly tests all the photons that Alice sends to Bob (in fact, Alice and Bob can easily detect this malicious attack), he cannot get any information about A or B. Suppose Charlie succeeds in getting Alice's private vector $(x_0, x_1, \ldots, x_{N-1})$; he still cannot obtain the original set A without the hash functions $h_k$, so the security is guaranteed by the hash functions $h_k$, which are shared by quantum key distribution. Clearly, the hash functions $h_k$ are completely secure. Similarly, based on Test(y) with $h_i(B)$, Bob can get the private vector $(y_0, y_1, \ldots, y_{N-1})$, and Charlie cannot get Bob's original vector $(y_0, y_1, \ldots, y_{N-1})$ because of the private hash functions $h_k(B)$.
Consequently, the protocol is resistant to attacks by a dishonest or malicious Charlie.
Next, we discuss attacks by Alice or Bob. Assume that Bob wants to get Alice's private input. When Bob receives the sequence $S^*_A$, he deletes all decoy photons in Step 11 and gets the sequence $S^*$, which carries Alice's private information. If Bob does not obey the rules honestly, he will try to get Alice's vector $(x_0, x_1, \ldots, x_{N-1})$ by testing the $S^*$ sequence group by group, and then send fake sequences to Charlie. Here, we analyze only one group of the photon sequence. For example, Alice sends $s^i_1, s^{*i}_1, s^i_2, s^{*i}_2, \ldots, s^{*i}_{m^*}, s^i_m$ to Bob, and when Bob receives the sequence he tries to learn the value of $x_i$. Moreover, suppose that Charlie chooses the states of the signal photons as $|0'\rangle = \cos\theta|0\rangle + \sin\theta|1\rangle$ and the states of the puppet photons randomly from $|0\rangle, |1\rangle, |+\rangle, |-\rangle$. Then, if the puppet photons are not considered, there are the following two cases. First, if $x_i = 0$, the states of all signal photons are unchanged after Alice's operation. Bob could then accurately identify $x_i$ by testing all signal photons in the basis $\{|0'\rangle, |1'\rangle\}$, but he does not know the correct test basis. So the probability that Bob learns $x_i = 0$ is $\frac{1}{2}$. Second, if $x_i = 1$, Alice performs the operation $\sigma_x$ or $\sigma_z$ on the photon sequence, and the state changes to $|0''\rangle = \sin\theta|0\rangle + \cos\theta|1\rangle$ or $|1''\rangle = \cos\theta|0\rangle - \sin\theta|1\rangle$. If Bob chooses the right basis $\{|0'\rangle, |1'\rangle\}$ to test the photon sequence, he finds that the signal photons are no longer in the initial state and can therefore deduce $x_i = 1$. Again, he does not know the right test basis, so the probability that Bob learns $x_i = 1$ is $\frac{1}{2}$.
From the above analysis, if Bob were able to distinguish puppet photons from signal photons, he could get the values of $x_i$ with probability 50%. However, Bob knows neither the states of the puppet photons and signal photons nor the positions where the puppet photons are inserted into the sequence of signal photons. Meanwhile, the states of any puppet photon and signal photon are non-orthogonal, and by the basic laws of quantum mechanics non-orthogonal states cannot be reliably distinguished. Therefore, Bob's attack is not feasible.
In addition, to improve the security, Charlie can dynamically choose θ from one group to another, where $\theta \in (0, \frac{\pi}{4})$, with the initial states $|0'\rangle = \cos\theta|0\rangle + \sin\theta|1\rangle$ and $|1'\rangle = \sin\theta|0\rangle - \cos\theta|1\rangle$. Because Bob does not know the initial states of the signal photons, he cannot choose the right test basis. Therefore, he cannot get the private information that Alice has encoded on the signal photons.
Lastly, we discuss attacks from an outsider. Because the outsider knows neither the positions where the decoy photons are inserted nor the test bases, any eavesdropper is easily detected by means of the decoy photons. For example, the entangle-and-measure, intercept-and-resend and measure-and-resend attacks are all easily found by checking the decoy photons. Here we discuss only the entangle-and-measure attack. We use the decoy photons to check for the eavesdropper, where each decoy photon is in a state $|\psi\rangle_d \in_R \{|0\rangle, |1\rangle, |+\rangle, |-\rangle\}$. When the outsider gets a decoy photon, he uses an ancillary photon in the state $|0\rangle_a$ and applies an oracle operator $U_f$ to the decoy state $|\psi\rangle_d$ and the ancillary state $|0\rangle_a$, where the operator $U_f$ is the one given in 39. There are q decoy photons, so the security requirement is determined by the security parameter q. Thus, it is not feasible for an outsider to carry out such an attack.
In fact, for an outsider, stealing Alice's or Bob's information is equivalent to finding the operations that Alice or Bob has performed on the sequences of puppet photons and signal photons, since these operations are determined by the private vectors of Alice and Bob. However, the initial states, which are randomly chosen from $\{|0'\rangle, |1'\rangle, |+'\rangle, |-'\rangle\}$ and $\{|0\rangle, |1\rangle, |+\rangle, |-\rangle\}$, are not known to the outsider. According to the laws of quantum mechanics, non-orthogonal states cannot be distinguished. So an attack from an outsider is not feasible.
Based on the BB84 protocol and the literature 40, both the communication and the computation complexities are $O(N\log N)$ (here the number of photons in one group is $m = \log N$). Thus they are independent of the data scale of the sets A and B, so the protocol is well suited to handling big data. Table 2 compares and summarizes the performance with other existing protocols. In Table 2, s in the classical algorithms denotes the data scale. Table 2 shows the following.
(1) In the classical PSI-CA protocols, such as Huang's 11, Dong's 25, Kerschbaum's 39 and Zhu's 41 schemes, the computational and communication complexities increase linearly with the data scale. So if the data scale is too large, the efficiency is greatly reduced. In our protocol, by contrast, the computational and communication complexities are independent of the data scale.
(2) Compared with the quantum PSI-CA protocols, our protocol uses only single photons and adopts single-photon operations and tests, which are more feasible with current technologies than entangled states. For example, the protocol of 34 uses multi-photon entangled states, complicated oracle operations and tests in a high-dimensional Hilbert space.
(3) Compared with the existing protocols, our protocol has no failure rate. From the above analysis, our protocol is more feasible and practical with existing technologies.

Conclusion
In this paper, we propose a novel quantum private set intersection cardinality protocol based on the Bloom filter to privately compute the cardinality of the intersection. To keep the protocol fair, it needs the help of a third party (Charlie). We use the basic laws of quantum mechanics to guarantee the security; for example, the BB84 protocol and the quantum test techniques can resist all kinds of quantum attacks (the entangle-and-measure, intercept-and-resend and measure-and-resend attacks, and so on). In addition, the new protocol takes single photons as quantum resources, so only simple single-photon operations and tests are needed. It is therefore more feasible to prepare these quantum resources and to perform the single-photon operations and tests with present technologies. Compared with other protocols, our protocol achieves privacy preservation without increasing the computational and communication complexities, which are independent of the data scale. Therefore, our protocol has good prospects for big data, privacy protection and information sharing.