
# Avoiding genetic racial profiling in criminal DNA profile databases


A preprint version of the article is available at bioRxiv.

## Abstract

DNA profiling has become an essential tool for crime solving and prevention, and CODIS (Combined DNA Index System) criminal investigation databases have flourished at the national, state and even local level. However, reports suggest that the DNA profiles of all suspects searched in these databases are often retained, which could result in racial profiling. Here, we devise an approach that both enables broad DNA profile searches and preserves exonerated citizens' privacy, through a real-time privacy-preserving procedure for querying CODIS databases. Using our approach, an agent can privately and efficiently query a suspect's DNA profile from a device in the field, learning only whether the profile matches any database profile. More importantly, the central database learns nothing about the queried profile, and thus cannot retain it. Our approach paves the way to implement privacy-preserving DNA profile searching in CODIS databases and any CODIS-like system.
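The abstract specifies what the agent learns (a single match bit) without spelling out the matching rule. As a plaintext reference point only, the sketch below writes that intended output as an ordinary function. It assumes a profile is a mapping from STR locus names to allele pairs, and that a record matches when it agrees with the query on all but at most `k` loci (the rule described for Extended Data Fig. 2). The names `ideal_query` and `record_matches` and the dictionary representation are hypothetical. This is the functionality the private protocol is meant to compute, not the protocol itself: here the function sees the query in the clear.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: STR locus name -> unordered pair of allele repeat counts.
Profile = Dict[str, Tuple[float, float]]


def locus_match(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Two genotypes at one locus match when they carry the same allele pair (order ignored)."""
    return sorted(a) == sorted(b)


def record_matches(query: Profile, record: Profile, loci: List[str], k: int) -> bool:
    """True if the query agrees with the record on all but at most k of the given loci."""
    mismatches = sum(not locus_match(query[l], record[l]) for l in loci)
    return mismatches <= k


def ideal_query(query: Profile, database: List[Profile], loci: List[str], k: int) -> bool:
    """The single bit the agent should learn: does ANY database record match?

    In the private protocol the server never sees `query`; this plaintext
    function only pins down the intended output.
    """
    return any(record_matches(query, rec, loci, k) for rec in database)
```

For example, with three CODIS loci, a query that differs from the only record at `FGA` is rejected for `k=0` and accepted for `k=1`:

```python
loci = ["D3S1358", "vWA", "FGA"]
db = [{"D3S1358": (15, 16), "vWA": (17, 18), "FGA": (21, 24)}]
q = {"D3S1358": (16, 15), "vWA": (17, 18), "FGA": (22, 24)}
ideal_query(q, db, loci, k=0)  # False
ideal_query(q, db, loci, k=1)  # True
```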


## Data availability

All of the measurements reported in this paper, together with the code, have been deposited in Zenodo (ref. 35). The input data used for the performance measurements were synthetically generated based on current CODIS specifications (Methods). The data-generation script is included in the Zenodo repository, along with instructions on how to reproduce the experimental evaluation. Source data are provided with this paper.

## Code availability

The code used for all performance evaluations is freely available under an MIT license in the private-codis GitHub repository (https://github.com/jBlinden/private-codis). Both the code and the raw measurements reported in this paper have been deposited in Zenodo (ref. 35).

## References

1. Combined DNA Index System (CODIS) (Federal Bureau of Investigation); https://www.fbi.gov/services/laboratory/biometric-analysis/codis

2. What is Rapid DNA? (ANDE); https://www.ande.com/what-is-rapid-dna/

3. Rapid DNA Solution — Because Every Minute Counts (ThermoFisher, accessed 12 April 2021); https://www.thermofisher.com/us/en/home/industrial/forensics/human-identification/forensic-dna-analysis/dna-analysis/rapidhit-id-system-human-identification.html

4. Hazel, J. W., Clayton, E. W., Malin, B. A. & Slobogin, C. Is it time for a universal genetic forensic database? Science 362, 898–900 (2018).

5. Crowley, M. How commandos could quickly confirm they got their target. The New York Times (27 October 2019).

6. Joly, Y., Marrocco, G. & Dupras, C. Risks of compulsory genetic databases. Science 363, 938–940 (2019).

7. CODIS—NDIS Statistics (Federal Bureau of Investigation, accessed 1 February 2021); https://www.fbi.gov/services/laboratory/biometric-analysis/codis/ndis-statistics

8. Arnaud, C. Thirty years of DNA forensics: how DNA has revolutionized criminal investigations. Chem. Eng. News 95, 16–20 (2017).

9. Murphy, H. Coming soon to a police station near you: the DNA ‘Magic Box’. The New York Times (21 January 2019).

10. Ransom, J. & Southall, A. ‘Race-Biased Dragnet’: DNA from 360 black men was collected to solve Vetrano murder, defense lawyers say. The New York Times (31 March 2019).

11. NYPD’s ‘Knock-and-Spit’ DNA database makes you a permanent suspect. Newsweek (11 February 2019).

12. Joly, Y. et al. Establishing the International Genetic Discrimination Observatory. Nat. Genet. 52, 466–468 (2020).

13. Jackman, T. Nationwide DNA testing backlog has nearly doubled, despite $1 billion in federal funding. Washington Post (23 March 2019).

14. Dickerson, C. U.S. Government plans to collect DNA from detained immigrants. The New York Times (2 October 2019).

15. Ransom, J. & Southall, A. N.Y.P.D. detectives gave a boy, 12, a soda. He landed in a DNA database. The New York Times (15 August 2019).

16. Frequently Asked Questions on CODIS and NDIS (Federal Bureau of Investigation); https://www.fbi.gov/services/laboratory/biometric-analysis/codis/codis-and-ndis-fact-sheet

17. Core STR Loci Used in Human Identity Testing (NIST, accessed 12 April 2021); https://strbase.nist.gov/coreSTRs.htm

18. Wang, Z. et al. Developmental validation of the Huaxia Platinum System and application in 3 main ethnic groups of China. Sci. Rep. 6, 31075 (2016).

19. Norrgard, K. Forensics, DNA fingerprinting and CODIS. Nat. Educ. 1, 35 (2008).

20. ENFSI DNA Working Group. in DNA Database Management Review and Recommendations 22–25 (ENFSI, 2017).

21. Hopcroft, J. E., Motwani, R. & Ullman, J. D. Introduction to Automata Theory, Languages and Computation (Pearson, 2006).

22. Kilian, J. Founding cryptography on oblivious transfer. In Proc. Twentieth Annual ACM Symposium on Theory of Computing, STOC ’88 20–31 (ACM, 1988).

23. Rabin, M. O. How to exchange secrets with oblivious transfer. IACR Cryptol. EPrint Arch. 2005, 187 (2005).

24. Kolesnikov, V., Kumaresan, R., Rosulek, M. & Trieu, N. Efficient Batched Oblivious PRF with Applications to Private Set Intersection. In Proc. 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16 818–829 (ACM, 2016).

25. O’Connor, K. L., Butts, E., Hill, C. R., Butler, J. & Vallone, P. Evaluating the effect of additional forensic loci on likelihood ratio values for complex kinship analysis. In Proc. 21st International Symposium on Human Identification 10–14 (NIST, 2010).

26. Yao, A. C.-C. Protocols for secure computations. In Proc. 23rd Annual Symposium on Foundations of Computer Science 160–164 (IEEE, 1982).

27. Lipmaa, H. in Advances in Cryptology—ASIACRYPT 2003. Lecture Notes in Computer Science Vol. 2894 (ed. Laih, C.S.) 416–433 (Springer, 2003).

28. Narayanan, A., Thiagarajan, N., Lakhani, M., Hamburg, M. & Boneh, D. Location privacy via private proximity testing. In Proc. NDSS Symposium 2011 (The Internet Society, 2011).

29. Naor, M. & Pinkas, B. Oblivious transfer and polynomial evaluation. In Proc. Thirty-First Annual ACM Symposium on Theory of Computing, STOC ’99 245–254 (ACM, 1999).

30. Canetti, R. Security and composition of multiparty cryptographic protocols. J. Cryptol. 13, 143–202 (2000).

31. Ishai, Y., Kilian, J., Nissim, K. & Petrank, E. in Advances in Cryptology. CRYPTO 2003. Lecture Notes in Computer Science Vol. 2729 (ed. Boneh, D.) 145–161 (Springer, 2003).

32. Boyle, E. et al. Efficient two-round OT extension and silent non-interactive secure computation. In Proc. 2019 ACM SIGSAC Conference on Computer and Communications Security 291–308 (ACM, 2019).

33. Troncoso-Pastoriza, J. R., Katzenbeisser, S. & Celik, M. U. Privacy preserving error resilient DNA searching through oblivious automata. In Proc. 14th ACM Conference on Computer and Communications Security 519–528 (ACM, 2007).

34. Sasakawa, H. et al. Oblivious evaluation of non-deterministic finite automata with application to privacy-preserving virus genome detection. In Workshop on Privacy in the Electronic Society (WPES) 21–30 (ACM, 2014).

35. Blindenbach, J. A., Jagadeesh, K. A., Bejerano, G. & Wu, D. J. Avoiding Genetic Racial Profiling in Criminal DNA Profile Databases (Zenodo, 2021); https://doi.org/10.5281/zenodo.4589351

## Acknowledgements

We thank B. Case and D. Boneh for helpful discussions in an early phase of this project and A. Regev for support (K.A.J.). This work was also supported by the Joint University Microelectronics Program (JUMP) Undergraduate Research Initiative (J.A.B.), the Stanford A.I. Lab (G.B.), NSF CNS-1917414 (D.J.W.) and a University of Virginia SEAS Research Innovation Award (D.J.W.).

## Author information


### Contributions

J.A.B., K.A.J., G.B. and D.J.W. designed the study, analyzed results and wrote the manuscript. J.A.B. wrote software for the analysis with input from K.A.J., G.B. and D.J.W.

### Corresponding authors

Correspondence to Gill Bejerano or David J. Wu.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Peer review information Nature Computational Science thanks Denise Syndercombe Court, Tara C. Matise and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Fernando Chirigati was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Extended data

### Extended Data Fig. 1 Layered DFA for equality test.

This DFA computes the equality-check function $g_{\boldsymbol{v}}(\boldsymbol{w})$, which outputs 1 if $\boldsymbol{v} = \boldsymbol{w}$ and 0 otherwise. In particular, for a vector $\boldsymbol{v} = (v_1, \ldots, v_n) \in \{0,1\}^n$, this DFA accepts only the input $\boldsymbol{w} = (w_1, \ldots, w_n) \in \{0,1\}^n$ with $v_i = w_i$ for all $1 \le i \le n$. We use this DFA to decide whether there is a match at a single STR locus. If we call the single start state 'layer 0', the two states reachable from layer 0 after reading the first bit 'layer 1', and so on, then this DFA has $n+1$ layers, and after reading $i \ge 1$ bits it can only be in one of the two states in layer $i$.
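The layered structure described above can be sketched as an explicit transition table. This is a plaintext illustration under stated assumptions, not the authors' implementation: states are labelled `(layer, still_equal)`, and the helper names `equality_dfa`, `run_dfa` and `g` are hypothetical. How a genotype at an STR locus is encoded as the bit vector $\boldsymbol{v}$ is not specified in this caption, so the example works directly on bit lists.

```python
from typing import Dict, List, Set, Tuple

State = Tuple[int, bool]  # (layer i = bits read so far, "all bits so far equal v")


def equality_dfa(v: List[int]) -> Dict[Tuple[State, int], State]:
    """Transition table of a layered DFA that accepts only w == v.

    Layer 0 holds the single start state (0, True); each layer i >= 1 holds
    two states: (i, True) if w_1..w_i equal v_1..v_i, else (i, False).
    """
    delta: Dict[Tuple[State, int], State] = {}
    for i, v_i in enumerate(v):
        statuses = (True,) if i == 0 else (True, False)
        for ok in statuses:
            for bit in (0, 1):
                # Once a mismatch is seen, the run stays on the False track.
                delta[((i, ok), bit)] = (i + 1, ok and bit == v_i)
    return delta


def run_dfa(delta: Dict[Tuple[State, int], State], start: State,
            accept: Set[State], w: List[int]) -> bool:
    """Feed the bits of w through the DFA and report acceptance."""
    state = start
    for bit in w:
        state = delta[(state, bit)]
    return state in accept


def g(v: List[int], w: List[int]) -> int:
    """Equality-check function g_v(w): 1 if v == w, else 0."""
    n = len(v)
    return int(run_dfa(equality_dfa(v), (0, True), {(n, True)}, w))
```

For $\boldsymbol{v} = (1,0,1)$, `g([1, 0, 1], [1, 0, 1])` returns 1 and any other 3-bit input returns 0; the table uses $1 + 2n = 7$ distinct states, matching the one-then-two-states-per-layer picture in the caption.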

### Extended Data Fig. 2 Layered DFA for thresholding.

This DFA computes the threshold function $h_{(a_1, \ldots, a_n),k}$ for the case $k = 1$. Namely, $h_{(a_1, \ldots, a_n),k}(b_1, \ldots, b_n)$ outputs 1 if $a_i = b_i$ for all but at most $k$ indices $1 \le i \le n$. In other words, for any sequence of bits $(a_1, \ldots, a_n) \in \{0,1\}^n$, this DFA accepts the input $b_1, \ldots, b_n$ if $b_i = a_i$ for all but at most one index $i$. For instance, in this work we use this DFA to decide whether a DNA profile matches a database record on at least 19 of 20 loci (that is, the setting where $k = 1$ and $n = 20$), as well as in the other configurations we evaluate. Here, the $i$th input bit $b_i \in \{0,1\}$ is the (blinded) equality bit indicating whether there is a match at the $i$th STR locus between the agent device's query and the central database's record. In our protocol, this (blinded) equality bit is computed using the equality-test DFA from Extended Data Fig. 1. The bits $a_1, \ldots, a_n$ in the function description $h_{(a_1, \ldots, a_n),k}$ are the blinding values chosen by the server. Recall that the blinding hides from the client all information about whether there was a match at STR locus $i$ between the database server's profile and the client's query: the client learns only whether its query matches the record, and nothing more. As in Extended Data Fig. 1, this DFA has $n+1$ layers, and after reading $i$ bits the computation can only be in one of the (at most) three states of layer $i$.
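The composition of blinding and thresholding can be sketched in plaintext Python. The sketch assumes a blinding convention that this caption does not spell out: the server draws random bits $r_i$, the client holds $b_i = e_i \oplus r_i$ (where $e_i = 1$ if locus $i$ matches), and the server sets $a_i = 1 \oplus r_i$, so that $b_i = a_i$ exactly when locus $i$ matches. All function names are hypothetical. Unlike the actual protocol, in which the DFA is evaluated obliviously between two parties, every value here is computed in the clear by one process. The sketch only checks that the DFA logic and the blinding fit together.

```python
import secrets
from typing import Dict, List, Tuple

State = Tuple[int, int]  # (layer i = bits read so far, mismatches m so far, capped at k + 1)


def threshold_dfa(a: List[int], k: int) -> Dict[Tuple[State, int], State]:
    """Layered DFA for h_{a,k}: accept b iff b_i != a_i for at most k indices.

    The count m = k + 1 is an absorbing reject value, so layer i holds at
    most k + 2 states (three states when k = 1).
    """
    delta: Dict[Tuple[State, int], State] = {}
    for i, a_i in enumerate(a):
        for m in range(min(i, k + 1) + 1):  # mismatch counts reachable in layer i
            for bit in (0, 1):
                m_next = min(m + (bit != a_i), k + 1)
                delta[((i, m), bit)] = (i + 1, m_next)
    return delta


def accepts(delta: Dict[Tuple[State, int], State], k: int, b: List[int]) -> bool:
    """Run the DFA from the start state (0, 0); accept if at most k mismatches."""
    state: State = (0, 0)
    for bit in b:
        state = delta[(state, bit)]
    return state[1] <= k


def profile_matches(equal_bits: List[int], k: int) -> bool:
    """Blinding composed with the threshold DFA (hypothetical blinding convention).

    equal_bits[i] is e_i = 1 if STR locus i matches, else 0.
    """
    r = [secrets.randbits(1) for _ in equal_bits]   # server's random blinding bits
    b = [e ^ r_i for e, r_i in zip(equal_bits, r)]  # client's blinded equality bits
    a = [1 ^ r_i for r_i in r]                      # server's DFA description h_{a,k}
    return accepts(threshold_dfa(a, k), k, b)
```

With $n = 20$ and $k = 1$, `profile_matches([1] * 19 + [0], 1)` is `True` and `profile_matches([1] * 18 + [0, 0], 1)` is `False`, whatever random bits are drawn, because the blinding cancels when $b_i$ is compared with $a_i$. Tracking a capped mismatch count is one way to get at most $k+2$ states per layer for general $k$.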

## Source data

### Source Data Fig. 3

Statistical source data.

## Cite this article

Blindenbach, J.A., Jagadeesh, K.A., Bejerano, G. et al. Avoiding genetic racial profiling in criminal DNA profile databases. Nat Comput Sci 1, 272–279 (2021). https://doi.org/10.1038/s43588-021-00058-3


• DOI: https://doi.org/10.1038/s43588-021-00058-3

• Syndercombe Court, D. Protecting against racial bias in DNA databasing. Nature Computational Science (2021).