Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, Massachusetts 02142, USA.
- Yaniv Erlich
Department of Computer Science, Princeton University, 35 Olden Street, Princeton, New Jersey 08540, USA.
- Arvind Narayanan
Competing interests statement
The authors declare no competing interests.
Yaniv Erlich is a fellow at the Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA. He received his Ph.D. from Cold Spring Harbor Laboratory, New York, USA, in 2010 and his B.Sc. from Tel-Aviv University, Israel, in 2006. Before that, he worked in computer security and was responsible for conducting penetration tests on financial institutions and commercial companies. His research involves developing new algorithms for computational human genetics.
Arvind Narayanan is an assistant professor in the Department of Computer Science and the Center for Information Technology and Policy at Princeton University, New Jersey, USA. He studies information privacy and security. His research has shown that data anonymization is broken in fundamental ways, for which he jointly received the 2008 Privacy Enhancing Technologies Award. His current research interests include building a platform for privacy-preserving data sharing.
- Safe Harbor
A standard in the US Health Insurance Portability and Accountability Act (HIPAA) rule for de-identification of protected health information by removing 18 types of quasi-identifiers.
- Haplotypes
Sets of alleles along the same chromosome.
- Cryptographic hashing
A procedure that yields a fixed-length output from an input of any size in a way that makes it hard to determine the input from the output.
- Dictionary attacks
Approaches to reverse cryptographic hashing by scanning only highly probable inputs.
- Alice
A common generic name in computer security to denote party A.
- Bob
A common generic name in computer security to denote party B.
- Type I error
The probability of obtaining a positive answer from a negative item.
- Linkage equilibrium
Absence of correlation between the alleles at two loci.
- Power
The probability of obtaining a positive answer for a positive item.
- Specificity
The probability of obtaining a negative answer for a negative item.
- Linkage disequilibrium
(LD). The correlation between alleles at two loci.
- Effect sizes
The contributions of alleles to the values of particular traits.
- Positive predictive value
The probability that a positive answer belongs to a true positive.
- Expression quantitative trait locus
(eQTL). A genetic variant associated with variability in gene expression.
- Genotype imputation
A class of statistical techniques to predict a genotype from information on surrounding genotypes.
- Application programming interface
(API). A set of commands that specifies the interface to a data set or software application.
A measure of association in case–control genome-wide association studies.
- Read mapping
A computationally intensive step in the analysis of high-throughput sequencing to find the location of a short DNA sequence (string) in the genome.
- Edit distance
The total number of insertions, deletions and substitutions between two strings.
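
As an illustration of the last entry, edit distance is commonly computed as the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other (the Levenshtein distance). The sketch below assumes that interpretation; the function name `edit_distance` and the example sequences are hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] is the distance between the prefix of `a` processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb (free if they match)
            ))
        prev = curr
    return prev[-1]


# Two substitutions separate these toy reads (assumed example sequences).
print(edit_distance("GATTACA", "GACTATA"))
```

The row-by-row formulation keeps memory proportional to the length of the second string, which is one reason variants of this recurrence appear in read-mapping tools.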