In the last few years, scientists have been able to design transcription activator–like effectors (TALEs) to bind many sequences in DNA, leading to technology that has been used to edit genomes in zebrafish, rats, worms and human cells as well as to induce gene expression.

Unlike other known DNA-binding domains, TALE domains use a straightforward code, allowing researchers to easily design modular proteins that bind to desired DNA sequences. Although each repeat unit of a TALE was known to recognize a single nucleotide, the molecular details of these interactions had been a mystery. Now two independent groups have solved crystal structures that reveal exactly how this DNA recognition works.

TALEs, originally derived from plant pathogens, are made up of 'repeats', or domains of 33–35 conserved amino acids. In the middle of each repeat are two variable amino acids that determine which nucleotide the repeat recognizes. Researchers led by Yigong Shi and Nieng Yan at Tsinghua University determined the crystal structure of an engineered TALE with 11.5 repeats bound to its DNA target at 1.8-angstrom resolution, as well as a structure of the unbound protein at 2.4-angstrom resolution; comparing the two structures revealed dramatic conformational changes in the TALE upon DNA binding. Separately, researchers led by Barry Stoddard at the Fred Hutchinson Cancer Research Center determined a 3.0-angstrom crystal structure of a naturally occurring 23.5-repeat TALE bound to its DNA target, capturing examples of the six most common TALE recognition units.
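The one-repeat, one-nucleotide code described above can be illustrated with a short Python sketch. The four pairings used here (NI with A, HD with C, NN with G, NG with T) are the commonly cited ones and are not listed in this article; the function names are hypothetical, and the table is a simplification, since some variable residue pairs (NN, for example) are reported to tolerate more than one base.

```python
# Simplified lookup of repeat variable residue pairs to nucleotides.
# Assumption: these four commonly cited pairings are not taken from
# the article, and real specificities are less strict than a 1:1 table.
PAIR_TO_BASE = {"NI": "A", "HD": "C", "NN": "G", "NG": "T"}
BASE_TO_PAIR = {base: pair for pair, base in PAIR_TO_BASE.items()}


def repeats_for_target(target):
    """Return one variable residue pair per nucleotide of the target."""
    target = target.upper()
    unknown = sorted(set(target) - set(BASE_TO_PAIR))
    if unknown:
        raise ValueError(f"unsupported nucleotides: {unknown}")
    return [BASE_TO_PAIR[base] for base in target]


def predicted_target(pairs):
    """Return the DNA sequence a given series of repeats would read."""
    return "".join(PAIR_TO_BASE[pair] for pair in pairs)


if __name__ == "__main__":
    pairs = repeats_for_target("TACG")
    print(pairs)
    print(predicted_target(pairs))
```

Designing a protein for a new target then amounts to reading the target sequence one base at a time and choosing the matching repeat for each position, which is what makes TALEs modular.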

These structures reveal that the repeats associate with each other to form a right-handed superhelix that wraps along the major groove of the DNA. Each individual repeat is a left-handed two-helix bundle that presents its two variable amino acids to the DNA. One of these amino acids makes a specific contact with a nucleotide in the sense strand of the DNA, while the other stabilizes the contact between the DNA and the protein.

How the TALEs interact with the host cell's transcriptional machinery is still unclear. What is now clear is the structural basis of their ability to contact DNA sequences specifically. This knowledge should aid the design of improved DNA-binding proteins.