Figure 2: Deriving folded three-dimensional structure for a target protein sequence. | Nature Biotechnology

From: Protein structure prediction from sequence variation

(a) Workflow as implemented on the publicly available web server EVfold.org. Related methods (Table 1) follow similar steps, but the details differ. The amino acid sequence of the target protein is used to search a database for putative structural homologs. The sequence-similarity cutoff must be chosen with care so that enough sequences are retrieved, yet they are not so diverged that subfamily specificity is lost. At a minimum, hundreds of sequences are needed to derive plausible causative evolutionary couplings. Computing ten candidate structures for a medium-sized protein (200 residues) takes less than an hour on a typical laptop computer.

(b) The principal confounding effect that global probability models handle, but local models do not, is transitive (indirect) correlation, which does not reflect causative evolutionary constraints on interactions. For example, correlations between residues A and B, residues A and D, and residues D and C are causative because they reflect direct interactions, whereas residues A and C show a transitive correlation owing to their mutual direct interactions with residue D. In special cases, transitive correlations can be numerically stronger than causative correlations, for example, if two noninteracting residues have several neighbors in common (ref. 27), thereby confounding structure prediction.
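The transitive effect in panel (b) can be illustrated with a small numerical sketch. This is not the EVfold method. It assumes a toy Gaussian model in which the four residues A, B, C and D are continuous variables, direct interactions are nonzero off-diagonal entries of a precision (inverse covariance) matrix, and the coupling strength `COUPLING = 0.4` is a hypothetical value chosen only to keep the matrix positive definite. Under those assumptions, the pairwise correlation (the "local" view) between A and C is clearly nonzero even though they share no direct coupling, whereas inverting the full covariance matrix (a "global" view) recovers a zero A-C coupling.

```python
import math

# Toy setup (assumption): residues from panel (b) and their direct interactions.
RESIDUES = ["A", "B", "C", "D"]
DIRECT_PAIRS = [("A", "B"), ("A", "D"), ("D", "C")]
COUPLING = 0.4  # hypothetical coupling strength, not taken from the article


def build_precision(residues, direct_pairs, coupling):
    """Precision matrix: unit diagonal, -coupling for each directly interacting pair."""
    index = {name: i for i, name in enumerate(residues)}
    n = len(residues)
    precision = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in direct_pairs:
        precision[index[a]][index[b]] = -coupling
        precision[index[b]][index[a]] = -coupling
    return precision


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def to_correlation(cov):
    """Normalize a covariance matrix to a correlation matrix."""
    n = len(cov)
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]


precision = build_precision(RESIDUES, DIRECT_PAIRS, COUPLING)
covariance = invert(precision)            # what pairwise statistics would observe
correlation = to_correlation(covariance)  # "local" view: pairwise correlations
recovered = invert(covariance)            # "global" view: couplings given all others

for i, a in enumerate(RESIDUES):
    for j in range(i + 1, len(RESIDUES)):
        b = RESIDUES[j]
        print(f"{a}-{b}: correlation={correlation[i][j]:.3f}  "
              f"coupling={recovered[i][j]:.3f}")
```

In this chain-like example the transitive A-C correlation is weaker than the direct A-D correlation; the caption's stronger-than-direct case arises in other topologies, such as two noninteracting residues with several shared neighbors, which this sketch does not model.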
