Identifying immunologically-vulnerable regions of the HCV E2 glycoprotein and broadly neutralizing antibodies that target them

Isolation of broadly neutralizing human monoclonal antibodies (HmAbs) targeting the E2 glycoprotein of Hepatitis C virus (HCV) has sparked hope for effective vaccine development. Nonetheless, escape mutations have been reported. Ideally, a potent vaccine should elicit HmAbs that target regions of E2 that are most difficult to escape. Here, aimed at addressing this challenge, we develop a predictive in-silico evolutionary model for E2 that identifies one such region, a specific antigenic domain, making it an attractive target for a robust antibody response. Specific broadly neutralizing HmAbs that appear difficult to escape from are also identified. By providing a framework for identifying vulnerable regions of E2 and for assessing the potency of specific antibodies, our results can aid the rational design of an effective prophylactic HCV vaccine.


Supplementary Note 1. Comparison of predictions of the proposed model with those using a simpler conservation-only model
To investigate the importance of incorporating interactions between mutations at different residues, we compared the predictions of the proposed method with those obtained using a simpler model based only on amino acid conservation (or single mutant probabilities), which ignores such interactions. The most meaningful comparison is with respect to a conservation-based maximum entropy model parametrized only by the "fields" $h_i(a)$, which are determined by the single mutant probabilities alone. As we show below, further tests revealed that incorporating residue interactions into our model is not only important for making fitness predictions, but also leads to notable differences in the main results of the proposed work; namely, in the classification of antibodies based on relative escape times (Fig. 4a and Supplementary Fig. 4).
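As a rough illustration of what a conservation-only maximum entropy model involves, the following Python sketch estimates single-site frequencies from an integer-encoded alignment and converts them into fields. It is a minimal sketch under stated assumptions, not the procedure used in this work: the sign convention (sequence probability proportional to exp(-E)), the choice of state 0 as the reference state, the pseudocount, and all function names are illustrative assumptions.

```python
import numpy as np

def single_site_frequencies(msa, q):
    """Return an (N, q) array of per-residue state frequencies.

    msa: (B, N) integer array with entries in 0..q-1 (encoding is assumed;
    state 0 is taken here to be the consensus amino acid).
    """
    n_residues = msa.shape[1]
    freqs = np.zeros((n_residues, q))
    for a in range(q):
        freqs[:, a] = (msa == a).mean(axis=0)
    return freqs

def conservation_only_fields(freqs, pseudocount=1e-6):
    """Fields of an independent-site model, assuming P(x) ~ exp(-E(x)).

    h_i(a) = -ln(f_i(a) / f_i(0)); fixing h_i(0) = 0 is a gauge choice.
    The pseudocount (an assumption) avoids log(0) for unobserved states.
    """
    f = freqs + pseudocount
    f = f / f.sum(axis=1, keepdims=True)
    return -np.log(f / f[:, [0]])

def energy(seq, fields):
    """Energy of a sequence: the sum of its fields, with no coupling terms."""
    return fields[np.arange(len(seq)), seq].sum()
```

Because there are no coupling terms, the probability this model assigns to a sequence factorizes into a product of single-site frequencies, so the effect of a mutation at one residue cannot depend on the sequence background at other residues.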
First, we computed the result analogous to Fig. 4a (relative escape times associated with the binding residues of HmAbs defined based on global alanine scanning; ref. 9) using the conservation-only model. These results are shown in Supplementary Fig. 11, where they are contrasted with the predictions of our proposed model. Notably, the predictions of the two models differ markedly for the domain D HmAb HC84-20. Specifically, while the conservation-only model predicted the minimum escape time of HC84-20 to be similar to that of the other domain D HmAbs (Supplementary Fig. 11, bottom panel), our model identified it as comparatively escape-resistant (Supplementary Fig. 11, top panel). This difference suggests that mutations at the non-conserved binding residues of HC84-20 are still associated with high escape times because of the additional constraints imposed by their interactions with other protein residues, which are taken into account only in our model. Note that HmAb HC84-20 differs from the other domain D HmAbs (HC84-24 and HC84-26) in that residue 442, a common binding residue of these HmAbs, is not a binding residue of HC84-20 at RB ≤ 20% (ref. 9). Mutations at residue 442 are known to be associated with escape from domain D HmAbs (ref. 12), and our model correctly associated this residue with a relatively low escape time (< 100 generations). This is evident from our model predictions at RB ≤ 40% (Supplementary Fig. 4), where residue 442 is a binding residue of all domain D HmAbs (including HC84-20), and at RB ≤ 20% (Supplementary Fig. 11, top panel), where it is a binding residue only of HmAbs HC84-24 and HC84-26. Thus, the absence of residue 442 from the binding residues of HmAb HC84-20 at RB ≤ 20% suggests that this domain D HmAb may be relatively escape-resistant compared with other HmAbs, which is also in line with the predictions of a recent report (ref. 13).
Second, we compared the predictions of the two models for the relative escape times associated with the binding residues of HmAbs defined based on selective alanine scanning (Supplementary Table 4). Contrary to the conservation-only model's prediction, our model predicted the minimum escape time of HmAbs AR3A-AR3C (Supplementary Fig. 12) to be higher than that of the other HmAbs, suggesting that the virus may have difficulty escaping these antibodies. This is consistent with the reported potency of these antibodies in preventing as well as clearing chronic HCV infection in humanized mice (ref. 14).

Supplementary Figures
Supplementary Figure 1 | Comparison of the protein length, mean residue entropy, and number of parameters required to estimate the fitness landscape for different HCV proteins. Of all HCV proteins, E2 has the highest mean residue entropy, as well as the highest number of parameters. The mean residue entropy is the entropy of the amino acid distribution at each residue, averaged over all residues of the protein.

[...] Table 4). All HmAbs predicted to be relatively escape-resistant are grouped together (in no specific order) on the left (shaded) and the remaining ones are grouped together on the right. In this box plot, the bold horizontal line indicates the median, the edges of the box represent the first and third quartiles, the whiskers extend to span 1.5 times the inter-quartile range from the edges (neither the box edges nor the whiskers are visible because the distribution is too concentrated around the median), and the spheres represent the outliers (values beyond the range of the whiskers).
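The mean residue entropy referred to in Supplementary Figure 1 can be sketched in Python as follows. This is a minimal illustration rather than the exact computation used here: the integer encoding of the alignment, the use of the natural logarithm, and the function name are assumptions.

```python
import numpy as np

def mean_residue_entropy(msa, q):
    """Average, over residues, of the Shannon entropy of each MSA column.

    msa: (B, N) integer array with entries in 0..q-1 (encoding is assumed).
    The natural logarithm is used; the log base is an assumption.
    """
    n_seqs, n_residues = msa.shape
    total = 0.0
    for i in range(n_residues):
        counts = np.bincount(msa[:, i], minlength=q)
        f = counts[counts > 0] / n_seqs  # drop unobserved states (0 * ln 0 = 0)
        total += -(f * np.log(f)).sum()
    return total / n_residues
```

A fully conserved residue contributes zero entropy, so proteins with many variable residues, such as E2, yield a higher mean value.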

Supplementary Figure 8 | Robustness of the (a) single and (b) double mutation probabilities to the number of sequences in the MSA.
Each box plot shows the normalized root-mean-square error (NRMSE) of the probabilities observed in the subsampled (sampling with replacement) MSA for 500 runs, where B is the total number of sequences (after data preprocessing). It can be observed that the median of the NRMSE of the probabilities calculated using the subsampled MSA converged to within 0.05, even when only half of the total sequences were used to construct the subsampled MSA. In each box plot, the bold horizontal line indicates the median, the edges of the box represent the first and third quartiles, and the whiskers extend to span 1.5 times the inter-quartile range from the edges.

[...] The results are presented for the sets of residues involved in antigenic domains (Fig. 3b) or in forming the binding residues of HmAbs (Fig. 4a).

[...] The y-axis represents the escape times averaged over three randomly selected, mutually distinct sets of 25 T/F sequences, with red error bars denoting one standard deviation. The result is presented for the sets of residues involved in antigenic domains (Fig. 3b) or in forming the binding residues of HmAbs (Fig. 4a).

[...] Table 4). The background of the HmAbs predicted by our proposed model to be relatively escape-resistant is shaded grey (top panel).
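The subsampling test described for Supplementary Figure 8 can be sketched in Python as below, shown for single mutation probabilities only. This is a hedged illustration, not the analysis code behind the figure: the normalization of the RMSE by the range of the reference probabilities, the integer encoding with state 0 as the consensus, and all function names are assumptions.

```python
import numpy as np

def single_mutant_probs(msa, q):
    """(N, q-1) array of mutant frequencies; state 0 is assumed to be consensus."""
    return np.stack([(msa == a).mean(axis=0) for a in range(1, q)], axis=1)

def nrmse(estimate, reference):
    """RMSE divided by the range of the reference values (assumed normalization)."""
    rmse = np.sqrt(np.mean((estimate - reference) ** 2))
    return rmse / (reference.max() - reference.min())

def subsample_nrmse(msa, q, n_sub, n_runs=500, seed=0):
    """NRMSE of single mutant probabilities over repeated subsamples.

    Each run draws n_sub sequences with replacement from the full MSA and
    compares the resulting probabilities with those of the full MSA.
    """
    rng = np.random.default_rng(seed)
    reference = single_mutant_probs(msa, q)
    errors = np.empty(n_runs)
    for r in range(n_runs):
        idx = rng.integers(0, msa.shape[0], size=n_sub)
        errors[r] = nrmse(single_mutant_probs(msa[idx], q), reference)
    return errors
```

Calling `subsample_nrmse` for several values of `n_sub` (for example, fractions of the total number of sequences) and plotting the returned arrays as box plots would give a figure of the kind described; double mutation probabilities could be treated analogously with pairwise frequencies.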