Data-mining unveils structure–property–activity correlation of viral infectivity enhancing self-assembling peptides

Gene therapy via retroviral vectors holds great promise for treating a variety of serious diseases. It requires the use of additives to boost infectivity. Amyloid-like peptide nanofibers (PNFs) have been shown to efficiently enhance retroviral gene transfer. However, the underlying mode of action of these peptides remains largely unknown. Data-mining is an efficient method to systematically study structure–function relationships and unveil patterns in a database. This data-mining study elucidates the multi-scale structure–property–activity relationship of transduction-enhancing peptides for retroviral gene transfer. In contrast to previous reports, we find that it is not the amyloid fibrils themselves but rather µm-sized β-sheet-rich aggregates that enhance infectivity. Specifically, microscopic aggregation of β-sheet-rich amyloid structures, a hydrophobic surface pattern and a positive surface charge are identified as key material properties. We validate the reliability of the amphiphilic sequence pattern and the general applicability of the key properties by rationally creating new active sequences and by identifying short amyloidal peptides of various pathogenic and functional origins. Data-mining, even for small datasets, enables the development of new efficient retroviral transduction enhancers and provides important insights into the diverse bioactivity of amyloids as a functional material class.

A point to note here is that in nature these peptides usually occur as part of an extended sequence, which might have completely different biophysical properties from a truncated peptide version. Hence, it is difficult to say whether a peptide derived from a functional or pathogenic amyloid that enhances viral transduction will show the same effect in its native/extended form. Rather, what this study makes clear is that, for short peptides, the physicochemical properties determine the activity (in this manuscript, viral transduction) regardless of the source of origin.
Data and methodology: The data and methodology in the manuscript are appropriate for the current study.
Analytical approach: All the analytical approaches are valid for the current manuscript. However, the infectivity data lack statistical analysis.
Experiments with SFG spectroscopy are outside the scope of my expertise.
Suggested improvements:
1. The time scale of aggregate formation is unclear from the manuscript.
2. In the Experimental section, it is unclear whether the virus is added to the peptide solution or to the aggregates.
3. Any information on the interaction between the virus and the aggregated structures would be useful.
Clarity and context: The manuscript is clear and informative.
References: Please check the reference for the CR assay in the Experimental section.
The queries above raise some concerns about the manuscript. These need to be addressed carefully to improve the manuscript before publication.

Reviewer #2 (Remarks to the Author):
This submission by Kaygisiz et al. applies a data-mining method to the structure–property–activity relationship of transduction-enhancing peptides. The authors found that µm-sized β-sheet-rich aggregates enhance infectivity. The manuscript is very well written and structured. Firstly, I am very impressed by the detailed interpretation, rigorous logic, and large amount of work. Secondly, the investigations of amyloid-like peptide nanofibers provide extensive exploration and elucidation of their structure–property relationships. Thirdly, rapid scientific development can support data mining for amyloid-like peptide nanofibers. Fourthly, the authors found a unique phenomenon relating to amyloid fibrils. Therefore, I highly recommend the publication of this work in Nature Communications after minor revision. Here are some suggestions.
1. I suggest highlighting the data-mining procedure in the TOC graphic and in the article figures to distinguish this work from an ordinary research article. The current version appears to lack a graphical description of this key procedure.
2. The reference source should be provided for each experimental result (such as each TEM image). This kind of omission occurs more often in the SI document.
3. In Figure 4A, please provide all eight possible combinations of physicochemical features, since these form the basis of the subsequent data-mining section. Figure 4 could then be split into two figures that are not as large as the current version.
4. The zeta potential values of amyloid-like peptide nanofibers vary over a relatively large range, yet their standard deviations are not shown in the current dataset. In the SI document in particular, standard deviations are shown for some infection data but not for others. This will raise doubts among researchers in the field.
5. Most importantly, data mining is highlighted at the beginning of the title. However, the data-mining procedure and its evaluation are not clearly presented anywhere in the article. Please describe the data-mining procedure in the abstract, methods, results, discussion, and conclusion.
6. Please check all abbreviations throughout the manuscript, for example "CAR T-cell therapy", "HIV", "HIV-1" and "TZM-bl cells". The full name should be given at first use. The SI document should also be checked.

Reviewer #3 (Remarks to the Author):
The present manuscript provides a comprehensive analysis of a range of small peptides for their ability to enhance viral infectivity. This property could be of use for applications related to viral uptake. Correlations are obtained for the various parameters tested, and it is concluded that a combination of factors must be met for efficient enhancement of viral infectivity. Some of these factors were already known, such as the need for positive charges. One main new discovery is that the size of the aggregates matters, with larger aggregates being more potent. This information might well prove useful for understanding the effect of amyloids as adjuvants and should be of value to the field of viral enhancement by aggregates. The novelty to the amyloid field as a whole is rather limited. Much emphasis has been placed on correlations of activity with hydrophobicity and other physical parameters. While solid insights are obtained, they are not necessarily novel. That is, extensive work by Dobson and many others established nearly the same concepts for amyloid protein misfolding in a series of studies more than two decades ago. This work does not appear to be cited, and it diminishes the broader significance of the present study. There are also numerous proven algorithms available that have good predictive qualities for aggregation and amyloid formation, making the present study less novel. One parameter that has also been considered in the literature, but which is not discussed here, is beta-sheet propensity, which leads to a preference for beta-branched side chains in many amyloids. This is only a minor concern, but it was somewhat surprising that it was not included. The paper categorizes the peptides into fibril-forming and amorphous-aggregate-forming peptides. It should be noted that this behavior cannot be correlated with the intrinsic properties of the peptide sequence alone, and it is also difficult to determine based on a single experiment under one condition.
The problem is that nearly all amyloids can form fibrils or more amorphous material; it simply depends on the conditions. Sometimes both can coexist on the same EM grid, and sometimes only subtle changes in conditions can switch one to the other. Sometimes the behavior can even be complicated by the exact purification protocol prior to dissolution. Detection can also be difficult in some cases. The image shown for amorphous aggregates could indeed be fully devoid of fibrils, but it could also contain many fibrils. Sometimes fibrils can appear after less ordered proteins or peptides are stripped from their surface. Surface binding of non-fibrillar material can make ThT fluorescence highly problematic. The authors need to acknowledge this potential complication more clearly. The analysis of small peptide fragments from naturally occurring pathogenic amyloids is useful in that it shows that their model has predictive power. Unfortunately, the authors go beyond this point, which leads to several concerns. First, the study uses small peptide fragments only, which makes it harder to learn much about the actual pathological proteins or peptides. Second, there is little evidence that these amyloids act as adjuvants in vivo. In fact, based on the data obtained, nearly all of the peptides seem not to be very potent at promoting infection. Third, it is argued that the approach of looking for micron-sized aggregates of amyloid fibrils is often ignored in the field of pathogenic amyloids and that studying intrinsic properties of peptides (as done in the present study) could be helpful. I am not convinced by this statement, as many studies have extensively looked at the correlation between amyloid size and toxicity. In the vast majority of cases, it has been concluded that large, micron-sized aggregates are rather inert (the exact opposite of the present study) and that smaller oligomers or protofibrils are more toxic.
In this context, it is also important to note that the same protein can often take on many different aggregation states, ranging from small oligomers and short fibrils all the way to micron-sized bundled fibrils of different polymorphs. There are clearly maturation factors at play here that allow the same sequence to adopt many different conformations and aggregation states, and many of these will depend on the cellular context. Another important practical consideration regarding the use of pathogenic amyloids as adjuvants is the potential danger that these peptides might act as seeds and cause neurodegenerative or other diseases. The discussion of the membrane interaction appears highly speculative. No structural information was obtained for any of the peptides, and it is not clear which regions interact with the membrane. Much of the hydrophobic surface might be covered by inter-filament interactions and perhaps not even be available for membrane interaction. Yes, the Wimley–White scales indicate that hydrophobic side chains partition more easily into the membrane, and such residues can clearly promote membrane binding. Having said that, some peptides, such as TAT, can promote cell entry through positively charged clusters. I suggest making this section more convincing or removing it. It appears far too speculative as written.