Machine learning models to accelerate the design of polymeric long-acting injectables

Long-acting injectables are considered one of the most promising therapeutic strategies for the treatment of chronic diseases as they can afford improved therapeutic efficacy, safety, and patient compliance. The use of polymer materials in such a drug formulation strategy can offer unparalleled diversity owing to the ability to synthesize materials with a wide range of properties. However, the interplay between multiple parameters, including the physicochemical properties of the drug and polymer, makes it very difficult to intuitively predict the performance of these systems. This necessitates the development and characterization of a wide array of formulation candidates through extensive and time-consuming in vitro experimentation. Machine learning is enabling leap-step advances in a number of fields including drug discovery and materials science. The current study takes a critical step towards data-driven drug formulation development with an emphasis on long-acting injectables. Here we show that machine learning algorithms can be used to predict experimental drug release from these advanced drug delivery systems. We also demonstrate that these trained models can be used to guide the design of new long-acting injectables. The implementation of the described data-driven approach has the potential to reduce the time and cost associated with drug formulation development.


REVIEWER REPORT:
Reviewer #1 (Remarks to the Author):

Machine Learning Models to Accelerate the Design of Polymeric Long-Acting Injectables

Summary: The authors present their work on training machine learning (ML) models to predict the efficacy of long-acting injectables. The ML models are trained […]

Methodology: Page 5, line 119. "from previously published studies". Please add the references to the studies the first time they are mentioned. Was this data generated by the same research group? I would also appreciate a summary of what they contain.

RESPONSE:
We have added references to the relevant papers in the manuscript, as suggested. This data was collected by a variety of different research groups around the world (this is also now highlighted in the manuscript).
What happens if the models are trained on the current external validation dataset and evaluated on the current training data?

RESPONSE:
As suggested, we have performed these tests and the results were found to be equivalent to the previous ones. The results are shown in the table below. However, based on the additional feedback, we no longer use a fixed test set but instead perform a full nested CV as requested. As such, the table below is no longer relevant to the study and we do not include it in the manuscript.

The majority of the variables used are properties of the molecule that do not change (I believe only release and time do). Since the predictions are made over time, I feel the authors should explain which variables are constant and which ones aren't.

RESPONSE:
This is correct. Some additional input features have since been added based on other feedback received; however, this statement remains true. Only values of time vary for a given drug release profile prediction. A better description of this has been added to the input feature selection subsection of the results and discussion section.
Quality of the figures in the supplement is poor. Please update the supplement with high quality figures so that they are readable.

RESPONSE:
We have updated the supplementary information document to reflect all of the changes in the current manuscript and we have improved the quality of the figures now included.
Was any of the drugs in the external dataset present in the training set? This is not mentioned and should be clarified.

RESPONSE:
Yes, there were some individual drugs that were included in both the previous training and external validation datasets. There were also some polymers that were included in both datasets. However, no drug-polymer combinations were repeated between the previous training and external validation datasets. To train the models, data is grouped by drug-polymer combination (i.e., to reflect the composition of each LAI), as this best allows the model to be cross-validated against LAI data splits. This remains true in the updated version of the manuscript, where we implement the suggested nested cross-validation approach. This has also been clarified in the updated manuscript.
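The grouping scheme described above can be sketched in miniature. The snippet below (made-up rows, and scikit-learn's `GroupKFold` as an assumed stand-in for the actual splitter) shows how labeling each row by its drug-polymer combination guarantees that no formulation appears in both the training and test folds:

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Hypothetical rows; each (drug, polymer) pair defines one LAI formulation,
# observed at several release timepoints.
df = pd.DataFrame({
    "drug":    ["A", "A", "A", "B", "B", "B", "A", "A", "C", "C", "C", "C"],
    "polymer": ["P1", "P1", "P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2", "P1", "P1"],
    "time":    [1, 7, 14, 1, 7, 14, 1, 7, 1, 7, 1, 7],
})

# One group label per drug-polymer combination.
groups = df["drug"] + "|" + df["polymer"]

for train_idx, test_idx in GroupKFold(n_splits=3).split(df, groups=groups):
    train_groups = set(groups.iloc[train_idx])
    test_groups = set(groups.iloc[test_idx])
    # No formulation leaks across the split.
    assert train_groups.isdisjoint(test_groups)
```

Splitting on raw rows instead of groups would put different timepoints of the same formulation in both folds, inflating the apparent accuracy.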
Can the authors provide a tsv file instead of the excel files (same matrix employed to train the ML models)? Ideally, it should be released in zenodo to be used as a benchmark for future studies. The GitHub repo should be restructured so that there are directories (Python modules) where all the models are located.

RESPONSE:
We have provided a tsv file in the GitHub repo and restructured the directories in the repo. Upon acceptance, we will also release the data file on Zenodo.
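For readers converting their own copies of the matrix, the Excel-to-TSV step is a one-liner in pandas. A self-contained sketch (with made-up columns standing in for the real feature matrix, and an in-memory buffer in place of the actual files):

```python
import io

import pandas as pd

# Made-up columns standing in for the real (drug, polymer, release) matrix.
df = pd.DataFrame({
    "drug": ["A", "B"],
    "polymer": ["P1", "P2"],
    "release_pct": [12.5, 40.0],
})

buf = io.StringIO()
df.to_csv(buf, sep="\t", index=False)   # tab-separated, no row-index column
buf.seek(0)

# A TSV round-trips losslessly for a plain numeric/text matrix.
roundtrip = pd.read_csv(buf, sep="\t")
assert roundtrip.equals(df)
```

For the real files, `pd.read_excel("….xlsx")` (which requires `openpyxl`) followed by `df.to_csv("….tsv", sep="\t", index=False)` does the same job.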

Page 7 line 62. "As well as" shouldn't be preceded by a comma.

RESPONSE:
All of the commas preceding "as well as" have been removed in the updated version.
The validation proposed by the authors is strong but it could be improved if the authors conduct a nested-CV approach and report the values of the test dataset as well. Then the best hyperparameters would be tested in the external dataset (Figure 1). It would be important to show that the most optimal hyperparameters match what has been reported now with a single CV.

RESPONSE:
Based on this feedback, we have implemented a nested cross-validation approach to train the models (we have also included some additional features, as per other feedback). The combination of the two has significantly improved the accuracy of the models versus the previous leave-one-group-out cross-validation approach with fewer features. The manuscript (and GitHub repo) has been updated to describe this new training approach and its results.
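The nested loop discussed here can be sketched as follows. This is an illustration on synthetic data, not the study's actual pipeline: a random forest, a small grid, and grouped splitters are all assumptions. Hyperparameters are tuned in the inner loop on training groups only, and the outer loop scores the tuned model on held-out groups:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))              # placeholder input features
y = rng.normal(size=60)                   # placeholder release values
groups = np.repeat(np.arange(12), 5)      # 12 hypothetical drug-polymer groups

outer_scores = []
for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups):
    # Inner loop: hyperparameter search restricted to the training groups.
    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid={"n_estimators": [25, 50]},
        cv=GroupKFold(n_splits=3),
    )
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
    # Outer loop: unbiased score of the tuned model on held-out groups.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

print(len(outer_scores))  # one generalization estimate per outer fold
```

Because the test groups never influence the hyperparameter choice, the outer-fold scores are an honest estimate of generalization, unlike a single CV that both tunes and evaluates on the same folds.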

RESPONSE:
This has been corrected, and the figure updated with the nested cross-validation data.
Given the small amount of data points, it is not surprising that NN do not perform well here. Also, given that most of the data points contain variables that are properties of the chemical, tree-based models (using thresholds) are likely to be the best performing models. This should be discussed in the conclusion.

RESPONSE: This has been added to the conclusions.
Also, which other variables could be added and why they couldn't be added in the current study.

RESPONSE:
We had initially considered including initial drug release values (up to one day) as input features in these models to potentially improve performance. These were not included in the previous versions of the models, as there were no consistent drug release timepoints reported in the literature data that we collected. However, we have since imputed the missing values and included some initial drug release timepoints, which (combined with the nested cross-validation approach) has significantly improved the performance of the models.
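One simple form such an imputation can take (not necessarily the exact scheme used in the study) is linear interpolation along the time axis of each release profile; a minimal sketch on a made-up profile with a missing day-1 measurement:

```python
import numpy as np
import pandas as pd

# Made-up release profile with a missing day-1 measurement.
profile = pd.DataFrame({
    "time_days": [0, 1, 3, 7, 14],
    "release_pct": [0.0, np.nan, 15.0, 30.0, 55.0],
})

# Interpolate against the time axis (method="index"), not the row position,
# so unevenly spaced timepoints are weighted correctly.
filled = (
    profile.set_index("time_days")["release_pct"]
    .interpolate(method="index")
)
print(filled.loc[1])  # -> 5.0 (one third of the way from 0.0 to 15.0)
```

In practice this would be applied per drug-polymer group, and only to the early timepoints where a value is bracketed by real measurements.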
Do drug_TPSA and Drug_NHA values come from RDKit predictions or were they collected from studies? Maybe the predictions are wrong and that's why they are irrelevant.

RESPONSE:
Yes, both values are generated by RDKit, and we agree with the reviewer that these predictions may be inaccurate, which could explain why they appear irrelevant. This is now mentioned in the results and discussion section.
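For reference, the two descriptors in question can be recomputed directly from a SMILES string. A short sketch, assuming RDKit is installed and using aspirin purely as an example molecule:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def drug_descriptors(smiles: str) -> dict:
    """Compute the two descriptors discussed above from a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return {
        "Drug_TPSA": Descriptors.TPSA(mol),          # topological polar surface area (A^2)
        "Drug_NHA": Descriptors.NumHAcceptors(mol),  # number of hydrogen-bond acceptors
    }


# Aspirin, purely as an illustration; its TPSA should be about 63.6 A^2.
print(drug_descriptors("CC(=O)Oc1ccccc1C(=O)O"))
```

Comparing a few such computed values against experimentally reported ones would show whether descriptor error, rather than true irrelevance, explains their low importance.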

Conclusion
Why is it hard to collect more data points than the current ones? In other words, why are there only a few drugs that have been used for training the ML models and not over 100 for instance?
RESPONSE: Data quality is the major issue here, rather than data quantity. As mentioned in the previous response, we encountered a lot of "missing data" when collecting data from the literature. The field seems to be limited by a lack of standardization and consistency in reporting results. In our case, this led to many potential manuscripts being excluded from our dataset because they were missing some key feature of the drug delivery systems. For example, many manuscripts do not disclose important properties of the LAI systems, including polymer properties (such as molecular weight, lactic acid to glycolic acid ratio, drug loading levels, particle size measurements, etc.). Please see the embedded table below as an example (i.e., a non-exhaustive snapshot of missing LAI data from a Web of Science literature search). Unlike the aforementioned initial drug release timepoints, these polymer properties are not easy to impute. We are currently writing a perspective article on this issue; the table will be included in that article, not in the current manuscript.

Reviewer #3 (Remarks to the Author):
The authors present different machine learning approaches for the design of long-acting injectables. The paper is very well written and structured in a very clean way. All data is nicely presented and discussed.
In principle, I like the study and the conclusions.
However, in my opinion, the study in its current form is not suitable for a journal like Nature Communications, as no experimental results or prospective designs are included. Parts of this are circumvented by using an external validation set. Still, no design ideas or conclusions are drawn from the results. For example, the SHAP analysis might be discussed or used to improve the properties of a compound, not just to visualize the importance of a descriptor.
At the current stage, it is a solely retrospective computational analysis. I recommend to either extend the study by experimental prospective analysis or submit to a more theoretical journal.

RESPONSE:
In the revised version of the manuscript, we have conducted some further model interpretation steps. Specifically, we have used the SHAP analysis to derive some key design criteria for "fast" and "slow" release PLGA systems. Based on these design criteria, we have prepared and characterized (experimentally) two new LAI systems for drugs that were not previously included in our dataset. Finally, we have measured the in vitro drug release of these new systems and compared the experimental release with the predicted release profiles generated by the trained model. Overall, these experimental and predicted drug release profiles were in good agreement with each other, and we found that the design criteria (which we derived from the SHAP analysis) provided excellent experimental parameters to design "fast" and "slow" release PLGA systems for new drugs.
Additionally, based on the feedback of Reviewer 1, we have re-trained the previous models via a nested cross-validation approach and included some additional input features for the LAI systems. The combination of the two has significantly improved the accuracy of the models versus the previous leave-one-group-out cross-validation approach (which had fewer input features). Thus, the SHAP analysis and design criteria that we have developed are based on these updated models.
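The interpretation workflow described above can be sketched in miniature. Since the SHAP library may not be available everywhere, the example below uses scikit-learn's permutation importance as a simpler, explicitly substituted stand-in for the SHAP ranking, on synthetic data where one feature dominates by construction:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Synthetic target dominated by feature 0 (standing in for, e.g., polymer MW).
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

model = RandomForestRegressor(random_state=0).fit(X, y)

# Permute each feature in turn and measure the drop in model score.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = result.importances_mean.argsort()[::-1]
print(ranking[0])  # -> 0: the dominant feature is recovered
```

A SHAP analysis goes further than this ranking by attributing each individual prediction to its features, which is what makes it usable for deriving directional design criteria ("increase X to slow release") rather than importance alone.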
Reviewer #1 (Remarks to the Author):
The authors have addressed all my comments. I hope that the data file will be uploaded to Zenodo.
Reviewer #3 (Remarks to the Author):
Thanks a lot for the revised manuscript. The manuscript has greatly improved, and the extended studies add a lot of value to the new version. The prospective example shows the potential of the method and how interpretations from SHAP can be used to design new systems. The study is very interesting and should be published in Nature Communications.