Figure 4 | Scientific Reports

Figure 4

From: Machine learning for comprehensive forecasting of Alzheimer’s Disease progression

Figure 4

Synthetic data is challenging to distinguish from real data. The AUC of a logistic regression model trained at each time point to distinguish between real and synthetic data is shown. At each time point a dataset is formed comprised of real and synthetic patient data in which the baseline data for each group is the same. A logistic regression model is trained to separate these two groups, and the performance using the AUC metric is estimated with 5-fold cross validation. This procedure is repeated many (100) times and the average AUC is shown along with the standard deviation. Finally, this entire method is repeated for each CV fold (which are all shown). To handle missing data mean imputation is used, with the corresponding entries in the synthetic data also assigned the same values. The performance of the logistic regression at each time point is consistent with statistically indistinguishable real and synthetic data.

Back to article page