Yet Another Automated Gleason Grading System (YAAGGS) by weakly supervised deep learning

The Gleason score contributes significantly in predicting prostate cancer outcomes and selecting the appropriate treatment option, which is affected by well-known inter-observer variations. We present a novel deep learning-based automated Gleason grading system that does not require extensive region-level manual annotations by experts and/or complex algorithms for the automatic generation of region-level annotations. A total of 6664 and 936 prostate needle biopsy single-core slides (689 and 99 cases) from two institutions were used for system discovery and validation, respectively. Pathological diagnoses were converted into grade groups and used as the reference standard. The grade group prediction accuracy of the system was 77.5% (95% confidence interval (CI): 72.3–82.7%), the Cohen’s kappa score (κ) was 0.650 (95% CI: 0.570–0.730), and the quadratic-weighted kappa score (κquad) was 0.897 (95% CI: 0.815–0.979). When trained on 621 cases from one institution and validated on 167 cases from the other institution, the system’s accuracy reached 67.4% (95% CI: 63.2–71.6%), κ 0.553 (95% CI: 0.495–0.610), and the κquad 0.880 (95% CI: 0.822–0.938). In order to evaluate the impact of the proposed method, performance comparison with several baseline methods was also performed. While limited by case volume and a few more factors, the results of this study can contribute to the potential development of an artificial intelligence system to diagnose other cancers without extensive region-level annotations.


Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code Data collection The datasets used in this study are not publicly available at this moment due to data usage agreement restrictions. The de-identification process and usage of slides have been approved by the respective Institutional Review Boards (IRBs) of KUGH and HUMC.

Data analysis
To assess the performance of the second stage model as a grade group predictor, Cohen's kappa score was measured between the model output and the reference standard on the validation dataset, both with and without quadratic weighting. The grade group prediction accuracy was also assessed. Confidence intervals for kappa statistics were computed based on the equation presented by McHugh.For other performance indices, such as accuracy, the normal approximation of the binomial confidence interval was used.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability The source codes used in this study are copyrighted by Deep Bio Inc. and are available only under appropriate license agreement with the copyright holder.

nature research | reporting summary
April 2020 Field-specific reporting Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. (1) slides for a tissue biopsied from organs other than the prostate or surrounding tissues, (2) immunohistochemistry or special stains slides, (3) slides with inadequate quality for pathologic diagnosis, including severe out of focusing or indelible markings.

Replication
The first stage model was trained to classify input patch images into two classes, benign and cancer. The MIL method was used to train the first stage model, as in another study.21 We adopted DenseNet-121 as our first stage model architecture and used the ImageNet pre-trained weight parameters as the initial model parameter values. The condition of the first stage model training is as follows: Initial learning rate was 0.01. The optimizer used was stochastic gradient descent (SGD) optimizer with 0.9 momentum and 1e-5 weight decay. The learning rate scheduler was a step learning rate scheduler with 25 epoch step size and 0.1 decay rate. The total training epoch was 100 epochs and mini batch size was 128. Applied data augmentations were random horizontal flip, random 90° rotation, and random change of the brightness, contrast, saturation, and hue as amounts of ±0.1, ±0.3, ±0.3, and ±0.05, respectively. The model with the best PR AUC score at tuning set was chosen. After the training step, the last hidden layer output of the model was tapped to obtain 1024-dimensional feature vectors for the input patch images.
To train the second stage model, we converted WSIs in the discovery set into feature maps, as described above, using the first stage model. More specifically, patch images of 360 × 360 pixel size in three channels were converted into images of 1 × 1 pixel size in 1024 channels, resulting in the width and height of each WSI resized by 1/360. Accordingly, WSIs equal to or less than 2 × 2 cm2 can be safely converted into 1024-channel 64 × 64 pixel-sized feature maps. The second stage model was trained to classify these feature maps into grade groups, based on the converted discovery set. The architecture of the second stage model consists of two parts. The front part is composed of five layers of a 1 × 1 kernel convolution with batch-normalization and rectified linear unit (ReLU) non-linear activation, and the second part is composed of 16 blocks of convolution layer with the residual connection. The exact configuration of the second-stage model and the residual block is presented in Supplementary Table 1  and Supplementary Table 2, respectively. In training the second stage model, we minimize the weighted cross entropy loss as the objective function, as a class imbalance exists in the discovery dataset. The weight parameter was 1, 1, 1.5, 1.4, 1.7, and 1.6 for each class from 'benign' to 'grade group 5'. The training condition of the second stage model was as follows. The initial learning rate was 0.1. The SGD optimizer was used with 0.9 momentum and 1e-5 weight decay. The learning rate scheduler used was a step learning rate scheduler with 40 epoch step size and 0.1 decay rate. The total training epoch was 150 epochs and mini batch size was 256. Applied data augmentations were random shift with -5 to 5 pixel range horizontally and vertically, random vertical flip, and random 90° rotation. The model with the best Kappa quad score at tuning set was selected. Figure 4 depicts the entire training process.
Randomization From the total dataset, 99 cases (936 WSIs) randomly selected per category per institution were set aside for the validation set. The remaining 689 cases (6,664 WSIs) were used for discovery.

Behavioural & social sciences study design
All studies must disclose on these points even when the disclosure is negative.

Study description
Briefly describe the study type including whether data are quantitative, qualitative, or mixed-methods (e.g. qualitative cross-sectional, quantitative experimental, mixed-methods case study).

Data collection
Provide details about the data collection procedure, including the instruments or devices used to record the data (e.g. pen and paper, computer, eye tracker, video or audio equipment) whether anyone was present besides the participant(s) and the researcher, and whether the researcher was blind to experimental condition and/or the study hypothesis during data collection.

Timing
Indicate the start and stop dates of data collection. If there is a gap between collection periods, state the dates for each sample cohort.

Data exclusions
If no data were excluded from the analyses, state so OR if data were excluded, provide the exact number of exclusions and the rationale behind them, indicating whether exclusion criteria were pre-established.

Non-participation
State how many participants dropped out/declined participation and the reason(s) given OR provide response rate OR state that no participants dropped out/declined participation.

Randomization
If participants were not allocated into experimental groups, state so OR describe how participants were allocated to groups, and if allocation was not random, describe how covariates were controlled.

Ecological, evolutionary & environmental sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sampling strategy
Note the sampling procedure. Describe the statistical methods that were used to predetermine sample size OR if no sample-size calculation was performed, describe how sample sizes were chosen and provide a rationale for why these sample sizes are sufficient.

Data collection
Describe the data collection procedure, including who recorded the data and how. Access & import/export Describe the efforts you have made to access habitats and to collect and import/export your samples in a responsible manner and in compliance with local, national and international laws, noting any permits that were obtained (give the name of the issuing authority, the date of issue, and any identifying information).

Disturbance
Describe any disturbance caused by the study and how it was minimized.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Authentication
Describe the authentication procedures for each cell line used OR declare that none of the cell lines used were authenticated.

Mycoplasma contamination
Confirm that all cell lines tested negative for mycoplasma contamination OR describe the results of the testing for mycoplasma contamination OR declare that the cell lines were not tested for mycoplasma contamination.

Commonly misidentified lines (See ICLAC register)
Name any commonly misidentified cell lines used in the study and provide a rationale for their use.

Specimen provenance
Provide provenance information for specimens and describe permits that were obtained for the work (including the name of the issuing authority, the date of issue, and any identifying information).

Specimen deposition
Indicate where the specimens have been deposited to permit free access by other researchers.

Dating methods
If new dates are provided, describe how they were obtained (e.g. collection, storage, sample pretreatment and measurement), where they were obtained (i.e. lab name), the calibration program and the protocol for quality assurance OR state that no new dates are provided.
Tick this box to confirm that the raw and calibrated dates are available in the paper or in Supplementary Information.

Ethics oversight
Identify the organization(s) that approved or provided guidance on the study protocol, OR state that no ethical approval or guidance was required and explain why not.
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Animals and other organisms
Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research

Laboratory animals
For laboratory animals, report species, strain, sex and age OR state that the study did not involve laboratory animals.

Wild animals
Provide details on animals observed in or captured in the field; report species, sex and age where possible.

Ethics oversight
Identify the organization(s) that approved or provided guidance on the study protocol, OR state that no ethical approval or guidance was required and explain why not.
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Ethics oversight
Identify the organization(s) that approved the study protocol.
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Clinical data
Policy information about clinical studies All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions.
Clinical trial registration Provide the trial registration number from ClinicalTrials.gov or an equivalent agency.

Study protocol
Note where the full trial protocol can be accessed OR if not available, explain why.

Data collection
Describe the settings and locales of data collection, noting the time periods of recruitment and data collection.

Outcomes
Describe how you pre-defined primary and secondary outcome measures and how you assessed these measures.

Dual use research of concern
Policy information about dual use research of concern

Hazards
Could the accidental, deliberate or reckless misuse of agents or technologies generated in the work, or the application of information presented in the manuscript, pose a threat to:

Experiments of concern
Does the work involve any of these experiments of concern: No Yes Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

nature research | reporting summary
April 2020

Data access links
May remain private before publication.
For "Initial submission" or "Revised version" documents, provide reviewer access links. For your "Final submission" document, provide a link to the deposited data.