Artificial intelligence-based assessment of PD-L1 expression in diffuse large B cell lymphoma

Diffuse large B cell lymphoma (DLBCL) is an aggressive blood cancer known for its rapid progression and high incidence. The growing use of immunohistochemistry (IHC) has contributed significantly to detailed cell characterization and thus plays a crucial role in guiding treatment strategies for DLBCL. In this study, we developed an artificial intelligence (AI)-based image analysis approach for assessing PD-L1 expression in DLBCL patients. PD-L1 expression is a major biomarker for selecting patients who may benefit from targeted immunotherapy. In particular, we performed large-scale cell annotations on IHC slides, encompassing over 5,101 tissue regions and 146,439 live cells. Extensive experiments in the primary and validation cohorts demonstrated that the defined quantitative rule helped overcome the difficulty of identifying specific cell types. For fine needle biopsies, agreement in the quantitative results between the AI algorithm and pathologists, as well as among the pathologists themselves, was higher than for surgical specimens. We highlight that AI-enabled analytics enhance the objectivity and interpretability of PD-L1 quantification, with the aim of improving the development of targeted immunotherapy for DLBCL patients.
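The abstract compares agreement between the AI algorithm and pathologists, and among the pathologists themselves, separately for fine needle biopsies and surgical specimens. A minimal sketch of such a stratified comparison is shown here, assuming mean absolute score difference as the agreement measure (lower means better agreement); the metric actually used is not named in this summary, and the dictionary keys (`specimen`, `ai`, `pathologists`) are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_abs_diff(raters):
    """Mean absolute score difference over all rater pairs and all samples.

    raters: dict mapping a rater name to a list of PD-L1 scores (%), with every
    list in the same sample order. Requires at least two raters.
    Lower values mean higher agreement.
    """
    diffs = [
        abs(a - b)
        for r1, r2 in combinations(raters, 2)
        for a, b in zip(raters[r1], raters[r2])
    ]
    return mean(diffs)


def agreement_by_specimen(samples):
    """Summarize agreement separately for each specimen type.

    samples: list of dicts with hypothetical keys 'specimen' (e.g. 'biopsy' or
    'surgical'), 'ai' (algorithm score) and 'pathologists' (tuple of scores).
    """
    summary = {}
    for specimen in sorted({s["specimen"] for s in samples}):
        group = [s for s in samples if s["specimen"] == specimen]
        n_raters = len(group[0]["pathologists"])
        per_rater = {
            f"pathologist_{i + 1}": [s["pathologists"][i] for s in group]
            for i in range(n_raters)
        }
        summary[specimen] = {
            # Disagreement among the pathologists themselves.
            "among_pathologists": mean_pairwise_abs_diff(per_rater),
            # Disagreement between the AI score and the pathologists' mean score.
            "ai_vs_pathologist_mean": mean(
                abs(s["ai"] - mean(s["pathologists"])) for s in group
            ),
        }
    return summary
```

An agreement statistic such as the intraclass correlation coefficient could be substituted for the mean absolute difference without changing the stratification logic.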


Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences
Behavioural & social sciences
Ecological, evolutionary & environmental sciences

For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
For the primary cohort, we enrolled 220 patients with DLBCL diagnosed or treated at Ruijin Hospital from June 2019 to June 2020. All 220 whole-slide images (WSIs) were of PD-L1-immunostained slides. For the validation cohort, 61 PD-L1-stained WSIs were collected from the North Branch of Shanghai Ruijin Hospital. Three pathologists from Ruijin Hospital independently scored each sample for PD-L1 quantification. We consolidated their results and calculated the mean and median score for each sample. By comparing the results of each pathologist, as well as their mean and median, with the algorithm's results, we evaluated the consistency between the pathologists' and the algorithm's assessments. All data were used, and after review by the three pathologists, the sample size was deemed sufficient for the experiment.
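The comparison described above (each pathologist, plus the per-sample mean and median of the three pathologists, against the algorithm) can be sketched as follows. The choice of Pearson correlation and mean absolute difference as comparison measures is an assumption for illustration, as is the function name `compare_with_references`; the summary does not state which statistics were computed.

```python
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Raises ZeroDivisionError if either list is constant.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def mean_abs_diff(xs, ys):
    """Average absolute difference between paired scores."""
    return mean(abs(x - y) for x, y in zip(xs, ys))


def compare_with_references(pathologist_scores, ai_scores):
    """Compare AI scores with each pathologist and with their per-sample mean/median.

    pathologist_scores: one (p1, p2, p3) tuple of PD-L1 scores per sample.
    ai_scores: the algorithm's score for each sample, in the same order.
    """
    references = {
        f"pathologist_{i + 1}": [s[i] for s in pathologist_scores] for i in range(3)
    }
    # Consensus references built from the three independent ratings.
    references["mean"] = [mean(s) for s in pathologist_scores]
    references["median"] = [median(s) for s in pathologist_scores]
    return {
        name: {
            "pearson_r": pearson_r(ref, ai_scores),
            "mean_abs_diff": mean_abs_diff(ref, ai_scores),
        }
        for name, ref in references.items()
    }
```

Using the median alongside the mean limits the influence of a single outlying rating on the consensus reference.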
Data exclusions
Because 6 slides had insufficient cell counts for tumor proportion score (TPS) calculation, 214 slides were ultimately included in the outcome assessment, although the whole pipeline was applied to all slides in the primary cohort.
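The exclusion rule can be illustrated with a small sketch in which every slide is processed but only slides with enough cells receive a TPS. It assumes the conventional TPS definition (percentage of viable tumor cells with PD-L1 membrane staining) and a hypothetical minimum of 100 viable tumor cells per slide (`MIN_TUMOR_CELLS`); neither the exact quantitative rule nor the cut-off used in this study is stated here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical minimum number of viable tumor cells required to report a TPS;
# the reporting summary does not state the threshold that was actually used.
MIN_TUMOR_CELLS = 100


@dataclass
class SlideCounts:
    slide_id: str
    positive_tumor_cells: int  # viable tumor cells with PD-L1 membrane staining
    total_tumor_cells: int  # all viable tumor cells detected on the slide


def tumor_proportion_score(counts: SlideCounts) -> Optional[float]:
    """Return TPS as a percentage, or None when there are too few cells to score."""
    if counts.total_tumor_cells < MIN_TUMOR_CELLS:
        return None
    return 100.0 * counts.positive_tumor_cells / counts.total_tumor_cells


def score_cohort(slides: List[SlideCounts]) -> Tuple[Dict[str, float], List[str]]:
    """Run scoring on every slide and split results into scored and excluded slides."""
    scored: Dict[str, float] = {}
    excluded: List[str] = []
    for slide in slides:
        tps = tumor_proportion_score(slide)
        if tps is None:
            excluded.append(slide.slide_id)
        else:
            scored[slide.slide_id] = tps
    return scored, excluded
```

Keeping excluded slides in a separate list, rather than dropping them silently, makes the count of exclusions directly reportable.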

Replication
To ensure the reproducibility of our results, we conducted five-fold cross-validation experiments under consistent experimental conditions. We also documented all experimental steps and parameter settings in detail, using standardized protocols so that other researchers can replicate these experiments. All experiments underwent rigorous statistical analysis to confirm the consistency of the findings.
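One way to keep five-fold cross-validation splits identical across runs is to fix the random seed, as in this sketch. It assumes splitting at the slide level and uses a hypothetical function `five_fold_splits` with a `seed` parameter; the summary does not specify whether folds were formed per slide, per patient or per annotated region, nor which seed or library was used.

```python
import random


def five_fold_splits(slide_ids, seed=0):
    """Yield (train_ids, test_ids) pairs for five-fold cross-validation.

    Sorting before shuffling with a seeded generator makes the folds
    identical across runs, regardless of the input order.
    """
    k = 5
    ids = sorted(slide_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # round-robin assignment to 5 folds
    for i in range(k):
        test_ids = folds[i]
        train_ids = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train_ids, test_ids
```

Each ID appears in exactly one test fold, and fold sizes differ by at most one. If a patient can contribute several slides, splitting by patient ID instead would prevent the same patient from appearing in both training and test folds.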

nature portfolio | reporting summary
April 2023

Blinding
All patient data were analyzed anonymously. No additional blinding was performed.
Behavioural & social sciences study design
All studies must disclose on these points even when the disclosure is negative.

Study description
Briefly describe the study type including whether data are quantitative, qualitative, or mixed-methods (e.g. qualitative cross-sectional, quantitative experimental, mixed-methods case study).

Research sample
State the research sample (e.g. Harvard university undergraduates, villagers in rural India) and provide relevant demographic information (e.g. age, sex) and indicate whether the sample is representative. Provide a rationale for the study sample chosen. For studies involving existing datasets, please describe the dataset and source.

Sampling strategy
Describe the sampling procedure (e.g. random, snowball, stratified, convenience). Describe the statistical methods that were used to predetermine sample size OR if no sample-size calculation was performed, describe how sample sizes were chosen and provide a rationale for why these sample sizes are sufficient. For qualitative data, please indicate whether data saturation was considered, and what criteria were used to decide that no further sampling was needed.

Data collection
Provide details about the data collection procedure, including the instruments or devices used to record the data (e.g. pen and paper, computer, eye tracker, video or audio equipment), whether anyone was present besides the participant(s) and the researcher, and whether the researcher was blind to experimental condition and/or the study hypothesis during data collection.

Timing
Indicate the start and stop dates of data collection. If there is a gap between collection periods, state the dates for each sample cohort.

Data exclusions
If no data were excluded from the analyses, state so OR if data were excluded, provide the exact number of exclusions and the rationale behind them, indicating whether exclusion criteria were pre-established.

Non-participation
State how many participants dropped out/declined participation and the reason(s) given OR provide response rate OR state that no participants dropped out/declined participation.

Randomization
If participants were not allocated into experimental groups, state so OR describe how participants were allocated to groups, and if allocation was not random, describe how covariates were controlled.

Ecological, evolutionary & environmental sciences study design
All studies must disclose on these points even when the disclosure is negative.

Study description
Briefly describe the study. For quantitative data include treatment factors and interactions, design structure (e.g. factorial, nested, hierarchical), nature and number of experimental units and replicates.

Sampling strategy
Note the sampling procedure. Describe the statistical methods that were used to predetermine sample size OR if no sample-size calculation was performed, describe how sample sizes were chosen and provide a rationale for why these sample sizes are sufficient.

Data collection
Describe the data collection procedure, including who recorded the data and how.
Timing and spatial scale
Indicate the start and stop dates of data collection, noting the frequency and periodicity of sampling and providing a rationale for these choices. If there is a gap between collection periods, state the dates for each sample cohort. Specify the spatial scale from which the data are taken.

Data exclusions
If no data were excluded from the analyses, state so OR if data were excluded, describe the exclusions and the rationale behind them, indicating whether exclusion criteria were pre-established.

Reproducibility
Describe the measures taken to verify the reproducibility of experimental findings. For each experiment, note whether any attempts to repeat the experiment failed OR state that all attempts to repeat the experiment were successful.

Randomization
Describe how samples/organisms/participants were allocated into groups. If allocation was not random, describe how covariates were controlled. If this is not relevant to your study, explain why.

Blinding
Describe the extent of blinding used during data acquisition and analysis. If blinding was not possible, describe why OR explain why blinding was not relevant to your study.

Did the study involve field work?
Yes No

Field work, collection and transport

Field conditions
Describe the study conditions for field work, providing relevant parameters (e.g. temperature, rainfall).

Location
State the location of the sampling or experiment, providing relevant parameters (e.g. latitude and longitude, elevation, water depth).
Access & import/export
Describe the efforts you have made to access habitats and to collect and import/export your samples in a responsible manner and in compliance with local, national and international laws, noting any permits that were obtained (give the name of the issuing authority, the date of issue, and any identifying information).

Disturbance
Describe any disturbance caused by the study and how it was minimized.
Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study.

Graph analysis
Report the dependent variable and connectivity measure, specifying weighted graph or binarized graph, subject- or group-level, and the global and/or node summaries used (e.g. clustering coefficient, efficiency, etc.).
Multivariate modeling and predictive analysis
Specify independent variables, feature extraction and dimension reduction, model, training and evaluation metrics.
If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

If new dates are provided, describe how they were obtained (e.g. collection, storage, sample pretreatment and measurement), where they were obtained (i.e. lab name), the calibration program and the protocol for quality assurance OR state that no new dates are provided.
Tick this box to confirm that the raw and calibrated dates are available in the paper or in Supplementary Information.

Define your software and/or method and criteria for volume censoring, and state the extent of such censoring.

Specify type (mass univariate, multivariate, RSA, predictive, etc.) and describe essential details of the model at the first and second levels (e.g. fixed, random or mixed effects; drift or auto-correlation).

Define precise effect in terms of the task or stimulus conditions instead of psychological concepts and indicate whether ANOVA or factorial designs were used.

Correction
Describe the type of correction and how it is obtained for multiple comparisons (e.g. FWE, FDR, permutation or Monte Carlo).

Report the measures of dependence used and the model details (e.g. Pearson correlation, partial correlation, mutual information).