Availability of ChatGPT to provide medical information for patients with kidney cancer

ChatGPT is an advanced natural language processing (NLP) technology that produces text closely resembling human language. We evaluated whether ChatGPT could help patients understand kidney cancer and whether it could replace consultations with urologists. Two urologists developed ten questions commonly asked by patients with kidney cancer, and answers to these questions were generated using ChatGPT. The five-dimension SERVQUAL model was used to assess the service quality of ChatGPT. The survey was distributed to 103 urologists via email; of these, twenty-four urological oncologists who manage more than 20 kidney cancer cases in clinic per month were classified as experts. All respondents were physicians. We received 24 responses to the email survey (response rate: 23.3%). The appropriateness rate for all ten answers exceeded 60%. The answer to Q2 (etiology of kidney cancer) received the highest agreement (91.7%), whereas the answer to Q8 (comparison with other cancers) received the lowest (62.5%). The experts gave significantly lower ratings than non-experts (44.4% vs. 93.3%, p = 0.028) in the SERVQUAL assurance dimension (certainty of the answers overall). Positive scores for the overall understandability of the ChatGPT answers were assigned by 54.2% of respondents, and 70.8% said that ChatGPT could not replace explanations provided by urologists. Our findings affirm that although ChatGPT answers to kidney cancer questions are generally accessible, they should not supplant counseling by a urologist.


Materials and methods
The study utilized ChatGPT, a language model developed by OpenAI (San Francisco, California, USA) based on the GPT-3.5 architecture (knowledge cutoff: September 2021). A set of ten English-language questions was designed by two urologists from a university hospital. The questions were formulated by referencing the 'People also ask' section displayed when searching for kidney cancer on Google.com™ and by collaborating to generate questions commonly asked by outpatients, while avoiding redundancy across the overall categories. The questions addressed various aspects of kidney cancer, including symptoms, causes, treatment methods, prevention strategies, genetic effects, incidence rates, treatment of metastatic cancer, differences from other cancers, survival rates, and recurrence rates. The list of questions is presented in Table 1, and the ChatGPT-derived responses are displayed in Fig. 1.
We aimed to measure the quality of the ChatGPT answers using simple SERVQUAL questions. The SERVQUAL model covers five dimensions (tangibility, reliability, responsiveness, assurance, and empathy), and respondents rated the ChatGPT answers in each dimension on a scale of 1-5. We adapted the original SERVQUAL model, which was designed for assessing service levels, to make it suitable for evaluating online services. Each category was restructured into five items (very high/high/normal/low/very low) to facilitate a more fitting evaluation. The five SERVQUAL questions were attached to the end of the survey. Moreover, two comprehensive assessment questions were incorporated to assess whether the ChatGPT responses were understandable to patients and whether they could replace the explanations provided by urologists. The survey, which included the ChatGPT answers and the quality assessment, was distributed via email to 103 urologists, of whom 24 were experts. Experts were defined as urologists affiliated with the Korean Urological Oncology Society and the Korean Renal Cancer Research Society (KRoCS) who manage more than 20 kidney cancer cases in clinic per month. KRoCS members are professors of urology specializing in the treatment of kidney cancer at university hospitals who have published research [9][10][11]. All respondents were physicians, and responses were collected using Google Forms (https://docs.google.com/forms).
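The adapted SERVQUAL scoring described above maps each five-item response (very low through very high) to a 1-5 score and averages per dimension. A minimal sketch of that scoring follows; the dimension names are the standard SERVQUAL ones, and all respondent data shown are hypothetical, not the study's actual responses:

```python
# Sketch of the adapted SERVQUAL scoring described above.
# Dimension names follow the standard model; all response data are hypothetical.
from statistics import mean

DIMENSIONS = ["tangibility", "reliability", "responsiveness", "assurance", "empathy"]

# Five-item scale: very low (1) ... very high (5)
SCALE = {"very low": 1, "low": 2, "normal": 3, "high": 4, "very high": 5}

def dimension_means(responses):
    """Average each SERVQUAL dimension over all respondents.

    responses: list of dicts mapping dimension name -> label on the five-item scale.
    """
    return {dim: mean(SCALE[r[dim]] for r in responses) for dim in DIMENSIONS}

# Two hypothetical respondents
survey = [
    {"tangibility": "high", "reliability": "normal", "responsiveness": "low",
     "assurance": "high", "empathy": "normal"},
    {"tangibility": "very high", "reliability": "high", "responsiveness": "normal",
     "assurance": "normal", "empathy": "high"},
]
print(dimension_means(survey))
```

With real survey data, the resulting dimension averages correspond to the per-dimension scores reported in Table 3.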
All statistical analyses were performed using SPSS software (version 27.0; Statistical Package for the Social Sciences, Chicago, IL, USA). Student's t-test was used to compare means between the expert and non-expert groups, and statistical significance was set at p < 0.05.
The Institutional Review Board (IRB) of Chung-Ang University Gwangmyeoung Hospital approved this study (approval number: 2310-112-114). Because of the study's retrospective nature, the need for informed consent was waived by the IRB based on the United States Department of Health and Human Services code 46.116 on requirements for informed consent. The study was conducted according to the ethical standards of the 1964 Declaration of Helsinki and its later amendments.

Results
There were 24 responses to the e-mail survey, with a response rate of 23.3%. The demographic characteristics of the respondents are presented in Table 2. Notably, nine experts reported performing over 20 kidney cancer surgeries per month. The answers to all ten questions are provided in Supplementary Table 1.
The overall positive evaluation rate of the urologists across all ten answers was 77.9%, ranging from 62.5% to 91.7%, as illustrated in Fig. 2. The answer to question 2, which asked about the causes of kidney cancer, received the highest positive evaluation rate (91.7%), whereas answer 8, pertaining to the differences between kidney cancer and other types of cancer, received the lowest (62.5%). Notably, eight of the ten answers achieved a positive evaluation rate of ≥ 75%.
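The positive evaluation rate above is the share of respondents choosing one of the two agreement options on the four-option answer scale. A minimal sketch of that calculation, using hypothetical counts (the option labels are shortened paraphrases of the survey's four options):

```python
# Positive evaluation rate: fraction of respondents choosing one of the
# two agreement options. Counts below are hypothetical, not the study's data.
def positive_rate(counts):
    """counts: dict mapping the four answer options to respondent counts."""
    positive = counts["totally agree"] + counts["agree"]
    total = sum(counts.values())
    return 100.0 * positive / total

# Hypothetical distribution for one question among 24 respondents
q = {"totally agree": 10, "agree": 12, "insufficient": 2, "error": 0}
print(f"{positive_rate(q):.1f}%")  # 22 of 24 respondents -> 91.7%
```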
Applying the SERVQUAL model (Table 3), reliability (average score 3.4/5.0, p = 0.004) and responsiveness (average score 3.2/5.0, p < 0.001) scored lower than tangibility (average score 4.1/5.0). There was no statistical difference in the positive evaluation rate between expert and non-expert respondents for most of the survey. However, for the assurance dimension, which evaluated the certainty of the answers overall, the experts gave a markedly lower positive evaluation rate (44.4% vs. 93.3%, p = 0.028; Table 4).
In the comprehensive assessment, 54.2% of the respondents expressed a positive evaluation (indicating that the responses were better than normal) of ChatGPT's ability to provide comprehensible responses (Fig. 3). However, only 29.2% of the urologists believed that ChatGPT-derived responses could replace explanations provided by urologists.

Discussion
Our survey of urologists evaluating ChatGPT responses to questions about kidney cancer yielded an overall positive rating of 77.9%, with the highest positive evaluation rating (91.7%) for answers to questions such as "causes of kidney cancer," for which an internet search is feasible. However, ChatGPT received relatively low scores for reliability (3.4/5.0) and responsiveness to the latest insights (3.2/5.0). This reflects the fact that ChatGPT only incorporates knowledge up to September 2021 and cannot include the latest treatment trends. Furthermore, the gap between the approximately 80% positive evaluation rate and the roughly 30% of respondents who would accept ChatGPT as a surrogate for urologist explanations is substantial. This gap may be attributable to concerns regarding the physician's privileges or professional role, and follow-up research is necessary. Additionally, when comparing the responses of expert physicians (monthly caseload of 20 or more kidney cancer cases) and non-experts, no significant differences were observed except in the assurance dimension (certainty of the answers overall; Table 4).

ChatGPT is a convenient and powerful tool for providing medical information. It could potentially serve as a tool to provide clinical guidance to patients, suggest treatment options based on guidelines, and be utilized for medical education [12][13][14]. A few studies have used ChatGPT for the treatment or assessment of urological diseases. Coskun et al. assessed the quality of ChatGPT information on prostate cancer and demonstrated that the information was lacking in terms of accuracy and interpretation 15. To improve patients' deep understanding, the authors suggested the need for improved reliability, evidence-based information, understanding of patient emotions and experiences, and brevity. Davis et al.
examined the appropriateness of NLP for urological diseases and reported that the medical information provided by NLP has limitations 16. Urologists pointed out that vital information was missing from the content provided by ChatGPT. Despite these results, the use of ChatGPT is gradually expanding in the real world. Doubts regarding reliability likely stem from ChatGPT operating as a generative model that provides plausible responses in interactive situations. It utilizes natural language processing techniques to understand user input and generates responses based on training on a large-scale text dataset. Although GPT strives to learn patterns, context, and meaning to produce natural conversations, it does not always provide accurate or perfect answers because it relies on pre-trained data. In other words, ChatGPT does not generate "true knowledge-based answers" and does not take responsibility for its responses.
The use of AI or machine learning in urology is common. As ChatGPT is not an AI application trained on a specialized medical database, it may be inaccurate or misleading in answering medical questions 17. Howard et al. assessed infection consultations and the selection of antimicrobial agents, concluded that answers from ChatGPT were inadequate and inconsistent, and recommended qualitative modifications applicable to medical specialties 18. Zhou et al. assessed the appropriateness of ChatGPT in urology and reported that ChatGPT was generally consistent and well-aligned with the guidelines for urological diseases 19. Davis et al. investigated the appropriateness and readability of ChatGPT responses to urology-related medical inquiries 16. The authors used 18 urological questions based on Google Trends, covering the categories of malignancy, emergency, and benign diseases. They suggested that the vital information lacking in the ChatGPT answers was a limitation. Among the five dimensions of the SERVQUAL questions, only assurance demonstrated a significant difference between the experts and general urologists (p = 0.028). Most general urologists responded that the ChatGPT answers were reliable and convincing (93.3%); however, approximately 55.6% of the kidney cancer experts thought the answers were unreliable. The difference in assurance responses between the two groups likely stems from the specialized knowledge of the kidney cancer expert group. Although ChatGPT is sufficiently capable of delivering information about kidney cancer to patients, we suggest that it lacks specialized medical knowledge.
This study had several limitations. First, the low response rate (23.3%) and relatively small sample size are notable limitations. Additionally, only approximately half of the respondents were experts who performed more than 20 kidney cancer surgeries per month; therefore, the sample may not fully represent all urologists. Further research involving a larger group of expert respondents is required to address these limitations. Second, the ChatGPT responses tended to repetitively explain general information when answering the questions. Evaluating this aspect using the existing SERVQUAL model (tangibility, reliability, responsiveness, assurance, and empathy) may be inappropriate; evaluation metrics that specifically assess response specificity are required. Third, this study did not include other publicly available NLP models such as Bard (Google), Claude 2, or Llama 2; therefore, it is unclear whether the responses obtained reflect the general characteristics of all NLP technology models. Fourth, our survey only posed questions to physicians, excluding input from patients. Future research targeting patients could provide results that better reflect real-world practice. Lastly, GPT-4 was launched on March 14, 2023, subsequent to GPT-3.5, and is regarded as a more advanced model in terms of performance and capabilities. Although we recognize the widespread availability and enhanced accessibility of GPT-3.5 due to its free usage, we acknowledge that it may deliver less comprehensive information than the more advanced GPT-4. We also did not specifically verify the accuracy of the information provided by ChatGPT against source texts.
The application of ChatGPT in the medical or healthcare environment is currently in its nascent stages. Our findings shed light on the potential of AI-driven language models such as ChatGPT to assist in medical information dissemination while emphasizing the importance of maintaining the role of expert human healthcare providers in patient care and education.

Conclusions
According to the urologists surveyed, the ChatGPT answers to common questions regarding kidney cancer were widely understandable and accessible.However, most participants, particularly the group of experts who exhibited a lower level of consensus in the dimension of assurance, concluded that ChatGPT could not entirely substitute for the guidance of a urologist.

Figure 1 .
Figure 1. Flow chart of the study. The research process involved creating the questionnaire and validating it with two urologists. Following validation, the survey was distributed via e-mail.

Figure 2 .
Figure 2. Acceptance rates for the ChatGPT answers. The urologists' evaluations indicate that the positive evaluation rate (combined dark blue + light blue) exceeded 60% for all responses.

Figure 3 .
Figure 3. Comprehensive assessment of the ChatGPT answers. In the comprehensive assessment, the positive evaluation rate for understandability reached 87.5%, but 70.8% of respondents indicated that ChatGPT could not fully replace consultation with urologists.
Table 1 (fragment). Questions 7-10 and their categories:
7. I heard that kidney cancer has metastasized. What is the best treatment? Can it be completely cured? (Metastasis)
8. What is the difference between kidney cancer and other types of cancer? (Differences)
9. What is the survival rate of kidney cancer after treatment? (Survival rate)
10. What is the probability that kidney cancer will recur? (Probability)

Table 2 .
Demographic characteristics of the respondents.