Health equity means giving every individual a fair and just opportunity to attain their highest level of health [1]. Unfortunately, disparities in healthcare access and in the distribution of medical information remain significant barriers [2]. For the Hispanic community, particularly those who primarily speak Spanish, these barriers are often compounded by linguistic challenges that limit access to essential healthcare information [3–6]. A recent study examined trends in poor health indicators among Black and Hispanic middle-aged and older adults in the United States from 1999 to 2018 [7]. The study found that, while Hispanics showed overall improvements in physical inactivity and perceived poor health, they experienced deterioration in hypertension and diabetes rates. Notably, it reported no significant change in the Hispanic-White gap for kidney disease over the 20-year period, indicating that the disparity in this condition did not improve [7]. In kidney transplantation, where understanding complex medical information is crucial, the language barrier presents a substantial obstacle [8,9]. The Hispanic population is disproportionately affected by kidney disease, including a higher prevalence of conditions leading to kidney failure. According to epidemiological studies, Hispanics are more likely to develop end-stage kidney disease (ESKD) than non-Hispanic whites [10–12].

Additionally, they face longer waiting times for kidney transplants and lower rates of referral for transplant evaluation [9,13]. These disparities can be attributed to several factors, including language barriers that impede effective communication between healthcare providers and patients and lead to misunderstandings, missed appointments, and incomplete or inaccurate medical documentation. Moreover, the lack of culturally and linguistically appropriate health information contributes to lower health literacy in this population, further complicating navigation of the transplant referral and evaluation process [3–6].

The provision of culturally appropriate health information is crucial in managing and treating chronic conditions like kidney disease. Culturally sensitive information takes into account not just the language but also the cultural beliefs, practices, and values of a community [14–16]. This approach is particularly important in the Hispanic community, where cultural nuances play a vital role in health-related decision-making. Additionally, language barriers can significantly affect the quality of healthcare received by non-English-speaking patients [17,18]. In the United States, a considerable portion of the Hispanic population has limited English proficiency, making it challenging for them to access and understand health information in English [18,19]. Closing this gap is not just a matter of translation; it requires conveying complex medical concepts in a linguistically and culturally appropriate manner.

Artificial intelligence, particularly advanced language models like ChatGPT 3.5 and 4.0, presents an innovative approach to addressing language barriers in healthcare [20–28]. These AI models have the potential to translate complex medical information accurately and sensitively, thereby making it accessible to a wider audience [29–36]. In kidney transplantation, where detailed and accurate information is critical, AI-driven translation could be transformative, offering a significant advance in how medical information is communicated to non-English-speaking populations.

The primary objective of this study is to evaluate the effectiveness of ChatGPT 3.5 and 4.0 in translating kidney transplantation-related FAQs from English to Spanish for the Hispanic community. The study focuses on the accuracy and cultural sensitivity of these translations, assessing whether these AI tools can provide reliable, comprehensible, and culturally appropriate medical information. In doing so, it seeks to determine the potential of AI to improve health information accessibility and contribute to health equity for Spanish-speaking Hispanics.

Materials and methods

Data collection

This study performed English-to-Spanish translation of 54 frequently asked questions (FAQs) regarding kidney transplantation. The FAQs were selected to comprehensively represent the topics relevant to patients considering or undergoing kidney transplantation. They were obtained from (1) the Organ Procurement and Transplantation Network (OPTN) [37]: 19 questions focusing on eligibility criteria, the waitlist process, and post-transplant care; (2) the National Health Service (NHS) [38]: 15 questions focusing on patient preparation for kidney transplantation, surgical procedures, and post-transplant care; and (3) the National Kidney Foundation (NKF) [39]: 20 questions focusing on long-term management, lifestyle considerations, and support resources for kidney transplant recipients (Online supplementary data). This study is exempt from Ethics Committee or Institutional Review Board approval, as it involves neither human nor animal subjects and does not include patient information or identifiable personal data.

AI language model usage

The translation process used ChatGPT versions 3.5 and 4.0 [40]. These AI chatbots were chosen for their advanced natural language processing capabilities [41,42], which include the ability to understand context, generate coherent and contextually appropriate text, and maintain consistency in translations. Each selected FAQ was entered into the AI chatbot in English, and the models then provided Spanish translations. This process was conducted individually for each question to ensure that each translation was contextually accurate. The AI chatbots were configured to optimize for translation accuracy and cultural relevance, focusing on nuances that would make the translations suitable for the Hispanic community. The study was conducted in December 2023.

Systematic evaluation of translations

Each translation was evaluated using a detailed rubric scale ranging from 1 to 5 (Online supplementary data), where 1 represents poor performance and 5 represents excellent performance [43]. The rubric scale was designed to assess two key aspects:

  • Linguistic accuracy: This criterion evaluated the grammatical correctness, appropriate use of vocabulary, and syntactic integrity of the translations. Translations were examined for their clarity, readability, and technical precision in medical terminology.

  • Cultural sensitivity: This measure assessed the extent to which translations respected and incorporated cultural nuances, idiomatic expressions, and contextually relevant information for the Hispanic community. This aspect was crucial to ensure that the translations were not only linguistically accurate but also culturally resonant and sensitive to the needs of the target audience.

Two nephrologists of Mexican heritage who are fluent in Spanish, O.A.G.V. and M.G.S., evaluated the translations for accuracy and cultural relevance using the 1–5 scale. The evaluation process took approximately 40 hours in total, with each expert contributing around 20 hours. O.A.G.V. made the initial assessments, which M.G.S. reviewed and confirmed, and any differences were resolved through consensus. The inter-rater reliability of the evaluators, measured by Cohen’s kappa, was 0.85, indicating a high level of agreement and supporting the reliability of the findings.
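
To illustrate how an agreement statistic of this kind is obtained, the following is a minimal sketch of unweighted Cohen's kappa for two raters. It is not the authors' computation: the function name `cohens_kappa` and the two score lists are hypothetical, and the study does not state whether a weighted or unweighted kappa was used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items that received identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: computed from each rater's marginal score frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    p_expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if p_expected == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # so this sketch reports full agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical 1-5 rubric scores from two raters for ten translations.
scores_a = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4]
scores_b = [5, 5, 4, 5, 5, 5, 5, 5, 5, 4]
kappa = cohens_kappa(scores_a, scores_b)
print(f"kappa = {kappa:.2f}")
```

Because most rubric scores cluster at 5, chance agreement is high, so kappa can be noticeably lower than the raw percentage of matching scores.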

Statistical analysis

The scores for linguistic accuracy and cultural sensitivity were summarized as mean ± standard deviation (SD). Scores were compared between GPT-3.5 and GPT-4.0 using the paired t-test and across the three question sources using analysis of variance (ANOVA). A two-tailed p-value of less than 0.05 was considered statistically significant. Statistical analyses were performed using JMP statistical software (version 17, SAS Institute, Cary, NC).
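
The analyses in this study were run in JMP. As an illustrative sketch only, the same quantities (mean ± SD, the paired t statistic, and the one-way ANOVA F statistic) can be computed with the Python standard library. The function names and example scores below are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption.

```python
import math
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) of a list of scores."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(x, y):
    """Paired t statistic and degrees of freedom for matched score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def one_way_anova_f(groups):
    """F statistic with (between, within) degrees of freedom for k groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical rubric scores, not the study data.
model_a = [1, 2, 3, 4]
model_b = [0, 0, 1, 2]
t_stat, t_df = paired_t_statistic(model_a, model_b)
f_stat, f_df1, f_df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"t = {t_stat:.2f} (df = {t_df}); F = {f_stat:.2f} (df = {f_df1}, {f_df2})")
```

Converting these statistics to two-tailed p-values requires the t and F distributions, which the standard library does not provide; `scipy.stats.ttest_rel` and `scipy.stats.f_oneway` return the statistic and p-value together.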

Results

The scores for linguistic accuracy and cultural sensitivity of GPT-3.5 and GPT-4.0 for individual FAQs are shown in Table S1. The mean linguistic accuracy score was 4.89 ± 0.31 for GPT-3.5 and 4.94 ± 0.23 for GPT-4.0. There was no significant difference in mean linguistic accuracy score between GPT-3.5 and GPT-4.0 across all questions (p = 0.26) or when stratified by FAQ source. The mean linguistic accuracy score was comparable across the three FAQ sources for GPT-3.5 (4.84 ± 0.37 vs. 4.93 ± 0.26 vs. 4.90 ± 0.31 for FAQs from OPTN, NHS, and NKF, respectively; p = 0.70) and GPT-4.0 (4.95 ± 0.23 vs. 4.93 ± 0.26 vs. 4.95 ± 0.22 for FAQs from OPTN, NHS, and NKF, respectively; p = 0.98) (Table 1).

Table 1 The mean scores for linguistic accuracy and cultural sensitivity of GPT-3.5 and GPT-4.0

The mean cultural sensitivity score was 4.96 ± 0.19 for GPT-3.5 and 4.96 ± 0.19 for GPT-4.0. There was no significant difference in mean cultural sensitivity score between GPT-3.5 and GPT-4.0 across all questions (p = 1.00) or when stratified by FAQ source. The mean cultural sensitivity score was comparable across the three FAQ sources for GPT-3.5 (4.95 ± 0.23 vs. 4.93 ± 0.26 vs. 5.00 ± 0.00 for FAQs from OPTN, NHS, and NKF, respectively; p = 0.55) and GPT-4.0 (5.00 ± 0.00 vs. 5.00 ± 0.00 vs. 4.90 ± 0.31 for FAQs from OPTN, NHS, and NKF, respectively; p = 0.18) (Fig. 1).
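
As a quick consistency check on the reported figures, each overall mean should equal the subgroup means weighted by the number of FAQs per source (19 from OPTN, 15 from NHS, 20 from NKF). The sketch below recomputes the four overall means from the subgroup means reported above. The helper name `pooled_mean` is hypothetical, and because the inputs are already rounded to two decimals, agreement is only expected to within rounding.

```python
def pooled_mean(sizes, means):
    """Size-weighted mean of subgroup means."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Number of FAQs per source: OPTN, NHS, NKF (54 in total).
sizes = [19, 15, 20]

# (subgroup means for OPTN, NHS, NKF; reported overall mean), as reported in the text.
reported = {
    "GPT-3.5 linguistic accuracy": ([4.84, 4.93, 4.90], 4.89),
    "GPT-4.0 linguistic accuracy": ([4.95, 4.93, 4.95], 4.94),
    "GPT-3.5 cultural sensitivity": ([4.95, 4.93, 5.00], 4.96),
    "GPT-4.0 cultural sensitivity": ([5.00, 5.00, 4.90], 4.96),
}

recomputed = {}
for label, (subgroup_means, overall) in reported.items():
    recomputed[label] = pooled_mean(sizes, subgroup_means)
    print(f"{label}: recomputed {recomputed[label]:.2f}, reported {overall:.2f}")
```

In all four cases the recomputed value rounds to the reported overall mean, so the subgroup and overall figures are mutually consistent.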

Figure 1

Comparative analysis of average accuracy and cultural sensitivity in AI-generated translations of kidney transplant information. Top panel: (Left) GPT 3.5: average accuracy across different organizations (OPTN, NHS, NKF) and overall score. (Right) GPT 3.5: average cultural sensitivity across different organizations (OPTN, NHS, NKF) and overall score. Bottom panel: (Left) GPT 4.0: average accuracy across different organizations (OPTN, NHS, NKF) and overall score. (Right) GPT 4.0: average cultural sensitivity across different organizations (OPTN, NHS, NKF) and overall score.

Discussion

This study evaluated the translation capabilities of ChatGPT 3.5 and 4.0, focusing on translating kidney transplantation FAQs for the Hispanic community. The main results indicate that both versions achieved high levels of accuracy and cultural sensitivity, with ChatGPT 4.0 scoring slightly higher than 3.5 in accuracy (4.94 vs. 4.89), a difference that was not statistically significant (p = 0.26). ChatGPT 3.5 received perfect cultural sensitivity scores in the NKF subgroup, while ChatGPT 4.0 received perfect cultural sensitivity scores in the OPTN and NHS subgroups. These results are especially relevant to health equity. By offering accurate and culturally sensitive translations, AI models like ChatGPT can help level the informational playing field for non-English-speaking communities. This is particularly important for Hispanics affected by kidney disease, who often encounter linguistic hurdles in accessing vital health information [44–46]. The ability of ChatGPT to provide translations that are not only linguistically accurate but also culturally resonant is key to its effectiveness as a tool for disseminating medical information.

While both versions demonstrated high accuracy and cultural sensitivity, ChatGPT 3.5 occasionally received lower scores in either accuracy or cultural sensitivity on specific questions. This suggests that while the model is highly effective, there is room for improvement, particularly in handling nuances that require deeper cultural understanding. ChatGPT 4.0's slightly higher mean accuracy score may reflect advancements in AI technology, although it too received lower scores in accuracy and cultural sensitivity in a few instances. The effectiveness of AI in translation depends not only on linguistic accuracy but also on its ability to resonate culturally with the intended audience. This is particularly crucial in healthcare, where cultural context can significantly affect how information is received and acted upon [47–49].

Comparing this study’s findings with previous research on AI-driven language translation in healthcare, it is evident that there have been significant advancements [50–54]. Earlier research often pointed out the limitations of AI models in grasping the complexities of language and cultural context, particularly in medical translation, where both accuracy and sensitivity are crucial. These models typically struggled to balance literal accuracy with the deeper layers of cultural context, resulting in translations that were technically correct but often lacked relevance and appropriateness in real-world settings. In contrast, this study highlights significant progress with ChatGPT 3.5 and 4.0, illustrating their improved ability not only to translate complex medical information accurately but also to consider cultural appropriateness in these translations. This progress signifies a move towards more sophisticated AI models that are linguistically adept and more attuned to the cultural and contextual aspects of language, meeting the practical needs of diverse patient groups, such as those seeking information on kidney transplantation.

The study, while pivotal in evaluating the translation capabilities of ChatGPT 3.5 and 4.0 for kidney transplantation FAQs in Spanish for the Hispanic community, presents certain limitations that shape the scope and applicability of its findings. Its focus is narrowly tailored to a specific medical context and a particular linguistic group, which may not encompass the varied complexities of other medical domains or cater to different cultural backgrounds. The reliance on human evaluators introduces an element of subjectivity in assessing translation accuracy and cultural sensitivity, potentially affecting the consistency of the results. Furthermore, the study’s constraint to only two AI models limits a broader comparative analysis across the spectrum of available AI translation technologies. Future research, therefore, should aim to broaden the scope to include diverse medical topics and languages, extend evaluations to a wider range of AI models, and incorporate more objective assessment methods. Such expansion and refinement in research approach would not only enhance the generalizability of the findings but also deepen the understanding of AI’s potential in overcoming language barriers in global healthcare contexts.

Future research should also explore AI-driven translation tools such as ChatGPT 3.5 and 4.0 in real clinical practice, especially in the context of kidney transplantation and health equity. These studies should focus on evaluating the impact of AI translations on patient outcomes, understanding, and engagement in their healthcare journey. Integration with healthcare systems, including electronic health records and patient portals, is also essential to assess the efficiency and effectiveness of AI tools in a clinical setting. Feedback from healthcare providers will be invaluable, offering insights into the practical utility, accuracy, and cultural appropriateness of these translations in enhancing patient care. Additionally, longitudinal studies observing the long-term effects of AI translation tools, their cost-effectiveness, and comparative analyses with traditional translation methods will provide a comprehensive understanding of the role of AI in reducing healthcare disparities. Such research is pivotal in determining the full potential of AI in improving communication and fostering health equity, particularly for linguistically diverse populations in need of specialized medical care like kidney transplantation.

Conclusion

This study demonstrates the significant potential of advanced AI models like ChatGPT 3.5 and 4.0 in bridging language gaps in the healthcare sector. By providing high-quality translations that are both accurate and culturally sensitive, these tools can greatly enhance the accessibility of medical information, particularly for underserved non-English-speaking populations. As AI technology continues to evolve, its role in supporting health equity and improving patient outcomes across diverse communities becomes increasingly vital.