Introduction

In sub-Saharan Africa (SSA), rice productivity is often low due to sub-optimal crop management practices by smallholder farmers1,2,3. Farmers have limited access to agricultural extension services because extension agents (EAs) are few, so many rice farmers do not receive up-to-date advice on rice production4,5. Furthermore, within rural socio-cultural systems, EAs often do not effectively reach women farmers. In some areas of SSA, women are constrained by socio-cultural and religious norms that forbid them from communicating freely with men outside their families5. A wide variety of technology dissemination and scaling tools (rural radio, videos, etc.) have been developed and used to reach women farmers6,7. A dissemination approach in which women service providers reach women farmers has also been proposed for delivering field-specific recommendations, which requires service providers to have digital devices (smartphones, tablets)5. While further efforts are needed to improve access to electricity and the internet to support the adoption of digital extension services in rural agrarian communities in SSA, the recent development of artificial intelligence (AI) assistants is an unexplored resource for addressing the challenges farmers face. One such platform, ChatGPT, represents a new generation of AI technologies driven by advances in large language models8. A recent health-care study reported that, although the system was not developed to provide health care, the chatbot's responses were preferred over physician responses and rated significantly higher for both quality and empathy9. However, the chatbot's ability to help address farmers' questions on rice cultivation in SSA is unexplored.

Therefore, the objective of this study was to evaluate the ability of an AI chatbot assistant (ChatGPT) to provide quality responses to farmers' questions on rice production. We tested ChatGPT's ability to produce high-quality answers by comparing its responses with EAs' responses to farmers' questions in Kano State, one of the major rice-producing areas in northern Nigeria10,11.

Results

Table 1 lists the rice-production questions, compiled from interviews with 107 farmers about what they would like to ask EAs to improve their rice production. The most common questions concerned input types (variety, fertilizer, herbicide). By intervention area, crop establishment, insect and disease management, and weed management had the most questions (5, 5, and 4, respectively). Examples of EAs' and chatbot responses to questions (nos. 1–3) are shown in Table 2. Chatbot responses were significantly longer (mean 335 [range 202–468] words) than EAs' responses both with and without extension materials, which did not differ from each other (10 [2–45] words) (Fig. 1).

Table 1 List of questions used in this study and their target agronomic practice areas. Questions are ordered by the number of farmers asking the same or similar question (most to fewest).
Table 2 Example of extension agents’ and chatbot responses to questions related to rice production in Kano State, Nigeria.
Figure 1

Number of words per response authored by extension agents (EAs) and the chatbot. As there was no difference in the number of words per response by EAs without and with extension materials, data from both were combined. Different letters indicate a significant difference (P < 0.001).

Averaged over the 32 questions, evaluators rated chatbot responses significantly higher in quality than responses by EAs without and with extension materials, by 19% and 15%, respectively (P < 0.01) (Table 3). The mean rating for chatbot responses corresponded to a "good" response (3.8), whereas those for EAs' responses without and with extension materials corresponded to an "acceptable" response (3.2 and 3.3, respectively). There was no significant difference in scores between EAs' responses without and with extension materials. The Pearson correlation coefficient between scores of responses by EAs without and with extension materials was positive and significant (r = 0.71, P < 0.01). The correlation coefficients between scores of the chatbot's responses and those of EAs without and with extension materials were not significant (r = − 0.13 and r = − 0.15, respectively; both P > 0.05).

Table 3 Mean scores of responses by extension agents (EAs) with and without extension materials and chatbot to 32 questions.

The proportion of responses rated very good (score 5 on the 1–5 scale) was significantly higher (P < 0.05) for chatbot responses than for those of EAs without and with extension materials (Table 4). The chatbot achieved the top score roughly five to seven times as often as the EAs (40% vs. 6% and 8%). In contrast, the proportion of responses rated acceptable was significantly lower for the chatbot than for EAs without and with extension materials (18% vs. 51% and 46%; Table 4). There was no significant difference in the number of responses rated poor or very poor between the chatbot and EAs without and with extension materials (Table 4).

Table 4 Distribution (%) of evaluators’ scores on responses by extension agents (EAs) with and without extension materials and chatbot to 32 questions.

Across the 32 questions, the evaluators preferred the chatbot response over the responses by EAs without and with extension materials for 78% and 69% of questions, respectively (Fig. 2). For the questions where the chatbot scored lower than the EAs (questions 11, 12, and 23 in Tables 1 and 3) or scored below 3 (questions 14 and 16), the chatbot provided inaccurate information (Table 5): the recommended seed rate was too high (11); the planting time was incorrect for the dry season (12); the suggested financial services were not available (14); soil testing was suggested although it is not recommended locally (16); and the recommended number of seedlings per hill differed between the two seasons although it should not (23).

Figure 2

Cumulative probability of the difference in score between responses authored by extension agents (EAs) without and with extension materials and the chatbot. Responses were scored on a 1–5 scale, with higher values indicating greater quality.

Table 5 Responses authored by extension agents (examples) and the chatbot (summaries only) for questions where the chatbot responses scored lower than those of the extension agents (Tables 1 and 3).

After reviewing the chatbot responses, five of the six EAs who had answered the 32 questions indicated that the chatbot provided relevant answers on rice cultivation and could be used as a tool for EAs to advise farmers (Table 6). All six EAs rated the chatbot responses better than their own answers and were willing to use the chatbot in the future to obtain the information needed to assist farmers.

Table 6 Responses of the six extension agents after reviewing the chatbot responses.

Discussion

While chatbot responses were much longer than EAs' responses, the evaluators preferred the chatbot-generated responses over those by EAs even when the latter had extension materials. In fact, having extension materials did not significantly improve quality scores, and scores were highly correlated between EAs' responses with and without extension materials. The chatbot is designed to provide detailed and comprehensive responses, whereas EAs may give more concise, practical advice based on their experience. Although the evaluators valued the detailed and comprehensive information provided by the chatbot, farmers might judge differently: longer answers could overwhelm them with too much information. Further evaluation by farmers is therefore needed before the chatbot is used by farmers directly.

This result is consistent with a recent health-care study9, which reported that chatbot responses were preferred over physician responses and rated significantly higher for both quality and empathy. Our results suggest that a chatbot might become a useful source of information for advising farmers who have limited access to EAs. However, there was no relationship between the scores of the chatbot's and the EAs' responses, and the chatbot provided inaccurate information on planting time, seed rate, and fertilizer application rate and timing; this caveat should be made known to rural farmers. Our findings support work on large language models (LLMs) in agricultural extension services12, which proposed an idealized LLM design process with human experts in the loop. Consequently, direct use of this tool by farmers is not recommended at present. Instead, the chatbot could assist EAs by drafting messages in response to farmers' questions. Such an AI-assisted approach could save EAs' time, enabling them to reach more farmers, and EAs could improve their communication skills by reviewing and revising AI-written drafts. Further research is needed to evaluate how an AI assistant can support EAs in responding to farmers' questions and improve their skills and knowledge.

For direct use by farmers, this study highlights the importance of ensuring that the chatbot is supplied with accurate and up-to-date information and that its responses are regularly reviewed and updated by experts in the field. This involves tailoring AI assistant technologies to farmers' needs and context, providing practical and actionable advice, and developing and implementing these technologies in a way that is transparent, accountable, and responsive to farmers' concerns. By addressing these challenges, farmers could benefit directly from AI assistant technologies. Further research is also needed to evaluate farmers' perception of advisories provided by AI assistants, changes in farmers' practices after receiving advisories, and the target impact areas (e.g., productivity, resource use efficiency, soil health)13.

Methods

In June 2023, we conducted interviews with farmers who grow irrigated rice in Kano State, northern Nigeria. Seventeen women and 90 men were randomly selected from 4032 farmers who had participated in an on-farm survey the previous year (unpublished data) and were asked what questions they would like to put to EAs to improve their rice production. Each farmer provided up to five questions. After compiling all the questions, we merged similar ones and removed those not relevant to irrigated rice production (e.g., on drought-tolerant varieties). We edited the questions so that each consistently stated the location and rice production system and did not reveal farmers' identities. Table 1 lists the 32 questions used in this study, which covered a wide range of agronomic interventions, including seed, variety, land preparation, crop establishment method, and nutrient, water, weed, and insect and disease management.

On August 10, 2023, the full text of the questions (Table 1) was entered into a fresh chatbot session8, free of prior prompts that could bias the results, and the chatbot responses were saved in a Word file.
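For readers who wish to script this step rather than use the chat interface, a minimal sketch is given below, assuming programmatic access through the OpenAI chat completions HTTP API; the endpoint, model name, and helper function are illustrative assumptions, not the procedure used in this study.

```r
# A minimal sketch (not the study's procedure): querying the chatbot so that
# each question starts a fresh, single-turn session with no prior prompts.
library(httr)
library(jsonlite)

ask_fresh_session <- function(question,
                              model = "gpt-3.5-turbo",  # illustrative model name
                              api_key = Sys.getenv("OPENAI_API_KEY")) {
  resp <- POST(
    "https://api.openai.com/v1/chat/completions",
    add_headers(Authorization = paste("Bearer", api_key)),
    content_type_json(),
    body = toJSON(list(
      model = model,
      # A single-turn conversation: no prior prompts that could bias the answer.
      messages = list(list(role = "user", content = question))
    ), auto_unbox = TRUE)
  )
  content(resp)$choices[[1]]$message$content
}

# One independent call per question, e.g.:
# answers <- vapply(questions, ask_fresh_session, character(1))
```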

Six EAs were nominated from an agricultural extension office in Kano based on their expertise and knowledge of rice cultivation practices. To protect the EAs' identities, we do not name the organizations in this paper. Three of the agents were women, and none had previously used a chatbot in their extension work. The EAs were divided into two groups: one group (three agents) used extension materials when answering the questions, while the other did not. They wrote their answers on paper in their offices under the supervision of enumerators. The number of words in each response by the EAs (with and without extension materials) and the chatbot was counted. After completing their responses, the EAs reviewed the chatbot responses and were asked about its potential use.

After all responses from the six EAs and the chatbot were compiled, the order of the seven answers to each question was randomized independently, so the order could differ from one question to another. The answers to each question were then labeled 1 to 7 to blind evaluators to the identity of the responders, and we removed any information that could reveal a respondent's identity (for the chatbot, statements such as "I'm an artificial intelligence"). All responses were evaluated by four local rice experts, two from research organizations and two from public extension agencies, all with good knowledge of local rice production. The evaluators judged the quality of the responses in terms of local relevance on a five-point Likert scale (1, very poor; 2, poor; 3, acceptable; 4, good; 5, very good).
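A minimal sketch of this per-question randomization and blinding step is shown below; the column names and example data are illustrative assumptions, not the study's records.

```r
# Sketch of per-question shuffling and blinding; column names and
# example data are illustrative assumptions.
set.seed(42)

responders <- c(paste0("EA", 1:6), "chatbot")
responses  <- expand.grid(question_id = 1:32, responder = responders,
                          stringsAsFactors = FALSE)

blinded <- do.call(rbind, lapply(split(responses, responses$question_id),
  function(block) {
    block <- block[sample(nrow(block)), ]  # independent shuffle per question
    block$label <- seq_len(nrow(block))    # evaluators see only labels 1-7
    block
  }))

# Keep the label -> responder key aside for unblinding after scoring.
key <- blinded[, c("question_id", "label", "responder")]
```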

Scores were averaged across evaluators for each question. This approach is used when there is no ground truth for the outcome being studied and the evaluated outcomes are inherently subjective; the mean score reflects evaluator consensus, and disagreement (or inherent ambiguity and uncertainty) between evaluators is reflected in the score variance. Analysis of variance (ANOVA) was conducted to assess differences in quality scores among responses by EAs with extension materials, EAs without extension materials, and ChatGPT. A chi-squared test was applied to identify significant differences in the distribution of evaluators' scores on responses by EAs with and without extension materials and the chatbot; the null hypothesis was that the score distributions do not differ, against the alternative that they do. We used a t-test to compare the number of words in EAs' and chatbot responses, combining the two EA groups because their word counts were similar. Shapiro–Wilk and Bartlett tests were applied before the ANOVA and t-tests to verify normality and homogeneity of variance. Mean separation was done using the Tukey HSD test. Pearson correlations between the scores of the EAs' and the chatbot's responses were computed. All statistical analyses were performed in R statistical software, version 4.3.114.
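A sketch of this workflow in R is shown below; the variable names and simulated scores are illustrative assumptions, not the study data.

```r
# Sketch of the statistical workflow; simulated data for illustration only.
set.seed(1)
scores <- data.frame(
  question  = rep(1:32, times = 3),
  responder = rep(c("EA_no_materials", "EA_materials", "chatbot"), each = 32),
  score     = c(rnorm(32, 3.2, 0.4), rnorm(32, 3.3, 0.4), rnorm(32, 3.8, 0.4))
)

# Assumption checks: normality (Shapiro-Wilk) and variance homogeneity (Bartlett).
shapiro.test(scores$score)
bartlett.test(score ~ responder, data = scores)

# ANOVA on quality scores, followed by Tukey HSD mean separation.
fit <- aov(score ~ responder, data = scores)
summary(fit)
TukeyHSD(fit)

# Pearson correlation between per-question scores of the two EA groups.
wide <- reshape(scores, idvar = "question", timevar = "responder",
                direction = "wide")
cor.test(wide$score.EA_no_materials, wide$score.EA_materials)

# The chi-squared test would be run on a responder x rating-category
# contingency table of the raw Likert scores (Table 4), e.g.:
# chisq.test(rating_counts)

# The t-test would compare word counts computed from the saved texts, e.g.:
# t.test(ea_word_counts, chatbot_word_counts)
```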

The distribution of the expert assessments of the responses is presented in Fig. 2. We report the percentage of questions for which the chatbot response was preferred and identify the questions for which the chatbot responses scored lower than those of the EAs.
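The curve in Fig. 2 and the preference percentages can be computed directly from the per-question mean scores; a minimal sketch with simulated scores (illustrative only) is given below.

```r
# Sketch of Fig. 2 and the preference percentage; simulated scores only.
set.seed(2)
ea_score  <- rnorm(32, 3.2, 0.4)  # per-question mean score, EA responses
bot_score <- rnorm(32, 3.8, 0.4)  # per-question mean score, chatbot responses

# Empirical cumulative distribution of the score difference (EA - chatbot).
score_diff <- ea_score - bot_score
plot(ecdf(score_diff),
     main = "Cumulative probability of score difference",
     xlab = "EA score - chatbot score")

# Percentage of questions for which the chatbot response was preferred.
mean(bot_score > ea_score) * 100
```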

Ethical approval

The authors confirm that all methods were carried out in accordance with relevant guidelines and regulations, that all experimental protocols were approved by the Africa Rice Center Scientific Committee, and that informed consent was obtained from all subjects involved in this study.