Artificial intelligence in communication impacts language and social relationships

Artificial intelligence (AI) is already widely used in daily communication, but despite concerns about AI’s negative effects on society, the social consequences of using it to communicate remain largely unexplored. We investigate the social consequences of one of the most pervasive AI applications, algorithmic response suggestions (“smart replies”), which are used to send billions of messages each day. Two randomized experiments provide evidence that these types of algorithmic recommender systems change how people interact with and perceive one another in both pro-social and anti-social ways. We find that using algorithmic responses changes language and social relationships: it increases communication speed and the use of positive emotional language, and conversation partners evaluate each other as closer and more cooperative. However, consistent with common assumptions about the adverse effects of AI, people are evaluated more negatively if they are suspected of using algorithmic responses. Thus, even though AI can increase the speed of communication and improve interpersonal perceptions, the prevailing anti-social connotations of AI undermine these potential benefits if it is used overtly.


Introduction
Communication is the basic process through which people form perceptions of others [1-3], build and maintain social relationships [4], and achieve cooperative outcomes [5]. Applications of artificial intelligence (AI) are increasingly shaping the way that people communicate and interact with one another [6,7]. One of the most visible AI applications is AI-generated reply suggestions in text-based communication, commonly known as smart replies, which aim to help users compose messages with "just one tap" [8]. Despite the rapid deployment of AI applications in new products and contexts and people's growing concerns about the societal consequences of AI [9], research has predominantly focused on technical aspects and largely ignored the potential social impacts of integrating AI-generated messages into human-to-human communication. Reports from the AI Now Institute liken this scenario to "...conducting an experiment without bothering to note the results" [10] and have repeatedly noted the under-investment in research on the social implications of AI technologies while calling for an increase in interdisciplinary research examining these systems within human populations [10-12].
As of last year, algorithmic responses constituted 12% of messages sent through Gmail alone [13], representing about 6.7 billion emails written by AI on our behalf each day [14]. Smart reply systems aim to make text production more efficient by drawing from general and user-specific text corpora to predict what a person might type and generating one or more suggested responses that a user can choose from when responding to a message [8]. Users' rapid adoption of this type of AI in interpersonal communication has been facilitated by a large body of technical research on methods for generating algorithmic responses (e.g., [8,15,16]). However, the social implications of this type of AI involvement remain largely unclear.
Given the broad integration of AI systems like these in our social lives, a growing body of work at the intersection of computer and social science is concerned with understanding how such systems may be influencing human behavior and how they are perceived (e.g., [7,17,18]). Initial studies have found that algorithmic responses can impact how people write [19], and people believe that the mere presence of smart replies influences the way that they communicate, in part because of the linguistic skew of smart replies, which tend to express more positive emotions [20]. However, the social implications of smart reply use remain unclear.
To examine the social consequences of using AI to help generate messages, we conducted a set of randomized controlled experiments studying how the display and use of AI-generated smart replies in real-time text-based communication affects how people interact with and perceive each other. We show that a commercially-deployed AI affects various aspects of interpersonal communication. More specifically, we find that AI influences multiple dimensions of social engagement, including communication efficiency, emotional tone, and interpersonal evaluations, in both positive and negative ways.

AI is Perceived Negatively but Improves Interpersonal Perceptions
Inspired by theories of how computer-mediated communication can affect intimacy and relationship maintenance [21], we hypothesized that seeing AI-generated reply suggestions could influence participants' feelings of connectedness with their conversation partner. To test the effect of AI mediation on interpersonal trait inferences and perceptions of cooperativeness, we developed a messaging application in which we can manipulate the smart replies displayed to users while collecting data about the conversation.
To identify the effects and perceptions of using algorithmic responses in conversation (beyond merely being presented with them), we randomly assigned 219 pairs of participants to one of three messaging conditions: 1) both participants can use smart replies (i.e., suggested responses generated using the Google Reply API [22]), 2) only one participant can use smart replies, or 3) neither participant can use smart replies. Participants engaged in a conversation about a policy issue while the application tracked their use of smart replies. Presenting participants with smart replies encouraged them to use the suggestions in conversation, which serves as our causal identification strategy for estimating the effects of smart reply use by the self and the partner. After completing the conversation, participants were given a definition of smart replies and asked to rate how often they believed their partner had used them. They also responded to established measures of dominance and affiliation (Revised Interpersonal Adjective Scale [23]) and perceived cooperative communication [24].

We find that the presence of algorithmic responses was a strong encouragement to use them: smart replies account for 14.3% of sent messages on average (t(211) = 13.8, p < .0001), and Figure 1 (left panel) shows average smart reply use by experimental condition. Because the variation in smart reply use is experimentally and independently induced for each participant, we can use instrumental variable (IV) estimation to identify the consequences of increased smart reply use by the self and the partner (Figure 1, right panel). Using IV estimation, we find that increased use of smart replies by the self (but not the partner) led to more efficient communication in terms of the number of messages the self sent per minute (t(198) = 2.21, p = 0.0286). While smart reply use clearly improves communication efficiency, its consequences for interpersonal perceptions are more complex.
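The IV logic can be illustrated with a minimal sketch. With a binary instrument (whether smart replies were displayed), two-stage least squares reduces to the Wald estimator: the instrument's effect on the outcome divided by its effect on the endogenous regressor. The variable names, effect sizes, and synthetic data below are illustrative assumptions, not the study's data or analysis code:

```python
import random
from statistics import mean

random.seed(0)
n = 400

# Synthetic illustration: random assignment to a smart-reply condition
# (the instrument) shifts actual smart reply use, which in turn affects
# an outcome such as messages sent per minute. True causal effect: 2.0.
shown = [random.randint(0, 1) for _ in range(n)]            # instrument
use = [0.5 * z + random.gauss(0, 0.2) for z in shown]       # endogenous regressor
outcome = [2.0 * x + random.gauss(0, 0.5) for x in use]     # observed outcome

def wald_iv(y, x, z):
    """Wald/IV estimator for a binary instrument: the ratio of the
    instrument's effect on the outcome to its effect on the regressor."""
    y1 = mean(v for v, g in zip(y, z) if g == 1)
    y0 = mean(v for v, g in zip(y, z) if g == 0)
    x1 = mean(v for v, g in zip(x, z) if g == 1)
    x0 = mean(v for v, g in zip(x, z) if g == 0)
    return (y1 - y0) / (x1 - x0)

effect = wald_iv(outcome, use, shown)  # close to the true effect of 2.0
```

Because the instrument is randomly assigned, it is uncorrelated with the noise terms, so the ratio recovers the causal effect of use even though use itself is self-selected.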
Participants are capable of recognizing their partner's use of smart replies to some degree: beliefs about how much their partner used smart replies correlated with actual use, but not strongly (Pearson's r = 0.22, t(97) = 3.62, p = 0.0005). Consistent with commonly held beliefs about the negative implications of AI in social interactions [25,26], we find strong associations between perceived smart reply use by the partner and attitudes towards them. The more participants thought their partner used smart replies, the less cooperative they rated them (t(92) = -9.89, p < 0.0001) and the less affiliation they felt towards them (t(92) = -6.90, p < 0.0001), as shown in Figure 2, even after controlling for their partner's actual smart reply use. This shows correlationally that people who appear to be using smart replies in conversation ultimately pay an interpersonal toll, even if they are not actually using smart replies. However, this finding does not show causally how attitudes shift in response to actual smart reply use.

Our IV estimation strategy reveals that increased use of smart replies by the partner actually improved the self's rating of the partner's cooperation (t(167) = 2.23, p = 0.0273) and sense of affiliation towards them (t(167) = 2.54, p = 0.0120). Although perceived smart reply use is judged negatively, actual use results in more positive attitudes. Moreover, we find that conversation sentiment became more positive as a result of both the self using more smart replies (t(198) = 2.06, p = 0.0404) and the partner using more smart replies (t(198) = 2.11, p = 0.0362). This finding suggests that the effects of AI mediation on interpersonal perceptions are related to changes in language introduced by the AI system.

AI Sentiment Affects Emotional Content in Human Conversations
To better understand how the sentiment of AI-suggested responses affects conversational language, we conducted a second experiment. Using a between-subjects design, we randomly assigned 299 pairs to discuss a policy issue using our app in one of four conditions: Google smart replies (i.e., participants receive algorithmic responses generated using the Google Reply API [22]), positive smart replies (i.e., algorithmic responses with positive sentiment, as rated by crowdworkers), negative smart replies (i.e., algorithmic responses with negative sentiment, as rated by crowdworkers), or no smart replies (i.e., participants do not receive algorithmic responses). We measured conversation sentiment using VADER, a lexicon and rule-based sentiment analysis tool that is well suited to analyzing short, social messages [27]. A robustness check with another sentiment analysis dictionary is presented in the Methods section. We aggregated VADER scores into a sentiment polarity score ranging from most positive (1) to most negative (-1), with neutral (0) in the middle. On average, conversations comprised 20 messages (sd = 8.6) and lasted 6.33 minutes (sd = 2.67).
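To give a sense of how a lexicon and rule-based scorer of this kind works, the toy function below combines a small hand-made lexicon with a simple negation rule and clamps the result to [-1, 1], mirroring the range of the polarity score used here. The word lists, weights, and normalization are illustrative assumptions; the real VADER additionally handles intensifiers, punctuation, capitalization, and emoji:

```python
# Toy lexicon-and-rule sentiment scorer in the spirit of VADER.
# The lexicon entries and the /4 normalization are illustrative only.
LEXICON = {"agree": 1.5, "like": 1.8, "great": 2.5, "bad": -2.0, "hate": -2.5}
NEGATORS = {"not", "don't", "no", "never"}

def polarity(message: str) -> float:
    """Score a message and clamp to [-1, 1], the polarity range
    from most negative (-1) through neutral (0) to most positive (1)."""
    score, negate = 0.0, False
    for token in message.lower().split():
        token = token.strip(".,!?")
        if token in NEGATORS:
            negate = True  # flip the next sentiment-bearing word
        elif token in LEXICON:
            score += -LEXICON[token] if negate else LEXICON[token]
            negate = False
    return max(-1.0, min(1.0, score / 4.0))  # crude normalization
```

A conversation-level score can then be obtained by averaging the per-message polarities, as in the aggregation described above.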
We found that the presence of positive and Google smart replies caused conversations to have more positive emotional content than conversations with negative or no smart replies (t(127) = 2.75, p = 0.007, d = 0.352; Figure 3). Moreover, the finding that widely-used Google smart replies have an effect on conversation sentiment similar to that of a set of positive (t(150) = 0.51, p = 0.61) but not negative smart replies (t(127) = 2.40, p = 0.018) highlights the positive sentiment bias of smart replies in commercial messaging apps. Taken together, these findings demonstrate how AI-generated sentiment affects the emotional language used in human conversation.

Discussion
Our research shows that a commercially-deployed AI can fundamentally reshape how people communicate with others, with both positive and negative consequences. We find that people choose to use AI when given the opportunity, and this increases the efficiency of communication and leads to more emotionally positive language. However, we also find that the more participants think their partner is using algorithmic responses, the less cooperative they perceive them to be and the less affiliation they feel towards them. This finding could be related to common assumptions about the negative implications of AI in social interactions. For example, humans are already predisposed to trust other humans over computers [26], and most current communication systems featuring AI mediation lack transparency for users (i.e., the sender knows that their responses have been modified or generated by AI, while the receiver does not). Taken together with users' preference for reducing uncertainty in interactions [28], this could lead to negative perceptions of AI in everyday communication. Indeed, such negative perceptions are confirmed by recent work: in [20], users described how smart replies often did not capture what they wanted to say and could be altering the way that they communicated with others, and in [25], text suspected of or labeled as being written by AI was perceived as less trustworthy.
Despite negative perceptions about AI in communication, we find that as people actually use more algorithmic responses, their communication partner holds more positive attitudes about them. Even though perceived smart reply use is viewed negatively, actual smart reply use results in communicators being viewed as more cooperative and affiliative. In other words, the negative perceptions of using AI to help us communicate do not match the reality.
Our work provides evidence that AI can alter the language that people use when interacting with others. Understanding the impact on language is important because language is inextricably linked with listeners' characterizations of a communicator, including their personality [1-3], emotions [29,30], sentiment [31-34], and level of dominance (i.e., expressing more aggressive rather than affiliative behavior in an interaction) [35]. Indeed, we find that AI-generated responses changed the expression of emotion in human conversations. The influence of AI on human emotional communication is deeply concerning given that AI already writes about 6.7 billion emails on our behalf daily [14]. With the increasing popularity of other forms of AI mediating our everyday communication (e.g., Smart Compose [36]), we have little insight into how regularly people are allowing AI to help them communicate or the potential long-term implications of AI's interference in human communication. Our work suggests that interpersonal relationships are likely to be affected, potentially positively. However, the demonstrated changes in language suggest that we could lose our personal communication styles, with language becoming increasingly homogeneous over time. While current implementations of AI in messaging apps increase efficiency by allowing users to respond to messages more quickly, smart reply use is still viewed negatively, and as we have now demonstrated, it can alter our language when communicating with other humans.
Our work has implications for the development of AI systems and highlights both opportunities and risks of deploying such systems. A recent laboratory study has shown that a humanoid robot can improve interpersonal communication when expressing vulnerability within a team [37]. Our work takes this research further by demonstrating how a commercially-deployed AI can influence interactions in positive ways through much subtler forms of intervention than a robot's overt behavior. Merely providing suggestions changes the language used in a conversation, and the changes are consistent with the linguistic qualities of the algorithmic responses. Additionally, previous work has shown that when conversations go awry, people trust the AI more than the person they are communicating with and assign to the AI some of the blame that they otherwise would have assigned to this person [38]. Taken together, these findings suggest possible opportunities for developers to affect conversational dynamics and outcomes by carefully controlling the linguistics of the smart replies shown to users, as in [39]. On the other hand, the finding that changes in language are consistent with changes in smart replies raises potential risks as AI continues to gain influence over our social interactions. Knowing that AI can shape the way that we communicate, it is important for researchers and practitioners to consider the broader social consequences when designing algorithms that support communication.
Overall, while AI has the potential to help people communicate more efficiently and improve interpersonal perceptions in everyday conversation, users should be cautioned that these benefits are coupled with alterations to the emotional aspects of our language and a corresponding potential loss of personal expression.

Methods
The study procedure and all materials were approved by our Institutional Review Board (1610006732), and the study was pre-registered on AsPredicted (40389). Our web-based platform allows us to recruit participants online (e.g., through crowdsourcing platforms) and engage them in real-time interpersonal communication tasks with smart reply support, as shown in Figure 4.

Web-Based AI-MC Platform
The platform is designed as a web application that allows two participants to text chat with one another in real time and runs on all major modern browsers (e.g., Google Chrome 60, Mozilla Firefox 54, Microsoft Edge 14, and Apple Safari 10). It is built using Node.js and MongoDB on the backend with jQuery and the Semantic UI framework on the client side. The interface is responsive to device type and resizes itself to work well on desktops, tablets, and mobile devices (Android and iOS). Throughout the design process, we elicited feedback from colleagues to ensure that the application seemed natural and easy to use. As in existing commercial messaging applications that feature smart replies, in addition to the standard text box for typing messages, participants can also receive smart replies that they can tap to send. Participants can also scroll to see the history of their conversation at any point during the chat.
Four implementations of the messenger were used in this work: positive sentiment smart replies, negative sentiment smart replies, real smart replies (i.e., generated by the Google Reply API [22]), and no smart replies.
In the positive and negative sentiment smart reply conditions, the smart replies shown to participants had only positive or negative sentiment, respectively. For example, in the positive condition, a participant might see smart replies such as "I like it" and "I can't agree more", whereas in the negative condition, a participant might see smart replies such as "I don't get it" and "No you are not". These smart replies are chosen randomly from an input JSON file without being too repetitive (i.e., all three utterances shown in each instance are different, and the same utterance is not shown in immediately subsequent instances). The utterances were pulled from previous work [20] in which crowdworkers rated the sentiment of smart replies, and the suggestions included only those rated as having definitively positive or negative sentiment, respectively. In both of these implementations, the smart replies are updated each time a participant sends or receives a message.
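The selection constraints described above (three distinct utterances per instance, none repeated from the immediately preceding instance) can be sketched as follows. The utterance pool here is an illustrative stand-in for the crowdworker-rated JSON input used in the study:

```python
import json
import random

# Illustrative pool of positive-sentiment suggestions; in the study these
# were loaded from a JSON file of crowdworker-rated utterances (assumed).
pool = json.loads("""["I like it", "I can't agree more", "Sounds good",
                      "Good point", "Me too", "Yes, exactly",
                      "That works for me"]""")

def next_suggestions(previous, k=3):
    """Pick k distinct suggestions at random, excluding any utterance
    shown in the immediately preceding instance."""
    candidates = [u for u in pool if u not in previous]
    return random.sample(candidates, k)

first = next_suggestions(previous=[])
second = next_suggestions(previous=first)  # no overlap with `first`
```

`random.sample` guarantees the three utterances within an instance are distinct, and filtering against the previous instance prevents immediate repetition.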
In the implementation that did not include smart replies, which served as our control condition, participants had to manually type each message that they sent.
The final implementation uses Google's Reply model [22] to generate smart replies. However, since this model performs its pre- and post-processing tasks at run-time and its framework is built with C++ and compiled into an Android archive, it is not possible to run it in desktop environments. A stand-alone CPython library on top of the Reply model can be compiled on a Linux operating system [40], which we used to generate smart replies with Google's Reply model. When users send or receive a message, the Python API receives that message and generates smart replies through the Reply model.
The survey itself was conducted using Qualtrics. After obtaining consent, participants were informed that they would be using a messaging system to complete a task with an anonymous partner. Participants were then presented with a task involving a discussion of unfair rejection of work, an issue that is relevant to all crowdworkers on Mechanical Turk [41]. Specifically, we asked pairs to come to an agreement on the "top 3 changes that Mechanical Turk could make to better handle unfairly rejected work". Participants were asked to open the web-based messaging platform in another window while still viewing the Qualtrics survey. After opening the messaging platform, participants waited up to 5 minutes for another participant to enter the conversation. If 5 minutes elapsed without another participant arriving, participants were able to exit the survey prematurely and receive partial compensation. Once another participant arrived, the pair was given as much time as they needed to come to an agreement on a ranked list. When finished with the task, participants could press a "Conversation complete" button in the messenger and receive a conversation completion code that they pasted into the Qualtrics survey to confirm that they had completed a conversation with a partner.
After verifying that a conversation was completed and giving a brief description of smart replies, we asked participants how much they believed their partner had used smart replies. Participants were also asked to fill out the Perceived Cooperative Communication scale and the Interpersonal Adjective Scale, Revised (IAS-R).
Perceived cooperative communication was measured through a 7-item scale [24] in which participants rated their agreement with statements describing cooperative communication in their overall interaction with their partner. The instructions read, "Thinking about your interaction with your partner, please rate the extent to which you agree with each of these statements". Participants rated each statement on rating-scale items anchored by "Strongly disagree" (1) and "Strongly agree" (7).
The IAS-R provides an empirical measure of various dimensions that underlie interpersonal transactions [23]. To shorten the measure, the two adjectives with the highest factor loadings from each interpersonal octant were selected, based on the analysis of Wiggins et al. [23], resulting in 16 items to be rated. The instructions read, "Below is a list of words that describe how people interact with others. Based on your intuition, please rate how accurately each word describes your conversation partner" (adapted from [42]). Participants rated each statement on rating-scale items anchored by "Extremely inaccurate" (1), "Somewhat

Robustness Check for Sentiment Results
We presented an analysis of conversation sentiment using VADER, a lexicon and rule-based sentiment analysis tool that is well suited to analyzing short, social messages [27]. As in Study 1, we excluded from the analysis conversations with fewer than 10 messages exchanged overall and those in which one participant sent fewer than 3 messages.
To ensure that the results do not significantly change with other dictionaries, we performed a robustness check using Linguistic Inquiry and Word Count (LIWC), a dictionary-based text analysis tool that determines the percentage of words reflecting a number of linguistic processes, psychological processes, and personal concerns [27]. To verify our findings with respect to sentiment from VADER, we analyzed Affect scores from LIWC. Affect, with values ranging from 0 to 100, is made up of Positive Emotion and Negative Emotion variables, which also range from 0 to 100. All findings with respect to VADER sentiment were confirmed using LIWC. We found that the presence of positive and Google smart replies caused conversations to have higher affect than conversations without smart replies (t(124) = 2.95, p < 0.001, d = 0.272). The effect of positive and Google smart replies on affect was statistically similar (t(150) = 0.354, p = 0.724). The presence of negative smart replies had a strong negative effect on conversation affect compared to the control condition without smart replies (t(123) = -3.50, p < 0.001, d = 0.454).
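The Affect computation can be summarized in a short sketch: each emotion score is the percentage of words matching a category dictionary, and Affect is the sum of the Positive and Negative Emotion percentages. The word lists below are illustrative stand-ins, not LIWC's proprietary dictionary:

```python
def liwc_style_affect(tokens, pos_words, neg_words):
    """Return (Positive Emotion, Negative Emotion, Affect) as percentages
    of the word count, in the style of LIWC's Affect variable."""
    n = len(tokens)
    positive = 100 * sum(t in pos_words for t in tokens) / n
    negative = 100 * sum(t in neg_words for t in tokens) / n
    return positive, negative, positive + negative

# 1 of 8 words is positive and 1 is negative, so each category scores
# 12.5 and Affect is 25.0.
tokens = "i love this task but i hate waiting".split()
pos, neg, affect = liwc_style_affect(tokens, {"love", "like"}, {"hate", "bad"})
```

This additivity is why, as noted in the figure note below Figure 4, a given Affect score can arise from different mixes of positive and negative emotion.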

Limitations
There were several limitations to this work. First, we analyzed conversations from participants completing a contrived task on Mechanical Turk. Although we attempted to choose a task that would be personally relevant to all crowdworkers and effectuate the interpersonal closeness that we hoped to examine, many other types of everyday messaging conversations exist, and future work should examine how these results hold up in disparate contexts.
Since our web-based messenger is not yet robust enough for mobile use, this work focused specifically on AI-mediated messaging conversations in a desktop computer environment and may not generalize to similar messaging situations in other use contexts. Interpersonal perceptions in mobile messaging contexts featuring smart replies should also be examined.
As is standard in similar literature (e.g., [23]), interpersonal perceptions were measured as momentary states. However, these perceptions change and develop over time, so future work should examine whether and how these measures are affected longitudinally under the influence of smart replies. Similarly, these studies involved anonymous crowdworkers completing a one-time interaction. We do not know whether our findings would differ in relationships with various levels of interpersonal closeness. Future work should investigate how interpersonal perceptions are related to smart reply use in more socially intimate relationships, such as between friends or co-workers. Additionally, we investigated interpersonal perceptions resulting from real-time messaging conversations, which could manifest differently in other communication contexts. Future work should examine how interpersonal relationships are affected by the presence of AI mediation in asynchronous communication.

Figure 2. Average rating of the partner's affiliation and cooperative communication by the self for different levels of perceived smart reply use by the partner (N = 361). Error bars show 1 cluster-robust standard error.

Figure 3. Mean overall conversation sentiment by experimental condition: both participants assigned to no smart replies, negative, positive, or Google smart replies. Error bars show 1 cluster-robust standard error.

Figure 4. We can use our web-based AI-MC platform to control and record the smart replies that participants see. This figure shows both positive and negative sentiment smart reply examples (i.e., blue and grey boxes, respectively). During actual use, participants see only one of those sets.
For example, a message with an Affect score of 50 could be made up of a Positive Emotion score of 50, a Positive Emotion score of 25 and a Negative Emotion score of 25, or a Negative Emotion score of 50.

Figure 5. Mean overall conversation affect by experimental condition: both participants assigned to no smart replies, negative, positive, or Google smart replies. Error bars show 1 cluster-robust standard error.