Introduction

The fundamental concept of chatbots dates as far back as 1921, when the idea of robots was first introduced. Their value became evident in 2000, when the SmarterChild intelligent agent on AOL Instant Messenger facilitated stock and weather searches (Ask et al. 2016). The concept gained further recognition in 2011 with the launch of Apple’s Siri, a voice assistant (VA) built on chatbot technology, and such VAs have since been rapidly adopted in diverse fields (Johnson et al. 2012). Over the years, interest in chatbots has grown due to significant improvements in artificial intelligence (AI) algorithms, such as natural language processing (NLP) and machine learning (ML) technologies (Rahman et al. 2017). These AI algorithms enable machines to better recognize information, learn from data, and forecast outcomes (Uliyar 2017). AI-enabled chatbots are now being further customized to better understand human communication (either written or spoken) and to respond in the same natural language, as if they were human agents (Uliyar 2017).

Chatbots have therefore been applied to various industries, and more than 85% of customer interactions were expected to be handled by chatbots by the end of 2020 (Wirtz et al. 2018). In the tourism and hospitality industry in particular, chatbots have been applied to facilitate services such as bookings/reservations and recommendations (Nica et al. 2018; Ukpabi et al. 2019). China is one of the leading countries in this field in terms of its various chatbot-enabled applications. For example, major online travel agencies (OTAs) in China, such as Ctrip.com, Qunar.com, and Fliggy.com, have already introduced chatbot services on their websites to help their customers book travel packages and flight tickets. When chatbots are implemented in such e-Commerce and e-Service websites, key dimensions of chatbot service quality should be identified and considered in the website design process of OTAs (Jain et al. 2018). However, chatbot quality dimensions and their effects on users’ confirmation/satisfaction and intention to continue using the service remain unexplored in the current literature. Categorizations of quality dimensions vary across quality assessments. For example, Parasuraman et al. (1988) proposed quality dimensions in the context of customer services in the telecommunications and banking industry, whereas other studies focused on organizational databases or systems. Quality dimensions have thus been examined less cohesively and comprehensively in the context of chatbot services. As to the post-acceptance model of information systems (IS) continuance (i.e., the expectation-confirmation model (ECM) of IS continuance), it has been used to examine the post-use adoption of smart technologies and new online services (e.g., Kim et al. 2019; Li et al. 2020a; Park 2020), and some studies have included antecedents of post-use confirmation in the extended ECM (e.g., Cheng 2014; Hong et al. 2017; Nascimento et al. 2018). However, the ECM has rarely been extended to examine users’ post-use confirmation, satisfaction, and use continuance in the context of OTAs by proposing chatbot quality dimensions as antecedents. This demonstrates a research gap concerning quality dimensions in the post-acceptance model of IS continuance and underscores the importance of this study.

While technology-enabled consumer-facing services are prevalent in many industries, some people still find it challenging to serve themselves through technology devices and prefer interacting with real human agents. Likewise, in the context of travel services, whereas some users are quite familiar with using travel websites or mobile apps to get the travel and hospitality services they want, others still want to interact with human travel agents due to fear of, or discomfort with, technology-enabled services. While chatbot agent services are developed by OTAs to automate travel agents’ repetitive tasks and reduce service costs, they are also meant to reduce some users’ fear of and discomfort with tech-enabled services by providing a more human-like interface than previous web/mobile-based self-services (Uliyar 2017). To examine this phenomenon of an individual’s fear of or discomfort with using technologies (i.e., technophobia), scholars have introduced the concept of technology anxiety and investigated its role in users’ technology adoption (Meuter et al. 2003). In addition, other theoretical perspectives on human-likeness (i.e., humanness) explain human reactions to technologies that provide human-like service interfaces (e.g., Lankton et al. 2015; Mori 1970; Purington et al. 2017). We therefore integrate the literature on technology anxiety and humanness to investigate how users with a high level of technology anxiety perceive and react to chatbot services (e.g., as either another type of technology-enabled service or as non-technological human-like agents) at the post-adoption stage.

Our research purpose is threefold. First, this study attempts to identify quality dimensions of chatbot services perceived by users (i.e., understandability, reliability, responsiveness, assurance, and interactivity), which have rarely been examined in the OTA context, based on the IS literature. Second, drawing on the extended post-acceptance model of IS continuance (i.e., the extended expectation-confirmation model), we examine how these quality dimensions of chatbot services influence user confirmation and satisfaction, which in turn lead to use continuance intention. Third, we examine how technology anxiety moderates the relationships between chatbot quality dimensions and post-use confirmation, in order to see whether high technology anxiety leads to a stronger or a weaker relationship, which can inform us whether users consider chatbot services human-like agents or technology-enabled services. This research contributes to the body of knowledge on human-computer interaction with AI-driven services and on the IS continuance model in the context of chatbot services of Chinese OTAs. Practically, the results of this study can provide quality assurance specialists, e-service providers, and chatbot developers with guidelines to better understand chatbot users and enhance service adoption in the tourism and hospitality sector.

Theoretical background

Chatbot services in the tourism industry

The rise of chatbot services can be attributed to the rising demand for more convenient, quick, on-demand, and less pressured self-service (Terpening and Littleton 2016), while their growth and penetration depend on advances in AI, NLP, ML technologies, and chatbot development platforms (Rahman et al. 2017). Chatbot services support both text-based and spoken interactions between humans and machines to facilitate conversation. Both forms are designed to make conversations feel natural, so that users feel they are conversing with human agents rather than machines (Prasetya et al. 2018). These conversations are triggered mainly by users’ inputs (e.g., questions or wake-up calls) and consist of smaller task-oriented dialogues (McTear et al. 2016). In addition, chatbot systems are equipped with contextual awareness technologies (Pearl 2016), which enable the machine to wait until it receives a message before taking its turn, by observing when a user has finished typing, interpreting the typed words and their linguistic features, and managing miscommunication (Skantze 2007).

Chatbots have been studied from several points of view. First, technical aspects of chatbots have been examined, such as the technologies for speech conversation systems (Abdul-Kader and Woods 2015), the development of chatbots using a reinforcement learning algorithm (Serban et al. 2017), and programming methods for chatbots (Long et al. 2019). Second, some research has focused primarily on human-chatbot interactions, such as how the use of chatbots can increase customer purchases (Luo et al. 2019) and how willing users are to collaborate and interact with chatbots (Ciechanowski et al. 2019). Third, as chatbot technologies are applied to customer services, some efforts have been made to investigate the usability of chatbot services (Kang and Kim 2017) and their impact on customer satisfaction (Chung et al. 2020). These attempts have recently been made in various industries, such as healthcare (Nadarzynski et al. 2019) and finance (Quah and Chua 2019). In particular, some empirical studies on the adoption of chatbot services have recently been conducted in the context of instant messengers and social media (Kahiga 2019; Zarouali et al. 2018).

Among the many industries that can benefit from chatbot services, we focus on the tourism industry for the following reasons. First, the tourism industry is considered one of the fields that benefit the most from chatbot services, along with the finance and retail industries (FlowXO 2020). Second, among these industries, the tourism industry appears to have the highest percentage of people using web (online or mobile) services for their needs (over 80%), while some statistics show that the finance and retail sectors have about 70% of web service users, depending on the products or services they offer (Dubrova 2020; Milenkovic 2020; Osman 2020). Thus, as a greater share of users in the tourism industry receive their services through online or mobile channels, investigating the role of chatbot services in the tourism context would provide valuable insights to both academics and practitioners.

As to prior research on chatbot services in the tourism and hospitality industry, several attempts have been made only at the conceptual level. To name a few, Ukpabi et al. (2019) integrated organizational theories to study the firm-level adoption of chatbot services in the tourism sector; Zlatanov and Popesku (2019) introduced the current applications of AI technologies, including chatbots, in the tourism and hospitality industry; Buhalis and Yen (2020) explained that the benefits of using chatbot services outweigh the challenges for hotels; Ivanov (2020) elaborated on the influences of automation technologies (e.g., chatbot services) on tourism and hospitality jobs; and Tussyadiah (2020) pointed out directions for future studies on automation technologies in the tourism sector. However, these studies have neither identified the quality dimensions nor empirically examined their impacts on users’ adoption of chatbot services. Moreover, the capability and capacity of chatbot services in OTAs keep growing. For example, key informants of chatbot service providers argue that chatbots operated by OTAs or airlines can provide faster services to travellers during peak travel seasons, with significantly improved accuracy and booking completion rates, and that about 75% of travellers’ typical post-sales inquiries, such as booking- and post-sale-related questions, could be handled by chatbots (Gupta 2019). In sum, although more and more customers are using chatbot-enabled OTAs these days (Ivanov and Webster 2019), few attempts have been made to verify the key quality dimensions of chatbot services and their roles in facilitating users’ continuous use, especially in the field of tourism and hospitality, which underscores the importance of this study.

Extended post-acceptance model of IS continuance

In order to answer the research question of what makes customers continue to use chatbot-enabled OTAs, the extended post-acceptance model of IS continuance is adopted in this study (Bhattacherjee 2001). This model uses the theoretical underpinnings of the expectation-confirmation model (ECM) to explain how consumers of information technology (IT) products or services decide to continue using (or repurchase) a product they have previously adopted (purchased) (Liao et al. 2010; Oliver 1980). Key concepts in this model include users’ post-use confirmation, perceived usefulness, satisfaction, and intention to continue using an IS. Briefly, the model posits that when (or before) users start to use a product (or a service), they usually hold a certain level of performance expectation of the product. After a while, they can assess whether their expectation of the product’s performance has been met (or exceeded). If so, they perceive that their expectation has been confirmed, which leads to satisfaction with the use of the product. In addition, users’ post-use assessment of the instrumentality (i.e., usefulness) of the product also influences their satisfaction with it. This satisfaction, in turn, eventually leads users to continue using the product or to repurchase it (Bhattacherjee 2001).

While the original ECM of IS includes perceived usefulness to explain users’ post-use assessment of the instrumentality of a product, this study focuses only on users’ post-use confirmation of their initial expectations of chatbot-based OTAs and extends the ECM with key chatbot quality dimensions as antecedents of post-use confirmation. The definitions in the context of this study are as follows. Post-use confirmation is defined as the extent to which a user’s initial expectation of the performance of chatbot-based OTAs has been met. Satisfaction is defined as the user’s positive emotional state arising from an appraisal of the jobs done by chatbot-based OTAs, while use continuance intention refers to the user’s intention to continue using chatbot-based OTAs (Bhattacherjee 2001).

For the last two decades, the ECM of IS has been applied in numerous empirical studies on the use continuance of consumer electronic products and online services. Recently, the model has been applied in quite a few empirical studies on the post-use adoption of new online services and smart technologies. For example, Park (2020) used the ECM to examine users’ acceptance of smart wearable devices; Li et al. (2020a) extended the ECM to investigate users’ adoption of an augmented reality game app, Pokémon GO; and Kim et al. (2019) examined users’ continuance intention toward accommodation apps based on the ECM. Notably, several studies applying the ECM have extended the model by adding key antecedents to the theoretical loop of confirmation–satisfaction–use continuance. To name a few, Lee and Chen (2014) used DeLone and McLean’s (2003) three quality dimensions (i.e., information, system, and service quality) as key influencing factors for users’ confirmation and found significant relationships between those quality measures and confirmation in the context of users’ continuance of m-Commerce services; Cheng (2014) identified four quality dimensions specific to online learning and examined their impacts on users’ post-use confirmation, as well as the relationships among confirmation, satisfaction, and use continuance in the context of an e-Learning service for nurses; Susanto et al. (2016) extended the ECM by investigating the roles of users’ trust in the system, perceived privacy, and perceived security in the use continuance of smartphone banking services; and Hong et al. (2017) and Nascimento et al. (2018) both investigated the use continuance of smartwatches by examining the roles of users’ innovativeness, habit, perceived value, perceived usability, and enjoyment, based on the ECM. These studies not only investigated the post-use adoption of smart devices but also included antecedents of post-use confirmation (or satisfaction) in their extended ECMs.
Yet, little empirical effort has been made to extend the ECM and examine the factors affecting users’ post-use confirmation, satisfaction, and use continuance in the context of OTAs. Therefore, this study extends the ECM by proposing five chatbot quality dimensions that are relevant in the context of chatbot-enabled OTAs.

Chatbot quality dimensions

To propose quality dimensions relevant to chatbot services in OTAs, the literature on information systems (IS) and service quality was reviewed. Efforts to identify IS quality dimensions date back to Zmud’s (1978) article on information dimensionality, which suggested relevance, accuracy, factuality, quantity, reliability, and readability as quality features of information and empirically validated their measurement properties. Subsequently, as customers’ perception of service quality came to be considered an important way for companies to differentiate their products and services from those of competitors (Parasuraman et al. 1988), and as the topic of information quality grew in importance for both academia and practice (Lee et al. 2002), several seminal articles on IS quality were published. Although it is beyond the scope of this study to list, comprehensively review, and integrate all the quality dimensions in those seminal articles, a few are introduced here. Wang and Strong (1996) proposed four categories of IS data quality (i.e., intrinsic, contextual, representational, and accessibility) with 15 sub-dimensions in the context of an organizational database, which became an important theoretical background for studies on IS quality, including Lee et al. (2002) and Nelson et al. (2005). Parasuraman et al. (1988) proposed and empirically validated five dimensions of service quality (i.e., tangibles, reliability, responsiveness, assurance, and empathy) in the context of customer services in the appliance repair, banking, and telecom industries. Lee et al. (2002) developed a 2 × 2 model classifying quality dimensions into the quadrants formed by the products vs. services and conforming-to-specifications vs. meeting-customer-expectations criteria, and tested their measurement properties in the context of information embedded in organizational systems.
Later, DeLone and McLean (2003) updated their initial IS success model (DeLone and McLean 1992) and proposed system, information, and service quality dimensions that fit the context of e-Commerce services, and Nelson et al. (2005) proposed nine dimensions of information and system quality and validated their measurement properties with data collected from various industries.

Two takeaways emerge from reviewing these seminal articles. First, some quality dimensions belonged to multiple categories (information-related, system-related, or service-related). For example, accessibility and reliability, once considered data or information qualities in some studies (Lee et al. 2002; Wang and Strong 1996; Zmud 1978), are considered system or service qualities in others (Nelson et al. 2005; Parasuraman et al. 1988). These differing categorizations across articles are believed to stem from the specificity of the target of quality assessment (i.e., the different types of information/data presented, service provided, or technical aspect examined). Second, and accordingly, most of these seminal articles on IS quality proposed that the relative importance and salience of the quality dimensions should differ according to the type of IS and the use context. For example, Parasuraman et al. (1988) suggested that service quality dimensions can be adapted to the services being investigated; DeLone and McLean (2003) mentioned that the selection of quality dimensions depends on the research context and focus; and Nelson et al. (2005) likewise suggested that the relative importance of the quality dimensions may not be applicable beyond the context of their research. Adopting these points, this study developed a set of quality dimensions specific to its research context: chatbot services, based on the features embedded in the technology and the context of use.

Therefore, among the more than twenty information-, system-, and service-quality dimensions identified in the six seminal articles reviewed, we started with the five service quality dimensions (i.e., tangibles, reliability, responsiveness, assurance, and empathy) proposed by Parasuraman et al. (1988), since their research context (customer services in the telecommunications and banking industry) is similar to the context of chatbot services in this study; the other studies focused mostly on organizational databases or systems. Three dimensions from Parasuraman et al. (1988), namely reliability, responsiveness, and assurance, which are believed to be salient for chatbot-related smart services, were selected first.

Then, recent studies on the quality dimensions of smart services were further explored, because chatbots are supposed to be equipped with NLP technologies that support contextual understanding of human dialogues and enable interactive conversations with humans, capabilities that represent relatively new but important quality dimensions of smart services which extant studies on IS quality did not propose. Recent studies on smart end-user technologies have suggested understandability and interactivity as key technological quality dimensions for smart services (Cho et al. 2019; McKinney et al. 2002). These two dimensions are also believed to be relevant for chatbot services because they properly cover chatbots’ capabilities of understanding human dialogues and providing interactive conversations when serving users.

Five quality dimensions for chatbot services are therefore selected in this study: (1) understandability, (2) reliability, (3) responsiveness, (4) assurance, and (5) interactivity. The following provides their conceptual definitions, the distinctions among them, and the reasons for their selection. First, the perceived understandability of a chatbot is selected as an important quality dimension. Over the years, many researchers have identified accurately emulating human conversation as chatbots’ top priority. For example, a test designed to assess the quality of chatbots evaluates machines’ ability to show intelligent behavior in understanding human conversation (Park et al. 2018). Nguyen (2019) also examined whether a chatbot could improve the consumer experience when it possesses the capability of understanding. Research has also shown that chatbot agents that understand and use human humor are rated as more likable, cooperative, and capable, and provide better solutions and performance, than those that do not (Sensuse et al. 2019; Thies et al. 2017). The dimension of understandability in this study is distinct from information understandability (or ease of understanding), one of the information quality dimensions proposed by Lee et al. (2002) and Wang and Strong (1996). While the understandability of information in Lee et al. (2002) represents an information reader’s perception that the information presented by an IS is understandable, the understandability of a chatbot service in this study refers to a user’s perception that a chatbot service understands human dialogues, the context of a conversation, and the nuances of human language.

Second, reliability, responsiveness, and assurance, which have been widely acknowledged in prior literature, are also selected as quality dimensions of chatbot services, with their conceptual definitions adapted to better fit the chatbot context. Reliability is defined as a user’s perception that a chatbot service “has the ability to perform the promised service dependably and accurately” (Parasuraman et al. 1988, p.23); it is relevant for chatbot services because providing reliable performance and information to users is considered critical in chatbot-based services (Chung and Park 2019). AlHagbani and Khan (2016) designed a simple Arabic chatbot and found that the reliability of chatbot services could increase the acceptance rate of chatbots in the Arabic world’s online communities. Kalia et al. (2017) discovered that the reliability of chatbots can be ensured if a meaningful response is provided in a conversation. Sensuse et al. (2019) also found that reliable chatbots could enhance the effectiveness of job performance and motivate further development. Responsiveness is defined as a user’s perception that a chatbot service shows the “capability to help users and provide prompt service to users,” while assurance is defined as a user’s perception that a chatbot service has the “knowledge and ability to inspire trust and confidence” in users (Parasuraman et al. 1988, p.23). Responsiveness and assurance were originally service quality dimensions for human agents rather than for technology-enabled services (DeLone and McLean 2003; Parasuraman et al. 1988).
However, it is argued that these two quality dimensions are also relevant for chatbot services, because chatbot-enabled service agents are supposed to act like humans in that they should provide users with (1) prompt (or responsive) services without delay in responding to human requests, as well as (2) trustworthy answers that make users feel assured that the services and responses they get from chatbots are not very different from those provided by human service agents. As to responsiveness, Meerschman and Verkeyn (2019) revealed that responsiveness is one of the important chatbot quality attributes and can be used to ensure the quality of chatbots. Danilava et al. (2013) found that responsiveness could significantly improve chatbot design in terms of creating interaction profiles. Nguyen (2019) further argued that responsiveness could significantly improve customer support chatbot systems. For assurance, Pereira and Díaz (2018) stated that quality assurance is a major dimension for unsophisticated script-based conversational chatbots, and Lee and Park (2019) further discovered that assurance is a salient factor in service quality that influences user satisfaction and use intention of chatbots in the financial service industry. Most recently, Li et al. (2020b) built a chatbot, Jennifer, to provide public information generated from reputable sources during the COVID-19 outbreak and found that the assurance quality of chatbots could be secured when the information comes from reputable sources. However, responsiveness and assurance have rarely been examined together in the context of chatbots, especially for OTAs, which underscores the importance of this study.

Third, the perceived interactivity of chatbot services can be defined as a user’s perception that her/his communications with a chatbot service resemble the dialogues s/he has with human agents (over multiple interactions), so that s/he feels in control of personal needs when using it (Cho et al. 2019; Heeter 1989). Interactivity is proposed as the last quality dimension for chatbot services, since it has been considered an important factor for end user-facing systems in providing personalized services and increasing user engagement (Neuhofer et al. 2015). Moreover, interactivity has been found to be an important factor in increasing the humanness of chatbot-based systems (Go and Sundar 2019), the level of disclosure of sensitive topics (Sannon et al. 2018), and individuals’ evaluations of movies (Sundar et al. 2016). The distinction between interactivity and understandability is that the former focuses on users’ perception of their control over interactive communication with chatbots, while the latter focuses on the chatbot’s capability of understanding the conversation. Further, assurance is distinguished from interactivity in that the assurance dimension focuses on users’ post-use trust that the responses they get from chatbots are as trustworthy as those they would have received from human agents. Overall, the five dimensions of chatbot services (understandability, reliability, responsiveness, assurance, and interactivity) capture users’ perception of chatbot-enabled services’ capability of (1) comprehending human conversations and providing (2) accurate, (3) spontaneous, (4) trustworthy, and (5) interactive services. Table 1 presents a summary of the key studies on each of the five chatbot quality dimensions identified in this study.

Table 1 A summary of key literature on chatbot quality dimensions

Technology anxiety and humanness of technology

Extant studies have proposed the concept of technology anxiety, which refers to the degree to which individuals have difficulty with, or fear of, understanding and using technologies (Meuter et al. 2003). Based on Meuter et al. (2003), the technology anxiety of chatbot services is defined as a user’s perception of intimidation, unfamiliarity, and difficulty in using chatbot services. According to extant studies on the impact of technology anxiety on user reactions, those with high technology anxiety react negatively to new technologies, avoid using them or computer-based services, and instead seek human agents when they need a service (Meuter et al. 2003). Recent studies have provided evidence that technology anxiety (discomfort with technology, or a part of technology un-readiness) is negatively associated with users’ adoption or use continuance of various technologies such as e-learning, mobile health, and self-service technologies (Chen et al. 2013; Deng et al. 2014; Kotrlik and Redmann 2009). Other studies have also found that technology anxiety negatively moderates the relationships between key antecedents and the adoption of technological devices (Kim and Forsythe 2008; Lee and Yang 2013; Yang and Forney 2013). For example, Yang and Forney (2013) found that mobile shopping consumers with a low level of technology-use anxiety tend to have a high level of perceived facilitating conditions and a stronger relationship between facilitating conditions and their performance expectations of mobile shopping.

This study attempts to examine whether, in the case of chatbot services, technology anxiety works similarly to, or differently from, the way it works with other conventional technologies. As mentioned above, chatbot technologies are often applied to online (mobile) services as human-like customer service agents, so chatbot-enabled services are supposed to partially replace real human agents for customers who want to be served by human agents instead of using the self-service sections of service providers’ websites or mobile apps (e.g., self-service menus for ‘booking online,’ ‘my account,’ or ‘pay bill’). Thus, it is argued that individuals who fear interacting with machines and prefer human agents to technology-based self-service might react more positively to chatbot services (i.e., pseudo-human agents), even knowing that a chatbot service is still an AI-based technology service, than those who are comfortable with technology-based self-service menus. To explain how the role of technology anxiety may differ in the case of human-like technologies, including chatbot services, three theoretical perspectives on human reactions to technologies that mimic human behaviors are introduced.

The first theory on human reactions to human-like machines is called uncanny valley theory (UVT) (Mori 1970), which posits that an individual’s reaction to a human-like machine first shows a positive relationship with the human-likeness of the machine to a certain point (e.g., industrial robot arms for assembling parts or AI-driven chatbots with human-like voices or sentences), but this positive reaction becomes abruptly negative as the machine comes with some atypical or imperfect human looks and behaviors and fails to show an appropriate level of appearance like humans either in reality or experimental settings (e.g., creepy looking humanoids or chatbots with human-looking animated avatars) (Ciechanowski et al. 2019; Kätsyri et al. 2015; Urgen et al. 2018). This negative reaction can be positive again as the human-likeness of the machine becomes near-perfect, making the human reaction chart against the human-likeness of the machine a valley-like shape (Mori 1970; Urgen et al. 2018). Some studies have shown empirical evidence of this UVT by confirming the sequence of positive-negative-positive relationships between human reactions and the machine’s human-likeness in certain cases (e.g., Kätsyri et al. 2015). In the present study, however, the first positive and linear association between users’ reaction and machines’ (chatbots’) human-likeness (before the ‘valley’ part of the theory) was focused on, as the features added to the online chatbot services in the context of the present study (i.e., chatbots on OTAs), supporting interactive conversations with users via text or voice channels with no likeness of human appearance, can be seen as ones of typical entry-level human-like features of machines (in terms of the x-axis of the UVT curve) without any atypical or imperfect human-likeness, not going beyond the point that users feel eerie or uneasy. 
Moreover, a recent study found that people tend to experience a weaker uncanny effect when using simple text-based chatbots than when using chatbots with animated (obviously robot-looking) avatars (Ciechanowski et al. 2019). As such, it is believed that the human-like interactive and contextual language processing capability of chatbot services may positively influence the way users react to the chatbot technology.

Second, the literature on technology humanness is introduced. Technology humanness refers to users' perceived similarity of a technology device to humans in its motions (behaviors) and physical appearance, operationalized on a continuum from 'system-like' to 'human-like' (Kamide et al. 2014; Lankton et al. 2015). This literature has examined the relationship between users' familiarity with a technology device and its humanness (Kamide et al. 2014), the impact of human-like trust in a device on users' adoption (Carter and Liu 2018), and the relationships among users' perceived technology-driven social presence, perceived humanness, perceived usefulness, and enjoyment of a technology device (Lankton et al. 2015). The overall takeaways from these studies are that individuals' perceptions of the humanness of the same technology device vary with their differing perceptions of the social presence it enables (Lankton et al. 2015), and that users' perceived humanness of a device is positively associated with their trust in and emotional reactions to the device, which in turn lead to adoption (Kamide et al. 2014; Lankton et al. 2015). Among the several theories used to explain perceived technology humanness, this study focuses on the theory of social presence (Rettie 2003; Short et al. 1976), since the social presence of a technology device, which refers to users' perception of interpersonal interaction with the device, is related to the interactive nature of technology-based services such as social media and chatbot services (Lankton et al. 2015).

The last theoretical perspective is the personification of machines by humans. Recent studies have investigated humans' tendency to personify human-like technologies, such as smart voice assistant systems (Lee et al. 2019; Purington et al. 2017). Beyond perceiving some degree of humanness in a technology device, people start to treat the device as a human companion, even though they are well aware that its human-like functions are realized entirely by technological artifacts such as AI and NLP technologies (Lopatovska and Williams 2018). This perspective also suggests that the degree of personification of a machine varies across individuals, and that the key technological features triggering personification are the interactive and conversational functions embedded in a technology (Purington et al. 2017). Key findings from this literature relevant to our study are that people are generally more comfortable interacting with machines that can mimic what humans do (e.g., hold human-like conversations), and that the degree of personification is related to users' satisfaction with the machine (Purington et al. 2017).

All these theories imply that some technology devices can no longer be seen as 100% machines; they can be at least partially personified by users, depending on how users see them. Therefore, the role of technology anxiety in users' post-use assessment of chatbot services should differ from its role in conventional technology-based online services, because chatbot services come with human-like features. To examine this role, this study proposes technology anxiety as a key moderating factor for the relationships between chatbot quality dimensions and users' post-use confirmation. Depending on their level of technology anxiety, users may see chatbot-enabled services either as human-like agents or as another new type of technology-enabled self-service, and the way they assess post-use confirmation against the service quality dimensions should differ accordingly. A more detailed argument about the moderating role of technology anxiety is elaborated in the next section.

Research model and hypotheses development

Figure 1 illustrates our research model. Briefly, chatbot quality dimensions are positively associated with users’ post-use confirmation, and technology anxiety positively moderates the relationships between chatbot quality dimensions and confirmation. Confirmation is positively associated with user satisfaction, which is also positively associated with the use continuance intention of chatbot services.

Fig. 1
figure 1

Research model

The relationships between chatbot quality dimensions and post-use confirmation

A chatbot needs a robust database of information that allows it to generate the most suitable and appropriate responses to users; its NLP system converts human language into relevant information while responding to users' inputs (Hill et al. 2015). This gives it human-like characteristics, which may eliminate the initial distrust users often have towards computer-based systems (Zamora 2017). A number of recent studies on chatbots have verified a positive relationship between a chatbot's understandability and users' positive reactions. For example, Kuligowska (2015) highlighted a commercial chatbot's ability to understand human conversation as one of the important evaluation criteria for chatbots. If users find that most information provided by a chatbot is understandable, they are more likely to confirm that their initial expectation of its performance has been met. Therefore, we hypothesize that:

H1: Understandability of a chatbot service is positively associated with users' post-use confirmation.

Reliability is considered one of the important factors for the information service functions provided by customer-facing IS and has been found to be an important predictor of user satisfaction (Kettinger and Lee 1994). In the context of chatbots, the reliability of the information provided by chatbot services is regarded as an important factor for users (Chung and Park 2019). As such, if chatbot-based OTAs provide dependable services with accurate information (Parasuraman et al. 1988), current users will be more likely to think that their initial expectation of chatbot performance has been confirmed. Therefore, we hypothesize that:

H2: Reliability of a chatbot service is positively associated with users' post-use confirmation.

Responsiveness and assurance are originally service quality dimensions for human agents rather than for IT or IS (DeLone and McLean 2003; Parasuraman et al. 1988). However, we argue that these two dimensions are also relevant for chatbot services because chatbot-enabled online agents are supposed to act just like human agents. Thus, we define responsiveness as the extent to which a chatbot service shows a willingness to help and provides prompt services to users. If users perceive that their OTA chatbot answers their questions promptly and provides immediate services for travel booking and destination recommendations, then they are more likely to confirm their initial expectation for using the service. Therefore, we hypothesize that:

H3: Responsiveness of a chatbot service is positively associated with users' post-use confirmation.

As previously defined, users perceive assurance from a chatbot if the chatbot service has the "knowledge and ability to inspire trust and confidence" in users (Parasuraman et al. 1988, p. 23). If users feel that the information given by an OTA chatbot and the transactions they complete with it are trustworthy enough to rely on, and they are confident that this information and these transactions are as good as those provided or handled by human agents (or by other online tools such as web-based hotel/travel bookings), they will find that their initial expectation of the chatbot service is better confirmed. Therefore, we posit that:

H4: Assurance of a chatbot service is positively associated with users' post-use confirmation.

For interactivity, we found evidence that interactivity plays an important role in users' reactions to smart services. Cho et al. (2019) suggested that the interactivity of smart wearable devices is an important factor influencing users' positive reactions to the use of the device. Moreover, Shin et al. (2013) found that the interactivity of smart TVs improves users' positive attitude towards the device. Since a chatbot service is also a kind of smart service, a high degree of interactivity should elicit positive reactions from users, positively influencing their evaluation of post-use confirmation. Thus, if users of OTA chatbot services find that the chatbot interacts seamlessly without delays or errors and keeps them engaged in the conversation as if they were conversing with a human agent, they will judge that their initial expectation for using the service has been confirmed. Therefore, we hypothesize that:

H5: Interactivity of a chatbot service is positively associated with users' post-use confirmation.

The moderating role of technology anxiety

As discussed above, users may see a chatbot service either as 'a human-like service' that can appropriately replace human customer service work or as 'another type of technology-enabled service.' If users see chatbot services as human-like agents, we expect that a high level of technology anxiety would strengthen the relationships between chatbot quality dimensions and post-use confirmation of their initial expectation. But if they treat chatbot services as just one type of technology-driven service, no different from other self-service technologies, the relationships between chatbot quality dimensions and their post-use confirmation would be weakened by their anxiety toward the chatbot technology.

Technology anxiety has been used as a moderating factor in studies on the adoption of consumer electronics. To name a few, Yang and Forney (2013) found that technology anxiety mitigates the relationship between the facilitating conditions of mobile shopping use and performance expectancy, and that it also moderates the relationship between social influence and intention to use mobile shopping. In addition, Kim and Forsythe (2008) found that users' attitude towards a virtual try-on technology is more strongly related to use of the technology when users have a low level of technology anxiety. Finally, Lee and Yang (2013) found that the relationship between consumers' perceived interpersonal service quality (against self-service technology) in retail stores and their patronage intention is negatively moderated by technology anxiety. As such, technology anxiety has generally been found to negatively moderate the relationships between users' perceptions of service quality, facilitating conditions, or attitudes and their adoption or post-use reactions.

To the best of our knowledge, however, no empirical study has examined the moderating effect of technology anxiety on the relationships between chatbot service quality dimensions and users' post-use confirmation. Moreover, a recent study on technology anxiety among chatbot users did not find a significant relationship between technology anxiety and intention to use chatbots (Lee and Park 2019). Given chatbots' double-sided nature of being seen either as 'pseudo-human agents' or as 'technology-driven services,' it remains uncertain whether users' technology anxiety plays a positive or a negative moderating role in the relationships between chatbot service quality dimensions and post-use confirmation. However, based on the positive relationships between the human-likeness of a technology device and users' trust in the device, as well as users' feeling of social presence while using human-like chatbot services (Kamide et al. 2014; Lankton et al. 2015), we argue that technology anxiety may play a positive moderating role in these relationships. In other words, a user with a high level of technology anxiety towards chatbots might not have held a very high initial expectation of chatbot services; but when s/he finds that the services provided by chatbots are understandable, reliable, responsive, trustworthy, and interactive, his/her post-use confirmation becomes stronger than that of users with less technology anxiety, owing to a higher perception of the technology's human-likeness (Purington et al. 2017). Therefore, we propose that:

H6: Technology anxiety positively moderates the relationships between service quality dimensions (understandability, reliability, responsiveness, assurance, and interactivity) and post-use confirmation of a chatbot service.

The relationships among confirmation, satisfaction, and use continuance

Ever since Bhattacherjee (2001) proposed and tested the post-acceptance model of IS continuance based on the ECM, many studies have found evidence of significant relationships among confirmation, user satisfaction, and use continuance. Recent studies have also used this model and provided empirical support for the theoretical relationships among these three variables in the context of smart consumer technologies. To name a few, Susanto et al. (2016) found that confirmation is positively related to user satisfaction, and satisfaction is positively associated with continuance use intention, in the context of smartphone banking. Other studies have also found significant relationships among confirmation, satisfaction, and use continuance intention, as presented in the post-acceptance model of IS continuance, in various contexts such as mobile health services, offline-to-online (O2O) services, and smartwatches (Akter et al. 2013; Hsu and Lin 2019; Nascimento et al. 2018). Likewise, we argue that these findings can also be applied to the case of a chatbot service. If current chatbot users find that the chatbot service has met their initial expectation, they will be satisfied with the service. Moreover, if they are satisfied with a chatbot service after using it for a while, they are more inclined to continue using it. Thus, we posit that:

H7: A user's post-use confirmation of a chatbot service is positively associated with satisfaction.

H8: A user's satisfaction with a chatbot service is positively associated with use continuance intention.

Research methodology

Measurements

This study collected survey data, measured on seven-point Likert scales, from users of OTAs in China. Table 2 provides the operational definitions of the variables used in our research model, together with the studies they draw on. Our survey items were either adapted from the literature or developed from the conceptual definitions in extant studies.

Table 2 Operational definitions

Description of the research site and data collection

Three major OTAs in China were selected as our research sites: Ctrip.com, Qunar.com, and Fliggy.com, for the following reasons. First, the Chinese market was chosen because China became the world's second-largest OTA market in 2018, growing aggressively at a 27% annual rate, more than four times the U.S. growth rate over 2017; moreover, China's OTA market is estimated to approach the size of the U.S. market by 2022 (Phocuswright Research 2019). Second, Ctrip.com, Qunar.com, and Fliggy.com are by far the three major OTA chatbot service providers in China (Zou 2019), and users are becoming familiar with their chatbot services. Specifically, these three OTAs covered 74.1% of the total market share in China in the first half of 2019 (Zou 2019), dominating the Chinese OTA market. A large target population can thus be guaranteed, supporting the reliability and relevance of this study. Third, to accurately validate users' reactions to chatbot services, it is important that chatbot users be aware that the service is enabled by chatbot technologies rather than human agents. Unlike the users of some chatbot services who are not well informed that the service is provided by chatbots, the users of Ctrip.com, Qunar.com, and Fliggy.com are well aware that these services are enabled by chatbots, not human agents. When users log on to these OTA websites/apps and look for customer services, the human and chatbot service options are clearly distinguished and shown with different icons. When users click the chatbot icon, a message pops up indicating that the service is provided by chatbots and asks for the customer's questions using a chatbot profile image.

The typical exchange of information in a conversation between a user and a chatbot is illustrated in Fig. 2. Specifically, the chatbot welcomes the user with a starter message and offers different categories of questions as options. After the user selects a category, more detailed instructions for a sub-question are provided to guide the user, such as what products can be taken on a plane or how to get a ticket refunded. Typically, a conversation is completed in three to four turns, each turn consisting of a request from the OTA user and a response from the chatbot. If the user is a VIP (very important person) user, a simple text such as "buy me a ticket to Beijing" could result in the direct issue of a flight ticket matching his/her preferences, as previous conversations can be used as a reference by the chatbot (Gupta 2019).
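The menu-driven turn flow described above can be sketched as a toy state machine. The categories and canned replies below are hypothetical, and real OTA chatbots rely on NLP intent recognition rather than exact keyword matching; this is only an illustration of the request-response turn structure.

```python
# Hypothetical menu categories and canned replies (illustration only)
MENU = {
    "baggage": "Liquids under 100 ml and laptops may be carried on board.",
    "refund": "Open 'My orders', select the ticket, then tap 'Request refund'.",
}

def chatbot_turn(state, user_input):
    """One request-response turn; a full conversation takes three to four turns."""
    if state == "start":
        # Turn 1: welcome message listing the question categories as options
        options = ", ".join(MENU)
        return "category", f"Welcome! How can I help? Options: {options}"
    if state == "category" and user_input in MENU:
        # Turn 2: detailed instructions for the selected sub-question
        return "done", MENU[user_input]
    # Unrecognized input keeps the user at the category menu
    return state, "Sorry, please choose one of the listed options."

state, reply = chatbot_turn("start", "")
state, reply = chatbot_turn(state, "refund")
```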

Fig. 2
figure 2

A user’s typical exchange of information in a conversation with a chatbot in OTAs

Our survey respondents were informed that the customer services on the chatbot-enabled OTAs were provided by chatbots. In addition, the screening question "Have you used the chatbot services in major travel sites?" was asked to make sure our survey participants were actual users of the chatbot-enabled services. Furthermore, survey participants were required to think about chatbot services while answering the questions. Because only actual chatbot users were allowed to participate and the survey focused specifically on chatbot services, we thus ensured that respondents knew the services in question were provided by chatbots.

The So Jump (http://www.sojump.com) survey platform was chosen to collect questionnaires, as it is one of the largest and most commonly used professional online questionnaire platforms in China (Liu et al. 2019). It allows participants to answer questionnaires easily on either mobile phones or websites. The survey was conducted in October 2019 over two weeks, and 326 responses were collected. A total of 295 responses were used for our data analysis after removing 31 responses for incompleteness or aberrant singular-answer patterns (Meijer and Sijtsma 1995). The demographic information of respondents is presented in Table 3. The questionnaire items for the constructs used in this study are presented in the Appendix (Table 8).

Table 3 Demographics of respondents

Results

Measurement model

A confirmatory factor analysis (CFA) using SmartPLS 2.0 was conducted to test our measurement model. As shown in Table 4, composite reliability (CR) and Cronbach's α values were used to measure internal reliability; the CR and Cronbach's α values of all constructs exceeded the acceptable threshold of 0.6 (Fornell and Larcker 1981). Convergent validity was measured by factor loadings and average variance extracted (AVE) values. Factor loadings ranged from 0.521 to 0.869, surpassing the acceptable threshold of 0.5 (Bagozzi et al. 1991), and AVE values ranged from 0.483 to 0.700, exceeding the recommended value of 0.5 (Fornell and Larcker 1981) for all constructs except technology anxiety (0.483). These results support convergent validity, considering that AVE can be a relatively conservative estimate of convergent validity (Fornell and Larcker 1981). In addition, discriminant validity was assessed by comparing the square roots of the AVE values with the inter-construct correlation coefficients (Fornell and Larcker 1981). As shown in Table 5, all square roots of AVE, highlighted in bold, were larger than the inter-construct correlation coefficients, indicating that discriminant validity is adequately achieved.
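As a rough illustration of these checks, the sketch below computes CR, AVE, and the Fornell-Larcker comparison from a set of hypothetical standardized loadings. The numbers are invented for illustration (chosen so the first construct's AVE lands near the paper's 0.483 for technology anxiety) and are not the study's data.

```python
import numpy as np

# Hypothetical standardized factor loadings for two constructs (illustration only;
# the paper's actual loadings range from 0.521 to 0.869)
loadings = {
    "technology_anxiety": np.array([0.65, 0.70, 0.72, 0.71]),
    "interactivity": np.array([0.80, 0.83, 0.85]),
}

def composite_reliability(lam):
    # CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
    s2 = lam.sum() ** 2
    return s2 / (s2 + (1.0 - lam ** 2).sum())

def ave(lam):
    # AVE = mean of the squared standardized loadings
    return (lam ** 2).mean()

for name, lam in loadings.items():
    print(f"{name}: CR={composite_reliability(lam):.3f}, AVE={ave(lam):.3f}")

# Fornell-Larcker criterion: sqrt(AVE) of a construct must exceed its
# correlations with every other construct (hypothetical correlation below)
corr = -0.30
discriminant_ok = np.sqrt(ave(loadings["technology_anxiety"])) > abs(corr)
```

Note that with four items loading around 0.7, AVE falls just below the 0.5 rule of thumb even though CR clears 0.6, mirroring the paper's technology anxiety construct.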

Table 4 Reliability and convergent validity
Table 5 Construct correlations and discriminant validity

Since our survey data are cross-sectional and self-reported, they carry a risk of common method bias (CMB). To examine CMB, we conducted several tests. First, following Podsakoff et al. (2003), Harman's one-factor test was conducted by including all items in an unrotated principal component exploratory factor analysis. By this technique, if a single factor explains most of the covariance among the measurements, the threat of common method bias is high. Our results showed that the dominant factor accounted for only 39.8% of the total variance, far below the 50% threshold (Podsakoff et al. 2003), indicating that the threat of CMB is not serious in this study. Second, following Liang et al. (2007), a common method factor containing the indicators of all principal constructs was developed, and the variances of all indicators explained by the principal constructs and by the common method factor were calculated. As shown in Table 6, the average squared substantive factor loading (R12 = 0.6415) was much larger than the average squared common method factor loading (R22 = 0.0176), indicating that CMB is not a concern in this study. Third, the correlation matrix (Table 5) showed that all correlations are smaller than 0.757 (r < 0.757), which does not imply highly correlated variables, whereas CMB typically emerges when extremely high correlations exist (r > 0.90) (Pavlou et al. 2007). These three tests consistently suggest that this study is robust against the CMB concern.
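Harman's single-factor check can be sketched numerically. The simulated item responses below are hypothetical stand-ins for the study's survey items, built with a moderate shared component so that the first unrotated principal component explains well under half of the total variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical responses: 300 respondents x 10 items sharing a moderate common factor
common = rng.normal(size=(300, 1))
items = 0.5 * common + rng.normal(size=(300, 10))

# Harman's one-factor test: unrotated principal components on the item
# correlation matrix; CMB is flagged when a single factor explains
# most (>50%) of the total variance
corr = np.corrcoef(items, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending eigenvalues
first_factor_share = eigvals[0] / eigvals.sum()
print(f"first factor explains {first_factor_share:.1%} of total variance")
```

With a correlation matrix, the eigenvalues sum to the number of items, so the first eigenvalue divided by that sum is the variance share reported in the test.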

Table 6 Results of common method bias test

Structural model

A partial least squares (PLS) method was used to test our hypotheses in SmartPLS 2.0. PLS is appropriate for a multi-path model when the sample size is small (Chin 1998). A bootstrapping technique and the PLS algorithm were used to test the sign and significance of our hypothesized relationships. The results of the structural model are shown in Fig. 3 and Table 7, including the explained variances of endogenous variables (R2), path coefficients (β), levels of significance (p values) based on t-values, and the effect sizes (ESs) of the moderating effects.
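A bootstrap t-value of the kind SmartPLS reports can be illustrated with a single simplified path. Here ordinary least squares stands in for the PLS path estimate, the data are simulated (a hypothetical true path of 0.26, echoing the reliability path reported later), and the t-value is the estimate divided by its bootstrap standard error.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 295  # sample size matching the study
# Simulated standardized scores with a hypothetical true path of 0.26
reliability = rng.normal(size=n)
confirmation = 0.26 * reliability + rng.normal(size=n)

def path_coef(x, y):
    # OLS slope, a simplified stand-in for one structural path coefficient
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Bootstrapping: resample respondents with replacement and re-estimate the path
boot = [path_coef(reliability[i], confirmation[i])
        for i in (rng.integers(0, n, n) for _ in range(1000))]

beta = path_coef(reliability, confirmation)
t_value = beta / np.std(boot, ddof=1)  # estimate over its bootstrap standard error
print(f"beta = {beta:.3f}, t = {t_value:.2f}")
```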

Fig. 3
figure 3

Results of the structural model

Table 7 The summary of the hypotheses testing

As shown in Fig. 3 and Table 7, six of the seven main hypothesized paths were supported; the exception was H3, as the relationship between responsiveness and confirmation is not significant (β = 0.072, t-value = 1.173). Among the other chatbot quality dimensions, understandability was positively related to confirmation (β = 0.145, t-value = 1.744), marginally supporting H1, and reliability was also positively related to confirmation (β = 0.261, t-value = 2.732), supporting H2. The same holds for the relationships between assurance and confirmation (β = 0.159, t-value = 1.972) and between interactivity and confirmation (β = 0.156, t-value = 2.025), supporting H4 and H5. As expected, confirmation was positively associated with satisfaction (β = 0.757, t-value = 24.864) and satisfaction was positively associated with use continuance (β = 0.760, t-value = 22.469), supporting H7 and H8, respectively. In addition, four control variables (gender, age, chatbot usage frequency, and average time of chatbot use) were added to the structural model; none of them except gender was significant. Overall, the exogenous and control variables explained 49.2% (0.492) of the variance in confirmation, 57.3% (0.573) of the variance in satisfaction, and 57.3% (0.573) of the variance in use continuance intention.

The moderating effect of technology anxiety (TA) was also tested using the procedure introduced in Chin et al. (2003), with a calculation of effect sizes (Cohen 2013) and of the path coefficients (β) and significance levels (t-values) of the interaction terms (predictor variable × moderator variable) in a PLS analysis. The results of the effect size calculation are also shown in Fig. 3. Introducing TA as a moderating variable in the relationship between each service quality dimension and confirmation produced a small but non-negligible increase in the R2 of confirmation (Henseler and Fassott 2010). The interaction terms for all moderating relationships except the one with responsiveness are significant at the α = 0.05 or 0.1 levels, with path coefficients (β) and t-values calculated from the PLS algorithm and bootstrapping analysis: understandability × TA (β = 0.154, t-value = 2.08), reliability × TA (β = 0.149, t-value = 1.978), responsiveness × TA (β = 0.091, t-value = 0.737), assurance × TA (β = 0.130, t-value = 1.789), and interactivity × TA (β = 0.196, t-value = 2.401). As such, hypotheses H6a, H6b, and H6e were supported, while H6d was marginally supported. Based on Chin et al.'s (2003) interpretation of moderating effects in a PLS analysis, introducing the interaction factor (TA × each chatbot quality dimension) increases the path coefficient of the direct relationship between each service quality dimension and confirmation, although TA itself is negatively correlated with all other variables (see Table 5). For example, in H6e, TA increased the path coefficient (the sensitivity, or slope) of the positive and significant effect of interactivity on confirmation by β = 0.196. These results imply that TA positively moderates the relationships between four service quality dimensions (all except responsiveness) and users' post-use confirmation.
The higher the level of users' technology anxiety, the stronger the relationships between the four chatbot quality dimensions (i.e., understandability, reliability, assurance, and interactivity) and post-use confirmation. This contradicts the findings of extant studies on TA with conventional (less human-like) technologies, which argued that technology anxiety mitigates the relationships between influencing factors and users' technology adoption (Kim and Forsythe 2008; Lee and Yang 2013; Yang and Forney 2013).

Discussion

Research findings

Among the five chatbot quality dimensions, understandability, reliability, assurance, and interactivity are positively associated with confirmation (H1, H2, H4, and H5 supported), although our study found only marginal support for the relationship between understandability and confirmation, indicating that a more rigorous attempt to empirically confirm this relationship is needed in future studies. For reliability and assurance, our findings are consistent with Rosen and Karwan (1994), who highlighted the importance of reliability and assurance in restaurant, healthcare, and bookstore service settings. In addition, the significant relationship between understandability and post-use confirmation is consistent with Kuligowska (2015), who suggested that a chatbot's ability to understand human conversation is an essential evaluation criterion for commercial chatbot services. A positive relationship between interactivity and confirmation is also found, implying that interactivity, an advanced feature of smart services including smartwatches and chatbots, can elicit users' positive reactions and eventually increase their satisfaction and continued use of chatbot services. These findings are consistent with Cho et al. (2019) and Shin et al. (2013), who highlighted the important role of interactivity in facilitating users' attitudes and reactions to smart devices.

Interestingly, responsiveness is not significantly related to confirmation (H3 not supported). One interpretation is that in the Chinese OTA context, human services are relatively fast compared to other countries, so users may not regard responsiveness as a key priority when confirming their post-use expectations. For example, while the average waiting time to reach a human representative in online customer service is 10 to 12 min in the United Kingdom, the waiting time in China is usually less than five minutes (Fishman et al. 2017). In addition, the current responsiveness of the OTA chatbot services at the three research sites is already very high (the average of the three responsiveness items is 5.41/7.00, the highest among all five quality measures, while the averages of the other four variables range from 4.6 to 5.1), so users of the three OTAs may not feel that variation in responsiveness makes a big difference to their evaluation of post-use confirmation. Accordingly, responsiveness and the speed of a chatbot's responses may not be prioritized from the Chinese users' perspective.

Moreover, as service quality can be used to measure the relative importance of multiple dimensions in influencing consumers' perceptions (Parasuraman et al. 1988), we conclude that in the context of chatbot-based OTA services in China, reliability ranks first in relative importance among the chatbot quality dimensions, followed by assurance, interactivity, and understandability. This is consistent with the findings of Parasuraman et al. (1988) that reliability is the most critical dimension, followed by assurance. Rosen and Karwan (1994) also reported that reliability is the most critical of all quality dimensions in teaching, restaurant, healthcare, and bookstore service settings.

Our moderating test results indicate that the higher the level of users' technology anxiety, the stronger the relationships between the chatbot quality dimensions (i.e., understandability, reliability, assurance, and interactivity) and their post-use confirmation of chatbot-based OTAs (H6a, H6b, and H6e supported, and H6d marginally supported). These results are in line with the proposed hypotheses that technology anxiety should positively moderate the relationships between service quality dimensions and users' post-use confirmation because of the human-likeness that users may perceive in chatbots. These results contradict extant studies, which have found that technology anxiety mitigates the relationships between users' attitudes/perceptions regarding technologies and use (or performance expectancy) (Kim and Forsythe 2008; Yang and Forney 2013), and they are statistically counter-intuitive to some extent because the correlation coefficients between technology anxiety and all other variables are negative. We further investigated our results based on the interpretation of moderation effects in a PLS analysis by Chin et al. (2003) and confirmed that technology anxiety strengthens the relationships between the four abovementioned quality dimensions and confirmation. These results suggest that users' technology anxiety works differently for chatbot-enabled OTAs than for traditional technology-enabled products or services. More specifically, if users with a high level of technology anxiety find that a chatbot-enabled OTA service is highly reliable, understandable, trustworthy (with high assurance), and interactive, they might consider it a human-like service with conversational interfaces, rather than a purely technology-enabled service with technological interfaces (e.g., find-select-click interfaces). Moreover, because of the perceived social presence of services enabled by chatbots' capability of understanding and interacting with users like human agents (Lankton et al. 2015), they will rate post-use confirmation of their initial expectations of the chatbot-enabled service higher than those with little technology anxiety will. Our finding implies that chatbot services may, to some extent, replace real human agents on OTAs at the current stage, at least for those with a high level of technology anxiety.
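The moderation pattern above can be made concrete with a standard interaction-term test: technology anxiety strengthens a quality-confirmation link when the coefficient on the product of the (mean-centered) quality dimension and anxiety is positive. This is a minimal sketch on simulated data with hypothetical coefficients, not the study's estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
reliability = rng.normal(0, 1, n)  # mean-centered quality score (simulated)
anxiety = rng.normal(0, 1, n)      # mean-centered technology anxiety (simulated)

# Simulate a positive moderation: the reliability -> confirmation slope
# grows with anxiety via a +0.2 interaction coefficient (illustrative only).
confirmation = (0.5 * reliability - 0.1 * anxiety
                + 0.2 * reliability * anxiety
                + rng.normal(0, 0.5, n))

# OLS with an interaction term: X = [1, reliability, anxiety, reliability*anxiety]
X = np.column_stack([np.ones(n), reliability, anxiety,
                     reliability * anxiety])
beta, *_ = np.linalg.lstsq(X, confirmation, rcond=None)

# A positive interaction coefficient (beta[3]) indicates that anxiety
# strengthens the quality -> confirmation relationship.
print(beta[3] > 0)
```

Note that a negative zero-order correlation between anxiety and confirmation (as reported above) can coexist with a positive interaction coefficient, which is exactly the pattern the study describes.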

The positive relationships among confirmation, satisfaction, and use continuance intention (H7 and H8 supported) correspond to the extended post-acceptance model of IS continuance (i.e., ECM) proposed and tested by Bhattacherjee (2001). As expected, our findings are consistent with extant studies: if users find that their initial expectations of using a chatbot-enabled OTA are confirmed, they will be satisfied with the service, which will eventually lead to use continuance intention.

Theoretical implications

This paper provides four major theoretical implications. First, this study contributes to the literature on the extended post-acceptance model of IS continuance (i.e., the ECM) by adding chatbot quality dimensions to the model and examining their roles in the chatbot service setting, which has been under-investigated. More specifically, this study introduces five chatbot quality dimensions as antecedents, which we believe are relevant in the context of chatbot-enabled OTAs. The roles of these service quality antecedents are evaluated simultaneously with the theoretical loop of confirmation, satisfaction, and continuance use intention, as well as with technology anxiety as a moderating factor. By theoretically extending the ECM and empirically testing it in the context of Chinese chatbot-enabled OTA services, this study contributes to the literature on the post-acceptance model of IS continuance. In addition, this study contributes to the literature on chatbot user experience (UX) by identifying salient quality dimensions for contemporary chatbot services. By reviewing several seminal articles on IS quality, this study proposes five service quality dimensions, provides conceptual definitions of them in the context of chatbot services, and empirically validates their roles in facilitating chatbot users' post-adoption confirmation. Since this study is one of the first attempts to define the dimensions of chatbot service quality, the identification and empirical validation of these five dimensions (i.e., understandability, reliability, responsiveness, assurance, and interactivity) can be considered one of our key theoretical contributions and sheds light on the literature on chatbots and UX. Although our test results suggest that responsiveness is not significantly related to confirmation, the remaining four dimensions are significantly related to post-use confirmation in the context of chatbot-enabled Chinese OTAs.
These results on the relationships between the five quality dimensions and users' post-adoption confirmation might not be very surprising in that, intuitively, any IS-enabled service of good quality should result in a high level of users' post-use confirmation. However, it is worth noting that this study identifies two new service quality dimensions specific to human-like IT-enabled services (i.e., interactivity and understandability) on top of the three conventional service quality dimensions (i.e., reliability, responsiveness, and assurance) identified by extant studies, and empirically validates the differences among those dimensions. Therefore, these five quality dimensions, possibly with a few more dimensions specific to other contexts of human-like IT-enabled services, can be used in future studies.

Second, another important theoretical contribution of this study is that, based on the theoretical perspectives of uncanny valley theory (Mori 1970), technology humanness (Lankton et al. 2015), and the personification of technology (Purington et al. 2017), it introduces technology anxiety as a moderating factor for the relationships between chatbot quality dimensions and post-use confirmation and finds the interesting result that technology anxiety strengthens the relationship between perceived service quality and post-use confirmation in the context of chatbot-enabled OTA services. The literature on technology anxiety and computer anxiety has been largely consistent about the negative impact of technology anxiety on users' reactions to technology and its adoption (Igbaria and Parasuraman 1989; Meuter et al. 2003; Parasuraman 2000). Technology anxiety has either a mitigating effect on the relationship between users' evaluation of a technology-related product and their adoption, or a negative direct relationship with users' adoption of technology, when the targets of use are clearly 100% technology-oriented products or services (Meuter et al. 2003; Yang and Forney 2013). These negative impacts of technology anxiety are quite intuitive in that when a consumer is anxious about using technology, s/he tries to avoid technology-enabled services and seeks human alternatives for her/his needs, if any. However, as discussed above, since a chatbot-enabled customer service agent is designed to mimic human conversation and the way real humans interact with their customers so as to provide users with a perception of social presence (Rettie 2003; Short et al. 1976), some users could see it as a human-like agent (even though they already acknowledge that the chatbot is supported by AI technology).
Thus, those in our sample who have a high level of technology anxiety, and possibly a lower initial expectation of chatbot-enabled OTAs (rather than more technology-oriented alternatives such as OTA mobile apps), could have found that their expectations were well confirmed given a high level of perceived service quality. With the advancement of AI-driven NLP technology and its applications, as consumer-facing chatbot services become more human-like and users' perceptions of those advanced technologies change accordingly, technology anxiety may increasingly work in the way the results of this study imply. Perhaps consumers will eventually not care whether the agents on the other side of their service interface are humans or chatbots, and may even personify the chatbots and treat them as human agents (Purington et al. 2017), as long as they can find what they are looking for in a timely and convenient manner. Some users with less technology anxiety will be comfortable using technology services such as mobile apps or online websites, while others with more technology anxiety will try to find human or human-like agents to get their services done. Therefore, we believe that the positive moderating role of technology anxiety on the relationship between quality dimensions and post-use confirmation found in this study can shed light on the literature on the humanness or personification of technology devices.
As this study did not measure the humanness or degree of personification of chatbot services, we suggest that future research look into the relationships among users' technology anxiety, their perceived humanness (or personification of human-like technologies), service quality dimensions (with a new list of dimensions that fit the context of the study), and their adoption in various AI-driven technology contexts (e.g., humanoids, cyber service agents with a human voice and appearance, etc.).

Third, this study contributes to the new research stream of smart tourism. Among the three aspects of smart tourism research introduced by Gretzel et al. (2015), this study falls under 'smart experience,' which focuses on enhancing users' experience of tourism services with real-time interaction, personalization, and context-awareness supported by new technologies (Buhalis and Amaranggana 2015; Neuhofer et al. 2015). Although the tourism industry is one of the industries that benefits most from chatbot services, with a handful of recent conceptual studies (e.g., Buhalis and Yen 2020; Ukpabi et al. 2019; Zlatanov and Popesku 2019), it has not been active in empirically investigating users' adoption of chatbot services, apart from very few attempts (e.g., Melián-González et al. 2019), let alone in identifying and empirically validating chatbot quality dimensions. This study, therefore, applies the theoretical model of the relationships among chatbot quality dimensions, technology anxiety, and users' post-use reactions to the context of smart tourism to provide empirical evidence on how current tourism service users assess the quality of AI-enabled smart tourism services (i.e., chatbot services) and react to the service given their level of technology anxiety toward these human-like services. Our results imply that chatbots in the tourism industry that can provide reliable and responsive services with assured quality and real-time interaction will confirm users' expectations and encourage their continued use. Therefore, we believe that our investigation of tourism service consumers' perceptions of service quality, their reactions to the technology (additionally including the moderating role of technology anxiety), their post-use confirmation/satisfaction, and their adoption intention will contribute to the body of knowledge on smart tourism.

Practical implications

In practice, we argue that our research on chatbots has the potential to benefit chatbot developers, user experience (UX) designers, quality assurance specialists, and e-Commerce/e-Service providers, and to raise end-user awareness of which factors should be considered for better performance of chatbot services when interacting with the system. First, this study proposes five quality characteristics (i.e., understandability, reliability, responsiveness, assurance, and interactivity) of chatbot services that possibly influence use continuance through confirmation and satisfaction. Among the five characteristics, understandability, reliability, assurance, and interactivity are significantly related to confirmation, and further positively related to use continuance through satisfaction. Therefore, several aspects of a chatbot service (e.g., the four quality dimensions found significant in this study) should be tested before implementation by developers and designers to make sure that clients and end-users have an engaging experience that encourages continued use of the product. By answering the question of which of the five chatbot quality dimensions are relatively more important, this study helps chatbot developers be well informed about which features of a chatbot to focus on. Our results suggest that reliability and assurance are the top two essential aspects, followed by understandability and interactivity, in terms of relative importance. We suggest that OTA managers, UX designers, and developers continuously work to make chatbots more reliable, trustworthy, and understandable, while providing more interactive functions to satisfy users' needs and sense of control.

Second, our results suggest that responsiveness is not significantly related to post-use confirmation, which may imply that chatbot users in our sample do not consider responsiveness a major factor influencing their confirmation of chatbot services in the Chinese OTA context. However, as discussed, this result may be due to the fact that the chatbot-based OTAs in our sample are already sufficiently responsive, with a mean of 5.41 on a seven-point scale for this variable; thus, we did not observe much variation in the responsiveness of the chatbot services in our sample. This result does not necessarily mean that responsiveness is an unimportant dimension in other contexts of chatbot services. Although it could suggest to chatbot developers, UX designers, quality assurance specialists, and e-Commerce/e-Service providers, especially in China, that they should focus more on understandability, reliability, assurance, and interactivity rather than on response speed, responsiveness could still be an important factor in other chatbot contexts, especially where adequate response times are not yet achieved.

Third, although most people know that chatbot services are facilitated by AI-enabled NLP technologies, our result on the role of technology anxiety also suggests that, at least in the context of OTA users in China, chatbots may be perceived differently from other obviously 100% technology-enabled services. This result does not necessarily mean that currently available chatbot services can fully replace human agents; it is acknowledged that there is still a considerable proportion of questions that chatbots cannot answer properly. According to a survey of 500 participants, 60% and 56% of the respondents in the U.S. and the UK, respectively, prefer human agents to chatbots for complex inquiries in their service engagements (CGS 2018). The positive moderating role of technology anxiety in the relationships between quality dimensions and post-use confirmation could be due to the fact that there are quite a few groups of people who want to avoid other technology-enabled alternatives (e.g., mobile apps) because of their high level of technology anxiety; their probably lower initial expectations of the chatbot service are well confirmed once they recognize that the OTA chatbot services are good enough on each dimension of service quality. Nevertheless, this result still has an important practical implication: with the advancement of chatbot technologies that increase their so-called humanness, e-Commerce and e-Service providers can essentially convert users who seek human agents (because of their high level of technology anxiety) into chatbot users, at least for service inquiries that are not very complex, which can further help them save customer service costs.

Limitations and future research

There are five major limitations of this study. First, a specific context was chosen in this study: chatbot-based OTAs. Even though our findings could provide insights on chatbot quality dimensions to a broad audience of practitioners, our results may be difficult to generalize to all chatbot services. In addition, only the Chinese market was investigated. The results of our hypothesized relationships may differ in other contexts or countries. Future studies should therefore test this model across a variety of industries and countries to improve the generalizability of our research model and gain additional implications for wider audiences. Moreover, the role of responsiveness should be further investigated in other contexts of chatbot services (e.g., hotel and restaurant settings in other countries) to increase the external validity of our research model.

Second, related to the first limitation, as this study covers only a single cultural context (China), cultural differences with regard to users' reactions to human-like technologies are not fully taken into consideration. Therefore, it might be problematic to generalize the results of our study beyond the context of Chinese chatbot users. However, we believe that providing more information about the use and adoption situation in China compared to Western countries can help readers interpret our results comparatively. The following are some interesting findings we have learned from industry data. First, despite the fact that the Asia-Pacific region is expected to have the highest growth rate in the chatbot market (Chatbot Market 2019) and China became the world's second-largest OTA market in 2018 (Phocuswright Research 2019), the chatbot usage rate in the Asia-Pacific region (21%) is still reported to be less than half of that in the North American region (45%) (Andre 2020). This penetration gap suggests that chatbot users (our research participants) in China could belong to the early adopter groups in Rogers's innovation diffusion categories (Rogers 2010), compared to North American chatbot users (the early majority groups). Therefore, our survey respondents could be considered more innovative than North American users, implying that current Chinese chatbot users will show more innovative tendencies when reacting to human-like technologies, regardless of whether they have a high or a low level of technology anxiety. Second, Chinese consumers have relatively less trust in online products due to various product quality and safety issues (Chatbots Magazine 2017). Despite some anecdotal evidence that the situation has recently improved, this lack of trust in products and services provided online still encourages Chinese consumers to ask more questions before making purchases online.
Chatbot services, if their quality dimensions are acceptable and better than expected, should therefore be more welcomed by Chinese online consumers than by Western consumers, so that the relationships between chatbot service quality dimensions and users' post-use confirmation (or other adoption variables) should be stronger than those for Westerners. Based on our research model and a consideration of cultural differences, a future study could further investigate the relationships among service quality, technology anxiety, humanness, and users' post-adoption variables in a more appropriate manner.

Third, only technology anxiety was examined for a moderating effect on the relationships between chatbot quality dimensions and post-use confirmation. In practice, other kinds of anxiety might influence our proposed relationships, such as social anxiety, which reflects difficulty in having conversations or interactions with strangers (Schlenker and Leary 1982). Thus, the roles of various kinds of anxiety could be further examined to discover which obstacles hinder users' confirmation, satisfaction, and use continuance toward chatbot services.

Fourth, although the original ECM of IS continuance includes perceived usefulness to explain users' post-use assessment of the instrumentality of a product, this study focuses only on the key chatbot quality dimensions influencing post-use confirmation, satisfaction, and use continuance, using the extended post-acceptance model of IS continuance. Our research purpose and scope are to investigate which features of chatbot quality influence users' post-use confirmation, rather than their evaluation of the service's instrumentality, which could be well captured by the service quality dimensions we included. Nevertheless, in future research it would be worthwhile to include additional variables, such as perceived usefulness and perceived security, to provide further interesting implications.

Lastly, although we added four control variables to our structural model testing, other potentially important factors for users' continuance intention were not taken into consideration. For example, Internet usage experience or the type of device used for chatbot services (e.g., smartphone, tablet PC, or desktop PC) could have potentially influenced our dependent variable. Future studies could therefore control for more relevant variables to increase the validity of our research model.

Conclusion

This study extends the post-acceptance model of IS continuance in the context of chatbot-enabled OTAs by identifying several quality dimensions of chatbot services (i.e., understandability, reliability, responsiveness, assurance, and interactivity) as antecedents of users' post-acceptance confirmation, and by proposing technology anxiety toward chatbots as a moderating factor for these relationships. This study found that, with the exception of responsiveness, the quality dimensions of chatbot services are significantly related to confirmation, which in turn leads to users' continuance intention. It also revealed that technology anxiety positively moderates the relationships between chatbot quality dimensions and post-use confirmation, suggesting that some users may treat chatbot services as human-like agents. Practically, the results of this study provide quality assurance specialists, e-Service providers, and chatbot developers with guidelines for better understanding chatbot users and enhancing service adoption in the tourism and hospitality sector. Future studies could extend this model across a variety of industries and cultural backgrounds to improve the generalizability of our research model and gain additional implications for wider audiences. In addition, future studies could extend our research model by further investigating the relationships among other service quality dimensions that are salient in other technology contexts, technology anxiety, other types of anxiety (e.g., social interaction anxiety), humanness, and other post-adoption variables (e.g., recommendation intention) to provide additional implications.