Towards integrated dialogue policy learning for multiple domains and intents using hierarchical deep reinforcement learning
Expert Systems with Applications, 2020 • Elsevier
Abstract
Creating an expert, intelligent dialogue/virtual agent (VA) that can serve complicated and intricate user needs spanning multiple domains and their various intents is challenging, because the agent must concurrently handle multiple subtasks in different domains. This paper presents an expert, unified, and generic Deep Reinforcement Learning (DRL) framework that creates dialogue managers capable of managing task-oriented conversations spanning multiple domains and their various intents, providing the user with an expert system that is a one-stop solution for all queries. To address these multiple aspects, the dialogue exchange between the user and the VA is split into hierarchies so as to isolate and identify subtasks belonging to different domains. Hierarchical Reinforcement Learning (HRL), specifically the notion of options, is employed to learn optimal policies in these hierarchies, which operate at varying time steps to accomplish the user goal. The dialogue manager comprises a top-level domain meta-policy, intermediate-level intent meta-policies that select among multiple subtasks (options), and low-level controller policies that select primitive actions to complete the subtask given by the higher-level meta-policies across varying intents and domains. Sharing controller policies among overlapping subtasks enables the meta-policies to be generic. The proposed expert framework is demonstrated in the “Air Travel” and “Restaurant” domains. Experiments comparing it against several strong baselines and a state-of-the-art model establish the efficiency of the learned policies and the need for such expert models capable of handling complex and composite tasks.
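The three-level control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the intent names, slot names, and the hand-written rules standing in for learned policies are all hypothetical, and the simulated user simply answers every request. It shows only how a domain meta-policy, an intent meta-policy, and a shared low-level controller might hand control to one another, with each option running over several primitive time steps.

```python
from dataclasses import dataclass

# Hypothetical subtasks and slots; none are taken from the paper.
# "cancel_booking" is assumed to be an intent that both domains share.
SLOTS = {
    "book_flight": ["origin", "destination", "date"],
    "book_table": ["cuisine", "time"],
    "cancel_booking": ["booking_id"],
}


@dataclass
class Controller:
    """Low-level policy: picks primitive actions until its subtask is done."""
    slots: list

    def act(self, filled):
        # Placeholder for a learned policy: request the first missing slot.
        for slot in self.slots:
            if slot not in filled:
                return f"request({slot})"
        return "close_subtask"


# One controller per subtask name, so overlapping subtasks reuse the same policy.
CONTROLLERS = {name: Controller(slots) for name, slots in SLOTS.items()}


def domain_meta_policy(goal):
    """Top level: choose the next domain that still has unfinished intents."""
    for domain, intents in goal.items():
        if intents:
            return domain
    return None


def intent_meta_policy(goal, domain):
    """Intermediate level: choose the next intent (option) within the domain."""
    return goal[domain][0]


def run_dialogue(goal):
    """Roll out the hierarchy and return (domain, intent, action) tuples."""
    goal = {d: list(i) for d, i in goal.items()}  # do not mutate the caller's goal
    trace = []
    while (domain := domain_meta_policy(goal)) is not None:
        intent = intent_meta_policy(goal, domain)
        controller = CONTROLLERS[intent]
        filled = set()
        while True:  # one option spans several primitive time steps
            action = controller.act(filled)
            trace.append((domain, intent, action))
            if action == "close_subtask":
                break
            # Simulated user (an assumption): always supplies the requested slot.
            filled.add(action[len("request("):-1])
        goal[domain].remove(intent)
    return trace


if __name__ == "__main__":
    user_goal = {"air_travel": ["book_flight"], "restaurant": ["cancel_booking"]}
    for step in run_dialogue(user_goal):
        print(step)
```

In a learned system each of the three functions would be replaced by a trained policy (for example a Q-network per level); the sketch keeps only the division of labour between the levels and the reuse of one controller across domains.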