KR102323482B1

KR102323482B1 - Conversation agent system and method using emotional history

Info

Publication number: KR102323482B1
Application number: KR1020190043129A
Authority: KR
Inventors: 신홍식; 홍은미; 이청안
Original assignee: 한국전자인증 주식회사
Priority date: 2019-03-19
Filing date: 2019-04-12
Publication date: 2021-11-09
Also published as: KR20200111595A

Abstract

The present technology relates to a dialogue agent system and method using an utterance emotion history. According to a specific example of the present technology, training data are derived from a large number of collected original data by reflecting a discounted cumulative reward intended to induce a change in the speaker's emotion. A dialogue model is built by training on the derived training data with a configured learning technique. The built dialogue model then generates a response sentence and a response emotion for an input utterance sentence and utterance emotion, respectively, and the generated response emotion and response sentence are converted into natural language, combined, and uttered. This improves accuracy with respect to the speaker's emotion and enables effective conversations that reflect emotion.

Description

CONVERSATION AGENT SYSTEM AND METHOD USING EMOTIONAL HISTORY

The present invention relates to a dialogue agent system and method using an utterance emotion history and, more particularly, to a technology that generates a response sentence by reflecting a discounted cumulative reward intended to induce a change in the speaker's emotion, so that the conversation actively reflects the speaker's emotion.

Earlier research on interactive (conversational) systems concentrated on answering the uttered sentence without considering the user's emotion, but research on interactive systems that incorporate emotion has recently become active.

Deep learning encoders applied to such interactive systems use deep learning to represent a variable-length document as a fixed-length document vector, and they can perform well in emotion classification.

However, an LSTM (Long Short-Term Memory) encoder, which takes the final output over the entire document sequence as the document vector, is ill-suited to encoding long documents: as the input grows longer, its recognition of patterns that appeared early in the input degrades sharply.

An object of the present invention is to provide a dialogue agent system and method using an utterance emotion history that generate and utter a response sentence and a response emotion reflecting a discounted cumulative reward that induces a change in the speaker's emotion, thereby improving the accuracy of the conversation and enabling effective conversations that reflect emotion.

The objects of the present invention are not limited to the object mentioned above. Other objects and advantages not mentioned here can be understood from the following description and will become clearer from the embodiments of the present invention. It will also be readily apparent that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

A dialogue agent system using an utterance emotion history according to an embodiment may be characterized by including:

a dialogue model building device that calculates, for a plurality of collected original data, a discounted cumulative reward for inducing a change in the speaker's emotion, and builds a dialogue model by reflecting the calculated discounted cumulative reward;

a receiving device that receives the sentence and emotion contained in an utterance;

a preprocessing device that preprocesses the sentence and emotion extracted by the receiving device into a single training item;

a response generating device that derives a response sentence and a response emotion from the single training item produced by the preprocessing device, based on the built dialogue model; and

an output device that converts the derived response sentence and response emotion each into natural language form and combines them, so that the response emotion to the utterance emotion and the response sentence to the utterance sentence are uttered together.
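At run time, the receiving, preprocessing, response generating, and output devices form a four-stage pipeline, while the dialogue model building device works offline. The sketch below assumes each device can be modeled as a plain function and shows the data flow for one turn; the function name `run_agent_turn` and the toy stand-ins in the usage lines are hypothetical illustrations, not the patent's implementation.

```python
from typing import Callable, List, Tuple

Tokens = List[str]

def run_agent_turn(
    receive: Callable[[], Tuple[str, str]],
    preprocess: Callable[[str, str], Tokens],
    generate: Callable[[Tokens], Tuple[str, str]],
    output: Callable[[str, str], str],
) -> str:
    """Run one dialogue turn through the four run-time devices (hypothetical wiring)."""
    sentence, emotion = receive()                 # receiving device: sentence + emotion
    item = preprocess(sentence, emotion)          # preprocessing device: one input item
    resp_sentence, resp_emotion = generate(item)  # response generating device: dialogue model
    return output(resp_sentence, resp_emotion)    # output device: combine and utter


# Usage with toy stand-ins for each device.
reply = run_agent_turn(
    receive=lambda: ("I am so happy", "happiness"),
    preprocess=lambda s, e: s.split() + [e],
    generate=lambda item: ("I am happy too", item[-1]),
    output=lambda s, e: f"[{e}] {s}",
)
print(reply)  # [happiness] I am happy too
```

Passing the devices in as callables keeps the pipeline independent of how each stage is realized (speech recognizer, morphological analyzer, RNN encoder-decoder, and so on).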

Preferably, the dialogue model building device includes:

an original data collection module that collects a plurality of original data;

a discounted cumulative reward calculation module that calculates a discounted cumulative reward for each piece of original data;

a learning module that generates training data by reflecting the calculated discounted cumulative reward in each piece of original data, and then trains on the generated training data using a predetermined learning algorithm; and

a dialogue model building module that builds the dialogue model based on the learning result.

Preferably, the discounted cumulative reward calculation module:

initializes the discounted cumulative reward to reward = 0;

if index+2n+2 < 2, determines whether the emotion of episode x[index+2n+2] is happiness and, if so, sets current reward = previous reward + rⁿ; and

increments n to n+1 and repeats the process for all original data.

Here, r is the discount rate; x[index+2n+2] is an episode of one piece of original data, consisting of a sentence and an emotion; index is the identifier of the original data; and n is the distance between the original data and the training data corrected by the discounted cumulative reward.
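The reward loop above can be sketched as follows. This is a minimal reading, not the patent's code: it assumes the loop runs while index+2n+2 is inside the dialogue (as printed, the bound "< 2" admits no iteration for index = n = 0, which contradicts the x[2] example given later in the description), and it represents a dialogue as a list of emotion labels. The function name and the default r = 0.9 are hypothetical.

```python
from typing import Sequence

HAPPINESS = "happiness"

def discounted_reward(emotions: Sequence[str], index: int, r: float = 0.9) -> float:
    """Discounted cumulative reward for the utterance at position `index`.

    emotions[k] is the emotion label of utterance x[k] (0-based).
    Assumption: the loop bound is the dialogue length, len(emotions).
    """
    if not 0.0 < r < 1.0:
        raise ValueError("discount rate r must satisfy 0 < r < 1")
    reward = 0.0                         # (1) reward = 0
    n = 0
    while index + 2 * n + 2 < len(emotions):
        # (2) every second utterance after x[index], i.e. the same speaker's later turns
        if emotions[index + 2 * n + 2] == HAPPINESS:
            reward += r ** n             # nearer happiness earns a larger share
        n += 1                           # (3) n = n + 1
    return reward


# Happiness right after the reply is worth more than happiness two exchanges later.
print(discounted_reward(["neutral", "neutral", HAPPINESS, "neutral", "neutral"], 0, r=0.5))  # 1.0
print(discounted_reward(["neutral", "neutral", "neutral", "neutral", HAPPINESS], 0, r=0.5))  # 0.5
```

Because r < 1, this reproduces the qualitative behavior described in the claims: the same emotion earns more reward when it follows the response immediately than when it appears only after further turns.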

Preferably, the discounted cumulative reward increases relative to its previous value when a sentence with the same emotion is uttered immediately in reaction to a given response sentence in the original data, and decreases relative to its previous value when a sentence with the same emotion is uttered in reaction to that response sentence only after a predetermined number of turns have passed.

Preferably, the dialogue agent system using the utterance emotion history

may further include a model update device that updates the dialogue model by training on the input utterance sentence and emotion and the output response sentence and emotion with policy gradient training under a predetermined reinforcement learning policy.
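The patent does not specify the policy or reward used for the policy gradient update. As a minimal illustration of the technique only, the sketch below applies REINFORCE to a single softmax policy over the six response emotions, with a hypothetical reward of 1.0 whenever the sampled response emotion is happiness. A real model update would differentiate through the full dialogue model rather than one logit vector; all names here are assumptions.

```python
import math
import random
from typing import List

EMOTIONS = ["surprise", "fear", "disgust", "anger", "happiness", "sadness"]

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits: List[float], action: int, reward: float, lr: float = 0.1) -> List[float]:
    """One REINFORCE (policy gradient) step on a softmax policy.

    d log pi(action) / d logit_k = 1[k == action] - pi(k), so gradient ascent
    moves each logit by lr * reward * (1[k == action] - pi(k)).
    """
    probs = softmax(logits)
    return [v + lr * reward * ((1.0 if k == action else 0.0) - p)
            for k, (v, p) in enumerate(zip(logits, probs))]

def train(episodes: int = 500, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    logits = [0.0] * len(EMOTIONS)
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(EMOTIONS)), weights=probs)[0]
        # Hypothetical reward: 1.0 if the sampled response emotion is happiness.
        reward = 1.0 if EMOTIONS[action] == "happiness" else 0.0
        logits = reinforce_step(logits, action, reward)
    return softmax(logits)


probs = train()
print(EMOTIONS[probs.index(max(probs))])
```

The update only ever raises the probability of rewarded actions, so the policy drifts toward the response emotion that the (assumed) reward favors.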

A dialogue agent method using an utterance emotion history according to an embodiment includes:

a dialogue model building step of calculating, for a plurality of collected original data, a discounted cumulative reward for inducing a change in the speaker's emotion, and building a dialogue model by reflecting the calculated discounted cumulative reward;

a receiving step of receiving the sentence and emotion contained in an utterance;

a preprocessing step of preprocessing the extracted sentence and emotion into a single training item;

a response generating step of deriving a response sentence and a response emotion from the single training item based on the built dialogue model; and

an output step of converting the derived response sentence and response emotion each into natural language form and combining them, so that the response emotion to the utterance emotion and the response sentence to the utterance sentence are uttered together.

Preferably, the dialogue model building step:

collects a plurality of original data;

calculates a discounted cumulative reward for each piece of original data;

generates training data by reflecting the calculated discounted cumulative reward in each piece of original data, and then trains on the generated training data using a predetermined learning algorithm; and

builds the dialogue model based on the learning result.

Preferably, the discounted cumulative reward

is initialized to reward = 0, and

increases relative to its previous value when a sentence with the same emotion is uttered immediately in reaction to a given response sentence in the original data, and decreases relative to its previous value when a sentence with the same emotion is uttered in reaction to that response sentence only after a predetermined number of turns have passed.

Preferably, after the output step, the method

may further include a model update step of updating the dialogue model by training on the input utterance sentence and emotion and the output response sentence and emotion with policy gradient training under a predetermined reinforcement learning policy.

According to an embodiment, training data are derived from a plurality of collected original data by reflecting a discounted cumulative reward for inducing a change in emotion, and a dialogue model is built by training on the derived training data with a configured learning technique. The built dialogue model generates a response sentence and a response emotion for an input utterance sentence and utterance emotion, respectively, and the generated response emotion and response sentence are converted into natural language, combined, and uttered. This improves accuracy with respect to the speaker's emotion and enables effective conversations that reflect emotion.

The drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description below, serve to further the understanding of its technical spirit. The present invention should therefore not be construed as limited to what is shown in the drawings.
FIG. 1 is a system configuration diagram according to an embodiment.
FIG. 2 is a detailed configuration diagram of the dialogue model building device of the system according to an embodiment.
FIG. 3 shows examples of original data and training data according to an embodiment.
FIG. 4 is an overall flowchart of the operation of the system according to an embodiment.
FIG. 5 is a detailed flowchart of the dialogue model building process according to an embodiment.

The present invention is applied to interactive systems. However, it is not limited thereto and may be applied to any interactive communication system and method to which its technical spirit can be applied.

Technical terms used in this specification serve only to describe specific embodiments and are not intended to limit the present invention. Unless specifically defined otherwise in this specification, they should be interpreted with the meaning generally understood by a person of ordinary skill in the art to which the present invention belongs, and should be interpreted neither in an excessively broad nor in an excessively narrow sense. Where a technical term used in this specification is an incorrect term that does not accurately express the spirit of the present invention, it should be understood as replaced by a technical term that those skilled in the art can correctly understand. General terms used in the present invention should be interpreted as defined in dictionaries or according to context, and should not be interpreted in an excessively narrow sense.

As used in this specification, singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "consisting of" or "comprising" should not be construed as necessarily including all of the components or steps described in the specification; they should be construed as possibly omitting some of those components or steps, or as possibly including additional components or steps.

Terms including ordinal numbers, such as first and second, may be used herein to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component.

When a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, or intervening components may be present. By contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening component is present.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the figure, and redundant descriptions of them are omitted. Where a detailed description of related known technology could obscure the gist of the present invention, it is omitted. The accompanying drawings are intended only to make the spirit of the present invention easy to understand and should not be construed as limiting it; the spirit of the present invention should be construed as extending to all modifications, equivalents, and substitutes beyond the accompanying drawings.

In one embodiment, a dialogue model is built from training data that reflect a discounted cumulative reward derived from original dialogue data between a speaker and an interlocutor. The dialogue model generates a response word and a response emotion for each word and emotion contained in the speaker's sentence; the generated response emotion and response words are each converted into natural language, and the combined response sentence is uttered.

FIG. 1 shows the configuration of a dialogue agent system using an emotion history according to an embodiment, FIG. 2 shows the detailed configuration of the dialogue model building device 100 shown in FIG. 1, and FIG. 3 shows the original data and training data collected to build the dialogue model of FIG. 2.

Referring to FIGS. 1 to 3, the system according to an embodiment may include a dialogue model building device 100, a receiving device 200, a preprocessing device 300, a response generating device 400, and an output device 500.

The dialogue model building device 100 derives training data from a plurality of collected original data by reflecting a discounted cumulative reward for inducing a change in the speaker's emotion, and builds a dialogue model by training on the derived training data. To this end, as shown in FIG. 2, the dialogue model building device 100 may include an original data collection module 111, a discounted cumulative reward calculation module 112, a learning module 113, and a dialogue model building module 114.

The original data collection module 111 collects a plurality of original data and passes them to the discounted cumulative reward calculation module 112. As shown in FIG. 3(a), each piece of original data consists of the sentences exchanged between interlocutors and the emotion contained in each sentence.

The discounted cumulative reward calculation module 112 calculates the discounted cumulative reward (reward) from at least one sentence and emotion that follow an utterance in the collected original data, based on a predetermined loss function.

The discounted cumulative reward (reward) may be calculated by solving the loss function with the following algorithm.

(1) Initialize the discounted cumulative reward: reward = 0.

(2) If index+2n+2 < 2, determine whether the emotion of episode x[index+2n+2] is happiness; if it is, set current reward = previous reward + rⁿ.

(3) Increment n to n+1 and return to step (2).

Here, r is the discount rate and is less than 1, and x[index+2n+2] is one piece of original data consisting of a sentence and an emotion. If the position of x in the dialogue is odd, x is an odd-numbered utterance, i.e., original data uttered by interlocutor 1; if it is even, x is original data uttered by interlocutor 2.
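The parity rule maps each utterance of the original data to an interlocutor. A minimal sketch, assuming the parity refers to the utterance's 1-based position in the dialogue (first, third, ... utterances belong to interlocutor 1) and that each episode is a sentence plus an emotion label; the names `Episode`, `speaker_of`, and `split_by_speaker` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Episode:
    """One utterance of the original data: a sentence and its emotion label."""
    sentence: str
    emotion: str

def speaker_of(ordinal: int) -> int:
    """Interlocutor of the ordinal-th utterance (1-based): odd -> 1, even -> 2."""
    if ordinal < 1:
        raise ValueError("ordinal is 1-based")
    return 1 if ordinal % 2 == 1 else 2

def split_by_speaker(dialogue: List[Episode]) -> Dict[int, List[Episode]]:
    """Group a dialogue's utterances by interlocutor, preserving their order."""
    turns: Dict[int, List[Episode]] = {1: [], 2: []}
    for ordinal, episode in enumerate(dialogue, start=1):
        turns[speaker_of(ordinal)].append(episode)
    return turns


dialogue = [
    Episode("I got the job.", "happiness"),
    Episode("Congratulations!", "happiness"),
    Episode("Thanks, I'm thrilled.", "happiness"),
]
by_speaker = split_by_speaker(dialogue)
print(len(by_speaker[1]), len(by_speaker[2]))  # 2 1
```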

index is the index of the original data and indicates which utterance of the conversation episode x is; a discounted cumulative reward can be derived for any two consecutive episodes. For example, since index and n are both 0 for the first utterance, the first episode x[index+2n+2] used to derive the discounted cumulative reward is x[2]. If episode x[2] contains the emotion happiness, the discounted cumulative reward becomes the previous reward + rⁿ, with n = 0.

In other words, n is the variable used to calculate the discounted cumulative reward and represents the distance from the utterance sentence to the emotion being rewarded. For example, the distance from the first utterance sentence and first response sentence to the first sentence whose emotion is rewarded is n = 0; from the third utterance sentence and third response sentence to the second rewarded sentence, n = 1; and from the fifth utterance sentence and fifth response sentence to the second rewarded sentence, n = 2.
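Steps (1) to (3) amount to a discounted sum over every second utterance after x[index]. Written as a formula, assuming N denotes the number of utterances in the dialogue (the printed bound "< 2" admits no iteration for index = n = 0, so the dialogue length is assumed here) and e(·) denotes the emotion label of an episode:

$$
\mathrm{reward}(\mathrm{index}) = \sum_{\substack{n \ge 0 \\ \mathrm{index}+2n+2 < N}} r^{n}\,\mathbf{1}\!\left[e\!\left(x[\mathrm{index}+2n+2]\right) = \text{happiness}\right], \qquad 0 < r < 1
$$

For example, with r = 0.9 and index = 0, if x[2] and x[6] carry happiness but x[4] does not, the reward is 0.9⁰ + 0.9² = 1 + 0.81 = 1.81. The happiness at distance n = 2 contributes less than the happiness at n = 0.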

Further, if the emotion contained in the sentence that immediately follows a given sentence of the original data (as its reaction) is happiness, the discounted cumulative reward may be increased relative to its previous value; if the emotion contained in a sentence of the original data that occurs a predetermined number of turns later is happiness, the discounted cumulative reward may also be increased relative to its previous value, but by the smaller amount rⁿ, since r < 1.

The training data of FIG. 3(b), in which the calculated discounted cumulative reward is reflected, are passed to the learning module 113, which trains on them. In an embodiment, the training may take various forms, such as machine learning or deep learning, but is not limited thereto.
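One plausible reading of FIG. 3(b) is that each training example pairs an utterance x[index] with its reply x[index+1] and carries the reward computed for that index. The sketch below assumes exactly that layout, the dialogue length as the loop bound, and a `(sentence, emotion)` tuple per utterance; `TrainingExample` and `build_training_data` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence, Tuple

Utterance = Tuple[str, str]  # (sentence, emotion)

class TrainingExample(NamedTuple):
    source: Utterance   # x[index]
    target: Utterance   # reply x[index + 1]
    reward: float       # discounted cumulative reward for this index

def build_training_data(dialogue: Sequence[Utterance], r: float = 0.9) -> List[TrainingExample]:
    """Turn one dialogue into reward-annotated (utterance, reply) pairs."""
    examples = []
    for index in range(len(dialogue) - 1):
        reward, n = 0.0, 0
        while index + 2 * n + 2 < len(dialogue):      # assumed loop bound
            if dialogue[index + 2 * n + 2][1] == "happiness":
                reward += r ** n
            n += 1
        examples.append(TrainingExample(dialogue[index], dialogue[index + 1], reward))
    return examples


data = build_training_data(
    [("hi", "neutral"), ("hello", "neutral"), ("great", "happiness")], r=0.5
)
print([ex.reward for ex in data])  # [1.0, 0.0]
```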

The learning result of the learning module 113 is passed to the dialogue model building module 114, which builds the dialogue model based on it. The dialogue model is a recurrent neural network (RNN) encoder-decoder model; the process of building such a model was the subject of an earlier application by the present applicant. The dialogue model building module 114 derives a discounted cumulative reward for each input sentence and output sentence, multiplies a predetermined loss function by the discounted cumulative reward as a weighting factor to obtain a final loss function, and trains on the input original data with a gradient descent algorithm using the final loss function to build the final dialogue model. Based on the final dialogue model, a response word and a response emotion are derived for each input utterance sentence and emotion.
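The final loss scales a base loss by the discounted cumulative reward. A minimal sketch, assuming the base loss is the negative log-likelihood (cross-entropy) of the reference response and that per-example losses are averaged over a batch; the patent does not name the base loss, and the function names are hypothetical. In a real RNN encoder-decoder, the token probabilities would come from the decoder's softmax, and this weighted loss would be minimized by gradient descent.

```python
import math
from typing import Sequence

def sequence_nll(token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of one reference sequence, given the model's
    probability for each reference token."""
    return -sum(math.log(p) for p in token_probs)

def reward_weighted_loss(batch_token_probs: Sequence[Sequence[float]],
                         rewards: Sequence[float]) -> float:
    """Final loss: each example's NLL multiplied by its discounted cumulative
    reward, averaged over the batch."""
    if len(batch_token_probs) != len(rewards):
        raise ValueError("one reward per example is required")
    total = sum(w * sequence_nll(p) for p, w in zip(batch_token_probs, rewards))
    return total / len(rewards)


# Example 1: two tokens at p = 0.5 with reward 1.0; example 2: a certain token with reward 2.0.
loss = reward_weighted_loss([[0.5, 0.5], [1.0]], [1.0, 2.0])
print(round(loss, 4))  # 0.6931, i.e. ln 2
```

Because the reward multiplies the loss, examples whose reward is zero contribute nothing to training under this weighting, while replies that led to happiness soon afterwards are emphasized.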

The receiving device 200 receives the speaker's sentence. For example, when the speaker says "I'm so happy," it converts the speaker's voice into a sentence and passes it to the preprocessing device 300.

In this embodiment, for convenience of description, the receiving device 200 is described as a speech converter that simply converts the speaker's voice into a sentence, but it may also extract emotion in the various ways described herein and pass it to the preprocessing device 300.

The receiving device 200 may extract the speaker's emotion by recognizing the speaker's facial expressions or behavior, and may also extract it by recognizing the intensity and pitch of the speaker's voice, which reflect the emotion contained in the received words.

Specifically, a facial expression recognition algorithm extracts fast signals from the speaker's face, such as changes in facial shape caused by facial muscle movement, changes in the eyes, nose, and mouth, and transient wrinkles, and derives the speaker's emotion from these fast signals. Here, emotion refers to surprise, fear, disgust, anger, happiness, and sadness. Surprise has the shortest duration; fear is felt before harm occurs; disgust manifests as aversive behavior toward something; anger, the most dangerous emotion, is triggered by frustration, threat, or provocation; happiness is the most positive emotion; and sadness is caused by loss and lasts a long time. The emotion extracted using these characteristics is passed to the preprocessing device 300.
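The six emotion categories form a closed label set. A minimal sketch of that set as a Python enum, with a hypothetical `parse_emotion` helper that normalizes labels coming from a recognizer and a hypothetical `REWARDED_EMOTION` constant reflecting that only happiness adds to the discounted cumulative reward in the algorithm described earlier.

```python
from enum import Enum

class Emotion(Enum):
    SURPRISE = "surprise"
    FEAR = "fear"
    DISGUST = "disgust"
    ANGER = "anger"
    HAPPINESS = "happiness"
    SADNESS = "sadness"

# Only happiness adds r**n to the discounted cumulative reward.
REWARDED_EMOTION = Emotion.HAPPINESS

def parse_emotion(label: str) -> Emotion:
    """Map a raw label such as ' Happiness ' to an Emotion; raises ValueError if unknown."""
    return Emotion(label.strip().lower())


print(parse_emotion(" Happiness ") is REWARDED_EMOTION)  # True
```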

Alternatively, the receiving device 200 may extract the speaker's emotion by estimating the prosodic boundaries of the speaker's voice using at least one of hidden Markov models (HMM), classification and regression trees (CART), and stacked sequential learning (SSL), or based on frequency-domain and magnitude analysis for each emotion, and pass it to the preprocessing device 300.

In the following, for convenience of description, the embodiment uses a speech-to-text converter that converts the speaker's voice into words, after which the converted words and the emotion they contain are passed to the preprocessing device 300.

The preprocessor 300 then splits the speaker's received sentence into morphemes and outputs the morpheme-level words together with the emotion contained in them. For example, the preprocessor 300 outputs the words (x₁ to x₄) for "나는" ("I"), "너무" ("so"), and "행복해" ("happy"), together with the user's emotion e_x, "happiness".

The emotion is derived using, for example, emotional ToBI (Tones and Break Indices, a prosodic transcription convention). The derived emotion e_x is appended to the end of the words that express it and transmitted to the response generator 400.
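The preprocessing described above splits the sentence into units and appends the emotion after the words. It can be sketched as follows. Whitespace splitting stands in for a real Korean morphological analyzer, and the "<emotion>" tag format is a hypothetical choice; neither comes from the patent.

```python
def preprocess(sentence: str, emotion: str) -> list[str]:
    """Split an utterance into units and append the emotion as a final token.

    Whitespace splitting is a placeholder for morpheme analysis, and the
    "<emotion>" tag format is an assumption made for this sketch.
    """
    tokens = sentence.split()
    tokens.append(f"<{emotion}>")
    return tokens
```

For the example utterance, `preprocess("나는 너무 행복해", "happiness")` returns `["나는", "너무", "행복해", "<happiness>"]`.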

The response generator 400 generates a response sentence and a response emotion from the sentence and emotion processed by the preprocessor 300, using the final dialogue model described above. It thus outputs a response emotion for each utterance emotion and a response sentence for each utterance sentence. Each of these is transmitted to the output device 500.
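The response generator's contract is to take the preprocessed input and return a response sentence together with a response emotion. The sketch below shows that contract under one assumption: a dialogue model is any callable from tagged tokens to a (sentence, emotion) pair. `echo_emotion_model` is a hypothetical toy stand-in for the trained model, not the patent's model.

```python
from typing import Callable

# Assumed interface: tokens ending in an emotion tag -> (sentence, emotion).
DialogueModel = Callable[[list[str]], tuple[str, str]]


def generate_response(model: DialogueModel, tokens: list[str]) -> tuple[str, str]:
    """Run the dialogue model and check that it returns both outputs."""
    sentence, emotion = model(tokens)
    if not sentence or not emotion:
        raise ValueError("model must return a non-empty sentence and emotion")
    return sentence, emotion


def echo_emotion_model(tokens: list[str]) -> tuple[str, str]:
    """Toy stand-in for a trained model: mirrors the speaker's emotion."""
    emotion = tokens[-1].strip("<>")
    replies = {"happiness": "나도 행복해"}  # "I'm happy too"
    return replies.get(emotion, "그렇구나"), emotion  # fallback: "I see"
```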

The output device 500 converts the received response emotion and response sentence into natural language, combines them into a single response, and utters it. For example, a response sentence and response emotion such as "나도 행복해" ("I'm happy too") are output and uttered.
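The patent does not say how an emotion label is turned into natural language before it is combined with the sentence. A minimal sketch, assuming the emotion is rendered as a short parenthetical cue placed in front of the sentence, could look like this. The phrase table and function name are hypothetical.

```python
# Hypothetical verbalizations of response emotions (not from the patent).
EMOTION_PHRASES = {
    "happiness": "(happily)",
    "sadness": "(sadly)",
    "surprise": "(in surprise)",
}


def render_output(response_sentence: str, response_emotion: str) -> str:
    """Convert the response emotion to text and combine it with the sentence."""
    phrase = EMOTION_PHRASES.get(response_emotion, "")
    return f"{phrase} {response_sentence}".strip()
```

An unmapped emotion simply leaves the sentence unchanged.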

The output device 500 may also output the emotion-bearing response sentence and emotion in various forms. For example, it may show a character such as an avatar with a facial expression and/or gesture that matches the response emotion. It may then output and/or utter the response sentence with correspondingly adjusted voice intensity and pitch.
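The avatar output amounts to a lookup from the response emotion to rendering controls. The description names three controls: facial expression, voice intensity, and pitch. The sketch below holds them in a small table. The expression names and scale factors are illustrative assumptions only, since the patent gives no values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvatarStyle:
    facial_expression: str
    pitch_scale: float   # multiplier on the base voice pitch
    volume_scale: float  # multiplier on the base voice intensity


# Illustrative settings only; the patent specifies the controls, not values.
STYLES = {
    "happiness": AvatarStyle("smile", 1.15, 1.10),
    "sadness": AvatarStyle("downcast", 0.90, 0.85),
    "anger": AvatarStyle("frown", 1.05, 1.20),
}
NEUTRAL = AvatarStyle("neutral", 1.0, 1.0)


def avatar_style(response_emotion: str) -> AvatarStyle:
    """Look up avatar and voice settings matching the response emotion."""
    return STYLES.get(response_emotion, NEUTRAL)
```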

Accordingly, in one embodiment, learning data is derived from a plurality of collected original data by reflecting a discounted cumulative reward that serves to induce a change in emotion. A dialogue model is built by training on the derived learning data with a preset learning technique. The built dialogue model then generates a response sentence and a response emotion for each input utterance sentence and utterance emotion. The generated response emotion and response sentence are converted into natural language, combined, and uttered. This improves the accuracy with which the speaker's emotion is handled and enables effective conversations that reflect emotion.

The dialogue agent system using emotion history according to an embodiment may further include a model updater 600. The model updater 600 receives the sentences and emotions from the preprocessor 300. It then trains on the input utterance sentences and emotions and the output response sentences and emotions with policy gradient training under a reinforcement learning policy, and updates the dialogue model of the dialogue model building device 100.

The dialogue model building device 100 thus trains on the sentences and emotions from the preprocessor 300 with policy gradient training and applies the result to the built dialogue model. As a result, the happier the emotion, the larger the discounted cumulative reward.
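The patent names policy gradient training but gives no details. The sketch below shows vanilla REINFORCE on a deliberately simplified policy: a tabular softmax over a few candidate responses, with the discounted cumulative reward used as the return. The tabular policy and the learning rate are assumptions; the actual dialogue model would be a parametric network.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits: list[float], action: int, reward: float,
                     lr: float = 0.1) -> list[float]:
    """One REINFORCE step for a softmax policy over discrete responses.

    For a softmax policy, d log pi(action) / d logit_j = 1[j == action] - p_j,
    so each logit moves by lr * reward * (1[j == action] - p_j).
    """
    probs = softmax(logits)
    return [
        z + lr * reward * ((1.0 if j == action else 0.0) - p)
        for j, (z, p) in enumerate(zip(logits, probs))
    ]
```

A positive reward raises the probability of the chosen response, and a reward of zero leaves the policy unchanged. This matches the stated goal that happier outcomes reinforce the responses that produced them.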

Although happiness is used as the example emotion in this embodiment, the approach applies to other emotions as well and is not limited thereto.

FIG. 4 is a flowchart illustrating how the dialogue agent system using emotion history converses in response to an utterance according to an embodiment. FIG. 5 is a detailed flowchart of the dialogue model building device 100 of FIG. 4. A computer-readable recording medium storing a program for executing the dialogue agent method using emotion history according to an embodiment may be provided. The program may include at least one of an application program storing the method, a device driver, firmware, middleware, a dynamic link library (DLL), and an applet. The dialogue agent system using emotion history may include a processor. The processor may execute the dialogue agent method using emotion history by reading the recording medium on which the method is recorded.

Referring to FIGS. 4 and 5, in step 10, the dialogue model building device 100 of the dialogue agent system using speech emotion history may derive, for the collected original data, a discounted cumulative reward that serves to induce a change in the speaker's emotion.

Specifically, in step 10 the dialogue model building device 100 may collect a plurality of original data and then calculate the discounted cumulative reward for the collected data.

That is, the dialogue model building device 100 collects a plurality of original data in step 11. In step 12, it sets the initial discounted cumulative reward to reward = 0 for each original data item. In step 13, it determines whether index + 2n + 2 < 2. If this condition is satisfied, it determines in step 14 whether the emotion of the episode x[index + 2n + 2] is happiness.

If, in step 14, the emotion of the episode x[index + 2n + 2] is happiness, the dialogue model building device 100 sets current discounted reward = previous discounted reward + rⁿ in step 15. In step 16, it increments n to n + 1. It then returns to step 13 and again determines whether index + 2n + 2 < 2.
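Steps 12 to 16 can be sketched as a short loop. This is an illustration under stated assumptions, not the patent's implementation. First, an episode is represented as a list of per-utterance emotion labels, with the two parties alternating turns. Second, the printed condition index + 2n + 2 < 2 cannot hold for non-negative index and n, so the loop bound is assumed to be the episode length. Third, following the claim wording, n is assumed to advance whether or not the utterance is happy.

```python
def discounted_reward(emotions: list[str], index: int, r: float,
                      target: str = "happiness") -> float:
    """Discounted cumulative reward for the utterance at position `index`.

    Turns alternate between the two parties, so positions index+2, index+4,
    ... are the same speaker's later utterances. The bound len(emotions)
    is an assumption.
    """
    reward, n = 0.0, 0  # step 12: reward starts at 0
    while index + 2 * n + 2 < len(emotions):  # step 13 (assumed bound)
        if emotions[index + 2 * n + 2] == target:  # step 14
            reward += r ** n  # step 15: reward = previous reward + r^n
        n += 1  # step 16
    return reward
```

With r = 0.5, a happy utterance two turns later contributes 1 and one four turns later contributes 0.5. Emotional improvement that follows a response quickly is therefore weighted more heavily, in line with the behaviour the claims describe.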

Then, in step 17, learning data is derived by reflecting the discounted cumulative reward in the input original data. In step 18, training is performed on the derived learning data using a preset machine learning technique. In step 19, a dialogue model is built from the training result.
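The patent does not say how the reward is "reflected" in the learning data. One possible reading, sketched here as an assumption, pairs each speaker utterance with the reply that follows it and attaches the discounted cumulative reward as a per-sample training weight. The episode is assumed to start with the speaker and to alternate turns. The reward helper repeats the step 12 to 16 rule so that the block stands alone.

```python
Turn = tuple[str, str]  # (sentence, emotion)


def _discounted_reward(episode: list[Turn], index: int, r: float) -> float:
    # Steps 12-16: add r**n whenever the speaker's utterance 2n+2 turns
    # later is labelled "happiness" (loop bound assumed to be the length).
    reward, n = 0.0, 0
    while index + 2 * n + 2 < len(episode):
        if episode[index + 2 * n + 2][1] == "happiness":
            reward += r ** n
        n += 1
    return reward


def build_learning_data(episode: list[Turn], r: float) -> list[tuple[Turn, Turn, float]]:
    """Return (utterance, reply, weight) samples for one episode.

    Using the reward as a sample weight is one hypothetical way to reflect
    it in training; the patent leaves the mechanism unspecified.
    """
    return [
        (episode[i], episode[i + 1], _discounted_reward(episode, i, r))
        for i in range(0, len(episode) - 1, 2)
    ]
```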

In step 20, the receiver 200 of the dialogue agent system using speech emotion history may receive the sentence and emotion contained in an utterance. The sentence may be obtained with a speech converter, and the emotion may be extracted with an emotion analyzer.

In step 30, the preprocessor 300 may preprocess the extracted sentence and emotion into a single learning-data item.

In step 40, the response generator 400 may use the built dialogue model to derive a response sentence and a response emotion for that learning-data item.

In step 50, the output device 500 may convert the derived response sentence and response emotion into natural language and combine them. It then outputs the response emotion to the utterance emotion together with the response sentence to the utterance sentence.

In step 60, the model updater 600 may train on the input utterance sentences and emotions and the output response sentences and emotions with policy gradient training under a reinforcement learning policy. It thereby updates the dialogue model of the dialogue model building device 100.

In summary, learning data is derived from a plurality of collected original data by reflecting a discounted cumulative reward for inducing emotional change, and a dialogue model is built by training on that data with a set learning technique. The built model generates a response sentence and a response emotion for each input utterance sentence and utterance emotion. These are converted into natural language, combined, and uttered. This improves the accuracy with which the speaker's emotion is handled and enables effective conversations that reflect emotion.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that it may be variously modified and changed without departing from the spirit and scope of the invention as set forth in the following claims.

100: dialogue model building device
111: original data collection module
112: discount cumulative compensation value calculation module
113: learning module
114: dialogue model building module
200: receiving device
300: pre-processing device
400: response generator
500: output device
600: model update device

Claims

A dialogue agent system using speech emotion history, comprising:
a dialogue model building device for calculating, for a plurality of collected original data, a discounted cumulative reward for inducing a change in the speaker's emotion, and building a dialogue model that reflects the calculated discounted cumulative reward;
a receiver for receiving the sentence and emotion included in an utterance;
a preprocessor for preprocessing the sentence and emotion received by the receiver into one learning-data item;
a response generator for deriving a response sentence and a response emotion from the learning-data item of the preprocessor based on the built dialogue model; and
an output device that converts each of the derived response sentence and response emotion into natural language and combines them, so as to output the response emotion to the utterance emotion together with the response sentence to the utterance sentence,
wherein the dialogue model building device comprises:
an original data collection module for collecting a plurality of original data;
a discounted cumulative reward calculation module for calculating a discounted cumulative reward for each of the original data;
a learning module that generates learning data by reflecting the discounted cumulative reward calculated for each of the original data and then trains on the generated learning data with a predetermined learning algorithm; and
a dialogue model building module for building a dialogue model based on the learning result,
wherein the discounted cumulative reward calculation module:
initially sets the discounted reward to reward = 0;
if index + 2n + 2 < 2, determines whether the emotion of the episode x[index + 2n + 2] is happiness and, if it is, sets current discounted reward = previous discounted reward + rⁿ; and
increments n to n + 1 and repeats the above for all original data,
where r is the discount rate, x[index + 2n + 2] is an episode of one original data item containing sentences and emotions, index is the identification information of the original data, and n is the distance between the original data and the learning data corrected with the discounted cumulative reward.

delete

The dialogue agent system using speech emotion history according to claim 1, wherein the discounted cumulative reward
increases relative to the previous discounted cumulative reward when a sentence with the same emotion is uttered immediately after an arbitrary response sentence in the original data, and
is reduced relative to the previous discounted cumulative reward when a sentence with the same emotion is uttered only after a predetermined number of turns following an arbitrary response sentence in the original data.

The dialogue agent system using speech emotion history according to claim 1, further comprising
a model updater that updates the dialogue model by training on the sentences and emotions included in the utterance and the derived response sentences and emotions with policy gradient training based on a predetermined reinforcement learning policy.

A dialogue agent method using speech emotion history, comprising:
a dialogue model building step of calculating, for a plurality of collected original data, a discounted cumulative reward for inducing a change in the speaker's emotion, and building a dialogue model that reflects the calculated discounted cumulative reward;
a receiving step of receiving the sentence and emotion included in an utterance;
a preprocessing step of preprocessing the received sentence and emotion into one learning-data item;
a response generating step of deriving a response sentence and a response emotion from the learning-data item based on the built dialogue model; and
an output step of converting each of the derived response sentence and response emotion into natural language and combining them, so as to output the response emotion to the utterance emotion together with the response sentence to the utterance sentence,
wherein the dialogue model building step comprises:
collecting a plurality of original data;
calculating a discounted cumulative reward for each of the original data;
generating learning data by reflecting the discounted cumulative reward calculated for each of the original data and then training on the generated learning data with a predetermined learning algorithm; and
building a dialogue model based on the learning result,
wherein the discounted cumulative reward is calculated by:
initially setting reward = 0;
if index + 2n + 2 < 2, determining whether the emotion of the episode x[index + 2n + 2] is happiness and, if it is, setting current discounted reward = previous discounted reward + rⁿ; and
incrementing n to n + 1 and repeating the above for all original data,
where r is the discount rate, x[index + 2n + 2] is an episode of one original data item containing sentences and emotions, index is the identification information of the original data, and n is the distance between the original data and the learning data corrected with the discounted cumulative reward.

delete

The dialogue agent method using speech emotion history according to claim 6, wherein the discounted cumulative reward
increases relative to the previous discounted cumulative reward when a sentence with the same emotion is uttered immediately after an arbitrary response sentence in the original data, and
is reduced relative to the previous discounted cumulative reward when a sentence with the same emotion is uttered only after a predetermined number of turns following an arbitrary response sentence in the original data.

The dialogue agent method using speech emotion history according to claim 6, further comprising, after the output step,
a model update step of updating the dialogue model by training on the input utterance sentences and emotions and the output response sentences and emotions with policy gradient training based on a predetermined reinforcement learning policy.

delete