KR102017229B1

KR102017229B1 - A text sentence automatic generating system based deep learning for improving infinity of speech pattern

Info

Publication number: KR102017229B1
Application number: KR1020190043515A
Authority: KR
Inventors: 윤종성; 김영준; 김위백; 양형원; 이인구; 김홍순; 송민규
Original assignee: 미디어젠(주)
Priority date: 2019-04-15
Filing date: 2019-04-15
Publication date: 2019-09-02
Also published as: WO2020213785A1

Abstract

The present invention relates to a deep learning-based system that automatically generates text sentences in order to improve the infinity of speech patterns. More specifically, a deep learning modeling unit (100) receives texts together with their speech patterns and common intention information, performs deep learning training, and provides the resulting per-intention deep learning model information to an automatic test sentence generation unit (300). The automatic test sentence generation unit (300) feeds an intention type supplied through an output intention input unit (200) into the corresponding per-intention deep learning model information, outputs automatically generated natural-language speech pattern text sentences, and outputs a confidence value for each output text sentence.

Description

A text sentence automatic generating system based deep learning for improving infinity of speech pattern

The present invention relates to a deep learning-based automatic text sentence generation system for improving the infinity of speech patterns. More particularly, it relates to a system that performs deep learning training using the speech patterns and intention information of texts collected from native speakers as deep learning training material, derives per-intention deep learning model information as the result, and applies specific intention information to the derived model information to automatically generate and output a plurality of text sentences having that specific intention.

As speech recognition technology advances, demand for natural language recognition is also increasing.

In the prior art, natural language recognition relies on various models, such as rules, statistical models, and per-intention deep learning model information, to identify intentions and extract target objects. Once such a model has been implemented, however, collecting data to verify it is not easy.

That is, a large amount of utterance data must be collected before such a model can be built, but most of the collected data has already been used to build the model and is therefore unsuitable for reuse in verification. New natural language sentence data must consequently be collected for verification, which is very costly and time-consuming.

Accordingly, there is a growing need for a technique that uses deep learning to generate new test sentences from previously collected data, and thereby secures pure verification data that was not used to build the model.

In addition, in the prior art, applying Dictation and NLU technologies inevitably changed the format of the recognition results. Automatic evaluation with the existing Grammar Matching method therefore became impossible, and there is a growing need to automatically correct the errors that arise in existing evaluation systems.

Furthermore, now that natural language processing is applied to speech recognition devices, speech patterns of every kind can be input as speech recognition commands. Manually writing out and testing a user's potential speech patterns, however, consumes a great deal of time and money, and the patterns that can be produced this way are limited.

It has therefore become necessary to develop an automated system that automatically generates a user's potential speech patterns and thereby improves the infinity of speech patterns.

To remedy these problems of the prior art, the present invention proposes a deep learning-based system for automatically generating natural-language speech pattern text sentences. The system performs deep learning training using the speech patterns and common intention information of texts collected from native speakers as deep learning training material, derives per-intention deep learning model information as the result, and applies specific intention information to the derived model information to automatically generate and output a plurality of text sentences having that specific intention.

(Prior Document 1) Republic of Korea Patent Registration No. 10-0733469

Accordingly, the present invention has been proposed in view of the above problems of the prior art. A first object of the present invention is to generate per-intention deep learning model information by performing deep learning training using the speech patterns and intention information of native-speaker texts collected as deep learning training material.

A second object of the present invention is to automatically generate text sentences having a specific intention by applying specific intention information to the generated per-intention deep learning model information.

A third object of the present invention is to use the generated text sentences as evaluation text for the NLU (Natural Language Understanding) of an automatic speech recognition evaluation system.

To achieve these objects, a deep learning-based automatic text sentence generation system for improving the infinity of speech patterns comprises:

a deep learning modeling unit (100) that receives various texts having speech pattern information and intention information as deep learning training material, performs deep learning training to generate per-intention deep learning model information comprising a plurality of deep learning models, and provides the generated per-intention deep learning model information to an automatic test sentence generation unit (300);

an output intention input unit (200) that provides the automatic test sentence generation unit (300) with the intention type information it needs to generate text sentences; and

the automatic test sentence generation unit (300), which uses the per-intention deep learning model information provided by the deep learning modeling unit (100) and the intention type information provided by the output intention input unit (200) to generate and output text sentences corresponding to the intention type information, and outputs a confidence value for each output text sentence.

With the above configuration and operation, the deep learning-based automatic text sentence generation system for improving the infinity of speech patterns starts from previously collected data (texts with various speech patterns and specific intentions, collected for use as deep learning training material) and automatically generates and outputs new text sentences that differ from the collected data (text sentences having the same intention as the collected texts but different speech patterns). It thereby provides the effect of improving the infinity of a user's potential speech patterns.

In addition, text sentences for each intention type (pure verification data not used to build the model) can be secured automatically, without collecting an evaluation corpus directly from users. Using these sentences as evaluation text for the NLU (Natural Language Understanding) of an automatic speech recognition evaluation system reduces the long time and heavy cost of manually writing and testing a user's potential speech patterns.

FIG. 1 is an overall configuration diagram schematically showing a deep learning-based automatic text sentence generation system for improving the infinity of speech patterns according to a first embodiment of the present invention.
FIG. 2 is a block diagram of the deep learning modeling unit (100) of the system according to the first embodiment of the present invention.
FIG. 3 illustrates the basic structure of the RNN used in the system according to the first embodiment of the present invention.
FIG. 4 is a block diagram of the automatic test sentence generation unit (300) of the system according to the first embodiment of the present invention.
FIG. 5 is a structure diagram of the automatic text sentence generation performed by the automatic test sentence generation unit (300) of the system according to the first embodiment of the present invention.
FIG. 6 illustrates the RNN algorithm structure and training parameters of the system according to the first embodiment of the present invention.
FIG. 7 illustrates RNN model training in the system according to the first embodiment of the present invention.
FIG. 8 illustrates a test of the automatic generation results produced by the system according to the first embodiment of the present invention.
FIG. 9 illustrates the text sentences output for each intention type by the system according to the first embodiment of the present invention.

The following merely illustrates the principles of the invention. Those skilled in the art will therefore be able to devise various apparatuses that embody the principles of the invention and fall within its concept and scope, even though they are not explicitly described or shown herein.

In addition, all conditional terms and embodiments listed herein are, in principle, expressly intended only to aid understanding of the concept of the invention, and are to be understood as not being limited to the specifically listed embodiments and conditions.

In describing the present invention, terms such as first and second may be used to describe various components, but the components are not to be limited by these terms.

For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is said to be connected or coupled to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may be present in between.

The terminology used herein is used only to describe particular embodiments and is not intended to limit the invention. Singular expressions may include plural expressions unless the context clearly indicates otherwise.

In this specification, terms such as "comprise" or "include" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The deep learning-based automatic text sentence generation system for improving the infinity of speech patterns according to the first embodiment of the present invention comprises

an automatic test sentence generation unit (300) that uses the per-intention deep learning model information provided by the deep learning modeling unit (100) and the intention type information provided by the output intention input unit (200) to generate and output text sentences corresponding to the intention type information, and outputs a confidence value for each output text sentence.

The deep learning modeling unit (100) comprises:

a model text corpus unit (110) that receives a plurality of texts having specific speech patterns as deep learning training material and provides the received texts to a training modeling unit (130);

a model intention input unit (120) that receives the intentions of the texts input to the model text corpus unit (110), tags each text with its intention, and provides the resulting per-text intention tagging information to the training modeling unit (130); and

the training modeling unit (130), which performs deep learning training using the plurality of texts having specific speech patterns provided by the model text corpus unit (110) and the per-text intention tagging information provided by the model intention input unit (120) to generate per-intention deep learning model information, and provides the generated per-intention deep learning model information to the automatic test sentence generation unit (300).

Here, the training modeling unit (130) comprises

a modeling parameter providing module (131) that adjusts modeling parameters during deep learning modeling.

The automatic test sentence generation unit (300) comprises:

a calculation model unit (310) that uses the per-intention deep learning model information provided by the deep learning modeling unit (100) and the intention type information provided by the output intention input unit (200) to automatically generate text sentences of various speech patterns having the intention corresponding to the intention type information;

a corpus output unit (320) that outputs the text sentences generated by the calculation model unit (310); and

a confidence value output unit (330) that outputs a confidence value indicating how closely a text sentence output through the corpus output unit (320) matches the intention of the intention type information input through the output intention input unit (200).

Here, the calculation model unit (310) further comprises

an output parameter providing module (311) that adjusts parameters for output options when the generated text sentences are output.

The text sentences generated and output by the automatic test sentence generation unit (300)

have the same intention type as the texts input as deep learning training material, but different speech patterns.

Further, the text sentences generated and output by the automatic test sentence generation unit (300)

are used as evaluation text for the NLU (Natural Language Understanding) of an automatic speech recognition evaluation system.

Hereinafter, the deep learning-based automatic text sentence generation system for improving the infinity of speech patterns according to the present invention will be described in detail through embodiments.

FIG. 1 is an overall configuration diagram schematically showing the deep learning-based automatic text sentence generation system for improving the infinity of speech patterns according to the first embodiment of the present invention.

As shown in FIG. 1, the deep learning-based automatic text sentence generation system (1000) for improving the infinity of speech patterns comprises a deep learning modeling unit (100), an output intention input unit (200), and an automatic test sentence generation unit (300).

The deep learning modeling unit (100) receives various texts having speech pattern information and intention information as deep learning training material, performs deep learning training to generate per-intention deep learning model information comprising a plurality of deep learning models, and provides the generated per-intention deep learning model information to the automatic test sentence generation unit (300).

For example, when the deep learning training material input to the deep learning modeling unit (100) is '1. Find Panda Restaurant', '2. I want to know who Sherlock Holmes is', and so on, the speech pattern information of the respective texts is "Panda_Restaurant_Find" (판다_레스토랑_찾기) and "I_Sherlock Holmes_who_want to know" (나_셜록홈즈_누구인지_알고싶다), and their intention information is 'restaurant search' and 'person search'.

In this way, the deep learning modeling unit (100) receives a large number of texts having speech patterns and intention information as deep learning training material, performs deep learning training, and provides the resulting per-intention deep learning model information (comprising a plurality of deep learning models) to the automatic test sentence generation unit (300).

The output intention input unit (200) provides the automatic test sentence generation unit (300) with the intention type information it needs to generate text sentences.

For example, it provides the intention type 'Find Restaurant' to the automatic test sentence generation unit (300), so that the automatic test sentence generation unit (300) generates and outputs text sentences of various speech patterns having the intention 'Find Restaurant'.

The automatic test sentence generation unit (300) uses the per-intention deep learning model information provided by the deep learning modeling unit (100) and the intention type information provided by the output intention input unit (200) to generate and output text sentences corresponding to the intention type information, and outputs a confidence value for each output text sentence.

For example, if the intention type information provided by the output intention input unit (200) is 'Find Restaurant', the automatic test sentence generation unit (300) selects, from the per-intention deep learning model information provided by the deep learning modeling unit (100), the model information for the intention 'Find Restaurant' and uses it to generate and output text sentences that have this intention in various speech patterns.

FIG. 2 is a block diagram of the deep learning modeling unit (100) of the deep learning-based automatic text sentence generation system for improving the infinity of speech patterns according to the first embodiment of the present invention.

As shown in FIG. 2, the deep learning modeling unit (100) comprises a model text corpus unit (110), a model intention input unit (120), and a training modeling unit (130).

Specifically, the model text corpus unit (110) receives a plurality of texts having specific speech patterns as deep learning training material and provides the received texts to the training modeling unit (130).

Each of the input texts having a specific speech pattern carries its own intention.

For example, if the input texts having specific speech patterns are 1. Where is the <Panda Restaurant>?, 2. I want to look for <Panda Restaurant>, and 3. Could you please find <Panda Restaurant>?, these texts all carry the intention of finding a restaurant called 'Panda'. That is, they are texts with the intention 'Find Restaurant'.

Likewise, as other examples, texts with various speech patterns corresponding to various intentions, such as music search, address search, weather search, and stock search, are input through the model text corpus unit (110).

In particular, the texts having specific speech patterns that serve as deep learning training material may be input to the model text corpus unit (110) either manually, by a person entering collected texts, or automatically, by a text collection robot entering the texts it has collected.

The model intention input unit (120) receives the intentions of the texts input to the model text corpus unit (110), tags each text with its intention, and provides the resulting per-text intention tagging information to the training modeling unit (130).

For example, when text no. 1 input to the model text corpus unit (110) is "Where is the <Panda Restaurant>?" and the information that the intention of text no. 1 is "Find Restaurant" is input to the model intention input unit (120), the model intention input unit (120) tags the text "Where is the <Panda Restaurant>?" with the intention "Find Restaurant" and provides the per-text intention tagging information (e.g., "Where is the <Panda Restaurant>? _Find Restaurant") to the training modeling unit (130).
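The tagging step performed by the model intention input unit (120) can be pictured with a minimal sketch. The names `TaggedText`, `tag_intents`, and `as_record` are illustrative assumptions and do not appear in the specification; only the "text _intention" record form follows the example above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TaggedText:
    """One training text together with the intention tagged onto it."""
    text: str
    intent: str

    def as_record(self) -> str:
        # Follows the "text _intention" form used in the example above.
        return f"{self.text} _{self.intent}"


def tag_intents(texts: List[str], intents: Dict[int, str]) -> List[TaggedText]:
    """Tag each text with its intention.

    `intents` maps a 1-based text number to the intention entered for it
    (a hypothetical input format chosen for this sketch).
    """
    tagged = []
    for number, text in enumerate(texts, start=1):
        if number not in intents:
            raise ValueError(f"no intention supplied for text {number}")
        tagged.append(TaggedText(text=text, intent=intents[number]))
    return tagged


corpus = [
    "Where is the <Panda Restaurant>?",
    "I want to look for <Panda Restaurant>",
    "Could you please find <Panda Restaurant>?",
]
tagged_corpus = tag_intents(
    corpus, {1: "Find Restaurant", 2: "Find Restaurant", 3: "Find Restaurant"}
)
print(tagged_corpus[0].as_record())
```

Keeping the text and its intention in one immutable record means the training step can group texts by intention without consulting a separate lookup table.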

The training modeling unit (130) then performs deep learning training using the plurality of texts having specific speech patterns provided by the model text corpus unit (110) and the per-text intention tagging information provided by the model intention input unit (120), generates per-intention deep learning model information, and provides the generated per-intention deep learning model information to the automatic test sentence generation unit (300).

The training modeling unit (130) may use the conventional RNN (Recurrent Neural Network) algorithm shown in FIG. 3 to generate the per-intention deep learning model information.

Because the words of an input sentence form a sequence whose meaning depends on their order in time, the RNN algorithm does not interpret each word in isolation; it learns the relationship between the previous words and the current word and then predicts the next word. As this is conventional technology, a detailed description is omitted.
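Although the specification omits the details of the RNN as known art, the recurrence it refers to can be recalled with a minimal sketch of one forward step of a plain (Elman-style) RNN. This is an assumption-laden illustration: the specification does not state the cell type, sizes, or vocabulary, the class `TinyRNN` is hypothetical, bias terms are left out for brevity, and the weights here are random and untrained, so the resulting probabilities carry no linguistic meaning.

```python
import math
import random
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Plain RNN cell: h_t = tanh(W_xh x_t + W_hh h_{t-1}), y_t = softmax(W_hy h_t)."""

    def __init__(self, vocab_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        self.w_xh = init(hidden_size, vocab_size)   # input-to-hidden weights
        self.w_hh = init(hidden_size, hidden_size)  # hidden-to-hidden weights
        self.w_hy = init(vocab_size, hidden_size)   # hidden-to-output weights

    def step(self, word_id: int, h_prev: List[float]) -> Tuple[List[float], List[float]]:
        # A one-hot input simply selects column `word_id` of w_xh.
        h = [
            math.tanh(self.w_xh[i][word_id]
                      + sum(w * p for w, p in zip(self.w_hh[i], h_prev)))
            for i in range(self.hidden_size)
        ]
        scores = [sum(w * v for w, v in zip(row, h)) for row in self.w_hy]
        return h, softmax(scores)


vocab = ["where", "is", "the", "restaurant"]
rnn = TinyRNN(vocab_size=len(vocab), hidden_size=8)
h = [0.0] * rnn.hidden_size
for word in ["where", "is", "the"]:
    # The hidden state h carries the previous words into the next prediction.
    h, next_word_probs = rnn.step(vocab.index(word), h)
print(next_word_probs)
```

The hidden state is the only thing passed from one step to the next, which is how the network relates earlier words to the current word when it scores candidates for the next one.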

Meanwhile, according to an additional aspect, the training modeling unit (130) further comprises a modeling parameter providing module (131) that adjusts modeling parameters during deep learning modeling.

For example, deep learning modeling requires adjusting modeling options such as the order of the model, the depth of the model, and the type of network. The modeling parameter providing module (131) provides the modeling parameters for these options to the training modeling unit (130) so that modeling is performed with the provided parameters.
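One way to picture what the modeling parameter providing module (131) hands to the training modeling unit (130) is a small validated configuration object. The sketch below is hypothetical: the field names, default values, and the list of network types are assumptions made for illustration, since the specification names only the order, depth, and network type as adjustable options.

```python
from dataclasses import dataclass

# Hypothetical set of network types; the specification itself mentions only the RNN.
SUPPORTED_NETWORK_TYPES = ("rnn", "lstm", "gru")


@dataclass(frozen=True)
class ModelingParameters:
    """Modeling options passed to the training step (illustrative defaults)."""
    order: int = 1             # "order of the model"; treated here as an opaque positive integer
    depth: int = 2             # number of stacked layers
    network_type: str = "rnn"  # which network family to train

    def __post_init__(self) -> None:
        if self.order < 1:
            raise ValueError("order must be at least 1")
        if self.depth < 1:
            raise ValueError("depth must be at least 1")
        if self.network_type not in SUPPORTED_NETWORK_TYPES:
            raise ValueError(f"unsupported network type: {self.network_type}")


params = ModelingParameters(depth=3)
print(params)
```

Validating the options when the object is created keeps an invalid depth or network type from reaching the training step.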

FIG. 4 is a block diagram of the test sentence automatic generation unit 300 of the deep learning-based automatic text sentence generation system for improving the infinity of speech patterns according to the first embodiment of the present invention.

FIG. 5 is a structural diagram of the automatic text sentence generation performed by the test sentence automatic generation unit 300 of the deep learning-based automatic text sentence generation system for improving the infinity of speech patterns according to the first embodiment of the present invention.

As shown in FIGS. 4 and 5, the test sentence automatic generation unit 300 includes a calculation model unit 310, a corpus output unit 320, and a confidence value output unit 330.

The calculation model unit 310 uses the deep learning model information for each intention 50 provided by the deep learning modeling unit 100 and the intention type information provided through the output intention input unit 200 to automatically generate text sentences of various speech patterns having the intention corresponding to the intention type information.

For example, 50,000 native-speaker texts are input to the deep learning modeling unit 100 and classified into 150 intention types, with the texts having specific speech patterns for each of the 150 intention types. The training modeling unit 130 of the deep learning modeling unit 100 then uses the RNN algorithm, as described above, to generate the deep learning model information for each intention 50 and provides it to the calculation model unit 310.
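
One way to picture how tagged texts become per-intention training material is to group the corpus by its intention tags. The sketch below is only an assumption about data preparation; the function name is hypothetical, and the specification does not state whether one model is trained per intention or a single model is conditioned on the intention.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_intention(texts: Sequence[str], tags: Sequence[str]) -> Dict[str, List[str]]:
    """Group corpus texts by their intention tag, preserving input order."""
    if len(texts) != len(tags):
        raise ValueError("each text needs exactly one intention tag")
    grouped = defaultdict(list)
    for text, tag in zip(texts, tags):
        grouped[tag].append(text)
    return dict(grouped)


# Tiny illustrative corpus; the real example in the text has 50,000 texts and 150 types.
texts = [
    "Where is the <Restaurant>?",
    "I want to look for <Restaurant>.",
    "navigate me over to <POI>",
]
tags = ["find restaurant", "find restaurant", "Navigate"]

training_sets = group_by_intention(texts, tags)
```

Each value in `training_sets` could then serve as the training material for one intention type.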

Having received the deep learning model information for each intention 50 from the deep learning modeling unit 100 and the intention type information from the output intention input unit 200, the calculation model unit 310 automatically generates text sentences of various speech patterns having the intention corresponding to the intention type information.

Describing this in detail with reference to FIG. 6, the calculation model unit 310, having received the deep learning model information for each intention 50 from the deep learning modeling unit 100 and the intention type information from the output intention input unit 200, automatically generates the output text sentences of various speech patterns using, for example, a representative RNN algorithm called the GRU (Gated Recurrent Unit) cell.

The automatic generation of text sentences of various speech patterns using the GRU cell RNN algorithm is described with reference to FIG. 7.

For example, the input data <STR> corresponds to the intention type information provided from the output intention input unit 200. When the intention type information is the intention "find restaurant", passing the input data <STR> through S1 yields the output "Find"; passing "Find" as input through S2 yields "me"; passing "me" through S3 yields "POI" (meaning the destination being sought); and passing "POI" through S4 yields the period that ends the sentence. As a result, "Find me POI.", a text sentence of a specific speech pattern having the intention of finding a restaurant, is generated.
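
The step-by-step generation described above, in which each output token is fed back as the next input until the period is produced, can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the lookup table `TOY_NEXT` is a hypothetical stand-in for a trained GRU and ignores the hidden state a real recurrent cell would carry, and the function names are not from the specification.

```python
from typing import Callable, List

START = "<STR>"   # start token taken from the example in the text
END = "."         # the period that ends the sentence

# Hypothetical stand-in for a trained model: previous token -> next token.
TOY_NEXT = {"<STR>": "Find", "Find": "me", "me": "POI", "POI": "."}


def toy_step(previous_token: str) -> str:
    """Return the next token for the given previous token (toy lookup)."""
    return TOY_NEXT[previous_token]


def generate(step: Callable[[str], str], max_len: int = 10) -> List[str]:
    """Feed each output back in as the next input until END or max_len is reached."""
    tokens: List[str] = []
    current = START
    for _ in range(max_len):
        current = step(current)
        tokens.append(current)
        if current == END:
            break
    return tokens


def detokenize(tokens: List[str]) -> str:
    """Join word tokens with spaces and attach the final period, if any."""
    words = [t for t in tokens if t != END]
    sentence = " ".join(words)
    return sentence + END if tokens and tokens[-1] == END else sentence


sentence = detokenize(generate(toy_step))
print(sentence)  # prints "Find me POI."
```

The `max_len` guard is a design choice for the sketch: it keeps the loop finite even if a model never emits the end token.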

The corpus output unit 320 outputs the text sentences automatically generated by the calculation model unit 310. The text sentences automatically generated by the calculation model unit 310 have the same intention as the texts input to the deep learning modeling unit 100 as deep learning training material, but have different speech patterns.

For example, if the texts input to the deep learning modeling unit 100 as deep learning training material are 1. Where is the <Restaurant>?, 2. I want to look for <Restaurant>., and 3. Could you please find <Restaurant>?, the calculation model unit 310 automatically generates text sentences such as 4. Find <Restaurant>., 5. I want to find <Restaurant>., 6. Could you please looking for <Restaurant>., and 7. Where is the <Restaurant>., which share the intention "find restaurant" but differ in speech pattern, and the generated text sentences are output through the corpus output unit 320.

Therefore, when the intention type information "find restaurant" is input through the output intention input unit 200, the calculation model unit 310 generates text sentences that share the intention "find restaurant" but differ from one another in speech pattern, such as Nos. 4 to 7 above, and the generated text sentences are output through the corpus output unit 320.

That is, the text sentences generated and output by the test sentence automatic generation unit 300 have the same intention type as the texts input to the deep learning modeling unit 100 for use as deep learning training material, but have different speech patterns.

The confidence value output unit 330 outputs a confidence value indicating how similar a text sentence output through the corpus output unit 320 is to the intention of the intention type information input through the output intention input unit 200.

In other words, it outputs a confidence value, that is, a probability value, with which one can check how similar the generated and output text sentence is to the intention of the intention type information input through the output intention input unit 200.

The confidence value (probability value) between an output text sentence and an intention type can generally be calculated using a probabilistic distance value. This is described in detail in Korean Patent No. 10-1890704, 'Simple message output device and output method using speech recognition and language modeling', and Korean Patent No. 10-1913191, 'Apparatus and method for improving language understanding performance based on domain extraction', both filed by and registered to the present applicant.

Techniques for calculating a confidence value are mainly used in statistics and speech recognition and are commonly applied in topic modeling, opinion mining, text summarization, data analysis, opinion polling, and the like. Those skilled in the art can therefore fully understand the above meaning without a detailed explanation of the principle of calculating the confidence value.

Meanwhile, according to an additional aspect, the calculation model unit 310 further includes an output parameter providing module 311 for adjusting parameters related to output options when outputting the generated text sentences.

For example, the output-option parameters adjustable by the output parameter providing module 311 may be the intention type and the number of output text sentences per intention type.

For example, if intention types are designated through the output parameter providing module 311 (e.g., route finding, radio search, address search) and the number of output text sentences per designated intention type is set to five, five automatically generated text sentences are output for each intention type, as shown in FIG. 9.
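
The two output options, the designated intention types and the number of sentences per type, can be sketched as follows. The names `OutputParameters`, `collect_outputs`, and `toy_generate` are hypothetical, and `toy_generate` merely stands in for the trained per-intention generator.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class OutputParameters:
    """Illustrative container for the two output options named in the text."""
    intention_types: Tuple[str, ...]
    sentences_per_type: int


def collect_outputs(
    params: OutputParameters,
    generate: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Take at most `sentences_per_type` sentences from the generator for each intention."""
    return {
        intention: list(islice(generate(intention), params.sentences_per_type))
        for intention in params.intention_types
    }


def toy_generate(intention: str) -> Iterator[str]:
    """Hypothetical stand-in for the trained generator: yields numbered placeholders."""
    n = 1
    while True:
        yield f"{intention} pattern {n}"
        n += 1


params = OutputParameters(intention_types=("Navigate", "Find Address"), sentences_per_type=5)
outputs = collect_outputs(params, toy_generate)
```

Using `islice` lets the generator be unbounded while the output quantity stays under the control of the parameter.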

In conclusion, the calculation model unit 310, the corpus output unit 320, and the confidence value output unit 330 of the test sentence automatic generation unit 300 make it possible to generate natural-language speech pattern text sentences purely for verification, not used in generating the deep learning model, and to use them as test text sentences for a speech recognition device. In particular, the text sentences generated and output by the test sentence automatic generation unit 300 can be used as evaluation text for the natural language understanding (NLU) of an automatic speech recognition evaluation system.

Next, an example in which the appropriateness (that is, the reliability) of the text sentences output for each intention by the deep learning-based automatic text sentence generation system for improving the infinity of speech patterns, and their duplication with the input sentences, were actually tested is described with reference to FIG. 8.

As shown in FIG. 8, when Navigate, Search, FM, Search Nearest, Create Calendar, Find Address, Search Menu, DisableGPS, StockMarketTrend, and Search Weather were input as the intention content for the 10 intention types, the appropriateness, that is, the reliability, was 100%, 100%, 100%, 100%, 100%, 80%, 100%, 100%, 60%, and 100%, respectively, giving an average confidence value of 94% and confirming considerable reliability.

In addition, the duplication with the input sentences was 40%, 40%, 80%, 40%, 0%, 60%, 20%, 40%, 20%, and 20%, respectively, for an average of 36%. Since the remaining generated sentences do not duplicate the input sentences, this confirmed the advantage that new, pure verification data not used in model generation can be obtained.
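
The two averages reported for FIG. 8 can be reproduced directly from the per-intention percentages listed in the text. The short check below is only arithmetic on those figures; the variable names are illustrative.

```python
# Per-intention percentages as listed in the text for the 10 intention types.
reliability = [100, 100, 100, 100, 100, 80, 100, 100, 60, 100]
duplication = [40, 40, 80, 40, 0, 60, 20, 40, 20, 20]


def average(values):
    """Arithmetic mean of a non-empty list of percentages."""
    return sum(values) / len(values)


avg_reliability = average(reliability)   # 940 / 10
avg_duplication = average(duplication)   # 360 / 10

print(avg_reliability, avg_duplication)  # prints "94.0 36.0"
```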

In addition, as shown in FIG. 9, for intention type No. 1, Navigate, newly and automatically generated natural-language speech pattern text sentences such as the following can be generated and output and used as NLU evaluation text.

That is,

Generated Pattern 1: navigate to a <POI> on <STREET> and <STREET>,

Generated Pattern 2: navigate me over to a <POI> on <STREET> and <STREET>,

Generated Pattern 3: navigate me over to <POI>,

Generated Pattern 4: navigate to the <STREET> and <STREET> intersection,

Generated Pattern 5: navigate to <POI> on the corner of <STREET> and <STREET>.

Ultimately, although the probability of generating ungrammatical sentences differs by intention type, the results show a level of matching sufficient for practical use. Once the newly output sentences have been refined (removing sentences that duplicate the originals, removing ungrammatical sentences, removing sentences lacking domain information, and so on), they can be used as NLU evaluation text.
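
The refinement steps listed above can be sketched as a simple filter. Everything beyond the three named steps is an assumption made for illustration: duplicates are detected after lowercasing and stripping trailing punctuation, the grammaticality check is a caller-supplied predicate, and "lacking domain information" is approximated as "contains no slot placeholder such as <POI>".

```python
import re
from typing import Callable, Iterable, List

# Assumption: domain information appears as slot placeholders like <POI> or <Restaurant>.
SLOT = re.compile(r"<[A-Za-z_]+>")


def normalize(sentence: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing punctuation for comparison."""
    return " ".join(sentence.lower().split()).rstrip(".?!, ")


def refine(
    generated: Iterable[str],
    training: Iterable[str],
    is_grammatical: Callable[[str], bool] = lambda s: True,
    require_slot: bool = True,
) -> List[str]:
    """Keep generated sentences that are new, grammatical, and carry a slot."""
    seen = {normalize(t) for t in training}
    kept: List[str] = []
    for sentence in generated:
        key = normalize(sentence)
        if key in seen:                      # duplicates a training sentence or an earlier output
            continue
        if not is_grammatical(sentence):     # rejected by the supplied grammar check
            continue
        if require_slot and not SLOT.search(sentence):
            continue                         # no slot placeholder, treated as lacking domain info
        seen.add(key)
        kept.append(sentence)
    return kept


training = [
    "Where is the <Restaurant>?",
    "I want to look for <Restaurant>.",
    "Could you please find <Restaurant>?",
]
generated = [
    "Find <Restaurant>.",
    "I want to find <Restaurant>.",
    "Could you please looking for <Restaurant>.",
    "Where is the <Restaurant>.",
]

# Toy grammar check for this example only; a real check would need a proper tool.
refined = refine(generated, training, is_grammatical=lambda s: "please looking" not in s)
```

With the restaurant examples from the description, this filter keeps "Find <Restaurant>." and "I want to find <Restaurant>.", dropping the sentence that differs from a training text only in punctuation and the one rejected by the toy grammar check.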

Ultimately, according to the present invention, the deep learning modeling unit 100 receives native-speaker texts having speech patterns and intention information, performs deep learning training, and provides the resulting deep learning model information for each intention to the test sentence automatic generation unit 300. The test sentence automatic generation unit 300 then uses the provided deep learning model information for each intention and the intention type provided through the output intention input unit 200 to generate and output text sentences having different speech patterns corresponding to the intention type, thereby improving the infinity of the user's potential speech patterns.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

100: deep learning modeling unit
200: output intention input unit
300: test sentence automatic generation unit

Claims

A deep learning-based automatic text sentence generation system for improving the infinity of speech patterns, comprising:
A deep learning modeling unit 100 that receives various texts, which are deep learning training materials having speech pattern information and intention information, performs deep learning training to generate deep learning model information for each intention including a plurality of deep learning models, and provides the generated deep learning model information for each intention to a test sentence automatic generation unit 300;

An output intention input unit 200 that provides the test sentence automatic generation unit 300 with the intention type information that the test sentence automatic generation unit 300 needs to generate text sentences;

A test sentence automatic generation unit 300 that uses the deep learning model information for each intention provided by the deep learning modeling unit 100 and the intention type information provided by the output intention input unit 200 to generate and output a text sentence corresponding to the intention type information, and that outputs a confidence value for the output text sentence,

Wherein the deep learning modeling unit 100 comprises:
A model text corpus unit 110 that receives, as deep learning training material, a plurality of texts having specific speech patterns and provides the input texts to a training modeling unit 130;
A model intention input unit 120 that receives the intentions of the texts input to the model text corpus unit 110, tags each text with its corresponding intention, and provides the training modeling unit 130 with per-text intention tagging information;
And the training modeling unit 130, which performs deep learning training using the plurality of texts having specific speech patterns provided by the model text corpus unit 110 and the per-text intention tagging information provided by the model intention input unit 120 to generate the deep learning model information for each intention, and provides the generated deep learning model information for each intention to the test sentence automatic generation unit 300,

Wherein the training modeling unit 130 further comprises a modeling parameter providing module that adjusts modeling-related parameters during deep learning modeling, the modeling-related parameters being the modeling order, the modeling depth, and the modeling network type.

delete

The system of claim 1,
Wherein the test sentence automatic generation unit 300 comprises:
A calculation model unit 310 that uses the deep learning model information for each intention provided by the deep learning modeling unit 100 and the intention type information provided by the output intention input unit 200 to automatically generate text sentences of various speech patterns having the intention corresponding to the intention type information;
A corpus output unit 320 that outputs the text sentences generated by the calculation model unit 310;
A confidence value output unit 330 that outputs a confidence value indicating how similar a text sentence output through the corpus output unit 320 is to the intention of the intention type information input through the output intention input unit 200.

The system of claim 4,
Wherein the calculation model unit 310 further comprises an output parameter providing module 311 for adjusting parameters related to output options when outputting the generated text sentences,
And the parameters related to the output options include the intention type and the number of output text sentences per intention type.

The system of claim 1,
Wherein the text sentence generated and output by the test sentence automatic generation unit 300 has the same intention type as, and a different speech pattern from, the texts input for use as deep learning training material.

The system of claim 1,
Wherein the text sentence generated and output by the test sentence automatic generation unit 300 is used as evaluation text for the natural language understanding (NLU) of an automatic speech recognition evaluation system.