KR20190036463A

KR20190036463A - QUERY AND RESPONSE SYSTEM AND METHOD IN MPEG IoMT ENVIRONMENT

Info

Publication number: KR20190036463A
Application number: KR1020180097020A
Authority: KR
Inventors: 최미란; 왕지현; 김민호; 김현기; 류지희; 배경만; 배용진; 이형직; 임수종; 임준호; 장명길; 허정
Original assignee: 한국전자통신연구원
Priority date: 2017-09-27
Filing date: 2018-08-20
Publication date: 2019-04-04
Also published as: KR102479026B1

Abstract

The present invention relates to a system for query and response in a Moving Picture Experts Group (MPEG) Internet of Media Things (IoMT) environment. The system comprises: an IoT terminal that receives and transmits utterance information and receives and provides query and response result information; and an utterance analysis server that performs utterance analysis on the utterance information provided from the IoT terminal according to the MPEG IoMT data format, uses the analyzed utterance information to perform query and response with a query and response server, and then provides the query and response result information.

Description

The present invention relates to implementing a query response system and method in an MPEG IoMT environment, and more particularly to an apparatus and method for accurately detecting the answer a questioner wants with respect to device operation, information delivery, and question answering, in response to the various demands of users.

Conventional question answering techniques rely only on the question sentence directly entered by the questioner to find the answer, which makes it difficult to meet the needs of diverse users.

Recently, with the emergence of many IoT devices, including wearable devices, query response systems that handle only simple questions have reached their limits.

To resolve this inconvenience, the device needs to analyze the questioner's utterance in advance and identify the questioner's intention.

Accordingly, the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) group is developing a standard for implementing multimedia technology in IoT environments, and intends to include a question answering user interface in it.

To this end, research is under way on techniques for analyzing a user's utterance and enabling an appropriate IoT device to perform processing according to that utterance.

The present invention has been devised to solve the conventional problems, and provides a query response system and method in an MPEG IoMT environment capable of processing queries for utterances of various types of questions and commands input through various devices in an IoT environment.

The objects of the present invention are not limited to the object mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above object, a query response system in an MPEG IoMT environment according to an embodiment of the present invention includes: an IoT terminal that receives and transmits utterance information and receives and provides query response result information; and an utterance analysis server that performs utterance analysis on the utterance information transmitted from the IoT terminal according to the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) data format, performs a query response exchange with a query response server using the analyzed information, and then provides the query response result information to the IoT terminal.

The MPEG IoMT data format preferably includes information on the user question type and information on the language in which the user's question is expressed.

The information on the user question type preferably includes information indicating the topic of the question, information indicating the focus of the question, and information indicating the meaning or purpose of the question.

The question focus information is classified under a scheme such as "when, where, what, who, why, how", and the question meaning and purpose information is classified under a scheme such as command request, terminology request, meaning request, information request, and method request.

Meanwhile, the MPEG IoMT data format may include question domain information expressed as a string.

An IoT terminal according to an embodiment of the present invention includes: an input unit that receives utterance information provided by a user; a communication unit that transmits the input utterance information to the utterance analysis server and receives query response result information from the utterance analysis server; and an output unit that outputs the query response result information received from the utterance analysis server.

The input unit includes a microphone that receives the user's utterance information.

Here, the input unit may include a query interface provider that displays on a screen a user interface for receiving modal information in text form.

The input unit may also include a camera that acquires modal information in image form.

Meanwhile, the output unit further includes a screen output unit that outputs the query response result information on a screen.

The output unit may further include a voice output unit that outputs the query response result information as voice.

Meanwhile, the utterance analysis server includes: a communication unit that performs data communication with the IoT terminal and the query response server; a speech recognition unit that recognizes the speech of the utterance information provided from the IoT terminal; an utterance analysis unit that performs utterance analysis on the speech-recognized utterance information according to the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) data format; and a query response call unit that queries the query response server using the information analyzed into the MPEG IoMT data format.

The utterance analysis server may further include a speech synthesis unit that converts query response result information in text form into speech.

An embodiment of the present invention further includes an utterance information determination unit that determines whether the analyzed utterance information is information for a query request or information for a device control command and, if it is a device control command, forwards the utterance information to the IoT terminal that transmitted it so that the device control command is carried out.

Meanwhile, the speech recognition unit performs language processing on the utterance information, such as morphological analysis, named entity analysis, and syntactic parsing.

The query response server performs query analysis using the MPEG IoMT data format of the information received from the utterance analysis server, and provides the query response result information resulting from that analysis to the utterance analysis server.

When a plurality of query response results exist, the query response server transmits to the utterance analysis server list information arranged according to the answer likelihood of each query response result.
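As an illustration of a list arranged by answer likelihood, the following minimal sketch sorts candidate answers by a score before they would be returned to the utterance analysis server. The `AnswerCandidate` class, the `likelihood` field, and the sample scores are assumptions made for this example only; the specification does not define how the likelihood is computed or represented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnswerCandidate:
    text: str
    likelihood: float  # hypothetical score; higher means more likely correct


def rank_candidates(candidates: List[AnswerCandidate],
                    top_k: Optional[int] = None) -> List[AnswerCandidate]:
    """Return candidates ordered from most to least likely, optionally truncated."""
    ranked = sorted(candidates, key=lambda c: c.likelihood, reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Made-up scores, used only to show the ordering.
candidates = [
    AnswerCandidate("Christopher Marlowe", 0.07),
    AnswerCandidate("William Shakespeare", 0.91),
    AnswerCandidate("Ben Jonson", 0.02),
]
ranked = rank_candidates(candidates)
print([c.text for c in ranked])
```

The terminal could then present the first entry as the answer and keep the remaining entries as alternatives.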

A query response method in an MPEG IoMT environment according to an embodiment of the present invention includes: performing, by an utterance analysis server, utterance analysis on utterance information transmitted from an IoT terminal according to the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) data format; performing, by the utterance analysis server, a query response exchange with a query response server using the analyzed information; and providing, by the utterance analysis server, the query response result information to the IoT terminal.
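The three steps of the method can be sketched as a single function that chains them. This is only a structural outline under stated assumptions: `answer_utterance` and the stub functions are hypothetical names, and the stubs stand in for the real analysis, the query response server, and the link to the IoT terminal.

```python
def answer_utterance(utterance_text, analyze, ask_qa_server, send_to_terminal):
    """Run the three method steps in order and return the result."""
    analysis = analyze(utterance_text)   # step 1: utterance analysis
    result = ask_qa_server(analysis)     # step 2: query response exchange
    send_to_terminal(result)             # step 3: provide the result to the terminal
    return result


# Hypothetical stand-ins for the real components.
def stub_analyze(text):
    return {"question": text, "language": "en-us"}


def stub_qa_server(analysis):
    return "(answer to: " + analysis["question"] + ")"


outbox = []  # collects what would be sent to the IoT terminal
result = answer_utterance("Who is the author of King Lear?",
                          stub_analyze, stub_qa_server, outbox.append)
print(result)
```

Passing the three components in as callables keeps the sketch independent of any particular transport or analysis engine.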

Here, the MPEG IoMT data format includes information on the user question type and information on the language in which the user's question is expressed.

The information on the user question type includes information indicating the topic of the question, information indicating the focus of the question, and information indicating the meaning or purpose of the question.

Therefore, according to an embodiment of the present invention, the user utterance provided from the IoT terminal is analyzed according to the MPEG IoMT data format and a query response is provided, so that a query response service using the user's utterance can be offered even in an MPEG IoMT environment.

FIG. 1 is a diagram illustrating the configuration blocks of a query response system in an MPEG IoMT environment according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration blocks of the IoT terminal shown in FIG. 1.
FIG. 3 is a diagram showing the block configuration of the utterance analysis server 200 shown in FIG. 1.
FIG. 4 is a reference diagram illustrating the speech recognition data format used for speech recognition in the voice processing unit shown in FIG. 1.
FIG. 5 is a reference diagram illustrating the speech recognition data format used in the utterance analysis server shown in FIG. 1.
FIG. 6 is a reference diagram illustrating the IoMT question analysis packet format used in the utterance analysis server shown in FIG. 1.
FIG. 7 is a reference diagram illustrating a first example of utterance analysis in the utterance analysis server shown in FIG. 1.
FIG. 8 is a reference diagram illustrating a second example of utterance analysis in the utterance analysis server shown in FIG. 1.
FIG. 9 is a reference diagram illustrating the Qfocus classification scheme used during utterance analysis in the utterance analysis server shown in FIG. 1.
FIG. 10 is a reference diagram illustrating the QCsemanticCS classification scheme used during utterance analysis in the utterance analysis server shown in FIG. 1.
FIG. 11 is a reference diagram illustrating the speech synthesis data format used in the utterance analysis server shown in FIG. 1.
FIG. 12 is a diagram showing the configuration blocks for token utilization in the utterance analysis server shown in FIG. 1.
FIG. 13 is a flowchart illustrating a query response method in an MPEG IoMT environment according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be embodied in various different forms; these embodiments are provided only so that this disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the invention pertains, and the invention is defined only by the scope of the claims. The terminology used herein is for describing the embodiments and is not intended to limit the invention. In this specification, the singular form also includes the plural unless specifically stated otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those stated.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. FIG. 1 is a diagram illustrating the configuration blocks of a query response system in an MPEG (Moving Picture Experts Group, hereinafter "MPEG") IoMT (Internet of Media Things, hereinafter "IoMT") environment according to an embodiment of the present invention.

As shown in FIG. 1, the query response system in the MPEG IoMT environment according to an embodiment of the present invention includes an IoT (Internet of Things, hereinafter "IoT") terminal 100, an utterance analysis server 200, and a query response server 300.

The IoT terminal 100 receives utterance information provided by the user, forwards it to the utterance analysis server, and provides the user with the query response result information received from the utterance analysis server.

Here, the IoT terminal 100 may be any device used in an IoT environment, including wearable devices, and may include various sensors and control devices.

Meanwhile, the IoT terminal 100 may provide device information and sensing information to the utterance analysis server 200 together with the utterance information containing the user's query.

FIG. 2 is a diagram showing the configuration blocks of the IoT terminal shown in FIG. 1.

As shown in FIG. 2, the IoT terminal 100 includes an input unit 110, a communication unit 120, and an output unit 130.

The input unit 110 receives the user's utterance information. In this embodiment, the input unit 110 is preferably a microphone that receives utterance information. However, the input unit 110 may further include at least one of a query interface that displays on a screen a user interface for receiving utterance information in text form and a camera that acquires utterance information in image form.

The communication unit 120 transmits the input utterance information to the utterance analysis server 200 and receives query response result information from the utterance analysis server 200. The information exchanged by the communication unit 120 may include data such as voice, text, and images; a device control command resulting from the utterance analysis; the question sentence asked by the user; and a list of answer candidates, which is the query response result information.

The output unit 130 outputs the query response result information provided from the utterance analysis server 200. In this embodiment, the output unit 130 may include at least one of a screen output unit 130 that outputs the query response result information on a screen through a user interface and a voice output unit 130 that outputs the query response result information as voice.

The utterance analysis server 200 performs utterance analysis on the utterance information provided from the IoT terminal 100 according to the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) data format, performs a query response exchange with the query response server 300 using the analyzed information, and then provides the query response result information to the IoT terminal 100.

FIG. 3 is a diagram showing the block configuration of the utterance analysis server 200 shown in FIG. 1. As shown in FIG. 3, the utterance analysis server 200 includes a first communication unit 210, a voice processing unit 220, an utterance analysis unit 230, an utterance information determination unit 240, a query response call unit 250, and a second communication unit 260.

The first communication unit 210 communicates with the IoT terminal 100.

The voice processing unit 220 recognizes the speech of the utterance information provided from the IoT terminal 100.

FIG. 4 is a reference diagram illustrating the speech recognition data format used for speech recognition in the voice processing unit shown in FIG. 1.

To this end, as shown in FIG. 4 and [Table 1] below, the voice processing unit 220 uses a speech recognition data format consisting of a "SpeechRecognitionType" field, which provides an abstract description of speech recognition, and a "speechText" field, which describes the resulting text of speech recognition. The utterance information may also undergo general language processing such as morphological analysis, named entity analysis, and syntactic parsing.

[Table 1]
Name | Definition
SpeechRecognitionType (speech recognition type) | Provides an abstract description of speech recognition, which is done in the media analyzer.
speechText (speech text) | Describes the resulting text of speech recognition.

FIG. 5 is a reference diagram illustrating the utterance analysis result shown in FIG. 1.

For example, when the analyzed data is a speech recognition result and the text output from the user's speech is "Please turn to Channel 7.", then, as shown in FIG. 5, the "SpeechRecognitionType" field contains "xai:type" and the "speechText" field contains "Please turn to Channel 7".
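To make the two-field layout concrete, the following minimal sketch builds an XML fragment carrying a type label and a speechText element. It is not the schema shown in FIG. 5: the root element name `AnalysedData` and the plain `type` attribute are placeholders assumed for this example, and namespaces are omitted entirely.

```python
import xml.etree.ElementTree as ET


def build_speech_recognition_result(text):
    """Wrap recognized text in a fragment labeled as a SpeechRecognitionType result."""
    # "AnalysedData" and the plain "type" attribute are placeholders, not taken from the source.
    root = ET.Element("AnalysedData", {"type": "SpeechRecognitionType"})
    ET.SubElement(root, "speechText").text = text
    return root


elem = build_speech_recognition_result("Please turn to Channel 7")
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

A receiving module could read the recognized sentence back with `elem.find("speechText").text`.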

In addition, as shown in FIG. 6 and [Table 2], the utterance analysis unit 230 performs utterance analysis on the speech-recognized utterance information according to the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) data format.

Utterance analysis and question analysis are represented by "QuestionAnalysisType", which extends the base data analysis type used in MPEG IoMT and consists of two elements.

One is the analyzed question element, "anlyzedQuestion", which is of type "UserQuestionType"; the other is the language element, "language", which reports the language in which the user's question is expressed. Together, these two elements express the information about the analyzed question.

"UserQuestionType" consists of three elements and one attribute.

The first element, "Qtopic", represents the topic of the question and is expressed as a string; the second element, "Qfocus", represents the focus of the question.

Here, "Qfocus" is expressed using the predefined QfocusCS classification scheme, as shown in FIG. 9 and [Table 3].

The third element, "qCsemantic", is the meaning or purpose of the question, and this element is likewise expressed using a classification scheme (QCsemanticCS; see FIG. 10 and [Table 4]). Finally, the attribute "qdomain" (question domain) expresses the field of the question as a string. That is, when a user's question is analyzed, the result is expressed in terms of the question's topic, focus, meaning, and domain, and this representation is passed to the appropriate module of the server or terminal so that the required operation is performed.

[Table 2]
Name | Definition
QuestionAnalysisType | Provides an abstract description of question analysis, which can be done in a processing unit.
anlyzedQuestion | Describes the analyzed question resulting from the question analysis.
language | Indicates the language of the input question. NOTE If present, the language element shall take precedence over other language indications present within the input question.
UserQuestionType | Provides abstracts of User Question description. Describes the user's utterance that is the output of the speech recognition process. User Question is sent to the QA server for providing answers to the user. If it is a control command, it is sent to the actuator.
Qtopic | Describes the topic of the question. Question topic is the object or event that the question is about. Ex. Qtopic is King Lear in "Who is the author of King Lear?".
Qfocus | Describes the focus of the question, which is one of 5W1H. The type of the focus shall be described using the mpeg7:termReferenceType defined in 7.6 of ISO/IEC 15938-5:2003. A classification scheme that may be used for this purpose is the QfocusCS defined in this document (A.4.3). Ex. What, where, who, what policy.
qCsemantic | Describes the question classification based on the meaning/purpose of the question. The type of the question classification shall be described using the mpeg7:termReferenceType defined in 7.6 of ISO/IEC 15938-5:2003. A classification scheme that may be used for this purpose is the QCsemanticCS defined in this document (A.4.4). Ex. What does MPEG stand for? (Request for terminology). Could you please turn on the TV? (Request for command)
qdomain | Describes the domain of the question such as "science", "weather", "history". Ex. Who is the third king of Yi dynasty in Korea? (qdomain: history)

Here, as shown in [Table 3] below, the QfocusCS classification scheme indicates which one of the 5W1H categories the user's question falls under; the question words "when, where, what, who, why, how" can each be represented in binary.

[Table 3]
Binary representation | Term ID of QfocusCS
0000 | What_question
0001 | Where_question
0010 | When_question
0011 | Who_question
0100 | Why_question
0101 | How_question
0110 to 1111 | Reserved

Likewise, as shown in [Table 4] below, the "QCsemanticCS" classification scheme represents the request types "command request, terminology request, meaning request, information request, method request" in binary.

[Table 4]
Binary representation | Term ID of QCsemanticCS
0000 | Request_for_command
0001 | Request_for_terminology
0010 | Request_for_meaning
0011 | Request_for_information
0100 | Request_for_method
0101 to 1111 | Reserved
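The binary codes in [Table 3] and [Table 4] map naturally onto enumerations. The sketch below is one possible encoding, not part of the specification: the class names `QFocus` and `QCSemantic` and the helpers `encode` and `decode` are assumed for illustration, while the term names and 4-bit values follow the two tables. Reserved codes are returned as `None`.

```python
from enum import Enum


class QFocus(Enum):
    """QfocusCS terms with the 4-bit codes from [Table 3]."""
    WHAT_QUESTION = 0b0000
    WHERE_QUESTION = 0b0001
    WHEN_QUESTION = 0b0010
    WHO_QUESTION = 0b0011
    WHY_QUESTION = 0b0100
    HOW_QUESTION = 0b0101


class QCSemantic(Enum):
    """QCsemanticCS terms with the 4-bit codes from [Table 4]."""
    REQUEST_FOR_COMMAND = 0b0000
    REQUEST_FOR_TERMINOLOGY = 0b0001
    REQUEST_FOR_MEANING = 0b0010
    REQUEST_FOR_INFORMATION = 0b0011
    REQUEST_FOR_METHOD = 0b0100


def encode(term):
    """Return the 4-bit binary string for a classification term."""
    return format(term.value, "04b")


def decode(scheme, bits):
    """Return the term for a 4-bit string, or None if the code is reserved."""
    if len(bits) != 4 or set(bits) - {"0", "1"}:
        raise ValueError("expected a 4-character binary string")
    try:
        return scheme(int(bits, 2))
    except ValueError:
        return None  # code falls in the reserved range of the table


print(encode(QFocus.WHO_QUESTION))
print(decode(QCSemantic, "0100"))
print(decode(QFocus, "0110"))
```

Treating reserved codes as `None` rather than raising lets a receiver ignore values it does not yet understand.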

For example, as shown in FIG. 7, for the user's query "Who is the author of King Lear?", the question analysis result carries "analyzedQuestion" and the language "en-us", and shows that the domain of the question is "Literature", the question topic is "King Lear", the question focus is "Who", and the purpose of the question is "Request_for_information".

That is, for the first question, "Who is the author of King Lear?", the analysis first determines that the language is English; the topic of the question is King Lear, the focus is "who", and the meaning/purpose of the question is "information request". The analysis result is thus properly carried in the format.
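The structure of this analysis result can be mirrored with two small record types. The sketch below is illustrative only: the Python class and field names are assumptions that loosely follow the element names described in the text, and the values are those given for the King Lear example, with the focus written as the Term ID from [Table 3].

```python
from dataclasses import dataclass, asdict


@dataclass
class UserQuestion:
    qtopic: str      # topic of the question, as a string
    qfocus: str      # Term ID from the QfocusCS scheme
    qcsemantic: str  # Term ID from the QCsemanticCS scheme
    qdomain: str     # domain attribute, as a string


@dataclass
class QuestionAnalysis:
    analyzed_question: UserQuestion
    language: str


# Values taken from the King Lear example in the text.
analysis = QuestionAnalysis(
    analyzed_question=UserQuestion(
        qtopic="King Lear",
        qfocus="Who_question",
        qcsemantic="Request_for_information",
        qdomain="Literature",
    ),
    language="en-us",
)
print(asdict(analysis))
```

Converting the record with `asdict` gives a plain nested dictionary that could be serialized for exchange between modules.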

In the second example, as shown in FIG. 8, for the user's query "How do you make Kimchi?", the question analysis result carries "analyzedQuestion" and the language "en-us", and shows that the domain of the question is "Cooking", the question topic is "Kimchi", the question focus is "How", and the purpose of the question is "Request_for_method".

That is, the second question, "How do you make Kimchi?", is likewise analyzed as being written in English, with "cooking" as its domain, "Kimchi" as its subject, "how" as its focus, and "method request" as its purpose, and the result is carried in the format and shared between modules.
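The two analysis results above can be pictured as records with the same set of fields. The sketch below is illustrative only: the class name AnalyzedQuestion and its Python field names are hypothetical stand-ins for the elements shown in FIGS. 7 and 8, whose exact schema is not reproduced in this text. The focus check uses the "when, where, what, who, why, how" classification recited in the claims.

```python
from dataclasses import dataclass, asdict

# Focus classification recited in the claims.
FOCUS_TERMS = {"when", "where", "what", "who", "why", "how"}


@dataclass(frozen=True)
class AnalyzedQuestion:
    """Hypothetical container for one query analysis result."""
    text: str
    language: str  # e.g. "en-us"
    domain: str    # question domain, expressed as a string
    subject: str   # subject of the question
    focus: str     # one of FOCUS_TERMS (case-insensitive)
    semantic: str  # purpose of the question, as written in the description

    def __post_init__(self):
        # Reject a focus value outside the six-term classification.
        if self.focus.lower() not in FOCUS_TERMS:
            raise ValueError(f"unknown focus: {self.focus!r}")


king_lear = AnalyzedQuestion(
    text="Who is the author of King Lear?", language="en-us",
    domain="Literature", subject="King Lear",
    focus="Who", semantic="Request_for_information")

kimchi = AnalyzedQuestion(
    text="How do you make Kimchi?", language="en-us",
    domain="Cooking", subject="Kimchi",
    focus="How", semantic="Request_for_method")

# A plain dict is one simple way to hand the result to another module.
shared = asdict(kimchi)
```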

The utterance information determination unit 240 determines whether the analyzed utterance information is information for a query request or information for a device control command. If the analyzed utterance information is a device control command, the utterance information determination unit 240 forwards the utterance information to the corresponding IoT terminal 100 so that the device control command is carried out.

When the analyzed utterance information is query information, the query response call unit 250 issues the query by sending the information, utterance-analyzed in the MPEG IoMT data format, to the query response server 300 through the second communication unit 260. Here, the second communication unit 260 handles communication with the query response server 300.

Meanwhile, the speech synthesis unit 270 transmits the query response result information received from the query response server 300 to the IoT terminal 100. Because the query response result information received from the query response server 300 is text, the speech synthesis unit 270 may use the speech synthesis data format shown in FIG. 11 to convert the text-form query response result into speech and transmit it to the IoT terminal 100 through the first communication unit 210.

Here, as shown in [Table 5], the speech synthesis data format consists of a SpeechSynthesisType field that provides an abstract description of the speech synthesis that the speech synthesis unit can perform, a TextInput field that describes the text input to be synthesized, an OutputSpeechFeature field that indicates output speech features such as gender, tone, and voice speed to be reflected in the speech output, and a Language field that indicates the language of the input speech.

Name | Definition
SpeechSynthesisType | Provides an abstract description of speech synthesis, which can be done in a processing unit.
TextInput | Describes text input to be synthesized by the process of speech synthesis.
OutputSpeechFeature | Output speech features such as gender, tones and voice speed to be reflected in speech output.
Language | Indicates the language of the input speech. NOTE If present, the Language element shall take precedence over other language indications present within the speech input.
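The fields in [Table 5], including the precedence rule in the NOTE for the Language element, might be modeled as follows. This is a sketch under stated assumptions: the Python class names, attribute names, the numeric speed value, and the helper effective_language are all hypothetical, and the actual format in FIG. 11 is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OutputSpeechFeature:
    """Output speech features named in Table 5 (value types are assumed)."""
    gender: Optional[str] = None
    tone: Optional[str] = None
    speed: Optional[float] = None  # assumed convention: 1.0 means normal speed


@dataclass
class SpeechSynthesisRequest:
    """Hypothetical stand-in for one SpeechSynthesisType instance."""
    text_input: str
    output_speech_feature: OutputSpeechFeature = field(
        default_factory=OutputSpeechFeature)
    language: Optional[str] = None


def effective_language(request: SpeechSynthesisRequest,
                       inline_language: Optional[str]) -> Optional[str]:
    """Apply the NOTE in Table 5: Language, if present, takes precedence."""
    if request.language is not None:
        return request.language
    return inline_language


request = SpeechSynthesisRequest(
    text_input="stub query response text",
    output_speech_feature=OutputSpeechFeature(gender="female", speed=1.0),
    language="ko-kr")
```

With this request, effective_language(request, "en-us") yields "ko-kr", because the explicit Language element overrides the indication carried with the input.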

When the utterance analysis server 200 requests a query using MPEG IoMT data, the query response server 300 performs query analysis using the query analysis information contained in the MPEG IoMT data and transmits the resulting query response result information to the utterance analysis server 200.

The utterance analysis server 200 according to an embodiment of the present invention may further include a location information search unit (not shown) that recognizes the location of the terminal user by comparing the location information of the terminal transmitted from the IoT terminal 100 with location information (Point of Interest, hereinafter "POI") stored in a database.
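The description does not state how the comparison against stored POI information is carried out. One plausible reading, shown here purely as an assumption, is a nearest-neighbor lookup with a distance threshold. The function names, the 100 m threshold, and the sample coordinates are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def recognize_location(lat, lon, pois, max_distance_m=100.0):
    """Return the name of the nearest POI within max_distance_m, else None.

    pois maps a POI name to a (latitude, longitude) pair; it stands in for
    the database mentioned in the description.
    """
    best_name, best_dist = None, max_distance_m
    for name, (poi_lat, poi_lon) in pois.items():
        dist = haversine_m(lat, lon, poi_lat, poi_lon)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name


# Made-up sample data for illustration.
sample_pois = {"kitchen": (36.3800, 127.3650), "office": (36.3810, 127.3700)}
```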

FIG. 12 is a diagram showing the component blocks for token utilization in the utterance analysis server shown in FIG. 1.

As shown in FIG. 12, the utterance analysis server 200 according to an embodiment of the present invention may further include a speech recognition API processing unit 281, a speech synthesis API processing unit 282, and a query analysis API processing unit 283, which are used in MPEG IoMT.

As shown in [Table 6], the speech recognition API processing unit 281 uses an API packet format based on the IoMT speech recognizer class, which extends the MAnalyzer class.

Nested Classes
Modifier and Type | Method and Description

Constructor
Constructor and Description
MSpeechRecognizer() | Default constructor
MSpeechRecognizer(String id)
MSpeechRecognizer(String id, String ipAddress, Integer port)

Fields
Modifier and Type | Field and Description

Methods
Modifier and Type | Method and Description
AnalyzedDataType | GetSpeechText(): This function returns a class (i.e., Java or C++) or a structure (i.e., C), which includes a returning type and extracted speech texts following the specification in this document.
AnalyzedDataType | GetSpeechText(tid): This method returns a class (i.e., Java or C++) or a structure (i.e., C), which includes a returning type and extracted speech texts following the specification in this document.
Float | GetSpeechText_Cost(int tokenType, String tokenName): This function returns the amount of tokens to use GetSpeechText(). If tokenType is 0, it means "Crypto Currency"; if tokenType is 1, it means "Legal Tender". The token name is described in string (e.g., term ID or binary representation) from TokenCS specified in A.5. If the requested token is not supported, returns -1. Ex) GetSpeechText_Cost(0, "BTC") or GetSpeechText_Cost(0, "00000001"). Ex) GetSpeechText_Cost(1, "USD") or GetSpeechText_Cost(1, "10010100").

As shown in [Table 7], the speech synthesis API processing unit 282 uses an API packet format based on the IoMT speech synthesizer class, which extends the MAnalyzer class.

Nested Classes
Modifier and Type | Method and Description

Constructor
Constructor and Description
MSpeechSynthesizer() | Default constructor
MSpeechSynthesizer(String id)
MSpeechSynthesizer(String id, String ipAddress, Integer port)

Fields
Modifier and Type | Field and Description

Methods
Modifier and Type | Method and Description
String | GetSpeechSynthesisURI(): This function returns a URI of a synthesized speech.
AnalyzedDataType | GetSpeechSynthesisURI(tid): This method returns a class (i.e., Java or C++) or a structure (i.e., C), which includes a returning type.
Float | GetSpeechSynthesisURI_Cost(int tokenType, String tokenName): This function returns the amount of tokens to use GetSpeechSynthesisURI(). If tokenType is 0, it means "Crypto Currency"; if tokenType is 1, it means "Legal Tender". The token name is described in string (e.g., term ID or binary representation) from TokenCS specified in A.5. If the requested token is not supported, returns -1. Ex) GetSpeechSynthesisURI_Cost(0, "BTC") or GetSpeechSynthesisURI_Cost(0, "00000001"). Ex) GetSpeechSynthesisURI_Cost(1, "USD") or GetSpeechSynthesisURI_Cost(1, "10010100").

In addition, as shown in [Table 8], the query analysis API processing unit 283 uses an API packet format based on the IoMT query analyzer class, which extends the MAnalyzer class.

Nested Classes
Modifier and Type | Method and Description

Constructor
Constructor and Description
MQuestionAnalyzer() | Default constructor
MQuestionAnalyzer(String id)
MQuestionAnalyzer(String id, String ipAddress, Integer port)

Fields
Modifier and Type | Field and Description

Methods
Modifier and Type | Method and Description
AnalyzedDataType | GetUserQuestion(): This function returns a class (i.e., Java or C++) or a structure (i.e., C), which includes a returning type and user question following the specification in this document.
AnalyzedDataType | GetUserQuestion(tid): This method returns a class (i.e., Java or C++) or a structure (i.e., C), which includes a returning type and user question following the specification in this document.
Float | GetUserQuestion_Cost(int tokenType, String tokenName): This function returns the amount of tokens to use GetUserQuestion(). If tokenType is 0, it means "Crypto Currency"; if tokenType is 1, it means "Legal Tender". The token name is described in string (e.g., term ID or binary representation) from TokenCS specified in A.5. If the requested token is not supported, returns -1. Ex) GetUserQuestion_Cost(0, "BTC") or GetUserQuestion_Cost(0, "00000001"). Ex) GetUserQuestion_Cost(1, "USD") or GetUserQuestion_Cost(1, "10010100").

Accordingly, the utterance analysis server can provide a transaction service each time it provides a service such as query analysis, speech recognition, or speech synthesis in the MPEG IoMT environment.
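The per-call transaction described above relies on the _Cost methods of the API classes. The following sketch mirrors only the documented behavior of GetSpeechText_Cost (tokenType 0 or 1, a token name given as a term ID or binary string, and -1 for an unsupported token). The class MSpeechRecognizerSketch, the price table, the price values, and the helper charge_for_call are hypothetical and are not the IoMT API itself.

```python
CRYPTO_CURRENCY = 0  # tokenType 0 per the API description
LEGAL_TENDER = 1     # tokenType 1 per the API description


class MSpeechRecognizerSketch:
    """Hypothetical stand-in exposing a cost query like GetSpeechText_Cost."""

    def __init__(self, prices):
        # prices maps (tokenType, tokenName) to an amount; values are made up.
        self._prices = dict(prices)

    def get_speech_text_cost(self, token_type, token_name):
        """Return the amount of tokens per call, or -1 if unsupported."""
        return self._prices.get((token_type, token_name), -1.0)


def charge_for_call(recognizer, balance, token_type, token_name):
    """Deduct the cost of one recognition call and return the new balance."""
    cost = recognizer.get_speech_text_cost(token_type, token_name)
    if cost < 0:
        raise ValueError(f"token {token_name!r} is not supported")
    if balance < cost:
        raise ValueError("insufficient balance")
    return balance - cost


# The same token may be named by term ID or by binary representation.
recognizer = MSpeechRecognizerSketch({
    (CRYPTO_CURRENCY, "BTC"): 0.25,
    (CRYPTO_CURRENCY, "00000001"): 0.25,
    (LEGAL_TENDER, "USD"): 0.5,
    (LEGAL_TENDER, "10010100"): 0.5,
})
```

Checking the cost before invoking the service lets a caller decline a request it cannot pay for, which is one way the token query and the service call could be combined into a transaction.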

Hereinafter, a query response method in the MPEG IoMT environment according to an embodiment of the present invention is described with reference to FIG. 13.

FIG. 13 relates to a query response processing method in the MPEG IoMT environment, which is preferably performed by the utterance analysis server.

First, the utterance analysis server 200 receives the utterance information transmitted from the IoT terminal 100 (S100).

The utterance analysis server 200 then performs utterance analysis on the input utterance information according to the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) data format (S200). Here, the MPEG IoMT data format includes information on the user question type and information on the language in which the user's question is expressed.

The information on the user question type includes information indicating the subject of the question, information indicating the focus of the question, and information indicating the meaning or purpose of the question.

The utterance analysis server 200 then determines whether the utterance analysis result corresponds to a query (S300).

If the utterance is determined in step S300 to be a query (YES), the utterance analysis server 200 performs a query response with the query response server 300 using the utterance-analyzed information (S400).

The utterance analysis server 200 then provides the query response result information to the IoT terminal 100 (S500).

If, on the other hand, the utterance is determined in step S300 to be a device control utterance (NO), the utterance analysis content is provided to the IoT terminal 100 (S600).
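Steps S100 to S600 can be summarized as a single routing function. The sketch below is an illustration under stated assumptions: the function and parameter names are hypothetical, the analyzer is a toy stub, and treating "Request_for_command" as the marker of a device control utterance is an assumption rather than a rule stated in the description.

```python
def handle_utterance(utterance, analyze, ask_qa_server, send_to_terminal):
    """Route one received utterance (S100) through steps S200 to S600."""
    analyzed = analyze(utterance)                       # S200: utterance analysis
    if analyzed["semantic"] != "Request_for_command":   # S300: query? (assumed test)
        answer = ask_qa_server(analyzed)                # S400: query response
        send_to_terminal({"type": "answer", "body": answer})   # S500
        return "answered"
    send_to_terminal({"type": "command", "body": analyzed})    # S600
    return "forwarded"


def fake_analyze(text):
    """Toy rule standing in for real utterance analysis."""
    is_command = text.lower().startswith("turn ")
    semantic = "Request_for_command" if is_command else "Request_for_information"
    return {"text": text, "semantic": semantic}


sent = []  # collects what would be transmitted to the IoT terminal
handle_utterance("Who is the author of King Lear?", fake_analyze,
                 lambda analyzed: "stub answer", sent.append)
handle_utterance("Turn on the light", fake_analyze,
                 lambda analyzed: "unused", sent.append)
```

Passing the analyzer, the query response client, and the terminal sender as parameters keeps the routing logic independent of the communication units, so each can be replaced by a stub as done here.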

The configuration of the present invention has been described in detail above with reference to the accompanying drawings, but this is merely an example, and those of ordinary skill in the art to which the present invention pertains can make various modifications and changes within the scope of the technical idea of the present invention. Accordingly, the scope of protection of the present invention should not be limited to the embodiments described above and should be determined by the following claims.

100: IoT terminal 200: utterance analysis server
210: first communication unit 220: speech processing unit
230: utterance analysis unit 240: utterance information determination unit
250: query response call unit 260: second communication unit
300: query response server

Claims

1. A query response system in an MPEG IoMT environment, comprising:
an IoT terminal that receives and transmits utterance information and receives and provides query response result information; and
an utterance analysis server that performs utterance analysis on the utterance information transmitted from the IoT terminal according to the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) data format, performs a query response with a query response server using the analyzed utterance information, and provides query response result information to the IoT terminal.

2. The system of claim 1,
wherein the data format of the MPEG IoMT includes
information about a user question type and information about the language in which the user's question is expressed.

3. The system of claim 2,
wherein the information about the user question type includes
information indicating the subject of the question, information indicating the focus of the question, and information indicating the meaning or purpose of the question.

4. The system of claim 2,
wherein the focus information of the question
is classified according to a classification scheme of "when, where, what, who, why, how".

5. The system of claim 2,
wherein the meaning and purpose information of the question
is classified according to a classification scheme of command request, lexical request, semantic request, information request, and method request.

6. The system of claim 2,
wherein the data format of the MPEG IoMT includes
question domain information expressed as a string.

7. The system of claim 1,
wherein the IoT terminal comprises:
an input unit that receives the utterance information provided by a user;
a communication unit that transmits the input utterance information to the utterance analysis server and receives query response result information from the utterance analysis server; and
an output unit that outputs the query response result information received from the utterance analysis server.

8. The system of claim 7,
wherein the input unit comprises
a query interface that outputs, on a screen, a user interface for receiving modal information in text form.

9. The system of claim 7,
wherein the input unit comprises
a camera that obtains modal information in image form.

10. The system of claim 7,
wherein the input unit comprises
a microphone that receives the user's utterance information.

11. The system of claim 7,
wherein the output unit comprises
a screen output unit that outputs the query response result information on a screen.

12. The system of claim 7,
wherein the output unit comprises
an audio output unit that outputs the query response result information as speech.

13. The system of claim 1,
wherein the utterance analysis server comprises:
a communication unit that performs data communication with the IoT terminal and the query response server;
a speech recognition unit that performs speech recognition on the utterance information provided from the IoT terminal;
an utterance analysis unit that performs utterance analysis on the speech-recognized utterance information according to the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) data format; and
a query response call unit that queries the query response server using the information utterance-analyzed in the MPEG IoMT data format.

14. The system of claim 13,
wherein the utterance analysis server further comprises
a speech synthesis unit that converts the text-form query response result information into speech.

15. The system of claim 13,
wherein the utterance analysis server further comprises an utterance information determination unit that determines whether the analyzed utterance information is information for a query request or information for a device control command and, if the analyzed utterance information is a device control command, causes the device control command to be performed.

16. The system of claim 1,
wherein the query response server
performs query analysis using the MPEG IoMT data format of the information received from the utterance analysis server and provides the query response result information resulting from the query analysis to the utterance analysis server, and
when a plurality of query response results exist, transmits to the utterance analysis server list information set according to correct-answer likelihood information for the query response results.

17. A method of processing a query response in an MPEG IoMT environment, the method comprising:
performing, by an utterance analysis server, utterance analysis on utterance information transmitted from an IoT terminal according to the MPEG (Moving Picture Experts Group) IoMT (Internet of Media Things) data format;
performing, by the utterance analysis server, a query response with a query response server using the utterance-analyzed information; and
providing, by the utterance analysis server, the query response result information to the IoT terminal.

18. The method of claim 17,
wherein the data format of the MPEG IoMT includes
information about a user question type.

19. The method of claim 18,
wherein the information about the user question type includes
information indicating the subject of the question, information indicating the focus of the question, and information indicating the meaning or purpose of the question.

20. The method of claim 19,
wherein the focus information of the question is classified according to a classification scheme of "when, where, what, who, why, how", and
the meaning and purpose information of the question is classified according to a classification scheme of command request, lexical request, semantic request, information request, and method request.