KR102014774B1

KR102014774B1 - Server and method for controlling voice recognition of device, and the device

Info

Publication number: KR102014774B1
Application number: KR1020110138225A
Authority: KR
Inventors: 류창선; 김희경; 한영호; 구명완
Original assignee: 주식회사 케이티
Priority date: 2011-12-20
Filing date: 2011-12-20
Publication date: 2019-10-22
Also published as: KR20130070947A

Abstract

A control server and a method for controlling voice recognition of a terminal, and the terminal, are provided. More specifically, provided are a voice recognition control server and method, and a terminal, in which the server receives a voice recognition request signal from the terminal over a first protocol connection established with the terminal through a network, determines, based on the voice recognition request signal, the voice recognition engine corresponding to the terminal from among a plurality of voice recognition engines, determines identification information of a second protocol connection through which voice data is transmitted between the terminal and the determined voice recognition engine, and transmits the determined identification information to the terminal.

Description

SERVER AND METHOD FOR CONTROLLING VOICE RECOGNITION OF DEVICE, AND THE DEVICE

The present invention relates to a server and a method for controlling voice recognition, and to a terminal, and more particularly to a server and a method for controlling voice recognition of each of a plurality of terminals, and to the terminal.

An N-screen service lets services that used to be consumed independently on devices such as TVs, PCs, tablet PCs, and smartphones be used together, centered on the user or the content. Providing an N-screen service requires, among other things, technology for playing the same content simultaneously on multiple devices of different kinds, and technology for seamlessly continuing on one device the content that was being played on another. In this regard, Korean Patent Publication No. 2011-0009587, which is prior art, discloses a configuration that synchronizes playback history between content servers providing video content to a plurality of terminals, so that viewing of video content can be resumed across heterogeneous terminals.

Meanwhile, as the N-screen environment expands, terminals whose operating environments differ from one another, such as pads, smartphones, and IPTVs, come into use and the number of devices in use grows, so a large number of voice interface requests must be handled effectively. However, existing systems are limited in their ability to handle a large volume of voice interface requests, or voice interface requests from different kinds of terminals.

Voice interface control of terminals can be performed more effectively by taking the differing characteristics of various types of terminals into account in an integrated manner. Deadlock caused by high-volume voice interface requests from a large number of terminals can be prevented, and network load can be reduced. However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, an embodiment of the present invention may provide a voice recognition control server including: a request signal receiver that receives a voice recognition request signal from a terminal over a first protocol connection established with the terminal through a network; a voice recognition engine determiner that determines, based on the voice recognition request signal, the voice recognition engine corresponding to the terminal from among a plurality of voice recognition engines; an identification information determiner that determines identification information of a second protocol connection through which voice data is transmitted between the terminal and the determined voice recognition engine; and an identification information transmitter that transmits the identification information to the terminal.

In addition, another embodiment of the present invention may provide a voice recognition control method including: establishing a first protocol connection with a terminal through a network; receiving a voice recognition request signal from the terminal over the established first protocol connection; determining, based on the voice recognition request signal, the voice recognition engine corresponding to the terminal from among a plurality of voice recognition engines; determining identification information of a second protocol connection through which voice data is transmitted between the terminal and the determined voice recognition engine; and transmitting the determined identification information to the terminal.

In addition, yet another embodiment of the present invention may provide a terminal including: a request signal transmitter that transmits a voice recognition request signal to a voice recognition control server over a first protocol connection established with the voice recognition control server through a network; an identification information receiver that receives, from the voice recognition control server, identification information of one voice recognition engine among a plurality of voice recognition engines; a connection setting unit that establishes a second protocol connection with that voice recognition engine based on the received identification information; a voice data transmitter that transmits voice data to that voice recognition engine over the established second protocol connection; and a result information receiver that receives, from that voice recognition engine, result information corresponding to the transmitted voice data.

By determining a voice recognition engine specialized for each terminal in consideration of that terminal's characteristics, voice interface control of terminals can be performed more effectively while taking the differing characteristics of various types of terminals into account in an integrated manner. By operating the first protocol, which carries control signals, separately from the second protocol, which carries voice data, deadlock caused by high-volume voice interface requests from a large number of terminals can be prevented and network load can be reduced.

FIG. 1 is a block diagram of a voice recognition control system according to an embodiment of the present invention.
FIG. 2 is a block diagram of the voice recognition control server (10) of FIG. 1.
FIG. 3 is a block diagram of a voice recognition control server (40) and a voice recognition engine server (50) according to another embodiment of the present invention.
FIG. 4 is a block diagram of a terminal (20) according to an embodiment of the present invention.
FIG. 5 is an operation flowchart illustrating a voice recognition control method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice it. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a block diagram of a voice recognition control system according to an embodiment of the present invention. Referring to FIG. 1, the voice recognition control system includes a voice recognition control server (10), a plurality of terminals (21 to 23), and a search server (30). However, the voice recognition control system of FIG. 1 is only one embodiment of the present invention, so the present invention is not to be construed as limited by FIG. 1. For example, according to various embodiments of the present invention, the voice recognition control system may further include a content providing server that provides content to the plurality of terminals (21 to 23). In addition, as shown in FIG. 1, the voice recognition control system may further include a voice recognition engine A (11) located outside the voice recognition control server (10).

The components of FIG. 1 that make up the voice recognition control system are generally connected through a network. A network refers to a connection structure that allows information to be exchanged between nodes such as terminals and servers. Examples of such a network include, but are not limited to, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), and a PAN (Personal Area Network).

The voice recognition control server (10) controls voice recognition of the plurality of terminals (21 to 23). To this end, the voice recognition control server (10) receives voice recognition request signals from the plurality of terminals (21 to 23) through the network and transmits response signals corresponding to the received voice recognition request signals to the plurality of terminals (21 to 23).

The voice recognition control server (10) receives a voice recognition request signal over the first protocol connection established with the plurality of terminals (21 to 23) and, in response to the voice recognition request signal, transmits identification information of the second protocol through which voice data is exchanged between the plurality of terminals (21 to 23) and a voice recognition engine. By separating the channel that carries control signals for voice recognition from the channel that carries the actual voice data in this way, the voice recognition control server (10) can reduce network load while performing efficient voice recognition control.

The voice recognition control server (10) identifies the characteristics of a terminal based on the voice recognition request signal received from that terminal among the plurality of terminals (21 to 23), and determines the voice recognition engine corresponding to the identified characteristics. By determining a voice recognition engine suited to a given terminal in consideration of that terminal's characteristics, the voice recognition control server (10) can perform customized voice recognition control that reflects the characteristics of each of the various types of terminals.

According to an embodiment of the present invention, the voice recognition control server (10) includes a plurality of voice recognition engines internally and recommends a respective one of the voice recognition engines to each terminal in consideration of the characteristics of each terminal. According to another embodiment of the present invention, the voice recognition control server (10) may instead recommend to the terminal a voice recognition engine A (50) located outside the voice recognition control server (10).

The search server (30) receives search request signals from the plurality of terminals (21 to 23) and transmits search results corresponding to the search request signals to the plurality of terminals (21 to 23). Here, each search request signal is generated from the result information for the voice data that the respective terminal (21 to 23) received from the voice recognition engine.

Each of the plurality of terminals (21 to 23) transmits a voice recognition request signal to the voice recognition control server (10) and receives identification information of a voice recognition engine from the voice recognition control server. Each of the plurality of terminals (21 to 23) also transmits voice data to the voice recognition engine and receives result information corresponding to the transmitted voice data. Here, the voice recognition request signal and the identification information are exchanged between the terminals (21 to 23) and the voice recognition control server (10) over the first protocol connection, while the voice data and the result information are exchanged between the terminals (21 to 23) and the voice recognition engine over the second protocol connection.
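
The exchange described above can be sketched as a small in-process simulation. This is only an illustrative sketch under stated assumptions, not the implementation disclosed in the specification: the class names `ControlServer`, `RecognitionEngine`, and `Terminal`, the dictionary keys, and the example addresses are all hypothetical, and plain method calls stand in for the first and second protocol connections.

```python
from dataclasses import dataclass


@dataclass
class RecognitionEngine:
    """Stand-in for a voice recognition engine (second protocol connection)."""
    url: str  # hypothetical engine address

    def recognize(self, voice_data: bytes) -> str:
        # A real engine would decode speech; this stub only reports payload size.
        return f"recognized {len(voice_data)} bytes at {self.url}"


class ControlServer:
    """Stand-in for the voice recognition control server (first protocol connection)."""

    def __init__(self, engines: dict):
        self.engines = engines  # keyed by a hypothetical terminal type string

    def handle_request(self, request: dict) -> dict:
        # Control channel: choose an engine and return only its identification info.
        engine = self.engines[request["terminal_type"]]
        return {"engine_url": engine.url}


class Terminal:
    def __init__(self, terminal_type: str):
        self.terminal_type = terminal_type

    def run(self, server: ControlServer, engines_by_url: dict, voice_data: bytes) -> str:
        # Step 1 (first protocol): send the request signal, receive identification info.
        ident = server.handle_request({"terminal_type": self.terminal_type})
        # Step 2 (second protocol): send voice data directly to the identified engine.
        engine = engines_by_url[ident["engine_url"]]
        return engine.recognize(voice_data)


phone_engine = RecognitionEngine("tcp://engine-phone.example:9000")
tv_engine = RecognitionEngine("tcp://engine-tv.example:9000")
server = ControlServer({"smartphone": phone_engine, "smart_tv": tv_engine})
by_url = {e.url: e for e in (phone_engine, tv_engine)}

result = Terminal("smart_tv").run(server, by_url, b"\x00" * 320)
print(result)
```

Note that the voice payload never passes through `ControlServer` in this sketch; only the short control exchange does, which mirrors the separation of the two connections described above.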

According to various embodiments of the present invention, the plurality of terminals may each be a different type of terminal. For example, a terminal may be a TV device, a computer, or a portable terminal capable of connecting to a remote server through a network. Here, examples of the TV device include a smart TV and an IPTV set-top box; examples of the computer include a notebook, a desktop, and a laptop equipped with a web browser; and examples of the portable terminal include all kinds of handheld wireless communication devices that provide portability and mobility, such as PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and Wibro (Wireless Broadband Internet) terminals, smartphones, and tablet PCs.

The operation of each component of the voice recognition control system of FIG. 1 is described in more detail below with reference to the following drawings.

FIG. 2 is a block diagram of the voice recognition control server (10) of FIG. 1. Referring to FIG. 2, the voice recognition control server (10) includes a request signal receiver (11), a voice recognition engine determiner (12), an identification information determiner (13), an identification information transmitter (14), a first voice recognition engine (15), a second voice recognition engine (16), and a database (17).

However, the voice recognition control server (10) shown in FIG. 2 is only one implementation example of the present invention, and various modifications are possible based on the components shown in FIG. 2. For example, the voice recognition control server (10) may further include an administrator interface for receiving commands or information from an administrator. In this case, the administrator interface may be an input device such as a keyboard or a mouse, or it may be a graphical user interface (GUI) presented on a display device. As another example, the voice recognition control server (10) may further include a communication unit that exchanges data with the terminal (20). In this case, the communication unit may receive data from the terminal (20) via the network and pass it to other components inside the voice recognition control server (10), or transmit data passed from other components inside the voice recognition control server (10) to the terminal (20). As yet another example, the voice recognition control server (10) may further include one or more additional voice recognition engines.

The request signal receiver (11) receives a voice recognition request signal from the terminal (20) through the network. Here, the voice recognition request signal is a signal requesting that voice data input to the terminal (20) be converted into character or numeric data that a device or a person can recognize. The terminal (20) refers to any one of the plurality of terminals (21 to 23) shown in FIG. 1, but is not limited to the forms or types shown in FIG. 1.

The request signal receiver (11) receives the voice recognition request signal from the terminal (20) over the first protocol connection established with the terminal (20) through the network. Here, the first protocol is a protocol based on a different communication layer from the second protocol, which is described later. For example, the first protocol may be HTTP (HyperText Transfer Protocol), which is based on the application layer, and the second protocol may be TCP/IP (Transmission Control Protocol/Internet Protocol), which is based on the transport layer or the network layer.

The request signal receiver (11) may receive the voice recognition request signal from the terminal (20) using an API (Application Programming Interface). Likewise, the identification information transmitter (14) of FIG. 2 may transmit the identification information to the terminal (20) using an API. In other words, control signals for voice recognition may be exchanged between the terminal (20) and the voice recognition control server (10) through such an API, in which case an API client module may be installed on the terminal (20) and an API server module on the voice recognition control server (10). In general, an API is an interface for communication between software components, and one example of such an API is an HTTP API. An API is generally useful for keeping the communication between one of the plurality of terminals and the voice recognition control server (10) independent of the communication between another of the terminals and the voice recognition control server (10).

The voice recognition engine determiner (12) determines, based on the voice recognition request signal, the voice recognition engine corresponding to the terminal (20) from among the plurality of voice recognition engines. Here, the voice recognition engine determiner (12) may determine the voice recognition engine based on terminal information of the terminal (20) included in the voice recognition request signal. For example, when the terminal (20) is determined to be a smartphone based on its terminal information, the voice recognition engine determiner (12) may select the voice recognition engine for smartphones from among the plurality of voice recognition engines. As another example, when the terminal (20) is determined to be an Android-based terminal based on its terminal information, the voice recognition engine determiner (12) may select the Android-based voice recognition engine from among the plurality of voice recognition engines.

In this way, the voice recognition engine determiner (12) may determine the hardware type or software type of the terminal based on the terminal information of the terminal (20), and determine the voice recognition engine corresponding to the determined type. Here, examples of the hardware type include a smartphone, a navigation device, a tablet PC, a PC, a smart TV, and a set-top box, and examples of the software type include Android OS, iOS, Windows OS, Windows Mobile OS, middleware, and particular applications.
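
The type-based selection described above can be illustrated with a simple lookup table. This is a minimal sketch under assumptions: the registry contents, the engine names, the `terminal_info` keys, the fallback engine, and the rule that the software type takes precedence over the hardware type are all hypothetical and are not specified in the source.

```python
# Hypothetical registry mapping a (kind, value) pair to an engine name.
ENGINE_BY_TYPE = {
    ("hardware", "smartphone"): "smartphone-engine",
    ("hardware", "smart_tv"): "tv-engine",
    ("software", "android"): "android-engine",
    ("software", "ios"): "ios-engine",
}
DEFAULT_ENGINE = "generic-engine"  # assumed fallback; the source does not define one


def determine_engine(terminal_info: dict) -> str:
    """Pick an engine by software type, then hardware type, then the default."""
    for kind in ("software", "hardware"):  # precedence order is an assumption
        engine = ENGINE_BY_TYPE.get((kind, terminal_info.get(kind)))
        if engine is not None:
            return engine
    return DEFAULT_ENGINE


print(determine_engine({"hardware": "smartphone"}))
print(determine_engine({"hardware": "smartphone", "software": "android"}))
```

A table keeps the mapping data separate from the selection logic, so supporting a new terminal type in this sketch would only require a new registry entry.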

The voice recognition engine determiner (12) may also determine the voice recognition engine based on at least one of the terminal information of the terminal (20), service information, and network information of the network. Here, the service information is information about the type of service that the terminal (20) is using or intends to use, and examples include a TV service, a map service, a music service, a call center service, a voice dialing service, and other services in which voice recognition can be used. Such service information may be extracted from the voice recognition request signal or received directly from the terminal (20).

The network information includes the type of the network. As described above, examples of such a network may include the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), and a PAN (Personal Area Network).

The voice recognition engine determiner (12) may determine the second protocol connection together with the voice recognition engine, based on the voice recognition request signal. Here, the second protocol connection is the connection over which voice data and result information are exchanged between the terminal (20) and the voice recognition engine. As described above, the second protocol connection may be a protocol connection based on a different communication layer from the first protocol connection.

The identification information determiner (13) determines the identification information of the second protocol connection through which voice data is transmitted between the terminal (20) and the determined voice recognition engine. Here, the identification information may include address information of the determined voice recognition engine. The identification information may also include information indicating the second protocol.

One example of the address information of a voice recognition engine is a URL (Uniform Resource Locator) that identifies where the voice recognition engine is located. In general, the terminal (20) can use this URL to transmit voice data to the voice recognition engine suited to the terminal (20) from among the plurality of voice recognition engines.
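
As an illustration of how a terminal might turn such identification information into a connection target, the sketch below parses an engine URL with Python's standard `urllib.parse.urlsplit`. The `IdentificationInfo` class, its field names, and the example URL are assumptions made for illustration; the source only states that the identification information may carry an engine address (for example a URL) and an indication of the second protocol.

```python
from dataclasses import dataclass
from typing import Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class IdentificationInfo:
    """Hypothetical container for second protocol connection identification info."""
    engine_url: str  # address information of the determined engine
    protocol: str    # information indicating the second protocol


def engine_endpoint(info: IdentificationInfo) -> Tuple[str, int]:
    """Extract the (host, port) pair a terminal would connect to."""
    parts = urlsplit(info.engine_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"engine URL lacks host or port: {info.engine_url!r}")
    return parts.hostname, parts.port


info = IdentificationInfo(engine_url="tcp://engine-a.example:9000", protocol="tcp")
print(engine_endpoint(info))
```

Requiring an explicit port is a choice made for this sketch; a deployment could equally define a default port per protocol.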

The identification information may also include compression and encoding information for the voice data. Here, the compression and encoding information is information for compressing and encoding the voice data that the terminal (20) transmits to the determined voice recognition engine. For example, the compression and encoding information may include information for compressing the voice data at compression level 2 and encoding it in the data format of encoding level 3.

The compression level may be determined according to at least one of the terminal information of the terminal, the service information, and the network information. For example, when the network information indicates 3G, the compression level may be set to level 7 in consideration of the terminal information and the service information. As another example, when the service information indicates a music service, the compression level may be set to level 10 in consideration of the terminal information and the network information.

The terminal 20 may compress the voice data based on the compression level, and the voice recognition engine may decompress the compressed voice data based on the same compression level. In this case, the voice recognition engine may obtain the compression encoding information from the terminal 20 or from the voice recognition control server 10.

The encoding level may likewise be determined according to at least one of the terminal information, service information, and network information of the terminal. As examples of such encoding levels, level 1 may indicate IR communication voice recognition, level 2 Bluetooth voice recognition, level 3 iPhone voice recognition, level 4 Android phone voice recognition, and level 5 music melody or humming.
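The example encoding levels listed above map naturally onto an enumeration. The sketch below is illustrative; the member names and the `describe` helper are hypothetical, and only the level numbers and their meanings come from the text.

```python
from enum import IntEnum


class EncodingLevel(IntEnum):
    IR_VOICE = 1            # IR communication voice recognition
    BLUETOOTH_VOICE = 2     # Bluetooth voice recognition
    IPHONE_VOICE = 3        # iPhone voice recognition
    ANDROID_VOICE = 4       # Android phone voice recognition
    MELODY_OR_HUMMING = 5   # music melody or humming


def describe(level: int) -> str:
    # Raises ValueError for a level outside the example range 1..5.
    return EncodingLevel(level).name


print(describe(3))
```

Using an integer-backed enumeration keeps the numeric level usable on the wire while giving each level a readable name in code.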

The terminal 20 may encode the voice data based on the encoding level, and the voice recognition engine may decode the encoded voice data based on the same encoding level. In this case, the voice recognition engine may obtain the compression encoding information from the terminal 20 or from the voice recognition control server 10.

The identification information transmission unit 14 transmits the identification information to the terminal 20. The terminal 20 then uses the received identification information to transmit voice data to the determined voice recognition engine. As described above, the terminal 20 may transmit the voice data over the second protocol connection indicated in the identification information. The terminal 20 may encode the voice data based on the compression encoding information and transmit the encoded voice data to the voice recognition engine. Alternatively, the terminal 20 may encode the voice data based on the terminal information of the terminal 20, the service information, and the network information of the second protocol connection, and transmit the encoded voice data to the voice recognition engine.

The first voice recognition engine 15 generates result information corresponding to the voice data received from the terminal 20 and transmits the generated result information to the terminal 20. Here, the first voice recognition engine 15 is the determined voice recognition engine referred to above. The result information means data in text or numeric format recognized from the voice data so that it can be used by a person or a device.

When the second voice recognition engine 16 receives voice data from a terminal of a different type than the terminal 20, it may generate result information corresponding to the received voice data and transmit the generated result information to that other terminal. The voice recognition control server 10 may include a plurality of voice recognition engines, each suited to the characteristics of one of a plurality of terminals. Accordingly, the voice recognition control server 10 may further include at least one voice recognition engine in addition to the first voice recognition engine 15 and the second voice recognition engine 16.

According to one embodiment of the present invention, one of the plurality of voice recognition engines may be included inside the voice recognition control server 10, and another of the plurality of voice recognition engines may be included in a predetermined voice recognition server outside the voice recognition control server 10.

The database 17 stores data. This data includes data input and output between the components inside the voice recognition control server 10, as well as data input and output between the voice recognition control server 10 and components outside it. For example, the database 17 may store the identification information passed from the identification information determination unit 13 to the identification information transmission unit 14, and may store the voice recognition request signal input from the terminal 20 to the voice recognition control server 10. Examples of the database 17 include a hard disk drive, a ROM (Read Only Memory), a RAM (Random Access Memory), a flash memory, and a memory card located inside or outside the voice recognition control server 10.

FIG. 3 is a configuration diagram of a voice recognition control server 40 and a voice recognition engine server 50 according to another embodiment of the present invention. Referring to FIG. 3, the voice recognition engine server 50 includes a first voice recognition engine 51 and a second voice recognition engine 52.

Referring to FIG. 3, the voice recognition control server 40 receives a voice recognition request signal from the terminal 20 based on a first protocol connection established with the terminal 20 through a network, determines, based on the received voice recognition request signal, the voice recognition engine corresponding to the terminal 20 among a plurality of voice recognition engines, determines identification information of a second protocol connection through which voice data is transmitted between the terminal 20 and the determined voice recognition engine, and transmits the determined identification information to the terminal 20. Matters not described here regarding the operation of the voice recognition control server 40 are the same as, or can be readily inferred by those skilled in the art from, the description given above of the request signal receiving unit 11, the voice recognition engine determination unit 12, the identification information determination unit 13, the identification information transmission unit 14, and the database 17 of the voice recognition control server 10 of FIG. 2, and are therefore omitted.

The first voice recognition engine 51 generates result information corresponding to the voice data received from the terminal 20 over the second protocol connection with the terminal 20, and transmits the generated result information to the terminal 20. Here, the first voice recognition engine 51 is the determined voice recognition engine referred to above. The result information means data in text or numeric format recognized from the voice data so that it can be used by a person or a device.

When the second voice recognition engine 52 receives voice data from a terminal of a different type than the terminal 20, it may generate result information corresponding to the received voice data and transmit the generated result information to that other terminal. The voice recognition engine server 50 may include a plurality of voice recognition engines, each suited to the characteristics of one of a plurality of terminals. Accordingly, the voice recognition engine server 50 may further include at least one voice recognition engine in addition to the first voice recognition engine 51 and the second voice recognition engine 52. Matters not described here regarding the first voice recognition engine 51 and the second voice recognition engine 52 are the same as, or can be readily inferred by those skilled in the art from, the description of the voice recognition engines given above with reference to FIGS. 1 and 2, and are therefore omitted.

FIG. 4 is a configuration diagram of the terminal 20 according to an embodiment of the present invention. The terminal 20 of FIG. 4 may be any one of the plurality of terminals 21 to 23 illustrated in FIG. 1. However, the terminal 20 is not to be construed as limited to the form or type of the terminals 21 to 23 illustrated in FIG. 1. Referring to FIG. 4, the terminal 20 includes a request signal transmission unit 21, an identification information receiving unit 22, a connection setting unit 23, a voice data transmission unit 24, a result information receiving unit 25, and a search request unit 26.

However, the terminal 20 illustrated in FIG. 4 is only one implementation example of the present invention, and various modifications are possible based on the components illustrated in FIG. 4. For example, the terminal 20 may further include a user interface for receiving a command or information from a user. The user interface may be an input device such as a keyboard or a mouse, or may be a graphical user interface (GUI) presented on an image display device. As another example, the terminal 20 may further include a communication unit that transmits and receives data to and from the voice recognition control server 10. As yet another example, the terminal 20 may further include components found in a typical terminal (for example, a video and audio processing unit). The terminal 20 may also further include a database.

The request signal transmission unit 21 transmits a voice recognition request signal to the voice recognition control server 10 based on a first protocol connection established with the voice recognition control server 10 through a network. The voice recognition request signal may include the terminal information of the terminal 20. The first protocol may be a protocol based on a communication layer different from that of the second protocol.

The identification information receiving unit 22 receives, from the voice recognition control server 10, identification information of one of the plurality of voice recognition engines.

The connection setting unit 23 establishes a second protocol connection with that voice recognition engine based on the received identification information.

The voice data transmission unit 24 transmits voice data to that voice recognition engine over the established second protocol connection. The voice data transmission unit 24 may encode the voice data based on the compression encoding information and transmit the encoded voice data to the voice recognition engine. In this case, the compression encoding information for the voice data may be included in the identification information. Alternatively, the voice data transmission unit 24 may encode the voice data based on the terminal information of the terminal 20, the service information, and the network information of the second protocol connection, and transmit the encoded voice data to the voice recognition engine.

The result information receiving unit 25 receives, from that voice recognition engine, result information corresponding to the transmitted voice data.

The search request unit 26 transmits a search request signal to the search server 30 based on the result information. The search request unit 26 may receive, from the search server 30, a search result corresponding to the search request signal. According to another embodiment of the present invention, however, the search request unit 26 may transmit to the search server 30 a search request signal requesting that the search result corresponding to the search request signal be provided to a target terminal associated with the terminal 20. In this case, the target terminal may receive the search result from the search server 30.
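The terminal-side sequence performed by the units 21 to 26 can be summarized as a short pipeline. The sketch below is a hedged illustration, not the terminal's actual implementation: the function name, the dictionary keys, and the three injected callables are hypothetical stand-ins for the control server, the voice recognition engine, and the search server, so that the flow can be exercised without any network access.

```python
from typing import Callable, Dict


def run_terminal_flow(
    request_identification: Callable[[Dict[str, str]], Dict[str, str]],
    send_voice: Callable[[str, bytes], str],
    search: Callable[[str], str],
    voice: bytes,
) -> str:
    # Units 21 and 22: send the request signal over the first protocol
    # connection and receive the identification information.
    ident = request_identification({"terminal_type": "smartphone"})
    # Units 23, 24, 25: connect to the identified engine over the second
    # protocol connection, send voice data, receive the text result.
    result_text = send_voice(ident["engine_address"], voice)
    # Unit 26: issue a search request based on the result information.
    return search(result_text)


# Stub collaborators with hypothetical behavior, for illustration only.
output = run_terminal_flow(
    request_identification=lambda req: {"engine_address": "engine.example.invalid:6000"},
    send_voice=lambda address, data: "weather",
    search=lambda text: "results for " + text,
    voice=b"\x00\x01",
)
print(output)
```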

The terminal 20 of FIG. 4 performs the operations described above with reference to FIGS. 1 to 3 for the terminal 20 or for any one of the plurality of terminals 21 to 23. Accordingly, matters not described for the terminal 20 with reference to FIG. 4 follow the description given above with reference to FIGS. 1 to 3. A more detailed description of the terminal 20 of FIG. 4 is therefore omitted, since it is the same as, or can be readily inferred by those skilled in the art from, that earlier description.

FIG. 5 is a flowchart illustrating a voice recognition control method according to an embodiment of the present invention. The voice recognition control method according to the embodiment illustrated in FIG. 5 includes steps processed in time sequence by the voice recognition control server 10 according to the embodiment illustrated in FIG. 2 or by the voice recognition control server 40 according to the other embodiment illustrated in FIG. 3. Accordingly, even where omitted below, the description given above with respect to FIG. 2 or FIG. 3 also applies to the voice recognition control method according to the embodiment illustrated in FIG. 5.

In step S51, the request signal receiving unit 11 establishes a first protocol connection with the terminal 20 through a network. In step S52, the request signal receiving unit 11 receives a voice recognition request signal from the terminal 20 based on the established first protocol connection. In step S53, the voice recognition engine determination unit 12 determines, based on the voice recognition request signal, the voice recognition engine corresponding to the terminal 20 among a plurality of voice recognition engines. In step S54, the identification information determination unit 13 determines identification information of a second protocol connection through which voice data is transmitted between the terminal 20 and the determined voice recognition engine. In step S55, the identification information transmission unit 14 transmits the determined identification information to the terminal 20.
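Steps S52 to S55 can be sketched as a single request handler on the control server side. This is a minimal sketch under stated assumptions: the engine registry, its addresses, the request and response field names, and the function names are all hypothetical. The choice of TCP/IP as the second protocol follows the dependent claim that names HTTP as the first protocol and TCP-IP as the second. Step S51 (establishing the first protocol connection) is assumed to have happened before the handler is called.

```python
from typing import Dict

# Hypothetical registry: terminal type -> address of a suitable engine.
ENGINES: Dict[str, str] = {
    "smartphone": "engine-a.example.invalid:6000",
    "tv": "engine-b.example.invalid:6000",
}


def determine_engine(request: Dict[str, str]) -> str:
    # S53: choose the engine corresponding to the requesting terminal.
    return ENGINES[request["terminal_type"]]


def determine_identification(engine_address: str) -> Dict[str, str]:
    # S54: build identification information for the second protocol connection.
    return {"engine_address": engine_address, "second_protocol": "TCP/IP"}


def handle_recognition_request(request: Dict[str, str]) -> Dict[str, str]:
    # S52: the request has been received over the first protocol connection.
    engine_address = determine_engine(request)
    identification = determine_identification(engine_address)
    # S55: the return value stands in for the response sent to the terminal.
    return identification


print(handle_recognition_request({"terminal_type": "tv"}))
```

A real server would also need to handle an unknown terminal type; here that case simply raises `KeyError`.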

Each of the voice recognition control methods according to the embodiments described with reference to FIG. 5 may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. A computer-readable medium may also include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or another transmission mechanism, and include any information delivery medium.

The foregoing description of the present invention is provided for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can be readily modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

10: voice recognition control server
11: request signal receiving unit
12: voice recognition engine determination unit
13: identification information determination unit
14: identification information transmission unit
20: terminal

Claims

A voice recognition control server comprising:
a request signal receiving unit configured to receive a voice recognition request signal from a terminal based on a first protocol connection established with the terminal through a network;
a voice recognition engine determination unit configured to determine a voice recognition engine corresponding to the terminal among a plurality of voice recognition engines based on at least one of terminal information, service information, and network information of the network included in the voice recognition request signal, and to determine, based on the voice recognition request signal, a second protocol connection between the terminal and the voice recognition engine through which voice data and result information in text or numeric format recognized from the voice data are transmitted and received;
an identification information determination unit configured to determine identification information of the second protocol connection through which voice data is transmitted between the terminal and the determined voice recognition engine; and
an identification information transmission unit configured to transmit the identification information to the terminal based on the first protocol connection,
wherein the identification information includes compression encoding information including a compression level and an encoding level of the voice data, the compression level and the encoding level being determined according to at least one of the terminal information, the service information, and the network information, and
wherein the voice data compressed and encoded based on the compression encoding information is transmitted from the terminal to the determined voice recognition engine based on the second protocol connection included in the identification information.

delete

The voice recognition control server of claim 1,
wherein the first protocol is a protocol based on a communication layer different from that of the second protocol.

The voice recognition control server of claim 1,
wherein the first protocol is HyperText Transfer Protocol (HTTP), and the second protocol is Transmission Control Protocol-Internet Protocol (TCP-IP).

The voice recognition control server of claim 1,
further comprising the plurality of voice recognition engines.

The voice recognition control server of claim 1,
wherein one of the plurality of voice recognition engines is included in the voice recognition control server, and another of the plurality of voice recognition engines is included in a predetermined voice recognition server outside the voice recognition control server.

The voice recognition control server of claim 1,
wherein the identification information includes network address information of the voice recognition engine.

delete

The voice recognition control server of claim 1,
wherein the request signal receiving unit receives the voice recognition request signal from a first terminal among a plurality of terminals,
the voice recognition engine determination unit determines a voice recognition engine corresponding to a second terminal based on terminal information of the second terminal included in the voice recognition request signal,
the identification information determination unit determines identification information of a second protocol connection through which voice data is transmitted between the second terminal and the determined voice recognition engine, and
the identification information transmission unit transmits the identification information to the second terminal.

Establishing a first protocol connection with a terminal through a network;
Receiving a voice recognition request signal from the terminal based on the established first protocol connection;
Determining a speech recognition engine corresponding to the terminal among a plurality of speech recognition engines based on at least one of terminal information, service information, and network information of the network included in the speech recognition request signal;
Determining a second protocol connection between the terminal and the speech recognition engine based on the speech recognition request signal to transmit and receive speech data and result information in a text or numeric format recognized from the speech data;
Determining identification information of the second protocol connection through which voice data is transmitted between the terminal and the determined voice recognition engine; and
Transmitting the determined identification information to the terminal based on the first protocol connection,
wherein the identification information includes compression encoding information including a compression level and an encoding level of the voice data, the compression level and the encoding level being determined according to at least one of the terminal information, the service information, and the network information, and
wherein the voice data compressed and encoded based on the compression encoding information is transmitted from the terminal to the determined voice recognition engine based on the second protocol connection included in the identification information.

A request signal transmitter for transmitting a voice recognition request signal to the voice recognition control server based on a first protocol connection established with the voice recognition control server through a network;
An identification information receiving unit configured to receive, from the voice recognition control server and based on the first protocol connection, identification information of one voice recognition engine among a plurality of voice recognition engines, the one voice recognition engine being determined based on at least one of terminal information, service information, and network information of the network included in the voice recognition request signal;
A connection setting unit configured to establish, based on the received identification information, a second protocol connection with the one voice recognition engine, through which voice data and result information in text or numeric format recognized from the voice data are transmitted and received;
A voice data transmission unit configured to transmit the voice data to the voice recognition engine based on the established second protocol connection; and
A result information receiving unit configured to receive, from the voice recognition engine and based on the second protocol connection, the result information corresponding to the transmitted voice data,
wherein the identification information includes compression encoding information including a compression level and an encoding level of the voice data, the compression level and the encoding level being determined according to at least one of the terminal information, the service information, and the network information, and
wherein the voice data transmission unit transmits the voice data compressed and encoded based on the compression encoding information to the voice recognition engine based on the second protocol connection included in the identification information.

delete

The terminal of claim 15,
wherein the first protocol is a protocol based on a communication layer different from that of the second protocol.

delete

The terminal of claim 15,
wherein the voice data transmission unit encodes the voice data based on the terminal information of the terminal, the service information, and the network information of the second protocol connection, and transmits the encoded voice data to the voice recognition engine.

The terminal of claim 15,
further comprising a search request unit configured to transmit a search request signal to a search server based on the result information,
wherein the search request signal is a signal requesting that a search result corresponding to the search request signal be provided to a target terminal associated with the terminal.