KR102014774B1 - Server and method for controlling voice recognition of device, and the device - Google Patents
Server and method for controlling voice recognition of device, and the device
- Publication number
- KR102014774B1
- Authority
- KR
- South Korea
- Prior art keywords
- terminal
- information
- voice recognition
- speech recognition
- voice
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 22
- 230000006835 compression Effects 0.000 claims description 19
- 238000007906 compression Methods 0.000 claims description 19
- 238000004891 communication Methods 0.000 claims description 17
- 230000005540 biological transmission Effects 0.000 claims description 8
- 238000012546 transfer Methods 0.000 claims description 3
- 238000010586 diagram Methods 0.000 description 8
- 238000005516 engineering process Methods 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 230000004044 response Effects 0.000 description 2
- 230000001413 cellular effect Effects 0.000 description 1
- 238000010295 mobile communication Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000008054 signal transmission Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/18—Multiprotocol handlers, e.g. single devices capable of handling multiple protocols
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
Abstract
A control server and method for controlling voice recognition of a terminal, and a terminal, are provided. More specifically, a voice recognition request signal is received from the terminal over a first protocol connection established with the terminal through a network, and a voice recognition engine corresponding to the terminal is determined from among a plurality of voice recognition engines based on the voice recognition request signal. The voice recognition control server and method then determine identification information of a second protocol connection through which voice data is transmitted between the terminal and the determined voice recognition engine, and transmit the determined identification information to the terminal.
Description
The present invention relates to a server and a method for controlling voice recognition, and a terminal, and more particularly, to a server and a method for controlling voice recognition of each of a plurality of terminals, and a terminal.
The N-screen service allows a user to use, across various devices such as a TV, a PC, a tablet PC, or a smartphone, a service that was previously used independently on each device, centering on the user or the content. Providing the N-screen service requires a technology for simultaneously playing the same content on a plurality of devices of various types, and a technology for seamlessly continuing, on another of those devices, content that was being played on any one of them. In this regard, Korean Patent Publication No. 2011-0009587, which is prior art, discloses a configuration for providing video content replay between heterogeneous terminals by synchronizing playback history between the content servers that provide content to a plurality of terminals.
Meanwhile, as the N-screen environment expands, the number of users grows and different terminals such as pads, smartphones, and IPTVs come into use, so a large number of voice interface requests from various terminals must be handled effectively. However, existing systems are limited in handling large volumes of voice interface requests or requests from different types of terminals.
The present embodiments aim to control the voice interfaces of terminals more effectively by handling the different characteristics of various types of terminals in an integrated manner, to prevent large locks caused by large volumes of voice interface requests from a plurality of terminals, and to reduce network load. However, the technical problem to be achieved by the present embodiment is not limited to those described above, and other technical problems may exist.
As a technical means for achieving the above technical problem, an embodiment of the present invention may provide a voice recognition control server including: a request signal receiving unit that receives a voice recognition request signal from a terminal based on a first protocol connection established with the terminal through a network; a speech recognition engine determiner that determines, based on the voice recognition request signal, a speech recognition engine corresponding to the terminal from among a plurality of speech recognition engines; an identification information determining unit that determines identification information of a second protocol connection through which speech data is transmitted between the terminal and the determined speech recognition engine; and an identification information transmitting unit that transmits the identification information to the terminal.
In addition, another embodiment of the present invention may provide a voice recognition control method including: establishing a first protocol connection with a terminal through a network; receiving a voice recognition request signal from the terminal based on the established first protocol connection; determining, based on the voice recognition request signal, a speech recognition engine corresponding to the terminal from among a plurality of speech recognition engines; determining identification information of a second protocol connection through which speech data is transmitted between the terminal and the determined speech recognition engine; and transmitting the identification information to the terminal.
In addition, another embodiment of the present invention may provide a terminal including: a request signal transmission unit that transmits a voice recognition request signal to a voice recognition control server based on a first protocol connection established with the voice recognition control server through a network; an identification information receiver that receives, from the voice recognition control server, identification information of any one of a plurality of speech recognition engines; a connection setting unit that establishes a second protocol connection with that speech recognition engine based on the received identification information; a voice data transmission unit that transmits voice data to that speech recognition engine based on the established second protocol connection; and a result information receiver that receives, from that speech recognition engine, result information corresponding to the transmitted voice data.
By determining the voice recognition engine suited to each terminal in consideration of that terminal's characteristics, the voice interfaces of terminals can be controlled more effectively, with the different characteristics of the various types of terminals handled in an integrated manner. By separating the first protocol, which carries control signals, from the second protocol, which carries voice data, large locks caused by large-capacity voice interface requests from a plurality of terminals can be prevented and network load can be reduced.
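The separation of control signalling from voice data described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `RecognitionRequest`, `ConnectionInfo`, and `ControlServer`, the engine addresses, and the selection and compression rules are all hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionRequest:
    """Voice recognition request sent over the first (control) connection."""
    terminal_type: str  # terminal information, e.g. "smartphone" (hypothetical value)
    service: str        # service information, e.g. "search" (hypothetical value)
    network: str        # network information, e.g. "3G" or "WLAN"


@dataclass(frozen=True)
class ConnectionInfo:
    """Identification information for the second (voice data) connection."""
    engine_url: str
    compression_level: int
    encoding_level: int


class ControlServer:
    """Handles only small control messages; voice data never passes through it."""

    def __init__(self, engines: dict[str, str]):
        # Hypothetical registry: terminal type -> engine address.
        self._engines = engines

    def handle_request(self, req: RecognitionRequest) -> ConnectionInfo:
        # Determine the engine for this terminal (placeholder rule).
        url = self._engines.get(req.terminal_type, self._engines["default"])
        # Placeholder policy: compress harder on a slower network.
        compression = 7 if req.network == "3G" else 2
        return ConnectionInfo(engine_url=url,
                              compression_level=compression,
                              encoding_level=3)


server = ControlServer({
    "smartphone": "tcp://engine-a.example:9000",
    "default": "tcp://engine-b.example:9000",
})
info = server.handle_request(RecognitionRequest("smartphone", "search", "3G"))
print(info.engine_url, info.compression_level)
```

In this sketch the terminal would use the returned `ConnectionInfo` to open its own connection to the engine, so a burst of large voice uploads loads the engines rather than the control server.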
FIG. 1 is a block diagram of a voice recognition control system according to an embodiment of the present invention.
FIG. 2 is a block diagram of the voice
FIG. 3 is a configuration diagram of a voice
FIG. 4 is a configuration diagram of a
FIG. 5 is an operation flowchart illustrating a voice recognition control method according to an embodiment of the present invention.
DETAILED DESCRIPTION Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art may easily implement the present invention. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. In the drawings, parts irrelevant to the description are omitted in order to clearly describe the present invention, and like reference numerals designate like parts throughout the specification.
Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. In addition, when a part is said to "include" a certain component, this means that it may further include other components, rather than excluding them, unless otherwise stated.
FIG. 1 is a block diagram of a voice recognition control system according to an embodiment of the present invention. Referring to FIG. 1, the voice recognition control system includes a voice
Each component of FIG. 1 constituting the voice recognition control system is generally connected through a network. A network refers to a connection structure capable of exchanging information between nodes such as terminals and servers. Examples of such a network include, but are not limited to, the Internet, a Local Area Network (LAN), a Wireless Local Area Network (Wireless LAN), a Wide Area Network (WAN), and a Personal Area Network (PAN).
The voice
The voice
The voice
According to an embodiment of the present invention, the voice
The
Each of the plurality of
According to various embodiments of the present disclosure, each of the plurality of terminals may be a different type of terminal. For example, the terminal may be a TV device, a computer, or a portable terminal capable of connecting to a remote server via a network. Here, examples of a TV device include a smart TV, an IPTV set-top box, and the like, and examples of a computer include a notebook, a desktop, a laptop, and the like equipped with a web browser. An example of a portable terminal is a wireless communication device that guarantees portability and mobility, and may include all kinds of handheld wireless communication devices such as a Personal Communication System (PCS), Global System for Mobile communications (GSM), Personal Digital Cellular (PDC), Personal Handyphone System (PHS), Personal Digital Assistant (PDA), International Mobile Telecommunication (IMT)-2000, Code Division Multiple Access (CDMA)-2000, W-Code Division Multiple Access (W-CDMA), or Wireless Broadband Internet (WiBro) terminal, a smartphone, a tablet PC, and the like.
The operation of each component of the voice recognition control system of FIG. 1 will be described in more detail with reference to the following drawings.
FIG. 2 is a block diagram of the voice
However, the voice
The request
The request
The request
The speech
As such, the voice
The speech
The network information includes the type of network. Examples of such a network may include the Internet, a local area network (LAN), a wireless local area network (WLAN), a wide area network (WAN), a personal area network (PAN), and the like, as described above.
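One way to picture engine determination from terminal information, service information, and network information is an ordered rule table. The sketch below is only an assumption about how such a lookup might be organized; the rule rows, engine names, and field values are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EngineRule:
    """One row of a hypothetical selection table; None means "match anything"."""
    engine: str
    terminal: Optional[str] = None
    service: Optional[str] = None
    network: Optional[str] = None

    def matches(self, terminal: str, service: str, network: str) -> bool:
        return all(
            expected is None or expected == actual
            for expected, actual in (
                (self.terminal, terminal),
                (self.service, service),
                (self.network, network),
            )
        )


# Rules are checked in order, so more specific rows come first.
RULES = [
    EngineRule("music-engine", service="humming-search"),
    EngineRule("low-bandwidth-engine", terminal="smartphone", network="3G"),
    EngineRule("tv-engine", terminal="iptv"),
    EngineRule("general-engine"),  # fallback row matches every request
]


def select_engine(terminal: str, service: str, network: str) -> str:
    """Return the engine of the first rule that matches the request fields."""
    for rule in RULES:
        if rule.matches(terminal, service, network):
            return rule.engine
    raise LookupError("no engine rule matched")


print(select_engine("smartphone", "search", "3G"))
print(select_engine("iptv", "search", "LAN"))
```

Because any field of a rule may be left unset, the same table covers selection based on only one of the three kinds of information or on any combination of them.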
The speech
The identification
An example of address information of the speech recognition engine includes a Uniform Resource Locator (URL) for identifying a location where the speech recognition engine is located. In general, the terminal 20 may transmit voice data to a speech recognition engine suitable for the terminal 20 among a plurality of speech recognition engines using the URL.
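As a small illustration of how a terminal might turn such a URL into a connection target, the following sketch uses Python's standard `urllib.parse.urlsplit`. The helper name `engine_endpoint`, the `tcp://` scheme, the example host, and the fallback port are assumptions made for this example only.

```python
from urllib.parse import urlsplit


def engine_endpoint(url: str, default_port: int = 9000) -> tuple[str, int]:
    """Split an engine URL into (host, port) for opening the voice data connection.

    default_port is a hypothetical fallback used when the URL carries no port.
    """
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    return parts.hostname, parts.port or default_port


host, port = engine_endpoint("tcp://asr-mobile.example.com:5001")
print(host, port)
```

The resulting pair could be passed to a socket call such as `socket.create_connection((host, port))` to establish the second protocol connection.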
The identification information may include compression encoding information of the voice data. In this case, the compression encoding information refers to information for compressing and encoding the voice data that the terminal 20 transmits to the determined voice recognition engine. For example, the compression encoding information may include information for compressing the voice data at compression level 2 and encoding it in the encoding level 3 data format.
The compression level may be determined according to at least one of terminal information, service information, and network information of the terminal. For example, the compression level may be determined as level 7 in consideration of terminal information and service information when the network information is 3G. For another example, the compression level may be determined as
The terminal 20 may compress the voice data based on the compression level. Also, the speech recognition engine may restore the compressed speech data based on the compression level. In this case, the speech recognition engine may obtain compressed encoding information from the terminal 20 or the speech
The encoding level may also be determined according to at least one of the terminal information, service information, and network information of the terminal. To illustrate the encoding levels, level 1 may indicate IR communication voice recognition, level 2 Bluetooth voice recognition, level 3 iPhone voice recognition, level 4 Android phone voice recognition, and level 5 music melody or humming.
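The compression and encoding levels can be pictured with the sketch below. The five encoding levels mirror the list above, but everything else is an assumption: the specification does not say which codec is used, so the compression level is mapped onto a `zlib` level purely for illustration, and the two-byte header carrying both levels is a hypothetical framing.

```python
import zlib
from enum import IntEnum


class EncodingLevel(IntEnum):
    """Encoding levels as listed in the description."""
    IR_VOICE = 1
    BLUETOOTH_VOICE = 2
    IPHONE_VOICE = 3
    ANDROID_VOICE = 4
    MELODY_OR_HUMMING = 5


def pack_voice(pcm: bytes, compression_level: int,
               encoding_level: EncodingLevel) -> bytes:
    """Prefix one byte per level (hypothetical framing), then the compressed payload."""
    if not 1 <= compression_level <= 9:
        raise ValueError("compression level must be in 1..9 for zlib")
    header = bytes([compression_level, int(encoding_level)])
    return header + zlib.compress(pcm, compression_level)


def unpack_voice(packet: bytes) -> tuple[bytes, int, EncodingLevel]:
    """Engine side: read both levels from the header and restore the audio bytes."""
    compression_level, encoding_level = packet[0], EncodingLevel(packet[1])
    return zlib.decompress(packet[2:]), compression_level, encoding_level


pcm = bytes(1600)  # 1600 zero bytes standing in for recorded audio
packet = pack_voice(pcm, 7, EncodingLevel.IPHONE_VOICE)
restored, level, encoding = unpack_voice(packet)
print(len(packet) < len(pcm), restored == pcm, encoding.name)
```

Carrying the levels in a header is just one option; the engine could equally receive them from the control server, in which case the header would be unnecessary.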
The terminal 20 may encode the voice data based on the encoding level. In addition, the speech recognition engine may decode the encoded speech data based on the encoding level. In this case, the speech recognition engine may obtain compressed encoding information from the terminal 20 or the speech
The
The first
When the second
According to one embodiment of the invention, any one of the plurality of voice recognition engine is included in the voice
The
FIG. 3 is a configuration diagram of a voice
Referring to FIG. 3, the voice
The first
When the second
FIG. 4 is a configuration diagram of a terminal 20 according to an embodiment of the present invention. The terminal 20 of FIG. 4 may be any one of the plurality of
However, the terminal 20 shown in FIG. 4 is only one implementation example of the present invention, and various modifications are possible based on the components shown in FIG. 4. For example, the terminal 20 may further include a user interface for receiving a certain command or information from the user. In this case, the user interface may generally be an input device such as a keyboard, a mouse, or the like, or may be a graphical user interface (GUI) expressed on the image display device. For another example, the terminal 20 may further include a communication unit for transmitting and receiving data with the voice
The
The identification
The
The
The
The
The terminal 20 of FIG. 4 performs the operation described with respect to any one terminal or the
FIG. 5 is an operation flowchart illustrating a voice recognition control method according to an embodiment of the present invention. The voice recognition control method according to the embodiment shown in FIG. 5 is a clock in the voice
In step S51, the request
Each of the voice recognition control methods according to the embodiments described with reference to FIG. 5 may also be implemented in the form of a recording medium including instructions executable by a computer, such as a program module executed by the computer. Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media. In addition, computer readable media may include both computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Communication media typically includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or other transmission mechanism, and includes any information delivery media.
The foregoing description of the present invention is intended for illustration, and it will be understood by those skilled in the art that the present invention may be easily modified into other specific forms without changing its technical spirit or essential features. Therefore, it should be understood that the embodiments described above are exemplary in all respects and not restrictive. For example, each component described as a single type may be implemented in a distributed manner, and similarly, components described as distributed may be implemented in a combined form.
The scope of the present invention is defined by the following claims rather than by the above description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as being included in the scope of the present invention.
10: voice recognition control server
11: request signal receiver
12: speech recognition engine determination unit
13: identification information determination unit
14: identification information transmission unit
20: terminal
Claims (20)
A speech recognition engine determiner configured to determine a speech recognition engine corresponding to the terminal among a plurality of speech recognition engines based on at least one of terminal information, service information, and network information of the network included in the speech recognition request signal, and to determine, based on the speech recognition request signal, a second protocol connection between the terminal and the speech recognition engine for transmitting and receiving speech data and result information in a text or numeric format recognized from the speech data;
An identification information determining unit for determining identification information of a second protocol connection through which voice data is transmitted between the terminal and the determined speech recognition engine; And
An identification information transmission unit for transmitting the identification information to the terminal based on the first protocol connection,
The identification information includes compression encoding information including a compression level and an encoding level of the voice data, wherein the compression level and the encoding level are determined according to at least one or more of terminal information, service information, and network information;
And the speech data compressed and encoded based on the compression encoding information is transmitted from the terminal to the determined speech recognition engine based on the second protocol connection included in the identification information.
Wherein the first protocol is a communication layer based protocol different from the second protocol.
The first protocol is HyperText Transfer Protocol (HTTP), and the second protocol is Transmission Control Protocol/Internet Protocol (TCP/IP).
Further comprising a plurality of speech recognition engines.
One of the plurality of speech recognition engines is included in the speech recognition control server, and another of the plurality of speech recognition engines is included in a predetermined speech recognition server outside the speech recognition control server.
The identification information includes the network address information of the voice recognition engine, voice recognition control server.
The request signal receiving unit receives a voice recognition request signal from a first terminal of a plurality of terminals,
The speech recognition engine determiner determines a speech recognition engine corresponding to the second terminal based on terminal information of the second terminal included in the speech recognition request signal,
The identification information determining unit determines the identification information of the second protocol connection for transmitting voice data between the second terminal and the determined speech recognition engine,
The identification information transmission unit is to transmit to the second terminal, voice recognition control server.
Receiving a voice recognition request signal from the terminal based on the established first protocol connection;
Determining a speech recognition engine corresponding to the terminal among a plurality of speech recognition engines based on at least one of terminal information, service information, and network information of the network included in the speech recognition request signal;
Determining a second protocol connection between the terminal and the speech recognition engine based on the speech recognition request signal to transmit and receive speech data and result information in a text or numeric format recognized from the speech data;
Determining identification information of a second protocol connection through which voice data is transmitted between the terminal and the determined voice recognition engine; And
And transmitting the determined identification information to the terminal based on the first protocol connection.
The identification information includes compression encoding information including a compression level and an encoding level of the voice data, wherein the compression level and the encoding level are determined according to at least one or more of terminal information, service information, and network information,
And the speech data compressed and encoded based on the compression encoding information is transmitted from the terminal to the determined speech recognition engine based on the second protocol connection included in the identification information.
Identification of any one of the plurality of speech recognition engines determined based on at least one or more of the terminal information, service information and the network information of the network included in the speech recognition request signal from the speech recognition control server An identification receiver configured to receive information based on the first protocol connection;
A connection setting unit configured to establish a second protocol connection through which any one voice recognition engine and voice data and result information in a text or numeric format recognized from the voice data are transmitted and received based on the received identification information;
A voice data transmitter for transmitting the voice data to the voice recognition engine based on the set second protocol connection; And
A result information receiver configured to receive the result information corresponding to the transmitted voice data from the voice recognition engine based on the second protocol connection,
The identification information includes compression encoding information including a compression level and an encoding level of the voice data, wherein the compression level and the encoding level are determined according to at least one or more of terminal information, service information, and network information,
The voice data transmitter transmits the voice data compressed and encoded based on the compressed encoding information to the voice recognition engine based on the second protocol connection included in the identification information.
The first protocol is a communication layer based protocol different from the second protocol.
The voice data transmitting unit encodes the voice data based on the terminal information, the service information, and the network information of the second protocol connection of the terminal, and transmits the encoded voice data to the voice recognition engine.
Further comprising a search request unit for transmitting a search request signal to the search server based on the result information,
The search request signal is a signal for requesting to provide a search result corresponding to the search request signal to a target terminal associated with the terminal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020110138225A KR102014774B1 (en) | 2011-12-20 | 2011-12-20 | Server and method for controlling voice recognition of device, and the device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020110138225A KR102014774B1 (en) | 2011-12-20 | 2011-12-20 | Server and method for controlling voice recognition of device, and the device |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20130070947A (en) | 2013-06-28 |
KR102014774B1 (en) | 2019-10-22 |
Family
ID=48865575
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020110138225A KR102014774B1 (en) | 2011-12-20 | 2011-12-20 | Server and method for controlling voice recognition of device, and the device |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR102014774B1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102298767B1 (en) | 2014-11-17 | 2021-09-06 | 삼성전자주식회사 | Voice recognition system, server, display apparatus and control methods thereof |
KR101686073B1 (en) * | 2015-07-22 | 2016-12-28 | 재단법인 실감교류인체감응솔루션연구단 | Method, management server and computer-readable recording medium for allowing client terminal to be provided with services by converting network topology adaptively according to characteristics of the services |
KR102443079B1 (en) | 2017-12-06 | 2022-09-14 | 삼성전자주식회사 | Electronic apparatus and controlling method of thereof |
CN109949817B (en) * | 2019-02-19 | 2020-10-23 | 一汽-大众汽车有限公司 | Voice arbitration method and device based on dual-operating-system dual-voice recognition engine |
CN113096668B (en) * | 2021-04-15 | 2023-10-27 | 国网福建省电力有限公司厦门供电公司 | Method and device for constructing collaborative voice interaction engine cluster |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100526183B1 (en) * | 2003-07-15 | 2005-11-03 | 삼성전자주식회사 | Apparatus and Method for efficient data transmission/reception in Mobile Ad-hoc Network |
JP2011090100A (en) * | 2009-10-21 | 2011-05-06 | National Institute Of Information & Communication Technology | Speech translation system, controller, speech recognition device, translation device, and speech synthesizer |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100348599B1 (en) * | 2000-05-22 | 2002-08-13 | (주)클립컴 | Gateway Apparatus for Voice Communication over Internet Protocol with Integrated Wireless Digital Network Facility |
KR20080043035A (en) * | 2006-11-13 | 2008-05-16 | 삼성전자주식회사 | Mobile communication terminal having voice recognizing function and searching method using the same |
KR20110057890A (en) * | 2009-11-25 | 2011-06-01 | 에스케이 텔레콤주식회사 | System and method for data transmission based on wireless personal area network |
-
2011
- 2011-12-20 KR KR1020110138225A patent/KR102014774B1/en active IP Right Grant
Also Published As
Publication number | Publication date |
---|---|
KR20130070947A (en) | 2013-06-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
E90F | Notification of reason for final refusal | ||
E701 | Decision to grant or registration of patent right | ||
GRNT | Written decision to grant |