
KR102014774B1 - Server and method for controlling voice recognition of device, and the device - Google Patents

Server and method for controlling voice recognition of device, and the device

Info

Publication number
KR102014774B1
Authority
KR
South Korea
Prior art keywords
terminal
information
voice recognition
speech recognition
voice
Prior art date
Application number
KR1020110138225A
Other languages
Korean (ko)
Other versions
KR20130070947A (en)
Inventor
류창선
김희경
한영호
구명완
Original Assignee
주식회사 케이티
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 케이티 filed Critical 주식회사 케이티
Priority to KR1020110138225A priority Critical patent/KR102014774B1/en
Publication of KR20130070947A publication Critical patent/KR20130070947A/en
Application granted granted Critical
Publication of KR102014774B1 publication Critical patent/KR102014774B1/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 69/00: Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/18: Multiprotocol handlers, e.g. single devices capable of handling multiple protocols

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A control server and method for controlling voice recognition of a terminal, and a terminal are provided. More specifically, the voice recognition request signal is received from the terminal based on the first protocol connection established with the terminal through the network, and the voice recognition engine corresponding to the terminal is determined among the plurality of voice recognition engines based on the voice recognition request signal. A voice recognition control server and method for determining the identification information of the second protocol connection through which voice data is transmitted between the terminal and the determined voice recognition engine and transmitting the determined identification information to the terminal are provided.

Description

SERVER AND METHOD FOR CONTROLLING VOICE RECOGNITION OF DEVICE, AND THE DEVICE

The present invention relates to a server and a method for controlling voice recognition, and a terminal, and more particularly, to a server and a method for controlling voice recognition of each of a plurality of terminals, and a terminal.

The N-Screen service allows a user to use, centered on the user or the content, a service that was previously used independently on various devices such as a TV, a PC, a tablet PC, or a smartphone. Providing the N-Screen service requires a technology for simultaneously playing the same content on a plurality of devices of various types, and a technology for seamlessly resuming, on another of those devices, content that was being played on any one of them. In this regard, Korean Patent Publication No. 2011-0009587, which is prior art, discloses a configuration for resuming video content between heterogeneous terminals by synchronizing playback history between the content servers that provide content to a plurality of terminals.

Meanwhile, with the expansion of the N-screen environment, voice interfaces must effectively handle the requirements of a growing number of users and of diverse terminals such as pads, smartphones, and IPTVs. Existing systems, however, are limited in handling high volumes of voice interface requests or requests from heterogeneous types of terminals.

Voice interface control of terminals can be performed more effectively by accommodating the different characteristics of the various types of terminals. It is also possible to prevent heavy blocking caused by large-scale voice interface requests from a plurality of terminals and to reduce network load. However, the technical problem to be achieved by the present embodiment is not limited to the technical problems described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, an embodiment of the present invention may provide a voice recognition control server including: a request signal receiving unit for receiving a voice recognition request signal from a terminal based on a first protocol connection established with the terminal through a network; a speech recognition engine determiner for determining, based on the voice recognition request signal, a speech recognition engine corresponding to the terminal among a plurality of speech recognition engines; an identification information determining unit for determining identification information of a second protocol connection through which speech data is transmitted between the terminal and the determined speech recognition engine; and an identification information transmitting unit for transmitting the identification information to the terminal.

In addition, another embodiment of the present invention may provide a voice recognition control method comprising the steps of: establishing a first protocol connection with a terminal through a network; receiving a voice recognition request signal from the terminal based on the established first protocol connection; determining, based on the voice recognition request signal, a speech recognition engine corresponding to the terminal from among a plurality of speech recognition engines; determining identification information of a second protocol connection through which speech data is transmitted between the terminal and the determined speech recognition engine; and transmitting the identification information to the terminal.

In addition, another embodiment of the present invention may provide a terminal including: a request signal transmission unit for transmitting a voice recognition request signal to a voice recognition control server based on a first protocol connection established with the voice recognition control server through a network; an identification information receiver for receiving, from the voice recognition control server, identification information of any one of a plurality of speech recognition engines; a connection setting unit for establishing a second protocol connection with that speech recognition engine based on the received identification information; a voice data transmission unit for transmitting voice data to that speech recognition engine based on the established second protocol connection; and a result information receiver for receiving, from that speech recognition engine, result information corresponding to the transmitted voice data.

By determining a voice recognition engine specific to each terminal in consideration of that terminal's characteristics, voice interface control can be performed more effectively across the different characteristics of the various types of terminals. By separating the first protocol, used to transmit and receive control signals, from the second protocol, used to transmit and receive voice data, heavy blocking caused by large-scale voice interface requests from a plurality of terminals can be prevented and network load can be reduced.

FIG. 1 is a block diagram of a voice recognition control system according to an embodiment of the present invention.
FIG. 2 is a block diagram of the voice recognition control server 10 of FIG. 1.
FIG. 3 is a configuration diagram of a voice recognition control server 40 and a voice recognition engine server 50 according to another embodiment of the present invention.
FIG. 4 is a configuration diagram of a terminal 20 according to an embodiment of the present invention.
FIG. 5 is an operation flowchart illustrating a voice recognition control method according to an embodiment of the present invention.

DETAILED DESCRIPTION

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art may easily implement the present invention. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. In the drawings, parts irrelevant to the description are omitted in order to clearly describe the present invention, and like reference numerals designate like parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. In addition, when a part is said to "include" a certain component, this means that it may further include other components, rather than excluding them, unless otherwise stated.

FIG. 1 is a block diagram of a voice recognition control system according to an embodiment of the present invention. Referring to FIG. 1, the voice recognition control system includes a voice recognition control server 10, a plurality of terminals 21 to 23, and a search server 30. However, since the voice recognition control system of FIG. 1 is only one embodiment of the present invention, the contents of the present invention are not to be interpreted as limited to FIG. 1. For example, according to various embodiments of the present disclosure, the voice recognition control system may further include a content providing server that provides content to the plurality of terminals 21 to 23. In addition, as shown in FIG. 1, the voice recognition control system may further include a voice recognition engine A 11 located outside the voice recognition control server 10.

Each component of FIG. 1 constituting the voice recognition control system is generally connected through a network. A network refers to a connection structure capable of exchanging information between nodes such as terminals and servers. Examples of such a network include, but are not limited to, the Internet, a Local Area Network (LAN), a Wireless LAN (WLAN), a Wide Area Network (WAN), and a Personal Area Network (PAN).

The voice recognition control server 10 controls voice recognition of the plurality of terminals 21 to 23. To this end, the voice recognition control server 10 receives a voice recognition request signal from the plurality of terminals 21 to 23 through the network, and transmits a response signal corresponding to the received voice recognition request signal to the plurality of terminals 21 to 23.

The voice recognition control server 10 receives a voice recognition request signal based on the first protocol connection established with the plurality of terminals 21 to 23, and, as a response to the voice recognition request signal, transmits to the plurality of terminals 21 to 23 the identification information of the second protocol through which voice data is transmitted and received with the voice recognition engines. In this way, the voice recognition control server 10 separates the channel for transmitting and receiving control signals for voice recognition from the channel for transmitting and receiving the actual voice data, thereby reducing the load on the network and performing efficient voice recognition control.

The voice recognition control server 10 detects the characteristics of any one terminal based on the voice recognition request signal received from any one of the terminals 21 to 23, and determines a speech recognition engine corresponding to the identified characteristics. In this way, the voice recognition control server 10 may determine a voice recognition engine suited to that terminal in consideration of its characteristics, performing customized voice recognition control that accounts for the characteristics of each of the various types of terminals.

According to an embodiment of the present invention, the voice recognition control server 10 includes a plurality of voice recognition engines therein, and recommends each of the plurality of voice recognition engines to each terminal in consideration of the characteristics of that terminal. In addition, according to another embodiment of the present invention, the voice recognition control server 10 may recommend the voice recognition engine A 50 located outside the voice recognition control server 10 to a terminal.

The search server 30 receives a search request signal from the plurality of terminals 21 to 23, and transmits a search result corresponding to the search request signal to the plurality of terminals 21 to 23. Here, the search request signal is generated from the result information of the voice data that each of the plurality of terminals 21 to 23 receives from the speech recognition engine.

Each of the plurality of terminals 21 to 23 transmits a voice recognition request signal to the voice recognition control server 10 and receives identification information of a voice recognition engine from the voice recognition control server 10. In addition, each of the plurality of terminals 21 to 23 transmits voice data to the voice recognition engine and receives result information corresponding to the transmitted voice data. In this case, the voice recognition request signal and the identification information are transmitted and received between the terminals 21 to 23 and the voice recognition control server 10 based on the first protocol connection, and the voice data and the result information are transmitted and received between the terminals 21 to 23 and the voice recognition engine based on the second protocol connection.

According to various embodiments of the present disclosure, each of the plurality of terminals may be a different type of terminal. For example, a terminal may be a TV device, a computer, or a portable terminal capable of connecting to a remote server via a network. Examples of a TV device include a smart TV and an IPTV set-top box; examples of a computer include a laptop or desktop equipped with a web browser. An example of a portable terminal is a wireless communication device that guarantees portability and mobility, and may include all kinds of handheld-based wireless communication devices such as a Personal Communication System (PCS), Global System for Mobile communications (GSM), Personal Digital Cellular (PDC), Personal Handyphone System (PHS), Personal Digital Assistant (PDA), International Mobile Telecommunication (IMT)-2000, Code Division Multiple Access (CDMA)-2000, W-Code Division Multiple Access (W-CDMA), or Wireless Broadband Internet (WiBro) terminal, a smartphone, or a tablet PC.

The operation of each component of the voice recognition control system of FIG. 1 will be described in more detail with reference to the following drawings.

FIG. 2 is a block diagram of the voice recognition control server 10 of FIG. 1. Referring to FIG. 2, the voice recognition control server 10 may include a request signal receiver 11, a voice recognition engine determiner 12, an identification information determiner 13, an identification information transmitter 14, a first voice recognition engine 15, a second voice recognition engine 16, and a database 17.

However, the voice recognition control server 10 shown in FIG. 2 is only one implementation example of the present invention, and various modifications are possible based on the components shown in FIG. 2. For example, the voice recognition control server 10 may further include a manager interface for receiving a command or information from the manager. In this case, the manager interface may generally be an input device such as a keyboard or a mouse, or a graphical user interface (GUI) presented on an image display device. For another example, the voice recognition control server 10 may further include a communication unit for transmitting and receiving data with the terminal 20. In this case, the communication unit receives data from the terminal 20 via the network and transfers the received data to other components inside the voice recognition control server 10, or transmits data received from those components to the terminal 20. As another example, the voice recognition control server 10 may further include at least one additional voice recognition engine.

The request signal receiving unit 11 receives a voice recognition request signal from the terminal 20 through the network. In this case, the voice recognition request signal means a signal requesting that the voice data input to the terminal 20 be converted into text or numeric data that can be recognized by a device or a person. In addition, the terminal 20 refers to any one of the plurality of terminals 21 to 23 illustrated in FIG. 1, but is not limited to the form or type illustrated in FIG. 1.

The request signal receiving unit 11 receives a voice recognition request signal from the terminal 20 based on the first protocol connection established with the terminal 20 through a network. At this time, the first protocol is a communication layer based protocol different from the second protocol which will be described later. For example, the first protocol may be HyperText Transfer Protocol (HTTP) based on an application layer, and the second protocol may be Transmission Control Protocol-Internet Protocol (TCP-IP) based on a transport layer or a network layer.

The request signal receiving unit 11 may receive a voice recognition request signal from the terminal 20 using an application programming interface (API). In addition, the identification information transmitter 14 of FIG. 2 may transmit the identification information to the terminal 20 using the API. In other words, the control signals for voice recognition between the terminal 20 and the voice recognition control server 10 may be transmitted and received through the API; in this case, an API client module may be installed in the terminal 20 and an API server module in the voice recognition control server 10. In general, an API is an interface for communication between one software component and another. An example of such an API is an HTTP API. The API is useful for ensuring independence between the communication of any one of the plurality of terminals with the voice recognition control server 10 and the communication of another of the plurality of terminals with the voice recognition control server 10.
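As an illustration of this control-channel exchange, the sketch below builds a voice recognition request and parses the identification information a control server might return over the HTTP-based first protocol. All field names (`terminal_info`, `engine_url`, and so on) are assumptions for illustration only, not the actual message format described in the patent.

```python
import json

def build_recognition_request(terminal_info, service_info, network_info):
    """Control-channel message sent to the control server over the first
    protocol (e.g. an HTTP API). All field names are illustrative."""
    return json.dumps({
        "type": "voice_recognition_request",
        "terminal_info": terminal_info,   # e.g. hardware/software type
        "service_info": service_info,     # e.g. "tv", "music", "voice_dial"
        "network_info": network_info,     # e.g. "3g", "wlan"
    })

def parse_identification_info(response_body):
    """Identification info returned by the control server: the determined
    engine's address and the second-protocol parameters for the data channel."""
    info = json.loads(response_body)
    return info["engine_url"], info["protocol"], info["compression_level"]
```

A terminal-side API client would send the first message over HTTP and feed the response body to the parser before opening the second-protocol connection.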

The speech recognition engine determiner 12 determines a speech recognition engine corresponding to the terminal 20 among the speech recognition engines based on the speech recognition request signal. In doing so, the speech recognition engine determiner 12 may determine the speech recognition engine based on the terminal information of the terminal 20 included in the speech recognition request signal. For example, when the terminal 20 is determined to be a smartphone based on its terminal information, the voice recognition engine determiner 12 may select a voice recognition engine for smartphones from among the plurality of voice recognition engines. For another example, when the terminal 20 is determined to be an Android-based terminal based on its terminal information, the speech recognition engine determiner 12 may select an Android-specific speech recognition engine from among the plurality of speech recognition engines.

In this way, the voice recognition engine determiner 12 may determine the hardware type or the software type of the terminal based on the terminal information of the terminal 20, and determine the voice recognition engine corresponding to the determined type. Examples of the hardware type include various forms such as a smartphone, a navigation device, a tablet PC, a PC, a smart TV, and a set-top box; examples of the software type include various forms such as Android OS, iOS, Windows OS, Windows Mobile OS, middleware, and predetermined applications.
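The type-to-engine selection described above can be sketched as a lookup table. The engine names and the precedence rule (software type before hardware type, then a generic fallback) are assumptions for illustration; the patent does not specify them.

```python
# Illustrative mappings from terminal characteristics to a recognition
# engine; a real server would hold richer per-engine profiles.
ENGINE_BY_SW = {
    "android": "engine-android",
    "ios": "engine-ios",
}
ENGINE_BY_HW = {
    "smartphone": "engine-smartphone",
    "smart_tv": "engine-tv",
    "set_top_box": "engine-tv",
}

def determine_engine(terminal_info):
    """Prefer a software-specific engine, fall back to the hardware type,
    then to a generic default (this precedence is an assumption)."""
    sw = terminal_info.get("software", "").lower()
    hw = terminal_info.get("hardware", "").lower()
    return ENGINE_BY_SW.get(sw) or ENGINE_BY_HW.get(hw) or "engine-default"
```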

The speech recognition engine determiner 12 may determine the speech recognition engine based on at least one of terminal information of the terminal 20, service information, and network information of the network. Here, the service information is information on the type of service that the terminal 20 is using or wants to use. Examples of such service information include various types of services for which voice recognition is available, such as a TV service, a map service, a music service, a call center service, and a voice dial service. Such service information may be extracted from the voice recognition request signal or received directly from the terminal 20.

The network information includes the type of network. As described above, examples of such a network may include, but are not limited to, the Internet, a Local Area Network (LAN), a Wireless LAN (WLAN), a Wide Area Network (WAN), and a Personal Area Network (PAN).

The speech recognition engine determiner 12 may determine a second protocol connection with the speech recognition engine based on the speech recognition request signal. In this case, the second protocol connection means a connection in which voice data and result information are transmitted and received between the terminal 20 and the voice recognition engine. As described above, this second protocol connection may be a communication layer based protocol connection different from the first protocol connection.

The identification information determining unit 13 determines identification information of the second protocol connection through which voice data is transmitted between the terminal 20 and the determined voice recognition engine. At this time, the identification information may include the address information of the determined voice recognition engine. In addition, the identification information may include information indicating the second protocol.

An example of address information of the speech recognition engine includes a Uniform Resource Locator (URL) for identifying a location where the speech recognition engine is located. In general, the terminal 20 may transmit voice data to a speech recognition engine suitable for the terminal 20 among a plurality of speech recognition engines using the URL.

The identification information may include compression encoding information of the voice data. In this case, the compression encoding information refers to information for compressing and encoding the voice data that the terminal 20 transmits to the determined voice recognition engine. For example, the compression encoding information may specify compressing the voice data at compression level 2 and encoding it in the encoding level 3 data format.

The compression level may be determined according to at least one of terminal information, service information, and network information of the terminal. For example, the compression level may be determined as level 7 in consideration of terminal information and service information when the network information is 3G. For another example, the compression level may be determined as level 10 in consideration of terminal information and network information when the service information is a music service.
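A minimal sketch of the level-selection rule just described: the two sample values (level 7 for a 3G network, level 10 for a music service) come from the examples above, while the rule ordering and the default level are assumptions.

```python
def select_compression_level(terminal_info, service_info, network_info):
    """Pick a compression level from terminal, service, and network
    information. Only the 3G and music cases are given in the text;
    the precedence and the default are illustrative assumptions."""
    if network_info == "3g":
        return 7   # level given in the text for a 3G network
    if service_info == "music":
        return 10  # level given in the text for a music service
    return 2       # assumed default for unconstrained cases
```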

The terminal 20 may compress the voice data based on the compression level. Also, the speech recognition engine may restore the compressed speech data based on the compression level. In this case, the speech recognition engine may obtain compressed encoding information from the terminal 20 or the speech recognition control server 10.

The encoding level may also be determined according to at least one of terminal information, service information, and network information of the terminal. As an illustration of encoding levels, level 1 may indicate IR communication voice recognition, level 2 Bluetooth voice recognition, level 3 iPhone voice recognition, level 4 Android phone voice recognition, and level 5 music melody or humming recognition.

The terminal 20 may encode the voice data based on the encoding level. In addition, the speech recognition engine may decode the encoded speech data based on the encoding level. In this case, the speech recognition engine may obtain compressed encoding information from the terminal 20 or the speech recognition control server 10.

The identification information transmitter 14 transmits the identification information to the terminal 20. The terminal 20 then transmits the voice data to the voice recognition engine determined using the received identification information. As described above, the terminal 20 may transmit the voice data based on the second protocol connection indicated in the identification information. In addition, the terminal 20 may encode the voice data based on the compression encoding information and transmit the encoded voice data to the speech recognition engine, or may encode the voice data based on the terminal information of the terminal 20, the service information, and the network information of the second protocol connection before transmitting it.
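The terminal-side preparation of voice data for the second-protocol connection could look like the sketch below. zlib stands in for whatever codec the compression level actually selects, and the framing (a 4-byte length prefix followed by a JSON header and the compressed body) is purely an assumption for illustration.

```python
import json
import zlib

def prepare_voice_payload(voice_pcm: bytes, identification_info: dict) -> bytes:
    """Compress and frame voice data per the received identification info.
    zlib and the length-prefixed framing are illustrative assumptions."""
    level = min(identification_info.get("compression_level", 6), 9)  # zlib caps at 9
    body = zlib.compress(voice_pcm, level)
    header = json.dumps({
        "encoding_level": identification_info.get("encoding_level", 4),
        "compressed": True,
    }).encode()
    # 4-byte big-endian header length, then header, then compressed voice data
    return len(header).to_bytes(4, "big") + header + body
```

The speech recognition engine would reverse the framing, read the header, and decompress the body before recognition.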

The first voice recognition engine 15 generates result information corresponding to the voice data received from the terminal 20 and transmits the generated result information to the terminal 20. At this time, the first voice recognition engine 15 means the above-mentioned determined voice recognition engine. In addition, the result information refers to data in the form of letters or numbers recognized from voice data so that it can be used by a person or a device.

When the second voice recognition engine 16 receives voice data from a terminal of a type different from the terminal 20, it may generate result information corresponding to the received voice data and transmit the generated result information to that terminal. The voice recognition control server 10 may include a plurality of voice recognition engines according to the characteristics of each of the plurality of terminals. Accordingly, the voice recognition control server 10 may further include at least one voice recognition engine other than the first voice recognition engine 15 and the second voice recognition engine 16.

According to one embodiment of the invention, any one of the plurality of voice recognition engines may be included in the voice recognition control server 10, while another of the plurality of voice recognition engines may be included in a predetermined voice recognition server external to the voice recognition control server 10.

The database 17 stores data. The data includes data input and output between the components inside the voice recognition control server 10, as well as data input and output between the voice recognition control server 10 and components outside it. For example, the database 17 may store the identification information transferred from the identification information determination unit 13 to the identification information transmission unit 14, and may store the voice recognition request signal input from the terminal 20 to the voice recognition control server 10. Examples of such a database 17 include a hard disk drive, a Read Only Memory (ROM), a Random Access Memory (RAM), a flash memory, and a memory card existing inside or outside the voice recognition control server 10.

FIG. 3 is a configuration diagram of a voice recognition control server 40 and a voice recognition engine server 50 according to another embodiment of the present invention. Referring to FIG. 3, the voice recognition engine server 50 includes a first voice recognition engine 51 and a second voice recognition engine 52.

Referring to FIG. 3, the voice recognition control server 40 receives a voice recognition request signal from the terminal 20 based on a first protocol connection established with the terminal 20 through a network, determines a voice recognition engine corresponding to the terminal 20 among the plurality of voice recognition engines based on the received voice recognition request signal, determines identification information of the second protocol connection through which voice data is transmitted between the terminal 20 and the determined voice recognition engine, and transmits the identification information to the terminal 20. Matters not described with respect to the operation of the voice recognition control server 40 can be easily inferred by those skilled in the art from what was described above for the request signal receiver 11, the voice recognition engine determiner 12, the identification information determiner 13, the identification information transmitting unit 14, and the database 17 of the voice recognition control server 10 of FIG. 2, so further description is omitted.

The first voice recognition engine 51 generates result information corresponding to the voice data received from the terminal 20 based on the second protocol connection with the terminal 20, and transmits the generated result information to the terminal 20. Here, the first voice recognition engine 51 means the determined voice recognition engine mentioned above. The result information refers to data in the form of letters or numbers recognized from the voice data so that it can be used by a person or a device.

When the second voice recognition engine 52 receives voice data from a terminal of a type different from the terminal 20, it may generate result information corresponding to the received voice data and transmit the generated result information to that terminal. In addition, the voice recognition engine server 50 may include a plurality of voice recognition engines according to the characteristics of each of the plurality of terminals. Therefore, the speech recognition engine server 50 may further include at least one speech recognition engine other than the first speech recognition engine 51 and the second speech recognition engine 52. Matters not described with respect to the first speech recognition engine 51 and the second speech recognition engine 52 can be easily inferred from what was described above for the speech recognition engine with reference to the preceding figures, so further description is omitted.

FIG. 4 is a configuration diagram of a terminal 20 according to an embodiment of the present invention. The terminal 20 of FIG. 4 may be any one of the plurality of terminals 21 to 23 illustrated in FIG. 1; however, the terminal 20 is not limited to the form or type of the terminals 21 to 23 shown in FIG. 1. Referring to FIG. 4, the terminal 20 includes a request signal transmitter 21, an identification information receiver 22, a connection setting unit 23, a voice data transmitter 24, a result information receiver 25, and a search request unit 26.

However, the terminal 20 shown in FIG. 4 is only one implementation example of the present invention, and various modifications are possible based on the components shown in FIG. 4. For example, the terminal 20 may further include a user interface for receiving a command or information from the user. In this case, the user interface may be an input device such as a keyboard or a mouse, or may be a graphical user interface (GUI) expressed on an image display device. As another example, the terminal 20 may further include a communication unit for transmitting and receiving data with the voice recognition control server 10, or components included in a general terminal (e.g., a video and audio processing unit). In addition, the terminal 20 may further include a database.

The request signal transmitter 21 transmits a voice recognition request signal to the voice recognition control server 10 based on the first protocol connection established with the voice recognition control server 10 through the network. In this case, the voice recognition request signal may include terminal information of the terminal 20, and the first protocol may be a communication-layer-based protocol different from the second protocol.

The identification information receiving unit 22 receives identification information of any one of the plurality of speech recognition engines from the speech recognition control server 10.

The connection setting unit 23 sets up a second protocol connection with any one voice recognition engine based on the received identification information.

The voice data transmitter 24 transmits the voice data to the voice recognition engine based on the established second protocol connection. In this case, the voice data transmitter 24 may encode the voice data based on compression encoding information included in the identification information and transmit the encoded voice data to the voice recognition engine. The voice data transmitter 24 may also encode the voice data based on the terminal information of the terminal 20, the service information, and the network information of the second protocol connection, and transmit the encoded voice data to the voice recognition engine.
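A minimal sketch of this encoding step follows, assuming the identification information arrives as a small dictionary. The patent does not name a codec, so zlib stands in for the compression step, and the field names (`engine_host`, `engine_port`, `compression_level`) are hypothetical.

```python
import zlib

def encode_voice_data(pcm_bytes, identification_info):
    """Compress/encode captured voice data according to the compression
    encoding information carried in the identification information.
    zlib is a stand-in; the actual codec is not specified in the patent."""
    level = identification_info.get("compression_level", 6)  # assumed field name
    return zlib.compress(pcm_bytes, level)

# Identification information as it might arrive over the first protocol
# (all field names are illustrative, not taken from the patent):
identification_info = {"engine_host": "10.0.0.5",
                       "engine_port": 7000,
                       "compression_level": 9}

raw = b"\x00\x7f" * 4000                   # stand-in PCM samples
encoded = encode_voice_data(raw, identification_info)
assert zlib.decompress(encoded) == raw     # lossless round trip
print(len(raw), "->", len(encoded))
```

The encoded bytes would then be written to the second protocol connection addressed by `engine_host` and `engine_port`.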

The result information receiver 25 receives, from the voice recognition engine, result information corresponding to the transmitted voice data.

The search request unit 26 transmits a search request signal to the search server 30 based on the result information, and may receive a search result corresponding to the search request signal from the search server 30. According to another exemplary embodiment of the present invention, however, the search request unit 26 may transmit to the search server 30 a search request signal requesting that the search result corresponding to the signal be provided to a target terminal associated with the terminal 20. In this case, the target terminal receives the search result from the search server 30.
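The search request unit's two behaviors can be sketched as one query builder: the recognized text becomes the query, and an optional target-terminal identifier asks the search server to deliver the result elsewhere. The URL, parameter names, and terminal identifier below are all illustrative assumptions.

```python
from urllib.parse import urlencode

def build_search_request(result_info, target_terminal_id=None):
    """Build the query the search request unit would send to the search
    server. When target_terminal_id is given, the search server is asked
    to deliver the result to that associated target terminal instead of
    the requesting terminal. Parameter names are hypothetical."""
    params = {"q": result_info}
    if target_terminal_id is not None:
        params["target"] = target_terminal_id
    return "http://search.example/search?" + urlencode(params)

# Recognized text searched on behalf of an associated TV terminal:
url = build_search_request("weather seoul", target_terminal_id="tv-42")
print(url)
```

Without `target_terminal_id`, the same builder produces the ordinary case in which the requesting terminal itself receives the search result.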

The terminal 20 of FIG. 4 performs the operations described above with reference to FIGS. 1 to 3 for the terminal 20 or any one of the plurality of terminals 21 to 23. Therefore, the contents described above with respect to FIGS. 1 to 3 also apply to matters not described here for the terminal 20. In other words, since such matters can be easily inferred by those skilled in the art from the descriptions given with reference to FIGS. 1 to 3, a detailed description of the terminal 20 of FIG. 4 is omitted below.

FIG. 5 is an operation flowchart illustrating a voice recognition control method according to an embodiment of the present invention. The voice recognition control method according to the embodiment shown in FIG. 5 includes steps processed in time series by the voice recognition control server 10 according to the embodiment shown in FIG. 2 or the voice recognition control server 40 according to another embodiment shown in FIG. 3. Therefore, even if omitted below, the contents described above with respect to FIG. 2 or FIG. 3 also apply to the voice recognition control method according to the embodiment shown in FIG. 5.

In step S51, the request signal receiver 11 establishes a first protocol connection with the terminal 20 through a network. In step S52, the request signal receiver 11 receives a voice recognition request signal from the terminal 20 based on the established first protocol connection. In step S53, the voice recognition engine determiner 12 determines a voice recognition engine corresponding to the terminal 20 among a plurality of voice recognition engines based on the voice recognition request signal. In step S54, the identification information determiner 13 determines identification information of a second protocol connection through which voice data is transmitted between the terminal 20 and the determined voice recognition engine. In step S55, the identification information transmitter 14 transmits the determined identification information to the terminal 20.
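Steps S52 to S55 amount to a request-to-identification-information mapping, which the sketch below condenses into one handler. The engine registry, the field names, and the bandwidth-based compression policy are illustrative assumptions; the patent only says the choices depend on terminal, service, and network information.

```python
# Hypothetical engine registry keyed by terminal type; all names and
# addresses are illustrative, not taken from the patent.
ENGINES = {
    "smartphone": {"host": "10.0.0.5", "port": 7000},
    "tv":         {"host": "10.0.0.6", "port": 7001},
}

def handle_recognition_request(request):
    """Steps S52-S55 in miniature: inspect the terminal/service/network
    information in the voice recognition request signal, pick an engine,
    and build the identification information returned to the terminal
    over the first protocol connection."""
    terminal_type = request["terminal_info"]["type"]            # S53: choose engine
    engine = ENGINES.get(terminal_type, ENGINES["smartphone"])
    bandwidth = request["network_info"]["bandwidth_kbps"]
    identification_info = {                                     # S54: identification info
        "engine_host": engine["host"],
        "engine_port": engine["port"],
        # Assumed policy: heavier compression on slower links.
        "compression_level": 9 if bandwidth < 512 else 3,
    }
    return identification_info                                  # S55: sent to terminal

info = handle_recognition_request({
    "terminal_info": {"type": "tv"},
    "service_info": {"name": "voice-search"},
    "network_info": {"bandwidth_kbps": 256},
})
print(info)
```

The terminal would then use the returned address to establish the second protocol connection and the compression level to encode its voice data.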

The voice recognition control method according to the embodiment described with reference to FIG. 5 may also be implemented in the form of a recording medium including instructions executable by a computer, such as a program module executed by the computer. Computer-readable media can be any available media that can be accessed by a computer, and include both volatile and nonvolatile media, and removable and non-removable media. Computer-readable media may also include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or another transmission mechanism, and include any information delivery media.

The foregoing description of the present invention is intended for illustration, and it will be understood by those skilled in the art that the present invention may be easily modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, it should be understood that the embodiments described above are exemplary in all respects and not restrictive. For example, each component described as a single type may be implemented in a distributed manner, and similarly, components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the following claims rather than by the above description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as being included in the scope of the present invention.

10: voice recognition control server
11: request signal receiver
12: speech recognition engine determination unit
13: Identification Information Determination Unit
14: identification information transmission unit
20: terminal

Claims (20)

A request signal receiving unit receiving a voice recognition request signal from the terminal based on a first protocol connection established with the terminal through a network;
A speech recognition engine determiner configured to determine a speech recognition engine corresponding to the terminal among a plurality of speech recognition engines based on at least one of terminal information, service information, and network information of the network included in the speech recognition request signal, and to determine, based on the speech recognition request signal, a second protocol connection between the terminal and the speech recognition engine through which speech data and result information in a text or numeric format recognized from the speech data are transmitted and received;
An identification information determining unit for determining identification information of a second protocol connection through which voice data is transmitted between the terminal and the determined speech recognition engine; And
An identification information transmission unit for transmitting the identification information to the terminal based on the first protocol connection,
The identification information includes compression encoding information including a compression level and an encoding level of the voice data, wherein the compression level and the encoding level are determined according to at least one or more of terminal information, service information, and network information;
And the speech data compressed and encoded based on the compression encoding information is transmitted from the terminal to the determined speech recognition engine based on the second protocol connection included in the identification information.
delete
delete
delete
The method of claim 1,
Wherein the first protocol is a communication layer based protocol different from the second protocol.
The method of claim 1,
The first protocol is HyperText Transfer Protocol (HTTP), and the second protocol is Transmission Control Protocol-Internet Protocol (TCP-IP).
The method of claim 1,
Further comprising a plurality of speech recognition engines.
The method of claim 1,
One of the plurality of speech recognition engines is included in the speech recognition control server, and another of the plurality of speech recognition engines is included in a predetermined speech recognition server outside the speech recognition control server, voice recognition control server.
The method of claim 1,
The identification information includes the network address information of the voice recognition engine, voice recognition control server.
delete
delete
delete
The method of claim 1,
The request signal receiving unit receives a voice recognition request signal from a first terminal of a plurality of terminals,
The speech recognition engine determiner determines a speech recognition engine corresponding to the second terminal based on terminal information of the second terminal included in the speech recognition request signal,
The identification information determining unit determines the identification information of the second protocol connection for transmitting voice data between the second terminal and the determined speech recognition engine,
The identification information transmission unit transmits the determined identification information to the second terminal, voice recognition control server.
Establishing a first protocol connection with a terminal through a network;
Receiving a voice recognition request signal from the terminal based on the established first protocol connection;
Determining a speech recognition engine corresponding to the terminal among a plurality of speech recognition engines based on at least one of terminal information, service information, and network information of the network included in the speech recognition request signal;
Determining a second protocol connection between the terminal and the speech recognition engine based on the speech recognition request signal to transmit and receive speech data and result information in a text or numeric format recognized from the speech data;
Determining identification information of a second protocol connection through which voice data is transmitted between the terminal and the determined voice recognition engine; And
And transmitting the determined identification information to the terminal based on the first protocol connection.
The identification information includes compression encoding information including a compression level and an encoding level of the voice data, wherein the compression level and the encoding level are determined according to at least one or more of terminal information, service information, and network information,
And the speech data compressed and encoded based on the compression encoding information is transmitted from the terminal to the determined speech recognition engine based on the second protocol connection included in the identification information.
A request signal transmitter for transmitting a voice recognition request signal to the voice recognition control server based on a first protocol connection established with the voice recognition control server through a network;
An identification information receiver configured to receive, from the speech recognition control server based on the first protocol connection, identification information of any one of a plurality of speech recognition engines determined based on at least one of terminal information, service information, and network information of the network included in the speech recognition request signal;
A connection setting unit configured to establish, based on the received identification information, a second protocol connection through which voice data and result information in a text or numeric format recognized from the voice data are transmitted to and received from the any one voice recognition engine;
A voice data transmitter for transmitting the voice data to the voice recognition engine based on the set second protocol connection; And
A result information receiver configured to receive the result information corresponding to the transmitted voice data from the voice recognition engine based on the second protocol connection,
The identification information includes compression encoding information including a compression level and an encoding level of the voice data, wherein the compression level and the encoding level are determined according to at least one or more of terminal information, service information, and network information,
The voice data transmitter transmits the voice data compressed and encoded based on the compressed encoding information to the voice recognition engine based on the second protocol connection included in the identification information.
delete
The method of claim 15,
The first protocol is a communication layer based protocol different from the second protocol.
delete
The method of claim 15,
The voice data transmitter encodes the voice data based on the terminal information of the terminal, the service information, and the network information of the second protocol connection, and transmits the encoded voice data to the voice recognition engine.
The method of claim 15,
Further comprising a search request unit for transmitting a search request signal to the search server based on the result information,
The search request signal is a signal for requesting to provide a search result corresponding to the search request signal to a target terminal associated with the terminal.
KR1020110138225A 2011-12-20 2011-12-20 Server and method for controlling voice recognition of device, and the device KR102014774B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020110138225A KR102014774B1 (en) 2011-12-20 2011-12-20 Server and method for controlling voice recognition of device, and the device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020110138225A KR102014774B1 (en) 2011-12-20 2011-12-20 Server and method for controlling voice recognition of device, and the device

Publications (2)

Publication Number Publication Date
KR20130070947A KR20130070947A (en) 2013-06-28
KR102014774B1 true KR102014774B1 (en) 2019-10-22

Family

ID=48865575

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020110138225A KR102014774B1 (en) 2011-12-20 2011-12-20 Server and method for controlling voice recognition of device, and the device

Country Status (1)

Country Link
KR (1) KR102014774B1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102298767B1 (en) 2014-11-17 2021-09-06 삼성전자주식회사 Voice recognition system, server, display apparatus and control methods thereof
KR101686073B1 (en) * 2015-07-22 2016-12-28 재단법인 실감교류인체감응솔루션연구단 Method, management server and computer-readable recording medium for allowing client terminal to be provided with services by converting network topology adaptively according to characteristics of the services
KR102443079B1 (en) 2017-12-06 2022-09-14 삼성전자주식회사 Electronic apparatus and controlling method of thereof
CN109949817B (en) * 2019-02-19 2020-10-23 一汽-大众汽车有限公司 Voice arbitration method and device based on dual-operating-system dual-voice recognition engine
CN113096668B (en) * 2021-04-15 2023-10-27 国网福建省电力有限公司厦门供电公司 Method and device for constructing collaborative voice interaction engine cluster

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100526183B1 (en) * 2003-07-15 2005-11-03 삼성전자주식회사 Apparatus and Method for efficient data transmission/reception in Mobile Ad-hoc Network
JP2011090100A (en) * 2009-10-21 2011-05-06 National Institute Of Information & Communication Technology Speech translation system, controller, speech recognition device, translation device, and speech synthesizer

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100348599B1 (en) * 2000-05-22 2002-08-13 (주)클립컴 Gateway Apparatus for Voice Communication over Internet Protocol with Integrated Wireless Digital Network Facility
KR20080043035A (en) * 2006-11-13 2008-05-16 삼성전자주식회사 Mobile communication terminal having voice recognizing function and searching method using the same
KR20110057890A (en) * 2009-11-25 2011-06-01 에스케이 텔레콤주식회사 System and method for data transmission based on wireless personal area network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100526183B1 (en) * 2003-07-15 2005-11-03 삼성전자주식회사 Apparatus and Method for efficient data transmission/reception in Mobile Ad-hoc Network
JP2011090100A (en) * 2009-10-21 2011-05-06 National Institute Of Information & Communication Technology Speech translation system, controller, speech recognition device, translation device, and speech synthesizer

Also Published As

Publication number Publication date
KR20130070947A (en) 2013-06-28

Similar Documents

Publication Publication Date Title
US20190273955A1 (en) Method, device and terminal apparatus for synthesizing video stream of live streaming room
KR101467519B1 (en) Server and method for searching contents using voice information
JP4114814B2 (en) Communication terminal and communication system
US8036598B1 (en) Peer-to-peer transfer of files with back-office completion
KR102014774B1 (en) Server and method for controlling voice recognition of device, and the device
CN105573609A (en) Content sharing method and device
KR102173242B1 (en) Local wireless data communication system, method and apparatus for automactic setup of imformation
JP6327491B2 (en) Application test system and application test method
US20180014063A1 (en) Method and Apparatus for Accessing a Terminal Device Camera to a Target Device
US11240559B2 (en) Content reproducing apparatus and content reproducing method
KR20130096868A (en) Method for transmitting stream and electronic device for the method thereof
KR102069547B1 (en) Method and apparatus for transmitting and receiving additional information in a broadcast communication system
US10560512B2 (en) Method for file management and an electronic device thereof
US9497245B2 (en) Apparatus and method for live streaming between mobile communication terminals
US20170171285A1 (en) System and Method for Sharing Web Browser State Information Between User Devices
WO2015165415A1 (en) Method and apparatus for playing audio data
CN105120207A (en) Sweeping robot video monitoring method and server
WO2016107511A1 (en) Video communication method, terminal and system
US11095939B2 (en) Image display device and system thereof
KR101445260B1 (en) Device, server and method for providing contents seamlessly
US20120159557A1 (en) Apparatus and method for controlling contents transmission
US20140297790A1 (en) Server, terminal apparatus, service transit server, and control method thereof
JP2008210397A (en) Communication terminal and communication system
US10104422B2 (en) Multimedia playing control method, apparatus for the same and system
KR101909257B1 (en) Server and method for executing virtual application requested from device, and the device

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E90F Notification of reason for final refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant