
WO2013189430A2 - Method, system and media server for implementing an automatic speech recognition service - Google Patents

Method, system and media server for implementing an automatic speech recognition service

Info

Publication number
WO2013189430A2
Authority
WO
WIPO (PCT)
Prior art keywords
server
asr
media
service data
data packet
Prior art date
Application number
PCT/CN2013/082219
Other languages
English (en)
French (fr)
Other versions
WO2013189430A3 (zh)
Inventor
张伟
程佳佳
崔飞
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2013189430A2
Publication of WO2013189430A3


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/34: Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/173: Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/487: Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493: Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936: Speech interaction details

Definitions

  • the present invention relates to an automatic speech recognition (ASR) technology in the field of communications, and in particular, to an implementation method, system and media server for an ASR service.
  • ASR automatic speech recognition
  • the Media Server is a stand-alone device that provides dedicated media resource functions in the softswitch system. It is also an important device in the packet network, providing the media processing functions of basic and enhanced services, and is configured to perform all audio- and video-related media processing operations.
  • the media processing operations include conversion between video and audio Real-time Transport Protocol (RTP) data and video and audio files.
  • RTP real-time transport protocol
  • the media server is also configured to receive the user's dual-tone multi-frequency (DTMF) input from the terminal, play the service's voice prompts, and display dynamic guidance screens.
  • the Media Server has the Session Initiation Protocol (SIP) and MSML/MOML protocol capabilities, which enable the media server to complete the entire session process under the control of the application server (APP Server) to achieve interaction with the user.
  • SIP Session Initiation Protocol
  • APP Server application server
  • the Media Control Module (MSCU) is an important module in the media server. It is mainly configured to perform capability negotiation with other entities, to manage and maintain the resources themselves, and to control other service resource modules in performing complex services.
  • the Media Storage Transmission Audio Module (MSTU) is a service resource module in the media server that is configured to store a large amount of audio data and to play audio files.
  • An external network port is disposed on the media storage transmission audio module, and the audio data can be directly sent and received through the external network port.
  • in the prior art, the media server has a wide range of applications, which can mainly be summarized as audio and video playback, digit collection, and conferencing.
  • the ASR function recognizes the input audio information, converts it into text, and reports the text to the user in a message.
  • at present, in the telecommunications field, ASR applications are usually implemented by a specially configured ASR server; signaling instructs the ASR server to send the text to the user side, for example to the user's terminal, to complete one ASR service.
  • FIG. 1 is a schematic structural diagram of a system for implementing an ASR service in the prior art. As shown in FIG. 1, the system includes: a terminal 11, an APP server 12, a media server 13, and an ASR server 14.
  • the method implementation process based on the system described in FIG. 1 includes the following steps:
  • Step 101: The terminal 11 initiates a call, triggering the APP server 12 to activate the APP service;
  • Step 102: The APP server 12 requests the ASR service from the media server 13 through SIP signaling;
  • Step 103: The media server 13 requests the ASR resource from the ASR server 14 through SIP signaling, and controls the ASR server 14 to perform the corresponding service through the Media Resource Control Protocol (MRCP);
  • Step 104: The terminal 11 sends the media service data packet to the ASR server 14, and the ASR server 14 reports the recognized text information to the media server 13.
  • MRCP Media Resource Control Protocol
  • the ASR server is an external device of the media server.
  • when the APP server requests the ASR service, it only initiates a request to the media server.
  • the media server determines the current service type.
  • when the service type is an ASR application, the media server initiates a request to the ASR server, applies for resources, and controls the behavior of the ASR server.
  • after receiving the signaling, the ASR server waits for the input of media information, automatically recognizes the media information as text, and sends the text to the media server through MRCP.
  • however, as service applications expand, the existing implementation method shows certain defects. For example, if the audio capability set of the ASR server does not match the audio capability set of the terminal, the ASR service fails. When the APP server performs Session Description Protocol (SDP) negotiation with the media server, the media server does not know whether the current service type is ASR, so it negotiates the audio parameters with the terminal according to its own capability range. Only when the APP server sends an information (INFO) command to the media server can the media server recognize the ASR service type. At that point, the media server applies to the ASR server for resources using the terminal's SDP information.
  • SDP Session Description Protocol
  • however, if the audio codec capability of the ASR server differs from the result negotiated between the media server and the terminal, for example the audio codec type negotiated by the media server with the terminal is the AMR format but the ASR server supports only the G711 audio format, the ASR server fails to access the data in the media service data packet, and the ASR service ultimately fails.
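The failure mode described above can be illustrated with a minimal Python sketch. The codec names come from the example in the text; the function name and the set-based model of the ASR server's capabilities are assumptions made for illustration and are not part of the patent.

```python
def asr_can_decode(negotiated_codec: str, asr_supported: set) -> bool:
    """Return True if the ASR server supports the codec that the media
    server negotiated with the terminal (hypothetical helper)."""
    return negotiated_codec in asr_supported


# Prior-art flow: the terminal's packets reach the ASR server in the codec
# negotiated with the terminal, with no transcoding in between.
negotiated_with_terminal = "AMR"
asr_supported = {"G711"}
prior_art_ok = asr_can_decode(negotiated_with_terminal, asr_supported)
```

Here `prior_art_ok` is false, which corresponds to the ASR service failing.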
  • in view of this, the main purpose of the embodiments of the present invention is to provide an ASR service implementation method, system, and media server that solve the problem that the ASR server cannot access the data in the media service data packet when the audio codec capability negotiated between the media server and the terminal does not meet the ASR server's requirements, thereby ensuring the implementation of the ASR service.
  • An embodiment of the present invention provides a method for implementing an automatic speech recognition (ASR) service, where the method includes:
  • after receiving the access request of the APP server, the media server determines the set of audio codec types that it supports;
  • after receiving the ASR service request sent by the APP server, the media server applies to the ASR server for the ASR service resource according to the ASR service type;
  • the media server negotiates with the ASR server according to the audio codec type set, transcodes the media service data packet by using the negotiated audio codec type, and sends the transcoded media service data packet to the ASR server.
  • the step in which the media server negotiates with the ASR server, transcodes the media service data packet by using the negotiated audio codec type, and sends the transcoded media service data packet to the ASR server is as follows:
  • the media control module (MSCU) in the media server sends Session Initiation Protocol (SIP) signaling to the ASR server to negotiate, and specifies the audio codec type on which the media server and the ASR server match;
  • the voice center interaction module (MRU) in the media server receives the media service data packet sent by the terminal, transcodes it according to the negotiated audio codec type, and sends the transcoded media service data packet to the media storage transmission audio module (MSTU) in the media server;
  • the MSCU controls the MSTU to send the transcoded media service data packet to the ASR server.
  • the step in which the media server negotiates with the ASR server according to the audio codec type set to obtain an audio codec type is as follows:
  • the media server sends the SIP signaling to the ASR server.
  • after receiving the SIP signaling, the ASR server determines whether an audio codec type that it supports exists in the audio codec capability set supported by the media server. If there is a matching audio codec type, the ASR server notifies the media server, and the two parties specify the matched audio codec type as the audio codec type for subsequently transcoding the media service data packet; if there is no matching audio codec type, the current ASR service process is ended.
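The matching check described above can be sketched in a few lines of Python. This is an illustrative model only: the function name is hypothetical, and picking the first match in the media server's order is an assumption, since the text does not prescribe a specific tie-break when several codec types match.

```python
from typing import Optional, Sequence, Set


def negotiate_codec(media_server_codecs: Sequence[str],
                    asr_codecs: Set[str]) -> Optional[str]:
    """Return the first codec offered by the media server that the ASR
    server also supports, or None if there is no match."""
    for codec in media_server_codecs:
        if codec in asr_codecs:
            return codec
    return None  # no match: the current ASR service flow would end


# The media server offers everything it supports, not just the codec
# it negotiated with the terminal.
chosen = negotiate_codec(["AMR", "G711", "G729"], {"G711"})
no_match = negotiate_codec(["AMR"], {"G711"})
```

With the full capability set on offer, `chosen` is `"G711"` even though the terminal leg uses AMR; offering only the terminal's codec yields `None`.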
  • in the above solution, after the media server receives the access request of the APP server, the method further includes:
  • the terminal sends a media service data packet request to the APP server; the APP server sends the signaling of the access request to the media server according to the media service data packet request, after which the media server specifies the address at which it interacts with the terminal.
  • the step in which the media server transcodes the media service data packet and sends the transcoded media service data packet to the ASR server is as follows:
  • the MSCU in the media server notifies the MSTU to open the NAT channel;
  • the MSCU in the media server sends a transcoding command to the MRU;
  • the MSCU in the media server establishes a link with the ASR server, and notifies the ASR server to wait for audio input and perform audio recognition;
  • the MRU in the media server transcodes the data in the media service data packet sent by the terminal, and sends the transcoded media service data packet to the receiving port of the MSTU through the internal port of the MRU;
  • the MSTU in the media server performs NAT on the transcoded media service data packet and sends it to the ASR server.
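The ordering of the steps above can be sketched as a small Python model. Everything here is a stand-in: the class and function names are hypothetical, the transcoder is passed in as a plain function, and the SIP/MRCP link to the ASR server is omitted. The sketch only shows that the NAT channel is opened and the transcoding command issued before any packet flows from the MRU to the MSTU.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Mstu:
    """Stand-in for the MSTU: forwards packets once its NAT channel is open."""
    nat_open: bool = False
    sent_to_asr: List[bytes] = field(default_factory=list)

    def open_nat_channel(self) -> None:
        self.nat_open = True

    def forward(self, packet: bytes) -> None:
        if not self.nat_open:
            raise RuntimeError("NAT channel is not open")
        self.sent_to_asr.append(packet)


@dataclass
class Mru:
    """Stand-in for the MRU: transcodes terminal packets, then hands them to the MSTU."""
    mstu: Mstu
    transcode: Optional[Callable[[bytes], bytes]] = None

    def on_terminal_packet(self, packet: bytes) -> None:
        if self.transcode is None:
            raise RuntimeError("no transcoding command received")
        self.mstu.forward(self.transcode(packet))


def run_media_path(packets: Iterable[bytes],
                   transcode: Callable[[bytes], bytes]) -> List[bytes]:
    """Play the MSCU's role by issuing the commands in the order listed above."""
    mstu = Mstu()
    mru = Mru(mstu)
    mstu.open_nat_channel()      # MSCU notifies the MSTU to open the NAT channel
    mru.transcode = transcode    # MSCU sends the transcoding command to the MRU
    # The MSCU would now establish its link with the ASR server (omitted here).
    for packet in packets:
        mru.on_terminal_packet(packet)
    return mstu.sent_to_asr


# A placeholder "transcoder" that only tags the payload, for demonstration.
delivered = run_media_path([b"a", b"b"], lambda p: b"G711:" + p)
```

A real transcoder would decode the terminal's codec and re-encode in the negotiated one; the tagging lambda is used only so the data path is visible.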
  • the embodiment of the present invention further provides an implementation system for an ASR service, where the system includes: a media server, an APP server, and an ASR server;
  • the media server is configured to: after receiving the access request of the APP server, determine the set of audio codec types that it supports; after receiving the ASR service request sent by the APP server, apply to the ASR server for the ASR service resource according to the ASR service type; and negotiate with the ASR server according to the audio codec type set, transcode the media service data packet by using the negotiated audio codec type, and send the transcoded media service data packet to the ASR server;
  • the APP server is configured to send an access request and an ASR service request to the media server;
  • the ASR server is configured to negotiate with the media server, and receive the transcoded media service data packet sent by the media server.
  • preferably, the system further includes a terminal configured to send a media service data packet request to the APP server after the media server receives the access request of the APP server; correspondingly,
  • the APP server is further configured to send signaling of the access request to the media server according to the media service data packet request;
  • the media server is further configured to specify the address at which it interacts with the terminal after receiving the signaling of the access request.
  • the step in which the media server negotiates with the ASR server according to the audio codec type set, transcodes the media service data packet by using the negotiated audio codec type, and sends the transcoded media service data packet to the ASR server is as follows:
  • the MSCU in the media server sends SIP signaling to the ASR server to negotiate, and specifies the audio codec type on which the media server and the ASR server match;
  • the MRU in the media server receives the media service data packet sent by the terminal, transcodes it according to the negotiated audio codec type, and sends the transcoded media service data packet to the MSTU in the media server;
  • the MSCU controls the MSTU to send the transcoded media service data packet to the ASR server.
  • preferably, the media server further includes an MSCU, an MRU, and an MSTU, wherein the MSCU is configured to send SIP signaling to the ASR server for negotiation, to specify the audio codec type on which the media server and the ASR server match, and to control the MSTU to send the transcoded media service data packet;
  • the MRU is configured to receive the media service data packet sent by the terminal, transcode it according to the negotiated audio codec type, and send the transcoded media service data packet to the MSTU in the media server;
  • the MSTU is configured to send the transcoded media service data packet to the ASR server under the control of the MSCU.
  • the embodiment of the present invention further provides a media server, where the media server is configured to: after receiving the access request of the APP server, determine the set of audio codec types that it supports; after receiving the ASR service request sent by the APP server, apply to the ASR server for the ASR service resource according to the ASR service type; and negotiate with the ASR server according to the audio codec type set, transcode the media service data packet by using the negotiated audio codec type, and send the transcoded media service data packet to the ASR server.
  • in the ASR service implementation method, system, and media server provided by the embodiment of the present invention, after receiving the access request of the APP server, the media server determines the set of audio codec types that it supports; after receiving the ASR service request sent by the APP server, the media server applies to the ASR server for the ASR service resource according to the ASR service type; the media server negotiates with the ASR server according to the audio codec type set, transcodes the media service data packet by using the negotiated audio codec type, and sends the transcoded media service data packet to the ASR server.
  • through the negotiation between the media server and the ASR server, the audio codec type on which the two match can be determined, and the media service data packet encoded with the negotiated audio codec type is sent to the ASR server.
  • in the negotiation, the media server does not use the audio codec types supported by the terminal as the audio codec capability set on which the negotiation is based; instead, it uses all the audio codec types supported by the media server. Therefore, the embodiment of the present invention can solve the problem that the ASR server fails to access the media service data packet when the audio codec capability set of the media server cannot meet the ASR server's requirements, which improves the success rate of the ASR server accessing media service data packets and ensures the implementation of the ASR service.
  • FIG. 1 is a schematic structural diagram of a system for implementing an ASR service in the prior art
  • FIG. 2 is a schematic flowchart of a method for implementing an ASR service according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of an embodiment of the method in which the media server negotiates with the ASR server, transcodes the media service data packet by using the negotiated audio codec type, and sends the transcoded media service data packet to the ASR server, according to an embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of a system for implementing an ASR service according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of the media server according to an embodiment of the present invention.
  • Detailed Description
  • in the embodiments of the present invention, after receiving the access request of the APP server, the media server determines the set of audio codec types that it supports; after receiving the ASR service request sent by the APP server, the media server applies to the ASR server for the ASR service resource according to the ASR service type; the media server negotiates with the ASR server according to the audio codec type set, transcodes the media service data packet by using the negotiated audio codec type, and sends the transcoded media service data packet to the ASR server.
  • FIG. 2 is a schematic flowchart of a method for implementing an ASR service according to an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:
  • Step 201 After receiving the access request of the APP server, the media server determines the set of audio codec types supported by the media server.
  • specifically, the APP server sends invitation (INVITE) signaling to the media server for media negotiation, and the media server selects, from the audio codec capability set that it supports, the set of audio codec types that it shares with the terminal, for effective transmission of media service data packets with the terminal. This step can be implemented using the prior art and is not described in detail here.
  • in one embodiment, after the media server receives the access request of the APP server in this step, the method further includes: the terminal sends a media service data packet request to the APP server; the APP server sends the signaling of the access request to the media server according to the media service data packet request, after which the media server specifies the address at which it interacts with the terminal.
  • the interaction address is the external port address of the MSTU.
  • Step 202 After receiving the ASR service request sent by the APP server, the media server applies for the ASR service resource to the ASR server according to the ASR service type.
  • the APP server sends an INFO command to the media server, and the media server determines, according to the INFO command, that the service type that the APP server applies to itself is ASR, and then applies for the ASR service resource to the ASR server according to the ASR service type.
  • Step 203 The media server negotiates with the ASR server according to the audio codec type set, transcodes the media service data packet by using the negotiated audio codec type, and sends the transcoded media service data packet to the ASR server;
  • the MSCU in the media server sends SIP signaling to the ASR server to negotiate and specify the audio codec type that the media server matches the ASR server;
  • the MRU receives the media service data packet sent by the terminal, and transcodes the media service data packet according to the negotiated audio codec type, and sends the transcoded media service data packet to the MSTU in the media server;
  • the MSCU controls the MSTU to send the transcoded media service data packet to the ASR server.
  • step 203 may include the following steps:
  • Step 301 The MSCU in the media server sends SIP signaling to the ASR server, and negotiates an audio codec type with the ASR server.
  • the SIP signaling carries a set of audio codec capabilities supported by the media server, that is, the SIP signaling carries all audio codec types supported by the voice center interaction module (MRU) in the media server.
  • the ASR server determines whether an audio codec type that it supports exists in the audio codec capability set supported by the media server. If there is a matching audio codec type, the ASR server notifies the media server, and the two parties specify the matching audio codec type as the audio codec type for subsequently transcoding the media service data packet.
  • here, if there are two or more matching audio codec types, one of them is selected as the audio codec type for subsequently transcoding the media service data packet; if there is no matching audio codec type, the current ASR service flow is ended.
  • in the negotiation, the media server does not use the audio codec types supported by the terminal as the audio codec capability set on which the negotiation is based; instead, it uses all the audio codec types supported by the media server.
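As a rough illustration of how the SIP signaling in step 301 might carry the media server's full codec set, the sketch below builds the audio media lines of an SDP offer. It assumes the capability set is expressed as an SDP body, which the text does not state explicitly. The static RTP payload type numbers for PCMU (0), PCMA (8) and G729 (18) are the standard ones from the RTP audio/video profile, and PCMU/PCMA are the two G.711 variants; the function name, the port number, and the choice of 96 as the first dynamic payload type are assumptions.

```python
from typing import List, Sequence

# Static RTP payload types defined by the RTP audio/video profile.
STATIC_PAYLOAD_TYPES = {"PCMU": 0, "PCMA": 8, "G729": 18}


def build_audio_offer(codecs: Sequence[str], port: int,
                      first_dynamic_pt: int = 96) -> List[str]:
    """Build the m=audio line and rtpmap attributes listing every codec
    the media server supports. All codecs here use an 8000 Hz RTP clock."""
    payload_types = []
    rtpmaps = []
    next_dynamic = first_dynamic_pt
    for name in codecs:
        pt = STATIC_PAYLOAD_TYPES.get(name)
        if pt is None:
            # Codecs without a static number (such as AMR) get a dynamic one.
            pt = next_dynamic
            next_dynamic += 1
        payload_types.append(str(pt))
        rtpmaps.append(f"a=rtpmap:{pt} {name}/8000")
    media_line = f"m=audio {port} RTP/AVP {' '.join(payload_types)}"
    return [media_line, *rtpmaps]


offer = build_audio_offer(["AMR", "PCMU", "PCMA"], port=40000)
```

The ASR server's answer would then keep only the payload types it supports, which is the matching check described in step 301.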
  • Step 302 The MSCU in the media server notifies the MSTU to open a network address translation (NAT) channel;
  • NAT network address translation
  • the MSCU sends a command to open the NAT channel to the MSTU.
  • Step 303 The MSCU in the media server sends a transcoding command to the MRU.
  • specifically, the MSCU in the media server notifies the MRU to receive the media service data packet sent by the terminal; the audio codec type of the MRU port that connects to the ASR server is the audio codec type negotiated in step 301, which is also the audio codec type on which the MRU's transcoding is based.
  • Step 304 The MSCU in the media server establishes a link with the ASR server, and notifies the ASR server to wait for audio input, and performs audio recognition.
  • the MSCU establishes a TCP/IP link with the ASR server, and the MSCU sends an MRCP command to the ASR server through the MRCP to notify the ASR server to wait for audio input and perform audio recognition.
  • Step 305 The MRU in the media server transcodes the data in the media service data packet sent by the terminal, and sends the transcoded media service data packet, that is, the audio media service data to the MSTU through the MRU inner port.
  • Step 306 The MSTU in the media server receives the transcoded media service data packet sent by the MRU, performs NAT, and sends the packet to the ASR server.
  • the method further includes: the ASR server recognizes the received media service data packet as text and sends the text to the media server through MRCP; the media server reports the execution result to the APP server through INFO, and the APP server sends BYE signaling to the media server to release the resource; the media server requests the ASR server to release the resource and then returns the result to the APP server, and the ASR service ends.
  • the embodiment of the present invention further provides an implementation system of an ASR service.
  • the system includes: a media server 43, an APP server 42, and an ASR server 44;
  • the media server 43 is configured to: after receiving the access request from the APP server 42, determine the set of audio codec types that it supports; after receiving the ASR service request sent by the APP server 42, apply to the ASR server 44 for the ASR service resource according to the ASR service type; and negotiate with the ASR server 44 according to the audio codec type set, transcode the media service data packet by using the negotiated audio codec type, and send the transcoded media service data packet to the ASR server 44;
  • the APP server 42 is configured to send an access request and an ASR service request to the media server 43;
  • the ASR server 44 is configured to negotiate with the media server 43 and receive the transcoded media service data packet sent by the media server 43.
  • preferably, the system further includes a terminal 41 configured to send the media service data packet request to the APP server 42 after the media server 43 receives the access request from the APP server 42;
  • the APP server 42 is further configured to send signaling of the access request to the media server 43 according to the media service data packet request;
  • the media server 43 is further configured to, after receiving the signaling of the access request, specify the address at which it interacts with the terminal.
  • the step in which the media server negotiates with the ASR server according to the audio codec type set, transcodes the media service data packet by using the negotiated audio codec type, and sends the transcoded media service data packet to the ASR server is as follows:
  • the MSCU in the media server sends SIP signaling to the ASR server to negotiate, and specifies the audio codec type on which the media server and the ASR server match;
  • the MRU in the media server receives the media service data packet sent by the terminal, transcodes it according to the negotiated audio codec type, and sends the transcoded media service data packet to the MSTU in the media server;
  • the MSCU controls the MSTU to send the transcoded media service data packet to the ASR server.
  • the media server 43 further includes: an MSCU 51, an MRU 52, and an MSTU 53;
  • the MSCU 51 is configured to send SIP signaling to the ASR server 44 for negotiation, to specify the audio codec type on which the media server and the ASR server 44 match, and to control the MSTU 53 to send the transcoded media service data packet;
  • the MRU 52 is configured to receive a media service data packet sent by the terminal, transcode it according to the negotiated audio codec type, and send the transcoded media service data packet to the MSTU 53 in the media server;
  • the MSTU 53 is configured to transmit the transcoded media service data packet to the ASR server 44 under the control of the MSCU 51.
  • the embodiment of the present invention further provides a media server, configured to: after receiving the access request of the APP server, determine the set of audio codec types that it supports; after receiving the ASR service request sent by the APP server, apply to the ASR server for the ASR service resource according to the ASR service type; and negotiate with the ASR server according to the audio codec type set, transcode the media service data packet by using the negotiated audio codec type, and send the transcoded media service data packet to the ASR server.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Telephonic Communication Services (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

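The present invention discloses a method for implementing an ASR service, comprising: after receiving an access request from an APP server, a media server determines the set of audio codec types that it supports; after receiving an ASR service request sent by the APP server, the media server applies to an ASR server for ASR service resources according to the ASR service type; the media server negotiates with the ASR server according to the audio codec type set, transcodes media service data packets using the negotiated audio codec type, and sends the transcoded media service data packets to the ASR server. The invention also discloses a system and a media server for implementing an ASR service. The invention solves the problem that the ASR server cannot access the data in media service data packets when the audio codec capability negotiated between the media server and the terminal does not meet the ASR server's requirements, and thereby ensures that the ASR service can be carried out.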

Description

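Method, system and media server for implementing an automatic speech recognition service
Technical Field
The present invention relates to automatic speech recognition (ASR) technology in the field of communications, and in particular to a method, system and media server for implementing an ASR service.
Background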
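A media server (MS) is a stand-alone device that provides dedicated media resource functions in a softswitch architecture, and is also an important device in packet networks. It provides the media processing functions of basic and enhanced services and is configured to perform all audio- and video-related media processing operations, which include conversion between video and audio Real-time Transport Protocol (RTP) data and video and audio files. The media server is also configured to receive the user's dual-tone multi-frequency (DTMF) input from the terminal, play service voice prompts, and display dynamic guidance screens. Its Session Initiation Protocol (SIP) and MSML/MOML protocol capabilities enable the media server to complete the entire session under the control of an application server (APP server) and so interact with the user.
The media control module (MSCU) is an important module in the media server. It is mainly configured to perform capability negotiation with other entities, to manage and maintain the resources themselves, and to control other service resource modules in executing complex services.
The media storage transmission audio module (MSTU) is a service resource module in the media server, configured to store large volumes of audio data and to play audio files. It has an external network port through which audio data can be sent and received directly.
In the prior art, the media server has a wide range of applications, which can mainly be summarized as audio and video playback, digit collection, and conferencing.
The ASR function recognizes input audio information, converts it into text, and reports the text to the user in a message. At present, in the telecommunications field, ASR applications are usually implemented by a specially configured ASR server; signaling instructs the ASR server to send the text to the user side, for example to the user's terminal, to complete one ASR service.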
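FIG. 1 is a schematic structural diagram of a prior-art system for implementing an ASR service. As shown in FIG. 1, the system includes a terminal 11, an APP server 12, a media server 13 and an ASR server 14. The method flow based on the system of FIG. 1 includes the following steps:
Step 101: The terminal 11 initiates a call, triggering the APP server 12 to activate the APP service;
Step 102: The APP server 12 requests the ASR service from the media server 13 through SIP signaling;
Step 103: The media server 13 requests ASR resources from the ASR server 14 through SIP signaling, and controls the ASR server 14 to perform the corresponding service through the Media Resource Control Protocol (MRCP);
Step 104: The terminal 11 sends media service data packets to the ASR server 14, and the ASR server 14 reports the recognized text to the media server 13.
The above is the typical current ASR service network structure and service flow. The ASR server is a device external to the media server. When requesting the ASR service, the APP server only sends a request to the media server. The media server determines the current service type; when the service type is an ASR application, the media server in turn sends a request to the ASR server, applies for resources, and controls the behavior of the ASR server. After receiving the signaling, the ASR server waits for media input, automatically recognizes the media information as text, and sends the text to the media server through MRCP.
However, as service applications expand, the existing method shows certain defects. For example, if the audio capability set of the ASR server does not match that of the terminal, the ASR service fails. When the APP server performs Session Description Protocol (SDP) negotiation with the media server, the media server does not know whether the current service type is ASR, so it negotiates audio parameters with the terminal according to its own capability range. Only when the APP server sends an information (INFO) command to the media server can the media server recognize the ASR service type; at that point the media server applies to the ASR server for resources using the terminal's SDP information. If the audio codec capability of the ASR server differs from the result negotiated between the media server and the terminal, for example the media server and the terminal negotiated the AMR format but the ASR server supports only the G711 audio format, the ASR server fails to access the data in the media service data packets and the ASR service ultimately fails.
Summary of the Invention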
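In view of this, the main purpose of the embodiments of the present invention is to provide a method, system and media server for implementing an ASR service that solve the problem that the ASR server cannot access the data in media service data packets when the audio codec capability negotiated between the media server and the terminal does not meet the ASR server's requirements, thereby ensuring the implementation of the ASR service.
To achieve the above purpose, the technical solution of the embodiments of the present invention is implemented as follows:
An embodiment of the present invention provides a method for implementing an automatic speech recognition (ASR) service, the method including:
after receiving the access request of the APP server, the media server determines the set of audio codec types that it supports;
after receiving the ASR service request sent by the APP server, the media server applies to the ASR server for ASR service resources according to the ASR service type;
the media server negotiates with the ASR server according to the audio codec type set, transcodes the media service data packet using the negotiated audio codec type, and sends the transcoded media service data packet to the ASR server.
Here, the media server negotiating with the ASR server, transcoding the media service data packet using the negotiated audio codec type, and sending the transcoded media service data packet to the ASR server is as follows:
the media control module (MSCU) in the media server sends Session Initiation Protocol (SIP) signaling to the ASR server to negotiate, and specifies the audio codec type on which the media server and the ASR server match;
the voice center interaction module (MRU) in the media server receives the media service data packet sent by the terminal, transcodes it according to the negotiated audio codec type, and sends the transcoded media service data packet to the media storage transmission audio module (MSTU) in the media server;
the MSCU controls the MSTU to send the transcoded media service data packet to the ASR server.
Here, the media server negotiating with the ASR server according to the audio codec type set to obtain the audio codec type is as follows:
the media server sends SIP signaling to the ASR server. After receiving the SIP signaling, the ASR server determines whether an audio codec type that it supports exists in the audio codec capability set supported by the media server. If there is a matching audio codec type, the ASR server notifies the media server, and the two parties specify the matching audio codec type as the audio codec type for subsequently transcoding the media service data packet; if there is no matching audio codec type, the current ASR service flow is ended.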
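In the above solution, after the media server receives the access request of the APP server, the method further includes:
the terminal sends a media service data packet request to the APP server; the APP server sends the signaling of the access request to the media server according to the media service data packet request, after which the media server specifies the address at which it interacts with the terminal.
Here, the media server transcoding the media service data packet and sending the transcoded media service data packet to the ASR server is as follows:
the MSCU in the media server notifies the MSTU to open the NAT channel;
the MSCU in the media server sends a transcoding command to the MRU;
the MSCU in the media server establishes a link with the ASR server, and notifies the ASR server to wait for audio input and perform audio recognition;
the MRU in the media server transcodes the data in the media service data packet sent by the terminal, and sends the transcoded media service data packet to the receiving port of the MSTU through the internal port of the MRU;
the MSTU in the media server performs NAT on the transcoded media service data packet and sends it to the ASR server.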
本发明实施例还提供了一种 ASR业务的实现系统, 该系统包括: 媒体 服务器、 APP服务器和 ASR服务器; 其中,
所述媒体服务器, 配置为收到 APP服务器的访问请求后, 确定自身支 持的音频编解码类型集; 收到 APP服务器发送的 ASR业务请求后, 根据 ASR业务类型向 ASR服务器申请 ASR业务资源; 根据所述音频编解码类 型集与 ASR服务器进行协商, 通过协商所得的音频编解码类型对媒体业务 数据包进行转码, 并将转码后的媒体业务数据包发送给 ASR服务器; 所述 APP服务器, 配置为向媒体服务器发送访问请求和 ASR业务请 求;
所述 ASR服务器, 配置为与媒体服务器进行协商, 并接收媒体服务器 所发的转码后的媒体业务数据包。
优选地, 该系统还包括终端, 配置为媒体服务器收到 APP服务器的访 问请求之后, 向 APP服务器发送媒体业务数据包请求; 相应的,
所述 APP服务器, 还配置为根据所述媒体业务数据包请求向媒体服务 器发送访问请求的信令;
所述媒体服务器, 还配置为收到所述访问请求的信令后, 指定自身与 终端进行交互的地址。
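Here, the media server negotiating with the ASR server according to the set of audio codec types, transcoding the media service data packets using the negotiated audio codec type, and sending the transcoded media service data packets to the ASR server includes:
the MSCU in the media server sends SIP signaling to the ASR server for negotiation, and specifies the audio codec type on which the media server and the ASR server match;
the MRU in the media server receives the media service data packets sent by the terminal, transcodes them according to the negotiated audio codec type, and sends the transcoded media service data packets to the MSTU in the media server;
the MSCU controls the MSTU to send the transcoded media service data packets to the ASR server.
Preferably, the media server further includes an MSCU, an MRU, and an MSTU, wherein:
the MSCU is configured to send SIP signaling to the ASR server for negotiation, to specify the audio codec type on which the media server and the ASR server match, and to control the MSTU to send the transcoded media service data packets;
the MRU is configured to receive the media service data packets sent by the terminal, transcode them according to the negotiated audio codec type, and send the transcoded media service data packets to the MSTU in the media server;
the MSTU is configured to send the transcoded media service data packets to the ASR server under the control of the MSCU.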
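An embodiment of the present invention further provides a media server configured to: determine the set of audio codec types that it supports after receiving an access request from an APP server; apply to an ASR server for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server; and negotiate with the ASR server according to the set of audio codec types, transcode media service data packets using the negotiated audio codec type, and send the transcoded media service data packets to the ASR server.
In the method, system, and media server for implementing an ASR service provided by the embodiments of the present invention, the media server determines the set of audio codec types that it supports after receiving an access request from the APP server; after receiving an ASR service request sent by the APP server, the media server applies to the ASR server for ASR service resources according to the ASR service type; the media server then negotiates with the ASR server according to the set of audio codec types, transcodes media service data packets using the negotiated audio codec type, and sends the transcoded media service data packets to the ASR server. Through the negotiation between the media server and the ASR server, the embodiments determine an audio codec type that both sides support, and the media service data packets encoded with that type are sent to the ASR server. In this negotiation, the media server does not use the audio codec types supported by the terminal as the audio codec capability set on which negotiation is based; instead, it uses all the audio codec types that the media server itself supports. The embodiments therefore solve the problem that the ASR server fails to access media service data packets when the audio codec capability that the media server negotiated with the terminal cannot satisfy the ASR server, which raises the success rate of the ASR server's access to media service data packets and ensures that the ASR service is implemented.
Brief Description of the Drawings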
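Fig. 1 is a schematic structural diagram of a system implementing an ASR service in the prior art;
Fig. 2 is a schematic flowchart of a method for implementing an ASR service according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a method embodiment in which the media server negotiates with the ASR server, transcodes media service data packets using the negotiated audio codec type, and sends the transcoded media service data packets to the ASR server;
Fig. 4 is a schematic structural diagram of a system implementing an ASR service according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the media server according to an embodiment of the present invention.
Detailed Description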
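In the embodiments of the present invention: after receiving an access request from the APP server, the media server determines the set of audio codec types that it supports; after receiving an ASR service request sent by the APP server, the media server applies to the ASR server for ASR service resources according to the ASR service type; the media server negotiates with the ASR server according to the set of audio codec types, transcodes media service data packets using the negotiated audio codec type, and sends the transcoded media service data packets to the ASR server.
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 2 is a schematic flowchart of a method for implementing an ASR service according to an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
Step 201: after receiving an access request from the APP server, the media server determines the set of audio codec types that it supports;
Specifically: the APP server sends INVITE signaling to the media server for media negotiation, and the media server selects, from the audio codec capability set that it supports, the set of audio codec types that it shares with the terminal, for effective transmission of media service data packets with the terminal. This step can be implemented using the prior art and is not detailed here.
In one embodiment, after the media server receives the access request from the APP server in this step, the method further includes: the terminal sends a media service data packet request to the APP server; the APP server sends access request signaling to the media server according to the media service data packet request, after which the media server designates the address at which it interacts with the terminal. The interaction address is the external port address of the MSTU.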
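Step 202: after receiving an ASR service request sent by the APP server, the media server applies to the ASR server for ASR service resources according to the ASR service type;
Specifically: the APP server sends an INFO instruction to the media server; the media server determines from the INFO instruction that the service type the APP server is requesting is ASR, and then applies to the ASR server for ASR service resources according to the ASR service type.
Step 203: the media server negotiates with the ASR server according to the set of audio codec types, transcodes media service data packets using the negotiated audio codec type, and sends the transcoded media service data packets to the ASR server;
Specifically, the MSCU in the media server sends SIP signaling to the ASR server for negotiation and specifies the audio codec type on which the media server and the ASR server match; the MRU in the media server receives the media service data packets sent by the terminal, transcodes them according to the negotiated audio codec type, and sends the transcoded media service data packets to the MSTU in the media server; the MSCU controls the MSTU to send the transcoded media service data packets to the ASR server.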
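In actual operation, as shown in Fig. 3, the method of step 203 can be implemented through the following steps:
Step 301: the MSCU in the media server sends SIP signaling to the ASR server to negotiate the audio codec type with the ASR server;
Here, the SIP signaling carries the audio codec capability set supported by the media server, that is, all the audio codec types supported by the voice center interaction module (MRU) in the media server. After receiving the SIP signaling, the ASR server determines whether any audio codec type that it supports is present in the audio codec capability set supported by the media server. If a matching audio codec type exists, the ASR server notifies the media server, and both sides designate the matching type as the audio codec type for subsequent transcoding of media service data packets; if two or more matching audio codec types exist, any one of them is chosen for that purpose. If no matching audio codec type exists, the current ASR service flow ends.
In the embodiments of the present invention, the media server does not use the audio codec types supported by the terminal as the audio codec capability set on which negotiation is based; instead, it uses all the audio codec types that the media server itself supports.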
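The difference between offering only the terminal-negotiated codec and offering the media server's full capability set can be sketched as follows. This is an illustrative sketch, not the patented implementation. It assumes the capability set travels as the audio part of an SDP body inside the SIP signaling, which the text does not state for this exchange. The function names, the port 40000, and the dynamic payload type 96 for AMR are hypothetical. G.711 appears in SDP under the names PCMU (mu-law) and PCMA (A-law), with static payload types 0 and 8.

```python
# Static RTP payload types for G.711 (PCMU = 0, PCMA = 8). AMR has no static
# type, so a dynamic number is used; 96 is an arbitrary choice for this sketch.
PAYLOAD_TYPES = {"PCMU": 0, "PCMA": 8, "AMR": 96}


def build_audio_offer(codecs, port=40000):
    """Build the audio section of an SDP offer that lists the given codecs."""
    pts = [PAYLOAD_TYPES[c] for c in codecs]
    lines = ["m=audio %d RTP/AVP %s" % (port, " ".join(str(p) for p in pts))]
    lines += ["a=rtpmap:%d %s/8000" % (PAYLOAD_TYPES[c], c) for c in codecs]
    return "\r\n".join(lines) + "\r\n"


def offered_codecs(sdp):
    """Extract codec names from the a=rtpmap lines of an SDP body."""
    names = []
    for line in sdp.splitlines():
        if line.startswith("a=rtpmap:"):
            # Line shape: "a=rtpmap:<payload type> <name>/<clock rate>"
            names.append(line.split(" ", 1)[1].split("/", 1)[0])
    return names


def asr_accepts(sdp, asr_supported):
    """Return the offered codecs that the (hypothetical) ASR server supports."""
    return [c for c in offered_codecs(sdp) if c in asr_supported]


if __name__ == "__main__":
    asr_supported = {"PCMU", "PCMA"}           # ASR server handles G.711 only
    terminal_result = ["AMR"]                  # codec agreed with the terminal
    media_server_all = ["AMR", "PCMU", "PCMA"]  # full media server capability set
    print(asr_accepts(build_audio_offer(terminal_result), asr_supported))
    print(asr_accepts(build_audio_offer(media_server_all), asr_supported))
```

With the terminal-negotiated codec alone, the ASR server finds nothing it can use; with the full capability set, it finds the G.711 variants, which is the situation the negotiation in step 301 relies on.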
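Step 302: the MSCU in the media server notifies the MSTU to open a network address translation (NAT) channel;
Here, the MSCU issues a command to the MSTU to open the NAT channel.
Step 303: the MSCU in the media server issues a transcoding command to the MRU;
Specifically, the MSCU in the media server notifies the MRU to receive the media service data packets sent by the terminal, specifies that the audio codec type of the MRU port connected to the ASR server is the type negotiated in step 301, and specifies that the audio codec type used for MRU transcoding is the type negotiated in step 301.
Step 304: the MSCU in the media server establishes a link with the ASR server and notifies the ASR server to wait for audio input and perform audio recognition;
Here, the MSCU establishes a TCP/IP link with the ASR server and sends an MRCP (Media Resource Control Protocol) instruction to the ASR server to notify it to wait for audio input and perform audio recognition.
Step 305: the MRU in the media server transcodes the data in the media service data packets sent by the terminal, and sends the transcoded media service data packets, that is, the audio media service data, through the MRU internal port to the receiving port of the MSTU;
Step 306: after receiving the transcoded media service data packets sent by the MRU, the MSTU in the media server performs NAT on them and sends them to the ASR server.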
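The ordering of steps 301 to 306 can be sketched as a small object model. This is only an illustration under stated assumptions. All class, method, and field names are hypothetical. Transcoding is a placeholder that relabels the packet, NAT is reduced to rewriting the destination address, and the SIP and MRCP exchanges are recorded as log entries rather than performed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Packet:
    codec: str
    payload: bytes
    dst: Tuple[str, int] = ("", 0)


@dataclass
class MSTU:
    """Forwards transcoded packets to the ASR server (hypothetical model)."""
    asr_addr: Tuple[str, int]
    nat_open: bool = False
    sent: List[Packet] = field(default_factory=list)

    def open_nat(self) -> None:  # step 302
        self.nat_open = True

    def forward(self, pkt: Packet) -> None:  # step 306
        if not self.nat_open:
            raise RuntimeError("NAT channel is not open")
        pkt.dst = self.asr_addr  # stand-in for network address translation
        self.sent.append(pkt)


@dataclass
class MRU:
    """Transcodes terminal packets to the negotiated codec (placeholder)."""
    mstu: MSTU
    target_codec: str = ""

    def set_transcoding(self, codec: str) -> None:  # step 303
        self.target_codec = codec

    def on_terminal_packet(self, pkt: Packet) -> None:  # step 305
        if not self.target_codec:
            raise RuntimeError("no transcoding command received")
        # A real MRU would decode and re-encode the audio; this only relabels.
        self.mstu.forward(Packet(codec=self.target_codec, payload=pkt.payload))


@dataclass
class MSCU:
    """Controls the MRU and MSTU in the order given by steps 301 to 304."""
    mru: MRU
    mstu: MSTU
    log: List[str] = field(default_factory=list)

    def start_asr(self, negotiated_codec: str) -> None:
        self.log.append("301 codec negotiated: " + negotiated_codec)
        self.mstu.open_nat()
        self.log.append("302 NAT channel opened")
        self.mru.set_transcoding(negotiated_codec)
        self.log.append("303 transcoding command issued")
        self.log.append("304 ASR server told to wait for audio")  # not modelled


if __name__ == "__main__":
    mstu = MSTU(asr_addr=("192.0.2.10", 5004))  # documentation-range address
    mru = MRU(mstu=mstu)
    mscu = MSCU(mru=mru, mstu=mstu)
    mscu.start_asr("PCMU")
    mru.on_terminal_packet(Packet(codec="AMR", payload=b"\x01\x02"))
    print(mscu.log)
    print(mstu.sent[0].codec, mstu.sent[0].dst)
```

The guards in `forward` and `on_terminal_packet` make the dependency explicit: packets cannot be forwarded before the NAT channel is open, and they cannot be transcoded before the MSCU has issued the transcoding command.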
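In one embodiment, after step 203, the method further includes: the ASR server parses the received media service data packets into text and sends the text to the media server via MRCP; the media server reports the INFO execution result to the APP server, and the APP server sends BYE signaling to the media server to release resources; the media server requests the ASR server to release resources and then returns the result to the APP server, at which point the ASR service ends.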
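An embodiment of the present invention further provides a system for implementing an ASR service. As shown in Fig. 4, the system includes a media server 43, an APP server 42, and an ASR server 44, wherein:
the media server 43 is configured to: determine the set of audio codec types that it supports after receiving an access request from the APP server 42; apply to the ASR server 44 for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server 42; and negotiate with the ASR server 44 according to the set of audio codec types, transcode media service data packets using the negotiated audio codec type, and send the transcoded media service data packets to the ASR server 44;
the APP server 42 is configured to send the access request and the ASR service request to the media server 43;
the ASR server 44 is configured to negotiate with the media server 43 and to receive the transcoded media service data packets sent by the media server 43.
Further, the system includes a terminal 41 configured to send a media service data packet request to the APP server 42 after the media server 43 receives the access request from the APP server 42; correspondingly,
the APP server 42 is further configured to send access request signaling to the media server 43 according to the media service data packet request;
the media server 43 is further configured to designate, after receiving the access request signaling, the address at which it interacts with the terminal.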
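Here, the media server negotiating with the ASR server according to the set of audio codec types, transcoding the media service data packets using the negotiated audio codec type, and sending the transcoded media service data packets to the ASR server includes:
the MSCU in the media server sends SIP signaling to the ASR server for negotiation, and specifies the audio codec type on which the media server and the ASR server match;
the MRU in the media server receives the media service data packets sent by the terminal, transcodes them according to the negotiated audio codec type, and sends the transcoded media service data packets to the MSTU in the media server;
the MSCU controls the MSTU to send the transcoded media service data packets to the ASR server.
Correspondingly, as shown in Fig. 5, the media server 43 further includes an MSCU 51, an MRU 52, and an MSTU 53, wherein:
the MSCU 51 is configured to send SIP signaling to the ASR server 44 for negotiation, to specify the audio codec type on which the media server and the ASR server 44 match, and to control the MSTU 53 to send the transcoded media service data packets;
the MRU 52 is configured to receive the media service data packets sent by the terminal, transcode them according to the negotiated audio codec type, and send the transcoded media service data packets to the MSTU 53 in the media server;
the MSTU 53 is configured to send the transcoded media service data packets to the ASR server 44 under the control of the MSCU 51.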
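An embodiment of the present invention further provides a media server configured to: determine the set of audio codec types that it supports after receiving an access request from an APP server; apply to an ASR server for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server; and negotiate with the ASR server according to the set of audio codec types, transcode media service data packets using the negotiated audio codec type, and send the transcoded media service data packets to the ASR server.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection.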

Claims

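1. A method for implementing an automatic speech recognition (ASR) service, the method comprising:
after receiving an access request from an APP server, a media server determining the set of audio codec types that it supports;
after receiving an ASR service request sent by the APP server, the media server applying to an ASR server for ASR service resources according to the ASR service type;
the media server negotiating with the ASR server according to the set of audio codec types, transcoding media service data packets using the negotiated audio codec type, and sending the transcoded media service data packets to the ASR server.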
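2. The method for implementing an ASR service according to claim 1, wherein the media server negotiating with the ASR server, transcoding the media service data packets using the negotiated audio codec type, and sending the transcoded media service data packets to the ASR server comprises:
a media control module (MSCU) in the media server sending Session Initiation Protocol (SIP) signaling to the ASR server for negotiation, and specifying the audio codec type on which the media server and the ASR server match;
a voice center interaction module (MRU) in the media server receiving the media service data packets sent by a terminal, transcoding them according to the negotiated audio codec type, and sending the transcoded media service data packets to a media storage and transmission audio module (MSTU) in the media server;
the MSCU controlling the MSTU to send the transcoded media service data packets to the ASR server.
3. The method for implementing an ASR service according to claim 1, wherein the media server negotiating with the ASR server according to the set of audio codec types to obtain the audio codec type comprises:
the media server sending SIP signaling to the ASR server; after receiving the SIP signaling, the ASR server determining whether any audio codec type that it supports is present in the audio codec capability set supported by the media server; if a matching audio codec type exists, notifying the media server, both sides designating the matching type as the audio codec type for subsequent transcoding of media service data packets; if no matching audio codec type exists, ending the current ASR service flow.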
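4. The method for implementing an ASR service according to claim 1, 2, or 3, wherein after the media server receives the access request from the APP server, the method further comprises:
the terminal sending a media service data packet request to the APP server; the APP server sending access request signaling to the media server according to the media service data packet request, after which the media server designates the address at which it interacts with the terminal.
5. The method for implementing an ASR service according to claim 2, wherein the media server transcoding the media service data packets and sending the transcoded media service data packets to the ASR server comprises:
the MSCU in the media server notifying the MSTU to open a NAT channel;
the MSCU in the media server issuing a transcoding command to the MRU;
the MSCU in the media server establishing a link with the ASR server and notifying the ASR server to wait for audio input and perform audio recognition;
the MRU in the media server transcoding the data in the media service data packets sent by the terminal, and sending the transcoded media service data packets through the MRU internal port to the receiving port of the MSTU;
the MSTU in the media server performing NAT on the transcoded media service data packets and sending them to the ASR server.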
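6. A system for implementing an ASR service, the system comprising a media server, an APP server, and an ASR server, wherein:
the media server is configured to: determine the set of audio codec types that it supports after receiving an access request from the APP server; apply to the ASR server for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server; and negotiate with the ASR server according to the set of audio codec types, transcode media service data packets using the negotiated audio codec type, and send the transcoded media service data packets to the ASR server;
the APP server is configured to send the access request and the ASR service request to the media server;
the ASR server is configured to negotiate with the media server and to receive the transcoded media service data packets sent by the media server.
7. The system for implementing an ASR service according to claim 6, wherein the system further comprises a terminal configured to send a media service data packet request to the APP server after the media server receives the access request from the APP server;
correspondingly, the APP server is further configured to send access request signaling to the media server according to the media service data packet request;
the media server is further configured to designate, after receiving the access request signaling, the address at which it interacts with the terminal.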
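8. The system for implementing an ASR service according to claim 6 or 7, wherein the media server negotiating with the ASR server according to the set of audio codec types, transcoding the media service data packets using the negotiated audio codec type, and sending the transcoded media service data packets to the ASR server comprises:
the MSCU in the media server sending SIP signaling to the ASR server for negotiation, and specifying the audio codec type on which the media server and the ASR server match;
the MRU in the media server receiving the media service data packets sent by the terminal, transcoding them according to the negotiated audio codec type, and sending the transcoded media service data packets to the MSTU in the media server;
the MSCU controlling the MSTU to send the transcoded media service data packets to the ASR server.
9. The system for implementing an ASR service according to claim 8, wherein the media server further comprises an MSCU, an MRU, and an MSTU, wherein:
the MSCU is configured to send SIP signaling to the ASR server for negotiation, to specify the audio codec type on which the media server and the ASR server match, and to control the MSTU to send the transcoded media service data packets;
the MRU is configured to receive the media service data packets sent by the terminal, transcode them according to the negotiated audio codec type, and send the transcoded media service data packets to the MSTU in the media server;
the MSTU is configured to send the transcoded media service data packets to the ASR server under the control of the MSCU.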
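10. A media server, configured to: determine the set of audio codec types that it supports after receiving an access request from an APP server; apply to an ASR server for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server; and negotiate with the ASR server according to the set of audio codec types, transcode media service data packets using the negotiated audio codec type, and send the transcoded media service data packets to the ASR server.
PCT/CN2013/082219 2013-01-28 2013-08-23 Method, system, and media server for implementing an automatic speech recognition service WO2013189430A2 (zh)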

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310032134.7 2013-01-28
CN201310032134.7A CN103151041B (zh) 2013-01-28 2013-01-28 Method, system, and media server for implementing an automatic speech recognition service

Publications (2)

Publication Number Publication Date
WO2013189430A2 true WO2013189430A2 (zh) 2013-12-27
WO2013189430A3 WO2013189430A3 (zh) 2014-02-20

Family

ID=48549063

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/082219 WO2013189430A2 (zh) 2013-01-28 2013-08-23 Method, system, and media server for implementing an automatic speech recognition service

Country Status (2)

Country Link
CN (1) CN103151041B (zh)
WO (1) WO2013189430A2 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
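CN103151041B (zh) * 2013-01-28 2016-02-10 中兴通讯股份有限公司 Method, system, and media server for implementing an automatic speech recognition service
CN105206273B (zh) * 2015-09-06 2019-05-10 上海智臻智能网络科技股份有限公司 Voice transmission control method and system
CN107659415B (zh) * 2016-07-25 2021-05-18 中兴通讯股份有限公司 Media resource management method and apparatus for cloud conferencing
CN109429068B (zh) * 2017-09-01 2020-09-29 成都鼎桥通信技术有限公司 Video codec service processing method and device
CN107820324A (zh) * 2017-10-30 2018-03-20 铱方科技(深圳)有限公司 Method and system for a mobile terminal to receive fixed-line telephone calls, and binding method and system thereof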

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
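CN1633129A (zh) * 2005-01-12 2005-06-29 北京邮电大学 Media server based on softswitch
CN1764190A (zh) * 2004-10-22 2006-04-26 微软公司 Distributed voice services
CN1801322A (zh) * 2004-11-19 2006-07-12 国际商业机器公司 Method and system for on-demand speech transcription using a transcription portal component
CN101437047A (zh) * 2008-12-09 2009-05-20 中兴通讯股份有限公司 Method, system, and media server for playing audio to and recording audio from a user terminal
CN102231734A (zh) * 2011-06-22 2011-11-02 中兴通讯股份有限公司 Audio transcoding method, apparatus, and system for implementing text-to-speech (TTS)
CN103151041A (zh) * 2013-01-28 2013-06-12 中兴通讯股份有限公司 Method, system, and media server for implementing an automatic speech recognition service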

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030195751A1 (en) * 2002-04-10 2003-10-16 Mitsubishi Electric Research Laboratories, Inc. Distributed automatic speech recognition with persistent user parameters
US8451823B2 (en) * 2005-12-13 2013-05-28 Nuance Communications, Inc. Distributed off-line voice services

Also Published As

Publication number Publication date
CN103151041A (zh) 2013-06-12
CN103151041B (zh) 2016-02-10
WO2013189430A3 (zh) 2014-02-20

Legal Events

Date Code Title Description
122 Ep: pct application non-entry in european phase

Ref document number: 13806579

Country of ref document: EP

Kind code of ref document: A2