CN113160827A - Voice transcription system and method based on multi-language model - Google Patents
Voice transcription system and method based on a multi-language model
- Publication number
- CN113160827A CN113160827A CN202110371093.9A CN202110371093A CN113160827A CN 113160827 A CN113160827 A CN 113160827A CN 202110371093 A CN202110371093 A CN 202110371093A CN 113160827 A CN113160827 A CN 113160827A
- Authority
- CN
- China
- Prior art keywords
- module
- client
- voice
- voice data
- platform
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/12—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
Abstract
The invention provides a voice transcription system and method based on a multi-language model, comprising a platform, a client connected with the platform, a storage module, a voice service module, and a display module connected with the client. The platform receives information sent by the client and the voice service module and forwards it to the client and the voice service module. The client records a user's personal information, sends it to the platform, delivers information sent by the platform to the user, and displays it through the display module. The storage module stores voice data. The voice service module transcribes and translates the user's voice data and generates transcribed text and translated text. The invention avoids the need for a translator to accompany the user at all times and the associated high cost and translation fees, improves working efficiency, and avoids the inconvenience of having an on-site translator in some situations.
Description
Technical Field
The invention relates to the technical field of voice communication, in particular to a voice transcription system and a voice transcription method based on a multi-language model.
Background
According to statistics, there are 5,000 to 6,000 languages in the world; the more widely used ones include English, Chinese, Japanese, French, German, and Russian. With the development of communications and transportation, business and tourism between countries have grown rapidly, international long-distance telephone charges have fallen sharply, and call volume has increased greatly. In 2000, the number of inbound foreign tourists in China exceeded ten million, ranking fifth in the world and first in Asia. Language barriers cause great inconvenience to trade and tourism and hinder their further development. To overcome these barriers, spoken-language translation has become an important tool, and countries with large tourism and investment flows, such as China, require tens of thousands of translators.
However, an on-site translator must accompany the user at all times, which is costly, and translation fees are generally high; the translator's working efficiency is low, mobility is poor, and in some situations having a translator on site is inconvenient.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome these existing defects and provide a voice transcription system and method based on a multi-language model, so as to solve the problems described in the background: an on-site translator must accompany the user at all times, the cost is high, and translation fees are generally high; the translator's working efficiency is low, mobility is poor, and in some situations an on-site translator is inconvenient.
In order to achieve the purpose, the invention provides the following technical scheme: a voice transcription system and method based on a multi-language model comprises a platform, a client connected with the platform, a storage module, a voice service module and a display module connected with the client;
the platform is used for receiving the information sent by the client and the voice service module and sending the information to the client and the voice service module;
the client is used for inputting personal information of a user, sending the personal information to the platform, sending the information sent by the platform to the user and displaying the information through the display module;
the storage module is used for storing voice data;
and the voice service module is used for transcribing and translating the voice data of the user and generating a transcribed text and a translated text.
Preferably, the voice service module is connected to a processing module, and the processing module is connected to an extraction module and is configured to process voice data sent by the voice service module and send the data to the extraction module; the extraction module is connected with the voice service module and used for extracting characteristics of the voice data sent by the processing module and sending the voice data to the voice service module.
Preferably, the processing module is configured to perform pre-emphasis, framing, windowing, and endpoint detection on the voice data sent by the voice service module, and send the processed voice data to the extraction module.
Preferably, the extraction module is used for extracting important relevant information reflecting speech features and removing relatively irrelevant information from the speech data sent by the processing module through the linear prediction cepstrum coefficients LPCC, and sending the data to the speech service module.
Preferably, the system also comprises a voice acquisition module for acquiring voice data of the user and a conversion module for carrying out A/D conversion on the voice data acquired by the voice acquisition module, wherein the voice acquisition module is connected with the conversion module, and the conversion module is connected with the client.
Preferably, the method comprises the following steps:
s1, a user logs in the client to record personal voice data, and the voice data is sent to the platform through the client, and the platform synchronously sends the voice data to the voice service module and the storage module;
s2, during translation, the voice acquisition module acquires user voice data, the user voice data is transmitted to the client through the conversion module, the client transmits the received voice data to the platform, and the platform transmits the data pushed by the client to the voice service module and stores the data in the storage module;
when the voice data sent by the user is consistent with the voice data recorded by other users, the voice service module only transcribes the voice data into texts and sends the transcribed texts to the platform, the platform sends the texts to each client, the transcribed texts are displayed through the display module connected with each client, and meanwhile, the voice information of the user is sent to each client;
when the voice data sent by the user is different from the voice data recorded by the individual user, the voice service module translates and transcribes the voice data and sends the translated text and the transcribed text to the platform, the platform sends the translated text to the individual corresponding client and sends the transcribed text to the client of the original user, the translated text is displayed through the display module connected with the corresponding client, and the transcribed text is displayed through the display module connected with the client of the original user;
and S3, the platform synchronously sends the voice data, the transcription text and the translation text of each user to the storage module for storage.
Preferably, in step S2, when multiple users communicate with each other, the voice data of the users are synchronized to the platform, the voice data are translated and transcribed through the voice service module, the translated text is sent to another user client for display, and the transcribed text is sent to the original client for display.
Preferably, when the user needs to query communication information, the user logs in to the client and enters a query; the client sends the translated text and transcribed text of the requested voice information to the user's client, and the transcribed text and translated text are displayed through the display module connected with the client.
Compared with the prior art, the invention provides a voice transcription system and a method based on a multi-language model, which have the following beneficial effects:
according to the invention, a user logs on a client, and inputs voice data to the client and sends the voice data to the platform, the voice data is transcribed and translated through the voice service module connected with the platform, the transcribed text and the translated text are sent to the client of each corresponding user, and the transcribed text and the translated text are displayed through the display module connected with the client so as to be convenient for converting multiple languages, thereby avoiding the situation that a translator needs to keep up with the client at any time, the cost is high, the translation cost is high, the working efficiency is improved, and the situation that the translator is inconvenient in the field in some occasions is avoided.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention without limiting it:
FIG. 1 is a simplified structural diagram of the voice transcription system and method based on a multi-language model according to the present invention.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments and the accompanying drawings; however, the following embodiments are only preferred embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to fig. 1, a system and method for voice transcription based on a multi-language model includes a platform, a client connected to the platform, a storage module, a voice service module, and a display module connected to the client;
the platform is used for receiving the information sent by the client and the voice service module, sending the information to the client and the voice service module and sending the data to the storage module for storage;
the client is connected with the platform through a network and used for inputting personal information of a user, sending the personal information to the platform, sending the information sent by the platform to the user and displaying the information through the display module, and the display module can display the transcribed text and the translated text so as to be convenient for the user to watch;
the storage module is used for storing voice data and storing data transmitted and received by the platform;
the voice service module is connected with the platform through a network and used for transcribing and translating voice data of the user and generating a transcribed text and a translated text.
The voice service module is connected with the processing module, and the processing module is connected with the extraction module and used for processing the voice data sent by the voice service module and sending the data to the extraction module; the extraction module is connected with the voice service module and used for extracting characteristics of the voice data sent by the processing module and sending the voice data to the voice service module.
The processing module performs pre-emphasis, framing, windowing, and endpoint detection on the voice data sent by the voice service module, and sends the processed voice data to the extraction module. Pre-emphasis, also called high-frequency boosting, compensates for the fact that the high-frequency part of a speech signal is easily attenuated due to effects such as oral and nasal radiation, and is therefore applied during pre-processing before analog-to-digital conversion; its purpose is to boost the high-frequency components and flatten the signal spectrum, facilitating spectrum analysis or vocal-tract parameter analysis. Framing is a common method in speech signal analysis and processing: a speech signal is processed segment by segment, since its characteristics can be regarded as relatively stable over a short time interval (short-time stationarity). The continuous speech signal is thus divided into several relatively independent parts, which makes processing simpler. Windowing is applied after framing so that each frame tapers smoothly at its beginning and end; in practice, rectangular and Hamming window functions are most commonly used. The final pre-processing step is endpoint detection, a technique for locating the start and end points of units such as phonemes, syllables, and words in the speech signal.
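A minimal sketch of this pre-processing chain (pre-emphasis, framing, Hamming windowing, and energy-based endpoint detection) in plain Python might look as follows; the filter coefficient, frame length, hop size, and energy threshold are illustrative values chosen for the example, not parameters specified by the patent:

```python
import math

def pre_emphasis(samples, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: boosts high frequencies, flattens the spectrum
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]

def frame(samples, frame_len=400, hop=160):
    # split the signal into overlapping short-time frames (short-time stationarity)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(one_frame):
    # taper the frame so it starts and ends smoothly
    n = len(one_frame)
    return [one_frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i in range(n)]

def is_speech(one_frame, energy_threshold=0.01):
    # crude energy-based endpoint detection: keep frames above a threshold
    energy = sum(s * s for s in one_frame) / len(one_frame)
    return energy > energy_threshold
```

In a real implementation the endpoint detector would typically combine short-time energy with zero-crossing rate, but the simple threshold above is enough to show where each step sits in the chain.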
The extraction module uses linear prediction cepstral coefficients (LPCC) to extract the important information reflecting speech characteristics from the voice data sent by the processing module, discards relatively irrelevant information, and sends the resulting features to the voice service module.
The system further comprises a voice acquisition module for acquiring the user's voice data and a conversion module for performing A/D conversion on the voice data acquired by the voice acquisition module; the voice acquisition module is connected with the conversion module, and the conversion module is connected with the client.
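The A/D conversion performed by the conversion module can be illustrated as uniform quantization of clipped analog amplitudes to signed 16-bit PCM codes; the 16-bit depth is an assumption made for the example, not a value stated in the patent:

```python
def quantize_16bit(analog_samples):
    # map analog amplitudes in [-1.0, 1.0] to signed 16-bit PCM codes,
    # as a conversion module would after sampling the microphone signal
    pcm = []
    for s in analog_samples:
        s = max(-1.0, min(1.0, s))          # clip to the converter's input range
        pcm.append(int(round(s * 32767)))   # uniform quantization step
    return pcm
```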
S1, a user logs in the client to record personal voice data, and the voice data is sent to the platform through the client, and the platform synchronously sends the voice data to the voice service module and the storage module;
s2, during translation, the voice acquisition module acquires user voice data, the user voice data is transmitted to the client through the conversion module, the client transmits the received voice data to the platform, and the platform transmits the data pushed by the client to the voice service module and stores the data in the storage module;
when the voice data sent by the user is consistent with the voice data recorded by other users, the voice service module only transcribes the voice data into texts and sends the transcribed texts to the platform, the platform sends the texts to each client, the transcribed texts are displayed through the display module connected with each client, and meanwhile, the voice information of the user is sent to each client;
when the voice data sent by the user is different from the voice data recorded by the individual user, the voice service module translates and transcribes the voice data and sends the translated text and the transcribed text to the platform, the platform sends the translated text to the individual corresponding client and sends the transcribed text to the client of the original user, the translated text is displayed through the display module connected with the corresponding client, and the transcribed text is displayed through the display module connected with the client of the original user;
and S3, the platform synchronously sends the voice data, the transcription text and the translation text of each user to the storage module for storage.
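The routing in steps S2 and S3 (same-language listeners receive only the transcribed text, while listeners registered with a different language receive a translation, and everything is stored) can be sketched as follows. The `transcribe` and `translate` callables are hypothetical stand-ins for the voice service module, not an API defined by the patent:

```python
class Platform:
    """Minimal sketch of the platform's dispatch in steps S2 and S3."""

    def __init__(self, transcribe, translate):
        self.transcribe = transcribe      # stand-in for the voice service module
        self.translate = translate
        self.storage = []                 # stand-in for the storage module
        self.clients = {}                 # client id -> registered language

    def register(self, client_id, language):
        # step S1: the user logs in and records personal (language) data
        self.clients[client_id] = language

    def push_voice(self, sender_id, voice_data):
        # step S2: transcribe once, then translate only where languages differ
        sender_lang = self.clients[sender_id]
        transcript = self.transcribe(voice_data, sender_lang)
        deliveries = {}
        for client_id, lang in self.clients.items():
            if lang == sender_lang:
                deliveries[client_id] = transcript          # transcribed text only
            else:
                deliveries[client_id] = self.translate(transcript, sender_lang, lang)
        # step S3: synchronously store voice data and generated text
        self.storage.append((sender_id, voice_data, transcript))
        return deliveries
```

With stub functions in place of real speech services, a Chinese-speaking sender's utterance would reach other Chinese-registered clients as a transcript and an English-registered client as a translation, while one record lands in storage.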
In step S2, when multiple users communicate, the voice data of the users are synchronized to the platform, the voice data are translated and transcribed through the voice service module, the translated text is sent to another user client for display, and the transcribed text is sent to the original client for display.
When a user needs to query communication records, the user logs in to the client and enters a query; the client sends the translated text and transcribed text of the requested voice information to the user's client, and they are displayed through the display module connected with the client. When users speaking different languages log in to their clients, each user's voice data is recorded: the voice acquisition module collects the voice data, the conversion module converts the analog signal into a digital signal and sends it to the client, and the client sends it to the platform. The platform stores the data in the storage module and simultaneously sends it to the voice service module, where the processing module and the extraction module prepare the voice data for transcription and translation; the generated transcribed text and translated text are then sent to the platform. The platform sends the transcribed text to the original user and the translated text to the other users, while also storing the data in the storage module. Each user receives the translation data sent by the platform through the client and can conveniently view the text through the display module. This avoids the need for a translator to accompany the user at all times and the associated high cost and translation fees, improves working efficiency, and avoids the inconvenience of an on-site translator in some situations.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and description above illustrate preferred embodiments of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims and their equivalents.
Claims (8)
1. A voice transcription system based on a multi-language model is characterized by comprising a platform, a client connected with the platform, a storage module, a voice service module and a display module connected with the client;
the platform is used for receiving the information sent by the client and the voice service module and sending the information to the client and the voice service module;
the client is used for inputting personal information of a user, sending the personal information to the platform, sending the information sent by the platform to the user and displaying the information through the display module;
the storage module is used for storing voice data;
and the voice service module is used for transcribing and translating the voice data of the user and generating a transcribed text and a translated text.
2. The multi-language model-based speech transcription system as claimed in claim 1, wherein: the voice service module is connected with the processing module, and the processing module is connected with the extraction module and is used for processing the voice data sent by the voice service module and sending the data to the extraction module; the extraction module is connected with the voice service module and used for extracting characteristics of the voice data sent by the processing module and sending the voice data to the voice service module.
3. The multi-language model-based speech transcription system as claimed in claim 2, wherein: the processing module is used for carrying out pre-emphasis, framing, windowing and endpoint detection on the voice data sent by the voice service module and sending the processed voice data to the extraction module.
4. A multi-language model-based speech transcription system as claimed in claim 3, characterized in that: the extraction module is used for extracting important relevant information reflecting voice characteristics and removing relatively irrelevant information from the voice data sent by the processing module through a Linear Prediction Cepstrum Coefficient (LPCC) and sending the data to the voice service module.
5. A multi-language model-based speech transcription system according to any one of claims 1 to 4, characterized in that: the system further comprises a voice acquisition module for acquiring voice data of a user and a conversion module for carrying out A/D conversion on the voice data acquired by the voice acquisition module, wherein the voice acquisition module is connected with the conversion module, and the conversion module is connected with the client.
6. A voice transcription method based on a multi-language model is characterized by comprising the following steps:
s1, a user logs in the client to record personal voice data, and the voice data is sent to the platform through the client, and the platform synchronously sends the voice data to the voice service module and the storage module;
s2, during translation, the voice acquisition module acquires user voice data, the user voice data is transmitted to the client through the conversion module, the client transmits the received voice data to the platform, and the platform transmits the data pushed by the client to the voice service module and stores the data in the storage module;
when the voice data sent by the user is consistent with the voice data recorded by other users, the voice service module only transcribes the voice data into texts and sends the transcribed texts to the platform, the platform sends the texts to each client, the transcribed texts are displayed through the display module connected with each client, and meanwhile, the voice information of the user is sent to each client;
when the voice data sent by the user is different from the voice data recorded by the individual user, the voice service module translates and transcribes the voice data and sends the translated text and the transcribed text to the platform, the platform sends the translated text to the individual corresponding client and sends the transcribed text to the client of the original user, the translated text is displayed through the display module connected with the corresponding client, and the transcribed text is displayed through the display module connected with the client of the original user;
and S3, the platform synchronously sends the voice data, the transcription text and the translation text of each user to the storage module for storage.
7. The method of claim 6, wherein the method comprises: in step S2, when multiple users communicate with each other, the voice data of the users are synchronized to the platform, the voice data are translated and transcribed through the voice service module, the translated text is sent to another user client for display, and the transcribed text is sent to the original client for display.
8. The method of claim 6, wherein the method comprises: when a user needs to inquire the communication information, the user logs in the client to input the information, the client sends the translation text and the transcription text of the voice information needing to be inquired to the client of the user, and the transcription text and the translation text are displayed through a display module connected with the client.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110371093.9A CN113160827A (en) | 2021-04-07 | 2021-04-07 | Voice transcription system and method based on multi-language model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110371093.9A CN113160827A (en) | 2021-04-07 | 2021-04-07 | Voice transcription system and method based on multi-language model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113160827A true CN113160827A (en) | 2021-07-23 |
Family
ID=76888535
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110371093.9A Pending CN113160827A (en) | 2021-04-07 | 2021-04-07 | Voice transcription system and method based on multi-language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113160827A (en) |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN202587038U (en) * | 2012-04-11 | 2012-12-05 | Shanghai Cheyin Network Technology Co., Ltd. | Voice data processing platform and system thereof
CN105103151A (en) * | 2013-02-08 | 2015-11-25 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications
CN105408891A (en) * | 2013-06-03 | 2016-03-16 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications
CN106453043A (en) * | 2016-09-29 | 2017-02-22 | Anhui Shengxun Information Technology Co., Ltd. | Instant communication system based on multi-language conversion
CN107229616A (en) * | 2016-03-25 | 2017-10-03 | Alibaba Group Holding Ltd. | Language identification method, apparatus and system
JP2018060165A (en) * | 2016-09-28 | 2018-04-12 | Panasonic Intellectual Property Corporation of America | Voice recognition method, portable terminal, and program
CN108595443A (en) * | 2018-03-30 | 2018-09-28 | Zhejiang Geely Holding Group Co., Ltd. | Simultaneous interpretation method, device, intelligent vehicle-mounted terminal and storage medium
CN110049270A (en) * | 2019-03-12 | 2019-07-23 | Ping An Technology (Shenzhen) Co., Ltd. | Multi-person conference speech transcription method, apparatus, system, equipment and storage medium
CN110335610A (en) * | 2019-07-19 | 2019-10-15 | Beijing Yingke Technology Co., Ltd. | Control method and display for multimedia translation
CN110457717A (en) * | 2019-08-07 | 2019-11-15 | Shenzhen Boyin Technology Co., Ltd. | Remote translation system and method
CN110556094A (en) * | 2019-10-18 | 2019-12-10 | Chongqing Tourism Artificial Intelligence Information Technology Co., Ltd. | Artificial-intelligence simultaneous voice interpretation system for a tour guide machine
CN110689770A (en) * | 2019-08-12 | 2020-01-14 | Hefei Madao Information Technology Co., Ltd. | Online classroom voice transcription and translation system and working method thereof
KR20200090579A (en) * | 2019-01-21 | 2020-07-29 | Hancom Interfree Co., Ltd. | Method and system for interpreting and translating using a smart device
CN111554280A (en) * | 2019-10-23 | 2020-08-18 | Aisheng Technology Co., Ltd. | Real-time interpretation service system mixing artificial-intelligence interpretation with interpretation by expert interpreters
KR20210020448A (en) * | 2019-08-14 | 2021-02-24 | Soundbridge Co., Ltd. | Mobile-cloud-based simultaneous interpretation device and electronic device
CN112447168A (en) * | 2019-09-05 | 2021-03-05 | Alibaba Group Holding Ltd. | Voice recognition system and method, sound box, display device and interaction platform
CN112951236A (en) * | 2021-02-07 | 2021-06-11 | Beijing Youzhuju Network Technology Co., Ltd. | Voice translation equipment and method
- 2021-04-07: CN application CN202110371093.9A filed; published as CN113160827A, status Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111128126B (en) | Multi-language intelligent voice conversation method and system | |
CN110049270B (en) | Multi-person conference voice transcription method, device, system, equipment and storage medium | |
CN107945805B (en) | Intelligent cross-language speech recognition conversion method | |
CN102903361A (en) | Instant call translation system and instant call translation method | |
CN101510424B (en) | Method and system for encoding and synthesizing speech based on speech primitive | |
WO2008084476A2 (en) | Vowel recognition system and method in speech to text applications | |
CN108053823A (en) | Speech recognition system and method | |
CN111477216A (en) | Training method and system for pronunciation understanding model of conversation robot | |
CN110853615B (en) | Data processing method, device and storage medium | |
KR20140121580A (en) | Apparatus and method for automatic translation and interpretation | |
CN106453043A (en) | Multi-language conversion-based instant communication system | |
CN109256133A (en) | Voice interaction method, device, equipment and storage medium | |
CN109714608B (en) | Video data processing method, video data processing device, computer equipment and storage medium | |
CN101876887A (en) | Voice input method and device | |
US20020198716A1 (en) | System and method of improved communication | |
CN110265000A (en) | Method for realizing rapid speech transcription | |
CN113744722A (en) | Off-line speech recognition matching device and method for limited sentence library | |
CN116665674A (en) | Internet intelligent recruitment publishing method based on voice and pre-training model | |
CN109686365B (en) | Voice recognition method and voice recognition system | |
CN111709253B (en) | AI translation method and system for automatically converting dialect into subtitle | |
CN113362801A (en) | Audio synthesis method, system, device and storage medium based on Mel spectrum alignment | |
CN107885736A (en) | Interpretation method and device | |
CN113160827A (en) | Voice transcription system and method based on multi-language model | |
CN102196100A (en) | Instant call translation system and method | |
CN115831125A (en) | Speech recognition method, device, equipment, storage medium and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||