CN113160827A - Voice transcription system and method based on multi-language model - Google Patents
Voice transcription system and method based on a multi-language model
- Publication number
- CN113160827A CN113160827A CN202110371093.9A CN202110371093A CN113160827A CN 113160827 A CN113160827 A CN 113160827A CN 202110371093 A CN202110371093 A CN 202110371093A CN 113160827 A CN113160827 A CN 113160827A
- Authority
- CN
- China
- Prior art keywords
- module
- client
- voice
- voice data
- platform
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/12—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
Abstract
The invention provides a voice transcription system and method based on a multi-language model, comprising a platform, a client connected with the platform, a storage module, a voice service module, and a display module connected with the client. The platform receives information sent by the client and the voice service module and forwards it to the client and the voice service module. The client records a user's personal information, sends it to the platform, delivers information sent by the platform to the user, and displays it through the display module. The storage module stores voice data. The voice service module transcribes and translates the user's voice data and generates transcribed text and translated text. The invention avoids the need for a translator to accompany the user at all times and the associated high cost and translation fees, improves working efficiency, and avoids the inconvenience of having an on-site translator in some situations.
Description
Technical Field
The invention relates to the technical field of voice communication, in particular to a voice transcription system and a voice transcription method based on a multi-language model.
Background
According to statistics, there are 5,000 to 6,000 languages in the world; the more widely used ones include English, Chinese, Japanese, French, German, and Russian. With the development of communications and transportation, business and tourism between countries have grown rapidly, international long-distance telephone charges have fallen sharply, and call volume has increased greatly. In 2000, the number of inbound foreign tourists in China exceeded ten million, ranking fifth in the world and first in Asia. Language barriers cause great inconvenience to trade and tourism and hinder their further development. To overcome these barriers, spoken-language translation has become an important tool, and countries with large tourism and investment flows, such as China, require tens of thousands of translators.
However, an on-site translator must accompany the user at all times, which is costly, and translation fees are generally high; the translator's working efficiency is low, mobility is poor, and in some situations having a translator on site is inconvenient.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome these existing defects and provide a voice transcription system and method based on a multi-language model, so as to solve the problems described in the background: an on-site translator must accompany the user at all times, the cost is high, and translation fees are generally high; the translator's working efficiency is low, mobility is poor, and in some situations an on-site translator is inconvenient.
In order to achieve the purpose, the invention provides the following technical scheme: a voice transcription system and method based on a multi-language model comprises a platform, a client connected with the platform, a storage module, a voice service module and a display module connected with the client;
the platform is used for receiving the information sent by the client and the voice service module and sending the information to the client and the voice service module;
the client is used for inputting personal information of a user, sending the personal information to the platform, sending the information sent by the platform to the user and displaying the information through the display module;
the storage module is used for storing voice data;
and the voice service module is used for transcribing and translating the voice data of the user and generating a transcribed text and a translated text.
Preferably, the voice service module is connected to a processing module, and the processing module is connected to an extraction module and is configured to process voice data sent by the voice service module and send the data to the extraction module; the extraction module is connected with the voice service module and used for extracting characteristics of the voice data sent by the processing module and sending the voice data to the voice service module.
Preferably, the processing module is configured to perform pre-emphasis, framing, windowing, and endpoint detection on the voice data sent by the voice service module, and send the processed voice data to the extraction module.
Preferably, the extraction module is used for extracting important relevant information reflecting speech features and removing relatively irrelevant information from the speech data sent by the processing module through the linear prediction cepstrum coefficients LPCC, and sending the data to the speech service module.
Preferably, the system also comprises a voice acquisition module for acquiring voice data of the user and a conversion module for carrying out A/D conversion on the voice data acquired by the voice acquisition module, wherein the voice acquisition module is connected with the conversion module, and the conversion module is connected with the client.
Preferably, the method comprises the following steps:
s1, a user logs in the client to record personal voice data, and the voice data is sent to the platform through the client, and the platform synchronously sends the voice data to the voice service module and the storage module;
s2, during translation, the voice acquisition module acquires user voice data, the user voice data is transmitted to the client through the conversion module, the client transmits the received voice data to the platform, and the platform transmits the data pushed by the client to the voice service module and stores the data in the storage module;
when the voice data sent by the user is consistent with the voice data recorded by other users, the voice service module only transcribes the voice data into texts and sends the transcribed texts to the platform, the platform sends the texts to each client, the transcribed texts are displayed through the display module connected with each client, and meanwhile, the voice information of the user is sent to each client;
when the voice data sent by the user is different from the voice data recorded by the individual user, the voice service module translates and transcribes the voice data and sends the translated text and the transcribed text to the platform, the platform sends the translated text to the individual corresponding client and sends the transcribed text to the client of the original user, the translated text is displayed through the display module connected with the corresponding client, and the transcribed text is displayed through the display module connected with the client of the original user;
and S3, the platform synchronously sends the voice data, the transcription text and the translation text of each user to the storage module for storage.
Preferably, in step S2, when multiple users communicate with each other, the voice data of the users are synchronized to the platform, the voice data are translated and transcribed through the voice service module, the translated text is sent to another user client for display, and the transcribed text is sent to the original client for display.
Preferably, when the user needs to query communication information, the user logs in to the client and enters a query; the client sends the translated text and transcribed text of the requested voice information to the user's client, and the transcribed text and translated text are displayed through the display module connected with the client.
Compared with the prior art, the invention provides a voice transcription system and a method based on a multi-language model, which have the following beneficial effects:
according to the invention, a user logs on a client, and inputs voice data to the client and sends the voice data to the platform, the voice data is transcribed and translated through the voice service module connected with the platform, the transcribed text and the translated text are sent to the client of each corresponding user, and the transcribed text and the translated text are displayed through the display module connected with the client so as to be convenient for converting multiple languages, thereby avoiding the situation that a translator needs to keep up with the client at any time, the cost is high, the translation cost is high, the working efficiency is improved, and the situation that the translator is inconvenient in the field in some occasions is avoided.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention without limiting it:
FIG. 1 is a simplified structural diagram of the voice transcription system and method based on a multi-language model according to the present invention.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments and the accompanying drawings; however, the following embodiments are only preferred embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to fig. 1, a system and method for voice transcription based on a multi-language model includes a platform, a client connected to the platform, a storage module, a voice service module, and a display module connected to the client;
the platform is used for receiving the information sent by the client and the voice service module, sending the information to the client and the voice service module and sending the data to the storage module for storage;
the client is connected with the platform through a network and used for inputting personal information of a user, sending the personal information to the platform, sending the information sent by the platform to the user and displaying the information through the display module, and the display module can display the transcribed text and the translated text so as to be convenient for the user to watch;
the storage module is used for storing voice data and storing data transmitted and received by the platform;
the voice service module is connected with the platform through a network and used for transcribing and translating voice data of the user and generating a transcribed text and a translated text.
The voice service module is connected with the processing module, and the processing module is connected with the extraction module and used for processing the voice data sent by the voice service module and sending the data to the extraction module; the extraction module is connected with the voice service module and used for extracting characteristics of the voice data sent by the processing module and sending the voice data to the voice service module.
The processing module performs pre-emphasis, framing, windowing, and endpoint detection on the voice data sent by the voice service module, and sends the processed voice data to the extraction module. Pre-emphasis, also called high-frequency boosting, compensates for the fact that the high-frequency part of a speech signal is easily attenuated due to effects such as oral and nasal radiation, and is therefore applied during pre-processing before analog-to-digital conversion; its purpose is to boost the high-frequency components and flatten the signal spectrum, facilitating spectrum analysis or vocal-tract parameter analysis. Framing is a common method in speech signal analysis and processing: a speech signal is processed segment by segment, since its characteristics can be regarded as relatively stable over a short time interval (short-time stationarity). The continuous speech signal is thus divided into several relatively independent parts, which makes processing simpler. Windowing is applied after framing so that each frame tapers smoothly at its beginning and end; in practice, rectangular and Hamming window functions are most commonly used. The final pre-processing step is endpoint detection, a technique for locating the start and end points of units such as phonemes, syllables, and words in the speech signal.
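A minimal sketch of this pre-processing chain (pre-emphasis, framing, Hamming windowing, and energy-based endpoint detection) in plain Python might look as follows; the filter coefficient, frame length, hop size, and energy threshold are illustrative values chosen for the example, not parameters specified by the patent:

```python
import math

def pre_emphasis(samples, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: boosts high frequencies, flattens the spectrum
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]

def frame(samples, frame_len=400, hop=160):
    # split the signal into overlapping short-time frames (short-time stationarity)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(one_frame):
    # taper the frame so it starts and ends smoothly
    n = len(one_frame)
    return [one_frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i in range(n)]

def is_speech(one_frame, energy_threshold=0.01):
    # crude energy-based endpoint detection: keep frames above a threshold
    energy = sum(s * s for s in one_frame) / len(one_frame)
    return energy > energy_threshold
```

In a real implementation the endpoint detector would typically combine short-time energy with zero-crossing rate, but the simple threshold above is enough to show where each step sits in the chain.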
The extraction module uses linear prediction cepstral coefficients (LPCC) to extract the important information reflecting speech characteristics from the voice data sent by the processing module, discards relatively irrelevant information, and sends the resulting features to the voice service module.
The system further comprises a voice acquisition module for acquiring the user's voice data and a conversion module for performing A/D conversion on the voice data acquired by the voice acquisition module; the voice acquisition module is connected with the conversion module, and the conversion module is connected with the client.
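The A/D conversion performed by the conversion module can be illustrated as uniform quantization of clipped analog amplitudes to signed 16-bit PCM codes; the 16-bit depth is an assumption made for the example, not a value stated in the patent:

```python
def quantize_16bit(analog_samples):
    # map analog amplitudes in [-1.0, 1.0] to signed 16-bit PCM codes,
    # as a conversion module would after sampling the microphone signal
    pcm = []
    for s in analog_samples:
        s = max(-1.0, min(1.0, s))          # clip to the converter's input range
        pcm.append(int(round(s * 32767)))   # uniform quantization step
    return pcm
```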
S1, a user logs in the client to record personal voice data, and the voice data is sent to the platform through the client, and the platform synchronously sends the voice data to the voice service module and the storage module;
s2, during translation, the voice acquisition module acquires user voice data, the user voice data is transmitted to the client through the conversion module, the client transmits the received voice data to the platform, and the platform transmits the data pushed by the client to the voice service module and stores the data in the storage module;
when the voice data sent by the user is consistent with the voice data recorded by other users, the voice service module only transcribes the voice data into texts and sends the transcribed texts to the platform, the platform sends the texts to each client, the transcribed texts are displayed through the display module connected with each client, and meanwhile, the voice information of the user is sent to each client;
when the voice data sent by the user is different from the voice data recorded by the individual user, the voice service module translates and transcribes the voice data and sends the translated text and the transcribed text to the platform, the platform sends the translated text to the individual corresponding client and sends the transcribed text to the client of the original user, the translated text is displayed through the display module connected with the corresponding client, and the transcribed text is displayed through the display module connected with the client of the original user;
and S3, the platform synchronously sends the voice data, the transcription text and the translation text of each user to the storage module for storage.
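The routing in steps S2 and S3 (same-language listeners receive only the transcribed text, while listeners registered with a different language receive a translation, and everything is stored) can be sketched as follows. The `transcribe` and `translate` callables are hypothetical stand-ins for the voice service module, not an API defined by the patent:

```python
class Platform:
    """Minimal sketch of the platform's dispatch in steps S2 and S3."""

    def __init__(self, transcribe, translate):
        self.transcribe = transcribe      # stand-in for the voice service module
        self.translate = translate
        self.storage = []                 # stand-in for the storage module
        self.clients = {}                 # client id -> registered language

    def register(self, client_id, language):
        # step S1: the user logs in and records personal (language) data
        self.clients[client_id] = language

    def push_voice(self, sender_id, voice_data):
        # step S2: transcribe once, then translate only where languages differ
        sender_lang = self.clients[sender_id]
        transcript = self.transcribe(voice_data, sender_lang)
        deliveries = {}
        for client_id, lang in self.clients.items():
            if lang == sender_lang:
                deliveries[client_id] = transcript          # transcribed text only
            else:
                deliveries[client_id] = self.translate(transcript, sender_lang, lang)
        # step S3: synchronously store voice data and generated text
        self.storage.append((sender_id, voice_data, transcript))
        return deliveries
```

With stub functions in place of real speech services, a Chinese-speaking sender's utterance would reach other Chinese-registered clients as a transcript and an English-registered client as a translation, while one record lands in storage.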
In step S2, when multiple users communicate, the voice data of the users are synchronized to the platform, the voice data are translated and transcribed through the voice service module, the translated text is sent to another user client for display, and the transcribed text is sent to the original client for display.
When a user needs to query communication records, the user logs in to the client and enters a query; the client sends the translated text and transcribed text of the requested voice information to the user's client, and they are displayed through the display module connected with the client. When users speaking different languages log in to their clients, each user's voice data is recorded: the voice acquisition module collects the voice data, the conversion module converts the analog signal into a digital signal and sends it to the client, and the client sends it to the platform. The platform stores the data in the storage module and simultaneously sends it to the voice service module, where the processing module and the extraction module prepare the voice data for transcription and translation; the generated transcribed text and translated text are then sent to the platform. The platform sends the transcribed text to the original user and the translated text to the other users, while also storing the data in the storage module. Each user receives the translation data sent by the platform through the client and can conveniently view the text through the display module. This avoids the need for a translator to accompany the user at all times and the associated high cost and translation fees, improves working efficiency, and avoids the inconvenience of an on-site translator in some situations.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and description above illustrate preferred embodiments of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims and their equivalents.
Claims (8)
1. A voice transcription system based on a multi-language model is characterized by comprising a platform, a client connected with the platform, a storage module, a voice service module and a display module connected with the client;
the platform is used for receiving the information sent by the client and the voice service module and sending the information to the client and the voice service module;
the client is used for inputting personal information of a user, sending the personal information to the platform, sending the information sent by the platform to the user and displaying the information through the display module;
the storage module is used for storing voice data;
and the voice service module is used for transcribing and translating the voice data of the user and generating a transcribed text and a translated text.
2. The multi-language model-based speech transcription system as claimed in claim 1, wherein: the voice service module is connected with the processing module, and the processing module is connected with the extraction module and is used for processing the voice data sent by the voice service module and sending the data to the extraction module; the extraction module is connected with the voice service module and used for extracting characteristics of the voice data sent by the processing module and sending the voice data to the voice service module.
3. The multi-language model-based speech transcription system as claimed in claim 2, wherein: the processing module is used for carrying out pre-emphasis, framing, windowing and endpoint detection on the voice data sent by the voice service module and sending the processed voice data to the extraction module.
4. A multi-language model-based speech transcription system as claimed in claim 3, characterized in that: the extraction module is used for extracting important relevant information reflecting voice characteristics and removing relatively irrelevant information from the voice data sent by the processing module through a Linear Prediction Cepstrum Coefficient (LPCC) and sending the data to the voice service module.
5. A multi-language model-based speech transcription system according to any one of claims 1 to 4, characterized in that: the system further comprises a voice acquisition module for acquiring voice data of a user and a conversion module for carrying out A/D conversion on the voice data acquired by the voice acquisition module, wherein the voice acquisition module is connected with the conversion module, and the conversion module is connected with the client.
6. A voice transcription method based on a multi-language model is characterized by comprising the following steps:
s1, a user logs in the client to record personal voice data, and the voice data is sent to the platform through the client, and the platform synchronously sends the voice data to the voice service module and the storage module;
s2, during translation, the voice acquisition module acquires user voice data, the user voice data is transmitted to the client through the conversion module, the client transmits the received voice data to the platform, and the platform transmits the data pushed by the client to the voice service module and stores the data in the storage module;
when the voice data sent by the user is consistent with the voice data recorded by other users, the voice service module only transcribes the voice data into texts and sends the transcribed texts to the platform, the platform sends the texts to each client, the transcribed texts are displayed through the display module connected with each client, and meanwhile, the voice information of the user is sent to each client;
when the voice data sent by the user is different from the voice data recorded by the individual user, the voice service module translates and transcribes the voice data and sends the translated text and the transcribed text to the platform, the platform sends the translated text to the individual corresponding client and sends the transcribed text to the client of the original user, the translated text is displayed through the display module connected with the corresponding client, and the transcribed text is displayed through the display module connected with the client of the original user;
and S3, the platform synchronously sends the voice data, the transcription text and the translation text of each user to the storage module for storage.
7. The method of claim 6, wherein the method comprises: in step S2, when multiple users communicate with each other, the voice data of the users are synchronized to the platform, the voice data are translated and transcribed through the voice service module, the translated text is sent to another user client for display, and the transcribed text is sent to the original client for display.
8. The method of claim 6, wherein the method comprises: when a user needs to inquire the communication information, the user logs in the client to input the information, the client sends the translation text and the transcription text of the voice information needing to be inquired to the client of the user, and the transcription text and the translation text are displayed through a display module connected with the client.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110371093.9A CN113160827A (en) | 2021-04-07 | 2021-04-07 | Voice transcription system and method based on multi-language model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110371093.9A CN113160827A (en) | 2021-04-07 | 2021-04-07 | Voice transcription system and method based on multi-language model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113160827A true CN113160827A (en) | 2021-07-23 |
Family
ID=76888535
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110371093.9A Pending CN113160827A (en) | 2021-04-07 | 2021-04-07 | Voice transcription system and method based on multi-language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113160827A (en) |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN202587038U (en) * | 2012-04-11 | 2012-12-05 | Shanghai Cheyin Network Technology Co., Ltd. | Voice data processing platform and system thereof
CN105103151A (en) * | 2013-02-08 | 2015-11-25 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications
CN105408891A (en) * | 2013-06-03 | 2016-03-16 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications
CN106453043A (en) * | 2016-09-29 | 2017-02-22 | Anhui Shengxun Information Technology Co., Ltd. | Instant communication system based on multi-language conversion
CN107229616A (en) * | 2016-03-25 | 2017-10-03 | Alibaba Group Holding Ltd. | Language identification method, apparatus and system
JP2018060165A (en) * | 2016-09-28 | 2018-04-12 | Panasonic Intellectual Property Corporation of America | Voice recognition method, portable terminal, and program
CN108595443A (en) * | 2018-03-30 | 2018-09-28 | Zhejiang Geely Holding Group Co., Ltd. | Simultaneous interpretation method, device, intelligent vehicle-mounted terminal and storage medium
CN110049270A (en) * | 2019-03-12 | 2019-07-23 | Ping An Technology (Shenzhen) Co., Ltd. | Multi-person conference speech transcription method, apparatus, system, equipment and storage medium
CN110335610A (en) * | 2019-07-19 | 2019-10-15 | Beijing Yingke Technology Co., Ltd. | Control method and display for multimedia translation
CN110457717A (en) * | 2019-08-07 | 2019-11-15 | Shenzhen Boyin Technology Co., Ltd. | Remote translation system and method
CN110556094A (en) * | 2019-10-18 | 2019-12-10 | Chongqing Tourism Artificial Intelligence Information Technology Co., Ltd. | Artificial-intelligence simultaneous voice interpretation system for a tour guide machine
CN110689770A (en) * | 2019-08-12 | 2020-01-14 | Hefei Madao Information Technology Co., Ltd. | Online classroom voice transcription and translation system and working method thereof
KR20200090579A (en) * | 2019-01-21 | 2020-07-29 | Hancom Interfree Co., Ltd. | Method and system for interpreting and translating using a smart device
CN111554280A (en) * | 2019-10-23 | 2020-08-18 | Aisheng Technology Co., Ltd. | Real-time interpretation service system mixing artificial-intelligence interpretation with interpretation by expert interpreters
KR20210020448A (en) * | 2019-08-14 | 2021-02-24 | Soundbridge Co., Ltd. | Mobile-cloud-based simultaneous interpretation device and electronic device
CN112447168A (en) * | 2019-09-05 | 2021-03-05 | Alibaba Group Holding Ltd. | Voice recognition system and method, sound box, display device and interaction platform
CN112951236A (en) * | 2021-02-07 | 2021-06-11 | Beijing Youzhuju Network Technology Co., Ltd. | Voice translation equipment and method
- 2021-04-07: CN application CN202110371093.9A filed; published as CN113160827A, status Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111128126B (en) | Multi-language intelligent voice conversation method and system | |
CN110049270B (en) | Multi-person conference voice transcription method, device, system, equipment and storage medium | |
CN107945805B (en) | Intelligent cross-language speech recognition conversion method | |
CN102903361A (en) | Instant call translation system and instant call translation method | |
CN101510424B (en) | Method and system for encoding and synthesizing speech based on speech primitive | |
WO2008084476A2 (en) | Vowel recognition system and method in speech to text applications | |
CN108053823A (en) | Speech recognition system and method | |
CN111477216A (en) | Training method and system for pronunciation understanding model of conversation robot | |
CN110853615B (en) | Data processing method, device and storage medium | |
KR20140121580A (en) | Apparatus and method for automatic translation and interpretation | |
CN106453043A (en) | Multi-language conversion-based instant communication system | |
CN109256133A (en) | Voice interaction method, device, equipment and storage medium | |
CN109714608B (en) | Video data processing method, video data processing device, computer equipment and storage medium | |
CN101876887A (en) | Voice input method and device | |
US20020198716A1 (en) | System and method of improved communication | |
CN110265000A (en) | Method for realizing rapid speech transcription | |
CN113744722A (en) | Off-line speech recognition matching device and method for limited sentence library | |
CN116665674A (en) | Internet intelligent recruitment publishing method based on voice and pre-training model | |
CN109686365B (en) | Voice recognition method and voice recognition system | |
CN111709253B (en) | AI translation method and system for automatically converting dialect into subtitle | |
CN113362801A (en) | Audio synthesis method, system, device and storage medium based on Mel spectrum alignment | |
CN107885736A (en) | Interpretation method and device | |
CN113160827A (en) | Voice transcription system and method based on multi-language model | |
CN102196100A (en) | Instant call translation system and method | |
CN115831125A (en) | Speech recognition method, device, equipment, storage medium and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||