WO2020000764A1

WO2020000764A1 - Hindi-oriented multi-language mixed input method and device

Info

Publication number: WO2020000764A1
Application number: PCT/CN2018/109507
Authority: WO
Inventors: 许晏铭 (Xu Yanming); 吴晓强 (Wu Xiaoqiang)
Original assignee: 北京金山安全软件有限公司 (Beijing Jinshan Security Software Co., Ltd.)
Priority date: 2018-06-29
Filing date: 2018-10-09
Publication date: 2020-01-02
Also published as: CN108897438A

Abstract

A Hindi-oriented multi-language mixed input method and device. The method comprises: acquiring the Latin character sequence of the vocabulary currently being entered through an input method interface; acquiring, according to a first language model, a first candidate character string list in Latin-character form corresponding to the Latin character sequence; acquiring, according to a mapping between the Latin-character spelling and the Hindi-character spelling of Hindi vocabulary, the Hindi-character spelling of each Latin-spelled Hindi word in the first candidate character string list; generating a first candidate word list containing words in both Latin-character and Hindi-character spelling; displaying the first candidate word list on the input method interface; and acquiring a selection operation on a word in the first candidate word list and inputting the selected word as the input word. The method can increase the efficiency of multi-language mixed input and thereby improve the user's input experience.

Description

Hindi-oriented multilingual mixed input method and device

Cross-reference to related applications

This application claims priority to Chinese Patent Application No. “201810713058.9”, filed by Beijing Jinshan Security Software Co., Ltd. on June 29, 2018, and entitled “A Multi-Language Mixed Input Method and Device for Hindi”.

Technical field

The invention relates to the technical field of input methods, and in particular, to a multilingual mixed input method and device for Hindi.

Background

With the increasing frequency of international exchanges, mixed input in two or even more languages has become common. India's two official languages, English and Hindi, are written in the Latin and Devanagari scripts, respectively. Indian users therefore need to mix Latin-script and Hindi-script input.

In the prior art, multilingual mixed input is achieved by switching input modes. For example, when a user typing Latin characters on the English keyboard wants to enter a Hindi character, the user must switch to the Hindi input method to enter it and then switch back to the English keyboard to continue typing Latin characters.

As a result, the user has to switch input modes back and forth, which makes multi-language mixed input inefficient and time-consuming.

Summary of the invention

The present invention provides a Hindi-oriented multilingual mixed input method and device to solve the technical problem in the prior art that achieving multilingual mixed input by switching input modes is inefficient and extremely time-consuming.

An embodiment of one aspect of the present invention provides a multilingual mixed input method for Hindi, including:

Acquiring the Latin character sequence of the vocabulary currently being typed through the input method interface;

Obtaining, according to a first language model, a first candidate character string list in Latin-character form corresponding to the Latin character sequence, where the first language model is a pre-established language model that spells Hindi in Latin characters;

Obtaining a target Hindi vocabulary list according to a pre-established mapping between the Latin-character spelling and the Hindi-character spelling of Hindi vocabulary, where the target Hindi vocabulary list includes the Hindi-character spelling corresponding to each Latin-spelled Hindi word in the first candidate character string list;

Generating, according to the first candidate character string list and the target Hindi vocabulary list, a first candidate word list containing words in both Latin-character and Hindi-character spelling;

Displaying the first candidate word list on an input method interface;

Acquiring a selection operation of a word in the first candidate word list, and inputting the selected word as an input word.

As a first possible implementation manner of the present invention, obtaining the first candidate character string list of Latin character forms corresponding to the Latin character sequence according to the first language model includes:

When the Latin character sequence is the complete Latin-character spelling of a Hindi word, adding that Hindi word to the first candidate character string list; and

Obtaining extended options, which include Hindi words or word fragments whose Latin-character spelling contains the Latin character sequence, and adding the extended options to the first candidate character string list.

As a second possible implementation manner of the present invention, obtaining the first candidate character string list of the Latin character form corresponding to the Latin character sequence according to the first language model further includes:

When the first language model contains no Hindi word whose Latin-character spelling contains the Latin character sequence, obtaining the Latin-spelled Hindi word with the highest similarity to the Latin character sequence and adding it to the first candidate character string list as an extended option.

As a third possible implementation manner of the present invention, after obtaining a selection operation of a vocabulary in the first candidate word list and inputting the selected vocabulary as an input vocabulary, the method further includes:

Predicting a subsequent vocabulary of the input vocabulary according to the language model corresponding to the input vocabulary, and generating a second candidate word list according to the prediction result;

Displaying the second candidate word list on an input method interface;

Acquiring a selection operation of a vocabulary of the second candidate word list, and inputting the selected vocabulary as a next input vocabulary.

As a fourth possible implementation manner of the present invention, predicting a subsequent vocabulary of the input vocabulary according to a language model corresponding to the input vocabulary, and generating a second candidate word list according to the prediction result, including:

Determining whether the spelling form of the input vocabulary is a Latin character or a Hindi character;

When the spelling form of the input vocabulary is Latin characters, predicting subsequent input vocabulary according to the first language model;

When the spelling form of the input vocabulary is Hindi characters, the subsequent input vocabulary is predicted according to a second language model, which is a pre-established language model that spells Hindi in the form of Hindi characters.

As a fifth possible implementation manner of the present invention, the first language model, which spells Hindi in the form of Latin characters and is used to obtain the first candidate character string list corresponding to the Latin character sequence, is pre-established as follows.

The pre-establishment of the first language model includes:

Acquiring corpus data that spells Hindi in Latin characters, and preprocessing the corpus data to remove erroneous and low-frequency entries, thereby obtaining an effective corpus;

Removing redundant parts of the effective corpus to obtain a collated corpus;

Constructing the language model using the collated corpus.
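The corpus-preparation steps above can be sketched in Python as follows. This is a minimal sketch under assumptions not stated in the source: sentences are tokenized on whitespace, "erroneous" is decided by a caller-supplied predicate, a sentence counts as low-frequency if it contains a token seen fewer than `min_count` times, and "redundant parts" means exact duplicate sentences. The function name `prepare_corpus` is hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List


def prepare_corpus(raw_sentences: Iterable[str],
                   is_erroneous: Callable[[str], bool],
                   min_count: int = 2) -> List[List[str]]:
    """Hypothetical helper: raw corpus -> effective corpus -> collated corpus."""
    # Normalize case and tokenize on whitespace (assumed tokenization).
    sentences = [s.lower().split() for s in raw_sentences]
    # Step 1a: drop empty lines and sentences flagged as erroneous.
    sentences = [t for t in sentences if t and not is_erroneous(" ".join(t))]
    # Step 1b: drop sentences containing a low-frequency token. Dropping the
    # whole sentence (not just the token) avoids creating false word pairs.
    counts = Counter(tok for toks in sentences for tok in toks)
    effective = [t for t in sentences if all(counts[tok] >= min_count for tok in t)]
    # Step 2: remove redundancy, here taken to mean exact duplicate sentences,
    # keeping the first occurrence of each.
    seen = set()
    collated: List[List[str]] = []
    for toks in effective:
        key = tuple(toks)
        if key not in seen:
            seen.add(key)
            collated.append(toks)
    return collated
```

The collated corpus returned here is the input to the language-model construction step.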

As a sixth possible implementation manner of the present invention, the constructing a language model using the collated corpus includes:

Using the collated corpus to construct a language model in N-Gram form and calculating its parameters, where the parameters include the words in the language model and, for each N-word sequence, the conditional probability of the N-th word given the preceding N-1 words, N being a positive integer; and

Smoothing the conditional probabilities so that N-word sequences that do not appear in the collated corpus are not assigned a conditional probability of zero.
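A minimal Python sketch of the construction and smoothing steps, assuming N = 2 (a bigram model), a `<s>` sentence-start marker, and add-one (Laplace) smoothing. The source does not name a smoothing method; Laplace is used here only because it is the simplest one that makes every unseen pair non-zero. The class name `BigramModel` is hypothetical.

```python
from collections import Counter
from typing import List


class BigramModel:
    """Hypothetical N-Gram model with N = 2 and add-one (Laplace) smoothing."""

    def __init__(self, corpus: List[List[str]]):
        self.pairs = Counter()    # c(prev, word): counts of adjacent word pairs
        self.context = Counter()  # c(prev): how often prev is followed by a word
        vocab = set()
        for sentence in corpus:
            tokens = ["<s>"] + sentence  # "<s>" marks the sentence start
            vocab.update(sentence)
            self.pairs.update(zip(tokens, tokens[1:]))
            self.context.update(tokens[:-1])
        self.vocab = vocab

    def prob(self, prev: str, word: str) -> float:
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)
        return (self.pairs[(prev, word)] + 1) / (self.context[prev] + len(self.vocab))
```

Because the denominator counts only positions where `prev` is followed by a word, the smoothed probabilities for each context still sum to 1 over the vocabulary. With larger corpora, gentler methods such as Kneser-Ney smoothing are usually preferred.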

In the Hindi-oriented multilingual mixed input method according to the embodiments of the present invention, the Latin character sequence of the vocabulary currently being typed through the input method interface is acquired; a first candidate character string list in Latin-character form corresponding to the sequence is obtained according to the first language model, which is a pre-established language model that spells Hindi in Latin characters; the Hindi-character spelling corresponding to each Latin-spelled Hindi word in that list is obtained according to the pre-established mapping between the Latin-character spelling and the Hindi-character spelling of Hindi vocabulary; a first candidate word list containing words in both Latin-character and Hindi-character spelling is generated from the first candidate character string list and those Hindi-character spellings; the first candidate word list is displayed on the input method interface; and a selection operation on a word in the list is acquired and the selected word is input as the input word. As a result, the user can mix Hindi and Latin input without frequently switching input modes, which improves the efficiency of multi-language mixed input and the user's input experience. In addition, determining the Hindi-character spelling through the mapping relationship improves the accuracy of the output.

An embodiment of another aspect of the present invention provides a multilingual mixed input device for Hindi, including:

An input character acquisition module, configured to acquire the Latin character sequence of the vocabulary currently being typed through the input method interface;

A first candidate character string generating module, configured to obtain, according to a first language model, a first candidate character string list in Latin-character form corresponding to the Latin character sequence, where the first language model is a language model that spells Hindi in Latin characters;

A vocabulary mapping module, configured to obtain a target Hindi vocabulary list according to a pre-established mapping between the Latin-character spelling and the Hindi-character spelling of Hindi vocabulary, where the target Hindi vocabulary list includes the Hindi-character spelling corresponding to each Latin-spelled Hindi word in the first candidate character string list;

A first candidate word list generating module, configured to generate, according to the first candidate character string list and the Hindi-character spellings corresponding to the Latin-spelled Hindi words in it, a first candidate word list containing words in both Latin-character and Hindi-character spelling;

A first candidate word list display module, configured to display the first candidate word list on an input method interface;

A first candidate word input module, configured to acquire a selection operation on a word in the first candidate word list and input the selected word as the input word.

As a first possible implementation manner of the present invention, the first candidate string generating module is specifically configured to:

As a second possible implementation manner of the present invention, the first candidate string generating module is further configured to:

As a third possible implementation manner of the present invention, the device further includes:

A second candidate word list generating module, configured to predict a subsequent vocabulary of the input vocabulary according to the language model corresponding to the input vocabulary, and generate a second candidate word list according to the prediction result;

A second candidate word list display module, configured to display the second candidate word list on an input method interface;

A second candidate word input module is configured to obtain a selection operation of a vocabulary of the second candidate word list, and input the selected vocabulary as a next input vocabulary.

As a fourth possible implementation manner of the present invention, the second candidate word list generating module is specifically configured to:

As a fifth possible implementation manner of the present invention, the device further includes:

A first language model creation module, configured to establish the first language model. The first language model creation module includes:

A corpus acquisition unit, configured to acquire corpus data that spells Hindi in Latin characters and preprocess the corpus data to remove erroneous and low-frequency entries, obtaining a valid corpus;

A corpus deduplication unit, configured to remove redundant parts of the valid corpus to obtain a collated corpus;

A language model construction unit, configured to construct a language model using the collated corpus.

As a sixth possible implementation manner of the present invention, the language model construction unit is specifically configured to:

In the Hindi-oriented multilingual mixed input device according to the embodiments of the present invention, the Latin character sequence of the vocabulary currently being typed through the input method interface is acquired; a first candidate character string list in Latin-character form corresponding to the sequence is obtained according to the first language model, which is a pre-established language model that spells Hindi in Latin characters; the Hindi-character spelling corresponding to each Latin-spelled Hindi word in that list is obtained according to the pre-established mapping between the Latin-character spelling and the Hindi-character spelling of Hindi vocabulary; a first candidate word list containing words in both Latin-character and Hindi-character spelling is generated from the first candidate character string list and those Hindi-character spellings; the first candidate word list is displayed on the input method interface; and a selection operation on a word in the list is acquired so that the selected word is input as the input word. As a result, the user can mix Hindi and Latin input without frequently switching input modes, which improves the efficiency of multi-language mixed input and the user's input experience. In addition, determining the Hindi-character spelling through the mapping relationship improves the accuracy of the output.

An embodiment of a third aspect of the present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the Hindi-oriented multilingual mixed input method proposed in the above embodiments of the present invention.

To achieve the above objective, an embodiment of a fourth aspect of the present invention provides a computer program product; when instructions in the computer program product are executed by a processor, the Hindi-oriented multilingual mixed input method according to the foregoing embodiments of the present invention is implemented.

To achieve the above objective, an embodiment of a fifth aspect of the present invention provides a computing device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the Hindi-oriented multilingual mixed input method according to the above embodiments of the present invention is implemented.

The non-transitory computer-readable storage medium, computer program product and computing device of the third to fifth aspects of the present invention have beneficial effects similar to those of the Hindi-oriented multilingual mixed input method and device of the first and second aspects, which are not repeated here.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and/or additional aspects and advantages of the present invention will become apparent and easily understood from the following description of the embodiments with reference to the accompanying drawings, in which:

FIG. 1 is a schematic flowchart of a Hindi-oriented multilingual mixed input method according to a first embodiment of the present invention;

FIG. 2 is a schematic flowchart of lexical association input in a Hindi-oriented multilingual mixed input method according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of establishing a language model according to an embodiment of the present invention;

FIG. 4 is a structural block diagram of a Hindi-oriented multilingual mixed input device according to an embodiment of the present invention;

FIG. 5 is a structural block diagram of a Hindi-oriented multilingual mixed input device according to an embodiment of the present invention.

Detailed description

Hereinafter, embodiments of the present invention will be described in detail. Examples of the embodiments are shown in the drawings, wherein the same or similar reference numerals indicate the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present invention, but should not be construed as limiting the present invention.

At present, a user's multilingual mixed input needs can be met in the following three ways.

The first way is to switch input modes. For example, when a user typing Latin characters on the English keyboard wants to enter a Hindi character, the user must switch to the Hindi input method to enter it and then switch back to the English keyboard to continue typing Latin characters.

The second way is to enter a temporary input mode through a preset operation, in which the user can type characters of a second language. For example, in Chinese-English input methods, the user can switch the input language by pressing the Shift key.

In the third way, some input methods support two encoding schemes in the language model; that is, the most suitable encoding rule is selected automatically according to the user's input, and the characters are displayed accordingly.

In the first way, mixed-language input is inefficient. In the second way, characters require special processing after the temporary input mode is entered, which lengthens the development cycle. In the third way, when the encodings of the two languages differ only slightly, the output of the language model has low accuracy.

The present invention is mainly directed at the technical problems of low multilingual mixed input efficiency and low output accuracy in the prior art, and proposes a Hindi-oriented multilingual mixed input method.

The Hindi-oriented multilingual mixed input method and device according to the embodiments of the present invention are described in detail below with reference to the drawings. Before the embodiments are described, common technical terms are introduced for ease of understanding:

An N-Gram language model is based on the following assumption: the occurrence of the n-th word depends only on the preceding n-1 words and is independent of any other word. The probability of each word can be obtained from statistics over corpus data.

Assuming that a sentence T consists of the word sequence $w_1, w_2, w_3, \ldots, w_N$, the N-Gram language model can be expressed by the following formula:

$$P(w_N \mid w_1, w_2, \ldots, w_{N-1})$$

The above formula indicates that the probability of the N-th word is conditioned on the words $w_1, w_2, w_3, \ldots, w_{N-1}$ that have appeared before it. The preceding words are thus used to predict the next word, and statistics over a large amount of observed text show how likely each candidate is to follow the existing words. The constructed language model can therefore be an (N-1)-order Markov model, that is, an N-ary (N-Gram) language model. Unlike applications such as machine translation, an input method generally does not need to understand long sentences or predict word order, so N is typically 2, 3 or 4.
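To make the assumption concrete, the chain rule and the usual count-based (maximum-likelihood) estimate can be written as below. This is the textbook formulation, not necessarily the estimator used in this invention; here $n$ denotes the model order, $N$ the sentence length, $C(\cdot)$ a corpus count, and start-of-sentence padding is assumed for $i < n$:

$$P(w_1, \ldots, w_N) = \prod_{i=1}^{N} P(w_i \mid w_1, \ldots, w_{i-1}) \approx \prod_{i=1}^{N} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$$

$$P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) = \frac{C(w_{i-n+1}, \ldots, w_i)}{C(w_{i-n+1}, \ldots, w_{i-1})}$$

For a bigram model ($n = 2$) this reduces to $P(w_i \mid w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1})$. Any pair unseen in the corpus gets probability zero under this estimate, which is why the smoothing step described in the summary is needed.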

FIG. 1 is a schematic flowchart of a Hindi-oriented multilingual mixed input method according to an embodiment of the present invention.

The Hindi-oriented multilingual mixed input method provided by the embodiments of the present invention may be executed by the Hindi-oriented multilingual mixed input device provided by the embodiments of the present invention. The device may be configured in any computing device so that the computing device implements a Hindi-oriented multilingual mixed input function.

The computing device may be a hardware device such as a personal computer (PC), a cloud device, or a mobile device. The mobile device may be a mobile phone, a tablet computer, a personal digital assistant, or a wearable device, having input and/or display hardware.

As shown in FIG. 1, the multilingual mixed input method for Hindi includes the following steps:

Step 101: Acquire the Latin character sequence of the vocabulary currently being typed through the input method interface.

In the embodiment of the present invention, the computing device may be provided with an input method interface, and a user may enter a Latin character sequence through the input method interface. For example, when the computing device is a mobile phone, the user can manually type the Latin character sequence through the touch screen, or when the computing device is a PC, the user can manually type the Latin character sequence through the keyboard.

Optionally, the computing device may be provided with a listener that monitors the user's typing operations. When a typing operation is detected, the Latin character sequence of the vocabulary currently typed on the input method interface can be obtained from that operation. For example, when the user wants to enter "mobile phone", the user can type "mobile" on the input method interface.

Step 102: Obtain a first candidate character string list of Latin character forms corresponding to the Latin character sequence according to the first language model. The first language model is a pre-established language model that spells Hindi in the form of Latin characters.

In the embodiment of the present invention, the first language model is a pre-established language model that spells Hindi in the form of Latin characters. For example, corpus data that spells Hindi in the form of Latin characters can be obtained, and then a language model is constructed based on the corpus data to obtain a first language model.

In the embodiment of the present invention, when a Latin character sequence is acquired, the Latin character sequence may be input to a first language model to obtain a first candidate character string list of the Latin character form corresponding to the Latin character sequence.

Specifically, when the Latin character sequence is the complete Latin-character spelling of a Hindi word, that Hindi word may be added directly to the first candidate character string list. When the Latin character sequence is an incomplete Latin-character spelling of a Hindi word, extended options can be obtained to improve the user's input efficiency or to correct and complete the typed sequence. The extended options include Hindi words or word fragments whose Latin-character spelling contains the Latin character sequence, and they are added to the first candidate character string list.
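The exact-match and extended-option logic can be sketched as follows. This is a minimal sketch, assuming the first language model is reduced to a dictionary of Latin-spelled Hindi words with frequencies, and reading "containing the Latin character sequence" as a prefix match; the function name and the frequencies in the usage test are hypothetical.

```python
from typing import Dict, List


def first_candidate_strings(seq: str, lexicon: Dict[str, int],
                            limit: int = 4) -> List[str]:
    """Hypothetical helper: exact match first, then extended options."""
    seq = seq.lower()
    candidates: List[str] = []
    # Complete spelling: the sequence is itself a Latin-spelled Hindi word.
    if seq in lexicon:
        candidates.append(seq)
    # Extended options: longer words beginning with the sequence (prefix match
    # is an assumed reading of "containing"), most frequent first.
    extensions = sorted((w for w in lexicon if w != seq and w.startswith(seq)),
                        key=lambda w: (-lexicon[w], w))
    candidates.extend(extensions)
    return candidates[:limit]
```

Prefix matching alone returns "mai", "main" and "maine" for the input "Mai"; a candidate such as "Nai" in the example given in this section would need a similarity-based lookup rather than prefix matching.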

Sometimes the user misspells a word, so in some embodiments the input method may also provide an error correction function. That is, obtaining the first candidate character string list in Latin-character form corresponding to the Latin character sequence according to the first language model may further include: when the first language model contains no Latin-spelled Hindi word that contains the Latin character sequence, obtaining the Latin-spelled Hindi word with the highest similarity to the Latin character sequence and adding it to the first candidate character string list as an extended option.
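The source does not say how similarity is measured. One common choice, assumed here, is Levenshtein edit distance; the sketch below returns the lexicon word with the smallest distance, breaking ties alphabetically. Both function names are hypothetical.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with one rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def most_similar_word(seq: str, lexicon: Iterable[str]) -> str:
    """Hypothetical error-correction step: closest Latin-spelled Hindi word."""
    seq = seq.lower()
    return min(lexicon, key=lambda w: (edit_distance(seq, w), w))
```

A production input method would more likely weight candidates by both distance and language-model frequency; the plain minimum is kept here for clarity.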

For example, suppose the user wants to enter the sentence "Main bhi … meri …", whose full Latin-character spelling is "Main bhi nahi meri kahani hai". If the first word the user types is "Mai", the output of the first language model, that is, the extended options, may be: Mai, Nai, Main, Maine.

Step 103: Obtain a target Hindi vocabulary list according to the pre-established mapping between the Latin-character spelling and the Hindi-character spelling of Hindi vocabulary, where the target Hindi vocabulary list may include the Hindi-character spelling corresponding to each Latin-spelled Hindi word in the first candidate character string list.

In the embodiment of the present invention, the mapping between the Latin-character spelling and the Hindi-character spelling of Hindi vocabulary may be established in advance. The Latin-character spelling of Hindi vocabulary takes two forms. The first is a Latin transliteration of the pronunciation of a Hindi-script word; for example, the Latin spelling of the Hindi word … is "dena". "dena" has no meaning in other contexts and is meaningful only when it is typed to obtain the Hindi word …. The second is English words that do not natively exist in Hindi; for example, "mobile" is not a native Hindi word.

Establishing the mapping between the Latin-character spelling and the Hindi-character spelling of Hindi vocabulary, for example between "mobile" and …, ensures that the mapping is one-to-one. Once the first candidate character string list in Latin-character form is determined, the Hindi-character spelling corresponding to each Latin-spelled Hindi word in it can be obtained by querying this mapping, which is simple and easy to implement. Determining the Hindi-character spelling through the pre-established mapping also further improves the accuracy of the output.
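Because the mapping is one-to-one, the lookup reduces to one dictionary query per candidate. A minimal sketch; the table entries below are illustrative examples chosen for this sketch, not taken from the source, and the names `LATIN_TO_HINDI` and `target_hindi_list` are hypothetical.

```python
from typing import Dict, List

# Illustrative Latin-spelling -> Hindi-script entries (not from the source).
LATIN_TO_HINDI: Dict[str, str] = {
    "main": "मैं",
    "bhi": "भी",
    "meri": "मेरी",
}


def target_hindi_list(first_candidates: List[str],
                      mapping: Dict[str, str]) -> List[str]:
    """Map each Latin-spelled candidate to its Hindi-script form, keeping order.

    Candidates without an entry in the mapping are skipped.
    """
    return [mapping[w] for w in first_candidates if w in mapping]
```

A real table would be built offline and would cover both forms described above (transliterated Hindi words and English loanwords).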

Step 104: Generate a first candidate word list of words including Latin character spelling form and Hindi character spelling form according to the first candidate character string list and the target Hindi vocabulary list.

In the embodiment of the present invention, after the Hindi-character spelling corresponding to each Latin-spelled Hindi word in the first candidate character string list is obtained, a first candidate word list containing words in both Latin-character and Hindi-character spelling can be generated from the first candidate character string list and those Hindi-character spellings.

Optionally, the first candidate word list may include all of the Latin-spelled Hindi words in the first candidate character string list together with their corresponding Hindi-character spellings.

Further, because the display area of the computing device is limited, a first number of Latin-spelled Hindi words may be selected from the first candidate character string list and a second number of the corresponding Hindi-character spellings may be selected, after which the first candidate word list is generated from the selected words. The first number and the second number may be the same or different; for example, the first number may be two and the second number three.
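The selection and merge can be sketched as follows, assuming the Latin-spelled candidates are already ranked and that the Latin part is shown before the Hindi-script part; the source fixes neither the order nor any interleaving. `first_number` and `second_number` follow the text (2 and 3 in its example); the function name is hypothetical.

```python
from typing import Dict, List


def first_candidate_word_list(latin_candidates: List[str],
                              mapping: Dict[str, str],
                              first_number: int = 2,
                              second_number: int = 3) -> List[str]:
    """Hypothetical merge of Latin-spelled and Hindi-script candidates."""
    # Up to `first_number` Latin-spelled candidates, in ranking order.
    latin_part = latin_candidates[:first_number]
    # Up to `second_number` Hindi-script forms, taken in the same ranking order.
    hindi_part = [mapping[w] for w in latin_candidates if w in mapping][:second_number]
    return latin_part + hindi_part
```

Capping each part separately keeps both scripts visible even when one script would otherwise fill the limited candidate bar.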

Step 105: Display the first candidate word list on the input method interface.

In the embodiment of the present invention, in order to meet a user's simultaneous input requirement for mixed input of Hindi and Latin, after obtaining the first candidate word list, the first candidate word list may be displayed on the input method interface.

Continuing the above example, when the Latin character sequence typed by the user is "Mai", the first candidate word list displayed on the input method interface may be:

Nai, …, Main, Maine.

Step 106: Acquire a selection operation of a word in the first candidate word list, and input the selected word as an input word.

In the embodiment of the present invention, the selection operation is triggered by the user; it may be, for example, a click, or pressing a number key or the space key on the keyboard, which is not limited herein.

Specifically, after the first candidate word list is displayed on the input method interface, the user may select a word from it as needed. The computing device may be provided with a listener that monitors selection operations triggered by the user; when one is detected, the selected word is determined from the operation and then input as the input word.

Continuing the above example, the user can select "Main" as the input word.

It should be noted that the present invention takes mixed input of Hindi and Latin as an example, but is not limited thereto; those skilled in the art can implement mixed input of any two languages based on the present invention, so the solution is highly extensible.

As a possible implementation, to improve the user's input efficiency, after the selected word is entered as the input word, the word likely to follow it can be predicted, so that the user can enter the next word directly from the prediction result instead of typing it manually. This further improves the efficiency of multilingual mixed input. The process is described in detail below with reference to FIG. 2.

FIG. 2 is a schematic flowchart of lexical association input in a Hindi-oriented multilingual mixed input method according to an embodiment of the present invention.

As shown in FIG. 2, on the basis of the embodiment shown in FIG. 1, after step 106, the Hindi-oriented multilingual mixed input method may further include the following steps:

Step 201: Predict the subsequent vocabulary of the input vocabulary according to the language model corresponding to the input vocabulary, and generate a second candidate word list according to the prediction result.

Specifically, when the spelling form of the input word is Latin characters, the subsequent word can be predicted according to the first language model; when the spelling form of the input word is Hindi characters, the subsequent word is predicted according to the second language model. The second language model is a pre-established language model that spells Hindi in the form of Hindi characters. For example, Hindi corpus data written in Hindi characters can be collected, and a language model can then be constructed from that corpus data to obtain the second language model.
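Choosing between the two models requires deciding which script the input word is written in. A minimal Python sketch, assuming that "Hindi characters" means the Unicode Devanagari block (U+0900 to U+097F) and that any word without Devanagari code points counts as Latin-spelled; the function names are hypothetical.

```python
from typing import TypeVar

Model = TypeVar("Model")

# Unicode block used by Devanagari, the script in which Hindi is written.
DEVANAGARI_FIRST = 0x0900
DEVANAGARI_LAST = 0x097F


def is_hindi_script(word: str) -> bool:
    """True if the word contains at least one Devanagari code point."""
    return any(DEVANAGARI_FIRST <= ord(ch) <= DEVANAGARI_LAST for ch in word)


def model_for(word: str, first_model: Model, second_model: Model) -> Model:
    """Pick the language model matching the spelling form of the input word."""
    # Hindi-script words use the second model; everything else is treated
    # as Latin-spelled and uses the first model.
    return second_model if is_hindi_script(word) else first_model
```

Testing for any Devanagari code point, rather than checking `str.isalpha()`, avoids misclassifying Hindi words whose vowel signs and nasalization marks are combining characters.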

For example, when the input word is "Main", its spelling form is Latin characters, so the subsequent word is predicted according to the first language model. The prediction result may be: bhi, ne, to, nahi, khud, hi.

When the input vocabulary is

It can be known that the spelling form of the input vocabulary is Hindi characters, and the subsequent input vocabulary is predicted according to the second language model. The prediction result can be

In the embodiment of the present invention, the second candidate word list may include all words in the prediction result. Alternatively, because the display area of the computing device is limited, the second candidate word list may include only a third number of words from the prediction result, where the third number is preset.
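The prediction and truncation can be sketched with a bigram (N = 2) model. This Python sketch assumes the corpus is already tokenized into sentences and that followers are ranked by frequency; the helper names and the tie-breaking rule are hypothetical, since the text does not specify how predictions are ordered.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Sequence


def count_bigrams(sentences: Iterable[Sequence[str]]) -> DefaultDict[str, Counter]:
    """Count how often each word follows each other word in the corpus."""
    followers: DefaultDict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return followers


def predict_next(
    followers: DefaultDict[str, Counter], word: str, third_number: int = 6
) -> List[str]:
    """Return up to `third_number` likely next words, most probable first.

    Ranking by raw count gives the same order as ranking by the conditional
    probability P(next | word), because both share the same denominator.
    """
    counts = followers.get(word)  # .get() does not insert into a defaultdict
    if not counts:
        return []
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [nxt for nxt, _ in ranked[:third_number]]
```

Here `third_number` plays the role of the preset third number; passing a large value returns the full prediction result.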

Step 202: Display the second candidate word list on the input method interface.

In the embodiment of the present invention, after the second candidate word list is generated, the second candidate word list may be displayed on the input method interface.

Step 203: Acquire a selection operation on a word in the second candidate word list, and input the selected word as the next input word.

In the embodiment of the present invention, after the second candidate word list is displayed on the input method interface, the user may select a word from it according to actual needs. The computing device may be provided with a listener that monitors selection operations triggered by the user. When such an operation is detected, the selected word is determined from the operation and is then entered as the next input word.

As an application scenario, when a user wishes to efficiently enter a mixed sentence containing both Latin-spelled and Hindi-script words, the Hindi-oriented multilingual mixed input method of the embodiment of the present invention can be used while the user types, to perform error correction, completion, and prediction of the input words.

Suppose the statement the user wants to enter is "Main bhi

meri

". The Latin character spelling of this sentence is "Main bhi nahi meri kahani hai".

1) When the user types "Mai", after error correction by the first language model and a lookup in the mapping relationship, the first candidate word list obtained may be:

Mai,

Nai,

Main, Maine

2) The user can select the vocabulary "Main", and then predict the subsequent input vocabulary according to the first language model. The second candidate word list obtained can be:

bhi, ne, to, nahi, khud, hi

3) The user can select the vocabulary "bhi", and then predict the subsequent input vocabulary based on the first language model, and the obtained second candidate word list can be:

nahi, bhi, to, ho, hai, na

4) The next word the user wants to enter is the Hindi-script spelling of "nahi". The user can type "nahi"; after processing by the first language model and a lookup in the mapping relationship, the first candidate word list obtained may be:

nahi,

mahi,

nani,

5) Users can choose vocabulary

After the user types "meri", the first candidate word list obtained through the first language model and the mapping relationship may be:

meri,

Meri, Mari, mari, mero

6) The user can select "meri". The next word the user wants to enter is the Hindi-script spelling of "kahani", so the user can type "kahani"; after processing by the first language model and a lookup in the mapping relationship, the first candidate word list obtained may be:

kahani,

kahaani, kahaniya, kahaani, kahaniyaan

7) Users can choose vocabulary

Then, according to the second language model, subsequent input words are predicted, and the obtained second candidate word list can be:

8) Users can choose vocabulary

The input is then complete. In this way, the user's input efficiency is effectively improved.

As another application scenario, a user may want to enter a Hindi word written in Hindi characters without knowing its spelling rules, knowing only part of the word's Latin character spelling. For example, the word the user wants to enter is

The Latin character spelling of this word is "Abhishek", and suppose the user only remembers its first part, "Abhis".

1) The user can type "Abhis"; after error correction by the first language model and a lookup in the mapping relationship, the first candidate word list obtained may be:

Abhis, Abhishek,

Abhisek, Abhisar

2) Users can choose vocabulary

The input is then complete. In this way, the user's input efficiency is effectively improved, and the user can finish the word without interrupting the input of the character string.

As a possible implementation manner, refer to FIG. 3, which is a schematic flowchart of establishing a language model according to an embodiment of the present invention. The process of establishing the first language model may include the following steps:

Step 301: Obtain corpus data that spells Hindi in the form of Latin characters, and preprocess the corpus data to remove erroneous and low-frequency corpus entries, obtaining a valid corpus.

In the embodiment of the present invention, corpus data in which Hindi is spelled in Latin characters can be collected in India, and the corpus data is then preprocessed to remove erroneous and low-frequency corpus entries and obtain a valid corpus. For example, the corpus data can undergo preprocessing operations such as removing interfering non-text information, spell checking and correction, data cleaning, data formatting, and selection of high-frequency words, so as to ensure the performance of the first language model after training.
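Two of these preprocessing operations, removing non-text information and keeping only high-frequency words, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: Romanized Hindi is taken to use only the letters a to z, rare tokens are treated as likely typos or noise, and the threshold `min_count` is a hypothetical parameter. Spell-check correction and data formatting are not shown.

```python
import re
from collections import Counter
from typing import Iterable, List

# Keep only runs of Latin letters; digits, punctuation, emoji and other
# non-text noise are dropped. (Assumption: Romanized Hindi uses a-z only.)
TOKEN_PATTERN = re.compile(r"[a-z]+")


def preprocess(lines: Iterable[str], min_count: int = 2) -> List[List[str]]:
    """Tokenize raw lines and drop tokens seen fewer than `min_count` times."""
    tokenized = [TOKEN_PATTERN.findall(line.lower()) for line in lines]
    frequency = Counter(token for tokens in tokenized for token in tokens)
    valid: List[List[str]] = []
    for tokens in tokenized:
        # Rare tokens are treated as likely typos or noise (an assumption).
        kept = [token for token in tokens if frequency[token] >= min_count]
        if kept:
            valid.append(kept)
    return valid
```

For instance, from the lines "Main bhi nahi!", "main bhi 123", and "mian bhi", the misspelling "mian" and the one-off "nahi" fall below the threshold and are dropped, as are the digits and punctuation.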

Step 302: Remove redundant parts in the valid corpus data to obtain a collated corpus.

It should be understood that the valid corpus data obtained often contains a lot of redundant information. Building a language model directly from it would seriously reduce the learning efficiency of the first language model. Therefore, in the present invention, the redundant parts of the valid corpus data can be removed to obtain a collated corpus, which reduces the redundancy of the corpus data and the storage space it occupies and improves the learning efficiency of the first language model.
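The text does not say what counts as a "redundant part". One simple reading, assumed here for illustration, is exact duplicate sentences (for example, the same forwarded message collected many times). The hypothetical `collate` function below removes such duplicates while keeping first-seen order.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def collate(sentences: Iterable[Sequence[str]]) -> List[List[str]]:
    """Drop exact duplicate sentences while keeping first-seen order."""
    seen: Set[Tuple[str, ...]] = set()
    collated: List[List[str]] = []
    for tokens in sentences:
        key = tuple(tokens)  # tuples are hashable, lists are not
        if key not in seen:
            seen.add(key)
            collated.append(list(tokens))
    return collated
```

Deduplication is a design choice: repeated sentences also carry frequency information, so a system might instead cap the number of repeats rather than remove them entirely.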

Step 303: Construct a language model using the collated corpus.

In the embodiment of the present invention, once the collated corpus is obtained, it may be used to construct a language model. When constructing the language model, probabilities can be stored as logarithms so that products of probabilities become sums; this avoids numerical underflow (a product of many small probabilities rounding to zero) and improves the performance of the language model.
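The effect of working in log space can be shown with standard-library Python. The numbers are illustrative only: a sequence of 400 factors of 0.001 stands in for a long product of small conditional probabilities.

```python
import math
from typing import Iterable


def log_score(probabilities: Iterable[float]) -> float:
    """Score a word sequence by summing log-probabilities instead of multiplying."""
    return sum(math.log(p) for p in probabilities)


# 400 factors of 0.001: the direct product underflows to 0.0 in double
# precision, while the log-space score stays finite and comparable.
probs = [1e-3] * 400
direct = math.prod(probs)  # math.prod requires Python 3.8+
in_log_space = log_score(probs)
```

Because the logarithm is monotonic, comparing log-space scores ranks candidate sequences in the same order as comparing the original products would.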

As a possible implementation, since subsequent words are predicted from the language model and the input word, and the occurrence of a subsequent word is assumed to depend only on the words immediately preceding it and not on any other words, the language model can be an N-Gram language model. Step 303 may then specifically include: constructing an N-Gram language model from the collated corpus and calculating its parameters, wherein the parameters of the language model include the vocabulary of the language model and, for each N-word sequence, the conditional probability of the N-th word given the preceding N-1 words, where N is a positive integer.

Assuming the words in the language model are $w_1, w_2, w_3, \ldots, w_N$, the conditional probability of the $N$-th word given the first $N-1$ words is:

$$P(w_N \mid w_1, \ldots, w_{N-1})$$
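The text does not state how this conditional probability is estimated. A common choice, assumed here, is the maximum-likelihood estimate from counts, where $C(\cdot)$ is the number of occurrences of a word sequence in the collated corpus. Combined with the log-space scoring described above, a sentence $u_1, \ldots, u_m$ would then be scored as follows (for $i < N$ the history is truncated at the start of the sentence):

$$P(w_N \mid w_1, \ldots, w_{N-1}) = \frac{C(w_1, \ldots, w_N)}{C(w_1, \ldots, w_{N-1})}$$

$$\log P(u_1, \ldots, u_m) \approx \sum_{i=1}^{m} \log P(u_i \mid u_{i-N+1}, \ldots, u_{i-1})$$

Any N-word sequence that never occurs in the corpus gets $C(w_1, \ldots, w_N) = 0$ under this estimate, which is the sparsity problem that smoothing addresses.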

It should be noted that, assuming the vocabulary of the language model contains 1,000 words, a binary (bigram) language model forms a 1000 × 1000 matrix, and a ternary (trigram) language model forms a 1000 × 1000 × 1000 array. These contain a large number of zero values, that is, they are sparse, so the sparse data needs to be smoothed. In other words, step 303 may further include: smoothing the conditional probability data so that the conditional probability of an N-word sequence that does not appear in the collated corpus is not zero.

Optionally, a data smoothing technique can be used to smooth the conditional probability data: the conditional probabilities of N-word sequences that have appeared in the collated corpus are reduced slightly, and the freed probability mass is assigned to N-word sequences that have not appeared, so that their conditional probabilities are not zero.
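The text does not name a specific smoothing technique. Add-one (Laplace) smoothing is the simplest example of the idea and is used in the Python sketch below for a bigram model; production systems more often use methods such as Katz backoff or Kneser-Ney smoothing. The function name `laplace_bigram` is hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence, Set


def laplace_bigram(
    sentences: Iterable[Sequence[str]], vocabulary: Set[str]
) -> Callable[[str, str], float]:
    """Build an add-one (Laplace) smoothed bigram model P(nxt | prev)."""
    history_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tokens in sentences:
        history_counts.update(tokens[:-1])  # words that have a follower
        pair_counts.update(zip(tokens, tokens[1:]))
    v = len(vocabulary)

    def prob(prev: str, nxt: str) -> float:
        # Adding 1 to every pair count gives unseen pairs a non-zero
        # probability; adding |V| to the denominator keeps each row summing to 1.
        return (pair_counts[(prev, nxt)] + 1) / (history_counts[prev] + v)

    return prob
```

With the two sentences "main bhi" and "main ne" and a three-word vocabulary, the observed pair ("main", "bhi") drops from its unsmoothed estimate of 0.5 to 0.4, and the unseen pair ("main", "main") receives 0.2, which is exactly the redistribution described above.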

In order to implement the above embodiment, the present invention also proposes a multilingual mixed input device oriented to Hindi.

The implementation of the device may include one or more computing devices. The computing device includes a processor and a memory, and the memory stores an application program including computer program instructions executable on the processor. The application program can be divided into a plurality of program modules for corresponding functions of each component of the system. The division of program modules is logical rather than physical. Each program module can run on one or more computing devices, and one computing device can also run one or more program modules. In the following, the device of the present invention is described in detail according to the functional logic division of the program module.

FIG. 4 is a schematic structural diagram of a Hindi-oriented multilingual mixed input device according to an embodiment of the present invention.

The multilingual mixed input device 100 for Hindi may be implemented by a computing device including a processor and a memory. The memory stores program modules executable by the processor; when each program module is executed, it controls the computing device to implement the corresponding functions.

As shown in FIG. 4, the multilingual mixed input device 100 for Hindi includes: an input character acquisition module 101, a first candidate character string generation module 102, a vocabulary mapping module 103, a first candidate word list generation module 104, a first candidate word list display module 105, and a first candidate word input module 106, wherein:

The input character acquisition module 101 is configured to acquire the Latin character sequence of the current input word typed via the input method interface.

The first candidate character string generating module 102 is configured to obtain, according to a first language model, a first candidate character string list in Latin character form corresponding to the Latin character sequence. The first language model is a language model that spells Hindi in the form of Latin characters.

The vocabulary mapping module 103 is configured to obtain a target Hindi vocabulary list according to the mapping relationship between the Latin character spelling form and the Hindi character spelling form of Hindi words. The target Hindi vocabulary list includes the Hindi character spelling forms corresponding to the Hindi words in Latin character spelling form in the first candidate character string list.

The first candidate word list generating module 104 is configured to generate, according to the first candidate character string list and the target Hindi vocabulary list, a first candidate word list containing words in both Latin character spelling form and Hindi character spelling form.

The first candidate word list display module 105 is configured to display the first candidate word list on an input method interface.

The first candidate word input module 106 is configured to obtain a selection operation of a word in the first candidate word list, and input the selected word as an input word.

Further, in a possible implementation manner of the embodiment of the present invention, referring to FIG. 5, based on the embodiment shown in FIG. 4, the Hindi-oriented multilingual mixed input device 100 may further include:

The first candidate string generating module 102 is specifically configured to: when the Latin character sequence is a complete Hindi word in Latin character spelling form, add the Hindi word corresponding to the Latin character sequence to the first candidate character string list; and obtain extended options, each extended option being a Hindi word in Latin character spelling form, or a word fragment, that contains the Latin character sequence, and add the extended options to the first candidate character string list.

The first candidate character string generating module 102 may be further configured to: when the first language model contains no Hindi word in Latin character spelling form that contains the Latin character sequence, obtain the Hindi word in Latin character spelling form with the highest similarity to the Latin character sequence, and add it as an extended option to the first candidate character string list.
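The completion and fallback behavior of module 102 can be sketched in Python. The lexicon here is a hypothetical list of lowercase Latin-spelled Hindi words, assumed to be ordered by frequency, and `difflib.SequenceMatcher`'s ratio (used through `difflib.get_close_matches`) stands in for whatever similarity measure the first language model actually uses.

```python
import difflib
from typing import List, Sequence


def first_candidate_strings(
    sequence: str, lexicon: Sequence[str], limit: int = 4
) -> List[str]:
    """Complete, extend, or correct a typed Latin sequence against a lexicon.

    `lexicon` is a hypothetical list of lowercase Latin-spelled Hindi words,
    assumed to be ordered by frequency.
    """
    typed = sequence.lower()
    containing = [word for word in lexicon if typed in word]
    if not containing:
        # No word contains the sequence: fall back to the single most
        # similar word. difflib's ratio is a stand-in similarity measure.
        return difflib.get_close_matches(typed, lexicon, n=1, cutoff=0.0)
    # A complete word comes first, followed by the extended options.
    exact = [typed] if typed in containing else []
    extended = [word for word in containing if word != typed]
    return (exact + extended)[:limit]
```

For example, with a lexicon containing "abhishek", "abhisek", and "abhisar", typing "Abhis" yields all three as extended options, mirroring the "Abhis" scenario described earlier, while a misspelling such as "nahii" falls back to the closest word, "nahi".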

A second candidate word list generating module 107 is configured to predict a subsequent vocabulary of the input vocabulary according to a language model corresponding to the input vocabulary, and generate a second candidate word list according to the prediction result.

The second candidate word list display module 108 is configured to display the second candidate word list on the input method interface.

The second candidate word input module 109 is configured to obtain a selection operation of a vocabulary in the second candidate word list, and input the selected vocabulary as a next input vocabulary.

As a possible implementation, the second candidate word list generating module 107 is specifically configured to: determine whether the spelling form of the input word is Latin characters or Hindi characters; when it is Latin characters, predict the subsequent word according to the first language model; and when it is Hindi characters, predict the subsequent word according to the second language model, which is a pre-established language model that spells Hindi in the form of Hindi characters.

The first language model creation module 110 is configured to establish a first language model.

As a possible implementation manner, the first language model creation module 110 includes:

A corpus acquisition unit 111 is configured to acquire corpus data spelling Hindi in the form of Latin characters, and preprocess the corpus data to remove the erroneous corpus and low-frequency corpus therein to obtain a valid corpus.

The corpus de-redundancy unit 112 is configured to remove redundant parts of the valid corpus data to obtain a collated corpus.

The language model constructing unit 113 is configured to construct a language model using the collated corpus.

As a possible implementation, the language model constructing unit 113 is specifically configured to: construct an N-Gram language model using the collated corpus and calculate its parameters, wherein the parameters include the vocabulary of the language model and the conditional probability of the N-th word given the preceding N-1 words, N being a positive integer; and smooth the conditional probability data so that the conditional probability of an N-word sequence that does not appear in the collated corpus is not zero.

For the implementation of the functions and roles of the modules in the Hindi-oriented multilingual mixed input device 100 of the present invention, refer to the implementation of the corresponding steps in the above method. Since the device embodiment basically corresponds to the method embodiment, the foregoing explanation of the method embodiment also applies to the device embodiment. To avoid redundancy, not all details are repeated here; for details not covered, refer to the description of the Hindi-oriented multilingual mixed input method given above with reference to FIG. 1 to FIG. 3.

The Hindi-oriented multilingual mixed input device according to the embodiment of the present invention acquires the Latin character sequence of the current input word typed via the input method interface, and obtains, according to a first language model, a first candidate character string list in Latin character form corresponding to that sequence, where the first language model is a pre-established language model that spells Hindi in Latin characters. It then obtains, according to the pre-established mapping relationship between the Latin character spelling form and the Hindi character spelling form of Hindi words, the Hindi character spelling forms corresponding to the Hindi words in Latin character spelling form in the first candidate character string list. From the first candidate character string list and these Hindi character spelling forms, it generates a first candidate word list containing words in both Latin character spelling form and Hindi character spelling form, displays the list on the input method interface, acquires a selection operation on a word in the list, and enters the selected word as the input word. As a result, the user can enter mixed Hindi and Latin text without frequently switching input modes, which improves multilingual mixed input efficiency and the user's input experience. In addition, determining the Hindi character spelling form according to the mapping relationship improves the accuracy of the output.

To implement the above embodiments, the present invention also provides a non-transitory computer-readable storage medium.

The non-transitory computer-readable storage medium according to the embodiment of the present invention stores executable instructions. When the executable instructions are run on a processor, the Hindi-oriented multilingual mixed input method proposed in the foregoing embodiments of the present invention is implemented. The storage medium may be provided on the device as part of it; or, when the device can be remotely controlled by a server, the storage medium may be provided on the remote server that controls the device.

The computer instructions for implementing the method of the present invention may be carried by any combination of one or more computer-readable media. A non-transitory computer-readable medium may include any computer-readable medium other than a transitorily propagating signal itself. The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.

In order to implement the above embodiments, the present invention also provides a computer program product.

In the computer program product according to the embodiment of the present invention, when the instructions in the computer program product are executed by a processor, the multi-language mixed input method for Hindi according to the foregoing embodiment of the present invention is implemented.

Computer program code for performing the operations of the present invention may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar languages. The program code can be executed entirely on the user's computer, partly on the user's computer as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

In order to implement the above embodiments, the present invention also provides a computing device.

A computing device according to an embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the Hindi-oriented multilingual mixed input method according to the foregoing embodiments of the present invention is implemented.

The computing device may be implemented by the central control unit of a computer device, as part of that unit's functions, or by a separate computing device communicatively connected to the central control unit. Implementations of the computing device may include, but are not limited to, a single-chip microcomputer, a programmable logic controller (PLC), a complex programmable logic device (CPLD), a programmable gate array (PGA), a field programmable gate array (FPGA), a dedicated neural network chip, and the like.

Specific implementations of the above storage medium and computing device, and their related parts, can be derived from the corresponding embodiments of the Hindi-oriented multilingual mixed input method or device of the present invention, and they have beneficial effects similar to those of that method or device, which are not repeated here.

The non-transitory computer-readable storage medium, computer program product, and computing device according to the embodiments of the present invention may be implemented with reference to the content described in the foregoing embodiments, and have beneficial effects similar to those of the Hindi-oriented multilingual mixed input method proposed in the foregoing embodiments, which are not repeated here.

It should be noted that, in the description of this specification, a description referring to the terms “one embodiment”, “some embodiments”, “examples”, “specific examples”, or “some examples” means that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where there is no contradiction, those skilled in the art may combine the different embodiments or examples, and the features of the different embodiments or examples, described in this specification.

In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Therefore, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means two or more, such as two or three, unless otherwise specifically defined.

Those of ordinary skill in the art will understand that all or part of the steps of the methods in the foregoing embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In the description of this specification, any process or method description in a flowchart or otherwise described herein may be understood to represent a module, segment, or portion of code that includes one or more executable instructions for implementing a particular logical function or process step. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention pertain.

The logic and/or steps represented in the flowchart or otherwise described herein, for example an ordered list of executable instructions that may be considered to implement logical functions, may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from the instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.

It should be understood that each part of the present invention may be implemented by hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it may be implemented using any one or a combination of the following techniques known in the art: Discrete logic circuits, application specific integrated circuits with suitable combinational logic gate circuits, programmable gate arrays (PGA), field programmable gate arrays (FPGA), etc.

Although the embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims

A multi-language mixed input method for Hindi, which includes:

Acquiring the Latin character sequence of the current input word typed via the input method interface;

Obtaining a first candidate character string list of Latin character forms corresponding to the Latin character sequence according to a first language model, where the first language model is a pre-established language model that spells Hindi in the form of Latin characters;

Obtaining a target Hindi word list according to a pre-established mapping relationship between the Latin character spelling form and the Hindi character spelling form of Hindi words, the target Hindi word list comprising: the Hindi character spelling forms corresponding to the Hindi words in Latin character spelling form in the first candidate character string list;

Generating, according to the first candidate character string list and the target Hindi vocabulary list, a first candidate word list of words including Latin character spelling form and Hindi character spelling form;

Displaying the first candidate word list on an input method interface;

Acquiring a selection operation of a word in the first candidate word list, and inputting the selected word as an input word.
The multi-language mixed input method for Hindi according to claim 1, wherein obtaining, according to the first language model, the first candidate character string list of Latin character forms corresponding to the Latin character sequence comprises:

When the Latin character sequence is a Hindi vocabulary in the form of a complete Latin character spelling, adding the Hindi vocabulary corresponding to the Latin character sequence to the first candidate character string list; and

Obtaining an extended option, the extended option comprising a Hindi word in Latin character spelling form, or a word fragment, that contains the Latin character sequence, and adding the extended option to the first candidate character string list.
The multi-lingual mixed input method for Hindi according to claim 2, wherein obtaining, according to the first language model, the first candidate character string list in Latin character form corresponding to the Latin character sequence further comprises:

When there is no Hindi word in the first language model containing the Latin character spelling form of the Latin character sequence, obtaining a Hindi word in the Latin character spelling form having the highest similarity to the Latin character sequence, and Add it as an extended option to the first candidate string list.
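The claim does not say how similarity is measured. The sketch below assumes Levenshtein edit distance (a smaller distance meaning a higher similarity) and keeps prefix matching for the normal lookup; `levenshtein` and `candidates_with_fallback` are illustrative helpers, not functions defined in the patent.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertion, deletion and substitution each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def candidates_with_fallback(seq: str, lexicon: Iterable[str]) -> List[str]:
    """Prefix matches if any exist; otherwise the single most similar word."""
    words = sorted(lexicon)
    matches = [w for w in words if w.startswith(seq)]
    if matches:
        return matches
    if not words:
        return []
    # min() keeps the first minimum, so ties go to the alphabetically first word.
    return [min(words, key=lambda w: levenshtein(seq, w))]


if __name__ == "__main__":
    print(candidates_with_fallback("namste", {"namaste", "nahin", "naam", "kya"}))
```

A misspelling such as "namste" matches no entry by prefix, so the fallback offers the nearest entry, "namaste", at edit distance 1.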
The multi-language mixed input method for Hindi according to claim 1, wherein, after acquiring the selection operation on a word in the first candidate word list and inputting the selected word as the input word, the method further comprises:

Predicting a subsequent word of the input word according to the language model corresponding to the input word, and generating a second candidate word list according to the prediction result;

Displaying the second candidate word list on the input method interface; and

Acquiring a selection operation on a word in the second candidate word list, and inputting the selected word as the next input word.
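One simple way to realize this prediction step is a bigram table: after a word is committed, the words that most often followed it in the corpus form the second candidate word list. The sketch assumes whitespace-tokenized sentences and ranks successors by raw bigram count; `build_bigrams` and `second_candidate_list` are hypothetical names, and a fuller implementation would rank by the smoothed probabilities of the relevant language model instead.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_bigrams(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Count, for every word, which words follow it and how often."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for words in sentences:
        for prev, nxt in zip(words, words[1:]):
            table[prev][nxt] += 1
    return table


def second_candidate_list(table: Dict[str, Counter], input_word: str, top_k: int = 3) -> List[str]:
    """Most frequent successors of the committed word (ties keep first-seen order)."""
    successors = table.get(input_word, Counter())
    return [w for w, _ in successors.most_common(top_k)]


if __name__ == "__main__":
    corpus = [["aap", "kaise", "hain"], ["aap", "kya", "kar", "rahe", "hain"], ["aap", "kaise", "ho"]]
    table = build_bigrams(corpus)
    print(second_candidate_list(table, "aap"))
```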
The multi-language mixed input method for Hindi according to claim 4, wherein predicting the subsequent word of the input word according to the language model corresponding to the input word and generating the second candidate word list according to the prediction result comprises:

Determining whether the input word is spelled in Latin characters or in Hindi characters;

When the input word is spelled in Latin characters, predicting the subsequent input word according to the first language model; and

When the input word is spelled in Hindi characters, predicting the subsequent input word according to a second language model, wherein the second language model is a pre-established language model of Hindi spelled in Hindi characters.
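Whether a committed word is in Latin or Hindi script can be decided by checking for code points in the Unicode Devanagari block (U+0900 to U+097F), which contains the Hindi letters, vowel signs and virama. The sketch below uses that test to pick between the two language models; the rule "any Devanagari character means Hindi script" and the names `is_hindi_script` and `choose_model` are assumptions for illustration.

```python
from typing import TypeVar

Model = TypeVar("Model")

# Unicode Devanagari block, which covers the characters used to write Hindi.
DEVANAGARI_FIRST, DEVANAGARI_LAST = 0x0900, 0x097F


def is_hindi_script(word: str) -> bool:
    """True if the word contains at least one Devanagari code point."""
    return any(DEVANAGARI_FIRST <= ord(ch) <= DEVANAGARI_LAST for ch in word)


def choose_model(word: str, first_model: Model, second_model: Model) -> Model:
    """Second (Hindi-script) model for Devanagari words, else the first (Latin-script) model."""
    return second_model if is_hindi_script(word) else first_model


if __name__ == "__main__":
    print(choose_model("नाम", "latin-lm", "hindi-lm"))
```

Checking code points rather than `str.isalpha()` matters here, because Devanagari vowel signs are combining marks for which `isalpha()` returns `False`.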
The multi-language mixed input method for Hindi according to claim 1, wherein pre-establishing the first language model, which is the language model of Hindi spelled in Latin characters according to which the first candidate character string list is obtained, comprises:

Acquiring corpus data of Hindi spelled in Latin characters, and preprocessing the corpus data to remove erroneous and low-frequency entries, thereby obtaining an effective corpus;

Removing redundant parts of the effective corpus to obtain a collated corpus; and

Constructing a language model using the collated corpus.
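The claim leaves the meaning of "erroneous", "low-frequency" and "redundant" open. The sketch below assumes that a token is erroneous if it is not purely ASCII letters (romanized Hindi), that a sentence is low-frequency if it contains a word seen fewer than `min_count` times, and that redundancy means exact duplicate sentences. `preprocess`, `deduplicate` and `min_count` are hypothetical names.

```python
import re
from collections import Counter
from typing import List

# Assumption: a valid romanized-Hindi token consists of lowercase ASCII letters only.
VALID_TOKEN = re.compile(r"[a-z]+")


def preprocess(raw_lines: List[str], min_count: int = 2) -> List[List[str]]:
    """Remove erroneous and low-frequency sentences, giving the effective corpus."""
    tokenized = [line.lower().split() for line in raw_lines]
    # Erroneous corpus: empty lines or lines with a token outside [a-z].
    valid = [toks for toks in tokenized
             if toks and all(VALID_TOKEN.fullmatch(t) for t in toks)]
    # Low-frequency corpus: sentences containing a word rarer than min_count.
    freq = Counter(t for toks in valid for t in toks)
    return [toks for toks in valid if all(freq[t] >= min_count for t in toks)]


def deduplicate(sentences: List[List[str]]) -> List[List[str]]:
    """Remove repeated sentences (keeping the first copy), giving the collated corpus."""
    seen = set()
    collated: List[List[str]] = []
    for toks in sentences:
        key = tuple(toks)
        if key not in seen:
            seen.add(key)
            collated.append(toks)
    return collated


if __name__ == "__main__":
    raw = ["aap kaise hain", "Aap kaise hain", "aap 123 hain", "kaise ho", "kaise ho"]
    print(deduplicate(preprocess(raw)))
```

Word frequencies are counted before deduplication here, following the order of the claimed steps; counting after deduplication would be an equally reasonable choice.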
The multi-language mixed input method for Hindi according to claim 6, wherein constructing the language model using the collated corpus comprises:

Constructing an N-gram language model from the collated corpus and calculating its parameters, wherein the parameters comprise the words in the language model and, for each N-gram word sequence, the conditional probability of the Nth word given the preceding N-1 words, N being a positive integer; and

Smoothing the conditional probability data so that the conditional probability of any N-gram word sequence that does not appear in the collated corpus is non-zero.
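The claim requires smoothing but does not name a method. As one concrete possibility, the sketch below builds an N-gram model with add-one (Laplace) smoothing, P(w | context) = (C(context, w) + 1) / (C(context) + V), where V is the number of distinct tokens the model can predict; Kneser-Ney and Katz back-off are common alternatives. The `<s>`/`</s>` sentence-boundary padding and the class name `AddOneNGramLM` are assumptions, not details from the patent.

```python
from collections import Counter
from typing import List, Sequence, Tuple


class AddOneNGramLM:
    """N-gram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[List[str]], n: int = 2) -> None:
        if n < 1:
            raise ValueError("n must be a positive integer")
        self.n = n
        self.ngram_counts: Counter = Counter()    # C(context, word)
        self.context_counts: Counter = Counter()  # C(context)
        vocab = set()
        for toks in sentences:
            padded = ["<s>"] * (n - 1) + toks + ["</s>"]
            vocab.update(padded[n - 1:])  # tokens the model can predict
            for i in range(n - 1, len(padded)):
                context = tuple(padded[i - n + 1:i])  # the preceding n-1 tokens
                self.ngram_counts[context + (padded[i],)] += 1
                self.context_counts[context] += 1
        self.vocab_size = len(vocab)

    def prob(self, context: Sequence[str], word: str) -> float:
        """P(word | last n-1 context words); never zero because of the +1."""
        ctx: Tuple[str, ...] = tuple(context)[-(self.n - 1):] if self.n > 1 else ()
        return (self.ngram_counts[ctx + (word,)] + 1) / (self.context_counts[ctx] + self.vocab_size)


if __name__ == "__main__":
    lm = AddOneNGramLM([["aap", "kaise", "hain"], ["kaise", "ho"]], n=2)
    print(lm.prob(["kaise"], "hain"), lm.prob(["kaise"], "aap"))
```

With `n=2`, the unseen pair ("kaise", "aap") still receives a small positive probability, which is the property the smoothing step of the claim asks for.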
A multi-language mixed input device for Hindi, comprising:

An input character acquisition module, configured to acquire the Latin character sequence of the currently input word typed via an input method interface;

A first candidate character string generating module, configured to obtain, according to a first language model, a first candidate character string list in Latin-character form corresponding to the Latin character sequence, wherein the first language model is a pre-established language model of Hindi spelled in Latin characters;

A vocabulary mapping module, configured to obtain a target Hindi word list according to a pre-established mapping between the Latin-character spelling of Hindi words and their Hindi-character spelling, wherein the target Hindi word list comprises the Hindi-character spellings corresponding to the Latin-spelled Hindi words in the first candidate character string list;

A first candidate word list generating module, configured to generate, according to the first candidate character string list and the target Hindi word list, a first candidate word list comprising words in both Latin-character spelling and Hindi-character spelling;

A first candidate word list display module, configured to display the first candidate word list on the input method interface; and

A first candidate word input module, configured to acquire a selection operation on a word in the first candidate word list and input the selected word as the input word.
The multi-language mixed input device for Hindi according to claim 8, wherein the first candidate character string generating module is specifically configured to:

When the Latin character sequence is the complete Latin-character spelling of a Hindi word, add the Hindi word corresponding to the Latin character sequence to the first candidate character string list; and

Obtain an extended option, the extended option comprising a Hindi word or word fragment whose Latin-character spelling contains the Latin character sequence, and add the extended option to the first candidate character string list.
The multi-language mixed input device for Hindi according to claim 9, wherein the first candidate character string generating module is further configured to:

When the first language model contains no Hindi word whose Latin-character spelling contains the Latin character sequence, obtain the Latin-spelled Hindi word with the highest similarity to the Latin character sequence and add it to the first candidate character string list as an extended option.
The multi-language mixed input device for Hindi according to claim 8, further comprising:

A second candidate word list generating module, configured to predict a subsequent word of the input word according to the language model corresponding to the input word, and to generate a second candidate word list according to the prediction result;

A second candidate word list display module, configured to display the second candidate word list on the input method interface; and

A second candidate word input module, configured to acquire a selection operation on a word in the second candidate word list and input the selected word as the next input word.
The multi-language mixed input device for Hindi according to claim 11, wherein the second candidate word list generating module is specifically configured to:

Determine whether the input word is spelled in Latin characters or in Hindi characters;

When the input word is spelled in Latin characters, predict the subsequent input word according to the first language model; and

When the input word is spelled in Hindi characters, predict the subsequent input word according to a second language model, wherein the second language model is a pre-established language model of Hindi spelled in Hindi characters.
The multi-language mixed input device for Hindi according to claim 8, further comprising a first language model creation module for establishing the first language model, wherein the first language model creation module comprises:

A corpus acquisition unit, configured to acquire corpus data of Hindi spelled in Latin characters and to preprocess the corpus data to remove erroneous and low-frequency entries, thereby obtaining an effective corpus;

A corpus deduplication unit, configured to remove redundant parts of the effective corpus to obtain a collated corpus; and

A language model construction unit, configured to construct a language model using the collated corpus.
The multi-language mixed input device for Hindi according to claim 13, wherein the language model construction unit is specifically configured to:

Construct an N-gram language model from the collated corpus and calculate its parameters, wherein the parameters comprise the words in the language model and, for each N-gram word sequence, the conditional probability of the Nth word given the preceding N-1 words, N being a positive integer; and

Smooth the conditional probability data so that the conditional probability of any N-gram word sequence that does not appear in the collated corpus is non-zero.
A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, the multi-language mixed input method for Hindi according to any one of claims 1-7 is implemented.
A computer program product, characterized in that, when instructions in the computer program product are executed by a processor, the multi-language mixed input method for Hindi according to any one of claims 1-7 is implemented.
A computing device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein, when the processor executes the program, the multi-language mixed input method for Hindi according to any one of claims 1-7 is implemented.