
CN113449514B - Text error correction method and device suitable for vertical field - Google Patents

Info

Publication number
CN113449514B
Authority
CN
China
Prior art keywords
error correction
text
word
model
bert
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110687769.5A
Other languages
Chinese (zh)
Other versions
CN113449514A (en)
Inventor
励建科
陈再蝶
朱晓秋
周杰
樊伟东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangxu Technology Co ltd
Original Assignee
Zhejiang Kangxu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Kangxu Technology Co ltd
Priority to CN202110687769.5A
Publication of CN113449514A
Application granted
Publication of CN113449514B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text error correction method and a text error correction device suitable for the vertical field. The method comprises the following steps: S1, importing a text into a pretrained Bert error correction model and performing word-sense error correction on the text; S2, importing the text corrected by the Bert error correction model into a pinyin error correction model and performing a second error correction; and S3, importing the text corrected a second time by the pinyin error correction model into a hot word replacement rule model and performing a third error correction. The text input by the user is first fed into the Bert error correction model for semantic correction. The corrected text is then imported into the pinyin error correction model for a second correction, so that proper nouns of the vertical field are corrected after the semantic correction, which reinforces the result and improves the accuracy of text error correction. Finally, the text is fed into the hot word replacement rule model for hot word replacement, which converts colloquial wording such as dialect into proper nouns and strengthens the error correction effect once more.

Description

Text error correction method and device suitable for vertical field
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text error correction method and an error correction device thereof, which are applicable to the vertical field.
Background
Natural Language Processing (NLP) is a branch of artificial intelligence dedicated to analysing human language. Modern NLP is a hybrid discipline that draws on linguistics, computer science and machine learning. For an NLP system to respond more accurately to input text, the text needs to be corrected first so as to reduce noise. At present, text error correction mainly relies on semantic analysis to find and replace wrongly written characters, and the text error correction models on the market fall into two main categories: machine learning models and deep learning models.
However, firstly, machine learning models fit the data poorly, so their accuracy is low, while deep learning models need a large amount of accurate corpus and a large amount of training time; in the vertical field, the accuracy of a general-purpose deep model still needs to be improved because of corpus noise;
secondly, the vertical field contains many proper nouns that are used only in that scenario; it is difficult to detect wrongly written words inside proper nouns by semantic error correction alone, and the model may even change correct words into incorrect ones based on its corpus;
finally, because of dialects or personal habits, the same thing may be referred to in several ways; this causes noise that makes it hard for NLP to obtain the correct information, yet these expressions are not strictly wrong, so general error correction rarely reacts to them.
Disclosure of Invention
In order to solve the technical problems mentioned in the background art, a text error correction method and an error correction device thereof suitable for the vertical field are provided.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a text error correction method suitable for the vertical field comprises the following steps:
S1, importing a text into a pretrained Bert error correction model, and performing word-sense error correction on the text;
S11, segmenting the text into short sentences according to punctuation marks;
S12, masking the first word in the short sentence;
S13, predicting the masked word in the short sentence with the pretrained Bert error correction model, and storing all prediction results in list one, in which the prediction results are sorted by prediction score from high to low;
S131, if the masked word is in list one, the masked word is regarded as correct;
S132, if the masked word is not in list one, acquiring, according to pinyin, all common words with the same pronunciation as the masked word and storing them in list two;
S1321, if list one and list two share a word, the masked word is regarded as wrongly written, and the word with the highest prediction score in list one is selected to replace the masked word, thereby correcting the error;
S1322, if list one and list two share no word, the masked word is regarded as correct;
S14, after the first word of the short sentence has been judged, masking the next word in the short sentence and repeating step S13 until all Chinese characters in the text have been detected and corrected;
S2, importing the text corrected by the Bert error correction model into a pinyin error correction model, and performing a second error correction;
S21, converting all of the text corrected by the Bert error correction model into pinyin;
S22, comparing the pinyin of each hot word with the pinyin of the text in turn, in order of word count from small to large;
S23, when the hot word pinyin is completely identical to a part of the text pinyin, replacing the corresponding part of the text with the hot word;
S24, repeating step S22 and step S23 until all hot words have been checked;
S3, importing the text corrected a second time by the pinyin error correction model into a hot word replacement rule model, and performing a third error correction;
S31, importing the text corrected a second time by the pinyin error correction model into the hot word replacement rule model;
and S32, traversing the text with the key list; when a key, namely a word needing error correction, is detected in the text, replacing it with the corresponding value, namely the corresponding correct word; and outputting the finally corrected text.
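The decision logic of steps S12 to S14 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `predict` is a hypothetical stand-in for the pretrained Bert error correction model, `homophones` is a hypothetical same-pinyin lookup table, and the replacement picks the highest-scoring word that appears in both lists, which is only one possible reading of step S1321.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the pretrained Bert error correction model:
# given a short sentence and the index of the masked word, it returns
# candidate words sorted by prediction score, highest first ("list one").
Predictor = Callable[[str, int], List[str]]


def correct_short_sentence(
    sentence: str,
    predict: Predictor,
    homophones: Dict[str, List[str]],
) -> str:
    """Apply steps S12 to S14 to one short sentence, word by word."""
    chars = list(sentence)
    for i, original in enumerate(chars):
        list_one = predict("".join(chars), i)        # S13: predictions
        if original in list_one:                       # S131: regarded as correct
            continue
        list_two = homophones.get(original, [])        # S132: same-pinyin words
        shared = [w for w in list_one if w in list_two]
        if shared:                                     # S1321: wrongly written
            # Assumption: take the best-scoring word found in both lists.
            chars[i] = shared[0]
        # S1322: no shared word, so the original word is kept.
    return "".join(chars)


def toy_predict(sentence: str, index: int) -> List[str]:
    """Toy predictor that always expects the phrase 信用卡 (credit card)."""
    return ["信用卡"[index]]


toy_homophones = {"咔": ["卡", "喀"]}  # hypothetical homophone table
print(correct_short_sentence("信用咔", toy_predict, toy_homophones))
```

In this sketch the sentence is updated in place, so later predictions see earlier corrections; whether the original method does the same is not stated in the source.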
As a further description of the above technical solution:
the text error correction device comprises a pretrained Bert error correction model, a pinyin error correction model and a hot word replacement rule model. The Bert error correction model is a multi-layer bidirectional Transformer encoder. Its input Embedding is the sum of three Embeddings, namely Token Embeddings, Segment Embeddings and Position Embeddings. The Bert error correction model encodes with Multi-Head Attention: three representations, Key, Query and Value, are obtained through dimension expansion of the input Embedding; each of them is divided into multiple Heads; each Head then performs self-attention with the other words to obtain a new vector; the new vectors of all Heads are spliced together, and the final Multi-Head Attention value is obtained through a linear transformation by a weight matrix.
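For reference, the encoding described above corresponds to the standard Transformer formulation, which can be written as follows. The symbols E, W^Q, W^K, W^V, W^O, h and d_k are notation introduced here for illustration, and the scaling by the square root of d_k comes from the standard formulation rather than from this document.

```latex
E = E_{\text{token}} + E_{\text{segment}} + E_{\text{position}}

Q = E W^{Q}, \qquad K = E W^{K}, \qquad V = E W^{V}

\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i,
\qquad i = 1, \dots, h

\mathrm{MultiHead}(E) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}
```

Here Q_i, K_i and V_i denote the i-th slices of Q, K and V along the feature dimension (the division into Heads), and W^O is the weight matrix of the final linear transformation.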
As a further description of the above technical solution:
the pinyin error correction model comprises a database containing the hot words of a given field together with the pinyin and word count of each hot word, the hot words being derived from the proper nouns of that field.
As a further description of the above technical solution:
the hot word replacement rule model comprises a dictionary, wherein words to be corrected are set as keys in the dictionary, corresponding correct words are set as values, and all the keys are stored in a key list.
As a further description of the above technical solution:
the pretrained Bert error correction model is pretrained with two models, namely the Masked language model and Next sentence prediction;
the Masked language model pre-trains the Bert error correction model by randomly masking tokens in the input corpus and having the model predict the masked tokens;
the Next sentence prediction pre-trains the Bert error correction model by inputting a sentence A and a sentence B and having the model judge whether sentence B is the next sentence of sentence A, wherein sentence B has a 50% probability of being the next sentence of sentence A and a 50% probability of being a random sentence from the corpus.
As a further description of the above technical solution:
the corpus includes a corpus of hot words from the vertical field concerned.
In summary, owing to the above technical scheme, the beneficial effects of the invention are as follows. The text input by the user is fed into the Bert error correction model for text error correction, and the corrected text is imported into the pinyin error correction model for a second correction, so that after semantic correction the proper nouns of the vertical field are corrected as reinforcement and the accuracy of text error correction is improved. The text corrected a second time is then fed into the hot word replacement rule model for hot word replacement, which converts colloquial wording such as dialect into proper nouns and strengthens the correction effect once more. With these three error correction stages, the text receives basic semantic correction from its context, and proper nouns of the vertical field, scenario-specific nouns and dialect slang can also be replaced and corrected to a certain degree, which is difficult to achieve with the Bert error correction model alone.
Drawings
Fig. 1 shows a schematic flow chart of a text error correction method suitable for the vertical field according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of the Bert error correction flow of the text error correction method according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the pinyin error correction flow of the text error correction method according to an embodiment of the present invention;
Fig. 4 shows a schematic flow chart of the hot word replacement rule of the text error correction method according to an embodiment of the present invention;
Fig. 5 shows a schematic diagram of the input part of the Bert error correction model of a text error correction device suitable for a specific vertical field according to an embodiment of the present invention;
Fig. 6 shows a schematic flow diagram of Multi-Head Attention in the Bert error correction model of the text error correction device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to fig. 1-6, the present invention provides a technical solution: a text error correction method suitable for the vertical field comprises the following steps:
S1, importing a text into a pretrained Bert error correction model, and performing word-sense error correction on the text;
S11, segmenting the text into short sentences according to punctuation marks;
S12, masking the first word in the short sentence;
S13, predicting the masked word in the short sentence with the pretrained Bert error correction model, and storing all prediction results in list one, in which the prediction results are sorted by prediction score from high to low;
S131, if the masked word is in list one, the masked word is regarded as correct;
S132, if the masked word is not in list one, acquiring, according to pinyin, all common words with the same pronunciation as the masked word and storing them in list two;
S1321, if list one and list two share a word, the masked word is regarded as wrongly written, and the word with the highest prediction score in list one is selected to replace the masked word, thereby correcting the error;
S1322, if list one and list two share no word, the masked word is regarded as correct;
S14, after the first word of the short sentence has been judged, masking the next word in the short sentence and repeating step S13 until all Chinese characters in the text have been detected and corrected;
S2, importing the text corrected by the Bert error correction model into a pinyin error correction model and performing a second error correction to reinforce the result for the vertical field. A narrow scenario contains many proper nouns that are used only in that scenario; the Bert error correction model may fail to find errors in them and may even change originally correct words into wrong ones based on its corpus;
for example, when the text mistakenly writes "long positive bank card" as "long sign bank card", semantic error correction by the Bert error correction model alone may not notice the error. The pinyin error correction model is therefore used as reinforcement: the proper nouns of the narrow scenario, such as the wide variety of card names in the banking field, are stored in a database as hot words together with their pinyin and word counts, for example ["great wall credit card", "chang+cheng+xin+yong+ka", 5];
S21, converting all of the text corrected by the Bert error correction model into pinyin;
S22, comparing the pinyin of each hot word with the pinyin of the text in turn, in order of word count from small to large;
S23, when the hot word pinyin is completely identical to a part of the text pinyin, replacing the corresponding part of the text with the hot word;
S24, repeating step S22 and step S23 until all hot words have been checked;
S3, importing the text corrected a second time by the pinyin error correction model into a hot word replacement rule model and performing a third error correction, which further optimises the result. Because of colloquial speech and dialect, some wording in the text is likely to be ignored by the semantic error correction of the Bert error correction model, and the pinyin error correction model also disregards it because its pronunciation differs greatly from that of the proper noun;
for example, the text we need is "credit", but the input text is "private credit"; for the Bert error correction model the semantics of "private credit" are unproblematic, and ["si+ren+dai", 3] differs markedly from ["ge+dai", 2], so the pinyin error correction does not respond;
as another example, "me" is said in several different ways in Chinese, such as "me", "no" and so on; these are likewise not recognised by the Bert error correction model or the pinyin error correction model, so the hot word replacement rule model is used to correct them by replacing them with the words we need;
S31, importing the text corrected a second time by the pinyin error correction model into the hot word replacement rule model;
and S32, traversing the text with the key list; when a key, namely a word needing error correction, is detected in the text, replacing it with the corresponding value, namely the corresponding correct word; and outputting the finally corrected text.
Referring to fig. 5 and 6, a text error correction device suitable for a specific vertical field includes a pretrained Bert error correction model, a pinyin error correction model and a hot word replacement rule model. The Bert error correction model is a multi-layer bidirectional Transformer encoder. Its input Embedding is the sum of three Embeddings, namely Token Embeddings, Segment Embeddings and Position Embeddings. The Bert error correction model encodes with Multi-Head Attention: three representations, Key, Query and Value, are obtained through dimension expansion of the input Embedding; each of them is divided into multiple Heads; each Head then performs self-attention with the other words to obtain a new vector; the new vectors of all Heads are spliced together, and the final Multi-Head Attention value is obtained through a linear transformation by a weight matrix;
by means of Multi-Head Attention and bidirectional encoding, the Bert error correction model is more effective in unsupervised learning. Because it uses the Transformer, it is more efficient than earlier models, can capture longer-distance dependencies, and captures bidirectional context information in the true sense.
Specifically, the pinyin error correction model includes a database containing the hot words of a given field together with the pinyin and word count of each hot word, the hot words being derived from the proper nouns of that field;
the pinyin error correction model performs a second error correction on the text already corrected by the Bert semantic error correction, with the emphasis on proper nouns of the field concerned. Proper nouns are difficult to detect from context and are therefore likely to be ignored by semantic error correction. With the pinyin error correction model, the proper nouns are set as hot words; when the hot word pinyin is identical to the text pinyin, the corresponding characters are replaced with the hot word, which ensures that proper nouns in the text are correct. The method is also easy to update: adding or deleting proper nouns in the hot word list is all that is needed, which can save a great deal of time in fields with frequent product changes, such as the banking field.
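The pinyin matching of steps S21 to S24 can be illustrated with the following minimal sketch. It rests on several assumptions: the `PINYIN` table is a tiny hypothetical lookup standing in for a full pinyin lexicon, the hot word 长城信用卡 is an assumed rendering of the "great wall credit card" database row, the misspelling 长成信用卡 is invented for the demonstration, and tones are ignored as in the database example.

```python
from typing import Dict, List, Tuple

# Hypothetical character-to-pinyin table, just large enough for the demo.
PINYIN: Dict[str, str] = {
    "长": "chang", "城": "cheng", "成": "cheng",
    "信": "xin", "用": "yong", "卡": "ka", "办": "ban",
}

# Hot word database rows: (hot word, pinyin joined by "+", word count).
HOT_WORDS: List[Tuple[str, str, int]] = [
    ("长城信用卡", "chang+cheng+xin+yong+ka", 5),
]


def to_pinyin(text: str) -> List[str]:
    """S21: convert each character to pinyin; unknown characters map to themselves."""
    return [PINYIN.get(ch, ch) for ch in text]


def pinyin_correct(text: str, hot_words: List[Tuple[str, str, int]] = HOT_WORDS) -> str:
    """S22 to S24: replace spans whose pinyin equals a hot word's pinyin."""
    chars = list(text)
    # S22: visit hot words from the smallest word count to the largest.
    for word, word_pinyin, count in sorted(hot_words, key=lambda row: row[2]):
        target = word_pinyin.split("+")
        syllables = to_pinyin("".join(chars))
        i = 0
        while i + count <= len(chars):
            if syllables[i:i + count] == target:    # S23: pinyin fully identical
                chars[i:i + count] = list(word)     # same length, so indices stay valid
                i += count
            else:
                i += 1
    return "".join(chars)


print(pinyin_correct("办长成信用卡"))
```

A one-to-one character table cannot handle polyphonic characters (长 can also be read "zhang"), so a production system would need a context-aware pinyin conversion.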
Specifically, the hot word replacement rule model includes a dictionary in which words to be corrected are set as keys, corresponding correct words are set as values, and all the keys are stored in a key list.
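A minimal sketch of the dictionary and key list used in steps S31 and S32 might look like this. The rule entry is an assumption reconstructed from the pinyin in the "private credit" example (si+ren+dai mapped to ge+dai); the names `RULES`, `KEY_LIST` and `apply_hotword_rules` are hypothetical.

```python
from typing import Dict, List

# Hypothetical rule dictionary: key = wording to be corrected,
# value = the wording the downstream NLP task expects.
RULES: Dict[str, str] = {
    "私人贷": "个贷",
}
KEY_LIST: List[str] = list(RULES)  # all keys are stored in a key list


def apply_hotword_rules(
    text: str,
    rules: Dict[str, str] = RULES,
    key_list: List[str] = KEY_LIST,
) -> str:
    """S32: traverse the text with the key list and replace every detected key."""
    for key in key_list:
        if key in text:
            text = text.replace(key, rules[key])
    return text


print(apply_hotword_rules("我想办私人贷"))
```

When keys overlap, the order of the key list decides which rule wins; sorting keys longest first is one option, though the source does not specify an order.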
Specifically, the pretrained Bert error correction model is pretrained with two models, namely the Masked language model and Next sentence prediction;
the Masked language model pre-trains the Bert error correction model by randomly masking tokens in the input corpus and having the model predict the masked tokens;
the Next sentence prediction pre-trains the Bert error correction model by inputting a sentence A and a sentence B and having the model judge whether sentence B is the next sentence of sentence A, wherein sentence B has a 50% probability of being the next sentence of sentence A and a 50% probability of being a random sentence from the corpus.
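How training samples for these two pre-training tasks might be built is sketched below. This only illustrates data construction, not model training. The function names are hypothetical, the masking rate is left to the caller because the source does not state one, and the random sentence deliberately excludes the true next sentence, which is an assumption of this sketch.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str], rate: float, rng: random.Random
) -> Tuple[List[str], List[int]]:
    """Randomly replace tokens with [MASK]; return the masked tokens and positions."""
    masked = list(tokens)
    positions = [i for i in range(len(tokens)) if rng.random() < rate]
    for i in positions:
        masked[i] = MASK
    return masked, positions


def make_nsp_pair(
    sentences: List[str], index: int, rng: random.Random
) -> Tuple[str, str, bool]:
    """Return (sentence A, sentence B, is_next) with a 50/50 split.

    `index` must be smaller than len(sentences) - 1 so that a next sentence exists.
    """
    sentence_a = sentences[index]
    if rng.random() < 0.5:
        return sentence_a, sentences[index + 1], True
    # Assumption: the random sentence is drawn from everything except the true next one.
    candidates = [s for j, s in enumerate(sentences) if j != index + 1]
    return sentence_a, rng.choice(candidates), False


rng = random.Random(0)
masked, positions = mask_tokens(list("长城信用卡"), 0.3, rng)
print(masked, positions)
print(make_nsp_pair(["句子一", "句子二", "句子三"], 0, rng))
```

The model would then be trained to recover the original tokens at `positions` and to predict the `is_next` label.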
Specifically, the corpus includes a corpus of hot words from the vertical field concerned. Pre-training needs the support of a large corpus; to improve the recognition ability of the Bert error correction model in the vertical field, the hot word corpus of the corresponding field is added to the corpus for update training. For example, a model for the banking field is update-trained with an added corpus of hot words from the banking vertical field.
The hot word replacement rule model performs a third error correction on the text corrected a second time, to strengthen the correction effect. Different people may refer to the same thing differently, which introduces noise for NLP and affects task efficiency. Strictly speaking, however, these expressions are not wrong words, so semantic error correction and pinyin error correction are likely to ignore them. The different expressions are therefore set as hot words, and when such a hot word appears in the text it is replaced with the word required by NLP, which reduces noise as far as possible. Updating is also simple: only the word needing correction and its corrected counterpart need to be added to the hot word rules;
in this text error correction method, the text input by the user is fed into the Bert error correction model for text error correction, and the corrected text is imported into the pinyin error correction model for a second correction, so that after semantic correction the proper nouns of the vertical field are corrected as reinforcement and the accuracy of text error correction is improved. The text corrected a second time is then fed into the hot word replacement rule model for hot word replacement, which converts colloquial wording such as dialect into proper nouns and strengthens the correction effect once more. With these three error correction stages, the text receives basic semantic correction from its context, and proper nouns of the vertical field, scenario-specific nouns and dialect slang can also be replaced and corrected to a certain degree, which is difficult to achieve with the Bert error correction model alone.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical scheme and inventive concept, shall be covered by the scope of protection of the present invention.

Claims (6)

1. A text error correction method suitable for the vertical field, characterized by comprising the following steps:
S1, importing a text into a pretrained Bert error correction model, and performing word-sense error correction on the text;
S11, segmenting the text into short sentences according to punctuation marks;
S12, masking the first word in the short sentence;
S13, predicting the masked word in the short sentence with the pretrained Bert error correction model, and storing all prediction results in list one, in which the prediction results are sorted by prediction score from high to low;
S131, if the masked word is in list one, the masked word is regarded as correct;
S132, if the masked word is not in list one, acquiring, according to pinyin, all common words with the same pronunciation as the masked word and storing them in list two;
S1321, if list one and list two share a word, the masked word is regarded as wrongly written, and the word with the highest prediction score in list one is selected to replace the masked word, thereby correcting the error;
S1322, if list one and list two share no word, the masked word is regarded as correct;
S14, after the first word of the short sentence has been judged, masking the next word in the short sentence and repeating step S13 until all Chinese characters in the text have been detected and corrected;
S2, importing the text corrected by the Bert error correction model into a pinyin error correction model, and performing a second error correction;
S21, converting all of the text corrected by the Bert error correction model into pinyin;
S22, comparing the pinyin of each hot word with the pinyin of the text in turn, in order of word count from small to large;
S23, when the hot word pinyin is completely identical to a part of the text pinyin, replacing the corresponding part of the text with the hot word;
S24, repeating step S22 and step S23 until all hot words have been checked;
S3, importing the text corrected a second time by the pinyin error correction model into a hot word replacement rule model, and performing a third error correction;
S31, importing the text corrected a second time by the pinyin error correction model into the hot word replacement rule model;
and S32, traversing the text with the key list; when a key, namely a word needing error correction, is detected in the text, replacing it with the corresponding value, namely the corresponding correct word; and outputting the finally corrected text.
2. A text error correction device for implementing the text error correction method suitable for the vertical field as claimed in claim 1, characterized in that the text error correction device comprises a pretrained Bert error correction model, a pinyin error correction model and a hot word replacement rule model; the Bert error correction model is a multi-layer bidirectional Transformer encoder; the Embedding of the Bert error correction model is the sum of three Embeddings, namely Token Embeddings, Segment Embeddings and Position Embeddings; the Bert error correction model encodes with Multi-Head Attention, wherein three representations, Key, Query and Value, are obtained through dimension expansion of the input Embedding, each of them is divided into multiple Heads, each Head performs self-attention with the other words to obtain a new vector, the new vectors of all Heads are spliced, and the final Multi-Head Attention value is obtained through a linear transformation by a weight matrix.
3. The text error correction apparatus of claim 2, wherein the pinyin error correction model includes a database containing hotwords of a domain and corresponding hotword pinyin and word counts, the hotwords of the domain originating from proper nouns of the domain.
4. The text error correction apparatus of claim 2, wherein the hot word replacement rule model includes a dictionary that sets a word to be corrected as a key, sets a corresponding correct word as a value, and stores all keys in a key list.
5. The text error correction apparatus of claim 2, wherein the pretrained Bert error correction model is pretrained with two models, the two models comprising the Masked language model and Next sentence prediction;
the Masked language model pre-trains the Bert error correction model by randomly masking tokens in the input corpus and having the model predict the masked tokens;
the Next sentence prediction pre-trains the Bert error correction model by inputting a sentence A and a sentence B and having the model judge whether sentence B is the next sentence of sentence A, wherein sentence B has a 50% probability of being the next sentence of sentence A and a 50% probability of being a random sentence from the corpus.
6. The text error correction apparatus of claim 5, wherein the corpus comprises a corpus of hot words from the vertical field concerned.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110687769.5A CN113449514B (en) 2021-06-21 2021-06-21 Text error correction method and device suitable for vertical field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110687769.5A CN113449514B (en) 2021-06-21 2021-06-21 Text error correction method and device suitable for vertical field

Publications (2)

Publication Number Publication Date
CN113449514A (en) 2021-09-28
CN113449514B (en) 2023-10-31

Family

ID=77812053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110687769.5A Active CN113449514B (en) 2021-06-21 2021-06-21 Text error correction method and device suitable for vertical field

Country Status (1)

Country Link
CN (1) CN113449514B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817469B (en) * 2022-04-27 2023-08-08 马上消费金融股份有限公司 Text enhancement method, training method and training device for text enhancement model
CN115168565B (en) * 2022-07-07 2023-01-24 北京数美时代科技有限公司 Cold start method, device, equipment and storage medium for vertical domain language model
CN116975298B (en) * 2023-09-22 2023-12-05 厦门智慧思明数据有限公司 NLP-based modernized society governance scheduling system and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020186778A1 (en) * 2019-03-15 2020-09-24 平安科技(深圳)有限公司 Error word correction method and device, computer device, and storage medium
CN112287670A (en) * 2020-11-18 2021-01-29 北京明略软件系统有限公司 Text error correction method, system, computer device and readable storage medium
CN112395861A (en) * 2020-11-18 2021-02-23 平安普惠企业管理有限公司 Method and device for correcting Chinese text and computer equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016310A (en) * 2020-09-03 2020-12-01 平安科技(深圳)有限公司 Text error correction method, system, device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 2-206, No. 1399 Liangmu Road, Cangqian Street, Yuhang District, Hangzhou City, Zhejiang Province, 311100

Patentee after: Kangxu Technology Co.,Ltd.

Country or region after: China

Address before: 310000 2-206, 1399 liangmu Road, Cangqian street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee before: Zhejiang kangxu Technology Co.,Ltd.

Country or region before: China
