CN105279149A - Chinese text automatic correction method - Google Patents
Chinese text automatic correction method
- Publication number
- CN105279149A
- Authority
- CN
- China
- Prior art keywords
- word
- text
- error
- chinese text
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
Abstract
The invention discloses a Chinese text automatic correction method. The method comprises the following steps of: a) inputting a to-be-corrected Chinese text, and performing word segmentation preprocessing on the Chinese text sentence by sentence; b) searching for one-character words, two-character words or disperse strings of three or more than three characters occurring in the text subjected to word segmentation sentence by sentence; c) performing continuous determination on the disperse strings occurring in the text subjected to word segmentation by adopting an N-gram model, and checking text word level errors for each single sentence in combination with a word forming probability of separate characters; and d) constructing an error correction knowledge base to generate an error correction candidate text. According to the Chinese text automatic correction method provided by the invention, the one-character words, two-character words or disperse strings of three or more than three characters occurring in the text subjected to word segmentation are searched for sentence by sentence, the disperse strings occurring in the text subjected to word segmentation are subjected to continuous determination by adopting the N-gram model to determine identification errors, and the error correction knowledge base is constructed to generate the error correction candidate text, so that error checking and correcting processes are combined very well, and the method has the characteristics of high error checking speed and high error correcting efficiency.
Description
Technical field
The present invention relates to a text correction method, and in particular to a Chinese text automatic correction method.
Background art
With the rapid development of modern laser phototypesetting technology and the electronic publishing industry, ensuring that the information conveyed is correct has become an important research topic. People now use computers for writing, editing and typesetting, and errors inevitably appear in the resulting text, such as extra characters, missing characters, transposed characters, misspelled English words and non-standard punctuation. A dedicated proofreading system is therefore needed to proofread manuscripts. In the long run, informatization is the trend of future social development, and the volume of electronic information and manuscripts keeps growing. Traditional manual proofreading requires a proofreader to read and check the text word by word and sentence by sentence, and it cannot keep up with the rapid growth of electronic text in terms of either cost or efficiency. The demand for an automatic proofreading system with high accuracy and high efficiency is therefore increasingly urgent.
Automatic proofreading has significant practical value and a wide range of applications. In publishing, automatic text proofreading can greatly reduce the workload of staff, free them from tedious work, speed up the publishing cycle and promote the rapid development of the whole publishing industry. In text recognition, error detection and error correction techniques are needed to revise the results of speech recognition, OCR text recognition and the like. In text editing, many text editing systems such as Word provide automatic error detection that flags errors in the input text. In human-machine interfaces, such as database query and natural language interfaces, a certain degree of fault tolerance is required. Systems such as computer-aided teaching need to analyze input sentences, find the errors in them and offer possible correct answers.
Automatic proofreading also has considerable theoretical significance. As a discipline, it belongs to natural language understanding and involves many of its basic areas, such as automatic word segmentation, part-of-speech tagging and syntactic analysis, so it is a research topic of real academic value. Research on natural language processing has now reached the stage of processing large-scale real text, and real text may contain errors. Automatic proofreading technology studies how to find and handle these errors, so its development should improve the fault tolerance of other natural language processing tasks and further promote natural language processing research as a whole.
Summary of the invention
The technical problem to be solved by the present invention is to provide a Chinese text automatic correction method that can automatically analyze electronic text, find and mark errors, and correct them, combining the error detection and error correction processes well, with fast error detection and high error correction efficiency.
To solve the above technical problem, the present invention provides a Chinese text automatic correction method comprising the following steps: a) inputting the Chinese text to be proofread and performing word segmentation preprocessing on it sentence by sentence; b) searching, sentence by sentence, for one-character words, two-character words, or disperse strings of three or more characters in the segmented text; c) using an N-gram model to make successive judgments on the disperse strings found in the segmented text, and checking each sentence for word-level errors in combination with the word-forming probability of individual characters; d) constructing an error correction knowledge base and generating error correction candidate text.
In the above Chinese text automatic correction method, step a) obtains the Chinese text to be proofread by voice or keyboard input, and the preprocessing comprises sorting out grammatical errors in the Chinese text to be proofread and checking the input by pattern matching.
In the above Chinese text automatic correction method, the voice input of the Chinese text to be proofread in step a) proceeds as follows: voice input from a microphone is received and converted into a voice stream that the computer can accept, pattern matching is performed on the voice stream to generate candidate character and word combinations, and a language model is used to identify the candidate combinations.
In the above Chinese text automatic correction method, the keyboard input of the Chinese text to be proofread in step a) proceeds as follows: characters and words are encoded in advance, keystroke signals are converted into a code sequence that the computer accepts, and the code sequence is associated with the character and word encoding scheme.
In the above Chinese text automatic correction method, step c) judges disperse strings of three or more characters as follows: the probability that each character in the disperse string forms a word on its own is determined, giving a first error constant; a bigram adjacency model is applied to determine, in turn, the probability that two adjacent characters form a word, giving a second error constant; a trigram adjacency model is applied to determine, in turn, the probability that three adjacent characters form a word, giving a third error constant; and all the error constants are added to give the final word-level error coefficient of the text.
In the above Chinese text automatic correction method, step c) judges a disperse string of four consecutive characters $W_k W_{k+1} W_{k+2} W_{k+3}$ as follows:
c1) For each of $W_k$, $W_{k+1}$, $W_{k+2}$ and $W_{k+3}$, determine the probability that the character forms a word on its own; if the probability $P$ that a character occurs alone is $P = 0$, an error is present at that position and the error constant is updated as $K_1 \leftarrow K_1 + 1.5$;
c2) Taking $W_{k-2}$ as the start position and $W_{k+4}$ as the end position, apply the bigram word adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion; if $R = 0$, then $K_2 \leftarrow K_2 + 0.2$; if $R \ge 1$, then $K_2 \leftarrow K_2 - 1.0$;
c3) Taking $W_{k-1}$ as the start position and $W_{k+4}$ as the end position, apply the bigram word adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion; if $R = 0$, then $K_3 \leftarrow K_3 + 0.5$; if $1 < R < 2$, then $K_3 \leftarrow K_3 + 0.2$; if $R \ge 2$, then $K_3 \leftarrow K_3 - 1.0$;
c4) Taking the first of the two characters preceding $W_k$ as the start position and the second character following $W_{k+3}$ as the end position, apply the trigram character model, using the co-occurrence frequency $R$ of three consecutive characters as the criterion; if $R = 0$, then $K_4 \leftarrow K_4 + 0.2$; if $R \ge 1$, then $K_4 \leftarrow K_4 - 1.0$;
c5) Taking the character preceding $W_k$ as the start position and the character following $W_{k+3}$ as the end position, apply the bigram character model, using the co-occurrence frequency $R$ of two consecutive characters as the criterion; if $R = 0$, then $K_5 \leftarrow K_5 + 0.8$; if $1 < R < 3$, then $K_5 \leftarrow K_5 + 0.5$; if $R \ge 3$, then $K_5 \leftarrow K_5 - 1.0$;
c6) For a given character under test, add the resulting error constants, i.e. $K = K_1 + K_2 + K_3 + K_4 + K_5$; if $K \ge 1.5$, an error is present at that position and the erroneous text is marked.
In the above Chinese text automatic correction method, step d) sorts the generated error correction candidate texts as follows: each candidate is substituted for the original erroneous text, steps b) and c) are repeated on the resulting sentence to run error detection again and obtain the corresponding error constant, and the candidates are sorted by the size of their error constants.
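The sorting procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `rank_candidates`, the scoring callback `error_constant` and the toy lexicon are all assumptions introduced here, and the toy scorer merely stands in for a real run of steps b) and c). Candidates are sorted in ascending order, on the reading that a smaller error constant indicates a more plausible correction.

```python
from typing import Callable, List, Tuple


def rank_candidates(
    sentence: str,
    start: int,
    end: int,
    candidates: List[str],
    error_constant: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Rank candidates for sentence[start:end] by the re-checked error constant."""
    scored = []
    for cand in candidates:
        # Substitute the candidate for the flagged span.
        replaced = sentence[:start] + cand + sentence[end:]
        # Re-run error detection on the substituted sentence (stand-in for steps b and c).
        scored.append((cand, error_constant(replaced)))
    # Smallest error constant first; sorted() is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy scorer: counts characters that are missing from a tiny lexicon.
TOY_LEXICON = set("我们去北京")


def toy_error_constant(text: str) -> float:
    return float(sum(1 for ch in text if ch not in TOY_LEXICON))


ranked = rank_candidates("我们去北惊", 4, 5, ["晶", "京", "惊"], toy_error_constant)
print(ranked[0][0])  # best-ranked candidate under the toy scorer
```

Passing the scorer in as a callback keeps the ranking independent of how error detection is actually implemented, which mirrors the way the text reuses steps b) and c) unchanged.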
In the above Chinese text automatic correction method, step d) constructs the various error correction knowledge bases based on the error characteristics of text and a likelihood matching method; the error correction knowledge bases comprise a wrongly-written-character dictionary, a confusable-word dictionary, a similar-code dictionary and/or a character-driven bidirectional dictionary.
Compared with the prior art, the present invention has the following beneficial effects: the Chinese text automatic correction method provided by the invention searches, sentence by sentence, for one-character words, two-character words, or disperse strings of three or more characters in the segmented text, uses an N-gram model to make successive judgments on the disperse strings found in the segmented text in order to identify errors, and constructs an error correction knowledge base to generate error correction candidate text, so that the error detection and error correction processes are combined well, with fast error detection and high error correction efficiency.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the Chinese text automatic correction of the present invention;
Fig. 2 is a schematic diagram of the preprocessing applied to the Chinese text to be corrected;
Fig. 3 is a schematic diagram of obtaining the Chinese text to be corrected by keyboard input;
Fig. 4 is a schematic diagram of obtaining the Chinese text to be corrected by voice input;
Fig. 5 is a schematic diagram of the knowledge-base-based recognition of Chinese characters from voice signals;
Fig. 6 is a detailed flow chart of the Chinese text automatic error correction of the present invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
Fig. 1 is a schematic flow chart of the Chinese text automatic correction of the present invention.
Referring to Fig. 1, the Chinese text automatic correction method provided by the invention comprises the following steps:
A) Input the Chinese text to be proofread and perform word segmentation preprocessing on it sentence by sentence. The text to be proofread may be entered by voice or by keyboard, and the preprocessing comprises sorting out grammatical errors in the text and checking the input by pattern matching. The keyboard input process is shown in Fig. 3: characters and words are encoded in advance, keystroke signals are converted into a code sequence that the computer accepts, and the code sequence is associated with the character and word encoding scheme. The voice input process is shown in Fig. 4 and Fig. 5: voice input from a microphone is received and converted into a voice stream that the computer can accept, pattern matching is performed on the voice stream to generate candidate character and word combinations, and a language model is used to identify the candidate combinations.
B) Search, sentence by sentence, for one-character words, two-character words, or disperse strings of three or more characters in the segmented text.
C) Use an N-gram model to make successive judgments on the disperse strings found in the segmented text, and check each sentence for word-level errors in combination with the word-forming probability of individual characters. Disperse strings of three or more characters are judged as follows: the probability that each character in the disperse string forms a word on its own is determined, giving a first error constant; a bigram adjacency model is applied to determine, in turn, the probability that two adjacent characters form a word, giving a second error constant; a trigram adjacency model is applied to determine, in turn, the probability that three adjacent characters form a word, giving a third error constant; and all the error constants are added to give the final word-level error coefficient of the text. The N-gram model is a language model commonly used in large-vocabulary continuous speech recognition; for Chinese it is referred to as the Chinese Language Model (CLM).
D) Construct an error correction knowledge base and generate error correction candidate text. Specifically, the various error correction knowledge bases may be constructed based on the error characteristics of text and a likelihood matching method; they comprise a wrongly-written-character dictionary, a confusable-word dictionary, a similar-code dictionary and/or a character-driven bidirectional dictionary. To make selection easier for the user, the invention may also sort the generated error correction candidate texts as follows: each candidate is substituted for the original erroneous text, steps b) and c) are repeated on the resulting sentence to run error detection again and obtain the corresponding error constant, and the candidates are sorted by the size of their error constants.
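Step B) can be illustrated with a short Python sketch. It rests on one assumption that the text does not spell out: a disperse string is taken to be a maximal run of consecutive single-character tokens in the segmenter output. The function name `find_disperse_strings` and the `min_len` parameter are hypothetical, and the token list stands in for the output of a real word segmenter.

```python
from typing import List, Tuple


def find_disperse_strings(tokens: List[str], min_len: int = 1) -> List[Tuple[int, str]]:
    """Return (character offset, string) for each maximal run of one-character tokens."""
    runs: List[Tuple[int, str]] = []
    offset = 0            # character offset of the current token within the sentence
    run_start = None      # offset at which the current run began, or None
    run_chars: List[str] = []
    for tok in tokens:
        if len(tok) == 1:
            if run_start is None:
                run_start = offset
            run_chars.append(tok)
        else:
            # A multi-character token closes any open run.
            if run_start is not None and len(run_chars) >= min_len:
                runs.append((run_start, "".join(run_chars)))
            run_start, run_chars = None, []
        offset += len(tok)
    # Close a run that reaches the end of the sentence.
    if run_start is not None and len(run_chars) >= min_len:
        runs.append((run_start, "".join(run_chars)))
    return runs


# Hypothetical segmenter output for one sentence.
tokens = ["我们", "去", "北", "惊", "旅游", "了"]
print(find_disperse_strings(tokens))             # all runs, including isolated characters
print(find_disperse_strings(tokens, min_len=3))  # only runs of three or more characters
```

Punctuation marks are also single-character tokens, so a practical version would probably filter them out before looking for runs.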
Referring further to Fig. 6, a specific embodiment is given below. The steps are as follows:
Step1: Input the text to be proofread and perform word segmentation preprocessing on it using the Peking University word segmentation software;
Step2: Search for one-character words, two-character words, or disperse strings of three or more characters in the segmented text, and treat all such places as possible sources of error. Suppose $W_k W_{k+1} W_{k+2} W_{k+3}$ is found in the text as a disperse string of four consecutive characters; this place is then treated as a source of error and checked, where $k$ is a natural number giving the position of the string in the sentence.
Step3: For each of $W_k$, $W_{k+1}$, $W_{k+2}$ and $W_{k+3}$, determine the probability that the character forms a word on its own. If the probability $P$ that a character occurs alone is $P = 0$, an error is present at that position and the error constant is updated as $K_1 \leftarrow K_1 + 1.5$.
Step4: Taking $W_{k-2}$ as the start position and $W_{k+4}$ as the end position, apply the bigram word adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion. If $R = 0$, then $K_2 \leftarrow K_2 + 0.2$; if $R \ge 1$, then $K_2 \leftarrow K_2 - 1.0$.
Step5: Taking $W_{k-1}$ as the start position and $W_{k+4}$ as the end position, apply the bigram word adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion. If $R = 0$, then $K_3 \leftarrow K_3 + 0.5$; if $1 < R < 2$, then $K_3 \leftarrow K_3 + 0.2$; if $R \ge 2$, then $K_3 \leftarrow K_3 - 1.0$.
Step6: Taking the first of the two characters preceding $W_k$ as the start position and the second character following $W_{k+3}$ as the end position, apply the trigram character model, using the co-occurrence frequency $R$ of three consecutive characters as the criterion. If $R = 0$, then $K_4 \leftarrow K_4 + 0.2$; if $R \ge 1$, then $K_4 \leftarrow K_4 - 1.0$.
Step7: Taking the character preceding $W_k$ as the start position and the character following $W_{k+3}$ as the end position, apply the bigram character model, using the co-occurrence frequency $R$ of two consecutive characters as the criterion. If $R = 0$, then $K_5 \leftarrow K_5 + 0.8$; if $1 < R < 3$, then $K_5 \leftarrow K_5 + 0.5$; if $R \ge 3$, then $K_5 \leftarrow K_5 - 1.0$.
Step8: For a given character under test, add the error constants obtained from each module, i.e. $K = K_1 + K_2 + K_3 + K_4 + K_5$. If $K \ge 1.5$, an error is present at that position and the erroneous text is marked.
Step9: End.
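The accumulation in Step3 to Step8 can be sketched as follows. This is an illustration under several assumptions, not a definitive implementation: each constant starts at zero, the statistics for one character under test have already been looked up (the `CharStats` container and its field names are hypothetical), a single frequency $R$ per step is used for that character, and Step4 is assumed to adjust $K_2$ so that it matches the sum in Step8. The threshold bands are copied literally from the steps above.

```python
from dataclasses import dataclass


@dataclass
class CharStats:
    """Pre-computed statistics for one character under test (hypothetical container)."""
    p_alone: float  # probability that the character forms a word on its own (Step3)
    r_step4: float  # word-bigram co-occurrence frequency, window from W_{k-2} to W_{k+4}
    r_step5: float  # word-bigram co-occurrence frequency, window from W_{k-1} to W_{k+4}
    r_step6: float  # character-trigram co-occurrence frequency
    r_step7: float  # character-bigram co-occurrence frequency


def error_constant(s: CharStats) -> float:
    """Sum K1..K5 for one character, using the bands stated in Step3 to Step7."""
    k1 = 1.5 if s.p_alone == 0 else 0.0

    # Step4, assumed to adjust K2.
    k2 = 0.2 if s.r_step4 == 0 else (-1.0 if s.r_step4 >= 1 else 0.0)

    # Step5. With integer counts the band 1 < R < 2 is empty, so R = 1 leaves K3 at 0.
    if s.r_step5 == 0:
        k3 = 0.5
    elif 1 < s.r_step5 < 2:
        k3 = 0.2
    elif s.r_step5 >= 2:
        k3 = -1.0
    else:
        k3 = 0.0

    # Step6.
    k4 = 0.2 if s.r_step6 == 0 else (-1.0 if s.r_step6 >= 1 else 0.0)

    # Step7. As stated, R = 1 falls outside every band and leaves K5 at 0.
    if s.r_step7 == 0:
        k5 = 0.8
    elif 1 < s.r_step7 < 3:
        k5 = 0.5
    elif s.r_step7 >= 3:
        k5 = -1.0
    else:
        k5 = 0.0

    return k1 + k2 + k3 + k4 + k5


def is_error(s: CharStats, threshold: float = 1.5) -> bool:
    """Step8: flag the position when the summed constant reaches the threshold."""
    return error_constant(s) >= threshold


unseen = CharStats(p_alone=0.0, r_step4=0, r_step5=0, r_step6=0, r_step7=0)
common = CharStats(p_alone=0.3, r_step4=5, r_step5=5, r_step6=5, r_step7=5)
print(is_error(unseen), is_error(common))
```

A character that never stands alone and has no supporting co-occurrence evidence is flagged, while frequent n-grams pull the sum well below the threshold.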
In summary, the automatic correction method of the present invention mainly comprises two parts, automatic error detection and error correction. Using a hybrid-algorithm-based combination of multi-model error detection and error correction techniques, it provides an automatic checking model for character-level and word-level errors and, based on an analysis of how such errors are distributed in text, uses an N-gram model to make successive judgments on the disperse strings found in the text. The invention checks word-level errors in combination with the word-forming probability of individual characters and, on the basis of the constructed error correction knowledge base, implements an algorithm for generating correction suggestions. The various error correction knowledge bases are constructed from the error characteristics of text and a likelihood matching method, and include a wrongly-written-character dictionary, a confusable-word dictionary, a similar-code dictionary and a character-driven bidirectional dictionary, from which candidate correction suggestions are generated. The invention further proposes sorting the candidate suggestions, where the sorting is carried out by running error detection on each suggestion: during correction, each candidate suggestion is substituted for the original error, error detection is run on that position to obtain the corresponding error constant, and the suggestion with the smallest error constant is taken as the most probable correction, which completes the sorting of the correction suggestions. The method thus brings research on error correction and error detection together, and applies error detection directly within the error correction process. The specific advantages are as follows:
1. The proposed word-level automatic error detection based on the N-gram model reflects information about commonly used words well: comparing the tuples that occur most frequently in the statistics with a dictionary shows that the tuples corresponding to commonly used Chinese words have higher co-occurrence frequencies, so the adjacency matrix obtained from the statistics contains the set of commonly associated words.
2. It reflects the collocations of common function words well: in Chinese, some function words are used in combination with certain words and, although they carry no meaning of their own, they serve a grammatical function; tuples such as "must very", "can not" and "one-tenth" have very high co-occurrence probabilities.
3. The N-gram adjacency matrix reflects sentence-initial and sentence-final information well.
4. Many errors are found by these statistical methods, which indicates that the N-gram adjacency matrix reflects, to some extent, inherent regularities of natural language.
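The N-gram adjacency statistics mentioned in these advantages can be gathered along the following lines. This is a minimal sketch that stores the adjacency matrix sparsely as a counter keyed by tuples; the function name `ngram_counts`, the boundary markers `<s>` and `</s>`, and the three-sentence corpus are illustrative assumptions, not part of the patent.

```python
from collections import Counter
from typing import Iterable, List

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def ngram_counts(sentences: Iterable[List[str]], n: int = 2) -> Counter:
    """Count co-occurrences of n consecutive tokens, including sentence boundaries."""
    counts: Counter = Counter()
    for tokens in sentences:
        # Padding lets the counts capture sentence-initial and sentence-final tokens.
        padded = [BOS] + list(tokens) + [EOS]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts


# Toy segmented corpus; a real system would use a large proofread corpus.
corpus = [["我们", "去", "北京"], ["我们", "去", "上海"], ["他们", "去", "北京"]]
bigrams = ngram_counts(corpus, n=2)
trigrams = ngram_counts(corpus, n=3)

print(bigrams[("我们", "去")])  # co-occurrence frequency R of a seen pair
print(bigrams[("去", "惊")])    # an unseen pair has frequency 0
```

Because `Counter` returns 0 for missing keys, an unseen tuple directly yields the $R = 0$ case used by the error detection steps. The same function works at character level if each sentence is passed as a list of characters.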
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Any person skilled in the art may make minor modifications and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the claims.
Claims (8)
1. A Chinese text automatic correction method, characterized in that it comprises the following steps:
A) inputting the Chinese text to be proofread and performing word segmentation preprocessing on it sentence by sentence;
B) searching, sentence by sentence, for one-character words, two-character words, or disperse strings of three or more characters in the segmented text;
C) using an N-gram model to make successive judgments on the disperse strings found in the segmented text, and checking each sentence for word-level errors in combination with the word-forming probability of individual characters;
D) constructing an error correction knowledge base and generating error correction candidate text.
2. The Chinese text automatic correction method as claimed in claim 1, characterized in that step a) obtains the Chinese text to be proofread by voice or keyboard input, and the preprocessing comprises sorting out grammatical errors in the Chinese text to be proofread and checking the input by pattern matching.
3. The Chinese text automatic correction method as claimed in claim 2, characterized in that the voice input of the Chinese text to be proofread in step a) proceeds as follows: voice input from a microphone is received and converted into a voice stream that the computer can accept, pattern matching is performed on the voice stream to generate candidate character and word combinations, and a language model is used to identify the candidate combinations.
4. The Chinese text automatic correction method as claimed in claim 2, characterized in that the keyboard input of the Chinese text to be proofread in step a) proceeds as follows: characters and words are encoded in advance, keystroke signals are converted into a code sequence that the computer accepts, and the code sequence is associated with the character and word encoding scheme.
5. The Chinese text automatic correction method as claimed in claim 1, characterized in that step c) judges disperse strings of three or more characters as follows: the probability that each character in the disperse string forms a word on its own is determined, giving a first error constant; a bigram adjacency model is applied to determine, in turn, the probability that two adjacent characters form a word, giving a second error constant; a trigram adjacency model is applied to determine, in turn, the probability that three adjacent characters form a word, giving a third error constant; and all the error constants are added to give the final word-level error coefficient of the text.
6. The Chinese text automatic correction method as claimed in claim 5, characterized in that step c) judges a disperse string of four consecutive characters $W_k W_{k+1} W_{k+2} W_{k+3}$ as follows:
C1) for each of $W_k$, $W_{k+1}$, $W_{k+2}$ and $W_{k+3}$, determining the probability that the character forms a word on its own; if the probability $P$ that a character occurs alone is $P = 0$, an error is present at that position and the error constant is updated as $K_1 \leftarrow K_1 + 1.5$;
C2) taking $W_{k-2}$ as the start position and $W_{k+4}$ as the end position, applying the bigram word adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion; if $R = 0$, then $K_2 \leftarrow K_2 + 0.2$; if $R \ge 1$, then $K_2 \leftarrow K_2 - 1.0$;
C3) taking $W_{k-1}$ as the start position and $W_{k+4}$ as the end position, applying the bigram word adjacency model, using the co-occurrence frequency $R$ of two consecutive words as the criterion; if $R = 0$, then $K_3 \leftarrow K_3 + 0.5$; if $1 < R < 2$, then $K_3 \leftarrow K_3 + 0.2$; if $R \ge 2$, then $K_3 \leftarrow K_3 - 1.0$;
C4) taking the first of the two characters preceding $W_k$ as the start position and the second character following $W_{k+3}$ as the end position, applying the trigram character model, using the co-occurrence frequency $R$ of three consecutive characters as the criterion; if $R = 0$, then $K_4 \leftarrow K_4 + 0.2$; if $R \ge 1$, then $K_4 \leftarrow K_4 - 1.0$;
C5) taking the character preceding $W_k$ as the start position and the character following $W_{k+3}$ as the end position, applying the bigram character model, using the co-occurrence frequency $R$ of two consecutive characters as the criterion; if $R = 0$, then $K_5 \leftarrow K_5 + 0.8$; if $1 < R < 3$, then $K_5 \leftarrow K_5 + 0.5$; if $R \ge 3$, then $K_5 \leftarrow K_5 - 1.0$;
C6) for a given character under test, adding the resulting error constants, i.e. $K = K_1 + K_2 + K_3 + K_4 + K_5$; if $K \ge 1.5$, an error is present at that position and the erroneous text is marked.
7. The Chinese text automatic correction method as claimed in claim 5, characterized in that step d) sorts the generated error correction candidate texts as follows: each candidate is substituted for the original erroneous text, steps b) and c) are repeated on the resulting sentence to run error detection again and obtain the corresponding error constant, and the candidates are sorted by the size of their error constants.
8. The Chinese text automatic correction method as claimed in claim 1, characterized in that step d) constructs the various error correction knowledge bases based on the error characteristics of text and a likelihood matching method, the error correction knowledge bases comprising a wrongly-written-character dictionary, a confusable-word dictionary, a similar-code dictionary and/or a character-driven bidirectional dictionary.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510688403.4A CN105279149A (en) | 2015-10-21 | 2015-10-21 | Chinese text automatic correction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510688403.4A CN105279149A (en) | 2015-10-21 | 2015-10-21 | Chinese text automatic correction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105279149A (en) | 2016-01-27 |
Family ID=55148178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510688403.4A Pending CN105279149A (en) | 2015-10-21 | 2015-10-21 | Chinese text automatic correction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105279149A (en) |
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105550173A (en) * | 2016-02-06 | 2016-05-04 | 北京京东尚科信息技术有限公司 | Text correction method and device |
CN105824804A (en) * | 2016-03-31 | 2016-08-03 | 长安大学 | English spelling error correction tool and method based on word bank |
CN105869634A (en) * | 2016-03-31 | 2016-08-17 | 重庆大学 | Field-based method and system for feeding back text error correction after speech recognition |
CN106547741A (en) * | 2016-11-21 | 2017-03-29 | 江苏科技大学 | A kind of Chinese language text auto-collation based on collocation |
WO2017161899A1 (en) * | 2016-03-24 | 2017-09-28 | 华为技术有限公司 | Text processing method, device, and computing apparatus |
CN107506413A (en) * | 2017-08-11 | 2017-12-22 | 江苏科技大学 | A kind of querying method based on Lucene wrong words |
CN107633250A (en) * | 2017-09-11 | 2018-01-26 | 畅捷通信息技术股份有限公司 | A kind of Text region error correction method, error correction system and computer installation |
CN107656627A (en) * | 2017-09-28 | 2018-02-02 | 百度在线网络技术(北京)有限公司 | Data inputting method and device |
CN107729316A (en) * | 2017-10-12 | 2018-02-23 | 福建富士通信息软件有限公司 | The identification of wrong word and the method and device of error correction in the interactive question and answer text of Chinese |
CN108038098A (en) * | 2017-11-28 | 2018-05-15 | 苏州市东皓计算机系统工程有限公司 | A kind of computword correcting method |
CN108132917A (en) * | 2017-12-04 | 2018-06-08 | 昆明理工大学 | A kind of document error correction flag method |
WO2018153295A1 (en) * | 2017-02-27 | 2018-08-30 | 腾讯科技(深圳)有限公司 | Text entity extraction method, device, apparatus, and storage media |
TWI635406B (en) * | 2016-11-25 | 2018-09-11 | 英業達股份有限公司 | Method for string recognition and machine learning |
CN108595410A (en) * | 2018-03-19 | 2018-09-28 | 小船出海教育科技(北京)有限公司 | The automatic of hand-written composition corrects method and device |
CN108628826A (en) * | 2018-04-11 | 2018-10-09 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108647681A (en) * | 2018-05-08 | 2018-10-12 | 重庆邮电大学 | A kind of English text detection method with text orientation correction |
CN108647202A (en) * | 2018-04-11 | 2018-10-12 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108664467A (en) * | 2018-04-11 | 2018-10-16 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108694167A (en) * | 2018-04-11 | 2018-10-23 | 广州视源电子科技股份有限公司 | Candidate word evaluation method, candidate word sorting method and device |
CN108717412A (en) * | 2018-06-12 | 2018-10-30 | 北京览群智数据科技有限责任公司 | Chinese check and correction error correction method based on Chinese word segmentation and system |
CN108733646A (en) * | 2018-04-11 | 2018-11-02 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108845984A (en) * | 2018-05-22 | 2018-11-20 | 广州视源电子科技股份有限公司 | Wrongly written character detection method and device, computer readable storage medium and terminal equipment |
CN109062888A (en) * | 2018-06-04 | 2018-12-21 | 昆明理工大学 | A kind of self-picketing correction method when there is Error Text input |
CN109213998A (en) * | 2018-08-17 | 2019-01-15 | 汇智容大(北京)信息技术有限公司 | Chinese wrongly written character detection method and system |
CN109460552A (en) * | 2018-10-29 | 2019-03-12 | 朱丽莉 | Rule-based and corpus Chinese faulty wording automatic testing method and equipment |
CN110046350A (en) * | 2019-04-12 | 2019-07-23 | 百度在线网络技术(北京)有限公司 | Grammatical bloopers recognition methods, device, computer equipment and storage medium |
CN110110334A (en) * | 2019-05-08 | 2019-08-09 | 郑州大学 | A kind of remote medical consultation with specialists recording text error correction method based on natural language processing |
CN110110969A (en) * | 2019-04-10 | 2019-08-09 | 中国科学院国家空间科学中心 | A kind of space environment forecast product gross examines appraisal procedure and system automatically |
CN110135879A (en) * | 2018-11-17 | 2019-08-16 | 华南理工大学 | Customer service quality automatic scoring method based on natural language processing |
CN110134936A (en) * | 2018-02-08 | 2019-08-16 | 北京搜狗科技发展有限公司 | A kind of segmenting method, device and electronic equipment |
CN110134950A (en) * | 2019-04-28 | 2019-08-16 | 北京百分点信息科技有限公司 | A kind of text auto-collation that words combines |
CN110929514A (en) * | 2019-11-20 | 2020-03-27 | 北京百分点信息科技有限公司 | Text proofreading method and device, computer readable storage medium and electronic equipment |
CN110991166A (en) * | 2019-12-03 | 2020-04-10 | 中国标准化研究院 | Chinese wrongly-written character recognition method and system based on pattern matching |
CN111079768A (en) * | 2019-12-23 | 2020-04-28 | 北京爱医生智慧医疗科技有限公司 | Character and image recognition method and device based on OCR |
CN111079415A (en) * | 2019-11-12 | 2020-04-28 | 中国标准化研究院 | Chinese automatic error checking method based on collocation conflict |
CN111144101A (en) * | 2019-12-26 | 2020-05-12 | 北大方正集团有限公司 | Wrongly written character processing method and device |
CN111310447A (en) * | 2020-03-18 | 2020-06-19 | 科大讯飞股份有限公司 | Grammar error correction method, grammar error correction device, electronic equipment and storage medium |
CN111339755A (en) * | 2018-11-30 | 2020-06-26 | 中国移动通信集团浙江有限公司 | Automatic error correction method and device for office data |
CN111626049A (en) * | 2020-05-27 | 2020-09-04 | 腾讯科技(深圳)有限公司 | Title correction method and device for multimedia information, electronic equipment and storage medium |
CN111783458A (en) * | 2020-08-20 | 2020-10-16 | 支付宝(杭州)信息技术有限公司 | Method and device for detecting overlapping character errors |
CN112711943A (en) * | 2020-12-17 | 2021-04-27 | 厦门市美亚柏科信息股份有限公司 | Uygur language identification method, device and storage medium |
CN117371445A (en) * | 2023-12-07 | 2024-01-09 | 深圳市慧动创想科技有限公司 | Information error correction method, device, computer equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1387650A (en) * | 1999-11-05 | 2002-12-25 | 微软公司 | Language input architecture for converting one text form to another text form with minimized typographical errors and conversion errors |
CN101655837A (en) * | 2009-09-08 | 2010-02-24 | 北京邮电大学 | Method for detecting and correcting error on text after voice recognition |
2015-10-21: CN application CN201510688403.4A (CN105279149A), status: Pending
Non-Patent Citations (3)
Title |
---|
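张仰森, 俞士汶: "A survey of research on automatic text proofreading techniques", 《计算机应用研》 * |
潘昊, 颜军: "An automatic text proofreading algorithm based on Chinese word segmentation", 《武汉理工大学学报》 * |
郇政永: "Research on OCR-based Chinese text proofreading", 《中国优秀硕士学位论文全文数据库 信息科技集》 * |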
Cited By (65)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105550173A (en) * | 2016-02-06 | 2016-05-04 | 北京京东尚科信息技术有限公司 | Text correction method and device |
WO2017161899A1 (en) * | 2016-03-24 | 2017-09-28 | 华为技术有限公司 | Text processing method, device, and computing apparatus |
CN107229627A (en) * | 2016-03-24 | 2017-10-03 | 华为技术有限公司 | Text processing method, device and computing device |
CN105824804A (en) * | 2016-03-31 | 2016-08-03 | 长安大学 | English spelling error correction tool and method based on word bank |
CN105869634A (en) * | 2016-03-31 | 2016-08-17 | 重庆大学 | Field-based method and system for feeding back text error correction after speech recognition |
CN105869634B (en) * | 2016-03-31 | 2019-11-19 | 重庆大学 | Field-based method and system for feeding back text error correction after speech recognition |
CN106547741A (en) * | 2016-11-21 | 2017-03-29 | 江苏科技大学 | Collocation-based automatic proofreading method for Chinese text |
TWI635406B (en) * | 2016-11-25 | 2018-09-11 | 英業達股份有限公司 | Method for string recognition and machine learning |
US11222178B2 (en) | 2017-02-27 | 2022-01-11 | Tencent Technology (Shenzhen) Company Ltd | Text entity extraction method for extracting text from target text based on combination probabilities of segmentation combination of text entities in the target text, apparatus, and device, and storage medium |
WO2018153295A1 (en) * | 2017-02-27 | 2018-08-30 | 腾讯科技(深圳)有限公司 | Text entity extraction method, device, apparatus, and storage media |
CN107506413B (en) * | 2017-08-11 | 2020-03-20 | 江苏科技大学 | Lucene wrongly written character based query method |
CN107506413A (en) * | 2017-08-11 | 2017-12-22 | 江苏科技大学 | Lucene wrongly written character based query method |
CN107633250B (en) * | 2017-09-11 | 2023-04-18 | 畅捷通信息技术股份有限公司 | Character recognition error correction method, error correction system and computer device |
CN107633250A (en) * | 2017-09-11 | 2018-01-26 | 畅捷通信息技术股份有限公司 | Character recognition error correction method, error correction system and computer device |
CN107656627A (en) * | 2017-09-28 | 2018-02-02 | 百度在线网络技术(北京)有限公司 | Data inputting method and device |
CN107656627B (en) * | 2017-09-28 | 2021-07-23 | 百度在线网络技术(北京)有限公司 | Information input method and device |
CN107729316A (en) * | 2017-10-12 | 2018-02-23 | 福建富士通信息软件有限公司 | Method and device for identifying and correcting wrong words in Chinese interactive question-and-answer text |
CN108038098A (en) * | 2017-11-28 | 2018-05-15 | 苏州市东皓计算机系统工程有限公司 | A kind of computword correcting method |
CN108132917A (en) * | 2017-12-04 | 2018-06-08 | 昆明理工大学 | Document error correction marking method |
CN108132917B (en) * | 2017-12-04 | 2021-12-17 | 昆明理工大学 | Document error correction marking method |
CN110134936A (en) * | 2018-02-08 | 2019-08-16 | 北京搜狗科技发展有限公司 | Word segmentation method, device and electronic equipment |
CN108595410A (en) * | 2018-03-19 | 2018-09-28 | 小船出海教育科技(北京)有限公司 | Method and device for automatic correction of handwritten compositions |
CN108694167A (en) * | 2018-04-11 | 2018-10-23 | 广州视源电子科技股份有限公司 | Candidate word evaluation method, candidate word sorting method and device |
CN108628826A (en) * | 2018-04-11 | 2018-10-09 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108628826B (en) * | 2018-04-11 | 2022-09-06 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108733646B (en) * | 2018-04-11 | 2022-09-06 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108733646A (en) * | 2018-04-11 | 2018-11-02 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108647202A (en) * | 2018-04-11 | 2018-10-12 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108664467A (en) * | 2018-04-11 | 2018-10-16 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108647681A (en) * | 2018-05-08 | 2018-10-12 | 重庆邮电大学 | English text detection method with text orientation correction |
CN108647681B (en) * | 2018-05-08 | 2019-06-14 | 重庆邮电大学 | English text detection method with text orientation correction |
CN108845984A (en) * | 2018-05-22 | 2018-11-20 | 广州视源电子科技股份有限公司 | Wrongly written character detection method and device, computer readable storage medium and terminal equipment |
CN108845984B (en) * | 2018-05-22 | 2022-04-22 | 广州视源电子科技股份有限公司 | Wrongly written character detection method and device, computer readable storage medium and terminal equipment |
CN109062888B (en) * | 2018-06-04 | 2023-03-31 | 昆明理工大学 | Self-correcting method for input of wrong text |
CN109062888A (en) * | 2018-06-04 | 2018-12-21 | 昆明理工大学 | Self-correcting method for input of wrong text |
CN108717412A (en) * | 2018-06-12 | 2018-10-30 | 北京览群智数据科技有限责任公司 | Chinese check and correction error correction method based on Chinese word segmentation and system |
CN109213998B (en) * | 2018-08-17 | 2023-06-23 | 上海蜜度信息技术有限公司 | Chinese character error detection method and system |
CN109213998A (en) * | 2018-08-17 | 2019-01-15 | 汇智容大(北京)信息技术有限公司 | Chinese wrongly written character detection method and system |
CN109460552A (en) * | 2018-10-29 | 2019-03-12 | 朱丽莉 | Rule-based and corpus Chinese faulty wording automatic testing method and equipment |
CN110135879A (en) * | 2018-11-17 | 2019-08-16 | 华南理工大学 | Customer service quality automatic scoring method based on natural language processing |
CN110135879B (en) * | 2018-11-17 | 2024-01-16 | 华南理工大学 | Customer service quality automatic scoring method based on natural language processing |
CN111339755A (en) * | 2018-11-30 | 2020-06-26 | 中国移动通信集团浙江有限公司 | Automatic error correction method and device for office data |
CN110110969A (en) * | 2019-04-10 | 2019-08-09 | 中国科学院国家空间科学中心 | A kind of space environment forecast product gross examines appraisal procedure and system automatically |
CN110046350A (en) * | 2019-04-12 | 2019-07-23 | 百度在线网络技术(北京)有限公司 | Grammatical error recognition method, device, computer equipment and storage medium |
CN110134950B (en) * | 2019-04-28 | 2022-12-06 | 北京百分点科技集团股份有限公司 | Automatic text proofreading method combining words |
CN110134950A (en) * | 2019-04-28 | 2019-08-16 | 北京百分点信息科技有限公司 | Automatic text proofreading method combining words |
CN110110334A (en) * | 2019-05-08 | 2019-08-09 | 郑州大学 | Remote consultation record text error correction method based on natural language processing |
CN110110334B (en) * | 2019-05-08 | 2022-09-13 | 郑州大学 | Remote consultation record text error correction method based on natural language processing |
CN111079415A (en) * | 2019-11-12 | 2020-04-28 | 中国标准化研究院 | Chinese automatic error checking method based on collocation conflict |
CN110929514A (en) * | 2019-11-20 | 2020-03-27 | 北京百分点信息科技有限公司 | Text proofreading method and device, computer readable storage medium and electronic equipment |
CN110929514B (en) * | 2019-11-20 | 2023-06-27 | 北京百分点科技集团股份有限公司 | Text collation method, text collation apparatus, computer-readable storage medium, and electronic device |
CN110991166B (en) * | 2019-12-03 | 2021-07-30 | 中国标准化研究院 | Chinese wrongly-written character recognition method and system based on pattern matching |
CN110991166A (en) * | 2019-12-03 | 2020-04-10 | 中国标准化研究院 | Chinese wrongly-written character recognition method and system based on pattern matching |
CN111079768A (en) * | 2019-12-23 | 2020-04-28 | 北京爱医生智慧医疗科技有限公司 | Character and image recognition method and device based on OCR |
CN111144101A (en) * | 2019-12-26 | 2020-05-12 | 北大方正集团有限公司 | Wrongly written character processing method and device |
CN111144101B (en) * | 2019-12-26 | 2021-12-03 | 北大方正集团有限公司 | Wrongly written character processing method and device |
CN111310447A (en) * | 2020-03-18 | 2020-06-19 | 科大讯飞股份有限公司 | Grammar error correction method, grammar error correction device, electronic equipment and storage medium |
CN111310447B (en) * | 2020-03-18 | 2024-02-02 | 河北省讯飞人工智能研究院 | Grammar error correction method, grammar error correction device, electronic equipment and storage medium |
CN111626049B (en) * | 2020-05-27 | 2022-12-16 | 深圳市雅阅科技有限公司 | Title correction method and device for multimedia information, electronic equipment and storage medium |
CN111626049A (en) * | 2020-05-27 | 2020-09-04 | 腾讯科技(深圳)有限公司 | Title correction method and device for multimedia information, electronic equipment and storage medium |
CN111783458A (en) * | 2020-08-20 | 2020-10-16 | 支付宝(杭州)信息技术有限公司 | Method and device for detecting overlapping character errors |
CN111783458B (en) * | 2020-08-20 | 2024-05-03 | 支付宝(杭州)信息技术有限公司 | Method and device for detecting character overlapping errors |
CN112711943A (en) * | 2020-12-17 | 2021-04-27 | 厦门市美亚柏科信息股份有限公司 | Uygur language identification method, device and storage medium |
CN112711943B (en) * | 2020-12-17 | 2023-11-24 | 厦门市美亚柏科信息股份有限公司 | Uygur language identification method, device and storage medium |
CN117371445A (en) * | 2023-12-07 | 2024-01-09 | 深圳市慧动创想科技有限公司 | Information error correction method, device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105279149A (en) | Chinese text automatic correction method | |
CN110489760B (en) | Text automatic correction method and device based on deep neural network | |
CN113495900B (en) | Method and device for obtaining structured query language statement based on natural language | |
CN103885938B (en) | Industry spelling mistake checking method based on user feedback | |
US11055327B2 (en) | Unstructured data parsing for structured information | |
Mehmood et al. | An unsupervised lexical normalization for Roman Hindi and Urdu sentiment analysis | |
CN107341143B (en) | Sentence continuity judgment method and device and electronic equipment | |
CN113076739A (en) | Method and system for realizing cross-domain Chinese text error correction | |
CN113673228B (en) | Text error correction method, apparatus, computer storage medium and computer program product | |
CN110110334B (en) | Remote consultation record text error correction method based on natural language processing | |
CN111651978A (en) | Entity-based lexical examination method and device, computer equipment and storage medium | |
KR20230009564A (en) | Learning data correction method and apparatus thereof using ensemble score | |
CN112417823B (en) | Chinese text word order adjustment and word completion method and system | |
Uthayamoorthy et al. | Ddspell-a data driven spell checker and suggestion generator for the tamil language | |
JP2018206262A (en) | Word linking identification model learning device, word linking detection device, method and program | |
Chaudhuri | Reversed word dictionary and phonetically similar word grouping based spell-checker to Bangla text | |
CN114510925A (en) | Chinese text error correction method, system, terminal equipment and storage medium | |
Mittra et al. | A bangla spell checking technique to facilitate error correction in text entry environment | |
Hocking et al. | Optical character recognition for South African languages | |
CN106776590A (en) | A kind of method and system for obtaining entry translation | |
CN117875310A (en) | Vertical domain text error correction method based on prefix and suffix word stock and confusion degree | |
CN116611428A (en) | Non-autoregressive decoding Vietnam text regularization method based on editing alignment algorithm | |
CN114580391A (en) | Chinese error detection model training method, device, equipment and storage medium | |
CN114970541A (en) | Text semantic understanding method, device, equipment and storage medium | |
Irani et al. | A Supervised Deep Learning-based Approach for Bilingual Arabic and Persian Spell Correction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160127 |