CN109446527A - Meaningless corpus analysis method and system - Google Patents
- Publication number
- CN109446527A (CN201811260440.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a method and a system for analyzing meaningless corpus. The method comprises: acquiring a meaningless corpus and summarizing corpus regular expressions from it; deriving, from the corpus regular expressions, a decision condition for judging that a corpus is meaningless; acquiring a user sentence; when the user sentence meets the decision condition, determining that the user sentence is meaningless; and, once the user sentence is determined to be meaningless, analyzing its keywords and/or extracting its effective trunk to perform intent recommendation and/or voice guidance. The method can accurately and quickly identify meaningless corpus input by a user, and can then extract keywords or the effective trunk from that corpus to infer the user's intent and make recommendations.
Description
Technical field
The present invention relates to the technical field of language recognition, and in particular to a method and a system for analyzing meaningless corpus.
Background art
In existing voice interaction, while the microphone is collecting the user's speech, problems such as ambient noise around the user or several people talking at once often cause the microphone to pick up meaningless speech fragments. Speech recognition is then performed on these fragments, which yields meaningless corpus.
In an interactive system, however, it is often difficult to handle such meaningless corpus effectively once it has been obtained. The system can neither process the meaningless corpus nor extract the user's true intention from it in order to take appropriate measures; instead, its replies become confused and irrelevant. A user who is trying to obtain an effective service will then be annoyed, because this is not the information the user wanted the interactive system to receive.
In this situation, on the one hand, the interactive system has to analyze and recognize all collected speech item by item, and a large amount of meaningless speech mixed in significantly affects its processing, for example by slowing it down. On the other hand, the system cannot correctly identify the user's intention and therefore cannot give correct feedback, which harms the user experience. A method capable of analyzing meaningless corpus is therefore urgently needed.
Summary of the invention
The object of the present invention is to provide a method and a system for analyzing meaningless corpus that accurately and quickly identify meaningless corpus input by a user, and that can then extract keywords or the effective trunk from that corpus to infer the user's intention and make recommendations accordingly.
The technical solution provided by the invention is as follows:
The present invention provides a method for analyzing meaningless corpus, characterized by comprising:
Obtaining a meaningless corpus, and summarizing corpus regular expressions according to the meaningless corpus;
Obtaining, according to the corpus regular expressions, a decision condition for determining that a corpus is meaningless;
Obtaining a user sentence;
When the user sentence meets the decision condition, determining that the user sentence is meaningless;
After the user sentence is determined to be meaningless, analyzing the keywords of the user sentence and/or extracting the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
Further, obtaining the meaningless corpus and summarizing corpus regular expressions according to the meaningless corpus specifically comprises:
Obtaining the meaningless corpus, and segmenting the corpus samples in the meaningless corpus according to a word segmentation technique to obtain the words contained in each corpus sample and their corresponding parts of speech;
Summarizing corpus regular expressions according to corpus features, the corpus features including the words and the parts of speech.
Further, obtaining, according to the corpus regular expressions, the decision condition for determining that a corpus is meaningless specifically comprises:
Counting the types and quantities of the parts of speech contained in each corpus regular expression;
Analyzing the types and quantities of the parts of speech in all corpus regular expressions to obtain the decision condition for determining that a corpus is meaningless, the decision condition being that the quantity of words of one or more parts of speech reaches a threshold;
Converting the parts of speech contained in the decision condition and the corresponding words into semantic slots.
Further, after obtaining the user sentence, and before determining that the user sentence is meaningless when it meets the decision condition, the method comprises:
Segmenting the user sentence according to the word segmentation technique and converting it into a corresponding regular expression;
Matching the words in the regular expression and their corresponding parts of speech against the semantic slots.
Further, after the user sentence is determined to be meaningless, analyzing the keywords of the user sentence and/or extracting the effective trunk of the user sentence to perform intent recommendation and/or voice guidance specifically comprises:
After the user sentence is determined to be meaningless, taking the words corresponding to one or more parts of speech as the keywords of the user sentence, and performing intent recommendation and/or voice guidance according to the keywords; and/or
Excluding the words in the regular expression that match the semantic slots, extracting the remaining words in the regular expression as the effective trunk of the user sentence, and performing intent recommendation and/or voice guidance according to the effective trunk.
The present invention also provides a system for analyzing meaningless corpus, characterized by comprising:
A processing module, which obtains a meaningless corpus and summarizes corpus regular expressions according to the meaningless corpus;
A control module, which obtains, according to the corpus regular expressions summarized by the processing module, a decision condition for determining that a corpus is meaningless;
An acquisition module, which obtains a user sentence;
A determination module, which determines that the user sentence is meaningless when the user sentence obtained by the acquisition module meets the decision condition;
An analysis module, which, after the determination module determines that the user sentence is meaningless, analyzes the keywords of the user sentence and/or extracts the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
Further, the processing module specifically comprises:
A word segmentation unit, which obtains the meaningless corpus and segments the corpus samples in the meaningless corpus according to a word segmentation technique to obtain the words contained in each corpus sample and their corresponding parts of speech;
A processing unit, which summarizes corpus regular expressions according to corpus features, the corpus features including the words and the parts of speech obtained by the word segmentation unit.
Further, the control module specifically comprises:
A statistic unit, which counts the types and quantities of the parts of speech contained in each corpus regular expression;
A control unit, which analyzes the types and quantities of the parts of speech in all corpus regular expressions counted by the statistic unit to obtain the decision condition for determining that a corpus is meaningless, the decision condition being that the quantity of words of one or more parts of speech reaches a threshold;
A conversion unit, which converts the parts of speech contained in the decision condition obtained by the control unit, together with the corresponding words, into semantic slots.
Further, the system also comprises:
A word segmentation module, which segments the user sentence according to the word segmentation technique and converts it into a corresponding regular expression;
A matching module, which matches the words in the regular expression produced by the word segmentation module, and their corresponding parts of speech, against the semantic slots.
Further, the analysis module specifically comprises:
An analysis unit, which, after the user sentence is determined to be meaningless, takes the words corresponding to one or more parts of speech as the keywords of the user sentence;
An execution unit, which performs intent recommendation and/or voice guidance according to the keywords; and/or
The analysis unit excludes the words in the regular expression that match the semantic slots, and extracts the remaining words in the regular expression as the effective trunk of the user sentence;
The execution unit performs intent recommendation and/or voice guidance according to the effective trunk.
The method and system for analyzing meaningless corpus provided by the invention can bring at least one of the following beneficial effects:
1. In the present invention, a meaningless corpus is formed by collecting a large number of meaningless corpus samples, corpus regular expressions are then summarized from it, and the decision condition for determining that a corpus is meaningless is derived from them. A decision condition built on a large number of samples can filter out meaningless user sentences more accurately and reduces the likelihood of omissions or errors.
2. In the present invention, after a user sentence is determined to be meaningless, the keywords or effective trunk of the user sentence are still analyzed to obtain the user's true intention, and intent recommendation or voice guidance is then performed, which avoids giving irrelevant feedback based on the original user sentence.
Brief description of the drawings
The above characteristics, technical features, advantages, and implementations of the method and system for analyzing meaningless corpus are further described below in a clear and understandable manner, with reference to the drawings and preferred embodiments.
Fig. 1 is a flow chart of a first embodiment of the method for analyzing meaningless corpus of the present invention;
Fig. 2 is a flow chart of a second embodiment of the method for analyzing meaningless corpus of the present invention;
Fig. 3 is a flow chart of a third embodiment of the method for analyzing meaningless corpus of the present invention;
Fig. 4 is a flow chart of a fourth embodiment of the method for analyzing meaningless corpus of the present invention;
Fig. 5 is a structural schematic diagram of a fifth embodiment of the system for analyzing meaningless corpus of the present invention;
Fig. 6 is a structural schematic diagram of a sixth embodiment of the system for analyzing meaningless corpus of the present invention;
Fig. 7 is a structural schematic diagram of a seventh embodiment of the system for analyzing meaningless corpus of the present invention;
Fig. 8 is a structural schematic diagram of an eighth embodiment of the system for analyzing meaningless corpus of the present invention.
Explanation of reference numerals:
100 analysis system for meaningless corpus
110 processing module, 111 word segmentation unit, 112 processing unit
120 control module, 121 statistic unit, 122 control unit, 123 conversion unit
130 acquisition module
140 word segmentation module
150 matching module
160 determination module
170 analysis module, 171 analysis unit, 172 execution unit
Detailed description of the embodiments
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, specific embodiments of the invention are described below with reference to the drawings. Obviously, the drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings, and other embodiments, from them without creative effort.
For simplicity, each figure only schematically shows the parts relevant to the present invention; the figures do not represent the actual structure of a product. In addition, to keep the figures simple and easy to understand, where several components in a figure have the same structure or function, only one of them is drawn or labeled. Herein, "one" does not only mean "only one"; it may also mean "more than one".
The first embodiment of the present invention, as shown in Fig. 1, is a method for analyzing meaningless corpus, comprising:
S100 Obtain a meaningless corpus, and summarize corpus regular expressions according to the meaningless corpus.
Specifically, a large number of meaningless corpus samples are collected. A corpus sample can be standardized written text, or it can be user speech, audio, and the like, because speech input and text input are both mainstream interaction modes in human-computer interaction.
In addition, since the whole analysis operates on written text, if what is collected is a voice file such as user speech or audio, the voice file must first be converted into recognized text, and the recognized text is then processed accordingly.
The words in each corpus sample and their parts of speech are analyzed to obtain the corpus regular expression corresponding to that sample. Words of specified parts of speech in a corpus sample can be represented in the corresponding corpus regular expression by their part of speech, for example verb or adjective. Other words that cannot be replaced by a part of speech, or words of particular parts of speech, are still represented in the corpus regular expression by the original word, for example "how" or "how many".
S200 Obtain, according to the corpus regular expressions, a decision condition for determining that a corpus is meaningless.
Specifically, each corpus sample yields a corresponding corpus regular expression by the method above. The corpus regular expressions of all corpus samples are analyzed together to find the features common to meaningless corpus, from which the decision condition for determining that a corpus is meaningless is obtained.
Because the meaningless corpus contains many corpus samples, a given feature may not be present in every corpus regular expression. Therefore, as set by system default or by the user, a feature shared by the corpus regular expressions of a certain number or a certain proportion of corpus samples can be taken as the decision condition.
S300 Obtain a user sentence.
S600 When the user sentence meets the decision condition, determine that the user sentence is meaningless.
Specifically, a user sentence is obtained. If the user entered text through the interactive system, the input user sentence is matched directly against the decision condition. If the match succeeds, the input user sentence is determined to be meaningless. If the match fails, the input user sentence is determined to carry some practical meaning, so it is parsed to identify the user's true intention and give corresponding feedback.
If the user chose voice interaction with the interactive system, the user sentence is first converted into recognized text, and the recognized text is then matched against the decision condition. If the match succeeds, the input user sentence is determined to be meaningless. If the match fails, the user sentence carries practical meaning, and the user's intention is identified from it to give corresponding feedback.
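The branching described above (text is matched directly, speech is converted to recognized text first) can be sketched as below. Everything named here is an assumption for illustration: `recognize` stands in for whatever speech recognizer the system uses, and `fake_condition` is a toy decision condition, not the one the patent derives.

```python
from typing import Callable, Tuple, Union


def judge_user_input(
    user_input: Union[str, bytes],
    recognize: Callable[[bytes], str],
    meets_condition: Callable[[str], bool],
) -> Tuple[str, bool]:
    """Return (text, is_meaningless) for typed text or recorded speech."""
    # Speech is first converted into recognized text; typed text is used as-is.
    text = recognize(user_input) if isinstance(user_input, bytes) else user_input
    return text, meets_condition(text)


# Stand-ins for illustration only; a real system would plug in its own components.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_condition(text: str) -> bool:
    words = text.split()
    return bool(words) and words.count("uh") / len(words) > 0.4


typed = judge_user_input("uh uh weather", fake_recognize, fake_condition)
spoken = judge_user_input(b"play some music today", fake_recognize, fake_condition)
```

A sentence judged meaningful would then go to normal intent parsing, while a meaningless one goes to the keyword or effective-trunk analysis of step S700.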
S700 After the user sentence is determined to be meaningless, analyze the keywords of the user sentence and/or extract the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
Specifically, once the result above shows that the obtained user sentence is meaningless, the user sentence, or the recognized text converted from it, is analyzed to obtain its keywords or filtered to obtain its effective trunk. The user's intention is judged from the keywords or the effective trunk, and the corresponding intent recommendation or voice guidance is then performed.
In this embodiment, a meaningless corpus is formed by collecting a large number of meaningless corpus samples, corpus regular expressions are then summarized from it, and the decision condition for determining that a corpus is meaningless is derived from them. A decision condition built on a large number of samples can filter out meaningless user sentences more accurately and reduces the likelihood of omissions or errors.
In addition, after a user sentence is determined to be meaningless, the keywords or effective trunk of the user sentence are still analyzed to obtain the user's true intention, and intent recommendation or voice guidance is then performed, which avoids giving irrelevant feedback based on the original user sentence.
The second embodiment of the present invention is a preferred implementation of the first embodiment above. As shown in Fig. 2, it comprises:
S110 Obtain the meaningless corpus, and segment the corpus samples in the meaningless corpus according to a word segmentation technique to obtain the words contained in each corpus sample and their corresponding parts of speech.
Specifically, the meaningless corpus is obtained and its corpus samples are segmented according to the word segmentation technique. If a corpus sample is a voice file such as user speech or audio, the voice file must first be converted into recognized text, and the recognized text is then segmented.
The word segmentation technique works as follows: the structure of each sentence in the corpus sample is determined first, and every sentence in the corpus sample is then divided into tokens such as characters, words, and phrases according to the parts of speech of the words and the relations between the words.
S120 Summarize corpus regular expressions according to corpus features, the corpus features including the words and the parts of speech.
Specifically, segmenting a corpus sample with the word segmentation technique above yields several corpus features, from which the corpus regular expression is summarized. The corpus features are the tokens (characters, words, and phrases) produced by the segmentation, the part of speech of each token, and the relations between the tokens within the sentence of the corpus sample.
Each token (character, word, or phrase) may appear in the corresponding corpus regular expression either as its part of speech or as the original character, word, or phrase. This can be set by system default or by the user.
For example, a corpus sample is: "Which compositions describing autumn are there?" The word segmentation technique determines the part of speech of each word in the sample: describe (verb), autumn (time word), 的 (auxiliary word), composition (noun), there are (verb), which (pronoun). The relations between the words are: attributive-head relation: composition (noun) - describe (verb); verb-object relations: describe (verb) - autumn (time word), there are (verb) - which (pronoun). Some of the characters and words are replaced by their parts of speech, while the others are still represented by the original characters and words, so the corpus regular expression corresponding to this sample is: describe # time word # 的 # noun # which are there.
Which has.
S210 counts the type and quantity for the part of speech for including in the corpus regular expression.
Specifically, counting the corpus regular expression according to the part of speech of each participle in each corpus regular expression
In include part of speech type and the participles such as the corresponding word of every kind of part of speech, word and phrase quantity, and then calculate every kind of word
Property the participle ratio shared in the participle such as all words, word and phrase such as corresponding word, word and phrase.
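The per-expression statistics can be sketched with a counter. This is an illustrative sketch under assumptions: a corpus regular expression is modeled as a list of classes (either a `#pos#` placeholder or a literal token), and the function name `class_statistics` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def class_statistics(pattern: List[str]) -> Dict[str, Tuple[int, float]]:
    """Map each class in one corpus regular expression to (count, share of all tokens)."""
    counts = Counter(pattern)
    total = len(pattern)
    # The comprehension is empty for an empty pattern, so no division by zero occurs.
    return {cls: (n, n / total) for cls, n in counts.items()}


stats = class_statistics(["#verb#", "#noun#", "#noun#", "how"])
print(stats["#noun#"])
```

Here `#noun#` accounts for two of the four tokens, so its share is one half.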
Tokens (characters, words, and phrases) that are not represented by a part of speech in the corpus regular expression can be counted directly by the original character, word, or phrase; that is, the original character, word, or phrase is itself treated as a "part of speech".
For tokens of this kind, because everyone phrases things differently, completely identical tokens are rarely encountered during counting. The meaning of the token therefore has to be taken into account, and tokens with the same meaning are grouped into one class, for example the particles "的", "地", and "得", or the various conjunctions meaning "and".
For example, a corpus sample is: "Which compositions describing autumn are there?" The corresponding corpus regular expression is: describe # time word # 的 # noun # which are there. Counting gives: "describe", quantity one; "time word", quantity one; "的", quantity one; "noun", quantity one; "which are there", quantity one. Here "describe", "的", and "which are there" are treated as parts of speech on the same level as "time word" and "noun".
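The counting in this example, including the grouping of literal tokens with the same meaning, can be sketched as follows. It assumes that the particles grouped together are 的, 地, and 得; the `CANONICAL` mapping and the English glosses used as tokens are illustrative stand-ins.

```python
from collections import Counter
from typing import Dict, List

# Assumed grouping: tokens with the same meaning are counted under one canonical form.
CANONICAL: Dict[str, str] = {"地": "的", "得": "的"}


def count_classes(pattern: List[str]) -> Counter:
    """Count classes in one corpus regular expression, merging equivalent literals."""
    return Counter(CANONICAL.get(item, item) for item in pattern)


# The worked example: describe # time word # 的 # noun # which are there
example = ["describe", "#time word#", "的", "#noun#", "which are there"]
counts = count_classes(example)
print(counts)
```

Each of the five classes is counted once, and the literals "describe", "的", and "which are there" sit alongside the two part-of-speech classes.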
S220 Analyze the types and quantities of the parts of speech in all corpus regular expressions to obtain the decision condition for determining that a corpus is meaningless, the decision condition being that the quantity of words of one or more parts of speech reaches a threshold.
Specifically, using the types of parts of speech counted in each individual corpus regular expression and the quantity or ratio of each, the types and quantities of the parts of speech in all corpus regular expressions are analyzed to obtain the decision condition for determining that a corpus is meaningless.
All part-of-speech types occurring in all corpus regular expressions are collected, and the ratio of the tokens of each part of speech in each corpus regular expression is computed one by one. For some types, the ratio in one or more corpus regular expressions may be 0, especially when the "part of speech" is an original character, word, or phrase in a corpus regular expression.
The ratios of each part of speech across the corpus regular expressions are then compared. If, in a certain proportion of the corpus samples in the meaningless corpus, the ratio of tokens of one or more parts of speech exceeds a certain threshold, then the ratio of tokens of that part of speech (or those parts of speech) in a corpus exceeding the threshold is taken as the decision condition.
For example, if in 70% of the corpus samples in the meaningless corpus the ratio of "的" exceeds 40%, then the ratio of "的" in a corpus exceeding 40% is taken as the decision condition. The two thresholds in this example, 70% and 40%, are only illustrative; in practice the user can set them freely, and the two values may be the same or different.
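Deriving a decision condition from the two thresholds can be sketched as below. This is one possible reading of the step, not a definitive implementation: the function name `derive_conditions`, the parameter names, and the toy patterns (with "uh" as a filler token) are assumptions, and the defaults 0.7 and 0.4 simply mirror the example values.

```python
from collections import Counter
from typing import List, Set


def derive_conditions(
    patterns: List[List[str]],
    sample_share: float = 0.7,
    ratio_threshold: float = 0.4,
) -> Set[str]:
    """Return classes whose in-sample ratio exceeds ratio_threshold
    in at least sample_share of all corpus regular expressions."""
    if not patterns:
        return set()
    hits = Counter()
    for pattern in patterns:
        if not pattern:
            continue
        for cls, n in Counter(pattern).items():
            if n / len(pattern) > ratio_threshold:
                hits[cls] += 1  # this sample supports the class
    return {cls for cls, h in hits.items() if h / len(patterns) >= sample_share}


# Hypothetical corpus regular expressions.
patterns = [
    ["uh", "uh", "#noun#"],
    ["uh", "#verb#"],
    ["uh", "uh", "uh", "how"],
    ["#noun#", "#verb#", "how"],
]
conditions = derive_conditions(patterns)
print(conditions)
```

In this toy corpus "uh" exceeds the 40% ratio in three of four samples (75%), so it becomes the decision condition; raising `sample_share` to 0.8 would leave no condition.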
S230 Convert the parts of speech contained in the decision condition and the corresponding words into semantic slots.
Specifically, a subsequent user sentence is judged by comparing it with the decision condition above, so the parts of speech in the decision condition and the corresponding words are converted into semantic slots. For tokens that are represented by a part of speech in the corpus regular expression, only the part of speech is converted into a semantic slot; for tokens represented by an original character, word, or phrase, the part of speech and the corresponding words are converted into a semantic slot.
Continuing the example above, if the decision condition is that the ratio of "的" in a corpus exceeds 40%, then "的" and the semantically identical "地" and "得" are converted into a semantic slot. If the decision condition is that the ratio of adjectives exceeds 40%, then the part of speech "adjective" is converted into a semantic slot.
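One way to represent the resulting semantic slots is a small record that holds either a part of speech or a set of equivalent literal words. The `SemanticSlot` class, the `to_slot` helper, and the `#pos#` convention are hypothetical; the text does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class SemanticSlot:
    pos: Optional[str] = None             # part of speech matched by this slot, if any
    words: FrozenSet[str] = frozenset()   # literal words matched by this slot, if any


def to_slot(cls: str, equivalents: Dict[str, List[str]]) -> SemanticSlot:
    """Turn one class from the decision condition into a semantic slot."""
    if len(cls) > 2 and cls.startswith("#") and cls.endswith("#"):
        # Class expressed as a part of speech: only the part of speech is stored.
        return SemanticSlot(pos=cls.strip("#"))
    # Literal class: store the word together with its same-meaning variants.
    return SemanticSlot(words=frozenset([cls, *equivalents.get(cls, [])]))


adjective_slot = to_slot("#adjective#", {})
particle_slot = to_slot("的", {"的": ["地", "得"]})
```

The two calls mirror the two cases in the example: an adjective condition yields a slot holding only the part of speech, and a "的" condition yields a slot holding the three particles.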
S300 Obtain a user sentence.
S600 When the user sentence meets the decision condition, determine that the user sentence is meaningless.
S700 After the user sentence is determined to be meaningless, analyze the keywords of the user sentence and/or extract the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
In this embodiment, each corpus sample in the meaningless corpus is parsed one by one to obtain its corpus regular expression, and the corpus regular expressions of all corpus samples are analyzed statistically to obtain the decision condition, which ensures that meaningless corpus can be identified accurately.
The third embodiment of the present invention is a preferred implementation of the first and second embodiments above. As shown in Fig. 3, it comprises:
S100 Obtain a meaningless corpus, and summarize corpus regular expressions according to the meaningless corpus.
S200 Obtain, according to the corpus regular expressions, a decision condition for determining that a corpus is meaningless.
S300 Obtain a user sentence.
S400 Segment the user sentence according to the word segmentation technique, and convert it into a corresponding regular expression.
Specifically, the obtained user sentence is segmented according to the word segmentation technique: the structure of each sentence in the user sentence is determined first, and every sentence is then divided into tokens such as characters, words, and phrases according to the parts of speech of the words and the relations between the words, which yields the corresponding regular expression.
S500 Match the words in the regular expression and their corresponding parts of speech against the semantic slots.
Specifically, the words in the regular expression and their parts of speech are matched against the semantic slots. Each token (character, word, or phrase) in the regular expression may be represented either by its part of speech or by the original character, word, or phrase. For matching speed, the tokens represented by a part of speech are matched against the semantic slots first, and the tokens represented by original characters, words, and phrases are matched afterwards.
In practice, however, the order in which these two kinds of tokens are matched against the semantic slots does not affect the matching result, so the order can be chosen freely.
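The two-pass matching order can be sketched as follows. The sketch makes simplifying assumptions: each token carries both its word and its part of speech, all semantic slots are pooled into one set of parts of speech and one set of words, and the final check uses a single pooled ratio with an illustrative threshold of 0.4. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, part-of-speech label)


def match_slots(tokens: List[Token], slot_pos: Set[str], slot_words: Set[str]) -> List[bool]:
    """Mark which tokens match a semantic slot: part-of-speech pass first, then literal words."""
    matched = [pos in slot_pos for _, pos in tokens]   # pass 1: by part of speech
    for i, (word, _) in enumerate(tokens):             # pass 2: by literal word
        if not matched[i] and word in slot_words:
            matched[i] = True
    return matched


def meets_condition(tokens: List[Token], slot_pos: Set[str], slot_words: Set[str],
                    threshold: float = 0.4) -> bool:
    """True when the share of slot-matching tokens exceeds the threshold."""
    if not tokens:
        return False
    return sum(match_slots(tokens, slot_pos, slot_words)) / len(tokens) > threshold


tokens = [("uh", "interjection"), ("uh", "interjection"), ("play", "verb"), ("music", "noun")]
flags = match_slots(tokens, set(), {"uh"})
```

Swapping the two passes would produce the same flags, which is consistent with the remark that the matching order does not change the result.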
S600 When the user sentence meets the decision condition, determine that the user sentence is meaningless.
S700 After the user sentence is determined to be meaningless, analyze the keywords of the user sentence and/or extract the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
In this embodiment, the obtained user sentence is segmented with the same word segmentation technique to obtain the corresponding regular expression, and the parts of speech and corresponding words in the regular expression are then matched against the semantic slots to obtain a matching result, so that whether the user sentence is meaningless can be identified quickly and accurately.
The fourth embodiment of the present invention is the optimal enforcement example of above-mentioned first embodiment, as shown in Figure 4, comprising:
S100 obtains meaningless corpus, summarizes corpus regular expression according to the meaningless corpus.
S200 is obtained according to the corpus regular expression determines the meaningless decision condition of corpus.
S300 obtains user's sentence.
S600 determines that user's sentence is meaningless when user's sentence meets the decision condition.
S710: After the user sentence is determined to be meaningless, take the words corresponding to one or more parts of speech as keywords of the user sentence, and perform intent recommendation and/or voice guidance according to the keywords; and/or
Specifically, after the above determination finds the user sentence meaningless, the words corresponding to one or more parts of speech are selected as keywords in the priority order set by the user, and intent recommendation or voice guidance is then performed according to the keywords.
For example, suppose the user has chosen to take the words of a single part of speech as keywords, with adjectives preferred first, verbs second, and time words last. If the user sentence contains no adjectives and only verbs and time words, the words corresponding to the verbs are selected as keywords.
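As a non-limiting illustration, the priority-based keyword selection described above could be sketched as follows. The token format (pairs of word and part of speech), the tag names, and the function name `select_keywords` are assumptions made for this example; the source does not specify a data format or a segmenter.

```python
from typing import List, Tuple

# Assumed input format: each token is a (word, part_of_speech) pair
# produced by some upstream word segmenter (not specified in the source).
Token = Tuple[str, str]


def select_keywords(tokens: List[Token], priority: List[str]) -> List[str]:
    """Return the words of the highest-priority part of speech that is present."""
    for pos in priority:
        words = [word for word, tag in tokens if tag == pos]
        if words:
            return words
    return []  # none of the configured parts of speech occurs


# User-configured order from the example: adjective, then verb, then time word.
priority = ["adjective", "verb", "time word"]
tokens = [("describe", "verb"), ("autumn", "time word")]
print(select_keywords(tokens, priority))  # ['describe']
```

Because the sentence has no adjective, the sketch falls through to the verb, which mirrors the example in the text.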
S720: Exclude the words in the regular expression that match the semantic slots, extract the remaining words in the regular expression as the effective trunk of the user sentence, and perform intent recommendation and/or voice guidance according to the effective trunk.
Specifically, after the above determination finds the user sentence meaningless, it is also possible to exclude the words in the regular expression that match the semantic slots, extract the remaining words as the effective trunk, and then perform intent recommendation or voice guidance according to the effective trunk.
For example, if the decision condition is that the proportion of "的" in a corpus sample exceeds 40%, and the semantic slot covers the part of speech "的" together with the semantically equivalent "地" and "得", then every "的", "地", and "得" in the regular expression corresponding to the user sentence is excluded, and the remaining part is used as the effective trunk for recognizing the user's intent.
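The exclusion step in this example can be illustrated with a short sketch. It assumes the user sentence has already been segmented into a list of tokens; the sample tokens and the function name `extract_effective_trunk` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set

# Words covered by the semantic slot in the example above.
SLOT_WORDS: Set[str] = {"的", "地", "得"}


def extract_effective_trunk(tokens: Iterable[str], slot_words: Set[str]) -> List[str]:
    """Drop every token that matches the semantic slot; keep the rest in order."""
    return [token for token in tokens if token not in slot_words]


# Hypothetical segmented user sentence, used only for illustration.
user_tokens = ["的", "秋天", "的", "地", "作文", "得"]
print(extract_effective_trunk(user_tokens, SLOT_WORDS))  # ['秋天', '作文']
```

The remaining tokens would then be handed to whatever intent recognition step the system uses, which the source leaves open.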
In this embodiment, even after the obtained user sentence is determined to be meaningless, the user's true intent is still identified as far as possible by selecting keywords or extracting the effective trunk. Identifying the intent in this way removes the interference of some words and, to some extent, reduces the chance of misreading the user's intent.
The fifth embodiment of the present invention, as shown in Fig. 5, is an analysis system 100 for meaningless corpora, comprising:
Processing module 110, which obtains a meaningless corpus and summarizes corpus regular expressions from the meaningless corpus.
Specifically, processing module 110 collects a large number of meaningless corpus samples. A corpus sample may be standard written language, or it may be user speech, audio, and the like, because speech input and text input are both mainstream modes of interaction in human-computer interaction.
In addition, since the entire analysis operates on written text, if what is collected is a voice file such as user speech or audio, the voice file must first be converted into recognized text, and the recognized text is then processed accordingly.
Processing module 110 analyzes the words in each corpus sample and their parts of speech to obtain the corpus regular expression corresponding to each sample. In each corpus sample, words of specified parts of speech, such as verbs and adjectives, are expressed by their part of speech in the corresponding corpus regular expression. Other words that are not replaced by a part of speech, or words of particular parts of speech, are still represented by the original word in the corpus regular expression, for example "how" and "how many".
Control module 120, which derives, from the corpus regular expressions summarized by processing module 110, a decision condition for determining that a corpus sample is meaningless.
Specifically, each corpus sample yields a corresponding corpus regular expression by the method above. Control module 120 comprehensively analyzes the corpus regular expressions of all corpus samples and finds the features common to the meaningless corpus, thereby obtaining the decision condition for determining that a corpus sample is meaningless.
Because the meaningless corpus contains many corpus samples, a given feature may not be present in every corpus regular expression. The decision condition can therefore be defined, by system default or by user setting, as a feature shared by the corpus regular expressions of a certain number or a certain proportion of the corpus samples.
Acquisition module 130, which obtains a user sentence.
Determination module 160, which determines that the user sentence is meaningless when the user sentence obtained by acquisition module 130 satisfies the decision condition.
Specifically, acquisition module 130 obtains the user sentence. If the user enters text through the interactive system, determination module 160 directly matches the entered user sentence against the decision condition. If the result is a match, the entered user sentence is determined to be meaningless. If the result is not a match, the entered user sentence is determined to have some practical meaning, so it is parsed to identify the user's true intent and give corresponding feedback.
If the user chooses voice interaction through the interactive system, determination module 160 first converts the user sentence into recognized text and then matches the recognized text against the decision condition. If the result is a match, the entered user sentence is determined to be meaningless. If the result is not a match, the user sentence has practical meaning, and the user's intent is identified from the user sentence to give corresponding feedback.
Analysis module 170, which, after determination module 160 determines that the user sentence is meaningless, analyzes keywords of the user sentence and/or extracts the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
Specifically, when the above result shows that the obtained user sentence is meaningless, analysis module 170 analyzes the user sentence, or the recognized text converted from it, to obtain its keywords, or filters it to obtain the effective trunk. It then judges the user's intent from the keywords or the effective trunk and performs the corresponding intent recommendation or voice guidance.
In this embodiment, a meaningless corpus is formed by collecting a large number of meaningless corpus samples, corpus regular expressions are summarized from it, and the decision condition for determining that a corpus sample is meaningless is derived from them. A decision condition built on a large number of samples can screen out meaningless user sentences more accurately and reduces the chance of omissions or errors.
In addition, after the user sentence is determined to be meaningless, the keywords or effective trunk in the user sentence are still analyzed to obtain the user's true intent, and intent recommendation or voice guidance is then performed, which avoids giving irrelevant feedback based on the original user sentence.
The sixth embodiment of the present invention is an optimized embodiment of the fifth embodiment above. As shown in Fig. 6, it comprises:
Processing module 110, which obtains a meaningless corpus and summarizes corpus regular expressions from the meaningless corpus.
Processing module 110 specifically includes:
Word segmentation unit 111, which obtains the meaningless corpus and segments the corpus samples in it using a word segmentation technique to obtain the words contained in each corpus sample and their parts of speech.
Specifically, word segmentation unit 111 obtains the meaningless corpus and segments the corpus samples in it using the word segmentation technique. If a corpus sample is a voice file such as user speech or audio, the voice file must first be converted into recognized text, and the recognized text is then segmented.
The word segmentation technique works as follows: the structure of each sentence in the corpus sample is determined first, and each sentence in the corpus sample is then divided into tokens such as characters, words, and phrases according to the parts of speech of the words and the relationships between them.
Processing unit 112, which summarizes corpus regular expressions from corpus features, the corpus features including the words and parts of speech obtained by word segmentation unit 111.
Specifically, segmenting a corpus sample with the technique above yields several corpus features, from which processing unit 112 summarizes the corpus regular expression. The corpus features are the tokens (characters, words, and phrases) produced by segmentation, the part of speech of each token, and the relationships between the tokens within the sentence of the corpus sample.
In the corresponding corpus regular expression, each token may be expressed either by its part of speech or by the original character, word, or phrase. Which form is used can be set by system default or by the user.
For example, take the corpus sample "What compositions describing autumn are there?". Word segmentation gives the parts of speech of the words it contains: describe (verb), autumn (time word), 的 (auxiliary word), composition (noun), have (verb), which (pronoun). The relationships between the words are: attributive-head relation: composition (noun) - describe (verb); verb-object relation: describe (verb) - autumn (time word), have (verb) - which (pronoun). Some of the words are replaced by their part of speech while the others remain as the original words, so the corpus regular expression corresponding to this sample is: describe #time word# 的 #noun# have which.
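One way to picture how such a corpus regular expression could be built from tagged tokens is sketched below. The representation is an assumption: the pattern is kept as a list of elements in which `#pos#` marks a token replaced by its part of speech, and the set of parts of speech to abstract (`abstract_pos`) stands in for the default or user setting mentioned above. English glosses are used for the words of the example.

```python
from typing import List, Set, Tuple


def build_corpus_pattern(tagged: List[Tuple[str, str]], abstract_pos: Set[str]) -> List[str]:
    """Replace tokens of the selected parts of speech by '#pos#'; keep the others as-is."""
    pattern = []
    for word, pos in tagged:
        pattern.append(f"#{pos}#" if pos in abstract_pos else word)
    return pattern


# Tagged tokens of the example sample (English glosses of the original words).
sample = [
    ("describe", "verb"),
    ("autumn", "time word"),
    ("的", "auxiliary word"),
    ("composition", "noun"),
    ("have", "verb"),
    ("which", "pronoun"),
]
pattern = build_corpus_pattern(sample, {"time word", "noun"})
print(" ".join(pattern))  # describe #time word# 的 #noun# have which
```

Keeping the pattern as a list rather than a single string makes the later counting and matching steps straightforward, but that is a design choice of this sketch, not something the source prescribes.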
Control module 120, which derives, from the corpus regular expressions summarized by processing module 110, a decision condition for determining that a corpus sample is meaningless.
Control module 120 specifically includes:
Statistics unit 121, which counts the types and quantities of the parts of speech contained in the corpus regular expressions.
Specifically, based on the part of speech of each token in each corpus regular expression, statistics unit 121 counts the types of parts of speech contained in that expression and the number of tokens (characters, words, and phrases) belonging to each part of speech, and then calculates the proportion of the tokens of each part of speech among all tokens.
Tokens that are not expressed by a part of speech in the corpus regular expression can be counted directly by category as the original characters, words, and phrases; that is, the original character, word, or phrase is itself treated as a "part of speech".
For tokens of this kind of "part of speech", people phrase things differently, so completely identical tokens are rarely encountered during counting. The semantics of the tokens therefore has to be considered, and tokens with the same meaning are grouped into one category, for example "的", "地", and "得", or the different words meaning "and".
For example, for the corpus sample "What compositions describing autumn are there?", the corresponding corpus regular expression is: describe #time word# 的 #noun# have which. Counting gives one "describe", one "time word", one "的", one "noun", and one "have which"; "describe", "的", and "have which" are treated as parts of speech at the same level as "time word" and "noun".
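The counting step can be sketched as below. This is only an illustration under stated assumptions: the corpus regular expression is represented as a list of elements, literal tokens are counted as their own category, and the `CANONICAL` table that groups semantically equivalent tokens is a hypothetical stand-in for whatever semantic grouping the system actually uses.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical grouping of semantically equivalent literal tokens.
CANONICAL: Dict[str, str] = {"地": "的", "得": "的"}


def count_categories(elements: List[str]) -> Counter:
    """Count each category, mapping equivalent literals to one representative."""
    return Counter(CANONICAL.get(element, element) for element in elements)


def category_ratios(elements: List[str]) -> Dict[str, float]:
    """Proportion of each category among all elements of one expression."""
    counts = count_categories(elements)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


pattern = ["describe", "#time word#", "的", "#noun#", "have which"]
print(dict(count_categories(pattern)))
# {'describe': 1, '#time word#': 1, '的': 1, '#noun#': 1, 'have which': 1}
```

Each category occurs once in this sample, so every ratio is one fifth, matching the counts given in the example.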
Control unit 122, which analyzes the types and quantities of the parts of speech in all the corpus regular expressions it analyzes and obtains the decision condition for determining that a corpus sample is meaningless, the decision condition being that the number of words of one or more parts of speech reaches a threshold.
Specifically, using the types of parts of speech contained in each individual corpus regular expression and the count or proportion of each part of speech obtained from the statistics above, control unit 122 analyzes the types and quantities of the parts of speech contained in all corpus regular expressions and obtains the decision condition for determining that a corpus sample is meaningless.
All types of parts of speech contained in all corpus regular expressions are collected, and the proportion in which tokens of each part of speech appear in each corpus regular expression is counted one by one. Some types of part of speech may have a proportion of 0 in one or more corpus regular expressions, particularly when the "part of speech" is an original character, word, or phrase in a corpus regular expression.
The proportions of each part of speech across the corpus regular expressions are then compared. If, in a certain proportion of the corpus samples in the meaningless corpus, the proportion of tokens of one or more parts of speech exceeds a certain threshold, then the condition that the proportion of tokens of that part of speech (or those parts of speech) in a corpus sample exceeds the threshold is taken as the decision condition.
For example, if in 70% of the corpus samples in the meaningless corpus the proportion of "的" exceeds 40%, then the proportion of "的" in a corpus sample exceeding 40% is taken as the decision condition. The two thresholds 70% and 40% in this example are illustrative only; in practice the user can set them freely, and the two values may or may not be the same.
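A minimal sketch of how the two thresholds might interact is given below. It assumes each corpus sample has already been reduced to a dictionary of category proportions; the function name, the parameter names, and the choice of a strict comparison for the proportion and a non-strict one for the sample share are assumptions of this example, since the source only says "exceeds" and "a certain proportion".

```python
from typing import Dict, List


def derive_decision_condition(
    sample_ratios: List[Dict[str, float]],
    ratio_threshold: float = 0.4,   # e.g. the 40% threshold in the example
    sample_share: float = 0.7,      # e.g. the 70% threshold in the example
) -> List[str]:
    """Return categories whose proportion exceeds ratio_threshold
    in at least sample_share of the corpus samples."""
    if not sample_ratios:
        return []
    categories = {c for ratios in sample_ratios for c in ratios}
    selected = []
    for category in sorted(categories):
        # A category missing from a sample counts as proportion 0.
        hits = sum(1 for ratios in sample_ratios
                   if ratios.get(category, 0.0) > ratio_threshold)
        if hits / len(sample_ratios) >= sample_share:
            selected.append(category)
    return selected


# Illustrative data: 8 of 10 samples are dominated by "的".
samples = [{"的": 0.5, "#noun#": 0.25, "#verb#": 0.25}] * 8
samples += [{"#noun#": 0.25, "#verb#": 0.25, "how": 0.25, "how many": 0.25}] * 2
print(derive_decision_condition(samples))  # ['的']
```

The returned categories, together with `ratio_threshold`, would form the decision condition that is later converted into semantic slots.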
Conversion unit 123, which converts the parts of speech and corresponding words contained in the decision condition obtained by statistics unit 121 into semantic slots.
Specifically, because the subsequent determination of a user sentence is a comparison against the decision condition, conversion unit 123 converts the parts of speech and corresponding words in the decision condition into semantic slots. For tokens expressed by a part of speech in the corpus regular expression, only the part of speech is converted into a semantic slot; for tokens expressed as original characters, words, or phrases, both the part of speech and the corresponding words are converted into a semantic slot.
Continuing the example above, where the proportion of "的" in a corpus sample exceeding 40% is the decision condition, the part of speech "的" and the semantically equivalent "地" and "得" are converted into a semantic slot. If the decision condition is that the proportion of adjectives exceeds 40%, the part of speech "adjective" is converted into a semantic slot.
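The two conversion cases can be illustrated with a small data structure. Everything here is hypothetical: the source does not define what a semantic slot looks like, so the `SemanticSlot` fields, the `#pos#` notation for categories expressed by part of speech, and the `SAME_MEANING` table are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class SemanticSlot:
    pos: str                 # part of speech (or the literal treated as one)
    words: FrozenSet[str]    # concrete words covered; empty for pure POS slots
    threshold: float         # proportion the category must exceed


# Hypothetical table of literals that share the same meaning.
SAME_MEANING: Dict[str, FrozenSet[str]] = {"的": frozenset({"的", "地", "得"})}


def to_semantic_slot(category: str, threshold: float) -> SemanticSlot:
    """Convert one category of the decision condition into a semantic slot."""
    if len(category) > 2 and category.startswith("#") and category.endswith("#"):
        # Expressed by part of speech: only the part of speech is converted.
        return SemanticSlot(category.strip("#"), frozenset(), threshold)
    # Expressed as an original word: convert it with its equivalent words.
    words = SAME_MEANING.get(category, frozenset({category}))
    return SemanticSlot(category, words, threshold)


print(to_semantic_slot("#adjective#", 0.4).pos)        # adjective
print(sorted(to_semantic_slot("的", 0.4).words))        # ['地', '得', '的']
```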
Acquisition module 130, which obtains a user sentence.
Determination module 160, which determines that the user sentence is meaningless when the user sentence obtained by acquisition module 130 satisfies the decision condition.
Analysis module 170, which, after determination module 160 determines that the user sentence is meaningless, analyzes keywords of the user sentence and/or extracts the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
In this embodiment, each corpus sample in the meaningless corpus is parsed one by one to obtain its corpus regular expression, the corpus regular expressions of all corpus samples are statistically analyzed, and the decision condition is then derived, thereby ensuring that meaningless corpora can be identified accurately.
The seventh embodiment of the present invention is an optimized embodiment of the fifth and sixth embodiments above. As shown in Fig. 7, it comprises:
Processing module 110, which obtains a meaningless corpus and summarizes corpus regular expressions from the meaningless corpus.
Control module 120, which derives, from the corpus regular expressions summarized by processing module 110, a decision condition for determining that a corpus sample is meaningless.
Acquisition module 130, which obtains a user sentence.
Word segmentation module 140, which segments the user sentence using the word segmentation technique and converts it into a corresponding regular expression.
Specifically, word segmentation module 140 segments the obtained user sentence using the word segmentation technique: it first determines the structure of each sentence in the user sentence, and then divides each sentence into tokens such as characters, words, and phrases according to the parts of speech of the words and the relationships between them, thereby obtaining the corresponding regular expression.
Matching module 150, which matches the words and corresponding parts of speech in the regular expression produced by word segmentation module 140 against the semantic slots.
Specifically, matching module 150 matches the words and corresponding parts of speech in the regular expression against the semantic slots. Each token in the regular expression may be expressed either by its part of speech or by the original character, word, or phrase. For matching speed, tokens expressed by their part of speech are matched against the semantic slots first, and tokens expressed as original characters, words, and phrases are then matched against the semantic slots.
In practice, however, the order in which the tokens expressed by part of speech and the tokens expressed as original characters, words, and phrases are matched against the semantic slots does not affect the matching result, so the order can be chosen freely.
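The two-pass matching and the subsequent determination could look roughly like the sketch below. The list representation of the user's regular expression, the `#pos#` notation, and the split of a semantic slot into a set of parts of speech and a set of words are assumptions of this example. The sketch checks part-of-speech elements first and literal elements second, as the text suggests, although the result would be the same in either order.

```python
from typing import List, Set


def is_pos_element(element: str) -> bool:
    """True for elements written as '#pos#' (assumed notation)."""
    return len(element) > 2 and element.startswith("#") and element.endswith("#")


def is_meaningless(elements: List[str], slot_pos: Set[str],
                   slot_words: Set[str], threshold: float) -> bool:
    """Match elements against the slot and test the proportion threshold."""
    if not elements:
        return False
    # Pass 1: elements expressed by part of speech.
    hits = sum(1 for e in elements if is_pos_element(e) and e.strip("#") in slot_pos)
    # Pass 2: elements expressed as original characters, words, or phrases.
    hits += sum(1 for e in elements if not is_pos_element(e) and e in slot_words)
    return hits / len(elements) > threshold


slot_words = {"的", "地", "得"}
print(is_meaningless(["的", "地", "得", "#noun#", "have"], set(), slot_words, 0.4))  # True
print(is_meaningless(["describe", "#time word#", "的", "#noun#", "have which"],
                     set(), slot_words, 0.4))  # False
```

In the first call three of five elements match the slot (60%), which exceeds the 40% threshold; in the second only one of five matches (20%).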
Determination module 160, which determines that the user sentence is meaningless when the user sentence obtained by acquisition module 130 satisfies the decision condition.
Analysis module 170, which, after determination module 160 determines that the user sentence is meaningless, analyzes keywords of the user sentence and/or extracts the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
In this embodiment, the obtained user sentence is segmented using the same word segmentation technique to obtain the corresponding regular expression. The parts of speech and corresponding words contained in the regular expression are then matched against the semantic slots to obtain a matching result, so that whether the user sentence is meaningless can be identified quickly and accurately.
The eighth embodiment of the present invention is an optimized embodiment of the fifth embodiment above. As shown in Fig. 8, it comprises:
Processing module 110, which obtains a meaningless corpus and summarizes corpus regular expressions from the meaningless corpus.
Control module 120, which derives, from the corpus regular expressions summarized by processing module 110, a decision condition for determining that a corpus sample is meaningless.
Acquisition module 130, which obtains a user sentence.
Determination module 160, which determines that the user sentence is meaningless when the user sentence obtained by acquisition module 130 satisfies the decision condition.
Analysis module 170, which, after determination module 160 determines that the user sentence is meaningless, analyzes keywords of the user sentence and/or extracts the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
Analysis module 170 specifically includes:
Analysis unit 171, which, after the user sentence is determined to be meaningless, takes the words corresponding to one or more parts of speech as keywords of the user sentence.
Execution unit 172, which performs intent recommendation and/or voice guidance according to the keywords; and/or
Specifically, after the above determination finds the user sentence meaningless, analysis unit 171 selects the words corresponding to one or more parts of speech as keywords in the priority order set by the user, and execution unit 172 then performs intent recommendation or voice guidance according to the keywords.
For example, suppose the user has chosen to take the words of a single part of speech as keywords, with adjectives preferred first, verbs second, and time words last. If the user sentence contains no adjectives and only verbs and time words, the words corresponding to the verbs are selected as keywords.
Analysis unit 171 excludes the words in the regular expression that match the semantic slots and extracts the remaining words in the regular expression as the effective trunk of the user sentence.
Execution unit 172 performs intent recommendation and/or voice guidance according to the effective trunk.
Specifically, after the above determination finds the user sentence meaningless, analysis unit 171 may also exclude the words in the regular expression that match the semantic slots and extract the remaining words as the effective trunk, and execution unit 172 then performs intent recommendation or voice guidance according to the effective trunk.
For example, if the decision condition is that the proportion of "的" in a corpus sample exceeds 40%, and the semantic slot covers the part of speech "的" together with the semantically equivalent "地" and "得", then every "的", "地", and "得" in the regular expression corresponding to the user sentence is excluded, and the remaining part is used as the effective trunk for recognizing the user's intent.
In this embodiment, even after the obtained user sentence is determined to be meaningless, the user's true intent is still identified as far as possible by selecting keywords or extracting the effective trunk. Identifying the intent in this way removes the interference of some words and, to some extent, reduces the chance of misreading the user's intent.
It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.
Claims (10)
1. A method for analyzing meaningless corpora, characterized by comprising:
obtaining a meaningless corpus, and summarizing corpus regular expressions from the meaningless corpus;
deriving, from the corpus regular expressions, a decision condition for determining that a corpus sample is meaningless;
obtaining a user sentence;
when the user sentence satisfies the decision condition, determining that the user sentence is meaningless;
after the user sentence is determined to be meaningless, analyzing keywords of the user sentence and/or extracting the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
2. The method for analyzing meaningless corpora according to claim 1, characterized in that obtaining the meaningless corpus and summarizing corpus regular expressions from the meaningless corpus specifically comprises:
obtaining the meaningless corpus, and segmenting the corpus samples in the meaningless corpus using a word segmentation technique to obtain the words contained in the corpus samples and their corresponding parts of speech;
summarizing corpus regular expressions from corpus features, the corpus features including the words and the parts of speech.
3. The method for analyzing meaningless corpora according to claim 2, characterized in that deriving, from the corpus regular expressions, the decision condition for determining that a corpus sample is meaningless specifically comprises:
counting the types and quantities of the parts of speech contained in the corpus regular expressions;
analyzing the types and quantities of the parts of speech in all the corpus regular expressions to obtain the decision condition for determining that a corpus sample is meaningless, the decision condition being that the number of words of one or more parts of speech reaches a threshold;
converting the parts of speech and corresponding words contained in the decision condition into semantic slots.
4. The method for analyzing meaningless corpora according to claim 3, characterized in that, after obtaining the user sentence and before determining that the user sentence is meaningless when the user sentence satisfies the decision condition, the method comprises:
segmenting the user sentence using the word segmentation technique and converting it into a corresponding regular expression;
matching the words and corresponding parts of speech in the regular expression against the semantic slots.
5. The method for analyzing meaningless corpora according to claim 4, characterized in that, after the user sentence is determined to be meaningless, analyzing keywords of the user sentence and/or extracting the effective trunk of the user sentence to perform intent recommendation and/or voice guidance specifically comprises:
after the user sentence is determined to be meaningless, taking the words corresponding to one or more parts of speech as keywords of the user sentence, and performing intent recommendation and/or voice guidance according to the keywords; and/or
excluding the words in the regular expression that match the semantic slots, extracting the remaining words in the regular expression as the effective trunk of the user sentence, and performing intent recommendation and/or voice guidance according to the effective trunk.
6. A system for analyzing meaningless corpora, characterized by comprising:
a processing module, which obtains a meaningless corpus and summarizes corpus regular expressions from the meaningless corpus;
a control module, which derives, from the corpus regular expressions summarized by the processing module, a decision condition for determining that a corpus sample is meaningless;
an acquisition module, which obtains a user sentence;
a determination module, which determines that the user sentence is meaningless when the user sentence obtained by the acquisition module satisfies the decision condition;
an analysis module, which, after the determination module determines that the user sentence is meaningless, analyzes keywords of the user sentence and/or extracts the effective trunk of the user sentence to perform intent recommendation and/or voice guidance.
7. The system for analyzing meaningless corpora according to claim 6, characterized in that the processing module specifically comprises:
a word segmentation unit, which obtains the meaningless corpus and segments the corpus samples in the meaningless corpus using a word segmentation technique to obtain the words contained in the corpus samples and their corresponding parts of speech;
a processing unit, which summarizes corpus regular expressions from corpus features, the corpus features including the words and the parts of speech obtained by the word segmentation unit.
8. The system for analyzing meaningless corpora according to claim 7, characterized in that the control module specifically comprises:
a statistics unit, which counts the types and quantities of the parts of speech contained in the corpus regular expressions;
a control unit, which analyzes the types and quantities of the parts of speech in all the corpus regular expressions analyzed by the control unit to obtain the decision condition for determining that a corpus sample is meaningless, the decision condition being that the number of words of one or more parts of speech reaches a threshold;
a conversion unit, which converts the parts of speech and corresponding words contained in the decision condition obtained by the statistics unit into semantic slots.
9. The system for analyzing meaningless corpora according to claim 8, characterized by further comprising:
a word segmentation module, which segments the user sentence using the word segmentation technique and converts it into a corresponding regular expression;
a matching module, which matches the words and corresponding parts of speech in the regular expression produced by the word segmentation module against the semantic slots.
10. The system for analyzing meaningless corpora according to claim 9, characterized in that the analysis module specifically comprises:
an analysis unit, which, after the user sentence is determined to be meaningless, takes the words corresponding to one or more parts of speech as keywords of the user sentence;
an execution unit, which performs intent recommendation and/or voice guidance according to the keywords; and/or
the analysis unit excludes the words in the regular expression that match the semantic slots and extracts the remaining words in the regular expression as the effective trunk of the user sentence;
the execution unit performs intent recommendation and/or voice guidance according to the effective trunk.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811260440.5A CN109446527B (en) | 2018-10-26 | 2018-10-26 | Nonsensical corpus analysis method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109446527A true CN109446527A (en) | 2019-03-08 |
CN109446527B CN109446527B (en) | 2023-10-20 |
Family
ID=65548680
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811260440.5A Active CN109446527B (en) | 2018-10-26 | 2018-10-26 | Nonsensical corpus analysis method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109446527B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112732897A (en) * | 2020-12-28 | 2021-04-30 | 平安科技(深圳)有限公司 | Document processing method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101784022A (en) * | 2009-01-16 | 2010-07-21 | 北京炎黄新星网络科技有限公司 | Method and system for filtering and classifying short messages |
CN102801859A (en) * | 2012-08-03 | 2012-11-28 | 陈伟 | Method and device for identifying junk short message, and mobile communication terminal with device |
CN105096942A (en) * | 2014-05-21 | 2015-11-25 | 清华大学 | Semantic analysis method and semantic analysis device |
CN106681980A (en) * | 2015-11-05 | 2017-05-17 | 中国移动通信集团公司 | Method and device for analyzing junk short messages |
US20170177715A1 (en) * | 2015-12-21 | 2017-06-22 | Adobe Systems Incorporated | Natural Language System Question Classifier, Semantic Representations, and Logical Form Templates |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112732897A (en) * | 2020-12-28 | 2021-04-30 | 平安科技(深圳)有限公司 | Document processing method and device, electronic equipment and storage medium |
WO2022142116A1 (en) * | 2020-12-28 | 2022-07-07 | 平安科技(深圳)有限公司 | Method and apparatus for processing document, and electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109446527B (en) | 2023-10-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108376151B (en) | Question classification method and device, computer equipment and storage medium | |
WO2018066445A1 (en) | Causal relationship recognition apparatus and computer program therefor | |
Handler et al. | Bag of what? simple noun phrase extraction for text analysis | |
US9697821B2 (en) | Method and system for building a topic specific language model for use in automatic speech recognition | |
CN108388553B (en) | Method for eliminating ambiguity in conversation, electronic equipment and kitchen-oriented conversation system | |
CN106601259A (en) | Voiceprint search-based information recommendation method and device | |
CN105912629A (en) | Intelligent question and answer method and device | |
CN108345686A (en) | A kind of data analysing method and system based on search engine technique | |
CN110472203B (en) | Article duplicate checking and detecting method, device, equipment and storage medium | |
CN106528655A (en) | Text subject recognition method and device | |
KR20190118904A (en) | Topic modeling multimedia search system based on multimedia analysis and method thereof | |
US20040158558A1 (en) | Information processor and program for implementing information processor | |
CN113779983B (en) | Text data processing method and device, storage medium and electronic device | |
CN109697676A (en) | Customer analysis and application method and device based on social group | |
CN109446527A (en) | Meaningless corpus analysis method and system | |
CN111858900B (en) | Method, device, equipment and storage medium for generating question semantic parsing rule template | |
CN112861510A (en) | Summary processing method, apparatus, device and storage medium | |
CN106484672A (en) | Vocabulary recognition methods and vocabulary identifying system | |
CN107784024B (en) | Construct the method and device of party's portrait | |
CN109800430B (en) | Semantic understanding method and system | |
CN108475265B (en) | Method and device for acquiring unknown words | |
CN112231440A (en) | Voice search method based on artificial intelligence | |
CN104346336A (en) | Machine text mutual-curse based emotional venting method and system | |
CN114841143A (en) | Voice room quality evaluation method and device, equipment, medium and product thereof | |
CN114302227A (en) | Method and system for collecting and analyzing network video based on container collection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||