WO2014017023A1 - Cause expression extraction device, cause expression extraction method, and cause expression extraction program - Google Patents

Cause expression extraction device, cause expression extraction method, and cause expression extraction program

Info

Publication number
WO2014017023A1
Authority
WO
WIPO (PCT)
Prior art keywords
expression
cause
result
candidate
cause expression
Prior art date
Application number
PCT/JP2013/004022
Other languages
French (fr)
Japanese (ja)
Inventor
定政 邦彦
享 赤峯
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Publication of WO2014017023A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Definitions

  • the present invention relates to a cause expression extraction apparatus, a cause expression extraction method, and a cause expression extraction program for extracting an expression describing a cause that has led to a certain event or state from input expressions.
  • an expression that serves as a clue when extracting the cause expression (hereinafter referred to as a clue expression) is prepared in advance, and in machine learning, whether or not the clue expression occurs is identified.
  • for example, in Japanese, clue expressions include “because of (ga genninn de)”, “by (niyori)”, “de (de)”, and the like. “ga genninn de”, “niyori”, and “de” in Japanese correspond to “because of” in English.
  • a causal relationship pair is a pair of expressions that are likely to be in a causal relationship.
  • whether or not such a pair appears is one of the features in machine learning.
  • causal pairs include “killing people (hito wo korosu) – arrest (taiho)”, “election violation (senkyo ihan) – arrest (taiho)”, and the like.
  • “hito wo korosu”, “taiho”, and “senkyo ihan” in Japanese correspond to “kill a person”, “arrest”, and “election offenses” in English, respectively.
  • a characteristic used when predicting the cause expression by machine learning is called a “feature”.
  • This Japanese sentence corresponds to the sentence “A traffic accident occurred because of drunken driving” in English. “A traffic accident occurred in Tamagawa. (Tamagawa de koutsuu jiko ga okotta.)”... Cause expression is not included. This Japanese sentence corresponds to the sentence “A traffic accident occurred at Tamagawa” in English.
  • emotional expression analysis is a technology that automatically classifies impressions that human beings generally receive for certain expressions. This classification category is called polarity. Examples of polarities of expression include “positive (good impression)”, “negative (bad impression)”, and “neutral (other than that)”.
  • Non-Patent Document 1 describes reversal of the polarity of expression.
  • Non-Patent Document 3 is given as an example of emotion expression analysis.
  • JP 2009-157771 A paragraphs 0032, 0069, 0072, 0073, etc.
  • Patent Document 1 uses a causal relationship pair.
  • In order to collect pairs of expressions, it is necessary to select pairs from a number of candidates on the order of the square of the number of expressions. Therefore, it is difficult to perform this operation manually.
  • Even when large-scale data such as a web corpus is used, a problem of data sparseness is likely to occur because pairs of expressions are handled, and there is a high possibility that selection cannot be performed with high accuracy.
  • an object of the present invention is to provide a cause expression extraction device, a cause expression extraction method, and a cause expression extraction program that can extract a cause expression from input data when input data relating to a certain event or condition is given and the input data includes a cause expression describing the cause of the event or condition.
  • the cause expression extraction apparatus includes polarity determination means and cause expression determination means.
  • when given a result expression that is an expression representing a result and a cause expression candidate that is a candidate of a cause expression that describes the cause of the result, the polarity determination means determines, for each of the result expression and the cause expression candidate, the polarity, which is a category used when classifying impressions received by a human being from an expression.
  • the cause expression determination means determines, using the polarity of the result expression and the polarity of the cause expression candidate, whether or not the cause expression candidate is a cause expression describing the cause of the result.
  • in the cause expression extraction method, when a result expression that is an expression representing a result and a cause expression candidate that is a candidate of a cause expression that describes the cause of the result are given, the polarity, which is a category used when classifying impressions that a human being receives from an expression, is determined for each of the result expression and the cause expression candidate.
  • the method is characterized by then determining, using the polarity of the result expression and the polarity of the cause expression candidate, whether the cause expression candidate is a cause expression describing the cause of the result.
  • the cause expression extraction program causes a computer, when given a result expression that is an expression representing a result and a cause expression candidate that is a candidate of a cause expression that describes the cause of the result, to determine the polarity of each of the result expression and the cause expression candidate and to determine, using those polarities, whether the cause expression candidate is a cause expression describing the cause of the result.
  • the cause expression can be extracted from the input data.
  • FIG. 1 is a block diagram illustrating an example of a cause expression extraction apparatus according to the first embodiment of this invention.
  • the cause expression extraction apparatus of this embodiment includes an input unit 1, a data processing device 2 that operates under program control, and an output unit 4.
  • the data processing device 2 includes emotion analysis means 21 and cause expression determination means 22.
  • the input means 1 is an input interface through which input data is input.
  • a combination of a result expression that is an expression representing a result of a causal relationship and a cause expression candidate that is a candidate for an expression (cause expression) describing the cause of the result is input via the input unit 1.
  • the result expression represents, for example, an event or a state that is a result of the causal relationship.
  • Result expressions may include expressions that express a state of mind (for example, emotional expressions such as “okoru” and “hotto suru” in Japanese) and expressions that express evaluations and opinions (for example, “utsukusii” and “yoi” in Japanese). “Okoru”, “hotto suru”, “utsukusii”, and “yoi” in Japanese correspond to “get angry”, “breathe freely”, “beautiful”, and “good” in English, respectively.
  • the result expression and the cause expression candidate may be a character string, or may be structured data such as a morphological analysis result or a syntax analysis result.
  • the emotion analysis means 21 determines the polarity for each of the result expression and the cause expression candidate input via the input means 1.
  • polarity is a classification category of impressions that human beings generally receive for expression.
  • the polarity includes “positive”, “negative”, and “neutral”.
  • the following examples are given as examples of polarity.
  • the emotion analysis unit 21 may store in advance a dictionary that lists expressions corresponding to “negative”, expressions corresponding to “neutral”, and expressions corresponding to “positive”, and may determine the polarity of an input expression (a result expression or a cause expression candidate) by pattern matching the input expression against the expressions listed in the dictionary.
  • the emotion analysis means 21 may determine whether or not the character strings and the dependency structures match exactly, or may perform looser pattern matching using a regular expression, a synonym dictionary, or a thesaurus.
  • alternatively, a model that statistically determines the polarity of an expression may be learned from data, and the polarity of the input expression may be determined from the input expression and the model.
  • as an example of data in which a polarity is given to a document, a customer review published on the Internet can be cited.
  • the emotion analysis means 21 may acquire the polarity of an expression by a method other than the above. For example, when there is a server that provides a polarity determination result for each expression, the emotion analysis unit 21 may query the server for the polarity corresponding to the input expression and acquire the polarity of the input expression from the server.
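  • The dictionary-based polarity determination described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation disclosed in the application: the dictionary entries, the name `POLARITY_PATTERNS`, and the function `determine_polarity` are hypothetical, and the loose matching is approximated with regular expressions over romanized text.

```python
import re

# Hypothetical polarity dictionary: (regular expression, polarity).
# The entries are illustrative assumptions, not taken from the application.
POLARITY_PATTERNS = [
    (re.compile(r"era-"), "negative"),                          # "error"
    (re.compile(r"memori\s*(ga\s*)?(bu|fu)soku"), "negative"),  # "memory shortage"
    (re.compile(r"utsukusii|yoi"), "positive"),                 # "beautiful", "good"
]


def determine_polarity(expression: str) -> str:
    """Return the polarity of the first matching entry, or 'neutral' if none match."""
    for pattern, polarity in POLARITY_PATTERNS:
        if pattern.search(expression):
            return polarity
    return "neutral"
```

  • With these assumed entries, `determine_polarity("memori ga fusoku shite ita")` yields "negative", while an expression that matches no entry, such as "tekisuto kensaku wo okonatta", falls back to "neutral".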
  • the cause expression determination unit 22 determines whether or not the cause expression candidate is a cause expression representing the cause of the result represented by the result expression, using the polarity of the result expression and the polarity of the cause expression candidate. Preferably, the cause expression determination unit 22 determines whether the cause expression candidate is the cause expression based on the consistency between the polarity of the result expression and the polarity of the cause expression candidate. For example, the cause expression determination unit 22 may determine that the cause expression candidate is the cause expression when the polarity of the result expression matches the polarity of the cause expression candidate.
  • the cause expression determination means 22 may calculate not only a discrete determination result indicating whether or not the cause expression candidate is a cause expression, but also a value indicating how much the cause expression candidate is a cause expression.
  • the cause expression determination means 22 may additionally refer to the presence / absence of a clue expression, the learning result obtained by machine learning, and the like when determining whether the cause expression candidate is a cause expression.
  • the output unit 4 is an output device (for example, a display device) that outputs the determination result of the cause expression determination unit 22.
  • the cause expression determination unit 22 causes the output unit 4 to output (for example, display) the result of determining whether the cause expression candidate is the cause expression.
  • the emotion analysis means 21 and the cause expression determination means 22 are realized by a CPU of a computer that operates according to a cause expression extraction program, for example.
  • the CPU may read the cause expression extraction program and operate as the emotion analysis means 21 and the cause expression determination means 22 according to the program.
  • the cause expression extraction program may be recorded on a computer-readable recording medium.
  • emotion analysis means 21 and the cause expression determination means 22 may be realized as separate hardware.
  • FIG. 2 is a flowchart showing an example of processing progress of the first embodiment of the present invention.
  • the emotion analysis means 21 determines the polarity of the result expression and the cause expression candidate (step S1).
  • the cause expression determination unit 22 uses the polarity of the result expression and the polarity of the cause expression candidate derived in step S1 to determine whether or not the cause expression candidate is a cause expression representing the cause of the result represented by the result expression (step S2). For example, if the polarity of the result expression matches the polarity of the cause expression candidate, the cause expression determination unit 22 may determine that the cause expression candidate is a cause expression representing the cause of the result represented by the result expression. When the polarity of the result expression and the polarity of the cause expression candidate do not match, it may determine that the cause expression candidate is not a cause expression. In step S2, the cause expression determination unit 22 causes the output unit 4 to output (for example, display) the determination result.
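  • Steps S1 and S2 could be sketched as below. The lexicon `LEXICON` and the function names are hypothetical stand-ins for the emotion analysis means and the cause expression determination means; the only rule implemented is the plain polarity match given as an example in the text.

```python
# Hypothetical lexicon standing in for the emotion analysis means (step S1).
LEXICON = {
    "memori busoku": "negative",  # "memory shortage"
    "era-": "negative",           # "error"
}


def polarity_of(expression: str) -> str:
    """Step S1: look up the polarity; unknown expressions are treated as neutral."""
    return LEXICON.get(expression, "neutral")


def is_cause_expression(result_expr: str, cause_candidate: str) -> bool:
    """Step S2: accept the candidate when both polarities match."""
    return polarity_of(result_expr) == polarity_of(cause_candidate)
```

  • Under this plain matching rule, two neutral expressions would also count as a match; the text does not state how that case should be treated, so the sketch leaves it as is.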
  • the cause of a negative event or condition can be considered to be a negative event or condition.
  • in the present embodiment, this tendency is used to determine whether the cause expression candidate is a cause expression representing the cause of the result represented by the result expression, using the polarity of the result expression and the polarity of the cause expression candidate. Therefore, if a cause expression describing the cause of the result is included in the input data (that is, if the cause expression candidate describes the cause of the result), the cause expression can be accurately extracted.
  • the expression for which the polarity is to be determined may be a single expression such as “memory shortage (memori busoku)”, “error (era-)”, or the like. Then, the cost of the polarity determination processing for the expression is lower than the cost of constructing a causal relationship pair such as “memory shortage (memori busoku) —error (era-)”. That is, the polarity determination process for the expression can be realized more easily than the construction of the causal relationship pair. Therefore, in the present invention, it is possible to extract the cause expression without constructing a causal relationship pair.
  • FIG. 3 is a block diagram illustrating an example of a cause expression extraction device according to the second exemplary embodiment of the present invention. Elements similar to those in the first embodiment are denoted by the same reference numerals as those in FIG. 1, and detailed description thereof is omitted.
  • the cause expression extraction device of the present embodiment includes an input unit 1, a data processing device 2 that operates by program control, a storage device 3, and an output unit 4.
  • the data processing device 2 includes emotion analysis means 21, cause expression determination means 22, syntax analysis means 23, and clue matching means 24.
  • the storage device 3 stores a clue dictionary 31.
  • the clue dictionary 31 is a set of expressions (clue expressions) that serve as clues when extracting cause expressions.
  • a set of clue expressions may be determined in advance and stored in the storage device 3 as a clue dictionary 31.
  • clue expressions are often based on function words. Examples of clue expressions include “because of (ga genninn de)”, “by (niyori)”, “de (de)”, and the like. As already stated, “ga genninn de”, “niyori”, and “de” in Japanese correspond to “because of” in English.
  • text is input via the input means 1.
  • the text as input data is a single sentence.
  • the parsing means 23 parses the input text and obtains an analysis result. Specifically, the syntax analysis means 23 specifies the modification relationship (dependency relationship) of each expression in the text.
  • the clue collating means 24 collates the clue expression in the clue dictionary 31 with the text, and, based on the location in the text corresponding to the clue expression and the modification relation of the expressions in the text, extracts the result expression and the cause expression candidate from the text.
  • the expression immediately before the clue expression corresponds to the cause expression candidate, and the destination of the phrase including the cause expression candidate is often the result expression.
  • the cause expression candidate is immediately before the clue expression.
  • the destination of the phrase that includes it is the result expression.
  • however, the cause expression candidate is not necessarily immediately before the clue expression, and the destination of the cause expression candidate is not necessarily the result expression.
  • therefore, for each clue expression, the clue dictionary 31 may also include information on the position at which the cause expression candidate appears relative to the clue expression (hereinafter referred to as cause expression candidate position information) and information on the position at which the result expression appears relative to the clue expression (hereinafter referred to as result expression position information).
  • the clue collation means 24 may refer to the cause expression candidate position information and the result expression position information related to the clue expression, and extract the result expression and the cause expression candidate from the text. Note that the Japanese sentence “era- ga hassei shita genninn ha memori busoku dearu.” in this paragraph corresponds to the English sentence “The cause of the error occurrence is memory shortage.”
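  • A clue dictionary that carries position information for each clue expression could be represented as in the following sketch. The class `ClueEntry`, the position labels "before" and "after", and the function `split_by_clue` are assumptions made for illustration, and the text is split by plain string position rather than by the dependency analysis the embodiment uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClueEntry:
    clue: str             # clue expression, e.g. "de"
    cause_position: str   # where the cause candidate sits relative to the clue
    result_position: str  # where the result expression sits relative to the clue


# Illustrative entries; the position labels are assumptions.
CLUE_DICTIONARY = [
    ClueEntry("de", cause_position="before", result_position="after"),
    ClueEntry("genninn ha", cause_position="after", result_position="before"),
]


def split_by_clue(text: str, entry: ClueEntry):
    """Return (result, cause candidate) using the entry's position information."""
    marker = f" {entry.clue} "
    if marker not in text:
        return None
    before, after = (part.strip() for part in text.split(marker, 1))
    cause = before if entry.cause_position == "before" else after
    result = before if entry.result_position == "before" else after
    return result, cause
```

  • For the clue "genninn ha", the sketch takes the text after the clue as the cause expression candidate and the text before it as the result expression, which is the reverse of the arrangement assumed for "de".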
  • the emotion analysis means 21 determines the polarity of the result expression and the cause expression candidate extracted from the text by the clue matching means 24, respectively.
  • the cause expression determination unit 22 determines whether or not the cause expression candidate is a cause expression representing the cause of the result represented by the result expression, using the polarity of the result expression and the polarity of the cause expression candidate.
  • the emotion analysis means 21 and the cause expression determination means 22 are the same as the emotion analysis means 21 and the cause expression determination means 22 in the first embodiment.
  • the syntax analysis unit 23, the clue collation unit 24, the emotion analysis unit 21, and the cause expression determination unit 22 are realized by a CPU of a computer that operates according to a cause expression extraction program, for example. Further, the syntax analysis unit 23, the clue collation unit 24, the emotion analysis unit 21, and the cause expression determination unit 22 may be realized as separate hardware.
  • FIG. 4 is a flowchart showing an example of processing progress of the second embodiment of the present invention.
  • the syntax analysis means 23 parses the input text and obtains an analysis result (step S11). That is, the modification relationship of each expression in the text is specified.
  • the clue collating means 24 collates the clue expression and the text, and, based on the location in the text corresponding to the clue expression and the modification relationship of the expressions in the text, extracts the result expression and the cause expression candidate from the text (step S12).
  • the emotion analysis means 21 determines the polarity of the result expression and the cause expression candidate (step S13).
  • the cause expression determination means 22 determines whether or not the cause expression candidate is a cause expression that represents the cause of the result represented by the result expression, using the polarity of the result expression and the polarity of the cause expression candidate derived in step S13 (step S14). The cause expression determination means 22 causes the output means 4 to output the determination result in step S14.
  • The polarity determination in step S13 is the same as step S1 in the first embodiment.
  • the determination in step S14 is the same as that in step S2 in the first embodiment.
  • the cause expression can be extracted with high accuracy. In other words, even if the cause expression candidate and the result expression are not specified and input individually, when a text is input and the text includes a result expression and a cause expression, the cause expression can be accurately extracted.
  • a result expression and a cause expression candidate are extracted from one text (for example, one sentence).
  • both the result expression and the cause expression candidate are required to determine whether or not the cause expression candidate is the cause expression. Therefore, the second embodiment has a high affinity with the present invention.
  • the input data is preferably a single sentence.
  • the input data in the second embodiment is not necessarily a single sentence.
  • the input data in the second embodiment may be a text having a length that includes a clue expression and includes a result expression and a cause expression candidate.
  • a set of a result expression that is an expression representing a result of a causal relationship and a cause expression candidate that is a candidate of an expression describing the cause of the result is input via the input unit 1.
  • a case where the result expression is “an error has occurred (era- ga hassei shita)” and the cause expression candidate is “the memory is insufficient (memori ga fusoku shite ita)” (hereinafter, the first set), and a case where the result expression is “an error has occurred (era- ga hassei shita)” and the cause expression candidate is “text search was performed (tekisuto kensaku wo okonatta)” (hereinafter, the second set), are taken as examples. “era- ga hassei shita” in Japanese corresponds to “An error occurred.” in English.
  • the emotion analysis means 21 determines the polarity of the result expression and the cause expression candidate of each group.
  • the polarity of the result expression “an error has occurred (era- ga hassei shita)” is determined to be “negative”.
  • the polarity of the cause expression candidate “memori ga fusoku shite ita” is “negative”.
  • the polarity of the cause expression candidate “text search is performed (tekisuto kensaku wo okonatta)” is “neutral”.
  • the cause expression determination means 22 uses the polarity to determine whether or not the cause expression candidate is truly a cause expression. In the present embodiment, a case is shown in which the cause expression determination means 22 also derives a score (hereinafter referred to as “cause score”) that represents the cause expression likelihood of the cause expression candidate as the probability that the cause expression candidate is the cause expression.
  • the cause expression determination unit 22 determines that the cause expression candidate “memory is insufficient (memori ga fusoku shite ita)” is a cause expression corresponding to “an error has occurred (era- ga hassei shita)”.
  • the cause expression determination means 22 determines that the cause expression candidate “text search was performed (tekisuto kensaku wo okonatta)” is not a cause expression corresponding to “an error has occurred (era- ga hassei shita)”.
  • the cause expression determination means 22 then determines that the probability that the cause expression candidate “memory is insufficient (memori ga fusoku shite ita)” is the cause expression corresponding to the result expression “an error has occurred (era- ga hassei shita)” is higher than the probability that the cause expression candidate “text search was performed (tekisuto kensaku wo okonatta)” is the cause expression corresponding to that result expression.
  • the cause expression determination unit 22 outputs a determination result as to whether or not each cause expression candidate is a cause expression (for example, the output unit 4 displays the determination result).
  • a cause expression candidate whose cause score is 1 (“memory is insufficient (memori ga fusoku shite ita)”) is output as a cause expression.
  • for a cause expression candidate having a cause score of 0 (“text search was performed (tekisuto kensaku wo okonatta)”), the fact that it is not a cause expression is output.
  • the cause expression determination unit 22 may output not only a discrete determination result indicating whether the cause expression candidate is a cause expression but also a continuous value such as a cause score.
  • the cause expression determination unit 22 refers to the polarities of the result expression and the cause expression candidate when determining whether or not the cause expression candidate corresponds to the cause expression. It is preferable to use machine learning.
  • Machine learning is a method of automatically constructing a model that makes a prediction close to the correct answer data based on the correct answer data and predicting a new input based on the model.
  • correct answer data that indicates whether or not each cause expression candidate is really the cause expression is prepared, and a prediction model that can correctly classify the correct answer data is learned. Then, when a new result expression and a cause expression candidate are given as input data, it is predicted based on the model whether or not the cause expression candidate is truly a cause expression.
  • a characteristic to be used (referred to as a feature) is selected in advance from among various characteristics of the input data, a prediction model is learned based on the feature, and subsequent prediction is performed.
  • the features include, for example, words and word sequences that appear in the result expression and cause expression candidates, part of speech and meaning classification of each word, and whether or not a clue expression that easily represents the cause appears in the cause expression candidates.
  • in the present invention, whether or not the polarity of the result expression matches the polarity of the cause expression candidate is used as a feature. It has been observed that the causes leading to negative events and conditions are often negative, and conversely, the causes leading to positive events and conditions are often positive. Therefore, by using the polarity consistency between the result expression and the cause expression candidate as a feature, it can be determined with high accuracy whether or not the cause expression candidate is the cause expression. As described above, other features may be used together.
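  • A feature extraction step that includes the polarity match alongside the other features mentioned above might look like the sketch below. The function `extract_features`, the feature names, and the whitespace tokenization are hypothetical; the resulting dictionary could be passed to any classifier, which is outside the scope of this sketch.

```python
def extract_features(result_expr, cause_candidate, result_polarity,
                     candidate_polarity, clue_expressions):
    """Build a feature dictionary for one (result, cause candidate) pair."""
    features = {}
    # Word features for both expressions (whitespace-tokenized for simplicity).
    for word in result_expr.split():
        features[f"result_word={word}"] = 1
    for word in cause_candidate.split():
        features[f"cause_word={word}"] = 1
    # Whether any clue expression appears in the cause candidate.
    candidate_words = cause_candidate.split()
    features["has_clue"] = int(any(c in candidate_words for c in clue_expressions))
    # The feature highlighted in the text: do the two polarities match?
    features["polarity_match"] = int(result_polarity == candidate_polarity)
    return features
```

  • Keeping the polarity match as one feature among several lets a learned model weigh it against word and clue features instead of relying on it alone.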
  • the cause expression (“memory was not insufficient (memori ga fusoku shite inakatta)”) contains the word “memory is insufficient (memori ga fusoku)”.
  • the word “insufficient memory (memori ga fusoku)” is used with a negation word, resulting in a positive polarity.
  • when an expression is used with a negation word or an adversative expression, its polarity is often treated as reversed.
  • compared to the case where no polarity is used, whether or not the cause expression candidate is the cause expression can be determined with high accuracy.
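  • The polarity reversal caused by a negation word could be sketched as follows. The marker list `NEGATION_MARKERS` and the function `apply_negation` are assumptions for illustration; a real system would more likely detect negation from morphological analysis than from a fixed word list.

```python
REVERSED = {"positive": "negative", "negative": "positive", "neutral": "neutral"}

# Hypothetical negation markers in romanized Japanese (assumption).
NEGATION_MARKERS = ("inakatta", "nai")


def apply_negation(base_polarity: str, expression: str) -> str:
    """Flip the polarity when the expression contains a negation marker."""
    words = expression.split()
    if any(marker in words for marker in NEGATION_MARKERS):
        return REVERSED[base_polarity]
    return base_polarity
```

  • Here the negative base polarity of “memori ga fusoku” becomes positive for “memori ga fusoku shite inakatta”, while it stays negative for “memori ga fusoku shite ita”.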
  • the former sentence is referred to as a first sentence
  • the latter sentence is referred to as a second sentence.
  • the first sentence corresponds to the sentence “An error occurred because of memory shortage.” in English.
  • the second sentence corresponds to the sentence “An error occurred when searching text.” in English.
  • the parsing means 23 parses the input text and obtains an analysis result. Specifically, the modification relation of each expression in the text is specified.
  • FIG. 5 shows the modification relationships in the text in the above sentences. An arrow indicated by a solid line in FIG. 5 indicates the destination of the phrase. For example, the phrase “error is (era- ga)” is related to the clause “hassei shita”.
  • the clue collating unit 24 collates the clue expression in the clue dictionary 31 with the text, and, based on the location in the text corresponding to the clue expression and the modification relation of the expressions in the text, extracts the result expression and the cause expression candidate from the text.
  • the clue collating means 24 detects the clue expression “de” in both of the two input sentences.
  • the clue matching unit 24 extracts the expression “memory shortage (memori busoku)” immediately before the clue expression “de (de)” as a cause expression candidate for the first sentence, and extracts “an error has occurred (era- ga hassei shita)”, which is the destination of the phrase “memory shortage (memori busoku de)” including the cause expression candidate, as a result expression.
  • the clue matching unit 24 extracts the expression “text search (tekisuto kensaku)” immediately before the clue expression “de (de)” as a cause expression candidate for the second sentence, and extracts “an error has occurred (era- ga hassei shita)”, which is the destination of the phrase “text search (tekisuto kensaku de)” including the cause expression candidate, as a result expression.
  • the cause expression candidate and the result expression may be limited to one clause, such as “occurred (hassei shita)”, or may include multiple clauses, such as “an error has occurred (era- ga hassei shita)”.
  • “Hassei shita” corresponds to “occurred” in English. It may not be preferable to have only one phrase as a cause expression candidate or result expression.
  • when the expression corresponding to the main word has no polarity by itself, such as “hassei shita”, the polarity of the expression as a whole may not be determinable unless the essential case information is considered.
  • the syntax analysis unit 23 determines the essential case of each expression.
  • the clue collation means 24 extracts a cause expression candidate and a result expression as an expression including an essential case.
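  • The extraction from a dependency structure, including the expansion of the result expression beyond a bare predicate, could be sketched as follows. The phrase segmentation, the head indices, and the function `extract_pair` are assumptions; in particular, the sketch treats every other phrase that modifies the same head as its essential case, which is a simplification.

```python
def extract_pair(phrases, heads, clue="de"):
    """phrases: list of phrase strings; heads[i]: index of the phrase that
    phrase i modifies (-1 for the root). Returns (result, cause candidate)."""
    for i, phrase in enumerate(phrases):
        words = phrase.split()
        head = heads[i]
        if len(words) > 1 and words[-1] == clue and head >= 0:
            # The expression right before the clue is the cause candidate.
            cause_candidate = " ".join(words[:-1])
            # Include other phrases that modify the same head (assumed to be
            # its essential case) so the result is not a bare predicate.
            dependents = [phrases[j] for j in range(len(phrases))
                          if heads[j] == head and j != i]
            result = " ".join(dependents + [phrases[head]])
            return result, cause_candidate
    return None
```

  • For the phrases "memori busoku de", "era- ga", and "hassei shita", where the first two both modify the third, the sketch returns the result expression "era- ga hassei shita" and the cause expression candidate "memori busoku".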
  • the emotion analysis means 21 determines the polarity of the result expression and the cause expression candidate of each sentence.
  • the polarity determination process by the emotion analysis means 21 is the same as the polarity determination process in the first embodiment.
  • the polarity of the result expression “error has occurred (era- ga hassei shita)” is determined to be “negative”.
  • the polarity of the cause expression candidate “memory shortage (memori busoku)” is “negative”.
  • the polarity of the cause expression candidate “text search (tekisuto kensaku)” is “neutral”.
  • the cause expression determination means 22 determines whether or not the cause expression candidate in each sentence is a cause expression for the result expression by using the polarities related to the result expression of each sentence and the cause expression candidates.
  • the determination process by the cause expression determination unit 22 is the same as the determination process by the cause expression determination unit 22 of the first embodiment.
  • the cause expression determination means 22 determines that the cause expression candidate of the first sentence (“memory shortage (memori busoku)”) is the cause expression of the result expression of the first sentence (“an error has occurred (era- ga hassei shita)”). Further, it determines that the cause expression candidate of the second sentence (“text search (tekisuto kensaku)”) is not the cause expression of the result expression of the second sentence (“an error has occurred (era- ga hassei shita)”).
  • the cause expression determination means 22 outputs a determination result as to whether or not the cause expression candidate of each sentence is a cause expression.
  • the cause expression determination means 22 may not only output a determination result as to whether or not the cause expression candidate is a cause expression, but also calculate a continuous value such as a cause score for each cause expression candidate, as in the first embodiment, and output that value as well.
  • the cause expression candidate and the result expression are extracted from the input sentence, the polarities of the cause expression candidate and the result expression are determined, and it is determined based on the polarities whether or not the cause expression candidate is a cause expression corresponding to the result expression. Therefore, even if a pair of a cause expression candidate and a result expression is not obtained in advance, the cause expression can be accurately extracted from the input sentence.
  • FIG. 6 is a block diagram showing an example of the minimum configuration of the cause expression extraction apparatus of the present invention.
  • the cause expression extraction apparatus of the present invention includes polarity determination means 91 and cause expression determination means 92.
  • When given a result expression, which is an expression representing a result, and a cause expression candidate, which is a candidate for a cause expression describing the cause of that result, the polarity determination means 91 (for example, the emotion analysis means 21) determines a polarity for each of the result expression and the cause expression candidate. The polarity is a category used to classify the impression a person receives from an expression.
  • The cause expression determination means 92 uses the polarity of the result expression and the polarity of the cause expression candidate to determine whether the cause expression candidate is a cause expression describing the cause of the result.
  • The apparatus may further include a clue expression storage unit (for example, the storage device 3) that stores clue expressions serving as clues for extracting a cause expression, a syntax analysis unit (for example, the syntax analysis unit 23) that parses the text, and a result expression / cause expression candidate extracting means (for example, a clue matching means) that extracts the result expression and the cause expression candidate from the text based on the location in the text that matches a clue expression and on the parsing result.
  • The present invention is suitably applied to a cause expression extraction apparatus that extracts, from input expressions, an expression describing the cause that led to a certain event or state.
  • For example, the present invention can be used for processing that extracts an expression describing the cause of a certain event or state in FAQ creation support, which creates FAQs (Frequently Asked Questions) from a collection of past cases consisting of questions about products and services and their answers; in text mining, which extracts important knowledge from text data; and in question answering systems, which answer natural-language questions in natural language.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

Provided is a cause expression extraction device, whereby, if inputted data is supplied which relates to an event or state, and this inputted data includes a cause expression which describes a cause leading to this event or state, it is possible to extract the cause expression from the inputted data. A polarity determination means (91) determines, when a result expression which is an expression which represents a result and a cause expression candidate which is a candidate for a cause expression which describes a cause of this result are supplied, a polarity which is a category for sorting an impression which a person receives from an expression, with respect to the result expression and the cause expression candidate, respectively. A cause expression determination means (92) determines, using the polarity of the result expression and the polarity of the cause expression candidate, whether the cause expression candidate is a cause expression which describes the cause of the result.

Description

Cause expression extraction apparatus, cause expression extraction method, and cause expression extraction program
The present invention relates to a cause expression extraction apparatus, a cause expression extraction method, and a cause expression extraction program for extracting, from input expressions, an expression describing the cause that led to a certain event or state.
In this specification, the description mainly uses Japanese sentences and words as examples.
Identifying the cause that led to a certain event or state is useful in many situations. For example, when repairing a faulty product, a correct repair becomes possible only once the cause of the malfunction has been identified. Likewise, in questionnaire analysis, knowing the detailed reasons behind the respondents' opinions allows the analysis results to be interpreted more accurately.
Identifying the cause of an event or state is thus highly useful. For this reason, techniques have been proposed for automatically extracting expressions corresponding to a cause (hereinafter, cause expressions) from text (see, for example, Patent Document 1).
In the technique described in Patent Document 1, machine learning is performed on correct answer data in which the locations of cause expressions in text have been specified by hand. An example of such correct answer data (in Japanese) is the text 「メモリ不足でエラーが発生した」 (memori busoku de era- ga hassei shita), which contains the cause expression 「メモリ不足」 (memori busoku). An example of text that does not contain a cause expression is 「印刷モジュールでエラーが発生した」 (insatsu moju-ru de era- ga hassei shita). The Japanese sentence 「メモリ不足でエラーが発生した」 corresponds to the English sentence “An error occurred because of memory shortage.”, and 「メモリ不足」 corresponds to “memory shortage”. The Japanese sentence 「印刷モジュールでエラーが発生した」 corresponds to the English sentence “An error occurred in the printing module.”
In the technique of Patent Document 1, expressions that serve as clues for extracting cause expressions (hereinafter, clue expressions) are also prepared in advance, and their presence or absence is used as one of the features in machine learning. Examples of clue expressions (in Japanese) include 「が原因で」 (ga genninn de), 「により」 (niyori), and 「で」 (de), each of which corresponds to “because of” in English.
In the technique of Patent Document 1, pairs of expressions that tend to stand in a causal relationship (hereinafter, causal relationship pairs) are likewise prepared in advance, and their presence or absence is used as one of the features in machine learning. Examples of causal relationship pairs include 「人を殺す (hito wo korosu) - 逮捕 (taiho)」 and 「選挙違反 (senkyo ihan) - 逮捕 (taiho)」. The Japanese expressions 「人を殺す」, 「逮捕」, and 「選挙違反」 correspond to “kill a person”, “arrest”, and “election offenses” in English, respectively.
Here, the characteristics used when predicting cause expressions by machine learning are called “features”.
Creating correct answer data is costly. To reduce the amount of correct answer data required, the technique of Patent Document 1 therefore also uses clue expressions, which identify cause expressions with relatively high accuracy. However, as the following examples show, a sentence that contains a clue expression does not necessarily contain a cause expression.
Clue expression: 「で」 (de) in Japanese
  「メモリ不足でエラーが発生した。」 (memori busoku de era- ga hassei shita.) ... contains a cause expression.
  「印刷モジュールでエラーが発生した。」 (insatsu moju-ru de era- ga hassei shita.) ... does not contain a cause expression.
  「テキスト検索でエラーが発生した。」 (tekisuto kensaku de era- ga hassei shita.) ... does not contain a cause expression.
  This Japanese sentence corresponds to the English sentence “An error occurred when searching text.”
  「飲酒運転で交通事故が起こった。」 (inshu unntenn de koutsuu jiko ga okotta.) ... contains a cause expression.
  This Japanese sentence corresponds to the English sentence “A traffic accident occurred because of drunken driving.”
  「玉川で交通事故が起こった。」 (Tamagawa de koutsuu jiko ga okotta.) ... does not contain a cause expression.
  This Japanese sentence corresponds to the English sentence “A traffic accident occurred at Tamagawa.”
Clue expression: 「から」 (kara) in Japanese
  「社員の非正規化から貧困が進んだ。」 (shainn no hiseikika kara hinnkonn ga susunda.) ... contains a cause expression.
  This Japanese sentence corresponds to the English sentence “Poverty permeated because of the spread of irregular employment of staff.”
  「農村部から貧困が進んだ。」 (nousonnbu kara hinnkonn ga susunda.) ... does not contain a cause expression.
  This Japanese sentence corresponds to the English sentence “Poverty permeated from rural districts.”
Clue expression: 「ために」 (tameni) in Japanese
  「熱があるために病欠する。」 (netsu ga aru tameni byouketsu suru.) ... contains a cause expression.
  This Japanese sentence corresponds to the English sentence “I am off sick because of getting a fever.”
  「風邪を治すために病欠する。」 (kaze wo naosu tameni byouketsu suru.) ... does not contain a cause expression. (It expresses a purpose.)
  This Japanese sentence corresponds to the English sentence “I am off sick in order to cure a cold.”
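The limitation illustrated by the examples above can be reproduced with a very small sketch. The helper below is hypothetical (it is not taken from Patent Document 1 or from this specification) and performs only naive substring matching against the clue expressions named earlier. It flags every 「で」 sentence from the examples, including those annotated as not containing a cause expression.

```python
# Hypothetical helper for illustration only: naive clue-expression matching.
CLUE_EXPRESSIONS = ["が原因で", "により", "で"]


def contains_clue(sentence: str) -> bool:
    """Return True if any clue expression occurs anywhere in the sentence."""
    return any(clue in sentence for clue in CLUE_EXPRESSIONS)


# (sentence, whether it really contains a cause expression), from the examples above.
EXAMPLES = [
    ("メモリ不足でエラーが発生した。", True),
    ("印刷モジュールでエラーが発生した。", False),
    ("テキスト検索でエラーが発生した。", False),
    ("飲酒運転で交通事故が起こった。", True),
    ("玉川で交通事故が起こった。", False),
]

# Every sentence contains the clue 「で」, so all of them are flagged.
flagged = [contains_clue(sentence) for sentence, _ in EXAMPLES]

# Sentences flagged by the clue that do not actually contain a cause expression.
false_positives = [
    sentence for sentence, is_cause in EXAMPLES
    if contains_clue(sentence) and not is_cause
]
```

In this sketch the clue alone cannot separate the two groups, which is why an additional signal is needed.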
For this reason, the technique of Patent Document 1 improves accuracy by using causal relationship pairs. To identify the cause expressions in the examples above correctly, the following causal relationship pairs would ideally be provided.
  「メモリ不足 (memori busoku) - エラー (era-)」
  「飲酒運転 (inshu unntenn) - 交通事故 (koutsuu jiko)」
  「非正規化 (hiseikika) - 貧困 (hinnkonn)」
  「熱がある (netsu ga aru) - 病欠 (byouketsu)」
The Japanese expressions 「メモリ不足」, 「エラー」, 「飲酒運転」, 「交通事故」, 「非正規化」, 「貧困」, 「熱がある」, and 「病欠」 shown here correspond to “memory shortage”, “error”, “drunken driving”, “traffic accident”, “the spread of irregular employment”, “poverty”, “getting a fever”, and “be off sick” in English, respectively.
Emotional expression analysis is a technique that automatically classifies the impression a person would generally receive from an expression. The categories of this classification are called polarities. Polarities of an expression include “positive” (a good impression), “negative” (a bad impression), and “neutral” (anything else).
Non-Patent Document 1 describes reversal of the polarity of expressions.
Clue expressions are also described in Non-Patent Document 2.
Non-Patent Document 3 is an example of emotional expression analysis.
JP 2009-157791 A (paragraphs 0032, 0069, 0072, 0073, etc.)
As described above, the technique of Patent Document 1 uses causal relationship pairs. However, preparing causal relationship pairs in advance is not easy. Collecting pairs of expressions requires selecting from an enormous number of candidates, on the order of the square of the number of expressions, so it is difficult to do by hand. Even when large-scale data such as a Web corpus is used, handling pairs of expressions tends to cause data sparseness, and it is likely that the selection cannot be made accurately. Furthermore, even if the number of expressions is reduced by using a synonym dictionary that groups expressions with the same meaning, or a thesaurus that defines the broader and narrower concepts of expressions, the number of pairs to be handled remains on the order of the square, and preparing causal relationship pairs in advance is still not easy.
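As a rough illustration of this quadratic growth, the number of candidate pairs can be compared with the number of single expressions. The symbol $n$ (the number of distinct expressions) and the figure of 10,000 expressions are assumptions chosen for illustration, not values given in this specification.

```latex
\[
\underbrace{n(n-1)}_{\text{ordered (cause, result) pairs}} = O(n^{2})
\qquad\text{versus}\qquad
\underbrace{n}_{\text{single expressions}} = O(n)
\]
\[
n = 10^{4}:\quad n(n-1) = 99{,}990{,}000 \approx 10^{8}\ \text{candidate pairs, against } 10^{4}\ \text{single expressions.}
\]
```

Merging synonyms reduces $n$ by some factor, but the pair count still scales with the square of the reduced $n$.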
Accordingly, an object of the present invention is to provide a cause expression extraction apparatus, a cause expression extraction method, and a cause expression extraction program that, when given input data about a certain event or state that includes a cause expression describing the cause of that event or state, can extract the cause expression from the input data.
The cause expression extraction apparatus according to the present invention comprises: polarity determination means which, when given a result expression, which is an expression representing a result, and a cause expression candidate, which is a candidate for a cause expression describing the cause of that result, determines a polarity for each of the result expression and the cause expression candidate, the polarity being a category for classifying the impression a person receives from an expression; and cause expression determination means which uses the polarity of the result expression and the polarity of the cause expression candidate to determine whether the cause expression candidate is a cause expression describing the cause of the result.
The cause expression extraction method according to the present invention comprises: when given a result expression, which is an expression representing a result, and a cause expression candidate, which is a candidate for a cause expression describing the cause of that result, determining a polarity for each of the result expression and the cause expression candidate, the polarity being a category for classifying the impression a person receives from an expression; and using the polarity of the result expression and the polarity of the cause expression candidate to determine whether the cause expression candidate is a cause expression describing the cause of the result.
The cause expression extraction program according to the present invention causes a computer to execute: a polarity determination process which, when given a result expression, which is an expression representing a result, and a cause expression candidate, which is a candidate for a cause expression describing the cause of that result, determines a polarity for each of the result expression and the cause expression candidate, the polarity being a category for classifying the impression a person receives from an expression; and a cause expression determination process which uses the polarity of the result expression and the polarity of the cause expression candidate to determine whether the cause expression candidate is a cause expression describing the cause of the result.
According to the present invention, when input data about a certain event or state is given and the input data includes a cause expression describing the cause of that event or state, the cause expression can be extracted from the input data.
FIG. 1 is a block diagram showing an example of the cause expression extraction apparatus of the first embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the processing flow of the first embodiment of the present invention.
FIG. 3 is a block diagram showing an example of the cause expression extraction apparatus of the second embodiment of the present invention.
FIG. 4 is a flowchart showing an example of the processing flow of the second embodiment of the present invention.
FIG. 5 is a schematic diagram showing an example of modification relationships in a text.
FIG. 6 is a block diagram showing an example of the minimum configuration of the cause expression extraction apparatus of the present invention.
Embodiments of the present invention are described below with reference to the drawings. The description mainly uses Japanese sentences and words as examples, but the present invention is also applicable to languages other than Japanese.
Embodiment 1.
FIG. 1 is a block diagram showing an example of the cause expression extraction apparatus of the first embodiment of the present invention. The cause expression extraction apparatus of this embodiment includes input means 1, a data processing device 2 that operates under program control, and output means 4. The data processing device 2 includes emotion analysis means 21 and cause expression determination means 22.
The input means 1 is an input interface through which input data is entered. In the first embodiment, a pair consisting of a result expression, which is an expression representing the result of a causal relationship, and a cause expression candidate, which is a candidate for an expression describing the cause of that result (a cause expression), is input to the data processing device 2 via the input means 1. The result expression represents, for example, an event or state that is the result of a causal relationship. Result expressions may include expressions of a state of mind (for example, emotional expressions such as 「怒る」 (okoru) and 「ほっとする」 (hotto suru) in Japanese) and expressions of evaluation or opinion (for example, 「美しい」 (utsukusii) and 「良い」 (yoi) in Japanese). The Japanese expressions 「怒る」, 「ほっとする」, 「美しい」, and 「良い」 correspond to “get angry”, “breathe freely”, “beautiful”, and “good” in English, respectively.
The result expression and the cause expression candidate may be character strings, or they may be structured data such as morphological analysis results or syntax analysis results.
The emotion analysis means 21 determines the polarity of each of the result expression and the cause expression candidate input via the input means 1.
As already explained, a polarity is a category for classifying the impression a person would generally receive from an expression. The polarities are “positive”, “negative”, and “neutral”. Examples of polarities are as follows.
  “lack of memory” - negative
  “drunken drive”  - negative
  “error occurred”  - negative
  “text search”   - neutral
  “San Francisco”  - neutral
  “happy birthday” - positive
The emotion analysis means 21 may, for example, hold in advance a dictionary that lists the expressions that are “negative”, the expressions that are “neutral”, and the expressions that are “positive”, and determine the polarity of an input expression (a result expression or a cause expression candidate) by pattern matching between the input expression and the expressions listed in the dictionary. In doing so, the emotion analysis means 21 may check whether character strings or dependency structures match exactly, or it may match more loosely by using regular expressions, a synonym dictionary, or a thesaurus.
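The dictionary-based approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the emotion analysis means 21: the names `Polarity`, `POLARITY_DICTIONARY`, and `determine_polarity` are hypothetical, the dictionary is seeded only with the example expressions listed earlier, matching is limited to case-insensitive and whitespace-normalized string equality, and unlisted expressions are assumed to fall back to neutral.

```python
from enum import Enum


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


# Hypothetical dictionary, seeded with the example expressions from the text.
POLARITY_DICTIONARY = {
    "lack of memory": Polarity.NEGATIVE,
    "drunken drive": Polarity.NEGATIVE,
    "error occurred": Polarity.NEGATIVE,
    "text search": Polarity.NEUTRAL,
    "san francisco": Polarity.NEUTRAL,
    "happy birthday": Polarity.POSITIVE,
}


def determine_polarity(expression: str) -> Polarity:
    """Look up the polarity of an expression by normalized string matching.

    Assumption: expressions not found in the dictionary are treated as neutral.
    """
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(expression.lower().split())
    return POLARITY_DICTIONARY.get(key, Polarity.NEUTRAL)
```

The looser matching mentioned in the text (regular expressions, synonym dictionaries, a thesaurus, or dependency structures) would replace the simple normalization step in this sketch.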
Alternatively, a large amount of data in which polarities are assigned to documents may be prepared, a model that statistically determines the polarity of an expression may be built from that data, and the polarity of an input expression may be determined using that model. Customer reviews published on the Internet are one example of data in which polarities are assigned to documents.
Two polarity determination methods have been described here, and the two may be combined. The emotion analysis means 21 may also obtain the polarity of an expression by a method other than these. For example, if there is a server that provides polarity determination results for expressions, the emotion analysis means 21 may query that server for the polarity of the input expression and obtain the polarity from the server.
The cause expression determination means 22 uses the polarity of the result expression and the polarity of the cause expression candidate to determine whether the cause expression candidate is a cause expression representing the cause of the result that the result expression represents. Preferably, the cause expression determination means 22 makes this determination based on whether the two polarities agree. For example, it may determine that the cause expression candidate is a cause expression when the polarity of the result expression matches the polarity of the cause expression candidate.
The cause expression determination means 22 may calculate not only a discrete determination result (whether or not the cause expression candidate is a cause expression) but also a value indicating how likely the candidate is to be a cause expression.
When determining whether the cause expression candidate is a cause expression, the cause expression determination means 22 may additionally refer to the presence or absence of a clue expression, learning results obtained by machine learning, and the like.
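The specification does not say how such a continuous value is computed. Purely as an illustrative assumption, the sketch below combines polarity agreement with the presence of a clue expression as a weighted sum. The function name `cause_score` and the weights 0.7 and 0.3 are hypothetical and are not taken from the source.

```python
def cause_score(
    result_polarity: str,
    candidate_polarity: str,
    has_clue: bool,
    polarity_weight: float = 0.7,  # hypothetical weight, not from the source
    clue_weight: float = 0.3,      # hypothetical weight, not from the source
) -> float:
    """Return a score in [0, 1]; higher means more likely a cause expression."""
    polarity_match = 1.0 if result_polarity == candidate_polarity else 0.0
    clue_present = 1.0 if has_clue else 0.0
    return polarity_weight * polarity_match + clue_weight * clue_present
```

A discrete decision could then be obtained by thresholding the score. Where a trained model is available, its output could take the place of the fixed weights.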
The output means 4 is an output device (for example, a display device) that outputs the determination result of the cause expression determination means 22. The cause expression determination means 22 causes the output means 4 to output (for example, display) the result of determining whether the cause expression candidate is a cause expression.
The emotion analysis means 21 and the cause expression determination means 22 are realized, for example, by the CPU of a computer that operates according to a cause expression extraction program. In this case, the CPU may, for example, read the cause expression extraction program and operate as the emotion analysis means 21 and the cause expression determination means 22 according to that program. The cause expression extraction program may be recorded on a computer-readable recording medium.
The emotion analysis means 21 and the cause expression determination means 22 may also be realized as separate pieces of hardware.
Next, the processing flow of this embodiment is described. FIG. 2 is a flowchart showing an example of the processing flow of the first embodiment of the present invention.
When a pair consisting of a result expression and a cause expression candidate is input via the input means 1, the emotion analysis means 21 determines the polarity of the result expression and the polarity of the cause expression candidate (step S1).
Next, the cause expression determination means 22 uses the polarity of the result expression and the polarity of the cause expression candidate derived in step S1 to determine whether the cause expression candidate is a cause expression representing the cause of the result that the result expression represents (step S2). For example, the cause expression determination means 22 may determine that the cause expression candidate is such a cause expression when the two polarities match, and that it is not a cause expression when they do not match. Also in step S2, the cause expression determination means 22 causes the output means 4 to output (for example, display) the determination result.
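Steps S1 and S2 can be sketched as follows, using the simple polarity-match rule given as an example above. The table `POLARITY` and the function names are hypothetical stand-ins. A real system would obtain polarities from a dictionary, a statistical model, or an external server as described earlier, and this sketch assumes unlisted expressions are neutral.

```python
from typing import Tuple

# Hypothetical polarity table standing in for the emotion analysis means 21.
POLARITY = {
    "lack of memory": "negative",
    "error occurred": "negative",
    "text search": "neutral",
}


def determine_polarities(result_expr: str, candidate: str) -> Tuple[str, str]:
    """Step S1: determine the polarity of the result expression and the candidate."""
    return POLARITY.get(result_expr, "neutral"), POLARITY.get(candidate, "neutral")


def is_cause_expression(result_polarity: str, candidate_polarity: str) -> bool:
    """Step S2: treat the candidate as a cause expression when the polarities match."""
    return result_polarity == candidate_polarity


def judge(result_expr: str, candidate: str) -> bool:
    """Run step S1 followed by step S2 for one (result, candidate) pair."""
    result_polarity, candidate_polarity = determine_polarities(result_expr, candidate)
    return is_cause_expression(result_polarity, candidate_polarity)
```

With this table, `judge("error occurred", "lack of memory")` returns `True` and `judge("error occurred", "text search")` returns `False`. Note that under this simple rule two neutral expressions would also count as a match, so in practice the additional signals mentioned in the text (clue expressions, learned models) may be needed.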
 これまで、原因表現の抽出に、表現の極性は利用されていなかった。しかし、例えば、ネガティブな出来事や状態に至る原因は、やはりネガティブな出来事や状態であると考えることができる。本発明ではこのことを利用し、結果表現の極性と原因表現候補の極性とを用いて、原因表現候補が、その結果表現が表している結果の原因を表す原因表現であるか否かを判定する。従って、入力データに、結果の原因を記述した原因表現が含まれていれば(すなわち、原因表現候補が、結果の原因を記述していれば)、その原因表現を精度よく抽出することができる。 So far, the polarity of the expression has not been used to extract the cause expression. However, for example, the cause of a negative event or condition can be considered to be a negative event or condition. In the present invention, this is used to determine whether the cause expression candidate is a cause expression representing the cause of the result represented by the result expression using the polarity of the result expression and the polarity of the cause expression candidate. To do. Therefore, if the cause data describing the cause of the result is included in the input data (that is, if the cause expression candidate describes the cause of the result), the cause expression can be accurately extracted. .
For example, product defects (results) reported to a call center and the causes of those defects are both considered likely to have negative polarity. Cause expressions for the causes that led to defects can therefore be extracted accurately from the past cases accumulated at the call center. Although past call center cases are used as the example here, the input data need not relate to past call center cases.
In addition, the present invention does not require causal relationship pairs to be prepared in advance. The expressions whose polarity is determined may be single expressions such as "メモリ不足" (memori busoku, "memory shortage") or "エラー" (era-, "error"). The cost of determining the polarity of an expression is lower than the cost of constructing causal relationship pairs such as "メモリ不足 (memori busoku) - エラー (era-)". That is, polarity determination for an expression is easier to implement than the construction of causal relationship pairs. The present invention can therefore extract cause expressions without constructing causal relationship pairs.
Embodiment 2.
FIG. 3 is a block diagram illustrating an example of a cause expression extraction device according to the second embodiment of the present invention. Elements that are the same as in the first embodiment are given the same reference numerals as in FIG. 1, and their detailed description is omitted.
The cause expression extraction device of this embodiment includes an input means 1, a data processing device 2 that operates under program control, a storage device 3, and an output means 4. The data processing device 2 includes an emotion analysis means 21, a cause expression determination means 22, a syntax analysis means 23, and a clue matching means 24.
The storage device 3 stores a clue dictionary 31. The clue dictionary 31 is a set of expressions (clue expressions) that serve as clues when extracting cause expressions. The set of clue expressions may be determined in advance and stored in the storage device 3 as the clue dictionary 31. Clue expressions are often based on function words. Examples of clue expressions include "が原因で" (ga genninn de), "により" (niyori), and "で" (de). As already noted, these Japanese expressions correspond to "because of" in English.
In the second embodiment, text is input via the input means 1. The text serving as input data is preferably a single sentence.
The syntax analysis means 23 parses the input text and obtains an analysis result. Specifically, the syntax analysis means 23 identifies the modification relationships (dependency relationships) among the expressions in the text.
The clue matching means 24 matches the clue expressions in the clue dictionary 31 against the text and, based on the location in the text that matches a clue expression and on the modification relationships among the expressions in the text, extracts a result expression and a cause expression candidate from the text.
In Japanese, when a clue expression is built from function words, the expression immediately before the clue expression usually corresponds to the cause expression candidate, and the dependency head of the phrase (bunsetsu) containing that candidate is usually the result expression. This holds for the clue expressions given above, "が原因で" (ga genninn de), "により" (niyori), and "で" (de): the expression immediately before the clue expression is the cause expression candidate, and the dependency head of the phrase containing that candidate is the result expression. When these clue expressions are used, the clue matching means 24 extracts the portion of the text immediately before the clue expression as the cause expression candidate and extracts the dependency head of the phrase containing that candidate as the result expression.
However, it is not true of every clue expression that the cause expression candidate immediately precedes it and that the dependency head of the candidate is the result expression. Suppose, for example, that the Japanese wording "原因は" (genninn ha, "the cause is") is used as a clue expression. In a sentence such as "エラーが発生した原因はメモリ不足である。" (era- ga hassei shita genninn ha memori busoku dearu.), the expression immediately before the clue expression is the result expression and the dependency head is the cause expression candidate. The clue dictionary 31 may therefore also include, for each clue expression, information on where the cause expression candidate appears relative to the clue expression (hereinafter, cause expression candidate position information) and information on where the result expression appears (hereinafter, result expression position information). When the clue matching means 24 detects a clue expression in the text, it may refer to the cause expression candidate position information and the result expression position information for that clue expression to extract the result expression and the cause expression candidate from the text. The Japanese sentence "エラーが発生した原因はメモリ不足である。" given in this paragraph corresponds to the English sentence "The cause of the error occurrence is memory shortage."
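One way to hold the per-clue position information described above is sketched below in Python. The class ClueEntry, the position labels "before_clue" and "dependency_head", and the function lookup_clue are hypothetical names chosen for illustration; the specification does not prescribe any particular data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClueEntry:
    clue: str              # the clue expression itself
    cause_position: str    # where the cause expression candidate appears
    result_position: str   # where the result expression appears


# Hypothetical contents of a clue dictionary. For the first three clues the
# candidate precedes the clue; for "原因は" the roles are swapped.
CLUE_DICTIONARY = [
    ClueEntry("が原因で", "before_clue", "dependency_head"),
    ClueEntry("により", "before_clue", "dependency_head"),
    ClueEntry("で", "before_clue", "dependency_head"),
    ClueEntry("原因は", "dependency_head", "before_clue"),
]


def lookup_clue(clue: str) -> Optional[ClueEntry]:
    """Return the dictionary entry for a clue expression, or None if absent."""
    for entry in CLUE_DICTIONARY:
        if entry.clue == clue:
            return entry
    return None
```

With such a table, a matcher can consult cause_position and result_position instead of hard-coding the assumption that the candidate always precedes the clue.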
The emotion analysis means 21 determines the polarity of each of the result expression and the cause expression candidate extracted from the text by the clue matching means 24.
The cause expression determination means 22 uses the polarity of the result expression and the polarity of the cause expression candidate to determine whether the cause expression candidate is a cause expression representing the cause of the result represented by the result expression.
The emotion analysis means 21 and the cause expression determination means 22 are the same as the emotion analysis means 21 and the cause expression determination means 22 in the first embodiment.
The syntax analysis means 23, the clue matching means 24, the emotion analysis means 21, and the cause expression determination means 22 are realized, for example, by the CPU of a computer operating according to a cause expression extraction program. Alternatively, the syntax analysis means 23, the clue matching means 24, the emotion analysis means 21, and the cause expression determination means 22 may each be realized as separate hardware.
Next, the processing flow of this embodiment is described. FIG. 4 is a flowchart showing an example of the processing flow of the second embodiment of the present invention.
When text is input via the input means 1, the syntax analysis means 23 parses the input text and obtains an analysis result (step S11). That is, it identifies the modification relationships among the expressions in the text.
Next, the clue matching means 24 matches the clue expressions against the text and, based on the location in the text that matches a clue expression and on the modification relationships among the expressions in the text, extracts a result expression and a cause expression candidate from the text (step S12).
Next, the emotion analysis means 21 determines the polarity of each of the extracted result expression and cause expression candidate (step S13).
Then, the cause expression determination means 22 uses the polarity of the result expression and the polarity of the cause expression candidate, both derived in step S13, to determine whether the cause expression candidate is a cause expression representing the cause of the result represented by the result expression (step S14). In step S14, the cause expression determination means 22 causes the output means 4 to output the determination result.
The polarity determination in step S13 is the same as in step S1 of the first embodiment, and the determination in step S14 is the same as in step S2 of the first embodiment.
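The flow of steps S11 to S14 can be outlined as a short Python function. This is a sketch under stated assumptions: the function extract_cause and its three callable parameters (parse, match_clue, polarity_of) are hypothetical stand-ins for the syntax analysis means 23, the clue matching means 24, and the emotion analysis means 21, and the final test uses the simple polarity-match rule rather than any learned model.

```python
from typing import Any, Callable, Optional, Tuple


def extract_cause(
    text: str,
    parse: Callable[[str], Any],
    match_clue: Callable[[str, Any], Optional[Tuple[str, str]]],
    polarity_of: Callable[[str], str],
) -> Optional[str]:
    """Return the cause expression found in text, or None if there is none."""
    analysis = parse(text)                    # step S11: syntax analysis
    pair = match_clue(text, analysis)         # step S12: clue matching
    if pair is None:
        return None                           # no clue expression was found
    candidate, result = pair
    candidate_polarity = polarity_of(candidate)   # step S13: polarity
    result_polarity = polarity_of(result)
    # Step S14: accept the candidate only when the polarities match.
    return candidate if candidate_polarity == result_polarity else None
```

Passing the three stages as parameters keeps the outline independent of any particular parser or sentiment analyzer.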
In the second embodiment, if a single text contains a cause expression that describes the cause corresponding to a result expression, that cause expression can be extracted accurately. In other words, even without specifying and inputting a cause expression candidate and a result expression, simply inputting a text allows the cause expression to be extracted accurately when the text contains both a result expression and a cause expression.
In the second embodiment, the result expression and the cause expression candidate are extracted from a single text (for example, a single sentence). The present invention requires both a result expression and a cause expression candidate to determine whether the candidate is a cause expression. The second embodiment is therefore well suited to the present invention.
Although the input data in the second embodiment was described above as preferably being a single sentence, it need not be a single sentence. The input data in the second embodiment may be any text long enough to contain a clue expression together with a result expression and a cause expression candidate.
An example corresponding to the first embodiment is described below with reference to FIG. 1. First, pairs each consisting of a result expression, which represents the result of a causal relationship, and a cause expression candidate, which is a candidate for an expression describing the cause of that result, are input via the input means 1. In this example, two pairs are input: a pair whose result expression is "エラーが発生した" (era- ga hassei shita) and whose cause expression candidate is "メモリが不足していた" (memori ga fusoku shite ita) (hereinafter, the first pair), and a pair whose result expression is "エラーが発生した" (era- ga hassei shita) and whose cause expression candidate is "テキスト検索を行った" (tekisuto kensaku wo okonatta) (hereinafter, the second pair). The Japanese "エラーが発生した" corresponds to "An error occurred." in English, "メモリが不足していた" corresponds to "The memory was insufficient.", and "テキスト検索を行った" corresponds to "Text search was executed."
The emotion analysis means 21 determines the polarity of the result expression and the cause expression candidate of each pair. In this example, assume that the polarity of the result expression "エラーが発生した" (era- ga hassei shita) is determined to be negative, that the polarity of the cause expression candidate "メモリが不足していた" (memori ga fusoku shite ita) is determined to be negative, and that the polarity of the cause expression candidate "テキスト検索を行った" (tekisuto kensaku wo okonatta) is determined to be neutral.
The cause expression determination means 22 uses these polarities to determine whether each cause expression candidate is truly a cause expression. This example shows a case in which the cause expression determination means 22 also derives, as the confidence that a cause expression candidate is a cause expression, a score representing how likely the candidate is to be a cause expression (hereinafter, cause score).
In the first pair, the result expression ("エラーが発生した", era- ga hassei shita) and the cause expression candidate ("メモリが不足していた", memori ga fusoku shite ita) both have negative polarity, so the polarities match. The cause expression determination means 22 therefore determines that the candidate "メモリが不足していた" is a cause expression corresponding to "エラーが発生した". In the second pair, the polarity of the result expression ("エラーが発生した") is negative and the polarity of the cause expression candidate ("テキスト検索を行った", tekisuto kensaku wo okonatta) is neutral, so the polarities do not match. The cause expression determination means 22 therefore determines that the candidate "テキスト検索を行った" is not a cause expression corresponding to "エラーが発生した". The cause expression determination means 22 then sets the confidence that "メモリが不足していた" is a cause expression corresponding to the result expression "エラーが発生した" higher than the confidence that "テキスト検索を行った" is a cause expression corresponding to that result expression.
There are various ways to raise the confidence. For example, when the polarity of the result expression matches the polarity of the cause expression candidate, 1 may be added to the cause score of that candidate, and when the polarities differ, the cause score of that candidate may be left unchanged. In this case, for the cause expression candidate of the first pair ("メモリが不足していた", memori ga fusoku shite ita), 1 is added to the initial value of the cause score (assumed to be 0), giving a cause score of 1. The cause score of the cause expression candidate of the second pair ("テキスト検索を行った", tekisuto kensaku wo okonatta) is left at its initial value of 0.
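The scoring rule just described can be written out as follows. The function name update_cause_score and the string polarity labels are assumptions made for this sketch; the two pairs and their polarities are the ones assumed in this example.

```python
def update_cause_score(score: int, result_polarity: str,
                       candidate_polarity: str) -> int:
    """Add 1 to the cause score when the polarities match; otherwise
    leave the score unchanged."""
    if result_polarity == candidate_polarity:
        return score + 1
    return score


# Candidate -> (polarity of the result expression, polarity of the candidate)
pairs = {
    "メモリが不足していた": ("negative", "negative"),   # first pair
    "テキスト検索を行った": ("negative", "neutral"),    # second pair
}

# Every cause score starts from the initial value 0.
cause_scores = {
    candidate: update_cause_score(0, result_pol, candidate_pol)
    for candidate, (result_pol, candidate_pol) in pairs.items()
}
```

Under these assumptions the first candidate ends with a cause score of 1 and the second stays at 0, matching the values given in the text.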
The cause expression determination means 22 outputs, for each cause expression candidate, the result of determining whether it is a cause expression (for example, by causing the output means 4 to display it). In this example, for the cause expression candidate whose cause score is 1 ("メモリが不足していた", memori ga fusoku shite ita), it outputs that the candidate is a cause expression. For the cause expression candidate whose cause score is 0 ("テキスト検索を行った", tekisuto kensaku wo okonatta), it outputs that the candidate is not a cause expression. The cause expression determination means 22 may output not only the discrete determination result of whether a candidate is a cause expression but also a continuous value such as the cause score.
When determining whether a cause expression candidate is a cause expression, the cause expression determination means 22 preferably uses machine learning, such as a support vector machine, in addition to referring to the polarities of the result expression and the cause expression candidate. Machine learning is a method in which a model that makes predictions close to correct-answer data is constructed automatically from that data, and predictions for new inputs are made based on the model. For cause expression extraction, correct-answer data indicating, for each pair of a result expression and a cause expression candidate, whether the candidate is truly a cause expression is prepared, and a prediction model that can classify this data correctly is learned. When a new result expression and cause expression candidate are given as input data, the model is used to predict whether the candidate is truly a cause expression. In machine learning generally, the characteristics to be used (called features) are selected in advance from among the various characteristics of the input data, and the prediction model is trained and subsequent predictions are made based on those features. Examples of features include the words and word sequences appearing in the result expression and the cause expression candidate, the part of speech and semantic class of each word, and whether a clue expression that tends to indicate a cause appears in the cause expression candidate.
In the present invention, whether the polarity of the result expression matches the polarity of the cause expression candidate is used as such a feature. It has been observed that the causes of negative events and states tend to be negative and, conversely, that the causes of positive events and states tend to be positive. Therefore, by using the polarity agreement between the result expression and the cause expression candidate as a feature, whether the cause expression candidate is a cause expression can be determined with high accuracy. As noted above, other features may be used in combination.
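To illustrate how the polarity-match feature could sit alongside the other features mentioned, the following Python sketch builds a feature dictionary for one pair. The function extract_features, the feature key names, and the assumption that the expressions arrive already tokenized are all hypothetical; no classifier is trained here.

```python
from typing import Dict, Iterable, Set


def extract_features(
    result_tokens: Iterable[str],
    candidate_tokens: Iterable[str],
    result_polarity: str,
    candidate_polarity: str,
    clue_words: Set[str],
) -> Dict[str, float]:
    """Build a sparse feature dictionary for one (result, candidate) pair."""
    candidate_tokens = list(candidate_tokens)
    features: Dict[str, float] = {}
    # Word-occurrence features for both expressions.
    for token in result_tokens:
        features[f"result_word={token}"] = 1.0
    for token in candidate_tokens:
        features[f"cand_word={token}"] = 1.0
    # Whether a cause-indicating clue word appears in the candidate.
    has_clue = any(token in clue_words for token in candidate_tokens)
    features["cand_has_clue"] = 1.0 if has_clue else 0.0
    # The polarity-agreement feature described in the text.
    match = result_polarity == candidate_polarity
    features["polarity_match"] = 1.0 if match else 0.0
    return features
```

A dictionary of this shape could then be vectorized, for instance with scikit-learn's DictVectorizer, and passed to a support vector machine; that pairing is one possible choice and is not stated in the specification.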
Even when two expressions appear at first glance to have opposite polarities, their polarities may in fact match. Consider the result expression "事なきを得た" (koto naki wo eta) and the cause expression "メモリが不足していなかった" (memori ga fusoku shite inakatta). The Japanese "事なきを得た" corresponds to "No harm is done." in English, and "メモリが不足していなかった" corresponds to "The memory was not insufficient." The polarity of this result expression is positive. The cause expression contains the wording "メモリが不足" (memori ga fusoku, "memory is insufficient") and so appears negative at first glance, but because that wording is used together with a negation word, its polarity is positive. When an expression is used together with a negation word or a paradoxical expression in this way, its polarity is often treated as inverted. In the above example, therefore, "事なきを得た" and "メモリが不足していなかった" both have positive polarity, and "メモリが不足していなかった" is a cause of "事なきを得た". A similar polarity inversion can also occur when a negative expression appears in a description of, for example, a person or organization with conflicting interests. The expression "ライバル社製品のメモリが不足していた" (raibarusha seihin no memori ga fusoku shiteita) contains the negative expression "メモリが不足", but it also contains "ライバル社" (raibarusha), which denotes a party with conflicting interests. As a result, the polarity of the description as a whole shifts away from negative, and the description is unlikely to be a cause expression for a negative event or state. Polarity inversion of this kind is described in Non-Patent Document 1. The Japanese sentence "ライバル社製品のメモリが不足していた" corresponds to the English sentence "The memory of the competitor's product was insufficient.", and the Japanese "ライバル社" corresponds to "competitor" in English.
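The negation case above can be illustrated with a toy polarity function in Python. Everything in this sketch is an assumption made for illustration: the tiny lexicon, the list of sentence-final negation markers, and the substring matching are far cruder than a real sentiment analyzer, and the inversion caused by conflicting interests ("ライバル社") is not handled.

```python
# Hypothetical polarity lexicon: surface string -> base polarity.
LEXICON = {
    "不足": "negative",
    "エラー": "negative",
    "事なきを得た": "positive",
}

# Hypothetical sentence-final negation markers.
NEGATION_ENDINGS = ("なかった", "ない")

FLIP = {"positive": "negative", "negative": "positive"}


def polarity(expression: str) -> str:
    """Return 'positive', 'negative' or 'neutral' for an expression,
    inverting the lexicon polarity when the expression ends in a negation."""
    base = "neutral"
    for word, word_polarity in LEXICON.items():
        if word in expression:
            base = word_polarity
            break
    if base != "neutral" and expression.endswith(NEGATION_ENDINGS):
        return FLIP[base]
    return base
```

With this toy function, "メモリが不足していなかった" and "事なきを得た" both come out positive, so the polarity-match rule would treat the former as a plausible cause of the latter.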
In this example, whether a cause expression candidate is a cause expression is determined using the polarity of the result expression and the polarity of the cause expression candidate, so the determination can be made more accurately than when polarity is not used.
An example corresponding to the second embodiment is described below with reference to FIG. 3. In this example, the clue dictionary 31 is assumed to contain the clue expressions "が原因で" (ga genninn de), "により" (niyori), and "で" (de).
Also in this example, the sentences "メモリ不足でエラーが発生した" (memori busoku de era- ga hassei shita) and "テキスト検索でエラーが発生した" (tekisuto kensaku de era- ga hassei shita) are assumed to be given as input data. Hereinafter, the former is referred to as the first sentence and the latter as the second sentence. The first sentence corresponds to the English sentence "An error occurred because of memory shortage.", and the second sentence corresponds to the English sentence "An error occurred when searching text."
The syntax analysis means 23 parses the input text and obtains an analysis result. Specifically, it identifies the modification relationships among the expressions in the text. FIG. 5 shows the modification relationships within the text for each of the above sentences. The solid arrows in FIG. 5 indicate the dependency head of each phrase. For example, they show that the phrase "エラーが" (era- ga) depends on the phrase "発生した" (hassei shita).
Next, the clue matching means 24 matches the clue expressions in the clue dictionary 31 against the text and, based on the location in the text that matches a clue expression and on the modification relationships among the expressions in the text, extracts a result expression and a cause expression candidate from the text. In this example, the clue matching means 24 detects the clue expression "で" (de) in both input sentences. For the first sentence, the clue matching means 24 extracts the expression immediately before the clue expression "で", namely "メモリ不足" (memori busoku), as the cause expression candidate, and extracts "エラーが発生した" (era- ga hassei shita), which is the dependency head of the phrase "メモリ不足で" (memori busoku de) containing that candidate, as the result expression. For the second sentence, the clue matching means 24 extracts the expression immediately before the clue expression "で", namely "テキスト検索" (tekisuto kensaku), as the cause expression candidate, and extracts "エラーが発生した" (era- ga hassei shita), which is the dependency head of the phrase "テキスト検索で" (tekisuto kensaku de) containing that candidate, as the result expression.
Here, several policies are conceivable as to whether a cause expression candidate or result expression should consist of a single phrase, such as "発生した" (hassei shita, "occurred"), or should include multiple phrases, such as "エラーが発生した" (era- ga hassei shita). Restricting a cause expression candidate or result expression to a single phrase is sometimes undesirable. For example, when the head expression has no polarity of its own, as with "発生した", the polarity of the whole may not be determined unless the obligatory case information is taken into account. In the case of "発生した", the polarity depends on what occurred. It is therefore preferable that the syntax analysis means 23 identify the obligatory cases of each expression, and that the clue matching means 24 extract cause expression candidates and result expressions as expressions that include their obligatory cases.
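The extraction described for the two example sentences can be sketched in Python as follows. The sketch assumes that a parser has already split the sentence into phrases (bunsetsu) and supplied, for each phrase, the index of its dependency head (-1 for the root); the function extract_pair and this input format are hypothetical. As a crude stand-in for including obligatory cases, the result expression is formed from the head phrase together with the other phrases that depend on it. Only clues that fall at the end of a single phrase are handled.

```python
from typing import List, Optional, Tuple

# Clue expressions assumed in this example, longest first so that
# "が原因で" is tried before the shorter "で".
CLUES = sorted(["が原因で", "により", "で"], key=len, reverse=True)


def extract_pair(bunsetsu: List[str],
                 heads: List[int]) -> Optional[Tuple[str, str]]:
    """Return (cause expression candidate, result expression), or None."""
    for i, chunk in enumerate(bunsetsu):
        for clue in CLUES:
            if chunk.endswith(clue) and len(chunk) > len(clue) and heads[i] >= 0:
                # The text immediately before the clue is the candidate.
                candidate = chunk[: -len(clue)]
                head = heads[i]
                # Result: the head phrase plus its other dependents,
                # kept in sentence order.
                result = "".join(
                    b for j, b in enumerate(bunsetsu)
                    if j != i and (j == head or heads[j] == head)
                )
                return candidate, result
    return None


first = extract_pair(["メモリ不足で", "エラーが", "発生した"], [2, 2, -1])
second = extract_pair(["テキスト検索で", "エラーが", "発生した"], [2, 2, -1])
```

For the two example sentences this yields the candidates "メモリ不足" and "テキスト検索", each paired with the result expression "エラーが発生した".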
 When the clue matching means 24 has extracted a result expression and a cause expression candidate from each of the first and second sentences, the emotion analysis means 21 determines the polarity of the result expression and of the cause expression candidate of each sentence. The polarity determination process performed by the emotion analysis means 21 is the same as that in the first example. Here, it is assumed that the polarity of the result expression “an error occurred (era- ga hassei shita)” is determined to be “negative”, that the polarity of the cause expression candidate “memory shortage (memori busoku)” is determined to be “negative”, and that the polarity of the cause expression candidate “text search (tekisuto kensaku)” is determined to be “neutral”.
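 The polarity determination itself is defined in the first example and is not reproduced here. Purely as an illustration, the sketch below assumes a small word-level polarity lexicon and sums the word scores; the lexicon entries and the `polarity` function are hypothetical and stand in for whatever sentiment analysis is actually used.

```python
# Hypothetical polarity lexicon: +1 for positive words, -1 for negative words.
POLARITY_LEXICON = {
    "era-": -1,    # "error"
    "busoku": -1,  # "shortage"
}


def polarity(expression: str) -> str:
    """Classify a romanized expression as positive, negative, or neutral."""
    total = sum(POLARITY_LEXICON.get(word, 0) for word in expression.split())
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


for expression in ("era- ga hassei shita", "memori busoku", "tekisuto kensaku"):
    print(expression, "->", polarity(expression))
```

 Under these assumed lexicon entries, the three expressions receive the polarities stated in the example above.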
 Subsequently, the cause expression determination means 22 uses the polarities of the result expression and the cause expression candidate of each sentence to determine whether the cause expression candidate in that sentence is a cause expression for the result expression. The determination process performed by the cause expression determination means 22 is the same as that of the cause expression determination means 22 in the first example. The cause expression determination means 22 determines that the cause expression candidate of the first sentence (“memory shortage (memori busoku)”) is a cause expression for the result expression of the first sentence (“an error occurred (era- ga hassei shita)”). It also determines that the cause expression candidate of the second sentence (“text search (tekisuto kensaku)”) is not a cause expression for the result expression of the second sentence (“an error occurred (era- ga hassei shita)”).
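 The judgment in this example is consistent with a simple rule in which a candidate is accepted when it has the same non-neutral polarity as the result expression. The sketch below assumes that rule only for illustration; the actual criterion is the one defined for the cause expression determination means 22 in the first example, and the function name is hypothetical.

```python
def is_cause(result_polarity: str, candidate_polarity: str) -> bool:
    """Assumed rule: accept a candidate whose non-neutral polarity matches the result's."""
    return (candidate_polarity != "neutral"
            and candidate_polarity == result_polarity)


# (candidate, candidate polarity, result polarity) taken from the example text.
examples = [
    ("memori busoku", "negative", "negative"),
    ("tekisuto kensaku", "neutral", "negative"),
]

for candidate, candidate_polarity, result_polarity in examples:
    print(candidate, "->", is_cause(result_polarity, candidate_polarity))
```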
 The cause expression determination means 22 then outputs, for the cause expression candidate of each sentence, the result of determining whether it is a cause expression. In addition to this determination result, the cause expression determination means 22 may, as in the first example, obtain a continuous value such as a cause score for each cause expression candidate and output that value as well.
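 An output that carries both the binary judgment and a continuous value could look like the sketch below. The score formula (the product of signed polarity strengths in the range -1 to 1, clipped at zero), the threshold, and the strength values are assumptions chosen only to make the example concrete; the cause score of the first example is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    candidate: str
    is_cause: bool
    cause_score: float


def judge(candidate: str, result_strength: float, candidate_strength: float,
          threshold: float = 0.5) -> Judgment:
    """Score a candidate from signed polarity strengths in [-1, 1] (assumed inputs)."""
    # Same-sign polarities give a positive product; otherwise the score is 0.
    score = max(0.0, result_strength * candidate_strength)
    return Judgment(candidate, score >= threshold, score)


# Hypothetical strengths: negative values mean negative polarity.
print(judge("memori busoku", -0.9, -0.8))
print(judge("tekisuto kensaku", -0.9, 0.0))
```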
 In the second example, a cause expression candidate and a result expression are extracted from an input sentence, a polarity is determined for each of them, and, based on those polarities, it is determined whether the cause expression candidate is a cause expression corresponding to the result expression. Therefore, a cause expression can be extracted accurately from an input sentence even when pairs of cause expression candidates and result expressions have not been obtained in advance.
 Next, the minimum configuration of the present invention will be described. FIG. 6 is a block diagram showing an example of the minimum configuration of the cause expression extraction device of the present invention. The cause expression extraction device of the present invention includes polarity determination means 91 and cause expression determination means 92.
 Given a result expression, which is an expression representing a result, and a cause expression candidate, which is a candidate for a cause expression describing the cause of that result, the polarity determination means 91 (for example, the emotion analysis means 21) determines a polarity for each of the result expression and the cause expression candidate. The polarity is a category used to classify the impression a person receives from an expression.
 The cause expression determination means 92 (for example, the cause expression determination means 22) uses the polarity of the result expression and the polarity of the cause expression candidate to determine whether the cause expression candidate is a cause expression describing the cause of the result.
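 The minimum configuration amounts to composing two components. The sketch below models them as two injected functions; the class name, the method name, and the toy stand-ins for the two means are hypothetical and serve only to show how the pieces fit together.

```python
from typing import Callable


class CauseExpressionExtractor:
    """Minimal composition of a polarity step and a cause judgment step."""

    def __init__(self,
                 determine_polarity: Callable[[str], str],
                 determine_cause: Callable[[str, str], bool]) -> None:
        self.determine_polarity = determine_polarity  # plays the role of means 91
        self.determine_cause = determine_cause        # plays the role of means 92

    def is_cause_expression(self, result_expression: str, candidate: str) -> bool:
        result_polarity = self.determine_polarity(result_expression)
        candidate_polarity = self.determine_polarity(candidate)
        return self.determine_cause(result_polarity, candidate_polarity)


# Toy stand-ins (assumptions): a fixed polarity table and a matching rule.
TOY_POLARITIES = {
    "era- ga hassei shita": "negative",
    "memori busoku": "negative",
}

extractor = CauseExpressionExtractor(
    determine_polarity=lambda expression: TOY_POLARITIES.get(expression, "neutral"),
    determine_cause=lambda result, cand: cand != "neutral" and cand == result,
)

print(extractor.is_cause_expression("era- ga hassei shita", "memori busoku"))
print(extractor.is_cause_expression("era- ga hassei shita", "tekisuto kensaku"))
```

 Keeping the two steps behind separate interfaces mirrors the block diagram: either step can be replaced without changing the other.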
 With such a configuration, when input data concerning a certain event or state is given and that input data contains a cause expression describing the cause that led to the event or state, the cause expression can be extracted from the input data.
 The device may also be configured to include clue expression storage means (for example, the storage device 3) for storing a clue expression that serves as a clue for extracting a cause expression, syntax analysis means (for example, the syntax analysis means 23) for performing syntax analysis on input text, and result expression / cause expression candidate extraction means (for example, the clue matching means) for extracting a result expression and a cause expression candidate from the text based on the location in the text corresponding to the clue expression and on the syntax analysis result.
 This application claims priority based on Japanese Patent Application No. 2012-166075 filed on July 26, 2012, the entire disclosure of which is incorporated herein.
 Although the present invention has been described above with reference to the embodiments and examples, the present invention is not limited to those embodiments and examples. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present invention within its scope.
Industrial applicability
 The present invention is suitably applied to a cause expression extraction device that extracts, from input expressions, an expression describing the cause that led to a certain event or state. For example, the present invention can be used for the process of extracting such expressions in FAQ (Frequently Asked Questions) creation support, which creates FAQs from a set of past cases consisting of pairs of questions about product or service defects and their answers; in text mining, which extracts important findings from text data; and in question answering systems, which answer natural language questions in natural language.
Description of symbols
 1 Input means
 2 Data processing device
 3 Storage device
 4 Output means
 21 Emotion analysis means
 22 Cause expression determination means
 23 Syntax analysis means
 24 Clue matching means
 31 Clue dictionary

Claims (6)

  1.  A cause expression extraction device comprising:
     polarity determination means for determining, when given a result expression, which is an expression representing a result, and a cause expression candidate, which is a candidate for a cause expression describing the cause of the result, a polarity for each of the result expression and the cause expression candidate, the polarity being a category used to classify the impression a person receives from an expression; and
     cause expression determination means for determining, using the polarity of the result expression and the polarity of the cause expression candidate, whether the cause expression candidate is a cause expression describing the cause of the result.
  2.  The cause expression extraction device according to claim 1, further comprising:
     clue expression storage means for storing a clue expression that serves as a clue for extracting a cause expression;
     syntax analysis means for performing syntax analysis on input text; and
     result expression / cause expression candidate extraction means for extracting a result expression and a cause expression candidate from the text based on a location in the text corresponding to the clue expression and on the syntax analysis result.
  3.  A cause expression extraction method comprising:
     determining, when given a result expression, which is an expression representing a result, and a cause expression candidate, which is a candidate for a cause expression describing the cause of the result, a polarity for each of the result expression and the cause expression candidate, the polarity being a category used to classify the impression a person receives from an expression; and
     determining, using the polarity of the result expression and the polarity of the cause expression candidate, whether the cause expression candidate is a cause expression describing the cause of the result.
  4.  The cause expression extraction method according to claim 3, further comprising:
     storing a clue expression that serves as a clue for extracting a cause expression;
     performing syntax analysis on input text; and
     extracting a result expression and a cause expression candidate from the text based on a location in the text corresponding to the clue expression and on the syntax analysis result.
  5.  A cause expression extraction program for causing a computer to execute:
     a polarity determination process of determining, when given a result expression, which is an expression representing a result, and a cause expression candidate, which is a candidate for a cause expression describing the cause of the result, a polarity for each of the result expression and the cause expression candidate, the polarity being a category used to classify the impression a person receives from an expression; and
     a cause expression determination process of determining, using the polarity of the result expression and the polarity of the cause expression candidate, whether the cause expression candidate is a cause expression describing the cause of the result.
  6.  The cause expression extraction program according to claim 5, causing a computer that includes clue expression storage means for storing a clue expression that serves as a clue for extracting a cause expression to execute:
     a syntax analysis process of performing syntax analysis on input text; and
     a result expression / cause expression candidate extraction process of extracting a result expression and a cause expression candidate from the text based on a location in the text corresponding to the clue expression and on the syntax analysis result.
PCT/JP2013/004022 2012-07-26 2013-06-27 Cause expression extraction device, cause expression extraction method, and cause expression extraction program WO2014017023A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012166075 2012-07-26
JP2012-166075 2012-07-26

Publications (1)

Publication Number Publication Date
WO2014017023A1 true WO2014017023A1 (en) 2014-01-30

Family

ID=49996851

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/004022 WO2014017023A1 (en) 2012-07-26 2013-06-27 Cause expression extraction device, cause expression extraction method, and cause expression extraction program

Country Status (1)

Country Link
WO (1) WO2014017023A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010271819A (en) * 2009-05-20 2010-12-02 Nec Corp Device, method, and program for extracting phrase relation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HIROKI SAKAJI ET AL.: "An Extraction Method of Causal Knowledge from Newspaper Corpus", IEICE TECHNICAL REPORT, vol. 111, no. 119, 30 June 2011 (2011-06-30), pages 7 - 10 *
RYUZO NAKAMICHI ET AL.: "Collection of connective expressions for emotion reasoning", IEICE TECHNICAL REPORT, vol. 108, no. 353, 6 December 2008 (2008-12-06), pages 1 - 6 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016012195A (en) * 2014-06-27 2016-01-21 Kddi株式会社 Factor estimation device, program, and factor estimation method
JP2016057989A (en) * 2014-09-11 2016-04-21 Kddi株式会社 Information provision device, and method and program for providing information
WO2018066445A1 (en) * 2016-10-05 2018-04-12 国立研究開発法人情報通信研究機構 Causal relationship recognition apparatus and computer program therefor
JP2018060364A (en) * 2016-10-05 2018-04-12 国立研究開発法人情報通信研究機構 Causal relation recognition device and computer program therefor
US11256658B2 (en) 2016-10-05 2022-02-22 National Institute Of Information And Communications Technology Causality recognizing apparatus and computer program therefor
JP2021108212A (en) * 2019-06-18 2021-07-29 ヤフー株式会社 Acquisition device, acquisition method, and acquisition program
JP7292324B2 (en) 2019-06-18 2023-06-16 ヤフー株式会社 Acquisition device, acquisition method, and acquisition program
CN117787267A (en) * 2023-12-29 2024-03-29 广东外语外贸大学 Emotion cause pair extraction method and system based on neural network
CN117787267B (en) * 2023-12-29 2024-06-07 广东外语外贸大学 Emotion cause pair extraction method and system based on neural network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13823643

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13823643

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP