JP7568592B2 - Method and computer system for evaluating models

Info

Publication number: JP7568592B2
Application number: JP2021123995A
Authority: JP
Inventors: 由真松浦; 将彦真野; 涼西村; 友樹石田
Original assignee: Hitachi Solutions Ltd
Current assignee: Hitachi Solutions Ltd
Priority date: 2021-07-29
Filing date: 2021-07-29
Publication date: 2024-10-16
Anticipated expiration: 2041-07-29
Also published as: JP2023019341A

Description

The present invention relates to a technique for improving the training data used to generate models that perform natural language processing.

In the field of natural language processing, AI systems (models) that analyze text data and perform tasks such as semantic interpretation and classification are being developed. These models are generated by machine learning.

Machine learning involves computation over large amounts of training data. Conventional machine learning has two problems: preparing training data is costly, and increasing the amount of training data does not necessarily improve a model's prediction accuracy. To generate a model with high prediction accuracy, it is necessary to prepare high-quality training data that contributes strongly to prediction accuracy. The technique described in Patent Document 1 is known in this regard.

Patent Document 1 discloses that "the present invention is a learning data accuracy visualization system including a display unit 6 that performs a predetermined display, a node optimal placement calculation processing unit 2b that uses a dynamic model to calculate the optimal placement of nodes for learning data that is the subject of natural language analysis, and a display control unit 2e that performs control so that the learning data is visualized and displayed on the display unit 6 based on the calculation results of the node optimal placement calculation processing unit 2b."

Patent Document 1: JP 2019-20946 A

The technique described in Patent Document 1 visualizes the correlation between the inputs (questions) to a model and the outputs (answers) from the model, thereby helping users correct the training data.

To improve the prediction accuracy of a model that performs natural language processing, the model needs to acquire an algorithm that processes natural language in the same way a person reasons about it. The training data must therefore be improved in consideration of not only the model's output but also the correctness of its algorithm. Here, improving training data is a concept that covers both correcting existing training data and adding new training data.

In addition, because the prediction accuracy of a model also depends on the quality of the test data used to evaluate the model, improving the test data is also important.

Conventional techniques consider neither the correctness of the algorithm nor the improvement of test data.

An object of the present invention is to provide a system and method that support the improvement of training data and test data by taking into account both the model's output and the correctness of its algorithm.

A representative example of the invention disclosed in the present application is as follows. Namely, it is a model evaluation method executed by a computer system, for evaluating a model that accepts a document as input and outputs a classification label by executing a task using the document. The computer system includes at least one computer having a processor, a storage device connected to the processor, and an interface connected to the processor, and holds judgment basis information that manages the association between each classification label and the keywords of interest that a person regards as important when assigning that classification label. The model evaluation method includes: a first step in which the at least one computer inputs test data including an input document and a correct classification label into the model; a second step in which the at least one computer identifies the basis keywords in the input document that the model regarded as important when outputting the classification label; a third step in which the at least one computer judges whether the output of the model is correct based on the classification label and the correct classification label; a fourth step in which the at least one computer judges whether the judgment basis of the model is correct based on the basis keywords and the keywords of interest corresponding to the classification label; and a fifth step in which the at least one computer presents, based on the results of judging the correctness of the model's output and of the model's judgment basis, an evaluation result including information that serves as a guideline for improving the test data and the training data used to retrain the model.
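As a non-limiting illustration, the five steps above might be sketched in Python as follows. The names `EvalRecord` and `evaluate`, the convention that the model is a callable returning a label together with its basis keywords, and the rule that the judgment basis counts as correct when at least one basis keyword matches a keyword of interest are all assumptions introduced here; the specification does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """One piece of test data: an input document and its correct label."""
    document: str
    correct_label: str


def evaluate(model, record, judgment_basis):
    """Evaluate one test record.

    model: hypothetical callable, document -> (label, basis_keywords).
    judgment_basis: dict mapping a label to the keywords of interest
    that a person regards as important for that label.
    """
    # Steps 1 and 2: input the document and obtain the output label
    # together with the keywords the model regarded as important.
    label, basis_keywords = model(record.document)

    # Step 3: judge the output against the correct classification label.
    output_ok = label == record.correct_label

    # Step 4: judge the judgment basis. Assumed rule: at least one basis
    # keyword matches a keyword of interest for the output label.
    keywords_of_interest = set(judgment_basis.get(label, ()))
    basis_ok = bool(set(basis_keywords) & keywords_of_interest)

    # Step 5: return the result so that it can be presented to the user.
    return {"label": label, "output_ok": output_ok, "basis_ok": basis_ok}
```

A stricter matching rule (for example, requiring every keyword of interest to appear among the basis keywords) could be substituted in step 4 without changing the overall flow.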

According to the present invention, improvement of training data and test data can be supported by presenting improvement guidelines for the training data and test data that take into account the correctness of both the model's output and its judgment basis. By improving at least one of the training data and the test data based on the improvement guidelines, the user can generate a model with high prediction accuracy.

FIG. 1 is a diagram illustrating an example of the configuration of a system according to a first embodiment.

FIG. 2 is a diagram illustrating an example of the hardware configuration of a computer constituting the computer system of the first embodiment.

FIG. 3A is a diagram illustrating an example of the data structure of training data in the first embodiment.

FIG. 3B is a diagram illustrating an example of the data structure of test data in the first embodiment.

FIG. 4A is a diagram illustrating an example of the data structure of information for managing a training dataset in the first embodiment.

FIG. 4B is a diagram illustrating an example of the data structure of information for managing a test dataset in the first embodiment.

FIG. 5 is a diagram illustrating an example of the data structure of data stored in the judgment basis information of the first embodiment.

FIG. 6A, FIG. 6B, and FIG. 6C are diagrams illustrating an example of the data structure of data stored in the model management information of the first embodiment.

FIG. 7 is a sequence diagram illustrating the processing flow of the system of the first embodiment.

FIG. 8 is a flowchart illustrating the learning/evaluation process executed by the computer system of the first embodiment.

Embodiments of the present invention are described below with reference to the drawings. However, the present invention should not be construed as being limited to the description of the embodiments below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the concept or spirit of the present invention.

In the configurations of the invention described below, identical or similar configurations or functions are given the same reference numerals, and duplicate explanations are omitted.

FIG. 1 is a diagram showing an example of the configuration of a system according to the first embodiment. FIG. 2 is a diagram showing an example of the hardware configuration of a computer constituting the computer system of the first embodiment. FIG. 3A is a diagram showing an example of the data structure of training data in the first embodiment. FIG. 3B is a diagram showing an example of the data structure of test data in the first embodiment.

The system is composed of a computer system 100 and a terminal 101. The computer system 100 and the terminal 101 are connected to each other directly or via a network (not shown). The network is, for example, a WAN (Wide Area Network) or a LAN (Local Area Network), and the connection may be either wired or wireless.

The terminal 101 is a terminal operated by a user of the computer system 100. The terminal 101 has a processor, a memory, and a network interface, none of which are shown.

The computer system 100 analyzes text data such as inquiries and reports, executes one of two tasks based on the analysis results, namely classification of text data or inclusion judgment between two pieces of text data, and outputs a classification label. In text data classification, for example, the business field to which the text content relates is identified, and a classification label indicating the identified field is assigned to the text data. In inclusion judgment, for example, the similarity between two texts is judged, and a classification label indicating either similar or dissimilar is assigned to the text data.

The computer system 100 is composed of a computer 200 such as the one shown in FIG. 2. The computer system 100 may also include a storage system, a network switch, and the like.

The computer 200 has a processor 201, a main storage device 202, a secondary storage device 203, and a network interface 204.

The processor 201 executes programs stored in the main storage device 202. By executing processing according to a program, the processor 201 operates as a functional unit (module) that realizes a specific function. In the following description, when processing is described with a functional unit as the subject, it means that the processor 201 is executing the program that realizes that functional unit. The main storage device 202 is a memory or the like, and stores the programs executed by the processor 201 and the data used by those programs. The main storage device 202 is also used as a work area. The secondary storage device 203 is an HDD (Hard Disk Drive), an SSD (Solid State Drive), or the like, and stores data persistently. The programs and data stored in the main storage device 202 may instead be stored in the secondary storage device 203. In that case, the processor 201 reads the programs and data from the secondary storage device 203 and loads them into the main storage device 202. The network interface 204 communicates with other devices via a network.

The computer 200 may also have an IO interface for connecting to input devices such as a keyboard, a mouse, and a touch panel, and to output devices such as a display and a printer.

The computer system 100 has, as functional units, a learning unit 110, a prediction unit 111, an API 112, and a data storage unit 113.

The API 112 provides the terminal 101 with an interface for using the various functions of the computer system 100. Via the API 112, the terminal 101 registers and updates datasets, starts the learning process, and so on. In this embodiment, the user uses the terminal 101 to input judgment basis information 120. The judgment basis information 120 is information on the keywords that serve as the judgment basis of the model in a task. Details of the judgment basis information 120 are described with reference to FIG. 5.

The data storage unit 113 stores and manages one or more training datasets 130 and one or more test datasets 140.

A training dataset 130 is composed of one or more pieces of training data 131. The training data 131 is data used in the learning process for generating a model and, as shown in FIG. 3A, includes a data ID 301, text data 302, and a classification label 303. The training data 131 may include fields other than these.

The data ID 301 is a field that stores identification information of the training data 131. The text data 302 is a field that stores the text data to be input to the model. When the task is classification, one piece of text data is stored in the text data 302; when the task is inclusion judgment, two pieces of text data are stored in the text data 302. The classification label 303 is a field that stores the correct value of the classification label output by executing the task on the text data. For text data classification, the classification label 303 stores a business field or the like; for inclusion judgment, the classification label 303 stores similar, dissimilar, or the like.

A test dataset 140 is composed of one or more pieces of test data 141. The test data 141 is data used to evaluate the prediction accuracy of a model and, as shown in FIG. 3B, includes a data ID 311, text data 312, and a classification label 313. The test data 141 may include fields other than these.

The data ID 311 is a field that stores identification information of the test data 141. The text data 312 is a field that stores the text data to be input to the model. When the task is classification, one piece of text data is stored in the text data 312; when the task is inclusion judgment, two pieces of text data are stored in the text data 312. The classification label 313 is a field that stores the correct value of the classification label output by executing the task on the text data. For text data classification, the classification label 313 stores a business field or the like; for inclusion judgment, the classification label 313 stores similar, dissimilar, or the like.

Note that a test dataset 140 may be generated by selecting some of the training data 131 from a training dataset 130.
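For illustration only, the record layout shared by FIG. 3A and FIG. 3B, and the selection of some training data as test data, might be sketched as follows. The class name `LabeledText`, the helper `split_off_test_data`, the random selection, and the choice to remove the selected records from the training side are assumptions made here; the specification only states that part of the training data may be selected.

```python
import random
from dataclasses import dataclass


@dataclass
class LabeledText:
    data_id: str   # data ID 301 / 311
    texts: tuple   # one text for classification, two for inclusion judgment
    label: str     # correct classification label 303 / 313


def split_off_test_data(training_data, test_fraction=0.2, seed=0):
    """Select part of the training data as test data (hypothetical helper).

    Returns (remaining_training_data, test_data). Keeping the two sets
    disjoint is an assumption of this sketch, not a stated requirement.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = list(training_data)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```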

In the following description, when the training dataset 130 and the test dataset 140 are not distinguished, they are simply referred to as datasets.

The learning unit 110 executes the learning process for generating a model that performs a task. The learning unit 110 also executes the model evaluation process. The learning unit 110 stores the results of the learning process and the evaluation process in the model management information 121. Details of the model management information 121 are described with reference to FIG. 6A, FIG. 6B, and FIG. 6C.

The prediction unit 111 executes a task on input text data using the model information stored in the model management information 121.

As for the functional units of the computer system 100, multiple functional units may be combined into one, or one functional unit may be divided into multiple functional units by function.

FIG. 4A is a diagram showing an example of the data structure of information for managing the training dataset 130 in the first embodiment. FIG. 4B is a diagram showing an example of the data structure of information for managing the test dataset 140 in the first embodiment.

The data storage unit 113 manages each training dataset 130 using the dataset management data 400 shown in FIG. 4A, and manages each test dataset 140 using the dataset management data 410 shown in FIG. 4B. One piece of dataset management data 400 exists for each training dataset 130, and one piece of dataset management data 410 exists for each test dataset 140.

The dataset management data 400 includes a training dataset name 401, a task 402, and a list 403. The dataset management data 400 may include fields other than these.

The training dataset name 401 is a field that stores the name that identifies the training dataset 130. The task 402 is a field that stores the type of task executed by the model generated using the training dataset 130. The list 403 is a field that stores a list of the identification information of the training data 131 constituting the training dataset 130. The training data 131 itself may be stored instead of the list of identification information.

The dataset management data 410 includes a test dataset name 411, a task 412, and a list 413. The dataset management data 410 may include fields other than these.

The test dataset name 411 is a field that stores the name that identifies the test dataset 140. The task 412 is a field that stores the type of task executed by the model to be evaluated. The list 413 is a field that stores a list of the identification information of the test data 141 constituting the test dataset 140. The test data 141 itself may be stored instead of the list of identification information.

FIG. 5 is a diagram showing an example of the data structure of data stored in the judgment basis information 120 of the first embodiment.

The judgment basis information 120 includes one or more pieces of judgment basis data 500. One piece of judgment basis data 500 exists for each classification label. The judgment basis data 500 includes a classification label 501 and keywords 502.

The classification label 501 is a field that stores a classification label. The keywords 502 is a field that stores the keywords that the user regards as important when assigning that classification label. In other words, it stores data indicating the basis of a person's judgment.
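The structure described above, with one entry per classification label, might be held in memory as in the following sketch. `JudgmentBasisData`, `build_judgment_basis_info`, and the rejection of duplicate labels are hypothetical choices made for illustration, and the example labels and keywords in the usage note are invented.

```python
from dataclasses import dataclass, field


@dataclass
class JudgmentBasisData:
    classification_label: str                   # classification label 501
    keywords: set = field(default_factory=set)  # keywords 502


def build_judgment_basis_info(entries):
    """Build judgment basis information from (label, keywords) pairs.

    Enforces one entry per classification label, as the text states.
    """
    info = {}
    for label, keywords in entries:
        if label in info:
            raise ValueError(f"duplicate classification label: {label}")
        info[label] = JudgmentBasisData(label, set(keywords))
    return info
```

For example, `build_judgment_basis_info([("accounting", ["invoice", "payment"])])` would yield a single entry keyed by the label `"accounting"`.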

FIG. 6A, FIG. 6B, and FIG. 6C are diagrams showing an example of the data structure of data stored in the model management information 121 of the first embodiment.

The model management information 121 includes one or more pieces of model data 600. One piece of model data 600 exists for each model. The model data 600 includes a model name 601, a task 602, model parameters 603, learning parameters 604, a training dataset name 605, a test dataset name 606, a status 607, a valid flag 608, an accuracy evaluation index 609, and an evaluation result 610. The model data 600 may include fields other than these.

The model name 601 is a field that stores the name that identifies the model.

The task 602 is a field that stores the type of task executed by the model.

The model parameters 603 is a field that stores the parameters that define the model. At the start of the learning process, the model parameters 603 stores the parameters of an initial model. For example, in the case of BERT (Bidirectional Encoder Representations from Transformers), the model parameters 603 stores the parameters of a model generated by pre-training.

The learning parameters 604 is a field that stores the hyperparameters for controlling the learning process.

The training dataset name 605 is a field that stores the name of the training dataset 130 used in the learning process. The training dataset name 605 may also store the number of pieces of training data 131 to be retrieved from the training dataset 130.

The test dataset name 606 is a field that stores the name of the test dataset 140 used in the evaluation process. The test dataset name 606 may also store the number of pieces of test data 141 to be retrieved from the test dataset 140.

The status 607 is a field that stores a value indicating the execution state of the learning process. For example, the status 607 stores a value such as "not yet trained", "training in progress", or "training completed".

The valid flag 608 is a field that stores a flag indicating whether the model is enabled as a model to be used by the prediction unit 111. For example, the valid flag 608 stores True, indicating that the model is enabled, or False, indicating that the model is disabled.

The accuracy evaluation index 609 is a field that stores the values of indices for evaluating the prediction accuracy of the model. For example, the accuracy evaluation index 609 stores the accuracy, recall, precision, F-score, and the like.
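The indices named above have standard definitions, and one way to compute them from the correct labels and the model's output labels is sketched below. The function names are hypothetical, precision, recall, and F-score are computed per label, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def accuracy(correct, predicted):
    """Fraction of test data whose output label equals the correct label."""
    return sum(c == p for c, p in zip(correct, predicted)) / len(correct)


def precision_recall_f(correct, predicted, label):
    """Precision, recall, and F-score (F1) for a single classification label."""
    pairs = list(zip(correct, predicted))
    tp = sum(c == label and p == label for c, p in pairs)  # true positives
    fp = sum(c != label and p == label for c, p in pairs)  # false positives
    fn = sum(c == label and p != label for c, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```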

The evaluation result 610 is a field that stores the results of the evaluation process. The evaluation result 610 stores evaluation result data 650 as shown in FIG. 6B. One piece of evaluation result data 650 exists for each piece of test data 141.

The evaluation result data 650 includes a data ID 651, a classification label 652, a confidence 653, a classification label correctness 654, a model judgment basis 655, and a judgment basis correctness 656. The evaluation result data 650 may include fields other than these.

The data ID 651 is the same field as the data ID 311.

The classification label 652 is a field that stores the classification label obtained by inputting the test data 141 into the model.

The confidence 653 is a field that stores the confidence (probability) of the classification label output by the model.

The classification label correctness 654 is a field that stores a value indicating whether the value of the classification label 313 of the test data 141 matches the value of the classification label 652. For example, the classification label correctness 654 stores True, indicating that the two values match, or False, indicating that they do not match.

The model judgment basis 655 is a group of fields that stores information on the keywords that the model regarded as important when outputting the classification label. In other words, it stores data indicating the judgment basis of the model. The model judgment basis 655 includes a token list 661 and an attention list 662. As shown in FIG. 6C, the token list 661 stores a list of character strings, such as words, extracted from the text data. The attention list 662 stores a list of the importance values of the character strings stored in the token list 661. The importance of a character string is a value calculated using a known technique such as an attention mechanism or SHAP (SHapley Additive exPlanations).
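Given a token list and a parallel attention list of the kind described above, the character strings the model weighted most heavily could be picked out as in the following sketch. The function name `top_basis_keywords` and the top-k selection rule are assumptions made here; the specification does not state how basis keywords are chosen from the importance values.

```python
def top_basis_keywords(token_list, attention_list, k=3):
    """Return the k tokens with the highest importance values.

    token_list and attention_list are parallel lists, mirroring the
    token list 661 and attention list 662. Ties keep their original order.
    """
    if len(token_list) != len(attention_list):
        raise ValueError("token list and attention list must have equal length")
    ranked = sorted(
        zip(token_list, attention_list),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [token for token, _ in ranked[:k]]
```

A threshold on the importance value could be used instead of a fixed k; either way, the selected tokens would then be compared with the keywords 502 registered for the output label.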

The judgment basis correctness 656 is a field that stores a value indicating whether the judgment basis of the user and the judgment basis of the model match. For example, the judgment basis correctness 656 stores True, indicating that the judgment bases of the user and the model match, or False, indicating that they do not match.

FIG. 7 is a sequence diagram illustrating the processing flow of the system of the first embodiment.

The system of the first embodiment has three processing phases: registration of datasets; generation and evaluation of a model; and updating of the training dataset 130.

The user operates the terminal 101 to send the computer system 100 a dataset registration request for setting up at least one of a training dataset 130 and a test dataset 140 (step S101). The dataset registration request includes the name of the dataset, the type of task, the data to be included in the dataset, and the like.

When the data storage unit 113 receives the dataset registration request via the API 112, it registers the dataset in a storage area (step S102) and sends a completion notification to the terminal 101 (step S103). At this time, the data storage unit 113 generates at least one of the dataset management data 400 and the dataset management data 410 corresponding to the dataset.

The user operates the terminal 101 to send a learning execution request to the computer system 100 (step S111). The learning execution request includes the name of the model, the name of the training dataset 130, the name of the test dataset 140, the learning parameters, and the judgment basis information 120.
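To make the contents of the learning execution request concrete, the following sketch assembles such a request and checks that every item listed above is present. The JSON encoding, the key names, and the helper `make_learning_request` are hypothetical; the specification does not define a wire format for the API 112.

```python
import json

# Hypothetical key names for the items the request is stated to include.
REQUIRED_KEYS = {
    "model_name",
    "training_dataset_name",
    "test_dataset_name",
    "learning_parameters",
    "judgment_basis_info",
}


def make_learning_request(**fields):
    """Serialize a learning execution request, rejecting incomplete ones."""
    missing = REQUIRED_KEYS - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # ensure_ascii=False keeps non-ASCII keywords (e.g. Japanese) readable.
    return json.dumps(fields, ensure_ascii=False)
```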

When the learning unit 110 receives the learning execution request via the API 112, it executes the learning/evaluation process (step S112). Details of the learning/evaluation process are described with reference to FIG. 8.

When the learning/evaluation process ends, the learning unit 110 sends a completion notification to the terminal 101 via the API 112 (step S113).

ユーザは、端末１０１を操作して、評価結果を取得するための取得要求を計算機システム１００に送信する（ステップＳ１１４）。 The user operates the terminal 101 to send a request to the computer system 100 to obtain the evaluation results (step S114).

学習部１１０は、ＡＰＩ１１２を介して、取得要求を受信した場合、評価情報を生成し、評価情報を端末１０１に送信する（ステップＳ１１５）。評価情報は、例えば、精度評価指標の値及び評価結果データ６５０を含む。なお、評価情報には、分類ラベル正誤６５４がＦａｌｓｅの評価結果データ６５０のみを含めてもよい。 When the learning unit 110 receives the acquisition request via the API 112, it generates evaluation information and transmits it to the terminal 101 (step S115). The evaluation information includes, for example, the value of the accuracy evaluation index and the evaluation result data 650. Note that the evaluation information may include only the evaluation result data 650 whose classification label correctness 654 is False.

ユーザは、評価情報を参照して、データセットの更新方法を決定する。例えば、モデルの分類ラベル及び判断根拠の両方に誤りがある場合、ユーザは、モデルが判断根拠として指定した文字列を正しく認識できないと判断し、文字列に関連する学習データ１３１を追加する。また、判断根拠は正しいが、モデルの分類ラベルが誤っている場合、ユーザは、テストデータ１４１に誤りがあると判断し、テストデータ１４１を修正する。 The user refers to the evaluation information and decides how to update the dataset. For example, if there are errors in both the classification label and the judgment basis of the model, the user determines that the model cannot correctly recognize the character string specified as the judgment basis, and adds learning data 131 related to the character string. On the other hand, if the judgment basis is correct but the classification label of the model is incorrect, the user determines that there is an error in the test data 141, and corrects the test data 141.

なお、学習部１１０が、前述のような判定を行って、判定結果を評価情報に含めてもよい。 The learning unit 110 may perform the above-mentioned judgment and include the judgment result in the evaluation information.
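The determination described above, whether made by the user or by the learning unit 110, can be sketched as a small decision function. The function name and the returned strings are hypothetical. The two cases in which the classification label is wrong follow the description above. The handling of the remaining two cases is an assumption, since this passage does not state a recommendation for them.

```python
def suggest_dataset_update(label_correct: bool, basis_correct: bool) -> str:
    """Map the two correctness results to a dataset update suggestion."""
    if not label_correct and not basis_correct:
        # Both wrong: the model does not recognize the relevant string,
        # so learning data 131 related to that string should be added.
        return "add_learning_data"
    if not label_correct and basis_correct:
        # Basis is right but the label is wrong: the test data 141
        # is suspected to contain an error and should be corrected.
        return "correct_test_data"
    if label_correct and basis_correct:
        # Assumption: nothing needs to change when both are correct.
        return "no_update"
    # Label right but basis wrong: not covered by this passage (assumption).
    return "unspecified"


for label_ok in (True, False):
    for basis_ok in (True, False):
        print(label_ok, basis_ok, suggest_dataset_update(label_ok, basis_ok))
```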

ユーザは、データセットの更新方法を決定した後、端末１０１を操作して、データセットの更新内容を含むデータセット更新要求を計算機システム１００に送信する（ステップＳ１２１）。 After determining the method for updating the dataset, the user operates the terminal 101 to send a dataset update request including the update contents of the dataset to the computer system 100 (step S121).

データ記憶部１１３は、ＡＰＩ１１２を介して、データセット更新要求を受信した場合、更新対象のデータセットを更新し（ステップＳ１２２）、完了通知を端末１０１に送信する（ステップＳ１２３）。このとき、データ記憶部１１３は、データセットに対応したデータセット管理データ４００及びデータセット管理データ４１０の少なくともいずれかを更新する。 When the data storage unit 113 receives a dataset update request via the API 112, it updates the dataset to be updated (step S122) and sends a completion notification to the terminal 101 (step S123). At this time, the data storage unit 113 updates at least one of the dataset management data 400 and the dataset management data 410 corresponding to the dataset.

ユーザは、データセットを更新した後、端末１０１を操作して、学習実行要求を計算機システム１００に送信する（ステップＳ１１１）。 After updating the dataset, the user operates the terminal 101 to send a learning execution request to the computer system 100 (step S111).

図８は、実施例１の計算機システム１００が実行する学習／評価処理を説明するフローチャートである。 FIG. 8 is a flowchart explaining the learning/evaluation process executed by the computer system 100 of the first embodiment.

学習部１１０は、モデルデータ６００を生成し、判断根拠情報１２０をワークエリアに保存する（ステップＳ２０１）。 The learning unit 110 generates model data 600 and stores the judgment basis information 120 in the work area (step S201).

具体的には、学習部１１０は、学習実行要求に含まれる情報に基づいて、モデルデータ６００のモデル名６０１、タスク６０２、学習パラメータ６０４、学習データセット名６０５、及びテストデータセット名６０６の各々に値を設定する。また、学習部１１０は、モデルデータ６００のステータス６０７に「学習前」を設定する。 Specifically, the learning unit 110 sets values for each of the model name 601, task 602, learning parameters 604, learning dataset name 605, and test dataset name 606 of the model data 600 based on the information included in the learning execution request. In addition, the learning unit 110 sets the status 607 of the model data 600 to "before learning."

次に、学習部１１０は、ユーザによって指定された学習データセット１３０から学習データ１３１を取得する（ステップＳ２０２）。なお、学習データセット１３０の全ての学習データ１３１を取得してもよいし、所定の数の学習データ１３１を取得してもよい。 Next, the learning unit 110 acquires learning data 131 from the learning dataset 130 specified by the user (step S202). Note that all learning data 131 in the learning dataset 130 may be acquired, or a predetermined number of learning data 131 may be acquired.

次に、学習部１１０は、学習データ１３１及び学習パラメータ等を用いて、モデルを生成するための学習処理を実行する（ステップＳ２０３）。学習処理は公知の技術であるため詳細な説明は省略する。なお、本発明は、学習するモデルの種別及び学習の手法に限定されない。 Next, the learning unit 110 executes a learning process for generating a model using the learning data 131, the learning parameters, and the like (step S203). Because the learning process is a well-known technique, a detailed description is omitted. Note that the present invention is not limited to any particular type of model or learning method.

学習部１１０は、学習処理の開始時に、モデルデータ６００のステータス６０７を「学習中」に更新し、事前学習で生成されたモデルの情報をモデルパラメータ６０３に設定する。学習部１１０は、学習処理が終了した場合、モデルデータ６００のステータス６０７を「学習完了」に更新し、モデルパラメータ６０３に生成されたモデルのパラメータを設定する。 When the learning process starts, the learning unit 110 updates the status 607 of the model data 600 to "learning" and sets information on the model generated by pre-training in the model parameters 603. When the learning process ends, the learning unit 110 updates the status 607 of the model data 600 to "learning completed" and sets the parameters of the generated model in the model parameters 603.

次に、学習部１１０は、ユーザによって指定されたテストデータセット１４０からテストデータ１４１を取得する（ステップＳ２０４）。なお、テストデータセット１４０の全てのテストデータ１４１を取得してもよいし、所定の数のテストデータ１４１を取得してもよい。 Next, the learning unit 110 acquires test data 141 from the test dataset 140 specified by the user (step S204). Note that all test data 141 in the test dataset 140 may be acquired, or a predetermined number of test data 141 may be acquired.

次に、学習部１１０は、取得したテストデータ１４１を用いて評価処理を開始する（ステップＳ２０５）。学習部１１０は、取得したテストデータ１４１の中から一つのテストデータ１４１を選択する。 Next, the learning unit 110 starts the evaluation process using the acquired test data 141 (step S205). The learning unit 110 selects one piece of test data 141 from the acquired test data 141.

次に、学習部１１０は、学習処理によって生成されたモデルにテストデータを入力し、モデルから出力を取得する（ステップＳ２０６）。モデルから取得する出力には、分類ラベル、確信度、及びモデル判断根拠データ（トークンリスト及びアテンションリスト）が含まれる。 Next, the learning unit 110 inputs test data into the model generated by the learning process and obtains output from the model (step S206). The output obtained from the model includes a classification label, a confidence level, and model judgment basis data (a token list and an attention list).

次に、学習部１１０は、モデル判断根拠データに基づいて、モデルが重要視したキーワードを特定する（ステップＳ２０７）。例えば、アテンションの値が最も大きいキーワード、又は、アテンションの値が閾値より大きいキーワードが特定される。 Next, the learning unit 110 identifies the keywords that the model considered important, based on the model judgment basis data (step S207). For example, the keyword with the largest attention value, or the keywords whose attention value is greater than a threshold, are identified.
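The two selection rules named in step S207 can be sketched as follows, assuming the token list and the attention list are parallel lists of equal length. The function name, parameter names, and example values are hypothetical. How attention values are obtained from a particular model is outside the scope of this sketch.

```python
from typing import List, Optional


def identify_basis_keywords(
    tokens: List[str],
    attentions: List[float],
    threshold: Optional[float] = None,
) -> List[str]:
    """Return the tokens the model weighted most heavily.

    With no threshold, the token(s) with the largest attention value are
    returned; otherwise every token whose attention exceeds the threshold.
    """
    if len(tokens) != len(attentions):
        raise ValueError("token list and attention list must be the same length")
    if not tokens:
        return []
    if threshold is None:
        top = max(attentions)
        return [t for t, a in zip(tokens, attentions) if a == top]
    return [t for t, a in zip(tokens, attentions) if a > threshold]


# Invented example values.
tokens = ["printer", "does", "not", "start"]
attentions = [0.55, 0.05, 0.10, 0.30]
top_keywords = identify_basis_keywords(tokens, attentions)
above_threshold = identify_basis_keywords(tokens, attentions, threshold=0.2)
```

Here `top_keywords` contains only "printer", while `above_threshold` contains both "printer" and "start".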

次に、学習部１１０は、分類ラベル及び判断根拠の正誤を判定する（ステップＳ２０８）。具体的には、以下のような処理が実行される。 Next, the learning unit 110 judges whether the classification label and the judgment basis are correct (step S208). Specifically, the following process is executed.

（Ｓ２０８－１）学習部１１０は、テストデータ１４１の分類ラベル３１３の値と、モデルが出力した分類ラベルとを比較することによって、分類ラベルの正誤を判定する。すなわち、モデルの出力の正しさが評価される。 (S208-1) The learning unit 110 compares the value of the classification label 313 of the test data 141 with the classification label output by the model to determine whether the classification label is correct. In other words, the correctness of the model output is evaluated.

（Ｓ２０８－２）学習部１１０は、判断根拠情報１２０を参照して、分類ラベル５０１がテストデータ１４１の分類ラベル３１３に一致する判断根拠データ５００を検索する。学習部１１０は、特定されたキーワードと、検索された判断根拠データ５００のキーワード５０２に設定されるキーワードとを比較することによって、判断根拠の正誤を判定する。すなわち、モデルのアルゴリズムの正しさが評価される。 (S208-2) The learning unit 110 refers to the judgment basis information 120 and retrieves the judgment basis data 500 whose classification label 501 matches the classification label 313 of the test data 141. The learning unit 110 then compares the identified keywords with the keywords set in the keywords 502 of the retrieved judgment basis data 500 to determine whether the judgment basis is correct. In other words, the correctness of the model's algorithm is evaluated.

例えば、特定されたキーワードと、キーワード５０２に設定されるキーワードとが完全に一致する場合、又は、特定されたキーワードがキーワード５０２に設定されるキーワードに含まれる場合、学習部１１０は人の判断根拠とモデルの判断根拠とが一致する、と判定する。 For example, if the identified keyword completely matches the keyword set in keyword 502, or if the identified keyword is included in the keywords set in keyword 502, the learning unit 110 determines that the human judgment basis and the model judgment basis match.
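The comparison in step S208-2 can be sketched as a set inclusion check, because a complete match is a special case of the identified keywords being included in the registered keywords. The function name and the dictionary layout of the judgment basis information are hypothetical. Treating a missing entry or an empty set of identified keywords as "incorrect" is an assumption not stated in the description.

```python
from typing import Dict, Iterable, List


def judgment_basis_is_correct(
    identified_keywords: Iterable[str],
    judgment_basis: Dict[str, List[str]],
    correct_label: str,
) -> bool:
    """Compare the model's keywords with the human keywords for the label.

    judgment_basis maps a classification label to the keywords a person
    considers important when assigning that label.
    """
    registered = judgment_basis.get(correct_label)
    if registered is None:
        return False  # assumption: no human basis registered for this label
    identified = set(identified_keywords)
    if not identified:
        return False  # assumption: nothing identified counts as a mismatch
    # True for a complete match and for inclusion in the registered keywords.
    return identified <= set(registered)


# Invented example values.
basis = {"hardware_fault": ["printer", "power", "start"]}
match = judgment_basis_is_correct(["printer"], basis, "hardware_fault")
mismatch = judgment_basis_is_correct(["weather"], basis, "hardware_fault")
```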

以上が、ステップＳ２０８の処理の説明である。 This concludes the explanation of the processing in step S208.

次に、学習部１１０は、評価結果データ６５０を生成する（ステップＳ２０９）。 Next, the learning unit 110 generates evaluation result data 650 (step S209).

具体的には、学習部１１０は、データＩＤ６５１にテストデータ１４１の識別情報が設定された評価結果データ６５０を生成する。また、学習部１１０は、モデルデータ６００の評価結果６１０に評価結果データ６５０を格納する。 Specifically, the learning unit 110 generates evaluation result data 650 in which the identification information of the test data 141 is set in the data ID 651. In addition, the learning unit 110 stores the evaluation result data 650 in the evaluation result 610 of the model data 600.

次に、学習部１１０は、取得した全てのテストデータ１４１について処理が完了したか否かを判定する（ステップＳ２１０）。 Next, the learning unit 110 determines whether processing has been completed for all acquired test data 141 (step S210).

取得した全てのテストデータ１４１について処理が完了していない場合、学習部１１０は、ステップＳ２０５に戻り、同様の処理を実行する。 If processing has not been completed for all acquired test data 141, the learning unit 110 returns to step S205 and executes the same processing.

取得した全てのテストデータ１４１について処理が完了した場合、学習部１１０は、精度評価指標を算出する（ステップＳ２１１）。このとき、学習部１１０は、モデルデータ６００の精度評価指標６０９に算出された精度評価指標を設定する。 When processing has been completed for all acquired test data 141, the learning unit 110 calculates an accuracy evaluation index (step S211). At this time, the learning unit 110 sets the calculated accuracy evaluation index to the accuracy evaluation index 609 of the model data 600.
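The description does not specify which accuracy evaluation index is calculated in step S211. As one possible sketch, the fraction of test data with a correct classification label can be computed from the per-item evaluation results, together with the fraction with a correct judgment basis. The class, field, and function names are hypothetical, and the choice of these two ratios is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvaluationResult:
    """Simplified stand-in for the evaluation result data 650."""
    data_id: str          # identification information of the test data
    label_correct: bool   # result of step S208-1
    basis_correct: bool   # result of step S208-2


def accuracy_indices(results: List[EvaluationResult]) -> Dict[str, float]:
    """Return the share of correct labels and of correct judgment bases."""
    if not results:
        raise ValueError("at least one evaluation result is required")
    n = len(results)
    return {
        "label_accuracy": sum(r.label_correct for r in results) / n,
        "basis_agreement": sum(r.basis_correct for r in results) / n,
    }


# Invented example values.
results = [
    EvaluationResult("t1", True, True),
    EvaluationResult("t2", True, False),
    EvaluationResult("t3", False, False),
    EvaluationResult("t4", True, True),
]
indices = accuracy_indices(results)
```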

次に、学習部１１０は、終了条件を満たすか否かを判定する（ステップＳ２１２）。例えば、精度評価指標が閾値より大きい場合、又は、学習処理の実行回数が閾値より大きい場合、学習部１１０は終了条件を満たすと判定する。 Next, the learning unit 110 determines whether the termination condition is satisfied (step S212). For example, if the accuracy evaluation index is greater than a threshold value, or if the number of times the learning process has been executed is greater than a threshold value, the learning unit 110 determines that the termination condition is satisfied.

終了条件を満たさないと判定された場合、学習部１１０は、ステップＳ２０２に戻り、同様の処理を実行する。このとき、学習部１１０は、モデルデータ６００の精度評価指標６０９及び評価結果６１０を初期化する。 If it is determined that the termination condition is not satisfied, the learning unit 110 returns to step S202 and executes the same process. At this time, the learning unit 110 initializes the accuracy evaluation index 609 and the evaluation result 610 of the model data 600.

終了条件を満たすと判定された場合、学習部１１０は、学習／評価処理を終了する。 If it is determined that the termination condition is met, the learning unit 110 terminates the learning/evaluation process.
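The overall loop of FIG. 8, from step S202 through the termination check of step S212, can be sketched as follows. The callables `train` and `evaluate` are hypothetical placeholders for the learning process and for the evaluation of steps S204 to S211. The two example termination conditions use strict comparisons, as in the description of step S212.

```python
from typing import Callable, Tuple, TypeVar

M = TypeVar("M")


def run_learning_evaluation(
    train: Callable[[], M],
    evaluate: Callable[[M], float],
    accuracy_threshold: float,
    run_threshold: int,
) -> Tuple[M, float, int]:
    """Repeat learning and evaluation until a termination condition holds."""
    runs = 0
    while True:
        model = train()               # steps S202 to S203
        runs += 1
        accuracy = evaluate(model)    # steps S204 to S211
        # Step S212: stop when the index exceeds its threshold or the
        # number of learning runs exceeds its threshold.
        if accuracy > accuracy_threshold or runs > run_threshold:
            return model, accuracy, runs
        # Otherwise the previous index and results are discarded and the
        # loop returns to step S202.


# Stub callables with invented scores, for illustration only.
scores = iter([0.60, 0.70, 0.90])
model, accuracy, runs = run_learning_evaluation(
    train=lambda: "model",
    evaluate=lambda _model: next(scores),
    accuracy_threshold=0.85,
    run_threshold=10,
)
```

With the stub scores above, the loop stops on the third run, when the score first exceeds 0.85.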

本発明によれば、計算機システム１００は、人の判断根拠及びモデルの判断根拠を比較することによってモデルのアルゴリズムの正しさを評価し、ユーザに提示できる。ユーザは、モデルの出力及びアルゴリズムの正しさを考慮して学習データ及びテストデータの改善を行うことができる。これによって、高い予測精度のモデルを生成することができる。 According to the present invention, the computer system 100 can evaluate the correctness of the model's algorithm by comparing the human judgment basis with the model's judgment basis, and present the result to the user. The user can improve the training data and test data by taking into account the model's output and the correctness of the algorithm. This makes it possible to generate a model with high predictive accuracy.

なお、本発明は上記した実施例に限定されるものではなく、様々な変形例が含まれる。また、例えば、上記した実施例は本発明を分かりやすく説明するために構成を詳細に説明したものであり、必ずしも説明した全ての構成を備えるものに限定されるものではない。また、実施例の構成の一部について、他の構成に追加、削除、置換することが可能である。 The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment above describes the configuration in detail in order to explain the present invention clearly, and the invention is not necessarily limited to a system having all of the described configurations. In addition, for part of the configuration of the embodiment, other configurations can be added, deleted, or substituted.

１００計算機システム 100 Computer system
１０１端末 101 Terminal
１１０学習部 110 Learning unit
１１１予測部 111 Prediction unit
１１２ＡＰＩ 112 API
１１３データ記憶部 113 Data storage unit
１２０判断根拠情報 120 Judgment basis information
１２１モデル管理情報 121 Model management information
１３０学習データセット 130 Learning dataset
１３１学習データ 131 Learning data
１４０テストデータセット 140 Test dataset
１４１テストデータ 141 Test data
２００計算機 200 Computer
２０１プロセッサ 201 Processor
２０２主記憶装置 202 Main storage device
２０３副記憶装置 203 Sub-storage device
２０４ネットワークインタフェース 204 Network interface
４００、４１０データセット管理データ 400, 410 Dataset management data
５００判断根拠データ 500 Judgment basis data
６００モデルデータ 600 Model data
６５０評価結果データ 650 Evaluation result data

Claims

1. A method for evaluating a model, executed by a computer system, the model receiving a document as an input, executing a task using the document, and outputting a classification label, wherein
the computer system
comprises at least one computer having a processor, a storage device coupled to the processor, and an interface coupled to the processor, and
retains judgment basis information for managing an association between the classification label and a target keyword that a person considers important when assigning the classification label,
the method comprising:
a first step in which the at least one computer inputs test data including an input document and a correct classification label to the model;
a second step in which the at least one computer identifies a basis keyword in the input document that the model considered important when outputting the classification label;
a third step in which the at least one computer determines whether the output of the model is correct based on the classification label and the correct classification label;
a fourth step in which the at least one computer determines whether the judgment basis of the model is correct based on the basis keyword and the target keyword corresponding to the classification label; and
a fifth step in which the at least one computer presents an evaluation result including information serving as an improvement guideline for the training data and the test data used for re-training the model, based on the results of determining whether the output of the model is correct and whether the judgment basis of the model is correct.

2. The method for evaluating a model according to claim 1, wherein
the fifth step includes a step in which the at least one computer presents the evaluation result including information indicating the classification label to be improved, based on the results of determining whether the output of the model is correct and whether the judgment basis of the model is correct.

3. The method for evaluating a model according to claim 1, wherein
the second step includes:
a step in which the at least one computer obtains an index representing the importance, in the natural language processing of the model, of keywords included in the input document; and
a step in which the at least one computer identifies the basis keyword from among the keywords included in the input document based on the index.

4. The method for evaluating a model according to claim 1, further comprising
a step in which the at least one computer provides an interface for inputting the judgment basis information.

5. A computer system comprising
at least one computer having a processor, a storage device coupled to the processor, and an interface coupled to the processor, wherein
the computer system retains information on a model that receives a document as an input and outputs a classification label by executing a task using the document, and judgment basis information that manages an association between the classification label and a target keyword that a person considers important when assigning the classification label, and
the at least one computer:
inputs test data including an input document and a correct classification label into the model;
identifies a basis keyword in the input document that the model considered important when outputting the classification label;
determines whether the output of the model is correct based on the classification label and the correct classification label;
determines whether the judgment basis of the model is correct based on the basis keyword and the target keyword corresponding to the classification label; and
presents an evaluation result including information serving as an improvement guideline for the training data and the test data used for re-training the model, based on the results of determining whether the output of the model is correct and whether the judgment basis of the model is correct.

6. The computer system according to claim 5, wherein
the at least one computer presents the evaluation result including information indicating the classification label to be improved, based on the results of determining whether the output of the model is correct and whether the judgment basis of the model is correct.

7. The computer system according to claim 5, wherein
the at least one computer:
obtains an index representing the importance, in the natural language processing of the model, of keywords included in the input document; and
identifies the basis keyword from among the keywords included in the input document based on the index.

8. The computer system according to claim 5, wherein
the at least one computer provides an interface for inputting the judgment basis information.