JP7066385B2

JP7066385B2 - Information processing methods, information processing equipment, information processing systems and programs

Info

Publication number: JP7066385B2
Application number: JP2017228150A
Authority: JP
Inventors: 将実川岸; 裕之山本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-11-28
Filing date: 2017-11-28
Publication date: 2022-05-13
Anticipated expiration: 2037-11-28
Also published as: JP2019101485A

Description

The disclosure of this specification relates to an information processing method, an information processing apparatus, an information processing system, and a program.

Transfer learning is a known technique that uses the result of machine learning on one set of data to efficiently perform machine learning that is effective for other data (Non-Patent Document 1).

Patent Document 1 describes a technique that uses different labels attached to the same data: a classifier trained to classify a first label is used as the basis for transfer learning to a classifier that classifies a second label.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2017-084320

Non-Patent Document 1: Pan, S. J. and Yang, Q., "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, 22(10), pp. 1345-1359, 2010.

In some fields of data, such as medicine, the number of items carrying each type of label can differ. For example, if the first label is age and the second label is a blood test value, the first label is attached to every case, whereas the second label is attached only to cases in which a blood test was performed. Patent Document 1 does not address the case in which the number of data items differs by label type.

An information processing method according to an embodiment of the present invention includes: an acquisition step of acquiring a first medical data group to which, of a first label and a second label, only the first label is attached, and a second medical data group to which, of the first label and the second label, only the second label is attached; a first learning step of performing first machine learning based on the first medical data group; and a second learning step of performing second machine learning based on parameters in the first machine learning and on the second medical data group.

With the information processing method according to the embodiment of the present invention, even when a different type of label is attached to each data group, machine learning for the second label can be performed appropriately based on the result of machine learning for the first label.

FIG. 1 is a diagram showing an example of the functional configuration of an information processing apparatus.

FIG. 2 is a diagram showing an example of the hardware configuration of the information processing apparatus.

FIG. 3 is a flowchart showing an example of the processing of the information processing apparatus.

FIG. 4 is a diagram showing an example of the first machine learning and the second machine learning.

FIG. 5 is a diagram showing another example of the functional configuration of the information processing apparatus.

FIG. 6 is a flowchart showing another example of the processing of the information processing apparatus.

Embodiments for carrying out the present invention are described below with reference to the drawings.

<Embodiment 1>
In the medical field, diagnosis support devices are being developed that infer, from medical images, information that assists a physician's diagnosis and present it to the physician. This inference may rely on machine learning based on medical images and their labels. Machine learning requires a large amount of training data, but among the many kinds of information used in diagnosis, it may not be possible to obtain many data items that carry a label for the information to be inferred. Embodiment 1 aims to enable accurate machine learning even when only a small number of data items carry a label for the information to be inferred.

The information processing apparatus of Embodiment 1 transfers a classifier for the first label, constructed from a large amount of data, in order to construct a classifier for the second label from a small amount of data.

In the following, the medical images used for learning are chest X-ray CT images, and medical information is used as both the first label and the second label. Each of these labels is medical information indicating the state of the subject of the medical image. More specifically, the diagnosis name of a lung nodule is used as the first label, and an image finding of the lung nodule is used as the second label.

FIG. 1 shows an example of the functional configuration of the information processing apparatus 100 according to Embodiment 1. The information processing apparatus 100 is an example of an apparatus that executes the information processing method according to the embodiment of the present invention. In this embodiment, the storage unit 200 holds medical images used for learning, together with medical information that serves as labels for those images, such as diagnosis names and image findings. The storage unit 200 holds information extracted from a PACS (picture archiving and communication system), electronic medical records, and interpretation reports. Alternatively, the storage unit 200 may itself be a PACS, an electronic medical record system, or an interpretation report system, in which case it outputs the required information to the information processing apparatus 100 in response to a request from the apparatus.

The information processing apparatus 100 includes an acquisition unit 102, a selection unit 104, a first machine learning unit 106, and a second machine learning unit 108. The acquisition unit 102 sends a request to the storage unit 200 and acquires a first medical data group containing multiple pairs of a medical image and a diagnosis name (first label), and a second medical data group containing multiple pairs of a medical image and an image finding (second label). The selection unit 104 selects at least part of the data to which both the first label and the second label are attached as a third medical data group. The first machine learning unit 106 performs first machine learning that classifies diagnosis names based on the first medical data group. The second machine learning unit 108 performs, based on the result of the first machine learning, second machine learning that classifies image findings using the second medical data group and the third medical data group.

At least some of the units of the information processing apparatus 100 shown in FIG. 1 may be implemented as independent devices, or as software that implements the respective functions. In this embodiment, each unit is assumed to be implemented in software.

FIG. 2 shows an example of the hardware configuration of the information processing apparatus 100. The CPU 1001 mainly controls the operation of each component. The main memory 1002 stores control programs executed by the CPU 1001 and provides a work area while the CPU 1001 executes programs. The magnetic disk 1003 stores the operating system (OS), device drivers for peripheral devices, and programs for various application software, including a program that performs the processing described later. The CPU 1001 executes programs stored in the main memory 1002, the magnetic disk 1003, and so on, thereby implementing the functions (software) of the information processing apparatus 100 shown in FIG. 1 and the processing in the flowcharts described later.

The display memory 1004 temporarily stores display data. The monitor 1005 is, for example, a CRT monitor or a liquid crystal monitor, and displays images, text, and the like based on data from the display memory 1004. The mouse 1006 and the keyboard 1007 receive pointing input and character input from the user, respectively. These components are communicably connected to one another via a common bus 1008.

The CPU 1001 is an example of a processor. In addition to the CPU 1001, the information processing apparatus 100 may include a GPU, an FPGA (field-programmable gate array), or both. The main memory 1002 and the magnetic disk 1003 are examples of memory; the information processing apparatus 100 may also include an SSD (solid-state drive) as memory.

Next, the overall processing performed by the information processing apparatus 100 is described with reference to FIG. 3, a flowchart showing an example of that processing. In this embodiment, the processing shown in FIG. 3 is implemented by the CPU 1001 executing programs, stored in the main memory 1002, that implement the functions of the respective units.

In this embodiment, the data used in the first machine learning (the first medical data group) and the data used in the second machine learning (the union of the second and third medical data groups) are kept from overlapping. Because the first and second machine learning are not independent of each other, this avoids effects on learning caused by duplicated data. Although this embodiment describes the case in which the training data of the two learning stages do not overlap, overlap may be permitted, as in the modifications described later.

In step S310, the acquisition unit 102 requests the first and second medical data groups from the storage unit 200. In this embodiment, multiple medical images to which a diagnosis name is attached are acquired as the first medical data group, and multiple medical images to which the overall shape (one of the image findings) is attached are acquired as the second medical data group. In this embodiment, however, the first medical data group contains no data to which the image finding is attached. Medical images to which both a diagnosis name and an overall shape are attached form the third medical data group described later. Specifically, the acquisition unit 102 requests from the storage unit 200 the medical images to which a diagnosis name is attached and the medical images to which an overall-shape image finding is attached. From the data acquired, it takes the images that have a diagnosis name but no overall-shape finding as the first medical data group, and the images that have an overall-shape finding but no diagnosis name as the second medical data group. In other words, of the first label (for example, the diagnosis name) and the second label (for example, the image finding), the data group to which only the first label is attached is the first medical data group, and the data group to which only the second label is attached is the second medical data group. Step S310 is an example of an acquisition step of acquiring the first and second medical data groups.

Here, each diagnosis name is one of three: primary lung cancer, lung metastasis of cancer, or benign nodule. An image finding expresses a characteristic of a medical image, for example in words. Each overall shape is one of four: spherical, wedge-shaped, irregular, or planar. The first medical data group is also assumed to contain more data than the second medical data group (for example, five times as many or more).

The quantities given for the first and second medical data groups are only examples, and this condition need not be satisfied. They may also be read as applying to the union of the first and third medical data groups and the union of the second and third medical data groups, respectively. Furthermore, the number of data items with the first label may be smaller than the number with the second label, or the two may be about equal.

In step S320, the selection unit 104 selects, from the data acquired from the storage unit 200 in step S310, the medical images to which both a diagnosis name and an overall-shape image finding are attached, as the third medical data group. In this embodiment, all data carrying both labels are selected. That is, the data group to which both the first label (for example, the diagnosis name) and the second label (for example, the image finding) are attached is the third medical data group. The processing of step S320 may be merged with step S310 and performed by the acquisition unit 102.
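A minimal sketch of how steps S310 and S320 could partition stored records by which labels they carry, assuming a hypothetical `Record` type with optional `diagnosis` (first label) and `overall_shape` (second label) fields. It also checks the no-overlap condition between the two training sets that this embodiment adopts. The record IDs and label values are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One medical image with optional labels (hypothetical schema)."""
    image_id: str
    diagnosis: Optional[str] = None      # first label
    overall_shape: Optional[str] = None  # second label


def partition(records):
    """Split records into (first, second, third) groups by which labels are present."""
    first = [r for r in records if r.diagnosis is not None and r.overall_shape is None]
    second = [r for r in records if r.overall_shape is not None and r.diagnosis is None]
    third = [r for r in records if r.diagnosis is not None and r.overall_shape is not None]
    return first, second, third


records = [
    Record("img001", diagnosis="benign nodule"),
    Record("img002", overall_shape="spherical"),
    Record("img003", diagnosis="primary lung cancer", overall_shape="irregular"),
    Record("img004"),  # no labels: used by neither learning stage
]
first, second, third = partition(records)

# First learning uses `first`; second learning uses the union of `second` and `third`.
train_1 = {r.image_id for r in first}
train_2 = {r.image_id for r in second + third}
assert train_1.isdisjoint(train_2)  # no overlap, as in Embodiment 1
print(sorted(train_1), sorted(train_2))
```

Records with neither label fall outside all three groups, matching the region outside both circles in FIG. 4.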

In step S330, the first machine learning unit 106 performs first machine learning that classifies diagnosis names, based on the first medical data group acquired in step S310. The first machine learning trains a deep convolutional neural network (DCNN). A DCNN generally consists of an input layer, multiple convolutional layers, a fully connected layer, and an output layer. In the following, the total number of layers is denoted N, and the output layer is the N-th layer. Step S330 is an example of a first learning step of performing first machine learning based on the first medical data group.

In step S340, the second machine learning unit 108 performs second machine learning based on the second medical data group acquired in step S310 and the third medical data group selected in step S320. The second machine learning classifies image findings based on the result of the first machine learning performed in step S330. The result of the first machine learning is, for example, the parameters learned in the first machine learning or output values calculated from those parameters; it is referred to below as the parameters in the first machine learning. In this embodiment, the second machine learning is performed by fine-tuning the result of the first machine learning using the union of the second and third medical data groups. That is, the second machine learning unit 108 also trains a DCNN. Fine-tuning a DCNN means replacing the output layer of a model trained on data with the first label by an output layer suited to the second label, and then re-training on data with the second label, using the parameters of the trained model as initial values. Compared with training without such initial values, this method can reach comparable performance with less data.
Step S340 is an example of a second learning step of performing second machine learning based on the parameters in the first machine learning and on the second medical data group.
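The initialization step of fine-tuning can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the patent's implementation: the DCNN is reduced to a stack of fully connected weight matrices (biases omitted), `init_layer`, `build_model`, and `fine_tune_init` are hypothetical helpers, and the re-training loop that follows initialization is omitted. Layers 1 to N-1 are copied as initial values, and the N-th (output) layer is replaced so that it has one output per overall-shape class (4) instead of one per diagnosis name (3).

```python
import copy
import random


def init_layer(n_out, n_in, rng):
    """Small random weight matrix of shape n_out x n_in (biases omitted for brevity)."""
    return [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]


def build_model(layer_sizes, rng):
    """A stack of N layers; layer_sizes = [input_dim, hidden_1, ..., output_dim]."""
    return [init_layer(n_out, n_in, rng) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def fine_tune_init(trained, n_new_classes, rng):
    """Copy layers 1..N-1 from the trained model and replace layer N (the output layer)."""
    body = copy.deepcopy(trained[:-1])           # used as initial values for re-training
    n_in = len(trained[-1][0])                   # input width of the old output layer
    head = init_layer(n_new_classes, n_in, rng)  # new output layer for the second label
    return body + [head]


rng = random.Random(0)
DIAGNOSES = ["primary lung cancer", "lung metastasis of cancer", "benign nodule"]
SHAPES = ["spherical", "wedge-shaped", "irregular", "planar"]

# Stand-in for the model already trained in step S330 (here it is merely initialized).
first_model = build_model([16, 8, 8, len(DIAGNOSES)], rng)
second_model = fine_tune_init(first_model, len(SHAPES), rng)
print(len(second_model[-1]), second_model[0] == first_model[0])
```

In a deep-learning framework the copied layers would then be re-trained together with the new head on the second-label data; whether to freeze any of them is a design choice the text leaves open.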

The machine learning method is not limited to a DCNN. The first machine learning unit 106 and the second machine learning unit 108 may use other methods, such as Bayesian networks, support vector machines, or random forests.

FIG. 4 shows an example of the relationship between the data groups and of fine-tuning in this embodiment. The two circles within the full data set represent the set of medical images with a diagnosis name (left) and the set of medical images with an overall-shape image finding (right). The first medical data group 410 (dark portion in FIG. 4) and the second medical data group 420 (light portion in FIG. 4) schematically represent the data groups acquired in step S310. The third medical data group 430 is the data group selected in step S320, consisting of medical images to which both a diagnosis name and an overall-shape image finding are attached.

The first machine learning unit 106 performs the first machine learning on the first medical data group 410 (dark portion in FIG. 4) (step S330). The second machine learning unit 108 has the same model structure as the first machine learning unit 106 except for the output layer (the N-th layer), and takes the parameters 440 of layers 1 through N-1 from the first machine learning result as initial values. Whereas the output layer 450 of the first machine learning unit outputs the diagnosis name (first label), the output layer 460 of the second machine learning unit is replaced by one that outputs the overall shape (second label). The second machine learning is then performed on the union of the second and third medical data groups (the light and hatched portions in FIG. 4, that is, the data group represented by the right-hand circle) (step S340). Because the second machine learning builds on the result of the first, the amount of data needed for effective second machine learning can be reduced.

Embodiment 1 describes, as an example, second machine learning performed with the data group to which the overall-shape image finding is attached, but second machine learning may be performed separately for each different image finding. That is, the parameters from a single machine learning run (the first machine learning) may be used for multiple transfer learning runs (second machine learning). As noted above, medical data involve many types of image findings, while the amount of data labeled with each individual finding may be small. Performing multiple transfer learning runs from the parameters of a single machine learning run is therefore especially effective for machine learning on medical data.
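Reusing one trained model as the starting point for several findings might look like the following sketch. `make_layer` and `transfer_per_finding` are hypothetical helpers, the weight-matrix model is a simplified stand-in for the DCNN, and `finding_x` is a placeholder finding name, not one taken from this document. Each returned model would then be fine-tuned on the data labeled with its own finding.

```python
import copy
import random


def make_layer(n_out, n_in, rng):
    """Small random weight matrix of shape n_out x n_in."""
    return [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]


def transfer_per_finding(base_model, findings, rng):
    """Build one fine-tuning starting point per image finding from a single trained model.

    base_model: list of weight matrices (n_out x n_in); the last one is the output layer.
    findings: dict mapping a finding name to its list of possible values.
    """
    body = base_model[:-1]
    n_in = len(base_model[-1][0])
    # Each finding gets its own copy of layers 1..N-1 and its own new output layer.
    return {name: copy.deepcopy(body) + [make_layer(len(values), n_in, rng)]
            for name, values in findings.items()}


rng = random.Random(0)
# Stand-in for the model from the first machine learning (3 diagnosis outputs).
base = [make_layer(8, 16, rng), make_layer(8, 8, rng), make_layer(3, 8, rng)]
findings = {
    "overall_shape": ["spherical", "wedge-shaped", "irregular", "planar"],
    "finding_x": ["present", "absent"],  # hypothetical second finding
}
models = transfer_per_finding(base, findings, rng)
print({name: len(m[-1]) for name, m in models.items()})
```

Copying the shared layers per finding keeps the fine-tuning runs independent, so re-training for one finding cannot alter the starting point of another.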

For example, suppose the first medical data group has 9000 items, the second 500, and the third 1000. If the data with overlapping labels (the third medical data group) are not used for machine learning, the first machine learning uses 9000 training items and the second uses 500. With the processing of this embodiment, the first machine learning still uses 9000 items, while the second uses 1500. The training data for the second machine learning are therefore three times as many as when data whose labels overlap with those of the first machine learning's training data are excluded. Since machine learning generally becomes more accurate with more data, the accuracy of the second machine learning can be expected to improve compared with using the duplicated data for neither learning stage.
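The counting in this example can be checked with a few lines; `training_sizes` is a hypothetical helper that only tracks data counts.

```python
def training_sizes(n_first, n_second, n_third, use_overlap):
    """Return (first training-set size, second training-set size).

    If use_overlap is True, the third group (both labels) is added to the second
    machine learning only, as in this embodiment; otherwise it is not used at all.
    """
    return n_first, n_second + (n_third if use_overlap else 0)


baseline = training_sizes(9000, 500, 1000, use_overlap=False)
proposed = training_sizes(9000, 500, 1000, use_overlap=True)
print(baseline, proposed, proposed[1] / baseline[1])
```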

In this embodiment, the first machine learning is performed on the first medical data group, to which, of the first and second labels, only the first label is attached. The second machine learning is performed on the second medical data group, to which only the second label is attached, together with the third medical data group, to which both labels are attached. Using the data with both labels only for the second machine learning increases the amount of data available to it, so the second machine learning based on the first can be performed appropriately. The method of this embodiment is particularly effective when the second medical data group is small.

(Modification 1 of Embodiment 1)
In Embodiment 1, step S320 selects, as the third medical data group, the data acquired from the storage unit 200 to which both a diagnosis name and an overall-shape image finding are attached. In Modification 1, the third medical data group may instead be created by attaching the second label to part of the first medical data group. Alternatively, the second label may be attached to part of the first medical data group and those data added to the third medical data group. This processing is performed by a labeling unit 112 (not shown) of the information processing apparatus 100, which attaches the second label to part of the first medical data group. In yet another example, the selection unit 104 may select only part of the data to which both the first and second labels are attached as the third medical data group.

For example, suppose the first medical data group has 9500 items, the second 500, and the initial third medical data group 500. Applying the processing of this embodiment, the first machine learning uses 9500 training items and the second uses 1000. Now suppose 500 items are selected from the 9500 items of the initial first medical data group, given the second label, and added to the third medical data group. The first machine learning then uses 9000 items and the second uses 1500. Compared with before this processing, the first machine learning has about 95% of its training data and the second has 150%. Reducing the training data to 95% is expected to have almost no negative effect, whereas increasing it to 150% can be expected to have a substantial positive effect.
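The effect of Modification 1 on the training-set sizes can be reproduced from the counts above. `relabel_from_first` is a hypothetical helper that only moves counts; in the apparatus, the labeling unit 112 would actually attach the second label to the selected items.

```python
def relabel_from_first(n_first, n_second, n_third, k):
    """Move k items from the first group to the third group once they receive the second label."""
    if not 0 <= k <= n_first:
        raise ValueError("k must be between 0 and n_first")
    return n_first - k, n_second, n_third + k


before = (9500, 500, 500)                  # (first, second, third) group sizes
after = relabel_from_first(*before, k=500)

# First learning uses the first group; second learning uses second + third.
first_before, second_before = before[0], before[1] + before[2]
first_after, second_after = after[0], after[1] + after[2]
print(round(first_after / first_before * 100, 1), second_after / second_before * 100)
```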

Thus, according to this modification, even when the second medical data group is small, adding labeled data secures enough data for the second machine learning, so the second machine learning based on the first can be performed appropriately.

(Modification 2 of Embodiment 1)
In Embodiment 1, step S340 performs the second machine learning by transferring (fine-tuning) the result of the first machine learning. However, when the data group with the second label contains more than a predetermined number of items, ordinary machine learning may be more accurate than transferring the first machine learning result, so ordinary machine learning may be performed without transfer. That is, the first and second machine learning may be performed independently.

For example, suppose the ratio of the number of cases in the data group to which the second label is attached to the number of cases in the data group to which the first label is attached exceeds 0.5. In that case, ordinary machine learning is judged to be more accurate than transferring the first machine learning result. This ratio is only an example, and another value may be used. The judgment may also be made by a method that does not use a ratio.
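The ratio rule above reduces to a one-line check. The sketch below is only an illustration. The function name is hypothetical, and the 0.5 default is the example value from the text rather than a recommended setting.

```python
def use_transfer(n_first_labeled, n_second_labeled, max_ratio=0.5):
    """Return True to fine-tune the second model from the first ML result,
    or False to train it independently (ordinary machine learning).
    max_ratio=0.5 is the example value from the text, not a tuned setting."""
    if n_first_labeled <= 0:
        raise ValueError("n_first_labeled must be positive")
    # "Exceeds 0.5" means ordinary learning, so a ratio of exactly 0.5 still transfers.
    return n_second_labeled / n_first_labeled <= max_ratio


print(use_transfer(10000, 2000))  # True  (ratio 0.2: transfer)
print(use_transfer(10000, 6000))  # False (ratio 0.6: ordinary learning)
```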

According to this modification, the transfer is skipped whenever ordinary machine learning is judged to be more accurate than transferring the first machine learning result. This allows the second machine learning to be performed more appropriately. This modification is particularly effective when data are collected continuously.

(Modification 3 of Embodiment 1)
In the first embodiment, the first machine learning and the second machine learning in steps S330 and S340 use the same method, but they may use different methods. For example, the first machine learning may use a DCNN. The second machine learning may then train a support vector machine that takes the intermediate output of the DCNN (the first machine learning result) as its input. These are merely examples, and other methods may be used.
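A minimal Python sketch of this two-method arrangement follows; it is not the patent's implementation. A fixed ReLU layer stands in for the trained DCNN's intermediate output, and the data are a toy set. The support vector machine is a linear SVM trained with the Pegasos stochastic subgradient method. This solver is swapped in for illustration, since the text does not name one. All names are hypothetical.

```python
import random

# Hypothetical stand-in for the trained DCNN's intermediate layer (the "first
# machine learning result"): a fixed linear map followed by ReLU. In practice
# these activations would come from the network trained in step S330.
FROZEN_WEIGHTS = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]


def intermediate_features(x):
    """Return the frozen layer's ReLU activations for a 2-D input x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def train_linear_svm(feats, labels, lam=0.1, steps=20000, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.
    labels must be +1 or -1; no bias term, for brevity."""
    rng = random.Random(seed)
    w = [0.0] * len(feats[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(feats))
        eta = 1.0 / (lam * t)
        violated = labels[i] * dot(w, feats[i]) < 1.0  # hinge loss is active
        w = [(1.0 - eta * lam) * wj for wj in w]       # shrink (regularizer)
        if violated:
            w = [wj + eta * labels[i] * fj for wj, fj in zip(w, feats[i])]
    return w


def predict(w, f):
    return 1 if dot(w, f) >= 0.0 else -1


# Toy data: the "second label" is the sign of the first input coordinate.
xs = [[a, b] for a in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0)
      for b in (-1.0, -0.5, 0.0, 0.5, 1.0)]
ys = [1 if x[0] > 0 else -1 for x in xs]
feats = [intermediate_features(x) for x in xs]  # step 1: frozen DCNN features
w = train_linear_svm(feats, ys)                 # step 2: SVM on those features
accuracy = sum(predict(w, f) == y for f, y in zip(feats, ys)) / len(ys)
print(f"training accuracy: {accuracy:.2f}")
```

The first model is never updated here; only the SVM weights are learned. This is what distinguishes this arrangement from the fine-tuning used in the first embodiment.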

According to this modification, the most suitable method can be used for each of the first machine learning and the second machine learning. This allows the second machine learning based on the first machine learning to be performed more appropriately.

(Modification 4 of Embodiment 1)
In the first embodiment, the third medical data group is selected in step S320. However, it may be known that the data group to which the first label is attached and the data group to which the second label is attached share no data. In that case, the third medical data group need not be selected in step S320. The learning in steps S330 and S340 may then be performed using only the first medical data group and the second medical data group.
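One way to decide whether step S320 can be skipped is to check the case identifiers of the two labeled collections for overlap. The sketch assumes that each case has a unique ID shared across collections; this is an assumption, since the text does not say how overlap is detected. The function names are hypothetical.

```python
def split_by_labels(first_labeled, second_labeled):
    """first_labeled / second_labeled: sets of case IDs carrying each label.
    Returns (D1, D2, D3): first-label-only, second-label-only, and both."""
    d3 = first_labeled & second_labeled
    return first_labeled - d3, second_labeled - d3, d3


def groups_for_training(first_labeled, second_labeled):
    """When the two collections share no case (for example, they come from
    different data sources), skip the third group and return None for it."""
    d1, d2, d3 = split_by_labels(set(first_labeled), set(second_labeled))
    return (d1, d2, d3) if d3 else (d1, d2, None)


_, _, d3 = groups_for_training({"p1", "p2"}, {"q1"})
print(d3)  # None: train on D1 and D2 only
_, _, d3 = groups_for_training({"p1", "p2"}, {"p2", "q1"})
print(sorted(d3))  # ['p2']
```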

According to this modification, the second machine learning based on the first machine learning can be performed appropriately even when the first medical data group and the second medical data group share no data. This is particularly effective when the first and second medical data groups are acquired from different data sources.

<Embodiment 2>
The information processing apparatus 500 according to the second embodiment decides whether the third medical data group is used for learning in the first machine learning and the second machine learning. It makes this decision based on the number of cases in the third medical data group.

FIG. 5 is a diagram showing an example of the functional configuration of the information processing apparatus 500 according to the second embodiment. For components given the same reference numerals as in FIG. 1, only the differences from the first embodiment are described.

The information processing apparatus 500 includes an acquisition unit 102, a selection unit 104, a determination unit 510, a first machine learning unit 506, and a second machine learning unit 508. Based on the number of cases in the third medical data group, the determination unit 510 determines whether the first machine learning unit 506 and the second machine learning unit 508 use the third medical data group for learning. The first machine learning unit 506 performs the first machine learning, which classifies diagnosis names, using the data decided by that determination. The second machine learning unit 508 performs the second machine learning, which classifies image findings. It uses the data decided by that determination and builds on the result of the first machine learning.

The hardware configuration of the information processing apparatus 500 in this embodiment is the same as that shown in FIG. 2 for the first embodiment.

Next, the overall processing performed by the information processing apparatus 500 is described with reference to the flowchart of FIG. 6. For processes given the same reference numerals as in FIG. 3, only the differences from the first embodiment are described.

The processing of steps S610 and S620 is the same as that of steps S310 and S320 in the first embodiment.

In step S625, the determination unit 510 determines whether the third medical data group selected in step S620 is used in the first machine learning (step S630) and the second machine learning (step S640). The determination is based on the number of cases in the third medical data group.

In the present embodiment, suppose the number of cases in the third medical data group is larger than a predetermined value. Then the determination unit 510 determines that the third medical data group is not used in the first machine learning and is used in the second machine learning. In this case, steps S630 and S640 perform the same processing as in the first embodiment.

When the number of cases in the third medical data group is equal to or less than the predetermined value, the determination unit 510 determines that the third medical data group is used in neither step S630 nor step S640. In this case, the third medical data group is instead used as evaluation data for both the first machine learning means and the second machine learning means. Using the same evaluation data for both allows the machine learning to take the accuracy of the first machine learning and that of the second machine learning into account at the same time.
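The Embodiment 2 rule can be sketched as follows. The threshold value and the TrainingPlan structure are illustrative assumptions. The sketch only shows how the decision in step S625 routes the third group to one of two places: the training data of the second machine learning (step S640), or a shared evaluation set.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingPlan:
    """Which cases each learning step uses (hypothetical structure)."""
    first_train: list
    second_train: list
    evaluation: list = field(default_factory=list)


def plan_step_s625(d1, d2, d3, threshold):
    """d1, d2, d3: lists of cases in the first, second and third medical data groups.
    threshold: the 'predetermined value' for |D3| (its value is not given in the text)."""
    if len(d3) > threshold:
        # Enough overlapping cases: as in Embodiment 1, the third group joins the
        # second machine learning (fine-tuning) but not the first.
        return TrainingPlan(first_train=list(d1), second_train=list(d2) + list(d3))
    # Too few overlapping cases: keep them out of training and use them as a
    # common evaluation set for both models.
    return TrainingPlan(first_train=list(d1), second_train=list(d2), evaluation=list(d3))


plan = plan_step_s625(d1=["a", "b", "c"], d2=["x"], d3=["m", "n"], threshold=1)
print(plan.second_train, plan.evaluation)  # ['x', 'm', 'n'] []
```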

In step S630, the first machine learning unit 506 performs the first machine learning, which classifies diagnosis names, based on the first medical data group acquired in step S610. As in the first embodiment, the machine learning uses a DCNN, and its description is omitted.

In step S640, the second machine learning unit 508 performs the second machine learning, which classifies image findings. It builds on the first machine learning result obtained in step S630 and uses the second medical data group acquired in step S610 and the third medical data group selected in step S620. In the present embodiment, if step S625 determined that the third medical data group is used, learning is performed on the union of the second and third medical data groups, as in the first embodiment. If step S625 determined that the third medical data group is not used, learning is performed on the second medical data group alone. As in the first embodiment, the second machine learning is performed by fine-tuning the first machine learning result with a DCNN, and its description is omitted.

In the present embodiment, the number of cases in the third medical data group determines whether that group is used for learning in the first and second machine learning. Because the data used for learning can be chosen flexibly according to the state of the third medical data group, the second machine learning based on the first machine learning can be performed appropriately.

(Modification 1 of Embodiment 2)
In the second embodiment, step S625 determines whether the third medical data group is used for learning in the first and second machine learning. That determination is based on the number of cases in the third medical data group. The numbers of cases in the first and second medical data groups may also be taken into account. More specifically, the determination is based on the ratios among three counts: the number of cases in the first medical data group (|D1|), in the second medical data group (|D2|), and in the third medical data group (|D3|).

For example, when |D3| is equal to or less than a first predetermined value, the determination unit 510 determines that the third medical data group is used in neither the first nor the second machine learning.

When |D3| is larger than the first predetermined value, the determination unit 510 decides based on the ratios as follows. Suppose |D3|/|D1| is equal to or less than a second predetermined value and |D3|/|D2| is larger than it. Then the third medical data group is small relative to the first medical data group but large relative to the second medical data group. In this case, leaving out the third medical data group is considered to have little effect on the first machine learning. On the other hand, training the second machine learning on the second medical data group alone, without the third, is considered to have a large effect. The determination unit 510 therefore determines that the third medical data group is not used in the first machine learning but is used in the second machine learning. This increases the number of cases having the second label, so the second machine learning can be performed more appropriately.

If |D3|/|D1| and |D3|/|D2| are both equal to or less than the second predetermined value, the third medical data group is small relative to both the first and the second medical data groups. In this case, leaving out the third medical data group is considered to have little effect on either machine learning. The determination unit 510 therefore determines that the third medical data group is used in neither the first nor the second machine learning. The third medical data group may then be used as evaluation data, as described above.

If |D3|/|D1| and |D3|/|D2| are both larger than the second predetermined value, the third medical data group is large relative to both the first and the second medical data groups. In this case, leaving out the third medical data group is considered to have a large effect on both the first and the second machine learning. The loss of training data is expected to hurt each of them more than training both on overlapping data would. The determination unit 510 therefore determines that the third medical data group is used in both the first and the second machine learning.

Suppose |D3|/|D1| is larger than the second predetermined value and |D3|/|D2| is equal to or less than it. Then the third medical data group is large relative to the first medical data group but small relative to the second. This implies |D1| < |D2|. The determination unit 510 therefore determines that the third medical data group is used in both the first and the second machine learning. As in Modification 2 of Embodiment 1, it also determines that the second machine learning is performed as ordinary machine learning, without transferring the first machine learning result.

Step S625 is thus an example of a determination step. This step determines, based on the number of cases in the third medical data group, whether the third medical data group is used for at least one of the first machine learning and the second machine learning.
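The initial |D3| check and the four ratio cases can be summarized as a decision function. In this sketch, t1 and t2 stand for the first and second predetermined values, whose actual settings the text does not give. The field names are hypothetical, and "transfer" means fine-tuning the second model from the first machine learning result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    third_in_first: bool   # use D3 to train the first model
    third_in_second: bool  # use D3 to train the second model
    transfer: bool         # fine-tune the second model from the first result
    third_as_eval: bool    # D3 may serve as shared evaluation data


def decide(n1, n2, n3, t1, t2):
    """n1, n2, n3: |D1|, |D2|, |D3|. t1, t2: first and second predetermined values."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("n1 and n2 must be positive")
    if n3 <= t1:
        return Decision(False, False, True, True)
    r1, r2 = n3 / n1, n3 / n2
    if r1 <= t2 and r2 > t2:   # small vs D1, large vs D2
        return Decision(False, True, True, False)
    if r1 <= t2:               # small vs both
        return Decision(False, False, True, True)
    if r2 > t2:                # large vs both
        return Decision(True, True, True, False)
    # Large vs D1, small vs D2 (implies n1 < n2): ordinary learning, no transfer.
    return Decision(True, True, False, False)


print(decide(9500, 500, 500, t1=100, t2=0.5))
# Decision(third_in_first=False, third_in_second=True, transfer=True, third_as_eval=False)
```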

According to this modification, the data used for learning can be chosen according to the states of the first and second medical data groups as well as that of the third. As a result, the second machine learning based on the first machine learning can be performed more appropriately.

(Other embodiments)
The above embodiments were described using learning on lung nodules in chest X-ray CT images as an example, but the present invention is not limited to this. The target medical image may be acquired with at least one imaging apparatus, such as:
- a CT apparatus
- a digital radiography apparatus
- an MRI (Magnetic Resonance Imaging) apparatus
- a SPECT (Single Photon Emission CT) apparatus
- a PET (Positron Emission Tomography) apparatus
- an ultrasonic diagnostic apparatus
- a fundus camera
- a photoacoustic apparatus
The target lesion is not limited to a lung nodule shadow and may be a lesion at any site of the subject. Nor is the learning target limited to medicine. For example, the images used for learning may be photographs taken with a camera, with the scene of the image as the first label and the state of an object in the image (sky, trees, and so on) as the second label.

In the above embodiments, the image finding on the overall shape of the lesion was used as the second label, but the present invention is not limited to this; the second label may be any image finding. Examples of image findings include:
- the overall shape of a lesion
- the size of the lesion
- findings indicating the state of anatomical structures (for example, findings on air bronchograms)
- findings indicating the detailed shape of the lesion (for example, findings on notching of the lesion margin or on spiculation)

The present invention can also be realized as follows. A program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium. One or more processors in a computer of that system or apparatus then read and execute the program. The invention can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

The information processing apparatus in each of the above embodiments may be realized as a single apparatus. It may also be realized as a plurality of apparatuses communicably combined with each other to execute the above processing. Both forms are included in the embodiments of the present invention. The above processing may also be executed by a shared server apparatus or server group. The apparatuses constituting the information processing apparatus and the information processing system need only be able to communicate at a predetermined communication rate. They need not be located in the same facility or even the same country.

The embodiments of the present invention include a form in which a software program that implements the functions of the above embodiments is supplied to a system or apparatus. A computer of that system or apparatus then reads and executes the code of the supplied program.

Accordingly, the program code itself, installed on a computer to realize the processing according to the embodiments, is also one of the embodiments of the present invention. In addition, an OS or the like running on the computer may perform part or all of the actual processing, based on instructions included in the program read by the computer. The functions of the above embodiments may also be realized by that processing.

Forms in which the above embodiments are combined as appropriate are also included in the embodiments of the present invention.

100 Information processing apparatus
102 Acquisition unit
104 Selection unit
106 First machine learning unit
108 Second machine learning unit
510 Determination unit
512 Assignment unit

Claims

An information processing method comprising:
an acquisition step of acquiring three data groups: a first medical data group to which, of a first label and a second label, only the first label is attached; a second medical data group to which, of the first label and the second label, only the second label is attached; and a third medical data group to which both the first label and the second label are attached;
a first learning step of performing first machine learning based on the first medical data group;
a second learning step of performing second machine learning based on parameters in the first machine learning and on the second medical data group; and
a step of using the third medical data group for at least one of the first machine learning and the second machine learning, based on the number of data in the third medical data group.

The information processing method according to claim 1, wherein the second learning step performs the second machine learning based on the number of data in the third medical data group.

The information processing method according to claim 1 or 2, wherein the first learning step performs the first machine learning based on the number of data in the third medical data group.

The information processing method according to any one of the above claims, wherein the acquisition step acquires the third medical data group by attaching the second label to data included in the first medical data group.

The information processing method according to claim 2, wherein the second learning step uses the third medical data group for the second machine learning when the number of data in the third medical data group is larger than a first predetermined value.

The information processing method according to claim 5, wherein the second learning step uses the third medical data group for the second machine learning when all of the following hold:
- the number of data in the third medical data group is larger than the first predetermined value;
- the ratio of the number of data in the third medical data group to the number of data in the first medical data group is equal to or less than a second predetermined value; and
- the ratio of the number of data in the third medical data group to the number of data in the second medical data group is larger than the second predetermined value.

The information processing method according to claim 5, wherein the third medical data group is used in the first learning step and the second learning step when both of the following hold:
- the number of data in the third medical data group is larger than the first predetermined value; and
- both the ratio of the number of data in the third medical data group to the number of data in the first medical data group and the ratio of the number of data in the third medical data group to the number of data in the second medical data group are larger than the second predetermined value.

The information processing method according to any one of claims 1 to 7, further comprising a determination step of determining, based on the number of data in the third medical data group, whether the third medical data group is used for at least one of the first machine learning and the second machine learning.

The information processing method according to claim 8, wherein the determination step determines whether the third medical data group is used for at least one of the first machine learning and the second machine learning based on two ratios:
- the ratio of the number of data in the third medical data group to the number of data in the first medical data group; and
- the ratio of the number of data in the third medical data group to the number of data in the second medical data group.

The information processing method according to claim 8, wherein, when the determination step determines that the third medical data group is used in neither the first machine learning nor the second machine learning, the third medical data group is used as evaluation data for the machine learning in at least one of the first learning step and the second learning step.

An information processing method comprising:
an acquisition step of acquiring three data groups: a first medical data group to which, of a first label and a second label, only the first label is attached; a second medical data group to which, of the first label and the second label, only the second label is attached; and a third medical data group to which both the first label and the second label are attached;
a determination step of determining whether the number of data in the second medical data group is larger than a predetermined number;
a first learning step of performing first machine learning based on the first medical data group;
a second learning step of performing second machine learning, which is based:
- on the second medical data group, when the number of data in the second medical data group is larger than the predetermined number; and
- on parameters in the first machine learning and on the second medical data group, when the number of data in the second medical data group is equal to or less than the predetermined number; and
a step of using the third medical data group for at least one of the first machine learning and the second machine learning, based on the number of data in the third medical data group.

The information processing method according to claim 11, wherein:
- the acquisition step further acquires a third medical data group to which the first label and the second label are attached; and
- the determination step determines whether the third medical data group is used for at least one of the first machine learning and the second machine learning based on the ratio of the number of data in the third medical data group to the number of data in the first medical data group and the ratio of the number of data in the third medical data group to the number of data in the second medical data group.

The information processing method according to any one of claims 1 to 12, wherein each of the first label and the second label is medical information indicating the state of a subject in a medical image.

The information processing method according to any one of claims 1 to 13, wherein the first label is a diagnosis name and the second label is an image finding representing a feature of a medical image.

An information processing apparatus comprising:
acquisition means for acquiring three data groups: a first medical data group to which, of a first label and a second label, only the first label is attached; a second medical data group to which, of the first label and the second label, only the second label is attached; and a third medical data group to which both the first label and the second label are attached;
first learning means for performing first machine learning based on the first medical data group; and
second learning means for performing second machine learning based on parameters in the first machine learning and on the second medical data group,
wherein the third medical data group is further used for at least one of the first machine learning and the second machine learning, based on the number of data in the third medical data group.

The information processing apparatus according to claim 15, wherein the second learning means performs the second machine learning further using the third medical data group.

The information processing apparatus according to claim 15 or 16, further comprising:
first inference means for making an inference about the first label based on a first inference device generated by the first learning means; and
second inference means for making an inference about the second label based on a second inference device generated by the second learning means.

The information processing apparatus according to claim 17, wherein the second inference device has been machine-learned using, as learning data, a third medical data group to which the first label and the second label are attached.

The information processing apparatus according to claim 17 or 18, wherein the first label is a diagnosis name and the second label is an image finding representing a feature of a medical image.

An information processing system comprising:
acquisition means for acquiring three data groups: a first medical data group to which, of a first label and a second label, only the first label is attached; a second medical data group to which, of the first label and the second label, only the second label is attached; and a third medical data group to which both the first label and the second label are attached;
first learning means for performing first machine learning based on the first medical data group; and
second learning means for performing second machine learning based on parameters in the first machine learning and on the second medical data group,
wherein the third medical data group is further used for at least one of the first machine learning and the second machine learning, based on the number of data in the third medical data group.

An information processing system comprising:
storage means for storing data including a medical image and medical information given to the medical image;
first learning means for performing first machine learning based on a first medical data group, which consists of stored data to which, of a first label and a second label, only the first label is given; and
second learning means for performing second machine learning based on parameters in the first machine learning and on a second medical data group, which consists of stored data to which, of the first label and the second label, only the second label is given,
wherein a third medical data group, to which both the first label and the second label are attached, is further used for at least one of the first machine learning and the second machine learning, based on the number of data in the third medical data group.

A program for causing a computer to execute the information processing method according to any one of claims 1 to 14.