JP6811465B2

JP6811465B2 - Learning device, learning method, learning program, automatic control device, automatic control method and automatic control program

Info

Publication number: JP6811465B2
Application number: JP2019098109A
Authority: JP
Inventors: 学嗣浅谷
Original assignee: Exa Wizards Inc
Current assignee: Exa Wizards Inc
Priority date: 2019-05-24
Filing date: 2019-05-24
Publication date: 2021-01-13
Anticipated expiration: 2039-05-24
Also published as: JP2020194242A; WO2020241037A1

Description

The present invention relates to a learning device, a learning method, and a learning program for training a learning model for controlling a target device, and to an automatic control device, an automatic control method, and an automatic control program that use the learning model.

The present inventors are studying the use of deep learning to learn (self-organize) the operation of a target device (for example, a robot). Non-Patent Document 1 describes that motion patterns could be self-organized by directly teaching a robot to perform an object manipulation task and having a plurality of Deep Autoencoders integrate and learn the image, audio signal, and motor modalities.

Non-Patent Document 1: Ogata, "Robotics and Deep Learning", Artificial Intelligence, Vol. 31, No. 2, pp. 210-215, March 2016

The present inventors are studying how to learn the operation of a target device with higher accuracy. An object of one aspect of the present invention is to realize a learning device, a learning method, and a learning program capable of learning the operation of a target device with high accuracy.

In order to solve the above problem, a learning device according to one aspect of the present invention includes: a storage unit that acquires and accumulates, over time, state values of a target device in operation and measured values of the operation; and a learning unit that causes a first learning model, which receives as input at least a state value of the target device in operation and a measured value of the operation and predicts a future state value of the target device, to learn teacher data. The teacher data includes time-series data of the state values and the measured values accumulated in the storage unit.

A learning method according to one aspect of the present invention includes: an accumulation step of acquiring and accumulating, over time, state values of a target device in operation and measured values of the operation; and a learning step of causing a first learning model, which receives as input at least a state value of the target device in operation and a measured value of the operation and predicts a future state value of the target device, to learn teacher data. The teacher data includes time-series data of the state values and the measured values accumulated in the accumulation step.

An automatic control device according to one aspect of the present invention includes: a first learning model that receives as input at least a state value of a target device in operation and a measured value of the operation and predicts a future state value of the target device; and an automatic control unit that inputs at least the state value of the target device in operation and the measured value of the operation to the first learning model and controls the target device so that the state value of the target device approaches the future state value predicted by the first learning model. The first learning model has learned teacher data including time-series data of past state values and measured values of the target device.

An automatic control method according to one aspect of the present invention includes an automatic control step of inputting at least a state value of a target device in operation and a measured value of the operation to a first learning model, which receives at least these values as input and predicts a future state value of the target device, and controlling the target device so that the state value of the target device approaches the future state value predicted by the first learning model. The first learning model has learned teacher data including time-series data of past state values and measured values of the target device.

According to one aspect of the present invention, the operation of a target device can be learned with high accuracy.

A block diagram showing the schematic configuration of a learning system according to one embodiment of the present invention.
A diagram schematically showing the appearance of the learning system according to one embodiment of the present invention.
A flowchart showing an example of the flow of automatic control of the target device by the learning device according to one embodiment of the present invention.
A diagram explaining the input parameters and output parameters of the first learning model according to one embodiment of the present invention.
A diagram showing an example of the display content of the display unit according to one embodiment of the present invention.
A flowchart showing an example of the flow of manual learning by the learning device according to one embodiment of the present invention.
A flowchart showing an example of the flow of automatic learning by the learning device according to one embodiment of the present invention.
A block diagram illustrating the configuration of a computer that can be used as the learning device according to one embodiment of the present invention.

Hereinafter, one embodiment of the present invention will be described in detail.

FIG. 1 is a block diagram showing a schematic configuration of a learning system 1 according to an embodiment of the present invention. As shown in FIG. 1, the learning system 1 includes a manipulator (target device) 10, a camera 13, a measuring device 14, an input device 15, a display (display unit) 16, and a learning device (automatic control device) 100.

FIG. 2 is a diagram schematically showing the appearance of the learning system 1. In the present embodiment, the manipulator 10 is fitted with a spoon 17 as an end effector and performs an operation of weighing out salt 2. For example, it transfers a set amount of the salt 2 from a container 3 to a container 4. The operation of the manipulator 10 is not limited to weighing salt; the manipulator 10 may weigh other substances (powders, liquids), or may be configured so that other operations become possible by exchanging the end effector. The end effector may be, but is not limited to, a spoon, a hand (gripper), a suction hand, a spray gun, or a welding torch.

The manipulator 10 includes one or more joints 11 and operates by driving each joint 11. A joint 11 may be a joint of the arm or a joint of the end effector. The manipulator 10 also includes one or more sensors 12. The sensors 12 may include, for example, an angle sensor that detects a state value of each joint 11 (for example, a joint angle or a finger angle) and a force sensor that detects the force sense (moment) at a specific location of the manipulator 10.

The camera 13 images the objects (salt 2, container 3, container 4) of the operation (salt weighing) of the manipulator 10 and acquires captured images.

The measuring device 14 is a weighing scale and measures the measured value (the amount of salt 2 transferred from the container 3 to the container 4) of the operation (salt weighing) of the manipulator 10. The measuring device 14 is not limited to a weighing scale and may be any device capable of measuring the amount of change (for example, the amount of salt) caused by the operation of the target device.

The input device 15 is an input device for manually operating the manipulator 10. In the present embodiment, as shown in FIG. 2, the input device 15 is a master-slave type input device that has the same shape as the manipulator 10, includes sensors that detect the joint angle of each joint, and allows the manipulator 10 to be operated intuitively by grasping and moving the device by hand. However, the input device 15 is not limited to this and may be composed of a robot controller, a teach pendant, a keyboard, levers, buttons, switches, a touch pad, and the like.

The display 16 is a display device for displaying various kinds of information and may be, for example, an LCD display.

(Learning device)
As shown in FIG. 1, the learning device 100 includes a first learning model 101, a second learning model 102, a storage unit 103, an acquisition unit 104, a learning unit 105, a manual control unit 106, an automatic control unit 107, and a display control unit 108.

The first learning model 101 is a learning model that receives as input at least a state value of the manipulator 10 in operation and a measured value of the operation, and predicts future state values and measured values of the manipulator 10. It may be any learning model capable of learning time-series data. In one aspect, the first learning model 101 is an RNN (Recurrent Neural Network) such as an MTRNN (Multi Timescale RNN) or an LSTM (Long Short Term Memory), but it is not limited to these and may be an ARIMA (AutoRegressive Integrated Moving Average) model, a one-dimensional CNN (Convolutional Neural Network), or the like.

The second learning model 102 may be any learning model capable of compressing and restoring images. In one aspect, the second learning model 102 is a CAE (Convolutional Auto Encoder), but it is not limited to this and may be an autoencoder, an RBM (Restricted Boltzmann Machine), a principal component analysis (PCA) model, or the like.

The storage unit 103 acquires and accumulates the state values of the manipulator 10 in operation over time. In one aspect, the storage unit 103 acquires the joint angle (state value) of each joint 11 of the manipulator 10 and the force sense (state value) from the sensors 12 and stores them in a memory (not shown).

The storage unit 103 also acquires and accumulates, over time, at least one of the captured images of the objects of the operation of the manipulator 10 and the feature quantities of those captured images. In one aspect, the storage unit 103 stores the images of the objects (salt 2, container 3, container 4) captured by the camera 13 in a memory (not shown). In another aspect, the storage unit 103 compresses the images of the objects captured by the camera 13 with the second learning model 102, acquires the compressed data as the feature quantities of the captured images, and stores them in a memory (not shown).

The storage unit 103 also acquires and accumulates the measured values of the operation of the manipulator 10 over time. In one aspect, the storage unit 103 stores the measured values acquired by the acquisition unit 104 in a memory (not shown).

In one aspect, the second learning model 102 is a learning model, such as a CAE or an autoencoder, that is trained by deep learning so that its output image matches its input image. The learning unit 105 causes the second learning model 102 to learn the time-series data of the images captured by the camera 13 (video data of the operation of the manipulator 10) acquired by the storage unit 103. The storage unit 103 can then acquire the feature quantity of a captured image from the intermediate layer of the second learning model 102 to which that captured image has been input. That is, the intermediate layer of a learning model trained so that the output image matches the input image can be regarded as representing the input image in fewer dimensions than the input image without reducing its information content, and is therefore well suited for use as a feature quantity indicating the features of the captured image of the objects.

The acquisition unit 104 acquires the measured value (the amount of salt 2 moved to the container 4) of the operation (salt weighing) of the manipulator 10. In one aspect, the acquisition unit 104 may acquire the measured value by wire or wirelessly from the measuring device 14 that weighs the container 4, or the camera 13 may be arranged to image the display of the measuring device 14 so that the measured value is acquired by image analysis of the image captured by the camera 13. The acquisition unit 104 also acquires, as a result value, the measured value at the time the operation of the manipulator 10 is completed.

The learning unit 105 causes the first learning model 101 to learn teacher data. The details of the teacher data will be described later.

The learning unit 105 also causes the second learning model 102 to learn the time-series data of the images captured by the camera 13 and accumulated in the storage unit 103.

The manual control unit 106 controls the manipulator 10 in response to instructions from the input device 15 (external).

The automatic control unit 107 inputs the set target value (the amount of salt 2 to be moved to the container 4), the state values of the manipulator 10, the feature quantities of the captured image, and the measured value of the operation of the manipulator 10 to the first learning model 101, and controls the manipulator 10 so that the state values of the manipulator 10 approach the future state values predicted by the first learning model 101. Details will be described later.

The display control unit 108 causes the display 16 to display various kinds of information. The display content is not particularly limited and may include images captured by the camera 13, predicted images of future captured images (described in detail later), a modeling image of the manipulator 10, the set target value, the measured values, and the like.

(Automatic control)
FIG. 3 is a flowchart showing an example of the flow of automatic control of the manipulator 10 by the learning device 100. Some steps may be executed in parallel or in a different order. The learning device 100, once it has undergone the manual learning or automatic learning described later, can automatically control the manipulator 10 (target device).

In step S1, the automatic control unit 107 sets a target value for the operation of the manipulator 10. For example, the automatic control unit 107 may set a value entered via an input unit (not shown) as the target value.

In step S2, the automatic control unit 107 acquires the state values of the manipulator 10 (the joint angle of each joint 11, the force sense at a predetermined position, and so on) from the sensors 12.

In step S3, the automatic control unit 107 acquires, from the second learning model 102, the feature quantities of the image captured by the camera 13.

In step S4, the automatic control unit 107 acquires the measured value of the operation of the manipulator 10 from the acquisition unit 104.

In step S5, the automatic control unit 107 generates the input parameters to be input to the first learning model 101. FIG. 4 is a diagram illustrating the input parameters and output parameters of the first learning model 101. As shown in FIG. 4, the acquired state values, feature quantities, and measured value and the set target value are allocated to the dimensions of the input parameters. Each of the state values, feature quantities, measured value, and target value may be assigned to multiple dimensions. The state values, feature quantities, measured value, and target value are each normalized by the normalization term corresponding to their dimension.
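The parameter assembly of step S5 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `NormTerm` and `build_input_parameters`, the ordering of dimensions, and the mean/standard-deviation form of the normalization term are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class NormTerm:
    """Hypothetical per-dimension normalization term (mean and standard deviation)."""
    mean: float
    std: float


def build_input_parameters(
    state: Sequence[float],     # e.g. joint angles and force readings
    features: Sequence[float],  # image feature quantities from the second learning model
    measured: float,            # e.g. amount of salt moved so far
    target: float,              # set target value
    norm: Sequence[NormTerm],   # one term per dimension, in the same order as the vector
) -> List[float]:
    """Concatenate all values into one vector and normalize each dimension."""
    raw = [*state, *features, measured, target]
    if len(raw) != len(norm):
        raise ValueError("one normalization term is required per dimension")
    return [(value - term.mean) / term.std for value, term in zip(raw, norm)]
```

The same normalization terms would have to be used at training time and at control time so that the model sees inputs on the same scale in both cases.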

In step S6, the automatic control unit 107 inputs the input parameters to the first learning model 101 and acquires the output parameters. The first learning model 101 has been trained so that, when input parameters are input, it predicts the input parameters that will be input in the future. For example, the first learning model 101 has been trained so that, when the input parameters at time t are input, it outputs predicted values of the input parameters at time t+1. In other words, the first learning model 101 predicts the input parameters one frame ahead. The target value is a fixed value.

In step S7, the automatic control unit 107 controls the manipulator 10 so that the state values of the manipulator 10 approach the future state values predicted by the first learning model 101. In one aspect, the automatic control unit 107 may refer to those output parameters of the first learning model 101 that indicate the joint angle of each joint 11 and control each joint 11 so that its joint angle approaches the predicted joint angle.

In step S8, the automatic control unit 107 outputs to the display control unit 108 those output parameters of the first learning model 101 that indicate the feature quantities of the future captured image. The display control unit 108 uses the second learning model 102 to restore the future captured image from these parameters. The display control unit 108 then causes the display 16 to display the image captured by the camera 13 and the restored future captured image.

FIG. 5 is a diagram showing an example of the display content of the display 16 in step S8. The display control unit 108 causes the display 16 to display the current captured image 200 captured by the camera 13 and the restored future captured image 201. As a result, the automatic control unit 107 controls the manipulator 10 so that the scene in the current captured image 200 comes to match that of the future captured image 201. In step S8, the display control unit 108 may instead display only the current captured image 200 captured by the camera 13 on the display 16.

In step S9, the automatic control unit 107 determines whether the operation of the manipulator 10 has been completed. If it has not (NO in step S9), the process returns to step S2 and continues; if it has (YES in step S9), the process ends. The automatic control unit 107 may determine that the operation has been completed when the measured value acquired by the acquisition unit 104 is equal to or greater than the target value, or when the difference between the measured value acquired by the acquisition unit 104 and the target value is equal to or less than a preset threshold. In one aspect, the first learning model 101 has been trained to output a specific parameter indicating completion when the operation of the manipulator 10 has been completed, and the automatic control unit 107 may determine that the operation of the manipulator 10 has been completed when the first learning model 101 outputs that specific parameter.
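The loop formed by steps S2 to S9 can be sketched as below. This is only an illustration under stated assumptions: the callables `read_state`, `read_features`, `read_measured`, `predict_next`, and `command_joints` are hypothetical stand-ins for the sensors 12, the second learning model 102, the acquisition unit 104, the first learning model 101, and the joint drive; normalization and the display step S8 are omitted; and the `max_steps` safety limit is not in the source.

```python
from typing import Callable, List, Optional


def is_complete(measured: float, target: float,
                threshold: Optional[float] = None) -> bool:
    """Step S9: done when measured >= target, or when |target - measured| <= threshold."""
    if measured >= target:
        return True
    return threshold is not None and abs(target - measured) <= threshold


def run_automatic_control(
    target: float,                                        # S1: set target value
    read_state: Callable[[], List[float]],
    read_features: Callable[[], List[float]],
    read_measured: Callable[[], float],
    predict_next: Callable[[List[float]], List[float]],
    command_joints: Callable[[List[float]], None],
    n_state: int,                                         # number of state dimensions
    threshold: Optional[float] = None,
    max_steps: int = 1000,                                # assumed safety limit
) -> int:
    """Run the control loop and return the number of control steps executed."""
    for step in range(max_steps):
        state = read_state()                              # S2
        features = read_features()                        # S3
        measured = read_measured()                        # S4
        params = [*state, *features, measured, target]    # S5 (normalization omitted)
        predicted = predict_next(params)                  # S6: one frame ahead
        command_joints(predicted[:n_state])               # S7: move toward prediction
        if is_complete(read_measured(), target, threshold):  # S9
            return step + 1
    return max_steps
```

In this sketch the predicted state values occupy the first `n_state` entries of the output vector, mirroring the input layout; a real system would follow whatever layout FIG. 4 defines.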

(Manual learning)
FIG. 6 is a flowchart showing an example of the flow of manual learning by the learning device 100. Some steps may be executed in parallel or in a different order.

In step S11, the user operates the input device 15 to input the operation of the manipulator 10. In one aspect, as shown in FIG. 2, the input device 15 has the same shape as the manipulator 10 and includes sensors that detect the joint angle of each joint, and an instruction signal indicating the joint angle of each joint of the input device 15 is transmitted from the input device 15 to the manual control unit 106.

In step S12, the manual control unit 106 acquires the instruction from the input device 15 (external) and controls the manipulator 10. In one aspect, the manual control unit 106 refers to the instruction signal from the input device 15 and controls each joint 11 of the manipulator 10 so that its joint angle becomes the same as the joint angle of the corresponding joint of the input device 15.

In step S13, the storage unit 103 acquires the state values of the manipulator 10 (the joint angle of each joint 11, the force sense at a predetermined position, and so on) from the sensors 12 and accumulates them in chronological order.

In step S14, the storage unit 103 acquires the images captured by the camera 13 and accumulates them in chronological order.

In step S15, the storage unit 103 acquires the measured values that the acquisition unit 104 obtained from the measuring device 14 and accumulates them in chronological order.

In step S16, the manual control unit 106 determines whether the operation of the manipulator 10 has been completed. If it has not (NO in step S16), the process returns to step S11 and continues; if it has (YES in step S16), the process proceeds to step S17. In one aspect, the user can indicate completion of the operation by operating the input device 15.

In step S17, the acquisition unit 104 acquires the result value of the completed operation. The acquisition unit 104 acquires the result value of the operation of the manipulator 10 (the amount of salt 2 moved to the container 4) either by receiving it from the measuring device 14 by wire or wirelessly, or by image analysis of the image captured by the camera 13.

Steps S11 to S17 may be repeated multiple times in order to obtain sufficient teacher data.
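The accumulation performed in steps S13 to S17 can be pictured as one record per manual operation. The following is a minimal sketch of such a record; the class names `Frame` and `Episode` and their fields are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Sequence


@dataclass
class Frame:
    """Values accumulated at one point in time."""
    state: List[float]   # S13: joint angles, force sense, etc.
    image: Any           # S14: raw captured image (type left open here)
    measured: float      # S15: measured value, e.g. weight moved so far


@dataclass
class Episode:
    """Time series for one operation plus its result value."""
    frames: List[Frame] = field(default_factory=list)
    result: Optional[float] = None  # S17: set once the operation is complete

    def record(self, state: Sequence[float], image: Any, measured: float) -> None:
        """Append one time step in chronological order."""
        self.frames.append(Frame(list(state), image, measured))

    def finish(self, result: float) -> None:
        """Store the result value acquired after completion."""
        self.result = result
```

Repeating steps S11 to S17 would then simply yield a list of `Episode` objects, one per demonstration.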

In step S18, the learning unit 105 causes the second learning model 102 to learn the time-series data of the captured images accumulated in the storage unit 103 so that the second learning model 102 becomes able to compress and restore captured images.

In step S19, the learning unit 105 generates teacher data for each operation of the manipulator 10 performed under the control of the manual control unit 106, using the time-series data of the state values, captured images, and measured values accumulated in the storage unit 103 and the result value acquired by the acquisition unit 104. First, the learning unit 105 inputs the time-series data of the captured images to the second learning model 102 and acquires time-series data of the feature quantities. The learning unit 105 then generates teacher data that includes the time-series data of the state values, feature quantities, and measured values together with the result value. Next, in step S20, the learning unit 105 causes the first learning model 101 to learn the generated teacher data. After that, the manual control unit 106 ends the process.

(Details of teacher data)
The teacher data includes time-series data of the state values, feature quantities, and measured values, and includes the result value in place of the target value. That is, in one aspect, the teacher data is time-series data of the input parameters shown in FIG. 4 in which the result value acquired by the acquisition unit 104 is entered as a fixed value in the parameter to which the set target value would otherwise be assigned.

In one aspect, the learning unit 105 trains the first learning model 101 by sequentially inputting the time-series data of the state values, feature quantities, and measured values included in the teacher data together with the result value (fixed value), and using the state values, feature quantities, and measured value at the next point in time together with the result value (fixed value) as the correct answer data.
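The construction of input and correct-answer pairs described here can be sketched as follows. The function name `make_training_pairs` and the ordering of dimensions are assumptions for illustration; the training of the model itself is outside the sketch.

```python
from typing import List, Sequence, Tuple


def make_training_pairs(
    states: Sequence[Sequence[float]],    # state values per time step
    features: Sequence[Sequence[float]],  # image feature quantities per time step
    measured: Sequence[float],            # measured value per time step
    result: float,                        # result value, used in place of the target value
) -> List[Tuple[List[float], List[float]]]:
    """Pair the parameters at time t with those at time t+1 as the correct answer."""
    if not (len(states) == len(features) == len(measured)):
        raise ValueError("all time series must have the same length")
    # The result value is appended to every row as a fixed value.
    rows = [[*s, *f, m, result] for s, f, m in zip(states, features, measured)]
    return list(zip(rows[:-1], rows[1:]))
```

Because the result value is constant within one operation, the last dimension of every input and every correct answer is identical, which is how the model can associate a whole trajectory with the amount it eventually produced.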

As described above, in the present embodiment, when the learning device 100 learns the operation of the target device, it learns time-series data of the measured values of the operation (for example, the amount of salt moved in the case of salt weighing) in addition to the time-series data of the state values and the feature quantities of the captured images. This makes it possible to perform learning that reflects the measured values using an algorithm different from reinforcement learning, and to learn the operation of the target device with high accuracy.

Further, when learning an operation of the target device, the learning device 100 acquires the result value of the operation and then performs learning on the assumption that the operation was performed with that result value as its target value. In other words, the learning device 100 accumulates the state values, feature amounts, measured values, and the like relating to the operation and, after acquiring the result value of the operation, trains the learning model using the accumulated state values, feature amounts, measured values, and the like as teacher data for an operation that obtains that result value. This allows learning that reflects the result value using an algorithm different from reinforcement learning, so the operation of the target device can be learned with high accuracy.

Note that the time-series data of multidimensional parameters included in the teacher data is preferably normalized by providing a normalization term for each parameter dimension. That is, in one aspect, the learning unit 105 calculates the mean and variance of each dimension of the teacher data, calculates normalization terms so that the parameter of each dimension has mean 0 and variance 1, normalizes the teacher data, and then trains the first learning model on it. This aligns the means and variances of multimodal parameters that differ in order of magnitude, so the operation of the target device can be learned with high accuracy.
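The per-dimension normalization can be illustrated with the short sketch below. It is one straightforward reading of "mean 0, variance 1" and not necessarily what the specification's implementation does: the helper names `fit_normalizer` and `normalize`, the use of the population variance, and the guard for constant dimensions are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def fit_normalizer(
    rows: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Compute the mean and standard deviation of each dimension."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds: List[float] = []
    for d in range(dims):
        # Population variance (an assumption; the text does not specify).
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        # A constant dimension (e.g. a fixed result value) has variance 0;
        # fall back to 1.0 so the division below stays defined.
        stds.append(math.sqrt(var) if var > 0.0 else 1.0)
    return means, stds


def normalize(
    rows: Sequence[Sequence[float]],
    means: Sequence[float],
    stds: Sequence[float],
) -> List[List[float]]:
    """Shift and scale every dimension to mean 0 and variance 1."""
    return [
        [(r[d] - means[d]) / stds[d] for d in range(len(means))]
        for r in rows
    ]


# Hypothetical data: a joint angle in radians and a weight in milligrams.
raw = [[0.0, 100.0], [2.0, 300.0]]
means, stds = fit_normalizer(raw)
scaled = normalize(raw, means, stds)
```

The same `means` and `stds` would need to be reused at control time so that live inputs are scaled consistently with the teacher data.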

In this case, it is preferable to impose, on the loss function used for training the first learning model 101, a constraint that compensates for the difference in dimensionality. That is, it is preferable to increase the contribution to the loss function of values with few dimensions, so that minimizing the loss function does not drive learning only toward minimizing the values with many dimensions. For example, the loss function shown in the following equation (1) can be used. Dim represents the total number of dimensions. Mi represents the number of dimensions of each modality (for example, joint angle (state value), force sense (state value), feature amount, and measured value). t represents the correct answer data. y represents the prediction data. N represents the number of data items.
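Equation (1) itself is not reproduced in this text, so the following is only a hedged sketch of a loss with the stated property, not a reconstruction of the patent's formula. It assumes a squared error in which each modality i is weighted by Dim / Mi, so that a modality with few dimensions contributes as much as one with many. The function name `modality_weighted_loss` and the weighting scheme are assumptions.

```python
from typing import Sequence


def modality_weighted_loss(
    targets: Sequence[Sequence[float]],      # t: correct answer data
    predictions: Sequence[Sequence[float]],  # y: prediction data
    modality_dims: Sequence[int],            # Mi for each modality, in order
) -> float:
    """Squared error with each modality weighted by Dim / Mi (assumed form)."""
    total_dim = sum(modality_dims)  # Dim: total number of dimensions
    n = len(targets)                # N: number of data items
    loss = 0.0
    for t_row, y_row in zip(targets, predictions):
        start = 0
        for m_i in modality_dims:
            end = start + m_i
            sq_err = sum(
                (t - y) ** 2 for t, y in zip(t_row[start:end], y_row[start:end])
            )
            # Fewer dimensions -> larger weight, so small modalities such as
            # a one-dimensional measured value are not drowned out.
            loss += (total_dim / m_i) * sq_err
            start = end
    return loss / n


# Hypothetical layout: a 2-dimensional state and a 1-dimensional measured value.
example_loss = modality_weighted_loss(
    targets=[[0.0, 0.0, 0.0]],
    predictions=[[1.0, 1.0, 1.0]],
    modality_dims=[2, 1],
)
```

In this example both modalities have the same per-dimension error, and the weighting makes their contributions to the total equal even though one has twice as many dimensions.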

(Automatic learning)
The learning device 100 can generate teacher data and perform learning using the time-series data of the state values and feature amounts accumulated in the storage unit 103 as a result of control by the automatic control unit 107 shown in FIG. 2, together with the measured values and result value acquired by the acquisition unit 104. This makes it possible to improve the operating accuracy automatically. In other words, the learning device 100 can train the learning model by itself, without human intervention. Therefore, even when only a few rounds of manual learning have been performed and the operating accuracy of the target device obtained through manual learning is lower than desired, automatic learning can raise the operating accuracy of the target device to the desired level. Put differently, high operating accuracy can be obtained with little manual learning. As a result, the workload of the operator who performs manual learning is reduced and the time required for learning is shortened.

FIG. 7 is a flowchart showing an example of the flow of automatic learning by the learning device 100. Note that some steps may be executed in parallel or in a different order.

The flow shown in FIG. 7 is carried out by modifying part of the flowchart shown in FIG. 3. First, after step S1 is performed, steps S21 to S23 are performed in place of steps S2 to S4.

In step S21, the automatic control unit 107 acquires the state values of the manipulator 10 (the joint angle of each joint 11, the force sense at a predetermined position, and the like) from the sensor 12 and accumulates them.

In step S22, the automatic control unit 107 acquires, from the second learning model 102, the feature amount of the image captured by the camera 13 and accumulates it.

In step S23, the automatic control unit 107 acquires the measured value of the operation of the manipulator 10 from the acquisition unit 104 and accumulates it.

Subsequently, steps S5 to S9 are performed, and if the result of step S9 is YES, steps S24 to S26 are performed.

In step S24, the acquisition unit 104 acquires the result value of the completed operation from the measuring device 14.

In step S25, the learning unit 105 generates, for each operation of the manipulator 10, teacher data that includes the time-series data of the state values, feature amounts, and measured values accumulated in the storage unit 103 as a result of the control by the automatic control unit 107, and the result value acquired by the acquisition unit 104. Then, in step S26, the learning unit 105 trains the first learning model 101 on the generated teacher data. After that, the automatic control unit 107 ends the process.

Further, in one aspect, the learning unit 105 may determine whether or not to perform steps S25 to S26 based on the time taken to complete the operation of the manipulator 10. That is, of the time-series data obtained as a result of automatic control, the learning device 100 uses as teacher data for the first learning model 101 only the data from fast operations (those completed in less time than a threshold), which speeds up operation under automatic control. In one aspect, the learning device 100 may divide the result values into predetermined stages and, among the time-series data of the operations that yielded a result value in each stage, use only the time-series data of the fast operations as teacher data for the first learning model 101.
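The stage-wise selection of fast operations might look like the sketch below. The text does not say how "fast" is decided within a stage, so keeping the quickest `keep_per_stage` episodes per stage is an assumption, as are the `Episode` class, the fixed `stage_width`, and the example values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """One completed operation (hypothetical record for this sketch)."""
    result_value: float  # measured value when the operation completed
    duration_s: float    # time taken to complete the operation
    # The accumulated time-series data is omitted here for brevity.


def select_fast_episodes(
    episodes: List[Episode],
    stage_width: float,
    keep_per_stage: int = 1,
) -> List[Episode]:
    """Group episodes into result-value stages and keep the fastest in each."""
    stages: Dict[int, List[Episode]] = {}
    for ep in episodes:
        stage = int(ep.result_value // stage_width)
        stages.setdefault(stage, []).append(ep)

    selected: List[Episode] = []
    for stage in sorted(stages):
        by_speed = sorted(stages[stage], key=lambda e: e.duration_s)
        selected.extend(by_speed[:keep_per_stage])
    return selected


# Hypothetical episodes: two that weighed out about 1 g, one about 3 g.
episodes = [Episode(1.0, 5.0), Episode(1.2, 3.0), Episode(3.5, 4.0)]
kept = select_fast_episodes(episodes, stage_width=1.0)
```

Selecting per stage rather than globally keeps slow-but-necessary operations (for example, weighing out a large amount) represented in the teacher data.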

(About the sampling rate)
The interval at which the storage unit 103 acquires the state values, feature amounts, and measured values is preferably close to the time required to control the manipulator 10. In other words, the sampling rate of the state values, feature amounts, and measured values used for the teacher data is preferably close to the processing rate of the control of the manipulator 10.

Therefore, in one aspect, the automatic control unit 107 performs automatic control using a pseudo first learning model 101 prepared in advance, and measures the time taken from when the automatic control unit 107 acquires at least one of the state value, the feature amount, and the measured value until it controls the manipulator 10. Then, based on that time, the interval at which the storage unit 103 acquires at least one of the state value, the feature amount, and the measured value is adjusted so as to approach that time. This makes it possible to learn the operation of the target device with higher accuracy.
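The interval adjustment can be sketched as two small helpers. The text only says that the acquisition interval is adjusted to approach the measured control time; averaging over several trials and moving the interval part of the way on each update are assumptions, as are the names `measure_control_latency` and `adjust_interval`.

```python
import time
from typing import Callable, List


def measure_control_latency(
    control_step: Callable[[], None],
    trials: int = 10,
) -> float:
    """Average wall-clock time of one acquire-predict-control cycle, in seconds.

    `control_step` stands in for a cycle run with the pseudo first
    learning model (hypothetical callable).
    """
    samples: List[float] = []
    for _ in range(trials):
        start = time.perf_counter()
        control_step()
        samples.append(time.perf_counter() - start)
    return sum(samples) / len(samples)


def adjust_interval(
    current_interval_s: float,
    latency_s: float,
    rate: float = 0.5,
) -> float:
    """Move the acquisition interval part of the way toward the latency."""
    return current_interval_s + rate * (latency_s - current_interval_s)


# Hypothetical numbers: sampling every 100 ms, control cycle measured at 20 ms.
new_interval = adjust_interval(0.10, 0.02, rate=0.5)
latency = measure_control_latency(lambda: None, trials=3)
```

With `rate=1.0` the interval is set directly to the measured time; a smaller rate damps the effect of noisy timing measurements.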

(Modifications)
In the above embodiment, the input parameters include the feature amount of the captured image, but the feature amount need not be included. Likewise, the input parameters include the target value (result value), but the target value (result value) need not be included.

Further, the operation of the manipulator 10 is not limited to weighing an object and may be an operation of moving an object, a painting operation, a welding operation, or the like. The measured value is not limited to the amount (weight) of an object and may be the distance an object is moved, the color or extent of painting, a temperature, or the like. The measuring device 14 is not limited to a weight scale and may be a distance measuring device, a camera, a thermometer, or the like. When the measuring device 14 is a camera, the camera 13 can also serve as the measuring device 14.

Further, the present invention can also be applied to other controllable target devices (for example, machine tools, 3D printers, construction machines, and medical devices) in place of the manipulator 10.

[Example of realization by software]
The control blocks of the learning device 100 (in particular, the storage unit 103, the acquisition unit 104, the learning unit 105, the manual control unit 106, the automatic control unit 107, and the display control unit 108) may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software.

In the latter case, the learning device 100 can be configured using a computer (electronic computer) such as the one shown in FIG. 8. FIG. 8 is a block diagram illustrating the configuration of a computer 910 that can be used as the learning device 100. The computer 910 includes an arithmetic unit 912, a main storage device 913, an auxiliary storage device 914, an input/output interface 915, and a communication interface 916, which are connected to one another via a bus 911. The arithmetic unit 912, the main storage device 913, and the auxiliary storage device 914 may be, for example, a processor, a RAM (random access memory), and a hard disk drive, respectively. As the processor, for example, a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit) can be used. The training of the first learning model 101 and the second learning model 102 is preferably executed by the GPU. Connected to the input/output interface 915 are an input device 920 with which the user inputs various information to the computer 910 and an output device 930 with which the computer 910 outputs various information to the user. The input device 920 and the output device 930 may be built into the computer 910 or may be connected to it externally. For example, the input device 920 may be a keyboard, a mouse, a touch sensor, or the like, and the output device 930 may be a display, a printer, a speaker, or the like. A device that serves as both the input device 920 and the output device 930, such as a touch panel in which a touch sensor and a display are integrated, may also be used. The communication interface 916 is an interface through which the computer 910 communicates with external devices.

The auxiliary storage device 914 stores various programs for operating the computer 910 as the learning device 100. The arithmetic unit 912 loads these programs from the auxiliary storage device 914 into the main storage device 913 and executes the instructions they contain, thereby causing the computer 910 to function as each unit of the learning device 100. The recording medium in the auxiliary storage device 914 that records information such as programs may be any computer-readable "non-transitory tangible medium", for example, a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. If the computer can execute a program recorded on the recording medium without loading it into the main storage device 913, the main storage device 913 may be omitted. Each of the above devices (the arithmetic unit 912, the main storage device 913, the auxiliary storage device 914, the input/output interface 915, the communication interface 916, the input device 920, and the output device 930) may be provided singly or in plurality.

The above program may also be acquired from outside the computer 910, in which case it may be acquired via any transmission medium (a communication network, a broadcast wave, or the like). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

(Summary)
The learning device according to aspect 1 of the present invention includes: a storage unit that acquires and accumulates, over time, state values of a target device in operation and measured values of the operation; and a learning unit that trains, on teacher data, a first learning model that receives as input at least the state values of the target device in operation and the measured values of the operation and predicts a future state value of the target device, wherein the teacher data includes time-series data of the state values and the measured values accumulated in the storage unit.

The learning device according to aspect 2 of the present invention may be configured such that, in aspect 1, the learning device further includes an acquisition unit that acquires a result value, which is the measured value when the operation is completed, the input data of the first learning model further includes a target value of the operation of the target device, and the teacher data further includes, in place of the target value, the result value acquired by the acquisition unit.

The learning device according to aspect 3 of the present invention may be configured such that, in aspect 1 or 2, the learning device further includes a manual control unit that controls the target device in response to an external instruction, and the learning unit performs learning using at least the time-series data of the state values and the measured values accumulated in the storage unit as a result of control by the manual control unit.

The learning device according to aspect 4 of the present invention may be configured such that, in any of aspects 1 to 3, the learning device further includes an automatic control unit that inputs the state values of the target device in operation and the measured values of the operation into the first learning model and controls the target device so that the state value of the target device approaches the future state value predicted by the first learning model.

The learning device according to aspect 5 of the present invention may be configured such that, in aspect 4, the learning unit performs learning using at least the time-series data of the state values and the measured values accumulated in the storage unit as a result of control by the automatic control unit.

The learning device according to aspect 6 of the present invention may be configured such that, in aspect 5, the automatic control unit determines, based on the time taken to complete the operation, whether or not to have the learning unit perform learning using at least the time-series data of the state values and the measured values accumulated in the storage unit as a result of that operation.

The learning device according to aspect 7 of the present invention may be configured such that, in any of aspects 4 to 6, the automatic control unit measures the time taken from acquiring the state value or the measured value until controlling the target device and, based on that time, adjusts the interval at which the storage unit acquires the state value or the measured value.

The learning device according to aspect 8 of the present invention may be configured such that, in any of aspects 1 to 7, the storage unit further acquires and accumulates, over time, feature amounts of captured images of the target object of the operation of the target device, the feature amounts of the captured images are further input to the first learning model, and the teacher data further includes time-series data of the feature amounts accumulated in the storage unit.

The learning device according to aspect 9 of the present invention may be configured such that, in aspect 8, the storage unit further accumulates the captured images, the learning unit trains a second learning model, which is trained by deep learning so that its output image matches its input image, on the captured images accumulated in the storage unit, and the feature amount of a captured image is obtained from the second learning model to which that captured image has been input.

The learning device according to aspect 10 of the present invention may be configured such that, in aspect 8 or 9, the first learning model further predicts the feature amount of a future captured image, and the learning device further includes a display control unit that causes a display unit to display the future captured image restored from the feature amount predicted by the first learning model, together with the captured image of the target object.

The learning device according to aspect 11 of the present invention may be configured such that, in any of aspects 1 to 10, the teacher data includes time-series data of parameters of a plurality of dimensions, and the parameters are normalized for each dimension.

The learning device according to aspect 12 of the present invention may be configured such that, in any of aspects 1 to 11, the first learning model is an RNN.

The learning device according to aspect 13 of the present invention may be configured such that, in any of aspects 1 to 12, the target device is a manipulator having a joint, and the state value is a state value of the joint.

The learning method according to aspect 14 of the present invention includes: a storage step of acquiring and accumulating, over time, state values of a target device in operation and measured values of the operation; and a learning step of training, on teacher data, a first learning model that receives as input at least the state values of the target device in operation and the measured values of the operation and predicts a future state value of the target device, wherein the teacher data includes time-series data of the state values and the measured values accumulated in the storage step.

The automatic control device according to aspect 15 of the present invention includes: a first learning model that receives as input at least the state values of a target device in operation and the measured values of the operation and predicts a future state value of the target device; and an automatic control unit that inputs at least the state values of the target device in operation and the measured values of the operation into the first learning model and controls the target device so that the state value of the target device approaches the future state value predicted by the first learning model, wherein the first learning model has been trained on teacher data including time-series data of past state values and measured values of the target device.

The automatic control method according to aspect 16 of the present invention includes an automatic control step of inputting at least the state values of a target device in operation and the measured values of the operation into a first learning model, which receives as input at least the state values of the target device in operation and the measured values of the operation and predicts a future state value of the target device, and controlling the target device so that the state value of the target device approaches the future state value predicted by the first learning model, wherein the first learning model has been trained on teacher data including time-series data of past state values and measured values of the target device.

The learning device according to each aspect of the present invention may be realized by a computer. In that case, a learning program that realizes the learning device on the computer by causing the computer to operate as each unit (software element) of the learning device, and a computer-readable recording medium on which that program is recorded, also fall within the scope of the present invention.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

1 Learning system
2 Salt
10 Manipulator (target device)
11 Joint
12 Sensor
13 Camera
14 Measuring device
15 Input device
16 Display (display unit)
100 Learning device
101 First learning model
102 Second learning model
103 Storage unit
104 Acquisition unit
105 Learning unit
106 Manual control unit
107 Automatic control unit
108 Display control unit
200 Current captured image
201 Future captured image

Claims

A learning device comprising:
a storage unit that acquires and accumulates, over time, state values of a target device in operation and measured values related to the purpose of the operation; and
a learning unit that trains, on teacher data, a first learning model that receives as input at least the state values of the target device in operation and the measured values related to the purpose of the operation and predicts a future state value of the target device,
wherein the teacher data includes time-series data of the state values and the measured values accumulated in the storage unit,
the learning device further comprises an acquisition unit that acquires a result value, which is the measured value when the operation is completed,
the input data of the first learning model further includes a target value of the operation of the target device, and
the teacher data further includes, in place of the target value, the result value acquired by the acquisition unit.

The learning device according to claim 1, further comprising a manual control unit that controls the target device in response to an external instruction,
wherein the learning unit performs learning using at least the time-series data of the state values and the measured values accumulated in the storage unit as a result of control by the manual control unit.

The learning device according to claim 1 or 2, further comprising an automatic control unit that inputs the state values of the target device in operation and the measured values related to the purpose of the operation into the first learning model and controls the target device so that the state value of the target device approaches the future state value predicted by the first learning model.

The learning device according to claim 3, wherein the learning unit performs learning using at least the time-series data of the state values and the measured values accumulated in the storage unit as a result of control by the automatic control unit.

The learning device according to claim 4, wherein the automatic control unit determines, based on the time taken to complete the operation, whether or not to have the learning unit perform learning using at least the time-series data of the state values and the measured values accumulated in the storage unit as a result of that operation.

The learning device according to any one of claims 3 to 5, wherein the automatic control unit measures the time taken from acquiring the state value or the measured value until controlling the target device and, based on that time, adjusts the interval at which the storage unit acquires the state value or the measured value.

The learning device according to any one of claims 1 to 6, wherein the storage unit further acquires and accumulates, over time, feature amounts of captured images of the target object of the operation of the target device,
the feature amounts of the captured images are further input to the first learning model, and
the teacher data further includes time-series data of the feature amounts accumulated in the storage unit.

The learning device according to claim 7, wherein the storage unit further accumulates the captured images,
the learning unit trains a second learning model, which is trained by deep learning so that its output image matches its input image, on the captured images accumulated in the storage unit, and
the feature amount of a captured image is obtained from the second learning model to which that captured image has been input.

The learning device according to claim 7 or 8, wherein the first learning model further predicts the feature amount of a future captured image, and
the learning device further comprises a display control unit that causes a display unit to display the future captured image restored from the feature amount of the future captured image predicted by the first learning model, together with the captured image of the target object.

The learning device according to any one of claims 1 to 9, wherein the teacher data includes time-series data of parameters having a plurality of dimensions, and the parameters are normalized for each dimension.

The learning device according to any one of claims 1 to 10, wherein the first learning model is an RNN.

The learning device according to any one of claims 1 to 11, wherein the target device is a manipulator having a joint, and
the state value includes a state value of the joint.

A learning method comprising:
a storage step of acquiring and accumulating, over time, a state value of a target device in operation and a measured value related to the purpose of the operation; and
a learning step of causing a first learning model, which receives as input at least the state value of the target device in operation and the measured value related to the purpose of the operation and predicts a future state value of the target device, to learn teacher data,
wherein the teacher data includes time-series data of the state value and the measured value accumulated in the storage step,
the method further comprises an acquisition step of acquiring a result value, which is the measured value when the operation is completed,
the input data of the first learning model further includes a target value of the operation of the target device, and
the teacher data further includes, in place of the target value, the result value acquired in the acquisition step.
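
The substitution of the result value for the target value in the teacher data can be sketched as follows. The dictionary keys, the function name, and the one-step-ahead pairing of inputs and outputs are assumptions for this example; the claim states only that the teacher data contains the accumulated time series and, in place of the target value, the measured value at completion.

```python
def build_teacher_data(episode):
    """Build teacher input/output pairs from one recorded episode.

    episode: dict with (hypothetical keys)
      "states":       state values, one per time step
      "measurements": measured values, one per time step
    The result value is the measured value at the final time step; it
    fills the slot that holds the target value at control time.
    """
    states = episode["states"]
    measurements = episode["measurements"]
    result_value = measurements[-1]  # measured value when the operation completed
    inputs, outputs = [], []
    for t in range(len(states) - 1):
        inputs.append((states[t], measurements[t], result_value))
        outputs.append(states[t + 1])  # future state value to be predicted
    return inputs, outputs
```

Because every recorded episode is labeled with what it actually achieved, each episode is a consistent example of how to reach that particular outcome, even when the operator missed the value originally aimed for.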

A learning program for operating a computer as the learning device according to claim 1.

A computer-readable recording medium on which the learning program according to claim 14 is recorded.

An automatic control device comprising:
a first learning model that receives as input at least a state value of a target device in operation and a measured value related to the purpose of the operation, and predicts a future state value of the target device; and
an automatic control unit that inputs at least the state value of the target device in operation and the measured value related to the purpose of the operation to the first learning model, and controls the target device so as to bring the state value of the target device closer to the future state value predicted by the first learning model,
wherein the first learning model has learned teacher data including time-series data of past state values and measured values of the target device,
the input data of the first learning model further includes a target value of the operation of the target device, and
the teacher data further includes, in place of the target value, a result value that is the measured value when a past operation of the target device was completed.

An automatic control method comprising an automatic control step of inputting at least a state value of a target device in operation and a measured value related to the purpose of the operation to a first learning model, which receives at least the state value and the measured value as input and predicts a future state value of the target device, and controlling the target device so as to bring the state value of the target device closer to the future state value predicted by the first learning model,
wherein the first learning model has learned teacher data including time-series data of past state values and measured values of the target device,
the input data of the first learning model further includes a target value of the operation of the target device, and
the teacher data further includes, in place of the target value, a result value that is the measured value when a past operation of the target device was completed.
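
The automatic control step can be pictured as a loop that feeds the current state value, measured value, and target value to the model and then commands the device toward the predicted future state. In the sketch below, `ToyDevice`, `toy_model`, and the `gain` parameter are hypothetical stand-ins introduced for illustration; a real system would use the trained first learning model and the actual device interface.

```python
def control_loop(model, read_state, read_measurement, send_command,
                 target_value, steps, gain=1.0):
    """Repeatedly drive the device toward the model's predicted future state."""
    for _ in range(steps):
        state = read_state()
        measurement = read_measurement()
        predicted = model(state, measurement, target_value)
        # Move the state toward the prediction; gain=1.0 commands it directly.
        send_command(state + gain * (predicted - state))


class ToyDevice:
    """One-dimensional stand-in device whose measurement is 10x its state."""

    def __init__(self):
        self.state = 0.0

    def read_state(self):
        return self.state

    def read_measurement(self):
        return 10.0 * self.state

    def send_command(self, value):
        self.state = value


def toy_model(state, measurement, target_value):
    """Stand-in for a trained model: step halfway toward the goal state."""
    goal_state = target_value / 10.0
    return state + 0.5 * (goal_state - state)
```

Running `control_loop(toy_model, d.read_state, d.read_measurement, d.send_command, target_value=8.0, steps=20)` on a `ToyDevice` instance `d` moves its state step by step toward the state whose measurement equals the target.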

An automatic control program for operating a computer as the automatic control device according to claim 16.

A computer-readable recording medium on which the automatic control program according to claim 18 is recorded.