
WO2025028204A1 - Control device, trained model generation device, method, and program - Google Patents


Info

Publication number
WO2025028204A1
Authority
WO
WIPO (PCT)
Prior art keywords
robot
virtual
data
action
powder
Prior art date
Application number
PCT/JP2024/025018
Other languages
French (fr)
Japanese (ja)
Inventor
Yuki Kadokawa
Masashi Hamaya
Kazutoshi Tanaka
Original Assignee
OMRON Corporation
Priority date
Filing date
Publication date
Priority claimed from JP2023124132A (published as JP2025020634A)
Application filed by OMRON Corporation
Publication of WO2025028204A1

Definitions

  • a learning system that performs the action of weighing powder such as salt is known (for example, see Reference 1: JP 2020-194242 A).
  • the manipulator of this learning system is equipped with a spoon as an end effector, and performs the action of weighing salt (for example, see paragraph [0013] of Reference 1).
  • a liquid weighing method is also known in which a robot arm is used to pour water from a first container into a second container and weigh the liquid (see, for example, Document 2: JP 2021-164980 A).
  • a control device that controls the operation of the robot arm acquires an image including a first container taken at a first time point, acquires the angle of the first container at the first time point, acquires the amount of liquid in the second container at the first time point, inputs the acquired image, angle, and amount of liquid into a learning model, acquires the angle at a second time point output by the learning model, and controls the angle of the first container held by the robot arm according to the acquired angle at the second time point (see, for example, claim 1 in Document 2).
  • a machine learning method is also known in which a robot learns to scoop up a large amount of powder, granules, or fluid from a container and divide it into a separate container in a set amount (see, for example, Reference 3: JP 2019-098419 A).
  • This machine learning method consists of a real learning process in which reinforcement learning is performed between a multi-axis robot and an object in a real space, and a simulation learning process in which reinforcement learning is performed between a pseudo multi-axis robot that simulates the multi-axis robot and a pseudo object that simulates the object in a simulation space (see, for example, the summary of Reference 3).
  • Reference 4 discloses a control model of a robot used to scoop up granular media.
  • when a robot weighs objects such as powders, granules, or fluids, it normally operates aiming to reach the target amount; however, this is difficult to achieve due to various errors, and the robot may end up scooping up more than the target amount.
  • the technologies disclosed in the above-mentioned References 1 to 4 are technologies for weighing powders and the like, but they do not consider how the robot should behave if it scoops up more than the target amount.
  • the present disclosure has been made in consideration of the above points, and aims to adjust the amount of an object when a robot scoops up more than the target amount of an object, which may be a powder, granules, or fluid, when weighing the object.
  • control device of the present disclosure is a control device for controlling a robot that weighs an object, which is a powder, granules, or fluid, and includes: an acquisition unit that acquires state data representing the state of the robot when weighing the object; a generation unit that generates the behavioral data corresponding to the state data acquired by the acquisition unit by inputting the state data acquired by the acquisition unit into a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input; and a control unit that controls the robot so that the action represented by the behavioral data generated by the generation unit is realized.
  • the control method disclosed herein is a control method for controlling a robot that weighs an object, which is a powder, granules, or fluid, in which a computer executes a process to acquire status data representing the state of the robot when weighing the object, input the acquired status data to a trained model that is generated in advance and outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the status data is input, thereby generating the behavioral data according to the acquired status data, and controlling the robot so as to realize the action represented by the generated behavioral data.
  • the control program disclosed herein is a control program for controlling a robot that weighs an object, which is a powder, granules, or fluid, and acquires status data representing the state of the robot when weighing the object, and inputs the acquired status data to a trained model that is generated in advance and outputs behavioral data including an action to adjust the amount of the object scooped up by the robot when the status data is input, thereby generating behavioral data according to the acquired status data, and controlling the robot so as to realize the action represented by the generated behavioral data.
  • the trained model generating device of the present disclosure is a trained model generating device including a training acquisition unit that acquires training data obtained by computer simulating the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and a learning unit that generates a trained model based on the training data acquired by the training acquisition unit, and that outputs behavioral data including an operation of adjusting the amount of the object scooped up by the robot when state data representing the state when the robot weighs the object that is a powder, granules, or fluid is input.
  • the trained model generation method disclosed herein is a trained model generation method in which a computer executes a process to acquire training data obtained by computer simulating the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and generates a trained model based on the acquired training data, in which, when state data representing the state of the robot when weighing the object that is a powder, granules, or fluid is input, behavioral data including an operation of adjusting the amount of the object scooped up by the robot is output.
  • the trained model generation program of the present disclosure is a trained model generation program for causing a computer to execute a process that acquires training data obtained by computer simulation of the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and generates a trained model based on the acquired training data, in which, when state data representing the state of the robot when weighing the object that is a powder, granules, or fluid is input, behavioral data including an operation of adjusting the amount of the object scooped up by the robot is output.
  • the control device, trained model generation device, method, and program disclosed herein allow the amount of an object to be adjusted in a situation where the robot scoops up more than the target amount of an object, which may be a powder, granule, or fluid, when weighing the object.
  • FIG. 1 is a diagram for explaining an overview of the present embodiment.
  • FIG. 2 is a diagram for explaining an overview of the present embodiment.
  • FIGS. 3A to 3C are diagrams for explaining the operation of the spoon of the present embodiment.
  • FIGS. 4A and 4B are diagrams for explaining randomization of physical parameters in the present embodiment.
  • FIG. 5 is a block diagram showing a hardware configuration of the trained model generation device according to the present embodiment.
  • FIG. 6 is a block diagram showing a schematic configuration of the trained model generation device according to the present embodiment.
  • FIG. 7 is a diagram for explaining the trained model of the present embodiment.
  • FIG. 8 is a block diagram showing a schematic configuration of the control system according to the present embodiment.
  • FIG. 9 is a block diagram showing a hardware configuration of the control device according to the present embodiment.
  • FIG. 10 is a flowchart showing the flow of the trained model generation process in the present embodiment.
  • FIG. 11 is a flowchart showing the flow of the control process in the present embodiment.
  • <Overview of the embodiment> FIGS. 1 and 2 are diagrams for explaining an overview of this embodiment.
  • a robot 24 of this embodiment weighs powder filled in a container Ct.
  • a trained model used when weighing powder is generated using learning data obtained by executing a computer simulation.
  • a policy is generated by executing reinforcement learning based on virtual state data Sv and virtual behavior data Av obtained by executing a simulator. Note that this policy is realized by a known machine learning model.
  • the robot executes the operation of weighing the powder using this policy. Specifically, when the current state data S is input to the policy, the policy outputs new action data. The robot performs an operation according to the action data, and then measures the mass W of the powder using an electronic balance. The robot's operation produces new state data S, which is input to the policy, and the policy outputs the next action data. By repeating this cycle in the operation phase Op, the robot weighs the powder.
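The sense-act cycle of the operation phase described above can be sketched in Python. Everything here is a hypothetical stand-in: `policy` for the trained model, `execute_action` for the robot interface, and `read_scale` for the electronic balance.

```python
def weighing_loop(policy, execute_action, read_scale, w_goal, steps=50, tol=0.1):
    """Repeatedly query the policy and act until the measured mass is
    close to the target mass (a simplified sketch of the operation phase)."""
    theta_spoon = 0.0                        # current spoon tilt angle
    for _ in range(steps):
        w_current = read_scale()             # weigh the powder on the balance
        if abs(w_goal - w_current) <= tol:   # close enough to the target
            return w_current
        state = [w_current, theta_spoon, w_goal]  # state vector s
        a_incline, a_shake = policy(state)        # action vector a
        theta_spoon = execute_action(a_incline, a_shake)
    return read_scale()
```

A dummy robot whose mass decreases a little with each action converges to the target within a few iterations of this loop.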
  • when the robot weighs a powder object, it may scoop up more than the target amount. This embodiment assumes such a case, and the robot adjusts the amount of the object by dropping part of what it has scooped up. If the amount scooped up is less than the target amount, the robot may scoop up the object again. This is explained in detail below.
  • reinforcement learning is used to generate a trained model reflecting the above-mentioned policy.
  • the problem is defined by a Markov decision process (S, A, P, R, γ).
  • S is a set of states acquired from the environment
  • A is a set of selectable actions.
  • P^a_ss' represents the probability of transitioning to state s' at the next time step when action a is selected in state s.
  • R^a_ss' represents the reward obtained when action a is selected in state s and the state transitions to s' at the next time step.
  • γ ∈ [0, 1) is a discount rate.
  • π(a | s) is the probability that action a is selected in a situation in which state s is given.
  • the purpose of reinforcement learning is to find an optimal policy π* that maximizes the expected sum of discounted rewards represented by the following formula (1): π* = argmax_π E[Σ_t γ^t r_t]. Note that t represents time.
  • the state s is represented by a vector [w_current, θ_spoon, w_goal] whose elements are the current powder mass w_current, the spoon tilt angle θ_spoon, and the target powder mass w_goal.
  • the state s incorporates the target powder mass w_goal.
  • the action a is represented by a vector [a_incline, a_shake] whose elements are the action a_incline of tilting the spoon and the action a_shake of shaking the spoon.
  • the action a_incline corresponds to the relative pitch angle of the spoon.
  • FIG. 3 is a diagram for explaining the spoon tilting action a_incline and the spoon shaking action a_shake.
  • the spoon shaking action a_shake is an action of moving the spoon back and forth.
  • the spoon tilting action a_incline is an action of changing the spoon's posture.
  • the spoon shaking action a_shake can be expressed by the spoon's movement distance and movement acceleration; it is therefore represented by a proportionality coefficient relating the movement distance to the movement acceleration.
  • state s and action a in this embodiment are defined as follows:
  • the reward r is defined as the difference between the target mass w_goal and the current powder mass w_current, as shown in equation (3) below.
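Since reinforcement learning maximizes the reward, one plausible reading of equation (3), which is not reproduced in the text, is the negative absolute difference between the target and current masses:

```python
def reward(w_goal: float, w_current: float) -> float:
    """One plausible form of equation (3): the negative absolute difference
    between the target mass and the current powder mass, so that maximizing
    the reward drives the scooped amount toward the target."""
    return -abs(w_goal - w_current)
```

Under this form, the reward peaks at zero when the current mass equals the target and decreases as the error grows in either direction.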
  • the policy π(a | s) is realized using a machine learning model.
  • specifically, the policy π(a | s) is realized using a neural network model having a long short-term memory (LSTM) structure, which is an example of a machine learning model.
  • the neural network model having the LSTM structure can take into account the time-series state of the powder.
  • a neural network model having an LSTM structure is trained using training data obtained by executing a computer simulation, thereby generating a trained model to be used when weighing powder.
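As a rough illustration of how an LSTM carries the time-series state of the powder across steps, here is a minimal single-cell sketch in numpy. This is not the patent's actual architecture; the dimensions and random initialization are arbitrary assumptions.

```python
import numpy as np

def lstm_step(x, h, c, W, b):
    """One step of a single LSTM cell: x is the current state vector,
    (h, c) are the recurrent hidden and cell states, and W, b hold the
    stacked weights/biases of the four gates."""
    z = W @ np.concatenate([x, h]) + b     # pre-activations of all gates
    i, f, o, g = np.split(z, 4)            # input, forget, output, candidate
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state accumulates history
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def run_policy_lstm(states, hidden_dim=8, seed=0):
    """Feed a sequence of state vectors [w_current, theta_spoon, w_goal]
    through the cell; the final hidden state summarizes the time series."""
    rng = np.random.default_rng(seed)
    state_dim = len(states[0])
    W = rng.normal(scale=0.1, size=(4 * hidden_dim, state_dim + hidden_dim))
    b = np.zeros(4 * hidden_dim)
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for s in states:
        h, c = lstm_step(np.asarray(s, dtype=float), h, c, W, b)
    return h
```

In the actual system the final hidden state would feed an output layer producing the action vector [a_incline, a_shake].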
  • FIG. 4 is a diagram for explaining changing physical parameters in virtual space.
  • while the action of moving the spoon in the virtual space to adjust the amount of powder on the spoon is being reflected in the trained model, various physical parameters in the virtual space are randomly changed (Rd in FIG. 4).
  • the trained model is used to perform the action of adjusting the amount of powder on the spoon by moving the spoon in real space.
  • the coefficient of friction between the particles that make up the powder is also a parameter that indicates how difficult it is for the powder to flow.
  • the coefficient of friction between the spoon and the particles is also a parameter that affects the amount of powder remaining on the spoon.
  • the parameter that indicates the strength of the spoon swing is changed randomly in order to reflect the difference in strength of the spoon swing by the robot between the simulator and reality in the trained model.
  • Gravity is changed randomly in order to reproduce the phenomenon of powder flying. In order to reproduce the phenomenon of powder flying as much as possible, gravity in the virtual space may be set to a value that is much smaller than the gravitational acceleration in reality.
  • the target mass is changed randomly in order to weigh the powder at an arbitrary target mass.
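The per-episode randomization described above could be sketched as follows. The parameter names mirror the table later in the document, but the ranges are purely illustrative assumptions, not values from the patent:

```python
import random

# Illustrative ranges; the actual ranges used are not disclosed in the text.
PARAM_RANGES = {
    "powder_friction_coefficient": (0.1, 1.0),   # friction between particles
    "powder_particle_radius":      (0.2, 1.0),   # particle radius
    "powder_particle_mass":        (0.5, 2.0),   # particle mass
    "spoon_friction_coefficient":  (0.1, 1.0),   # friction between spoon and particles
    "shake_speed_weight":          (0.5, 1.5),   # strength of the spoon swing
    "gravity":                     (1.0, 9.8),   # may be far below real gravity
    "goal_powder_amount":          (5.0, 15.0),  # target mass
}

def sample_physical_parameters(rng=random):
    """Draw one randomized parameter set for a simulation episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
```

A fresh parameter set would be sampled at the start of each simulated episode so the trained model sees many virtual environments.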
  • FIG. 5 is a block diagram showing a hardware configuration of the trained model generation device 10 according to the present embodiment.
  • the trained model generation device 10 has a CPU (Central Processing Unit) 42, a memory 44, a storage device 46, an input/output I/F (Interface) 48, a storage medium reading device 50, and a communication I/F 52.
  • Each component is connected to each other via a bus 54 so as to be able to communicate with each other.
  • the storage device 46 stores a trained model generation program for executing each process described below.
  • the CPU 42 is a central processing unit, and executes various programs and controls each component. That is, the CPU 42 reads the program from the storage device 46 and executes the program using the memory 44 as a working area. The CPU 42 controls each of the components and performs various calculation processes according to the program stored in the storage device 46.
  • Memory 44 is made up of RAM (Random Access Memory) and serves as a working area to temporarily store programs and data.
  • Storage device 46 is made up of ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), etc., and stores various programs including the operating system, and various data.
  • the input/output I/F 48 is an interface for inputting data from the outside and outputting data to the outside.
  • input devices for various inputs such as a keyboard or mouse, and output devices for outputting various information, such as a display or printer, may be connected.
  • a touch panel display may be used as the output device to function as an input device.
  • the storage medium reader 50 reads data stored in various storage media such as CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, Blu-ray Disc, and USB (Universal Serial Bus) memory, and writes data to the storage media.
  • the communication I/F 52 is an interface for communicating with other devices, and uses standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark).
  • the trained model generation device 10 functionally includes a simulation unit 12, a learning acquisition unit 14, and a learning unit 16.
  • a trained model storage unit 18 is also provided in a specified storage area of the trained model generation device 10.
  • Each functional configuration is realized by the CPU 42 reading out each program stored in the storage device 46, expanding it in the memory 44, and executing it.
  • the trained model storage unit 18 stores trained models generated by the process described below.
  • the trained model of this embodiment is a neural network model having an LSTM structure.
  • FIG. 7 is a diagram for explaining the trained model of this embodiment. As shown in FIG. 7, the trained model of this embodiment is a model that realizes the policy π*(a | s).
  • the simulation unit 12 executes a computer simulation of the operation of a virtual robot in a virtual space when weighing a virtual object, which is a virtual powder.
  • a virtual spoon is installed on the virtual robot, and the virtual robot weighs the virtual powder using the virtual spoon. Note that the simulation unit 12 randomly changes physical parameters in the virtual space when executing the computer simulation.
  • the learning acquisition unit 14 acquires learning data.
  • the learning data is data obtained by computer simulating the operation of the virtual robot when weighing a virtual object, which is a virtual powder.
  • the learning data is composed of data such as the position and posture of a specific part of the virtual robot (e.g., the part where the spoon is attached), the position and posture of the virtual spoon, the amount of virtual object present on the virtual spoon, and the reward r when a certain action a is selected in a certain state s.
  • the learning unit 16 generates a trained model based on the learning data acquired by the learning acquisition unit 14.
  • the trained model outputs behavior data a including an action to adjust the amount of the object scooped up by the robot.
  • the learning unit 16 performs deep reinforcement learning on a neural network model having an LSTM structure so that the reward function shown in the above formula (1) is maximized.
  • the deep reinforcement learning may be learning so that the current mass of powder approaches a target amount.
  • the learning unit 16 then stores the obtained trained model in the trained model storage unit 18.
  • the reward function is not limited to the above formula (1) and may be another type of reward function.
  • a term that imposes a large penalty on such behavior may be added to the above formula (1).
  • the trained model is more likely to take an action to bring the amount of powder scooped up closer to the target amount and prevent too much powder from being dropped, which is more favorable for weighing the object.
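One hedged way to realize the penalty term mentioned above: treat an action that drops the mass from above the target to below it as "dropping too much powder" and subtract a weighted penalty. The threshold logic and weight here are illustrative assumptions, not taken from the patent:

```python
def reward_with_penalty(w_goal, w_current, w_prev, penalty_weight=10.0):
    """Base reward (negative absolute error to the target) plus an
    illustrative penalty term that heavily punishes a single drop that
    overshoots the target, i.e. the mass falling from above the target
    to below it in one step."""
    r = -abs(w_goal - w_current)
    if w_prev > w_goal and w_current < w_goal:  # dropped too much at once
        r -= penalty_weight * (w_goal - w_current)
    return r
```

With such a term, a policy earns more by approaching the target gradually than by dumping a large amount past it.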
  • a trained model is thus obtained that, when input with state data s observed while the robot weighs powder, outputs behavioral data a including actions to adjust the amount of the object scooped up by the robot.
  • This trained model can then be used to control a real robot.
  • Fig. 8 is a block diagram showing a schematic configuration of the control system 20 of this embodiment.
  • the control system 20 includes a sensor group 22, a robot 24, a spoon 26 installed on the robot 24, and a control device 30.
  • the control device 30 controls the operation of the robot 24 using the trained model generated by the trained model generation device 10.
  • the spoon 26 may be anything that can hold an object whose amount is to be adjusted, such as a medicine spoon, a cup-like object that the robot can hold, or a structure with a recess for the object built into the robot's fingers.
  • the sensor group 22 sequentially detects the state of the robot 24, the state of the spoon 26, the state of the object on the spoon 26, and the state of the object in the container.
  • the sensor group 22 is composed of, for example, an electronic balance that measures the mass of the object on the spoon or in the container, a sensor that detects the posture and position of a specific part of the robot 24, and a sensor that detects the posture and position of the spoon 26.
  • the measurement target is not limited to the mass of the object.
  • the state of the object may be an amount (e.g., the volume of the object) estimated from an image of the object using a known image processing technique.
  • the amount of the object dropped from the spoon may be the amount of the object estimated using a known sound processing technique from the sound made when the object is dropped.
  • the state of the object to be detected is not limited to the state of the object on the spoon 26 or the state of the object in the container, but may also be the amount of the object dropped from the spoon 26.
  • the robot 24 operates in response to control commands output from the control device 30, which will be described later.
  • the spoon 26 is attached to the robot 24, for example as shown in FIG. 1, and scoops up objects in the container in response to the operation of the robot 24.
  • FIG. 9 is a block diagram showing the hardware configuration of the control device 30 according to this embodiment.
  • the control device 30 has a CPU (Central Processing Unit) 62, a memory 64, a storage device 66, an input/output I/F (Interface) 68, a storage medium reading device 70, and a communication I/F 72.
  • Each component is connected to each other so as to be able to communicate with each other via a bus 74.
  • the storage device 66 stores control programs for executing the various processes described below.
  • the CPU 62 is a central processing unit, and executes various programs and controls each component. That is, the CPU 62 reads the programs from the storage device 66 and executes the programs using the memory 64 as a working area.
  • the CPU 62 controls each of the components and performs various calculation processes according to the programs stored in the storage device 66.
  • Memory 64 is made up of RAM (Random Access Memory) and serves as a working area to temporarily store programs and data.
  • Storage device 66 is made up of ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), etc., and stores various programs including the operating system, and various data.
  • the input/output I/F 68 is an interface for inputting data from the outside and outputting data to the outside.
  • input devices for performing various inputs such as a keyboard or mouse, and output devices for outputting various information, such as a display or printer, may be connected.
  • a touch panel display may be used as the output device to function as an input device.
  • the storage medium reader 70 reads data stored in various storage media such as CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, Blu-ray Disc, and USB (Universal Serial Bus) memory, and writes data to the storage media.
  • the communication I/F 72 is an interface for communicating with other devices, and uses standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark).
  • the control device 30 functionally includes an acquisition unit 34, a generation unit 36, and a control unit 38.
  • a trained model storage unit 32 is provided in a predetermined storage area of the control device 30.
  • Each functional configuration is realized by the CPU 62 reading out each program stored in the storage device 66, expanding it in the memory 64, and executing it.
  • the trained model storage unit 32 stores the trained model generated by the trained model generation device 10.
  • the generation unit 36 inputs the state data acquired by the acquisition unit 34 into the trained model to generate behavior data a. This behavior data a includes an action to adjust the amount of the object scooped up by the robot 24, and corresponds to the action to be taken by the robot 24 at the next time.
  • the action to adjust the amount of the object scooped up by the robot 24 may include at least one of an action to tilt the spoon 26 and an action to shake the spoon 26.
  • the action of dropping the object is an action of changing the posture and position of a specific part of the robot 24.
  • the way the robot 24 is controlled before it operates based on the generated action data (the preparation stage) or after (handling the powder after weighing) may be based on a well-known teaching technique.
  • when the trained model generation device 10 receives a predetermined instruction signal, the CPU 42 reads the trained model generation program from the storage device 46, expands it into the memory 44, and executes it. As a result, the CPU 42 functions as each functional component of the trained model generation device 10, and the trained model generation process shown in FIG. 10 is executed.
  • in step S100, the simulation unit 12 executes a computer simulation of the operation of a virtual robot in a virtual space when weighing a virtual object, which is a virtual powder.
  • a virtual spoon is provided on the virtual robot, and the virtual robot weighs the virtual powder using the virtual spoon.
  • the simulation unit 12 randomly changes physical parameters in the virtual space.
  • in step S102, while the simulation unit 12 is performing the simulation, the learning acquisition unit 14 acquires learning data including the position and orientation of a specific part of the virtual robot (e.g., the part where the spoon is attached), the position and orientation of the virtual spoon, the amount of virtual objects present on the virtual spoon, and the reward r when a certain action a is selected in a certain state s.
  • in step S104, the learning unit 16 generates a trained model based on the learning data acquired in step S102.
  • in step S106, the learning unit 16 stores the trained model generated in step S104 in the trained model storage unit 18.
  • when the trained model generated by the trained model generation device 10 is input to the control device 30, it is stored in the trained model storage unit 32 of the control device 30. Then, when the control system 20 receives a predetermined instruction signal, the CPU 62 of the control device 30 reads out the control program from the storage device 66, expands it into the memory 64, and executes it. As a result, the CPU 62 functions as each functional component of the control device 30, and the control process shown in FIG. 11 is executed.
  • the control process shown in FIG. 11 is repeated, and a control signal is repeatedly output to the robot 24, thereby allowing the object to be weighed appropriately.
  • the control device is a control device that controls a robot that weighs a powder object.
  • the control device acquires state data that represents the state of the robot when it weighs the object, and inputs the state data to a trained model that has been generated in advance.
  • the trained model is a model that outputs behavioral data that includes an action to adjust the amount of the object scooped up by the robot when state data is input.
  • the control device uses this trained model to generate behavioral data according to the state data.
  • the control device then controls the robot so that the action represented by the generated behavioral data is realized. This makes it possible to adjust the amount of the object when the robot scoops up more than the target amount when weighing a powder object.
  • the robot is equipped with a spoon, and the state data includes the spoon's orientation and the target amount of the object, so that even in a situation where more of the object than the target amount has been scooped up, the amount of the object can be adjusted by tilting or moving the spoon.
  • the action of adjusting the amount of the object is the action of dropping the object that the robot has scooped up.
  • the action of dropping the object is the action of changing the attitude of a specific part of the robot and the position of a specific part of the robot. This makes it possible to finely adjust the amount of the object, enabling weighing to be performed with high precision.
  • the trained model generating device also acquires training data obtained by computer simulating the actions of a virtual robot when weighing a virtual object, which is a virtual powder.
  • the trained model generating device then generates a trained model based on the training data, which outputs behavioral data including actions to adjust the amount of the object scooped up by the robot when state data representing the state when weighing the powder object is input. This makes it possible to obtain a trained model for adjusting the amount of the object when the robot scoops up more than the target amount when weighing the powder object.
  • the trained model generation device generates a trained model based on data obtained by computer simulating the operation of a virtual robot when weighing a virtual object, which is a virtual powder. This makes it possible to generate a trained model without operating a real robot.
  • when collecting training data using a real robot, incidents such as powder scattering are to be expected.
  • the trained model generation device also executes a simulation by randomly varying physical parameters in a virtual space. This allows various environments to be virtually realized in the simulation, and generates a trained model that can handle a variety of situations.
  • the above-mentioned method was applied to weighing four powders.
  • the following table shows the results of this example.
  • the following table shows the weighing results when the above-mentioned method was applied to four powders: flour, rice, salt, and coal.
  • 5 mg, 10 mg, and 15 mg were weighed using a robot.
  • the results below consist of the mean absolute error and standard deviation.
  • in a result written as "0.0 ± 0.4", for example, 0.0 represents the mean absolute error and 0.4 represents the standard deviation.
  • five policies were trained for each type of powder, and five experiments were performed for each policy, yielding 25 experimental results per powder type; the results are shown in the following table. As the table shows, the weighing was performed with high accuracy.
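The reported statistics (five policies × five trials per powder, summarized as mean absolute error ± standard deviation) can be reproduced schematically as follows. Whether the standard deviation is taken over signed or absolute errors is not specified in the text, so absolute errors are assumed here:

```python
import statistics

def summarize(measured, target):
    """Mean absolute error and standard deviation of the absolute errors
    over all trials for one powder (e.g. 5 policies x 5 runs = 25 trials)."""
    errors = [abs(m - target) for m in measured]
    return statistics.mean(errors), statistics.stdev(errors)
```

For instance, four trials measuring 10.0, 10.5, 9.5, and 10.0 against a 10.0 target give a mean absolute error of 0.25.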
  • the following table explains the case where physical parameters in the virtual space are randomly changed in the simulation when generating a trained model.
  • the following table lists the physical parameters that are randomly changed.
  • “Powder friction coefficient” represents the friction coefficient between particles that make up the powder.
  • “Powder particle number” represents the number of particles.
  • “Powder particle radius” represents the radius of a particle.
  • “Powder particle mass” represents the mass of a particle.
  • “Spoon friction coefficient” represents the friction coefficient between the spoon and a particle.
  • “Shake speed weight” is a parameter that represents the strength of the spoon swing.
  • “Gravity” represents gravity.
  • “Goal powder amount” represents the target mass of the powder.
  • the following table shows the weighing results when the physical parameters in the virtual space are randomly changed in a simulation for generating a trained model.
  • the following results are also composed of the mean absolute error and the standard deviation.
  • "Ours” in the following table is the weighing result obtained by the method proposed in this embodiment.
  • "Incline” represents the action of tilting the spoon, and
  • "Shake” represents the action of shaking the spoon.
  • "MLP” and "P-controller” in the second row of the following table represent existing methods.
  • MLP represents a multi-layer neural network model, and "P-controller” represents P control.
  • the weighing results shown in the fourth row of the following table are the weighing results when the listed physical parameters are fixed without being randomly changed.
  • the weighing results shown in the fifth row of the following table are the weighing results obtained by randomly changing the top n physical parameters with the best results among the weighing results shown in the fourth row. As can be seen from the following table, the weighing can be performed more accurately by randomly changing the physical parameters.
  • the first row lists Ours (Incline & Shake),
  • the second row lists MLP and P-Controller, and
  • the third row lists Ours (Incline only) and Ours (Shake only).
  • the target object is a powder, but the present invention is not limited to this.
  • the above embodiment and examples can be applied even if the target object is a granule or a fluid.
  • the configuration of the state data s and the action data a in the above embodiment can be changed as appropriate.
  • a trained model can be generated for each target mass.
  • the group of sensors used to acquire the status data s may include, for example, a camera or a microphone, and the status data s (e.g., the mass of the powder) may be obtained from an image of the amount of powder on the spoon or the sound of the powder dropping from the spoon.
  • the trained model is implemented by a neural network model having an LSTM structure, but the present invention is not limited to this. Any machine learning model may be used.
  • processors in this case include PLDs (Programmable Logic Devices) such as FPGAs (Field-Programmable Gate Arrays) whose circuit configuration can be changed after manufacture, and dedicated electrical circuits such as ASICs (Application Specific Integrated Circuits), which are processors with circuit configurations designed exclusively to execute specific processes.
  • each process may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (for example, multiple FPGAs, or a combination of a CPU and an FPGA, etc.).
  • the hardware structure of these various processors is, more specifically, an electrical circuit that combines circuit elements such as semiconductor elements.
  • the programs are described as being pre-stored (installed) in a storage device, but the present invention is not limited to this.
  • the programs may be provided in a form stored in a storage medium such as a CD-ROM, DVD-ROM, Blu-ray disc, or USB memory.
  • the programs may also be downloaded from an external device via a network.
  • a control device for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising: an acquisition unit that acquires state data representing a state of the robot when weighing the object; a generation unit that generates behavioral data according to the state data acquired by the acquisition unit by inputting the acquired state data to a trained model that is generated in advance and that, when the state data is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot; and a control unit that controls the robot so as to realize an action represented by the behavioral data generated by the generation unit.
  • the robot is provided with a spoon,
  • the state data includes the attitude of the spoon and a target amount of the object.
  • the control device according to claim 1
  • the action of adjusting the amount of the object is an action of dropping the object scooped up by the robot.
  • the control device according to claim 1
  • the action of dropping the object is an action of changing the posture of a specific part of the robot and the position of the specific part of the robot.
  • a trained model generating device comprising:
  • a control method for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising: acquiring state data representing a state of the robot when weighing the object; generating behavioral data according to the acquired state data by inputting the acquired state data to a trained model that is generated in advance and that, when the state data is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot; and controlling the robot so as to realize an action represented by the generated behavioral data.
  • a control method in which the above processing is executed by a computer.
  • (Appendix 9) acquiring learning data obtained by computer simulating the operation of the virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid;
  • a trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
  • a method for generating trained models in which processing is performed by a computer.
  • a control program for causing a computer to execute processing for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, the processing comprising: acquiring state data representing a state of the robot when weighing the object; generating behavioral data according to the acquired state data by inputting the acquired state data to a trained model that is generated in advance and that, when the state data is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot; and controlling the robot so as to realize an action represented by the generated behavioral data.
  • (Appendix 11) acquiring learning data obtained by computer simulating the operation of the virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid;
  • a trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
  • a trained model generation program for allowing a computer to execute processing.
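The weighing accuracy reported earlier is given as a mean absolute error and standard deviation over 25 trials per powder (five trained policies, five runs each). That aggregation can be sketched as follows; the function name and the measurement values are illustrative, not from the source:

```python
import statistics

def mae_and_std(measured_masses, target_mass):
    """Aggregate repeated weighing trials into a mean absolute error and its
    (population) standard deviation, as in the reported result tables."""
    errors = [abs(m - target_mass) for m in measured_masses]
    return statistics.mean(errors), statistics.pstdev(errors)
```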

Landscapes

  • Manipulator (AREA)

Abstract

Provided is a control device for controlling a robot that weighs an object which is a powder, a granular material, or a fluid, the control device acquiring state data representing a state when the robot is weighing the object. The control device generates behavior data by inputting the state data into a pre-generated trained model that, when the state data is input, outputs behavior data including an operation for adjusting the amount of the object that the robot has scooped up. The control device controls the robot such that the operation represented by the behavior data is realized.

Description

制御装置、学習済みモデル生成装置、方法、及びプログラムControl device, trained model generation device, method, and program

 本開示は、制御装置、学習済みモデル生成装置、方法、及びプログラムに関する。 This disclosure relates to a control device, a trained model generation device, a method, and a program.

 従来、塩等の粉体を秤量する動作を行う学習システムが知られている(例えば、文献1:特開2020-194242号公報を参照)。この学習システムのマニピュレータには、エンドエフェクタとしてスプーンが装着されており、塩を秤量する動作を行う(例えば、文献1の段落[0013]を参照)。  A learning system that performs the action of weighing powder such as salt is known (for example, see Reference 1: JP 2020-194242 A). The manipulator of this learning system is equipped with a spoon as an end effector, and performs the action of weighing salt (for example, see paragraph [0013] of Reference 1).

 また、ロボットアームを利用して第1容器から第2容器への注水を行い、液体を秤量する液体秤量方法が知られている(例えば、文献2:特開2021-164980号公報を参照)。この液体秤量方法は、ロボットアームの動作を制御する制御装置が、第1時点に撮影された第1容器を含む画像を取得し、第1時点の第1容器の角度を取得し、第1時点の第2容器の液体量を取得し、取得した画像、角度及び液体量を学習モデルへ入力して、学習モデルが出力する第2時点の角度を取得し、取得した第2時点の角度に応じて、ロボットアームが保持する第1容器の角度を制御する(例えば、文献2の請求項1を参照)。  A liquid weighing method is also known in which a robot arm is used to pour water from a first container into a second container and weigh the liquid (see, for example, Document 2: JP 2021-164980 A). In this liquid weighing method, a control device that controls the operation of the robot arm acquires an image including a first container taken at a first time point, acquires the angle of the first container at the first time point, acquires the amount of liquid in the second container at the first time point, inputs the acquired image, angle, and amount of liquid into a learning model, acquires the angle at a second time point output by the learning model, and controls the angle of the first container held by the robot arm according to the acquired angle at the second time point (see, for example, claim 1 in Document 2).

 また、容器に大容量で入った粉体、粒体又は流体を、別の容器に定められた量だけすくい出して小分けする動作をロボットに学習させる機械学習方法が知られている(例えば、文献3:特開2019-098419号公報を参照)。この機械学習方法は、現実空間において多軸ロボットと対象物とで強化学習させる現実学習プロセスと、シミュレーション空間において、多軸ロボットを疑似的にシミュレートした疑似多軸ロボットと、対象物を疑似的にシミュレートした疑似対象物と、で強化学習させるシミュレーション学習プロセスとからなる(例えば、文献3の要約を参照)。 A machine learning method is also known in which a robot learns to scoop up a large amount of powder, granules, or fluid from a container and divide it into a separate container in a set amount (see, for example, Reference 3: JP 2019-098419 A). This machine learning method consists of a real learning process in which reinforcement learning is performed between a multi-axis robot and an object in a real space, and a simulation learning process in which reinforcement learning is performed between a pseudo multi-axis robot that simulates the multi-axis robot and a pseudo object that simulates the object in a simulation space (see, for example, the summary of Reference 3).

 また、ロボットを用いて粒状媒体を扱う技術が知られている(例えば、文献4:Schenck, C., Tompson, J., Fox, D., and Levine, S., "Learning Robotic Manipulation of Granular Media," In Proceedings of the First Conference on Robotic Learning (CoRL) (to appear), 2017を参照)。文献4には、粒状媒体を掬う際に用いられるロボットの制御モデルが開示されている。 Also, techniques for handling granular media using robots are known (see, for example, Reference 4: Schenck, C., Tompson, J., Fox, D., and Levine, S., "Learning Robotic Manipulation of Granular Media," In Proceedings of the First Conference on Robotic Learning (CoRL) (to appear), 2017). Reference 4 discloses a control model of a robot used to scoop up granular media.

 ところで、ロボットが粉体、粒体、又は流体等の対象物を秤量する際に、ロボットは目標量よりも多い対象物をすくい上げてしまう場合もあり得る。通常ロボットは目標量通りを狙って動作するが様々な誤差により実現は難しく、目標量よりも多い対象物をすくい上げてしまう場合もあり得る。上記文献1-4に開示されている技術は、粉体等を秤量する技術であるものの、目標量よりも多い対象物をすくい上げてしまった場合にロボットにどのような動作をさせるのかについては考慮されていない。 When a robot weighs objects such as powders, granules, or fluids, the robot may end up scooping up more than the target amount. Normally, a robot operates aiming to reach the target amount, but this is difficult to achieve due to various errors, and the robot may end up scooping up more than the target amount. The technology disclosed in the above-mentioned documents 1-4 is a technology for weighing powders, etc., but does not take into consideration how the robot should behave if it scoops up more than the target amount.

 本開示は、上記の点に鑑みてなされたものであり、ロボットが粉体、粒体、又は流体である対象物を秤量する際に、目標量よりも多い対象物をすくい上げてしまった状況下において、対象物の量を調整することを目的とする。 The present disclosure has been made in consideration of the above points, and aims to adjust the amount of an object when a robot scoops up more than the target amount of an object, which may be a powder, granules, or fluid, when weighing the object.

 上記目的を達成するために、本開示の制御装置は、粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御装置であって、前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得する取得部と、前記取得部により取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、前記取得部により取得された前記状態データに応じた前記行動データを生成する生成部と、前記生成部により生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する制御部と、を含む制御装置である。 In order to achieve the above object, the control device of the present disclosure is a control device for controlling a robot that weighs an object, which is a powder, granules, or fluid, and includes: an acquisition unit that acquires state data representing the state of the robot when weighing the object; a generation unit that generates the behavioral data corresponding to the state data acquired by the acquisition unit by inputting the state data acquired by the acquisition unit into a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input; and a control unit that controls the robot so that the action represented by the behavioral data generated by the generation unit is realized.

 また、本開示の制御方法は、粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御方法であって、前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、処理をコンピュータが実行する制御方法である。 The control method disclosed herein is a control method for controlling a robot that weighs an object, which is a powder, granules, or fluid, in which a computer executes a process to acquire status data representing the state of the robot when weighing the object, input the acquired status data to a trained model that is generated in advance and outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the status data is input, thereby generating the behavioral data according to the acquired status data, and controlling the robot so as to realize the action represented by the generated behavioral data.

 また、本開示の制御プログラムは、粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御プログラムであって、前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、処理をコンピュータに実行させるための制御プログラムである。 The control program disclosed herein is a control program for controlling a robot that weighs an object, which is a powder, granules, or fluid, and acquires status data representing the state of the robot when weighing the object, and inputs the acquired status data to a trained model that is generated in advance and outputs behavioral data including an action to adjust the amount of the object scooped up by the robot when the status data is input, thereby generating behavioral data according to the acquired status data, and controlling the robot so as to realize the action represented by the generated behavioral data.

 また、本開示の学習済みモデル生成装置は、仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得する学習用取得部と、前記学習用取得部により取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する学習部と、を含む学習済みモデル生成装置である。 The trained model generating device of the present disclosure is a trained model generating device including a training acquisition unit that acquires training data obtained by computer simulating the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and a learning unit that generates a trained model based on the training data acquired by the training acquisition unit, and that outputs behavioral data including an operation of adjusting the amount of the object scooped up by the robot when state data representing the state when the robot weighs the object that is a powder, granules, or fluid is input.

 また、本開示の学習済みモデル生成方法は、仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、処理をコンピュータが実行する学習済みモデル生成方法である。 The trained model generation method disclosed herein is a trained model generation method in which a computer executes a process to acquire training data obtained by computer simulating the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and generates a trained model based on the acquired training data, in which, when state data representing the state of the robot when weighing the object that is a powder, granules, or fluid is input, behavioral data including an operation of adjusting the amount of the object scooped up by the robot is output.

 また、本開示の学習済みモデル生成プログラムは、仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、処理をコンピュータに実行させるための学習済みモデル生成プログラムである。 The trained model generation program of the present disclosure is a trained model generation program for causing a computer to execute a process that acquires training data obtained by computer simulation of the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and generates a trained model based on the acquired training data, in which, when state data representing the state of the robot when weighing the object that is a powder, granules, or fluid is input, behavioral data including an operation of adjusting the amount of the object scooped up by the robot is output.

 本開示の制御装置、学習済みモデル生成装置、方法、及びプログラムによれば、ロボットが粉体、粒体、又は流体である対象物を秤量する際に、目標量よりも多い対象物をすくい上げてしまった状況下において、対象物の量を調整することができる。 The control device, trained model generation device, method, and program disclosed herein allow the amount of an object to be adjusted in a situation where the robot scoops up more than the target amount of an object, which may be a powder, granule, or fluid, when weighing the object.

本実施形態の概要を説明するための図である。FIG. 1 is a diagram for explaining an overview of the present embodiment.
本実施形態の概要を説明するための図である。FIG. 2 is a diagram for explaining an overview of the present embodiment.
本実施形態のスプーンの動作を説明するための図である。FIG. 3 is a diagram for explaining the operation of the spoon of the present embodiment.
本実施形態における物理パラメータのランダム化を説明するための図である。FIG. 4 is a diagram for explaining randomization of physical parameters in the present embodiment.
本実施形態に係る学習済みモデル生成装置のハードウェア構成を示すブロック図である。FIG. 5 is a block diagram showing a hardware configuration of the trained model generation device according to the present embodiment.
本実施形態の学習済みモデル生成装置の概略構成を表すブロック図である。FIG. 6 is a block diagram showing a schematic configuration of the trained model generation device of the present embodiment.
本実施形態の学習済みモデルを説明するための図である。FIG. 7 is a diagram for explaining the trained model of the present embodiment.
本実施形態の制御システムの概略構成を表すブロック図である。FIG. 8 is a block diagram showing a schematic configuration of the control system of the present embodiment.
本実施形態に係る制御装置のハードウェア構成を示すブロック図である。FIG. 9 is a block diagram showing a hardware configuration of the control device according to the present embodiment.
本実施形態における学習済みモデル生成処理の流れを示すフローチャートである。FIG. 10 is a flowchart showing the flow of the trained model generation process in the present embodiment.
本実施形態における制御処理の流れを示すフローチャートである。FIG. 11 is a flowchart showing the flow of the control process in the present embodiment.

 以下、本開示の実施形態の一例を、図面を参照しつつ説明する。本実施形態では、本開示に係る制御装置を搭載した制御システムを例に説明する。なお、各図面において同一又は等価な構成要素及び部分には同一の参照符号を付与している。また、図面の寸法及び比率は、説明の都合上誇張されており、実際の比率とは異なる場合がある。 Below, an example of an embodiment of the present disclosure will be described with reference to the drawings. In this embodiment, a control system equipped with a control device according to the present disclosure will be described as an example. Note that the same reference symbols are used for identical or equivalent components and parts in each drawing. Also, the dimensions and proportions in the drawings have been exaggerated for the convenience of explanation and may differ from the actual proportions.

<実施形態の概要>
 図1及び図2は、本実施形態の概要を説明するための図である。図1に示されるように、本実施形態のロボット24は、容器Ctに充填されている粉体の秤量をする。また、本実施形態では、図2に示されるように、コンピュータシミュレーションを実行することにより得られる学習用データを用いて、粉体を秤量する際に用いられる学習済みモデルを生成する。具体的には、図2に示されるように、学習フェーズTrでは、シミュレータを実行することにより得られる仮想の状態データSv及び仮想の行動データAvに基づいて、強化学習を実行することにより方策を生成する。なお、この方策は、既知の機械学習モデルによって実現される。
<Overview of the embodiment>
FIGS. 1 and 2 are diagrams for explaining an overview of this embodiment. As shown in FIG. 1, a robot 24 of this embodiment weighs powder filled in a container Ct. In addition, in this embodiment, as shown in FIG. 2, a trained model used when weighing powder is generated using learning data obtained by executing a computer simulation. Specifically, as shown in FIG. 2, in the learning phase Tr, a policy is generated by executing reinforcement learning based on virtual state data Sv and virtual behavior data Av obtained by executing a simulator. Note that this policy is realized by a known machine learning model.

 そして、運用フェーズOpにおいて、ロボットは、この方策を利用して粉体を秤量する動作を実行する。具体的には、現在の状態データSが方策へ入力されると、方策は新たな行動データを出力する。ロボットは、行動データに応じた動作をした後、電子天秤を用いて粉体を秤量Wする。ロボットの動作により、新たな状態データSが生成され、その状態データが方策へ入力され、方策から次の行動データが出力される。運用フェーズOpにおいてこのサイクルが実行されることにより、ロボットによる粉体の秤量が実行される。 Then, in the operation phase Op, the robot executes the operation of weighing the powder using this strategy. Specifically, when the current state data S is input to the strategy, the strategy outputs new action data. The robot performs an operation according to the action data, and then weighs W the powder using an electronic balance. New state data S is generated by the robot's operation, and this state data is input to the strategy, and the next action data is output from the strategy. By executing this cycle in the operation phase Op, the robot weighs the powder.
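The operation-phase cycle described above (state into the policy, action out, robot motion, weighing, new state) can be sketched as follows. The functions `policy`, `weigh`, and `execute` are hypothetical stand-ins for the trained model, the electronic balance, and the robot interface; they are not named in the embodiment:

```python
def run_weighing_episode(policy, weigh, execute, goal_mass, steps=10):
    """Operation-phase cycle: state -> policy -> action -> robot motion -> weigh.

    policy(state)  -> (a_incline, a_shake)   # trained model (hypothetical interface)
    weigh()        -> current powder mass from the electronic balance
    execute(a)     -> moves the robot's spoon according to the action
    """
    spoon_angle = 0.0
    history = []
    for _ in range(steps):
        current_mass = weigh()                          # weigh the powder
        state = (current_mass, spoon_angle, goal_mass)  # s = [w_current, theta_spoon, w_goal]
        a_incline, a_shake = policy(state)              # next action from the policy
        execute((a_incline, a_shake))                   # realize the action on the robot
        spoon_angle += a_incline                        # a_incline is a relative pitch angle
        history.append((state, (a_incline, a_shake)))
    return history
```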

 なお、ロボットが粉体である対象物を秤量する際に、目標量よりも多い対象物をすくい上げてしまう場合もあり得る。本実施形態では、このような場合を想定し、ロボットがすくい上げた対象物を落とすことにより対象物の量を調整する。なお、すくい上げた対象物の量が目標量よりも少なかった場合は、ロボットは対象物をすくい上げる動作を再度行ってもよい。以下、具体的に説明する。 When the robot weighs a powder object, it may scoop up more than the target amount. In this embodiment, such a case is assumed, and the robot adjusts the amount of the object by dropping the object it has scooped up. If the amount of the object scooped up is less than the target amount, the robot may scoop up the object again. This is explained in detail below.
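The branching just described (drop when too much has been scooped, scoop again when too little) can be sketched as a small decision function. The `tolerance` acceptance band is a hypothetical parameter; the embodiment does not specify one:

```python
def adjustment_action(current_mass, goal_mass, tolerance=0.1):
    """Choose how to adjust the scooped amount (sketch; `tolerance` is an
    assumed acceptance band, not taken from the embodiment)."""
    if current_mass > goal_mass + tolerance:
        return "drop"     # scooped too much: drop some of the object
    if current_mass < goal_mass - tolerance:
        return "rescoop"  # scooped too little: scoop again
    return "done"
```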

<問題定式化>
 本実施形態では強化学習を用いて、上述した方策が反映された学習済みモデルを生成する。また、本実施形態では、マルコフ決定プロセス(S,A,μ,P,r,γ)により問題が定義される。Sは環境から取得される状態の集合であり、Aは選択可能な行動の集合である。P ss’は、状態sが観測された状況下において行動aが選択され、次の時間ステップにおける状態s’へと遷移する確率を表す。r ss’は、状態sが観測された状況下において行動aが選択され、次の時間ステップにおける状態s’へ遷移した際の報酬を表す。γ∈[0,1)は割引率である。方策π(a|s)は状態sが与えられた状況下において行動aが選択される確率である。強化学習の目的は、以下の式(1)によって表される報酬の総和を最大化するような、最適な方策πを見つけることである。なお、tは時刻を表す。
<Problem formulation>
In this embodiment, reinforcement learning is used to generate a trained model reflecting the above-mentioned policy. In this embodiment, the problem is defined by a Markov decision process (S, A, μ, P, r, γ). S is a set of states acquired from the environment, and A is a set of selectable actions. P^a_{ss'} represents the probability that action a is selected in a situation in which state s is observed, and transitions to state s' in the next time step. r^a_{ss'} represents the reward when action a is selected in a situation in which state s is observed, and transitions to state s' in the next time step. γ ∈ [0, 1) is a discount rate. Policy π(a|s) is the probability that action a is selected in a situation in which state s is given. The purpose of reinforcement learning is to find an optimal policy π* that maximizes the sum of rewards represented by the following formula (1). Note that t represents time.


  π* = arg max_π E[ Σ_{t=0}^{∞} γ^t r_{s_t} ]                    (1)

 なお、上記式(1)のrstは、以下の式(2)によって表される。 Incidentally, r_{s_t} in the above formula (1) is expressed by the following formula (2).


  r_{s_t} = Σ_{s'} P^{a_t}_{s_t s'} r^{a_t}_{s_t s'}                    (2)
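The discounted sum of rewards that the optimal policy maximizes in formula (1) can be computed for a finite reward sequence as in the following sketch (a finite-horizon truncation of the infinite sum):

```python
def discounted_return(rewards, gamma=0.99):
    """Finite-horizon value of the objective in formula (1): sum_t gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```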

 本実施形態の粉体の秤量では、状態sは、現在の粉体の質量wcurrentと、粉体の目標質量wgoalと、スプーンの傾き角θspoonとを要素とするベクトル[wcurrent,θspoon,wgoal]によって表される。本実施形態では、任意の目標質量の粉体を秤量するために、状態sには粉体の目標質量wgoalが組み込まれる。行動aは、スプーンを傾ける行動ainclineと、スプーンを振る行動ashakeとを要素とするベクトル[aincline,ashake]によって表される。なお、行動ainclineは、スプーンの相対的なピッチ角に相当する。 In weighing powder in this embodiment, the state s is represented by a vector [w current , θ spoon , w goal ] whose elements are the current powder mass w current , the target powder mass w goal , and the spoon tilt angle θ spoon . In this embodiment, in order to weigh powder of an arbitrary target mass, the state s incorporates the target powder mass w goal . The action a is represented by a vector [a incline , a shake ] whose elements are the action a incline of tilting the spoon and the action a shake of the spoon . The action a incline corresponds to the relative pitch angle of the spoon.
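The state and action vectors defined here can be written out directly; the helper function names below are illustrative only, while the vector contents follow the definitions in the text:

```python
def make_state(w_current, theta_spoon, w_goal):
    # s = [w_current, theta_spoon, w_goal]; the target mass w_goal is part of
    # the state so that a single policy can weigh arbitrary target masses.
    return [w_current, theta_spoon, w_goal]

def make_action(a_incline, a_shake):
    # a = [a_incline, a_shake]; a_incline is the spoon's relative pitch angle,
    # a_shake the strength of the back-and-forth shaking motion.
    return [a_incline, a_shake]
```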

 図3は、スプーンを傾ける行動ainclineとスプーンを振る行動ashakeとを説明するための図である。図3に示されているように、スプーンを振る行動ashakeはスプーンを前後に動かすような動作であり、スプーンを振る行動ashakeはスプーンの位置を前後に動かすような動作であり、スプーンを傾ける行動ainclineはスプーンの姿勢を変化させる動作である。スプーンを振る行動ashakeは、スプーンの移動距離と移動加速度とによって表現可能である。このため、スプーンを振る行動ashakeは、スプーンの移動距離と移動加速度との比例係数によって表される。 FIG. 3 is a diagram for explaining the spoon tilting action a incline and the spoon shaking action a shake . As shown in FIG. 3, the spoon shaking action a shake is an action of moving the position of the spoon back and forth, and the spoon tilting action a incline is an action of changing the spoon's posture. The spoon shaking action a shake can be expressed by the spoon's moving distance and movement acceleration. Therefore, the spoon shaking action a shake is expressed by a proportional coefficient between the spoon's moving distance and movement acceleration.

 このため、本実施形態の状態sと行動aとは、以下のように定義される。 Therefore, state s and action a in this embodiment are defined as follows:

  s = [w_current, θ_spoon, w_goal]
  a = [a_incline, a_shake]

 また、報酬rは、以下の式(3)に示されるように、目標質量wgoalと現在の粉体の質量wcurrentとの間の差分として定義される。 Additionally, the reward r is defined as the difference between the goal mass w goal and the current powder mass w current , as shown in equation (3) below.


  r = -|w_goal - w_current|                    (3)
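The reward of equation (3) penalizes deviation of the current mass from the target mass. A minimal sketch, assuming the common negative-absolute-difference form of this reward:

```python
def reward(w_goal, w_current):
    # Assumed concrete form of Eq. (3): the negative absolute difference, so the
    # reward is largest (zero) exactly when the scooped mass equals the target.
    return -abs(w_goal - w_current)
```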

(コンピュータシミュレーションによる学習済みモデルの生成)
 本実施形態では、機械学習モデルを用いて方策π(a|s)を実現する。具体的には、本実施形態では、機械学習モデルの一例であるLSTM(long short-term memory)構造を有するニューラルネットワークモデルを用いて方策π(a|s)を実現する。LSTM構造を有するニューラルネットワークモデルは、粉体の時系列状態を考慮することが可能である。
(Generation of trained models through computer simulation)
In this embodiment, the policy π*(a|s) is realized using a machine learning model. Specifically, in this embodiment, the policy π*(a|s) is realized using a neural network model having a long short-term memory (LSTM) structure, which is an example of a machine learning model. A neural network model having the LSTM structure can take into account the time-series state of the powder.
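As a self-contained illustration of the recurrence that lets an LSTM carry the powder's time-series state across weighing steps, the following sketches a single LSTM cell update for scalar inputs. A real policy would use vector-valued gates in a deep-learning framework; all weights here are hypothetical:

```python
import math

def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM step for scalar input and state (illustration only).

    w maps a gate name -> (w_x, w_h, b). Gates: input i, forget f, output o,
    and candidate g. The cell state c carries long-term (time-series) memory."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    gate = lambda k, f: f(w[k][0] * x + w[k][1] * h_prev + w[k][2])
    i, f_, o = gate("i", sig), gate("f", sig), gate("o", sig)
    g = gate("g", math.tanh)
    c = f_ * c_prev + i * g        # forget part of the old memory, write the new candidate
    h = o * math.tanh(c)           # exposed hidden state fed to the next step
    return h, c
```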

 上述したように、本実施形態では、コンピュータシミュレーションを実行することにより得られる学習用データを用いて、LSTM構造を有するニューラルネットワークモデルを学習させることにより、粉体を秤量する際に用いられる学習済みモデルを生成する。 As described above, in this embodiment, a neural network model having an LSTM structure is trained using training data obtained by executing a computer simulation, thereby generating a trained model to be used when weighing powder.

 なお、コンピュータシミュレーションを実行する際には、シミュレーション上の仮想空間内の物理パラメータをランダムに変化させる。具体的には、粉体を構成する粒子間の摩擦係数、粒子の数、粒子の半径、粒子の質量、スプーンと粒子との間の摩擦係数、スプーンの振りの強さを表すパラメータ、重力、及び粉体の目標質量をランダムに変化させることによりシミュレーションを実行する。仮想空間内の物理パラメータをランダムに変化させることにより、シミュレーションにおいて様々な環境が仮想的に実現され、様々な状況に対応可能な学習済みモデルが生成される。 When running the computer simulation, physical parameters in the virtual space of the simulation are randomly changed. Specifically, the simulation is run by randomly changing the coefficient of friction between the particles that make up the powder, the number of particles, the radius of the particles, the mass of the particles, the coefficient of friction between the spoon and the particles, the parameter representing the strength of the spoon swing, gravity, and the target mass of the powder. By randomly changing the physical parameters in the virtual space, various environments are virtually realized in the simulation, and a trained model that can respond to a variety of situations is generated.
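The domain randomization described above can be sketched as sampling one environment per simulation episode. The parameter set follows the list in the text, but the sampling ranges below are hypothetical placeholders; the embodiment says only that these parameters are varied randomly, not over which intervals:

```python
import random

# Ranges are illustrative assumptions, not values from the embodiment.
PARAM_RANGES = {
    "powder_friction": (0.1, 1.0),   # friction coefficient between powder particles
    "particle_number": (500, 2000),  # number of simulated particles
    "particle_radius": (0.5, 2.0),   # particle radius
    "particle_mass":   (0.5, 5.0),   # particle mass
    "spoon_friction":  (0.1, 1.0),   # friction coefficient between spoon and particles
    "shake_strength":  (0.5, 2.0),   # weight on the spoon-shaking motion
    "gravity":         (0.5, 9.8),   # may be set far below real gravitational acceleration
    "goal_mass":       (5.0, 15.0),  # target powder mass
}

def sample_simulation_params(rng=random):
    """Draw one randomized virtual environment for a simulation episode."""
    params = {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}
    params["particle_number"] = int(params["particle_number"])  # count must be an integer
    return params
```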

 図4は、仮想空間内の物理パラメータを変化させることを説明するための図である。図4に示されるように、シミュレーションを実行することにより学習済みモデルを生成する学習フェーズTrにおいては、仮想空間内の様々な物理パラメータをランダムに変化させつつ(図4におけるRd)、仮想空間内のスプーンを動かしてスプーン上の粉体の量を調整する動作を学習済みモデルへ反映させる。そして、図4に示されるように、運用フェーズOpにおいては、学習済みモデルを用いて、現実空間のスプーンを動作させることによりスプーン上の粉体の量を調整する動作が実行される。 FIG. 4 is a diagram for explaining changing physical parameters in virtual space. As shown in FIG. 4, in the learning phase Tr in which a trained model is generated by executing a simulation, various physical parameters in the virtual space are randomly changed (Rd in FIG. 4) while the action of moving the spoon in the virtual space to adjust the amount of powder on the spoon is reflected in the trained model. Then, as shown in FIG. 4, in the operation phase Op, the trained model is used to perform the action of adjusting the amount of powder on the spoon by moving the spoon in real space.

 なお、粉体を構成する粒子間の摩擦係数は、粉体の流れにくさを表すパラメータでもある。また、スプーンと粒子との間の摩擦係数は、スプーンに残る粉体の量に影響を及ぼすパラメータでもある。また、スプーンの振りの強さを表すパラメータをランダムに変化させるのは、シミュレータと現実とでロボットによるスプーンの振りの強さが異なるため、それを学習済みモデルへ反映させるためである。また、重力をランダムに変化させるのは、粉体が舞う現象を再現するためである。なお、粉体が舞う現象をより多く再現するために、仮想空間内の重力は、現実の重力加速度よりも極めて小さい値に設定するようにしてもよい。また、目標質量をランダムに変化させるのは、任意の目標質量で粉体を秤量するためである。 The coefficient of friction between the particles that make up the powder is also a parameter that indicates how difficult it is for the powder to flow. The coefficient of friction between the spoon and the particles is also a parameter that affects the amount of powder remaining on the spoon. The parameter that indicates the strength of the spoon swing is changed randomly in order to reflect the difference in strength of the spoon swing by the robot between the simulator and reality in the trained model. Gravity is changed randomly in order to reproduce the phenomenon of powder flying. In order to reproduce the phenomenon of powder flying as much as possible, gravity in the virtual space may be set to a value that is much smaller than the gravitational acceleration in reality. The target mass is changed randomly in order to weigh the powder at an arbitrary target mass.

(学習済みモデル生成装置10)
 図5は、本実施形態に係る学習済みモデル生成装置10のハードウェア構成を示すブロック図である。図5に示されるように、学習済みモデル生成装置10は、CPU(Central Processing Unit)42、メモリ44、記憶装置46、入出力I/F(Interface)48、記憶媒体読取装置50、及び通信I/F52を有する。各構成は、バス54を介して相互に通信可能に接続されている。
(Trained model generation device 10)
Fig. 5 is a block diagram showing a hardware configuration of the trained model generation device 10 according to the present embodiment. As shown in Fig. 5, the trained model generation device 10 has a CPU (Central Processing Unit) 42, a memory 44, a storage device 46, an input/output I/F (Interface) 48, a storage medium reading device 50, and a communication I/F 52. Each component is connected to each other via a bus 54 so as to be able to communicate with each other.

 記憶装置46には、後述する各処理を実行するための学習済みモデル生成プログラムが格納されている。CPU42は、中央演算処理ユニットであり、各種プログラムを実行したり、各構成を制御したりする。すなわち、CPU42は、記憶装置46からプログラムを読み出し、メモリ44を作業領域としてプログラムを実行する。CPU42は、記憶装置46に記憶されているプログラムに従って、上記各構成の制御及び各種の演算処理を行う。 The storage device 46 stores a trained model generation program for executing each process described below. The CPU 42 is a central processing unit, and executes various programs and controls each component. That is, the CPU 42 reads the program from the storage device 46 and executes the program using the memory 44 as a working area. The CPU 42 controls each of the components and performs various calculation processes according to the program stored in the storage device 46.

 メモリ44は、RAM(Random Access Memory)により構成され、作業領域として一時的にプログラム及びデータを記憶する。記憶装置46は、ROM(Read Only Memory)、及びHDD(Hard Disk Drive)、SSD(Solid State Drive)等により構成され、オペレーティングシステムを含む各種プログラム、及び各種データを格納する。 Memory 44 is made up of RAM (Random Access Memory) and serves as a working area to temporarily store programs and data. Storage device 46 is made up of ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), etc., and stores various programs including the operating system, and various data.

 入出力I/F48は、外部からのデータの入力、及び外部へのデータの出力を行うインタフェースである。また、例えば、キーボードやマウス等の、各種の入力を行うための入力装置、及び、例えば、ディスプレイやプリンタ等の、各種の情報を出力するための出力装置が接続されてもよい。出力装置として、タッチパネルディスプレイを採用することにより、入力装置として機能させてもよい。 The input/output I/F 48 is an interface for inputting data from the outside and outputting data to the outside. In addition, input devices for various inputs, such as a keyboard or mouse, and output devices for outputting various information, such as a display or printer, may be connected. A touch panel display may be used as the output device to function as an input device.

 記憶媒体読取装置50は、CD(Compact Disc)-ROM、DVD(Digital Versatile Disc)-ROM、ブルーレイディスク、USB(Universal Serial Bus)メモリ等の各種記憶媒体に記憶されたデータの読み込みや、記憶媒体に対するデータの書き込み等を行う。 The storage medium reader 50 reads data stored in various storage media such as CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, Blu-ray Disc, and USB (Universal Serial Bus) memory, and writes data to the storage media.

 通信I/F52は、他の機器と通信するためのインタフェースであり、例えば、イーサネット(登録商標)、FDDI、Wi-Fi(登録商標)等の規格が用いられる。 The communication I/F 52 is an interface for communicating with other devices, and uses standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark).

 次に、学習済みモデル生成装置10の機能構成について説明する。図6に示されるように、学習済みモデル生成装置10は、機能的には、シミュレーション部12と、学習用取得部14と、学習部16とを含む。また、学習済みモデル生成装置10の所定の記憶領域には、学習済みモデル記憶部18が設けられている。各機能構成は、CPU42が記憶装置46に記憶された各プログラムを読み出し、メモリ44に展開して実行することにより実現される。 Next, the functional configuration of the trained model generation device 10 will be described. As shown in FIG. 6, the trained model generation device 10 functionally includes a simulation unit 12, a learning acquisition unit 14, and a learning unit 16. A trained model storage unit 18 is also provided in a specified storage area of the trained model generation device 10. Each functional configuration is realized by the CPU 42 reading out each program stored in the storage device 46, expanding it in the memory 44, and executing it.

 学習済みモデル記憶部18には、後述する処理によって生成された学習済みモデルが格納される。上述したように、本実施形態の学習済みモデルは、LSTM構造を有するニューラルネットワークモデルである。図7は、本実施形態の学習済みモデルを説明するための図である。図7に示されるように、本実施形態の学習済みモデルは、方策π*(a|s)を実現するモデルであり、状態データsが入力されると行動データaが出力されるようなモデルである。 The trained model storage unit 18 stores trained models generated by the process described below. As described above, the trained model of this embodiment is a neural network model having an LSTM structure. FIG. 7 is a diagram for explaining the trained model of this embodiment. As shown in FIG. 7, the trained model of this embodiment is a model that realizes a policy π*(a|s), and is a model that outputs action data a when state data s is input.
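A recurrent policy of this kind can be sketched minimally as one LSTM cell followed by a linear output head. This is an illustrative NumPy sketch, not the patent's network: the layer sizes, random weights, and the deterministic tanh output head are all assumptions; the actual model is trained by the reinforcement learning described below.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMPolicy:
    """Minimal recurrent policy: state s in, action a out.

    Following the text, s = [w_current, theta_spoon, w_goal] (dim 3) and
    a = [a_incline, a_shake] (dim 2); the hidden size of 16 is arbitrary.
    """

    def __init__(self, state_dim=3, hidden_dim=16, action_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the four LSTM gates (input, forget, cell, output).
        self.W = rng.normal(0, 0.1, (4 * hidden_dim, state_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.W_out = rng.normal(0, 0.1, (action_dim, hidden_dim))
        self.h = np.zeros(hidden_dim)   # hidden state, carried across time steps
        self.c = np.zeros(hidden_dim)   # cell state

    def step(self, s):
        """One control step of the policy."""
        z = self.W @ np.concatenate([s, self.h]) + self.b
        i, f, g, o = np.split(z, 4)
        self.c = sigmoid(f) * self.c + sigmoid(i) * np.tanh(g)
        self.h = sigmoid(o) * np.tanh(self.c)
        return np.tanh(self.W_out @ self.h)  # action components in (-1, 1)
```

Because the hidden and cell states persist between calls, the policy can condition its next action on the history of states it has seen, which is the point of using an LSTM here.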

 シミュレーション部12は、仮想空間内の仮想ロボットが仮想の粉体である仮想対象物を秤量する際の動作に関するコンピュータシミュレーションを実行する。仮想ロボットには仮想のスプーンが設置されており、仮想ロボットは仮想のスプーンを用いて仮想の粉体を秤量する。なお、シミュレーション部12は、コンピュータシミュレーションを実行する際に、仮想空間内の物理パラメータをランダムに変化させる。 The simulation unit 12 executes a computer simulation of the operation of a virtual robot in a virtual space when weighing a virtual object, which is a virtual powder. A virtual spoon is installed on the virtual robot, and the virtual robot weighs the virtual powder using the virtual spoon. Note that the simulation unit 12 randomly changes physical parameters in the virtual space when executing the computer simulation.

 学習用取得部14は、学習用データを取得する。学習用データは、仮想ロボットが仮想の粉体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られるデータである。なお、学習用データは、例えば、仮想ロボットの特定箇所(例えば、スプーンが取り付けられている箇所)の位置及び姿勢、仮想のスプーンの位置及び姿勢、仮想のスプーン上に存在する仮想対象物の量、及びある状態sにおいてある行動aを選択した際の報酬r等のデータによって構成される。 The learning acquisition unit 14 acquires learning data. The learning data is data obtained by computer simulating the operation of the virtual robot when weighing a virtual object, which is a virtual powder. The learning data is composed of data such as the position and posture of a specific part of the virtual robot (e.g., the part where the spoon is attached), the position and posture of the virtual spoon, the amount of virtual object present on the virtual spoon, and the reward r when a certain action a is selected in a certain state s.
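One sample of such learning data might be organized as a simple record. This structure and its field names are hypothetical; they merely mirror the items enumerated above (robot pose, spoon pose, amount on the spoon, and the reward r for action a in state s).

```python
from dataclasses import dataclass

@dataclass
class Transition:
    """One training sample collected from the simulation (illustrative)."""
    robot_pose: tuple        # position and posture of the spoon attachment point
    spoon_pose: tuple        # position and posture of the virtual spoon
    powder_on_spoon: float   # amount of virtual powder on the spoon [kg]
    state: tuple             # state s observed by the policy
    action: tuple            # action a selected in state s
    reward: float            # reward r received for taking a in s
```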

 学習部16は、学習用取得部14により取得された学習用データに基づいて学習済みモデルを生成する。学習済みモデルは、ロボットが粉体を秤量する際の状態データsが入力されると当該ロボットがすくい上げた対象物の量を調整する動作を含む行動データaを出力する。 The learning unit 16 generates a trained model based on the training data acquired by the learning acquisition unit 14. When state data s is input when the robot weighs powder, the trained model outputs behavior data a including an action to adjust the amount of the object scooped up by the robot.

 具体的には、学習部16は、上記式(1)に示されている報酬関数が最大となるように、LSTM構造を有するニューラルネットワークモデルを深層強化学習させる。例えば、当該深層強化学習は、現在の粉体の質量が目標量に近づくように学習させることであってもよい。そして、学習部16は、得られた学習済みモデルを、学習済みモデル記憶部18へ格納する。なお、報酬関数は上記式(1)に限定されるものではなく、他の形式の報酬関数であってもよい。例えば、スプーンから粉体を落としすぎてしまい、スプーン上の粉体の量が目標量から遠ざかってしまうような場合を抑制するために、そのような行動に対して大きなペナルティを与えるような項を上記式(1)へ加えてもよい。この場合には、学習済みモデルは、すくい上げる粉体の量を目標量へ近づけると共に、粉体の落としすぎを抑制するといった行動をとる確率が高くなり、対象物の秤量にとってはより好ましい。 Specifically, the learning unit 16 performs deep reinforcement learning on a neural network model having an LSTM structure so that the reward function shown in the above formula (1) is maximized. For example, the deep reinforcement learning may be learning so that the current mass of powder approaches a target amount. The learning unit 16 then stores the obtained trained model in the trained model storage unit 18. Note that the reward function is not limited to the above formula (1) and may be another type of reward function. For example, in order to prevent a case where too much powder is dropped from the spoon and the amount of powder on the spoon becomes far from the target amount, a term that imposes a large penalty on such behavior may be added to the above formula (1). In this case, the trained model is more likely to take an action to bring the amount of powder scooped up closer to the target amount and prevent too much powder from being dropped, which is more favorable for weighing the object.
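The shape of such a reward can be illustrated as follows. Note that equation (1) itself is not reproduced in this passage, so this sketch is an assumption: a base term that grows as the mass on the spoon approaches the target, plus the optional penalty term discussed above for dropping so much powder that the amount falls below the target.

```python
def reward(w_current, w_goal, overshoot_penalty=10.0):
    """Illustrative per-step reward (not the patent's actual equation (1)).

    The base term rewards being close to the target mass; the extra term
    heavily penalizes undershooting the target by dropping too much powder.
    """
    r = -abs(w_current - w_goal)          # closer to the target -> larger reward
    if w_current < w_goal:                # dropped too much powder off the spoon
        r -= overshoot_penalty * (w_goal - w_current)
    return r
```

With the penalty term, an action that leaves the spoon slightly above the target scores better than one that drops below it by the same margin, biasing the learned policy toward cautious adjustment.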

 これにより、ロボットが粉体を秤量する際の状態データsが入力されると当該ロボットがすくい上げた対象物の量を調整する動作を含む行動データaを出力する学習済みモデルが得られたことになり、この学習済みモデルを用いて現実のロボットを制御することが可能となる。 As a result, a trained model is obtained that, when state data s representing the robot weighing powder is input, outputs behavior data a including an action to adjust the amount of the object scooped up by the robot. This trained model can then be used to control a real robot.

(制御システム20)
 図8は、本実施形態の制御システム20の概略構成を表すブロック図である。図8に示されるように、制御システム20は、センサ群22と、ロボット24と、ロボット24に設置されたスプーン26と、制御装置30とを備えている。本実施形態に係る制御装置30は、学習済みモデル生成装置10によって生成された学習済みモデルを用いてロボット24の動作を制御する。なお、スプーン26は、量を調整する対象物を取得できるものであれば何でもよく、薬さじやロボットが持てるカップ状のようなものの他、ロボットの指に当該対象物が収まるくぼみなどがある構造であってもよい。
(Control System 20)
Fig. 8 is a block diagram showing a schematic configuration of the control system 20 of this embodiment. As shown in Fig. 8, the control system 20 includes a sensor group 22, a robot 24, a spoon 26 installed on the robot 24, and a control device 30. The control device 30 according to this embodiment controls the operation of the robot 24 using the trained model generated by the trained model generation device 10. The spoon 26 may be anything that can obtain an object whose amount is to be adjusted, and may be a medicine spoon, a cup-like object that the robot can hold, or a structure having a recess in which the object can be placed in the fingers of the robot.

 センサ群22は、ロボット24の状態、スプーン26の状態、スプーン26上の対象物の状態、及び容器内の対象物の状態を逐次検知する。センサ群22は、例えば、スプーン上又は容器内に存在する対象物の質量を計測する電子天秤、ロボット24の特定箇所の姿勢及び位置を検知するセンサ、及びスプーン26の姿勢及び位置を検知するセンサ等を含んで構成される。なお、計測対象となるのは対象物の質量に限定されるものではない。例えば、対象物を撮影した画像から、公知の画像処理技術を活用して推定した量(例えば、対象物の体積等)を対象物の状態とするようにしてもよい。また、スプーンから落とした対象物の量は、対象物を落とした際に発する音を公知の音声処理技術を活用して推定した対象物の量であってもよい。また、検知対象の対象物の状態は、スプーン26上の対象物の状態又は容器内の対象物の状態に限らず、スプーン26から落とした対象物の量であってもよい。 The sensor group 22 sequentially detects the state of the robot 24, the state of the spoon 26, the state of the object on the spoon 26, and the state of the object in the container. The sensor group 22 is composed of, for example, an electronic balance that measures the mass of the object on the spoon or in the container, a sensor that detects the posture and position of a specific part of the robot 24, and a sensor that detects the posture and position of the spoon 26. The measurement target is not limited to the mass of the object. For example, the state of the object may be an amount (e.g., the volume of the object) estimated from an image of the object using a known image processing technique. The amount of the object dropped from the spoon may be the amount of the object estimated using a known sound processing technique from the sound made when the object is dropped. The state of the object to be detected is not limited to the state of the object on the spoon 26 or the state of the object in the container, but may also be the amount of the object dropped from the spoon 26.

 ロボット24は、後述する制御装置30から出力される制御指令に応じて動作する。スプーン26は、例えば、図1に示されるようにロボット24に取り付けられ、ロボット24の動作に応じて容器内の対象物をすくい上げる。 The robot 24 operates in response to control commands output from the control device 30, which will be described later. The spoon 26 is attached to the robot 24, for example as shown in FIG. 1, and scoops up objects in the container in response to the operation of the robot 24.

 図9は、本実施形態に係る制御装置30のハードウェア構成を示すブロック図である。図9に示されるように、制御装置30は、CPU(Central Processing Unit)62、メモリ64、記憶装置66、入出力I/F(Interface)68、記憶媒体読取装置70、及び通信I/F72を有する。各構成は、バス74を介して相互に通信可能に接続されている。 FIG. 9 is a block diagram showing the hardware configuration of the control device 30 according to this embodiment. As shown in FIG. 9, the control device 30 has a CPU (Central Processing Unit) 62, a memory 64, a storage device 66, an input/output I/F (Interface) 68, a storage medium reading device 70, and a communication I/F 72. Each component is connected to each other so as to be able to communicate with each other via a bus 74.

 記憶装置66には、後述する各処理を実行するための制御プログラムが格納されている。CPU62は、中央演算処理ユニットであり、各種プログラムを実行したり、各構成を制御したりする。すなわち、CPU62は、記憶装置66からプログラムを読み出し、メモリ64を作業領域としてプログラムを実行する。CPU62は、記憶装置66に記憶されているプログラムに従って、上記各構成の制御及び各種の演算処理を行う。 The storage device 66 stores control programs for executing the various processes described below. The CPU 62 is a central processing unit, and executes various programs and controls each component. That is, the CPU 62 reads the programs from the storage device 66 and executes the programs using the memory 64 as a working area. The CPU 62 controls each of the components and performs various calculation processes according to the programs stored in the storage device 66.

 メモリ64は、RAM(Random Access Memory)により構成され、作業領域として一時的にプログラム及びデータを記憶する。記憶装置66は、ROM(Read Only Memory)、及びHDD(Hard Disk Drive)、SSD(Solid State Drive)等により構成され、オペレーティングシステムを含む各種プログラム、及び各種データを格納する。 Memory 64 is made up of RAM (Random Access Memory) and serves as a working area to temporarily store programs and data. Storage device 66 is made up of ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), etc., and stores various programs including the operating system, and various data.

 入出力I/F68は、外部からのデータの入力、及び外部へのデータの出力を行うインタフェースである。また、例えば、キーボードやマウス等の、各種の入力を行うための入力装置、及び、例えば、ディスプレイやプリンタ等の、各種の情報を出力するための出力装置が接続されてもよい。出力装置として、タッチパネルディスプレイを採用することにより、入力装置として機能させてもよい。 The input/output I/F 68 is an interface for inputting data from the outside and outputting data to the outside. In addition, input devices for performing various inputs, such as a keyboard or mouse, and output devices for outputting various information, such as a display or printer, may be connected. A touch panel display may be used as the output device to function as an input device.

 記憶媒体読取装置70は、CD(Compact Disc)-ROM、DVD(Digital Versatile Disc)-ROM、ブルーレイディスク、USB(Universal Serial Bus)メモリ等の各種記憶媒体に記憶されたデータの読み込みや、記憶媒体に対するデータの書き込み等を行う。 The storage medium reader 70 reads data stored in various storage media such as CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, Blu-ray Disc, and USB (Universal Serial Bus) memory, and writes data to the storage media.

 通信I/F72は、他の機器と通信するためのインタフェースであり、例えば、イーサネット(登録商標)、FDDI、Wi-Fi(登録商標)等の規格が用いられる。 The communication I/F 72 is an interface for communicating with other devices, and uses standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark).

 次に、制御装置30の機能構成について説明する。図8に示されるように、制御装置30は、機能的には、取得部34と、生成部36と、制御部38とを含む。また、制御装置30の所定の記憶領域には、学習済みモデル記憶部32が設けられている。各機能構成は、CPU62が記憶装置66に記憶された各プログラムを読み出し、メモリ64に展開して実行することにより実現される。 Next, the functional configuration of the control device 30 will be described. As shown in FIG. 8, the control device 30 functionally includes an acquisition unit 34, a generation unit 36, and a control unit 38. A trained model storage unit 32 is provided in a predetermined storage area of the control device 30. Each functional configuration is realized by the CPU 62 reading out each program stored in the storage device 66, expanding it in the memory 64, and executing it.

 学習済みモデル記憶部32には、学習済みモデル生成装置10によって生成された学習済みモデルが格納される。 The trained model storage unit 32 stores the trained model generated by the trained model generation device 10.

 取得部34は、センサ群22によって検知されたロボット24の状態、スプーン26の状態、スプーン26上の対象物の状態、及び容器内の対象物の状態を取得する。次に、取得部34は、ロボット24が対象物を秤量する際の状態を表す状態データsを生成する。具体的には、取得部34は、センサ群22によって得られた各種データに基づいて、現在のスプーン26上に存在する対象物の質量wcurrentと、現在のスプーン26の傾きθspoonとを計算する。そして、取得部34は、現在の状態データs=[wcurrent,θspoon,wgoal]を設定する。なお、対象物の目標質量wgoalは、ユーザによって予め設定される。 The acquisition unit 34 acquires the state of the robot 24, the state of the spoon 26, the state of the object on the spoon 26, and the state of the object in the container, all of which are detected by the sensor group 22. Next, the acquisition unit 34 generates state data s representing the state of the robot 24 when weighing the object. Specifically, the acquisition unit 34 calculates the mass wcurrent of the object currently present on the spoon 26 and the current inclination θspoon of the spoon 26, based on the various data acquired by the sensor group 22. Then, the acquisition unit 34 sets the current state data s = [wcurrent, θspoon, wgoal]. Note that the target mass wgoal of the object is set in advance by the user.

 生成部36は、学習済みモデル記憶部32に格納されている学習済みモデルに対して、取得部34により取得された状態データsを入力することにより、当該状態データsに応じたロボット24の行動データa=[aincline,ashake]を生成する。この行動データaは、ロボット24がすくい上げた対象物の量を調整する動作を含む行動データaであり、ロボット24が次時刻にとるべき動作に相当する。なお、ロボット24がすくい上げた対象物の量を調整する動作は、スプーン26を傾ける動作及びスプーン26を振る動作の少なくとも一方を含む動作であってよい。このため、スプーン26を傾ける動作aincline及びスプーン26を振る動作ashakeの何れか一方のみが行われ、他方は行われなくてもよい。また、スプーン26を傾ける動作aincline及びスプーン26を振る動作ashakeの双方が行われてもよい。 The generating unit 36 inputs the state data s acquired by the acquiring unit 34 into the trained model stored in the trained model storage unit 32, thereby generating action data a = [aincline, ashake] of the robot 24 according to the state data s. This action data a is action data including an action to adjust the amount of the object scooped up by the robot 24, and corresponds to the action to be taken by the robot 24 at the next time step. Note that the action to adjust the amount of the object scooped up by the robot 24 may include at least one of an action of tilting the spoon 26 and an action of shaking the spoon 26. For this reason, only one of the tilting action aincline and the shaking action ashake may be performed while the other is omitted. Alternatively, both the tilting action aincline and the shaking action ashake may be performed.

 制御部38は、生成部36により生成された行動データa=[aincline,ashake]が表す動作が実現されるように、ロボット24を制御する。具体的には、制御部38は、生成部36により生成されたスプーンを傾ける行動ainclineが実現されるように、ロボット24に対して制御指令を出力する。また、制御部38は、生成部36により生成されたスプーンを振る行動ashakeが実現されるように、ロボット24に対して制御指令を出力する。これにより、スプーン26上に存在する対象物の量を調整するような動作が実行される。具体的には、ロボット24がスプーン26を用いてすくい上げた対象物を落とす動作である。対象物を落とす動作は、ロボット24の特定箇所の姿勢及びロボット24の特定箇所の位置を変化させる動作である。なお、上記生成された行動データによりロボット24が動作する前(準備段階)や後(秤量後の粉体の処理)に、ロボット24の動作をどのような手法で制御するかは、周知のティーチング技術に基づいてもよい。 The control unit 38 controls the robot 24 so that the action represented by the action data a=[a incline , a shake ] generated by the generation unit 36 is realized. Specifically, the control unit 38 outputs a control command to the robot 24 so that the action a incline of tilting the spoon generated by the generation unit 36 is realized. The control unit 38 also outputs a control command to the robot 24 so that the action a shake of shaking the spoon generated by the generation unit 36 is realized. As a result, an action to adjust the amount of the object present on the spoon 26 is executed. Specifically, it is an action of the robot 24 dropping the object scooped up by the spoon 26. The action of dropping the object is an action of changing the posture of a specific part of the robot 24 and the position of a specific part of the robot 24. Note that the method by which the action of the robot 24 is controlled before (preparation stage) or after (processing of the powder after weighing) the robot 24 operates based on the generated action data may be based on a well-known teaching technique.
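The translation from action data a = [aincline, ashake] into a control command for the robot could look like the following. This is a hypothetical sketch: the scaling factors, the command dictionary format, and the function name are all assumptions, and a real controller would map these onto the robot's own command interface.

```python
def action_to_command(a_incline, a_shake, theta_spoon,
                      max_incline_step=0.05, shake_amplitude=0.01):
    """Map model output a = [a_incline, a_shake] to a simple command (illustrative).

    a_incline adjusts the spoon pitch incrementally from its current angle;
    a_shake sets the amplitude of a small back-and-forth wrist motion.
    """
    return {
        "spoon_pitch": theta_spoon + max_incline_step * a_incline,  # [rad]
        "shake_amplitude": shake_amplitude * max(a_shake, 0.0),     # [m]
    }
```

Because the incline component is applied as a delta on the current pitch, repeated small corrections from the policy accumulate into a gradual pouring motion rather than a single large tilt.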

 次に、本実施形態に係る学習済みモデル生成装置10の作用について説明する。 Next, the operation of the trained model generation device 10 according to this embodiment will be described.

 学習済みモデル生成装置10が所定の指示信号を受け付けると、学習済みモデル生成装置10のCPU42は記憶装置46から学習済みモデル生成プログラムを読み出して、メモリ44に展開して実行する。これにより、CPU42が学習済みモデル生成装置10の各機能構成として機能し、図10に示す学習済みモデル生成処理が実行される。 When the trained model generation device 10 receives a predetermined instruction signal, the CPU 42 of the trained model generation device 10 reads the trained model generation program from the storage device 46, expands it into the memory 44, and executes it. As a result, the CPU 42 functions as each functional component of the trained model generation device 10, and the trained model generation process shown in FIG. 10 is executed.

 ステップS100において、シミュレーション部12は、仮想空間内の仮想ロボットが仮想の粉体である仮想対象物を秤量する際の動作に関するコンピュータシミュレーションを実行する。仮想ロボットには仮想のスプーンが設置されており、仮想ロボットは仮想のスプーンを用いて仮想の粉体を秤量する。なお、シミュレーション部12は、コンピュータシミュレーションを実行する際に、仮想空間内の物理パラメータをランダムに変化させる。 In step S100, the simulation unit 12 executes a computer simulation of an operation of a virtual robot in a virtual space when weighing a virtual object, which is a virtual powder. A virtual spoon is provided on the virtual robot, and the virtual robot weighs the virtual powder using the virtual spoon. When executing the computer simulation, the simulation unit 12 randomly changes physical parameters in the virtual space.

 ステップS102において、学習用取得部14は、シミュレーション部12によるシミュレーションが実行されている最中に、仮想ロボットの特定箇所(例えば、スプーンが取り付けられている箇所)の位置及び姿勢、仮想のスプーンの位置及び姿勢、仮想のスプーン上に存在する仮想対象物の量、及びある状態sにおいてある行動aを選択した際の報酬r等を含んで構成される学習用データを取得する。 In step S102, while the simulation unit 12 is performing the simulation, the learning acquisition unit 14 acquires learning data including the position and orientation of a specific part of the virtual robot (e.g., the part where the spoon is attached), the position and orientation of the virtual spoon, the amount of virtual objects present on the virtual spoon, and the reward r when a certain action a is selected in a certain state s.

 ステップS104において、学習部16は、ステップS102で取得された学習用データに基づいて学習済みモデルを生成する。 In step S104, the learning unit 16 generates a trained model based on the training data acquired in step S102.

 ステップS106において、学習部16は、ステップS104で生成された学習済みモデルを、学習済みモデル記憶部18へ格納する。 In step S106, the learning unit 16 stores the trained model generated in step S104 in the trained model storage unit 18.

 次に、本実施形態に係る制御システム20の作用について説明する。 Next, the operation of the control system 20 according to this embodiment will be described.

 学習済みモデル生成装置10によって生成された学習済みモデルが、制御装置30へ入力されると、その学習済みモデルは制御装置30の学習済みモデル記憶部32へ格納される。そして、制御システム20が所定の指示信号を受け付けると、制御装置30のCPU62は記憶装置66から制御プログラムを読み出して、メモリ64に展開して実行する。これにより、CPU62が制御装置30の各機能構成として機能し、図11に示す制御処理が実行される。 When the trained model generated by the trained model generating device 10 is input to the control device 30, the trained model is stored in the trained model storage unit 32 of the control device 30. Then, when the control system 20 receives a predetermined instruction signal, the CPU 62 of the control device 30 reads out the control program from the storage device 66, expands it into the memory 64, and executes it. As a result, the CPU 62 functions as each functional component of the control device 30, and the control process shown in FIG. 11 is executed.

 ステップS200において、取得部34は、センサ群22によって得られた各種データから、現在の状態データs=[wcurrent,θspoon,wgoal]を取得する。 In step S200, the acquisition unit 34 acquires current state data s = [w current , θ spoon , w goal ] from various data obtained by the sensor group 22.

 ステップS202において、生成部36は、学習済みモデル記憶部32に格納されている学習済みモデルに対して、ステップS200で取得された状態データsを入力することにより、当該状態データsに応じたロボット24の行動データa=[aincline,ashake]を生成する。 In step S202, the generation unit 36 inputs the state data s acquired in step S200 into the learned model stored in the learned model storage unit 32, thereby generating behavioral data a = [a incline , a shake ] of the robot 24 corresponding to the state data s.

 ステップS204において、ステップS202で生成された行動データa=[aincline,ashake]が表す動作が実現されるように、ロボット24を制御する。 In step S204, the robot 24 is controlled so as to realize the action represented by the action data a=[a incline , a shake ] generated in step S202.

 図11に示される制御処理が繰り返され、ロボット24に対して制御信号が繰り返し出力されることにより、対象物の秤量が適切に実行される。 The control process shown in FIG. 11 is repeated, and a control signal is repeatedly output to the robot 24, thereby allowing the object to be weighed appropriately.
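The repeated control cycle of steps S200 to S204 can be sketched as a simple loop. The three callables stand in for the sensor group, the trained model, and the robot interface described above; their names, the step limit, and the termination tolerance are assumptions for illustration.

```python
def weighing_loop(get_state, policy_step, send_command, max_steps=200, tol=0.0005):
    """Repeat S200-S204: read state, query the trained model, drive the robot,
    until the powder mass is within tol of the target (illustrative sketch)."""
    for _ in range(max_steps):
        s = get_state()                  # s = [w_current, theta_spoon, w_goal]
        if abs(s[0] - s[2]) <= tol:      # close enough to the target mass
            return True
        a = policy_step(s)               # a = [a_incline, a_shake]
        send_command(a)                  # realize the action on the robot
    return False
```

A toy usage with stand-in functions: a fake environment where each command sheds a little powder converges to the target after a handful of iterations, mirroring how repeated control signals let the real system weigh the object appropriately.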

 以上説明したように、本実施形態に係る制御装置は、粉体である対象物を秤量するロボットを制御する制御装置である。制御装置は、ロボットが対象物を秤量する際の状態を表す状態データを取得し、当該状態データを、予め生成された学習済みモデルへ入力する。なお、学習済みモデルは、状態データが入力されるとロボットがすくい上げた対象物の量を調整する動作を含む行動データが出力されるモデルである。制御装置は、このような学習済みモデルを用いて、状態データに応じた行動データを生成する。そして、制御装置は、生成された行動データが表す動作が実現されるように、ロボットを制御する。これにより、ロボットが粉体である対象物を秤量する際に、目標量よりも多い対象物をすくい上げてしまった状況下において対象物の量を調整することができる。 As described above, the control device according to this embodiment is a control device that controls a robot that weighs a powder object. The control device acquires state data that represents the state of the robot when it weighs the object, and inputs the state data to a trained model that has been generated in advance. The trained model is a model that outputs behavioral data that includes an action to adjust the amount of the object scooped up by the robot when state data is input. The control device uses this trained model to generate behavioral data according to the state data. The control device then controls the robot so that the action represented by the generated behavioral data is realized. This makes it possible to adjust the amount of the object when the robot scoops up more than the target amount when weighing a powder object.

 具体的には、ロボットにはスプーンが設置されており、状態データはスプーンの姿勢と対象物の目標量とを含むデータであることにより、目標量よりも多い対象物をすくい上げてしまった状況下においても、スプーンを傾ける又は動かす動作をすることにより、対象物の量を調整することができる。 Specifically, the robot is equipped with a spoon, and the state data includes the spoon's orientation and the target amount of the object, so that even in a situation where more object than the target amount has been scooped up, the amount of object can be adjusted by tilting or moving the spoon.

 また、対象物の量を調整する動作は、ロボットがすくい上げた対象物を落とす動作である。対象物を落とす動作は、ロボットの特定箇所の姿勢及びロボットの特定箇所の位置を変化させる動作である。これにより、対象物の量を微妙に調整することが可能となり、精度高く秤量を実行することが可能となる。 The action of adjusting the amount of the object is the action of dropping the object that the robot has scooped up. The action of dropping the object is the action of changing the attitude of a specific part of the robot and the position of a specific part of the robot. This makes it possible to finely adjust the amount of the object, enabling weighing to be performed with high precision.

 また、本実施形態に係る学習済みモデル生成装置は、仮想ロボットが仮想の粉体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得する。そして、学習済みモデル生成装置は、学習用データに基づいて、粉体である対象物を秤量する際の状態を表す状態データが入力されるとロボットがすくい上げた対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する。これにより、ロボットが粉体である対象物を秤量する際に、目標量よりも多い対象物をすくい上げてしまった状況下において対象物の量を調整するための学習済みモデルを得ることができる。 The trained model generating device according to this embodiment also acquires training data obtained by computer simulating the actions of a virtual robot when weighing a virtual object, which is a virtual powder. The trained model generating device then generates a trained model based on the training data, which outputs behavioral data including actions to adjust the amount of the object scooped up by the robot when state data representing the state when weighing the powder object is input. This makes it possible to obtain a trained model for adjusting the amount of the object when the robot scoops up more than the target amount when weighing the powder object.

 また、本実施形態に係る学習済みモデル生成装置は、仮想ロボットが仮想の粉体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られたデータに基づいて、学習済みモデルを生成する。これにより、現実のロボットを動作させることなく学習済みモデルを生成することが可能となる。現実のロボットを用いて学習用データを収集する際には、粉体をまき散らしてしまう等の事態の発生が予想される。これに対して、本実施形態のようにコンピュータシミュレーションを利用して学習用データを収集することにより、粉体をまき散らしてしまう等の事態の発生を抑制することが可能となる。 The trained model generation device according to this embodiment generates a trained model based on data obtained by computer simulating the operation of a virtual robot when weighing a virtual object, which is a virtual powder. This makes it possible to generate a trained model without operating a real robot. When collecting training data using a real robot, it is expected that there will be incidents such as scattering powder. In contrast, by collecting training data using computer simulation as in this embodiment, it is possible to prevent incidents such as scattering powder.

 また、本実施形態に係る学習済みモデル生成装置は、仮想空間内の物理パラメータをランダムに変化させることによりシミュレーションを実行する。これにより、シミュレーションにおいて様々な環境が仮想的に実現され、様々な状況に対応可能な学習済みモデルが生成される。 The trained model generation device according to this embodiment also executes a simulation by randomly varying physical parameters in a virtual space. This allows various environments to be virtually realized in the simulation, and generates a trained model that can handle a variety of situations.

 次に、実施例について説明する。本実施例では、上述した手法を4つの粉体の秤量に適用した。以下の表は、本実施例の結果を表す。以下の表は、小麦粉(Flour)、米粉(Rice)、塩(Salt)、及び石炭(Coal)の4つの粉体に対して、上述した手法を適用した際の秤量結果である。本実施例では、ロボットを用いて5mg、10mg、15mgの秤量を行った。以下の結果は、平均絶対値誤差(Mean absolute error)と標準偏差(Standard deviation)とから構成されている。例えば、小麦粉(Flour)の5mgの秤量の結果である「0.0±0.4」に関しては、0.0が平均絶対誤差を表し、0.4が標準偏差を表す。なお、本実施例では、1つの粉体種に対して、5つの方策を学習させ、それぞれの方策について5つの実験を行った。このため、1つの粉体種に対して25個の実験結果が得られており、その結果が以下の表に示されている。以下の表からも分かるように、精度良く秤量が行われていることが分かる。 Next, an example will be described. In this example, the above-mentioned method was applied to weighing four powders. The following table shows the results of this example. The following table shows the weighing results when the above-mentioned method was applied to four powders: flour, rice, salt, and coal. In this example, 5 mg, 10 mg, and 15 mg were weighed using a robot. The results below consist of the mean absolute error and standard deviation. For example, for the weighing result of 5 mg of flour, "0.0±0.4", 0.0 represents the mean absolute error and 0.4 represents the standard deviation. In this example, five strategies were trained for one type of powder, and five experiments were performed for each strategy. Therefore, 25 experimental results were obtained for one type of powder, and the results are shown in the following table. As can be seen from the table below, the weighing was performed with high accuracy.

 また、以下では、学習済みモデルを生成する際のシミュレーションにおいて、仮想空間内の物理パラメータをランダムに変化させた場合について説明する。以下の表は、ランダムに変化させる物理パラメータの一覧である。 The following explains the case where physical parameters in the virtual space are randomly changed in the simulation for generating a trained model. The table below lists the physical parameters that are randomly changed.

「Powder friction coefficient」は、粉体を構成する粒子間の摩擦係数を表す。「Powder particle number」は、粒子の数を表す。「Powder particle radius」は、粒子の半径を表す。「Powder particle mass」は、粒子の質量を表す。「Spoon friction coefficient」は、スプーンと粒子との間の摩擦係数を表す。「Shake speed weight」は、スプーンの振りの強さを表すパラメータを表す。「Gravity」は、重力を表す。「Goal powder amount」は、粉体の目標質量を表す。 "Powder friction coefficient" represents the friction coefficient between particles that make up the powder. "Powder particle number" represents the number of particles. "Powder particle radius" represents the radius of a particle. "Powder particle mass" represents the mass of a particle. "Spoon friction coefficient" represents the friction coefficient between the spoon and a particle. "Shake speed weight" is a parameter that represents the strength of the spoon swing. "Gravity" represents gravity. "Goal powder amount" represents the target mass of the powder.

 以下の表は、学習済みモデルを生成する際のシミュレーションにおいて、仮想空間内の物理パラメータをランダムに変化させた場合の秤量結果である。以下の結果も、平均絶対値誤差と標準偏差とから構成されている。以下の表における「Ours」は、本実施形態において提案された方法によって得られた秤量結果である。また、「Incline」はスプーンを傾ける動作を表し、「Shake」はスプーンを振る動作を表す。なお、以下の表の2行目の「MLP」及び「P-controller」は既存手法を表す。「MLP」は多層ニューラルネットワークモデルを表し、「P-controller」はP制御を表す。以下の表の4行目に示されている秤量結果は、掲載されている物理パラメータをランダムに変化させずに固定した際の秤量結果である。また、以下の表の5行目に示されている秤量結果は、4行目に示されている秤量結果のうち成績の良かった上位n個の物理パラメータをランダムに変化させることにより得られた秤量結果である。以下の表からも分かるように、物理パラメータをランダムに変化させることにより、より精度良く秤量が行われることが分かる。なお、ここでは、1行目はOurs(Incline&shake)、2行目はMLPおよびP-Controller、3行目はOurs(Incline only)およびOurs(Shake only)、というように数える。 The following table shows the weighing results when the physical parameters in the virtual space are randomly changed in a simulation for generating a trained model. The following results are also composed of the mean absolute error and the standard deviation. "Ours" in the following table is the weighing result obtained by the method proposed in this embodiment. "Incline" represents the action of tilting the spoon, and "Shake" represents the action of shaking the spoon. "MLP" and "P-controller" in the second row of the following table represent existing methods. "MLP" represents a multi-layer neural network model, and "P-controller" represents P control. The weighing results shown in the fourth row of the following table are the weighing results when the listed physical parameters are fixed without being randomly changed. The weighing results shown in the fifth row of the following table are the weighing results obtained by randomly changing the top n physical parameters with the best results among the weighing results shown in the fourth row. As can be seen from the following table, the weighing can be performed more accurately by randomly changing the physical parameters. Here, the first row counts Ours(Incline&shake), the second row counts MLP and P-Controller, the third row counts Ours(Incline only) and Ours(Shake only), and so on.

 以上、実施形態及び実施例について説明したが、本開示の要旨を逸脱しない範囲において、種々なる態様で実施し得ることは勿論である。 The above describes the embodiments and examples, but it goes without saying that the present disclosure can be embodied in various ways without departing from the spirit of the disclosure.

 例えば、上記実施形態では、対象物が粉体である場合を例に説明したが、これに限定されるものではない。例えば、対象物は、粒体又は流体であっても上記実施形態及び実施例を適用することは可能である。 For example, in the above embodiment, the target object is a powder, but the present invention is not limited to this. For example, the above embodiment and examples can be applied even if the target object is a granule or a fluid.

 また、上記実施形態の状態データs及び行動データaの構成は適宜変更することが可能である。例えば、目標質量は状態データsへ組み入れずに構成することも可能である。例えば、目標質量毎に学習済みモデルを生成するようにしてもよい。 In addition, the configuration of the state data s and the action data a in the above embodiment can be changed as appropriate. For example, it is possible to configure the state data s without incorporating the target mass. For example, a trained model can be generated for each target mass.

 また、状態データsを取得する際のセンサ群には、例えば、カメラ又はマイクが含まれていてもよく、スプーンへの盛り具合の画像や、スプーンから粉を落とす音から状態データs(例えば、粉体の質量)を得るようにしてもよい。 The group of sensors used to acquire the state data s may include, for example, a camera or a microphone, and the state data s (e.g., the mass of the powder) may be obtained from an image of how much powder is heaped on the spoon or from the sound of powder dropping from the spoon.

 また、上記実施形態では、学習済みモデルがLSTM構造を有するニューラルネットワークモデルによって実現される場合を例に説明したが、これに限定されるものではない。機械学習モデルであれば、どのようなモデルであってもよい。 In the above embodiment, the trained model is implemented by a neural network model having an LSTM structure, but the present disclosure is not limited to this; any machine learning model may be used.
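For reference, the property that makes an LSTM suitable here — its hidden and cell states carry the history of past state data across control steps — can be shown with a deliberately minimal, scalar LSTM cell. The shared toy weights are assumptions for illustration only; a real model would have vector-valued gates with learned parameters.

```python
import math

def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTMCell:
    """Minimal single-unit LSTM cell (scalar input, scalar hidden state).
    For brevity, the same toy weight is shared by all four gates."""

    def __init__(self, w: float = 0.5, u: float = 0.5, b: float = 0.0):
        self.w, self.u, self.b = w, u, b
        self.h = 0.0  # hidden state
        self.c = 0.0  # cell state

    def step(self, x: float) -> float:
        z = self.w * x + self.u * self.h + self.b
        i = _sigmoid(z)        # input gate
        f = _sigmoid(z)        # forget gate
        o = _sigmoid(z)        # output gate
        g = math.tanh(z)       # candidate cell update
        self.c = f * self.c + i * g
        self.h = o * math.tanh(self.c)
        return self.h

# Because the hidden state persists across steps, the output at each control
# step depends on the whole history of inputs, not only the latest one.
cell = TinyLSTMCell()
outputs = [cell.step(x) for x in (1.0, 1.0, 1.0)]
```
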

 また、上記実施形態でCPUがソフトウェア(プログラム)を読み込んで実行した各処理を、CPU以外の各種のプロセッサが実行してもよい。この場合のプロセッサとしては、FPGA(Field-Programmable Gate Array)等の製造後に回路構成を変更可能なPLD(Programmable Logic Device)、及びASIC(Application Specific Integrated Circuit)等の特定の処理を実行させるために専用に設計された回路構成を有するプロセッサである専用電気回路等が例示される。また、各処理を、これらの各種のプロセッサのうちの1つで実行してもよいし、同種又は異種の2つ以上のプロセッサの組み合わせ(例えば、複数のFPGA、及びCPUとFPGAとの組み合わせ等)で実行してもよい。また、これらの各種のプロセッサのハードウェア的な構造は、より具体的には、半導体素子等の回路素子を組み合わせた電気回路である。 Furthermore, the processes executed by the CPU after reading the software (program) in the above embodiment may be executed by various processors other than the CPU. Examples of processors in this case include PLDs (Programmable Logic Devices) such as FPGAs (Field-Programmable Gate Arrays) whose circuit configuration can be changed after manufacture, and dedicated electrical circuits such as ASICs (Application Specific Integrated Circuits), which are processors with circuit configurations designed exclusively to execute specific processes. Furthermore, each process may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (for example, multiple FPGAs, or a combination of a CPU and an FPGA, etc.). Moreover, the hardware structure of these various processors is, more specifically, an electrical circuit that combines circuit elements such as semiconductor elements.

 また、上記実施形態では、各プログラムが記憶装置に予め記憶(インストール)されている態様を説明したが、これに限定されない。プログラムは、CD-ROM、DVD-ROM、ブルーレイディスク、USBメモリ等の記憶媒体に記憶された形態で提供されてもよい。また、プログラムは、ネットワークを介して外部装置からダウンロードされる形態としてもよい。 In the above embodiment, the programs are described as being pre-stored (installed) in a storage device, but the present invention is not limited to this. The programs may be provided in a form stored in a storage medium such as a CD-ROM, DVD-ROM, Blu-ray disc, or USB memory. The programs may also be downloaded from an external device via a network.
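Taken together, the embodiments above describe a run-time flow of acquiring state data, generating action data with the trained model, and controlling the robot accordingly. Below is a minimal sketch of that loop; the robot and model interfaces (`acquire_state`, `infer`, `execute`) are hypothetical names, and the stub model is a proportional-style correction standing in for the trained model.

```python
def control_loop(robot, model, target_mass, max_steps=100, tolerance=0.01):
    """Acquire state data -> generate action data -> control the robot,
    repeated until the weighed amount is close enough to the target."""
    for _ in range(max_steps):
        state = robot.acquire_state()              # state data s
        if abs(state["mass"] - target_mass) <= tolerance:
            break                                  # target amount reached
        action = model.infer(state, target_mass)   # action data a
        robot.execute(action)                      # realize the action

# Stub robot and model so the loop is runnable, for illustration only.
class StubRobot:
    def __init__(self):
        self.mass = 0.10                           # scooped mass (arbitrary units)

    def acquire_state(self):
        return {"mass": self.mass}

    def execute(self, action):
        self.mass += action                        # action adjusts the scooped amount

class StubModel:
    def infer(self, state, target_mass):
        # Proportional-style correction; a trained model replaces this.
        return 0.5 * (target_mass - state["mass"])
```

Running `control_loop(StubRobot(), StubModel(), target_mass=0.05)` drives the stub's scooped mass to within the tolerance of the target.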

(付記)
 以下、本開示の態様について付記する。
(Appendix)
Aspects of the present disclosure are set forth in the following appendixes.

(付記1)
 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御装置であって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得する取得部と、
 前記取得部により取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、前記取得部により取得された前記状態データに応じた前記行動データを生成する生成部と、
 前記生成部により生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する制御部と、
 を含む制御装置。
(Appendix 1)
A control device for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
an acquisition unit that acquires state data representing a state of the robot when weighing the object;
a generation unit that generates the behavioral data corresponding to the state data acquired by the acquisition unit by inputting the state data acquired by the acquisition unit to a trained model that is generated in advance and that, when the state data is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot; and
A control unit that controls the robot so as to realize an action represented by the action data generated by the generation unit;
A control device including:

(付記2)
 前記学習済みモデルは、
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られたデータに基づき学習されたモデルである、
 付記1に記載の制御装置。
(Appendix 2)
The trained model is
The model is trained based on data obtained by computer simulation of the behavior of a virtual robot when weighing a virtual object, which is a virtual powder, granule, or fluid.
The control device according to Appendix 1.

(付記3)
 前記シミュレーションは、仮想空間内の物理パラメータをランダムに変化させることにより実行されたシミュレーションである、
 付記2に記載の制御装置。
(Appendix 3)
The simulation is performed by randomly varying physical parameters in a virtual space.
The control device according to Appendix 2.

(付記4)
 前記ロボットにはスプーンが設置されており、
 前記状態データは、前記スプーンの姿勢と前記対象物の目標量とを含むデータである、
 付記1~付記3の何れか1項に記載の制御装置。
(Appendix 4)
The robot is provided with a spoon,
The state data includes the attitude of the spoon and a target amount of the object.
The control device according to any one of Appendix 1 to Appendix 3.

(付記5)
 前記対象物の量を調整する動作は、前記ロボットがすくい上げた前記対象物を落とす動作である、
 付記1~付記4の何れか1項に記載の制御装置。
(Appendix 5)
The action of adjusting the amount of the object is an action of dropping the object scooped up by the robot.
The control device according to any one of Appendix 1 to Appendix 4.

(付記6)
 前記対象物を落とす動作は、前記ロボットの特定箇所の姿勢及び前記ロボットの特定箇所の位置を変化させる動作である、
 付記5に記載の制御装置。
(Appendix 6)
The action of dropping the object is an action of changing the posture of a specific part of the robot and the position of the specific part of the robot.
The control device according to Appendix 5.

(付記7)
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得する学習用取得部と、
 前記学習用取得部により取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する学習部と、
 を含む学習済みモデル生成装置。
(Appendix 7)
a learning acquisition unit that acquires learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, granule, or fluid; and
a learning unit that generates, based on the learning data acquired by the learning acquisition unit, a trained model that, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot,
A trained model generating device comprising:

(付記8)
 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御方法であって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、
 取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、
 生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、
 処理をコンピュータが実行する制御方法。
(Appendix 8)
A control method for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
acquiring state data representing a state of the robot when weighing the object;
The acquired state data is input to a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input, thereby generating the behavioral data according to the acquired state data;
Controlling the robot so as to realize an action represented by the generated behavioral data.
A control method in which a computer executes the processing.

(付記9)
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、
 取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、
 処理をコンピュータが実行する学習済みモデル生成方法。
(Appendix 9)
acquiring learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid; and
A trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
A trained model generation method in which a computer executes the processing.

(付記10)
 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御プログラムであって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、
 取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、
 生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、
 処理をコンピュータに実行させるための制御プログラム。
(Appendix 10)
A control program for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
acquiring state data representing a state of the robot when weighing the object;
The acquired state data is input to a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input, thereby generating the behavioral data according to the acquired state data;
Controlling the robot so as to realize an action represented by the generated behavioral data.
A control program that causes a computer to execute processing.

(付記11)
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、
 取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、
 処理をコンピュータに実行させるための学習済みモデル生成プログラム。
(Appendix 11)
acquiring learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid; and
A trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
A trained model generation program for causing a computer to execute the processing.

 2023年7月31日に出願された日本国特許出願2023‐124132号の開示は、その全体が参照により本明細書に取り込まれる。本明細書に記載された全ての文献、特許出願、および技術規格は、個々の文献、特許出願、および技術規格が参照により取り込まれることが具体的かつ個々に記された場合と同程度に、本明細書中に参照により取り込まれる。 The disclosure of Japanese Patent Application No. 2023-124132, filed on July 31, 2023, is incorporated herein by reference in its entirety. All documents, patent applications, and technical standards described herein are incorporated herein by reference to the same extent as if each individual document, patent application, and technical standard was specifically and individually indicated to be incorporated by reference.

Claims (11)

 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御装置であって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得する取得部と、
 前記取得部により取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、前記取得部により取得された前記状態データに応じた前記行動データを生成する生成部と、
 前記生成部により生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する制御部と、
 を含む制御装置。
A control device for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
an acquisition unit that acquires state data representing a state of the robot when weighing the object;
a generation unit that generates the behavioral data corresponding to the state data acquired by the acquisition unit by inputting the state data acquired by the acquisition unit to a trained model that is generated in advance and that, when the state data is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot; and
A control unit that controls the robot so as to realize an action represented by the action data generated by the generation unit;
A control device including:
 前記学習済みモデルは、
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られたデータに基づき学習されたモデルである、
 請求項1に記載の制御装置。
The trained model is
The model is trained based on data obtained by computer simulation of the behavior of a virtual robot when weighing a virtual object, which is a virtual powder, granule, or fluid.
The control device according to claim 1 .
 前記コンピュータシミュレーションは、仮想空間内の物理パラメータをランダムに変化させることにより実行されたシミュレーションである、
 請求項2に記載の制御装置。
The computer simulation is a simulation performed by randomly varying physical parameters in a virtual space.
The control device according to claim 2.
 前記ロボットにはスプーンが設置されており、
 前記状態データは、前記スプーンの姿勢と前記対象物の目標量とを含むデータである、
 請求項1~請求項3の何れか1項に記載の制御装置。
The robot is provided with a spoon,
The state data includes the attitude of the spoon and a target amount of the object.
The control device according to any one of claims 1 to 3.
 前記対象物の量を調整する動作は、前記ロボットがすくい上げた前記対象物を落とす動作である、
 請求項1~請求項3の何れか1項に記載の制御装置。
The action of adjusting the amount of the object is an action of dropping the object scooped up by the robot.
The control device according to any one of claims 1 to 3.
 前記対象物を落とす動作は、前記ロボットの特定箇所の姿勢及び前記ロボットの特定箇所の位置を変化させる動作である、
 請求項5に記載の制御装置。
The action of dropping the object is an action of changing the posture of a specific part of the robot and the position of the specific part of the robot.
The control device according to claim 5.
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得する学習用取得部と、
 前記学習用取得部により取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する学習部と、
 を含む学習済みモデル生成装置。
a learning acquisition unit that acquires learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, granule, or fluid; and
a learning unit that generates, based on the learning data acquired by the learning acquisition unit, a trained model that, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot,
A trained model generating device comprising:
 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御方法であって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、
 取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、
 生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、
 処理をコンピュータが実行する制御方法。
A control method for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
acquiring state data representing a state of the robot when weighing the object;
The acquired state data is input to a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input, thereby generating the behavioral data according to the acquired state data;
Controlling the robot so as to realize an action represented by the generated behavioral data.
A control method in which a computer executes the processing.
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、
 取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、
 処理をコンピュータが実行する学習済みモデル生成方法。
acquiring learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid; and
A trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
A trained model generation method in which a computer executes the processing.
 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御プログラムであって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、
 取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、
 生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、
 処理をコンピュータに実行させるための制御プログラム。
A control program for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
acquiring state data representing a state of the robot when weighing the object;
The acquired state data is input to a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input, thereby generating the behavioral data according to the acquired state data;
Controlling the robot so as to realize an action represented by the generated behavioral data.
A control program that causes a computer to execute processing.
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、
 取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、
 処理をコンピュータに実行させるための学習済みモデル生成プログラム。
acquiring learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid; and
A trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
A trained model generation program for causing a computer to execute the processing.
PCT/JP2024/025018 2023-07-31 2024-07-10 Control device, trained model generation device, method, and program WO2025028204A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2023-124132 2023-07-31
JP2023124132A JP2025020634A (en) 2023-07-31 Control device, trained model generation device, method, and program

Publications (1)

Publication Number Publication Date
WO2025028204A1 true WO2025028204A1 (en) 2025-02-06

Family

ID=94395103


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019098419A (en) * 2017-11-28 2019-06-24 キユーピー株式会社 Machine learning method for learning extraction operation of powder, grain or fluid, and robot machine learning control device
JP2020194242A (en) * 2019-05-24 2020-12-03 株式会社エクサウィザーズ Learning device, learning method, learning program, automatic control device, automatic control method and automatic control program
JP2021091022A (en) * 2019-12-09 2021-06-17 キヤノン株式会社 Robot control device, learned model, robot control method, and program

