
WO2025028204A1 - Control device, trained model generation device, method, and program - Google Patents


Info

Publication number
WO2025028204A1
Authority
WO
WIPO (PCT)
Prior art keywords
robot
virtual
data
action
powder
Prior art date
Application number
PCT/JP2024/025018
Other languages
French (fr)
Japanese (ja)
Inventor
Yuki Kadokawa
Masashi Hamaya
Kazutoshi Tanaka
Original Assignee
OMRON Corporation
Priority date
Filing date
Publication date
Priority claimed from JP2023124132A (published as JP2025020634A)
Application filed by OMRON Corporation
Publication of WO2025028204A1

Definitions

  • a learning system that performs the action of weighing powder such as salt is known (for example, see Reference 1: JP 2020-194242 A).
  • the manipulator of this learning system is equipped with a spoon as an end effector, and performs the action of weighing salt (for example, see paragraph [0013] of Reference 1).
  • a liquid weighing method is also known in which a robot arm is used to pour water from a first container into a second container and weigh the liquid (see, for example, Document 2: JP 2021-164980 A).
  • a control device that controls the operation of the robot arm acquires an image including a first container taken at a first time point, acquires the angle of the first container at the first time point, acquires the amount of liquid in the second container at the first time point, inputs the acquired image, angle, and amount of liquid into a learning model, acquires the angle at a second time point output by the learning model, and controls the angle of the first container held by the robot arm according to the acquired angle at the second time point (see, for example, claim 1 in Document 2).
  • a machine learning method is also known in which a robot learns to scoop up a large amount of powder, granules, or fluid from a container and divide it into a separate container in a set amount (see, for example, Reference 3: JP 2019-098419 A).
  • This machine learning method consists of a real learning process in which reinforcement learning is performed between a multi-axis robot and an object in a real space, and a simulation learning process in which reinforcement learning is performed between a pseudo multi-axis robot that simulates the multi-axis robot and a pseudo object that simulates the object in a simulation space (see, for example, the summary of Reference 3).
  • Reference 4 discloses a control model of a robot used to scoop up granular media.
  • when a robot weighs objects such as powders, granules, or fluids, it normally operates aiming to reach the target amount; however, this is difficult to achieve due to various errors, and the robot may end up scooping up more than the target amount.
  • the technologies disclosed in the above-mentioned References 1 to 4 are technologies for weighing powders and the like, but they do not consider how the robot should behave if it scoops up more than the target amount.
  • the present disclosure has been made in consideration of the above points, and aims to adjust the amount of an object when a robot scoops up more than the target amount of an object, which may be a powder, granules, or fluid, when weighing the object.
  • control device of the present disclosure is a control device for controlling a robot that weighs an object, which is a powder, granules, or fluid, and includes: an acquisition unit that acquires state data representing the state of the robot when weighing the object; a generation unit that generates the behavioral data corresponding to the state data acquired by the acquisition unit by inputting the state data acquired by the acquisition unit into a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input; and a control unit that controls the robot so that the action represented by the behavioral data generated by the generation unit is realized.
  • the control method disclosed herein is a control method for controlling a robot that weighs an object, which is a powder, granules, or fluid, in which a computer executes a process to acquire status data representing the state of the robot when weighing the object, input the acquired status data to a trained model that is generated in advance and outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the status data is input, thereby generating the behavioral data according to the acquired status data, and controlling the robot so as to realize the action represented by the generated behavioral data.
  • the control program disclosed herein is a control program for controlling a robot that weighs an object, which is a powder, granules, or fluid, and acquires status data representing the state of the robot when weighing the object, and inputs the acquired status data to a trained model that is generated in advance and outputs behavioral data including an action to adjust the amount of the object scooped up by the robot when the status data is input, thereby generating behavioral data according to the acquired status data, and controlling the robot so as to realize the action represented by the generated behavioral data.
  • the trained model generating device of the present disclosure is a trained model generating device including a training acquisition unit that acquires training data obtained by computer simulating the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and a learning unit that generates a trained model based on the training data acquired by the training acquisition unit, and that outputs behavioral data including an operation of adjusting the amount of the object scooped up by the robot when state data representing the state when the robot weighs the object that is a powder, granules, or fluid is input.
  • the trained model generation method disclosed herein is a trained model generation method in which a computer executes a process to acquire training data obtained by computer simulating the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and generates a trained model based on the acquired training data, in which, when state data representing the state of the robot when weighing the object that is a powder, granules, or fluid is input, behavioral data including an operation of adjusting the amount of the object scooped up by the robot is output.
  • the trained model generation program of the present disclosure is a trained model generation program for causing a computer to execute a process that acquires training data obtained by computer simulation of the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and generates a trained model based on the acquired training data, in which, when state data representing the state of the robot when weighing the object that is a powder, granules, or fluid is input, behavioral data including an operation of adjusting the amount of the object scooped up by the robot is output.
  • the control device, trained model generation device, method, and program disclosed herein allow the amount of an object to be adjusted in a situation where the robot scoops up more than the target amount of an object, which may be a powder, granule, or fluid, when weighing the object.
  • FIG. 1 is a diagram for explaining an overview of the present embodiment.
  • FIG. 2 is a diagram for explaining an overview of the present embodiment.
  • FIGS. 3A to 3C are diagrams for explaining the operation of the spoon of the present embodiment.
  • FIGS. 4A and 4B are diagrams for explaining randomization of physical parameters in the present embodiment.
  • FIG. 5 is a block diagram showing a hardware configuration of the trained model generation device according to the present embodiment.
  • FIG. 6 is a block diagram showing a schematic configuration of the trained model generation device according to the present embodiment.
  • FIG. 7 is a diagram for explaining the trained model of the present embodiment.
  • FIG. 8 is a block diagram showing a schematic configuration of the control system according to the present embodiment.
  • FIG. 9 is a block diagram showing a hardware configuration of the control device according to the present embodiment.
  • FIG. 10 is a flowchart showing the flow of the trained model generation process in the present embodiment.
  • FIG. 11 is a flowchart showing the flow of the control process in the present embodiment.
  • <Overview of the embodiment> FIGS. 1 and 2 are diagrams for explaining an overview of this embodiment.
  • a robot 24 of this embodiment weighs powder filled in a container Ct.
  • a trained model used when weighing powder is generated using learning data obtained by executing a computer simulation.
  • a policy is generated by executing reinforcement learning based on virtual state data Sv and virtual behavior data Av obtained by executing a simulator. Note that this policy is realized by a known machine learning model.
  • the robot executes the operation of weighing the powder using this policy. Specifically, when the current state data S is input to the policy, the policy outputs new action data. The robot performs an operation according to the action data, and then measures the mass W of the powder using an electronic balance. The robot's operation produces new state data S, which is input to the policy, and the policy outputs the next action data. By repeating this cycle in the operation phase Op, the robot weighs the powder.
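The sense-act cycle of the operation phase described above can be sketched in Python. Everything here is a hypothetical stand-in: `policy` for the trained model, `execute_action` for the robot interface, and `read_scale` for the electronic balance.

```python
def weighing_loop(policy, execute_action, read_scale, w_goal, steps=50, tol=0.1):
    """Repeatedly query the policy and act until the measured mass is
    close to the target mass (a simplified sketch of the operation phase)."""
    theta_spoon = 0.0                        # current spoon tilt angle
    for _ in range(steps):
        w_current = read_scale()             # weigh the powder on the balance
        if abs(w_goal - w_current) <= tol:   # close enough to the target
            return w_current
        state = [w_current, theta_spoon, w_goal]  # state vector s
        a_incline, a_shake = policy(state)        # action vector a
        theta_spoon = execute_action(a_incline, a_shake)
    return read_scale()
```

A dummy robot whose mass decreases a little with each action converges to the target within a few iterations of this loop.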
  • when the robot weighs a powder object, it may scoop up more than the target amount. This embodiment assumes such a case, and the robot adjusts the amount of the object by dropping part of what it has scooped up. If the amount scooped up is less than the target amount, the robot may scoop up the object again. This is explained in detail below.
  • reinforcement learning is used to generate a trained model reflecting the above-mentioned policy.
  • the problem is defined by a Markov decision process (S, A, P, R, γ).
  • S is a set of states acquired from the environment
  • A is a set of selectable actions.
  • P^a_ss' represents the probability of transitioning to state s' at the next time step when action a is selected in state s.
  • R^a_ss' represents the reward obtained when action a is selected in state s and the state transitions to s' at the next time step.
  • γ ∈ [0, 1) is a discount rate.
  • π(a | s) is the probability that action a is selected in a situation in which state s is given.
  • the purpose of reinforcement learning is to find an optimal policy π* that maximizes the expected sum of discounted rewards represented by the following formula (1): π* = argmax_π E[Σ_t γ^t r_t]. Note that t represents time.
  • the state s is represented by a vector [w_current, θ_spoon, w_goal] whose elements are the current powder mass w_current, the spoon tilt angle θ_spoon, and the target powder mass w_goal.
  • the state s incorporates the target powder mass w_goal.
  • the action a is represented by a vector [a_incline, a_shake] whose elements are the action a_incline of tilting the spoon and the action a_shake of shaking the spoon.
  • the action a_incline corresponds to the relative pitch angle of the spoon.
  • FIG. 3 is a diagram for explaining the spoon tilting action a_incline and the spoon shaking action a_shake.
  • the spoon shaking action a_shake is an action of moving the spoon back and forth.
  • the spoon tilting action a_incline is an action of changing the spoon's posture.
  • the spoon shaking action a_shake can be expressed by the spoon's movement distance and movement acceleration; it is therefore represented by a proportionality coefficient relating the movement distance to the movement acceleration.
  • state s and action a in this embodiment are defined as follows:
  • the reward r is defined as the difference between the target mass w_goal and the current powder mass w_current, as shown in equation (3) below.
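Since reinforcement learning maximizes the reward, one plausible reading of equation (3), which is not reproduced in the text, is the negative absolute difference between the target and current masses:

```python
def reward(w_goal: float, w_current: float) -> float:
    """One plausible form of equation (3): the negative absolute difference
    between the target mass and the current powder mass, so that maximizing
    the reward drives the scooped amount toward the target."""
    return -abs(w_goal - w_current)
```

Under this form, the reward peaks at zero when the current mass equals the target and decreases as the error grows in either direction.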
  • the policy π(a | s) is realized using a machine learning model.
  • specifically, the policy π(a | s) is realized using a neural network model having a long short-term memory (LSTM) structure, which is an example of a machine learning model.
  • the neural network model having the LSTM structure can take into account the time-series state of the powder.
  • a neural network model having an LSTM structure is trained using training data obtained by executing a computer simulation, thereby generating a trained model to be used when weighing powder.
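As a rough illustration of how an LSTM carries the time-series state of the powder across steps, here is a minimal single-cell sketch in numpy. This is not the patent's actual architecture; the dimensions and random initialization are arbitrary assumptions.

```python
import numpy as np

def lstm_step(x, h, c, W, b):
    """One step of a single LSTM cell: x is the current state vector,
    (h, c) are the recurrent hidden and cell states, and W, b hold the
    stacked weights/biases of the four gates."""
    z = W @ np.concatenate([x, h]) + b     # pre-activations of all gates
    i, f, o, g = np.split(z, 4)            # input, forget, output, candidate
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state accumulates history
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def run_policy_lstm(states, hidden_dim=8, seed=0):
    """Feed a sequence of state vectors [w_current, theta_spoon, w_goal]
    through the cell; the final hidden state summarizes the time series."""
    rng = np.random.default_rng(seed)
    state_dim = len(states[0])
    W = rng.normal(scale=0.1, size=(4 * hidden_dim, state_dim + hidden_dim))
    b = np.zeros(4 * hidden_dim)
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for s in states:
        h, c = lstm_step(np.asarray(s, dtype=float), h, c, W, b)
    return h
```

In the actual system the final hidden state would feed an output layer producing the action vector [a_incline, a_shake].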
  • FIG. 4 is a diagram for explaining changing physical parameters in virtual space.
  • while the action of moving the spoon in the virtual space to adjust the amount of powder on the spoon is being reflected in the trained model, various physical parameters in the virtual space are randomly changed (Rd in FIG. 4).
  • the trained model is used to perform the action of adjusting the amount of powder on the spoon by moving the spoon in real space.
  • the coefficient of friction between the particles that make up the powder is also a parameter that indicates how difficult it is for the powder to flow.
  • the coefficient of friction between the spoon and the particles is also a parameter that affects the amount of powder remaining on the spoon.
  • the parameter that indicates the strength of the spoon swing is changed randomly in order to reflect the difference in strength of the spoon swing by the robot between the simulator and reality in the trained model.
  • Gravity is changed randomly in order to reproduce the phenomenon of powder flying. In order to reproduce the phenomenon of powder flying as much as possible, gravity in the virtual space may be set to a value that is much smaller than the gravitational acceleration in reality.
  • the target mass is changed randomly in order to weigh the powder at an arbitrary target mass.
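The per-episode randomization described above could be sketched as follows. The parameter names mirror the table later in the document, but the ranges are purely illustrative assumptions, not values from the patent:

```python
import random

# Illustrative ranges; the actual ranges used are not disclosed in the text.
PARAM_RANGES = {
    "powder_friction_coefficient": (0.1, 1.0),   # friction between particles
    "powder_particle_radius":      (0.2, 1.0),   # particle radius
    "powder_particle_mass":        (0.5, 2.0),   # particle mass
    "spoon_friction_coefficient":  (0.1, 1.0),   # friction between spoon and particles
    "shake_speed_weight":          (0.5, 1.5),   # strength of the spoon swing
    "gravity":                     (1.0, 9.8),   # may be far below real gravity
    "goal_powder_amount":          (5.0, 15.0),  # target mass
}

def sample_physical_parameters(rng=random):
    """Draw one randomized parameter set for a simulation episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
```

A fresh parameter set would be sampled at the start of each simulated episode so the trained model sees many virtual environments.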
  • FIG. 5 is a block diagram showing a hardware configuration of the trained model generation device 10 according to the present embodiment.
  • the trained model generation device 10 has a CPU (Central Processing Unit) 42, a memory 44, a storage device 46, an input/output I/F (Interface) 48, a storage medium reading device 50, and a communication I/F 52.
  • Each component is connected to each other via a bus 54 so as to be able to communicate with each other.
  • the storage device 46 stores a trained model generation program for executing each process described below.
  • the CPU 42 is a central processing unit, and executes various programs and controls each component. That is, the CPU 42 reads the program from the storage device 46 and executes the program using the memory 44 as a working area. The CPU 42 controls each of the components and performs various calculation processes according to the program stored in the storage device 46.
  • Memory 44 is made up of RAM (Random Access Memory) and serves as a working area to temporarily store programs and data.
  • Storage device 46 is made up of ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), etc., and stores various programs including the operating system, and various data.
  • the input/output I/F 48 is an interface for inputting data from the outside and outputting data to the outside.
  • input devices for various inputs such as a keyboard or mouse, and output devices for outputting various information, such as a display or printer, may be connected.
  • a touch panel display may be used as the output device to function as an input device.
  • the storage medium reader 50 reads data stored in various storage media such as CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, Blu-ray Disc, and USB (Universal Serial Bus) memory, and writes data to the storage media.
  • the communication I/F 52 is an interface for communicating with other devices, and uses standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark).
  • the trained model generation device 10 functionally includes a simulation unit 12, a learning acquisition unit 14, and a learning unit 16.
  • a trained model storage unit 18 is also provided in a specified storage area of the trained model generation device 10.
  • Each functional configuration is realized by the CPU 42 reading out each program stored in the storage device 46, expanding it in the memory 44, and executing it.
  • the trained model storage unit 18 stores trained models generated by the process described below.
  • the trained model of this embodiment is a neural network model having an LSTM structure.
  • FIG. 7 is a diagram for explaining the trained model of this embodiment. As shown in FIG. 7, the trained model of this embodiment is a model that realizes the policy π*(a | s).
  • the simulation unit 12 executes a computer simulation of the operation of a virtual robot in a virtual space when weighing a virtual object, which is a virtual powder.
  • a virtual spoon is installed on the virtual robot, and the virtual robot weighs the virtual powder using the virtual spoon. Note that the simulation unit 12 randomly changes physical parameters in the virtual space when executing the computer simulation.
  • the learning acquisition unit 14 acquires learning data.
  • the learning data is data obtained by computer simulating the operation of the virtual robot when weighing a virtual object, which is a virtual powder.
  • the learning data is composed of data such as the position and posture of a specific part of the virtual robot (e.g., the part where the spoon is attached), the position and posture of the virtual spoon, the amount of virtual object present on the virtual spoon, and the reward r when a certain action a is selected in a certain state s.
  • the learning unit 16 generates a trained model based on the learning data acquired by the learning acquisition unit 14.
  • the trained model outputs behavior data a including an action to adjust the amount of the object scooped up by the robot.
  • the learning unit 16 performs deep reinforcement learning on a neural network model having an LSTM structure so that the reward function shown in the above formula (1) is maximized.
  • the deep reinforcement learning may be learning so that the current mass of powder approaches a target amount.
  • the learning unit 16 then stores the obtained trained model in the trained model storage unit 18.
  • the reward function is not limited to the above formula (1) and may be another type of reward function.
  • a term that imposes a large penalty on such behavior may be added to the above formula (1).
  • the trained model is more likely to take an action to bring the amount of powder scooped up closer to the target amount and prevent too much powder from being dropped, which is more favorable for weighing the object.
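One hedged way to realize the penalty term mentioned above: treat an action that drops the mass from above the target to below it as "dropping too much powder" and subtract a weighted penalty. The threshold logic and weight here are illustrative assumptions, not taken from the patent:

```python
def reward_with_penalty(w_goal, w_current, w_prev, penalty_weight=10.0):
    """Base reward (negative absolute error to the target) plus an
    illustrative penalty term that heavily punishes a single drop that
    overshoots the target, i.e. the mass falling from above the target
    to below it in one step."""
    r = -abs(w_goal - w_current)
    if w_prev > w_goal and w_current < w_goal:  # dropped too much at once
        r -= penalty_weight * (w_goal - w_current)
    return r
```

With such a term, a policy earns more by approaching the target gradually than by dumping a large amount past it.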
  • a trained model is thus obtained that, when input with state data s observed while the robot weighs powder, outputs behavioral data a including actions to adjust the amount of the object scooped up by the robot.
  • This trained model can then be used to control a real robot.
  • Fig. 8 is a block diagram showing a schematic configuration of the control system 20 of this embodiment.
  • the control system 20 includes a sensor group 22, a robot 24, a spoon 26 installed on the robot 24, and a control device 30.
  • the control device 30 controls the operation of the robot 24 using the trained model generated by the trained model generation device 10.
  • the spoon 26 may be anything that can hold an object whose amount is to be adjusted, such as a medicine spoon, a cup-like object that the robot can hold, or a structure with a recess for the object built into the robot's fingers.
  • the sensor group 22 sequentially detects the state of the robot 24, the state of the spoon 26, the state of the object on the spoon 26, and the state of the object in the container.
  • the sensor group 22 is composed of, for example, an electronic balance that measures the mass of the object on the spoon or in the container, a sensor that detects the posture and position of a specific part of the robot 24, and a sensor that detects the posture and position of the spoon 26.
  • the measurement target is not limited to the mass of the object.
  • the state of the object may be an amount (e.g., the volume of the object) estimated from an image of the object using a known image processing technique.
  • the amount of the object dropped from the spoon may be the amount of the object estimated using a known sound processing technique from the sound made when the object is dropped.
  • the state of the object to be detected is not limited to the state of the object on the spoon 26 or the state of the object in the container, but may also be the amount of the object dropped from the spoon 26.
  • the robot 24 operates in response to control commands output from the control device 30, which will be described later.
  • the spoon 26 is attached to the robot 24, for example as shown in FIG. 1, and scoops up objects in the container in response to the operation of the robot 24.
  • FIG. 9 is a block diagram showing the hardware configuration of the control device 30 according to this embodiment.
  • the control device 30 has a CPU (Central Processing Unit) 62, a memory 64, a storage device 66, an input/output I/F (Interface) 68, a storage medium reading device 70, and a communication I/F 72.
  • Each component is connected to each other so as to be able to communicate with each other via a bus 74.
  • the storage device 66 stores control programs for executing the various processes described below.
  • the CPU 62 is a central processing unit, and executes various programs and controls each component. That is, the CPU 62 reads the programs from the storage device 66 and executes the programs using the memory 64 as a working area.
  • the CPU 62 controls each of the components and performs various calculation processes according to the programs stored in the storage device 66.
  • Memory 64 is made up of RAM (Random Access Memory) and serves as a working area to temporarily store programs and data.
  • Storage device 66 is made up of ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), etc., and stores various programs including the operating system, and various data.
  • the input/output I/F 68 is an interface for inputting data from the outside and outputting data to the outside.
  • input devices for performing various inputs such as a keyboard or mouse, and output devices for outputting various information, such as a display or printer, may be connected.
  • a touch panel display may be used as the output device to function as an input device.
  • the storage medium reader 70 reads data stored in various storage media such as CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, Blu-ray Disc, and USB (Universal Serial Bus) memory, and writes data to the storage media.
  • the communication I/F 72 is an interface for communicating with other devices, and uses standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark).
  • the control device 30 functionally includes an acquisition unit 34, a generation unit 36, and a control unit 38.
  • a trained model storage unit 32 is provided in a predetermined storage area of the control device 30.
  • Each functional configuration is realized by the CPU 62 reading out each program stored in the storage device 66, expanding it in the memory 64, and executing it.
  • the trained model storage unit 32 stores the trained model generated by the trained model generation device 10.
  • the generation unit 36 inputs the state data acquired by the acquisition unit 34 into the trained model to generate behavior data a. This behavior data a includes an action to adjust the amount of the object scooped up by the robot 24, and corresponds to the action to be taken by the robot 24 at the next time.
  • the action to adjust the amount of the object scooped up by the robot 24 may include at least one of an action to tilt the spoon 26 and an action to shake the spoon 26.
  • the action of dropping the object is an action of changing the posture and position of a specific part of the robot 24.
  • the way the robot 24 is controlled before it operates based on the generated action data (the preparation stage) or after (handling the powder after weighing) may be based on a well-known teaching technique.
  • when the trained model generation device 10 receives a predetermined instruction signal, the CPU 42 reads the trained model generation program from the storage device 46, expands it into the memory 44, and executes it. As a result, the CPU 42 functions as each functional component of the trained model generation device 10, and the trained model generation process shown in FIG. 10 is executed.
  • in step S100, the simulation unit 12 executes a computer simulation of the operation of a virtual robot in a virtual space when weighing a virtual object, which is a virtual powder.
  • a virtual spoon is provided on the virtual robot, and the virtual robot weighs the virtual powder using the virtual spoon.
  • the simulation unit 12 randomly changes physical parameters in the virtual space.
  • in step S102, while the simulation unit 12 is performing the simulation, the learning acquisition unit 14 acquires learning data including the position and orientation of a specific part of the virtual robot (e.g., the part where the spoon is attached), the position and orientation of the virtual spoon, the amount of virtual objects present on the virtual spoon, and the reward r when a certain action a is selected in a certain state s.
  • in step S104, the learning unit 16 generates a trained model based on the learning data acquired in step S102.
  • in step S106, the learning unit 16 stores the trained model generated in step S104 in the trained model storage unit 18.
  • when the trained model generated by the trained model generation device 10 is input to the control device 30, it is stored in the trained model storage unit 32 of the control device 30. Then, when the control system 20 receives a predetermined instruction signal, the CPU 62 of the control device 30 reads out the control program from the storage device 66, expands it into the memory 64, and executes it. As a result, the CPU 62 functions as each functional component of the control device 30, and the control process shown in FIG. 11 is executed.
  • the control process shown in FIG. 11 is repeated, and a control signal is repeatedly output to the robot 24, thereby allowing the object to be weighed appropriately.
  • the control device is a control device that controls a robot that weighs a powder object.
  • the control device acquires state data that represents the state of the robot when it weighs the object, and inputs the state data to a trained model that has been generated in advance.
  • the trained model is a model that outputs behavioral data that includes an action to adjust the amount of the object scooped up by the robot when state data is input.
  • the control device uses this trained model to generate behavioral data according to the state data.
  • the control device then controls the robot so that the action represented by the generated behavioral data is realized. This makes it possible to adjust the amount of the object when the robot scoops up more than the target amount when weighing a powder object.
  • the robot is equipped with a spoon, and the state data includes the spoon's orientation and the target amount of the object, so that even in a situation where more of the object than the target amount has been scooped up, the amount of the object can be adjusted by tilting or moving the spoon.
  • the action of adjusting the amount of the object is the action of dropping the object that the robot has scooped up.
  • the action of dropping the object is the action of changing the attitude of a specific part of the robot and the position of a specific part of the robot. This makes it possible to finely adjust the amount of the object, enabling weighing to be performed with high precision.
  • the trained model generating device also acquires training data obtained by computer simulating the actions of a virtual robot when weighing a virtual object, which is a virtual powder.
  • the trained model generating device then generates a trained model based on the training data, which outputs behavioral data including actions to adjust the amount of the object scooped up by the robot when state data representing the state when weighing the powder object is input. This makes it possible to obtain a trained model for adjusting the amount of the object when the robot scoops up more than the target amount when weighing the powder object.
  • the trained model generation device generates a trained model based on data obtained by computer simulating the operation of a virtual robot when weighing a virtual object, which is a virtual powder. This makes it possible to generate a trained model without operating a real robot.
  • when collecting training data using a real robot, incidents such as powder scattering are to be expected.
  • the trained model generation device also executes a simulation by randomly varying physical parameters in a virtual space. This allows various environments to be virtually realized in the simulation, and generates a trained model that can handle a variety of situations.
  • the above-mentioned method was applied to weighing four powders.
  • the following table shows the results of this example.
  • the following table shows the weighing results when the above-mentioned method was applied to four powders: flour, rice, salt, and coal.
  • 5 mg, 10 mg, and 15 mg were weighed using a robot.
  • the results below consist of the mean absolute error and standard deviation.
  • in a result written as "0.0 ± 0.4", for example, 0.0 represents the mean absolute error and 0.4 represents the standard deviation.
  • five policies were trained for each type of powder, and five experiments were performed for each policy, yielding 25 experimental results per powder type; the results are shown in the following table. As the table shows, the weighing was performed with high accuracy.
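The reported statistics (five policies × five trials per powder, summarized as mean absolute error ± standard deviation) can be reproduced schematically as follows. Whether the standard deviation is taken over signed or absolute errors is not specified in the text, so absolute errors are assumed here:

```python
import statistics

def summarize(measured, target):
    """Mean absolute error and standard deviation of the absolute errors
    over all trials for one powder (e.g. 5 policies x 5 runs = 25 trials)."""
    errors = [abs(m - target) for m in measured]
    return statistics.mean(errors), statistics.stdev(errors)
```

For instance, four trials measuring 10.0, 10.5, 9.5, and 10.0 against a 10.0 target give a mean absolute error of 0.25.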
  • the following table explains the case where physical parameters in the virtual space are randomly changed in the simulation when generating a trained model.
  • the following table lists the physical parameters that are randomly changed.
  • “Powder friction coefficient” represents the friction coefficient between particles that make up the powder.
  • “Powder particle number” represents the number of particles.
  • “Powder particle radius” represents the radius of a particle.
  • “Powder particle mass” represents the mass of a particle.
  • “Spoon friction coefficient” represents the friction coefficient between the spoon and a particle.
  • “Shake speed weight” is a parameter that represents the strength of the spoon swing.
  • “Gravity” represents gravity.
  • “Goal powder amount” represents the target mass of the powder.
  • the following table shows the weighing results when the physical parameters in the virtual space are randomly changed in a simulation for generating a trained model.
  • the following results are also composed of the mean absolute error and the standard deviation.
  • "Ours” in the following table is the weighing result obtained by the method proposed in this embodiment.
  • "Incline” represents the action of tilting the spoon, and
  • "Shake” represents the action of shaking the spoon.
  • "MLP” and "P-controller” in the second row of the following table represent existing methods.
  • MLP represents a multi-layer neural network model, and "P-controller” represents P control.
  • the weighing results shown in the fourth row of the following table are the weighing results when the listed physical parameters are fixed without being randomly changed.
  • the weighing results shown in the fifth row of the following table are the weighing results obtained by randomly changing the top n physical parameters with the best results among the weighing results shown in the fourth row. As can be seen from the following table, the weighing can be performed more accurately by randomly changing the physical parameters.
  • the first row lists Ours (Incline & Shake),
  • the second row lists MLP and P-Controller, and
  • the third row lists Ours (Incline only) and Ours (Shake only).
  • the target object is a powder, but the present invention is not limited to this.
  • the above embodiment and examples can be applied even if the target object is a granule or a fluid.
  • the configuration of the state data s and the action data a in the above embodiment can be changed as appropriate.
  • a trained model can be generated for each target mass.
  • the group of sensors used to acquire the status data s may include, for example, a camera or a microphone, and the status data s (e.g., the mass of the powder) may be obtained from an image of the amount of powder on the spoon or the sound of the powder dropping from the spoon.
  • the trained model is implemented by a neural network model having an LSTM structure, but the present invention is not limited to this. Any machine learning model may be used.
  • processors in this case include PLDs (Programmable Logic Devices) such as FPGAs (Field-Programmable Gate Arrays) whose circuit configuration can be changed after manufacture, and dedicated electrical circuits such as ASICs (Application Specific Integrated Circuits), which are processors with circuit configurations designed exclusively to execute specific processes.
  • each process may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (for example, multiple FPGAs, or a combination of a CPU and an FPGA, etc.).
  • the hardware structure of these various processors is, more specifically, an electrical circuit that combines circuit elements such as semiconductor elements.
  • the programs are described as being pre-stored (installed) in a storage device, but the present invention is not limited to this.
  • the programs may be provided in a form stored in a storage medium such as a CD-ROM, DVD-ROM, Blu-ray disc, or USB memory.
  • the programs may also be downloaded from an external device via a network.
  • a control device for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising: an acquisition unit that acquires state data representing a state of the robot when weighing the object; a generation unit that generates behavioral data according to the state data acquired by the acquisition unit by inputting the acquired state data to a trained model that is generated in advance and that, when the state data is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot; and a control unit that controls the robot so as to realize an action represented by the behavioral data generated by the generation unit.
  • the robot is provided with a spoon,
  • the state data includes the attitude of the spoon and a target amount of the object.
  • the control device according to claim 1
  • the action of adjusting the amount of the object is an action of dropping the object scooped up by the robot.
  • the control device according to claim 1
  • the action of dropping the object is an action of changing the posture of a specific part of the robot and the position of the specific part of the robot.
  • a trained model generating device comprising:
  • a control method for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising: acquiring state data representing a state of the robot when weighing the object; generating behavioral data according to the acquired state data by inputting the acquired state data to a trained model that is generated in advance and that, when the state data is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot; and controlling the robot so as to realize an action represented by the generated behavioral data.
  • a control method in which the above processing is executed by a computer.
  • (Appendix 9) acquiring learning data obtained by computer simulating the operation of the virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid;
  • a trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
  • a method for generating trained models in which processing is performed by a computer.
  • a control program for causing a computer to execute processing for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, the processing comprising: acquiring state data representing a state of the robot when weighing the object; generating behavioral data according to the acquired state data by inputting the acquired state data to a trained model that is generated in advance and that, when the state data is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot; and controlling the robot so as to realize an action represented by the generated behavioral data.
  • (Appendix 11) acquiring learning data obtained by computer simulating the operation of the virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid;
  • a trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
  • a trained model generation program for allowing a computer to execute processing.
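The weighing accuracy reported earlier is given as a mean absolute error and standard deviation over 25 trials per powder (five trained policies, five runs each). That aggregation can be sketched as follows; the function name and the measurement values are illustrative, not from the source:

```python
import statistics

def mae_and_std(measured_masses, target_mass):
    """Aggregate repeated weighing trials into a mean absolute error and its
    (population) standard deviation, as in the reported result tables."""
    errors = [abs(m - target_mass) for m in measured_masses]
    return statistics.mean(errors), statistics.pstdev(errors)
```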

Landscapes

  • Manipulator (AREA)

Abstract

Provided is a control device for controlling a robot that weighs an object which is a powder, a granular material, or a fluid, the control device acquiring state data representing a state when the robot is weighing the object. The control device generates behavior data by inputting the state data into a pre-generated trained model that, when the state data is input, outputs behavior data including an operation for adjusting the amount of the object that the robot has scooped up. The control device controls the robot such that the operation represented by the behavior data is realized.

Description

制御装置、学習済みモデル生成装置、方法、及びプログラムControl device, trained model generation device, method, and program

 本開示は、制御装置、学習済みモデル生成装置、方法、及びプログラムに関する。 This disclosure relates to a control device, a trained model generation device, a method, and a program.

 従来、塩等の粉体を秤量する動作を行う学習システムが知られている(例えば、文献1:特開2020-194242号公報を参照)。この学習システムのマニピュレータには、エンドエフェクタとしてスプーンが装着されており、塩を秤量する動作を行う(例えば、文献1の段落[0013]を参照)。  A learning system that performs the action of weighing powder such as salt is known (for example, see Reference 1: JP 2020-194242 A). The manipulator of this learning system is equipped with a spoon as an end effector, and performs the action of weighing salt (for example, see paragraph [0013] of Reference 1).

 また、ロボットアームを利用して第1容器から第2容器への注水を行い、液体を秤量する液体秤量方法が知られている(例えば、文献2:特開2021-164980号公報を参照)。この液体秤量方法は、ロボットアームの動作を制御する制御装置が、第1時点に撮影された第1容器を含む画像を取得し、第1時点の第1容器の角度を取得し、第1時点の第2容器の液体量を取得し、取得した画像、角度及び液体量を学習モデルへ入力して、学習モデルが出力する第2時点の角度を取得し、取得した第2時点の角度に応じて、ロボットアームが保持する第1容器の角度を制御する(例えば、文献2の請求項1を参照)。  A liquid weighing method is also known in which a robot arm is used to pour water from a first container into a second container and weigh the liquid (see, for example, Document 2: JP 2021-164980 A). In this liquid weighing method, a control device that controls the operation of the robot arm acquires an image including a first container taken at a first time point, acquires the angle of the first container at the first time point, acquires the amount of liquid in the second container at the first time point, inputs the acquired image, angle, and amount of liquid into a learning model, acquires the angle at a second time point output by the learning model, and controls the angle of the first container held by the robot arm according to the acquired angle at the second time point (see, for example, claim 1 in Document 2).

 また、容器に大容量で入った粉体、粒体又は流体を、別の容器に定められた量だけすくい出して小分けする動作をロボットに学習させる機械学習方法が知られている(例えば、文献3:特開2019-098419号公報を参照)。この機械学習方法は、現実空間において多軸ロボットと対象物とで強化学習させる現実学習プロセスと、シミュレーション空間において、多軸ロボットを疑似的にシミュレートした疑似多軸ロボットと、対象物を疑似的にシミュレートした疑似対象物と、で強化学習させるシミュレーション学習プロセスとからなる(例えば、文献3の要約を参照)。 A machine learning method is also known in which a robot learns to scoop up a large amount of powder, granules, or fluid from a container and divide it into a separate container in a set amount (see, for example, Reference 3: JP 2019-098419 A). This machine learning method consists of a real learning process in which reinforcement learning is performed between a multi-axis robot and an object in a real space, and a simulation learning process in which reinforcement learning is performed between a pseudo multi-axis robot that simulates the multi-axis robot and a pseudo object that simulates the object in a simulation space (see, for example, the summary of Reference 3).

 また、ロボットを用いて粒状媒体を扱う技術が知られている(例えば、文献4:Schenck, C., Tompson, J., Fox, D., and Levine, S., "Learning Robotic Manipulation of Granular Media," In Proceedings of the First Conference on Robotic Learning (CoRL) (to appear), 2017を参照)。文献4には、粒状媒体を掬う際に用いられるロボットの制御モデルが開示されている。 Also, techniques for handling granular media using robots are known (see, for example, Reference 4: Schenck, C., Tompson, J., Fox, D., and Levine, S., "Learning Robotic Manipulation of Granular Media," In Proceedings of the First Conference on Robotic Learning (CoRL) (to appear), 2017). Reference 4 discloses a control model of a robot used to scoop up granular media.

 ところで、ロボットが粉体、粒体、又は流体等の対象物を秤量する際に、ロボットは目標量よりも多い対象物をすくい上げてしまう場合もあり得る。通常ロボットは目標量通りを狙って動作するが様々な誤差により実現は難しく、目標量よりも多い対象物をすくい上げてしまう場合もあり得る。上記文献1-4に開示されている技術は、粉体等を秤量する技術であるものの、目標量よりも多い対象物をすくい上げてしまった場合にロボットにどのような動作をさせるのかについては考慮されていない。 When a robot weighs objects such as powders, granules, or fluids, the robot may end up scooping up more than the target amount. Normally, a robot operates aiming to reach the target amount, but this is difficult to achieve due to various errors, and the robot may end up scooping up more than the target amount. The technology disclosed in the above-mentioned documents 1-4 is a technology for weighing powders, etc., but does not take into consideration how the robot should behave if it scoops up more than the target amount.

 本開示は、上記の点に鑑みてなされたものであり、ロボットが粉体、粒体、又は流体である対象物を秤量する際に、目標量よりも多い対象物をすくい上げてしまった状況下において、対象物の量を調整することを目的とする。 The present disclosure has been made in consideration of the above points, and aims to adjust the amount of an object when a robot scoops up more than the target amount of an object, which may be a powder, granules, or fluid, when weighing the object.

 上記目的を達成するために、本開示の制御装置は、粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御装置であって、前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得する取得部と、前記取得部により取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、前記取得部により取得された前記状態データに応じた前記行動データを生成する生成部と、前記生成部により生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する制御部と、を含む制御装置である。 In order to achieve the above object, the control device of the present disclosure is a control device for controlling a robot that weighs an object, which is a powder, granules, or fluid, and includes: an acquisition unit that acquires state data representing the state of the robot when weighing the object; a generation unit that generates the behavioral data corresponding to the state data acquired by the acquisition unit by inputting the state data acquired by the acquisition unit into a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input; and a control unit that controls the robot so that the action represented by the behavioral data generated by the generation unit is realized.

 また、本開示の制御方法は、粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御方法であって、前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、処理をコンピュータが実行する制御方法である。 The control method disclosed herein is a control method for controlling a robot that weighs an object, which is a powder, granules, or fluid, in which a computer executes a process to acquire status data representing the state of the robot when weighing the object, input the acquired status data to a trained model that is generated in advance and outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the status data is input, thereby generating the behavioral data according to the acquired status data, and controlling the robot so as to realize the action represented by the generated behavioral data.

 また、本開示の制御プログラムは、粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御プログラムであって、前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、処理をコンピュータに実行させるための制御プログラムである。 The control program disclosed herein is a control program for controlling a robot that weighs an object, which is a powder, granules, or fluid, and acquires status data representing the state of the robot when weighing the object, and inputs the acquired status data to a trained model that is generated in advance and outputs behavioral data including an action to adjust the amount of the object scooped up by the robot when the status data is input, thereby generating behavioral data according to the acquired status data, and controlling the robot so as to realize the action represented by the generated behavioral data.

 また、本開示の学習済みモデル生成装置は、仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得する学習用取得部と、前記学習用取得部により取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する学習部と、を含む学習済みモデル生成装置である。 The trained model generating device of the present disclosure is a trained model generating device including a training acquisition unit that acquires training data obtained by computer simulating the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and a learning unit that generates a trained model based on the training data acquired by the training acquisition unit, and that outputs behavioral data including an operation of adjusting the amount of the object scooped up by the robot when state data representing the state when the robot weighs the object that is a powder, granules, or fluid is input.

 また、本開示の学習済みモデル生成方法は、仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、処理をコンピュータが実行する学習済みモデル生成方法である。 The trained model generation method disclosed herein is a trained model generation method in which a computer executes a process to acquire training data obtained by computer simulating the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and generates a trained model based on the acquired training data, in which, when state data representing the state of the robot when weighing the object that is a powder, granules, or fluid is input, behavioral data including an operation of adjusting the amount of the object scooped up by the robot is output.

 また、本開示の学習済みモデル生成プログラムは、仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、処理をコンピュータに実行させるための学習済みモデル生成プログラムである。 The trained model generation program of the present disclosure is a trained model generation program for causing a computer to execute a process that acquires training data obtained by computer simulation of the operation of a virtual robot when weighing a virtual object that is a virtual powder, granules, or fluid, and generates a trained model based on the acquired training data, in which, when state data representing the state of the robot when weighing the object that is a powder, granules, or fluid is input, behavioral data including an operation of adjusting the amount of the object scooped up by the robot is output.

 本開示の制御装置、学習済みモデル生成装置、方法、及びプログラムによれば、ロボットが粉体、粒体、又は流体である対象物を秤量する際に、目標量よりも多い対象物をすくい上げてしまった状況下において、対象物の量を調整することができる。 The control device, trained model generation device, method, and program disclosed herein allow the amount of an object to be adjusted in a situation where the robot scoops up more than the target amount of an object, which may be a powder, granule, or fluid, when weighing the object.

本実施形態の概要を説明するための図である。FIG. 1 is a diagram for explaining an overview of the present embodiment.
本実施形態の概要を説明するための図である。FIG. 2 is a diagram for explaining an overview of the present embodiment.
本実施形態のスプーンの動作を説明するための図である。FIG. 3 is a diagram for explaining the operation of the spoon of the present embodiment.
本実施形態における物理パラメータのランダム化を説明するための図である。FIG. 4 is a diagram for explaining randomization of physical parameters in the present embodiment.
本実施形態に係る学習済みモデル生成装置のハードウェア構成を示すブロック図である。FIG. 5 is a block diagram showing a hardware configuration of the trained model generation device according to the present embodiment.
本実施形態の学習済みモデル生成装置の概略構成を表すブロック図である。FIG. 6 is a block diagram showing a schematic configuration of the trained model generation device of the present embodiment.
本実施形態の学習済みモデルを説明するための図である。FIG. 7 is a diagram for explaining the trained model of the present embodiment.
本実施形態の制御システムの概略構成を表すブロック図である。FIG. 8 is a block diagram showing a schematic configuration of the control system of the present embodiment.
本実施形態に係る制御装置のハードウェア構成を示すブロック図である。FIG. 9 is a block diagram showing a hardware configuration of the control device according to the present embodiment.
本実施形態における学習済みモデル生成処理の流れを示すフローチャートである。FIG. 10 is a flowchart showing the flow of the trained model generation process in the present embodiment.
本実施形態における制御処理の流れを示すフローチャートである。FIG. 11 is a flowchart showing the flow of the control process in the present embodiment.

 以下、本開示の実施形態の一例を、図面を参照しつつ説明する。本実施形態では、本開示に係る制御装置を搭載した制御システムを例に説明する。なお、各図面において同一又は等価な構成要素及び部分には同一の参照符号を付与している。また、図面の寸法及び比率は、説明の都合上誇張されており、実際の比率とは異なる場合がある。 Below, an example of an embodiment of the present disclosure will be described with reference to the drawings. In this embodiment, a control system equipped with a control device according to the present disclosure will be described as an example. Note that the same reference symbols are used for identical or equivalent components and parts in each drawing. Also, the dimensions and proportions in the drawings have been exaggerated for the convenience of explanation and may differ from the actual proportions.

<実施形態の概要>
 図1及び図2は、本実施形態の概要を説明するための図である。図1に示されるように、本実施形態のロボット24は、容器Ctに充填されている粉体の秤量をする。また、本実施形態では、図2に示されるように、コンピュータシミュレーションを実行することにより得られる学習用データを用いて、粉体を秤量する際に用いられる学習済みモデルを生成する。具体的には、図2に示されるように、学習フェーズTrでは、シミュレータを実行することにより得られる仮想の状態データSv及び仮想の行動データAvに基づいて、強化学習を実行することにより方策を生成する。なお、この方策は、既知の機械学習モデルによって実現される。
<Overview of the embodiment>
FIGS. 1 and 2 are diagrams for explaining an overview of this embodiment. As shown in FIG. 1, a robot 24 of this embodiment weighs powder filled in a container Ct. In addition, in this embodiment, as shown in FIG. 2, a trained model used when weighing powder is generated using learning data obtained by executing a computer simulation. Specifically, as shown in FIG. 2, in the learning phase Tr, a policy is generated by executing reinforcement learning based on virtual state data Sv and virtual behavior data Av obtained by executing a simulator. Note that this policy is realized by a known machine learning model.

 そして、運用フェーズOpにおいて、ロボットは、この方策を利用して粉体を秤量する動作を実行する。具体的には、現在の状態データSが方策へ入力されると、方策は新たな行動データを出力する。ロボットは、行動データに応じた動作をした後、電子天秤を用いて粉体を秤量Wする。ロボットの動作により、新たな状態データSが生成され、その状態データが方策へ入力され、方策から次の行動データが出力される。運用フェーズOpにおいてこのサイクルが実行されることにより、ロボットによる粉体の秤量が実行される。 Then, in the operation phase Op, the robot executes the operation of weighing the powder using this strategy. Specifically, when the current state data S is input to the strategy, the strategy outputs new action data. The robot performs an operation according to the action data, and then weighs W the powder using an electronic balance. New state data S is generated by the robot's operation, and this state data is input to the strategy, and the next action data is output from the strategy. By executing this cycle in the operation phase Op, the robot weighs the powder.
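The operation-phase cycle described above (state into the policy, action out, robot motion, weighing, new state) can be sketched as follows. The functions `policy`, `weigh`, and `execute` are hypothetical stand-ins for the trained model, the electronic balance, and the robot interface; they are not named in the embodiment:

```python
def run_weighing_episode(policy, weigh, execute, goal_mass, steps=10):
    """Operation-phase cycle: state -> policy -> action -> robot motion -> weigh.

    policy(state)  -> (a_incline, a_shake)   # trained model (hypothetical interface)
    weigh()        -> current powder mass from the electronic balance
    execute(a)     -> moves the robot's spoon according to the action
    """
    spoon_angle = 0.0
    history = []
    for _ in range(steps):
        current_mass = weigh()                          # weigh the powder
        state = (current_mass, spoon_angle, goal_mass)  # s = [w_current, theta_spoon, w_goal]
        a_incline, a_shake = policy(state)              # next action from the policy
        execute((a_incline, a_shake))                   # realize the action on the robot
        spoon_angle += a_incline                        # a_incline is a relative pitch angle
        history.append((state, (a_incline, a_shake)))
    return history
```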

 なお、ロボットが粉体である対象物を秤量する際に、目標量よりも多い対象物をすくい上げてしまう場合もあり得る。本実施形態では、このような場合を想定し、ロボットがすくい上げた対象物を落とすことにより対象物の量を調整する。なお、すくい上げた対象物の量が目標量よりも少なかった場合は、ロボットは対象物をすくい上げる動作を再度行ってもよい。以下、具体的に説明する。 When the robot weighs a powder object, it may scoop up more than the target amount. In this embodiment, such a case is assumed, and the robot adjusts the amount of the object by dropping the object it has scooped up. If the amount of the object scooped up is less than the target amount, the robot may scoop up the object again. This is explained in detail below.
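The branching just described (drop when too much has been scooped, scoop again when too little) can be sketched as a small decision function. The `tolerance` acceptance band is a hypothetical parameter; the embodiment does not specify one:

```python
def adjustment_action(current_mass, goal_mass, tolerance=0.1):
    """Choose how to adjust the scooped amount (sketch; `tolerance` is an
    assumed acceptance band, not taken from the embodiment)."""
    if current_mass > goal_mass + tolerance:
        return "drop"     # scooped too much: drop some of the object
    if current_mass < goal_mass - tolerance:
        return "rescoop"  # scooped too little: scoop again
    return "done"
```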

<問題定式化>
 本実施形態では強化学習を用いて、上述した方策が反映された学習済みモデルを生成する。また、本実施形態では、マルコフ決定プロセス(S,A,μ,P,r,γ)により問題が定義される。Sは環境から取得される状態の集合であり、Aは選択可能な行動の集合である。P ss’は、状態sが観測された状況下において行動aが選択され、次の時間ステップにおける状態s’へと遷移する確率を表す。r ss’は、状態sが観測された状況下において行動aが選択され、次の時間ステップにおける状態s’へ遷移した際の報酬を表す。γ∈[0,1)は割引率である。方策π(a|s)は状態sが与えられた状況下において行動aが選択される確率である。強化学習の目的は、以下の式(1)によって表される報酬の総和を最大化するような、最適な方策πを見つけることである。なお、tは時刻を表す。
<Problem formulation>
In this embodiment, reinforcement learning is used to generate a trained model reflecting the above-mentioned policy. In this embodiment, the problem is defined by a Markov decision process (S, A, μ, P, r, γ). S is a set of states acquired from the environment, and A is a set of selectable actions. P^a_{ss'} represents the probability that action a is selected in a situation in which state s is observed, and transitions to state s' in the next time step. r^a_{ss'} represents the reward when action a is selected in a situation in which state s is observed, and transitions to state s' in the next time step. γ ∈ [0, 1) is a discount rate. Policy π(a|s) is the probability that action a is selected in a situation in which state s is given. The purpose of reinforcement learning is to find an optimal policy π* that maximizes the sum of rewards represented by the following formula (1). Note that t represents time.


  π* = arg max_π E[ Σ_{t=0}^{∞} γ^t r_{s_t} ]                    (1)

 なお、上記式(1)のrstは、以下の式(2)によって表される。 Incidentally, r_{s_t} in the above formula (1) is expressed by the following formula (2).


  r_{s_t} = Σ_{s'} P^{a_t}_{s_t s'} r^{a_t}_{s_t s'}                    (2)
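The discounted sum of rewards that the optimal policy maximizes in formula (1) can be computed for a finite reward sequence as in the following sketch (a finite-horizon truncation of the infinite sum):

```python
def discounted_return(rewards, gamma=0.99):
    """Finite-horizon value of the objective in formula (1): sum_t gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```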

 本実施形態の粉体の秤量では、状態sは、現在の粉体の質量wcurrentと、粉体の目標質量wgoalと、スプーンの傾き角θspoonとを要素とするベクトル[wcurrent,θspoon,wgoal]によって表される。本実施形態では、任意の目標質量の粉体を秤量するために、状態sには粉体の目標質量wgoalが組み込まれる。行動aは、スプーンを傾ける行動ainclineと、スプーンを振る行動ashakeとを要素とするベクトル[aincline,ashake]によって表される。なお、行動ainclineは、スプーンの相対的なピッチ角に相当する。 In weighing powder in this embodiment, the state s is represented by a vector [w current , θ spoon , w goal ] whose elements are the current powder mass w current , the target powder mass w goal , and the spoon tilt angle θ spoon . In this embodiment, in order to weigh powder of an arbitrary target mass, the state s incorporates the target powder mass w goal . The action a is represented by a vector [a incline , a shake ] whose elements are the action a incline of tilting the spoon and the action a shake of the spoon . The action a incline corresponds to the relative pitch angle of the spoon.
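The state and action vectors defined here can be written out directly; the helper function names below are illustrative only, while the vector contents follow the definitions in the text:

```python
def make_state(w_current, theta_spoon, w_goal):
    # s = [w_current, theta_spoon, w_goal]; the target mass w_goal is part of
    # the state so that a single policy can weigh arbitrary target masses.
    return [w_current, theta_spoon, w_goal]

def make_action(a_incline, a_shake):
    # a = [a_incline, a_shake]; a_incline is the spoon's relative pitch angle,
    # a_shake the strength of the back-and-forth shaking motion.
    return [a_incline, a_shake]
```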

 図3は、スプーンを傾ける行動ainclineとスプーンを振る行動ashakeとを説明するための図である。図3に示されているように、スプーンを振る行動ashakeはスプーンを前後に動かすような動作であり、スプーンを振る行動ashakeはスプーンの位置を前後に動かすような動作であり、スプーンを傾ける行動ainclineはスプーンの姿勢を変化させる動作である。スプーンを振る行動ashakeは、スプーンの移動距離と移動加速度とによって表現可能である。このため、スプーンを振る行動ashakeは、スプーンの移動距離と移動加速度との比例係数によって表される。 FIG. 3 is a diagram for explaining the spoon tilting action a incline and the spoon shaking action a shake . As shown in FIG. 3, the spoon shaking action a shake is an action of moving the position of the spoon back and forth, and the spoon tilting action a incline is an action of changing the spoon's posture. The spoon shaking action a shake can be expressed by the spoon's moving distance and movement acceleration. Therefore, the spoon shaking action a shake is expressed by a proportional coefficient between the spoon's moving distance and movement acceleration.

 このため、本実施形態の状態sと行動aとは、以下のように定義される。 Therefore, state s and action a in this embodiment are defined as follows:

  s = [w_current, θ_spoon, w_goal]
  a = [a_incline, a_shake]

 また、報酬rは、以下の式(3)に示されるように、目標質量wgoalと現在の粉体の質量wcurrentとの間の差分として定義される。 Additionally, the reward r is defined as the difference between the goal mass w goal and the current powder mass w current , as shown in equation (3) below.


  r = -|w_goal - w_current|                    (3)
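The reward of equation (3) penalizes deviation of the current mass from the target mass. A minimal sketch, assuming the common negative-absolute-difference form of this reward:

```python
def reward(w_goal, w_current):
    # Assumed concrete form of Eq. (3): the negative absolute difference, so the
    # reward is largest (zero) exactly when the scooped mass equals the target.
    return -abs(w_goal - w_current)
```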

(コンピュータシミュレーションによる学習済みモデルの生成)
 本実施形態では、機械学習モデルを用いて方策π(a|s)を実現する。具体的には、本実施形態では、機械学習モデルの一例であるLSTM(long short-term memory)構造を有するニューラルネットワークモデルを用いて方策π(a|s)を実現する。LSTM構造を有するニューラルネットワークモデルは、粉体の時系列状態を考慮することが可能である。
(Generation of trained models through computer simulation)
In this embodiment, the policy π*(a|s) is realized using a machine learning model. Specifically, in this embodiment, the policy π*(a|s) is realized using a neural network model having a long short-term memory (LSTM) structure, which is an example of a machine learning model. A neural network model having the LSTM structure can take into account the time-series state of the powder.
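As a self-contained illustration of the recurrence that lets an LSTM carry the powder's time-series state across weighing steps, the following sketches a single LSTM cell update for scalar inputs. A real policy would use vector-valued gates in a deep-learning framework; all weights here are hypothetical:

```python
import math

def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM step for scalar input and state (illustration only).

    w maps a gate name -> (w_x, w_h, b). Gates: input i, forget f, output o,
    and candidate g. The cell state c carries long-term (time-series) memory."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    gate = lambda k, f: f(w[k][0] * x + w[k][1] * h_prev + w[k][2])
    i, f_, o = gate("i", sig), gate("f", sig), gate("o", sig)
    g = gate("g", math.tanh)
    c = f_ * c_prev + i * g        # forget part of the old memory, write the new candidate
    h = o * math.tanh(c)           # exposed hidden state fed to the next step
    return h, c
```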

 上述したように、本実施形態では、コンピュータシミュレーションを実行することにより得られる学習用データを用いて、LSTM構造を有するニューラルネットワークモデルを学習させることにより、粉体を秤量する際に用いられる学習済みモデルを生成する。 As described above, in this embodiment, a neural network model having an LSTM structure is trained using training data obtained by executing a computer simulation, thereby generating a trained model to be used when weighing powder.

 なお、コンピュータシミュレーションを実行する際には、シミュレーション上の仮想空間内の物理パラメータをランダムに変化させる。具体的には、粉体を構成する粒子間の摩擦係数、粒子の数、粒子の半径、粒子の質量、スプーンと粒子との間の摩擦係数、スプーンの振りの強さを表すパラメータ、重力、及び粉体の目標質量をランダムに変化させることによりシミュレーションを実行する。仮想空間内の物理パラメータをランダムに変化させることにより、シミュレーションにおいて様々な環境が仮想的に実現され、様々な状況に対応可能な学習済みモデルが生成される。 When running the computer simulation, physical parameters in the virtual space of the simulation are randomly changed. Specifically, the simulation is run by randomly changing the coefficient of friction between the particles that make up the powder, the number of particles, the radius of the particles, the mass of the particles, the coefficient of friction between the spoon and the particles, the parameter representing the strength of the spoon swing, gravity, and the target mass of the powder. By randomly changing the physical parameters in the virtual space, various environments are virtually realized in the simulation, and a trained model that can respond to a variety of situations is generated.
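The domain randomization described above can be sketched as sampling one environment per simulation episode. The parameter set follows the list in the text, but the sampling ranges below are hypothetical placeholders; the embodiment says only that these parameters are varied randomly, not over which intervals:

```python
import random

# Ranges are illustrative assumptions, not values from the embodiment.
PARAM_RANGES = {
    "powder_friction": (0.1, 1.0),   # friction coefficient between powder particles
    "particle_number": (500, 2000),  # number of simulated particles
    "particle_radius": (0.5, 2.0),   # particle radius
    "particle_mass":   (0.5, 5.0),   # particle mass
    "spoon_friction":  (0.1, 1.0),   # friction coefficient between spoon and particles
    "shake_strength":  (0.5, 2.0),   # weight on the spoon-shaking motion
    "gravity":         (0.5, 9.8),   # may be set far below real gravitational acceleration
    "goal_mass":       (5.0, 15.0),  # target powder mass
}

def sample_simulation_params(rng=random):
    """Draw one randomized virtual environment for a simulation episode."""
    params = {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}
    params["particle_number"] = int(params["particle_number"])  # count must be an integer
    return params
```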

 図4は、仮想空間内の物理パラメータを変化させることを説明するための図である。図4に示されるように、シミュレーションを実行することにより学習済みモデルを生成する学習フェーズTrにおいては、仮想空間内の様々な物理パラメータをランダムに変化させつつ(図4におけるRd)、仮想空間内のスプーンを動かしてスプーン上の粉体の量を調整する動作を学習済みモデルへ反映させる。そして、図4に示されるように、運用フェーズOpにおいては、学習済みモデルを用いて、現実空間のスプーンを動作させることによりスプーン上の粉体の量を調整する動作が実行される。 FIG. 4 is a diagram for explaining changing physical parameters in virtual space. As shown in FIG. 4, in the learning phase Tr in which a trained model is generated by executing a simulation, various physical parameters in the virtual space are randomly changed (Rd in FIG. 4) while the action of moving the spoon in the virtual space to adjust the amount of powder on the spoon is reflected in the trained model. Then, as shown in FIG. 4, in the operation phase Op, the trained model is used to perform the action of adjusting the amount of powder on the spoon by moving the spoon in real space.

 なお、粉体を構成する粒子間の摩擦係数は、粉体の流れにくさを表すパラメータでもある。また、スプーンと粒子との間の摩擦係数は、スプーンに残る粉体の量に影響を及ぼすパラメータでもある。また、スプーンの振りの強さを表すパラメータをランダムに変化させるのは、シミュレータと現実とでロボットによるスプーンの振りの強さが異なるため、それを学習済みモデルへ反映させるためである。また、重力をランダムに変化させるのは、粉体が舞う現象を再現するためである。なお、粉体が舞う現象をより多く再現するために、仮想空間内の重力は、現実の重力加速度よりも極めて小さい値に設定するようにしてもよい。また、目標質量をランダムに変化させるのは、任意の目標質量で粉体を秤量するためである。 The coefficient of friction between the particles that make up the powder is also a parameter that indicates how difficult it is for the powder to flow. The coefficient of friction between the spoon and the particles is also a parameter that affects the amount of powder remaining on the spoon. The parameter that indicates the strength of the spoon swing is changed randomly in order to reflect the difference in strength of the spoon swing by the robot between the simulator and reality in the trained model. Gravity is changed randomly in order to reproduce the phenomenon of powder flying. In order to reproduce the phenomenon of powder flying as much as possible, gravity in the virtual space may be set to a value that is much smaller than the gravitational acceleration in reality. The target mass is changed randomly in order to weigh the powder at an arbitrary target mass.

(学習済みモデル生成装置10)
 図5は、本実施形態に係る学習済みモデル生成装置10のハードウェア構成を示すブロック図である。図5に示されるように、学習済みモデル生成装置10は、CPU(Central Processing Unit)42、メモリ44、記憶装置46、入出力I/F(Interface)48、記憶媒体読取装置50、及び通信I/F52を有する。各構成は、バス54を介して相互に通信可能に接続されている。
(Trained model generation device 10)
Fig. 5 is a block diagram showing a hardware configuration of the trained model generation device 10 according to the present embodiment. As shown in Fig. 5, the trained model generation device 10 has a CPU (Central Processing Unit) 42, a memory 44, a storage device 46, an input/output I/F (Interface) 48, a storage medium reading device 50, and a communication I/F 52. Each component is connected to each other via a bus 54 so as to be able to communicate with each other.

 記憶装置46には、後述する各処理を実行するための学習済みモデル生成プログラムが格納されている。CPU42は、中央演算処理ユニットであり、各種プログラムを実行したり、各構成を制御したりする。すなわち、CPU42は、記憶装置46からプログラムを読み出し、メモリ44を作業領域としてプログラムを実行する。CPU42は、記憶装置46に記憶されているプログラムに従って、上記各構成の制御及び各種の演算処理を行う。 The storage device 46 stores a trained model generation program for executing each process described below. The CPU 42 is a central processing unit, and executes various programs and controls each component. That is, the CPU 42 reads the program from the storage device 46 and executes the program using the memory 44 as a working area. The CPU 42 controls each of the components and performs various calculation processes according to the program stored in the storage device 46.

 メモリ44は、RAM(Random Access Memory)により構成され、作業領域として一時的にプログラム及びデータを記憶する。記憶装置46は、ROM(Read Only Memory)、及びHDD(Hard Disk Drive)、SSD(Solid State Drive)等により構成され、オペレーティングシステムを含む各種プログラム、及び各種データを格納する。 Memory 44 is made up of RAM (Random Access Memory) and serves as a working area to temporarily store programs and data. Storage device 46 is made up of ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), etc., and stores various programs including the operating system, and various data.

 入出力I/F48は、外部からのデータの入力、及び外部へのデータの出力を行うインタフェースである。また、例えば、キーボードやマウス等の、各種の入力を行うための入力装置、及び、例えば、ディスプレイやプリンタ等の、各種の情報を出力するための出力装置が接続されてもよい。出力装置として、タッチパネルディスプレイを採用することにより、入力装置として機能させてもよい。 The input/output I/F 48 is an interface for inputting data from the outside and outputting data to the outside. In addition, input devices for various inputs, such as a keyboard or mouse, and output devices for outputting various information, such as a display or printer, may be connected. A touch panel display may be used as the output device to function as an input device.

 記憶媒体読取装置50は、CD(Compact Disc)-ROM、DVD(Digital Versatile Disc)-ROM、ブルーレイディスク、USB(Universal Serial Bus)メモリ等の各種記憶媒体に記憶されたデータの読み込みや、記憶媒体に対するデータの書き込み等を行う。 The storage medium reader 50 reads data stored in various storage media such as CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, Blu-ray Disc, and USB (Universal Serial Bus) memory, and writes data to the storage media.

 通信I/F52は、他の機器と通信するためのインタフェースであり、例えば、イーサネット(登録商標)、FDDI、Wi-Fi(登録商標)等の規格が用いられる。 The communication I/F 52 is an interface for communicating with other devices, and uses standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark).

 次に、学習済みモデル生成装置10の機能構成について説明する。図6に示されるように、学習済みモデル生成装置10は、機能的には、シミュレーション部12と、学習用取得部14と、学習部16とを含む。また、学習済みモデル生成装置10の所定の記憶領域には、学習済みモデル記憶部18が設けられている。各機能構成は、CPU42が記憶装置46に記憶された各プログラムを読み出し、メモリ44に展開して実行することにより実現される。 Next, the functional configuration of the trained model generation device 10 will be described. As shown in FIG. 6, the trained model generation device 10 functionally includes a simulation unit 12, a learning acquisition unit 14, and a learning unit 16. A trained model storage unit 18 is also provided in a specified storage area of the trained model generation device 10. Each functional configuration is realized by the CPU 42 reading out each program stored in the storage device 46, expanding it in the memory 44, and executing it.

 学習済みモデル記憶部18には、後述する処理によって生成された学習済みモデルが格納される。上述したように、本実施形態の学習済みモデルは、LSTM構造を有するニューラルネットワークモデルである。図7は、本実施形態の学習済みモデルを説明するための図である。図7に示されるように、本実施形態の学習済みモデルは、方策π*(a|s)を実現するモデルであり、状態データsが入力されると行動データaが出力されるようなモデルである。 The trained model storage unit 18 stores trained models generated by the process described below. As described above, the trained model of this embodiment is a neural network model having an LSTM structure. FIG. 7 is a diagram for explaining the trained model of this embodiment. As shown in FIG. 7, the trained model of this embodiment is a model that realizes a policy π*(a|s), and is a model that outputs action data a when state data s is input.
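A recurrent policy of this kind can be sketched minimally as one LSTM cell followed by a linear output head. This is an illustrative NumPy sketch, not the patent's network: the layer sizes, random weights, and the deterministic tanh output head are all assumptions; the actual model is trained by the reinforcement learning described below.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMPolicy:
    """Minimal recurrent policy: state s in, action a out.

    Following the text, s = [w_current, theta_spoon, w_goal] (dim 3) and
    a = [a_incline, a_shake] (dim 2); the hidden size of 16 is arbitrary.
    """

    def __init__(self, state_dim=3, hidden_dim=16, action_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the four LSTM gates (input, forget, cell, output).
        self.W = rng.normal(0, 0.1, (4 * hidden_dim, state_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.W_out = rng.normal(0, 0.1, (action_dim, hidden_dim))
        self.h = np.zeros(hidden_dim)   # hidden state, carried across time steps
        self.c = np.zeros(hidden_dim)   # cell state

    def step(self, s):
        """One control step of the policy."""
        z = self.W @ np.concatenate([s, self.h]) + self.b
        i, f, g, o = np.split(z, 4)
        self.c = sigmoid(f) * self.c + sigmoid(i) * np.tanh(g)
        self.h = sigmoid(o) * np.tanh(self.c)
        return np.tanh(self.W_out @ self.h)  # action components in (-1, 1)
```

Because the hidden and cell states persist between calls, the policy can condition its next action on the history of states it has seen, which is the point of using an LSTM here.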

 シミュレーション部12は、仮想空間内の仮想ロボットが仮想の粉体である仮想対象物を秤量する際の動作に関するコンピュータシミュレーションを実行する。仮想ロボットには仮想のスプーンが設置されており、仮想ロボットは仮想のスプーンを用いて仮想の粉体を秤量する。なお、シミュレーション部12は、コンピュータシミュレーションを実行する際に、仮想空間内の物理パラメータをランダムに変化させる。 The simulation unit 12 executes a computer simulation of the operation of a virtual robot in a virtual space when weighing a virtual object, which is a virtual powder. A virtual spoon is installed on the virtual robot, and the virtual robot weighs the virtual powder using the virtual spoon. Note that the simulation unit 12 randomly changes physical parameters in the virtual space when executing the computer simulation.

 学習用取得部14は、学習用データを取得する。学習用データは、仮想ロボットが仮想の粉体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られるデータである。なお、学習用データは、例えば、仮想ロボットの特定箇所(例えば、スプーンが取り付けられている箇所)の位置及び姿勢、仮想のスプーンの位置及び姿勢、仮想のスプーン上に存在する仮想対象物の量、及びある状態sにおいてある行動aを選択した際の報酬r等のデータによって構成される。 The learning acquisition unit 14 acquires learning data. The learning data is data obtained by computer simulating the operation of the virtual robot when weighing a virtual object, which is a virtual powder. The learning data is composed of data such as the position and posture of a specific part of the virtual robot (e.g., the part where the spoon is attached), the position and posture of the virtual spoon, the amount of virtual object present on the virtual spoon, and the reward r when a certain action a is selected in a certain state s.
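One sample of such learning data might be organized as a simple record. This structure and its field names are hypothetical; they merely mirror the items enumerated above (robot pose, spoon pose, amount on the spoon, and the reward r for action a in state s).

```python
from dataclasses import dataclass

@dataclass
class Transition:
    """One training sample collected from the simulation (illustrative)."""
    robot_pose: tuple        # position and posture of the spoon attachment point
    spoon_pose: tuple        # position and posture of the virtual spoon
    powder_on_spoon: float   # amount of virtual powder on the spoon [kg]
    state: tuple             # state s observed by the policy
    action: tuple            # action a selected in state s
    reward: float            # reward r received for taking a in s
```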

 学習部16は、学習用取得部14により取得された学習用データに基づいて学習済みモデルを生成する。学習済みモデルは、ロボットが粉体を秤量する際の状態データsが入力されると当該ロボットがすくい上げた対象物の量を調整する動作を含む行動データaを出力する。 The learning unit 16 generates a trained model based on the training data acquired by the learning acquisition unit 14. When state data s is input when the robot weighs powder, the trained model outputs behavior data a including an action to adjust the amount of the object scooped up by the robot.

 具体的には、学習部16は、上記式(1)に示されている報酬関数が最大となるように、LSTM構造を有するニューラルネットワークモデルを深層強化学習させる。例えば、当該深層強化学習は、現在の粉体の質量が目標量に近づくように学習させることであってもよい。そして、学習部16は、得られた学習済みモデルを、学習済みモデル記憶部18へ格納する。なお、報酬関数は上記式(1)に限定されるものではなく、他の形式の報酬関数であってもよい。例えば、スプーンから粉体を落としすぎてしまい、スプーン上の粉体の量が目標量から遠ざかってしまうような場合を抑制するために、そのような行動に対して大きなペナルティを与えるような項を上記式(1)へ加えてもよい。この場合には、学習済みモデルは、すくい上げる粉体の量を目標量へ近づけると共に、粉体の落としすぎを抑制するといった行動をとる確率が高くなり、対象物の秤量にとってはより好ましい。 Specifically, the learning unit 16 performs deep reinforcement learning on a neural network model having an LSTM structure so that the reward function shown in the above formula (1) is maximized. For example, the deep reinforcement learning may be learning so that the current mass of powder approaches a target amount. The learning unit 16 then stores the obtained trained model in the trained model storage unit 18. Note that the reward function is not limited to the above formula (1) and may be another type of reward function. For example, in order to prevent a case where too much powder is dropped from the spoon and the amount of powder on the spoon becomes far from the target amount, a term that imposes a large penalty on such behavior may be added to the above formula (1). In this case, the trained model is more likely to take an action to bring the amount of powder scooped up closer to the target amount and prevent too much powder from being dropped, which is more favorable for weighing the object.
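The shape of such a reward can be illustrated as follows. Note that equation (1) itself is not reproduced in this passage, so this sketch is an assumption: a base term that grows as the mass on the spoon approaches the target, plus the optional penalty term discussed above for dropping so much powder that the amount falls below the target.

```python
def reward(w_current, w_goal, overshoot_penalty=10.0):
    """Illustrative per-step reward (not the patent's actual equation (1)).

    The base term rewards being close to the target mass; the extra term
    heavily penalizes undershooting the target by dropping too much powder.
    """
    r = -abs(w_current - w_goal)          # closer to the target -> larger reward
    if w_current < w_goal:                # dropped too much powder off the spoon
        r -= overshoot_penalty * (w_goal - w_current)
    return r
```

With the penalty term, an action that leaves the spoon slightly above the target scores better than one that drops below it by the same margin, biasing the learned policy toward cautious adjustment.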

 これにより、ロボットが粉体を秤量する際の状態データsが入力されると当該ロボットがすくい上げた対象物の量を調整する動作を含む行動データaを出力する学習済みモデルが得られたことになり、この学習済みモデルを用いて現実のロボットを制御することが可能となる。 As a result, a trained model is obtained that, when state data s representing the robot weighing powder is input, outputs behavior data a including an action to adjust the amount of the object scooped up by the robot. This trained model can then be used to control a real robot.

(制御システム20)
 図8は、本実施形態の制御システム20の概略構成を表すブロック図である。図8に示されるように、制御システム20は、センサ群22と、ロボット24と、ロボット24に設置されたスプーン26と、制御装置30とを備えている。本実施形態に係る制御装置30は、学習済みモデル生成装置10によって生成された学習済みモデルを用いてロボット24の動作を制御する。なお、スプーン26は、量を調整する対象物を取得できるものであれば何でもよく、薬さじやロボットが持てるカップ状のようなものの他、ロボットの指に当該対象物が収まるくぼみなどがある構造であってもよい。
(Control System 20)
Fig. 8 is a block diagram showing a schematic configuration of the control system 20 of this embodiment. As shown in Fig. 8, the control system 20 includes a sensor group 22, a robot 24, a spoon 26 installed on the robot 24, and a control device 30. The control device 30 according to this embodiment controls the operation of the robot 24 using the trained model generated by the trained model generation device 10. The spoon 26 may be anything that can obtain an object whose amount is to be adjusted, and may be a medicine spoon, a cup-like object that the robot can hold, or a structure having a recess in which the object can be placed in the fingers of the robot.

 センサ群22は、ロボット24の状態、スプーン26の状態、スプーン26上の対象物の状態、及び容器内の対象物の状態を逐次検知する。センサ群22は、例えば、スプーン上又は容器内に存在する対象物の質量を計測する電子天秤、ロボット24の特定箇所の姿勢及び位置を検知するセンサ、及びスプーン26の姿勢及び位置を検知するセンサ等を含んで構成される。なお、計測対象となるのは対象物の質量に限定されるものではない。例えば、対象物を撮影した画像から、公知の画像処理技術を活用して推定した量(例えば、対象物の体積等)を対象物の状態とするようにしてもよい。また、スプーンから落とした対象物の量は、対象物を落とした際に発する音を公知の音声処理技術を活用して推定した対象物の量であってもよい。また、検知対象の対象物の状態は、スプーン26上の対象物の状態又は容器内の対象物の状態に限らず、スプーン26から落とした対象物の量であってもよい。 The sensor group 22 sequentially detects the state of the robot 24, the state of the spoon 26, the state of the object on the spoon 26, and the state of the object in the container. The sensor group 22 is composed of, for example, an electronic balance that measures the mass of the object on the spoon or in the container, a sensor that detects the posture and position of a specific part of the robot 24, and a sensor that detects the posture and position of the spoon 26. The measurement target is not limited to the mass of the object. For example, the state of the object may be an amount (e.g., the volume of the object) estimated from an image of the object using a known image processing technique. The amount of the object dropped from the spoon may be the amount of the object estimated using a known sound processing technique from the sound made when the object is dropped. The state of the object to be detected is not limited to the state of the object on the spoon 26 or the state of the object in the container, but may also be the amount of the object dropped from the spoon 26.

 ロボット24は、後述する制御装置30から出力される制御指令に応じて動作する。スプーン26は、例えば、図1に示されるようにロボット24に取り付けられ、ロボット24の動作に応じて容器内の対象物をすくい上げる。 The robot 24 operates in response to control commands output from the control device 30, which will be described later. The spoon 26 is attached to the robot 24, for example as shown in FIG. 1, and scoops up objects in the container in response to the operation of the robot 24.

 図9は、本実施形態に係る制御装置30のハードウェア構成を示すブロック図である。図9に示されるように、制御装置30は、CPU(Central Processing Unit)62、メモリ64、記憶装置66、入出力I/F(Interface)68、記憶媒体読取装置70、及び通信I/F72を有する。各構成は、バス74を介して相互に通信可能に接続されている。 FIG. 9 is a block diagram showing the hardware configuration of the control device 30 according to this embodiment. As shown in FIG. 9, the control device 30 has a CPU (Central Processing Unit) 62, a memory 64, a storage device 66, an input/output I/F (Interface) 68, a storage medium reading device 70, and a communication I/F 72. Each component is connected to each other so as to be able to communicate with each other via a bus 74.

 記憶装置66には、後述する各処理を実行するための制御プログラムが格納されている。CPU62は、中央演算処理ユニットであり、各種プログラムを実行したり、各構成を制御したりする。すなわち、CPU62は、記憶装置66からプログラムを読み出し、メモリ64を作業領域としてプログラムを実行する。CPU62は、記憶装置66に記憶されているプログラムに従って、上記各構成の制御及び各種の演算処理を行う。 The storage device 66 stores control programs for executing the various processes described below. The CPU 62 is a central processing unit, and executes various programs and controls each component. That is, the CPU 62 reads the programs from the storage device 66 and executes the programs using the memory 64 as a working area. The CPU 62 controls each of the components and performs various calculation processes according to the programs stored in the storage device 66.

 メモリ64は、RAM(Random Access Memory)により構成され、作業領域として一時的にプログラム及びデータを記憶する。記憶装置66は、ROM(Read Only Memory)、及びHDD(Hard Disk Drive)、SSD(Solid State Drive)等により構成され、オペレーティングシステムを含む各種プログラム、及び各種データを格納する。 Memory 64 is made up of RAM (Random Access Memory) and serves as a working area to temporarily store programs and data. Storage device 66 is made up of ROM (Read Only Memory), HDD (Hard Disk Drive), SSD (Solid State Drive), etc., and stores various programs including the operating system, and various data.

 入出力I/F68は、外部からのデータの入力、及び外部へのデータの出力を行うインタフェースである。また、例えば、キーボードやマウス等の、各種の入力を行うための入力装置、及び、例えば、ディスプレイやプリンタ等の、各種の情報を出力するための出力装置が接続されてもよい。出力装置として、タッチパネルディスプレイを採用することにより、入力装置として機能させてもよい。 The input/output I/F 68 is an interface for inputting data from the outside and outputting data to the outside. In addition, input devices for performing various inputs, such as a keyboard or mouse, and output devices for outputting various information, such as a display or printer, may be connected. A touch panel display may be used as the output device to function as an input device.

 記憶媒体読取装置70は、CD(Compact Disc)-ROM、DVD(Digital Versatile Disc)-ROM、ブルーレイディスク、USB(Universal Serial Bus)メモリ等の各種記憶媒体に記憶されたデータの読み込みや、記憶媒体に対するデータの書き込み等を行う。 The storage medium reader 70 reads data stored in various storage media such as CD (Compact Disc)-ROM, DVD (Digital Versatile Disc)-ROM, Blu-ray Disc, and USB (Universal Serial Bus) memory, and writes data to the storage media.

 通信I/F72は、他の機器と通信するためのインタフェースであり、例えば、イーサネット(登録商標)、FDDI、Wi-Fi(登録商標)等の規格が用いられる。 The communication I/F 72 is an interface for communicating with other devices, and uses standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark).

 次に、制御装置30の機能構成について説明する。図8に示されるように、制御装置30は、機能的には、取得部34と、生成部36と、制御部38とを含む。また、制御装置30の所定の記憶領域には、学習済みモデル記憶部32が設けられている。各機能構成は、CPU62が記憶装置66に記憶された各プログラムを読み出し、メモリ64に展開して実行することにより実現される。 Next, the functional configuration of the control device 30 will be described. As shown in FIG. 8, the control device 30 functionally includes an acquisition unit 34, a generation unit 36, and a control unit 38. A trained model storage unit 32 is provided in a predetermined storage area of the control device 30. Each functional configuration is realized by the CPU 62 reading out each program stored in the storage device 66, expanding it in the memory 64, and executing it.

 学習済みモデル記憶部32には、学習済みモデル生成装置10によって生成された学習済みモデルが格納される。 The trained model storage unit 32 stores the trained model generated by the trained model generation device 10.

 取得部34は、センサ群22によって検知されたロボット24の状態、スプーン26の状態、スプーン26上の対象物の状態、及び容器内の対象物の状態を取得する。次に、取得部34は、ロボット24が対象物を秤量する際の状態を表す状態データsを生成する。具体的には、取得部34は、センサ群22によって得られた各種データに基づいて、現在のスプーン26上に存在する対象物の質量wcurrentと、現在のスプーン26の傾きθspoonとを計算する。そして、取得部34は、現在の状態データs=[wcurrent,θspoon,wgoal]を設定する。なお、対象物の目標質量wgoalは、ユーザによって予め設定される。 The acquisition unit 34 acquires the state of the robot 24, the state of the spoon 26, the state of the object on the spoon 26, and the state of the object in the container, all of which are detected by the sensor group 22. Next, the acquisition unit 34 generates state data s representing the state of the robot 24 when weighing the object. Specifically, the acquisition unit 34 calculates the mass wcurrent of the object currently present on the spoon 26 and the current inclination θspoon of the spoon 26, based on the various data acquired by the sensor group 22. Then, the acquisition unit 34 sets the current state data s = [wcurrent, θspoon, wgoal]. Note that the target mass wgoal of the object is set in advance by the user.

 生成部36は、学習済みモデル記憶部32に格納されている学習済みモデルに対して、取得部34により取得された状態データsを入力することにより、当該状態データsに応じたロボット24の行動データa=[aincline,ashake]を生成する。この行動データaは、ロボット24がすくい上げた対象物の量を調整する動作を含む行動データaであり、ロボット24が次時刻にとるべき動作に相当する。なお、ロボット24がすくい上げた対象物の量を調整する動作は、スプーン26を傾ける動作及びスプーン26を振る動作の少なくとも一方を含む動作であってよい。このため、スプーン26を傾ける動作aincline及びスプーン26を振る動作ashakeの何れか一方のみが行われ、他方は行われなくてもよい。また、スプーン26を傾ける動作aincline及びスプーン26を振る動作ashakeの双方が行われてもよい。 The generating unit 36 inputs the state data s acquired by the acquiring unit 34 into the trained model stored in the trained model storage unit 32, thereby generating action data a = [aincline, ashake] of the robot 24 according to the state data s. This action data a is action data including an action to adjust the amount of the object scooped up by the robot 24, and corresponds to the action to be taken by the robot 24 at the next time step. Note that the action to adjust the amount of the object scooped up by the robot 24 may include at least one of an action of tilting the spoon 26 and an action of shaking the spoon 26. For this reason, only one of the tilting action aincline and the shaking action ashake may be performed while the other is omitted. Alternatively, both the tilting action aincline and the shaking action ashake may be performed.

 制御部38は、生成部36により生成された行動データa=[aincline,ashake]が表す動作が実現されるように、ロボット24を制御する。具体的には、制御部38は、生成部36により生成されたスプーンを傾ける行動ainclineが実現されるように、ロボット24に対して制御指令を出力する。また、制御部38は、生成部36により生成されたスプーンを振る行動ashakeが実現されるように、ロボット24に対して制御指令を出力する。これにより、スプーン26上に存在する対象物の量を調整するような動作が実行される。具体的には、ロボット24がスプーン26を用いてすくい上げた対象物を落とす動作である。対象物を落とす動作は、ロボット24の特定箇所の姿勢及びロボット24の特定箇所の位置を変化させる動作である。なお、上記生成された行動データによりロボット24が動作する前(準備段階)や後(秤量後の粉体の処理)に、ロボット24の動作をどのような手法で制御するかは、周知のティーチング技術に基づいてもよい。 The control unit 38 controls the robot 24 so that the action represented by the action data a=[a incline , a shake ] generated by the generation unit 36 is realized. Specifically, the control unit 38 outputs a control command to the robot 24 so that the action a incline of tilting the spoon generated by the generation unit 36 is realized. The control unit 38 also outputs a control command to the robot 24 so that the action a shake of shaking the spoon generated by the generation unit 36 is realized. As a result, an action to adjust the amount of the object present on the spoon 26 is executed. Specifically, it is an action of the robot 24 dropping the object scooped up by the spoon 26. The action of dropping the object is an action of changing the posture of a specific part of the robot 24 and the position of a specific part of the robot 24. Note that the method by which the action of the robot 24 is controlled before (preparation stage) or after (processing of the powder after weighing) the robot 24 operates based on the generated action data may be based on a well-known teaching technique.
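The translation from action data a = [aincline, ashake] into a control command for the robot could look like the following. This is a hypothetical sketch: the scaling factors, the command dictionary format, and the function name are all assumptions, and a real controller would map these onto the robot's own command interface.

```python
def action_to_command(a_incline, a_shake, theta_spoon,
                      max_incline_step=0.05, shake_amplitude=0.01):
    """Map model output a = [a_incline, a_shake] to a simple command (illustrative).

    a_incline adjusts the spoon pitch incrementally from its current angle;
    a_shake sets the amplitude of a small back-and-forth wrist motion.
    """
    return {
        "spoon_pitch": theta_spoon + max_incline_step * a_incline,  # [rad]
        "shake_amplitude": shake_amplitude * max(a_shake, 0.0),     # [m]
    }
```

Because the incline component is applied as a delta on the current pitch, repeated small corrections from the policy accumulate into a gradual pouring motion rather than a single large tilt.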

 次に、本実施形態に係る学習済みモデル生成装置10の作用について説明する。 Next, the operation of the trained model generation device 10 according to this embodiment will be described.

 学習済みモデル生成装置10が所定の指示信号を受け付けると、学習済みモデル生成装置10のCPU42は記憶装置46から学習済みモデル生成プログラムを読み出して、メモリ44に展開して実行する。これにより、CPU42が学習済みモデル生成装置10の各機能構成として機能し、図10に示す学習済みモデル生成処理が実行される。 When the trained model generation device 10 receives a predetermined instruction signal, the CPU 42 of the trained model generation device 10 reads the trained model generation program from the storage device 46, expands it into the memory 44, and executes it. As a result, the CPU 42 functions as each functional component of the trained model generation device 10, and the trained model generation process shown in FIG. 10 is executed.

 ステップS100において、シミュレーション部12は、仮想空間内の仮想ロボットが仮想の粉体である仮想対象物を秤量する際の動作に関するコンピュータシミュレーションを実行する。仮想ロボットには仮想のスプーンが設置されており、仮想ロボットは仮想のスプーンを用いて仮想の粉体を秤量する。なお、シミュレーション部12は、コンピュータシミュレーションを実行する際に、仮想空間内の物理パラメータをランダムに変化させる。 In step S100, the simulation unit 12 executes a computer simulation of an operation of a virtual robot in a virtual space when weighing a virtual object, which is a virtual powder. A virtual spoon is provided on the virtual robot, and the virtual robot weighs the virtual powder using the virtual spoon. When executing the computer simulation, the simulation unit 12 randomly changes physical parameters in the virtual space.

 ステップS102において、学習用取得部14は、シミュレーション部12によるシミュレーションが実行されている最中に、仮想ロボットの特定箇所(例えば、スプーンが取り付けられている箇所)の位置及び姿勢、仮想のスプーンの位置及び姿勢、仮想のスプーン上に存在する仮想対象物の量、及びある状態sにおいてある行動aを選択した際の報酬r等を含んで構成される学習用データを取得する。 In step S102, while the simulation unit 12 is performing the simulation, the learning acquisition unit 14 acquires learning data including the position and orientation of a specific part of the virtual robot (e.g., the part where the spoon is attached), the position and orientation of the virtual spoon, the amount of virtual objects present on the virtual spoon, and the reward r when a certain action a is selected in a certain state s.

 ステップS104において、学習部16は、ステップS102で取得された学習用データに基づいて学習済みモデルを生成する。 In step S104, the learning unit 16 generates a trained model based on the training data acquired in step S102.

 ステップS106において、学習部16は、ステップS104で生成された学習済みモデルを、学習済みモデル記憶部18へ格納する。 In step S106, the learning unit 16 stores the trained model generated in step S104 in the trained model storage unit 18.

 次に、本実施形態に係る制御システム20の作用について説明する。 Next, the operation of the control system 20 according to this embodiment will be described.

 学習済みモデル生成装置10によって生成された学習済みモデルが、制御装置30へ入力されると、その学習済みモデルは制御装置30の学習済みモデル記憶部32へ格納される。そして、制御システム20が所定の指示信号を受け付けると、制御装置30のCPU62は記憶装置66から制御プログラムを読み出して、メモリ64に展開して実行する。これにより、CPU62が制御装置30の各機能構成として機能し、図11に示す制御処理が実行される。 When the trained model generated by the trained model generating device 10 is input to the control device 30, the trained model is stored in the trained model storage unit 32 of the control device 30. Then, when the control system 20 receives a predetermined instruction signal, the CPU 62 of the control device 30 reads out the control program from the storage device 66, expands it into the memory 64, and executes it. As a result, the CPU 62 functions as each functional component of the control device 30, and the control process shown in FIG. 11 is executed.

 ステップS200において、取得部34は、センサ群22によって得られた各種データから、現在の状態データs=[wcurrent,θspoon,wgoal]を取得する。 In step S200, the acquisition unit 34 acquires current state data s = [w current , θ spoon , w goal ] from various data obtained by the sensor group 22.

 ステップS202において、生成部36は、学習済みモデル記憶部32に格納されている学習済みモデルに対して、ステップS200で取得された状態データsを入力することにより、当該状態データsに応じたロボット24の行動データa=[aincline,ashake]を生成する。 In step S202, the generation unit 36 inputs the state data s acquired in step S200 into the learned model stored in the learned model storage unit 32, thereby generating behavioral data a = [a incline , a shake ] of the robot 24 corresponding to the state data s.

 ステップS204において、ステップS202で生成された行動データa=[aincline,ashake]が表す動作が実現されるように、ロボット24を制御する。 In step S204, the robot 24 is controlled so as to realize the action represented by the action data a=[a incline , a shake ] generated in step S202.

 図11に示される制御処理が繰り返され、ロボット24に対して制御信号が繰り返し出力されることにより、対象物の秤量が適切に実行される。 The control process shown in FIG. 11 is repeated, and a control signal is repeatedly output to the robot 24, thereby allowing the object to be weighed appropriately.
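The repeated control cycle of steps S200 to S204 can be sketched as a simple loop. The three callables stand in for the sensor group, the trained model, and the robot interface described above; their names, the step limit, and the termination tolerance are assumptions for illustration.

```python
def weighing_loop(get_state, policy_step, send_command, max_steps=200, tol=0.0005):
    """Repeat S200-S204: read state, query the trained model, drive the robot,
    until the powder mass is within tol of the target (illustrative sketch)."""
    for _ in range(max_steps):
        s = get_state()                  # s = [w_current, theta_spoon, w_goal]
        if abs(s[0] - s[2]) <= tol:      # close enough to the target mass
            return True
        a = policy_step(s)               # a = [a_incline, a_shake]
        send_command(a)                  # realize the action on the robot
    return False
```

A toy usage with stand-in functions: a fake environment where each command sheds a little powder converges to the target after a handful of iterations, mirroring how repeated control signals let the real system weigh the object appropriately.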

 以上説明したように、本実施形態に係る制御装置は、粉体である対象物を秤量するロボットを制御する制御装置である。制御装置は、ロボットが対象物を秤量する際の状態を表す状態データを取得し、当該状態データを、予め生成された学習済みモデルへ入力する。なお、学習済みモデルは、状態データが入力されるとロボットがすくい上げた対象物の量を調整する動作を含む行動データが出力されるモデルである。制御装置は、このような学習済みモデルを用いて、状態データに応じた行動データを生成する。そして、制御装置は、生成された行動データが表す動作が実現されるように、ロボットを制御する。これにより、ロボットが粉体である対象物を秤量する際に、目標量よりも多い対象物をすくい上げてしまった状況下において対象物の量を調整することができる。 As described above, the control device according to this embodiment is a control device that controls a robot that weighs a powder object. The control device acquires state data that represents the state of the robot when it weighs the object, and inputs the state data to a trained model that has been generated in advance. The trained model is a model that outputs behavioral data that includes an action to adjust the amount of the object scooped up by the robot when state data is input. The control device uses this trained model to generate behavioral data according to the state data. The control device then controls the robot so that the action represented by the generated behavioral data is realized. This makes it possible to adjust the amount of the object when the robot scoops up more than the target amount when weighing a powder object.

 具体的には、ロボットにはスプーンが設置されており、状態データはスプーンの姿勢と対象物の目標量とを含むデータであることにより、目標量よりも多い対象物をすくい上げてしまった状況下においても、スプーンを傾ける又は動かす動作をすることにより、対象物の量を調整することができる。 Specifically, the robot is equipped with a spoon, and the state data includes the spoon's orientation and the target amount of the object, so that even in a situation where more object than the target amount has been scooped up, the amount of object can be adjusted by tilting or moving the spoon.

 また、対象物の量を調整する動作は、ロボットがすくい上げた対象物を落とす動作である。対象物を落とす動作は、ロボットの特定箇所の姿勢及びロボットの特定箇所の位置を変化させる動作である。これにより、対象物の量を微妙に調整することが可能となり、精度高く秤量を実行することが可能となる。 The action of adjusting the amount of the object is the action of dropping the object that the robot has scooped up. The action of dropping the object is the action of changing the attitude of a specific part of the robot and the position of a specific part of the robot. This makes it possible to finely adjust the amount of the object, enabling weighing to be performed with high precision.

 また、本実施形態に係る学習済みモデル生成装置は、仮想ロボットが仮想の粉体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得する。そして、学習済みモデル生成装置は、学習用データに基づいて、粉体である対象物を秤量する際の状態を表す状態データが入力されるとロボットがすくい上げた対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する。これにより、ロボットが粉体である対象物を秤量する際に、目標量よりも多い対象物をすくい上げてしまった状況下において対象物の量を調整するための学習済みモデルを得ることができる。 The trained model generating device according to this embodiment also acquires training data obtained by computer simulating the actions of a virtual robot when weighing a virtual object, which is a virtual powder. The trained model generating device then generates a trained model based on the training data, which outputs behavioral data including actions to adjust the amount of the object scooped up by the robot when state data representing the state when weighing the powder object is input. This makes it possible to obtain a trained model for adjusting the amount of the object when the robot scoops up more than the target amount when weighing the powder object.

 また、本実施形態に係る学習済みモデル生成装置は、仮想ロボットが仮想の粉体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られたデータに基づいて、学習済みモデルを生成する。これにより、現実のロボットを動作させることなく学習済みモデルを生成することが可能となる。現実のロボットを用いて学習用データを収集する際には、粉体をまき散らしてしまう等の事態の発生が予想される。これに対して、本実施形態のようにコンピュータシミュレーションを利用して学習用データを収集することにより、粉体をまき散らしてしまう等の事態の発生を抑制することが可能となる。 The trained model generation device according to this embodiment generates a trained model based on data obtained by computer simulating the operation of a virtual robot when weighing a virtual object, which is a virtual powder. This makes it possible to generate a trained model without operating a real robot. When collecting training data using a real robot, it is expected that there will be incidents such as scattering powder. In contrast, by collecting training data using computer simulation as in this embodiment, it is possible to prevent incidents such as scattering powder.

 また、本実施形態に係る学習済みモデル生成装置は、仮想空間内の物理パラメータをランダムに変化させることによりシミュレーションを実行する。これにより、シミュレーションにおいて様々な環境が仮想的に実現され、様々な状況に対応可能な学習済みモデルが生成される。 The trained model generation device according to this embodiment also executes a simulation by randomly varying physical parameters in a virtual space. This allows various environments to be virtually realized in the simulation, and generates a trained model that can handle a variety of situations.

 次に、実施例について説明する。本実施例では、上述した手法を4つの粉体の秤量に適用した。以下の表は、本実施例の結果を表す。以下の表は、小麦粉(Flour)、米粉(Rice)、塩(Salt)、及び石炭(Coal)の4つの粉体に対して、上述した手法を適用した際の秤量結果である。本実施例では、ロボットを用いて5mg、10mg、15mgの秤量を行った。以下の結果は、平均絶対値誤差(Mean absolute error)と標準偏差(Standard deviation)とから構成されている。例えば、小麦粉(Flour)の5mgの秤量の結果である「0.0±0.4」に関しては、0.0が平均絶対誤差を表し、0.4が標準偏差を表す。なお、本実施例では、1つの粉体種に対して、5つの方策を学習させ、それぞれの方策について5つの実験を行った。このため、1つの粉体種に対して25個の実験結果が得られており、その結果が以下の表に示されている。以下の表からも分かるように、精度良く秤量が行われていることが分かる。 Next, an example will be described. In this example, the above-mentioned method was applied to weighing four powders. The following table shows the results of this example. The following table shows the weighing results when the above-mentioned method was applied to four powders: flour, rice, salt, and coal. In this example, 5 mg, 10 mg, and 15 mg were weighed using a robot. The results below consist of the mean absolute error and standard deviation. For example, for the weighing result of 5 mg of flour, "0.0±0.4", 0.0 represents the mean absolute error and 0.4 represents the standard deviation. In this example, five strategies were trained for one type of powder, and five experiments were performed for each strategy. Therefore, 25 experimental results were obtained for one type of powder, and the results are shown in the following table. As can be seen from the table below, the weighing was performed with high accuracy.

 また、以下では、学習済みモデルを生成する際のシミュレーションにおいて、仮想空間内の物理パラメータをランダムに変化させた場合について説明する。以下の表は、ランダムに変化させる物理パラメータの一覧である。 The following explains the case where physical parameters in the virtual space are randomly changed in the simulation for generating a trained model. The table below lists the physical parameters that are randomly changed.

「Powder friction coefficient」は、粉体を構成する粒子間の摩擦係数を表す。「Powder particle number」は、粒子の数を表す。「Powder particle radius」は、粒子の半径を表す。「Powder particle mass」は、粒子の質量を表す。「Spoon friction coefficient」は、スプーンと粒子との間の摩擦係数を表す。「Shake speed weight」は、スプーンの振りの強さを表すパラメータを表す。「Gravity」は、重力を表す。「Goal powder amount」は、粉体の目標質量を表す。 "Powder friction coefficient" represents the friction coefficient between particles that make up the powder. "Powder particle number" represents the number of particles. "Powder particle radius" represents the radius of a particle. "Powder particle mass" represents the mass of a particle. "Spoon friction coefficient" represents the friction coefficient between the spoon and a particle. "Shake speed weight" is a parameter that represents the strength of the spoon swing. "Gravity" represents gravity. "Goal powder amount" represents the target mass of the powder.

 以下の表は、学習済みモデルを生成する際のシミュレーションにおいて、仮想空間内の物理パラメータをランダムに変化させた場合の秤量結果である。以下の結果も、平均絶対値誤差と標準偏差とから構成されている。以下の表における「Ours」は、本実施形態において提案された方法によって得られた秤量結果である。また、「Incline」はスプーンを傾ける動作を表し、「Shake」はスプーンを振る動作を表す。なお、以下の表の2行目の「MLP」及び「P-controller」は既存手法を表す。「MLP」は多層ニューラルネットワークモデルを表し、「P-controller」はP制御を表す。以下の表の4行目に示されている秤量結果は、掲載されている物理パラメータをランダムに変化させずに固定した際の秤量結果である。また、以下の表の5行目に示されている秤量結果は、4行目に示されている秤量結果のうち成績の良かった上位n個の物理パラメータをランダムに変化させることにより得られた秤量結果である。以下の表からも分かるように、物理パラメータをランダムに変化させることにより、より精度良く秤量が行われることが分かる。なお、ここでは、1行目はOurs(Incline&shake)、2行目はMLPおよびP-Controller、3行目はOurs(Incline only)およびOurs(Shake only)、というように数える。 The following table shows the weighing results when the physical parameters in the virtual space are randomly changed in a simulation for generating a trained model. The following results are also composed of the mean absolute error and the standard deviation. "Ours" in the following table is the weighing result obtained by the method proposed in this embodiment. "Incline" represents the action of tilting the spoon, and "Shake" represents the action of shaking the spoon. "MLP" and "P-controller" in the second row of the following table represent existing methods. "MLP" represents a multi-layer neural network model, and "P-controller" represents P control. The weighing results shown in the fourth row of the following table are the weighing results when the listed physical parameters are fixed without being randomly changed. The weighing results shown in the fifth row of the following table are the weighing results obtained by randomly changing the top n physical parameters with the best results among the weighing results shown in the fourth row. As can be seen from the following table, the weighing can be performed more accurately by randomly changing the physical parameters. Here, the first row counts Ours(Incline&shake), the second row counts MLP and P-Controller, the third row counts Ours(Incline only) and Ours(Shake only), and so on.

 以上、実施形態及び実施例について説明したが、本開示の要旨を逸脱しない範囲において、種々なる態様で実施し得ることは勿論である。 The above describes the embodiments and examples, but it goes without saying that the present disclosure can be embodied in various ways without departing from the spirit of the disclosure.

 例えば、上記実施形態では、対象物が粉体である場合を例に説明したが、これに限定されるものではない。例えば、対象物は、粒体又は流体であっても上記実施形態及び実施例を適用することは可能である。 For example, in the above embodiment, the target object is a powder, but the present invention is not limited to this. For example, the above embodiment and examples can be applied even if the target object is a granule or a fluid.

 また、上記実施形態の状態データs及び行動データaの構成は適宜変更することが可能である。例えば、目標質量は状態データsへ組み入れずに構成することも可能である。例えば、目標質量毎に学習済みモデルを生成するようにしてもよい。 In addition, the configuration of the state data s and the action data a in the above embodiment can be changed as appropriate. For example, it is possible to configure the state data s without incorporating the target mass. For example, a trained model can be generated for each target mass.

 また、状態データsを取得する際のセンサ群には、例えば、カメラ又はマイクが含まれていてもよく、スプーンへの盛り具合の画像や、スプーンから粉を落とす音から状態データs(例えば、粉体の質量)を得るようにしてもよい。 The group of sensors used to acquire the state data s may include, for example, a camera or a microphone, and the state data s (e.g., the mass of the powder) may be obtained from an image of how much powder is heaped on the spoon or from the sound of powder dropping from the spoon.

 また、上記実施形態では、学習済みモデルがLSTM構造を有するニューラルネットワークモデルによって実現される場合を例に説明したが、これに限定されるものではない。機械学習モデルであれば、どのようなモデルであってもよい。 In the above embodiment, the trained model is implemented by a neural network model having an LSTM structure, but the present disclosure is not limited to this; any machine learning model may be used.
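For reference, the property that makes an LSTM suitable here — its hidden and cell states carry the history of past state data across control steps — can be shown with a deliberately minimal, scalar LSTM cell. The shared toy weights are assumptions for illustration only; a real model would have vector-valued gates with learned parameters.

```python
import math

def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTMCell:
    """Minimal single-unit LSTM cell (scalar input, scalar hidden state).
    For brevity, the same toy weight is shared by all four gates."""

    def __init__(self, w: float = 0.5, u: float = 0.5, b: float = 0.0):
        self.w, self.u, self.b = w, u, b
        self.h = 0.0  # hidden state
        self.c = 0.0  # cell state

    def step(self, x: float) -> float:
        z = self.w * x + self.u * self.h + self.b
        i = _sigmoid(z)        # input gate
        f = _sigmoid(z)        # forget gate
        o = _sigmoid(z)        # output gate
        g = math.tanh(z)       # candidate cell update
        self.c = f * self.c + i * g
        self.h = o * math.tanh(self.c)
        return self.h

# Because the hidden state persists across steps, the output at each control
# step depends on the whole history of inputs, not only the latest one.
cell = TinyLSTMCell()
outputs = [cell.step(x) for x in (1.0, 1.0, 1.0)]
```
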

 また、上記実施形態でCPUがソフトウェア(プログラム)を読み込んで実行した各処理を、CPU以外の各種のプロセッサが実行してもよい。この場合のプロセッサとしては、FPGA(Field-Programmable Gate Array)等の製造後に回路構成を変更可能なPLD(Programmable Logic Device)、及びASIC(Application Specific Integrated Circuit)等の特定の処理を実行させるために専用に設計された回路構成を有するプロセッサである専用電気回路等が例示される。また、各処理を、これらの各種のプロセッサのうちの1つで実行してもよいし、同種又は異種の2つ以上のプロセッサの組み合わせ(例えば、複数のFPGA、及びCPUとFPGAとの組み合わせ等)で実行してもよい。また、これらの各種のプロセッサのハードウェア的な構造は、より具体的には、半導体素子等の回路素子を組み合わせた電気回路である。 Furthermore, the processes executed by the CPU after reading the software (program) in the above embodiment may be executed by various processors other than the CPU. Examples of processors in this case include PLDs (Programmable Logic Devices) such as FPGAs (Field-Programmable Gate Arrays) whose circuit configuration can be changed after manufacture, and dedicated electrical circuits such as ASICs (Application Specific Integrated Circuits), which are processors with circuit configurations designed exclusively to execute specific processes. Furthermore, each process may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (for example, multiple FPGAs, or a combination of a CPU and an FPGA, etc.). Moreover, the hardware structure of these various processors is, more specifically, an electrical circuit that combines circuit elements such as semiconductor elements.

 また、上記実施形態では、各プログラムが記憶装置に予め記憶(インストール)されている態様を説明したが、これに限定されない。プログラムは、CD-ROM、DVD-ROM、ブルーレイディスク、USBメモリ等の記憶媒体に記憶された形態で提供されてもよい。また、プログラムは、ネットワークを介して外部装置からダウンロードされる形態としてもよい。 In the above embodiment, the programs are described as being pre-stored (installed) in a storage device, but the present invention is not limited to this. The programs may be provided in a form stored in a storage medium such as a CD-ROM, DVD-ROM, Blu-ray disc, or USB memory. The programs may also be downloaded from an external device via a network.
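Taken together, the embodiments above describe a run-time flow of acquiring state data, generating action data with the trained model, and controlling the robot accordingly. Below is a minimal sketch of that loop; the robot and model interfaces (`acquire_state`, `infer`, `execute`) are hypothetical names, and the stub model is a proportional-style correction standing in for the trained model.

```python
def control_loop(robot, model, target_mass, max_steps=100, tolerance=0.01):
    """Acquire state data -> generate action data -> control the robot,
    repeated until the weighed amount is close enough to the target."""
    for _ in range(max_steps):
        state = robot.acquire_state()              # state data s
        if abs(state["mass"] - target_mass) <= tolerance:
            break                                  # target amount reached
        action = model.infer(state, target_mass)   # action data a
        robot.execute(action)                      # realize the action

# Stub robot and model so the loop is runnable, for illustration only.
class StubRobot:
    def __init__(self):
        self.mass = 0.10                           # scooped mass (arbitrary units)

    def acquire_state(self):
        return {"mass": self.mass}

    def execute(self, action):
        self.mass += action                        # action adjusts the scooped amount

class StubModel:
    def infer(self, state, target_mass):
        # Proportional-style correction; a trained model replaces this.
        return 0.5 * (target_mass - state["mass"])
```

Running `control_loop(StubRobot(), StubModel(), target_mass=0.05)` drives the stub's scooped mass to within the tolerance of the target.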

(付記)
 以下、本開示の態様について付記する。
(Appendix)
Aspects of the present disclosure are set forth in the following appendixes.

(付記1)
 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御装置であって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得する取得部と、
 前記取得部により取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、前記取得部により取得された前記状態データに応じた前記行動データを生成する生成部と、
 前記生成部により生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する制御部と、
 を含む制御装置。
(Appendix 1)
A control device for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
an acquisition unit that acquires state data representing a state of the robot when weighing the object;
a generation unit that generates the behavioral data corresponding to the state data acquired by the acquisition unit by inputting the state data acquired by the acquisition unit to a trained model that is generated in advance and that, when the state data is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot; and
A control unit that controls the robot so as to realize an action represented by the action data generated by the generation unit;
A control device including:

(付記2)
 前記学習済みモデルは、
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られたデータに基づき学習されたモデルである、
 付記1に記載の制御装置。
(Appendix 2)
The trained model is
The model is trained based on data obtained by computer simulation of the behavior of a virtual robot when weighing a virtual object, which is a virtual powder, granule, or fluid.
The control device according to Appendix 1.

(付記3)
 前記シミュレーションは、仮想空間内の物理パラメータをランダムに変化させることにより実行されたシミュレーションである、
 付記2に記載の制御装置。
(Appendix 3)
The simulation is performed by randomly varying physical parameters in a virtual space.
The control device according to Appendix 2.

(付記4)
 前記ロボットにはスプーンが設置されており、
 前記状態データは、前記スプーンの姿勢と前記対象物の目標量とを含むデータである、
 付記1~付記3の何れか1項に記載の制御装置。
(Appendix 4)
The robot is provided with a spoon,
The state data includes the attitude of the spoon and a target amount of the object.
The control device according to any one of Appendix 1 to Appendix 3.

(付記5)
 前記対象物の量を調整する動作は、前記ロボットがすくい上げた前記対象物を落とす動作である、
 付記1~付記4の何れか1項に記載の制御装置。
(Appendix 5)
The action of adjusting the amount of the object is an action of dropping the object scooped up by the robot.
The control device according to any one of Appendix 1 to Appendix 4.

(付記6)
 前記対象物を落とす動作は、前記ロボットの特定箇所の姿勢及び前記ロボットの特定箇所の位置を変化させる動作である、
 付記5に記載の制御装置。
(Appendix 6)
The action of dropping the object is an action of changing the posture of a specific part of the robot and the position of the specific part of the robot.
The control device according to Appendix 5.

(付記7)
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得する学習用取得部と、
 前記学習用取得部により取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する学習部と、
 を含む学習済みモデル生成装置。
(Appendix 7)
a learning acquisition unit that acquires learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, granule, or fluid; and
a learning unit that generates, based on the learning data acquired by the learning acquisition unit, a trained model that, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot,
A trained model generating device comprising:

(付記8)
 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御方法であって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、
 取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、
 生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、
 処理をコンピュータが実行する制御方法。
(Appendix 8)
A control method for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
acquiring state data representing a state of the robot when weighing the object;
The acquired state data is input to a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input, thereby generating the behavioral data according to the acquired state data;
Controlling the robot so as to realize an action represented by the generated behavioral data.
A control method in which a computer executes the processing.

(付記9)
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、
 取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、
 処理をコンピュータが実行する学習済みモデル生成方法。
(Appendix 9)
acquiring learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid; and
A trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
A trained model generation method in which a computer executes the processing.

(付記10)
 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御プログラムであって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、
 取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、
 生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、
 処理をコンピュータに実行させるための制御プログラム。
(Appendix 10)
A control program for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
acquiring state data representing a state of the robot when weighing the object;
The acquired state data is input to a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input, thereby generating the behavioral data according to the acquired state data;
Controlling the robot so as to realize an action represented by the generated behavioral data.
A control program that causes a computer to execute processing.

(付記11)
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、
 取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、
 処理をコンピュータに実行させるための学習済みモデル生成プログラム。
(Appendix 11)
acquiring learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid; and
A trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
A trained model generation program for causing a computer to execute the processing.

 2023年7月31日に出願された日本国特許出願2023‐124132号の開示は、その全体が参照により本明細書に取り込まれる。本明細書に記載された全ての文献、特許出願、および技術規格は、個々の文献、特許出願、および技術規格が参照により取り込まれることが具体的かつ個々に記された場合と同程度に、本明細書中に参照により取り込まれる。 The disclosure of Japanese Patent Application No. 2023-124132, filed on July 31, 2023, is incorporated herein by reference in its entirety. All documents, patent applications, and technical standards described herein are incorporated herein by reference to the same extent as if each individual document, patent application, and technical standard was specifically and individually indicated to be incorporated by reference.

Claims (11)

 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御装置であって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得する取得部と、
 前記取得部により取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、前記取得部により取得された前記状態データに応じた前記行動データを生成する生成部と、
 前記生成部により生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する制御部と、
 を含む制御装置。
A control device for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
an acquisition unit that acquires state data representing a state of the robot when weighing the object;
a generation unit that generates the behavioral data corresponding to the state data acquired by the acquisition unit by inputting the state data acquired by the acquisition unit to a trained model that is generated in advance and that, when the state data is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot; and
A control unit that controls the robot so as to realize an action represented by the action data generated by the generation unit;
A control device including:
 前記学習済みモデルは、
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られたデータに基づき学習されたモデルである、
 請求項1に記載の制御装置。
The trained model is
The model is trained based on data obtained by computer simulation of the behavior of a virtual robot when weighing a virtual object, which is a virtual powder, granule, or fluid.
The control device according to claim 1 .
 前記コンピュータシミュレーションは、仮想空間内の物理パラメータをランダムに変化させることにより実行されたシミュレーションである、
 請求項2に記載の制御装置。
The computer simulation is a simulation performed by randomly varying physical parameters in a virtual space.
The control device according to claim 2.
 前記ロボットにはスプーンが設置されており、
 前記状態データは、前記スプーンの姿勢と前記対象物の目標量とを含むデータである、
 請求項1~請求項3の何れか1項に記載の制御装置。
The robot is provided with a spoon,
The state data includes the attitude of the spoon and a target amount of the object.
The control device according to any one of claims 1 to 3.
 前記対象物の量を調整する動作は、前記ロボットがすくい上げた前記対象物を落とす動作である、
 請求項1~請求項3の何れか1項に記載の制御装置。
The action of adjusting the amount of the object is an action of dropping the object scooped up by the robot.
The control device according to any one of claims 1 to 3.
 前記対象物を落とす動作は、前記ロボットの特定箇所の姿勢及び前記ロボットの特定箇所の位置を変化させる動作である、
 請求項5に記載の制御装置。
The action of dropping the object is an action of changing the posture of a specific part of the robot and the position of the specific part of the robot.
The control device according to claim 5.
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得する学習用取得部と、
 前記学習用取得部により取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する学習部と、
 を含む学習済みモデル生成装置。
a learning acquisition unit that acquires learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, granule, or fluid; and
a learning unit that generates, based on the learning data acquired by the learning acquisition unit, a trained model that, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot,
A trained model generating device comprising:
 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御方法であって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、
 取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、
 生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、
 処理をコンピュータが実行する制御方法。
A control method for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
acquiring state data representing a state of the robot when weighing the object;
The acquired state data is input to a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input, thereby generating the behavioral data according to the acquired state data;
Controlling the robot so as to realize an action represented by the generated behavioral data.
A control method in which a computer executes the processing.
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、
 取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、
 処理をコンピュータが実行する学習済みモデル生成方法。
acquiring learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid; and
A trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
A trained model generation method in which a computer executes the processing.
 粉体、粒体、又は流体である対象物を秤量するロボットを制御する制御プログラムであって、
 前記ロボットが前記対象物を秤量する際の状態を表す状態データを取得し、
 取得された前記状態データを、予め生成された学習済みモデルであって、かつ前記状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルに対して入力することにより、取得された前記状態データに応じた前記行動データを生成し、
 生成された前記行動データが表す動作が実現されるように、前記ロボットを制御する、
 処理をコンピュータに実行させるための制御プログラム。
A control program for controlling a robot that weighs an object that is a powder, a granular material, or a fluid, comprising:
acquiring state data representing a state of the robot when weighing the object;
The acquired state data is input to a trained model that is generated in advance and that outputs behavioral data including an action of adjusting the amount of the object scooped up by the robot when the state data is input, thereby generating the behavioral data according to the acquired state data;
Controlling the robot so as to realize an action represented by the generated behavioral data.
A control program that causes a computer to execute processing.
 仮想ロボットが仮想の粉体、粒体、又は流体である仮想対象物を秤量する際の動作をコンピュータシミュレーションすることにより得られる学習用データを取得し、
 取得された学習用データに基づいて、ロボットが粉体、粒体、又は流体である対象物を秤量する際の状態を表す状態データが入力されると前記ロボットがすくい上げた前記対象物の量を調整する動作を含む行動データが出力される学習済みモデルを生成する、
 処理をコンピュータに実行させるための学習済みモデル生成プログラム。
acquiring learning data obtained by computer-simulating an operation of a virtual robot when weighing a virtual object, which is a virtual powder, a virtual granule, or a virtual fluid; and
A trained model is generated based on the acquired training data, in which, when state data representing a state when the robot weighs an object, which is a powder, a granule, or a fluid, is input, behavioral data including an action of adjusting the amount of the object scooped up by the robot is output.
A trained model generation program for causing a computer to execute the processing.
PCT/JP2024/025018 2023-07-31 2024-07-10 Control device, trained model generation device, method, and program WO2025028204A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2023-124132 2023-07-31
JP2023124132A JP2025020634A (en) 2023-07-31 Control device, trained model generation device, method, and program

Publications (1)

Publication Number Publication Date
WO2025028204A1 true WO2025028204A1 (en) 2025-02-06

Family

ID=94395103


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019098419A (en) * 2017-11-28 2019-06-24 キユーピー株式会社 Machine learning method for learning extraction operation of powder, grain or fluid, and robot machine learning control device
JP2020194242A (en) * 2019-05-24 2020-12-03 株式会社エクサウィザーズ Learning device, learning method, learning program, automatic control device, automatic control method and automatic control program
JP2021091022A (en) * 2019-12-09 2021-06-17 キヤノン株式会社 Robot control device, learned model, robot control method, and program

