
CN118372258B - Distributed vision cluster robot system - Google Patents

Distributed vision cluster robot system

Info

Publication number
CN118372258B
CN118372258B
Authority
CN
China
Prior art keywords
robot
robots
simulation
information
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410805748.2A
Other languages
Chinese (zh)
Other versions
CN118372258A (en)
Inventor
马昭
梁甲琛
赵世钰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Westlake University
Original Assignee
Westlake University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Westlake University filed Critical Westlake University
Priority to CN202410805748.2A priority Critical patent/CN118372258B/en
Publication of CN118372258A publication Critical patent/CN118372258A/en
Application granted granted Critical
Publication of CN118372258B publication Critical patent/CN118372258B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1679Programme controls characterised by the tasks executed
    • B25J9/1682Dual arm manipulator; Coordination of several manipulators
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1666Avoiding collision or forbidden zones
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The application provides a distributed vision cluster robot system. The system comprises a plurality of robots, a simulation server and an upper computer. After receiving an operation instruction, each robot controls its four cameras to acquire images, obtains the position information of the other robots based on a visual perception algorithm, generates an environment map based on the images, and generates a corresponding planned path based on the position information and the environment map. After receiving a simulation instruction, each robot uploads its pose information to the simulation server; the simulation server synchronizes the pose information and sends it back to each robot, and each robot constructs twins of all the robots in a visual simulation environment according to the pose information, so as to judge whether it is synchronized with the other robots. Because the application obtains the position information of other robots with a visual perception algorithm, it is less susceptible to signal interference; the distributed simulation mode reduces the demand for simulation computing power and improves simulation efficiency.

Description

Distributed vision cluster robot system
Technical Field
The application relates to the field of robots, in particular to a distributed vision cluster robot system.
Background
Swarm robotics concerns the design, construction, and deployment of large groups of robots that coordinate and cooperate with one another to solve problems or perform tasks.
In the prior art, clustered robots obtain one another's position information through communication, which is prone to signal interference. In addition, when a simulation server is used to simulate the clustered robots, a large number of robots places a heavy computational burden on that server.
Disclosure of Invention
Based on this, it is necessary to provide a distributed vision clustered robot system in view of the above technical problems.
The embodiment of the invention provides a distributed vision cluster robot system, which comprises a plurality of robots, a simulation server and an upper computer; the upper computer sends an operation instruction and a simulation instruction to each robot through the simulation server; each robot is provided with four cameras;
After receiving an operation instruction, each robot controls the four cameras to acquire images, acquires the position information of the other robots based on a visual perception algorithm, generates an environment map based on the images, and generates a corresponding planned path based on the position information and the environment map;
After receiving the simulation instruction, each robot uploads its pose information to the simulation server; the simulation server synchronizes the pose information and sends the synchronized pose information to each robot, and each robot constructs twins of all the robots in a visual simulation environment according to the pose information, so as to judge whether it is synchronized with the other robots.
In some embodiments, each of the robots includes:
the visual perception module is used for mapping the four images to a tetrahedron template to obtain a tetrahedron model, and inputting the tetrahedron model to a trained target detection model to obtain the position information of other robots.
In some embodiments, the visual perception module comprises:
the model construction module is used for converting each point in the four images into a three-dimensional coordinate point based on a coordinate conversion relation, obtaining the minimum distance between each three-dimensional coordinate point and the tetrahedron template, and obtaining each projection point in the tetrahedron template by moving the minimum distance along the normal vector direction, so as to obtain the tetrahedron model.
In some embodiments, the visual perception module comprises:
The model training module is used for marking the bounding boxes and the categories of the robots in the tetrahedral model samples based on the tetrahedral model sample training set;
And training an initial three-dimensional detection model by using the marked tetrahedral model sample training set to obtain the target detection model.
In some embodiments, each of the robots includes:
The map generation module is used for extracting the characteristic points of the four images to generate a sparse map, acquiring depth information of the four images based on a depth estimation algorithm, and binding and fusing the depth information with the sparse map to obtain the environment map.
In some embodiments, each of the robots includes:
The algorithm optimization module is used for obtaining a first distance between the twin bodies, obtaining a second distance between the robots calculated by the robots through the visual perception algorithm, evaluating the visual perception algorithm based on errors between the second distances and the corresponding first distances, and optimizing the visual perception algorithm until the evaluation result meets the requirement.
In some embodiments, each robot is provided with an indicator light, and after receiving the running instruction or the simulation instruction, the indicator light is controlled to display a corresponding color;
when a robot has not received the running instruction or the simulation instruction, it judges the indicator light display of the other robots based on the images and executes the running instruction or the simulation instruction according to the displayed color.
In some embodiments, each of the robots includes:
The target tracking module is used for determining queen bee robots in other robots, converting the position information into images of corresponding cameras by using reverse perspective transformation and camera parameters based on the position information of the queen bee robots, determining the cameras as main cameras, and controlling the main cameras to track the queen bee robots by using a target tracking algorithm.
In some embodiments, the simulation server comprises:
And the gesture synchronization module is used for selecting gesture information with the same time stamp according to the time stamp of each piece of gesture information and sending the gesture information to each robot.
In some embodiments, the host computer includes:
The instruction generation module is used for receiving input multi-modal information, analyzing and identifying the multi-modal information by utilizing a language big model to obtain the running instruction or the simulation instruction; the category of the multi-modal information comprises one of voice, text and image.
Compared with the prior art, after each robot receives the operation instruction, the four cameras are controlled to acquire images and the position information of the other robots is acquired based on a visual perception algorithm, so that, compared with acquiring position information through communication, the robots are less susceptible to signal interference;
the environment map can be generated in real time from the images, and the corresponding planned path is generated from the position information and the environment map, so that collisions between robots and similar problems are avoided;
after receiving the simulation instruction, each robot uploads its pose information to a simulation server, the simulation server synchronizes the pose information and sends the synchronized pose information to each robot, and each robot constructs twins of all the robots in a visual simulation environment according to the pose information, so as to judge whether the robots are synchronized. The distributed simulation mode reduces the requirement on simulation computing power, reduces data transmission with the simulation server, and improves simulation efficiency.
Drawings
FIG. 1 is a schematic diagram of a distributed vision clustered robot system in one embodiment;
FIG. 2 is a schematic diagram of a module of a visual perception module according to an embodiment;
FIG. 3 is a schematic diagram of a tetrahedral model obtained in one embodiment;
FIG. 4 is a schematic diagram of a visual perception module according to an embodiment;
FIG. 5 is a block diagram of a map generation module according to an embodiment;
FIG. 6 is a block diagram of an algorithm optimization module in an embodiment;
FIG. 7 is a schematic block diagram of a target tracking module according to an embodiment;
FIG. 8 is a schematic diagram of a simulation server according to an embodiment;
FIG. 9 is a schematic block diagram of an upper computer in an embodiment.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present invention, and it is apparent to those of ordinary skill in the art that the present invention may be applied to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
As used in the specification and in the claims, the terms "a," "an," and "the" are not limited to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may include other steps or elements.
While the present invention makes various references to certain modules in an apparatus according to embodiments of the invention, any number of different modules may be used and run on a computing device and/or processor. The modules are merely illustrative and different aspects of the apparatus and method may use different modules.
It will be understood that when an element or module is referred to as being "connected," "coupled" to another element, module, or block, it can be directly connected or coupled or in communication with the other element, module, or block, or intervening elements, modules, or blocks may be present unless the context clearly dictates otherwise. The term "and/or" as used herein may include any and all combinations of one or more of the associated listed items.
As shown in fig. 1, an embodiment of the present invention provides a distributed vision cluster robot system, including a plurality of robots 30, a simulation server 20, and an upper computer 10; wherein, the upper computer 10 sends an operation instruction and a simulation instruction to each robot 30 through the simulation server 20; each of the robots 30 is provided with four cameras.
After each robot receives the operation instruction, the four cameras are controlled to acquire images, the position information of the other robots is acquired based on a visual perception algorithm, an environment map is generated from the images, and a corresponding planned path is generated from the position information and the environment map.
After receiving the simulation instruction, each robot uploads its pose information to the simulation server; the simulation server synchronizes the pose information and sends the synchronized pose information to each robot, and each robot constructs twins of all the robots in a visual simulation environment according to the pose information, so as to judge whether it is synchronized with the other robots.
Because each robot acquires the position information of the other robots from its own camera images through the visual perception algorithm, the system is less susceptible to signal interference than one that acquires position information through communication.
The environment map can be generated in real time from the images, and the corresponding planned path is generated from the position information and the environment map, so that collisions between robots and similar problems are avoided.
All robots in the same scene achieve scene synchronization and distributed simulation. The distributed simulation is a form of hardware-in-the-loop simulation: it not only verifies the algorithm but also evaluates the feasibility of deploying it, and the simulation code and the deployment code can be fully reused, realizing a twin of the simulated robot.
After receiving the simulation instruction, each robot uploads its pose information to a simulation server, the simulation server synchronizes the pose information and sends the synchronized pose information to each robot, and each robot constructs twins of all the robots in a visual simulation environment according to the pose information, so as to judge whether the robots are synchronized. The distributed simulation mode reduces the requirement on simulation computing power, reduces data transmission with the simulation server, and improves simulation efficiency.
The robot, the simulation server, and the host computer will be described in detail below.
Each robot is provided with a panoramic vision system consisting of four low-cost cameras. A panoramic vision system is particularly important for clustered robots: it allows a robot to sense the states of the other robots in the cluster at any time, enabling cooperation and collision avoidance.
In some embodiments, as shown in fig. 2, each of the robots 30 includes: the visual perception module 310 is configured to map the four images to a tetrahedral template to obtain a tetrahedral model, and input the tetrahedral model to a trained target detection model to obtain position information of each other robot.
As shown in fig. 3, the tetrahedral template is a preset three-dimensional template with four blank surfaces (front, rear, left, right) stored in the cluster robot. Points in the four planar images are mapped to the four blank surfaces of the tetrahedral template to obtain the tetrahedral model; the front, rear, left and right surfaces of the tetrahedral model carry the mapped planar images, while the upper and lower surfaces remain blank.
In some embodiments, as shown in fig. 4, the visual perception module 310 includes:
The model construction module 311 is configured to convert each point in the four images into a three-dimensional coordinate point based on a coordinate transformation relationship, obtain a minimum distance between each three-dimensional coordinate point and the tetrahedron template, and obtain each projection point in the tetrahedron template by moving the minimum distance along a normal vector direction, thereby obtaining the tetrahedron model.
The model training module 312 is configured to label bounding boxes and categories of robots in the tetrahedral model samples based on the tetrahedral model sample training set; and training an initial three-dimensional detection model by using the marked tetrahedral model sample training set to obtain the target detection model.
The model construction module 311 first constructs a plane expression for each of the four images.
For each face of the tetrahedral model, its plane expression needs to be defined, and each plane expression can be determined by three vertices. For a given face, the plane expression is derived through the following steps. Specifically, the edge vectors of the image are constructed:
$$\vec{e}_1 = V_2 - V_1, \qquad \vec{e}_2 = V_3 - V_1$$
wherein $V_1$, $V_2$, $V_3$ represent the three vertices of the image;
the normal vector of the image is constructed:
$$\vec{n} = \vec{e}_1 \times \vec{e}_2 = (n_1, n_2, n_3)$$
and the plane expression of the image is constructed:
$$n_1(x - x_1) + n_2(y - y_1) + n_3(z - z_1) = 0$$
wherein $(x, y, z)$ represents the coordinates of any point in space; $(x_1, y_1, z_1)$ represents the three-dimensional coordinates of a known point on the plane or of the clustered robot itself; and $n_1$, $n_2$, $n_3$ represent the components of the normal vector $\vec{n}$.
And secondly, mapping points in the four images to a tetrahedral template by using a camera model based on the plane expression, so as to obtain a tetrahedral model.
Each point in the image is converted into a three-dimensional coordinate point based on the coordinate conversion relationship. Assuming that each camera follows a pinhole camera model, the coordinate conversion relationship converts each point (u, v) in the image into a three-dimensional coordinate point (x, y, z) as:
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = z\,K^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$
wherein K represents the internal reference (intrinsic) matrix of the camera and z represents the depth.
The minimum distance from each three-dimensional coordinate point to the tetrahedral template is then obtained. From the three-dimensional coordinate points obtained above, the projections of these points onto the tetrahedral template can be calculated. Given a three-dimensional coordinate point $P = (x, y, z)$, the minimum distance D to the face plane of the tetrahedral template defined above is:
$$D = \frac{\left|\,n_1(x - x_1) + n_2(y - y_1) + n_3(z - z_1)\,\right|}{\sqrt{n_1^2 + n_2^2 + n_3^2}}$$
Each projection point P' in the tetrahedral template is obtained by moving the minimum distance along the normal vector direction, thereby obtaining the tetrahedral model.
After the three-dimensional coordinate points are obtained, they need to be projected onto the tetrahedral template. The key step is to move the minimum distance along the normal vector direction and determine the projection point.
By moving this minimum distance along the normal vector direction, a point in space can be projected onto the plane of the tetrahedral template. The projection point P' is calculated as:
$$P' = P - D\,\frac{\vec{n}}{\lVert\vec{n}\rVert}$$
Thus, the projection point P' of the tetrahedral model is obtained from any three-dimensional coordinate point (x, y, z) in space.
The tetrahedron model constructed based on the above can effectively process and analyze image data from four different directions in a unified three-dimensional space, so that the clustered robot can better understand the structure of the surrounding environment and accurately navigate and avoid obstacles.
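To make the mapping concrete, the following Python sketch (using NumPy) illustrates the three steps described above: back-projecting an image point with an assumed depth through a pinhole model, computing the point-to-plane distance, and moving along the face normal to obtain the projection point. The intrinsic matrix, face vertices, and depth value in the usage example are illustrative placeholders, not values taken from the patent.

```python
import numpy as np

def face_plane(V1, V2, V3):
    """Return the unit normal and a reference vertex of one template face."""
    e1, e2 = V2 - V1, V3 - V1          # edge vectors of the face
    n = np.cross(e1, e2)               # normal vector n = e1 x e2
    return n / np.linalg.norm(n), V1

def pixel_to_point(u, v, depth, K):
    """Back-project pixel (u, v) with an assumed depth through a pinhole camera."""
    return depth * np.linalg.inv(K) @ np.array([u, v, 1.0])

def project_to_face(P, n_unit, V1):
    """Move P along the face normal by the signed point-to-plane distance D."""
    D = np.dot(n_unit, P - V1)         # signed minimum distance to the face plane
    return P - D * n_unit              # projection point P' lying on the face

# Illustrative usage with placeholder values
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0,   0.0,   1.0]])
n_unit, V1 = face_plane(np.array([1.0, -1.0, 0.0]),
                        np.array([1.0,  1.0, 0.0]),
                        np.array([1.0, -1.0, 1.0]))
P_proj = project_to_face(pixel_to_point(100, 120, 2.0, K), n_unit, V1)
```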
The model training module 312 may employ the method described above to construct tetrahedral model samples. Images are acquired with the four cameras and labeled with the bounding box and category of each robot. The images are preprocessed to fit the spatial representation of the tetrahedral model, the annotation of each robot is converted into the three-dimensional coordinate system, and the projections of the four faces are converted into a consistent three-dimensional representation, i.e., a tetrahedral model sample.
The initial three-dimensional detection model is trained with the labeled tetrahedral model sample training set to obtain the target detection model. The initial three-dimensional detection model is a neural network suited to processing three-dimensional spatial data, such as a 3D-CNN-based model or an advanced point cloud processing model such as PointNet.
Training the initial three-dimensional detection model with the labeled tetrahedral model sample training set optimizes it to recognize and localize robots in three-dimensional space. A multi-task loss function is applied to learn the classification and location information (bounding box) of the targets simultaneously. The trained target detection model can recognize and localize robots in the tetrahedral model.
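The patent does not specify the form of the multi-task loss. The following PyTorch sketch shows one common choice under that assumption: a cross-entropy term for the robot category plus a smooth-L1 term for the bounding-box parameters, weighted by a hypothetical hyperparameter lambda_box.

```python
import torch.nn as nn

class DetectionLoss(nn.Module):
    """Joint classification + bounding-box regression loss (assumed form)."""
    def __init__(self, lambda_box: float = 1.0):
        super().__init__()
        self.cls_loss = nn.CrossEntropyLoss()  # robot category
        self.box_loss = nn.SmoothL1Loss()      # bounding-box parameters
        self.lambda_box = lambda_box           # relative weight of the box term

    def forward(self, cls_logits, box_pred, cls_target, box_target):
        return (self.cls_loss(cls_logits, cls_target)
                + self.lambda_box * self.box_loss(box_pred, box_target))
```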
In some embodiments, as shown in fig. 5, each of the robots 30 includes: the map generating module 320 is configured to extract feature points of the four images to generate a sparse map, obtain depth information of the four images based on a depth estimation algorithm, bind and fuse the depth information with the sparse map, and obtain the environment map.
Specifically, each robot is provided with an IMU (inertial measurement unit) capable of providing robust position information.
A sparse map based on feature points can be generated in real time from the four images. The robot adopts a self-supervised depth estimation algorithm to estimate depth information in the images, and the panoramic depth information is bound and fused with the feature-point-based sparse map to obtain the environment map. Based on the environment map, functions such as obstacle avoidance and path planning can be realized for the robot.
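The patent does not name the feature extractor or the depth network. As a sketch under those assumptions, the OpenCV snippet below extracts ORB feature points from one camera image and lifts them to 3D points using an externally supplied dense depth map, which is the kind of binding-and-fusion step described above.

```python
import cv2
import numpy as np

def lift_features(image_bgr, depth_map, K):
    """Extract ORB keypoints and lift them to 3D using per-pixel depth estimates."""
    orb = cv2.ORB_create(nfeatures=1000)
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    keypoints, descriptors = orb.detectAndCompute(gray, None)

    K_inv = np.linalg.inv(K)
    points_3d = []
    for kp in keypoints:
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        z = float(depth_map[v, u])              # depth from the estimation network
        if z > 0:                               # keep only points with valid depth
            points_3d.append(z * K_inv @ np.array([u, v, 1.0]))
    return np.array(points_3d), descriptors
```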
In some embodiments, as shown in fig. 6, each of the robots 30 includes: the algorithm optimization module 330 is configured to obtain a first distance between the twin bodies, obtain a second distance between the robots calculated by the robots using the visual perception algorithm, evaluate the visual perception algorithm based on an error between the second distance and the corresponding first distance, and optimize the visual perception algorithm until an evaluation result meets a requirement.
In an example embodiment, the cluster includes three robots. The visual perception algorithm is evaluated by calculating the mean squared error between the second distances and the corresponding first distances.
The calculation formula is as follows:
$$MSE = \frac{1}{2}\left[\left(\hat{d}_1 - d_1\right)^2 + \left(\hat{d}_2 - d_2\right)^2\right]$$
wherein $\hat{d}_1$ and $\hat{d}_2$ represent the two second distances between the three robots, and $d_1$ and $d_2$ represent the two first distances between the three twins.
After the mean squared error is calculated, if it is larger than a preset value, the visual perception algorithm needs to be optimized until the calculated mean squared error is smaller than or equal to the preset value; in this way the visual perception algorithm is evaluated and optimized.
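A minimal sketch of this evaluation step, assuming the preset threshold is supplied by the caller (the patent does not give its value):

```python
import numpy as np

def evaluate_perception(second_distances, first_distances, preset_value):
    """Mean squared error between vision-estimated distances and twin distances."""
    d_hat = np.asarray(second_distances, dtype=float)  # from the visual perception algorithm
    d_ref = np.asarray(first_distances, dtype=float)   # measured between the twins
    mse = float(np.mean((d_hat - d_ref) ** 2))
    return mse, mse <= preset_value                    # True when no further optimization is needed

# Illustrative usage with placeholder distances (meters) and threshold
mse, ok = evaluate_perception([1.02, 2.11], [1.00, 2.00], preset_value=0.01)
```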
In some embodiments, each robot is provided with an indicator light that is controlled to display a corresponding color after the robot receives the running instruction or the simulation instruction. A robot that has not received the running instruction or the simulation instruction judges the indicator light display of the other robots from its images and executes the running instruction or the simulation instruction according to the displayed color.
Under some special conditions a robot may not receive the instruction issued by the upper computer. In that case it can identify the indicator lights of the surrounding robots to determine whether the upper computer has issued an instruction, decide from the displayed color whether it is a running instruction or a simulation instruction, and respond to the corresponding instruction. Robots that have received the instruction from the host computer do not need to recognize the indicator lights of surrounding robots. In this way, the situation in which a robot cannot work because it has not received the instruction is avoided.
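A sketch of this fallback logic, with a hypothetical color-to-instruction mapping (the patent does not fix which color corresponds to which instruction):

```python
# Hypothetical mapping; the actual colors are not specified in the patent.
COLOR_TO_INSTRUCTION = {"green": "running_instruction", "blue": "simulation_instruction"}

def resolve_instruction(received_instruction, neighbor_led_colors):
    """Fall back to LED colors observed on neighbouring robots when no instruction arrived."""
    if received_instruction is not None:
        return received_instruction          # normal case: obey the host computer directly
    for color in neighbor_led_colors:        # colors recognized in the camera images
        if color in COLOR_TO_INSTRUCTION:
            return COLOR_TO_INSTRUCTION[color]
    return None                              # keep waiting; no instruction observed
```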
In some embodiments, as shown in fig. 7, each of the robots 30 includes: the target tracking module 340 is configured to determine a queen bee robot in other robots, convert the position information back into an image of a corresponding camera using reverse perspective transformation and camera parameters based on the position information of the queen bee robot, determine the camera as a main camera, and control the main camera to track the queen bee robot by using a target tracking algorithm.
The target tracking module 340 determines queen robots among the other robots, for example, by identifying the color of the indicator lights or flashing signals. The position information is converted back into the image of the corresponding camera by using reverse perspective transformation and camera parameters, so that the queen bee robot can be tracked better.
A perspective transformation is a geometric transformation that projects points in three-dimensional space onto the two-dimensional image plane; the inverse perspective transformation reverses this process. Both rely on the camera parameters, i.e., the intrinsic and extrinsic parameters.
The camera internal reference (intrinsic) matrix K describes the imaging characteristics of the camera, including the focal length and the principal point offset. Its form is as follows:
$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
wherein $f_x$ and $f_y$ are the focal lengths of the camera in the x and y directions, and $(c_x, c_y)$ are the coordinates of the principal point of the image (the intersection of the optical axis and the imaging plane).
The camera external reference (extrinsic) matrix describes the position and orientation of the camera in space, consisting of a rotation matrix R and a translation vector t. The form of the extrinsic matrix is as follows:
$$[R \mid t] = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}$$
The rotation matrix R and translation vector t are typically obtained by a camera calibration process. Calibration may be performed using a variety of methods, such as using a checkerboard or a specific calibration plate, by capturing multiple views at different angles, and processing the images using computer vision algorithms (e.g., calibration functions in OpenCV) to calculate R and t.
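As an illustration of this calibration step, the OpenCV sketch below estimates the intrinsic matrix and per-view rotation/translation from checkerboard images; the board dimensions and square size are placeholder values, not figures from the patent.

```python
import cv2
import numpy as np

def calibrate_from_checkerboard(gray_images, board_size=(9, 6), square_m=0.025):
    """Estimate K, distortion, and per-view rvec/tvec from checkerboard views."""
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square_m

    obj_points, img_points = [], []
    for gray in gray_images:
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray_images[0].shape[::-1], None, None)
    # Each rvec converts to a rotation matrix R via cv2.Rodrigues(rvec)[0].
    return K, dist, rvecs, tvecs
```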
To project a three-dimensional space point $P_w = (X, Y, Z)$ to a two-dimensional image point $(u, v)$, the following formula is used:
$$s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \mid t]\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$
After simplification:
$$s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\left(R\,P_w + t\right)$$
wherein s is the projective scale factor.
The goal of the inverse perspective transformation is to transform points on the two-dimensional image back to coordinates in three-dimensional space. This process involves back-projecting the image coordinates back into the camera coordinate system.
Assume that a point $(u, v)$ in the image and the corresponding depth information Z are known. First, the image coordinates need to be converted into a three-dimensional point $P_c$ in camera coordinates,
using the inverse of the camera intrinsic matrix:
$$P_c = Z\,K^{-1}\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$
Next, the three-dimensional point in the camera coordinate system needs to be converted into the world coordinate system. The inverse transformation is performed using the extrinsic matrix of the camera:
$$P_c = R\,P_w + t$$
Simplifying the calculation:
$$P_w = R^{-1}\left(P_c - t\right)$$
wherein $R^{-1}$ is the inverse of the rotation matrix and t is the translation vector.
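The forward and inverse transformations above reduce to a few matrix operations. The NumPy sketch below implements both; the world_to_image direction is the one the target tracking module would use to locate the queen bee robot in a specific camera's image.

```python
import numpy as np

def image_to_world(u, v, Z, K, R, t):
    """Inverse perspective transformation: pixel (u, v) with known depth Z -> world point."""
    P_c = Z * np.linalg.inv(K) @ np.array([u, v, 1.0])  # point in the camera frame
    return np.linalg.inv(R) @ (P_c - t)                 # P_w = R^-1 (P_c - t)

def world_to_image(P_w, K, R, t):
    """Forward projection of a world point into a camera image."""
    uvw = K @ (R @ P_w + t)
    return uvw[:2] / uvw[2]                             # (u, v) after perspective division
```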
Each robot is provided with its own on-board computer, which hosts the visual perception module, map generation module, algorithm optimization module, target tracking module, and so on, thereby realizing the distributed deployment of the visual perception, map generation, algorithm optimization, and target tracking algorithms.
The on-board computer uses an Nvidia Jetson NX processor, which offers multiple operating modes, up to 6-core processing, and up to 21 TOPS of accelerated computing capability, enabling modern neural networks to run in parallel and data from multiple high-resolution sensors to be processed. Its performance and real-time computing capability provide the basis for deploying the robots' localization and perception algorithms, allow the clustered robots to complete their respective tasks without the help of other computers, and greatly improve their ability to cope with complex environments.
Each robot is also equipped with its own microcontroller; to realize state visualization, autonomous real-time motion control, information interaction, and similar functions, an STM32H7B0 microcontroller is used as the main controller. This model provides multiple signal input/output interfaces; besides communicating with the on-board computer, it accommodates various sensors and functional modules and is easy to upgrade and extend. It has a Cortex-M7 core running at 280 MHz, with 128 KB of flash memory and 1.18 MB of user SRAM, giving it ample capacity for deploying and running low-level motion control code, largely meeting the functional design requirements and helping to improve the overall efficiency of the robot.
For the drive system, Mecanum wheels are used to meet the robot's different motion requirements in different scenarios. They allow the robot to move forward, sideways, diagonally, rotate, and combine these motions, with a maximum speed of 1 m/s. The overall configuration is tuned so that sufficiently powerful motors can be used while keeping the body as small as possible, allowing more robots to be deployed in the same space.
To supply both the drive system and the on-board computer, the robot is powered by a 9800 mAh battery. The main power loop runs at a nominal 11.1 V, which is converted to 5 V and 3.3 V by two high-power switching power supply chips to power the peripherals and the control board. The on-board computer, the largest power consumer, is connected directly to the main loop through an electronically controlled switch. When the robot is in the low-power sleep mode the power supply loop is switched off and the theoretical standby time is no less than one year; when the whole system is powered on it can run for 2-4 hours under high computational load and motion.
Six sensors are integrated in the robot: a nine-axis IMU acquires acceleration, angular velocity, and attitude information, while an odometer, a GPS module, and a UWB module can assist navigation; the relevant sensor data can be collected according to different requirements during actual development.
At present, many cluster robots are not specifically optimized for cluster-level functions: both the operating logic and the software management follow a traditional one-to-one management mode, which makes experiments as well as daily development and debugging very cumbersome.
The distributed vision cluster robot system provided by the embodiment can realize one-key cluster management. The upper computer distributes the operation instructions and the simulation instructions to the robots through the simulation server, and the robots can realize clustered operation or simulation after receiving the corresponding instructions.
In addition, each robot is further provided with a wireless transceiver module and a charging slot, and the upper computer can distribute charging instructions, so that the clustered robots can be charged in a unified manner, making charging more convenient.
To facilitate centralized power-on and power-off, the whole distributed vision cluster robot system adopts a low-power design. When no power-on command has been received, only the low-power wireless transceiver module that listens for the power-on signal and its regulated power supply remain active, with a total standby current of only 105 uA. After receiving the power-on command, the robot enables the main power supply and the power supply of the on-board computer.
In some embodiments, as shown in fig. 8, the simulation server 20 includes: the pose synchronization module 210 is configured to select pose information with the same time stamp according to the time stamp of each piece of pose information, and send the pose information to each robot.
Once the simulation starts, each robot uploads its pose information to the simulation server in real time. The simulation server therefore needs to synchronize the pose information of the robots in time and send pose information carrying the same timestamp to every robot, thereby ensuring temporal consistency.
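A sketch of this synchronization logic: poses are buffered per timestamp, and a timestamp is released for broadcast only once every robot has reported it. In practice the timestamps would first have to be quantized to a common tick, a detail the patent does not specify.

```python
from collections import defaultdict

class PoseSynchronizer:
    """Buffer per-robot poses and release only timestamps reported by every robot."""
    def __init__(self, robot_ids):
        self.robot_ids = set(robot_ids)
        self.buffer = defaultdict(dict)          # timestamp -> {robot_id: pose}

    def push(self, robot_id, timestamp, pose):
        """Store one pose; return the complete set for this timestamp, or None."""
        self.buffer[timestamp][robot_id] = pose
        if set(self.buffer[timestamp]) == self.robot_ids:
            return self.buffer.pop(timestamp)    # complete: send to every robot
        return None
```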
In some embodiments, as shown in fig. 9, the upper computer 10 includes: the instruction generating module 110 is configured to receive input multi-modal information, analyze and identify the multi-modal information by using a language big model, and obtain the running instruction or the simulation instruction; the category of the multi-modal information comprises one of voice, text and image.
The language big model may be an existing trained model that analyzes and recognizes input information such as voice, text, and images and generates the corresponding operation instruction or simulation instruction from the analysis and recognition results, so that the operation instructions or simulation instructions do not need to be written by hand.
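A minimal sketch of how the instruction generation module might wrap such a model; the generate call and the RUN/SIMULATE labels are assumptions, since the patent does not name the model or its interface.

```python
def parse_command(language_model, user_input: str) -> dict:
    """Ask a pretrained language model to classify operator input as a run or simulation instruction."""
    prompt = ("Classify the following operator request as RUN or SIMULATE "
              "and summarize its parameters:\n" + user_input)
    reply = language_model.generate(prompt)        # assumed interface of the deployed model
    kind = "simulation_instruction" if "SIMULATE" in reply else "operation_instruction"
    return {"type": kind, "detail": reply}
```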
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations, they should be considered within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (7)

1. A distributed vision cluster robot system, which is characterized by comprising a plurality of robots, a simulation server and an upper computer; the upper computer sends an operation instruction and a simulation instruction to each robot through the simulation server; each robot is provided with four cameras;
After each robot receives an operation instruction, controlling the four cameras to respectively acquire images, acquiring the position information of other robots based on a visual perception algorithm, generating an environment map based on each image, and generating a corresponding planning path based on each position information and the environment map;
after receiving simulation instructions, each robot uploads its respective pose information to the simulation server, the simulation server synchronizes the pose information and then sends the synchronized pose information to each robot, and each robot constructs twins of all the robots in a visual simulation environment according to the pose information, so as to judge whether it is synchronized with the other robots;
Each of the robots includes:
the visual perception module is used for mapping the four images to a tetrahedron template to obtain a tetrahedron model, and inputting the tetrahedron model to a trained target detection model to obtain the position information of other robots;
The visual perception module comprises:
The model construction module is used for converting each point in the four images into a three-dimensional coordinate point based on a coordinate conversion relation, obtaining the minimum distance between each three-dimensional coordinate point and the tetrahedron template, and obtaining each projection point in the tetrahedron template by moving the minimum distance along the normal vector direction to obtain the tetrahedron model;
The visual perception module comprises:
the model training module is used for marking the bounding boxes and the categories of the robots in the tetrahedral model samples based on the tetrahedral model sample training set; and training an initial three-dimensional detection model by using the marked tetrahedral model sample training set to obtain the target detection model.
2. The system of claim 1, wherein each of the robots comprises:
The map generation module is used for extracting the characteristic points of the four images to generate a sparse map, acquiring depth information of the four images based on a depth estimation algorithm, and binding and fusing the depth information with the sparse map to obtain the environment map.
3. The system of claim 1, wherein each of the robots comprises:
The algorithm optimization module is used for obtaining a first distance between the twin bodies, obtaining a second distance between the robots calculated by the robots through the visual perception algorithm, evaluating the visual perception algorithm based on errors between the second distances and the corresponding first distances, and optimizing the visual perception algorithm until the evaluation result meets the requirement.
4. The system according to claim 1, wherein each robot is provided with an indicator light, and after receiving the operation instruction or the simulation instruction, the indicator light is controlled to display a corresponding color;
and when a robot has not received the running instruction or the simulation instruction, it judges the indicator light display of the other robots based on the image and executes the running instruction or the simulation instruction according to the displayed color.
5. The system of claim 1, wherein each of the robots comprises:
The target tracking module is used for determining queen bee robots in other robots, converting the position information into images of corresponding cameras by using reverse perspective transformation and camera parameters based on the position information of the queen bee robots, determining the cameras as main cameras, and controlling the main cameras to track the queen bee robots by using a target tracking algorithm.
6. The system of claim 1, wherein the simulation server comprises:
And the gesture synchronization module is used for selecting gesture information with the same time stamp according to the time stamp of each piece of gesture information and sending the gesture information to each robot.
7. The system of claim 1, wherein the host computer comprises:
The instruction generation module is used for receiving input multi-modal information, analyzing and identifying the multi-modal information by utilizing a language big model to obtain the running instruction or the simulation instruction; the category of the multi-modal information comprises one of voice, text and image.
CN202410805748.2A 2024-06-21 2024-06-21 Distributed vision cluster robot system Active CN118372258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410805748.2A CN118372258B (en) 2024-06-21 2024-06-21 Distributed vision cluster robot system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410805748.2A CN118372258B (en) 2024-06-21 2024-06-21 Distributed vision cluster robot system

Publications (2)

Publication Number Publication Date
CN118372258A CN118372258A (en) 2024-07-23
CN118372258B true CN118372258B (en) 2024-09-03

Family

ID=91902088

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410805748.2A Active CN118372258B (en) 2024-06-21 2024-06-21 Distributed vision cluster robot system

Country Status (1)

Country Link
CN (1) CN118372258B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108724190A (en) * 2018-06-27 2018-11-02 西安交通大学 A kind of industrial robot number twinned system emulation mode and device
CN117666573A (en) * 2023-11-29 2024-03-08 国网北京市电力公司 Map construction method, map construction device and inspection system of clustered robot

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111897332B (en) * 2020-07-30 2022-10-11 国网智能科技股份有限公司 Semantic intelligent substation robot humanoid inspection operation method and system
CN114842156A (en) * 2021-02-01 2022-08-02 华为技术有限公司 Three-dimensional map construction method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108724190A (en) * 2018-06-27 2018-11-02 西安交通大学 A kind of industrial robot number twinned system emulation mode and device
CN117666573A (en) * 2023-11-29 2024-03-08 国网北京市电力公司 Map construction method, map construction device and inspection system of clustered robot

Also Published As

Publication number Publication date
CN118372258A (en) 2024-07-23

Similar Documents

Publication Publication Date Title
CN111897332B (en) Semantic intelligent substation robot humanoid inspection operation method and system
Chen et al. Real-time object tracking on a drone with multi-inertial sensing data
CN106323269B (en) Autonomous positioning navigation equipment, positioning navigation method and automatic positioning navigation system
Liu et al. Detection and pose estimation for short-range vision-based underwater docking
CN112734765B (en) Mobile robot positioning method, system and medium based on fusion of instance segmentation and multiple sensors
Wan et al. Teaching robots to do object assembly using multi-modal 3d vision
Mariottini et al. EGT for multiple view geometry and visual servoing: robotics vision with pinhole and panoramic cameras
CN110497901A (en) A kind of parking position automatic search method and system based on robot VSLAM technology
CN103196370B (en) Measuring method and measuring device of conduit connector space pose parameters
JP2021089724A (en) 3d auto-labeling with structural and physical constraints
CN114332360A (en) Collaborative three-dimensional mapping method and system
CN114097004A (en) Autonomous task performance based on visual embedding
CN111383263A (en) System, method and device for grabbing object by robot
CN113696188B (en) Hand-eye calibration data acquisition method and device, electronic equipment and storage medium
CN115824218A (en) Ground unmanned platform autonomous navigation system design method based on intelligent accelerator card
CN113284192A (en) Motion capture method and device, electronic equipment and mechanical arm control system
Qin et al. Real-time positioning and tracking for vision-based unmanned underwater vehicles
CN113219854A (en) Robot simulation control platform, method and computer storage medium
Mallik et al. Real-time Detection and Avoidance of Obstacles in the Path of Autonomous Vehicles Using Monocular RGB Camera
WO2022142890A1 (en) Data processing method and related apparatus
CN111812978A (en) Cooperative SLAM method and system for multiple unmanned aerial vehicles
CN118372258B (en) Distributed vision cluster robot system
CN111413691A (en) Semantic positioning and mapping method adopting distributed structure
EP4207100A1 (en) Method and system for providing user interface for map target creation
Zhang Deep learning applications in simultaneous localization and mapping

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant