
CN112894815B - Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm - Google Patents

Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm

Info

Publication number
CN112894815B
CN112894815B (application CN202110097875.8A)
Authority
CN
China
Prior art keywords
coordinate system
image
points
pixel
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110097875.8A
Other languages
Chinese (zh)
Other versions
CN112894815A (en)
Inventor
田军委 (Tian Junwei)
闫明涛 (Yan Mingtao)
张震 (Zhang Zhen)
苏宇 (Su Yu)
赵鹏 (Zhao Peng)
徐浩铭 (Xu Haoming)
杨寒 (Yang Han)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan Ruidong Intelligent Technology Co ltd
Original Assignee
Xi'an Technological University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Technological University
Priority to CN202110097875.8A priority Critical patent/CN112894815B/en
Publication of CN112894815A publication Critical patent/CN112894815A/en
Application granted granted Critical
Publication of CN112894815B publication Critical patent/CN112894815B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J 9/1697 Vision controlled systems
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1612 Programme controls characterised by the hand, wrist, grip control
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J 9/1661 Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J 9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Orthopedic Medicine & Surgery (AREA)
  • Image Analysis (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a method for detecting the optimal grasping pose for a visual servo mechanical arm, which mainly comprises the following steps: step one, reading a photo; step two, extracting SURF feature points; step three, generating feature vectors of the image from the SURF feature points; step four, preliminarily establishing matching pairs (which still contain outliers); step five, enforcing affine-transformation consistency and removing outliers that do not satisfy the transformation; step six, acquiring the polygonal frame of the target object; step seven, resolving the optimal pose in a two-dimensional coordinate system; and step eight, servo-controlling the mechanical arm to grasp the article. The method combines monocular vision with a servo mechanical arm and obtains the optimal pose angle of the target object, i.e. the rotation angle of the servo gripper, from the geometric slope between two points. The method is simple and easy to implement, occupies few resources, computes quickly, and has broad application prospects.

Description

Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm
Technical Field
The invention relates to the technical field of vision processing, in particular to a method for detecting the optimal pose for article grasping by a visual servo mechanical arm.
Background
Picking up objects such as mobile phones dropped into narrow places, for example sewer drains and wall joints, is a widespread problem. Grasping the article with a mechanical-arm structure is an effective scheme, but a matching method for locating the article and commanding the arm's actions has long been lacking. Before a robot automatically grasps a target object, the pose of the object must be determined, i.e. a suitable grasping position and grasping posture; machine-vision detection is the most commonly used method for target pose detection. Typical machine-vision pose extraction methods fall into two categories, monocular vision and binocular vision. Monocular vision is convenient to operate and processes data quickly, but it lacks depth information and can hardly complete depth measurement; binocular vision can measure depth by matching the imaging points of the same spatial point in two cameras, and can locate and orient simply shaped objects in simple environments. The scenario of picking up a simply shaped object such as a mobile phone from a narrow place is exactly suited to the monocular-vision approach.
It is therefore necessary to provide a method for detecting the optimal pose for article grasping by a visual servo mechanical arm that is based on monocular vision, computes quickly and occupies few resources.
Disclosure of Invention
The purpose of the invention is as follows: in order to overcome the defects of the prior art, the invention provides a method for detecting the optimal pose for article grasping by a visual servo mechanical arm that is based on monocular vision, computes quickly and occupies few resources.
The technical scheme is as follows: in order to achieve the above purpose, the invention discloses a method for detecting the optimal grasping pose for a visual servo mechanical arm, characterized by comprising the following steps (a minimal code sketch of steps one to six follows the list):
step one, reading a photo;
step two, extracting SURF feature points;
step three, generating feature vectors of the image from the SURF feature points;
step four, preliminarily establishing matching pairs (which still contain outliers);
step five, enforcing affine-transformation consistency and removing outliers that do not satisfy the transformation;
step six, acquiring the polygonal frame of the target object;
step seven, resolving the optimal pose in a two-dimensional coordinate system;
and step eight, servo-controlling the mechanical arm to grasp the article.
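Steps one to six correspond closely to a standard feature-matching pipeline. The following is a minimal sketch of that pipeline, assuming OpenCV with the contrib modules (cv2.xfeatures2d) is installed; the file names, Hessian threshold and RANSAC tolerance are illustrative assumptions, not values taken from the patent:

```python
import cv2
import numpy as np

# Step 1: read the template (article) photo and the scene photo.
template = cv2.imread("item.png", cv2.IMREAD_GRAYSCALE)   # assumed file names
scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Steps 2-3: extract SURF feature points and their 64-D feature vectors.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)  # illustrative threshold
kp_t, des_t = surf.detectAndCompute(template, None)
kp_s, des_s = surf.detectAndCompute(scene, None)

# Step 4: preliminary matching; this set still contains outliers.
matches = cv2.BFMatcher(cv2.NORM_L2).match(des_t, des_s)

# Step 5: fit an affine transform robustly and drop matches that violate it.
src = np.float32([kp_t[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_s[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
M, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                  ransacReprojThreshold=5.0)

# Step 6: map the template outline into the scene to get the polygonal frame.
h, w = template.shape
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
frame = cv2.transform(corners, M)   # corresponds to the CDEF frame of step seven
```

Steps seven and eight then operate on the corners of this frame, as detailed below.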
Further, in the first step, a camera imaging model is adopted for photo shooting and camera calibration; the camera imaging model consists of the world coordinate system Ow-XwYwZw, the camera coordinate system Oc-XcYcZc, the pixel coordinate system Op-uv and the image coordinate system Oi-XiYi. P is a point in the camera coordinate system with coordinates (Xc, Yc, Zc); P' is its imaging point in the image, with coordinates (x, y) in the image coordinate system and (u, v) in the pixel coordinate system. Camera calibration determines the relationships among the camera coordinate system, the image coordinate system, the pixel coordinate system and the real-world coordinate system.
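As a worked illustration of these coordinate systems, the sketch below projects one camera-frame point P to image and pixel coordinates; the focal length, pixel size and principal point are made-up values, not the patent's calibration results:

```python
# Assumed camera parameters (illustrative only):
f = 0.004               # focal length, metres
dx = dy = 5e-6          # physical size of one pixel, metres
u0, v0 = 320.0, 240.0   # principal point, pixels

# A point P in the camera coordinate system Oc-XcYcZc (metres).
Xc, Yc, Zc = 0.05, -0.02, 0.60

# Perspective projection to the image coordinate system Oi-XiYi.
x, y = f * Xc / Zc, f * Yc / Zc

# Conversion to the pixel coordinate system Op-uv.
u, v = x / dx + u0, y / dy + v0
print(f"P': image ({x:.6f}, {y:.6f}) m -> pixel ({u:.1f}, {v:.1f})")
```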
Further, in the second step, the following sub-steps are included,
Sub-step one, establishing an integral image; the sum of the pixels of any area can then be calculated by simple addition and subtraction, and the sum of the gray values of all the pixels in the rectangle ABCD is given by formula (1):
∑ = A − B − C + D    (1)
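Formula (1) can be checked directly with OpenCV's integral image, where the four corner look-ups play the roles of A, B, C and D; the test image here is arbitrary:

```python
import cv2
import numpy as np

img = (np.random.rand(480, 640) * 255).astype(np.uint8)   # arbitrary test image
ii = cv2.integral(img)   # ii[r, c] = sum of img[:r, :c]

# Sum over the rectangle with top-left (r0, c0) and bottom-right (r1, c1):
r0, c0, r1, c1 = 100, 200, 150, 260
rect_sum = ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
assert rect_sum == img[r0:r1, c0:c1].sum()   # three additions/subtractions
```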
Sub-step two, establishing a Hessian matrix; for a pixel point (x, y), the Hessian matrix of the function f(x, y) at scale σ is defined as formula (2):

$$H(x,\sigma)=\begin{bmatrix}L_{xx}(x,\sigma) & L_{xy}(x,\sigma)\\ L_{xy}(x,\sigma) & L_{yy}(x,\sigma)\end{bmatrix}\qquad(2)$$

It can be seen that the H matrix is formed from the second-order partial derivatives of the function f, and an H matrix can be solved for every pixel point. In the formula, L_xx(x, σ) is the convolution of the image f with the second derivative of a Gaussian of scale σ, defined as formula (3):

$$L_{xx}(x,\sigma)=f(x)*\frac{\partial^{2}}{\partial x^{2}}g(\sigma)\qquad(3)$$

L_xy(x, σ) and L_yy(x, σ) are defined similarly.
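Formulas (2)-(3) can be sketched with Gaussian-derivative filtering; note that SURF itself approximates these convolutions with box filters over the integral image, which this sketch (based on SciPy, an assumption of convenience) does not replicate:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_determinant(img: np.ndarray, sigma: float) -> np.ndarray:
    """det(H) of formula (2) at every pixel, for scale sigma."""
    f = img.astype(np.float64)
    # order=(0, 2): convolution with the second x-derivative of a Gaussian,
    # i.e. L_xx of formula (3); the other entries follow the same pattern.
    Lxx = gaussian_filter(f, sigma, order=(0, 2))
    Lyy = gaussian_filter(f, sigma, order=(2, 0))
    Lxy = gaussian_filter(f, sigma, order=(1, 1))
    return Lxx * Lyy - Lxy ** 2
```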
Sub-step three, establishing a scale space; the detection and extraction of SURF feature extreme points are based on scale-space theory; responses at different scales are constructed by changing the scale of the Gaussian filter rather than changing the image itself.
Sub-step four, extracting feature points; candidate points are screened by non-maximum suppression: the Hessian-determinant response of each pixel point is compared with the 26 points of its 3-dimensional (space and scale) neighbourhood and a threshold is set; feature points at sub-pixel level are then obtained by 3-dimensional linear interpolation, and points whose response is smaller than the threshold are removed, leaving more stable points.
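A minimal sketch of that 26-neighbour comparison, assuming the determinant responses of three adjacent scales are stacked into one array (the threshold value is an illustrative assumption):

```python
import numpy as np

def local_maxima_3d(responses: np.ndarray, threshold: float = 0.002) -> list:
    """responses: (scales, H, W) stack of det(H) maps for adjacent scales.
    Returns (scale, row, col) of points exceeding all 26 neighbours."""
    s, h, w = responses.shape
    peaks = []
    for k in range(1, s - 1):
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                v = responses[k, i, j]
                if v < threshold:
                    continue  # weak responses are removed, keeping stable points
                cube = responses[k - 1:k + 2, i - 1:i + 2, j - 1:j + 2]
                if v >= cube.max() and (cube == v).sum() == 1:
                    peaks.append((k, i, j))  # candidate for sub-pixel refinement
    return peaks
```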
Sub-step five, selecting the direction of the feature points; with the feature point as the centre and the scale s determined at interest-point detection, the Haar wavelet responses of all points within a sliding sector of the circular neighbourhood are counted, with pixels close to the feature point given larger weights; the sum of all Haar wavelet responses within the sector forms a vector, i.e. the direction of that sector; all sectors are traversed and the direction of the longest vector is selected as the direction of the feature point.
Sub-step six, constructing the SURF feature point descriptor; a square window with side length 20s is constructed centred on the feature point, where s is the scale of the feature point; the window is divided into 16 (4 × 4) sub-regions, and for each sub-region, containing 25 pixel elements, the sums of the horizontal-x and vertical-y Haar wavelet responses, ΣHx and ΣHy, are counted; the 4-dimensional descriptor V = [ΣHx, ΣHy, Σ|Hx|, Σ|Hy|] of each sub-region is collected, yielding a feature descriptor whose feature vector is 64-dimensional.
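The following sketch shows how the 16 sub-region statistics concatenate into the 64-dimensional vector; it assumes the per-pixel Haar responses Hx and Hy over the oriented 20s window have already been sampled onto a 20 × 20 grid (faked here with random numbers):

```python
import numpy as np

def surf_descriptor(Hx: np.ndarray, Hy: np.ndarray) -> np.ndarray:
    """Hx, Hy: (20, 20) Haar responses; returns the 64-D descriptor
    [ΣHx, ΣHy, Σ|Hx|, Σ|Hy|] collected over the 16 sub-regions."""
    desc = []
    for i in range(0, 20, 5):        # 4 rows of sub-regions (5 x 5 samples each)
        for j in range(0, 20, 5):    # 4 columns of sub-regions
            hx, hy = Hx[i:i + 5, j:j + 5], Hy[i:i + 5, j:j + 5]
            desc += [hx.sum(), hy.sum(), np.abs(hx).sum(), np.abs(hy).sum()]
    v = np.asarray(desc)
    return v / np.linalg.norm(v)     # normalisation, common in practice

rng = np.random.default_rng(0)
d = surf_descriptor(rng.standard_normal((20, 20)), rng.standard_normal((20, 20)))
assert d.shape == (64,)
```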
Further, in the seventh step,
the image is arranged in a pixel coordinate system O p-uv The middle position relation is converted into the physical coordinate O of the image i -X i Y i The purpose of the method is to calculate the geometric relationship in a physical coordinate system of an image, and the conversion formula is in a matrix form (4):
Figure BDA0002914544680000031
where (u, v) is the pixel in the horizontal and vertical directions, let O i (u 0 ,v 0 ) Is a pixel at the center of the coordinate system of the image, and dx and dy are the actual physical dimensions of each pixel in the directions of the x-axis and the y-axisU is known from calibration of the camera 0 ,v 0 Dx, dy; the centroid O of the article can be obtained by using the image algorithm in the article rectangular frame CDEF matched with the SURF invariant feature point algorithm 4 (u s ,v s ) And CDEF pixel coordinates, and further determining the article holding point A (u) based on the geometric relationship 1 ,v 1 ) And B (u) 2 ,v 2 ) The pixel coordinates of (a), which are matched to the rectangular frame of the article;
holding point A (u) of article 1 ,v 1 ) And B (u) 2 ,v 2 ) Substituting the pixel coordinates of (a) into equation (4) derives the physical coordinates of a and B, as in equation (5):
Figure BDA0002914544680000032
and formula (6):
Figure BDA0002914544680000033
From the image physical coordinates A(x1, y1) and B(x2, y2), the included angle α, i.e. the optimal pose angle, can be solved. In theory the rotation angle the servo gives the manipulator gripper is also α, but because of the systematic error of the mechanical arm the actual rotation is slightly smaller or larger than α, so the commanded rotation angle is denoted β. Taking clockwise rotation of the gripper relative to the article as positive and counterclockwise as negative, the slope of line AB follows from the geometric relationship between the two points in the coordinate system, giving formula (7):

$$k=\tan\alpha=\frac{y_{2}-y_{1}}{x_{2}-x_{1}}\qquad(7)$$
from equations (5), (6) and (7), equation (8) can be derived:
$$\alpha=\arctan\frac{(v_{2}-v_{1})\,dy}{(u_{2}-u_{1})\,dx}\qquad(8)$$
the angle α in the formula (8) is obtained, the value range [0,90 ° ] is obtained, if the regular manipulator holds the robot to rotate clockwise, and if the regular manipulator holds the robot to rotate counterclockwise, the robot rotates counterclockwise. Alpha is the pose angle of the article on a two-dimensional plane and also is the clamping angle of the rotation of the mechanical hand, and the clamping point is two points A and B. As can be seen from the formula (8), dx and dy are intrinsic parameters of the camera, so that alpha is only related to the difference between the pixel points of the two points A and B, thereby simplifying the complex calculation in the image and improving the visual servo effect.
Advantageous effects: the method for detecting the optimal pose for grasping by a visual servo mechanical arm combines monocular vision with the servo mechanical arm and, through camera calibration, image acquisition, image preprocessing and related processes, matches SURF invariant feature points to the target object, thereby completing target detection. The method is simple and easy to implement, effectively avoids the image-matching resource occupation of the binocular-vision pose measurement process, and computes quickly. Experiments verify the effectiveness and accuracy of the algorithm, which can complete the detection of the optimal pose and the determination of the manipulator gripping points and meets the grasping requirements of the visual servo mechanical arm. The measurement algorithm can also be applied to fields such as industrial part positioning and palletizing-robot handling, and has broad application prospects.
Drawings
FIG. 1 is a schematic overall view of a robotic arm article-grasping system;
FIG. 2 is a schematic view of an imaging model of a camera;
FIG. 3 is a schematic illustration of the calibration process;
FIG. 4 is a schematic diagram of fast integral map calculation;
FIG. 5 is a pose resolving diagram;
FIG. 6 is a schematic view of a rectangular box of a matched item;
FIG. 7 is a summary diagram of a MATLAB toolkit calibration experiment;
FIG. 8 is a schematic diagram of pose detection experiment operation under a single background;
FIG. 9 is a schematic diagram of the pose detection experiment under a complex background;
fig. 10 shows the pose resolving and article-grasping experiments in physical form.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The mechanical arm and the camera generally have two positional relationships: eye-in-hand, with the camera mounted on the mechanical arm, which offers high operability and can acquire high-quality image data; and eye-to-hand, with the camera placed outside the mechanical arm, which gives the camera a wide field of view and strong flexibility, although the mechanical arm is easily affected by the camera's position. Here the camera is mounted on the mechanical arm, and a fixed transformation relationship is obtained once the camera is calibrated with the mechanical-arm end effector; O1, O2, O3 and O4 denote the origins of the mechanical-arm coordinate system, the manipulator-gripper coordinate system, the camera coordinate system and the article coordinate system, respectively. The principle of the scheme is shown in figure 1.
The method for detecting the optimal pose for grasping by the visual servo mechanical arm comprises the following steps:
step one, reading a photo;
step two, extracting SURF feature points;
step three, generating feature vectors of the image from the SURF feature points;
step four, preliminarily establishing matching pairs (which still contain outliers);
step five, enforcing affine-transformation consistency and removing outliers that do not satisfy the transformation;
step six, acquiring the polygonal frame of the target object;
step seven, resolving the optimal pose in a two-dimensional coordinate system;
and step eight, servo-controlling the mechanical arm to grasp the article.
In the first step, a camera imaging model is adopted for photo shooting and camera calibration; the camera imaging model consists of the world coordinate system Ow-XwYwZw, the camera coordinate system Oc-XcYcZc, the pixel coordinate system Op-uv and the image coordinate system Oi-XiYi. P is a point in the camera coordinate system with coordinates (Xc, Yc, Zc); P' is its imaging point in the image, with coordinates (x, y) in the image coordinate system and (u, v) in the pixel coordinate system; the imaging principle model is shown in fig. 2. Camera calibration determines the relationships among the camera coordinate system, the image coordinate system, the pixel coordinate system and the real-world coordinate system. Specifically, a mathematical model is created from the imaging principle of the camera, the model parameters of the camera are derived using the pixel coordinates of known features and the corresponding world coordinates, and the calibration process of the camera's internal and external parameters is shown in fig. 3.
In the second step, the following sub-steps are included,
Sub-step one, establishing an integral image; the sum of the pixels of any area can then be calculated by simple addition and subtraction; as shown in fig. 4, the sum of the gray values of all the pixels in the rectangle ABCD is given by formula (1):
∑ = A − B − C + D    (1)
Sub-step two, establishing a Hessian matrix; for a pixel point (x, y), the Hessian matrix of the function f(x, y) at scale σ is defined as formula (2):

$$H(x,\sigma)=\begin{bmatrix}L_{xx}(x,\sigma) & L_{xy}(x,\sigma)\\ L_{xy}(x,\sigma) & L_{yy}(x,\sigma)\end{bmatrix}\qquad(2)$$

It can be seen that the H matrix is formed from the second-order partial derivatives of the function f, and an H matrix can be solved for every pixel point. In the formula, L_xx(x, σ) is the convolution of the image f with the second derivative of a Gaussian of scale σ, defined as formula (3):

$$L_{xx}(x,\sigma)=f(x)*\frac{\partial^{2}}{\partial x^{2}}g(\sigma)\qquad(3)$$

L_xy(x, σ) and L_yy(x, σ) are defined similarly.
Sub-step three, establishing a scale space; the detection and extraction of SURF feature extreme points are based on scale-space theory; responses at different scales are constructed by changing the scale of the Gaussian filter rather than changing the image itself.
Sub-step four, extracting feature points; candidate points are screened by non-maximum suppression: the Hessian-determinant response of each pixel point is compared with the 26 points of its 3-dimensional (space and scale) neighbourhood and a threshold is set; feature points at sub-pixel level are then obtained by 3-dimensional linear interpolation, and points whose response is smaller than the threshold are removed, leaving more stable points.
Sub-step five, selecting the direction of the feature points; with the feature point as the centre and the scale s determined at interest-point detection, the Haar wavelet responses of all points within a sliding sector of the circular neighbourhood are counted, with pixels close to the feature point given larger weights; the sum of all Haar wavelet responses within the sector forms a vector, i.e. the direction of that sector; all sectors are traversed and the direction of the longest vector is selected as the direction of the feature point.
Sub-step six, constructing the SURF feature point descriptor; a square window with side length 20s is constructed centred on the feature point, where s is the scale of the feature point; the window is divided into 16 (4 × 4) sub-regions, and for each sub-region, containing 25 pixel elements, the sums of the horizontal-x and vertical-y Haar wavelet responses, ΣHx and ΣHy, are counted; the 4-dimensional descriptor V = [ΣHx, ΣHy, Σ|Hx|, Σ|Hy|] of each sub-region is collected, yielding a feature descriptor whose feature vector is 64-dimensional.
In the seventh step, the optimal pose detection of the article comprises target detection by SURF invariant-feature-point matching and the corresponding pose solution. The camera on the mechanical arm is calibrated and its internal- and external-parameter matrices are solved; optimal pose detection and solution are then performed, in a two-dimensional coordinate system, on the object image detected by SURF invariant-feature-point matching. The solving principle is shown in fig. 5 (a) and (b).
The principle is as follows: in the image coordinate system Oi-XiYi, let the centroid coordinate of the article be O4(sx, sy), let the deviation distance between the centre O2(zx, zy) of the end-of-arm gripper and the optical-axis centre Oi of the camera be e, and let the finger posture of the gripper be PQ; its projection relationship with the camera image is shown in fig. 5(a), where Op-uv is the pixel coordinate system.
The servo mechanical arm is moved from (a) to (b) so that the gripper centre O2 coincides with the article centroid O4, as shown in fig. 5(b). The y-axis of the object posture forms an included angle α with the Xi-axis of the image coordinate system Oi-XiYi, so the gripper is rotated by the angle α until the finger line PQ coincides in projection with the y-axis of the object posture; that is, after the gripper rotates by α its new finger posture is P1Q1, and the gripping points are A(x1, y1) and B(x2, y2). The rotation angle α and the new finger posture P1Q1 provide the accuracy needed for the servo mechanical arm to grasp the article.
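That centring-and-rotating motion is plain 2D geometry. The sketch below expresses it in the image physical plane; the names O2, O4, P and Q follow fig. 5, while the coordinates and angle are made-up values:

```python
import numpy as np

O2 = np.array([0.010, 0.004])    # gripper centre (assumed, metres)
O4 = np.array([0.002, -0.003])   # article centroid (assumed)
P = np.array([0.010, 0.014])     # finger tips (assumed)
Q = np.array([0.010, -0.006])

# 1. Translate the gripper so that O2 coincides with O4.
t = O4 - O2
P, Q, O2 = P + t, Q + t, O2 + t

# 2. Rotate the fingers about O2 by the pose angle alpha -> new posture P1Q1.
alpha = np.radians(32.0)         # would come from formula (8); illustrative
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])
P1, Q1 = O2 + R @ (P - O2), O2 + R @ (Q - O2)
print("new finger posture P1Q1:", P1, Q1)
```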
The positional relationship of the image in the pixel coordinate system Op-uv is converted into the image physical coordinate system Oi-XiYi so that the geometric relationships can be calculated in the image physical coordinate system; the conversion formula, in matrix form, is formula (4):

$$\begin{bmatrix}x\\ y\\ 1\end{bmatrix}=\begin{bmatrix}dx & 0 & -u_{0}\,dx\\ 0 & dy & -v_{0}\,dy\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}u\\ v\\ 1\end{bmatrix}\qquad(4)$$

where (u, v) are the pixel coordinates in the horizontal and vertical directions, Oi(u0, v0) is the pixel at the centre of the image coordinate system, and dx and dy are the actual physical sizes of one pixel along the x-axis and y-axis; u0, v0, dx and dy are known from camera calibration. Within the article rectangular frame CDEF matched by the SURF invariant-feature-point algorithm, an image algorithm yields the article centroid O4(us, vs) and the pixel coordinates of C, D, E and F, from which the pixel coordinates of the article gripping points A(u1, v1) and B(u2, v2), lying on the matched rectangular frame, are determined by geometric relationships;
holding point A (u) of article 1 ,v 1 ) And B (u) 2 ,v 2 ) Substituting the pixel coordinates of (a) into equation (4) derives the physical coordinates of a and B, as in equation (5):
Figure BDA0002914544680000062
and formula (6):
Figure BDA0002914544680000063
From the image physical coordinates A(x1, y1) and B(x2, y2), the included angle α, i.e. the optimal pose angle, can be solved. In theory the rotation angle the servo gives the manipulator gripper is also α, but because of the systematic error of the mechanical arm the actual rotation is slightly smaller or larger than α, so the commanded rotation angle is denoted β. Taking clockwise rotation of the gripper relative to the article as positive and counterclockwise as negative, the slope of line AB follows from the geometric relationship between the two points in the coordinate system, giving formula (7):

$$k=\tan\alpha=\frac{y_{2}-y_{1}}{x_{2}-x_{1}}\qquad(7)$$
from equations (5), (6) and (7), equation (8) can be derived:
$$\alpha=\arctan\frac{(v_{2}-v_{1})\,dy}{(u_{2}-u_{1})\,dx}\qquad(8)$$
the angle α in the formula (8) is obtained, the value range [0,90 ° ] is obtained, if the regular manipulator holds the robot to rotate clockwise, and if the regular manipulator holds the robot to rotate counterclockwise, the robot rotates counterclockwise. Alpha is the pose angle of the article on a two-dimensional plane and also is the clamping angle of the rotation of the mechanical hand, and the clamping point is two points A and B. As can be seen from the formula (8), dx and dy are intrinsic parameters of the camera, so that alpha is only related to the difference between the pixel points of the two points A and B, thereby simplifying the complex calculation in the image and improving the visual servo effect.
Compared with traditional pose detection algorithms, the present algorithm greatly reduces the computation, both in its design principle and in its solving procedure. For example, in the classical PnP problem proposed by Fischler in 1981, the pose of a target relative to the camera is solved from the image coordinates of target points and the camera imaging model; the planar PnP problem is solved with nonlinear algorithms that are accurate but generally require iteration, involve a large amount of computation and suffer stability problems. Later typical algorithms include the POSIT algorithm proposed by Denis Oberkampf, the planar-target pose estimation method (SP) proposed by Gerald Schweighofer, the EPnP algorithm proposed by Lepetit, and the orthogonal iteration algorithm proposed by Lu. All of these solve the pose nonlinearly and therefore occupy considerable resources during image solving, whereas the proposed algorithm avoids the complex iteration and reduces resource occupation.
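For contrast with that iterative PnP family, the sketch below calls OpenCV's general-purpose solver, which recovers the full 3D pose from n point correspondences; the object points, image points and intrinsics are placeholders:

```python
import cv2
import numpy as np

# Known 3D model points of the target (object frame, metres) -- placeholders.
object_pts = np.array([[0, 0, 0], [0.1, 0, 0], [0.1, 0.05, 0], [0, 0.05, 0]],
                      dtype=np.float64)
# Their detected pixel coordinates -- placeholders.
image_pts = np.array([[320, 240], [420, 238], [418, 290], [322, 292]],
                     dtype=np.float64)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(5)

# Iterative PnP is accurate but nonlinear; formula (8) instead needs only the
# pixel difference of two points for the planar grasping case.
ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
```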
MATLAB 2018b and the MATLAB hardware support package usbwebcams were selected for the camera calibration and target detection experiments, on a 64-bit Windows 10 operating system. For the pose resolving and servo-manipulator grasping experiments, the right arm of a Rethink dual-arm robot was selected; the robot runs the 64-bit Linux system Ubuntu 16.04, on which the grasping was verified.
The following four points must be completed before the pose detection and article-grasping experiments on the right arm of the Rethink dual-arm robot: first, calibrate the internal and external parameters of the camera on the right arm; second, correct the relative positions of the gripper centre and the camera centre; third, set the UI display window of the dual-arm robot to 640 × 480 pixels; fourth, calibrate the initial position of the right-arm end so that the right arm searches while moving in the horizontal plane.
Camera calibration experiment
The maximum resolution of the camera on the mechanical arm is 1280 × 800 pixels; the calibration board is a 10 × 7 checkerboard with 28 mm × 28 mm squares; 20 pictures of 480 × 640 pixels were collected and imported into the MATLAB toolbox for camera calibration.
In the MATLAB toolbox calibration experiment the camera calibration result must be evaluated; it can be analysed through the reprojection-error analysis of the camera and through camera-centric and board-centric visualization of the three-dimensional extrinsic parameters; the calibration process is shown in fig. 7.
After the MATLAB toolbox calibration experiment, entering cameraParams.IntrinsicMatrix at the command line yields the camera's intrinsic-parameter matrix, and entering cameraParams.RadialDistortion and cameraParams.TangentialDistortion yields the radial and tangential distortion coefficients of the camera; the physical parameters of the camera are given in Table 1.
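For readers without the MATLAB toolbox, an equivalent calibration can be sketched in OpenCV; the board geometry follows the 10 × 7 checkerboard with 28 mm squares described above (the inner-corner count and image folder are assumptions):

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)    # inner corners assumed for a 10 x 7 checkerboard
square = 0.028      # 28 mm squares, metres
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts = [], []
for path in glob.glob("calib/*.png"):   # the 20 captured 640 x 480 images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# K carries fx, fy, u0, v0; dist carries the radial and tangential terms that
# the MATLAB fields RadialDistortion/TangentialDistortion report.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts,
                                                 (640, 480), None, None)
print("reprojection RMS:", rms)
```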
SURF feature point target detection and pose resolving experiment
To verify the effectiveness of the proposed algorithm for article grasping by the servo mechanical arm, pose detection experiments were completed, comprising a SURF (Speeded-Up Robust Features) invariant-feature-point target detection experiment and a pose resolving experiment. To make the experiments more convincing, the authors completed two groups of contrast experiments, under a single background and under a complex background (without occlusion), observing whether the complex background affects the experimental performance of the pose detection algorithm. Five groups of pose detection results under a single background and five under a complex background were taken for analysis, comparison and evaluation, as shown in Table 2. The pose detection experiment processes under the single and complex backgrounds are shown in fig. 8 and fig. 9.
TABLE 1 calibration results of the camera
[Table 1 is reproduced as an image in the original document; its numeric values are not recoverable from this text.]
The experiments show that SURF invariant-feature matching extracts relatively many feature points in a single background, but crossing (mismatched) pairs appear easily; in the complex background fewer feature points are extracted, but fewer crossing points appear as well. The two cases are comparable in matching accuracy and matching time; overall, the single background is slightly better than the complex background in matching effect, but the overall experimental results show that the algorithm can meet the requirement of detecting articles in a complex background. In either background the matching speed and effect of the SURF invariant-feature-point matching experiment are relatively accurate and stable; any deflection angle of the article in the o-xy plane, i.e. its pose in the two-dimensional plane, can be matched, and the two theoretical finger clamping points A and B can be solved with the image algorithm and the proposed pose-solving algorithm, yielding the pose angle α.
The accuracy of the calculated pose angle α was verified through article-grasping experiments. With the gripper as the reference object, 6 articles each in clockwise and counterclockwise states were taken, at 640 × 480 pixels, in their two-dimensional-coordinate-system pose states under a complex background. The camera's internal parameters u0, v0, dx and dy and the two clamping points A and B for grasping the article were obtained from the camera calibration experiment and the SURF feature-point matching experiment; substituting these parameters into formula (8) yields the optimal pose angle α, i.e. the rotation angle β of the gripper at the end of the mechanical arm. A certain error exists between the calculated pose angle α and the end-gripper rotation angle β; to verify whether this error affects article grasping, an error analysis table of pose angle versus gripper rotation angle is given in Table 3.
TABLE 2 pose detection result analysis under single background and complex background
[Table 2 is reproduced as an image in the original document; its numeric values are not recoverable from this text.]
Table 3 shows a certain error between the pose angle α and the rotation angle β output to the gripper. Analysis of the steering engine in the gripper at the end of the dual-arm robot's mechanical arm shows that its rotation precision resolves only to 0.1°, an absolute systematic error that is unavoidable; the grasping experiments show that the relative error derives from lighting, the texture of the object surface, and similar factors. During the grasping experiments the error stayed below 1-2° and did not affect article grasping. To avoid collisions of the mechanical arm during grasping, the grasping of the article was first simulated on the PC side, and the compiled code was then downloaded into the dual-arm robot system. The 12 selected groups of article-grasping experiments verify the effectiveness of the proposed algorithm well and provide the accuracy needed for visual grasping. The pose resolving and article-grasping experiments are shown in fig. 10.
The optimal pose detection algorithm for measuring objects with a monocular vision system combines monocular vision with the servo mechanical arm and, through camera calibration, image acquisition, image preprocessing and related processes, matches SURF invariant feature points to the target object, thereby completing target detection. The position transformation from the pixel coordinate system to the image coordinate system is calculated through the camera imaging principle, and the optimal pose angle of the target object, i.e. the rotation angle of the servo gripper, is calculated from the slope between two points. The algorithm is simple and easy to implement and effectively avoids the image-matching resource occupation of the binocular-vision pose measurement process. Experiments verify its effectiveness and accuracy; it can complete the detection of the optimal pose of the article and the determination of the manipulator gripping points, and meets the grasping requirements of the visual servo manipulator. The measurement algorithm can also be applied to fields such as industrial part positioning and palletizing-robot handling, and has broad application prospects.
TABLE 3 error analysis chart for pose angle and hand-holding rotation angle
[Table 3 is reproduced as an image in the original document; its numeric values are not recoverable from this text.]
The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations are also intended to fall within the scope of the invention.

Claims (1)

1. A method for detecting the optimal pose for article grasping by a visual servo mechanical arm, characterized by comprising the following steps:
step one, reading a photo;
step two, extracting SURF feature points;
step three, generating feature vectors of the image from the SURF feature points;
step four, preliminarily establishing matching pairs containing outliers;
step five, enforcing affine-transformation consistency and removing outliers that do not satisfy the transformation;
step six, acquiring the polygonal frame of the target object;
step seven, resolving the optimal pose in a two-dimensional coordinate system;
step eight, servo-controlling the mechanical arm to grasp the article;
in the first step, a camera imaging model is adopted for photo shooting and camera calibration; the camera imaging model is composed of a world coordinate system O w -X w Y w Z w Camera coordinate system O c -X c Y c Z c Pixel coordinate system O p Uv and image coordinate system O i -X i Y i Forming; where P is a point in the camera coordinate system and the coordinates are (X) c ,Y c ,Z c ) (ii) a P' is an imaging point in the image, the coordinates in the image coordinate system are (x, y), and the coordinates in the pixel coordinate system are (u, v); the camera calibration is to determine the relationship among a camera coordinate system, an image coordinate system, a pixel coordinate system and a real coordinate system;
in step two, comprising the following sub-steps,
sub-step one, establishing an integral image; the sum of the pixels of any area can then be calculated by simple addition and subtraction, and the sum of the gray values of all the pixels in the rectangle ABCD is given by formula (1):
∑ = A − B − C + D    (1)
sub-step two, establishing a Hessian matrix; for a pixel point (x, y), the Hessian matrix of the function f(x, y) at scale σ is defined as formula (2):

$$H(x,\sigma)=\begin{bmatrix}L_{xx}(x,\sigma) & L_{xy}(x,\sigma)\\ L_{xy}(x,\sigma) & L_{yy}(x,\sigma)\end{bmatrix}\qquad(2)$$

it can be seen that the H matrix is formed from the second-order partial derivatives of the function f, and an H matrix can be solved for every pixel point; in the formula, L_xx(x, σ) is the convolution of the function f with the second derivative of a Gaussian of scale σ, defined as formula (3):

$$L_{xx}(x,\sigma)=f(x)*\frac{\partial^{2}}{\partial x^{2}}g(\sigma)\qquad(3)$$

L_xy(x, σ) and L_yy(x, σ) are defined similarly;
step three, establishing a scale space; the detection algorithm and extraction of the SUFR characteristic extreme point are based on a scale space theory; by changing the scale at which the gaussian filter is used, instead of changing the image itself, a response to a different scale is constructed;
extracting feature points; removing determined characteristic points by using a non-extreme value, comparing the size of each pixel point of the Hessian determinant image with 26 points of a 3-dimensional neighborhood, setting a threshold, obtaining the characteristic points of a sub-pixel level by using a 3-dimensional linear interpolation method, and removing the points of which the characteristic values are smaller than the threshold to obtain more stable points;
selecting the direction of the characteristic points; with the feature point as the center, detecting the interest point to determine the scale s, counting the Haar wavelet responses of all the points in the sector, and giving a larger weight to the pixel close to the feature point; counting the sum of all Haar wavelet responses in the region to form a vector, namely the direction of a sector; traversing all the fan-shaped areas, and selecting the longest vector direction as the direction of the feature point;
a sixth substep of constructing a characteristic point descriptor of the SURF; constructing a square window with 20s as side length by taking the characteristic point as a center, wherein s is the scale of the characteristic point; dividing the image into 16 4 x 4 subregions, wherein each subregion counts the sum of Haar wavelet characteristics of horizontal x and vertical y containing 25 pixel elements to be respectively sigma H x Sum-sigma H y (ii) a Counting 4-dimensional descriptors V [ ∑ H ] of each subarea x ,∑H y ,∑|H x |,∑|H y |]Obtaining a feature descriptor with a feature vector length of 64 dimensions;
in the seventh step, the process is carried out,
images are arranged in a pixel coordinate system O p-uv The middle position relation is converted into the physical coordinate O of the image i -X i Y i The purpose of the method is to calculate the geometric relationship in a physical coordinate system of an image, and the conversion formula is in a matrix form (4):
Figure FDA0003790426240000021
where (u, v) are pixels in the horizontal and vertical directions, let O i (u 0 ,v 0 ) Is shown in the figureThe pixels in the center of the image coordinate system, dx and dy are the actual physical dimensions of each pixel in the directions of the x-axis and the y-axis, and u is known from the calibration of the camera 0 ,v 0 Dx, dy; the centroid O of the article can be obtained by using the image algorithm in the article rectangular frame CDEF matched with the SURF invariant feature point algorithm 4 (u s ,v s ) And CDEF pixel coordinates, and further determining the article holding point A (u) based on the geometric relationship 1 ,v 1 ) And B (u) 2 ,v 2 ) The pixel coordinates of (2) which are matched with the rectangular frame of the article;
holding point A (u) of article 1 ,v 1 ) And B (u) 2 ,v 2 ) Substituting the pixel coordinates of (a) into equation (4) derives the physical coordinates of a and B, as in equation (5):
Figure FDA0003790426240000022
and formula (6):
Figure FDA0003790426240000023
from the image physical coordinates A(x1, y1) and B(x2, y2), the included angle α, i.e. the optimal pose angle, can be solved; in theory the rotation angle the servo gives the manipulator gripper is also α, but because of the systematic error of the mechanical arm the actual rotation is slightly smaller or larger than α, so the commanded rotation angle is denoted β; taking clockwise rotation of the gripper relative to the article as positive and counterclockwise as negative, the slope of line AB follows from the geometric relationship between the two points in the coordinate system, giving formula (7):

$$k=\tan\alpha=\frac{y_{2}-y_{1}}{x_{2}-x_{1}}\qquad(7)$$
from equations (5), (6) and (7), equation (8) can be derived:
Figure FDA0003790426240000025
calculating an angle alpha in the formula (8), wherein the value range is [0,90 degrees ], and if the angle alpha is in the regular clockwise rotation when the mechanical hand is held by the regular manipulator, the angle alpha is in the counterclockwise rotation when the mechanical hand is negative; alpha is the pose angle of the article on a two-dimensional plane and is also the clamping angle of the rotation of the gripper, and the clamping points are two points A and B; as can be seen from the formula (8), dx and dy are intrinsic parameters of the camera, so that alpha is only related to the difference between the pixel points of the two points A and B, thereby simplifying the complex calculation in the image and improving the visual servo effect.
CN202110097875.8A 2021-01-25 2021-01-25 Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm Active CN112894815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110097875.8A CN112894815B (en) 2021-01-25 2021-01-25 Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110097875.8A CN112894815B (en) 2021-01-25 2021-01-25 Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm

Publications (2)

Publication Number Publication Date
CN112894815A CN112894815A (en) 2021-06-04
CN112894815B true CN112894815B (en) 2022-09-27

Family

ID=76119544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110097875.8A Active CN112894815B (en) 2021-01-25 2021-01-25 Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm

Country Status (1)

Country Link
CN (1) CN112894815B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822946B (en) * 2021-10-09 2023-10-20 上海第二工业大学 Mechanical arm grabbing method based on computer vision
CN113975150A (en) * 2021-12-28 2022-01-28 杭州大力神医疗器械有限公司 Percutaneous acupoint therapeutic instrument
CN114750147B (en) * 2022-03-10 2023-11-24 深圳甲壳虫智能有限公司 Space pose determining method and device of robot and robot
CN115488876A (en) * 2022-06-22 2022-12-20 湖北商贸学院 Robot sorting method and device based on machine vision
CN115008477B (en) * 2022-08-09 2023-03-21 苏州华兴源创科技股份有限公司 Manipulator movement compensation method, manipulator movement compensation device and computer-readable storage medium
CN115648224A (en) * 2022-12-22 2023-01-31 北京钢铁侠科技有限公司 Mechanical arm grabbing method based on double-depth camera recognition and positioning

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2048599A1 (en) * 2007-10-11 2009-04-15 MVTec Software GmbH System and method for 3D object recognition
JP2015085450A (en) * 2013-10-31 2015-05-07 セイコーエプソン株式会社 Robot control system, robot, program and robot control method
CN107590837A (en) * 2017-09-06 2018-01-16 西安华航唯实机器人科技有限公司 A kind of vision positioning intelligent precise puts together machines people and its camera vision scaling method
CN110211180A (en) * 2019-05-16 2019-09-06 西安理工大学 A kind of autonomous grasping means of mechanical arm based on deep learning
US10456915B1 (en) * 2019-01-25 2019-10-29 Mujin, Inc. Robotic system with enhanced scanning mechanism
CN111179342A (en) * 2019-12-11 2020-05-19 上海非夕机器人科技有限公司 Object pose estimation method and device, storage medium and robot
CN111267095A (en) * 2020-01-14 2020-06-12 大连理工大学 Mechanical arm grabbing control method based on binocular vision
CN111652928A (en) * 2020-05-11 2020-09-11 上海交通大学 Method for detecting object grabbing pose in three-dimensional point cloud
CN111702755A (en) * 2020-05-25 2020-09-25 淮阴工学院 Intelligent mechanical arm control system based on multi-view stereoscopic vision

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8467596B2 (en) * 2011-08-30 2013-06-18 Seiko Epson Corporation Method and apparatus for object pose estimation
JP5850962B2 (en) * 2014-02-13 2016-02-03 ファナック株式会社 Robot system using visual feedback

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2048599A1 (en) * 2007-10-11 2009-04-15 MVTec Software GmbH System and method for 3D object recognition
JP2015085450A (en) * 2013-10-31 2015-05-07 セイコーエプソン株式会社 Robot control system, robot, program and robot control method
CN107590837A (en) * 2017-09-06 2018-01-16 西安华航唯实机器人科技有限公司 A kind of vision positioning intelligent precise puts together machines people and its camera vision scaling method
US10456915B1 (en) * 2019-01-25 2019-10-29 Mujin, Inc. Robotic system with enhanced scanning mechanism
CN110211180A (en) * 2019-05-16 2019-09-06 西安理工大学 A kind of autonomous grasping means of mechanical arm based on deep learning
CN111179342A (en) * 2019-12-11 2020-05-19 上海非夕机器人科技有限公司 Object pose estimation method and device, storage medium and robot
CN111267095A (en) * 2020-01-14 2020-06-12 大连理工大学 Mechanical arm grabbing control method based on binocular vision
CN111652928A (en) * 2020-05-11 2020-09-11 上海交通大学 Method for detecting object grabbing pose in three-dimensional point cloud
CN111702755A (en) * 2020-05-25 2020-09-25 淮阴工学院 Intelligent mechanical arm control system based on multi-view stereoscopic vision

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multi-target dynamic three-dimensional grasping pose detection method based on deep convolutional networks; Yang Aolei et al.; Chinese Journal of Scientific Instrument; 2019-12-15 (No. 12); pp. 138-145 *

Also Published As

Publication number Publication date
CN112894815A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN112894815B (en) Method for detecting optimal position and posture for article grabbing by visual servo mechanical arm
CN109344882B (en) Convolutional neural network-based robot control target pose identification method
CN110211180A (en) A kind of autonomous grasping means of mechanical arm based on deep learning
CN110751691B (en) Automatic pipe fitting grabbing method based on binocular vision
CN113379849B (en) Robot autonomous recognition intelligent grabbing method and system based on depth camera
CN112669385B (en) Industrial robot part identification and pose estimation method based on three-dimensional point cloud features
CN111476841B (en) Point cloud and image-based identification and positioning method and system
CN113096094B (en) Three-dimensional object surface defect detection method
CN113269723B (en) Unordered grabbing system for parts with three-dimensional visual positioning and manipulator cooperative work
CN109940626B (en) Control method of eyebrow drawing robot system based on robot vision
CN112561886A (en) Automatic workpiece sorting method and system based on machine vision
CN111360821A Picking control method, device and equipment and computer-readable storage medium
CN114474056A (en) Grabbing operation-oriented monocular vision high-precision target positioning method
CN113284179A (en) Robot multi-object sorting method based on deep learning
CN112907586A (en) Mechanical arm control method, device and system based on vision and computer equipment
CN113822946B (en) Mechanical arm grabbing method based on computer vision
CN113510700A (en) Touch perception method for robot grabbing task
CN112767479A (en) Position information detection method, device and system and computer readable storage medium
CN115578460A (en) Robot grabbing method and system based on multi-modal feature extraction and dense prediction
CN113538576A (en) Grabbing method and device based on double-arm robot and double-arm robot
CN117021099A (en) Human-computer interaction method oriented to any object and based on deep learning and image processing
CN114161454A (en) Robot dexterous hand self-knowing object grabbing system
Wang et al. Fast and Accurate 3D Eye-to-hand calibration for large-scale scene based on HALCON
Zheng et al. An intelligent robot sorting system by deep learning on RGB-D image
CN211890823U (en) Four-degree-of-freedom mechanical arm vision servo control system based on RealSense camera

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240222

Address after: 230000 B-2704, wo Yuan Garden, 81 Ganquan Road, Shushan District, Hefei, Anhui.

Patentee after: HEFEI LONGZHI ELECTROMECHANICAL TECHNOLOGY Co.,Ltd.

Country or region after: China

Address before: No.2 Xuefu Middle Road, Weiyang District, Xi'an City, Shaanxi Province, 720021

Patentee before: XI'AN TECHNOLOGICAL University

Country or region before: China

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240909

Address after: Room 101, No. 3-8 Hongnan Road, Daojiao Town, Dongguan City, Guangdong Province 523000

Patentee after: Dongguan Ruidong Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 230000 B-2704, wo Yuan Garden, 81 Ganquan Road, Shushan District, Hefei, Anhui.

Patentee before: HEFEI LONGZHI ELECTROMECHANICAL TECHNOLOGY Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right