
CN118552626B - Single-view image camera calibration method and system - Google Patents

Single-view image camera calibration method and system

Info

Publication number
CN118552626B
CN118552626B (application CN202411002916.0A)
Authority
CN
China
Prior art keywords
layer
module
view image
encoder
image camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411002916.0A
Other languages
Chinese (zh)
Other versions
CN118552626A (en)
Inventor
陈再良
刘栩菁
沈海澜
张健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202411002916.0A
Publication of CN118552626A
Application granted
Publication of CN118552626B

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

本发明公开了一种单视角图像相机标定方法,包括获取现有的图像数据集;基于图像的几何特征提取图像对应的特征向量并构建训练数据集;构建包括编码器网络和解码器网络的单视角图像相机标定初步模型并训练得到单视角图像相机标定模型;采用单视角图像相机标定模型完成目标单视角图像相机的参数标定。本发明还公开了一种实现所述单视角图像相机标定方法的系统。本发明能够在不预设标志物的场景中获得更精准的标定结果,而且可靠性更高,精确性更好,实用性更好。

The present invention discloses a method for calibrating a single-view image camera, including obtaining an existing image data set; extracting a feature vector corresponding to the image based on the geometric features of the image and constructing a training data set; constructing a preliminary model for calibrating a single-view image camera including an encoder network and a decoder network and training to obtain a single-view image camera calibration model; and using the single-view image camera calibration model to complete parameter calibration of a target single-view image camera. The present invention also discloses a system for implementing the single-view image camera calibration method. The present invention can obtain more accurate calibration results in scenes without preset markers, and has higher reliability, better accuracy, and better practicality.

Description

Single-view image camera calibration method and system
Technical Field
The invention belongs to the field of image processing, and particularly relates to a single-view image camera calibration method and system.
Background
Camera calibration is a key preliminary step in the fields of vision and imaging, and plays a vital role in applications such as three-dimensional reconstruction, target tracking and robot navigation. Only with accurate camera calibration can an accurate mapping from the two-dimensional image to the three-dimensional world be established. In recent years, with the continuous development of computer vision technology, the application range of camera calibration has kept expanding, from industrial inspection and autonomous driving to emerging fields such as virtual reality. At the same time, application scenarios for single-view image cameras, such as internet images, are steadily increasing, which places higher demands on the accuracy of single-view image camera calibration.
Traditional single-view image camera calibration methods require that a specified marker be present in the calibration image. Although they can achieve high accuracy under ideal conditions, their calibration performance depends to a great extent on the design and quality of the marker. In unknown scenes, the feature information extracted by traditional single-view calibration methods is often biased because a preset marker is missing or distorted, or because the image contains a large amount of noise; as a result, such schemes tend to perform poorly under non-ideal conditions. To address these limitations of the traditional methods, researchers have proposed single-view image camera calibration schemes based on deep learning.
However, most current deep-learning-based single-view calibration schemes are purely data-driven: they use a large constructed dataset to directly regress the camera focal length or related geometric constraints. Although such methods relieve the reliance on markers and scenes to some extent, their effectiveness depends on the quantity and quality of the data, and large-scale, diverse training data is required to achieve good results. Moreover, treating calibration as a black-box regression problem omits explicit modeling of, and reasoning about, the camera imaging geometry, and fails to fully exploit the geometric structure features associated with the different prediction targets. The practical applicability and accuracy of this type of solution are therefore relatively poor.
Disclosure of Invention
The invention aims to provide a single-view image camera calibration method with high reliability, good accuracy and good practicability.
The second purpose of the invention is to provide a system for realizing the calibration method of the single-view image camera.
The single-view image camera calibration method provided by the invention comprises the following steps:
S1, acquiring an existing image data set;
S2, extracting feature vectors corresponding to the images based on the geometric features of the images from the image data set obtained in step S1, so as to construct a training data set;
S3, constructing a single-view image camera calibration preliminary model comprising an encoder network and a decoder network;
The encoder network is used for extracting geometric features and context information in the input image and adaptively focusing on key areas in the input image;
The decoder network is used for mapping the output characteristics of the encoder network and the characteristics of the input image to the parameter space of the single-view image camera so as to predict the internal parameters and the external parameters of the single-view image camera;
S4, training the single-view image camera calibration preliminary model constructed in the step S3 by adopting the training data set constructed in the step S2 to obtain a single-view image camera calibration model;
S5, adopting the single-view image camera calibration model constructed in the step S4 to finish parameter calibration of the target single-view image camera.
The step S2 specifically comprises the following steps:
For the image dataset acquired in step S1, the lines present in each image are extracted with the LSD (Line Segment Detector) algorithm, expressed as $L_i = \{l_1, l_2, \ldots, l_{n_i}\}$, where $I_i$ denotes the $i$-th image and $n_i$ the number of detected line segments;
each line segment is represented by its line equation in the image plane, $ax + by + c = 0$, where $(x, y)$ are image coordinates, $a$ is the component of the line's normal vector along the x axis, $b$ is its component along the y axis, and $c$ gives the distance from the line to the origin $(0, 0)$;
the line parameters are converted into an upper triangular matrix whose six entries are encoded as a 6-dimensional vector $v$, which serves as the feature vector of the image.
The step S3 comprises the following steps:
constructing an encoder network based on the spatial domain geometry and the frequency domain geometry;
A decoder network is constructed based on a self-attention mechanism.
The construction of the encoder network specifically comprises the following steps:
The constructed encoder network comprises an encoding module, a frequency domain module, an attention module and a reasoning module which are sequentially connected in series; the input of the encoder network is an image in the training dataset;
The coding module is used for carrying out preliminary feature extraction and representation learning on the input image; the coding module comprises a first encoder layer, a second encoder layer, a third encoder layer and a fourth encoder layer; the sum of the input and output of the first encoder layer forms the input of the second encoder layer, the sum of the input and output of the second encoder layer forms the input of the third encoder layer, the sum of the input and output of the third encoder layer forms the input of the fourth encoder layer, and the sum of the input and output of the fourth encoder layer forms the output of the coding module; the four encoder layers share the same structure, each comprising a first convolution layer, a second convolution layer, a third convolution layer, a batch normalization layer and a ReLU activation function layer;
The frequency domain module is used for extracting the geometric feature distribution of the image at different frequencies; the frequency domain module comprises a first normalization layer, a fast Fourier transform layer, a complex parameter layer, an inverse fast Fourier transform layer and a second normalization layer which are sequentially connected in series; the output of the coding module is normalized by the first normalization layer and then transformed by the fast Fourier transform layer; the output of the fast Fourier transform layer is processed by the complex parameter layer to control the importance of different frequency components; the output of the complex parameter layer is transformed back by the inverse fast Fourier transform layer and finally normalized by the second normalization layer to give the output of the frequency domain module; the processing of the complex parameter layer is expressed as $Y = w \odot (X_{\mathrm{re}} + i\,X_{\mathrm{im}})$, where $w$ is a weight parameter, $X_{\mathrm{re}}$ is the real tensor after the Fourier transform, $X_{\mathrm{im}}$ is the imaginary tensor after the Fourier transform, and $i$ is the imaginary unit;
The attention module is used for extracting and enhancing the geometric features in the image that are relevant to single-view image camera calibration; the attention module comprises a linear transformation layer, a first scaled dot-product layer and a geometric feature perception layer which are sequentially connected in series; the linear transformation layer consists of a fully connected layer and performs a linear transformation on the input feature data; the first scaled dot-product layer captures the relative importance of, and correlation between, features, and its processing is expressed as $A = S\!\left(QK^{T}/\sqrt{d_k}\right)$, where $Q$ and $K$ are the matrices obtained from the linear transformation layer, $d_k$ is the dimension of $Q$, and $S$ denotes the softmax function; the geometric feature perception layer extracts a geometric-information feature representation of the input features and comprises a first branch, a second branch and a GeLU activation function sublayer: the input features pass through the first branch and the second branch respectively, the outputs of the two branches are added, processed by the GeLU activation function sublayer, and multiplied by the output of the scaled dot-product layer to obtain the output of the geometric feature perception layer; the first branch comprises a convolution layer, a batch normalization layer and a sigmoid activation function layer sequentially connected in series, and the second branch likewise comprises a convolution layer, a batch normalization layer and a sigmoid activation function layer;
the reasoning module is used for applying a nonlinear transformation to the output features of the attention module and enhancing the feature extraction capacity of the model; the reasoning module comprises a first fully connected layer, a ReLU activation function layer and a second fully connected layer which are sequentially connected in series.
The construction of the decoder network specifically comprises the following steps:
The built decoder network comprises a convolution layer, a first attention layer, a second attention layer and a feedforward neural network layer which are sequentially connected in series; the input of the decoder network is the output of the encoder network and the feature vector corresponding to the image in the training data set;
The convolution layer applies an initial convolutional transformation to the decoder input;
The first attention layer and the second attention layer have the same structure, each comprising a linear projection sublayer, a second scaled dot-product sublayer, a softmax sublayer and a third scaled dot-product sublayer; the linear projection sublayer linearly projects the input features to obtain a query matrix Q, a key matrix K and a value matrix V; the second scaled dot-product sublayer computes the correlation between Q and K, and its processing is expressed as $E = QK^{T}/\sqrt{d_k}$, where Q and K are the matrices obtained from the linear projection sublayer and $d_k$ is the dimension of K; the softmax sublayer normalizes the input data; the third scaled dot-product sublayer extracts global context information, and its processing is expressed as $O = \mathrm{softmax}(E)\,V$, where V is the matrix obtained from the linear projection sublayer;
The feedforward neural network layer uses a feedforward neural network to transform the obtained global context information into the intrinsic and extrinsic parameter spaces of the single-view image camera, thereby realizing parameter calibration of the single-view image camera.
The training of step S4 specifically includes the following steps:
A logarithmic space loss function is adopted for the intrinsic parameters: $L_{\log} = \min\!\left(\left|\log\frac{\widehat{in}}{w} - \log\frac{in}{w}\right|,\ \delta\right)$, where $w$ is the width of the image, $in$ is the true value of the intrinsic parameters of the single-view image camera, $\widehat{in}$ is the predicted value, and $\delta$ is a set threshold;
when predicting the vanishing point constraint of the camera extrinsic parameters, the following first similarity loss function is adopted: $L_{vp} = 1 - \frac{\langle vp, \widehat{vp}\rangle}{\lVert vp\rVert\,\lVert\widehat{vp}\rVert}$, where $vp$ is the true value of the vanishing point coordinates, $\widehat{vp}$ is the predicted value, and $\lVert\cdot\rVert$ is the vector norm;
when predicting the horizon loss of the camera extrinsic parameters, the following second similarity loss function is adopted: $L_{hor} = \frac{1}{n}\sum_{k=1}^{n}\left\lVert g_k(hor) - g_k(\widehat{hor})\right\rVert_1$, where $n$ is the number of selected endpoints, $hor$ is the true value of the horizon, $\widehat{hor}$ is the predicted value, $g_k(\cdot)$ computes the coordinates of the left and right endpoints of the horizon, and $\lVert\cdot\rVert_1$ is the Manhattan distance;
finally, the total loss function is constructed as $L = \lambda_1 L_{\log} + \lambda_2 L_{vp} + \lambda_3 L_{hor}$, where $\lambda_1$ is the first weight, $\lambda_2$ is the second weight, and $\lambda_3$ is the third weight.
The invention also provides a system for realizing the single-view image camera calibration method, comprising a data acquisition module, a training set construction module, a model construction module, a model training module and a camera calibration module; the data acquisition module, the training set construction module, the model construction module, the model training module and the camera calibration module are sequentially connected in series; the data acquisition module is used for acquiring the existing image data set and uploading the data information to the training set construction module; the training set construction module is used for extracting the feature vectors corresponding to the images from the acquired image data set according to the received data information, thereby constructing a training data set, and uploading the data information to the model construction module; the model construction module is used for constructing a single-view image camera calibration preliminary model comprising an encoder network and a decoder network according to the received data information, and uploading the data information to the model training module; the model training module is used for training the constructed single-view image camera calibration preliminary model with the constructed training data set according to the received data information to obtain a single-view image camera calibration model, and uploading the data information to the camera calibration module; and the camera calibration module is used for completing parameter calibration of the target single-view image camera with the constructed single-view image camera calibration model according to the received data information.
The single-view image camera calibration method and system of the invention extract feature vectors based on the geometric features of the images, effectively capturing the geometric features associated with the targets in the images and providing richer, more targeted feature information for the model's training process; this information guides the training of the constructed calibration network, so that the scheme of the invention obtains more accurate calibration results in scenes without preset markers, with higher reliability, better accuracy and better practicability.
Drawings
FIG. 1 is a flow chart of the calibration method of the present invention.
Fig. 2 is a visualization schematic diagram of the calibration method of the present invention: Fig. 2(a) is a visualization of vanishing points, Fig. 2(b) is a visualization of the horizon, and Fig. 2(c) is a visualization of the intrinsic parameters.
Fig. 3 is a schematic diagram of calibration results of the calibration method of the present invention: Figs. 3(a1)-(e1) compare the calibrated horizon results with the true values, and Figs. 3(a2)-(e2) show the calibrated horizon and vanishing point results.
FIG. 4 is a schematic diagram of functional modules of the system of the present invention.
Detailed Description
FIG. 1 is a schematic flow chart of the calibration method of the present invention. As shown in FIG. 1, the single-view image camera calibration method of the invention comprises the following steps:
S1, acquiring an existing image data set;
In a specific implementation, the Google Street View dataset and the HLW dataset can be acquired; these datasets contain pictures together with the parameters of the corresponding single-view image camera;
S2, extracting feature vectors corresponding to the images based on the geometric features of the images from the image data set obtained in the step S1, so as to construct a training data set; the method specifically comprises the following steps:
For the image dataset acquired in step S1, the lines present in each image are extracted with the LSD (Line Segment Detector) algorithm, expressed as $L_i = \{l_1, l_2, \ldots, l_{n_i}\}$, where $I_i$ denotes the $i$-th image and $n_i$ the number of detected line segments;
each line segment is represented by its line equation in the image plane, $ax + by + c = 0$, where $(x, y)$ are image coordinates, $a$ is the component of the line's normal vector along the x axis, $b$ is its component along the y axis, and $c$ gives the distance from the line to the origin $(0, 0)$;
the line parameters are converted into an upper triangular matrix whose six entries are encoded as a 6-dimensional vector $v$, which serves as the feature vector corresponding to the image.
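For illustration, a minimal Python sketch of this feature-extraction step is given below. It assumes an OpenCV build that ships the LSD detector as cv2.createLineSegmentDetector (some versions omit it), and it realizes the conversion to an upper triangular matrix as the Cholesky factor of the second-moment matrix of the homogeneous line coefficients; the patent text does not pin that conversion down, so this choice and the helper names are assumptions.

# Sketch of the per-image geometric feature extraction of step S2.
# Assumptions: OpenCV provides cv2.createLineSegmentDetector, and the
# "upper triangular matrix" is the Cholesky factor of the second-moment
# matrix of the homogeneous line coefficients (one plausible reading).
import cv2
import numpy as np

def line_coefficients(x1, y1, x2, y2):
    # Return (a, b, c) with a*x + b*y + c = 0 and (a, b) unit-normalized,
    # so that |c| is the distance from the line to the origin.
    a, b = y2 - y1, x1 - x2                 # normal vector of the segment
    norm = np.hypot(a, b) + 1e-12
    a, b = a / norm, b / norm
    c = -(a * x1 + b * y1)
    return np.array([a, b, c])

def image_feature_vector(gray):
    lsd = cv2.createLineSegmentDetector()
    segments = lsd.detect(gray)[0]          # (n_i, 1, 4) endpoint array or None
    if segments is None or len(segments) == 0:
        return np.zeros(6)
    M = np.zeros((3, 3))
    for seg in segments.reshape(-1, 4):
        l = line_coefficients(*seg)
        M += np.outer(l, l)                 # accumulate second moments
    M /= len(segments)
    U = np.linalg.cholesky(M + 1e-6 * np.eye(3)).T   # upper triangular factor
    return U[np.triu_indices(3)]            # 6-dimensional feature vector

# Usage: v = image_feature_vector(cv2.imread("street.jpg", cv2.IMREAD_GRAYSCALE))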
S3, constructing a single-view image camera calibration preliminary model comprising an encoder network and a decoder network;
The encoder network is used for extracting geometric features and context information in the input image and adaptively focusing on key areas in the input image, the key areas being those containing salient geometric cues;
The decoder network is used for mapping the output characteristics of the encoder network and the characteristics of the input image to the parameter space of the single-view image camera so as to predict the internal parameters and the external parameters of the single-view image camera;
The specific implementation method comprises the following steps:
Constructing an encoder network based on the spatial domain geometry and the frequency domain geometry; the method specifically comprises the following steps:
The constructed encoder network comprises an encoding module, a frequency domain module, an attention module and a reasoning module which are sequentially connected in series; the input of the encoder network is an image in the training dataset.
The coding module is used for carrying out preliminary feature extraction and representation learning on the input image; the coding module comprises a first encoder layer, a second encoder layer, a third encoder layer and a fourth encoder layer; the sum of the input and output of the first encoder layer forms the input of the second encoder layer, the sum of the input and output of the second encoder layer forms the input of the third encoder layer, the sum of the input and output of the third encoder layer forms the input of the fourth encoder layer, and the sum of the input and output of the fourth encoder layer forms the output of the coding module; the four encoder layers share the same structure, each comprising a first convolution layer, a second convolution layer, a third convolution layer, a batch normalization layer and a ReLU activation function layer.
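As a concrete illustration, a minimal PyTorch sketch of this coding module follows. The kernel sizes of the three convolution layers are not preserved in the source text, so the 3x3 / 3x3 / 1x1 choice, the channel count and the class names below are assumptions.

# Sketch of the coding module: four residual encoder layers, each
# conv-conv-conv + batch norm + ReLU, with input + output feeding the
# next layer. Kernel sizes are illustrative assumptions.
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # first conv
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # second conv
            nn.Conv2d(channels, channels, kernel_size=1),             # third conv
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)   # residual sum feeds the next layer

class CodingModule(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.layers = nn.Sequential(*[EncoderLayer(channels) for _ in range(4)])

    def forward(self, x):
        return self.layers(x)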
The frequency domain module is used for extracting the geometric feature distribution of the image at different frequencies, since geometric features exhibit distinctive distributions in the frequency domain; the frequency domain module comprises a first normalization layer, a fast Fourier transform layer, a complex parameter layer, an inverse fast Fourier transform layer and a second normalization layer which are sequentially connected in series; the output of the coding module is normalized by the first normalization layer and then transformed by the fast Fourier transform layer; the output of the fast Fourier transform layer is processed by the complex parameter layer to control the importance of different frequency components; the output of the complex parameter layer is transformed back by the inverse fast Fourier transform layer and finally normalized by the second normalization layer to give the output of the frequency domain module; the processing of the complex parameter layer is expressed as $Y = w \odot (X_{\mathrm{re}} + i\,X_{\mathrm{im}})$, where $w$ is a weight parameter, $X_{\mathrm{re}}$ is the real tensor after the Fourier transform, $X_{\mathrm{im}}$ is the imaginary tensor after the Fourier transform, and $i$ is the imaginary unit.
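A sketch of the frequency domain module under the reconstruction above follows; realizing the complex parameter layer as one learnable complex weight per frequency bin, and using the real-input transform torch.fft.rfft2, are assumptions.

# Sketch of the frequency domain module:
# LayerNorm -> FFT -> complex re-weighting Y = w * (X_re + i * X_im)
# -> inverse FFT -> LayerNorm.
import torch
import torch.nn as nn

class FrequencyModule(nn.Module):
    def __init__(self, channels, height, width):
        super().__init__()
        self.norm1 = nn.LayerNorm([channels, height, width])
        self.norm2 = nn.LayerNorm([channels, height, width])
        # One complex weight per frequency bin (rfft2 keeps width // 2 + 1 bins).
        self.weight = nn.Parameter(
            torch.randn(channels, height, width // 2 + 1, dtype=torch.cfloat))

    def forward(self, x):
        h, w = x.shape[-2:]
        f = torch.fft.rfft2(self.norm1(x))   # complex tensor X_re + i * X_im
        f = self.weight * f                  # weight the frequency components
        y = torch.fft.irfft2(f, s=(h, w))
        return self.norm2(y)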
The attention module is used for extracting and enhancing the geometric features in the image that are relevant to single-view image camera calibration; the attention module comprises a linear transformation layer, a first scaled dot-product layer and a geometric feature perception layer which are sequentially connected in series; the linear transformation layer consists of a fully connected layer and performs a linear transformation on the input feature data; the first scaled dot-product layer consists of matrix multiplication and scaling operations that produce a weight matrix capturing the relative importance of, and correlation between, features, and its processing is expressed as $A = S\!\left(QK^{T}/\sqrt{d_k}\right)$, where $Q$ and $K$ are the matrices obtained from the linear transformation layer, $d_k$ is the dimension of $Q$, and $S$ denotes the softmax function; the geometric feature perception layer extracts a geometric-information feature representation of the input features and comprises a first branch, a second branch and a GeLU activation function sublayer: the input features pass through the first branch and the second branch respectively, the outputs of the two branches are added, processed by the GeLU activation function sublayer, and multiplied by the output of the scaled dot-product layer to obtain the output of the geometric feature perception layer; the first branch comprises a convolution layer, a batch normalization layer and a sigmoid activation function layer sequentially connected in series, and the second branch likewise comprises a convolution layer, a batch normalization layer and a sigmoid activation function layer.
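The following PyTorch sketch gives one plausible reading of the attention module. The patent does not fully specify how the attention weight matrix recombines the features, so applying it to the flattened token features, the 3x3 and 5x5 branch kernel sizes, and the class name are assumptions.

# Sketch of the attention module: a fully connected transformation produces
# Q and K, A = softmax(Q K^T / sqrt(d_k)) weighs the tokens, and a two-branch
# geometric feature perception layer gates the result.
import math
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.qk = nn.Linear(channels, 2 * channels)   # linear transformation layer
        self.branch1 = nn.Sequential(                 # first branch
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.Sigmoid())
        self.branch2 = nn.Sequential(                 # second branch
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.BatchNorm2d(channels), nn.Sigmoid())
        self.act = nn.GELU()

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)                # (B, HW, C)
        q, k = self.qk(tokens).chunk(2, dim=-1)
        attn = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(c), dim=-1)
        attended = (attn @ tokens).transpose(1, 2).reshape(b, c, h, w)
        gate = self.act(self.branch1(x) + self.branch2(x))   # geometric perception
        return gate * attended                               # element-wise gating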
The reasoning module is used for applying a nonlinear transformation to the output features of the attention module and enhancing the feature extraction capacity of the model; the reasoning module comprises a first fully connected layer, a ReLU activation function layer and a second fully connected layer which are sequentially connected in series.
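A sketch of the reasoning module, together with an assembly of the four encoder modules reusing the classes sketched above, might look as follows; the hidden width and the token flattening at the end are illustrative assumptions.

# Sketch of the reasoning module (FC -> ReLU -> FC) and the full encoder:
# coding -> frequency -> attention -> reasoning.
import torch.nn as nn

class ReasoningModule(nn.Module):
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden),    # first fully connected layer
            nn.ReLU(inplace=True),
            nn.Linear(hidden, dim))    # second fully connected layer

    def forward(self, x):
        return self.mlp(x)

class Encoder(nn.Module):
    def __init__(self, channels=64, height=32, width=32):
        super().__init__()
        self.coding = CodingModule(channels)            # defined in the sketches above
        self.freq = FrequencyModule(channels, height, width)
        self.attn = AttentionModule(channels)
        self.reason = ReasoningModule(channels)

    def forward(self, x):
        x = self.attn(self.freq(self.coding(x)))
        tokens = x.flatten(2).transpose(1, 2)           # (B, HW, C)
        return self.reason(tokens)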
Constructing a decoder network based on a self-attention mechanism specifically comprises the following steps:
The built decoder network comprises a convolution layer, a first attention layer, a second attention layer and a feedforward neural network layer which are sequentially connected in series; the input of the decoder network is the output of the encoder network and the feature vector corresponding to the image in the training data set;
The convolution layer applies an initial convolutional transformation to the decoder input;
The first attention layer and the second attention layer have the same structure, each comprising a linear projection sublayer, a second scaled dot-product sublayer, a softmax sublayer and a third scaled dot-product sublayer; the linear projection sublayer linearly projects the input features to obtain a query matrix Q, a key matrix K and a value matrix V; the second scaled dot-product sublayer computes the correlation between Q and K, and its processing is expressed as $E = QK^{T}/\sqrt{d_k}$, where Q and K are the matrices obtained from the linear projection sublayer and $d_k$ is the dimension of K; the softmax sublayer normalizes the input data; the third scaled dot-product sublayer extracts global context information, and its processing is expressed as $O = \mathrm{softmax}(E)\,V$, where V is the matrix obtained from the linear projection sublayer;
The feedforward neural network layer uses a feedforward neural network to transform the obtained global context information into the intrinsic and extrinsic parameter spaces of the single-view image camera, thereby realizing parameter calibration of the single-view image camera.
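A sketch of such a decoder follows. The output head regresses one focal value, a 2-D vanishing point and two horizon endpoints, i.e. a 7-dimensional vector; that parameterization, the model width and the class names are assumptions, since the patent only names the prediction targets.

# Sketch of the decoder: a convolution lifts the 6-D line feature vector to
# the model width, two attention layers attend over the encoder tokens
# (E = Q K^T / sqrt(d_k), then softmax(E) V), and a feed-forward head maps
# the context into the camera parameter space.
import math
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)   # linear projection sublayer
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, query, context):
        q, k, v = self.q_proj(query), self.k_proj(context), self.v_proj(context)
        e = q @ k.transpose(1, 2) / math.sqrt(k.shape[-1])  # scaled dot product
        return torch.softmax(e, dim=-1) @ v                 # softmax, then dot with V

class Decoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Conv1d(6, dim, kernel_size=1)   # lift the line feature vector
        self.attn1 = CrossAttention(dim)
        self.attn2 = CrossAttention(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 7))

    def forward(self, enc_out, line_vec):
        # enc_out: (B, N, dim) encoder tokens; line_vec: (B, 6) feature vector
        q = self.conv(line_vec.unsqueeze(-1)).transpose(1, 2)   # (B, 1, dim)
        x = self.attn2(self.attn1(q, enc_out), enc_out)
        return self.ffn(x).squeeze(1)   # [focal, vp_x, vp_y, 4 endpoint coords]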
S4, training the single-view image camera calibration preliminary model constructed in the step S3 by adopting the training data set constructed in the step S2 to obtain a single-view image camera calibration model; the training specifically comprises the following steps:
A logarithmic space loss function is adopted for the intrinsic parameters: $L_{\log} = \min\!\left(\left|\log\frac{\widehat{in}}{w} - \log\frac{in}{w}\right|,\ \delta\right)$, where $w$ is the width of the image, $in$ is the true value of the intrinsic parameters of the single-view image camera, $\widehat{in}$ is the predicted value, and $\delta$ is a set threshold;
when predicting the vanishing point constraint of the camera extrinsic parameters, the following first similarity loss function is adopted: $L_{vp} = 1 - \frac{\langle vp, \widehat{vp}\rangle}{\lVert vp\rVert\,\lVert\widehat{vp}\rVert}$, where $vp$ is the true value of the vanishing point coordinates, $\widehat{vp}$ is the predicted value, and $\lVert\cdot\rVert$ is the vector norm;
when predicting the horizon loss of the camera extrinsic parameters, the following second similarity loss function is adopted: $L_{hor} = \frac{1}{n}\sum_{k=1}^{n}\left\lVert g_k(hor) - g_k(\widehat{hor})\right\rVert_1$, where $n$ is the number of selected endpoints, $hor$ is the true value of the horizon, $\widehat{hor}$ is the predicted value, $g_k(\cdot)$ computes the coordinates of the left and right endpoints of the horizon, and $\lVert\cdot\rVert_1$ is the Manhattan distance;
finally, the total loss function is constructed as $L = \lambda_1 L_{\log} + \lambda_2 L_{vp} + \lambda_3 L_{hor}$, where $\lambda_1$ is the first weight, $\lambda_2$ is the second weight, and $\lambda_3$ is the third weight.
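Under the loss reconstructions above, the training objective can be sketched as follows; the thresholded log-distance, the cosine form of the vanishing point loss and the dictionary keys are assumptions rather than verbatim patent content.

# Sketch of the training losses: thresholded log-space intrinsic loss,
# cosine-similarity vanishing point loss, Manhattan horizon-endpoint loss,
# combined with weights lambda1..lambda3.
import torch
import torch.nn.functional as F

def log_space_loss(pred_in, true_in, width, delta=1.0):
    d = torch.abs(torch.log(pred_in / width) - torch.log(true_in / width))
    return torch.clamp(d, max=delta).mean()       # truncate at threshold delta

def vanishing_point_loss(pred_vp, true_vp):
    return (1.0 - F.cosine_similarity(pred_vp, true_vp, dim=-1)).mean()

def horizon_loss(pred_endpoints, true_endpoints):
    # endpoints: (B, n, 2) left/right horizon endpoint coordinates
    return torch.abs(pred_endpoints - true_endpoints).sum(-1).mean()

def total_loss(pred, target, weights=(1.0, 1.0, 1.0)):
    l1 = log_space_loss(pred["focal"], target["focal"], target["width"])
    l2 = vanishing_point_loss(pred["vp"], target["vp"])
    l3 = horizon_loss(pred["hor"], target["hor"])
    return weights[0] * l1 + weights[1] * l2 + weights[2] * l3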
S5, adopting the single-view image camera calibration model constructed in the step S4 to finish parameter calibration of the target single-view image camera.
By perceiving the geometric cues in the image, the scheme of the invention can predict the internal and external parameters of the camera without requiring markers, so the method can be used in a wider range of scenes, with better practicability and higher accuracy.
FIG. 2 is a visual schematic diagram of the calibration method of the present invention:
Fig. 2(a) corresponds to the learned feature representation for vanishing points; the network's attention concentrates on the parallel lines on both sides of the road, and by capturing these parallel segments and their directions of extension it extracts geometric information closely related to the vanishing point. Exploiting the convergence trend of the parallel lines, the network effectively learns a vanishing-point-related feature representation.
Fig. 2(b) corresponds to the learned feature representation for the horizon; the heat map highlights the contours of building bottoms and the boundary between sky and ground. By focusing on the distinction between building bottoms and ground objects, the network can locate the horizon in the image from this representation of geometric boundaries.
Fig. 2(c) corresponds to the learned feature representation for the intrinsic parameters, with the heat map concentrated in concentric circles near the image center. This distribution reflects geometric properties of perspective projection; the features captured around the center provide important constraints for computing the image's intrinsic parameters.
As can be seen from Fig. 2, the network of the invention perceives the geometric cues corresponding to the different prediction targets and adaptively focuses on the key cues for each specific target, so more accurate calibration results are obtained while retaining a degree of interpretability.
Fig. 3 is a schematic diagram of the calibration results of the calibration method of the present invention, showing the comparison between experimental results and true values as well as the prediction results for the horizon and vanishing points; in the figure, the green line represents the predicted horizon, the red line represents the true horizon, and the red point represents the predicted vanishing point. As can be seen from Fig. 3, the predictions of the calibration method of the present invention are generally close to the true values, showing high accuracy and consistency, which again illustrates the effectiveness and superiority of the method.
The effect of the calibration method of the present invention is further described below with reference to the examples:
The method provided by the invention is compared with existing calibration methods (a traditional method and methods from 2018, 2020 and 2021); the traditional method is the method proposed by Hyunjoon Lee in Automatic Upright Adjustment of Photographs With Robust Camera Calibration (2014); the 2018 method is the method proposed by Yannick Hold-Geoffroy in A Perceptual Measure for Deep Single Image Camera Calibration (2018); the 2020 method is the method proposed by Jinwoo Lee in Neural Geometric Parser for Single Image Camera Calibration (2020); the 2021 method is the method proposed by Jinwoo Lee in CTRL-C: Camera calibration TRansformer with Line-Classification (2021).
For the comparison, experiments were performed on the public Google Street View dataset, containing 13,214 training images and 1,333 test images. In the experiments, the mean error of intrinsic calibration, the mean error of vanishing point prediction, and the AUC value of horizon prediction were taken as evaluation criteria. All experimental results were obtained on the test set.
The specific comparative data are shown in Table 1.
Table 1 Comparison of experimental results of the different calibration methods
As can be seen from the experimental results in Table 1, the calibration method provided by the invention achieves the best results on all metrics, outperforming both the traditional method and the other deep learning methods, which demonstrates the effectiveness and superiority of the proposed method.
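For reference, the horizon-prediction AUC used as an evaluation criterion above is commonly computed, in the calibration literature this patent cites, as the area under the fraction-of-images-below-threshold curve of the height-normalized horizon error; the sketch below assumes that standard protocol, and the function names are illustrative.

# Sketch of the horizon AUC criterion: per-image error is the larger distance
# between predicted and true horizon at the left/right image borders,
# normalized by image height; AUC integrates the cumulative curve up to 0.25.
import numpy as np

def horizon_error(pred_y, true_y, image_height):
    # pred_y, true_y: horizon heights (pixels) at the left and right borders
    return max(abs(pred_y[0] - true_y[0]), abs(pred_y[1] - true_y[1])) / image_height

def horizon_auc(errors, max_threshold=0.25, steps=100):
    thresholds = np.linspace(0.0, max_threshold, steps)
    errors = np.asarray(errors)
    fractions = [(errors <= t).mean() for t in thresholds]
    return np.trapz(fractions, thresholds) / max_threshold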
FIG. 4 is a schematic diagram of the functional modules of the system of the present invention: the system for realizing the single-view image camera calibration method comprises a data acquisition module, a training set construction module, a model construction module, a model training module and a camera calibration module which are sequentially connected in series; the data acquisition module is used for acquiring the existing image data set and uploading the data information to the training set construction module; the training set construction module is used for extracting the feature vectors corresponding to the images from the acquired image data set according to the received data information, thereby constructing a training data set, and uploading the data information to the model construction module; the model construction module is used for constructing a single-view image camera calibration preliminary model comprising an encoder network and a decoder network according to the received data information, and uploading the data information to the model training module; the model training module is used for training the constructed single-view image camera calibration preliminary model with the constructed training data set according to the received data information to obtain a single-view image camera calibration model, and uploading the data information to the camera calibration module; and the camera calibration module is used for completing parameter calibration of the target single-view image camera with the constructed single-view image camera calibration model according to the received data information.
In addition, the single-view image camera calibration method and system can be applied directly on single-view image cameras. In a specific application, the single-view image camera calibration model obtained after training is integrated into the single-view image camera; the camera takes a photograph of an arbitrary scene, the photograph is input into the calibration model, the model calibrates the camera's parameters from this single input photograph, and the calibrated parameters are written back to the camera, completing the calibration process.

Claims (5)

1. A single-view image camera calibration method is characterized by comprising the following steps:
S1, acquiring an existing image data set;
S2, extracting feature vectors corresponding to the images based on the geometric features of the images from the image data set obtained in step S1, so as to construct a training data set;
S3, constructing a single-view image camera calibration preliminary model comprising an encoder network and a decoder network;
The encoder network is used for extracting geometric features and context information in the input image and adaptively focusing on key areas in the input image;
The decoder network is used for mapping the output characteristics of the encoder network and the characteristics of the input image to the parameter space of the single-view image camera so as to predict the internal parameters and the external parameters of the single-view image camera;
The specific implementation method comprises the following steps:
Constructing an encoder network based on the spatial domain geometry and the frequency domain geometry; the method specifically comprises the following steps:
The constructed encoder network comprises an encoding module, a frequency domain module, an attention module and a reasoning module which are sequentially connected in series; the input of the encoder network is an image in the training dataset;
The coding module is used for carrying out preliminary feature extraction and representation learning on the input image; the coding module comprises a first encoder layer, a second encoder layer, a third encoder layer and a fourth encoder layer; the sum of the input and output of the first encoder layer forms the input of the second encoder layer, the sum of the input and output of the second encoder layer forms the input of the third encoder layer, the sum of the input and output of the third encoder layer forms the input of the fourth encoder layer, and the sum of the input and output of the fourth encoder layer forms the output of the coding module; the four encoder layers share the same structure, each comprising a first convolution layer, a second convolution layer, a third convolution layer, a batch normalization layer and a ReLU activation function layer;
The frequency domain module is used for extracting the geometric feature distribution of the image at different frequencies; the frequency domain module comprises a first normalization layer, a fast Fourier transform layer, a complex parameter layer, an inverse fast Fourier transform layer and a second normalization layer which are sequentially connected in series; the output of the coding module is normalized by the first normalization layer and then transformed by the fast Fourier transform layer; the output of the fast Fourier transform layer is processed by the complex parameter layer to control the importance of different frequency components; the output of the complex parameter layer is transformed back by the inverse fast Fourier transform layer and finally normalized by the second normalization layer to give the output of the frequency domain module; the processing of the complex parameter layer is expressed as $Y = w \odot (X_{\mathrm{re}} + i\,X_{\mathrm{im}})$, where $w$ is a weight parameter, $X_{\mathrm{re}}$ is the real tensor after the Fourier transform, $X_{\mathrm{im}}$ is the imaginary tensor after the Fourier transform, and $i$ is the imaginary unit;
The attention module is used for extracting and enhancing the geometric features in the image that are relevant to single-view image camera calibration; the attention module comprises a linear transformation layer, a first scaled dot-product layer and a geometric feature perception layer which are sequentially connected in series; the linear transformation layer consists of a fully connected layer and performs a linear transformation on the input feature data; the first scaled dot-product layer captures the relative importance of, and correlation between, features, and its processing is expressed as $A = S\!\left(QK^{T}/\sqrt{d_k}\right)$, where $Q$ and $K$ are the matrices obtained from the linear transformation layer, $d_k$ is the dimension of $Q$, and $S$ denotes the softmax function; the geometric feature perception layer extracts a geometric-information feature representation of the input features and comprises a first branch, a second branch and a GeLU activation function sublayer: the input features pass through the first branch and the second branch respectively, the outputs of the two branches are added, processed by the GeLU activation function sublayer, and multiplied by the output of the scaled dot-product layer to obtain the output of the geometric feature perception layer; the first branch comprises a convolution layer, a batch normalization layer and a sigmoid activation function layer sequentially connected in series, and the second branch likewise comprises a convolution layer, a batch normalization layer and a sigmoid activation function layer;
The reasoning module is used for applying a nonlinear transformation to the output features of the attention module and enhancing the feature extraction capacity of the model; the reasoning module comprises a first fully connected layer, a ReLU activation function layer and a second fully connected layer which are sequentially connected in series;
constructing a decoder network based on a self-attention mechanism;
S4, training the single-view image camera calibration preliminary model constructed in the step S3 by adopting the training data set constructed in the step S2 to obtain a single-view image camera calibration model;
S5, adopting the single-view image camera calibration model constructed in the step S4 to finish parameter calibration of the target single-view image camera.
2. The calibration method of a single view image camera according to claim 1, wherein the step S2 specifically comprises the following steps:
Extracting the lines present in the image from the image dataset acquired in step S1 by adopting an LSD algorithm, the lines being expressed as $L_i = \{l_1, l_2, \ldots, l_{n_i}\}$, where $I_i$ denotes the $i$-th image and $n_i$ the number of detected line segments;
each line segment is represented by its line equation in the image plane, $ax + by + c = 0$, where $(x, y)$ are image coordinates, $a$ is the component of the line's normal vector along the x axis, $b$ is its component along the y axis, and $c$ gives the distance from the line to the origin $(0, 0)$;
the line parameters are converted into an upper triangular matrix whose six entries are encoded as a 6-dimensional vector $v$, which serves as the feature vector of the image.
3. The single view image camera calibration method according to claim 2, wherein said constructing a decoder network comprises the steps of:
The built decoder network comprises a convolution layer, a first attention layer, a second attention layer and a feedforward neural network layer which are sequentially connected in series; the input of the decoder network is the output of the encoder network and the feature vector corresponding to the image in the training data set;
The convolution layer applies an initial convolutional transformation to the decoder input;
The first attention layer and the second attention layer have the same structure, each comprising a linear projection sublayer, a second scaled dot-product sublayer, a softmax sublayer and a third scaled dot-product sublayer; the linear projection sublayer linearly projects the input features to obtain a query matrix Q, a key matrix K and a value matrix V; the second scaled dot-product sublayer computes the correlation between Q and K, and its processing is expressed as $E = QK^{T}/\sqrt{d_k}$, where Q and K are the matrices obtained from the linear projection sublayer and $d_k$ is the dimension of K; the softmax sublayer normalizes the input data; the third scaled dot-product sublayer extracts global context information, and its processing is expressed as $O = \mathrm{softmax}(E)\,V$, where V is the matrix obtained from the linear projection sublayer;
The feedforward neural network layer uses a feedforward neural network to transform the obtained global context information into the intrinsic and extrinsic parameter spaces of the single-view image camera, thereby realizing parameter calibration of the single-view image camera.
4. The method for calibrating a single-view image camera according to claim 3, wherein the training in step S4 specifically comprises the following steps:
A logarithmic space loss function is adopted for the intrinsic parameters: $L_{\log} = \min\!\left(\left|\log\frac{\widehat{in}}{w} - \log\frac{in}{w}\right|,\ \delta\right)$, where $w$ is the width of the image, $in$ is the true value of the intrinsic parameters of the single-view image camera, $\widehat{in}$ is the predicted value, and $\delta$ is a set threshold;
when predicting the vanishing point constraint of the camera extrinsic parameters, the following first similarity loss function is adopted: $L_{vp} = 1 - \frac{\langle vp, \widehat{vp}\rangle}{\lVert vp\rVert\,\lVert\widehat{vp}\rVert}$, where $vp$ is the true value of the vanishing point coordinates, $\widehat{vp}$ is the predicted value, and $\lVert\cdot\rVert$ is the vector norm;
when predicting the horizon loss of the camera extrinsic parameters, the following second similarity loss function is adopted: $L_{hor} = \frac{1}{n}\sum_{k=1}^{n}\left\lVert g_k(hor) - g_k(\widehat{hor})\right\rVert_1$, where $n$ is the number of selected endpoints, $hor$ is the true value of the horizon, $\widehat{hor}$ is the predicted value, $g_k(\cdot)$ computes the coordinates of the left and right endpoints of the horizon, and $\lVert\cdot\rVert_1$ is the Manhattan distance;
finally, the total loss function is constructed as $L = \lambda_1 L_{\log} + \lambda_2 L_{vp} + \lambda_3 L_{hor}$, where $\lambda_1$ is the first weight, $\lambda_2$ is the second weight, and $\lambda_3$ is the third weight.
5. A system for implementing the single-view image camera calibration method according to any one of claims 1 to 4, comprising a data acquisition module, a training set construction module, a model construction module, a model training module and a camera calibration module; the data acquisition module, the training set construction module, the model construction module, the model training module and the camera calibration module are sequentially connected in series; the data acquisition module is used for acquiring the existing image data set and uploading the data information to the training set construction module; the training set construction module is used for extracting the feature vectors corresponding to the images from the acquired image data set according to the received data information, thereby constructing a training data set, and uploading the data information to the model construction module; the model construction module is used for constructing a single-view image camera calibration preliminary model comprising an encoder network and a decoder network according to the received data information, and uploading the data information to the model training module; the model training module is used for training the constructed single-view image camera calibration preliminary model with the constructed training data set according to the received data information to obtain a single-view image camera calibration model, and uploading the data information to the camera calibration module; and the camera calibration module is used for completing parameter calibration of the target single-view image camera with the constructed single-view image camera calibration model according to the received data information.
CN202411002916.0A 2024-07-25 2024-07-25 Single-view image camera calibration method and system Active CN118552626B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411002916.0A CN118552626B (en) 2024-07-25 2024-07-25 Single-view image camera calibration method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411002916.0A CN118552626B (en) 2024-07-25 2024-07-25 Single-view image camera calibration method and system

Publications (2)

Publication Number Publication Date
CN118552626A CN118552626A (en) 2024-08-27
CN118552626B (en) 2024-11-26

Family

ID=92448484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411002916.0A Active CN118552626B (en) 2024-07-25 2024-07-25 Single-view image camera calibration method and system

Country Status (1)

Country Link
CN (1) CN118552626B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808400B (en) * 2017-10-24 2021-11-26 上海交通大学 Camera calibration system and calibration method thereof
US10515460B2 (en) * 2017-11-29 2019-12-24 Adobe Inc. Neural network-based camera calibration
CN115187638B (en) * 2022-09-07 2022-12-27 南京逸智网络空间技术创新研究院有限公司 Unsupervised monocular depth estimation method based on optical flow mask
CN117409339A (en) * 2023-10-13 2024-01-16 东南大学 Unmanned aerial vehicle crop state visual identification method for air-ground coordination
CN117671024A (en) * 2023-11-30 2024-03-08 广东工业大学 Image defocusing fuzzy model training method and camera parameter calibration method
CN118379358A (en) * 2024-03-22 2024-07-23 桂林电子科技大学 A RGB-D camera depth module calibration method based on adversarial neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CTRL-C: Camera calibration TRansformer with Line-Classification; Jinwoo Lee et al.; arXiv:2109.02259v1; 2021-09-06; pp. 1-14 *

Also Published As

Publication number Publication date
CN118552626A (en) 2024-08-27

Similar Documents

Publication Publication Date Title
CN111325797B (en) Pose estimation method based on self-supervision learning
JP7439153B2 (en) Lifted semantic graph embedding for omnidirectional location recognition
CN113283525B (en) Image matching method based on deep learning
CN101860729A (en) A Target Tracking Method for Omni-directional Vision
CN110969670A (en) Multispectral camera dynamic stereo calibration algorithm based on significant features
CN109993103A (en) A Human Behavior Recognition Method Based on Point Cloud Data
CN116703996A (en) Monocular 3D Object Detection Algorithm Based on Instance-Level Adaptive Depth Estimation
CN113112547A (en) Robot, repositioning method thereof, positioning device and storage medium
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN116958434A (en) Multi-view three-dimensional reconstruction method, measurement method and system
Jindal et al. An ensemble mosaicing and ridgelet based fusion technique for underwater panoramic image reconstruction and its refinement
JP7254849B2 (en) Rotational Equivariant Orientation Estimation for Omnidirectional Localization
CN115456870A (en) Multi-image splicing method based on external parameter estimation
CN118485702B (en) High-precision binocular vision ranging method
CN114820733A (en) Interpretable thermal infrared visible light image registration method and system
CN112950653B (en) Attention image segmentation method, device and medium
CN116188550A (en) Self-supervision depth vision odometer based on geometric constraint
CN117095033B (en) Multi-mode point cloud registration method based on image and geometric information guidance
CN118552626B (en) Single-view image camera calibration method and system
CN118822906A (en) Indoor dynamic environment map construction method and system based on image restoration and completion
CN109215122B (en) A street view three-dimensional reconstruction system and method, intelligent car
CN117893561A (en) Infrared tiny target detection algorithm based on local contrast computing method
CN117788686A (en) Three-dimensional scene reconstruction method and device based on 2D image and electronic equipment
van de Wouw et al. Hierarchical 2.5-d scene alignment for change detection with large viewpoint differences
CN115496859A (en) 3D Scene Motion Trend Estimation Method Based on Cross Attention Learning from Scattered Point Clouds

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant