CN111597367A - Three-dimensional model retrieval method based on view and Hash algorithm - Google Patents

Three-dimensional model retrieval method based on view and Hash algorithm

Info

Publication number
CN111597367A
CN111597367A (application CN202010418065.3A)
Authority
CN
China
Prior art keywords
layer
model
dimensional
convolution
models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010418065.3A
Other languages
Chinese (zh)
Other versions
CN111597367B (en)
Inventor
张满囤
燕明晓
王红
田琪
崔时雨
齐畅
魏玮
吴清
王小芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Technology
Original Assignee
Hebei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei University of Technology filed Critical Hebei University of Technology
Priority to CN202010418065.3A priority Critical patent/CN111597367B/en
Publication of CN111597367A publication Critical patent/CN111597367A/en
Application granted granted Critical
Publication of CN111597367B publication Critical patent/CN111597367B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a three-dimensional model retrieval method based on views and a hash algorithm, which comprises: obtaining a plurality of view pictures of different three-dimensional models taken at different angles, and normalizing the view pictures; constructing a convolutional neural network based on AlexNet, in which the 5 convolutional layers are connected to two fully connected layers through a view layer, a hash layer is added after the last fully connected layer to convert high-dimensional features into low-dimensional hash codes, and a quantization loss function is designed for the conversion process to reduce the quantization error of the hash codes; training the AlexNet-based convolutional neural network with an existing three-dimensional model data set, the features of each model being represented by the hash features learned by the trained network; and calculating the similarity between any given query three-dimensional model and the three-dimensional models in a three-dimensional model database using the Hamming distance, and outputting the several models with the smallest Hamming distances as results to a retrieval list, so that the retrieval efficiency of three-dimensional models can be improved.

Description

Three-dimensional model retrieval method based on view and Hash algorithm
Technical Field
The technical scheme of the invention relates to the retrieval of a three-dimensional (3D) model, in particular to a three-dimensional model retrieval method based on a view and a Hash algorithm.
Background
With the advent of the big data era, image acquisition becomes simpler and more diversified. In recent years, due to the large amount of low-cost 3D acquisition equipment and 3D modeling tools, the number of three-dimensional models is rapidly increased, and very huge three-dimensional model resources are already available on a network. The three-dimensional model is more and more widely applied to the aspects of three-dimensional games, virtual reality, industrial design, movie and television entertainment and the like, and the requirement for accurate and efficient three-dimensional object retrieval is increasingly shown.
At present, the retrieval work of the three-dimensional model can be mainly divided into two aspects: model-based retrieval and view-based retrieval. Model-based retrieval is primarily from the perspective of three-dimensional data to represent model features such as polygonal meshes, voxel meshes, point clouds, or implicit surfaces. The model-based method can better retain the original data information and the space geometric characteristics of the three-dimensional model. However, in the real world, it is sometimes difficult to directly represent a model by three-dimensional data, and currently, there are few open-source three-dimensional feature model databases. The view-based retrieval is carried out by representing a three-dimensional model by a group of two-dimensional images, reducing the matching dimension between the three-dimensional models to a two-dimensional layer, and inquiring the model to be searched by matching the similarity of the views, so that the over-fitting problem can be avoided to a great extent. However, in the current view-based algorithm, the extracted high-dimensional features are measured in the euclidean space to complete similarity retrieval, and the retrieval efficiency is low. How to improve the model retrieval efficiency is the key to improve the three-dimensional model retrieval performance.
Disclosure of Invention
Aiming at the defect of low algorithm retrieval efficiency of the current three-dimensional model based on view retrieval, the invention provides a three-dimensional model retrieval method based on a view and a Hash algorithm. According to the method, a Hash algorithm is added to the last layer of the convolutional neural network, after a model extracted from the convolutional layer is processed by a view layer, high-dimensional features are converted into Hash code features through a Hash layer, and then the similarity of the model is calculated in a low-dimensional Hamming space by utilizing Hamming distance, so that the model retrieval efficiency is improved.
The technical scheme adopted by the invention for solving the technical problems is as follows: the method comprises the steps of obtaining a plurality of view pictures shot by different three-dimensional models at different angles, and normalizing the view pictures;
constructing a convolution neural network based on AlexNet: connecting two full-connection layers through a view layer after the 5 layers of convolution layers, adding a hash layer after the last full-connection layer, converting high-dimensional characteristics into low-dimensional hash codes, and designing a quantization loss function in the conversion process to reduce quantization errors of the hash codes;
training the AlexNet-based convolutional neural network with an existing three-dimensional model data set, the features of each model being represented by the hash features learned by the trained network; and calculating the similarity between any given query three-dimensional model and the three-dimensional models in the three-dimensional model database using the Hamming distance, where a larger Hamming distance indicates more dissimilar models and a smaller Hamming distance indicates more similar models, and the several top-ranked models are selected as results and output to a retrieval list in order of Hamming distance from smallest to largest.
In the above retrieval method, model scale standardization processing is performed on different three-dimensional models before obtaining a plurality of view pictures, and since models on a network are various and large in number, standardization processing needs to be performed on all models in a data set in order to avoid being influenced by the size of the models in the retrieval process. The models with different scales are scaled into the cube with the side length of 2 by scaling the models, so that the uniformity and the usability of model characteristics can be ensured. The method comprises the following specific steps:
step 2-1, reading the information of each point of the three-dimensional model, and finding the coordinate point (x) with the minimum modelmin,ymin,zmin) And the maximum coordinate point (x) of the modelmax,ymax,zmax)。
2-2, calculating a difference value between the maximum coordinate point and the minimum coordinate point, taking the maximum value of the difference values in three dimensions as the side length l of the model bounding box, constructing a cube bounding box, and placing the center of the model on the center of the cube;
step 2-3, zooming the model to obtain a standardized model: for the coordinates (x, y, z) of any point, a new coordinate (x ', y ', z ') is obtained after scaling, and the specific calculation method is as follows:
x′=(x-xmin)×2/l-1
y′=(y-ymin)×2/l-1
z′=(z-zmin)×2/l-1
after standardization, the coordinates of all points of the model are located in [ -1,1], and the model is located in a cube with the side length of 2, so that a standardized model is obtained.
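As a minimal sketch, the normalization of steps 2-1 to 2-3 can be written in NumPy as follows (the function name is illustrative, not from the patent):

```python
import numpy as np

def normalize_model(points):
    """Scale a point set into the cube [-1, 1]^3 (side length 2),
    following steps 2-1 to 2-3. `points` is an (N, 3) vertex array."""
    p_min = points.min(axis=0)      # (x_min, y_min, z_min)
    p_max = points.max(axis=0)      # (x_max, y_max, z_max)
    l = (p_max - p_min).max()       # bounding-cube side length
    return (points - p_min) * 2.0 / l - 1.0

# example: a stretched model is rescaled into the cube of side 2
pts = np.array([[0.0, 0.0, 0.0], [4.0, 2.0, 1.0]])
out = normalize_model(pts)
```

Every normalized coordinate falls in [−1, 1], as the text states.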
In the above retrieval method, the process of obtaining the multi-view pictures is as follows: a virtual camera array is arranged around the model, 12 view pictures are taken of each model, and the view pictures are normalized to a uniform size and used as the input of the convolutional neural network.
Step 3-1, placing the standardized model at the body center of the regular icosahedron, placing virtual cameras at 12 vertexes of the regular icosahedron for shooting, and obtaining a group of 12 views of the model with the size of 256 multiplied by 256;
step 3-2, cutting the multi-view of the model into 227 multiplied by 227 size as the input of the convolutional neural network, wherein the cutting method comprises the following steps:
left = C_w/2 − C′_w/2
top = C_h/2 − C′_h/2
right = left + C′_w
bottom = top + C′_h
where left, top, right and bottom respectively denote the left, upper, right and lower boundaries of the crop of the new size (C′_w, C′_h) within the original size (C_w, C_h).
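The 12 camera positions of step 3-1 sit at the vertices of a regular icosahedron; one standard construction of those 12 vertices via the golden ratio is sketched below (an illustration with unit scale — the patent does not specify coordinates):

```python
import numpy as np

def icosahedron_vertices():
    """12 vertices of a regular icosahedron: the cyclic permutations
    of (0, +/-1, +/-phi), where phi is the golden ratio."""
    phi = (1 + 5 ** 0.5) / 2
    verts = []
    for a, b in [(1, phi), (-1, phi), (1, -phi), (-1, -phi)]:
        verts += [(0, a, b), (a, b, 0), (b, 0, a)]
    return np.array(verts, dtype=float)

cams = icosahedron_vertices()   # 12 virtual camera positions
```

All 12 vertices are equidistant from the model center, so the views cover the model uniformly.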
In the above search method, the specific structure of the convolution neural network based on AlexNet is as follows:
step 4-1, sequentially inputting 12 views of 227 × 227 sizes of all models into a convolutional neural network, acquiring local features of an image by using a convolutional pooling layer, wherein the convolutional layer and the pooling layer are specifically set as follows:
the first layer comprises a convolutional layer and a max pooling layer, the convolutional layer convolutional kernel has a size of 11 × 11, the step size is 4, and the activation function is set to the Relu function. And then performing pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step size is 2.
The second layer comprises a convolutional layer and a max pooling layer, the convolutional layer convolutional core has a size of 5 × 5, the step size is 1, and the activation function is set to be the Relu function. And then performing pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step size is 2.
The third layer comprises a convolutional layer, the size of the convolutional layer convolutional core is 3 x 3, the step size is 1, and the activation function is set to be a Relu function.
The fourth layer comprises a convolutional layer with a convolutional kernel size of 3 x 3, step size 1, and activation function set to the Relu function.
The fifth layer comprises a convolutional layer and a max pooling layer, the convolutional layer has the size of 3 x 3 and the step size of 1, and the activation function is set to be the Relu function. And then performing pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step size is 2.
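Assuming the usual AlexNet paddings (0, 2, 1, 1, 1 for the five convolutions — an assumption, since the patent does not state them), the spatial-size flow of the layers above can be checked with a small calculation:

```python
def out_size(n, k, s, p=0):
    """Spatial size after a convolution or pooling with kernel k,
    stride s and padding p."""
    return (n + 2 * p - k) // s + 1

n = 227
n = out_size(n, 11, 4)        # conv1: 11x11, stride 4 -> 55
n = out_size(n, 3, 2)         # pool1: 3x3,  stride 2 -> 27
n = out_size(n, 5, 1, p=2)    # conv2: 5x5,  stride 1 -> 27
n = out_size(n, 3, 2)         # pool2: 3x3,  stride 2 -> 13
n = out_size(n, 3, 1, p=1)    # conv3: 3x3,  stride 1 -> 13
n = out_size(n, 3, 1, p=1)    # conv4: 3x3,  stride 1 -> 13
n = out_size(n, 3, 1, p=1)    # conv5: 3x3,  stride 1 -> 13
n = out_size(n, 3, 2)         # pool5: 3x3,  stride 2 -> 6
```

The final 6 × 6 feature maps are what the view layer and fully connected layers consume.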
Step 4-2: add a view layer after the fifth convolutional layer. The convolved features of the 12 pictures of each model are processed by the view layer, which takes the maximum feature value of each dimension across the 12 pictures to generate a feature descriptor of the three-dimensional model; the descriptor is then input into the fully connected layers for processing. The 2 fully connected layers are set identically, each with 4096 neurons; a ReLU activation function is added to avoid gradient vanishing, and a dropout layer randomly sets neuron values to 0, reducing network parameters, lowering complexity, and preventing overfitting.
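The view-layer operation of step 4-2 — taking the per-dimension maximum over the 12 view features — reduces to a single element-wise max, sketched here in NumPy (the feature width 4096 matches the fully connected layers; the random input is only a stand-in for real conv5 features):

```python
import numpy as np

# 12 per-view feature vectors for one model, e.g. flattened conv5
# outputs; the view layer keeps the element-wise maximum over views.
views = np.random.rand(12, 4096)
descriptor = views.max(axis=0)   # one descriptor per 3D model
```

By construction the descriptor dominates every individual view feature, which makes it invariant to the ordering of the 12 views.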
Step 4-3: add a hash layer after the fully connected layers. This layer contains k hidden-layer neurons (i.e., the number of bits of the hash code) with a sigmoid activation function. The 4096-dimensional features output by the fully connected layer are mapped to a low-dimensional space to form a low-dimensional hash feature f_n, which is further converted into a discrete hash code b_n by b_n = sgn(f_n − 0.5). At the same time a quantization loss function L_ql is set to control the error of the hash code quantization process:
L_ql = (1/(N·k)) Σ_{n=1}^{N} ‖b_n − f_n‖²
N is the number of input samples, and k is the number of bits of the hash code.
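A hedged sketch of the quantization in step 4-3: the patent writes b_n = sgn(f_n − 0.5) and prints 0/1 codes, so this sketch maps sigmoid outputs to {0, 1}; the loss shown is one plausible form of L_ql (the exact formula appears only as an image in the source), averaging the squared gap between f_n and b_n over N samples and k bits:

```python
import numpy as np

def quantize(f):
    """Binarize sigmoid outputs f in (0, 1): bits >= 0.5 become 1,
    others 0 (a {0, 1} convention consistent with the printed codes)."""
    return np.where(f >= 0.5, 1.0, 0.0)

def quantization_loss(f, b):
    """Plausible L_ql: mean squared gap between continuous hash
    features f and their binary codes b, over N samples and k bits."""
    n, k = f.shape
    return np.sum((b - f) ** 2) / (n * k)

f = np.array([[0.9, 0.1, 0.6],    # N = 2 samples, k = 3 bits
              [0.2, 0.8, 0.4]])
b = quantize(f)
loss = quantization_loss(f, b)
```

Driving this loss down pushes each f_n toward 0 or 1, so the final sgn thresholding discards almost no information — the "almost lossless" hash codes described later.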
When training the network, the public Princeton three-dimensional model data set ModelNet40 is used. After model scale standardization and multi-view picture normalization, the training set data are input into the AlexNet-based convolutional neural network for training, the network parameters are optimized, and a network model is generated; the generated network model is then tested on the model test set. The invention uses the TensorFlow deep learning framework, with Python 3.6 as the language.
The calculation process of the Hamming distance is as follows:
and obtaining hash code characteristics corresponding to the characteristics of each model, wherein the similarity between the models is represented by a Hamming distance D, the greater the Hamming distance is, the more dissimilar the models are represented, and the smaller the Hamming distance is, the more similar the models are represented. The calculation method of the Hamming distance comprises the following steps
Figure BDA0002495820930000032
bi,bjIs the hash code characteristic of both models,
Figure BDA0002495820930000033
is an exclusive or operation; for any query three-dimensional model Q, similarity measurement is carried out on the three-dimensional model Q and the three-dimensional models in the three-dimensional model database M, and the model Q is matched*The calculation process of (2) is as follows:
S(Q,M)=argminD(bi,bj)
Figure BDA0002495820930000041
s represents the similarity between models, MmRepresents the mth model in the database (1 is not less than m and not more than N)*),N*The number of samples in the database; and finally outputting the 10 models with the highest similarity to the model to a retrieval list as a result through the calculation.
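The Hamming-distance retrieval described above can be sketched as follows (toy 4-bit codes; the function names are illustrative, not from the patent):

```python
import numpy as np

def hamming(b_i, b_j):
    """D(b_i, b_j): number of differing bits (XOR, then count)."""
    return int(np.sum(b_i != b_j))

def retrieve(query_code, db_codes, top=10):
    """Rank database models by Hamming distance to the query,
    smallest distance (most similar) first."""
    dists = [hamming(query_code, c) for c in db_codes]
    return sorted(range(len(db_codes)), key=lambda m: dists[m])[:top]

q = np.array([0, 1, 1, 0])
db = np.array([[0, 1, 1, 0],     # distance 0
               [1, 1, 1, 0],     # distance 1
               [1, 0, 0, 1]])    # distance 4
order = retrieve(q, db, top=3)
```

The argmin over the database is just the first entry of this ranking; taking the first 10 entries gives the retrieval list.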
Compared with the prior art, the invention has the beneficial effects that:
1. For the task of improving three-dimensional model retrieval efficiency, an algorithm based on views and hash learning is provided. The method combines the advantages of convolutional neural networks, multi-view representation, and hash-based retrieval, and obtains better results in three-dimensional model retrieval. In the convolutional network design of the invention, the convolutional layers process the multiple views, a view pool (view layer) combines the multiple views of a three-dimensional model together, and the combined features are input into the subsequent network to extract features; a hash layer is added after the fully connected layers as the last layer, the hash algorithm learns hash features from the high-dimensional features, the loss error of hash quantization is controlled, almost lossless hash codes are generated, and the three-dimensional retrieval precision and efficiency are improved.
2. The retrieval method carries out scale standardization processing on the initially obtained three-dimensional model data, so that the method is suitable for various models on a data set or a network, and can avoid the problem that the extracted features of the models are influenced due to overlarge size difference of the models. In order to test the performance of the algorithm, the data set is compared with the existing algorithm in the ModelNet40, and the result shows that the method has good performance.
3. In the method, after the hash layer is introduced, a specific quantization loss function is added to control the quantization error in the hash code conversion process, which improves retrieval efficiency; the low-dimensional hash features allow fast retrieval with the Hamming distance, so the retrieval efficiency is guaranteed.
Drawings
FIG. 1 is a general flow diagram of the present invention.
FIG. 2 is a result of a normalization process for an example three-dimensional model of the present invention.
FIG. 3 is a two-dimensional projection process of a three-dimensional model of the present invention.
FIG. 4 is a set of two-dimensional views obtained by projection of an example model in accordance with the present invention.
Fig. 5 is a network hierarchy diagram of the present invention.
FIG. 6 is a ROC plot of the performance of the present invention compared to other advanced algorithms on a ModelNet40 data set. The corresponding literature for the other 5 algorithms in fig. 6 is as follows.
[1] Su H, Maji S, Kalogerakis E, et al. Multi-view convolutional neural networks for 3D shape recognition // IEEE International Conference on Computer Vision, Santiago, 2015: 945-953.
[2] Wu Z, Song S, Khosla A, et al. 3D ShapeNets: a deep representation for volumetric shape modeling // 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015: 1912-1920.
[3] Cheng H C, Lo C H, Chu C H, Kim Y S. Shape similarity measurement for 3D mechanical part using D2 shape distribution and negative feature decomposition. Computers in Industry, 2010, 62(3): 269-280.
[4] Kun Zhou, Minmin Gong, Xin Huang, Baining Guo. Data-parallel octrees for surface reconstruction. IEEE Transactions on Visualization and Computer Graphics, 2011, 17(5): 669-681.
Detailed Description
The invention will be further described with reference to the accompanying drawings, but the scope of the invention is not limited thereto.
As shown in fig. 1, the three-dimensional model retrieval method based on the view and hash algorithm of the present invention mainly includes 7 modules: inputting a three-dimensional model; standardizing the model; acquiring a two-dimensional view of the model; designing a convolutional neural network structure; training a convolutional neural network structure; generating model features; and searching model similarity.
1. Input model module
The user selects the input three-dimensional model. The invention uses the ModelNet40 data set published by Princeton University for experiments; the data set contains 40 common model categories, each divided into a training set and a test set, and the invention trains with the 9461 models of the training set.
2. Model standardization
The models on the network are various and huge. In order to avoid the influence of the size scale of the model in the retrieval process, the scale standardization process needs to be carried out on all the models in the data set. For the airplane model in fig. 2, the model normalization is implemented by the following steps:
step 2-1, reading the information of each point of the airplane model, and finding the coordinate point (x) with the minimum modelmin,ymin,zmin) And the maximum coordinate point (x) of the modelmax,ymax,zmax)。
Step 2-2 calculation of (x)max-xmin),(ymax-ymin),(zmax-zmin) And taking the maximum value of the three as the side length l of the model bounding box, constructing a cube bounding box, and placing the center of the model on the center of the cube.
And 2-3, zooming the model to obtain a standardized model. For the coordinates (x, y, z) of any point, a new coordinate (x ', y ', z ') is obtained after scaling, and the specific calculation method is as follows:
x′=(x-xmin)×2/l-1
y′=(y-ymin)×2/l-1
z′=(z-zmin)×2/l-1
after normalization, the coordinates of all points of the model are located at [ -1,1 ]. All point coordinates after the model normalization as shown in fig. 2 are located at-1, and the model is in a cube with a side length of 2.
3. Obtaining a two-dimensional view of a model
Step 3-1 as shown in fig. 3, the model is placed at the center of the regular icosahedron, and virtual cameras are placed at 12 vertices of the regular icosahedron for shooting, so as to obtain a group of 12 views of the model. Fig. 4 shows 12 views of 256 × 256 size taken by an example airplane model.
Step 3-2 crops the multi-views of the model to 227 × 227 size as input to the convolutional neural network. The cropping method is:
left = C_w/2 − C′_w/2
top = C_h/2 − C′_h/2
right = left + C′_w
bottom = top + C′_h
where C_w = C_h = 256 and C′_w = C′_h = 227; the calculation gives left = 15, top = 15, right = 242, bottom = 242.
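The center-crop computation with these concrete numbers, using integer division so the boundaries come out to the stated values (the function name is illustrative):

```python
def crop_box(cw, ch, new_w, new_h):
    """Center-crop boundaries of a (new_w, new_h) window inside a
    (cw, ch) image; integer arithmetic gives 256 -> 227: left = top = 15."""
    left = cw // 2 - new_w // 2
    top = ch // 2 - new_h // 2
    return left, top, left + new_w, top + new_h

box = crop_box(256, 256, 227, 227)   # (left, top, right, bottom)
```

The resulting box is (15, 15, 242, 242), matching the boundaries computed in the text.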
4. Designing convolutional neural network structures
Step 4-1, inputting the clipped model multiple views into a convolutional neural network, wherein the network structure is shown in fig. 5, the local features of the image are obtained by using a convolutional pooling layer, and the convolutional pooling layer is specifically set as follows:
the first layer comprises a convolutional layer and a max pooling layer, the convolutional layer convolutional kernel has a size of 11 × 11, the step size is 4, and the activation function is set to the Relu function. And then performing pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step size is 2.
The second layer comprises a convolutional layer and a max pooling layer, the convolutional layer convolutional core has a size of 5 × 5, the step size is 1, and the activation function is set to be the Relu function. And then performing pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step size is 2.
The third layer comprises a convolutional layer, the size of the convolutional layer convolutional core is 3 x 3, the step size is 1, and the activation function is set to be a Relu function.
The fourth layer comprises a convolutional layer with a convolutional kernel size of 3 x 3, step size 1, and activation function set to the Relu function.
The fifth layer comprises a convolutional layer and a max pooling layer, the convolutional layer has the size of 3 x 3 and the step size of 1, and the activation function is set to be the Relu function. And then performing pooling operation on the convolution result, wherein the convolution kernel size of the maximum pooling layer is 3 multiplied by 3, and the step size is 2.
Step 4-2: add a view layer after the fifth convolutional layer. The convolved features of the 12 pictures of each model are processed by the view layer, which takes the maximum feature value of each dimension across the 12 pictures to generate a feature descriptor of the three-dimensional model; the descriptor is then input into the fully connected layers for processing. The 2 fully connected layers are set identically, each with 4096 neurons; a ReLU activation function is added to avoid gradient vanishing, and a dropout layer randomly sets neuron values to 0, reducing network parameters, lowering complexity, and preventing overfitting.
Step 4-3: add a hash layer after the fully connected layers. This layer contains k hidden-layer neurons (i.e., the number of bits of the hash code) with a sigmoid activation function. The 4096-dimensional features output by the fully connected layer are mapped to a low-dimensional space to form a low-dimensional hash feature f_n, which is further converted into a discrete hash code b_n by b_n = sgn(f_n − 0.5). At the same time a quantization loss function L_ql is set to control the error of the hash code quantization process:
L_ql = (1/(N·k)) Σ_{n=1}^{N} ‖b_n − f_n‖²
N is the number of input samples, and k is the number of bits of the hash code. We set N to 9461 and k to 48 during the experiment.
5. Training convolutional neural network structure
The invention uses the deep learning framework of TensorFlow, and the language is Python 3.6. Training was performed using the training set in the ModelNet40 dataset for a total of 9461 models, with the batch _ size set to 16 and the learning rate set to 0.0001.
6. Generating model features
After training of the training set, a network model capable of well learning the hash features of the model is generated, the hash features of the model are output by the last hash layer, and each model has 48-bit hash features, for example, the hash features of the airplane model in fig. 2 are [011101111001100110110111101110110110001100111010 ].
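A 48-bit code such as the one above can be packed into a single integer, and Hamming distances then cost one XOR and a bit count — one reason retrieval in the low-dimensional Hamming space is fast (an illustration, not from the patent):

```python
# Pack the printed 48-bit hash string into an integer.
code_a = int("011101111001100110110111101110110110001100111010", 2)

# Flip two of the low bits to simulate a near-duplicate model's code.
code_b = code_a ^ 0b101

# Hamming distance = popcount of the XOR of the two codes.
dist = bin(code_a ^ code_b).count("1")
```

Here `dist` is 2, the number of flipped bits; for real codes this replaces a per-bit comparison loop with constant-time integer operations.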
7. Model similarity retrieval
The characteristics of each model are represented by the hash codes learned by the trained network in the fourth step. The hash layer maps the high-dimensional features of the model to hash code features in a low-dimensional hamming space. Thus, the similarity between models is represented by the hamming distance D, with larger hamming distances representing less similarity of models and smaller hamming distances representing more similarity of models. The calculation method of the Hamming distance comprises the following steps
D(b_i, b_j) = Σ_{t=1}^{k} (b_i^t ⊕ b_j^t)
where b_i, b_j are the hash code features of the two models and ⊕ is the exclusive-or (XOR) operation. For any query three-dimensional model Q, a similarity measurement is performed against the three-dimensional models in the three-dimensional model database M, and the matched model Q* is calculated as:
S(Q, M) = argmin_m D(b_Q, b_{M_m})
where S represents the similarity between models, M_m represents the m-th model in the database (1 ≤ m ≤ N*), and N* is the number of samples in the database M, finally giving the matched model Q*. Through this calculation, the 10 models most similar to the query model are output as results to the retrieval list. Retrieval with the airplane model airplane_0219.off in FIG. 2 returns the 10 most similar models: ['airplane_0219.off', 'airplane_0115.off', 'airplane_0218.off', 'airplane_0002.off', 'airplane_0027.off', 'airplane_0566.off', 'airplane_0020.off', 'airplane_0374.off', 'airplane_0613.off', 'airplane_0276.off'].
To verify the effectiveness of the present invention, the method is compared with 5 other advanced algorithms on the public three-dimensional model data set ModelNet40. Fig. 6 shows the receiver operating characteristic curve (ROC curve) of each algorithm, where the ordinate is the true positive rate (TPR, sensitivity) and the abscissa is the false positive rate (FPR). The false positive rate (FPR) is the proportion of samples predicted positive but actually negative among all negative samples; the true positive rate (TPR) is the proportion of samples predicted positive and actually positive among all positive samples. The closer a point on the curve is to the upper left corner, the higher the true positive rate, the lower the false positive rate, the stronger the algorithm's discrimination ability, and the better its performance. The results in the figure show that the three-dimensional model retrieval method based on views and the hash algorithm has excellent performance.
In the above embodiments, the AlexNet convolutional neural network, the ModelNet40 data set, the TensorFlow deep learning framework, the ReLU activation function, the dropout layer, and the sigmoid activation function are all well known in the art.
The foregoing is a detailed description of embodiments of the invention, taken in conjunction with the accompanying drawings, and the detailed description is given for the purpose of facilitating a better understanding of the method of the invention. It will be understood by those skilled in the art that various modifications and equivalent arrangements may be made without departing from the spirit and scope of the present invention and shall be covered by the appended claims.
Anything not described in detail in this specification belongs to the prior art.

Claims (7)

1. A three-dimensional model retrieval method based on views and a hash algorithm, comprising: obtaining a plurality of view pictures of different three-dimensional models taken from different angles, and normalizing them;
constructing a convolutional neural network based on AlexNet: after the 5 convolutional layers, a view layer connects to two fully connected layers, and a hash layer is added after the last fully connected layer to convert high-dimensional features into low-dimensional hash codes; a quantization loss function is designed for the conversion process to reduce the quantization error of the hash codes;
training the AlexNet-based convolutional neural network with an existing three-dimensional model dataset, the features of each model being represented by the hash features learned by the trained network; and calculating the similarity between any given query three-dimensional model and the three-dimensional models in a three-dimensional model database using the Hamming distance, where a larger Hamming distance indicates less similar models and a smaller Hamming distance indicates more similar models, and outputting the top-ranked models, ordered by Hamming distance from small to large, as results to a retrieval list.
2. The retrieval method according to claim 1, wherein in the view layer, for the plurality of pictures of the same three-dimensional model after feature extraction by the 5 convolutional layers, the maximum feature value of each dimension over the pictures is selected, and the generated feature descriptor of the three-dimensional model is input into the fully connected layers for processing;
the high-dimensional features output by the fully connected layers are transcoded by the hash layer into low-dimensional hash features f_n, which are then converted into discrete hash codes b_n according to b_n = sgn(f_n − 0.5); the quantization loss function L_ql in the conversion process is
L_ql = Σ_{n=1}^{N} ‖f_n − b_n‖²
s.t. b_n ∈ {0,1}^k, where N is the number of input samples and k is the number of bits of the hash code.
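The quantization step and the loss can be sketched numerically. This is a hedged illustration: it assumes the squared-error form of L_ql reconstructed above, and codes sgn(f_n − 0.5) as values in {0, 1} with the convention that f_n = 0.5 maps to 1.

```python
import numpy as np

def quantize(f):
    """Binarize sigmoid outputs f in [0, 1]: b_n = sgn(f_n - 0.5), coded in {0, 1}.
    The boundary f = 0.5 is mapped to 1 by convention."""
    return (f >= 0.5).astype(np.float64)

def quantization_loss(f):
    """Sum of squared gaps between relaxed hash features and their binary codes."""
    b = quantize(f)
    return float(np.sum((f - b) ** 2))

# Two samples with a 3-bit relaxed hash feature each (toy values).
f = np.array([[0.9, 0.1, 0.6],
              [0.4, 0.8, 0.5]])
print(quantize(f))             # → [[1. 0. 1.] [0. 1. 1.]]
print(quantization_loss(f))    # → 0.63 (up to float rounding)
```

The loss shrinks as the relaxed outputs are pushed toward 0 or 1, which is exactly the role of the quantization term during training.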
3. The retrieval method according to claim 1, wherein before obtaining the plurality of view pictures, model scale standardization is performed on the different three-dimensional models: models of different scales are scaled into a cube with a side length of 2, with the following specific steps:
1) reading the information of each point of the three-dimensional model, and finding the minimum coordinate point (x_min, y_min, z_min) and the maximum coordinate point (x_max, y_max, z_max) of the model;
2) calculating the difference between the maximum coordinate point and the minimum coordinate point, taking the maximum of the differences in the three dimensions as the side length l of the model bounding box, constructing a cubic bounding box, and placing the center of the model at the center of the cube;
3) scaling the model to obtain a standardized model: for the coordinates (x, y, z) of any point, new coordinates (x′, y′, z′) are obtained after scaling, calculated as follows:
x′ = (x − x_min) × 2/l − 1
y′ = (y − y_min) × 2/l − 1
z′ = (z − z_min) × 2/l − 1
After standardization, the coordinates of all points of the model lie in [−1, 1] and the model lies within a cube of side length 2, giving the standardized model.
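The three standardization steps can be sketched directly from the formulas above; the helper name `normalize_model` is illustrative.

```python
import numpy as np

def normalize_model(points):
    """Scale a point cloud into the cube [-1, 1]^3, preserving aspect ratio.

    points: (N, 3) array of vertex coordinates.
    Applies x' = (x - x_min) * 2/l - 1 per axis, with l the longest
    bounding-box edge, as in the claim.
    """
    p_min = points.min(axis=0)          # (x_min, y_min, z_min)
    p_max = points.max(axis=0)          # (x_max, y_max, z_max)
    l = (p_max - p_min).max()           # side length of the cubic bounding box
    return (points - p_min) * 2.0 / l - 1.0

# A two-point toy model with extents 4 x 2 x 1, so l = 4.
pts = np.array([[0.0, 0.0, 0.0],
                [4.0, 2.0, 1.0]])
print(normalize_model(pts))  # → [[-1. -1. -1.] [ 1.  0. -0.5]]
```

All coordinates land in [−1, 1], so every model is rendered from the same 12 camera positions at a comparable scale.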
4. The retrieval method of claim 3, wherein the multi-view picture is obtained by: arranging a virtual camera array around the models, taking 12 view pictures by each model, normalizing a plurality of view pictures into a uniform size, and using the uniform size as the input of a convolutional neural network; the method comprises the following specific steps:
1) placing the standardized model at the body center of the regular icosahedron, placing a virtual camera at 12 vertexes of the regular icosahedron for shooting, and obtaining a group of 12 views of the model with the size of 256 multiplied by 256;
2) the multi-view pictures of the model are cropped to 227 × 227 as the input of the convolutional neural network; the crop is computed as:
left = C_w/2 − C′_w/2
top = C_h/2 − C′_h/2
right = left + C′_w
bottom = top + C′_h
where top, bottom, left and right respectively denote the upper, lower, left and right boundaries of the crop of the new size (C′_w, C′_h) within the original size (C_w, C_h).
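The center-crop computation can be sketched as follows. Note the integer division is an implementation choice for odd target sizes (the claim's formulas use exact halves), and `center_crop_box` is an illustrative name.

```python
def center_crop_box(cw, ch, new_w, new_h):
    """Compute the crop box (left, top, right, bottom) taking a
    new_w x new_h patch from the center of a cw x ch image."""
    left = cw // 2 - new_w // 2
    top = ch // 2 - new_h // 2
    return left, top, left + new_w, top + new_h

# A 256 x 256 rendered view cropped to the 227 x 227 network input.
print(center_crop_box(256, 256, 227, 227))  # → (15, 15, 242, 242)
```

The resulting (left, top, right, bottom) box matches the convention used by common image libraries, so the cropped patch is exactly 227 × 227.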
5. The retrieval method of claim 4, wherein the AlexNet-based convolutional neural network has a specific structure as follows:
1) sequentially inputting 12 views of 227 x 227 sizes of all models into a convolutional neural network, acquiring local features of an image by using a convolutional pooling layer, wherein the convolutional layer and the pooling layer are specifically set as follows:
the first layer comprises a convolution layer and a maximum pooling layer, the size of the convolution layer convolution kernel is 11 multiplied by 11, the step length is 4, and the activation function is set to be a Relu function; then performing pooling operation on the convolution result, wherein the size of a convolution kernel of the maximum pooling layer is 3 multiplied by 3, and the step length is 2;
the second layer comprises a convolution layer and a maximum pooling layer, the size of the convolution layer convolution kernel is 5 multiplied by 5, the step length is 1, and the activation function is set to be a Relu function; then performing pooling operation on the convolution result, wherein the size of a convolution kernel of the maximum pooling layer is 3 multiplied by 3, and the step length is 2;
the third layer comprises a convolution layer, the size of the convolution layer convolution kernel is 3 multiplied by 3, the step length is 1, and the activation function is set to be a Relu function;
the fourth layer comprises a convolution layer, the size of the convolution layer is 3 multiplied by 3, the step size is 1, and the activation function is set to be a Relu function;
the fifth layer comprises a convolution layer and a maximum pooling layer, the size of the convolution layer convolution kernel is 3 multiplied by 3, the step length is 1, and the activation function is set to be a Relu function; then performing pooling operation on the convolution result, wherein the size of a convolution kernel of the maximum pooling layer is 3 multiplied by 3, and the step length is 2;
2) a view layer is added after the fifth convolutional layer; for each three-dimensional model, the convolved features of its 12 pictures are processed by the view layer, which compares the 12 pictures and takes the maximum feature value of each dimension, generating a feature descriptor of the three-dimensional model that is input into the fully connected layers for processing; the 2 fully connected layers are configured identically, each with 4096 neurons, a Relu activation function added to avoid gradient vanishing, and a dropout layer added to randomly set neuron values to 0;
3) a hash layer is added after the fully connected layers; this layer contains k hidden-layer neurons, i.e. the number of bits of the hash code, with a sigmoid activation function; the 4096-dimensional features output by the fully connected layers are mapped to a low-dimensional space to form low-dimensional hash features f_n, which are further converted into discrete hash codes b_n by b_n = sgn(f_n − 0.5); meanwhile the quantization loss function L_ql is set as
L_ql = Σ_{n=1}^{N} ‖f_n − b_n‖²
where b_n ∈ {0,1}^k and N is the number of input samples.
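The view-pooling and hash-layer steps can be illustrated with a minimal numpy sketch. Random toy weights stand in for the trained network; only the shapes follow the claim (12 views, 4096-dimensional features, a k-bit hash), so this is an assumption-laden illustration, not the actual implementation.

```python
import numpy as np

def view_pool(view_features):
    """Element-wise max over per-view feature vectors (the view layer).

    view_features: (num_views, feat_dim) array, one row per rendered view.
    Returns a single (feat_dim,) descriptor for the 3D model.
    """
    return view_features.max(axis=0)

def hash_layer(features, W, b):
    """Map pooled features to k relaxed hash bits via sigmoid, then binarize."""
    z = features @ W + b
    f = 1.0 / (1.0 + np.exp(-z))        # sigmoid outputs in (0, 1)
    codes = (f >= 0.5).astype(int)      # b_n = sgn(f_n - 0.5), coded in {0, 1}
    return f, codes

rng = np.random.default_rng(0)
views = rng.normal(size=(12, 4096))     # stand-in for conv features of 12 views
W = rng.normal(size=(4096, 48)) * 0.01  # toy weights for a 48-bit hash layer
b = np.zeros(48)
pooled = view_pool(views)
f, codes = hash_layer(pooled, W, b)
print(pooled.shape, codes.shape)        # → (4096,) (48,)
```

During training, the gap between `f` and `codes` is what the quantization loss penalizes, pushing the sigmoid outputs toward 0 or 1.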
6. The retrieval method according to claim 1, wherein when training the network, the public Princeton three-dimensional model dataset ModelNet40 is used; after model scale standardization and multi-view picture normalization, the training set data is input into the AlexNet-based convolutional neural network for training, the network parameters are optimized, and a network model is generated; the test set is then tested using the generated network model.
7. The retrieval method according to claim 1, wherein the Hamming distance is calculated as follows:
the hash code features corresponding to the features of each model are obtained, and the Hamming distance is calculated as
D(b_i, b_j) = Σ_{t=1}^{k} (b_i^(t) ⊕ b_j^(t))
where b_i, b_j are the hash code features of the two models and ⊕ is the exclusive-or operation; for any query three-dimensional model Q, similarity is measured against the three-dimensional models in the three-dimensional model database M, and the matching model Q* is calculated as
S(Q, M) = argmin D(b_i, b_j), s.t. M_m ∈ M (1 ≤ m ≤ N*)
where S represents the similarity between models, M_m represents the m-th model in the database (1 ≤ m ≤ N*), and N* is the number of samples in the database; through this calculation, the 10 models with the highest similarity to the query model are finally output to the retrieval list.
CN202010418065.3A 2020-05-18 2020-05-18 Three-dimensional model retrieval method based on view and hash algorithm Active CN111597367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010418065.3A CN111597367B (en) 2020-05-18 2020-05-18 Three-dimensional model retrieval method based on view and hash algorithm


Publications (2)

Publication Number Publication Date
CN111597367A 2020-08-28
CN111597367B 2023-11-24

Family

ID=72182555




Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180276528A1 (en) * 2015-12-03 2018-09-27 Sun Yat-Sen University Image Retrieval Method Based on Variable-Length Deep Hash Learning
CN108932314A (en) * 2018-06-21 2018-12-04 南京农业大学 A kind of chrysanthemum image content retrieval method based on the study of depth Hash
CN108984642A (en) * 2018-06-22 2018-12-11 西安工程大学 A kind of PRINTED FABRIC image search method based on Hash coding
CN109165306A (en) * 2018-08-09 2019-01-08 长沙理工大学 Image search method based on the study of multitask Hash
CN109783682A (en) * 2019-01-19 2019-05-21 北京工业大学 It is a kind of based on putting non-to the depth of similarity loose hashing image search method
CN109918528A (en) * 2019-01-14 2019-06-21 北京工商大学 A kind of compact Hash code learning method based on semanteme protection


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032613A (en) * 2021-03-12 2021-06-25 哈尔滨理工大学 Three-dimensional model retrieval method based on interactive attention convolution neural network
CN113032613B (en) * 2021-03-12 2022-11-08 哈尔滨理工大学 Three-dimensional model retrieval method based on interactive attention convolution neural network
CN115294284A (en) * 2022-10-09 2022-11-04 南京纯白矩阵科技有限公司 High-resolution three-dimensional model generation method for guaranteeing uniqueness of generated model



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant