CN113032613B - Three-dimensional model retrieval method based on interactive attention convolution neural network - Google Patents
Three-dimensional model retrieval method based on interactive attention convolutional neural network
- Publication number
- CN113032613B (application CN202110270518.7A)
- Authority
- CN
- China
- Prior art keywords
- model
- neural network
- view
- sketch
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/66—Analysis of geometric attributes of image moments or centre of gravity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a three-dimensional model retrieval method based on an interactive attention convolutional neural network. First, the three-dimensional model is preprocessed: the projection angle is fixed to obtain 6 views of the three-dimensional model, and the views are converted into line drawings to serve as the model's view set. Second, an interactive attention module is embedded in the convolutional neural network to extract semantic features, increasing the data interaction between two network layers of the convolutional neural network. Global features are extracted with the Gist algorithm and a two-dimensional shape distribution algorithm. Third, the similarity between the sketch and each two-dimensional view is computed with the Euclidean distance, and the features are fused with weights to retrieve the three-dimensional model. The method alleviates the inaccurate semantic features caused by overfitting when a neural network is trained on small-sample data, and improves the accuracy of three-dimensional model retrieval.
Description
Technical field:
The invention relates to a three-dimensional model retrieval method based on an interactive attention convolutional neural network, and applies to the field of three-dimensional model retrieval.
Background art:
In recent years, with the continuing development of science and technology, three-dimensional models have come to play an important role in many professional fields and have spread widely into daily life, and the demand for retrieving three-dimensional models has grown steadily. Example-based three-dimensional model retrieval can only take models already in the database as query objects, so it lacks generality. Sketch-based three-dimensional model retrieval lets users draw queries freely according to their needs; it is convenient, widely applicable, and has broad prospects.
Currently, common algorithms use a single hand-crafted feature or a deep-learning algorithm to solve the sketch-based model retrieval problem. Traditional hand-crafted features have drawbacks, however: researchers need substantial prior knowledge, parameters must be set manually in advance, and the extracted features may not perform as expected. A deep-learning algorithm adjusts its parameters automatically and therefore scales well, but it has its own limitation: because a deep neural network has many nodes, a large amount of data is needed to train it to a good result, and once the training data are insufficient, overfitting occurs and the results are biased. To obtain better retrieval results under insufficient training samples, the invention provides a three-dimensional model retrieval method based on an interactive attention convolutional neural network.
Summary of the invention:
The invention discloses a three-dimensional model retrieval method based on an interactive attention convolutional neural network, aiming to solve the poor retrieval effect of sketch-based three-dimensional model retrieval methods under insufficient training samples.
Therefore, the invention provides the following technical scheme:
1. A three-dimensional model retrieval method based on an interactive attention convolutional neural network, characterized by comprising the following steps:
Step 1: perform data preprocessing; project the three-dimensional model to obtain a plurality of corresponding views, and obtain the model's edge view set with an edge detection algorithm.
Step 2: design a deep convolutional neural network and optimize the network model with an interactive attention module; select one part of the view sets as the training set and the other part as the test set.
Step 3: training comprises two processes, forward propagation and back propagation; the training data serve as input for training the interactive attention convolutional neural network model, which yields the optimized interactive attention convolutional neural network model.
Step 4: extract semantic features of the freehand sketch and the model views with the optimized interactive attention convolutional neural network model and the gist feature, and extract their two-dimensional shape distribution features with the two-dimensional shape distribution feature.
Step 5: fuse the features with weights, and retrieve the model most similar to the hand-drawn sketch according to the Euclidean distance.
2. The method for retrieving a three-dimensional model based on an interactive attention convolutional neural network as claimed in claim 1, wherein in step 1 the three-dimensional model is projected to obtain a plurality of corresponding views and an edge detection algorithm is used to obtain the model's edge view set, specifically:
Step 1-1, place the three-dimensional model at the center of a virtual sphere;
Step 1-2, place a virtual camera above the model and rotate the model through 360 degrees in 30-degree steps to obtain a set of 12 views of the three-dimensional model;
Step 1-3, obtain the edge view of each of the 12 original views with the Canny edge detection algorithm.
After projection the three-dimensional model is characterized as a group of two-dimensional views, and the Canny edge detection algorithm reduces the semantic gap between the hand-drawn sketch and the three-dimensional model views.
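A minimal sketch of the edge-view step, assuming the projection views have already been rendered as grayscale images; the Gaussian blur and the Canny thresholds are illustrative assumptions, not values fixed by the patent:

```python
import cv2
import numpy as np

def edge_view_set(view_images, low_thresh=50, high_thresh=150):
    """Convert rendered grayscale projection views into Canny edge views."""
    edge_views = []
    for view in view_images:
        # A light blur suppresses rendering noise before edge detection.
        blurred = cv2.GaussianBlur(view, (5, 5), 0)
        edge_views.append(cv2.Canny(blurred, low_thresh, high_thresh))
    return edge_views

# 12 synthetic 224x224 views stand in for the rendered projections.
views = [np.random.randint(0, 256, (224, 224), dtype=np.uint8) for _ in range(12)]
edges = edge_view_set(views)
print(len(edges), edges[0].shape)  # 12 (224, 224)
```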
3. The method for retrieving the three-dimensional model based on the interactive attention convolutional neural network as claimed in claim 1, wherein in step 2 the deep convolutional neural network is designed and the network model is optimized with the interactive attention module, specifically:
Step 2-1, determine the depth of the convolutional neural network, the size of the convolution kernels, and the numbers of convolutional and pooling layers;
Step 2-2, design the interactive attention module: connect a global pooling layer after the output of convolutional layer conv_n and compute the information amount Z_k of each channel. With conv_nk denoting the k-th feature map output by the n-th convolutional layer, of size W_n × H_n, the information amount is the global average

Z_k = (1 / (W_n × H_n)) Σ_{i=1}^{W_n} Σ_{j=1}^{H_n} conv_nk(i, j)

Step 2-3, connect two fully connected layers after the global pooling layer and adaptively adjust the attention weight S_kn of each channel according to the information amount:

S_kn = F_ex(Z, W) = σ(g(Z, W)) = σ(W_2 δ(W_1 Z))

where δ is the ReLU function, σ is the sigmoid function, and W_1 and W_2 are the weights of the first and second fully connected layers, respectively.
Step 2-4, compute the interactive attention weights S_k1 and S_k2 of the two neighboring convolutional layers and fuse them to obtain the optimal attention weight S_k:

S_k = Average(S_k1, S_k2)

Step 2-5, fuse the attention weight S_k with the second convolutional layer conv_2 and the first pooling layer a_p to obtain the final result a_2; one plausible channel-wise form of this fusion is shown in the sketch after these steps.
One part of the view sets is selected as the training set and the other part as the test set.
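A minimal PyTorch sketch of the interactive attention module in steps 2-2 through 2-5. The squeeze/excitation branch follows the formulas above; the final fusion (channel-wise scaling of conv_2 plus the pooled map a_p) is an assumption, since the patent gives that formula only as an image:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation branch: Z_k by global average pooling,
    then S_kn = sigmoid(W_2 relu(W_1 Z))."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # Z_k
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # W_1
            nn.ReLU(),                                   # delta
            nn.Linear(channels // reduction, channels),  # W_2
            nn.Sigmoid(),                                # sigma
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        return self.fc(self.pool(x).view(b, c))          # S_kn, shape (b, c)

class InteractiveAttention(nn.Module):
    """S_k = Average(S_k1, S_k2) over two neighboring conv layers,
    then fusion with conv_2 and the pooled map a_p."""
    def __init__(self, channels):
        super().__init__()
        self.att1 = ChannelAttention(channels)
        self.att2 = ChannelAttention(channels)

    def forward(self, conv1_out, conv2_out, pooled):
        s_k = 0.5 * (self.att1(conv1_out) + self.att2(conv2_out))
        s_k = s_k[:, :, None, None]
        # Assumed fusion: scale conv_2 channel-wise by S_k, then add a_p.
        return s_k * conv2_out + pooled

c1 = torch.randn(2, 32, 56, 56)   # conv_1 output (shapes are illustrative)
c2 = torch.randn(2, 32, 56, 56)   # conv_2 output
ap = torch.randn(2, 32, 56, 56)   # first pooling layer a_p
print(InteractiveAttention(32)(c1, c2, ap).shape)  # torch.Size([2, 32, 56, 56])
```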
4. The method for retrieving the three-dimensional model based on the interactive attention convolutional neural network as claimed in claim 1, wherein in step 3 the convolutional neural network model is trained, specifically:
Step 3-1, input the training data into the initialized interactive attention convolutional neural network model;
Step 3-2, extract increasingly detailed view features through the convolutional layers: the shallow convolutional layers extract low-level features and the deeper convolutional layers extract high-level semantic features;
Step 3-3, fuse the attention module with the neighboring convolutional layers through weighted channels to reduce the information lost when the edge view of the hand-drawn sketch or model is pooled;
Step 3-4, reduce the scale of the view features through the pooling layers, which reduces the number of parameters and speeds up model computation;
Step 3-5, pass through a Dropout layer to alleviate the overfitting caused by insufficient training samples;
Step 3-6, after alternating convolution, attention module, Dropout, and pooling operations, finally input a fully connected layer, which reduces the dimension of the extracted features and concatenates them into a one-dimensional high-level semantic feature vector;
Step 3-7, during back propagation, use the labeled 2D views to optimize the weights and biases of the interactive attention convolutional neural network. The 2D view set is {v_1, v_2, …, v_n} with label set {l_1, l_2, …, l_n}, and the 2D views belong to t classes 1, 2, …, t. After forward propagation, the prediction probability of v_i in class j is y_test_ij. Comparing the label l_i of v_i with class j gives the expected probability y_ij:

y_ij = 1 if l_i = j, otherwise y_ij = 0

Step 3-8, compare the predicted probability y_test_ij with the true probability y_ij and compute the error loss with the cross-entropy loss function:

loss = −Σ_i Σ_j y_ij log(y_test_ij)

The interactive attention convolutional neural network model is iterated continuously to obtain the optimized interactive attention convolutional neural network model, and the weights and biases are saved.
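A small numerical sketch of steps 3-7 and 3-8 with illustrative values (n = 2 views, t = 3 classes); the one-hot expected probability and the cross-entropy sum follow the formulas above:

```python
import numpy as np

labels = np.array([0, 2])                 # l_i for n = 2 views, t = 3 classes
y_test = np.array([[0.7, 0.2, 0.1],       # y_test_ij from forward propagation
                   [0.1, 0.3, 0.6]])

y = np.eye(3)[labels]                     # y_ij = 1 if l_i = j else 0
loss = -np.sum(y * np.log(y_test))        # cross-entropy error loss
print(round(float(loss), 4))              # 0.8675
```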
5. The method for retrieving the three-dimensional model based on the interactive attention convolutional neural network as claimed in claim 1, wherein in step 4 the optimized interactive attention convolutional neural network model and the gist feature are used to extract semantic features of the freehand sketch and the model views, and the two-dimensional shape distribution feature is used to extract their two-dimensional shape distribution features, specifically:
Step 4-1, input the test data into the optimized interactive attention convolutional neural network model;
Step 4-2, extract the features of the fully connected layer as the high-level semantic features of the hand-drawn sketch or model view.
Step 4-3, divide the sketch or 2D view of size m × n into 4 × 4 blocks; each block has size a × b, where a = m/4 and b = n/4.
Step 4-4, process each block with 32 Gabor filters of 4 scales and 8 directions, and combine the processed features to obtain the gist feature:

G(x, y) = cat(I(x, y) * g_ij(x, y)), i = 1, …, 4, j = 1, …, 8

where G(x, y) is the gist feature over the 32 Gabor filters, cat() denotes the concatenation operation, x and y are pixel positions, I(x, y) denotes a block, g_ij(x, y) is the filter at the i-th scale and j-th direction, and * denotes convolution.
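A sketch of the block-wise Gabor filtering in steps 4-3 and 4-4, using OpenCV's getGaborKernel; the kernel size, sigma, and wavelength schedules are illustrative assumptions, not parameters fixed by the patent:

```python
import cv2
import numpy as np

def gist_feature(image, scales=4, orientations=8, grid=4):
    """Gist-style descriptor: filter 4x4 blocks with 32 Gabor filters
    and concatenate the mean response of each block."""
    m, n = image.shape
    a, b = m // grid, n // grid
    features = []
    for s in range(scales):
        for o in range(orientations):
            kernel = cv2.getGaborKernel(
                ksize=(15, 15), sigma=2.0 * (s + 1),
                theta=o * np.pi / orientations,
                lambd=4.0 * (s + 1), gamma=0.5)
            response = cv2.filter2D(image.astype(np.float32), -1, kernel)
            for i in range(grid):
                for j in range(grid):
                    block = response[i * a:(i + 1) * a, j * b:(j + 1) * b]
                    features.append(block.mean())
    return np.array(features)            # 4 * 8 * 16 = 512 dimensions

sketch = np.random.rand(224, 224).astype(np.float32)
print(gist_feature(sketch).shape)        # (512,)
```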
Step 4-5, randomly sample equidistant points on the boundary of the sketch or 2D view; the point set is points = {(x_1, y_1), …, (x_i, y_i), …, (x_n, y_n)}, where (x_i, y_i) are the coordinates of a point.
Step 4-6, use the D1 descriptor to represent the distance between the centroid and a random sample point on the sketch or two-dimensional view boundary. Points are drawn from the point set into PD1 = {ai_1, …, ai_k, …, ai_N}. The D1 shape distribution feature set is {D1_v_1, …, D1_v_i, …, D1_v_Bins}, where D1_v_i is the statistic of the interval (BinSize·(i−1), BinSize·i), Bins is the number of intervals, and BinSize is the interval length:

D1_v_i = |{P | dist(P, O) ∈ (BinSize·(i−1), BinSize·i), P ∈ PD1}|

where BinSize = max({dist(P, O) | P ∈ PD1})/N, dist() is the Euclidean distance between two points, and O is the centroid of the sketch or 2D view.
Step 4-7, use the D2 descriptor to describe the distance between two random sample points on the sketch or two-dimensional view boundary. Point pairs are drawn into PD2 = {(ai_1, bi_1), (ai_2, bi_2), …, (ai_N, bi_N)}. The D2 shape distribution feature set is {D2_v_1, …, D2_v_i, …, D2_v_Bins}, where D2_v_i is the statistic of the interval (BinSize·(i−1), BinSize·i):

D2_v_i = |{P | dist(P) ∈ (BinSize·(i−1), BinSize·i), P ∈ PD2}|

where BinSize = max({dist(P) | P ∈ PD2})/N.
Step 4-8, use the D3 descriptor to describe the square root of the area formed by three random sample points on the sketch or 2D view boundary. Point triplets are drawn into PD3 = {(ai_1, bi_1, ci_1), (ai_2, bi_2, ci_2), …, (ai_n, bi_n, ci_n)}. The D3 shape distribution feature set is {D3_v_1, …, D3_v_i, …, D3_v_Bins}, where D3_v_i is the statistic of the interval (BinSize·(i−1), BinSize·i):

D3_v_i = |{P | herson(P) ∈ (BinSize·(i−1), BinSize·i), P ∈ PD3}|

where herson() denotes Heron's formula, used to compute the square root of the area of the triangle P = (P_1, P_2, P_3):

herson(P) = (p(p−a)(p−b)(p−c))^(1/4), p = (a + b + c)/2

with a = dist(P_1, P_2), b = dist(P_1, P_3), c = dist(P_2, P_3).
Step 4-9, concatenate D1_v_i, D2_v_i, and D3_v_i, i = 1, 2, …, Bins, to form the shape distribution feature.
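A numpy sketch of the D1/D2/D3 shape distribution histograms in steps 4-5 through 4-9, assuming the boundary points have already been sampled; the sample count N and the number of intervals Bins are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def histogram_feature(values, bins):
    """Count values into Bins intervals of length BinSize = max(values)/N."""
    bin_size = values.max() / len(values)
    edges = bin_size * np.arange(bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    return counts

def shape_distribution(points, n_samples=1024, bins=64):
    centroid = points.mean(axis=0)
    idx = lambda: rng.integers(0, len(points), n_samples)
    p1, p2, p3 = points[idx()], points[idx()], points[idx()]
    d1 = np.linalg.norm(p1 - centroid, axis=1)           # D1: point to centroid
    d2 = np.linalg.norm(p1 - p2, axis=1)                 # D2: point to point
    a = np.linalg.norm(p1 - p2, axis=1)                  # D3: fourth root of
    b = np.linalg.norm(p1 - p3, axis=1)                  # Heron's product, i.e.
    c = np.linalg.norm(p2 - p3, axis=1)                  # sqrt of triangle area
    p = (a + b + c) / 2
    d3 = np.clip(p * (p - a) * (p - b) * (p - c), 0, None) ** 0.25
    return np.concatenate([histogram_feature(v, bins) for v in (d1, d2, d3)])

boundary = rng.random((500, 2))            # stand-in for sampled boundary points
print(shape_distribution(boundary).shape)  # (192,)
```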
6. The method for retrieving the three-dimensional model based on the interactive attention convolutional neural network as claimed in claim 1, wherein in step 5 the features are fused and the model most similar to the hand-drawn sketch is retrieved according to a similarity measurement formula, specifically:
Step 5-1, select the Euclidean distance as the similarity measurement method;
Step 5-2, extract feature vectors from the two-dimensional views and the sketch with the improved interactive attention convolutional neural network and normalize them; compute the similarity with the Euclidean distance, denoted distance1, and the retrieval accuracy, denoted t1;
Step 5-3, extract feature vectors of the sketch and the model views with the gist feature and normalize them; compute the similarity with the Euclidean distance, denoted distance2, and the retrieval accuracy, denoted t2;
Step 5-4, extract feature vectors between the sketch and the model views with the two-dimensional shape distribution feature and normalize them; compute the similarity with the Euclidean distance, denoted distance3, and the retrieval accuracy, denoted t3;
Step 5-5, compare the accuracies of the three features and fuse them with weights into a new feature similarity Sim(distance):

Sim(distance) = w_1·distance1 + w_2·distance2 + w_3·distance3, w_1 + w_2 + w_3 = 1

where w_1 = t_1/(t_1 + t_2 + t_3), w_2 = t_2/(t_1 + t_2 + t_3), w_3 = t_3/(t_1 + t_2 + t_3).
Step 5-6, sort by similarity from small to large to produce the retrieval result.
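A small sketch of the weighted similarity fusion in steps 5-1 through 5-6; the feature vectors are random stand-ins, and the accuracies correspond to the illustrative values reported in the embodiment (0.96, 0.53, 0.42):

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(v):
    return v / (np.linalg.norm(v) + 1e-12)

def weighted_similarity(sketch_feats, view_feats, accuracies):
    """Fuse per-feature Euclidean distances with accuracy-derived weights
    w_i = t_i / (t_1 + t_2 + t_3)."""
    t = np.asarray(accuracies, dtype=float)
    w = t / t.sum()
    distances = [np.linalg.norm(normalize(s) - normalize(v))
                 for s, v in zip(sketch_feats, view_feats)]
    return float(np.dot(w, distances))   # Sim(distance)

# Semantic (CNN), gist, and shape distribution features for one sketch/view pair.
sketch = [rng.random(128), rng.random(512), rng.random(192)]
view = [rng.random(128), rng.random(512), rng.random(192)]
print(weighted_similarity(sketch, view, [0.96, 0.53, 0.42]))
```

Candidate models are then sorted in ascending order of this fused distance to produce the retrieval list.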
Advantageous effects:
1. The invention discloses a three-dimensional model retrieval method based on an interactive attention convolutional neural network. Model retrieval was performed on the SHREC13 database and the ModelNet40 database, and experimental results show that the method achieves high accuracy.
2. The retrieval model used by the invention combines an interactive attention module with a convolutional neural network model. The convolutional neural network has the capacity for local perception and parameter sharing, handles high-dimensional data well, and requires no manual selection of data features. The proposed interactive attention model combines the attention weights of two adjacent convolutional layers, realizing data interaction between the two network layers, and the trained convolutional neural network model yields a better retrieval effect.
3. When training the model, parameters are updated with stochastic gradient descent. In back propagation the error returns along the original path, i.e., the parameters of each layer are updated layer by layer, from the output layer backward through each hidden layer to the input layer (a minimal update sketch follows this list). Forward and back propagation alternate continuously to reduce the error and update the model parameters until the CNN is trained.
4. The invention improves the three-dimensional shape distribution feature so that it suits sketches and two-dimensional views; the shape information of the sketch and the three-dimensional model views is described with shape distribution functions.
5. The invention adaptively fuses the similarities of the proposed features with weights, achieving a better retrieval effect.
Description of the drawings:
fig. 1 is a sketch to be retrieved in an embodiment of the present invention.
Fig. 2 is a three-dimensional model search framework diagram according to an embodiment of the present invention.
FIG. 3 is a projection view of a model in an embodiment of the invention.
Fig. 4 is a Canny edge view in an embodiment of the invention.
FIG. 5 is a model of an interactive attention convolutional neural network in an embodiment of the present invention.
FIG. 6 is a training process of the interactive attention convolution neural network in an embodiment of the present invention.
FIG. 7 illustrates a testing process of the Interactive attention convolutional neural network in an embodiment of the present invention.
Detailed description of embodiments:
To describe the technical solutions in the embodiments of the invention clearly and completely, the invention is further described in detail below with reference to the drawings in the embodiments.
The invention is verified experimentally with sketches from SHREC13 and models from the ModelNet40 model library, taking "17205.png" from the SHREC13 sketches and "table_0399.off" from the ModelNet40 model library as examples. The sketch to be retrieved is shown in fig. 1.
The experimental framework of the three-dimensional model retrieval method based on the interactive attention convolutional neural network, shown in fig. 2, comprises the following steps:
Step 1, project the three-dimensional model to obtain its edge view set, specifically:
Step 1-1, the table_0399.off file is placed at the center of a virtual sphere.
Step 1-2, a virtual camera is placed above the model and the model is rotated through 360 degrees in 30-degree steps, obtaining a set of 12 views of the three-dimensional model; one view is shown as an example, and the projection view of the model is shown in fig. 3;
Step 1-3, the views obtained with the Canny edge detection algorithm are shown in fig. 4.
Step 2, design the deep convolutional neural network and optimize the network model with the interactive attention module, as shown in fig. 5, specifically:
Step 2-1, for a better feature extraction effect, a deep convolutional neural network is designed comprising 5 convolutional layers, 4 pooling layers, two Dropout layers, a concatenation layer, and a fully connected layer.
Step 2-2, the interactive attention module is embedded into the designed convolutional neural network structure, a global pooling layer is connected after the output of the convolutional layer, and the information amount Z_k of each channel in the convolutional layer is computed. Taking the sketch as an example, the information amounts of its first convolutional layer are:
Z_k = [[0.0323739 0.04996519 0.0190248 0.03274497 0.03221277 0.00206719 0.04075038 0.01613641 0.03390235 0.04024649 0.03553107 0.00632962 0.03442683 0.04588291 0.01900478 0.02144121 0.03710039 0.03861086 0.05596253 0.0439686 0.03611921 0.04850776 0.00716817 0.02596463 0.00525256 0.03657651 0.02809189 0.03490375 0.04528182 0.03938764 0.00690786 0.04449471]]
Step 2-3, two fully connected layers are connected after the global pooling layer, and the attention weight S_kn of each channel is adaptively adjusted according to the information amount. Taking the sketch as an example, its attention weights are:
S_kn = [[0.49450904 0.49921992 0.50748134 0.5051483 0.5093386 0.49844238 0.50426346 0.50664175 0.5053692 0.5012332 0.5004162 0.49788538 0.505669 0.5012219 0.5009724 0.4942028 0.49796405 0.4992011 0.5064934 0.4963113 0.50500274 0.50238824 0.50202376 0.49661288 0.50185806 0.5048757 0.5073203 0.50703263 0.51684725 0.50641936 0.5052296 0.4979179]]
Step 2-4, the interactive attention weights S_k1 and S_k2 of the two neighboring convolutional layers are computed and fused to obtain the optimal attention weight S_k. The optimal attention weights of the sketch are:
S_k = [[0.4625304 0.47821882 0.5064253 0.5032532 0.5093386 0.49877496 0.50426346 0.50664175 0.5053692 0.5012332 0.5004162 0.49784237 0.505688 0.5011142 0.5008647 0.4942028 0.49796405 0.4991069 0.5064934 0.4963113 0.5102687 0.50125698 0.502524856 0.49675384 0.49365704 0.5027958 0.5076529 0.50814523 0.51006527 0.50361942 0.50422731 0.4635842]]
Step 2-5, the attention weight S_k is fused with the second convolutional layer conv_2 and the first pooling layer a_p to obtain the final result a_2. Partial results for the second convolutional layer of the sketch are:
a_2 = [[[[0.14450312 0.0644969 0.10812703...0.18608719 0.01994037 0]
[0.18341058 0.15881275 0.24716881...0.18875208 0.14420813 0.08290599]
[0.17390229 0.14937611 0.2255666...0.15295741 0.18792515 0.08066748]
...
[0.31344187 0.18656467 0.22178406...0.22087486 0.22130579 0.00955889]
[0.12405898 0.10548315 0.11685486...0.10439464 0.2906406 0.14846338]]
[[0.10032222 0.21919143 0.09797319...0.13584027 0. 0.12112971]
[0.20946684 0.14252397 0.17954415...0.09708451 0. 0.15463363]
[0.06941956 0.03963253 0.13273408...0.00173131 0.04566149 0.14895247]
...
[[0.01296724 0.27460644 0.09022377...0.06938899 0.04487894 0.2567152]
[0.16118288 0.38024116 0.02033611...0.13374138 0 0.17068687]
[0.09430372 0.35878736 0...0.0846955 0 0.25289127]
...
[0.10363265 0.4103881 0...0.0728834 0 0.29586816]
[0.18578637 0.34666267 0...0.05323519 0 0.27042198]
[0.0096841 0.18718664 0...0.04646093 0.00576336 0.155898]]]]
Step 3, train the convolutional neural network model, as shown in fig. 6, specifically:
Step 3-1, the sketches and the edge two-dimensional views are input into the initialized interactive attention convolutional neural network as training data;
Step 3-2, more detailed view features are extracted through the convolutional layers;
Step 3-3, fusing the attention module with the neighboring convolutional layers through weighted channels reduces the information lost when the edge view of the hand-drawn sketch or model is pooled;
Step 3-4, the maximal view information is retained through the pooling layer;
Step 3-5, a Dropout layer reduces the overfitting caused by insufficient training samples;
Step 3-6, after alternating convolution, attention module, Dropout, and pooling operations, a fully connected layer finally reduces the dimension of the extracted features and concatenates them into a one-dimensional high-level semantic feature vector;
Step 3-7, the softmax function gives the predicted probability of sketch "17205.png" under the "table" category as 89.99%.
Step 3-8, the predicted probability y_test_ij is compared with the true probability y_ij and the error loss is computed with the cross-entropy loss function, where loss_17205 denotes the error of sketch "17205.png".
The interactive attention convolutional neural network model is iterated continuously to obtain the optimized interactive attention convolutional neural network model.
Step 4, extracting semantic features and shape distribution features, specifically:
step 4-1, inputting the test data into the optimized interactive attention convolution neural network model, wherein the test process is shown in FIG. 7;
and 4-2, extracting the features of the full connection layer to be used as high-level semantic features of the hand-drawn sketch or the model view. Part of the high-level semantic features of the extracted sketch are as follows:
Feature=[[0,0.87328064,0,0,1.3293583,0,2.3825126,0,0,4.8035927,0,1.5186063,0,3.6845286,1.0825952,0,1.8516512,1.0285587,0,0,0,3.3322043,1.0545557,0,0,4.8707848,3.042554,0,0,0,0,6.8227463,2.537525,1.5318785,2.7271123,0,3.0482264……]]
Step 4-3, the sketch or two-dimensional view is divided into 4 × 4 blocks;
Step 4-4, each block is processed by 32 Gabor filters of 4 scales and 8 directions, and the processed features are combined to obtain the gist feature. The extracted Gist feature has 512 dimensions; part of the sketch's Gist feature is:
G(x,y)=[[5.81147151e-03 1.51588341e-02 1.75721212e-03 2.10059434e-01 1.62918585e-01 1.54040498e-01 1.44374291e-01 8.71880878e-01 5.26758657e-01 4.14263371e-01 7.17606844e-01 6.22190594e-01 1.11205845e-01 7.69002490e-04 2.18182730e-01 2.29565939e-01 9.32599080e-03 1.10805327e-02 1.40071468e-03 2.58543039e-01 5.67934220e-02 1.06132064e-01 9.10082146e-02 4.02163211e-01 2.97883778e-01 2.45860956e-01 4.02066928e-01 2.84401506e-01
1.03228724e-01 6.37419945e-04 2.71290458e-01……]]
Step 4-5, points are sampled randomly and equidistantly on the boundary of the sketch or two-dimensional view;
Step 4-6, the D1 descriptor represents the distance between the centroid and a random sampling point on the boundary of the sketch or two-dimensional view. Part of the sketch's D1 descriptor is:
D1=[0.30470497858541628,0.6256941275550102,0.11237884569183111,0.23229854666522,0.2657159486944761,0.0731852015843772,0.40751749800795261……]
Step 4-7, the D2 descriptor describes the distance between two random sampling points on the sketch or two-dimensional view boundary. Part of the sketch's D2 descriptor is:
D2=[0.13203683803844625,0.028174099301372796,0.15392681513105217,0.130238265264,0.123460163767958,0.06985106421513015,0.12992235205980568……]
Step 4-8, the D3 descriptor describes the square root of the area formed by three random sampling points on the sketch or two-dimensional view boundary. Part of the sketch's D3 descriptor is:
D3=[0.9193157274532394,0.5816923854309814,0.46980644879802125,0.498873567635874,0.7195175116705602,0.29425190983247506,0.8724092377243926……]
Step 4-9, D1, D2, and D3 are concatenated to form the shape distribution feature;
Step 5, fuse the several features of the sketch and retrieve the model most similar to the hand-drawn sketch according to the similarity measurement formula, specifically:
Step 5-1, various similarity retrieval methods are compared, and the Euclidean distance gives the best final effect;
Step 5-2, feature vectors are extracted from the two-dimensional views and the sketch with the improved interactive attention convolutional neural network and normalized; the similarity is computed with the Euclidean distance, denoted distance1, with a retrieval accuracy of 0.96;
Step 5-3, feature vectors of the sketch and the model views are extracted with the gist feature and normalized; the similarity is computed with the Euclidean distance, denoted distance2, with a retrieval accuracy of 0.53;
Step 5-4, feature vectors between the sketch and the model views are extracted with the two-dimensional shape distribution feature and normalized; the similarity is computed with the Euclidean distance, denoted distance3, with a retrieval accuracy of 0.42;
Step 5-5, the weights are determined from the retrieval accuracies of the three features; the final weights are 5:3:2:

Sim(distance) = 0.5·distance1 + 0.3·distance2 + 0.2·distance3

Step 5-6, the results are sorted by similarity from small to large to produce the retrieval result.
The three-dimensional model retrieval method based on the interactive attention convolutional neural network adopts a weighted fusion of traditional features and deep features and achieves a good retrieval effect.
The foregoing is a detailed description of embodiments of the invention with reference to the accompanying drawings; the specific embodiments are provided merely to assist in understanding the method of the invention. Those skilled in the art may make variations and modifications within the scope of the embodiments and applications according to the concept of the invention, and the invention should therefore not be construed as limited thereto.
Claims (5)
1. A three-dimensional model retrieval method based on an interactive attention convolutional neural network, characterized by comprising the following steps:
Step 1: perform data preprocessing; project the three-dimensional model to obtain a plurality of corresponding views, and obtain the model's edge view set with an edge detection algorithm;
Step 2: design a deep convolutional neural network, optimize the network model with an interactive attention module, and select one part of the view sets as the training set and the other part as the test set, comprising:
Step 2-1, determine the depth of the convolutional neural network, the size of the convolution kernels, and the numbers of convolutional and pooling layers;
Step 2-2, design the interactive attention module: connect a global pooling layer after the output of convolutional layer conv_n and compute the information amount Z_k of each channel. With conv_nk denoting the k-th feature map output by the n-th convolutional layer, of size W_n × H_n, the information amount is the global average

Z_k = (1 / (W_n × H_n)) Σ_{i=1}^{W_n} Σ_{j=1}^{H_n} conv_nk(i, j);

Step 2-3, connect two fully connected layers after the global pooling layer and adaptively adjust the attention weight S_kn of each channel according to the information amount:

S_kn = F_ex(Z, W) = σ(g(Z, W)) = σ(W_2 δ(W_1 Z));

where δ is the ReLU function, σ is the sigmoid function, and W_1 and W_2 are the weights of the first and second fully connected layers, respectively;
Step 2-4, compute the interactive attention weights S_k1 and S_k2 of the two neighboring convolutional layers and fuse them to obtain the optimal attention weight S_k:

S_k = Average(S_k1, S_k2);

Step 2-5, fuse the attention weight S_k with the second convolutional layer conv_2 and the first pooling layer a_p to obtain the final result a_2;
one part of the view sets is selected as the training set and the other part as the test set;
Step 3: training comprises a forward propagation process and a back propagation process; the training data serve as input for training the interactive attention convolutional neural network model, which yields the optimized interactive attention convolutional neural network model;
Step 4: extract semantic features of the freehand sketch and the model views with the optimized interactive attention convolutional neural network model and the gist feature, and extract their two-dimensional shape distribution features with the two-dimensional shape distribution feature;
Step 5: fuse the features with weights and retrieve the model most similar to the hand-drawn sketch according to the Euclidean distance.
2. The method for retrieving a three-dimensional model based on an interactive attention convolutional neural network as claimed in claim 1, wherein in step 1 the three-dimensional model is projected to obtain a plurality of corresponding views and an edge detection algorithm is used to obtain the model's edge view set, specifically:
Step 1-1, place the three-dimensional model at the center of a virtual sphere;
Step 1-2, place a virtual camera above the model and rotate the model through 360 degrees in 30-degree steps to obtain a set of 12 views of the three-dimensional model;
Step 1-3, obtain the edge view of each of the 12 original views with the Canny edge detection algorithm;
after projection the three-dimensional model is characterized as a group of two-dimensional views, and the Canny edge detection algorithm reduces the semantic gap between the hand-drawn sketch and the three-dimensional model views.
3. The method for retrieving the three-dimensional model based on the interactive attention convolutional neural network as claimed in claim 1, wherein in step 3 the convolutional neural network model is trained, specifically:
Step 3-1, input the training data into the initialized interactive attention convolutional neural network model;
Step 3-2, extract increasingly detailed view features through the convolutional layers: the shallow convolutional layers extract low-level features and the deeper convolutional layers extract high-level semantic features;
Step 3-3, fuse the attention module with the neighboring convolutional layers through weighted channels to reduce the information lost when the edge view of the hand-drawn sketch or model is pooled;
Step 3-4, reduce the scale of the view features through the pooling layers, reducing the number of parameters and speeding up model computation;
Step 3-5, pass through a Dropout layer to alleviate the overfitting caused by insufficient training samples;
Step 3-6, after alternating convolution, attention module, Dropout, and pooling operations, finally input a fully connected layer, which reduces the dimension of the extracted features and concatenates them into a one-dimensional high-level semantic feature vector;
Step 3-7, during back propagation, use the labeled 2D views to optimize the weights and biases of the interactive attention convolutional neural network; the 2D view set is {v_1, v_2, …, v_n} with label set {l_1, l_2, …, l_n}, and the 2D views belong to t classes 1, 2, …, t; after forward propagation, the prediction probability of v_i in class j is y_test_ij; comparing the label l_i of v_i with class j gives the expected probability y_ij:

y_ij = 1 if l_i = j, otherwise y_ij = 0;

Step 3-8, compare the predicted probability y_test_ij with the true probability y_ij and compute the error loss with the cross-entropy loss function:

loss = −Σ_i Σ_j y_ij log(y_test_ij);

the interactive attention convolutional neural network model is iterated continuously to obtain the optimized interactive attention convolutional neural network model, and the weights and biases are saved.
4. The method for retrieving a three-dimensional model based on an interactive attention convolutional neural network as claimed in claim 1, wherein in step 4 the optimized interactive attention convolutional neural network model and the gist feature are used to extract semantic features of the freehand sketch and the model views, and the two-dimensional shape distribution feature is used to extract their two-dimensional shape distribution features, specifically:
Step 4-1, input the test data into the optimized interactive attention convolutional neural network model;
Step 4-2, extract the features of the fully connected layer as the high-level semantic features of the hand-drawn sketch or model view;
Step 4-3, divide the sketch or 2D view of size m × n into 4 × 4 blocks, each of size a × b, where a = m/4 and b = n/4;
Step 4-4, process each block with 32 Gabor filters of 4 scales and 8 directions and combine the processed features to obtain the gist feature:

G(x, y) = cat(I(x, y) * g_ij(x, y)), i = 1, …, 4, j = 1, …, 8

where G(x, y) is the gist feature over the 32 Gabor filters, cat() denotes the concatenation operation, x and y are pixel positions, I(x, y) denotes a block, g_ij(x, y) is the filter at the i-th scale and j-th direction, and * denotes convolution;
Step 4-5, randomly sample equidistant points on the boundary of the sketch or 2D view; the point set is points = {(x_1, y_1), …, (x_i, y_i), …, (x_n, y_n)}, where (x_i, y_i) are the coordinates of a point;
Step 4-6, use the D1 descriptor to represent the distance between the centroid and a random sample point on the boundary; points are drawn into PD1 = {ai_1, …, ai_k, …, ai_N}; the D1 shape distribution feature set is {D1_v_1, …, D1_v_i, …, D1_v_Bins}, where D1_v_i is the statistic of the interval (BinSize·(i−1), BinSize·i), Bins is the number of intervals, and BinSize is the interval length:

D1_v_i = |{P | dist(P, O) ∈ (BinSize·(i−1), BinSize·i), P ∈ PD1}|;

where BinSize = max({dist(P, O) | P ∈ PD1})/N, dist() is the Euclidean distance between two points, and O is the centroid of the sketch or 2D view;
Step 4-7, use the D2 descriptor to describe the distance between two random sample points on the boundary; point pairs are drawn into PD2 = {(ai_1, bi_1), (ai_2, bi_2), …, (ai_N, bi_N)}; the D2 shape distribution feature set is {D2_v_1, …, D2_v_i, …, D2_v_Bins}, where D2_v_i is the statistic of the interval (BinSize·(i−1), BinSize·i):

D2_v_i = |{P | dist(P) ∈ (BinSize·(i−1), BinSize·i), P ∈ PD2}|;

where BinSize = max({dist(P) | P ∈ PD2})/N;
Step 4-8, use the D3 descriptor to describe the square root of the area formed by three random sample points on the boundary; point triplets are drawn into PD3 = {(ai_1, bi_1, ci_1), (ai_2, bi_2, ci_2), …, (ai_n, bi_n, ci_n)}; the D3 shape distribution feature set is {D3_v_1, …, D3_v_i, …, D3_v_Bins}, where D3_v_i is the statistic of the interval (BinSize·(i−1), BinSize·i):

D3_v_i = |{P | herson(P) ∈ (BinSize·(i−1), BinSize·i), P ∈ PD3}|;

where herson() denotes Heron's formula, used to compute the square root of the area of the triangle P = (P_1, P_2, P_3):

herson(P) = (p(p−a)(p−b)(p−c))^(1/4), p = (a + b + c)/2

with a = dist(P_1, P_2), b = dist(P_1, P_3), c = dist(P_2, P_3);
Step 4-9, concatenate D1_v_i, D2_v_i, and D3_v_i, i = 1, 2, …, Bins, to form the shape distribution feature.
5. The method for retrieving a three-dimensional model based on an interactive attention convolutional neural network as claimed in claim 1, wherein in step 5 the features are fused and the model most similar to the hand-drawn sketch is retrieved according to a similarity measurement formula, specifically:
Step 5-1, select the Euclidean distance as the similarity measurement method;
Step 5-2, extract feature vectors from the two-dimensional views and the sketch with the improved interactive attention convolutional neural network and normalize them; compute the similarity with the Euclidean distance, denoted distance1, and the retrieval accuracy, denoted t1;
Step 5-3, extract feature vectors of the sketch and the model views with the gist feature and normalize them; compute the similarity with the Euclidean distance, denoted distance2, and the retrieval accuracy, denoted t2;
Step 5-4, extract feature vectors between the sketch and the model views with the two-dimensional shape distribution feature and normalize them; compute the similarity with the Euclidean distance, denoted distance3, and the retrieval accuracy, denoted t3;
Step 5-5, compare the accuracies of the three features and fuse them with weights into a new feature similarity Sim(distance):

Sim(distance) = w_1·distance1 + w_2·distance2 + w_3·distance3, w_1 + w_2 + w_3 = 1;

where w_1 = t_1/(t_1 + t_2 + t_3), w_2 = t_2/(t_1 + t_2 + t_3), w_3 = t_3/(t_1 + t_2 + t_3);
Step 5-6, sort by similarity from small to large to produce the retrieval result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110270518.7A CN113032613B (en) | 2021-03-12 | 2021-03-12 | Three-dimensional model retrieval method based on interactive attention convolution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113032613A CN113032613A (en) | 2021-06-25 |
CN113032613B true CN113032613B (en) | 2022-11-08 |
Family
ID=76470237
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110270518.7A Active CN113032613B (en) | 2021-03-12 | 2021-03-12 | Three-dimensional model retrieval method based on interactive attention convolution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113032613B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113658176B (en) * | 2021-09-07 | 2023-11-07 | 重庆科技学院 | Ceramic tile surface defect detection method based on interaction attention and convolutional neural network |
CN114373077B (en) * | 2021-12-07 | 2024-10-29 | 燕山大学 | Sketch recognition method based on double-hierarchy structure |
CN114492593B (en) * | 2021-12-30 | 2024-08-16 | 哈尔滨理工大学 | Three-dimensional model classification method based on EFFICIENTNET and convolutional neural network |
CN114842287B (en) * | 2022-03-25 | 2022-12-06 | 中国科学院自动化研究所 | Monocular three-dimensional target detection model training method and device of depth-guided deformer |
CN117952966B (en) * | 2024-03-26 | 2024-10-22 | 华南理工大学 | Sinkhorn algorithm-based multi-mode fusion survival prediction method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101004748A (en) * | 2006-10-27 | 2007-07-25 | 北京航空航天大学 | Method for searching 3D model based on 2D sketch |
CN101089846A (en) * | 2006-06-16 | 2007-12-19 | 国际商业机器公司 | Data analysis method, equipment and data analysis auxiliary method |
CN101110826A (en) * | 2007-08-22 | 2008-01-23 | 张建中 | Method, device and system for constructing multi-dimensional address |
CN107122396A (en) * | 2017-03-13 | 2017-09-01 | 西北大学 | Three-dimensional model searching algorithm based on depth convolutional neural networks |
CN110569386A (en) * | 2019-09-16 | 2019-12-13 | 哈尔滨理工大学 | Three-dimensional model retrieval method based on hand-drawn sketch integrated descriptor |
CN111597367A (en) * | 2020-05-18 | 2020-08-28 | 河北工业大学 | Three-dimensional model retrieval method based on view and Hash algorithm |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101350016B (en) * | 2007-07-20 | 2010-11-24 | 富士通株式会社 | Device and method for searching three-dimensional model |
CN103295025B (en) * | 2013-05-03 | 2016-06-15 | 南京大学 | A kind of automatic selecting method of three-dimensional model optimal view |
CN105243137B (en) * | 2015-09-30 | 2018-12-11 | 华南理工大学 | A kind of three-dimensional model search viewpoint selection method based on sketch |
JP6798183B2 (en) * | 2016-08-04 | 2020-12-09 | 株式会社リコー | Image analyzer, image analysis method and program |
CN109783887A (en) * | 2018-12-25 | 2019-05-21 | 西安交通大学 | A kind of intelligent recognition and search method towards Three-dimension process feature |
CN110033023B (en) * | 2019-03-11 | 2021-06-15 | 北京光年无限科技有限公司 | Image data processing method and system based on picture book recognition |
CN111078913A (en) * | 2019-12-16 | 2020-04-28 | 天津运泰科技有限公司 | Three-dimensional model retrieval method based on multi-view convolution neural network |
CN111242207A (en) * | 2020-01-08 | 2020-06-05 | 天津大学 | Three-dimensional model classification and retrieval method based on visual saliency information sharing |
CN111625667A (en) * | 2020-05-18 | 2020-09-04 | 北京工商大学 | Three-dimensional model cross-domain retrieval method and system based on complex background image |
- 2021-03-12: Application CN202110270518.7A filed in CN; granted as patent CN113032613B (active)
Also Published As
Publication number | Publication date |
---|---|
CN113032613A (en) | 2021-06-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||