CN118135255A - Training method of image matching model, image matching method and computer equipment
- Publication number: CN118135255A
- Application number: CN202211542850.5A
- Authority: CN (China)
- Prior art keywords: feature, image, features, image matching, matching model
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The application discloses a training method of an image matching model, an image matching method, and computer equipment, belonging to the technical field of artificial intelligence. The method comprises the following steps: acquiring a plurality of first features corresponding to a first image and a plurality of second features corresponding to a second image output by an initial image matching model, together with a plurality of third features corresponding to a third image; determining a plurality of positive feature pairs, a plurality of first negative feature pairs, and a plurality of second negative feature pairs; and updating the initial image matching model according to the positive feature pairs, the first negative feature pairs, and the second negative feature pairs to obtain a target image matching model for determining whether a plurality of images match. By acquiring the first, second, and third features and determining the positive, first negative, and second negative feature pairs, the number of negative feature pairs is increased, so that features from more diverse images can be incorporated into the negative feature pairs and hence into the training of the image matching model. This improves the training effect and, in turn, the image matching capability of the image matching model.
Description
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a training method of an image matching model, an image matching method and computer equipment.
Background
With the development of artificial intelligence technology, more and more artificial intelligence models are applied in daily life, and the image matching model is one of them. An image matching model is used to extract features from a plurality of images, and the features are used to determine whether the plurality of images match. How to train such an image matching model has therefore become an urgent problem to solve.
Disclosure of Invention
The embodiment of the application provides a training method of an image matching model, an image matching method and computer equipment, which are used for training to obtain the image matching model. The technical scheme provided by the embodiment of the application comprises the following aspects.
In a first aspect, an embodiment of the present application provides a training method for an image matching model, where the method includes:
acquiring a first image and a second image that match each other, inputting the first image and the second image into an initial image matching model, and obtaining a plurality of first features corresponding to the first image and a plurality of second features corresponding to the second image output by the initial image matching model;
determining a plurality of positive feature pairs and a plurality of first negative feature pairs according to the plurality of first features and the plurality of second features, wherein any one positive feature pair comprises a first feature and a second feature which are matched, and any one first negative feature pair comprises a first feature and a second feature which are not matched;
acquiring a third feature, and determining a plurality of second negative feature pairs according to the third feature and a target feature, wherein the third feature is a feature corresponding to a third image other than the first image and the second image, and the target feature is at least one of the plurality of first features and the plurality of second features;
updating an initial image matching model according to the positive feature pairs, the first negative feature pairs and the second negative feature pairs to obtain a target image matching model, wherein the target image matching model is used for outputting reference features according to the input images, and the reference features are used for determining whether the images are matched.
In one possible implementation manner, acquiring the third feature includes: sampling the third image to obtain a plurality of alternative features corresponding to the third image; determining a feature queue according to a plurality of alternative features, wherein the arrangement order of the plurality of alternative features in the feature queue is determined according to the sampling order of the plurality of alternative features; and acquiring a third feature from the feature queue according to the arrangement order.
In one possible implementation, updating the initial image matching model according to the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs, to obtain the target image matching model includes: obtaining a reference value; determining a loss function from the reference value, the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs; and updating the initial image matching model according to the loss function to obtain a target image matching model.
In one possible implementation, any one first feature corresponds to one positive feature pair and at least one first negative feature pair, any one second feature corresponds to one positive feature pair and at least one first negative feature pair, and determining the loss function based on the reference value, the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs comprises: computing the dot product of any one first feature and the second feature included in the positive feature pair corresponding to that first feature to obtain a first product, computing the dot product of that first feature and the second feature included in each of the at least one first negative feature pair corresponding to that first feature to obtain at least one second product, and determining a first loss according to the first product, the at least one second product, and the reference value; computing the dot product of any one second feature and the first feature included in the positive feature pair corresponding to that second feature to obtain a third product, computing the dot product of that second feature and the first feature included in each of the at least one first negative feature pair corresponding to that second feature to obtain at least one fourth product, and determining a second loss according to the third product, the at least one fourth product, and the reference value; computing the dot product of the target feature and the third feature included in each second negative feature pair to obtain a plurality of fifth products, determining a third loss according to the plurality of fifth products, the first products, and the reference value when the target feature includes any one first feature, and determining a fourth loss according to the plurality of fifth products, the third products, and the reference value when the target feature includes any one second feature; and carrying out a weighted summation of at least one of the third loss and the fourth loss with the first loss and the second loss to obtain the loss function.
In one possible implementation, determining the first loss from the first product, the at least one second product, and the reference value includes: subtracting the first product from each second product to obtain at least one first difference value, summing each first difference value with the reference value to obtain at least one first value, and determining the first loss according to the at least one first value. Determining the second loss from the third product, the at least one fourth product, and the reference value includes: subtracting the third product from each fourth product to obtain at least one second difference value, summing each second difference value with the reference value to obtain at least one second value, and determining the second loss according to the at least one second value. Determining the third loss from the plurality of fifth products, the first products, and the reference value includes: subtracting the first products from the plurality of fifth products to obtain a plurality of third difference values, summing each third difference value with the reference value to obtain a plurality of third values, and determining the third loss according to the plurality of third values. Determining the fourth loss from the plurality of fifth products, the third products, and the reference value includes: subtracting the third products from the plurality of fifth products to obtain a plurality of fourth difference values, summing each fourth difference value with the reference value to obtain a plurality of fourth values, and determining the fourth loss according to the plurality of fourth values.
In one possible implementation, acquiring the matched first and second images includes: acquiring a basic image, wherein the basic image is a first image or an image matched with the first image; and processing the basic image to obtain a second image matched with the first image, wherein the processing comprises at least one of random angle transformation and random scale transformation.
In one possible implementation, determining a plurality of positive feature pairs and a plurality of first negative feature pairs from the plurality of first features and the plurality of second features includes: determining a relative order of the respective first features among the plurality of first features; determining a relative order of each second feature in the plurality of second features; determining the first feature and the second feature which are in the same relative sequence as positive feature pairs; the first feature and the second feature, which are different in relative order, are determined as a first negative feature pair.
In a second aspect, an embodiment of the present application provides a method for matching images, including:
acquiring a fourth image and a fifth image, inputting the fourth image and the fifth image into a target image matching model, and obtaining a plurality of fourth features and a plurality of fifth features output by the target image matching model, wherein the target image matching model is trained according to the training method of the image matching model in any one of the first aspect;
determining, according to the plurality of fourth features and the plurality of fifth features, whether the fourth image and the fifth image match.
In a third aspect, an embodiment of the present application provides a training apparatus for an image matching model, including:
The acquisition module is used for acquiring a first image and a second image which are matched, inputting the first image and the second image into the initial image matching model, and obtaining a plurality of first features corresponding to the first image and a plurality of second features corresponding to the second image output by the initial image matching model;
The determining module is used for determining a plurality of positive feature pairs and a plurality of first negative feature pairs according to the plurality of first features and the plurality of second features, wherein any positive feature pair comprises a matched first feature and a matched second feature, and any first negative feature pair comprises a non-matched first feature and a non-matched second feature;
the acquisition module is further used for acquiring a third feature, determining a plurality of second negative feature pairs according to the third feature and a target feature, wherein the third feature is a feature corresponding to a third image except the first image and the second image, and the target feature is at least one of the plurality of first features and the plurality of second features;
the updating module is used for updating the initial image matching model according to the positive feature pairs, the first negative feature pairs and the second negative feature pairs to obtain a target image matching model, wherein the target image matching model is used for outputting reference features according to the input images, and the reference features are used for determining whether the images are matched.
In one possible implementation manner, the obtaining module is configured to sample the third image to obtain a plurality of alternative features corresponding to the third image; determining a feature queue according to a plurality of alternative features, wherein the arrangement order of the plurality of alternative features in the feature queue is determined according to the sampling order of the plurality of alternative features; and acquiring a third feature from the feature queue according to the arrangement order.
In one possible implementation, the updating module is configured to obtain the reference value; determining a loss function from the reference value, the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs; and updating the initial image matching model according to the loss function to obtain a target image matching model.
In one possible implementation manner, any one first feature corresponds to one positive feature pair and at least one first negative feature pair, and any one second feature corresponds to one positive feature pair and at least one first negative feature pair. The updating module is configured to compute the dot product of any one first feature and the second feature included in the positive feature pair corresponding to that first feature to obtain a first product, compute the dot product of that first feature and the second feature included in each of the at least one first negative feature pair corresponding to that first feature to obtain at least one second product, and determine a first loss according to the first product, the at least one second product, and the reference value; compute the dot product of any one second feature and the first feature included in the positive feature pair corresponding to that second feature to obtain a third product, compute the dot product of that second feature and the first feature included in each of the at least one first negative feature pair corresponding to that second feature to obtain at least one fourth product, and determine a second loss according to the third product, the at least one fourth product, and the reference value; compute the dot product of the target feature and the third feature included in each second negative feature pair to obtain a plurality of fifth products, determine a third loss according to the plurality of fifth products, the first products, and the reference value when the target feature includes any one first feature, and determine a fourth loss according to the plurality of fifth products, the third products, and the reference value when the target feature includes any one second feature; and carry out a weighted summation of at least one of the third loss and the fourth loss with the first loss and the second loss to obtain the loss function.
In one possible implementation, the updating module is configured to subtract the first product from each second product to obtain at least one first difference value, sum each first difference value with the reference value to obtain at least one first value, and determine the first loss according to the at least one first value; the updating module is further configured to subtract the third product from each fourth product to obtain at least one second difference value, sum each second difference value with the reference value to obtain at least one second value, and determine the second loss according to the at least one second value; the updating module is further configured to subtract the first products from the plurality of fifth products to obtain a plurality of third difference values, sum each third difference value with the reference value to obtain a plurality of third values, and determine the third loss according to the plurality of third values; and the updating module is further configured to subtract the third products from the plurality of fifth products to obtain a plurality of fourth difference values, sum each fourth difference value with the reference value to obtain a plurality of fourth values, and determine the fourth loss according to the plurality of fourth values.
In one possible implementation manner, the acquiring module is configured to acquire a base image, where the base image is a first image or an image matched with the first image; and processing the basic image to obtain a second image matched with the first image, wherein the processing comprises at least one of random angle transformation and random scale transformation.
In one possible implementation, the determining module is configured to determine a relative order of the first features in the plurality of first features; determining a relative order of each second feature in the plurality of second features; determining the first feature and the second feature which are in the same relative sequence as positive feature pairs; the first feature and the second feature, which are different in relative order, are determined as a first negative feature pair.
In a fourth aspect, an embodiment of the present application provides an apparatus for image matching, including:
The acquisition module is used for acquiring a fourth image and a fifth image, inputting the fourth image and the fifth image into a target image matching model, and obtaining a plurality of fourth features and a plurality of fifth features output by the target image matching model, wherein the target image matching model is trained according to the training method of the image matching model provided in the first aspect or any one possible implementation manner of the first aspect;
And the determining module is used for determining whether the fourth image and the fifth image are matched according to the fourth characteristics and the fifth characteristics.
In a fifth aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, and at least one computer program is stored in the memory, where the at least one computer program is loaded and executed by the processor, so that the computer device implements the training method of the image matching model provided in the first aspect or any one of possible implementation manners of the first aspect, or the method of image matching provided in the second aspect.
In a sixth aspect, an embodiment of the present application further provides a computer readable storage medium, where at least one computer program is stored, where the at least one computer program is loaded and executed by a processor, to enable a computer to implement the training method of the image matching model provided in the first aspect or any one of possible implementation manners of the first aspect, or the image matching method provided in the second aspect.
In a seventh aspect, embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads computer instructions from a computer readable storage medium, the processor executing the computer instructions, causing the computer device to perform the training method of the image matching model provided by the first aspect or any one of the possible implementation manners of the first aspect, or the method of image matching provided by the second aspect.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
According to the application, the first features, the second features, and the third features are obtained, and the positive feature pairs, the first negative feature pairs, and the second negative feature pairs are determined, which increases the number of negative feature pairs, so that features from more diverse images can be incorporated into the negative feature pairs and hence into the training of the image matching model, improving the training effect. In addition, the image matching capability of the trained image matching model is better.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of a training method for an image matching model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a training method of an image matching model according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for image matching according to an embodiment of the present application;
FIG. 5 is a comparative schematic diagram of training results provided in an embodiment of the present application;
FIG. 6 is a comparative schematic diagram of another training result provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a training device for an image matching model according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an apparatus for image matching according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a server according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Image matching identifies and aligns, at the feature level, content or structure in multiple images or frames that share the same or similar features. Through image matching, applications such as visual navigation, image stitching, three-dimensional reconstruction, visual positioning, scene depth calculation, and target detection and tracking can be realized. Intelligent image matching often depends on an image matching model, and thus a training method for the image matching model needs to be provided.
The embodiment of the application provides a training method of an image matching model and an image matching method, which can be applied to an implementation environment shown in fig. 1. As shown in FIG. 1, the implementation environment may include a computer device 11. The computer device 11 may be provided with an application program capable of training the image matching model, and when the application program needs to train the image matching model, the method provided by the embodiment of the application can be used for training. The computer device 11 may also be provided with an application program capable of performing image matching, and when the application program needs to perform image matching, the method provided by the embodiment of the present application may be applied to perform matching.
For example, the computer device 11 may establish a communication connection with other devices and acquire, over that connection, the image matching model to be trained and the images to be matched; in this case, the training method of the image matching model and the image matching method provided by the embodiment of the present application are implemented through interaction between the computer device 11 and the other devices. Alternatively, the computer device 11 may receive a manually uploaded image matching model to be trained and images to be matched; in this case, the training method of the image matching model and the image matching method provided by the embodiment of the present application are completed by the computer device 11 alone.
Alternatively, the computer device 11 comprises a server or a terminal. The terminal includes, but is not limited to, intelligent devices such as unmanned aerial vehicles, detectors, cell phones, tablet computers, and personal computers. The server can be a single server, a server cluster formed by a plurality of servers, or a cloud computing service center.
It will be appreciated by those skilled in the art that the above terminals and servers are only examples, and that other terminals or servers that may be present in the present application or in the future are applicable to the present application and are also included within the scope of the present application and are incorporated herein by reference.
The embodiment of the application provides a training method of an image matching model, which can be applied to computer equipment 11 shown in fig. 1. As shown in fig. 2, the method includes the following steps 201 to 204.
Step 201, acquiring a first image and a second image which are matched, inputting the first image and the second image into an initial image matching model, and obtaining a plurality of first features corresponding to the first image and a plurality of second features corresponding to the second image output by the initial image matching model.
The method provided by the application aims to train an initial image matching model, that is, an image matching model that needs to be trained. The embodiment of the application does not limit the type of the initial image matching model; illustratively, the initial image matching model is a convolutional neural network model. In an exemplary training process, a plurality of images to be input into the initial image matching model are first acquired and input into the initial image matching model, and a plurality of features corresponding to each image output by the initial image matching model are then obtained, the features being used to determine whether the images match. For example, after the plurality of images are input into the initial image matching model, the initial image matching model may output a feature map corresponding to each image, and the plurality of features corresponding to each image are respectively included in the feature map corresponding to that image.
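For illustration only, the following is a minimal PyTorch sketch of this feature-extraction step. The patent does not fix a network architecture, so the small convolutional encoder, the feature dimension, and the image size below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMatchingEncoder(nn.Module):
    """Hypothetical encoder: maps an image to a dense feature map, and each
    spatial location of the map is read out as one local feature."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        fmap = self.backbone(image)              # feature map, shape (B, C, H, W)
        feats = fmap.flatten(2).transpose(1, 2)  # (B, N, C): N = H*W local features
        return F.normalize(feats, dim=-1)        # unit-norm feature vectors

# Twin-network usage: the matched first and second images pass through the
# same model, yielding N first features and N second features.
model = ToyMatchingEncoder()
first_features = model(torch.randn(1, 3, 64, 64))
second_features = model(torch.randn(1, 3, 64, 64))
```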
Illustratively, the plurality of images are matching images; for instance, the images may match each other in pairs. Matching images are images that record the same or a similar scene, and the scene may include one or more objects. Here, "the same" may mean that the objects are substantially identical and the characteristic information of the objects is the same, while "similar" may mean that the objects are substantially identical but the characteristic information of the objects differs; the characteristic information of an object includes, but is not limited to, at least one of angle, scale, sharpness, darkness, color, and brightness.
The matching image will be described by taking the example that the feature information includes angles and scales. If two images record the same building and the angles and dimensions of the recorded buildings are the same, the two images record the same scene and belong to matched images. If two images record the same building but at least one of the angle and the size of the recorded building is different, then the two images record similar scenes, also belonging to the matching images. If the two images record different buildings, the two images are neither identical nor similar, belonging to mismatched images.
The plurality of images required to be input into the initial image matching model comprise a first image and a second image, and the first image and the second image are matched with each other.
The method for acquiring the first image is not limited in the embodiment of the application. In some embodiments, the first image may be manually specified from an image library, the image library may include a plurality of images, and a matching relationship of the plurality of images in the image library is determined. For example, the image library stores an image 1, an image 2, an image 3 and an image 4, where the image 1 is matched with the image 2, and the image 3 is matched with the image 4, and any image in the image library may be designated as the first image. In other embodiments, the first image may also be obtained by capturing either scene by a computer device.
In some embodiments, the method of acquiring the second image may be the same as the method of acquiring the first image. For example, the second image is manually specified from the image library. For example, in the case where the image 1 in the image library is manually specified as the first image, the image 2 matching the image 1 may be specified as the second image. For another example, the second image may be obtained by photographing any scene by the computer device.
In other embodiments, acquiring a second image that matches the first image includes: acquiring a basic image, wherein the basic image is a first image or an image matched with the first image; and processing the basic image to obtain a second image matched with the first image, wherein the processing comprises at least one of random angle transformation and random scale transformation. The random angle transformation may be a random rotation of the base image, the random scale transformation may be a random scaling of the base image, and the process of processing the base image may also be referred to as a data transformation process.
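As a hedged illustration of this processing, the sketch below composes a random angle transformation and a random scale transformation with torchvision; the parameter ranges are assumptions, not values taken from the patent.

```python
import torch
from torchvision import transforms

# Random angle transformation (rotation) combined with a random scale
# transformation; the degree and scale ranges here are illustrative.
random_angle_scale = transforms.RandomAffine(degrees=180, scale=(0.5, 2.0))

base_image = torch.rand(3, 64, 64)             # stand-in for the base image
second_image = random_angle_scale(base_image)  # second image matched with the first
```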
In such an embodiment, the second image is an image processed by at least one of a random angle transformation and a random scale transformation. Because the steering and lifting of the computer device under complex conditions can bring rich and even severe angle and/or scale changes, inputting such a second image into the initial image matching model helps avoid poor performance of the target image matching model, obtained by updating the initial image matching model, in scenes with large angle and/or scale changes.
After the first image and the second image are acquired, the first image and the second image are input into an initial image matching model. In some embodiments, the initial image matching model may directly output a plurality of first features corresponding to the first image and a plurality of second features corresponding to the second image. The first feature may also be referred to as a first descriptor, a first feature descriptor, or a first sample, where the first feature is used to describe a local feature of the first image. The second feature may also be referred to as a second descriptor, a second feature descriptor, or a second sample, the second feature being used to describe a local feature of the second image.
Or in other embodiments, the initial image matching model outputs a first feature map based on the input first image, the first feature map comprising a plurality of first features of the first image. Accordingly, the initial image matching model outputs a second feature map based on the input second image. The second feature map includes a plurality of second features of the second image. Illustratively, the second feature map includes the same number of second features as the first feature map includes the first features.
Referring to fig. 3, a schematic diagram of a training method of an image matching model is shown, which is implemented based on a twin network structure, i.e. a first image and a second image are input into the same initial image matching model.
It should be understood that the sequence of acquiring the second image and inputting the first image into the initial image matching model does not affect the implementation effect of the training method of the image matching model provided by the present application, which is not limited in this application. The process of acquiring the first image and the second image and inputting the first image and the second image into the initial image matching model is a data processing stage of the training process.
Step 202, determining a plurality of positive feature pairs and a plurality of first negative feature pairs according to the plurality of first features and the plurality of second features, wherein any one positive feature pair comprises a matched first feature and second feature, and any one first negative feature pair comprises a non-matched first feature and second feature.
For a given first feature, the second feature included in its positive feature pair is different from the second feature included in any of its first negative feature pairs. Correspondingly, for a given second feature, the first feature included in its positive feature pair is different from the first feature included in any of its first negative feature pairs.
In one possible implementation, determining a plurality of positive feature pairs and a plurality of first negative feature pairs from the plurality of first features and the plurality of second features includes: determining a relative order of the respective first features among the plurality of first features; determining a relative order of each second feature in the plurality of second features; determining the first feature and the second feature which are in the same relative sequence as positive feature pairs; the first feature and the second feature, which are different in relative order, are determined as a first negative feature pair.
Wherein the initial image matching model may extract and output the plurality of first features and the plurality of second features in the same order. For example, a feature extraction window may be set, and the initial image matching model slides the feature extraction window on the first image and the second image according to the same sliding sequence, and sequentially extracts features in the feature extraction window, so as to obtain a plurality of first features and a plurality of second features with the same sequence. Thereafter, a plurality of first features and a plurality of second features are output in the same order.
As an example, according to the above-mentioned plurality of features corresponding to the respective images may be respectively included in the feature maps corresponding to the respective images, it is known that the plurality of first features corresponding to the first image are included in the first feature map corresponding to the first image, and the plurality of second features corresponding to the second image are included in the second feature map corresponding to the second image. The relative order of the individual first features in the plurality of first features may be manifested by the relative position of the individual first features in the first feature map. Accordingly, the relative order of each second feature in the plurality of second features may be represented by the relative position of each second feature in the second feature map.
For example, referring to fig. 3, a first feature point set may be generated from the acquired first image. The first feature point set includes at least one first feature point, and the first features included in the first feature map correspond to the first feature points one by one. A first feature point can indicate the relative position, in the first feature map, of the first feature corresponding to that feature point; in this way, the relative position of the first feature in the first feature map can be determined. The relative position of a first feature in the first feature map may be represented by a vector or by coordinates; the present application does not limit the manner of representing this relative position.
Accordingly, referring still to fig. 3, when determining the relative position of the second feature in the second feature map, a second feature point set including at least one second feature point may be acquired, where one second feature point indicates the relative position of a second feature corresponding to the second feature point in the second feature map, which is not described herein again. Alternatively, if the second image is an image obtained by processing the base image, a set of base feature points including at least one base feature point may be obtained according to the base image, and then the relative position of the second feature in the second feature map may be determined based on the base feature points.
A first feature may be considered to match a second feature if the relative position of the first feature in the first feature map is the same as the relative position of the second feature in the second feature map. The first feature and the second feature may be determined as a positive feature pair, i.e. a positive sample pair. In this case, it is considered that the first feature does not match the other second feature, and the relative position of the other second feature in the second feature map is different from the relative position of the first feature in the first feature map. The first feature and the respective other second feature may then be determined as a first negative feature pair, i.e. a first negative sample pair, respectively.
For example, the first feature point set corresponding to the first image includes N first feature points, and the first feature map corresponding to the first image output by the initial image matching model likewise includes N first features $f_{1,x}$, where x may be any integer from 1 to N. The N first features $f_{1,x}$ may also constitute a first feature set $F_1 = \{f_{1,1}, f_{1,2}, \dots, f_{1,N}\}$.

If the relative position of the i-th first feature $f_{1,i}$ in the first feature map is the same as the relative position of the i-th second feature $f_{2,i}$ in the second feature map, the positive feature pair in which $f_{1,i}$ is located may be determined as $(f_{1,i}, f_{2,i})$.

If the relative position of the i-th first feature $f_{1,i}$ in the first feature map is different from the relative position of the k-th second feature $f_{2,k}$ ($k \ne i$) in the second feature map, the first negative feature pair in which $f_{1,i}$ is located may be determined as $(f_{1,i}, f_{2,k})$.

In addition, the second feature map corresponding to the second image may also include N second features $f_{2,x}$, where x may be any integer from 1 to N, giving a second feature set $F_2 = \{f_{2,1}, f_{2,2}, \dots, f_{2,N}\}$. Correspondingly, the positive feature pair $(f_{2,i}, f_{1,i})$ in which the i-th second feature is located, and the first negative feature pairs $(f_{2,i}, f_{1,k})$ in which the i-th second feature is located, may be formed.
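The pairing rule above can be sketched as follows, assuming unit-normalized features: since the i-th first feature matches the i-th second feature, the dot products of positive feature pairs lie on the diagonal of the similarity matrix, and the dot products of first negative feature pairs lie off the diagonal. Tensor names and sizes are illustrative.

```python
import torch
import torch.nn.functional as F

N, C = 8, 128                                # illustrative feature count and dimension
F1 = F.normalize(torch.randn(N, C), dim=-1)  # N first features f_{1,i}
F2 = F.normalize(torch.randn(N, C), dim=-1)  # N second features f_{2,i}

sim = F1 @ F2.T                              # sim[i, k] = f_{1,i} . f_{2,k}
positive_products = sim.diagonal()           # positive pairs (f_{1,i}, f_{2,i}): same relative order
off_diagonal = ~torch.eye(N, dtype=torch.bool)
first_negative_products = sim[off_diagonal]  # first negative pairs (f_{1,i}, f_{2,k}), k != i
```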
Step 203, acquiring a third feature, and determining a plurality of second negative feature pairs according to the third feature and a target feature, wherein the third feature is a feature corresponding to a third image except the first image and the second image, and the target feature is at least one of the plurality of first features and the plurality of second features.
A second negative feature pair includes a third feature and a target feature. The number of third features is at least one, the number of third images is also at least one, and the third image may be an image that matches neither the first image nor the second image. The third image may be specified manually from an image library, or may be captured by the computer device, for example. Alternatively, referring to fig. 3, the third image may be obtained by random angle transformation and/or random scale transformation of an image other than the first image and the second image.
In the embodiment of the present application, the method for acquiring the third feature is not limited. Illustratively, as shown in fig. 3, acquiring the third feature includes: sampling the third image to obtain a plurality of alternative features corresponding to the third image; determining a feature queue according to a plurality of alternative features, wherein the arrangement order of the plurality of alternative features in the feature queue is determined according to the sampling order of the plurality of alternative features; and acquiring a third feature from the feature queue according to the arrangement order.
The fixed length of the feature queue may be M, that is, M candidate features may be stored in the feature queue. The length M of the feature queue may be the same or different across training rounds.
For example, when a plurality of candidate features are sampled from the third image, the third image may be input into the initial image matching model or another image matching model to obtain an output third feature map, where the third feature map includes the candidate features that may be added to the feature queue. In addition, each candidate feature included in the third feature map may correspond to a third feature point, and the relative position of a candidate feature in the third feature map is indicated by the third feature point corresponding to that candidate feature.
After the plurality of candidate features are sampled, the acquired at least one candidate feature may be arranged in the feature queue according to the sampling order, so that the arrangement order of the candidate features in the feature queue is equal to the sampling order of the candidate features.
Alternatively, the plurality of candidate features may be screened to obtain screened candidate features, the number of which is smaller than or equal to the number of the plurality of candidate features. The screened candidate features are arranged in the feature queue, and the arrangement order between any two adjacent screened candidate features stored in the feature queue is the same as the order in which those two candidate features were obtained by sampling.

For example, 5 candidate features are sampled: candidate feature 1, candidate feature 2, candidate feature 3, candidate feature 4, and candidate feature 5. After screening, candidate feature 1, candidate feature 3, and candidate feature 5 are considered suitable for storage in the feature queue; candidate feature 1 is determined to be screened candidate feature A, candidate feature 3 is determined to be screened candidate feature B, and candidate feature 5 is determined to be screened candidate feature C. The screened candidate features are arranged in the feature queue as screened candidate feature A - screened candidate feature B - screened candidate feature C. For any two adjacent screened candidate features, for example screened candidate feature A and screened candidate feature B, their arrangement order is the same as the order in which they were obtained by sampling.
After the feature queue is determined, the third feature is acquired according to the arrangement order of the candidate features in the feature queue. The third feature may also be called a negative sample; by mining or creating more negative samples, the scale of the negative samples is increased, so that the learning and training of the initial image matching model can go deeper and the training effect is better.

For example, if the established feature queue includes 10 candidate features, the first 2 candidate features are acquired from the 10 candidate features in sequence, according to the arrangement order, as third features.
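A minimal sketch of such a feature queue follows, assuming a first-in-first-out structure of fixed length M; the queue length, feature dimension, and method names are illustrative rather than taken from the patent.

```python
import collections
import torch

class FeatureQueue:
    """Fixed-length FIFO queue of candidate features: the arrangement order
    follows the sampling order, and the oldest entries are discarded first."""
    def __init__(self, max_len: int = 1024):  # fixed length M (assumed value)
        self.queue = collections.deque(maxlen=max_len)

    def enqueue(self, candidates: torch.Tensor) -> None:
        # candidates: (K, C) screened candidate features, in sampling order
        for feature in candidates:
            self.queue.append(feature)

    def third_features(self, count: int) -> torch.Tensor:
        # take the first `count` entries according to the arrangement order
        return torch.stack(list(self.queue)[:count])

queue = FeatureQueue()
queue.enqueue(torch.randn(16, 128))  # candidate features sampled from a third image
thirds = queue.third_features(2)     # e.g. the first 2 features, as in the example above
# Each third feature q_k then forms a second negative feature pair
# (f_{1,i}, q_k) or (f_{2,i}, q_k) with a target feature.
```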
Then, a second negative feature pair, that is, a second negative sample pair, may be determined from the acquired third feature and the target feature. Acquiring the third feature is a key step of the embodiment of the present application: during training, attention is paid not only to the first negative feature pairs between the matched first image and second image, but also to the second negative feature pairs between the first image and/or the second image and the third image, which increases the number of negative feature pairs.
For example, the determined feature queue Q includes M features $q_y$, where y may be any integer from 1 to M (the numbers M and N may be the same or different), and the feature queue may be represented as $Q = \{q_1, q_2, \dots, q_M\}$. Denote a third feature as $q_k$. If the target feature includes the i-th first feature $f_{1,i}$, the second negative feature pair in which $f_{1,i}$ is located may be determined as $(f_{1,i}, q_k)$; if the target feature includes the i-th second feature $f_{2,i}$, the second negative feature pair in which $f_{2,i}$ is located may be determined as $(f_{2,i}, q_k)$.
And step 204, updating an initial image matching model according to the positive feature pairs, the first negative feature pairs and the second negative feature pairs to obtain a target image matching model, wherein the target image matching model is used for outputting reference features according to the input images, and the reference features are used for determining whether the images are matched.
The process of updating the initial image matching model to obtain the target image matching model is the training process of the image matching model. Because the training process adopts positive feature pairs, first negative feature pairs, and second negative feature pairs, and the number of negative feature pairs is large, the target image matching model obtained through training has good performance and can output an accurate feature map for an input image; such a feature map includes features that enable accurate matching of multiple images. Such a training process is also known as a deep learning method, and the positive feature pairs, first negative feature pairs, and second negative feature pairs are also known as valid sample pairs.
In addition, since the second image and the third image may be images obtained through random angle and/or random scale transformation, and generating the first negative feature pairs and the second negative feature pairs depends on at least one of the second features corresponding to the second image and the third features corresponding to the third image, the first negative feature pairs and the second negative feature pairs may themselves be regarded as obtained through random angle and/or random scale transformation. Through the first negative feature pairs and the second negative feature pairs, the lack of rotation and scale invariance in traditional or common convolution algorithms can be overcome, and the ability of the trained target image matching model to extract similar or identical features from multiple images with large angle and/or scale variation can be improved. This improves the matching capability of the target image matching model in scenes with large angle and/or scale variation, and improves the robustness of the features, that is, their angular stability and scale invariance in such scenes.
In one possible implementation, updating the initial image matching model according to the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs, to obtain the target image matching model includes: obtaining a reference value; determining a loss function from the reference value, the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs; and updating the initial image matching model according to the loss function to obtain a target image matching model.
The reference value may also be referred to as a feature interval or classification margin. The reference value increases the difficulty, for the initial image matching model, of distinguishing the positive feature pairs from the first negative feature pairs and/or from the second negative feature pairs. The method of obtaining the reference value is not limited in the present application; for example, the reference value may be set by a technician according to experience or according to the requirements on the loss function.
In addition, the feature product of the positive feature pair can reflect the similarity between the first feature and the second feature included in the positive feature pair, the feature product of the first negative feature pair can reflect the similarity between the first feature and the second feature included in the first negative feature pair, and the feature product of the second negative feature pair can reflect the similarity between the target feature and the third feature included in the second negative feature pair.
In one possible implementation, any one first feature corresponds to one positive feature pair and at least one first negative feature pair, and any one second feature corresponds to one positive feature pair and at least one first negative feature pair. Determining the loss function according to the reference value, the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs then includes the following steps 1 to 4.
Step 1, multiplying any one first feature point by a second feature included in a positive feature pair corresponding to any one first feature to obtain a first product, multiplying any one first feature point by a second feature included in at least one first negative feature pair corresponding to any one first feature to obtain at least one second product, and determining a first loss according to the first product, the at least one second product and a reference value.
In other words, the dot product of a first feature and a second feature that matches it is a first product, and the dot product of a first feature and a second feature that does not match it is a second product.
For example, if the positive feature pair containing the first feature f_{1,i} is (f_{1,i}, f_{2,i}), the first product is f_{1,i} \cdot f_{2,i}, that is, the dot product of the first feature f_{1,i} and the second feature f_{2,i} included in the positive feature pair. If a first negative feature pair containing the first feature is (f_{1,i}, f_{2,k}), then a second product is f_{1,i} \cdot f_{2,k}, that is, the dot product of the first feature f_{1,i} and the second feature f_{2,k} included in the first negative feature pair.
After deriving the first product and the at least one second product, illustratively, determining the first loss according to the first product, the at least one second product, and the reference value includes: subtracting the first product from each second product to obtain at least one first difference, summing each first difference with the reference value to obtain at least one first value, and determining the first loss according to the at least one first value.
Determining the first loss from the positive feature pair (f_{1,i}, f_{2,i}), the first negative feature pairs (f_{1,i}, f_{2,k}), and the reference value may be accomplished with equation (1) below. Here D(f_{1,i}, f_{2,i}) is the first loss, namely the loss of the first image with respect to the second image; the result of f_{1,i} \cdot f_{2,k} - f_{1,i} \cdot f_{2,i} + m is a first value; m is the reference value; and \tau is a temperature coefficient. The temperature coefficient \tau may be set by a technician and influences the shape of the loss curve: the larger \tau is, the smoother the loss curve; the smaller \tau is, the sharper the loss curve, but the better the convergence.
D(f_{1,i}, f_{2,i}) = \log\Big[ 1 + \sum_{0 \le k \le N,\, k \ne i} \exp\big( (f_{1,i} \cdot f_{2,k} - f_{1,i} \cdot f_{2,i} + m)/\tau \big) \Big]    (1)
In equation (1), it is desirable that the feature similarity within a first negative feature pair be smaller than the feature similarity within the positive feature pair, with a gap of at least m between them; the larger the gap, the stronger the matching capability of the initial image matching model. That is, the smaller the value of f_{1,i} \cdot f_{2,k} and the larger the value of f_{1,i} \cdot f_{2,i}, the better. When the first difference f_{1,i} \cdot f_{2,k} - f_{1,i} \cdot f_{2,i} is less than or equal to -m, that is, when the gap between the product of the positive feature pair and the product of the first negative feature pair is greater than or equal to the reference value m, the first value is less than or equal to zero, and the resulting first loss D(f_{1,i}, f_{2,i}) is small.
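For illustration only, the following is a minimal PyTorch-style sketch of equation (1); the function and parameter names, the margin m, and the temperature \tau are assumptions, and the features are assumed to be L2-normalized so that a dot product reflects similarity, with row i of f1 matching row i of f2:

```python
import torch

def first_loss(f1, f2, m=0.2, tau=0.07):
    # f1, f2: (N, d) L2-normalized feature tensors; row i of f1 matches row i of f2,
    # so the diagonal holds the first products and off-diagonal entries the second products.
    sim = f1 @ f2.t()                                   # sim[i, k] = f1_i . f2_k
    pos = sim.diagonal().unsqueeze(1)                   # first products f1_i . f2_i
    logits = (sim - pos + m) / tau                      # first values of equation (1)
    eye = torch.eye(f1.size(0), dtype=torch.bool)
    logits = logits.masked_fill(eye, float('-inf'))     # keep only k != i
    return torch.log1p(logits.exp().sum(dim=1))         # D(f1_i, f2_i) for every i
```

The second loss of equation (3) below is the same computation with the two feature sets swapped, i.e. first_loss(f2, f1, m, tau).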
Equation (2) below is another way of writing equation (1), so the first loss D(f_{1,i}, f_{2,i}) may also be calculated according to equation (2); for example, a mathematically equivalent softmax-style form is

D(f_{1,i}, f_{2,i}) = -\log \frac{\exp\big( (f_{1,i} \cdot f_{2,i} - m)/\tau \big)}{\exp\big( (f_{1,i} \cdot f_{2,i} - m)/\tau \big) + \sum_{0 \le k \le N,\, k \ne i} \exp\big( f_{1,i} \cdot f_{2,k}/\tau \big)}    (2)

The symbols in equation (2) are as described for equation (1) and are not repeated here.
Step 2, compute the dot product of any one second feature and the first feature included in the positive feature pair corresponding to that second feature to obtain a third product, compute the dot products of that second feature and the first features included in the at least one corresponding first negative feature pair to obtain at least one fourth product, and determine a second loss according to the third product, the at least one fourth product, and the reference value.
In other words, the dot product of a second feature and a first feature that matches it is a third product, and the dot product of a second feature and a first feature that does not match it is a fourth product.
For example, if the positive feature pair containing the second feature f_{2,i} is (f_{2,i}, f_{1,i}), the third product is f_{2,i} \cdot f_{1,i}, that is, the dot product of the second feature f_{2,i} and the first feature f_{1,i} included in the positive feature pair. If a first negative feature pair containing the second feature is (f_{2,i}, f_{1,k}), then a fourth product is f_{2,i} \cdot f_{1,k}, that is, the dot product of the second feature f_{2,i} and the first feature f_{1,k} included in the first negative feature pair.
After deriving the third product and the at least one fourth product, illustratively, determining the second loss according to the third product, the at least one fourth product, and the reference value includes: subtracting the third product from each fourth product to obtain at least one second difference, summing each second difference with the reference value to obtain at least one second value, and determining the second loss according to the at least one second value.
The second loss D(f_{2,i}, f_{1,i}) may be obtained from the positive feature pair (f_{2,i}, f_{1,i}), the first negative feature pairs (f_{2,i}, f_{1,k}), and the reference value m with equation (3) below, where the second loss D(f_{2,i}, f_{1,i}) is the loss of the second image with respect to the first image, and the result of f_{2,i} \cdot f_{1,k} - f_{2,i} \cdot f_{1,i} + m is a second value.
D(f_{2,i}, f_{1,i}) = \log\Big[ 1 + \sum_{0 \le k \le N,\, k \ne i} \exp\big( (f_{2,i} \cdot f_{1,k} - f_{2,i} \cdot f_{1,i} + m)/\tau \big) \Big]    (3)
It should be appreciated that the second loss D(f_{2,i}, f_{1,i}) may also be calculated according to equation (4) below; for example, a mathematically equivalent softmax-style form is

D(f_{2,i}, f_{1,i}) = -\log \frac{\exp\big( (f_{2,i} \cdot f_{1,i} - m)/\tau \big)}{\exp\big( (f_{2,i} \cdot f_{1,i} - m)/\tau \big) + \sum_{0 \le k \le N,\, k \ne i} \exp\big( f_{2,i} \cdot f_{1,k}/\tau \big)}    (4)

In equation (4), f_{2,i} \cdot f_{1,i} is the feature product of the positive feature pair, and f_{2,i} \cdot f_{1,k} is the feature product of a first negative feature pair.
Step 3, compute the dot product of the target feature and the third feature included in each second negative feature pair to obtain a plurality of fifth products; when the target feature is any one first feature, determine a third loss according to the plurality of fifth products, the first product, and the reference value; and when the target feature is any one second feature, determine a fourth loss according to the plurality of fifth products, the third product, and the reference value.
In other words, the dot product of a target feature and a third feature is a fifth product.
For example, the target feature may be the first feature f_{1,i}, and a second negative feature pair includes the target feature f_{1,i} and a third feature q_k, where 0 \le k \le M. The second negative feature pair containing the target feature f_{1,i} is (f_{1,i}, q_k), and a fifth product is f_{1,i} \cdot q_k, that is, the dot product of the target feature f_{1,i} and the third feature q_k included in the second negative feature pair.
After deriving the plurality of fifth products, illustratively, determining the third loss according to the plurality of fifth products, the first product, and the reference value includes: subtracting the first product from each of the plurality of fifth products to obtain a plurality of third differences, summing each third difference with the reference value to obtain a plurality of third values, and determining the third loss according to the plurality of third values.
The third loss D_q(f_{1,i}, f_{2,i}) may be obtained from the positive feature pair (f_{1,i}, f_{2,i}), the second negative feature pairs (f_{1,i}, q_k), and the reference value m with equation (5) below, the third loss D_q(f_{1,i}, f_{2,i}) being the loss of the first image with respect to the third features.
D_q(f_{1,i}, f_{2,i}) = \log\Big[ 1 + \sum_{0 \le k \le M} \exp\big( (f_{1,i} \cdot q_k - f_{1,i} \cdot f_{2,i} + m)/\tau \big) \Big]    (5)
For another example, the target feature may be the second feature f_{2,i}, and a second negative feature pair includes the target feature f_{2,i} and a third feature q_k, where 0 \le k \le M. The second negative feature pair containing the target feature f_{2,i} is (f_{2,i}, q_k), and a fifth product is f_{2,i} \cdot q_k, that is, the dot product of the target feature f_{2,i} and the third feature q_k included in the second negative feature pair.
Illustratively, determining the fourth loss according to the plurality of fifth products, the third product, and the reference value includes: subtracting the third product from each of the plurality of fifth products to obtain a plurality of fourth differences, summing each fourth difference with the reference value to obtain a plurality of fourth values, and determining the fourth loss according to the plurality of fourth values.
The fourth loss D_q(f_{2,i}, f_{1,i}) may be obtained from the positive feature pair (f_{2,i}, f_{1,i}), the second negative feature pairs (f_{2,i}, q_k), and the reference value m with equation (6) below, the fourth loss D_q(f_{2,i}, f_{1,i}) being the loss of the second image with respect to the third features.
D_q(f_{2,i}, f_{1,i}) = \log\Big[ 1 + \sum_{0 \le k \le M} \exp\big( (f_{2,i} \cdot q_k - f_{2,i} \cdot f_{1,i} + m)/\tau \big) \Big]    (6)
Step 4, perform weighted summation of the first loss, the second loss, and at least one of the third loss and the fourth loss to obtain the loss function.
The weights of the first loss, the second loss, the third loss, and the fourth loss may be the same or different, and are not limited herein. Illustratively, the loss function may be the following equation (7):

L = D(f_{1,i}, f_{2,i}) + D(f_{2,i}, f_{1,i}) + \gamma D_q(f_{1,i}, f_{2,i}) + \gamma D_q(f_{2,i}, f_{1,i})    (7)
Here L is the feature loss, which reflects the matching capability of the initial image matching model: the smaller the value of the feature loss L, the stronger the matching capability of the initial image matching model, and conversely, the larger the value, the weaker the matching capability. \gamma is a weight parameter, which may be set by a technician according to requirements or determined according to the matching capability of the initial image matching model. Optionally, the loss function may be a softmax (normalized exponential) loss function, or another loss function. D(f_{1,i}, f_{2,i}) is the first loss, D(f_{2,i}, f_{1,i}) is the second loss, D_q(f_{1,i}, f_{2,i}) is the third loss, and D_q(f_{2,i}, f_{1,i}) is the fourth loss.
Of course, in addition to the first loss and the second loss, the loss function may include only the third loss D_q(f_{1,i}, f_{2,i}), in which case \gamma D_q(f_{2,i}, f_{1,i}) is deleted from equation (7). Alternatively, in addition to the first loss and the second loss, the loss function may include only the fourth loss D_q(f_{2,i}, f_{1,i}), in which case \gamma D_q(f_{1,i}, f_{2,i}) is deleted from equation (7).
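Building on the first_loss sketch given after equation (1), equations (5) to (7) could be assembled as follows; queue_feats (the third features q_k taken from the feature queue) and the weight gamma are assumed names:

```python
import torch

def queue_loss(f, f_matched, queue_feats, m=0.2, tau=0.07):
    # Sketch of equations (5)/(6): f holds the target features, f_matched the
    # features they match, queue_feats the (M+1, d) third features q_k.
    pos = (f * f_matched).sum(dim=1, keepdim=True)   # first/third products f_i . f_matched_i
    neg = f @ queue_feats.t()                        # fifth products f_i . q_k
    return torch.log1p(((neg - pos + m) / tau).exp().sum(dim=1))

def feature_loss(f1, f2, queue_feats, m=0.2, tau=0.07, gamma=0.5):
    # Sketch of equation (7): first and second losses plus gamma-weighted
    # third and fourth losses, averaged over all positive feature pairs.
    loss = first_loss(f1, f2, m, tau) + first_loss(f2, f1, m, tau)
    loss = loss + gamma * (queue_loss(f1, f2, queue_feats, m, tau)
                           + queue_loss(f2, f1, queue_feats, m, tau))
    return loss.mean()
```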
After the loss function is determined based on steps 1 to 4, referring to fig. 3, the feature loss calculated by the loss function may be fed back to the initial image matching model, and the initial image matching model is updated according to the loss function so that the updated model has a stronger matching capability. For the same pair of first and second images, the updated model yields more similar first features in the first feature map and second features in the second feature map than the model before the update. The updated initial image matching model is the target image matching model.
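As an illustration of this feedback-and-update step, one hypothetical training iteration might look as follows; model, optimizer, and the image tensors are assumed to exist, and feature_loss is the sketch given above:

```python
def train_step(model, optimizer, img1, img2, queue_feats, m=0.2, tau=0.07, gamma=0.5):
    # One hypothetical update of the initial image matching model.
    f1 = model(img1)                 # first features from the first image
    f2 = model(img2)                 # second features from the second image
    loss = feature_loss(f1, f2, queue_feats, m, tau, gamma)
    optimizer.zero_grad()
    loss.backward()                  # feed the feature loss back to the model
    optimizer.step()                 # the updated model replaces the initial model
    return float(loss)
```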
Based on the above descriptions of step 1 to step 4, the effect of using the reference value in the loss function according to the embodiment of the present application is described below, taking the first value as an example.
The first value is the sum of the first difference and the reference value. The first loss is determined from the first value, and the loss function is determined from the first loss. Since the training process requires the result of the loss function to be sufficiently small, the first loss, and therefore the first value, also needs to be sufficiently small.
If no reference value is set, the first difference itself is the first value, and the first value is small enough as soon as the first difference is small enough. In the present application, however, the first value is the sum of the first difference and the reference value, so the first difference needs to be even smaller for the first value to be small enough. The reference value therefore constrains the first difference and forces it to be smaller.
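As a worked illustration with assumed numbers (a reference value m = 0.2, a first product of 0.9, and a second product of 0.5 are chosen for exposition and are not values from the application):

f_{1,i} \cdot f_{2,k} - f_{1,i} \cdot f_{2,i} + m = 0.5 - 0.9 + 0.2 = -0.2 \le 0

The first value is non-positive only because the similarity gap of 0.4 exceeds the reference value 0.2; with a gap of only 0.1, the first value would be 0.1 and the corresponding loss term would remain large.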
Since the first difference is the second product minus the first product, a smaller first difference indicates a smaller second product and/or a larger first product. The second product reflects the similarity between the first feature and the second feature included in the first negative feature pair, so a smaller second product means that this similarity is smaller. Correspondingly, the first product reflects the similarity between the first feature and the second feature included in the positive feature pair, so a larger first product means that this similarity is larger.
Thus, the similarity between the matched first and second features included in a positive feature pair is made substantially greater than the similarity between the unmatched first and second features included in a first negative feature pair. The target image matching model obtained through this training process therefore extracts similar features for matched images and dissimilar features for unmatched images, giving it a strong ability to distinguish features, so that whether different images match can be determined accurately using the target image matching model.
Furthermore, in another possible implementation, the loss function may be determined from only the positive feature pairs, the first negative feature pairs, and the second negative feature pairs. In such an implementation, the first, second, third, and fourth losses are determined as described above, except that none of them includes the reference value; they are then weighted and summed to obtain the loss function. This method constrains the feature similarities less than the method that determines the loss function from the reference value together with the feature pairs, and its effect is comparatively less desirable.
Optionally, after the initial image matching model is updated according to the positive feature pairs, the first negative feature pairs, and the second negative feature pairs to obtain the target image matching model, the third feature may be deleted from the feature queue.
For example, the third feature may be deleted from the feature queue after the second negative feature pairs have been determined from it and the third loss and the fourth loss have been determined based on those pairs. In this way, the alternative features can be stored dynamically in the feature queue in a first-in first-out manner.
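A minimal sketch of such a first-in first-out feature queue follows; the class name, the capacity, and the detaching of gradients are assumptions:

```python
import torch

class FeatureQueue:
    # FIFO store of alternative features sampled from third images (a sketch).
    def __init__(self, dim, capacity=4096):
        self.feats = torch.zeros(0, dim)
        self.capacity = capacity

    def enqueue(self, new_feats):
        # Append newly sampled alternative features in sampling order;
        # once capacity is exceeded, the oldest features are dropped first.
        self.feats = torch.cat([self.feats, new_feats.detach()], dim=0)
        if self.feats.size(0) > self.capacity:
            self.feats = self.feats[-self.capacity:]

    def third_features(self, count):
        # Take third features q_k from the head of the queue, per arrangement order.
        return self.feats[:count]
```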
In summary, in the embodiment of the present application, by acquiring the first features, the second features, and the third features, and determining the positive feature pairs, the first negative feature pairs, and the second negative feature pairs, the number of negative feature pairs is increased, so that features of more diverse images participate as negative feature pairs in the training of the initial image matching model, thereby improving the training effect.
In addition, introducing the reference value increases the training difficulty and strengthens the expressive capability of the features used for matching. Performing random angle and/or random scale transformation on the base image and other images improves the ability of the initial image matching model to extract similar or identical features from images with identical or similar content but large angle and/or scale variation, that is, its ability to identify and align, at pixel level, the content or structure in such images. It also improves the matching capability of the model in scenes with large angle and/or scale transformation, and improves the robustness of the features, namely their angle stability and scale invariance in such scenes.
The application further provides an image matching method based on an image matching model trained by the above training method. The method may also be applied to the computer device 11 shown in fig. 1. As shown in fig. 4, the method includes the following steps 401 to 402.
Step 401, acquiring a fourth image and a fifth image, inputting the fourth image and the fifth image into a target image matching model, and obtaining a plurality of fourth features and a plurality of fifth features output by the target image matching model, wherein the target image matching model is obtained by training according to the training method of the image matching model provided by the application.
The fourth image and the fifth image are images to be matched; that is, the target image matching model is used to acquire the plurality of fourth features corresponding to the fourth image and the plurality of fifth features corresponding to the fifth image, so that whether the fourth image and the fifth image match can be determined according to these features.
In an exemplary scenario, an image matching model trained by the training method provided by the application is applied to an unmanned aerial vehicle. In this scenario, the unmanned aerial vehicle's task is, after reaching a destination, to determine a target object located at the destination and place an object it carries on the target object.
Before the unmanned aerial vehicle reaches the destination, a technician can input a photo of the target object into the unmanned aerial vehicle's target image matching model as the fourth image and obtain the plurality of fourth features output by the model. After the unmanned aerial vehicle reaches the destination, it photographs the surrounding scene; each photograph is automatically input into the target image matching model as a fifth image, and the plurality of fifth features output by the model are obtained.
The fifth image is shot by the unmanned aerial vehicle during flight, and since its pose may change flexibly while flying, the fifth image and the fourth image may differ in angle and scale. However, since the model extracting the plurality of fourth features and the plurality of fifth features is the target image matching model, which retains a strong matching capability even in scenes with large angle and/or scale transformation, the extracted fourth and fifth features can be guaranteed strong angle stability under large angle transformation and strong scale invariance under large scale transformation.
Step 402, determining whether the fourth image and the fifth image match according to the plurality of fourth features and the plurality of fifth features.
Illustratively, the drone determines whether the fourth image matches the fifth image based on the number of matched features, that is, similar features, present among the fourth features and the fifth features.
In some embodiments, if no fourth feature matches a fifth feature, or if matched fourth and fifth features exist but their number is less than a match threshold set by the technician, it is determined that the fourth image does not match the fifth image. This indicates that the destination has not yet been reached; the unmanned aerial vehicle needs to continue flying, capture a new fifth image, and check again whether the fourth image matches the new fifth image.
In other embodiments, if matched fourth and fifth features exist and their number is greater than or equal to the match threshold set by the technician, it is determined that the fourth image matches the fifth image. Once the unmanned aerial vehicle finds a fifth image that matches the fourth image, the destination has been reached and the target object at the destination has been determined, so the unmanned aerial vehicle can place the object it carries on the target object to complete the task.
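A sketch of the decision in steps 401 and 402 is given below; the mutual nearest-neighbor rule and both threshold values are assumptions, since the application only requires counting matched features against a match threshold:

```python
import torch

def images_match(f4, f5, sim_threshold=0.8, match_threshold=30):
    # f4: (N, d) fourth features, f5: (K, d) fifth features, L2-normalized.
    sim = f4 @ f5.t()
    nn45 = sim.argmax(dim=1)              # most similar fifth feature per fourth feature
    nn54 = sim.argmax(dim=0)              # most similar fourth feature per fifth feature
    idx = torch.arange(f4.size(0))
    mutual = nn54[nn45] == idx            # mutually nearest pairs count as matched features
    strong = sim[idx, nn45] >= sim_threshold
    num_matched = int((mutual & strong).sum())
    return num_matched >= match_threshold
```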
In summary, the embodiment of the application uses the target image matching model to extract the features of different images to be matched. Because the target image matching model is trained with a large number of negative feature pairs, it has a strong feature extraction capability; the features extracted with it are therefore accurate, which is beneficial to accurately determining whether different images match.
In addition, even in the special scenes where the angle, the scale, or both differ between the images to be matched, the features extracted by the target image matching model remain highly robust, that is, they have strong angle stability and scale invariance, so that whether the images match can still be determined accurately from these features, avoiding interference of angle and scale with the image matching process.
The method provided in the related art and the method provided in the embodiments of the present application are compared below.
In the related art, a first image and a second image are input into an initial image matching model to obtain a first feature map and a second feature map output by the model; the first features and the second features are determined from the two feature maps; a feature loss is calculated with a loss function based on the first features and the second features; and the feature loss is fed back to the initial image matching model, completing one round of training. In the related art, the second image is not obtained through random angle and/or random scale transformation.
Problems associated with the related art include, but are not limited to, the following. Technical problem one: only the first negative feature pairs from the first image and the second image are considered, so the number of negative feature pairs is small and the training effect is poor. Technical problem two: the training difficulty of the initial image matching model is low, which weakens the expressive capability of the features used for matching. Technical problem three: in scenes with large angle and/or scale changes, the first and second features extracted for the input first and second images are poor, the matching capability degrades and may almost fail, and matching may be unsuccessful.
For technical problem one: the application obtains the third features from the feature queue and determines the second negative feature pairs based on the third features together with the first and second features, increasing the number of negative feature pairs.
For technical problem two: the application introduces a reference value into the loss function when calculating the feature loss, adding a margin constraint between the positive feature pairs and the first negative feature pairs and/or between the positive feature pairs and the second negative feature pairs.
For technical problem three: the application creates second features and third features with more angle and/or scale transformation by performing random angle and/or random scale transformation on the base image and other images.
The effect of training the initial image matching model with the method of the related art and with the training method of the present application can be seen in fig. 5 and 6. The left half of each group of pictures is the first image, the right half is the second image, the left end of each gray line is connected to a first feature, and the right end is connected to the second feature matched with that first feature.
As can be seen from rows 1 and 3 of fig. 5, the related art extracts many matched first and second features only when the first and second images have the same angle, and extracts few matched features when the second image differs from the first image by 45, 90, or 180 degrees. As can be seen from rows 2 and 4 of fig. 5, the embodiment of the present application extracts many matched first and second features both when the two images have the same angle and when they differ by 45, 90, or 180 degrees.
As can be seen from fig. 6, when the first image and the second image differ in scale, the related art extracts only a few matched first and second features, while the embodiment of the application extracts many more.
Referring to fig. 7, an embodiment of the present application provides a training apparatus for an image matching model, which includes the following modules.
The acquiring module 701 is configured to acquire a first image and a second image that are matched, input the first image and the second image into an initial image matching model, and obtain a plurality of first features corresponding to the first image and a plurality of second features corresponding to the second image output by the initial image matching model;
A determining module 702, configured to determine a plurality of positive feature pairs and a plurality of first negative feature pairs according to the plurality of first features and the plurality of second features, where any one positive feature pair includes a matched first feature and second feature, and any one first negative feature pair includes a non-matched first feature and second feature;
the obtaining module 701 is further configured to obtain a third feature, and determine a plurality of second negative feature pairs according to the third feature and a target feature, where the third feature is a feature corresponding to a third image other than the first image and the second image, and the target feature is at least one of the plurality of first features and the plurality of second features;
the updating module 703 is configured to update the initial image matching model according to the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs to obtain a target image matching model, where the target image matching model is configured to output a plurality of reference features according to the plurality of input images, and the plurality of reference features is configured to determine whether the plurality of images match.
In a possible implementation manner, the obtaining module 701 is configured to sample the third image to obtain a plurality of alternative features corresponding to the third image; determining a feature queue according to a plurality of alternative features, wherein the arrangement order of the plurality of alternative features in the feature queue is determined according to the sampling order of the plurality of alternative features; and acquiring a third feature from the feature queue according to the arrangement order.
In one possible implementation, the updating module 703 is configured to obtain a reference value; determining a loss function from the reference value, the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs; and updating the initial image matching model according to the loss function to obtain a target image matching model.
In a possible implementation, any one first feature corresponds to one positive feature pair and at least one first negative feature pair, and any one second feature corresponds to one positive feature pair and at least one first negative feature pair. The updating module 703 is configured to: compute the dot product of any one first feature and the second feature included in the positive feature pair corresponding to that first feature to obtain a first product, compute the dot products of that first feature and the second features included in the at least one corresponding first negative feature pair to obtain at least one second product, and determine a first loss according to the first product, the at least one second product, and the reference value; compute the dot product of any one second feature and the first feature included in the positive feature pair corresponding to that second feature to obtain a third product, compute the dot products of that second feature and the first features included in the at least one corresponding first negative feature pair to obtain at least one fourth product, and determine a second loss according to the third product, the at least one fourth product, and the reference value; compute the dot product of the target feature and the third feature included in each second negative feature pair to obtain a plurality of fifth products, determine a third loss according to the plurality of fifth products, the first product, and the reference value when the target feature is any one first feature, and determine a fourth loss according to the plurality of fifth products, the third product, and the reference value when the target feature is any one second feature; and perform weighted summation of the first loss, the second loss, and at least one of the third loss and the fourth loss to obtain the loss function.
In a possible implementation, the updating module 703 is configured to subtract the first product from each second product to obtain at least one first difference, sum each first difference with the reference value to obtain at least one first value, and determine the first loss according to the at least one first value; the updating module 703 is further configured to subtract the third product from each fourth product to obtain at least one second difference, sum each second difference with the reference value to obtain at least one second value, and determine the second loss according to the at least one second value; the updating module 703 is further configured to subtract the first product from each of the plurality of fifth products to obtain a plurality of third differences, sum each third difference with the reference value to obtain a plurality of third values, and determine the third loss according to the plurality of third values; and the updating module 703 is further configured to subtract the third product from each of the plurality of fifth products to obtain a plurality of fourth differences, sum each fourth difference with the reference value to obtain a plurality of fourth values, and determine the fourth loss according to the plurality of fourth values.
In a possible implementation, the acquiring module 701 is configured to acquire a base image, where the base image is the first image or an image matched with the first image, and to process the base image to obtain the second image matched with the first image, the processing including at least one of random angle transformation and random scale transformation, as sketched below.
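For illustration, a sketch of such processing is shown below, assuming torchvision is available; the angle and scale ranges are hypothetical:

```python
import random
import torchvision.transforms.functional as TF

def random_angle_scale(base_image, max_angle=180.0, scale_range=(0.5, 2.0)):
    # base_image: a PIL image or (C, H, W) tensor; returns a second image
    # obtained from the base image through random angle and/or scale transformation.
    angle = random.uniform(-max_angle, max_angle)
    scale = random.uniform(*scale_range)
    return TF.affine(base_image, angle=angle, translate=[0, 0], scale=scale, shear=0.0)
```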
In one possible implementation, the determining module 702 is configured to determine the relative order of each first feature in the plurality of first features, determine the relative order of each second feature in the plurality of second features, determine a first feature and a second feature with the same relative order as a positive feature pair, and determine a first feature and a second feature with different relative orders as a first negative feature pair, as sketched below.
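The relative-order rule could be sketched as follows (the names are assumptions): features at the same relative order form a positive feature pair, and every cross-order combination forms a first negative feature pair:

```python
def build_pairs(first_feats, second_feats):
    # first_feats, second_feats: equally long sequences of feature vectors,
    # already arranged so that index i of each sequence corresponds.
    positives, first_negatives = [], []
    for i, f1 in enumerate(first_feats):
        for j, f2 in enumerate(second_feats):
            (positives if i == j else first_negatives).append((f1, f2))
    return positives, first_negatives
```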
Referring to fig. 8, an embodiment of the present application provides an apparatus for image matching, which includes the following modules.
The obtaining module 801 is configured to obtain a fourth image and a fifth image, input the fourth image and the fifth image into a target image matching model, obtain a plurality of fourth features and a plurality of fifth features output by the target image matching model, and train the target image matching model according to the training method of any one of the image matching models provided by the embodiment of the present application;
A determining module 802 is configured to determine whether the fourth image and the fifth image match according to the plurality of fourth features and the plurality of fifth features.
Note that, for the technical effects of the apparatus embodiment corresponding to fig. 7, reference may be made to the technical effects of the method embodiment corresponding to fig. 2, and for those of the apparatus embodiment corresponding to fig. 8, to the technical effects of the method embodiment corresponding to fig. 4; they are not described in detail herein.
In addition, when the apparatus provided in the above embodiment implements the functions thereof, only the division of the above functional modules is used as an example, and in practical application, the above functional allocation may be implemented by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
In an exemplary embodiment, a computer device is also provided, the computer device comprising a processor and a memory, the memory having at least one computer program stored therein. The at least one computer program is loaded and executed by one or more processors to cause the computer apparatus to implement the training method of the image matching model corresponding to fig. 2 or the method of image matching corresponding to fig. 4 described above. The computer device may be a server or a terminal, and the structures of the server and the terminal are respectively described with reference to fig. 9 and fig. 10.
Fig. 9 is a schematic structural diagram of a server according to an embodiment of the present application, where the server may have a relatively large difference due to different configurations or performances, and may include one or more processors (Central Processing Units, CPU) 901 and one or more memories 902, where at least one computer program is stored in the one or more memories 902, and the at least one computer program is loaded and executed by the one or more processors 901, so that the server implements the training method of the image matching model or the image matching method provided in the foregoing method embodiments. Of course, the server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal may be, for example: smart phones, tablet computers, notebook computers or desktop computers. Terminals may also be referred to by other names as user equipment, portable terminals, laptop terminals, desktop terminals, etc.
Generally, the terminal includes: a processor 1001 and a memory 1002.
The processor 1001 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 1001 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor, the main processor being a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit) for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 1001 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. Memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is configured to store at least one instruction for execution by processor 1001 to cause the terminal to implement the training method of the image matching model or the method of image matching provided by the method embodiments of the present application.
In some embodiments, the terminal may further optionally include: a peripheral interface 1003, and at least one peripheral. The processor 1001, the memory 1002, and the peripheral interface 1003 may be connected by a bus or signal line. The various peripheral devices may be connected to the peripheral device interface 1003 via a bus, signal wire, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, a display 1005, a camera assembly 1006, audio circuitry 1007, a positioning assembly 1008, and a power supply 1009.
The peripheral interface 1003 may be used to connect at least one I/O (Input/Output) related peripheral to the processor 1001 and the memory 1002. In some embodiments, the processor 1001, the memory 1002, and the peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1004 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1004 communicates with a communication network and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 may further include NFC (Near Field Communication) related circuits, which is not limited by the present application.
The display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 1005 is a touch screen, the display 1005 also has the ability to capture touch signals at or above the surface of the display 1005. The touch signal may be input to the processor 1001 as a control signal for processing. At this time, the display 1005 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, the display 1005 may be one, disposed on the front panel of the terminal; in other embodiments, the displays 1005 may be at least two, respectively disposed on different surfaces of the terminal or in a folded design; in other embodiments, the display 1005 may be a flexible display disposed on a curved surface or a folded surface of the terminal. The display 1005 may even be arranged in a non-rectangular irregular pattern, that is, a shaped screen. The display 1005 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1006 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 1007 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 1001 for processing, or inputting the electric signals to the radio frequency circuit 1004 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones can be respectively arranged at different parts of the terminal. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuit 1007 may also include a headphone jack.
The positioning component 1008 is used to locate the current geographic location of the terminal to enable navigation or LBS (Location Based Service). The positioning component 1008 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 1009 is used to supply power to the various components in the terminal. The power source 1009 may be alternating current, direct current, disposable battery or rechargeable battery. When the power source 1009 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal further includes one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyroscope sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.
The acceleration sensor 1011 can detect the magnitudes of accelerations on three coordinate axes of a coordinate system established with the terminal. For example, the acceleration sensor 1011 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 1001 may control the display screen 1005 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1011. The acceleration sensor 1011 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 1012 may detect a body direction and a rotation angle of the terminal, and the gyro sensor 1012 may collect a 3D motion of the user to the terminal in cooperation with the acceleration sensor 1011. The processor 1001 may implement the following functions according to the data collected by the gyro sensor 1012: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 1013 may be provided at a side frame of the terminal and/or a lower layer of the display 1005. When the pressure sensor 1013 is provided at a side frame of the terminal, a grip signal of the terminal by a user can be detected, and the processor 1001 performs left-right hand recognition or quick operation according to the grip signal collected by the pressure sensor 1013. When the pressure sensor 1013 is provided at the lower layer of the display screen 1005, the processor 1001 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1005. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1014 is used to collect a fingerprint of the user, and the processor 1001 identifies the identity of the user based on the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the identity of the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 1014 may be provided on the front, back or side of the terminal. When a physical key or vendor Logo (trademark) is provided on the terminal, the fingerprint sensor 1014 may be integrated with the physical key or vendor Logo.
The optical sensor 1015 is used to collect ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the display screen 1005 based on the ambient light intensity collected by the optical sensor 1015. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 1005 is turned up; when the ambient light intensity is low, the display brightness of the display screen 1005 is turned down. In another embodiment, the processor 1001 may dynamically adjust the shooting parameters of the camera module 1006 according to the ambient light intensity collected by the optical sensor 1015.
A proximity sensor 1016, also known as a distance sensor, is typically provided on the front panel of the terminal. The proximity sensor 1016 is used to collect the distance between the user and the front of the terminal. In one embodiment, when the proximity sensor 1016 detects a gradual decrease in the distance between the user and the front face of the terminal, the processor 1001 controls the display 1005 to switch from the bright screen state to the off screen state; when the proximity sensor 1016 detects that the distance between the user and the front surface of the terminal gradually increases, the processor 1001 controls the display 1005 to switch from the off-screen state to the on-screen state.
It will be appreciated by those skilled in the art that the structure shown in fig. 10 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein at least one computer program loaded and executed by a processor of a computer device to cause the computer to implement the training method of any one of the image matching models or the image matching method.
In one possible implementation, the computer readable storage medium may be a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and so on.
In an exemplary embodiment, a computer program product or a computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform any one of the training methods of the image matching model or the image matching method described above.
It should be noted that the terms "first," "second," and the like in the description and in the claims, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. The implementations described in the above exemplary embodiments do not represent all implementations consistent with the application; rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of the related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions. For example, the first image, the second image, the first feature, and the second feature referred to in the present application are all acquired with sufficient authorization.
It should be understood that references herein to "a plurality" are to two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
The above embodiments are merely exemplary embodiments of the present application and are not intended to limit the present application, any modifications, equivalent substitutions, improvements, etc. that fall within the principles of the present application should be included in the scope of the present application.
Claims (10)
1. A method of training an image matching model, the method comprising:
acquiring a first image and a second image which are matched, inputting the first image and the second image into an initial image matching model, and obtaining a plurality of first features corresponding to the first image and a plurality of second features corresponding to the second image output by the initial image matching model;
determining a plurality of positive feature pairs and a plurality of first negative feature pairs according to the plurality of first features and the plurality of second features, wherein any one positive feature pair comprises a matched first feature and second feature, and any one first negative feature pair comprises a non-matched first feature and second feature;
acquiring a third feature, and determining a plurality of second negative feature pairs according to the third feature and a target feature, wherein the third feature is a feature corresponding to a third image other than the first image and the second image, and the target feature is at least one of the plurality of first features and the plurality of second features;
updating the initial image matching model according to the plurality of positive feature pairs, the plurality of first negative feature pairs and the plurality of second negative feature pairs to obtain a target image matching model, wherein the target image matching model is used for outputting a plurality of reference features according to a plurality of input images, and the plurality of reference features are used for determining whether the plurality of images match.
2. The method of claim 1, wherein the obtaining the third feature comprises:
sampling the third image to obtain a plurality of alternative features corresponding to the third image;
Determining a feature queue according to the plurality of alternative features, wherein the arrangement order of the plurality of alternative features in the feature queue is determined according to the sampling order of the plurality of alternative features;
and acquiring the third features from the feature queue according to the arrangement sequence.
3. The method of claim 1, wherein updating the initial image matching model based on the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs to obtain a target image matching model comprises:
Obtaining a reference value;
determining a loss function from the reference value, the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs;
and updating the initial image matching model according to the loss function to obtain the target image matching model.
4. A method according to claim 3, wherein any one first feature corresponds to one positive feature pair and at least one first negative feature pair, any one second feature corresponds to one positive feature pair and at least one first negative feature pair, and said determining a loss function from the reference value, the plurality of positive feature pairs, the plurality of first negative feature pairs, and the plurality of second negative feature pairs comprises:
Multiplying the second feature included in one positive feature pair corresponding to any one first feature by the any one first feature point to obtain a first product, multiplying the second feature included in at least one first negative feature pair corresponding to any one first feature by the any one first feature point to obtain at least one second product, and determining a first loss according to the first product, the at least one second product and the reference value;
Multiplying the first feature included in one positive feature pair corresponding to any one second feature by the any one second feature point to obtain a third product, multiplying the first feature included in at least one first negative feature pair corresponding to any one second feature by the any one second feature point to obtain at least one fourth product, and determining a second loss according to the third product, the at least one fourth product and the reference value;
Multiplying target feature points included in each second negative feature pair by a third feature to obtain a plurality of fifth products, determining a third loss according to the plurality of fifth products, the first products and the reference value when the target feature includes any one of the first features, and determining a fourth loss according to the plurality of fifth products, the third products and the reference value when the target feature includes any one of the second features;
and carrying out weighted summation on at least one of the third loss and the fourth loss, the first loss and the second loss to obtain the loss function.
5. The method of claim 4, wherein:
said determining a first loss from the first product, the at least one second product, and the reference value comprises: subtracting the first product from each second product to obtain at least one first difference, summing each first difference with the reference value to obtain at least one first value, and determining the first loss according to the at least one first value;
said determining a second loss from the third product, the at least one fourth product, and the reference value comprises: subtracting the third product from each fourth product to obtain at least one second difference, summing each second difference with the reference value to obtain at least one second value, and determining the second loss according to the at least one second value;
said determining a third loss from the plurality of fifth products, the first product, and the reference value comprises: subtracting the first product from each of the plurality of fifth products to obtain a plurality of third differences, summing each third difference with the reference value to obtain a plurality of third values, and determining the third loss according to the plurality of third values;
and said determining a fourth loss from the plurality of fifth products, the third product, and the reference value comprises: subtracting the third product from each of the plurality of fifth products to obtain a plurality of fourth differences, summing each fourth difference with the reference value to obtain a plurality of fourth values, and determining the fourth loss according to the plurality of fourth values.
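Claims 3-5 together describe a margin-style contrastive loss: every term is a difference of dot products plus the reference value (a margin). The sketch below mirrors that arithmetic; the hinge (clamping negative values to zero), the mean reduction, and the loss weights are assumptions, since the claims specify only how the products, differences, and sums are formed.

```python
import torch
import torch.nn.functional as F

def margin_term(anchor, positive, negatives, margin):
    """relu(neg_dot - pos_dot + margin), averaged; the hinge is assumed."""
    pos = anchor @ positive          # first product (or third product)
    neg = negatives @ anchor         # second / fourth / fifth products
    return F.relu(neg - pos + margin).mean()   # differences, plus reference value

def total_loss(first_feats, second_feats, third_feats,
               margin=0.2, weights=(1.0, 1.0, 0.5, 0.5)):
    n = first_feats.shape[0]
    l1 = l2 = l3 = l4 = 0.0
    for i in range(n):
        neg_second = torch.cat([second_feats[:i], second_feats[i + 1:]])
        neg_first = torch.cat([first_feats[:i], first_feats[i + 1:]])
        l1 = l1 + margin_term(first_feats[i], second_feats[i], neg_second, margin)
        l2 = l2 + margin_term(second_feats[i], first_feats[i], neg_first, margin)
        # Second negative pairs: target feature against third-image features.
        l3 = l3 + margin_term(first_feats[i], second_feats[i], third_feats, margin)
        l4 = l4 + margin_term(second_feats[i], first_feats[i], third_feats, margin)
    w1, w2, w3, w4 = weights
    return (w1 * l1 + w2 * l2 + w3 * l3 + w4 * l4) / n

loss = total_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(64, 128))
```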
6. The method of any of claims 1-5, wherein the acquiring the matched first and second images comprises:
acquiring a base image, wherein the base image is the first image or an image matched with the first image;
and processing the base image to obtain the second image matched with the first image, wherein the processing comprises at least one of a random angle transformation and a random scale transformation.
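The random angle and random scale transformations of claim 6 correspond to standard geometric augmentation; a sketch using torchvision, with illustrative (not claimed) parameter ranges:

```python
import torch
from torchvision import transforms

# Random angle and random scale transformations; the ranges are illustrative.
augment = transforms.RandomAffine(degrees=(-45, 45), scale=(0.7, 1.3))

base_image = torch.rand(3, 128, 128)   # the base image (here, the first image)
second_image = augment(base_image)     # transformed copy, matched to the first
```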
7. The method of any of claims 1-5, wherein determining a plurality of positive feature pairs and a plurality of first negative feature pairs from the plurality of first features and the plurality of second features comprises:
determining a relative order of each first feature in the plurality of first features;
determining a relative order of each second feature in the plurality of second features;
determining a first feature and a second feature having the same relative order as a positive feature pair;
and determining a first feature and a second feature having different relative orders as a first negative feature pair.
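One reading of claim 7, under the simplifying assumption that both feature sets are emitted in the same raster order, so that equal indices mean equal relative order:

```python
import torch

def build_feature_pairs(first_feats, second_feats):
    """Same relative order -> positive pair; different order -> first negative pair."""
    positive_pairs, first_negative_pairs = [], []
    for i, f in enumerate(first_feats):
        for j, s in enumerate(second_feats):
            if i == j:
                positive_pairs.append((f, s))
            else:
                first_negative_pairs.append((f, s))
    return positive_pairs, first_negative_pairs

pos_pairs, neg_pairs = build_feature_pairs(torch.randn(8, 64), torch.randn(8, 64))
```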
8. A method of image matching, the method comprising:
acquiring a fourth image and a fifth image, inputting the fourth image and the fifth image into a target image matching model, and obtaining a plurality of fourth features and a plurality of fifth features output by the target image matching model, wherein the target image matching model is obtained by training according to the training method of the image matching model in any one of claims 1-7;
determining whether the fourth image and the fifth image match based on the plurality of fourth features and the plurality of fifth features.
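Claim 8 leaves the match decision rule open. One common choice, shown here purely as an assumption, is mutual nearest-neighbour matching with a minimum match count:

```python
import torch

def images_match(fourth_feats, fifth_feats, min_matches=16):
    """Mutual nearest-neighbour matching; the count threshold is an assumption."""
    sim = fourth_feats @ fifth_feats.T    # pairwise similarity of reference features
    nn_ab = sim.argmax(dim=1)             # best fifth feature for each fourth feature
    nn_ba = sim.argmax(dim=0)             # best fourth feature for each fifth feature
    mutual = nn_ba[nn_ab] == torch.arange(fourth_feats.shape[0])
    return int(mutual.sum()) >= min_matches

print(images_match(torch.randn(100, 128), torch.randn(100, 128)))
```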
9. A computer device, comprising a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to cause the computer device to implement the training method of an image matching model according to any one of claims 1-7 or the image matching method according to claim 8.
10. A computer-readable storage medium, wherein at least one computer program is stored in the computer-readable storage medium, and the at least one computer program is loaded and executed by a processor to cause a computer to implement the training method of an image matching model according to any one of claims 1-7 or the image matching method according to claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211542850.5A CN118135255A (en) | 2022-12-02 | 2022-12-02 | Training method of image matching model, image matching method and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118135255A true CN118135255A (en) | 2024-06-04 |
Family
ID=91241084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211542850.5A Pending CN118135255A (en) | 2022-12-02 | 2022-12-02 | Training method of image matching model, image matching method and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118135255A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118644556A (en) * | 2024-08-15 | 2024-09-13 | 天目山实验室 | Unmanned aerial vehicle positioning method and device based on image feature matching |
Similar Documents
Publication | Title
---|---
CN108629747B (en) | Image enhancement method and device, electronic equipment and storage medium
CN111079576B (en) | Living body detection method, living body detection device, living body detection equipment and storage medium
CN110807361B (en) | Human body identification method, device, computer equipment and storage medium
CN110222789B (en) | Image recognition method and storage medium
CN110059652B (en) | Face image processing method, device and storage medium
CN110650379B (en) | Video abstract generation method and device, electronic equipment and storage medium
CN111127509B (en) | Target tracking method, apparatus and computer readable storage medium
CN108288032B (en) | Action characteristic acquisition method, device and storage medium
CN112581358B (en) | Training method of image processing model, image processing method and device
CN110570460A (en) | Target tracking method and device, computer equipment and computer readable storage medium
CN110991457B (en) | Two-dimensional code processing method and device, electronic equipment and storage medium
CN110796248A (en) | Data enhancement method, device, equipment and storage medium
CN114170349A (en) | Image generation method, image generation device, electronic equipment and storage medium
CN110290426B (en) | Method, device and equipment for displaying resources and storage medium
CN112818979B (en) | Text recognition method, device, equipment and storage medium
CN110705614A (en) | Model training method and device, electronic equipment and storage medium
CN110647881A (en) | Method, device, equipment and storage medium for determining card type corresponding to image
CN110991445B (en) | Vertical text recognition method, device, equipment and medium
CN110675473B (en) | Method, device, electronic equipment and medium for generating GIF dynamic diagram
CN113343709B (en) | Method for training intention recognition model, method, device and equipment for intention recognition
CN118135255A (en) | Training method of image matching model, image matching method and computer equipment
CN112508959A (en) | Video object segmentation method and device, electronic equipment and storage medium
CN111639639B (en) | Method, device, equipment and storage medium for detecting text area
CN110163192B (en) | Character recognition method, device and readable medium
CN111611414A (en) | Vehicle retrieval method, device and storage medium
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |