
CN107729935B - The recognition methods of similar pictures and device, server, storage medium - Google Patents

The recognition methods of similar pictures and device, server, storage medium

Info

Publication number
CN107729935B
Authority
CN
China
Prior art keywords
hash code
picture
pictures
same
code blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710945888.XA
Other languages
Chinese (zh)
Other versions
CN107729935A (en)
Inventor
高增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dabei Biotechnology Co ltd
Original Assignee
Hangzhou Buy Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Buy Technology Co Ltd
Priority to CN201710945888.XA
Publication of CN107729935A
Application granted
Publication of CN107729935B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/758 Involving statistics of pixels or of feature values, e.g. histogram matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method and a device for identifying similar pictures, a server and a storage medium. The method comprises: calculating a low-dimensional feature vector hash code of each picture by using a picture average hash algorithm; uniformly dividing the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to one position index; among the at least two hash code blocks of each picture, placing the pictures that have the same hash code block at the same position index into the same cluster to obtain a plurality of picture clusters; and, in each picture cluster, identifying similar pictures according to the distance between the hash code blocks of the pictures other than the identical hash code block at the shared position index. The embodiment of the invention reduces the computational complexity and the amount of calculation of similar picture identification and realizes efficient calculation of picture similarity.

Description

Similar picture identification method and device, server and storage medium
Technical Field
The embodiment of the invention relates to an image processing technology, in particular to a method and a device for identifying similar pictures, a server and a storage medium.
Background
As users interact more and more with websites of different types, they can upload their own pictures, so the number of pictures on a website grows rapidly and a large number of duplicate and highly similar pictures exist within the same website.
Quickly identifying similar pictures among massive numbers of pictures makes it possible to remove redundant picture data, reduce storage cost, and provide users with homogeneous or diversified picture services, such as picture search, according to their needs. In existing similar picture identification techniques, however, the complexity of calculating picture similarity is high, so identifying similar pictures among massive numbers of pictures requires a very large amount of calculation, which limits practical applicability.
Disclosure of Invention
The embodiment of the invention provides a method and a device for identifying similar pictures, a server and a storage medium, which are used for solving the problems of high complexity and large calculation amount of a method for identifying similar pictures in the prior art.
In a first aspect, an embodiment of the present invention provides a method for identifying similar pictures, where the method includes:
calculating to obtain a low-dimensional characteristic vector hash code of each picture by using a picture average hash algorithm;
uniformly dividing the low-dimensional eigenvector hash code of each picture into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to one position index;
dividing the pictures with the same hash code blocks in the same position index into the same cluster in at least two hash code blocks of each picture to obtain a plurality of picture clusters;
in each picture cluster, similar pictures are identified according to the distance of the hash code blocks between the pictures except the same hash code block corresponding to the same position index.
Further, each picture includes at least two hash code blocks with the same or different hash code lengths, and the hash code blocks include overlapping or non-overlapping portions.
Further, the method further comprises:
calculating the similarity between the pictures in each picture cluster according to the distance between the hash code blocks except the same hash code block corresponding to the same position index among the pictures;
and counting the similarity among the pictures obtained by calculation in all the picture clusters, and identifying the pictures with the similarity exceeding a preset threshold as similar pictures.
Further, the method further comprises:
responding to a similar picture searching request of a target picture, and calculating a target low-dimensional feature vector hash code of the target picture by using a picture average hash algorithm;
uniformly dividing the target low-dimensional eigenvector hash code into at least two hash code blocks according to the preset rule, wherein each hash code block corresponds to a position index;
dividing a target picture into at least one target picture cluster in the plurality of picture clusters, wherein the pictures in the target picture cluster and the target picture have the same Hash code blocks in the same position index;
and searching similar pictures of the target picture according to the distance of the hash code blocks except the same hash code block corresponding to the same position index among the pictures in at least one target picture cluster.
In a second aspect, an embodiment of the present invention further provides an apparatus for identifying similar pictures, where the apparatus includes:
the hash code calculation module is used for calculating a low-dimensional characteristic vector hash code of each picture by using a picture average hash algorithm;
the partitioning module is used for uniformly partitioning the low-dimensional eigenvector hash code of each picture into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to one position index;
the clustering module is used for placing the pictures with the same Hash code blocks in the same position index into the same cluster in at least two Hash code blocks of each picture to obtain a plurality of picture clusters;
and the first identification module is used for identifying similar pictures according to the distance of the hash code blocks except the same hash code block corresponding to the same position index among the pictures in each picture cluster.
Further, the blocking module is specifically configured to uniformly divide the low-dimensional eigenvector hash code of each picture into at least two hash code blocks with the same or different hash code lengths according to a preset rule, and the hash code blocks include overlapping or non-overlapping portions.
Further, the apparatus further comprises:
the calculation module is used for calculating the similarity between the pictures in each picture cluster according to the distance between the hash code blocks except the same hash code block corresponding to the same position index among the pictures;
and the second identification module is used for counting the similarity among the pictures obtained by calculation in all the picture clusters and identifying the picture with the similarity exceeding a preset threshold value as a similar picture.
Further, the device also comprises a searching module used for searching similar pictures of the target picture;
the search module specifically comprises:
the target hash code calculation unit is used for responding to a similar picture search request of a target picture and calculating a target low-dimensional feature vector hash code of the target picture by using a picture average hash algorithm;
the partitioning unit is used for uniformly dividing the target low-dimensional eigenvector hash code into at least two hash code blocks according to the preset rule, wherein each hash code block corresponds to one position index;
the clustering unit is used for partitioning a target picture into at least one target picture cluster in the plurality of picture clusters, wherein the pictures in the target picture cluster and the target picture have the same hash code block in the same position index;
and the searching unit is used for searching similar pictures of the target picture according to the distance of the hash code blocks except the same hash code block corresponding to the same position index among the pictures in at least one target picture cluster.
In a third aspect, an embodiment of the present invention further provides a server, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for identifying similar pictures as described above.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for identifying similar pictures as described above.
According to the embodiment of the invention, the low-dimensional eigenvector hash code of each picture is calculated by using the average hash algorithm of the pictures, massive pictures are clustered according to whether the divided hash code blocks indexed at the same position are the same or not, and then similar pictures are identified according to the distance between the hash code blocks except the hash code block indexed at the same position, so that the problems of high complexity and large calculation amount of a method for identifying similar pictures in the prior art are solved, the calculation complexity of similar picture identification is reduced, the calculation amount is reduced, and the high-efficiency calculation of picture similarity is realized.
Drawings
Fig. 1 is a flowchart of a method for identifying similar pictures according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for identifying similar pictures according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of an apparatus for recognizing similar pictures according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a server according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a similar picture identification method according to an embodiment of the present invention, where the embodiment is applicable to identifying similar pictures, the method may be executed by a similar picture identification apparatus, and the apparatus may be implemented in a software and/or hardware manner, and may be configured in a server, for example. As shown in fig. 1, the method specifically includes:
Step S110, calculating the low-dimensional feature vector hash code of each picture by using a picture average hash algorithm.
To identify similar pictures among massive numbers of pictures, the Average Hash algorithm (aHash) compares the gray value of each pixel in the grayscale image converted from a picture with the average value of all pixels to obtain the feature vector hash code of that picture. For example, the original image may first be scaled to 8 × 8 and the scaled image converted into a grayscale image, yielding a 64-bit low-dimensional feature vector hash code. Each bit of the low-dimensional feature vector obtained by the aHash algorithm is 0 or 1. For similar pictures, at least one of the hash code blocks into which their hash codes are divided is the same. When calculating the low-dimensional feature vector hash code with the aHash algorithm, the scaled picture need not be converted to grayscale; this can be configured according to the actual situation.
The algorithm for generating the picture hash code includes, but is not limited to, the aHash algorithm; other algorithms, such as the Perceptual Hash algorithm (pHash) or a color histogram, may be adopted according to actual needs.
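The aHash computation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the picture has already been scaled to 8 × 8 and converted to gray values, the function name `average_hash` is hypothetical, and treating a pixel equal to the mean as a 1 bit is an assumption the text does not specify.

```python
from typing import List


def average_hash(gray: List[List[int]]) -> int:
    """Return the aHash of an already-scaled grayscale image as an integer.

    `gray` is a grid of gray values (8 x 8 in the example above).
    Pixels are visited row by row; each contributes one bit, most
    significant first: 1 if the pixel is at least the mean, else 0.
    """
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    code = 0
    for value in pixels:
        # Assumption: ties (value == mean) are encoded as 1.
        code = (code << 1) | (1 if value >= mean else 0)
    return code


# Hypothetical 8 x 8 input: left half dark, right half bright.
sample = [[0] * 4 + [255] * 4 for _ in range(8)]
print(f"{average_hash(sample):016x}")
```

With an 8 × 8 input the result always fits in 64 bits, which matches the 64-bit hash code used in the later steps.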
Step S120, uniformly dividing the low-dimensional eigenvector hash code of each picture into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to one position index.
In this step, the low-dimensional feature vector hash code of each picture is divided into at least two hash code blocks according to a preset rule. The preset rule is set in advance by the user, according to the user's actual judgment requirements, for dividing the hash code of each picture into blocks. For example, if the picture hash code has 64 bits, it can be divided into 4 blocks of 16 bits each. If it is preset that picture similarity needs to be calculated only when the number of differing hash code blocks between two pictures is less than or equal to 3, the number of hash code blocks of each picture should be set to at least 4, so that at least one block is identical. When the number of differing hash code blocks between two pictures is greater than 3, the similarity of the two pictures need not be compared further, and the two pictures are determined to be dissimilar.
In this step, optionally, each picture includes at least two hash code blocks with the same or different hash code lengths, and the hash code blocks include overlapping or non-overlapping portions. In the above example, the 64-bit hash code is divided equally into 4 hash code blocks, but the 4 hash code blocks may also be unequal, that is, the number of bits in each of the 4 hash code blocks may be the same or different. If the 4 hash code blocks overlap, their total number of bits is greater than 64; if they do not overlap, their total number of bits is still equal to 64. The division rule of the hash code blocks can be configured and preset according to actual needs, and each picture is then divided according to the preset rule.
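A preset division rule of this kind can be sketched as a list of bit ranges. The sketch below is illustrative only: `split_blocks`, `EQUAL_RULE` and `OVERLAP_RULE` are hypothetical names, and the overlapping rule is an invented example of blocks with different lengths whose total exceeds 64 bits.

```python
from typing import List, Sequence, Tuple


def split_blocks(code: int, ranges: Sequence[Tuple[int, int]],
                 total_bits: int = 64) -> List[int]:
    """Cut `code` into blocks.

    Each rule entry is (start, length), counted from the most significant
    bit. The list position of an entry is the block's position index.
    """
    blocks = []
    for start, length in ranges:
        shift = total_bits - start - length
        blocks.append((code >> shift) & ((1 << length) - 1))
    return blocks


# 4 blocks of 16 bits, no overlap: total is exactly 64 bits.
EQUAL_RULE = [(0, 16), (16, 16), (32, 16), (48, 16)]
# Hypothetical rule with unequal, overlapping blocks: total is 88 bits.
OVERLAP_RULE = [(0, 24), (16, 24), (32, 16), (40, 24)]

code = 0x0123456789ABCDEF
print([f"{b:04x}" for b in split_blocks(code, EQUAL_RULE)])
print(sum(length for _, length in OVERLAP_RULE))
```

Because the same rule is applied to every picture, blocks at the same position index are always comparable between pictures.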
As an example of the specific division process, the obtained 64-bit picture hash code is divided equally into 4 non-overlapping hash code blocks A, B, C and D. Because each hash code block corresponds to one position index, the picture has 4 position indexes. If hash code block A is taken as the index block, the remaining hash code blocks B, C and D as a whole serve as the value of hash code block A; similarly, if hash code block B is taken as the index block, the remaining hash code blocks A, C and D as a whole serve as the value of hash code block B; and so on. There are 4 such combinations, that is, the picture corresponds to 4 key-value pairs. In the process of identifying similar pictures, if one 16-bit hash code block is selected for exact matching and the sample picture library contains 2^34 (about 17 billion) hash code fingerprints (each fingerprint corresponding to one picture), then about 2^(34-16) = 2^18 (262,144) candidate results are returned for each index hash code block, which greatly reduces the amount of calculation for the distance between the hash code blocks of different pictures.
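The four key-value pairs of the equal division can be sketched as follows. The function name `key_value_pairs` is hypothetical, and representing a key as a (position index, block value) tuple is an assumption made for illustration. The last line reproduces the candidate-count estimate, which holds only if the fingerprints are spread evenly over the 2^16 possible index values.

```python
from typing import List, Tuple

BLOCK_BITS = 16
NUM_BLOCKS = 4


def key_value_pairs(code: int) -> List[Tuple[Tuple[int, int], int]]:
    """Return one ((position, index_block), remaining_48_bits) pair per position."""
    mask = (1 << BLOCK_BITS) - 1
    blocks = [(code >> (BLOCK_BITS * (NUM_BLOCKS - 1 - i))) & mask
              for i in range(NUM_BLOCKS)]
    pairs = []
    for pos, index_block in enumerate(blocks):
        rest = 0
        for other_pos, block in enumerate(blocks):
            if other_pos != pos:
                # Concatenate the three non-index blocks in their original order.
                rest = (rest << BLOCK_BITS) | block
        pairs.append(((pos, index_block), rest))
    return pairs


pairs = key_value_pairs(0x0123456789ABCDEF)
for key, value in pairs:
    print(key, f"{value:012x}")

# Average bucket size for 2**34 fingerprints and a 16-bit index block.
print(2 ** 34 // 2 ** 16)
```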
As another example, the 64-bit picture hash code may be divided unequally, with overlapping portions between the divided hash code blocks. For example, the 64-bit picture hash code may first be split into 4 blocks of 16 bits, any one of which is selected as hash code block F, and the remaining 48 bits of the picture hash code are then split again into 4 hash code sub-blocks e, f, g and h of 12 bits each. Hash code block F and hash code sub-block e can be combined into a 28-bit index block, with the remaining hash code bits serving as the value of the index block; alternatively, F and f, F and g, or F and h can be combined into the index block, giving four combinations. Since F can be chosen in 4 ways, the non-uniform method has 4 × 4 combinations in total, namely 4 × 4 key-value pairs. In the process of identifying similar pictures, the 16 index blocks are searched in parallel, which avoids missing any hash code block. In addition, compared with dividing the 64-bit picture hash code equally into 4 hash code blocks with a 16-bit index key each, matching and searching with the 28-bit index blocks obtained in the non-uniform manner returns fewer results, so less calculation is needed for the distance between the hash code blocks of different pictures.
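The 4 × 4 non-uniform scheme can be sketched as below. The function name `two_level_index_keys` is hypothetical, and the sketch assumes that the three unselected 16-bit blocks are concatenated in their original order before being cut into 12-bit sub-blocks; the text does not fix that order. Only the 28-bit index keys are produced; the remaining 36 bits that would serve as the value are omitted for brevity.

```python
from typing import List, Tuple


def two_level_index_keys(code: int) -> List[Tuple[int, int, int]]:
    """Return 16 tuples (block_pos, sub_pos, 28-bit index key) for a 64-bit code."""
    blocks = [(code >> (16 * (3 - i))) & 0xFFFF for i in range(4)]
    keys = []
    for block_pos, block_f in enumerate(blocks):
        # Concatenate the three blocks that were not chosen as F: 48 bits.
        rest = 0
        for i, block in enumerate(blocks):
            if i != block_pos:
                rest = (rest << 16) | block
        for sub_pos in range(4):
            # Sub-blocks e, f, g, h are the four 12-bit slices of `rest`.
            sub_block = (rest >> (12 * (3 - sub_pos))) & 0xFFF
            keys.append((block_pos, sub_pos, (block_f << 12) | sub_block))
    return keys


keys = two_level_index_keys(0x0123456789ABCDEF)
print(len(keys))
print(f"{keys[0][2]:07x}")
```

A 28-bit key has 2^28 possible values instead of 2^16, so under an even spread each lookup returns far fewer candidates, at the price of maintaining 16 index tables instead of 4.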
Step S130, in at least two hash code blocks of each picture, dividing the pictures with the same hash code block in the same position index into the same cluster to obtain a plurality of picture clusters.
In this step, the hash code blocks of the pictures are compared. The comparison may be made on the value of each part of the hash code blocks at the same position index: hash code blocks whose parts all have the same values are regarded as the same hash code block, and the pictures having the same hash code block are then placed into the same cluster, yielding a plurality of picture clusters. After a large number of pictures are clustered, the number of pictures in each cluster is relatively small, which reduces the amount of calculation needed for the similarity between pictures.
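The clustering step can be sketched with a dictionary keyed by position index and block value. The names `cluster_by_block` and the sample file names are hypothetical, the equal 4 × 16-bit division is assumed, and dropping clusters that contain a single picture is a design choice of this sketch rather than something the text requires.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_by_block(codes: Dict[str, int]) -> Dict[Tuple[int, int], List[str]]:
    """Group picture ids by (position index, 16-bit block value)."""
    clusters = defaultdict(list)
    for picture_id, code in codes.items():
        for pos in range(4):
            block = (code >> (16 * (3 - pos))) & 0xFFFF
            clusters[(pos, block)].append(picture_id)
    # Only clusters with at least two pictures can yield a similar pair.
    return {key: ids for key, ids in clusters.items() if len(ids) > 1}


codes = {
    "a.jpg": 0x0123456789ABCDEF,
    "b.jpg": 0x0123456789ABCDEE,  # differs from a.jpg in the last bit only
    "c.jpg": 0xFFFF0000FFFF0000,  # shares no block with the others
}
clusters = cluster_by_block(codes)
for key, ids in sorted(clusters.items()):
    print(key, ids)
```

Note that a.jpg and b.jpg land together in three clusters, one per shared block, which is why the same pair can later be scored more than once.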
Step S140, in each picture cluster, identifying similar pictures according to the distance between the hash code blocks of the pictures other than the same hash code block corresponding to the same position index.
Specifically, after a large number of pictures are clustered, the number of pictures in each picture cluster is relatively small, which improves calculation performance. When calculating similarity, the identical hash code block at the shared position index is removed, the remaining hash code blocks of each picture are taken as a whole, and the distance between the remaining hash code blocks of the pictures is calculated to identify similar pictures, which further improves calculation performance. The smaller the calculated distance, the more similar the two pictures are, that is, the greater the similarity; the larger the distance, the greater the difference between the two pictures, that is, the smaller the similarity.
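The distance over the remaining hash code blocks can be sketched as follows. The text does not name the distance metric; Hamming distance (the number of differing bits) is assumed here because the hash code bits are 0 or 1. The function names are hypothetical and the equal 4 × 16-bit division is assumed.

```python
def remaining_bits(code: int, pos: int) -> int:
    """Drop the 16-bit block at position `pos` (0 = most significant) from a 64-bit code."""
    shift = 16 * (3 - pos)
    high = code >> (shift + 16)      # bits above the removed block
    low = code & ((1 << shift) - 1)  # bits below the removed block
    return (high << shift) | low


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def cluster_distance(code_a: int, code_b: int, pos: int) -> int:
    """Distance between two pictures of the cluster keyed on block `pos`."""
    return hamming(remaining_bits(code_a, pos), remaining_bits(code_b, pos))


a = 0x0123456789ABCDEF
b = 0x0123456789ABCDEE
print(cluster_distance(a, b, 0))
```

Only 48 bits are compared per pair instead of 64, and because the removed block is identical for both pictures, the result equals the distance over the full hash codes.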
Optionally, the method further includes:
calculating the similarity between the pictures in each picture cluster according to the distance between the hash code blocks except the same hash code block corresponding to the same position index among the pictures;
and counting the similarity among the pictures obtained by calculation in all the picture clusters, and identifying the pictures with the similarity exceeding a preset threshold as similar pictures.
Because the pictures are clustered according to the hash code blocks they share, the same picture may exist in different picture clusters. For example, if picture A and picture B are both placed in several picture clusters, the picture similarity calculated in each of those clusters yields multiple similarity values for picture A and picture B. After the similarities calculated in all the picture clusters are aggregated, a single similarity value for picture A and picture B is obtained, which achieves deduplication and simplification.
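Aggregating the per-cluster results into one value per picture pair can be sketched as below. Several points are assumptions of the sketch: the similarity is taken as 1 minus the Hamming distance divided by 64, a pair is scored once and skipped in later clusters, and the names `similar_pairs`, `codes` and `clusters` are hypothetical. The distance is computed over the full code, which gives the same value as the distance over the remaining blocks because the shared block is identical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def similar_pairs(codes: Dict[str, int],
                  clusters: Dict[Tuple[int, int], List[str]],
                  threshold: float) -> Dict[Tuple[str, str], float]:
    """Return {(id_a, id_b): similarity} for pairs whose similarity exceeds `threshold`."""
    scores: Dict[Tuple[str, str], float] = {}
    for ids in clusters.values():
        for id_a, id_b in combinations(sorted(ids), 2):
            if (id_a, id_b) in scores:
                continue  # pair already scored in another cluster
            scores[(id_a, id_b)] = 1.0 - hamming(codes[id_a], codes[id_b]) / 64
    return {pair: score for pair, score in scores.items() if score > threshold}


codes = {
    "a.jpg": 0x0123456789ABCDEF,
    "b.jpg": 0x0123456789ABCDEE,
    "d.jpg": 0x0123FFFF00000000,
}
clusters = {
    (0, 0x0123): ["a.jpg", "b.jpg", "d.jpg"],
    (1, 0x4567): ["a.jpg", "b.jpg"],
}
result = similar_pairs(codes, clusters, threshold=0.9)
print(result)
```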
According to the embodiment of the invention, the low-dimensional eigenvector hash code of each picture is calculated by using the average hash algorithm of the pictures, a similar hash technology is adopted to cluster massive pictures according to whether the divided hash code blocks indexed at the same position are the same or not, and then the similar pictures are identified according to the distance between the hash code blocks except the hash code block corresponding to the same position index among the pictures, so that the problems of high complexity and large calculated amount of a method for identifying the similar pictures in the prior art are solved, the calculation complexity of similar picture identification is reduced, the calculated amount is reduced, and the high-efficiency calculation of the picture similarity is realized.
Example two
Fig. 2 is a flowchart of a method for identifying similar pictures according to a second embodiment of the present invention, which is further optimized based on the first embodiment. As shown in fig. 2, the method specifically includes:
step S210, in response to a similar picture search request of the target picture, calculating a target low-dimensional feature vector hash code of the target picture by using a picture average hash algorithm.
In the step, specifically, a user inputs a target picture to be searched on a webpage or application software, the server receives the target picture, responds to a similar picture search request of the target picture, and calculates a target low-dimensional feature vector hash code of the target picture by using a picture average hash algorithm.
Step S220, uniformly dividing the target low-dimensional eigenvector hash code into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to a position index.
The lengths of the divided target low-dimensional feature vector hash code blocks can be the same or different, and the hash code blocks comprise overlapped or non-overlapped parts.
Step S230, dividing the target picture into at least one target picture cluster of the obtained multiple picture clusters, where the same hash code block exists in the pictures in the target picture cluster and the target picture at the same position index.
Step S240, in at least one target picture cluster, searching for similar pictures of the target picture according to the distance between the hash code blocks of the pictures other than the same hash code block corresponding to the same position index.
Specifically, the distance between the hash code blocks of the target picture and of each picture in each target picture cluster, other than the identical hash code block at the shared position index, is calculated. The smaller the calculated distance, the greater the corresponding picture similarity. Pictures whose similarity value exceeds the threshold are identified as similar pictures. After the search is completed, the server returns the pictures with higher similarity to the target picture, and the similar picture search result is displayed on the user's web page or application software. For example, if the user sets the similarity threshold to 98%, pictures whose calculated similarity value is greater than or equal to 98% are identified as similar pictures.
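Steps S210 to S240 can be sketched as a lookup against a prebuilt block index. The sketch assumes the equal 4 × 16-bit division, Hamming distance, and a similarity defined as 1 minus the distance divided by 64; none of these is fixed by the text. The names `blocks_of`, `build_index` and `search_similar` are hypothetical, and the target hash code is taken as given rather than computed from a picture.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def blocks_of(code: int) -> List[int]:
    """Split a 64-bit code into four 16-bit blocks, most significant first."""
    return [(code >> (16 * (3 - pos))) & 0xFFFF for pos in range(4)]


def build_index(codes: Dict[str, int]) -> Dict[Tuple[int, int], Set[str]]:
    """Map (position index, block value) to the ids of pictures in that cluster."""
    index = defaultdict(set)
    for picture_id, code in codes.items():
        for pos, block in enumerate(blocks_of(code)):
            index[(pos, block)].add(picture_id)
    return index


def search_similar(target_code: int, codes: Dict[str, int],
                   index: Dict[Tuple[int, int], Set[str]],
                   threshold: float = 0.98) -> List[Tuple[str, float]]:
    """Return (picture id, similarity) pairs at or above `threshold`, best first."""
    candidates: Set[str] = set()
    for pos, block in enumerate(blocks_of(target_code)):
        # Target clusters: pictures sharing this block at this position.
        candidates |= index.get((pos, block), set())
    results = []
    for picture_id in candidates:
        distance = bin(target_code ^ codes[picture_id]).count("1")
        similarity = 1.0 - distance / 64
        if similarity >= threshold:
            results.append((picture_id, similarity))
    return sorted(results, key=lambda item: (-item[1], item[0]))


codes = {
    "a.jpg": 0x0123456789ABCDEF,
    "b.jpg": 0x0123456789ABCDEE,
    "c.jpg": 0xFFFF0000FFFF0000,
}
index = build_index(codes)
hits = search_similar(0x0123456789ABCDEF, codes, index)
print(hits)
```

Only pictures that share at least one block with the target are ever compared, so c.jpg is never touched by the distance calculation.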
According to the embodiment of the invention, the low-dimensional eigenvector hash code of the target picture is obtained by utilizing the average hash algorithm of the picture, the low-dimensional eigenvector hash code of the target picture is partitioned and clustered by adopting a similar hash technology, and then the similar picture of the target picture is identified according to the distance of the hash code blocks except the same hash code block corresponding to the same position index between the pictures in each target picture cluster, so that the rapid search of the similar pictures in the massive pictures is realized.
Example three
Fig. 3 is a schematic structural diagram of an apparatus for identifying similar pictures according to a third embodiment of the present invention, which is applicable to identifying similar pictures. The device provided by the embodiment can execute the identification method of the similar picture provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
As shown in fig. 3, the apparatus for identifying similar pictures in this embodiment includes a hash code calculation module 310, a blocking module 320, a clustering module 330, and a first identification module 340. Wherein:
the hash code calculation module 310 is configured to calculate a low-dimensional feature vector hash code of each picture by using a picture average hash algorithm.
The partitioning module 320 is configured to uniformly divide the low-dimensional eigenvector hash code of each picture into at least two hash code blocks according to a preset rule, where each hash code block corresponds to one position index.
Further, the partitioning module 320 is specifically configured to uniformly divide the low-dimensional eigenvector hash code of each picture into at least two hash code blocks with the same or different hash code lengths according to a preset rule, where the hash code blocks include overlapping or non-overlapping portions.
The clustering module 330 is configured to place, in at least two hash code blocks of each picture, pictures with the same hash code block in the same position index into the same cluster, so as to obtain multiple picture clusters.
The first identifying module 340 is configured to identify, in each picture cluster, similar pictures according to the distance between the hash code blocks of the pictures other than the same hash code block corresponding to the same position index.
Optionally, the apparatus further includes a calculating module and a second identifying module, wherein:
and the calculating module is used for calculating the similarity between the pictures in each picture cluster according to the distance between the hash code blocks except the same hash code block corresponding to the same position index among the pictures.
And the second identification module is used for counting the similarity among the pictures obtained by calculation in all the picture clusters and identifying the picture with the similarity exceeding a preset threshold value as a similar picture.
Further, the apparatus also includes a searching module configured to search for pictures similar to a target picture.
The searching module specifically includes:
a target hash code calculation unit, configured to respond to a similar picture search request for a target picture and to calculate a target low-dimensional feature vector hash code of the target picture by using a picture average hash algorithm;
a partitioning unit, configured to uniformly divide the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, where each hash code block corresponds to one position index;
a clustering unit, configured to assign the target picture to at least one target picture cluster among the obtained picture clusters, where the pictures in the target picture cluster and the target picture have the same hash code block at the same position index; and
a searching unit, configured to search, in the at least one target picture cluster, for pictures similar to the target picture according to the distance between the hash code blocks of the pictures other than the same hash code block corresponding to the same position index.
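The search flow performed by these units can be sketched as a lookup into an index keyed by position index and block value, followed by a distance check on the remaining blocks. The names `build_index` and `search_similar`, the use of Hamming distance, and the `max_distance` cut-off are assumptions made for this example. The target blocks are assumed to have been produced with the same preset rule as the stored pictures.

```python
from collections import defaultdict


def build_index(blocks_by_picture):
    """Map (position index, block) to the set of picture ids carrying that block."""
    index = defaultdict(set)
    for picture_id, blocks in blocks_by_picture.items():
        for pos, block in enumerate(blocks):
            index[(pos, block)].add(picture_id)
    return index


def search_similar(target_blocks, index, blocks_by_picture, max_distance):
    """Return [(picture_id, distance)] sorted by distance, then by id."""
    results = {}
    for pos, block in enumerate(target_blocks):
        # Only pictures sharing this block at this position are candidates.
        for picture_id in index.get((pos, block), ()):
            candidate = blocks_by_picture[picture_id]
            distance = sum(
                sum(x != y for x, y in zip(tb, cb))
                for i, (tb, cb) in enumerate(zip(target_blocks, candidate))
                if i != pos
            )
            if distance <= max_distance:
                results[picture_id] = min(distance, results.get(picture_id, distance))
    return sorted(results.items(), key=lambda item: (item[1], item[0]))


library = {
    "a": ["1111", "0000", "1010"],
    "b": ["1111", "0001", "1010"],
    "c": ["0000", "1111", "0101"],
}
index = build_index(library)
hits = search_similar(["1111", "0000", "1011"], index, library, max_distance=2)
```

Pictures that share no block with the target at any position index are never examined, which is where the reduction in computation comes from.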
According to this embodiment of the invention, the low-dimensional feature vector hash code of each picture is obtained by using the picture average hash algorithm. Massive numbers of pictures are clustered according to whether their hash code blocks at the same position index are the same, and similar pictures are then identified according to the distance between the remaining hash code blocks of the pictures, that is, the blocks other than the same hash code block at the same position index. This addresses the high complexity and large amount of calculation of prior-art methods for identifying similar pictures: the computational complexity and the amount of calculation are reduced, picture similarity is calculated efficiently, and similar pictures can be searched for quickly.
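For reference, a commonly used form of the average hash computation can be sketched as below. The sketch assumes the picture has already been reduced to a small grayscale grid (8×8 is a common choice, giving a 64-bit code); the grid size, the tie-breaking rule for pixels equal to the mean, and the name `average_hash` are assumptions and may differ from the parameters used in the embodiment.

```python
def average_hash(gray: list[list[int]]) -> str:
    """Return a bit string: '1' where a pixel is at or above the mean gray level.

    `gray` is assumed to be a small, already downscaled grayscale grid.
    """
    pixels = [value for row in gray for value in row]
    if not pixels:
        raise ValueError("empty picture")
    mean = sum(pixels) / len(pixels)
    return "".join("1" if value >= mean else "0" for value in pixels)


# A 2x2 grid is used here only to keep the example short.
tiny = [[10, 200],
        [30, 220]]
code = average_hash(tiny)
```

The resulting bit string is the kind of hash code that the partitioning, clustering, and distance steps summarized above operate on.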
Example four
Fig. 4 is a schematic structural diagram of a server according to a fourth embodiment of the present invention. FIG. 4 illustrates a block diagram of an exemplary server 12 suitable for use in implementing embodiments of the present invention. The server 12 shown in fig. 4 is only an example, and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in FIG. 4, the server 12 is in the form of a general purpose computing device. The components of the server 12 may include, but are not limited to: one or more processors 16, a storage device 28, and a bus 18 that connects the various system components (including the storage device 28 and the processors 16).
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
The server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by server 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The storage device 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The server 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk, such as a Compact Disc Read-Only Memory (CD-ROM), Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media, may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. The storage device 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the storage device 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The server 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the server 12, and/or with any devices (e.g., network card, modem, etc.) that enable the server 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Further, the server 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 20. As shown in FIG. 4, the network adapter 20 communicates with the other modules of the server 12 via the bus 18. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the server 12, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (Redundant Array of Independent Disks) systems, tape drives, and data backup storage systems, among others.
The processor 16 executes various functional applications and data processing by running a program stored in the storage device 28, for example, to implement the method for identifying similar pictures provided by the embodiment of the present invention.
Example five
Embodiment five of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method for identifying similar pictures provided in the embodiments of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM, or flash Memory), an optical fiber, a portable compact disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for identifying similar pictures is characterized by comprising the following steps:
calculating a low-dimensional feature vector hash code of each picture by using a picture average hash algorithm;
uniformly dividing the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to one position index;
placing, based on the at least two hash code blocks of each picture, the pictures having the same hash code block at the same position index into the same cluster to obtain a plurality of picture clusters; and
identifying, in each picture cluster, similar pictures according to the distance between the hash code blocks of the pictures other than the same hash code block corresponding to the same position index.
2. The method according to claim 1, wherein each picture comprises at least two hash code blocks with same or different hash code lengths, and the hash code blocks comprise overlapping or non-overlapping parts.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
calculating the similarity between the pictures in each picture cluster according to the distance between the hash code blocks except the same hash code block corresponding to the same position index among the pictures;
and counting the similarity among the pictures obtained by calculation in all the picture clusters, and identifying the pictures with the similarity exceeding a preset threshold as similar pictures.
4. The method according to claim 1 or 2, characterized in that the method further comprises:
responding to a similar picture searching request of a target picture, and calculating a target low-dimensional feature vector hash code of the target picture by using a picture average hash algorithm;
uniformly dividing the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, wherein each hash code block corresponds to one position index;
assigning the target picture to at least one target picture cluster among the plurality of picture clusters, wherein the pictures in the target picture cluster and the target picture have the same hash code block at the same position index;
and searching similar pictures of the target picture according to the distance of the hash code blocks except the same hash code block corresponding to the same position index among the pictures in at least one target picture cluster.
5. An apparatus for recognizing similar pictures, comprising:
a hash code calculation module, configured to calculate a low-dimensional feature vector hash code of each picture by using a picture average hash algorithm;
a partitioning module, configured to uniformly divide the low-dimensional feature vector hash code of each picture into at least two hash code blocks according to a preset rule, wherein each hash code block corresponds to one position index;
a clustering module, configured to place, based on the at least two hash code blocks of each picture, the pictures having the same hash code block at the same position index into the same cluster to obtain a plurality of picture clusters; and
a first identification module, configured to identify, in each picture cluster, similar pictures according to the distance between the hash code blocks of the pictures other than the same hash code block corresponding to the same position index.
6. The apparatus according to claim 5, wherein the partitioning module is specifically configured to uniformly divide the low-dimensional feature vector hash code of each picture into at least two hash code blocks with the same or different hash code lengths according to a preset rule, and the hash code blocks include overlapping or non-overlapping portions therebetween.
7. The apparatus of claim 5 or 6, further comprising:
the calculation module is used for calculating the similarity between the pictures in each picture cluster according to the distance between the hash code blocks except the same hash code block corresponding to the same position index among the pictures;
and the second identification module is used for counting the similarity among the pictures obtained by calculation in all the picture clusters and identifying the picture with the similarity exceeding a preset threshold value as a similar picture.
8. The apparatus according to claim 5 or 6, wherein the apparatus further comprises a searching module for searching for similar pictures of the target picture;
the search module specifically comprises:
the target hash code calculation unit is used for responding to a similar picture search request of a target picture and calculating a target low-dimensional feature vector hash code of the target picture by using a picture average hash algorithm;
the partitioning unit is used for uniformly dividing the target low-dimensional feature vector hash code into at least two hash code blocks according to the preset rule, wherein each hash code block corresponds to one position index;
the clustering unit is used for partitioning a target picture into at least one target picture cluster in the plurality of picture clusters, wherein the pictures in the target picture cluster and the target picture have the same hash code block in the same position index;
and the searching unit is used for searching similar pictures of the target picture according to the distance of the hash code blocks except the same hash code block corresponding to the same position index among the pictures in at least one target picture cluster.
9. A server, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for identifying similar pictures according to any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method for identifying similar pictures according to any one of claims 1 to 4.
CN201710945888.XA 2017-10-12 2017-10-12 The recognition methods of similar pictures and device, server, storage medium Active CN107729935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710945888.XA CN107729935B (en) 2017-10-12 2017-10-12 The recognition methods of similar pictures and device, server, storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710945888.XA CN107729935B (en) 2017-10-12 2017-10-12 The recognition methods of similar pictures and device, server, storage medium

Publications (2)

Publication Number Publication Date
CN107729935A CN107729935A (en) 2018-02-23
CN107729935B true CN107729935B (en) 2019-11-12

Family

ID=61210968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710945888.XA Active CN107729935B (en) 2017-10-12 2017-10-12 The recognition methods of similar pictures and device, server, storage medium

Country Status (1)

Country Link
CN (1) CN107729935B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309336B (en) 2018-03-12 2023-08-08 腾讯科技(深圳)有限公司 Image retrieval method, device, system, server and storage medium
CN108536769B (en) * 2018-03-22 2023-01-03 深圳市安软慧视科技有限公司 Image analysis method, search method and device, computer device and storage medium
CN108595710B (en) * 2018-05-11 2021-07-13 杨晓春 Rapid massive picture de-duplication method
CN111079757B (en) * 2018-10-19 2024-09-20 北京奇虎科技有限公司 Clothing attribute identification method and device and electronic equipment
CN111506756B (en) * 2019-01-30 2024-05-17 北京京东尚科信息技术有限公司 Method and system for searching similar pictures, electronic equipment and storage medium
CN111985957A (en) * 2019-05-24 2020-11-24 阿里巴巴集团控股有限公司 Advertisement exposure effect evaluation method and related device
CN110399511A (en) * 2019-07-23 2019-11-01 中南民族大学 Image cache method, equipment, storage medium and device based on Redis
CN110490250A (en) * 2019-08-19 2019-11-22 广州虎牙科技有限公司 A kind of acquisition methods and device of artificial intelligence training set
CN111078914B (en) * 2019-12-18 2023-04-18 书行科技(北京)有限公司 Method and device for detecting repeated pictures
CN111368122B (en) * 2020-02-14 2022-09-30 深圳壹账通智能科技有限公司 Method and device for removing duplicate pictures
CN111522989B (en) * 2020-07-06 2020-10-30 南京梦饷网络科技有限公司 Method, computing device, and computer storage medium for image retrieval

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819582A (en) * 2012-07-26 2012-12-12 华数传媒网络有限公司 Quick searching method for mass images
CN103984776A (en) * 2014-06-05 2014-08-13 北京奇虎科技有限公司 Repeated image identification method and image search duplicate removal method and device
CN104112284A (en) * 2013-04-22 2014-10-22 阿里巴巴集团控股有限公司 Method and equipment for detecting similarity of images

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Block mean value based image perceptual Hashing;Yang B,等;《Proceedings of the IEEE International Conference on IIH-MSP》;IEEE;20161226;全文 *
较大规模图片 使用phash去重;辰辰沉沉沉;《https://www.jianshu.com/p/c87f6f69d51f》;20170725;全文 *

Also Published As

Publication number Publication date
CN107729935A (en) 2018-02-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231113

Address after: 311422 Room 125, 1st Floor, Building 197-2, Jiulong Avenue, Yinhu Street, Fuyang District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Dabei Biotechnology Co.,Ltd.

Address before: 310019 Room 204, building A12, No.9 Jiusheng Road, Jianggan District, Hangzhou City, Zhejiang Province

Patentee before: HANGZHOU BEIGOU TECHNOLOGY CO.,LTD.