
CN110633383A - Method and device for identifying repeated house sources, electronic equipment and readable medium - Google Patents


Info

Publication number
CN110633383A
Authority
CN
China
Prior art keywords
source picture
room source
hash value
room
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910865969.8A
Other languages
Chinese (zh)
Inventor
Inventor name not published
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Infinite Light Field Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Infinite Light Field Technology Co Ltd filed Critical Beijing Infinite Light Field Technology Co Ltd
Priority to CN201910865969.8A priority Critical patent/CN110633383A/en
Publication of CN110633383A publication Critical patent/CN110633383A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/16 Real estate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present disclosure disclose a method and a device for identifying repeated house sources, electronic equipment and a readable medium. The method comprises the following steps: performing a preset rule operation on a first room source picture and a second room source picture to obtain a hash value of the first room source picture and a hash value of the second room source picture, both hash values having a target number of bits; determining whether the first room source picture and the second room source picture are similar pictures according to the number of differing bits between the two hash values; and if the number of similar pictures between the pictures of the first room source and the pictures of the second room source exceeds a similarity threshold, determining that the first room source and the second room source are repeated room sources. With the technical scheme provided by the embodiments of the present disclosure, repeated house sources on a platform can be identified based on image recognition technology while the amount of calculation is reduced.

Description

Method and device for identifying repeated house sources, electronic equipment and readable medium
Technical Field
The embodiment of the disclosure relates to the technical field of picture processing, and in particular relates to a method and a device for identifying a repeated room source, an electronic device and a readable medium.
Background
With the development of the internet, people increasingly search for housing online when they need to buy or rent a home, which shortens the search and makes it more efficient. At present, a house on the market can be listed online through many channels: its information can be published on every house-hunting platform, and both the homeowner and agents can publish it. This leads to stale house source data and a large number of repeated house sources. Moreover, house source information may change at any time, for example when some pictures of the original house source information are replaced, added or deleted, or when the price changes. For any house source platform, comparing all room source pictures to decide whether room sources are repeated requires a huge amount of calculation, so many platforms give up on detecting repeated room sources. As a result, a large amount of redundant information accumulates and the house-hunting experience of users on the platform suffers.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for identifying repeated house sources, electronic equipment and a readable medium, so as to realize the purpose of identifying repeated house sources on a platform based on an image identification technology and reducing the calculation amount.
In a first aspect, an embodiment of the present disclosure provides a method for identifying a duplicate house source, where the method includes:
performing preset rule operation on the first room source picture and the second room source picture to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits;
determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digits of the hash value of the first room source picture and the hash value of the second room source picture;
and if the number of the similar pictures of the first room source picture and the second room source picture exceeds a similar threshold value, determining that the first room source and the second room source are repeated room sources.
Optionally, the preset rule operation is performed on the first room source picture and the second room source picture to obtain the hash value of the first room source picture and the hash value of the second room source picture, including:
acquiring a gray-scale image of a first room source picture and a gray-scale image of a second room source picture;
modifying the image sizes of the gray level image of the first room source picture and the gray level image of the second room source picture to obtain a gray level image of the first room source picture with a preset size and a gray level image of the second room source picture with a preset size;
and determining the hash value of the first room source picture and the hash value of the second room source picture according to the gradient information of the gray level map of the first room source picture with the preset size and the gray level map of the second room source picture with the preset size.
Optionally, the preset size is an image size of 8 rows and 9 columns of pixel points;
correspondingly, determining the hash value of the first room source picture and the hash value of the second room source picture according to the gradient information of the gray level map of the first room source picture with the preset size and the gray level map of the second room source picture with the preset size, including:
for each pixel point of the gray-scale image of the first room source picture with the preset size and of the gray-scale image of the second room source picture with the preset size, if the gray value of the adjacent pixel point to its right in the same row is larger than the gray value of the pixel point, marking the pixel point as 1; otherwise, marking it as 0;
and obtaining, for each picture, 8 rows and 8 columns of mark values as the hash value of the first room source picture and the hash value of the second room source picture respectively.
Optionally, before determining whether the first room source picture and the second room source picture are similar pictures according to the difference bits between the hash value of the first room source picture and the hash value of the second room source picture, the method further includes:
segmenting the hash value of the first room source picture and the hash value of the second room source picture into a first preset number of comparison segments;
and if the data in every comparison segment differ between the two hash values, determining that the first room source picture and the second room source picture are not similar pictures.
Optionally, after the hash value of the first room source picture and the hash value of the second room source picture are segmented into a first preset number of comparison segments, the method further includes:
if the data in at least one comparison segment are the same, determining whether the first room source picture and the second room source picture are similar pictures according to the number of differing bits between the hash value of the first room source picture and the hash value of the second room source picture includes:
for each comparison segment whose data differ, determining the number of differing bits in that segment as its data comparison result, the total of these being the number of differing bits between the hash value of the first room source picture and the hash value of the second room source picture;
if the number of differing bits exceeds a second preset number, determining that the first room source picture and the second room source picture are dissimilar pictures; and if the number of differing bits does not exceed the second preset number, determining that the first room source picture and the second room source picture are similar pictures.
Optionally, the second preset number is the first preset number minus 1.
Optionally, the similarity threshold is 4.
In a second aspect, an embodiment of the present disclosure further provides a device for identifying repeated house sources, where the device includes:
the hash value determining module is used for performing preset rule operation on the first room source picture and the second room source picture to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits;
the similar picture identification module is used for determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digit of the hash value of the first room source picture and the hash value of the second room source picture;
and the repeated house source determining module is used for determining that the first house source and the second house source are repeated house sources if the number of the similar pictures of the first house source picture and the second house source picture exceeds a similar threshold value.
In a third aspect, an embodiment of the present disclosure provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for identifying repeated house sources according to the embodiments of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for identifying repeated house sources according to the embodiments of the present disclosure.
According to the technical scheme provided by the embodiment of the disclosure, a first room source picture and a second room source picture are subjected to preset rule operation to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits; determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digits of the hash value of the first room source picture and the hash value of the second room source picture; and if the number of the similar pictures of the first room source picture and the second room source picture exceeds a similar threshold value, determining that the first room source and the second room source are repeated room sources. By adopting the technical scheme provided by the disclosure, the repeated house sources on the platform can be identified based on the image identification technology, and the purpose of reducing the calculation amount can be achieved.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a flowchart of a method for identifying repeated house sources provided by an embodiment of the present disclosure;
Fig. 2 is a flowchart of a method for identifying repeated house sources provided by an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of a device for identifying repeated house sources provided in an embodiment of the present disclosure;
Fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flowchart of a method for identifying repeated house sources provided in an embodiment of the present disclosure. The embodiment is applicable to identifying repeated house source information on any platform. The method may be executed by the device for identifying repeated house sources provided in an embodiment of the present disclosure; the device may be implemented in software and/or hardware and may be integrated in an electronic device such as an intelligent terminal.
As shown in Fig. 1, the method for identifying repeated house sources includes:
S110, performing a preset rule operation on the first room source picture and the second room source picture to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture both have a target number of bits.
A large amount of house source information is published under the same residential community name, and a homeowner or an agent often cannot provide the exact unit number and door number when uploading it, so many house sources carry identical information. Because both homeowners and agents may upload house source information, a large number of repeated house sources appear on the platform. This scheme therefore judges more accurately whether house sources are repeated by using the house source pictures.
In this technical scheme, since most house sources have 7 to 10 pictures, each picture of each house source can be processed to obtain its hash value. The pictures of one house source are then compared with the pictures of another house source to determine whether they are similar pictures. The hash value of a room source picture can be obtained through the preset rule operation in many ways, as long as the result is a hash value that can uniquely represent the picture. Note that, because the hash values of different pictures must be compared with one another, the hash values of all room source pictures must be calculated in the same manner. The hash value may be a character string consisting of digits and letters, of digits only, or of letters only.
S120, determining whether the first room source picture and the second room source picture are similar pictures according to the difference digit of the hash value of the first room source picture and the hash value of the second room source picture.
After the hash value of the first room source picture and the hash value of the second room source picture are obtained, if the two pictures are the same, the two hash values are also the same; if only a few bits of the two hash values differ, the two pictures can still be determined to be similar pictures. In this scheme, whether the two pictures are similar is therefore determined according to the number of differing bits between the two hash values. The hash values may have 64 bits or 32 bits, or more or fewer, as determined by the platform's capacity for repeated room source calculation and the required calculation speed. It can be understood that the more bits the hash values have, the slower the subsequent calculation. Taking 64 bits as an example, a threshold may be set, for example 5. If more than 5 bits differ between the hash value of the first room source picture and the hash value of the second room source picture, the two pictures can be determined to be dissimilar pictures. If 5 or fewer bits differ, the two pictures can be determined to be similar pictures.
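The threshold test on the number of differing bits can be illustrated with a short sketch. It assumes, for illustration only, that each hash value is held as a Python integer of up to 64 bits; the function names `differing_bits` and `is_similar` are hypothetical and do not come from the disclosure.

```python
def differing_bits(hash_a: int, hash_b: int) -> int:
    """Count the bit positions at which two hash values differ."""
    # XOR leaves a 1 exactly where the two values disagree.
    return bin(hash_a ^ hash_b).count("1")


def is_similar(hash_a: int, hash_b: int, max_diff: int = 5) -> bool:
    """Treat two pictures as similar when at most max_diff bits differ.

    max_diff=5 follows the 64-bit example in the text; it is a tunable value.
    """
    return differing_bits(hash_a, hash_b) <= max_diff
```

With the example threshold of 5, two hashes that differ in exactly 5 bits are still treated as similar, while a sixth differing bit makes them dissimilar.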
S130, if the number of the similar pictures of the first room source picture and the second room source picture exceeds a similar threshold value, determining that the first room source and the second room source are repeated room sources.
In this embodiment, preferably, the similarity threshold is 4.
With reference to the above example, if similar pictures exist in the first room source picture and the second room source picture, the number of the two room source similar pictures may be counted, and if the number of the two room source similar pictures exceeds the similarity threshold, it may be determined that the two room sources are duplicate room sources. Wherein, the similarity threshold value can be 4, and the number can be obtained from a large amount of statistics of the house source information. For example, the number of the first room source pictures is 7, the number of the second room source pictures is 10, if the comparison shows that the number of the similar pictures of the two room sources is 4 or less than 4, the two room sources can be determined not to be the duplicate room sources, and if the number of the similar pictures of the two room sources exceeds 4, for example, 5, 6, and even the 7 pictures of the first room source are similar to the pictures of the second room source, the two room sources can be determined to be the duplicate room sources. In such a case, in the process of displaying for the user, operations such as filtering the duplicate house resources, or supervision operations such as deleting or controlling the quantity of the duplicate house resources on the platform may be performed, so as to avoid the occurrence of a large amount of redundant information.
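The counting step can be sketched as follows. The disclosure does not spell out how similar pictures are paired, so this sketch assumes that a picture of the first room source counts once if it is similar to at least one picture of the second room source. The function names and the integer hash representation are hypothetical.

```python
from typing import Sequence


def count_similar_pictures(hashes_a: Sequence[int],
                           hashes_b: Sequence[int],
                           max_diff: int = 5) -> int:
    """Count pictures of source A that have a similar picture in source B."""
    count = 0
    for ha in hashes_a:
        # A picture is "similar" when at most max_diff hash bits differ.
        if any(bin(ha ^ hb).count("1") <= max_diff for hb in hashes_b):
            count += 1
    return count


def is_repeated_source(hashes_a: Sequence[int],
                       hashes_b: Sequence[int],
                       similarity_threshold: int = 4,
                       max_diff: int = 5) -> bool:
    """Two room sources are repeated when the count exceeds the threshold."""
    return count_similar_pictures(hashes_a, hashes_b, max_diff) > similarity_threshold
```

Under these assumptions, 5 matching pictures out of 7 mark the two room sources as repeated, while exactly 4 do not, which matches the worked example in the text.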
According to the technical scheme provided by the embodiment of the disclosure, a first room source picture and a second room source picture are subjected to preset rule operation to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits; determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digits of the hash value of the first room source picture and the hash value of the second room source picture; and if the number of the similar pictures of the first room source picture and the second room source picture exceeds a similar threshold value, determining that the first room source and the second room source are repeated room sources. By adopting the technical scheme provided by the disclosure, the repeated house sources on the platform can be identified based on the image identification technology, and the purpose of reducing the calculation amount can be achieved.
Fig. 2 is a flowchart of a method for identifying repeated house sources provided in an embodiment of the present disclosure, which further optimizes the foregoing technical solution. Specifically, performing the preset rule operation on the first room source picture and the second room source picture to obtain the hash value of the first room source picture and the hash value of the second room source picture includes: acquiring a gray-scale image of the first room source picture and a gray-scale image of the second room source picture; modifying the image sizes of the gray-scale image of the first room source picture and the gray-scale image of the second room source picture to obtain a gray-scale image of the first room source picture with a preset size and a gray-scale image of the second room source picture with a preset size; and determining the hash value of the first room source picture and the hash value of the second room source picture according to the gradient information of the gray-scale image of the first room source picture with the preset size and the gray-scale image of the second room source picture with the preset size.
As shown in Fig. 2, the method for identifying repeated house sources includes:
S210, obtaining a gray-scale image of the first room source picture and a gray-scale image of the second room source picture.
The pictures of the first room source and the second room source may be color pictures or grayscale pictures. A grayscale picture needs no processing, while a color picture can be converted into a grayscale picture. In this scheme, whether a room source picture is a grayscale picture can be checked after the picture is obtained. Many conventional methods exist for converting a color picture into a grayscale picture, and they are not described here.
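As one example of a conventional conversion, and not a method prescribed by the disclosure, the red, green and blue channels can be weighted with the ITU-R BT.601 luma coefficients. The function name `to_grayscale` and the nested-list picture representation are assumptions made for this sketch.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def to_grayscale(rgb_rows: List[List[Pixel]]) -> List[List[int]]:
    """Convert rows of RGB pixels to rows of integer gray values (0..255)."""
    return [
        # BT.601 weights: green contributes most, blue least.
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]
```

Any other conversion would serve equally well, provided the same one is applied to every room source picture so that the resulting hash values remain comparable.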
S220, modifying the image sizes of the gray level image of the first room source picture and the gray level image of the second room source picture to obtain the gray level image of the first room source picture with the preset size and the gray level image of the second room source picture with the preset size.
After the grayscale picture is obtained, it can be processed into a picture of the preset size. For example, it may be down-sampled to obtain a grayscale image of lower resolution. The down-sampling may, for instance, merge every 4 pixels or every 9 pixels into the gray value of a single pixel. How the down-sampling is performed can be determined according to the desired preset size.
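A minimal sketch of such down-sampling is given below. It assumes block averaging, which is only one possible choice; the function name `downsample` and the default target of 8 rows by 9 columns are assumptions for illustration.

```python
from typing import List


def downsample(gray: List[List[int]], out_rows: int = 8, out_cols: int = 9) -> List[List[int]]:
    """Shrink a gray-scale image by averaging rectangular blocks of pixels."""
    in_rows, in_cols = len(gray), len(gray[0])
    result = []
    for r in range(out_rows):
        # Source row range covered by output row r (at least one row).
        r0 = r * in_rows // out_rows
        r1 = max((r + 1) * in_rows // out_rows, r0 + 1)
        out_row = []
        for c in range(out_cols):
            c0 = c * in_cols // out_cols
            c1 = max((c + 1) * in_cols // out_cols, c0 + 1)
            block = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            out_row.append(sum(block) // len(block))
        result.append(out_row)
    return result
```

For a 16 by 18 input, each output pixel is the average of a 2 by 2 block, which corresponds to the "every 4 pixels" case mentioned in the text.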
S230, determining the hash value of the first room source picture and the hash value of the second room source picture according to the gradient information of the gray level map of the first room source picture with the preset size and the gray level map of the second room source picture with the preset size.
In this embodiment, the hash value of the first room source picture and the hash value of the second room source picture may be determined according to the obtained gradient information of the gray scale map with the preset size. Specifically, gradient information of the gray-scale image is obtained through convolution calculation by a convolution operator in the horizontal or vertical direction, and the hash value of the room source image is determined according to the gradient information.
In this technical solution, preferably, the preset size is an image size of 8 rows and 9 columns of pixel points. Correspondingly, determining the hash value of the first room source picture and the hash value of the second room source picture according to the gradient information of the gray-scale image of the first room source picture with the preset size and the gray-scale image of the second room source picture with the preset size includes: for each pixel point of the gray-scale image of the first room source picture with the preset size and of the gray-scale image of the second room source picture with the preset size, if the gray value of the adjacent pixel point to its right in the same row is larger than the gray value of the pixel point, marking the pixel point as 1; otherwise, marking it as 0; and obtaining, for each picture, 8 rows and 8 columns of mark values as the hash value of the first room source picture and the hash value of the second room source picture respectively. In the obtained image of 8 rows and 9 columns, each pixel point is compared with the pixel point to its right: if the gray value of the right-hand pixel point is greater than that of the pixel point, the pixel point is recorded as 1; otherwise it is recorded as 0. The 9 pixel points in each row thus yield 8 values, and combining the 8 values from each of the 8 rows gives a 64-bit hash value whose components are only the digits 0 and 1. This arrangement greatly reduces the amount of calculation needed to identify pictures and therefore lowers the demand on computing resources.
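The row-wise comparison can be sketched as below. The sketch assumes the gray-scale image is given as 8 lists of 9 integers and packs the 64 marks into a single Python integer, first row first; the function name `difference_hash` and the packing order are assumptions, since the text only requires that all pictures be hashed in the same manner.

```python
from typing import List


def difference_hash(gray_8x9: List[List[int]]) -> int:
    """Build a 64-bit hash from an 8-row, 9-column gray-scale image."""
    if len(gray_8x9) != 8 or any(len(row) != 9 for row in gray_8x9):
        raise ValueError("expected an image of 8 rows and 9 columns")
    bits = 0
    for row in gray_8x9:
        # Compare each pixel with its right-hand neighbour: 8 marks per row.
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if right > left else 0)
    return bits
```

A picture whose brightness rises steadily from left to right yields 64 ones, and a uniformly gray picture yields 64 zeros.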
S240, determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digits of the hash value of the first room source picture and the hash value of the second room source picture.
S250, if the number of the similar pictures of the first room source picture and the second room source picture exceeds a similarity threshold, determining that the first room source and the second room source are repeated room sources.
On the basis of the foregoing technical scheme, this technical scheme provides a method for determining the hash value of a room source picture. With this method, the hash value can be determined quickly and subsequent calculation can be carried out on it, which speeds up the identification of similar room source pictures and reduces the amount of calculation required to identify similar pictures.
On the basis of the foregoing technical solution, optionally, before determining whether the first room source picture and the second room source picture are similar pictures according to the number of differing bits between the hash value of the first room source picture and the hash value of the second room source picture, the method further includes: segmenting the hash value of the first room source picture and the hash value of the second room source picture into a first preset number of comparison segments; and if the data in every comparison segment differ, determining that the first room source picture and the second room source picture are not similar pictures. That is, each hash value may be divided into several comparison segments, and corresponding segments are compared. Each comparison segment is a string of 0s and 1s and can be converted into a decimal number, so that it can be checked whether the decimal number of a segment equals that of the corresponding segment of the other picture. If a segment differs, the next segment is examined; if all comparison segments of the two pictures differ, the two pictures can be determined not to be similar pictures. In the above example, two room source pictures are dissimilar if more than 5 bits of their hash values differ. Thus, if all 6 comparison segments differ, at least 6 bits differ, and the two room source pictures are dissimilar pictures. This design reduces the amount of calculation for picture similarity and improves the calculation speed.
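The segment pre-check can be sketched as follows. The segment layout of five 11-bit segments plus one 9-bit segment is an assumption chosen so that the 6 widths sum to 64; the names `SEGMENT_WIDTHS`, `split_segments` and `all_segments_differ` are hypothetical.

```python
from typing import List, Sequence

# Assumed layout for a 64-bit hash: 6 segments whose widths sum to 64.
SEGMENT_WIDTHS = (11, 11, 11, 11, 11, 9)


def split_segments(hash_value: int, widths: Sequence[int] = SEGMENT_WIDTHS) -> List[int]:
    """Split a hash into integer segments, most significant segment first."""
    segments = []
    remaining = sum(widths)
    for width in widths:
        remaining -= width
        segments.append((hash_value >> remaining) & ((1 << width) - 1))
    return segments


def all_segments_differ(hash_a: int, hash_b: int,
                        widths: Sequence[int] = SEGMENT_WIDTHS) -> bool:
    """True when no segment matches, so at least len(widths) bits differ."""
    pairs = zip(split_segments(hash_a, widths), split_segments(hash_b, widths))
    return all(a != b for a, b in pairs)
```

The check rests on a counting argument: each differing segment contributes at least one differing bit, so 6 differing segments imply at least 6 differing bits, which already exceeds the example threshold of 5.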
On the basis of the foregoing technical solution, optionally, after the hash value of the first room source picture and the hash value of the second room source picture are segmented into a first preset number of comparison segments, the method further includes: if the data comparison result in at least one comparison segment is the same, determining whether the first room source picture and the second room source picture are similar pictures according to the difference bit number between the hash value of the first room source picture and the hash value of the second room source picture, including: for each comparison segment whose data differ, determining the number of differing bits as the data comparison result of that segment, wherein the total number of differing bits is the difference bit number between the hash value of the first room source picture and the hash value of the second room source picture; if the difference bit number exceeds a second preset number, determining that the first room source picture and the second room source picture are dissimilar pictures; and if the difference bit number does not exceed the second preset number, determining that the first room source picture and the second room source picture are similar pictures.
If the data in at least one comparison segment are the same, the differing bits can be counted over only those comparison segments that differ, which gives the difference bit number of the hash values. Whether the first room source picture and the second room source picture are similar pictures can then be determined as follows: for each comparison segment whose data differ, the number of differing bits is taken as the data comparison result of that segment; if the difference bit number exceeds a second preset number, the first room source picture and the second room source picture are determined to be dissimilar pictures; and if the difference bit number does not exceed the second preset number, they are determined to be similar pictures. The second preset number may be 5 bits, as in the above example. This scheme avoids recomputing comparison segments that are already known to be identical. In this embodiment, if the hash value has 64 bits, the 6 comparison segments may include five segments of 11 bits and one segment of 9 bits.
In this technical solution, preferably, the second preset number is the first preset number minus 1. With this arrangement, when all comparison segments differ, the two pictures can be directly determined to be dissimilar pictures, which increases the calculation speed.
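The segmented comparison described above can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation: the function names, the choice of representing a hash as an integer, and the segment layout of five 11-bit segments plus one 9-bit segment for a 64-bit hash are assumptions made here.

```python
# Assumed layout for a 64-bit hash: five 11-bit segments and one 9-bit segment.
SEGMENT_WIDTHS = (11, 11, 11, 11, 11, 9)
# Second preset number = first preset number - 1 (5 for 6 segments).
MAX_DIFF_BITS = len(SEGMENT_WIDTHS) - 1


def split_segments(hash_value, widths=SEGMENT_WIDTHS):
    """Split an integer hash into segments, most significant segment first."""
    segments = []
    shift = sum(widths)
    for width in widths:
        shift -= width
        segments.append((hash_value >> shift) & ((1 << width) - 1))
    return segments


def is_similar(hash_a, hash_b, widths=SEGMENT_WIDTHS, max_diff_bits=MAX_DIFF_BITS):
    """Return True if the two hashes differ in at most max_diff_bits bits."""
    seg_a = split_segments(hash_a, widths)
    seg_b = split_segments(hash_b, widths)
    # Fast rejection: if every segment differs, at least len(widths) bits differ.
    # This shortcut is only valid when the bit limit is below the segment count.
    if len(widths) > max_diff_bits and all(a != b for a, b in zip(seg_a, seg_b)):
        return False
    # Count differing bits only in segments that differ; equal segments add 0.
    diff_bits = sum(
        bin(a ^ b).count("1") for a, b in zip(seg_a, seg_b) if a != b
    )
    return diff_bits <= max_diff_bits
```

Comparing segments as integers corresponds to the decimal-number comparison in the text; the bitwise XOR is applied only to segments already known to differ.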
Fig. 3 is a schematic structural diagram of an apparatus for identifying duplicate room sources provided in an embodiment of the present disclosure. As shown in fig. 3, the apparatus includes:
the hash value determining module 310 is configured to perform preset rule operation on the first room source picture and the second room source picture to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits;
the similar picture identification module 320 is configured to determine whether the first room source picture and the second room source picture are similar pictures according to the difference digit of the hash value of the first room source picture and the hash value of the second room source picture;
a repeated room source determining module 330, configured to determine that the first room source and the second room source are repeated room sources if the number of the similar pictures of the first room source picture and the second room source picture exceeds a similarity threshold.
According to the technical scheme provided by the embodiment of the disclosure, a first room source picture and a second room source picture are subjected to preset rule operation to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits; determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digits of the hash value of the first room source picture and the hash value of the second room source picture; and if the number of the similar pictures of the first room source picture and the second room source picture exceeds a similar threshold value, determining that the first room source and the second room source are repeated room sources. By adopting the technical scheme provided by the disclosure, the repeated house sources on the platform can be identified based on the image identification technology, and the purpose of reducing the calculation amount can be achieved.
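The final decision step summarized above can be sketched as follows. The source does not state how similar pictures are counted across two room sources, so counting each picture of the first room source that has at least one similar picture in the second is an assumption made here; the function names are hypothetical, and the defaults of 5 differing bits and a similarity threshold of 4 are taken from the examples in the text.

```python
def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def count_similar_pictures(hashes_a, hashes_b, max_diff_bits=5):
    """Count pictures of room source A with at least one similar picture in B.

    Assumption: the counting rule is not specified in the source text.
    """
    return sum(
        1
        for ha in hashes_a
        if any(hamming_distance(ha, hb) <= max_diff_bits for hb in hashes_b)
    )


def is_duplicate_room_source(hashes_a, hashes_b, max_diff_bits=5,
                             similarity_threshold=4):
    """Two room sources are duplicates if the count exceeds the threshold."""
    similar = count_similar_pictures(hashes_a, hashes_b, max_diff_bits)
    return similar > similarity_threshold
```

Because the text says the number of similar pictures must exceed the threshold, the comparison is strict: with a threshold of 4, at least 5 similar pictures are needed.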
The apparatus can execute the method provided by any embodiment of the present disclosure and has the functional modules and beneficial effects corresponding to that method.
Fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure. Referring now to FIG. 4, a block diagram of an electronic device 400 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 4, the electronic device 400 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage device 406 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the electronic apparatus 400 are also stored. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 406 including, for example, magnetic tape, hard disk, etc.; and a communication device 409. The communication means 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 4 illustrates an electronic device 400 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 409, or from the storage means 406, or from the ROM 402. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 401.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform: performing preset rule operation on the first room source picture and the second room source picture to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits; determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digits of the hash value of the first room source picture and the hash value of the second room source picture; and if the number of the similar pictures of the first room source picture and the second room source picture exceeds a similar threshold value, determining that the first room source and the second room source are repeated room sources.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [example one] there is provided a method for identifying duplicate room sources, the method comprising:
performing preset rule operation on the first room source picture and the second room source picture to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits;
determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digits of the hash value of the first room source picture and the hash value of the second room source picture;
and if the number of the similar pictures of the first room source picture and the second room source picture exceeds a similar threshold value, determining that the first room source and the second room source are repeated room sources.
According to one or more embodiments of the present disclosure, [example two] there is provided a method for identifying duplicate room sources, wherein performing the preset rule operation on the first room source picture and the second room source picture to obtain the hash value of the first room source picture and the hash value of the second room source picture includes:
acquiring a gray-scale image of a first room source picture and a gray-scale image of a second room source picture;
modifying the image sizes of the gray level image of the first room source picture and the gray level image of the second room source picture to obtain a gray level image of the first room source picture with a preset size and a gray level image of the second room source picture with a preset size;
and determining the hash value of the first room source picture and the hash value of the second room source picture according to the gradient information of the gray level map of the first room source picture with the preset size and the gray level map of the second room source picture with the preset size.
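The first two steps above (grayscale conversion and resizing) can be sketched without external libraries. The source does not say how the gray-scale image is computed or how the image is resized; the BT.601 luma weights and nearest-neighbour sampling used below are assumptions chosen for simplicity, and the function names are hypothetical. The default size of 8 rows by 9 columns is the preset size given in example three.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of integer gray values.

    Assumption: BT.601 luma weights; the source does not specify a formula.
    """
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


def resize_nearest(gray, rows=8, cols=9):
    """Resize a gray-scale image to rows x cols by nearest-neighbour sampling.

    Assumption: the source only says the image size is modified.
    """
    src_rows, src_cols = len(gray), len(gray[0])
    return [
        [gray[r * src_rows // rows][c * src_cols // cols] for c in range(cols)]
        for r in range(rows)
    ]
```

In practice an image library with an averaging or anti-aliasing resize would usually give hashes that are more robust to small image changes; nearest-neighbour sampling is used here only to keep the sketch self-contained.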
According to one or more embodiments of the present disclosure, [ example three ] there is provided a method for identifying a repeating room source, wherein the preset size is an image size of 8 rows and 9 columns of pixel points;
correspondingly, determining the hash value of the first room source picture and the hash value of the second room source picture according to the gradient information of the gray level map of the first room source picture with the preset size and the gray level map of the second room source picture with the preset size, including:
for each pixel point of the gray-scale image of the first room source picture with the preset size and of the gray-scale image of the second room source picture with the preset size, if the gray value of the adjacent pixel point to its right in the same row is larger than the gray value of the pixel point, marking the pixel point as 1; otherwise, marking the pixel point as 0;
and respectively obtaining 8 rows and 8 columns of mark numerical values as the hash value of the first room source picture and the hash value of the second room source picture.
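The marking rule of example three can be sketched as follows: each of the 8 rows of 9 gray values yields 8 left-to-right comparisons, giving 64 flags in total. Packing the flags into a single integer, row by row with the first flag as the most significant bit, is an assumption made here for convenience; the function names are hypothetical.

```python
def dhash_bits(gray_8x9):
    """Return the flags row by row: 1 if the right neighbour is brighter, else 0."""
    bits = []
    for row in gray_8x9:
        for col in range(len(row) - 1):
            bits.append(1 if row[col + 1] > row[col] else 0)
    return bits


def bits_to_int(bits):
    """Pack a list of 0/1 flags into an integer, first flag most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value
```

For an 8-row, 9-column input, `bits_to_int(dhash_bits(grid))` is a 64-bit value whose difference bit number against another hash can be obtained by XOR followed by a bit count.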
According to one or more embodiments of the present disclosure, [example four] there is provided a method for identifying duplicate room sources, wherein before determining whether the first room source picture and the second room source picture are similar pictures according to the difference bit number between the hash value of the first room source picture and the hash value of the second room source picture, the method further includes:
the hash value of the first room source picture and the hash value of the second room source picture are segmented into a first preset number of comparison segments;
and if the data comparison result in each comparison section is different, determining that the first room source picture and the second room source picture are not similar pictures.
According to one or more embodiments of the present disclosure, [example five] there is provided a method for identifying duplicate room sources, wherein after segmenting the hash value of the first room source picture and the hash value of the second room source picture into a first preset number of comparison segments, the method further comprises:
if the data comparison result in at least one comparison segment is the same, determining whether the first room source picture and the second room source picture are similar pictures according to the difference bit number between the hash value of the first room source picture and the hash value of the second room source picture, including:
for each comparison segment whose data differ, determining the number of differing bits as the data comparison result of that segment, wherein the total number of differing bits is the difference bit number between the hash value of the first room source picture and the hash value of the second room source picture;
if the difference bit number exceeds a second preset number, determining that the first room source picture and the second room source picture are dissimilar pictures; and if the difference bit number does not exceed the second preset number, determining that the first room source picture and the second room source picture are similar pictures.
According to one or more embodiments of the present disclosure, [example six] there is provided a method for identifying duplicate room sources, wherein the second preset number is the first preset number minus 1.
According to one or more embodiments of the present disclosure, [example seven] there is provided a method for identifying duplicate room sources, wherein the similarity threshold is 4.
According to one or more embodiments of the present disclosure, [example eight] there is provided an apparatus for identifying duplicate room sources, including:
the hash value determining module is used for performing preset rule operation on the first room source picture and the second room source picture to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits;
the similar picture identification module is used for determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digit of the hash value of the first room source picture and the hash value of the second room source picture;
and the repeated house source determining module is used for determining that the first house source and the second house source are repeated house sources if the number of the similar pictures of the first house source picture and the second house source picture exceeds a similar threshold value.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A method for identifying duplicate sources, comprising:
performing preset rule operation on the first room source picture and the second room source picture to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits;
determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digits of the hash value of the first room source picture and the hash value of the second room source picture;
and if the number of the similar pictures of the first room source picture and the second room source picture exceeds a similar threshold value, determining that the first room source and the second room source are repeated room sources.
2. The method of claim 1, wherein performing a predetermined rule operation on the first room source picture and the second room source picture to obtain the hash value of the first room source picture and the hash value of the second room source picture comprises:
acquiring a gray-scale image of a first room source picture and a gray-scale image of a second room source picture;
modifying the image sizes of the gray level image of the first room source picture and the gray level image of the second room source picture to obtain a gray level image of the first room source picture with a preset size and a gray level image of the second room source picture with a preset size;
and determining the hash value of the first room source picture and the hash value of the second room source picture according to the gradient information of the gray level map of the first room source picture with the preset size and the gray level map of the second room source picture with the preset size.
3. The method of claim 2, wherein the predetermined size is an image size of 8 rows and 9 columns of pixels;
correspondingly, determining the hash value of the first room source picture and the hash value of the second room source picture according to the gradient information of the gray level map of the first room source picture with the preset size and the gray level map of the second room source picture with the preset size, including:
for each pixel point of the gray-scale image of the first room source picture with the preset size and of the gray-scale image of the second room source picture with the preset size, if the gray value of the adjacent pixel point to its right in the same row is larger than the gray value of the pixel point, marking the pixel point as 1; otherwise, marking the pixel point as 0;
and respectively obtaining 8 rows and 8 columns of mark numerical values as the hash value of the first room source picture and the hash value of the second room source picture.
4. The method according to claim 1, wherein before determining whether the first room source picture and the second room source picture are similar pictures according to a difference bit number of the hash value of the first room source picture and the hash value of the second room source picture, the method further comprises:
the hash value of the first room source picture and the hash value of the second room source picture are segmented into a first preset number of comparison segments;
and if the data comparison result in each comparison section is different, determining that the first room source picture and the second room source picture are not similar pictures.
5. The method of claim 4, wherein after the hash value of the first room source picture and the hash value of the second room source picture are sliced into a first preset number of comparison segments, the method further comprises:
if the data comparison result in at least one comparison segment is the same, determining whether the first room source picture and the second room source picture are similar pictures according to the difference bit number between the hash value of the first room source picture and the hash value of the second room source picture, including:
for each comparison segment whose data differ, determining the number of differing bits as the data comparison result of that segment, wherein the total number of differing bits is the difference bit number between the hash value of the first room source picture and the hash value of the second room source picture;
if the difference bit number exceeds a second preset number, determining that the first room source picture and the second room source picture are dissimilar pictures; and if the difference bit number does not exceed the second preset number, determining that the first room source picture and the second room source picture are similar pictures.
6. The method of claim 5, wherein the second predetermined number is the first predetermined number minus 1.
7. The method of claim 1, wherein the similarity threshold is 4.
8. An apparatus for identifying duplicate sources, comprising:
the hash value determining module is used for performing preset rule operation on the first room source picture and the second room source picture to obtain a hash value of the first room source picture and a hash value of the second room source picture; the hash value of the first room source picture and the hash value of the second room source picture are both hash values of target bits;
the similar picture identification module is used for determining whether the first room source picture and the second room source picture are similar pictures or not according to the difference digit of the hash value of the first room source picture and the hash value of the second room source picture;
and the repeated house source determining module is used for determining that the first house source and the second house source are repeated house sources if the number of the similar pictures of the first house source picture and the second house source picture exceeds a similar threshold value.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for identifying duplicate room sources as claimed in any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the method for identifying duplicate room sources as claimed in any one of claims 1 to 7.
CN201910865969.8A 2019-09-12 2019-09-12 Method and device for identifying repeated house sources, electronic equipment and readable medium Pending CN110633383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910865969.8A CN110633383A (en) 2019-09-12 2019-09-12 Method and device for identifying repeated house sources, electronic equipment and readable medium

Publications (1)

Publication Number Publication Date
CN110633383A true CN110633383A (en) 2019-12-31

Family

ID=68971176

Country Status (1)

Country Link
CN (1) CN110633383A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259966A (en) * 2020-01-17 2020-06-09 青梧桐有限责任公司 Method and system for identifying homonymous cell with multi-feature fusion
CN111275096A (en) * 2020-01-17 2020-06-12 青梧桐有限责任公司 Homonymous cell identification method and system based on image identification
CN112419312A (en) * 2020-12-11 2021-02-26 五八有限公司 Similar house source information detection method and device, electronic equipment and readable medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912643A (en) * 2016-04-08 2016-08-31 浙江理工大学 Image retrieval method based on content improved Average Hash
CN108763570A (en) * 2018-06-05 2018-11-06 北京拓世寰宇网络技术有限公司 Method and device for identifying identical house sources
CN109753576A (en) * 2018-12-25 2019-05-14 上海七印信息科技有限公司 Method for retrieving similar images

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
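Han Hongqi: "Theory and Application of Semantic Fingerprint Author Name Disambiguation", 31 July 2018 *
Huang Jiaheng, Li Xiaowei, Chen Benhui, Yang Dengqi: "Comparative Study of Hash-Based Image Similarity Algorithms", Journal of Dali University *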

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259966A (en) * 2020-01-17 2020-06-09 青梧桐有限责任公司 Method and system for identifying homonymous cell with multi-feature fusion
CN111275096A (en) * 2020-01-17 2020-06-12 青梧桐有限责任公司 Homonymous cell identification method and system based on image identification
CN112419312A (en) * 2020-12-11 2021-02-26 五八有限公司 Similar house source information detection method and device, electronic equipment and readable medium
CN112419312B (en) * 2020-12-11 2023-04-07 五八有限公司 Similar house source information detection method and device, electronic equipment and readable medium

Similar Documents

Publication Publication Date Title
CN110222775B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN110634047B (en) Method and device for recommending house resources, electronic equipment and storage medium
CN111784712B (en) Image processing method, device, equipment and computer readable medium
CN110633383A (en) Method and device for identifying repeated house sources, electronic equipment and readable medium
CN110705511A (en) Blurred image recognition method, device, equipment and storage medium
CN110321447A (en) Determination method, apparatus, electronic equipment and the storage medium of multiimage
CN111738316A (en) Image classification method and device for zero sample learning and electronic equipment
CN113918659A (en) Data operation method and device, storage medium and electronic equipment
CN113255812B (en) Video frame detection method and device and electronic equipment
CN111209432A (en) Information acquisition method and device, electronic equipment and computer readable medium
CN115358958A (en) Special effect graph generation method, device and equipment and storage medium
CN113033680B (en) Video classification method and device, readable medium and electronic equipment
CN115272182A (en) Lane line detection method, lane line detection device, electronic device, and computer-readable medium
CN111414921B (en) Sample image processing method, device, electronic equipment and computer storage medium
CN111915532B (en) Image tracking method and device, electronic equipment and computer readable medium
CN111258582B (en) Window rendering method and device, computer equipment and storage medium
CN111338827A (en) Method and device for pasting table data and electronic equipment
CN114332324B (en) Image processing method, device, equipment and medium
CN111680754B (en) Image classification method, device, electronic equipment and computer readable storage medium
CN111737575B (en) Content distribution method, content distribution device, readable medium and electronic equipment
CN118262188A (en) Object detection model training method, object detection information generating method and device
CN115272061A (en) Method, device and equipment for generating special effect video and storage medium
CN110209851B (en) Model training method and device, electronic equipment and storage medium
CN110189279B (en) Model training method and device, electronic equipment and storage medium
CN110752958A (en) User behavior analysis method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230419

Address after: Room 802, Information Building, 13 Linyin North Street, Pinggu District, Beijing, 101299

Applicant after: Beijing Youzhuju Network Technology Co., Ltd.

Address before: No. 715, 7th floor, building 3, 52 Zhongguancun South Street, Haidian District, Beijing 100081

Applicant before: Beijing Infinite Light Field Technology Co., Ltd.