CN111259966A - Method and system for identifying homonymous cell with multi-feature fusion - Google Patents
- Publication number
- CN111259966A (application number CN202010053091.0A)
- Authority
- CN
- China
- Prior art keywords
- cell
- image
- distinguished
- name
- value
- Prior art date
- Legal status
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/467—Encoded features or binary features, e.g. local binary patterns [LBP]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Remote Sensing (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and a system for distinguishing homonymous cells with multi-feature fusion, wherein the method comprises the following steps: acquiring first/second longitude and latitude information of a first/second cell to be distinguished; determining the distance between two cells; when the distance is smaller than or equal to a first preset threshold value, acquiring a first/second image and a first/second name of a first/second cell to be distinguished; calculating a first/second LBP characteristic value of the first/second image, and determining an image similarity value between the first image and the second image; calculating a text similarity value between the first name and the second name; according to preset weight, carrying out weighted average on the image similarity value and the text similarity value to obtain a similarity score between two cells; and determining a discrimination result according to the similarity score. According to the method and the device, three characteristics of the distance between the cells to be distinguished, the image similarity value and the text similarity between the cell names are comprehensively considered, so that misjudgment can be avoided, and the distinguishing accuracy is effectively improved.
Description
Technical Field
The invention relates to the technical field of information processing, and in particular to a multi-feature-fusion method for identifying homonymous cells (residential communities listed under the same or similar names).
Background
With the rapid spread of the internet, house renting and selling platforms have become widely used. House brokers publish housing listings on these platforms so that users can conveniently search for the listings they need on a housing website by setting filter conditions.
However, in some application scenarios a cell A may also be listed under an alias B; different house brokers may then use different cell names when publishing their listings, so that a user searching for listings cannot tell whether two listings refer to the same property. In another scenario, two different cells may have the same or similar names, so that a user may mistakenly take two distinct listings to refer to the same property.
To solve the above problem, the prior-art method for determining whether two cell names refer to the same cell is as follows: first, check whether the two cells are located in the same city and the same urban district; if so, further calculate the text similarity of the two cell names, and if the text similarity is greater than or equal to 90%, conclude that the two names refer to the same cell.
However, this homonymous-cell identification method misjudges frequently when a cell has an alias whose text similarity to its primary name is below 90%, or when the names of two different cells have a text similarity above 90%, so the identification accuracy is greatly reduced.
Disclosure of Invention
The invention provides a homonymous cell distinguishing method with multi-feature fusion, which can effectively improve the distinguishing accuracy of homonymous cells and reduce the misjudgment risk.
In a first aspect, the present application provides a method for identifying a homonymous cell with multi-feature fusion, where the method includes:
acquiring first longitude and latitude information of a first cell to be identified and second longitude and latitude information of a second cell to be identified;
determining the distance between the first cell to be distinguished and the second cell to be distinguished according to the first longitude and latitude information and the second longitude and latitude information;
when the distance between the first cell to be distinguished and the second cell to be distinguished is smaller than or equal to a first preset threshold value, acquiring a first image and a first name of the first cell to be distinguished and a second image and a second name of the second cell to be distinguished; the first image comprises a first building body of a first cell to be identified, and the second image comprises a second building body of a second cell to be identified;
calculating a first LBP characteristic value of the first image and a second LBP characteristic value of the second image, and determining an image similarity value between the first image and the second image according to the first LBP characteristic value and the second LBP characteristic value;
calculating a text similarity value between the first name and the second name;
according to a preset weight, carrying out weighted average on the image similarity value and the text similarity value to obtain a similarity score between a first cell to be distinguished and a second cell to be distinguished;
and determining a discrimination result according to the similarity score.
Optionally, the step of calculating a first LBP feature value of the first image and a second LBP feature value of the second image, and determining an image similarity value between the first image and the second image according to the first LBP feature value and the second LBP feature value includes:
sliding a preset window on the first image/the second image, and comparing the pixel values of the first pixel points/the second pixel points at each adjacent point of the current central point with a threshold value by taking the pixel value of the first pixel point/the second pixel point at the current central point in the preset window as the threshold value when the preset window slides each time, so as to obtain a first unsigned binary number/a second unsigned binary number corresponding to the current central point;
converting the first unsigned binary number/the second unsigned binary number into a decimal number to obtain an LBP characteristic value of a first pixel point/a second pixel point at the current central point;
after each first pixel point of a non-boundary region in the first image and each second pixel point of a non-boundary region in the second image are traversed in a sliding mode, taking the obtained LBP characteristic value of each first pixel point/second pixel point as a first LBP characteristic value/a second LBP characteristic value of the first image/second image;
calculating a cosine similarity between the first LBP feature value and the second LBP feature value as an image similarity value between the first image and the second image.
Optionally, the size of the preset window is 3 × 3;
the step of calculating the first unsigned binary number/the second unsigned binary number corresponding to the current center point comprises the following steps:
judging whether the pixel value of a first pixel point/a second pixel point at each adjacent point of the current central point in the preset window is larger than or equal to the threshold value or not; if so, marking the neighbor point as "1"; if not, marking the adjacent points as '0';
and sequentially arranging the marks of all the adjacent points in the preset window according to a preset sequence to obtain 8-bit first unsigned binary number/second unsigned binary number.
Optionally, before the step of obtaining the first longitude and latitude information of the first cell to be identified and the second longitude and latitude information of the second cell to be identified, the method further includes:
acquiring a first city and a first urban area where the first cell to be identified is located, and a second city and a second urban area where the second cell to be identified is located;
comparing whether a first city in which the first cell to be distinguished is located is the same as a second city in which the second cell to be distinguished is located; if not, the first cell to be distinguished and the second cell to be distinguished are not the same cell;
if so, comparing whether a first urban area in which the first cell to be distinguished is located is the same as a second urban area in which the second cell to be distinguished is located: if not, the first cell to be distinguished and the second cell to be distinguished are not the same cell; and if so, executing the step of determining the distance between the first cell to be distinguished and the second cell to be distinguished according to the first longitude and latitude information and the second longitude and latitude information.
Optionally, the first longitude and latitude information includes a first longitude and a first latitude of the first cell to be distinguished, and the second longitude and latitude information includes a second longitude and a second latitude of the second cell to be distinguished;
the distance between the first cell to be distinguished and the second cell to be distinguished is obtained by adopting the following formula:
d = R · arccos( sin φ₁ · sin φ₂ + cos φ₁ · cos φ₂ · cos Δλ )

wherein d is the distance between the first cell to be distinguished and the second cell to be distinguished, R is the radius of the earth, φ₁ and φ₂ are the first latitude and the second latitude, respectively, and Δλ represents the difference between the first longitude and the second longitude.
Optionally, after the step of determining the distance between the first cell to be identified and the second cell to be identified according to the first longitude and latitude information and the second longitude and latitude information, the method further includes:
when the distance between a first cell to be distinguished and a second cell to be distinguished is larger than the first preset threshold value, the distinguishing result is that the first cell to be distinguished and the second cell to be distinguished are not the same cell.
Optionally, the step of calculating the text similarity value between the first name and the second name includes:
preprocessing a text corresponding to the first name and a text corresponding to the second name to respectively obtain a first text and a second text;
after word segmentation processing is carried out on the first text and the second text, vectorizing is carried out on the first text and the second text respectively by utilizing a word vector model trained in advance, and a first word vector and a second word vector are obtained;
and calculating the cosine similarity of the first word vector and the second word vector as the text similarity of the first name and the second name.
Optionally, the step of preprocessing the text corresponding to the first name and the text corresponding to the second name to obtain the first text and the second text respectively includes:
and removing invalid suffixes and symbol characters in the text corresponding to the first name and the text corresponding to the second name, converting uppercase English characters into lowercase English characters, and converting numerals into Chinese characters.
Optionally, the step of determining a discrimination result according to the similarity score includes:
judging whether the similarity score is greater than or equal to a second preset threshold value or not; if so, determining that the first cell to be distinguished and the second cell to be distinguished are the same cell; if not, the first cell to be distinguished and the second cell to be distinguished are not the same cell.
In a second aspect, the present application provides a multi-feature fused homonymous cell discrimination system, the system comprising:
the first acquiring module is used for acquiring first longitude and latitude information of a first cell to be identified and second longitude and latitude information of a second cell to be identified;
the distance determining module is used for determining the distance between the first cell to be identified and the second cell to be identified according to the first longitude and latitude information and the second longitude and latitude information;
a second obtaining module, configured to obtain a first image and a first name of the first cell to be identified, and a second image and a second name of the second cell to be identified, when a distance between the first cell to be identified and the second cell to be identified is smaller than or equal to a first preset threshold;
a first calculating module, configured to calculate a first LBP feature value of the first image and a second LBP feature value of the second image, and determine an image similarity value between the first image and the second image according to the first LBP feature value and the second LBP feature value;
the second calculation module is used for calculating a text similarity value between the first name and the second name;
the score obtaining module is used for carrying out weighted average on the image similarity value and the text similarity value according to preset weight to obtain a similarity score between a first cell to be distinguished and a second cell to be distinguished;
and the result determining module is used for determining a discrimination result according to the similarity score.
Compared with the prior art, the method and the system for identifying the homonymous cell with multi-feature fusion provided by the invention at least realize the following beneficial effects:
according to the homonymous cell identification method and system with multi-feature fusion, first longitude and latitude information of a first cell to be identified and second longitude and latitude information of a second cell to be identified are obtained; determining the distance between the first cell to be identified and the second cell to be identified according to the first longitude and latitude information and the second longitude and latitude information; when the distance between the first cell to be distinguished and the second cell to be distinguished is smaller than or equal to a first preset threshold value, acquiring a first image and a first name of the first cell to be distinguished and a second image and a second name of the second cell to be distinguished; calculating a first LBP characteristic value of the first image and a second LBP characteristic value of the second image, and determining an image similarity value between the first image and the second image according to the first LBP characteristic value and the second LBP characteristic value; calculating a text similarity value between the first name and the second name; according to preset weight, carrying out weighted average on the image similarity value and the text similarity value to obtain a similarity score between a first cell to be distinguished and a second cell to be distinguished; and determining a discrimination result according to the similarity score. Because the three characteristics of the distance between the cells to be distinguished, the text similarity between the cell names to be distinguished and the similarity between the building bodies of the cells to be distinguished are comprehensively considered, the misjudgment caused by the fact that the cells to be distinguished have the alias or the cell names to be distinguished are the same can be avoided, and the distinguishing accuracy is effectively improved.
Of course, it is not necessary for any product in which the present invention is practiced to achieve all of the above-described technical effects simultaneously.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart illustrating a method for identifying a homonymous cell with multi-feature fusion according to an embodiment of the present application;
fig. 2 is a schematic diagram illustrating calculation of LBP characteristic values in the method for identifying the cells of the same name provided in the embodiment of fig. 1;
fig. 3 is a schematic structural diagram of a homonymous cell identification system with multi-feature fusion according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
In the homonymous-cell identification method provided by the prior art, for cells to be distinguished that lie in the same city and urban district, whether they are the same cell is judged only by whether the text similarity of their names is greater than or equal to a preset threshold. It can be understood that if a cell has a name A and an alias B whose text similarity is less than 90%, A and B are wrongly judged to be different cells; conversely, if two different cells have names A and B that are the same or similar, so that the text similarity is greater than or equal to 90%, A and B are wrongly judged to be the same cell. Therefore, the prior-art method, which relies on text similarity as the only criterion, misjudges frequently and its identification accuracy is greatly reduced.
In view of the above, the invention provides a multi-feature-fusion method for identifying homonymous cells, which effectively improves the identification accuracy for homonymous cells and reduces the risk of misjudgment.
The following detailed description is to be read in connection with the drawings and the detailed description.
Fig. 1 is a flowchart illustrating a homonymous cell identification method with multi-feature fusion according to an embodiment of the present application. Referring to fig. 1, the method for identifying a cell of the same name includes:
Step 101, acquiring first longitude and latitude information of a first cell to be distinguished and second longitude and latitude information of a second cell to be distinguished;

Step 102, determining the distance between the first cell to be distinguished and the second cell to be distinguished according to the first longitude and latitude information and the second longitude and latitude information;

Step 103, when the distance between the first cell to be distinguished and the second cell to be distinguished is smaller than or equal to a first preset threshold value, acquiring a first image and a first name of the first cell to be distinguished, and a second image and a second name of the second cell to be distinguished; the first image comprises a first building body of the first cell to be distinguished, and the second image comprises a second building body of the second cell to be distinguished;

Step 104, calculating a first LBP characteristic value of the first image and a second LBP characteristic value of the second image, and determining an image similarity value between the first image and the second image according to the first LBP characteristic value and the second LBP characteristic value;

Step 105, calculating a text similarity value between the first name and the second name;

Step 106, carrying out weighted average on the image similarity value and the text similarity value according to a preset weight to obtain a similarity score between the first cell to be distinguished and the second cell to be distinguished;

Step 107, determining a discrimination result according to the similarity score.
Specifically, after first longitude and latitude information of a first cell to be identified and second longitude and latitude information of a second cell to be identified are obtained, a distance between the two cells can be calculated, and preliminary identification is performed in a distance limiting mode; then, for two cells to be identified which may be the same cell, respectively obtaining a first image and a first name of a first cell to be identified and a second image and a second name of a second cell to be identified, and performing weighted average on the text similarity value between the two cell names and the image similarity value between the building bodies of the two cells; and finally, determining a discrimination result according to the similarity score obtained by weighted average.
Because the three characteristics of the distance between the cells to be distinguished, the text similarity between the cell names to be distinguished and the similarity between the building bodies of the cells to be distinguished are comprehensively considered, the misjudgment caused by the fact that the cells to be distinguished have the alias or the cell names to be distinguished are the same can be avoided, and the distinguishing accuracy is effectively improved.
Optionally, in the step 104, the step of calculating a first LBP feature value of the first image and a second LBP feature value of the second image, and determining an image similarity value between the first image and the second image according to the first LBP feature value and the second LBP feature value includes:
sliding a preset window on the first image/the second image, taking the pixel value of a first pixel point/a second pixel point at the current central point in the preset window as a threshold value during each sliding, and comparing the pixel value of the first pixel point/the second pixel point at each adjacent point of the current central point with the threshold value to obtain a first unsigned binary number/a second unsigned binary number corresponding to the current central point;
converting the first unsigned binary number/the second unsigned binary number into a decimal number to obtain an LBP characteristic value of a first pixel point/a second pixel point at the current central point;
after each first pixel point of a non-boundary region in the first image and each second pixel point of a non-boundary region in the second image are traversed in a sliding mode, the obtained LBP characteristic value of each first pixel point/second pixel point is used as a first LBP characteristic value/a second LBP characteristic value of the first image/the second image;
and calculating the cosine similarity between the first LBP characteristic value and the second LBP characteristic value as the image similarity value between the first image and the second image.
Specifically, the size of the preset window may be 3 × 3.
LBP (Local Binary Pattern) is a descriptor commonly used in computer vision and is robust to monotonic gray-scale changes. Therefore, the first image and the second image may be converted into gray-scale images before the first LBP feature value and the second LBP feature value are calculated.
Fig. 2 is a schematic diagram illustrating calculation of LBP characteristic values in the method for identifying the cells of the same name provided in the embodiment of fig. 1. Referring to fig. 2, taking a preset window of size 3 × 3 as an example, when the preset window slides over the first image/the second image, the pixel value C of the first pixel point/second pixel point at the current central point in the preset window is taken as the threshold value; then the pixel values P₀, P₁, …, P₆, P₇ of the first pixel points/second pixel points at the adjacent points of the current central point are each compared with the threshold value C to obtain the first unsigned binary number/second unsigned binary number corresponding to the current central point.
Optionally, the step of calculating the first unsigned binary number/the second unsigned binary number corresponding to the current center point comprises:
s1, judging whether the pixel value of the first pixel point/the second pixel point at each adjacent point of the current central point in the preset window is larger than or equal to a threshold value; if so, then mark the adjacent point as "1"; if not, marking adjacent points as '0';
and S2, sequentially arranging the marks of each adjacent point in the preset window according to a preset sequence to obtain a first unsigned binary number/a second unsigned binary number of 8 bits.
Specifically, please refer to fig. 2 and the following formula:

aᵢ = 1 if pᵢ ≥ C, and aᵢ = 0 if pᵢ < C

In the formula, aᵢ represents the mark of the i-th adjacent point of the current central point; C represents the pixel value of the first pixel point/second pixel point at the current central point, namely the threshold value; and pᵢ represents the pixel value of the first pixel point/second pixel point at the i-th adjacent point of the current central point.
When calculating a first unsigned binary number/a second unsigned binary number corresponding to a current central point, if the pixel value of a first pixel point/a second pixel point at each adjacent point of the current central point is more than or equal to a threshold value, marking the adjacent point as '1'; and if the pixel value of the first pixel point/the second pixel point at each adjacent point of the current central point is less than the threshold value, marking the adjacent point as '0'.
In step S2, the labels of adjacent points in the preset window may be arranged in sequence in a clockwise or counterclockwise order. It can be understood that, since the size of the preset window is 3 × 3, the number of the neighboring points of the current center point is 8, and thus the first unsigned binary number and the second unsigned binary number obtained after the comparison are both 8 bits.
The LBP feature can accurately describe local texture information of an image by comparing relative gray levels of pixel points. However, such localized features lack coarse-grained understanding of the overall information of the image, and are also susceptible to noise and are not robust enough, resulting in some loss of specific local structural feature information.
Therefore, in this embodiment, an average value of pixel values of all the first pixel points/second pixel points corresponding to the preset window may also be calculated, and the average value is used as a threshold value for subsequent comparison. Therefore, the problem of local characteristic information loss can be solved, and a large-scale structure of an image can be captured; meanwhile, the method has stronger robustness to rotation and light change within a certain range, thereby improving the calculation accuracy of the image similarity value.
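As a concrete illustration of steps S1 and S2 and of the cosine comparison described above, the following Python sketch computes the per-pixel LBP values of a grayscale image and the cosine similarity between two feature vectors. It assumes both images have already been converted to grayscale numpy arrays of identical size (otherwise the two feature vectors cannot be compared element-wise); the function names, the clockwise neighbour order, and the optional mean-threshold switch are illustrative choices, not taken verbatim from the patent.

```python
import numpy as np

def lbp_features(gray, use_mean_threshold=False):
    """Per-pixel LBP values for every non-border pixel of a grayscale image.

    gray: 2-D numpy array of pixel intensities.
    use_mean_threshold: if True, threshold the neighbours against the mean of
    the 3x3 window (the variant discussed above) rather than the centre pixel.
    """
    h, w = gray.shape
    # The 8 neighbours of the centre, enumerated clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    feats = np.zeros((h - 2, w - 2), dtype=np.float64)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = gray[y - 1:y + 2, x - 1:x + 2].astype(np.float64)
            threshold = window.mean() if use_mean_threshold else float(gray[y, x])
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Mark a neighbour "1" when its value reaches the threshold.
                if gray[y + dy, x + dx] >= threshold:
                    code |= 1 << (7 - bit)
            feats[y - 1, x - 1] = code  # decimal value of the 8-bit binary number
    return feats.ravel()


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0
```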
Optionally, before the step of obtaining the first longitude and latitude information of the first cell to be identified and the second longitude and latitude information of the second cell to be identified, the method further includes:
acquiring a first city and a first urban area where a first cell to be identified is located, and a second city and a second urban area where a second cell to be identified is located;
comparing whether a first city in which the first cell to be distinguished is located is the same as a second city in which the second cell to be distinguished is located; if not, the first cell to be distinguished and the second cell to be distinguished are not the same cell;
if so, comparing whether a first urban area in which the first cell to be distinguished is located is the same as a second urban area in which the second cell to be distinguished is located: if not, the first cell to be distinguished and the second cell to be distinguished are not the same cell; and if so, executing the step of determining the distance between the first cell to be distinguished and the second cell to be distinguished according to the first longitude and latitude information and the second longitude and latitude information.
Obviously, if two cells to be distinguished are the same cell, then both must also be in the same city and urban area, i.e.: two cells to be distinguished in different cities or urban areas must not be the same cell. Therefore, aiming at the condition that the cell to be distinguished is in different cities or urban areas, the distinguishing result can be quickly obtained by comparing the first basic information with the second basic information without subsequent calculation, so that the calculation resources are greatly saved, and the real-time performance of the algorithm is improved.
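A minimal sketch of this pre-check, assuming each cell to be distinguished is represented as a dict with hypothetical "city" and "district" keys:

```python
def same_city_and_district(cell_a, cell_b):
    """Early exit: cells in different cities or urban districts cannot match."""
    return (cell_a["city"] == cell_b["city"]
            and cell_a["district"] == cell_b["district"])
```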
Optionally, the first longitude and latitude information includes a first longitude and a first latitude of the first cell to be distinguished, and the second longitude and latitude information includes a second longitude and a second latitude of the second cell to be distinguished;
the distance between the first cell to be distinguished and the second cell to be distinguished is obtained by adopting the following formula:
d = R · arccos( sin φ₁ · sin φ₂ + cos φ₁ · cos φ₂ · cos Δλ )

wherein d is the distance between the first cell to be distinguished and the second cell to be distinguished, R is the radius of the earth, φ₁ and φ₂ are the first latitude and the second latitude, respectively, and Δλ represents the difference between the first longitude and the second longitude.
In this embodiment, the first longitude and latitude information and the second longitude and latitude information may be obtained from a map application, and the earth radius may be taken as 6371 km. Because the earth is approximately a sphere, the above formula converts the latitude and longitude coordinates into a great-circle distance, so that the distance between the cells to be distinguished can be calculated accurately.
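The sketch below is a minimal Python version of this distance computation, assuming the great-circle (spherical law of cosines) form shown above, which is consistent with the variables named in the description (R, the two latitudes, and Δλ), and the 6371 km earth radius suggested in this embodiment.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean earth radius in metres, per the embodiment above

def cell_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance (metres) between two cells given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lon2 - lon1)
    # Spherical law of cosines; clamp to [-1, 1] to absorb floating-point error.
    cos_c = (math.sin(phi1) * math.sin(phi2)
             + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lambda))
    return EARTH_RADIUS_M * math.acos(max(-1.0, min(1.0, cos_c)))
```

With the 1500-metre example threshold discussed below, `cell_distance(...) > 1500` would immediately yield the result "not the same cell".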
Optionally, after the step of determining the distance between the first cell to be identified and the second cell to be identified according to the first longitude and latitude information and the second longitude and latitude information, the method further includes:
and when the distance between the first cell to be distinguished and the second cell to be distinguished is larger than a first preset threshold value, the distinguishing result is that the first cell to be distinguished and the second cell to be distinguished are not the same cell.
The first preset threshold value can be set according to the area of the cells to be distinguished. In this embodiment, the first preset threshold may be 1500 meters. When the distance between the first cell to be distinguished and the second cell to be distinguished is larger than 1500 meters, the discrimination result that the two are not the same cell is obtained directly, without any of the subsequent image and text computations, which greatly increases the operation speed of the algorithm.
Optionally, the step of calculating the text similarity value between the first name and the second name includes:
preprocessing a text corresponding to the first name and a text corresponding to the second name to respectively obtain a first text and a second text;
after word segmentation processing is carried out on the first text and the second text, vectorizing is carried out on the first text and the second text respectively by utilizing a word vector model trained in advance, and a first word vector and a second word vector are obtained;
and calculating the cosine similarity of the first word vector and the second word vector as the text similarity of the first name and the second name.
The text similarity represents the degree of matching between two or more texts; a larger text similarity value indicates that the texts are more similar, while a smaller value indicates that they are less similar. In addition, different granularities may be selected when vectorizing the text; for example, vectorization may be performed in units of single characters or in units of words.
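A hedged Python sketch of this step follows. It assumes jieba is used for word segmentation and that the pre-trained word-vector model is exposed as a dict-like table mapping tokens to numpy arrays; averaging the token vectors is one common way to turn a segmented name into a single vector and is an illustrative choice rather than the patent's prescribed one.

```python
import numpy as np
import jieba  # assumed Chinese word-segmentation library; any tokenizer would do

def name_vector(text, word_vectors, dim=100):
    """Average the vectors of the tokens that appear in the word-vector table."""
    tokens = jieba.lcut(text)
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def text_similarity(name_a, name_b, word_vectors, dim=100):
    """Cosine similarity between the word vectors of two cell names."""
    va = name_vector(name_a, word_vectors, dim)
    vb = name_vector(name_b, word_vectors, dim)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(np.dot(va, vb) / denom) if denom else 0.0
```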
Optionally, the step of preprocessing the text corresponding to the first name and the text corresponding to the second name to obtain the first text and the second text, respectively, includes:
and removing invalid suffix and symbol characters in the text corresponding to the first name and the text corresponding to the second name, converting capital English characters into lowercase English characters, and converting numbers into Chinese characters.
Specifically, an invalid suffix may be a word with no practical distinguishing meaning in a name, such as "cell", "garden", or "building". Removing the invalid suffixes from the text corresponding to the first name and the text corresponding to the second name prevents these meaningless words from adversely affecting the calculation result, and improves the accuracy of the computed text similarity.
In addition, after the invalid suffixes are removed, stop words that appear in the text but are of little use for characterizing its features can be further removed from the text corresponding to the first name and the text corresponding to the second name, for example "a", "the", "of", "and", "or" in English, and the corresponding common particles in Chinese.
Obviously, before the text similarity is calculated, the stop words are removed, so that the density of the keywords can be improved, the dimensionality of the text can be reduced, the calculation accuracy of the text similarity is further improved, the algorithm efficiency is effectively improved, and the real-time performance is better.
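The preprocessing and stop-word removal described above might look like the sketch below; the suffix list, the stop-word set, and the per-digit numeral conversion are illustrative placeholders, not the patent's own dictionaries (a real implementation would use a proper number-to-Chinese converter rather than a digit-by-digit mapping).

```python
import re

INVALID_SUFFIXES = ("小区", "家园", "大厦")   # illustrative "cell"/"home"/"building" suffixes
STOP_WORDS = {"的", "了", "和"}               # illustrative Chinese stop words
DIGIT_TO_CN = dict(zip("0123456789", "零一二三四五六七八九"))

def preprocess_name(name):
    """Normalise a cell name before the similarity computation."""
    text = name.lower()                                      # uppercase English -> lowercase
    for suffix in INVALID_SUFFIXES:
        text = text.replace(suffix, "")                      # drop meaningless suffixes
    text = re.sub(r"[^\w\u4e00-\u9fff]", "", text)           # strip symbol characters
    text = "".join(DIGIT_TO_CN.get(ch, ch) for ch in text)   # digits -> Chinese numerals (per digit)
    return "".join(ch for ch in text if ch not in STOP_WORDS)
```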
Optionally, the step of determining the discrimination result according to the similarity score includes:
judging whether the similarity score is greater than or equal to a second preset threshold value or not; if so, the discrimination result is that the first cell to be discriminated and the second cell to be discriminated are the same cell; if not, the first cell to be distinguished and the second cell to be distinguished are not the same cell.
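A minimal sketch of this final fusion and decision step; the weights and the score threshold below stand in for the "preset weight" and the "second preset threshold" and would be tuned in practice.

```python
def discriminate(image_sim, text_sim, image_weight=0.5, text_weight=0.5,
                 score_threshold=0.9):
    """Weighted average of the two similarity values plus the final decision."""
    score = (image_weight * image_sim + text_weight * text_sim) / (image_weight + text_weight)
    return score, score >= score_threshold  # True -> judged to be the same cell
```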
The homonymous cell distinguishing method with multi-feature fusion provided by the invention at least realizes the following beneficial effects:
the method for distinguishing the homonymous cells with the multi-feature fusion comprises the steps of obtaining first longitude and latitude information of a first cell to be distinguished and second longitude and latitude information of a second cell to be distinguished; determining the distance between the first cell to be identified and the second cell to be identified according to the first longitude and latitude information and the second longitude and latitude information; when the distance between the first cell to be distinguished and the second cell to be distinguished is smaller than or equal to a first preset threshold value, acquiring a first image and a first name of the first cell to be distinguished and a second image and a second name of the second cell to be distinguished; calculating a first LBP characteristic value of the first image and a second LBP characteristic value of the second image, and determining an image similarity value between the first image and the second image according to the first LBP characteristic value and the second LBP characteristic value; calculating a text similarity value between the first name and the second name; according to preset weight, carrying out weighted average on the image similarity value and the text similarity value to obtain a similarity score between a first cell to be distinguished and a second cell to be distinguished; and determining a discrimination result according to the similarity score. Because the three characteristics of the distance between the cells to be distinguished, the text similarity between the cell names to be distinguished and the similarity between the building bodies of the cells to be distinguished are comprehensively considered, the misjudgment caused by the fact that the cells to be distinguished have the alias or the cell names to be distinguished are the same can be avoided, and the distinguishing accuracy is effectively improved.
Based on the same inventive concept, the present application further provides a system for identifying a homonymous cell with multi-feature fusion, and fig. 3 is a schematic structural diagram of the system for identifying a homonymous cell with multi-feature fusion provided in the embodiment of the present application. Referring to fig. 3, the system includes:
a first obtaining module 310, configured to obtain first longitude and latitude information of a first cell to be identified and second longitude and latitude information of a second cell to be identified;
a distance determining module 320, configured to determine a distance between the first cell to be identified and the second cell to be identified according to the first longitude and latitude information and the second longitude and latitude information;
a second obtaining module 330, configured to obtain a first image and a first name of the first cell to be identified, and a second image and a second name of the second cell to be identified when a distance between the first cell to be identified and the second cell to be identified is less than or equal to a first preset threshold;
a first calculating module 340, configured to calculate a first LBP feature value of the first image and a second LBP feature value of the second image, and determine an image similarity value between the first image and the second image according to the first LBP feature value and the second LBP feature value;
a second calculating module 350, configured to calculate a text similarity value between the first name and the second name;
the score obtaining module 360 is configured to perform weighted average on the image similarity value and the text similarity value according to a preset weight, so as to obtain a similarity score between the first cell to be distinguished and the second cell to be distinguished;
and a result determining module 370, configured to determine a discrimination result according to the similarity score.
Because the three characteristics of the distance between the cells to be distinguished, the text similarity between the cell names to be distinguished and the similarity between the building bodies of the cells to be distinguished are comprehensively considered, the misjudgment caused by the fact that the cells to be distinguished have the alias or the cell names to be distinguished are the same can be avoided, and the distinguishing accuracy is effectively improved.
According to the homonymous cell identification method and system with multi-feature fusion, first longitude and latitude information of a first cell to be identified and second longitude and latitude information of a second cell to be identified are obtained; determining the distance between the first cell to be identified and the second cell to be identified according to the first longitude and latitude information and the second longitude and latitude information; when the distance between the first cell to be distinguished and the second cell to be distinguished is smaller than or equal to a first preset threshold value, acquiring a first image and a first name of the first cell to be distinguished and a second image and a second name of the second cell to be distinguished; calculating a first LBP characteristic value of the first image and a second LBP characteristic value of the second image, and determining an image similarity value between the first image and the second image according to the first LBP characteristic value and the second LBP characteristic value; calculating a text similarity value between the first name and the second name; according to preset weight, carrying out weighted average on the image similarity value and the text similarity value to obtain a similarity score between a first cell to be distinguished and a second cell to be distinguished; and determining a discrimination result according to the similarity score. Because the three characteristics of the distance between the cells to be distinguished, the text similarity between the cell names to be distinguished and the similarity between the building bodies of the cells to be distinguished are comprehensively considered, the misjudgment caused by the fact that the cells to be distinguished have the alias or the cell names to be distinguished are the same can be avoided, and the distinguishing accuracy is effectively improved.
Although some specific embodiments of the present invention have been described in detail by way of examples, it should be understood by those skilled in the art that the above examples are for illustrative purposes only and are not intended to limit the scope of the present invention. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.
Claims (10)
1. A method for identifying a homonymous cell with multi-feature fusion, the method comprising:
acquiring first longitude and latitude information of a first cell to be identified and second longitude and latitude information of a second cell to be identified;
determining the distance between the first cell to be distinguished and the second cell to be distinguished according to the first longitude and latitude information and the second longitude and latitude information;
when the distance between the first cell to be distinguished and the second cell to be distinguished is smaller than or equal to a first preset threshold value, acquiring a first image and a first name of the first cell to be distinguished and a second image and a second name of the second cell to be distinguished; the first image comprises a first building body of a first cell to be identified, and the second image comprises a second building body of a second cell to be identified;
calculating a first LBP characteristic value of the first image and a second LBP characteristic value of the second image, and determining an image similarity value between the first image and the second image according to the first LBP characteristic value and the second LBP characteristic value;
calculating a text similarity value between the first name and the second name;
according to a preset weight, carrying out weighted average on the image similarity value and the text similarity value to obtain a similarity score between a first cell to be distinguished and a second cell to be distinguished;
and determining a discrimination result according to the similarity score.
2. The method of claim 1, wherein the step of calculating a first LBP feature value of the first image and a second LBP feature value of the second image and determining an image similarity value between the first image and the second image according to the first LBP feature value and the second LBP feature value comprises:
sliding a preset window on the first image/the second image, and comparing the pixel values of the first pixel points/the second pixel points at each adjacent point of the current central point with a threshold value by taking the pixel value of the first pixel point/the second pixel point at the current central point in the preset window as the threshold value when the preset window slides each time, so as to obtain a first unsigned binary number/a second unsigned binary number corresponding to the current central point;
converting the first unsigned binary number/the second unsigned binary number into a decimal number to obtain an LBP characteristic value of a first pixel point/a second pixel point at the current central point;
after each first pixel point of a non-boundary region in the first image and each second pixel point of a non-boundary region in the second image are traversed in a sliding mode, taking the obtained LBP characteristic value of each first pixel point/second pixel point as a first LBP characteristic value/a second LBP characteristic value of the first image/second image;
calculating a cosine similarity between the first LBP feature value and the second LBP feature value as an image similarity value between the first image and the second image.
3. The method according to claim 2, wherein the size of the preset window is 3 × 3;
the step of calculating the first unsigned binary number/the second unsigned binary number corresponding to the current center point comprises the following steps:
judging whether the pixel value of a first pixel point/a second pixel point at each adjacent point of the current central point in the preset window is larger than or equal to the threshold value or not; if so, marking the neighbor point as "1"; if not, marking the adjacent points as '0';
and sequentially arranging the marks of all the adjacent points in the preset window according to a preset sequence to obtain 8-bit first unsigned binary number/second unsigned binary number.
4. The method according to claim 1, wherein, before the step of obtaining the first longitude and latitude information of the first cell to be identified and the second longitude and latitude information of the second cell to be identified, the method further comprises:
acquiring a first city and a first urban area where the first cell to be identified is located, and a second city and a second urban area where the second cell to be identified is located;
comparing whether a first city in which the first cell to be distinguished is located is the same as a second city in which the second cell to be distinguished is located; if not, the first cell to be distinguished and the second cell to be distinguished are not the same cell;
if so, comparing whether a first urban area in which the first cell to be distinguished is located is the same as a second urban area in which the second cell to be distinguished is located: if not, the first cell to be distinguished and the second cell to be distinguished are not the same cell; and if so, executing the step of determining the distance between the first cell to be distinguished and the second cell to be distinguished according to the first longitude and latitude information and the second longitude and latitude information.
5. The multi-feature fused homonymous cell discrimination method of claim 4, wherein the first longitude and latitude information includes a first longitude and a first latitude of the first cell to be discriminated, and the second longitude and latitude information includes a second longitude and a second latitude of the second cell to be discriminated;
the distance between the first cell to be distinguished and the second cell to be distinguished is obtained by adopting the following formula:
6. The method according to claim 4, wherein, after the step of determining the distance between the first cell to be identified and the second cell to be identified according to the first longitude and latitude information and the second longitude and latitude information, the method further comprises:
when the distance between a first cell to be distinguished and a second cell to be distinguished is larger than the first preset threshold value, the distinguishing result is that the first cell to be distinguished and the second cell to be distinguished are not the same cell.
7. The method of claim 1, wherein the step of calculating the text similarity value between the first name and the second name comprises:
preprocessing a text corresponding to the first name and a text corresponding to the second name to respectively obtain a first text and a second text;
after word segmentation processing is carried out on the first text and the second text, vectorizing is carried out on the first text and the second text respectively by utilizing a word vector model trained in advance, and a first word vector and a second word vector are obtained;
and calculating the cosine similarity of the first word vector and the second word vector as the text similarity of the first name and the second name.
8. The method for identifying the homonymous cell with the multi-feature fusion as claimed in claim 7, wherein the step of preprocessing the text corresponding to the first name and the text corresponding to the second name to obtain the first text and the second text respectively comprises:
and removing invalid suffixes and symbol characters in the text corresponding to the first name and the text corresponding to the second name, converting uppercase English characters into lowercase English characters, and converting numerals into Chinese characters.
9. The method for identifying a homonymous cell with multi-feature fusion according to claim 1, wherein the step of determining a discrimination result according to the similarity score comprises:
judging whether the similarity score is greater than or equal to a second preset threshold value or not; if so, determining that the first cell to be distinguished and the second cell to be distinguished are the same cell; if not, the first cell to be distinguished and the second cell to be distinguished are not the same cell.
10. A multi-feature fused homonymous cell discrimination system, the system comprising:
the first acquiring module is used for acquiring first longitude and latitude information of a first cell to be identified and second longitude and latitude information of a second cell to be identified;
the distance determining module is used for determining the distance between the first cell to be identified and the second cell to be identified according to the first longitude and latitude information and the second longitude and latitude information;
a second obtaining module, configured to obtain a first image and a first name of the first cell to be identified, and a second image and a second name of the second cell to be identified, when a distance between the first cell to be identified and the second cell to be identified is smaller than or equal to a first preset threshold;
a first calculating module, configured to calculate a first LBP feature value of the first image and a second LBP feature value of the second image, and determine an image similarity value between the first image and the second image according to the first LBP feature value and the second LBP feature value;
the second calculation module is used for calculating a text similarity value between the first name and the second name;
the score obtaining module is used for carrying out weighted average on the image similarity value and the text similarity value according to preset weight to obtain a similarity score between a first cell to be distinguished and a second cell to be distinguished;
and the result determining module is used for determining a discrimination result according to the similarity score.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010053091.0A CN111259966A (en) | 2020-01-17 | 2020-01-17 | Method and system for identifying homonymous cell with multi-feature fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010053091.0A CN111259966A (en) | 2020-01-17 | 2020-01-17 | Method and system for identifying homonymous cell with multi-feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111259966A true CN111259966A (en) | 2020-06-09 |
Family
ID=70950781
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010053091.0A Withdrawn CN111259966A (en) | 2020-01-17 | 2020-01-17 | Method and system for identifying homonymous cell with multi-feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111259966A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104156732A (en) * | 2014-08-01 | 2014-11-19 | 北京利云技术开发公司 | Paper authenticity identification system and method |
CN104504007A (en) * | 2014-12-10 | 2015-04-08 | 成都品果科技有限公司 | Method and system for acquiring similarity degree of images |
CN108763570A (en) * | 2018-06-05 | 2018-11-06 | 北京拓世寰宇网络技术有限公司 | A kind of method and device identifying the identical source of houses |
CN109271626A (en) * | 2018-08-31 | 2019-01-25 | 北京工业大学 | Text semantic analysis method |
CN109977287A (en) * | 2019-03-28 | 2019-07-05 | 国家计算机网络与信息安全管理中心 | A kind of house property data identity method of discrimination of different aforementioned sources |
CN110096634A (en) * | 2019-04-29 | 2019-08-06 | 成都理工大学 | A kind of house property data vector alignment schemes based on particle group optimizing |
CN110334778A (en) * | 2019-07-16 | 2019-10-15 | 同方知网数字出版技术股份有限公司 | A kind of image synthesis similarity analysis method based on description content and image content features |
CN110490264A (en) * | 2019-08-23 | 2019-11-22 | 中国民航大学 | Multidimensional distance cluster method for detecting abnormality and system based on time series |
CN110633383A (en) * | 2019-09-12 | 2019-12-31 | 北京无限光场科技有限公司 | Method and device for identifying repeated house sources, electronic equipment and readable medium |
Non-Patent Citations (1)
Title |
---|
张雨石 (Zhang Yushi): "图像物体检测识别中的LBP特征" ["LBP features in image object detection and recognition"], https://blog.csdn.net/stdcoutzyx/article/details/37317863 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111783419B (en) | Address similarity calculation method, device, equipment and storage medium | |
CN109918673B (en) | Semantic arbitration method and device, electronic equipment and computer-readable storage medium | |
CN106682233A (en) | Method for Hash image retrieval based on deep learning and local feature fusion | |
CN112966691A (en) | Multi-scale text detection method and device based on semantic segmentation and electronic equipment | |
CN111159485B (en) | Tail entity linking method, device, server and storage medium | |
CN111340034B (en) | Text detection and identification method and system for natural scene | |
CN112613293B (en) | Digest generation method, digest generation device, electronic equipment and storage medium | |
CN112016605A (en) | Target detection method based on corner alignment and boundary matching of bounding box | |
CN111274822A (en) | Semantic matching method, device, equipment and storage medium | |
CN116304307A (en) | Graph-text cross-modal retrieval network training method, application method and electronic equipment | |
CN116226785A (en) | Target object recognition method, multi-mode recognition model training method and device | |
CN108268641A (en) | Invoice information recognition methods and invoice information identification device, equipment and storage medium | |
CN111950280A (en) | Address matching method and device | |
CN113094465A (en) | Method and system for checking duplicate of design product | |
CN116189139A (en) | Traffic sign detection method based on Transformer | |
CN112445976A (en) | City address positioning method based on congestion index map | |
CN112231449A (en) | Vertical field entity chain finger system based on multi-path recall | |
CN111259966A (en) | Method and system for identifying homonymous cell with multi-feature fusion | |
CN114880572B (en) | Intelligent news client recommendation system | |
CN111291155A (en) | Method and system for identifying homonymous cells based on text similarity | |
CN114297235A (en) | Risk address identification method and system and electronic equipment | |
CN108920361B (en) | String matching code similarity detection method | |
CN117216249A (en) | Data classification method, device, electronic equipment, medium and vehicle | |
CN114036297A (en) | Statement classification method and device, terminal equipment and storage medium | |
CN114090781A (en) | Text data-based repulsion event detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20200609 | |