CN114911787B - Multi-source POI data cleaning method integrating position and semantic constraint - Google Patents
- Publication number
- CN114911787B (application CN202210613379.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- poi
- processing
- word
- inconsistent
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a multi-source POI data cleaning method integrating position and semantic constraints, and belongs to the technical field of data processing. The method performs the following steps: step 1, performing GeoHash conversion on the collected multi-source POI data; step 2, querying the neighboring points of the converted character strings; step 3, performing redundancy processing on each window that contains neighboring points from step 2; step 4, constructing a word segmentation scheme; step 5, performing redundancy processing on the data processed in step 4; and step 6, completing POI data re-matching based on word frequency statistics of the word segmentation scheme over the data reconstructed in step 5. The method completes data cleaning more accurately and efficiently, yields better cleaning results, and is more practical and effective.
Description
Technical Field
The invention relates to a multi-source POI data cleaning method integrating position and semantic constraint, and belongs to the technical field of data processing.
Background
With the continuous emergence of new ways of publishing information, represented by blogs, social networks and location-based services (LBS), and the rise of technologies such as cloud computing and the Internet of Things, data is growing and accumulating at an unprecedented speed, and many fields are trying to mine the information hidden in big data. But as the amount of data increases substantially, its quality keeps declining. In a big data environment, data of various types from heterogeneous systems has several problems: (1) Inconsistency: the data of each application system lacks a unified standard definition, so the data differ considerably. (2) Repetition: the database contains two or more descriptions of the same real-world object. (3) Ambiguity: defects in system design and human factors during use lead to missing and uncertain attribute values in data records.
Given the above, data cleaning plays an increasingly important role in data analysis and management. Data cleaning aims to identify and correct noise in the data, minimizing its impact on analysis results. POIs (points of interest), as a component of big data and an important carrier of location services, directly determine the quality of location-service research. To obtain more comprehensive POI data, researchers and technicians have tried to collect data from multiple sources, but this brings problems such as increased redundancy and incomplete data.
Disclosure of Invention
The technical problem to be solved by the invention is how to provide a multi-source POI data cleaning method.
To solve this technical problem, the invention provides the following technical scheme: a multi-source POI data cleaning method integrating position and semantic constraints, which performs the following steps:
step 1, performing GeoHash conversion on the collected multi-source POI data, converting two-dimensional coordinate data into character strings;
step 2, querying the neighboring points of the converted character strings;
step 3, performing redundancy processing on each window that contains neighboring points from step 2, carrying out redundant data processing, incomplete data processing, inconsistent data processing and highly similar data processing in sequence;
step 4, constructing a word segmentation scheme based on the Chinese Language Model and the Hidden Markov Model;
step 5, performing redundancy processing on the data processed in step 4;
step 6, completing POI data re-matching based on word frequency statistics of the word segmentation scheme over the data reconstructed in step 5, thereby cleaning the multi-source POI data.
An improvement of the technical scheme is as follows: the neighboring-point query on the converted character strings is performed by prefix matching based on a B+ tree.
An improvement of the technical scheme is as follows: the redundant data, incomplete data, inconsistent data and highly similar data in step 3 are processed as follows:
redundant data processing: for repeated data caused by continuously tracking data from the same platform, only one record is retained; for the small amount of redundant data whose attributes are only partly consistent, the record with the highest completeness is retained based on the position attribute;
incomplete data processing: a redundancy judgment is first performed between the incomplete data and the complete data, and the incomplete data is removed if it is determined to be redundant; if it is non-redundant, it is further judged whether it is inconsistent data or highly similar data, and it is processed in the corresponding manner while the corresponding label is added;
inconsistent data processing: for inconsistent data of non-neighboring points, the names and positions of the POI points are verified by repeated geocoding and address resolution on different geographic service platforms; for inconsistent data of neighboring points, the position with the most resolved information is selected as the entity position, and the other inconsistent data is eliminated;
highly similar data processing: the entity description names are split into phrases in the manner used for inconsistent data processing, a similar-data index is established, address data is acquired based on the region mapping library of the designated region, and the POI data whose geographic elements are more complete is selected for storage.
An improvement of the technical scheme is as follows: in step 4, POI names that can be split using the existing vocabulary are processed with the Chinese Language Model; for words that are not recorded in the vocabulary but need to be split, the POI name is segmented with the Hidden Markov Model based on word formation.
An improvement of the technical scheme is as follows: the redundancy processing in step 5 is the same as in step 3 except for the objects processed.
An improvement of the technical scheme is as follows: in the reconstruction process of step 6, the keywords related to the names of the POI data and the word frequencies of those keywords are determined; the inverse document frequency is discarded, and keywords with high probability are selected according to word frequency to correspond to the POI data.
The beneficial effects of the invention are as follows: the invention de-duplicates, corrects, verifies and reclassifies multi-source POI data, unifies data quality and standards through position constraints and semantic constraints, and obtains a POI data set with high availability and high credibility. The method completes data cleaning more accurately and efficiently, yields better cleaning results, and is a more practical and effective data cleaning method.
Drawings
Fig. 1 is a flowchart of a method for cleaning multi-source POI data fusing position and semantic constraints according to an embodiment of the present invention.
Detailed Description
Examples
The method for cleaning multi-source POI data with fusion of position and semantic constraint in this embodiment, as shown in fig. 1, performs the following steps:
Step 1, performing GeoHash conversion on the collected multi-source POI data, converting two-dimensional coordinate data into character strings. GeoHash is a geographic position representation that encodes position information such as longitude and latitude into a string of letters and digits, reducing two-dimensional data to one dimension; the longer the prefix shared by the GeoHash strings of two positions, the closer the two points are in space.
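The encoding in step 1 can be illustrated with a minimal Python sketch of the standard GeoHash algorithm (interleaved longitude/latitude bisection, 5 bits per base-32 character). The function names and the sample coordinates are illustrative assumptions and are not taken from the patent.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (omits a, i, l, o)

def geohash_encode(lat: float, lon: float, length: int = 8) -> str:
    """Encode a latitude/longitude pair as a GeoHash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, value = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= mid:
            value |= 1      # upper half of the interval
            rng[0] = mid
        else:
            rng[1] = mid    # lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:       # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common prefix of two GeoHash strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

Two points a few metres apart typically share a long prefix, while points in different cities share only the first character or two. Points on opposite sides of a cell boundary can be close in space yet share a short prefix, which is one reason the window size in step 2 needs tuning.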
Step 2, querying the neighboring points of the converted character strings. Within a fixed window, string prefixes are matched using a B+ tree to query neighboring points. When choosing the GeoHash length, the latitude error, longitude error and error in meters should be computed experimentally for each candidate length, and the most suitable length selected.
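The prefix query and the length selection in step 2 can be sketched as follows. Python's standard library has no B+ tree, so this sketch swaps in a sorted list with binary search (`bisect`), which supports the same ordered prefix-range scan. The class name, the string POI identifiers and the helper `cell_size_degrees` are assumptions made for illustration.

```python
from bisect import bisect_left, insort

class GeoHashIndex:
    """Sorted-list stand-in for a B+ tree keyed by GeoHash string (illustrative only)."""

    def __init__(self):
        self._keys = []  # sorted list of (geohash, poi_id); poi_id must be a str

    def add(self, geohash: str, poi_id: str) -> None:
        insort(self._keys, (geohash, poi_id))

    def query_prefix(self, prefix: str) -> list:
        """Return ids of all POIs whose GeoHash starts with `prefix`."""
        lo = bisect_left(self._keys, (prefix, ""))
        # "~" sorts after every character of the GeoHash alphabet
        hi = bisect_left(self._keys, (prefix + "~", ""))
        return [poi_id for _, poi_id in self._keys[lo:hi]]

def cell_size_degrees(length: int) -> tuple:
    """(latitude height, longitude width) in degrees of one GeoHash cell."""
    bits = 5 * length
    lon_bits = (bits + 1) // 2  # longitude receives the extra bit when bits is odd
    lat_bits = bits // 2
    return 180 / 2 ** lat_bits, 360 / 2 ** lon_bits

idx = GeoHashIndex()
for gh, pid in [("wtsqr1", "a"), ("wtsqr9", "b"), ("wtsqx0", "c"), ("wx4g0b", "d")]:
    idx.add(gh, pid)
window = idx.query_prefix("wtsqr")  # POIs in the same 5-character cell
```

The maximum encoding error is half the cell size in each direction. Converting to meters requires a latitude-dependent factor for longitude, which is why the text recommends evaluating each candidate length on the actual data region.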
Step 3, if a window (region) from step 2 contains only one POI point or none, the data is stored for subsequent processing. If the region contains several neighboring POI points, data similarity detection is needed: data quality and redundancy are analyzed, and records are checked, screened and eliminated to obtain a low-redundancy, high-availability data set for the region.
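The branching in step 3 can be sketched as a grouping pass: POIs are bucketed by GeoHash prefix, single-POI windows are stored directly, and windows with several POIs are passed on for similarity detection. The record layout (a dict with a `geohash` key) and the function name are assumptions for illustration.

```python
from collections import defaultdict

def split_windows(pois, prefix_len=6):
    """Group POIs by GeoHash prefix and separate isolated POIs from crowded windows."""
    windows = defaultdict(list)
    for poi in pois:
        windows[poi["geohash"][:prefix_len]].append(poi)
    # Windows with exactly one POI need no redundancy check
    singles = [w[0] for w in windows.values() if len(w) == 1]
    # Windows with several neighboring POIs go on to similarity detection
    crowded = {key: w for key, w in windows.items() if len(w) > 1}
    return singles, crowded

pois = [
    {"id": "a", "geohash": "wtsqr1xy"},
    {"id": "b", "geohash": "wtsqr1zz"},
    {"id": "c", "geohash": "wx4g0b12"},
]
singles, crowded = split_windows(pois)
```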
Step 4, constructing a word segmentation scheme based on the Chinese Language Model (CLM) and the Hidden Markov Model (HMM).
Step 4.1, POI names that can be split using the existing vocabulary are processed with the CLM. First, a prefix dictionary is used to scan the POI name and build a word graph: a directed acyclic graph (DAG) is constructed from all possible words in the name, which yields all segmentations W of the POI name S. The conditional probability P(W|S) of each segmentation is computed by dynamic programming, and the segmentation W* with the maximum conditional probability is taken as the final word segmentation result:
$$W^{*} = \arg\max_{W} P(W \mid S)$$
By the Bayes formula, P(W|S) = P(S|W) P(W) / P(S). Since P(S) is the same for every candidate and P(S|W) = 1 for any segmentation of S, W* can be obtained by maximizing P(W), and P(W) can be modeled with the CLM. Taking the bi-gram model as an example, for W = w_1 ... w_n with w_0 a start marker:
$$P(W) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$
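The DAG construction and the dynamic-programming search for the most probable segmentation can be sketched as below, using a bi-gram model. The vocabulary, the probability table, the `floor` smoothing value and the function names are toy assumptions for illustration; they do not come from the patent.

```python
import math

START = "<s>"  # start marker playing the role of w_0

def build_dag(name, vocab):
    """Map each start index to the end indices of vocabulary words beginning there."""
    dag = {}
    for i in range(len(name)):
        ends = [j for j in range(i + 1, len(name) + 1) if name[i:j] in vocab]
        dag[i] = ends or [i + 1]  # unknown character: fall back to a one-character word
    return dag

def segment(name, vocab, bigram_prob, floor=1e-6):
    """Return the segmentation maximizing the product of bi-gram probabilities."""
    dag = build_dag(name, vocab)
    n = len(name)
    # best[i] maps the last word ending at position i to (log score, previous position, previous word)
    best = [dict() for _ in range(n + 1)]
    best[0][START] = (0.0, None, None)
    for i in range(n):
        for prev, (score, _, _) in best[i].items():
            for j in dag[i]:
                word = name[i:j]
                cand = score + math.log(bigram_prob.get((prev, word), floor))
                if word not in best[j] or cand > best[j][word][0]:
                    best[j][word] = (cand, i, prev)
    # Backtrack from the best-scoring word that ends at the end of the name
    word = max(best[n], key=lambda w: best[n][w][0])
    pos, words = n, []
    while word != START:
        words.append(word)
        _, pos, word = best[pos][word]
    return words[::-1]

vocab = {"南京", "大学", "南京大学", "图书馆"}
bigram_prob = {
    (START, "南京大学"): 0.3, ("南京大学", "图书馆"): 0.5,
    (START, "南京"): 0.4, ("南京", "大学"): 0.2, ("大学", "图书馆"): 0.3,
}
result = segment("南京大学图书馆", vocab, bigram_prob)  # ['南京大学', '图书馆']
```

With these toy probabilities the path 南京大学 / 图书馆 scores 0.3 × 0.5 = 0.15, beating 南京 / 大学 / 图书馆 at 0.4 × 0.2 × 0.3 = 0.024.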
Step 4.2, for words that are not recorded in the vocabulary but need to be split, i.e. out-of-vocabulary words, the POI name is segmented with the HMM based on word formation. Four word positions are defined for each character: word-initial, word-medial, word-final and single-character word. The POI name is taken as input and the corresponding sequence of word positions as output; splitting this sequence gives the segmentation of the POI name. Newly discovered out-of-vocabulary words are added to the vocabulary to improve algorithm efficiency. The final POI name segmentation example is as follows.
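The last part of step 4.2, turning a word-position sequence into words, can be sketched as follows. The tag letters B/M/E/S (word-initial, word-medial, word-final, single-character) are a common convention assumed here; the tag sequence itself would come from decoding the HMM (for example with the Viterbi algorithm), which this sketch does not implement.

```python
def tags_to_words(name: str, tags: str) -> list:
    """Split a POI name according to a per-character B/M/E/S tag sequence."""
    if len(name) != len(tags):
        raise ValueError("name and tags must have the same length")
    words, current = [], ""
    for ch, tag in zip(name, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:                # tolerate a sequence that ends in B or M
        words.append(current)
    return words

words = tags_to_words("南京大学图书馆", "BMMEBME")  # ['南京大学', '图书馆']
```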
POI name word segmentation schematic
Step 5, since the data comes from multiple platforms, several records in the captured data may point to the same geographic entity, producing redundant data.
This kind of redundant data needs to be cleaned with respect to both attributes and position. For redundant records whose attribute rows are completely identical, one record is retained and the others are rejected. For redundant records whose attribute rows are partly identical, since they are all neighboring POI data in the same window, the POI with the highest data completeness is retained based on position information and the other records with the same attributes are discarded.
For incomplete data lacking auxiliary information such as category, a redundancy judgment against the complete data is performed first, and the record is eliminated if it is determined to be redundant. If it is non-redundant, its similarity to each category is calculated from the per-category analysis results established in step 3, and a category label is assigned to the POI when the data meets the similarity threshold or keyword matching requirement of a category.
For inconsistent data whose positions disagree, the position with the most resolved information is selected as the entity position, and the other inconsistent data is eliminated. For neighboring POI records with consistent positions but similar, non-identical names, an index is built over the inconsistent data in the window, and the name used by the largest group of records is taken as the entity name.
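The two redundancy rules above can be sketched in a few lines. The record layout, the choice of `name` as the attribute that makes two records "partly identical", and the completeness measure (number of non-empty fields) are all assumptions for illustration; the patent does not fix them.

```python
def completeness(record: dict) -> int:
    """Number of non-empty attribute values in a record."""
    return sum(1 for value in record.values() if value not in (None, ""))

def dedupe_window(records, key_fields=("name",)):
    """Clean one window of neighboring POIs: drop exact duplicates, then keep the most complete record per key."""
    # Rule 1: fully identical attribute rows -> keep a single copy
    seen, unique = set(), []
    for rec in records:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(rec)
    # Rule 2: partly identical rows in the same window -> keep the most complete one
    best = {}
    for rec in unique:
        key = tuple(rec.get(field) for field in key_fields)
        if key not in best or completeness(rec) > completeness(best[key]):
            best[key] = rec
    return list(best.values())

window = [
    {"name": "A Cafe", "address": "1 Road", "phone": "123"},
    {"name": "A Cafe", "address": "1 Road", "phone": "123"},
    {"name": "A Cafe", "address": "1 Road", "phone": ""},
    {"name": "B Shop", "address": "", "phone": ""},
]
cleaned = dedupe_window(window)
```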
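One possible reading of the inconsistency rules is a pair of simple selections: the entity name is the name shared by the largest group of records, and the entity position is taken from the record with the most resolved address information. The field names `name`, `position` and `resolved_fields` (a count of address components returned by geocoding) are hypothetical.

```python
from collections import Counter

def resolve_name(records) -> str:
    """Entity name = the name used by the largest group of records in the window."""
    counts = Counter(rec["name"] for rec in records)
    return counts.most_common(1)[0][0]

def resolve_position(records):
    """Entity position = position of the record with the most resolved address information."""
    return max(records, key=lambda rec: rec["resolved_fields"])["position"]

window = [
    {"name": "City Library", "position": (32.0603, 118.7969), "resolved_fields": 5},
    {"name": "City Library", "position": (32.0604, 118.7970), "resolved_fields": 7},
    {"name": "City Libary", "position": (32.0603, 118.7969), "resolved_fields": 3},
]
entity = {"name": resolve_name(window), "position": resolve_position(window)}
```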
Neighboring POI data inconsistency examples
For POI data with similar but not identical names and neighboring but not identical coordinates, similarity is first pre-screened: the entity description names are split into phrases using the word segmentation scheme, and POI groups sharing more than one segmented word, or POI data with a single segmented word together with its similar POI data, are put into a similar-data index. Second, the address data in the similar-data index is standardized, for example by using a Chinese administrative division mapping library to extract and map the original address information into standardized address data. Finally, based on the administrative region mapping library and the word segmentation scheme, each POI address is split into 8 parts (province, city, district, road, house number, community, building and text); different comparison rules for the split results are set for different POI types, and the POI data with complete geographic elements is selected for storage.
Step 6, the POI data reconstructed in step 5 still contains a small number of POIs whose type, owing to the multiple sources, is inconsistent with the actual type, so the type of the POI data needs to be re-matched to complete the multi-source POI data cleaning.
Because of the category attribute of POI data, data of the same category influences the target POI far more than data of other categories, so a POI category corpus is constructed, with the POI names of each category treated as a corpus. First, the term count (TC) of each keyword in the POI names is counted, i.e. the number of times the word occurs in a given corpus. The term frequency is then computed from TC: for keyword j in the POI names of category i, the term frequency (TF) is given by the following formula, where TC_{i,j} is the number of times the word appears in corpus i and the denominator is the total number of occurrences of all words in corpus i.
$$TF_{i,j} = \frac{TC_{i,j}}{\sum_{k} TC_{i,k}}$$
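The term count and term frequency per category can be computed as in this minimal sketch. The corpus layout (category mapped to a list of already segmented POI names) and the sample data are assumptions for illustration.

```python
from collections import Counter

def term_frequencies(corpus: dict) -> dict:
    """For each category i, return {keyword j: TC_ij / sum_k TC_ik}."""
    tf = {}
    for category, names in corpus.items():
        # TC: occurrences of each keyword across all POI names of the category
        tc = Counter(word for name in names for word in name)
        total = sum(tc.values())
        tf[category] = {word: count / total for word, count in tc.items()}
    return tf

corpus = {
    "food": [["KFC", "restaurant"], ["noodle", "restaurant"]],
    "finance": [["city", "bank"]],
}
tf = term_frequencies(corpus)
```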
Based on the assumption that what a POI name refers to is related to the position of a keyword within the name's word sequence, TF-IDF is improved: when a POI name is scanned, a position weight W is added according to the order k of the keyword in the split phrase sequence. W is defined as follows: when there are more than 2 keywords, the last keyword has weight 2, the second-to-last has weight 1.5, and the rest have weight 1; when there are 2 keywords, the last has weight 1.5 and the other has weight 1; when there is 1 keyword, its weight is 1; keywords that are single digits or letters are not weighted. The final weighted term frequency is obtained by multiplying W by TF:
$$TF'_{i,j} = W \times TF_{i,j}$$
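The position-weight rule can be written down directly, as in the sketch below. The text does not say what "not weighted" means numerically for single digits or letters; this sketch assumes such keywords simply get no boost (weight 1) while still counting toward the number of keywords. The function names are illustrative.

```python
def position_weights(keywords: list) -> list:
    """Position weight W for each keyword of one segmented POI name."""
    n = len(keywords)
    weights = [1.0] * n
    if n > 2:
        weights[-1] = 2.0   # last keyword
        weights[-2] = 1.5   # second-to-last keyword
    elif n == 2:
        weights[-1] = 1.5
    for i, kw in enumerate(keywords):
        # Assumption: a single ASCII digit or letter receives no boost
        if len(kw) == 1 and kw.isascii() and kw.isalnum():
            weights[i] = 1.0
    return weights

def weighted_tf(keywords: list, tf_for_category: dict) -> list:
    """Weighted term frequency W * TF for each keyword (TF is 0 for unseen keywords)."""
    weights = position_weights(keywords)
    return [w * tf_for_category.get(kw, 0.0) for kw, w in zip(keywords, weights)]

w3 = position_weights(["南京", "大学", "图书馆"])  # [1.0, 1.5, 2.0]
```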
Using heap sorting and batch processing, the top n keywords by weighted term frequency in each category are taken as that category's core words, and a core-word dictionary tree (trie) is built for efficient core-word matching. For the keywords of a category matched by a POI name, the sum of their weights is computed, i.e. the probability that the POI belongs to that category; finally, the category with the highest probability is assigned to the POI, completing the re-matching process.
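The core-word selection and the final category assignment can be sketched as follows. `heapq.nlargest` performs the heap-based top-n selection; a plain dictionary lookup is swapped in for the dictionary tree, since keywords are matched whole after segmentation. The sample scores and function names are assumptions for illustration.

```python
import heapq

def core_words(weighted_tf: dict, n: int) -> dict:
    """Keep the n keywords with the highest weighted term frequency in each category."""
    return {
        category: dict(heapq.nlargest(n, scores.items(), key=lambda kv: kv[1]))
        for category, scores in weighted_tf.items()
    }

def classify(keywords: list, core: dict):
    """Sum the weights of matched core words per category and pick the highest total."""
    totals = {
        category: sum(words.get(kw, 0.0) for kw in keywords)
        for category, words in core.items()
    }
    return max(totals, key=totals.get), totals

weighted = {
    "food": {"restaurant": 0.5, "noodle": 0.3, "bank": 0.01},
    "finance": {"bank": 0.6, "atm": 0.3, "restaurant": 0.02},
}
core = core_words(weighted, n=2)
label, totals = classify(["noodle", "restaurant"], core)
```

Here "restaurant" is not among the top 2 words of the finance category, so the POI name scores only for food and is labelled accordingly.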
Claims (5)
1. A multi-source POI data cleaning method integrating position and semantic constraints, characterized by performing the following steps:
step 1, performing GeoHash conversion on the collected multi-source POI data, converting two-dimensional coordinate data into character strings;
step 2, querying the neighboring points of the converted character strings;
step 3, performing redundancy processing on each window that contains neighboring points from step 2, carrying out redundant data processing, incomplete data processing, inconsistent data processing and highly similar data processing in sequence;
step 4, constructing a word segmentation scheme based on the Chinese Language Model and the Hidden Markov Model, wherein: POI names that can be split using the existing vocabulary are processed with the Chinese Language Model; for words that are not recorded in the vocabulary but need to be split, the POI name is segmented with the Hidden Markov Model based on word formation;
step 5, performing redundancy processing on the data processed in step 4;
step 6, completing POI data re-matching based on word frequency statistics of the word segmentation scheme over the data reconstructed in step 5, thereby cleaning the multi-source POI data.
2. The multi-source POI data cleaning method integrating position and semantic constraints according to claim 1, wherein: the neighboring-point query on the converted character strings is performed by prefix matching based on a B+ tree.
3. The multi-source POI data cleaning method integrating position and semantic constraints according to claim 1, wherein: the redundant data, incomplete data, inconsistent data and highly similar data in step 3 are processed as follows:
redundant data processing: for repeated data caused by continuously tracking data from the same platform, only one record is retained; for the small amount of redundant data whose attributes are only partly consistent, the record with the highest completeness is retained based on the position attribute;
incomplete data processing: a redundancy judgment is first performed between the incomplete data and the complete data, and the incomplete data is removed if it is determined to be redundant; if it is non-redundant, it is further judged whether it is inconsistent data or highly similar data, and it is processed in the corresponding manner while the corresponding label is added;
inconsistent data processing: for inconsistent data of non-neighboring points, the names and positions of the POI points are verified by repeated geocoding and address resolution on different geographic service platforms; for inconsistent data of neighboring points, the position with the most resolved information is selected as the entity position, and the other inconsistent data is eliminated;
highly similar data processing: the entity description names are split into phrases in the manner used for inconsistent data processing, a similar-data index is established, address data is acquired based on the region mapping library of the designated region, and the POI data whose geographic elements are more complete is selected for storage.
4. The multi-source POI data cleaning method integrating position and semantic constraints according to claim 1, wherein: the redundancy processing in step 5 is the same as in step 3 except for the objects processed.
5. The multi-source POI data cleaning method integrating position and semantic constraints according to claim 1, wherein: in the reconstruction process of step 6, the keywords related to the names of the POI data and the word frequencies of those keywords are determined; the inverse document frequency is discarded, and keywords with high probability are selected according to word frequency to correspond to the POI data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210613379.8A CN114911787B (en) | 2022-05-31 | 2022-05-31 | Multi-source POI data cleaning method integrating position and semantic constraint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114911787A (en) | 2022-08-16 |
CN114911787B (en) | 2023-10-27 |
Family
ID=82771332
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202210613379.8A | Active | CN114911787B (en) | 2022-05-31 | 2022-05-31 | Multi-source POI data cleaning method integrating position and semantic constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114911787B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20150032378A (en) * | 2013-09-16 | 2015-03-26 | 엔에이치엔엔터테인먼트 주식회사 | Service method and system for providing reward using moving path of users |
CN107153712A (en) * | 2017-05-26 | 2017-09-12 | 南京大学 | Support the personalized customization picture management method of the time and space association of mobile terminal |
CN107798054A (en) * | 2017-09-04 | 2018-03-13 | 昆明理工大学 | A kind of range query method and device based on Trie |
CN108846013A (en) * | 2018-05-04 | 2018-11-20 | 昆明理工大学 | A kind of spatial key word querying method and device based on geohash Yu Patricia Trie |
CN111143588A (en) * | 2019-12-27 | 2020-05-12 | 中科星图股份有限公司 | Image space-time index quick retrieval method based on machine learning |
CN111274341A (en) * | 2020-01-16 | 2020-06-12 | 中国建设银行股份有限公司 | Site selection method and device for network points |
CN112287055A (en) * | 2020-11-03 | 2021-01-29 | 亿景智联(北京)科技有限公司 | Algorithm for calculating redundant POI data according to cosine similarity and Buffer |
CN112307142A (en) * | 2020-06-05 | 2021-02-02 | 北京沃东天骏信息技术有限公司 | Method and device for determining information point in geographic information system and storage medium |
CN112527938A (en) * | 2020-12-17 | 2021-03-19 | 安徽迪科数金科技有限公司 | Chinese POI matching method based on natural language understanding |
CN113032672A (en) * | 2021-03-24 | 2021-06-25 | 北京百度网讯科技有限公司 | Method and device for extracting multi-modal POI (Point of interest) features |
CN113568951A (en) * | 2021-07-30 | 2021-10-29 | 拉扎斯网络科技(上海)有限公司 | Data mining and processing method and device, storage medium and electronic equipment |
CN113761867A (en) * | 2020-12-29 | 2021-12-07 | 京东城市(北京)数字科技有限公司 | Address recognition method and device, computer equipment and storage medium |
CN114201480A (en) * | 2021-11-04 | 2022-03-18 | 深圳市凯立德科技股份有限公司 | Multi-source POI fusion method and device based on NLP technology and readable storage medium |
CN114491056A (en) * | 2021-12-10 | 2022-05-13 | 新智道枢(上海)科技有限公司 | Method and system for improving POI (Point of interest) search in digital police scene |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7937402B2 (en) * | 2006-07-10 | 2011-05-03 | Nec (China) Co., Ltd. | Natural language based location query system, keyword based location query system and a natural language and keyword based location query system |
CN107291785A (en) * | 2016-04-12 | 2017-10-24 | 滴滴(中国)科技有限公司 | A kind of data search method and device |
US10776405B2 (en) * | 2016-07-28 | 2020-09-15 | International Business Machines Corporation | Mechanism and apparatus of spatial encoding enabled multi-scale context join |
US11366866B2 (en) * | 2017-12-08 | 2022-06-21 | Apple Inc. | Geographical knowledge graph |
Non-Patent Citations (2)
Title |
---|
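Spatial Data Quality in the Internet of Things: Management, Exploitation, and Prospects; Huan Li et al.; ACM Computing Surveys; Vol. 55, No. 3; pp. 1-41 * |
Spatial keyword query combined with negative keywords (结合否定关键词的空间关键词查询); Jin Hai et al.; Microelectronics & Computer (微电子学与计算机); Vol. 38, No. 9; pp. 54-60 * |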
Also Published As
Publication number | Publication date |
---|---|
CN114911787A (en) | 2022-08-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111324784B (en) | Character string processing method and device | |
US8171029B2 (en) | Automatic generation of ontologies using word affinities | |
CN110597804B (en) | Facilitating spatial indexing on a distributed key value store | |
CN109886294A (en) | Knowledge fusion method, apparatus, computer equipment and storage medium | |
US20160275196A1 (en) | Semantic search apparatus and method using mobile terminal | |
KR100903961B1 (en) | Indexing And Searching Method For High-Demensional Data Using Signature File And The System Thereof | |
CN109408578B (en) | Monitoring data fusion method for heterogeneous environment | |
CN108388559A (en) | Name entity recognition method and system, computer program of the geographical space under | |
CN110807102A (en) | Knowledge fusion method and device, computer equipment and storage medium | |
CN111651447B (en) | Intelligent construction life-span data processing, analyzing and controlling system | |
CN116628173B (en) | Intelligent customer service information generation system and method based on keyword extraction | |
CN107832778B (en) | Same target identification method based on spatial comprehensive similarity | |
CN109885641B (en) | Method and system for searching Chinese full text in database | |
CN114090787A (en) | Knowledge graph construction method based on internet power policy information | |
CN115688779B (en) | Address recognition method based on self-supervision deep learning | |
CN114168705B (en) | Chinese address matching method based on address element index | |
CN115438274A (en) | False news identification method based on heterogeneous graph convolutional network | |
CN114911787B (en) | Multi-source POI data cleaning method integrating position and semantic constraint | |
CN112685452A (en) | Enterprise case retrieval method, device, equipment and storage medium | |
CN117892820A (en) | Multistage data modeling method and system based on large language model | |
CN113128210B (en) | Webpage form information analysis method based on synonym discovery | |
CN115759055A (en) | English place name proofreading method considering multi-dimensional character characteristics | |
KR101839121B1 (en) | System and method for correcting user's query | |
CN111460325B (en) | POI searching method, device and equipment | |
CN113111136B (en) | Entity disambiguation method and device based on UCL knowledge space |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||