
CN112347222A - Method and system for converting non-standard address into standard address based on knowledge base reasoning - Google Patents

Info

Publication number
CN112347222A
Authority
CN
China
Prior art keywords
address
entity
standard
knowledge base
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011141247.7A
Other languages
Chinese (zh)
Other versions
CN112347222B (en)
Inventor
吕晓宝
叶恺翔
王元兵
王海荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sugon Nanjing Research Institute Co ltd
Original Assignee
Sugon Nanjing Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sugon Nanjing Research Institute Co ltd
Priority to CN202011141247.7A
Publication of CN112347222A
Application granted
Publication of CN112347222B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a system for converting a non-standard address into a standard address based on knowledge base reasoning. The method comprises the following steps: first, defining the ontology of an address knowledge base; second, constructing a standard address knowledge base, in which entities are built from a traditional standard address library and word vectors of the standard addresses are constructed; third, comparing word vectors with a cosine similarity algorithm so that address elements are mapped to entities in the knowledge base; and fourth, extracting the address elements and the orientation relation descriptions from the original text by named entity recognition, and searching the standard address knowledge base for the entities that match those address elements with a semantic similarity algorithm based on the address name. By combining natural language processing with knowledge graph processing, non-standardized address text is mapped to standard addresses automatically, which completes the cleaning and governance of the address data.

Description

Method and system for converting non-standard address into standard address based on knowledge base reasoning
Technical Field
The invention relates to an address conversion technology, in particular to a method and a system for converting a non-standard address into a standard address based on knowledge base reasoning.
Background
With the progress of digital-city and smart-city informatization in various regions, the business information of different departments is gradually being brought into informatization systems. However, most of the addresses that express spatial position in this information are semantic place-name addresses described in natural language, whereas in the information world the relative position of a spatial subject is determined by spatial geographic coordinates, which are the main index for spatializing all kinds of information. Address spatialization is therefore one of the core technologies of location-based application service systems. How to associate and match addresses with spatial geographic coordinates is the key to spatializing address information and is also the basis for the spatial management of large volumes of business data.
At present, non-standard address mapping algorithms basically compute the similarity between the non-standard address text and each address text in the standard address library, and then output the most similar standard address. The similarity algorithms generally adopted are: 1. keyword-based matching; 2. cosine similarity based on short-text vectors; 3. edit distance based on character strings; 4. big-data recommendation based on user click behavior; 5. treating the mapping as a text classification task that is learned automatically by a naive Bayes or neural network model. These similarity algorithms basically meet the requirements of non-standard address mapping but lack reasoning capability.
Disclosure of Invention
The purpose of the invention is as follows: to provide a method and a system for converting a non-standard address into a standard address based on knowledge base reasoning, so as to solve the above problem.
The technical scheme is as follows: a method for converting a non-standard address into a standard address based on knowledge base reasoning comprises:
step 1: setting an ontology of an address knowledge base;
step 2: constructing a standard address knowledge base;
step 3: comparing by a cosine similarity algorithm;
step 4: extracting the address information of the original text.
According to one aspect of the invention, the ontology of the address knowledge base in step 1 comprises a knowledge graph ontology, the uuids of entities, entity attributes, and the relationships among entities. The knowledge graph ontology comprises six levels: province, city, county, street/town, road section, and address unit. The entities are the standard addresses of the different levels and are distinguished by globally unique identifiers. The uuid of an entity consists of three parts: the knowledge graph ontology, the name, and the number in the knowledge base, where the number is an administrative division number or an address number. The entity attributes comprise the name, type, label, center-point longitude and latitude, boundary longitude and latitude sequence, and remarks; the label is a social attribute of the address entity.
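The entity structure described above could be represented roughly as follows. This is a minimal sketch only: the class name `AddressEntity`, the `LEVELS` identifiers, and the underscore-joined uuid layout are assumptions made for illustration, since the text states only that the uuid combines the ontology, the name and a number.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The six ontology levels named in the text (identifier spellings are hypothetical).
LEVELS = ("province", "city", "county", "street_town", "road_section", "address_unit")


@dataclass
class AddressEntity:
    level: str                                    # ontology level, one of LEVELS
    name: str
    number: str                                   # administrative division or address number
    label: Optional[str] = None                   # social attribute, e.g. "school"
    center: Optional[Tuple[float, float]] = None  # (longitude, latitude) of the center point
    boundary: List[Tuple[float, float]] = field(default_factory=list)
    remarks: str = ""

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown ontology level: {self.level}")

    @property
    def uuid(self) -> str:
        # Assumed layout: ontology level, name and number joined by underscores.
        return f"{self.level}_{self.name}_{self.number}"


entity = AddressEntity("city", "Nanjing", "320100", center=(118.80, 32.06))
print(entity.uuid)
```

Deriving the uuid from the three parts, rather than storing it separately, keeps the identifier consistent with the attributes it is built from.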
According to an aspect of the present invention, step 2 further comprises:
step 21, constructing the standard address knowledge base, which involves constructing word vectors of the standard addresses, constructing relationships among entities, calculating relationships among entities, and acquiring hidden relationships; the source data of the standard address knowledge base comprise a traditional standard address library and unstructured text data;
step 22, constructing entities from the traditional standard address library, which comprises place names, longitude and latitude, address types and address labels; when a standard address is brought into the knowledge graph, it forms an entity according to the entity uuid of step 1, and its field values are normalized into the corresponding attribute values according to the mapping between fields and entity attributes;
step 23, constructing word vectors of the standard addresses from the standard knowledge base: each address character string is cut with a sliding window of length 2 and step 1, producing a group of substrings of length 2 that serve as the vector basis; the value of each vector component is the number of times the corresponding basis substring appears in the address character string;
step 24, constructing relationships between entities from structured administrative division information: the belonging relationship between a lower-level address and its upper-level address, and the equal relationship between different names of the same address, are constructed directly from the existing administrative division information;
step 25, calculating relationships between entities from longitude and latitude: the distance and orientation between every two entities are calculated, with 1 kilometer as the cutoff radius of the adjacent relationship; each of the four orientations east, west, south and north covers the interval within 45 degrees to either side of its standard angle; for address unit entities on the same road section, the actual travel distance along the road section is used as the distance attribute value of the orientation relationship;
step 26, extracting hidden relationships between existing entities in the knowledge base from the unstructured text data, namely the hidden relationship between a manually (orally) described address and the corresponding manually verified standard address;
for each piece of unstructured text data, the address elements in the text are first extracted by named entity recognition; the word vectors of the extracted address elements are then compared, by the cosine similarity algorithm, with the address word vectors of all entities in the knowledge base, and the address elements are mapped to an entity A in the knowledge base.
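The word-vector construction of step 23 (window length 2, step length 1, counts as values) can be sketched as below. The function name `bigram_vector` and the sample address string are illustrative assumptions, not taken from the source.

```python
from collections import Counter


def bigram_vector(address: str) -> Counter:
    """Count every length-2 substring, sliding the window one character at a time."""
    return Counter(address[i:i + 2] for i in range(len(address) - 1))


# Sample string (made up): "Nanjing City, Nanjing Road" written in Chinese characters.
vec = bigram_vector("南京市南京路")
print(vec)
```

A `Counter` stores only the basis substrings that actually occur, so each address carries its own sparse basis; aligning two such bases is what the comparison in step 3 addresses.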
According to an aspect of the present invention, step 3 further comprises comparing by a cosine similarity algorithm. The word vectors obtained by segmenting the non-standard address character string and the standard address character string are denoted a and b. Because their bases differ, the two vectors lie in different vector spaces and must be converted to the same one: the union of the basis of a and the basis of b is taken to form a merged basis, and both vectors are re-expressed in the merged vector space spanned by it. The similarity between the non-standard address word vector a and the standard address word vector b is then calculated with the cosine similarity formula as follows:
step 31, splicing the bases of the two word vectors into a basis union to obtain the new word vector values, for example the new vectors (1,1,0,0) and (0,1,1,1);
step 32, according to the cosine similarity algorithm:
$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
wherein a and b each represent a vector;
let vector $a = (x_1, x_2, x_3, x_4, \ldots, x_n)$ and vector $b = (y_1, y_2, y_3, y_4, \ldots, y_n)$; substituting these into the cosine similarity formula gives:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
by the above method, the standard address with the highest cosine similarity is extracted for each non-standard address to form the standard candidate set for querying that non-standard address, and entity B is then obtained from the recorded manually verified standard address.
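The comparison in step 3 can be sketched as follows: both count vectors are expressed over the union of their bases and then compared by cosine similarity. The helper names `bigram_vector`, `cosine_similarity` and `best_match` and the sample strings are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Iterable


def bigram_vector(text: str) -> Counter:
    """Length-2 sliding-window counts, as in step 23."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    basis = sorted(set(a) | set(b))        # union of the two bases
    xs = [a.get(k, 0) for k in basis]      # a expressed in the merged space
    ys = [b.get(k, 0) for k in basis]      # b expressed in the merged space
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0     # an empty vector has no direction


def best_match(query: str, standard_addresses: Iterable[str]) -> str:
    """Return the standard address whose vector is most similar to the query."""
    q = bigram_vector(query)
    return max(standard_addresses, key=lambda s: cosine_similarity(q, bigram_vector(s)))


print(cosine_similarity(bigram_vector("abc"), bigram_vector("bcde")))
print(best_match("abc", ["bcde", "abcx"]))
```

The strings "abc" and "bcde" were chosen because their merged basis (ab, bc, cd, de) yields exactly the vectors (1,1,0,0) and (0,1,1,1) used as the example in step 31.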
According to one aspect of the invention, entity B is obtained from the recorded manually verified standard address, and it is determined whether a relationship exists between entity A and entity B, both of which are mapped into the knowledge base. The additional orientation relation description in the text address is extracted by combining regular expressions with a part-of-speech tagging algorithm and is mapped to the corresponding relationship type and attributes in the knowledge base ontology; the corresponding relationship from entity A to entity B is then established. The additional orientation relation description is extracted as follows:
step 1, performing part-of-speech tagging on the text with an open-source word segmentation tool, filtering out place names, proper nouns, verbs, adjectives and time words, and splitting the text into several semantic segments;
step 2, judging by regular expression matching whether each segment is an orientation relation description;
step 3, extracting the orientation description from the matching semantic segments with regular expressions;
when there is a relationship between entity A and entity B, their probabilities of occurrence influence each other, that is:
P(B|A) ≠ P(B)
when an entity matched with the address elements is searched for in the standard address knowledge base and the two events are independent, the following holds:
P(B|A) = P(B)
and it further follows from the relationship between the two that:
P(AB) = P(B|A)P(A) = P(A)P(B)
where A and B represent independent entity events.
According to an aspect of the present invention, step 4 further comprises the following steps for extracting the address information of the original text:
step 41, identifying the address and orientation description information;
step 42, matching address entities; if no orientation description exists, the process ends;
step 43, matching the orientation description to a standard relationship, and screening the relationships that conform to the orientation description from all relationships of the matched address entity;
step 44, inferring the tail entity from the relationship; if an attribute description of the tail entity exists, screening the tail entities further;
step 45, if several relationships are inferred in succession, confirming the existence of the intermediate entity corresponding to each relationship;
step 46, screening jointly by the head and tail entities and the relationship attributes so that the address description information resolves to a unique result;
further, in step 4, building on the entity A mapped into the knowledge base and the entity B recorded in step 3 from the manually verified standard address, the address elements and the orientation relation description information in the original text are extracted by named entity recognition, and the entities matching the address elements are then searched for in the standard address knowledge base with a semantic similarity algorithm based on the address name. If the matched entity is unique and there is no orientation relation description, the standard address conversion is complete and no subsequent step is needed. If the matched entity is unique and an orientation relation description exists, the relation information is matched to a standard relationship in the knowledge base ontology by the semantic similarity algorithm to determine the relationship type and the corresponding attributes; among all relationships connected to entity A, the relationship whose attributes are most similar is searched for, and entity B is obtained. For the distance precision range, the allowable error is set to float by 30% relative to the distance attribute value. Where the text gives an exact description of address attributes, the candidates are screened by that description. Where multi-hop relationship reasoning is involved, the orientation relationships are reasoned step by step and the existence of each intermediate entity is confirmed in turn until the tail entity is finally confirmed. Where the matched entity is not unique, the orientation relationship and the attribute information are screened jointly, and the standard address of the tail entity is extracted.
According to one aspect of the present invention, screening the relationships that conform to the orientation description from all relationships of the matched address entity comprises the following steps:
43.1, establishing a non-standard address library and a standard address library that are independent of each other;
43.2, preprocessing the non-standard address library and performing first-level address matching against the standard address library;
43.3, splitting the addresses of the preprocessed non-standard address library and of the standard address library into independent address libraries, which completes the allocation of the two libraries;
43.4, performing second-level address matching between the non-standard address library and the standard address library;
43.5, after second-level matching, judging whether the traversal of the level combinations is finished; if so, the matching of the address libraries is finished; if not, step 43.4 is performed again until the addresses of the non-standard address library match those of the standard address library.
Beneficial effects: the invention provides a method and a system for converting a non-standard address into a standard address based on knowledge base reasoning. Standard addresses and the attributes of their mutual relationships are built into a knowledge base as (head entity, directed relationship, tail entity) triples and stored as a knowledge graph. By extracting the standard address contained in an unstructured address together with the related orientation and attribute elements, the head entity and the directed relationship of a triple are determined, which fixes the knowledge graph query condition and yields the tail entity address inferred from the standard address. The method applies effectively to scenarios in which users describe an address verbally, helping the system locate the real address the user refers to quickly and accurately. Compared with traditional standard address mapping algorithms, it can construct and update the knowledge base automatically from existing structured and unstructured data and perform logical reasoning, which fits actual business scenarios.
Drawings
FIG. 1 is a flow chart of the standard address repository construction of the present invention.
FIG. 2 is a flow chart of converting a non-standard address with the knowledge base of the present invention.
FIG. 3 is an address matching flow diagram of the present invention.
Detailed Description
As shown in fig. 1, in this embodiment, a method for converting a non-standard address into a standard address based on knowledge base reasoning includes:
step 1: setting an ontology of an address knowledge base;
step 2: constructing a standard address knowledge base;
step 3: comparing by a cosine similarity algorithm;
step 4: extracting the address information of the original text.
In a further embodiment, the ontology of the address knowledge base in step 1 includes a knowledge graph ontology, the uuids of entities, entity attributes, and the relationships among entities. The knowledge graph ontology includes six levels: province, city, county, street/town, road section, and address unit. The entities are the standard addresses of the different levels and are distinguished by globally unique identifiers. The uuid of an entity consists of three parts: the knowledge graph ontology, the name, and the number in the knowledge base, where the number is an administrative division number or an address number. The entity attributes include the name, type, label, center-point longitude and latitude, boundary longitude and latitude sequence, and remarks; the label is a social attribute of the address entity.
In a further embodiment, step 2 further comprises:
step 21, constructing the standard address knowledge base, which involves constructing word vectors of the standard addresses, constructing relationships among entities, calculating relationships among entities, and acquiring hidden relationships; the source data of the standard address knowledge base comprise a traditional standard address library and unstructured text data;
step 22, constructing entities from the traditional standard address library, which comprises place names, longitude and latitude, address types and address labels; when a standard address is brought into the knowledge graph, it forms an entity according to the entity uuid of step 1, and its field values are normalized into the corresponding attribute values according to the mapping between fields and entity attributes;
step 23, constructing word vectors of the standard addresses from the standard knowledge base: each address character string is cut with a sliding window of length 2 and step 1, producing a group of substrings of length 2 that serve as the vector basis; the value of each vector component is the number of times the corresponding basis substring appears in the address character string;
step 24, constructing relationships between entities from structured administrative division information: the belonging relationship between a lower-level address and its upper-level address, and the equal relationship between different names of the same address, are constructed directly from the existing administrative division information;
step 25, calculating relationships between entities from longitude and latitude: the distance and orientation between every two entities are calculated, with 1 kilometer as the cutoff radius of the adjacent relationship; each of the four orientations east, west, south and north covers the interval within 45 degrees to either side of its standard angle; for address unit entities on the same road section, the actual travel distance along the road section is used as the distance attribute value of the orientation relationship;
step 26, extracting hidden relationships between existing entities in the knowledge base from the unstructured text data, namely the hidden relationship between a manually (orally) described address and the corresponding manually verified standard address;
for each piece of unstructured text data, the address elements in the text are first extracted by named entity recognition; the word vectors of the extracted address elements are then compared, by the cosine similarity algorithm, with the address word vectors of all entities in the knowledge base, and the address elements are mapped to an entity A in the knowledge base.
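The longitude and latitude calculation of step 25 could look roughly like the sketch below. It assumes a spherical Earth (haversine distance) and classifies the initial bearing into four 90-degree sectors centered on north, east, south and west. The function names, the Earth radius constant and the sample coordinates are illustrative assumptions, and the along-road travel distance the text mentions for address units on the same road section is not modeled here.

```python
import math
from typing import Dict, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius, a common approximation
CUTOFF_M = 1_000.0             # 1 km cutoff radius for the adjacent relationship

Point = Tuple[float, float]    # (longitude, latitude) in degrees


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in meters."""
    (lon1, lat1), (lon2, lat2) = a, b
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def bearing_deg(a: Point, b: Point) -> float:
    """Initial bearing from a to b, clockwise from north, in [0, 360)."""
    (lon1, lat1), (lon2, lat2) = a, b
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360


def orientation(bearing: float) -> str:
    """Each direction covers 45 degrees to either side of its standard angle."""
    return ("north", "east", "south", "west")[int(((bearing + 45) % 360) // 90)]


def adjacent_relation(a: Point, b: Point) -> Optional[Dict[str, object]]:
    """Attributes of the adjacent relationship from a to b, or None beyond the cutoff."""
    d = haversine_m(a, b)
    if d > CUTOFF_M:
        return None
    return {"orientation": orientation(bearing_deg(a, b)), "distance_m": round(d)}


print(adjacent_relation((118.800, 32.06), (118.801, 32.06)))
```

A bearing that falls exactly on a sector boundary (45, 135, 225 or 315 degrees) is assigned to the clockwise sector in this sketch; the source does not say how ties should be resolved.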
In a further embodiment, step 3 further comprises:
comparing by a cosine similarity algorithm. The word vectors obtained by segmenting the non-standard address character string and the standard address character string are denoted a and b. Because their bases differ, the two vectors lie in different vector spaces and must be converted to the same one: the union of the basis of a and the basis of b is taken to form a merged basis, and both vectors are re-expressed in the merged vector space spanned by it. The similarity between the non-standard address word vector a and the standard address word vector b is then calculated with the cosine similarity formula as follows:
step 31, splicing the bases of the two word vectors into a basis union to obtain the new word vector values, for example the new vectors (1,1,0,0) and (0,1,1,1);
step 32, according to the cosine similarity algorithm:
$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
wherein a and b each represent a vector;
let vector $a = (x_1, x_2, x_3, x_4, \ldots, x_n)$ and vector $b = (y_1, y_2, y_3, y_4, \ldots, y_n)$; substituting these into the cosine similarity formula gives:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
by the above method, the standard address with the highest cosine similarity is extracted for each non-standard address to form the standard candidate set for querying that non-standard address, and entity B is then obtained from the recorded manually verified standard address.
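As a worked illustration of the formula, the two example vectors from step 31, (1,1,0,0) and (0,1,1,1), give the following value. The symbol \cos\theta is used here as an assumed name for the similarity score.

```latex
\cos\theta
  = \frac{1\cdot 0 + 1\cdot 1 + 0\cdot 1 + 0\cdot 1}
         {\sqrt{1^2 + 1^2 + 0^2 + 0^2}\,\sqrt{0^2 + 1^2 + 1^2 + 1^2}}
  = \frac{1}{\sqrt{2}\,\sqrt{3}}
  = \frac{1}{\sqrt{6}}
  \approx 0.408
```

Only the second component is non-zero in both vectors, so a single shared basis element contributes to the numerator.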
In a further embodiment, entity B is obtained from the recorded manually verified standard address, and it is determined whether a relationship exists between entity A and entity B, both of which are mapped into the knowledge base. The additional orientation relation description in the text address is extracted by combining regular expressions with a part-of-speech tagging algorithm and is mapped to the corresponding relationship type and attributes in the knowledge base ontology; the corresponding relationship from entity A to entity B is then established. The additional orientation relation description is extracted as follows:
step 1, performing part-of-speech tagging on the text with an open-source word segmentation tool, filtering out place names, proper nouns, verbs, adjectives and time words, and splitting the text into several semantic segments;
step 2, judging by regular expression matching whether each segment is an orientation relation description;
step 3, extracting the orientation description from the matching semantic segments with regular expressions;
when there is a relationship between entity A and entity B, their probabilities of occurrence influence each other, that is:
P(B|A) ≠ P(B)
when an entity matched with the address elements is searched for in the standard address knowledge base and the two events are independent, the following holds:
P(B|A) = P(B)
and it further follows from the relationship between the two that:
P(AB) = P(B|A)P(A) = P(A)P(B)
where A and B represent independent entity events.
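The regular-expression part of the orientation extraction (steps 2 and 3 above) might be sketched as follows. The pattern, the direction vocabulary and the function name `parse_orientation` are assumptions for illustration; the part-of-speech filtering of step 1 is omitted, so the sketch expects segments from which place names have already been removed (otherwise a name containing a direction character would match).

```python
import re
from typing import Dict, Optional

# Direction words mapped to the discrete orientation values of the adjacent relationship.
DIRECTION_WORDS = {"东": "east", "南": "south", "西": "west", "北": "north",
                   "对面": "opposite", "附近": "near"}
UNIT_TO_M = {"米": 1.0, "公里": 1000.0}   # meters, kilometers

# Optional "towards" prefix, a direction word, an optional "side" suffix,
# then an optional distance with its unit.
PATTERN = re.compile(
    r"(?:往|向)?(?P<dir>对面|附近|[东南西北])(?:边|侧|面)?"
    r"(?:(?P<num>\d+(?:\.\d+)?)(?P<unit>公里|米))?"
)


def parse_orientation(segment: str) -> Optional[Dict[str, object]]:
    """Return the orientation (and distance in meters, if stated) or None."""
    m = PATTERN.search(segment)
    if m is None:
        return None
    result: Dict[str, object] = {"orientation": DIRECTION_WORDS[m.group("dir")]}
    if m.group("num"):
        result["distance_m"] = float(m.group("num")) * UNIT_TO_M[m.group("unit")]
    return result


print(parse_orientation("往东100米"))   # segment meaning "100 meters to the east"
print(parse_orientation("三楼"))        # segment meaning "third floor": no orientation
```

Returning `None` for segments without a direction word corresponds to the judgment in step 2; the parsed dictionary corresponds to the extraction in step 3.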
In a further embodiment, step 4 further comprises the following steps for extracting the address information of the original text:
step 41, identifying the address and orientation description information;
step 42, matching address entities; if no orientation description exists, the process ends;
step 43, matching the orientation description to a standard relationship, and screening the relationships that conform to the orientation description from all relationships of the matched address entity;
step 44, inferring the tail entity from the relationship; if an attribute description of the tail entity exists, screening the tail entities further;
step 45, if several relationships are inferred in succession, confirming the existence of the intermediate entity corresponding to each relationship;
step 46, screening jointly by the head and tail entities and the relationship attributes so that the address description information resolves to a unique result;
further, in step 4, building on the entity A mapped into the knowledge base and the entity B recorded in step 3 from the manually verified standard address, the address elements and the orientation relation description information in the original text are extracted by named entity recognition, and the entities matching the address elements are then searched for in the standard address knowledge base with a semantic similarity algorithm based on the address name. If the matched entity is unique and there is no orientation relation description, the standard address conversion is complete and no subsequent step is needed. If the matched entity is unique and an orientation relation description exists, the relation information is matched to a standard relationship in the knowledge base ontology by the semantic similarity algorithm to determine the relationship type and the corresponding attributes; among all relationships connected to entity A, the relationship whose attributes are most similar is searched for, and entity B is obtained. For the distance precision range, the allowable error is set to float by 30% relative to the distance attribute value. Where the text gives an exact description of address attributes, the candidates are screened by that description. Where multi-hop relationship reasoning is involved, the orientation relationships are reasoned step by step and the existence of each intermediate entity is confirmed in turn until the tail entity is finally confirmed. Where the matched entity is not unique, the orientation relationship and the attribute information are screened jointly, and the standard address of the tail entity is extracted.
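The tail-entity inference of steps 43 to 45 can be illustrated with a small in-memory triple list. This is a sketch under stated assumptions: the triple layout, the function names, and the reading of the 30% tolerance as plus or minus 30% of the stored distance are all illustrative, and the sample entities A to D are made up.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (head uuid, relation type, attributes, tail uuid)
Triple = Tuple[str, str, Dict[str, object], str]

TOLERANCE = 0.30   # assumed to mean +/-30% of the stored distance attribute


def infer_tail(triples: Sequence[Triple], head: str, orientation: str,
               distance_m: Optional[float] = None) -> List[str]:
    """Tail entities reachable from head by an adjacent relation matching the description."""
    tails = []
    for h, rel, attrs, t in triples:
        if h != head or rel != "adjacent" or attrs.get("orientation") != orientation:
            continue
        if distance_m is not None:
            stored = float(attrs["distance_m"])
            if abs(distance_m - stored) > TOLERANCE * stored:
                continue
        tails.append(t)
    return tails


def infer_multi_hop(triples: Sequence[Triple], head: str,
                    hops: Sequence[Tuple[str, Optional[float]]]) -> Optional[str]:
    """Follow each hop in turn; give up if an intermediate entity is missing or ambiguous."""
    current = head
    for orientation, distance_m in hops:
        tails = infer_tail(triples, current, orientation, distance_m)
        if len(tails) != 1:
            return None
        current = tails[0]
    return current


KG: List[Triple] = [
    ("A", "adjacent", {"orientation": "east", "distance_m": 100}, "B"),
    ("A", "adjacent", {"orientation": "east", "distance_m": 500}, "C"),
    ("B", "adjacent", {"orientation": "north", "distance_m": 200}, "D"),
]
print(infer_tail(KG, "A", "east", 120))
print(infer_multi_hop(KG, "A", [("east", 100), ("north", 210)]))
```

Without a stated distance both B and C remain candidates for "east of A", which is the non-unique case that the text resolves by joint screening on relationship and attribute information.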
In a further embodiment, screening the relationships that conform to the orientation description from all relationships of the matched address entity comprises the following steps:
43.1, establishing a non-standard address library and a standard address library that are independent of each other;
43.2, preprocessing the non-standard address library and performing first-level address matching against the standard address library;
43.3, splitting the addresses of the preprocessed non-standard address library and of the standard address library into independent address libraries, which completes the allocation of the two libraries;
43.4, performing second-level address matching between the non-standard address library and the standard address library;
43.5, after second-level matching, judging whether the traversal of the level combinations is finished; if so, the matching of the address libraries is finished; if not, step 43.4 is performed again until the addresses of the non-standard address library match those of the standard address library.
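The source leaves the details of the level-by-level matching open, so the following is only one possible reading of the traversal in steps 43.4 and 43.5: addresses are split into level fields, and combinations of levels are tried from most to least specific until exactly one standard address matches. The function name, the field names and the sample records are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence

Address = Dict[str, str]   # level name -> value


def match_by_levels(query: Address, standard: List[Address],
                    levels: Sequence[str]) -> Optional[Address]:
    """Traverse level combinations, largest first, until a unique match is found."""
    for size in range(len(levels), 0, -1):
        for combo in combinations(levels, size):
            if not all(k in query for k in combo):
                continue                      # the query lacks one of these levels
            hits = [s for s in standard if all(s.get(k) == query[k] for k in combo)]
            if len(hits) == 1:
                return hits[0]
    return None                               # traversal finished without a unique match


STANDARD = [
    {"city": "Nanjing", "road": "Zhongshan Road", "unit": "1"},
    {"city": "Nanjing", "road": "Zhongshan Road", "unit": "2"},
    {"city": "Nanjing", "road": "Hanzhong Road", "unit": "1"},
]
LEVELS = ("city", "road", "unit")
print(match_by_levels({"city": "Nanjing", "road": "Hanzhong Road"}, STANDARD, LEVELS))
```

Trying larger combinations first prefers the most specific evidence the query offers, and returning `None` marks the case in which the traversal ends without a match.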
In a further embodiment, the label is a social attribute of the address entity, such as "store, supermarket, school, hospital, institution, enterprise, residential district", etc. Different types of entities carry different attribute sets: "area" type entities, such as provinces, cities, counties, towns, communities and residential districts, should include all attributes; "point" type entities, such as address units, need not include the "boundary longitude and latitude sequence"; "line" type entities, such as road sections, need not include the "center point longitude and latitude".
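The attribute rules for the three geometry kinds could be checked with a lookup like the one below. The key names and the function `missing_attributes` are assumptions made for illustration.

```python
# The six entity attributes listed for the ontology (key names are hypothetical).
ALL_ATTRIBUTES = frozenset({"name", "type", "label", "center", "boundary", "remarks"})

REQUIRED_BY_GEOMETRY = {
    "area": ALL_ATTRIBUTES,                   # e.g. province, city, county
    "point": ALL_ATTRIBUTES - {"boundary"},   # e.g. address unit
    "line": ALL_ATTRIBUTES - {"center"},      # e.g. road section
}


def missing_attributes(geometry: str, attrs: dict) -> set:
    """Attributes that the given geometry kind requires but attrs does not provide."""
    return set(REQUIRED_BY_GEOMETRY[geometry]) - set(attrs)


point_attrs = {"name": "x", "type": "address_unit", "label": "school",
               "center": (118.8, 32.06), "remarks": ""}
print(missing_attributes("point", point_attrs))
print(missing_attributes("area", point_attrs))
```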
in further embodiments, the relationships between entities are classified into four types of relationships, i.e., "belong to", "equal", "adjacent", "cross", and so on:
the belonging relationship refers to a spatial contained relationship in which lower-level entities belong to upper-level entities among six levels of entities. Typically, a subordinate entity can only have a relationship with a nearest superior entity. However, there are exceptions, such as one "address unit" class entity corresponds to "intersection", and may belong to a plurality of "road section" type entities, or one "road section" class entity spans different areas, and may belong to different street towns;
the equal relation means that different place entities actually correspond to the same place due to different name calling methods or space superposition and the like;
the neighbor relation contains two attributes: "orientation" and "distance". Wherein the orientations include discrete values such as "south", "north", "east", "west", "opposite", "near", and the like; the distance is a specific numerical value and the unit is meter;
the intersection relation refers to a line-type address entity, such as an intersection generated by intersection between road sections (street, road, lane), wherein the intersection also corresponds to an address unit-type entity. The head entity and the tail entity of the cross relationship are respectively a road section type entity, the attribute of the cross relationship comprises two attributes of an intersection type and an intersection entity, the attribute value of the intersection type is equal to the intersection type and the intersection entity, and the attribute value of the intersection entity is uuid of the intersection type entity generated by the intersection.
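The ontology described above can be pictured as two small record types. The following is a minimal sketch only: the field names, the `kind` values, and the colon-separated uuid format are illustrative assumptions, and the "knowledge graph ontology" part of the uuid is assumed here to mean the hierarchy level.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Six hierarchy levels and four relation types named in the description.
LEVELS = ("province", "city", "district_county", "street_town",
          "road_section", "address_unit")
RELATION_TYPES = ("belong_to", "equal", "adjacent", "cross")


@dataclass
class Entity:
    level: str                      # one of LEVELS
    name: str
    number: str                     # administrative division number or address number
    kind: str = "point"             # assumed encoding of the type: "point", "line" or "face"
    label: Optional[str] = None     # social attribute, e.g. "supermarket"
    center: Optional[Tuple[float, float]] = None          # (lat, lon); not needed for lines
    boundary: List[Tuple[float, float]] = field(default_factory=list)  # not needed for points
    remark: str = ""

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")

    @property
    def uuid(self) -> str:
        # Hypothetical format: the source only says the uuid has three parts.
        return f"{self.level}:{self.name}:{self.number}"


@dataclass
class Relation:
    head: str                       # uuid of the head entity
    rel_type: str                   # one of RELATION_TYPES
    tail: str                       # uuid of the tail entity
    attrs: Dict[str, object] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.rel_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.rel_type}")
```

An adjacent relation would then carry `attrs={"orientation": "east", "distance": 100}`, and a cross relation would carry the intersection type and the uuid of the intersection entity.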
In a further embodiment, the latitude and longitude are used to calculate the relationships between entities. For example, if the latitude and longitude show that "new street crossing" lies 100 m from the "Huawei building" in a direction 10° off due east, an adjacent relation with the attributes "orientation: east; distance: 100 meters" can be added from the "Huawei building" entity to the "new street crossing" entity.
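One way to derive such a relation from coordinates is sketched below. It assumes the haversine great-circle distance and the initial bearing formula on a spherical Earth, which the source does not specify; the 1 km cutoff and the 45° half-width of each direction interval follow the description of step 25, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return (distance in meters, bearing in degrees; 0 = north, 90 = east)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    # Haversine distance.
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    dist = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    # Initial bearing from point 1 to point 2.
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return dist, bearing


def orientation(bearing):
    """Map a bearing to north/east/south/west, each covering +/-45 degrees."""
    names = ("north", "east", "south", "west")
    return names[int(((bearing + 45.0) % 360.0) // 90.0)]


def adjacent_relation(lat1, lon1, lat2, lon2, cutoff_m=1000.0):
    """Return adjacent-relation attributes, or None beyond the cutoff radius."""
    dist, bearing = distance_and_bearing(lat1, lon1, lat2, lon2)
    if dist > cutoff_m:
        return None
    return {"orientation": orientation(bearing), "distance": round(dist)}
```

A bearing of 80°, that is 10° off due east, falls in the east interval, which matches the example above. For entities on the same road section the description uses travel distance along the road rather than straight-line distance, which this sketch does not model.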
In a further embodiment, the regular expression is (near the opposite|east|south|west|north|side|adjacent|side|next|partition|)[\u4E00-\u9FA5|0-9]{0,3}$, where the alternatives are the direction words as rendered in translation. It indicates that a semantic fragment conforming to the orientation description must contain one of the words "opposite", "east", etc., and that this word must be no more than three characters from the end of the string. For example, for a text reporting something that happened across the street from the Huawei building at eight in the morning, the entity A "Huawei building" is identified and extracted through named entity recognition; the orientation semantic description "across the street" is then extracted through the regular expression template and the part-of-speech tagging algorithm and mapped by the semantic similarity algorithm to the standard relation type "opposite"; combined with the manually verified address of the text, an "opposite" relation can then be added between the entity A "Huawei building" and the entity B "Guangxi building".
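The template above can be tried out with Python's `re` module. This is a sketch under a loud assumption: the original Chinese direction words are not recoverable from the translated text, so the `DIRECTION_WORDS` list below is a guess at plausible keywords, and `orientation_keyword` is a hypothetical helper name.

```python
import re

# ASSUMED keyword list; the source only gives English glosses such as
# "opposite", "east", "south", "west", "north", "side", "adjacent", "next", "near".
DIRECTION_WORDS = ("对面", "东", "南", "西", "北", "旁", "邻", "边", "隔壁", "附近")

# A direction word followed by at most three trailing characters before the
# end of the fragment. The character class is copied from the source; note
# that the "|" inside the brackets matches a literal pipe character.
ORIENTATION_RE = re.compile(
    "(" + "|".join(DIRECTION_WORDS) + ")" + r"[\u4E00-\u9FA5|0-9]{0,3}$"
)


def orientation_keyword(fragment):
    """Return the direction word if the fragment is an orientation description."""
    match = ORIENTATION_RE.search(fragment)
    return match.group(1) if match else None
```

A fragment such as `"对面街上"` yields `"对面"`, while a fragment with more than three characters after the direction word, or with no direction word at all, yields `None`.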
In a further embodiment, named entity recognition extracts the address elements and the orientation-relation description from the original text; for example, from "the store 120 m east of the building", the address element "building" and the description "store 120 m east" are extracted.
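In a further embodiment, where the matched entity is unique and an orientation description is present, the description is mapped to a standard relation; for example, "120 meters east" is mapped to an adjacent relation with the attributes "orientation: east; distance: 120 m".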
In a further embodiment, when an exact description of an address attribute occurs, as with "shop" in "the shop 120 m east of the building", the address attribute "shop" is used as a necessary condition when querying the tail entity B.
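The matching of a described relation against the stored relations of entity A, including the label condition, might look like the sketch below. The dictionary layout, the sample data, and the reading of the 30% allowance as a symmetric tolerance relative to the stored distance attribute are all assumptions made for illustration.

```python
# Hypothetical stored relations of one head entity and labels of the tails.
RELATIONS = [
    {"head": "building", "type": "adjacent", "tail": "shop_1", "orientation": "east", "distance": 130},
    {"head": "building", "type": "adjacent", "tail": "bank_1", "orientation": "east", "distance": 110},
    {"head": "building", "type": "adjacent", "tail": "shop_2", "orientation": "east", "distance": 200},
    {"head": "building", "type": "adjacent", "tail": "shop_3", "orientation": "west", "distance": 120},
]
LABELS = {"shop_1": "shop", "bank_1": "bank", "shop_2": "shop", "shop_3": "shop"}


def find_tails(relations, labels, head, orientation, distance_m,
               label=None, tolerance=0.30):
    """Return tail entities whose adjacent relation fits the description."""
    hits = []
    for rel in relations:
        if rel["head"] != head or rel["type"] != "adjacent":
            continue
        if rel["orientation"] != orientation:
            continue
        # Assumed reading: the stated distance may deviate by up to 30%
        # of the stored distance attribute value.
        if abs(rel["distance"] - distance_m) > tolerance * rel["distance"]:
            continue
        # An explicit attribute such as "shop" is a necessary condition.
        if label is not None and labels.get(rel["tail"]) != label:
            continue
        hits.append(rel["tail"])
    return hits
```

With this data, "the shop 120 m east of the building" resolves to `shop_1` only, whereas dropping the label condition would also admit `bank_1`.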
In a further embodiment, when multi-hop relation reasoning is involved, the orientation relations are inferred step by step and the existence of each intermediate entity is confirmed in turn until the tail entity is confirmed to exist. For example, for "opposite the Suguo supermarket 50 meters east of the intersection of Yellow Mountain Road and Level Road", all cross relations of the "Yellow Mountain Road" entity are searched to find the one corresponding to the "Level Road" entity; the uuid of the intersection entity is read from the attributes of that relation and the entity is located; from the intersection entity, the tail entity of the relation with "orientation: east; distance: 50 meters" is found, and its label attribute is confirmed to be "supermarket"; starting from that tail entity, the address entity in an "opposite" relation with it is then searched for. The process must confirm the existence of every intermediate entity, and the conversion fails if any of them does not exist.
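The three hops of this example can be sketched over a toy list of triples. Everything below is illustrative: the relation layout, the entity identifiers such as `intersection-1` and `Example Plaza`, and the symmetric 30% distance tolerance are assumptions, not the patent's data model.

```python
RELATIONS = [
    {"head": "Yellow Mountain Road", "type": "cross", "tail": "Level Road",
     "intersection": "intersection-1"},
    {"head": "intersection-1", "type": "adjacent", "tail": "Suguo supermarket",
     "orientation": "east", "distance": 50},
    {"head": "Suguo supermarket", "type": "adjacent", "tail": "Example Plaza",
     "orientation": "opposite", "distance": 20},
]
LABELS = {"Suguo supermarket": "supermarket"}


def first_match(relations, **conditions):
    """Return the first relation whose fields equal all given conditions."""
    for rel in relations:
        if all(rel.get(key) == value for key, value in conditions.items()):
            return rel
    return None


def resolve(relations, labels, road_a, road_b, orientation, distance_m, label):
    """Follow cross -> adjacent -> opposite; None if any hop is missing."""
    # Hop 1: the cross relation gives the intersection entity.
    cross = first_match(relations, head=road_a, type="cross", tail=road_b)
    if cross is None:
        return None
    # Hop 2: from the intersection, find the labelled entity in the stated direction.
    middle = None
    for rel in relations:
        if (rel["head"] == cross["intersection"]
                and rel["type"] == "adjacent"
                and rel.get("orientation") == orientation
                and abs(rel["distance"] - distance_m) <= 0.30 * rel["distance"]
                and labels.get(rel["tail"]) == label):
            middle = rel["tail"]
            break
    if middle is None:
        return None
    # Hop 3: the entity opposite the intermediate entity is the answer.
    opposite = first_match(relations, head=middle, type="adjacent", orientation="opposite")
    return opposite["tail"] if opposite else None
```

Returning `None` as soon as a hop finds nothing mirrors the rule that the conversion fails when an intermediate entity does not exist.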
In a further embodiment, when the matched entity is not unique, the orientation relation and attribute information are screened jointly. For example, for "the Suguo supermarket opposite the building", all address entities in the city whose names contain "building" are searched, and those whose "opposite" relation has a tail entity named "Suguo supermarket" are retained; the head and tail entities satisfying the conditions can then be uniquely determined, and the standard address of the tail entity is extracted.
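Joint screening can be sketched as a filter over candidate head entities and their relations. The entity identifiers, names and addresses below are made up for illustration, and substring matching on names stands in for the semantic similarity matching that the description actually calls for.

```python
ENTITIES = {
    "b1": {"name": "Example Building", "address": "No. 1 Example Road"},
    "b2": {"name": "Sample Building", "address": "No. 9 Sample Road"},
    "s1": {"name": "Suguo supermarket", "address": "No. 2 Example Road"},
    "k1": {"name": "Corner bank", "address": "No. 10 Sample Road"},
}
RELATIONS = [
    {"head": "b1", "tail": "s1", "orientation": "opposite"},
    {"head": "b2", "tail": "k1", "orientation": "opposite"},
]


def joint_screen(entities, relations, head_keyword, tail_name,
                 rel_orientation="opposite"):
    """Return the standard address of the unique matching tail, else None."""
    # Every entity whose name contains the keyword is a candidate head.
    heads = {uid for uid, ent in entities.items() if head_keyword in ent["name"]}
    # Keep only relations that satisfy head, orientation and tail name together.
    tails = {
        rel["tail"]
        for rel in relations
        if rel["head"] in heads
        and rel.get("orientation") == rel_orientation
        and tail_name in entities[rel["tail"]]["name"]
    }
    if len(tails) != 1:
        return None  # ambiguous or not found
    return entities[tails.pop()]["address"]
```

Here two entities match "Building", but only one of them has a "Suguo supermarket" opposite it, so the pair is determined uniquely.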
In a further embodiment, a system implementing the method for converting non-standard addresses into standard addresses based on knowledge base reasoning comprises the following modules:
the hierarchy distribution module is used for setting the ontology of the address knowledge base; the hierarchy distribution module comprises a knowledge graph ontology, uuids of entities, entity attributes and relationships between entities, wherein the knowledge graph ontology comprises six hierarchies, namely province, city, district/county, street/town, road section and address unit; the entities are the standard addresses of the different hierarchies and are distinguished by globally unique identifiers; the uuid of an entity consists of three parts, namely the knowledge graph ontology, the name, and the number in the knowledge base; the number is an administrative division number or an address number; the entity attributes comprise name, type, label, center point latitude and longitude, boundary latitude and longitude sequence, and remarks, and the label is a social attribute of the address entity;
the standard address construction module is used for constructing the standard address knowledge base, and further performs the following steps:
step 21, constructing the standard address knowledge base, which comprises constructing word vectors of the standard addresses, constructing relationships between entities, calculating relationships between entities, and acquiring hidden relationships, wherein the standard address knowledge base is constructed from a traditional standard address library and unstructured text data;
step 22, constructing entities from the traditional standard address library, wherein the traditional standard address library comprises place names, latitude and longitude, address types and address labels; when incorporated into the knowledge graph, each standard address forms an entity according to the entity uuid defined in step 1, and its field values are normalized into the corresponding attribute values according to the mapping between fields and entity attributes;
step 23, constructing word vectors of the standard addresses from the standard knowledge base, wherein each address character string is cut with a step length of 1 and a window length of 2 to generate a group of character strings of length 2 that serve as the vector basis, and each component of the vector is the number of times the corresponding basis string appears in the address character string;
step 24, constructing relationships between entities from structured administrative division information, wherein the belonging relationship between lower-level and upper-level addresses, and the equal relationship of a same address arising from different naming conventions, are constructed directly from existing administrative division information;
step 25, calculating relationships between entities from latitude and longitude, wherein the distance and orientation between every two entities are calculated, 1 kilometer is taken as the cutoff radius of the adjacent relationship, each of the four orientations east, west, south and north takes the interval of 45 degrees to the left and right of its standard angle as its direction interval, and, for address unit entities on the same road section, the actual travel distance along the road section is taken as the distance attribute value of the orientation relationship;
step 26, extracting hidden relationships between existing entities in the knowledge base from the unstructured text data, that is, acquiring the hidden relationship between an address described orally by a person and the corresponding manually calibrated standard address;
for each piece of unstructured text data, the address elements in the text are first extracted by named entity recognition; the extracted address elements are then compared, through a cosine similarity algorithm, with the constructed word vectors of the standard addresses and the address word vectors of the entities in the knowledge base, and are mapped to an entity A in the knowledge base;
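Steps 23 and 26 can be illustrated together: build character-bigram count vectors and map an extracted address element to the most similar knowledge base name. This is a sketch in which `collections.Counter` serves as a sparse vector; the function names and the candidate names in the usage note are hypothetical.

```python
import math
from collections import Counter


def bigram_vector(address):
    """Cut the string with step 1 and window 2; count each length-2 piece."""
    return Counter(address[i:i + 2] for i in range(len(address) - 1))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    # Only basis strings present in both vectors contribute to the dot product.
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_to_entity(element, names):
    """Return the knowledge base name most similar to the address element."""
    vec = bigram_vector(element)
    return max(names, key=lambda name: cosine(vec, bigram_vector(name)))
```

For instance, `map_to_entity("华为大厦", ["华为大厦A座", "广西大厦"])` picks the first candidate, because it shares three bigrams with the query while the second shares only one. Using sparse counters makes the union of the two bases implicit, since a basis string missing from one vector simply counts as zero.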
the vector comparison module is used for comparing by a cosine similarity algorithm; the word vectors obtained by segmenting the non-standard address character string and the standard address character string are denoted as vectors a and b; because their bases differ, the vectors lie in different vector spaces and must be converted into the same vector space; the module takes the union of the bases of a and b to form a merged basis, converts a and b into the new vector space spanned by the merged basis, and calculates the similarity between the non-standard address word vector a and the standard address word vector b with the cosine similarity formula, in the following steps:
step 31, splicing the bases of the two word vectors to form the union of the vector bases and obtaining the new word vector values, wherein the generated new vectors are (1,1,0,0) and (0,1,1,1);
step 32, obtaining the following expression according to the cosine similarity algorithm:
$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
wherein a and b each represent a vector;
let vector a = (x_1, x_2, x_3, x_4, …, x_n) and vector b = (y_1, y_2, y_3, y_4, …, y_n), and substitute them into the cosine similarity expression to obtain:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
in this way, for each non-standard address the standard addresses with the highest cosine similarity are extracted to form a standard candidate set for querying that non-standard address, and an entity B is further obtained according to the recorded manually verified standard address;
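As a worked check of the cosine similarity computation, the two merged-basis vectors from step 31, a = (1,1,0,0) and b = (0,1,1,1), can be substituted directly. The numbers are those given in step 31; only the arithmetic is added here.

```latex
\cos\theta
  = \frac{1\cdot 0 + 1\cdot 1 + 0\cdot 1 + 0\cdot 1}
         {\sqrt{1^2 + 1^2 + 0^2 + 0^2}\,\sqrt{0^2 + 1^2 + 1^2 + 1^2}}
  = \frac{1}{\sqrt{2}\,\sqrt{3}}
  = \frac{1}{\sqrt{6}}
  \approx 0.408
```

The two vectors share a single basis string, so the similarity is well below 1; identical addresses would give exactly 1.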
the address information screening module is used for extracting the address information of the original text, and performs the following steps:
step 41, identifying the address and orientation description information;
step 42, matching address entities, and ending the process if there is no orientation description;
step 43, matching the orientation description to a standard relationship, and screening the relationships that conform to the orientation description from all relationships of the matched address entity;
step 44, inferring the tail entity from the relationship, and further screening the tail entity if an attribute description of the tail entity exists;
step 45, if several relationships are inferred in succession, confirming the existence of the intermediate entity corresponding to each relationship;
step 46, jointly screening by head entity, tail entity and relationship attributes to ensure the uniqueness of the address description information;
Further, in step 4, given the entity A mapped into the knowledge base and the entity B obtained in step 3 from the recorded manually verified standard address, the address elements and the orientation-relation description in the original text are extracted through named entity recognition, and an entity matching the address elements is then searched for in the standard address knowledge base using a semantic similarity algorithm based on the address name. If the matched entity is unique and there is no orientation-relation description, the standard address conversion is complete and no subsequent steps are needed. If the matched entity is unique and an orientation-relation description is present, the description is matched to a standard relation in the knowledge base ontology through the semantic similarity algorithm to determine the relation type and its attributes; among all relations connected to the entity A, the relation whose attributes are most similar is then found, yielding the entity B. For the distance precision, the allowable error range is set to float by 30% of the distance attribute value. When an exact description of an address attribute is present, it is used as a necessary condition for querying the tail entity. When multi-hop relation reasoning is involved, the orientation relations are inferred step by step and the existence of each intermediate entity is confirmed in turn until the tail entity is confirmed to exist. When the matched entity is not unique, the orientation relation and attribute information are screened jointly, and the standard address of the tail entity is extracted.
In summary, the present invention has the following advantages: a knowledge base is constructed from standard addresses and their mutual relationship attributes in the form of "head entity, directed relationship, tail entity" triples and is stored as a knowledge graph; by extracting the standard address contained in an unstructured address together with the related orientation and attribute elements, the head entity and the directed relationship of a triple are determined, which fixes the knowledge graph query condition and yields the tail entity address inferred from the standard address. Non-standardized geographic position information described orally by a user can thus be processed and converted into standard address information that a machine can process.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. To avoid unnecessary repetition, the possible combinations are not described separately.

Claims (10)

1. A method for converting a non-standard address into a standard address based on knowledge base reasoning, characterized by comprising the following steps:
step 1: setting the ontology of an address knowledge base;
step 2: constructing a standard address knowledge base;
step 3: comparing by a cosine similarity algorithm;
step 4: extracting the address information of the original text.
2. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 1, wherein the ontology of the address knowledge base in step 1 comprises a knowledge graph ontology, uuids of entities, entity attributes and relationships between entities, wherein the knowledge graph ontology comprises six hierarchies, namely province, city, district/county, street/town, road section and address unit; the entities are the standard addresses of the different hierarchies and are distinguished by globally unique identifiers; the uuid of an entity consists of three parts, namely the knowledge graph ontology, the name, and the number in the knowledge base; the number is an administrative division number or an address number; the entity attributes comprise name, type, label, center point latitude and longitude, boundary latitude and longitude sequence, and remarks, and the label is a social attribute of the address entity.
3. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 1, wherein step 2 further comprises:
step 21, constructing the standard address knowledge base, which comprises constructing word vectors of the standard addresses, constructing relationships between entities, calculating relationships between entities, and acquiring hidden relationships, wherein the standard address knowledge base is constructed from a traditional standard address library and unstructured text data;
step 22, constructing entities from the traditional standard address library, wherein the traditional standard address library comprises place names, latitude and longitude, address types and address labels; when incorporated into the knowledge graph, each standard address forms an entity according to the entity uuid defined in step 1, and its field values are normalized into the corresponding attribute values according to the mapping between fields and entity attributes;
step 23, constructing word vectors of the standard addresses from the standard knowledge base, wherein each address character string is cut with a step length of 1 and a window length of 2 to generate a group of character strings of length 2 that serve as the vector basis, and each component of the vector is the number of times the corresponding basis string appears in the address character string;
step 24, constructing relationships between entities from structured administrative division information, wherein the belonging relationship between lower-level and upper-level addresses, and the equal relationship of a same address arising from different naming conventions, are constructed directly from existing administrative division information;
step 25, calculating relationships between entities from latitude and longitude, wherein the distance and orientation between every two entities are calculated, 1 kilometer is taken as the cutoff radius of the adjacent relationship, each of the four orientations east, west, south and north takes the interval of 45 degrees to the left and right of its standard angle as its direction interval, and, for address unit entities on the same road section, the actual travel distance along the road section is taken as the distance attribute value of the orientation relationship;
step 26, extracting hidden relationships between existing entities in the knowledge base from the unstructured text data, that is, acquiring the hidden relationship between an address described orally by a person and the corresponding manually calibrated standard address;
for each piece of unstructured text data, the address elements in the text are first extracted by named entity recognition; the extracted address elements are then compared, through a cosine similarity algorithm, with the constructed word vectors of the standard addresses and the address word vectors of the entities in the knowledge base, and are mapped to an entity A in the knowledge base.
4. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 1, wherein step 3 further comprises:
comparing by a cosine similarity algorithm, wherein the word vectors obtained by segmenting the non-standard address character string and the standard address character string are denoted as vectors a and b; because their bases differ, the vectors lie in different vector spaces and must be converted into the same vector space; the union of the bases of a and b is taken to form a merged basis, a and b are converted into the new vector space spanned by the merged basis, and the similarity between the non-standard address word vector a and the standard address word vector b is calculated with the cosine similarity formula, in the following steps:
step 31, splicing the bases of the two word vectors to form the union of the vector bases and obtaining the new word vector values, wherein the generated new vectors are (1,1,0,0) and (0,1,1,1);
step 32, obtaining the following expression according to the cosine similarity algorithm:
$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
wherein a and b each represent a vector;
let vector a = (x_1, x_2, x_3, x_4, …, x_n) and vector b = (y_1, y_2, y_3, y_4, …, y_n), and substitute them into the cosine similarity expression to obtain:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
in this way, for each non-standard address the standard addresses with the highest cosine similarity are extracted to form a standard candidate set for querying that non-standard address, and an entity B is further obtained according to the recorded manually verified standard address.
5. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 3 or 4, wherein the entity B is obtained according to the recorded manually verified standard address, and the relationship between the entity A mapped into the knowledge base and the entity B is determined; the additional orientation-relation description in the text address is extracted by combining a regular expression with a part-of-speech tagging algorithm and mapped to the corresponding relation type and attributes in the knowledge base ontology, and the corresponding relation from the entity A to the entity B is then established, wherein the additional orientation-relation description is extracted as follows:
step 1, performing part-of-speech tagging on the text with an open-source word segmentation tool, filtering out place names, proper nouns, verbs, adjectives and time words, and segmenting the text into a plurality of semantic fragments;
step 2, judging by regular expression matching whether each fragment is an orientation-relation description;
step 3, describing the semantic fragments of the orientation by a regular expression;
there is a relationship between the entity A and the entity B, and their probabilities of occurrence influence each other, that is:
$$P(B \mid A) \neq P(B)$$
searching the standard address knowledge base for an entity matching the address elements gives:
$$P(B \mid A) = P(B)$$
from which the following is further derived:
$$P(AB) = P(B \mid A)\,P(A) = P(A)\,P(B)$$
where A and B represent independent entity vector events.
6. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 1, wherein step 4 further comprises the following steps for extracting the address information of the original text:
step 41, identifying address and direction description information;
step 42, matching address entities, and if no azimuth description exists, ending the process;
step 43, matching the orientation description into a standard relationship, and screening the relationship conforming to the orientation description from all the relationships of the matched address entities;
step 44, deducing a tail entity according to the relationship, and if the attribute description of the tail entity exists, further screening the tail entity;
step 45, if several relationships are inferred in succession, confirming the existence of the intermediate entity corresponding to each relationship;
step 46, screening the uniqueness of the address description information jointly according to the head and tail entities and the relationship attributes;
further, in step 4, given the entity A mapped into the knowledge base and the entity B obtained in step 3 from the recorded manually verified standard address, the address elements and the orientation-relation description in the original text are extracted through named entity recognition, and an entity matching the address elements is then searched for in the standard address knowledge base using a semantic similarity algorithm based on the address name; if the matched entity is unique and there is no orientation-relation description, the standard address conversion is complete and no subsequent steps are needed; if the matched entity is unique and an orientation-relation description is present, the description is matched to a standard relation in the knowledge base ontology through the semantic similarity algorithm to determine the relation type and its attributes, and, among all relations connected to the entity A, the relation whose attributes are most similar is found, yielding the entity B, wherein, for the distance precision, the allowable error range is set to float by 30% of the distance attribute value; when an exact description of an address attribute is present, it is used as a necessary condition for querying the tail entity; when multi-hop relation reasoning is involved, the orientation relations are inferred step by step and the existence of each intermediate entity is confirmed in turn until the tail entity is confirmed to exist; and when the matched entity is not unique, the orientation relation and attribute information are screened jointly, and the standard address of the tail entity is extracted.
7. The method for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 6, wherein screening the relationships that conform to the orientation description from all relationships of the matched address entities comprises the following steps:
43.1, establishing a non-standard address library and a standard address library that are independent of each other;
43.2, preprocessing the non-standard address library and performing first-level address matching against the standard address library;
43.3, splitting the addresses of the preprocessed non-standard address library and of the standard address library to form independent address libraries, and completing the allocation of the non-standard address library and the standard address library;
43.4, performing second-level address matching between the non-standard address library and the standard address library;
43.5, after the second-level address matching, judging whether the traversal of the level combination modes is finished; if so, the matching of the address libraries is complete; if not, step 43.4 is performed again so that the addresses of the non-standard address library and the standard address library are matched.
8. A system for converting a non-standard address into a standard address based on knowledge base reasoning, characterized by comprising the following modules:
a hierarchy distribution module, used for setting the ontology of the address knowledge base;
a standard address construction module, used for constructing the standard address knowledge base;
a vector comparison module, used for comparing by a cosine similarity algorithm;
and an address information screening module, used for extracting the address information of the original text.
9. The system for converting a non-standard address into a standard address based on knowledge base reasoning according to claim 8, wherein the hierarchy distribution module comprises a knowledge graph ontology, uuids of entities, entity attributes and relationships between entities, wherein the knowledge graph ontology comprises six hierarchies, namely province, city, district/county, street/town, road section and address unit; the entities are the standard addresses of the different hierarchies and are distinguished by globally unique identifiers; the uuid of an entity consists of three parts, namely the knowledge graph ontology, the name, and the number in the knowledge base; the number is an administrative division number or an address number; the entity attributes comprise name, type, label, center point latitude and longitude, boundary latitude and longitude sequence, and remarks, and the label is a social attribute of the address entity;
the standard address construction module further performs the following steps:
step 21, constructing the standard address knowledge base, which comprises constructing word vectors of the standard addresses, constructing relationships between entities, calculating relationships between entities, and acquiring hidden relationships, wherein the standard address knowledge base is constructed from a traditional standard address library and unstructured text data;
step 22, constructing entities from the traditional standard address library, wherein the traditional standard address library comprises place names, latitude and longitude, address types and address labels; when incorporated into the knowledge graph, each standard address forms an entity according to the entity uuid defined in step 1, and its field values are normalized into the corresponding attribute values according to the mapping between fields and entity attributes;
step 23, constructing word vectors of the standard addresses from the standard knowledge base, wherein each address character string is cut with a step length of 1 and a window length of 2 to generate a group of character strings of length 2 that serve as the vector basis, and each component of the vector is the number of times the corresponding basis string appears in the address character string;
step 24, constructing relationships between entities from structured administrative division information, wherein the belonging relationship between lower-level and upper-level addresses, and the equal relationship of a same address arising from different naming conventions, are constructed directly from existing administrative division information;
step 25, calculating relationships between entities from latitude and longitude, wherein the distance and orientation between every two entities are calculated, 1 kilometer is taken as the cutoff radius of the adjacent relationship, each of the four orientations east, west, south and north takes the interval of 45 degrees to the left and right of its standard angle as its direction interval, and, for address unit entities on the same road section, the actual travel distance along the road section is taken as the distance attribute value of the orientation relationship;
step 26, extracting hidden relationships between existing entities in the knowledge base from the unstructured text data, that is, acquiring the hidden relationship between an address described orally by a person and the corresponding manually calibrated standard address;
for each piece of unstructured text data, the address elements in the text are first extracted by named entity recognition; the extracted address elements are then compared, through a cosine similarity algorithm, with the constructed word vectors of the standard addresses and the address word vectors of the entities in the knowledge base, and are mapped to an entity A in the knowledge base.
10. The system of claim 8, wherein the vector comparison module performs the comparison by a cosine similarity algorithm: the word vectors obtained by segmenting the non-standard address character string and the standard address character string are recorded as vectors a and b; because a and b have different bases, they lie in different vector spaces and must be converted into the same vector space; the module takes the union of the bases of a and b to form a merged basis and converts a and b into the new merged vector space spanned by that merged basis; the similarity between the non-standard address word vector a and the standard address word vector b is then calculated with the cosine similarity formula in the following steps:
step 31, merging the bases of the two word vectors into a basis union and obtaining the new word vector values over that union, wherein the generated new vectors are (1,1,0,0) and (0,1,1,1);
step 32, obtaining the following formula according to the cosine similarity algorithm:
$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$
wherein a and b each represent a vector;
letting vector a = (x_1, x_2, x_3, x_4, …, x_n) and vector b = (y_1, y_2, y_3, y_4, …, y_n), and substituting them into the cosine similarity formula, gives:
$$\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}} \, \sqrt{\sum_{i=1}^{n} y_i^{2}}}$$
in the above manner, for each non-standard address the standard addresses with the highest cosine similarity are extracted to form a standard candidate set for querying that non-standard address, and an entity B is further obtained from the recorded manually verified standard address;
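The basis merging and cosine comparison of steps 31 and 32 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the helper names, the sample strings "abc" and "bcde" (chosen so that the merged vectors come out as (1,1,0,0) and (0,1,1,1)), and the `top_k` candidate-set size are all hypothetical.

```python
import math
from collections import Counter


def bigram_vector(text: str) -> Counter:
    """Count two-character substrings (window 2, step 1)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def to_merged_space(a: Counter, b: Counter):
    """Project both vectors onto the union of their bases."""
    basis = sorted(set(a) | set(b))
    # Counter returns 0 for a basis element the vector does not contain.
    return basis, [a[g] for g in basis], [b[g] for g in basis]


def cosine_similarity(x, y) -> float:
    """cos(theta) = sum(x_i * y_i) / (sqrt(sum x_i^2) * sqrt(sum y_i^2))."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm = math.sqrt(sum(xi * xi for xi in x)) * math.sqrt(sum(yi * yi for yi in y))
    return dot / norm if norm else 0.0


def candidate_set(query: str, standards: list[str], top_k: int = 3) -> list[str]:
    """Rank standard addresses by similarity to the query (top_k is assumed)."""
    q = bigram_vector(query)
    scored = []
    for s in standards:
        _, x, y = to_merged_space(q, bigram_vector(s))
        scored.append((cosine_similarity(x, y), s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


basis, x, y = to_merged_space(bigram_vector("abc"), bigram_vector("bcde"))
print(basis, x, y)              # merged basis and the two projected vectors
print(cosine_similarity(x, y))  # equals 1 / sqrt(6) for these vectors
```

For the vectors (1,1,0,0) and (0,1,1,1) the dot product is 1 and the norms are sqrt(2) and sqrt(3), so the similarity is 1/sqrt(6), roughly 0.408.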
the address information screening module processes the address information extracted from the original file in the following steps:
step 41, identifying the address and orientation description information;
step 42, matching address entities, and ending the process if no orientation description exists;
step 43, matching the orientation description to a standard relationship, and screening, from all relationships of the matched address entity, the relationships that conform to the orientation description;
step 44, inferring the tail entity from the relationship, and further screening the tail entities if an attribute description of the tail entity exists;
step 45, if a plurality of relationships are inferred in succession, confirming the existence of the intermediate entity corresponding to each relationship;
step 46, jointly screening on the head and tail entities and the relationship attributes to determine a unique match for the address description information;
further, according to step 4, using the entity A mapped into the knowledge base and the entity B recorded from the manually verified standard address in step 3: address elements and orientation relationship description information in the original text are extracted by named entity recognition; an entity matching the address elements is then searched for in the standard address knowledge base using a semantic similarity algorithm based on address names; if the matched entity is unique and no orientation relationship description information exists, the standard address conversion is complete and no subsequent steps are needed; if the matched entity is unique and orientation relationship description information exists, the relationship information is matched to a standard relationship in the knowledge base ontology by the semantic similarity algorithm to determine the relationship type and its corresponding attributes, and among all relationships connected to entity A the relationship whose attributes are most similar is searched for as the unique match, thereby obtaining entity B; for the distance precision range, the allowable error is set to float by 30% of the distance attribute value; where the text contains an exact description of address attributes, that description is used for further screening; where multi-hop relationship reasoning exists, the orientation relationships are reasoned over step by step, confirming the existence of each intermediate entity in turn until the tail entity is finally confirmed to exist; and where the matched entity is not unique, the orientation relationships and attribute information are screened jointly, and the standard address of the tail entity is extracted.
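The screening flow of steps 41 to 46 can be sketched over a toy in-memory relation list. Everything here is a hypothetical illustration: the `Relation` type, the entity names, the relation labels, and the reading of the 30% tolerance as a band of plus or minus 30% around the stored distance attribute value are assumptions, and entity and relation matching by semantic similarity is replaced by exact string comparison for brevity.

```python
from dataclasses import dataclass
from typing import Optional

TOLERANCE = 0.30  # assumed reading: +/- 30% of the stored distance attribute


@dataclass(frozen=True)
class Relation:
    head: str          # head entity
    kind: str          # standard orientation relationship, e.g. "east"
    tail: str          # tail entity lying in that orientation from the head
    distance_m: float  # distance attribute value in meters


def follow(relations, head, kind, distance_m=None):
    """Steps 43-44: keep tails whose relation matches kind and distance."""
    tails = []
    for r in relations:
        if r.head != head or r.kind != kind:
            continue
        if distance_m is not None and abs(r.distance_m - distance_m) > TOLERANCE * r.distance_m:
            continue
        tails.append(r.tail)
    return tails


def resolve(relations, start, hops) -> Optional[str]:
    """Steps 45-46: reason hop by hop; return the tail only if it is unique."""
    frontier = [start]
    for kind, distance_m in hops:
        frontier = [t for h in frontier for t in follow(relations, h, kind, distance_m)]
        if not frontier:
            return None  # an intermediate or tail entity does not exist
    unique = set(frontier)
    return unique.pop() if len(unique) == 1 else None


relations = [
    Relation("Station A", "east", "Mall B", 500.0),
    Relation("Station A", "east", "Park C", 900.0),
    Relation("Mall B", "north", "School D", 200.0),
]

# "about 450 m east of Station A, then north": Mall B passes the distance
# check, Park C does not, and the second hop leads to School D.
print(resolve(relations, "Station A", [("east", 450.0), ("north", None)]))
```

Returning `None` when several tails remain reflects the uniqueness screening of step 46; a fuller version would apply the tail entity's attribute description at that point instead of giving up.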
CN202011141247.7A 2020-10-22 2020-10-22 Method and system for converting non-standard address into standard address based on knowledge base reasoning Active CN112347222B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011141247.7A CN112347222B (en) 2020-10-22 2020-10-22 Method and system for converting non-standard address into standard address based on knowledge base reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011141247.7A CN112347222B (en) 2020-10-22 2020-10-22 Method and system for converting non-standard address into standard address based on knowledge base reasoning

Publications (2)

Publication Number Publication Date
CN112347222A true CN112347222A (en) 2021-02-09
CN112347222B CN112347222B (en) 2022-03-18

Family

ID=74359804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011141247.7A Active CN112347222B (en) 2020-10-22 2020-10-22 Method and system for converting non-standard address into standard address based on knowledge base reasoning

Country Status (1)

Country Link
CN (1) CN112347222B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818072A (en) * 2021-03-09 2021-05-18 携程旅游信息技术(上海)有限公司 Tourism knowledge map updating method, system, equipment and storage medium
CN112906394A (en) * 2021-03-18 2021-06-04 北京字节跳动网络技术有限公司 Address recognition method, device, equipment and storage medium
CN113505190A (en) * 2021-09-10 2021-10-15 南方电网数字电网研究院有限公司 Address information correction method, device, computer equipment and storage medium
CN113822057A (en) * 2021-08-06 2021-12-21 北京百度网讯科技有限公司 Location information determination method, location information determination device, electronic device, and storage medium
CN113836357A (en) * 2021-10-12 2021-12-24 北京商越网络科技有限公司 Address database data processing method and control system based on text similarity calculation
CN113987114A (en) * 2021-09-17 2022-01-28 上海燃气有限公司 Address matching method and device based on semantic analysis and electronic equipment
CN114117004A (en) * 2021-11-24 2022-03-01 北京百度网讯科技有限公司 Address recognition method and device, electronic equipment and storage medium
CN114168705A (en) * 2021-12-03 2022-03-11 南京大峡谷信息科技有限公司 Chinese address matching method based on address element index
CN114363264A (en) * 2021-12-22 2022-04-15 中科曙光(南京)计算技术有限公司 Service reservation method
WO2023045311A1 (en) * 2021-09-26 2023-03-30 中兴通讯股份有限公司 Resource topology restoration method and apparatus, server, and storage medium
CN117708262A (en) * 2024-02-02 2024-03-15 北京友友天宇系统技术有限公司 Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment
CN118245518A (en) * 2024-05-27 2024-06-25 深圳大学 Position information retrieval method, system and terminal based on triple semantic structure
CN118588313A (en) * 2024-08-06 2024-09-03 四川互慧软件有限公司 Hospital data dictionary mapping method, device, computing equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105144205A (en) * 2013-04-29 2015-12-09 西门子公司 Device and method for answering a natural language question using a number of selected knowledge bases
US9535902B1 (en) * 2013-06-28 2017-01-03 Digital Reasoning Systems, Inc. Systems and methods for entity resolution using attributes from structured and unstructured data
CN104679850A (en) * 2015-02-13 2015-06-03 深圳市华傲数据技术有限公司 Address structuring method and device
CN107194011A (en) * 2017-06-23 2017-09-22 重庆邮电大学 A kind of position prediction system and method based on social networks
CN111144117A (en) * 2019-12-26 2020-05-12 同济大学 Knowledge graph Chinese address disambiguation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵成 等: "一种中文地址知识库支撑的中文地址分词算法", 《测绘科学技术学报》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818072A (en) * 2021-03-09 2021-05-18 携程旅游信息技术(上海)有限公司 Tourism knowledge map updating method, system, equipment and storage medium
CN112906394A (en) * 2021-03-18 2021-06-04 北京字节跳动网络技术有限公司 Address recognition method, device, equipment and storage medium
CN113822057A (en) * 2021-08-06 2021-12-21 北京百度网讯科技有限公司 Location information determination method, location information determination device, electronic device, and storage medium
CN113822057B (en) * 2021-08-06 2022-10-18 北京百度网讯科技有限公司 Location information determination method, location information determination device, electronic device, and storage medium
CN113505190A (en) * 2021-09-10 2021-10-15 南方电网数字电网研究院有限公司 Address information correction method, device, computer equipment and storage medium
CN113987114A (en) * 2021-09-17 2022-01-28 上海燃气有限公司 Address matching method and device based on semantic analysis and electronic equipment
WO2023045311A1 (en) * 2021-09-26 2023-03-30 中兴通讯股份有限公司 Resource topology restoration method and apparatus, server, and storage medium
CN113836357B (en) * 2021-10-12 2022-09-16 北京商越网络科技有限公司 Address database data processing method and control system based on text similarity calculation
CN113836357A (en) * 2021-10-12 2021-12-24 北京商越网络科技有限公司 Address database data processing method and control system based on text similarity calculation
CN114117004B (en) * 2021-11-24 2023-06-30 北京百度网讯科技有限公司 Address recognition method, address recognition device, electronic equipment and storage medium
CN114117004A (en) * 2021-11-24 2022-03-01 北京百度网讯科技有限公司 Address recognition method and device, electronic equipment and storage medium
CN114168705A (en) * 2021-12-03 2022-03-11 南京大峡谷信息科技有限公司 Chinese address matching method based on address element index
CN114363264A (en) * 2021-12-22 2022-04-15 中科曙光(南京)计算技术有限公司 Service reservation method
CN114363264B (en) * 2021-12-22 2024-03-15 中科曙光(南京)计算技术有限公司 Service reservation method
CN117708262A (en) * 2024-02-02 2024-03-15 北京友友天宇系统技术有限公司 Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment
CN117708262B (en) * 2024-02-02 2024-05-31 北京友友天宇系统技术有限公司 Method and device for carrying out data association on multidimensional and multi-source data and electronic equipment
CN118245518A (en) * 2024-05-27 2024-06-25 深圳大学 Position information retrieval method, system and terminal based on triple semantic structure
CN118245518B (en) * 2024-05-27 2024-09-13 深圳大学 Position information retrieval method, system and terminal based on triple semantic structure
CN118588313A (en) * 2024-08-06 2024-09-03 四川互慧软件有限公司 Hospital data dictionary mapping method, device, computing equipment and storage medium

Also Published As

Publication number Publication date
CN112347222B (en) 2022-03-18

Similar Documents

Publication Publication Date Title
CN112347222B (en) Method and system for converting non-standard address into standard address based on knowledge base reasoning
CN113434623B (en) Fusion method based on multi-source heterogeneous space planning data
Mustière et al. Matching networks with different levels of detail
CN110020433B (en) Industrial and commercial high-management name disambiguation method based on enterprise incidence relation
CN107679221B (en) Time-space data acquisition and service combination scheme generation method for disaster reduction task
CN109657074B (en) News knowledge graph construction method based on address tree
CN112612863B (en) Address matching method and system based on Chinese word segmentation device
CN111881290A (en) Distribution network multi-source grid entity fusion method based on weighted semantic similarity
CN108388559A (en) Name entity recognition method and system, computer program of the geographical space under
CN112988715B (en) Construction method of global network place name database based on open source mode
CN116028645B (en) Urban municipal infrastructure emergency knowledge graph determination method, system and equipment
CN115495594A (en) Knowledge graph fusion method and system based on urban public facility decision case
CN116414823A (en) Address positioning method and device based on word segmentation model
CN111191084B (en) Map structure-based place name address resolution method
CN114661744B (en) Terrain database updating method and system based on deep learning
CN111813819A (en) Space-time big data-based place name and address online matching method
CN114168705B (en) Chinese address matching method based on address element index
CN114860891A (en) Method and device for constructing space-time map of intelligent pipe network
CN114595302A (en) Method, device, medium, and apparatus for constructing multi-level spatial relationship of spatial elements
CN111325235B (en) Multilingual-oriented universal place name semantic similarity calculation method and application thereof
Zhou et al. A points of interest matching method using a multivariate weighting function with gradient descent optimization
CN114820960B (en) Method, device, equipment and medium for constructing map
Loai Ali et al. Towards rule-guided classification for volunteered geographic information
CN116431746A (en) Address mapping method and device based on coding library, electronic equipment and storage medium
CN116431625A (en) Positioning analysis method and device for geographic entity and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant