CN118760796A

CN118760796A - Address location method and device

Info

Publication number: CN118760796A
Application number: CN202410725379.6A
Authority: CN
Inventors: 王铭; 俞自生; 隋远
Original assignee: Jingdong City Beijing Digital Technology Co Ltd; Jingdong Technology Information Technology Co Ltd
Current assignee: Jingdong City Beijing Digital Technology Co Ltd; Jingdong Technology Information Technology Co Ltd
Priority date: 2024-06-05
Filing date: 2024-06-05
Publication date: 2024-10-11

Abstract

The embodiments of the present disclosure disclose an address location method and device. The specific implementation of the method includes: slicing the address text by sliding window to obtain a word unit set; matching the word unit set with the hierarchical node data in the geographic hierarchical tree to obtain matching hierarchical nodes, wherein the hierarchical node data includes: a word unit set obtained by slicing the hierarchical node name by sliding window; for each matching hierarchical node, calculating the score value of the hierarchical node according to the weight of the matching word unit; obtaining all candidate address links containing matching hierarchical nodes through depth-first traversal; calculating the score of each candidate address link according to the score value of the hierarchical node contained in the candidate address link, and selecting the candidate address link with the highest score as the location result. This implementation can realize a fast and accurate address location function, and has a wide range of application value.

Description

Address location method and device

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, and in particular to an address location method and device.

Background Art

Address location is the process of mapping address text written in natural language onto a standard geographic hierarchy. A geographic hierarchy is a hierarchical structure divided by spatial extent. Different geographic hierarchies can be defined for different usage scenarios, such as a hierarchy for security patrols or a hierarchy for logistics distribution.

Unlike ordinary address lookup (which maps address text to longitude and latitude coordinates), address location maps the address text entered by a user onto a predefined geographic hierarchy. Address location is widely used in government service scenarios. In these scenarios the address text is mostly filled in by hand by individual users and administrative staff, so conflicting geographic levels, non-standard writing, incomplete addresses and typos are common, which makes address location very challenging.

Existing full-text retrieval methods for address location rely on text similarity calculation, but free matching between texts cannot accommodate the inherent hierarchical structure of addresses. Existing rule-matching methods based on level selection are constrained by how the rules are formulated: a predefined rule system does not adapt well to the complex and varied ways in which users fill in addresses, and may fail to match the result the user actually wants when the administrative levels in the input are non-standard or mutually conflicting.

Summary of the Invention

The embodiments of the present disclosure provide an address location method and device.

In a first aspect, an embodiment of the present disclosure provides an address location method, comprising: slicing an address text with a sliding window to obtain a word unit set; matching the word unit set against hierarchical node data in a geographic hierarchical tree to obtain matching hierarchical nodes, wherein the hierarchical node data comprises a word unit set obtained by slicing the hierarchical node name with a sliding window; for each matching hierarchical node, calculating a score value of the hierarchical node according to the weights of the matched word units; obtaining, through depth-first traversal, all candidate address links that contain matching hierarchical nodes; and calculating a score for each candidate address link according to the score values of the hierarchical nodes it contains, and selecting the candidate address link with the highest score as the location result.

In some embodiments, matching the word unit set against the hierarchical node data in the geographic hierarchical tree to obtain matching hierarchical nodes comprises: for each word unit in the word unit set, querying a pre-built inverted index to obtain all hierarchical nodes containing that word unit, thereby obtaining the matching hierarchical nodes.

In some embodiments, the method further comprises: for each hierarchical node in the geographic hierarchical tree, slicing the node name with a sliding window to obtain a corresponding fuzzy word unit set; and for each fuzzy word unit in the fuzzy word unit set, registering all hierarchical nodes containing that fuzzy word unit in the inverted index entry of the word unit.

In some embodiments, the method further comprises: calculating the weight of each word unit in the word unit set according to the frequency with which it occurs in the hierarchical node data of the geographic hierarchical tree, wherein the higher the frequency of occurrence, the lower the weight.

In some embodiments, calculating the score value of the hierarchical node according to the weights of the matched word units comprises: calculating the score value of the hierarchical node according to the word unit weight, the word unit length, and the matching ratio of the hierarchical node.

In some embodiments, calculating the score of each candidate address link according to the score values of the hierarchical nodes it contains comprises: for each hierarchical node contained in the candidate address link, weighting the score value of that hierarchical node by a level weight; and summing the weighted score values of all hierarchical nodes contained in the candidate address link to obtain the score of the candidate address link.

In some embodiments, before calculating the score value of the hierarchical node according to the weights of the matched word units, the method further comprises: looking up the weights of the matched word units in a pre-generated word unit weight index, wherein the word unit weight index is generated by the following steps: for each hierarchical node in the geographic hierarchical tree, slicing the node name with a sliding window to obtain a corresponding fuzzy word unit set; calculating the weight of each fuzzy word unit in the fuzzy word unit set according to the frequency with which it occurs in the hierarchical node data of the geographic hierarchical tree, wherein the higher the frequency of occurrence, the lower the weight; and storing the weight of each fuzzy word unit in a key-value data structure to generate the word unit weight index.

In a second aspect, an embodiment of the present disclosure provides an address location device, comprising: a word segmentation unit configured to slice an address text with a sliding window to obtain a word unit set; a matching unit configured to match the word unit set against hierarchical node data in a geographic hierarchical tree to obtain matching hierarchical nodes, wherein the hierarchical node data comprises a word unit set obtained by slicing the hierarchical node name with a sliding window; a calculation unit configured to calculate, for each matching hierarchical node, a score value of the hierarchical node according to the weights of the matched word units; a link unit configured to obtain, through depth-first traversal, all candidate address links that contain matching hierarchical nodes; and a selection unit configured to calculate a score for each candidate address link according to the score values of the hierarchical nodes it contains, and to select the candidate address link with the highest score as the location result.

In some embodiments, the matching unit is further configured to: for each word unit in the word unit set, query a pre-built inverted index to obtain all hierarchical nodes containing that word unit, thereby obtaining the matching hierarchical nodes.

In some embodiments, the device further comprises an inverted index unit configured to: for each hierarchical node in the geographic hierarchical tree, slice the node name with a sliding window to obtain a corresponding fuzzy word unit set; and for each fuzzy word unit in the fuzzy word unit set, register all hierarchical nodes containing that fuzzy word unit in the inverted index entry of the word unit.

In some embodiments, the device further comprises a weight calculation unit configured to: calculate the weight of each word unit in the word unit set according to the frequency with which it occurs in the hierarchical node data of the geographic hierarchical tree, wherein the higher the frequency of occurrence, the lower the weight.

In some embodiments, the calculation unit is further configured to: calculate the score value of the hierarchical node according to the word unit weight, the word unit length, and the matching ratio of the hierarchical node.

In some embodiments, the link unit is further configured to: for each hierarchical node contained in the candidate address link, weight the score value of that hierarchical node by a level weight; and sum the weighted score values of all hierarchical nodes contained in the candidate address link to obtain the score of the candidate address link.

In some embodiments, the device further comprises a weight index unit configured to: before the score value of the hierarchical node is calculated according to the weights of the matched word units, look up the weights of the matched word units in a pre-generated word unit weight index, wherein the word unit weight index is generated by the following steps: for each hierarchical node in the geographic hierarchical tree, slicing the node name with a sliding window to obtain a corresponding fuzzy word unit set; calculating the weight of each fuzzy word unit in the fuzzy word unit set according to the frequency with which it occurs in the hierarchical node data of the geographic hierarchical tree, wherein the higher the frequency of occurrence, the lower the weight; and storing the weight of each fuzzy word unit in a key-value data structure to generate the word unit weight index.

In a third aspect, an embodiment of the present disclosure provides an electronic device, comprising: one or more processors; and a storage device storing one or more computer programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any embodiment of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method according to any embodiment of the first aspect.

The address location method and device provided by the embodiments of the present disclosure address the technical problems of result similarity measurement and fuzzy text matching in address location. By combining geographic hierarchy, database indexing and text retrieval techniques, they provide a new address location method based on a hierarchical structure. The present disclosure stores standard geographic hierarchy data in a hierarchical storage structure, builds an inverted index on top of that structure, and designs a location matching method that exploits the hierarchical nature of addresses, obtaining the best location result by scoring the matched address links. The present disclosure can be applied effectively in government service scenarios such as questionnaire surveys and event dispatching, provides fast and accurate address location, and has broad application value.

It should be understood that the content described in this section is not intended to identify key or essential features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief Description of the Drawings

Other features, objects and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:

FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure may be applied;

FIG. 2 is a flow chart of one embodiment of the address location method according to the present disclosure;

FIG. 3 is a flow chart of another embodiment of the address location method according to the present disclosure;

FIGS. 4a-4e are schematic diagrams of application scenarios of the address location method according to the present disclosure;

FIG. 5 is a schematic structural diagram of one embodiment of the address location device according to the present disclosure;

FIG. 6 is a schematic structural diagram of a computer system of an electronic device suitable for implementing embodiments of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the invention, not to limit it. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings.

It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features of those embodiments may be combined with one another. The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the address location method or address location device of the present disclosure may be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links or optical fiber cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 in order to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as 3D video players, web browsers, shopping applications, search applications, instant messaging tools, email clients and social platform software.

The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be any of various electronic devices that have a display screen and support 3D video playback, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.

The server 105 may be a server that provides various services, for example one that processes the address text sent by a terminal device, maps it to standard geographic hierarchical nodes, and returns the resulting location result to the terminal device.

It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here. The server may also be a server of a distributed system or a server combined with a blockchain. The server may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology.

It should be noted that the address location method provided by the embodiments of the present disclosure is generally executed by the server 105, and accordingly the address location device is generally arranged in the server 105.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks and servers, depending on implementation needs.

The terms used in this document are explained below:

1. Geographic hierarchical tree (tree): as shown in FIG. 4b, a geographic hierarchy based on administrative divisions. It is essentially a tree structure describing hierarchical relationships and can be written formally as T<V,E,root>, where V＝{node₁,…,node_n} is the set of tree nodes (also called hierarchical nodes), E＝{edge₁,…,edge_h} is the set of edges, and root is the root node of the whole tree. Each tree node represents a region or place and is the target object of address location matching; each edge represents a parent-child relationship between nodes, that is, the affiliation between regions or places. In general, a region is an administrative division (such as a province, city, district, street, township or community), and a place is a point of interest (POI). The lowest level a residential place can belong to is the community; the lowest level other POIs (such as hospitals, schools and shopping malls) can belong to is the street, and they may also belong directly to a district or city.

2. Hierarchical node (node): a region or place in the geographic hierarchical tree, corresponding to one node in FIG. 4b. A hierarchical node has three basic attributes: parent, name and level. parent is the node directly above the current node and is used to maintain the tree structure and to form address links; name is the name of the hierarchical node, that is, the name of the region or place; level is an integer from 1 to 9 giving the level of the hierarchical node, and its correspondence to geographic levels is shown in Table 1. In practice, every region or place can be represented as a hierarchical node, and together these nodes form the standard geographic hierarchical tree. The root node of each geographic hierarchical tree represents the current province, that is, root.level＝1.

level    Geographic level
1        Province
2        City
3        District
4        Street/Township
5        Community/Village
6        Point of interest
7        Building
8        Unit
9        Household

Table 1: Hierarchical node levels and their meanings

3. Address link (link): based on the geographic hierarchical tree, given a hierarchical node, the sequence of nodes from the root node to that node forms an address link. For example, in FIG. 4b, given the hierarchical node node₁₃, the corresponding address link is link＝<root,node₁,node₄,node₆,node₁₀,node₁₃>. Extracting the name attribute of each hierarchical node in the link gives the text representation of the address link, namely [Province]-[City 1]-[District 2]-[Street 1]-[Community 2]-[Residential compound 2]. The hierarchical node with the largest level value in an address link is called the terminal node of the link; the terminal node of the link above is node₁₃. This document uses the lowercase letter l to denote an address link.

4. Address text (query): an address written in natural language that contains some information about administrative divisions, for example "北京市经开区通泰国际公馆1号楼" (Building 1, Tongtai International Mansion, Economic Development Zone, Beijing). This document uses the lowercase letter q to denote an address text.

5. Word unit (word): applying n-gram sliding window slicing to a text yields a sequence of word units. The window size is n∈[2,maxSize], where maxSize is the maximum window size; given how places are named (a place name usually has no more than 8 characters), maxSize is generally set to 8. For example, 2-gram slicing of the text "通泰国际公馆" (Tongtai International Mansion) gives the word unit sequence W_2-gram＝{"通泰", "泰国", "国际", "际公", "公馆"}, and 3-gram slicing gives W_3-gram＝{"通泰国", "泰国际", "国际公", "际公馆"}. Merging the word unit sequences for all n gives the fuzzy word unit set of the text, W＝W_2-gram∪W_3-gram∪…∪W_maxSize-gram. This document uses the lowercase letter w to denote a single word unit and the uppercase letter W to denote the set of all word units obtained by n-gram sliding window slicing of a text.
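The hierarchical node and address link definitions above can be illustrated with a minimal Python sketch. The class name `Node`, the helpers `address_link` and `link_text`, and the English place names are illustrative assumptions and do not come from the disclosure; only the three attributes (parent, name, level) and the root-to-node ordering follow the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A hierarchical node with the three attributes named in the text."""
    name: str
    level: int                       # 1 (province) to 9 (household), see Table 1
    parent: Optional["Node"] = None  # None only for the root node


def address_link(node: Node) -> List[Node]:
    """Return the node sequence from the root down to `node`."""
    chain = []
    current = node
    while current is not None:
        chain.append(current)
        current = current.parent
    chain.reverse()
    return chain


def link_text(link: List[Node]) -> str:
    """Join the name attribute of every node in the link."""
    return "-".join(f"[{n.name}]" for n in link)


# Hypothetical tree fragment mirroring the node13 example from FIG. 4b.
root = Node("Province", 1)
city1 = Node("City 1", 2, root)
district2 = Node("District 2", 3, city1)
street1 = Node("Street 1", 4, district2)
community2 = Node("Community 2", 5, street1)
compound2 = Node("Residential compound 2", 6, community2)

link = address_link(compound2)
print(link_text(link))
```

Storing only the parent pointer is enough to rebuild a link from any node; a real implementation would presumably also keep child lists for the traversal step.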

Before the address location method is executed, the data index initialization process shown in FIG. 4c may be performed first:

Data index initialization consists of two steps, building the inverted index and calculating word unit weights, which produce the word unit inverted index and the word unit weight index respectively. The word unit inverted index is used to quickly look up the hierarchical nodes that match a word unit, and the word unit weight index is used to quickly look up the weight that expresses how important a word unit is within the current geographic hierarchy data.

(a) Building the inverted index

The inverted index is a common technique in search engines that speeds up data queries. In the address location method, the inverted index makes it possible to quickly find the hierarchical nodes that match a query word unit. The inverted index Index(·) is stored in a key-value data structure, where the key is a word unit and the value is the list of hierarchical nodes containing that word unit. Once the inverted index has been built, the value value_i for a key key_i can be looked up quickly, that is, value_i＝Index(key_i). The set of all word units in the inverted index is its key set, that is, Index.Keys＝{w₁,…,w_N}, and the total number of word units in the inverted index is denoted N.

The construction process is as follows. The input is the geographic hierarchical tree T<V,E,root>, where V＝{node₁,…,node_i,…,node_n} and each node_i is a hierarchical node with three basic attributes, that is, node_i＝(parent,name,level). For each hierarchical node node_i, n-gram sliding window slicing is applied to the node name node_i.name to obtain the corresponding fuzzy word unit set W_i＝{w₁,…,w_k_i}, where k_i is the total number of fuzzy word units for that text. For each word unit w_j, the current node node_i is registered in the inverted index entry of w_j, that is, Index(w_j)＝Index(w_j)∪{node_i}, until all hierarchical nodes have been traversed. Once the inverted index structure Index(·) has been built, querying a word unit w_j quickly returns the list Index(w_j) of all hierarchical nodes containing that word unit; the number of hierarchical nodes in Index(w_j) is written Index(w_j).size. The set of all word units in the inverted index, Index.Keys＝{w₁,…,w_N}, is obtained at the same time, where N is the total number of word units in the inverted index.
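As a rough illustration of the construction process described above, the following Python sketch builds Index(·) as a dictionary from word units to node identifiers. The function names, the integer node identifiers and the second node name are assumptions made for the example, and a set is used instead of a list so that the update Index(w_j)＝Index(w_j)∪{node_i} maps directly onto `set.add`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def slice_ngrams(text: str, max_size: int = 8) -> Set[str]:
    """Fuzzy word unit set: all substrings of length 2..max_size."""
    units = set()
    for n in range(2, max_size + 1):
        for start in range(len(text) - n + 1):
            units.add(text[start:start + n])
    return units


def build_inverted_index(nodes: Iterable[Tuple[int, str]]) -> Dict[str, Set[int]]:
    """Map each word unit to the ids of all nodes whose name contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for node_id, name in nodes:
        for unit in slice_ngrams(name):
            index[unit].add(node_id)  # Index(w_j) = Index(w_j) ∪ {node_i}
    return index


# Hypothetical node names; the ids stand in for hierarchical nodes.
nodes = [(1, "通泰国际公馆"), (2, "国际企业大道")]
index = build_inverted_index(nodes)

print(sorted(index["国际"]))  # ids of the nodes whose name contains "国际"
print(len(index))             # N, the number of distinct word units
```

The lookup `index[w]` corresponds to Index(w_j) and `len(index[w])` to Index(w_j).size.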

(b) Building the weight index

The weight index is a data structure that stores the weight of each word unit. It is used to quickly look up the weight expressing how important a word unit is within the current geographic hierarchy data, so that the weight does not have to be recomputed. The weight index is also stored in a key-value data structure, where the key is a word unit and the value is its weight. Once the weight index has been built, the value value_i for a key key_i can be looked up quickly, that is, value_i＝Weight(key_i). The specific way the weight is calculated is described in step 203.
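The following Python sketch shows one way such a key-value weight index could be precomputed. The actual weight formula is the one given in step 203 and is not reproduced here; the logarithmic expression below is only an assumed stand-in that satisfies the stated property that a higher frequency of occurrence yields a lower weight. The function name and the sample index are likewise hypothetical.

```python
import math
from typing import Dict, Set


def build_weight_index(index: Dict[str, Set[int]], total_nodes: int) -> Dict[str, float]:
    """Precompute Weight(w) for every word unit in the inverted index.

    Assumed formula (not from the disclosure): log(1 + n / Index(w).size),
    which decreases as the word unit appears in more node names.
    """
    return {
        unit: math.log(1 + total_nodes / len(node_ids))
        for unit, node_ids in index.items()
    }


# Hypothetical inverted index over three nodes.
inverted = {"国际": {1, 2, 3}, "通泰": {1}}
weights = build_weight_index(inverted, total_nodes=3)

# The rarer word unit "通泰" receives the larger weight.
print(weights["通泰"] > weights["国际"])
```

Because the index is an ordinary dictionary, the lookup value_i＝Weight(key_i) is a single hash access at query time.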

As shown in FIG. 4d, the address location method consists of three main processes: matching hierarchical nodes, searching candidate links, and selecting the best link. Matching hierarchical nodes slices the query address text with a sliding window to obtain the query word unit set, uses the inverted index to quickly match hierarchical nodes, and calculates a matching score for each hierarchical node from the word unit weights and node information. Searching candidate links uses the tree structure of the geographic hierarchy data and a depth-first traversal to find all candidate address links that contain matched nodes. Selecting the best link calculates a score for every candidate address link from the matching scores of its hierarchical nodes and returns the candidate address link with the highest score as the final location result for the query.
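The last two processes can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not stated in this passage: node matching scores are taken as given, a candidate link is taken to be any root-to-node path that ends at a matched node, unmatched nodes contribute a score of zero, and the level weights are placeholders. The function and variable names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def search_candidate_links(children: Dict[str, List[str]], root: str,
                           matched: Iterable[str]) -> List[List[str]]:
    """Depth-first traversal collecting every root-to-node path that ends at a matched node."""
    matched = set(matched)
    candidates = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        if node in matched:
            candidates.append(path)
        for child in children.get(node, []):
            stack.append((child, path + [child]))
    return candidates


def link_score(link: List[str], node_scores: Dict[str, float],
               levels: Dict[str, int], level_weights: Dict[int, float]) -> float:
    """Sum of node scores, each weighted by the weight of the node's level."""
    return sum(level_weights[levels[n]] * node_scores.get(n, 0.0) for n in link)


def best_link(children, root, node_scores, levels, level_weights) -> Optional[List[str]]:
    """Return the highest-scoring candidate link, or None if nothing matched."""
    candidates = search_candidate_links(children, root, node_scores)
    return max(candidates,
               key=lambda l: link_score(l, node_scores, levels, level_weights),
               default=None)


# Hypothetical tree, levels, matching scores and level weights.
children = {"province": ["city1", "city2"], "city1": ["districtA"], "city2": ["districtB"]}
levels = {"province": 1, "city1": 2, "city2": 2, "districtA": 3, "districtB": 3}
node_scores = {"city1": 0.9, "districtA": 0.5, "districtB": 0.6}
level_weights = {1: 1.0, 2: 1.0, 3: 1.0}

print(best_link(children, "province", node_scores, levels, level_weights))
```

In this example the link through city1 and districtA wins because it accumulates two matched nodes, even though districtB has the higher individual score. This is the kind of hierarchy-aware ranking that free text matching alone does not provide.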

Continuing to refer to FIG. 2, a process 200 of an embodiment of the address location method according to the present disclosure is shown. The address location method includes the following steps:

步骤201，将地址文本进行滑窗切片得到词元集合。Step 201, performing sliding window slicing on the address text to obtain a word unit set.

在本实施例中，地址落位方法的执行主体(例如图1所示的服务器)可以通过有线连接方式或者无线连接方式接收用户通过终端设备输入的地址文本。针对用户输入的地址文本q，进行n-gram滑窗切片得到所有可能的查询词元集合W＝{w₁,…,w_j,…,w_k}，其中w_j为一个查询词元。In this embodiment, the execution subject of the address location method (such as the server shown in FIG1 ) can receive the address text input by the user through the terminal device through a wired connection or a wireless connection. For the address text q input by the user, n-gram sliding window slicing is performed to obtain all possible query word element sets W = {w ₁ ,…,w _j ,…,w _k }, where w _j is a query word element.

The text is sliced with an n-gram sliding window to obtain a word unit sequence. The sliding window size is n ∈ [2, maxSize], where maxSize is the maximum window size; because place names usually do not exceed 8 characters, maxSize is generally set to 8. For example, 2-gram sliding window slicing of the text "通泰国际公馆" (Tongtai International Mansion) gives the word unit sequence W_2-gram = {"通泰", "泰国", "国际", "际公", "公馆"}, and 3-gram slicing gives W_3-gram = {"通泰国", "泰国际", "国际公", "际公馆"}. Merging the word unit sequences of all n-grams gives the fuzzy word unit set of the text, W = W_2-gram ∪ W_3-gram ∪ … ∪ W_maxSize-gram. In this document a lowercase w denotes a single word unit, and an uppercase W denotes the set of all word units obtained by n-gram sliding window slicing of a text.
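The slicing just described can be sketched in a few lines. The function name `ngram_slices` and its parameters are illustrative assumptions; the window range [2, maxSize] and the default of 8 come from the text above.

```python
def ngram_slices(text, max_size=8, min_size=2):
    """Return the union of all n-gram sliding-window slices of `text`
    for window sizes n in [min_size, max_size]."""
    units = set()
    for n in range(min_size, min(max_size, len(text)) + 1):
        for start in range(len(text) - n + 1):
            units.add(text[start:start + n])
    return units


# Restricting the window to 2 reproduces the 2-gram example from the text.
bigrams = ngram_slices("通泰国际公馆", max_size=2)
# With the default maxSize the result is the merged fuzzy word unit set W.
all_units = ngram_slices("通泰国际公馆")
```

Because the merged result is a set, a slice that occurs more than once in the text is stored only once.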

步骤202，将词元集合与地理层级树中的层级节点数据进行匹配，获得匹配的层级节点。Step 202 : Match the word set with the hierarchical node data in the geographical hierarchical tree to obtain matching hierarchical nodes.

在本实施例中，其中，层级节点数据包括：将层级节点名称滑窗切片得到的词元集合。In this embodiment, the hierarchical node data includes: a word-unit set obtained by sliding window slicing the hierarchical node name.

For a single word unit w_j, all hierarchical node data V = {node₁, …, node_i, …, node_n} are traversed in turn, and each node is checked against the condition w_j ∈ W_i, where W_i is the word unit set obtained by sliding window slicing of node_i.name. If the condition holds, the node is added to the matched node set M. The time complexity of this process is O(n).

步骤203，对于每个匹配的层级节点，根据匹配的词元的权重计算该层级节点的分数值。Step 203: For each matched level node, a score value of the level node is calculated according to the weight of the matched word.

在本实施例中，可预先根据层级节点数据中所有词元出现的频率计算每个词元的权重，出现的频率越高则权重越小，例如，计算每个词元的TF-IDF(term frequency–inverse document frequency，词频-逆文本频率)作为该词元的权重。In this embodiment, the weight of each word can be calculated in advance based on the frequency of occurrence of all words in the hierarchical node data. The higher the frequency of occurrence, the smaller the weight. For example, the TF-IDF (term frequency-inverse document frequency) of each word is calculated as the weight of the word.

每个层级节点对应多个词元。对于每个层级节点，通过该层级节点数据中的每个词元的权重能够计算出一个层级节点的分数值，最终从多个词元计算出的层级节点的分数值中取最高的分数值作为该层级节点的分数值。Each level node corresponds to multiple word units. For each level node, the score value of a level node can be calculated through the weight of each word unit in the level node data, and finally the highest score value among the scores of the level nodes calculated from multiple word units is taken as the score value of the level node.

可将词元的权重值作为层级节点的分数值。也可通过词元长度对词元的权重值进行加权修正后作为层级节点的分数值。The weight value of the word unit can be used as the score value of the hierarchical node. The weight value of the word unit can also be weighted and modified according to the length of the word unit and used as the score value of the hierarchical node.

在本实施例的一些可选的实现方式中，通过预先生成的词元权重索引查询匹配的词元的权重；其中，所述词元权重索引通过如下步骤生成：针对地理层级树中的每个层级节点，对节点名称进行滑窗切片得到对应的模糊词元集合；根据所述模糊词元集合中每个模糊词元在地理层级树中的层级节点数据中出现的频率计算模糊词元的权重，其中，出现的频率越高，权重越低；将每个模糊词元的权重采用键值类型的数据结构存储，生成词元权重索引。In some optional implementations of the present embodiment, the weight of a matching word is queried through a pre-generated word weight index; wherein the word weight index is generated through the following steps: for each hierarchical node in the geographic hierarchical tree, the node name is sliced by sliding window to obtain a corresponding fuzzy word set; the weight of the fuzzy word is calculated based on the frequency of occurrence of each fuzzy word in the hierarchical node data in the geographic hierarchical tree, wherein the higher the frequency of occurrence, the lower the weight; the weight of each fuzzy word is stored in a key-value type data structure to generate a word weight index.

词元权重索引是存储每个词元的权重值的数据结构，用于快速查询词元在当前地理层级数据中重要程度的权重值，避免重复计算。权重索引也采用key-value类型的数据结构存储，其中key为词元，value为词元的权重值。构建完成权重索引后，可以通过键key_i快速查询到对应的值value_i，即value_i＝Weight(key_i)。The word weight index is a data structure that stores the weight value of each word. It is used to quickly query the weight value of the importance of the word in the current geographic level data to avoid repeated calculations. The weight index is also stored in a key-value type data structure, where the key is the word and the value is the weight value of the word. After the weight index is built, the corresponding value value _i can be quickly queried through the key key _i , that is, value _i = Weight(key _i ).

每个词元权重值的定义如下：The definition of each word unit weight is as follows:

Let N be the total number of word units and n the number of occurrences of word unit w_i in the geographic hierarchical data. The importance (weight) of w_i, Weight(w_i), is defined from N and n by formula (1), in which Norm(·) is a batch normalization operation; max-min normalization is used here.
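Formula (1) itself does not survive in this text. As a rough sketch only, the following assumes an IDF-style raw importance log(N/n) followed by max-min normalization. The logarithm, the function name `build_weight_index`, and the use of a word unit's posting-list size as n are assumptions, not the formula stated in the disclosure.

```python
import math


def build_weight_index(index):
    """index: dict mapping word unit -> collection of nodes containing it.
    Returns a dict mapping word unit -> weight in [0, 1]."""
    if not index:
        return {}
    total = len(index)  # N: total number of word units in the index
    # Assumed raw importance: the more nodes contain a unit, the smaller the value.
    raw = {unit: math.log(total / len(nodes)) for unit, nodes in index.items()}
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:  # degenerate case: every unit is equally frequent
        return {unit: 1.0 for unit in raw}
    # Max-min normalization, as named in the text.
    return {unit: (value - lo) / (hi - lo) for unit, value in raw.items()}


# Made-up posting lists, for illustration only.
sample_index = {"北京": {1, 2}, "京市": {1}, "京路": {2, 3}, "南京": {3}}
weights = build_weight_index(sample_index)
```

Under this assumed form, the most frequent word units receive weight 0 after normalization, which matches the stated rule that a higher frequency means a lower weight.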

在本实施例的一些可选的实现方式中，所述根据匹配的词元的权重计算该层级节点的分数值，包括：根据词元权重、词元长度、层级节点的匹配比例计算该层级节点的分数值。In some optional implementations of this embodiment, the step of calculating the score value of the level node based on the weight of the matching word-gram includes: calculating the score value of the level node based on the word-gram weight, word-gram length, and matching ratio of the level node.

The matching score NodeScore of a hierarchical node is defined as follows. Given an address text q and a hierarchical node node_i, the score value NodeScore(node_i, q) of the hierarchical node is given by formula (2).

In formula (2), W_q is the word unit set obtained by n-gram sliding window slicing of the address text q, and w_j is a query word unit in W_q. Weight(w_j) is the importance of word unit w_j in the geographic hierarchical tree, as defined in formula (1). length(·) counts characters: length(w_j) is the number of characters in word unit w_j (the word unit length), and length(node_i.name) is the number of characters in the name of hierarchical node node_i. The ratio length(w_j)/length(node_i.name) is the matching ratio of the hierarchical node.
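Formula (2) is not reproduced in this text, so the exact way the three factors are combined is unknown. The sketch below assumes a plain product of word unit weight, word unit length, and matching ratio, and takes the maximum over the matching query word units, following the earlier statement that the highest per-word-unit score is used. The function name `node_score` and the sample weights are hypothetical.

```python
def node_score(node_name, query_units, weights):
    """Score one hierarchical node against the query word unit set W_q.

    Assumed form: max over matching units of
        Weight(w) * length(w) * length(w) / length(node_name).
    """
    best = 0.0
    for unit in query_units:
        # A unit matches when it is a slice of the node name and has a known weight.
        if unit in node_name and unit in weights:
            ratio = len(unit) / len(node_name)  # matching ratio
            best = max(best, weights[unit] * len(unit) * ratio)
    return best


# Made-up weights, for illustration only.
sample_weights = {"国际": 0.2, "公馆": 0.5, "国际公馆": 1.0}
query = {"通泰", "国际", "公馆", "国际公馆"}
score = node_score("国际公馆", query, sample_weights)
```

With a product of this kind, a query word unit that covers the whole node name outranks shorter partial matches, which is the behaviour the word unit length and matching ratio terms are meant to encourage.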

步骤204，通过深度优先遍历获得所有包含匹配的层级节点的候选地址链路。Step 204: Obtain all candidate address links containing matching hierarchical nodes through depth-first traversal.

In this embodiment, given the set M of hierarchical nodes that match the query content, a depth-first traversal is performed on the geographic hierarchical tree T<V,E,root> to find the terminal nodes of all candidate address links.

For example, in FIG. 4e, assume that all dark nodes are hierarchical nodes that match the query content. The set of these nodes is denoted M, that is, M = {root, node₄, node₅, node₁₀, node₁₁, node₁₃}. In the geographic hierarchical tree, the paths formed by the candidate links are drawn as dotted lines, and there are three candidate address links L = {l₁, l₂, l₃}:

l₁ = <root, node₂, node₅>

l₂ = <root, node₁, node₄, node₇, node₁₁>

l₃ = <root, node₁, node₄, node₆, node₁₀, node₁₃>

The text representations of the three candidate address links above are [province]-[city 2]-[point of interest 1], [province]-[city 1]-[district 2]-[street 2]-[point of interest 3], and [province]-[city 1]-[district 2]-[street 1]-[community 2]-[residential compound 2], respectively. L = {l₁, l₂, l₃} covers all matched hierarchical nodes M = {root, node₄, node₅, node₁₀, node₁₁, node₁₃}. In each candidate link, the hierarchical node with the largest level value is called the terminal node of the link; the terminal nodes of the three address links l₁, l₂, l₃ are node₅, node₁₁, and node₁₃, respectively.
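Since each hierarchical node stores its parent, one way to recover a candidate link from its terminal node is to walk the parent pointers up to the root. The sketch below assumes this approach; the function name `link_from_terminal` and the dictionary encoding of the tree are hypothetical, and only the edges named in the three links above are included.

```python
def link_from_terminal(terminal, parent):
    """Walk parent pointers from a terminal node to the root and
    return the link in root-first order."""
    path = [terminal]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    path.reverse()
    return path


# Parent map covering only the edges that appear in l1, l2 and l3.
parent = {
    "root": None,
    "node1": "root", "node2": "root",
    "node4": "node1", "node5": "node2",
    "node6": "node4", "node7": "node4",
    "node10": "node6", "node11": "node7",
    "node13": "node10",
}
l3 = link_from_terminal("node13", parent)
```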

为获得所有候选链路的终端节点，采用深度优先遍历整个地理层级树，可以通过递归过程实现：To obtain the terminal nodes of all candidate links, the entire geographical hierarchy tree is traversed in depth-first order, which can be achieved through a recursive process:

S0: Given the matched node set M and the geographic hierarchical tree T<V,E,root>, assign the root node root to currNode.

S1: Initialize the terminal node set of the subtree rooted at currNode as S = {}.

S2: Get the child node set of currNode: childs = {node_i | node_i.parent = currNode}.

S3: If childs is empty and currNode ∈ M, then S.add(currNode), return S, exit the current call, and go to S5.

S4: For each node node_i in childs, assign node_i to currNode and make a recursive call starting at S1.

S5: Add the returned terminal node set S′ to S.

S6: If S is empty and currNode ∈ M, then S.add(currNode).

S7: If currNode = root, end the recursion and return S; otherwise return S, exit the current call, and go to S5.

通过以上递归过程，最终得到根节点root子树下的所有终端节点集合S。Through the above recursive process, we finally get the set S of all terminal nodes under the subtree of the root node root.
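Steps S0 to S7 can be sketched as one short recursive function. This is an illustrative reading, not the disclosed code: S3 is folded into S6, because a leaf node is simply the case where no terminal node is returned from below. The name `terminal_nodes`, the dictionary encoding of the tree, and the restriction to the edges named in the FIG. 4e links are assumptions.

```python
def terminal_nodes(curr, children, matched):
    """Return the terminal nodes in the subtree rooted at `curr`.

    children: dict mapping a node to the list of its child nodes (S2)
    matched:  the matched node set M
    """
    found = []                                   # S1: S = {}
    for child in children.get(curr, []):         # S4: recurse into each child
        found.extend(terminal_nodes(child, children, matched))  # S5
    if not found and curr in matched:            # S3 / S6: no matched node below
        found.append(curr)
    return found                                 # S7


# Only the edges named in the three candidate links are modelled here.
children = {
    "root": ["node1", "node2"],
    "node1": ["node4"], "node2": ["node5"],
    "node4": ["node6", "node7"],
    "node6": ["node10"], "node7": ["node11"],
    "node10": ["node13"],
}
matched = {"root", "node4", "node5", "node10", "node11", "node13"}
terminals = terminal_nodes("root", children, matched)
```

A matched node becomes a terminal node only when none of its descendants is matched, which is why node₄ and node₁₀ are not returned in this example.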

步骤205，根据候选地址链路包含的层级节点的分数值计算每个候选地址链路的分数，并选择分数最高的候选地址链路作为落位结果。Step 205: Calculate the score of each candidate address link according to the score values of the hierarchical nodes included in the candidate address link, and select the candidate address link with the highest score as the placement result.

在本实施例中，通过计算所有候选地址链路的链路分数，选择分数值最大的地址链路作为最终的查询落位结果。地址链路分数由候选链路中匹配的层级节点分数计算得到，地址链路分数越大表示与查询内容匹配程度越高。In this embodiment, by calculating the link scores of all candidate address links, the address link with the largest score is selected as the final query placement result. The address link score is calculated from the scores of the matching hierarchical nodes in the candidate links. The larger the address link score, the higher the degree of match with the query content.

可直接将候选地址链路包含的层级节点的分数值累加作为候选地址链路的分数。The score values of the hierarchical nodes included in the candidate address link may be directly accumulated as the score of the candidate address link.

在本实施例的一些可选的实现方式中，对于候选地址链路包含的每个层级节点，通过层级权重对该层级节点的分数值加权；将候选地址链路包含的所有层级节点的分数值的加权值累加，作为候选地址链路的分数。In some optional implementations of this embodiment, for each hierarchical node included in the candidate address link, the score value of the hierarchical node is weighted by the hierarchical weight; the weighted values of the score values of all hierarchical nodes included in the candidate address link are accumulated as the score of the candidate address link.

The matching score LinkScore of an address link is defined as follows. Given a query address text q and a candidate address link link_i = <node₁, ..., node_k, ..., node_s>, where node_k is the k-th hierarchical node of link_i in ascending order of level, the address link matching score is:

$$\mathrm{LinkScore}(link_i, q) = \sum_{k=1}^{s} \log(node_k.level) \cdot \mathrm{NodeScore}(node_k, q)$$

Here log(node_k.level) is the level weight: within an address link, a matched hierarchical node with a larger level value receives a higher weight. NodeScore(node_k, q) is the matching score between the query text q and the hierarchical node node_k, as defined in formula (2).

通过遍历每条候选地址链路并根据上述定义计算链路分数，同时记录最大分数的地址链路，最终将链路分数值最大的结果作为落位结果返回给用户。By traversing each candidate address link and calculating the link score according to the above definition, and recording the address link with the maximum score, the result with the maximum link score value is finally returned to the user as the placement result.
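The scoring and selection loop can be sketched as below, using the level weight log(node_k.level) described above. The natural logarithm, level numbering that starts at 1 for the root, a score of 0 for unmatched nodes, and the names `link_score` and `best_link` are assumptions made for this example; the node scores are made-up values.

```python
import math


def link_score(link, node_scores):
    """link: list of (node_id, level) pairs in ascending level order.
    node_scores: dict node_id -> NodeScore; unmatched nodes count as 0."""
    return sum(math.log(level) * node_scores.get(node_id, 0.0)
               for node_id, level in link)


def best_link(links, node_scores):
    """Return the candidate link with the highest link score."""
    return max(links, key=lambda link: link_score(link, node_scores))


l1 = [("root", 1), ("node2", 2), ("node5", 3)]
l2 = [("root", 1), ("node1", 2), ("node4", 3), ("node7", 4), ("node11", 5)]
scores = {"root": 1.0, "node5": 2.0, "node4": 1.0, "node11": 1.0}
winner = best_link([l1, l2], scores)
```

With levels starting at 1, the root contributes nothing because log(1) = 0, so the ranking is driven by the deeper matched nodes.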

本公开的上述实施例提供的方法，针对地理层级数据设计了一种基于滑窗切片的匹配方法，通过对查询文本进行n-gram滑窗切片得到查询词元匹配层级节点，实现对地理层级数据的模糊匹配。本公开设计了一种基于层级结构的分数计算方法，实现对所有候选地址链路进行匹配程度的排序，基于排序结果返回分数最高的候选地址链路。The method provided by the above embodiment of the present disclosure designs a matching method based on sliding window slicing for geographic hierarchical data, obtains query word matching hierarchical nodes by performing n-gram sliding window slicing on the query text, and realizes fuzzy matching of geographic hierarchical data. The present disclosure designs a score calculation method based on a hierarchical structure, realizes the ranking of the matching degree of all candidate address links, and returns the candidate address link with the highest score based on the ranking result.

进一步参考图3，其示出了地址落位方法的又一个实施例的流程300。该地址落位方法的流程300，包括以下步骤：Further referring to FIG3 , it shows a process 300 of another embodiment of the address location method. The process 300 of the address location method includes the following steps:

步骤301，将地址文本进行滑窗切片得到词元集合。Step 301, perform sliding window slicing on the address text to obtain a word unit set.

步骤302，针对词元集合中每个词元查询预先构建的倒排索引获得所有包含该词元的层级节点，获得匹配的层级节点。Step 302: query the pre-built inverted index for each word in the word set to obtain all hierarchical nodes containing the word, and obtain matching hierarchical nodes.

步骤303，对于每个匹配的层级节点，根据匹配的词元的权重计算该层级节点的分数值。Step 303: For each matched level node, a score value of the level node is calculated according to the weight of the matched word.

步骤304，通过深度优先遍历获得所有包含匹配的层级节点的候选地址链路。Step 304: Obtain all candidate address links containing matching hierarchical nodes through depth-first traversal.

步骤305，根据候选地址链路包含的层级节点的分数值计算每个候选地址链路的分数，并选择分数最高的候选地址链路作为落位结果。Step 305: Calculate the score of each candidate address link according to the score values of the hierarchical nodes included in the candidate address link, and select the candidate address link with the highest score as the placement result.

步骤301、303-305与步骤201、203-205基本相同，因此不再赘述。Steps 301, 303-305 are substantially the same as steps 201, 203-205, and thus will not be described in detail.

前文已经介绍过预先构建倒排索引的过程，步骤302是通过查询预先构建的倒排索引的方式快速查询层级节点。The process of pre-building an inverted index has been introduced above. Step 302 is to quickly query the hierarchical nodes by querying the pre-built inverted index.

The hierarchical node matching process is optimized for query efficiency. To match the query word units W = {w₁, …, w_j, …, w_k} against the geographic hierarchical tree, step 202 traverses all hierarchical nodes V = {node₁, …, node_i, …, node_n} in turn and checks, for each query word unit w_j, the condition w_j ∈ W_i, where W_i is the word unit set obtained by sliding window slicing of node_i.name. If the condition holds, the node is added to the matched node set M. The time complexity of the whole process is O(n*k). In step 302, the pre-built word unit inverted index completes the matching quickly: for each word unit w_j, the inverted index Index(·) is queried to obtain the list Index(w_j) of hierarchical nodes that match the word unit, and Index(w_j) is added directly to M, that is, M = M ∪ Index(w_j). Each inverted index query has time complexity O(1), so the time complexity of the whole process is reduced to O(k).
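The difference between the two matching strategies can be illustrated side by side. Both functions below are sketches under assumed data layouts: `node_units` maps each node to its word unit set, and `index` is the inverted index mapping each word unit to the nodes that contain it. The function names and sample data are hypothetical.

```python
def match_by_scan(query_units, node_units):
    """Step 202 style: test every node against every query word unit, O(n*k)."""
    return {node
            for node, units in node_units.items()
            for unit in query_units
            if unit in units}


def match_by_index(query_units, index):
    """Step 302 style: one dictionary lookup per query word unit."""
    matched = set()
    for unit in query_units:
        matched |= index.get(unit, set())  # M = M ∪ Index(w_j)
    return matched


# Made-up sample data, for illustration only.
node_units = {"A": {"北京", "京市"}, "B": {"南京", "京路"}, "C": {"上海"}}
index = {"北京": {"A"}, "京市": {"A"}, "南京": {"B"}, "京路": {"B"}, "上海": {"C"}}
query = {"北京", "京路", "海淀"}
```

Both functions return the same matched set; the indexed version performs k lookups instead of n*k membership tests, although merging each posting list into M still costs time proportional to the size of that list.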

通过结合数据库技术和全文检索技术，设计词元倒排索引和词元权重索引，加速地址落位匹配过程，降低查询耗时。By combining database technology and full-text retrieval technology, we design term inverted index and term weight index to accelerate the address location matching process and reduce query time.

继续参见图4a，图4a是根据本实施例的地址落位方法的应用场景的示意图。在图4a的应用场景中，首先执行数据初始化过程。装置基于标准的地理层级数据构建倒排索引，索引结构为一种key-value类型的数据结构，key为对层级节点名称进行滑窗切片得到的文本片段(称之为词元)，value为包含该词元的所有层级节点的列表，这种倒排索引结构可以通过查询词元快速定位到所有的包含该词元的层级节点。构建完成倒排索引后，根据每个词元在地理层级数据中出现的频率计算词元的权重，该权重反映了词元的重要程度(即出现的频率越高，词元的重要程度相对越低)。构建完成倒排索引并计算完成词元权重就结束了装置的准备工作，值得注意的是，数据初始化过程只需要在装置启动时运行一次，将处理完成的倒排索引和词项权重存储在缓存中，后续可以进行无限次的地址落位过程。Continuing to refer to FIG. 4a, FIG. 4a is a schematic diagram of an application scenario of the address location method according to the present embodiment. In the application scenario of FIG. 4a, the data initialization process is first performed. The device constructs an inverted index based on standard geographic hierarchical data. The index structure is a key-value type data structure. The key is a text fragment (called a word element) obtained by sliding window slicing the hierarchical node name, and the value is a list of all hierarchical nodes containing the word element. This inverted index structure can quickly locate all hierarchical nodes containing the word element by querying the word element. After the inverted index is constructed, the weight of the word element is calculated according to the frequency of each word element in the geographic hierarchical data. The weight reflects the importance of the word element (that is, the higher the frequency of occurrence, the lower the importance of the word element). The preparation of the device is completed after the inverted index is constructed and the word element weight is calculated. It is worth noting that the data initialization process only needs to be run once when the device is started, and the processed inverted index and term weight are stored in the cache. The address location process can be performed unlimited times later.

接着执行地址落位过程。输入用户查询的地址文本，首先对查询地址文本进行滑窗切片得到词元，该操作与数据初始化过程中对层级节点名称进行滑窗切片的操作一致。接着利用词元查询预先构建的倒排索引获得所有包含该词元的层级节点，由此实现层级节点匹配操作。获得所有匹配到的层级节点及匹配信息，根据词元权重、词元长度、层级节点的匹配比例等信息计算每个匹配的层级节点的分数值。然后通过深度优先遍历获得所有包含匹配层级节点的候选地址链路，该链路为一条由根节点到匹配终端节点的有序的层级节点序列。最后，根据候选地址链路包含的层级节点信息计算整个地址链路的分数，选择分数最高的地址链路作为落位结果返回给用户。Then, the address placement process is executed. The address text queried by the user is input, and first, the query address text is sliced by sliding window to obtain word units. This operation is consistent with the operation of sliding window slicing of the hierarchical node name during the data initialization process. Then, the inverted index pre-built by the word unit query is used to obtain all hierarchical nodes containing the word unit, thereby realizing the hierarchical node matching operation. All matched hierarchical nodes and matching information are obtained, and the score value of each matched hierarchical node is calculated based on the word unit weight, word unit length, and matching ratio of the hierarchical node. Then, all candidate address links containing matching hierarchical nodes are obtained through depth-first traversal. The link is an ordered sequence of hierarchical nodes from the root node to the matching terminal node. Finally, the score of the entire address link is calculated based on the hierarchical node information contained in the candidate address link, and the address link with the highest score is selected as the placement result and returned to the user.

进一步参考图5，作为对上述各图所示方法的实现，本公开提供了一种地址落位装置的一个实施例，该装置实施例与图2所示的方法实施例相对应，该装置具体可以应用于各种电子设备中。Further referring to FIG. 5 , as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an address placement device, which corresponds to the method embodiment shown in FIG. 2 , and can be specifically applied to various electronic devices.

如图5所示，本实施例的地址落位装置500包括：切词单元501、匹配单元502、计算单元503、链路单元504和选择单元505。切词单元501，被配置成将地址文本进行滑窗切片得到词元集合；匹配单元502，被配置成将所述词元集合与地理层级树中的层级节点数据进行匹配，获得匹配的层级节点，其中，层级节点数据包括：将层级节点名称滑窗切片得到的词元集合；计算单元503，被配置成对于每个匹配的层级节点，根据匹配的词元的权重计算该层级节点的分数值；链路单元504，被配置成通过深度优先遍历获得所有包含匹配的层级节点的候选地址链路；选择单元505，被配置成根据候选地址链路包含的层级节点的分数值计算每个候选地址链路的分数，并选择分数最高的候选地址链路作为落位结果。As shown in Fig. 5, the address placement device 500 of this embodiment includes: a word segmentation unit 501, a matching unit 502, a calculation unit 503, a link unit 504 and a selection unit 505. The word segmentation unit 501 is configured to perform sliding window slicing on the address text to obtain a word unit set; the matching unit 502 is configured to match the word unit set with the hierarchical node data in the geographical hierarchical tree to obtain a matched hierarchical node, wherein the hierarchical node data includes: a word unit set obtained by sliding window slicing the hierarchical node name; the calculation unit 503 is configured to calculate the score value of each matched hierarchical node according to the weight of the matched word unit; the link unit 504 is configured to obtain all candidate address links containing matched hierarchical nodes through depth-first traversal; the selection unit 505 is configured to calculate the score of each candidate address link according to the score value of the hierarchical node contained in the candidate address link, and select the candidate address link with the highest score as the placement result.

在本实施例中，地址落位装置500的切词单元501、匹配单元502、计算单元503、链路单元504和选择单元505的具体处理可以参考图2对应实施例中的步骤201、步骤202、步骤203、步骤204和步骤205。In this embodiment, the specific processing of the word segmentation unit 501, matching unit 502, calculation unit 503, link unit 504 and selection unit 505 of the address placement device 500 can refer to steps 201, 202, 203, 204 and 205 in the corresponding embodiment of Figure 2.

在本实施例的一些可选的实现方式中，所述匹配单元502进一步被配置成：针对所述词元集合中每个词元查询预先构建的倒排索引获得所有包含该词元的层级节点，获得匹配的层级节点。In some optional implementations of this embodiment, the matching unit 502 is further configured to: query the pre-constructed inverted index for each word in the word set to obtain all hierarchical nodes containing the word, and obtain matching hierarchical nodes.

在一些实施例中，所述装置还包括倒排单元(附图中未示出)，被配置成：针对地理层级树中的每个层级节点，对节点名称进行滑窗切片得到对应的模糊词元集合；针对所述模糊词元集合中每个模糊词元，将包含该模糊词元的所有层级节点注册到该词元的倒排索引中。In some embodiments, the device also includes an inverted index unit (not shown in the drawings), which is configured to: for each hierarchical node in the geographic hierarchical tree, perform sliding window slicing on the node name to obtain a corresponding fuzzy word set; for each fuzzy word in the fuzzy word set, register all hierarchical nodes containing the fuzzy word into the inverted index of the word.

在一些实施例中，所述装置还包括权重计算单元(附图中未示出)，被配置成：根据所述词元集合中每个词元在地理层级树中的层级节点数据中出现的频率计算词元的权重，其中，出现的频率越高，权重越低。In some embodiments, the device also includes a weight calculation unit (not shown in the drawings), which is configured to calculate the weight of the word based on the frequency of each word in the word set appearing in the hierarchical node data in the geographic hierarchical tree, wherein the higher the frequency of occurrence, the lower the weight.

在一些实施例中，所述计算单元503进一步被配置成：根据词元权重、词元长度、层级节点的匹配比例计算该层级节点的分数值。In some embodiments, the calculation unit 503 is further configured to: calculate the score value of the level node according to the word unit weight, word unit length, and the matching ratio of the level node.

在一些实施例中，所述链路单元504进一步被配置成：对于候选地址链路包含的每个层级节点，通过层级权重对该层级节点的分数值加权；将候选地址链路包含的所有层级节点的分数值的加权值累加，作为候选地址链路的分数。In some embodiments, the link unit 504 is further configured to: for each hierarchical node included in the candidate address link, weight the score value of the hierarchical node by the hierarchical weight; and accumulate the weighted values of the score values of all hierarchical nodes included in the candidate address link as the score of the candidate address link.

在一些实施例中，所述装置还包括权重索引单元(附图中未示出)，被配置成：在所述根据匹配的词元的权重计算该层级节点的分数值之前，通过预先生成的词元权重索引查询匹配的词元的权重；其中，所述词元权重索引通过如下步骤生成：针对地理层级树中的每个层级节点，对节点名称进行滑窗切片得到对应的模糊词元集合；根据所述模糊词元集合中每个模糊词元在地理层级树中的层级节点数据中出现的频率计算模糊词元的权重，其中，出现的频率越高，权重越低；将每个模糊词元的权重采用键值类型的数据结构存储，生成词元权重索引。In some embodiments, the device also includes a weight index unit (not shown in the drawings), which is configured to: before calculating the score value of the hierarchical node based on the weight of the matching word, query the weight of the matching word through a pre-generated word weight index; wherein the word weight index is generated through the following steps: for each hierarchical node in the geographic hierarchical tree, the node name is sliced by sliding window to obtain the corresponding fuzzy word set; the weight of the fuzzy word is calculated according to the frequency of occurrence of each fuzzy word in the fuzzy word set in the hierarchical node data in the geographic hierarchical tree, wherein the higher the frequency of occurrence, the lower the weight; the weight of each fuzzy word is stored in a key-value type data structure to generate a word weight index.

It should be noted that, in the technical solution of the present disclosure, the acquisition, collection, updating, analysis, processing, use, transmission, and storage of user personal information all comply with relevant laws and regulations, serve legitimate purposes, and do not violate public order and good morals. Necessary measures are taken to protect user personal information, prevent illegal access to user personal information data, and safeguard user personal information security, network security, and national security.

根据本公开的实施例，本公开还提供了一种电子设备、一种可读存储介质。According to an embodiment of the present disclosure, the present disclosure also provides an electronic device and a readable storage medium.

一种电子设备，包括：一个或多个处理器；存储装置，其上存储有一个或多个计算机程序，当所述一个或多个计算机程序被所述一个或多个处理器执行，使得所述一个或多个处理器实现流程200或300所述的方法。An electronic device comprises: one or more processors; a storage device on which one or more computer programs are stored, and when the one or more computer programs are executed by the one or more processors, the one or more processors implement the method described in process 200 or 300.

一种计算机可读介质，其上存储有计算机程序，其中，所述计算机程序被处理器执行时实现流程200或300所述的方法。A computer-readable medium stores a computer program, wherein the computer program implements the method described in process 200 or 300 when executed by a processor.

FIG. 6 shows a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

如图6所示，设备600包括计算单元601，其可以根据存储在只读存储器(ROM)602中的计算机程序或者从存储单元608加载到随机访问存储器(RAM)603中的计算机程序，来执行各种适当的动作和处理。在RAM 603中，还可存储设备600操作所需的各种程序和数据。计算单元601、ROM 602以及RAM 603通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。As shown in Figure 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

设备600中的多个部件连接至I/O接口605，包括：输入单元606，例如键盘、鼠标等；输出单元607，例如各种类型的显示器、扬声器等；存储单元608，例如磁盘、光盘等；以及通信单元609，例如网卡、调制解调器、无线通信收发机等。通信单元609允许设备600通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据。A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard, a mouse, etc.; an output unit 607, such as various types of displays, speakers, etc.; a storage unit 608, such as a disk, an optical disk, etc.; and a communication unit 609, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The computing unit 601 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, or microcontroller. The computing unit 601 performs the methods and processes described above, such as the address location method. For example, in some embodiments, the address location method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the address location method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the address location method in any other appropriate manner (for example, by means of firmware).

Various implementations of the systems and techniques described above can be realized in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations can include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor can be a special-purpose or general-purpose programmable processor that receives data and instructions from a storage system, at least one input device, and at least one output device, and transmits data and instructions to the storage system, the at least one input device, and the at least one output device.

The program code for implementing the method of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code may be executed entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in conjunction with, an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, voice, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

A computer system may include a client and a server. The client and the server are generally remote from each other and usually interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a server of a distributed system, or a server combined with a blockchain. The server may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology.

It should be understood that the various forms of processes shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.

The above specific implementations do not limit the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.

Claims

1. An address location method, comprising:

slicing address text with a sliding window to obtain a word unit set;

matching the word unit set against hierarchical node data in a geographic hierarchical tree to obtain matching hierarchical nodes, wherein the hierarchical node data includes: a word unit set obtained by slicing the hierarchical node name with a sliding window;

for each matching hierarchical node, calculating a score value of the hierarchical node according to the weights of the matching word units;

obtaining all candidate address links containing matching hierarchical nodes through depth-first traversal; and

calculating a score for each candidate address link according to the score values of the hierarchical nodes contained in the candidate address link, and selecting the candidate address link with the highest score as the location result.
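
The slicing and traversal steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the window sizes (2 to 3 characters), the `Node` class, the function names, and the reading of a "candidate address link" as a root-to-leaf path that passes through at least one matching node are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


def slice_text(text: str, min_len: int = 2, max_len: int = 3) -> set[str]:
    """Slide windows of min_len..max_len characters over text (sizes are assumed)."""
    units = set()
    for size in range(min_len, max_len + 1):
        for start in range(len(text) - size + 1):
            units.add(text[start:start + size])
    return units


@dataclass
class Node:
    """Hypothetical node of the geographic hierarchical tree."""
    name: str
    children: list["Node"] = field(default_factory=list)


def candidate_links(root: Node, matched: set[str]) -> list[list[str]]:
    """Depth-first traversal collecting root-to-leaf paths that contain a matched node."""
    links = []

    def dfs(node: Node, path: list[str], has_match: bool) -> None:
        path = path + [node.name]
        has_match = has_match or node.name in matched
        if not node.children:
            # Leaf reached: keep the path only if some node on it matched.
            if has_match:
                links.append(path)
            return
        for child in node.children:
            dfs(child, path, has_match)

    dfs(root, [], False)
    return links
```

In this sketch matched nodes are identified by name for brevity; a real tree would more likely use unique node identifiers, since names can repeat across branches.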

2. The method according to claim 1, wherein matching the word unit set against the hierarchical node data in the geographic hierarchical tree to obtain the matching hierarchical nodes comprises:

for each word unit in the word unit set, querying a pre-built inverted index to obtain all hierarchical nodes containing the word unit, thereby obtaining the matching hierarchical nodes.

3. The method according to claim 2, wherein the method further comprises:

for each hierarchical node in the geographic hierarchical tree, slicing the node name with a sliding window to obtain a corresponding fuzzy word unit set; and

for each fuzzy word unit in the fuzzy word unit set, registering all hierarchical nodes containing the fuzzy word unit in the inverted index entry for that fuzzy word unit.
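
The index construction of claim 3 and the lookup of claim 2 might be sketched as follows. This is a hedged illustration only: the node identifiers (`n1`, `n2`), the 2-to-3-character window, and the helper names `build_inverted_index` and `match_nodes` are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict


def slice_text(text: str, min_len: int = 2, max_len: int = 3) -> set[str]:
    """Sliding-window slicing; the window sizes are an assumption."""
    return {
        text[start:start + size]
        for size in range(min_len, max_len + 1)
        for start in range(len(text) - size + 1)
    }


def build_inverted_index(node_names: dict[str, str]) -> dict[str, set[str]]:
    """Map each fuzzy word unit to the ids of all nodes whose name yields it."""
    index = defaultdict(set)
    for node_id, name in node_names.items():
        for unit in slice_text(name):
            index[unit].add(node_id)
    return dict(index)


def match_nodes(address: str, index: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each matching node id to the address word units that hit it."""
    hits = defaultdict(set)
    for unit in slice_text(address):
        for node_id in index.get(unit, ()):
            hits[node_id].add(unit)
    return dict(hits)
```

Keeping the hit word units per node, rather than only the node ids, makes the later scoring step straightforward because the weights of the matching word units are needed there.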

4. The method according to claim 1, wherein the method further comprises:

calculating the weight of each word unit in the word unit set according to the frequency with which the word unit occurs in the hierarchical node data in the geographic hierarchical tree, wherein the higher the frequency of occurrence, the lower the weight.
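
The claim only requires that the weight decrease as the frequency increases; it does not give a formula. One common choice that satisfies this, shown here purely as an assumption, is a smoothed inverse-document-frequency weight, where $N$ is the number of hierarchical nodes and $\mathrm{df}(g)$ is the number of nodes whose sliced name contains the word unit $g$:

```latex
w(g) = \ln\!\left(\frac{N + 1}{\mathrm{df}(g) + 1}\right) + 1
```

For example, with $N = 3$ nodes, a word unit found in two node names gets $w = \ln(4/3) + 1 \approx 1.29$, while a word unit found in only one gets $w = \ln 2 + 1 \approx 1.69$, so the rarer word unit carries more weight.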

5. The method according to claim 1, wherein calculating the score value of the hierarchical node according to the weights of the matching word units comprises:

calculating the score value of the hierarchical node based on the word unit weight, the word unit length, and the matching ratio of the hierarchical node.
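
Claim 5 names the three inputs to the node score but not how they are combined. The sketch below assumes one plausible combination: sum weight times length over the matching word units, then scale by the fraction of the node's word units that matched. The function name and the formula are illustrative assumptions, not the formula of the disclosure.

```python
def node_score(
    matched_units: set[str],
    node_units: set[str],
    weights: dict[str, float],
) -> float:
    """Score a hierarchical node from word unit weight, length, and matching ratio.

    Assumed formula: (sum of weight * length over matching units) * matching ratio.
    """
    if not node_units:
        return 0.0
    hits = matched_units & node_units
    # Matching ratio: share of the node's word units found in the address.
    ratio = len(hits) / len(node_units)
    weighted = sum(weights.get(unit, 0.0) * len(unit) for unit in hits)
    return weighted * ratio
```

Under this assumed formula, a node whose name is fully covered by the address scores higher than one hit by a single short fragment, which is the behavior the three named factors suggest.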

6. The method according to claim 1, wherein calculating the score of each candidate address link according to the score values of the hierarchical nodes contained in the candidate address link comprises:

for each hierarchical node contained in the candidate address link, weighting the score value of the hierarchical node by a level weight; and

accumulating the weighted score values of all hierarchical nodes contained in the candidate address link as the score of the candidate address link.
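
The weighted accumulation of claim 6, followed by the selection of the highest-scoring link, can be sketched as below. The representation of a link as a list of `(level, node_score)` pairs, the level weight values, and the default weight of 1.0 for unlisted levels are assumptions for illustration.

```python
def link_score(link: list[tuple[int, float]], level_weights: dict[int, float]) -> float:
    """Accumulate level-weighted node scores along one candidate address link."""
    return sum(level_weights.get(level, 1.0) * score for level, score in link)


def best_link(
    candidates: dict[str, list[tuple[int, float]]],
    level_weights: dict[int, float],
) -> str:
    """Return the key of the candidate address link with the highest score."""
    return max(candidates, key=lambda key: link_score(candidates[key], level_weights))
```

A usage example with hypothetical numbers: given level weights `{1: 1.0, 2: 2.0, 3: 3.0}`, a link with node scores `[(1, 1.0), (2, 0.5)]` accumulates 2.0, while `[(1, 1.0), (2, 0.0), (3, 1.0)]` accumulates 4.0 and would be selected.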

7. The method according to claim 1, wherein before calculating the score value of the hierarchical node according to the weights of the matching word units, the method further comprises:

querying the weights of the matching word units through a pre-generated word unit weight index;

wherein the word unit weight index is generated by the following steps:

calculating the weight of each fuzzy word unit in the fuzzy word unit set according to the frequency with which the fuzzy word unit occurs in the hierarchical node data in the geographic hierarchical tree, wherein the higher the frequency of occurrence, the lower the weight; and

storing the weight of each fuzzy word unit in a key-value data structure to generate the word unit weight index.
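
A word unit weight index of this kind could be generated as in the following sketch, which uses a Python `dict` as the key-value structure. The smoothed inverse-frequency formula is an assumption (the claim only requires that higher frequency give lower weight), and `build_weight_index` is a hypothetical name.

```python
import math
from collections import Counter


def build_weight_index(node_units: dict[str, set[str]]) -> dict[str, float]:
    """Build a key-value index from fuzzy word unit to weight.

    node_units maps a node id to the fuzzy word units sliced from its name.
    Assumed weight: ln((N + 1) / (df + 1)) + 1, which falls as frequency rises.
    """
    total = len(node_units)
    # df: number of nodes whose sliced name contains each word unit.
    freq = Counter(unit for units in node_units.values() for unit in units)
    return {
        unit: math.log((total + 1) / (count + 1)) + 1.0
        for unit, count in freq.items()
    }
```

Because the index is computed once from the tree, the per-request cost of weighting a matching word unit is a single dictionary lookup.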

8. An address location device, comprising:

a word segmentation unit configured to slice address text with a sliding window to obtain a word unit set;

a matching unit configured to match the word unit set against hierarchical node data in a geographic hierarchical tree to obtain matching hierarchical nodes, wherein the hierarchical node data includes: a word unit set obtained by slicing the hierarchical node name with a sliding window;

a calculation unit configured to calculate, for each matching hierarchical node, a score value of the hierarchical node according to the weights of the matching word units;

a link unit configured to obtain all candidate address links containing matching hierarchical nodes through depth-first traversal; and

a selection unit configured to calculate a score for each candidate address link according to the score values of the hierarchical nodes contained in the candidate address link, and to select the candidate address link with the highest score as the location result.

9. An electronic device, comprising:

one or more processors; and

a storage device having one or more computer programs stored thereon,

wherein, when the one or more computer programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 7.

10. A computer-readable medium having a computer program stored thereon, wherein the computer program implements the method according to any one of claims 1 to 7 when executed by a processor.