CN110543586B - Multi-user identity fusion method, device, equipment and storage medium - Google Patents

Info

Publication number
CN110543586B
Authority
CN
China
Prior art keywords
identity
connection
nodes
user
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910831646.7A
Other languages
Chinese (zh)
Other versions
CN110543586A (en
Inventor
张阳
杨双全
刘畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910831646.7A priority Critical patent/CN110543586B/en
Publication of CN110543586A publication Critical patent/CN110543586A/en
Application granted granted Critical
Publication of CN110543586B publication Critical patent/CN110543586B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, an apparatus, a device and a storage medium for fusing multiple user identities, and relates to the technical field of big data. The specific implementation scheme is as follows: acquiring user identity data, wherein the user identity data has at least two identity characteristics; constructing a graph network according to the at least two identity characteristics of the user identity data, wherein the graph network comprises nodes representing the identity characteristics and connection edges representing the association relationships between the identity characteristics; and determining an identity group of the same user according to the connection relationships between nodes and the connection relationships between nodes and connection edges in the graph network, wherein the identity group comprises a plurality of identity characteristics. Because the identity characteristics of the user identity data are associated in the form of a graph network, the identity group corresponding to the plurality of identity characteristics of the same user can be determined accurately, and the method can be applied to any scene, which avoids the problem of a limited application range.

Description

Multi-user identity fusion method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for fusing multiple user identities in a big data technology.
Background
With the widespread adoption of the internet, associating virtual user identities (such as device IDs and network IDs) with real user identities (such as identity information like ID card numbers and mobile phone numbers, and user asset information like vehicles and real estate) makes it possible to reconstruct a person's complete behavior from different carriers, thereby creating great commercial value for products.
In the prior art, a scheme of multi-identity fusion mainly determines a plurality of different user identities meeting the same rule as belonging to the same user based on a preset rule, and fuses the plurality of user identities of the user to make the user identities related to each other.
However, although this fusion method judges attribution with high accuracy, the rules it uses are set manually, so it cannot be applied to complex scenes and its application range is limited.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for fusing multiple user identities, which are used for solving the problems that the existing fusion method cannot be applied to complex scenes and the application range is limited.
In a first aspect, the present application provides a method for fusing multiple user identities, including:
acquiring user identity data, wherein the user identity data has at least two identity characteristics;
according to at least two identity characteristics of the user identity data, constructing a graph network, wherein the graph network comprises: nodes representing the identity characteristics and connection edges representing the association relationships between the identity characteristics;
according to the connection relationship among the nodes in the graph network and the connection relationship among the nodes and the connection edges, determining an identity group of the same user, wherein the identity group comprises: a plurality of identity characteristics.
In the embodiment, the identity characteristics of the user identity data are associated in the form of the graph network, so that not only can the identity groups corresponding to a plurality of identity characteristics of the same user be accurately determined, but also the method can be applied to any scene, and the problem of limited use range is avoided.
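The three steps of the first aspect can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: each record is assumed to be a tuple of feature strings of the hypothetical form "type:value", an edge is added between every pair of features that appear in the same record, and an identity group is taken to be a connected component of the resulting graph. The names `build_graph` and `identity_groups` are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(records):
    """Nodes are identity features; an edge links two features seen in the same record."""
    adjacency = defaultdict(set)
    for features in records:
        for a, b in combinations(features, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def identity_groups(adjacency):
    """Return the connected components of the graph, one set of features per group."""
    seen = set()
    groups = []
    for start in adjacency:
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(group)
    return groups


# Hypothetical sample data: each record carries at least two identity features.
records = [
    ("imei:A1", "phone:138"),
    ("phone:138", "idcard:X9"),
    ("imei:B2", "phone:139"),
]
groups = identity_groups(build_graph(records))
```

With this sample data the first two records share `phone:138`, so their three features fall into one group, while the third record forms a group of its own.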
In a possible design of the first aspect, the obtaining user identity data includes:
acquiring preset configuration information, wherein the configuration information comprises: data source type, data source path, extraction mode and extraction period;
and extracting the user identity data from the data source corresponding to the data source type according to the data source path, the extraction mode and the extraction period.
In this embodiment, the user data extraction is implemented based on each information dependency in the preset configuration information, which can ensure that the data extraction task can be executed stably and orderly.
Optionally, the configuration information further includes: a field mapping relationship;
the method further comprises the following steps:
according to the field mapping relation, sequentially analyzing the acquired user identity data, and extracting at least two identity characteristics of the user identity data.
In another possible design of the first aspect, the constructing a graph network according to at least two identity features of the user identity data includes:
and constructing the graph network by taking each identity feature in the user identity data as a node of the graph network and taking the association relationship between every two identity features in the user identity data as a connection edge of the graph network, wherein each node and each connection edge in the graph network respectively have attribute information.
In this scheme, because different data may come from different systems, the extraction of the user identity data and the identification and extraction of the identity features in it are driven by the preset configuration information, and the graph network is constructed from the extracted identity features, so that the degree of automation is high and the cost is low.
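As a rough illustration of configuration-driven extraction, the sketch below models the configuration items named above (data source type, data source path, extraction mode, extraction period, field mapping) as a Python dataclass. The class name, field names, sample values and the `extract_features` helper are all assumptions made for this example; the patent does not specify a concrete format.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionConfig:
    source_type: str       # e.g. "csv" (illustrative value)
    source_path: str       # where the data source lives (illustrative value)
    mode: str              # e.g. "full" or "incremental" (illustrative values)
    period_seconds: int    # how often extraction runs
    # Maps a raw column name to the identity feature type it carries.
    field_mapping: dict = field(default_factory=dict)


def extract_features(row, config):
    """Return (feature_type, value) pairs for the mapped, non-empty columns of one record."""
    return [
        (feature_type, row[column])
        for column, feature_type in config.field_mapping.items()
        if row.get(column)
    ]


config = ExtractionConfig(
    source_type="csv",
    source_path="/data/logins.csv",
    mode="incremental",
    period_seconds=3600,
    field_mapping={"dev": "imei", "tel": "phone", "ip_addr": "ip"},
)
row = {"dev": "8601", "tel": "13800000000", "ip_addr": "", "ua": "x"}
features = extract_features(row, config)
```

Columns that are unmapped (`ua`) or empty (`ip_addr`) are skipped, so a record only contributes the identity features it actually carries.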
In still another possible design of the first aspect, the determining, according to connection relationships between nodes and connection edges in the graph network, an identity group of the same user includes:
determining the number of connections between adjacent nodes in the graph network according to the connection relationship between the nodes in the graph network and the connection relationship between the nodes and the connection edges;
determining a first connection relation and a second connection relation based on the number of connections between adjacent nodes in the graph network and a preset count threshold, wherein the first connection relation is a connection relation in which the number of connections between the adjacent nodes is greater than the count threshold, and the second connection relation is a connection relation in which the number of connections between the adjacent nodes is less than or equal to the count threshold;
and traversing the nodes of the graph network outwards in sequence, with the target node as the starting point, according to the first connection relation and the second connection relation, and determining the identity group of the user corresponding to the target node.
In the embodiment, the identity group of the user corresponding to the target node is determined based on the determined connection relationship, and the obtained result is high in accuracy.
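The threshold-based design can be sketched as follows. The text does not say exactly how the two connection relations are combined during traversal, so this example makes one plausible assumption: the traversal follows only first-type edges (connection count above the threshold), and second-type edges are kept separately as weaker candidates. The function names and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict, deque


def classify_edges(pairs, threshold):
    """Count how often each pair of features co-occurs and split edges by the threshold."""
    counts = Counter(frozenset(pair) for pair in pairs)
    strong = defaultdict(set)  # first connection relation: count > threshold
    weak = defaultdict(set)    # second connection relation: count <= threshold
    for pair, n in counts.items():
        a, b = tuple(pair)     # assumes the two features in a pair are distinct
        bucket = strong if n > threshold else weak
        bucket[a].add(b)
        bucket[b].add(a)
    return strong, weak


def identity_group(target, strong):
    """Breadth-first traversal outwards from the target node along strong edges only."""
    group = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for neighbour in strong.get(node, ()):
            if neighbour not in group:
                group.add(neighbour)
                queue.append(neighbour)
    return group


# Hypothetical co-occurrence observations.
pairs = (
    [("imei:A1", "phone:138")] * 3
    + [("phone:138", "ip:10.0.0.1")]
    + [("phone:138", "idcard:X9")] * 2
)
strong, weak = classify_edges(pairs, threshold=1)
group = identity_group("imei:A1", strong)
```

Here the shared IP address co-occurs with the phone number only once, so it stays outside the group and is available in `weak` for any later, more cautious handling.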
In another possible design of the first aspect, the determining, according to connection relationships between nodes and connection edges in the graph network, an identity group of the same user includes:
determining an association relation among the nodes according to the connection relation among the nodes in the graph network, the connection relation among the nodes and the connection edges and the attribute information of each node;
and aggregating the nodes in the graph network based on the association relation among the nodes to determine the identity group of the same user.
In this embodiment, a higher fusion rate can be ensured by obtaining the identity group of the same user through an aggregation method based on the association relationship between the nodes.
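One way to picture the aggregation design is with a disjoint-set (union-find) structure, which is swapped in here as a standard aggregation technique and is not named in the text. In this sketch, two nodes joined by an edge are merged only when a rule over their attribute information says they are associated. The `same_region` rule and the `region` attribute are purely hypothetical stand-ins for whatever association criterion a real system would use.

```python
class DisjointSet:
    """Union-find with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def aggregate(edges, attributes, associated):
    """Merge the endpoints of every edge whose node attributes pass the association rule."""
    dsu = DisjointSet()
    for node in attributes:
        dsu.find(node)  # register every node, including isolated ones
    for a, b in edges:
        if associated(attributes[a], attributes[b]):
            dsu.union(a, b)
    groups = {}
    for node in attributes:
        groups.setdefault(dsu.find(node), set()).add(node)
    return list(groups.values())


def same_region(attrs_a, attrs_b):
    """Hypothetical rule: associated unless both nodes carry conflicting regions."""
    region_a, region_b = attrs_a.get("region"), attrs_b.get("region")
    return region_a is None or region_b is None or region_a == region_b


attributes = {
    "imei:A1": {"region": "beijing"},
    "phone:138": {"region": "beijing"},
    "ip:10.0.0.1": {"region": "shanghai"},
    "idcard:X9": {},
}
edges = [
    ("imei:A1", "phone:138"),
    ("phone:138", "ip:10.0.0.1"),
    ("phone:138", "idcard:X9"),
]
groups = aggregate(edges, attributes, same_region)
```

Because the rule uses node attributes in addition to the edges, the IP address with a conflicting region is left in its own group even though it is connected to the phone number.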
In yet another possible design of the first aspect, the method further includes:
determining a plurality of user identity characteristics which have an association relation with a target identity characteristic in an identity group according to the identity group of the same user, wherein the target identity characteristic is any one of the user identity characteristics included in the identity group;
pushing a message to at least one of the plurality of user identity characteristics.
In the embodiment, the commercial value of the product is improved by determining the plurality of user identity characteristics in the identity group, which have the association relationship with the target identity characteristic, and then pertinently pushing the message to the user.
Optionally, the determining, according to the identity group of the same user, a plurality of user identity features having an association relationship with the target identity feature in the identity group includes:
and searching, traversing and screening nodes in the identity group of the same user, and determining a plurality of user identity characteristics which have an association relation with the target identity characteristics in the identity group.
In this embodiment, the nodes that have a connection relationship with the node corresponding to the target identity feature can be obtained quickly through node identity retrieval, breadth traversal of node identities, screening of node identity features, and similar processes.
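The retrieval, breadth traversal and screening steps can be sketched as a single function. This is an illustrative reading rather than the patent's procedure: the hop limit `max_hops`, the "type:value" naming of nodes and the screening by feature type are assumptions introduced for the example.

```python
from collections import deque


def related_features(adjacency, target, max_hops, wanted_types):
    """Find features linked to the target within max_hops, kept only if their type is wanted."""
    # Retrieval: look the target node up in the graph.
    if target not in adjacency:
        return []
    # Breadth traversal: expand outwards level by level, recording hop distance.
    distance = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    # Screening: drop the target itself and keep only the requested feature types.
    return sorted(
        node
        for node in distance
        if node != target and node.split(":", 1)[0] in wanted_types
    )


# Hypothetical identity group stored as an adjacency mapping.
adjacency = {
    "phone:138": {"imei:A1", "idcard:X9"},
    "imei:A1": {"phone:138", "ip:10.0.0.1"},
    "idcard:X9": {"phone:138"},
    "ip:10.0.0.1": {"imei:A1"},
}
push_targets = related_features(adjacency, "idcard:X9", 2, {"imei", "phone"})
```

The returned features are the candidates to which a message could then be pushed, as described in the design above.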
In yet another possible design of the first aspect, the method further includes:
storing the corresponding relation between the nodes and the connecting edges in the graph network in the form of a graph database;
the graph database includes: point storage, connection edge storage and attribute storage;
the point storage includes: the node primary key, the attribute information owned by the node and the connection edges connected to the node;
the connection edge storage includes: the connection edge primary key, the starting point and the ending point connected by the connection edge, and the attribute information carried by the connection edge;
the attribute storage includes: the attribute primary key, the meaning represented by the attribute, and the specific content represented by the attribute.
In this embodiment, the vertices and connection edges of the graph database are described based on the association relation between the identity features of the user identity data and the records in which the identity features occur together, so that the data fusion problem for the user identity data can be solved with graph theory algorithms.
Optionally, the node primary key, the connection edge primary key, and the attribute primary key are all stored in an index manner.
In this embodiment, constructing indexes for the node primary key, the connection edge primary key, and the attribute primary key improves the convenience of retrieval and data management.
In a second aspect, the present application provides a device for fusing multiple user identities, including:
an obtaining module, configured to acquire user identity data, where the user identity data has at least two identity characteristics;
a processing module, configured to construct a graph network according to at least two identity features of the user identity data, where the graph network includes: nodes representing the identity characteristics and connection edges representing the association relationships between the identity characteristics;
a determining module, configured to determine an identity group of the same user according to a connection relationship between nodes in the graph network and a connection relationship between a node and a connection edge, where the identity group includes: a plurality of identity characteristics.
In a possible design of the second aspect, the obtaining module is specifically configured to obtain preset configuration information, where the configuration information includes: data source type, data source path, extraction mode and extraction period; and extracting the user identity data from the data source corresponding to the data source type according to the data source path, the extraction mode and the extraction period.
Optionally, the configuration information further includes: a field mapping relationship;
correspondingly, the processing module is further configured to sequentially analyze the obtained user identity data according to the field mapping relationship, and extract at least two identity features of the user identity data.
In another possible design of the second aspect, the processing module is specifically configured to use each identity feature in the user identity data as a node of a graph network, use an association relationship between every two identity features in the user identity data as a connection edge of the graph network, and construct the graph network, where each node and each connection edge in the graph network have attribute information respectively.
In still another possible design of the second aspect, the determining module is specifically configured to determine the number of connections between adjacent nodes in the graph network according to connection relationships between nodes in the graph network and connection relationships between nodes and connection edges, and determine a first connection relationship and a second connection relationship based on the number of connections between adjacent nodes in the graph network and a preset count threshold, where the first connection relationship is a connection relationship in which the number of connections between adjacent nodes is greater than the count threshold, and the second connection relationship is a connection relationship in which the number of connections between adjacent nodes is less than or equal to the count threshold; and traverse the nodes of the graph network outwards in sequence, with the target node as the starting point, according to the first connection relationship and the second connection relationship, and determine the identity group of the user corresponding to the target node.
In another possible design of the second aspect, the determining module is specifically configured to determine an association relationship between nodes according to a connection relationship between nodes in the graph network, a connection relationship between a node and a connection edge, and attribute information of each node; and aggregate the nodes in the graph network based on the association relationship among the nodes to determine the identity group of the same user.
In yet another possible design of the second aspect, the determining module is further configured to determine, according to an identity group of the same user, a plurality of user identity features in the identity group, which have an association relationship with a target identity feature, where the target identity feature is any one of the user identity features included in the identity group;
the device further comprises: a push module;
the pushing module is used for pushing a message to at least one of the plurality of user identity characteristics.
Optionally, the determining module is further specifically configured to perform retrieval, traversal, and screening processing on nodes in an identity group of the same user, and determine a plurality of user identity features in the identity group, where the user identity features have an association relationship with a target identity feature.
In yet another possible design of the second aspect, the processing module is further configured to store, in the form of a graph database, a correspondence between a node and a connecting edge in the graph network;
the graph database includes: point storage, connection edge storage and attribute storage;
the point storage includes: the node primary key, the attribute information owned by the node and the connection edges connected to the node;
the connection edge storage includes: the connection edge primary key, the starting point and the ending point connected by the connection edge, and the attribute information carried by the connection edge;
the attribute storage includes: attribute primary keys, the meaning represented by the attribute, and the specific content represented by the attribute.
Optionally, the node primary key, the connection edge primary key, and the attribute primary key are all stored in an index manner.
The apparatus provided in the second aspect of the present application may be configured to perform the method provided in the first aspect, and the implementation principle and the technical effect are similar, which are not described herein again.
In a third aspect, the present application provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the first aspect and its various possible designs.
In a fourth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of the first aspect as well as possible designs of the first aspect.
In a fifth aspect, the present application provides a method for fusing multiple user identities, including:
determining the association relation of at least two identity characteristics according to the at least two identity characteristics of the user identity data;
and determining the identity group of the same user according to the association relation of the at least two identity characteristics.
One embodiment in the above application has the following advantages or benefits: the identity group corresponding to the identity characteristics of the same user can be accurately determined, and the method can be applied to any scene and has a wide application range. Because the identity characteristics of the user identity data are associated in the form of a graph network, the technical problems that the existing fusion method cannot be applied to complex scenes and has a limited application range are solved, and the technical effect of being applicable to any scene with a wide application range is achieved.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic flowchart of a method for multiple user identity fusion according to a first embodiment of the present application;
FIG. 2 is a diagram illustrating a K-V format corresponding to a point store in an embodiment of the present application;
FIG. 3 is a diagram illustrating a K-V form corresponding to a connected edge memory according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a K-V form corresponding to attribute storage in an embodiment of the present application;
FIG. 5 is a schematic diagram of the association of identity groups of the same person;
fig. 6 is a flowchart illustrating a method for multiple user identity fusion according to a second embodiment of the present application;
fig. 7 is a flowchart illustrating a method for multi-user identity fusion according to a third embodiment of the present application;
fig. 8 is a flowchart illustrating a method for multi-user identity fusion according to a fourth embodiment of the present application;
fig. 9 is a flowchart illustrating a method for multiple user identity fusion according to a fifth embodiment of the present application;
fig. 10 is a schematic structural diagram of a device for multi-identity fusion provided in an embodiment of the present application;
fig. 11 is a block diagram of an electronic device for implementing the method for multi-user identity fusion according to the embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application to assist in understanding, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
With the widespread adoption of the internet, a large amount of user internet-access information is generated every day. In general, this information includes information about the user's identity, and the user identity can be represented by a virtual user identity or a real user identity. For example, virtual user identities include various identifiers (IDs) on the internet, including device IDs such as the International Mobile Equipment Identity (IMEI) and the Identifier for Advertisers (IDFA), and network IDs such as Internet Protocol (IP) addresses, access points (APs) and Service Set Identifiers (SSIDs); real user identities include the user's identity information (such as ID card number and mobile phone number) and various asset information (such as vehicles and real estate).
In practical applications, associating the virtual identity of a user with the user's real identity makes it possible to reconstruct the person's complete behavior from different carriers (vehicle, house, mobile phone device code and the like), thereby creating huge commercial value for products. For example, a complete user profile can be built from the various linked IDs; users can be given accurate recommendations across different devices; users can be reached in context based on their store-visit behavior and shown targeted advertisements; and enterprises can complement each other's strengths, for example by linking identities between a search website and an e-commerce website so that commodities the user searched for on the search website are recommended to the user on the e-commerce website, and commodities the user may be interested in are thus recommended in a targeted manner.
However, as users' internet information continues to accumulate, the volume of data to be linked expands sharply, and the following challenging problems therefore need to be solved:
In terms of computation and storage: PB-level data processing must be handled, where PB refers to the petabyte, a large storage unit, and 1 PB = 1024 TB.
In terms of user scale: for billion-level user IDs, every pair of IDs must be judged as to whether it belongs to the same user, which amounts to roughly 10^20 similarity calculations; this exceeds the computing resources of current large internet enterprises and needs to be optimized.
In terms of market scenarios: more and more enterprises want to integrate internet device IDs with their own data to make their products more competitive. It is therefore necessary to consider how to fuse virtual and real data in a private deployment, with the enterprise's limited resources, and migrate the data into the enterprise.
From the technical perspective: with the development of Artificial Intelligence (AI), new types of user IDs, such as face IDs, voiceprint IDs and fingerprint IDs, will inevitably emerge, so it is necessary to consider how an existing virtual-real data fusion method can adapt to these new types of user IDs.
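The user-scale estimate above follows from counting unordered pairs of IDs. As a rough check, and assuming the figure refers to N on the order of 10^10 IDs (an assumption; the text only says "billion-level"), the number of pairwise comparisons is:

```latex
\binom{N}{2} = \frac{N(N-1)}{2} \approx \frac{N^{2}}{2},
\qquad
N = 10^{10} \;\Rightarrow\; \binom{N}{2} \approx 5 \times 10^{19} \sim 10^{20}.
```

For N = 10^9 the same formula gives about 5 × 10^17 comparisons, so in either case exhaustive pairwise comparison is impractical at this scale.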
As the background section shows, the multi-identity fusion schemes in the prior art are based on preset rules, cannot be applied to complex scenes, and have a limited application range. To solve this problem, the embodiment of the application provides a multi-user identity fusion method. The technical scheme covers data extraction, data storage, data fusion calculation, association retrieval and the like, supports privatized deployment, and makes data fusion convenient for small enterprises.
It can be understood that the execution subject of the embodiment of the present application may be an electronic device, for example, a terminal device such as a computer or a tablet computer, or a server, for example, a background processing platform. Therefore, the present embodiment refers to the terminal device and the server collectively as the electronic device, and whether the electronic device is specifically a terminal device or a server can be determined according to the actual situation.
The technical solution of the present application will be described in detail below with reference to specific examples. It should be noted that the following several specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 1 is a schematic flowchart of a method for multi-user identity fusion according to a first embodiment of the present application. As shown in fig. 1, the method may include the steps of:
s101, obtaining user identity data, wherein the user identity data has at least two identity characteristics.
In practical applications, the data used for fusion falls mainly into two categories: Virtual Network Data (VND) and Real Data (RD). Virtual network data covers the user's internet browsing data, search data, map positioning data, consumption data (e.g., online shopping data), checkpoint camera data and the like; real data covers social data, government affairs data, enterprise data, consumption data (e.g., offline shopping data), enterprise internal data (e.g., banking data), and the like.
The identity characteristics of the user identity data may include virtual identity features and real identity features. Virtual Identity (VI) features are the identity features of a user's activities on the internet, including the user's various device information, account information, and the like. In terms of data type, Real Identity (RI) features include address information, native place, household registration and the like registered with public security authorities, and may also include information and data held by government or functional departments, such as real estate, vehicles, debt, bank accounts, and the like; depending on the specific enterprise use scenario, a real identity feature can also be defined as an identity ID inside an enterprise.
It can be understood that, in the embodiment of the present application, identity features of user identity data are not limited, and specific expression forms of virtual identity features and real identity features are not limited, which may be determined according to actual situations and are not described herein again.
In the embodiment of the present application, the step is to obtain user identity data for performing multiple user identity fusion, where the user identity data needs to have at least two identity features, and in practical applications, the user identity data is mainly extracted from various types of virtual data and real data.
S102, establishing a graph network according to at least two identity characteristics of the user identity data.
Wherein the graph network comprises: nodes representing the identity characteristics and connection edges representing the association relationships between the identity characteristics.
In practical applications, data is generally in relational form, and a graph structure is described by means of adjacency tables, so a point table and an edge table generally need to be created. However, because graph structures are complex, especially once the number of nodes exceeds the hundred-million scale, distributed storage is required to ensure the scalability of the system. If a general relational database is selected, then once the data is stored in a distributed manner, connecting the data becomes more complex and the design cost increases. Therefore, in the embodiment of the application, a graph database is used to store the nodes representing the identity characteristics and the connection edges representing the association relationships between the identity characteristics; a graph database is a non-relational database that applies graph theory to store relationship information between entities, which reduces the cost of organizing points and edges.
In this embodiment, the vertex and the connecting edge of the graph database are described based on the association relationship between the identity features of the user identity data and the record where the identity features appear together, so that the user identity data can solve the problem of data fusion through a graph theory algorithm.
Illustratively, in an embodiment of the present application, the method of the present application may further include the steps of:
storing the correspondence between the nodes and the connecting edges in the graph network in the form of a graph database.
Wherein the graph database comprises: point storage, connecting edge storage and attribute storage.
The point storage includes: the node main key, the attribute information owned by the node and the connecting edge of the node connection.
The connecting edge storage comprises: the connection edge main key, the starting point and the ending point connected by the connection edge, and the attribute information carried by the connection edge.
The attribute storage includes: attribute primary keys, the meaning that the attribute represents, and the specific content that the attribute represents.
Further, in the embodiment of the present application, the node primary key, the connection edge primary key, and the attribute primary key are all stored in an index manner.
In particular, in embodiments of the present application, the graph database relates primarily to the storage of points, edges, and attributes. Optionally, the point-edge-attribute is organized, indexed, and stored inside the storage medium in the form of a Key-Value (K-V) pair.
For example, fig. 2 is a schematic diagram of the K-V form corresponding to point storage in the embodiment of the present application. FIG. 3 is a schematic diagram of the K-V form corresponding to connecting edge storage in an embodiment of the present application. FIG. 4 is a schematic diagram of the K-V form corresponding to attribute storage in an embodiment of the present application. As shown in fig. 2, when point storage is represented in K-V (key-value) form, a point record mainly consists of the node primary key (vertex_id), the attribute information owned by the node, and the connecting edges to which the node is connected. The attribute information and the connecting edges are both stored as their corresponding primary keys, that is, the edge primary key (edge_id) shown in fig. 3 and the attribute primary key (property_id) shown in fig. 4.
It should be noted that, in practical application, the node primary key is actually an identifier of a vertex, the attribute primary key is actually an identifier of an attribute, and the edge primary key is actually an identifier of a connecting edge.
Referring to fig. 2, for example, consider two nodes containing identity feature information whose node primary keys are 12345 and 67894, representing a certain mobile phone number and a certain identity card respectively. Point 12345 has two attributes, P-123 and P-124, and two edges, E-12323 and E-86743; point 67894 has two attributes, P-376 and P-377, and two edges, E-12323 and E-86743.
Referring to fig. 3, the connecting edge storage consists of the primary key of the edge (edge_id), the start point and the end point that the connecting edge connects, and the attributes carried on the connecting edge. The start point and the end point are the primary keys (vertex_id) of the corresponding points; the attributes are stored in the same way as in the point storage, each value being the primary key (property_id) of the corresponding attribute.
For example, the edge whose primary key is E-12323 has start point 12345 and end point 67894, and carries two edge attributes, P-73625 and P-5325. The edge whose primary key is E-86743 has start point 12345 and end point 87251, and carries two edge attributes, P-56325633 and P-672.
Referring to fig. 4, the attribute storage consists of the attribute primary key (property_id), the meaning represented by the attribute (property_key), and the specific content represented by the attribute (property_value).
For example, point 12345 has two attributes, P-123 and P-124, where P-123 indicates that the feature type is mobile phone, i.e., ID-TYPE = phone, and P-124 indicates that the feature information is a specific mobile phone number, i.e., ID-INFO = 138 × 1232. Point 67894 has two attributes, P-376 and P-377, where P-376 indicates that the feature type is identity card, i.e., ID-TYPE = idcard, and P-377 indicates that the feature information is a specific identity card number, i.e., ID-INFO = 3401 × 1273.
It should be noted that, in the embodiment of the present application, in order to facilitate retrieval and data management, an index is built on the primary key of each node, connecting edge, and attribute. Building an index on a specific type of attribute is also supported, so that nodes and/or connecting edges can be conveniently retrieved by attribute.
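By way of a non-limiting illustration, the three K-V stores and an attribute index described above might be sketched as follows. The sketch uses a reduced subset of the example of fig. 2 to fig. 4 (a single edge, E-12323); the in-memory dictionaries, the helper names `build_attribute_index` and `neighbors`, the masked placeholder values, and the meanings given to the edge attributes P-73625 and P-5325 are assumptions made for illustration only and are not taken from the embodiment.

```python
from collections import defaultdict

# Point store: vertex_id -> primary keys of the node's attributes and edges.
point_store = {
    "12345": {"properties": ["P-123", "P-124"], "edges": ["E-12323"]},
    "67894": {"properties": ["P-376", "P-377"], "edges": ["E-12323"]},
}

# Edge store: edge_id -> start vertex, end vertex, attribute primary keys.
edge_store = {
    "E-12323": {"start": "12345", "end": "67894",
                "properties": ["P-73625", "P-5325"]},
}

# Attribute store: property_id -> (property_key, property_value).
property_store = {
    "P-123": ("ID-TYPE", "phone"),
    "P-124": ("ID-INFO", "<masked-phone>"),    # placeholder value
    "P-376": ("ID-TYPE", "idcard"),
    "P-377": ("ID-INFO", "<masked-idcard>"),   # placeholder value
    "P-73625": ("connect_count", 5),           # hypothetical edge attribute
    "P-5325": ("connect_time", "2019-09-04"),  # hypothetical edge attribute
}


def build_attribute_index(points, properties):
    """Map each (property_key, property_value) pair to the vertices that own it."""
    index = defaultdict(set)
    for vertex_id, record in points.items():
        for property_id in record["properties"]:
            index[properties[property_id]].add(vertex_id)
    return index


def neighbors(vertex_id):
    """Follow the edge keys stored in a point's value to the opposite endpoints."""
    result = []
    for edge_id in point_store[vertex_id]["edges"]:
        edge = edge_store[edge_id]
        result.append(edge["end"] if edge["start"] == vertex_id else edge["start"])
    return result


attribute_index = build_attribute_index(point_store, property_store)
```

Under these assumptions, `attribute_index[("ID-TYPE", "phone")]` yields the vertex 12345 without scanning the point store, and `neighbors("12345")` reaches vertex 67894 through one lookup in each of the point store and the edge store.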
For example, in this embodiment, based on the analysis of each data structure in the graph database, each type of identity ID may construct a graph relationship, where the attribute on the connection edge includes specific connection information of identity features represented by two connected nodes, such as connection time, connection frequency, and the like, which may provide feature representation for subsequent calculation.
In this embodiment, user identity data is stored in a form of a K-V-based graph database, specifically, distributed K-V storage may be used for trillion-scale data, and memory or single-machine K-V storage may be used for data of a scale within ten million or less.
S103, according to the connection relation among the nodes in the graph network and the connection relation among the nodes and the connection edges, determining an identity group of the same user, wherein the identity group comprises: a plurality of identity characteristics.
In the embodiment of the application, after the graph network is constructed and obtained based on at least two identity characteristics of user identity data, the connection relationship between nodes and the connection relationship between the nodes and the connection edges in the graph network can be determined, so that the identity group of the same user can be determined based on the connection relationship between the nodes and the connection edges.
Specifically, in the embodiment of the present application, the step of determining the identity group of the same user based on the created graph network may be understood as a data fusion process, that is, performing fusion calculation on each identity feature in the graph network, specifically, mining other identity features having an implicit connection relationship with the identity feature according to the topological property of the identity feature and the node attribute mounted on the identity feature, and finally forming a natural person dimension identity group (ID-group).
In practical applications, the identity fusion calculation method can be divided into a rule mode and a model mode. For specific implementation principles of the rule mode and the model mode, reference may be made to the following description of the embodiments shown in fig. 7 and fig. 8, which is not described herein again.
It is worth noting that, in embodiments of the present application, both the rule mode and the model mode involve a large amount of computation that repeatedly traverses the graph network. For a scenario with a small data size, the data can be pruned by narrowing the target group (for example, pruning by city or by feature type) to obtain a smaller data set, and the identity group of the same user is then determined by single-machine computation. For a scenario with a large data size, the identity group of the same user can be determined by means of distributed graph computing, for example with computing frameworks such as GraphX, GraphLab, and Giraph.
For example, in the embodiment of the present application, the identity group (ID-group) of the same user determined above may be expressed as a natural person identity set. For example, fig. 5 is a schematic diagram of the association between the identities in the identity group of the same person. Referring to fig. 5, the identities of a natural person are expanded pairwise (a connection relationship is established between every two identity features), and each identity feature pair is injected into the created graph network, so that the relationships between the identity features are connected and a strongly connected graph of the identity features is obtained.
For example, in the schematic diagram shown in fig. 5, the expression form of the identity feature may include: IMEI, AP, IP address, IDFA, mobile phone number, identification number, house, car, other, etc.
In the method for multi-user identity fusion provided in the embodiment of the present application, user identity data is obtained, and a graph network is constructed according to at least two identity characteristics of the user identity data, the graph network comprising nodes representing the identity characteristics and connecting edges representing the association relationships between the identity characteristics. An identity group of the same user, comprising a plurality of identity characteristics, is then determined according to the connection relationships among the nodes and between the nodes and the connecting edges in the graph network. In this technical solution, the identity characteristics of the user identity data are associated in the form of a graph network, so that the identity group corresponding to the plurality of identity characteristics of the same user can be accurately determined; moreover, the method can be applied to any scenario, which avoids the problem of a limited application range.
Exemplarily, on the basis of the foregoing embodiments, fig. 6 is a flowchart illustrating a method for multiple user identity fusion according to a second embodiment of the present application. As shown in fig. 6, in this embodiment, the above S101 may be implemented by the following steps:
S601, obtaining preset configuration information, where the configuration information includes: data source type, data source path, extraction mode and extraction period.
In the embodiment of the application, the user identity data to be fused may be internet virtual data and real-world data, and different user identity data may be stored in different types of storage systems after being generated, for example, HDFS, Hive, MySQL, NoSQL, and the like. Therefore, in order to obtain the user identity data from the different storage systems, the preset configuration information needs to be obtained first, and the user identity data is then obtained from the different storage systems according to the preset configuration information.
Correspondingly, in this embodiment, the preset configuration information may include a data source type (HDFS, Hive, MySQL, NoSQL, …), a data source path (host:port, hdfspath, …), an extraction mode, and an extraction period. The data source type represents the type of system that stores the user identity data, the data source path represents the route through which the user identity data is extracted, the extraction mode represents the manner in which the data is extracted, and the extraction period represents how often the data extraction is automatically executed. The extraction period may also be regarded as a scheduling frequency (execution period), indicating whether the user data extraction task is executed daily, hourly, or only once.
In this embodiment, a piece of user identity data must include at least two identity characteristics; each identity characteristic is represented by a point, and a weak connection edge exists between the two points. For example, if the identity characteristics may be a device code, a mobile phone number, an identity number, an account number, and the like, then a piece of user identity data includes at least two of the device code, the mobile phone number, the identity number, and the account number.
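As a hedged illustration of the preset configuration information, the four items might be held in a small validated record such as the one below. The class name `ExtractionConfig`, the field names, the permitted string values, and the example host are all hypothetical; the embodiment only names the four kinds of information and gives HDFS, Hive, MySQL and NoSQL, Hadoop, Spark and single machine, and daily, hourly and single execution as examples.

```python
from dataclasses import dataclass, field

# Example values named in the description; the lowercase spellings are assumptions.
SOURCE_TYPES = {"hdfs", "hive", "mysql", "nosql"}
EXTRACTION_MODES = {"hadoop", "spark", "single"}
EXTRACTION_PERIODS = {"daily", "hourly", "once"}


@dataclass
class ExtractionConfig:
    source_type: str        # type of system storing the user identity data
    source_path: str        # e.g. "host:port" or an HDFS path
    extraction_mode: str    # engine used to extract the data
    extraction_period: str  # how often the extraction task runs
    field_mapping: dict = field(default_factory=dict)  # optional, used in S603

    def validate(self):
        """Raise ValueError if any item is outside the assumed value sets."""
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown data source type: {self.source_type}")
        if not self.source_path:
            raise ValueError("data source path must not be empty")
        if self.extraction_mode not in EXTRACTION_MODES:
            raise ValueError(f"unknown extraction mode: {self.extraction_mode}")
        if self.extraction_period not in EXTRACTION_PERIODS:
            raise ValueError(f"unknown extraction period: {self.extraction_period}")


# Hypothetical example: a MySQL source extracted once a day on a single machine.
config = ExtractionConfig("mysql", "db.example.internal:3306", "single", "daily")
config.validate()
```

Validating the record before scheduling is one simple way to ensure that an extraction task is only started when every item it depends on is present.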
S602, extracting user identity data from the data source corresponding to the data source type according to the data source path, the extraction mode and the extraction period.
In this embodiment, the data source type, the data source path, the extraction mode, and the extraction period are first determined from the preset configuration information. Hadoop, Spark, or a single machine is then selected according to the extraction mode, and the user identity data is extracted via the data source path from the data source corresponding to the data source type, once every extraction period.
It can be understood that, the user data extraction is realized based on each information dependency relationship in the preset configuration information, which can ensure that the data extraction task can be executed stably and orderly.
Optionally, in an embodiment of the present application, if the configuration information further includes: a field mapping relationship; then, the method may further comprise the steps of:
S603, according to the field mapping relation, sequentially analyzing the acquired user identity data, and extracting at least two identity characteristics of the user identity data.
The field mapping relationship indicates which fields (that is, identity features) are extracted from the acquired user identity data to serve as nodes and connecting edges of the graph network, which fields serve as attributes of a point (for example, point attributes include a mobile phone number and a device identifier), and which fields serve as attributes of an edge (edge attributes may be information such as login time).
Therefore, in this embodiment, each row of the obtained user identity data may be parsed according to the field mapping relationship to determine at least two identity features of that row, which makes the subsequent construction of the graph network possible.
Accordingly, on the basis of the above embodiment, for example, as shown in fig. 6, S102 may be implemented by the following step:
S604, constructing the graph network by taking each identity feature in the user identity data as a node of the graph network and taking the incidence relation of every two identity features in the user identity data as a connecting edge of the graph network.
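The row-by-row parsing described above can be sketched, under stated assumptions, as follows. The mapping layout (`node_fields`, `edge_attribute_fields`), the column names, and the function `parse_row` are hypothetical; the embodiment does not specify a concrete format for the field mapping relationship.

```python
from itertools import combinations

# Hypothetical field mapping: source column -> identity type, plus the
# columns that become edge attributes.
FIELD_MAPPING = {
    "node_fields": {"imei": "imei", "phone_no": "phone", "account": "account"},
    "edge_attribute_fields": ["login_time"],
}


def parse_row(row, mapping):
    """Turn one record into (nodes, edges); rows with fewer than two
    identity features are skipped because they cannot form an edge."""
    nodes = [
        (id_type, row[column])
        for column, id_type in mapping["node_fields"].items()
        if row.get(column)
    ]
    if len(nodes) < 2:
        return [], []
    edge_attrs = {
        name: row[name]
        for name in mapping["edge_attribute_fields"]
        if name in row
    }
    # One connecting edge for every pair of identity features in the row.
    edges = [(a, b, dict(edge_attrs)) for a, b in combinations(nodes, 2)]
    return nodes, edges


row = {"imei": "IMEI-001", "phone_no": "139xxxxxxxx", "account": "",
       "login_time": "2019-09-04T10:00:00"}
nodes, edges = parse_row(row, FIELD_MAPPING)
```

In this sketch the empty `account` column is ignored, so the row yields two nodes and a single edge carrying the login time as its attribute.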
Each node and each connecting edge in the graph network respectively have attribute information.
In this embodiment, when at least two identity features of the user identity data are determined according to the field mapping relationship, the association relationship between every two identity features and the attribute information of each node and each connection edge are determined, so that the constructed graph network can be obtained by taking each identity feature of the user identity data as a vertex of the graph network and taking the association relationship between every two identity features as a connection edge of the graph network.
From the above analysis, the user identity data can be organized into structured data having a start point, a start point attribute, an end point, an end point attribute, a connecting edge, and a connecting edge attribute. The structured data is in fact a directed graph in which the relationships between nodes are configurable, and its output form may be JSON, proto, or CSV. The embodiment of the present application does not limit the specific output form, which can be selected according to actual situations.
Further, node duplication removal, attribute combination and other operations are performed on the structural data formed based on the user identity data, so that the nodes corresponding to the identity characteristics of the newly-added data can be inserted into the graph network, and the nodes corresponding to the identity characteristics of the original data can execute corresponding updating operations.
Similarly, the connection edge deduplication and the attribute merging can be performed on the structural data formed based on the user identity data, that is, edges with the same connection relation in a plurality of pieces of data are merged, repeated connection edges and attributes are removed, and the reconstruction of information in the user identity data through a merging form is achieved. Finally, the extracted attribute information such as the vertexes and the connecting edges can be updated to the graph network based on the reconstructed structure data.
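A minimal sketch of connecting edge deduplication and attribute merging is given below, assuming that each extracted record is a `(start, end, attributes)` triple and that the merged edge keeps a connection count and the distinct connection times. The function name `merge_edges` and the attribute names `connect_count`, `connect_times`, and `login_time` are assumptions for illustration.

```python
def merge_edges(records):
    """Merge records that share the same (start, end) pair into one edge."""
    merged = {}
    for start, end, attrs in records:
        entry = merged.setdefault(
            (start, end), {"connect_count": 0, "connect_times": []}
        )
        entry["connect_count"] += 1
        time = attrs.get("login_time")
        # Keep each connection time once, dropping repeated attributes.
        if time is not None and time not in entry["connect_times"]:
            entry["connect_times"].append(time)
    return merged


records = [
    ("imei:001", "phone:139", {"login_time": "2019-09-01"}),
    ("imei:001", "phone:139", {"login_time": "2019-09-02"}),
    ("imei:001", "phone:139", {"login_time": "2019-09-02"}),
    ("imei:001", "account:a1", {"login_time": "2019-09-01"}),
]
merged_edges = merge_edges(records)
```

The key is the ordered pair because the structured data is described as a directed graph; an undirected variant would sort the two endpoints before using them as the key.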
In the method for multi-user identity fusion provided by the embodiment of the application, preset configuration information is obtained, the configuration information comprising a data source type, a data source path, an extraction mode and an extraction period, and user identity data is extracted from the data source corresponding to the data source type according to the data source path, the extraction mode and the extraction period. When the configuration information further comprises a field mapping relationship, the obtained user identity data can be parsed in sequence according to the field mapping relationship to extract at least two identity characteristics of the user identity data; the graph network is then constructed by taking each identity characteristic in the user identity data as a node of the graph network and the association relationship between every two identity characteristics as a connecting edge of the graph network. In this technical solution, to address the problem that different data may come from different systems, the extraction of the user identity data and the identification and extraction of the identity features in it are driven by the preset configuration information, and the graph network is constructed from the extracted identity features, so that the degree of automation is high and the cost is low.
Exemplarily, on the basis of the foregoing embodiments, fig. 7 is a flowchart illustrating a method for multiple user identity fusion according to a third embodiment of the present application. As shown in fig. 7, in this embodiment, the step S103 may be implemented by:
S701, determining the connection times between adjacent nodes in the graph network according to the connection relationship between the nodes in the graph network and the connection relationship between the nodes and the connection edges.
In the graph network, which nodes have a connection relationship can be determined according to the connection relationship between the nodes, and basic information of the connection between the nodes, such as connection time, connection times and the like, can be determined according to the connection relationship between the nodes and the connection edges, so that the connection times between adjacent nodes in the graph network can be determined according to the basic information of the connection relationship between the nodes and the connection between the nodes.
In this embodiment, since the connection relationship between the nodes and the connection edges are also the node connection information, the number of connections between adjacent nodes in the graph network may also be interpreted as determining the number of connections between adjacent nodes according to the node connection information in a plurality of preset time periods.
S702, determining a first connection relation and a second connection relation based on the connection times between adjacent nodes in the graph network and a preset count threshold.
The first connection relation is a connection relation in which the connection times between adjacent nodes are greater than the count threshold, and the second connection relation is a connection relation in which the connection times between adjacent nodes are less than or equal to the count threshold.
This embodiment mainly performs the identity fusion calculation in the rule mode. Specifically, the connection times between adjacent nodes in the graph network, the node connection information in different time periods, and the preset count threshold are determined first. Then, according to the node connection information in the different time periods and the connection times between adjacent nodes, the connection times between each pair of adjacent nodes are compared with the count threshold, thereby determining which connection relation each pair of adjacent nodes belongs to.
In practical applications, the first connection relationship may also be referred to as a strong connection relationship, and the second connection relationship may also be referred to as a weak connection relationship.
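The comparison in S702 can be illustrated by the short sketch below, which assumes that the connection times of each pair of adjacent nodes have already been counted. The function name `classify_connections`, the node labels, and the threshold value are hypothetical.

```python
def classify_connections(connection_counts, threshold):
    """Split node pairs into strong (count > threshold) and
    weak (count <= threshold) connection relations."""
    strong, weak = set(), set()
    for pair, count in connection_counts.items():
        if count > threshold:
            strong.add(pair)
        else:
            weak.add(pair)
    return strong, weak


# Hypothetical connection counts between adjacent nodes.
connection_counts = {
    ("phone", "imei"): 12,
    ("phone", "ip"): 3,
    ("imei", "idfa"): 5,
}
strong_pairs, weak_pairs = classify_connections(connection_counts, threshold=5)
```

Note that a pair whose count equals the threshold, such as `("imei", "idfa")` here, falls into the weak (second) connection relation, matching the "less than or equal to" condition stated above.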
And S703, sequentially traversing the nodes of the graph network outwards by taking the target node as a starting point according to the first connection relation and the second connection relation, and determining the identity group of the user corresponding to the target node.
In this embodiment, after the connection relationship between adjacent nodes is determined, each node in the graph network may be traversed to the periphery with itself as a starting point, and the first connection relationship and the second connection relationship are determined for any passing node, so that all the identity features corresponding to the node sets satisfying the first connection relationship are used as an identity group. Similarly, for the target node, the identity group of the user corresponding to the target node can be determined by traversing all the identity groups and according to a certain policy (for example, solving a minimum connected subgraph).
It is understood that, in the embodiment of the present application, the depth (degree) of the node traversal may be determined according to actual situations and is not limited in this embodiment. Typically, the traversal depth is 3 degrees.
As can be seen from the above analysis, in this embodiment, a first connection relationship and a second connection relationship are determined based on the connection times between adjacent nodes in the graph network and a preset count threshold, where the first connection relationship is a connection in which the connection times between adjacent nodes are greater than the count threshold, and the second connection relationship is a connection in which the connection times between adjacent nodes are less than or equal to the count threshold. According to the first connection relationship and the second connection relationship, the nodes of the graph network are traversed outward in sequence with the target node as the starting point to determine the identity group of the user corresponding to the target node. That is, this scheme is a fusion mode based on the rule mode, and the accuracy of the obtained result is high.
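One possible reading of the outward traversal in S703 is sketched below: starting from the target node, only edges of the first (strong) connection relation are followed, up to a configurable depth. This is a simplified assumption; the embodiment also mentions further strategies (for example, solving a minimum connected subgraph) that are not modelled here, and the function name `identity_group` is hypothetical.

```python
from collections import deque


def identity_group(target, strong_edges, max_depth=3):
    """Collect the nodes reachable from `target` over strong edges
    within `max_depth` hops (breadth-first)."""
    adjacency = {}
    for a, b in strong_edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    group = {target}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the configured depth
        for neighbor in adjacency.get(node, ()):
            if neighbor not in group:
                group.add(neighbor)
                queue.append((neighbor, depth + 1))
    return group


# Hypothetical chain of strongly connected identity features.
strong_edges = [("phone", "imei"), ("imei", "idfa"),
                ("idfa", "ip"), ("ip", "ap")]
group = identity_group("phone", strong_edges, max_depth=3)
```

With a depth of 3, the node `ap` (four hops from `phone`) is left out of the group, which shows how the depth parameter bounds the size of the resulting identity group.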
Exemplarily, on the basis of the foregoing embodiments, fig. 8 is a flowchart illustrating a method for multiple user identity fusion according to a fourth embodiment of the present application. This embodiment differs from the embodiment shown in fig. 7 in the way in which the two embodiments determine the identity group of the same user. Specifically, as shown in fig. 8, in this embodiment, the step S103 may be implemented by:
S801, determining the association relationship among the nodes according to the connection relationship among the nodes in the graph network, the connection relationship among the nodes and the connection edges and the attribute information of each node.
In this embodiment, identity fusion is mainly calculated based on a model mode, specifically, nodes are characterized according to a connection relationship between the nodes, a connection relationship between the nodes and a connection edge, and attribute information possessed by each node, for example, an identification Embedding (ID-Embedding) process is performed, where the ID-Embedding process is performed to extract a set of features (a set of vectors) for each node in a graph network, so as to achieve the purpose of mathematical operations between the nodes.
It will be appreciated that there are many ways in which nodes may be characterized, for example, node2vec, graph-embedding, etc. The method actually adopted can be determined according to actual needs, and is not described herein again.
In the embodiment of the present application, to meet more demand scenarios, a characterization scheme based on the attribute data type may be adopted. This is a general feature representation method in which the encoding is chosen according to the type of the attribute feature: string, int, double, or enum. For example, the string type may be encoded with one-hot character encoding, and the int type may be encoded by numerical normalization.
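A hedged sketch of type-based attribute encoding follows. The alphabet, the fixed string length, the use of min-max scaling as the numerical normalization, and the function names are assumptions chosen for illustration; the embodiment only states that the encoding depends on the attribute data type.

```python
import string

# Assumed alphabet for character-level one-hot encoding: digits, then a-z.
ALPHABET = string.digits + string.ascii_lowercase


def one_hot_chars(text, max_len=4):
    """One-hot encode the first `max_len` characters of a string attribute."""
    vector = []
    for position in range(max_len):
        slot = [0.0] * len(ALPHABET)
        if position < len(text):
            index = ALPHABET.find(text[position].lower())
            if index >= 0:  # characters outside the alphabet stay all-zero
                slot[index] = 1.0
        vector.extend(slot)
    return vector


def min_max(value, low, high):
    """Normalize an int or double attribute into [0, 1]."""
    if high == low:
        return 0.0
    return (value - low) / (high - low)


def one_hot_enum(value, choices):
    """One-hot encode an enum attribute over its known choices."""
    return [1.0 if value == choice else 0.0 for choice in choices]


encoded_string = one_hot_chars("ab")
encoded_int = min_max(5, 0, 10)
encoded_enum = one_hot_enum("phone", ["phone", "idcard"])
```

Concatenating the per-attribute encodings gives each node a fixed-length vector, which is what allows mathematical operations between nodes in the later aggregation step.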
S802, based on the association relationship among the nodes, the nodes in the graph network are aggregated, and the identity group of the same user is determined.
In this embodiment, the nodes are clustered based on the association relationship between the nodes; that is, nodes with higher similarity (association) are aggregated by a similarity algorithm to form a strongly related group. The similarity algorithm may be selected according to the feature attributes extracted in S801. For example, the Jaccard similarity may be used for discrete features, the cosine similarity for continuous feature vectors, and a Euclidean-distance-based similarity for numerical features.
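The three similarity measures named above can be written in a few lines, as in the sketch below. Converting the Euclidean distance into a similarity via 1 / (1 + d) is an assumption made here; the embodiment does not specify the conversion.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two collections of discrete features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def euclidean_similarity(u, v):
    """Map Euclidean distance d to a similarity in (0, 1] as 1 / (1 + d)."""
    return 1.0 / (1.0 + math.dist(u, v))
```

For instance, `jaccard({"wifi-a", "wifi-b"}, {"wifi-b", "wifi-c"})` compares two sets of discrete features that share one of three distinct values, and `cosine([1.0, 0.0], [0.0, 1.0])` is 0.0 for orthogonal vectors.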
From the above analysis, after the association relationship between the nodes is determined, the association degree between the nodes is obtained correspondingly, so that different algorithms can be selected to cluster the nodes, for example, a classic Louvain community discovery algorithm can be used, and a traditional k-means algorithm can also be used for clustering. The final output is a higher confidence group of identities for the same user.
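As a deliberately simplified stand-in for the clustering step, the sketch below groups nodes whose pairwise similarity reaches a threshold, using a union-find structure. This is neither the Louvain algorithm nor k-means mentioned above; it is swapped in only to show how pairwise association degrees can be turned into groups. The function name `aggregate`, the node labels, the scores, and the threshold are hypothetical.

```python
def aggregate(nodes, similarity, threshold):
    """Group nodes connected by pairs whose similarity >= threshold."""
    parent = {node: node for node in nodes}

    def find(node):
        # Find the group representative, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), set()).add(node)
    # Sort for a deterministic result.
    return sorted(groups.values(), key=lambda group: sorted(group))


nodes = ["phone", "imei", "idcard", "ip"]
similarity = {("phone", "imei"): 0.9, ("imei", "idcard"): 0.8,
              ("idcard", "ip"): 0.2}
identity_groups = aggregate(nodes, similarity, threshold=0.5)
```

In this toy input, `phone`, `imei`, and `idcard` end up in one group, while `ip` stays separate because its only association score is below the threshold.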
According to the method for multi-user identity fusion, the association relationship among the nodes is determined according to the connection relationship among the nodes in the graph network, the connection relationship among the nodes and the connection edges and the attribute information of each node, the nodes in the graph network are aggregated based on the association relationship among the nodes, and the identity group of the same user is determined. In the technical scheme, the high fusion rate can be ensured based on the fusion mode of the model mode.
It should be noted that, in practical applications, the fusion modes based on the rule mode and the model mode can be used in combination; that is, the two modes complement each other to form a complete fusion system. In the embodiment of the application, distributed or non-distributed computing can be selected according to the data scale, which facilitates independent deployment from an engineering perspective.
Exemplarily, on the basis of any of the foregoing embodiments, fig. 9 is a flowchart illustrating a method for multiple user identity fusion according to a fifth embodiment of the present application. Referring to fig. 9, the method may further include the steps of:
S901, according to the identity group of the same user, determining a plurality of user identity characteristics in the identity group, which have an association relationship with a target identity characteristic, wherein the target identity characteristic is any one of the user identity characteristics included in the identity group.
In this embodiment, determining the multiple user identity characteristics associated with the target identity characteristic according to the identity group is in fact a process of association retrieval. Specifically, association retrieval is a graph traversal search over the strongly connected graph formed in S103 of the embodiment shown in fig. 1 and the association relationships shown in fig. 5.
Specifically, based on a given source node type, identity characteristics of the source node, a target node type, and the like, a connected node identifier and a corresponding score may be output, where the score represents a degree of association between the connected node and the source node and the target node. Since the connection relationship between the nodes forms a connection graph, as shown in fig. 5, the identity of the target identity feature can be intuitively deduced according to the target identity feature, so as to obtain all the user identity features having an association relationship with the target identity feature.
Illustratively, in this embodiment, this step may be implemented as follows:
and searching, traversing and screening nodes in the identity group of the same user, and determining a plurality of user identity characteristics which have an association relationship with the target identity characteristics in the identity group.
In the embodiment of the present application, the process of identity connectivity derivation can be divided into: searching node identities, traversing the breadth of the node identities, screening node identity characteristics and the like.
Specifically, node identity retrieval: as can be seen from the above description of the embodiment shown in FIG. 1, the point storage mainly includes the node primary key (KEY), the attribute information owned by the node, and the connecting edges to which the node is connected. Building an index on the attribute primary keys (e.g., indexing ID-TYPE and ID-INFO) improves retrieval performance. For example, to retrieve the node whose mobile phone number is 139xxxxxxxx, the data with ID-TYPE = phone and ID-INFO = 139xxxxxxxx is queried.
Node identity breadth traversal: node traversal uses graph-based Breadth First Search (BFS). Starting from the target node corresponding to the target identity characteristic, the nodes connected at layer 1, layer 2, …, layer n (where n can be controlled by a parameter) are traversed in sequence from the inside outward.
For example, in the schematic diagram shown in fig. 5, starting from the mobile phone number, the first layer yields the IMEI, IDFA, AP, IP address, etc., and the second layer yields the car, house, etc. As can be seen from the point storage shown in fig. 2, the BFS process traverses the connecting edges stored in the values of the nodes, and each visit to a node requires only one K-V lookup, which greatly improves traversal efficiency.
Node identity feature screening: all identities connected to the node corresponding to the target identity characteristic can be obtained by traversing the nodes in the graph network, and because the attribute primary keys of the nodes are indexed, the retrieval results can be filtered according to those indexes. For example, to obtain the IDFA connected to a mobile phone, one round of traversal obtains the result, after which the nodes that do not satisfy ID-TYPE = idfa are filtered out.
Therefore, the vertexes which are in communication relation with the vertexes corresponding to the target identity characteristics can be quickly deduced and obtained through the processes of node identity retrieval, node identity breadth traversal, node identity characteristic screening and the like.
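The three steps above (retrieval through an attribute index, layered breadth-first traversal, and screening by identity type) can be sketched together as follows. The toy graph, the vertex identifiers, the placeholder ID-INFO values, and the function names are hypothetical and only loosely follow the layout of fig. 5.

```python
# Hypothetical nodes keyed by vertex id, with ID-TYPE / ID-INFO attributes.
nodes = {
    "v1": {"ID-TYPE": "phone", "ID-INFO": "139xxxxxxxx"},
    "v2": {"ID-TYPE": "imei", "ID-INFO": "imei-001"},
    "v3": {"ID-TYPE": "idfa", "ID-INFO": "idfa-001"},
    "v4": {"ID-TYPE": "car", "ID-INFO": "car-001"},
}
adjacency = {"v1": ["v2", "v3"], "v2": ["v1", "v4"], "v3": ["v1"], "v4": ["v2"]}

# Attribute index on (ID-TYPE, ID-INFO) for node identity retrieval.
index = {(a["ID-TYPE"], a["ID-INFO"]): vid for vid, a in nodes.items()}


def retrieve(id_type, id_info):
    """Node identity retrieval: one index lookup instead of a scan."""
    return index.get((id_type, id_info))


def bfs_layers(start, max_layers):
    """Breadth traversal: return the nodes found at layer 1, 2, ..., n."""
    seen, frontier, layers = {start}, [start], []
    for _ in range(max_layers):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break
        layers.append(next_frontier)
        frontier = next_frontier
    return layers


def connected_ids(id_type, id_info, target_type, max_layers=1):
    """Screening: keep only traversed nodes whose ID-TYPE matches."""
    start = retrieve(id_type, id_info)
    if start is None:
        return []
    return [
        nodes[v]["ID-INFO"]
        for layer in bfs_layers(start, max_layers)
        for v in layer
        if nodes[v]["ID-TYPE"] == target_type
    ]


idfa_of_phone = connected_ids("phone", "139xxxxxxxx", "idfa")
```

In this toy graph one round of traversal from the phone node is enough to reach the IDFA node, whereas the car node is only reached when `max_layers` is raised to 2.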
In this embodiment, based on the online identity feature derivation through graph traversal, by using a storage and indexing scheme of a graph network, compared with the existing scheme of storing the mapping relationship of the identity features through a positive direction table, the average response effect is significantly improved, and especially when the step length (degree) of a connected path is greater than 2 steps, the average response performance is improved by 10 times.
S902, pushing a message to at least one of the plurality of user identity characteristics.
In this embodiment, for the identity group of the same user, after determining a plurality of user identity features in the identity group that have an association relationship with the target identity feature, a message may be pushed to at least one of the plurality of user identity features.
For example, when the identities on a certain search website and an e-commerce website have been linked, after a user searches for a commodity on the search website, the searched commodity can be recommended to the user when the user logs in to the e-commerce website, so that commodities the user may be interested in are recommended in a targeted manner.
As can be seen from the above analysis, in this embodiment, a plurality of user identity features having an association relationship with a target identity feature in an identity group are determined according to the identity group of the same user, where the target identity feature is any one of the user identity features included in the identity group, and a message is pushed to at least one of the user identity features. The technical scheme realizes targeted message pushing to the user and improves the commercial value of the product.
To sum up, the technical implementation of the embodiment of the application regards each identity feature in the user identity data as a node in the graph network, strengthens the connection relationship between a plurality of identity feature pairs with weak relationship through a graph theory algorithm, and forms a strong relationship pair so as to achieve the purpose of identity fusion.
In the above, a specific implementation of the multiple identity fusion method mentioned in the embodiment of the present application is introduced, and the following is an embodiment of the apparatus of the present application, which can be used to implement the embodiment of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 10 is a schematic structural diagram of a device for multiple identity fusion provided in an embodiment of the present application. The device can be integrated in or realized by electronic equipment, and the electronic equipment can be terminal equipment or a server. As shown in fig. 10, in this embodiment, the apparatus 100 for multi-identity fusion may include:
an obtaining module 1001, configured to obtain user identity data, where the user identity data has at least two identity features;
a processing module 1002, configured to construct a graph network according to at least two identity features of the user identity data, where the graph network includes: the node for representing the identity characteristics and the connection edge for representing the incidence relation of the identity characteristics;
a determining module 1003, configured to determine an identity group of the same user according to a connection relationship between nodes in the graph network and a connection relationship between a node and a connection edge, where the identity group includes: a plurality of identity characteristics.
In a possible design of the embodiment of the present application, the obtaining module 1001 is specifically configured to obtain preset configuration information, where the configuration information includes: data source type, data source path, extraction mode and extraction period; and extracting the user identity data from the data source corresponding to the data source type according to the data source path, the extraction mode and the extraction period.
Illustratively, the configuration information further includes: a field mapping relationship;
correspondingly, the processing module 1002 is further configured to sequentially analyze the obtained user identity data according to the field mapping relationship, and extract at least two identity features of the user identity data.
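As a minimal sketch of how such configuration information might be represented and applied, consider the following. The `ExtractionConfig` class, its field names, the sample values and the raw field names are all hypothetical; the text only states which kinds of information the configuration contains.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionConfig:
    source_type: str          # e.g. "file" (illustrative value)
    source_path: str
    extraction_mode: str      # e.g. "incremental" (illustrative value)
    extraction_period: str    # e.g. "daily" (illustrative value)
    field_mapping: dict = field(default_factory=dict)  # raw field -> ID-TYPE


def extract_features(record, config):
    """Map raw record fields to (ID-TYPE, ID-INFO) pairs using the field mapping."""
    return [
        (id_type, record[raw])
        for raw, id_type in config.field_mapping.items()
        if record.get(raw)  # skip fields that are missing or empty
    ]


config = ExtractionConfig(
    source_type="file",
    source_path="/path/to/identity_log",  # placeholder path
    extraction_mode="incremental",
    extraction_period="daily",
    field_mapping={"mobile": "phone", "device_id": "imei"},
)
record = {"mobile": "139xxxxxxxx", "device_id": "IMEI-0001", "note": "unmapped"}
features = extract_features(record, config)
```

Fields that do not appear in the mapping, such as `note` above, are ignored, so a record contributes only the identity features that the configuration names.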
In another possible design of the present application, the processing module 1002 is specifically configured to use each identity feature in the user identity data as a node of a graph network, and use an association relationship between every two identity features in the user identity data as a connection edge of the graph network, so as to construct the graph network, where each node and each connection edge in the graph network have attribute information respectively.
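The construction described above can be sketched as follows, under the assumption that each record of user identity data is already a list of (ID-TYPE, ID-INFO) features. The function name and the co-occurrence counter kept per edge are illustrative choices, not details taken from the embodiment.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(records):
    """Build an undirected graph from records of (ID-TYPE, ID-INFO) features."""
    nodes = {}                       # node key -> attribute information
    edge_counts = defaultdict(int)   # (node_a, node_b) -> times seen together
    for features in records:
        for id_type, id_info in features:
            nodes[(id_type, id_info)] = {"ID-TYPE": id_type, "ID-INFO": id_info}
        # Every pair of features in the same record yields one connecting edge.
        for a, b in combinations(sorted(set(features)), 2):
            edge_counts[(a, b)] += 1
    return nodes, dict(edge_counts)


records = [
    [("phone", "139xxxxxxxx"), ("imei", "IMEI-0001")],
    [("phone", "139xxxxxxxx"), ("imei", "IMEI-0001"), ("idfa", "IDFA-0001")],
]
nodes, edge_counts = build_graph(records)
```

Sorting each pair before counting gives every undirected edge a single canonical key, so repeated observations of the same pair accumulate on one edge.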
In yet another possible design of the embodiment of the present application, the determining module 1003 is specifically configured to determine the connection times between adjacent nodes in the graph network according to the connection relationships between the nodes in the graph network and the connection relationships between the nodes and the connection edges, and determine a first connection relationship and a second connection relationship based on the connection times between the adjacent nodes in the graph network and a preset time threshold, where the first connection relationship is a connection relationship in which the connection times between the adjacent nodes are greater than the time threshold, and the second connection relationship is a connection relationship in which the connection times between the adjacent nodes are less than or equal to the time threshold; and sequentially traversing the nodes of the graph network outwards by taking the target node as a starting point according to the first connection relation and the second connection relation, and determining the identity group of the user corresponding to the target node.
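The threshold split can be illustrated with a short sketch. How the two connection relationships are combined during traversal is not spelled out here, so the sketch assumes one possible reading: only edges of the first connection relationship (connection times greater than the threshold) are followed when collecting the identity group. The function names and sample counts are hypothetical.

```python
def split_by_threshold(edge_counts, threshold):
    """Split edges into first (count > threshold) and second (count <= threshold)."""
    first = {e for e, c in edge_counts.items() if c > threshold}
    second = {e for e, c in edge_counts.items() if c <= threshold}
    return first, second


def identity_group(edges, target):
    """Collect all nodes reachable from target over the given edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    group, frontier = {target}, [target]
    while frontier:
        node = frontier.pop()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in group:
                group.add(neighbour)
                frontier.append(neighbour)
    return group


# Hypothetical connection counts between adjacent nodes.
edge_counts = {("phone", "imei"): 5, ("phone", "ip"): 1, ("imei", "idfa"): 3}
first, second = split_by_threshold(edge_counts, threshold=2)
group = identity_group(first, "phone")  # assumption: follow first-relationship edges only
```

In this example the IP address, seen with the phone only once, stays outside the group, while the IMEI and IDFA are included.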
In another possible design of the embodiment of the present application, the determining module 1003 is specifically configured to determine an association relationship between nodes according to a connection relationship between nodes in the graph network, a connection relationship between a node and a connection edge, and attribute information of each node; and aggregating the nodes in the graph network based on the association relationship among the nodes to determine the identity group of the same user.
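One way to aggregate associated nodes is a union-find (disjoint-set) structure, sketched below. The text does not name a specific aggregation algorithm, so union-find is a substitute chosen for illustration, and the list of associated pairs is assumed to have been derived beforehand from the connection relationships and attribute information.

```python
def aggregate(nodes, associated_pairs):
    """Group nodes into disjoint sets using union-find over associated pairs."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in associated_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


# Hypothetical nodes and association pairs.
nodes = ["phone", "imei", "idfa", "ip", "mac"]
pairs = [("phone", "imei"), ("imei", "idfa"), ("ip", "mac")]
groups = aggregate(nodes, pairs)
```

Each resulting set is a candidate identity group: nodes end up together whenever a chain of associations links them.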
In any one of the above possible designs of the embodiment of the present application, the determining module 1003 is further configured to determine, according to an identity group of the same user, a plurality of user identity features in the identity group, where the user identity features have an association relationship with a target identity feature, where the target identity feature is any one of the user identity features included in the identity group;
correspondingly, the device also comprises: a push module;
the pushing module is used for pushing a message to at least one identity feature in the plurality of user identity features.
In this embodiment of the application, the determining module 1003 is further specifically configured to perform retrieval, traversal, and screening processing on nodes in an identity group of the same user, and determine a plurality of user identity features in the identity group, where the user identity features have an association relationship with a target identity feature.
In any one of the possible designs of this embodiment, the processing module 1002 is further configured to store a corresponding relationship between a node and a connecting edge in the graph network in a graph database;
wherein the graph database comprises: point storage, connection edge storage and attribute storage;
the point storing includes: the node main key, the attribute information owned by the node and the connecting edge connected with the node;
the connecting edge storage comprises: the connection side main key, the starting point and the ending point connected by the connection side and the attribute information carried by the connection side;
the attribute storage includes: attribute primary keys, the meaning represented by the attribute, and the specific content represented by the attribute.
Optionally, the node primary key, the connection edge primary key, and the attribute primary key are all stored in an index manner.
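The three stores can be pictured with a minimal in-memory sketch. Plain dictionaries keyed by primary key stand in for the indexed stores of the graph database; the key names, the `count` edge attribute, and the mapping of "meaning" and "specific content" to ID-TYPE and ID-INFO are assumptions made for illustration.

```python
# Point store: node primary key -> owned attributes and connected edges.
point_store = {
    "n1": {"attributes": ["a1"], "edges": ["e1"]},
    "n2": {"attributes": ["a2"], "edges": ["e1"]},
}
# Connecting edge store: edge primary key -> start point, end point, edge attributes.
edge_store = {
    "e1": {"start": "n1", "end": "n2", "attributes": {"count": 3}},
}
# Attribute store: attribute primary key -> meaning (ID-TYPE) and content (ID-INFO).
attribute_store = {
    "a1": {"ID-TYPE": "phone", "ID-INFO": "139xxxxxxxx"},
    "a2": {"ID-TYPE": "imei", "ID-INFO": "IMEI-0001"},
}


def neighbours(node_key):
    """Resolve a node's neighbours: one point lookup plus one lookup per edge."""
    result = []
    for edge_key in point_store[node_key]["edges"]:
        edge = edge_store[edge_key]
        result.append(edge["end"] if edge["start"] == node_key else edge["start"])
    return result


n1_neighbours = neighbours("n1")
n1_type = attribute_store[point_store["n1"]["attributes"][0]]["ID-TYPE"]
```

Because every store is addressed by primary key, each step of a traversal is a direct key lookup rather than a scan.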
The apparatus provided in the embodiment of the present application may be used to execute the method in the embodiments shown in fig. 1 to fig. 9, and the implementation principle and the technical effect are similar, which are not described herein again.
It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or can be implemented in the form of hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the determining module may be a processing element separately set up, or may be implemented by being integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and the function of the determining module is called and executed by a processing element of the apparatus. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor that can call program code. As another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Further, according to the embodiment of the application, the application also provides an electronic device and a readable storage medium.
Fig. 11 is a block diagram of an electronic device for implementing the method for multi-user identity fusion according to the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 11, the electronic device includes: one or more processors 1101, a memory 1102, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 11, one processor 1101 is taken as an example.
The memory 1102 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of multi-user identity fusion provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of multi-user identity fusion provided herein.
The memory 1102, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the method for multi-user identity fusion in the embodiment of the present application (for example, the obtaining module 1001, the processing module 1002, and the determining module 1003 shown in fig. 10). The processor 1101 executes the non-transitory software programs, instructions and modules stored in the memory 1102 to execute various functional applications of the server and data processing, i.e., to implement the method of multi-user identity fusion in the above method embodiment.
The memory 1102 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created through use of the electronic device for multi-user identity fusion, and the like. Further, the memory 1102 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1102 may optionally include memory remotely located from the processor 1101, and such remote memory may be connected to the electronic device for multi-user identity fusion over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for multi-user identity fusion can further comprise: an input device 1103 and an output device 1104. The processor 1101, the memory 1102, the input device 1103, and the output device 1104 may be connected by a bus or other means; connection by a bus is taken as an example in fig. 11.
The input device 1103 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device for multi-user identity fusion; examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer, one or more mouse buttons, a track ball, a joystick, and the like. The output devices 1104 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The embodiment of the present application further provides a method for fusing multiple user identities, including:
determining the incidence relation of at least two identity characteristics according to the at least two identity characteristics of the user identity data;
and determining the identity group of the same user according to the incidence relation of the at least two identity characteristics.
For a specific implementation principle of this embodiment, reference may be made to the description of the embodiments shown in fig. 1 to fig. 9, which is not described herein again.
According to the technical scheme of the embodiment of the application, user identity data is acquired, and a graph network is constructed according to at least two identity features of the user identity data, where the graph network includes nodes representing the identity features and connecting edges representing the association relationships of the identity features; an identity group of the same user is then determined according to the connection relationships between the nodes in the graph network and the connection relationships between the nodes and the connecting edges, where the identity group includes a plurality of identity features. In this technical scheme, the identity features of the user identity data are associated in the form of a graph network, so that the identity group corresponding to the plurality of identity features of the same user can be accurately determined, and the method can be applied to any scenario, which avoids the problem of a limited application range.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments are not intended to limit the scope of the present disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for fusing multiple user identities, comprising:
acquiring user identity data, wherein the user identity data has at least two identity characteristics;
according to at least two identity characteristics of the user identity data, establishing a graph network, wherein the graph network comprises: the node representing the identity characteristics and the connection edge representing the incidence relation of the identity characteristics;
according to the connection relationship among the nodes in the graph network and the connection relationship among the nodes and the connection edges, determining an identity group of the same user, wherein the identity group comprises: a plurality of identity characteristics;
the determining the identity group of the same user according to the connection relationship between the nodes and the connection edges in the graph network includes:
determining the connection times between adjacent nodes in the graph network according to the connection relationship between the nodes in the graph network and the connection relationship between the nodes and the connection edges;
determining a first connection relation and a second connection relation based on the connection times between adjacent nodes in the graph network and a preset time threshold, wherein the first connection relation is the connection relation that the connection times between the adjacent nodes are larger than the time threshold, and the second connection relation is the connection relation that the connection times between the adjacent nodes are smaller than or equal to the time threshold;
according to the first connection relation and the second connection relation, sequentially traversing nodes of the graph network outwards by taking a target node as a starting point, and determining an identity group of a user corresponding to the target node;
the determining the identity group of the same user according to the connection relationship between the nodes and the connection edges in the graph network further comprises:
determining an association relation among the nodes according to the connection relation among the nodes in the graph network, the connection relation among the nodes and the connection edges and the attribute information of each node;
based on the incidence relation among the nodes, aggregating the nodes in the graph network to form a strongly-related group; and determining the identity group of the same user according to the identity group of the user corresponding to the target node and the strongly-related group.
2. The method of claim 1, wherein the obtaining user identity data comprises:
acquiring preset configuration information, wherein the configuration information comprises: data source type, data source path, extraction mode and extraction period;
and extracting the user identity data from the data source corresponding to the data source type according to the data source path, the extraction mode and the extraction period.
3. The method of claim 2, wherein the configuration information further comprises: a field mapping relationship;
the method further comprises the following steps:
and analyzing the acquired user identity data in sequence according to the field mapping relationship, and extracting at least two identity characteristics of the user identity data.
4. The method according to any one of claims 1-3, further comprising:
determining a plurality of user identity characteristics which have an association relation with a target identity characteristic in an identity group according to the identity group of the same user, wherein the target identity characteristic is any one of the user identity characteristics included in the identity group;
pushing a message to at least one of the plurality of user identity features.
5. The method according to claim 4, wherein the determining, according to the identity group of the same user, a plurality of user identity features in the identity group, which have an association relationship with the target identity feature, comprises:
and searching, traversing and screening nodes in the identity group of the same user, and determining a plurality of user identity characteristics which have an incidence relation with the target identity characteristics in the identity group.
6. The method according to any one of claims 1-3, further comprising:
storing the corresponding relation between the nodes and the connecting edges in the graph network in the form of a graph database;
the graph database includes: point storage, connection edge storage and attribute storage;
the point storing includes: the node main key, the attribute information owned by the node and the connecting edge connected with the node;
the connecting edge storage comprises: the connection side main key, the starting point and the ending point connected by the connection side and the attribute information carried by the connection side;
the attribute storage includes: attribute primary keys, the meaning that the attribute represents, and the specific content that the attribute represents.
7. The method of claim 6, wherein the node primary key, the connecting edge primary key, and the attribute primary key are stored in an indexed manner.
8. An apparatus for multi-user identity fusion, comprising:
an acquisition module, configured to acquire user identity data, where the user identity data has at least two identity characteristics;
a processing module, configured to construct a graph network according to at least two identity features of the user identity data, where the graph network includes: the node representing the identity characteristics and the connection edge representing the incidence relation of the identity characteristics;
a determining module, configured to determine an identity group of the same user according to a connection relationship between nodes in the graph network and a connection relationship between a node and a connection edge, where the identity group includes: a plurality of identity characteristics;
the determining module is specifically configured to determine connection times between adjacent nodes in the graph network according to connection relationships between nodes in the graph network and connection relationships between nodes and connection edges, and determine a first connection relationship and a second connection relationship based on the connection times between the adjacent nodes in the graph network and a preset time threshold, where the first connection relationship is a connection relationship in which the connection times between the adjacent nodes is greater than the time threshold, and the second connection relationship is a connection relationship in which the connection times between the adjacent nodes is less than or equal to the time threshold; according to the first connection relation and the second connection relation, sequentially traversing nodes of the graph network outwards by taking a target node as a starting point, and determining an identity group of a user corresponding to the target node; determining an association relation among the nodes according to the connection relation among the nodes in the graph network, the connection relation among the nodes and the connection edges and the attribute information of each node; based on the incidence relation among the nodes, aggregating the nodes in the graph network to form a strongly-related group; and determining the identity group of the same user according to the identity group of the user corresponding to the target node and the strongly related group.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
CN201910831646.7A 2019-09-04 2019-09-04 Multi-user identity fusion method, device, equipment and storage medium Active CN110543586B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910831646.7A CN110543586B (en) 2019-09-04 2019-09-04 Multi-user identity fusion method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910831646.7A CN110543586B (en) 2019-09-04 2019-09-04 Multi-user identity fusion method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110543586A CN110543586A (en) 2019-12-06
CN110543586B CN110543586B (en) 2022-11-15

Family

ID=68712484

Country Status (1)

Country Link
CN (1) CN110543586B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143627B (en) * 2019-12-27 2023-08-15 北京百度网讯科技有限公司 User identity data determination method, device, equipment and medium
CN111259090B (en) * 2020-02-03 2023-10-24 北京百度网讯科技有限公司 Graph generation method and device of relational data, electronic equipment and storage medium
CN113283921A (en) * 2020-02-19 2021-08-20 华为技术有限公司 Business data processing method and device and cloud server
CN111459999B (en) * 2020-03-27 2023-08-18 北京百度网讯科技有限公司 Identity information processing method, device, electronic equipment and storage medium
CN111506737B (en) * 2020-04-08 2023-12-19 北京百度网讯科技有限公司 Graph data processing method, searching method, device and electronic equipment
CN113556368A (en) * 2020-04-23 2021-10-26 北京达佳互联信息技术有限公司 User identification method, device, server and storage medium
CN111640477A (en) * 2020-05-29 2020-09-08 京东方科技集团股份有限公司 Identity information unifying method and device and electronic equipment
CN112115367B (en) * 2020-09-28 2024-04-02 北京百度网讯科技有限公司 Information recommendation method, device, equipment and medium based on fusion relation network
CN112115381B (en) * 2020-09-28 2024-08-02 北京百度网讯科技有限公司 Construction method, device, electronic equipment and medium of fusion relation network
CN112883170B (en) * 2021-01-20 2023-08-18 中国人民大学 User feedback guided self-adaptive dialogue recommendation method and system
CN113672653B (en) * 2021-08-09 2024-10-29 杭州蚂蚁酷爱科技有限公司 Method and device for identifying private data in database
CN114064705A (en) * 2021-10-19 2022-02-18 广州数说故事信息科技有限公司 User information fusion method, terminal, storage medium and system under multilayer association
CN114547279B (en) * 2022-02-21 2023-04-28 电子科技大学 Judicial recommendation method based on mixed filtering
CN114676288B (en) * 2022-03-17 2024-06-28 北京悠易网际科技发展有限公司 ID pull-through method and device
CN117349358B (en) * 2023-12-04 2024-02-20 中国电子投资控股有限公司 Data matching and merging method and system based on distributed graph processing framework

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105099729A (en) * 2014-04-22 2015-11-25 阿里巴巴集团控股有限公司 User ID (Identification) recognition method and device
CN107682344A (en) * 2017-10-18 2018-02-09 南京邮数通信息科技有限公司 A kind of ID collection of illustrative plates method for building up based on DPI data interconnection net identifications
CN108664480A (en) * 2017-03-27 2018-10-16 北京国双科技有限公司 A kind of multi-data source user information integration method and device
CN109347787A (en) * 2018-08-15 2019-02-15 阿里巴巴集团控股有限公司 A kind of recognition methods of identity information and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011106897A1 (en) * 2010-03-05 2011-09-09 Chrapko Evan V Systems and methods for conducting more reliable assessments with connectivity statistics

Also Published As

Publication number Publication date
CN110543586A (en) 2019-12-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant