WO2018077039A1 - 社区发现方法、装置、服务器及计算机存储介质 - Google Patents
社区发现方法、装置、服务器及计算机存储介质 Download PDFInfo
- Publication number
- WO2018077039A1 WO2018077039A1 PCT/CN2017/105956 CN2017105956W WO2018077039A1 WO 2018077039 A1 WO2018077039 A1 WO 2018077039A1 CN 2017105956 W CN2017105956 W CN 2017105956W WO 2018077039 A1 WO2018077039 A1 WO 2018077039A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- community
- node
- nodes
- label
- communities
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 64
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 55
- 238000001514 detection method Methods 0.000 claims description 19
- 238000004590 computer program Methods 0.000 claims description 9
- 230000008859 change Effects 0.000 claims description 4
- 239000011159 matrix material Substances 0.000 description 57
- 230000004927 fusion Effects 0.000 description 15
- 235000019580 granularity Nutrition 0.000 description 15
- 239000013598 vector Substances 0.000 description 13
- 238000010586 diagram Methods 0.000 description 10
- 230000006870 function Effects 0.000 description 9
- 238000012545 processing Methods 0.000 description 8
- 238000005192 partition Methods 0.000 description 7
- 230000008569 process Effects 0.000 description 7
- 230000000694 effects Effects 0.000 description 5
- 238000000638 solvent extraction Methods 0.000 description 5
- 238000012546 transfer Methods 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 4
- 238000004891 communication Methods 0.000 description 3
- 230000006872 improvement Effects 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 235000008694 Humulus lupulus Nutrition 0.000 description 2
- 238000012512 characterization method Methods 0.000 description 2
- 238000001914 filtration Methods 0.000 description 2
- 230000000644 propagated effect Effects 0.000 description 2
- 230000009471 action Effects 0.000 description 1
- 230000003542 behavioural effect Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000009194 climbing Effects 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 238000013138 pruning Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/22—Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
- G06F7/24—Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers sorting methods in general
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/22—Arrangements for sorting or merging computer data on continuous record carriers, e.g. tape, drum, disc
- G06F7/32—Merging, i.e. combining data contained in ordered sequence on at least two record carriers to produce a single carrier or set of carriers having all the original data in the ordered sequence merging methods in general
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Definitions
- the embodiments of the present invention relate to the field of computer technologies and Internet technologies, and in particular, to a community discovery method, apparatus, server, and computer storage medium.
- Community discovery refers to the division of social networks into different communities, so that the user relationships within the same community are closely connected, and the relationship between the community and the community is sparse.
- An existing community discovery method includes: the server divides a social network into communities according to an SCD (Scalable Community Detection) algorithm, and obtains multiple communities.
- the social network refers to at least one of a relational network such as Facebook, Weibo, a campus network, and an instant messaging application.
- the SCD algorithm is based on triangles in the network, the communities that are divided are closely connected and have certain accuracy. However, because the community is too fine, it is not suitable for many scenarios. For example, taking the social network as the example of the campus network, the divided community is usually as small as the user belongs to the department, and for the campus network, the user usually expects to be divided into student stages such as junior high school students, high school students, and university students. Therefore, it is obvious that the above division does not satisfy the user's needs.
- Embodiments of the present invention are directed to providing a community discovery method, apparatus, server, and computer storage medium.
- a community discovery method applied to a server, the server including one or more processors and a memory, and one or more programs, wherein the one or more More than one program is stored in the memory, the program can include one or more modules each corresponding to a set of instructions, the one or more processors being configured to execute the instructions; the method comprising:
- n is an integer greater than or equal to 2;
- the community nodes with the same label are divided into the second community, and m second communities are obtained, where m is a positive integer less than n.
- a community discovery apparatus for a server, the server comprising one or more processors and a memory, and one or more programs, wherein the one or more More than one program is stored in the memory, the program may include one or more modules each corresponding to a set of instructions, the one or more processors being configured to execute the instructions; the apparatus comprising:
- a dividing module configured to divide a network node in the social network into a community, and obtain labels of n first communities and each of the n first communities; n is an integer greater than or equal to 2;
- An update module configured to update a label of a community node according to a label propagation algorithm, where the community node is a network node in the n small communities, and an initial label of the community node is a label of a first community to which the community node belongs;
- the merging module is configured to divide the community nodes with the same label into the second community, and obtain m second communities, where m is a positive integer smaller than n.
- a computer storage medium storing computer executable instructions for executing a community discovery method according to an embodiment of the present invention is provided. .
- a server comprising: a processor and a memory for storing a computer program executable on the processor, wherein the processor is configured to run the computer program At the time, the community discovery method described in the embodiment of the present invention is executed.
- the label of the community node in the first community is updated according to the label propagation algorithm, and then the community node with the same label is divided into the second community, and the m is obtained.
- the second community the first community to be initially divided will be merged to obtain a smaller number of second communities, which will solve the problem that the communities in the prior art are smaller in size and thus unable to meet the needs of certain scenarios.
- the problem is achieved; while ensuring the accuracy of the community obtained by the division, the first community can be merged into the second community and the granularity of the obtained community can be matched to the effect of dividing the demand.
- FIG. 1 is an abstract schematic diagram of a social network according to an embodiment of the present invention.
- FIG. 2 is a schematic diagram of an algorithm architecture involved in a community discovery method according to various embodiments of the present invention
- FIG. 3 is a flowchart of a community discovery method according to an embodiment of the present invention.
- FIG. 4 is a flowchart of a method for updating a label of a community node according to an embodiment of the present invention
- FIG. 5 is a schematic diagram of partitioning a community by a community discovery algorithm in a community discovery method according to another embodiment of the present invention.
- FIG. 6 is a schematic structural diagram of a community discovery apparatus according to another embodiment of the present invention.
- FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
- FIG. 8 is a schematic diagram of a social network model according to an embodiment of the present invention.
- FIG. 1 is an abstract schematic diagram of a social network according to an embodiment of the present invention; as shown in FIG. 1 , a social network is equivalent to a “map”, and a network user is equivalent to a node in a “map”, and a network user is used.
- the relationship between the nodes is represented by a "line" between the nodes, which is used to divide the social network into at least two communities whose accuracy is higher than a preset threshold, by updating the tags of the community nodes in the community.
- the first community obtained by the division is merged into the second community, so that the obtained second community can still meet the community division requirements under the premise that the accuracy is guaranteed.
- the social network mentioned above refers to at least one of a relational network such as Facebook, Weibo, campus network, instant messaging application, and the like, and each node in the social network is a corresponding network user.
- the server may divide the n first communities in the social network by using a community discovery algorithm, update the community nodes in the first community through a label propagation algorithm, and obtain m second communities. Therefore, please refer to FIG. 2, which shows a schematic diagram of an algorithm framework of a community discovery method according to various embodiments of the present invention.
- the community discovery method is applicable to a server; the server includes one or more processors and a memory, and one or more programs, wherein the one or more programs are stored in a memory, the program may include one or more modules each corresponding to a set of instructions, the one or more
- the processor is configured to execute instructions; as shown in FIG. 3, the community discovery method can include:
- Step 201 Divide a community into a network node in the social network, and obtain labels of n first communities and each of the n first communities, where n is an integer greater than or equal to 2.
- the community discovery algorithm selected in this embodiment is accurate for community division.
- the community discovery algorithm finds that the community is more accurate, and the community of the community is correspondingly smaller. Therefore, the community discovery algorithm used in this embodiment generally divides the community into smaller granularity than the preset granularity.
- the granularity includes: classmates>university students>student classmates>student class students>student class members, and the preset granularity can be one of them.
- the student union that is, the community division granularity can be greater than the minimum granularity and less than the maximum. Any of the granularities in the granularity.
- the preset community discovery algorithm may be any one of an SCD algorithm and a faction filtering algorithm.
- Each of the first communities that are divided includes at least one community node, which is not limited in this embodiment.
- the label of each community node in the first community is the label of the first community.
- the label of the first community can be college students, colleagues, family members, and the like.
- the first community obtained by the division includes A, B, C, and D
- the labels of each first community are A', B', C', and D'
- the first community A includes community nodes A1, A2.
- A3 and A4 the first community B includes nodes B1, B2, and B3, the first community C includes community nodes C1, C2, C3, C4, C5, and C6, and the first community D includes community nodes D1 and D2.
- the initial labels of community nodes A1, A2, A3, and A4 are A'
- the initial labels of community nodes B1, B2, and B3 are B'
- the initial labels of community nodes C1, C2, C3, C4, C5, and C6 are C'
- the initial label of community nodes D1 and D2 is D'.
- each small community may also include tens, hundreds, or even thousands of community nodes, which is not limited in this embodiment.
- Step 202 Update, according to a label propagation algorithm, a label of a community node included in each of the n first communities, where the community node is a network node in the n first communities, where the community node
- the initial tag is a tag of the first community to which the community node belongs.
- the updating, by the label propagation algorithm, the label of the community node includes: traversing each community node, acquiring a first quantity, where the first quantity is the number of community nodes in the first community to which the community node belongs; The number of adjacent community nodes adjacent to the community node included in the small community to which the neighboring community node of the community node belongs, and selecting the maximum value as the second quantity; if the second quantity is greater than the number a quantity, the label of the community node is updated to a label of the first community corresponding to the second quantity, and the first community corresponding to the second quantity represents an adjacent community adjacent to the community node
- the number of nodes is the second number of first communities.
- the server may obtain the first number based on the number of community nodes in the first community to which the community node belongs.
- the server can obtain the number of community nodes included in the community where A1 is located, that is, the first community A.
- the server can obtain the number of community nodes in the community where C2 is located, that is, the first community C.
- the server may first count the number of neighboring community nodes adjacent to the community node included in the candidate community, and then select from the community node. The maximum value is taken as the second quantity.
- the candidate community refers to the first community to which the adjacent community node of the community node belongs.
- the community node A1 is adjacent to the community node C1, and the server can obtain the community to which the C1 belongs, that is, the number of adjacent nodes of the A1 included in the first community C is 5 (C1 respectively) , C2, C3, C4 and C5).
- traversing each community node to obtain a first quantity and acquiring the number of neighboring community nodes adjacent to the community node included in the first community to which the neighboring community node of the community node belongs, and selecting The maximum value is used as the second quantity, and the execution process may not be limited.
- the first quantity may be obtained first, then the second quantity may be obtained, or the second quantity may be obtained first, and then the first quantity may be obtained, or the first quantity and the first quantity may be obtained simultaneously.
- the second quantity is not limited in this embodiment.
- the server may detect whether the second quantity is greater than the first quantity, and if the detection result is that the second quantity is greater than the first quantity, the neighboring of the community node is indicated. A large part of the community nodes are in the candidate community corresponding to the second number. At this time, the community node may be divided into candidate communities, so the server may update the label of the community node in order to merge the communities. The label of the candidate community corresponding to the second quantity.
- the community node A1 is still taken as an example.
- the first quantity obtained by the server is 4, the second quantity is 5, and the second quantity is greater than the first quantity.
- the server may update the label of A1 from A' to the first.
- the server does not perform any processing on the label of the community node. For example, if the second quantity obtained is 2, the server does not update the label of the community node A1, that is, its label is still A'.
- each community node After traversing each community node, detecting whether the total number of community nodes in which the label changes in each community node reaches a quantity threshold; if the total number of community nodes whose detection result is a label change reaches a quantity threshold Then, each community node is traversed again; if the total number of community nodes whose detection result is that the label changes does not reach the quantity threshold, the traversal is ended.
- the server may count each time the tag of the community node is updated, and after traversing each community node, acquire the number of community nodes whose tags have changed, and detect whether the quantity is The number threshold is reached.
- the quantity threshold is a value preset by the user, and the quantity threshold is less than a preset value.
- the number threshold may be 0, which is not limited.
- the server can traverse each community node again, and determine Is there a community node that needs to update the label? If the detection result is that the number threshold is not reached, it indicates that the labels of the various community nodes have stabilized, and the server can end the update of the labels of the community nodes.
- Step 203 Divide the same community node with the label to the second community, and obtain m second communities, where m is a positive integer less than n.
- the total number of community nodes whose label response and detection result are changed does not reach the threshold value, and the community nodes with the same label are divided into the second community, and m second communities are obtained.
- the community node may be divided into the community corresponding to the updated label; of course, the server may also traverse the label of the community node after traversing each community node.
- the same community node is divided into a second community, which is not limited in this embodiment.
- the server taking the social network as the campus network and the preset community discovery algorithm as the SCD algorithm, the server is divided according to the SCD algorithm to obtain the first community based on the department of the community, for a certain department.
- the user because his friend, that is, most of his neighbors are in the community of a class, the server can merge the users in the first community of the department into the second community of the class.
- the class can be The users in the first community are merged into the second community of the school, so that the server can get a large community with accurate and large granularity.
- the community nodes may also be sorted first, for example, the community nodes are sorted in a random order, or the community nodes are followed.
- the importance of the community nodes is sorted.
- the community nodes are sorted according to the importance degree of each community node; correspondingly, the traversing each community node acquires the first quantity, including: according to the order of the sorted community nodes, For each community node, get the first quantity.
- the sorting of the community nodes according to the importance degree of the community node may specifically include the following three possible implementation manners:
- the N community nodes are sorted in descending order according to the clustering coefficient of each community node.
- the clustering coefficient of the community node is CC.
- j represents the number of edges connected to all neighboring community nodes of the i-th community node
- k represents the number of all neighboring community nodes of the i-th community node.
- the servers are sorted in descending order of the degree of the community nodes.
- the degree of the node indicates the number of neighboring community nodes adjacent to the community node. For example, taking the community node A1 as an example, and the neighboring community nodes including C1, C2, C3, C4, and C5, the community node A The degree is 5.
- the clustering coefficient of the community node is 1 and the degree is less than the preset degree, the community nodes are ranked last, and the community nodes are sorted according to the order of degree from large to small, and this embodiment does not do this. limited.
- each community node is sorted according to the Pagerank algorithm.
- the community nodes are sorted in descending order of the degree of community nodes.
- the server may also sort the community nodes by other methods, which is not limited in this embodiment.
- this embodiment is only used to update the label of the community node by using the label propagation method.
- the server may also update the label of the community node by using other label propagation methods, which is not described in this embodiment.
- the first community referred to in this embodiment refers to a community that is divided by a preset community discovery algorithm
- the second community refers to that after the label of the node in the small community is updated according to the label propagation algorithm, the label is the same.
- the re-divided community does not mean that the number of community nodes included in the second community must be greater than the number of community nodes included in the first community, which means that the obtained m second communities may exist.
- the number of the community nodes included in the second largest community is smaller than the number of the community nodes included in the first community in the first community, which is not limited in this embodiment.
- the community discovery method after dividing the n first communities, updates the tags of the community nodes in the first community according to the label propagation algorithm, and then divides the community nodes with the same tags into the first In the second community, m second communities are obtained; the first community that is initially divided is merged, and then a smaller number of second communities are obtained, which solves the problem that the community divided in the prior art has a small granularity and cannot be satisfied.
- the problem of the requirements of certain scenarios achieving the accuracy of the community obtained by the division, and merging the first community into the second community, so that the granularity of the obtained community meets the effect of dividing the demand.
- the label of the community node when the label is propagated, is updated to the label of the candidate community corresponding to the second quantity only when the second quantity is greater than the first quantity, and the label is re-updated according to the updated label.
- the accuracy of community segmentation after dividing the community At the same time, by updating the label of the community node to be larger than the label of the candidate community corresponding to the second quantity of the first quantity, the problem that the existing label is propagated has a certain randomness, and the fluctuation of the divided community is reduced. .
- the community nodes arranged in descending order of importance of the community nodes can be traversed, further ensuring the accuracy of the merged communities.
- Step 301 For the i-th community node of the N community nodes, obtain a first quantity, where the first quantity is the total number of community nodes in the first community to which the i-th community node belongs, 0 ⁇ i ⁇ N-1, i is a positive integer and the starting value of i is 0.
- Step 302 Obtain the number of neighboring community nodes adjacent to the i-th community node included in the first community to which the neighboring community node of the i-th community node belongs, and select the maximum number of the second largest number.
- the server may be at i ⁇ N-1
- the server can detect whether the total number of community nodes whose tags have changed in the community node reaches a quantity threshold.
- the community discovery algorithm in the foregoing embodiment may be at least one of an SCD algorithm and a faction filtering algorithm.
- the community discovery algorithm is an SCD algorithm as an example.
- Step 201 of the embodiment of the present invention may include:
- WCC Weighted Community Clustering
- formula (1) is used to calculate the WCC value of node x with respect to community C.
- t(x, C) represents the number of triangles associated with node x in community C
- vt(x, C) represents the number of nodes belonging to community C among all nodes that can form a triangle with node x.
- +vt(x,V ⁇ C) indicates the remaining nodes in community C that do not include x.
- the SCD algorithm will use the WCC value of equation (3) as the objective function, and optimize the WCC value through the greedy algorithm to obtain a local optimal partition.
- SCD is mainly divided into two steps:
- the SCD algorithm Before pre-clustering, you need to pruning the community.
- the SCD algorithm first removes edges in the network that do not belong to any triangle. Calculate the clustering coefficient of each node, and the clustering coefficient of node v satisfies the following formula (4):
- j represents the number of edges connected to all neighboring nodes of node v
- k represents The number of all neighboring nodes of node v.
- the clustering coefficients are sorted in reverse order. If the clustering coefficients are the same, the nodes are sorted according to the degree of the nodes. Among them, there is a special case. When the clustering coefficient of a node is 1 and the degree is 2, the node is sorted at the end of the sequence, and finally the node sequence S is obtained.
- Pre-clustering begins. Each node in the node sequence S is traversed in order. For any node v that has not been visited, a new community C is created for the node v and its neighbor nodes that have not been visited, and the node in the community C is marked as accessed. And add community C to partition P. Until all nodes are marked as accessed, the iteration ends and a pre-clustered partition P is obtained.
- the hill climbing method is used to optimize the WCC value. First traverse each node and calculate the best_movement of each node. The best_movement is executed for each node to obtain a new partition P'. Calculating the new WCC value of P'. If the new WCC value is increased by more than a threshold value ⁇ ( ⁇ , for example, 0.1) relative to the old partition P, the best_movement of the new partition P' is recalculated until it is divided twice before and after. The WCC value rise is less than the end of the threshold ⁇ .
- ⁇ for example, 0.1
- No_Action means no action
- Remove means to remove a node from the current community, making it an isolated node
- Transfer means moving a node from the current community to another community.
- the server calculates best_movement, it first calculates the gain of each operation for the WCC value, including:
- WCC I (v, C 1 ) indicates the improvement of the overall WCC value after the isolated node v is inserted into the community C 1 .
- C' 1 C 1 ⁇ v ⁇ , with:
- WCC R (v, C 1 ) represents the improvement of the overall WCC value after the node v is removed from the community C 1 .
- WCC T (v, C 1 , C 2 ) indicates the improvement of the overall WCC value after the node v is transferred from the community C1 to the community C2.
- the server For each node v in V, the server first calculates the elevated WCC R (v, C 1 ) after removing it from the current community, and then obtains its potential candidate community from its neighbor-owned community, calculating each candidate community. The corresponding transfer promotion value WCC T (v, C 1 , C 2 ) is selected and marked as the highest boost. Then select the optimal operation from the WCC R (v, C 1 ) and the maximum WCC T (v, C 1 , C 2 ), namely Remove or Transfer. If both of the WCC values are negative, then keep v in the original community (No_Action).
- the server can be divided into multiple small communities, after which the server can assign a corresponding label to each community.
- FIG. 6 is a schematic structural diagram of a community discovery apparatus according to an embodiment of the present invention.
- the community discovery apparatus is applicable to a server, where the server includes one or more processors and a memory, and one or more
- the program wherein the one or more programs are stored in a memory, the program may include one or more modules each corresponding to a set of instructions, the one or more processors being configured to execute
- the community discovery apparatus may include: a partitioning module 510, an update module 520, and a merge module 530. among them:
- the dividing module 510 is configured to divide the network nodes in the social network into communities, and obtain labels of n first communities and each of the n first communities; n is an integer greater than or equal to 2;
- the update module 520 is configured to update, according to a label propagation algorithm, a label of a community node included in each of the n first communities, where the community node is a network node in the n first communities, The initial label of the community node is a label of the first community to which the community node belongs;
- the merging module 530 is configured to divide the community nodes with the same label into the second community, and obtain m second communities, where m is a positive integer smaller than n.
- the community discovery apparatus after dividing the n first communities, updates the labels of the community nodes in the first community according to the label propagation algorithm, and then divides the community nodes with the same label into the first In the second community, m second communities are obtained; the first community that is initially divided is merged, and then a smaller number of second communities are obtained, which solves the problem that the community divided in the prior art has a small granularity and cannot be satisfied.
- the problem of the requirements of certain scenarios achieving the accuracy of the community obtained by the division, and merging the first community into the second community, so that the granularity of the obtained community meets the effect of dividing the demand.
- the update module 520 includes:
- a first acquiring unit configured to traverse each community node, to obtain a first quantity, where the first quantity is a number of community nodes in the first community to which the community node belongs;
- a second acquiring unit configured to acquire the number of neighboring community nodes adjacent to the community node included in the first community to which the neighboring community node of the community node belongs, and select the maximum value as the second quantity
- an update unit configured to update the label of the community node to the second quantity, where the second quantity acquired by the second acquiring unit is greater than the first quantity acquired by the first acquiring unit a label of the corresponding first community, and a first community table corresponding to the second quantity
- the number of adjacent community nodes adjacent to the community node is the second number of first communities.
- the device further includes:
- a detecting module configured to detect, after traversing each community node, whether a total number of community nodes whose labels have changed in the community node reaches a quantity threshold
- a first result module configured to traverse each community node again when the total number of community nodes whose detection result is that the label changes is the threshold value
- a second result module configured to end the traversal when the total number of community nodes whose detection result of the detection module is that the label changes does not reach the quantity threshold.
- the device further includes:
- a sorting module configured to sort community nodes according to the importance of each community node
- the first obtaining unit is further configured to acquire a first quantity for each community node according to the order of the sorted community nodes.
- the sorting module is further configured to:
- the community discovery device provided by the foregoing embodiment is only illustrated by the division of each functional module. In actual applications, the function distribution may be completed by different functional modules according to requirements, that is, the internal structure of the server. Divided into different functional modules to Complete all or part of the functions described above.
- the community discovery device and the community discovery method embodiment provided by the foregoing embodiments are in the same concept, and the specific implementation process is described in detail in the method embodiment, and details are not described herein again.
- the community discovery device may be implemented by a server in an actual application; the partitioning module 510, the update module 520, the merge module 530, the detection module, the first result module, and the second in the community discovery device.
- the result module and the sorting module can be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) in practical applications.
- the embodiment of the present invention further provides a computer storage medium, which may be a computer readable storage medium included in the memory in the above embodiment, or may be separately readable by a computer that is not assembled into the terminal.
- Storage medium stores one or more computer executable instructions that are used by one or more processors to perform the community discovery method of embodiments of the present invention.
- the computer executable instructions are configured to: divide a network node in a social network into communities according to a preset community discovery algorithm, and obtain n first communities and each of the first communities in the n first communities.
- n is an integer greater than or equal to 2; updating a label of a community node included in each of the n first communities according to a label propagation algorithm, the community node being a network in the n first communities a node, the initial label of the community node is a label of the first community to which the community node belongs; the community node with the same label is divided into the second community, and m second communities are obtained, where m is a positive integer smaller than n.
- the computer executable instructions are configured to: traverse each community node, obtain a first quantity, where the first quantity is a number of community nodes in a first community to which the community node belongs; The number of neighboring community nodes adjacent to the community node included in the first community to which the neighboring community node of the community node belongs, and selecting the maximum value as the second a quantity; if the second quantity is greater than the first quantity, updating a label of the community node to a label of a first community corresponding to the second quantity, and a first community corresponding to the second quantity Representing the number of adjacent community nodes adjacent to the community node as the second number of first communities.
- the computer executable instructions are configured to: after traversing each community node, detecting whether a total number of community nodes in the community node whose label has changed reaches a quantity threshold; if the detection result is that the label changes If the total number of community nodes reaches the number threshold, then each community node is traversed again; if the total number of community nodes whose detection result is a label change does not reach the quantity threshold, the traversal is ended.
- the computer executable instructions are configured to: sort the community nodes according to the importance degree of each community node; and obtain the first quantity for each community node according to the order of the sorted community nodes. .
- the computer executable instructions are configured to: sort the respective community nodes according to a clustering coefficient of each community node in a descending order; or, the respective communities according to a Pagerank algorithm The nodes are sorted; or, the individual community nodes are sorted in descending order of degree of each community node.
- FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
- This server is used to implement the community discovery method provided in the above embodiments. Specifically:
- the server 600 includes a processor 601, a system memory 604 including a random access memory (RAM) 602 and a read only memory (ROM) 603, and a system bus 605 that connects the system memory 604 and the processor 601.
- the server 600 also includes a basic input/output system (I/O system) 606 that facilitates transfer of information between various devices within the computer, and mass storage for storing the operating system 613, applications 614, and other program modules 615.
- I/O system basic input/output system
- the processor 601 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may pass through a set of hardware in the processor 601. The instructions in the form of logic circuits or software are completed.
- the processor 601 described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like.
- the processor 601 can implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present invention.
- a general purpose processor can be a microprocessor or any conventional processor or the like.
- the steps of the method disclosed in the embodiment of the present invention may be directly implemented as a hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
- the software module can reside in a storage medium located in memory 604, which reads the information in memory 604 and, in conjunction with its hardware, performs the steps of the foregoing method.
- the basic input/output system 606 includes a display 608 for displaying information and an input device 609 such as a mouse or keyboard for user input of information.
- the display 608 and the input device 609 are both connected to the processor 601 via an input and output controller 610 connected to the system bus 605.
- the basic input/output system 606 can also include an input output controller 610 for receiving and processing input from a plurality of other devices, such as a keyboard, mouse, or electronic stylus.
- input and output controller 610 also provides output to a display screen, printer, or other type of output device.
- the mass storage device 607 is connected to the processor 601 by a mass storage controller (not shown) connected to the system bus 605.
- the mass storage device 607 and its associated computer readable medium provide non-volatile storage for the server 600. That is, the mass storage device 607 can include a computer readable medium (not shown) such as a hard disk or a CD-ROM drive.
- the computer readable medium can include computer storage media and communication media.
- Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
- Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state storage technologies, CD-ROM, DVD or other optical storage, Cartridges, tapes, disk storage, or other magnetic storage devices.
- RAM random access memory
- ROM read only memory
- EPROM Erasable programmable read-only memory
- EEPROM electrically erasable programmable read-only memory
- the server 600 may also be operated by a remote computer connected to the network through a network such as the Internet. That is, the server 600 can be connected to the network 612 through a network interface unit 611 connected to the system bus 605, or can also be connected to other types of networks or remote computer systems (not shown) using the network interface unit 611. .
- the memory also includes one or more programs, the one or more programs being stored in a memory and configured to be executed by one or more processors.
- the one or more programs described above include instructions for performing the community discovery method described above.
- n is an integer greater than or equal to 2; a label of a community node included in each of the n first communities is updated according to a label propagation algorithm, the community node is a network node in the first community, the initial label of the community node is a label of the first community to which the community node belongs; the community node with the same label is divided into the second community, and m second communities are obtained, where m is A positive integer less than n.
- the processor 601 is configured to: when traversing the computer program, perform: traversing each community node, acquiring a first quantity, where the first quantity is a community in a first community to which the community node belongs The number of nodes; the number of neighboring community nodes adjacent to the community node included in the first community to which the neighboring community node of the community node belongs is obtained, and the maximum value is selected as the second quantity; And updating the label of the community node to a label of the first community corresponding to the second quantity, where the second quantity corresponding to the first community represents the community node The number of adjacent neighboring community nodes is the second number of first communities.
- the processor 601 when the processor 601 is configured to run the computer program, it is executed to: after traversing each community node, detecting whether the total number of community nodes in the community node whose label has changed reaches a quantity threshold; As a result, if the total number of community nodes whose labels have changed reaches the number threshold, each community node is traversed again; if the total number of community nodes whose detection result is a label change does not reach the number threshold, the traversal is ended.
- processor 601 when the processor 601 is configured to run the computer program, perform: sorting community nodes according to the importance degree of each community node; and, according to the order of the sorted community nodes, for each community Node, get the first quantity.
- processor 601 when the processor 601 is configured to run the computer program, perform: sorting the community nodes according to a clustering coefficient of each community node in descending order; or, according to Pagerank The algorithm sorts the individual community nodes; or sorts the community nodes according to the degree of each community node in descending order.
- the network node in the social network may be referred to as a node
- the obtained first community and the second community may be referred to as a first social network model and a second social network model, respectively, and the second community
- the generation method can also be implemented according to the following processing methods:
- Step 401 Acquire a first social network model that has a first association relationship between the nodees
- the two nodes having the connected edges in the first social network model have a first association relationship
- the two users having the first association relationship have a friend relationship
- Step 402 Express the first social network model as a first adjacency matrix according to a preset representation manner
- traversing the nodes in the first social network model setting an element value corresponding to any two nodes having a direct association relationship to 1, and setting an element value corresponding to any two nodes not having a direct association relationship to 0, generating a first adjacency matrix;
- the first social network model has N nodes, and the first adjacency matrix is an N ⁇ N matrix, and the first adjacency matrix is a symmetric matrix;
- the first adjacency matrix indicates that there is a direct association relationship between nodes in the first social network model, and the direct association relationship refers to having a connected edge between two nodes; for example, if the first social If there is a joint between node 1 and node 2 in the network model, then node 1 and node 2 are considered to have a direct relationship; then, the value of element A 12 in the first adjacency matrix is 1. If there is no connected edge between node 1 and node 3 in the first social network model, it is considered that there is no direct association between node 1 and node 3; then the value of element A 13 in the first adjacency matrix is 0.
- Step 403 Record any two nodes in the first social network model that have a direct association relationship with the first node as having a direct association relationship to generate a second adjacency matrix; the first node is the first social node. Any node in the network model;
- the first node is any one of the first social network models
- the recording node 1 and the node 3 have a direct association relationship in the second adjacency matrix.
- Step 404 Acquire a structural similarity of any two nodes having a direct association relationship in the second adjacency matrix in the first adjacency matrix;
- the node u and the node v having the direct association relationship in the second adjacency matrix are obtained, and the first social network model represented by the first adjacency matrix is respectively determined to determine that the node u corresponds to the node a first set of neighbor nodes of u, and a set of second neighbor nodes corresponding to the node v, including the node v;
- N[u] represents the set of all adjacent nodes in which the node u contains itself in the first matrix
- N[v] represents the set of all adjacent nodes in which the node v contains itself in the first matrix.
- represents the number of intersections of N[u] and N[v]
- d[u] represents the number of nodes in the set N[u]
- d[v] represents the set N[v] The number of nodes in ].
- Step 405 Acquire a similarity of feature attributes of any two nodes having a direct association relationship in the second adjacency matrix
- acquiring a similarity of multiple feature attributes of any two nodes having a direct association relationship in the second adjacency matrix; and performing similarity on the similarity of the multiple feature attributes according to a linear weighted average algorithm The similarity of the feature attributes of any two nodes; when calculating the similarity of feature attributes, the weighting parameters can be flexibly set according to actual conditions;
- Obtaining a similarity of the feature attributes of any two nodes having a direct association relationship in the second adjacency matrix respectively acquiring the first feature corresponding to the first feature of any two nodes having a direct association relationship in the second adjacency matrix a first row vector and a second row vector of the attribute; wherein the first feature attribute is any one of the plurality of feature attributes; the values in the first row vector and the second row vector Determining, respectively, a plurality of specified time periods within a preset time range, a state of the first feature attribute of the any two nodes; determining the any two nodes based on the first row vector and the second row vector Corresponding to the first similarity of the first feature attribute;
- each feature attribute of two nodes having a direct associative relationship in the second adjacency matrix may be calculated by using the following formula:
- each feature attribute of the node in the second adjacency matrix has a corresponding row vector value, such as the attention to the public number or not, the row vector value corresponding to the corresponding feature attribute is 1 and 0, and the different geographic locations are in the single
- the monthly release status or the number of times of check-in is recorded as the value of the row value corresponding to the corresponding feature attribute, or the actual number of times is classified according to a preset rule;
- the feature attributes include: geographic location information, personal interest, behavioral preference, etc., when the plurality of feature attributes are processed according to the weighted average algorithm, the similarity of each feature attribute may be weighted according to actual needs, and the user A and the user A are obtained.
- the similarity of the feature attributes of User B is the same.
- Step 406 Determine a fusion similarity based on the structural similarity of the two nodes and the similarity of the feature attributes, filter the determined fusion similarity according to a preset requirement, and generate a third based on the fusion similarity that meets the preset requirement.
- Adjacency matrix
- the fusion similarity of any two nodes is calculated according to the nonlinear exponential in the following formula
- TP S represents fusion similarity
- P S represents similarity of feature attributes
- T S represents structural similarity
- ⁇ and ⁇ are weighting parameters; and values of ⁇ and ⁇ can be adjusted according to actual needs.
- the correlation between the two nodes may be considered small; therefore, all the fusion similarities less than the preset threshold are replaced by zero, and the third adjacency is generated.
- the third adjacency matrix includes both the similarity of the user's friend relationship chain and the similarity of the user's feature attributes; so that the corresponding community discovery, link prediction, and corresponding based on the reconstructed social network model are performed.
- the results obtained by graph characterization are more realistic.
- Step 407 Generate a second social network model that represents a second association relationship between the nodes based on the third adjacency matrix
- two nodes in the third adjacency matrix having a second association relationship are in the The two social network models have a connected edge, and the two nodes in the third adjacency matrix that do not have the second associated relationship do not have a connected edge in the second social network model.
- Step 501 Acquire a first adjacency matrix for representing a network model.
- traversing the nodes in the first social network model setting an element value corresponding to any two nodes having a direct association relationship to 1, and setting an element value corresponding to any two nodes not having a direct association relationship to 0, generating a first adjacency matrix A;
- node 1 and node 2 have connected edges, and node 1 and node 2 are considered to have a direct association relationship, and the value of element A 12 in the first adjacency matrix is 1; node 1 and node 6 There is no contiguous edge between them, and it is considered that node 1 and node 6 do not have a direct association relationship, and the value of the element A 16 in the first adjacency matrix A is 1; for the same reason, each element in the first adjacency matrix A can be calculated.
- Step 502 Record, in the first social network model, two nodes with a maximum hop count of 2 as having a direct association relationship to generate a second adjacency matrix;
- the node 3 has a side edge with the node 1 and the node 5, respectively, and the node 1 and the node 5 do not have the edge, and the number of hops between the node 1 and the node 5 is considered to be 2, and the node is recorded.
- 1 and node 5 have a direct association relationship, the value of the element A 15 in the second adjacency matrix is 1; similarly, the values of the elements A 14 and A 16 in the second adjacency matrix are 1; node 2
- the number of hops between the node 3 and the node 1 is 1, and the values of the elements A 12 and A 13 in the second adjacency matrix are also 1; and so on, the second adjacency matrix A1 is obtained.
- Step 503 Acquire a structural similarity of any two nodes having a direct association relationship in the second adjacency matrix in the first adjacency matrix;
- the neighbors of the node 1 are the node 2 and the node 3
- the neighbors of the node 5 are the node 3
- the node 1 and The intersection of the neighbors of node 5 is node 3
- the structural similarity of node 1 and node 5 is:
- Step 504 Acquire a similarity of feature attributes of any two nodes having a direct association relationship in the second adjacency matrix.
- the similarity of multiple feature attributes of any two nodes having a direct association relationship in the second adjacency matrix is first acquired; and the similarity of the multiple feature attributes is processed according to a linear weighted average algorithm.
- the similarity of the feature attributes of any two nodes, and the weighting parameters when calculating the feature attribute similarity can be flexibly set according to actual conditions.
- Each feature attribute of the node in the second adjacency matrix has a corresponding row vector value, such as a public number or a record, and the row vector value corresponding to the corresponding feature attribute is 1 and 0, differently
- the number of times the status is posted in a single month or the number of times of check-in is recorded as the value of the row value corresponding to the corresponding feature attribute, or the actual number of times is classified according to a preset rule;
- the similarity of the feature attributes of the pairs of nodes having the direct associative relationship in the second adjacency matrix can be calculated.
- Step 505 Determine a fusion similarity based on structural similarity of any two nodes and similarity of feature attributes
- fusion similarity of any two nodes is calculated according to the following formula
- TP S represents the fusion similarity
- P S represents the similarity of the characteristic attributes of any two nodes
- T S represents any two structural similarities
- ⁇ and ⁇ are weighting parameters; the values of ⁇ and ⁇ can be performed according to actual needs. Adjustment.
- Step 506 Replace the element values corresponding to any two nodes having a direct association relationship in the second adjacency matrix with the calculated fusion similarities of the two nodes.
- Step 507 replacing the fusion similarity less than 0.3 with zero to generate a third adjacency matrix
- the preset threshold can be flexibly set according to actual needs, and is usually set between 0.2 and 0.4;
- the threshold is set to 0.3
- the third adjacency matrix A2 is obtained as:
- the adjacency matrix representing the reconstructed network model includes both the user's
- the similarity of the buddy relationship chain also includes the similarity of the user's feature attributes.
- the disclosed method and apparatus may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- the division of the modules is only a logical function division.
- there may be another division manner for example, multiple modules or components may be combined, or Can be integrated into another system, or some features can be ignored or not executed.
- the communication connections between the various components shown or discussed may be indirect coupling or communication connections through some interfaces, devices or modules, and may be electrical, mechanical or otherwise.
- the modules described above as separate components may or may not be physically separated.
- the components displayed as modules may or may not be physical modules, that is, may be located in one place or distributed to multiple network modules; Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may be separately used as one module, or two or more modules may be integrated into one module;
- the module can be implemented in the form of hardware or in the form of hardware plus software function modules.
- the foregoing program may be stored in a computer readable storage medium, and when executed, the program includes The foregoing steps of the method embodiment; and the foregoing storage medium includes: a removable storage device, a read-only memory (ROM), a magnetic disk or an optical disk, and the like, which can store program codes.
- ROM read-only memory
- the above-described integrated module of the embodiment of the present invention may be stored in a computer readable storage medium if it is implemented in the form of a software function module and sold or used as a stand-alone product.
- the technical solution of the embodiments of the present invention may be embodied in the form of a software product in essence or in the form of a software product. It is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the methods described in various embodiments of the present invention.
- the foregoing storage medium includes various media that can store program codes, such as a mobile storage device, a ROM, a magnetic disk, or an optical disk.
- the technical solution of the embodiment of the present invention after dividing the n first communities, updating the tags of the community nodes in the first community according to the label propagation algorithm, and then dividing the community nodes with the same tags into the second community, and obtaining m first
- the second community the first community to be initially divided will be merged to obtain a smaller number of second communities, which solves the problem that the communities in the prior art are smaller in size and thus unable to meet the needs of certain scenarios.
- the first community can be merged into the second community and the granularity of the obtained community can be matched with the effect of dividing the demand.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Marketing (AREA)
- General Health & Medical Sciences (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Primary Health Care (AREA)
- Human Resources & Organizations (AREA)
- Health & Medical Sciences (AREA)
- Tourism & Hospitality (AREA)
- Economics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
本发明实施例公开了一种社区发现方法、装置、服务器及计算机存储介质。所述方法包括:将社会网络中的网络节点划分社区,得到n个第一社区以及所述n个第一社区中每个第一社区的标签;n为大于等于2的整数;根据标签传播算法更新所述n个第一社区中每个第一社区包括的社区节点的标签,社区节点的初始标签为社区节点所属第一社区的标签;将标签相同的社区节点划分至第二社区,获得m个第二社区,m为小于n的正整数。
Description
相关申请的交叉引用
本申请基于申请号为201610954505.0、申请日为2016年10月27日的中国专利申请、以及基于申请号为201610933379.0、申请日为2016年10月31日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此以引入方式并入本申请。
本发明实施例涉及计算机技术与互联网技术领域,特别涉及一种社区发现方法、装置、服务器及计算机存储介质。
社区发现,是指将社会网络划分成不同的社区,使得同一社区内部的用户关系连接紧密,而社区与社区之间的关系连接稀疏。
现有的一种社区发现方法包括:服务器根据SCD(Scalable Community Detection,可扩展的社区检测)算法对社会网络划分社区,并得到多个社区。其中,社会网络是指诸如脸书、微博、校园网、即时通信应用程序等关系网络中的至少一种。
由于SCD算法基于网络中的三角形,划分出来的社区内部连接紧密,具备一定的准确度,但是由于其划分社区过于精细,并不适用于很多场景的需求。比如,以社会网络为校园网为例,划分后的社区通常小至用户在社团中所属的部门,而对于校园网来说,用户通常期望以学生阶段如初中同学、高中同学、大学同学来划分,因此显而易见的,上述划分方式并不满足用户需求。
发明内容
本发明实施例期望提供一种社区发现方法、装置、服务器及计算机存储介质。
根据本发明实施例的第一方面,提供了一种社区发现方法,应用于服务器,所述服务器包括有一个或多个处理器以及存储器,以及一个或一个以上的程序,其中,所述一个或一个以上的程序存储于存储器中,所述程序可以包括一个或一个以上的每一个对应于一组指令的模块,所述一个或多个处理器被配置为执行指令;该方法包括:
将社会网络中的网络节点划分社区,得到n个第一社区以及所述n个第一社区中每个第一社区的标签;n为大于等于2的整数;
根据标签传播算法更新所述n个第一社区中每个第一社区包括的社区节点的标签,所述社区节点为所述n个第一社区中的网络节点,所述社区节点的初始标签为所述社区节点所属第一社区的标签;
将标签相同的社区节点划分至第二社区,获得到m个第二社区,m为小于n的正整数。
根据本发明实施例的第二方面,提供了一种社区发现装置,应用于服务器,所述服务器包括有一个或多个处理器以及存储器,以及一个或一个以上的程序,其中,所述一个或一个以上的程序存储于存储器中,所述程序可以包括一个或一个以上的每一个对应于一组指令的模块,所述一个或多个处理器被配置为执行指令;该装置包括:
划分模块,配置为将社会网络中的网络节点划分社区,得到n个第一社区以及所述n个第一社区中每个第一社区的标签;n为大于等于2的整数;
更新模块,配置为根据标签传播算法更新社区节点的标签,所述社区节点为所述n个小社区中的网络节点,所述社区节点的初始标签为所述社区节点所属第一社区的标签;
合并模块,配置为将标签相同的社区节点划分至第二社区,获得m个第二社区,m为小于n的正整数。
根据本发明实施例的第三方面,提供了一种计算机存储介质,所述计算机存储介质中存储有计算机可执行指令,所述计算机可执行指令用于执行本发明实施例所述的社区发现方法。
根据本发明实施例的第四方面,提供了一种服务器,该服务器包括:处理器和用于存储能够在处理器上运行的计算机程序的存储器,其中,所述处理器用于运行所述计算机程序时,执行本发明实施例所述的社区发现方法。
本发明实施例提供的技术方案,通过在划分得到n个第一社区之后,根据标签传播算法更新第一社区中的社区节点的标签,进而将标签相同的社区节点划分至第二社区,得到m个第二社区;也即将初步划分得到的第一社区进行合并,进而得到数量更少的第二社区,解决了现有技术中划分得到的社区的粒度较小,进而无法满足某些场景的需求的问题;达到了在保证划分得到的社区的准确度的同时,又可以将第一社区合并为第二社区进而使得得到的社区的粒度符合划分需求的效果。
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本发明实施例的社会网络的一种抽象示意图;
图2是本发明各个实施例提供的社区发现方法所涉及的算法架构图;
图3是本发明一个实施例提供的社区发现方法的流程图;
图4是本发明一个实施例提供的更新社区节点的标签的方法流程图;
图5是本发明另一实施例提供的社区发现方法中通过社区发现算法划分社区的示意图;
图6是本发明另一实施例提供的社区发现装置的结构示意图;
图7是本发明一个实施例提供的服务器的结构示意图;
图8为本发明实施例的一种社交网络模型示意图。
为使本发明的目的、技术方案和优点更加清楚,下面将结合附图对本发明实施方式作进一步地详细描述。
本发明实施例的社区发现方法可以用于服务器中,该服务器可以为一台服务器,也可以为由多台服务器组成的服务器集群,对此并不做限定。在本发明实施例中,图1为本发明实施例的社会网络的一种抽象示意图;如图1所示,社会网络相当于“图”,网络用户相当于“图”中的节点,网络用户之间的关联关系通过节点之间的“线”表示,该服务器用于将社会网络划分为准确度高于预设阈值的至少两个社区,通过更新社区中的社区节点的标签的方式来将划分得到的第一社区合并为第二社区,进而使得得到的第二社区在准确度有所保证的前提下划分粒度仍然可以满足社区划分需求。其中,上述所说的社会网络是指诸如脸书、微博、校园网、即时通信应用程序等关系网络中的至少一种,并且社会网络中的各个节点即为对应的各个网络用户。
实际实现时,服务器可以通过社区发现算法划分得到社会网络中的n个第一社区,通过标签传播算法更新第一社区中的社区节点,进而得到m个第二社区。因此,请参考图2,其示出了本发明各个实施例所涉及的社区发现方法的算法框架示意图。
参考图3,其示出了本发明一个实施例提供的社区发现方法的流程图,社区发现方法可应用于服务器;所述服务器包括有一个或多个处理器以及
存储器,以及一个或一个以上的程序,其中,所述一个或一个以上的程序存储于存储器中,所述程序可以包括一个或一个以上的每一个对应于一组指令的模块,所述一个或多个处理器被配置为执行指令;如图3所示,该社区发现方法可以包括:
步骤201,对社会网络中的网络节点划分社区,得到n个第一社区以及所述n个第一社区中每个第一社区的标签,n为大于等于2的整数。
这里,不同的社区发现算法在发现社区时均会存在一定的误差,其并非完全准确,也即每种社区发现算法均具备一定的准确度,而本实施例选取的社区发现算法为社区划分准确度高于预设阈值的社区发现算法。由于通常情况下,社区发现算法发现社区的准确度越高,其划分的社区相应的也越小,因此,本实施例所采用的社区发现算法对社区的划分粒度通常小于预设粒度。以粒度包括:同学>大学同学>学生会同学>学生会部门同学>学生会部门组别同学为例,预设粒度可以为其中某一种,如学生会,即社区划分粒度例如可以是大于最小粒度、小于最大粒度中的任一粒度。具体的,该预设社区发现算法可以为诸如SCD算法、派系过滤算法中的任一种。
划分得到的每个第一社区中包括至少一个社区节点,本实施例对此并不做限定。并且,第一社区中的每个社区节点的标签即为该第一社区的标签。其中,第一社区的标签可以为大学同学、同事、家人等等。
比如,划分得到的第一社区包括A、B、C、D,每个第一社区的标签为A’、B’、C’和D’,其中第一社区A中包括社区节点A1、A2、A3和A4,第一社区B中包括节点B1、B2和B3,第一社区C中包括社区节点C1、C2、C3、C4、C5和C6,第一社区D中包括社区节点D1和D2,则社区节点A1、A2、A3和A4的初始标签即为A’,社区节点B1、B2和B3的初始标签即为B’,社区节点C1、C2、C3、C4、C5和C6的初始标签即为C’,社区节点D1和D2的初始标签即为D’。上述只是以每个小社区中包括几个
社区节点来举例,实际实现时,每个小社区中还可以包括几十、几百甚至上千个社区节点,本实施例对此并不做限定。
步骤202,根据标签传播算法更新所述n个第一社区中每个第一社区包括的社区节点的标签,所述社区节点为所述n个第一社区中的网络节点,所述社区节点的初始标签为所述社区节点所属第一社区的标签。
作为一种实施方式,所述根据标签传播算法更新社区节点的标签,包括:遍历每个社区节点,获取第一数量,第一数量为该社区节点所属第一社区中的社区节点的数量;获取所述社区节点的相邻社区节点所属小社区中包括的与所述社区节点相邻的相邻社区节点的数量,选择其中的最大值作为第二数量;若所述第二数量大于所述第一数量,则将所述社区节点的标签更新为所述第二数量所对应的第一社区的标签,所述第二数量所对应的第一社区表示与所述社区节点相邻的相邻社区节点的数量为所述第二数量的第一社区。
这里,对于每个社区节点,服务器可基于社区节点所属第一社区中的社区节点数量获取第一数量。
比如,以社区节点A1为例,服务器可以获取A1所在的社区也即第一社区A中包括的社区节点的数量4。又比如,对于社区节点C2,服务器可以获取C2所在的社区也即第一社区C中的社区节点的数量5。
这里,在社会网络中,不同社区节点之间可能存在相邻关系,对于每个社区节点,服务器可以先统计候选社区中包括的与该社区节点相邻的相邻社区节点的数量,然后从中选择最大值作为第二数量。其中,候选社区是指该社区节点的相邻社区节点所属的第一社区。
比如,仍然以社区节点A1为例,社区节点A1与社区节点C1相邻,则服务器可以获取C1所属的社区也即第一社区C中包括的A1的相邻节点的数量如5(分别为C1、C2、C3、C4和C5)。
本实施例中,遍历每个社区节点获得第一数量,以及获取所述社区节点的相邻社区节点所属第一社区中包括的与所述社区节点相邻的相邻社区节点的数量,选择其中的最大值作为第二数量,可不限定执行过程,可以先获得第一数量,再获得第二数量,也可以先获得第二数量,再获得第一数量,也可以是同时获得第一数量和第二数量,本实施例中对此并不做限定。
本实施例中,在服务器获取到第一数量和第二数量之后,服务器可以检测第二数量是否大于第一数量,若检测结果为第二数量大于第一数量,则说明该社区节点的相邻社区节点中很大部分均在第二数量所对应的候选社区中,此时,该社区节点可以被划分至候选社区的可能性比较大,因此为了合并社区,服务器可以将该社区节点的标签更新为第二数量所对应的候选社区的标签。
比如,仍然以社区节点A1为例,服务器获取到的第一数量为4,第二数量为5,第二数量大于第一数量,则此时,服务器可以将A1的标签由A’更新为第一社区C的标签C’。
需要说明的是,若第二数量不大于第一数量,则服务器不对该社区节点的标签做任何处理。比如,获取到的第二数量为2,则服务器不更新社区节点A1的标签,也即其标签仍然为A’。
作为一种实施方式,在遍历每个社区节点之后,检测每个社区节点中标签发生变化的社区节点的总数量是否达到数量阈值;若检测结果为标签发生变化的社区节点的总数量达到数量阈值,则再次遍历每个社区节点;若检测结果为标签发生变化的社区节点的总数量未达到所述数量阈值,则结束遍历。
这里,服务器可在每次更新社区节点的标签时进行计数,在遍历完每个社区节点之后,获取标签发生变化的社区节点的数量,检测该数量是否
达到数量阈值。其中,数量阈值为用户预先设定的数值,该数量阈值小于预设数值。其中,该数量阈值可以为0,对此并不做限定。
如果服务器的检测结果为标签发生变化的社区节点的总数量达到数量阈值,则表明各个社区节点的标签还未趋于稳定,仍然有继续传播的空间,此时服务器可以再次遍历各个社区节点,判定是否有需要更新标签的社区节点。若检测结果为未达到数量阈值,则表明各个社区节点的标签已经趋于稳定,此时服务器可以结束对社区节点的标签的更新。
步骤203:将标签相同的社区节点划分至第二社区,获得m个第二社区,m为小于n的正整数。
这里,具体是标签响应与检测结果为发生变化的社区节点的总数量未达到所述数量阈值,将标签相同的社区节点划分至第二社区,得到m个第二社区。
实际实现时,服务器每更新一个社区节点的标签,可以将该社区节点划分至更新后的标签所对应的社区;当然,服务器还可以在遍历每个社区节点之后,统一按照社区节点的标签将标签相同的社区节点划分至一个第二社区,本实施例对此并不做限定。
在本实施例的一个应用场景中,以社会网络为校园网、预设社区发现算法为SCD算法为例,服务器根据SCD算法划分得到基于所在社团的部门的第一社区之后,对于某个部门的用户,由于其好友也即其大多数的相邻节点在某个班级的社区中,因此,服务器可以将部门的第一社区中的用户合并至班级的第二社区中,类似的,可以将班级的第一社区中的用户合并至学校的第二社区中,这样,服务器即可得到划分准确且粒度较大的大社区。
作为一种实施方式,在遍历每个社区节点之前,还可以先对社区节点进行排序,比如,将社区节点按照随机顺序排序,或者,将社区节点按照
社区节点的重要程度进行排序。其中,当按照重要程度排序时,按照每个社区节点的重要程度对社区节点进行排序;相应的,所述遍历每个社区节点,获取第一数量,包括:按照排序后的社区节点的顺序,对每个社区节点,获取第一数量。
其中,按照社区节点的重要程度对社区节点进行排序可以具体包括如下三种可能的实现方式:
第一种,按照每个社区节点的聚类系数(Clustering Coefficient)由大到小的顺序对N个社区节点进行排序。
在排序过程中,若多个社区节点的聚类系数相同,则对于这多个社区节点,服务器按照社区节点的度从大到小的顺序排序。其中,节点的度表示与社区节点相邻的相邻社区节点的数量,比如,以社区节点A1为例,其相邻的社区节点包括C1、C2、C3、C4和C5,则社区节点A的度为5。另外,当社区节点的聚类系数为1且度小于预设度数时,将这些社区节点排在最后,且各个社区节点间按照度由大到小的顺序排序,本实施例对此并不做限定。
第二种,根据Pagerank算法对各个社区节点进行排序。
第三种,按照社区节点的度由大到小的顺序对社区节点进行排序。
实际实现时,服务器还可以通过其他方法对社区节点进行排序,本实施例对此并不做限定。
另外,本实施例也只是以通过上述标签传播方法更新社区节点的标签来举例,实际实现时,服务器还可以通过其他标签传播方法更新社区节点的标签,本实施例对此并不做赘述。
需要补充说明的是,本实施例所说的第一社区是指通过预设社区发现算法划分得到的社区,第二社区是指根据标签传播算法更新小社区中的节点的标签之后,按照标签相同重新划分得到的社区,其并非是说第二社区中包括的社区节点的个数一定大于第一社区中包括的社区节点的个数,这也就是说得到的m个第二社区中可能存在某第二大社区中包括的社区节点的个数小于n个第一社区中的某个第一社区中包括的社区节点的个数,本实施例对此并不做限定。
综上所述,本实施例提供的社区发现方法,通过在划分得到n个第一社区之后,根据标签传播算法更新第一社区中的社区节点的标签,进而将标签相同的社区节点划分至第二社区,得到m个第二社区;也即将初步划分得到的第一社区进行合并,进而得到数量更少的第二社区,解决了现有技术中划分得到的社区的粒度较小,进而无法满足某些场景的需求的问题;达到了在保证划分得到的社区的准确度的同时,又可以将第一社区合并为第二社区进而使得得到的社区的粒度符合划分需求的效果。
并且,本实施例在进行标签传播时,只有在第二数量大于第一数量时,才会将社区节点的标签更新为第二数量所对应的候选社区的标签,保证了根据更新后的标签重新划分社区之后社区划分的准确度。同时,通过将社区节点的标签更新为大于第一数量的第二数量所对应的候选社区的标签,避免了现有的标签传播时存在一定的随机性的问题,减少了划分后的社区的波动。
此外,更新标签的过程中,在遍历社区节点时,可以遍历按照社区节点的重要程度由高到低的顺序排列后的社区节点,进一步保证了合并后的社区的准确度。
在上述实施例中,假设社区节点的总量为N个,则请参考图4,遍历的过程可以具体实现为如下步骤:
步骤301,对于N个社区节点中的第i个社区节点,获取第一数量,第一数量为第i个社区节点所属第一社区中的社区节点的总数量,0≤i≤N-1,i为正整数,i的起始值为0。
步骤302,获取第i个社区节点的相邻社区节点所属第一社区中包括的与该第i个社区节点相邻的相邻社区节点的数量,选择其中的最大值最为第二数量。
步骤303,若第二数量大于第一数量,则将第i个社区节点的标签更新为第二数量所对应的第一社区的标签,在i<N-1时令i=i+1,再次执行对于N个社区节点中的第i个社区节点,获取第一数量的步骤。
在更新第i个社区节点的标签之后,为了遍历N个社区节点中的下一个社区节点,服务器可以在i<N-1时令i=i+1,再次执行步骤301,在此不再赘述。
如果更新第i个社区节点的标签后,i=N-1,则说明N个社区节点已全部遍历完毕,此时服务器可以检测社区节点中标签发生变化的社区节点的总数量是否达到数量阈值。
步骤304,若第二数量不大于第一数量,则在i<N-1时令i=i+1,再次执行对于N个社区节点中的第i个社区节点、获取第一数量的步骤。
而如果第二数量不大于第一数量,则由于此时不需要更新第i个社区节点的标签,此时,为了遍历N个社区节点中的下一个社区节点,服务器可以在i<N-1时令i=i+1,并再次执行步骤301,在此步骤赘述。
与步骤303类似,若i=N-1,则服务器可以检测社区节点中标签发生变化的社区节点的总数量是否达到数量阈值。
上述实施例中的社区发现算法可以为SCD算法以及派系过滤算法中的至少一种,下述以社区发现算法为SCD算法为例,则本发明实施例步骤201可以包括:
第一,社区初始化。
WCC(Weighted Community Clustering,加权社区聚类系数)是基于三角形的社区划分评价指标。其可由如下公式(1)推导获得:
其中,公式(1)用于计算节点x关于社区C的WCC值。其中,t(x,C)表示在社区C中与节点x相关的三角形数量,vt(x,C)表示所有能与节点x构成三角形的节点中属于社区C的节点数量。|C\{x}|+vt(x,V\C)表示社区C中不包括x在内的其余节点。
单个社区的WCC值可通过公式(2)表示:
一个社区划分的总体WCC值可通过公式(3)表示:
其中,P={C1,C2,...Cn},V代表网络中的节点。WCC值越大,代表划分效果越好。
第二,SCD算法将以公式(3)的WCC值作为目标函数,通过贪心算法优化WCC值,得到一个局部最优划分。SCD主要分为两个步骤:
1、预聚类。
预聚类之前,需要对社区进行剪枝。SCD算法首先去除网络中不属于任意三角形的边。计算每个节点的聚类系数,节点v的聚类系数满足如下公式(4):
其中,j表示节点v的所有相邻节点之间互相连接的边的数量,k表示
节点v的所有相邻节点的数量。
根据每个节点的聚类系数倒序排序,若聚类系数相同,则按节点的度从大到小排序。其中,存在一种特殊情况,当一个节点的聚类系数为1且度为2时,将该节点排序在序列的最后,最终得到节点序列S。
预聚类开始。按序遍历节点序列S中每个节点,对于任一还没被访问过的节点v,为节点v及其未被访问过的邻居节点创建一个新社区C,标记社区C中节点为已访问,并把社区C添加到划分P中。直到所有节点都标记为已访问,迭代结束,得到一个预聚类的划分P。
2、细化聚类。
在预聚类得到划分P后,使用爬山法优化WCC值。首先遍历每个节点,计算每个节点的best_movement。再对每个节点执行best_movement,得到新的划分P’。计算P’的新WCC值,若新的WCC值相对旧的划分P的WCC值的提升大于一个阈值α(α例如可取0.1),则重新计算新划分P’的best_movement,直到前后两次划分的WCC值提升小于阈值α结束。请参考图5,其示出了具体的方法流程图。
其中,best_movement表示对每个节点的最佳操作方式。实际应用中,分别有No_Action、Remove和Transfer三种候选操作方式。No_Action表示不执行任何操作;Remove表示将一个节点从当前社区移除,使其成为一个孤立的节点;Transfer表示将一个节点从当前社区移动至另外一个社区。
具体的,服务器计算best_movement之前,首先要计算每种操作对WCC值的增益,具体包括:
A、记WCCI(v,C1)表示将孤立节点v插入到社区C1后对整体WCC值的提升情况。假设P={C1,C2,...Ck,{v}}和P'={C'1,C2,...Ck}都是图G(V,E)的划分,且C'1=C1∪{v},有:
B、记WCCR(v,C1)表示将节点v从社区C1中移除后整体WCC值的提升情况。假设P={C1,C2,...Ck}和P'={C'1,C2,...Ck,{v}}都是图G(V,E)的划分,且C1=C'1∪{v},则:
WCC(P')-WCC(P)=WCCR(v,C1)=-WCCI(v,C'1) (6)
C、记WCCT(v,C1,C2)表示将节点v从社区C1转移到社区C2后整体WCC值的提升情况。假设P={C1,C2,...Ck}和P'={C'1,C2,...C'k}都是图G(V,E)的划分,且C1=C'1∪{v}和C'k=Ck∪{v},则:
WCC(P')-WCC(P)=WCCT(v,C1,Ck)=-WCCI(v,C'1)+WCCI(v,Ck) (7)
对于V中的每个节点v,服务器首先计算将它从当前社区移除后的提升WCCR(v,C1),然后从它的邻居所属社区获得它的潜在候选社区,计算每个候选社区相应的转移提升值WCCT(v,C1,C2),选取提升最大的并作标记。然后从WCCR(v,C1)和最大WCCT(v,C1,C2)中选取最优操作,即Remove或Transfer。若两者对WCC值的提升均为负,则保持v在原社区不变(No_Action)。
经过上述步骤之后,服务器即可划分得到多个小社区,之后,服务器可以为每个社区分配一个对应的标签。
请参考图6,其示出了本发明一个实施例提供的社区发现装置的结构示意图,社区发现装置可应用于服务器,所述服务器包括有一个或多个处理器以及存储器,以及一个或一个以上的程序,其中,所述一个或一个以上的程序存储于存储器中,所述程序可以包括一个或一个以上的每一个对应于一组指令的模块,所述一个或多个处理器被配置为执行指令;如图6所示,该社区发现装置可以包括:划分模块510、更新模块520和合并模块530。其中:
划分模块510,配置为将社会网络中的网络节点划分社区,得到n个第一社区以及所述n个第一社区中每个第一社区的标签;n为大于等于2的整数;
更新模块520,配置为根据标签传播算法更新所述n个第一社区中每个第一社区包括的社区节点的标签,所述社区节点为所述n个第一社区中的网络节点,所述社区节点的初始标签为所述社区节点所属第一社区的标签;
合并模块530,配置为将标签相同的社区节点划分至第二社区,获得m个第二社区,m为小于n的正整数。
综上所述,本实施例提供的社区发现装置,通过在划分得到n个第一社区之后,根据标签传播算法更新第一社区中的社区节点的标签,进而将标签相同的社区节点划分至第二社区,得到m个第二社区;也即将初步划分得到的第一社区进行合并,进而得到数量更少的第二社区,解决了现有技术中划分得到的社区的粒度较小,进而无法满足某些场景的需求的问题;达到了在保证划分得到的社区的准确度的同时,又可以将第一社区合并为第二社区进而使得得到的社区的粒度符合划分需求的效果。
基于上述实施例提供的社区发现装置,作为一种实施方式,所述更新模块520,包括:
第一获取单元,配置为遍历每个社区节点,获取第一数量,所述第一数量为所述社区节点所属第一社区中的社区节点的数量;
第二获取单元,配置为获取所述社区节点的相邻社区节点所属第一社区中包括的与所述社区节点相邻的相邻社区节点的数量,选择其中的最大值作为第二数量;
更新单元,配置为在所述第二获取单元获取到的所述第二数量大于所述第一获取单元获取到的所述第一数量,将所述社区节点的标签更新为所述第二数量所对应的第一社区的标签,所述第二数量所对应的第一社区表
示与所述社区节点相邻的相邻社区节点的数量为所述第二数量的第一社区。
作为一种实施方式,所述装置还包括:
检测模块,配置为在遍历每个社区节点之后,检测社区节点中标签发生变化的社区节点的总数量是否达到数量阈值;
第一结果模块,配置为在所述检测模块的检测结果为标签发生变化的社区节点的总数量达到所述数量阈值时,再次遍历每个社区节点;
第二结果模块,配置为在所述检测模块的检测结果为标签发生变化的社区节点的总数量未达到所述数量阈值时,结束遍历。
作为一种实施方式,所述装置还包括:
排序模块,配置为按照每个社区节点的重要程度对社区节点进行排序;
所述第一获取单元,还配置为按照排序后的社区节点的顺序,对每个社区节点,获取第一数量。
作为一种实施方式,所述排序模块,还配置为:
按照每个社区节点的聚类系数由大到小的顺序对所述各个社区节点进行排序;或者,根据Pagerank算法对所述各个社区节点进行排序;或者,按照各个社区节点的度由大到小的顺序对所述各个社区节点进行排序。
本领域技术人员应当理解,本发明实施例的社区发现装置中各处理单元的功能,可参照前述社区发现方法的相关描述而理解,本发明实施例的社区发现装置中各处理单元,可通过实现本发明实施例所述的功能的模拟电路而实现,也可以通过执行本发明实施例所述的功能的软件在智能终端上的运行而实现。
需要说明的是:上述实施例提供的社区发现装置,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将服务器的内部结构划分成不同的功能模块,以
完成以上描述的全部或者部分功能。另外,上述实施例提供的社区发现装置和社区发现方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
在本发明实施例中,所述社区发现装置在实际应用中可通过服务器实现;所述社区发现装置中的划分模块510、更新模块520、合并模块530、检测模块、第一结果模块、第二结果模块和排序模块,在实际应用中可由中央处理器(CPU,Central Processing Unit)、数字信号处理器(DSP,Digital Signal Processor)或可编程门阵列(FPGA,Field-Programmable Gate Array)实现
本发明实施例还提供了一种计算机存储介质,该计算机存储介质可以是上述实施例中的存储器中所包含的计算机可读存储介质;也可以是单独存在,未装配入终端中的计算机可读存储介质。该计算机可读存储介质存储有一个或者一个以上计算机可执行指令,该一个或者一个以上计算机可执行指令被一个或者一个以上的处理器用于执行本发明实施例的社区发现方法。具体的,所述计算机可执行指令用于执行:根据预设社区发现算法将社会网络中的网络节点划分社区,得到n个第一社区以及所述n个第一社区中每个第一社区的标签;n为大于等于2的整数;根据标签传播算法更新所述n个第一社区中每个第一社区包括的社区节点的标签,所述社区节点为所述n个第一社区中的网络节点,所述社区节点的初始标签为所述社区节点所属第一社区的标签;将标签相同的社区节点划分至第二社区,获得m个第二社区,m为小于n的正整数。
作为一种实施方式,所述计算机可执行指令用于执行:遍历每个社区节点,获取第一数量,所述第一数量为所述社区节点所属第一社区中的社区节点的数量;获取所述社区节点的相邻社区节点所属第一社区中包括的与所述社区节点相邻的相邻社区节点的数量,选择其中的最大值作为第二
数量;若所述第二数量大于所述第一数量,则将所述社区节点的标签更新为所述第二数量所对应的第一社区的标签,所述第二数量所对应的第一社区表示与所述社区节点相邻的相邻社区节点的数量为所述第二数量的第一社区。
作为一种实施方式,所述计算机可执行指令用于执行:在遍历每个社区节点之后,检测社区节点中标签发生变化的社区节点的总数量是否达到数量阈值;若检测结果为标签发生变化的社区节点的总数量达到所述数量阈值,则再次遍历每个社区节点;若检测结果为标签发生变化的社区节点的总数量未达到所述数量阈值,则结束遍历。
作为一种实施方式,所述计算机可执行指令用于执行:按照每个社区节点的重要程度对社区节点进行排序;则按照排序后的社区节点的顺序,对每个社区节点,获取第一数量。
作为一种实施方式,所述计算机可执行指令用于执行:按照每个社区节点的聚类系数由大到小的顺序对所述各个社区节点进行排序;或者,根据Pagerank算法对所述各个社区节点进行排序;或者,按照各个社区节点的度由大到小的顺序对所述各个社区节点进行排序。
请参考图7,其示出了本发明一个实施例提供的服务器的结构示意图。该服务器用于实施上述实施例中提供的社区发现方法。具体来讲:
所述服务器600包括处理器601、包括随机存取存储器(RAM)602和只读存储器(ROM)603的系统存储器604,以及连接系统存储器604和处理器601的系统总线605。所述服务器600还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统(I/O系统)606,和用于存储操作系统613、应用程序614和其他程序模块615的大容量存储设备607。
可以理解,处理器601可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器601中的硬件的集
成逻辑电路或者软件形式的指令完成。上述的处理器601可以是通用处理器、DSP,或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。处理器601可以实现或者执行本发明实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本发明实施例所公开的方法的步骤,可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于存储介质中,该存储介质位于存储器604,处理器601读取存储器604中的信息,结合其硬件完成前述方法的步骤。
所述基本输入/输出系统606包括有用于显示信息的显示器608和用于用户输入信息的诸如鼠标、键盘之类的输入设备609。其中所述显示器608和输入设备609都通过连接到系统总线605的输入输出控制器610连接到处理器601。所述基本输入/输出系统606还可以包括输入输出控制器610以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器610还提供输出到显示屏、打印机或其他类型的输出设备。
所述大容量存储设备607通过连接到系统总线605的大容量存储控制器(未示出)连接到处理器601。所述大容量存储设备607及其相关联的计算机可读介质为服务器600提供非易失性存储。也就是说,所述大容量存储设备607可以包括诸如硬盘或者CD-ROM驱动器之类的计算机可读介质(未示出)。
不失一般性,所述计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、EPROM、EEPROM、闪存或其他固态存储其技术,CD-ROM、DVD或其他光学存储、
磁带盒、磁带、磁盘存储或其他磁性存储设备。当然,本领域技术人员可知所述计算机存储介质不局限于上述几种。上述的系统存储器604和大容量存储设备607可以统称为存储器。
根据本发明的各种实施例,所述服务器600还可以通过诸如因特网等网络连接到网络上的远程计算机运行。也即服务器600可以通过连接在所述系统总线605上的网络接口单元611连接到网络612,或者说,也可以使用网络接口单元611来连接到其他类型的网络或远程计算机系统(未示出)。
所述存储器还包括一个或者一个以上的程序,所述一个或者一个以上程序存储于存储器中,且经配置以由一个或者一个以上处理器执行。上述一个或者一个以上程序包含用于执行上述社区发现方法的指令。
本实施例中,所述处理器601用于运行所述计算机程序时,执行:根据预设社区发现算法将社会网络中的网络节点划分社区,得到n个第一社区以及所述n个第一社区中每个第一社区的标签;n为大于等于2的整数;根据标签传播算法更新所述n个第一社区中每个第一社区包括的社区节点的标签,所述社区节点为所述n个第一社区中的网络节点,所述社区节点的初始标签为所述社区节点所属第一社区的标签;将标签相同的社区节点划分至第二社区,获得m个第二社区,m为小于n的正整数。
作为一种实施方式,所述处理器601用于运行所述计算机程序时,执行:遍历每个社区节点,获取第一数量,所述第一数量为所述社区节点所属第一社区中的社区节点的数量;获取所述社区节点的相邻社区节点所属第一社区中包括的与所述社区节点相邻的相邻社区节点的数量,选择其中的最大值作为第二数量;若所述第二数量大于所述第一数量,则将所述社区节点的标签更新为所述第二数量所对应的第一社区的标签,所述第二数量所对应的第一社区表示与所述社区节点相邻的相邻社区节点的数量为所述第二数量的第一社区。
作为一种实施方式,所述处理器601用于运行所述计算机程序时,执行:在遍历每个社区节点之后,检测社区节点中标签发生变化的社区节点的总数量是否达到数量阈值;若检测结果为标签发生变化的社区节点的总数量达到所述数量阈值,则再次遍历每个社区节点;若检测结果为标签发生变化的社区节点的总数量未达到所述数量阈值,则结束遍历。
作为一种实施方式,所述处理器601用于运行所述计算机程序时,执行:按照每个社区节点的重要程度对社区节点进行排序;则按照排序后的社区节点的顺序,对每个社区节点,获取第一数量。
作为一种实施方式,所述处理器601用于运行所述计算机程序时,执行:按照每个社区节点的聚类系数由大到小的顺序对所述各个社区节点进行排序;或者,根据Pagerank算法对所述各个社区节点进行排序;或者,按照各个社区节点的度由大到小的顺序对所述各个社区节点进行排序。
本实施例上述各实施例中,社会网络中的网络节点可简称为节点、得到的第一社区、第二社区可分别称为第一社交网络模型和第二社交网络模型,则第二社区的生成方式还可依据以下处理方式实现:
步骤401,获取表征节点之间具有第一关联关系的第一社交网络模型;
这里,所述第一社交网络模型中具有连边的两个节点具有第一关联关系,具有第一关联关系的两个用户之间为好友关系。
步骤402,将所述第一社交网络模型按预设表示方式表示为第一邻接矩阵;
具体地,遍历所述第一社交网络模型中的节点,将具有直接关联关系的任意两个节点对应的元素值置为1,将不具有直接关联关系的任意两个节点对应的元素值置为0,生成第一邻接矩阵;
这里,所述第一社交网络模型中有N个节点,则所述第一邻接矩阵为N×N的矩阵,且所述第一邻接矩阵为对称矩阵;
所述第一邻接矩阵表示所述第一社交网络模型中的节点之间具有直接关联关系,所述直接关联关系是指两个节点之间具有连边;举例来说,如果所述第一社交网络模型中的节点1和节点2之间具有连边,则认为节点1和节点2之间具有直接关联关系;那么,第一邻接矩阵中的元素A12的值为1。如果所述第一社交网络模型中的节点1和节点3之间不具有连边,则认为节点1和节点3之间不具有直接关联关系;那么第一邻接矩阵中的元素A13的值为0。
步骤403,将所述第一社交网络模型中分别与第一节点具有直接关联关系的任意两个节点记为具有直接关联关系而生成第二邻接矩阵;所述第一节点为所述第一社交网络模型中的任一节点;
这里,所述第一节点为所述第一社交网络模型中的任一节点;
举例来说,如果所述第一社交网络模型中的节点1和节点2之间具有连边,节点2和节点3之间具有连边,而节点1和节点3之间不具有连边,那么,记录节点1和节点3在所述第二邻接矩阵中具有直接关联关系。
步骤404,获取所述第二邻接矩阵中具有直接关联关系的任意两个节点在所述第一邻接矩阵中的结构相似度;
具体的,获取所述第二邻接矩阵中具有直接关联关系的节点u和节点v,基于所述第一邻接矩阵所表示的第一社交网络模型分别确定所述节点u对应的包含有所述节点u的第一邻居节点集合,以及所述节点v对应的包含有所述节点v的第二邻居节点集合;
分别确定所述第一邻居节点集合的第一节点数量以及所述第二邻居节点集合的第二节点数量;基于所述第一邻居节点集合、所述第二邻居节点集合、所述第一节点数量和所述第二节点数量确定所述结构相似度;
这里,可利用如下公式计算节点u和节点v的结构相似度:
其中,Ts表示结构相似度,N[u]表示节点u在第一矩阵中包含自身的所有邻接节点的集合,N[v]表示节点v在第一矩阵中包含自身的所有邻接节点的集合,|N[u]∩N[v]|表示N[u]与N[v]交集的数量,d[u]表示集合N[u]中的节点数量,d[v]表示集合N[v]中的节点数量。
步骤405,获取所述第二邻接矩阵中具有直接关联关系的任意两个节点的特征属性的相似度;
具体地,获取所述第二邻接矩阵中具有直接关联关系的任意两个节点的多个特征属性的相似度;对所述多个特征属性的相似度按照线性加权平均算法进行处理后获得所述任意两个节点的特征属性的相似度;在计算特征属性的相似度时,加权参数可根据实际情况灵活设置;
在获取所述第二邻接矩阵中具有直接关联关系的任意两个节点的特征属性的相似度时,分别获取所述第二邻接矩阵中具有直接关联关系的任意两个节点的对应于第一特征属性的第一行向量和第二行向量;其中,所述第一特征属性为所述多个特征属性中的任一特征属性;所述第一行向量和所述第二行向量中的数值分别表征在预设时间范围内的多个指定时间段、所述任意两个节点的第一特征属性的状态;基于所述第一行向量和所述第二行向量确定所述任意两个节点对应于所述第一特征属性的第一相似度;
这里,可利用如下公式计算第二邻接矩阵中具有直接关联关系的两个节点的每个特征属性的相似度:
其中,所述第二邻接矩阵中节点的每个特征属性均有对应的行向量值,如关注公众号与否记录为相应的特征属性对应的行向量值为1和0,不同地理位置在单月发布状态或签到的次数记录为相应的特征属性对应的行向量值为实际次数值、或实际次数值按预设的规则分类统计后的数值;
所述特征属性包括:地理位置信息、个人兴趣、行为偏好等,在对多个特征属性按照加权平均算法进行处理时,可根据实际需要对各特征属性的相似度进行加权处理,得到用户A与用户B的特征属性的相似度。
步骤406,基于所述任意两个节点的结构相似度和特征属性的相似度确定融合相似度,对确定的融合相似度按照预设要求进行筛选,基于满足预设要求的融合相似度生成第三邻接矩阵;
具体的,首先根据如下公式中非线性指数的形式计算任意两个节点的融合相似度;
其中,TPS表示融合相似度,PS表示特征属性的相似度,TS表示结构相似度,α和β为加权参数;α和β的值可根据实际需要进行调整。
再将所述第二邻接矩阵中具有直接关联关系的任意两个节点对应的元素值替换为对应的融合相似度;
最后,两个节点之间的融合相似度小于预设阈值时,可以认为这两个节点之间的关联性小;因此,将全部小于预设阈值的融合相似度替换为零,生成第三邻接矩阵;
这里,所述第三邻接矩阵既包括了用户的好友关系链的相似度,又包括了用户的特征属性的相似度;使得基于重构后的社交网络模型进行对应的社区发现、链路预测和图表征得到的结果更加具有现实意义。
步骤407,基于所述第三邻接矩阵生成表征节点之间第二关联关系的第二社交网络模型;
具体地,所述第三邻接矩阵中具有第二关联关系的两个节点在所述第
二社交网络模型中具有连边,所述第三邻接矩阵中不具有第二关联关系的两个节点在所述第二社交网络模型中不具有连边。
上述处理过程的详细处理流程,包括以下步骤:
步骤501,获取用于表示网络模型的第一邻接矩阵;
具体地,遍历所述第一社交网络模型中的节点,将具有直接关联关系的任意两个节点对应的元素值置为1,将不具有直接关联关系的任意两个节点对应的元素值置为0,生成第一邻接矩阵A;
如图8所示,节点1和节点2之间具有连边,认为节点1和节点2具有直接关联关系,则所述第一邻接矩阵中的元素A12的值为1;节点1和节点6之间不具有连边,认为节点1和节点6不具有直接关联关系,则所述第一邻接矩阵A中的元素A16的值为1;同理,可计算第一邻接矩阵A中各元素的值;
步骤502,将所述第一社交网络模型中最大跳数为2的两个节点记为具有直接关联关系而生成第二邻接矩阵;
具体地,如图8所示,节点3分别与节点1、节点5具有连边,而节点1与节点5不具有连边,则认为节点1和节点5之间的跳数为2,记录节点1和节点5具有直接关联关系,所述第二邻接矩阵中的元素A15的值为1;同理,所述第二邻接矩阵中的元素A14、A16的值均为1;节点2、节点3与节点1之间的跳数为1,所述第二邻接矩阵中的元素A12、A13的值也为1;以此类推,得到第二邻接矩阵A1,
步骤503,获取第二邻接矩阵中具有直接关联关系的任意两个节点在所述第一邻接矩阵中的结构相似度;
具体地,以计算第二邻接矩阵中节点1和节点5的结构相似度为例,在第一邻接矩阵中节点1的邻居为节点2和节点3,节点5的邻居为节点3,节点1和节点5的邻居的交集为节点3,则节点1和节点5的结构相似度为:
步骤504,获取所述第二邻接矩阵中具有直接关联关系的任意两个节点的特征属性的相似度;
具体地,先获取所述第二邻接矩阵中具有直接关联关系的任意两个节点的多个特征属性的相似度;再对所述多个特征属性的相似度按照线性加权平均算法进行处理后获得所述任意两个节点的特征属性的相似度,计算特征属性相似度时的加权参数可根据实际情况灵活设置。
以图8所示网络模型中节点1和节点5为例,首先获取节点1和节点5中第一特征属性的行向量,利用下述公式计算节点1和节点5基于所述第一特征属性的相似度:
其中,所述第二邻接矩阵中节点的每个特征属性均有对应的行向量值,如关注公众号与否记录为相应的特征属性对应的行向量值为1和0,不同地
理位置在单月发布状态或签到的次数记录为相应的特征属性对应的行向量值为实际次数值、或实际次数值按预设的规则分类统计后的数值;
其次,基于同样的方法,分别计算节点1与节点5的各特征属性的相似度;
最后,根据实际需要对各特征属性的相似度进行加权处理,得到节点1与节点5的特征属性的相似度;
利用与计算节点1与节点5的特征属性的相似度同样的方法,可计算第二邻接矩阵中具有直接关联关系的各节点对的特征属性的相似度。
步骤505,基于任意两个节点的结构相似度和特征属性的相似度确定融合相似度;
具体的,根据如下公式计算任意两个节点的融合相似度;
其中,TPS表示融合相似度,PS表示任意两个节点的特征属性的相似度,TS表示任意两个结构相似度,α和β为加权参数;α和β的值可根据实际需要进行调整。
步骤506,将第二邻接矩阵中具有直接关联关系的任意两个节点对应的元素值替换为计算得到的两个节点的融合相似度;
这里,将第二邻接矩阵中具有直接关联关系的任意两个节点对应的元素值替换为计算得到的两个节点的融合相似度后,生成矩阵A1’;
步骤507,将小于0.3的融合相似度替换为零,生成第三邻接矩阵;
这里,将低于预设阈值的融合相似度进行剔除,简化后续对重构网络模型的研究和计算;所述预设阈值可根据实际需要灵活设置,通常设置为0.2至0.4之间;
本实施例中,设置阈值为0.3,得到第三邻接矩阵A2为:
应用本发明实施例,由于所述第三邻接矩阵既包括了用户的好友关系链的相似度,又包括了用户的特征属性的相似度,使得表示重构网络模型的邻接矩阵既包括了用户的好友关系链的相似度,又包括了用户的特征属性的相似度;基于重构后的社交网络模型进行相应的社区发现、链路预测和图表征得到的结果更加具有现实意义。
应当理解的是,在本文中使用的,除非上下文清楚地支持例外情况,单数形式“一个”(“a”、“an”、“the”)旨在也包括复数形式。还应当理解的是,在本文中使用的“和/或”是指包括一个或者一个以上相关联地列出的项目的任意和所有可能组合。
上述本发明实施例序号仅仅为了描述,不代表实施例的优劣。
在本发明所提供的几个实施例中,应该理解到,所揭露的方法及装置,可以通过其他的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个模块或组件可以结合,或可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的通信连接可以是通过一些接口,设备或模块的间接耦合或通信连接,可以是电性的、机械的或其他形式的。
上述作为分离部件说明的模块可以是、或也可以不是物理上分开的,作为模块显示的部件可以是、或也可以不是物理模块,即可以位于一个地方,也可以分布到多个网络模块上;可以根据实际的需要选择其中的部分或全部模块来实现本实施例方案的目的。
另外,在本发明各实施例中的各功能模块可以全部集成在一个处理模块中,也可以是各模块分别单独作为一个模块,也可以两个或两个以上模块集成在一个模块中;上述集成的模块既可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:移动存储设备、只读存储器(ROM,Read-Only Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
或者,本发明实施例上述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存
储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器、或者网络设备等)执行本发明各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、ROM、磁碟或者光盘等各种可以存储程序代码的介质。
本发明实施例中记载的存储器切换方法、装置只以上述实施例为例,但不仅限于此,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。
以上所述仅为本发明的较佳实施例而已,并非用于限定本发明的保护范围。
本发明实施例的技术方案通过在划分得到n个第一社区之后,根据标签传播算法更新第一社区中的社区节点的标签,进而将标签相同的社区节点划分至第二社区,得到m个第二社区;也即将初步划分得到的第一社区进行合并,进而得到数量更少的第二社区,解决了现有技术中划分得到的社区的粒度较小,进而无法满足某些场景的需求的问题;达到了在保证划分得到的社区的准确度的同时,又可以将第一社区合并为第二社区进而使得得到的社区的粒度符合划分需求的效果。
Claims (16)
- 一种社区发现方法,应用于服务器,所述服务器包括有一个或多个处理器以及存储器,以及一个或一个以上的程序,其中,所述一个或一个以上的程序存储于存储器中,所述程序可以包括一个或一个以上的每一个对应于一组指令的模块,所述一个或多个处理器被配置为执行指令;所述方法包括:将社会网络中的网络节点划分社区,得到n个第一社区以及所述n个第一社区中每个第一社区的标签;n为大于等于2的整数;根据标签传播算法更新所述n个第一社区中每个第一社区包括的社区节点的标签,所述社区节点为所述n个第一社区中的网络节点,所述社区节点的初始标签为所述社区节点所属第一社区的标签;将标签相同的社区节点划分至第二社区,获得m个第二社区,m为小于n的正整数。
- 根据权利要求1所述的方法,其中,所述根据标签传播算法更新社区节点的标签,包括:遍历每个社区节点,获取第一数量,所述第一数量为所述社区节点所属第一社区中的社区节点的数量;获取所述社区节点的相邻社区节点所属第一社区中包括的与所述社区节点相邻的相邻社区节点的数量,选择其中的最大值作为第二数量;若所述第二数量大于所述第一数量,则将所述社区节点的标签更新为所述第二数量所对应的第一社区的标签,所述第二数量所对应的第一社区表示与所述社区节点相邻的相邻社区节点的数量为所述第二数量的第一社区。
- 根据权利要求2所述的方法,其中,所述方法还包括:在遍历每个社区节点之后,检测社区节点中标签发生变化的社区节点 的总数量是否达到数量阈值;若检测结果为标签发生变化的社区节点的总数量达到所述数量阈值,则再次遍历每个社区节点;若检测结果为标签发生变化的社区节点的总数量未达到所述数量阈值,则结束遍历。
- 根据权利要求2或3所述的方法,其中,所述方法还包括:按照每个社区节点的重要程度对社区节点进行排序;所述遍历每个社区节点,获取第一数量,包括:按照排序后的社区节点的顺序,对每个社区节点,获取第一数量。
- 根据权利要求4所述的方法,其中,所述按照每个社区节点的重要程度对社区节点进行排序,包括:按照每个社区节点的聚类系数由大到小的顺序对所述各个社区节点进行排序。
- 根据权利要求4所述的方法,其中,所述按照每个社区节点的重要程度对社区节点进行排序,包括:根据Pagerank算法对所述各个社区节点进行排序。
- 根据权利要求4所述的方法,其中,所述按照每个社区节点的重要程度对社区节点进行排序,包括:按照社区节点的度由大到小的顺序对社区节点进行排序;所述社区节点的度表示社区节点相邻的相邻社区节点的数量。
- 一种社区发现装置,应用于服务器,所述服务器包括有一个或多个处理器以及存储器,以及一个或一个以上的程序,其中,所述一个或一个以上的程序存储于存储器中,所述程序可以包括一个或一个以上的每一个对应于一组指令的模块,所述一个或多个处理器被配置为执行指令;所述装置包括:划分模块,配置为将社会网络中的网络节点划分社区,得到n个第一社区以及所述n个第一社区中每个第一社区的标签;n为大于等于2的整数;更新模块,配置为根据标签传播算法更新所述n个第一社区中每个第一社区包括的社区节点的标签,所述社区节点为所述n个第一社区中的网络节点,所述社区节点的初始标签为所述社区节点所属第一社区的标签;合并模块,配置为将标签相同的社区节点划分至第二社区,获得m个第二社区,m为小于n的正整数。
- 根据权利要求8所述的装置,其中,所述更新模块,包括:第一获取单元,配置为遍历每个社区节点,获取第一数量,所述第一数量为所述社区节点所属第一社区中的社区节点的数量;第二获取单元,配置为获取所述社区节点的相邻社区节点所属第一社区中包括的与所述社区节点相邻的相邻社区节点的数量,选择其中的最大值作为第二数量;更新单元,配置为在所述第二获取单元获取到的所述第二数量大于所述第一获取单元获取到的所述第一数量时,将所述社区节点的标签更新为所述第二数量所对应的第一社区的标签,所述第二数量所对应的第一社区表示与所述社区节点相邻的相邻社区节点的数量为所述第二数量的第一社区。
- 根据权利要求9所述的装置,其中,所述装置还包括:检测模块,配置为在遍历每个社区节点之后,检测社区节点中标签发生变化的社区节点的总数量是否达到数量阈值;第一结果模块,配置为在所述检测模块的检测结果为标签发生变化的社区节点的总数量达到所述数量阈值时,再次遍历每个社区节点;第二结果模块,配置为在所述检测模块的检测结果为标签发生变化的社区节点的总数量未达到所述数量阈值时,结束遍历。
- 根据权利要求9或10所述的装置,其中,所述装置还包括:排序模块,配置为按照每个社区节点的重要程度对社区节点进行排序;所述第一获取单元,还配置为按照排序后的社区节点的顺序,对每个社区节点,获取第一数量。
- 根据权利要求11所述的装置,其中,所述排序模块,还配置为按照每个社区节点的聚类系数由大到小的顺序对所述各个社区节点进行排序。
- 根据权利要求11所述的装置,其中,所述排序模块,还配置为根据Pagerank算法对所述各个社区节点进行排序。
- 根据权利要求11所述的装置,其中,所述排序模块,还配置为按照社区节点的度由大到小的顺序对社区节点进行排序;所述社区节点的度表示社区节点相邻的相邻社区节点的数量。
- 一种计算机存储介质,所述计算机存储介质中存储有计算机可执行指令,所述计算机可执行指令用于执行权利要求1至7任一项所述的社区发现方法。
- 一种服务器,该服务器包括:处理器和用于存储能够在处理器上运行的计算机程序的存储器,其中,所述处理器用于运行所述计算机程序时,执行权利要求1至7任一项所述的社区发现方法。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/310,920 US10846052B2 (en) | 2016-10-27 | 2017-10-12 | Community discovery method, device, server and computer storage medium |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610954505.0 | 2016-10-27 | ||
CN201610954505.0A CN108011735B (zh) | 2016-10-27 | 2016-10-27 | 社区发现方法及装置 |
CN201610933379.0A CN108022171B (zh) | 2016-10-31 | 2016-10-31 | 一种数据处理方法及设备 |
CN201610933379.0 | 2016-10-31 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018077039A1 true WO2018077039A1 (zh) | 2018-05-03 |
Family
ID=62023341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/105956 WO2018077039A1 (zh) | 2016-10-27 | 2017-10-12 | 社区发现方法、装置、服务器及计算机存储介质 |
Country Status (2)
Country | Link |
---|---|
US (1) | US10846052B2 (zh) |
WO (1) | WO2018077039A1 (zh) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108647739A (zh) * | 2018-05-17 | 2018-10-12 | 华中科技大学 | 一种基于改进的密度峰值聚类的社交网络社区发现方法 |
CN109992723A (zh) * | 2019-02-25 | 2019-07-09 | 平安科技(深圳)有限公司 | 一种基于社交网络的用户兴趣标签构建方法及相关设备 |
CN110162692A (zh) * | 2018-12-10 | 2019-08-23 | 腾讯科技(深圳)有限公司 | 用户标签确定方法、装置、计算机设备和存储介质 |
CN110442800A (zh) * | 2019-07-22 | 2019-11-12 | 哈尔滨工程大学 | 一种融合节点属性和图结构的半监督社区发现方法 |
CN110969526A (zh) * | 2019-12-13 | 2020-04-07 | 南京三百云信息科技有限公司 | 重叠社群处理方法、装置以及电子设备 |
CN111177473A (zh) * | 2018-11-13 | 2020-05-19 | 杭州海康威视数字技术股份有限公司 | 人员关系分析方法、装置和可读存储介质 |
WO2020113437A1 (zh) * | 2018-12-04 | 2020-06-11 | 区链通网络有限公司 | 图结构处理方法、系统、网络设备及存储介质 |
CN111506620A (zh) * | 2020-03-31 | 2020-08-07 | 上海氪信信息技术有限公司 | 局部社区的挖掘与合并方法及其装置、芯片、存储介质 |
CN113449112A (zh) * | 2020-03-24 | 2021-09-28 | 顺丰科技有限公司 | 异常寄递行为识别方法、装置、计算机设备及存储介质 |
CN114880584A (zh) * | 2022-05-16 | 2022-08-09 | 华能澜沧江水电股份有限公司 | 一种基于社区发现的发电机组故障分析方法 |
CN114969563A (zh) * | 2022-05-25 | 2022-08-30 | 中国电子科技集团公司第十研究所 | 一种基于改进ppr和扩张截止函数的组网群体发现方法 |
CN115225601A (zh) * | 2022-06-07 | 2022-10-21 | 湖北工程学院 | 消息转发方法、装置、设备及存储介质 |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108763314B (zh) * | 2018-04-26 | 2021-01-19 | 深圳市腾讯计算机系统有限公司 | 一种兴趣推荐方法、装置、服务器及存储介质 |
CN111814944A (zh) * | 2019-04-12 | 2020-10-23 | 北京百度网讯科技有限公司 | 顶点对社区的分配方法、装置以及终端 |
CN110310122B (zh) * | 2019-06-27 | 2023-09-01 | 上海麦克风文化传媒有限公司 | 一种基于图结构的iOS代充风险控制方法 |
CN110489598B (zh) * | 2019-07-05 | 2021-12-07 | 中国联合网络通信集团有限公司 | 一种用户社团划分方法及装置 |
CN110610434A (zh) * | 2019-09-04 | 2019-12-24 | 成都威嘉软件有限公司 | 基于人工智能的社区发现方法 |
CN111353002B (zh) * | 2020-02-03 | 2024-05-03 | 中国人民解放军国防科技大学 | 网络表示学习模型的训练方法、装置、电子设备及介质 |
CN111274457B (zh) * | 2020-02-03 | 2023-12-19 | 中国人民解放军国防科技大学 | 一种网络图分割方法及存储介质 |
CN111339425B (zh) * | 2020-03-05 | 2021-07-23 | 拉扎斯网络科技(上海)有限公司 | 一种对象标记方法、装置、服务器及存储介质 |
CN111383125B (zh) * | 2020-03-16 | 2023-09-01 | 中国联合网络通信集团有限公司 | 社区划分方法、系统、终端设备及存储介质 |
CN111507506A (zh) * | 2020-03-20 | 2020-08-07 | 厦门大学 | 一种基于共识嵌入的复杂网络社区发现方法 |
CN112288528A (zh) * | 2020-10-30 | 2021-01-29 | 浙江集享电子商务有限公司 | 恶意社群发现方法、装置、计算机设备和可读存储介质 |
CN112633388B (zh) * | 2020-12-28 | 2022-07-29 | 中国科学院软件研究所 | 一种面向社交网络的分布式用户聚类方法 |
CN113515672A (zh) * | 2020-12-31 | 2021-10-19 | 腾讯科技(深圳)有限公司 | 数据处理方法、装置、计算机可读介质及电子设备 |
CN112989272B (zh) * | 2020-12-31 | 2024-02-27 | 中科院计算技术研究所大数据研究院 | 一种基于局部路径的社团发现算法 |
CN112967146B (zh) * | 2021-02-03 | 2023-08-04 | 北京航空航天大学 | 一种基于标签传播的科研社区发现方法及装置 |
CN113111133A (zh) * | 2021-04-09 | 2021-07-13 | 北京沃东天骏信息技术有限公司 | 用户分类的方法和装置 |
CN113807600B (zh) * | 2021-09-26 | 2023-07-25 | 河南工业职业技术学院 | 一种动态社交网络中的链路预测方法 |
CN114913336A (zh) * | 2022-05-27 | 2022-08-16 | 北京达佳互联信息技术有限公司 | 网络图特征提取方法、装置、电子设备、存储介质 |
CN114896520B (zh) * | 2022-06-10 | 2024-08-02 | 兰州大学 | 一种基于元社区一致性的集成社区检测方法及系统 |
CN115169501A (zh) * | 2022-08-05 | 2022-10-11 | 东北电力大学 | 基于公共邻居节点聚类熵紧密相似性的社区检测方法 |
CN115563400B (zh) * | 2022-09-19 | 2023-06-13 | 广东技术师范大学 | 一种基于模体加权聚合的多路网络社区检测方法及装置 |
CN116308856A (zh) * | 2023-02-10 | 2023-06-23 | 华南师范大学 | 一种面向多层学习者关系网络的社区发现方法及装置 |
CN118132323B (zh) * | 2024-04-30 | 2024-07-05 | 苏州元脑智能科技有限公司 | 一种服务器故障诊断方法、装置、电子设备及存储介质 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103327092A (zh) * | 2012-11-02 | 2013-09-25 | 中国人民解放军国防科学技术大学 | 一种信息网络上的社区发现方法和系统 |
CN104199852A (zh) * | 2014-08-12 | 2014-12-10 | 上海交通大学 | 基于节点隶属度的标签传播社团结构挖掘方法 |
CN105677648A (zh) * | 2014-11-18 | 2016-06-15 | 四三九九网络股份有限公司 | 一种基于标签传播算法的社团发现方法及系统 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090132561A1 (en) * | 2007-11-21 | 2009-05-21 | At&T Labs, Inc. | Link-based classification of graph nodes |
US8606787B1 (en) * | 2010-09-15 | 2013-12-10 | Google Inc. | Social network node clustering system and method |
US20130268595A1 (en) * | 2012-04-06 | 2013-10-10 | Telefonaktiebolaget L M Ericsson (Publ) | Detecting communities in telecommunication networks |
CN105893382A (zh) | 2014-12-23 | 2016-08-24 | 天津科技大学 | 一种基于先验知识的微博用户群体划分方法 |
US20170270254A1 (en) * | 2016-03-18 | 2017-09-21 | Northeastern University | Methods and systems for quantifying closeness of two sets of nodes in a network |
-
2017
- 2017-10-12 WO PCT/CN2017/105956 patent/WO2018077039A1/zh active Application Filing
- 2017-10-12 US US16/310,920 patent/US10846052B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103327092A (zh) * | 2012-11-02 | 2013-09-25 | 中国人民解放军国防科学技术大学 | 一种信息网络上的社区发现方法和系统 |
CN104199852A (zh) * | 2014-08-12 | 2014-12-10 | 上海交通大学 | 基于节点隶属度的标签传播社团结构挖掘方法 |
CN105677648A (zh) * | 2014-11-18 | 2016-06-15 | 四三九九网络股份有限公司 | 一种基于标签传播算法的社团发现方法及系统 |
Non-Patent Citations (3)
Title |
---|
SHEN, HAIYAN ET AL.: "ANew Overlapping Community Discovery Algorithm Based on Label Propagation", SOFTWARE GUIDE, vol. 14, no. 4, 30 April 2015 (2015-04-30) * |
XIN, NAN: "The Research of Real-Time Community Detection Algorithm Based on Label Propagation", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE , CHINA MASTER'S THESES FULL-TEXT DATABASE, 15 January 2014 (2014-01-15) * |
ZHAO, BAOFENG ET AL.: "A Stable Label Propagation Algorithm for Community Detection", JOURNAL OF TAIYUAN UNIVERSITY OF TECHNOLOGY, vol. 44, no. 4, 31 July 2013 (2013-07-31) * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108647739A (zh) * | 2018-05-17 | 2018-10-12 | 华中科技大学 | 一种基于改进的密度峰值聚类的社交网络社区发现方法 |
CN108647739B (zh) * | 2018-05-17 | 2020-09-18 | 华中科技大学 | 一种基于改进的密度峰值聚类的社交网络社区发现方法 |
CN111177473A (zh) * | 2018-11-13 | 2020-05-19 | 杭州海康威视数字技术股份有限公司 | 人员关系分析方法、装置和可读存储介质 |
CN111177473B (zh) * | 2018-11-13 | 2023-11-14 | 杭州海康威视数字技术股份有限公司 | 人员关系分析方法、装置和可读存储介质 |
WO2020113437A1 (zh) * | 2018-12-04 | 2020-06-11 | 区链通网络有限公司 | 图结构处理方法、系统、网络设备及存储介质 |
CN110162692A (zh) * | 2018-12-10 | 2019-08-23 | 腾讯科技(深圳)有限公司 | 用户标签确定方法、装置、计算机设备和存储介质 |
CN110162692B (zh) * | 2018-12-10 | 2021-05-25 | 腾讯科技(深圳)有限公司 | 用户标签确定方法、装置、计算机设备和存储介质 |
CN109992723A (zh) * | 2019-02-25 | 2019-07-09 | 平安科技(深圳)有限公司 | 一种基于社交网络的用户兴趣标签构建方法及相关设备 |
CN109992723B (zh) * | 2019-02-25 | 2023-06-20 | 平安科技(深圳)有限公司 | 一种基于社交网络的用户兴趣标签构建方法及相关设备 |
CN110442800B (zh) * | 2019-07-22 | 2022-05-20 | 哈尔滨工程大学 | 一种融合节点属性和图结构的半监督社区发现方法 |
CN110442800A (zh) * | 2019-07-22 | 2019-11-12 | 哈尔滨工程大学 | 一种融合节点属性和图结构的半监督社区发现方法 |
CN110969526A (zh) * | 2019-12-13 | 2020-04-07 | 南京三百云信息科技有限公司 | 重叠社群处理方法、装置以及电子设备 |
CN113449112A (zh) * | 2020-03-24 | 2021-09-28 | 顺丰科技有限公司 | 异常寄递行为识别方法、装置、计算机设备及存储介质 |
CN111506620B (zh) * | 2020-03-31 | 2023-04-25 | 上海氪信信息技术有限公司 | 局部社区的挖掘与合并方法及其装置、芯片、存储介质 |
CN111506620A (zh) * | 2020-03-31 | 2020-08-07 | 上海氪信信息技术有限公司 | 局部社区的挖掘与合并方法及其装置、芯片、存储介质 |
CN114880584A (zh) * | 2022-05-16 | 2022-08-09 | 华能澜沧江水电股份有限公司 | 一种基于社区发现的发电机组故障分析方法 |
CN114880584B (zh) * | 2022-05-16 | 2024-05-28 | 华能澜沧江水电股份有限公司 | 一种基于社区发现的发电机组故障分析方法 |
CN114969563A (zh) * | 2022-05-25 | 2022-08-30 | 中国电子科技集团公司第十研究所 | 一种基于改进ppr和扩张截止函数的组网群体发现方法 |
CN115225601A (zh) * | 2022-06-07 | 2022-10-21 | 湖北工程学院 | 消息转发方法、装置、设备及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
US20190179615A1 (en) | 2019-06-13 |
US10846052B2 (en) | 2020-11-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018077039A1 (zh) | 社区发现方法、装置、服务器及计算机存储介质 | |
CN111782965B (zh) | 意图推荐方法、装置、设备及存储介质 | |
Bhagat et al. | Node classification in social networks | |
Burgess et al. | Link-prediction enhanced consensus clustering for complex networks | |
CN111444395A (zh) | 获取实体间关系表达的方法、系统和设备、广告召回系统 | |
Xiao et al. | A truth discovery approach with theoretical guarantee | |
CN110515986B (zh) | 一种社交网络图的处理方法、装置及存储介质 | |
CN111881219A (zh) | 动态知识图谱补全方法、装置、电子设备和存储介质 | |
CN115293919B (zh) | 面向社交网络分布外泛化的图神经网络预测方法及系统 | |
WO2020224445A1 (zh) | 一种车流路径分布信息的处理方法、装置及电子设备 | |
CN111708876A (zh) | 生成信息的方法和装置 | |
CN110909868A (zh) | 基于图神经网络模型的节点表示方法和装置 | |
CN108900320B (zh) | 一种互联网测试床拓扑结构大比例规模缩减方法及装置 | |
CN113821657A (zh) | 基于人工智能的图像处理模型训练方法及图像处理方法 | |
Hasan et al. | Graphettes: Constant-time determination of graphlet and orbit identity including (possibly disconnected) graphlets up to size 8 | |
CN104199924B (zh) | 选择具有快照关系的网络表格的方法及装置 | |
US20220284023A1 (en) | Estimating computational cost for database queries | |
US11782918B2 (en) | Selecting access flow path in complex queries | |
CN108011735B (zh) | 社区发现方法及装置 | |
Dahinden et al. | Decomposition and model selection for large contingency tables | |
CN113535939A (zh) | 文本处理方法和装置、电子设备以及计算机可读存储介质 | |
CN116795995A (zh) | 知识图谱构建方法、装置、计算机设备和存储介质 | |
CN109918543B (zh) | 一种图流中针对节点的链路预测方法 | |
Rajan et al. | Maximum likelihood reconstruction of ancestral networks by integer linear programming | |
Liu et al. | Social Network Community‐Discovery Algorithm Based on a Balance Factor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17866149 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17866149 Country of ref document: EP Kind code of ref document: A1 |