Abstract
Community detection in social networks is a widely studied topic in Artificial Intelligence and graph analysis. It can be useful to discover hidden relations between users, the target audience in digital marketing, and the recommender system, amongst others. In this context, some of the existing proposals for finding communities in networks are agglomerative methods. These methods used similarities or link prediction between nodes to discover the communities in graphs. The different similarity metrics used in these proposals focused mainly on common neighbors between similar nodes. However, such definitions are missing in the sense that they do not take into account the connection between common neighbors. In this paper, we propose a new similarity measure, named \(\alpha \)-Structural Similarity, that focuses not only on common neighbors of nodes but also on their connections. Afterwards, in the light of \(\alpha \)-Structural Similarity, we extend the Hierarchical Clustering algorithm to identify disjoint communities in networks. Finally, we conduct extensive experiments on synthetic networks and various well-known real-world networks to confirm the efficiency of our approach.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Aldecoa, R., Marín, I.: Deciphering network community structure by surprise. PLoS ONE 6, e24195 (2011)
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech: Theory Exp. 2008, P10008 (2008)
Bollacker, K.D., Lawrence, S., Giles, C.L.: CiteSeer: an autonomous web agent for automatic retrieval and identification of interesting publications. In: Proceedings of the Second International Conference on Autonomous Agents, pp. 116–123 (1998)
Chen, Y.C., Zhu, W.Y., Peng, W.C., Lee, W.C., Lee, S.Y.: CIM: community-based influence maximization in social networks. .ACM Trans. Intell. Syst. Technol. (TIST) 5(2), 1–31 (2014)
Enright, A.J., Van Dongen, S., Ouzounis, C.A.: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30(7), 1575–1584 (2002)
Fortunato, S., Lancichinetti, A.: Community detection algorithms: a comparative analysis: invited presentation, extended abstract. In: 4th International ICST Conference on Performance Evaluation Methodologies and Tools (2010)
Ganley, D., Lampe, C.: The ties that bind: social network principles in online communities. Decis. Support Syst. 47(3), 266–274 (2009)
Krebs, V.: Books about us politics. Unpublished (2004). http://www.orgnet.com
Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4) (2008). https://doi.org/10.1103/physreve.78.046110
Leskovec, J., Krevl, A.: SNAP datasets: stanford large network dataset collection at http://snap.stanford.edu/data (2014)
Lusseau, D., Schneider, K., Boisseau, O.J., Haase, P., Slooten, E., Dawson, S.M.: The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 54(4), 396–405 (2003)
Luís, R.: Towards data science: modularity maximization greedy algorithm. https://towardsdatascience.com/modularity-maximization-5cfa6495b286. Accessed 30 May 2020
Martin, J.: Theoretical computer science: Fast extraction of the edges of an induced subgraph. https://cstheory.stackexchange.com/questions/33440/fast-extraction-of-the-edges-of-an-induced-subgraph. Accessed 26 December 2015
Martínez, V., Berzal, F., Cubero, J.C.: A survey of link prediction in complex networks. ACM Comput. Surv. 49(4), 1–33 (2016). https://doi.org/10.1145/3012704
Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004)
Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76, 036106 (2007)
Rossetti, G., Milli, L., Cazabet, R.: CDLIB: a python library to extract, compare and evaluate communities from complex networks. Appl. Netw. Sci. 4, 1–26 (2019)
Rossetti, G., Pappalardo, L., Rinzivillo, S.: A novel approach to evaluate community detection algorithms on ground truth. In: Cherifi, H., Gonçalves, B., Menezes, R., Sinatra, R. (eds.) Complex Networks VII. SCI, vol. 644, pp. 133–144. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30569-1_10
Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval (1983)
Šubelj, L., Bajec, M.: Model of complex networks based on citation dynamics. In: Proceedings of the 22nd International Conference on World Wide Web (2013)
Traag, V.A., Krings, G., Van Dooren, P.: Significant scales in community structure. Sci. Rep. 3(1), 1–10 (2013)
Traag, V.A., Waltman, L., Van Eck, N.J.: From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9(1), 1–12 (2019)
Unknown: CDP (2022). https://github.com/2x254/CDP
Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Appendices
Appendix a
We have \(s_2(u_2,v_2)=\frac{k+2}{\sqrt{(n+2)(n+k)}}\)
if \(\mathbf {(u_1,v_1) \in E_2 }\), we can distinguish two cases:
-
\(\mathbf {|\{u_1,v_1\} \cap \{u_0, v_0\}| \ne 1}\). In this case, we have \(s_2(u_1,v_1) = 1\). Then,
\(\frac{k+2}{\sqrt{(n+2)(n+k)}}< 1 \Leftrightarrow (k+2)^2<(n+2)(n+k) \Leftrightarrow k < (n-2)+\)
\(\sqrt{(2-n)^2 +4(n^2 +2n-4)}\). k.t. \( (n-2)+ \sqrt{(2-n)^2 +4(n^2 +2n-4)} > n \implies s_2(u_2,v_2) < s_2(u_1,v_1) \; \forall \; 1\le k \le n \).
-
\(\mathbf {|\{u_1,v_1\} \cap \{u_0, v_0\}| = 1}\). In this case, we have \(s_2(u_1,v_1) = \frac{n}{\sqrt{n(n+k)}}\). Then, \(\frac{k+2}{\sqrt{(n+2)(n+k)}}< \frac{n}{\sqrt{n(n+k)}} \Leftrightarrow \frac{(k+2)^2}{(n+2)(n+k)}< \frac{n}{(n+k)} \Leftrightarrow (k+2)^2< n(n+2) \Leftrightarrow k+2< \sqrt{n(n+2)} \Leftrightarrow k < \sqrt{n(n+2)} -2 \). k.t. \(\sqrt{n(n+2)} -2< n-1 \;\; \forall \;\;n \ge 4 \implies \mathbf {s_2(u_2,v_2) < s_2(u_1,v_1) }\) iff. \(\mathbf { k < n-1 }\).
if \(\mathbf {(u_1,v_1) \in E_1 }\), there are also two cases:
-
\(\mathbf {|\{u_1,v_1\} \cap \{u_2, v_2\}| = 0}\) or \(\mathbf {(u_1,v_1) \in E_1 \cap E_3}\). In this case, \(s_2(u_1,v_1) = 1\). Then, \(s_2(u_2,v_2) < s_2(u_1,v_1) ~\forall \; 1\le k \le n \). (proved).
-
\(\mathbf {|\{u_1,v_1\} \cap \{u_2, v_2\}| = 1}\). In this case, \(s_2(u_1,v_1)= \frac{n}{\sqrt{n(n+2)}}\). Then,
\(\frac{k+2}{\sqrt{(n+2)(n+k)}}< \frac{n}{\sqrt{n(n+2)}} \Leftrightarrow (k+2)^2< n(n+k) \Leftrightarrow k< \frac{(n-4) + \sqrt{n(5n-8)}}{2} \Leftrightarrow k \le n-1< \frac{(n-4) + \sqrt{n(5n-8)}}{2} \Leftrightarrow s_2(u_2,v_2) < s_2(u_1,v_1) \) iff. \(k \le n-1\).
Appendix B
\(s_2^{\alpha }(u_2,v_2)= (1-\alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}} + \alpha \frac{(k+2)(k+1)}{min(n(n-1)+4k+2,\;n(n-1)+(k+2)(k+1)-2)} \)
It should be noted that \(min(n(n-1)+6,\;n(n-1)+4)=n(n-1)+4\) iff \(k=1\) and \(min(n(n-1)+4k+2,\;n(n-1)+(k+2)(k+1)-2) = n(n-1)+4k+2 \) iff \(k \ge 2\)
if \(\mathbf {(u_1,v_1) \in E_2 }\), we have the same cases as mentioned in the proof of 2S:
-
\( \mathbf {|\{u_1,v_1\} \cap \{u_0, v_0\}| \ne 1}\). In this case, we have \(s_2^{\alpha }(u_1,v_1) = 1\). Then,
-
if \(\mathbf {k=1}\): k.t. \(\frac{3}{\sqrt{(n+2)(n+1)}} < 1 \;\forall \; n \ge 4 \Leftrightarrow (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} \le (1-\alpha ) \;\;\forall \; \alpha \in [0 .. 1] \) and k.t. \(\frac{6}{n(n-1)+4}< 1 \;\forall \; n \ge 4 \Leftrightarrow \alpha \frac{6}{n(n-1)+4}< \alpha \;\forall \; \alpha \in \;]0 .. 1] \Leftrightarrow (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} + \alpha \frac{6}{n(n-1)+4} < 1 \;\;\forall \; \alpha \in \; ]0 .. 1]\)
-
if \(\mathbf {k\ge 2}\): k.t. \(\frac{k+2}{\sqrt{(n+2)(n+k)}} < 1 \; \forall \; 1\le k \le n\) (proved) \(\Leftrightarrow (1-\alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}}\)
\(\le (1-\alpha ) \;\;\; \forall \; 1\le k \le n, \; \forall \; \alpha \in [0 .. 1] \) and wtp \(\frac{(k+2)(k+1)}{n(n-1)+4k+2} < 1 \) iff. \(k < n. \;\;\;\;\) \(\frac{(k+2)(k+1)}{n(n-1)+4k+2}< 1 \Leftrightarrow (k+2)(k+1)< n(n-1)+4k+2 \Leftrightarrow k^2 +3k+2-4k< n^2 -n +2 \Leftrightarrow k^2 -k< n^2 -n \Leftrightarrow k < n \).
\(\Leftrightarrow \alpha \frac{(k+2)(k+1)}{n(n-1)+4k+2} < \alpha \) iff. \(k<n, \;\;\forall \; \alpha \in \;]0 .. 1] \Leftrightarrow (1-\alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}} + \alpha \frac{(k+2)(k+1)}{n(n-1)+4k+2} < 1 \) iff \(k<n, \;\;\forall \; \alpha \in \;]0 .. 1]. \)
-
-
\(\mathbf {|\{u_1,v_1\} \cap \{u_0, v_0\}| = 1}\). We have \(s_2^{\alpha }(u_1,v_1) = (1-\alpha ) \frac{n}{\sqrt{n(n+k)}}+ \alpha \). Then,
-
if \(\mathbf {k=1}\) : k.t. \( (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} \le (1-\alpha ) \frac{n}{\sqrt{n(n+1)}} \;\;\forall \;\; n \ge 4, \; \forall \; \alpha \in [0 .. 1] \) and \(\alpha \frac{6}{n(n-1)+4}< \alpha \;\; \forall \;\; n \ge 4, \; \forall \; \alpha \in \;]0 .. 1] \Leftrightarrow (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} + \alpha \frac{6}{n(n-1)+4} < (1-\alpha ) \frac{n}{\sqrt{n(n+1)}} + \alpha \;\;\; \forall \; \alpha \in \;]0 .. 1] \).
-
if \(\mathbf { 2 \le k < n-1 }\) : k.t. \(\frac{k+2}{\sqrt{(n+2)(n+k)}} - \frac{n}{\sqrt{n(n+k)}} < 0\) iff. \(k<n-1\) (proved) and \(\frac{(k+2)(k+1)}{n(n-1)+4k+2}-1 < 0 \) iff. \(k < n. \;\) (proved) \( \Leftrightarrow (1-\alpha ) [\frac{k+2}{\sqrt{(n+2)(n+k)}} - \frac{n}{\sqrt{n(n+k)}}] \le 0\) iff. \(k<n-1, \;\forall \; \alpha \in [0 .. 1]\) and \(\alpha [\frac{(k+2)(k+1)}{n(n-1)+4k+2}-1] < 0 \) iff. \(k< n, \;\forall \; \alpha \in \;]0 .. 1] \) \(\Leftrightarrow (1-\alpha )[\frac{k+2}{\sqrt{(n+2)(n+k)}} - \frac{n}{\sqrt{n(n+k)}}] + \alpha [\frac{(k+2)(k+1)}{n(n-1)+4k+2}-1] < 0 \) iff. \(k< n-1, \;\forall \; \alpha \in \;]0 .. 1] \Leftrightarrow (1-\alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}} + \alpha \frac{(k+2)(k+1)}{n(n-1)+4k+2} < (1-\alpha ) \frac{n}{\sqrt{n(n+k)}} +\alpha \) iff. \(k < n-1, \; \forall \; \alpha \in \;]0 .. 1].\)
-
if \(\mathbf { k=n-1}\) : let be \(a=(1-\alpha ) \frac{n+1}{\sqrt{(n+2)(2n-1)}} + \alpha \frac{(n+1)n}{(n+4)(n-1)+2}\) and \(b=(1-\alpha ) \frac{n}{\sqrt{n(2n-1)}}+ \alpha \Leftrightarrow a-b= (1-\alpha ) \frac{n+1}{\sqrt{(n+2)(2n-1)}} + \alpha \frac{(n+1)n}{(n+4)(n-1)+2} - (1-\alpha ) \frac{n}{\sqrt{n(2n-1)}} - \alpha = [\frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}}] - \alpha [ \frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}} + 1 - \frac{(n+1)n}{(n+4)(n-1)+2}]\)
\(\Leftrightarrow a-b <0 \) iff. \(\alpha > \frac{ \frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}}}{\frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}} + 1 - \frac{(n+1)n}{(n+4)(n-1)+2}}\). Let’s consider
\(f(n)= \frac{ \frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}}}{\frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}} + 1 - \frac{(n+1)n}{(n+4)(n-1)+2}}\) a continuous decreasing function on \([4, +\infty [ \). Calculating the limits : \(\displaystyle {\lim _{n \rightarrow +\infty }f(n)} = 0\) and \(\displaystyle {\lim _{n \rightarrow 4}f(n)} \approx 0.06 \Leftrightarrow 0 \le f(n) \le 0.06 \). Then, \( a<b \;\) iff. \( \alpha > 0.06\).
-
if \( \mathbf {k=n}\) : let be \(a = (1- \alpha ) \sqrt{ \frac{n+2}{2n}} + \alpha \; \) and \(b= (1- \alpha ) \frac{1}{\sqrt{2}} + \alpha \). Let’s consider \(g(n)=\sqrt{ \frac{n+2}{2n}}\) a continuous and strictly positive decreasing function on \([4,+\infty [ \). Calculating the limits : \( \displaystyle {\lim _{n \rightarrow +\infty }g(n)} = \frac{1}{\sqrt{2}}\) and \(\displaystyle {\lim _{n \rightarrow 4}g(n)} = \sqrt{\frac{3}{4}} \Leftrightarrow \frac{1}{\sqrt{2}} \le g(n) \le \sqrt{\frac{3}{4}} \Leftrightarrow g(n) \ge \frac{1}{\sqrt{2}} \Leftrightarrow (1-\alpha ) \;g(n) + \alpha \ge (1-\alpha ) \frac{1}{\sqrt{2}} + \alpha \Leftrightarrow a \ge b \Leftrightarrow \;\)if \( k=n, s_2^{\alpha }(u_2,v_2) \ge s_2^{\alpha }(u_1,v_1) \;\; \forall \; \alpha \in [0 .. 1]\).
\(\implies \mathbf {s_2^{\alpha }(u_2,v_2) < s_2^{\alpha }(u_1,v_1)}\) iff. \(\mathbf {k\le n-1 }\) and \(\mathbf {\alpha > 0.06 }\).
-
if \(\mathbf {(u_1,v_1) \in E_1 }\), there are also two cases, which are the same in the proof of 2S:
-
\(\mathbf {|\{u_1,v_1\} \cap \{u_2, v_2\}| = 0}\) or \(\mathbf {(u_1,v_1) \in E_1 \cap E_3}\). In this case, \(s_2^{\alpha }(u_1,v_1) = 1\). Then, \(s_2^{\alpha }(u_2,v_2)< s_2^{\alpha }(u_1,v_1)\) iff \(k<n\) (proved).
-
\(\mathbf {|\{u_1,v_1\} \cap \{u_2, v_2\}| = 1}\). In this case, \( s_2^{\alpha }(u_1,v_1) = (1-\alpha )\frac{n}{\sqrt{n(n+2)}}+\alpha \). Then,
-
if \(\mathbf {k=1}\): k.t. \(\frac{3}{\sqrt{(n+2)(n+1)}} < \frac{n}{\sqrt{n(n+2)}} \Leftrightarrow (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} \le (1-\alpha ) \frac{n}{\sqrt{n(n+2)}} \; \forall \; \alpha \in [0 .. 1] , \; \forall \; n \ge 4 \). k.t. \(\frac{6}{n(n-1)+4}< 1 \Leftrightarrow \alpha \frac{6}{n(n-1)+4}< \alpha \;\forall \; \alpha \in \;]0 .. 1] \Leftrightarrow (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} + \alpha \frac{6}{n(n-1)+4} < (1-\alpha )\frac{n}{\sqrt{n(n+2)}}+\alpha \;\forall \; \alpha \in \;]0 .. 1], \; \forall \; n \ge 4 \).
-
if \(\mathbf {k\ge 2}\): k.t. \(\frac{k+2}{\sqrt{(n+2)(n+k)}} < \frac{n}{\sqrt{n(n+2)}}\) iff \(k\le n-1\) (proved) \(\Leftrightarrow (1- \alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}} \le (1-\alpha ) \frac{n}{\sqrt{n(n+2)}} \) iff \(k\le n-1, \;\forall \; \alpha \in [0 .. 1].\) k.t.
\(\frac{(k+2)(k+1)}{n(n-1)+4k+2} < 1 \) iff \(k < n.\) (proved) \(\Leftrightarrow \alpha \frac{(k+2)(k+1)}{n(n-1)+4k+2} < \alpha \;\forall \; \alpha \in \;]0 .. 1]\) \(\Leftrightarrow (1- \alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}} + \alpha \frac{(k+2)(k+1)}{n(n-1)+4k+2} < (1-\alpha ) \frac{n}{\sqrt{n(n+2)}} + \alpha \) iff \(k \le n-1 \;\forall \; \alpha \in \;]0 .. 1] \Leftrightarrow s_2^{\alpha }(u_2,v_2) < s_2^{\alpha }(u_1,v_1) \) iff. \(k\le n-1\).
-
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hassine, M.B., Jabbour, S., Kmimech, M., Raddaoui, B., Graiet, M. (2023). A Non-overlapping Community Detection Approach Based on \(\alpha \)-Structural Similarity. In: Wrembel, R., Gamper, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2023. Lecture Notes in Computer Science, vol 14148. Springer, Cham. https://doi.org/10.1007/978-3-031-39831-5_19
Download citation
DOI: https://doi.org/10.1007/978-3-031-39831-5_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39830-8
Online ISBN: 978-3-031-39831-5
eBook Packages: Computer ScienceComputer Science (R0)