Nothing Special   »   [go: up one dir, main page]

Skip to main content

A Non-overlapping Community Detection Approach Based on \(\alpha \)-Structural Similarity

  • Conference paper
  • First Online:
Big Data Analytics and Knowledge Discovery (DaWaK 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14148))

Included in the following conference series:

Abstract

Community detection in social networks is a widely studied topic in Artificial Intelligence and graph analysis. It can be useful to discover hidden relations between users, the target audience in digital marketing, and the recommender system, amongst others. In this context, some of the existing proposals for finding communities in networks are agglomerative methods. These methods used similarities or link prediction between nodes to discover the communities in graphs. The different similarity metrics used in these proposals focused mainly on common neighbors between similar nodes. However, such definitions are missing in the sense that they do not take into account the connection between common neighbors. In this paper, we propose a new similarity measure, named \(\alpha \)-Structural Similarity, that focuses not only on common neighbors of nodes but also on their connections. Afterwards, in the light of \(\alpha \)-Structural Similarity, we extend the Hierarchical Clustering algorithm to identify disjoint communities in networks. Finally, we conduct extensive experiments on synthetic networks and various well-known real-world networks to confirm the efficiency of our approach.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 59.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 74.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Aldecoa, R., Marín, I.: Deciphering network community structure by surprise. PLoS ONE 6, e24195 (2011)

    Article  Google Scholar 

  2. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech: Theory Exp. 2008, P10008 (2008)

    Article  MATH  Google Scholar 

  3. Bollacker, K.D., Lawrence, S., Giles, C.L.: CiteSeer: an autonomous web agent for automatic retrieval and identification of interesting publications. In: Proceedings of the Second International Conference on Autonomous Agents, pp. 116–123 (1998)

    Google Scholar 

  4. Chen, Y.C., Zhu, W.Y., Peng, W.C., Lee, W.C., Lee, S.Y.: CIM: community-based influence maximization in social networks. .ACM Trans. Intell. Syst. Technol. (TIST) 5(2), 1–31 (2014)

    Article  Google Scholar 

  5. Enright, A.J., Van Dongen, S., Ouzounis, C.A.: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30(7), 1575–1584 (2002)

    Article  Google Scholar 

  6. Fortunato, S., Lancichinetti, A.: Community detection algorithms: a comparative analysis: invited presentation, extended abstract. In: 4th International ICST Conference on Performance Evaluation Methodologies and Tools (2010)

    Google Scholar 

  7. Ganley, D., Lampe, C.: The ties that bind: social network principles in online communities. Decis. Support Syst. 47(3), 266–274 (2009)

    Article  Google Scholar 

  8. Krebs, V.: Books about us politics. Unpublished (2004). http://www.orgnet.com

  9. Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4) (2008). https://doi.org/10.1103/physreve.78.046110

  10. Leskovec, J., Krevl, A.: SNAP datasets: stanford large network dataset collection at http://snap.stanford.edu/data (2014)

  11. Lusseau, D., Schneider, K., Boisseau, O.J., Haase, P., Slooten, E., Dawson, S.M.: The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol. 54(4), 396–405 (2003)

    Article  Google Scholar 

  12. Luís, R.: Towards data science: modularity maximization greedy algorithm. https://towardsdatascience.com/modularity-maximization-5cfa6495b286. Accessed 30 May 2020

  13. Martin, J.: Theoretical computer science: Fast extraction of the edges of an induced subgraph. https://cstheory.stackexchange.com/questions/33440/fast-extraction-of-the-edges-of-an-induced-subgraph. Accessed 26 December 2015

  14. Martínez, V., Berzal, F., Cubero, J.C.: A survey of link prediction in complex networks. ACM Comput. Surv. 49(4), 1–33 (2016). https://doi.org/10.1145/3012704

    Article  Google Scholar 

  15. Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004)

    Article  Google Scholar 

  16. Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76, 036106 (2007)

    Article  Google Scholar 

  17. Rossetti, G., Milli, L., Cazabet, R.: CDLIB: a python library to extract, compare and evaluate communities from complex networks. Appl. Netw. Sci. 4, 1–26 (2019)

    Article  Google Scholar 

  18. Rossetti, G., Pappalardo, L., Rinzivillo, S.: A novel approach to evaluate community detection algorithms on ground truth. In: Cherifi, H., Gonçalves, B., Menezes, R., Sinatra, R. (eds.) Complex Networks VII. SCI, vol. 644, pp. 133–144. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30569-1_10

    Chapter  Google Scholar 

  19. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval (1983)

    Google Scholar 

  20. Šubelj, L., Bajec, M.: Model of complex networks based on citation dynamics. In: Proceedings of the 22nd International Conference on World Wide Web (2013)

    Google Scholar 

  21. Traag, V.A., Krings, G., Van Dooren, P.: Significant scales in community structure. Sci. Rep. 3(1), 1–10 (2013)

    Article  Google Scholar 

  22. Traag, V.A., Waltman, L., Van Eck, N.J.: From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9(1), 1–12 (2019)

    Article  Google Scholar 

  23. Unknown: CDP (2022). https://github.com/2x254/CDP

  24. Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Motaz Ben Hassine .

Editor information

Editors and Affiliations

Appendices

Appendix a

We have \(s_2(u_2,v_2)=\frac{k+2}{\sqrt{(n+2)(n+k)}}\)

if \(\mathbf {(u_1,v_1) \in E_2 }\), we can distinguish two cases:

  • \(\mathbf {|\{u_1,v_1\} \cap \{u_0, v_0\}| \ne 1}\). In this case, we have \(s_2(u_1,v_1) = 1\). Then,

    \(\frac{k+2}{\sqrt{(n+2)(n+k)}}< 1 \Leftrightarrow (k+2)^2<(n+2)(n+k) \Leftrightarrow k < (n-2)+\)

    \(\sqrt{(2-n)^2 +4(n^2 +2n-4)}\). k.t. \( (n-2)+ \sqrt{(2-n)^2 +4(n^2 +2n-4)} > n \implies s_2(u_2,v_2) < s_2(u_1,v_1) \; \forall \; 1\le k \le n \).

  • \(\mathbf {|\{u_1,v_1\} \cap \{u_0, v_0\}| = 1}\). In this case, we have \(s_2(u_1,v_1) = \frac{n}{\sqrt{n(n+k)}}\). Then, \(\frac{k+2}{\sqrt{(n+2)(n+k)}}< \frac{n}{\sqrt{n(n+k)}} \Leftrightarrow \frac{(k+2)^2}{(n+2)(n+k)}< \frac{n}{(n+k)} \Leftrightarrow (k+2)^2< n(n+2) \Leftrightarrow k+2< \sqrt{n(n+2)} \Leftrightarrow k < \sqrt{n(n+2)} -2 \). k.t. \(\sqrt{n(n+2)} -2< n-1 \;\; \forall \;\;n \ge 4 \implies \mathbf {s_2(u_2,v_2) < s_2(u_1,v_1) }\) iff. \(\mathbf { k < n-1 }\).

if \(\mathbf {(u_1,v_1) \in E_1 }\), there are also two cases:

  • \(\mathbf {|\{u_1,v_1\} \cap \{u_2, v_2\}| = 0}\) or \(\mathbf {(u_1,v_1) \in E_1 \cap E_3}\). In this case, \(s_2(u_1,v_1) = 1\). Then, \(s_2(u_2,v_2) < s_2(u_1,v_1) ~\forall \; 1\le k \le n \). (proved).

  • \(\mathbf {|\{u_1,v_1\} \cap \{u_2, v_2\}| = 1}\). In this case, \(s_2(u_1,v_1)= \frac{n}{\sqrt{n(n+2)}}\). Then,

    \(\frac{k+2}{\sqrt{(n+2)(n+k)}}< \frac{n}{\sqrt{n(n+2)}} \Leftrightarrow (k+2)^2< n(n+k) \Leftrightarrow k< \frac{(n-4) + \sqrt{n(5n-8)}}{2} \Leftrightarrow k \le n-1< \frac{(n-4) + \sqrt{n(5n-8)}}{2} \Leftrightarrow s_2(u_2,v_2) < s_2(u_1,v_1) \) iff. \(k \le n-1\).

Appendix B

\(s_2^{\alpha }(u_2,v_2)= (1-\alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}} + \alpha \frac{(k+2)(k+1)}{min(n(n-1)+4k+2,\;n(n-1)+(k+2)(k+1)-2)} \)

It should be noted that \(min(n(n-1)+6,\;n(n-1)+4)=n(n-1)+4\) iff \(k=1\) and \(min(n(n-1)+4k+2,\;n(n-1)+(k+2)(k+1)-2) = n(n-1)+4k+2 \) iff \(k \ge 2\)

if \(\mathbf {(u_1,v_1) \in E_2 }\), we have the same cases as mentioned in the proof of 2S:

  • \( \mathbf {|\{u_1,v_1\} \cap \{u_0, v_0\}| \ne 1}\). In this case, we have \(s_2^{\alpha }(u_1,v_1) = 1\). Then,

    • if \(\mathbf {k=1}\): k.t. \(\frac{3}{\sqrt{(n+2)(n+1)}} < 1 \;\forall \; n \ge 4 \Leftrightarrow (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} \le (1-\alpha ) \;\;\forall \; \alpha \in [0 .. 1] \) and k.t. \(\frac{6}{n(n-1)+4}< 1 \;\forall \; n \ge 4 \Leftrightarrow \alpha \frac{6}{n(n-1)+4}< \alpha \;\forall \; \alpha \in \;]0 .. 1] \Leftrightarrow (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} + \alpha \frac{6}{n(n-1)+4} < 1 \;\;\forall \; \alpha \in \; ]0 .. 1]\)

    • if \(\mathbf {k\ge 2}\): k.t. \(\frac{k+2}{\sqrt{(n+2)(n+k)}} < 1 \; \forall \; 1\le k \le n\) (proved) \(\Leftrightarrow (1-\alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}}\)

      \(\le (1-\alpha ) \;\;\; \forall \; 1\le k \le n, \; \forall \; \alpha \in [0 .. 1] \) and wtp \(\frac{(k+2)(k+1)}{n(n-1)+4k+2} < 1 \) iff. \(k < n. \;\;\;\;\) \(\frac{(k+2)(k+1)}{n(n-1)+4k+2}< 1 \Leftrightarrow (k+2)(k+1)< n(n-1)+4k+2 \Leftrightarrow k^2 +3k+2-4k< n^2 -n +2 \Leftrightarrow k^2 -k< n^2 -n \Leftrightarrow k < n \).

      \(\Leftrightarrow \alpha \frac{(k+2)(k+1)}{n(n-1)+4k+2} < \alpha \) iff. \(k<n, \;\;\forall \; \alpha \in \;]0 .. 1] \Leftrightarrow (1-\alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}} + \alpha \frac{(k+2)(k+1)}{n(n-1)+4k+2} < 1 \) iff \(k<n, \;\;\forall \; \alpha \in \;]0 .. 1]. \)

  • \(\mathbf {|\{u_1,v_1\} \cap \{u_0, v_0\}| = 1}\). We have \(s_2^{\alpha }(u_1,v_1) = (1-\alpha ) \frac{n}{\sqrt{n(n+k)}}+ \alpha \). Then,

    • if \(\mathbf {k=1}\) : k.t. \( (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} \le (1-\alpha ) \frac{n}{\sqrt{n(n+1)}} \;\;\forall \;\; n \ge 4, \; \forall \; \alpha \in [0 .. 1] \) and \(\alpha \frac{6}{n(n-1)+4}< \alpha \;\; \forall \;\; n \ge 4, \; \forall \; \alpha \in \;]0 .. 1] \Leftrightarrow (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} + \alpha \frac{6}{n(n-1)+4} < (1-\alpha ) \frac{n}{\sqrt{n(n+1)}} + \alpha \;\;\; \forall \; \alpha \in \;]0 .. 1] \).

    • if \(\mathbf { 2 \le k < n-1 }\) : k.t. \(\frac{k+2}{\sqrt{(n+2)(n+k)}} - \frac{n}{\sqrt{n(n+k)}} < 0\) iff. \(k<n-1\) (proved) and \(\frac{(k+2)(k+1)}{n(n-1)+4k+2}-1 < 0 \) iff. \(k < n. \;\) (proved) \( \Leftrightarrow (1-\alpha ) [\frac{k+2}{\sqrt{(n+2)(n+k)}} - \frac{n}{\sqrt{n(n+k)}}] \le 0\) iff. \(k<n-1, \;\forall \; \alpha \in [0 .. 1]\) and \(\alpha [\frac{(k+2)(k+1)}{n(n-1)+4k+2}-1] < 0 \) iff. \(k< n, \;\forall \; \alpha \in \;]0 .. 1] \) \(\Leftrightarrow (1-\alpha )[\frac{k+2}{\sqrt{(n+2)(n+k)}} - \frac{n}{\sqrt{n(n+k)}}] + \alpha [\frac{(k+2)(k+1)}{n(n-1)+4k+2}-1] < 0 \) iff. \(k< n-1, \;\forall \; \alpha \in \;]0 .. 1] \Leftrightarrow (1-\alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}} + \alpha \frac{(k+2)(k+1)}{n(n-1)+4k+2} < (1-\alpha ) \frac{n}{\sqrt{n(n+k)}} +\alpha \) iff. \(k < n-1, \; \forall \; \alpha \in \;]0 .. 1].\)

    • if \(\mathbf { k=n-1}\) : let be \(a=(1-\alpha ) \frac{n+1}{\sqrt{(n+2)(2n-1)}} + \alpha \frac{(n+1)n}{(n+4)(n-1)+2}\) and \(b=(1-\alpha ) \frac{n}{\sqrt{n(2n-1)}}+ \alpha \Leftrightarrow a-b= (1-\alpha ) \frac{n+1}{\sqrt{(n+2)(2n-1)}} + \alpha \frac{(n+1)n}{(n+4)(n-1)+2} - (1-\alpha ) \frac{n}{\sqrt{n(2n-1)}} - \alpha = [\frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}}] - \alpha [ \frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}} + 1 - \frac{(n+1)n}{(n+4)(n-1)+2}]\)

      \(\Leftrightarrow a-b <0 \) iff. \(\alpha > \frac{ \frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}}}{\frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}} + 1 - \frac{(n+1)n}{(n+4)(n-1)+2}}\). Let’s consider

      \(f(n)= \frac{ \frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}}}{\frac{n+1}{\sqrt{(n+2)(2n-1)}} - \frac{n}{\sqrt{n(2n-1)}} + 1 - \frac{(n+1)n}{(n+4)(n-1)+2}}\) a continuous decreasing function on \([4, +\infty [ \). Calculating the limits : \(\displaystyle {\lim _{n \rightarrow +\infty }f(n)} = 0\) and \(\displaystyle {\lim _{n \rightarrow 4}f(n)} \approx 0.06 \Leftrightarrow 0 \le f(n) \le 0.06 \). Then, \( a<b \;\) iff. \( \alpha > 0.06\).

    • if \( \mathbf {k=n}\) : let be \(a = (1- \alpha ) \sqrt{ \frac{n+2}{2n}} + \alpha \; \) and \(b= (1- \alpha ) \frac{1}{\sqrt{2}} + \alpha \). Let’s consider \(g(n)=\sqrt{ \frac{n+2}{2n}}\) a continuous and strictly positive decreasing function on \([4,+\infty [ \). Calculating the limits : \( \displaystyle {\lim _{n \rightarrow +\infty }g(n)} = \frac{1}{\sqrt{2}}\) and \(\displaystyle {\lim _{n \rightarrow 4}g(n)} = \sqrt{\frac{3}{4}} \Leftrightarrow \frac{1}{\sqrt{2}} \le g(n) \le \sqrt{\frac{3}{4}} \Leftrightarrow g(n) \ge \frac{1}{\sqrt{2}} \Leftrightarrow (1-\alpha ) \;g(n) + \alpha \ge (1-\alpha ) \frac{1}{\sqrt{2}} + \alpha \Leftrightarrow a \ge b \Leftrightarrow \;\)if \( k=n, s_2^{\alpha }(u_2,v_2) \ge s_2^{\alpha }(u_1,v_1) \;\; \forall \; \alpha \in [0 .. 1]\).

    \(\implies \mathbf {s_2^{\alpha }(u_2,v_2) < s_2^{\alpha }(u_1,v_1)}\) iff. \(\mathbf {k\le n-1 }\) and \(\mathbf {\alpha > 0.06 }\).

if \(\mathbf {(u_1,v_1) \in E_1 }\), there are also two cases, which are the same in the proof of 2S:

  • \(\mathbf {|\{u_1,v_1\} \cap \{u_2, v_2\}| = 0}\) or \(\mathbf {(u_1,v_1) \in E_1 \cap E_3}\). In this case, \(s_2^{\alpha }(u_1,v_1) = 1\). Then, \(s_2^{\alpha }(u_2,v_2)< s_2^{\alpha }(u_1,v_1)\) iff \(k<n\) (proved).

  • \(\mathbf {|\{u_1,v_1\} \cap \{u_2, v_2\}| = 1}\). In this case, \( s_2^{\alpha }(u_1,v_1) = (1-\alpha )\frac{n}{\sqrt{n(n+2)}}+\alpha \). Then,

    • if \(\mathbf {k=1}\): k.t. \(\frac{3}{\sqrt{(n+2)(n+1)}} < \frac{n}{\sqrt{n(n+2)}} \Leftrightarrow (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} \le (1-\alpha ) \frac{n}{\sqrt{n(n+2)}} \; \forall \; \alpha \in [0 .. 1] , \; \forall \; n \ge 4 \). k.t. \(\frac{6}{n(n-1)+4}< 1 \Leftrightarrow \alpha \frac{6}{n(n-1)+4}< \alpha \;\forall \; \alpha \in \;]0 .. 1] \Leftrightarrow (1-\alpha ) \frac{3}{\sqrt{(n+2)(n+1)}} + \alpha \frac{6}{n(n-1)+4} < (1-\alpha )\frac{n}{\sqrt{n(n+2)}}+\alpha \;\forall \; \alpha \in \;]0 .. 1], \; \forall \; n \ge 4 \).

    • if \(\mathbf {k\ge 2}\): k.t. \(\frac{k+2}{\sqrt{(n+2)(n+k)}} < \frac{n}{\sqrt{n(n+2)}}\) iff \(k\le n-1\) (proved) \(\Leftrightarrow (1- \alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}} \le (1-\alpha ) \frac{n}{\sqrt{n(n+2)}} \) iff \(k\le n-1, \;\forall \; \alpha \in [0 .. 1].\) k.t.

      \(\frac{(k+2)(k+1)}{n(n-1)+4k+2} < 1 \) iff \(k < n.\) (proved) \(\Leftrightarrow \alpha \frac{(k+2)(k+1)}{n(n-1)+4k+2} < \alpha \;\forall \; \alpha \in \;]0 .. 1]\) \(\Leftrightarrow (1- \alpha ) \frac{k+2}{\sqrt{(n+2)(n+k)}} + \alpha \frac{(k+2)(k+1)}{n(n-1)+4k+2} < (1-\alpha ) \frac{n}{\sqrt{n(n+2)}} + \alpha \) iff \(k \le n-1 \;\forall \; \alpha \in \;]0 .. 1] \Leftrightarrow s_2^{\alpha }(u_2,v_2) < s_2^{\alpha }(u_1,v_1) \) iff. \(k\le n-1\).

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Hassine, M.B., Jabbour, S., Kmimech, M., Raddaoui, B., Graiet, M. (2023). A Non-overlapping Community Detection Approach Based on \(\alpha \)-Structural Similarity. In: Wrembel, R., Gamper, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2023. Lecture Notes in Computer Science, vol 14148. Springer, Cham. https://doi.org/10.1007/978-3-031-39831-5_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-39831-5_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39830-8

  • Online ISBN: 978-3-031-39831-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics