Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

Privacy-preserving eigenvector computation with applications in spectral clustering

  • Original Research
  • Published:
International Journal of Information Technology Aims and scope Submit manuscript

Abstract

Eigenvectors give many useful information about the data. One of the applications that benefits from eigenvectors is spectral clustering in which the nodes of a graph that can be a representation of a data set, will be clustered based on the spectrum (eigenvalues) of the Laplacian matrix. However, in scenarios where the data is distributed among multiple data owners, privacy of the data is an important concept. In the current paper, we propose a privacy-preserving protocol to compute eigenvectors of the distributed data using homomorphic encryption and Jacobi method with the aim of being employed in spectral clustering. We show that the computation overhead of each data owner in our iterative protocol is \(O(\frac{nf^2}{N})\), where n is the number of iterations, N is the total number of data owners and f is the total number of data records, which means that increasing the number of data owners leads to decreasing the computation overhead of each single data owner. Moreover, the total communication complexity of the whole protocol is \(O(nf^2)\). In comparison with previous known spectral clustering schemes PrivateGraph (Sharma in IEEE Trans Knowl Data Eng 31(5):981–995, 2018) and PrivGED (Wang et al. in IEEE Trans Knowl Data Eng, 2022), on one hand our proposed scheme only uses one computing server and on the other hand there is no single data user entity to get the final result and instead, all of the data owners get the result. The time consumption of our proposed protocol on “Parkinson Disease”, “Brain Networks”, “Cervical Cancer” and “Lung Cancer” data sets is 22.24 min, 18.38 min, 19.45 min and 3.84 min, respectively.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Algorithm 1
Algorithm 2
Algorithm 3
Fig. 1
Fig. 2

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Availability of data and materials

The data sets used during the current study are available in the UCI machine learning repository, http://archive.ics.uci.edu/ml

References

  1. Penna F, Stańczak S (2014) Decentralized eigenvalue algorithms for distributed signal detection in wireless networks. IEEE Trans Signal Process 63(2):427–440

    Article  MathSciNet  Google Scholar 

  2. Sahai T, Speranzon A, Banaszuk A (2010) Wave equation based algorithm for distributed eigenvector computation. In: 49th IEEE Conference on Decision and Control (CDC), pp. 7308–7315. IEEE

  3. Leonardos S, Preciado V, Daniilidis K (2017) A dynamical systems approach to distributed eigenvector computation. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 2209–2215. IEEE

  4. Wang S, Zheng Y, Jia X, Yi X (2022) Privacy-preserving analytics on decentralized social graphs: The case of eigen decomposition. IEEE Trans Knowl Data Eng

  5. Sharma S, Powers J, Chen K (2018) Privategraph: Privacy-preserving spectral analysis of encrypted graphs in the cloud. IEEE Trans Knowl Data Eng 31(5):981–995

    Article  Google Scholar 

  6. Zhang Y, Bai G, Li X, Curtis C, Chen C, Ko RK (2020) Privcoll: Practical privacy-preserving collaborative machine learning. In: Computer Security–ESORICS 2020: 25th European Symposium on Research in Computer Security, ESORICS 2020, Guildford, UK, September 14–18, 2020, Proceedings, Part I, pp. 399–418. Springer

  7. Bertrand A, Moonen M (2014) Distributed adaptive estimation of covariance matrix eigenvectors in wireless sensor networks with application to distributed pca. Signal Process 104:120–135

    Article  Google Scholar 

  8. Mises R, Pollaczek-Geiringer H (1929) Praktische verfahren der gleichungsauflösung. ZAMM-J Appl Math Mech 9(1):58–77

    Article  Google Scholar 

  9. Lanczos C (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators

  10. Pearson K (1901) Liii. on lines and planes of closest fit to systems of points in space. Lond Edinburgh Dublin Philos Mag J Sci 2(11):559–572

  11. Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417

    Article  Google Scholar 

  12. Jacobi CGJ (1846) Über ein leichtes verfahren die in der theorie der säcularstörungen vorkommenden gleichungen numerisch aufzulösen*

  13. Hewage U, Sinha R, Naeem MA (2023) Privacy-preserving data (stream) mining techniques and their impact on data mining accuracy: a systematic literature review. Artif Intell Rev 1–38

  14. Li Y, Xu W (2019) Privpy: General and scalable privacy-preserving data mining. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1299–1307

  15. Chamikara MAP, Bertok P, Liu D, Camtepe S, Khalil I (2020) Efficient privacy preservation of big data for accurate data mining. Inf Sci 527:420–443

    Article  MathSciNet  Google Scholar 

  16. Alguliyev RM, Aliguliyev RM, Abdullayeva FJ (2019) Privacy-preserving deep learning algorithm for big personal data analysis. J Ind Inf Integr 15:1–14

    Google Scholar 

  17. Mendes R, Vilela JP (2017) Privacy-preserving data mining: methods, metrics, and applications. IEEE Access 5:10562–10582

    Article  Google Scholar 

  18. Han X, Tong X, Fan Y (2023) Eigen selection in spectral clustering: a theory-guided practice. J Am Stat Assoc 118(541):109–121

    Article  MathSciNet  Google Scholar 

  19. Abbe E, Fan J, Wang K (2022) An l p theory of pca and spectral clustering. Ann Stat 50(4):2359–2385

    Article  MathSciNet  Google Scholar 

  20. Couillet R, Chatelain F, Le Bihan N (2021) Two-way kernel matrix puncturing: towards resource-efficient pca and spectral clustering. In: International Conference on Machine Learning, pp. 2156–2165. PMLR

  21. Hasan BMS, Abdulazeez AM (2021) A review of principal component analysis algorithm for dimensionality reduction. J Soft Comput Data Min 2(1):20–30

    Google Scholar 

  22. Sharma S, Powers J, Chen K (2016) Privacy-preserving spectral analysis of large graphs in public clouds. In: Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 71–82

  23. Al-Rubaie M, Wu P-y, Chang JM, Kung S-Y (2017) Privacy-preserving pca on horizontally-partitioned data. In: 2017 IEEE Conference on Dependable and Secure Computing, pp. 280–287. IEEE

  24. Liu Y, Chen C, Zheng L, Wang L, Zhou J, Liu G, Yang S (2020) Privacy preserving pca for multiparty modeling. arXiv preprint arXiv:2002.02091

  25. Li J, Wei J, Ye M, Liu W, Hu X (2020) Privacy-preserving constrained spectral clustering algorithm for large-scale data sets. IET Inf Secur 14(3):321–331

    Article  Google Scholar 

  26. Wang S, Chang JM (2018) Differentially private principal component analysis over horizontally partitioned data. In: 2018 IEEE Conference on Dependable and Secure Computing (DSC), pp. 1–8. IEEE

  27. Lin Z, Jaromczyk JW (2011) Privacy preserving spectral clustering over vertically partitioned data sets. In: 2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), vol. 2, pp. 1206–1211. IEEE

  28. Patil RY, Patil YH, Kachhoria R, Lonare S (2022) A provably secure data sharing scheme for smart gas distribution grid using fog computing. Int J Inf Technol 14(6):2927–2939

    Google Scholar 

  29. Patil S, Bhandari S, Thepade S, Raut R, Athawale SV (2022) Improved resilience of secret sharing scheme with augmented multifarious features. Int J Inf Technol 14(5):2633–2644

    Google Scholar 

  30. Narantsatsralt U-U, Kang S et al (2017) Social network community detection using agglomerative spectral clustering. Complexity 2017

  31. Symeonidis P, Mantas N (2013) Spectral clustering for link prediction in social networks with positive and negative links. Soc Netw Anal Min 3:1433–1447

    Article  Google Scholar 

  32. Wang X, Qian B, Davidson I (2014) On constrained spectral clustering and its applications. Data Min Knowl Disc 28:1–30

    Article  MathSciNet  Google Scholar 

  33. Lin L, Tang C, Dong G, Chen Z, Pan Z, Liu J, Yang Y, Shi J, Ji R, Hong W (2021) Spectral clustering to analyze the hidden events in single-molecule break junctions. J Phys Chem C 125(6):3623–3630

    Article  Google Scholar 

  34. Gan S, Cosgrove DA, Gardiner EJ, Gillet VJ (2014) Investigation of the use of spectral clustering for the analysis of molecular data. J Chem Inf Model 54(12):3302–3319

    Article  Google Scholar 

  35. Ren S, Zhang S, Wu T (2020) An improved spectral clustering community detection algorithm based on probability matrix. Discret Dyn Nat Soc 2020:1–6

    MathSciNet  Google Scholar 

  36. Mansano RE, Allem LE, Del-Vecchio RR, Hoppen C (2022) Balanced portfolio via signed graphs and spectral clustering in the Brazilian stock market. Qual Quant 56(4):2325–2340

    Article  Google Scholar 

  37. Kumar D, Kumar D (2023) A spectral-spatial 3d-convolutional capsule network for hyperspectral image classification with limited training samples. Int J Inf Technol 15(1):379–391

    Google Scholar 

  38. Leader JJ (2022) Numerical Analysis and Scientific Computation, 2nd edn. Chapman and Hall/CRC, Florida, United States. https://doi.org/10.1201/9781003042273

  39. Brakerski Z (2012) Fully homomorphic encryption without modulus switching from classical gapsvp. In: Advances in Cryptology–CRYPTO 2012: 32nd Annual Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2012. Proceedings, pp. 868–886. Springer

  40. Fan J, Vercauteren F (2012) Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive

  41. Ng A, Jordan M, Weiss Y (2001) On spectral clustering: Analysis and an algorithm. Adv Neural Inf Process Syst 14

  42. Dua D, Graff C (2017) UCI Machine Learning Repository. http://archive.ics.uci.edu/ml

  43. Zhou L, Li C (2016) Outsourcing eigen-decomposition and singular value decomposition of large matrix to a public cloud. IEEE Access 4:869–879

    Article  Google Scholar 

  44. Hu X, Tang C (2015) Secure outsourced computation of the characteristic polynomial and eigenvalues of matrix. J Cloud Comput 4(1):1–6

    Article  Google Scholar 

  45. Ünal AB, Akgün M, Pfeifer N (2022) Cecilia: Comprehensive secure machine learning framework. arXiv preprint arXiv:2202.03023

  46. Moon JF, Aktar S, Hashem M (2015) Securely outsourcing large scale eigen value problem to public cloud. In: 2015 18th International Conference on Computer and Information Technology (ICCIT), pp. 490–494. IEEE

  47. Pathak MA, Raj B (2010) Privacy preserving protocols for eigenvector computation. In: PSDML, pp. 113–126 . Springer

  48. Zhang Y, Zheng P, Luo W (2019) Privacy-preserving outsourcing computation of qr decomposition in the encrypted domain. In: 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE), pp. 389–396. IEEE

  49. Wang Y, Wu X, Wu L (2013) Differential privacy preserving spectral graph analysis. In: Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference, PAKDD 2013, Gold Coast, Australia, April 14-17, 2013, Proceedings, Part II 17, pp. 329–340. Springer

  50. Fan X, Wang G, Chen K, He X, Xu W (2021) Ppca: Privacy-preserving principal component analysis using secure multiparty computation (mpc). arXiv preprint arXiv:2105.07612

  51. Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In: Advances in Cryptology-EUROCRYPT’99: International Conference on the Theory and Application of Cryptographic Techniques Prague, Czech Republic, May 2–6, 1999 Proceedings 18, pp. 223–238. Springer

  52. ElGamal T (1985) A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans Inf Theory 31(4):469–472

    Article  MathSciNet  Google Scholar 

  53. Gentry C (2009) Fully homomorphic encryption using ideal lattices. In: Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, pp. 169–178

  54. Brakerski Z, Gentry C, Vaikuntanathan V (2014) (leveled) fully homomorphic encryption without bootstrapping. ACM Trans Comput Theory (TOCT) 6(3):1–36

    Article  MathSciNet  Google Scholar 

  55. López-Alt A, Tromer E, Vaikuntanathan V (2012) On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: Proceedings of the Forty-fourth Annual ACM Symposium on Theory of Computing, pp. 1219–1234

  56. Spielman D (2012) Spectral graph theory. Combin Sci Comput 18:18

    MathSciNet  Google Scholar 

  57. Chaudhary M, Pruthi J, Jain VK (2022) Suryakant: A novel squirrel search clustering algorithm for text document clustering. Int J Inf Technol 14(6):3277–3286

    Google Scholar 

  58. Mann SK, Chawla S (2023) A proposed hybrid clustering algorithm using k-means and birch for cluster based cab recommender system (cbcrs). Int J Inf Technol 15(1):219–227

    Google Scholar 

  59. Fawaz SM, Belal N, ElRefaey A, Fakhr MW (2021) A comparative study of homomorphic encryption schemes using microsoft seal. J Phys Conf Ser 2128:012021

Download references

Acknowledgements

The authors would like to acknowledge the financial support of Information and Communication Technology Park for this project under grant number 01-99-02-000282.

Funding

The funding was provided by the Information and Communication Technology Park under grant number 01-99-02-000282.

Author information

Authors and Affiliations

Authors

Contributions

MJ came up with the idea, contributed to the design of the protocol, wrote some sections and draw the figure and tables. HM contributed to the design of the protocol, wrote some sections and performed the final corrections. Both authors have reviewed the manuscript and they have agreed on publishing the current version of the manuscript.

Corresponding author

Correspondence to Hamid Mala.

Ethics declarations

Conflict of interest

The authors declare that they have no Conflict of interest to report regarding the present work.

Consent for publication

All authors have agreed and given their consent for submission of this paper to International Journal of Information Technology (IJIT).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Jaberi, M., Mala, H. Privacy-preserving eigenvector computation with applications in spectral clustering. Int. j. inf. tecnol. (2024). https://doi.org/10.1007/s41870-024-01815-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s41870-024-01815-z

Keywords

Navigation