Abstract
Eigenvectors give many useful information about the data. One of the applications that benefits from eigenvectors is spectral clustering in which the nodes of a graph that can be a representation of a data set, will be clustered based on the spectrum (eigenvalues) of the Laplacian matrix. However, in scenarios where the data is distributed among multiple data owners, privacy of the data is an important concept. In the current paper, we propose a privacy-preserving protocol to compute eigenvectors of the distributed data using homomorphic encryption and Jacobi method with the aim of being employed in spectral clustering. We show that the computation overhead of each data owner in our iterative protocol is \(O(\frac{nf^2}{N})\), where n is the number of iterations, N is the total number of data owners and f is the total number of data records, which means that increasing the number of data owners leads to decreasing the computation overhead of each single data owner. Moreover, the total communication complexity of the whole protocol is \(O(nf^2)\). In comparison with previous known spectral clustering schemes PrivateGraph (Sharma in IEEE Trans Knowl Data Eng 31(5):981–995, 2018) and PrivGED (Wang et al. in IEEE Trans Knowl Data Eng, 2022), on one hand our proposed scheme only uses one computing server and on the other hand there is no single data user entity to get the final result and instead, all of the data owners get the result. The time consumption of our proposed protocol on “Parkinson Disease”, “Brain Networks”, “Cervical Cancer” and “Lung Cancer” data sets is 22.24 min, 18.38 min, 19.45 min and 3.84 min, respectively.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Availability of data and materials
The data sets used during the current study are available in the UCI machine learning repository, http://archive.ics.uci.edu/ml
References
Penna F, Stańczak S (2014) Decentralized eigenvalue algorithms for distributed signal detection in wireless networks. IEEE Trans Signal Process 63(2):427–440
Sahai T, Speranzon A, Banaszuk A (2010) Wave equation based algorithm for distributed eigenvector computation. In: 49th IEEE Conference on Decision and Control (CDC), pp. 7308–7315. IEEE
Leonardos S, Preciado V, Daniilidis K (2017) A dynamical systems approach to distributed eigenvector computation. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 2209–2215. IEEE
Wang S, Zheng Y, Jia X, Yi X (2022) Privacy-preserving analytics on decentralized social graphs: The case of eigen decomposition. IEEE Trans Knowl Data Eng
Sharma S, Powers J, Chen K (2018) Privategraph: Privacy-preserving spectral analysis of encrypted graphs in the cloud. IEEE Trans Knowl Data Eng 31(5):981–995
Zhang Y, Bai G, Li X, Curtis C, Chen C, Ko RK (2020) Privcoll: Practical privacy-preserving collaborative machine learning. In: Computer Security–ESORICS 2020: 25th European Symposium on Research in Computer Security, ESORICS 2020, Guildford, UK, September 14–18, 2020, Proceedings, Part I, pp. 399–418. Springer
Bertrand A, Moonen M (2014) Distributed adaptive estimation of covariance matrix eigenvectors in wireless sensor networks with application to distributed pca. Signal Process 104:120–135
Mises R, Pollaczek-Geiringer H (1929) Praktische verfahren der gleichungsauflösung. ZAMM-J Appl Math Mech 9(1):58–77
Lanczos C (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators
Pearson K (1901) Liii. on lines and planes of closest fit to systems of points in space. Lond Edinburgh Dublin Philos Mag J Sci 2(11):559–572
Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417
Jacobi CGJ (1846) Über ein leichtes verfahren die in der theorie der säcularstörungen vorkommenden gleichungen numerisch aufzulösen*
Hewage U, Sinha R, Naeem MA (2023) Privacy-preserving data (stream) mining techniques and their impact on data mining accuracy: a systematic literature review. Artif Intell Rev 1–38
Li Y, Xu W (2019) Privpy: General and scalable privacy-preserving data mining. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1299–1307
Chamikara MAP, Bertok P, Liu D, Camtepe S, Khalil I (2020) Efficient privacy preservation of big data for accurate data mining. Inf Sci 527:420–443
Alguliyev RM, Aliguliyev RM, Abdullayeva FJ (2019) Privacy-preserving deep learning algorithm for big personal data analysis. J Ind Inf Integr 15:1–14
Mendes R, Vilela JP (2017) Privacy-preserving data mining: methods, metrics, and applications. IEEE Access 5:10562–10582
Han X, Tong X, Fan Y (2023) Eigen selection in spectral clustering: a theory-guided practice. J Am Stat Assoc 118(541):109–121
Abbe E, Fan J, Wang K (2022) An l p theory of pca and spectral clustering. Ann Stat 50(4):2359–2385
Couillet R, Chatelain F, Le Bihan N (2021) Two-way kernel matrix puncturing: towards resource-efficient pca and spectral clustering. In: International Conference on Machine Learning, pp. 2156–2165. PMLR
Hasan BMS, Abdulazeez AM (2021) A review of principal component analysis algorithm for dimensionality reduction. J Soft Comput Data Min 2(1):20–30
Sharma S, Powers J, Chen K (2016) Privacy-preserving spectral analysis of large graphs in public clouds. In: Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 71–82
Al-Rubaie M, Wu P-y, Chang JM, Kung S-Y (2017) Privacy-preserving pca on horizontally-partitioned data. In: 2017 IEEE Conference on Dependable and Secure Computing, pp. 280–287. IEEE
Liu Y, Chen C, Zheng L, Wang L, Zhou J, Liu G, Yang S (2020) Privacy preserving pca for multiparty modeling. arXiv preprint arXiv:2002.02091
Li J, Wei J, Ye M, Liu W, Hu X (2020) Privacy-preserving constrained spectral clustering algorithm for large-scale data sets. IET Inf Secur 14(3):321–331
Wang S, Chang JM (2018) Differentially private principal component analysis over horizontally partitioned data. In: 2018 IEEE Conference on Dependable and Secure Computing (DSC), pp. 1–8. IEEE
Lin Z, Jaromczyk JW (2011) Privacy preserving spectral clustering over vertically partitioned data sets. In: 2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), vol. 2, pp. 1206–1211. IEEE
Patil RY, Patil YH, Kachhoria R, Lonare S (2022) A provably secure data sharing scheme for smart gas distribution grid using fog computing. Int J Inf Technol 14(6):2927–2939
Patil S, Bhandari S, Thepade S, Raut R, Athawale SV (2022) Improved resilience of secret sharing scheme with augmented multifarious features. Int J Inf Technol 14(5):2633–2644
Narantsatsralt U-U, Kang S et al (2017) Social network community detection using agglomerative spectral clustering. Complexity 2017
Symeonidis P, Mantas N (2013) Spectral clustering for link prediction in social networks with positive and negative links. Soc Netw Anal Min 3:1433–1447
Wang X, Qian B, Davidson I (2014) On constrained spectral clustering and its applications. Data Min Knowl Disc 28:1–30
Lin L, Tang C, Dong G, Chen Z, Pan Z, Liu J, Yang Y, Shi J, Ji R, Hong W (2021) Spectral clustering to analyze the hidden events in single-molecule break junctions. J Phys Chem C 125(6):3623–3630
Gan S, Cosgrove DA, Gardiner EJ, Gillet VJ (2014) Investigation of the use of spectral clustering for the analysis of molecular data. J Chem Inf Model 54(12):3302–3319
Ren S, Zhang S, Wu T (2020) An improved spectral clustering community detection algorithm based on probability matrix. Discret Dyn Nat Soc 2020:1–6
Mansano RE, Allem LE, Del-Vecchio RR, Hoppen C (2022) Balanced portfolio via signed graphs and spectral clustering in the Brazilian stock market. Qual Quant 56(4):2325–2340
Kumar D, Kumar D (2023) A spectral-spatial 3d-convolutional capsule network for hyperspectral image classification with limited training samples. Int J Inf Technol 15(1):379–391
Leader JJ (2022) Numerical Analysis and Scientific Computation, 2nd edn. Chapman and Hall/CRC, Florida, United States. https://doi.org/10.1201/9781003042273
Brakerski Z (2012) Fully homomorphic encryption without modulus switching from classical gapsvp. In: Advances in Cryptology–CRYPTO 2012: 32nd Annual Cryptology Conference, Santa Barbara, CA, USA, August 19-23, 2012. Proceedings, pp. 868–886. Springer
Fan J, Vercauteren F (2012) Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive
Ng A, Jordan M, Weiss Y (2001) On spectral clustering: Analysis and an algorithm. Adv Neural Inf Process Syst 14
Dua D, Graff C (2017) UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Zhou L, Li C (2016) Outsourcing eigen-decomposition and singular value decomposition of large matrix to a public cloud. IEEE Access 4:869–879
Hu X, Tang C (2015) Secure outsourced computation of the characteristic polynomial and eigenvalues of matrix. J Cloud Comput 4(1):1–6
Ünal AB, Akgün M, Pfeifer N (2022) Cecilia: Comprehensive secure machine learning framework. arXiv preprint arXiv:2202.03023
Moon JF, Aktar S, Hashem M (2015) Securely outsourcing large scale eigen value problem to public cloud. In: 2015 18th International Conference on Computer and Information Technology (ICCIT), pp. 490–494. IEEE
Pathak MA, Raj B (2010) Privacy preserving protocols for eigenvector computation. In: PSDML, pp. 113–126 . Springer
Zhang Y, Zheng P, Luo W (2019) Privacy-preserving outsourcing computation of qr decomposition in the encrypted domain. In: 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE), pp. 389–396. IEEE
Wang Y, Wu X, Wu L (2013) Differential privacy preserving spectral graph analysis. In: Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference, PAKDD 2013, Gold Coast, Australia, April 14-17, 2013, Proceedings, Part II 17, pp. 329–340. Springer
Fan X, Wang G, Chen K, He X, Xu W (2021) Ppca: Privacy-preserving principal component analysis using secure multiparty computation (mpc). arXiv preprint arXiv:2105.07612
Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In: Advances in Cryptology-EUROCRYPT’99: International Conference on the Theory and Application of Cryptographic Techniques Prague, Czech Republic, May 2–6, 1999 Proceedings 18, pp. 223–238. Springer
ElGamal T (1985) A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans Inf Theory 31(4):469–472
Gentry C (2009) Fully homomorphic encryption using ideal lattices. In: Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, pp. 169–178
Brakerski Z, Gentry C, Vaikuntanathan V (2014) (leveled) fully homomorphic encryption without bootstrapping. ACM Trans Comput Theory (TOCT) 6(3):1–36
López-Alt A, Tromer E, Vaikuntanathan V (2012) On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In: Proceedings of the Forty-fourth Annual ACM Symposium on Theory of Computing, pp. 1219–1234
Spielman D (2012) Spectral graph theory. Combin Sci Comput 18:18
Chaudhary M, Pruthi J, Jain VK (2022) Suryakant: A novel squirrel search clustering algorithm for text document clustering. Int J Inf Technol 14(6):3277–3286
Mann SK, Chawla S (2023) A proposed hybrid clustering algorithm using k-means and birch for cluster based cab recommender system (cbcrs). Int J Inf Technol 15(1):219–227
Fawaz SM, Belal N, ElRefaey A, Fakhr MW (2021) A comparative study of homomorphic encryption schemes using microsoft seal. J Phys Conf Ser 2128:012021
Acknowledgements
The authors would like to acknowledge the financial support of Information and Communication Technology Park for this project under grant number 01-99-02-000282.
Funding
The funding was provided by the Information and Communication Technology Park under grant number 01-99-02-000282.
Author information
Authors and Affiliations
Contributions
MJ came up with the idea, contributed to the design of the protocol, wrote some sections and draw the figure and tables. HM contributed to the design of the protocol, wrote some sections and performed the final corrections. Both authors have reviewed the manuscript and they have agreed on publishing the current version of the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no Conflict of interest to report regarding the present work.
Consent for publication
All authors have agreed and given their consent for submission of this paper to International Journal of Information Technology (IJIT).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jaberi, M., Mala, H. Privacy-preserving eigenvector computation with applications in spectral clustering. Int. j. inf. tecnol. (2024). https://doi.org/10.1007/s41870-024-01815-z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s41870-024-01815-z