Efficient approximation and privacy preservation algorithms for real time online evolving data streams

World Wide Web

Abstract

Because it requires processing continuous, large, unstructured streams of data, mining real-time streaming data is a more challenging research problem than mining static data. Privacy concerns arise whenever streaming data contains sensitive information. In recent years, research on the anonymization of static data has made significant progress, and the two typical strategies for anonymizing quasi-identifiers are generalization and suppression. However, the high dynamicity and potentially infinite length of streaming data make its anonymization a challenging task. To this end, we propose a novel Efficient Approximation and Privacy Preservation Algorithms (EAPPA) framework that efficiently pre-processes live streaming data and preserves its privacy with minimal Information Loss (IL) and computational cost. Because existing privacy preservation solutions for streaming data suffer from redundant data, we first propose an efficient data approximation technique combined with data pre-processing. We design a Flajolet-Martin (FM) based approach, with a data cleaning mechanism, for robust and efficient approximation of the number of unique elements in the data stream. The periodically approximated and pre-processed streaming data are then fed to the anonymization algorithm. Using adaptive clustering, we propose novel k-anonymization and l-diversity schemes for data streams. The proposed approach scans the stream to detect and reuse clusters that satisfy the k-anonymity and l-diversity criteria, reducing both anonymization time and IL. The experimental results demonstrate the efficiency of the EAPPA framework compared with state-of-the-art methods.
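The clustering-based anonymization step can be illustrated with a much simpler stand-in. The following Python sketch is hypothetical and is not the EAPPA adaptive-clustering algorithm. It buffers incoming records until the buffer is k-anonymous (at least k records) and distinct l-diverse (at least l different sensitive values). It then publishes the buffer with age generalized to a [min, max] range and ZIP code suppressed to a common prefix. Whenever an already-published generalization covers a new record, it reuses that generalization, in the spirit of the cluster reuse described in the abstract and of CASTLE-style stream anonymizers. The record schema (age, zipcode, disease), the information loss formula, and the age domain (17, 90) are assumptions for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    age: int      # numeric quasi-identifier
    zipcode: str  # categorical quasi-identifier (fixed-length string)
    disease: str  # sensitive attribute


def is_k_anonymous(cluster, k):
    # A cluster may be released as one equivalence class if it has >= k records.
    return len(cluster) >= k


def is_l_diverse(cluster, l):
    # Distinct l-diversity: at least l different sensitive values.
    return len({r.disease for r in cluster}) >= l


def generalize(cluster):
    # Generalization for age ([min, max]); suppression for ZIP (prefix + '*').
    ages = [r.age for r in cluster]
    zips = [r.zipcode for r in cluster]
    prefix = os.path.commonprefix(zips)  # character-wise common prefix
    return (min(ages), max(ages)), prefix + "*" * (len(zips[0]) - len(prefix))


def covers(gen, rec):
    # True if a published generalization already covers this record.
    (lo, hi), zip_gen = gen
    return lo <= rec.age <= hi and len(zip_gen) == len(rec.zipcode) and all(
        g == "*" or g == c for g, c in zip(zip_gen, rec.zipcode)
    )


def information_loss(gen, age_domain):
    # Hypothetical IL: mean of normalized age-range width and ZIP masking ratio.
    (lo, hi), zip_gen = gen
    age_il = (hi - lo) / (age_domain[1] - age_domain[0])
    zip_il = zip_gen.count("*") / len(zip_gen)
    return (age_il + zip_il) / 2


def anonymize_stream(stream, k, l):
    published = []  # generalizations already released (candidates for reuse)
    buffer = []     # records waiting to form a k-anonymous, l-diverse cluster
    released = []   # released tuples: (generalization, sensitive value)
    for rec in stream:
        reusable = next((g for g in published if covers(g, rec)), None)
        if reusable is not None:
            released.append((reusable, rec.disease))  # reuse, no new cluster
            continue
        buffer.append(rec)
        if is_k_anonymous(buffer, k) and is_l_diverse(buffer, l):
            gen = generalize(buffer)
            published.append(gen)
            released.extend((gen, r.disease) for r in buffer)
            buffer = []
    return released, buffer  # leftover buffer is withheld, not released


# Usage on a tiny hypothetical stream (not data from the paper).
stream = [
    Record(34, "41101", "flu"),
    Record(36, "41102", "asthma"),
    Record(35, "41105", "flu"),       # buffer becomes 3-anonymous and 2-diverse
    Record(35, "41103", "diabetes"),  # covered by ((34, 36), "4110*"), so reused
    Record(61, "52001", "flu"),
    Record(64, "52009", "flu"),       # buffer too small, stays pending
]
released, pending = anonymize_stream(stream, k=3, l=2)
for gen, sensitive in released:
    print(gen, sensitive, round(information_loss(gen, (17, 90)), 3))
print("pending:", len(pending))  # pending: 2
```

Records still in the buffer when the stream ends are withheld rather than released. Practical stream anonymizers such as CASTLE additionally bound how long a record may wait before it is released.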

Data availability

We conduct experiments on one real-world dataset, the Adult dataset from the UCI Machine Learning Repository (2020), available at https://archive.ics.uci.edu/ml/datasets/Adult.

Funding

Not applicable.

Author information

Contributions

The research presented in this paper is part of the Ph.D. research of research scholar Rahul A. Patil, carried out under the guidance and supervision of Dr. Pramod D. Patil.

Corresponding author

Correspondence to Rahul A. Patil.

Ethics declarations

Ethical approval

Not applicable.

Competing interests

Not applicable; the authors have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Privacy and Security in Machine Learning

Guest Editors: Jin Li, Francesco Palmieri and Changyu Dong

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Patil, R.A., Patil, P.D. Efficient approximation and privacy preservation algorithms for real time online evolving data streams. World Wide Web 27, 5 (2024). https://doi.org/10.1007/s11280-024-01244-9

  • DOI: https://doi.org/10.1007/s11280-024-01244-9
