DOI: 10.1145/3240323.3240325
Extended abstract

Comparing recommender systems using synthetic data

Published: 27 September 2018

Abstract

In this work, we propose SynRec, a data protection framework based on data synthesis. The goal is to protect sensitive information in the user-item matrix by replacing original values with synthetic values or, alternatively, by synthesizing entirely new users. The synthetic data must fulfill two requirements: first, it must no longer be possible to derive certain sensitive information from the data; second, the synthetic data must remain usable for comparing recommender systems. SynRec is a step towards enabling companies to release recommender system data to the research community for developing new algorithms, for example in the context of recommender system challenges. We report the results of preliminary experiments that provide a proof of concept, and describe future research directions, i.e., the challenges that must be addressed to make the framework useful in practice.
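The two requirements in the abstract can be made concrete with a small experiment. The following is not SynRec; it is a minimal sketch under stated assumptions. It covers only the first mode (replacing original values), using a deliberately simple synthesizer that resamples each replaced rating from the observed ratings of the same item. It then probes the second requirement by ranking three mean-based rating predictors by RMSE on the original and on the synthetic data and comparing the two rankings with Kendall's tau. All function names (`toy_ratings`, `partially_synthesize`, `fit_mean_predictor`, `rank_baselines`, `kendall_tau`), the toy data generator, the 80/20 split, and the use of RMSE plus Kendall's tau as the utility measure are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def toy_ratings(n_users, n_items, density, rng):
    """Hypothetical toy data: {(user, item): rating} on a 1-5 scale with user and item biases."""
    user_bias = [rng.gauss(0, 0.7) for _ in range(n_users)]
    item_bias = [rng.gauss(0, 0.7) for _ in range(n_items)]
    ratings = {}
    for u in range(n_users):
        for i in range(n_items):
            if rng.random() < density:
                r = 3 + user_bias[u] + item_bias[i] + rng.gauss(0, 0.5)
                ratings[(u, i)] = float(min(5, max(1, round(r))))
    return ratings


def partially_synthesize(ratings, fraction, rng):
    """Replace a random `fraction` of observed ratings with a value drawn from
    the same item's observed ratings (a simple per-item resampling synthesizer,
    assumed here for illustration)."""
    by_item = defaultdict(list)
    for (_, i), r in ratings.items():
        by_item[i].append(r)
    synthetic = dict(ratings)
    keys = sorted(ratings)
    for key in rng.sample(keys, int(fraction * len(keys))):
        synthetic[key] = rng.choice(by_item[key[1]])
    return synthetic


def split(ratings, test_fraction, rng):
    """Random train/test split of the observed (user, item) pairs."""
    keys = sorted(ratings)
    rng.shuffle(keys)
    n_test = int(test_fraction * len(keys))
    test = {k: ratings[k] for k in keys[:n_test]}
    train = {k: ratings[k] for k in keys[n_test:]}
    return train, test


def fit_mean_predictor(train, key_pos):
    """key_pos=None: global mean; 0: per-user mean; 1: per-item mean."""
    global_mean = sum(train.values()) / len(train)
    if key_pos is None:
        return lambda u, i: global_mean
    groups = defaultdict(list)
    for key, r in train.items():
        groups[key[key_pos]].append(r)
    means = {g: sum(rs) / len(rs) for g, rs in groups.items()}
    # Fall back to the global mean for users/items not seen in training.
    return lambda u, i: means.get((u, i)[key_pos], global_mean)


def rmse(predict, test):
    return (sum((predict(u, i) - r) ** 2 for (u, i), r in test.items()) / len(test)) ** 0.5


BASELINES = {"global_mean": None, "user_mean": 0, "item_mean": 1}


def rank_baselines(ratings, seed):
    """Names of the baselines ordered from lowest to highest test RMSE."""
    train, test = split(ratings, 0.2, random.Random(seed))
    scores = {name: rmse(fit_mean_predictor(train, pos), test) for name, pos in BASELINES.items()}
    return sorted(scores, key=scores.get)


def kendall_tau(order_a, order_b):
    """Kendall's tau between two strict rankings of the same items (no ties)."""
    pos_b = {name: k for k, name in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):  # x is ranked above y in order_a
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    original = toy_ratings(200, 50, 0.2, random.Random(0))
    synthetic = partially_synthesize(original, 0.3, random.Random(1))
    changed = sum(original[k] != synthetic[k] for k in original)
    print(f"{changed} of {len(original)} ratings changed")
    # Same seed -> same split, because partial synthesis keeps the same (user, item) keys.
    rank_orig = rank_baselines(original, seed=2)
    rank_syn = rank_baselines(synthetic, seed=2)
    print(rank_orig, rank_syn, "Kendall tau:", kendall_tau(rank_orig, rank_syn))
```

A Kendall's tau of 1.0 means the synthetic data ranks the baselines exactly as the original data does. Per-item resampling keeps each item's rating distribution approximately intact while breaking the link between a replaced rating and the user who gave it; note that it leaves the set of items each user rated unchanged. The sketch does not test the first requirement: showing that sensitive information can no longer be derived would need an explicit attack model (for instance, an attribute-inference or record-linkage attack), which is outside its scope.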

    Published In

    RecSys '18: Proceedings of the 12th ACM Conference on Recommender Systems
    September 2018
    600 pages
    ISBN: 9781450359016
    DOI: 10.1145/3240323
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. disclosure control
    2. preference hiding
    3. recommendation
    4. synthetic data

    Qualifiers

    • Extended-abstract

    Conference

    RecSys '18: Twelfth ACM Conference on Recommender Systems
    October 2, 2018
    Vancouver, British Columbia, Canada

    Acceptance Rates

    RecSys '18 Paper Acceptance Rate 32 of 181 submissions, 18%;
    Overall Acceptance Rate 254 of 1,295 submissions, 20%

    Bibliometrics

    • Downloads (last 12 months): 76
    • Downloads (last 6 weeks): 13
    Reflects downloads up to 24 Nov 2024

    Cited By

    • (2024) A Recommender System for Educational Planning. Cybernetics and Information Technologies 24:2 (67-85). DOI: 10.2478/cait-2024-0016. Online publication date: 1-Jun-2024
    • (2024) Informed Dataset Selection with ‘Algorithm Performance Spaces’. Proceedings of the 18th ACM Conference on Recommender Systems (1085-1090). DOI: 10.1145/3640457.3691704. Online publication date: 8-Oct-2024
    • (2024) Enhancing Privacy in Recommender Systems through Differential Privacy Techniques. Proceedings of the 18th ACM Conference on Recommender Systems (1348-1352). DOI: 10.1145/3640457.3688019. Online publication date: 8-Oct-2024
    • (2024) Simulating News Recommendation Ecosystems for Insights and Implications. IEEE Transactions on Computational Social Systems 11:5 (5699-5713). DOI: 10.1109/TCSS.2024.3381329. Online publication date: Oct-2024
    • (2024) SNOOKER: a dataset generator for helpdesk services. Journal of Intelligent Information Systems. DOI: 10.1007/s10844-024-00905-5. Online publication date: 8-Nov-2024
    • (2024) AUTO-DataGenCARS+: An Advanced User-Oriented Tool to Generate Data for the Evaluation of Recommender Systems. Advances in Mobile Computing and Multimedia Intelligence (176-191). DOI: 10.1007/978-3-031-78049-3_16. Online publication date: 28-Nov-2024
    • (2024) Leveraging Artificial Intelligence Models Using Synthetic Data. Proceedings of the Future Technologies Conference (FTC) 2024, Volume 1 (56-66). DOI: 10.1007/978-3-031-73110-5_4. Online publication date: 5-Nov-2024
    • (2023) A Generative Adversarial Approach with Social Relationship for Recommender Systems. Proceedings of the 2023 International Conference on Power, Communication, Computing and Networking Technologies (1-5). DOI: 10.1145/3630138.3630424. Online publication date: 24-Sep-2023
    • (2023) Performance Ranking of Recommender Systems on Simulated Data. Procedia Computer Science 212:C (142-151). DOI: 10.1016/j.procs.2022.10.216. Online publication date: 20-Jan-2023
    • (2022) Report on the 1st simulation for information retrieval workshop (Sim4IR 2021) at SIGIR 2021. ACM SIGIR Forum 55:2 (1-16). DOI: 10.1145/3527546.3527559. Online publication date: 17-Mar-2022
