DOI: 10.1145/3597503.3639107
research-article

Are Your Requests Your True Needs? Checking Excessive Data Collection in VPA App

Published: 12 April 2024

Abstract

Virtual personal assistant (VPA) services host a large number of third-party applications (or apps) that enrich their functionality. These apps have been closely examined to check their data collection behaviors against their declared privacy policies. Nonetheless, it is often overlooked that most users ignore privacy policies at installation time. Dishonest developers can thus exploit this situation by embedding excessive declarations that cover their data collection behaviors during compliance auditing.
In this work, we present Pico, a privacy inconsistency detector, which checks a VPA app's privacy compliance by analyzing the (in)consistency between the data it requests and the data essential for its functionality. Pico infers the app's functionality topics from its publicly available textual data, and leverages advanced GPT-based language models to address domain-specific challenges. By comparing an app against counterparts with similar functionality, suspicious data collection can be detected through the lens of anomaly detection. We apply Pico to understand the status quo of data-functionality compliance among all 65,195 skills in the Alexa app store. Our study reveals that 21.7% of the analyzed skills exhibit suspicious data collection, including Top 10 popular Alexa skills that pose threats to 54,116 users. These findings should alert both developers and users to the need for compliance with the purpose limitation principle in data regulations.
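
The counterpart-comparison idea can be illustrated with a minimal sketch. This is not Pico's implementation: it swaps in a simple frequency rule for the anomaly detector, and the field names (`name`, `topic`, `permissions`), the sample skills, and the `min_support` threshold are all hypothetical. A permission is flagged when fewer than `min_support` of the skills sharing a functionality topic request it.

```python
from collections import Counter, defaultdict


def flag_suspicious(skills, min_support=0.2):
    """Flag (skill, permission) pairs that are rare among same-topic counterparts.

    skills: list of dicts with hypothetical keys 'name', 'topic', 'permissions'.
    min_support: assumed threshold; a permission requested by fewer than this
    fraction of a topic's skills is treated as anomalous.
    """
    # Group skills by their (already inferred) functionality topic.
    by_topic = defaultdict(list)
    for skill in skills:
        by_topic[skill["topic"]].append(skill)

    flagged = []
    for group in by_topic.values():
        # Count how many skills in the group request each permission.
        counts = Counter(p for s in group for p in set(s["permissions"]))
        for s in group:
            for p in sorted(set(s["permissions"])):
                if counts[p] / len(group) < min_support:
                    flagged.append((s["name"], p))
    return flagged


# Hypothetical sample data: six weather skills, one of which also asks
# for a phone number that none of its counterparts request.
skills = [
    {"name": "weather_a", "topic": "weather", "permissions": ["postal_code"]},
    {"name": "weather_b", "topic": "weather", "permissions": ["postal_code"]},
    {"name": "weather_c", "topic": "weather", "permissions": ["postal_code"]},
    {"name": "weather_d", "topic": "weather", "permissions": ["postal_code"]},
    {"name": "weather_e", "topic": "weather", "permissions": ["postal_code"]},
    {"name": "weather_f", "topic": "weather",
     "permissions": ["postal_code", "phone_number"]},
]

print(flag_suspicious(skills))
```

In this toy data only `weather_f` requesting `phone_number` falls below the threshold (1 of 6 counterparts). A rule this simple cannot flag anything in a topic with a single skill, which is one reason a real system would need a more robust anomaly detector and larger counterpart groups.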


    Published In

    ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
    May 2024
    2942 pages
    ISBN:9798400702174
    DOI:10.1145/3597503
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    In-Cooperation

    • Faculty of Engineering of University of Porto

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. virtual personal assistant
    2. privacy compliance
    3. alexa skills

    Qualifiers

    • Research-article

    Conference

    ICSE '24

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 86
    • Downloads (Last 12 months): 86
    • Downloads (Last 6 weeks): 12
    Reflects downloads up to 25 Nov 2024