A Question Answering Tool for Website Privacy Policy Comprehension

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14045)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

Every day we interact with online services from companies that ask for our permission to use our personal information. Nowadays it is common practice for websites and apps to collect large amounts of data, which are mainly used for revenue optimization based on user analytics. This collection and use of customer data is regulated by legal agreements (i.e., privacy and cookie policies) that we are required to accept, often multiple times a day, but that are generally very long and worded in a way that makes them difficult for the general public to interpret. An average privacy policy takes 15 min to read and contains a lot of legal jargon (e.g., terms such as “data controller” and “legal basis for processing”). In this research project, we are developing a support system in which users can search for concrete answers in the privacy policies of companies or websites by formulating their questions in natural language. Instead of blindly accepting a privacy policy, a user could first query the system about a potential concern. The system will return a ranked list of phrases and documents matching the query. If the generated answer is not sufficient, an extension will allow the user to forward complex requests to best-matching legal professionals who specialize in privacy legislation and can process them for a small fee. We present different aspects of the internal implementation, including the identification of relevant spans in unstructured privacy policies and the selection of the best-suited NLP model for this specific task. Initial results of a user evaluation are presented and point in promising directions. Finally, we outline future research directions for extending the system.
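The abstract says the system returns a ranked list of both phrases and documents. One way to produce both levels is to score individual sentences first and then rank each document by its best sentence. The sketch below assumes the sentence scores already exist (from whatever retrieval model is used) and uses max-pooling as the aggregation rule; the `SpanHit` class, the `rank_documents` function, the example sentences and the scores are all hypothetical illustrations, not the chapter's implementation.

```python
from dataclasses import dataclass


@dataclass
class SpanHit:
    """One candidate span (here: a sentence) with a relevance score from some retriever."""
    doc_id: str
    sentence: str
    score: float


def rank_documents(hits: list[SpanHit], spans_per_doc: int = 3) -> list[tuple[str, float, list[SpanHit]]]:
    """Group span hits by document; rank documents by their best span score (max-pooling)."""
    by_doc: dict[str, list[SpanHit]] = {}
    for hit in hits:
        by_doc.setdefault(hit.doc_id, []).append(hit)

    ranked = []
    for doc_id, doc_hits in by_doc.items():
        doc_hits.sort(key=lambda h: h.score, reverse=True)  # best span first
        ranked.append((doc_id, doc_hits[0].score, doc_hits[:spans_per_doc]))
    ranked.sort(key=lambda entry: entry[1], reverse=True)  # best document first
    return ranked


# Hypothetical scores; in practice they would come from a retrieval model.
hits = [
    SpanHit("policy_a", "We share your data with advertising partners.", 2.1),
    SpanHit("policy_b", "You may request deletion of your data at any time.", 3.4),
    SpanHit("policy_a", "Cookies are stored for twelve months.", 0.7),
]
for doc_id, best_score, spans in rank_documents(hits):
    print(doc_id, best_score, [s.sentence for s in spans])
```

Max-pooling favours a document that contains one highly relevant passage, which suits short factual questions; summing or averaging span scores would instead favour documents that touch the topic many times.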

Notes

1. https://maartengr.github.io/KeyBERT/index.html.
2. https://github.com/luyug/Condenser.
3. https://github.com/nyu-dl/dl4marco-bert.
4. https://github.com/stanford-futuredata/ColBERT.
5. https://github.com/JetRunner/LaPraDoR.
6. https://pytorch.org/serve/.
7. Vector search engine Qdrant, see https://qdrant.tech/.
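Note 7 refers to the vector search engine Qdrant. Conceptually, dense retrieval encodes the query and each passage as vectors and returns the passages whose vectors are most similar to the query. The sketch below shows that idea as exact cosine-similarity search; the toy vectors, passage names and helper functions are hypothetical, and a production engine such as Qdrant typically relies on approximate nearest-neighbour indexes rather than scoring every stored vector.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors of equal length (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], passage_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exact (brute-force) nearest-neighbour search: score every passage, keep the k most similar."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in passage_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings (hypothetical); real dense retrievers output hundreds of dimensions.
passages = {
    "retention": [0.9, 0.1, 0.0],
    "third_parties": [0.1, 0.8, 0.3],
    "cookies": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]
print(top_k(query, passages))
```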

References

Abela, S.: Data protection and freedom of information. In: Abela, S. (ed.) Leadership and Management in Healthcare, pp. 103–107. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-21025-9_10
Crook, M.: The Caldicott report and patient confidentiality (2003)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Fabian, B., Ermakova, T., Lentz, T.: Large-scale readability analysis of privacy policies. In: Proceedings of the International Conference on Web Intelligence, pp. 18–25 (2017)
Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378 (1971)
Gao, L., Callan, J.: Condenser: a pre-training architecture for dense retrieval. arXiv preprint arXiv:2104.08253 (2021)
Gao, L., Callan, J.: Is your language model ready for dense representation fine-tuning. arXiv preprint arXiv:2104.08253 (2021)
Goddard, M.: The EU general data protection regulation (GDPR): European regulation that has a global impact. Int. J. Mark. Res. 59(6), 703–705 (2017)
Honnibal, M., Montani, I., Van Landeghem, S., Boyd, A., et al.: spaCy: industrial-strength natural language processing in Python (2020)
Khattab, O., Zaharia, M.: ColBERT: efficient and effective passage search via contextualized late interaction over BERT. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 39–48 (2020)
Kiss, T., Strunk, J.: Unsupervised multilingual sentence boundary detection. Comput. Linguist. 32(4), 485–525 (2006)
Korunovska, J., Kamleitner, B., Spiekermann, S.: The challenges and impact of privacy policy comprehension. arXiv preprint arXiv:2005.08967 (2020)
Leatherman, S., Berwick, D.M.: Accelerating global improvements in health care quality. JAMA 324(24), 2479–2480 (2020)
Liu, Y., Stolcke, A., Shriberg, E., Harper, M.: Using conditional random fields for sentence boundary detection in speech. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 451–458 (2005)
Mazzola, L., Waldis, A., Shankar, A., Argyris, D., Denzler, A., Van Roey, M.: Privacy and customer’s education: NLP for information resources suggestions and expert finder systems. In: Moallem, A. (ed.) HCII 2022. LNCS, vol. 13333, pp. 62–77. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-05563-8_5
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, vol. 26 (2013)
Nogueira, R., Cho, K.: Passage re-ranking with BERT. arXiv preprint arXiv:1901.04085 (2019)
Peters, S., Verhagen, H.: An evaluation of the nutri-score system along the reasoning for scientific substantiation of health claims in the EU—a narrative review. Foods 11(16), 2426 (2022)
Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C.D.: Stanza: a Python natural language processing toolkit for many human languages. arXiv preprint arXiv:2003.07082 (2020)
Ravichander, A., Black, A.W., Wilson, S., Norton, T., Sadeh, N.: Question answering for privacy policies: combining computational and legal perspectives. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 4949–4959. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/D19-1500. https://www.aclweb.org/anthology/D19-1500
Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M.M., Gatford, M., et al.: Okapi at TREC-3. NIST Special Publication Sp 109, 109 (1995)
Sadvilkar, N., Neumann, M.: PySBD: pragmatic sentence boundary disambiguation. arXiv preprint arXiv:2010.09657 (2020)
Sanchez, G.: Sentence boundary detection in legal text. In: Proceedings of the Natural Legal Language Processing Workshop 2019, Minneapolis, Minnesota, pp. 31–38. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/W19-2204. https://aclanthology.org/W19-2204
Santhanam, K., Khattab, O., Saad-Falcon, J., Potts, C., Zaharia, M.: ColBERTv2: effective and efficient retrieval via lightweight late interaction. arXiv preprint arXiv:2112.01488 (2021)
Savelka, J., Walker, V.R., Grabmair, M., Ashley, K.D.: Sentence boundary detection in adjudicatory decisions in the United States. Traitement automatique des langues 58, 21 (2017)
Sharma, P., Li, Y.: Self-supervised contextual keyword and keyphrase retrieval with self-labelling (2019). https://www.preprints.org/manuscript/201908.0073/v1
Sivan-Sevilla, I.: Varieties of enforcement strategies post-GDPR: a fuzzy-set qualitative comparative analysis (FSQCA) across data protection authorities. J. Eur. Public Policy 1–34 (2022)
Subrahmanya, S.V.G., et al.: The role of data science in healthcare advancements: applications, benefits, and future prospects. Irish J. Med. Sci. (1971-) 191(4), 1473–1483 (2022)
Tikkinen-Piri, C., Rohunen, A., Markkula, J.: EU general data protection regulation: changes and implications for personal data collecting companies. Comput. Law Secur. Rev. 34(1), 134–153 (2018)
Tkachenko, M., Malyuk, M., Holmanyuk, A., Liubimov, N.: Label Studio: Data labeling software (2020–2022). Open source software https://github.com/heartexlabs/label-studio
Trotman, A., Puurula, A., Burgess, B.: Improvements to BM25 and language models examined. In: Proceedings of the 2014 Australasian Document Computing Symposium, pp. 58–65 (2014)
Vail, M.W., Earp, J.B., Antón, A.I.: An empirical study of consumer perceptions and comprehension of web site privacy policies. IEEE Trans. Eng. Manag. 55(3), 442–454 (2008)
Vanberg, A.D.: Informational privacy post GDPR-end of the road or the start of a long journey? Int. J. Hum. Rights 25(1), 52–78 (2021)
Xu, C., Guo, D., Duan, N., McAuley, J.: LaPraDoR: unsupervised pretrained dense retriever for zero-shot text retrieval. arXiv preprint arXiv:2203.06169 (2022)

Acknowledgements

The research leading to this work was partially financed by Innosuisse, the Swiss federal agency for innovation, through a competitive call. The project, 50446.1 IP-ICT, is titled P2Sr Profila Privacy Simplified reloaded: Open-smart knowledge base on Swiss privacy policies and Swiss privacy legislation, simplifying consumers’ access to legal knowledge and expertise (https://www.aramis.admin.ch/Grunddaten/?ProjectID=48867). The authors would like to thank everyone involved on the implementation side at Profila GmbH (https://www.profila.com/) for the constructive and fruitful discussions and for their insights into privacy regulations and consumers’ rights.

Author information

Authors and Affiliations

School of Information Technology, HSLU - Lucerne University of Applied Sciences and Arts, Suurstoffi 1, 6343, Rotkreuz, Switzerland
Luca Mazzola, Atreya Shankar, Christof Bless, Maria A. Rodriguez, Andreas Waldis & Alexander Denzler
Profila GmbH, Seeburgstrasse 45, 6006, Luzern, Switzerland
Michiel Van Roey

Corresponding author

Correspondence to Luca Mazzola.

Editor information

Editors and Affiliations

San Jose State University, San Jose, CA, USA
Abbas Moallem

Appendix A - SBD and Q2D Graphs

In this appendix, we provide graphical representations of the data from Table 1 and Table 2. For sentence boundary detection (SBD), the results demonstrate the effectiveness of nltk, which achieves a good F1 score with a very limited runtime.
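The SBD comparison rests on an F1 score over detected sentence boundaries. A minimal sketch of how such a score can be computed, assuming boundaries are represented as character offsets and counted as correct only on an exact match (the chapter's precise scoring protocol is not given in this section). The regex splitter `naive_boundaries` is a hypothetical baseline, not nltk; it illustrates the abbreviation error (splitting after "e.g.") that trained splitters such as nltk's Punkt-based `sent_tokenize` are designed to avoid.

```python
import re


def naive_boundaries(text: str) -> set[int]:
    """Hypothetical baseline: a boundary after every '.', '!' or '?' followed by whitespace or end of text."""
    return {m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)}


def boundary_f1(gold: set[int], predicted: set[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted boundary offsets, counting only exact matches as correct."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


text = "We process data, e.g. your e-mail address. You may object at any time."
# Gold boundaries: character offsets just after each true sentence end.
gold = {text.index("address.") + len("address."), len(text)}
predicted = naive_boundaries(text)  # also (wrongly) splits after the abbreviation "e.g."
print(boundary_f1(gold, predicted))  # precision about 0.67, recall 1.0, F1 about 0.8
```

A single false split after an abbreviation already costs a third of the precision here, which is why legal text, dense with abbreviations and enumerations, is a demanding test for SBD tools.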

BM25+, a relatively simple, sparse, IDF-based model, outperforms the other approaches in practice when both accuracy and runtime are considered.
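To make the "simple and sparse" characterization concrete, the following is a minimal sketch of BM25+ scoring for ranking policy sentences against a question. It follows the common BM25+ formulation, in which a constant δ is added for every matched query term so that long documents are not over-penalized. Whitespace tokenization, the IDF variant log((N+1)/df), the values k1 = 1.5, b = 0.75 and δ = 1, and the example sentences are all assumptions for illustration, not the configuration used in the chapter.

```python
import math
from collections import Counter


def bm25_plus_scores(query: list[str], docs: list[list[str]],
                     k1: float = 1.5, b: float = 0.75, delta: float = 1.0) -> list[float]:
    """Score each tokenized document against a tokenized query with BM25+ (assumed parameter values)."""
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs
    df = Counter(term for doc in docs for term in set(doc))  # document frequency per term
    scores = []
    for doc in docs:
        tf = Counter(doc)
        length_norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # sparse model: only exact term matches contribute
            idf = math.log((n_docs + 1) / df[term])
            # BM25 term-frequency saturation plus the BM25+ lower bound delta for each matched term
            score += idf * ((k1 + 1) * tf[term] / (length_norm + tf[term]) + delta)
        scores.append(score)
    return scores


sentences = [
    "we share your email address with advertising partners",
    "you can delete your account at any time",
    "cookies are stored for twelve months",
]
docs = [s.split() for s in sentences]
query = "do you share my email with partners".split()
scores = bm25_plus_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(sentences[best])  # -> we share your email address with advertising partners
```

Because only exact term overlaps count, scoring needs nothing beyond term counts, which helps explain the low runtime; the trade-off is that paraphrases ("disclose" vs. "share") earn no credit, which is the gap dense retrievers aim to close.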

About this paper

Cite this paper

Mazzola, L. et al. (2023). A Question Answering Tool for Website Privacy Policy Comprehension. In: Moallem, A. (eds) HCI for Cybersecurity, Privacy and Trust. HCII 2023. Lecture Notes in Computer Science, vol 14045. Springer, Cham. https://doi.org/10.1007/978-3-031-35822-7_14

DOI: https://doi.org/10.1007/978-3-031-35822-7_14
Published: 09 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35821-0
Online ISBN: 978-3-031-35822-7
eBook Packages: Computer Science, Computer Science (R0)
