DOI: 10.1145/3472749.3474736
Research article, open access

KondoCloud: Improving Information Management in Cloud Storage via Recommendations Based on File Similarity

Published: 12 October 2021

Abstract

Users face many challenges in keeping their personal file collections organized. While current file-management interfaces help users retrieve files in disorganized repositories, they do not aid in organization. Pertinent files can be difficult to find, and files that should have been deleted may remain. To help, we designed KondoCloud, a file-browser interface for personal cloud storage. KondoCloud makes machine learning-based recommendations of files users may want to retrieve, move, or delete. These recommendations leverage the intuition that similar files should be managed similarly.
We developed and evaluated KondoCloud through two complementary online user studies. In our Observation Study, we logged the actions of 69 participants who spent 30 minutes manually organizing their own Google Drive repositories. We identified high-level organizational strategies, including moving related files to newly created sub-folders and extensively deleting files. To train the classifiers that underpin KondoCloud’s recommendations, we had participants label whether pairs of files were similar and whether they should be managed similarly. In addition, we extracted ten metadata and content features from all files in participants’ repositories. Our logistic regression classifiers all achieved F1 scores of 0.72 or higher. In our Evaluation Study, 62 participants used KondoCloud either with or without recommendations. Roughly half of participants accepted a non-trivial fraction of recommendations, and some participants accepted nearly all of them. Participants who were shown the recommendations were more likely to delete related files located in different directories. They also generally felt the recommendations improved efficiency. Participants who were not shown recommendations nonetheless manually performed about a third of the actions that would have been recommended.
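
As a rough illustration of the pipeline described above (pairwise file features, a logistic regression classifier, and an F1 score), the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the two features (filename-token overlap and matching extension), the toy file names and labels, and all function names are assumptions made for illustration, whereas the paper reports ten metadata and content features that are not listed here.

```python
import math
import os
import re


def name_tokens(filename):
    """Lowercased alphanumeric tokens from the filename, extension excluded."""
    stem = os.path.splitext(filename)[0].lower()
    return {tok for tok in re.split(r"[^a-z0-9]+", stem) if tok}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(file_a, file_b):
    """Hypothetical two-feature vector for a file pair (not the paper's features)."""
    name_sim = jaccard(name_tokens(file_a["name"]), name_tokens(file_b["name"]))
    ext_a = os.path.splitext(file_a["name"])[1].lower()
    ext_b = os.path.splitext(file_b["name"])[1].lower()
    return [name_sim, 1.0 if ext_a == ext_b else 0.0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that a pair is labeled 'similar' under the logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logreg(X, y, lr=0.5, epochs=300):
    """Plain stochastic gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labeled pairs (invented for illustration): 1 = similar, 0 = not similar.
pairs = [
    ({"name": "tax_2020.pdf"}, {"name": "tax_2021.pdf"}, 1),
    ({"name": "trip_photo_1.jpg"}, {"name": "trip_photo_2.jpg"}, 1),
    ({"name": "resume.docx"}, {"name": "trip_photo_1.jpg"}, 0),
    ({"name": "budget.xlsx"}, {"name": "tax_2020.pdf"}, 0),
]
X = [pair_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]

weights, bias = train_logreg(X, y)
y_pred = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in X]
f1 = f1_score(y, y_pred)
print("F1 on the toy training pairs:", f1)
```

The score here is computed on the same toy pairs used for training, so it says nothing about generalization; a real evaluation would use held-out pairs.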

Supplementary Material

  • p69-talk.mp4 (MP4, with captions in p69-talk.vtt): Talk video and captions
  • p69-video_figure.mp4 (MP4, with captions in p69-video_figure.vtt): Video figure and captions
  • p69-video_preview.mp4 (MP4, with captions in p69-video_preview.vtt): Video preview and captions


Cited By

  • (2023) Digital hoarding and personal use digital data. Human–Computer Interaction, 1–20. DOI: 10.1080/07370024.2023.2293001. Online publication date: 21-Dec-2023
  • (2022) Summarizing Sets of Related ML-Driven Recommendations for Improving File Management in Cloud Storage. Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 1–11. DOI: 10.1145/3526113.3545704. Online publication date: 29-Oct-2022

Published In

UIST '21: The 34th Annual ACM Symposium on User Interface Software and Technology
October 2021
1357 pages
ISBN: 9781450386357
DOI: 10.1145/3472749
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Google Drive
  2. cloud storage
  3. personal information management
  4. recommendations

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

UIST '21

Acceptance Rates

Overall Acceptance Rate 561 of 2,567 submissions, 22%
