research-article

PHOcus: efficiently archiving photos

Published: 01 August 2022

Abstract

Our ability to collect data is rapidly outstripping our ability to store and use it effectively. Organizations therefore face tough decisions about what data to archive (or dispose of) in order to meet their business goals. PHOcus addresses this problem for image data (photos) by proposing which photos to archive in order to meet an online storage budget. The decision is based on factors such as usage patterns and their relative importance, the quality and size of each photo, the relevance of a photo to a usage pattern, the similarity between photos, and policy requirements on which photos must be retained. We formalize the photo archival problem and give an efficient algorithm with an optimal approximation guarantee. We then demonstrate our system, PHOcus, on an e-commerce application as well as on personal photos on a smartphone, and discuss how many of the inputs to the problem can be obtained automatically.
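The abstract does not spell out the formalization or the algorithm, so the following is only an illustrative sketch of how a budgeted selection of this kind might look, not the PHOcus method. It assumes that photos not kept online are the ones archived, that each usage pattern is served by the single best retained photo (scored as quality times relevance), and that a plain cost-benefit greedy rule is used, which on its own carries no approximation guarantee. All names here (`Photo`, `coverage`, `choose_photos_to_keep`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    name: str
    size: float      # storage cost, e.g. in MB; assumed to be positive
    quality: float   # assumed score in [0, 1]
    relevance: dict = field(default_factory=dict)  # usage pattern -> score in [0, 1]


def coverage(selected, weights):
    """Weighted sum over usage patterns of the best quality * relevance
    among the selected photos (an assumed objective, not the paper's)."""
    total = 0.0
    for pattern, weight in weights.items():
        best = max(
            (p.quality * p.relevance.get(pattern, 0.0) for p in selected),
            default=0.0,
        )
        total += weight * best
    return total


def choose_photos_to_keep(photos, weights, budget, must_keep=()):
    """Keep policy-mandated photos first, then repeatedly add the photo
    with the largest marginal gain per unit of size that still fits."""
    kept = [p for p in photos if p.name in must_keep]
    used = sum(p.size for p in kept)
    if used > budget:
        raise ValueError("mandatory photos alone exceed the budget")
    candidates = [p for p in photos if p.name not in must_keep]
    while True:
        base = coverage(kept, weights)
        best, best_ratio = None, 0.0
        for p in candidates:
            if used + p.size > budget:
                continue  # does not fit in the remaining budget
            ratio = (coverage(kept + [p], weights) - base) / p.size
            if ratio > best_ratio:
                best, best_ratio = p, ratio
        if best is None:
            break  # nothing fits, or nothing adds value
        kept.append(best)
        used += best.size
        candidates.remove(best)
    return kept


# Hypothetical product photos and usage-pattern weights.
photos = [
    Photo("front", 4.0, 0.9, {"search": 1.0, "detail": 0.3}),
    Photo("front_dup", 4.0, 0.8, {"search": 1.0, "detail": 0.3}),
    Photo("closeup", 2.0, 0.7, {"detail": 1.0}),
    Photo("receipt", 1.0, 0.5, {}),
]
weights = {"search": 0.7, "detail": 0.3}
kept = choose_photos_to_keep(photos, weights, budget=7.0, must_keep={"receipt"})
print([p.name for p in kept])
```

In this toy instance the near-duplicate `front_dup` ends up among the photos to archive: once `front` is kept it no longer fits in the budget, and it would add nothing to either usage pattern anyway. Photo similarity is handled only implicitly here, through the "best photo per pattern" objective.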



Published In

Proceedings of the VLDB Endowment, Volume 15, Issue 12
August 2022
551 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

