DOI: 10.1145/3643489.3661116
research-article

SnapSeek: An Interactive Lifelog Acquisition System for LSC'24

Published: 18 June 2024

Abstract

Lifelog data is growing rapidly, and traditional search methods handle it poorly because they rely on natural language and lack context. We present a semantic search application built for the Lifelog Search Challenge 2024. The system uses the CLIP, BLIP-2, BEiT-3, and all-mpnet-base-v2 models to create embeddings from diverse data formats, and uses Milvus to index them for fast, accurate retrieval. Distinguishing features include metadata enhancement, query standardization, query auto-parsing, an effective ranking system, and a UI designed for further lifelog exploration. Together, these features improve access to and navigation within lifelog archives and aim to support intelligent, context-aware lifelog search.
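The retrieval idea in the abstract, ranking stored embeddings by similarity to a query embedding, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional vectors, the image IDs, and the `top_k` helper are hypothetical, and a plain in-memory dictionary stands in for a vector database such as Milvus. In a real system the vectors would come from an embedding model rather than being written by hand.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=3):
    """Return the k (image_id, score) pairs most similar to query_vec."""
    scored = [(image_id, cosine(query_vec, vec)) for image_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy index: image ID -> embedding (hand-written, not model output).
index = {
    "img_001": [0.9, 0.1, 0.0],
    "img_002": [0.1, 0.9, 0.0],
    "img_003": [0.7, 0.3, 0.1],
}

# Hypothetical query embedding standing in for an encoded text query.
query = [1.0, 0.0, 0.0]
results = top_k(query, index, k=2)
for image_id, score in results:
    print(image_id, round(score, 3))
```

A vector database replaces the exhaustive scan in `top_k` with an index built for large collections; the ranking principle stays the same.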

Published In

LSC '24: Proceedings of the 7th Annual ACM Workshop on the Lifelog Search Challenge
June 2024
128 pages
ISBN: 9798400705502
DOI: 10.1145/3643489
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. lifelog retrieval
  2. image retrieval
  3. content-based image retrieval
  4. lifelog search challenge

Qualifiers

  • Research-article

Funding Sources

  • Vingroup Innovation Foundation (VinIF)

Conference

LSC '24

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 54
  • Downloads (Last 12 months): 54
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 22 Sep 2024
