DOI: 10.1145/2736277.2741112
research-article

Tagging Personal Photos with Transfer Deep Learning

Published: 18 May 2015

Abstract

The advent of mobile devices and media cloud services has led to unprecedented growth in personal photo collections. One of the fundamental problems in managing the increasing number of photos is automatic image tagging. Existing research has predominantly focused on tagging general Web images using a well-labelled image database, e.g., ImageNet. However, these methods achieve only limited success on personal photos because of the domain gaps between personal photos and Web images. These gaps originate from differences in semantic distribution and visual appearance. To address these challenges, we present a novel transfer deep learning approach to tagging personal photos. Specifically, to bridge the semantic distribution gap, we have designed an ontology consisting of a hierarchical vocabulary tailored to personal photos. This ontology is mined from 10,000 active Flickr users with 20 million photos and 2.7 million unique tags. To bridge the visual appearance gap, we discover intermediate image representations and ontology priors by deep learning with bottom-up and top-down transfers across two domains, where Web images are the source domain and personal photos are the target domain. Moreover, we present two tagging modes (single-mode and batch-mode) and find that batch-mode is highly effective for tagging photo collections. We conducted personal photo tagging on 7,000 real personal photos and personal photo search on the MIT-Adobe FiveK photo dataset. The proposed tagging approach achieves performance gains of 12.8% and 4.5% in terms of NDCG@5 over the state-of-the-art hand-crafted feature-based and deep learning-based methods, respectively.
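
For readers unfamiliar with the metric, NDCG@5 scores a ranked list of predicted tags by comparing the discounted gain of its top five entries with that of the ideal ordering. The sketch below uses one common formulation (gain 2^rel - 1 with a log2 rank discount); the abstract does not state which variant or relevance scale the authors used, so the function names and the graded relevance values here are assumptions for illustration only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked items.

    Assumes the common exponential-gain form: (2^rel - 1) / log2(rank + 1),
    with ranks starting at 1 (enumerate starts at 0, hence rank + 2).
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 5) -> float:
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0.0:
        # No relevant item at all: define the score as 0 to avoid 0/0.
        return 0.0
    return dcg_at_k(relevances, k) / ideal


if __name__ == "__main__":
    # Hypothetical relevance grades for the tags predicted for one photo,
    # listed in predicted rank order (higher means more relevant).
    predicted = [3, 0, 2, 1, 0, 3]
    print(f"NDCG@5 = {ndcg_at_k(predicted, k=5):.4f}")
```

A ranking that already lists tags in descending relevance scores exactly 1.0, and pushing relevant tags further down the list lowers the score.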

Published In

WWW '15: Proceedings of the 24th International Conference on World Wide Web
May 2015
1460 pages
ISBN: 9781450334693

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Author Tags

  1. deep learning
  2. image tagging
  3. ontology
  4. personal photo
  5. transfer learning

Qualifiers

  • Research-article

Acceptance Rates

WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2023) EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 5262-5274. DOI: 10.1109/ICCV51070.2023.00487. Online publication date: 1-Oct-2023
  • (2023) Anchor-Based Detection for Natural Language Localization in Ego-Centric Videos. 2023 IEEE International Conference on Consumer Electronics (ICCE), pages 01-04. DOI: 10.1109/ICCE56470.2023.10043460. Online publication date: 6-Jan-2023
  • (2023) An NLP-guided ontology development and refinement approach to represent and query visual information. Expert Systems with Applications: An International Journal, 213:PB. DOI: 10.1016/j.eswa.2022.118998. Online publication date: 1-Mar-2023
  • (2023) Co-creating an Object Recognition Exergame with Hospital Service Users to Promote Physical Activity. Universal Access in Human-Computer Interaction, pages 609-619. DOI: 10.1007/978-3-031-35897-5_43. Online publication date: 9-Jul-2023
  • (2022) VizOPS: A data-driven ontology to represent public place surveillance data. 2022 IEEE Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI), pages 1-6. DOI: 10.1109/IATMSI56455.2022.10119327. Online publication date: 21-Dec-2022
  • (2022) Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5026-5035. DOI: 10.1109/CVPR52688.2022.00498. Online publication date: Jun-2022
  • (2022) A Metaverse: Taxonomy, Components, Applications, and Open Challenges. IEEE Access, 10, pages 4209-4251. DOI: 10.1109/ACCESS.2021.3140175. Online publication date: 2022
  • (2021) Introducing Tagasaurus, an Approach to Reduce Cognitive Fatigue from Long-Term Interface Usage When Storing Descriptions and Impressions from Photographs. Technologies, 9:3, article 45. DOI: 10.3390/technologies9030045. Online publication date: 29-Jun-2021
  • (2021) Towards the use of IoT and AI for Pervasive Exergames. 2021 IEEE Global Conference on Artificial Intelligence and Internet of Things (GCAIoT), pages 1-6. DOI: 10.1109/GCAIoT53516.2021.9692952. Online publication date: 12-Dec-2021
  • (2021) Multi-level dictionary learning for fine-grained images categorization with attention model. Neurocomputing, 453:C, pages 403-412. DOI: 10.1016/j.neucom.2020.07.147. Online publication date: 17-Sep-2021
