research-article

Tagging Personal Photos with Transfer Deep Learning

Authors:

Yong Rui

WWW '15: Proceedings of the 24th International Conference on World Wide Web

Pages 344 - 354

https://doi.org/10.1145/2736277.2741112

Published: 18 May 2015

Abstract

The advent of mobile devices and media cloud services has led to unprecedented growth in personal photo collections. One of the fundamental problems in managing this growing number of photos is automatic image tagging. Existing research has predominantly focused on tagging general Web images using a well-labelled image database, e.g., ImageNet. However, these methods achieve only limited success on personal photos because of the domain gaps between personal photos and Web images. These gaps originate from differences in semantic distribution and visual appearance. To address these challenges, we present a novel transfer deep learning approach to tagging personal photos. Specifically, to bridge the semantic distribution gap, we design an ontology consisting of a hierarchical vocabulary tailored to personal photos. This ontology is mined from 10,000 active Flickr users with 20 million photos and 2.7 million unique tags. To bridge the visual appearance gap, we discover intermediate image representations and ontology priors by deep learning with bottom-up and top-down transfers across two domains, where Web images are the source domain and personal photos are the target domain. Moreover, we present two tagging modes (single-mode and batch-mode) and find that batch-mode is highly effective for tagging photo collections. We evaluate personal photo tagging on 7,000 real personal photos and personal photo search on the MIT-Adobe FiveK photo dataset. The proposed tagging approach achieves performance gains of 12.8% and 4.5% in terms of NDCG@5 over state-of-the-art hand-crafted feature-based and deep learning-based methods, respectively.
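For readers unfamiliar with the NDCG@5 metric reported above, the following is a minimal sketch of how NDCG@k is commonly computed for a ranked list of predicted tags. The abstract does not state which NDCG variant the paper uses, so the exponential gain (2^rel - 1), the log2 rank discount, and the function names `dcg_at_k` and `ndcg_at_k` are assumptions made for illustration only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items.

    `relevances` holds graded relevance scores in ranked order
    (index 0 is the top-ranked tag). Assumed form: exponential gain
    with a log2 discount; the paper may use a different variant.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=5):
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        # No relevant tags at all: define the score as 0.
        return 0.0
    return dcg_at_k(relevances, k) / ideal
```

Under these assumptions, a tag list that is already ranked in descending order of relevance scores 1.0, and any ranking that places a relevant tag below an irrelevant one scores strictly less.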



Published In

WWW '15: Proceedings of the 24th International Conference on World Wide Web

May 2015

1460 pages

ISBN: 9781450334693

General Chairs:
Aldo Gangemi (National Research Council, Italy & Paris 13 University-CNRS, France)
Stefano Leonardi (Sapienza University of Rome, Italy)
Alessandro Panconesi (Sapienza University of Rome, Italy)

Copyright © 2015 is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

International World Wide Web Conferences Steering Committee

Republic and Canton of Geneva, Switzerland

Conference

WWW '15: 24th International World Wide Web Conference

May 18 - 22, 2015

Florence, Italy

Acceptance Rates

WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Total citations: 26
Total downloads: 603
Downloads (last 12 months): 12
Downloads (last 6 weeks): 2

Reflects downloads up to 16 Nov 2024

