Content-based copy detection through multimodal feature representation and temporal pyramid matching

Authors:

Wen Gao

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 10, Issue 1

Article No.: 5, Pages 1 - 20

https://doi.org/10.1145/2542205.2542208

Published: 27 December 2013

Abstract

Content-based copy detection (CBCD) is drawing increasing attention as an alternative to watermarking for video identification and copyright protection. In this article, we present a comprehensive method for detecting copies that have been subjected to complicated transformations. A multimodal feature representation scheme is designed to exploit the complementarity of audio features, global visual features, and local visual features, so that optimal overall robustness to a wide range of complicated modifications can be achieved. Meanwhile, a temporal pyramid matching algorithm is proposed to assemble frame-level similarity search results into sequence-level matching results by evaluating similarity over multiple temporal granularities. Inverted indexing and locality-sensitive hashing (LSH) are also adopted to speed up the similarity search. Experimental results on the TRECVID 2010 and 2009 benchmark datasets demonstrate that the proposed method outperforms other methods for most transformations in terms of copy detection accuracy. The evaluation results also suggest that our method achieves competitive copy localization precision.
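
The abstract does not give the details of the temporal pyramid matching algorithm, so the following is only a minimal sketch of the general idea of scoring similarity over multiple temporal granularities, not the authors' method. It assumes a precomputed matrix `sim` in which `sim[i][j]` is the frame-level similarity between query frame `i` and reference frame `j`. The function name `temporal_pyramid_score`, the equal-length segmentation, the max-then-mean pooling, and the weights that double at each finer level are all illustrative assumptions.

```python
def temporal_pyramid_score(sim, levels=3):
    """Combine frame-level similarities into one sequence-level score.

    sim[i][j] is an assumed frame-level similarity between query frame i
    and reference frame j. At pyramid level l, both sequences are split
    into 2**l equal segments, and only segments with the same index are
    compared, so finer levels reward matches that keep temporal order.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    n_q = len(sim)
    n_r = len(sim[0]) if n_q else 0
    total = 0.0
    weight_sum = 0.0
    for level in range(levels):
        n_seg = 2 ** level
        seg_scores = []
        for s in range(n_seg):
            # Integer segment boundaries for the query and the reference.
            q_lo, q_hi = s * n_q // n_seg, (s + 1) * n_q // n_seg
            r_lo, r_hi = s * n_r // n_seg, (s + 1) * n_r // n_seg
            if q_hi == q_lo or r_hi == r_lo:
                seg_scores.append(0.0)  # empty segment contributes nothing
                continue
            # Best reference match for each query frame in this segment.
            best = [max(sim[i][r_lo:r_hi]) for i in range(q_lo, q_hi)]
            seg_scores.append(sum(best) / len(best))
        level_score = sum(seg_scores) / n_seg
        weight = 2.0 ** level  # assumed weighting: finer levels count more
        total += weight * level_score
        weight_sum += weight
    return total / weight_sum


# Two toy 8-frame cases: frames matched in order, and in reversed order.
aligned = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
reversed_order = [[1.0 if i + j == 7 else 0.0 for j in range(8)] for i in range(8)]
print(temporal_pyramid_score(aligned))
print(temporal_pyramid_score(reversed_order))
```

In this toy setup the temporally aligned pair scores higher than the reversed pair, because the reversed matches only survive at the coarsest level. A real system would obtain `sim` from the indexed similarity search rather than from a dense matrix.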

Index Terms

Content-based copy detection through multimodal feature representation and temporal pyramid matching
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
2. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 10, Issue 1

December 2013

166 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/2559928

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 December 2013

Accepted: 01 April 2013

Revised: 01 November 2012

Received: 01 November 2011

Published in TOMM Volume 10, Issue 1

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 16
Total Downloads: 352
Downloads (Last 12 months): 11
Downloads (Last 6 weeks): 4

Reflects downloads up to 23 Nov 2024

Citations

Cited By

Liu X, Yu Y, Li X, Zhao Y, Guo G (2022). TCSD: Triple Complementary Streams Detector for Comprehensive Deepfake Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-22). DOI: 10.1145/3558004. Online publication date: 22-Aug-2022.
https://dl.acm.org/doi/10.1145/3558004
Yu Y, Ni R, Li W, Zhao Y (2022). Detection of AI-Manipulated Fake Faces via Mining Generalized Features. ACM Transactions on Multimedia Computing, Communications, and Applications 18:4 (1-23). DOI: 10.1145/3499026. Online publication date: 4-Mar-2022.
https://dl.acm.org/doi/10.1145/3499026
Shen L, Hong R, Hao Y (2020). Advance on large scale near-duplicate video retrieval. Frontiers of Computer Science: Selected Publications from Chinese Universities 14:5. DOI: 10.1007/s11704-019-8229-7. Online publication date: 3-Jan-2020.
https://dl.acm.org/doi/10.1007/s11704-019-8229-7
Jing W, Nie X, Cui C, Xi X, Yang G, Yin Y (2019). Global-view hashing. World Wide Web 22:2 (771-789). DOI: 10.1007/s11280-018-0536-7. Online publication date: 1-Mar-2019.
https://dl.acm.org/doi/10.1007/s11280-018-0536-7
Yang Y, Tian Y, Huang T (2019). Multiscale video sequence matching for near-duplicate detection and retrieval. Multimedia Tools and Applications 78:1 (311-336). DOI: 10.1007/s11042-018-5862-3. Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.1007/s11042-018-5862-3
Mou L (2018). Ownership Identification and Signaling of Multimedia Content Components. 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) (212-213). DOI: 10.1109/MIPR.2018.00049. Online publication date: Apr-2018.
https://doi.org/10.1109/MIPR.2018.00049
Nie X, Li X, Sun J, Yin Y, Liu X, Mu Y, Jiang Y, Luo J (2017). UFvH. Proceedings of the Workshop on Visual Analysis in Smart and Connected Communities (17-24). DOI: 10.1145/3132734.3132738. Online publication date: 23-Oct-2017.
https://dl.acm.org/doi/10.1145/3132734.3132738
Nie X, Yin Y, Sun J, Liu J, Cui C (2017). Comprehensive Feature-Based Robust Video Fingerprinting Using Tensor Model. IEEE Transactions on Multimedia 19:4 (785-796). DOI: 10.1109/TMM.2016.2629758. Online publication date: 1-Apr-2017.
https://dl.acm.org/doi/10.1109/TMM.2016.2629758
Nie X, Weizhen Jing Lin Yuan Ma Chaoran Cui, Yin Y (2017). Two-layer video fingerprinting strategy for near-duplicate video detection. 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) (555-560). DOI: 10.1109/ICMEW.2017.8026322. Online publication date: Jul-2017.
https://doi.org/10.1109/ICMEW.2017.8026322
Ouali C, Dumouchel P, Gupta V (2017). Robust video fingerprints using positions of salient regions. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (3041-3045). DOI: 10.1109/ICASSP.2017.7952715. Online publication date: Mar-2017.
https://doi.org/10.1109/ICASSP.2017.7952715
