Content-based copy detection through multimodal feature representation and temporal pyramid matching

Published: 27 December 2013

Abstract

Content-based copy detection (CBCD) is drawing increasing attention as an alternative to watermarking for video identification and copyright protection. In this article, we present a comprehensive method for detecting copies that have undergone complicated transformations. A multimodal feature representation scheme exploits the complementarity of audio features and of global and local visual features to maximize overall robustness to a wide range of complicated modifications. A temporal pyramid matching algorithm then assembles frame-level similarity search results into sequence-level matching results by evaluating similarity over multiple temporal granularities. Inverted indexing and locality-sensitive hashing (LSH) are adopted to speed up similarity search. Experimental results on the TRECVID 2010 and 2009 benchmark datasets demonstrate that the proposed method outperforms other methods on most transformations in terms of copy detection accuracy. The evaluation results also suggest that our method achieves competitive copy localization precision.
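
The abstract does not spell out the matching algorithm, so the Python sketch below only illustrates the general idea of scoring frame-level matches over several temporal granularities. The function name `temporal_pyramid_score`, the `(query_time, reference_time, similarity)` match tuples, the equal-width bins, and the level weights (borrowed from spatial pyramid matching) are all assumptions made for illustration, not the authors' formulation.

```python
from typing import List, Tuple

# Hypothetical frame-level match: (query_time, reference_time, similarity).
Match = Tuple[float, float, float]


def _bin_index(t: float, start: float, length: float, bins: int) -> int:
    """Map a timestamp to one of `bins` equal-width temporal bins (clamped)."""
    idx = int((t - start) / length * bins)
    return max(0, min(idx, bins - 1))


def temporal_pyramid_score(
    matches: List[Match],
    query_span: Tuple[float, float],
    ref_span: Tuple[float, float],
    levels: int = 3,
) -> float:
    """Score how well frame-level matches agree at several temporal scales.

    At level l both segments are cut into 2**l equal bins, and a match
    counts only if its query frame and reference frame fall into the same
    bin index. Finer levels get larger weights (an assumed weighting taken
    from spatial pyramid matching; the weights sum to 1).
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    q_start, q_end = query_span
    r_start, r_end = ref_span
    q_len, r_len = q_end - q_start, r_end - r_start
    if q_len <= 0 or r_len <= 0:
        raise ValueError("spans must have positive length")
    if not matches:
        return 0.0

    top = levels - 1
    score = 0.0
    for level in range(levels):
        bins = 2 ** level
        # Sum the similarities of matches whose two frames share a bin.
        agree = sum(
            sim
            for q_t, r_t, sim in matches
            if _bin_index(q_t, q_start, q_len, bins)
            == _bin_index(r_t, r_start, r_len, bins)
        )
        if level == 0:
            weight = 1.0 / 2 ** top
        else:
            weight = 1.0 / 2 ** (top - level + 1)
        score += weight * agree
    # Normalize by the number of frame-level matches.
    return score / len(matches)


# Eight matches in temporal order, and the same frames in reversed order.
aligned = [(float(t), 100.0 + t, 1.0) for t in range(8)]
reversed_order = [(float(t), 107.0 - t, 1.0) for t in range(8)]
aligned_score = temporal_pyramid_score(aligned, (0.0, 8.0), (100.0, 108.0))
reversed_score = temporal_pyramid_score(reversed_order, (0.0, 8.0), (100.0, 108.0))
```

In this toy setup the temporally consistent matches agree at every level, while the reversed matches agree only at the coarsest level, so `aligned_score` is higher than `reversed_score`. A real system would also need to search over candidate reference segments, which this sketch leaves out.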

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 10, Issue 1
December 2013
166 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/2559928
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 December 2013
Accepted: 01 April 2013
Revised: 01 November 2012
Received: 01 November 2011
Published in TOMM Volume 10, Issue 1

Author Tags

  1. Content-based copy detection
  2. feature representation
  3. temporal pyramid matching

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 4
Reflects downloads up to 23 Nov 2024

Cited By

  • (2022) TCSD: Triple Complementary Streams Detector for Comprehensive Deepfake Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 19:6 (1-22). DOI: 10.1145/3558004. Online publication date: 22-Aug-2022
  • (2022) Detection of AI-Manipulated Fake Faces via Mining Generalized Features. ACM Transactions on Multimedia Computing, Communications, and Applications 18:4 (1-23). DOI: 10.1145/3499026. Online publication date: 4-Mar-2022
  • (2020) Advance on large scale near-duplicate video retrieval. Frontiers of Computer Science: Selected Publications from Chinese Universities 14:5. DOI: 10.1007/s11704-019-8229-7. Online publication date: 3-Jan-2020
  • (2019) Global-view hashing. World Wide Web 22:2 (771-789). DOI: 10.1007/s11280-018-0536-7. Online publication date: 1-Mar-2019
  • (2019) Multiscale video sequence matching for near-duplicate detection and retrieval. Multimedia Tools and Applications 78:1 (311-336). DOI: 10.1007/s11042-018-5862-3. Online publication date: 1-Jan-2019
  • (2018) Ownership Identification and Signaling of Multimedia Content Components. 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) (212-213). DOI: 10.1109/MIPR.2018.00049. Online publication date: Apr-2018
  • (2017) UFvH. Proceedings of the Workshop on Visual Analysis in Smart and Connected Communities (17-24). DOI: 10.1145/3132734.3132738. Online publication date: 23-Oct-2017
  • (2017) Comprehensive Feature-Based Robust Video Fingerprinting Using Tensor Model. IEEE Transactions on Multimedia 19:4 (785-796). DOI: 10.1109/TMM.2016.2629758. Online publication date: 1-Apr-2017
  • (2017) Two-layer video fingerprinting strategy for near-duplicate video detection. 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) (555-560). DOI: 10.1109/ICMEW.2017.8026322. Online publication date: Jul-2017
  • (2017) Robust video fingerprints using positions of salient regions. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (3041-3045). DOI: 10.1109/ICASSP.2017.7952715. Online publication date: Mar-2017
