DOI:10.1145/3517077.3517091
research-article

A preliminary study of challenges in extracting purity videos from the AV Speech Benchmark

Published: 22 May 2022

Abstract

Recently reported deep audiovisual models have shown promising results on the cocktail party problem and are attracting new studies. Audiovisual datasets are an important basis for these studies. Here we investigate the AVSpeech dataset [1], a popular dataset released by a Google team for training deep audiovisual models for multi-talker speech separation. Our goal is to derive a special kind of video, called a purity video, from the dataset. A purity video consists of consecutive image frames that show the face of the same person over a period of time.
A natural question is how to extract as many purity videos as possible from the AVSpeech dataset. This paper presents the tools and methods we used, the problems we encountered, and the purity videos we obtained. Our main contributions are as follows: 1) We propose a solution for extracting a derived subset of the AVSpeech dataset that is of high quality and larger than the existing publicly available training sets. 2) We implemented this solution and ran experiments on the AVSpeech dataset, obtaining insightful results. 3) We evaluated the proposed solution on VTData, a dataset we labeled manually. Experiments show that our solution is effective and robust. We hope this work helps the community exploit the AVSpeech dataset for other video understanding tasks.
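The purity-video definition above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes an upstream face detector and face-embedding model have already produced, for each frame, a list of face embeddings. The function name `purity_segments`, the cosine-similarity test, and the `threshold` and `min_len` parameters are all hypothetical choices made for this illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Embedding = Sequence[float]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def purity_segments(
    frames: List[List[Embedding]],
    min_len: int = 2,          # hypothetical minimum run length, in frames
    threshold: float = 0.8,    # hypothetical same-identity similarity cutoff
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges in which every frame
    contains exactly one face and that face matches the first face of
    the range. frames[i] holds the face embeddings found in frame i
    (assumed to come from an upstream detector, not computed here)."""
    segments: List[Tuple[int, int]] = []
    start = None
    ref = None
    for i, faces in enumerate(frames):
        single = len(faces) == 1
        same = single and (
            ref is None or cosine_similarity(ref, faces[0]) >= threshold
        )
        if same:
            if start is None:
                start, ref = i, faces[0]
            continue
        # The current run (if any) ends before frame i.
        if start is not None and i - start >= min_len:
            segments.append((start, i))
        # A single face with a different identity starts a new run.
        if single:
            start, ref = i, faces[0]
        else:
            start, ref = None, None
    if start is not None and len(frames) - start >= min_len:
        segments.append((start, len(frames)))
    return segments
```

Under these assumptions, a frame with no face, with several faces, or with a face whose embedding differs from the reference closes the current segment. A real system would also need to handle detector noise, for example by tolerating a few missed detections.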

References

[1] Qian Y, Weng C, Chang X, et al. Past review, current progress, and challenges ahead on the cocktail party problem[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(1): 40-63.
[2] Zhu H, Luo M, Wang R, et al. Deep Audio-Visual Learning: A Survey[J]. arXiv preprint arXiv:2001.04758, 2020.
[3] Ephrat A, Mosseri I, Lang O, et al. Looking to listen at the cocktail party: A speaker-independent audio-visual model for speech separation[J]. arXiv preprint arXiv:1804.03619, 2018.
[4] Rigal R, Chodorowski J, Zerr B. Deep Audio-Visual Speech Separation Based on Facial Motion[J]. Proc. Interspeech 2021, 2021: 3540-3544.
[5] Owens A, Efros A A. Audio-visual scene analysis with self-supervised multisensory features[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 631-648.
[6] Zhao H, Gan C, Rouditchenko A, et al. The sound of pixels[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 570-586.
[7] Xiang J, Zhu G. Joint face detection and facial expression recognition with MTCNN[C]//2017 4th International Conference on Information Science and Control Engineering (ICISCE). IEEE, 2017: 424-427.

Published In

ICMIP '22: Proceedings of the 2022 7th International Conference on Multimedia and Image Processing
January 2022
250 pages
ISBN:9781450387408
DOI:10.1145/3517077
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AVSpeech dataset
  2. deep audiovisual model
  3. face recognition
  4. purity video

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMIP 2022
