DOI: 10.1145/3581783.3611853
research-article

TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification

Published: 27 October 2023

Abstract

Audiovisual data is everywhere in the digital age, which places greater demands on the deep learning models built on it. Handling the information in multi-modal data well is the key to a better audiovisual model. We observe that audiovisual data naturally carries temporal attributes, such as the time of each frame in a video. More concretely, such data is inherently multi-modal, consisting of audio and visual cues that unfold in strict chronological order. This indicates that temporal information matters for multi-modal acoustic event modeling, both within each modality (intra-modal) and across modalities (inter-modal). However, existing methods process the features of each modality independently and then simply fuse them, which neglects the temporal relations between them and thus leads to sub-optimal performance. With this motivation, we propose a Temporal Multi-modal graph learning method for Acoustic event Classification, called TMac, which models such temporal information via graph learning techniques. In particular, we construct a temporal graph for each acoustic event by dividing its audio data and video data into multiple segments. Each segment is treated as a node, and the temporal relationships between nodes are represented as timestamps on their edges. In this way, we can smoothly capture dynamic information both within and across modalities. Several experiments demonstrate that TMac outperforms other state-of-the-art (SOTA) models. Our code is available at https://github.com/MGitHubL/TMac.
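The abstract describes the graph only at a high level: audio and video segments become nodes, and edges carry timestamps. The following is a minimal sketch of that idea in Python with NumPy, not TMac's implementation (the linked repository holds the authors' code). The edge rules (consecutive segments within a modality, same-window segments across modalities), the fixed segment length `seg_len`, the exponential time-decay weighting, and the mean readout are all assumptions chosen for illustration, and the names `TemporalEdge`, `build_event_graph`, and `temporal_aggregate` are hypothetical.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class TemporalEdge:
    src: int   # index of one segment node
    dst: int   # index of the other segment node
    t: float   # timestamp attached to the edge (seconds); stored for reference
    kind: str  # "audio" / "video" (intra-modal) or "cross" (inter-modal)


def build_event_graph(audio, video, seg_len=1.0):
    """Build a temporal graph for one acoustic event (hypothetical edge rules).

    audio: (n_a, d) array, one feature row per audio segment
    video: (n_v, d) array, one feature row per video segment
    seg_len: assumed duration of each segment in seconds
    """
    n_a, n_v = len(audio), len(video)
    feats = np.vstack([audio, video]).astype(float)  # audio nodes first, then video nodes
    times = np.concatenate([np.arange(n_a), np.arange(n_v)]) * seg_len
    edges = []
    # Intra-modal edges: each segment links to the next segment of the same modality.
    for i in range(n_a - 1):
        edges.append(TemporalEdge(i, i + 1, float(times[i + 1]), "audio"))
    for j in range(n_v - 1):
        edges.append(TemporalEdge(n_a + j, n_a + j + 1, float(times[n_a + j + 1]), "video"))
    # Inter-modal edges: audio and video segments covering the same time window.
    for k in range(min(n_a, n_v)):
        edges.append(TemporalEdge(k, n_a + k, float(times[k]), "cross"))
    return feats, times, edges


def temporal_aggregate(feats, times, edges, decay=0.5):
    """One round of time-decayed mean aggregation (an illustrative choice).

    A node receives messages only from neighbours whose segment does not start
    later than its own, weighted by exp(-decay * time gap between node times).
    """
    out = feats.copy()
    weight_sum = np.ones(len(feats))  # each node keeps its own feature with weight 1
    for e in edges:
        for u, v in ((e.src, e.dst), (e.dst, e.src)):
            gap = times[v] - times[u]
            if gap >= 0:  # causal: never pass messages from a later segment
                w = np.exp(-decay * gap)
                out[v] += w * feats[u]
                weight_sum[v] += w
    return out / weight_sum[:, None]


# Usage: 4 audio segments and 3 video segments with 8-dim features.
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 8))
video = rng.normal(size=(3, 8))
feats, times, edges = build_event_graph(audio, video)
h = temporal_aggregate(feats, times, edges)
event_embedding = h.mean(axis=0)  # graph-level readout that a classifier would consume
```

The causal rule and the exponential decay are simply one way to make aggregation depend on timing; a learned temporal attention or an intensity-based weighting would be equally plausible choices under the same graph construction.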

Information

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. acoustic event classification
  2. multi-modal audiovisual learning
  3. temporal graph learning

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 319
  • Downloads (last 6 weeks): 27
Reflects downloads up to 09 Nov 2024

Cited By

  • (2025) A survey of graph neural networks and their industrial applications. Neurocomputing 614, 128761. DOI: 10.1016/j.neucom.2024.128761. Online publication date: Jan-2025
  • (2024) Time-Frequency Domain Fusion Enhancement for Audio Super-Resolution. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 2879-2887. DOI: 10.1145/3664647.3681486. Online publication date: 28-Oct-2024
  • (2024) MMDFND: Multi-modal Multi-Domain Fake News Detection. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 1178-1186. DOI: 10.1145/3664647.3681317. Online publication date: 28-Oct-2024
  • (2024) Simple Yet Effective: Structure Guided Pre-trained Transformer for Multi-modal Knowledge Graph Reasoning. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 1554-1563. DOI: 10.1145/3664647.3681112. Online publication date: 28-Oct-2024
  • (2024) Distributed and Joint Evidential K-Nearest Neighbor Classification. IEEE Transactions on Knowledge and Data Engineering 36, 11, pp. 5972-5985. DOI: 10.1109/TKDE.2023.3341098. Online publication date: 1-Nov-2024
  • (2024) Self-Supervised Contrastive Graph Clustering Network via Structural Information Fusion. 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 254-259. DOI: 10.1109/CSCWD61410.2024.10580852. Online publication date: 8-May-2024
  • (2024) Transferable graph auto-encoders for cross-network node classification. Pattern Recognition 150, C. DOI: 10.1016/j.patcog.2024.110334. Online publication date: 2-Jul-2024
  • (2023) Structural Embedding Pre-Training for Deep Temporal Graph Learning. 2023 China Automation Congress (CAC), pp. 7615-7620. DOI: 10.1109/CAC59555.2023.10450968. Online publication date: 17-Nov-2023
