Topic Editors

Prof. Dr. Hamad Naeem
School of Computer Science and Technology, Zhoukou Normal University, Zhoukou 466001, China
Prof. Dr. Hong Su
School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
Prof. Dr. Amjad Alsirhani
College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia
Prof. Dr. Muhammad Shoaib Bhutta
School of Automobile Engineering, Guilin University of Aerospace Technology, Guilin 541004, China

Research on Deep Neural Networks for Video Motion Recognition

Abstract submission deadline: 30 November 2024
Manuscript submission deadline: 31 January 2025
Viewed by: 2478

Topic Information

Dear Colleagues,

Deep neural networks have been widely used for video motion recognition tasks, such as action recognition, activity recognition, and gesture recognition. This Topic call aims to bridge the gap between theoretical research and practical applications in the field of video motion recognition using deep neural networks. The articles are expected to provide insights into the latest trends and advancements in the field and their potential to address real-world problems in various domains. The issue will also highlight the limitations and open research problems in the area, paving the way for future research directions. The contributions are expected to provide a comprehensive and detailed understanding of the underlying principles and techniques of deep-neural-network-based video motion recognition, facilitating the development of innovative solutions and techniques to overcome the existing challenges in the field. Topics of interest include but are not limited to:

  • Novel deep neural network architectures for video motion recognition;
  • Learning spatiotemporal features for video motion recognition (see the brief illustrative sketch after this list);
  • Transfer learning and domain adaptation for video motion recognition;
  • Large-scale video datasets and benchmarking for video motion recognition;
  • Applications of deep neural networks for video motion recognition, such as human–computer interaction, surveillance, and sports analysis;
  • Applications of explainable artificial intelligence for video motion recognition.
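
As an illustrative companion to the spatiotemporal-feature topic above (not part of the call itself), the following minimal PyTorch sketch shows one way to learn spatiotemporal features with a tiny 3D CNN for clip-level motion recognition. The layer sizes, class count, and dummy clip shape are arbitrary choices for the example, not a recommended architecture.

```python
# A minimal, illustrative sketch of spatiotemporal feature learning with a 3D CNN.
# All sizes below are arbitrary; clips are assumed to be shaped (B, C, T, H, W).
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Toy spatiotemporal feature extractor plus an action classifier head."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # 3D convolutions mix information across time (frames) and space.
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.BatchNorm3d(16),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),   # pool spatially only
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),               # global spatiotemporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (B, 3, T, H, W) -> logits: (B, num_classes)
        feats = self.features(clips).flatten(1)
        return self.classifier(feats)

if __name__ == "__main__":
    model = Tiny3DCNN(num_classes=10)
    dummy_clips = torch.randn(2, 3, 16, 112, 112)  # 2 clips, 16 frames each
    print(model(dummy_clips).shape)                # torch.Size([2, 10])
```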

Prof. Dr. Hamad Naeem
Prof. Dr. Hong Su
Prof. Dr. Amjad Alsirhani
Prof. Dr. Muhammad Shoaib Bhutta
Topic Editors

Keywords

  • deep learning
  • video analysis
  • motion detection
  • computer vision
  • neural networks

Participating Journals

Journal Name         Impact Factor   CiteScore   Launched Year   First Decision (median)   APC
Future Internet      2.8             7.1         2009            13.1 days                 CHF 1600
Information          2.4             6.9         2010            14.9 days                 CHF 1600
Journal of Imaging   2.7             5.9         2015            20.9 days                 CHF 1800
Mathematics          2.3             4.0         2013            17.1 days                 CHF 2600
Symmetry             2.2             5.4         2009            16.8 days                 CHF 2400

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to enjoy the benefits of posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (2 papers)

Article, 16 pages
Video WeAther RecoGnition (VARG): An Intensity-Labeled Video Weather Recognition Dataset
by Himanshu Gupta, Oleksandr Kotlyar, Henrik Andreasson and Achim J. Lilienthal
J. Imaging 2024, 10(11), 281; https://doi.org/10.3390/jimaging10110281 - 5 Nov 2024
Viewed by 500
Abstract
Adverse weather (rain, snow, and fog) can negatively impact computer vision tasks by introducing noise in sensor data; therefore, it is essential to recognize weather conditions for building safe and robust autonomous systems in the agricultural and autonomous driving/drone sectors. The performance degradation in computer vision tasks due to adverse weather depends on the type of weather and the intensity, which influences the amount of noise in sensor data. However, existing weather recognition datasets often lack intensity labels, limiting their effectiveness. To address this limitation, we present VARG, a novel video-based weather recognition dataset with weather intensity labels. The dataset comprises a diverse set of short video sequences collected from various social media platforms and videos recorded by the authors, processed into usable clips, and categorized into three major weather categories, rain, fog, and snow, with three intensity classes: absent/no, moderate, and high. The dataset contains 6742 annotated clips from 1079 videos, with the training set containing 5159 clips and the test set containing 1583 clips. Two sets of annotations are provided for training, the first set to train the models as a multi-label weather intensity classifier and the second set to train the models as a multi-class classifier for three weather scenarios. This paper describes the dataset characteristics and presents an evaluation study using several deep learning-based video recognition approaches for weather intensity prediction.
Figures

Figure 1: Illustration of the dataset construction pipeline and the dataset directory structure. (a) Dataset construction pipeline. (b) Dataset folder structure.
Figure 2: Examples demonstrating the variety in the VARG dataset regarding scenario, camera dynamicity, time of day, and image quality.
Figure 3: Rain streak and snowflake detection using [14] (2nd column), [15] (3rd column), and [16] (4th column). Top row: dynamic background, heavy snowfall, and low illumination. Middle row: static background, heavy snowfall, and well illuminated. Bottom row: dynamic background, heavy rainfall, and well illuminated.
Figure 4: Examples of the two sets of annotations provided in the VARG dataset.
Figure 5: Model architecture with four different classification heads used for weather prediction. Here, "B" is the batch size.
Figure 6: Confusion matrices of multi-class weather intensity classification using the MViTv2 backbone and multi-class multiple-head classification with attention module (MCMHA), which provided the best exact-match accuracy on the test dataset. Darker colors correspond to larger values.
Figure 7: Examples of wrong weather intensity classifications for (a) rain, (b) fog, and (c) snow. The ground-truth and predicted labels are given above each image in the format "true-predicted".
Figure A1: Lidar scans and efficiency of ROR and DROR filters in moderate (top row) and heavy (bottom row) snow. Source: sequence "2018_03_07" of the CADC dataset (top row) and sequence "2021_01_26" of the Boreas dataset (bottom row).
Figure A2: Lidar and camera in moderate (top row) and heavy (bottom row) fog. Source: sequences "fog_6_0" and "fog_8_0" from the Radiate dataset for the top and bottom rows, respectively. The maximum point range for lidar scans in moderate and heavy fog is 50 m and 18 m, respectively, in this particular case.
Figure A3: Object detection results using YOLOv5 in "moderate" and "heavy" snow (1st and 2nd rows, respectively). Red boxes indicate the ground truth, and orange (car) boxes show the predictions of the YOLOv5 model.
Figure A4: Object detection results using YOLOv5 in "moderate" and "heavy" fog (1st and 2nd rows, respectively). Red boxes indicate the ground truth; green (truck) and orange (car) boxes show the predictions of the YOLOv5 model.
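
No code accompanies this listing, so the following is a purely hypothetical sketch of the multiple-head intensity-classification idea described in the abstract and Figure 5 (one head per weather type, each predicting absent/moderate/high). The class name, feature dimension, and the random stand-in backbone features and labels are assumptions for illustration; the paper's actual backbones (e.g., MViTv2) and training setup are not reproduced here.

```python
# Hypothetical sketch only: per-weather intensity heads (rain/fog/snow, each with
# absent/moderate/high classes) on top of pooled features from some video backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeatherIntensityHeads(nn.Module):
    def __init__(self, feat_dim: int = 512, num_intensities: int = 3):
        super().__init__()
        # One classification head per weather type.
        self.heads = nn.ModuleDict({
            weather: nn.Linear(feat_dim, num_intensities)
            for weather in ("rain", "fog", "snow")
        })

    def forward(self, clip_features: torch.Tensor):
        # clip_features: (B, feat_dim) pooled by any video backbone (not shown).
        return {weather: head(clip_features) for weather, head in self.heads.items()}

# Toy usage with random stand-in features and labels.
heads = WeatherIntensityHeads(feat_dim=512)
features = torch.randn(4, 512)                        # 4 clips' pooled features
logits = heads(features)                              # dict of (4, 3) logit tensors
labels = {w: torch.randint(0, 3, (4,)) for w in logits}
loss = sum(F.cross_entropy(logits[w], labels[w]) for w in logits)
```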
Article, 16 pages
Temporal–Semantic Aligning and Reasoning Transformer for Audio-Visual Zero-Shot Learning
by Kaiwen Zhang, Kunchen Zhao and Yunong Tian
Mathematics 2024, 12(14), 2200; https://doi.org/10.3390/math12142200 - 13 Jul 2024
Cited by 1 | Viewed by 630
Abstract
Zero-shot learning (ZSL) enables models to recognize categories not encountered during training, which is crucial for categories with limited data. Existing methods overlook efficient temporal modeling in multimodal data. This paper proposes a Temporal–Semantic Aligning and Reasoning Transformer (TSART) for spatio-temporal modeling. TSART uses the pre-trained SeLaVi network to extract audio and visual features and explores the semantic information of these modalities through audio and visual encoders. It incorporates a temporal information reasoning module to enhance the capture of temporal features in audio, and a cross-modal reasoning module to effectively integrate audio and visual information, establishing a robust joint embedding representation. Our experimental results validate the effectiveness of this approach, demonstrating outstanding Generalized Zero-Shot Learning (GZSL) performance on the UCF101 Generalized Zero-Shot Learning (UCF-GZSL), VGGSound-GZSL, and ActivityNet-GZSL datasets, with notable improvements in the Harmonic Mean (HM) evaluation. These results indicate that TSART has great potential in handling complex spatio-temporal information and multimodal fusion.
Figures

Figure 1: The changes in the position of the hand during shooting.
Figure 2: The intensity variation of sound signals at different frequencies in the audio.
Figure 3: The Temporal–Semantic Aligning and Reasoning Transformer architecture includes an audio encoder, visual encoder, temporal information reasoning, and cross-modal reasoning module. It effectively models temporal information in audio-visual content.
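
Likewise, no code is provided in this listing; the fragment below is a hypothetical, minimal sketch of the generic audio-visual zero-shot pattern the abstract describes: project both modalities into a joint embedding, fuse them, and score the result against class semantic embeddings. It is not the authors' TSART implementation; the module names, dimensions, random class embeddings, and the single transformer layer standing in for cross-modal reasoning are illustrative assumptions only.

```python
# Hypothetical sketch of audio-visual zero-shot classification via a joint embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointAVEmbedding(nn.Module):
    def __init__(self, audio_dim=512, visual_dim=512, embed_dim=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, embed_dim)
        self.visual_proj = nn.Linear(visual_dim, embed_dim)
        # A single transformer layer as a stand-in for cross-modal reasoning.
        self.fuse = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                               batch_first=True)

    def forward(self, audio_feat, visual_feat):
        # audio_feat: (B, audio_dim), visual_feat: (B, visual_dim)
        tokens = torch.stack([self.audio_proj(audio_feat),
                              self.visual_proj(visual_feat)], dim=1)  # (B, 2, D)
        fused = self.fuse(tokens).mean(dim=1)                         # (B, D)
        return F.normalize(fused, dim=-1)

# Zero-shot prediction: pick the class whose (precomputed) semantic embedding
# is most similar to the fused audio-visual embedding.
model = JointAVEmbedding()
audio, visual = torch.randn(2, 512), torch.randn(2, 512)
class_embeds = F.normalize(torch.randn(10, 256), dim=-1)   # 10 unseen classes
scores = model(audio, visual) @ class_embeds.T             # cosine similarities
pred = scores.argmax(dim=1)                                # predicted class indices
```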