Topic Editors

Dr. Wei Zhou
School of Computer Science and Informatics, Cardiff University, Cathays, Cardiff CF24 4AG, UK
Dr. Guanghui Yue
School of Biomedical Engineering, Shenzhen University Health Science Center, Shenzhen 518037, China
Dr. Wenhan Yang
Peng Cheng Laboratory, Shenzhen 518066, China

Visual Computing and Understanding: New Developments and Trends

Abstract submission deadline
30 December 2024
Manuscript submission deadline
30 March 2025
Viewed by
3679

Topic Information

Dear Colleagues,

Humans handle massive amounts of visual information in their daily lives. As a result, there has been growing interest in recent years in advancing artificial intelligence-based perception and analysis algorithms in computer vision and image processing.

Despite significant successes in visual computing and understanding in recent years, the methods behind these achievements are still in their infancy, especially for many complex real-world applications.

The aim of this Topic is to advance the field by collecting research on both theoretical and applied issues related to visual computing and understanding. Interested authors are invited to submit innovative manuscripts on (but not limited to) the following topics:

  • Image/video acquisition, fusion, and generation;
  • Image/video coding, restoration, and quality assessment;
  • Image/video classification, segmentation, and detection;
  • Deep learning-based methods for image processing and analysis;
  • Deep learning-based methods for video processing and analysis;
  • Deep learning-based computer vision methods for 3D models;
  • Intelligent vision methods for autonomous driving systems;
  • Robotic vision and its applications;
  • Biomedical vision analysis and applications;
  • Advances in visual computing theories.

Dr. Wei Zhou
Dr. Guanghui Yue
Dr. Wenhan Yang
Topic Editors

Keywords

  • image processing and video processing
  • visual computing and deep learning
  • computer vision and robotic vision
  • autonomous driving
  • biomedical vision
  • image acquisition and image fusion
  • generative models
  • video coding and image restoration
  • quality assessment
  • visual understanding
  • feature extraction and object detection
  • image classification
  • semantic segmentation
  • saliency detection
  • perception modelling

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Applied Sciences (applsci) | 2.5 | 5.3 | 2011 | 17.8 days | CHF 2400
Computers (computers) | 2.6 | 5.4 | 2012 | 17.2 days | CHF 1800
Electronics (electronics) | 2.6 | 5.3 | 2012 | 16.8 days | CHF 2400
Information (information) | 2.4 | 6.9 | 2010 | 14.9 days | CHF 1600
Journal of Imaging (jimaging) | 2.7 | 5.9 | 2015 | 20.9 days | CHF 1800

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing research from the start and empowering the research journey.

MDPI Topics is cooperating with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of this by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with a time-stamped preprint;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (4 papers)

19 pages, 5749 KiB  
Article
Video Anomaly Detection Based on Global–Local Convolutional Autoencoder
by Fusheng Sun, Jiahao Zhang, Xiaodong Wu, Zhong Zheng and Xiaowen Yang
Electronics 2024, 13(22), 4415; https://doi.org/10.3390/electronics13224415 - 11 Nov 2024
Viewed by 444
Abstract
Video anomaly detection (VAD) plays a crucial role in fields such as security, production, and transportation. To address the issue of overgeneralization in anomaly behavior prediction by deep neural networks, we propose a network called AMFCFBMem-Net (appearance and motion feature cross-fusion block memory network), which combines appearance and motion feature cross-fusion blocks. Firstly, dual encoders for appearance and motion are employed to separately extract these features, which are then integrated into the skip connection layer to mitigate the model’s tendency to predict abnormal behavior, ultimately enhancing the prediction accuracy for abnormal samples. Secondly, a motion foreground extraction module is integrated into the network to generate a foreground mask map based on speed differences, thereby widening the prediction error margin between normal and abnormal behaviors. To capture the latent features of various patterns of normal samples, a memory module is introduced at the bottleneck of the encoder and decoder structures. This further enhances the model’s anomaly detection capabilities and diminishes its predictive generalization towards abnormal samples. The experimental results on the UCSD Pedestrian dataset 2 (UCSD Ped2) and CUHK Avenue anomaly detection dataset (CUHK Avenue) demonstrate that, compared to current cutting-edge video anomaly detection algorithms, our proposed method achieves frame-level AUCs of 97.5% and 88.8%, respectively, effectively enhancing anomaly detection capabilities.
Figures:
Figure 1. The AMFCFBMem-Net network model: (1) the appearance encoder takes the continuous RGB video frames I_{t-4}, ..., I_{t-1} as input to extract visual features of the static scene and the objects of interest; (2) the video motion foreground extraction module extracts motion foreground information from these frames; (3) the motion encoder takes the motion foreground information as input to extract motion features; (4) the decoder translates the encoded features back into a higher-dimensional space to generate images and predict future motion states; (5) the AMFCFB module fuses the appearance and motion features; (6) the memory module captures the deep, appearance-based semantic features of the various patterns in normal samples.
Figure 2. AMFCFB model framework: the appearance feature map F_a and the motion feature map F_m are combined via element-wise subtraction and addition, 1 × 1 convolutions, and sigmoid gating to produce f_a (enhancing the background characteristics) and f_m (enhancing the motion foreground feature).
Figure 3. Vectorization feature operation (VO): the fused feature f_am (the sum of f_a and f_m) is combined with F_a and F_m through a vectorization operation and a per-element softmax, reshaped to C × (H × W), and condensed back into a C × H × W feature map.
Figure 4. Motion foreground extraction module: two consecutive frames are converted to grayscale, the absolute difference between them is binarized into a foreground mask G, and the mask is applied to the original frames to extract the motion foreground regions.
Figure 5. Memory module network architecture: the C × H × W encoder output is split into K = H × W query terms of size C × 1 × 1, which read from and update M storage items of the same size that record the latent characteristics of normal data.
Figure 6. Outlier scores on the UCSD Ped2 and CUHK Avenue datasets: (a) UCSD Ped2; (b) CUHK Avenue.
Figure 7. ROC curves of various models on the UCSD Ped2 and CUHK Avenue test datasets: (a) UCSD Ped2; (b) CUHK Avenue.
Figure 8. Prediction error plots of various models on the UCSD Ped2 and CUHK Avenue test datasets.
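To make the memory mechanism described in the abstract and in the Figure 5 caption concrete, the sketch below shows a minimal, generic memory-read module in PyTorch: the bottleneck feature map is split into spatial queries that attend over a small set of learned prototype items, so the decoder can only reproduce patterns close to normal data. This is not the authors’ AMFCFBMem-Net code; the number of items, channel width, and feature-map size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MemoryRead(nn.Module):
    """Generic memory-read sketch: each spatial query attends over M learned
    prototype items and is replaced by their softmax-weighted combination."""
    def __init__(self, num_items: int = 10, channels: int = 512):
        super().__init__()
        # M memory items, each of size C (sizes are assumptions, not the paper's)
        self.items = nn.Parameter(torch.randn(num_items, channels))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) bottleneck feature map from the encoder
        b, c, h, w = feat.shape
        queries = feat.permute(0, 2, 3, 1).reshape(b, h * w, c)   # K = H*W queries
        attn = torch.softmax(queries @ self.items.t(), dim=-1)    # (B, K, M)
        read = attn @ self.items                                  # (B, K, C)
        return read.reshape(b, h, w, c).permute(0, 3, 1, 2)       # back to (B, C, H, W)

if __name__ == "__main__":
    mem = MemoryRead(num_items=10, channels=512)
    z = torch.randn(2, 512, 32, 32)
    print(mem(z).shape)  # torch.Size([2, 512, 32, 32])
```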
16 pages, 6488 KiB  
Article
3D-CNN Method for Drowsy Driving Detection Based on Driving Pattern Recognition
by Jimin Lee, Soomin Woo and Changjoo Moon
Electronics 2024, 13(17), 3388; https://doi.org/10.3390/electronics13173388 - 26 Aug 2024
Viewed by 646
Abstract
Drowsiness impairs drivers’ concentration and reaction time, doubling the risk of car accidents. Various methods for detecting drowsy driving have been proposed that rely on facial changes. However, they perform poorly for drivers wearing a mask or sunglasses, and they do not reflect the driver’s drowsiness habits. Therefore, this paper proposes a novel method to detect drowsy driving even with facial detection obstructions, such as masks or sunglasses, and regardless of the driver’s individual drowsiness habits, by recognizing behavioral patterns. We achieve this by constructing both normal driving and drowsy driving datasets and developing a 3D-CNN (3D convolutional neural network) model reflecting the Inception structure of GoogLeNet. This binary classification model classifies normal driving and drowsy driving videos. Using videos captured inside real vehicles, the model achieved a classification accuracy of 85% for detecting drowsy driving without facial obstructions and 75% when masks or sunglasses were worn. Our results demonstrate that the behavioral pattern recognition method is effective in detecting drowsy driving.
Figures:
Figure 1. Hyundai Avante (mid-sized sedan) used for data collection.
Figure 2. Arducam 8MP camera installed inside the vehicle.
Figure 3. Example of the recorded video data (the actual data used do not obscure the face).
Figure 4. The image with the background removed using OpenCV (the actual data used do not obscure the face).
Figure 5. The drowsy driving detection model architecture proposed in this paper.
Figure 6. Accuracy on the test data.
Figure 7. Accuracy for new drivers without facial obstructions.
Figure 8. Accuracy with facial obstructions, divided into groups with masks and sunglasses.
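As a rough illustration of the kind of model the abstract describes, the following PyTorch sketch pairs an Inception-style 3D convolutional block (parallel 1×1×1, 3×3×3, and pooling branches concatenated along the channel axis) with a two-class head over short driver-behaviour clips. It is not the authors’ network; the layer counts, channel widths, and clip size are assumptions chosen only to keep the example runnable.

```python
import torch
import torch.nn as nn

class Inception3D(nn.Module):
    """Inception-style 3D block (sketch): parallel branches concatenated on channels."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        branch_ch = out_ch // 3
        self.b1 = nn.Conv3d(in_ch, branch_ch, kernel_size=1)
        self.b2 = nn.Sequential(
            nn.Conv3d(in_ch, branch_ch, kernel_size=1),
            nn.Conv3d(branch_ch, branch_ch, kernel_size=3, padding=1),
        )
        self.b3 = nn.Sequential(
            nn.MaxPool3d(kernel_size=3, stride=1, padding=1),
            nn.Conv3d(in_ch, out_ch - 2 * branch_ch, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)

class DrowsinessClassifier(nn.Module):
    """Binary classifier over driver-behaviour clips of shape (B, 3, T, H, W)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            Inception3D(32, 64), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(64, 2)  # normal vs. drowsy driving

    def forward(self, clip):
        return self.head(self.features(clip).flatten(1))

if __name__ == "__main__":
    model = DrowsinessClassifier()
    clip = torch.randn(1, 3, 16, 112, 112)  # hypothetical clip size
    print(model(clip).shape)  # torch.Size([1, 2])
```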
17 pages, 6653 KiB  
Article
Supervised-Learning-Based Method for Restoring Subsurface Shallow-Layer Q Factor Distribution
by Danfeng Zang, Jian Li, Chuankun Li, Mingxing Ma, Chenli Guo and Jiangang Wang
Electronics 2024, 13(11), 2145; https://doi.org/10.3390/electronics13112145 - 30 May 2024
Viewed by 422
Abstract
The distribution of shallow subsurface quality factors (Q) is a crucial indicator for assessing the integrity of subsurface structures and serves as a primary parameter for evaluating the attenuation characteristics of seismic waves propagating through subsurface media. As the complexity of underground spaces increases, regions expand, and testing environments diversify, the survivability of test nodes is compromised, resulting in sparse effective seismic data with a low signal-to-noise ratio (SNR). Within the confined area defined by the source and sensor placement, only the Q factor along the wave propagation path can be estimated with relative accuracy. Estimating the Q factor in other parts of the area is challenging. Additionally, in recent years, deep neural networks have been employed to address the issue of missing values in seismic data; however, these methods typically require large datasets to train networks that can effectively fit the data, making them less applicable to our specific problem. In response to this challenge, we have developed a supervised learning method for the restoration of shallow subsurface Q factor distributions. The process begins with the construction of an incomplete labeled data volume, followed by the application of a block-based data augmentation technique to enrich the training samples and train the network. The uniformly partitioned initial data are then fed into the trained network to obtain output data, which are subsequently combined to form a complete Q factor data volume. We have validated this training approach using various networks, all yielding favorable results. Additionally, we compared our method with a data augmentation approach that involves creating random masks, demonstrating that our method reduces the mean absolute percentage error (MAPE) by 5%.
Figures:
Figure 1. Training framework for Q factor restoration.
Figure 2. Testing framework for Q factor restoration.
Figure 3. Schematic diagram of the chromatography model.
Figure 4. Chunking operation to create a dataset.
Figure 5. Network architecture.
Figure 6. Sensor and source layout diagram.
Figure 7. Field testing: (a) electric spark source device; (b) vibration sensor node; (c) overall layout plan.
Figure 8. Waveforms received by some sensors.
Figure 9. Reference signal and its spectrum: (a) displacement of the reference sensor; (b) displacement spectrum of the reference sensor.
Figure 10. Vibration wave propagation path.
Figure 11. Comparison of losses between two initialization methods.
Figure 12. Q factor and ground truth for the training set: (a) Q factor values calculated by SART; (b) Q factor values output by the CNN.
Figure 13. Error percentage in the non-public area.
Figure 14. Comparison of two methods: (a) Q factor values from the overall random-mask method; (b) complete Q factor values restored by our method.
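Two pieces of the pipeline above translate naturally into code: the block-based augmentation that slices an incomplete Q-factor grid into overlapping training patches, and the MAPE metric behind the quoted 5% improvement. The NumPy sketch below shows both in generic form; the grid values, block size, and stride are assumptions, not the authors’ settings.

```python
import numpy as np

def extract_blocks(q_map: np.ndarray, block: int = 16, stride: int = 8) -> np.ndarray:
    """Block-based augmentation sketch: slide a window over a 2D Q-factor grid
    and collect overlapping patches as training samples."""
    h, w = q_map.shape
    patches = []
    for i in range(0, h - block + 1, stride):
        for j in range(0, w - block + 1, stride):
            patches.append(q_map[i:i + block, j:j + block])
    return np.stack(patches)

def mape(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Mean absolute percentage error, the metric quoted in the abstract."""
    return float(np.mean(np.abs((pred - truth) / (truth + eps))) * 100.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.uniform(10.0, 80.0, size=(64, 64))   # synthetic Q-factor grid
    print(extract_blocks(q).shape)               # (49, 16, 16)
    print(round(mape(q * 1.05, q), 2))           # ~5.0
```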
14 pages, 5118 KiB  
Article
Domain Adaptive Subterranean 3D Pedestrian Detection via Instance Transfer and Confidence Guidance
by Zengyun Liu, Zexun Zheng, Tianyi Qin, Liying Xu and Xu Zhang
Electronics 2024, 13(5), 982; https://doi.org/10.3390/electronics13050982 - 4 Mar 2024
Viewed by 926
Abstract
As subterranean scenes are increasingly explored, ensuring the safety of subterranean pedestrians has gradually become a hot research topic. Considering the poor illumination and lack of annotated data in subterranean scenes, it is essential to explore LiDAR-based domain adaptive detectors for localizing pedestrians in space, thus providing guidance for evacuation and rescue. In this paper, a novel domain adaptive subterranean 3D pedestrian detection method is proposed to adapt pre-trained detectors from annotated road scenes to unannotated subterranean scenes. Specifically, an instance transfer-based scene updating strategy is designed to update the subterranean scenes by transferring instances from the road scenes, aiming to create sufficient high-quality pseudo labels for fine-tuning the pre-trained detector. In addition, a pseudo label confidence-guided learning mechanism is constructed to fully utilize pseudo labels of different qualities under the guidance of confidence scores. Extensive experiments validate the superiority of our proposed domain adaptive subterranean 3D pedestrian detection method.
Figures:
Figure 1. Illustrations of different point cloud scenes: (a) a road scene from the KITTI dataset; (b) a road scene from the ONCE dataset; (c) a subterranean scene from the Edgar dataset.
Figure 2. Architecture of the proposed domain adaptive subterranean 3D pedestrian detection method.
Figure 3. Visualization results of domain adaptive subterranean 3D pedestrian detection with SECOND-IoU as the detector: pedestrian predictions are shown in green, annotated 3D bounding boxes in blue, and missed and false detections are circled in red. (a) Prediction results of the comparison method ST3D++. (b) Prediction results of the proposed method.
Figure 4. Evaluation of the hyperparameter δ when using SECOND-IoU as the detector.
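The confidence-guided learning mechanism described in the abstract can be illustrated with a simple weighting rule: pseudo labels above a high-confidence threshold contribute fully to the loss, mid-confidence ones are down-weighted by their score, and low-confidence ones are ignored. The PyTorch sketch below is a generic stand-in under assumed thresholds, not the paper’s actual mechanism.

```python
import torch

def confidence_weighted_loss(per_box_loss: torch.Tensor,
                             confidence: torch.Tensor,
                             low: float = 0.3, high: float = 0.7) -> torch.Tensor:
    """Confidence-guided pseudo-label weighting sketch (thresholds are assumptions):
    weight = 1 for high-confidence boxes, = score for mid-confidence boxes,
    and = 0 for low-confidence boxes, then average the weighted losses."""
    weight = torch.where(confidence >= high,
                         torch.ones_like(confidence),
                         torch.where(confidence >= low, confidence,
                                     torch.zeros_like(confidence)))
    return (weight * per_box_loss).sum() / weight.sum().clamp(min=1.0)

if __name__ == "__main__":
    loss = torch.tensor([1.2, 0.8, 2.0, 0.5])
    conf = torch.tensor([0.9, 0.5, 0.2, 0.75])
    print(confidence_weighted_loss(loss, conf))  # weights: 1.0, 0.5, 0.0, 1.0
```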