CVI2: Vol 18, No 5

Volume 18, Issue 5, August 2024

Publisher:

John Wiley & Sons, Inc.
605 Third Ave. New York, NY
United States

EISSN: 1751-9640

research-article

Open Access

Adversarial catoptric light: An effective, stealthy and robust physical‐world attack to DNNs

Pages 557–573 https://doi.org/10.1049/cvi2.12264

Abstract

Recent studies have demonstrated that finely tuned deep neural networks (DNNs) are susceptible to adversarial attacks. Conventional physical attacks employ stickers as perturbations, achieving robust adversarial effects but compromising ...

In view of the invisibility and robustness of the existing physical attacks, the authors propose the adversarial catoptric light, which uses a genetic algorithm to optimise the physical parameters of the catoptric light to perform black‐box physical ...

research-article

Open Access

A deep learning framework for multi‐object tracking in team sports videos

Pages 574–590 https://doi.org/10.1049/cvi2.12266

Abstract

In response to the challenges of Multi‐Object Tracking (MOT) in sports scenes, such as severe occlusions, similar appearances, drastic pose changes, and complex motion patterns, a deep‐learning framework CTGMOT (CNN‐Transformer‐GNN‐based MOT) ...

The authors propose a deep‐learning framework, CTGMOT, for multi‐object tracking (MOT) in complex team sports videos. The backbone network of the framework combines CNN and Transformers to extract local and global features, and uses parallel decoders to ...

research-article

Open Access

Clean, performance‐robust, and performance‐sensitive historical information based adversarial self‐distillation

Pages 591–612 https://doi.org/10.1049/cvi2.12265

Abstract

Adversarial training suffers from poor effectiveness due to the challenging optimisation of loss with hard labels. To address this issue, adversarial distillation has emerged as a potential solution, encouraging target models to mimic the output ...

The authors’ method allows the target model to distill the most instant robust and non‐robust knowledge from the previous iteration. To avoid storing model parameters to generate AEs, an existing self‐distillation algorithm was extended, making each “...

research-article

Open Access

Multi‐Scale Feature Attention‐DEtection TRansformer: Multi‐Scale Feature Attention for security check object detection

Pages 613–625 https://doi.org/10.1049/cvi2.12267

Abstract

X‐ray security checks aim to detect contraband in luggage; however, the detection accuracy is hindered by the overlapping and significant size differences of objects in X‐ray images. To address these challenges, the authors introduce a novel ...

The authors use dilated convolutions of multi‐scale dilation rates to build a pyramid feature extraction structure and encapsulate the structure into self‐attention. The new attention module is called Multi‐Scale Feature Attention (MSFA). MSFA can fuse ...

research-article

Open Access

OmDet: Large‐scale vision‐language multi‐dataset pre‐training with multimodal detection network

Pages 626–639 https://doi.org/10.1049/cvi2.12268

Abstract

The advancement of object detection (OD) in open‐vocabulary and open‐world scenarios is a critical challenge in computer vision. OmDet, a novel language‐aware object detection architecture and an innovative training mechanism that harnesses ...

OmDet, a novel language‐aware detector designed to enhance open‐vocabulary and open‐world object detection through a continual learning approach and multi‐dataset vision‐language pre‐training, is presented. By using natural language for knowledge ...

research-article

Open Access

A novel multi‐model 3D object detection framework with adaptive voxel‐image feature fusion

Pages 640–651 https://doi.org/10.1049/cvi2.12269

Abstract

The multifaceted nature of sensor data has long been a hurdle for those seeking to harness its full potential in the field of 3D object detection. Although the utilisation of point clouds as input has yielded exceptional results, the challenge ...

A voxel‐based single‐shot multi‐model network for 3D object detection is introduced, namely AVIFF. The authors made some new attempts in fusing features of point cloud and image by designing the adaptive feature fusion (AFF) module and dense fusion (DF) ...

research-article

Open Access

Context‐aware relation enhancement and similarity reasoning for image‐text retrieval

Pages 652–665 https://doi.org/10.1049/cvi2.12270

Abstract

Image‐text retrieval is a fundamental yet challenging task, which aims to bridge a semantic gap between heterogeneous data to achieve precise measurements of semantic similarity. The technique of fine‐grained alignment between cross‐modal ...

A novel context‐aware relation enhancement and similarity reasoning model is proposed to achieve precise image‐text retrieval, which conducts both intra‐modal relation enhancement and inter‐modal similarity reasoning while considering the global‐context ...

research-article

Open Access

ASDNet: A robust involution‐based architecture for diagnosis of autism spectrum disorder utilising eye‐tracking technology

Pages 666–681 https://doi.org/10.1049/cvi2.12271

Abstract

Autism Spectrum Disorder (ASD) is a chronic condition characterised by impairments in social interaction and communication. Early detection of ASD is desired, and there exists a demand for the development of diagnostic aids to facilitate this. A ...

An involutional neural network architecture has been developed to diagnose ASD. The proposed model is trained to detect ASD from eye‐tracking scanpaths, heatmaps, and fixation maps. Monte Carlo dropout has been applied to the model to perform an ...

research-article

Open Access

SIANet: 3D object detection with structural information augment network

Pages 682–695 https://doi.org/10.1049/cvi2.12272

Abstract

3D object detection technology from point clouds has been widely applied in the field of automatic driving in recent years. In practical applications, the shape point clouds of some objects are incomplete due to occlusion or long distance, which ...

The authors design a Structural Information Augment (SIA) module to reconstruct the complete shapes of objects within proposals and then integrate the reconstructed structural information into the spatial feature of the object for box refinement. Besides,...

research-article

Open Access

Attentional bias for hands: Cascade dual‐decoder transformer for sign language production

Pages 696–708 https://doi.org/10.1049/cvi2.12273

Abstract

Sign Language Production (SLP) refers to the task of translating textual forms of spoken language into corresponding sign language expressions. Sign languages convey meaning by means of multiple asynchronous articulators, including manual and ...

An efficient cascade dual decoder Transformer model is presented, which heuristically optimises mappings among text, hand pose, and full‐articulatory pose for sign language production (SLP). In addition, a novel spatio‐temporal loss is introduced to ...

IET Computer Vision
