Adversarial catoptric light: An effective, stealthy and robust physical‐world attack to DNNs
Recent studies have demonstrated that finely tuned deep neural networks (DNNs) are susceptible to adversarial attacks. Conventional physical attacks employ stickers as perturbations, achieving robust adversarial effects but compromising ...
Considering the invisibility and robustness of existing physical attacks, the authors propose adversarial catoptric light, which uses a genetic algorithm to optimise the physical parameters of the catoptric light to perform black‐box physical ...
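The genetic-algorithm search mentioned above can be illustrated with a minimal sketch. It assumes the attacker can only query a score to be minimised (a black-box setting). The function `toy_confidence`, the three parameters (two position values and an intensity), and every hyperparameter below are hypothetical placeholders, not the authors' setup.

```python
import random


def genetic_search(score, bounds, pop_size=20, generations=40,
                   mutation_rate=0.2, seed=0):
    """Minimise a black-box `score` over box-bounded parameters with a simple GA."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def mutate(individual):
        # Perturb some genes with Gaussian noise, then clip back into the bounds.
        mutated = []
        for value, (lo, hi) in zip(individual, bounds):
            if rng.random() < mutation_rate:
                value += rng.gauss(0.0, 0.1 * (hi - lo))
            mutated.append(min(hi, max(lo, value)))
        return mutated

    def crossover(a, b):
        # Uniform crossover: each gene comes from either parent with equal chance.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        elite = population[: pop_size // 4]  # the best quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            children.append(mutate(crossover(parent_a, parent_b)))
        population = elite + children
    return min(population, key=score)


def toy_confidence(params):
    """Hypothetical stand-in for the victim model's confidence in the true class."""
    x, y, intensity = params
    return (x - 0.3) ** 2 + (y - 0.7) ** 2 + (intensity - 0.5) ** 2


BOUNDS = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
best = genetic_search(toy_confidence, BOUNDS)
```

In a real attack the score would come from querying the target DNN on images with the rendered or projected light, which this sketch does not attempt to model.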
A deep learning framework for multi‐object tracking in team sports videos
In response to the challenges of Multi‐Object Tracking (MOT) in sports scenes, such as severe occlusions, similar appearances, drastic pose changes, and complex motion patterns, a deep‐learning framework CTGMOT (CNN‐Transformer‐GNN‐based MOT) ...
The authors propose a deep‐learning framework, CTGMOT, for multi‐object tracking (MOT) in complex team sports videos. The backbone network of the framework combines CNN and Transformers to extract local and global features, and uses parallel decoders to ...
Clean, performance‐robust, and performance‐sensitive historical information based adversarial self‐distillation
Adversarial training suffers from poor effectiveness due to the challenging optimisation of loss with hard labels. To address this issue, adversarial distillation has emerged as a potential solution, encouraging target models to mimic the output ...
The authors’ method allows the target model to distill the most instant robust and non‐robust knowledge from the previous iteration. To avoid storing model parameters to generate AEs, an existing self‐distillation algorithm was extended, making each “...
Multi‐Scale Feature Attention‐DEtection TRansformer: Multi‐Scale Feature Attention for security check object detection
X‐ray security checks aim to detect contraband in luggage; however, the detection accuracy is hindered by the overlapping and significant size differences of objects in X‐ray images. To address these challenges, the authors introduce a novel ...
The authors use dilated convolutions of multi‐scale dilation rates to build a pyramid feature extraction structure and encapsulate the structure into self‐attention. The new attention module is called Multi‐Scale Feature Attention (MSFA). MSFA can fuse ...
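As a rough illustration of why several dilation rates yield multi-scale features, the sketch below applies one kernel at dilations 1, 2 and 4 to a 1D signal and stacks the responses per position. It is a toy, pure-Python example: the function names and the 1D setting are assumptions, and it covers neither the self-attention encapsulation nor the actual MSFA module.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D dilated cross-correlation with zero padding (odd kernel assumed)."""
    half = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * dilation  # taps are spaced `dilation` samples apart
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        output.append(acc)
    return output


def multi_scale_features(signal, kernel, dilations=(1, 2, 4)):
    """Return, for each position, one response per dilation rate."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [list(column) for column in zip(*branches)]


# A single impulse in the middle of the signal.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
features = multi_scale_features(signal, kernel=[1.0, 1.0, 1.0])
```

Here `features[3]` responds only in the dilation-1 branch, `features[2]` only in the dilation-2 branch, and `features[0]` only in the dilation-4 branch, so larger dilation rates reach further from the impulse with the same three-tap kernel.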
OmDet: Large‐scale vision‐language multi‐dataset pre‐training with multimodal detection network
The advancement of object detection (OD) in open‐vocabulary and open‐world scenarios is a critical challenge in computer vision. OmDet, a novel language‐aware object detection architecture and an innovative training mechanism that harnesses ...
OmDet, a novel language‐aware detector designed to enhance open‐vocabulary and open‐world object detection through a continual learning approach and multi‐dataset vision‐language pre‐training, is presented. By using natural language for knowledge ...
A novel multi‐model 3D object detection framework with adaptive voxel‐image feature fusion
The multifaceted nature of sensor data has long been a hurdle for those seeking to harness its full potential in the field of 3D object detection. Although the utilisation of point clouds as input has yielded exceptional results, the challenge ...
A voxel‐based single‐shot multi‐model network for 3D object detection, named AVIFF, is introduced. The authors fuse point cloud and image features by designing the adaptive feature fusion (AFF) module and dense fusion (DF) ...
Context‐aware relation enhancement and similarity reasoning for image‐text retrieval
Image‐text retrieval is a fundamental yet challenging task, which aims to bridge a semantic gap between heterogeneous data to achieve precise measurements of semantic similarity. The technique of fine‐grained alignment between cross‐modal ...
A novel context‐aware relation enhancement and similarity reasoning model is proposed to achieve precise image‐text retrieval, which conducts both intra‐modal relation enhancement and inter‐modal similarity reasoning while considering the global‐context ...
ASDNet: A robust involution‐based architecture for diagnosis of autism spectrum disorder utilising eye‐tracking technology
- Nasirul Mumenin,
- Mohammad Abu Yousuf,
- Md Asif Nashiry,
- A. K. M. Azad,
- Salem A. Alyami,
- Pietro Lio',
- Mohammad Ali Moni
Autism Spectrum Disorder (ASD) is a chronic condition characterised by impairments in social interaction and communication. Early detection of ASD is desired, and there exists a demand for the development of diagnostic aids to facilitate this. A ...
An involutional neural network architecture has been developed to diagnose ASD. The proposed model is trained to detect ASD from eye‐tracking scanpaths, heatmaps, and fixation maps. Monte Carlo dropout has been applied to the model to perform an ...
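Monte Carlo dropout, in its usual form, keeps dropout active at inference time and aggregates several stochastic forward passes. The sketch below shows that idea on a toy linear scorer. The scorer, its weights, the dropout rate and the number of passes are all hypothetical and unrelated to the ASDNet architecture.

```python
import random
import statistics


def dropout(values, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def forward(features, weights, rate, rng):
    """One stochastic pass of a toy linear scorer with dropout on its inputs."""
    dropped = dropout(features, rate, rng)
    return sum(w * x for w, x in zip(weights, dropped))


def mc_dropout_predict(features, weights, rate=0.5, passes=200, seed=0):
    """Run several dropout-enabled passes; return their mean and standard deviation."""
    rng = random.Random(seed)
    samples = [forward(features, weights, rate, rng) for _ in range(passes)]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical inputs and weights, for illustration only.
mean, spread = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -0.25, 0.1])
```

The standard deviation across passes gives a simple measure of how much the prediction varies under dropout. With `rate=0.0` every pass is identical and the spread collapses to zero.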
SIANet: 3D object detection with structural information augment network
3D object detection from point clouds has been widely applied in autonomous driving in recent years. In practical applications, the point clouds of some objects are incomplete due to occlusion or long distance, which ...
The authors design a Structural Information Augment (SIA) module to reconstruct the complete shapes of objects within proposals and then integrate the reconstructed structural information into the spatial feature of the object for box refinement. Besides,...
Attentional bias for hands: Cascade dual‐decoder transformer for sign language production
Sign Language Production (SLP) refers to the task of translating textual forms of spoken language into corresponding sign language expressions. Sign languages convey meaning by means of multiple asynchronous articulators, including manual and ...
An efficient cascade dual decoder Transformer model is presented, which heuristically optimises mappings among text, hand pose, and full‐articulatory pose for sign language production (SLP). In addition, a novel spatio‐temporal loss is introduced to ...