research-article
Open Access
Adversarial catoptric light: An effective, stealthy and robust physical‐world attack to DNNs
Abstract

Recent studies have demonstrated that finely tuned deep neural networks (DNNs) are susceptible to adversarial attacks. Conventional physical attacks employ stickers as perturbations, achieving robust adversarial effects but compromising ...

In view of the invisibility and robustness of the existing physical attacks, the authors propose the adversarial catoptric light, which uses a genetic algorithm to optimise the physical parameters of the catoptric light to perform black‐box physical ...
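The summary above names a genetic algorithm as the black-box optimiser. The following is only a minimal sketch of that general technique, not the authors' implementation: the function `genetic_search`, the two-parameter search space, and the `toy_confidence` score (a hypothetical stand-in for querying a classifier's confidence) are all assumptions made for illustration.

```python
import random


def genetic_search(score, bounds, pop_size=20, generations=30,
                   mutation_scale=0.1, seed=0):
    """Minimise a black-box score over box-bounded real parameters."""
    rng = random.Random(seed)

    def clip(value, lo, hi):
        return max(lo, min(hi, value))

    # Start from uniformly random parameter vectors inside the bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=score)
        history.append(score(ranked[0]))
        # Elitism: the better half survives unchanged.
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation, scaled to each parameter's range, then clipped.
            child = [clip(g + rng.gauss(0, mutation_scale * (hi - lo)), lo, hi)
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        population = elite + children
    best = min(population, key=score)
    history.append(score(best))
    return best, history


def toy_confidence(params):
    # Hypothetical stand-in for a model query; lower means a stronger attack.
    x, y = params
    return (x - 0.3) ** 2 + (y + 0.5) ** 2


bounds = [(-1.0, 1.0), (-1.0, 1.0)]
best_params, history = genetic_search(toy_confidence, bounds)
```

Because the score is only ever called, never differentiated, the same loop applies to any setting where the model can be queried but its gradients are unavailable.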

research-article
Open Access
A deep learning framework for multi‐object tracking in team sports videos
Abstract

In response to the challenges of Multi‐Object Tracking (MOT) in sports scenes, such as severe occlusions, similar appearances, drastic pose changes, and complex motion patterns, a deep‐learning framework CTGMOT (CNN‐Transformer‐GNN‐based MOT) ...

The authors propose a deep‐learning framework, CTGMOT, for multi‐object tracking (MOT) in complex team sports videos. The backbone network of the framework combines CNN and Transformers to extract local and global features, and uses parallel decoders to ...

research-article
Open Access
Clean, performance‐robust, and performance‐sensitive historical information based adversarial self‐distillation
Abstract

Adversarial training suffers from poor effectiveness due to the challenging optimisation of loss with hard labels. To address this issue, adversarial distillation has emerged as a potential solution, encouraging target models to mimic the output ...

The authors’ method allows the target model to distill the most instant robust and non‐robust knowledge from the previous iteration. To avoid storing model parameters to generate AEs, an existing self‐distillation algorithm was extended, making each “...

research-article
Open Access
Multi‐Scale Feature Attention‐DEtection TRansformer: Multi‐Scale Feature Attention for security check object detection
Abstract

X‐ray security checks aim to detect contraband in luggage; however, the detection accuracy is hindered by the overlapping and significant size differences of objects in X‐ray images. To address these challenges, the authors introduce a novel ...

The authors use dilated convolutions of multi‐scale dilation rates to build a pyramid feature extraction structure and encapsulate the structure into self‐attention. The new attention module is called Multi‐Scale Feature Attention (MSFA). MSFA can fuse ...
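As a rough illustration of what dilated convolutions at several dilation rates do, the sketch below applies one kernel to a 1D signal at rates 1, 2 and 4. It is a toy under stated assumptions: the 1D setting, the function names, and the chosen rates are hypothetical, and the self-attention encapsulation that defines MSFA is not reproduced here.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D dilated cross-correlation with zero padding."""
    half = (len(kernel) // 2) * dilation
    out = []
    for i in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            j = i + k * dilation - half
            if 0 <= j < len(signal):
                total += weight * signal[j]
        out.append(total)
    return out


def multi_scale_features(signal, kernel, dilations=(1, 2, 4)):
    """Apply the same kernel at several dilation rates (one output per rate)."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


# A unit impulse shows how the receptive field widens with the dilation rate.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
kernel = [1.0, 1.0, 1.0]
pyramid = multi_scale_features(signal, kernel)
```

With the same three-tap kernel, the response to the impulse spans 3, 5 and 9 samples at rates 1, 2 and 4, which is how a fixed parameter count can cover objects of very different sizes.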

research-article
Open Access
OmDet: Large‐scale vision‐language multi‐dataset pre‐training with multimodal detection network
Abstract

The advancement of object detection (OD) in open‐vocabulary and open‐world scenarios is a critical challenge in computer vision. OmDet, a novel language‐aware object detection architecture and an innovative training mechanism that harnesses ...

OmDet, a novel language‐aware detector designed to enhance open‐vocabulary and open‐world object detection through a continual learning approach and multi‐dataset vision‐language pre‐training, is presented. By using natural language for knowledge ...

research-article
Open Access
A novel multi‐model 3D object detection framework with adaptive voxel‐image feature fusion
Abstract

The multifaceted nature of sensor data has long been a hurdle for those seeking to harness its full potential in the field of 3D object detection. Although the utilisation of point clouds as input has yielded exceptional results, the challenge ...

A voxel‐based single‐shot multi‐model network for 3D object detection is introduced, namely AVIFF. The authors made some new attempts in fusing features of point cloud and image by designing the adaptive feature fusion (AFF) module and dense fusion (DF) ...

research-article
Open Access
Context‐aware relation enhancement and similarity reasoning for image‐text retrieval
Abstract

Image‐text retrieval is a fundamental yet challenging task, which aims to bridge a semantic gap between heterogeneous data to achieve precise measurements of semantic similarity. The technique of fine‐grained alignment between cross‐modal ...

A novel context‐aware relation enhancement and similarity reasoning model is proposed to achieve precise image‐text retrieval, which conducts both intra‐modal relation enhancement and inter‐modal similarity reasoning while considering the global‐context ...

research-article
Open Access
ASDNet: A robust involution‐based architecture for diagnosis of autism spectrum disorder utilising eye‐tracking technology
Abstract

Autism Spectrum Disorder (ASD) is a chronic condition characterised by impairments in social interaction and communication. Early detection of ASD is desired, and there exists a demand for the development of diagnostic aids to facilitate this. A ...

An involutional neural network architecture has been developed to diagnose ASD. The proposed model is trained to detect ASD from eye‐tracking scanpaths, heatmaps, and fixation maps. Monte Carlo dropout has been applied to the model to perform an ...

research-article
Open Access
SIANet: 3D object detection with structural information augment network
Abstract

3D object detection technology from point clouds has been widely applied in the field of automatic driving in recent years. In practical applications, the shape point clouds of some objects are incomplete due to occlusion or long distance, which ...

The authors design a Structural Information Augment (SIA) module to reconstruct the complete shapes of objects within proposals and then integrate the reconstructed structural information into the spatial feature of the object for box refinement. Besides,...

research-article
Open Access
Attentional bias for hands: Cascade dual‐decoder transformer for sign language production
Abstract

Sign Language Production (SLP) refers to the task of translating textual forms of spoken language into corresponding sign language expressions. Sign languages convey meaning by means of multiple asynchronous articulators, including manual and ...

An efficient cascade dual decoder Transformer model is presented, which heuristically optimises mappings among text, hand pose, and full‐articulatory pose for sign language production (SLP). In addition, a novel spatio‐temporal loss is introduced to ...
