Search Results (3,304)

Search Parameters:
Keywords = multi-view

22 pages, 9369 KiB  
Article
Study on Mechanism of Visual Comfort Perception in Urban 3D Landscape
by Miao Zhang, Tao Shen, Liang Huo, Shunhua Liao, Wenfei Shen and Yucai Li
Buildings 2025, 15(4), 628; https://doi.org/10.3390/buildings15040628 - 18 Feb 2025
Abstract
Landscape visual evaluation is a key method for assessing the value of visual landscape resources. This study aims to enhance the visual environment and sensory quality of urban landscapes by establishing standards for the visual comfort of urban natural landscapes. Using line-of-sight and multi-factor analysis algorithms, the method assesses spatial visibility and visual exposure of building clusters in the core urban areas of Harbin, identifying areas and viewpoints with high visual potential. Focusing on the viewpoints of landmark 3D models and the surrounding landscape’s visual environment, the study uses the city’s sky, greenery, and water features as key visual elements for evaluating the comfort of urban natural landscapes. By integrating GIS data, big data street-view photos, and image semantic recognition, spatial analysis algorithms extract both objective and subjective visual values at observation points, followed by mathematical modeling and quantitative analysis. The study explores the coupling relationship between objective physical visual values and subjective perceived visibility. The results show that 3D visual analysis effectively reveals the relationship between landmark buildings and surrounding landscapes, providing scientific support for urban planning and contributing to the development of a more distinctive and attractive urban space. Full article
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)
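Editor's note: the abstract above leans on line-of-sight analysis to assess the spatial visibility of building clusters. The paper does not publish code, so the following is only a rough sketch of the underlying idea, checking whether a target cell of a gridded DEM is visible from an observer cell by sampling terrain elevations along the sight line. All array names and parameters are invented for illustration.

```python
import numpy as np

def line_of_sight(dem, observer, target, observer_height=1.6, n_samples=200):
    """Return True if the `target` cell is visible from the `observer` cell on a gridded DEM.

    dem             : 2D numpy array of elevations.
    observer/target : (row, col) indices into `dem`.
    observer_height : eye height added above the ground at the observer.
    """
    r0, c0 = observer
    r1, c1 = target
    eye_z = dem[r0, c0] + observer_height
    target_z = dem[r1, c1]

    # Sample terrain elevations along the straight line between the two cells.
    ts = np.linspace(0.0, 1.0, n_samples)[1:-1]            # skip the endpoints
    rows = np.round(r0 + ts * (r1 - r0)).astype(int)
    cols = np.round(c0 + ts * (c1 - c0)).astype(int)
    terrain = dem[rows, cols]

    # Elevation of the sight line at the same fractional positions.
    sight = eye_z + ts * (target_z - eye_z)

    # Visible only if no intermediate terrain sample rises above the sight line.
    return bool(np.all(terrain <= sight))

# Toy example: a 30 m ridge between observer and target blocks the view.
dem = np.zeros((50, 50))
dem[25, :] = 30.0
print(line_of_sight(dem, (5, 25), (45, 25)))   # False, the ridge blocks the line
print(line_of_sight(dem, (5, 25), (20, 25)))   # True, no obstruction in between
```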
Figures

Figure 1. Map of the study area.
Figure 2. Technical Roadmap for Comprehensive Landscape Visual Analysis.
Figure 3. (a) Digital Elevation Model Analysis; (b) viewshed analysis of the city.
Figure 4. Harbin city land use type map. (The red-circled area is the building complex of the study area).
Figure 5. (a) Traffic accessibility analysis map; (b) traffic factor influence map; (c) POI data influence; (d) green space influence factor.
Figure 6. (a) Flood control monument model and surrounding buildings (Post-Modeling); (b) Saint Sophia Cathedral and surrounding buildings (Post-Modeling).
Figure 7. Comparison of Viewpoint Selection Based on Street View Images and Models. (a) Flood control monument model; (b) Saint Sophia Cathedral. (The viewpoints of F1–F5 and S1–S5 correspond one-to-one in different perspectives).
Figure 8. Skyline Analysis. (a) F1–F5 Analysis Diagram; (b) S1–S5 Analysis Diagram; (c) Skyline Radar Chart for F1–F5; (d) Skyline Radar Chart for S1–S5.
Figure 9. Multi-Viewpoint Street Scenes of Flood Control Memorial Tower and Saint Sophia Cathedral. (Landmark buildings: the Flood Control Memorial Tower has been circled in yellow, and Saint Sophia Cathedral has been circled in red). (a) Perspective 1; (b) Perspective 2; (c) Perspective 3; (d) Perspective 4.
Figure 10. Multi-view Visibility Analysis of 3D Models. (a) View directions of F1–F5; (b) view directions of S1–S5.
Figure 11. Percentage Statistical Chart of Comprehensive Analysis for Visual Evaluation Factors.
Figure 12. Hierarchical Model Diagram for Statistical Analysis of Visual Factors.
Figure 13. Visual Landscape Control Elements Diagram. (Red in the picture: Viewpoints 1–23).
Figure 14. Spatial Distribution Analysis of Urban Landscape Elements.
24 pages, 30185 KiB  
Article
3D Digital Human Generation from a Single Image Using Generative AI with Real-Time Motion Synchronization
by Myeongseop Kim, Taehyeon Kim and Kyung-Taek Lee
Electronics 2025, 14(4), 777; https://doi.org/10.3390/electronics14040777 - 17 Feb 2025
Viewed by 188
Abstract
The generation of 3D digital humans has traditionally relied on multi-view imaging systems and large-scale datasets, posing challenges in cost, accessibility, and real-time applicability. To overcome these limitations, this study presents an efficient pipeline that constructs high-fidelity 3D digital humans from a single frontal image. By leveraging generative AI, the system synthesizes additional views and generates UV maps compatible with the SMPL-X model, ensuring anatomically accurate and photorealistic reconstructions. The generated 3D models are imported into Unity 3D, where they are rigged for real-time motion synchronization using BlazePose-based lightweight pose estimation. To further enhance motion realism, custom algorithms—including ground detection and rotation smoothing—are applied, improving movement stability and fluidity. The system was rigorously evaluated through both quantitative and qualitative analyses. Results show an average generation time of 211.1 s, segmentation accuracy of 92.1%, and real-time rendering at 64.4 FPS. In qualitative assessments, expert reviewers rated the system using the SUS usability framework and heuristic evaluation, confirming its usability and effectiveness. This method eliminates the need for multi-view cameras or depth sensors, significantly reducing the barrier to entry for real-time 3D avatar creation and interactive AI-driven applications. It has broad applications in virtual reality (VR), gaming, digital content creation, AI-driven simulation, digital twins, and telepresence systems. By introducing a scalable and accessible 3D modeling pipeline, this research lays the groundwork for future advancements in immersive and interactive environments. Full article
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)
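Editor's note: the avatar is driven by BlazePose-based pose estimation from a monocular RGB camera. The snippet below is a minimal sketch of pulling BlazePose world landmarks with the MediaPipe Python package (assuming the legacy `mp.solutions.pose` API); it is not the authors' code, and the Unity rigging, inverse kinematics, and smoothing stages are omitted.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def extract_world_landmarks(frame_bgr, pose):
    """Run BlazePose on one BGR frame and return the 33 (x, y, z) world landmarks, or None."""
    results = pose.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if results.pose_world_landmarks is None:
        return None
    return [(lm.x, lm.y, lm.z) for lm in results.pose_world_landmarks.landmark]

# Stream frames from the default webcam and print the nose landmark each frame.
cap = cv2.VideoCapture(0)
with mp_pose.Pose(model_complexity=1, min_detection_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        landmarks = extract_world_landmarks(frame, pose)
        if landmarks is not None:
            print("nose (x, y, z):", landmarks[mp_pose.PoseLandmark.NOSE.value])
cap.release()
```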
Figures

Figure 1. Overview of the Proposed 3D Digital Human Generation and Real-Time Motion Synchronization Pipeline. (This figure illustrates the complete pipeline for generating a 3D digital human from a single frontal image and synchronizing it with real-time motion data. The system consists of five key components, each contributing to different aspects of the generation and rendering process. 1. Virtual Try-On Module: the input frontal image undergoes a virtual try-on process where clothing is digitally altered using generative AI. This allows for personalization while preserving the identity and body shape of the user. 2. 2D-to-3D Translation: the modified image is converted into a 3D representation using a neural rendering model. The system estimates depth and reconstructs the rear view of the user, ensuring a full 3D model from a single image input. 3. Super-Resolution Enhancement: a super-resolution module refines the generated 3D texture, improving the fidelity of details such as facial features, clothing textures, and structural accuracy. This step is essential for ensuring realism in the final 3D model. 4. UV Map Generation: a UV mapping process extracts and flattens the high-resolution texture, enabling seamless application onto a parametric 3D human model. The UV maps maintain consistency between the synthesized views and the reconstructed 3D geometry. 5. Real-Time Rendering and Motion Synchronization: the final textured 3D digital human is imported into a real-time simulation environment (Unity 3D). Using a monocular RGB camera and BlazePose-based motion tracking, the digital human dynamically responds to user movements. Custom inverse kinematics (IK) algorithms, rotation smoothing, and ground detection ensure stability and natural motion transitions.)
Figure 2. Digital Human UV Map Completion.
Figure 3. Real-Time Pose Estimation and Motion Synchronization Framework for 3D Digital Humans.
Figure 4. Comparison of Joint Definitions Between BlazePose and SMPL-X: Absence of Spine Joints.
Figure 5. Generation of Virtual Spine Landmarks for SMPL-X Integration: (a) BlazePose 3D Skeleton; (b) Spine Joint Inference; (c) SMPL-X Skeleton.
Figure 6. Calculation of Pelvis-Orthogonal Direction for SMPL-X Orientation Alignment.
Figure 7. Foot-Ground Offset Adjustment via Ray-Casting.
Figure 8. Expert Evaluation of MetaHuman for Qualitative Assessment.
Figure 9. Task 1: MetaHuman Generation and Rendering for Qualitative Evaluation.
Figure 10. Real-time 3D digital human synchronization based on input methods.
16 pages, 2071 KiB  
Article
Fully Incomplete Information for Multiview Clustering in Postoperative Liver Tumor Diagnoses
by Siyuan Li and Xinde Li
Sensors 2025, 25(4), 1215; https://doi.org/10.3390/s25041215 - 17 Feb 2025
Viewed by 153
Abstract
Multiview clustering (MVC) is a proven, effective approach to boosting the various downstream tasks given by unlabeled data. In contemporary society, domain-specific multiview data, such as multiphase postoperative liver tumor contrast-enhanced computed tomography (CECT) images, may be vulnerable to exploitation by illicit organizations or may not be comprehensively collected due to patient privacy concerns. Thus, these can be modeled as incomplete multiview clustering (IMVC) problems. Most existing IMVC methods have three issues: (1) most methods rely on paired views, which are often unavailable in clinical practice; (2) directly predicting the features of missing views may omit key features; and (3) recovered views still have subtle differences from the originals. To overcome these challenges, we proposed a novel framework named fuzzy clustering combined with information theory arithmetic based on feature reconstruction (FCITAFR). Specifically, we propose a method for reconstructing the characteristics of prevailing perspectives for each sample. Based on this, we utilized the reconstructed features to predict the missing views. Then, based on the predicted features, we used variational fuzzy c-means clustering (FCM) combined with information theory to learn the mutual information among views. The experimental results indicate the advantages of FCITAFR in comparison to state-of-the-art methods, on both in-house and external datasets, in terms of accuracy (ACC) (77.5%), normalized mutual information (NMI) (37.9%), and adjusted rand index (ARI) (29.5%). Full article
(This article belongs to the Section Biomedical Sensors)
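Editor's note: FCITAFR builds on fuzzy c-means (FCM) clustering. Purely as background, here is a compact NumPy sketch of standard FCM with membership matrix U and fuzzifier m; the paper's variational formulation, view fusion, and information-theoretic terms are not reproduced.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means. Returns (centers, U) with U of shape (n_samples, n_clusters)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)                 # memberships sum to 1 per sample

    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]            # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

# Toy example with two well-separated blobs.
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 6])
centers, U = fuzzy_c_means(X, n_clusters=2)
print(centers.round(2), np.bincount(U.argmax(axis=1)))
```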
Figures

Figure 1. The framework of FCITAFR, where "?" denotes the missing views. The within-view reconstruction loss L_rec uses encoders to obtain the latent representation Z^v. Then, via the feature reconstruction prediction loss L_pre, in order to reconstruct the latent representation of existing views and obtain the new latent representation L^v, the generators G predict the missing views of each sample. Finally, via the fuzzy clustering with information theory loss L_fcit, in order to obtain the membership matrix U^v of each view, the fusion similarity matrix among views can be obtained via information theory.
Figure 2. Five instances of liver tumor CECT images from the different periods; "?" denotes a missing view and "→" denotes mapping the missing view.
Figure 3. Parameter evaluation on Zhongda with a missing rate of 0.3.
Figure 4. Visualization of latent space via t-SNE on Zheyi with a missing rate of 0.3. With an epoch of 500, compared with other methods, FCITAFR constructed a more discriminative latent space.
Figure 5. Convergence evaluation on Zhongda with a missing rate of 0.3.
26 pages, 340 KiB  
Article
How Do Researchers and Public Officials Co-Navigate e-Participation Implementation? An Action-Research Experience with South African Municipalities
by Odilile Ayodele, Paul Plantinga, Diana Sanchez-Bentacourt, Simangele Dlamini and Nonkululeko Dlamini
Systems 2025, 13(2), 125; https://doi.org/10.3390/systems13020125 - 17 Feb 2025
Viewed by 89
Abstract
How can local governments in the Global South effectively implement and sustain digital public participation initiatives, and what specific roles do public officials and knowledge intermediaries play in catalysing and supporting this digital transformation? In this paper, we respond to this query by examining our experiences as researchers in implementing an ongoing multi-year applied research project on e-participation and policy modelling in various South African municipalities. The project and the country’s ongoing public participation initiatives align with government policy emphasising citizen-centric public service delivery through consultation, transparency, and accountability. We aim to highlight the practical aspects of introducing e-participation mechanisms in local governance structures with different human and material resource capacity levels, which is the case in Global South contexts. Our central question is as follows: How can public officials, working alongside researchers and knowledge intermediaries, navigate the sustainable adoption of e-participation through participatory design, system awareness, and bricolage? We explore this question from the perspective of researchers leading the implementation of a multi-year, participatory action-research project which aims to pilot e-participation in six municipalities across South Africa. In the paper, we analyse the interplay of various socio-technical factors influencing the early stages of implementation and use of e-participation tools in local government. Our findings highlight the importance of building socio-technical agency of front-line public participation officials, exploring malleable technologies that are flexible to adaptation, facilitating peer-learning amongst officials on what works in similar contexts, and having a system view within and across pilots. Full article
(This article belongs to the Special Issue Digital Solutions for Participatory Governance in Smart Cities)
35 pages, 45770 KiB  
Article
Target Ship Recognition and Tracking with Data Fusion Based on Bi-YOLO and OC-SORT Algorithms for Enhancing Ship Navigation Assistance
by Shuai Chen, Miao Gao, Peiru Shi, Xi Zeng and Anmin Zhang
J. Mar. Sci. Eng. 2025, 13(2), 366; https://doi.org/10.3390/jmse13020366 - 16 Feb 2025
Viewed by 294
Abstract
With the ever-increasing volume of maritime traffic, the risks of ship navigation are becoming more significant, making the use of advanced multi-source perception strategies and AI technologies indispensable for obtaining information about ship navigation status. In this paper, first, the ship tracking system was optimized using the Bi-YOLO network based on the C2f_BiFormer module and the OC-SORT algorithms. Second, to extract the visual trajectory of the target ship without a reference object, an absolute position estimation method based on binocular stereo vision attitude information was proposed. Then, a perception data fusion framework based on ship spatio-temporal trajectory features (ST-TF) was proposed to match GPS-based ship information with corresponding visual target information. Finally, AR technology was integrated to fuse multi-source perceptual information into the real-world navigation view. Experimental results demonstrate that the proposed method achieves a mAP0.5:0.95 of 79.6% under challenging scenarios such as low resolution, noise interference, and low-light conditions. Moreover, in the presence of the nonlinear motion of the own ship, the average relative position error of target ship visual measurements is maintained below 8%, achieving accurate absolute position estimation without reference objects. Compared to existing navigation assistance, the AR-based navigation assistance system, which utilizes ship ST-TF-based perception data fusion mechanism, enhances ship traffic situational awareness and provides reliable decision-making support to further ensure the safety of ship navigation. Full article
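Editor's note: absolute position estimation here comes from binocular stereo vision. The sketch below only illustrates the textbook relation behind such estimates, depth Z = f·B/d from focal length, baseline, and disparity, using invented camera parameters; the paper's attitude compensation and ST-TF fusion are not shown.

```python
import numpy as np

def stereo_point_from_disparity(u_left, v_left, u_right, fx, fy, cx, cy, baseline):
    """Back-project a matched pixel pair into the left-camera frame (X, Y, Z in metres).

    Assumes rectified images: the match lies on the same image row in both views, so the
    disparity is d = u_left - u_right and depth follows Z = fx * baseline / d.
    """
    d = float(u_left - u_right)
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    Z = fx * baseline / d
    X = (u_left - cx) * Z / fx
    Y = (v_left - cy) * Z / fy
    return np.array([X, Y, Z])

# Invented intrinsics: 1200 px focal length, (640, 360) principal point, 0.5 m baseline.
p = stereo_point_from_disparity(u_left=700, v_left=320, u_right=676,
                                fx=1200.0, fy=1200.0, cx=640.0, cy=360.0, baseline=0.5)
print(p)   # a 24 px disparity gives Z = 1200 * 0.5 / 24 = 25 m
```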
Figures

Figure 1. Organization diagram of the sections of this paper.
Figure 2. A perception data fusion framework based on ship ST-TF for ship AR navigation assistance.
Figure 3. The structure of the Bi-YOLO network.
Figure 4. (a) Details of a BiFormer block; (b) structure of the BRA.
Figure 5. (a,b) illustrate the Driving-Leaves binocular camera before and after calibration and stereo rectification; (c,d) illustrate the Baymax binocular camera before and after calibration and stereo rectification.
Figure 6. Conceptual diagram of the binocular imaging process.
Figure 7. Illustration of coordinate system conversion.
Figure 8. Synchronization process of different sensor frequencies.
Figure 9. Asynchronous nonlinear ship trajectory sequence association based on the DTW algorithm.
Figure 10. Asynchronous ship trajectory association and joint data storage method.
Figure 11. The MASSs used in the experimental process.
Figure 12. The data samples from the FLShip dataset.
Figure 13. Training mAP@0.5 curves for Bi-YOLO and various object detection algorithms.
Figure 14. (a–f) show the comparison of detection effects between YOLO11s and Bi-YOLO.
Figure 15. Tracking performance comparison of four state-of-the-art object trackers in Scene-2.
Figure 16. Tracking performance comparison of four state-of-the-art object trackers in Scene-4.
Figure 17. Tracking performance comparison of four state-of-the-art object trackers in Scene-6.
Figure 18. The visual position estimation results of the 'Roaring-Flame' MASS in Scene-1.
Figure 19. The visual position estimation results of the 'Baymax' MASS in Scene-2.
Figure 20. The visual position estimation results of the 'Baymax' MASS in Scene-3.
Figure 21. The AR navigation assistance effects of ships constructed at different timestamps in multiple scenes.
18 pages, 4254 KiB  
Article
Intracranial Aneurysm Segmentation with a Dual-Path Fusion Network
by Ke Wang, Yong Zhang and Bin Fang
Bioengineering 2025, 12(2), 185; https://doi.org/10.3390/bioengineering12020185 - 15 Feb 2025
Viewed by 139
Abstract
Intracranial aneurysms (IAs), a significant medical concern due to their prevalence and life-threatening nature, pose challenges regarding diagnosis owing to their diminutive and variable morphology. There are currently challenges surrounding automating the segmentation of IAs, which is essential for diagnostic precision. Existing deep learning methods in IAs segmentation tend to emphasize semantic features at the expense of detailed information, potentially compromising segmentation quality. Our research introduces the innovative Dual-Path Fusion Network (DPF-Net), an advanced deep learning architecture crafted to refine IAs segmentation by adeptly incorporating detailed information. DPF-Net, with its unique resolution-preserving detail branch, ensures minimal loss of detail during feature extraction, while its cross-fusion module effectively promotes the connection of semantic information and finer detail features, enhancing segmentation precision. The network also integrates a detail aggregation module for effective fusion of multi-scale detail features. A view fusion strategy is employed to address spatial disruptions in patch generation, thereby improving feature extraction efficiency. Evaluated on the CADA dataset, DPF-Net achieves a remarkable mean Dice similarity coefficient (DSC) of 0.8967, highlighting its potential in automated IAs diagnosis in clinical settings. Furthermore, DPF-Net’s outstanding performance on the BraTS 2020 MRI dataset for brain tumor segmentation with a mean DSC of 0.8535 further confirms its robustness and generalizability. Full article
(This article belongs to the Section Biosignal Processing)
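Editor's note: segmentation quality above is reported as the mean Dice similarity coefficient (DSC). For reference only, here is a minimal sketch of DSC between a binary prediction and a ground-truth mask (not the authors' evaluation code):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy volumes: the prediction covers 3 of the 4 labelled voxels and adds 1 false positive.
gt = np.zeros((4, 4, 4), dtype=bool); gt[1, 1, :4] = True
pred = np.zeros((4, 4, 4), dtype=bool); pred[1, 1, :3] = True; pred[0, 0, 0] = True
print(round(dice_coefficient(pred, gt), 4))   # 2*3 / (4 + 4) = 0.75
```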
Figures

Figure 1. The overall flow of the proposed method for IAs segmentation.
Figure 2. The architecture of the proposed DPF-Net.
Figure 3. The detailed illustration of the cross-fusion module.
Figure 4. The detailed illustration of the detail aggregation module.
Figure 5. DSC score obtained by applying our proposed method to each volume in the CADA dataset. The red line indicates the mean of all cases.
Figure 6. Segmentation performance of the proposed method on the CADA dataset. The ground truth and prediction results are marked in green and red, respectively. Yellow regions represent the overlap of prediction and ground truth. Note the images are not on the same scale.
Figure 7. The segmentation results of ablation studies on the CADA dataset. The ground truth and prediction results are marked in green and red, respectively. Yellow regions represent the overlap of prediction and ground truth. Network 1 (full proposed model), network 2 (without detail branch), network 3 (without detail aggregation module), and network 4 (without cross-fusion module). Note the images are not the same.
Figure 8. Segmentation performances of view fusion when using different fusion manners. The axial, coronal, and sagittal indicate prediction results obtained from axial, coronal, and sagittal views, respectively. The ground truth and prediction results are marked in green and red, respectively. Yellow regions represent the overlap of prediction and ground truth. The abbreviations mv and add denote fusing multi-view prediction results via majority voting and adding operation, respectively.
Figure 9. Visualization of failure cases on the CADA dataset. The ground truth and prediction results are marked in green and red, respectively. Yellow regions represent the overlap of prediction and ground truth.
Figure 10. Segmentation performance on the BraTS 2020 brain tumor segmentation dataset. WT is marked in blue, green, and red; TC is marked in green and red; ET is marked in red.
14 pages, 2796 KiB  
Article
Multi-View and Multi-Type Feature Fusion Rotor Biofouling Recognition Method for Tidal Stream Turbine
by Haoran Xu, Dingding Yang, Tianzhen Wang and Mohamed Benbouzid
J. Mar. Sci. Eng. 2025, 13(2), 356; https://doi.org/10.3390/jmse13020356 - 15 Feb 2025
Viewed by 162
Abstract
Power generation is affected and structural instability may occur when biofouling attaches to the rotor of tidal stream turbines (TSTs). Image signals are used to identify biofouling for biofouling recognition, thus achieving on-demand maintenance, optimizing power generation efficiency, and minimizing maintenance costs. However, image signals are sensitive to background interferences, and underwater targets blend with the water background, making it difficult to extract target features. Changes in water turbidity can affect the effectiveness of image signal biofouling recognition, which can lead to reduced recognition accuracy. In order to solve these problems, a multi-view and multi-type feature fusion (MVTFF) method is proposed to recognize rotor biofouling on TSTs for applications in TST operation and maintenance. (1) Key boundary and semantic information are captured to solve the problem of background feature interference by comparing and fusing the extracted multi-view features. (2) The local geometric description and dependency are obtained by integrating contour features into multi-view features to address the issue of the target mixing with water. The mIoU, mPA, Precision, and Recall of the experimental results show that the method achieves superior recognition performance on TST datasets with different turbidity levels. Full article
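Editor's note: recognition quality is reported via mIoU, mPA, Precision, and Recall. As a generic reference, the sketch below derives per-class IoU and pixel accuracy from a confusion matrix; it is not the MVTFF evaluation code, and the toy labels are invented.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """num_classes x num_classes matrix; rows are ground truth, columns are prediction."""
    idx = target.reshape(-1) * num_classes + pred.reshape(-1)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_and_mpa(pred, target, num_classes):
    cm = confusion_matrix(pred, target, num_classes).astype(float)
    tp = np.diag(cm)
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp + 1e-12)   # per-class IoU
    pa = tp / (cm.sum(axis=1) + 1e-12)                          # per-class pixel accuracy
    return iou.mean(), pa.mean()

# Toy 2-class segmentation maps (0 = water background, 1 = rotor/biofouling region).
target = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1]])
pred = np.array([[0, 1, 1, 1],
                 [0, 0, 1, 1]])
print(miou_and_mpa(pred, target, num_classes=2))
```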
Figures

Figure 1. Comparison of clear and turbid TST segmentation images.
Figure 2. MVTFF network for recognizing TST rotor biofouling.
Figure 3. Examples of three types of segmentation artifacts.
Figure 4. Comparison before and after introduction of different views.
Figure 5. Comparison before and after introduction of OB.
Figure 6. Results of comparative experiments. (a) Input; (b) MVTFF; (c) Unet; (d) Swin-Unet; (e) deeplabV3+; (f) SETR.
21 pages, 4483 KiB  
Article
DEM Generation Incorporating River Channels in Data-Scarce Contexts: The “Fluvial Domain Method”
by Jairo R. Escobar Villanueva, Jhonny I. Pérez-Montiel and Andrea Gianni Cristoforo Nardini
Hydrology 2025, 12(2), 33; https://doi.org/10.3390/hydrology12020033 - 14 Feb 2025
Viewed by 475
Abstract
This paper presents a novel methodology to generate Digital Elevation Models (DEMs) in flat areas, incorporating river channels from relatively coarse initial data. The technique primarily utilizes filtered dense point clouds derived from SfM-MVS (Structure from Motion-Multi-View Stereo) photogrammetry of available crewed aerial imagery datasets. The methodology operates under the assumption that the aerial survey was carried out during low-flow or drought conditions so that the dry (or almost dry) riverbed is detected, although in an imprecise way. Direct interpolation of the detected elevation points yields unacceptable river channel bottom profiles (often exhibiting unrealistic artifacts) and even distorts the floodplain. In our Fluvial Domain Method, channel bottoms are represented like “highways”, perhaps overlooking their (unknown) detailed morphology but gaining in general topographic consistency. For instance, we observed an 11.7% discrepancy in the river channel long profile (with respect to the measured cross-sections) and a 0.38 m RMSE in the floodplain (with respect to the GNSS-RTK measurements). Unlike conventional methods that utilize active sensors (satellite and airborne LiDAR) or classic topographic surveys—each with precision, cost, or labor limitations—the proposed approach offers a more accessible, cost-effective, and flexible solution that is particularly well suited to cases with scarce base information and financial resources. However, the method’s performance is inherently limited by the quality of input data and the simplification of complex channel morphologies; it is most suitable for cases where high-resolution geomorphological detail is not critical or where direct data acquisition is not feasible. The resulting DEM, incorporating a generalized channel representation, is well suited for flood hazard modeling. A case study of the Ranchería river delta in the Northern Colombian Caribbean demonstrates the methodology. Full article
(This article belongs to the Special Issue Hydrological Modeling and Sustainable Water Resources Management)
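Editor's note: the 0.38 m floodplain figure quoted above is an RMSE against GNSS-RTK checkpoints. Purely to illustrate that kind of check, the sketch below samples a gridded DEM at surveyed checkpoint coordinates and computes the RMSE; the grid origin, cell size, and point values are invented.

```python
import numpy as np

def dem_rmse(dem, x0, y0, cell, checkpoints):
    """RMSE between a gridded DEM and GNSS-RTK checkpoints.

    dem         : 2D array with row 0 at the northern edge (y decreases with row index).
    x0, y0      : map coordinates of the upper-left corner of the grid.
    cell        : cell size in map units.
    checkpoints : iterable of (x, y, z) surveyed points.
    """
    errors = []
    for x, y, z in checkpoints:
        col = int((x - x0) / cell)
        row = int((y0 - y) / cell)
        if 0 <= row < dem.shape[0] and 0 <= col < dem.shape[1]:
            errors.append(dem[row, col] - z)
    errors = np.asarray(errors)
    return float(np.sqrt(np.mean(errors ** 2)))

# Toy example: a flat 10 m surface checked against three survey points.
dem = np.full((100, 100), 10.0)
pts = [(50.0, 950.0, 9.7), (120.0, 900.0, 10.2), (300.0, 800.0, 10.4)]
print(round(dem_rmse(dem, x0=0.0, y0=1000.0, cell=10.0, checkpoints=pts), 3))
```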
Figures

Figure 1. Study area: lower Ranchería River basin sector (green polygon), Riohacha (Colombia). The study reach focuses on the main channel from the "Aremasain" station to the branch named "Riito".
Figure 2. Dense vegetation context along the studied river reach.
Figure 3. Deployment of GNSS-RTK points (gray dots) used for subsequent DEM adjustments from the photogrammetric process and validation (red triangles) of DSM/DEM products.
Figure 4. Example of a longitudinal channel profile (dashed red line) from the preliminary SfM-MVS DEM (without channel correction). Note the significant altimetric variability resulting from interpolation artifacts and the overestimation of the channel width due to the artificial lowering of the floodplain surface along the riverbanks. Flow direction is indicated by the black arrow.
Figure 5. General outline of the proposed method. The workflow starts with the input data at the bottom and culminates in the final product at the top: (1) input data and preprocessing, (2) elevation extraction from the preliminary DEM, (3) bathymetric channel correction, and (4) channel integration with the preliminary DEM.
Figure 6. Visualization of the error distribution and accuracy assessment of the digital models using histograms (a) and box plots (b). The DSM appears in blue and the DTM in yellow. The expected normal distribution curves are superimposed on the histograms, and white circles represent the outliers.
Figure 7. Comparison between raw (blue) and smoothed elevation profiles of the channel obtained by SfM-MVS photogrammetry and the proposed method (red): (a) smoothed channel bottom, location of reference cross-sections, and GNSS-RTK observations (triangles); (b) refinement of the channel longitudinal profile using GNSS-RTK adjustment in the last river reach (7 and 8). Purple boxes represent the cross-section locations along the elevation profile.
Figure 8. Comparison of the cross-sectional depth (h) geometry along the studied river (n = 8). Black lines represent depths estimated by the proposed method; purple lines represent reference (observed) depths.
Figure 9. Comparison of maximum depths obtained from field measurements and estimated using the proposed method at eight reference cross-sections of the Ranchería River: (a) relative error and Mean Absolute Percentage Error (MAPE) analysis; (b) scatter plot showing the relationship between the observed and estimated depths. The solid black line represents the linear regression fit to the depth data (grey boxes), with the corresponding equation and R-squared value shown; the dashed red line indicates perfect agreement.
24 pages, 10147 KiB  
Article
A Siamese Network via Cross-Domain Robust Feature Decoupling for Multi-Source Remote Sensing Image Registration
by Qichao Han, Xiyang Zhi, Shikai Jiang, Wenbin Chen, Yuanxin Huang, Lijian Yu and Wei Zhang
Remote Sens. 2025, 17(4), 646; https://doi.org/10.3390/rs17040646 - 13 Feb 2025
Viewed by 409
Abstract
Image registration is a prerequisite for many multi-source remote sensing image fusion applications. However, due to differences in imaging factors such as sensor type, imaging time, resolution, and viewing angle, multi-source image registration faces challenges of multidimensional coupling such as radiation, scale, and directional differences. To address this issue, this paper proposes a Siamese network based on cross-domain robust feature decoupling as an image registration framework (CRS-Net), aiming to improve the robustness of multi-source image features across domains, scales, and rotations. Firstly, we design Siamese multiscale encoders and introduce a rotation-invariant convolutional layer without additional training parameters, achieving natural invariance to any rotation. Secondly, we propose a modality-independent decoder that utilizes the self-similarity of feature neighborhoods to excavate stable high-order structural information. Thirdly, we introduce cluster-aware contrastive constraints to learn discriminative and stable keypoint pairs. Finally, we design three multi-source remote sensing datasets and conduct sufficient experiments. Numerous experimental results show that our proposed method outperforms other SOTA methods and achieves more accurate registration in complex multi-source remote sensing scenes. Full article
Figures

Figure 1. Example of multi-source remote sensing images. In (a–c), the top are optical images and the bottom are SAR images. In (d,e), the visible images are above and the infrared images are below. A significant nonlinear radiation difference can be observed in the green boxes of (a,b). There are significant geometric and viewing angle differences in the yellow boxes of (c,d). In the red box of (e), it can be observed that the objects change in the images at different times. The above multiple differences are coupled with each other, posing challenges to multi-source remote sensing image registration tasks.
Figure 2. The flow of the proposed method. The reference image and sensed image are input into hierarchical encoders to capture a rotation-robust multiscale local feature pyramid. The Siamese network is able to share the computational and representational capabilities of the encoder. The multimodal feature pyramid is separately input into parallel sparse structure feature decoders and keypoint detection decoders. Feature map D and keypoint distribution heatmap H are obtained through the structural feature descriptor head and keypoint detection head. A joint optimization strategy is adopted to constrain parameter learning.
Figure 3. The structure diagram of the encoders.
Figure 4. Selection of sampling points in convolutional layers. (a,c) represent the sampling points in the plain convolution; (b,d) represent the sampling points in the rotation-invariant convolution.
Figure 5. Schematic diagram of a cross-domain-aware high-order structure decoder. The central elements of interest are displayed in red. The feature elements in the 8-NH neighborhood N are displayed in green, and the unique feature distances of the 8 groups are displayed as blue lines. The feature elements in the 8-NH expansion neighborhood N_d are displayed in brown.
Figure 6. Examples of registration results using different methods on the SOPatch dataset. (a,c,e,g) are keypoint pairs connected by lines; (b,d,f,h) are checkerboard patterns.
Figure 7. The registration results of the proposed CRS-Net, with zoomed-in red-boxed areas of checkerboard patterns provided for observing details. (a–c) are different scenes in the SOPatch dataset.
Figure 8. Examples of registration results using different methods on the OSEval dataset. (a,c,e,g) are keypoint pairs connected by lines; (b,d,f,h) are checkerboard patterns.
Figure 9. The registration results of the proposed CRS-Net, with zoomed-in red-boxed areas of checkerboard patterns provided for observing details. (a–c) are different scenes in the OSEval dataset.
Figure 10. Examples of registration results using different methods on the VEDAI dataset. (a,c,e,g) are keypoint pairs connected by lines; (b,d,f,h) are checkerboard patterns.
Figure 11. The registration results of the proposed CRS-Net, with zoomed-in red-boxed areas of checkerboard patterns provided for observing details. (a–c) are different scenes in the VEDAI dataset.
Figure 12. The impact of hyperparameter τ on registration performance. (a–c) represent the experimental results on the SOPatch, OSEval, and VEDAI datasets, respectively.
Figure 13. The impact of hyperparameter λ on registration performance. (a–c) represent the experimental results on the SOPatch, OSEval, and VEDAI datasets, respectively.
Figure 14. The impact of hyperparameter β on registration performance. (a–c) represent the experimental results on the SOPatch, OSEval, and VEDAI datasets, respectively.
Figure 15. Efficiency of registration algorithms on the OSEval dataset.
23 pages, 2857 KiB  
Article
Fast Multi-View Subspace Clustering Based on Flexible Anchor Fusion
by Yihao Zhu, Shibing Zhou and Guoqing Jin
Electronics 2025, 14(4), 737; https://doi.org/10.3390/electronics14040737 - 13 Feb 2025
Viewed by 186
Abstract
Multi-view subspace clustering enhances clustering performance by optimizing and integrating structural information from multiple views. Recently, anchor-based methods have made notable progress in large-scale clustering scenarios by leveraging anchor points to capture data distribution across different views. Although these methods improve efficiency, a common limitation is that they typically select an equal number of anchor points from each view. Additionally, during the graph fusion stage, most existing frameworks use simple linear weighting to construct the final consensus graph, overlooking the inherent structural relationships between the data. To address these issues, we propose a novel and flexible anchor graph fusion framework which selects an appropriate number of anchor points for each view based on its data space, creating suitable anchor graphs. In the graph fusion stage, we introduce a regularization term which adaptively and flexibly combines anchor graphs of varying sizes. Moreover, our approach incorporates both global and local information between views, enabling a more accurate capture of the cluster structure within the data. Furthermore, our method operates with linear time complexity, making it well suited for large-scale datasets. Extensive experiments on multiple datasets demonstrate the superior performance of our proposed algorithm. Full article
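Editor's note: a central ingredient of anchor-based multi-view subspace clustering is a per-view anchor graph linking samples to a small set of anchors. The sketch below shows one common way to build such a graph (k-means anchors plus a row-normalized, k-nearest Gaussian similarity); the paper's flexible fusion regularizer and view-specific anchor counts are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def anchor_graph(X, n_anchors, k=5, sigma=1.0, seed=0):
    """Build an n_samples x n_anchors similarity graph Z for one view.

    Anchors are k-means centroids; each sample keeps Gaussian similarities to its
    k nearest anchors, and rows are normalized to sum to 1.
    """
    anchors = KMeans(n_clusters=n_anchors, n_init=10, random_state=seed).fit(X).cluster_centers_
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)   # squared distances
    Z = np.exp(-d2 / (2.0 * sigma ** 2))
    # Keep only the k nearest anchors per sample (sparse anchor graph).
    cutoff = np.sort(Z, axis=1)[:, -k][:, None]
    Z = np.where(Z >= cutoff, Z, 0.0)
    Z /= Z.sum(axis=1, keepdims=True)
    return Z, anchors

# One view with 300 samples; under the paper's idea, each view could use a different anchor count.
X_view = np.random.randn(300, 8)
Z, anchors = anchor_graph(X_view, n_anchors=20, k=5)
print(Z.shape, anchors.shape)   # (300, 20) (20, 8)
```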
Figures

Figure 1. (Left) Existing anchor graph fusion strategy with equally sized anchors. (Right) Our proposed FMVSC framework.
Figure 2. Running results of MLRSSC on artificial datasets.
Figure 3. Running results of LMVSC on artificial datasets.
Figure 4. Running results of MSGL on artificial datasets.
Figure 5. Running results of FMVSC on artificial datasets.
Figure 6. Illustrations of the learned data distributions of FMVSC and LMVSC with the t-SNE algorithm on the WebKB database.
Figure 7. Illustrations of the learned data distribution of FMVSC and LMVSC with the t-SNE algorithm on the Notting-Hill dataset.
Figure 8. The sensitivity analysis experiments of hyperparameter λ on the Notting-Hill and Wiki datasets.
Figure 9. The sensitivity analysis experiments of hyperparameter λ on the CCV and ALOI datasets.
Figure 10. The convergence graph of the objective function for each dataset.
24 pages, 14282 KiB  
Article
Multi-View, Multi-Target Tracking in Low-Altitude Scenes with UAV Involvement
by Pengnian Wu, Yixuan Li, Zhihao Li, Xuqi Yang and Dong Xue
Drones 2025, 9(2), 138; https://doi.org/10.3390/drones9020138 - 13 Feb 2025
Viewed by 338
Abstract
Cooperative visual tracking involving unmanned aerial vehicles (UAVs) in low-altitude environments is a dynamic and rapidly evolving domain. Existing models encounter challenges with targets, such as scale variation, appearance similarity, and frequent occlusions, which hinder the effective use of target information for cross-view identity association. To address these challenges, this study introduces a model for multi-view, multi-target tracking in low-altitude scenes involving UAVs (MVTL-UAV), an effective multi-target tracking model specifically designed for low-altitude scenarios involving UAVs. The proposed method is built upon existing end-to-end detection and tracking frameworks, introducing three innovative modules: loss reinforcement, coupled constraints, and coefficient improvement. Collectively, these advancements enhance the accuracy of cross-view target identity matching. Our method is trained using the DIVOTrack dataset, which comprises data collected from a single UAV and two handheld cameras. Empirical results indicate that our approach achieves a 2.19% improvement in cross-view matching accuracy (CVMA) and a 1.95% improvement in the cross-view ID F1 metric (CVIDF1) when compared to current state-of-the-art methodologies. Importantly, the model’s performance is improved without compromising computational efficiency, thereby enhancing its practical value in resource-constrained environments. As a result, our model demonstrates satisfactory performance in various low-altitude target tracking scenarios involving UAVs, establishing a new benchmark in this research area. Full article
(This article belongs to the Section Drone Design and Development)
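Editor's note: one of the three modules adjusts the temperature coefficient used when matching identities across views (see Figure 5 in the list below). As a generic illustration only, the snippet shows how temperature sharpens or flattens a softmax over cross-view similarity scores; the scores and the matching pipeline are invented.

```python
import numpy as np

def softmax_with_temperature(scores, tau):
    """Softmax over similarity scores; a smaller tau gives a sharper (more confident) matching."""
    z = np.asarray(scores, dtype=float) / tau
    z -= z.max()                       # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Similarities between one UAV-view target and four candidates in a ground-camera view.
sims = [0.62, 0.55, 0.20, 0.10]
for tau in (1.0, 0.3, 0.1):
    print(tau, softmax_with_temperature(sims, tau).round(3))
# tau = 1.0 gives a nearly flat distribution; tau = 0.1 concentrates the mass on the best match.
```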
Figures

Figure 1. The key innovative modules of the proposed MVTL-UAV.
Figure 2. The output branch of the multi-view, multi-target tracking network model.
Figure 3. An illustration of self-attention reinforcement in re-identification features.
Figure 4. Appearance transformation of targets across and within views.
Figure 5. The variation in the softmax function curve with different temperature coefficients.
Figure 6. Comparison of recall metrics before and after applying positive and negative reinforcement.
Figure 7. The amplitude spectra of the target appearance features before and after applying self-attention reinforcement.
Figure 8. Loss variation curves showing how coupling detection and tracking losses influence model optimization.
Figure 9. The cross-view multi-target correlation heatmap before improving the temperature coefficient.
Figure 10. The cross-view multi-target correlation heatmap after improving the temperature coefficient.
Figure 11. Comparison of tracking time for CrossMOT and MVTL-UAV in different scenes.
Figure 12. Comparison of required memory space for CrossMOT and MVTL-UAV in different scenes.
Figure 13. Comparison of CPU and GPU resource utilization rates between CrossMOT and MVTL-UAV under different scenarios.
Figure 14. The proposed method's tracking performance evaluated against the baseline algorithms on frame 58 of the DIVOTrack test set. The red check mark indicates that the corresponding target has been successfully detected and correctly matched, while the red cross signifies that the corresponding target was not successfully detected or was mismatched.
Figure 15. The proposed method's tracking performance evaluated against the baseline algorithms on frame 629 of the DIVOTrack test set. The red check mark indicates that the corresponding target has been successfully detected and correctly matched, while the red cross signifies that the corresponding target was not successfully detected or was mismatched.
12 pages, 3351 KiB  
Article
MNFF-GNN: Multi-Order Neighbor Feature Fusion Graph Neural Network
by Yiwen Liu, Tao Wen, Wen Wu and Bo Lu
Electronics 2025, 14(4), 724; https://doi.org/10.3390/electronics14040724 - 13 Feb 2025
Viewed by 265
Abstract
With the rapid development of artificial intelligence, automated artifact recognition technology has gradually replaced the traditional manual quality evaluation method. The existing samples of CT images containing artifacts are small, and the relationships between the images are of great significance. In this study, firstly, a method for CT image artifact recognition was developed by transforming the problem into a node classification framework. Secondly, the characteristics of this complex network and the features of the CT image texture were extracted. Finally, the combination of the complex network’s characteristics and CT image texture features was viewed as node attribution; the relationship between different nodes was analyzed using a graph attention network; and classification was carried out. The integration of multi-order neighbor features in the MNFF-GNN model improves the representation of motion artifact regions, targeting the limitations of traditional methods and convolutional neural networks (CNNs). The model demonstrates potential as a clinical tool, particularly in resource-constrained settings, by effectively identifying artifacts even with limited data, with an accuracy of 90.9%, which is an improvement of 9.73%. This innovative approach leverages graph neural networks (GNNs), which are particularly effective at capturing both local and global relationships within graph-structured data. Full article
(This article belongs to the Special Issue Mechanism and Modeling of Graph Convolutional Networks)
Figures

Figure 1. Graph and neighborhood of the nodes.
Figure 2. Comparison of enhanced and initial image. (a) represents the original image, while (b) represents the artifact image after feature enhancement.
Figure 3. Iterative principles of graph neural networks. Through multiple iterations, the features of the nodes gradually integrate information from their neighbors and even the entire graph.
Figure 4. Partial graph construction. The topological characteristics of the multi-order network, combined with physical characteristics, were fused as a node feature vector.
Figure 5. Results of CT image classification, in particular for CT images with artifacts.
Figure 6. Results of CT image classification, in particular for CT images without artifacts.
22 pages, 2212 KiB  
Article
KeypointNet: An Efficient Deep Learning Model with Multi-View Recognition Capability for Sitting Posture Recognition
by Zheng Cao, Xuan Wu, Chunguo Wu, Shuyang Jiao, Yubin Xiao, Yu Zhang and You Zhou
Electronics 2025, 14(4), 718; https://doi.org/10.3390/electronics14040718 - 12 Feb 2025
Viewed by 369
Abstract
Numerous studies leverage pose estimation to extract human keypoint data and then classify sitting postures. However, employing neural networks for direct keypoint classification often yields suboptimal results. Alternatively, modeling keypoints into other data representations before classification introduces redundant information and substantially increases inference time. In addition, most existing methods perform well only under a single fixed viewpoint, limiting their applicability in complex real-world scenarios involving unseen viewpoints. To better address the first limitation, we propose KeypointNet, which employs a decoupled feature extraction strategy consisting of a Keypoint Feature Extraction module and a Multi-Scale Feature Extraction module. In addition, to enhance multi-view recognition capability, we propose the Multi-View Simulation (MVS) algorithm, which augments the viewpoint information by first rotating keypoints and then repositioning the camera. Simultaneously, we propose the multi-view sitting posture (MVSP) dataset, designed to simulate diverse real-world viewpoints. The experimental results demonstrate that KeypointNet outperforms the other state-of-the-art methods on both the proposed MVSP dataset and the other public datasets, while maintaining a lightweight and efficient design. Ablation studies demonstrate the effectiveness of MVS and all KeypointNet modules. Furthermore, additional experiments highlight the superior generalization, small-sample learning capability, and robustness to unseen viewpoints of KeypointNet. Full article
(This article belongs to the Special Issue Innovation and Technology of Computer Vision)
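Editor's note: the MVS augmentation first rotates the 3D keypoints and then repositions a virtual camera. The sketch below mimics that two-step idea on a 17×3 keypoint array with a yaw rotation followed by a look-at transform into a new camera frame; it is an invented approximation, not the authors' implementation.

```python
import numpy as np

def rotate_keypoints(kps, yaw_deg):
    """Rotate 3D keypoints (N, 3) about the vertical (y) axis; relative geometry is preserved."""
    a = np.deg2rad(yaw_deg)
    R = np.array([[np.cos(a), 0.0, np.sin(a)],
                  [0.0,       1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])
    return kps @ R.T

def view_from_camera(kps, cam_pos):
    """Express keypoints in the frame of a camera at `cam_pos` looking at the skeleton centroid.

    Assumes the camera is not placed directly above or below the skeleton.
    """
    center = kps.mean(axis=0)
    z = center - cam_pos                      # camera forward axis
    z /= np.linalg.norm(z)
    x = np.cross([0.0, 1.0, 0.0], z)          # camera right axis
    x /= np.linalg.norm(x)
    y = np.cross(z, x)                        # camera up axis
    R = np.stack([x, y, z])                   # world-to-camera rotation (rows are camera axes)
    return (kps - cam_pos) @ R.T

rng = np.random.default_rng(0)
keypoints = rng.normal(size=(17, 3))                          # a stand-in 17-joint skeleton
augmented = rotate_keypoints(keypoints, yaw_deg=rng.uniform(-180, 180))
augmented = view_from_camera(augmented, cam_pos=rng.uniform(-3, 3, size=3))
print(augmented.shape)   # (17, 3) sample as seen from a new simulated viewpoint
```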
Graphical abstract

Figure 1. An overview of the proposed MVS and KeypointNet method for sitting posture recognition. First, a video stream is captured by a camera, and images from the stream are extracted and processed using a pose estimation method to infer 3D keypoint data (MHFormer [14] is used in our method). During training, MVS is employed to augment the keypoint data, after which KeypointNet is used for recognition. During testing, the keypoint data obtained from the pose estimation method are directly recognized using KeypointNet.
Figure 2. An illustration of MVS's viewpoint transformation process. The upper part uses cubes to visualize the spatial transformation more intuitively. Random keypoint rotation alters the orientation of the data without changing the relative positions between keypoints. In contrast, random camera repositioning introduces more diverse viewpoint transformations and changes the relative positions between keypoints. The lower part shows a set of keypoint data to demonstrate the actual augmentation effects.
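The two MVS steps described above lend themselves to a compact illustration. The following Python sketch is a minimal interpretation, not the authors' implementation: it assumes the input is a (17, 3) array of 3D keypoints in camera coordinates, models the keypoint rotation as a random yaw about the body centroid, and models camera repositioning as a random rigid transform of the camera frame; all function names and parameter ranges are illustrative.

```python
import numpy as np

def yaw_rotation(angle_rad: float) -> np.ndarray:
    """Rotation matrix about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def multi_view_simulation(keypoints: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Augment one (17, 3) keypoint sample with a simulated new viewpoint (illustrative only)."""
    # Step 1: random keypoint rotation about the body centroid.
    # This changes the pose orientation but not the relative positions between keypoints.
    centroid = keypoints.mean(axis=0)
    R_kp = yaw_rotation(rng.uniform(-np.pi, np.pi))
    rotated = (keypoints - centroid) @ R_kp.T + centroid

    # Step 2: random camera repositioning, modelled here as a rigid transform
    # of the camera frame (rotation + translation); the paper's exact
    # parameterisation may differ.
    R_cam = yaw_rotation(rng.uniform(-0.5, 0.5))
    t_cam = rng.uniform(-0.3, 0.3, size=3)
    return rotated @ R_cam.T + t_cam

rng = np.random.default_rng(0)
sample = rng.normal(size=(17, 3))          # placeholder keypoints
augmented = multi_view_simulation(sample, rng)
```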
Figure 3. An illustration of the 17 human skeletal keypoints' distribution.
Figure 4. Framework of the proposed KeypointNet. (a) Overview of the entire KeypointNet architecture. (b) Detailed structure of the KFE module. (c) Detailed structure of the MSFE module.
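The caption above names the two decoupled branches (KFE and MSFE) without giving layer details, so the PyTorch sketch below is purely hypothetical: it assumes a shared per-keypoint MLP for the KFE module and multi-scale 1D convolutions over the keypoint axis for the MSFE module, followed by a 7-class classifier. None of these layer choices come from the paper.

```python
import torch
import torch.nn as nn

class KeypointNetSketch(nn.Module):
    """Hypothetical decoupled keypoint classifier (not the published architecture)."""
    def __init__(self, num_keypoints: int = 17, num_classes: int = 7):
        super().__init__()
        # KFE: a shared MLP applied to each keypoint's (x, y, z) coordinates.
        self.kfe = nn.Sequential(nn.Linear(3, 32), nn.ReLU(),
                                 nn.Linear(32, 32), nn.ReLU())
        # MSFE: 1D convolutions over the keypoint axis with different kernel sizes.
        self.msfe = nn.ModuleList([
            nn.Conv1d(32, 32, kernel_size=k, padding=k // 2) for k in (1, 3, 5)
        ])
        self.classifier = nn.Sequential(nn.Linear(32 * 3, 64), nn.ReLU(),
                                        nn.Linear(64, num_classes))

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, 17, 3)
        feats = self.kfe(keypoints)              # (batch, 17, 32)
        feats = feats.transpose(1, 2)            # (batch, 32, 17) for Conv1d
        multi_scale = [conv(feats).mean(dim=2) for conv in self.msfe]  # pool each scale
        fused = torch.cat(multi_scale, dim=1)    # (batch, 96)
        return self.classifier(fused)            # (batch, num_classes) logits

logits = KeypointNetSketch()(torch.randn(4, 17, 3))  # torch.Size([4, 7])
```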
Figure 5. An illustration of the 7 postures in the MVSP dataset from different viewpoints.
Figure 6. Illustrations of keypoint data, skeleton images, point cloud data, and spatiotemporal sequence data.
Figure 7. The performance of the MVS algorithm.
Figure 8. Comparison of MVS with other data augmentation techniques.
Figure 9. Results of the Multi-View Simulation algorithm robustness experiments.
Figure 10. Results of the experiment on viewpoint limitedness.
Figure 11. Results of the data sampling rate experiment.
18 pages, 39910 KiB  
Article
DyGS-SLAM: Realistic Map Reconstruction in Dynamic Scenes Based on Double-Constrained Visual SLAM
by Fan Zhu, Yifan Zhao, Ziyu Chen, Chunmao Jiang, Hui Zhu and Xiaoxi Hu
Remote Sens. 2025, 17(4), 625; https://doi.org/10.3390/rs17040625 - 12 Feb 2025
Viewed by 426
Abstract
Visual SLAM is widely applied in robotics and remote sensing. The fusion of Gaussian radiance fields and Visual SLAM has demonstrated remarkable efficacy in constructing high-quality dense maps. While existing methods perform well in static scenes, they are susceptible to dynamic objects in real-world environments, which makes robust tracking and mapping challenging. We introduce DyGS-SLAM, a Visual SLAM system that employs dual constraints to achieve high-fidelity static map reconstruction in dynamic environments. We extract ORB features within the scene and use open-world semantic segmentation models together with multi-view geometry to construct dual constraints, forming a zero-shot dynamic information elimination module while recovering backgrounds occluded by dynamic objects. Furthermore, we select high-quality keyframes for loop closure detection and global optimization, construct a foundational Gaussian map from the determined point clouds and poses, and integrate the repaired frames to render new viewpoints and optimize the 3D scene. Experimental results on the TUM RGB-D, Bonn, and Replica datasets, as well as on real scenes, demonstrate that our method achieves excellent localization accuracy and mapping quality in dynamic scenes.
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
Figure 1. System framework of DyGS-SLAM. The tracking thread performs dynamic object removal and background inpainting. The mapping thread reconstructs the Gaussian map and performs differentiable rendering using a set of determined poses and point clouds. Finally, the 3D scene is optimized based on the repaired and rendered frames.
Figure 2. Open-world semantic segmentation model.
Figure 3. RGB images from the TUM RGB-D dataset. (a) Frame 690. (b) Frame 765. The red boxes indicate the chair being moved, which is semantically static but is actually moving.
Figure 4. The feature point p in the keyframe projects to p′ in the current frame; O and O′ are the camera optical centers of the two frames, respectively. (a) Feature point p′ is static (d′ = d_proj). (b) Feature point p′ is dynamic (d′ ≪ d_proj).
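The depth comparison in Figure 4 suggests a simple geometric test that can be combined with the semantic mask to form the dual constraint. The numpy sketch below is a simplified reading of that idea, not the published system: the keyframe-to-current pose (R, t), the intrinsics K, the depth threshold tau, and the OR-style fusion with the semantic mask are all assumptions.

```python
import numpy as np

def is_dynamic(p_kf: np.ndarray, depth_cur: float,
               R: np.ndarray, t: np.ndarray, K: np.ndarray,
               in_semantic_mask: bool, tau: float = 0.05) -> bool:
    """p_kf: 3D point in keyframe camera coordinates; R, t: keyframe-to-current pose."""
    p_cur = R @ p_kf + t                 # point expressed in the current camera frame
    d_proj = p_cur[2]                    # depth predicted by the static-scene assumption
    uv = (K @ p_cur)[:2] / d_proj        # pixel where the point projects; in a real system
                                         # depth_cur would be sampled from the depth map at uv
    geometric_dynamic = (d_proj - depth_cur) > tau   # d' << d_proj: the point is flagged dynamic
    return in_semantic_mask or geometric_dynamic     # dual constraint: semantic OR geometric

# Example with illustrative TUM-like intrinsics.
K = np.array([[525.0, 0.0, 319.5], [0.0, 525.0, 239.5], [0.0, 0.0, 1.0]])
print(is_dynamic(np.array([0.2, 0.1, 2.0]), depth_cur=1.2,
                 R=np.eye(3), t=np.zeros(3), K=K, in_semantic_mask=False))  # True
```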
Figure 5. Comparison of image frames repaired by Dyna-SLAM and DyGS-SLAM (Ours) on the walking_halfsphere sequence of the TUM RGB-D dataset. The red boxes compare how the two methods repair the same frame.
Figure 6. Camera trajectories estimated by ORB-SLAM3 and DyGS-SLAM (Ours) on the TUM dataset, and their differences from the ground truth.
Figure 7. Comparison of mapping results of NICE-SLAM, SplaTAM, and DyGS-SLAM (Ours) on the walking_xyz sequence.
Figure 8. Detailed comparison of the original reconstructed scene provided by Bonn and the scene reconstructed by our method. The red boxes indicate details of the two reconstructions. (a) Original reconstructed scene provided by Bonn. (b–d) Details of the scene reconstructed by our method.
Figure 9. Comparison of reconstruction performance of SplaTAM and DyGS-SLAM (Ours) on the Bonn dataset. Our method demonstrates better reconstruction quality. (a) SplaTAM. (b) DyGS-SLAM.
Figure 10. Comparison of mapping results of NICE-SLAM, SplaTAM, and DyGS-SLAM on the Replica dataset. The red boxes indicate details of the reconstructions. Our method also achieves excellent reconstruction quality in static scenes. (a) NICE-SLAM. (b) SplaTAM. (c) DyGS-SLAM. (d) GT.
Figure 11. Experimental results in real scenarios. The red boxes indicate the recovery of the static background during reconstruction. (a) Input image. (b) Segmentation. (c) Background repair. (d) Novel view synthesis.
Figure 12. Effect of background inpainting on DyGS-SLAM scene reconstruction. The red boxes highlight the reconstruction results with and without inpainting. (a) Reconstruction without background inpainting. (b) Reconstruction with background inpainting.
17 pages, 2773 KiB  
Article
Probing Solid-Binding Peptide Self-Assembly Kinetics Using a Frequency Response Cooperativity Model
by Taylor Bader, Kyle Boone, Chris Johnson, Cindy L. Berrie and Candan Tamerler
Biomimetics 2025, 10(2), 107; https://doi.org/10.3390/biomimetics10020107 - 12 Feb 2025
Viewed by 493
Abstract
Biomolecular adsorption has great significance in medical, environmental, and technological processes. Understanding adsorption equilibrium and binding kinetics is essential for advanced process implementation. This requires identifying intrinsic determinants that predict optimal adsorption properties at bio–hybrid interfaces. Solid-binding peptides (SBPs) have targetable intrinsic properties involving peptide–peptide and peptide–solid interactions, which result in high-affinity, material-selective binding. Atomic force microscopy investigations confirmed this complex interplay of multi-step peptide assemblies in a cooperative mode. Yet most studies report adsorption properties of SBPs using non-cooperative or single-step adsorption models. Using non-cooperative kinetic models to predict cooperative self-assembly behavior creates an oversimplified view of peptide adsorption, restricting the implementation of SBPs beyond their current uses. To address these limitations and provide insight into surface-level events during self-assembly, a novel method, the Frequency Response Cooperativity model, was developed. This model iteratively fits adsorption data through spectral analysis of several time-dependent kinetic parameters. The model, applied to data for a widely used gold-binding peptide obtained using a quartz crystal microbalance with dissipation (QCM-D), verified multi-step assembly. Peak deconvolution of spectral plots revealed distinct differences in the size and distribution of the kinetic rates present during adsorption across concentrations. This approach provides new fundamental insights into the intricate dynamics of self-assembly of biomolecules on surfaces.
Figure 1. AFM height images (250 × 250 nm², false color scale of 10 Å) for various concentrations adsorbed for one hour, with cross-sections through the middle shown in red directly beneath the images on a scale from 0 to 10 Å (adjusted so that the lowest areas have a height of 0): (a) bare Au(111), (b) 1 fM AuBP1, (c) 1 nM AuBP1, (d) 100 nM AuBP1, (e) 10 µM AuBP1. (The total height for the cross-sections of (c,d) has been approximated based on completion of the initial layer, and the height shown should represent the height above the bare gold.)
Figure 2. Concentration-dependent adsorption curves for the gold-binding peptide AuBP1 (WAGAKRLVLRRE) in low-ionic-strength buffer. (a) Averaged frequency-response curves for the adsorption of AuBP1 on a gold-coated QCM-D sensor for all runs at all concentrations. (b) Representative frequency-response curves for each concentration used for further analysis. Curves represent the average of the 5th, 7th, and 9th harmonics for a 5 MHz sensor.
Figure 3. Frequency Response Cooperativity (FRC) isotherms and their components for QCM data, by peptide concentration. The green curves are the FRC isotherms, reconstructing the experimental QCM data (black curves). The FRC isotherms are the sums of their cooperative components, shown in blue. The cooperative components have growth phases and convergence phases.
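The description of the FRC isotherms as sums of cooperative components suggests how such a decomposition can be set up in practice. The sketch below is a generic multi-component fit, not the published FRC formulation: it assumes each component follows a simple exponential-association form with amplitude A_i and apparent rate k_obs,i, and fits two such components to synthetic frequency-response data with scipy.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_component_response(t, a1, k1, a2, k2):
    """Sum of two exponential-association components (illustrative form)."""
    return a1 * (1.0 - np.exp(-k1 * t)) + a2 * (1.0 - np.exp(-k2 * t))

t = np.linspace(0, 1800, 600)                                      # seconds
synthetic = two_component_response(t, -20.0, 0.02, -8.0, 0.002)    # Hz, fast + slow component
noisy = synthetic + np.random.default_rng(1).normal(0, 0.3, t.size)

popt, _ = curve_fit(two_component_response, t, noisy,
                    p0=(-10.0, 0.01, -10.0, 0.001))
a1, k1, a2, k2 = popt
print(f"fast component: A = {a1:.1f} Hz, k_obs = {k1:.3g} 1/s")
print(f"slow component: A = {a2:.1f} Hz, k_obs = {k2:.3g} 1/s")
```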
Figure 4. Spectral distribution of component scaling factors for apparent kinetic constants (k_obs) of FRC components across AuBP1 concentrations. Concentrations of 0.05–0.1 μM are solely driven by slow kinetics. Concentrations of 0.15–0.175 μM are driven by both slow and fast kinetics. Concentrations of 0.25–0.75 μM are mainly driven by fast kinetics.
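Peak deconvolution of a spectral distribution like the one in Figure 4 is commonly carried out by fitting a sum of peaks. The sketch below is an illustrative two-Gaussian fit on synthetic data; the peak count, the log10(k_obs) grid, and the initial guesses are assumptions, not values from the study.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(x, a1, mu1, s1, a2, mu2, s2):
    """Sum of two Gaussian peaks used for the deconvolution."""
    g = lambda a, mu, s: a * np.exp(-((x - mu) ** 2) / (2 * s ** 2))
    return g(a1, mu1, s1) + g(a2, mu2, s2)

log_k = np.linspace(-4, 0, 200)                              # log10 of k_obs grid
observed = two_gaussians(log_k, 1.0, -2.8, 0.3, 0.6, -1.2, 0.25)
observed += np.random.default_rng(2).normal(0, 0.02, log_k.size)

popt, _ = curve_fit(two_gaussians, log_k, observed,
                    p0=(1.0, -3.0, 0.5, 0.5, -1.0, 0.5))
print("slow-peak centre (log10 k_obs):", round(popt[1], 2))
print("fast-peak centre (log10 k_obs):", round(popt[4], 2))
```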
Figure 5. Peak deconvolution for representative concentrations.