Search Results (272)

Search Parameters:
Keywords = building masks

12 pages, 6649 KiB  
Article
Masked Image Modeling Meets Self-Distillation: A Transformer-Based Prostate Gland Segmentation Framework for Pathology Slides
by Haoyue Zhang, Sushant Patkar, Rosina Lis, Maria J. Merino, Peter A. Pinto, Peter L. Choyke, Baris Turkbey and Stephanie Harmon
Cancers 2024, 16(23), 3897; https://doi.org/10.3390/cancers16233897 - 21 Nov 2024
Abstract
Detailed evaluation of prostate cancer glands is an essential yet labor-intensive step in grading prostate cancer. Gland segmentation can serve as a valuable preliminary step for machine-learning-based downstream tasks, such as Gleason grading, patient classification, cancer biomarker building, and survival analysis. Despite its importance, there is currently no reliable gland segmentation model for prostate cancer. Without accurate gland segmentation, researchers rely on cell-level or human-annotated regions of interest for pathomic and deep feature extraction. This approach is sub-optimal, as the extracted features are not explicitly tailored to gland information. Although foundational segmentation models have attracted considerable interest, we demonstrate the limitations of that approach. This work proposes a prostate gland segmentation framework that utilizes a dual-path Swin Transformer UNet structure and leverages Masked Image Modeling for large-scale self-supervised pretraining. A tumor-guided self-distillation step further fuses the binary tumor label of each patch into the encoder to ensure the encoder is suitable for the gland segmentation step. We united heterogeneous data sources for self-supervised training, including biopsy and surgical specimens, to reflect the diversity of benign and cancerous pathology features. We evaluated the segmentation performance on two publicly available prostate cancer datasets, achieving state-of-the-art results with a test mDice of 0.947 on the PANDA dataset and a test mDice of 0.664 on the SICAPv2 dataset.
(This article belongs to the Section Methods and Technologies Development)
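The masked-image-modeling pretraining described above can be illustrated with a minimal sketch: random patches are hidden and an encoder-decoder is trained to reconstruct them from the visible ones. The PyTorch code below is a generic illustration, not the authors' dual-path Swin Transformer UNet; the patch size, embedding width, mask ratio, and module choices are assumptions.

```python
import torch
import torch.nn as nn

class TinyMIM(nn.Module):
    """Minimal masked-image-modeling sketch: mask random patches, reconstruct them.
    The encoder/decoder are placeholders, not the paper's Swin Transformer UNet."""
    def __init__(self, patch=16, dim=128):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)   # patchify
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2)
        self.decoder = nn.Linear(dim, 3 * patch * patch)  # predict raw pixels per patch
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, img, mask_ratio=0.6):
        tokens = self.embed(img).flatten(2).transpose(1, 2)       # (B, N, dim)
        B, N, D = tokens.shape
        mask = torch.rand(B, N, device=img.device) < mask_ratio   # True = hidden patch
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, N, D), tokens)
        pred = self.decoder(self.encoder(tokens))                 # (B, N, 3*p*p)
        # Reconstruction target: the original pixel patches.
        target = img.unfold(2, self.patch, self.patch).unfold(3, self.patch, self.patch)
        target = target.permute(0, 2, 3, 1, 4, 5).reshape(B, N, -1)
        return ((pred - target) ** 2)[mask].mean()                # loss only on masked patches

model = TinyMIM()
loss = model(torch.randn(2, 3, 224, 224))
loss.backward()
```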
Figure 1. Sample slides from the three data cohorts. The top slide is from SICAPv2. Note that the SICAPv2 dataset is provided in patch form, so the sample shown in this figure was stitched back based on the given coordinates. The bottom-left slide is from the PANDA cohort. The bottom-right slide is a whole-mount slide from our in-house dataset NCI.
Figure 2. Overview of the proposed model for prostate gland segmentation. Section (A) shows the architecture of our proposed dual-path segmentation architecture. Section (B) shows our preprocessing, self-supervised learning, and self-distillation schema for the self-supervised learning step.
Figure 3. Sample segmentation results for different Gleason grade glands across different methods. Compared with other methods, many small spots were removed by the tumor classification head in our network, which yielded a better visual representation without any post-processing smoothing methods.
28 pages, 45529 KiB  
Article
High-Quality Damaged Building Instance Segmentation Based on Improved Mask Transfiner Using Post-Earthquake UAS Imagery: A Case Study of the Luding Ms 6.8 Earthquake in China
by Kangsan Yu, Shumin Wang, Yitong Wang and Ziying Gu
Remote Sens. 2024, 16(22), 4222; https://doi.org/10.3390/rs16224222 - 13 Nov 2024
Abstract
Unmanned aerial systems (UASs) are increasingly playing a crucial role in earthquake emergency response and disaster assessment due to their ease of operation, mobility, and low cost. However, post-earthquake scenes are complex, with many forms of damaged buildings. UAS imagery has a high spatial resolution, but the resolution is inconsistent between different flight missions. These factors make it challenging for existing methods to accurately identify individual damaged buildings in UAS images from different scenes, resulting in coarse segmentation masks that are insufficient for practical application needs. To address these issues, this paper proposes DB-Transfiner, a building damage instance segmentation method for post-earthquake UAS imagery based on the Mask Transfiner network. The method primarily employs deformable convolution in the backbone network to enhance adaptability to collapsed buildings of arbitrary shapes. Additionally, it uses an enhanced bidirectional feature pyramid network (BiFPN) to integrate multi-scale features, improving the representation of targets of various sizes. Furthermore, a lightweight Transformer encoder is used to process edge pixels, enhancing the efficiency of global feature extraction and the refinement of target edges. We conducted experiments on post-disaster UAS images collected from the 2022 Luding earthquake (surface wave magnitude Ms 6.8) in Sichuan Province, China. The results demonstrate that the average precisions of DB-Transfiner, APbox and APseg, are 56.42% and 54.85%, respectively, outperforming all other comparative methods. Our model improves on the original model by 5.00% in APbox and 4.07% in APseg. Importantly, the APseg of our model is significantly higher than that of the state-of-the-art instance segmentation model Mask R-CNN, with an increase of 9.07%. In applicability testing, the model achieved an average correctness rate of 84.28% for identifying images from different scenes of the same earthquake, and it maintained good performance when applied to the Yangbi earthquake scene, demonstrating a degree of generalization capability. This method identifies and assesses damaged buildings after earthquakes with high accuracy and can provide critical data support for disaster loss assessment.
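The deformable-convolution backbone idea can be sketched with torchvision's DeformConv2d, where a plain convolution predicts the per-location sampling offsets. This is a generic sketch rather than the DB-Transfiner implementation; the channel sizes and kernel size are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableStage(nn.Module):
    """One backbone stage in which a regular conv predicts sampling offsets
    that a deformable conv then uses, letting the receptive field follow
    irregular (e.g., collapsed-building) shapes. Channel sizes are illustrative."""
    def __init__(self, in_ch=64, out_ch=128, k=3):
        super().__init__()
        # 2 offsets (x, y) per kernel tap, one offset group.
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        offsets = self.offset_pred(x)            # (B, 2*k*k, H, W)
        return self.act(self.deform(x, offsets))

stage = DeformableStage()
y = stage(torch.randn(1, 64, 56, 56))            # -> (1, 128, 56, 56)
```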
Graphical abstract
Figure 1. The study area and UAS orthophotos after the earthquake in Luding County, Sichuan Province. (A) Study area; (B) UAS orthophotos: (a,c) Moxi town; (b,d,g) Detuo town; (e) Fawang village; (f) Wandong village.
Figure 2. Samples of damaged buildings and labels: (a) field investigation photos; (b) UAS images, with the red fan-shaped marker representing the viewing angle of the observation location; (c) labeled bounding boxes; (d) labeled instance masks, where the color of the polygon masks represents different instance objects.
Figure 3. The network architecture of Mask Transfiner.
Figure 4. The improved network architecture of DB-Transfiner. Deformable convolution is employed in the backbone, the FPN is replaced by an enhanced BiFPN to fuse multi-scale features, and a lightweight sequence encoder is adopted for efficiency.
Figure 5. Deformable convolution feature extraction module. Arrows indicate the type of convolution used at each stage: the first two stages use standard convolution and the last three use deformable convolution. (a) Standard convolution; (b) deformable convolution.
Figure 6. Replacing the FPN with an enhanced BiFPN to improve the feature fusion network.
Figure 7. Lightweight sequence encoder to improve network efficiency, using one Transformer structure with an eight-headed self-attention mechanism instead of three Transformer structures with four-headed self-attention mechanisms.
Figure 8. Loss curve during DB-Transfiner training.
Figure 9. Comparison of the performance of all models based on the AP (%) metric.
Figure 10. Visualization of the prediction results of different network models. The colored bounding boxes and polygons represent the detection and segmentation results, respectively. (a) Annotated images; (b) Mask R-CNN; (c) Mask Transfiner; (d) DB-Transfiner.
Figure 11. Visualization of instance mask results of different network models. The colored polygons represent the recognized instance objects; ① and ② mark two typical damaged buildings with the same level of destruction. (a) Original images; (b) annotated results; (c) Mask R-CNN; (d) Mask Transfiner; (e) DB-Transfiner.
Figure 12. Visualization of heatmaps: (a) the original images; (b) heatmaps of the Conv2_x layer of the DCNM; (c) heatmaps of the Conv5_x layer of the DCNM; (d) heatmaps of the N5 layer of the MEFM; (e) the final results. The colored borders represent the model's predicted instance objects.
Figure 13. Visualization of feature maps before and after the LTGM. The colored borders represent the different instance objects.
Figure 14. Results of damaged building classification in Fawang village (Figure 1B(e)). Red indicates correct detections, green indicates incorrect detections, and yellow indicates missed detections.
Figure 15. Results of damaged building classification in Wandong village and Detuo town (Figure 1B(f,g)). Red indicates correct detections, green indicates incorrect detections, and yellow indicates missed detections.
Figure 16. Example of UAV imagery from the Yangbi earthquake in Yunnan, China: (a) Huaian village; (b) Yangbi town.
Figure 17. UAS imagery samples of damaged buildings from the Yangbi earthquake. (a) The red irregular polygons denote the damaged buildings. (b) The bounding boxes and polygon masks are the visualized results of our model; the colors represent different instance objects.
Figure 18. Examples of densely built-up areas. The red boxes indicate buildings with blurred contour information caused by shadows and occlusions.
13 pages, 326 KiB  
Article
Factors Associated with Impact of Event Scores Among Ontario Education Workers During the COVID-19 Pandemic
by Iris Gutmanis, Brenda L. Coleman, Robert G. Maunder, Kailey Fischer, Veronica Zhu and Allison McGeer
Int. J. Environ. Res. Public Health 2024, 21(11), 1448; https://doi.org/10.3390/ijerph21111448 - 31 Oct 2024
Abstract
There is limited information regarding factors related to education workers' responses to traumatic stress during the COVID-19 pandemic. The study goal was to determine whether personal factors, behaviours that mitigate viral spread, and work-related factors were associated with post-traumatic symptoms. This observational study, embedded within a cohort study, recruited Ontario education workers from February 2021 to June 2023. Exposure data were collected at enrollment and updated annually. Participants completed the Impact of Event Scale (IES) at withdrawal/study completion. Modified Poisson regression was used to build hierarchical models of dichotomized IES scores (≥26: moderate/severe post-traumatic symptoms). Of the 1518 education workers who submitted an IES between September 2022 and December 2023, the incidence rate ratio of IES scores ≥26 was significantly higher among participants who usually/always wore a mask at work (1.48; 95% confidence interval 1.23, 1.79), usually/always practiced physical distancing (1.31; 1.06, 1.62), lived in larger households (1.06; 1.01, 1.12), and reported poor/fair/good health (1.27; 1.11, 1.46). However, the models accounted for little of the variance in IES scores, suggesting that future studies should collect data on other factors associated with the development of PTSD, such as pre-existing mental health challenges. Early identification of those experiencing traumatic stress and the implementation of stress reduction strategies are needed to ensure the ongoing health of education workers.
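The "modified Poisson regression" named here is a Poisson GLM on a binary outcome with a robust (sandwich) variance estimator, with exponentiated coefficients read as incidence rate ratios. A minimal sketch with statsmodels on synthetic data follows; the variable names and numbers are illustrative stand-ins, not the study's cohort data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins for the exposures in the abstract (not the real cohort data).
df = pd.DataFrame({
    "mask_always": rng.integers(0, 2, n),
    "distancing": rng.integers(0, 2, n),
    "household_size": rng.integers(1, 6, n),
    "fair_health": rng.integers(0, 2, n),
})
logit = -1.0 + 0.4 * df.mask_always + 0.25 * df.distancing + 0.05 * df.household_size
df["ies_high"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))    # IES >= 26 indicator

# Poisson GLM on a binary outcome + robust (HC1) covariance = modified Poisson regression.
model = smf.glm("ies_high ~ mask_always + distancing + household_size + fair_health",
                data=df, family=sm.families.Poisson()).fit(cov_type="HC1")
irr = np.exp(model.params)           # incidence rate ratios
ci = np.exp(model.conf_int())        # 95% confidence intervals on the IRR scale
print(pd.concat([irr.rename("IRR"), ci], axis=1))
```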
21 pages, 7110 KiB  
Article
Pose Tracking and Object Reconstruction Based on Occlusion Relationships in Complex Environments
by Xi Zhao, Yuekun Zhang and Yaqing Zhou
Appl. Sci. 2024, 14(20), 9355; https://doi.org/10.3390/app14209355 - 14 Oct 2024
Abstract
For the reconstruction of objects during hand–object interactions, accurate pose estimation is indispensable: improving the precision of pose estimation enhances the accuracy of the 3D reconstruction results. Recently, pose tracking techniques are no longer limited to individual objects, leading to advancements in the reconstruction of objects interacting with other objects. However, most methods struggle to handle incomplete target information in complex scenes and mutual interference between objects in the environment, leading to a decrease in pose estimation accuracy. We propose an improved algorithm built upon the existing BundleSDF framework that enables more robust and accurate tracking by considering the occlusion relationships between objects. First, to detect changes in occlusion relationships, we segment the target and compute dual-layer masks. Second, rough pose estimation is performed through feature matching, and a keyframe pool, maintained based on occlusion relationships, is introduced for pose optimization. Lastly, the estimated results of historical frames are used to train an object neural field that assists the subsequent pose-tracking process. Experiments on the HO-3D dataset show that our method significantly improves the accuracy and robustness of object tracking under frequent interactions, providing new ideas for object pose-tracking tasks in complex scenes.
(This article belongs to the Special Issue Technical Advances in 3D Reconstruction)
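The coarse pose step pairs feature matching with the Umeyama alignment mentioned in Figure 1; a minimal numpy version of that alignment (rigid case, without the scale term) is sketched below on synthetic matched points, purely as an illustration of that building block rather than the paper's pipeline.

```python
import numpy as np

def umeyama_rigid(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst points.
    src, dst: (N, 3) arrays of matched 3D feature points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    cov = (dst - mu_d).T @ (src - mu_s) / len(src)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:                        # keep a proper rotation
        S[2, 2] = -1.0
    R = U @ S @ Vt
    t = mu_d - R @ mu_s
    return R, t

# Toy check: recover a known rotation + translation from matched points.
rng = np.random.default_rng(1)
src = rng.normal(size=(100, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
dst = src @ R_true.T + np.array([0.1, -0.2, 0.3])
R, t = umeyama_rigid(src, dst)
assert np.allclose(R, R_true, atol=1e-6)
```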
Figure 1. Overview of our system. First, we compute the image mask of the target object, then feed the mask segmentation result into the feature-matching network to obtain a coarse pose estimate using the Umeyama [19] algorithm with the feature-matching result of the previous frame. Second, we use a dual-layer mask-based strategy to select frames from the keyframe pool that have a strong co-visibility relationship with the current frame and perform joint optimization with the current frame to obtain the final pose estimation result.
Figure 2. Visualization of the dual-layer mask results (yellow: target object mask; pink: detected foreground occlusions).
Figure 3. The images of mask differences between adjacent frames and the original RGB image. (a) The original RGB image of the scene. (b) The result of superimposing foreground occlusion masks of adjacent frames: white indicates overlapping areas, gray indicates non-overlapping areas. (c) The portion showing the change in foreground occlusion masks between adjacent frames (white areas).
Figure 4. Partial results of our method on the HO-3D dataset. Each row represents the results of a video in the dataset; the green bounding box indicates the ground truth pose, and the red bounding box represents the predicted results of our method.
Figure 5. Visualizations of partial results of our method on the self-made indoor interaction dataset. Each row represents the results of one video in the dataset.
Figure 6. Experiment with moving occluders: qualitative comparison with BundleSDF on the custom dataset. On the left are the visualized results of the estimated poses of the target (yellow box) by the two methods; on the right, the final reconstructed meshes of the two methods are displayed.
Figure 7. Partial pose tracking results for the video_switch dataset: yellow wireframes represent the pose estimation results. (a) The results of the comparative method, BundleSDF; (b) the results of our method.
Figure 8. Comparison of reconstruction results for the video_switch dataset. The first row depicts the reconstruction results of BundleSDF, while the second row shows the reconstruction results of our method. From left to right, the observations are from the frontal, side, and top views of the object.
Figure 9. Partial tracking results for video_cup. The left two columns display the original images and local enlargements of the pose estimation results of BundleSDF, while the right two columns show the original images and local enlargements of the pose estimation results of our method.
Figure 10. Comparison of tracking and reconstruction results for the video_cup dataset. The first row shows the results obtained with BundleSDF, while the second row presents the output of our method.
15 pages, 1641 KiB  
Article
Interactive Segmentation for Medical Images Using Spatial Modeling Mamba
by Yuxin Tang, Yu Li, Hua Zou and Xuedong Zhang
Information 2024, 15(10), 633; https://doi.org/10.3390/info15100633 - 14 Oct 2024
Abstract
Interactive segmentation methods utilize user-provided positive and negative clicks to guide the model in accurately segmenting target objects. Compared to fully automatic medical image segmentation, these methods can achieve higher segmentation accuracy with limited image data, demonstrating significant potential in clinical applications. Typically, for each new click provided by the user, conventional interactive segmentation methods reprocess the entire network by re-inputting the click into the segmentation model, which greatly increases the user's interaction burden and deviates from the intended goal of interactive segmentation tasks. To address this issue, we propose an efficient segmentation network, ESM-Net, for interactive medical image segmentation. It obtains high-quality segmentation masks from the user's initial clicks, reducing the complexity of subsequent refinement steps. Recent studies have demonstrated the strong performance of the Mamba model in various vision tasks; however, its application in interactive segmentation remains unexplored. In our study, we incorporate the Mamba module into our framework for the first time and enhance its spatial representation capabilities by developing a Spatial Augmented Convolution (SAC) module. These components are combined as the fundamental building blocks of our network. Furthermore, we design a novel and efficient segmentation head to fuse multi-scale features extracted from the encoder, optimizing the generation of the predicted segmentation masks. Through comprehensive experiments, our method achieves state-of-the-art performance on three medical image datasets: 1.43 NoC@90 on the Kvasir-SEG dataset, 1.57 NoC@90 on the CVC-ClinicDB polyp segmentation dataset, and 1.03 NoC@90 on the ADAM retinal disk segmentation dataset. These assessments highlight the effectiveness of our approach in interactive medical image segmentation.
(This article belongs to the Special Issue Applications of Deep Learning in Bioinformatics and Image Processing)
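The NoC@90 figures quoted above count how many simulated user clicks a model needs before the predicted mask first reaches 90% IoU. A small, model-agnostic sketch of that metric is given below; segment_fn and the click simulator are hypothetical stand-ins for ESM-Click and the evaluation protocol, which are not reproduced here.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def noc_at_threshold(segment_fn, next_click_fn, gt_mask, threshold=0.90, max_clicks=20):
    """Number of Clicks (NoC@threshold): clicks needed until IoU >= threshold.
    segment_fn(clicks) -> predicted boolean mask (stands in for the interactive model);
    next_click_fn(pred, gt) -> (row, col, is_positive) simulating the user."""
    clicks, pred = [], np.zeros_like(gt_mask, dtype=bool)
    for n in range(1, max_clicks + 1):
        clicks.append(next_click_fn(pred, gt_mask))
        pred = segment_fn(clicks)
        if iou(pred, gt_mask) >= threshold:
            return n
    return max_clicks   # convention: report the cap if the threshold is never reached

def centroid_click(pred, gt):
    """Simple click simulator: click the centroid of the currently mislabeled region,
    positive if that pixel belongs to the object in the ground truth."""
    errors = np.argwhere(pred != gt)
    r, c = errors.mean(axis=0).astype(int) if len(errors) else (0, 0)
    return int(r), int(c), bool(gt[r, c])

# Toy usage with a dummy "model" that needs two clicks to recover the mask.
gt = np.zeros((64, 64), dtype=bool); gt[20:40, 20:40] = True
dummy = lambda clicks: gt if len(clicks) >= 2 else np.zeros_like(gt)
print(noc_at_threshold(dummy, centroid_click, gt))   # -> 2
```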
Graphical abstract
Figure 1. ESM-Click overview. Our model comprises two stages: preliminary segmentation and refinement segmentation. The encoded image and click features are fed into our proposed ESM-Net segmentation network to extract target-aware features and generate a coarse segmentation mask guided by the initial click. Starting from the second click, each new user-provided click is fed into the refinement network to optimize the details of the previously generated coarse mask. By iteratively executing the refinement network, a high-quality prediction mask is eventually produced.
Figure 2. The overall architecture of ESM-Net integrates Spatial Augmented Convolution (SAC), Mamba modules, and MBConv for downsampling within the encoder module. (a) The Spatial Augmented Convolution module enhances the spatial representation of features before input to the Mamba module using a gate-like structure. (b) The Mamba module transforms input features into feature sequences and processes them with SS2D to obtain comprehensive features from the merged sequences. (c) The KAN SegHead receives multi-scale features from the encoder and uses KANLinear layers to output the final segmentation mask.
Figure 3. The mean Intersection over Union (mIoU) and mean Dice coefficient (mDice) scores corresponding to the predictions obtained per click using different methods on the Kvasir-SEG and Clinic datasets.
Figure 4. Qualitative results of ESM-Click. The first row illustrates example segmentations from the Kvasir-SEG dataset. The second row presents segmentation examples from the Clinic dataset with varying numbers of clicks. The third row showcases interactive segmentation cases from the ADAM dataset. Segmentation probability maps are depicted in blue; segmentation overlays on the original images are shown in red using the IoU evaluation metric. Green dots indicate positive clicks, while red dots indicate negative clicks.
21 pages, 26972 KiB  
Article
Defective Pennywort Leaf Detection Using Machine Vision and Mask R-CNN Model
by Milon Chowdhury, Md Nasim Reza, Hongbin Jin, Sumaiya Islam, Geung-Joo Lee and Sun-Ok Chung
Agronomy 2024, 14(10), 2313; https://doi.org/10.3390/agronomy14102313 - 9 Oct 2024
Abstract
Demand and market value for pennywort largely depend on the quality of the leaves, which can be affected by various ambient environment or fertigation variables during cultivation. Although early detection of defects in pennywort leaves would enable growers to take quick action, conventional manual detection is laborious, time consuming, and subjective. Therefore, the objective of this study was to develop an automatic leaf defect detection algorithm for pennywort plants grown under controlled environment conditions, using machine vision and deep learning techniques. Leaf images were captured from pennywort plants grown in an ebb-and-flow hydroponic system under fluorescent light conditions in a controlled plant factory environment. Physically or biologically damaged leaves (e.g., curled, creased, discolored, misshapen, or brown spotted) were classified as defective leaves. Images were annotated using an online tool, and Mask R-CNN models were implemented with integrated attention mechanisms, the convolutional block attention module (CBAM) and coordinate attention (CA), and compared for improved image feature extraction. Transfer learning was employed to train the model with a smaller dataset, effectively reducing processing time. The improved models demonstrated significant advancements in accuracy and precision, with the CA-augmented model achieving the highest metrics, including a mean average precision (mAP) of 0.931 and an accuracy of 0.937. These enhancements enabled more precise localization and classification of leaf defects, outperforming the baseline Mask R-CNN model in complex visual recognition tasks. The final model was robust, effectively distinguishing defective leaves in challenging scenarios, making it highly suitable for applications in precision agriculture. Future research can build on this modeling framework, exploring additional variables to identify specific leaf abnormalities at earlier growth stages, which is crucial for production quality assurance.
(This article belongs to the Special Issue Advanced Machine Learning in Agriculture)
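The CBAM block added to the Mask R-CNN backbone applies channel attention followed by spatial attention, each as a sigmoid gate over the feature map. A compact PyTorch sketch of such a module is shown below; the reduction ratio and kernel size are common defaults, not necessarily the paper's settings.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention, each producing a sigmoid gate over the feature map."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP for channel attention
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: average- and max-pool over space, pass through shared MLP.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: pool over channels, 7x7 conv, sigmoid gate.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(2, 256, 32, 32)
out = CBAM(256)(feat)            # same shape as the input feature map
```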
Figure 1. Experimental site and image acquisition: (a) cultivation shelf for pennywort seedling adaptation with hydroponic system and ambient environment, (b) pennywort seedlings grown under fluorescent light, and (c) sample images of pennywort leaves grown in an ebb-and-flow hydroponic system: malnourished leaves (top), healthy leaves (bottom).
Figure 2. Pennywort leaf annotation: (a) original image of affected pennywort plants taken during the experiment, and (b) manually masked healthy and unhealthy leaves.
Figure 3. Image augmentation: (a) original image, (b) horizontal flip, (c) vertical flip, (d) shift, (e) zoom, and (f) rotation.
Figure 4. The Mask R-CNN architecture with RPN and FPN used in this study for detecting defective pennywort leaves.
Figure 5. (a) The backbone feature extraction network (modified from [31]), (b) anchor generation principle (modified from [29]), and (c) ROI Align output achieved through grid points of bilinear interpolation (modified from [30]), used in this study for detecting defective pennywort leaves.
Figure 6. Illustration of feature extraction through the implemented algorithm for defective pennywort leaves.
Figure 7. Illustration of the CBAM structure used in this study for detecting defective pennywort leaves: (a) convolutional block attention module, (b) channel attention module, and (c) spatial attention module.
Figure 8. Structure of the coordinate attention (CA) mechanism used in this study for detecting defective pennywort leaves.
Figure 9. Schematic diagrams for integrating ResNet-101 with attention mechanism modules: (a) ResNet-101+CBAM, and (b) ResNet-101+CA.
Figure 10. Loss and accuracy variation of the Mask-RCNN and improved Mask-RCNN models: (a) loss variation and (b) accuracy variation for Mask-RCNN_ResNet-101, Mask-RCNN_ResNet-101+CBAM, and Mask-RCNN_ResNet-101+CA.
Figure 11. Heatmaps generated from the images using the pre-trained models: (a) original image, (b) heatmap of the Mask-RCNN_ResNet-101 model, (c) heatmap of Mask-RCNN_ResNet-101+CBAM, and (d) heatmap of the Mask-RCNN_ResNet-101+CA model.
Figure 12. Output results of defective pennywort leaf detection in the test images using: (a) an annotated image, (b) the Mask R-CNN model, (c) the improved Mask-RCNN model with CBAM, and (d) the improved Mask-RCNN model with CA.
Figure 13. Detection inaccuracies in test images: (a) annotated image and (b) false negative detection from the Mask RCNN model and the improved Mask RCNN models.
Figure 14. Visualization of defective leaf segmentation results: (a) original annotated image, (b) ground truth, (c) segmentation result of the Mask-RCNN model, (d) segmentation result of the improved Mask-RCNN model with CBAM, and (e) segmentation result of the improved Mask-RCNN model with CA.
Figure 15. Precision-Recall (P-R) curve used to evaluate the performance of the proposed models.
25 pages, 9183 KiB  
Article
A High-Accuracy Contour Segmentation and Reconstruction of a Dense Cluster of Mushrooms Based on Improved SOLOv2
by Shuzhen Yang, Jingmin Zhang and Jin Yuan
Agriculture 2024, 14(9), 1646; https://doi.org/10.3390/agriculture14091646 - 20 Sep 2024
Abstract
This study addresses challenges related to imprecise edge segmentation and low center point accuracy, particularly when mushrooms are heavily occluded or deformed within dense clusters. A high-precision mushroom contour segmentation algorithm, built upon an improved SOLOv2, is proposed, along with a contour reconstruction method using instance segmentation masks. The enhanced segmentation algorithm, PR-SOLOv2, incorporates the PointRend module during the up-sampling stage, introducing fine features and enhancing segmentation details, which addresses the difficulty of accurately segmenting densely overlapping mushrooms. Furthermore, a contour reconstruction method based on the PR-SOLOv2 instance segmentation mask is presented. This approach accurately segments mushrooms, extracts individual mushroom masks and their contour data, and classifies reconstruction contours based on average curvature and length. Regular contours are fitted using least-squares ellipses, while irregular ones are reconstructed by extracting the longest sub-contour from the original irregular contour based on its corners. Experimental results demonstrate strong generalization and superior performance in contour segmentation and reconstruction, particularly for densely clustered mushrooms in complex environments. The proposed approach achieves a 93.04% segmentation accuracy and a 98.13% successful segmentation rate, surpassing Mask RCNN and YOLACT by approximately 10%. The center point positioning accuracy of mushrooms is 0.3%. This method better meets the high positioning requirements for efficient and non-destructive picking of densely clustered mushrooms.
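For the "regular contour" branch, a least-squares ellipse fit to the mask boundary yields the cap outline and center. The OpenCV sketch below (OpenCV ≥ 4 assumed) illustrates only that fitting step on a synthetic mask, not the PR-SOLOv2 pipeline or the curvature-based contour classification.

```python
import cv2
import numpy as np

# Synthetic stand-in for one instance mask produced by the segmentation network.
mask = np.zeros((240, 320), dtype=np.uint8)
cv2.ellipse(mask, (160, 120), (70, 45), 25, 0, 360, 255, -1)

# Extract the outer contour of the mask and fit a least-squares ellipse to it.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
contour = max(contours, key=cv2.contourArea)
(cx, cy), (d1, d2), theta = cv2.fitEllipse(contour)   # center, axis lengths, angle

# The fitted center is the kind of picking point the reconstruction step targets.
print(f"center=({cx:.1f}, {cy:.1f}) axes=({d1:.1f}, {d2:.1f}) angle={theta:.1f}")
```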
Figure 1. Main problems in segmentation and localization of densely overlapping mushrooms.
Figure 2. Field image of mushroom image acquisition.
Figure 3. Datasets of different forms of mushrooms: (a) with scales on caps; (b) with brown spots on caps; (c) with soil on caps; (d) serious interleaving, clumping, and adhesion; (e) large differences in size and height; (f) adhesion and extrusion deformation between mushrooms; (g) heavily tilted to expose the stalk; (h) with mechanical damage on caps.
Figure 4. Datasets for different light sources: (a) top light source; (b) side light source; (c) dark light source.
Figure 5. SOLOv2 network structure.
Figure 6. Result of SOLOv2-segmented mushrooms: (a) ordinary aggregation; (b) complex polymerization.
Figure 7. PointRend network structure.
Figure 8. PR-SOLOv2 network structure.
Figure 9. Flow chart of high-precision contour reconstruction based on PR-SOLOv2.
Figure 10. Segmentation results of densely overlapping mushrooms: (a) original; (b) Mask RCNN; (c) YOLACT; (d) PR-SOLOv2.
Figure 11. Segmentation result of tilting mushrooms: (a) original; (b) Mask RCNN; (c) YOLACT; (d) PR-SOLOv2.
Figure 12. Segmentation result of tiny mushrooms: (a) original; (b) Mask RCNN; (c) YOLACT; (d) PR-SOLOv2.
Figure 13. Segmentation results under different light sources.
Figure 14. Reconstruction effect of overlapping extruded mushrooms: (a) PR-SOLOv2 mask; (b) extracted mask edges; (c) contour reconstruction.
Figure 15. Mushroom contour reconstruction effect.
Figure 16. Benchmarking process for determining mushroom centers: (a) original image; (b) marking image; (c) fitted contour map.
Figure 17. Comparative effect of fitting: (a) manually labeled contours; (b) contours reconstructed by this paper's method; (c) comparison result.
Figure 18. Comparison of contour reconstruction by this paper's method and manually labeled contours: (a) manually labeled contours; (b) contours reconstructed by this paper's method; (c) comparison result.
Figure 19. CDR plots of the four pictures.
16 pages, 8717 KiB  
Article
A Method for Extracting High-Resolution Building Height Information in Rural Areas Using GF-7 Data
by Mingbo Liu, Ping Wang, Kailong Hu, Changjun Gu, Shengyue Jin and Lu Chen
Sensors 2024, 24(18), 6076; https://doi.org/10.3390/s24186076 - 20 Sep 2024
Abstract
Building height is important information for disaster management and damage assessment. It is also a key parameter in studies such as population modeling and urbanization. Relatively few studies have extracted building height in rural areas using imagery from China's Gaofen-7 satellite (GF-7). In this study, we developed a method combining photogrammetry and deep learning to extract building height from GF-7 data in the rural area of Pingquan in northern China. The deep learning model DELaMa is proposed for digital surface model (DSM) editing based on the Large Mask Inpainting (LaMa) architecture; it not only preserves topographic details but also reasonably predicts the topography inside the building mask. The percentile value of the normalized digital surface model (nDSM) within the building footprint is taken as the building height. The extracted building heights in the study area are highly consistent with reference building heights measured from the ICESat-2 LiDAR point cloud, with an R2 of 0.83, an MAE of 1.81 m, and an RMSE of 2.13 m for all validation buildings. Overall, the proposed method helps to promote the use of satellite data in large-scale building height surveys, especially in rural areas.
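The height-extraction rule itself, taking a percentile of the normalized DSM inside each building footprint, is straightforward to express. The numpy sketch below uses synthetic rasters, and the 90th percentile is an assumed choice since the listing does not state which percentile the authors used.

```python
import numpy as np

def building_heights(dsm, dtm, footprint_labels, percentile=90):
    """Per-building height = chosen percentile of the nDSM (DSM minus edited terrain)
    inside each footprint. footprint_labels is an integer raster, 0 = background.
    The percentile value is an assumption; the paper's exact choice is not given here."""
    ndsm = dsm - dtm
    heights = {}
    for label in np.unique(footprint_labels):
        if label == 0:
            continue
        heights[int(label)] = float(np.percentile(ndsm[footprint_labels == label], percentile))
    return heights

# Synthetic 100 x 100 rasters: gently sloping terrain plus 6 m and 9 m "buildings".
x = np.linspace(0, 5, 100)
dtm = np.tile(x, (100, 1))                    # edited, building-free terrain surface
dsm = dtm.copy()
labels = np.zeros((100, 100), dtype=int)
dsm[10:30, 10:30] += 6.0; labels[10:30, 10:30] = 1
dsm[50:80, 40:70] += 9.0; labels[50:80, 40:70] = 2
print(building_heights(dsm, dtm, labels))     # approximately {1: 6.0, 2: 9.0}
```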
Figure 1. The study area: (a) extent of the study area and building footprints, overlaid on the Copernicus DEM hillshade layer; (b) location of the study area; and (c) demonstration of building types in the study area.
Figure 2. Workflow of the building height extraction. Important intermediate products are marked with a darker background.
Figure 3. The color encoding module: (a) two-stage color encoding; (b) RGB components of CMRMAP; (c) CMRMAP, digital surface model (DSM), and digital terrain model (DTM) in RGB color space.
Figure 4. The architecture of Large Mask Inpainting (LaMa) [40].
Figure 5. Comparisons of different DSM editing methods, presented as multidirectional hillshade to show details: (a) original GF-7 DSM; (d) building mask, derived from the building footprints; (b) filled in using the Copernicus DEM; (c) plane fitting; (e) terrain filter; and (f) our method, DELaMa.
Figure 6. The building height map of the study area, aggregated to 50 m ground sampling distance (GSD) for visualization.
Figure 7. Examples of building height extraction. The first column is the GF-7 multispectral image, the second column is the GF-7 DSM, the third column is the normalized digital surface model (nDSM), and the fourth column is the extracted building height.
Figure 8. Scatter plot of the building heights from GF-7 against the building heights from ICESat-2.
Figure 9. Quality control and inspection of the validation process.
Figure 10. Demonstrations of the impact of DSM editing methods on building height extraction in rugged terrain. The topographic data are presented as multidirectional hillshade to show details: (a) images of Zone 1 and Zone 2; (b) GF-7 DSM; (c) Copernicus DEM filling; (d) plane fitting; (e) terrain filter; and (f) DELaMa.
12 pages, 1465 KiB  
Article
American Football Headgear Impairs Visuomotor Drill Performance in Division I NCAA Football Athletes
by Christopher G. Ballmann and Rebecca R. Rogers
J. Funct. Morphol. Kinesiol. 2024, 9(3), 169; https://doi.org/10.3390/jfmk9030169 - 18 Sep 2024
Abstract
Background/Objectives: Previous evidence has shown that American football headgear (e.g., facemasks, visors/eye shields) differentially impairs reaction time (RT) to visual stimuli, most notably in peripheral fields of view. However, this has only been established with stationary RT testing, which may not translate to gameplay situations that require gross motor skills. Therefore, the purpose of this study was to build upon previous findings and elucidate the effects of various American football headgear on gross motor visuomotor drill performance. Methods: Division 1 NCAA football players (n = 16) with normal/corrected-to-normal vision completed two experiments (EXP), each with differing conditions: EXP1, varying facemask reinforcement, and EXP2, varying visor/eye shield light transmittance. In EXP1, participants completed an agility test for the following conditions: baseline/no helmet (BL), helmet + light (HL), helmet + medium (HM), and helmet + heavy (HH) facemask reinforcement. In EXP2, participants completed an agility test for the following conditions: baseline/no helmet (BL), helmet + clear visor (HCV), helmet + smoke-tinted visor (HSV), and helmet + mirrored visor (HMV). For each condition in EXP1 and EXP2, participants completed a reactive agility task using a FITLIGHT trainer system in which five poles were equipped with a total of ten LED sensors and placed in a semi-circle 1 m around a center point. Participants were asked to step and reach with their hands to hit each of the ten lights individually as fast as possible upon illumination. Each reactive agility test was repeated for a total of three attempts. Results: Average reaction time was analyzed and compared between conditions and according to visual fields of interest (e.g., central vs. peripheral). Results from EXP1 showed that, compared to BL, reactive agility was worsened by the HL (p = 0.030), HM (p = 0.034), and HH (p = 0.003) conditions. No differences between facemask conditions existed for overall performance (p > 0.05). For EXP2, the HCV (p < 0.001), HSV (p < 0.001), and HMV (p < 0.001) conditions resulted in worsened reactive agility performance compared to BL. No differences between visor conditions existed for overall performance (p > 0.05). Conclusions: Overall, these findings suggest that American football headgear impairs reactive agility, which could result in worsened game performance and safety. Future studies investigating training strategies to overcome these impairments are warranted.
(This article belongs to the Special Issue Advances in Physiology of Training)
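The reported p-values come from within-subject comparisons of each headgear condition against the no-helmet baseline. The sketch below shows one common way to run such an analysis (repeated-measures ANOVA plus paired t-tests) on synthetic data; it is not a reproduction of the paper's statistical procedure, and the reaction-time numbers are made up.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(7)
conditions = ["BL", "HL", "HM", "HH"]
slowdown = {"BL": 0, "HL": 25, "HM": 27, "HH": 35}    # made-up mean slowdowns (ms)
rows = [{"id": s, "condition": c, "rt": 550 + slowdown[c] + rng.normal(0, 20)}
        for s in range(16) for c in conditions]
df = pd.DataFrame(rows)

# Omnibus repeated-measures ANOVA across the four headgear conditions.
res = AnovaRM(data=df, depvar="rt", subject="id", within=["condition"]).fit()
print(res.anova_table)

# Paired comparisons of each helmet condition against the no-helmet baseline.
bl = df[df.condition == "BL"].sort_values("id").rt.to_numpy()
for c in ["HL", "HM", "HH"]:
    cond = df[df.condition == c].sort_values("id").rt.to_numpy()
    t, p = stats.ttest_rel(cond, bl)
    print(f"{c} vs BL: t = {t:.2f}, p = {p:.4f}")
```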
Figure 1. Diagram of the dynamic visuomotor test. Participants stood at a centralized starting point and were asked to step and deactivate LED timing sensors situated on poles in a semi-circle as quickly as possible. Upper sensors were 85 cm from the floor, lower sensors 42 cm. Each of the 10 lights illuminated in a random order a total of 2 times (20 total attempts). Participants were given 1 s to return to the starting point after each sensor deactivation. The test was completed a total of 3 times with 3 min of rest in between.
Figure 2. Headgear used in EXP1: (a) Schutt™ Custom Vengeance Pro helmet, (b) light-reinforced facemask (V-ROPO-TRAD), (c) medium-reinforced facemask (V-ROPO-SW-TRAD), and (d) heavy-reinforced facemask (VR-JOP-DW-TRAD).
Figure 3. Headgear used in EXP2: (a) Schutt™ Custom Vengeance Pro helmet, (b) light-reinforced facemask (V-ROPO-TRAD) depicted with visor, (c) Elitetek clear football visor (90%+ visual light transmittance; VLT), (d) Elitetek smoke-tinted football visor (48% VLT), and (e) Elitetek mirror-tinted football visor (28% VLT). Note: lower VLT values indicate less passage of light through the visor.
Figure 4. Response times between baseline (BL; white), helmet + light (HL; black), helmet + medium (HM; gray), and helmet + heavy (HH; green) conditions. (a) Overall response times (ms) over the drill. (b) Response times (ms) for each condition according to the sensor position. (c) Response times (ms) for each condition according to the sensor level. * indicates significantly different from BL (p < 0.05). # indicates significantly different from mid (p < 0.05). $ indicates significantly different from outer (p < 0.05). † indicates significantly different from lower (p < 0.05).
Figure 5. Response times between baseline (BL; white), helmet + clear visor (HCV; stripe), helmet + smoke visor (HSV; gray), and helmet + mirrored visor (HMV; gold) conditions. (a) Overall response times (ms) over the drill. (b) Response times (ms) for each condition according to the sensor position. (c) Response times (ms) for each condition according to the sensor level. * indicates significantly different from BL (p < 0.05). # indicates significantly different from central (p < 0.05). $ indicates significantly different from outer (p < 0.05). † indicates significantly different from lower (p < 0.05).
11 pages, 209 KiB  
Article
Making Waves: Fanon, Phenomenology, and the Sonic
by Michael J. Monahan
Philosophies 2024, 9(5), 145; https://doi.org/10.3390/philosophies9050145 - 12 Sep 2024
Abstract
Frantz Fanon's Black Skin, White Masks opens with a discussion of language in the colonial setting. I argue that this is at least in part due to Fanon's background in phenomenology and the crucial role that intersubjectivity plays in the phenomenological account of the subject. I begin by demonstrating the phenomenological underpinnings of Fanon's chapter on language. I then further develop the background phenomenological account of the subject, showing how it informs Fanon's project. Next, I develop a sonic account of the subject, arguing that metaphors of sound best represent the phenomenological account of the subject. Finally, I build on this sonic account to draw out the implications for our thinking about communication and liberation in Fanon's work and beyond.
(This article belongs to the Special Issue Communicative Philosophy)
34 pages, 2908 KiB  
Article
A Hybrid Contrast and Texture Masking Model to Boost High Efficiency Video Coding Perceptual Rate-Distortion Performance
by Javier Ruiz Atencia, Otoniel López-Granado, Manuel Pérez Malumbres, Miguel Martínez-Rach, Damian Ruiz Coll, Gerardo Fernández Escribano and Glenn Van Wallendael
Electronics 2024, 13(16), 3341; https://doi.org/10.3390/electronics13163341 - 22 Aug 2024
Abstract
As most videos are destined for human perception, many techniques have been designed to improve video coding based on how the human visual system perceives video quality. In this paper, we propose the use of two perceptual coding techniques, contrast masking and texture masking, jointly operating under the High Efficiency Video Coding (HEVC) standard. These techniques aim to improve the subjective quality of the reconstructed video at the same bit rate. For contrast masking, we propose a dedicated weighting matrix for each block size (from 4×4 up to 32×32), unlike the HEVC standard, which only defines an 8×8 weighting matrix that is upscaled to build the 16×16 and 32×32 weighting matrices (a 4×4 weighting matrix is not supported). Our approach achieves average Bjøntegaard Delta-Rate (BD-rate) gains of between 2.5% and 4.48%, depending on the perceptual metric and coding mode used. In addition, we propose a novel texture masking scheme based on the classification of each coding unit, which applies an over-quantization depending on the coding unit's texture level. For each coding unit, its mean directional variance features are computed to feed a support vector machine model that predicts the texture type (plain, edge, or texture). According to this classification, the block's energy, the type of coding unit, and its size, an over-quantization value is computed as a QP offset (DQP) to be applied to the coding unit. By applying both techniques in the HEVC reference software, an overall average BD-rate gain of 5.79% is achieved, proving their complementarity.
(This article belongs to the Special Issue Recent Advances in Image/Video Compression and Coding)
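The contrast-masking side, deriving a dedicated quantization weighting matrix for each transform size from a contrast sensitivity function, can be sketched as follows. The CSF below is the classic Mannos-Sakrison form used as a stand-in for the paper's Equation (1), and the frequency mapping, the 8 cpd flattening point, and the base weight of 16 are assumptions, so the values will not match the proposed matrices.

```python
import numpy as np

def csf(f):
    """Mannos-Sakrison contrast sensitivity function, used here as a stand-in
    for the paper's Equation (1); f is spatial frequency in cycles/degree."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def weighting_matrix(n, max_cpd=32.0, peak_cpd=8.0, base_weight=16.0):
    """Build an n x n quantization weighting matrix from the CSF: the less sensitive
    the eye is to a coefficient's frequency, the larger its weight (coarser quantization).
    The frequency mapping, peak, and base weight are illustrative assumptions."""
    u = np.arange(n)
    f = np.sqrt(u[:, None] ** 2 + u[None, :] ** 2) / (2 * n) * max_cpd
    s = csf(f)
    # "Flattened" CSF (cf. the paper's Figure 2): saturate frequencies below the peak
    # so low-frequency coefficients are not over-penalized.
    s = np.where(f < peak_cpd, s.max(), s)
    return np.round(base_weight * s.max() / s).astype(int)

# One dedicated matrix per transform size, including the 4x4 case HEVC lacks.
for n in (4, 8, 16, 32):
    print(f"{n}x{n}:\n{weighting_matrix(n)}\n")
```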
Figure 1. Default HEVC quantization weighting matrices.
Figure 2. Contrast sensitivity function. The red curve represents the original CSF as defined by Equation (1), while the blue dashed curve represents the flattened CSF, with spatial frequencies below the peak sensitivity saturated.
Figure 3. Proposed 4 × 4 quantization weighting matrices for intra- and interprediction modes.
Figure 4. Rate-distortion curves comparing our proposed CSF with the default implemented in the HEVC standard using different perceptual metrics. (a,b) correspond to the BQTerrace sequence of class B, while (c,d) correspond to the ChinaSpeed sequence of class F.
Figure 5. Samples of manually classified blocks (left-hand side) and their associated polar diagrams of the MDV metric (right-hand side). From top to bottom: 8 × 8, 16 × 16, and 32 × 32 block sizes; from left to right: plain, edge, and texture blocks.
Figure 6. (a) Scatter plot of manually classified 16 × 16 blocks (training dataset), and (b) the classification results provided by the trained SVM model (testing dataset).
Figure 7. Example of block classification for the first frame of the BasketballDrill sequence, using optimal SVM models for each block size.
Figure 8. Box and whisker plot of the block energy (ε) distribution by size and texture classification.
Figure 9. Representation of Equation (6) for two sets of function parameters: (red) MinE1, MaxE1, and MaxQStep1; (blue) MinE2, MaxE2, and MaxQStep2. ΔQStep(i,j) is different for each set.
Figure 10. Flowchart of candidate selection for brute-force analysis of perceptually optimal parameters. The Ps in the energy range boxes refer to percentiles.
Figure 11. BD-rate curves (MS-SSIM metric) for the PeopleOnStreet video test sequence over the MaxQStep parameter when modifying texture blocks of size 8. Each curve represents a different block energy range (MinE and MaxE).
Figure 12. Rate-distortion curves of the first frame of the BQSquare sequence, comparing our proposed contrast masking (red line) and contrast and texture masking (yellow line) with the HM reference coding (blue line), using the (a) SSIM, (b) MS-SSIM, and (c) PSNR-HVS-M perceptual metrics.
Figure 13. Visual comparison of the first frame of the BQSquare sequence encoded at QP = 22: (a) HM reference-encoded frame; (b) frame encoded with contrast and texture masking.
Figures A1–A24. Test sequences: Traffic (2560×1600, 30 fps, Class A); PeopleOnStreet (2560×1600, 30 fps, Class A); NebutaFestival (2560×1600, 60 fps, Class A); SteamLocomotiveTrain (2560×1600, 60 fps, Class A); Kimono (1920×1080, 24 fps, Class B); ParkScene (1920×1080, 24 fps, Class B); Cactus (1920×1080, 50 fps, Class B); BQTerrace (1920×1080, 60 fps, Class B); BasketballDrive (1920×1080, 50 fps, Class B); RaceHorses (832×480, 30 fps, Class C); BQMall (832×480, 60 fps, Class C); PartyScene (832×480, 50 fps, Class C); BasketballDrill (832×480, 50 fps, Class C); RaceHorses (416×240, 30 fps, Class D); BQSquare (416×240, 60 fps, Class D); BlowingBubbles (416×240, 50 fps, Class D); BasketballPass (416×240, 50 fps, Class D); FourPeople (1280×720, 60 fps, Class E); Johnny (1280×720, 60 fps, Class E); KristenAndSara (1280×720, 60 fps, Class E); BasketballDrillText (832×480, 50 fps, Class F); ChinaSpeed (1024×768, 30 fps, Class F); SlideEditing (1280×720, 30 fps, Class F); SlideShow (1280×720, 20 fps, Class F).
7 pages, 1581 KiB  
Short Note
Bis(3-(((4-methoxybenzyl)oxy)methyl)-5,6-dihydro-1,4-dithiin-2-yl)methanol
by Anna Esposito and Annalisa Guaragna
Molbank 2024, 2024(3), M1867; https://doi.org/10.3390/M1867 - 14 Aug 2024
Viewed by 607
Abstract
An organolithium reagent containing a 5,6-dihydro-1,4-dithiin moiety has been used herein as a homologating agent to build up a fully protected divinylcarbinol by two different synthetic procedures, one based on a step-by-step approach and the other on a tandem process. The resulting molecule contains two double bonds [...] Read more.
An organolithium reagent containing a 5,6-dihydro-1,4-dithiin moiety has been used herein as a homologating agent to build up a fully protected divinylcarbinol by two different synthetic procedures, one based on a step-by-step approach and the other on a tandem process. The resulting molecule contains two double bonds masked by two dithiodimethylene bridges that can be stereoselectively removed to give an E,E- or Z,Z-configured divinylcarbinol. These products could then be conveniently functionalized, for example, with hydroxyl or amino functions, for the construction of the skeleton of more complex systems. Full article
(This article belongs to the Collection Molecules from Side Reactions)
Show Figures

Figure 1

Figure 1
<p>Synthetic applications of dhdt-2-PMBOM (<b>1</b>) in the construction of bio- and glycomimetics.</p>
Full article ">Scheme 1
<p>Simplified representation of the synthetic path for the construction of five- and six-member rings, starting from the coupling products obtained by homologation of lithiated carbanion <b>2</b> with different electrophiles.</p>
Full article ">Scheme 2
<p>Synthetic path to <span class="html-italic">O</span>-protected divinylcarbinol <b>5</b> by a two-step procedure.</p>
Full article ">Scheme 3
<p>Synthetic path to <span class="html-italic">O</span>-protected divinylcarbinol <b>5</b> by a tandem process.</p>
Full article ">Scheme 4
<p>Desulfurization reaction to bis(<span class="html-italic">cis</span>- and <span class="html-italic">trans</span>-configured) <span class="html-italic">O</span>-protected divinylcarbinols <b>7</b> and <b>8</b>.</p>
Full article ">
16 pages, 2210 KiB  
Article
Long 3D-POT: A Long-Term 3D Drosophila-Tracking Method for Position and Orientation with Self-Attention Weighted Particle Filters
by Chengkai Yin, Xiang Liu, Xing Zhang, Shuohong Wang and Haifeng Su
Appl. Sci. 2024, 14(14), 6047; https://doi.org/10.3390/app14146047 - 11 Jul 2024
Viewed by 796
Abstract
The study of the intricate flight patterns and behaviors of swarm insects, such as drosophilas, has long been a subject of interest in both the biological and computational realms. Tracking drosophilas is an essential and indispensable method for researching drosophilas’ behaviors. Still, it [...] Read more.
The study of the intricate flight patterns and behaviors of swarm insects, such as drosophilas, has long been a subject of interest in both the biological and computational realms. Tracking drosophilas is an essential and indispensable method for researching drosophilas’ behaviors. Still, it remains a challenging task due to the highly dynamic nature of these drosophilas and their partial occlusion in multi-target environments. To address these challenges, particularly in environments where multiple targets (drosophilas) interact and overlap, we have developed a long-term Trajectory 3D Position and Orientation Tracking Method (Long 3D-POT) that combines deep learning with particle filtering. Our approach employs a detection model based on an improved Mask-RCNN to accurately detect the position and state of drosophilas from frames, even when they are partially occluded. Following detection, improved particle filtering is used to predict and update the motion of the drosophilas. To further enhance accuracy, we have introduced a prediction module based on the self-attention backbone that predicts the drosophila’s next state and updates the particles’ weights accordingly. Compared with previous methods by Ameni, Cheng, and Wang, our method has demonstrated a higher degree of accuracy and robustness in tracking the long-term trajectories of drosophilas, even those that are partially occluded. Specifically, Ameni employs the Interacting Multiple Model (IMM) combined with the Global Nearest Neighbor (GNN) assignment algorithm, primarily designed for tracking larger, more predictable targets like aircraft, which tends to perform poorly with small, fast-moving objects like drosophilas. The method by Cheng integrates particle filtering with LSTM networks to predict particle weights, enhancing trajectory prediction under kinetic uncertainties. Wang’s approach builds on Cheng’s by incorporating an estimation of the orientation of drosophilas in order to refine tracking further. Compared with these methods, ours achieves higher detection accuracy, improving the F1 score by more than 10%, and tracks more long-term trajectories, demonstrating greater stability. Full article
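The abstract describes a particle filter whose weights are updated from a self-attention prediction of each drosophila's next state. The sketch below is only a schematic NumPy illustration of that weighting step, using a constant-velocity stand-in for the learned predictor and a Gaussian likelihood around its output; the noise scales and the `predict_next_state` helper are assumptions for illustration, not the authors' Long 3D-POT implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def predict_next_state(track_history: np.ndarray) -> np.ndarray:
    """Stand-in for the self-attention predictor: a constant-velocity guess."""
    return track_history[-1] + (track_history[-1] - track_history[-2])

def pf_step(particles: np.ndarray, weights: np.ndarray,
            track_history: np.ndarray, proc_std=0.5, meas_std=1.0):
    """One predict/update/resample cycle of a particle filter over 3D positions."""
    # Predict: propagate particles with process noise.
    particles = particles + rng.normal(0.0, proc_std, particles.shape)
    # Update: weight particles by their agreement with the predicted next state.
    predicted = predict_next_state(track_history)
    d2 = np.sum((particles - predicted) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / meas_std ** 2)
    weights /= weights.sum() + 1e-12
    # Resample proportionally to the updated weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy usage: 200 particles tracking one fly moving roughly along +x.
history = np.array([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0]])
particles = rng.normal(history[-1], 1.0, size=(200, 3))
weights = np.full(200, 1.0 / 200)
particles, weights = pf_step(particles, weights, history)
print(particles.mean(axis=0))  # estimate of the fly's next 3D position
```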
(This article belongs to the Special Issue Evolutionary Computation Meets Deep Learning)
Show Figures

Figure 1

Figure 1
<p>The general flowchart of our method. Each color in the resulting trajectories refers to the trajectory of one drosophila.</p>
Full article ">Figure 2
<p>Visual comparison between an input frame and its subtracted frame. (<b>a</b>) Input frame; (<b>b</b>) after the subtraction using MOG2.</p>
Full article ">Figure 3
<p>This figure shows a series of drosophila objects detected by our method.</p>
Full article ">Figure 4
<p>The estimation of a detected drosophila object. The red line refers to the 2D orientation of the object.</p>
Full article ">Figure 5
<p>The comparison of detection performance.</p>
Full article ">Figure 6
<p>The comparison of the distribution of trajectory lengths obtained using the different methods.</p>
Full article ">Figure 7
<p>The visualizations of two sets of long trajectories. Each color refers to a trajectory of a drosophila. (<b>a</b>) This figure presents the first set of 20 long-distance trajectories as viewed from one angle. The trajectories are plotted in a 3D coordinate system, showcasing the intricate paths taken by the drosophilas over an extended period. (<b>b</b>) Offering a different perspective, this figure displays the same set of trajectories from another angle. This alternate view further emphasizes the accuracy of our tracking system in capturing the 3D dynamics of fruit fly movement. (<b>c</b>) Similar to the first set, we visualized another group of 20 long-distance trajectories. This figure presents these trajectories from one angle, highlighting the consistency and continuity of our tracking results. (<b>d</b>) Providing a complementary perspective, this figure shows the second set of trajectories from a different angle. The variation in viewing angles helps us appreciate the three-dimensional nature of the trajectories and the effectiveness of our tracking approach.</p>
Full article ">
20 pages, 17993 KiB  
Article
Semantic 3D Reconstruction for Volumetric Modeling of Defects in Construction Sites
by Dimitrios Katsatos, Paschalis Charalampous, Patrick Schmidt, Ioannis Kostavelis, Dimitrios Giakoumis, Lazaros Nalpantidis and Dimitrios Tzovaras
Robotics 2024, 13(7), 102; https://doi.org/10.3390/robotics13070102 - 11 Jul 2024
Viewed by 1169
Abstract
The appearance of construction defects in buildings can arise from a variety of factors, ranging from issues during the design and construction phases to problems that develop over time with the lifecycle of a building. These defects require repairs, often in the context [...] Read more.
The appearance of construction defects in buildings can arise from a variety of factors, ranging from issues during the design and construction phases to problems that develop over time with the lifecycle of a building. These defects require repairs, often in the context of a significant shortage of skilled labor. In addition, such work is often physically demanding and carried out in hazardous environments. Consequently, adopting autonomous robotic systems in the construction industry becomes essential, as they can relieve labor shortages, promote safety, and enhance the quality and efficiency of repair and maintenance tasks. To this end, the present study introduces an end-to-end framework for automating shotcreting tasks in cases where construction or repair actions are required. The proposed system can scan a construction scene using a stereo-vision camera mounted on a robotic platform, identify regions of defects, and reconstruct a 3D model of these areas. Furthermore, it automatically calculates the required 3D volumes to be constructed to treat a detected defect. To provide the above-mentioned capabilities, the developed software framework employs semantic segmentation modules based on YOLOv8m-seg and SiamMask and 3D reconstruction modules based on InfiniTAM and RTAB-Map. In addition, the segmented 3D regions are processed by the volumetric modeling component, which determines the amount of concrete needed to fill the defects. It generates the exact 3D model required to repair the investigated defect. Finally, the precision and effectiveness of the proposed pipeline are evaluated in actual construction site scenarios, featuring reinforcement bars as defective areas. Full article
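The framework crops the sensor depth image with the binary defect mask and then computes the concrete volume needed to fill the region (see Figure 5 further down this entry). The following sketch shows only the core geometric idea under simplifying assumptions: masked depth pixels are back-projected with pinhole intrinsics, and their depth deficit relative to an assumed flat reference surface is integrated. Every function name, the plane choice, and the footprint formula are illustrative assumptions, not the paper's volumetric modeling component.

```python
import numpy as np

def backproject(depth: np.ndarray, mask: np.ndarray, fx, fy, cx, cy) -> np.ndarray:
    """Back-project masked depth pixels (in meters) to camera-frame 3D points."""
    v, u = np.nonzero(mask)            # pixel rows/cols inside the defect mask
    z = depth[v, u]
    valid = z > 0                      # drop missing depth readings
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack([x, y, z])

def missing_volume(points: np.ndarray, plane_z: float, pixel_area_at_1m: float) -> float:
    """Crude volume estimate: sum per-pixel depth deficit behind a reference plane.

    plane_z approximates the intact surface; each masked pixel contributes a small
    column whose footprint scales with depth (pixel_area_at_1m * z**2).
    """
    deficit = np.clip(points[:, 2] - plane_z, 0.0, None)   # depth behind the surface
    footprint = pixel_area_at_1m * points[:, 2] ** 2       # per-pixel area in m^2
    return float(np.sum(deficit * footprint))              # cubic meters

# Toy usage: a synthetic 10x10 depth patch that is 2 cm deeper inside the mask.
depth = np.full((10, 10), 1.00)
mask = np.zeros((10, 10), dtype=bool)
mask[3:7, 3:7] = True
depth[mask] = 1.02
pts = backproject(depth, mask, fx=600.0, fy=600.0, cx=5.0, cy=5.0)
print(missing_volume(pts, plane_z=1.00, pixel_area_at_1m=(1 / 600.0) ** 2))
```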
(This article belongs to the Special Issue Localization and 3D Mapping of Intelligent Robotics)
Show Figures

Figure 1

Figure 1
<p>In (<b>a,c</b>), the construction of tunnels is depicted, while (<b>b</b>) shows the construction of ground support walls. All images feature exposed reinforcement bars and highlight the labor-intensive process of spraying concrete (shotcreting) to fill the surface [<a href="#B9-robotics-13-00102" class="html-bibr">9</a>].</p>
Full article ">Figure 2
<p>Workflow of the volumetric modeling approach.</p>
Full article ">Figure 3
<p>(<b>a</b>) Main concept of the missing volume computation; (<b>b</b>) typical projected mesh of a defected region; and (<b>c</b>) generated 3D model repairing the defect.</p>
Full article ">Figure 4
<p>(<b>a</b>) Theoretical point cloud construction methodology and (<b>b</b>) Typical computed 3D model and its corresponding point cloud representation.</p>
Full article ">Figure 5
<p>(<b>a</b>) Cyan-labeled mask area in RGB encoding. (<b>b</b>) Corresponding mask area in binary format. (<b>c</b>) Sensor’s raw depth image. (<b>d</b>) Cropped depth image based on binary mask.</p>
Full article ">Figure 6
<p>In the outdoor scene in Greece, the left image shows Testbed 01-a featuring a replica rebar structure, while the right image depicts Testbed 01-b with actual reinforcement bars used in building construction. Both rebars are installed in a custom wooden frame.</p>
Full article ">Figure 7
<p>The semi-indoor scene in Denmark serves as Testbed 02, situated within the construction site. Here, wooden frames with exposed reinforcement bars are prepared for shotcreting.</p>
Full article ">Figure 8
<p>Testbed 03 is a semi-indoor scene, where the surface presents various flaws requiring proper treatment.</p>
Full article ">Figure 9
<p>On the left: Robotnik Summit XL platform equipped with a Roboception RC-Visard 160 stereo camera mounted on the side. On the right: real-time testing of the integrated system during the experimental session in Testbed 01-a.</p>
Full article ">Figure 10
<p>Three-dimensional views of semantic 3D reconstruction. <b>Top</b> sequence: results for Testbed 01-a (<b>left</b>) and Testbed 01-b (<b>right</b>). <b>Bottom</b> sequence: results for Testbed 02.</p>
Full article ">Figure 11
<p>(<b>a</b>) Projected mesh of the investigated cases and (<b>b</b>) registered 3D generated models on the input scenes.</p>
Full article ">Figure 12
<p>(<b>a</b>) Computation of the projected mesh in each defected region and (<b>b</b>) generation of the appropriate 3D model on the scene with multiple defects.</p>
Full article ">Figure 13
<p>Registration and alignment of the generated point clouds in the 3D reconstructed scene.</p>
Full article ">Figure 14
<p>Comparison between the automatically computed 3D models and the theoretical ones for (<b>a</b>) Testbed 01-b, (<b>b</b>) Testbed 01-a, (<b>c</b>) Testbed 02 (LR) and Testbed 02 (RR), (<b>d</b>) Testbed 03.</p>
Full article ">
21 pages, 10870 KiB  
Article
An Improved Instance Segmentation Method for Fast Assessment of Damaged Buildings Based on Post-Earthquake UAV Images
by Ran Zou, Jun Liu, Haiyan Pan, Delong Tang and Ruyan Zhou
Sensors 2024, 24(13), 4371; https://doi.org/10.3390/s24134371 - 5 Jul 2024
Cited by 2 | Viewed by 1175
Abstract
Quickly and accurately assessing the damage level of buildings is a challenging task for post-disaster emergency response. Most of the existing research mainly adopts semantic segmentation and object detection methods, which have yielded good results. However, for high-resolution Unmanned Aerial Vehicle (UAV) imagery, [...] Read more.
Quickly and accurately assessing the damage level of buildings is a challenging task for post-disaster emergency response. Most of the existing research mainly adopts semantic segmentation and object detection methods, which have yielded good results. However, for high-resolution Unmanned Aerial Vehicle (UAV) imagery, these methods may assign multiple damage categories within a single building and fail to accurately extract building edges, thus hindering post-disaster rescue and fine-grained assessment. To address this issue, we proposed an improved instance segmentation model that enhances classification accuracy by incorporating a Mixed Local Channel Attention (MLCA) mechanism in the backbone and improves small-object segmentation accuracy by refining the Neck part. The method was tested on UAV images of the Yangbi earthquake. The experimental results indicated that the modified model outperformed the original model by 1.07% and 1.11% in the two mean Average Precision (mAP) evaluation metrics, mAPbbox50 and mAPseg50, respectively. Importantly, the classification accuracy of the intact category was improved by 2.73% and 2.73%, respectively, while the collapse category saw an improvement of 2.58% and 2.14%. In addition, the proposed method was compared with state-of-the-art instance segmentation models, e.g., Mask-R-CNN and YOLO V9-Seg. The results demonstrated that the proposed model exhibits advantages in both accuracy and efficiency. Specifically, the proposed model runs three times faster than other models of similar accuracy. The proposed method can provide a valuable solution for fine-grained building damage evaluation. Full article
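The reported mAPbbox50 and mAPseg50 values are average precision at an IoU threshold of 0.5, computed per damage class and then averaged. The sketch below shows how AP at IoU ≥ 0.5 is commonly computed for a single class of axis-aligned boxes from score-sorted detections; it is a generic illustration of the metric with a greedy matching rule assumed, not the authors' evaluation code.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def ap50(detections, gt_boxes, thr=0.5):
    """AP at IoU >= thr for one class; detections are (score, box) pairs,
    greedily matched to ground truth in descending score order."""
    detections = sorted(detections, key=lambda d: -d[0])
    matched = set()
    tp, fp = np.zeros(len(detections)), np.zeros(len(detections))
    for i, (_, box) in enumerate(detections):
        ious = [iou(box, g) for g in gt_boxes]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= thr and j not in matched:
            tp[i] = 1.0
            matched.add(j)
        else:
            fp[i] = 1.0
    rec = np.cumsum(tp) / max(len(gt_boxes), 1)
    prec = np.cumsum(tp) / np.maximum(np.cumsum(tp) + np.cumsum(fp), 1e-12)
    mrec = np.concatenate(([0.0], rec, [1.0]))
    mpre = np.concatenate(([0.0], prec, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]   # precision envelope
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# Toy usage: two ground-truth buildings, three scored detections.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (1, 1, 10, 10)), (0.8, (21, 19, 30, 31)), (0.3, (50, 50, 60, 60))]
print(ap50(dets, gts))
```

The class-wise APs for the intact, slight, severe, and collapse categories would then be averaged to obtain the mAP figures quoted above.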
Show Figures

Figure 1

Figure 1
<p>Refer to the legend in Figure 3c for the following categories: green represents the intact category, yellow represents the slight category, orange represents the severe category, and red represents the collapse category. Comparing results from different research methods. (<b>a</b>) Label image, real damaged area, and damage category of the building. (<b>b</b>) Dividing the image into multiple sub-images and performing classification on each sub-image. (<b>c</b>) Object detection, detecting the location of buildings, marking them with rectangles, and classifying the degree of damage. (<b>d</b>) Semantic segmentation, classifying each pixel in the image to obtain an overall evaluation. (<b>e</b>) Instance segmentation, locating each building and classifying its degree of damage.</p>
Full article ">Figure 2
<p>Research area, vector data source: <a href="https://geo.datav.aliyun.com/areas_v3/bound/530000_full.json" target="_blank">https://geo.datav.aliyun.com/areas_v3/bound/530000_full.json</a> (accessed on 6 June 2024) and <a href="https://geo.datav.aliyun.com/areas_v3/bound/532922.json" target="_blank">https://geo.datav.aliyun.com/areas_v3/bound/532922.json</a> (accessed on 6 June 2024).</p>
Full article ">Figure 3
<p>(<b>a</b>) Label annotation, depicting the edges of the building. (<b>b</b>) Visualization results, marking the damage status of the building. (<b>c</b>) Legend of the visualization results, representing different degrees of damage to the building.</p>
Full article ">Figure 4
<p>Data augmentation processes. (<b>a</b>) Original image, (<b>b</b>) brightness −20%, (<b>c</b>) brightness +20%, (<b>d</b>) rotate right 90°, (<b>e</b>) rotate right 180°, (<b>f</b>) rotate right 270°.</p>
Full article ">Figure 5
<p>The overall structure of the improved YOLOv5-Seg model. The red rectangular box marks the improvement introduced in this paper.</p>
Full article ">Figure 6
<p>Detailed composition of modules in the improved YOLOv5-Seg model.</p>
Full article ">Figure 7
<p>Detailed information about MLCA (Mixed Local Channel Attention).</p>
Full article ">Figure 8
<p>Scale sequence feature fusion (SSFF) module details.</p>
Full article ">Figure 9
<p>Triple feature encoding (TFE) module details, used for aggregating multi-scale features.</p>
Full article ">Figure 10
<p>Channel and Position Attention Mechanism (CPAM) module details.</p>
Full article ">Figure 11
<p>All experimental results were compared in terms of mAPbbox50 and mAPseg50 accuracy, parameters, and inference time between different models.</p>
Full article ">Figure 12
<p>Refer to the legend in <a href="#sensors-24-04371-f003" class="html-fig">Figure 3</a>c for the following categories: green represents the intact category, yellow represents the slight category, orange represents the severe category, and red represents the collapse category. Visualization results comparing the differences between the actual and predicted results of several state-of-the-art models. The name of each model is shown at the top.</p>
Full article ">