Search Results (244)

Search Parameters:
Keywords = CUDA

18 pages, 1757 KiB  
Article
End-to-End Deployment of Winograd-Based DNNs on Edge GPU
by Pierpaolo Mori, Mohammad Shanur Rahman, Lukas Frickenstein, Shambhavi Balamuthu Sampath, Moritz Thoma, Nael Fasfous, Manoj Rohit Vemparala, Alexander Frickenstein, Walter Stechele and Claudio Passerone
Electronics 2024, 13(22), 4538; https://doi.org/10.3390/electronics13224538 - 19 Nov 2024
Viewed by 269
Abstract
The Winograd algorithm reduces the computational complexity of convolutional neural networks (CNNs) by minimizing the number of multiplications required for convolutions, making it particularly suitable for resource-constrained edge devices. Concurrently, most edge hardware accelerators utilize 8-bit integer arithmetic to enhance energy efficiency and reduce inference latency, requiring the quantization of CNNs before deployment. Combining Winograd-based convolution with quantization offers the potential for both performance acceleration and reduced energy consumption. However, prior research has identified significant challenges in this combination, particularly due to numerical instability and substantial accuracy degradation caused by the transformations required in the Winograd domain, making the two techniques incompatible on edge hardware. In this work, we describe our latest training scheme, which addresses these challenges, enabling the successful integration of Winograd-accelerated convolution with low-precision quantization while maintaining high task-related accuracy. Our approach mitigates the numerical instability typically introduced during the transformation, ensuring compatibility between the two techniques. Additionally, we extend our work by presenting a custom-optimized CUDA implementation of quantized Winograd convolution for NVIDIA edge GPUs. This implementation takes full advantage of the proposed training scheme, achieving both high computational efficiency and accuracy, making it a compelling solution for edge-based AI applications. Our training approach enables significant MAC reduction with minimal impact on prediction quality. Furthermore, our hardware results demonstrate up to a 3.4× latency reduction for specific layers, and a 1.44× overall reduction in latency for the entire DeepLabV3 model, compared to the standard implementation. Full article
(This article belongs to the Section Artificial Intelligence)
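To make the multiplication saving mentioned in the abstract concrete, here is a minimal sketch of the smallest one-dimensional Winograd variant, F(2,3), in plain Python. It is not the authors' quantized F(4,3) CUDA kernel; the function names `winograd_f23` and `direct_f23` are hypothetical and chosen only for illustration. It follows the same three steps (transform, element-wise multiply, inverse transform) and checks the result against a direct sliding dot product.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over a 4-sample tile d, using 4 multiplications."""
    # Step 1a, filter transform (G g): can be precomputed once per filter.
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Step 1b, input transform (B^T d): additions and subtractions only.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Step 2, element-wise product in the Winograd domain: the only 4 multiplications.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Step 3, inverse transform (A^T m) back to the spatial domain.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def direct_f23(d, g):
    """Reference: the same two outputs by direct sliding dot products (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


d = [1.0, 2.0, 3.0, 4.0]   # input tile
g = [0.5, -1.0, 2.0]       # filter taps
print(winograd_f23(d, g), direct_f23(d, g))
```

The divisions by 2 in the filter transform are the kind of fractional factor that becomes awkward under integer quantization; larger tiles such as F(4,3) use a wider range of constants, which is consistent with the instability the abstract describes.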
Figure 1
The three steps of the $F(4,3)$ Winograd algorithm: (1) input and weight transformation, (2) element-wise matrix multiplication (EWMM) of the transformed matrices, and (3) inverse transformation to produce the spatial output feature maps. The numerical instability due to quantization is highlighted.
Figure 2
Comparison of the (a) standard Winograd quantized transformation against (b) the Winograd quantized transformation that leverages trainable clipping factors to better exploit the quantized range.
Figure 3
Overview of the proposed Winograd aware quantized training. Straight-through estimator (STE) is used to approximate the gradient of the quantization function. Trainable clipping factors $c$, $\alpha_{ta}$, and $\alpha_{tw}$ are highlighted in red.
Figure 4
Input transformation kernel overview. The input volume is divided in sub-volumes and each thread block is responsible for the transformation of a sub-volume.
Figure 5
Element-wise matrix multiplication kernel overview. The computation is organized in $6 \times 6$ GEMMs. Each one is responsible for the computation of $N_{tiles} \times C_o$ output pixels in the Winograd domain.
Figure 6
Inverse transformation kernel overview. The Winograd tiles produced by the EWMM kernel are transformed back to the spatial domain. Each thread block is responsible for the computation of a $4 \times 4 \times P_{oc}$ output pixel.
Figure 7
Numerical distributions of example layers for transformed weights and activations of ResNet-20 on CIFAR-10. The values in the clipped range (green) sufficiently contain the information needed to maintain high-accuracy full 8-bit Winograd.
Figure 8
Latency speedup brought by the custom Winograd $F(4,3)$ kernels compared to cuDNN convolution on Tensor Cores (int8x32).
Figure 9
The latency contribution of each of the three steps in the Winograd $F(4,3)$ algorithm. In each sub-figure, the spatial dimensions are fixed, while the channel dimensions are varied.
22 pages, 689 KiB  
Article
GPU Accelerating Algorithms for Three-Layered Heat Conduction Simulations
by Nicolás Murúa, Aníbal Coronel, Alex Tello, Stefan Berres and Fernando Huancas
Mathematics 2024, 12(22), 3503; https://doi.org/10.3390/math12223503 - 9 Nov 2024
Viewed by 411
Abstract
In this paper, we consider the finite difference approximation for a one-dimensional mathematical model of heat conduction in a three-layered solid with interfacial conditions for temperature and heat flux between the layers. The finite difference scheme is unconditionally stable, convergent, and equivalent to the solution of two linear algebraic systems. We evaluate various methods for solving the involved linear systems by analyzing direct and iterative solvers, including GPU-accelerated approaches using CuPy and PyCUDA. We evaluate performance and scalability and contribute to advancing computational techniques for modeling complex physical processes accurately and efficiently. Full article
(This article belongs to the Special Issue Advances in High-Performance Computing, Optimization and Simulation)
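The abstract contrasts direct and iterative solvers for the linear systems produced by an implicit finite difference scheme. As a rough illustration only, the sketch below uses a single-layer, uniform-grid backward Euler step (an assumption; the paper's three-layer scheme with interface conditions is not reproduced here) and compares a direct tridiagonal solve with Jacobi iterations in plain Python. The names `thomas`, `jacobi`, `n`, and `r` are hypothetical.

```python
import math


def thomas(a, b, c, rhs):
    """Direct solve of a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], rhs[0] / b[0]
    for i in range(1, n):                      # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (rhs[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def jacobi(a, b, c, rhs, iters):
    """Jacobi iteration from a zero initial guess, for a fixed number of sweeps."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            s = rhs[i]
            if i > 0:
                s -= a[i] * x[i - 1]
            if i < n - 1:
                s -= c[i] * x[i + 1]
            new.append(s / b[i])
        x = new
    return x


# Assumed model problem: (1 + 2r) u_i - r u_{i-1} - r u_{i+1} = u_old_i, zero boundaries.
n, r = 20, 10.0
a = [0.0] + [-r] * (n - 1)
b = [1.0 + 2.0 * r] * n
c = [-r] * (n - 1) + [0.0]
u_old = [math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n)]

exact = thomas(a, b, c, u_old)
for iters in (5, 500):
    err = max(abs(p - q) for p, q in zip(jacobi(a, b, c, u_old, iters), exact))
    print(iters, err)
```

With a large time-step ratio `r`, Jacobi converges slowly on this system, so a small iteration budget leaves a visible error while the direct solve does not. This is a generic property of the model problem and is not a claim about the cause of the discrepancies reported in the paper's figures.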
Figure 1
Visual representation of the three-layered solid.
Figure 2
General flowchart for the numerical computation of the solution of (9)–(12) by applying the finite difference scheme (13) and (14).
Figure 3
Specific flowchart for the numerical computation of the solution of (9)–(12) by applying the finite difference scheme (13) and (14).
Figure 4
Comparison between the analytical solution and the numerical solution using the Jacobi method: (a) analytical solution, (b) Jacobi method with $m_i = 64$ and $N = 500$. The numerical temperature profile is clearly different from the analytic temperature profile. The inconsistency originated in the incorrect solution of the linear system by the selected linear solver.
Figure 5
Numerical temperature profiles obtained with the conjugate gradient method, with (a) $m_i = 64$ and $N = 1000$, and (b) $m_i = 2048$ and $N = 5000$. The numerical temperature profiles are clearly different from the analytic temperature profiles. The figures show the inconsistency of the linear solver in approximating the linear system of the difference scheme.
Figure 6
Temperature profiles for $m_i = 2048$ and $N = 10{,}000$ obtained with (a) the LU method and (b) the QR method. The figure shows that the numerical temperature profiles converge to the analytic temperature profile when the LU linear solver is used to solve the linear system of the difference scheme.
Figure 7
Comparison of computational times as $N$ increases with a fixed value of $m_i$ for steps 3 and 4 on GPU and CPU in logarithmic scale: (a) time in seconds for $m_i = 128$, (b) speed-up obtained with GPU for $m_i = 128$, (c) time in seconds for $m_i = 2048$, (d) speed-up obtained with GPU for $m_i = 2048$.
Figure 8
Comparison of computational times as $N$ increases with a fixed value of $m_i$ for LU and QR solvers on GPU and CPU in logarithmic scale: (a) time in seconds for $m_i = 128$, (b) speed-up obtained with GPU for $m_i = 128$, (c) time in seconds for $m_i = 2048$, (d) speed-up obtained with GPU for $m_i = 2048$.
17 pages, 3237 KiB  
Article
ssc-cdi: A Memory-Efficient, Multi-GPU Package for Ptychography with Extreme Data
by Yuri Rossi Tonin, Alan Zanoni Peixinho, Mauro Luiz Brandao-Junior, Paola Ferraz and Eduardo Xavier Miqueles
J. Imaging 2024, 10(11), 286; https://doi.org/10.3390/jimaging10110286 - 7 Nov 2024
Viewed by 787
Abstract
We introduce ssc-cdi, an open-source software package from the Sirius Scientific Computing family, designed for memory-efficient, single-node multi-GPU ptychography reconstruction. ssc-cdi offers a range of reconstruction engines in Python version 3.9.2 and C++/CUDA. It aims at developing local expertise and customized solutions to meet the specific needs of beamlines and user community of the Brazilian Synchrotron Light Laboratory (LNLS). We demonstrate ptychographic reconstruction of beamline data and present benchmarks for the package. Results show that ssc-cdi effectively handles extreme datasets typical of modern X-ray facilities without significantly compromising performance, offering a complementary approach to well-established packages of the community and serving as a robust tool for high-resolution imaging applications. Full article
(This article belongs to the Special Issue Recent Advances in X-ray Imaging)
Figure 1
Diagram illustrating the batch distributions of measurements for three GPUs. Each colored block represents data inside of the respective GPU. A batch of $B \le N$ measurements is distributed to the GPU memory, so that the wavefronts are updated in parallel. Once a GPU finishes processing and is made available, a remaining batch of unprocessed data is loaded from RAM. After all batches have been loaded and all the wavefronts updated, the new object and probe matrices are calculated by GPU$_0$ and then broadcast to the other GPUs, so that each of them has faster access to $O$ and $P$ in the subsequent iteration.
Figure 2
Ptychography reconstruction of a Siemens Star measured at CARNAÚBA beamline. The finest features of the innermost circles are spaced 15 nm from each other. The complex probe is shown in an hsv colormap, saturation encoding magnitude and hue encoding the phase.
Figure 3
Comparison of the simulated sample against the reconstruction using the DM algorithm from different packages. The insets show a zoomed region from the red square in the object and phase reconstructions. The reconstruction for ssc-cdi used the RAAR algorithm with parameter $\beta = 1$, such that the update function equals that of DM. For PyNX and PtyPy, we used the DM engine directly. In all cases, the same initial guesses were used: random magnitude and constant phase for the object array, and an inverse Fourier transform of the averaged measurements for the probe.
Figure 4
Single GPU performance of DM and PIE algorithms across different packages. The inset shows the same data without log scale on the vertical axis. Missing points on some curves indicate dimensions that were not supported by a specific engine. Note that PyNX does not provide an engine for an algorithm of the PIE family for comparison.
Figure 5
Multi-GPU performance of DM algorithm for ssc-cdi and PtyPy using batch sizes of (a) 128 and (b) 16. Dimensions that were not supported by an engine are the reason for missing points for some of the curves. The inset plots the same data without log scale on the vertical axis.
Figure 6
Single GPU performance of ssc-cdi for DM and PIE engines at a conventional machine. RAAR was run with batch size $B = 1$ and managed to run up to a data size of $2048^2$.
17 pages, 1369 KiB  
Article
Enabling Parallel Performance and Portability of Solid Mechanics Simulations Across CPU and GPU Architectures
by Nathaniel Morgan, Caleb Yenusah, Adrian Diaz, Daniel Dunning, Jacob Moore, Erin Heilman, Evan Lieberman, Steven Walton, Sarah Brown, Daniel Holladay, Russell Marki, Robert Robey and Marko Knezevic
Information 2024, 15(11), 716; https://doi.org/10.3390/info15110716 - 7 Nov 2024
Viewed by 597
Abstract
Efficiently simulating solid mechanics is vital across various engineering applications. As constitutive models grow more complex and simulations scale up in size, harnessing the capabilities of modern computer architectures has become essential for achieving timely results. This paper presents advancements in running parallel simulations of solid mechanics on multi-core CPUs and GPUs using a single-code implementation. This portability is made possible by the C++ matrix and array (MATAR) library, which interfaces with the C++ Kokkos library, enabling the selection of fine-grained parallelism backends (e.g., CUDA, HIP, OpenMP, pthreads, etc.) at compile time. MATAR simplifies the transition from Fortran to C++ and Kokkos, making it easier to modernize legacy solid mechanics codes. We applied this approach to modernize a suite of constitutive models and to demonstrate substantial performance improvements across different computer architectures. This paper includes comparative performance studies using multi-core CPUs along with AMD and NVIDIA GPUs. Results are presented using a hypoelastic–plastic model, a crystal plasticity model, and the viscoplastic self-consistent generalized material model (VPSC-GMM). The results underscore the potential of using the MATAR library and modern computer architectures to accelerate solid mechanics simulations. Full article
(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)
Figure 1
In this work, the MATAR library is used to modernize multiple Fortran material model implementations that are then coupled to the C++ Fierro mechanics code, which is also based on the MATAR library.
Figure 2
The runtime scaling results are presented for the 2D axisymmetric, metal rod impact test conducted on both multi-core Haswell CPUs and GPU architectures. The data are displayed as wall clock time in seconds against increasing mesh resolution. Even on 2D meshes, significant accelerations of the runtime, relative to the serial, are possible on GPUs for larger mesh sizes.
Figure 3
The runtime scaling results are presented for the 3D metal rod impact test conducted on both multi-core Haswell CPUs and GPU architectures. The data are displayed as the wall clock time in seconds against increasing the mesh resolution in 3D. The mesh resolution is the number of elements in the cross section of the rod by the number of elements in the vertical direction. Significant accelerations of the runtime, relative to the serial, are possible on GPUs for larger mesh sizes.
Figure 4
Speedup comparisons for the 2D axisymmetric, metal rod impact test on a 40 × 416 2D cylindrical coordinate mesh using an equation of state with an isotropic hypoelastic–plastic model. Plot (a) presents the speedup compared to a serial run, and Plot (b) presents the speedup compared to a parallel 20 core run on the Haswell CPU. On a 2D mesh, GPUs give a significant boost to runtime performance over a serial and a multi-core CPU.
Figure 5
Speedup comparisons for the 3D metal rod impact test on a mesh with 40 elements in the cross-section by 416 elements in the vertical direction using an equation of state with an isotropic hypoelastic–plastic model. Plot (a) presents the speedup compared to a serial run, and Plot (b) presents the speedup compared to a parallel 20 core run on the Haswell CPU. GPUs give a significant boost to runtime performance over a serial and a multi-core CPU.
Figure 6
Von Mises-equivalent stress results in each element of the mesh for the 3D metal rod impact test using an elasto-viscoplastic single-crystal plasticity model.
Figure 7
Speedup comparisons to a serial run for the 3D metal rod impact test using an elasto-viscoplastic single-crystal plasticity model. The Power9 CPU was used for the serial, 8-core, and 16-core calculations.
Figure 8
The runtimes on a V100 GPU are shorter than 20 cores on a Power9 CPU only when there are many instances of the VPSC model.
Figure 9
The scale-bridging VPSC-GMM model was made performant and portable across CPU and GPU architectures using the MATAR library. The speedup results are for 30, 100, 200, and 500 grains on five different computer architectures.
Figure 10
A Taylor anvil impact test with a polycrystalline tantalum was simulated using the Fierro mechanics code with the VPSC-GMM. The rod deformation is a function of the texture of the material. The rod is colored by the von Mises stress [MPa].
Figure 11
Speedup comparisons are shown for the VPSC-GMM coupled to the Fierro mechanics code. (a) Using the linear extrapolation scheme in combination with the VPSC model generates thread divergence, which can greatly hinder fine-grain parallelism. (b) Not using the linear extrapolation scheme in VPSC-GMM yields a favorable speedup on GPUs because it eliminates thread divergence.
10 pages, 860 KiB  
Article
Erythrocytic α-Synuclein in Parkinson’s Disease and Progressive Supranuclear Palsy—A Pilot Study
by Costanza Maria Cristiani, Luana Scaramuzzino, Elvira Immacolata Parrotta, Giovanni Cuda, Aldo Quattrone and Andrea Quattrone
Biomedicines 2024, 12(11), 2510; https://doi.org/10.3390/biomedicines12112510 - 2 Nov 2024
Viewed by 557
Abstract
Background/Objectives: The current research examines the accuracy of α-synuclein in RBCs as a diagnostic biomarker for PD and PSP, despite their distinct molecular etiologies. Methods: We used ELISA to measure total, oligomeric, and p129-α-synuclein levels in erythrocytes from 8 PSP patients, 19 PD patients, and 18 healthy controls (HCs). The classification performances of RBC α-synuclein levels were investigated by receiver operator characteristic (ROC) curve. We also evaluated a possible correlation between RBC α-synuclein level and the biological and clinical features of our cohorts. Results: RBC total α-synuclein was higher in PSP patients compared to both PD patients and HCs, achieving good classification performance (AUC: 0.853) in distinguishing PSP patients from PD patients, with a sensitivity of 100% and a specificity of 70.6%; moreover, the levels of this biomarker positively correlated with disease severity in PSP group. Regarding oligomeric α-synuclein and p129-α-synuclein, the latter was slightly increased in RBCs from PSP patients compared to HCs, but no correlations were detected. Conclusions: Although these findings need to be confirmed in larger studies, our pilot work suggests that RBC total α-synuclein may represent a potential molecular biomarker for the differential diagnosis and clinical staging of PSP. Full article
Figure 1
Erythrocytic concentration, corrected for Hb content, of total (A), oligomeric (B), and p129-α-synuclein (C) in PSP patients ($n$ = 8), PD patients ($n$ = 19), and HC ($n$ = 18). Data are summarized as box plots, in which the lower, upper, and middle lines of boxes represent the 25th percentile, 75th percentile, and median, respectively, while limits of vertical lines indicate ranges. Shown $p$-values were obtained by ANCOVA with age and sex as covariates followed by Turkey’s LSD post hoc test. o-α-synuclein = oligomeric α-synuclein; PSP = progressive supranuclear palsy; PD = Parkinson’s disease; HC = healthy control.
Figure 2
Performance of erythrocytic total α-synuclein/Hb in differentiating PSP from PD patients.
65 pages, 2635 KiB  
Tutorial
Understanding the Flows of Signals and Gradients: A Tutorial on Algorithms Needed to Implement a Deep Neural Network from Scratch
by Przemysław Klęsk
Appl. Sci. 2024, 14(21), 9972; https://doi.org/10.3390/app14219972 - 31 Oct 2024
Viewed by 402
Abstract
Theano, TensorFlow, Keras, Torch, PyTorch, and other software frameworks have remarkably stimulated the popularity of deep learning (DL). Apart from all the good they achieve, the danger of such frameworks is that they unintentionally spur a black-box attitude. Some practitioners play around with building blocks offered by frameworks and rely on them, having a superficial understanding of the internal mechanics. This paper constitutes a concise tutorial that elucidates the flows of signals and gradients in deep neural networks, enabling readers to successfully implement a deep network from scratch. By “from scratch”, we mean with access to a programming language and numerical libraries but without any components that hide DL computations underneath. To achieve this goal, the following five topics need to be well understood: (1) automatic differentiation, (2) the initialization of weights, (3) learning algorithms, (4) regularization, and (5) the organization of computations. We cover all of these topics in the paper. From a tutorial perspective, the key contributions include the following: (a) proposition of R and S operators for tensors (reshape and stack, respectively) that facilitate algebraic notation of computations involved in convolutional, pooling, and flattening layers; (b) a Python project named hmdl (“home-made deep learning”); and (c) consistent notation across all mathematical contexts involved. The hmdl project serves as a practical example of implementation and a reference. It was built using NumPy and Numba modules with JIT and CUDA amenities applied. In the experimental section, we compare hmdl implementation to Keras (backed with TensorFlow). Finally, we point out the consistency of the two in terms of convergence and accuracy, and we observe the superiority of the latter in terms of efficiency. Full article
(This article belongs to the Special Issue Advanced Digital Signal Processing and Its Applications)
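The weight-initialization topic listed in the abstract, and the exploding, vanishing, and stable forward passes captioned in Figures 8 to 10 of this entry, can be reproduced in miniature. The sketch below is not taken from the hmdl project; it is a plain-Python toy with assumed sizes (64 neurons and 10 layers instead of the 512 and 100 in the captions, to keep it fast), and `forward_rms` is a hypothetical helper name.

```python
import math
import random


def forward_rms(n, layers, weight_std, seed=0):
    """Push a random signal through `layers` dense layers without activations.

    Weights are drawn i.i.d. from N(0, weight_std^2). Returns the root mean
    square of the signal after each layer.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    history = []
    for _ in range(layers):
        # Each output neuron is a fresh random linear combination of the inputs.
        x = [sum(rng.gauss(0.0, weight_std) * xj for xj in x) for _ in range(n)]
        history.append(math.sqrt(sum(v * v for v in x) / n))
    return history


n, layers = 64, 10
exploding = forward_rms(n, layers, 1.0)                 # standard normal weights
vanishing = forward_rms(n, layers, 1e-2)                # fixed small std, regardless of n
stable = forward_rms(n, layers, 1.0 / math.sqrt(n))     # std scaled by 1/sqrt(n)

print("exploding:", exploding[-1])
print("vanishing:", vanishing[-1])
print("stable:   ", stable[-1])
```

Each layer multiplies the expected signal magnitude by roughly `weight_std * sqrt(n)`, so only the `1/sqrt(n)` scaling keeps that factor near 1 as depth grows.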
Show Figures

Figure 1

Figure 1
<p>Milestones in the history of neural networks and deep learning [<a href="#B3-applsci-14-09972" class="html-bibr">3</a>,<a href="#B4-applsci-14-09972" class="html-bibr">4</a>,<a href="#B5-applsci-14-09972" class="html-bibr">5</a>,<a href="#B6-applsci-14-09972" class="html-bibr">6</a>,<a href="#B7-applsci-14-09972" class="html-bibr">7</a>,<a href="#B8-applsci-14-09972" class="html-bibr">8</a>,<a href="#B9-applsci-14-09972" class="html-bibr">9</a>,<a href="#B10-applsci-14-09972" class="html-bibr">10</a>,<a href="#B11-applsci-14-09972" class="html-bibr">11</a>,<a href="#B12-applsci-14-09972" class="html-bibr">12</a>,<a href="#B13-applsci-14-09972" class="html-bibr">13</a>,<a href="#B14-applsci-14-09972" class="html-bibr">14</a>,<a href="#B15-applsci-14-09972" class="html-bibr">15</a>,<a href="#B16-applsci-14-09972" class="html-bibr">16</a>,<a href="#B17-applsci-14-09972" class="html-bibr">17</a>,<a href="#B18-applsci-14-09972" class="html-bibr">18</a>,<a href="#B19-applsci-14-09972" class="html-bibr">19</a>,<a href="#B20-applsci-14-09972" class="html-bibr">20</a>,<a href="#B21-applsci-14-09972" class="html-bibr">21</a>,<a href="#B22-applsci-14-09972" class="html-bibr">22</a>,<a href="#B23-applsci-14-09972" class="html-bibr">23</a>,<a href="#B24-applsci-14-09972" class="html-bibr">24</a>,<a href="#B25-applsci-14-09972" class="html-bibr">25</a>,<a href="#B26-applsci-14-09972" class="html-bibr">26</a>,<a href="#B27-applsci-14-09972" class="html-bibr">27</a>,<a href="#B28-applsci-14-09972" class="html-bibr">28</a>,<a href="#B29-applsci-14-09972" class="html-bibr">29</a>,<a href="#B30-applsci-14-09972" class="html-bibr">30</a>,<a href="#B31-applsci-14-09972" class="html-bibr">31</a>,<a href="#B32-applsci-14-09972" class="html-bibr">32</a>,<a href="#B33-applsci-14-09972" class="html-bibr">33</a>,<a href="#B34-applsci-14-09972" class="html-bibr">34</a>,<a href="#B35-applsci-14-09972" class="html-bibr">35</a>].</p>
Full article ">Figure 2
<p>Example medium-sized deep network built with popular layer types: convolutional, max pooling, dropout, flattening, and dense (prepared for the CIFAR-10 data set; see experiment with ID 3392187021 in <a href="#sec8-applsci-14-09972" class="html-sec">Section 8</a>).</p>
Full article ">Figure 3
<p>Example of a simple neural network for 10-class classification.</p>
Full article ">Figure 4
<p>Illustration of forward and backward computations at the junction of two convolutional layers <math display="inline"><semantics> <mi>α</mi> </semantics></math> and <math display="inline"><semantics> <mi>β</mi> </semantics></math> for the network structure from <a href="#applsci-14-09972-f003" class="html-fig">Figure 3</a>.</p>
Full article ">Figure 5
<p>Forward computations of max and average pooling for <math display="inline"><semantics> <mrow> <mi>S</mi> <mo>=</mo> <mn>4</mn> </mrow> </semantics></math>.</p>
Full article ">Figure 6
<p>Illustration of backward computations for flattening and dropout layers. High dropout rate <math display="inline"><semantics> <mrow> <msup> <mi>r</mi> <mo>*</mo> </msup> <mo>=</mo> <mn>0.875</mn> </mrow> </semantics></math> chosen for readability of surviving connections.</p>
Full article ">Figure 7
<p>Illustration of two stages of backward computations for a dense layer using a softmax activation function.</p>
Full article ">Figure 8
<p>Forward pass leading to exploding signals: a fake network consisting of 100 dense layers with 512 neurons each (no activations) and initial weights drawn from a standard normal distribution.</p>
Full article ">Figure 9
<p>Fake forward pass leading to vanishing signals due to initial weights drawn from a normal distribution with standard deviation scaled by <math display="inline"><semantics> <msup> <mn>10</mn> <mrow> <mo>−</mo> <mn>2</mn> </mrow> </msup> </semantics></math>, regardless of <span class="html-italic">N</span>.</p>
Full article ">Figure 10
<p>Fake forward pass leading to a stable numerical behavior due to initial weights drawn from a properly scaled normal distribution with standard deviation <math display="inline"><semantics> <mrow> <mn>1</mn> <mo>/</mo> <msqrt> <mi>N</mi> </msqrt> </mrow> </semantics></math>.</p>
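The three "fake forward pass" captions above (Figures 8 to 10) describe a small numerical experiment that can be approximated with a short NumPy sketch. This is an illustration only, not the article's code: the function name `forward_signal_std`, the random seed, and the use of a random normal input vector are assumptions made here.

```python
import numpy as np

def forward_signal_std(weight_std, n_layers=100, n_neurons=512, seed=0):
    """Push a random vector through n_layers linear layers (no activations)
    and record the standard deviation of the signal after each layer.

    Hypothetical helper for illustration; not taken from the hmdl project.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_neurons)  # assumed input signal
    stds = []
    for _ in range(n_layers):
        # Weights drawn from a normal distribution with the given std.
        W = rng.standard_normal((n_neurons, n_neurons)) * weight_std
        x = W @ x
        stds.append(float(np.std(x)))
    return stds

N = 512
exploding = forward_signal_std(1.0)             # standard normal weights
vanishing = forward_signal_std(1e-2)            # std scaled by 10^-2
stable = forward_signal_std(1.0 / np.sqrt(N))   # std = 1 / sqrt(N)

print(f"std = 1:         final signal std = {exploding[-1]:.3e}")
print(f"std = 1e-2:      final signal std = {vanishing[-1]:.3e}")
print(f"std = 1/sqrt(N): final signal std = {stable[-1]:.3e}")
```

With unit-variance weights each layer multiplies the signal scale by roughly the square root of N, so the magnitude grows geometrically; with the 1/sqrt(N) scaling the per-layer factor is close to 1, which is why the signal stays within a few orders of magnitude of its starting scale.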
Full article ">Figure 11
<p>Training costs (losses) of Adam and other SGD algorithms on MNIST (<b>a</b>) and CIFAR-10 (<b>b</b>) data sets. Source: (Kingma and Ba, 2014) [<a href="#B25-applsci-14-09972" class="html-bibr">25</a>].</p>
Full article ">Figure 12
<p>UML class diagram of <span class="html-italic">hmdl</span> project (<a href="https://github.com/pklesk/hmdl/blob/main/uml/classes.pdf" target="_blank">https://github.com/pklesk/hmdl/blob/main/uml/classes.pdf</a>) (accessed on 17 October 2024).</p>
Full article ">Figure 13
<p>Functions executing forward and backward computations from the abstract class <tt>Layer</tt>.</p>
Full article ">Figure 14
<p>The core of the <tt>fit</tt> function of <tt>SequentialClassifier</tt>. The main training loop (over epochs and batches) executes forward and backward computations through layers for each batch.</p>
Full article ">Figure 15
<p>Forward and backward passes for <tt>SequentialClassifier</tt>.</p>
Full article ">Figure 16
<p>Automatic differentiation for class <tt>Flatten</tt>.</p>
Full article ">Figure 17
<p>Automatic differentiation for class <tt>Dropout</tt>.</p>
Full article ">Figure 18
<p>Automatic differentiation for class <tt>Dense</tt>.</p>
Full article ">Figure 19
<p>Forward computations for class <tt>MaxPool2D</tt>: the simplest <tt>numpy</tt>-based variant.</p>
Full article ">Figure 20
<p>Backward computations for class <tt>MaxPool2D</tt>: the simplest <tt>numpy</tt>-based variant.</p>
Full article ">Figure 21
<p>Backward computations of gradient for class <tt>Conv2D</tt>: the simplest <tt>numpy</tt>-based variant (GEMM).</p>
Full article ">Figure 22
<p>Backward computations of error propagation for class <tt>Conv2D</tt>: the simplest <tt>numpy</tt>-based variant (GEMM).</p>
Full article ">Figure 23
<p>Execution times of convolutional layers (64 input and 64 output channels; batch size: 32) for different implementation variants, filter sizes, and image sizes. For details of the hardware and software environment see page 42.</p>
Full article ">Figure 24
<p>Functions for Glorot initialization of weights in the hmdl project.</p>
Full article ">Figure 25
<p>Functions for He initialization of weights in the hmdl project.</p>
Full article ">Figure 26
<p>Implementation of Adam (coupled with regularization) in the hmdl project.</p>
Full article ">Figure 27
<p>Sample images from data sets applied in experiments.</p>
Full article ">Figure 28
<p>Main settings for an experiment in script <tt>experimenter.py</tt>.</p>
Full article ">Figure 29
<p>Description of a network structure in the hmdl project, declared in <tt>experimenter.py</tt>.</p>
Full article ">Figure 30
<p>Choice of data set, randomization seed, and other settings in the script <tt>experimenter.py</tt>.</p>
Full article ">Figure A1
<p>Forward computations for class <tt>MaxPool2D</tt>: variant based on <tt>numba</tt>’s just-in-time compilation.</p>
Full article ">Figure A2
<p>Backward computations for class <tt>MaxPool2D</tt>: variant based on <tt>numba</tt>’s just-in-time compilation.</p>
Full article ">Figure A3
<p>Forward computations for class <tt>MaxPool2D</tt>: variant implemented for GPU computations using <tt>numba.cuda</tt> module.</p>
Full article ">Figure A4
<p>CUDA kernel function <tt>do_forward_numba_cuda_direct_job</tt> for forward max pooling computations invoked via function <tt>do_forward_numba_cuda_direct</tt>.</p>
Full article ">Figure A5
<p>Backward computations for class <tt>MaxPool2D</tt>: variant implemented for GPU computations using <tt>numba.cuda</tt> module.</p>
Full article ">Figure A6
<p>CUDA kernel function <tt>do_backward_numba_cuda_direct_job</tt> for backward max pooling computations invoked via function <tt>do_backward_numba_cuda_direct</tt>.</p>
Full article ">Figure A7
<p>Backward computations of gradient for class <tt>Conv2D</tt>: variant based on <tt>numba</tt>’s just-in-time compilation.</p>
Full article ">Figure A8
<p>Backward computations of gradient for class <tt>Conv2D</tt>: variant based directly on definition, using GPU computations and <tt>numba.cuda</tt>.</p>
Full article ">Figure A9
<p>CUDA kernel function <tt>do_backward_numba_cuda_direct_job</tt> for backward convolutional computations invoked via function <tt>do_backward_numba_cuda_direct</tt>.</p>
Full article ">Figure A10
<p>Backward computations of gradient for class <tt>Conv2D</tt>: variant based on tiles, using GPU and <tt>numba.cuda</tt>.</p>
Full article ">Figure A11
<p>CUDA kernel function <tt>do_backward_numba_cuda_tiles_job</tt> for backward convolutional computations invoked via function <tt>do_backward_numba_cuda_tiles</tt>.</p>
13 pages, 735 KiB  
Article
Proximity Elongation Assay and ELISA for the Identification of Serum Diagnostic Biomarkers in Parkinson’s Disease and Progressive Supranuclear Palsy
by Costanza Maria Cristiani, Camilla Calomino, Luana Scaramuzzino, Maria Stella Murfuni, Elvira Immacolata Parrotta, Maria Giovanna Bianco, Giovanni Cuda, Aldo Quattrone and Andrea Quattrone
Int. J. Mol. Sci. 2024, 25(21), 11663; https://doi.org/10.3390/ijms252111663 - 30 Oct 2024
Viewed by 527
Abstract
Clinical differentiation of progressive supranuclear palsy (PSP) from Parkinson’s disease (PD) is challenging due to overlapping phenotypes and late onset of PSP specific symptoms, highlighting the need for easily assessable biomarkers. We used proximity elongation assay (PEA) to analyze 460 proteins in serum samples from 46 PD, 30 PSP patients, and 24 healthy controls. ANCOVA was used to identify the most promising proteins and machine learning (ML) XGBoost and random forest algorithms to assess their classification performance. Promising proteins were also quantified by ELISA. Moreover, correlations between serum biomarkers and biological and clinical features were investigated. We identified five proteins (TFF3, CPB1, OPG, CNTN1, TIMP4) showing different levels between PSP and PD, which achieved good performance (AUC: 0.892) when combined by ML. On the other hand, when the three most significant biomarkers (TFF3, CPB1 and OPG) were analyzed by ELISA, there was no difference between groups. Serum levels of TFF3 positively correlated with age in all subjects’ groups, while for OPG and CPB1 such a correlation occurred in PSP patients only. Moreover, CPB1 positively correlated with disease severity in PD, while no correlations were observed in the PSP group. Overall, we identified CPB1 correlating with PD severity, which may support clinical staging of PD. In addition, our results showing discrepancy between PEA and ELISA technology suggest that caution should be used when translating proteomic findings into clinical practice. Full article
Figure 1
<p>In Panel (<b>A</b>), proteins on the right of the <span class="html-italic">x</span>-axis have a higher concentration in PSP while proteins on the left have a higher concentration in PD. In Panel (<b>B</b>), proteins on the right of the <span class="html-italic">x</span>-axis have a higher concentration in PD while proteins on the left have a higher concentration in HC. In Panel (<b>C</b>), proteins on the right of the <span class="html-italic">x</span>-axis have a higher concentration in HC while proteins on the left have a higher concentration in PSP. Bonferroni’s correction method was employed to define the threshold for <span class="html-italic">p</span>-value significance. PD = Parkinson’s disease; PSP = progressive supranuclear palsy; HC = healthy control.</p>
Full article ">Figure 2
<p>Serum concentration of TFF3 (<b>A</b>), CPB1 (<b>B</b>), and OPG (<b>C</b>) in PD (n = 46), PSP (n = 30) and HC (n = 24), as measured by ELISA. In each box plot, the 25th percentile, 75th percentile, and median of data are depicted as lower, upper, and middle lines, respectively, while bars on vertical lines indicate ranges. The Kruskal–Wallis test was used to calculate shown p-values. TFF3 = trefoil factor 3; CPB1 = carboxypeptidase B1; OPG = osteoprotegerin; PD = Parkinson’s disease; PSP = progressive supranuclear palsy; HC = healthy controls.</p>
24 pages, 830 KiB  
Article
On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures
by Nathaniel Morgan, Caleb Yenusah, Adrian Diaz, Daniel Dunning, Jacob Moore, Erin Heilman, Calvin Roth, Evan Lieberman, Steven Walton, Sarah Brown, Daniel Holladay, Marko Knezevic, Gavin Whetstone, Zachary Baker and Robert Robey
Information 2024, 15(11), 673; https://doi.org/10.3390/info15110673 - 28 Oct 2024
Cited by 1 | Viewed by 1748
Abstract
This paper presents software advances to easily exploit computer architectures consisting of a multi-core CPU and CPU+GPU to accelerate diverse types of high-performance computing (HPC) applications using a single code implementation. The paper describes and demonstrates the performance of the open-source C++ matrix and array (MATAR) library that uniquely offers: (1) a straightforward syntax for programming productivity, (2) usable data structures for data-oriented programming (DOP) for performance, and (3) a simple interface to the open-source C++ Kokkos library for portability and memory management across CPUs and GPUs. The portability across architectures with a single code implementation is achieved by automatically switching between diverse fine-grained parallelism backends (e.g., CUDA, HIP, OpenMP, pthreads, etc.) at compile time. The MATAR library solves many longstanding challenges associated with easily writing software that can run in parallel on any computer architecture. This work benefits projects seeking to write new C++ codes while also addressing the challenges of quickly making existing Fortran codes performant and portable over modern computer architectures with minimal syntactical changes from Fortran to C++. We demonstrate the feasibility of readily writing new C++ codes and modernizing existing codes with MATAR to be performant, parallel, and portable across diverse computer architectures. Full article
(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)
Figure 1
<p>The C++ MATAR library builds on the Kokkos library to enable diverse software to run on multi-core CPUs and GPUs with a single implementation and enables a common approach to modernize existing Fortran software and to write new C++ software.</p>
Full article ">Figure 2
<p>A range of dense (<b>left</b> chart) and sparse (<b>right</b> chart) data types in MATAR, designed for performance portability across CPUs and GPUs. MATAR provides more data types than are shown here; for instance, there are data types that run solely on CPUs, as well as dynamically resizable 1D array and 1D matrix types.</p>
Full article ">Figure 3
<p>On the left, a ring lattice with seven nodes is shown where each node is connected to its nearest one neighbor. On the right we replace three edges with random edges. In particular, we replaced (1,2) with (1,5), (2,3) with (2,6), and (7,1) with (7,5). This is a simple demonstration of the Watts–Strogatz random graph process.</p>
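The rewiring step described in the Figure 3 caption above can be sketched in plain Python. This is a minimal illustration of the Watts–Strogatz idea under stated assumptions, not the MATAR implementation: the function names `ring_lattice` and `rewire`, the 1-based node labels, and the rule of redirecting only the second endpoint of an edge are choices made here.

```python
import random

def ring_lattice(n):
    """Edges of a ring where node i is linked to node i + 1 (1-based, wrapping)."""
    return {(i, i % n + 1) for i in range(1, n + 1)}

def rewire(edges, n, p, seed=0):
    """With probability p, redirect each edge (u, v) to (u, w) for a random w.

    Self-loops and duplicate edges are avoided. Hypothetical helper for
    illustration only.
    """
    rng = random.Random(seed)
    current = {frozenset(e) for e in edges}  # undirected edges seen so far
    result = []
    for u, v in sorted(edges):
        if rng.random() < p:
            candidates = [w for w in range(1, n + 1)
                          if w != u and frozenset((u, w)) not in current]
            if candidates:
                w = rng.choice(candidates)
                current.remove(frozenset((u, v)))
                current.add(frozenset((u, w)))
                v = w
        result.append((u, v))
    return result

lattice = ring_lattice(7)
print(sorted(lattice))            # the seven ring edges, including (7, 1)
print(rewire(lattice, 7, p=0.4))  # some edges replaced by random ones
```

Setting `p=0` leaves the ring untouched, while `p=1` attempts to rewire every edge; intermediate values give the small-world regime that the average-distance comparison in the later figures explores.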
Full article ">Figure 4
<p>Comparison of runtimes, as a function of the nodes in the WS graph, on diverse GPUs and on an AMD EPYC 7502 multi-core CPU. A serial calculation (blue) is presented for comparison purposes. All GPUs deliver favorable acceleration of this test case over using 16 cores on the CPU.</p>
Full article ">Figure 5
<p>The strong scaling on the WS graph with 4000 nodes is shown. This scaling test was run at powers of 2<span class="html-italic"><sup>n</sup></span> for the number of cores from 1 to 32. Displayed is the log–log plot of number of cores versus runtime. Near perfect scaling is observed.</p>
Full article ">Figure 6
<p>A comparison between the Python networkx package and the C++ implementation with MATAR is shown for the average distance between nodes as a function of the rewire probability. The same trend is observed between the Python and C++ implementations, which is a desired outcome. Both codes yielded the expected, correct behavior for this network test case. An exact match between the two implementations is not expected since the results are probabilistic.</p>
Full article ">Figure 7
<p>Compared to the serial Python code, the MATAR implementation is serially 60× faster, is about 700× faster using 16 threads on an AMD EPYC 7502 multi-core CPU, is about 2050× faster on a Titan GPU, and is 2460× faster on a V100 GPU. The WS graph speed-up results reported here correspond to a simulation using 4000 nodes.</p>
Full article ">Figure 8
<p>Speed-up compared to serial of the forward propagation through an artificial neural network using 8 cores and 16 cores on a Haswell CPU with OpenMP, an Nvidia Tesla V100 GPU, an Nvidia A100 GPU, and an AMD MI50 GPU. The GPU architectures deliver favorable accelerations with widely varying sizes of the vector-array multiplications.</p>
Full article ">Figure 9
<p>Diagram of the mid-ocean ridge showing oceanic lithosphere increasing in thickness as it increases in distance from the ridge. Gray is the oceanic lithosphere, and red is the asthenosphere (mantle).</p>
Full article ">Figure 10
<p>Scaling plot for the half-space cooling test problem showing walltime in seconds for serial, 8 cores, and 16 cores on a Haswell CPU; and for Nvidia Tesla V100 GPU, Nvidia A100 GPU, Quadro RTX GPU, and AMD MI50 GPU architectures with increasing problem size (e.g., billions of years). GPU architectures significantly accelerate the calculation compared to the Haswell CPU.</p>
Full article ">Figure 11
<p>Speed-up with mesh refinement provided by MATAR with CUDA on a V100 GPU over the serial Python code. Accelerations of 253X, 69X, and 1756X are observed.</p>
23 pages, 16714 KiB  
Article
A Geographically Weighted Regression–Compute Unified Device Architecture Approach to Explore the Spatial Agglomeration and Heterogeneity in Arable Land Consumption in Southwest China
by Chang Liu, Tingting Xu, Letao Han, Sapu Du and Aohua Tian
Agriculture 2024, 14(10), 1675; https://doi.org/10.3390/agriculture14101675 - 25 Sep 2024
Viewed by 637
Abstract
Arable land loss has become a critical issue in China because of rapid urbanization, industrial expansion, and unsustainable agricultural practices. While previous studies have explored the factors contributing to this loss, they often fall short in addressing the challenges of spatial heterogeneity and large-scale dataset analysis. This research introduces an innovative approach to geographically weighted regression (GWR) for assessing arable land loss in China, effectively addressing these challenges. Focusing on Chongqing, Guizhou, and Yunnan Provinces over the past two decades, it examines spatial autocorrelation with R-squared values exceeding 0.6 and residuals. Eight factors, including environmental elements (rain, evaporation, slope, digital elevation model) and human activities (distance to city, distance to roads, population, GDP), were analyzed. By visualizing and analyzing R² spatial patterns, the results reveal a clear spatial agglomeration distribution, primarily in urban areas with industries, highly urbanized cities, and flat terrains near rivers, influenced by GDP, population, rain, and slope. The novelty of this study is that it significantly enhances GWR computational capabilities for handling extensive datasets by utilizing Compute Unified Device Architecture (CUDA) on a high-performance GPU cloud server. Simultaneously, it conducts comprehensive analyses of the GWR model’s local results through visualization and spatial autocorrelation tools, enhancing the interpretability of the GWR model. Through spatial clustering analysis of local results, this study enables targeted exploration of factors influencing arable land changes in various temporal and spatial dimensions while also evaluating the reliability of the model results. Full article
(This article belongs to the Section Agricultural Economics, Policies and Rural Management)
Figure 1
<p>Location of the study area—Chongqing, Guizhou, and Yunnan in China—and its elevation range.</p>
Full article ">Figure 2
<p>The steps to implement GWR analysis with the grid method. (<b>a</b>) Create the fishnet, and (<b>b</b>) use the fishnet to show the arable land loss rate. Color intensity represents the loss rate of arable land: the darker the color, the higher the rate of arable land loss.</p>
Full article ">Figure 3
<p>Steps in the analysis of arable land loss rates using a fishnet.</p>
Full article ">Figure 4
<p>Arable land loss during 2000–2010 (<b>a</b>) and 2010–2020 (<b>b</b>) in the three provinces considered in this study.</p>
Full article ">Figure 5
<p>Fishnet grid with arable land loss rates during 2000–2010 and 2010–2020.</p>
Full article ">Figure 6
<p>Spatial distribution of R<sup>2</sup> during 2000–2010 and 2010–2020 in the three provinces considered in this study. The color gradient represents the distribution of adjusted R<sup>2</sup> values across intervals. Deeper colors signify higher R<sup>2</sup> values, indicating stronger explanatory power of the variables for farmland degradation, while lighter colors indicate lower R<sup>2</sup> values, reflecting weaker explanatory power. The subgraph represents the highly adjusted R<sup>2</sup> cluster area.</p>
Full article ">Figure 7
<p>Spatial distribution of residuals during 2000–2010 and 2010–2020 in the three provinces considered in this study. The red boxes represent areas with high residuals across three regions over different time periods.</p>
17 pages, 6650 KiB  
Article
Study on Large-Scale Urban Water Distribution Network Computation Method Based on a GPU Framework
by Rongbin Zhang, Jingming Hou, Jingsi Li, Tian Wang and Muhammad Imran
Water 2024, 16(18), 2642; https://doi.org/10.3390/w16182642 - 18 Sep 2024
Viewed by 748
Abstract
Large-scale urban water distribution network simulation plays a critical role in the construction, monitoring, and maintenance of urban water distribution systems. However, during the simulation process, matrix inversion calculations generate a large amount of computational data and consume significant amounts of time, posing challenges for practical applications. To address this issue, this paper proposes a parallel gradient calculation algorithm based on GPU hardware and the CUDA Toolkit library and compares it with the EPANET model and a model based on CPU hardware and the Armadillo library. The results show that the GPU-based model not only achieves a precision level very close to the EPANET model, reaching 99% accuracy, but also significantly outperforms the CPU-based model. Furthermore, during the simulation, the GPU architecture is able to efficiently handle large-scale data and achieve faster convergence, significantly reducing the overall simulation time. Particularly in handling larger-scale water distribution networks, the GPU architecture can improve computational efficiency by up to 13 times. Further analysis reveals that different GPU models exhibit significant differences in computational efficiency, with memory capacity being a key factor affecting performance. GPU devices with larger memory capacity demonstrate higher computational efficiency when processing large-scale water distribution networks. This study demonstrates the advantages of GPU acceleration technology in the simulation of large-scale urban water distribution networks and provides important theoretical and technical support for practical applications in this field. By carefully selecting and configuring GPU devices, the computational efficiency of large-scale water distribution networks can be significantly improved, providing more efficient solutions for future urban water resource management and planning. Full article
(This article belongs to the Special Issue Urban Flood Mitigation and Sustainable Stormwater Management)
Figure 1
<p>Topological representation of the pipeline network structure.</p>
Full article ">Figure 2
<p>CPU computational framework for water supply network.</p>
Full article ">Figure 3
<p>Computational model examples.</p>
Full article ">Figure 4
<p>Proportion of computation time for the models.</p>
Full article ">Figure 5
<p>GPU computational framework for water supply network.</p>
Full article ">Figure 6
<p>Large computational model.</p>
Full article ">Figure 7
<p>Comparison of computational performance between GPU and CPU models across different examples.</p>
Full article ">Figure 8
<p>Comparison of computational performance across different scenarios under various GPU devices.</p>
13 pages, 4228 KiB  
Article
Cross-Correlation Algorithm Based on Speeded-Up Robust Features Parallel Acceleration for Shack–Hartmann Wavefront Sensing
by Linxiong Wen, Xiaohan Mei, Yi Tan, Zhiyun Zhang, Fangfang Chai, Jiayao Wu, Shuai Wang and Ping Yang
Photonics 2024, 11(9), 844; https://doi.org/10.3390/photonics11090844 - 5 Sep 2024
Viewed by 502
Abstract
A cross-correlation algorithm to obtain the sub-aperture shifts that occur is a crucial aspect of scene-based SHWS (Shack–Hartmann wavefront sensing). However, when the sub-image is partially absent within the atmosphere, the traditional cross-correlation algorithm can easily obtain the wrong shift results. To overcome this drawback, we propose an algorithm based on SURFs (speeded-up-robust features) matching. In addition, to meet the speed required by wavefront sensing, CUDA parallel optimization of SURF matching is carried out using a GPU thread execution model and a programming model. The results show that the shift error can be reduced by more than two times, and the parallel algorithm can achieve nearly ten times the acceleration ratio. Full article
(This article belongs to the Special Issue Challenges and Future Directions in Adaptive Optics Technology)
Figure 1
<p>Matching process of the actual image and reference image.</p>
Full article ">Figure 2
<p>Influence of partial absence of sub-image on cross-correlation algorithm.</p>
Full article ">Figure 3
<p>SURF matching before cross-correlation algorithm.</p>
Full article ">Figure 4
<p>Integral image.</p>
Full article ">Figure 5
<p>Gaussian second derivative template.</p>
Full article ">Figure 6
<p>Approximation of Gaussian second derivative (box filter).</p>
Full article ">Figure 7
<p>Non-maximum suppression of a 3 × 3 × 3 neighborhood.</p>
Full article ">Figure 8
<p>The main direction determination diagram.</p>
Full article ">Figure 9
<p>Scheme of the SURF matching parallel acceleration optimization.</p>
Full article ">Figure 10
<p>Diagram of the optimal path in the experiment.</p>
Full article ">Figure 11
<p>Test images of different resolutions.</p>
Full article ">Figure 12
<p>Feature point registration process.</p>
Full article ">Figure 13
<p>Comparisons of relative shift estimation errors between pre-processing of SURF and unprocessed sub-images.</p>
Full article ">Figure 14
<p>Partial sub-apertures of the image.</p>
Full article ">Figure 15
<p>Average estimates of the error in pre-processed and unprocessed sub-images.</p>
24 pages, 6571 KiB  
Article
Deciphering the Impact of Nucleosides and Nucleotides on Copper Ion and Dopamine Coordination Dynamics
by Patrycja Sadowska, Wojciech Jankowski, Romualda Bregier-Jarzębowska, Piotr Pietrzyk and Renata Jastrząb
Int. J. Mol. Sci. 2024, 25(17), 9137; https://doi.org/10.3390/ijms25179137 - 23 Aug 2024
Viewed by 625
Abstract
The mode of coordination of copper(II) ions with dopamine (DA, L) in the binary, as well as ternary systems with Ado, AMP, ADP, and ATP (L′) as second ligands, was studied with the use of experimental—potentiometric and spectroscopic (VIS, EPR, NMR, IR)—methods and computational—molecular modeling and DFT—studies. In the Cu(II)/DA system, depending on the pH value, the active centers of the ligand involved in the coordination with copper(II) ions changed from nitrogen and oxygen atoms (CuH(DA)3+, Cu(DA)2+), via nitrogen atoms (CuH2(DA)24+), to oxygen atoms at strongly alkaline pH (Cu(DA)22+). The introduction of L′ into this system changed the mode of interaction of dopamine from oxygen atoms to the nitrogen atom in the hydroxocomplexes formed at high pH values. In the ternary systems, the ML′-L (non-covalent interaction) and ML′HxL, ML′L, and ML′L(OH)x species were found. In the Cu(II)/DA/AMP or ATP systems, mixed forms were formed up to a pH of around 9.0; above this pH, only Cu(II)/DA complexes occurred. In contrast to systems with AMP and ATP, ternary species with Ado and ADP occurred in the whole pH range at a high concentration, and moreover, binary complexes of Cu(II) ions with dopamine did not form in the detectable concentration. Full article
Figure 1
<p>Chemical formulae of the bioligands studied.</p>
Full article ">Figure 2
<p>Experimental and simulated titration curves for the Cu(II)/DA system; green—experimental curve (complexes formation was not taken into account); red—experimental curve, black—simulated curve (complexes formation was taken into account); C<sub>Cu</sub><sup>2+</sup> = 1 × 10<sup>−3</sup> M, C<sub>DA</sub> = 4 × 10<sup>−3</sup> M.</p>
Full article ">Figure 3
<p>UV–VIS spectra of the (<b>a</b>) DA and Cu(II)/DA system depending on the pH, C<sub>Cu</sub><sup>2+</sup> = 1 × 10<sup>−5</sup> M, C<sub>DA</sub> = 4 × 10<sup>−5</sup> M; (<b>b</b>) DA and Cu(II)/DA system at pH 4.8 depending on the time in the range of 280 nm–900 nm; (<b>c</b>) Cu(II)/DA system. C<sub>Cu</sub><sup>2+</sup> = 1 × 10<sup>−3</sup> M, C<sub>DA</sub> = 4 × 10<sup>−3</sup> M.</p>
Full article ">Figure 4
<p>Distribution diagram for the Cu(II)/DA system; percentage of the species refers to total Cu(II); C<sub>Cu</sub><sup>2+</sup> = 1 × 10<sup>−3</sup> M, C<sub>DA</sub> = 4 × 10<sup>−3</sup> M.</p>
Full article ">Figure 5
<p>Experimental and simulated EPR spectra for the Cu(II)/DA system at (<b>a</b>) pH 6.0 and (<b>b</b>) pH 8.0; C<sub>Cu</sub><sup>2+</sup> = 1 × 10<sup>−3</sup> M, C<sub>DA</sub> = 4 × 10<sup>−3</sup> M.</p>
Full article ">Figure 6
<p><sup>13</sup>C NMR spectra of the DA and Cu(II)/DA system at pH 6.0; C<sub>Cu</sub><sup>2+</sup> = 1 × 10<sup>−3</sup> M, C<sub>DA</sub> = 1 × 10<sup>−1</sup> M.</p>
Full article ">Figure 7
<p>Optimized structures of complexes of dopamine and protonated dopamine with the copper(II) ion; dark grey—carbon atom, light grey—hydrogen atom, red—oxygen atom, blue—nitrogen atom and brown—copper(II) ion.</p>
Full article ">Figure 8
<p>Structure of a complex with the strongest interaction between the two molecules of protonated dopamine and the copper(II) ion (Cu_H_DA2_2_A1); dark grey—carbon atoms, light grey—hydrogen atoms, red—oxygen atoms, blue—nitrogen atoms and brown—copper(II) ion.</p>
Full article ">Figure 9
<p>Fragment of FT-IR spectra of the DA and Cu(II)/DA system at (<b>a</b>) pH 6.0 and (<b>b</b>) pH 8.0; C<sub>Cu</sub><sup>2+</sup> = 1 × 10<sup>−3</sup> M, C<sub>DA</sub> = 4 × 10<sup>−3</sup> M.</p>
Full article ">Figure 10
<p>Structure of the complex with the strongest interaction between the two molecules of dopamine and the copper(II) ion (Cu_DA2_3); dark grey—carbon atoms, light grey—hydrogen atoms, red—oxygen atoms, blue—nitrogen atoms and brown—copper(II) ion.</p>
Full article ">Figure 11
<p>Schemes of a possible interaction between two protonated molecules of dopamine with a copper(II) ion; red—oxygen atoms, blue—nitrogen atoms and brown—copper(II) ions.</p>
Full article ">Figure 12
<p>Distribution diagrams for the (<b>a</b>) Cu(II)/DA/Ado, (<b>b</b>) Cu(II)/DA/AMP, (<b>c</b>) Cu(II)/DA/ADP, and (<b>d</b>) Cu(II)/DA/ATP systems; percentage of the species refers to total Cu(II); C<sub>Cu</sub><sup>2+</sup> = 1 × 10<sup>−3</sup> M, C<sub>L=L′</sub> = 2 × 10<sup>−3</sup> M.</p>
Full article ">Figure 13
<p>VIS spectra of the (<b>a</b>) Cu(II)/DA/Ado, (<b>b</b>) Cu(II)/DA/AMP, (<b>c</b>) Cu(II)/DA/ADP, and (<b>d</b>) Cu(II)/DA/ATP systems; C<sub>Cu</sub><sup>2+</sup> = 1 × 10<sup>−3</sup> M, C<sub>L=L′</sub> = 2 × 10<sup>−3</sup> M.</p>
Full article ">Figure 14
<p>Experimental and simulated EPR spectra for the Cu(II)/DA/ADP system at pH 8.0; C<sub>Cu</sub><sup>2+</sup> = 1 × 10<sup>−3</sup> M, C<sub>L=L′</sub> = 2 × 10<sup>−3</sup> M.</p>
Full article ">Figure 15
<p>Schemes of a possible interaction between two protonated molecules of dopamine with a copper(II) ion; red—oxygen atoms, blue—nitrogen atoms and brown—copper(II) ions.</p>
Figure 16
<p>Schemes of a possible interaction between two molecules of dopamine with a copper(II) ion; red—oxygen atoms, blue—nitrogen atoms and brown—copper(II) ions.</p>
27 pages, 4031 KiB  
Article
Polarization Characteristics of Massive HVI Debris Clouds Using an Improved Monte Carlo Ray Tracing Method for Remote Sensing Applications
by Guangsen Liu, Peng Rao, Yao Li and Wen Sun
Remote Sens. 2024, 16(16), 2925; https://doi.org/10.3390/rs16162925 - 9 Aug 2024
Viewed by 927
Abstract
As a signature phenomenon of massive hypervelocity impacts (HVIs) in space, debris clouds provide critical optical information for satellite remote sensing and the assessment of large-scale impacts. However, studies of the optical scattering properties of debris clouds remain limited, and existing vector radiative transfer (VRT) methods struggle to accurately simulate the optical characteristics of these complex scatterers. To address this gap, this paper presents an improved Monte Carlo VRT program (PGS–MC) for multicomponent polydisperse scatterers that precisely evaluates the radiation and polarization characteristics of complex scatterers. Based on the Monte Carlo ray tracing (MCRT) method, our program introduces a particle grouping strategy (PGS) that accounts for optical property discrepancies between different materials and particle sizes, thus significantly improving the fidelity of VRT simulations. Moreover, our program, developed using the compute unified device architecture (CUDA), runs in parallel on graphics processing units (GPUs), which effectively reduces the computational time. The validation results indicated that the developed PGS–MC program can accurately and efficiently simulate the polarization of complex 3D scatterers. A further investigation showed that the polarization characteristics of debris clouds are highly sensitive to parameters such as the angle between the incident and detection directions, number density, particle size distribution, debris material, and wavelength. In addition, the polarization imaging of debris clouds offers distinct advantages over intensity imaging. This study offers guidance for analyzing the VRT properties of massive HVI debris clouds. Additionally, it provides a practical tool and concrete ideas for modeling the polarization characteristics of various complex scatterers, such as aircraft contrails and clouds.
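The degree of linear polarization (DoLP) shown in this entry's figures is conventionally derived from the Stokes parameters as DoLP = sqrt(Q^2 + U^2) / I. The sketch below illustrates only that standard definition. The function name `dolp` and the sample values are assumptions made for illustration and are not taken from the PGS–MC program.

```python
import math


def dolp(i, q, u):
    """Degree of linear polarization from the Stokes parameters I, Q, U."""
    if i <= 0:
        raise ValueError("Stokes I (total intensity) must be positive")
    # hypot(q, u) is sqrt(q**2 + u**2), the linearly polarized intensity.
    return math.hypot(q, u) / i


# Hypothetical Stokes values, chosen only to illustrate the call.
print(dolp(1.0, 0.3, 0.4))
```

A fully unpolarized beam (Q = U = 0) gives a DoLP of 0, and a fully linearly polarized beam gives 1.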
Figure 1
<p>Geometric model of a debris cloud: (<b>a</b>) Normalized contour model; (<b>b</b>) 3D model. The parts in purple indicate the presence of debris particles.</p>
Figure 2
<p>Process of the PGS–MC simulation.</p>
Figure 3
<p>Schematic of simulating the physical process of debris cloud VRT using the PGS-MC program: (<b>a</b>) 3D schematic of the simulation process, (<b>b</b>) cross-section of photon tracking, and (<b>c</b>) photon transmission cases in a single voxel. The particles in (<b>b</b>,<b>c</b>) are distinguished by various colors to indicate their respective materials.</p>
Figure 4
<p>Validation of PGS–MC in a pure Rayleigh atmosphere: (<b>a</b>) Stokes <span class="html-italic">I</span> for backward scattering. (<b>b</b>) Stokes <span class="html-italic">I</span> for forward scattering. (<b>c</b>) Stokes <span class="html-italic">Q</span> for backward scattering. (<b>d</b>) Stokes <span class="html-italic">Q</span> for forward scattering. The scatter plots indicate the results of the PGS–MC, and the curves indicate the results given in Ref. [<a href="#B4-remotesensing-16-02925" class="html-bibr">4</a>].</p>
Figure 5
<p>Validation of PGS–MC in the two-layer medium model. The scatter plots indicate the results obtained using the PGS–MC, and the curves indicate the results given in Ref. [<a href="#B38-remotesensing-16-02925" class="html-bibr">38</a>].</p>
Figure 6
<p>Polarization distributions for different <math display="inline"><semantics> <msub> <mi>n</mi> <mi>r</mi> </msub> </semantics></math> and the integration method.</p>
Figure 7
<p>Scattering phase matrix coefficients calculated by PGS for Subregions 1, 5, and 8 and by the integration method: (<b>a</b>) Normalized <math display="inline"><semantics> <msub> <mi>s</mi> <mn>11</mn> </msub> </semantics></math>, (<b>b</b>) <math display="inline"><semantics> <mrow> <msub> <mi>s</mi> <mn>12</mn> </msub> <mo>/</mo> <msub> <mi>s</mi> <mn>11</mn> </msub> </mrow> </semantics></math>, (<b>c</b>) <math display="inline"><semantics> <mrow> <msub> <mi>s</mi> <mn>33</mn> </msub> <mo>/</mo> <msub> <mi>s</mi> <mn>11</mn> </msub> </mrow> </semantics></math>, and (<b>d</b>) <math display="inline"><semantics> <mrow> <msub> <mi>s</mi> <mn>43</mn> </msub> <mo>/</mo> <msub> <mi>s</mi> <mn>11</mn> </msub> </mrow> </semantics></math>.</p>
Figure 8
<p>Bidirectional <math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> distribution of debris clouds for different <math display="inline"><semantics> <msub> <mi>N</mi> <mi>p</mi> </msub> </semantics></math> values: (<b>a</b>) <math display="inline"><semantics> <mrow> <msub> <mi>N</mi> <mi>p</mi> </msub> <mo>=</mo> <mn>2</mn> <mo>×</mo> <msup> <mrow> <mn>10</mn> </mrow> <mn>7</mn> </msup> </mrow> </semantics></math>, (<b>b</b>) <math display="inline"><semantics> <mrow> <msub> <mi>N</mi> <mi>p</mi> </msub> <mo>=</mo> <mn>5</mn> <mo>×</mo> <msup> <mrow> <mn>10</mn> </mrow> <mn>7</mn> </msup> </mrow> </semantics></math>, (<b>c</b>) <math display="inline"><semantics> <mrow> <msub> <mi>N</mi> <mi>p</mi> </msub> <mo>=</mo> <mn>1</mn> <mo>×</mo> <msup> <mrow> <mn>10</mn> </mrow> <mn>8</mn> </msup> </mrow> </semantics></math>, and (<b>d</b>) <math display="inline"><semantics> <mrow> <msub> <mi>N</mi> <mi>p</mi> </msub> <mo>=</mo> <mn>2</mn> <mo>×</mo> <msup> <mrow> <mn>10</mn> </mrow> <mn>8</mn> </msup> </mrow> </semantics></math>. The angular and radial scales represent the azimuth angles (0°∼360°) and zenith angles (0°∼180°), respectively. The <math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> is depicted through a color gradient.</p>
Figure 9
<p>Bidirectional distribution of <math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> for debris clouds with different numbers of subregions with different particle sizes: (<b>a</b>) <math display="inline"><semantics> <mrow> <msub> <mi>n</mi> <mi>r</mi> </msub> <mo>=</mo> <mn>6</mn> </mrow> </semantics></math>, (<b>b</b>) <math display="inline"><semantics> <mrow> <msub> <mi>n</mi> <mi>r</mi> </msub> <mo>=</mo> <mn>7</mn> </mrow> </semantics></math>, (<b>c</b>) <math display="inline"><semantics> <mrow> <msub> <mi>n</mi> <mi>r</mi> </msub> <mo>=</mo> <mn>8</mn> </mrow> </semantics></math>, and (<b>d</b>) <math display="inline"><semantics> <mrow> <msub> <mi>n</mi> <mi>r</mi> </msub> <mo>=</mo> <mn>9</mn> </mrow> </semantics></math>. The angular and radial scales represent the azimuth angles (0°∼360°) and zenith angles (0°∼180°), respectively. The <math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> is depicted through a color gradient.</p>
Figure 10
<p>Bidirectional distribution of the <math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> debris clouds at different incidence angles. (<b>a</b>–<b>c</b>) correspond to cases with incidence angles of 0°, 30°, and 60° for <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>2.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm, respectively, and (<b>d</b>–<b>f</b>) correspond to cases for <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>4.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm. The angular and radial scales represent the azimuth (0°∼360°) and zenith (0°∼180°) angles, respectively. The <math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> is depicted using a color gradient.</p>
Figure 11
<p><math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> distribution for different <math display="inline"><semantics> <msub> <mi>m</mi> <mi>t</mi> </msub> </semantics></math>.</p>
Figure 12
<p><math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> distributions in the three particle size regions: (<b>a</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>2.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm, (<b>b</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>3.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm, (<b>c</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>4.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm, and (<b>d</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>5.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm.</p>
Figure 13
<p><math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> distributions for different attenuation coefficients <span class="html-italic">b</span>: (<b>a</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>2.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm, (<b>b</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>3.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm, (<b>c</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>4.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm, (<b>d</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>5.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm.</p>
Figure 14
<p><math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> distributions for different debris materials: (<b>a</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>2.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm, (<b>b</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>3.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm, (<b>c</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>4.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm, and (<b>d</b>) <math display="inline"><semantics> <mrow> <mi>λ</mi> <mo>=</mo> <mn>5.0</mn> <mspace width="3.33333pt"/> </mrow> </semantics></math>µm.</p>
Figure 15
<p>Polarization imaging simulation: (<b>a</b>–<b>d</b>) <math display="inline"><semantics> <mi mathvariant="italic">DoLP</mi> </semantics></math> images at four wavelengths, and (<b>e</b>–<b>h</b>) intensity images.</p>
Figure A1
<p>Schematic diagram of the pure Rayleigh atmosphere experiment.</p>
21 pages, 3978 KiB  
Article
Application and Evaluation of the AI-Powered Segment Anything Model (SAM) in Seafloor Mapping: A Case Study from Puck Lagoon, Poland
by Łukasz Janowski and Radosław Wróblewski
Remote Sens. 2024, 16(14), 2638; https://doi.org/10.3390/rs16142638 - 18 Jul 2024
Cited by 1 | Viewed by 1181
Abstract
The digital representation of the seafloor, a challenge in UNESCO’s Ocean Decade initiative, is essential for supporting sustainable development and protecting the marine environment, in line with the United Nations’ 2030 program goals. Accurate seafloor representation can be achieved through remote sensing measurements, including acoustic and laser sources, and integrating ground truth information enables a comprehensive seafloor assessment. The current seafloor mapping paradigm benefits from the object-based image analysis (OBIA) approach, which handles high-resolution remote sensing measurements effectively. A critical OBIA step is segmentation, for which various algorithms are available. Recent advances in artificial intelligence have led to the development of AI-powered segmentation algorithms, such as the Segment Anything Model (SAM) by META AI. This paper presents the first evaluation of the SAM approach for seafloor mapping. The benchmark remote sensing dataset covers Puck Lagoon, Poland, and includes measurements from various sources, primarily multibeam echosounders, bathymetric lidar, airborne photogrammetry, and satellite imagery. The SAM algorithm’s performance was evaluated on an affordable workstation equipped with an NVIDIA GPU, which enabled use of the CUDA architecture. The growing popularity of and demand for AI-based services suggest that they will be widely applied in future underwater remote sensing studies, regardless of the measurement technology used (acoustic, laser, or imagery). The application of SAM to Puck Lagoon seafloor mapping may benefit other seafloor mapping studies intending to employ AI technology.
(This article belongs to the Special Issue Advanced Remote Sensing Technology in Geodesy, Surveying and Mapping)
Figure 1
<p>Geographical representation of the study site and the remote sensing datasets used as the benchmark in this study: (<b>a</b>) location of the study site within central Europe, marked by the red area; (<b>b</b>) MBES bathymetry; (<b>c</b>) MBES backscatter; (<b>d</b>) bathymetric LiDAR intensity; (<b>e</b>) orthophoto of the study site; (<b>f</b>) SDB; (<b>g</b>) joint DEM generated by integration of MBES and ALB bathymetries.</p>
Figure 2
<p>Detailed flow chart of the methods used in this study.</p>
Figure 3
<p>The side-by-side presentation of the spatial results of SAM allowing for a comparative analysis of different parameters of SAM application and manual discrimination. Image segments were outlined by black border over SDB bathymetry: (<b>a</b>) SAM algorithm, ViT−B model type, and 3 m pixel size; (<b>b</b>) SAM algorithm, ViT−B model type, and 4 m pixel size; (<b>c</b>) SAM algorithm, ViT−B model type, and 5 m pixel size; (<b>d</b>) SAM algorithm, ViT−L model type, and 5 m pixel size; (<b>e</b>) SAM + MRS algorithms, ViT−B model type, and 4 m pixel size; (<b>f</b>) SAM + MRS algorithms, ViT−B model type, and 5 m pixel size; (<b>g</b>) SAM + MRS algorithms, ViT−L model type, and 4 m pixel size; (<b>h</b>) SAM + MRS algorithms, ViT−L model type, and 5 m pixel size; (<b>i</b>) result of manual image segmentation by expert interpretation.</p>
Figure 4
<p>(<b>a</b>) The map presents a comparative analysis of the results obtained from the application of three methods: SAM + MRS (represented by a black solid line), manual delineation (depicted by a grey dashed line), and MRS with the RF algorithm (the classification of which is expressed in a color scale). (<b>b</b>) The map presents result of RF classification over SAM + MRS outcome. Distinct types of bedforms in both maps have been identified and named according to the “symbol names” column in <a href="#remotesensing-16-02638-t002" class="html-table">Table 2</a>.</p>
28 pages, 11142 KiB  
Article
Real-Time Registration of Unmanned Aerial Vehicle Hyperspectral Remote Sensing Images Using an Acousto-Optic Tunable Filter Spectrometer
by Hong Liu, Bingliang Hu, Xingsong Hou, Tao Yu, Zhoufeng Zhang, Xiao Liu, Jiacheng Liu and Xueji Wang
Drones 2024, 8(7), 329; https://doi.org/10.3390/drones8070329 - 17 Jul 2024
Viewed by 1038
Abstract
Differences in field of view may occur during unmanned aerial remote sensing imaging applications with acousto-optic tunable filter (AOTF) spectral imagers using zoom lenses. These differences may stem from image size deformation caused by the zoom lens, image drift caused by AOTF wavelength switching, and drone platform jitter. However, they can be addressed using hyperspectral image registration. This article proposes a new coarse-to-fine remote sensing image registration framework based on feature and optical flow theory, comparing its performance with that of existing registration algorithms using the same dataset. The proposed method increases the structural similarity index by 5.2 times, reduces the root mean square error by 3.1 times, and increases the mutual information by 1.9 times. To meet the real-time processing requirements of the AOTF spectrometer in remote sensing, a development environment using VS2023+CUDA+OPENCV was established to improve the demons registration algorithm. The registration algorithm running on the central processing unit plus graphics processing unit (CPU+GPU) achieved an acceleration ratio of ~30 times compared to running on the CPU alone. Finally, the real-time registration effect of spectral data during flight was verified. The proposed method demonstrates that AOTF hyperspectral imagers can be used in real-time remote sensing applications on unmanned aerial vehicles.
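Two of the quantities quoted in this abstract, the root mean square error (RMSE) between an image pair and the CPU versus CPU+GPU acceleration ratio, follow from simple arithmetic. The sketch below shows the standard definitions only. The function names, the list-of-rows image representation, and the sample timings are hypothetical and are not taken from the article's implementation.

```python
import math


def rmse(image_a, image_b):
    """Root mean square error between two equally sized images (lists of rows)."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same shape")
    squared = [
        (a - b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    ]
    if not squared:
        raise ValueError("images must not be empty")
    return math.sqrt(sum(squared) / len(squared))


def acceleration_ratio(cpu_seconds, gpu_seconds):
    """Speedup of the accelerated run relative to the CPU-only run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


# Hypothetical 2x2 images and timings, used only to illustrate the calls.
print(rmse([[0, 0], [0, 0]], [[3, 3], [3, 3]]))
print(acceleration_ratio(3.0, 0.1))
```

A lower RMSE after registration indicates better pixel-wise agreement between spectral bands, which is the sense in which the abstract reports a reduction.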
Figure 1
<p>Composition diagram of the unmanned aerial vehicle hyperspectral imaging system based on AOTF. (<b>a</b>) Components of the AOTF imaging system based on the zoom lens: AOTF imaging spectrometer, AOTF driver, MINI-PC with GPU, and battery. (<b>b</b>) AOTF spectrometer airborne imaging system based on a zoom lens.</p>
Figure 2
<p>Core optical path structure diagram of AOTF spectrometer based on electric zoom lens.</p>
Figure 3
<p>Trend of variation between the diffraction angle of the AOTF crystal and the wavelength of incident light.</p>
Figure 4
<p>Basic steps of the GPU-based image processing-accelerated CUDA program.</p>
Figure 5
<p>Framework of the proposed registration method.</p>
Figure 6
<p>Six pairs of original images. (<b>a</b>,<b>b</b>) Image pair 1; (<b>c</b>,<b>d</b>) image pair 2; (<b>e</b>,<b>f</b>) image pair 3; (<b>g</b>,<b>h</b>) image pair 4; (<b>i</b>,<b>j</b>) image pair 5; (<b>k</b>,<b>l</b>) image pair 6. The left side of the image is a 580 nm image, and the right side is a 620 nm image.</p>
Figure 7
<p>Image registration results of the proposed algorithm: (<b>a</b>–<b>l</b>) pairs correspond to the registration results of image pairs 1–6, respectively, where the left image is an unregistered image overlay display, and the right image is a registered image overlay display.</p>
Figure 8
<p>Comparison of the details of different registration algorithms using checkerboard mosaicked images (good registration results are displayed on a checkerboard without image misalignment).</p>
Figure 9
<p>Real-time registration of remote sensing flight experiment and the waypoint data cube registration effect. (<b>a</b>) Experimental environment for RGB camera shooting. (<b>b</b>) Planned five waypoints in the experiment. Cubes 1–5: without registration (different spectral images exhibit misalignment, with ghosting in the data cube display); cubes 6–10: with registration (the registered image displays no misalignment, with no ghosting in the data cube display).</p>
Figure 10
<p>Comparison of the quantitative effect of waypoint data cube alignment and GPU alignment acceleration. (<b>a</b>–<b>e</b>) Quantitative comparison of SSIM and RMSE between the unaligned and aligned data cubes adjacent to each other for the five waypoints. (<b>f</b>) Comparison of the average processing time of data cube alignment using CPU and CPU+GPU for the five waypoints.</p>