MDPI - Publisher of Open Access Journals

21 pages, 4958 KiB

Open Access | Article

An Efficient GPU-Accelerated Algorithm for Solving Dynamic Response of Fluid-Saturated Porous Media

by Wancang Lin, Qinglong Zhou, Xinyi Chen, Wenhao Shi and Jie Ai

Mathematics 2025, 13(2), 181; https://doi.org/10.3390/math13020181 - 7 Jan 2025

Viewed by 709

Traditional finite element programs run on the CPU; however, it is challenging for the CPU to compute ultra-large-scale finite element models. In this paper, we present a set of efficient GPU-accelerated algorithms, named PNAM, for the dynamic response of fluid-saturated porous media, covering both the assembly of the global matrix and the iterative solution of the equations. In the assembly part, the global matrix is obtained in CSR storage format directly from the element matrices. For a model with two million degrees of freedom, generating all global-matrix data takes only about 1 s, which is significantly faster than the CPU version. For the iterative solution of the equations, a novel algorithm based on CUDA kernel functions is proposed. For a data set with two million degrees of freedom, computing one iterative step and transferring the data to the CPU takes only about 0.05 s. The program can compute in either single or double precision. The choice of precision has little impact on the assembly of the global matrix, but in the iterative solution part, double precision generally takes 1.5 to 2 times as long as single precision for a model with two million degrees of freedom. PNAM offers high computational efficiency and broad compatibility, and it can be applied not only to fluid-saturated porous media problems but also to a variety of other problems.
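The abstract does not give PNAM's kernels, so the following is only a serial Python sketch of the general idea of building CSR arrays directly from element matrices, assuming a three-pass scheme (sparsity pattern, prefix sum over row lengths, scatter-add). The function name `assemble_csr` and its argument layout are hypothetical. In a GPU version, the scatter-add pass would typically be parallelized over elements with atomic additions.

```python
from bisect import bisect_left


def assemble_csr(n_dofs, elements, element_matrices):
    """Assemble a global matrix in CSR form directly from element matrices.

    elements: list of lists of global DOF indices, one list per element.
    element_matrices: list of dense k x k matrices (nested lists), one per element.
    Returns (row_ptr, col_idx, values).
    """
    # Pass 1: sparsity pattern, i.e., the set of columns touched in each row.
    pattern = [set() for _ in range(n_dofs)]
    for dofs in elements:
        for i in dofs:
            pattern[i].update(dofs)

    # Pass 2: prefix-sum the row lengths into row_ptr; store sorted column indices.
    row_ptr = [0]
    col_idx = []
    for cols in pattern:
        col_idx.extend(sorted(cols))
        row_ptr.append(len(col_idx))

    # Pass 3: scatter-add every element entry into its CSR slot,
    # locating the column by binary search within the row's segment.
    values = [0.0] * len(col_idx)
    for dofs, ke in zip(elements, element_matrices):
        for a, i in enumerate(dofs):
            start, end = row_ptr[i], row_ptr[i + 1]
            for b, j in enumerate(dofs):
                pos = bisect_left(col_idx, j, start, end)
                values[pos] += ke[a][b]
    return row_ptr, col_idx, values
```

For two 2-node bar elements sharing DOF 1, each with element matrix [[1, -1], [-1, 1]], the sketch yields the familiar tridiagonal matrix [[1, -1, 0], [-1, 2, -1], [0, -1, 1]] in CSR form.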

22 pages, 620 KiB

Open Access | Article

Ego-Motion Estimation for Autonomous Vehicles Based on Genetic Algorithms and CUDA Parallel Processing

by Abiel Aguilar-González and Alejandro Medina Santiago

Algorithms 2025, 18(1), 19; https://doi.org/10.3390/a18010019 - 3 Jan 2025

Viewed by 501

Abstract

Estimating ego-motion in autonomous vehicles is critical for tasks such as localization, navigation, obstacle avoidance, and so on. While traditional methods often rely on direct pose estimation or AI-based approaches, these can be computationally intensive, especially for small, incremental movements typically observed between consecutive frames. In this work, we propose a brute-force-based ego-motion estimation algorithm that takes advantage of the constraints of autonomous vehicles, which are assumed to have only three degrees of freedom (x, y, and yaw). Our approach is based on a genetic algorithm to efficiently explore potential vehicle movements. By generating an initial seed of random motion candidates and iteratively mutating and selecting the best-performing individuals, we minimize the cost function that measures image similarity between frames. Furthermore, we implement the algorithm using CUDA to exploit parallel processing, significantly improving computational speed. Experimental results demonstrate that our approach achieves accurate ego-motion estimation with high efficiency, making it suitable for real-time autonomous vehicle applications.
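As a rough illustration of the search described above, the sketch below runs an elitist genetic algorithm over (x, y, yaw) candidates. It assumes a user-supplied `cost` function; in the article this is the cost measuring image similarity between frames, whereas the test here uses a synthetic quadratic stand-in. The function name, bounds, population size, and shrinking Gaussian mutation schedule are hypothetical choices, not the authors' settings, and the CUDA parallelization (evaluating all candidates' costs concurrently) is not shown.

```python
import random


def genetic_ego_motion(cost, pop_size=64, generations=60,
                       bounds=((-1.0, 1.0), (-1.0, 1.0), (-0.2, 0.2)),
                       elite=8, sigma=0.1, seed=0):
    """Search (x, y, yaw) motion candidates that minimize `cost` (hypothetical sketch)."""
    rng = random.Random(seed)
    # Initial seed: random (x, y, yaw) candidates drawn uniformly within the bounds.
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(pop_size)]
    for g in range(generations):
        # Selection (elitism): keep the lowest-cost individuals unchanged.
        pop.sort(key=cost)
        parents = pop[:elite]
        # Mutation: perturb random survivors with Gaussian noise whose
        # scale shrinks linearly over the generations; clamp to the bounds.
        scale = sigma * (1.0 - g / generations)
        children = []
        while len(children) < pop_size - elite:
            p = rng.choice(parents)
            child = tuple(
                min(max(v + rng.gauss(0.0, scale * (hi - lo)), lo), hi)
                for v, (lo, hi) in zip(p, bounds)
            )
            children.append(child)
        pop = parents + children
    return min(pop, key=cost)
```

Because the elite survivors are carried over unchanged, the best cost never increases from one generation to the next; on a GPU, the per-candidate cost evaluations are the natural unit of parallelism.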

(This article belongs to the Section Parallel and Distributed Algorithms)

Figure 1. Block diagram of the proposed algorithm.

Figure 2. Flow chart of the proposed algorithm.

Figure 3. The algorithm’s performance was evaluated on the KITTI dataset’s training sequences, with results demonstrating consistently high accuracy across multiple test cases, closely matching the ground truth. These outcomes confirm that the model effectively captures vehicle motion and spatial consistency within the diverse urban and suburban environments featured in KITTI. The blue line represents the ground truth, while the green line shows the ego-motion estimated by the proposed algorithm.

Figure 4. Performance of the algorithm on sequences 11 to 14 of the KITTI dataset, for which no ground truth data are available. The results were obtained from the KITTI evaluation platform, to which our algorithm was submitted for evaluation. These evaluations demonstrated that the proposed algorithm maintains a high level of precision across all tested scenarios.

Figure 5. Performance of the pose estimation step on the proposed dataset. Sequence 48, which consists of x, y, and yaw camera movement, is used to validate performance on loop trajectories. With the proposed algorithm, an accuracy of around 97.4% can be reached.

21 pages, 4675 KiB

Open Access | Article

A Parallel Framework for Fast Charge/Discharge Scheduling of Battery Storage Systems in Microgrids

by Wei-Tzer Huang, Wu-Chun Chung, Chao-Chin Wu and Tse-Yun Huang

Energies 2024, 17(24), 6371; https://doi.org/10.3390/en17246371 - 18 Dec 2024

Viewed by 544

Abstract

Fast charge/discharge scheduling of battery storage systems is essential in microgrids to effectively balance variable renewable energy sources, meet fluctuating demand, and maintain grid stability. To achieve this, parallel processing is employed, allowing batteries to respond instantly to dynamic conditions. By managing the complexity, high data volume, and rapid decision-making requirements in real time, parallel processing ensures that the microgrid operates with stability, efficiency, and safety. With the application of deep reinforcement learning (DRL) in scheduling algorithm design, the demand for computational power has increased further. To address this challenge, we propose a Ray-based parallel framework to accelerate the development of fast charge/discharge scheduling for battery storage systems in microgrids. We demonstrate how to implement a real-world scheduling problem in the framework. We focused on minimizing power losses and reducing the ramping rate of net loads by leveraging the Asynchronous Advantage Actor-Critic (A3C) algorithm and the features of the Ray cluster for real-time decision making. Multiple instances of OpenDSS were executed concurrently, with each instance simulating a distinct environment and efficiently processing input data. Additionally, Numba CUDA was used to accelerate the computationally intensive reward function in A3C on the GPU using shared memory, significantly enhancing its performance. The proposed framework enhanced scheduling performance, enabling efficient energy management in complex, dynamic microgrid environments.
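To make the two scheduling objectives concrete, here is a hypothetical Python sketch of a reward that penalizes power losses and the ramping rate of the net load. It assumes the power loss is supplied externally (in the article's framework, OpenDSS simulates the network), that positive battery power means charging, and that the ramping rate is the largest step-to-step change in net load; the weight and the function names are illustrative, not taken from the paper.

```python
def net_load_ramping(load, pv, battery):
    """Net load per time step and its largest step-to-step change.

    load, pv, battery: equal-length sequences of power per time step (e.g., kW).
    Assumed sign convention: battery > 0 charges (adds to demand), < 0 discharges.
    """
    net = [l - p + b for l, p, b in zip(load, pv, battery)]
    ramps = [abs(b - a) for a, b in zip(net, net[1:])]
    return net, max(ramps, default=0.0)


def reward(load, pv, battery, power_loss, weight=0.5):
    """Hypothetical reward: the smaller the losses and ramping, the higher the reward."""
    _, ramp = net_load_ramping(load, pv, battery)
    return -(power_loss + weight * ramp)
```

In a DRL loop, a function like this would be evaluated for every environment step, which is why the article offloads its (more elaborate) reward computation to the GPU.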

(This article belongs to the Section A1: Smart Grids and Microgrids)

40 pages, 1079 KiB

Open Access | Article

Context-Adaptable Deployment of FastSLAM 2.0 on Graphic Processing Unit with Unknown Data Association

by Jessica Giovagnola, Manuel Pegalajar Cuéllar and Diego Pedro Morales Santos

Appl. Sci. 2024, 14(23), 11466; https://doi.org/10.3390/app142311466 - 9 Dec 2024

Viewed by 965

Abstract

Simultaneous Localization and Mapping (SLAM) algorithms are crucial for enabling agents to estimate their position in unknown environments. In autonomous navigation systems, these algorithms need to operate in real time on devices with limited resources, emphasizing the importance of reducing complexity and ensuring efficient performance. While SLAM solutions aim at ensuring accurate and timely localization and mapping, one of their main limitations is their computational complexity. In this scenario, particle filter-based approaches such as FastSLAM 2.0 can significantly benefit from parallel programming due to their modular construction. The parallelization process involves identifying the parameters affecting the computational complexity in order to distribute the computation among single multiprocessors as efficiently as possible. However, the computational complexity of methodologies such as FastSLAM 2.0 can depend on multiple parameters whose values may, in turn, depend on each specific use case scenario (i.e., the context), leading to multiple possible parallelization designs. Furthermore, the features of the hardware architecture in use can significantly influence the performance in terms of latency. Therefore, the optimal parallelization modality still needs to be determined empirically. This may involve redesigning the parallel algorithm depending on the context and the hardware architecture. In this paper, we propose a CUDA-based adaptable design for FastSLAM 2.0 on GPU, in combination with an evaluation methodology that enables the assessment of the optimal parallelization modality based on the context and the hardware architecture without the need to create separate designs. The proposed implementation includes the parallelization of all the functional blocks of the FastSLAM 2.0 pipeline.
Additionally, we contribute a parallelized design of the data association step through the Joint Compatibility Branch and Bound (JCBB) method. Multiple resampling algorithms are also included to accommodate the needs of a wide variety of navigation scenarios.
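Resampling is one of the FastSLAM 2.0 blocks mentioned above. As an illustration, the sketch below implements systematic resampling, one widely used scheme; the abstract does not say which algorithms the authors include, so this choice and the function name are assumptions. Each particle index is drawn with probability proportional to its weight, using a single random offset and n evenly spaced pointers into the cumulative weights.

```python
import random
from itertools import accumulate


def systematic_resample(weights, rng=random):
    """Return n particle indices drawn by systematic resampling."""
    n = len(weights)
    total = sum(weights)
    # Normalized cumulative weights (a prefix sum, which GPUs can compute with a parallel scan).
    cumulative = [c / total for c in accumulate(weights)]
    cumulative[-1] = 1.0  # guard against floating-point round-off
    # One random offset in [0, 1/n), then n pointers spaced exactly 1/n apart.
    u0 = rng.random() / n
    indices, j = [], 0
    for k in range(n):
        u = u0 + k / n
        # Advance to the first particle whose cumulative weight exceeds the pointer.
        while cumulative[j] <= u:
            j += 1
        indices.append(j)
    return indices
```

Compared with multinomial resampling, the evenly spaced pointers give lower variance, and each pointer's search is independent, which is what makes the step amenable to parallelization.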

(This article belongs to the Special Issue Advancements in Multi-Agent Systems and Artificial Intelligence: Methodologies, Applications, and Future Trends)

10 pages, 620 KiB

Open Access | Article

Serum Tau Species in Progressive Supranuclear Palsy: A Pilot Study

by Costanza Maria Cristiani, Luana Scaramuzzino, Elvira Immacolata Parrotta, Giovanni Cuda, Aldo Quattrone and Andrea Quattrone

Diagnostics 2024, 14(23), 2746; https://doi.org/10.3390/diagnostics14232746 - 5 Dec 2024

Viewed by 722

Abstract

Background/Objectives: Progressive Supranuclear Palsy (PSP) is a tauopathy showing marked symptom overlap with Parkinson’s Disease (PD). PSP pathology suggests that tau protein might represent a valuable biomarker to distinguish between the two diseases. Here, we investigated the presence and diagnostic value of six different tau species (total tau, 4R-tau isoform, tau aggregates, p-tau202, p-tau231 and p-tau396) in serum from 13 PSP patients, 13 PD patients, and 12 healthy controls (HCs). Methods: Commercial ELISA kits were employed to assess all the tau species except total tau (t-tau), which was assessed with a commercial single molecule array (SIMOA)-based kit. Possible correlations between tau species and the biological and clinical features of our cohorts were also evaluated. Results: Among the six tau species tested, only p-tau396 was detectable in serum. The concentration of p-tau396 was significantly higher in both the PSP and PD groups than in HCs, but PSP and PD patients showed largely overlapping values. Moreover, the serum concentration of p-tau396 strongly correlated with disease severity in PSP but not in PD. Conclusions: Overall, we identified p-tau396 as the most expressed phosphorylated tau species in serum and as a potential tool for assessing PSP clinical staging. Moreover, we demonstrated that other p-tau species may be present in serum at concentrations too low to be detected by ELISA, suggesting that future work should focus on other biological matrices.

(This article belongs to the Special Issue Novel Biomarkers for Alzheimer’s Disease and Other Neurodegenerative Diseases)

20 pages, 14037 KiB

Open Access | Article

Algorithmic Efficiency in Convex Hull Computation: Insights from 2D and 3D Implementations

by Hyun Kwon, Sehong Oh and Jang-Woon Baek

Symmetry 2024, 16(12), 1590; https://doi.org/10.3390/sym16121590 - 28 Nov 2024

Cited by 2 | Viewed by 1449

Abstract

This study examines various algorithms for computing the convex hull of a set of n points in a d-dimensional space. Convex hulls are fundamental in computational geometry and are applied in computer graphics, pattern recognition, and computational biology. Convex hulls can also be useful in symmetry problems. For instance, when points are arranged symmetrically, the convex hull is also likely to be symmetric, which can be useful for object recognition in computer vision or pattern recognition. The focus is primarily on two-dimensional algorithms, including well-known methods such as Gift Wrapping, Graham Scan, Divide and Conquer, QuickHull, TORCH, Kirkpatrick–Seidel, and Chan’s algorithm. These algorithms vary in time complexity and in scalability to higher dimensions. The study is extended to three-dimensional convex hull algorithms, such as NAW and randomized insertion, and to parallelized versions such as CudaHull and CudaChain. This study aimed to elucidate the operational principles, step-by-step procedures, and comparative time complexities of each algorithm. The implementation in Python facilitates a detailed comparison of algorithmic performance through stepwise analysis and graphical outputs. The ultimate goal is to provide insights into the strengths and weaknesses of each algorithm under various scenarios, thereby offering a comprehensive guide for practical implementation.
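For reference, a compact 2D example in Python (the language used in the study) is Andrew's monotone chain, a variant of Graham Scan that sorts points by coordinates instead of by polar angle and runs in O(n log n) time. This is a generic sketch, not the authors' code; collinear boundary points are dropped, and the hull is returned in counter-clockwise order starting from the lowest-leftmost point.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))  # sort by x, then y; remove duplicates
    if len(pts) <= 2:
        return pts

    def half_hull(seq):
        chain = []
        for p in seq:
            # Pop while the last two points and p do not make a strict left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]
```

The sort dominates the running time; each point is pushed and popped at most once per chain, so the scan itself is linear.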

(This article belongs to the Section Computer)

18 pages, 1757 KiB

Open Access | Editor’s Choice | Article

End-to-End Deployment of Winograd-Based DNNs on Edge GPU

by Pierpaolo Mori, Mohammad Shanur Rahman, Lukas Frickenstein, Shambhavi Balamuthu Sampath, Moritz Thoma, Nael Fasfous, Manoj Rohit Vemparala, Alexander Frickenstein, Walter Stechele and Claudio Passerone

Electronics 2024, 13(22), 4538; https://doi.org/10.3390/electronics13224538 - 19 Nov 2024

Viewed by 940

Abstract

The Winograd algorithm reduces the computational complexity of convolutional neural networks (CNNs) by minimizing the number of multiplications required for convolutions, making it particularly suitable for resource-constrained edge devices. Concurrently, most edge hardware accelerators utilize 8-bit integer arithmetic to enhance energy efficiency and reduce inference latency, requiring the quantization of CNNs before deployment. Combining Winograd-based convolution with quantization offers the potential for both performance acceleration and reduced energy consumption. However, prior research has identified significant challenges in this combination, particularly due to numerical instability and substantial accuracy degradation caused by the transformations required in the Winograd domain, making the two techniques incompatible on edge hardware. In this work, we describe our latest training scheme, which addresses these challenges, enabling the successful integration of Winograd-accelerated convolution with low-precision quantization while maintaining high task-related accuracy. Our approach mitigates the numerical instability typically introduced during the transformation, ensuring compatibility between the two techniques. Additionally, we extend our work by presenting a custom-optimized CUDA implementation of quantized Winograd convolution for NVIDIA edge GPUs. This implementation takes full advantage of the proposed training scheme, achieving both high computational efficiency and accuracy, making it a compelling solution for edge-based AI applications. Our training approach enables significant MAC reduction with minimal impact on prediction quality. Furthermore, our hardware results demonstrate up to a 3.4× latency reduction for specific layers, and a 1.44× overall reduction in latency for the entire DeepLabV3 model, compared to the standard implementation.
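The smallest standard case, F(2,3), shows where the savings come from: two outputs of a 3-tap 1D filter are computed with 4 multiplications instead of 6, once the transformed filter is precomputed. This is the textbook formulation, not the article's CUDA kernel, and the function names are hypothetical. The factors of 1/2 in the filter transform and the additions in the input transform change the value range of the operands, which hints at why quantizing in the Winograd domain is delicate.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap 1D correlation with 4 multiplications (Winograd F(2,3)).

    d: 4 input samples, g: 3 filter taps.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (precomputed once per filter in a real implementation).
    u0, u1, u2, u3 = g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2
    # Input transform followed by the 4 element-wise multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: only additions and subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference: the same two outputs computed directly with 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

For d = [1, 2, 3, 4] and g = [1, 2, 3], both functions give [14, 20]; 2D variants such as F(2x2, 3x3) nest the same transforms along both axes.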

(This article belongs to the Section Artificial Intelligence)

22 pages, 689 KiB

Open Access | Article

GPU Accelerating Algorithms for Three-Layered Heat Conduction Simulations

by Nicolás Murúa, Aníbal Coronel, Alex Tello, Stefan Berres and Fernando Huancas

Mathematics 2024, 12(22), 3503; https://doi.org/10.3390/math12223503 - 9 Nov 2024

Viewed by 726

Abstract

In this paper, we consider the finite difference approximation for a one-dimensional mathematical model of heat conduction in a three-layered solid with interfacial conditions for temperature and heat flux between the layers. The finite difference scheme is unconditionally stable, convergent, and equivalent to the solution of two linear algebraic systems. We evaluate various methods for solving the resulting linear systems by analyzing direct and iterative solvers, including GPU-accelerated approaches using CuPy and PyCUDA. We assess performance and scalability, contributing to the advancement of computational techniques for modeling complex physical processes accurately and efficiently.
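Implicit finite difference schemes for 1D heat conduction typically lead to tridiagonal linear systems; the abstract does not state whether the article's two systems have exactly this form, so treat this as an assumption. A standard direct solver for such systems is the Thomas algorithm (Gaussian elimination specialized to tridiagonal matrices), sketched below in plain Python with a hypothetical function name. It uses no pivoting, so it assumes a well-conditioned matrix (e.g., diagonally dominant), and its sweeps are inherently sequential, which is why GPU approaches often rely on parallel variants such as cyclic reduction.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system A x = d.

    a: sub-diagonal (length n-1), b: diagonal (length n),
    c: super-diagonal (length n-1), d: right-hand side (length n).
    """
    n = len(b)
    cp = [0.0] * (n - 1)  # modified super-diagonal
    dp = [0.0] * n        # modified right-hand side
    # Forward sweep: eliminate the sub-diagonal row by row.
    denom = b[0]
    if n > 1:
        cp[0] = c[0] / denom
    dp[0] = d[0] / denom
    for i in range(1, n):
        denom = b[i] - a[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For the system with diagonal 2, off-diagonals -1, and right-hand side [1, 0, 1], the solver returns [1, 1, 1] (up to round-off) in O(n) operations.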

(This article belongs to the Special Issue Advances in High-Performance Computing, Optimization and Simulation)

17 pages, 3237 KiB

Open Access | Article

ssc-cdi: A Memory-Efficient, Multi-GPU Package for Ptychography with Extreme Data

by Yuri Rossi Tonin, Alan Zanoni Peixinho, Mauro Luiz Brandao-Junior, Paola Ferraz and Eduardo Xavier Miqueles

J. Imaging 2024, 10(11), 286; https://doi.org/10.3390/jimaging10110286 - 7 Nov 2024

Viewed by 1356

Abstract

We introduce ssc-cdi, an open-source software package from the Sirius Scientific Computing family, designed for memory-efficient, single-node multi-GPU ptychography reconstruction. ssc-cdi offers a range of reconstruction engines in Python (version 3.9.2) and C++/CUDA. It aims to develop local expertise and customized solutions that meet the specific needs of the beamlines and user community of the Brazilian Synchrotron Light Laboratory (LNLS). We demonstrate ptychographic reconstruction of beamline data and present benchmarks for the package. Results show that ssc-cdi effectively handles the extreme datasets typical of modern X-ray facilities without significantly compromising performance, offering an approach complementary to the community's well-established packages and serving as a robust tool for high-resolution imaging applications.

(This article belongs to the Special Issue Recent Advances in X-ray Imaging)

Figure 1. Diagram illustrating the batch distribution of measurements across three GPUs. Each colored block represents data inside the respective GPU. A batch of $B \le N$ measurements is distributed to GPU memory so that the wavefronts are updated in parallel. Once a GPU finishes processing and becomes available, a remaining batch of unprocessed data is loaded from RAM. After all batches have been loaded and all wavefronts updated, the new object and probe matrices are calculated by GPU0 and then broadcast to the other GPUs, so that each of them has faster access to O and P in the subsequent iteration.

Figure 2. Ptychographic reconstruction of a Siemens star measured at the CARNAÚBA beamline. The finest features of the innermost circles are spaced 15 nm from each other. The complex probe is shown in an HSV colormap, with saturation encoding the magnitude and hue encoding the phase.

Figure 3. Comparison of the simulated sample against reconstructions using the DM algorithm from different packages. The insets show a zoomed region from the red square in the object and phase reconstructions. The reconstruction for ssc-cdi used the RAAR algorithm with parameter $\beta = 1$, such that the update function equals that of DM. For PyNX and PtyPy, we used the DM engine directly. In all cases, the same initial guesses were used: random magnitude and constant phase for the object array, and an inverse Fourier transform of the averaged measurements for the probe.

Figure 4. Single-GPU performance of DM and PIE algorithms across different packages. The inset shows the same data without a log scale on the vertical axis. Missing points on some curves indicate dimensions that were not supported by a specific engine. Note that PyNX does not provide an engine for an algorithm of the PIE family for comparison.

Figure 5. Multi-GPU performance of the DM algorithm for ssc-cdi and PtyPy using batch sizes of (a) 128 and (b) 16. Missing points on some curves indicate dimensions that were not supported by an engine. The inset plots the same data without a log scale on the vertical axis.

Figure 6. Single-GPU performance of ssc-cdi for DM and PIE engines on a conventional machine. RAAR was run with batch size $B = 1$ and managed to run up to a data size of $2048^2$.
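The dynamic batch loading described in the Figure 1 caption (a GPU that becomes free pulls the next unprocessed batch from RAM) behaves like greedy list scheduling. The sketch below simulates that assignment with a priority queue of GPU availability times; the batch costs are hypothetical inputs, the function name is illustrative rather than part of ssc-cdi's API, and the final object/probe update on GPU0 and the broadcast are outside its scope.

```python
import heapq


def schedule_batches(batch_costs, n_gpus):
    """Greedy dynamic scheduling: each batch goes to the GPU that becomes free first.

    batch_costs: hypothetical processing time of each batch (arbitrary units).
    Returns (list of GPU ids, one per batch, in load order; makespan).
    """
    # Min-heap of (time at which the GPU becomes free, GPU id); ties go to the lower id.
    free_at = [(0.0, gpu) for gpu in range(n_gpus)]
    heapq.heapify(free_at)
    assignment = []
    for cost in batch_costs:
        t, gpu = heapq.heappop(free_at)   # earliest available GPU
        assignment.append(gpu)
        heapq.heappush(free_at, (t + cost, gpu))
    makespan = max(t for t, _ in free_at)  # when the last GPU finishes
    return assignment, makespan
```

With batch costs [3, 1, 1, 1, 2] on three GPUs, the fast GPUs pick up the extra batches while GPU0 is still busy, and all work finishes at time 3 instead of the 8 units a single GPU would need.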

17 pages, 1369 KiB

Open Access | Article

Enabling Parallel Performance and Portability of Solid Mechanics Simulations Across CPU and GPU Architectures

by Nathaniel Morgan, Caleb Yenusah, Adrian Diaz, Daniel Dunning, Jacob Moore, Erin Heilman, Evan Lieberman, Steven Walton, Sarah Brown, Daniel Holladay, Russell Marki, Robert Robey and Marko Knezevic

Information 2024, 15(11), 716; https://doi.org/10.3390/info15110716 - 7 Nov 2024

Viewed by 949

Abstract

Efficiently simulating solid mechanics is vital across various engineering applications. As constitutive models grow more complex and simulations scale up in size, harnessing the capabilities of modern computer architectures has become essential for achieving timely results. This paper presents advancements in running parallel simulations of solid mechanics on multi-core CPUs and GPUs using a single-code implementation. This portability is made possible by the C++ matrix and array (MATAR) library, which interfaces with the C++ Kokkos library, enabling the selection of fine-grained parallelism backends (e.g., CUDA, HIP, OpenMP, pthreads) at compile time. MATAR simplifies the transition from Fortran to C++ and Kokkos, making it easier to modernize legacy solid mechanics codes. We applied this approach to modernize a suite of constitutive models and to demonstrate substantial performance improvements across different computer architectures. This paper includes comparative performance studies using multi-core CPUs along with AMD and NVIDIA GPUs. Results are presented using a hypoelastic–plastic model, a crystal plasticity model, and the viscoplastic self-consistent generalized material model (VPSC-GMM). The results underscore the potential of using the MATAR library and modern computer architectures to accelerate solid mechanics simulations.

(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)

10 pages, 860 KiB

Open Access | Article

Erythrocytic α-Synuclein in Parkinson’s Disease and Progressive Supranuclear Palsy—A Pilot Study

by Costanza Maria Cristiani, Luana Scaramuzzino, Elvira Immacolata Parrotta, Giovanni Cuda, Aldo Quattrone and Andrea Quattrone

Biomedicines 2024, 12(11), 2510; https://doi.org/10.3390/biomedicines12112510 - 2 Nov 2024

Viewed by 884

Abstract

Background/Objectives: The current research examines the accuracy of α-synuclein in RBCs as a diagnostic biomarker for PD and PSP, despite their distinct molecular etiologies. Methods: We used ELISA to measure total, oligomeric, and p129-α-synuclein levels in erythrocytes from 8 PSP patients, 19 PD patients, and 18 healthy controls (HCs). The classification performance of RBC α-synuclein levels was investigated by receiver operating characteristic (ROC) curve analysis. We also evaluated possible correlations between RBC α-synuclein levels and the biological and clinical features of our cohorts. Results: RBC total α-synuclein was higher in PSP patients than in both PD patients and HCs, achieving good classification performance (AUC: 0.853) in distinguishing PSP patients from PD patients, with a sensitivity of 100% and a specificity of 70.6%; moreover, the levels of this biomarker positively correlated with disease severity in the PSP group. Regarding oligomeric α-synuclein and p129-α-synuclein, the latter was slightly increased in RBCs from PSP patients compared to HCs, but no correlations were detected. Conclusions: Although these findings need to be confirmed in larger studies, our pilot work suggests that RBC total α-synuclein may represent a potential molecular biomarker for the differential diagnosis and clinical staging of PSP.
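For readers less familiar with these metrics, the AUC reported above equals the probability that a randomly chosen case scores higher than a randomly chosen control (ties counted as 1/2), and sensitivity and specificity follow from a chosen cut-off. The sketch below computes them from two lists of scores; the function names, the `>= threshold` convention, and the toy values in the example are illustrative assumptions, not study data.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(s >= threshold for s in pos_scores)  # true positives
    tn = sum(s < threshold for s in neg_scores)   # true negatives
    return tp / len(pos_scores), tn / len(neg_scores)
```

With toy scores [3, 4, 5] for cases and [1, 2, 3] for controls, the AUC is 8.5/9 (about 0.94), and a cut-off of 3 gives a sensitivity of 1.0 and a specificity of 2/3, mirroring the kind of trade-off reported above.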

(This article belongs to the Special Issue Recent Advances in Understanding of the Role of Synuclein Family Members in Health and Disease: Third Edition)

65 pages, 2635 KiB

Open Access | Tutorial

Understanding the Flows of Signals and Gradients: A Tutorial on Algorithms Needed to Implement a Deep Neural Network from Scratch

by Przemysław Klęsk

Appl. Sci. 2024, 14(21), 9972; https://doi.org/10.3390/app14219972 - 31 Oct 2024

Viewed by 812

Abstract

Theano, TensorFlow, Keras, Torch, PyTorch, and other software frameworks have remarkably stimulated the popularity of deep learning (DL). Apart from all the good they achieve, the danger of such frameworks is that they unintentionally spur a black-box attitude. Some practitioners play around with the building blocks offered by frameworks and rely on them with only a superficial understanding of the internal mechanics. This paper constitutes a concise tutorial that elucidates the flows of signals and gradients in deep neural networks, enabling readers to successfully implement a deep network from scratch. By “from scratch”, we mean with access to a programming language and numerical libraries but without any components that hide DL computations underneath. To achieve this goal, the following five topics need to be well understood: (1) automatic differentiation, (2) the initialization of weights, (3) learning algorithms, (4) regularization, and (5) the organization of computations. We cover all of these topics in the paper. From a tutorial perspective, the key contributions include the following: (a) the proposition of R and S operators for tensors (reshape and stack, respectively) that facilitate the algebraic notation of computations involved in convolutional, pooling, and flattening layers; (b) a Python project named hmdl (“home-made deep learning”); and (c) consistent notation across all mathematical contexts involved. The hmdl project serves as a practical example of implementation and a reference. It was built using the NumPy and Numba modules, with their JIT and CUDA facilities applied. In the experimental section, we compare the hmdl implementation to Keras (backed by TensorFlow). Finally, we point out the consistency of the two in terms of convergence and accuracy, and we observe the superiority of the latter in terms of efficiency.
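As a minimal illustration of reverse-mode automatic differentiation (topic 1 above), the sketch below builds a scalar computation graph and propagates gradients from the output back to the inputs in reverse topological order. The `Var` class is hypothetical and much simpler than hmdl, which, per the abstract, operates on NumPy arrays accelerated with Numba; only addition and multiplication are supported here.

```python
class Var:
    """Scalar node in a computation graph for reverse-mode automatic differentiation."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs (parent node, local derivative d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    def backward(self):
        # Topologically order the graph (parents before children) with a DFS.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # Seed d(output)/d(output) = 1, then apply the chain rule from the output backwards,
        # accumulating contributions when a node feeds several consumers.
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad
```

For z = x * y + x with x = 3 and y = 4, calling `z.backward()` gives dz/dx = y + 1 = 5 and dz/dy = x = 3; note that x receives gradient from two paths, which is why gradients are accumulated with `+=`.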

(This article belongs to the Special Issue Advanced Digital Signal Processing and Its Applications)
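Automatic differentiation is the first of the five topics listed above. As a rough illustration only, the following minimal sketch implements scalar reverse-mode automatic differentiation in plain Python: each operation records its parents and local derivatives, and `backward` applies the chain rule in reverse topological order. The class `Value` and its methods are hypothetical names chosen for this sketch; they are not taken from hmdl, which, per the abstract, works with NumPy arrays and Numba acceleration.

```python
import math


class Value:
    """Scalar node in a computation graph (hypothetical; not part of hmdl)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # d(a*b)/da = b and d(a*b)/db = a
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Topologically order the graph so each node's gradient is complete
        # before it is propagated to its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for p in node._parents:
                    visit(p)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, g in zip(node._parents, node._local_grads):
                parent.grad += g * node.grad  # chain rule, summed over all paths


# Usage: gradients of a single neuron y = tanh(w*x + b)
x, w, b = Value(2.0), Value(0.5), Value(-1.0)
y = (w * x + b).tanh()
y.backward()
print(w.grad, x.grad, b.grad)  # 2.0 0.5 1.0
```

Here the pre-activation w·x + b is 0, so the derivative of tanh is 1 and the gradients reduce to ∂y/∂w = x = 2, ∂y/∂x = w = 0.5, and ∂y/∂b = 1. Gradients are accumulated with `+=` so that a node reused along several paths (for example, x·x) receives the sum of its contributions.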


13 pages, 735 KiB

Open Access Article

Proximity Elongation Assay and ELISA for the Identification of Serum Diagnostic Biomarkers in Parkinson’s Disease and Progressive Supranuclear Palsy

by Costanza Maria Cristiani, Camilla Calomino, Luana Scaramuzzino, Maria Stella Murfuni, Elvira Immacolata Parrotta, Maria Giovanna Bianco, Giovanni Cuda, Aldo Quattrone and Andrea Quattrone

Int. J. Mol. Sci. 2024, 25(21), 11663; https://doi.org/10.3390/ijms252111663 - 30 Oct 2024

Viewed by 1021

Abstract


Clinical differentiation of progressive supranuclear palsy (PSP) from Parkinson’s disease (PD) is challenging due to overlapping phenotypes and the late onset of PSP-specific symptoms, highlighting the need for easily assessable biomarkers. We used proximity elongation assay (PEA) to analyze 460 proteins in serum samples from 46 PD patients, 30 PSP patients, and 24 healthy controls. ANCOVA was used to identify the most promising proteins, and the machine learning (ML) algorithms XGBoost and random forest were used to assess their classification performance. Promising proteins were also quantified by ELISA. Moreover, correlations between serum biomarkers and biological and clinical features were investigated. We identified five proteins (TFF3, CPB1, OPG, CNTN1, TIMP4) showing different levels between PSP and PD, which achieved good performance (AUC: 0.892) when combined by ML. On the other hand, when the three most significant biomarkers (TFF3, CPB1, and OPG) were analyzed by ELISA, there was no difference between groups. Serum levels of TFF3 positively correlated with age in all subject groups, while for OPG and CPB1 such a correlation occurred in PSP patients only. Moreover, CPB1 positively correlated with disease severity in PD, while no correlations were observed in the PSP group. Overall, we identified CPB1 as correlating with PD severity, which may support clinical staging of PD. In addition, the discrepancy we observed between PEA and ELISA suggests that caution should be used when translating proteomic findings into clinical practice.

(This article belongs to the Special Issue Novel Concepts, New Perspectives, and Current Therapies for Parkinson’s Disease: 2nd Edition)
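The abstract summarizes the combined ML classifier by its AUC. As a hedged illustration of what that number measures, not of the authors' pipeline, the sketch below computes the ROC AUC directly from its probabilistic definition (equivalently, the Mann–Whitney U statistic divided by the number of positive–negative pairs): the fraction of pairs in which a positive case (say, PSP) receives a higher classifier score than a negative case (say, PD), with ties counted as one half. The function name and the example scores are hypothetical and are not study data.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outranks a negative (ties = 1/2)."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Usage with made-up classifier scores (not study data)
psp_scores = [0.9, 0.8, 0.4]
pd_scores = [0.7, 0.3, 0.4]
print(roc_auc(psp_scores, pd_scores))  # 7.5 of 9 pairs won, about 0.833
```

The double loop costs O(n·m) comparisons, which is negligible for cohorts of the size described here (46 PD and 30 PSP patients); for large samples, a rank-based formulation after sorting is preferable.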


24 pages, 830 KiB

Open Access Article

On a Simplified Approach to Achieve Parallel Performance and Portability Across CPU and GPU Architectures

by Nathaniel Morgan, Caleb Yenusah, Adrian Diaz, Daniel Dunning, Jacob Moore, Erin Heilman, Calvin Roth, Evan Lieberman, Steven Walton, Sarah Brown, Daniel Holladay, Marko Knezevic, Gavin Whetstone, Zachary Baker and Robert Robey

Information 2024, 15(11), 673; https://doi.org/10.3390/info15110673 - 28 Oct 2024

Cited by 1 | Viewed by 2353

Abstract


This paper presents software advances to easily exploit computer architectures consisting of multi-core CPUs or CPU+GPU combinations to accelerate diverse types of high-performance computing (HPC) applications using a single code implementation. The paper describes and demonstrates the performance of the open-source C++ matrix and array (MATAR) library that uniquely offers: (1) a straightforward syntax for programming productivity, (2) usable data structures for data-oriented programming (DOP) for performance, and (3) a simple interface to the open-source C++ Kokkos library for portability and memory management across CPUs and GPUs. Portability across architectures with a single code implementation is achieved by automatically switching between diverse fine-grained parallelism backends (e.g., CUDA, HIP, OpenMP, pthreads) at compile time. The MATAR library solves many longstanding challenges associated with easily writing software that can run in parallel on any computer architecture. This work benefits projects seeking to write new C++ codes while also addressing the challenges of quickly making existing Fortran codes performant and portable over modern computer architectures with minimal syntactical changes from Fortran to C++. We demonstrate the feasibility of readily writing new C++ codes and modernizing existing codes with MATAR to be performant, parallel, and portable across diverse computer architectures.

(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)
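One concrete obstacle when moving Fortran codes to C++ with minimal syntactical changes is that Fortran arrays are, by default, 1-based and column-major, whereas C and C++ arrays are 0-based and row-major. The sketch below, written in Python for consistency with the other examples on this page, shows the index arithmetic a container must perform to offer Fortran-style indexing over flat storage. The classes `FortranArray2D` and `CArray2D` are hypothetical illustrations, not MATAR's actual types or API, which the abstract does not describe.

```python
class FortranArray2D:
    """Hypothetical 2-D array with Fortran semantics: 1-based, column-major."""

    def __init__(self, nrows, ncols, fill=0.0):
        self.nrows, self.ncols = nrows, ncols
        self.data = [fill] * (nrows * ncols)  # flat, contiguous storage

    def _offset(self, i, j):
        if not (1 <= i <= self.nrows and 1 <= j <= self.ncols):
            raise IndexError((i, j))
        # First index varies fastest: A(i, j) -> (i-1) + (j-1)*nrows
        return (i - 1) + (j - 1) * self.nrows

    def __getitem__(self, ij):
        return self.data[self._offset(*ij)]

    def __setitem__(self, ij, value):
        self.data[self._offset(*ij)] = value


class CArray2D(FortranArray2D):
    """Hypothetical 2-D array with C semantics: 0-based, row-major."""

    def _offset(self, i, j):
        if not (0 <= i < self.nrows and 0 <= j < self.ncols):
            raise IndexError((i, j))
        # Last index varies fastest: A[i][j] -> i*ncols + j
        return i * self.ncols + j


# Usage: the "same" element addressed in each convention
a = FortranArray2D(2, 3)
a[2, 1] = 1.0   # Fortran A(2,1) lands at flat offset 1
b = CArray2D(2, 3)
b[1, 0] = 1.0   # C A[1][0] lands at flat offset 3
```

In both layouts, memory access is contiguous only when the loop over the fastest-varying index is innermost (the first index for the Fortran-style array, the last for the C-style one), which is one reason data layout matters for cache efficiency when porting loops between the two languages.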


23 pages, 16714 KiB

Open Access Article

A Geographically Weighted Regression–Compute Unified Device Architecture Approach to Explore the Spatial Agglomeration and Heterogeneity in Arable Land Consumption in Southwest China

by Chang Liu, Tingting Xu, Letao Han, Sapu Du and Aohua Tian

Agriculture 2024, 14(10), 1675; https://doi.org/10.3390/agriculture14101675 - 25 Sep 2024

Viewed by 884

Abstract


Arable land loss has become a critical issue in China because of rapid urbanization, industrial expansion, and unsustainable agricultural practices. While previous studies have explored the factors contributing to this loss, they often fall short in addressing the challenges of spatial heterogeneity and large-scale dataset analysis. This research introduces an innovative approach to geographically weighted regression (GWR) for assessing arable land loss in China that addresses these challenges. Focusing on Chongqing, Guizhou, and Yunnan Provinces over the past two decades, it examines the spatial autocorrelation of R² values exceeding 0.6 and of the residuals. Eight factors, comprising environmental elements (rain, evaporation, slope, digital elevation model) and human activities (distance to city, distance to roads, population, GDP), were analyzed. By visualizing and analyzing R² spatial patterns, the results reveal a clear spatial agglomeration, primarily in industrialized urban areas, highly urbanized cities, and flat terrain near rivers, influenced by GDP, population, rain, and slope. The novelty of this study is that it significantly enhances GWR computational capabilities for handling extensive datasets by utilizing Compute Unified Device Architecture (CUDA) on a high-performance GPU cloud server. Simultaneously, it conducts comprehensive analyses of the GWR model’s local results through visualization and spatial autocorrelation tools, enhancing the interpretability of the GWR model. Through spatial clustering analysis of local results, this study enables targeted exploration of factors influencing arable land changes across temporal and spatial dimensions while also evaluating the reliability of the model results.

(This article belongs to the Section Agricultural Economics, Policies and Rural Management)


Figure 1. Location of the study area (Chongqing, Guizhou, and Yunnan in China) and its elevation range.
Figure 2. The steps to implement GWR analysis with the grid method: (a) create the fishnet, and (b) use the fishnet to show the arable land loss rate. Color intensity represents the arable land loss rate; the darker the color, the higher the loss rate.
Figure 3. Steps in the analysis of arable land loss rates using a fishnet.
Figure 4. Arable land loss during 2000–2010 (a) and 2010–2020 (b) in the three provinces considered in this study.
Figure 5. Fishnet grid with arable land loss rates during 2000–2010 and 2010–2020.
Figure 6. Spatial distribution of R² during 2000–2010 and 2010–2020 in the three provinces considered in this study. The color gradient represents the distribution of adjusted R² values across intervals. Deeper colors signify higher R² values, indicating stronger explanatory power of the variables for farmland degradation, while lighter colors indicate lower R² values, reflecting weaker explanatory power. The inset shows the cluster area with high adjusted R².
Figure 7. Spatial distribution of residuals during 2000–2010 and 2010–2020 in the three provinces considered in this study. The red boxes mark areas with high residuals across the three regions over different time periods.
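GWR fits a separate weighted least-squares regression at every location, down-weighting observations by their distance from that location, and reports local coefficients and a local R². The sketch below illustrates this idea in plain Python under simplifying assumptions: a single predictor plus intercept, a fixed bandwidth, and a Gaussian distance kernel. The study itself uses eight factors, and its kernel and bandwidth selection are not described in the abstract; the function names here are hypothetical. Because each local fit depends only on the shared data and its own center, the fits are mutually independent, which is plausibly what makes GWR amenable to GPU parallelization with CUDA (for example, one thread or block per location); the paper's actual CUDA scheme is not reproduced here.

```python
import math


def gaussian_weights(coords, center, bandwidth):
    """Gaussian kernel w_j = exp(-0.5 * (d_j / h)^2) around one regression point."""
    cx, cy = center
    return [math.exp(-0.5 * (math.hypot(x - cx, y - cy) / bandwidth) ** 2)
            for x, y in coords]


def local_fit(xs, ys, weights):
    """Weighted least squares y ~ b0 + b1*x; returns (b0, b1, local R^2)."""
    sw = sum(weights)
    xbar = sum(w * x for w, x in zip(weights, xs)) / sw
    ybar = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(weights, xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum(w * (y - (b0 + b1 * x)) ** 2 for w, x, y in zip(weights, xs, ys))
    ss_tot = sum(w * (y - ybar) ** 2 for w, y in zip(weights, ys))
    return b0, b1, 1.0 - ss_res / ss_tot


def gwr(coords, xs, ys, bandwidth):
    """Fit one local model per location; each fit is independent of the others."""
    return [local_fit(xs, ys, gaussian_weights(coords, c, bandwidth)) for c in coords]


# Usage: two spatial clusters whose x-y relationship has opposite sign (made-up data)
coords = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (10.0, 0.0), (10.1, 0.0), (10.2, 0.0)]
xs = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0, -1.0, -2.0, -3.0]
for (x0, y0), (b0, b1, r2) in zip(coords, gwr(coords, xs, ys, bandwidth=1.0)):
    print(f"({x0:.1f}, {y0:.1f}): slope={b1:+.3f}, local R^2={r2:.3f}")
```

A single global regression on this toy data would average the two opposite relationships away, whereas the local slopes recover roughly +1 in one cluster and -1 in the other; this kind of spatial heterogeneity is what the mapped local R² and residuals in the study are meant to expose.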

Search Results (250)
