Disc cutters are essential for shield tunnel construction, and monitoring their wear is vital for safety and efficiency. Due to their position in the soil silo, it is more challenging to observe the wear of disc cutters directly, making accurate and efficient detection a technical challenge. However, existing methods that treat the problem as a classification task often overlook the issue of data imbalance. To solve these problems, this paper proposes an end-to-end detection method for disc cutter wear state called the Multivariate Selective Attention Prototype Network (MVSAPNet). The method introduces an attention prototype network for variable selection, which selects important features from many input parameters using a specialized variable selection network. To address the problem of imbalance in the wear data, a prototype network is used to learn the centers of the normal and wear state classes, and the detection of the wear state is achieved by detecting high-dimensional features and comparing their distances to the class centers. The method performs better on the data collected from the Ma Wan Cross-Sea Tunnel project in Shenzhen, China, with an accuracy of 0.9187 and an F1 score of 0.8978, yielding higher values than the experimental results of other classification models. Full article
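The distance-to-class-center step described in the abstract can be illustrated with a minimal sketch. It assumes the embeddings have already been produced by some feature extractor; the function names `class_prototypes` and `predict`, the Euclidean metric, and the toy data are illustrative assumptions and are not taken from MVSAPNet itself.

```python
import math

def class_prototypes(embeddings, labels):
    """Return the mean embedding (prototype) of each class."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(prototypes, query):
    """Assign the query to the class with the nearest prototype (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], query))

# Hypothetical 2-D embeddings: two "normal" samples and one "wear" sample.
protos = class_prototypes([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0]],
                          ["normal", "normal", "wear"])
print(predict(protos, [1.0, 1.0]))
```

Because each class is summarized by a single center, a minority class with few samples still gets its own reference point, which is one common argument for prototype-based classifiers on imbalanced data.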

(This article belongs to the Topic Digital and Intelligent Technologies and Application in Urban Construction, Operation, Maintenance, and Renewal)

23 pages, 69279 KiB

Open AccessArticle

A Novel Equivariant Self-Supervised Vector Network for Three-Dimensional Point Clouds

by Kedi Shen, Jieyu Zhao and Min Xie

Algorithms 2025, 18(3), 152; https://doi.org/10.3390/a18030152 - 7 Mar 2025

Abstract

For networks that process 3D data, estimating the orientation and position of 3D objects is a challenging task. This is because the traditional networks are not robust to the rotation of the data, and their internal workings are largely opaque and uninterpretable. To solve this problem, a novel equivariant self-supervised vector network for point clouds is proposed. The network can learn the rotation direction information of the 3D target and estimate the rotational pose change of the target, and the interpretability of the equivariant network is studied using information theory. The utilization of vector neurons within the network lifts the scalar data to vector representations, enabling the network to learn the pose information inherent in the 3D target. The network can perform complex rotation-equivariant tasks after pre-training, and it shows impressive performance in complex tasks like category-level pose change estimation and rotation-equivariant reconstruction. We demonstrate through experiments that our network can accurately detect the orientation and pose change of point clouds and visualize the latent features. Moreover, it performs well in invariant tasks such as classification and category-level segmentation. Full article
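The rotation-equivariance property that vector neurons rely on can be checked numerically. The sketch below assumes the common formulation in which a feature is a list of 3D vectors (a C×3 matrix) and a linear layer mixes channels only, so that rotating the input and then applying the layer equals applying the layer and then rotating. The helpers `matmul`, `rotation_z`, and `vn_linear` and all numeric values are hypothetical and do not reproduce the paper's network.

```python
import math

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rotation_z(theta):
    """3x3 rotation matrix about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def vn_linear(weights, features):
    """Channel-mixing linear map: (C_out x C_in) @ (C_in x 3)."""
    return matmul(weights, features)

features = [[1.0, 2.0, 3.0], [-1.0, 0.5, 2.0]]      # 2 vector channels
weights = [[0.2, -0.7], [1.5, 0.3], [-0.4, 0.9]]    # 2 -> 3 channels
rot = rotation_z(math.pi / 2)

rotate_then_layer = vn_linear(weights, matmul(features, rot))
layer_then_rotate = matmul(vn_linear(weights, features), rot)
max_gap = max(abs(x - y) for row_a, row_b in zip(rotate_then_layer, layer_then_rotate)
              for x, y in zip(row_a, row_b))
print(max_gap < 1e-9)
```

The equality holds because the weights act on the channel axis while the rotation acts on the coordinate axis, so the two matrix products commute by associativity.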

(This article belongs to the Section Algorithms for Multidisciplinary Applications)

Figure 1. Equivariant self-supervised vector network’s overall architecture. The input point cloud is divided into several point cloud patches, after which they are randomly masked, and then the point embed operation is carried out through the equivariant layer and token embed layer. Then, the obtained point embedding with vector information is fed into the autoencoder for pre-training. When the pre-training is finished, the decoder module will be abandoned and replaced by different fine-tuning heads. After connecting various fine-tuning heads to the front network, the network can be applied to different downstream tasks.

Figure 2. A diagram of the framework of the VN-Transformer. Its structure is similar to the standard Transformer framework. The left is the flow of tokens in the whole autoencoder framework, and the right is the internal structure of each block and the calculation process of VN-Attention.

Figure 3. Rendering of the reconstruction effect of our network on ShapeNet. The network will reconstruct the visible point clouds under different rotation states. During training, the network only learns the non-rotated points, but, in the test, the network can also reconstruct the rotated point clouds under z and SO(3) conditions well. The training in the figure uses a 60% mask rate. Meanwhile, in order to better present the results, we render the point clouds in this figure and the later results presentation figure to some extent.

Figure 4. Rendering of the reconstruction effect of our network on human point clouds. The human point clouds are trained in the same way as ShapeNet. These human point clouds were obtained by sampling on the mesh model of the HumanBody dataset. Since the initial pose in the original HumanBody dataset is confusing (i.e., the pose of the point cloud below the mesh is the origin pose), we manually adjusted the mesh to show its rough pose and appearance. The training in the figure uses a 60% mask rate.

Figure 5. Demonstration of the latent features of the network. The top of each row is the state of the input point cloud, and the bottom is the state of its corresponding latent feature, which can be seen to rotate with the rotation of the point cloud. The origin column is the point cloud without rotation, and the rest are the point clouds rotated by different angles, for example, 90°z means that the point cloud rotates 90 degrees around the z-axis.

Figure 6. Gradual rotation of point clouds and their corresponding output results. This figure shows how the network outputs different poses for different angles of the same point cloud around the same axis (z-axis). The top row is the input point cloud P after applying the rotation matrix S, and the bottom row is the predicted pose generated by the network based on the rotated point cloud $P_r$, which is generated from the point cloud P after applying the rotation matrix S. The predicted pose is expressed as the rotation matrix $S'$. After reapplying $S'$ to the origin point cloud P, a new rotation point cloud $P'_r$ can be obtained, and whether the predicted pose is consistent can be judged by comparing $P_r$ with $P'_r$.

Figure 7. Randomly rotated point cloud and their corresponding output results in 3D space. This figure shows the comparison between the pose output by the network and the origin point cloud for different point clouds rotated randomly under SO(3).

Figure 8. Segmentation of rotated point clouds. This figure illustrates the comparison of segmentation results obtained from our network’s output for different point cloud rotation states, namely, Z and SO(3), in contrast to the original point cloud.

20 pages, 2207 KiB

Open AccessArticle

A Novel TLS-Based Fingerprinting Approach That Combines Feature Expansion and Similarity Mapping

by Amanda Thomson, Leandros Maglaras and Naghmeh Moradpoor

Future Internet 2025, 17(3), 120; https://doi.org/10.3390/fi17030120 - 7 Mar 2025

Viewed by 93

Abstract

Malicious domains are part of the landscape of the internet but are becoming more prevalent and more dangerous both to companies and to individuals. They can be hosted on various technologies and serve an array of content, including malware, command and control and complex phishing sites that are designed to deceive and expose. Tracking, blocking and detecting such domains is complex, and very often it involves complex allowlist or denylist management or SIEM integration with open-source TLS fingerprinting techniques. Many fingerprinting techniques, such as JARM and JA3, are used by threat hunters to determine domain classification, but with the increase in TLS similarity, particularly in CDNs, they are becoming less useful. The aim of this paper was to adapt and evolve open-source TLS fingerprinting techniques with increased features to enhance granularity and to produce a similarity-mapping system that would enable the tracking and detection of previously unknown malicious domains. This was achieved by enriching TLS fingerprints with HTTP header data and producing a fine-grain similarity visualisation that represented high-dimensional data using MinHash and Locality-Sensitive Hashing. Influence was taken from the chemistry domain, where the problem of high-dimensional similarity in chemical fingerprints is often encountered. An enriched fingerprint was produced, which was then visualised across three separate datasets. The results were analysed and evaluated, with 67 previously unknown malicious domains being detected based on their similarity to known malicious domains and nothing else. The similarity-mapping technique produced demonstrates definite promise in the arena of early detection of malware and phishing domains. Full article
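The MinHash idea mentioned in the abstract can be sketched in a few lines. This is only an illustration of how a MinHash signature approximates Jaccard similarity between two feature sets; the feature strings (`jarm=...`, `server=...`), the number of hash functions, and the use of seeded SHA-1 as the hash family are assumptions for the example, not the authors' fingerprint format or pipeline.

```python
import hashlib

def minhash_signature(features, num_hashes=64):
    """Keep, for each seeded hash function, the minimum hash over the feature set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{feat}".encode()).digest()[:8], "big")
            for feat in features
        ))
    return signature

def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical enriched fingerprints: a TLS fingerprint token plus HTTP header tokens.
known_bad = {"jarm=abc123", "server=nginx", "x-powered-by=php", "cache=none"}
candidate = {"jarm=abc123", "server=nginx", "x-powered-by=php", "cache=max-age"}
print(estimated_jaccard(minhash_signature(known_bad), minhash_signature(candidate)))
```

In a Locality-Sensitive Hashing setup, such signatures are usually split into bands so that only domains sharing at least one band are compared, which avoids an all-pairs comparison.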

(This article belongs to the Special Issue Intrusion Detection and Resiliency in Cyber-Physical Systems and Networks)

19 pages, 6174 KiB

Open AccessArticle

Sub-Pixel Displacement Measurement with Swin Transformer: A Three-Level Classification Approach

by Yongxing Lin, Xiaoyan Xu and Zhixin Tie

Appl. Sci. 2025, 15(5), 2868; https://doi.org/10.3390/app15052868 - 6 Mar 2025

Viewed by 97

Abstract

In order to avoid the dependence of traditional sub-pixel displacement methods on interpolation method calculation, image gradient calculation, initial value estimation and iterative calculation, a Swin Transformer-based sub-pixel displacement measurement method (ST-SDM) is proposed, and a square dataset expansion method is also proposed to rapidly expand the training dataset. The ST-SDM computes sub-pixel displacement values of different scales through three-level classification tasks, and solves the problem of positive and negative displacement with the rotation relative tag value method. The accuracy of the ST-SDM is verified by simulation experiments, and its robustness is verified by real rigid body experiments. The experimental results show that the ST-SDM model has higher accuracy and higher efficiency than the comparison algorithm. Full article

25 pages, 7248 KiB

Open AccessArticle

CEEMDAN-IHO-SVM: A Machine Learning Research Model for Valve Leak Diagnosis

by Ruixue Wang and Ning Zhao

Algorithms 2025, 18(3), 148; https://doi.org/10.3390/a18030148 - 5 Mar 2025

Viewed by 121

Abstract

Due to the complex operating environment of valves, when a fault occurs inside a valve, the vibration signal generated by the fault is easily affected by the environmental noise, making the extraction of fault features difficult. To address this problem, this paper proposes a feature extraction method based on the combination of Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) and Fuzzy Entropy (FN). Due to the slow convergence speed and the tendency to fall into local optimal solutions of the Hippopotamus Optimization Algorithm (HO), an improved Hippopotamus Optimization (IHO) algorithm-optimized Support Vector Machine (SVM) model for valve leakage diagnosis is introduced to further enhance the accuracy of valve leakage diagnosis. The improved Hippopotamus Optimization algorithm initializes the hippopotamus population with Tent chaotic mapping, designs an adaptive weight factor, and incorporates adaptive variation perturbation. Moreover, the performance of IHO was proven to be optimal compared to HO, Particle Swarm Optimization (PSO), Grey Wolf Optimization (GWO), Whale Optimization Algorithm (WOA), and Sparrow Search Algorithm (SSA) by calculating twelve test functions. Subsequently, the IHO-SVM classification model was established and applied to valve leakage diagnosis. The prediction effects of the seven models, IHO-SVM, HO-SVM, PSO-SVM, GWO-SVM, WOA-SVM, SSA-SVM, and SVM, were compared and analyzed with actual data. As a result, the comparison indicated that IHO-SVM has desirable robustness and generalization, which successfully improves the classification efficiency and the recognition rate in fault diagnosis. Full article
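Population initialization with a Tent chaotic map, as mentioned in the abstract, might look roughly like the sketch below. It uses the skewed tent map with breakpoint `a`; the function name `tent_population`, the seed `x0`, the value `a=0.7`, and the search bounds are assumptions for illustration, and the IHO paper may use a different variant or parameters.

```python
def tent_population(pop_size, dim, lower, upper, x0=0.37, a=0.7):
    """Generate pop_size candidate solutions by iterating a skewed tent map.

    Each chaotic value x in [0, 1] is scaled to the search range [lower, upper].
    """
    population = []
    x = x0
    for _ in range(pop_size):
        individual = []
        for _ in range(dim):
            # Skewed tent map: rises linearly on [0, a), falls linearly on [a, 1].
            x = x / a if x < a else (1.0 - x) / (1.0 - a)
            individual.append(lower + x * (upper - lower))
        population.append(individual)
    return population

# Hypothetical setup: 5 candidates for a 2-parameter SVM search (e.g. C and gamma).
for candidate in tent_population(pop_size=5, dim=2, lower=-5.0, upper=5.0):
    print(candidate)
```

The motivation usually given for chaotic initialization is that the sequence covers the search range more evenly than a small pseudo-random sample, which can help a swarm optimizer avoid starting in a clustered region.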

(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

19 pages, 3746 KiB

Open AccessArticle

The Impact of the Human Factor on Communication During a Collision Situation in Maritime Navigation

by Leszek Misztal and Paulina Hatlas-Sowinska

Appl. Sci. 2025, 15(5), 2797; https://doi.org/10.3390/app15052797 - 5 Mar 2025

Viewed by 175

Abstract

In this paper, the authors draw attention to the significant impact of the human factor during collision situations in maritime navigation. The problems in the communication process between navigators are so excessive that the authors propose automatic communication. This is an alternative method to the current one. The presented system comprehensively performs communication tasks during a sea voyage. To reach the mentioned goal, AI methods of natural language processing and additional properties of metaontology (ontology supplemented with objective functions) are applied. Dedicated to maritime transport applications, the model for translating a natural language into an ontology consists of multiple steps and uses AI methods of classification for the recognition of a message from the ship’s bridge. The reverse model is also multi-stage and uses a created rule-based knowledge base to create natural-language sentences built on the basis of the ontology. Validation of the model’s accuracy results was conducted through accuracy assessment coefficients for information classification, commonly used in science. Receiver operating characteristic (ROC) curves represent the results in the datasets. The presented solution of the designed architecture of the system as well as algorithms developed in the software prototype confirmed the correctness of the assumptions in the described study. The authors demonstrated that it is feasible to successfully apply metaontology and machine learning methods in the proposed prototype software for ship-to-ship communication. Full article

(This article belongs to the Section Marine Science and Engineering)

18 pages, 35678 KiB

Open AccessArticle

Novelty Recognition: Fish Species Classification via Open-Set Recognition

by Manuel Córdova, Ricardo da Silva Torres, Aloysius van Helmond and Gert Kootstra

Sensors 2025, 25(5), 1570; https://doi.org/10.3390/s25051570 - 4 Mar 2025

Viewed by 135

Abstract

To support the sustainable use of marine resources, regulations have been proposed to reduce fish discards focusing on the registration of all listed species. To comply with such regulations, computer vision methods have been developed. Nevertheless, current approaches are constrained by their closed-set nature, where they are designed only to recognize fish species that were present during training. In the real world, however, samples of unknown fish species may appear in different fishing regions or seasons, requiring fish classification to be treated as an open-set problem. This work focuses on the assessment of open-set recognition to automate the registration process of fish. The state-of-the-art Multiple Gaussian Prototype Learning (MGPL) was compared with the simple yet powerful Open-Set Nearest Neighbor (OSNN) and the Probability of Inclusion Support Vector Machine (PISVM). For the experiments, the Fish Detection and Weight Estimation dataset, containing images of 2216 fish instances from nine species, was used. Experimental results demonstrated that OSNN and PISVM outperformed MGPL in both recognizing known and unknown species. OSNN achieved the best results when classifying samples as either one of the known species or as an unknown species with an F1-macro of $0.79 \pm 0.05$ and an AUROC score of $0.92 \pm 0.01$, surpassing PISVM by $0.05$ and $0.03$, respectively. Full article
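An open-set nearest-neighbor rule of the kind compared in this abstract can be sketched as a distance-ratio test: the query takes the label of its nearest training sample only if that sample is clearly closer than the nearest sample of any other class. The sketch below follows that general idea as commonly described for OSNN; the function name `osnn_predict`, the threshold value, and the 2-D toy features are assumptions, and the study's actual features and tuning are not reproduced here.

```python
import math

def osnn_predict(train_points, train_labels, query, threshold=0.7):
    """Return the nearest neighbor's label, or "unknown" if the match is ambiguous."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    nearest_point, nearest_label = ranked[0]
    # Closest training sample that belongs to a different class.
    other = next((pt for pt, lab in ranked if lab != nearest_label), None)
    if other is None:
        return nearest_label  # only one known class: no ratio can be formed
    d_other = math.dist(other, query)
    ratio = math.dist(nearest_point, query) / d_other if d_other > 0 else 1.0
    return nearest_label if ratio <= threshold else "unknown"

# Hypothetical 2-D features for two known species.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = ["species_a", "species_a", "species_b", "species_b"]
print(osnn_predict(points, labels, (0.5, 0.5)))   # close to species_a only
print(osnn_predict(points, labels, (5.0, 5.5)))   # equally far from both classes
```

A lower threshold rejects more samples as unknown, so in practice it would be tuned on held-out data that includes classes absent from training.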

(This article belongs to the Special Issue Sensor Technologies for Ocean Environments: Impact Assessment, Monitoring and Protection)

26 pages, 330 KiB

Open AccessFeature PaperArticle

Construction of Countably Infinite Programs That Evade Malware/Non-Malware Classification for Any Given Formal System

by Vasiliki Liagkou, Panagiotis E. Nastou, Paul Spirakis and Yannis C. Stamatiou

Cryptography 2025, 9(1), 16; https://doi.org/10.3390/cryptography9010016 - 4 Mar 2025

Viewed by 117

Abstract

The formal study of computer malware was initiated in the seminal work of Fred Cohen in the mid-80s, who applied elements of Computation Theory in the investigation of the theoretical limits of using the Turing Machine formal model of computation in detecting viruses. Cohen gave a simple but realistic formal definition of the characteristic actions of a computer virus as a Turing Machine that replicates itself and proved that detecting this behaviour, in general, is an undecidable problem. In this paper, we complement Cohen’s approach by providing a simple generalization of his definition of a computer virus so as to model any type of malware behaviour and showing that the malware/non-malware classification problem is, again, undecidable. Most importantly, beyond Cohen’s work, our work provides a generic theoretical framework for studying anti-malware applications and identifying, at an early stage, before their deployment, several of their inherent vulnerabilities which may lead to the construction of zero-day exploits and malware strains with stealth properties. To this end, we show that for any given formal system, which can be seen as an anti-malware formal model, there are infinitely many, effectively constructible programs for which no proof can be produced by the formal system that they are either malware or non-malware programs. Moreover, infinitely many of these programs are, indeed, malware programs which evade the detection powers of the given formal system. Full article

35 pages, 5528 KiB

Open AccessReview

Vehicle to Grid: Technology, Charging Station, Power Transmission, Communication Standards, Techno-Economic Analysis, Challenges, and Recommendations

by Parag Biswas, Abdur Rashid, A. K. M. Ahasan Habib, Md Mahmud, S. M. A. Motakabber, Sagar Hossain, Md. Rokonuzzaman, Altaf Hossain Molla, Zambri Harun, Md Munir Hayet Khan, Wan-Hee Cheng and Thomas M. T. Lei

World Electr. Veh. J. 2025, 16(3), 142; https://doi.org/10.3390/wevj16030142 - 3 Mar 2025

Viewed by 378

Abstract

Electric vehicles (EVs) must be used as the primary mode of transportation as part of the gradual transition to more environmentally friendly clean energy technology and cleaner power sources. Vehicle-to-grid (V2G) technology has the potential to improve electricity demand, control load variability, and improve the sustainability of smart grids. The operation and principles of V2G and its varieties, the present classifications and types of EVs sold on the market, applicable policies for V2G and business strategy, implementation challenges, and current problem-solving techniques have not been thoroughly examined. This paper exposes the research gap in the V2G area and more accurately portrays the present difficulties and future potential in V2G deployment globally. The investigation starts by discussing the advantages of the V2G system and the necessary regulations and commercial representations implemented in the last decade, followed by a description of the V2G technology, charging communication standards, issues related to V2G and EV batteries, and potential solutions. A few major issues were brought to light by this investigation, including the lack of a transparent business model for V2G, the absence of stakeholder involvement and government subsidies, the excessive strain that V2G places on EV batteries, the lack of adequate bidirectional charging and standards, the introduction of harmonic voltage and current into the grid, and the potential for unethical and unscheduled V2G practices. The results of recent studies and publications from international organizations were altered to offer potential answers to these research constraints and, in some cases, to highlight the need for further investigation. V2G holds enormous potential, but the plan first needs a lot of financing, teamwork, and technological development. Full article

(This article belongs to the Special Issue Electric Vehicles and Smart Grid Interaction)

24 pages, 5117 KiB

Open AccessArticle

Estimation of Aboveground Biomass of Picea schrenkiana Forests Considering Vertical Zonality and Stand Age

by Guohui Zhang, Donghua Chen, Hu Li, Minmin Pei, Qihang Zhen, Jian Zheng, Haiping Zhao, Yingmei Hu and Jingwei Fan

Forests 2025, 16(3), 445; https://doi.org/10.3390/f16030445 - 1 Mar 2025

Viewed by 198

Abstract

The aboveground biomass (AGB) of forests reflects the productivity and carbon-storage capacity of the forest ecosystem. Although AGB estimation techniques have become increasingly sophisticated, the relationships between AGB, spatial distribution, and growth stages still require further exploration. In this study, the Picea schrenkiana (Picea schrenkiana var. tianschanica) forest area in the Kashi River Basin of the Ili River Valley in the western Tianshan Mountains was selected as the research area. Based on forest resources inventory data, Gaofen-1 (GF-1), Gaofen-6 (GF-6), Gaofen-3 (GF-3) Polarimetric Synthetic Aperture Radar (PolSAR), and DEM data, we classified the Picea schrenkiana forests in the study area into three cases: the Whole Forest without vertical zonation and stand age, Vertical Zonality Classification without considering stand age, and Stand-Age Classification without considering vertical zonality. Then, for each case, we used eXtreme Gradient Boosting (XGBoost), Back Propagation Neural Network (BPNN), and Residual Networks (ResNet), respectively, to estimate the AGB of forests in the study area. The results show that: (1) The integration of multi-source remote-sensing data and the ResNet can effectively improve the remote-sensing estimation accuracy of the AGB of Picea schrenkiana. (2) Furthermore, classification by vertical zonality and stand ages can reduce the problems of low-value overestimation and high-value underestimation to a certain extent. Full article

(This article belongs to the Special Issue Modeling Aboveground Forest Biomass: New Developments)

17 pages, 1206 KiB

Open AccessArticle

A Smoothing Newton Method for Real-Time Pricing in Smart Grids Based on User Risk Classification

by Linsen Song and Gaoli Sheng

Mathematics 2025, 13(5), 822; https://doi.org/10.3390/math13050822 - 28 Feb 2025

Viewed by 244

Abstract

Real-time pricing is an ideal pricing mechanism for regulating the balance of power supply and demand in smart grid. Considering the differences in electricity consumption risks among different types of users, a social welfare maximization model with user risk classification is proposed in this paper. Also, a smoothing Newton method is investigated for solving the proposed model. Firstly, the convexity of the model is discussed, which implies that the local optimum of the model is also the global optimum. Then, by transforming the proposed model into a smooth equation system based on the Karush–Kuhn–Tucker (KKT) conditions, we devise a smoothing Newton algorithm integrated with Powell–Wolfe line search criteria. The nonsingularity of the corresponding function’s Jacobian matrix is obtained to ensure the stability of the proposed algorithm. Finally, we give a comparison between the proposed model and the unclassified risk model and the proposed algorithm and the distributed algorithm for real-time pricing, time-of-use pricing, and fixed pricing, respectively. The numerical results demonstrate the effectiveness of the model and the algorithm. Full article

Figure 1. The flow chart of Algorithm 1.

Figure 2. Comparison of price between the risk-classified model and unclassified model under different scales of users based on the smoothing Newton algorithm.

Figure 3. Comparison of the social welfare between risk-classified model and unclassified model under different scales of users based on the smoothing Newton algorithm.

Figure 4. Comparison of price and the social welfare between the smoothing Newton algorithm and the distributed algorithm.

Figure 5. Comparison of price and the social welfare between the RTP, TOU, and FP strategies.

16 pages, 2179 KiB

Open AccessArticle

MNv3-MFAE: A Lightweight Network for Video Action Recognition

by Jie Liu, Wenyue Liu and Ke Han

Electronics 2025, 14(5), 981; https://doi.org/10.3390/electronics14050981 - 28 Feb 2025

Viewed by 211

Abstract

Video action recognition aims to achieve the automatic classification of human behaviors by analyzing the actions in videos, with its core lying in accurately capturing the spatial detail features of images and the temporal dynamic features among video frames. In response to the problems of limited action recognition accuracy in videos containing complex temporal dynamics and large network model parameters, this paper proposes an innovative multi-feature fusion information modeling method. This paper designs a plug-and-play multi-feature action extraction (MFAE) module. The module adopts a multi-branch parallel processing strategy and integrates the functions of modeling and extracting temporal features, spatial features, and motion features to ensure the efficient modeling of the spatio-temporal information, inter-frame differences, and temporal dependencies of video actions. Meanwhile, the network employs a lightweight channel attention module (TiedSE), which reduces the complexity of the network model and decreases the number of network parameters. Finally, the effectiveness of the model is demonstrated on the Jester dataset, SomethingV2 dataset, and UCF101 dataset, achieving accuracies of 94.01%, 66.19%, and 96.74% with only 1.45 M parameters, significantly fewer than existing algorithms. The proposed method balances accuracy and computational efficiency in video action recognition, overcoming the shortcomings of traditional algorithms in temporal modeling and demonstrating its effectiveness in the task of video action recognition. Full article

17 pages, 72606 KiB

Open AccessArticle

Classification of Large Scale Hyperspectral Remote Sensing Images Based on LS3EU-Net++

by Hengqian Zhao, Zhengpu Lu, Shasha Sun, Pan Wang, Tianyu Jia, Yu Xie and Fei Xu

Remote Sens. 2025, 17(5), 872; https://doi.org/10.3390/rs17050872 - 28 Feb 2025

Viewed by 155

Abstract

Aimed at the limitation that existing hyperspectral classification methods were mainly oriented to small-scale images, this paper proposed a new large-scale hyperspectral remote sensing image classification method, LS3EU-Net++ (Lightweight Encoder and Integrated Spatial Spectral Squeeze and Excitation U-Net++). The method optimized the U-Net++ architecture by introducing a lightweight encoder and combining the Spatial Spectral Squeeze and Excitation (S3E) Attention Module, which maintained the powerful feature extraction capability while significantly reducing the training cost. In addition, the model employed a composite loss function combining focal loss and Jaccard loss, which could focus more on difficult samples, thus improving pixel-level accuracy and classification results. To solve the sample imbalance problem in hyperspectral images, this paper also proposed a data enhancement strategy based on “copy–paste”, which effectively increased the diversity of the training dataset. Experiments on large-scale satellite hyperspectral remote sensing images from the Zhuhai-1 satellite demonstrated that LS3EU-Net++ exhibited superiority over the U-Net++ benchmark. Specifically, the overall accuracy (OA) was improved by 5.35%, and the mean Intersection over Union (mIoU) by 12.4%. These findings suggested that the proposed method provided a robust solution for large-scale hyperspectral image classification, effectively balancing accuracy and computational efficiency. Full article
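A composite of focal loss and Jaccard loss, as described in the abstract, can be illustrated for the binary per-pixel case. The sketch below uses the standard focal-loss form and a soft Jaccard (IoU) loss; the equal weighting, `gamma=2.0`, the function names, and the toy probabilities are assumptions, since the abstract does not state how LS3EU-Net++ combines or parameterizes the two terms.

```python
import math

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss: down-weights pixels that are already well classified."""
    total = 0.0
    for p, y in zip(probs, targets):
        p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
        p_t = min(max(p_t, eps), 1.0)           # clamp to avoid log(0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)

def jaccard_loss(probs, targets, eps=1e-7):
    """Soft Jaccard loss: 1 - |intersection| / |union| computed on probabilities."""
    intersection = sum(p * y for p, y in zip(probs, targets))
    union = sum(probs) + sum(targets) - intersection
    return 1.0 - (intersection + eps) / (union + eps)

def composite_loss(probs, targets, weight=0.5):
    """Weighted sum of the two terms; the 0.5 weighting is an assumption."""
    return weight * focal_loss(probs, targets) + (1.0 - weight) * jaccard_loss(probs, targets)

targets = [1, 0, 1]
print(composite_loss([0.9, 0.1, 0.8], targets))   # confident, mostly correct
print(composite_loss([0.1, 0.9, 0.2], targets))   # confident, mostly wrong
```

The focal term concentrates the gradient on hard pixels, while the Jaccard term scores region overlap directly, which is why the two are often paired when classes are imbalanced.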

(This article belongs to the Topic Hyperspectral Imaging and Signal Processing)

19 pages, 7206 KiB

Open AccessArticle

Optimizing Model Performance and Interpretability: Application to Biological Data Classification

by Zhenyu Huang, Xuechen Mu, Yangkun Cao, Qiufen Chen, Siyu Qiao, Bocheng Shi, Gangyi Xiao, Yan Wang and Ying Xu

Genes 2025, 16(3), 297; https://doi.org/10.3390/genes16030297 - 28 Feb 2025

Viewed by 260

Abstract

This study introduces a novel framework that simultaneously addresses the challenges of performance accuracy and result interpretability in transcriptomic-data-based classification. Background/objectives: In biological data classification, it is challenging to achieve both high performance accuracy and interpretability at the same time. This study presents a framework to address both challenges in transcriptomic-data-based classification. The goal is to select features, models, and a meta-voting classifier that optimizes both classification performance and interpretability. Methods: The framework consists of a four-step feature selection process: (1) the identification of metabolic pathways whose enzyme-gene expressions discriminate samples with different labels, aiding interpretability; (2) the selection of pathways whose expression variance is largely captured by the first principal component of the gene expression matrix; (3) the selection of minimal sets of genes, whose collective discerning power covers 95% of the pathway-based discerning power; and (4) the introduction of adversarial samples to identify and filter genes sensitive to such samples. Additionally, adversarial samples are used to select the optimal classification model, and a meta-voting classifier is constructed based on the optimized model results. Results: The framework applied to two cancer classification problems showed that in the binary classification, the prediction performance was comparable to the full-gene model, with F1-score differences between −5% and 5%. In the ternary classification, the performance was significantly better, with F1-score differences ranging from −2% to 12%, while also maintaining excellent interpretability of the selected feature genes. Conclusions: This framework effectively integrates feature selection, adversarial sample handling, and model optimization, offering a valuable tool for a wide range of biological data classification problems. Its ability to balance performance accuracy and high interpretability makes it highly applicable in the field of computational biology. Full article

(This article belongs to the Section Bioinformatics)

28 pages, 1129 KiB

Open AccessArticle

Mass Generation of Programming Learning Problems from Public Code Repositories

by Oleg Sychev and Dmitry Shashkov

Big Data Cogn. Comput. 2025, 9(3), 57; https://doi.org/10.3390/bdcc9030057 - 28 Feb 2025

Viewed by 234

Abstract

We present an automatic approach for generating learning problems for teaching introductory programming in different programming languages. The current implementation allows input and output in the three most popular programming languages for teaching introductory programming courses: C++, Java, and Python. The generator stores learning problems using the “meaning tree”, a language-independent representation of a syntax tree. During this study, we generated a bank of 1,428,899 learning problems focused on the order of expression evaluation. They were generated in about 16 h. The learning problems were classified for further use with the used concepts, possible domain-rule violations, and required skills; they covered a wide range of difficulties and topics. The problems were validated by automatically solving them in an intelligent tutoring system that recorded the actual skills used and violations made. The generated problems were favorably assessed by 10 experts: teachers and teaching assistants in introductory programming courses. They noted that the problems are ready for use without further manual improvement and that the classification system is flexible enough to receive problems with desirable properties. The proposed approach combines the advantages of different state-of-the-art methods. It combines the diversity of learning problems generated by restricted randomization and large language models with full correctness and a natural look of template-based problems, which makes it a good fit for large-scale learning problem generation. Full article

(This article belongs to the Special Issue Application of Semantic Technologies in Intelligent Environment)
