Ebook: Fuzzy Systems and Data Mining IX
Fuzzy systems and data mining are indispensable aspects of the digital technology on which we now all depend. Fuzzy logic is intrinsic to applications in the electrical, chemical and engineering industries, as well as in management and environmental science. Data mining is indispensable for dealing with big data, massive data, and scalable, parallel and distributed algorithms.
This book presents the proceedings of FSDM 2023, the 9th International Conference on Fuzzy Systems and Data Mining, held from 10-13 November 2023 as a hybrid event, with some participants attending in Chongqing, China, and others online. The conference focuses on four main areas: fuzzy theory, algorithms and systems; fuzzy application; data mining; and the interdisciplinary field of fuzzy logic and data mining, and provides a forum for experts, researchers, academics and representatives from industry to share the latest advances in the field of fuzzy sets and data mining. This year, topics from two special sessions on granular-ball computing and the application of generative AI, as well as machine learning and neural networks, were also covered. A total of 363 submissions were received, and after careful review by the members of the international program committee, 110 papers were accepted for presentation at the conference and publication here, representing an acceptance rate of just over 30%.
Covering a comprehensive range of current research and developments in fuzzy logic and data mining, the book will be of interest to all those working in the field of data science.
A decade after its inception, the International Conference series on Fuzzy Systems and Data Mining (FSDM) has become established as a mature event. It deals with four main topic groups: a) fuzzy theory, algorithms and systems, b) fuzzy application, c) the interdisciplinary field of fuzzy logic and data mining, and d) data mining. It is a forum for experts, researchers, academics and representatives from industry to share the latest advances in the field of fuzzy sets and data mining. From the outset, the proceedings have formed part of the prestigious book series, Frontiers in Artificial Intelligence and Applications (FAIA), published by IOS Press.
This book contains the papers accepted and presented at the 9th International Conference on Fuzzy Systems and Data Mining (FSDM 2023), held from 10–13 November 2023 in hybrid mode, with most participants gathered in-person in Chongqing, China, and some online. The conference was organized by Chongqing University of Posts and Telecommunications.
All papers were carefully reviewed by members of the international program committee (listed at http://www.fsdmconf.org/TPC) and peer reviewers (listed at http://www.academicconf.com/reviewerlist?confname=fsdm2023), taking into consideration the breadth and depth of the research topics that fall within the scope of FSDM, focusing not only on the four main topic groups, but also on the themes of two special sessions on granular-ball computing and the application of generative AI, as well as machine learning and neural networks.
FSDM 2023 received 363 submissions, and after a vibrant discussion stage, 110 papers were accepted by the committee, representing an acceptance rate of just over 30%.
We would like to thank all the speakers and authors for their effort in preparing a contribution for this leading international conference. Moreover, we are very grateful to all those who devoted their time to the evaluation of the papers, especially the reviewers and the members of the program committee. It is also a great honor to continue with the publication of these proceedings in the prestigious FAIA series from IOS Press.
October 2023
Antonio J. Tallón-Ballesteros, Department of Electronics, Computer Systems and Automation Engineering, University of Huelva, Huelva, Spain
Raquel Beltrán-Barba, Department of Integrated Sciences, University of Huelva, Huelva, Spain
Big data repositories contain high-value data from which actionable knowledge can be meaningfully derived in order to support a wide spectrum of modern applications, such as smart cities, social networks, e-science, bio-informatics, and so forth. How can these interesting patterns be extracted from such large-scale repositories? This remains a fundamental open research question. Inspired by this challenge, this paper explores the issue of supporting advanced Machine Learning (ML) structures over big data repositories, with the final goal of realizing meaningful knowledge discovery tasks. These “structures” are programs rather than tasks: they incorporate ML procedures within high-level (program) controls whose main goal is to magnify the expressive power of the whole big data analytics process, implemented as a collection of singleton big data analytics tasks. In turn, each task is implemented in terms of a proper advanced ML structure. The paper provides an introduction to and motivation for the investigated problem, an analysis of related work, and the proposal of a reference architecture supporting these innovative structures.
This paper is devoted to a novel method for the automated detection of common eye gaze movement patterns by mining the data recorded in eye-tracking-based experiments. For this, a model of an aggregated scanpath is proposed that represents a fuzzy set of all possible eye gaze trajectories found in the experimental data. In contrast to traditional methods of aggregation, no averaging is used, to avoid information loss. Instead, the membership function determines the probability of each particular trajectory. The constructed fuzzy scanpath is then filtered and automatically analyzed by applying methods of network science. For this, fixations (eye gaze stops) are represented as network nodes and saccades (eye gaze jumps) are mapped to network links. For the composed network, modularity is calculated utilizing the Louvain method of community detection. In the case of eye gaze data, modularity represents saccadic cycles, which can be mapped to cycles of cognitive processing. Thereby, the common perception structure is retrieved. To support all the analysis steps, we propose corresponding scalable visualization tools based on our visual analytics platform SciVi. We demonstrate the viability of our approach by analyzing data obtained from a real-world eye-tracking-based experiment from the Digital Humanities application domain. Preliminary experiment results are discussed along with the efficacy of the proposed methods.
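As a minimal illustration of the modularity measure that the Louvain method optimizes, here is a pure-Python sketch of Newman's Q for a fixed partition (the toy graph and partition below are illustrative assumptions, not data from the experiment):

```python
def modularity(edges, partition):
    """Newman modularity Q of a partition of an undirected graph.
    edges: list of (u, v) pairs; partition: dict node -> community id."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of edges falling inside each community.
    internal = {}
    for u, v in edges:
        if partition[u] == partition[v]:
            internal[partition[u]] = internal.get(partition[u], 0) + 1
    # Total degree of each community.
    deg_sum = {}
    for node, c in partition.items():
        deg_sum[c] = deg_sum.get(c, 0) + degree.get(node, 0)
    return sum(internal.get(c, 0) / m - (deg_sum[c] / (2 * m)) ** 2
               for c in set(partition.values()))

# Two triangles joined by a single bridge edge: the natural two-community
# split yields a clearly positive modularity (5/14 ≈ 0.357).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, part), 4))
```

In the eye-tracking setting, nodes would be fixations and edges saccades; communities with high Q then correspond to the saccadic cycles discussed above.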
Harmony level prediction is receiving increasing attention nowadays. Color plays a crucial role in affecting human aesthetic responses. In this paper, we explore color harmony using a fuzzy-based color model and address the question of its universality. For our experiments, we utilize a dataset containing attractive images from five domains: fashion, art, nature, interior design, and brand logos. Using a fuzzy approach, we aim to identify harmony patterns and dominant color palettes within these images. The fuzzy approach is well suited to this task because it can handle the inherent subjectivity and contextual variability associated with aesthetics and color harmony evaluation. Our experimental results suggest that color harmony is largely universal. Additionally, our findings reveal that color harmony is not solely influenced by hue relationships on the color wheel but also by the saturation and intensity of colors. In palettes with high harmony levels, we observed a prevalent adherence to color wheel principles while maintaining moderate levels of saturation and intensity. These findings contribute to ongoing research on color harmony and its underlying principles, offering valuable insights for designers, artists, and researchers in the field of aesthetics.
The homomorphic image of a fuzzy module of an R-module is a fuzzy module [1]. In one of our papers [2] we proved that if f : ℤn → ℤm is a ℤ-module homomorphism with gcd(n, m) = p, a prime, and λ is any fuzzy module on ℤn with any level cardinality, then the fuzzy module f(λ) on ℤm has level cardinality at most 3. In this paper, we consider the fuzzy module homomorphism between the ℤ-modules ℤn and ℤm, where n, m ∈ ℤ+ with gcd(n, m) = pq for primes p and q, and determine the level cardinality of the fuzzy module f(λ) on ℤm when λ is a fuzzy module on ℤn.
Electric vehicles (EVs) are becoming increasingly popular as a cleaner alternative to traditional gasoline-powered vehicles due to advances in battery technology, climate change impacts, and a host of other factors. However, one of the major challenges to the widespread adoption of EVs is the lack of sufficient charging infrastructure. In this study, we explore the use of predictive and optimization algorithms to estimate future demand and find the optimal placement and allocation of EV charging stations in a given charging network. For demand forecasting, we develop a neural network model that takes into account the location of charging stations and historical demand data to predict future EV charging demand. These predicted demands are then used to optimize the infrastructure. Specifically, we use the covariance matrix adaptation evolution strategy (CMA-ES) to optimize the placement and allocation of charging stations based on factors such as predicted demand, infrastructure costs, driving distance to charging stations, and available space. We compare the results of this approach to a baseline approach that allocates charging stations based on simple heuristics. Our results show that our proposed optimization algorithm effectively handles uncertainty, which can lead to significant improvements in the efficiency and effectiveness of EV infrastructure planning and help accelerate EV adoption.
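The paper uses full CMA-ES; as a greatly simplified illustration of the evolution-strategy idea behind it, here is a plain isotropic (μ, λ)-ES without covariance adaptation, applied to a hypothetical one-station placement objective (the objective, demand points, and all parameters below are illustrative assumptions, not the paper's model):

```python
import random

def simple_es(fitness, dim, sigma=0.5, pop=20, parents=5, gens=100, seed=0):
    """Minimal isotropic (mu, lambda) evolution strategy. Unlike full
    CMA-ES, it uses a fixed spherical search distribution with a simple
    step-size decay instead of covariance matrix adaptation."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    for _ in range(gens):
        # Sample a population around the current mean, keep the best parents.
        cand = [[m + rng.gauss(0, sigma) for m in mean] for _ in range(pop)]
        cand.sort(key=fitness)
        best = cand[:parents]
        mean = [sum(c[i] for c in best) / parents for i in range(dim)]
        sigma *= 0.98  # crude step-size schedule
    return mean

# Toy "placement" objective: put one station (x, y) minimizing total
# squared distance to hypothetical demand points (optimum: centroid (2, 2)).
demand = [(1.0, 1.0), (3.0, 1.0), (2.0, 4.0)]
f = lambda p: sum((p[0] - x) ** 2 + (p[1] - y) ** 2 for x, y in demand)
sol = simple_es(f, dim=2)
print([round(v, 2) for v in sol])
```

The real multi-station problem in the paper also weighs infrastructure cost, driving distance, and available space; those terms would simply be added into the fitness function.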
Self-medication is a widespread practice throughout the world, especially in the Syrian Arab Republic. However, if done incorrectly, this practice might be harmful. Data stream mining, which involves analyzing vast amounts of data from multiple sources, has proven to be an effective technique for enhancing self-medication practices. The role of data stream mining, and how it may evaluate information from various sources such as social media, electronic health records, pharmacy sales data, and sensor-based medical devices, is explored in this research. Healthcare professionals in Syria can improve patient outcomes and safety related to self-medication by employing data stream mining to uncover important information about patient behaviors, drug effectiveness, and adverse events. In this paper, a strategy for generating evidence-based medical data utilizing data stream mining techniques is suggested. Here, methods including association rule mining, classification, and data clustering are described. For future progression of this research, the information gap and other data collection-related problems need to be resolved. The implementation demonstrates that healthcare professionals can use a variety of data stream mining techniques to better understand drug usage patterns and spot opportunities, while also learning about the prevalence of self-medication in different parts of the Syrian Arab Republic.
The Net Promoter Score (NPS) is often used in customer experience programs for measuring customer loyalty. Increasingly, companies seek to automatically process millions of pieces of customer feedback from social media each month in order to estimate their NPS, leveraging advanced analytics such as machine learning (ML) and natural language processing (NLP). Discovering trends and themes in customer interactions helps explain the NPS, empowering companies to improve products and customer experience. In this paper, we describe an end-to-end solution for NPS estimation and explanation from social media. The process includes sentiment analysis on user comments, estimating product information based on text semantics, grouping and tagging user comments for text discovery, and NPS explanation. The solution gives companies the capability to identify overall customer sentiment and common topics in a unified platform, allowing faster analysis and insights on NPS based on customer feedback.
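For context, the NPS itself is computed from 0-10 ratings as the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6); a minimal sketch (the ratings below are hypothetical, standing in for scores inferred by sentiment analysis):

```python
def net_promoter_score(scores):
    """NPS from 0-10 ratings: % promoters (9-10) minus % detractors (0-6).
    Passives (7-8) count toward the total but neither group."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / n

# Hypothetical ratings: 5 promoters, 2 passives, 3 detractors -> NPS 20.
print(net_promoter_score([10, 9, 9, 8, 7, 6, 3, 10, 2, 9]))  # → 20.0
```

The pipeline described above effectively maps each social-media comment to such a rating via sentiment analysis before aggregating.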
A Convolutional Neural Network (CNN) is a branch of Deep Learning widely used for image classification. CNNs have complex architectures and are capable of achieving high accuracy and producing good results. However, CNNs have limitations, especially when dealing with noisy images. Image noise can decrease classification performance and increase network training time. This research tested the robustness of two CNN architectures, VGG16 and ResNet50, in processing images with Gaussian noise added at various levels, without any preprocessing. The dataset used is the Rice Image Dataset, which consists of images of five different types of rice. The data is divided into three parts: 70% for training, 20% for testing, and 10% for validation. The results of this study show that as the variance used to generate the Gaussian noise increases, the loss function value consistently increases and the accuracy decreases. However, the increase in loss and the decrease in accuracy do not differ significantly among the different noise levels.
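The noise-injection step described above can be sketched as follows (a toy grayscale image is assumed here; the study itself perturbs color images from the Rice Image Dataset):

```python
import random

def add_gaussian_noise(image, sigma, seed=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to a
    grayscale image (list of rows of pixel values in [0, 255]), clipping
    the result back into the valid intensity range."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma)))
             for px in row] for row in image]

# Hypothetical 2x3 image; larger sigma corresponds to the higher noise
# levels at which the paper observes rising loss and falling accuracy.
img = [[0, 128, 255], [64, 192, 32]]
noisy = add_gaussian_noise(img, sigma=25.0, seed=42)
print(all(0 <= px <= 255 for row in noisy for px in row))
```

Sweeping `sigma` over a range of values reproduces the "various levels" of noise variability the experiment evaluates.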
This paper underscores the significance of employing fuzzy methods for monitoring the risk attitudes of air traffic controllers. It offers an account of the implementation of such a method within a student group. The development of membership functions is elaborated upon, including their alignment with the ICAO recommended scale. Additionally, the paper outlines the calculation of personal opinions for the membership functions of the fuzzy variable's term set. It also presents the integral comparison of personal opinions with the group opinion for the entire set of fuzzy variable terms. The distribution of survey participants along a competence scale and future research directions are discussed.
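As a generic illustration of the kind of membership function such a method rests on (the triangular form and the 1-5 risk scale below are assumptions for illustration, not the paper's ICAO-aligned functions):

```python
def tri_membership(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b:
    degree of membership of x in the fuzzy term, in [0, 1]."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Membership of a rating of 2.5 in a hypothetical "moderate risk" term
# defined on a 1-5 risk-attitude scale.
print(tri_membership(2.5, 1.0, 3.0, 5.0))  # → 0.75
```

Each surveyed controller's personal opinion could then be encoded as such a function per term, and aggregated against the group opinion.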
Chronic illnesses like cancer and diabetes are a major burden worldwide, especially in poorer countries, straining both healthcare systems and household finances. Various strategies, such as early detection and better management, can help. Machine learning (ML) is becoming increasingly effective at helping us understand and manage these diseases. Within ML, classification algorithms are particularly useful for identifying who is at higher risk for these diseases by sifting through data such as medical records and lab results. One standout technique among these algorithms is neural networks, particularly shallow radial basis function networks (RBF-NNs). This investigation focuses on a modified version of RBF-NNs, namely ‘Radius Local k-Point Radial Basis Function (RLRBF) Neural Networks’, to see how well they classify people based on their risk of developing a chronic illness. The two objectives are, firstly, to test how effective this specific type of neural network is, and secondly, to provide useful insights into identifying chronic diseases. Our findings could be of considerable help to doctors and patients in the future.
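For context, the hidden layer of an RBF network maps an input to Gaussian activations around learned centers; a minimal sketch (the centers and input below are hypothetical, and the paper's RLRBF variant additionally localizes each basis function to its k nearest points):

```python
import math

def rbf_activations(x, centers, sigma):
    """Gaussian RBF hidden-layer activations for input vector x:
    phi_j(x) = exp(-||x - c_j||^2 / (2 * sigma^2))."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [math.exp(-dist2(x, c) / (2.0 * sigma ** 2)) for c in centers]

# Toy 2-D patient-feature vector against two hypothetical centers: an
# input sitting exactly on a center activates it fully, while the far
# center contributes almost nothing.
acts = rbf_activations([1.0, 2.0], [[1.0, 2.0], [4.0, 6.0]], sigma=1.5)
print([round(a, 4) for a in acts])
```

A linear output layer over these activations then yields the risk classification.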
Breast cancer is one of the deadliest types of cancer, and it comes in a wide variety of forms, resulting in a wide variety of detection methods. Several deep learning techniques have been applied to decrease unnecessary biopsies and lessen the burden on radiologists. One of the most popular architectures for this task is the Convolutional Neural Network (CNN). This paper aims to explore the integration of CNNs and the wavelet transform (WT) to identify the optimal combination and architecture of these methods for efficient detection of breast cancer in ultrasound images. To accomplish this task, wavelet convolutional neural network (WCNN) structures are proposed and trained for the task of screening breast cancer abnormalities in ultrasound images. Compared with two other popular networks, ResNet50 and MobileNetV2, the proposed WCNN produced a satisfactory solution, with an accuracy of 98.24%, precision of 97.29%, recall of 100%, and F-measure of 98.24%.
Multiple Target Tracking (MTT) is one of the most challenging topics in radar target tracking. In addition to conventional position measurements, the integration of target Doppler and power information can offer valuable insights into a target’s kinematic state, thereby improving tracking performance. This article discusses a tracking approach that incorporates those components into the measurement enhancement process and assesses the advantages of the proposed strategy. Firstly, we investigated an improved data association scheme that exploits statistical features of position, Doppler and power. The estimated Doppler value calculated from the target range and timestamp is compared with the measured Doppler to deal with the Doppler ambiguity situation. Secondly, we studied an augmented unscented Kalman filter (UKF) algorithm using position and Doppler measurements, together with the indirect measurement of radial velocity in the linear domain. The experimental results show that the proposed solution performs well in terms of reducing the number of false tracks and improving the accuracy of the target state estimation.
This research paper evaluates the effects of wind tunnel testing on low-rise buildings using a Finite Element-based computational fluid dynamics (CFD) model, in order to compare the results of the traditional Directional Procedure (Buildings of All Heights Method) to the results of the CFD analysis. This study utilized all necessary parameters for the Wind Tunnel Procedure in accordance with the National Structural Code of the Philippines (NSCP) section 207F, together with the guidelines prescribed in ASCE 7-16. Mathematical prototypes of three different buildings with the same dimensions were created using midas NFX, and the necessary boundary conditions for CFD analysis were set up. The pressure caused by wind load on these buildings was evaluated and compared with the Directional Procedure in Section 207B of the NSCP (Buildings of All Heights Method). Atmospheric turbulence was also modelled using the Kinetic Energy and Length Scale method, utilizing the two-equation k-ε turbulence model. The results show significant differences in pressure effects on walls with a considerable number of openings and unique façade features. The CFD results also show accurate internal pressure and velocity profiles in areas near wall openings and building edges. This research is particularly relevant for the design and construction of buildings in areas prone to strong winds, such as the Philippines, where accurate wind load calculations are crucial to ensuring structural safety and integrity. While numerous studies in other countries have explored the impact of wind on low-rise structures through mathematical simulations, there remains a substantial gap in the Philippines regarding research concerning wind tunnel effects on structures using mathematical and numerical approaches. This study aims to fill this gap by connecting the existing National Structural Code of the Philippines with contemporary trends in mathematical simulations.
The human wrist radial artery is rich in important physiological and pathological information. Rapid dynamic detection of the radial artery can be realized by ultrasonic imaging. Because of the small amplitude fluctuation and narrow vessel width of the wrist radial artery, the vessels are difficult to detect by ultrasound. In ultrasonic nondestructive testing systems, the CORDIC algorithm can realize complex operations using only simple addition, subtraction and shift transformations. It has obvious advantages in high-speed operation and is especially suitable for optimizing ultrasonic signal processing algorithms. Based on the CORDIC algorithm, this paper proposes a phased array ultrasonic signal processing method for the wrist radial artery. Dynamic filtering, orthogonal demodulation and logarithmic compression techniques using the CORDIC algorithm are mainly studied. FPGA simulation results show that the CORDIC algorithm can realize rapid reproduction of the wrist radial artery ultrasonic echo signal, laying the foundation for subsequent interpretation of the physiological and pathological information in the radial artery.
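The add-and-shift principle behind CORDIC can be illustrated with a rotation-mode sketch that recovers sine and cosine (a floating-point illustration; a real FPGA implementation such as the one described would use fixed-point arithmetic, where dividing by two is literally a right shift):

```python
import math

def cordic_sincos(theta, iterations=32):
    """Rotation-mode CORDIC: compute (cos, sin) of theta (|theta| < pi/2)
    using only additions, subtractions and halvings per iteration."""
    # Precomputed micro-rotation angles atan(2^-i) and the CORDIC gain K.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    K = 1.0
    for i in range(iterations):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, theta
    pow2 = 1.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward the residual angle
        x, y = x - d * y * pow2, y + d * x * pow2
        z -= d * angles[i]
        pow2 /= 2.0  # equivalent to a right shift in fixed-point hardware
    return x * K, y * K  # undo the accumulated gain

c, s = cordic_sincos(math.pi / 6)
print(round(c, 6), round(s, 6))
```

The same iteration, run in vectoring mode, yields magnitude and phase, which is what makes CORDIC suitable for the orthogonal demodulation and logarithmic compression stages mentioned above.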
This study analyzed the stability of a predator-prey ecosystem (including sensitivity to initial values and coefficients) via the original Lotka-Volterra model and three improved models. The first improved model considered the impact of hunting and internal competition on the ecosystem, the second introduced a second prey species to the original model, while the third added a second type of predator and a second type of prey to the original system. Numerical simulation of the original and three improved models demonstrated the instability of the ecosystem represented by the original Lotka-Volterra model and the stability of those represented by all three improved models.
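The original Lotka-Volterra system referenced above can be simulated with a simple forward-Euler sketch (the coefficients and initial values below are illustrative, not those of the study):

```python
def lotka_volterra(x0, y0, alpha, beta, delta, gamma, dt, steps):
    """Forward-Euler integration of the classic Lotka-Volterra system:
       dx/dt = alpha*x - beta*x*y   (prey)
       dy/dt = delta*x*y - gamma*y  (predator)"""
    x, y = x0, y0
    traj = [(x, y)]
    for _ in range(steps):
        dx = (alpha * x - beta * x * y) * dt
        dy = (delta * x * y - gamma * y) * dt
        x, y = x + dx, y + dy
        traj.append((x, y))
    return traj

# Hypothetical coefficients: the trajectory orbits the equilibrium
# (gamma/delta, alpha/beta) = (4, 2.75) rather than settling, which is
# the neutral (non-asymptotic) stability the study's improved models fix.
traj = lotka_volterra(x0=10.0, y0=5.0, alpha=1.1, beta=0.4,
                      delta=0.1, gamma=0.4, dt=0.001, steps=20000)
print(all(x > 0 and y > 0 for x, y in traj))
```

Adding, e.g., a competition term -c*x² to the prey equation (as in the first improved model) turns the closed orbits into trajectories that converge to a stable equilibrium.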
Due to the surge in COVID-19 cases, hospitals have had to receive many more patients than before, which has brought unprecedented pressure to the hospital system. Therefore, the emphasis of medical decision-making has shifted from reaching the best treatment effect to prioritizing the treatment of COVID-19 patients by hospitals, which is key to relieving the pressure on the hospital system and reducing the overall mortality rate of COVID-19. There is no doubt that establishing the prioritization of COVID-19 cases is fundamental and pivotal for hospitals to achieve the shift in medical decision-making. Prioritization of COVID-19 cases in previous studies was mostly based on one patient characteristic, mainly including age, health conditions, and gender. This paper focuses on two patient characteristics at the same time. The probability that a COVID-19 patient who died had a given health condition in a given age group is calculated using the matrix completion technique based on the high-rank assumption of Bayesian matrices and the properties of Markov matrices. The calculated results show that doctors should give patients over 55 with respiratory diseases, patients over 65 with circulatory diseases, and patients over 65 with diabetes a higher prioritization in COVID-19 treatment.
Meta-path-based methods for measuring the similarities between nodes in Heterogeneous Information Networks (HINs) have attracted attention from researchers due to their excellent performance. However, these methods suffer from some issues: (1) it is difficult for users to provide effective meta-paths for complex HINs; (2) it is inefficient to enumerate all instances of meta-paths. To solve these issues, this paper proposes a novel method, the K-order Neighbors based Heterogeneous Graph Neural Network (KN-HGNN), for measuring node similarity. Firstly, KN-HGNN generates meta-paths based on the network schema. Then, KN-HGNN obtains the features of each node by aggregating its own type feature and its k-order neighbors’ type features under the constraints of the meta-paths. Finally, KN-HGNN calculates the similarities between nodes based on these features. Experimental results on real datasets show that KN-HGNN outperforms the baselines.
To adapt to changes in the investment process, a portfolio adjusting method with triangular intuitionistic fuzzy returns is put forward. The expected return rate and risk of the portfolio are characterized by the mean value and variance of triangular intuitionistic fuzzy numbers. Then, an intuitionistic fuzzy portfolio adjusting model is established by minimizing the variance risk of the portfolio while ensuring that the expected return exceeds a given aspiration level. Finally, an application example of a stock portfolio is given to demonstrate the practicability of the intuitionistic fuzzy portfolio adjusting model.
Nonferrous metals are important commodities, and it is of great significance for policy makers and investors to accurately predict their price changes. Nevertheless, because nonferrous metal prices fluctuate drastically, developing a robust price prediction method is a tricky task. In this research, a hybrid model based on the discrete wavelet transform (DWT), bidirectional long short-term memory (BiLSTM) and a residual network (ResNet) is constructed for nonferrous metals price prediction. The hyper-parameters of the hybrid neural network are searched by the grey wolf optimization (GWO) algorithm; configuring reasonable parameters enhances the final prediction performance. Additionally, behind the second hidden layer, low- and high-dimensional features are fused to prevent degradation of the model. The original sequence is processed by the DWT and then reconstructed, which helps capture its essential trend. The experimental results show that the proposed BiLSTM-ResNet-GWO-DWT model is more accurate than the other benchmark models, providing an effective reference for practitioners.
Existing keyword query systems over knowledge graphs can produce interesting results and are easy to use. However, they cannot handle ambiguities that have multiple matches in the knowledge graph: several interpretations may be plausible, and the systems cannot determine which interpretation the user expects. Nor can they scale to knowledge graphs with billions of triples or thousands of types/predicates. On the one hand, we construct an interactive interface in which the above ambiguities are resolved by the user. To enhance the user experience, we formalize the interaction problem and propose an algorithm to find the best interaction scheme (i.e., a verifying sequence with the lowest number of interactions and candidates) based on the dependency relations between mappings. On the other hand, we propose a new schema graph, the type-predicate graph, which has good scalability while containing complete information for building the query graph. No matter how large the knowledge graph is, the type-predicate graph remains very small, because its size depends only on the number of types and predicates, which is far smaller than the number of triples in the knowledge graph. Finally, we demonstrate our contributions with several well-directed experiments over real datasets (DBpedia and Yago).
Wheat and corn are the two most important grain crops in northern China. At this stage, professional information in the field of wheat and corn is difficult to obtain, acquisition efficiency is low, and accuracy is poor, which seriously restricts production efficiency. An intelligent question answering system can efficiently and accurately screen out professional information automatically, effectively solving the above problems. However, existing intelligent question answering systems for wheat and corn suffer from low matching accuracy and slow retrieval speed, preventing their wide adoption. Therefore, an efficient and accurate two-stage matching algorithm is designed, which uses the BM25 algorithm to recall a candidate set and then uses the BERT model to select the optimal answer from the candidates. Experimental results show that the algorithm achieves high retrieval accuracy and efficiency.
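The first, recall stage can be illustrated with a compact BM25 scorer (the tokenized toy documents below are assumptions; the paper applies BM25 to a wheat/corn Q&A corpus before BERT re-ranking):

```python
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query` using the
    Okapi BM25 ranking function."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each query term.
    df = {t: sum(1 for d in docs if t in d) for t in query}
    scores = []
    for d in docs:
        s = 0.0
        for t in query:
            tf = d.count(t)
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Hypothetical mini-corpus: the third document mentions both query terms
# and should score highest, so it would be recalled for BERT re-ranking.
docs = [["wheat", "rust", "disease", "control"],
        ["corn", "planting", "density"],
        ["corn", "rust", "resistant", "corn", "varieties"]]
scores = bm25_scores(["corn", "rust"], docs)
print([round(x, 3) for x in scores])
```

Taking the top-k documents by these scores yields the candidate set; the slower but more precise BERT model then only has to rank that small set.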
Semi-supervised support vector machine (S3VM) algorithms can effectively deal with the problem of a few labeled instances and a large number of unlabeled instances. Solving existing S3VM algorithms requires many types of optimization strategies, because all the training data participate as parameters in the iterative optimization, making it difficult to process large-scale data efficiently. Although simple random sampling is an effective means of efficient modeling from the perspective of data preprocessing, determining the sample size in advance is problematic because of sampling randomness and sample variability. To fully characterize the original unlabeled data and ensure the robustness of the model, we propose an adaptive sampling method that trains the model on the labeled set and a sampled unlabeled set. Fixed-size batches of unlabeled instances are continually sampled from the original unlabeled set until statistics computed on the obtained sample meet a stopping condition, where both the statistics and the stopping condition are derived from density estimation. This method solves the problem of subjectively determining the sample size in advance, and the robustness of the proposed algorithm is proved using probably approximately correct (PAC) learning theory.
Fuzzy control has great advantages in dealing with imprecision and uncertainty in inference and control systems. New energy vehicles are currently an important area of development, aimed at conserving non-renewable resources and reducing air pollution, and good control methods can improve their energy efficiency. This paper focuses on the recovery and utilization of braking energy in gasoline-electric hybrid vehicles, which increases energy reuse so that the energy consumption of the whole vehicle is lower and pollution is reduced. In this paper, fuzzy control theory is used to reasonably allocate the ratio of mechanical braking to regenerative braking according to vehicle speed and braking requirements, so as to achieve a reasonable balance between braking reliability and braking energy recovery.
With the rapid development of mobile communication technologies, the mobile network has evolved into a highly heterogeneous network structure. Based on dynamic networks, we investigate a method to explore the synchronous changes of high-traffic events. Event coincidence analysis is used to quantify the concurrency of high-traffic events, and a variety of network measures are used to analyze the dynamic spatio-temporal characteristics of high traffic. A static network is constructed to analyze the synchronous influence area and the temporal and spatial characteristics of base station high traffic. Taking one hour as the time window, a dynamic network is constructed to study the dynamic spatio-temporal variation of high traffic and the interaction of traffic between base stations. It is found that the static network is a small-world network, that the spatial connectivity of high-traffic events at base stations is high and insensitive to temporal changes, and that the traffic of different base stations interacts at the same times on different days.