Front Matter
LiteSelect: A Lightweight Adaptive Learning Algorithm for Online Index Selection
Using appropriately selected indexes can dramatically improve the performance of query workloads in database systems. Typically, the access patterns of the workloads in real-world applications change frequently. This poses the challenge of ...
IDAGEmb: An Incremental Data Alignment Based on Graph Embedding
In evolving digital environments, information systems face a myriad of challenges such as data heterogeneity, the dynamic nature of data, and integration complexities. These challenges affect decision-making and data integration ...
Learning Paradigms and Modelling Methodologies for Digital Twins in Process Industry
Central to the digital transformation of the process industry are Digital Twins (DTs), virtual replicas of physical manufacturing systems that combine sensor data with sophisticated data-based or physics-based models, or a combination thereof, to ...
Embedding-Based Data Matching for Disparate Data Sources
Dealing with heterogeneous sources is an important challenge in the field of knowledge discovery and management. Schema matching methods are employed to solve this problem using three approaches: schema-based, instance-based, or a combination. ...
Subtree Similarity Search Based on Structure and Text
Given a query tree, the subtree similarity search problem is finding all subtrees in a document tree that are similar to the query tree. The previous scan-based method extracts candidate subtrees based on the size difference, which only considers ...
Towards Hybrid Embedded Feature Selection and Classification Approach with Slim-TSF
Traditional solar flare forecasting approaches have mostly relied on physics-based or data-driven models using solar magnetograms, treating flare predictions as a point-in-time classification problem. This approach has limitations, particularly in ...
Evaluation of High Sparsity Strategies for Efficient Binary Classification
In the dynamic landscape of Artificial Intelligence (AI) advancements, particularly in the development of compact and highly efficient models for space-constrained environments, the strategic sparsification of neural networks takes center stage. ...
Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications
- Wan D. Bae
- Shayma Alkobaisi
- Siddheshwari Bankar
- Sartaj Bhuvaji
- Jay Singhvi
- Madhuroopa Irukulla
- William McDonnell
Prediction models for data-starved medical applications lag behind general machine learning solutions, despite their potential to improve early interventions. This is largely due to the assumption that optimization approaches are applied on a ...
Exploring Causal Chain Identification: Comprehensive Insights from Text and Knowledge Graphs
During real-world reasoning, the logic path is generally not explicitly articulated. An appropriate causal chain can offer abundant informative details to depict a logical pathway, which is also beneficial in preventing ambiguity problems during ...
Towards Regional Explanations with Validity Domains for Local Explanations
The field of explainability in machine learning has become very prolific, and numerous explanation methods have emerged during the last decade. Local explanations are of major interest because they are intelligible and claim to be locally faithful ...
Analyzing a Decade of Evolution: Trends in Natural Language Processing
Natural Language Processing (NLP) stands at the forefront of the rapidly evolving landscape of Machine Learning, witnessing the emergence and evolution of diverse methodologies over the past decade. This study delves into the dynamic trends within ...
Improving Serendipity for Collaborative Metric Learning Based on Mutual Proximity
Today, in web space, where content is constantly expanding, recommendation systems that enable users to explore information passively have become essential technologies, and their accuracy is significantly improving. However, recent studies have ...
Ada2vec: Adaptive Representation Learning for Large-Scale Dynamic Heterogeneous Networks
Representation learning generates the embedding vector of an object based on its relationships with others in a network. The generated vectors are inputs to various downstream machine learning tasks, such as classification, clustering and ...
Differentially-Private Neural Network Training with Private Features and Public Labels
Training neural networks (NN) with differential privacy (DP) protection has been extensively studied in the past decade, with the DP-SGD (stochastic gradient descent) mechanism representing the benchmark approach. Conventional DP-SGD assumes that ...
Series2Graph++: Distributed Detection of Correlation Anomalies in Multivariate Time Series
Multivariate time series are a form of real-valued sequence data that simultaneously record different time-dependent variables. They originate mostly from multi-sensor setups and serve a variety of important analytical purposes, including the ...
Anomaly Detection from Time Series Under Uncertainty
Anomalies in data can cause potential issues in downstream tasks, making their detection critical. Data collection processes for continuous data are often defective and imprecise. For example, sensors are resource-constrained devices, raising ...
Comparison of Measures for Characterizing the Difficulty of Time Series Classification
The performance of machine learning algorithms is influenced both by their characteristics and parameterization as well as by the properties of the data they are trained and evaluated on. The latter aspect is often neglected. In this paper, we ...
Dynamic Time Warping for Phase Recognition in Tribological Sensor Data
This paper analyzes the potential of dynamic time warping (DTW) for recognizing phases of tribological sensor data. The three classes in these time series—run-in, constant wear, and divergent wear—are distinguished by their long-term trend and ...
Putting Co-Design-Supporting Data Lakes to the Test: An Evaluation on AEC Case Studies
- Melanie Herschel
- Andreas Gienger
- Anja P. R. Lauer
- Charlotte Stein
- Lior Skoury
- Nico Lässig
- Carsten Ellwein
- Alexander Verl
- Thomas Wortmann
- Cristina Tarin Sauer
Leveraging data from various stakeholders in the architecture, engineering, and construction (AEC) industry is an essential prerequisite to harness the potential of digitization and Artificial Intelligence (AI) in addressing major challenges such ...
Creating and Querying Data Cubes in Python Using PyCube
Data cubes are used for analyzing large data sets usually contained in data warehouses. The most popular data cube tools use graphical user interfaces (GUI) to do the data analysis. Traditionally this was necessary since data analysts were not ...
Index Terms
- Big Data Analytics and Knowledge Discovery: 26th International Conference, DaWaK 2024, Naples, Italy, August 26–28, 2024, Proceedings