Front Matter
Bounding Box Representation of Co-location Instances for Induced Distance Measure
In this paper, we investigate the efficiency of Co-location Pattern Mining (CPM). In popular methods for CPM, the most time-consuming step consists of identifying pattern instances, which are required to calculate the potential interestingness ...
Benchmarking Data Lakes Featuring Structured and Unstructured Data with DLBench
In the last few years, the concept of the data lake has become trendy for data storage and analysis. Thus, several approaches have been proposed to build data lake systems. However, these proposals are difficult to evaluate as there are no commonly ...
Towards an Adaptive Multidimensional Partitioning for Accelerating Spark SQL
Nowadays Parallel DBMSs and Spark SQL compete with each other to query Big Data. Parallel DBMSs feature extensive experience embodied by powerful data partitioning and data allocation algorithms, but they suffer when handling dynamic changes in ...
A Chain Composite Item Recommender for Lifelong Pathways
This work addresses the problem of recommending lifelong pathways, i.e., sequences of actions pertaining to health, social or professional aspects, for fulfilling a personal lifelong project. This problem raises some specific challenges, since the ...
Health Analytics on COVID-19 Data with Few-Shot Learning
Although huge volumes of valuable data can be generated and collected at a rapid velocity from a wide variety of rich data sources, their availability may vary due to various factors. For example, in the competitive business world, huge volumes of ...
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory
Visual Commonsense Reasoning (VCR) predicts an answer with corresponding rationale, given a question-image input. VCR is a recently introduced visual scene understanding task with a wide range of applications, including visual question answering, ...
Universal Storage Adaption for Distributed RDF-Triple Stores
The publication of machine-readable information has been significantly increasing both in the magnitude and complexity of the embedded relations. The Resource Description Framework (RDF) plays a big role in modelling and linking web data and their ...
RDF Data Management is an Analytical Market, not a Transaction One
In recent years, the Resource Description Framework data model has seen an increasing adoption in Web applications and IT in general. This has contributed to the establishment of standards such as the SPARQL query language and the emergence of ...
Document Ranking for Curated Document Databases Using BERT and Knowledge Graph Embeddings: Introducing GRAB-Rank
Curated Document Databases (CDD) play an important role in helping researchers find relevant articles in scientific literature. Considerable recent attention has been given to the use of various document ranking algorithms to support the ...
Contextual and Behavior Factors Extraction from Pedestrian Encounter Scenes Using Deep Language Models
This study introduces an NLP framework including deep language models to automate the extraction of contextual and behavior factors from narrative text that describes the environment and pedestrian behaviors at pedestrian encounter scenes. The ...
Spark Based Text Clustering Method Using Hashing
Text clustering has become an important task in machine learning since several micro-blogging platforms such as Twitter require efficient clustering methods to discover topics. In this context, we propose in this paper a new Spark-based text ...
Impact of Textual Data Augmentation on Linguistic Pattern Extraction to Improve the Idiomaticity of Extractive Summaries
The present work aims to develop a text summarisation system for financial texts with a focus on the fluidity of the target language. Linguistic analysis shows that the process of writing summaries should take into account not only terminological ...
Explainability in Irony Detection
Irony detection is a text analysis problem aiming to detect ironic content. The methods in the literature are mostly for English text. In this paper, we focus on irony detection in Turkish and we analyze the explainability of neural models using ...
Efficient Graph Analytics in Python for Large-Scale Data Science
Graph analytics is important in data science research, where Python is nowadays the most popular language among data analysts. It offers many packages for graph analytics. However, those packages are either too specific or cannot work on ...
A New Accurate Clustering Approach for Detecting Different Densities in High Dimensional Data
Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Density-based clustering methods have proven to be effective for arbitrary-shaped clusters, but they have difficulty finding low-density ...
ODCA: An Outlier Detection Approach to Deal with Correlated Attributes
Datasets from different domains usually contain data defined over a wide set of attributes or features linked through correlation relationships. Moreover, there are some applications in which not all the attributes should be treated in the same ...
A Novel Neurofuzzy Approach for Semantic Similarity Measurement
The problem of identifying the degree of semantic similarity between two textual statements automatically has grown in importance in recent times. Its impact on various computer-related domains and recent breakthroughs in neural computation has ...
Integrated Process Data and Organizational Data Analysis for Business Process Improvement
Neither a compartmentalized vision nor coupling of process data and organizational data favors the extraction of the evidence an organization needs for its business process improvement. In previous work, we dealt with integrating both kinds of ...
Smart-Views: Decentralized OLAP View Management Using Blockchains
In this work we explore the use of a blockchain as an immutable ledger for publishing fact records in a decentralized data warehouse. We also exploit the ledger for storing smart views, i.e. definitions of frequent data cube computations in the ...
A Workload-Aware Change Data Capture Framework for Data Warehousing
Today’s data warehousing requires continuous or on-demand data integration through a Change-Data-Capture (CDC) process to extract data deltas from Online Transaction Processing Systems. This paper proposes a workload-aware CDC framework for on-...