The International Conference on Scientific and Statistical Database Management (SSDBM) brings together scientific domain experts, database researchers, practitioners, and developers for the presentation and exchange of current research results on concepts, tools, and techniques for scientific and statistical database applications. SSDBM 2018 continues the tradition of past SSDBM conferences in providing a stimulating environment to encourage discussion, fellowship and exchange of ideas in all aspects of research related to scientific and statistical data management. The conference is hosted by the Free University of Bozen-Bolzano, Italy, from July 9 to 11, 2018.
Metadata-driven error detection
Scientific data often originates from multiple sources and human agents. The integration of data from different sources must also resolve data quality problems that might occur because of inconsistency or different quality assurance levels of the ...
Towards meaningful distance-preserving encryption
Mining complex data is an essential and at the same time challenging task. Therefore, organizations pass on their encrypted data to service providers carrying out such analyses. Thus, encryption must preserve the mining results. Many mining algorithms ...
SBG-sketch: a self-balanced sketch for labeled-graph stream summarization
Applications in various domains rely on processing graph streams, e.g., communication logs of a cloud-troubleshooting system, road-network traffic updates, and interactions on a social network. A labeled-graph stream refers to a sequence of streamed ...
Multidimensional range queries on modern hardware
Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, ...
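The abstract names kd-trees as one example of an MDIS. As a rough illustration of how such an index prunes a multidimensional range query, here is a minimal in-memory kd-tree. It is a textbook sketch, not the paper's hardware-conscious implementation, and the names `KdNode`, `build`, and `range_query` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KdNode:
    point: Point
    axis: int                         # dimension this node splits on
    left: Optional["KdNode"] = None   # points with coordinate <= point[axis]
    right: Optional["KdNode"] = None  # points with coordinate >= point[axis]


def build(points: Sequence[Point], depth: int = 0) -> Optional[KdNode]:
    """Hypothetical helper: build a kd-tree by splitting at the median of a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KdNode(
        ordered[mid],
        axis,
        build(ordered[:mid], depth + 1),
        build(ordered[mid + 1:], depth + 1),
    )


def range_query(node: Optional[KdNode], lo: Point, hi: Point,
                out: Optional[List[Point]] = None) -> List[Point]:
    """Hypothetical helper: return all points p with lo[i] <= p[i] <= hi[i] in every dimension."""
    if out is None:
        out = []
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
        out.append(p)
    # Skip a subtree when the query box lies entirely on the other side of the split.
    if lo[a] <= p[a]:
        range_query(node.left, lo, hi, out)
    if hi[a] >= p[a]:
        range_query(node.right, lo, hi, out)
    return out
```

For example, `range_query(build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]), (3, 1), (8, 5))` returns the three points inside the box `[3, 8] x [1, 5]`. The pruning conditions are inclusive, so points that tie with a split value are still found on either side.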
Massively-parallel break detection for satellite data
The field of remote sensing is nowadays faced with huge amounts of data. While this offers a variety of exciting research opportunities, it also yields significant challenges regarding both computation time and space requirements. In practice, the sheer ...
Declarative cartography under fine-grained access control
Visualization of spatial data is of increasing importance in science and society, but opens up justified concerns about data privacy and security. A classic methodology for cartography through generalization is data selection; however, data selection ...
COMPASS: compact array storage with value index
Efficient array storage is the backbone of scientific data processing. With an explosion of data, rapidly answering queries on array data is becoming increasingly important. Although most array storage systems today support subsetting of an array based ...
TIPP: parallel Delaunay triangulation for large-scale datasets
Because of the importance of Delaunay Triangulation in science and engineering, researchers have devoted extensive attention to parallelizing this fundamental algorithm. However, generating unstructured meshes for extremely large point sets remains a ...
Learning interesting attributes for automated data categorization
This work proposes and evaluates a novel approach to determining interesting attributes, in order to categorize entities accordingly. Once identified, such categories are of immense value to allow constraining (filtering) a user's current view to ...
Numerically stable parallel computation of (co-)variance
With the advent of big data, we see an increasing interest in computing correlations in huge data sets with both many instances and many variables. Essential descriptive statistics such as the variance, standard deviation, covariance, and correlation ...
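To illustrate why numerical stability and parallelism interact here, the sketch below combines Welford's single-pass update with the pairwise combination formula of Chan et al., one well-known way to compute variance stably over independently processed chunks. It is not necessarily the method of the paper, and `Moments` and `aggregate` are hypothetical names. Covariance can be handled the same way by tracking a co-moment instead of `m2`.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Hypothetical partial aggregate: count, mean, and sum of squared deviations (M2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's update: avoids the cancellation-prone sum(x^2) - n*mean^2 formula.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        # Pairwise combination of two partial aggregates (Chan et al.),
        # so chunks can be processed independently and combined afterwards.
        if self.n == 0:
            return Moments(other.n, other.mean, other.m2)
        if other.n == 0:
            return Moments(self.n, self.mean, self.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Moments(n, mean, m2)

    def variance(self, ddof: int = 1) -> float:
        # ddof=1 gives the sample variance, ddof=0 the population variance.
        return self.m2 / (self.n - ddof)


def aggregate(chunk) -> Moments:
    """Aggregate one chunk sequentially, e.g. on one worker."""
    m = Moments()
    for x in chunk:
        m.add(x)
    return m


# Values with a large common offset, where the naive textbook formula loses precision.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
total = aggregate(data[:2]).merge(aggregate(data[2:]))
print(total.variance())  # prints 30.0
```

The design choice is to keep `(n, mean, M2)` rather than raw sums: the partial states stay small, and merging them never subtracts two large, nearly equal numbers.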
A unified framework of density-based clustering for semi-supervised classification
Semi-supervised classification is drawing increasing attention in the era of big data, as the gap between the abundance of cheap, automatically collected unlabeled data and the scarcity of labeled data that are laborious and expensive to obtain is ...
Finding shortest keyword covering routes in road networks
Millions of users rely on navigation applications to compute an optimal route for their trips. The basic functionality of these applications is to find the minimum cost route between a source and target node in the transportation network. In this paper, ...
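The basic functionality described in the abstract, a minimum-cost route between a source and a target node, is classically solved with Dijkstra's algorithm. The sketch below shows only that baseline on a toy graph. It does not implement the keyword-covering variant studied in the paper, and `min_cost_route` and the `roads` graph are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Adjacency list: node -> list of (neighbor, non-negative edge cost)
Graph = Dict[str, List[Tuple[str, float]]]


def min_cost_route(graph: Graph, source: str, target: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm: return (cost, path) from source to target, or None if unreachable."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u == target:
            break  # the target's distance is final once it is popped
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


roads: Graph = {
    "A": [("B", 4.0), ("C", 1.0)],
    "C": [("B", 2.0), ("D", 5.0)],
    "B": [("D", 1.0)],
}
print(min_cost_route(roads, "A", "D"))  # (4.0, ['A', 'C', 'B', 'D'])
```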
ERMrest: a web service for collaborative data management
The foundation of data-oriented scientific collaboration is the ability of participants to find, access, and reuse data created during the course of an investigation, an ideal that has been captured by the FAIR principles. In this paper, we describe ERMrest, ...
Publishing spatial histograms under differential privacy
Studying trajectories of individuals has received growing interest. The aggregated movement behaviour of people provides important insights about their habits, interests, and lifestyles. Understanding and utilizing trajectory data is a crucial part of ...
GeoSparkViz: a scalable geospatial data visualization framework in the Apache Spark ecosystem
Data visualization allows users to summarize, analyze and reason about data. A map visualization tool first loads the designated geospatial data, processes the data and then applies the map visualization effect. Guaranteeing detailed and accurate ...
Efficient anti-community detection in complex networks
Modeling the relations between the components of complex systems as networks of vertices and edges is a commonly used method in many scientific disciplines that serves to obtain a deeper understanding of the systems themselves. In particular, the ...
Selecting representative and diverse spatio-textual posts over sliding windows
Thousands of posts are generated constantly by millions of users in social media, with an increasing portion of this content being geotagged. Keeping track of the whole stream of this spatio-textual content can easily become overwhelming for the user. ...
NoSingles: a space-efficient algorithm for influence maximization
Algorithmic problems of computing influence estimation and influence maximization have been actively researched for decades. We developed a novel algorithm, NoSingles, based on the Reverse Influence Sampling method proposed by Borgs et al. in 2013. ...
Order-independent constraint-based causal structure learning for Gaussian distribution models using GPUs
Learning the causal structures in high-dimensional datasets allows deriving advanced insights from observational data, thus creating the potential for new applications. One crucial limitation of state-of-the-art methods for learning causal relationships, ...
Feature-based comparison and generation of time series
For more than three decades, researchers have been developing generation methods for the weather, energy, and economic domains. These methods provide generated datasets for reasons like system evaluation and data availability. However, despite the ...
Point pattern search in big data
- Fabio Porto,
- João N. Rittmeyer,
- Eduardo Ogasawara,
- Alberto Krone-Martins,
- Patrick Valduriez,
- Dennis Shasha
Consider a set of points P in space with at least some of the pairwise distances specified. Given this set P, consider the following three kinds of queries against a database D of points: (i) pure constellation query: find all sets S in D of size |P| ...
Distributed caching for processing raw arrays
As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem ...
GPU-based parallel indexing for concurrent spatial query processing
In most spatial database applications, the input data is very large. Previous work has shown the importance of using spatial indexing and parallel computing to speed up such tasks. In recent years, GPUs have become a mainstream platform for massively ...
Optimizer time estimation for SQL queries
Predicting the amount of time a SQL query takes to execute can help in prioritizing, optimizing and scheduling the query execution. This also helps in optimal utilization of hardware resources. The total execution time of a query can be split into the ...
Scheduling data-intensive scientific workflows with reduced communication
Data-intensive scientific workflows, typically modelled by directed acyclic graphs, consist of inter-dependent tasks that exchange significant amounts of data and are executed on parallel/distributed clusters. However, the energy or monetary costs ...
PARADISO: an interactive approach of parameter selection for the mean shift algorithm
Many algorithms have been developed for detecting clusters of various kinds over the past decades. However, only a few attempts have been made to provide an interactive setting for the clustering algorithms. In this paper, we present PARADISO, an ...
Towards an efficient and effective framework for the evolution of scientific databases
Database systems are well suited to scientific data management and analysis workloads; however, a database must evolve to keep pace with changing requirements and adjust to changes in the domain conceptualization as applications mature. Evolving a ...
Maximizing area-range sum for spatial shapes (MAxRS3)
We investigate a novel variant of the well-known MaxRS (Maximizing Range Sum) problem, namely MAxRS3 (Maximizing Area-Range Sum for Spatial Shapes). The MaxRS problem amounts to detecting a location where a fixed-size rectangle R should be placed, ...
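MAxRS3 generalizes MaxRS from points to spatial shapes, which this sketch does not attempt. For the classic point version, in its common formulation (weighted points, a fixed-size axis-parallel rectangle, maximize the total covered weight; assumed here because the abstract is cut off), a naive baseline only needs to test a finite set of candidate corners. The function `max_range_sum` and the non-negative-weight assumption are hypothetical, and the cubic running time is far from what dedicated MaxRS algorithms achieve.

```python
from typing import List, Optional, Tuple

# (x, y, weight); weights are assumed non-negative
WeightedPoint = Tuple[float, float, float]


def max_range_sum(points: List[WeightedPoint], width: float, height: float
                  ) -> Tuple[float, Optional[Tuple[float, float]]]:
    """Brute-force MaxRS: best lower-left corner for a closed width x height rectangle.

    With non-negative weights, some optimal rectangle can be slid right and up until its
    left edge touches a point's x and its bottom edge touches a point's y, so only those
    O(n^2) corners need checking; each check costs O(n), giving O(n^3) overall.
    """
    if not points:
        return 0.0, None
    xs = sorted({x for x, _, _ in points})
    ys = sorted({y for _, y, _ in points})
    best_sum, best_corner = float("-inf"), None
    for x0 in xs:
        for y0 in ys:
            covered = sum(w for x, y, w in points
                          if x0 <= x <= x0 + width and y0 <= y <= y0 + height)
            if covered > best_sum:
                best_sum, best_corner = covered, (x0, y0)
    return best_sum, best_corner


sample = [(0, 0, 1), (1, 1, 2), (5, 5, 4), (6, 5, 1), (10, 0, 3)]
print(max_range_sum(sample, 2, 2))  # (5, (5, 5))
```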
PathGraph: querying and exploring big data graphs
With the widespread diffusion of social networks and the dawn of data-intensive scientific applications, graphs have become one of the foundations for modern data management applications. A key role in graph querying and analysis is played by Regular Path ...
Crossing an OCEAN of queries: analyzing SQL query logs with OCEANLog
SQL queries encapsulate the knowledge of their authors about the usage of the queried data sources. This knowledge also contains aspects that cannot be inferred by analyzing the contents of the queried data sources alone. Due to the complexity of ...
- Proceedings of the 30th International Conference on Scientific and Statistical Database Management