Issue Downloads
PACMMOD Volume 2 Issue 2: Editorial
We are excited to announce the first issue dedicated to the PODS research track of the Proceedings of the ACM on Management of Data, or PACMMOD, journal. In its current form, this new journal hosts a SIGMOD and a PODS research track. The PODS research ...
A Dichotomy in the Complexity of Consistent Query Answering for Two Atom Queries With Self-Join
We consider the dichotomy conjecture for consistent query answering under primary key constraints. It states that, for every fixed Boolean conjunctive query q, testing whether q is certain (i.e. whether it evaluates to true over all repairs of a given ...
Conjunctive Queries with Negation and Aggregation: A Linear Time Characterization
In this paper, we study the complexity of evaluating Conjunctive Queries with negation (\cqneg). First, we present an algorithm with linear preprocessing time and constant delay enumeration for a class of CQs with negation called free-connex signed-...
Consistent Query Answering for Primary Keys on Rooted Tree Queries
We study the data complexity of consistent query answering (CQA) on databases that may violate the primary key constraints. A repair is a maximal subset of the database satisfying the primary key constraints. For a Boolean query q, the problem fCERTAINTY(...
Containment of Graph Queries Modulo Schema
With multiple graph database systems on the market and a new Graph Query Language standard on the horizon, it is time to revisit some classic static analysis problems. Query containment, arguably the workhorse of static analysis, has already received a ...
Enumeration for MSO-Queries on Compressed Trees
We present a linear preprocessing and output-linear delay enumeration algorithm for MSO-queries over trees that are compressed in the well-established grammar-based framework. Time bounds are measured with respect to the size of the compressed ...
From Shapley Value to Model Counting and Back
In this paper we investigate the problem of quantifying the contribution of each variable to the satisfying assignments of a Boolean function based on the Shapley value. Our main result is a polynomial-time equivalence between computing Shapley values ...
Generalized Core Spanner Inexpressibility via Ehrenfeucht-Fraïssé Games for FC
Despite considerable research on document spanners, little is known about the expressive power of generalized core spanners. In this paper, we use Ehrenfeucht-Fraïssé games to obtain general inexpressibility lemmas for the logic FC (a finite model ...
On Reporting Durable Patterns in Temporal Proximity Graphs
Finding patterns in graphs is a fundamental problem in databases and data mining. In many applications, graphs are temporal and evolve over time, so we are interested in finding durable patterns, such as triangles and paths, which persist over a long ...
Streaming Algorithms with Few State Changes
In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these metrics fail to ...
The Complexity of Why-Provenance for Datalog Queries
Datalog is a powerful rule-based language that allows us to express complex recursive queries and has found numerous applications over the years. Explaining why a result to a Datalog query is obtained is an essential task towards explainable and ...
The Moments Method for Approximate Data Cube Queries
We investigate an approximation algorithm for various aggregate queries on partially materialized data cubes. Data cubes are interpreted as probability distributions, and cuboids from a partial materialization populate the terms of a series expansion of ...
Tight Lower Bounds for Directed Cut Sparsification and Distributed Min-Cut
In this paper, we consider two fundamental cut approximation problems on large graphs. We prove new lower bounds for both problems that are optimal up to logarithmic factors.
The first problem is approximating cuts in balanced directed graphs. In this ...
The Weisfeiler-Leman Dimension of Conjunctive Queries
A graph parameter is a function f on graphs with the property that, for any pair of isomorphic graphs G1 and G2, f(G1)=f(G2). The Weisfeiler--Leman (WL) dimension of f is the minimum k such that, if G1 and G2 are indistinguishable by the k-dimensional WL-...
Tight Bounds of Circuits for Sum-Product Queries
In this paper, we ask the following question: given a Boolean Conjunctive Query (CQ), what is the smallest circuit that computes the provenance polynomial of the query over a given semiring? We answer this question by giving upper and lower bounds. ...
On Density-based Local Community Search
Local community search (LCS) finds a community in a given graph G local to a set R of seed nodes by optimizing an objective function. The objective function f(S) for an induced subgraph S encodes the set inclusion criteria of R to a classic community ...
Verification of Unary Communicating Datalog Programs
We study verification of reachability properties over Communicating Datalog Programs (CDPs), which are networks of relational nodes connected through unordered channels and running Datalog-like computations. Each node manipulates a local state database (...
Evaluating Datalog over Semirings: A Grounding-based Approach
Datalog is a powerful yet elegant language that allows expressing recursive computation. Although Datalog evaluation has been extensively studied in the literature, so far, only loose upper bounds are known on how fast a Datalog program can be evaluated. ...
When View- and Conflict-Robustness Coincide for Multiversion Concurrency Control
A DBMS allows trading consistency for efficiency through the allocation of isolation levels that are strictly weaker than serializability. The robustness problem asks whether, for a given set of transactions and a given allocation of isolation levels, ...
Expected Shapley-Like Scores of Boolean functions: Complexity and Applications to Probabilistic Databases
Shapley values, originating in game theory and increasingly prominent in explainable AI, have been proposed to assess the contribution of facts in query answering over databases, along with other similar power indices such as Banzhaf values. In this work ...
Chase Termination Beyond Polynomial Time
The chase is a widely implemented approach to reason with tuple-generating dependencies (tgds), used in data exchange, data integration, and ontology-based query answering. However, it is merely a semi-decision procedure, which may fail to terminate. ...
Continual Release of Differentially Private Synthetic Data from Longitudinal Data Collections
Motivated by privacy concerns in long-term longitudinal studies in medical and social science research, we study the problem of continually releasing differentially private synthetic data from longitudinal data collections. We introduce a model where, in ...
Postulates for Provenance: Instance-based provenance for first-order logic
Instance-based provenance is an explanation for a query result in the form of a subinstance of the database. We investigate different desiderata one may want to impose on these subinstances. Concretely we consider seven basic postulates for provenance. ...
Join Size Bounds using lp-Norms on Degree Sequences
Estimating the output size of a query is a fundamental yet longstanding problem in database query processing. Traditional cardinality estimators used by database systems can routinely underestimate the true output size by orders of magnitude, which leads ...
Topology-aware Parallel Joins
We study the design and analysis of parallel join algorithms in a topology-aware computational model. In this model, the network is modeled as a directed graph, where each edge is associated with a cost function that depends on the data transferred ...
Fast Matrix Multiplication for Query Processing
This paper studies how to use fast matrix multiplication to speed up query processing. As observed, computing a two-table join and then projecting away the join attribute is essentially the Boolean matrix multiplication problem, which can be ...
Combined Approximations for Uniform Operational Consistent Query Answering
Operational consistent query answering (CQA) is a recent framework for CQA based on revised definitions of repairs, which are built by applying a sequence of operations (e.g., fact deletions) starting from an inconsistent database until we reach a ...
Distinct Shortest Walk Enumeration for RPQs
We consider the Distinct Shortest Walks problem. Given two vertices s and t of a graph database D and a regular path query, we want to enumerate all walks of minimal length from s to t that carry a label that conforms to the query. Usual theoretical ...
Layered List Labeling
The list-labeling problem is one of the most basic and well-studied algorithmic primitives in data structures, with an extensive literature spanning upper bounds, lower bounds, and data management applications. The classical algorithm for this problem, ...
On the Feasibility of Forgetting in Data Streams
In today's digital age, it is becoming increasingly prevalent to retain digital footprints in the cloud indefinitely. Nonetheless, there is a valid argument that entities should have the authority to decide whether their personal data remains within a ...