- research-article, June 2016
SLING: A Near-Optimal Index Structure for SimRank
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 1859–1874, https://doi.org/10.1145/2882903.2915243
SimRank is a similarity measure for graph nodes that has numerous applications in practice. Scalable SimRank computation has been the subject of extensive research for more than a decade, and yet, none of the existing solutions can efficiently derive ...
- research-article, June 2016
Speedup Graph Processing by Graph Ordering
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 1813–1828, https://doi.org/10.1145/2882903.2915220
The CPU cache performance is one of the key issues to efficiency in database systems. It is reported that cache miss latency takes a half of the execution time in database systems. To improve the CPU cache performance, there are studies to support ...
- poster, June 2016
Constructing Join Histograms from Histograms with q-error Guarantees
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 2245–2246, https://doi.org/10.1145/2882903.2914828
Histograms are implemented and used in any database system, usually defined on a single-column of a database table. However, one of the most desired statistical data in such systems are statistics on the correlation among columns. In this paper we ...
- research-article, June 2016
Big Graph Analytics Systems
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 2241–2243, https://doi.org/10.1145/2882903.2912566
In recent years we have witnessed a surging interest in developing Big Graph processing systems. To date, tens of Big Graph systems have been proposed. This tutorial provides a timely and comprehensive review of existing Big Graph systems, and ...
- research-article, June 2016
REACT: Context-Sensitive Recommendations for Data Analysis
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 2137–2140, https://doi.org/10.1145/2882903.2899392
Data analysis may be a difficult task, especially for non-expert users, as it requires deep understanding of the investigated domain and the particular context. In this demo we present REACT, a system that hooks to the analysis UI and provides the users ...
- research-article, June 2016
QFix: Demonstrating Error Diagnosis in Query Histories
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 2177–2180, https://doi.org/10.1145/2882903.2899388
An increasing number of applications in all aspects of society rely on data. Despite the long line of research in data cleaning and repairs, data correctness has been an elusive goal. Errors in the data can be extremely disruptive, and are detrimental ...
ROLL: Fast In-Memory Generation of Gigantic Scale-free Networks
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 1829–1842, https://doi.org/10.1145/2882903.2882964
Real-world graphs are not always publicly available or sometimes do not meet specific research requirements. These challenges call for generating synthetic networks that follow properties of the real-world networks. Barabási-Albert (BA) is a well-known ...
- research-article, June 2016
Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 1011–1025, https://doi.org/10.1145/2882903.2915245
The increasing ability to collect data from urban environments, coupled with a push towards openness by governments, has resulted in the availability of numerous spatio-temporal data sets covering diverse aspects of a city. Discovering relationships ...
- research-article, June 2016
Efficient Subgraph Matching by Postponing Cartesian Products
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 1199–1214, https://doi.org/10.1145/2882903.2915236
In this paper, we study the problem of subgraph matching that extracts all subgraph isomorphic embeddings of a query graph q in a large data graph G. The existing algorithms for subgraph matching follow Ullmann's backtracking approach; that is, ...
- research-article, June 2016
Graph Stream Summarization: From Big Bang to Big Crunch
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 1481–1496, https://doi.org/10.1145/2882903.2915223
A graph stream, which refers to the graph with edges being updated sequentially in a form of a stream, has important applications in cyber security and social networks. Due to the sheer volume and highly dynamic nature of graph streams, the practical ...
- research-article, June 2016
Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 695–710, https://doi.org/10.1145/2882903.2915207
Influence Maximization (IM), that seeks a small set of key users who spread the influence widely into the network, is a core problem in multiple domains. It finds applications in viral marketing, epidemic control, and assessing cascading failures within ...
- research-article, June 2016
Extracting Databases from Dark Data with DeepDive
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 847–859, https://doi.org/10.1145/2882903.2904442
DeepDive is a system for extracting relational databases from dark data: the mass of text, tables, and images that are widely collected and stored but which cannot be exploited by standard relational tools. If the information in dark data --- scientific ...
- short-paper, June 2016
SparkR: Scaling R Programs with Spark
- Shivaram Venkataraman,
- Zongheng Yang,
- Davies Liu,
- Eric Liang,
- Hossein Falaki,
- Xiangrui Meng,
- Reynold Xin,
- Ali Ghodsi,
- Michael Franklin,
- Ion Stoica,
- Matei Zaharia
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 1099–1104, https://doi.org/10.1145/2882903.2903740
R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks. However, interactive data analysis in R is usually limited as the R runtime is single threaded and can only process data ...
- research-article, June 2016
Towards Globally Optimal Crowdsourcing Quality Management: The Uniform Worker Setting
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 47–62, https://doi.org/10.1145/2882903.2882953
We study crowdsourcing quality management, that is, given worker responses to a set of tasks, our goal is to jointly estimate the true answers for the tasks, as well as the quality of the workers. Prior work on this problem relies primarily on applying ...
- research-article, June 2016
Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters
- Srikanth Kandula,
- Anil Shanbhag,
- Aleksandar Vitorovic,
- Matthaios Olma,
- Robert Grandl,
- Surajit Chaudhuri,
- Bolin Ding
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 631–646, https://doi.org/10.1145/2882903.2882940
We present a system that approximates the answer to complex ad-hoc queries in big-data clusters by injecting samplers on-the-fly and without requiring pre-existing samples. Improvements can be substantial when big-data queries take multiple passes over ...
- research-article, June 2016
Hybrid Pulling/Pushing for I/O-Efficient Distributed and Iterative Graph Computing
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 479–494, https://doi.org/10.1145/2882903.2882938
Billion-node graphs are rapidly growing in size in many applications such as online social networks. Most graph algorithms generate a large number of messages during iterative computations. Vertex-centric distributed systems usually store graph data and ...
- research-article, June 2016
Holistic Influence Maximization: Combining Scalability and Efficiency with Opinion-Aware Models
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 743–758, https://doi.org/10.1145/2882903.2882929
The steady growth of graph data from social networks has resulted in wide-spread research in finding solutions to the influence maximization problem. In this paper, we propose a holistic solution to the influence maximization (IM) problem. (1) We ...
- research-article, June 2016
Truss Decomposition of Probabilistic Graphs: Semantics and Algorithms
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 77–90, https://doi.org/10.1145/2882903.2882913
A key operation in network analysis is the discovery of cohesive subgraphs. The notion of $k$-truss has gained considerable popularity in this regard, based on its rich structure and efficient computability. However, many complex networks such as social, ...
- research-article, June 2016
Learning-Based Cleansing for Indoor RFID Data
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 925–936, https://doi.org/10.1145/2882903.2882907
RFID is widely used for object tracking in indoor environments, e.g., airport baggage tracking. Analyzing RFID data offers insight into the underlying tracking systems as well as the associated business processes. However, the inherent uncertainty in ...