Finding persistent items in data streams
Frequent item mining, which deals with finding items that occur frequently in a given data stream over a period of time, is one of the heavily studied problems in data stream mining. A generalized version of frequent item mining is the persistent item ...
BlueCache: a scalable distributed flash-based key-value store
A key-value store (KVS), such as memcached and Redis, is widely used as a caching layer to augment the slower persistent backend storage in data centers. DRAM-based KVS provides fast key-value access, but its scalability is limited by the cost, power ...
A general and parallel platform for mining co-movement patterns over large-scale trajectories
Discovering co-movement patterns from large-scale trajectory databases is an important mining task and has a wide spectrum of applications. Previous studies have identified several types of interesting co-movement patterns and showcased their ...
VIP-Tree: an effective index for indoor spatial queries
Due to the growing popularity of indoor location-based services, indoor data management has received significant research attention in the past few years. However, we observe that the existing indexing and query processing techniques for the indoor ...
Write-behind logging
The design of the logging and recovery components of database management systems (DBMSs) has always been influenced by the difference in the performance characteristics of volatile (DRAM) and non-volatile storage devices (HDD/SSDs). The key assumption ...
The TileDB array data storage manager
We present a novel storage manager for multi-dimensional arrays that arise in scientific applications, which is part of a larger scientific data management system called TileDB. In contrast to existing solutions, TileDB is optimized for both dense and ...
DOCS: a domain-aware crowdsourcing system using knowledge bases
Crowdsourcing is a new computing paradigm that harnesses human effort to solve computer-hard problems, such as entity resolution and photo tagging. The crowd (or workers) have diverse qualities and it is important to effectively model a worker's ...
Lifting the haze off the cloud: a consumer-centric market for database computation in the cloud
The availability of public computing resources in the cloud has revolutionized data analysis, but requesting cloud resources often involves complex decisions for consumers. Estimating the completion time and cost of a computation and requesting the ...
Two birds, one stone: a fast, yet lightweight, indexing scheme for modern database systems
Classic database indexes (e.g., B+-Tree), though they speed up queries, suffer from two main drawbacks: (1) An index usually yields 5% to 15% additional storage overhead, which results in non-negligible dollar cost in big data scenarios especially when ...
History is a mirror to the future: best-effort approximate complex event matching with insufficient resources
Complex event processing (CEP) has proven to be a highly relevant topic in practice. As it is sensitive to both errors in the stream and uncertainty in the pattern, approximate complex event processing (ACEP) is an important direction but has not been ...
Persistent hybrid transactional memory for databases
Processors with hardware support for transactional memory (HTM) are rapidly becoming commonplace, and processor manufacturers are currently working on implementing support for upcoming non-volatile memory (NVM) technologies. The combination of HTM and ...
Skipping-oriented partitioning for columnar layouts
As data volumes continue to grow, modern database systems increasingly rely on data skipping mechanisms to improve performance by avoiding access to irrelevant data. Recent work [39] proposed a fine-grained partitioning scheme that was shown to improve ...
Estimating quantiles from the union of historical and streaming data
Modern enterprises generate huge amounts of streaming data, for example, micro-blog feeds, financial data, network monitoring and industrial application monitoring. While Data Stream Management Systems have proven successful in providing support for ...
Clay: fine-grained adaptive partitioning for general database schemas
Transaction processing database management systems (DBMSs) are critical for today's data-intensive applications because they enable an organization to quickly ingest and query new information. Many of these applications exceed the capabilities of a ...
Effortless data exploration with zenvisage: an expressive and interactive visual analytics system
Data visualization is by far the most commonly used mechanism to explore and extract insights from datasets, especially by novice data scientists. And yet, current visual analytics tools are rather limited in their ability to operate on collections of ...