Mining quantitative association rules in large relational tables
We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We ...
Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization
We discuss data mining based on association rules for two numeric attributes and one Boolean attribute. For example, in a database of bank customers, "Age" and "Balance" are two numeric attributes, and "CardLoan" is a Boolean attribute. Taking the pair (...
IDEA: interactive data exploration and analysis
The analysis of business data is often an ill-defined task characterized by large amounts of noisy data. Because of this, business data analysis must combine two kinds of intertwined tasks: exploration and analysis. Exploration is the process of finding ...
Rapid bushy join-order optimization with Cartesian products
Query optimizers often limit the search space for join orderings, for example by excluding Cartesian products in subplans or by restricting plan trees to left-deep vines. Such exclusions are widely assumed to reduce optimization effort while minimally ...
SQL query optimization: reordering for a general class of queries
The strength of commercial query optimizers like DB2 comes from their ability to select an optimal order by generating all equivalent reorderings of binary operators. However, there are no known methods to generate all equivalent reorderings for a SQL ...
Fundamental techniques for order optimization
Decision support applications are growing in popularity as more business data is kept on-line. Such applications typically include complex SQL queries that can test a query optimizer's ability to produce an efficient access plan. Many access plan ...
A Teradata content-based multimedia object manager for massively parallel architectures
- W. O'Connell,
- I. T. Ieong,
- D. Schrader,
- C. Watson,
- G. Au,
- A. Biliris,
- S. Choo,
- P. Colin,
- G. Linderman,
- E. Panagos,
- J. Wang,
- T. Walter
The Teradata Multimedia Object Manager is a general-purpose content analysis multimedia server designed for symmetric multiprocessing and massively parallel processing environments. The Multimedia Object Manager defines and manipulates user-defined ...
Fault-tolerant architectures for continuous media servers
Continuous media servers that provide support for the storage and retrieval of continuous media data (e.g., video, audio) at guaranteed rates are becoming increasingly important. Such servers, typically, rely on several disks to service a large number ...
Optimizing queries over multimedia repositories
Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A selection on these attributes will typically produce not just a set of objects, as in the traditional relational query model (...
BIRCH: an efficient data clustering method for very large databases
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work ...
On-line reorganization of sparsely-populated B+-trees
In this paper, we present an efficient method to do online reorganization of sparsely-populated B+-trees. It reorganizes the leaves first, compacting in short operations groups of leaves with the same parent. After compacting, optionally, the new leaves ...
Two techniques for on-line index modification in shared nothing parallel databases
Whenever data is moved across nodes in the parallel database system, the indexes need to be modified too. Index modification overhead can be quite severe because there can be a large number of indexes on a relation. In this paper, we study two ...
Query caching and optimization in distributed mediator systems
Query processing and optimization in mediator systems that access distributed non-proprietary sources pose many novel problems. Cost-based query optimization is hard because the mediator does not have access to source statistics information and ...
Performance tradeoffs for client-server query processing
The construction of high-performance database systems that combine the best aspects of the relational and object-oriented approaches requires the design of client-server architectures that can fully exploit client and server resources in a flexible ...
Data access for the masses through OLE DB
This paper presents an overview of OLE DB, a set of interfaces being developed at Microsoft whose goal is to enable applications to have uniform access to data stored in DBMS and non-DBMS information containers. Applications will be able to take ...
The dangers of replication and a solution
Update anywhere-anytime-anyway transactional replication has unstable behavior as the workload scales up: a ten-fold increase in nodes and traffic gives a thousand fold increase in deadlocks or reconciliations. Master copy replication (primary copy) ...
Hot mirroring: a method of hiding parity update penalty and degradation during rebuilds for RAID5
This paper proposes a storage management scheme for disk arrays, named hot mirroring. In this scheme, storage space is partitioned into two regions. One is the mirrored region, which is characterized by high performance and low storage efficiency. The ...
Random I/O scheduling in online tertiary storage systems
New database applications that require the storage and retrieval of many terabytes of data are reaching the limits for disk-based storage systems, in terms of both cost and scalability. These limits provide a strong incentive for the development of ...
Implementing data cubes efficiently
Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view ...
Providing better support for a class of decision support queries
Relational database systems do not effectively support complex queries containing quantifiers (quantified queries) that are increasingly becoming important in decision support applications. Generalized quantifiers provide an effective way of expressing ...
A query language for multidimensional arrays: design, implementation, and optimization techniques
While much recent research has focussed on extending databases beyond the traditional relational model, relatively little has been done to develop database tools for querying data organized in (multidimensional) arrays. The scientific computing ...
A super scalar sort algorithm for RISC processors
The compare and branch sequences required in a traditional sort algorithm can not efficiently exploit multiple execution units present in currently available high performance RISC processors. This is because of the long latency of the compare ...
Spatial hash-joins
We examine how to apply the hash-join paradigm to spatial joins, and define a new framework for spatial hash-joins. Our spatial partition functions have two components: a set of bucket extents and an assignment function, which may map a data item into ...
Partition based spatial-merge join
This paper describes PBSM (Partition Based Spatial-Merge), a new algorithm for performing spatial join operation. This algorithm is especially effective when neither of the inputs to the join have an index on the joining attribute. Such a situation ...
Bifocal sampling for skew-resistant join size estimation
This paper introduces bifocal sampling, a new technique for estimating the size of an equi-join of two relations. Bifocal sampling classifies tuples in each relation into two groups, sparse and dense, based on the number of tuples with the same join ...
Estimating alphanumeric selectivity in the presence of wildcards
Success of commercial query optimizers and database management systems (object-oriented or relational) depend on accurate cost estimation of various query reordering [BGI]. Estimating predicate selectivity, or the fraction of rows in a database that ...
Improved histograms for selectivity estimation of range predicates
Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never ...
Structures for manipulating proposed updates in object-oriented databases
Support for virtual states and deltas between them is useful for a variety of database applications, including hypothetical database access, version management, simulation, and active databases. The Heraclitus paradigm elevates delta values to be "first-...
Safe and efficient sharing of persistent objects in Thor
Thor is an object-oriented database system designed for use in a heterogeneous distributed environment. It provides highly-reliable and highly-available persistent storage for objects, and supports safe sharing of these objects by applications written ...
An open abstract-object storage system
Database systems must become more open to retain their relevance as a technology of choice and necessity. Openness implies not only databases exporting their data, but also exporting their services. This is as true in classical application areas as in ...