Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Join indices

Published: 01 June 1987 Publication History

Abstract

In new application areas of relational database systems, such as artificial intelligence, the join operator is used more extensively than in conventional applications. In this paper, we propose a simple data structure, called a join index, for improving the performance of joins in the context of complex queries. For most of the joins, updates to join indices incur very little overhead. Some properties of a join index are (i) its efficient use of memory and adaptiveness to parallel execution, (ii) its compatibility with other operations (including select and union), (iii) its support for abstract data type join predicates, (iv) its support for multirelation clustering, and (v) its use in representing directed graphs and in evaluating recursive queries. Finally, the analysis of the join algorithm using join indices shows its excellent performance.

References

[1]
BAYER, R., AND MCCREIGHT, E. Organization and maintenance of large ordered indexes. Acts inf. 1, 3, 1972.
[2]
BITTON, D., BORAL, H., DEWITT, D. J., AND WILKINSON, W.K. Parallel algorithms for the execution of relational database operations. ACM Trans. Database Syst. 8, 3 (Sept. 1983), 324-353.
[3]
BITTON, D., DEWITT, D. J., AND TURBYFILL, C. Benchmarking database systems: A systematic approach. In International Conference on Very Large Databases (Florence, Nov. 1983), VLDB Endowment Ed., 1983, pp. 8-19.
[4]
BLASGEN, S. W., AND ESWARAN, Z.P. Storage and access in relational databases. IBM Syst. J. 16, 4, (1977).
[5]
BRATBERGSENGEN, K. Hashing methods and relational algebra operations. In International Conference on Very Large Databases (Singapore, Aug. 1984), VLDB Endowment Ed., 1984, pp. 323-333.
[6]
CODD, E.F. Extending the database relational model to capture more meaning. A CM Trans. Database Syst. 4, 4 (Dec. 1979), 397-434.
[7]
COMER, D. The ubiquitous B-tree. Comput. Surv. 11, 2, (June 1979), 121-137.
[8]
COPELAND, G. P., AND KHOSHAFIAN, S. A decomposition storage model. In Proceedings of ACM-SIGMOD International Conference on Management of Data (Austin, Tex., May 28-31, 1985). ACM, New York, 1985, pp. 268-279.
[9]
DEEN, S. M. An implementation of impure surrogates. In International Conference on Very Large Databases (Mexico City, Sept. 1982), VLDB Endowment Ed., 1982, pp. 245-256.
[10]
DEWITT, D. J., KATZ, R. H., OLKEN, F., SHAPIRO, L. D., STONEBRAKER, M. R., AND WOOD, D. Implementation techniques for main memory database systems. In SIGMOD '84: Proceedings of Annual Meeting (Boston, June 18-21, 1984). ACM, New York, 1984, pp. 1-8.
[11]
GOTLIEB, L.R. Computing joins of relations. In A CM-SIGMOD International Conference on Management of Data (San Jose, Calif., May 14-16, 1975). ACM, New York, 1975, pp. 55-63.
[12]
Guz'rAG, J. Abstract data types and the development of data structures. Commun. ACM, 20, 6 (June 1977), 396-404.
[13]
HAERDER, T. Implementing a generalized access path structure for a relational database system. ACM Trans. Database Syst. 3, 3, (Sept. 1978), 285-298.
[14]
HALL, P., OWLETT, J., AND TODD, S. J.P. Relations and entities. In Modelling in DBMS, G. Nijssen, Ed. North-Holland, Amsterdam, 1976, pp. 201-220.
[15]
JABKE, M., AND SCHMIDT, J. Query processing strategies in the Pascal/R relational database management system. In ACM SIGMOD International Conference on Management of Data (Orlando, Fla., June 2-4, 1982). ACM, New York, 1982, pp. 256-264.
[16]
KNUTH, D. The Art of Computer Programming: Sorting and Searching. Addison-Wesley, Reading, Mass., 1973.
[17]
MISSIKOFF, M. A domain based internal schema for relational database machines. In ACM SIGMOD International Conference on Management of Data (Orlando, Fla., June 2-4, 1982). ACM, New York, 1982, pp. 215-224.
[18]
ROUSSOPOULOS, N. View indexing in relational databases. ACM Trans. Database Syst. 7, 2 (June 1982), 258-290.
[19]
SIMON, E., AND VALDURIEZ, P. Design and implementation of an extendible integrity subsystem. In SIGMOD 84: Proceedings o/Annual Meeting (Boston, June 18-21, 1984). ACM, New York, 1984, pp. 9-17.
[20]
TSICHRITZIS, D. LSL: A link and selector language. In ACM SIGMOD International Conference on Management of Data (Washington, D.C., June 2-4, 1976). ACM, New York, 1976, pp. 123- 134.
[21]
VALDURIEZ, P., AND GARDARIN, G. Join and semijoin algorithms for a muitiprocessor database machine. ACM Trans. Database Syst. 9, 1 (Mar. 1984), 133-161.
[22]
VALDURIEZ, P., AND BORAL, H. Evaluation of recursive queries using join indices. In Proceedings o{ the 1st International Con{erence on Expert Database Systems (Charleston, S.C., Apr. 1986), Benjamin/Cummings, Menlo Park, Calif., 1986, pp. 197-208.
[23]
VALDURIEZ, P., KHOSHAFIAN, S., AND COPELAND, G. Implementation techniques of complex objects. In International Con{erence on Very Large Databases (Kyoto, Aug. 1986), VLDB Endowment Ed., 1986, pp. 197-208.
[24]
Y^o, S. B. Approximating block accesses in database organizations. Commun. A CM 20, 4 (Apr. 1977), 260-261.

Cited By

View all

Recommendations

Reviews

Robert J. Tufts

Since its inception, relational database technology has been plagued with operational efficiency problems for large amounts of data and for complex queries. Much of the operational inefficiency can be attributed to the so-called 80-20 rule (80 percent of the processing is done on 20 percent of the data). Conventional (business) database management systems take advantage of this rule by using indices, linked lists, and built-in data relationships to optimize 80 percent of normal processing. Relational DBMSs, however, have not been able to do this because such optimizing shortcuts violate the mathematical foundation of relational technology and lead to an infinite number of “special case” implementation headaches. Because of join indices, this may not be the case any longer. In this paper, Valduriez proposes a strong argument and the mathematical foundation for including join indices into relational database technology. Join indices will definitely speed up almost all relational database processing. A join index is a binary relation that ties together two other relations into a bidirectional parent-child relationship. In effect, join indices can be used to predefine tree and network data relationships for the most important and/or most often used joins. Join indices are small (containing only two attributes), can be sorted and clustered easily, and can be implemented using inverted indices for secondary key access. As such, they form a tremendous acceleration mechanism for optimizing joins. The basic purpose of this paper is to propose an efficient implementation of join indices, give algorithms for joins and updates, and show the superior performance that join indices give to complex queries. The author admirably fulfills this purpose with a well-written, easily-understood tutorial. The paper contains good examples to explain each algorithm, rigorous proofs of most formulas, and graphs showing performance times against the University of Wisconsin benchmarks. The author concludes that join indices should be employed whenever possible in the implementation of relational DBMSs. The best feature of the presentation is the reference to “real world” operational problems and how join indices provide similar benefits to relational systems that inverted indices, clustered indices, B+ trees, and linked lists provide to CODASYL and hierarchical DBMSs. Perhaps the only shortcoming is the limited amount of performance data plotted against clock times. This would have been a major selling point when compared against other techniques using the Wisconsin benchmarks. The author provides performance statistics relative to the hybrid hash technique and also mentions that additional performance measurements are being conducted. The length of the paper is about right for the discussion given, and the references are up to date. The writing style is aimed at a level such that those with only a few years of experience in relational systems can quickly grasp the concepts. It also contains enough technical discussions to whet the appetite of the most ardent researcher. Overall, this outstanding paper will certainly become a classic reference for relational DBMS vendors, professionals, and researchers in the years to come. I fully expect the concept of join indices to be implemented in most commercially available packages in the near future.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 12, Issue 2
June 1987
185 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/22952
  • Editor:
  • Robert W. Taylor
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1987
Published in TODS Volume 12, Issue 2

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)134
  • Downloads (Last 6 weeks)14
Reflects downloads up to 04 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Hybrid Materialization in a Disk-Based Column-StoreProceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)10.1145/3632410.3632422(164-172)Online publication date: 4-Jan-2024
  • (2024)Database management system performance comparisonsJournal of Systems and Software10.1016/j.jss.2023.111872208:COnline publication date: 1-Feb-2024
  • (2023)Multi-valued indexing in Apache AsterixDB (SI DOLAP 2022)Information Systems10.1016/j.is.2022.102144113:COnline publication date: 1-Jan-2023
  • (2023)BibliographyData Mining10.1016/B978-0-12-811760-6.00024-2(681-734)Online publication date: 2023
  • (2023)Data warehousing and online analytical processingData Mining10.1016/B978-0-12-811760-6.00013-8(85-144)Online publication date: 2023
  • (2023)Finding a Second Wind: Speeding Up Graph Traversal Queries in RDBMSs Using Column-Oriented ProcessingModel and Data Engineering10.1007/978-3-031-49333-1_14(186-199)Online publication date: 2-Nov-2023
  • (2022)In-Memory Indexed Caching for Distributed Data Processing2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS53621.2022.00019(104-114)Online publication date: May-2022
  • (2022)Implementing the Comparison-Based External SortNew Trends in Database and Information Systems10.1007/978-3-031-15743-1_46(500-511)Online publication date: 29-Aug-2022
  • (2021)SPARQL2Flink: Evaluation of SPARQL Queries on Apache FlinkApplied Sciences10.3390/app1115703311:15(7033)Online publication date: 30-Jul-2021
  • (2021)PatchIndex: exploiting approximate constraints in distributed databasesDistributed and Parallel Databases10.1007/s10619-021-07326-139:3(833-853)Online publication date: 1-Sep-2021
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media