article

Free access

Join indices

Editor: Robert W. Taylor Author:

Patrick ValduriezAuthors Info & Claims

ACM Transactions on Database Systems (TODS), Volume 12, Issue 2

Pages 218 - 246

https://doi.org/10.1145/22952.22955

Published: 01 June 1987 Publication History

PDF eReader

Abstract

In new application areas of relational database systems, such as artificial intelligence, the join operator is used more extensively than in conventional applications. In this paper, we propose a simple data structure, called a join index, for improving the performance of joins in the context of complex queries. For most of the joins, updates to join indices incur very little overhead. Some properties of a join index are (i) its efficient use of memory and adaptiveness to parallel execution, (ii) its compatibility with other operations (including select and union), (iii) its support for abstract data type join predicates, (iv) its support for multirelation clustering, and (v) its use in representing directed graphs and in evaluating recursive queries. Finally, the analysis of the join algorithm using join indices shows its excellent performance.

References

[1]

BAYER, R., AND MCCREIGHT, E. Organization and maintenance of large ordered indexes. Acts inf. 1, 3, 1972.

Google Scholar

[2]

BITTON, D., BORAL, H., DEWITT, D. J., AND WILKINSON, W.K. Parallel algorithms for the execution of relational database operations. ACM Trans. Database Syst. 8, 3 (Sept. 1983), 324-353.

Digital Library

Google Scholar

[3]

BITTON, D., DEWITT, D. J., AND TURBYFILL, C. Benchmarking database systems: A systematic approach. In International Conference on Very Large Databases (Florence, Nov. 1983), VLDB Endowment Ed., 1983, pp. 8-19.

Digital Library

Google Scholar

[4]

BLASGEN, S. W., AND ESWARAN, Z.P. Storage and access in relational databases. IBM Syst. J. 16, 4, (1977).

Digital Library

Google Scholar

[5]

BRATBERGSENGEN, K. Hashing methods and relational algebra operations. In International Conference on Very Large Databases (Singapore, Aug. 1984), VLDB Endowment Ed., 1984, pp. 323-333.

Digital Library

Google Scholar

[6]

CODD, E.F. Extending the database relational model to capture more meaning. A CM Trans. Database Syst. 4, 4 (Dec. 1979), 397-434.

Digital Library

Google Scholar

[7]

COMER, D. The ubiquitous B-tree. Comput. Surv. 11, 2, (June 1979), 121-137.

Digital Library

Google Scholar

[8]

COPELAND, G. P., AND KHOSHAFIAN, S. A decomposition storage model. In Proceedings of ACM-SIGMOD International Conference on Management of Data (Austin, Tex., May 28-31, 1985). ACM, New York, 1985, pp. 268-279.

Digital Library

Google Scholar

[9]

DEEN, S. M. An implementation of impure surrogates. In International Conference on Very Large Databases (Mexico City, Sept. 1982), VLDB Endowment Ed., 1982, pp. 245-256.

Digital Library

Google Scholar

[10]

DEWITT, D. J., KATZ, R. H., OLKEN, F., SHAPIRO, L. D., STONEBRAKER, M. R., AND WOOD, D. Implementation techniques for main memory database systems. In SIGMOD '84: Proceedings of Annual Meeting (Boston, June 18-21, 1984). ACM, New York, 1984, pp. 1-8.

Digital Library

Google Scholar

[11]

GOTLIEB, L.R. Computing joins of relations. In A CM-SIGMOD International Conference on Management of Data (San Jose, Calif., May 14-16, 1975). ACM, New York, 1975, pp. 55-63.

Digital Library

Google Scholar

[12]

Guz'rAG, J. Abstract data types and the development of data structures. Commun. ACM, 20, 6 (June 1977), 396-404.

Digital Library

Google Scholar

[13]

HAERDER, T. Implementing a generalized access path structure for a relational database system. ACM Trans. Database Syst. 3, 3, (Sept. 1978), 285-298.

Digital Library

Google Scholar

[14]

HALL, P., OWLETT, J., AND TODD, S. J.P. Relations and entities. In Modelling in DBMS, G. Nijssen, Ed. North-Holland, Amsterdam, 1976, pp. 201-220.

Google Scholar

[15]

JABKE, M., AND SCHMIDT, J. Query processing strategies in the Pascal/R relational database management system. In ACM SIGMOD International Conference on Management of Data (Orlando, Fla., June 2-4, 1982). ACM, New York, 1982, pp. 256-264.

Digital Library

Google Scholar

[16]

KNUTH, D. The Art of Computer Programming: Sorting and Searching. Addison-Wesley, Reading, Mass., 1973.

Digital Library

Google Scholar

[17]

MISSIKOFF, M. A domain based internal schema for relational database machines. In ACM SIGMOD International Conference on Management of Data (Orlando, Fla., June 2-4, 1982). ACM, New York, 1982, pp. 215-224.

Digital Library

Google Scholar

[18]

ROUSSOPOULOS, N. View indexing in relational databases. ACM Trans. Database Syst. 7, 2 (June 1982), 258-290.

Digital Library

Google Scholar

[19]

SIMON, E., AND VALDURIEZ, P. Design and implementation of an extendible integrity subsystem. In SIGMOD 84: Proceedings o/Annual Meeting (Boston, June 18-21, 1984). ACM, New York, 1984, pp. 9-17.

Digital Library

Google Scholar

[20]

TSICHRITZIS, D. LSL: A link and selector language. In ACM SIGMOD International Conference on Management of Data (Washington, D.C., June 2-4, 1976). ACM, New York, 1976, pp. 123- 134.

Crossref

Google Scholar

[21]

VALDURIEZ, P., AND GARDARIN, G. Join and semijoin algorithms for a muitiprocessor database machine. ACM Trans. Database Syst. 9, 1 (Mar. 1984), 133-161.

Digital Library

Google Scholar

[22]

VALDURIEZ, P., AND BORAL, H. Evaluation of recursive queries using join indices. In Proceedings o{ the 1st International Con{erence on Expert Database Systems (Charleston, S.C., Apr. 1986), Benjamin/Cummings, Menlo Park, Calif., 1986, pp. 197-208.

Google Scholar

[23]

VALDURIEZ, P., KHOSHAFIAN, S., AND COPELAND, G. Implementation techniques of complex objects. In International Con{erence on Very Large Databases (Kyoto, Aug. 1986), VLDB Endowment Ed., 1986, pp. 197-208.

Digital Library

Google Scholar

[24]

Y^o, S. B. Approximating block accesses in database organizations. Commun. A CM 20, 4 (Apr. 1977), 260-261.

Digital Library

Google Scholar

Cited By

View all

Klyuchikov EPolyntsov MChizhov AMikhailova EChernishev G(2024)Hybrid Materialization in a Disk-Based Column-StoreProceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)10.1145/3632410.3632422(164-172)Online publication date: 4-Jan-2024
https://dl.acm.org/doi/10.1145/3632410.3632422
Taipalus T(2024)Database management system performance comparisonsJournal of Systems and Software10.1016/j.jss.2023.111872208:COnline publication date: 1-Feb-2024
https://dl.acm.org/doi/10.1016/j.jss.2023.111872
Galvizo GCarey M(2023)Multi-valued indexing in Apache AsterixDB (SI DOLAP 2022)Information Systems10.1016/j.is.2022.102144113:COnline publication date: 1-Jan-2023
https://dl.acm.org/doi/10.1016/j.is.2022.102144
Show More Cited By

Index Terms

Recommendations

Multi-table joins through bitmapped join indices

This technical note shows how to combine some well-known techniques to create a method that will efficiently execute common multi-table joins. We concentrate on a commonly occurring type of join known as a star-join, although the method presented will ...
Mediator join indices
RIDE '97: Proceedings of the 7th International Workshop on Research Issues in Data Engineering (RIDE '97) High Performance Database Management for Large-Scale Applications

A mediator join index (MJI) is proposed to speed up N-way inter-database joins by reducing the amount of data transfer during evaluation. A family of algorithms, the query scrubbing algorithms (QSA), are developed to maintain MJI and to evaluate queries ...
Multi-way spatial join selectivity for the ring join graph

Efficient spatial query processing is very important since the applications of the spatial DBMS (e.g. GIS, CAD/CAM, LBS) handle massive amount of data and consume much time. Many spatial queries contain the multi-way spatial join due to the fact that ...

Reviews

Reviewer: Robert J. Tufts

Since its inception, relational database technology has been plagued with operational efficiency problems for large amounts of data and for complex queries. Much of the operational inefficiency can be attributed to the so-called 80-20 rule (80 percent of the processing is done on 20 percent of the data). Conventional (business) database management systems take advantage of this rule by using indices, linked lists, and built-in data relationships to optimize 80 percent of normal processing. Relational DBMSs, however, have not been able to do this because such optimizing shortcuts violate the mathematical foundation of relational technology and lead to an infinite number of “special case” implementation headaches. Because of join indices, this may not be the case any longer. In this paper, Valduriez proposes a strong argument and the mathematical foundation for including join indices into relational database technology. Join indices will definitely speed up almost all relational database processing. A join index is a binary relation that ties together two other relations into a bidirectional parent-child relationship. In effect, join indices can be used to predefine tree and network data relationships for the most important and/or most often used joins. Join indices are small (containing only two attributes), can be sorted and clustered easily, and can be implemented using inverted indices for secondary key access. As such, they form a tremendous acceleration mechanism for optimizing joins. The basic purpose of this paper is to propose an efficient implementation of join indices, give algorithms for joins and updates, and show the superior performance that join indices give to complex queries. The author admirably fulfills this purpose with a well-written, easily-understood tutorial. The paper contains good examples to explain each algorithm, rigorous proofs of most formulas, and graphs showing performance times against the University of Wisconsin benchmarks. The author concludes that join indices should be employed whenever possible in the implementation of relational DBMSs. The best feature of the presentation is the reference to “real world” operational problems and how join indices provide similar benefits to relational systems that inverted indices, clustered indices, B+ trees, and linked lists provide to CODASYL and hierarchical DBMSs. Perhaps the only shortcoming is the limited amount of performance data plotted against clock times. This would have been a major selling point when compared against other techniques using the Wisconsin benchmarks. The author provides performance statistics relative to the hybrid hash technique and also mentions that additional performance measurements are being conducted. The length of the paper is about right for the discussion given, and the references are up to date. The writing style is aimed at a level such that those with only a few years of experience in relational systems can quickly grasp the concepts. It also contains enough technical discussions to whet the appetite of the most ardent researcher. Overall, this outstanding paper will certainly become a classic reference for relational DBMS vendors, professionals, and researchers in the years to come. I fully expect the concept of join indices to be implemented in most commercially available packages in the near future.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems

ACM Transactions on Database Systems Volume 12, Issue 2

June 1987

185 pages

ISSN:0362-5915

EISSN:1557-4644

DOI:10.1145/22952

Editor:
Robert W. Taylor

Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1987

Published in TODS Volume 12, Issue 2

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

447
Total Citations
View Citations
1,714
Total Downloads

Downloads (Last 12 months)134
Downloads (Last 6 weeks)14

Reflects downloads up to 04 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Klyuchikov EPolyntsov MChizhov AMikhailova EChernishev G(2024)Hybrid Materialization in a Disk-Based Column-StoreProceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)10.1145/3632410.3632422(164-172)Online publication date: 4-Jan-2024
https://dl.acm.org/doi/10.1145/3632410.3632422
Taipalus T(2024)Database management system performance comparisonsJournal of Systems and Software10.1016/j.jss.2023.111872208:COnline publication date: 1-Feb-2024
https://dl.acm.org/doi/10.1016/j.jss.2023.111872
Galvizo GCarey M(2023)Multi-valued indexing in Apache AsterixDB (SI DOLAP 2022)Information Systems10.1016/j.is.2022.102144113:COnline publication date: 1-Jan-2023
https://dl.acm.org/doi/10.1016/j.is.2022.102144
(2023)BibliographyData Mining10.1016/B978-0-12-811760-6.00024-2(681-734)Online publication date: 2023
https://doi.org/10.1016/B978-0-12-811760-6.00024-2
Han JPei JTong H(2023)Data warehousing and online analytical processingData Mining10.1016/B978-0-12-811760-6.00013-8(85-144)Online publication date: 2023
https://doi.org/10.1016/B978-0-12-811760-6.00013-8
Firsov MPolyntsov MSmirnov KChernishev G(2023)Finding a Second Wind: Speeding Up Graph Traversal Queries in RDBMSs Using Column-Oriented ProcessingModel and Data Engineering10.1007/978-3-031-49333-1_14(186-199)Online publication date: 2-Nov-2023
https://dl.acm.org/doi/10.1007/978-3-031-49333-1_14
Uta AGhit BDave ARellermeyer JBoncz P(2022)In-Memory Indexed Caching for Distributed Data Processing2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS53621.2022.00019(104-114)Online publication date: May-2022
https://doi.org/10.1109/IPDPS53621.2022.00019
Polyntsov MGrigorev VSmirnov KChernishev G(2022)Implementing the Comparison-Based External SortNew Trends in Database and Information Systems10.1007/978-3-031-15743-1_46(500-511)Online publication date: 29-Aug-2022
https://doi.org/10.1007/978-3-031-15743-1_46
Ceballos ORamírez Restrepo CPabón MCastillo ACorcho O(2021)SPARQL2Flink: Evaluation of SPARQL Queries on Apache FlinkApplied Sciences10.3390/app1115703311:15(7033)Online publication date: 30-Jul-2021
https://doi.org/10.3390/app11157033
Kläbe SSattler KBaumann S(2021)PatchIndex: exploiting approximate constraints in distributed databasesDistributed and Parallel Databases10.1007/s10619-021-07326-139:3(833-853)Online publication date: 1-Sep-2021
https://dl.acm.org/doi/10.1007/s10619-021-07326-1
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Abstract

References

Cited By

Index Terms

Recommendations

Multi-table joins through bitmapped join indices

Mediator join indices

Multi-way spatial join selectivity for the ring join graph

Reviews

Access critical reviews of Computing literature here

Comments

Information

Published In

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Get Access

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations