article

Free access

Data allocation in distributed database systems

Editor: Gio Wiederhold Author:

Peter M. G. ApersAuthors Info & Claims

ACM Transactions on Database Systems (TODS), Volume 13, Issue 3

Pages 263 - 304

https://doi.org/10.1145/44498.45063

Published: 01 September 1988 Publication History

PDF eReader

Abstract

The problem of allocating the data of a database to the sites of a communication network is investigated. This problem deviates from the well-known file allocation problem in several aspects. First, the objects to be allocated are not known a priori; second, these objects are accessed by schedules that contain transmissions between objects to produce the result. A model that makes it possible to compare the cost of allocations is presented; the cost can be computed for different cost functions and for processing schedules produced by arbitrary query processing algorithms.

For minimizing the total transmission cost, a method is proposed to determine the fragments to be allocated from the relations in the conceptual schema and the queries and updates executed by the users.

For the same cost function, the complexity of the data allocation problem is investigated. Methods for obtaining optimal and heuristic solutions under various ways of computing the cost of an allocation are presented and compared.

Two different approaches to the allocation management problem are presented and their merits are discussed.

References

[1]

ADIBA, M., CHUPIN, J. C., DEMOLOMBE, R., GARDARIN, G., AND BIHAN, J. L. Issues in distributed data base management systems: A technical overview. In Proceedings of the 4th International Conference on Very Large Data Bases (West Berlin, Sept. 1978), pp. 89-110.

Google Scholar

[2]

ADIBA, M. E., AND LINDSAY, B.G. Database snapshots, in Proceedings of Conference on Very Large Data Bases (Montreal, Oct. 1980). pp. 86-91.

Google Scholar

[3]

ADIBA, M.E. Derived relations: A unified mechanism for views, snapshots and distributed data. In Proceedings of 7th International Conference on Very Large Data Bases (Cannes, Sept. 1981). pp. 293-305.

Google Scholar

[4]

AHO, A. V., HOPCROFT, J. E., AND ULLMAN, J. D. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.

Crossref

Google Scholar

[5]

APERS, P. M. G. Data allocation and distributed query processing. In Proceedings of A CM Pacific '80 (San Francisco, Nov. 1980). ACM, New York, 1980, pp. 48-54.

Google Scholar

[6]

APERS, P. M.G. Redundant allocation of relations in a communication network. In Proceedings of 5th Berkeley Workshop on Distributed Data Management and Computer Networks (San Francisco, Feb. 1981). pp. 245-258.

Google Scholar

[7]

APERS, P. M.G. Centralized or decentralized data allocation. In Proceedings of 2nd Seminar on Distributed Data Sharing Systems (Amsterdam, June 1981). R. P. van de Riet and W. Litwin Eds. North-Holland, Amsterdam, 1981, pp. 101-116.

Google Scholar

[8]

APERS, P. M.G. Query processing and data allocation in distributed database systems. Ph.D. dissertation, Computer Science Dept., Vrije Univ., Amsterdam, Sept. 1982.

Google Scholar

[9]

APERS, P. M. G., HEVNER, A. R., AND YAO, S.B. Optimization algorithms for distributed queries. IEEE Trans. Softw. Eng. SE-9, 1 (Jan. 1983), 57-68.

Google Scholar

[10]

BALDISSERA, C., BRACCHI, G., AND CERI, S. A query processing strategy for distributed data bases. In Proceedings of the European Conference on Applied Information Technology, EURO- IFIP 1979 (London, Sept. 1979). North-Holland, Amsterdam, 1979, pp. 667-677.

Google Scholar

[11]

BERNSTEIN, P. A., GOODMAN, N., WONG, E., REEVE, C. L., AND ROTHNIE, J. B. Query processing in a system for distributed databases (SDD-1). ACM Trans. Database Syst. 6, 4 (Dec. 1981), 602-625.

Crossref

Google Scholar

[12]

CASEY, R.G. Allocation of copies of files in an information network, in Proceedings of AFIPS 1972 SJCC, vol. 40. AFIPS Press, 1972, pp. 617-625.

Google Scholar

[13]

CERI, S., MARTELLA, G., AND PELAGATTI, G. Optimal file allocation for a distributed database on a network of minicomputers. In Proceedings of International Conference on Databases (Aberdeen, July 1980). Deen and Hammerlsey, Eds., Hayden, 1980.

Google Scholar

[14]

CERI, S., NAVATHE, S., AND WIEDERHOLD, G. Distribution design of logical database schemas. IEEE Trans. Softw. Eng. SE-9, 4 {July 1983), 487-563.

Google Scholar

[15]

CERI, S., NEGRI, M., ANU PELAGATTI, G. Horizontal data partitioning in database design. In Proceedings of ACM-SIGMOD Conference (Orlando, Fla., June 1982). ACM, New York, 1982.

Crossref

Google Scholar

[16]

CERL S., AND PELAGATTI, G. Distributed Database--Principles and Systems. McGraw-Hill, New York, 1984.

Crossref

Google Scholar

[17]

CHU, W.W. Optimal file allocation in a multiple-computer information system. IEEE Trans. Comput. C-18 (1969), 885-889.

Google Scholar

[18]

CHU, W. W. Optimal file allocation in a computer network. In Computer-Communication Networks. N. Abrarnson and F. F. Kuo, Eds. Prentice-Hall, Englewood Cliffs, N.J., 1973, pp. 83-94.

Google Scholar

[19]

COOK, S.A. The complexity of theorem-proving procedures. In Proceedings of 3rd Annual ACM Symposium on Theory of Computing. ACM, New York, 1971, pp. 151-158.

Crossref

Google Scholar

[20]

EPSTEIN, R., STONEBRAKER, M. R., AND WONG, E. Distributed query processing in a relational data base system. In Proceedings of ACM-SIGMOD (Boston, May 1979). ACM, New York, 1979, pp. 169-180.

Crossref

Google Scholar

[21]

ESWARAN, K.P. Placement of records in a file and file allocation in a computer network. In Proceedings of the IFIP Congress on Information Processing 1974. North-Holland, Amsterdam, 1974, pp. 304-307.

Google Scholar

[22]

GAREY, M. R., AND JOHNSON, D.S. Computers and Intractibility: A Guide to the Theory of NP- Completeness. Freeman, San Francisco, 1979.

Crossref

Google Scholar

[23]

HEVNER, A. R., AND YAO, S.B. Query processing in distributed database systems. IEEE Trans. Softw. Eng. SE-5, 3 (May 1979), 177-187.

Google Scholar

[24]

HEVNER, A. R. Data allocation and retrieval in distributed systems. In Advances in Data Management, vol. II. Wiley, New York, 1983.

Google Scholar

[25]

HOROWITZ, E., AND SAHNI, S. Fundamentals of Computer Algorithms. Computer Science Press, Rockville, Md., 1978.

Crossref

Google Scholar

[26]

IN-SuP PAIK, AND DELOBEL, C. A strategy for optimizing the distributed query processing. In Proceedings of First International Conference on Distributed Computing Systems (Huntsville, Ala., Oct. 1979). IEEE, New York, 1979, pp. 686-698.

Google Scholar

[27]

Lawler, E. L., and Wood, D.E. Branch-and-bound methods: A survey. Oper. Res. 14, 4 (July 1966), 699-719.

Google Scholar

[28]

LEVIN, K.D. Organizing distributed data bases in computer networks. Ph.D. dissertation, The Wharton School, Univ. of Pennsylvania, Philadelphia, Sept. 1974.

Crossref

Google Scholar

[29]

LEVIN, K. D., AND MORGAN, S.L. Optimizing distributed databases--A framework for research. In Proceedings of 1975 AFIPS NCC, vol. 44. AFIPS Press, 1975, pp. 473-478.

Google Scholar

[30]

MAHMOUD, S., AND RIORDON, J.S. Optimal allocation of resources in distributed information networks. ACM Trans. Database Syst. 1, 1 (March 1976), 66-78.

Crossref

Google Scholar

[31]

NAVATHE, S., CERI, S., WIEDERHOLD, G., AND DOU, J. Vertical partitioning algorithms for database design. ACM Trans. Database Syst. 9, 4 (Dec. 1984), 680-710.

Crossref

Google Scholar

[32]

NGUYEN, GXA TOAN. Decentralized dynamic query decomposition for distributed database systems. In Proceedings of ACM Pacific '80 (San Francisco, Nov. 1980). ACM, New York, 1980, pp. 55-60.

Google Scholar

[33]

NILSSON, N.J. Problem-Solving Methods in Artificial Intelligence. McGraw-Hill, New York, 1971.

Crossref

Google Scholar

[34]

PELAGATTI, G., AND SCHREIBER, F.A. A model of an access strategy in a distributed database. In Proceedings of the Working Conference on Data Base Architecture, IFIP-TC2, Data Base Architecture (Venice, June 1979). Bracchi and Nijssen, Eds., North-Holland, Amsterdam, 1979.

Google Scholar

[35]

POHL, I. Is heuristic search really branch-and-bound? In Proceedings of 6th Princeton IEEE Symposium on Information Sciences and Systems (March 1972), pp. 370-373.

Google Scholar

[36]

RAMAMOORTHY, C. V., AND WAH, B.W. The placement of relations on a distributed relational database. In Proceedings of the 1st International Conference on Distributed Computing Systems (Huntsville, Ala., Oct. 1979). IEEE, New York, 1979, pp. 642-650.

Google Scholar

[37]

ROTHNIE, J. B., AND GOODMAN, N. A survey of research and development in distributed database management. In Proceedings of 3rd International Conference on Very Large Data Bases (Tokyo, Oct. 1977). pp. 48-62.

Google Scholar

[38]

SACCA, D., AND WIEDERHOLD, G. Database partitioning in a cluster of processors. ACM Trans. Database Syst. 10, 1 (March 1985), 29-56.

Crossref

Google Scholar

[39]

SELINGER, P. G., AND ADIBA, M.E. Access path selections in distributed data base management systems. In Proceedings of International Conference on Databases (Aberdeen, July 1980), pp. 204-215.

Google Scholar

[40]

TOTH, K. C., MAHMOUD, S. A., AND RIORDON, J.S. Query processing strategies in a distributed database architecture. In Proceedings of 2nd Seminar on Distributed Data Sharing Systems (Amsterdam, June 1981). R. P. van de Riet and W. Litwin, Eds. North-Holland, Amsterdam, 1981.

Google Scholar

[41]

WAH, B. W., AND LIEN, Y.-N. Design of distributed databases on local computer systems with a multiaccess network. IEEE Trans. Softw. Eng. SE-11, 7 (July 1985), 606-619.

Google Scholar

[42]

WONG, E. Retrieving dispersed data from SDD-I: A system for distributed data bases. In Proceedings of 2nd Berkeley Workshop on Distributed Data Management and Computer Networks (San Francisco, May 1977), pp. 217-235.

Google Scholar

[43]

Yv, C. T., AND CHANG, C. C. Distributed query processing. ACM Comput. Surv. 16, 4 (Dec. 1984), 399-433.

Crossref

Google Scholar

[44]

Yu, C. T., LAM, K., CHANG, C. C., AND CHANG, S.K. A promising approach to distributed query processing. In Proceedings of Berkeley Conference on Distributed Data Bases (San Francisco, Feb. 1982), pp. 363-390.

Google Scholar

Cited By

View all

Kadam SKim D(2024)Knowledge-Aware Semantic Communication System Design and Data AllocationIEEE Transactions on Vehicular Technology10.1109/TVT.2023.333335073:4(5755-5769)Online publication date: Apr-2024
https://doi.org/10.1109/TVT.2023.3333350
Reznichenko V(2022)60 Years of Databases (part three)PROBLEMS IN PROGRAMMING10.15407/pp2022.01.034(034-066)Online publication date: Mar-2022
https://doi.org/10.15407/pp2022.01.034
Burckhardt SChandramouli BGillum CJusto DKallas KMcMahon CMeiklejohn CZhu X(2022)NetheriteProceedings of the VLDB Endowment10.14778/3529337.352934415:8(1591-1604)Online publication date: 22-Jun-2022
https://dl.acm.org/doi/10.14778/3529337.3529344
Show More Cited By

Recommendations

Allocation of processes and files for load balancing in distributed systems
Incremental data allocation and reallocation in distributed database systems
Data warehousing and web engineering

In a distributed database system, an increase in workload typically necessitates the installation of additional database servers followed by the implementation of expensive data reorganization strategies. We present the Partial REALLOCATE and Full ...
A data allocation considering data availability in distributed database systems
ICPADS '97: Proceedings of the 1997 International Conference on Parallel and Distributed Systems

In designing distributed databases, data allocation is one of the key design issues. Some advantages of optimal data allocation are reducing cost, increasing performance and availability. However, because of that most of current data allocation ...

Reviews

Reviewer: Robert J. Tufts

Operational systems that distribute data across a communications network are currently designed around the allocation of files to the proper nodes (replicated, partitioned, or centralized databases). The author of this paper contends that this approach does not really distribute databases. In a true distributed environment, fragments of data are allocated to different nodes, so query processing must combine data from different nodes for the final answer. To do this economically requires a good cost function that accounts for the fragments of data, usage frequencies on all queries and updates, and sites where the results have to be sent. The purpose of this paper, therefore, is to present and discuss the author's model for comparing the costs of various data allocations across a communications network. The author does a good job of introducing the subject, describing previous work on file allocations across a network, and rigorously explaining the proposed cost model. He also attempts to show how this cost model could be useful to database administrators responsible for allocating data across networks. The major strength of the paper is its rigor. Not only is the cost model presented with numerous examples, but the underlying mathematics are presented with numerous proofs that show the cost model to be at least suboptimal for both static and dynamic cases. The paper's primary weakness is in the applicability of the model to the real world. Inputs to the model include the total set of queries and updates, frequencies of their usage, and final destinations for the results. In most systems, this information is almost impossible to collect because use of the system is not static over time. Where the system is large enough to gather meaningful average statistics (e.g., with a digital telephone switching network), the amount of data input to the model would be overwhelming. The author recognizes this problem but does not give a good solution for it. The paper is well organized and, with the exception of a few confusing sentences, easy to read and understand. Its length is about right for the subject, although more of the proofs could have been put into an appendix to assist readability. The bibliography is adequate, but most of the references are from 1982 and earlier, with only two as recent as 1985. Still, the paper is useful as a reference for those doing research on the problem of distributing data across a multinode communications network.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems

ACM Transactions on Database Systems Volume 13, Issue 3

Sept. 1988

158 pages

ISSN:0362-5915

EISSN:1557-4644

DOI:10.1145/44498

Editor:
Gio Wiederhold
Computer Science Department, Stanford University, Stanford, CA

Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 1988

Published in TODS Volume 13, Issue 3

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

181
Total Citations
View Citations
4,609
Total Downloads

Downloads (Last 12 months)422
Downloads (Last 6 weeks)32

Reflects downloads up to 16 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Kadam SKim D(2024)Knowledge-Aware Semantic Communication System Design and Data AllocationIEEE Transactions on Vehicular Technology10.1109/TVT.2023.333335073:4(5755-5769)Online publication date: Apr-2024
https://doi.org/10.1109/TVT.2023.3333350
Reznichenko V(2022)60 Years of Databases (part three)PROBLEMS IN PROGRAMMING10.15407/pp2022.01.034(034-066)Online publication date: Mar-2022
https://doi.org/10.15407/pp2022.01.034
Burckhardt SChandramouli BGillum CJusto DKallas KMcMahon CMeiklejohn CZhu X(2022)NetheriteProceedings of the VLDB Endowment10.14778/3529337.352934415:8(1591-1604)Online publication date: 22-Jun-2022
https://dl.acm.org/doi/10.14778/3529337.3529344
(2022)BibliographyStorage Systems10.1016/B978-0-32-390796-5.00023-1(641-693)Online publication date: 2022
https://doi.org/10.1016/B978-0-32-390796-5.00023-1
Thomasian A(2022)Database parallelism, big data and analytics, deep learningStorage Systems10.1016/B978-0-32-390796-5.00017-6(385-491)Online publication date: 2022
https://doi.org/10.1016/B978-0-32-390796-5.00017-6
Singh AKhehra BMavi B(2021)Simplified-BBO for Non-redundant Allocation of Data in Distributed Database Design2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)10.1109/MWSCAS47672.2021.9531836(544-548)Online publication date: 9-Aug-2021
https://doi.org/10.1109/MWSCAS47672.2021.9531836
Khan WZhang CLuo BKumar TAhmed E(2021)Robust Partitioning Scheme for Accelerating SQL Database2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT)10.1109/ICESIT53460.2021.9696761(369-376)Online publication date: 22-Nov-2021
https://doi.org/10.1109/ICESIT53460.2021.9696761
Kadambi SChen JCooper BLomax DRamakrishnan RSilberstein ATam EGarcia-Molina H(2020)Where in the world is my data?Proceedings of the VLDB Endowment10.14778/3402707.34027404:11(1040-1050)Online publication date: 3-Jun-2020
https://dl.acm.org/doi/10.14778/3402707.3402740
Mazalov AYakovlev AAlekseev VGaidashov I(2020)Mathematical model for optimizing distributed information systemsJournal of Physics: Conference Series10.1088/1742-6596/1679/2/0221001679(022100)Online publication date: 26-Nov-2020
https://doi.org/10.1088/1742-6596/1679/2/022100
Amer AMohamed MAl_Asri K(2020)ASGOP: An aggregated similarity-based greedy-oriented approach for relational DDBSs designHeliyon10.1016/j.heliyon.2020.e031726:1(e03172)Online publication date: Jan-2020
https://doi.org/10.1016/j.heliyon.2020.e03172
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Index Terms

Recommendations

Allocation of processes and files for load balancing in distributed systems

Incremental data allocation and reallocation in distributed database systems

A data allocation considering data availability in distributed database systems

Reviews

Access critical reviews of Computing literature here