Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Data allocation in distributed database systems

Published: 01 September 1988 Publication History

Abstract

The problem of allocating the data of a database to the sites of a communication network is investigated. This problem deviates from the well-known file allocation problem in several aspects. First, the objects to be allocated are not known a priori; second, these objects are accessed by schedules that contain transmissions between objects to produce the result. A model that makes it possible to compare the cost of allocations is presented; the cost can be computed for different cost functions and for processing schedules produced by arbitrary query processing algorithms.
For minimizing the total transmission cost, a method is proposed to determine the fragments to be allocated from the relations in the conceptual schema and the queries and updates executed by the users.
For the same cost function, the complexity of the data allocation problem is investigated. Methods for obtaining optimal and heuristic solutions under various ways of computing the cost of an allocation are presented and compared.
Two different approaches to the allocation management problem are presented and their merits are discussed.

References

[1]
ADIBA, M., CHUPIN, J. C., DEMOLOMBE, R., GARDARIN, G., AND BIHAN, J. L. Issues in distributed data base management systems: A technical overview. In Proceedings of the 4th International Conference on Very Large Data Bases (West Berlin, Sept. 1978), pp. 89-110.
[2]
ADIBA, M. E., AND LINDSAY, B.G. Database snapshots, in Proceedings of Conference on Very Large Data Bases (Montreal, Oct. 1980). pp. 86-91.
[3]
ADIBA, M.E. Derived relations: A unified mechanism for views, snapshots and distributed data. In Proceedings of 7th International Conference on Very Large Data Bases (Cannes, Sept. 1981). pp. 293-305.
[4]
AHO, A. V., HOPCROFT, J. E., AND ULLMAN, J. D. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.
[5]
APERS, P. M. G. Data allocation and distributed query processing. In Proceedings of A CM Pacific '80 (San Francisco, Nov. 1980). ACM, New York, 1980, pp. 48-54.
[6]
APERS, P. M.G. Redundant allocation of relations in a communication network. In Proceedings of 5th Berkeley Workshop on Distributed Data Management and Computer Networks (San Francisco, Feb. 1981). pp. 245-258.
[7]
APERS, P. M.G. Centralized or decentralized data allocation. In Proceedings of 2nd Seminar on Distributed Data Sharing Systems (Amsterdam, June 1981). R. P. van de Riet and W. Litwin Eds. North-Holland, Amsterdam, 1981, pp. 101-116.
[8]
APERS, P. M.G. Query processing and data allocation in distributed database systems. Ph.D. dissertation, Computer Science Dept., Vrije Univ., Amsterdam, Sept. 1982.
[9]
APERS, P. M. G., HEVNER, A. R., AND YAO, S.B. Optimization algorithms for distributed queries. IEEE Trans. Softw. Eng. SE-9, 1 (Jan. 1983), 57-68.
[10]
BALDISSERA, C., BRACCHI, G., AND CERI, S. A query processing strategy for distributed data bases. In Proceedings of the European Conference on Applied Information Technology, EURO- IFIP 1979 (London, Sept. 1979). North-Holland, Amsterdam, 1979, pp. 667-677.
[11]
BERNSTEIN, P. A., GOODMAN, N., WONG, E., REEVE, C. L., AND ROTHNIE, J. B. Query processing in a system for distributed databases (SDD-1). ACM Trans. Database Syst. 6, 4 (Dec. 1981), 602-625.
[12]
CASEY, R.G. Allocation of copies of files in an information network, in Proceedings of AFIPS 1972 SJCC, vol. 40. AFIPS Press, 1972, pp. 617-625.
[13]
CERI, S., MARTELLA, G., AND PELAGATTI, G. Optimal file allocation for a distributed database on a network of minicomputers. In Proceedings of International Conference on Databases (Aberdeen, July 1980). Deen and Hammerlsey, Eds., Hayden, 1980.
[14]
CERI, S., NAVATHE, S., AND WIEDERHOLD, G. Distribution design of logical database schemas. IEEE Trans. Softw. Eng. SE-9, 4 {July 1983), 487-563.
[15]
CERI, S., NEGRI, M., ANU PELAGATTI, G. Horizontal data partitioning in database design. In Proceedings of ACM-SIGMOD Conference (Orlando, Fla., June 1982). ACM, New York, 1982.
[16]
CERL S., AND PELAGATTI, G. Distributed Database--Principles and Systems. McGraw-Hill, New York, 1984.
[17]
CHU, W.W. Optimal file allocation in a multiple-computer information system. IEEE Trans. Comput. C-18 (1969), 885-889.
[18]
CHU, W. W. Optimal file allocation in a computer network. In Computer-Communication Networks. N. Abrarnson and F. F. Kuo, Eds. Prentice-Hall, Englewood Cliffs, N.J., 1973, pp. 83-94.
[19]
COOK, S.A. The complexity of theorem-proving procedures. In Proceedings of 3rd Annual ACM Symposium on Theory of Computing. ACM, New York, 1971, pp. 151-158.
[20]
EPSTEIN, R., STONEBRAKER, M. R., AND WONG, E. Distributed query processing in a relational data base system. In Proceedings of ACM-SIGMOD (Boston, May 1979). ACM, New York, 1979, pp. 169-180.
[21]
ESWARAN, K.P. Placement of records in a file and file allocation in a computer network. In Proceedings of the IFIP Congress on Information Processing 1974. North-Holland, Amsterdam, 1974, pp. 304-307.
[22]
GAREY, M. R., AND JOHNSON, D.S. Computers and Intractibility: A Guide to the Theory of NP- Completeness. Freeman, San Francisco, 1979.
[23]
HEVNER, A. R., AND YAO, S.B. Query processing in distributed database systems. IEEE Trans. Softw. Eng. SE-5, 3 (May 1979), 177-187.
[24]
HEVNER, A. R. Data allocation and retrieval in distributed systems. In Advances in Data Management, vol. II. Wiley, New York, 1983.
[25]
HOROWITZ, E., AND SAHNI, S. Fundamentals of Computer Algorithms. Computer Science Press, Rockville, Md., 1978.
[26]
IN-SuP PAIK, AND DELOBEL, C. A strategy for optimizing the distributed query processing. In Proceedings of First International Conference on Distributed Computing Systems (Huntsville, Ala., Oct. 1979). IEEE, New York, 1979, pp. 686-698.
[27]
Lawler, E. L., and Wood, D.E. Branch-and-bound methods: A survey. Oper. Res. 14, 4 (July 1966), 699-719.
[28]
LEVIN, K.D. Organizing distributed data bases in computer networks. Ph.D. dissertation, The Wharton School, Univ. of Pennsylvania, Philadelphia, Sept. 1974.
[29]
LEVIN, K. D., AND MORGAN, S.L. Optimizing distributed databases--A framework for research. In Proceedings of 1975 AFIPS NCC, vol. 44. AFIPS Press, 1975, pp. 473-478.
[30]
MAHMOUD, S., AND RIORDON, J.S. Optimal allocation of resources in distributed information networks. ACM Trans. Database Syst. 1, 1 (March 1976), 66-78.
[31]
NAVATHE, S., CERI, S., WIEDERHOLD, G., AND DOU, J. Vertical partitioning algorithms for database design. ACM Trans. Database Syst. 9, 4 (Dec. 1984), 680-710.
[32]
NGUYEN, GXA TOAN. Decentralized dynamic query decomposition for distributed database systems. In Proceedings of ACM Pacific '80 (San Francisco, Nov. 1980). ACM, New York, 1980, pp. 55-60.
[33]
NILSSON, N.J. Problem-Solving Methods in Artificial Intelligence. McGraw-Hill, New York, 1971.
[34]
PELAGATTI, G., AND SCHREIBER, F.A. A model of an access strategy in a distributed database. In Proceedings of the Working Conference on Data Base Architecture, IFIP-TC2, Data Base Architecture (Venice, June 1979). Bracchi and Nijssen, Eds., North-Holland, Amsterdam, 1979.
[35]
POHL, I. Is heuristic search really branch-and-bound? In Proceedings of 6th Princeton IEEE Symposium on Information Sciences and Systems (March 1972), pp. 370-373.
[36]
RAMAMOORTHY, C. V., AND WAH, B.W. The placement of relations on a distributed relational database. In Proceedings of the 1st International Conference on Distributed Computing Systems (Huntsville, Ala., Oct. 1979). IEEE, New York, 1979, pp. 642-650.
[37]
ROTHNIE, J. B., AND GOODMAN, N. A survey of research and development in distributed database management. In Proceedings of 3rd International Conference on Very Large Data Bases (Tokyo, Oct. 1977). pp. 48-62.
[38]
SACCA, D., AND WIEDERHOLD, G. Database partitioning in a cluster of processors. ACM Trans. Database Syst. 10, 1 (March 1985), 29-56.
[39]
SELINGER, P. G., AND ADIBA, M.E. Access path selections in distributed data base management systems. In Proceedings of International Conference on Databases (Aberdeen, July 1980), pp. 204-215.
[40]
TOTH, K. C., MAHMOUD, S. A., AND RIORDON, J.S. Query processing strategies in a distributed database architecture. In Proceedings of 2nd Seminar on Distributed Data Sharing Systems (Amsterdam, June 1981). R. P. van de Riet and W. Litwin, Eds. North-Holland, Amsterdam, 1981.
[41]
WAH, B. W., AND LIEN, Y.-N. Design of distributed databases on local computer systems with a multiaccess network. IEEE Trans. Softw. Eng. SE-11, 7 (July 1985), 606-619.
[42]
WONG, E. Retrieving dispersed data from SDD-I: A system for distributed data bases. In Proceedings of 2nd Berkeley Workshop on Distributed Data Management and Computer Networks (San Francisco, May 1977), pp. 217-235.
[43]
Yv, C. T., AND CHANG, C. C. Distributed query processing. ACM Comput. Surv. 16, 4 (Dec. 1984), 399-433.
[44]
Yu, C. T., LAM, K., CHANG, C. C., AND CHANG, S.K. A promising approach to distributed query processing. In Proceedings of Berkeley Conference on Distributed Data Bases (San Francisco, Feb. 1982), pp. 363-390.

Cited By

View all

Recommendations

Reviews

Robert J. Tufts

Operational systems that distribute data across a communications network are currently designed around the allocation of files to the proper nodes (replicated, partitioned, or centralized databases). The author of this paper contends that this approach does not really distribute databases. In a true distributed environment, fragments of data are allocated to different nodes, so query processing must combine data from different nodes for the final answer. To do this economically requires a good cost function that accounts for the fragments of data, usage frequencies on all queries and updates, and sites where the results have to be sent. The purpose of this paper, therefore, is to present and discuss the author's model for comparing the costs of various data allocations across a communications network. The author does a good job of introducing the subject, describing previous work on file allocations across a network, and rigorously explaining the proposed cost model. He also attempts to show how this cost model could be useful to database administrators responsible for allocating data across networks. The major strength of the paper is its rigor. Not only is the cost model presented with numerous examples, but the underlying mathematics are presented with numerous proofs that show the cost model to be at least suboptimal for both static and dynamic cases. The paper's primary weakness is in the applicability of the model to the real world. Inputs to the model include the total set of queries and updates, frequencies of their usage, and final destinations for the results. In most systems, this information is almost impossible to collect because use of the system is not static over time. Where the system is large enough to gather meaningful average statistics (e.g., with a digital telephone switching network), the amount of data input to the model would be overwhelming. The author recognizes this problem but does not give a good solution for it. The paper is well organized and, with the exception of a few confusing sentences, easy to read and understand. Its length is about right for the subject, although more of the proofs could have been put into an appendix to assist readability. The bibliography is adequate, but most of the references are from 1982 and earlier, with only two as recent as 1985. Still, the paper is useful as a reference for those doing research on the problem of distributing data across a multinode communications network.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 13, Issue 3
Sept. 1988
158 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/44498
  • Editor:
  • Gio Wiederhold
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 1988
Published in TODS Volume 13, Issue 3

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)422
  • Downloads (Last 6 weeks)32
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Knowledge-Aware Semantic Communication System Design and Data AllocationIEEE Transactions on Vehicular Technology10.1109/TVT.2023.333335073:4(5755-5769)Online publication date: Apr-2024
  • (2022)60 Years of Databases (part three)PROBLEMS IN PROGRAMMING10.15407/pp2022.01.034(034-066)Online publication date: Mar-2022
  • (2022)NetheriteProceedings of the VLDB Endowment10.14778/3529337.352934415:8(1591-1604)Online publication date: 22-Jun-2022
  • (2022)BibliographyStorage Systems10.1016/B978-0-32-390796-5.00023-1(641-693)Online publication date: 2022
  • (2022)Database parallelism, big data and analytics, deep learningStorage Systems10.1016/B978-0-32-390796-5.00017-6(385-491)Online publication date: 2022
  • (2021)Simplified-BBO for Non-redundant Allocation of Data in Distributed Database Design2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)10.1109/MWSCAS47672.2021.9531836(544-548)Online publication date: 9-Aug-2021
  • (2021)Robust Partitioning Scheme for Accelerating SQL Database2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT)10.1109/ICESIT53460.2021.9696761(369-376)Online publication date: 22-Nov-2021
  • (2020)Where in the world is my data?Proceedings of the VLDB Endowment10.14778/3402707.34027404:11(1040-1050)Online publication date: 3-Jun-2020
  • (2020)Mathematical model for optimizing distributed information systemsJournal of Physics: Conference Series10.1088/1742-6596/1679/2/0221001679(022100)Online publication date: 26-Nov-2020
  • (2020)ASGOP: An aggregated similarity-based greedy-oriented approach for relational DDBSs designHeliyon10.1016/j.heliyon.2020.e031726:1(e03172)Online publication date: Jan-2020
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media