DOI: 10.1145/50202.50213

Data placement in Bubba

Published: 01 June 1988

Abstract

This paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications being developed at MCC. “Highly-parallel” implies that load balancing is a critical performance issue. “Data-intensive” means data is so large that operations should be executed where the data resides. As a result, data placement becomes a critical performance issue.
In general, determining the optimal placement of data across processing nodes for performance is a difficult problem. We describe our heuristic approach to solving the data placement problem in Bubba. We then present experimental results using a specific workload to provide insight into the problem. Several researchers have argued the benefits of declustering (i.e., spreading each base relation over many nodes). We show that as declustering is increased, load balancing continues to improve. However, for transactions involving complex joins, further declustering reduces throughput because of communications, startup and termination overhead.
We argue that data placement, especially declustering, in a highly-parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.
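The declustering trade-off described in the abstract can be illustrated with a small sketch. This is not the paper's model or heuristic: the cost formula, the function names `transaction_cost` and `best_degree`, and all parameter values are assumptions chosen only to show how a per-node overhead term can eventually outweigh the gain from spreading work over more nodes.

```python
def transaction_cost(work, degree, per_node_overhead):
    """Toy cost of one transaction (hypothetical formula, not from the paper).

    work / degree       : share of the work done on each of `degree` nodes
    per_node_overhead   : assumed fixed communication, startup and
                          termination cost paid once per participating node
    """
    return work / degree + per_node_overhead * degree


def best_degree(work, per_node_overhead, max_nodes):
    """Return the degree of declustering (1..max_nodes) with the lowest toy cost."""
    return min(
        range(1, max_nodes + 1),
        key=lambda d: transaction_cost(work, d, per_node_overhead),
    )


if __name__ == "__main__":
    # Illustrative numbers only: 100 units of work, 1 unit of overhead per node.
    for d in (1, 5, 10, 20, 64):
        print(d, transaction_cost(100.0, d, 1.0))
    print("lowest-cost degree:", best_degree(100.0, 1.0, 64))
```

In this toy setting the cost falls as the degree of declustering rises from 1, then climbs again once the overhead term dominates, which mirrors the qualitative behavior the abstract reports for complex joins.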


Published In

SIGMOD '88: Proceedings of the 1988 ACM SIGMOD international conference on Management of data
June 1988
443 pages
ISBN: 0897912683
DOI: 10.1145/50202
Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD '88: International Conference on Management of Data
June 1 - 3, 1988
Chicago, Illinois, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
