DOI: 10.5555/1298455.1298475

Bigtable: a distributed storage system for structured data

Published: 06 November 2006

Abstract

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.

Published In

OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementation
November 2006
407 pages
ISBN:1931971471

Publisher

USENIX Association

United States
