Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Minuet: a scalable distributed multiversion B-tree

Published: 01 May 2012 Publication History

Abstract

Data management systems have traditionally been designed to support either long-running analytics queries or short-lived transactions, but an increasing number of applications need both. For example, online games, socio-mobile apps, and e-commerce sites need to not only maintain operational state, but also analyze that data quickly to make predictions and recommendations that improve user experience. In this paper, we present Minuet, a distributed, main-memory B-tree that supports both transactions and copy-on-write snapshots for in-situ analytics. Minuet uses main-memory storage to enable low-latency transactional operations as well as analytics queries without compromising transaction performance. In addition to supporting read-only analytics queries on snapshots, Minuet supports writable clones, so that users can create branching versions of the data. This feature can be quite useful, e.g. to support complex "what-if" analysis or to facilitate wide-area replication. Our experiments show that Minuet outperforms a commercial main-memory database in many ways. It scales to hundreds of cores and TBs of memory, and can process hundreds of thousands of B-tree operations per second while executing long-running scans.

References

[1]
CouchDB. http://couchdb.apache.org/.
[2]
HBase. http://hbase.apache.org/.
[3]
LevelDB. http://code.google.com/p/leveldb/.
[4]
VoltDB. http://voltdb.com/.
[5]
M. K. Aguilera, W. Golab, and M. A. Shah. A practical scalable distributed B-tree. PVLDB, 1(1):598--609, 2008.
[6]
M. K. Aguilera, A. Merchant, M. A. Shah, A. C. Veitch, and C. T. Karamanolis. Sinfonia: A new paradigm for building scalable distributed systems. ACM Trans. Comput. Syst., 27(3):5:1--5:48, 2009.
[7]
B. Becker, S. Gschwind, T. Ohler, B. Seeger, and P. Widmayer. An asymptotically optimal multiversion B-tree. VLDB J., 5(4):264--275, 1996.
[8]
P. A. Bernstein, C. W. Reid, and S. Das. Hyder - a transactional record manager for shared flash. In Proc. CIDR, pages 9--20, 2011.
[9]
J. Bonwick and M. Ahrens. The Zettabyte file system. 2008.
[10]
Y. Cao, C. Chen, F. Guo, D. Jiang, Y. Lin, B. C. Ooi, H. T. Vo, S. Wu, and Q. Xu. ES2: A cloud data storage system for supporting both OLTP and OLAP. In Proc. ICDE, pages 291--302, 2011.
[11]
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):4:1--4:26, 2008.
[12]
S. Chutani, O. T. Anderson, M. L. Kazar, B. W. Leverett, W. A. Mason, and R. N. Sidebotham. The Episode file system. In Proc. USENIX ATC, pages 43--60, 1992.
[13]
B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. Pnuts: Yahoo!'s hosted data serving platform. PVLDB, 1(2):1277--1288, 2008.
[14]
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In Proc. SoCC, pages 143--154, New York, NY, USA, 2010. ACM.
[15]
A. Crainiceanu, P. Linga, A. Machanavajjhala, J. Gehrke, and J. Shanmugasundaram. P-ring: an efficient and robust p2p range index structure. In Proc. SIGMOD, pages 223--234, 2007.
[16]
J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. Making data structures persistent. J. Comput. Syst. Sci., 38(1):86--124, 1989.
[17]
G. Graefe. Write-optimized B-trees. In Proc. VLDB, pages 672--683, 2004.
[18]
G. Graefe. A survey of B-tree locking techniques. ACM Trans. Database Syst., 35(3):16:1--16:26, July 2010.
[19]
G. Graefe and R. Stonecipher. Efficient verification of B-tree integrity. In Proc. BTW, pages 27--46, 2009.
[20]
T. Härder. Observations on optimistic concurrency control schemes. Inf. Syst., 9(2):111--120, June 1984.
[21]
D. Hitz, J. Lau, and M. A. Malcolm. File system design for an NFS file server appliance. In USENIX Winter, pages 235--246, 1994.
[22]
H. V. Jagadish, B. C. Ooi, and Q. H. Vu. BATON: A balanced tree structure for peer-to-peer networks. In Proc. VLDB, pages 661--672, 2005.
[23]
L. Jiang, B. Salzberg, D. B. Lomet, and M. B. García. The BT-tree: A branched and temporal access method. In Proc. VLDB, pages 451--460, 2000.
[24]
T. Johnson and A. Colbrook. A distributed data-balanced dictionary based on the B-link tree. In Proc. IPPS, pages 319--324, 1992.
[25]
A. Kemper and T. Neumann. Hyper: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In Proc. ICDE, pages 195--206, 2011.
[26]
G. M. Landau, J. P. Schmidt, and V. J. Tsotras. Historical queries along multiple lines of time evolution. VLDB J., 4(4):703--726, 1995.
[27]
S. Lanka and E. Mays. Fully persistent B+-trees. In Proc. SIGMOD Conference, pages 426--435, 1991.
[28]
P. L. Lehman and S. B. Yao. Efficient locking for concurrent operations on B-trees. ACM Trans. Database Syst., 6(4):650--670, December 1981.
[29]
D. B. Lomet and B. Salzberg. Access methods for multiversion data. In Proc. SIGMOD, pages 315--324, 1989.
[30]
J. MacCormick, N. Murphy, M. Najork, C. Thekkath, and L. Zhou. Boxwood: Abstractions as the foundation for storage infrastructure. In Proc. OSDI, pages 105--120, Dec. 2004.
[31]
P. Muth, P. E. O'Neil, A. Pick, and G. Weikum. The LHAM log-structured history data access method. VLDB J., 8(3-4):199--221, 2000.
[32]
M. T. Najaran, P. Wijesekera, A. Warfield, and N. C. Hutchinson. Distributed indexing and locking: In search of scalable consistency. In Proc. LADIS, 2011.
[33]
P. E. O'Neil, E. Cheng, D. Gawlick, and E. J. O'Neil. The log-structured merge-tree (LSM-tree). Acta Inf., 33(4):351--385, 1996.
[34]
C. H. Papadimitriou. The serializability of concurrent database updates. J. ACM, 26(4):631--653, 1979.
[35]
D. Reed. Naming and synchronization in a decentralized computer system, 1978.
[36]
O. Rodeh. B-trees, shadowing, and clones. Trans. Storage, 3(4):2:1--2:27, 2008.
[37]
C. A. N. Soules, G. R. Goodson, J. D. Strunk, and G. R. Ganger. Metadata efficiency in versioning file systems. In Proc. FAST, pages 43--58, 2003.
[38]
M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era (it's time for a complete rewrite). In Proc. VLDB, pages 1150--1160, 2007.
[39]
A. Twigg, A. Byde, G. Milos, T. D. Moreton, J. Wilkes, and T. Wilkie. Stratified B-trees and versioning dictionaries. CoRR, abs/1103.4282, 2011.
[40]
H. T. Vo, C. Chen, and B. C. Ooi. Towards elastic transactional cloud storage with range query support. PVLDB, 3(1):506--517, 2010.
[41]
S. Wu, D. Jiang, B. C. Ooi, and K.-L. Wu. Efficient B-tree based indexing for cloud data processing. PVLDB, 3(1):1207--1218, 2010.

Cited By

View all
  • (2023)Learning Optimal Tree-Based Index Placement for Autonomous DatabaseDatabase and Expert Systems Applications10.1007/978-3-031-39847-6_22(304-309)Online publication date: 28-Aug-2023
  • (2022)JiffyProceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3503221.3508437(400-415)Online publication date: 2-Apr-2022
  • (2021) XStore: Fast RDMA-Based Ordered Key-Value Store Using Remote Learned CacheACM Transactions on Storage10.1145/346852017:3(1-32)Online publication date: 16-Aug-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 5, Issue 9
May 2012
120 pages

Publisher

VLDB Endowment

Publication History

Published: 01 May 2012
Published in PVLDB Volume 5, Issue 9

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)1
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Learning Optimal Tree-Based Index Placement for Autonomous DatabaseDatabase and Expert Systems Applications10.1007/978-3-031-39847-6_22(304-309)Online publication date: 28-Aug-2023
  • (2022)JiffyProceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3503221.3508437(400-415)Online publication date: 2-Apr-2022
  • (2021) XStore: Fast RDMA-Based Ordered Key-Value Store Using Remote Learned CacheACM Transactions on Storage10.1145/346852017:3(1-32)Online publication date: 16-Aug-2021
  • (2020)Fast RDMA-based ordered key-value store using remote learned cacheProceedings of the 14th USENIX Conference on Operating Systems Design and Implementation10.5555/3488766.3488773(117-135)Online publication date: 4-Nov-2020
  • (2020)KiWiACM Transactions on Parallel Computing10.1145/33997187:3(1-28)Online publication date: 21-Jun-2020
  • (2019)On supporting efficient snapshot isolation for hybrid workloads with multi-versioned indexesProceedings of the VLDB Endowment10.14778/3364324.336433413:2(211-225)Online publication date: 1-Oct-2019
  • (2019)Multiversion Concurrency with Bounded Delay and Precise Garbage CollectionThe 31st ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3323165.3323185(241-252)Online publication date: 17-Jun-2019
  • (2017)KiWiACM SIGPLAN Notices10.1145/3155284.301876152:8(357-369)Online publication date: 26-Jan-2017
  • (2017)KiWiProceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3018743.3018761(357-369)Online publication date: 26-Jan-2017
  • (2016)Balancing CPU and network in the cell distributed B-tree storeProceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference10.5555/3026959.3027001(451-464)Online publication date: 22-Jun-2016
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media