Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Eliminating unscalable communication in transaction processing

Published: 01 February 2014 Publication History

Abstract

Multicore hardware demands software parallelism. Transaction processing workloads typically exhibit high concurrency, and, thus, provide ample opportunities for parallel execution. Unfortunately, because of the characteristics of the application, transaction processing systems must moderate and coordinate communication between independent agents; since it is notoriously difficult to implement high performing transaction processing systems that incur no communication whatsoever. As a result, transaction processing systems cannot always convert abundant, even embarrassing, request-level parallelism into execution parallelism due to communication bottlenecks. Transaction processing system designers must therefore find ways to achieve scalability while still allowing communication to occur. To this end, we identify three forms of communication in the system--unbounded, fixed, and cooperative--and argue that only the first type poses a fundamental threat to scalability. The other two types tend not impose obstacles to scalability, though they may reduce single-thread performance. We argue that proper analysis of communication patterns in any software system is a powerful tool for improving the system's scalability. Then, we present and evaluate under a common framework techniques that attack significant sources of unbounded communication during transaction processing and sketch a solution for those that remain. The solutions we present affect fundamental services of any transaction processing engine, such as locking, logging, physical page accesses, and buffer pool frame accesses. They either reduce such communication through caching, downgrade it to a less-threatening type, or eliminate it completely through system design. We find that the later technique, revisiting the transaction processing architecture, is the most effective. The final design cuts unbounded communication by roughly an order of magnitude compared with the baseline, while exhibiting better scalability on multicore machines.

References

[1]
Achyutuni, K.J., Omiecinski, E., Navathe, S.B.: Two techniques for on-line index modification in shared nothing parallel databases. In: SIGMOD, pp. 125---136 (1996)
[2]
Ailamaki, A., DeWitt, D.J., Hill, M.D.: Walking four machines by the shore. In: CAECW (2001)
[3]
Ailamaki, A., DeWitt, D.J., Hill, M.D., Wood, D.A.: DBMSs on a modern processor: where does time go? In: VLDB, pp. 266---277 (1999)
[4]
Albutiu, M.C., Kemper, A., Neumann, T.: Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB 5(10), 1064---1075 (2012)
[5]
Amdahl, G.M.: Validity of the single processor approach to achieving large scale computing capabilities. In: AFIPS, pp. 483---485 (1967)
[6]
Aspnes, J., Herlihy, M., Shavit, N.: Counting networks. J. ACM 41(5), 1020---1048 (1994)
[7]
Attiya, H., Bar-Noy, A., Dolev, D.: Sharing memory robustly in message-passing systems. J. ACM 42(1), 124---142 (1995)
[8]
Barroso, L.A., Gharachorloo, K., Bugnion, E.: Memory system characterization of commercial workloads. In: ISCA, pp. 3---14 (1998)
[9]
Baumann, A., et al.: The multikernel: a new OS architecture for scalable multicore systems. In: SOSP, pp. 29---44 (2009)
[10]
Bayer, R., McCreight, E.M.: Organization and maintenance of large ordered indices. In: SIGFIDET, pp. 107---141 (1970)
[11]
Bernstein, P.A., Goodman, N.: Multiversion concurrency control--theory and algorithms. ACM TODS 8(4), 465---483 (1983)
[12]
Bienia, C., Kumar, S., Singh, J.P., Li, K.: The PARSEC benchmark suite: characterization and architectural implications. In: PACT, pp. 72---81 (2008)
[13]
Carey, M.J., et al.: Shoring up persistent applications. In: SIGMOD, pp. 383---394 (1994)
[14]
Chang, F., et al.: Bigtable: A distributed storage system for structured data. In: OSDI, p. 15 (2006)
[15]
Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Improving hash join performance through prefetching. ACM TODS 32(3), 116---127 (2007)
[16]
Cieslewicz, J., Ross, K.A.: Adaptive aggregation on chip multiprocessors. In: VLDB, pp. 339---350 (2007)
[17]
Clark, K.L., McCabe, F.G.: Go! a multi-paradigm programming language for implementing multi-threaded agents. Ann. Math. Artif. Intell. 41, 171---206 (2004)
[18]
Curino, C., Jones, E., Zhang, Y., Madden, S.: Schism: a workload-driven approach to database replication and partitioning. PVLDB 3, 48---57 (2010)
[19]
Daniels, D.S., Spector, A.Z., Thompson, D.S.: Distributed logging for transaction processing. SIGMOD Rec. 16, 82---96 (1987)
[20]
Davis, J.D., Laudon, J., Olukotun, K.: Maximizing CMP throughput with mediocre cores. In: PACT, pp. 51---62(2005)
[21]
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: OSDI, p. 10 (2004)
[22]
DeCandia, G., et al.: Dynamo: Amazon's highly available key-value store. SIGOPS OSR 41(6), 205---220 (2007)
[23]
Dewitt, D.J., et al.: The Gamma database machine project. IEEE TKDE 2(1), 44---62 (1990)
[24]
Dragojevic, A., Guerraoui, R., Kapalka, M.: Dividing transactional memories by zero. In: TRANSACT (2008)
[25]
Graefe, G.: Hierarchical locking in B-tree indexes. In: BTW, pp. 18---42 (2007)
[26]
Gray, J., Helland, P., O'Neil, P., Shasha, D.: The dangers of replication and a solution. In: SIGMOD, pp. 173---182 (1996)
[27]
Gray, J., Reuter, A.: Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco (1992)
[28]
Hardavellas, N., et al.: Database servers on chip multiprocessors: limitations and opportunities. In: CIDR, pp. 79---87 (2007)
[29]
Harizopoulos, S., Abadi, D.J., Madden, S., Stonebraker, M.: OLTP through the looking glass, and what we found there. In: SIGMOD, pp. 981---992 (2008)
[30]
Harizopoulos, S., Shkapenyuk, V., Ailamaki, A.: QPipe: a simultaneously pipelined relational query engine. In: SIGMOD, pp. 383---394 (2005)
[31]
Helland, P.: Life beyond distributed transactions: an apostate's opinion. In: CIDR, pp. 132---141 (2007)
[32]
Helland, P., et al.: Group commit timers and high volume transaction systems. In: HPTS, pp. 301---329 (1989)
[33]
Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13(1), 124---149 (1991)
[34]
Herlihy, M., Moss, J.E.B.: Transactional memory: architectural support for lock-free data structures. SIGARCH Comput. Archit. News 21(2), 289---300 (1993)
[35]
Hill, M.D., Marty, M.R.: Amdahl's law in the multicore era. Computer 41, 33---38 (2008)
[36]
Hunt, G.C., Larus, J.R.: Singularity: rethinking the software stack. SIGOPS OSR 41(2), 37---49 (2007)
[37]
Jaluta, I., Sippu, S., Soisalon-Soininen, E.: B-tree concurrency control and recovery in page-server database systems. ACM TODS 31, 82---132 (2006)
[38]
Johnson, R., Pandis, I., Hardavellas, N., Ailamaki, A., Falsafi, B.: Shore-MT: a scalable storage manager for the multicore era. In: EDBT, pp. 24---35 (2009)
[39]
Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M., Ailamaki, A.: Aether: a scalable approach to logging. PVLDB 3, 681---692 (2010)
[40]
Jones, E., Abadi, D.J., Madden, S.: Low overhead concurrency control for partitioned main memory databases. In: SIGMOD, pp. 603---614 (2010)
[41]
Jorwekar, S., Fekete, A., Ramamritham, K., Sudarshan, S.: Automating the detection of snapshot isolation anomalies. In: VLDB, pp. 1263---1274 (2007)
[42]
Kemper, A., Neumann, T.: HyPer--a hybrid OLTP &OLAP main memory database system based on virtual memory snapshots. In: ICDE, pp. 195---206 (2011)
[43]
Kimura, H., Graefe, G., Kuno, H.: Efficient locking techniques for databases on modern hardware. In: ADMS (2012)
[44]
Kung, H.T., Robinson, J.T.: On optimistic methods for concurrency control. ACM TODS 6(2), 213---226 (1981)
[45]
Larson, P.A., et al.: High-performance concurrency control mechanisms for main-memory databases. PVLDB 5(4), 298---309 (2011)
[46]
Lauer, H.C., Needham, R.M.: On the duality of operating system structures. SIGOPS OSR 13, 3---19 (1979)
[47]
Lee, J., Kim, K., Cha, S.K.: Differential logging: a commutative and associative logging scheme for highly parallel main memory database. In: ICDE, pp. 173---184 (2001)
[48]
Lee, M.L., Kitsuregawa, M., Ooi, B.C., Tan, K.L., Mondal, A.: Towards self-tuning data placement in parallel database systems. In: SIGMOD, pp. 225---236 (2000)
[49]
Lomet, D.: Recovery for shared disk systems using multiple redo logs. Technical report CRL-90-4 (1990)
[50]
Lomet, D., Anderson, R., Rengarajan, T.K., Spiro, P.: How the Rdb/VMS data sharing system became fast. Technical report CRL-92-4 (1992)
[51]
Magnusson, P.S., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: ISPP, pp. 165---171 (1994)
[52]
Mellor-Crummey, J.M., Scott, M.L.: Scalable reader-writer synchronization for shared-memory multiprocessors. SIGPLAN Not. 26(7), 106---113 (1991)
[53]
Mohan, C.: ARIES/KVL: a key-value locking method for concurrency control of multiaction transactions operating on B-tree indexes. In: VLDB, pp. 392---405 (1990)
[54]
Mohan, C., Levine, F.: ARIES/IM: an efficient and high concurrency index management method using write-ahead logging. In: SIGMOD, pp. 371---380 (1992)
[55]
Moir, M., Nussbaum, D., Shalev, O., Shavit, N.: Using elimination to implement scalable and lock-free FIFO queues. In: SPAA, pp. 253---262 (2005)
[56]
Pandis, I., Johnson, R., Hardavellas, N., Ailamaki, A.: Data-oriented transaction execution. PVLDB 3(1), 928---939 (2010)
[57]
Pandis, I., Tözün, P., Johnson, R., Ailamaki, A.: PLP: page latch-free shared-everything OLTP. PVLDB 4(10), 610---621 (2011)
[58]
Porobic, D., Pandis, I., Branco, M., Tözün, P., Ailamaki, A.: OLTP on hardware islands. PVLDB 5(11), 1447---1458 (2012)
[59]
Ranganathan, P., Gharachorloo, K., Adve, S.V., Barroso, L.A.: Performance of database workloads on shared-memory systems with out-of-order processors. In: ASPLOS, pp. 307---318 (1998)
[60]
Rao, J., Ross, K.A.: Cache conscious indexing for decision-support in main memory. In: VLDB, pp. 78---89 (1999)
[61]
Rao, J., Ross, K.A.: Making B+-trees cache conscious in main memory. In: SIGMOD, pp. 475---486 (2000)
[62]
Shavit, N., Touitou, D.: Software transactional memory. In: PODC, pp. 204---213 (1995)
[63]
Smith, A.J.: Sequentiality and prefetching in database systems. ACM TODS 3, 223---247 (1978)
[64]
Soisalon-Soininen, E., Ylönen, T.: Partial strictness in two-phase locking. In: ICDT, pp. 139---147 (1995)
[65]
Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: SIGCOMM, pp. 149---160 (2001)
[66]
Stonebraker, M.: The case for shared nothing. IEEE Database, Eng. Bull. 9, 4---9 (1986)
[67]
Stonebraker, M.: Stonebraker on nosql and enterprises. Commun. ACM 54, 10---11 (2011)
[68]
Stonebraker, M., et al.: The end of an architectural era: (it's time for a complete rewrite). In: VLDB, pp. 1150---1160 (2007)
[69]
Thomson, A., Abadi, D.J.: The case for determinism in database systems. PVLDB 3, 70---80 (2010)
[70]
Tözün, P., Pandis, I., Johnson, R., Ailamaki, A.: Scalable and dynamically balanced shared-everything OLTP with physiological partitioning. VLDB J 1---25 (2012)
[71]
Vogels, W.: Eventually consistent. Commun. ACM 52, 40---44 (2009)
[72]
Welsh, M., Culler, D., Brewer, E.: SEDA: an architecture for well-conditioned, scalable internet services. In: SOSP, pp. 230---243 (2001)
[73]
Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: characterization and methodological considerations. In: ISCA, pp. 24---36 (1995)

Cited By

View all
  • (2016)ERMIAProceedings of the 2016 International Conference on Management of Data10.1145/2882903.2882905(1675-1687)Online publication date: 26-Jun-2016
  • (2016)Workload-aware incremental repartitioning of shared-nothing distributed databases for scalable OLTP applicationsFuture Generation Computer Systems10.1016/j.future.2015.09.02456:C(421-435)Online publication date: 1-Mar-2016
  • (2016)Characterization of the Impact of Hardware Islands on OLTPThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-015-0413-225:5(625-650)Online publication date: 1-Oct-2016
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The VLDB Journal — The International Journal on Very Large Data Bases
The VLDB Journal — The International Journal on Very Large Data Bases  Volume 23, Issue 1
February 2014
170 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 February 2014

Author Tags

  1. Aether
  2. Communication patterns
  3. DORA
  4. Overlay Bufferpools
  5. PLP
  6. SLI
  7. Scalable OLTP
  8. Shore-MT

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)60
  • Downloads (Last 6 weeks)6
Reflects downloads up to 10 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2016)ERMIAProceedings of the 2016 International Conference on Management of Data10.1145/2882903.2882905(1675-1687)Online publication date: 26-Jun-2016
  • (2016)Workload-aware incremental repartitioning of shared-nothing distributed databases for scalable OLTP applicationsFuture Generation Computer Systems10.1016/j.future.2015.09.02456:C(421-435)Online publication date: 1-Mar-2016
  • (2016)Characterization of the Impact of Hardware Islands on OLTPThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-015-0413-225:5(625-650)Online publication date: 1-Oct-2016
  • (2015)CentimanProceedings of the Sixth ACM Symposium on Cloud Computing10.1145/2806777.2806837(262-275)Online publication date: 27-Aug-2015
  • (2015)Applying HTM to an OLTP SystemProceedings of the 11th International Workshop on Data Management on New Hardware10.1145/2771937.2771946(1-7)Online publication date: 31-May-2015
  • (2014)Phase reconciliation for contended in-memory transactionsProceedings of the 11th USENIX conference on Operating Systems Design and Implementation10.5555/2685048.2685088(511-524)Online publication date: 6-Oct-2014
  • (2014)Coordination avoidance in database systemsProceedings of the VLDB Endowment10.14778/2735508.27355098:3(185-196)Online publication date: 1-Nov-2014
  • (2014)How to stop under-utilization and love multicoresProceedings of the 2014 ACM SIGMOD International Conference on Management of Data10.1145/2588555.2588892(189-192)Online publication date: 18-Jun-2014
  • (2014)Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable Cloud ApplicationsProceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing10.1109/UCC.2014.30(213-222)Online publication date: 8-Dec-2014

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media