Abstract
Modern hardware is abundantly parallel and increasingly heterogeneous. The numerous processing cores have non-uniform access latencies to the main memory and processor caches, which causes variability in the communication costs. Unfortunately, database systems mostly assume that all processing cores are the same and that microarchitecture differences are not significant enough to appear in critical database execution paths. As we demonstrate in this paper, however, non-uniform core topology does appear in the critical path, and conventional database architectures achieve suboptimal and, even worse, unpredictable performance. We perform a detailed performance analysis of OLTP deployments in servers with multiple cores per CPU (multicore) and multiple CPUs per server (multisocket). We compare different database deployment strategies where we vary the number and size of independent database instances running on a single server, from a single shared-everything instance to fine-grained shared-nothing configurations. We quantify the impact of non-uniform hardware on various deployments by (a) examining how efficiently each deployment uses the available hardware resources and (b) measuring the impact of distributed transactions and skewed requests on different workloads. We show that no strategy is optimal for all cases and that the best choice depends on the combination of hardware topology and workload characteristics. Finally, we argue that transaction processing systems must be aware of the hardware topology in order to achieve predictably high performance.
Notes
Such as VoltDB, MongoDB, MemSQL, NuoDB.
For more details, see http://www.supermicro.com/manuals/motherboard/7500/X8OBN-F.
Explaining, among other reasons, the high compensation for skilled database administrators.
Acknowledgments
We would like to thank Eric Sedlar and Brian Gold for many insightful discussions and the members of the DIAS laboratory for their support throughout this work. This work is partially funded by Oracle Labs and by the Swiss National Science Foundation (Grant No. 200021-146407/1).
Author information
Additional information
I. Pandis: Work done while author was affiliated with IBM.
M. Branco, P. Tözün: Work done while the authors were affiliated with EPFL.
About this article
Cite this article
Porobic, D., Pandis, I., Branco, M. et al. Characterization of the Impact of Hardware Islands on OLTP. The VLDB Journal 25, 625–650 (2016). https://doi.org/10.1007/s00778-015-0413-2
DOI: https://doi.org/10.1007/s00778-015-0413-2