Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3514221.3517852acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open access

Confidence Bounded Replica Currency Estimation

Published: 11 June 2022 Publication History

Abstract

Replicas of the same data item often exhibit varying consistency levels when executing read and write requests due to system availability and network limitations. When one or more replicas respond to a query, estimating the currency (or staleness) of the returned data item (without accessing the other replicas) is essential for applications requiring timely data. Depending on how confident the estimation is, the query may dynamically decide to return the retrieved replicas, or wait for the remaining replicas to respond. The replica currency estimation is expected to be accurate and extremely time efficient without introducing large overhead during query processing. In this paper, we provide theoretical bounds on the confidence of replica currency estimation. Our system computes with a minimum probability p, whether the retrieved replicas are current or stale. Using this confidence-bounded replica currency estimation, we implement a novel DYNAMIC read consistency level in the open-source, NoSQL database, Cassandra. Experiments show that the proposed replica currency estimation is intuitive and efficient. In most tested scenarios, with various query loads and cluster configurations, we show our estimations with confidence levels of at least 0.99 while keeping query latency low (close to reading ONE replica). Moreover, the overheads introduced due to estimation scoring and training are low, incurring only 0.76% to 1.17% of the query processing and replica synchronization time costs, respectively.

References

[1]
https://cassandra.apache.org/.
[2]
https://github.com/illidanlab/t-lstm.
[3]
https://hbase.apache.org/.
[4]
F. Akal, C. Türker, H.-J. Schek, Y. Breitbart, T. Grabs, and L. Veen. Fine-grained replication and scheduling with freshness and correctness guarantees. In International Conference on Very Large Data Bases, pages 565--576, 2005.
[5]
P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein, and I. Stoica. Probabilistically bounded staleness for practical partial quorums. Proc. VLDB Endow., 5(8):776--787, 2012.
[6]
I. M. Baytas, C. Xiao, X. Zhang, F. Wang, A. K. Jain, and J. Zhou. Patient subtyping via time-aware LS™ networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017, pages 65--74, 2017.
[7]
D. Bermbach and S. Tai. Eventual consistency: How soon is eventual? an evaluation of amazon s3's consistency behavior. In Proceedings of the 6th Workshop on Middleware for Service Oriented Computing, MW4SOC 2011, Lisbon, Portugal, December 12--16, 2011, page 1, 2011.
[8]
P. A. Bernstein, A. D. Fekete, H. Guo, R. Ramakrishnan, and P. Tamma. Relaxed-currency serializability for middle-tier caching and replication. In S. Chaudhuri, V. Hristidis, and N. Polyzotis, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27--29, 2006, pages 599--610. ACM, 2006.
[9]
E. A. Brewer. Towards robust distributed systems (abstract). In Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, July 16--19, 2000, Portland, Oregon, USA, page 7, 2000.
[10]
N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, H. C. Li, M. Marchukov, D. Petrov, L. Puzar, Y. J. Song, and V. Venkataramani. TAO: facebook's distributed data store for the social graph. In 2013 USENIX Annual Technical Conference, San Jose, CA, USA, June 26--28, 2013, pages 49--60, 2013.
[11]
S. Brown. Measures of shape: Skewness and kurtosis, 2011.
[12]
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: A distributed storage system for structured data (awarded best paper!). In 7th Symposium on Operating Systems Design and Implementation (OSDI '06), November 6--8, Seattle, WA, USA, pages 205--218, 2006.
[13]
J. Cipar, G. Ganger, K. Keeton, C. B. Morrey, C. A. Soules, and A. Veitch. Lazybase: Trading freshness for performance in a scalable database. In European Conference on Computer Systems, pages 169--182, 2012.
[14]
R. Cooper. Introduction to Queueing Theory. North Holland, 1981.
[15]
C. Cuadras and C. Arenas. A distance based regression model for prediction with mixed data. Communications in Statistics-Theory and Methods, 19(6):2261--2279, 1990.
[16]
A. H. de Souza Jú nior, F. Corona, G. D. A. Barreto, Y. Miché, and A. Lendasse. Minimal learning machine: A novel supervised distance-based approach for regression and classification. Neurocomputing, 164:34--44, 2015.
[17]
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: amazon's highly available key-value store. In Proceedings of the 21st ACM Symposium on Operating Systems Principles 2007, SOSP 2007, Stevenson, Washington, USA, October 14--17, 2007, pages 205--220, 2007.
[18]
W. Fan and F. Geerts. Foundations of Data Quality Management. Morgan & Claypool Publishers, 2012.
[19]
A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. Bayesian data analysis. CRC press, 2013.
[20]
L. George. HBase - The Definitive Guide: Random Access to Your Planet-Size Data. O'Reilly, 2011.
[21]
J. Gray, P. Helland, P. E. O'Neil, and D. E. Shasha. The dangers of replication and a solution. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4--6, 1996, pages 173--182, 1996.
[22]
H. Guo, P. Larson, R. Ramakrishnan, and J. Goldstein. Relaxed currency and consistency: How to say "good enough" in SQL. In G. Weikum, A. C. Kö nig, and S. Deßloch, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, June 13--18, 2004, pages 815--826. ACM, 2004.
[23]
M. Herlihy. Dynamic quorum adjustment for partitioned data. ACM Trans. Database Syst., 12(2):170--194, 1987.
[24]
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735--1780, 1997.
[25]
G. Huang, X. Cheng, J. Wang, Y. Wang, D. He, T. Zhang, F. Li, S. Wang, W. Cao, and Q. Li. X-engine: An optimized storage engine for large-scale e-commerce transaction processing. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 651--665, 2019.
[26]
A. Labrinidis and N. Roussopoulos. Exploring the tradeoff between performance and data freshness in database-driven web servers. The VLDB Journal, 13(3):240--255, 2004.
[27]
A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. Operating Systems Review, 44(2):35--40, 2010.
[28]
G. Linden. Make data useful. Online at: https://sites.google.com/site/glinden/Home/StanfordDataMining.2006--11--29.ppt, 2006.
[29]
G. Linden. Marissa mayer at web 2.0. Online at: http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html, 2006.
[30]
H. Lu, K. Veeraraghavan, P. Ajoux, J. Hunt, Y. J. Song, W. Tobagus, S. Kumar, and W. Lloyd. Existential consistency: measuring and understanding consistency at facebook. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015, Monterey, CA, USA, October 4--7, 2015, pages 295--310, 2015.
[31]
R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, D. Stafford, T. Tung, and V. Venkataramani. Scaling memcache at facebook. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2013, Lombard, IL, USA, April 2--5, 2013, pages 385--398, 2013.
[32]
E. Pacitti, E. Simon, and R. Melo. Improving data freshness in lazy master schemes. In International Conference on Distributed Computing Systems, pages 164--171, 1998.
[33]
D. Potts and C. Sammut. Incremental learning of linear model trees. Mach. Learn., 61(1--3):5--48, 2005.
[34]
X. Qin, X. Zhang, M. Q. Yasin, S. Wang, Z. Feng, and G. Xiao. SUMA: A partial materialization-based scalable query answering in OWL 2 DL. Data Sci. Eng., 6(2):229--245, 2021.
[35]
relaxCarsten. 400+ crypto currency pairs at 1-minute resolution. https://www.kaggle.com/tencars/392-crypto-currency-pairs-at-minute-resolution, 2020.
[36]
U. Rö hm, K. Bö hm, H. Schek, and H. Schuldt. FAS - A freshness-sensitive coordination middleware for a cluster of OLAP components. In Proceedings of 28th International Conference on Very Large Data Bases, VLDB 2002, Hong Kong, August 20--23, 2002, pages 754--765. Morgan Kaufmann, 2002.
[37]
E. Schurman and J. Brutlag. Performance related changes and their user impact. In velocity web performance and operations conference, 2009.
[38]
A. Stisen, H. Blunck, S. Bhattacharya, T. S. Prentow, M. B. Kjærgaard, A. Dey, T. Sonne, and M. M. Jensen. The heterogeneity activity recognition data set. https://archive.ics.uci.edu/ml/datasets/Heterogeneity+Activity+Recognition, 2015.
[39]
M. J. Wooldridge. Foundations of Machine Learning. MIT Press, 2012.
[40]
T. Yamashita. Distributed view divergence control of data freshness in replicated database systems. IEEE Transactions on Knowledge and Data Engineering, 21(10):1403--1417, 2009.
[41]
Q. Yang, Y. Liu, T. Chen, and Y. Tong. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol., 10(2):12:1--12:19, 2019.
[42]
C. Zhang and Z. Zhang. Trading replication consistency for performance and availability: an adaptive approach. In 23rd International Conference on Distributed Computing Systems (ICDCS 2003), 19--22 May 2003, Providence, RI, USA, pages 687--695, 2003.

Cited By

View all
  • (2024)Edge AI-driven neural network predictions for replica sync optimizationApplied Soft Computing10.1016/j.asoc.2024.112083165(112083)Online publication date: Nov-2024
  • (2023)Apache IoTDB: A Time Series Database for IoT ApplicationsProceedings of the ACM on Management of Data10.1145/35897751:2(1-27)Online publication date: 20-Jun-2023
  • (2023)Hydis: A Hybrid Consistent KVS with Effective Sync Among ReplicasAdvanced Parallel Processing Technologies10.1007/978-981-99-7872-4_9(147-161)Online publication date: 4-Aug-2023

Index Terms

  1. Confidence Bounded Replica Currency Estimation

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
    June 2022
    2597 pages
    ISBN:9781450392495
    DOI:10.1145/3514221
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 June 2022

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. distributed database
    2. replica currency

    Qualifiers

    • Research-article

    Funding Sources

    • National Key Research & Development Plan
    • National Natural Science Foundation of China
    • BNR2022RC01011

    Conference

    SIGMOD/PODS '22
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)147
    • Downloads (Last 6 weeks)22
    Reflects downloads up to 14 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Edge AI-driven neural network predictions for replica sync optimizationApplied Soft Computing10.1016/j.asoc.2024.112083165(112083)Online publication date: Nov-2024
    • (2023)Apache IoTDB: A Time Series Database for IoT ApplicationsProceedings of the ACM on Management of Data10.1145/35897751:2(1-27)Online publication date: 20-Jun-2023
    • (2023)Hydis: A Hybrid Consistent KVS with Effective Sync Among ReplicasAdvanced Parallel Processing Technologies10.1007/978-981-99-7872-4_9(147-161)Online publication date: 4-Aug-2023

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media