Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Speculative execution in a distributed file system

Published: 20 October 2005 Publication History

Abstract

Speculator provides Linux kernel support for speculative execution. It allows multiple processes to share speculative state by tracking causal dependencies propagated through inter-process communication. It guarantees correct execution by preventing speculative processes from externalizing output, e.g., sending a network message or writing to the screen, until the speculations on which that output depends have proven to be correct. Speculator improves the performance of distributed file systems by masking I/O latency and increasing I/O throughput. Rather than block during a remote operation, a file system predicts the operation's result, then uses Speculator to checkpoint the state of the calling process and speculatively continue its execution based on the predicted result. If the prediction is correct, the checkpoint is discarded; if it is incorrect, the calling process is restored to the checkpoint, and the operation is retried. We have modified the client, server, and network protocol of two distributed file systems to use Speculator. For PostMark and Andrew-style benchmarks, speculative execution results in a factor of 2 performance improvement for NFS over local-area networks and an order of magnitude improvement over wide-area networks. For the same benchmarks, Speculator enables the Blue File System to provide the consistency of single-copy file semantics and the safety of synchronous I/O, yet still outperform current distributed file systems with weaker consistency and safety.

References

[1]
Adya, A., Bolosky, W. J., Castro, M., Cermak, G., Chaiken, R., Douceur, J. R., Howell, J., Lorch, J. R., Theimer, M., and Wattenhofer, R. P. FARSITE: Federated, available, and reliable storage for an incompletely trusted environment. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (Boston, MA, December 2002), pp. 1--14.
[2]
Birrell, A. D., Hisgen, A., Jerian, C., Mann, T., and Swart, G. The Echo distributed file system. Tech. Rep. 111, Digital Equipment Corporation, Palo Alto, CA, USA, October 1993.
[3]
Callaghan, B., Pavlowski, B., and Staubach, P. NFS Version 3 Protocol Specification. Tech. Rep. RFC 1813, IETF, June 1995.
[4]
Carson, M. Adaptation and Protocol Testing thorugh Network Emulation. NIST, http://snad.ncsl.nist.gov/itg/nistnet/slides/index.htm.
[5]
Castro, M., and Liskov, B. Proactive recovery in a byzantine-fault-tolerant system. In Proceedings of the 4th Symposium on Operating Systems Design and Implementation (San Diego, CA, October 2000).
[6]
Chang, F., and Gibson, G. Automatic I/O hint generation through speculative execution. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (New Orleans, LA, February 1999), pp. 1--14.
[7]
Cheriton, D., and Duda, K. Logged virtual memory. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (Copper Mountain, CO, Dec. 1995), pp. 26--39.
[8]
Elnozahy, E. N., Alvisi, L., Wang, Y.-M., and Johnson, D. B. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys 34, 3 (September 2002), 375--408.
[9]
Franklin, M., and Sohi, G. ARB: A hardware mechanism for dynamic reordering of memory references. IEEE Transactions on Computers 45, 5 (May 1996), 552--571.
[10]
Fraser, K., and Chang, F. Operating system I/O speculation: How two invocations are faster than one. In Proceedings of the 2003 USENIX Technical Conference (San Antonio, TX, June 2003), pp. 325--338.
[11]
Haerder, T., and Reuter, A. Principles of Transaction-Oriented Database Recovery. ACM Computing Surveys 15, 4 (December 1983), 287--317.
[12]
Hammond, L., Willey, M., and Olukotun, K. Data speculation support for a chip multiprocessor. In Proc. of the 8th Intl. ACM Conf. on Arch. Support for Programming Languages and Operating Systems (San Jose, CA, October 1998), pp. 58--69.
[13]
Howard, J. H., Kazar, M. L., Menees, S. G., Nichols, D. A., Satyanarayanan, M., Sidebotham, R. N., and West, M. J. Scale and performance in a distributed file system. ACM Transactions on Computer Systems 6, 1 (February 1988).
[14]
Jefferson, D. Virtual time. ACM Transactions on Programming Languages and Systems 7, 3 (July 1985), 404--425.
[15]
Jefferson, D., Beckman, B., Wieland, F., Blume, L., DiLoreto, M., P.Hontalas, Laroche, P., Sturdevant, K., Tupman, J., Warren, V., Weidel, J., Younger, H., and Bellenot, S. Time Warp operating system. In Proceedings of the 11th ACM Symposium on Operating Systems Principles (Austin, TX, November 1987), pp. 77--93.
[16]
Katcher, J. PostMark: A new file system benchmark. Tech. Rep. TR3022, Network Appliance, 1997.
[17]
King, S. T., and Chen, P. M. Backtracking intrusions. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (Bolton Landing, NY, October 2003), pp. 223--236.
[18]
Kistler, J. J., and Satyanarayanan, M. Disconnected operation in the Coda file system. ACM Transactions on Computer Systems 10, 1 (February 1992).
[19]
Lamport, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (1978), 558--565.
[20]
Li, J., Krohn, M., Mazières, D., and Shasha, D. Secure untrusted data repository (SUNDR). In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (San Francisco, CA, December 2004), pp. 121--136.
[21]
Liskov, B., and Rodrigues, R. Transactional file systems can be fast. In Proceedings of the 11th SIGOPS European Workshop (Leuven, Belgium, September 2004).
[22]
Mazières, D., Kaminsky, M., Kaashoek, M. F., and Witchel, E. Separating key management from file system security. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (Kiawah Island, SC, December 1999), pp. 124--139.
[23]
Nelson, M. N., Welsh, B. B., and Ousterhout, J. K. Caching in the Sprite network file system. ACM Transactions on Computer Systems 6, 1 (1988), 134--154.
[24]
Nightingale, E. B., and Flinn, J. Energy-efficiency and storage flexibility in the Blue File System. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (San Francisco, CA, December 2004), pp. 363--378.
[25]
Rosenblum, M., Bugnion, E., Herrod, S. A., Witchel, E., and Gupta, A. The impact of architectural trends on operating system performance. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (Copper Mountain, CO, December 1995), pp. 285--298.
[26]
Schmuck, F., and Wyllie, J. Experience with transactions in QuickSilver. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (October 1991), pp. 239--53.
[27]
Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, C., Eisler, M., and Noveck, D. Network File System (NFS) version 4 Protocol. Tech. Rep. RFC 3530, IETF, April 2003.
[28]
Spector, A. Z., Daniels, D., Duchamp, D., Eppinger, J. L., and Pauch, R. Distributed transactions for reliable systems. In Proceedings of the 10th ACM Symposium on Operating Systems Principles (Orcas Island, WA, December 1985), pp. 127--146.
[29]
Srinivasan, S., Andrews, C., Kandula, S., and Zhou, Y. Flashback: A light-weight extension for rollback and deterministic replay for software debugging. In Proceedings of the 2004 USENIX Technical Conference (Boston, MA, June 2004).
[30]
Srinivasan, V., and Mogul, J. Spritely NFS: Experiments with cache consistency protocols. In Proceedings of the 12th ACM Symposium on Operating System Principles (December 1989), pp. 45--57.
[31]
Steffan, J. G., Colohan, C. B., Zhai, A., and Mowry, T. C. A scalable approach to thread-level speculation. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA) (Vancouver, Canada, June 2000), pp. 1--24.
[32]
Weinstein, M. J., Thomas W. Page, J., Livezey, B. K., and Popek, G. J. Transactions and synchronization in a distributed operating system. In Proceedings of the 10th ACM Symposium on Operating Systems Principles (Orcas Island, WA, December 1985), pp. 115--126.
[33]
Zhang, Y., Rauchwerger, L., and Torrellas, J. Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors. In Proc. of the 5th Intl. Symposium on High Performance Computer Architecture (Orlando, FL, January 1999), p. 135.
[34]
Zhu, N., and Chiueh, T. Design, implementation and evaluation of the Repairable File Service. In Proceedings of the International Conference on Dependable Systems and Networks (San Francisco, CA, June 2003).

Cited By

View all
  • (2023)A Network Load Perception Based Task Scheduler for Parallel Distributed Data Processing SystemsIEEE Transactions on Cloud Computing10.1109/TCC.2021.313262711:2(1352-1364)Online publication date: 1-Apr-2023
  • (2021)ForerunnerProceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles10.1145/3477132.3483564(570-587)Online publication date: 26-Oct-2021
  • (2020)Build scripts with perfect dependenciesProceedings of the ACM on Programming Languages10.1145/34282374:OOPSLA(1-28)Online publication date: 13-Nov-2020
  • Show More Cited By

Recommendations

Reviews

Bayard Kohlhepp

Make it run; make it right; make it fast; make it small. The anonymous mantra above captures the evolution of all software (except for the "make it bloat" phase of overly mature products). Distributed processing has passed the novelty stage where just making it run was a thrill. Most improvements in the past decade have been targeted at making it right; in other words, ensuring correct, consistent data in the face of increasingly complicated user interaction scenarios. Pervasive Internet availability and the proliferation of handheld devices are currently motivating research into making software fast and small. It is the move of both casual and mission-critical applications to the Internet that has made improved performance of distributed software a primary goal. However, as in many other domains, designers have continually faced a tradeoff between making software fast and right; you could only increase one attribute by decreasing the other. The authors of this paper have broken through this dilemma. Their speculative techniques vastly improve performance without sacrificing correctness. Moreover, these are not just theoretical or hypothetical performance gains suggested by isolated laboratory experiments. The authors implement their techniques in off-the-shelf commodity software and benchmark the changes. Theirs is proven, real-world success. Their core finding is that it is cheaper to burn central processing unit (CPU) cycles and random access memory (RAM) (abundant resources) on checkpoints that are usually discarded than to waste time waiting for a synchronous remote procedure call to complete (time is the scarce resource). Because of longer latency times, Internet-based applications show even greater performance improvements than local area network applications. It?s that rare case where an increase in need is met with an increase in satisfaction. The paper is 15 pages long with one full page of bibliographic references (the references alone provide a great survey of the field). It is easy to read, and is more like a Communications of the ACM article than a research paper. A download of the base BlueFS file system they modified is available at http://notrump.eecs.umich.edu/group/group.html, the home page of the Pervasive Computing Research Group at University of Michigan. However, source code to the Speculator library or the modified file system is not available. The authors have revealed enough information to recreate their speculative library, but it would be great to see their work released under an open source license. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGOPS Operating Systems Review
ACM SIGOPS Operating Systems Review  Volume 39, Issue 5
SOSP '05
December 2005
290 pages
ISSN:0163-5980
DOI:10.1145/1095809
Issue’s Table of Contents
  • cover image ACM Conferences
    SOSP '05: Proceedings of the twentieth ACM symposium on Operating systems principles
    October 2005
    259 pages
    ISBN:1595930795
    DOI:10.1145/1095810
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 October 2005
Published in SIGOPS Volume 39, Issue 5

Check for updates

Author Tags

  1. causality
  2. distributed file systems
  3. speculative execution

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)287
  • Downloads (Last 6 weeks)17
Reflects downloads up to 01 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)A Network Load Perception Based Task Scheduler for Parallel Distributed Data Processing SystemsIEEE Transactions on Cloud Computing10.1109/TCC.2021.313262711:2(1352-1364)Online publication date: 1-Apr-2023
  • (2021)ForerunnerProceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles10.1145/3477132.3483564(570-587)Online publication date: 26-Oct-2021
  • (2020)Build scripts with perfect dependenciesProceedings of the ACM on Programming Languages10.1145/34282374:OOPSLA(1-28)Online publication date: 13-Nov-2020
  • (2020)A parallel two-stage genetic algorithm for route planningProceedings of the 2020 Genetic and Evolutionary Computation Conference Companion10.1145/3377929.3398116(1739-1746)Online publication date: 8-Jul-2020
  • (2019)A survey on data storage and placement methodologies for Cloud-Big Data ecosystemJournal of Big Data10.1186/s40537-019-0178-36:1Online publication date: 11-Feb-2019
  • (2019)Improving a Genetic Algorithm for Route Planning Using Parallelism with Speculative ExecutionPractice and Experience in Advanced Research Computing 2019: Rise of the Machines (learning)10.1145/3332186.3333251(1-5)Online publication date: 28-Jul-2019
  • (2018)FoggyCacheProceedings of the 24th Annual International Conference on Mobile Computing and Networking10.1145/3241539.3241557(19-34)Online publication date: 15-Oct-2018
  • (2018)High Availability in Cyber-physical Systems by Self-determined Virtual Machine Replication2018 IEEE 13th International Symposium on Industrial Embedded Systems (SIES)10.1109/SIES.2018.8442105(1-10)Online publication date: Jun-2018
  • (2016)Adaptive Scheduling Strategy for Heterogeneous Spark ClusterComputer Science and Application10.12677/CSA.2016.61108406:11(692-704)Online publication date: 2016
  • (2016)Adaptive Task Scheduling Strategy Based on Dynamic Workload Adjustment for Heterogeneous Hadoop ClustersIEEE Systems Journal10.1109/JSYST.2014.232311210:2(471-482)Online publication date: Jun-2016
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media