article

Speculative execution in a distributed file system

Authors:

Edmund B. Nightingale,

Peter M. Chen,

Jason FlinnAuthors Info & Claims

ACM SIGOPS Operating Systems Review, Volume 39, Issue 5

Pages 191 - 205

https://doi.org/10.1145/1095809.1095829

Published: 20 October 2005 Publication History

Get Access

Abstract

Speculator provides Linux kernel support for speculative execution. It allows multiple processes to share speculative state by tracking causal dependencies propagated through inter-process communication. It guarantees correct execution by preventing speculative processes from externalizing output, e.g., sending a network message or writing to the screen, until the speculations on which that output depends have proven to be correct. Speculator improves the performance of distributed file systems by masking I/O latency and increasing I/O throughput. Rather than block during a remote operation, a file system predicts the operation's result, then uses Speculator to checkpoint the state of the calling process and speculatively continue its execution based on the predicted result. If the prediction is correct, the checkpoint is discarded; if it is incorrect, the calling process is restored to the checkpoint, and the operation is retried. We have modified the client, server, and network protocol of two distributed file systems to use Speculator. For PostMark and Andrew-style benchmarks, speculative execution results in a factor of 2 performance improvement for NFS over local-area networks and an order of magnitude improvement over wide-area networks. For the same benchmarks, Speculator enables the Blue File System to provide the consistency of single-copy file semantics and the safety of synchronous I/O, yet still outperform current distributed file systems with weaker consistency and safety.

References

[1]

Adya, A., Bolosky, W. J., Castro, M., Cermak, G., Chaiken, R., Douceur, J. R., Howell, J., Lorch, J. R., Theimer, M., and Wattenhofer, R. P. FARSITE: Federated, available, and reliable storage for an incompletely trusted environment. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (Boston, MA, December 2002), pp. 1--14.

Digital Library

Google Scholar

[2]

Birrell, A. D., Hisgen, A., Jerian, C., Mann, T., and Swart, G. The Echo distributed file system. Tech. Rep. 111, Digital Equipment Corporation, Palo Alto, CA, USA, October 1993.

Google Scholar

[3]

Callaghan, B., Pavlowski, B., and Staubach, P. NFS Version 3 Protocol Specification. Tech. Rep. RFC 1813, IETF, June 1995.

Digital Library

Google Scholar

[4]

Carson, M. Adaptation and Protocol Testing thorugh Network Emulation. NIST, http://snad.ncsl.nist.gov/itg/nistnet/slides/index.htm.

Google Scholar

[5]

Castro, M., and Liskov, B. Proactive recovery in a byzantine-fault-tolerant system. In Proceedings of the 4th Symposium on Operating Systems Design and Implementation (San Diego, CA, October 2000).

Digital Library

Google Scholar

[6]

Chang, F., and Gibson, G. Automatic I/O hint generation through speculative execution. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (New Orleans, LA, February 1999), pp. 1--14.

Digital Library

Google Scholar

[7]

Cheriton, D., and Duda, K. Logged virtual memory. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (Copper Mountain, CO, Dec. 1995), pp. 26--39.

Digital Library

Google Scholar

[8]

Elnozahy, E. N., Alvisi, L., Wang, Y.-M., and Johnson, D. B. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys 34, 3 (September 2002), 375--408.

Digital Library

Google Scholar

[9]

Franklin, M., and Sohi, G. ARB: A hardware mechanism for dynamic reordering of memory references. IEEE Transactions on Computers 45, 5 (May 1996), 552--571.

Digital Library

Google Scholar

[10]

Fraser, K., and Chang, F. Operating system I/O speculation: How two invocations are faster than one. In Proceedings of the 2003 USENIX Technical Conference (San Antonio, TX, June 2003), pp. 325--338.

Google Scholar

[11]

Haerder, T., and Reuter, A. Principles of Transaction-Oriented Database Recovery. ACM Computing Surveys 15, 4 (December 1983), 287--317.

Digital Library

Google Scholar

[12]

Hammond, L., Willey, M., and Olukotun, K. Data speculation support for a chip multiprocessor. In Proc. of the 8th Intl. ACM Conf. on Arch. Support for Programming Languages and Operating Systems (San Jose, CA, October 1998), pp. 58--69.

Digital Library

Google Scholar

[13]

Howard, J. H., Kazar, M. L., Menees, S. G., Nichols, D. A., Satyanarayanan, M., Sidebotham, R. N., and West, M. J. Scale and performance in a distributed file system. ACM Transactions on Computer Systems 6, 1 (February 1988).

Digital Library

Google Scholar

[14]

Jefferson, D. Virtual time. ACM Transactions on Programming Languages and Systems 7, 3 (July 1985), 404--425.

Digital Library

Google Scholar

[15]

Jefferson, D., Beckman, B., Wieland, F., Blume, L., DiLoreto, M., P.Hontalas, Laroche, P., Sturdevant, K., Tupman, J., Warren, V., Weidel, J., Younger, H., and Bellenot, S. Time Warp operating system. In Proceedings of the 11th ACM Symposium on Operating Systems Principles (Austin, TX, November 1987), pp. 77--93.

Digital Library

Google Scholar

[16]

Katcher, J. PostMark: A new file system benchmark. Tech. Rep. TR3022, Network Appliance, 1997.

Google Scholar

[17]

King, S. T., and Chen, P. M. Backtracking intrusions. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (Bolton Landing, NY, October 2003), pp. 223--236.

Digital Library

Google Scholar

[18]

Kistler, J. J., and Satyanarayanan, M. Disconnected operation in the Coda file system. ACM Transactions on Computer Systems 10, 1 (February 1992).

Digital Library

Google Scholar

[19]

Lamport, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (1978), 558--565.

Digital Library

Google Scholar

[20]

Li, J., Krohn, M., Mazières, D., and Shasha, D. Secure untrusted data repository (SUNDR). In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (San Francisco, CA, December 2004), pp. 121--136.

Digital Library

Google Scholar

[21]

Liskov, B., and Rodrigues, R. Transactional file systems can be fast. In Proceedings of the 11th SIGOPS European Workshop (Leuven, Belgium, September 2004).

Digital Library

Google Scholar

[22]

Mazières, D., Kaminsky, M., Kaashoek, M. F., and Witchel, E. Separating key management from file system security. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (Kiawah Island, SC, December 1999), pp. 124--139.

Digital Library

Google Scholar

[23]

Nelson, M. N., Welsh, B. B., and Ousterhout, J. K. Caching in the Sprite network file system. ACM Transactions on Computer Systems 6, 1 (1988), 134--154.

Digital Library

Google Scholar

[24]

Nightingale, E. B., and Flinn, J. Energy-efficiency and storage flexibility in the Blue File System. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (San Francisco, CA, December 2004), pp. 363--378.

Digital Library

Google Scholar

[25]

Rosenblum, M., Bugnion, E., Herrod, S. A., Witchel, E., and Gupta, A. The impact of architectural trends on operating system performance. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (Copper Mountain, CO, December 1995), pp. 285--298.

Digital Library

Google Scholar

[26]

Schmuck, F., and Wyllie, J. Experience with transactions in QuickSilver. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (October 1991), pp. 239--53.

Digital Library

Google Scholar

[27]

Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, C., Eisler, M., and Noveck, D. Network File System (NFS) version 4 Protocol. Tech. Rep. RFC 3530, IETF, April 2003.

Digital Library

Google Scholar

[28]

Spector, A. Z., Daniels, D., Duchamp, D., Eppinger, J. L., and Pauch, R. Distributed transactions for reliable systems. In Proceedings of the 10th ACM Symposium on Operating Systems Principles (Orcas Island, WA, December 1985), pp. 127--146.

Digital Library

Google Scholar

[29]

Srinivasan, S., Andrews, C., Kandula, S., and Zhou, Y. Flashback: A light-weight extension for rollback and deterministic replay for software debugging. In Proceedings of the 2004 USENIX Technical Conference (Boston, MA, June 2004).

Digital Library

Google Scholar

[30]

Srinivasan, V., and Mogul, J. Spritely NFS: Experiments with cache consistency protocols. In Proceedings of the 12th ACM Symposium on Operating System Principles (December 1989), pp. 45--57.

Digital Library

Google Scholar

[31]

Steffan, J. G., Colohan, C. B., Zhai, A., and Mowry, T. C. A scalable approach to thread-level speculation. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA) (Vancouver, Canada, June 2000), pp. 1--24.

Digital Library

Google Scholar

[32]

Weinstein, M. J., Thomas W. Page, J., Livezey, B. K., and Popek, G. J. Transactions and synchronization in a distributed operating system. In Proceedings of the 10th ACM Symposium on Operating Systems Principles (Orcas Island, WA, December 1985), pp. 115--126.

Digital Library

Google Scholar

[33]

Zhang, Y., Rauchwerger, L., and Torrellas, J. Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors. In Proc. of the 5th Intl. Symposium on High Performance Computer Architecture (Orlando, FL, January 1999), p. 135.

Digital Library

Google Scholar

[34]

Zhu, N., and Chiueh, T. Design, implementation and evaluation of the Repairable File Service. In Proceedings of the International Conference on Dependable Systems and Networks (San Francisco, CA, June 2003).

Google Scholar

Cited By

View all

Tang ZXiao ZYang LHe KLi K(2023)A Network Load Perception Based Task Scheduler for Parallel Distributed Data Processing SystemsIEEE Transactions on Cloud Computing10.1109/TCC.2021.313262711:2(1352-1364)Online publication date: 1-Apr-2023
https://doi.org/10.1109/TCC.2021.3132627
Chen YGuo ZLi RChen SZhou LZhou YZhang X(2021)ForerunnerProceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles10.1145/3477132.3483564(570-587)Online publication date: 26-Oct-2021
https://dl.acm.org/doi/10.1145/3477132.3483564
Spall SMitchell NTobin-Hochstadt S(2020)Build scripts with perfect dependenciesProceedings of the ACM on Programming Languages10.1145/34282374:OOPSLA(1-28)Online publication date: 13-Nov-2020
https://dl.acm.org/doi/10.1145/3428237
Show More Cited By

Index Terms

Speculative execution in a distributed file system
1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
2. Software and its engineering
  1. Software creation and management
    1. Designing software
  2. Software organization and properties

Recommendations

Speculative execution in a distributed file system
SOSP '05: Proceedings of the twentieth ACM symposium on Operating systems principles

Speculator provides Linux kernel support for speculative execution. It allows multiple processes to share speculative state by tracking causal dependencies propagated through inter-process communication. It guarantees correct execution by preventing ...
Speculative execution in a distributed file system

Speculator provides Linux kernel support for speculative execution. It allows multiple processes to share speculative state by tracking causal dependencies propagated through interprocess communication. It guarantees correct execution by preventing ...
Three Architectural Models for Compiler-Controlled Speculative Execution

To effectively exploit instruction level parallelism, the compiler must move instructions across branches. When an instruction is moved above a branch that it is control dependent on, it is considered to be speculatively executed since it is executed ...

Reviews

Reviewer: Bayard Kohlhepp

Make it run; make it right; make it fast; make it small. The anonymous mantra above captures the evolution of all software (except for the "make it bloat" phase of overly mature products). Distributed processing has passed the novelty stage where just making it run was a thrill. Most improvements in the past decade have been targeted at making it right; in other words, ensuring correct, consistent data in the face of increasingly complicated user interaction scenarios. Pervasive Internet availability and the proliferation of handheld devices are currently motivating research into making software fast and small. It is the move of both casual and mission-critical applications to the Internet that has made improved performance of distributed software a primary goal. However, as in many other domains, designers have continually faced a tradeoff between making software fast and right; you could only increase one attribute by decreasing the other. The authors of this paper have broken through this dilemma. Their speculative techniques vastly improve performance without sacrificing correctness. Moreover, these are not just theoretical or hypothetical performance gains suggested by isolated laboratory experiments. The authors implement their techniques in off-the-shelf commodity software and benchmark the changes. Theirs is proven, real-world success. Their core finding is that it is cheaper to burn central processing unit (CPU) cycles and random access memory (RAM) (abundant resources) on checkpoints that are usually discarded than to waste time waiting for a synchronous remote procedure call to complete (time is the scarce resource). Because of longer latency times, Internet-based applications show even greater performance improvements than local area network applications. It?s that rare case where an increase in need is met with an increase in satisfaction. The paper is 15 pages long with one full page of bibliographic references (the references alone provide a great survey of the field). It is easy to read, and is more like a Communications of the ACM article than a research paper. A download of the base BlueFS file system they modified is available at http://notrump.eecs.umich.edu/group/group.html, the home page of the Pervasive Computing Research Group at University of Michigan. However, source code to the Speculator library or the modified file system is not available. The authors have revealed enough information to recreate their speculative library, but it would be great to see their work released under an open source license. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGOPS Operating Systems Review

ACM SIGOPS Operating Systems Review Volume 39, Issue 5

SOSP '05

December 2005

290 pages

ISSN:0163-5980

DOI:10.1145/1095809

Issue’s Table of Contents

SOSP '05: Proceedings of the twentieth ACM symposium on Operating systems principles
October 2005
259 pages
ISBN:1595930795
DOI:10.1145/1095810
General Chair:
Andrew Herbert
Microsoft Research, UK
,
Program Chair:
Ken Birman
Cornell University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 October 2005

Published in SIGOPS Volume 39, Issue 5

Check for updates

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

83
Total Citations
View Citations
2,411
Total Downloads

Downloads (Last 12 months)287
Downloads (Last 6 weeks)17

Reflects downloads up to 01 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Tang ZXiao ZYang LHe KLi K(2023)A Network Load Perception Based Task Scheduler for Parallel Distributed Data Processing SystemsIEEE Transactions on Cloud Computing10.1109/TCC.2021.313262711:2(1352-1364)Online publication date: 1-Apr-2023
https://doi.org/10.1109/TCC.2021.3132627
Chen YGuo ZLi RChen SZhou LZhou YZhang X(2021)ForerunnerProceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles10.1145/3477132.3483564(570-587)Online publication date: 26-Oct-2021
https://dl.acm.org/doi/10.1145/3477132.3483564
Spall SMitchell NTobin-Hochstadt S(2020)Build scripts with perfect dependenciesProceedings of the ACM on Programming Languages10.1145/34282374:OOPSLA(1-28)Online publication date: 13-Nov-2020
https://dl.acm.org/doi/10.1145/3428237
Mathias HFoley SCoello Coello C(2020)A parallel two-stage genetic algorithm for route planningProceedings of the 2020 Genetic and Evolutionary Computation Conference Companion10.1145/3377929.3398116(1739-1746)Online publication date: 8-Jul-2020
https://dl.acm.org/doi/10.1145/3377929.3398116
Mazumdar SSeybold DKritikos KVerginadis Y(2019)A survey on data storage and placement methodologies for Cloud-Big Data ecosystemJournal of Big Data10.1186/s40537-019-0178-36:1Online publication date: 11-Feb-2019
https://doi.org/10.1186/s40537-019-0178-3
Mathias HFoley SFurlani T(2019)Improving a Genetic Algorithm for Route Planning Using Parallelism with Speculative ExecutionPractice and Experience in Advanced Research Computing 2019: Rise of the Machines (learning)10.1145/3332186.3333251(1-5)Online publication date: 28-Jul-2019
https://dl.acm.org/doi/10.1145/3332186.3333251
Guo PHu BLi RHu WShorey RMurty RChen YJamieson K(2018)FoggyCacheProceedings of the 24th Annual International Conference on Mobile Computing and Networking10.1145/3241539.3241557(19-34)Online publication date: 15-Oct-2018
https://dl.acm.org/doi/10.1145/3241539.3241557
Jablkowski BMueller MSpinczyk O(2018)High Availability in Cyber-physical Systems by Self-determined Virtual Machine Replication2018 IEEE 13th International Symposium on Industrial Embedded Systems (SIES)10.1109/SIES.2018.8442105(1-10)Online publication date: Jun-2018
https://doi.org/10.1109/SIES.2018.8442105
徐佳(2016)Adaptive Scheduling Strategy for Heterogeneous Spark ClusterComputer Science and Application10.12677/CSA.2016.61108406:11(692-704)Online publication date: 2016
https://doi.org/10.12677/CSA.2016.611084
Xu XCao LWang X(2016)Adaptive Task Scheduling Strategy Based on Dynamic Workload Adjustment for Heterogeneous Hadoop ClustersIEEE Systems Journal10.1109/JSYST.2014.232311210:2(471-482)Online publication date: Jun-2016
https://doi.org/10.1109/JSYST.2014.2323112
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Index Terms

Recommendations