Building a reliable operating system

January 2008

Author:
Francis Manoj David
University of Illinois at Urbana-Champaign

Adviser:
Roy H. Campbell
University of Illinois at Urbana-Champaign

Publisher:
University of Illinois at Urbana-Champaign
Champaign, IL
United States

ISBN: 978-1-109-02562-0

Order Number: AAI3347295

Pages: 100


Abstract

Despite many decades of research, the management of errors in a live operating system remains a challenging problem. This thesis presents CuriOS, an operating system that incorporates several new error management techniques that significantly improve reliability. Errors detected by both hardware and software are signaled using language exception handling mechanisms. Unhandled exceptions do not crash the operating system; instead, they are dispatched to recovery routines.
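The exception-based error signaling described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not taken from CuriOS; the class names, the `RECOVERY_ROUTINES` table, and the `invoke` wrapper are all hypothetical. The idea shown is that an error raised inside a service call is caught at the call boundary and handed to a recovery routine rather than propagating and bringing the whole system down.

```python
class ServiceError(Exception):
    """Hypothetical base class for errors raised inside an OS service."""


class MemoryFault(ServiceError):
    """Stands in for a hardware-detected error, such as a bad memory access."""


class ConsistencyFault(ServiceError):
    """Stands in for a software-detected error, such as a failed sanity check."""


def recover_memory_fault(service_name, error):
    # Hypothetical recovery routine: a real system would restart the service here.
    return f"restarted {service_name} after memory fault"


def recover_default(service_name, error):
    # Fallback recovery routine for error types without a dedicated handler.
    return f"restarted {service_name}"


# Assumed mapping from error type to recovery routine.
RECOVERY_ROUTINES = {MemoryFault: recover_memory_fault}


def invoke(service_name, handler, *args):
    """Call a service handler; dispatch any unhandled service error to recovery."""
    try:
        return ("ok", handler(*args))
    except ServiceError as error:
        routine = RECOVERY_ROUTINES.get(type(error), recover_default)
        return ("recovered", routine(service_name, error))


def read_block(block_number):
    """Toy file-system handler that faults on an invalid block number."""
    if block_number < 0:
        raise MemoryFault("invalid block address")
    return f"data-{block_number}"
```

Calling `invoke("fs", read_block, 3)` returns the handler's result, while `invoke("fs", read_block, -1)` returns the output of the recovery routine instead of raising to the caller.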

The architecture of CuriOS is influenced by microkernel design principles. Individual operating system services are assigned separate protection domains. This componentization provided by traditional microkernel designs helps confine errors. However, an error that occurs in a microkernel operating system service can potentially result in state corruption and service failure. A simple restart of the failed service is not always the best solution for reliability. Blindly restarting a service which maintains client-related state such as session information results in the loss of this state and affects all clients that were using the service. CuriOS adopts a novel design that uses lightweight distribution, isolation and persistence of client-related state information maintained by operating system services. This helps mitigate the problem of state loss during a restart. This design also achieves inter-client isolation by curtailing error propagation within services.
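A minimal sketch of the state-management idea, assuming a much simpler model than the thesis describes: client-related state lives in per-client regions held outside the service instance, so a restarted service can pick up where the failed one left off. The names `ClientStateStore` and `TimerService` are hypothetical, and Python dictionaries stand in for what would be separately protected memory in a real system, so the sketch does not model hardware protection domains.

```python
class ClientStateStore:
    """Hypothetical per-client state regions that outlive any service instance."""

    def __init__(self):
        self._regions = {}

    def region_for(self, client_id):
        # Each client gets its own region; a request only touches its own one.
        return self._regions.setdefault(client_id, {})


class TimerService:
    """Toy service that keeps no client state of its own."""

    def __init__(self, store):
        self._store = store

    def set_timer(self, client_id, ticks):
        # Only the requesting client's region is accessed while serving it.
        self._store.region_for(client_id)["ticks"] = ticks

    def remaining(self, client_id):
        return self._store.region_for(client_id).get("ticks")


store = ClientStateStore()
service = TimerService(store)
service.set_timer("client-a", 10)
service.set_timer("client-b", 20)

# Simulated failure and restart: the old instance is discarded and a fresh
# one is created, but the per-client state is still available to it.
service = TimerService(store)
```

After the simulated restart, `service.remaining("client-a")` still yields the value set before the failure. Because each request reaches only the requesting client's region, an error while serving one client has no direct path to another client's state in this model.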

Fault injection experiments show that it is possible to recover from 87% or more of manifested errors in operating system services such as the file system, timer, scheduler and network while maintaining low performance overheads.

Contributors

Roy Harold Campbell
University of Illinois Urbana-Champaign
Francis Manoj David
Microsoft Corporation


Recommendations

THE LINUX OPERATING SYSTEM: AN INTRODUCTION
Towards an immortal operating system in virtual environments
Highlights
- We show how a commercial OS can be successfully recovered from a crash.
- Support ...
Abstract
Many OS crashes are caused by bugs in kernel extensions or device drivers while the OS itself may have been tested rigorously. To make an OS immortal we must resurrect the OS from these crashes. We present a novel OS-hypervisor ...
Building a Self-Healing Operating System
DASC '07: Proceedings of the Third IEEE International Symposium on Dependable, Autonomic and Secure Computing

User applications and data in volatile memory are usually lost when an operating system crashes because of errors caused by either hardware or software faults. This is because most operating systems are designed to stop working when some internal ...
