Eliminating receive livelock in an interrupt-driven kernel

JC Mogul, KK Ramakrishnan - ACM Transactions on Computer Systems, 1997 - usenix.org
1. Introduction

Most operating systems use interrupts to internally schedule the performance of tasks related to I/O events, and particularly the invocation of network protocol software. Interrupts are useful because they allow the CPU to spend most of its time doing useful processing, yet respond quickly to events without constantly having to poll for event arrivals.

Polling is expensive, especially when I/O events are relatively rare, as is the case with disks, which seldom interrupt more than a few hundred times per second. Polling can also increase the latency of response to an event. Modern systems can respond to an interrupt in a few tens of microseconds; to achieve the same latency using polling, the system would have to poll tens of thousands of times per second, which would create excessive overhead. For a general-purpose system, an interrupt-driven design works best.

Most extant operating systems were designed to handle I/O devices that interrupt every few milliseconds. Disks tended to issue events on the order …

… they are not flow-controlled. Some multi-media applications want constant-rate, low-latency service; RPC-based client-server applications often use datagram-style transports, instead of reliable, flow-controlled protocols. Note that whereas I/O devices such as disks generate interrupts only as a result of requests from the operating system, and so are inherently flow-controlled, network interfaces generate unsolicited receive interrupts.

The shift to higher event rates and non-flow-controlled protocols can subject a host to congestive collapse: once the event rate saturates the system, without a negative feedback loop to control the sources, there is no way to gracefully shed load. If the host runs at full throughput under these conditions, and gives fair service to all sources, this at least preserves the possibility of stability. But if throughput decreases as the offered load increases, the overall system becomes unstable.

Interrupt-driven systems tend to perform badly under overload. Tasks performed at interrupt level, …
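
The polling-rate estimate in the introduction can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper. It assumes evenly spaced polls, ignores the cost of each poll, and treats the worst-case delay before an event is noticed as one full polling interval. The function name `polls_per_second` and the sample latencies of 20, 50, and 80 microseconds are assumptions chosen to stand in for "a few tens of microseconds".

```python
def polls_per_second(target_latency_s: float) -> float:
    """Polling rate needed so that the worst-case delay before an event
    is noticed does not exceed target_latency_s.

    Assumption (not from the source): polls are evenly spaced and free,
    so the worst-case delay equals one polling interval.
    """
    if target_latency_s <= 0:
        raise ValueError("target latency must be positive")
    return 1.0 / target_latency_s


if __name__ == "__main__":
    # Hypothetical sample latencies standing in for "a few tens of microseconds".
    for latency_us in (20, 50, 80):
        rate = polls_per_second(latency_us * 1e-6)
        print(f"{latency_us} us target latency -> about {rate:,.0f} polls per second")
```

Under these assumptions every sample latency yields a rate in the tens of thousands of polls per second, which is the order of magnitude the text cites as excessive overhead for a general-purpose system.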