Larrabee: a many-core x86 architecture for visual computing

Published: 01 August 2008


This paper presents a many-core visual computing architecture code named Larrabee, a new software rendering pipeline, a many-core programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die second-level (L2) cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. Task scheduling in Larrabee is performed entirely in software rather than in fixed function logic. The customizable software graphics rendering pipeline for this architecture uses binning to reduce memory bandwidth requirements, minimize lock contention, and increase opportunities for parallelism relative to standard GPUs. The Larrabee native programming model supports a variety of highly parallel applications that use irregular data structures. Performance analysis on those applications demonstrates Larrabee's potential for a broad range of parallel computation.
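To make the binning idea concrete, here is a minimal sketch of the front end of a binned (sort-middle) renderer: each triangle is appended to the bin of every screen tile that its bounding box overlaps. Because each tile's list can later be consumed by a single core, pixel writes within a tile need no locking, which is one way binning reduces contention. The tile size of 64 pixels, the tuple-of-vertices triangle format, and the function name `bin_triangles` are illustrative assumptions, not the paper's actual data structures.

```python
from collections import defaultdict

TILE = 64  # assumed tile edge length in pixels (illustrative only)

def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of the triangles whose
    screen-space bounding box overlaps that tile (hypothetical helper)."""
    bins = defaultdict(list)
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    for idx, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Conservative bounding box, clamped to the tile grid; triangles
        # that lie entirely off screen produce empty ranges below.
        x0 = max(int(min(xs)) // tile, 0)
        x1 = min(int(max(xs)) // tile, tiles_x - 1)
        y0 = max(int(min(ys)) // tile, 0)
        y1 = min(int(max(ys)) // tile, tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(idx)
    return dict(bins)
```

For a 256×256 target, a small triangle near the origin lands in tile (0, 0) only, while a wider triangle spanning x = 60..130 and y = 60..70 is added to six tiles. Bounding-box binning is conservative: a triangle may be listed in a tile it does not actually cover, and the per-tile rasterizer later rejects those pixels.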

Hector Yee

In the early years of computer graphics, software renderers were very popular on the personal computer. They have since been supplanted by graphics processing units (GPUs), which first took over fixed-function operations such as triangle setup and rasterization and eventually came to perform the transformation and lighting of geometry entirely in hardware. Recent programmable graphics hardware gives the user limited control over pixel shading and geometry transformation. However, some kinds of operations that are important for many rendering problems, such as creating and manipulating dynamic, irregular data structures like linked lists, are still difficult to implement on graphics hardware. The Larrabee architecture described by the authors attempts to address issues such as these by offering, as an alternative to the classic GPU model, a many-core architecture built from general-purpose processors, each augmented with a wide vector unit.
The paper is written in two parts: the first describes the hardware architecture, and the second describes a software renderer running on top of it. The hardware consists of many in-order central processing units (CPUs) based on the Intel x86 architecture, connected by an interprocessor ring network; each core has its own local subset of a coherent L2 cache. Additional fixed-function units perform tasks such as texture filtering, which is difficult to implement efficiently in software. Almost everything else, such as shading and geometry transformation, is done in software. The Larrabee software renderer follows a sort-middle architecture: primitives are first binned into screen-space tiles, and each tile is then rendered as a unit. This keeps the CPU cores as busy as possible without saturating memory bandwidth with too many simultaneous memory requests.
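The back end of a sort-middle renderer can be sketched as follows, assuming the front end has already produced a mapping from tile coordinates to the triangles binned there. Each bin becomes an independent task that rasterizes its triangles into a tile-local buffer with edge functions, so only one worker ever writes a given tile. The 8-pixel tile size, the per-pixel overdraw count standing in for shading, and the names `render_tile` and `render_bins` are hypothetical. The thread pool only stands in for Larrabee's software-scheduled cores: in CPython it illustrates the task structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 8  # assumed (tiny) tile size so the example stays small

def edge(ax, ay, bx, by, px, py):
    """Edge function: signed area of (A, B, P); its sign tells which side
    of the directed edge A->B the point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covers(tri, px, py):
    (ax, ay), (bx, by), (cx, cy) = tri
    e0 = edge(ax, ay, bx, by, px, py)
    e1 = edge(bx, by, cx, cy, px, py)
    e2 = edge(cx, cy, ax, ay, px, py)
    # Accept either winding order; points on an edge count as inside.
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)

def render_tile(tile_xy, tris):
    """Rasterize one tile's bin into a tile-local buffer. Only this task
    touches the buffer, so no locks are needed."""
    tx, ty = tile_xy
    buf = [[0] * TILE for _ in range(TILE)]
    for tri in tris:
        for y in range(TILE):
            for x in range(TILE):
                # Sample at the pixel centre in screen space.
                px, py = tx * TILE + x + 0.5, ty * TILE + y + 0.5
                if covers(tri, px, py):
                    buf[y][x] += 1  # overdraw count in place of real shading
    return tile_xy, buf

def render_bins(bins, workers=4):
    """Back end: hand each bin to a worker as an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda item: render_tile(*item), bins.items()))
```

Because tiles never share pixels, results can be written back to the frame buffer tile by tile once each task finishes; synchronization is needed only between the binning and rendering phases, not per pixel.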
The authors show almost linear scaling with the number of cores for applications such as game fluid simulation, while applications such as rigid-body simulation do not scale as well. It is interesting to see real-time rendering come full circle, from software to hardware and now back to software. I am eager to see the actual hardware in operation. My only disappointment with the paper is the lack of a performance comparison with existing GPUs on state-of-the-art games. The authors do, however, provide a detailed analysis of how each game uses the cores and memory bandwidth of the architecture.
Online Computing Reviews Service

