2. The PCI Express Bus
2.1. Overview of PCI Express Bus
This laboratory work presents the serial variant of the PCI bus, referred to as PCI
Express. After an overview of the PCI Express bus, details about its architecture are
presented, including the PCI Express link, bus topology, architectural layers, transactions,
and interrupts. The physical layer is presented in more detail and the most important
configuration registers are described. The aim of the applications is to access the PCI
configuration space and to decode the information available in the configuration registers of
PCI and PCI Express devices.
The designers of the PCIe bus have maintained the main advantageous features of the
architecture of previous PCI bus generations. For instance, the PCIe bus uses the same
communication model as the PCI and PCI-X buses. The same address spaces are retained:
memory, I/O, and configuration. The PCIe bus allows using the same types of transactions as
the previous buses: memory read/write, I/O read/write, and configuration read/write. In this
way, compatibility is maintained with existing operating systems and software drivers,
which do not require changes.
Like previous PCI buses, PCIe supports chip-to-chip interconnection and board-to-board
interconnection via expansion cards and connectors. The expansion cards have a structure
similar to that of the expansion cards of PCI and PCI-X buses. A PCIe motherboard has a
form factor similar to that of existing ATX motherboards, used for personal computers.
In addition to retaining some advantageous features of the PCI and PCI-X buses, the
PCIe bus introduces various improvements for enhancing performance and reducing cost. As
opposed to the previous PCI and PCI-X generations, which are shared parallel buses, the PCIe
bus uses a serial point-to-point interconnect for communication between two peripheral
devices. First of all, a serial interconnect eliminates the disadvantages of a parallel bus,
especially the difficulty of synchronization between multiple data lines due to the
asymmetrical signal propagation (skew). The cause of this skew may be the different length of
data paths traveled by various signals or the propagation on different layers of the printed
circuit board. Although the data signals of a parallel bus are transmitted simultaneously, they
may reach the destination at different times. Increasing the clock frequency of a parallel bus
is difficult, since the clock cycle time may become shorter than the signal skew (which can be
a few nanoseconds). For a serial bus the signal skew problem does not arise, because there is
no external clock signal, as the synchronization information is embedded into the transmitted
serial signal. Secondly, a point-to-point interconnect implies a reduced electrical load of the
link, which makes it possible to increase the frequency of the clock signal used for data
transfers.
The performance of the PCIe bus is scalable: a variable number of communication lanes
can be implemented per interconnect, based on the performance requirements for that
interconnect.
The PCIe bus implements a switch-based technology to interconnect a large number
of peripheral devices. For the serial interconnect a packet-based communication protocol is
used. Instead of special signals for various functions, such as interrupt signaling, error
handling, or power management, both data and commands are transmitted in packets. This
reduces the pin count of devices and their cost.
The PCIe bus has several advanced features. For instance, the Quality of Service
(QoS) feature makes it possible to ensure differentiated performance for different
applications. The hot plug and hot swap support makes it possible to build systems that are
always available. Advanced power management features allow mobile applications with low
power consumption to be implemented. The error handling feature makes the PCIe bus
suitable for robust systems required for high-end servers.
It supports advanced error reporting and handling to improve fault isolation and error
recovery.
It supports hot-plug and hot-swap of peripheral devices, without the need to use
additional signals.
During hardware initialization, the link width and frequency of operation are
negotiated for each PCIe link. These parameters are set automatically by the devices at each
end of the link, without involving the operating system. After initialization, each link must
only operate at the operating frequency that has been set. The first version of the PCIe
specification defined an operating frequency of 2.5 GHz, which corresponds to a raw bit rate
of 2.5 Gbits/s for each communication lane and direction; the usable data rate is lower
because of the line encoding overhead. In the subsequent versions, the operating frequency
increased to 5 GHz, and then to 8 GHz.
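The relation between the operating frequency and the usable data rate of a lane can be illustrated with a short calculation. The sketch below assumes the line encodings defined by the PCIe specifications (8b/10b at 2.5 and 5 GHz, 128b/130b at 8 GHz), which are not detailed in this section; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Usable bit rate of one lane in one direction, in Mbits/s.
   raw_mbps: raw bit rate of the lane in Mbits/s (2500, 5000, or 8000).
   Assumption: 8b/10b encoding is used below 8000, 128b/130b at 8000. */
uint32_t pcie_lane_payload_mbps(uint32_t raw_mbps)
{
    assert(raw_mbps == 2500 || raw_mbps == 5000 || raw_mbps == 8000);
    if (raw_mbps == 8000)
        return (uint32_t)((uint64_t)raw_mbps * 128 / 130);  /* 128b/130b */
    return raw_mbps * 8 / 10;                                /* 8b/10b */
}
```

Under these assumptions, a 2.5-GHz lane carries 2000 Mbits/s of encoded payload per direction, and a link with several lanes multiplies this value by the link width.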
Endpoints represent peripheral devices that participate in PCIe transactions. There are
two types of endpoints. An initiator (requester) endpoint initiates a transaction in the PCIe
system, while a target (completer) endpoint responds to transactions that are addressed to it.
In a PCIe hierarchy, in addition to PCIe endpoints, legacy endpoints may also exist, which are
compatible with previous generations of the PCI bus. As with the classical PCI bus, PCIe
devices may have up to eight logical functions, so that an endpoint may be composed of up to
eight functions numbered from 0 through 7. Each endpoint is assigned a device identifier
(ID), which consists of a bus number, device number, and function number.
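The composition of the device ID can be sketched in C as follows. The field widths used here (8 bits for the bus number, 5 bits for the device number, 3 bits for the function number) match the ranges 0..255, 0..31, and 0..7 used in this laboratory work; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack bus (8 bits), device (5 bits), and function (3 bits) into a 16-bit ID. */
uint16_t pcie_make_id(unsigned bus, unsigned device, unsigned function)
{
    assert(bus < 256 && device < 32 && function < 8);
    return (uint16_t)((bus << 8) | (device << 3) | function);
}

/* Extract the individual fields from a 16-bit ID. */
unsigned pcie_id_bus(uint16_t id)      { return id >> 8; }
unsigned pcie_id_device(uint16_t id)   { return (id >> 3) & 0x1F; }
unsigned pcie_id_function(uint16_t id) { return id & 0x07; }
```

For example, bus 0, device 31, function 3 yields the ID 0x00FB.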
A switch forwards packets from any of its input (ingress) ports to one of its output
(egress) ports, in a manner similar to a PCI-to-PCI bridge. The packets are transferred via a
routing mechanism based on either an address or an identifier. An arbitration mechanism
determines the priority with which packets are forwarded from input ports to output ports.
The PCIe specification defines the architecture of PCIe devices in terms of three
logical layers, which are the last three layers from those previously listed. Each of these
layers may be divided into two sections, one that processes information to be transmitted and
one that processes information received (Figure 2.5). This logical organization, however, does
not imply a particular implementation of PCIe devices.
The PCIe bus uses packets for transferring information between pairs of devices
connected via a PCIe link. Consider first the transfer of information from device A to device B.
Packets are formed in the transaction layer based on information obtained from the device
core and application. A particular packet is stored in a buffer to be transmitted to the lower
layers. The data link layer extends the packet with additional information required for error
detection at the receiver device. This packet is then encoded in the physical layer and
transmitted through differential signals over the PCIe link by the analog portion of this layer.
Now consider the reception of information by device B. Packets are decoded in the
physical layer and their contents are forwarded to the upper layers. The data link layer checks
for errors in a received packet and, if there are no errors, forwards the packet to the
transaction layer. This layer stores the packet in a buffer and converts the information in the
packet to a representation that can be processed by the device core and application.
Figure 2.6 illustrates the conceptual flow of information through the three logical
layers of PCIe devices.
Figure 2.6. Packet flow through the logical layers of PCIe devices.
The software layer or device core sends to the transaction layer the information
required to create the main section of the packet. This information is the header and data field
of the packet. Optionally, a CRC code is computed and appended to the packet as the ECRC
(End-to-End CRC) field. This field is used by the target device of the packet to detect CRC
errors in the header and data field.
The packet created by the transaction layer is forwarded to the data link layer, which
appends to this packet a sequence number and another CRC field, LCRC (Link CRC). The
LCRC field is used by the receiver device at the other end of the link to detect CRC errors in
the packet created by the transaction layer and in the sequence number. The resulting packet
is forwarded to the physical layer, which adds a Start character and an End character, of one
byte each, that frame the packet. The packet is then encoded and transmitted through
differential signals over a PCIe link using the available number of communication lanes.
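The way each layer extends the packet can be summarized by counting the bytes that reach the link. This is a sketch under stated assumptions: only the one-byte Start and End characters come from the text above, while the sizes of the sequence number (2 bytes), LCRC (4 bytes), and ECRC (4 bytes) fields and the header sizes of 12 or 16 bytes are taken from the PCIe specification for links that use this framing. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
    FRAMING_BYTES = 2,  /* Start + End characters, one byte each (from the text) */
    SEQ_BYTES     = 2,  /* assumption: sequence number field of 2 bytes */
    LCRC_BYTES    = 4,  /* assumption: 32-bit LCRC */
    ECRC_BYTES    = 4   /* assumption: 32-bit ECRC */
};

/* Number of bytes placed on the link for one transaction layer packet,
   before the physical layer encoding is applied. */
size_t pcie_bytes_on_link(size_t header_bytes, size_t data_bytes, bool has_ecrc)
{
    size_t tlp, dll;

    assert(header_bytes == 12 || header_bytes == 16);  /* assumption: 3 or 4 double-words */

    /* Transaction layer: header + data field + optional ECRC. */
    tlp = header_bytes + data_bytes + (has_ecrc ? ECRC_BYTES : 0);
    /* Data link layer: sequence number in front, LCRC at the end. */
    dll = SEQ_BYTES + tlp + LCRC_BYTES;
    /* Physical layer: Start and End framing characters. */
    return dll + FRAMING_BYTES;
}
```

With these sizes, a packet with a 12-byte header, 64 bytes of data, and no ECRC occupies 84 bytes on the link, which shows that the relative overhead is larger for small data fields.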
such as in the case of a read transaction, the target device gathers these data, obtains bus
ownership, and returns the requested data. The completion packet returned by the target device
confirms that the request packet has been received by the target device.
The second type is represented by transactions for which the target device does not
return a completion packet back to the initiator device; these are called posted transactions. In
this way, the time required for completing the transaction is shorter, but the initiator device
does not have knowledge of successful reception of the request packet by the target device.
link width, link data rate, communication lane reversal, polarity inversion, and signal skew
compensation within a multi-lane link.
Link width. It is possible to connect two devices via ports with a different number of
communication lanes per link. After initialization, the link width will be set to the smaller
of the widths of the two connected ports. For instance, a device with an x2 PCIe port may be
connected to another device with an x4 PCIe port. For communication between these two
devices, the link width will be set to x2.
Link data rate. Initially, a link’s data rate is set to the minimum value of 2.5 Gbits/s.
During link training, each device advertises the highest data rate that it is capable of. The
link will be initialized with the highest common data rate supported by the two devices at
opposite ends of the link.
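The outcome of the link width and data rate negotiation described above can be modeled with a few lines of C. This is only a model of the result, not of the training protocol itself; it assumes that a port able to operate at a given data rate can also operate at all lower rates, so that the highest common rate is the smaller of the two maxima. The type and function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    unsigned width;          /* number of lanes of the port (1 for x1, 2 for x2, ...) */
    unsigned max_rate_mbps;  /* highest supported data rate per lane, in Mbits/s */
} PortCaps;

typedef struct {
    unsigned width;
    unsigned rate_mbps;
} LinkParams;

/* Parameters of the link formed by connecting ports a and b. */
LinkParams negotiate_link(PortCaps a, PortCaps b)
{
    LinkParams link;

    assert(a.width > 0 && b.width > 0);
    /* The link width is the smaller of the two port widths. */
    link.width = a.width < b.width ? a.width : b.width;
    /* The data rate is the highest rate supported by both ports. */
    link.rate_mbps = a.max_rate_mbps < b.max_rate_mbps ? a.max_rate_mbps
                                                       : b.max_rate_mbps;
    return link;
}
```

For an x2 port capable of 5 Gbits/s connected to an x4 port capable of 2.5 Gbits/s, the model returns an x2 link operating at 2.5 Gbits/s.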
Communication lane reversal. When a link contains several communication lanes,
these are numbered. When two devices are physically connected, the communication lanes of
the devices’ ports may not be connected correctly. In such a case, link training allows for the
lane numbers to be reversed, so that the lane numbers of adjacent ports on each end of the link
correspond.
Polarity inversion. The D+ and D- differential wire pairs of two devices may not be
connected correctly. In this case, as the result of link training, the receiver device will invert
the polarity of the receiver circuit’s terminals.
Skew compensation. In case of a multi-lane link, due to length variations of physical
wires and different characteristics of driver/receiver circuitry, bit streams on a lane may be
received skewed with respect to other lanes. The receiver circuits must compensate for this
skew by adding delays on some lanes.
A PCIe function’s configuration space is divided into two sections. The first section
represents the PCI-compatible configuration space and it occupies the first 256 B (64
double-words of 32 bits) of the 4-KB space. The first 16 double-words of this section
represent the PCI configuration header, while the remaining 48 double-words are reserved for
the implementation of function-specific configuration registers.
The second section of a PCIe function’s configuration space represents the PCIe
extended configuration space and it occupies 3840 B (960 double-words). This space is used to
implement the PCIe extended capability registers, which are optional. Examples of such
registers are the advanced error reporting capability register set, virtual channel capability
register set, and device serial number capability register set.
Input/Output Systems and Peripheral Devices 11
The PCI-compatible configuration space may be accessed via two methods, either
through the PCI-compatible configuration mechanism or the PCIe enhanced configuration
mechanism. These access mechanisms are presented in the following sections. A PCIe
function’s extended configuration space can only be accessed through the PCIe enhanced
configuration mechanism.
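Since each function has a 4-KB configuration space, the enhanced configuration mechanism can be pictured as a memory-mapped region in which every function owns one 4-KB block. The sketch below computes the offset of a register inside such a region; it assumes the conventional layout (bus number in bits 27..20, device number in bits 19..15, function number in bits 14..12), and the base address of the region, which is platform-specific, is deliberately left out. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a configuration register inside the memory-mapped
   configuration region (assumed layout: 4 KB per function,
   8 functions per device, 32 devices per bus). */
uint32_t ecam_offset(unsigned bus, unsigned device, unsigned function, unsigned reg)
{
    assert(bus < 256 && device < 32 && function < 8 && reg < 4096);
    return ((uint32_t)bus << 20) | ((uint32_t)device << 15) |
           ((uint32_t)function << 12) | reg;
}

/* Offsets 0x000..0x0FF belong to the PCI-compatible configuration space,
   offsets 0x100..0xFFF to the PCIe extended configuration space. */
int is_extended_register(unsigned reg)
{
    return reg >= 256;
}
```

For bus 0, device 31, function 3, the configuration header starts at offset 0xFB000 from the base address of the region.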
The meaning of the fields in the configuration address port is presented next.
Bit 31 represents the enable bit for mapping the configuration space. This bit must be
set to one to enable the translation of a subsequent processor access to the configuration
data port into a transaction for accessing the configuration space. If this bit is set
to zero and the processor initiates an access to the configuration data port, the operation
will be translated into a transaction for accessing the I/O space and not the
configuration space.
Bits 30..24 are reserved and must be set to zero.
Bits 23..16 identify the PCI bus number (0..255).
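The value written to the configuration address port can be assembled as shown in the following sketch. Bit 31, the reserved bits 30..24, and the bus number field follow the description above; the positions of the device number (bits 15..11), function number (bits 10..8), and double-word number (bits 7..2) are not listed above and are assumed here to follow the conventional PCI layout. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32-bit value for the configuration address port. */
uint32_t pci_config_address(unsigned bus, unsigned device,
                            unsigned function, unsigned dword)
{
    assert(bus < 256 && device < 32 && function < 8 && dword < 64);
    return 0x80000000u                   /* bit 31: enable bit             */
         | ((uint32_t)bus << 16)         /* bits 23..16: bus number        */
         | ((uint32_t)device << 11)      /* bits 15..11: device (assumed)  */
         | ((uint32_t)function << 8)     /* bits 10..8: function (assumed) */
         | ((uint32_t)dword << 2);       /* bits 7..2: double-word number; */
                                         /* bits 1..0 and 30..24 stay zero */
}
```

For bus 0, device 31, function 3, and double-word 0, the resulting value is 0x8000FB00. After this value is written to the configuration address port, the selected double-word can be read from the configuration data port.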
The structure of configuration header type zero is illustrated in Figure 2.8. Some of the
registers in this header must be implemented in every PCI or PCIe device, including bridges.
These mandatory registers are shown in a darker color in Figure 2.8. The mandatory
configuration header registers are described next.
Note
The PCI-e.h header file defines the configuration header type zero in a structure called
PCI_CONFIG0.
Vendor ID Register
This 16-bit register identifies the manufacturer of the device. The vendor identifier is
assigned by the PCI SIG organization. The value 0xFFFF is reserved and is returned by the
host-PCI bridge when an attempt is made to perform a read from a configuration register in a
non-existent PCI function.
Device ID Register
This 16-bit register contains an identifier assigned by the device manufacturer that
identifies the type of device. In conjunction with the Vendor ID register and possibly the
Revision ID register, the Device ID register can be used to locate a function-specific driver
for the device.
Revision ID Register
This 8-bit register contains an identifier that is assigned by the device manufacturer
and identifies the revision number of the device.
Class Code Register
The structure of this register is illustrated in Figure 2.10. It is a 24-bit register divided
into three 8-bit fields: class code (the upper byte), sub-class code (the middle byte), and
programming interface (the lower byte). This register identifies the basic function of the
device (for instance, a mass storage controller), a more specific device sub-class (such as
SATA mass storage controller), and, in some cases, a register-specific programming interface.
For many class code/sub-class code combinations, the programming interface byte
returns zero, and therefore it has no meaning. For other combinations, however, the
programming interface byte does have meaning, as it identifies the exact register set layout
of the function, which can vary from one implementation to another. For instance, there are
different types of USB controllers with the same class code and sub-class code, but with
different programming interfaces (e.g., UHCI, OHCI, EHCI, and XHCI).
Note
The PCI-e.h header file contains the currently-defined class codes, sub-class codes, and
programming interfaces, in a structure called PCI_CLASS_TABLE. This structure also
contains pointers to two descriptors (texts) that can be used for decoding the information
contained in the Class Code register: the first is a class and sub-class descriptor, and the
second is a programming interface descriptor.
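Splitting the Class Code register into its three fields can be sketched as follows. The structure and function names are hypothetical and are not the ones defined in the PCI-e.h header file. The sketch also assumes the conventional header layout, in which the Class Code register occupies the upper three bytes of the double-word that holds the Revision ID register in its lowest byte.

```c
#include <stdint.h>

typedef struct {
    uint8_t class_code;  /* upper byte: basic function of the device */
    uint8_t sub_class;   /* middle byte: device sub-class */
    uint8_t prog_if;     /* lower byte: programming interface */
} ClassCode;

/* Split the 24-bit Class Code register value into its three fields. */
ClassCode decode_class_code(uint32_t reg24)
{
    ClassCode c;
    c.class_code = (uint8_t)((reg24 >> 16) & 0xFF);
    c.sub_class  = (uint8_t)((reg24 >> 8) & 0xFF);
    c.prog_if    = (uint8_t)(reg24 & 0xFF);
    return c;
}

/* Assumption: the double-word read from the header holds the Revision ID
   in bits 7..0 and the Class Code in bits 31..8. */
uint32_t class_code_from_dword(uint32_t dword)
{
    return dword >> 8;
}
```

For example, the value 0x0C0330 decodes to class code 0x0C, sub-class code 0x03, and programming interface 0x30, the combination commonly assigned to XHCI USB controllers.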
Command Register
This 16-bit register provides basic control over the device’s ability to perform PCI or
PCIe transactions. It contains bits that enable or disable the I/O address space decoder,
the memory address space decoder, the function’s ability to issue memory access requests or
I/O requests, the reporting of errors detected by the function, and the function’s ability to
generate INTx interrupt messages. The Command register is not described in detail in this
laboratory work.
Status Register
This 16-bit register tracks the status of events related to the PCI or PCIe bus. It
contains bits that indicate the interrupt status (whether the function has an interrupt request
outstanding), whether a parity error has been detected, or whether a transaction has been
aborted by the target or the initiator device. The Status register is not described in detail in
this laboratory work.
Some bits of this register have the RO (Read Only) attribute, while others have the R/W
(Read/Write) attribute. A particular feature of the bits that can be written is that they can be
cleared, but not set. A bit can be cleared by writing a one to it; this attribute is denoted as
RW1C (Read/Write 1 to Clear). This method was chosen to simplify programming. After
reading the status and identifying the error bits that are set, the programmer can clear these
bits by writing the value that was read back to the register.
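The RW1C behavior can be illustrated with a small software model of such a register. This is only a model for reasoning about the effect of a write, not code that accesses the hardware; the function name and the mask parameter (which marks the bits that have the RW1C attribute) are hypothetical.

```c
#include <stdint.h>

/* Model of a write to a 16-bit register whose writable bits are RW1C.
   current:   contents of the register before the write
   written:   value written by software
   rw1c_mask: bits that have the RW1C attribute (all other bits are read-only)
   Returns the contents of the register after the write. */
uint16_t rw1c_write(uint16_t current, uint16_t written, uint16_t rw1c_mask)
{
    /* A one written to an RW1C bit clears it; zeros and writes to
       read-only bits leave the register unchanged. */
    return (uint16_t)(current & ~(written & rw1c_mask));
}
```

In this model, writing back the value that was just read clears exactly the RW1C bits that were set, while writing zero changes nothing, so no bit can be cleared by accident.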
Header Type Register
Bits 6..0 of this 8-bit register define the configuration header type. The following
header types are currently defined:
0: Non-bridge function;
1: PCI(X)-to-PCI(X) bridge;
2: CardBus bridge.
Bit 7 defines the device as a single-function device (if bit 7 is 0) or a multi-function
device (if bit 7 is 1). During configuration, the programmer may test the state of this bit to
determine whether there are any other functions of the device that require configuration.
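Decoding the Header Type register during enumeration can be sketched as follows; the function names are hypothetical.

```c
#include <stdint.h>

/* Bits 6..0: configuration header type (0, 1, or 2 as listed above). */
unsigned header_layout(uint8_t header_type)
{
    return header_type & 0x7Fu;
}

/* Bit 7: 1 for a multi-function device, 0 for a single-function device. */
int is_multi_function(uint8_t header_type)
{
    return (header_type & 0x80u) != 0;
}

/* Number of functions worth probing on a device, given the Header Type
   register read from its function 0. */
unsigned functions_to_probe(uint8_t function0_header_type)
{
    return is_multi_function(function0_header_type) ? 8u : 1u;
}
```

A value of 0x80 read from function 0, for instance, denotes a multi-function device whose functions use header type 0, so functions 1 through 7 should also be checked.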
Subsystem Vendor ID and Subsystem ID Registers
The 16-bit Subsystem Vendor ID register contains an identifier assigned by the PCI
SIG organization. The 16-bit Subsystem ID register contains an identifier assigned by the
function vendor. A value of zero read from these registers indicates that there is no subsystem
vendor ID and subsystem ID associated with the function.
A function may reside on an expansion card or within an embedded device. Functions
that are designed around the same PCI, PCI-X, or PCIe core logic may have the same vendor
ID and device ID assigned by the core logic vendor. In this case, the operating system would
not be able to identify the correct driver to be used for that function. The Subsystem Vendor
ID and Subsystem ID registers are used to uniquely identify the expansion card or subsystem
that the function resides within. Consequently, the operating system can distinguish between
cards or subsystems manufactured by different vendors but designed around the same core
logic.
code of zero indicates successful completion of the test. A non-zero value represents a
function-specific error code.
For a 32-bit memory decoder, the Base Address register contains a start memory
address in the first 4 GB of the memory address space. For a 64-bit memory decoder, the Base
Address register contains a start address anywhere in the memory address space of 2^64 bytes.
In this case, the Base Address register occupies two consecutive double-words in the
configuration header space. The first double-word contains the lower 32 bits of the memory
start address and the second double-word contains the upper 32 bits of the memory start
address.
Bit 3 indicates whether the memory block is prefetchable (bit 3 = 1) or not (bit 3 = 0).
For a prefetchable memory block, it is acceptable for a bridge that resides between an initiator
and a memory target to prefetch data from memory into a buffer in order to yield better
performance.
Bits 31..7 for a 32-bit memory decoder and bits 63..7 for a 64-bit memory decoder
contain the memory base address.
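The fields of a memory Base Address register can be decoded as in the following sketch. It uses the bit positions given above (bit 3 for the prefetchable attribute, bits 31..7 or 63..7 for the base address) and additionally assumes the conventional encoding of the lower bits: bit 0 is zero for a memory decoder, and bits 2..1 are 00 for a 32-bit decoder and 10 for a 64-bit decoder. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int      is_64bit;      /* 1 if the decoder occupies two double-words */
    int      prefetchable;  /* state of bit 3 */
    uint64_t base;          /* memory base address */
} MemBar;

/* Decode a memory Base Address register.
   low:  double-word of the register itself
   high: next double-word of the header (used only for a 64-bit decoder) */
MemBar decode_mem_bar(uint32_t low, uint32_t high)
{
    MemBar bar;

    assert((low & 0x1u) == 0);                   /* bit 0 = 0: memory decoder */
    bar.is_64bit     = ((low >> 1) & 0x3u) == 0x2u;  /* assumed type encoding */
    bar.prefetchable = (int)((low >> 3) & 0x1u);
    bar.base         = low & 0xFFFFFF80u;        /* bits 31..7 */
    if (bar.is_64bit)
        bar.base |= (uint64_t)high << 32;        /* bits 63..32 */
    return bar;
}
```

When a 64-bit decoder is found, the following double-word belongs to the same register, so the scan of the remaining Base Address registers should skip it.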
For each memory Base Address register, the configuration software should determine
whether the register is implemented, what is the size of the register (32 bits or 64 bits), and
what is the size of the memory space corresponding to the register. The size of the memory
space can be determined using the following procedure:
1. Read the contents of the Base Address register into a temporary variable.
2. Write the value consisting of all one bits to the Base Address register.
3. Read back the contents of the Base Address register and then restore its contents from
the temporary variable. If the value read is zero, it indicates that the Base Address
register is not implemented and the procedure is completed.
4. If the value read is not zero, scan the bits of the value upwards starting with the least
significant bit of the Base Address field (bit 7) until the first bit set to one is found.
The binary-weighted value of the least significant bit set to one represents the size of
the memory space associated with the Base Address register.
As an example, assume that the value 0xFFFFFFFF is written to a Base Address
register and the value read back from the register is 0xFFF00000. As the value read back is
not zero, the register is implemented. Since bit 0 is zero and bits 2..1 are 00, the register is a
32-bit memory address decoder. Bit 20 is the first bit set to one in the Base Address field. The
binary-weighted value of this bit is 2^20, which means that the size of the memory space
corresponding to this register is 1 MB.
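Step 4 of the procedure above, applied to the value read back in step 3, can be sketched as a pure function. The hardware accesses of steps 1 to 3 are left out, and the function handles a single double-word only (for a 64-bit decoder whose lower double-word reads back as zero in the Base Address field, the scan would have to continue in the upper double-word). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the space requested by a Base Address register, computed from
   the value read back after writing all ones to the register.
   first_bit: least significant bit of the Base Address field
              (7 for the memory decoder described above).
   Returns 0 if the register is not implemented. */
uint32_t bar_size_from_readback(uint32_t readback, unsigned first_bit)
{
    unsigned bit;

    assert(first_bit < 32);
    if (readback == 0)
        return 0;                        /* register not implemented */
    for (bit = first_bit; bit < 32; bit++) {
        if (readback & (1u << bit))
            return 1u << bit;            /* weight of the first bit set to one */
    }
    return 0;                            /* no bit set in the Base Address field */
}
```

For the value 0xFFF00000 from the example, the function returns 0x100000, that is, 1 MB.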
Structure of an I/O Base Address Register
An I/O Base Address register has a size of 32 bits. Figure 2.13 illustrates the structure
of an I/O Base Address register. Bit 0 is one and indicates an I/O address decoder. Bit 1 is
reserved and always returns zero when read. Bits 31..2 represent the Base Address field.
The upper 16 bits of an I/O Base Address register may be hardwired to zero by the
manufacturer when a function is designed specifically for a PC-compatible computer, since
Intel x86 processors are limited to an I/O space of 64 KB.
The size of the I/O space corresponding to an I/O Base Address register can be
determined using the same procedure used for determining the size of the memory space. The
only difference is that the least significant bit of the Base Address field is bit 2 instead of
bit 7. As a second example, assume that the value 0xFFFFFFFF is written to a Base Address
register and the value read back is 0xFFFFFF01. Bit 0 is one, indicating that the register is an
I/O address decoder. Scanning upwards starting with bit 2, bit 8 is the first bit set to one in the
Base Address field. The binary-weighted value of this bit is 2^8, which means that the size of
the I/O space corresponding to this register is 256 bytes.
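For an I/O Base Address register, the decoding and the size computation can be written compactly with bit masks, as in the sketch below. Instead of scanning bit by bit, it isolates the lowest bit set to one in the Base Address field with the expression x & (~x + 1), which gives the same result. The function names are hypothetical.

```c
#include <stdint.h>

/* Bit 0 = 1 indicates an I/O address decoder. */
int is_io_bar(uint32_t value)
{
    return (value & 0x1u) != 0;
}

/* I/O base address: bits 31..2 of the register. */
uint32_t io_bar_base(uint32_t value)
{
    return value & 0xFFFFFFFCu;
}

/* Size of the I/O space, from the value read back after writing all ones.
   Returns 0 if no bit is set in the Base Address field. */
uint32_t io_bar_size(uint32_t readback)
{
    uint32_t field = readback & 0xFFFFFFFCu;  /* keep bits 31..2 */
    return field & (~field + 1u);             /* lowest bit set to one */
}
```

For the value 0xFFFFFF01 from the example, io_bar_size returns 256. The result is the same when the upper 16 bits are hardwired to zero and the value read back is 0x0000FF01.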
2.7. Applications
2.7.1. Answer the following questions:
a. What are the improvements introduced by the PCIe bus compared to the previous
PCI and PCI-X buses?
b. What are the main components of the PCIe bus topology?
c. What are the methods that can be used by PCIe devices for interrupt signaling?
d. What are the operational parameters determined during link initialization and
training?
e. What are the configuration registers that have to be used to uniquely identify a
PCIe function?
2.7.2. Create a Windows application for identifying each PCIe device in the computer.
As a model for the Windows application, use the AppScroll-e application, whose source files
are available on the laboratory web page in the AppScroll-e.zip archive. Perform the following
operations to create the application project:
1. In the Microsoft Visual Studio programming environment, create a new project by
selecting General > Empty Project in the New Project dialog window. Deselect the
Create directory for solution option to avoid creating another folder for the solution.
2. Copy the files contained in the AppScroll-e.zip archive to the project folder and add
these files to the project.
3. Change the active solution platform to x64.
4. Copy to the project folder the Hw.h and Hw64.lib files from the folder of a previously
created project. Copy to the project folder the PCI-e.h file, available on the laboratory
page in the PCI-e.zip archive.
5. Add to the project the Hw.h and PCI-e.h files.
6. Specify the Hw64.lib file as an additional dependency for the linker.
7. Open the AppScroll-e.cpp source file and add a #include directive to include the
PCI-e.h header file.
8. Select Build > Build Solution and make sure that the application builds without
errors.
In the AppScroll-e.cpp source file, first write a function that returns a pointer to a PCIe
function’s configuration header using the PCIe enhanced configuration mechanism. The
function has as input parameters the bus number, device number, and PCIe function number,
and it returns a pointer to a PCI_CONFIG0 structure containing the PCIe function’s
configuration header. The enhanced configuration mechanism is described in Section 2.6.3.
Next, use this function to search for PCIe devices on each bus between 0 and 15, for each
device (0..31) and for each function (0..7) of a device. For each existing PCIe device, the
following information should be displayed (on separate lines):
Bus number, device number, function number;
Class code, sub-class code, programming interface, subsystem vendor ID, subsystem ID,
class/sub-class descriptor, programming interface descriptor.
Use the structures defined in the PCI-e.h header file. For displaying the
class/sub-class descriptor and the programming interface descriptor, search in the
PciClassTable array using the class code, the sub-class code, and the programming interface
as search keys.
Notes
If the Vendor ID register of a PCIe function returns the value 0xFFFF when read, it
means that the function does not exist.
The configuration registers of a PCIe function should not be accessed directly, but rather
via the MarvinHw driver. For example, assuming that pRegPci is a pointer to a
function’s configuration header, the Vendor ID register can be read into the wVendorID
variable as follows:
wVendorID = _inmw((DWORD_PTR)&pRegPci->VendorID);
2.7.3. Extend Application 2.7.2 to display additional information about the existing
PCIe devices in the computer. The additional information that should be displayed is the
following:
Vendor ID, vendor descriptor;
Device ID, chip descriptor.
Use the PCI-vendor-dev.h header file, available on the laboratory web page in the
PCI-e.zip archive. For displaying the vendor descriptor, search in the PciVenTable array
using the vendor ID as search key and use the CHAR *VenFull member of the
PCI_VENTABLE structure. For displaying the chip descriptor, search in the PciDevTable
array using the vendor ID and the device ID as search keys and use the CHAR *ChipDesc
member of the PCI_DEVTABLE structure.
Notes
The number of entries in the PciVenTable array is defined as PCI_VENTABLE_LEN.
The number of entries in the PciDevTable array is defined as PCI_DEVTABLE_LEN.
2.7.4. Create a Windows application for displaying the contents of a PCIe function’s
configuration header. Use the PCI-compatible configuration mechanism for accessing the
configuration space (Section 2.6.2). First, write a function that reads the contents of a single
double-word from a PCIe function’s configuration header using the PCI-compatible
configuration mechanism. The function has the following input parameters: bus number; device
number; PCIe function number; double-word number. The function returns the contents of the
specified double-word. Then, use this function to read and display the contents of the first 16
double-words from the configuration space of the PCIe function defined by bus number 0,
device 31, and function 3 (for the current Intel chipsets, this function represents the SMBus
controller). Display on each line a double-word number and its contents in hexadecimal.
2.7.5. Create a Windows application for identifying the mass storage controller in the
computer (with a class code of 0x01) and displaying the contents of its implemented base
address registers. Search for the mass storage controller on each bus between 0 and 15, for
each device (0..31) and for each PCIe function (0..7) of a device. Use the PCIe enhanced
configuration mechanism for accessing the configuration space. After identifying the
controller, display its bus number, device number, and function number. Then, for each of the
controller’s six base address registers, perform the following operations:
Determine whether the register is implemented (as described in Section 2.6.5);
If the register is implemented, display its type (memory or I/O decoder);
If the register is a memory decoder, display its size (32 bits or 64 bits) and the
corresponding memory base address;
If the register is an I/O decoder, display the corresponding I/O base address.
2.7.6. Extend Application 2.7.5 to display the size of the memory space or I/O space
for each implemented base address register. Use the procedure described in Section 2.6.5 to
determine the size of the memory space or I/O space corresponding to a register.