Creating a reusable FPGA programming model architecture for 5G layer 1

Henry Räbinä

School of Electrical Engineering

Thesis submitted for examination for the degree of Master of Science in Technology.
Espoo 24.7.2019

Supervisor

Prof. Kari Halonen

Advisor

M.Sc. Antti Kunnas


Copyright © 2019 Henry Räbinä
Aalto University, P.O. BOX 11000, 00076 AALTO
www.aalto.fi
Abstract of the master’s thesis

Author Henry Räbinä

Title Creating a reusable FPGA programming model architecture for 5G layer 1

Degree programme Master’s Programme in Electronics and Nanotechnology

Major Micro- and Nanoelectronic Circuit Design Code of major ELEC3036

Supervisor Prof. Kari Halonen

Advisor M.Sc. Antti Kunnas

Date 24.7.2019 Number of pages 8+67 Language English

Abstract
This master’s thesis assesses the problems related to the usage of field-programmable
gate arrays (FPGAs) in team-based embedded system development, in the
context of 5G layer 1 (5G L1). Due to the high potential for performance and
efficiency improvements, considerable effort has recently been put into integrating
FPGAs into traditional software development. Several tools have been developed
for this purpose, but they all share the common issue of adding overhead to the
program, which cannot be tolerated in 5G L1 due to strict real-time requirements.

Due to the overhead issues, the interface between the software and the
FPGA hardware in 5G L1 is implemented as a direct memory mapping.
However, the FPGA designs include multiple design variants, each containing
parts that are common to every variant, and parts that are specific to only
one variant. Because developing the memory mapped interface for each
separate variant would require excessive software branching, the concept of
a unified programming model (UPM) was developed to reduce the need for
branching. The UPM requires the common interfaces of each variant to be the same.

Although the UPM makes the 5G L1 software development simpler, the
creation and maintenance of the UPM is difficult and requires substantial
synchronization between the hardware teams. Currently, the communication
between the teams lacks proper organization, and therefore it is difficult to keep the
interfaces of the common parts of each variant similar. This thesis strives to solve
this problem by creating a simple and organized method as a proof-of-concept for
sharing the modifications of the memory mapped interface between the hardware
design teams. The method is based on a single “top-level” file that is used to store
the descriptions of every hardware interface of each design variant. The analysis of
the method shows that if used correctly, it enhances the communication between
the teams, and ensures that every released FPGA bitstream complies with the UPM.
Keywords 5G, Embedded system, FPGA, Reconfigurable hardware
Aalto-yliopisto, PL 11000, 00076 AALTO
www.aalto.fi
Diplomityön tiivistelmä

Tekijä Henry Räbinä

Työn nimi Uudelleenkäytettävä FPGA:n ohjelmointimalli 5G verkon 1. kerrokselle

Koulutusohjelma Elektroniikka ja nanoteknologia

Pääaine Mikro- ja nanoelektroniikkasuunnittelu Pääaineen koodi ELEC3036

Työn valvoja Prof. Kari Halonen

Työn ohjaaja DI Antti Kunnas

Päivämäärä 24.7.2019 Sivumäärä 8+67 Kieli Englanti

Tiivistelmä
Tässä diplomityössä tarkastellaan kenttäohjelmoitavien porttimatriisien (FPGA)
käyttöä ja niihin liittyviä ongelmia tiimipohjaisessa sulautettujen järjestelmien
kehityksessä 5G verkkojen 1. kerroksen kontekstissa. FPGA-piirit tarjoavat paljon
mahdollisuuksia sovellusten suorituskyvyn ja energiatehokkuuden parantamiseksi,
mistä johtuen niiden sisällyttämistä perinteiseen ohjelmistokehitykseen on tutkittu
merkittävästi viime aikoina. Tähän tarkoitukseen on kehitetty useita ohjelmistoja,
mutta ne eivät sovellu käytettäväksi 5G-kehityksessä, koska niiden aiheuttama
ylimääräinen viive on liian suuri 5G:n reaaliaikavaatimusten puitteissa.

Edellä mainituista viiveongelmista johtuen ohjelmiston ja FPGA:lle toteutetun
laitteiston välinen rajapinta on toteutettu suoraan muistiosoitteiden avulla.
Tämä ei kuitenkaan ole täysin ongelmatonta, sillä 5G:n kehityksessä käytetään
useita FPGA-toteutuksia, joista jokaisessa on sekä kaikille toteutuksille yhteisiä
osioita, että vain yhdessä toteutuksessa käytettyjä osioita. Jotta vältyttäisiin eri
varianttien vaatimien rajapintojen eroista seuraavalta ohjelmiston haaroittamiselta,
on kehitetty yhtenäistetty ohjelmointimalli (UPM), joka vaatii, että kaikkien
FPGA-toteutusten rajapintojen yhteisten osioiden tulee olla samanlaiset.

UPM:n luominen ja ylläpitäminen on kuitenkin haastavaa ja se vaatii
FPGA:n kehitystiimeiltä merkittävää keskinäistä synkronointia. Keskeisin ongelma
on, että nykyinen kommunikointi ei ole riittävän järjestelmällistä rajapintojen
samanlaisuuden takaamiseksi. Tämä diplomityö pyrkii ratkaisemaan kyseisen
ongelman toteuttamalla soveltuvuusselvityksen yksinkertaisesta ja
järjestelmällisestä menetelmästä, jonka avulla tieto rajapinnan muutoksista jaetaan eri tiimien
kesken. Menetelmän perusidea on, että kaikki tieto eri osioiden rajapinnoista
on tallennettu yhteen tiedostoon, jota jokainen tiimi voi muokata. Menetelmän
analyysi osoittaa, että oikein käytettynä se parantaa tiimien välistä synkronointia
ja takaa, että kehitetty FPGA-toteutus on yhteensopiva UPM:n kanssa.
Avainsanat 5G, Sulautettu järjestelmä, FPGA, Uudelleenohjelmoitava laitteisto

Preface
This thesis was written for the Master’s degree in the Micro- and Nanoelectronic
Circuit Design major at Aalto University. The research work of this thesis was
conducted at Nokia Solutions and Networks (NSN).

I want to express my gratitude to my advisor Antti Kunnas and to my supervisor
Kari Halonen for their valuable help and guidance throughout this thesis work.
I would also like to thank my line manager Matti Rintamäki for providing me
the opportunity to do this thesis for NSN. Finally, I want to thank Tomi Alanen
and Petri Kukkala in the Nokia office at Tampere for sharing their knowledge and
expertise without which this thesis would not have been completed.

Otaniemi, 24.7.2019

Henry Räbinä

Contents
Abstract iii

Abstract (in Finnish) iv

Preface v

Contents vi

Abbreviations vii

1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Goals and contribution . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Theory 5
2.1 Cellular networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 5G technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.1 Overall architecture and protocol stack . . . . . . . . . . . . . 8
2.2.2 5G layer 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 FPGAs and SoCs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 Operation of FPGAs . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.2 Hardware design using FPGAs . . . . . . . . . . . . . . . . . . 17
2.4 Co-operation between hardware and software . . . . . . . . . . . . . . 19
2.5 Software development for reconfigurable hardware . . . . . . . . . . . 23
2.6 General guidelines for team-based FPGA development . . . . . . . . 31

3 Solution development 35
3.1 Current FPGA design workflow . . . . . . . . . . . . . . . . . . . . . 35
3.2 Proposed improvements and motivation . . . . . . . . . . . . . . . . . 43
3.2.1 Improvements to the current workflow . . . . . . . . . . . . . 43
3.2.2 The new approach . . . . . . . . . . . . . . . . . . . . . . . . 44

4 Measurements and discussion 55


4.1 Evaluation of the improvements . . . . . . . . . . . . . . . . . . . . . 55
4.2 Discussion of the proof-of-concept method . . . . . . . . . . . . . . . 56

5 Conclusion and future improvements 60

References 63

Abbreviations
1G 1st Generation
2G 2nd Generation
3G 3rd Generation
3GPP 3rd Generation Partnership Project
4G 4th Generation
5G 5th Generation
5GC 5G core network
ALM Adaptive logic module
AMF Access and mobility management function
API Application programming interface
ARM Advanced reduced instruction set computer machine
ASIC Application specific integrated circuit
AXI Advanced eXtensible Interface
CLB Configurable logic block
CP Control plane
CPU Central processing unit
DL Downlink
DMA Direct memory access
DSP Digital signal processing
EDA Electronic design automation
FDD Frequency-division duplexing
FIFO First-in first-out
FPGA Field-programmable gate array
gNB Next generation Node B
GPU Graphics processing unit
HDL Hardware description language
HLS High-level synthesis
HPS Hard processor system
HW Hardware
IC Integrated circuit
IEEE Institute of Electrical and Electronics Engineers
IMT International Mobile Telecommunications
I/O Input/output
IoT Internet of things
IP Intellectual property
ITU-R International telecommunication union radiocommunication sector
JSON JavaScript Object Notation
JTAG Joint Test Action Group
L1 Layer 1
LTE Long term evolution
LUT Lookup table
MAC Medium access control protocol
MM Memory-mapped
NAS Non-access stratum protocol
NG-C Next generation control plane interface
ng-eNB Next generation evolved Node B
NG-RAN Next generation radio access network
NG-U Next generation user plane interface
NR New radio
NSN Nokia Solutions and Networks
OFDM Orthogonal frequency division multiplexing
OpenCL Open computing language
PBCH Physical broadcast channel
PCIe Peripheral component interconnect express
PDCCH Physical downlink control channel
PDCP Packet data convergence protocol
PDSCH Physical downlink shared channel
PHY Physical layer protocol
PLD Programmable logic device
PM Programming model
PRACH Physical random access channel
PUCCH Physical uplink control channel
PUSCH Physical uplink shared channel
RAM Random-access memory
RAN Radio access network
RIFFA Reusable integration framework for FPGA accelerators
RLC Radio link control protocol
RO Read-only
RRC Radio resource control protocol
RTL Register transfer level
RW Read-write
RX Receive
SDAP Service data adaptation protocol
SDK Software development kit
SMF Session management function
SoC System-on-Chip
SRAM Static random-access memory
SW Software
TDD Time-division duplexing
TX Transmit
UE User equipment
UL Uplink
UP User plane
UPF User plane function
UPM Unified programming model
VHDL Very high speed integrated circuit hardware description language
WO Write-only
XML Extensible markup language
1 Introduction
1.1 Background
The amount of mobile data used by consumers has been growing rapidly in
recent years, and the growth is expected to continue. It is estimated that mobile
data usage will increase approximately tenfold from 2016 to 2022, and that yearly
mobile data traffic will reach almost one zettabyte by 2022. In addition, by 2022
there are expected to be more than 12 billion devices connected to mobile networks,
with smartphones accounting for over 90 % of mobile data usage. [1, 2] The current
generation networks, most notably Long Term Evolution (LTE, also known as 4G),
cannot fulfill these future requirements of massive data usage, and thus there is a
need for higher-capacity, lower-latency mobile networks. Currently the objective is
that the 5G technology will satisfy these requirements. [3, 4]

Because the requirements for 5G networks are strict, many parts of the 5G technology
are preferably implemented in hardware (HW) instead of software (SW). The strict
latency requirements in particular favor a HW implementation, because the execution
time of computation is lower and significantly more stable on HW than in SW. In addition,
because the specifications for 5G networks are not complete yet, it is essential that the
features of 5G can be fluently modified after the initial implementation. Therefore,
reconfigurable HW is a good choice for 5G implementation, offering a compromise
between efficiency and flexibility. FPGAs are an example of reconfigurable HW that
could be used in this application.

In addition to the design implemented on the reconfigurable HW, SW for
controlling the HW is also required. The purpose of the SW is essentially to write
data to and read data from the registers included in the HW. As this overall process
involves both SW and HW development, it is referred to as embedded system
development. An embedded system is a system that includes one or more processors
that have a significant role in it, but that is not called a computer [5]. Properties
common to many embedded systems include the following: [6]

• Contains a processing unit, for example a microprocessor.

• Is designed for a certain application instead of being general-purpose.

• Includes only a simple user interface, or does not have a user interface at all.

• Often has resource limitations.

• Is often used in applications that do not require interaction with humans.



1.2 Motivation
This thesis was conducted for Nokia Solutions and Networks (referred to as Nokia in
the rest of this thesis) during the first half of 2019. Nokia develops 5G HW
and SW using the configuration described previously in section 1.1, and uses FPGAs
as the reconfigurable HW for implementing 5G. The FPGA HW development at
Nokia is a broad process involving several HW design teams. In addition to develop-
ing the required HW for 5G implementation, the HW teams must provide the SW
teams with a programming model (PM) that can be used to write SW for controlling
the 5G HW implemented on the FPGAs. The PM is essentially a description of the
registers of the FPGA design that are interfacing towards the 5G L1 SW. Thus, the
PM can be considered as an abstraction of the FPGA design that is seen by the
L1 SW. The SW teams can then control the FPGA HW via these interfacing registers.

The FPGA designs to be used in the 5G implementation are developed at
several Nokia offices in different locations. At each site, intellectual property (IP) blocks
are implemented, and finally these IP blocks are integrated to form the top-level
design used for 5G L1. However, the 5G L1 design is further split into several variants.
Some properties of the design are common to every variant, but each variant also
contains unique functionality. These properties are implemented as common IP
blocks and variant-specific IP blocks, respectively. For example, some designs require
time-division duplexing (TDD) while some other designs require frequency-division
duplexing (FDD). The main reasons for this split are that all properties cannot
be implemented on a single FPGA design due to limited FPGA resources, and on
the other hand it is even desired to test different functionalities separately in the
development phase.

The 5G L1 SW is executed on central processing unit (CPU) cores, and as it
is used to control the L1 HW implemented on the FPGA, there must be an interface
between the cores and the FPGA design. In addition, due to strict real-time
requirements, the interface must be extremely fast. To satisfy the latency require-
ments, the interface between the CPU cores and the FPGA design is implemented
as a direct memory mapping from the FPGA interfacing registers (the PM) to
the CPU cores. However, the problem with the direct mapping is that a different
5G L1 SW must be developed for every HW design variant. This is due to the
differences in the interfacing registers of the variants. From the SW development
point of view, this means that the L1 SW should be branched into a high number of
branches. This would create severe difficulties for developing and maintaining the SW.
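The direct mapping itself can be sketched as follows. On the real target the SW would map the physical FPGA register window (for example through /dev/mem at a platform-specific base address); this sketch substitutes an anonymous mapping so the idea is runnable anywhere, and the offsets are invented:

```python
import mmap
import struct

REG_WINDOW_SIZE = 0x1000  # assumed size of the FPGA register window

# On the target this would be an mmap of the physical register window
# (e.g. via /dev/mem at the FPGA base address); an anonymous mapping
# stands in for it here so the sketch runs on any host.
window = mmap.mmap(-1, REG_WINDOW_SIZE)

def write_reg(offset: int, value: int) -> None:
    """32-bit little-endian register write at a byte offset."""
    struct.pack_into("<I", window, offset, value)

def read_reg(offset: int) -> int:
    """32-bit little-endian register read at a byte offset."""
    return struct.unpack_from("<I", window, offset)[0]

CONTROL = 0x0                    # hypothetical control register offset
write_reg(CONTROL, 0x1)          # set a hypothetical START bit
assert read_reg(CONTROL) == 0x1  # accesses go straight to the mapped memory
```

Because reads and writes go straight to mapped memory with no runtime framework in between, this style of interface adds essentially no overhead, which is exactly why it was chosen over the FPGA integration tools mentioned above.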

As an attempt to solve this problem, the HW teams have begun to produce a
UPM that contains all the necessary information to be used with every FPGA design
variant. This information consists of the register banks of every common IP block
and every variant-specific IP block of each variant. The UPM itself is an extensible
markup language (XML) file. The main idea of the UPM is that it ensures that the
parts of the FPGA design variants that are common to each variant have the same
register map towards the L1 SW. Therefore, if the L1 SW is developed according to
the UPM, the same SW can be used to control every FPGA design variant. This
removes the need for L1 SW branching.
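The thesis does not reproduce the UPM schema, so the element and attribute names in the sketch below are invented; it only illustrates the idea of one XML file describing both common and variant-specific register banks, and how SW-side tooling could read it:

```python
import xml.etree.ElementTree as ET

# A guessed, minimal shape for a UPM-style XML file. The element and
# attribute names are illustrative assumptions, not the real Nokia schema.
UPM_XML = """
<upm>
  <ip name="dma_engine" scope="common">
    <register name="control" offset="0x0" access="RW"/>
    <register name="status"  offset="0x4" access="RO"/>
  </ip>
  <ip name="tdd_ctrl" scope="variant" variant="tdd">
    <register name="slot_cfg" offset="0x100" access="RW"/>
  </ip>
</upm>
"""

root = ET.fromstring(UPM_XML)
# 'common' IPs must present an identical register map in every design
# variant, so SW built against them needs no per-variant branching.
common = [ip.get("name") for ip in root.findall("ip")
          if ip.get("scope") == "common"]
print(common)  # → ['dma_engine']
```

A single machine-readable file like this is what lets both the SW build and the release checks verify that every variant still honours the common register map.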

The UPM is currently in use and it works from the SW viewpoint. However,
from the HW viewpoint, the process of creating the UPM causes problems. The
main issue is that to keep the register banks of the common IPs of each variant
similar, much synchronization between the different teams in the HW development
process is required. In addition, there are certain phases in the development process
that require unnecessary manual work, which is time-consuming for the busy HW
teams. Thus, it is desirable to improve the FPGA development process such
that creating and maintaining the UPM becomes simpler.

1.3 Goals and contribution


The goal of this thesis is to improve the 5G L1 HW development process at Nokia by
improving the workflow such that it would be more automated and effortless for the
HW developers. The main objective is to enhance the process such that two goals
are achieved:

1. The UPM is always synchronized with the FPGA design variants. This is the
primary goal because the errors in the UPM transfer directly to problems in
the L1 SW development.

2. The creation and maintenance of the UPM is as simple as possible for the HW
developers. This is the secondary goal and even though it is important, the
primary goal should not be compromised because of this goal.

This objective is further divided into two parts. In the first part, minor modifications
to the FPGA development workflow are implemented. These modifications consist
of developing new scripts for automating a few manual parts of the process. The
goal is that these improvements are taken into use quickly after they have been
verified to work properly. In the second part, a significantly modified approach for
the HW development is proposed as a proof-of-concept. It is not assumed that this
process would be taken into use in the near future, but the objective is to examine
the advantages and disadvantages of this process compared to the old process.

In addition to the practical work, the theoretical part of this thesis addresses the
topics of HW/SW co-operation, and SW development for reconfigurable HW. The
general requirements for interconnecting external HW to SW running on a host
CPU are examined, and two SW tools for integrating FPGAs into traditional SW
development are discussed in some detail. Also, the problems of these tools with
respect to the FPGA development at Nokia are studied.

1.4 Structure
In the second chapter of this thesis, relevant theory and background related to the
topic of the thesis are discussed. These include mobile networks, 5G technology,
FPGA architecture and operation, and SW development for reconfigurable HW.
The main purpose of this chapter is to give the reader the relevant background
information, and to set a proper context for the thesis topic. The third chapter first
discusses the old FPGA development process in detail, and afterwards explains the
proposed improvements to the process. The minor enhancements to the old process
and the major modification are discussed separately.

The fourth chapter of this thesis analyses the developed improvements, and dis-
cusses their feasibility. First, measurement results related to the minor improvements
of the old process are presented and analyzed. Afterwards, the proof-of-concept
method is further discussed, including for example the pros and cons of the method.
In addition, a few alternative methods to improve the development process are briefly
presented. The fifth and final chapter of the thesis concludes the results of the
research work, evaluates their quality, and provides ideas for further research and
development.

2 Theory
This chapter presents the basic theoretical concepts that are relevant for this thesis.
Several topics are addressed that provide the reader with the necessary background
information to set this thesis to the correct context. These topics include cellular
networks, 5G technology, FPGAs, system-on-chips (SoCs) and co-operation between
HW and SW. Specifically related to the last topic, the concept of developing SW for
reconfigurable HW is discussed, and a few tools related to this are presented. Finally,
the general recommendations for using FPGAs in embedded system development are
discussed.

2.1 Cellular networks


A cellular network (also known as a mobile network) is a network of base stations
that provide services for devices that are connected to the base stations [7]. A mobile
network can be divided into three planes: the user plane (UP), which is responsible
for carrying and controlling the network user traffic, the control plane (CP), which
is responsible for carrying the signaling traffic (for example the connection and call
control protocols), and the management plane, which is responsible for network
operations and administration [8].
are usually called user equipment (UEs), and they can be wirelessly connected to
each other via the base stations. Nowadays, the UEs usually consist of mobile devices
such as smartphones, tablets or laptop computers. However, in the future the variety
of different connected devices will increase rapidly due to the internet of things (IoT).
The connected devices could include autonomous vehicles, household machines and
so on. [3]

Each base station in the cellular network provides only limited power and covers a
limited area, which is called a cell. The base stations are geographically located such
that there are no uncovered spots in the coverage area. When a user is in a specific
cell, the user is connected to the corresponding base station. To avoid interference,
cells that are near each other use different frequencies. However, two cells
that are far enough from each other can use the same frequency because the cells do
to utilize the scarce frequency resources. [7] Figure 1 illustrates a cellular network,
and it also explains the concepts of cells and base stations. The cell labeled with
number 2 uses a different frequency than the two cells labeled with number 1.
Figure 1: An illustration of a cellular network. Modified based on [7].

The technologies used in cellular networks in different eras are usually separated by
defining generations of mobile communication technologies. All of these generations
have provided a significant improvement in the network capability compared to the
previous generations. So far there have been four generations of mobile networks
that are commonly called 1G, 2G, 3G and 4G. 1G was the first generation of mobile
communications, used in the 1980s, and it carried analog radio signals. It
was later replaced by 2G, which utilized digital radio signals in the network. The 3G
network was the first mobile network that could transfer broadband data, which was
an essential step in the breakthrough of smartphones. 4G technology improved the
data connections provided by 3G by making them faster, enabling larger bandwidth,
and reducing delays, among other improvements. [7, 9] To create and maintain
the technical specifications and technical reports for these mobile communication
standards, an organization known as the 3rd Generation Partnership Project (3GPP)
was founded in 1998. Currently, the scope of 3GPP covers 3G, 4G, and 5G networks.
[10]

The next generation of mobile networks is known as 5G. It is expected to answer the
challenge of continuously increasing consumption of mobile data by providing very
high speed and capacity, broad bandwidth, and at the same time near-zero latency.
In addition to being just an improvement to the 4G technology, 5G is assumed to
provide enough capacity for IoT. Therefore, new technologies and innovations are
required for implementing 5G networks that could fulfill these requirements. [3, 4]

2.2 5G technology
As mentioned in the previous section, 5G means the fifth generation of mobile
communication technology, and it is specified in the 3GPP standard. The first 5G
specifications were delivered by 3GPP in release 15, even though 5G requirements
were discussed already in release 14. [11] The radiocommunication sector of the
International Telecommunication Union (ITU-R) (an agency of the United Nations
for coordinating information and communication technologies) has defined three
usage scenarios for 5G, each addressing different characteristics. [3, 12, 13]

• Enhanced mobile broadband: To support the increasing demand for mobile
broadband in the future, this scenario strives to provide broader bandwidth for
data transfer and to simultaneously decrease latency. This enables improved
performance for applications such as augmented reality or virtual reality. [3]

• Massive machine type communications: This usage scenario addresses the
situation where a vast number of devices, each consuming only a relatively
small amount of data, are connected to the network simultaneously. An example
of this could be the IoT. This usage scenario is highly relevant for 5G, because
it is predicted that there will be billions of devices connected to the internet in
the future. [3, 13]

• Ultra-reliable and low latency communications: The applications in this usage
scenario have extremely strict requirements for reliability, latency and
availability. Such applications could include for example self-driving vehicles, remote
medical operations and remote control of industrial manufacturing. Many of
these applications could require reliability of more than 99.99 %. [3, 13]

To be able to support the above three usage scenarios and the continuous growth
of mobile data usage in the future, 5G technology must satisfy strict technical
requirements. The ITU-R has created the International mobile communications
(IMT)-2020 standard, the purpose of which is to provide international specifications
for 5G. This standard has defined eight key capabilities and their values for 5G,
which are shown in the following table: [3]

Table 1: Key specifications for 5G.

Capability Value
Peak data rate 20 Gbps
User experienced data rate 0.1–1 Gbps
Latency 1 ms over-the-air
Mobility 500 km/h
Connection density 10⁶ devices/km²
Energy efficiency 100 times compared with IMT-Advanced
Spectrum efficiency 3–5 times compared with IMT-Advanced
Area traffic capacity 10 Mbit/s/m²

The IMT-Advanced standard in the above table refers to the specifications for
4G networks provided by the ITU-R. Based on Table 1, it is evident that these
specifications cannot be satisfied with current generation mobile networks.

2.2.1 Overall architecture and protocol stack


The simplified architecture of the 5G network is illustrated in Figure 2. The leftmost
dashed box in the figure illustrates the next generation radio access network
(NG-RAN), and the rightmost dashed box illustrates the 5G core network (5GC). The
RAN nodes (corresponding to the base stations) can be divided into two categories
[14]:
• Next generation Node B (gNB) nodes, which provide 5G new radio (NR) UP
and CP termination towards the UE.
• Next generation evolved Node B (ng-eNB) nodes, which provide evolved universal
terrestrial radio access UP and CP termination towards the UE.

Figure 2: A simplified architecture of the 5G network. Modified based on [15].

As can be seen from Figure 2, 5GC consists of the access and mobility management
function (AMF), the session management function (SMF) and the user plane
function (UPF). The AMF provides services related to the mobility of the user. These
include for example mobility management, connection management and reachability.
The SMF provides functions for session establishment and session management.
The combined AMF and SMF correspond to the mobility management entity that
was used in the LTE technology. The UPF concentrates on data connectivity and
providing fast access to the internet. The UPF combines the services provided by
serving gateway and packet gateway used in LTE. [16] The interface between a
RAN-node and the UPF is called the next generation user plane interface (NG-U),
and the interface between a RAN-node and the AMF is called the next generation
control plane interface (NG-C). The interface between two RAN-nodes is called the
Xn-interface, which can be divided into Xn user plane interface and Xn control plane
interface. [14]

The 5G NR architecture contains several protocols for controlling the data that
is sent and received by the connected devices. In 5G, there are separate protocol
stacks for the UP and for the CP. [14] The protocol stack for UP is shown in Figure
3, and the protocol stack for CP is shown in Figure 4. As can be seen from these
figures, there are a total of seven different protocols, which will be briefly introduced
in the following paragraphs.
Figure 3: The 5G NR user plane protocol stack. Modified based on [14].

Figure 4: The 5G NR control plane protocol stack. Modified based on [14].

The physical layer (PHY) is used to transfer data to the layers above it. This data
includes information on how the data is transferred over the radio interface, and
what characteristics the data has. However, the data provided by PHY does not
contain any information about what is actually transferred. The services provided
by PHY to the higher layers are called the transport channels. There are different
channels for both downlink (DL) and uplink (UL). [14]

The medium access control (MAC) protocol offers functions for data transfer and
radio resource allocation. The MAC protocol provides services known as the logical
channels for the higher layer, and it requires the transport channels from the physical
layer. The logical channels can be divided into two categories: control channels, that
are used only for transferring the CP information, and traffic channels, that are used
only for transferring the UP information. One of the main functions of the MAC
protocol is to provide a mapping between the logical channels and transport channels.
Other services provided by MAC include for example scheduling services, priority
handling services and multiplexing/demultiplexing services. [14, 17]

The functions provided by the radio link control (RLC) protocol are performed
by several RLC entities located in the UEs and in the base stations. The
purpose of the RLC protocol is to manage the data transfer between the RLC entities
at the gNB nodes and the RLC entities at the UEs. The RLC entities at the gNB
nodes contain RLC service data units, and the entities at the UEs contain RLC
protocol data units. The RLC protocol also provides segmentation and reassembly
services. [14, 18]

In the packet data convergence protocol (PDCP), the PDCP entities are associated
either to UP or to CP. The PDCP provides services to the layers above it, which
are different in UP and CP. The services include transferring the user plane and
control plane data, integrity protection, header compression and ciphering. PDCP
requires the RLC channels that are provided to it by the RLC protocol. [19] The
service data adaptation protocol (SDAP) is the protocol that is above the PDCP in
the UP, and it is responsible for providing a mapping between a quality of service
flow and a data radio bearer, provided to it by the PDCP. The SDAP provides the
quality of service flows to the 5GC. SDAP is the highest-level protocol in the UP. [14]

The radio resource control protocol (RRC) is the protocol that is above the PDCP
in the CP. The main services that the RRC requires from the lower layers are
integrity protection, ciphering and robust delivery of sequential information. As
explained previously, these are provided by the PDCP. The services that the RRC
provides to the upper layers include for example broadcasting control information
and transferring dedicated signalling. [20] The non-access stratum (NAS) protocol is
the highest-level protocol in the CP. The main areas of service for NAS include for
example mobility management, authentication and security control. [14]

2.2.2 5G layer 1
The 5G layer 1 corresponds to the physical layer, described in the previous section.
This is the layer that is closest to the HW, and it provides information about the
data to be transferred to the upper layers. Both the DL and UL in L1 use a waveform
modulation scheme known as orthogonal frequency division multiplexing (OFDM)
with minor modifications. [14] In telecommunications, modulation means the process
of modifying the properties of some signal by mixing another signal with it.
Typically, the signal that is mixed with the original signal contains information to be
transmitted, and the original signal (often called carrier) is used for transferring the
information wirelessly. [21]

OFDM allows the simultaneous usage of several closely spaced frequency bands
(subcarriers) for data transfer such that the bands do not interfere with each other.
When several frequency bands are simultaneously used, there must usually be a
certain amount of space between the bands because when modulation is applied to a
band, it spreads out on the sides which causes different frequency bands to overlap.
Overlapping causes interference between the bands, making it difficult to separate
them in the receiver. In OFDM, this band overlapping still occurs but it does not
cause problems because the bands are orthogonal to each other. Thus, they do not
interfere and they can be separated in the radio receiver. The advantages of OFDM
include for example a highly resilient communication and a high spectral efficiency.
[22] Figure 5 illustrates the OFDM scheme.
Figure 5: An illustration of OFDM. Modified based on [22].

The physical layer downlink can roughly be divided into three channels: the physical
downlink shared channel (PDSCH), the physical downlink control channel (PDCCH)
and the physical broadcast channel (PBCH) [23, 14]. The purpose of PDSCH is
to transfer the downlink user data, UE specific higher layer information, system
information and paging. PDCCH carries downlink control information, and one
important application for it is to schedule DL transmissions on PDSCH and UL
transmissions on the equivalent uplink channel. Other applications for PDCCH
include for example giving notifications to the UEs, or switching the active
bandwidth part of a UE. PBCH is used to carry very basic system information. [14, 24, 25]

The physical layer uplink can also be roughly divided into three channels: the physical
uplink shared channel (PUSCH), the physical uplink control channel (PUCCH) and
the physical random access channel (PRACH) [23, 14]. The operation of PUSCH
can be compared to PDSCH, except that it transfers uplink data. PUCCH is used
to carry the uplink control information from the UE to the gNB. PRACH is used to
transfer a random-access preamble from the UE towards the gNB. The purpose of
this is to notify the gNB of a random-access attempt and to guide the gNB to adjust
several parameters of the UE. [14, 24, 25]

In 5G, all data transmission in both DL and UL is divided into frames, each being
10 ms in duration. A frame in telecommunication is a complete data unit that is
transmitted between two nodes in the network. A frame contains the data to be sent
combined with addressing and protocol information. Every frame is further divided
into 10 subframes, each being 1 ms in duration. A subframe can in turn be divided
into slots, which can differ in lengths depending on the subcarrier spacing. Due to the
nature of OFDM, the slot length decreases as the subcarrier spacing increases. [14, 25]
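In 5G NR this scaling is captured by the numerology parameter µ: the subcarrier spacing is 15 kHz · 2^µ, a 1 ms subframe contains 2^µ slots, and the slot duration therefore halves each time the spacing doubles. A minimal C sketch of this relation (the function names are our own):

```c
#include <assert.h>

/* 5G NR numerology parameter mu = 0, 1, 2, 3, ... */
int slots_per_subframe(int mu)        { return 1 << mu; }            /* 2^mu           */
double subcarrier_spacing_khz(int mu) { return 15.0 * (1 << mu); }   /* 15 kHz * 2^mu  */
double slot_duration_us(int mu)       { return 1000.0 / (1 << mu); } /* 1 ms / 2^mu    */
```

With µ = 3 this gives a 120 kHz spacing and a 125 µs slot, matching the figures used in the text.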

During one slot, the 5G physical layer must be able to process a varying amount of
events, and because the duration of one slot may be quite short, the available time
to process one event could be extremely short. The highest subcarrier spacing used
in 5G for data channels is 120 kHz (for the synchronization signal that is used for
initially accessing the network, it is 240 kHz), and this corresponds to a slot duration
of 125 µs. If, for example, the 5G L1 should be able to process 20 events per slot,
the average time for processing one event under these conditions would be about
6 µs. In practice, some margin should be left to guarantee that the
latency requirement is met, and thus the real available time for event processing
would be even less. This sets extremely strict real-time requirements for the 5G L1
implementation. [25] These requirements also motivate the usage of HW instead of
SW for the L1 implementation.
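The budget estimate above can be written out as a small helper function. The 20-event count and the safety margin below are illustrative assumptions only, not figures from the actual implementation:

```c
#include <assert.h>
#include <math.h>

/* Average processing time available per event within one slot, after
 * reserving a fraction of the slot as safety margin against the latency
 * requirement. All parameter values used here are examples. */
double event_budget_us(double slot_duration_us, int events_per_slot,
                       double margin_fraction)
{
    return (slot_duration_us / events_per_slot) * (1.0 - margin_fraction);
}
```

For a 125 µs slot and 20 events this yields 6.25 µs per event, and with a 20 % margin only 5 µs.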

2.3 FPGAs and SoCs


FPGAs belong to the category of programmable logic devices (PLDs). An FPGA is a
device that can be programmed to implement the functionality of basically any digital
integrated circuit (IC). [26] In practice the available HW inside the FPGA limits
the size of the digital IC that can be programmed to the FPGA. In this context, the
term “programming” requires some explanation. It does not mean the same as in
traditional SW development, where it essentially means providing instructions to the
CPU. Rather, programming an FPGA refers to the usage of SW based methods to
configure the HW inside the FPGA to operate similarly as the desired digital IC.
The file used for the configuration is usually called a bitfile or a bitstream. [26] A
more detailed explanation of programming an FPGA is presented in the upcoming
section 2.3.2 “Hardware design using FPGAs”.

The development of modern FPGAs began in the 1980s, when Xilinx introduced the
first FPGAs in 1984. Since then, the capacity and speed of FPGAs have increased
rapidly, while the energy consumption and costs have decreased. The main reason
behind this development has been the scaling of semiconductor process technology,
described by Moore's law. In addition, the development of efficient electronic design
automation (EDA) tools for FPGA development supported the success of FPGAs.
[27] As a result, FPGAs have become promising devices for several applications, and
they are widely used in microelectronics and embedded system development. Some
applications that make use of FPGAs include network equipment, automation, data
centers and application specific integrated circuit (ASIC) prototyping. [26]

A SoC is a complete, independently working system on a single chip that integrates
all components of some electronic system, for example a computer.
The components may include for example a CPU, a graphics processing unit (GPU),
an FPGA, memory management devices, radio transceivers, power management
devices and so on. Because the variety of electronic blocks that can be integrated
into a single chip is large, the design and manufacturing of a SoC may be challenging.
The essential difference between a SoC and for example a CPU is that a SoC is a
complete system that can be used for something reasonable as such, but a CPU is
only one part of a larger system and it is quite useless as an independent component.
The common usage applications for SoCs include for example mobile phones, digital
cameras, single board computers and other embedded systems. [5]

Nokia uses a large amount of HW for implementing 5G L1, but the device of
interest with respect to this thesis is the Intel Stratix 10 SX2800 SoC, which is
included in the HW. The Stratix 10 contains the required components to host
both the L1 HW and the L1 SW. The Stratix 10 has a hard processor system
(HPS) containing four advanced RISC machine (ARM) Cortex-A53 cores, and an
FPGA containing over 2.5 million logic elements. The
processor containing the ARM cores can be used with up to 1.5 GHz frequency, and
it includes for example cache memories, support for direct memory access (DMA)
and several peripheral interfaces. [28] As described in chapter 1, the L1 SW runs on
the ARM cores and the L1 HW on the FPGA can be controlled by those cores via
the interfacing registers. Figure 6 illustrates the structure and different components
of the Intel Stratix 10 device.

Figure 6: Block diagram of the Intel Stratix 10 [29].



As discussed in the previous section, the strict real-time requirements offer a major
reason for using FPGAs in the 5G implementation. One example of the problems in a SW
implementation is cache memories, which can cause vast variation in the processing
times, since the time to fetch data from the memory can vary significantly depending
on where the data is physically located [5]. For example, it is faster to fetch data
from the level 1 cache than from the main memory. HW can provide considerably
more stable processing times. In addition, implementing computations on HW is
generally faster than implementing the same computations on SW. However, the
amount of speedup is strongly dependent on the application. [30]

An alternative to using an FPGA could be to produce ASICs, but the problem
is that manufacturing separate ASICs is very time-consuming and expensive,
and they cannot be modified after manufacturing and delivery to the customer,
unlike FPGAs. This non-configurable nature of ASICs is a major drawback as the
specifications for 5G are not complete yet, and thus it would be essential that it is
relatively simple and fast to modify the HW of the 5G L1 implementation after it
has been delivered to the customer. In addition, as described in chapter 1, there are
several variants of the L1 HW design. By using FPGAs, the SW can decide on boot
which variant will be loaded to the FPGA. This would not be possible with ASICs,
and thus a separate ASIC would have to be manufactured for each variant, which
would further increase the costs. Therefore, Nokia has chosen to use FPGAs instead
of ASICs.

2.3.1 Operation of FPGAs


Essentially, an FPGA is a set of memory blocks that can be configured to store
arbitrary data and whose interconnections can be programmed by the user. At its
simplest, an FPGA consists of multiplexers, random-access memory (RAM) cells and
wiring between them. The memory is usually implemented as static RAM (SRAM),
but other memory types can also be used. SRAM is a type of RAM memory that can
store the data without a need to refresh the memory regularly. [26] In comparison,
dynamic RAM requires regular refreshing. A multiplexer is a component that has
several data inputs, one or more select inputs, and one output. By controlling the
select inputs, any data input can be produced at the output of the multiplexer [26].
The fundamentally reconfigurable nature of an FPGA is due to the reconfigurability
of the RAM memory. In other words, the developer can store arbitrary values to the
RAM cells. Thus, by connecting the programmable RAM cells to the data inputs of
a multiplexer, the user can program the multiplexer to output a desired value for
each select input.

The combination of one multiplexer and the RAM cells connected to the data
inputs of the multiplexer is called a lookup table (LUT) [26]. Furthermore, several
LUTs and flip-flops (and possibly other components) can be grouped together to form
configurable logic blocks (CLBs) that can implement more complicated logic than
single LUTs. The CLBs are connected to each other and to other parts of the FPGA
via a switch matrix. The switch matrix is formed by connecting the programmable
RAM cells to the select inputs of the multiplexers. The inputs and outputs of the
CLBs are then connected to the data inputs and outputs of the multiplexers of the
switch matrix. [26] The overall architecture of an FPGA is illustrated in Figure 7.
The CLBs in the figure are directly connected to the switch matrices even though it
is not evident from the figure.
Figure 7: The general structure of an FPGA. Modified based on [5].

However, the overall structure of modern FPGAs is not quite as simple as implied by
Figure 7. Instead, modern FPGAs include a set of ready-made building blocks that
are provided within the FPGA. These blocks may include for example memories
or memory controllers, digital signal processing (DSP) blocks, input/output (I/O)
controllers, or even CPUs. The idea of including such ready-made blocks into an
FPGA is that using these blocks in an FPGA design, instead of implementing them
on the FPGA logic, may improve the design in terms of area, performance and power. The inclusion of
such blocks may also make FPGAs more appealing and competitive devices compared
to for example DSP processors. [26]

Figure 8 further explains the concepts of the lookup table and the switch matrix.
The c-letters in the figure illustrate the programmable RAM cells and the trapezoids
illustrate the multiplexers. The two multiplexers on the left side of the figure form
the switch matrix and the rightmost multiplexer, combined with the four RAM cells
that are connected to the data inputs of the multiplexer, correspond to the LUT.
As an example of the programmability of an FPGA, let us assume that we would
like to implement a 2-input OR-gate with the configuration shown in Figure 8. To
implement the gate, the three uppermost RAM cells of the LUT (corresponding to
the multiplexer values 11, 10 and 01) should be programmed to store 1, and the final
RAM cell corresponding to 00 should be programmed to store 0. If the two select
inputs of the LUT contained, for example, bits 1 and 1, the output of the LUT
would be 1 which corresponds to the operation of the OR-gate. [26]

Figure 8: An illustration of a LUT and a switch matrix. Modified based on [26].

Even though in the previous example it was demonstrated how to implement a


2-input OR gate with a LUT, it should be noted that a single LUT can be configured
to implement more complicated logic than just single standard logic gates. As an
example, let us again consider the LUT in Figure 8 and assume that the memory cells
connected to the values 11, 00 and 01 are configured to store 1 and the cell connected
to value 10 is configured to store 0. This configuration cannot be implemented with
any single standard logic gate, but instead it would require more than one standard
logic gate to implement. For this reason, it may sometimes be difficult to
estimate how large a digital IC can be implemented on an FPGA.
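The LUT behavior discussed above can be modeled in a few lines of C. The four programmable RAM cells are packed into a 4-bit configuration word, and the select inputs pick the cell that drives the output, as the multiplexer in Figure 8 does. This is a sketch of the concept only, not of any real FPGA primitive:

```c
#include <assert.h>

/* A 2-input LUT: four 1-bit RAM cells packed into bits 0..3 of 'config'.
 * Bit i holds the output for select value i = (a << 1) | b. */
typedef struct { unsigned config; } lut2;

int lut2_eval(lut2 lut, int a, int b)
{
    return (int)((lut.config >> ((a << 1) | b)) & 1u);
}

/* OR gate: cells 11, 10 and 01 store 1, cell 00 stores 0 -> 0b1110. */
static const lut2 lut_or = { 0xEu };

/* The second example: cells 11, 01 and 00 store 1, cell 10 stores 0,
 * i.e. the function (NOT a) OR b, which would need more than one
 * standard gate -> 0b1011. */
static const lut2 lut_nota_or_b = { 0xBu };
```

For instance, `lut2_eval(lut_or, 1, 1)` returns 1 and `lut2_eval(lut_or, 0, 0)` returns 0, matching the OR-gate walkthrough in the text.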

Naturally, the logic blocks of real FPGAs are significantly more complex than
for example the one shown in Figure 8. As an example, Figure 9 shows a high-level
block diagram of one so-called adaptive logic module (ALM) of the Stratix 10
FPGA. In the Stratix 10, one CLB consists of 10 ALMs connected together. [31] In
total, the Stratix 10 contains 933 120 ALMs [28].

Figure 9: High-level diagram of an ALM of the Stratix 10 [31].

2.3.2 Hardware design using FPGAs


Creating an FPGA design refers to the process of configuring the RAM cells of the
FPGA such that it operates in the same way as the desired digital IC would. FPGA
designs are created by writing code that describes the operation of the desired digital
circuit. The programming is usually done by using some HW description language
(HDL). Two widely used HDLs are Verilog and very high speed integrated circuit HW
description language (VHDL). Even though both have been designed for the same
purpose, they have certain differences. For example, VHDL is a very strongly typed
language whereas Verilog is not. VHDL also has preferable constructs for managing
large hardware design structures. On the other hand, because VHDL is strongly
typed and the behavior of the same circuit can be modeled in several ways, VHDL
might be slightly more difficult to learn for a developer with no previous hardware
design background. [32] The written HDL code is usually referred to as register
transfer level (RTL) description of the design. RTL is a design abstraction that
is used to model synchronous digital circuits with HDLs. In RTL, the circuits are
modeled in terms of the signal flow between registers and combinational operations
that are performed on the signals. [33]

HW design using an HDL resembles traditional SW development; for example,
the same version control tools can be used in both. However, when developing
HW using an HDL the designers must take into account several HW related issues,
including for example timing constraints, and the designer must also understand
that only certain types of HDL code can be implemented in HW. In addition, HDL
development includes the usage of complicated HW design tools for converting the
HDL code into the actual IC. These issues make it difficult to embed FPGAs into
traditional SW development, even though it has been shown that the usage of FPGAs
could improve the efficiency of developed SW significantly in many applications,
for example data analysis [34]. Therefore, much effort is currently put into making
FPGAs more accessible for SW developers. [26] One method to achieve this could
be to shift to a higher abstraction level in the design by developing programming
languages for HW development that are more similar to the languages used in SW
programming. This is known as high-level synthesis (HLS). Catapult C, developed
by Mentor Graphics, is one example of an HLS tool. It uses C/C++ or SystemC as
input language, and produces RTL code as an output. [35]

The FPGA development process can roughly be divided into five steps [36]:

1. Creating the circuit description using an HDL.

2. Analyzing and elaborating the HDL code.

3. Synthesizing the elaborated design.

4. Implementing the synthesized netlist.

5. Generating the bitstream and loading it to the FPGA.

Simulating the created design is not included in the above list, but in reality it is an
extremely important step in the process and it is done in many phases of the flow.
In addition, the flow contains several checks and verifications that are not in the
list, related to for example clock skews, clock domain crossings, checks for logical
equivalence and so on. The creation of the HDL code is the only one of the five
steps that is done manually, and the rest of the steps are done automatically by
EDA tools. However, this has not always been the case. During the first years after
the invention of the FPGA, implementing the designs on FPGAs was manual work.
Afterwards, as the designs became larger and more complicated, it was necessary to
develop EDA tools for the process to keep the design effort at an acceptable level. [27]
There exist several tools for this process, provided by the FPGA vendors. The tool
used by Nokia will be addressed in chapter 3 of this thesis.

The first step after writing the HDL code is to analyze it. This means that the
code is checked for syntactic and semantic errors. A syntactic error means a
situation where the written HDL code does not conform to the syntax of the used
HDL language. A semantic error, on the other hand, refers to a situation where the
syntax of the HDL code is correct, but the written HDL code is meaningless for
the HDL compiler. If the analysis of the HDL code does not give any errors, the
code can be elaborated. In the elaboration phase, the top-level HDL design is
expanded such that all individual entities are represented by unique components,
and the interconnections between them are defined. It should be noted that after the
elaboration, the components are still represented by only generic RTL constructs. [33]

The second step in the process is to synthesize the elaborated design. The syn-
thesis tool maps the generic RTL constructs of the elaborated design into gate level
components that are included in the FPGA. These may include for example LUTs,
flip-flops and phase-locked loops. The mapping created by the synthesis tool is called
a netlist, and it will be used in the subsequent phases of the design process. This
mapping is minimized and optimized, while remaining logically equivalent to the
elaborated design. [36, 37]

The third phase of the design flow is the implementation (also known as place-
and-route or fitter). In this phase, the netlist generated by the synthesis tool will be
mapped to the HW inside the FPGA. As the name suggests, this process can be
divided into two parts: placing the design, and routing the design. Placement is the
process of deciding where the components generated by the synthesis process are
situated inside the FPGA. Routing is the process of determining the physical
connections that are used to connect the placed components together. The implementation
is a very HW-intensive and time-consuming process. Essentially, implementation
can be seen as an extremely complicated optimization problem: it strives to map
a netlist to the available HW inside the FPGA with certain boundary conditions.
These include for example the timing constraints. [36, 37]

The mapping generated by the place-and-route tool is not yet compatible with
the FPGA, and therefore the last phase of the design flow is to generate a bitfile
based on the mapping that can directly be loaded into the FPGA. The bitfile is
generated automatically by the SW tool that is used in the development, and it
can be loaded to the FPGA by using for example the Joint Test Action Group
(JTAG) standard. JTAG was originally developed to provide relatively effortless
technology for testing printed circuit board assemblies without the need for low-level
physical access or much development for functional tests. However, nowadays JTAG
is widely used for programming, debugging and testing interfaces on for example
microcontrollers, ASICs, and FPGAs [38, 39]. After the bitfile is loaded into the
FPGA, the design can be tested in practice to verify that it operates correctly. It
should be noted that because RAM is volatile memory, meaning that it requires
continuous supply voltage to store data, FPGAs can only store the loaded bitfile as
long as power is supplied to the FPGA. Thus, every time the FPGA is rebooted,
the bitfile is lost and it must be reloaded to the FPGA.

2.4 Co-operation between hardware and software


All the SW that is developed with some programming language is essentially executed
on HW, usually on a microprocessor/CPU. This observation makes it clear that
there must be an interface between HW and SW in every computing system that
can be controlled with SW. In traditional SW development, the developer must
pay only little attention to these interfaces because the development environment
(i.e. the computer) is provided to the developer as a ready-made device in which
the interfacing issues have already been addressed. However, when the developer
desires to connect external HW (for example an FPGA or an ASIC) to the system,
the interfaces between HW and SW must be considered. The development of the
required interfaces requires knowledge of both SW and HW development [30].

When the interfacing and co-operation of HW and SW is considered, it should
be discussed why in many situations HW and SW are desired to be used together
to perform computations instead of using only SW or only HW. Essentially, the
reason is that the choice between HW and SW as an implementation method can
be described as a trade-off between certain desired properties. Roughly speaking,
it can be seen that implementing computations on hardware provides improved
performance and higher energy efficiency, whereas software provides smaller design
effort and considerably higher flexibility. When a certain compromise between these
is desired, some portion of the computation should be executed on specialized HW
and the rest on SW. [30] Figure 10 illustrates this trade-off.
Figure 10: An illustration of the trade-off between HW and SW in computing.
Modified based on [40].

However, to achieve the improvements in performance and efficiency depicted in
Figure 10, it is essential to carefully consider how the algorithm should be split
between SW and the external HW. Different splits can be characterized in terms
of scalability, latency, bandwidth and programmability. This decision is highly
important in the design flow because the splitting of the algorithm between SW
and HW can have a significant effect on the overall performance of the application. [41]

Generally, the goal of creating an interface between SW and custom HW is to
connect the SW application (on a microprocessor/CPU, for instance) to the custom
HW module on a coprocessor. [30] This is illustrated in Figure 11.
Figure 11: High-level illustration of HW/SW interfacing. Modified based on [30].

As can be seen from the figure, five elements of interest can be identified that
constitute the interface between the software application and the custom-HW module.
These elements will be discussed during the rest of this chapter.

1. On-chip bus

2. Microprocessor interface

3. Hardware interface

4. Software driver

5. Programming model

The on-chip bus is a component that is used for transferring data between the
software application and the custom-HW module. In more detail, the bus system
describes a protocol, i.e. the different phases in communication required to transfer
data between components in a predefined manner. Basically, two types of on-chip
buses exist: those that are shared among many components (usually called
masters/slaves), and those that implement only dedicated point-to-point connections.
Not all on-chip buses transfer data according to the same protocol. Instead,
there exist several on-chip bus standards, and if the developer desires to connect a
custom-HW block to a CPU/microprocessor, it is often required to know the bus
standard that is used by the processor. Some commonly used standards include for
example the advanced microcontroller bus architecture, CoreConnect and Avalon. [30]

Let us now consider the shared bus configuration and the point-to-point bus
configuration in a little more detail. In the shared configuration, an address space is
utilized to control the communication between the devices that are connected to the
on-chip bus. This address space is used in such a way that when data is transferred via
the bus, there is always a destination address associated with the data. This address
is used to determine the component that should receive the data. [30]

As it requires several pieces of information to transfer data via the on-chip bus,
the bus itself physically consists of several wires. These wires can be divided into
four categories: address wires, data wires, command wires and synchronization wires.
As their names imply, data wires carry the data to be transferred, and address
wires contain the address that is associated with the data. Command wires carry
information about the type of transfer that is to be performed. This information
may, for instance, indicate whether a read or a write operation will be performed. Finally,
the synchronization wires ensure that the masters and slaves that are attached to
the bus are synchronized during the data transfer. [30]

The point-to-point configuration has some similarities and some differences in
comparison to the shared configuration. As in the shared configuration, the
point-to-point configuration also recognizes slave and master components. However,
because the connection is only from one point to another, there does not exist an
address space or any similar concept. Instead, the data is represented as a continuous
stream of items. Even though there is no address space, the data items may still be
divided into different logical channels by multiplexing several data streams over the
same physical channel. [30]
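The multiplexing of logical channels over one physical stream can be illustrated by tagging every stream item with a channel identifier; the 8-bit tag / 24-bit payload split below is an arbitrary example, not a real bus format:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t stream_item;  /* one item on the point-to-point stream */

/* Pack a logical channel id (8 bits) and a payload (24 bits) into one item. */
stream_item stream_pack(uint8_t channel, uint32_t payload)
{
    return ((uint32_t)channel << 24) | (payload & 0x00FFFFFFu);
}

uint8_t  stream_channel(stream_item item) { return (uint8_t)(item >> 24); }
uint32_t stream_payload(stream_item item) { return item & 0x00FFFFFFu; }
```

The receiver demultiplexes simply by reading the tag and dispatching the payload to the corresponding logical channel.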

The microprocessor interface provides a method for the software program to connect
to the on-chip bus. It consists of hardware and low-level firmware to enable this
connection. Microprocessor interfaces can be categorized into three classes:
memory-mapped (MM) interfaces, coprocessor interfaces and custom-instruction interfaces.
The MM interface is the most commonly used. It reserves a portion of the address
space of the processor to be used in communication between HW and SW. This
property makes it easy to use with SW development. As a simple example, a register
connected to the on-chip bus can be an MM interface, making the register a shared
resource between SW and HW. In this case, a bus transfer to a specific memory
address causes an access to the register. More advanced examples of MM interfaces
include for instance mailbox, first-in first-out (FIFO) queue, and shared memory
configuration. [30] However, these will not be discussed in more detail in this thesis.
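As a sketch of what the register-style MM interface looks like from the SW side, the following C fragment accesses a hypothetical control register through a volatile pointer. The register offset and function names are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define CTRL_REG_OFFSET 0x0u  /* byte offset of the example register */

/* 'volatile' tells the compiler that the register can change outside the
 * program's control (the HW writes it), so every access really reaches the
 * bus instead of being cached in a CPU register or optimized away. */
static inline void ctrl_reg_write(volatile uint32_t *base, uint32_t value)
{
    base[CTRL_REG_OFFSET / sizeof(uint32_t)] = value;
}

static inline uint32_t ctrl_reg_read(const volatile uint32_t *base)
{
    return base[CTRL_REG_OFFSET / sizeof(uint32_t)];
}
```

On a real system the base pointer would point at the physical address that the bus decodes to the register, under Linux typically obtained by mapping /dev/mem with mmap(); when experimenting, any writable word can stand in for the register.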

Even though the MM interface is very widely used, its performance may not be
enough in high-throughput applications. In this situation, it may be replaced with a
coprocessor interface, which is a dedicated interface specifically designed to be used
to attach custom-HW modules to the processor. In addition to higher throughput,
the coprocessor interface provides a fixed latency, which is not the case with the MM
interface. The coprocessor interface does not utilize the on-chip bus, but instead it is
a specific port on the processor which is controlled with a certain set of instructions.
However, the drawback of using the coprocessor interface instead of the MM interface
is that the custom-HW module must be designed to be compatible with that interface
on the particular processor. Therefore, the reusability of the HW block is limited to
systems that use the same processor. [30]

To speed up the integration of HW and SW, custom-instruction interfaces can
be applied. In this approach, the custom-HW module is directly integrated into the
microarchitecture of the microprocessor. To use the HW module, a specific part of
the operation codes of the microprocessor are reserved for new instructions. However,
this interfacing approach suffers from the same lack of flexibility and reusability
as the coprocessor interface, because this method is highly processor-specific. The
advantage of the custom-instruction interface is that it automates some otherwise
difficult aspects of HW/SW interfacing and codesign. [30]

The hardware interface is an interface that is used to connect the custom-HW
module on the coprocessor to the on-chip bus. It decodes the bus protocol to enable
the HW module to gain access to the data on the bus through a register or a memory
system. This interface controls the I/O ports of the HW module, affecting several
operations of the HW module. Some specific functions of the interface include
data transfer, wordlength conversion, and temporary operand storage. Some basic
components that are usually included in the interface include data input buffer, data
output buffer and command interpreter. The hardware interface also includes several
ports, that are addressable inputs/outputs of the coprocessor. [30]

The software driver and the programming model are the elements that are
closest to the software application and to the custom-HW module, respectively. The
software driver is an interface that connects the SW application to the microprocessor
interface. It is used to wrap the transactions between HW and SW into SW function
calls, and to convert concepts that are natural to software into structures that are
applicable for communication with HW. Similarly, the programming model is an
interface for connecting the custom-HW module to the hardware interface. It is a
high-level presentation of the HW to the SW application. The programming model
is straightforward to handle by the microprocessor because it includes the memory
areas used by and the commands understood by the HW module. [30]
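To make the layering concrete, the idea of wrapping raw register transactions into SW function calls can be sketched as follows. This is only an illustrative sketch in Python: the FilterDriver name, the register offsets, and the plain byte array standing in for a region mapped from the physical address space are assumptions made for the example, not details of any actual driver.

```python
import struct

# Register offsets of a hypothetical IP block (assumed for illustration;
# in practice these come from the generated register map).
CTRL_OFFSET = 0x00     # bit 0 = start
STATUS_OFFSET = 0x04   # bit 0 = done

class FilterDriver:
    """SW driver that wraps raw register accesses into function calls.

    'mem' stands in for the memory-mapped register region; a real driver
    would obtain it, e.g., with mmap() on the block's physical base address.
    """
    def __init__(self, mem, base):
        self.mem = mem
        self.base = base

    def _write32(self, offset, value):
        struct.pack_into("<I", self.mem, self.base + offset, value)

    def _read32(self, offset):
        return struct.unpack_from("<I", self.mem, self.base + offset)[0]

    def start(self):
        # A SW-friendly call that hides the raw register transaction.
        self._write32(CTRL_OFFSET, 0x1)

    def is_done(self):
        return bool(self._read32(STATUS_OFFSET) & 0x1)

mem = bytearray(0x1000)               # simulated register space
drv = FilterDriver(mem, base=0x100)   # base address assigned to the block
drv.start()                           # writes 1 to the control register
```

The application SW calls start() and is_done() without knowing the offsets, which is exactly the abstraction that the software driver and the programming model provide together.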

2.5 Software development for reconfigurable hardware


As described in chapter 2.4, there is a significant number of issues to be considered
when an external HW block is to be connected to SW for more efficient computation.
Even if the HW to be used is designed to be kept constant, it still requires extensive
knowledge of both HW and SW design to implement a proper interconnection logic
between HW and SW. Therefore, it comes as no surprise that the situation becomes
even more complicated if the HW consists of reconfigurable logic.

The flexible programmability of FPGAs makes it difficult to develop SW for them
because the interface between SW and the FPGA design should change according to
the bitstream that is loaded to the FPGA. Therefore, developing just one constant
interface will cause issues. More specifically, if the MM interface is used, the problem
is that the address space of the interfacing components of the FPGA system changes
whenever the HW interface is modified. The consequence of this is that the address
space used in the SW must be reconfigured. [42]

In the case of non-programmable electronics, it is acceptable that a dedicated
SW is developed for every HW design. The reason for this is that it is expected that
a new HW design is produced quite rarely, and thus the effort required to develop
a new SW for every new HW design is not overwhelming. However, with FPGAs
and other PLDs, the HW may be reconfigured even several times per day. Thus, it
would be very difficult and time-consuming to redesign the SW for every change of
the HW design. As a result, it is important that the developed SW is flexible enough
to adjust to as many different HW modifications as possible. [42]

As discussed in chapter 2.3, the Intel Stratix 10 SX2800 contains both an FPGA and an HPS. A typical usage scenario for such a device is to implement fast computation in FPGA HW that is controlled with SW developed for the HPS. Generally,
when a new FPGA design is created, a base address is assigned to each IP block of
the design. This assignment can be performed automatically with the FPGA design
tools. The base addresses are then mapped to the processor so that the developed SW
can access the FPGA design. This enables the user to control the FPGA HW via SW.

However, this standard approach causes problems in creating the UPM. As an
example, let us consider two FPGA design teams that are developing two 5G L1 HW
variants (TDD and FDD, for example). Both of these designs have some common
modules and some modules that are specific to each variant. If the two teams create
the register map for the designs separately, the addresses for the common modules
may be different for the variants. This is not allowed, since the same programming
model could not be used to develop SW for the two variants. Thus, manual synchronization between the teams is required to keep the programming models similar.
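This divergence is mechanical enough to detect automatically. The sketch below compares the base addresses of the modules shared by two variant register maps; the module names and addresses are invented for the illustration:

```python
# Hypothetical register maps produced independently by two variant teams,
# mapping module name -> base address (all values invented).
tdd_map = {"common_dma": 0x0000, "common_timer": 0x1000, "tdd_scheduler": 0x2000}
fdd_map = {"common_dma": 0x0000, "common_timer": 0x3000, "fdd_scheduler": 0x2000}

def common_module_conflicts(map_a, map_b):
    """Return the common modules whose base addresses differ between variants.

    A non-empty result means that one programming model cannot cover both
    variants without manual synchronization between the teams.
    """
    shared = set(map_a) & set(map_b)
    return {m: (map_a[m], map_b[m]) for m in shared if map_a[m] != map_b[m]}

conflicts = common_module_conflicts(tdd_map, fdd_map)
# Here the common timer module sits at different addresses in the two maps.
```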

Due to the reasons discussed above, it is clear that developing SW for reconfigurable HW is not a simple task. Because a single, static interface between HW and
SW is clearly a problematic solution, roughly two practical approaches can be found.
The first one is to move to a higher abstraction level in the development. Several
frameworks have been designed for this purpose, for example the open computing
language (OpenCL) developed by Khronos Group [43], and the reusable integration
framework for FPGA accelerators (RIFFA) developed at the University of California,
San Diego [44]. The second approach is to stick with the traditional interface logic
development, and try to make the process of maintaining and updating the interface
and programming model as simple as possible.

Let us discuss the concept of generating the interfacing logic between HW and
SW by increasing the abstraction level. The objective of moving to a higher level
of abstraction in the design is basically to “abstract away” the interconnection. In
practice, this means that the developer could simply add the developed IP cores
to a framework, develop the required SW control code (including certain function
calls or other properties provided by the framework to make the application support
the framework), and let the framework take care of the rest of the interfacing. This
enables the application developers to concentrate on the application logic instead of
the basic interfacing issues. [44] In addition, this could make it easier for SW developers to take advantage of FPGAs in increasing the efficiency of the developed SW.

To illustrate the frameworks for increasing the abstraction level, the two frameworks
that were mentioned above (OpenCL and RIFFA) will be examined. Fundamentally,
these frameworks address the same problem but approach it from slightly different
perspectives. RIFFA is used to make connecting custom IP blocks on an FPGA to
the CPU on the host computer as simple as possible. Therefore, using it in FPGA
design still requires HW design skills to develop IP blocks using HDLs, or at least the developer must have access to IP blocks made by someone else. [44]
On the other hand, it can be seen that OpenCL combines the approach provided by
RIFFA with HLS. In other words, using OpenCL for FPGA development requires
(ideally) only SW design experience. [26]

OpenCL is an open and royalty-free SW framework that can be used for devel-
oping SW that runs across several platforms, including for example CPUs, GPUs,
FPGAs and so on. In other words, OpenCL can be seen as a programming model, providing SW developers relatively easy access to controlling, for example, different HW accelerators via SW. [43] This can be highly useful, as traditional SW developers
most likely do not have the necessary knowledge and skills to utilize for example
FPGAs without a framework such as OpenCL.

In terms of FPGAs, the main idea of OpenCL is that it abstracts away the FPGA
design flow, enabling low-level SW developers to use traditional SW development
methods to program FPGAs. An OpenCL program consists of a set of functions
(usually called kernels) running on the accelerator, and a host program running on the host (usually a CPU) that is connected to the accelerator. The host program is
the SW that controls the HW on the accelerator. The host program controls the
accelerator by using library routines that abstract the communication between the
host processor and the kernels on the accelerator. [26]

The host program of an OpenCL application can be written in several languages that support it, including for example C/C++ and Python [43]. The kernel functions that are executed on the FPGA are written in a C-like language that has been modified to better support the device model used by OpenCL. However, the important point to note is that both “sides” of an
OpenCL program are developed with a software language, which does not require
expertise in HDLs. [26]

Because OpenCL is not specifically dedicated to FPGA development, it is used
together with traditional FPGA development tools. For example Intel and Xilinx,
two major FPGA vendors, offer compilers to be used with OpenCL development for
FPGAs. The tool provided by Xilinx is called SDAccel, a stand-alone OpenCL development environment [45], whereas Intel offers a software development kit (SDK) that is available as an add-on for Intel’s FPGA development tool called
Quartus [46]. These compilers are used to convert the kernel code into an RTL description that can be synthesized to the FPGA. Because this compilation process is
very complicated, the compilers or other OpenCL development tools provided by the
FPGA vendors are usually quite expensive.

The method to connect the FPGA board to the host system varies depending
on the development setup. For example, using the Intel FPGA SDK for OpenCL,
there are basically two methods to connect the FPGA to the host. In the first
scenario, the FPGA is placed on an accelerator board that can be directly connected
to the host PC via a peripheral component interconnect express (PCIe) connection.
In this case, the OpenCL host program runs on the CPU of the host PC, and the
kernels are implemented on the FPGA accelerator board. In the second scenario,
the host program runs on the CPU cores of the FPGA HPS, and the kernels are
implemented on the FPGA fabric. Thus, the CPU cores are connected to the FPGA
through specialized bridges. [47, 48, 49] Figure 12 illustrates the basic concept of an
OpenCL design flow.

Figure 12: High-level illustration of the OpenCL design flow. Modified based on [50].

In addition to OpenCL, another recently developed SW tool that strives to alleviate
the problem of interfacing between SW and FPGA HW is RIFFA. The objective of
RIFFA is to provide a framework for integrating SW running on CPU cores with
IP cores implemented on an FPGA. RIFFA provides relatively simple interfaces for
both HW and SW sides of the design, thus making it easy to use for developers. It
supports SW bindings for C/C++, Python, Java and Matlab. RIFFA only works
via a PCIe connection, and thus the framework requires a PCIe enabled host system
and an FPGA board with a PCIe connector, such as the Xilinx Virtex-7 VC707
development board. [44]

The basis of RIFFA is the concept of a communication channel that is used for communication between SW threads on the CPU and the developed IP cores on the FPGA.
To use a channel, it must first be opened, then data can be read and written through
it, and finally it must be closed. On the FPGA side, the operation of a channel is
implemented as a FIFO interface for both receiving and transmitting data. On the
SW side, data can be sent to and received from the channel as byte arrays with SW
function calls. The upstream transfers (i.e. data flowing from the IP cores to the
PC) are initiated by the IP cores, and downstream transfers (i.e. data flowing from
the PC to the IP cores) are initiated by the PC. [44]

The SW interface of RIFFA is a very simple set of functions that are used to
communicate with the FPGA. The main functions are open, close, send and receive,
which are used for initializing the FPGA, closing the connection to the FPGA, sending
data to the FPGA, and reading data from the FPGA, respectively. In addition, there
is a function for listing all FPGAs that are connected to the host, and a function that
resets a specified FPGA and the transfers across all channels that are connected to it.
Because there are only these six commands for communicating with the FPGA, it is
very simple for a SW developer to communicate with the IP blocks on the FPGA. [44]
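The resulting usage pattern (open a channel, send, receive, close) can be illustrated with a small simulation. This is not the real RIFFA binding: exercising RIFFA requires a PCIe-connected FPGA, so the sketch below replaces the FPGA with an in-process callback, and the function names merely mirror the style of the documented API.

```python
from collections import deque

class SimulatedChannel:
    """In-process stand-in for a RIFFA channel, with an upstream FIFO.

    The real bindings talk to an FPGA over PCIe; here the "IP core" is a
    callback that transforms the payload, so only the pattern is shown.
    """
    def __init__(self, ip_core_fn):
        self.ip_core_fn = ip_core_fn
        self.upstream = deque()   # IP core -> PC
        self.is_open = False

def fpga_open(ip_core_fn):
    ch = SimulatedChannel(ip_core_fn)
    ch.is_open = True
    return ch

def fpga_send(ch, data):
    assert ch.is_open
    # Downstream transfer: the simulated IP core consumes the data
    # immediately and pushes its result into the upstream FIFO.
    ch.upstream.append(ch.ip_core_fn(bytes(data)))

def fpga_recv(ch):
    assert ch.is_open
    return ch.upstream.popleft()

def fpga_close(ch):
    ch.is_open = False

# Usage: open a channel, push data down, read the result back, close.
ch = fpga_open(lambda payload: bytes(b ^ 0xFF for b in payload))  # toy core
fpga_send(ch, b"\x00\x0F")
result = fpga_recv(ch)
fpga_close(ch)
```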

The HW interface of RIFFA consists of two separate sets of signals: one for receiving data, and the other for sending data. Some of these signals are used for the
handshake protocol, and the rest implement the FIFO ports that were mentioned
above. One problem with the HW interface is that the length of every data transfer must be known in advance. This may cause issues with applications where the transfer
length is not known. However, in some situations this problem can be solved by
buffering the data until all of it is generated, and transferring it afterwards. [44]

Figure 13 shows a high-level illustration of the architecture of RIFFA. As can
be seen from the figure, there is a driver and an application programming interface
(API) on the SW side, and quite a lot of logic on the FPGA side. This includes two
separate data paths, one for sending data and one for receiving data. All the design
choices that are made for the architecture are based on the objective of maximizing
throughput and minimizing latency, while also minimizing the amount of resources used. Because RIFFA uses PCIe, an address-based packet protocol, for transferring data between the SW and the IP cores on the FPGA, data is transferred
as packets. When it is desired to send data, write packets are used. When data is
to be read, first read packets are sent, and subsequently completion packets, which
contain the actual requested data, are received. The following paragraphs present a
somewhat more detailed discussion of this architecture.

Figure 13: The architecture of RIFFA [44].

Let us first examine the HW side of the architecture. The data transfer is implemented as scatter gather DMA based on the PCIe protocol, and the FPGA acts
as the DMA bus master in this configuration. In the above figure, this basically
consists of receive (RX) and transmit (TX) engines and everything above them. [44]
A scatter gather DMA is an architecture that, as opposed to the traditional DMA
which is able to fetch data only from a single buffer of consecutive memory, can
collect data from several buffers distributed over nonconsecutive parts of memory.
In this way, the scatter gather DMA can be used to transfer larger buffers than
is possible with ordinary DMA, reducing the number of DMA transfers and thus
increasing the overall performance of transfers significantly. [51]
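The gather step itself can be sketched in a few lines. The memory contents and the scatter gather element list (address, length pairs) below are invented for the illustration:

```python
# Simulated system memory and a scatter gather element list describing
# three nonconsecutive buffers (all values invented for the example).
memory = bytearray(range(64))
sg_elements = [(0, 4), (16, 4), (40, 8)]

def gather(mem, elements):
    """Collect scattered buffers into one contiguous transfer, as a
    scatter gather DMA engine does, instead of requiring a single
    consecutive source buffer like ordinary DMA."""
    out = bytearray()
    for addr, length in elements:
        out += mem[addr:addr + length]
    return bytes(out)

payload = gather(memory, sg_elements)
# One 16-byte transfer now replaces three separate ordinary-DMA transfers.
```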

As can be seen from Figure 13, the DMA bus master configuration is connected to a
PCIe endpoint core. One of the uses for this core is that it enables the translation
between the packet data supported by PCIe and the payload data. The custom IP
cores access and execute computation on this payload data via the signals of the HW
interface that were shortly introduced previously. [44]

The upstream transfer is initiated by the corresponding IP core via a specified
set of the signals of the HW interface. As can be seen from Figure 13, the data
is first written to the TX FIFO, where it is split into blocks that are suitable for
separate PCIe write packets. To send these blocks to the host PC, the targeted
memory locations must be known. These memory locations are provided as scatter
gather elements. Thus, a channel first requests these elements from the host. This is
illustrated as the “scatter gather requester” box in Figure 13. After a channel has
acquired the scatter gather elements, it creates a write packet supported by the PCIe
protocol for each data block created in TX FIFO. [44]

Because there are multiple independent channels transferring the upstream data, and
each of them shares the same PCIe upstream direction, multiplexing is needed. This
is provided and managed by the TX engine block, shown in Figure 13. In addition,
the TX engine also formats the channel requests to complete PCIe packets and sends
them to the PCIe endpoint. [44]

The downstream transfer is initiated by the host PC via SW APIs. Once the
initialization has been completed, the channels request the scatter gather elements
(memory locations) for transferring the data. These requests are addressed by the
TX engine. After the memory locations have been received, separate PCIe read
requests are made for the data at the memory locations defined by the scatter gather
elements. This requested data is forwarded to the RX engine at the PCIe endpoint.
There, the RX engine extracts data from the PCIe packets and demultiplexes the
data to correct channels. The reordering queue shown in Figure 13 is used to ensure
that the data is forwarded to the channel in the correct order. [44]

Let us now examine the SW side of the architecture. As can be seen from Fig-
ure 13, there is a kernel device driver and an API on the host PC. The kernel driver
registers all connected FPGAs that have RIFFA support, and it allocates a memory
buffer for each FPGA that is used for transferring scatter gather data between the
host PC and the FPGA. The language bindings allow the user application to call
into the kernel driver, and the FPGA can access the driver via a device interrupt. [44]

Even though SW tools like OpenCL and RIFFA make it easier to integrate
FPGAs into traditional SW development, they both require relatively complicated
logic to be able to adapt easily to different FPGA designs. This causes the overhead
in communication between the host SW and the FPGA to be quite high. Even
though this overhead might not cause issues in many applications, the real-time
requirements of 5G L1 discussed in chapter 2.2.2 are so strict that the overhead will
cause problems. Different latencies of RIFFA are listed in Table 2:

Table 2: Latencies of RIFFA [44].

Description                                Value      Delta

FPGA to host interrupt time                3 µs       ±0.06 µs
Host read from FPGA round-trip time        1.8 µs     ±0.09 µs
Host thread wake after interrupt time      10.4 µs    ±1.16 µs

The “FPGA to host interrupt time” is the latency between the moment when the
FPGA signals an interrupt to the host and the moment when the host receives the
interrupt. The “host read from FPGA round-trip time” is the total time it takes for a request to propagate from the kernel driver to RIFFA (through the RX and TX engines) and back. The “host thread wake after interrupt time” refers to the time
to resume a SW thread after it has been woken by an interrupt from the FPGA. [44]
As can be seen, all these latencies are on the order of microseconds, which is a large
portion of the time available to perform a certain calculation in 5G L1. In many
worst-case scenarios (from the viewpoint of timing requirements), such latencies are
not acceptable.

For OpenCL, the problem with the overhead is even worse. The issue with OpenCL
is the overhead associated with launching a kernel function on the device, for example
an FPGA or a GPU. This overhead can be on the order of tens of microseconds, and
it is not dependent on the actual computation that is performed on the kernel. [52,
53] Based on the real-time requirements of 5G L1, it is clear that such overhead times cannot be tolerated. In contrast, accessing the registers of a memory mapped FPGA is considerably faster. Depending on the situation, the time required for this is on the order of a few hundred nanoseconds. [54]

2.6 General guidelines for team-based FPGA development


As discussed in chapter 1, the generation of the UPM causes difficulties for the FPGA
design teams, and thus the development process should be improved. It is not unique
to Nokia that the FPGA development process causes problems. Instead, several
companies are facing challenges with FPGA development, and these problems have
been studied. For example, P.A. Simpson has identified three main sectors that are
required for successful FPGA development: [55]

1. Project management. This includes for example defining the project requirements and objectives, resource and cost management, risk assessment and
project execution.

2. FPGA vendor choice and partnership. The careful selection of the FPGA
vendor ensures that the same technology and devices can be efficiently used for
not only the current but also the future FPGA projects.

3. FPGA design methodology. This includes the whole FPGA design flow, including for example device selection, IP reuse and design environment.

If each of these sectors is designed carefully, successful FPGA development and
desired results can be achieved [55]. Because the problems at Nokia are closely
related to the interaction between different FPGA design teams, the best practices
of working in a team-based design environment are discussed in more detail.

Nowadays, many large FPGA design processes are divided into several teams, each
team developing a certain portion of the design. There are several advantages that
can be achieved by successfully using this team-based design approach. The first
one is acceleration of the design process. Because the separate teams can begin to
implement their parts of the project without waiting for the rest of the team, the
overall process can be completed within a shorter schedule. In other words, the
process becomes more parallel. Another advantage is that the verification of the design is simplified, because several phases of the verification can be executed by the
teams only for the corresponding portions of the design. The team-based design
approach also isolates the possible problems (for example timing issues) to a certain
portion, simplifying the debugging. A further advantage is that the compilation
time of minor changes to a single portion decreases, because the modification can be
performed only by the corresponding team without directly affecting the rest of the
design. [55]

For a team-based FPGA design process to be successful, two major roles are required. The first one is a single team lead that is responsible for the
top-level design planning and integration of the design. The second one consists of
the other team members that create the RTL descriptions of the corresponding IP
blocks, and implement the blocks. The team leader creates new top-level designs
either according to a predefined schedule or whenever a major improvement has
been developed for one or more IP blocks. Even though the idea is that the team
members develop the portions of the design independently, it would be efficient if the portions were developed in the context of the top-level design to avoid issues in
integration. For these two parts to work effectively together, the team lead should
create an initial project setup that assigns the portions of the design to the different
teams. [55] Figure 14 shows a high-level illustration of the team-based design flow.

Figure 14: Team-based FPGA design flow. Modified based on [55].

If the FPGA system contains a processor that is used to execute SW, the overall
development is usually considered to be embedded system design. This requires new
aspects to be taken into account because adding SW to the development process
introduces new design challenges to the HW engineers. One major advantage of
using FPGAs in embedded systems is that the reconfigurable nature of FPGAs allows
the developer to modify the design in both SW and HW. This makes it possible
to match the design more precisely to the target application. In comparison, if an
embedded system is built around a simple microcontroller, only SW modifications
can be done to meet the requirements. On the other hand, the flexibility provided by
an FPGA also creates the problem of determining how a certain functionality should
be implemented or how computations should be split between SW and HW. [55]

One of the challenges of introducing SW to FPGA development is that the HW
engineer must design the interface between the processor and the FPGA logic. In
addition, the register map that is used by the SW must be created and maintained.

Each IP block includes a register interface that is mapped to the addresses of the
SW interface, and these registers represent the data that is transferred between HW
and SW. The register address map contains each of these registers. [55]

The register address map is the main interface between the application SW and the
RTL description of the HW, and the information provided by it is used by many
participants in several data formats. Therefore, the address map is shared among several teams and groups that are associated with the embedded system development. This creates challenges in the synchronization of the address map. For these
reasons, it is highly important that the register address map is strictly controlled
and every modification to the map is communicated across the entire design team. [55]

As mentioned previously, the HW designer must connect the register address map
interface to other parts of the system. This requires creating a considerable amount of logic for all the registers to be able to communicate with the system. The problem
that arises is that modifying the register address map requires the HW developer
to modify the interconnection code and the documentation, and the address map
must be again communicated to the rest of the design team. In addition, the SW
developers must modify the SW header files accordingly. Performing this manually
requires much work and is prone to errors. However, FPGA vendors and other
companies provide EDA tools that automate the creation of the interconnection logic,
SW header files and all the necessary other files. It is highly recommended to use
such tools for register address map management. [55]
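The core of such a tool can be sketched as follows: one machine-readable register description drives the generated SW header, so the SW view of the address map is regenerated instead of being maintained by hand. The description format, the register names, and the emitted header layout are assumptions made for this sketch; in actual flows, this role is played by the vendor tools and formats such as IP-XACT.

```python
# A single source-of-truth register description (format and names invented).
registers = [
    {"name": "CTRL",   "offset": 0x00, "access": "RW"},
    {"name": "STATUS", "offset": 0x04, "access": "RO"},
    {"name": "IRQ_EN", "offset": 0x08, "access": "RW"},
]

def generate_c_header(block_name, regs):
    """Emit a C header with one #define per register offset, so that a
    register map change only requires regenerating the header."""
    lines = [f"#ifndef {block_name}_REGS_H", f"#define {block_name}_REGS_H", ""]
    for reg in regs:
        lines.append(
            f"#define {block_name}_{reg['name']}_OFFSET "
            f"0x{reg['offset']:02X}  /* {reg['access']} */"
        )
    lines += ["", f"#endif /* {block_name}_REGS_H */"]
    return "\n".join(lines)

header = generate_c_header("FIR", registers)
```

The same description could equally drive the VHDL register bank and the documentation, which is what keeps the HW and SW views synchronized.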

3 Solution development
This chapter first introduces the current FPGA development workflow and the
problems related to it. The rest of the chapter focuses on explaining and motivating
the proposed improvements to the workflow, and describing the development of the
improvements.

3.1 Current FPGA design workflow


The FPGA design workflow includes multiple HW design teams, consists of several
steps and contains the usage of many HW design tools. As the FPGA designs used
for 5G L1 are large and complicated, it is desirable that the workflow itself be as simple and straightforward as possible. Therefore, as briefly described in
chapter 1, it would be essential to have a more automated process for the creation
and maintenance of the UPM.

Before the development process is described in more detail, let us first discuss
the specifications of the UPM and the requirements for creating it. The UPM itself
is an XML file that contains a description of the interfacing registers towards the L1
SW. The most important requirements for the UPM are listed below:

• The UPM must contain the interfacing registers of every FPGA design variant,
even though each design does not have to actually implement the registers of
every variant.

• Every FPGA design variant must use the same version of the same common
IP blocks.

• Every modification made to the UPM must comply with the open-close principle.
This means that new interfacing registers may be added to previously unused
memory locations and new fields may be added to previously unused parts of a
register. However, the names, locations, or contents of existing registers cannot be modified without special arrangements, which include synchronization
between the L1 HW and the L1 SW teams.

• When a modification to some common IP is performed, every team should
release a new bitstream that contains modifications to only this specific IP.
This makes testing easier.

• Each FPGA variant team may independently release new bitstreams without
synchronization as long as no modifications to common IPs are performed, and
the new releases are backwards compatible with the UPM.
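The open-close principle lends itself to an automated compatibility check whenever a new UPM version is proposed. The sketch below models a register map as a simple name-to-offset dictionary; the real UPM is an IP-XACT XML file, and the register names and offsets here are invented:

```python
# Register maps as name -> offset dictionaries (representation and values
# invented; additions are allowed, changes to existing entries are not).
upm_v1 = {"CTRL": 0x00, "STATUS": 0x04}
upm_v2 = {"CTRL": 0x00, "STATUS": 0x04, "DEBUG": 0x10}   # addition only
upm_bad = {"CTRL": 0x00, "STATUS": 0x08}                  # moves STATUS

def complies_with_open_close(old, new):
    """New registers at previously unused locations are fine; renaming,
    moving, or dropping an existing register is not."""
    return all(name in new and new[name] == off for name, off in old.items())

ok = complies_with_open_close(upm_v1, upm_v2)     # True: pure addition
bad = complies_with_open_close(upm_v1, upm_bad)   # False: STATUS moved
```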

Figure 15 shows a simplified block diagram that illustrates the basic building blocks
of the 5G L1 FPGA designs. As can be seen from the figure, there are common IP
blocks and variant-specific IP blocks, and each block is directly connected to one
register bank. The register banks are further connected to the SW, and through these
connections the SW can be used to control the IP blocks implemented on the FPGA.
The dashed box around the register banks illustrates the UPM, and as can be seen,
it combines the register banks of every common and variant-specific IP block.

Figure 15: The building blocks of the FPGA designs.

The teams involved in the FPGA design process can be divided into three categories:
the teams developing the IP blocks and the corresponding register banks of the
FPGA design (submodule teams), the teams that integrate these blocks into the final
top-level design of each variant (integration teams), and the team that generates the
final UPM. There is a separate team for developing every IP module, and there is
also an independent integration team for every L1 HW variant. Figure 16 illustrates
the different team categories, and the interaction between the teams. As can be seen
from the figure, the submodule teams provide all the information of an IP block (i.e. the description of the register bank in Excel or XML format, and the VHDL codes of the
IP block and the register bank) to the integration teams, and the integration teams
in turn provide the PM XML files of each design variant to the UPM generation
team. Every integration team also generates a bitstream to be loaded to the FPGA,
even though only two bitstreams are shown in Figure 16.

Figure 16: Overview of the current workflow.

Let us now describe the development process in more detail. The design flow starts
from the specifications for the HW to be implemented. Based on the specifications,
the RTL descriptions of the required IP blocks and the interfacing registers towards
the L1 SW are created by the submodule teams. Each submodule team is responsible
for implementing one specific IP block and the corresponding register interface of
the top-level design. The RTL descriptions of the IP blocks are created using VHDL
language and HLS tools. On the other hand, the RTL descriptions of the interfacing
registers are not created manually by the developer. Instead, a script that takes as
input a simple description of the desired registers, in Excel format, is used.

The Excel tables describing the interfacing registers are created manually by the
submodule teams. For the Excel tables to be compatible with the script, the tables
must conform to a certain format. This required format is further discussed in chapter
3.2.2. The script produces an XML file and a VHDL file based on the input Excel
table. The usage of these files is addressed shortly. Because a register interface can
technically be generated without any information about the corresponding IP block,
the internal implementation and functionality of the IP blocks are not addressed in
this thesis. However, in practice the desired usage of the IP block must be known,
because otherwise it cannot be specified what kind of interfacing registers are required
for the IP.

The XML file produced by the script contains essentially the same description
of the interfacing registers as the Excel file, and it complies with the IP-XACT format. IP-XACT is an XML format that is used to describe individual and reusable electronic circuit designs. The purpose of IP-XACT is to provide a standard that can be
used for compatible sharing and reuse of circuit designs between component vendors
such that they are also compatible between different EDA tools. [56] IP-XACT has
been standardized by the Institute of Electrical and Electronics Engineers (IEEE) in
the IEEE 1685-2009 standard [57]. The created XML file is used in the subsequent
phase of the design flow, as an integration team generates a programming model
corresponding to a certain design variant.
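To give an impression of the format, a register description in IP-XACT style looks roughly like the following. This is a schematic fragment only: the element names follow the IP-XACT register model (shown with the ipxact namespace of later revisions; the 2009 standard uses the spirit prefix), and the block name, addresses and field are invented rather than taken from the actual design files.

```xml
<!-- Schematic IP-XACT-style register description (values invented) -->
<ipxact:addressBlock>
  <ipxact:name>fir_regs</ipxact:name>
  <ipxact:baseAddress>0x0</ipxact:baseAddress>
  <ipxact:range>4096</ipxact:range>
  <ipxact:width>32</ipxact:width>
  <ipxact:register>
    <ipxact:name>CTRL</ipxact:name>
    <ipxact:addressOffset>0x0</ipxact:addressOffset>
    <ipxact:size>32</ipxact:size>
    <ipxact:access>read-write</ipxact:access>
    <ipxact:field>
      <ipxact:name>START</ipxact:name>
      <ipxact:bitOffset>0</ipxact:bitOffset>
      <ipxact:bitWidth>1</ipxact:bitWidth>
    </ipxact:field>
  </ipxact:register>
</ipxact:addressBlock>
```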

The VHDL file produced by the script is used to create the HW interface for
the control SW. It contains not only the VHDL description of the used registers,
but also all the interfacing logic that is necessary for accessing the registers both
from the IP block HW and from the ARM cores. The interface protocol according
to which the VHDL code is created is specified in the Excel table. This can be for
example the Advanced eXtensible Interface (AXI) protocol. The creation of the XML
file and the VHDL file of the interfacing registers is performed for every IP block.
Afterwards, the created IP-XACT descriptions and the VHDL files are stored to an
IP library for later reuse.
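As an illustration of what such a generation script does, the sketch below builds a minimal IP-XACT-style XML tree from register rows resembling the Excel columns. The element names and the flat structure are simplified placeholders, not the full namespaced IP-XACT schema used by the actual in-house script.

```python
import xml.etree.ElementTree as ET

def registers_to_xml(block_name, registers):
    """Build a minimal IP-XACT-style description of a register bank.

    `registers` is a list of dicts with 'register', 'address' and 'fields'
    keys, mirroring the columns of the input Excel table.
    """
    root = ET.Element("addressBlock")
    ET.SubElement(root, "name").text = block_name
    for reg in registers:
        reg_el = ET.SubElement(root, "register")
        ET.SubElement(reg_el, "name").text = reg["register"]
        ET.SubElement(reg_el, "addressOffset").text = reg["address"]
        for f in reg["fields"]:
            f_el = ET.SubElement(reg_el, "field")
            ET.SubElement(f_el, "name").text = f["field"]
            # In IP-XACT terms the bit offset is the LSB ("stop") position.
            ET.SubElement(f_el, "bitOffset").text = str(f["stop"])
            ET.SubElement(f_el, "bitWidth").text = str(f["start"] - f["stop"] + 1)
            ET.SubElement(f_el, "access").text = f["rw"].lower()
    return ET.tostring(root, encoding="unicode")
```

The VHDL generation follows the same pattern, emitting the register and bus-interfacing logic instead of XML elements.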

All parts of the design flow discussed so far are performed by the submodule teams.
Thus, the next step in the process involves the integration teams. As implied in
Figure 16, the submodule teams provide the RTL descriptions of the IP blocks and
the Excel/XML and VHDL descriptions of the register interfaces to the integration
teams. At this point, it should be noted that the submodule teams strive to develop
as “general” and reusable IP modules as possible, i.e. they do not try to optimize
the modules for some specific purpose. This is a good approach in terms of the
reusability of the IP blocks. However, as pointed out in chapter 2.6, this approach
may sometimes create problems in the final integration.

The integration team of each variant uses the IP blocks that belong to the specific
variant to create the top-level L1 HW design and the corresponding bitstream. This
design then goes through the basic steps of FPGA design flow described in section
2.3.2 (simulations, synthesis, place-and-route, timing analysis etc.), and finally the
generated bitfile can be loaded to the Stratix 10 FPGA. The SW tool used for this
part of the design flow is the Intel Quartus Prime Pro (from now on referred to
as Quartus). Quartus is a tool developed by Intel that is specifically targeted for
developing HW for PLDs. It can be used for the whole development process, starting
from the VHDL programming all the way to the bitfile generation. However, in this
case the most important tool in Quartus is the platform designer.

The platform designer is a system integration tool that is used to automate the
process of integrating separate IP blocks into a larger design. It provides a graphical
user interface where the developer can easily connect the different IP blocks together,
assuming that each IP block is developed to support a certain interface protocol.
Only certain protocols are supported by the platform designer, one of them being
the AXI protocol that is used to connect the IP blocks to each other and to the
ARM cores of the Stratix 10 device. After all the components are connected to
each other in the platform designer, it can be used to generate the top-level RTL
description of the design, based on which the final bitstream can be generated
and loaded to the FPGA. In addition to connecting the IP blocks together, the
base addresses for the SW interfaces are also specified in the platform designer.
Since the SW running on the ARM cores and the FPGA HW communicate by
sharing data via the interfacing registers, each located in a certain memory address,
this interface implements the MM architecture, discussed in chapter 2.4. There-
fore, it is essential that the base addresses can be acquired from the platform designer.
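Since the interface implements the MM architecture, the SW reaches a register simply at the bank's base address plus the register's offset. A minimal sketch with hypothetical values:

```python
# Hypothetical example values: one register bank whose base address was
# assigned in the platform designer, and register offsets from its
# Excel description.
BASE_ADDRESSES = {"Example001_BLK": 0x00300000}
REGISTER_OFFSETS = {"Register_1": 0x000, "Register_2": 0x040, "Register_3": 0x080}

def absolute_address(bank, register):
    """Absolute memory-mapped address of a register: bank base + offset."""
    return BASE_ADDRESSES[bank] + REGISTER_OFFSETS[register]
```

In the actual control SW, this sum is the address that is read or written through the memory mapping toward the FPGA fabric.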

In addition to the bitstreams, each integration team also produces the PM XML file
of the corresponding variant. Essentially, the PM XML file is a combination of all the
XML files corresponding to the IP blocks that are used in the FPGA design variant.
In other words, the PM XML file contains all the interfacing registers of those IP
blocks that are used by the integration team. The PMs of the independent design
variants are created automatically using a C shell script. This script requires as
inputs the XML files of the interfacing registers of the different components (provided
by the submodule teams), and the base addresses of the register banks. The script
produces several outputs, the most important of which are the PM XML file and
the C header files that could be used to develop SW for the FPGA design. However,
these header files are not used because they do not comply with the UPM. Instead, the
used header files are generated separately based on the UPM by the L1 SW team.

The final part of the development process is the generation of the UPM. This
is done by a separate team. As can be seen from Figure 16, each variant integration
team provides the PM XML file of the corresponding variant to the UPM generation
team. As the UPM is the “top-level” PM, it must be formed by combining the
information of all the PMs provided by the integration teams. Currently, this is done
quite manually. One of the L1 HW variants is a “master” variant in the sense that
every IP block and the corresponding register interface that is included in this master
variant is also included in some other variant, but not all the blocks that are in the
master variant are included in every other variant. Therefore, every register interface
that is included in the master variant must be included in the UPM. Currently,
every register bank XML file that is not included in the master variant is searched
for and extracted manually from the XML files of the other variants. These are then
manually copied into a single XML file, that is merged to the XML file corresponding
to the master variant with a C shell script.

Figure 17 describes the current workflow as an activity diagram. Each rectangle in
the diagram represents one step in the workflow. The rectangles are color-coded
such that a red rectangle represents a step that is done manually, a blue rectangle
represents a step that has been automated with a script, and a yellow rectangle
represents a step that has been partly automated with a script. The figure is quite
simple to interpret: the upper part of the figure shows the submodule teams that
create IP blocks and the Excel files, from which the XML and VHDL descriptions of
the interfacing register banks are created. The lower part of the figure shows the
integration teams that use the IP blocks to develop the top-level designs and the
corresponding PMs, which are then combined to form the UPM. However, it should
be noted that in reality the design flow is not as strictly sequential as implied by
Figure 17. In particular, the integration teams do not necessarily develop a new
top-level design every time some IP blocks in the IP database have been updated.
Instead, there may be several releases of new / updated IP blocks before one or more
integration teams develop a new top-level design.

[Figure 17 content: each submodule team creates an IP block and an Excel file, and
converts the Excel file to an XML file and a VHDL file; the files are stored to the
IP database. Each integration team fetches IPs from the database, integrates and
compiles them, adds the base addresses to the PM generation script, and generates
the PM. When the PMs are compatible, the UPM is generated. The color legend
distinguishes manual work, steps automated with a script, and steps partly automated
with a script.]

Figure 17: Activity diagram of the current workflow.



One of the most important observations to be made from the figure is that there
is a lack of organized communication between the different teams. In addition, it
can be seen that writing the base addresses to a PM generation script is performed
manually by an integration team, even though this could be done automatically
based on the .qsys file that is produced by the platform designer. Finally, it can be
seen that the generation of the UPM is partly manual, even though it could be fully
automated. These issues will be addressed in the next section of this thesis.

Let us now compare the described development flow with the best practices of
team-based FPGA development discussed in chapter 2.6. If only a single variant-
integration team and the submodule teams are considered, the design process is
almost identical to the preferred embedded system development process. The variant-
integration team can be considered as the team leader, and the submodule teams
correspond to the other team members which independently develop certain portions
of the top-level design. In addition, the register bank interfaces of the IP blocks are
fluently communicated to the integration team and the platform designer is used
to automatically generate the interfacing logic and the SW header files that are
transferred to the L1 SW team. The main difference is that the submodule teams
produce as general IP blocks as possible, i.e. they are not developed in the context of
the top-level design.

However, as several variant-integration teams and the UPM generation team are intro-
duced to the process, it ceases to follow one of the essential general guidelines. This is
the requirement for a single team leader that would manage all the other participants
in the process. Even though it can still be seen that the variant-integration teams
act as leaders for the submodule teams, there is no clear leader for the integration
teams. They are in equal position with respect to each other, and even though the
UPM generation team communicates with each integration team, it does not manage
or control them.

Another issue that was emphasized in chapter 2.6 was that it is essential to com-
municate the modifications of the register address map (programming model) to
every group involved in the FPGA design process. As mentioned, this principle is
followed in the context of one integration team, the submodule teams and the SW
team. However, this is not applied between the different variant integration teams
which can also be observed from Figure 17. Thus, the integration teams do not
clearly communicate with each other on which versions of the IP blocks they are
using in the corresponding FPGA variants.

These two issues, i.e. the lack of a leader for the integration teams and the lack of
communication between the integration teams, are the main causes for the problems
in maintaining the PMs synchronized over all FPGA design variants. The lack of
the team leader enables the integration teams to develop the FPGA designs indepen-
dently, inevitably leading to the situation where the different variants, which should
comply with the same UPM, use different versions of the same common IP blocks. The

lack of communication in turn prevents the teams from fixing this because they do
not systematically share the information on the used IP blocks.

3.2 Proposed improvements and motivation


As it was motivated in chapter 2.5, utilizing a sophisticated SW tool such as OpenCL
or RIFFA for creating and maintaining the interface between the SW and the FPGA
HW is not a feasible solution. This is due to the overhead associated with such tools,
which is too high to fulfill the real-time requirements of 5G L1. It would also be very
difficult to try to develop such an interface that would satisfy these requirements,
and thus it is not attempted in this thesis. Instead, the proposed improvement is
based on the idea of making it simpler to maintain, modify and share the manually
created interface. As it was discussed in the previous section, most of the problems
in the current FPGA development flow are related to the lack of communication
between different teams, and to the lack of optimization and automation of several
phases of the design flow. This thesis strives to address both of these issues.

The work done in this thesis can be divided into two parts. The first part con-
sists of improving the current design process. These improvements are to be taken
into use quite quickly. The second part of the work introduces a more significant
modification to the design process, which is examined as a proof-of-concept. It is not
assumed that this new process will be taken into use in the near future, but the goal
is to show that the proposed process can be used in FPGA development and that it
can provide improvements to the process. These two parts are addressed separately
in the following two sections of the thesis.

3.2.1 Improvements to the current workflow


Let us first consider the improvements that could immediately be taken into use in
the development process. Two steps in the process were recognized where efficiency
can be improved:

1. The method for acquiring the base addresses of the interfacing register banks.

2. The generation of the UPM based on the PM XML files of the design variants.

Currently, the base addresses are written manually to the PM generation script.
They are defined in an Excel table that is formed based on the platform designer.
This is unnecessary manual work, because when the top-level design of the L1 HW is
constructed in the platform designer, a .qsys file, that contains all the base addresses
and the associated register banks, is generated. In addition to being slow, writing
the base addresses to the script manually is quite prone to errors. Thus, it would be
faster and less error-prone to fetch the base addresses from the .qsys file with a script.
Therefore, a Python script that reads the base addresses from the .qsys file based on
the names of the components defined in the C shell script was created. However, the

component names defined in the C shell script and the component names in the .qsys
file differ considerably, and currently the mapping between these names is hard-coded
in the Python script. This mapping could perhaps be replaced with an algorithm that
directly finds the base addresses based on the component names. A string distance
algorithm, for example the edit distance, could be utilized for this purpose.
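A possible sketch of such a matching algorithm is shown below. It uses Python's difflib, whose similarity ratio is closely related to edit distance; the component names and the 0.4 cutoff are illustrative assumptions, not values taken from the actual scripts.

```python
import difflib

def find_base_address(script_name, qsys_components, cutoff=0.4):
    """Match a component name used in the C shell script against the
    component names parsed from the .qsys file, using string similarity
    instead of a hard-coded name mapping.

    `qsys_components` maps .qsys component names to their base addresses.
    Returns None when no name is similar enough.
    """
    target = script_name.lower()

    def ratio(name):
        return difflib.SequenceMatcher(None, target, name.lower()).ratio()

    best = max(qsys_components, key=ratio)
    return qsys_components[best] if ratio(best) >= cutoff else None
```

A stricter variant could additionally require the best match to beat the second-best by a clear margin, to catch ambiguous names early.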

The process of generating the UPM is also performed quite manually even though
it could be more automatic. The first step in the UPM generation process is that
the variant integration teams provide the PM XML files of each variant for the
UPM generation. As mentioned earlier, one of these variants can be considered
as a “master” variant, and each of the interfacing register banks of that variant
will be included in the UPM. Thus, all the register banks that are not included
in the master variant, but can be found from one or more other variants must be
searched. Currently, this is done manually by reading through all the PM XML
files, and copying and pasting every found XML register bank description (that is
not in the master variant) to a separate XML file. This new file and the master
variant XML file are then merged together with a C shell script to form the final UPM.

The UPM creation process was improved by creating a Python script that au-
tomatically reads through all the PM XML files of the variants, searches the desired
register bank descriptions, and writes them to an XML file that can subsequently be
used as an input for the C shell script. Currently, the names of the desired register
banks are hard-coded to the Python script, instead of every time searching for the
differences between the XML file of the master variant and all the other variants.
However, this does not make the implementation less flexible because it is known that
new register banks are only rarely added to or removed from the design variants.
Therefore, the hard-coded register bank names need to be modified only rarely.
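The core of this script can be sketched as follows. The bank names are invented for the example, and the namespaces of the real IP-XACT files are omitted for brevity:

```python
import xml.etree.ElementTree as ET

# Names of the register banks that exist in other variants but not in the
# master variant (hard-coded, like in the actual script; these particular
# names are invented for the example).
EXTRA_BANKS = {"Type_2_only_BLK", "Type_3_only_BLK"}

def collect_extra_banks(variant_xml_texts):
    """Scan every variant PM XML and collect the listed register banks so
    that they can be written to the XML file which is then merged with the
    master variant's XML."""
    found = {}
    for text in variant_xml_texts:
        for block in ET.fromstring(text).iter("addressBlock"):
            name = block.findtext("name")
            if name in EXTRA_BANKS:
                found.setdefault(name, block)  # keep first occurrence only
    return found
```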

3.2.2 The new approach


Let us now describe the proposed new approach for the FPGA development process.
As mentioned in chapter 3.1, the communication between the variant integration
teams is not as organized as it should be. This causes issues because according to
the specifications of the UPM, the register banks of the common IP blocks should be
the same for every variant, and also every common IP block should be in the same
version for every variant. This is difficult to achieve if the variant integration teams
independently produce new bitstreams and send the corresponding PMs to the UPM
generation team without organized communication. In the proposed method, it is
required that the integration teams communicate with each other, but the objective
is to make the communication as effortless as possible.

Because the UPM is a combination of all the register banks of every IP block
of the 5G L1 top-level design, it is desired to develop a simple method for keeping the
PM of one common IP block synchronized over all design variants. This can then be
easily expanded to the whole UPM, as the same method can be applied to every com-

mon IP block of the 5G L1 design. The variant-specific IP blocks of the design do not
need to be synchronized because they are not shared between all the different variants.

It should be noted that implementing the functionality that will be using the
information provided by the added registers or register fields could cause problems
for some teams, because taking the new functionality into use may conflict with the
internal connections between the IP blocks in different variants. However, this problem
is out of the scope of this thesis, because solving it would require knowledge of the
intended purpose of each register included in the UPM, which is known only by the
HW developers who are developing the corresponding feature. This problem can be seen
as a manifestation of the issue that the submodule teams do not develop the IP blocks
in the context of the top-level designs, even though this would be recommended
according to the best practices discussed in chapter 2.6.

To keep one IP block and the corresponding register bank synchronized between
different HW design teams, there must be a simple and effective method to transfer
data between the teams. In the proposed solution, the data transfer is implemented
with a single top-level Excel file. This file contains the description of every register
bank, along with the base address and the “scope” of each register bank. In this
context, the scope of a register bank refers to all the design variants that
use the register bank. In more detail, the process can roughly be described with the
following steps:

1. One integration team desires to take into use a newer version of a common IP
block, provided by the corresponding submodule team. The other teams must
review this proposed modification via the top-level Excel file.

2. If every integration team accepts the proposed modification, it is merged to
the top-level Excel file, which is then pushed into a version control system.

3. All the other teams can now fetch the top-level Excel file from the version
control, extract the modified register bank, and generate new XML and VHDL
files based on the modified register bank.

It should be emphasized that the above process must be followed only for the common
IPs. If an integration team desires to take a newer version of a variant-specific IP
block into use, this can be done without acceptance from the other teams. However,
the modified register bank of a variant-specific IP must still be merged to the top-level
Excel file because the UPM is generated based on it. For this process to be effective,
it is essential that the top-level Excel file is stored to a version control system, and
that there are proper and systematic review practices applied to it. A problem
related to this is that the current FPGA design workflow does not include proper
code review, and thus it would be important that such a review system is introduced

to the design flow.

In practice, the solution was implemented and tested as a proof-of-concept method.
This means that it may not be directly applicable to the FPGA development process
as such, but it should still prove that the developed method can be applied to the
process after the method has been tailored for it. Further discussion and analysis of
this developed method is presented in chapter 4 of this thesis.

Let us now consider the implementation of the new process in more detail. The first
issue noted in the process was that viewing and modifying the Excel files, which may
in some cases be considerably large, is often quite clumsy.
This is especially true for the top-level Excel file that may consist of the data included
in tens of individual Excel files, each describing a register bank of a separate IP block.
In addition, it is relatively complicated to develop scripts for parsing or modifying
the data stored in Excel files. Therefore, it was concluded that there should be a
better format for storing the information of the interfacing registers, and it was
decided to examine the usage of JavaScript Object Notation (JSON) files for this purpose.

JSON is, as the name implies, based on the JavaScript programming language,
and it is a data format that can be used to store data in an organized and easily
accessible manner. JSON is relatively easy for humans to read and write, but at the
same time it is also simple for a computer to access and parse. JSON consists of two
fundamental structures: a collection of name-value pairs, which can be compared
to for example dictionaries or objects in traditional programming languages, and
ordered lists of values, which can be compared to lists or arrays. [58]
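A short example of these two structures, handled with Python's json module in the way the register bank files are meant to be used (the content mirrors the register bank examples later in this chapter):

```python
import json

# A register bank sketched with JSON's two fundamental structures:
# name-value collections (objects) and ordered lists (arrays).
bank = {
    "addressblock": "Example001_BLK",
    "registers": [
        {"register": "Register_1", "address": "0x0", "dim": 1},
    ],
}

serialized = json.dumps(bank, indent=2)  # readable text for version control
restored = json.loads(serialized)        # parsed back for use in scripts
```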

Figure 18 shows an example of an Excel table that contains the description of
the interfacing registers towards the L1 SW. As can be seen, there are a total of 16
different fields for storing information. The fields of greatest interest are “address”,
“register”, “field”, “start”, “stop”, “dim”, “rw” and “default”. These are the fields
that are used to define each register and their properties. In the first column of the
Excel table, some general properties of the register bank are defined. As an example,
in the cell A5 it is defined that this register bank uses the AXI4-Lite protocol for
interfacing with the ARM cores. Another issue to note is the text “Type_1” in the
cell P7. This implies that the IP block corresponding to this register bank is specific
to only one FPGA variant, which is here denoted as “Type_1”. Another possibility is
“VARIANT_COMMON”, which would imply that the IP block is common to every
FPGA variant. In addition, the cell P2 defines the base address of this component.
This information in the cells P2 and P7 is required for parsing the top-level Excel file
and for synchronization between the integration teams and the top-level Excel file.

Figure 18: An example of an Excel file describing the interfacing registers.

The Excel table in Figure 18 defines three interfacing registers: “Register_1”, “Regis-
ter_2” and “Register_3”. These are defined in the column F of the table. In column
E, the address of each register is specified. However, these are not the absolute
addresses of the registers, but rather offsets to the base address of the register bank.
For example, the absolute address of “Register_2” is <base address> + 0x040. In
column J, the dimension of a register is defined. This can be used to easily define a
varying number of identical registers. In this case, the dimension of every register is
1, so there is only one of each register. If the dimension of some register were, for
instance, 3, three identical registers would be generated.
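The expansion implied by the dimension column could be sketched as follows. The instance naming and the 32-bit (0x4) address stride between copies are assumptions here, as the actual generator's layout rules are not described in this thesis:

```python
def expand_register(name, offset, dim, stride=0x4):
    """Expand one register row into `dim` identical register instances,
    each at its own address offset within the bank."""
    return [(f"{name}[{i}]", offset + i * stride) for i in range(dim)]
```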

In column G of the Excel table, the different fields of a register are defined. A
field is a certain range of bits of the register. For example, the register “Register_1”
consists of three fields: “Register_1_1”, “Register_1_2” and “Register_1_3”. The
range of bits that each field covers can be seen in the columns H and I. By looking at
these columns, it can be seen that each register is 32 bits wide. For the register “Reg-
ister_1”, bits 31-21 are covered by the field “Register_1_1”, bits 20-11 are covered
by the field “Register_1_2”, and bits 10-0 are covered by the field “Register_1_3”.
The field coverage areas for registers “Register_2” and “Register_3” can be found
similarly.
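The start/stop columns translate directly into bit masks. The following sketch shows how SW could extract the fields of “Register_1” from a 32-bit register value, using the bit ranges given above:

```python
# Bit ranges (start = MSB, stop = LSB) of Register_1's fields from Figure 18.
FIELDS = {
    "Register_1_1": (31, 21),
    "Register_1_2": (20, 11),
    "Register_1_3": (10, 0),
}

def field_mask(start, stop):
    """Bit mask covering bits start..stop of a 32-bit register word."""
    return ((1 << (start - stop + 1)) - 1) << stop

def extract_field(word, name):
    """Value of one field within a full 32-bit register value."""
    start, stop = FIELDS[name]
    return (word & field_mask(start, stop)) >> stop
```

Note that the three masks together cover all 32 bits of the register, i.e. the fields partition the register word without gaps or overlaps.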

Finally, the columns K and L define the access type and the default value of the
register fields. The access type defines the type of operations that the SW can do to a
register. Some common types are displayed in Figure 18, namely read-write (RW, the
SW has both read and write access), write-only (WO, the SW has only write access),
and read-only (RO, the SW has only read access). It can be seen that all fields of the
register “Register_1” have RW access, all fields of the register “Register_2” have
WO access, and the field “Register_3_1” has RO access. In column L, the default
value of each field is defined.

Considering the Excel table in Figure 18, it can be seen why JSON is a suitable data
type for storing information about the interfacing register banks. The fields in the
first row of the Excel table can be seen as properties that belong to the register bank.
Every field is associated with a varying number of values, and some of these values

may further be associated with other values. As an example, the field “register” is
associated with three registers, and each of them is further associated with at least
an address, dimension and some number of fields. Each of these fields is in turn
associated with other values. The two fundamental data structures of JSON can be
well applied to such information relations.

As an example of the usability and format of the developed JSON description,
the JSON file corresponding to the Excel table shown in Figure 18 is presented below
in Listing 1.
Listing 1: JSON file corresponding to the Excel table in Figure 18.
IF1_registers :
{
//group : [Example001, AddressUnitBits=8, LeastAddressableUnit=8,
reg_gen_options= -protocol AXI4-Lite, IF_NAME=AXI_Lite_Slave_0],
addressblock : Example001_BLK,
registerfile : ,
range : 0x500,
address : 0x0,
registers : [
{register : Register_1,
fields : [
{field : Register_1_1,
start : 31,
stop : 21,
rw : RW,
default : 0x0,
function : ,
description : ,
comment : },

{field : Register_1_2,
start : 20,
stop : 11,
rw : RW,
default : 0x0,
function : ,
description : ,
comment : },

{field : Register_1_3,
start : 10,
stop : 0,
rw : RW,
default : 0x0,

function : ,
description : ,
comment : }
],
address : 0x0,
dim : 1,
function : ,
description : Example description,
comment : },

{register : Register_2,
fields : [
{field : Register_2_1,
start : 31,
stop : 1,
rw : WO,
default : 0x0,
function : ,
description : ,
comment : Example comment},

{field : Register_2_2,
start : 0,
stop : 0,
rw : WO,
default : 0x0,
function : ,
description : ,
comment : }
],
address : 0x040,
dim : 1,
function : ,
description : ,
comment : },

{register : Register_3,
fields : [
{field : Register_3_1,
start : 31,
stop : 0,
rw : RO,
default : 0x0,
function : ,
description : ,

comment : }
],
address : 0x080,
dim : 1,
function : ,
description : ,
comment : }
],
hdl_path : ,
function : ,
description : ,
comment : Type_1,
base : 0x00300000
}

It can be seen from Listing 1 that the JSON file contains exactly the same infor-
mation as the corresponding Excel file. However, the base address in the JSON file
is associated with a separate field “base”, even though in the Excel file it is stored
under the comment field. This approach has been selected because the comment field
may contain much more information than just the separation between specific and
common blocks, and thus parsing the base address from the comment field would be
prone to errors.

Let us now consider the actions that are required from the integration teams. In the
proposed new method, this process begins with a team fetching the top-level Excel
file from the version control. Afterwards, every sub-Excel file that is found from the
top-level Excel file is extracted to a separate Excel file and all these files are stored
to a specified directory. The next step in the process is to convert every extracted
Excel file into a JSON format. This is done to enable the development and usage of
simple and robust scripts for accessing and modifying the register banks.
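The Excel-to-JSON conversion essentially groups flat rows (one per field) into nested register objects. A simplified sketch, assuming dictionaries that mirror the relevant Excel columns:

```python
def rows_to_bank(rows):
    """Group flat, Excel-like rows (one row per register field) into the
    nested register/field structure used in the JSON register bank files.
    A non-empty 'register' cell starts a new register, as in the Excel table."""
    registers = []
    for row in rows:
        if row.get("register"):
            registers.append({"register": row["register"],
                              "address": row.get("address", ""),
                              "fields": []})
        registers[-1]["fields"].append({"field": row["field"],
                                        "start": row["start"],
                                        "stop": row["stop"]})
    return {"registers": registers}
```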

After all the JSON files have been generated, the next step in the process con-
tains synchronization between the top-level Excel file and the PM that is currently
used by an integration team. Since all the integration teams have accepted the
modifications in the review of the top-level Excel file, an integration team naturally
wants to replace the old IP blocks and the register banks with the modified ones. To
achieve this, a script for comparing two JSON files was developed.
This script is used such that every JSON file that was generated based on the top-
level Excel file is compared with a corresponding old JSON file that was produced
when the integration team had previously fetched the top-level Excel file from the
version control. If a difference between the two JSON files is recognized, the old
JSON file is replaced by the new JSON file (similarly for Excel files), and a new
XML register bank and a VHDL register bank are generated based on the new Excel
file. These can then be used for the generation of the PM XML file and the bitstream.
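The essence of such a comparison can be sketched in a few lines: because the files are parsed before comparing, whitespace, line breaks and key order cannot cause false differences:

```python
import json

def bank_changed(old_text, new_text):
    """Structural comparison of two JSON register bank descriptions.
    Only the content matters; formatting differences are ignored."""
    return json.loads(old_text) != json.loads(new_text)
```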

The previous step illustrates one of the advantages of using JSON format instead of
Excel format for storing the information about the register banks. The comparison
of the register banks is considerably more robust and simpler to implement using
JSON files, compared to using Excel files. The file format of JSON is very clearly
structured and organized, and for example empty spaces or empty lines in the JSON
file have no effect on how a computer interprets the file. On the other hand, the
data format in an Excel file is much looser. Only a few empty rows or columns
in a specific place could cause the comparison algorithm to fail. In addition, JSON
files are more lightweight than Excel files. Due to these advantages of JSON files
over Excel files, it would be desired to use only the JSON files in the development
process. However, the script that produces XML and VHDL files based on the Excel
files uses tools provided by a company called Magillem, and these tools require that
the input of the script is an Excel file in certain format. It would require much effort
to create a script that would convert the JSON file directly to the XML and VHDL
files, and therefore the implementation of such a script is out of the scope of this
thesis. However, it could be done as a future improvement.

Even though the JSON files offer several advantages compared to the Excel files, they
contain also certain disadvantages. The register bank files are designed to be read
and modified by humans, and therefore it is highly important that they are simple
to read, interpret and modify. This is one area where the Excel format outperforms
the JSON format. With JSON files, the user must take care that in addition to text,
also parentheses, quotation marks and commas are formatted correctly so that they
support the JSON format. Thus, to completely replace the Excel files with JSON
files, it would be desired to develop a user interface for modifying and viewing the
contents of the files.

Now that the differences between the top-level Excel file and the previously used
register banks have been synchronized, the integration team can, if desired, propose
new modifications for the IP blocks to other teams. After that, the final step in the
process is to merge the (possibly) modified Excel files to the top-level Excel file. Each
Excel file consists of four sheets, and each sheet must be merged to the corresponding
sheet of the top-level Excel file. To make this step effortless, a Python script was
created for merging the Excel files to the top-level Excel file.

All the previously described steps of the new development process are combined into
two Python scripts. The first one implements all steps from fetching the top-level
Excel file from version control to generating the necessary XML and VHDL files
based on the comparison of the JSON files. The other script merges all the
desired Excel files into the top-level Excel file. This makes it very straightforward
for an integration team to perform these steps, enabling the team to focus on more
complicated work. In summary, these two top-level Python scripts perform the
following steps:

1. Extract every sub-Excel file in the fetched top-level Excel file into a separate
Excel file.

2. For each extracted Excel file, generate a corresponding JSON file.

3. Compare each produced JSON file with the older version of the same JSON
file that is stored in the local directory of the integration team.

4.1. If differences are found in the comparison and the JSON file represents a register
bank that is used in the specific design variant, generate the XML and VHDL
description of the register bank based on the corresponding Excel file.

4.2. If there are no differences, or the JSON file represents an IP block that is not
used by the integration team, move on to the next JSON file.

5. When all JSON files are compared, replace all the old JSON files with the new
JSON files.

6. Merge all the Excel files together to form the top-level Excel file.
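
Steps 3 and 4 above reduce to a single decision per register bank. The sketch below is a stand-alone approximation using only the standard library; the directory layout and the set of blocks used by a variant are assumptions for illustration.

```python
import json
from pathlib import Path

def needs_regeneration(new_json_path, old_dir, used_blocks):
    """Decide whether XML/VHDL must be regenerated for one register bank.

    Step 4.2: skip blocks this variant does not use, and unchanged files.
    Step 4.1: regenerate when the JSON content differs from the stored copy.
    """
    new_path = Path(new_json_path)
    if new_path.stem not in used_blocks:
        return False                       # step 4.2: block not used by this team
    old_path = Path(old_dir) / new_path.name
    if not old_path.exists():
        return True                        # no stored version yet -> generate
    new_data = json.loads(new_path.read_text())
    old_data = json.loads(old_path.read_text())
    return new_data != old_data            # step 4.1: regenerate only on changes
```

Comparing the parsed objects rather than the raw text makes the check robust against differences in whitespace or key order that do not change the register bank itself.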

Finally, the team that created the top-level Excel file pushes it to the version control
system where other teams can easily review it and pull it to their local working
directories for modification and synchronization. This process is repeated as the
development proceeds.

The activity diagram in Figure 19 illustrates the proposed new workflow. For
the sake of simplicity, the diagram shows an example workflow that includes
only two variant integration teams, denoted team A and team B. The
diagram describes a use case where team A first takes a newer version of a common
IP block into use, and team B subsequently updates its local design to comply
with the modified top-level Excel file. In addition, there is also a UPM team that
finally generates the UPM covering both variants. The diagram in Figure 19 is
color-coded similarly to the diagram in Figure 17.

[Figure 19 shows an activity diagram. The submodule teams create the IP blocks and
their Excel files, convert the Excel files to XML and VHDL files, and store the files
to the IP database. Integration teams A and B fetch the top-level Excel file from
version control, extract the Excel files, generate JSON files and use them to
synchronize their local working directories, and then integrate the IPs and compile.
Team A takes an updated common IP block into use, updates the top-level Excel file,
requests a review from team B, and pushes the top-level Excel file to version control
once team B accepts the changes. Finally, the UPM team fetches the base addresses
from the top-level Excel file, adds them to the UPM generation script, and generates
the UPM. The steps are marked as manual work, partly automated with a script, or
automated with a script.]
Figure 19: Activity diagram of the proposed workflow.



Let us compare this diagram to the activity diagram in Figure 17. The upper part of
the diagram is identical to that of Figure 17, describing the submodule teams and
their workflow. The lower part, on the other hand, shows the automated steps of
synchronization between the top-level Excel file and the local development
environments of the integration teams. The most significant difference is that there
is now organized communication between integration teams A and B via review of the
top-level Excel file. As can be seen in the bottom left corner of the diagram, team A
does not take the modifications to the common IP block into use before the changes
have been reviewed and accepted by team B.

Another observation is that the integration teams no longer provide PM XML files
as in the diagram in Figure 17. The reason is simply that the PM XML files of the
different design variants are no longer needed, because the top-level Excel file
describes the UPM directly. This clearly illustrates that the PMs of the variants are
not needed as such; their main purpose is to serve as "tools" for the generation of
the UPM. This also holds for the original FPGA workflow. Because of these
differences, the UPM is not generated by merging XML files together as in the old
process. Instead, the UPM generation script in the proposed method is the same C
shell script that was used for generating the PM XML files of the design variants in
the old process. The only difference is that in the new process, the inputs of the
script include the XML files corresponding to every sub-Excel file included in the
top-level Excel file. This also means that the C header files, which were previously
generated by the L1 SW based on the UPM (see chapter 3.1), can now be generated
directly by the HW teams based on the top-level Excel file. As can be seen from
Figure 19, the base addresses for the UPM generation script can be acquired from the
top-level Excel file with a script, and the generation of the UPM has also been fully
scripted.

4 Measurements and discussion


This chapter consists of two parts. In the first part, the proposed improvements
to the current workflow are further analyzed, and measurements related to them
are presented. The second part concentrates on critically evaluating the proposed
proof-of-concept method. The advantages and disadvantages of the method are
examined, and the reason for implementing this method instead of other alternatives
is also explained.

4.1 Evaluation of the improvements


In this section, the two improvements to the current workflow, discussed in chapter
3.2.1, are examined further. The main idea of both improvements is that a part
of the process that was previously performed manually is performed with a script in
the proposed workflow. To demonstrate the improvements provided by the
modifications, these steps of the process were performed both with the current
method and with the improved method. Based on these tests, the provided enhancements
are discussed.

The main advantage of the proposed improvements is that the time required to
perform the actions was reduced significantly. Below are two tables, each illustrating
the reduction of the execution time of the corresponding phase of the design flow.

Table 3: Time to write base addresses to the PM generation script.

    Method to find base addresses     Time
    Script                            4 s
    Address Excel file                17 min
    Platform Designer                 23 min
    Manually from .qsys file          19 min

Table 4: Time to generate the UPM based on the variant PMs.

    Method to form the variant XML file    Time
    Manual                                 13 min
    Script                                 4 min

Table 3 shows the execution times of gathering the base addresses and writing them to
the PM generation script. Four different methods of gathering the base addresses
were examined: using the developed script, using a separate Excel file that contains
all the base addresses, using the address map table found in Platform Designer,
and using the .qsys file generated by Platform Designer. All the base addresses
were removed from the PM generation script before measuring the execution times.

As can be seen from the table, the method that uses the script for fetching the
base addresses is by far the fastest. The problem with the other methods is that
it is quite slow for a human to search for the correct component in a given source.
The address map table of Platform Designer, in particular, is complicated and
difficult for a human to read. Another advantage of using the script is that it is
less prone to errors than manually reading the addresses from a source file.
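
The idea behind the script can be sketched with the standard library: the .qsys file is XML, so the base address of each connected register bank can be collected in a single traversal. The fragment below is heavily simplified; the element and attribute names are assumptions modeled loosely on the .qsys format, not its exact schema.

```python
import xml.etree.ElementTree as ET

def base_addresses(qsys_text):
    """Map each connection end point to its base address."""
    result = {}
    for conn in ET.fromstring(qsys_text).iter("connection"):
        for param in conn.iter("parameter"):
            if param.get("name") == "baseAddress":
                result[conn.get("end")] = int(param.get("value"), 16)
    return result

# Simplified, hypothetical .qsys-like fragment.
SAMPLE = """\
<system>
  <connection kind="avalon" start="cpu.data_master" end="regbank_a.s0">
    <parameter name="baseAddress" value="0x00040000"/>
  </connection>
  <connection kind="avalon" start="cpu.data_master" end="regbank_b.s0">
    <parameter name="baseAddress" value="0x00050000"/>
  </connection>
</system>
"""
```

Because the script reads the same machine-generated file every time, it cannot misread an address the way a human scanning the address map table can.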

Table 4 shows the measured time it takes to create the UPM. This time
consists of two parts: generating the XML file that contains all the register banks
that are not included in the master variant (the "variant XML file"), and executing
the script that merges the generated XML file with the XML file of the master
variant. The variant XML file was generated both by using a script and by manually
copying the register bank information from separate XML files into the variant XML
file. As can be seen from the table, the process is clearly faster when the script is
used.
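
The merging step can be approximated with the standard library: register banks present in the variant XML file but absent from the master file are appended to the master tree. The tag and attribute names below are placeholders for illustration, not the in-house XML schema.

```python
import xml.etree.ElementTree as ET

def merge_variant_banks(master_xml, variant_xml, tag="registerBank"):
    """Append variant register banks that the master file does not yet contain."""
    master = ET.fromstring(master_xml)
    known = {bank.get("name") for bank in master.iter(tag)}
    for bank in ET.fromstring(variant_xml).iter(tag):
        if bank.get("name") not in known:
            master.append(bank)
    return master

# Hypothetical inputs: the common bank exists in both files, "b_only" only in
# the variant file, so only "b_only" is appended to the master tree.
MASTER = '<upm><registerBank name="common_a"/></upm>'
VARIANT = '<upm><registerBank name="common_a"/><registerBank name="b_only"/></upm>'
merged = merge_variant_banks(MASTER, VARIANT)
```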

4.2 Discussion of the proof-of-concept method


In this section, the proposed new method described in chapter 3.2.2 is analyzed in
more detail. There are quite significant differences between the proposed workflow
and the original workflow, as can be observed from Figures 17 and 19. Based on
the figures, it may seem that the new process is actually more complicated than the
old process, and from a certain viewpoint, it is. However, looking at the FPGA
development process as a whole, the new method makes it more straightforward and,
most importantly, ensures that the programming models are always synchronized.
This is highly important because if the UPM is not compatible with every bitstream
variant, the problems transfer from the L1 HW team to the L1 SW team. This is
the scenario that should be avoided, even at the cost of complexity in the FPGA
development process.

Let us consider the advantages of the proposed new method. Because the main goal
of the process is to ensure that every released bitstream is always compatible with
the UPM that is in use, the new process will first be considered from this viewpoint.
As described in chapter 3.2.2, two scenarios of programming model modification
exist: a modification of a variant-specific IP block and a modification of a common
IP block. The first issue to note is that the proposed method assumes that every
modification to a register bank complies with the open-close principle introduced
in chapter 3.1. Modifications that do not comply with this principle require special
arrangements, and they are out of the scope of this thesis. From now on, every
modification discussed in this thesis is assumed to comply with the open-close
principle, and this will not be stated separately.

Let us first discuss a modification to a variant-specific IP block. Because each
variant-specific IP is used by only one integration team, that team is the
only one that needs to take an updated version of the IP block into use. Therefore,
the team may independently take the new version of the IP into use, generate and
release a new bitstream using it, and merge the modified register bank into the
top-level Excel file. The other integration teams are not involved in this process.
Even though the UPM will be updated by such a modification (because the top-level
Excel file is updated), this will not cause conflicts with the bitstreams, because
the bitstreams of the other FPGA design variants do not use the variant-specific
register bank that was modified.

For updates to the common IP blocks, the situation is more complicated, because
every bitstream variant uses the same common IP blocks. Therefore, an integration
team cannot independently take an updated common IP into use. Instead, it must
communicate to the other integration teams that it desires to use a new version
of the IP block. The idea of the top-level Excel file is to make this communication
as straightforward as possible. The team may merge the proposed updated register
bank Excel file into the top-level Excel file, and ask the other integration teams to
review and accept it. With the top-level Excel file, it is effortless for all the
teams not only to review the actual proposed change, but also to check whether it
conflicts with any of their variant-specific IPs or planned updates to them.

The new process introduces several advantages, but it still includes communication
between the integration teams, which makes the workflow more complicated for
them. Even though the teams would prefer to work independently, communication is
necessary because common IP blocks cannot be taken into use without synchronization
between the integration teams. The synchronization causes problems because
the FPGA design variants differ considerably, and the development of these variants
does not proceed at the same pace. Therefore, the communication may slow down
the FPGA development, but it is a compromise that must be made to ensure that
the UPM can be used to develop SW for every FPGA design variant. This is the first
priority in the FPGA development.

The fundamental reason for the need for communication is that to make the
development fluent, the variant integration teams must agree on the used
common IP blocks before each bitstream variant is generated. If, on the other hand,
the UPM generation team strives to generate the UPM based on the PM XML
files that the integration teams have produced without synchronization, the problem
of unequal common register banks will eventually occur. This will require one or
more integration teams to take a different common IP block into use, regenerate the
bitstream, redo several functional verifications and so on, which will significantly
slow down the process.

The above problem can be avoided by synchronizing the used common IP blocks
before the bitstream is generated. The most reasonable way to do this is through
manual communication between the variant integration teams. Even if there were a
script that compared the IP blocks of one integration team to the IP blocks of
all the other teams before the integration team uses the IPs to create a new
bitstream, the process would still work quite poorly, because the script would not
know which phase of development the other teams are in. For example, some team might
plan to release a bitstream with a different common IP in the very near future.
In addition, when one integration team takes a new version of a common IP block
into use, the script would notify the other teams that the versions of the common
IP block do not match, which would again require synchronization between the teams.

Even though it was observed that an FPGA development process without
synchronization between the teams would be very difficult, to say the least, this
does not mean that major improvements cannot be achieved with technical methods. As
an example, the HW teams are currently developing a method to automatically
generate two design variants based on only one design. The idea is that one
integration team would use Platform Designer to create a design that contains all
the common IP blocks and the specific IP blocks of two different (but quite similar)
variants. Afterwards, the .qsys file, which is essentially a text-based description of
the connections between the IP blocks, could be modified with a script such that
the two different FPGA variants would be "extracted". Because the variants differ
only in the variant-specific IP blocks they use, the extraction can, in principle, be
done by simply removing certain IP blocks and connections from the .qsys file.
However, it has been noted that this approach cannot be used to generate all the
design variants from one master variant, because the differences between the variants
are too significant. Thus, this method will not remove the need for communication
between the teams.

In addition to the new method described in chapter 3.2.2, a slight variation of
this method was also considered. It is based on the same idea as the
proposed method, with the exception that the top-level Excel file would be
maintained by the submodule teams instead of the integration teams. However, after
some consideration this method was discarded: even though it would solve
some of the problems introduced by the proposed method, it has severe
issues that make it infeasible for the FPGA development process.

The most notable advantage of having the submodule teams maintain the top-level
Excel file is that there would be no need for synchronization between the integration
teams via the top-level Excel file. The submodule teams could simply develop new
versions of the IP blocks independently and update the top-level Excel file whenever
a new version of an IP block is released. It would then be known that, as all the
integration teams develop the corresponding bitstreams based on the top-level Excel
file, every bitstream release would, in theory, comply with the UPM. This process
would require very little synchronization between the different teams associated
with the FPGA development, which is highly desirable for the FPGA developers.

However, by considering the specifications of the UPM listed in chapter 3.1, it can
be seen that certain requirements are difficult to achieve if the top-level
Excel file is maintained by the submodule teams. The first evident issue is
the requirement that when a common IP block is modified, a new bitstream should
be released with modifications to only that IP. In practice, this would be difficult
to achieve as the submodule teams work independently. If a submodule team that
develops a common IP block releases a new version of the IP, it does not know
whether some other team has released a newer version of some other IP block after
the previous bitstream was released. To overcome this problem, the submodule
teams would have to synchronize with each other, which is exactly what should be
avoided.

Another problem is that the integration teams could not work as independently
as before. For example, let us consider a situation where one integration team desires
to release a new bitstream that contains a newer version of an IP block that is specific
only to the corresponding design variant. Even if this newer version of an IP block
has been released by the corresponding submodule team, the integration team may
not be able to release a new bitstream if a modification has also been done to some
common IP block. In this case, the integration team should somehow know whether
all other integration teams are already using the new version of the common IP.

The above problems could be avoided if it could be assumed that there is a notable
time interval between the version releases of the IP blocks. In this scenario, the
integration teams would create a new FPGA design bitstream after each release of
a new IP block version. However, this is not a feasible assumption. In practice,
there are numerous submodule teams working independently, and it is thus highly
unlikely that there would always be enough time between two subsequent IP block
updates. For these reasons, this process was not examined further in this thesis.

5 Conclusion and future improvements


This thesis addressed the challenges in the FPGA development process at Nokia.
Specifically, the problem of synchronizing the interfacing register maps of the IP
blocks over every design variant was examined. Keeping the register maps
synchronized is highly important, as they are the only abstraction of the FPGA
designs that is visible to the SW developers. Thus, differences in the register maps
translate directly into problems for the SW developers, which should be avoided.

Chapter 2 of this thesis first briefly discussed mobile communications and the
relevant technological aspects of 5G networks. In particular, it was motivated that
the strict real-time requirements of 5G L1 support the usage of HW instead of SW
for implementing the necessary computations. Afterwards, the general architecture
and usage of FPGAs were presented, and it was discussed why FPGAs provide an
attractive option for 5G HW implementation. Finally, the general requirements and
architectural aspects of interfacing between HW and SW were presented.

In chapter 2.5, the topic of SW development for reconfigurable HW was examined.
It was observed that in recent years, different tools for integrating
FPGAs into traditional SW development have emerged. Two such tools,
namely OpenCL and RIFFA, were discussed in more detail. The main idea of these
tools is that they move the development to a higher abstraction level, so that the
developer does not have to consider the low-level interconnections between the SW
and the FPGA design. However, the common problem with these tools is that they
introduce relatively high latency that cannot be tolerated in a 5G L1 implementation.

Chapter 2.6 presented some general guidelines for team-based FPGA development.
It was observed that many companies involved in FPGA development share the
same challenges in the development process. Two essential parts of team-based
design were recognized: a single team leader who takes care of the top-level
planning and integration, and the other team members who are responsible for the
IP block development. Including SW in the development process complicates it
further, because the register address map must be maintained across several parts of
the development team. To keep the process organized, it is highly important to
strictly manage the register map and to communicate every change of the map to
every group associated with the embedded system development process.

The current FPGA development workflow and the proposed improvements were
discussed in chapter 3. It was noted that even though the workflow associated with
generating a single FPGA design variant follows the best practices of team-based
FPGA design, the overall workflow including all the different variant integration
teams and the UPM generation team does not. This is due to the lack of a single team
leader for the integration teams and the lack of necessary communication between the
integration teams. These problems were addressed by creating a top-level Excel file,
which serves as a common interface for all the integration teams to modify the
register banks and to communicate the modifications to the other integration teams.
In a way, the top-level Excel file can be seen as a replacement for the single team
leader that was concluded to be missing from the current workflow. In addition to
this major modification, two minor steps in the current workflow, namely the
acquisition of the base addresses of the interfacing register banks and the
generation of the UPM based on the PM XML files of the design variants, were
automated with scripts.

In chapter 4, the proposed improvements were analyzed and alternative approaches
were discussed. For the minor improvements to the old workflow, it was measured
that automating the manual steps with scripts reduced the execution times of those
steps significantly. In addition, both of the improved steps were previously quite
error prone due to manually performed repetitive tasks. For example, the files in
which the base addresses are available are quite difficult for a human to read, which
may result in associating a base address with the wrong register interface. The
manual generation of the UPM, in turn, requires the developer to carefully copy and
paste the XML description of each register bank that is used in some variant other
than the master variant into a specific file. In this process, it is quite easy to
either copy an incorrect XML description or to omit some description altogether.
With the developed scripts, such human errors do not occur.

For the larger modification to the development process, it was motivated that the
proposed process essentially ensures that the generated bitstreams of each variant
are always compatible with the most recent version of the UPM. However, this
requires direct communication between the variant integration teams, and it may also
slow down the process because the integration teams must always synchronize with
each other before applying IP block modifications. In addition, it was noted that
there is currently a project in development that would allow the integration teams
to generate two FPGA variants from only a single design. This would reduce the
number of register maps, making it easier to keep the bitstreams synchronized.

One important aspect of the proposed process is the effort to replace the Excel files
with the JSON file format. It was shown that the format of the register bank
information stored in the Excel files can be adapted very well to JSON, and a script
for converting an Excel file to a JSON file was developed. However, a few challenges
remain in replacing the Excel files with JSON files. The first is that a script for
converting a JSON file into an XML file should be developed. Secondly, even though
JSON is a lightweight data format that supports scripting well, modifying or reading
large register banks is still simpler in the Excel format than in the JSON format.
Therefore, a simple user interface that allows effortless review and modification of
the JSON files should be developed. These problems could be addressed as future
improvements.

In conclusion, this thesis proposed a few methods to improve the currently used
FPGA development process at Nokia, and it also described a larger update to the
development flow. The aim of these modifications was to keep the process simple for
the HW designers and to ensure that the generated FPGA bitstreams always comply
with the UPM, avoiding any issues in the L1 SW development. Measurements were
conducted to show that the minor modifications improve the design flow, and it was
also motivated that the new design flow fulfills the goals of the practical work of
this thesis. However, the new method cannot be taken into use immediately; rather,
discussions with the HW designers should be held first to agree on the practicalities
of using the process. In addition, there is potential for future improvement in
applying the JSON format more widely to the design process.
63

References
[1] LM Ericsson. Future mobile data usage and traffic growth. [Online]. [Referenced
on 28.02.2019]. url: https://www.ericsson.com/en/mobility-report/
future-mobile-data-usage-and-traffic-growth.
[2] Cisco Systems. Cisco visual networking index: global mobile data traffic forecast
update, 2017-2022 white paper. [Online]. [Referenced on 28.02.2019]. url:
https : / / www . cisco . com / c / en / us / solutions / collateral / service -
provider / visual - networking - index - vni / white - paper - c11 - 738429 .
html.
[3] Xiang, W. et al. 5G mobile communications. 1st ed. Switzerland: Springer
international publishing, 2017. 691 pp. isbn: 978-3-319-34206-1.
[4] Rodriguez, J. Fundamentals of 5G mobile networks. 1st ed. West Sussex,
United Kingdom: John Wiley and Sons Ltd, 2015. 336 pp. isbn: 978-1-118-
86748-8.
[5] Laplante, P. and Ovaska, S. Real-time systems design and analysis: tools for
the practitioner. 4th ed. New Jersey: John Wiley and Sons Ltd, 2002. 560 pp.
isbn: 978-0-470-76864-8.
[6] Hallinan, C. Embedded Linux primer: a practical real-world approach. 2nd ed:
Prentice Hall., 2011. 616 pp. isbn: 978-0-13-701783-6.
[7] Khaldoun, A. et al. Mobile and wireless networks: Volume 2. ISTE Ltd., 2016.
356 pp. isbn: 978-1-84821-714-0.
[8] Veeraraghavan, Malathi. Three planes in networks. [Online]. [Referenced
on 01.04.2019]. url: http : / / www . ece . virginia . edu / mv / edu / ee136 /
Lectures/routing-sig/cs-ps-cops.pdf.
[9] Dahlman, E. et al. 4G LTE and LTE-advanced for mobile broadband. 1st ed.
United Kingdom: Elsevier Ltd, 2011. 455 pp. isbn: 978-0-12-385489-6.
[10] 3GPP. About 3GPP Home. [Online]. [Referenced on 16.04.2019]. url: https:
//www.3gpp.org/about-3gpp/about-3gpp.
[11] 3GPP. Releases. [Online]. [Referenced on 14.01.2019]. url: http://www.
3gpp.org/specifications/67-releases.
[12] The International Telecommunication Union. About International Telecom-
munication Union (ITU). [Online]. [Referenced on 14.01.2019]. url: https:
//www.itu.int/en/about/Pages/default.aspx.
[13] Beyene, Y.D. “Algorithms, protocols and cloud-RAN implementation aspects
of 5G networks. [online].” PhD thesis. Aalto University, Department of
communications and networking, Feb. 2018. [Referenced on 30.1.2019]. ISBN
978-952-60-7912-7 (electronic).
[14] 3GPP. 5G NR: overall description. [Online]. [Referenced on 16.01.2019]. url:
https://www.etsi.org/docdeliver/etsi_ts/138300_138399/138300/15.
02.00_60/ts_138300v150200p.docx.
64

[15] 3GPP. 5G: System architecture for the 5G system. [Online]. [Referenced
on 22.01.2019]. url: https://www.etsi.org/deliver/etsi_ts/123500_
123599/123501/15.02.00_60/ts_123501v150200p.pdf.
[16] Carlton, A. The 5G core network: 3GPP standards progress. [Online]. [Ref-
erenced on 28.01.2019]. url: https://www.computerworld.com/article/
3219828 / mobile - wireless / the - 5g - core - network - 3gpp - standards -
progress.html.
[17] 3GPP. 5G NR: Medium access control (MAC) protocol specification. [Online].
[Referenced on 16.04.2019]. url: https://www.etsi.org/deliver/etsi_
ts/138300_138399/138321/15.03.00_60/ts_138321v150300p.pdf.
[18] 3GPP. 5G NR: Radio link control (RLC) protocol specification. [Online].
[Referenced on 17.01.2019]. url: https://www.etsi.org/deliver/etsi_
ts/138300_138399/138322/15.03.00_60/ts_138322v150300p.pdf.
[19] 3GPP. 5G NR: Packet data convergence protocol (PDCP) specification. [Online].
[Referenced on 17.01.2019]. url: https://www.etsi.org/deliver/etsi_
ts/138300_138399/138323/15.02.00_60/ts_138323v150200p.pdf.
[20] 3GPP. NR: Radio resource control (RRC) protocol specification (release 15).
[Online]. [Referenced on 16.04.2019]. url: https://www.3gpp.org/ftp/
Specs/archive/38_series/38.331/.
[21] Goleniewski, L. et al. Telecommunications essentials, second edition: the
complete global source. 2nd ed: Addison-Wesley Professional 2006., 2006. 928 pp.
isbn: 978-0-32-142761-8.
[22] Diniz, P. et al. Block transceivers : OFDM and beyond. 1st ed: Morgan and
Claypool cop., 2012. 184 pp. isbn: 978-1-60845-830-1.
[23] Nokia Bell Labs. 5G new radio (NR): physical layer overview and performance.
[Online]. [Referenced on 22.01.2019]. url: http://ctw2018.ieee-ctw.org/
files/2018/05/5G-NR-CTW-final.pdf.
[24] Nokia Bell Labs. 5G new radio design. [Online]. [Referenced on 22.01.2019].
url: http://www.ieeevtc.org/conf-admin/vtc2017fall/51.pdf.
[25] Lin, X. et al. 5G new radio: unveiling the essentials of the next generation
wireless access technology. [Online]. [Referenced on 16.04.2019]. url: https:
//arxiv.org/abs/1806.06898.
[26] Koch, D. et al. FPGAs for software programmers. 1st ed. Switzerland: Springer
international publishing, 2016. 327 pp. isbn: 978-3-319-26406-6.
[27] Trimberger, S. “Three ages of FPGAs: A retrospective on the first thirty
years of FPGA technology”. In: Proceedings of the IEEE. [Online]. Vol. 103:3
(2015), Pp. 318–331. [Referenced on 08.01.2019]. issn: 0018-9219. Available:
doi: 10.1109/JPROC.2015.2392104.
65

[28] Intel Corporation. Intel stratix 10 GX/SX product table. [Online]. [Refer-
enced on 23.01.2019]. url: https : / / www . intel . com / content / dam /
www/programmable/us/en/pdfs/literature/pt/stratix- 10- product-
table.pdf.
[29] Intel Corporation. Intel Stratix 10 hard processor system technical reference
manual. [Online]. [Referenced on 02.04.2019]. url: https://www.intel.com/
content/dam/www/programmable/us/en/pdfs/literature/hb/stratix-
10/s10_5v4.pdf.
[30] Schaumont, P. A practical introduction to hardware/software codesign. 2nd ed:
Springer publishing., 2013. 480 pp. isbn: 978-1-4614-3737-6.
[31] Intel Corporation. Intel Stratix 10 logic array blocks and adaptive logic mod-
ules user guide. [Online]. [Referenced on 22.02.2019]. url: https://www.
intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/
stratix-10/ug-s10-lab.pdf.
[32] Smith, D.J. “VHDL and Verilog compared and contrasted - plus modeled example
written in VHDL, Verilog and C”. In: 33rd design automation conference
proceedings, 1996. [Online] (1996), Pp. 771–776. [Referenced on 16.04.2019].
issn: 0738-100X. Available: doi: 10.1109/DAC.1996.545676.
[33] Ashenden, P. The designer’s guide to VHDL. Elsevier publishing, 2008. 909 pp.
isbn: 978-0-12-088785-9.
[34] Neshatpour, K. et al. “Energy-efficient acceleration of big data analytics
applications using FPGAs”. In: 2015 IEEE International Conference on Big
Data (Big Data). [Online]. (29 October 2015), [Referenced on 02.04.2019]. doi:
10.1109/BigData.2015.7363748.
[35] Mentor Graphics. C/C++/SystemC HLS. [Online]. [Referenced on 21.05.2019].
url: https://www.mentor.com/hls-lp/catapult-high-level-synthesis/
c-systemc-hls.
[36] Andrei, V. FPGA design flow: from HDL to physical implementation. [Online].
[Referenced on 11.01.2019]. url: https://indico.desy.de/indico/event/
7001/session/0/contribution/1/material/slides/0.pdf.
[37] Bezerra, E.A. and Lettnin, D.V. Synthesizable VHDL design for FPGAs.
Springer publishing, 2014. 157 pp. isbn: 978-3-319-02547-6.
[38] Microchip Technology Inc. Introduction to JTAG. [Online]. [Referenced on
25.02.2019]. url: http://microchipdeveloper.com/jlink:jtag.
[39] XJTAG. What is JTAG and how can I make use of it? [Online]. [Referenced
on 25.02.2019]. url: https://www.xjtag.com/about-jtag/what-is-jtag/.
[40] Rodriguez-Andina, J.J. et al. “Deep learning and reconfigurable platforms
in the internet of things: challenges and opportunities in algorithms and
hardware”. In: IEEE industrial electronics magazine. [Online]. Vol. 12:2 (2018),
Pp. 36–49 [Referenced on 07.05.2019]. doi: 10.1109/MIE.2018.2824843.
[41] Woulfe, M. et al. “Programming models for FPGA application accelerators”. In:
1st workshop on programming models for emerging architectures. [Online]. (2009),
[Referenced on 04.03.2019]. Available: doi: 10.13140/2.1.1537.4403.
[42] Szewinski, J. et al. “Software for development and communication with FPGA
based hardware”. In: Proceedings of the SPIE 5948, photonics applications in
industry and research IV, 59480H. [Online]. (11 October 2005), [Referenced on
28.01.2019]. doi: 10.1117/12.622512.
[43] Khronos Group. OpenCL overview. [Online]. [Referenced on 02.04.2019]. url:
https://www.khronos.org/opencl/.
[44] Jacobsen, M. et al. “RIFFA 2.1: a reusable integration framework for FPGA
accelerators”. In: ACM transactions on reconfigurable technology and systems.
[Online]. Vol. 8:4 (2015), Pp. 23. [Referenced on 03.05.2019]. doi: 10.1145/
2815631.
[45] Xilinx, Inc. Software defined. [Online]. [Referenced on 07.05.2019]. url:
https://www.xilinx.com/products/design-tools/software-zone/sdaccel.html.
[46] Intel Corporation. Intel FPGA SDK for OpenCL software technology. [Online].
[Referenced on 07.05.2019]. url: https://www.intel.com/content/www/
us/en/software/programmable/sdk-for-opencl/overview.html.
[47] Intel Corporation. Using Intel FPGA SDK for OpenCL* on DE-series boards.
[Online]. [Referenced on 06.06.2019]. url: ftp://ftp.intel.com/pub/
fpgaup/pub/Intel_Material/17.0/Tutorials/OpenCL_On_DE_Series_
Boards.pdf.
[48] Intel Corporation. Intel FPGA SDK for OpenCL standard edition: getting
started guide. [Online]. [Referenced on 06.06.2019]. url: https://www.intel.
com/content/www/us/en/programmable/documentation/xwm1515793070801.html.
[49] Intel Corporation. Intel FPGA SDK for OpenCL standard edition: Cyclone V
SoC getting started guide. [Online]. [Referenced on 06.06.2019]. url:
https://www.intel.com/content/www/us/en/programmable/documentation/
xwm1515793070801.html.
[50] Intel Corporation. Intel FPGA SDK for OpenCL pro edition: programming
guide. [Online]. [Referenced on 06.06.2019]. url: https://www.intel.com/
content/www/us/en/programmable/documentation/mwh1391807965224.
html#mwh1391807939093.
[51] Kavianipour, H. et al. “High performance FPGA-based DMA interface for
PCIe”. In: IEEE transactions on nuclear science. [Online]. Vol. 61:2 (2014),
Pp. 745–749. [Referenced on 03.05.2019]. doi: 10.1109/TNS.2014.2304691.
[52] Rosenberg, O. OpenCL do’s and don’ts. [Online]. [Referenced on 05.05.2019].
url: http://www.haifux.org/lectures/267/OpenCL_Dos_and_Donts.pdf.
[53] Texas Instruments Incorporated. Optimization techniques for host code.
[Online]. [Referenced on 05.05.2019]. url: https://downloads.ti.com/
mctools/esd/docs/opencl/optimization/host_code.html.
[54] McCalpin, J. Low level microbenchmarks of processor to FPGA memory-mapped
IO. Tech. rep. University of Texas at Austin, 2014.
[55] Simpson, P.A. FPGA design: best practices for team-based reuse. 2nd ed.
Springer publishing, 2015. 257 pp. isbn: 978-3-319-17924-7.
[56] Schattkowsky, T. et al. “A UML frontend for IP-XACT-based IP management”.
In: 2009 design automation test in europe conference exhibition. [Online].
(2009), [Referenced on 07.05.2019]. doi: 10.1109/DATE.2009.5090664.
[57] IEEE 1685-2009. IEEE standard for IP-XACT, standard structure for pack-
aging, integrating, and reusing IP within tool flows. New York. 2010. 374
pp.
[58] JSON.org. Introducing JSON. [Online]. [Referenced on 08.04.2019]. url:
https://www.json.org/.