
ERTS Unit 3


EC8791 Embedded and Real Time Systems

UNIT III EMBEDDED PROGRAMMING

PART A

1. What is a data flow graph? [April 2014] [Apr/May 17]


A data flow graph is a model of a program with no conditionals. In a high-level
programming language, a code segment with no conditionals—more precisely, with only one
entry and exit point—is known as a basic block. As the C code is executed, we would enter
this basic block at the beginning and execute all the statements.
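As a small illustration (hypothetical code, not from the text), the body of the following C function is one basic block: a single entry point, a single exit point, and no conditionals, so every statement executes in order each time the block is entered.

```c
/* One basic block: straight-line code, single entry, single exit. */
void basic_block(int a, int b, int c, int *w, int *x, int *y, int *z)
{
    *w = a + b;    /* each statement executes exactly once, in order */
    *x = a - c;
    *y = *w + *x;
    *z = *y * c;
}
```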

2. What are the different CPU buses? State the function of each one. [MAY & NOV 2013]
❖ PCI (Peripheral Component Interconnect) is the dominant high-performance system
bus today. PCI uses high-speed data transmission techniques and efficient protocols
to achieve high throughput.
❖ USB (Universal Serial Bus) and IEEE 1394 are the two major high-speed serial
buses. Both of these buses offer high transfer rates using simple connectors. They also
allow devices to be chained together so that users don’t have to worry about the order of
devices on the bus or other details of connection.
3. State the principle of basic compilation techniques. [MAY/JUNE 2013] [NOV 2013]
➢ The high-level language program is translated into the lower-level form of instructions;
➢ Optimizations try to generate better instruction sequences than would be possible if each source statement were translated independently. Optimization techniques focus on more of the program to ensure that compilation decisions that appear to be good for one statement are not unnecessarily problematic for other parts of the program.
➢ Compilation combines translation and optimization.

4. Name any two techniques used to optimize execution time of a program. [NOV/DEC
2012]
➢ Expression Simplification
➢ Dead Code Elimination
➢ Procedure Inlining
➢ Loop Transformations
➢ Register Allocation
➢ Scheduling
➢ Instruction Selection
➢ Understanding and Using Your Compiler
➢ Interpreters and JIT Compilers

5. What is meant by absolute address and relative address?
❖ The simplest form of the assembler assumes that the starting address of the assembly language program has been specified by the programmer. The addresses in such a program are known as absolute addresses.
❖ Most assemblers therefore allow us to use relative addresses by specifying at the start of the file that the origin of the assembly language module is to be computed later.

6. Define Direct Memory Access (DMA).


Direct memory access (DMA) is a bus operation that allows reads and writes not controlled by the CPU. A DMA transfer is controlled by a DMA controller, which requests control of the bus from the CPU. After gaining control, the DMA controller performs read and write operations directly between devices and memory.

7. Define Peripheral Component Interconnect (PCI)


PCI (Peripheral Component Interconnect) is the dominant high-performance system bus
today. PCI uses high-speed data transmission techniques and efficient protocols to achieve high
throughput.

The original PCI standard allowed operation up to 33 MHz; at that rate, it could achieve a maximum transfer rate of 264 MB/s using 64-bit transfers. The revised PCI standard allows the bus to run up to 66 MHz, giving a maximum transfer rate of 528 MB/s with 64-bit wide transfers.

8. What is the function of host in development environment?


• The host should be able to do the following:
➢ load programs into the target,
➢ start and stop program execution on the target, and
➢ Examine memory and CPU registers.

9. What is the function of test bench in embedded coding?


In many cases, a testbench program can be built to help debug the embedded code. The test
bench generates inputs to simulate the actions of the input devices; it may also take the output
values and compare them against expected values, providing valuable early debugging help.
The embedded code may need to be slightly modified to work with the testbench, but careful coding (such as using the #ifdef directive in C) can ensure that the changes can be undone easily and without introducing bugs.
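A minimal sketch of that idea, assuming a hypothetical read_adc() device call and a build flag USE_REAL_DEVICE chosen for illustration: conditional compilation swaps the real input device for a simulated one without touching the core logic.

```c
#ifdef USE_REAL_DEVICE
extern int read_adc(void);                   /* hypothetical device driver call */
static int next_sample(void) { return read_adc(); }
#else
static int next_sample(void) { return 42; }  /* testbench: simulated input */
#endif

/* Core logic is identical in the target build and the testbench build. */
int process_sample(void)
{
    return next_sample() * 2;
}
```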

10. Define Cross compiler.
✓ A cross-compiler is a compiler that runs on one type of machine but generates code
for another.
✓ After compilation, the executable code is downloaded to the embedded system by a serial
link or perhaps burned in a PROM and plugged in.

11. What do you mean by entry point and external reference? Explain with an
example
The place in the file where a label is defined is known as an entry point. The place in the file
where the label is used is called an external reference

12. How registers are used in ARM procedure call standard (APCS) linkage
mechanism?
The ARM Procedure Call Standard (APCS) is a good illustration of a typical procedure
linkage mechanism.
■ r0 to r3 are used to pass parameters into the procedure. r0 is also used to hold the return
value. If more than four parameters are required, they are put on the stack frame.
■ r4 to r7 hold register variables.
■ r11 is the frame pointer and r13 is the stack pointer.
■ r10 holds the limiting address on stack size, which is used to check for stack overflows.
Other registers have additional uses in the protocol.

13. What are two major types of testing strategies? State the function of each one.
The two major types of testing strategies:
■ Black-box methods generate tests without looking at the internal structure of the
program.
■ Clear-box (also known as white-box) methods generate tests based on the program
structure.

14. How many passes are required to process the labels present in source code? State the function of each one.
Label processing requires making two passes through the assembly source code as follows:
❖ The first pass scans the code to determine the address of each label.
❖ The second pass assembles the instructions using the label values computed in the first
pass

15. Differentiate stack pointer and frame pointer.


• Procedure stacks are typically built to grow down from high addresses. A stack pointer (sp) defines the end of the current frame, while a frame pointer (fp) defines the end of the last frame.

16. What types of nodes are present in a data flow graph, and what is the function of each node?
There are two types of nodes in the graph:

➢ round nodes denote operators and

➢ square nodes represent values

17. What are all the components used to create the embedded software? Or mention the different components for embedded programs. (Nov/Dec 2020 & Apr/May 2021)
• To create embedded software, three structures or components are commonly used:

• the state machine,

• the circular buffer, and

• The queue

18. Define state machine .
• When inputs appear intermittently (stopping or ceasing for a time) rather than as
periodic samples, it is often convenient to think of the system as reacting to those inputs.

• The reaction of most systems can be characterized in terms of the input received and the
current state of the system.

This leads naturally to a finite-state machine style of describing the reactive system’s
behavior

19. What do you mean by random and regression tests?


Random tests form one category of black-box test. Random values are generated with a
given distribution. The expected values are computed independently of the system, and
then the test inputs are applied. A large number of tests must be applied for the results to be
statistically significant, but the tests are easy to generate. Another scenario is to test certain
types of data values. For example, integer-valued inputs can be generated at interesting
values such as 0, 1, and values near the maximum end of the data range. Illegal values can
be tested as well.
Regression tests form an extremely important category of tests. When tests are created
during earlier stages in the system design or for previous versions of the system, those
tests should be saved to apply to the later versions of the system. Clearly, unless the
system specification changed, the new system should be able to pass old tests. In some cases
old bugs can creep back into systems, such as when an old version of a software module is
inadvertently installed. In other cases regression tests simply exercise the code in different
ways than would be done for the current version of the code and therefore possibly exercise
different bugs.
20. What are the two major types of testing strategies?
■ Black-box methods generate tests without looking at the internal structure of the
program.
■ Clear-box (also known as white-box) methods generate tests based on the program
structure.
21. What is microprocessor in-circuit emulator (ICE)?
✓ The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that
can help debug software in a working embedded system.
✓ The main drawback to in-circuit emulation is that the machine is specific to a
particular microprocessor, even down to the pinout.

22. Define logic analyzer.
✓ The logic analyzer is the other major piece of instrumentation in the embedded system designer’s arsenal.
✓ Think of a logic analyzer as an array of inexpensive oscilloscopes: the analyzer can sample many different signals simultaneously (tens to hundreds), but unlike a conventional oscilloscope it can display only 0, 1, or changing values for each signal.
23.Define Stack pointer and frame pointer
A stack pointer (sp) defines the end of the current frame, while a frame pointer (fp) defines the end of the last frame. The procedure can refer to an element in the frame by addressing relative to the stack pointer (sp).
24. Show the structure of a typical CPU bus which supports read/write. Apr/May 16

25. How the program validation can be done? Apr/May 16


The two major types of testing strategies:
■ Black-box methods generate tests without looking at the internal structure of the
program.
■ Clear-box (also known as white-box) methods generate tests based on the program
structure.
26. Compare static RAM (SRAM) and dynamic RAM (DRAM). (Nov/Dec 16)
SRAM (static RAM):
✓ Retains its data bits as long as power is being supplied; it does not have to be periodically refreshed.
✓ Provides faster access to data.
✓ More expensive than DRAM.
DRAM (dynamic RAM):
✓ Stores bits in cells consisting of a capacitor and a transistor, so it must be periodically refreshed.
✓ Slower access to data compared to SRAM.
✓ Less expensive than SRAM.

27. Differentiate Harvard and von Neumann Architecture. (Apr/May 18)

Von Neumann Architecture:
❖ The memory holds both data and instructions, and can be read or written when given an address. A computer whose memory holds both data and instructions is known as a von Neumann machine.
❖ The CPU has several internal registers that store values used internally. One of those registers is the program counter (PC), which holds the address in memory of an instruction. The CPU fetches the instruction from memory, decodes the instruction, and executes it. The program counter does not directly determine what the machine does next, but only indirectly, by pointing to an instruction in memory.

Harvard Architecture:
❖ A Harvard machine has separate memories for data and program.
❖ The program counter points to program memory, not data memory. As a result, it is harder to write self-modifying programs (programs that write data values, then use those values as instructions) on Harvard machines.

28. List the memory devices used in design of embedded systems. Nov 17 ,Nov 18
✓ RAM(Random Access Memory)
✓ ROM(Read only Memory)
29. How can power be optimized at the program level? Nov 17
■ Try to use registers efficiently.
■ Analyze cache behavior to find major cache conflicts.
■ Make use of page mode accesses in the memory system whenever possible.

30. What is meant by linking, linker and loading? Apr/May 19

Or What does a linker mean? N/D 2021

Linker

❖ A linker allows a program to be stitched together out of several smaller pieces. The
linker operates on the object files created by the assembler and modifies the assembled code to
make the necessary links between files.

Loader
❖ Program that brings the program into memory for execution is called a loader.
❖ The main job of the loader is to resolve external references based on available entry
points. As a result of the need to know how definitions and references connect, the
assembler passes to the linker not only the object file but also the symbol table.
31. Define Embedded programming . Apr/May 19
The creation of embedded programs is at the heart of embedded system design. Designing
and implementing embedded programs is different and more challenging than writing
typical workstation or PC programs. Embedded code must not only provide rich
functionality, it must also often run at a required rate to meet system deadlines, fit into the
allowed amount of memory, and meet power consumption requirements. Designing code
that simultaneously meets multiple design constraints is a considerable challenge.
32. Outline the significance of CDFG . Nov/Dec 18
❖ Can define a state variable that represents a program counter in a CPU.
❖ Either execute the data flow node or compute the decision in the decision node and follow the appropriate edge, depending on the type of node the program counter points to.
❖ Even though the data flow nodes may specify only a partial ordering on the data flow computations, the CDFG is a sequential representation of the program.
❖ There is only one program counter in our execution model of the CDFG, and operations are not executed in parallel.
33. Differentiate compiler and cross compiler. (Nov/Dec 2020 &Apr/May 2021)
The main difference between a compiler and a cross compiler is that a compiler is software that transforms a computer program written in a high-level programming language into machine language, while a cross compiler is a type of compiler that can create executable code for a platform other than the one on which the compiler is running.

34. Bring out the difference between program counter and program location counter.
Nov/Dec 2021

The program location counter is used by the assembler during translation: it points to the memory location where the current instruction will be placed, and is updated as each line of the source is processed. The program counter is a CPU register used during execution: while an instruction is being executed, the program counter points to the current instruction; when the instruction is done, the program counter is incremented and points to the next instruction to be executed.

PART B

1. What are the components required for embedded programs? Nov 17

Consider code for three structures or components that are commonly used in embedded
software: the

✓ State machine,
✓ The circular buffer, and
✓ The queue.
❖ State machines shown in fig 3.2 are well suited to reactive systems such as user
interfaces; circular buffers and queues are useful in digital signal processing.
▪ State Machines
❖ When inputs appear intermittently rather than as periodic samples, it is often
convenient to think of the system as reacting to those inputs.
Example
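As a sketch (the states and inputs here are illustrative assumptions, not the book's example), a reactive system can be coded as a switch on the current state:

```c
/* Switch-based state machine: the reaction depends on the input
   received and on the current state of the system. */
typedef enum { IDLE, RUNNING } state_t;

static state_t state = IDLE;

/* Process one input event; returns 1 while the machine is RUNNING. */
int fsm_step(int start, int stop)
{
    switch (state) {
    case IDLE:
        if (start) state = RUNNING;
        break;
    case RUNNING:
        if (stop) state = IDLE;
        break;
    }
    return state == RUNNING;
}
```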

Stream-Oriented Programming and Circular Buffers


❖ The data stream style makes sense for data that comes in regularly and must be processed on the fly. The FIR filter is a classic example of stream-oriented processing, as shown in fig 3.3. For each sample, the filter must emit one output that depends on the values of the last n inputs.
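A minimal circular-buffer sketch for such a filter (the tap count and coefficients below are illustrative assumptions): the newest sample overwrites the oldest, so the last N inputs are always available without moving any data.

```c
#define TAPS 4

static double samples[TAPS];                  /* circular buffer of inputs */
static int head = 0;                          /* index of the oldest sample */
static const double coeff[TAPS] = { 0.25, 0.25, 0.25, 0.25 };

double fir_step(double in)
{
    double y = 0.0;
    samples[head] = in;                       /* overwrite the oldest sample */
    head = (head + 1) % TAPS;                 /* advance circularly */
    for (int i = 0; i < TAPS; i++)            /* weighted sum of last TAPS inputs */
        y += coeff[i] * samples[(head + i) % TAPS];
    return y;
}
```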

Queues
❖ Queues are also used in
✓ Signal processing and
✓ Event processing.
❖ Queues are used whenever data may arrive and depart at somewhat unpredictable times
or when variable amounts of data may arrive. A queue is often referred to as an elastic
buffer.
❖ One way to build a queue is with a linked list.
❖ This approach allows the queue to grow to an arbitrary size.
❖ But in many applications we are unwilling to pay the price of dynamically allocating
memory.
❖ Another way to design the queue is to use an array to hold all the data.
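The array-based design can be sketched as follows (sizes are illustrative): head and tail indices wrap around the array, so no dynamic allocation is needed.

```c
#define Q_SIZE 8

static int q[Q_SIZE];
static int q_head = 0, q_tail = 0, q_count = 0;

int enqueue(int v)                  /* returns 0 on success, -1 if full */
{
    if (q_count == Q_SIZE) return -1;
    q[q_tail] = v;
    q_tail = (q_tail + 1) % Q_SIZE; /* wrap around the array */
    q_count++;
    return 0;
}

int dequeue(int *v)                 /* returns 0 on success, -1 if empty */
{
    if (q_count == 0) return -1;
    *v = q[q_head];
    q_head = (q_head + 1) % Q_SIZE;
    q_count--;
    return 0;
}
```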

2. Explain how to model the program in embedded systems? Apr 18

Subtopics
1. Data Flow Graphs
2. Control/Data Flow Graphs

1 Data Flow Graphs
❖ A data flow graph is a model of a program with no conditionals. In a high-level programming language, a code segment with no conditionals—more precisely, with only one entry and exit point—is known as a basic block.
❖ Figure 3.4 shows a simple basic block. As the C code is executed, we would enter this
basic block at the beginning and execute all the statements.

❖ Before we are able to draw the data flow graph for this code we need to modify it
slightly. There are two assignments to the variable x—it appears twice on the left side of an
assignment.
❖ We need to rewrite the code in single-assignment form, fig 3.5 in which a variable
appears only once on the left side. Since our specification is C code, we assume that the
statements are executed sequentially, so that any use of a variable refers to its latest assigned
value.
❖ In this case, x is not reused in this block (presumably it is used elsewhere), so we just
have to eliminate the multiple assignment to x.
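A hypothetical block with two assignments to x, and its single-assignment rewrite, can be sketched as two equivalent C functions (the variable names follow the text's x1/x2 convention):

```c
/* Original form: x appears twice on the left side of an assignment. */
void original(int a, int c, int d, int e, int *y, int *z)
{
    int x;
    x  = a - c;
    *y = x + d;
    x  = a + c;        /* second assignment to the same variable */
    *z = x + e;
}

/* Single-assignment form: each variable is assigned exactly once. */
void single_assignment(int a, int c, int d, int e, int *y, int *z)
{
    int x1, x2;
    x1 = a - c;        /* first use of x becomes x1 */
    *y = x1 + d;
    x2 = a + c;        /* second use becomes x2 */
    *z = x2 + e;
}
```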

❖ The result is shown above, where we have used the names x1 and x2 to distinguish the separate uses of x. The data flow graph for the single-assignment code is shown in Figure 3.6.
❖ The single-assignment form means that the data flow graph is acyclic—if we assigned
to x multiple times, then the second assignment would form a cycle in the graph including x
and the operators used to compute x.
❖ The data flow graph for the code makes the order in which the operations are performed
in the C code much less obvious. This is one of the advantages of the data flow graph
Control/Data Flow Graphs
❖ A CDFG uses a data flow graph as an element, adding constructs to describe control.

In a basic CDFG, we have two types of nodes:


✓ Decision nodes and
✓ Data flow nodes.
❖ A data flow node encapsulates a complete data flow graph to represent a basic block. We can use one type of decision node to describe all the types of control in a sequential program. (The jump/branch is, after all, the way we implement all those high-level control constructs.)
❖ The data flow graph is generally drawn in the form in fig 3.7. Here, the variables are not
explicitly represented by nodes. Instead, the edges are labeled with the variables they represent.
As a result, a variable can be represented by more than one edge.

❖ However, the edges are directed and all the edges for a variable must come from a single
source. We use this form for its simplicity and compactness. Fig 3.8 shows a flow graph of the
C code and its CDFG.


3.Explain Assembly, Linking, And Loading?(or) Outline role of Assemblers and
linkers in compilation process
(OR)
Describe in detail about Assembly and Linking. [NOV/DEC 2013] Apr 18
• The assembler’s job is to translate symbolic assembly language statements into bit-level
representations of instructions known as object code. The assembler takes care of
instruction formats and does part of the job of translating labels into addresses as shown
in Fig 3.9

❖ However, since the program may be built from many files, the final steps in determining
the addresses of instructions and data are performed by the linker, which produces an
executable binary file.
❖ Program that brings the program into memory for execution is called a loader.
❖ The simplest form of the assembler assumes that the starting address of the assembly
language program has been specified by the programmer. The addresses in such a
program are known as absolute addresses.

❖ Most assemblers therefore allow us to use relative addresses by specifying at the start of the file that the origin of the assembly language module is to be computed later. Addresses within the module are then computed relative to the start of the module. The linker is then responsible for translating relative addresses into absolute addresses.

a) Assemblers
❖ When translating assembly code into object code, the assembler must translate opcodes
and format the bits in each instruction, and translate labels into addresses. Labels make the
assembly process more complex, but they are the most important abstraction provided by the
assembler.
❖ Labels let the programmer avoid worrying about the locations of instructions and data.
Label processing requires making two passes through the assembly source code as
follows:
1. The first pass scans the code to determine the address of each label.
2. The second pass assembles the instructions using the label values computed in the first
pass.
As shown in Figure 3.10, the name of each symbol and its address is stored in a symbol
table that is built during the first pass. The symbol table is built by scanning from the first
instruction to the last.
❖ During scanning, the current location in memory is kept in a program location counter
(PLC).

❖ Thus, at the start of the first pass, the PLC is set to the program’s starting address
and the assembler looks at the first line.
❖ After examining the line, the assembler updates the PLC to the next location and
looks at the next instruction. If the instruction begins with a label, a new entry is made in the
symbol table, which includes the label name and its value.
❖ At the end of the first pass, the assembler rewinds to the beginning of the assembly
language file to make the second pass.
❖ During the second pass, when a label name is found, the label is looked up in the
symbol table and its value substituted into the appropriate place in the instruction.

❖ A pseudo-op is used to specify the origin of the program, that is, the location of the first address in the program.
❖ A common name for this pseudo-op (e.g., the one used for the ARM) is the ORG statement.
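The first pass can be sketched in C as follows; the "label:" syntax, the fixed 4-byte instruction size, and the table size are assumptions made purely for illustration:

```c
#include <string.h>

#define MAX_SYMS 32

struct sym { char name[16]; unsigned addr; };
static struct sym symtab[MAX_SYMS];
static int nsyms = 0;

/* First pass: scan each line, track the program location counter (PLC),
   and record each label's address in the symbol table. */
void first_pass(const char *lines[], int n, unsigned origin)
{
    unsigned plc = origin;                    /* starts at the ORG address */
    for (int i = 0; i < n; i++) {
        const char *colon = strchr(lines[i], ':');
        if (colon) {                          /* line begins with a label */
            int len = (int)(colon - lines[i]);
            memcpy(symtab[nsyms].name, lines[i], len);
            symtab[nsyms].name[len] = '\0';
            symtab[nsyms].addr = plc;
            nsyms++;
        }
        plc += 4;                             /* assume 4 bytes per instruction */
    }
}

/* Lookup used by the second pass to substitute label values. */
unsigned lookup(const char *name)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].addr;
    return 0;
}
```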

b) Linking
❖ A linker allows a program to be stitched together out of several smaller pieces. The
linker operates on the object files created by the assembler and modifies the assembled code to
make the necessary links between files.
❖ Some labels will be both defined and used in the same file. Other labels will be defined
in a single file but used elsewhere as illustrated in Figure 3.11
❖ The place in the file where a label is defined is known as an entry point.
❖ The place in the file where the label is used is called an external reference.
❖ The main job of the loader is to resolve external references based on available entry
points.
❖ Even if the entire symbol table is not kept for later debugging purposes, it must at least
pass the entry points. External references are identified in the object code by their relative
symbol identifiers. The linker proceeds in two phases.
❖ First, it determines the address of the start of each object file. The order in which object
files are to be loaded is given by the user, either by specifying parameters when the loader is
run or by creating a load map file that gives the order in which files are to be placed in memory.
❖ At the start of the second phase, the loader merges all symbol tables from the object files into a single, large table. It then edits the object files to change relative addresses into absolute addresses.

4. Describe basic compilation techniques. [MAY/JUNE 2014] (Apr/May 17) Nov 17 (Nov/Dec 2020 & Apr/May 2021), N/D 2021

❖ Compilation combines translation and optimization.


❖ The high-level language program is translated into the lower-level form of instructions; optimizations try to generate better instruction sequences than would be possible if the brute force technique of independently translating source code statements were used.
❖ Optimization techniques focus on more of the program to ensure that compilation
decisions that appear to be good for one statement are not unnecessarily problematic for other
parts of the program.
❖ The compilation process is summarized in Figure 3.12 Compilation begins with high-
level language code such as C and generally produces assembly code.
❖ The high-level language program is parsed to break it into statements and expressions.

❖ Simplifying arithmetic expressions is one example of a machine-independent optimization.
❖ Optimizations may work directly on real instructions or on a pseudo-instruction format that is later mapped onto the instructions of the target CPU. This level of optimization also helps modularize the compiler by allowing code generation to create simpler code that is later optimized. For example, consider the following array access code:
x[i] = c*x[i];

Statement Translation

Statement translation is the basic job of translating the high-level language program with little or no optimization.

Procedures

❖ Another major code generation problem is the creation of procedures. Generating code
for procedures is relatively straightforward once we know the procedure linkage appropriate
for the CPU.
❖ At the procedure definition, we generate the code to handle the procedure call and return.
At each call of the procedure, we set up the procedure parameters and make the call.
❖ Procedure stacks are typically built to grow down from high addresses. A stack pointer
(sp) defines the end of the current frame, while a frame pointer (fp)defines the end of the
last frame.
❖ The procedure can refer to an element in the frame by addressing relative to stack pointer
(sp).

❖ The ARM Procedure Call Standard (APCS) is a good illustration of a typical procedure
linkage mechanism. Although the stack frames are in main memory, understanding how
registers are used is key to understanding the mechanism, as explained below.
❖ ■ r0 to r3 are used to pass parameters into the procedure. r0 is also used to hold the return value. If more than four parameters are required, they are put on the stack frame.
❖ ■ r4 to r7 hold register variables.
❖ ■ r11 is the frame pointer and r13 is the stack pointer.
❖ ■ r10 holds the limiting address on stack size, which is used to check for stack overflows.
❖ Other registers have additional uses in the protocol

3 Data Structures

❖ The compiler must also translate references to data structures into references to raw
memories. In general, this requires address computations. Some of these computations
can be done at compile time while others must be done at runtime.
❖ Let us first consider one-dimensional arrays:
a[i]
❖ The layout of the array in memory is shown in Figure 3.13. The zeroth element is stored
as the first element of the array, the first element directly below, and so on.
❖ We can create a pointer for the array that points to the array’s head, namely, a[0]. If we call that pointer aptr for convenience, then we can rewrite the reading of a[i] as
*(aptr + i)
❖ Two-dimensional arrays are more challenging. There are multiple possible ways to
lay out a two-dimensional array in memory, as shown in Figure 3.14. In this form, which
is known as row major, the inner variable of the array ( j in a[i, j]) varies most quickly.
❖ Two dimensional arrays also require more sophisticated addressing—in particular, we
must know the size of the array. Let us consider the row-major form.
❖ If the a[ ] array is of size N × M, then we can turn the two-dimensional array access into a one-dimensional array access. Thus,

a[i,j]

becomes

a[i*M + j]

where the maximum value for j is M - 1.
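The equivalence can be checked in C (array dimensions chosen for illustration): indexing through a flat pointer with i*M + j reaches the same element as a[i][j].

```c
#define ROWS 3
#define COLS 4   /* this is the M in a[i*M + j] */

int element(int a[ROWS][COLS], int i, int j)
{
    int *aptr = &a[0][0];            /* pointer to the array's head */
    return *(aptr + i*COLS + j);     /* same element as a[i][j] */
}
```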

5. How will you perform PROGRAM-LEVEL PERFORMANCE ANALYSIS in embedded systems? Apr/May 16 (Apr/May 17)

❖ The execution time of a program often varies with the input data values because those values
select different execution paths in the program as shown in fig 3.15.
❖ The cache has a major effect on program performance, and once again, the cache’s behavior
depends in part on the data values input to the program.
❖ Execution times may vary even at the instruction level.

We can measure program performance in several ways:
❖ Some microprocessor manufacturers supply simulators for their CPUs:
❖ The simulator runs on a workstation or PC, takes as input an executable for the
microprocessor along with input data, and simulates the execution of that program.
❖ Some of these simulators go beyond functional simulation to measure the execution time
of the program.
❖ A timer connected to the microprocessor bus can be used to measure performance of
executing sections of code.
❖ The code to be measured would reset and start the timer at its start and stop the timer at
the end of execution. The length of the program that can be measured is limited by the accuracy
of the timer.
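On a host, the same reset/start/stop pattern can be sketched with the standard clock() call standing in for a memory-mapped hardware timer (a host-side analogy, not target code):

```c
#include <time.h>

/* Read the timer before and after the code under test; the difference,
   scaled by the timer resolution, is the measured execution time. */
double measure_seconds(void (*code_under_test)(void))
{
    clock_t start = clock();      /* "reset and start the timer" */
    code_under_test();
    clock_t stop = clock();       /* "stop the timer at the end" */
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```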
❖ A logic analyzer can be connected to the microprocessor bus to measure the start and stop times of a code segment.
❖ Three different types of performance measures on programs:
❖ Average-case execution time: This is the typical execution time we would expect for typical data. Clearly, the first challenge is defining typical inputs.
❖ Worst-case execution time: The longest time that the program can spend on any input sequence is clearly important for systems that must meet deadlines.
❖ Best-case execution time: This measure can be important in multirate real-time systems.

1. Elements of Program Performance

❖ The key to evaluating execution time is breaking the performance problem into parts. Program
execution time can be seen as
❖ execution time = program path + instruction timing
❖ Once we know the execution path of the program, we have to measure the execution time of
the instructions executed along that path.
❖ Not all instructions take the same amount of time.
❖ Execution times of instructions are not independent.
❖ The execution time of an instruction may depend on operand values.

2 .Measurement-Driven Performance Analysis

❖ The most direct way to determine the execution time of a program is by measuring it. This
approach is appealing, but it does have some drawbacks.
❖ First, in order to cause the program to execute its worst-case execution path, we have to provide
the proper inputs to it. Determining the set of inputs that will guarantee the worst case execution
path is infeasible. Furthermore, in order to measure the program’s performance on a particular
type of CPU, we need the CPU or its simulator.
❖ Despite these drawbacks, measurement is the most commonly used way to determine the execution time of embedded software.
❖ The record of the execution path of a program is known as a program trace (or more succinctly, a trace). Traces can be valuable for other purposes, such as analyzing the cache behavior of the program.
❖ This problem has two aspects. First, we have to determine the actual input values. We may be
able to use benchmark data sets or data captured from a running system to help us generate
typical values.
❖ For simple programs, we may be able to analyze the algorithm to determine the inputs that
cause the worst-case execution time.
❖ The other problem with input data is the software scaffolding that we may need to feed data
into the program and get data out.
❖ For purposes of performance analysis, the most important type of CPU simulator is the cycle-
accurate simulator, which performs a sufficiently detailed simulation of the processor’s
internals so that it can determine the exact number of clock cycles required for execution.
❖ A cycle-accurate simulator has a complete model of the processor, including the cache. It can
therefore provide valuable information about why the program runs too slowly.

6. Discuss embedded system software performance analysis and its optimization. Apr/May 16, Nov 17

There are several techniques for optimizing software performance, including both loop and cache optimizations as well as more generic strategies.

Loop optimizations

❖ Loops are important targets for optimization because programs with loops tend to spend
a lot of time executing those loops.
❖ There are three important techniques in optimizing loops:
✓ code motion,
✓ induction variable elimination, and
✓ Strength reduction.
❖ Code motion lets us move unnecessary code out of a loop. If a computation’s result does
not depend on operations performed in the loop body, then we can safely move it out of the
loop. Code motion opportunities can arise because programmers may find some computations
clearer and more concise when put in the loop body, even though they are not strictly dependent
on the loop iterations. A simple example of code motion is also common. Consider this loop:

for (i = 0; i < N*M; i++) {
    z[i] = a[i] + b[i];
}

❖ The code motion opportunity becomes more obvious when we draw the loop's CDFG.
The loop bound computation is performed on every iteration during the loop test, even though
the result never changes. We can avoid N*M - 1 unnecessary executions of this statement by
moving it before the loop, as shown in the figure.
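A minimal sketch of the hoisted version; the sizes N and M and the function name are illustrative assumptions, not from the original:

```c
#include <assert.h>

#define N 4   /* illustrative sizes for the sketch */
#define M 3

/* Code motion: the invariant bound N*M is computed once before the
   loop instead of on every iteration's loop test. */
void vector_add(int z[], const int a[], const int b[]) {
    int nm = N * M;   /* hoisted out of the loop */
    for (int i = 0; i < nm; i++)
        z[i] = a[i] + b[i];
}
```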
❖ An induction variable is a variable whose value is derived from the loop iteration
variable’s value. The compiler often introduces induction variables to help it implement the
loop. Properly transformed, we may be able to eliminate some variables and apply strength
reduction to others.

2. Cache optimizations

❖ A loop nest is a set of loops, one inside the other. Loop nests occur when we process arrays. A
large body of techniques has been developed for optimizing loop nests.

❖ Rewriting a loop nest changes the order in which array elements are accessed. This can expose
new parallelism opportunities that can be exploited by later stages of the compiler, and it can
also improve cache performance. In this section we concentrate on the analysis of loop nests
for cache performance.
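As a small illustration of why access order in a loop nest matters for the cache, the sketch below keeps the column index in the inner loop so that consecutive accesses touch contiguous addresses of a row-major C array; the sizes are arbitrary:

```c
#include <assert.h>

#define ROWS 4   /* illustrative sizes */
#define COLS 8

/* C stores m[i][j] in row-major order, so walking j in the inner loop
   makes consecutive accesses hit neighboring addresses (cache-friendly).
   Interchanging the loops would stride through memory instead. */
long sum_row_major(int m[ROWS][COLS]) {
    long s = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += m[i][j];
    return s;
}
```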

3. Performance optimization strategies

❖ Performance analysis and measurement will give you a baseline for the execution time
of the program. Knowing the overall execution time tells you how much it needs to be
improved. Knowing the execution time of various pieces of the program helps you identify the
right locations for changes to the program.
❖ You may be able to redesign your algorithm to improve efficiency. Examining
asymptotic performance is often a good guide to efficiency. Doing fewer operations is usually
the key to performance.
❖ In a few cases, however, brute force may provide a better implementation. A seemingly
simple high-level-language statement may in fact hide a very long sequence of operations that
slows down the algorithm.
❖ Using dynamically allocated memory is one example, because managing the heap takes
time but is hidden from the programmer. For example, a sophisticated algorithm that uses
dynamic storage may be slower in practice than an algorithm that performs more operations on
statically allocated memory.
❖ Finally, you can look at the implementation of the program itself. Here are a few hints
on program implementation:
✓ Try to use registers efficiently.
✓ Group accesses to a value together so that the value can be brought into a register and
kept there.
✓ Make use of page mode accesses in the memory system whenever possible. Page
mode reads and writes eliminate one step in the memory access.
✓ Analyze cache behavior to find major cache conflicts. Restructure the code to
eliminate as many of these as you can as follows:
✓ For instruction conflicts, if the offending code segment is small, try to rewrite the
segment to make it as small as possible so that it better fits into the cache.
✓ For scalar data conflicts, move the data values to different locations to reduce
conflicts.
✓ For array data conflicts, consider either moving the arrays or changing your array
access patterns to reduce conflicts.

7. Explain Program-Level Energy And Power Analysis And Optimization?
Apr/May16 (Nov/Dec 16)(Nov/Dec 18) (Nov/Dec 2020 &Apr/May 2021)
Power consumption is a particularly important design metric for battery-powered systems
because the battery has a very limited lifetime. However, power consumption is increasingly
important in systems that run off the power grid.
Example
■ We may be able to replace the algorithms with others that do things in clever ways that
consume less power as shown in Fig 3.16
■ Memory accesses are a major component of power consumption in many applications. By
optimizing memory accesses we may be able to significantly reduce power.
■ We may be able to turn off parts of the system—such as subsystems of the CPU, chips in the
system, and so on—when we do not need them in order to save power.

Several factors contribute to the energy consumption of the program.


■ Energy consumption varies somewhat from instruction to instruction.
■ The sequence of instructions has some influence.
■ The opcode and the locations of the operands also matter.

❖ Choosing which instructions to use can make some difference in a program’s energy
consumption, but concentrating on the instruction opcodes has limited payoffs in most CPUs.
The program has to do a certain amount of computation to perform its function.
❖ While there may be some clever ways to perform that computation, the energy cost of
the basic computation will change only a fairly small amount compared to the total system
energy consumption, and usually only after a great deal of effort.
❖ We are further hampered in our ability to optimize instruction level energy consumption
because most manufacturers do not provide detailed, instruction-level energy consumption
figures for their processors.
❖ In many applications, the biggest payoff in energy reduction for a given amount of
designer effort comes from concentrating on the memory system.

❖ Accesses to registers are the most energy efficient; cache accesses are more energy
efficient than main memory accesses. Caches are an important factor in energy consumption.
❖ On the one hand, a cache hit saves a costly main memory access, and on the other, the
cache itself is relatively power hungry because it is built from SRAM, not DRAM. If we can
control the size of the cache, we want to choose the smallest cache that provides us with the
necessary performance.
❖ If the cache is too small, the program runs slowly and the system consumes a lot of
power due to the high cost of main memory accesses.
❖ If the cache is too large, the power consumption is high without a corresponding
payoff in performance. At intermediate values, the execution time and power consumption are
both good.
❖ The best overall advice is that high performance = low power.
❖ If the program can be modified to reduce instruction or data cache conflicts, for example,
the energy required by the memory system can be significantly reduced. The effectiveness of
changes such as reordering instructions or selecting different instructions depends on the
processor involved, but they are generally less effective than cache optimizations.
❖ A few optimizations mentioned previously for performance are also often useful for
improving energy consumption:

■ Try to use registers efficiently.


■ Analyze cache behavior to find major cache conflicts.
■ Make use of page mode accesses in the memory system whenever possible.
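For example, loading a value once into a local and reusing it (a sketch with hypothetical names) cuts the number of memory accesses, which are a major component of energy consumption:

```c
#include <assert.h>

/* Each iteration loads a[i] once into a local; the compiler can keep
   v in a register, avoiding repeated memory reads of the same value. */
int poly_sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {
        int v = a[i];        /* single load, reused twice below */
        s += v * v + v;
    }
    return s;
}
```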

8. Explain Analysis And Optimization Of Program Size?


❖ The memory footprint of a program is determined by the size of its data and instructions. Both
must be considered to minimize program size. Data provide an excellent opportunity for
minimizing size because the data are most highly dependent on programming style.
❖ Because inefficient programs often keep several copies of data, identifying and eliminating
duplications can lead to significant memory savings usually with little performance penalty.
Buffers should be sized carefully—rather than defining a data array to a large size that the
program will never attain, determine the actual maximum amount of data held in the buffer and
allocate the array accordingly.
❖ Data can sometimes be packed, such as by storing several flags in a single word and extracting
them by using bit-level operations. A very low-level technique for minimizing data is to reuse
values. For instance, if several constants happen to have the same value, they can be mapped
to the same location.
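A small sketch of flag packing; the flag names and bit positions are illustrative assumptions:

```c
#include <assert.h>

/* Several Boolean flags packed into one word; bit-level operations
   set and test individual flags instead of using one word per flag. */
#define FLAG_READY (1u << 0)   /* illustrative flag positions */
#define FLAG_ERROR (1u << 1)

unsigned set_flag(unsigned word, unsigned flag) {
    return word | flag;              /* turn the flag's bit on */
}

int test_flag(unsigned word, unsigned flag) {
    return (word & flag) != 0;       /* extract the flag's bit */
}
```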
❖ Data buffers can often be reused at several different points in the program. This technique
must be used with extreme caution, however, since subsequent versions of the program may

not use the same values for the constants. A more generally applicable technique is to generate
data on the fly rather than store it.
❖ Of course, the code required to generate the data takes up space in the program, but when
complex data structures are involved there may be some net space savings from using code to
generate data. Minimizing the size of the instruction text of a program requires a mix of high-
level program transformations and careful instruction selection.
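A hedged sketch of generating data on the fly: instead of storing a 256-entry table of squares, each entry is computed when needed (the function name is an illustrative choice):

```c
#include <assert.h>

/* Replaces a stored lookup table with a computation, trading a small
   amount of code space for the table's data space. */
unsigned square_of(unsigned char x) {
    return (unsigned)x * x;
}
```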
❖ Encapsulating functions in subroutines can reduce program size when done carefully.
Because subroutines have overhead for parameter passing that is not obvious from the high-
level language code, there is a minimum-size function body for which a subroutine makes
sense.
❖ Architectures that have variable-size instruction lengths are particularly good candidates
for careful coding to minimize program size, which may require assembly language coding of
key program segments.
❖ There may also be cases in which one or a sequence of instructions is much smaller than
alternative implementations— for example, a multiply-accumulate instruction may be both
smaller and faster than separate arithmetic operations.
❖ When reducing the number of instructions in a program, one important technique is the proper
use of subroutines. If the program performs identical operations repeatedly, these operations
are natural candidates for subroutines.
❖ Even if the operations vary somewhat, you may be able to construct a properly parameterized
subroutine that saves space. Of course, when considering the code size savings, the subroutine
linkage code must be counted into the equation. There is extra code not only in the subroutine
body but also in each call to the subroutine that handles parameters.
❖ In some cases, proper instruction selection may reduce code size; this is particularly true in
CPUs that use variable-length instructions.
❖ Some microprocessor architectures support dense instruction sets, specially designed
instruction sets that use shorter instruction formats to encode the instructions.
❖ The ARM Thumb instruction set and the MIPS-16 instruction set for the MIPS
architecture are two examples of this type of instruction set. In many cases, a
microprocessor that supports the dense instruction set also supports the normal instruction set,
although it is possible to build a microprocessor that executes only the dense instruction set.
❖ Special compilation modes produce the program in terms of the dense instruction set. Program
size of course varies with the type of program, but programs using the dense instruction set are
often 70 to 80% of the size of the standard instruction set equivalents.

9. Describe the techniques used for Program Validation And Testing?[MAY/JUNE
2013](or) Compare various programming validation and testing methods done for system
design. (Apr/May 19)

The two major types of testing strategies:


■ Black-box methods generate tests without looking at the internal structure of the
program.
■ Clear-box (also known as white-box) methods generate tests based on the program
structure.
a) Clear-Box Testing
■ Provide the program with inputs that exercise the test we are interested in.
■ Execute the program to perform the test.
■ Examine the outputs to determine whether the test was successful.

❖ Graph theory helps us get a quantitative handle on the different paths required. In an undirected
graph, we can form any path through the graph from combinations of basis paths.
❖ The graph is represented as an incidence matrix. Each row and column represents a node; a 1
is entered for each node pair connected by an edge.
❖ A simple measure, cyclomatic complexity [McC76], allows us to measure the control
complexity of a program. Cyclomatic complexity is an upper bound on the size of the basis set.
If e is the number of edges in the flow graph, n the number of nodes, and p the number of
components in the graph, then the cyclomatic complexity is given by

M = e - n + 2p

❖ For a structured program, M can be computed by counting the number of binary decisions in
the flow graph and adding 1. If the CDFG has higher-order branch nodes, add b − 1 for each
b-way branch.
❖ The cyclomatic complexity evaluates to 4. Because there are actually only three distinct paths
in the graph, cyclomatic complexity in this case is an overly conservative bound.
❖ Another way of looking at control flow–oriented testing is to analyze the conditions that control
the conditional statements. Consider the following if statement:

if ((a == b) || (c >= d)) { … }

❖ This complex condition can be exercised in several different ways. If we want to truly exercise
the paths through this condition, it is prudent to exercise the conditional’s elements in ways
related to their own structure, not just the structure of the paths through them.

❖ A simple condition testing strategy is known as branch testing. This strategy requires the true
and false branches of a conditional and every simple condition in the conditional’s expression
to be tested at least once.
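A sketch of branch testing for the condition above; the three test vectors exercise each simple condition and both branch outcomes at least once:

```c
#include <assert.h>

/* The compound condition from the text, wrapped so tests can drive it. */
int compound(int a, int b, int c, int d) {
    if ((a == b) || (c >= d))
        return 1;    /* true branch */
    return 0;        /* false branch */
}
```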

❖ A potential problem with path coverage is that the paths chosen to cover the CDFG may not
have any important relationship to the program’s function. Another testing strategy known as
data flow testing makes use of def-use analysis (short for definition-use analysis). It selects
paths that have some relationship to the program’s function.
❖ The terms def and use come from compilers, which use def-use analysis for optimization. A
variable’s value is defined when an assignment is made to the variable; it is used when it
appears on the right side of an assignment (sometimes called a C-use for computation use) or
in a conditional expression (sometimes called P-use for predicate use). A def-use pair is a
definition of a variable’s value and a use of that value. Figure 3.17 shows a code fragment and
all the def-use pairs for the first assignment to a. Def-use analysis can be performed on a
program using iterative algorithms.
❖ Data flow testing chooses tests that exercise chosen def-use pairs. The test first causes a certain
value to be assigned at the definition and then observes the result at the use point to be sure
that the desired value arrived there.

2. Black-box testing

❖ Black-box tests are generated without knowledge of the code being tested. When used alone,
black-box tests have a low probability of finding all the bugs in a program. But when used in
conjunction with clear-box tests they help provide a well-rounded test set.
❖ Random tests form one category of black-box test. Random values are generated with a
given distribution. The expected values are computed independently of the system, and then
the test inputs are applied. A large number of tests must be applied for the results to be
statistically significant, but the tests are easy to generate. Another scenario is to test certain
types of data values.
❖ Regression tests form an extremely important category of tests. When tests are created during
earlier stages in the system design or for previous versions of the system, those tests should
be saved to apply to the later versions of the system.
3. Evaluating functional tests

The evaluation was done using two well-known programs, TeX and awk (Fig 3.18). Functional
tests for these programs, developed over several years of extensive testing, were applied to
the programs.

10.Discuss in detail about the various techniques used in “black box testing”.

Nov/Dec 2021

Black Box Testing


• Black box testing is also called functional testing. It is testing that ignores the
internal mechanism of a system or component and focuses solely on the outputs
generated in response to selected inputs and execution conditions.
• With black box testing, the software tester does not have access to the source
code itself. The code is considered to be a "big black box" to the tester, who
can't see inside the box.

Black-box testing is based on requirements and functionality, not code.


• Random tests form one category of black-box test. Random values are
generated with a given distribution.
• The expected values are computed independently of the system, and then the
test inputs are applied. A large number of tests must be applied for the results
to be statistically significant, but the tests are easy to generate.
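A sketch of a random test; `array_max` is a hypothetical unit under test, the expected value is computed independently while the inputs are generated, and a fixed seed keeps the run reproducible:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical unit under test. */
int array_max(const int *a, int n) {
    int m = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > m) m = a[i];
    return m;
}

/* Generate seeded pseudo-random inputs (n <= 64) and an independently
   computed expected value, then check the unit under test against it.
   Returns 1 on success. */
int random_test(unsigned seed, int n) {
    int a[64];
    int expected = -1;           /* inputs are in 0..999, so -1 is safe */
    srand(seed);                 /* fixed seed keeps the test reproducible */
    for (int i = 0; i < n; i++) {
        a[i] = rand() % 1000;
        if (a[i] > expected) expected = a[i];  /* independent reference */
    }
    return array_max(a, n) == expected;
}
```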

• Using black box testing techniques, testers examine the high-level design and
the customer requirements specification to plan the test cases to ensure the
code does what it is intended to do.
• Functional testing involves ensuring that the functionality specified in the
requirement specification works. System testing involves putting the new
program in many different environments to ensure the program works in
typical customer environments with various versions and types of operating
systems and/ or applications.

Advantages :

1. Tests the final behavior of the software.


2. Can be written independent of software design.
3. Can be used to test different implementations with minimal changes.

Disadvantages :

1. Doesn't necessarily know the boundary cases.


2. Can be difficult to cover all portions of software implementation.

11. Can you apply code motion to the following example? Explain Nov/Dec 2021

for (i=0; i < N; i++)


for (j=0; j < M; j++)
z[i][j] = a[i]*b[i][j];

❖ Yes: the load of a[i] is invariant in the inner loop, so code motion can move it out of
that loop. The compiler also uses induction variables to help it address the arrays. Let us
rewrite the loop in C using induction variables and pointers.

for (i = 0; i < N; i++)
    for (j = 0; j < M; j++) {
        zbinduct = i*M + j;
        *(zptr + zbinduct) = a[i] * *(bptr + zbinduct);
    }

❖ In the above code, zptr and bptr are pointers to the heads of the z and b arrays and
zbinduct is the shared induction variable. However, we do not need to compute zbinduct afresh
each time. Because we are stepping through the arrays sequentially, we can simply add the
update value to the induction variable:

zbinduct = 0;
for (i = 0; i < N; i++) {
    int ai = a[i];    /* code motion: a[i] is invariant in the inner loop */
    for (j = 0; j < M; j++) {
        *(zptr + zbinduct) = ai * *(bptr + zbinduct);
        zbinduct++;
    }
}
❖ This is a form of strength reduction because we have eliminated the multiplication from
the induction variable computation.
❖ Strength reduction helps us reduce the cost of a loop iteration. Consider this assignment:
❖ y = x * 2;
❖ In integer arithmetic, we can use a left shift rather than a multiplication by 2 (as long as
we properly keep track of overflows). If the shift is faster than the multiply, we probably want
to perform the substitution.
❖ This optimization can often be used with induction variables because loops are often
indexed with simple expressions. Fig 3.19 is shown as example for code motion.

Strength reduction can often be performed with simple substitution rules because there are
relatively few interactions between the possible substitutions.
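A minimal strength-reduction sketch; unsigned arithmetic is used because left-shifting a negative value is undefined behavior in C:

```c
#include <assert.h>

/* Strength reduction: x * 2 replaced by x << 1, since a shift is
   often cheaper than a multiply on simple processors. */
unsigned times_two(unsigned x) {
    return x << 1;   /* equivalent to x * 2 for unsigned x */
}
```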

12. Find the cyclomatic complexity of the CDFG for the code fragment given: Nov/Dec
2021

if (a < b) {
    if (c < d)
        x = 1;
    else
        x = 2;
} else {
    if (e < f)
        x = 3;
    else
        x = 4;
}
❖ Cyclomatic complexity allows us to measure the control complexity of a program. It is an
upper bound on the size of the basis set. If e is the number of edges in the flow graph, n the
number of nodes, and p the number of components in the graph, then the cyclomatic
complexity is given (as shown in Fig 3.20) by

M = e - n + 2p

❖ For a structured program, M can be computed by counting the number of binary decisions in
the flow graph and adding 1. If the CDFG has higher-order branch nodes, add b − 1 for each
b-way branch.
❖ In the example shown in Figure 3.21, the cyclomatic complexity evaluates to 4. Because there
are actually only three distinct paths in the graph, cyclomatic complexity in this case is an
overly conservative bound.
❖ Another way of looking at control flow–oriented testing is to analyze the conditions that control
the conditional statements. Consider the following if statement:

if ((a == b) || (c >= d)) { … }

❖ This complex condition can be exercised in several different ways. If we want to truly exercise
the paths through this condition, it is prudent to exercise the conditional’s elements in ways
related to their own structure, not just the structure of the paths through them.
❖ A simple condition testing strategy is known as branch testing. This strategy requires the true
and false branches of a conditional and every simple condition in the conditional’s expression
to be tested at least once.

13.With a neat flowchart, explain the steps involved in compiling a program. Nov/Dec
2021

• A compiler is a computer program that translates a program written in


a high-level language to the machine language of a computer. The high-level
program is referred to as 'the source code.' A typical computer program
processes some type of input data to produce output data. The compiler is used
to translate source code into machine code or compiled code.
• Fig. 3.22 shows program generation from compilation through loading.
• Assembler converts assembly language programs into object files. Object files
contain a combination of machine instructions, data, and information needed
to place instructions properly in memory.
• Linker merges the object files produced by separate compilation or assembly and
creates an executable file.
• Loader : Part of the OS that brings an executable file residing on disk into
memory and starts it running.
• Label : It is an identifier and optional field. Labels are used extensively in

programs to reduce reliance upon programmers remembering where data or
code is located. The maximum length of a label differs between assemblers.
Some accept labels up to 32 characters long, others only four characters.
• The name of each symbol and its address is stored in a symbol table that is
built during the first pass. The symbol table is built by scanning from the first
instruction to the last.
• Symbol table contains name and address field. Symbol table is built by the
analysis phase. It also contains flags to indicate errors.
• During scanning, the current location in memory is kept in a Program
Location Counter (PLC).
• Memory allocation means the fixing the address of the assembly language
statement. Suppose we want to fix the memory address of M, then also fix the
address of remaining instructions.
• Location counter is a data structure used to implement the memory allocation.
Location Counter (LC) is always made to contain the address of the next
memory word in the target program.
• SYMTAB includes the name and value for each label in the source program,
together with flags to indicate error conditions. It also contains information
about the data area or instruction labeled.

Fig.3.22 Program generation from compilation through loading
• During Pass 1, labels are entered into SYMTAB as they are encountered in
the source program along with their assigned address.
• During Pass 2, symbols used as operands are looked up in SYMTAB to obtain
the addresses to be inserted in the assembled instructions.
• SYMTAB is also organized as a hash table for efficiency of insertion and
retrieval. Entries are rarely deleted from this table.
• It is possible for both passes of the assembler to read the original source
program as input. Information such as location counter values and error flag
statement can be communicated between the two passes. For this reason, Pass
1 usually writes an intermediate file that contains each source statement together
with its assigned address, error indications etc. This file is used as the input to
Pass 2.
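The Pass 1 behavior described above can be sketched as follows; the table sizes, 4-byte instruction width, and function names are illustrative assumptions:

```c
#include <assert.h>
#include <string.h>

/* Pass-1 sketch: scan statements, assign each an address from a
   program location counter (PLC), and record labeled statements in
   SYMTAB. Each statement here is assumed to be one 4-byte instruction. */
#define MAX_SYMS 16

struct sym { char name[8]; int addr; };
struct sym symtab[MAX_SYMS];
int nsyms;

void pass1(const char *labels[], int n, int org) {
    int plc = org;                     /* program location counter */
    nsyms = 0;
    for (int i = 0; i < n; i++) {
        if (labels[i]) {               /* statement carries a label */
            strncpy(symtab[nsyms].name, labels[i], 7);
            symtab[nsyms].name[7] = '\0';
            symtab[nsyms].addr = plc;  /* label gets current address */
            nsyms++;
        }
        plc += 4;                      /* advance past the instruction */
    }
}
```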

14. Find all the def-use pairs for the code fragment given below: Nov/Dec 2021

x=a-b;
y=c-d;
z=e-f;
if (x<10){
q=y+e;
z=e+f;
}
if (z<y)proc();
❖ The def-use pairs in this fragment are:
✓ x: defined in x = a - b; used in the condition (x < 10).
✓ y: defined in y = c - d; used in q = y + e and in the condition (z < y).
✓ z: defined in z = e - f; that definition reaches the use in (z < y) only when x >= 10.
The second definition, z = e + f, reaches (z < y) when x < 10.
✓ q: defined in q = y + e; it is never used in the fragment.
❖ To exercise both def-use pairs for z, one test must supply inputs with x < 10 and
another with x >= 10.
❖ A potential problem with path coverage is that the paths chosen to cover the CDFG may not
have any important relationship to the program’s function. Another testing strategy known as
data flow testing makes use of def-use analysis (short for definition-use analysis). It selects
paths that have some relationship to the program’s function.
❖ The terms def and use come from compilers, which use def-use analysis for optimization. A
variable’s value is defined when an assignment is made to the variable; it is used when it
appears on the right side of an assignment (sometimes called a C-use for computation use) or
in a conditional expression (sometimes called P-use for predicate use). A def-use pair is a
definition of a variable's value and a use of that value. Def-use analysis can be performed
on a program using iterative algorithms.

15. Show the contents of the assembler’s symbol table at the end of code generation for
each line of the following program: Nov/Dec 2021
      ORG 200
p1    ADR r4,a
      LDR r0,[r4]
      ADR r4,e
      LDR r1,[r4]
      ADD r0,r0,r1
      CMP r0,r1
      BNE q1
p2    ADR r4,e
Fig 3.23 shows the assembler's symbol table at the end of code generation. Assuming 4-byte
instructions placed sequentially from ORG 200, label p1 is assigned address 200 and p2
address 228; the symbols a, e, and q1 are entered in the table but their addresses remain
undefined within this fragment.

