Embedded PU Computer
Sensor – It measures a physical quantity and converts it into an electrical signal which can be read by an
observer or by an electronic instrument such as an A-D converter. A sensor stores the measured quantity in
memory.
A-D Converter – An analog-to-digital converter converts the analog signal sent by the sensor into a digital
signal.
Processor & ASICs – Processors process the data to measure the output and store it to the memory.
D-A Converter – A digital-to-analog converter converts the digital data fed by the processor to analog
data.
Actuator – An actuator compares the output given by the D-A Converter to the actual (expected) output
stored in it and stores the approved output.
Power Supply
Processor
Memory
Timers
Serial communication ports
Input/Output circuits
System application specific circuits
The CU includes a fetch unit for fetching instructions from the memory. The EU has circuits that
implement the instructions pertaining to data transfer operation and data conversion from one
form to another.
The EU includes the Arithmetic and Logical Unit (ALU) and also the circuits that execute
instructions for a program control task such as interrupt, or jump to another set of instructions.
A processor repeatedly runs the fetch–execute cycle, executing the instructions in the same sequence as they
are fetched from memory.
General Purpose Processor (GPP): A GPP is used for processing signals from input to output by
controlling the operation of the system bus, address bus and data bus inside an embedded system. It
provides hardwired circuits for memory management, i.e. it supports on-chip DMA and
cache. It contains the common circuitry for arithmetic as well as logical computations, i.e. it
includes a powerful ALU. It uses a large instruction set and a pipelined structure for instruction
execution to speed up computation. Types of general purpose
processors are:
Microprocessor
Microcontroller
Embedded Processor
Digital Signal Processor
Microprocessor
A microprocessor is a single VLSI chip having a CPU. In addition, it may also have other units
such as caches, a floating point arithmetic unit, and pipelining units that help in faster
processing of instructions.
Microcontroller
A microcontroller is a single-chip VLSI unit (also called microcomputer) which, although having
limited computational capabilities, possesses enhanced input/output capability and a number of
on-chip functional units.
Microcontrollers are particularly used in embedded systems for real-time control applications
with on-chip program memory and devices. Some of the examples are: Intel 8032, 8051, 8052,
AVR ATMEGA 328 etc.
Digital Signal Processor
A digital signal processor (DSP) is an integrated circuit designed for high-speed data
manipulations, and is used in audio, communications, image manipulation, and other data
acquisition and data-control applications. For example: PAC, the TMS320xx series, ZedBoard, etc.
The constraints in the embedded systems design are imposed by external as well as internal
specifications. Design metrics are introduced to measure the cost function taking into account the
technical as well as economic considerations.
A Design Metric is a measurable feature of the system's performance, cost, time to
implementation, safety, etc. Most of these are conflicting requirements, i.e. optimizing one does
not optimize the others: e.g. a cheaper processor may have poor performance as far as speed and
throughput are concerned. The following metrics are generally taken into account while designing
embedded systems:
NRE (Non-Recurring Engineering) Cost:
It is the one-time cost of designing the system. Once the system is designed, any number of units can
be manufactured without incurring any additional design cost; hence the term non-recurring.
Unit Cost:
The monetary cost of manufacturing each copy of the system, excluding NRE cost.
Size: The physical space required by the system, often measured in bytes for software, and gates
or transistors for hardware.
Performance:
The execution time or throughput of the system.
Power Consumption:
It is the amount of power consumed by the system, which may determine the lifetime of a battery,
or the cooling requirements of the IC, since more power means more heat.
Flexibility:
The ability to change the functionality of the system without incurring heavy NRE cost. Software
is typically considered very flexible.
Time-to-prototype:
The time needed to build a working version of the system, which may be bigger or more expensive
than the final system implementation, but it can be used to verify the system’s usefulness and
correctness and to refine the system’s functionality.
Time-to-market:
The time required to develop a system to the point that it can be released and sold to customers.
The main contributors are design time, manufacturing time, and testing time. This metric has
become especially demanding in recent years. Introducing an embedded system to the marketplace
early can make a big difference in the system’s profitability.
Maintainability:
It is the ability to modify the system after its initial release, especially by designers who did not
originally design the system.
Correctness:
This is the measure of the confidence that we have implemented the system’s functionality
correctly. We can check the functionality throughout the process of designing the system, and we
can insert test circuitry to check that manufacturing was correct.
A single purpose processor is a digital circuit designed to execute exactly one program. An
embedded system designer may obtain several benefits by choosing to use a custom single purpose
processor to implement a computation task.
A basic processor consists of a controller and a datapath. The datapath stores and manipulates a
system's data. The datapath contains register units, functional units and connection units like wires
and multiplexers. The datapath can be configured to read data from particular registers, feed that
data through functional units configured to carry out particular operations like add or shift, and
store the results back into particular registers. The controller carries out such
configuration of the datapath. It sets the datapath control inputs, like register load and multiplexer
select signals, of the register units, functional units and connection units to obtain the desired
configuration at a particular time.
It monitors external control inputs as well as data path control outputs, known as status signals,
coming from functional units, and it sets external control outputs as well. The digital systems
design techniques such as combinational and sequential logic design including those of
synchronous and asynchronous design can be applied to build a CONTROLLER and a DATA
PATH.
Performance may be faster, due to fewer clock cycles resulting from a customized data
path and due to shorter clock cycles resulting from the simpler controller logic.
Size may be smaller due to a simpler datapath and no program memory.
Power consumption may be less due to more efficient computation.
However, cost could be higher because of high NRE cost. Also time to market may be longer.
Embedded systems have different applications. A few select applications of embedded systems are
smart cards, telecommunications, satellites, missiles, digital consumer electronics, computer
networking, etc.
When the software development cycle ends, the cycle begins for integrating the
software into the hardware, at the time when the system is designed.
Both cycles proceed concurrently when co-designing a time-critical, sophisticated system.
Combinational logic refers to circuits whose output is a function of the present value of the inputs only. As
soon as inputs are changed, the information about the previous inputs is lost, that is, combinational logic
circuits have no memory.
Sequential logic circuits are those whose outputs are also dependent upon past inputs, and hence outputs.
In other words the output of a sequential circuit may depend upon its previous outputs and so in effect has
some form of "memory". The mathematical model of a sequential circuit is usually referred to as
a sequential machine. The general block diagram of a sequential switching circuit is shown below:
Transistor:
A transistor is the basic electrical component of a digital system. It acts as a simple ON/OFF switch.
Transistors are abstracted to construct logic gates at a higher level. One popular type of transistor in
combinational circuit design is the complementary metal-oxide-semiconductor (CMOS) transistor, and the
corresponding technology is called CMOS technology. CMOS transistors are of two types:
1. nMOS Transistor
2. pMOS Transistor
We can apply low or high levels to the gate of a CMOS transistor. We refer to logic levels, i.e. logic 0 is
0 V and logic 1 is 5 V.
nMOS Transistor:
- When logic 1 is applied to the gate, the transistor conducts and current flows from source to drain.
- When logic 0 is applied to the gate, the transistor does not conduct, and a high resistance of
about 10 MΩ develops between source and drain.
pMOS Transistor:
- When logic 0 is applied to the gate, the transistor conducts and current flows from source to drain.
- When logic 1 is applied to the gate, the transistor does not conduct, and a high resistance of
about 10 MΩ develops between source and drain.
CMOS Inverter (NOT gate):
- When the input "x" is logic 0, the upper (pMOS) transistor conducts and the lower (nMOS) transistor
does not conduct. Thus logic 1 appears at the output "F".
- Similarly, when the input "x" is logic 1, the upper transistor does not conduct and the lower transistor
conducts. Thus logic 0 appears at the output "F".
Logic Gates:
The digital logic gate is the basic building block from which all digital electronic circuits and
microprocessor-based systems are constructed. Basic digital logic gates perform the logical operations
AND, OR and NOT on binary values.
Digital logic gates may have more than one input, (X, Y, Z, etc.) but generally only have one digital
output, (F). Individual logic gates can be connected together to form combinational or sequential circuits,
or larger logic gate functions.
Combinational logic refers to circuits whose output is a function of the present value of the inputs only.
Combinational logic circuits have no memory. The design procedure includes following steps:
- Determine the number of inputs and outputs of the system from the problem specification.
- Derive the truth table for each output from its relationship with the inputs.
- Simplify the Boolean expressions using K-maps and obtain the logic equations.
- Draw the logic diagram (sharing common gates).
- Simulate the circuit for design verification.
- Optimize the circuit for area and/or performance using different optimization metrics.
- Re-simulate and verify the optimized design.
For example: a combinational circuit that has three inputs, say A, B and C, and gives a single output X
as logic 1 if the input bits contain more 1's than 0's.
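As a quick behavioral sketch of this example (using the majority simplification X = AB + BC + AC, which the K-map step would produce), the circuit can be modeled in C:

```c
/* Majority-of-three circuit: output X is logic 1 when the inputs
 * A, B, C contain more 1s than 0s. K-map simplification yields
 * X = AB + BC + AC, implemented with bitwise AND/OR. */
static int majority3(int a, int b, int c) {
    return (a & b) | (b & c) | (a & c);
}
```

Each & corresponds to a 2-input AND gate and each | to an OR gate, so the model mirrors the logic diagram gate for gate.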
All digital circuits can be designed by the procedure stated above, but doing so is difficult and complex,
so higher-level abstractions of combinational logic devices are used in register-transfer-level design. Some
of the components are:
1. Multiplexer:
A digital multiplexer is a combinational circuit that selects binary information from one of many input lines
and directs it to a single output line.
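A behavioral sketch of a 4-to-1 multiplexer (a model of the component, not a gate-level design): the 2-bit select input routes one of the four data inputs to the single output.

```c
/* 4-to-1 multiplexer: select s (0..3) chooses which data input
 * appears at the output. */
static int mux4(int d0, int d1, int d2, int d3, int s) {
    switch (s & 0x3) {
        case 0:  return d0;
        case 1:  return d1;
        case 2:  return d2;
        default: return d3;
    }
}
```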
2. Decoder
Discrete quantities of information are represented in digital systems with binary codes. A binary code
of n bits is capable of representing up to 2n distinct elements of the coded information. Decoder is a
combinational circuit that converts binary information from n input lines to a maximum of 2n unique
output lines.
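For instance, a 2-to-4 decoder (n = 2 inputs, 2^2 = 4 outputs) can be sketched as a one-hot function:

```c
/* 2-to-4 decoder: the 2-bit input selects exactly one of the four
 * output lines; the result is a one-hot 4-bit pattern. */
static unsigned decode2to4(unsigned in) {
    return 1u << (in & 0x3);   /* output line 'in' goes high */
}
```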
3. Adder:
- An n-bit adder adds two n-bit operands, say A and B, at a time and generates a sum and a carry as
output.
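A 4-bit ripple-carry adder, built from full adders whose equations are sum_i = a_i XOR b_i XOR c_i and c_(i+1) = a_i b_i + b_i c_i + a_i c_i, can be sketched behaviorally as:

```c
#include <stdint.h>

/* 4-bit ripple-carry adder: each loop iteration models one full
 * adder, with the carry propagating ("rippling") to the next stage. */
static uint8_t add4(uint8_t a, uint8_t b, int *carry_out) {
    uint8_t sum = 0;
    int c = 0;                                   /* carry into stage 0 */
    for (int i = 0; i < 4; i++) {
        int ai = (a >> i) & 1, bi = (b >> i) & 1;
        sum |= (uint8_t)((ai ^ bi ^ c) << i);    /* sum bit of this stage */
        c = (ai & bi) | (bi & c) | (ai & c);     /* carry to next stage */
    }
    *carry_out = c;
    return sum;
}
```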
4. Comparator:
- An n-bit comparator compares two n-bit inputs A and B and indicates whether A is less than, equal
to, or greater than B.
5. ALU:
- An n-bit ALU can perform arithmetic and logical calculations on n-bit data inputs A and B.
- The selection input "S" determines the operation that is going to be performed.
6. Shifter:
- An n-bit shifter can shift the n-bit data stored in an n-bit register toward the right or the left.
Asynchronous sequential circuits change their state and output values whenever a change in input values
occurs. Synchronous sequential circuits change their states and output values at fixed points of time, which
are specified by the rising or falling edge of a free-running clock signal. The clock period is the time between
successive transitions in the same direction, i.e., between two rising or two falling edges. Clock frequency
= 1/clock period. The clock width is the time during which the value of the clock signal is equal to 1. The duty
cycle is the ratio of clock width to clock period. A circuit is active high if state changes occur at the clock's
rising edge or during the clock width, and active low if state changes occur at the clock's falling edge. Latches
and flip-flops are the basic storage elements that can store one bit of information. The basic components with
their properties are shown below.
The characteristic equation is just the functional expressions derived from the characteristic (truth) table.
It formally describes the functional behavior of a latch or flip-flop. They specify the flip-flop’s next state
as a function of its current state and inputs.
The excitation table gives the value of the flip-flop inputs that are necessary to change the flip-flop’s
present state to the desired next state after the rising edge of the clock signal. It is obtained from the
characteristic table by transposing input and output columns. It is used during the synthesis of sequential
circuits.
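For reference, the standard characteristic equations of the common flip-flop types (writing Q for the present state and Q+ for the next state) are:

```latex
\begin{aligned}
\text{SR:} \quad & Q^{+} = S + \overline{R}\,Q && (SR = 0 \text{ required})\\
\text{JK:} \quad & Q^{+} = J\overline{Q} + \overline{K}\,Q\\
\text{D:}  \quad & Q^{+} = D\\
\text{T:}  \quad & Q^{+} = T \oplus Q = T\overline{Q} + \overline{T}\,Q
\end{aligned}
```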
- A register stores n bits from its input "I", with those n bits appearing at the output "Q".
Shift Register:
- A shift register stores n bits, but they cannot be stored in parallel. Instead they must be shifted
into the register, i.e. one bit per clock cycle.
- The register has at least a data input "I" that holds a single bit at a time, and a control input
shift that is used to insert data.
- On the rising clock edge, when shift equals 1, the data bit on "I" is inserted into the nth
bit position of the register, while the nth bit moves into the (n-1)th position, the (n-1)th into the
(n-2)th, and so on.
- The first bit is usually shifted out at the output end "Q".
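The bullets above can be sketched as a behavioral C model of a 4-bit shift register (serial input at the MSB end, serial output Q at the LSB end):

```c
#include <stdint.h>

/* 4-bit shift register: on each clock with shift = 1, the input bit
 * enters at bit position 3 while every stored bit moves one place
 * toward bit 0, which is shifted out as the output Q. */
typedef struct { uint8_t bits; } shiftreg4;

static int shiftreg_clock(shiftreg4 *r, int in) {
    int q = r->bits & 1;                                   /* bit leaving at Q */
    r->bits = (uint8_t)(((r->bits >> 1) | ((in & 1) << 3)) & 0xF);
    return q;
}
```

A bit shifted in emerges at Q after n = 4 further clocks, which is why the text says the bits cannot be stored (or read) in parallel.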
Counter:
- A counter is a register that can also increment, meaning add the binary value 1 to its stored
binary value. A counter has a control input clear that resets all bits of the register to the value "0",
and a count input that increments the stored value by 1 on each rising edge of the clock.
- A counter may also have a load input to load n-bit data in parallel.
- Commonly a counter operates in both modes, up and down. The up counter increments the
value stored in the register by 1 and the down counter decrements the contents of the register by 1,
up to a defined limit. A mod-M counter counts from 0 to M-1 or from M-1 to 0. For this it requires
another control input, Count UP/DOWN.
- These control inputs may be
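A behavioral sketch of a mod-16 up/down counter with the clear, load and count controls described above (all treated as synchronous here, for simplicity):

```c
#include <stdint.h>

/* Mod-16 up/down counter: clear has highest priority, then parallel
 * load, then counting; counting wraps modulo 16. */
typedef struct { uint8_t value; } counter4;

static void counter_clock(counter4 *c, int clear, int load,
                          uint8_t d, int up) {
    if (clear)     c->value = 0;                                /* reset to 0 */
    else if (load) c->value = d & 0xF;                          /* parallel load */
    else if (up)   c->value = (uint8_t)((c->value + 1) & 0xF);  /* count up */
    else           c->value = (uint8_t)((c->value - 1) & 0xF);  /* count down */
}
```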
The design of a clocked sequential circuit starts from a set of specifications (state table) and ends in a
logic diagram or a list of Boolean functions from which the logic diagram can be obtained. The procedure
can be summarized by a list of consecutive recommended steps:
1. State the word description of the circuit behavior. It may be a state diagram, a timing
diagram, or other pertinent information.
2. From the given information about the circuit, obtain the state table.
3. Apply state-reduction methods if the sequential circuit can be characterized by input-output
relationships independent of the number of states.
4. Assign binary values to each state if the state table obtained in step 2 or 3 contains letter
symbols.
5. Determine the number of flip-flops needed and assign a letter symbol to each.
6. Choose the type of flip-flop to be used.
7. From the state table, derive the circuit excitation and output tables.
8. Using the map or any other simplification method, derive the circuit output functions and the
flip-flop input functions.
9. Draw the logic diagram.
Design Examples:
1. Design a combinational circuit for outputs y and z, where y is 1 when a is 1 or when b and c are
both 1, and z is 1 if b or c is 1, but not when a, b and c are all 1.
Truth Table:
Logic Diagram:
Truth Table:
K-map Reduction:
Logic Diagram:
State diagram
Logic diagram
A sequence recognizer is a special kind of sequential circuit that looks for a special bit pattern in some
input. The recognizer circuit has only one input, X.
There is one output, Z, which is 1 when the desired pattern is found. Our example will detect the bit
pattern "1001".
A sequential circuit is required because the circuit has to "remember" the inputs from previous clock
cycles, in order to determine whether or not a match was found. The corresponding state diagram is:
State Diagram
State assignments: Let S0 = 00, S1 = 01, S2 = 10, S3 = 11
Excitation table:
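Assuming overlapping matches are allowed (the final 1 of one occurrence may begin the next), the "1001" recognizer can be sketched as a Mealy machine in C, with S0–S3 meaning "seen nothing", "1", "10" and "100" respectively:

```c
/* "1001" sequence recognizer (Mealy): returns the output Z for the
 * current input bit x and advances the state. */
enum rec_state { S0, S1, S2, S3 };

static int recognizer_step(enum rec_state *s, int x) {
    int z = 0;
    switch (*s) {
        case S0: *s = x ? S1 : S0; break;   /* waiting for the first 1 */
        case S1: *s = x ? S1 : S2; break;   /* have "1"  */
        case S2: *s = x ? S1 : S3; break;   /* have "10"; a 1 restarts at "1" */
        case S3:                            /* have "100" */
            if (x) { z = 1; *s = S1; }      /* "1001" found; last 1 may overlap */
            else   { *s = S0; }
            break;
    }
    return z;
}
```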
2. Design a sequence detector that produces a true output whenever it detects the sequence 010 at its
input.
- The datapath stores and manipulates the system's data. Examples of data in an embedded system
include binary numbers representing external conditions like temperature or speed, the characters
to be displayed on a screen, or a digitized photographic image to be stored and compressed.
- The datapath consists of register units, functional units, and connection units like wires and multiplexers.
- The datapath can be configured to read data from a register, feed that data into a functional unit
configured to carry out an operation like add, subtract, shift, etc., and store the result back into the
designated register.
- A controller carries out the configuration of the datapath. It sets the datapath control signals, like the load
input to a register, the select inputs for selecting a register, and the operation selection for the functional
and connection units, to obtain the desired configuration at a particular instant of time.
- It monitors the external control inputs as well as the datapath control outputs, known as status signals,
coming from the functional units, and it sets the external control outputs as well.
- Combinational or sequential logic design can be applied to design the controller and the datapath.
Statements:
Statements are used to implement the logical operations, data flow and control flow during custom
single-purpose processor design. Some of the useful statements are:
1. Assignment statements
2. Loop statements
Assignment Statements:
Assignment statements are used to initialize a variable with fixed data or to transfer
data from one variable to another. For example a = 5, b = 10, d = a, etc. Each assignment
creates a single state, and control then passes to the next statement.
Loop Statements:
Branch Statements:
Steps:
1. Draw a black box that shows the abstract view of the implementation logic.
2. Derive the algorithms to implement the functionality of system.
3. Derive the state diagram to implement the operational logic in terms of control flow, dataflow and
applicable logic using different statements.
4. Design the data path, functional unit as well as controller to implement the complex logic specified
in step 2.
For example: design of a custom single-purpose processor to find the GCD of two given numbers.
Suppose x_i and y_i are the two input numbers, go_i is the control input and d_o is the GCD of x_i and y_i;
the black box, functionality and state diagram are then designed as below:
Black box: a module GCD with inputs go_i, x_i and y_i, and output d_o.

Functionality (each numbered statement becomes one or more states in the FSMD; the state diagram
also contains join states 1-J, 2-J, 5-J and 6-J, and transition conditions such as go_i, !go_i,
x != y, !(x != y), x < y and !(x < y)):

0: int x, y;
1: while (1) {
2:    while (!go_i);
3:    x = x_i;
4:    y = y_i;
5:    while (x != y) {
6:       if (x < y)
7:          y = y - x;
         else
8:          x = x - y;
      }
9:    d_o = x;
   }
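The GCD functionality can be checked directly as ordinary C, with the go_i handshake omitted and x_i/y_i passed as plain arguments:

```c
/* Subtraction-based GCD, mirroring statements 5-9 of the FSMD. */
static int gcd(int x, int y) {
    while (x != y) {            /* state 5: loop until equal */
        if (x < y) y = y - x;   /* state 7 */
        else       x = x - y;   /* state 8 */
    }
    return x;                   /* state 9: d_o = x */
}
```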
State register (inputs I3 I2 I1 I0, outputs Q3 Q2 Q1 Q0)
A finite state machine uses a number of states, and these can often be reduced without altering the operation.
The machine can be optimized by optimizing different parameters as below:
- Original program
- FSMD
- Data path
- FSM for controller
1. Optimization of Original Program:
The program can be optimized by optimizing the number of computations, the size of variables, the
time and space complexity, and the operations used (e.g. multiplication and division may have higher
cost). For example:
2. Optimization of FSMD:
The FSMD can be optimized using concepts such as state merging, state separation and scheduling.
States that hold constant values or are independent of changes can be merged, and are called merged
states. A state with complex logic can be replaced by a number of sub-operations with simpler
logic, which reduces the hardware complexity; this is called state separation. By optimizing the
schedule, we can also optimize the FSMD. For example:
3. Optimization of Datapath:
To optimize the datapath, a shared functional unit can be used so that the datapath is optimized
and able to perform a variety of operations. For this purpose we can use a shared ALU circuit that
supports a variety of state equations as well as state operations. For example: for the operations X-Y
and Y-X, instead of using two subtractors we can generate the result with one.
4. Optimization of FSM for Controller:
The FSM of the controller can be optimized by reducing the states and using efficient state encoding.
An n-bit encoding can define 2^n different states, and those codes can be assigned to the states in many
different combinations. Equivalent states can be merged into a single state, providing an optimized FSM.
The general purpose processor is sometimes called a central processing unit (CPU) or a microprocessor. It
consists of a datapath and a control unit, tightly coupled with memory, as shown in the figure above. The
basic general purpose architecture consists of:
- Datapath
- Control unit
- Memory
Data path:
The datapath consists of the circuitry for transferring data from one place to another and storing temporary
data. The datapath contains an ALU capable of transforming data through operations like addition,
subtraction, bitwise OR, AND, etc. The ALU also generates status signals, often stored in a status register.
These status bit conditions are known as flags. The flags may be zero, sign, carry, overflow, etc. The datapath
also contains registers that store data temporarily during ALU operations. The temporary data includes:
The capacity of a processor is measured by the bandwidth of its datapath, i.e. its data-carrying capacity. An
n-bit processor consists of:
Control Unit
It consists of circuitry that generates the control signals to read the instructions stored in memory, execute
them, transfer data through the datapath, store results in memory, and perform I/O operations. A register
called the program counter (PC) is used to sequence the program, i.e. it points to the address of the next
instruction to be fetched and executed. Another register, called the instruction register (IR), is used to hold
the instruction read from memory. A register called the address register (AR) is used to store the memory
address during memory read/write operations. The control unit has a controller, consisting of a status register
plus next-state and control logic. It controls the data flow on the datapath, and such flow includes:
A memory with an m-bit address has an address space of 2^m, and the controller goes through the following
operations to execute an instruction.
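A minimal sketch of this fetch-decode-execute cycle, for a hypothetical accumulator machine (the opcodes LOAD, ADD and HALT are invented for illustration, not from any real instruction set):

```c
#include <stdint.h>

/* Hypothetical accumulator machine: each iteration fetches the byte
 * at PC into IR, decodes its opcode, and executes it. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };

static int run(const uint8_t *mem, int len) {
    int pc = 0, acc = 0;
    while (pc < len) {
        uint8_t ir = mem[pc++];                     /* fetch: IR <- mem[PC]; PC++ */
        switch (ir) {                               /* decode + execute */
            case OP_LOAD: acc = mem[pc++]; break;   /* fetch operand, load ACC */
            case OP_ADD:  acc += mem[pc++]; break;  /* fetch operand, add to ACC */
            case OP_HALT: return acc;               /* stop; ACC holds the result */
        }
    }
    return acc;
}
```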
Memory:
Registers are used as short-term storage, whereas memory is used as mid-term and long-term storage.
Memory can be classified as:
- Program memory
- Data memory
The program memory is used to store the sequence of instructions, called the program, which is used to
achieve a given functionality, whereas the data memory is used to store the data, which represents the values
input, output, and transformed by the program. We can store the data and the program together or separately.
The memory architecture follows one of the following two models:
- Princeton architecture
- Harvard architecture
The Princeton architecture shares a common memory space for the data and the program and requires a
single set of connections to the hardware. The Harvard architecture uses separate memory spaces for the
program and the data and requires separate connections. The 8051/52 microcontrollers follow this model.
Instruction Execution:
Fetch Instruction (FI): This task reads the instruction from the memory location pointed to by the PC
and loads it into the instruction register.
Decode Instruction (DI): In this phase, the instruction is decoded to separate the operand references
from the operation code representing the particular operation, such as ADD, MOV, AND,
SUB, etc.
Fetch Operand (FO): In this stage, the operand is read from the memory location given by the
effective address (EA). EA calculation is needed for indirect addressing.
Execute Instruction (EI): In this phase, the instruction is executed in accordance with its
opcode and operands, and the result is generated.
Store Result (SR): The result is stored at the particular destination, which may be a register or
memory.
Pipeline for Instruction Execution: Pipelining is a common technique to increase the instruction
throughput of a microprocessor. Pipelining is easily understood by taking the example of washing and
drying eight dishes, as below.
The instruction pipeline can be organized to execute instructions in the five independent stages
specified above: FI, DI, FO, EI and SR. The instruction pipeline structure is constructed as:
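Under the textbook idealization (k equal stages, one clock per stage, no stalls), executing n instructions takes n·k clocks without pipelining but only k + n - 1 clocks with it, giving:

```latex
\text{Speedup} = \frac{n\,k}{k + n - 1} \;\longrightarrow\; k \quad (n \to \infty)
```

For the five stages FI, DI, FO, EI and SR (k = 5), eight instructions need 5 + 8 - 1 = 12 clocks instead of 40.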
Programmer View:
A programmer writes the program instructions to carry out the desired functionality on a GPP. For this
purpose, the programmer does not need to know the detailed structure of the processor, but he/she does
need to know how instructions are executed. There are three levels of programming:
1. Assembly Level
2. Structure level
3. Machine level
Assembly Level: An assembly language is a low-level programming language for microprocessors and
other programmable devices. It is not just a single language, but rather a group of languages. An assembly
language implements a symbolic representation of the machine code needed to program a given CPU
architecture. Assembly language is also known as assembly code. The term is often also used synonymously
with 2GL. The assembler is needed to convert the assembly code into machine code.
Machine Level: Sometimes referred to as machine code or object code, machine language is a collection
of binary digits or bits that the computer reads and interprets. Machine language is the only language a
computer is capable of understanding.
Operating System:
An Operating System (OS) is an interface between a computer user and computer hardware. An operating
system is a software which performs all the basic tasks like file management, memory management, process
management, handling input and output, and controlling peripheral devices such as disk drives and printers.
An operating system is a program that acts as an interface between the user and the computer hardware and
controls the execution of all kinds of programs.
The system call provides an interface to the operating system services. Application developers often do not
have direct access to the system calls, but can access them through an application programming interface
(API). The functions that are included in the API invoke the actual system calls. By using the API, certain
benefits can be gained:
Portability: as long as a system supports an API, any program using that API can compile and run.
Development Environment:
The development environment comprises the general software tools that are used for the design, testing,
validation and verification of embedded system software. The software is developed on a general-purpose
processor, called the development processor, and then it is burned into the target embedded processor.
- Target hardware platform consists of target hardware (processor, memory, I/O) and Runtime
environment (Operating System/Kernel).
- Target hardware platform contains only what is needed for final deployment.
- Target hardware platform does not contain development tools (editor, compiler, debugger).
- Development platform, called the Host Computer, is typically a general purpose computer.
- Host computer runs compiler, assembler, linker, locator to create a binary image that will run on
the Target embedded system.
Editor
A source code editor is a text editor program designed specifically for editing source code to control
embedded systems. It may be a standalone application or it may be built into an integrated development
environment (IDE). Source code editors may have features specifically designed to simplify and speed
up input of source code, such as syntax highlighting and auto complete functionality. These features ease
the development of code.
Compiler
A compiler is a computer program that translates the source code into computer language (object code).
Commonly the output has a form suitable for processing by other programs (e.g., a linker), but it may be a
human readable text file. A compiler translates source code from a high level language to a lower level
language (e.g., assembly language or machine language). The most common reason for wanting to translate
source code is to create a program that can be executed on a computer or on an embedded system.
Linker
A linker or link editor is a program that takes one or more objects generated by compilers and assembles
them into a single executable program or a library that can later be linked against. All of the object files
resulting from compilation must be combined in a special way before the locator will produce an
output file that contains a binary image that can be loaded into the target ROM. A commonly used
linker/locator for embedded systems is ld (GNU).
Debugger
A debugger is a computer program that is used to test and debug other programs. It is a piece of software
running on the PC, which has to be tightly integrated with the emulator that you use to validate your code.
A Debugger allows you to download your code to the emulator's memory and then control all of the
functions of the emulator from a PC.
The process of converting the source code into an object file is called compiling. Machine-language
instructions are specific to a particular processor, and development platforms differ. A compiler
that runs on a computer platform and produces code for that same computer platform is called a native
compiler. A compiler that runs on one computer platform and produces code for another computer
platform is called a cross-compiler.
In some cases a compiler is not used; an assembler or an interpreter is used instead.
The Linker combines object files (from compiler) and resolves variable and function references and
corresponding process is called linking.
- A Locator is the tool that performs the conversion from a relocatable program to an executable binary
image; the corresponding procedure is called locating.
- The Locator assigns physical memory addresses to code and data sections within the relocatable
program
- The Locator produces a binary memory image that can be loaded into the target ROM
- In contrast, on general purpose computers, the operating system assigns the addresses at load time
Once a program has been successfully compiled, linked, and located, it must be moved to the target
platform. Executable binary image is transferred and loaded into a memory device on target board.
Debugging Tools:
When it comes to debugging your code and testing your application, there are several different tools you
can utilize that differ greatly in terms of development time spent and debugging features available. In this
section we take a look at simulators and emulators.
Simulators try to model the behavior of the complete microcontroller in software. Some simulators go even
a step further and include the whole system (simulation of peripherals outside of the microcontroller). No
matter how fast you’re PC, there is no simulator on the market that can actually simulate a microcontroller's
behavior in real-time. Simulating external events can become a time-consuming exercise, as you have to
manually create "stimulus" files that tell the simulator what external waveforms to expect on which
microcontroller pin. A simulator can also not talk to your target system, so functions that rely on external
components are difficult to verify. For that reason simulators are best suited to test algorithms that run
completely within the microcontroller.
An emulator is a piece of hardware that ideally behaves exactly like the real microcontroller chip with all
its integrated functionality. It is the most powerful debugging tool of all. A microcontroller's functions are
emulated in real-time.
Design
Functional Requirements
Architecture Design
Hardware Modeling
- Schematics design
- PCB Layout Design
- Re-engineering and repairing
- Samples & Prototypes Assembly
Prototyping
The FSMD of a general-purpose processor requires a number of sub-operations to execute an instruction. The data flow to execute the instruction set below is constructed as:

Instruction    Operation Code
Load A         001
Load B         010
A OR B         011
A AND B        100
A + B          101
A - B          110
A + 1          111
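The instruction table above can be sketched as a decoder. The operation names and 3-bit opcodes follow the table; the two-register (A, B) model and the 8-bit datapath are illustrative assumptions.

```python
# Minimal sketch of the FSMD instruction decoder from the table above.
OPCODES = {
    0b001: "LOAD_A",
    0b010: "LOAD_B",
    0b011: "OR",
    0b100: "AND",
    0b101: "ADD",
    0b110: "SUB",
    0b111: "INC_A",
}

def execute(opcode, a, b, operand=0):
    """Return the (A, B) register pair after executing one instruction."""
    op = OPCODES[opcode]
    if op == "LOAD_A":
        return operand, b
    if op == "LOAD_B":
        return a, operand
    if op == "OR":
        return a | b, b
    if op == "AND":
        return a & b, b
    if op == "ADD":
        return (a + b) & 0xFF, b   # assume an 8-bit datapath
    if op == "SUB":
        return (a - b) & 0xFF, b
    if op == "INC_A":
        return (a + 1) & 0xFF, b
```

Each opcode maps to one register-transfer step; a real FSMD would break these into fetch, decode, and execute states.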
In a memory read operation the CPU loads the address onto the address bus. In most cases these lines are fed to a decoder which selects the proper memory location. The CPU then sends a read control signal, and the data stored in that location is transferred to the processor via the data lines. In a memory write operation, after the address is loaded the CPU sends the write control signal followed by the data to the requested memory location. Memory can be classified in various ways, e.g. based on location, power consumption, way of data storage, etc. At the basic level, memory can be classified as:
1. Processor Memory (Register Array)
2. Internal on-chip Memory
3. Primary Memory
4. Cache Memory
5. Secondary Memory
Processor Memory (Register Array)
Most processors have some registers associated with the arithmetic logic unit. They store the
operands and the result of an instruction. The data transfer rates are much faster, without needing
any additional clock cycles. The number of registers varies from processor to processor; the larger
the number, the faster the instruction execution. But the complexity of the architecture puts a
limit on the amount of processor memory.
Memory Specifications
The specification of a typical memory is as follows
- The storage capacity: The number of bits/bytes or words it can store.
- The memory access time (read access and write access): How long the memory takes to
load the data onto its data lines after it has been addressed, or how fast it can store the data
supplied through its data lines. The reciprocal of the memory access time is a measure of
the memory's speed.
- The Power Consumption and Voltage Levels: Power consumption is a major factor in
embedded systems. The lower the power consumption, the higher the packing density.
- Size: Size is directly related to the power consumption and data storage capacity.
There are two important specifications for the Memory as far as Real Time Embedded Systems are
concerned.
- Write Ability
- Storage Permanence
Write Ability:
It is the manner and speed with which a particular memory can be written. Write ability refers to the
process of putting bits in specific locations of the memory, and the ease and speed with which that process
can be completed. The writing process may be time-consuming, as in ROMs, or fast, as in registers.
Example:
The figure shows the structure of a ROM. Horizontal lines represent the words; the vertical lines give out
data. These lines are connected only at the circles. If the address input is 010, the decoder sets word line 2 to 1.
The data lines Q3 and Q1 are set to 1 because there is a "programmed" connection with word 2's line.
Word 2 is not connected to data lines Q2 and Q0, so the output is 1010.
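The ROM read just described can be sketched as a table lookup: the decoder selects one word line and the data lines Q3..Q0 carry the programmed bits. Only word 2's contents (1010) come from the text; the other word contents are illustrative assumptions.

```python
# Sketch of a mask-ROM read: address -> decoder -> word line -> Q3..Q0.
ROM = {
    0b000: 0b0000,   # assumed contents
    0b001: 0b1111,   # assumed contents
    0b010: 0b1010,   # word 2: "programmed" connections on Q3 and Q1 only
}

def rom_read(address):
    """Decode the address and return the 4-bit word driven onto Q3..Q0."""
    return ROM[address]
```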
Mask-programmed ROM
The connections are "programmed" at fabrication using a set of masks. It can be written only once (in the
factory), but it stores data forever; thus it has the highest storage permanence. The bits never change
unless damaged. These are typically used for the final design of high-volume systems.
OTP ROM: One-time programmable ROM
The connections are "programmed" after manufacture by the user. The user provides a file of the desired
contents of the ROM, which is input to a machine called a ROM programmer. Each programmable connection
is a fuse; the ROM programmer blows the fuses where connections should not exist.
- Very low write ability: typically written only once and requires ROM programmer device
- Very high storage permanence: bits don’t change unless reconnected to programmer and
more fuses blown
- Commonly used in final products: cheaper, harder to inadvertently modify
RAM variations
- PSRAM: Pseudo-static RAM
DRAM with built-in memory refresh controller
Popular low-cost high-density alternative to SRAM
- NVRAM: Nonvolatile RAM
Holds data after external power removed
Battery-backed RAM
o SRAM with own permanently connected battery
o writes as fast as reads
o no limit on number of writes unlike nonvolatile ROM-based
memory
SRAM with EEPROM or flash: stores the complete RAM contents on
EEPROM or flash before power is removed
Example: HM6264 & 27C256 RAM/ROM devices
- Low-cost, low-capacity memory devices
- Commonly used in 8-bit microcontroller-based embedded systems
- First two numeric digits indicate device type
RAM: 62
ROM: 27
- Subsequent digits indicate capacity in kilobits
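The part-number convention above can be sketched as a small decoder. The 62/27 prefixes come from the text; treating the numeric part as a plain string (e.g. "6264", "27256") is a simplifying assumption.

```python
# Sketch of decoding HM6264 / 27C256-style part numbers:
# first two digits give the device type, the rest the capacity in kilobits.
def decode_part(digits):
    """digits: numeric part of the device name, e.g. '6264' or '27256'."""
    kind = {"62": "RAM", "27": "ROM"}[digits[:2]]
    kilobits = int(digits[2:])
    return kind, kilobits
```

So the HM6264 is a 64-kilobit (8 KB) RAM and the 27C256 a 256-kilobit (32 KB) ROM.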
Cache
At any given time, data is copied between only two adjacent levels:
- Upper level: the one closer to the processor
Smaller, faster, uses more expensive technology
- Lower level: the one farther from the processor
Bigger, slower, uses less expensive technology
The basic unit of information transfer is the block: the minimum unit of information that can either be present
or not present in a level of the hierarchy.
Cache Mapping:
Cache mapping is the method by which the contents of main memory are brought into the cache and
referenced by the CPU. The mapping method used directly affects the performance of the entire embedded
system. Mapping is necessary because there are far fewer cache locations than main-memory addresses.
- Are the address's contents in the cache?
- Cache mapping is used to assign main memory addresses to cache addresses and determine hit or miss.
- Three basic techniques:
Direct mapping
Fully associative mapping
Set-associative mapping
- Caches partitioned into indivisible blocks or lines of adjacent memory addresses.
usually 4 or 8 addresses per line
Direct Mapping:
Each main memory location can be copied into only one location in the cache. This is accomplished by dividing
main memory into pages that correspond in size with the cache.
When all lines are occupied, bringing in a new block requires that an existing line be overwritten. The
replacement algorithms must be implemented in hardware for speed; they are:
- Least Recently used (LRU)
- First in first out (FIFO)
- Least-frequently-used (LFU)
- Random
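Direct mapping as described above can be sketched as a modulo computation: each memory block maps to exactly one cache line. The line count and block size here are illustrative assumptions, not figures from the text.

```python
# Illustrative direct-mapped cache: line = block number mod number of lines.
NUM_LINES = 8
WORDS_PER_LINE = 4   # "usually 4 or 8 addresses per line"

def cache_slot(address):
    """Return (line index, tag) for a given address."""
    block = address // WORDS_PER_LINE
    return block % NUM_LINES, block // NUM_LINES

def lookup(cache, address):
    """cache maps line index -> stored tag; returns True on a hit."""
    line, tag = cache_slot(address)
    if cache.get(line) == tag:
        return True
    cache[line] = tag        # miss: the existing line is overwritten
    return False
```

Note that in a direct-mapped cache no replacement algorithm is needed; the LRU/FIFO/LFU/random policies above apply when associative mapping gives a choice of lines.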
- Addressing: The master sends data over a specified set of lines, which enables just the device
for which it is meant.
- Protocols: The literal meaning of protocol is a set of rules. Here it is a set of formal rules describing
how to transfer data, especially between two devices.
- Arbitration: The process of generating the logic that selects which of the connected peripherals is
granted the bus.
- Memory to processor: the processor reads instructions and data from memory
- Processor to memory: the processor writes data to memory
- I/O to processor: the processor reads data from I/O device
- Processor to I/O: the processor writes data to I/O device
- I/O to or from memory: I/O module allowed to exchange data directly with memory without going
through the processor - Direct Memory Access (DMA)
Signals found on a bus are:
- Memory write: data on the bus written into the addressed location
- Memory read: data from the addressed location placed on the bus
- I/O write: data on the bus output to the addressed I/O port
- I/O read: data from the addressed I/O port placed on the bus
- Bus REQ: indicates a module needs to gain control of the bus
- Bus GRANT: indicates that requesting module has been granted control of the bus
- Interrupt REQ: indicates that an interrupt is pending
- Interrupt ACK: Acknowledges that pending interrupt has been recognized
- Reset: Initializes everything connected to the bus
- Clock: on a synchronous bus, everything is synchronized to this signal
In a simple transfer, memory and processor transfer valid data by placing it on the data bus blindly;
a crossed line in the timing diagram indicates the time at which new valid data appears.
Strobe Transfer:
In this mode the transmitter places valid data on the data bus and raises the strobe pulse to
indicate the initiation of a data transfer.
- The peripheral asserts its line low to ask the processor "Are you ready?"
- The processor raises its ACK line high to say "I am ready."
- The peripheral then sends data and pulls its line low to say "Here is some valid data for
you."
- The processor then reads the data and drops its ACK line to say "I have the data, thank you,
and I await your request to send the next byte of data."
I/O Addressing:
A microprocessor communicates with other devices using some of its pins. Broadly, I/O addressing can be
classified as memory-mapped I/O or I/O-mapped (isolated) I/O.
The microprocessor is connected with memory and I/O devices via a common address and data bus. Only one
device can send data at a time; the other devices can only receive that data. If more than one device sends
data at the same time, the data gets garbled. To avoid this situation and ensure that the proper device
is addressed at the proper time, a technique called address decoding is used.
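Address decoding can be sketched as mapping address ranges to chip-select signals; the decoder asserts exactly one device's select line. The device names and address ranges below are illustrative assumptions.

```python
# Sketch of an address decoder: one device enabled per bus address.
DEVICE_MAP = [
    ("ROM",  0x0000, 0x3FFF),
    ("RAM",  0x4000, 0x7FFF),
    ("UART", 0x8000, 0x80FF),
]

def select_device(address):
    """Return the single device whose chip-select the decoder asserts."""
    for name, lo, hi in DEVICE_MAP:
        if lo <= address <= hi:
            return name
    return None   # no device responds to this address
```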
Interrupts:
- An interrupt is a signal sent by an external device to the processor, requesting the processor to perform
a particular task.
- In microprocessor-based systems, interrupts are mainly used for data transfer between a
peripheral and the microprocessor.
- The processor checks for interrupts at the second T-state of the last machine cycle of each instruction.
- If there is a pending interrupt, it accepts the interrupt and sends the INTA (active low) signal to the
peripheral.
- The vectored address of the particular interrupt is loaded into the program counter.
- The processor executes the interrupt service routine (ISR) addressed by the program counter.
- It returns to the main program via the RET instruction.
Types of Interrupts
Interrupts can be broadly classified as
- Hardware Interrupts
These are interrupts caused by the connected devices.
- Software Interrupts
These are interrupts deliberately introduced by software instructions to generate
user defined exceptions
- Trap
These are interrupts used by the processor alone to detect any exception such as
divide by zero.
Depending on the service the interrupts also can be classified as:
- Fixed interrupt
Address of the ISR built into microprocessor, cannot be changed
Either ISR stored at address or a jump to actual ISR stored if not enough bytes
available
- Vectored interrupt
Peripheral must provide the address of the ISR
Common when microprocessor has multiple peripherals connected by a system
bus
- Compromise between fixed and vectored interrupts
One interrupt pin
Table in memory holding ISR addresses (maybe 256 words)
Peripheral doesn’t provide ISR address, but rather index into table
Fewer bits are sent by the peripheral
Can move ISR location without changing peripheral
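The compromise scheme above can be sketched as a table lookup: the peripheral supplies a small index rather than a full ISR address, and the processor reads the ISR address from a table in memory. The table contents below are illustrative assumptions.

```python
# Sketch of vector-table dispatch: peripheral sends an index, not an address.
VECTOR_TABLE = [0x1000, 0x1040, 0x1080, 0x10C0]   # assumed ISR start addresses

def dispatch(index):
    """Return the ISR address for the index supplied by the peripheral."""
    return VECTOR_TABLE[index]
```

Moving an ISR only requires updating the table entry; the peripheral's index never changes.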
- The I/O unit issues an interrupt signal to the processor to exchange data with it.
- The processor finishes execution of the current instruction before responding to the interrupt.
- The processor sends an acknowledgement signal to the device that issued the interrupt.
- The processor transfers control to the requested routine, called the Interrupt Service Routine (ISR),
after saving the contents of the program status word (PSW) and program counter (PC).
- The processor then loads the PC with the location of the interrupt service routine and fetches its
instructions; control is thus transferred to the interrupt handler.
During any given bus cycle, one of the system components connected to the system bus is given control
of the bus. This component is said to be the master during that cycle and the component it is communicating
with is said to be the slave. The CPU with its bus control logic is normally the master, but other specially
designed components can gain control of the bus by sending a bus request to the CPU. After the current
bus cycle is completed the CPU will return a bus grant signal and the component sending the request will
become the master.
The process of transferring data directly between memory and I/O is called direct memory access (DMA). It
is performed by a controller called the DMA controller.
Let us assume that the priority of the devices is Device 1 > Device 2 > …; then the daisy-chain arbiter works
as follows:
- The processor is executing its program.
Use of a cache structure insulates the CPU from frequent accesses to main memory.
Main memory can be moved off the local bus to a system bus.
An expansion bus interface buffers data transfers between the system bus and the I/O
controllers on the expansion bus.
- Parallel communication
– Physical layer capable of transporting multiple bits of data
- Serial communication
– Physical layer transports one bit of data at a time
A real-time system is defined as a data processing system in which the time interval required to process
and respond to inputs is so small that it controls the environment. The time taken by the system to respond
to an input and display the required updated information is termed the response time. The response time is
thus much shorter than in online processing.
Real-time systems are used when there are rigid time requirements on the operation of a processor or the
flow of data and real-time systems can be used as a control device in a dedicated application. A real-time
operating system must have well-defined, fixed time constraints, otherwise the system will fail. For
example, scientific experiments, medical imaging systems, industrial control systems, weapon systems,
robots, air traffic control systems, etc.
- Idle (Created) State: The task has been created and memory allotted to its structure. However, it
is not ready and is not schedulable by kernel.
- Ready (Active) State: The created task is ready and is schedulable by the kernel but not running
at present as another higher priority task is scheduled to run and gets the system resources at this
instance.
- Running state: The task is executing its code and getting the system resources at this instance. It
runs until it needs some IPC (input), waits for an event, or is preempted by a higher priority
task.
- Blocked (waiting) state: Execution of the task's code is suspended after saving the needed parameters
into its context. It needs some IPC (input), is waiting for an event, or is waiting for a higher priority
task to block so that it can run again.
- Deleted (finished) state: The created task has the memory for its structure de-allocated, i.e. the task
is deleted and its memory freed.
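The task states above can be sketched as a transition table that a kernel would consult before moving a task between states. The exact set of legal transitions is a simplifying assumption.

```python
# Sketch of the task life cycle: idle -> ready -> running -> blocked/deleted.
TRANSITIONS = {
    "idle":    {"ready"},
    "ready":   {"running", "deleted"},
    "running": {"ready", "blocked", "deleted"},   # preempted / waits / finishes
    "blocked": {"ready"},                         # event arrives: schedulable again
}

def move(state, new_state):
    """Apply one state change, rejecting transitions the kernel forbids."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```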
- A data structure holding the information the OS uses to control the process state.
- Task information in the TCB includes:
TaskID: The unique identifier used to identify a task. For example, with an 8-bit ID, a number between 0
and 255 defines the TaskID.
Task Context: It includes the current status of the program counter, stack pointer, CPU registers and
Status Register.
Task priority: It stores the priority level of the parent as well as child tasks in the Task List. The priority
is a number used as an identifier.
Task Context_init: a pointer to the processor memory that stores the following information:
- Allocated program memory address blocks in physical memory and in secondary (virtual) memory
for the task's code.
- Allocated task-specific data address blocks.
- Allocated task-stack addresses for the functions called during the running of the process.
- Allocated addresses of the CPU register-save area, as a task context is represented by the CPU
registers, which include the program counter and stack pointer.
Context Switch
When the multithreading kernel decides to run a different thread, it simply saves the current thread’s context
(CPU registers) in the current thread’s context storage area (the thread control block, or TCB). Once this
operation is performed, the new thread’s context is restored from its TCB and the CPU resumes execution
of the new thread’s code. This process is called a context switch. Context switching adds overhead to the
application.
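The save/restore sequence of a context switch can be sketched as follows. The TCB here holds only a register dictionary, and the register names (pc, sp) are illustrative assumptions.

```python
# Sketch of a context switch: save the outgoing thread's registers into
# its TCB, then restore the incoming thread's registers from its TCB.
class TCB:
    def __init__(self):
        self.context = {"pc": 0, "sp": 0}   # saved CPU registers

def context_switch(cpu_regs, current, nxt):
    current.context = dict(cpu_regs)   # save outgoing thread's context
    cpu_regs.clear()
    cpu_regs.update(nxt.context)       # restore incoming thread's context
```

Every switch performs both copies, which is exactly the overhead the text refers to.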
Task Management:
The task management operation defines the following operations:
- Creation of new task with TCB.
- Task termination: remove the TCB
- Change Priority: modify the TCB
- State-inquiry: read the TCB
Interrupt Handling:
An interrupt is a hardware mechanism used to inform the CPU that an asynchronous event has occurred.
When an interrupt is recognized, the CPU saves all of its context (i.e., registers) and jumps to a special
subroutine called an Interrupt Service Routine, or ISR. The ISR processes the event, and upon completion
of the ISR, the program returns to:
- the background for a foreground / background system,
- the interrupted thread for a non-preemptive kernel, or
- The highest priority thread ready to run for a preemptive kernel.
Interrupts allow a microprocessor to process events when they occur. This prevents the microprocessor
from continuously polling an event to see if it has occurred. Microprocessors allow interrupts to be ignored
and recognized through the use of two special instructions: disable interrupts and enable interrupts,
respectively.
The interrupt handler handles an interrupt generated by an external device as follows:
- The current context of the task is saved on the stack.
- The task is blocked and program control branches to the beginning address of the ISR, which
executes to serve the interrupt.
- The interrupt routine terminates and the context of the blocked task is restored.
In a real-time environment, interrupts should be disabled as little as possible. Disabling interrupts affects
interrupt latency and may cause interrupts to be missed. Processors generally allow interrupts to be nested.
This means that while servicing an interrupt, the processor will recognize and service other (more
important) interrupts, as shown in Figure below.
Scheduler
The scheduler is the part of the kernel responsible for determining which thread will run next. Most real-
time kernels are priority based. Each thread is assigned a priority based on its importance. Establishing the
priority for each thread is application specific. In a priority-based kernel, control of the CPU will always be
given to the highest priority thread ready to run. In a preemptive kernel, when a thread makes a higher
priority thread ready to run, the current thread is pre-empted (suspended) and the higher priority thread is
immediately given control of the CPU. If an interrupt service routine (ISR) makes a higher priority thread
ready, then when the ISR is completed the interrupted thread is suspended and the new higher priority
thread is resumed.
With a preemptive kernel, execution of the highest priority thread is deterministic; you can determine when
the highest priority thread will get control of the CPU.
Application code using a preemptive kernel should not use non-reentrant functions, unless exclusive access
to these functions is ensured through the use of mutual exclusion semaphores, because both a low- and a
high-priority thread can use a common function. Corruption of data may occur if the higher priority thread
preempts a lower priority thread that is using the function.
To summarize, a preemptive kernel always executes the highest priority thread that is ready to run. An
interrupt preempts a thread. Upon completion of an ISR, the kernel resumes execution to the highest priority
thread ready to run (not the interrupted thread). Thread-level response is optimum and deterministic.
Reentrancy
A reentrant function can be used by more than one thread without fear of data corruption. A reentrant
function can be interrupted at any time and resumed at a later time without loss of data. Reentrant functions
either use local variables (i.e., CPU registers or variables on the stack) or protect data when global variables
are used. A non-reentrant swap () function is examined below:
Swap () is a simple function that swaps the contents of its two arguments. Since Temp is a global variable,
if swap () gets preempted after its first line by a higher priority thread that also uses
swap (), then when the low priority thread resumes it will use the Temp value that was set by the
high priority thread.
We can make swap () reentrant with one of the following techniques:
- Declare Temp local to swap ().
- Disable interrupts before the operation and enable them afterwards.
- Use a semaphore.
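The contrast between the non-reentrant swap () and the first fix (a local temporary) can be sketched as follows; Python stands in here for the C of the original discussion.

```python
# Non-reentrant swap: Temp is global, so a preempting thread that also
# calls swap_global can clobber it between the two assignments.
temp = 0

def swap_global(pair):
    global temp
    temp = pair[0]          # preemption after this line corrupts the swap
    pair[0] = pair[1]
    pair[1] = temp

# Reentrant swap: the temporary lives on the calling thread's stack,
# so each thread gets its own copy.
def swap_reentrant(pair):
    t = pair[0]
    pair[0] = pair[1]
    pair[1] = t
```

Without preemption both versions produce the same result; the difference only appears when two threads interleave inside swap_global.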
Thread Priority
A priority is assigned to each thread. The more important the thread, the higher the priority given to it.
- Static Priorities
Thread priorities are said to be static when the priority of each thread does not change during the
application's execution. Each thread is thus given a fixed priority at compile time. All the threads
and their timing constraints are known at compile time in a system where priorities are static
- Dynamic Priorities
Thread priorities are said to be dynamic if the priority of threads can be changed during the
application's execution; each thread can change its priority at run time. This is a desirable feature
to have in a real-time kernel to avoid priority inversions.
- Priority Inversions
Priority inversion is a problem in real-time systems and occurs mostly when you use a real-time
kernel. Priority inversion is any situation in which a low priority thread holds a resource while a
higher priority thread needs it, so the higher priority thread is effectively blocked by the lower
priority one.
Semaphores
The semaphore was invented by Edsger Dijkstra in the mid-1960s. It is a protocol mechanism offered by
most multithreading kernels. Semaphores are used to:
- control access to a shared resource (mutual exclusion),
- signal the occurrence of an event, and
- Allow two threads to synchronize their activities.
A semaphore is a key that code acquires in order to continue execution. If the semaphore is already in use,
the requesting thread is suspended until the semaphore is released by its current owner. In other words, the
requesting thread says: ''Give me the key. If someone else is using it, I am willing to wait for it!" There are
two types of semaphores: binary semaphores and counting semaphores. As its name implies, a binary
semaphore can only take two values: 0 or 1. A counting semaphore allows values between 0 and 255, 65535,
or 4294967295, depending on whether the semaphore mechanism is implemented using 8, 16, or 32 bits,
respectively. The actual size depends on the kernel used. Along with the semaphore's value, the kernel also
needs to keep track of threads waiting for the semaphore's availability.
Generally, only three operations can be performed on a semaphore: Create (), Wait (), and Signal (). The
initial value of the semaphore must be provided when the semaphore is initialized. The waiting list of
threads is always initially empty.
A thread desiring the semaphore will perform a Wait () operation. If the semaphore is available (the
semaphore value is greater than 0), the semaphore value is decremented and the thread continues execution.
If the semaphore's value is 0, the thread performing a Wait () on the semaphore is placed in a waiting list.
Most kernels allow you to specify a timeout; if the semaphore is not available within a certain amount of
time, the requesting thread is made ready to run and an error code (indicating that a timeout has occurred)
is returned to the caller.
A thread releases a semaphore by performing a Signal () operation. If no thread is waiting for the semaphore,
the semaphore value is simply incremented. If any thread is waiting for the semaphore, however, one of the
threads is made ready to run and the semaphore value is not incremented; the key is given to one of the
threads waiting for it. Depending on the kernel, the thread that receives the semaphore is either the highest
priority thread waiting for the semaphore or the first thread that requested it (first-in, first-out order).
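The Wait ()/Signal () behavior described above can be sketched as follows. This is not a real kernel primitive: instead of truly blocking, a wait on an unavailable semaphore just queues the caller and returns False.

```python
# Minimal counting-semaphore sketch following the Wait()/Signal() rules.
class Semaphore:
    def __init__(self, value=1):
        self.value = value
        self.waiting = []          # waiting list, always initially empty

    def wait(self, thread):
        if self.value > 0:
            self.value -= 1        # semaphore available: take the key
            return True
        self.waiting.append(thread)
        return False               # caller would block here

    def signal(self):
        if self.waiting:
            return self.waiting.pop(0)  # key handed straight to a waiter
        self.value += 1            # nobody waiting: just increment
        return None
```

Popping from the front of the list models the FIFO policy; a priority-based kernel would instead pick the highest priority waiter.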
The following listing shows how you can share data using a semaphore. Any thread needing access to the same
shared data calls OS_SemaphoreWait(), and when the thread is done with the data, the thread calls
OS_SemaphoreSignal(). Both of these functions are described later. You should note that a semaphore is
an object that needs to be initialized before it is used; for mutual exclusion, a semaphore is initialized to a
value of 1. Using a semaphore to access shared data doesn't affect interrupt latency. If an ISR or the current
thread makes a higher priority thread ready to run while accessing shared data, the higher priority thread
executes immediately.
Semaphores are especially useful when threads share I/O devices. Imagine what would happen if two
threads were allowed to send characters to a printer at the same time. The printer would contain
interleaved data from each thread. For instance, the printout from Thread 1 printing "I am Thread 1!"
and Thread 2 printing "I am Thread 2!" could result in:
“I Ia amm T Threahread d1 !2!”
In this case, use a semaphore and initialize it to 1 (i.e., a binary semaphore). The rule is simple: to access
the printer each thread first must obtain the resource's semaphore.
Figure below shows threads competing for a semaphore to gain exclusive access to the printer. Note that
the semaphore is represented symbolically by a key, indicating that each thread must obtain this key to
use the printer.
Note that, in this case, the semaphore is drawn as a flag to indicate that it is used to signal the occurrence
of an event (rather than to ensure mutual exclusion, in which case it would be drawn as a key). When used
as a synchronization mechanism, the semaphore is initialized to 0. Using a semaphore for this type of
synchronization is called a unilateral rendezvous. A thread initiates an I/O operation and waits for the
semaphore. When the I/O operation is complete, an ISR (or another thread) signals the semaphore and the
thread is resumed.
If the kernel supports counting semaphores, the semaphore would accumulate events that have not yet been
processed. Note that more than one thread can be waiting for an event to occur. In this case, the kernel
could signal the occurrence of the event either to:
- the highest priority thread waiting for the event to occur or
- the first thread waiting for the event.
Depending on the application, more than one ISR or thread could signal the occurrence of the event. Two
threads can synchronize their activities by using two semaphores, as shown in Figure below. This is called
a bilateral rendezvous. A bilateral rendezvous is similar to a unilateral rendezvous, except both threads must
synchronize with one another before proceeding.
Interthread Communication
It is sometimes necessary for a thread or an ISR to communicate information to another thread. This
information transfer is called interthread communication. Information may be communicated between
threads in two ways: through global data or by sending messages.
When using global variables, each thread or ISR must ensure that it has exclusive access to the variables.
If an ISR is involved, the only way to ensure exclusive access to the common variables is to disable
interrupts. If two threads are sharing data, each can gain exclusive access to the variables either by disabling
and enabling interrupts or with the use of a semaphore (as we have seen). Note that a thread can only
communicate information to an ISR by using global variables. A thread is not aware when a global variable
is changed by an ISR, unless the ISR signals the thread by using a semaphore or unless the thread polls the
contents of the variable periodically.
Semaphores are useful either for synchronizing execution of multiple tasks or for coordinating access to a
shared resource. The following examples and general discussions illustrate using different types of
semaphores to address common synchronization design requirements effectively, as listed:
wait-and-signal synchronization,
multiple-task wait-and-signal synchronization,
credit-tracking synchronization,
single shared-resource-access synchronization,
recursive shared-resource-access synchronization, and
multiple shared-resource-access synchronization.
Note that, for the sake of simplicity, not all uses of semaphores are listed here. Also, later chapters of this
book contain more advanced discussions on the different ways that mutex semaphores can handle priority
inversion.
Wait-and-Signal Synchronization
Two tasks can communicate for the purpose of synchronization without exchanging data. For example, a
binary semaphore can be used between two tasks to coordinate the transfer of execution control, as shown
in figure below.
When coordinating the synchronization of more than two tasks, use the flush operation on the task-waiting list of a
binary semaphore, as shown in Figure below.
As in the previous case, the binary semaphore is initially unavailable (value of 0). The higher priority tWaitTasks
1, 2, and 3 all do some processing; when they are done, they try to acquire the unavailable semaphore and, as a result,
block. This action gives tSignalTask a chance to complete its processing and execute a flush command on the
semaphore, effectively unblocking the three tWaitTasks.
Credit-Tracking Synchronization
Sometimes the rate at which the signaling task executes is higher than that of the signaled task. In this case, a
mechanism is needed to count each signaling occurrence. The counting semaphore provides just this facility. With a
counting semaphore, the signaling task can continue to execute and increment a count at its own pace, while the wait
task, when unblocked, executes at its own pace, as shown in figure below.
Again, the counting semaphore's count is initially 0, making it unavailable. The lower priority tWaitTask tries to
acquire this semaphore but blocks until tSignalTask makes the semaphore available by performing a release on it.
Even then, tWaitTask waits in the ready state until the higher priority tSignalTask eventually relinquishes
the CPU by making a blocking call or delaying itself.
Single Shared-Resource-Access Synchronization
One of the more common uses of semaphores is to provide for mutually exclusive access to a shared resource. A
shared resource might be a memory location, a data structure, or an I/O device-essentially anything that might have
to be shared between two or more concurrent threads of execution. A semaphore can be used to serialize access to a
shared resource, as shown in figure below.
Recursive Shared-Resource-Access Synchronization
Sometimes a developer might want a task to access a shared resource recursively. This situation might
exist if tAccessTask calls Routine A, which calls Routine B, and all three need access to the same shared
resource, as shown in figure below.
If a semaphore were used in this scenario, the task would end up blocking, causing a deadlock. When a routine is
called from a task, the routine effectively becomes a part of the task. When Routine A runs, therefore, it is running as
a part of tAccessTask. Routine A trying to acquire the semaphore is effectively the same as tAccessTask trying to
acquire the same semaphore. In this case, tAccessTask would end up blocking while waiting for the unavailable
semaphore that it already has.
One solution to this situation is to use a recursive mutex. After tAccessTask locks the mutex, the task owns it.
Additional attempts from the task itself or from routines that it calls to lock the mutex succeed. As a result, when
Routines A and B attempt to lock the mutex, they succeed without blocking.
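The recursive mutex behavior just described can be sketched as an ownership counter: repeated locks from the owning task succeed, and the mutex is only released when every lock has been matched by an unlock. This is a model of the idea, not any particular kernel's API.

```python
# Sketch of a recursive mutex: the owner may lock it multiple times.
class RecursiveMutex:
    def __init__(self):
        self.owner = None
        self.count = 0

    def lock(self, task):
        if self.owner not in (None, task):
            return False           # another task owns it: caller would block
        self.owner = task
        self.count += 1            # nested lock from the owner just counts up
        return True

    def unlock(self, task):
        assert self.owner == task and self.count > 0
        self.count -= 1
        if self.count == 0:
            self.owner = None      # fully released: available to other tasks
```

With a plain semaphore the nested lock from Routine A would deadlock tAccessTask; here it simply increments the count.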
Multiple Shared-Resource-Access Synchronization
For cases in which multiple equivalent shared resources are used, a counting semaphore comes in handy, as shown
in Figure
Note that this scenario does not work if the shared resources are not equivalent. The counting semaphore's count is
initially set to the number of equivalent shared resources: in this example, 2. As a result, the first two tasks requesting
a semaphore token are successful. However, the third task ends up blocking until one of the previous two tasks releases
a semaphore token.
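The token-counting behavior can be sketched with a POSIX counting semaphore (a stand-in for the RTOS semaphore API; `immediate_acquires` is a hypothetical helper, and `sem_trywait` is used where a real task would block):

```c
#include <semaphore.h>

/* Counts how many of `attempts` immediate (non-blocking) acquisitions
   succeed on a counting semaphore initialized to `count` tokens. */
int immediate_acquires(unsigned int count, int attempts) {
    sem_t s;
    int got = 0;
    sem_init(&s, 0, count);          /* count = number of equivalent resources */
    for (int i = 0; i < attempts; i++)
        if (sem_trywait(&s) == 0)    /* a real task would block here instead */
            got++;
    sem_destroy(&s);
    return got;
}
```

With two tokens, the first two requests succeed and the third does not, matching the scenario in the figure.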
Knowing the capability of the memory management system can aid application design and help avoid
pitfalls. For example, in many existing embedded applications, the dynamic memory allocation
routine, malloc, is called often. It can create an undesirable side effect called memory fragmentation. This
generic memory allocation routine, depending on its implementation, might impact an application's
performance. In addition, it might not support the allocation behavior required by the application.
Many embedded devices (such as PDAs, cell phones, and digital cameras) have a limited number of
applications (tasks) that can run in parallel at any given time, but these devices have small amounts of
physical memory onboard. Larger embedded devices (such as network routers and web servers) have more
physical memory installed, but these embedded systems also tend to operate in a more dynamic
environment, therefore making more demands on memory. Regardless of the type of embedded system, the
common requirements placed on a memory management system are minimal fragmentation, minimal
management overhead, and deterministic allocation time.
Dynamic Memory Allocation in Embedded Systems
It is known that the program code, program data, and system stack occupy the physical memory after
program initialization completes. Either the RTOS or the kernel typically uses the remaining physical
memory for dynamic memory allocation. This memory area is called the heap. Memory management in
the context of this chapter refers to the management of a contiguous block of physical memory, although
the concepts introduced here apply to the management of non-contiguous memory blocks as well. These
concepts also apply to the management of various types of physical memory. In general, a memory
management facility maintains internal information for a heap in a reserved memory area called the control
block. Typical internal information includes:
the starting address of the physical memory block used for dynamic memory allocation,
the overall size of this physical memory block, and
the allocation table that indicates which memory areas are in use, which memory areas are free,
and the size of each free region.
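A control block holding this internal information might be laid out as follows (a minimal sketch in C; the field names and sizes are illustrative, not taken from any particular RTOS):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical control-block layout for a heap managed in 32-byte units. */
#define UNIT_SIZE 32u
#define HEAP_SIZE 256u
#define NUM_UNITS (HEAP_SIZE / UNIT_SIZE)

typedef struct {
    uintptr_t heap_start;            /* starting address of the managed block */
    size_t    heap_size;             /* overall size of the block in bytes    */
    uint8_t   alloc_map[NUM_UNITS];  /* 1 = unit in use, 0 = unit free        */
} mem_control_block;

/* Returns the number of free units recorded in the allocation table. */
size_t free_units(const mem_control_block *cb) {
    size_t n = 0;
    for (size_t i = 0; i < NUM_UNITS; i++)
        if (cb->alloc_map[i] == 0)
            n++;
    return n;
}
```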
Memory Fragmentation and Compaction
In the example implementation, the heap is broken into small, fixed-size blocks. Each block has a unit size
that is a power of two to ease translating a requested size into the corresponding required number of units. In
this example, the unit size is 32 bytes. The dynamic memory allocation function, malloc, has an input
parameter that specifies the size of the allocation request in bytes. malloc allocates a larger block, which is
made up of one or more of the smaller, fixed-size blocks. The size of this larger memory block is at least
as large as the requested size; it is rounded up to the nearest multiple of the unit size. For example, if the allocation
requests 100 bytes, the returned block has a size of 128 bytes (4 units x 32 bytes/unit). As a result, the
requestor does not use 28 bytes of the allocated memory, which is called memory fragmentation. This
specific form of fragmentation is called internal fragmentation because it is internal to the allocated block.
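The rounding rule and the resulting internal fragmentation can be expressed directly (a sketch in C; the function names are illustrative):

```c
#include <stddef.h>

#define UNIT 32u  /* allocation unit size from the example */

/* Rounds an allocation request up to a whole number of fixed-size units,
   as the example malloc implementation does. */
size_t rounded_block_size(size_t request) {
    return ((request + UNIT - 1) / UNIT) * UNIT;
}

/* Bytes wasted inside the allocated block (internal fragmentation). */
size_t internal_fragmentation(size_t request) {
    return rounded_block_size(request) - request;
}
```

For the 100-byte request above, this yields a 128-byte block with 28 bytes of internal fragmentation.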
The allocation table can be represented as a bitmap, in which each bit represents a 32-byte unit. Figure
shows the states of the allocation table after a series of invocations of the malloc and free functions. In this
example, the heap is 256 bytes.
Step 6 shows two free blocks of 32 bytes each. Step 7, instead of maintaining three separate free blocks,
shows that all three blocks are combined to form a 128-byte block. Because these blocks have been
combined, a future allocation request for 96 bytes should succeed.
Figure below shows another example of the state of an allocation table. Note that two free 32-byte blocks
are shown. One block is at address 0x10080, and the other at address 0x101C0, which cannot be used for
any memory allocation requests larger than 32 bytes. Because these isolated blocks do not contribute to the
contiguous free space needed for a large allocation request, their existence makes it more likely that a large
request will fail or take too long. The existence of these two trapped blocks is considered external
fragmentation because the fragmentation exists in the table, not within the blocks themselves. One way to
eliminate this type of fragmentation is to compact the area adjacent to these two blocks. The range of
memory content from address 0x100A0 (immediately following the first free block) to address 0x101BF
(immediately preceding the second free block) is shifted 32 bytes lower in memory, to the new range of
0x10080 to 0x1019F, which effectively combines the two free blocks into one 64-byte block. This new free
block is still considered memory fragmentation if future allocations are potentially larger than 64 bytes.
Therefore, memory compaction continues until all of the free blocks are combined into one large chunk.
Memory compaction is allowed if the tasks that own those memory blocks reference the blocks using virtual
addresses. Memory compaction is not permitted if tasks hold physical addresses to the allocated memory
blocks.
In many cases, memory management systems should also be concerned with architecture-specific memory
alignment requirements. Memory alignment refers to architecture-specific constraints imposed on the
address of a data item in memory. Many embedded processor architectures cannot access multi-byte data
items at any address. For example, some architectures require multi-byte data items, such as integers and
long integers, to be allocated at addresses aligned on a power-of-two boundary. Unaligned memory accesses
can result in bus errors and are a common source of memory access exceptions.
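A memory manager typically satisfies such constraints by rounding addresses up to the required boundary. A minimal sketch, assuming the alignment is a power of two:

```c
#include <stdint.h>

/* Rounds an address up to the next `align`-byte boundary. `align` must be
   a power of two, which is the usual form of an alignment constraint. */
uintptr_t align_up(uintptr_t addr, uintptr_t align) {
    return (addr + align - 1) & ~(align - 1);
}
```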
Some conclusions can be drawn from this example. An efficient memory manager needs to perform the
following chores quickly:
Determine if a free block that is large enough exists to satisfy the allocation request. This work
is part of the malloc operation.
Update the internal management information. This work is part of both
the malloc and free operations.
Determine if the just-freed block can be combined with its neighboring free blocks to form a
larger piece. This work is part of the free operation.
The structure of the allocation table is the key to efficient memory management because the structure
determines how the operations listed earlier must be implemented. The allocation table is part of the
overhead because it occupies memory space that is excluded from application use. Consequently, one other
requirement is to minimize the management overhead.
Microprocessors vs Microcontrollers
- A microprocessor contains the ALU, general-purpose registers, stack pointer, program counter, clock
timing circuit, and interrupt circuit; a microcontroller contains the circuitry of a microprocessor and, in
addition, has built-in ROM, RAM, I/O devices, timers/counters, etc.
- A microprocessor has few bit-handling instructions; a microcontroller has many bit-handling instructions.
- In a microprocessor, fewer pins are multifunctional; in a microcontroller, more pins are multifunctional.
- A microprocessor has a single memory map for data and code (program); a microcontroller has separate
memory maps for data and code (program).
- Access times for memory and I/O are longer in a microprocessor-based system; a microcontroller has
shorter access times for its built-in memory and I/O.
- A microprocessor-based system requires additional hardware; a microcontroller requires less additional
hardware.
- A microprocessor is more flexible from the design point of view; a microcontroller is less flexible, since
the additional circuits residing inside it are fixed for a particular microcontroller.
- A microprocessor has a large number of instructions with flexible addressing modes; a microcontroller
has a limited number of instructions with few addressing modes.
CY - carry flag
AC - auxiliary carry flag
F0 - available to the user for general purpose
RS1, RS0 - register bank select bits
- Stack Pointer (SP) – it contains the address of the data item on the top of the stack. Stack may
reside anywhere on the internal RAM. On reset, SP is initialized to 07 so that the default stack will
start from address 08 onwards.
- Data Pointer (DPTR) – DPH (Data pointer higher byte), DPL (Data pointer lower byte). This is a
16 bit register which is used to furnish address information for internal and external program
memory and for external data memory.
- Program Counter (PC) – the 16-bit PC contains the address of the next instruction to be executed. On
reset, the PC is set to 0000H. After each fetch, the PC is incremented by the length of the fetched
instruction so that it points to the next instruction.
Pin Diagram:
- Pins 1-8: PORT 1. Each of these pins can be configured as an input or an output.
- Pin 9 RESET. A logic one on this pin disables the microcontroller and clears the contents of most
registers. In other words, the positive voltage on this pin resets the microcontroller. By applying
logic zero to this pin, the program starts execution from the beginning.
- Pins 10-17 PORT 3. Similar to Port 1, each of these pins can serve as a general input or output.
In addition, all of them have alternative functions.
- Pin 10 RXD. Serial asynchronous communication input or Serial synchronous communication
output.
- Pin 11 TXD. Serial asynchronous communication output or Serial synchronous communication
clock output.
- Pin 12 INT0. External Interrupt 0 input.
- Pin 13 INT1. External Interrupt 1 input.
- Pin 14 T0. Counter 0 clock input.
- Pin 15 T1. Counter 1 clock input
- Pin 16 WR. Write to external (additional) RAM.
- Pin 17 RD. Read from external RAM
- Register Banks: 00h to 1Fh. The 8051 uses 8 general-purpose registers R0 through R7 (R0, R1, R2,
R3, R4, R5, R6, and R7). There are four such register banks. Selection of register bank can be done
through RS1, RS0 bits of PSW. On reset, the default Register Bank 0 will be selected.
- Bit Addressable RAM: 20h to 2Fh . The 8051 supports a special feature which allows access to bit
variables. This is where individual memory bits in Internal RAM can be set or cleared. In all there
are 128 bits numbered 00h to 7Fh. Being bit variables any one variable can have a value 0 or 1. A
bit variable can be set with a command such as SETB and cleared with a command such as CLR.
Example instructions are:
SETB 25h ; sets the bit 25h (becomes 1)
CLR 25h ; clears bit 25h (becomes 0) Note, bit 25h is actually bit 5 of Internal RAM location 24h.
The Bit Addressable area of the RAM is just 16 bytes of Internal RAM located between 20h and
2Fh.
- General Purpose RAM: 30h to 7Fh. Although 80 bytes of Internal RAM memory are available for
general-purpose data storage, the user should take care while using memory locations from 00h to 2Fh,
since those locations are also the default register space, stack space, and bit-addressable space.
The lower-order address and data bus lines are multiplexed. De-multiplexing is done by a latch. Initially
the address appears on the bus and is latched at the output of the latch using the ALE signal. The output
of the latch is directly connected to the lower-byte address lines of the memory. Later, data becomes
available on this bus, while the latch output still holds the address. The higher byte of the address bus is
directly connected to the memory. The number of lines connected depends on the memory size.
The RD and WR (both active low) signals are connected to RAM for reading and writing the data.
PSEN of microcontroller is connected to the output enable of the ROM to read the data from the
memory.
EA (active low) pin is always grounded if we use only external memory. Otherwise, once the program
size exceeds internal memory the microcontroller will automatically switch to external memory.
Stack:
A stack is a last-in, first-out memory. In the 8051, internal RAM space can be used as the stack. The address
of the top of the stack is contained in a register called the stack pointer. The PUSH and POP instructions are
used for stack operations. When data is to be placed on the stack, the stack pointer increments before storing
the data, so that the stack grows upward as data is stored (pre-increment). When data is retrieved from the
stack, the byte is read first, and then SP decrements to point to the next available byte.
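This pre-increment/post-decrement behavior can be sketched in C (an illustrative model of the stack mechanism described above, not a cycle-accurate simulation; the function names are hypothetical):

```c
#include <stdint.h>

/* Minimal model of the 8051 stack in internal RAM. */
static uint8_t iram[256];
static uint8_t sp = 0x07;          /* SP reset value; stack starts at 08h */

/* PUSH: pre-increment SP, then store (the stack grows upward). */
void push_byte(uint8_t v) {
    sp++;
    iram[sp] = v;
}

/* POP: read the top of the stack, then decrement SP. */
uint8_t pop_byte(void) {
    uint8_t v = iram[sp];
    sp--;
    return v;
}
```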
Instruction Syntax.
Register addressing.
In this addressing mode the register will hold the data. One of the eight general registers (R0 to R7) can be
used and specified as the operand.
Eg. MOV A, R0
ADD A, R6
R0 – R7 will be selected from the current selection of register bank. The default register bank will be bank
0.
Direct addressing
There are two ways to access the internal memory. Using direct address and indirect address. Using direct
addressing mode we can not only address the internal memory but SFRs also. In direct addressing, an 8 bit
internal data memory address is specified as part of the instruction and hence, it can specify the address
only in the range of 00H to FFH. In this addressing mode, data is obtained directly from the memory.
Eg. MOV A, 60h
ADD A, 30h
Indirect addressing
The indirect addressing mode uses a register to hold the actual address that will be used in data movement.
Registers R0 and R1 and DPTR are the only registers that can be used as data pointers. Indirect addressing
cannot be used to refer to SFR registers. Both R0 and R1 can hold 8 bit address and DPTR can hold 16 bit
address.
Eg. MOV A,@R0
ADD A,@R1
MOVX A,@DPTR
Indexed addressing.
In indexed addressing, either the program counter (PC) or the data pointer (DPTR) is used to hold the
base address, and A is used to hold the offset. Adding the value of the base address to the value in A
gives the effective address of the operand (e.g. MOVC A,@A+DPTR).
8051 Instructions
The instructions of 8051 can be broadly classified under the following headings.
Subtraction
In this group, we have instructions to
i. Subtract immediate data from the contents of A, with or without borrow.
i. SUBB A, #45H
ii. SUBB A, #0B4H
ii. Subtract the contents of register Rn from A, with or without borrow.
i. SUBB A, R5
ii. SUBB A, R2
iii. Subtract the contents of memory from A, with or without borrow, using direct and
indirect addressing.
i. SUBB A, 51H
ii. SUBB A, 75H
iii. SUBB A, @R1
iv. SUBB A, @R0
Multiplication
MUL AB. This instruction multiplies two 8 bit unsigned numbers which are stored in A and B
register. After multiplication the lower byte of the result will be stored in accumulator and higher
byte of result will be stored in B register.
Eg. MOV A,#45H ;[A]=45H
MOV B,#0F5H ;[B]=F5H
MUL AB ;[A] x [B] = 45H x F5H = 4209H
;[A]=09H, [B]=42H
DIV AB. This instruction divides the 8 bit unsigned number which is stored in A by the 8 bit
unsigned number which is stored in B register. After division the result will be stored in
accumulator and remainder will be stored in B register.
Eg. MOV A,#0E8H ;[A]=0E8H
MOV B,#1BH ;[B]=1BH
DIV AB ;[A] / [B] = E8H / 1BH = 08H with remainder 10H
;[A] = 08H, [B]=10H
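The register-pair results of MUL AB and DIV AB can be checked with a small C model (a sketch, not the actual hardware; the functions pack B in the high byte and A in the low byte purely for illustration):

```c
#include <stdint.h>

/* Models MUL AB: returns the 16-bit product; on the 8051 the low byte
   goes to A and the high byte to B. */
uint16_t mul_ab(uint8_t a, uint8_t b) {
    return (uint16_t)a * (uint16_t)b;
}

/* Models DIV AB (b != 0): packs the remainder (B) in the high byte and
   the quotient (A) in the low byte. */
uint16_t div_ab(uint8_t a, uint8_t b) {
    return (uint16_t)(((uint16_t)(a % b) << 8) | (a / b));
}
```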
When two BCD numbers are added, the answer is a non-BCD number. To get the result in BCD, we
use DA A instruction after the addition. DA A works as follows.
• If lower nibble is greater than 9 or auxiliary carry is 1, 6 is added to lower nibble.
• If upper nibble is greater than 9 or carry is 1, 6 is added to upper nibble.
Eg 1: MOV A,#23H
MOV R1,#55H
ADD A,R1 // [A]=78
DA A // [A]=78 no changes in the accumulator after da a
Eg 2: MOV A,#53H
MOV R1,#58H
ADD A,R1 // [A]=ABh
DA A // [A]=11, C=1 . ANSWER IS 111. Accumulator data is changed after DA A
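The two adjustment rules can be modeled in C to check both examples (a sketch of the DA A behavior described above; carry appears in bit 8 of the returned value):

```c
#include <stdint.h>

/* Models ADD followed by DA A on two packed-BCD bytes. */
uint16_t bcd_add(uint8_t x, uint8_t y) {
    uint16_t sum = (uint16_t)x + y;
    int aux_carry = ((x & 0x0F) + (y & 0x0F)) > 0x0F;
    if ((sum & 0x0F) > 9 || aux_carry)
        sum += 0x06;                       /* adjust the lower nibble */
    if ((sum & 0xF0) > 0x90 || sum > 0xFF)
        sum += 0x60;                       /* adjust the upper nibble */
    return sum;                            /* bit 8 holds the carry flag */
}
```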
INC increments the value of source by 1. If the initial value of register is FFh, incrementing the value
will cause it to reset to 0. The Carry Flag is not set when the value "rolls over" from 255 to 0.
In the case of "INC DPTR", the two-byte unsigned integer value of DPTR is incremented. If the
initial value of DPTR is FFFFh, incrementing the value will cause it to reset to 0.
DEC decrements the value of source by 1. If the initial value of is 0, decrementing the value will
cause it to reset to FFh. The Carry Flag is not set when the value "rolls over" from 0 to FFh.
Logical Instruction:
Logical AND
ANL destination, source: ANL does a bitwise "AND" operation between source and destination,
leaving the resulting value in destination. The value in source is not affected.
ANL A,#DATA ANL A,Rn
ANL A,DIRECT ANL A,@Ri
ANL DIRECT,A ANL DIRECT, #DATA
Logical OR
ORL destination, source: ORL does a bitwise "OR" operation between source and destination,
leaving the resulting value in destination. The value in source is not affected. " OR " instruction
logically OR the bits of source and destination.
ORL A,#DATA ORL A, Rn
ORL A,DIRECT ORL A,@Ri
ORL DIRECT,A ORL DIRECT, #DATA
Logical Ex-OR
XRL destination, source: XRL does a bitwise "EX-OR" operation between source and
destination, leaving the resulting value in destination. The value in source is not affected. " XRL
" instruction logically EX-OR the bits of source and destination.
XRL A,#DATA XRL A,Rn
XRL A,DIRECT XRL A,@Ri
XRL DIRECT,A XRL DIRECT, #DATA
Logical NOT
CPL complements operand, leaving the result in operand. If operand is a single bit then the
state of the bit will be reversed. If operand is the Accumulator then all the bits in the
Accumulator will be reversed.
Rotate Instructions
RR A
This instruction is rotate right the accumulator. Its operation is illustrated below. Each bit is shifted one
location to the right, with bit 0 going to bit 7.
RL A
Rotate left the accumulator. Each bit is shifted one location to the left, with bit 7 going to bit 0
RLC A
Rotate left through the carry. Each bit is shifted one location to the left, with bit 7 going into the carry bit
in the PSW, while the carry goes into bit 0.
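As a sketch (in C, not on the 8051 itself; function names are illustrative), the three rotates behave as follows, with RLC returning the new carry in bit 8 and the new accumulator value in bits 7-0:

```c
#include <stdint.h>

uint8_t rl_a(uint8_t a) { return (uint8_t)((a << 1) | (a >> 7)); }  /* RL A */
uint8_t rr_a(uint8_t a) { return (uint8_t)((a >> 1) | (a << 7)); }  /* RR A */

/* RLC A: old carry enters bit 0, old bit 7 becomes the new carry. */
uint16_t rlc_a(uint8_t a, uint8_t carry) {
    uint8_t out = (uint8_t)((a << 1) | (carry & 1));
    uint8_t cy  = (a >> 7) & 1;
    return (uint16_t)((cy << 8) | out);
}
```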
Relative Jump
A jump that replaces the PC (program counter) content with a new address within the range of -128 to
+127 bytes of the address following the jump instruction is called a relative jump. The relative jump
instructions are:
JC <relative address>
JNC <relative address>
JB bit, <relative address>
JNB bit, <relative address>
JBC bit, <relative address>
CJNE <destination byte>, <source byte>, <relative address>
DJNZ <byte>, <relative address>
JZ <relative address>
JNZ <relative address>
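For all of these instructions, the target is computed by sign-extending the 8-bit offset and adding it to the address of the next instruction. A minimal sketch of the calculation:

```c
#include <stdint.h>

/* Computes a relative-jump target: the opcode's 8-bit offset is taken as
   signed and added to the address of the instruction after the jump. */
uint16_t rel_target(uint16_t next_pc, uint8_t offset) {
    return (uint16_t)(next_pc + (int8_t)offset);
}
```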
00 0000 - 07FF
01 0800 - 0FFF
02 1000 - 17FF
03 1800 - 1FFF
.
.
1E F000 - F7FF
1F F800 - FFFF
It can be seen that the upper 5 bits of the program counter (PC) hold the page number and the lower
11 bits of the PC hold the address within that page. Thus, an absolute address is formed by taking the
page number of the instruction following the jump (from the program counter) and attaching the
specified 11 bits to it to form the 16-bit address.
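The page-plus-offset combination can be sketched as a simple bit operation (illustrative C, not the hardware itself):

```c
#include <stdint.h>

/* Forms an AJMP/ACALL target: the page (upper 5 bits) comes from the PC
   of the following instruction, the lower 11 bits from the instruction. */
uint16_t ajmp_target(uint16_t next_pc, uint16_t addr11) {
    return (uint16_t)((next_pc & 0xF800u) | (addr11 & 0x07FFu));
}
```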
Applications that need to access the entire program memory from 0000H to FFFFH use long
absolute jump. Since the absolute address has to be specified in the op-code, the instruction length
is 3 bytes (except for JMP @ A+DPTR). This jump is not re-locatable.
1. The unconditional jump is a jump in which control is transferred unconditionally to the target location.
a. LJMP (long jump). This is a 3-byte instruction. First byte is the op-code and second and third
bytes represent the 16-bit target address which is any memory location from 0000 to FFFFH
eg: LJMP 3000H
b. AJMP: this causes an unconditional branch to the indicated address by loading the 11-bit
address into bits 0-10 of the program counter. The destination must therefore be within the
same 2K block.
c. SJMP (short jump). This is a 2-byte instruction. First byte is the op-code and second byte is
the relative target address, 00 to FFH (forward +127 and backward -128 bytes from the current
PC value). To calculate the target address of a short jump, the second byte is added to the PC
value, which is the address of the instruction immediately following the jump.
Bit level JUMP instructions will check the conditions of the bit and if condition is true, it jumps to the
address specified in the instruction. All the bit jumps are relative jumps.
JB bit, rel ; jump if the direct bit is set to the relative address specified.
JNB bit, rel ; jump if the direct bit is clear to the relative address specified.
JBC bit, rel ; jump if the direct bit is set to the relative address specified and then clear the bit.
RET instruction
RET instruction pops the top two bytes from the stack and loads them into the PC.
1. [PC15-8] = [[SP]] ; content of the current top of the stack is moved to the higher byte of PC.
2. [SP] = [SP] - 1 ; SP decrements.
3. [PC7-0] = [[SP]] ; content of the new top of the stack is moved to the lower byte of PC.
4. [SP] = [SP] - 1 ; SP decrements again.
1. LOGICAL AND
a. ANL C, bit ; logically AND carry with the content of the bit address; store the result in carry.
b. ANL C, /bit ; logically AND carry with the complement of the content of the bit address; store the
result in carry.
2. LOGICAL OR
a. ORL C, bit ; logically OR carry with the content of the bit address; store the result in carry.
b. ORL C, /bit ; logically OR carry with the complement of the content of the bit address; store the
result in carry.
3. CLR bit
a. CLR bit ; the content of the bit address specified is cleared.
Assembler Directives.
Assembler directives tell the assembler to do something other than creating the machine code
for an instruction. In assembly language programming, the assembler directives instruct the
assembler to
1. Process subsequent assembly language instructions
2. Define program constants
3. Reserve space for variables
ORG (origin)
The ORG directive is used to indicate the starting address. It is used whenever the location
counter needs to be changed. The number that comes after ORG can be either in
hex or in decimal.
Eg: ORG 0000H ;Set PC to 0000.
1. Write a program to add the values of locations 50H and 51H and store the result in locations
in 52h and 53H.
2. Write a program to store data FFH into RAM memory locations 50H to 58H using direct
addressing mode
3. Write a program to subtract a 16 bit number stored at locations 51H-52H from 55H-56H and
store the result in locations 40H and 41H. Assume that the least significant byte of data or the
result is stored in low address. If the result is positive, then store 00H, else store 01H in 42H.
ORG 0000H ; Set program counter to 0000H
MOV A, 55H ; Load the contents of memory location 55H into A
CLR C ; Clear the borrow flag
SUBB A, 51H ; Subtract the contents of memory 51H from the contents of A
MOV 40H, A ; Save the LSByte of the result in location 40H
MOV A, 56H ; Load the contents of memory location 56H into A
SUBB A, 52H ; Subtract the contents of memory 52H from the contents of A
MOV 41H, A ; Save the MSByte of the result in location 41H
MOV A, #00 ; Load 00H into A
ADDC A, #00 ; Add the immediate data and the carry flag to A
MOV 42H, A ; If result is positive, store 00H, else store 01H in 42H
END
4. Write a program to add two 16 bit numbers stored at locations 51H-52H and 55H-56H and
store the result in locations 40H, 41H and 42H. Assume that the least significant byte of data
and the result is stored in low address and the most significant byte of data or the result is
stored in high address.
5. Write a program to store data FFH into RAM memory locations 50H to 58H using indirect
addressing mode.
ORG 0000H ; Set program counter to 0000H
MOV A, #0FFH ; Load FFH into A
MOV R0, #50H ; Load pointer, R0 = 50H
MOV R5, #09H ; Load counter, R5 = 09H (locations 50H to 58H)
Start: MOV @R0, A ; Copy contents of A to RAM pointed to by R0
INC R0 ; Increment pointer
DJNZ R5, Start ; Repeat until R5 is zero
END
6. Write a program to add two Binary Coded Decimal (BCD) numbers stored at locations 60H and
61H and store the result in BCD at memory locations 52H and 53H. Assume that the least
significant byte of the result is stored in low address.
12. Write a program to shift a 24 bit number stored at 57H-55H to the left logically four places.
Assume that the least significant byte of data is stored in lower address.
ORG 0000H ; Set program counter 0000h
MOV R1,#04 ; Set up loop count to 4
again: MOV A,55H ; Place the least significant byte of data in A
CLR C ; Clear the carry flag
RLC A ; Rotate contents of A (55h) left through carry
MOV 55H,A
MOV A,56H
RLC A ; Rotate contents of A (56H) left through carry
MOV 56H,A
MOV A,57H
RLC A ; Rotate contents of A (57H) left through carry
MOV 57H,A
DJNZ R1,again ; Repeat until R1 is zero
END
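The routine's effect can be checked against a C equivalent (a sketch: shift a 24-bit value left logically by four places, discarding bits shifted out of bit 23):

```c
#include <stdint.h>

/* Shifts a 24-bit value left by four places, keeping only 24 bits,
   matching the repeated RLC-based shift in the assembly above. */
uint32_t shl24_by4(uint32_t v) {
    return (v << 4) & 0xFFFFFFu;
}
```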
VHDL stands for VHSIC (Very High Speed Integrated Circuit) Hardware Description Language. It is a
language used to model a digital system using dataflow, behavioral, and structural styles of modeling. The
language was first introduced in 1981 for the Department of Defense (DoD) under the VHSIC program.
- The language not only defines the syntax but also defines very clear simulation semantics
for each language construct.
- Quick Time-to-Market
- Allows creation of device-independent designs that are portable to multiple vendors. Good
for ASIC Migration.
- Concurrency.
- Supports Hierarchies.
Concurrency
- VHDL also supports sequential statements, which are executed one at a time, in sequence.
- To ensure that design is correct as per the specifications, the designer has to write another
program known as “TEST BENCH”.
- It generates a set of test vectors and sends them to the design under test (DUT).
- It also checks the responses made by the DUT against the specification to ensure correct
functionality.
- Example:
Supports Hierarchies:
- Consider example of a Full-adder which is the top-level module, being composed of three
lower level modules i.e. half-Adder and OR gate.
- Example :
Levels of Abstraction:
- In this style of modeling the flow of data through the entity is expressed using concurrent
signal assignment statements.
Structural level
Behavioral level.
- This style of modeling specifies the behavior of an entity as a set of statements that are
executed sequentially in the specified order.
- A basic identifier may contain only the characters ‘A’-‘Z’, ‘a’-‘z’, ‘0’-‘9’, and the underscore
character ‘_’.
Objects:
Type
Major Types
- Major types
- Composite Types
Scalar Types
Integer
- For example:
Floating Point:
Physical
Enumeration
- Example:
Composite Types
Array:
- The synthesis of multidimensional array depends upon the synthesizer being used.
Record:
type std_logic is (‘U’, ‘X’, ‘0’, ‘1’, ‘Z’, ‘W’, ‘L’, ‘H’, ‘-’);
‘U’ uninitialized
‘X’ unknown
Alias:
- Syntax :
- Examples:
Signal Array:
- A set of signals may also be declared as a signal array which is a concatenated set of
signals.
- Example:
Subtype
- Useful for range checking and for imposing additional constraints on types.
Syntax:
Operators
2. relational operators:
3. shift operators:
4. adding operators:
6. multiplying operators:
7. miscellaneous operators:
Multi-Dimensional Arrays
For example:
For synthesizers which do not accept multidimensional arrays, one can declare two one-dimensional
arrays instead.
For example:
Dataflow Level
8. A Dataflow model specifies the functionality of the entity without explicitly specifying its
structure.
9. This functionality shows the flow of information through the entity, which is expressed
primarily using concurrent signal assignment statements and block statements.
10. The primary mechanism for modeling the dataflow behavior of an entity is using the
concurrent signal assignment statement.
Entity
12. The interconnections of the design unit with the external world are enumerated.
Entity declaration:
entity <entity_name> is
port (
.................
);
end <entity_name>;
15. These modes describe the different kinds of interconnections that the port can have with
the external circuitry.
entity andgate is
port (
a : in bit;
b : in bit;
z : out bit
);
end andgate;
Architecture:
architecture arc_andgate of andgate is
begin
z <= a and b;
end arc_andgate;
Library ieee;
use ieee.std_logic_1164.all;
entity half_adder is
port (
a, b : in std_logic;
c, s : out std_logic
);
end half_adder;
architecture arc_half_adder of half_adder is
begin
c <= a and b;
s <= a xor b;
end arc_half_adder;
Signals
23. Syntax: signal <signal_name_list> : type := initial_value;
24. Equivalent to wires.
25. Connect design entities together and communicate changes in values within a design.
26. Computed value is assigned to signal after a specified delay called as Delta Delay.
27. Signals can be declared in an entity (it can be seen by all the architectures), in an
architecture (local to the architecture), in a package (globally available to the user of the
package) or as a parameter of a subprogram (I.e. function or procedure).
28. Signals have three properties attached to it.
Type and Type attributes,value,Time (It has a history).
29. Signal assignment is done by using the assignment operator ‘<=’.
30. Signal assignment is concurrent outside a process & sequential within a process.
Structural Modeling:
31. An entity is modeled as a set of components connected by signals, that is, as a netlist.
32. The behavior of the entity is not explicitly apparent from its model.
33. The component instantiation statement is the primary mechanism used for describing such a model
of an entity.
34. A component instantiated in a structural description must first be declared using a component
declaration.
35. A larger design entity can call a smaller design unit in it.
36. This forms a hierarchical structure.
37. This is allowed by a feature of VHDL called component instantiation.
38. A component is a design entity in itself which is instantiated in the larger entity.
39. A component is a design entity in itself which is instantiated in the larger entity.
Library ieee;
use ieee.std_logic_1164.all;
entity andgate is
port
( c : out std_logic;
a : in std_logic;
b : in std_logic
);
end andgate;
architecture arch_andgate of andgate is
begin
c<=a and b;
end arch_andgate;
entity xorgate is
port
( c : out std_logic;
a : in std_logic;
b : in std_logic
);
end xorgate;
architecture arch_xorgate of xorgate is
begin
c <= a xor b;
end arch_xorgate;
Library ieee;
use ieee.std_logic_1164.all;
entity halfadder is
port
( s, c : out std_logic;
x : in std_logic;
y : in std_logic
);
end halfadder;
architecture arch_halfadder of halfadder is
begin
s<=x xor y;
c<= x and y;
end arch_halfadder;
entity fulladder is
port
( a, b, c : in std_logic;
sum, carry : out std_logic
);
end fulladder;
Library ieee;
use ieee.std_logic_1164.all;
entity BPA4 is
port
( A, B : in std_logic_vector(3 downto 0);
Sum_bpa : out std_logic_vector(3 downto 0);
Cin : in std_logic;
Cout : out std_logic
);
end BPA4;
architecture arc_BPA4 of BPA4 is
component fulladder
port
( sum, carry : out std_logic;
a, b, c : in std_logic
);
end component;
43. This is the process of combining two signals into a single set which can be individually addressed.
44. The concatenation operator is ‘&’.
45. A concatenated signal’s value is written in double quotes whereas the value of a single bit signal is
written in single quotes.
Decision-making statements:
If statements:
if (expression) then
S1
elsif (expression) then
S2
elsif (expression) then
S3
...
elsif (expression) then
Sn
else
Sn+1
end if;
49. Example:
entity mux2 is
port
( i0, i1 : in bit_vector(1 downto 0);
y : out bit_vector(1 downto 0);
sel : in bit
);
end mux2;
architecture behaviour of mux2 is
begin
with sel select y <= i0 when '0',
i1 when '1';
end behaviour;
When-Else
50. syntax :
Signal_name<= expression1 when condition1
else expression2 when condition2
else expression3;
51. Example:
entity tri_state is
port
Library ieee;
use ieee.std_logic_1164.all;
entity decoder is
port
( SW : in std_logic_vector(1 downto 0);
Q : out std_logic_vector(3 downto 0)
);
end decoder;
architecture arc_decoder of decoder is
begin
if (SW = "00") then
Q <= "0001";
elsif (SW = "01") then
Library ieee;
use ieee.std_logic_1164.all;
entity encoder is
port
( Q : out std_logic_vector(1 downto 0);
D : in std_logic_vector(3 downto 0)
);
end encoder;
architecture arc_encoder of encoder is
begin
if (D = "0001") then
Q <= "00";
elsif (D = "0010") then
Q <= "01";
Library ieee;
use ieee.std_logic_1164.all;
entity MUX is
port
( Q : out std_logic;
I0, I1, I2, I3 : in std_logic;
SL : in std_logic_vector(1 downto 0)
);
end MUX;
Library ieee;
use ieee.std_logic_1164.all;
entity DMUX is
port
( Din : in std_logic;
SL : in std_logic_vector(1 downto 0); -- select lines, added to complete the truncated port list
Y0, Y1, Y2, Y3 : out std_logic
);
end DMUX;
Architecture: do it yourself.
- A process statement defines an independent sequential process representing the behavior of some
portion of the design.
- Simplified Syntax:
[ label : ] process ( sensitivity_list )
process_declarations
begin
sequential_statements
end process [ label ];
- The process statement represents the behavior of some portion of the design. It consists of
sequential statements that are executed in the order defined by the user.
- Each process can be assigned an optional label.
- The process declarative part defines local items for the process and may contain declarations of:
subprograms, types, subtypes, constants, variables, files, aliases, attributes, use clauses and group
declarations. It is not allowed to declare signals or shared variables inside processes.
- The statements, which describe the behavior in a process, are executed sequentially, in the order in
which the designer specifies them. The execution of statements, however, does not terminate with
the last statement in the process, but is repeated in an infinite loop. The loop can
be suspended and resumed with wait statements. When the next statement to be executed is
a wait statement, the process suspends its execution until a condition supporting the wait statement
is met. See respective topics for details.
- A process declaration may contain an optional sensitivity list. The list contains identifiers of signals
to which the process is sensitive. A change in the value of any of these signals causes the suspended
process to resume. A sensitivity list is fully equivalent to a wait on sensitivity_list statement at the
end of the process. It is not allowed, however, to use wait statements and a sensitivity list in the same
process. In addition, if a process with a sensitivity list calls a procedure, then the procedure cannot
contain any wait statements.
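The points above can be sketched in a small process (signal and variable names are illustrative):

```vhdl
-- a process with a sensitivity list: it resumes whenever clk changes,
-- runs its sequential statements in order, and then suspends again
process (clk)
  variable count : integer := 0;  -- local variable declaration
                                  -- (signals may not be declared here)
begin
  if rising_edge(clk) then
    count := count + 1;
  end if;
end process;                      -- no wait statement: the sensitivity
                                  -- list plays that role
```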
Positive-edge-triggered D Flip-Flop
Library ieee;
use ieee.std_logic_1164.all;
entity dff is
port ( d, clk, reset : in std_logic;
q : out std_logic );
end dff;
architecture Behavioral of dff is
begin
process (clk, reset)
begin
if (reset = '1') then -- asynchronous reset clears the output
q <= '0';
elsif (rising_edge(clk)) then -- q follows d on the rising clock edge
q <= d;
end if;
end process;
end Behavioral;
VHDL code for Asynchronous counter using JK Flip Flop
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity jkc is
port (
clock : in std_logic;
reset : in std_logic;
dout : out std_logic_vector(3 downto 0) -- output port added so the count is visible
);
end jkc;
architecture rtl of jkc is
COMPONENT jkff
PORT(
clock : in std_logic;
reset : in std_logic;
j : in std_logic;
k : in std_logic;
q : out std_logic
);
END COMPONENT;
signal temp : std_logic_vector(3 downto 0);
begin
-- With J = K = '1' every flip-flop toggles; each stage is clocked by the
-- previous stage's output, which makes the counter asynchronous (ripple).
d0 : jkff
port map (
clock => clock,
reset => reset,
j => '1',
k => '1',
q => temp(3)
);
d1 : jkff
port map (
clock => temp(3),
reset => reset,
j => '1',
k => '1',
q => temp(2)
);
d2 : jkff
port map (
clock => temp(2),
reset => reset,
j => '1',
k => '1',
q => temp(1)
);
d3 : jkff
port map (
clock => temp(1),
reset => reset,
j => '1',
k => '1',
q => temp(0)
);
dout <= temp;
end rtl;
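The jkff component instantiated above is not defined in these notes; a minimal sketch of a rising-edge JK flip-flop with asynchronous reset, matching the component's port list, might look like:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity jkff is
port (
  clock : in std_logic;
  reset : in std_logic;
  j, k  : in std_logic;
  q     : out std_logic
);
end jkff;

architecture rtl of jkff is
  signal q_int : std_logic := '0'; -- internal state, readable inside the process
begin
process (clock, reset)
begin
  if reset = '1' then
    q_int <= '0';
  elsif rising_edge(clock) then
    if    j = '1' and k = '1' then q_int <= not q_int; -- toggle
    elsif j = '1' and k = '0' then q_int <= '1';       -- set
    elsif j = '0' and k = '1' then q_int <= '0';       -- reset
    end if;                                            -- j = k = '0': hold
  end if;
end process;
q <= q_int;
end rtl;
```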