DBMS - Unit 4
DBMS - Unit 4
DBMS - Unit 4
1. What are the advantages and disadvantages of indexed sequential file? MAY-
2011
The advantage of ordering records in a sequential file according to a key is that you
can then search the file more quickly. If you know the key value that you want, you
can use one of the relatively fast searches.
The disadvantage is that when you insert, you need to rewrite at least everything
after the insertion point, which makes inserts very expensive unless they are done at
the end of the file.
2. What is database tuning? MAY-2011
Database tuning describes a group of activities used to optimize and
homogenize the performance of a database. It usually overlaps with query tuning, but
refers to design of the database files, selection of the database management system
(DBMS), operating system and CPU the DBMS runs on.
3. Give the measures of quality of a disk.
Capacity
Access time
Seek time
Data transfer rate
Reliability
Rotational latency time.
4. Compare sequential access devices versus random access devices with an
example.
Sequential access devices Random access devices
Must be accessed from the It is possible to read data
beginning from any location
If one of the disks fails the data can be read from the other. Data will be
lost if the second disk fails before the first failed disk is repaired.
20. What is called mirroring?
The simplest approach to introducing redundancy is to duplicate every
disk. This technique is called mirroring or shadowing.
78. Which are the factors to be considered for the evaluation of indexing and
hashing techniques? DEC-2010
Refer Part A - Q. No. 41
80. What is the drawback of flash memory? DEC-2010
Disadvantages:
Flash memory cells have a limited number of write and erase cycles before
failing.
Most flash drives do not have a write-protection mechanism
82. How to choose the best evaluation plan for a query? Explain. MAY-2003
The optimizer should choose the access path that retrieves the fewest records in the
most efficient way by estimating the different cost and choosing the methods with the
least estimated cost.
83. Draw the structure of a typical node of a B+ tree and explain briefly. DEC-
2003 (Nov/Dec 2019)
B+ tree considers all the keys in nodes except the leaves as dummies. All keys are
duplicated in the leaves. This has the advantage that is all the leaves are linked
together sequentially, the entire tree may be scanned without visiting the higher
nodes at all.
B+ Tree Structure
Introduction
RAID Levels
o LEVEL 0 – non redundant striping
o LEVEL 1 – mirrored disk
o LEVEL 2 – memory style error correcting code
o LEVEL 3 – bit interleaved parity
o LEVEL 4 – block interleaved parity
o LEVEL 5 – block interleaved distributed parity
o LEVEL 6 – P + Q redundancy
Choices of RAID Level
Introduction:
o RAID - Redundant Array of Independent Disks are used to improve
the performance and reliability of disks.
o The main features of RAID are,
High capacity and high speed by using multiple disks in parallel
High reliability by storing data redundantly for data recovery under disk
failures.
Improvement of Reliability via Redundancy
Redundancy - store extra information that can be used to rebuild
information lost inadisk failure. This can be achieved through Mirroring
(or shadowing).
LEVEL 1:
LEVEL 2:
o RAID level 2, known as memory-style error-correcting-code (ECC)
organization, employs parity bits.
o Memory systems have long used parity bits for error detection and correction.
o Storage efficiency increases as the number of data disks increases
o Each byte in a memory system may have a parity bit associated with it that
records whether the numbers of bits in the byte that are set to 1 is even (parity
= 0) or odd (parity = 1).
o If one of the bits in the byte gets damaged (either a 1 becomes a 0, or a 0
becomes a 1), the parity of the byte changes and thus will not match the stored
parity.
o Similarly, if the stored parity bit gets damaged, it will not match the computed
parity.
o Thus, all 1-bit errors will be detected by the memory system.
o Error-correcting schemes store 2 or more extra bits, and can reconstruct the
data if a single bit gets damaged.
o The idea of error-correcting codes can be used directly in disk arrays by
striping bytes across disks.
o For example, the first bit of each byte could be stored in disk 1, the second bit
in disk 2, and so on until the eighth bit is stored in disk 8, and the error-
correction bits are stored in further disks.
o The disks labeled P stores the error correction bits.
o If one of the disks fails, the remaining bits of the byte and the associated error-
correction bits can be read from other disks, and can be used to reconstruct
the damaged data.
o Figure shows an array of size 4.
LEVEL 3:
o RAID level 3, bit-interleaved parity organization, improves on level 2 by
exploiting the fact that disk controllers, unlike memory systems, can detect
whether a sector has been read correctly, so a single parity bit can be used for
error correction, as well as for detection.
o RAID 3 stripes the data onto multiple disks. The parity bit generated for data
word
is stored on a different disk.
o Data is interleaved bit-wise over the data disks and a single parity disk is used
and hence overcomes single disk failures.
o Each read operation will access all data disks and a write operation will access
all
the data disks and the single parity disk .
o To recover data in a failed disk, the XOR of bits from other disks including
pari~
bit disk is computed.
o If one of the sectors gets damaged, the system knows exactly which sector it is,
and, for each bit in the sector, the system can figure out whether it s a 1 or a 0
by computing the parity of the corresponding bits from sectors in the other
disks.
o If the parity of the remaining bits is equal to the stored parity, the missing bit is
0; otherwise, it is 1.
Benefits:
o Compared to Level 2 - it is less expensive in the number of extra disks (it has only
a one-disk overhead).
o Compared to Level 1 – It needs only one parity disk for several regular disks
o The transfer rate for reading or writing a single block is N times faster than a
RAID level 1 organization using N-way striping.
LEVEL 5:
o RAID level 5, block-interleaved distributed parity, improves on level 4 by
partitioning data and parity among all N + 1 disks, instead of storing data in N
disks and parity in one disk.
o This level is called the block interleaved distributed parity disk array . Writes
whole data blocks onto different disks .
o Parity bits generated for data block stripe are distributed among all the data disks
o rather than storing them on a different dedicated disk.
o Provides best small read, large read and large write performance
o In level 5, all disks can participate in satisfying read requests.
Benefit :
It offers better read–write performance at the same cost than level 4.
LEVEL 6:
o RAID level 6, the P + Q redundancy scheme, is much like RAID level 5, but
stores extra redundant information to guard against multiple disk failures.
o In this level, two independent parities (P and Q) are generated and stored in
distributed fashion among multiple disks. Two parities provide additional fault
tolerance.
o Instead of using parity, level 6 uses error-correcting codes such as the Reed–
Solomon codes. 2 bits of redundant data are stored for every 4 bits of data.
The File is a collection of records. Using the primary key, we can access the
records. The type and frequency of access can be determined by the type of
file organization which was used for a given set of records.
File organization is used to describe the way in which the records are stored
in terms of blocks, and the blocks are placed on the storage medium.
The first approach to map the database to the file is to use the several files
and store only one fixed length record in any given file. An alternative
approach is to structure our files so that we can contain multiple lengths for
records.
Files of fixed length records are easier to implement than the files of variable
length records.
Sequential – store records in sequential order, based on the value of the search
key of each record
Heap – a record can be placed anywhere in the file where there is space
Hashing – a hash function computed on some attribute of each record; the
result specifies in which block of the file the record should be placed
A search key is any attribute or set of attributes; it need not be the primary
key, or even a super key.
FIXED-LENGTH REPRESENTATION
o Reserved Space
o List representation
1. Reserved space - If there is a maximum record length that is never exceeded, we
can use fixed-length records of that length and the free space is filled with a
special null, or end-of-record, symbol.
VARIABLE-LENGTH RECORDS :
Variable-length records arise in database systems in several ways:
Storage of multiple record types in a file
Record types that allow variable lengths for one or more fields
Record types that allow repeating fields
Consider an account record,
Indices- Definition
Types of indices
o Ordered Indices
o Hash Indices
Ordered Indices
o Primary index
Dense Index
Sparse Index
Multilevel Indices
o Secondary Indices
Hash indices
The database system would look up an index to find on which disk block
the corresponding record resides, and then fetch the disk block, to get the record.
There are two basic kinds of indices:
Ordered indices - Based on a sorted ordering of the values.
Hash indices - Based on a uniform distribution of values across a range
of buckets. The bucket to which a value is assigned is determined by a
function, called a hash function.
Each technique must be evaluated on the basis of these factors:
Files with a primary index on the search key, are called index- sequential files.
They are designed for applications that require both sequential processing of
the entire file and random access to individual records.
Figure shows a sequential file of account records taken from our banking
example.
The records are stored in search-key order, with branch-name used as the
search key.
Dense index
Sparse index
Multilevel indices
Dense index:
In a dense primary index, the index record contains the search-key
value and a pointer to the first data record with that search-key value.
The rest of the records with the same search key-value would be stored
sequentially after the first record, since, because the index is a primary
one, records are sorted on the same search key.
Dense index implementations may store a list of pointers to all records
with the same search-key value.
A hash index organizes the search keys, with their associated pointers,
into a hash file structure.
Apply a hash function on a search key to identify a bucket, and store the
key and its associated pointers in the bucket.
On the account file, for the search key account-number, the hash function
in the figure computes the sum of the digits of the account number
modulo 7.
The hash index has seven buckets, each of size 2.
One of the buckets has three keys mapped to it, so it has an overflow bucket.
Static hashing
• Hash file organization
• Hash function
• Bucket overflows
o Reason for overflow
o Handling of overflows
• Hash indices
Dynamic Hashing
• Extendable Hashing
STATIC HASHING
Hash File Organization
In a hash file organization, we obtain the address of the disk block
containing a desired record directly by computing a function on the search-key
value of the record.
Bucket is used to denote a unit of storage that can store one or more records.
A bucket is typically a disk block, but could be chosen to be smaller or larger
than a disk block.
Hash Functions
Let K denote the set of all search-key values, and let B denote the set of all
bucket addresses.
A hash function h is a function from K to B. Let h denote a hash function. To
insert a record with search key Ki, we compute h(Ki), which gives the address
of the bucket for that record.
Qualities of Hash function :
• The distribution is uniform.
• The distribution is random.
Hash structure after inserting all the records of the account relation
B+ Tree - Introduction
Structure of B+ tree
Structure of the leaf nodes
Queries On B+ Trees
Find Operation
Algorithm
Insert Operation
Algorithm
Delete Operation
Algorithm
Structure of a B+ Tree
A B+ tree index is a multilevel index.
QUERIES ON B+ TREES
Find Operation
To find all records with a search-key value of V, steps are
1. Examine the root node, looking for the smallest search-key value greater
than V.
2. If the search-key value is Ki, then follow pointer Pi to another node.
3. If no such value, then k ≥ Km−1, where m is the number of pointers in the
node.
4. Follow Pm to another node. Repeat the steps from 1-4.
5. At the leaf node, the search key value Ki equals V is found, then pointer Pi
directs to the desired record or bucket.
6. If the value V is not found in the leaf node, no record with key value V exists.
Algorithm
Example :
Let us delete “ downtown “ from the B+ tree.
Locate the entry for “Downtown” then delete the entry for “Downtown” from its
leaf node, the leaf becomes empty. This node must be eliminated from the B+ tree. To
delete a leaf node, we must delete the pointer to it from its parent.
Definition
Query processing refers to the range of activities involved in extracting data
from a database.
The activities include translation of queries in high-level database languages
into expressions that can be used at the physical level of the file system, a
variety of query-optimizing transformations, and actual evaluation of queries.
Steps involved in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
Before query processing can begin, the system must translate the query into a
usable form.
Query processing is to translate a given query into its internal form.
In generating the internal form of the query, the parser checks the syntax of the
user’s query, verifies that the relation names appearing in the query are names
of the relations in the database, and so on.
The system constructs a parse-tree representation of the query, which it then
translates into a relational-algebra expression.
Each SQL query can itself be translated into a relational-algebra expression in
one of several ways.
Furthermore, the relational-algebra representation of a query specifies only
partially how to evaluate a query.
There are usually several ways to evaluate relational-algebra expressions.
1. B Tree
The primary distinction between the two approaches is that
1. A B-tree eliminates the redundant storage of search-key values. A B-
tree allows search-key values to appear only once.
2. Able to store the index in fewer tree nodes than in the corresponding B+
tree index.
Structure of a Leaf Node
However, roughly n times as many keys are stored in the leaf level of a B-
tree as in the nonleaf levels, and, since n is typically large, the benefit of
finding certain values early is relatively small.
Lookup in a B-tree is faster for some search keys but slower for others.
Lookup time is still proportional to the logarithm of the number of search
keys.
The proper value must be selected as a replacement from the tree of the
node containing the deleted entry.
• Negation: The result of a selection σ¬θ(r) is the set of tuples of r for which the
condition θ evaluates to false. In the absence of nulls, this set is simply the set of
tuples that are not in σθ(r).
• A8 (conjunctive selection using one index)- First determine whether an access
path is available for an attribute. If so then, one of the selection algorithms A2
through A7 can retrieve records satisfying that condition.
• A9 (conjunctive selection using composite index) - If the selection specifies an
equality condition on two or more attributes, and a composite index exists on these
combined attribute fields, then the index can be searched directly.
• A10 (conjunctive selection by intersection of identifiers) - This algorithm
requires indices with record pointers, on the fields involved in the individual