Nothing Special   »   [go: up one dir, main page]

Lecture 5.Pptx 2

Download as pdf or txt
Download as pdf or txt
You are on page 1of 22

NEHRU ARTS AND SCIENCE COLLEGE

COIMBATORE -641 105

DEPARTMENT OF INFORMATION TECHNOLOGY


Course: Data Structures

Facilitator : Dr.S.Saraswathi
Index Techniques
• Indexed sequential access file combines both sequential file and direct
access file organization.

• In indexed sequential access file, records are stored randomly on a direct


access device such as magnetic disk by a primary key.

• This file have multiple keys. These keys can be alphanumeric in which the
records are ordered is called primary key.

• The data can be access either sequentially or randomly using the index. The
index is stored in a file and read into memory when the file is opened.
Advantages of Indexed sequential access file
organization
• In indexed sequential access file, sequential file and random file access is
possible.

• It accesses the records very fast if the index table is properly organized.

• The records can be inserted in the middle of the file.

• It provides quick access for sequential and direct processing.

• It reduces the degree of the sequential search.


Disadvantages of Indexed sequential access file
organization
• In indexed sequential access file, sequential file and random file access is
possible.

• It accesses the records very fast if the index table is properly organized.

• The records can be inserted in the middle of the file.

• It provides quick access for sequential and direct processing.

• It reduces the degree of the sequential search.


• Indexing is a data structure technique to efficiently retrieve records
from the database files based on some attributes on which the
indexing has been done.
• These Ordered or Sequential file organization might store the data in a
dense or sparse format:

• Dense Index:
For every search key value in the data file, there is an index record.
This record contains the search key and also a reference to the first data record with that search key value.
Sparse Index:

• The index record appears only for a few items in the data file. Each item points to a block as shown.

• To locate a record, we find the index record with the largest search key value less than or equal to the
search key value we are looking for.

• We start at that record pointed to by the index record, and proceed along with the pointers in the file
(that is, sequentially) until we find the desired record.

• Number of Accesses required=log₂(n)+1, (here n=number of blocks acquired by index file)


• The important component of file is directory.
• A Directory is a collection of indexes.
• A directory may contain one index for each key or one index for some
keys.
• Index may be dense or non dense.
• The form of index is ( key value , address).
The functions on index are
• search for a key value
• Insert a new pair
• Delete a pair
• Modify or update an entry
The index techniques include
• Cylinder Surface indexing

• Hashed Indexes

• Tree Indexing –B Trees

• Trie Indexing
Cylinder Surface indexing
• This index structure is for files stored in hard disks. It needs the records to
be sorted on some field.

• Then assuming a sequence of the cylinders in the disk and a sequence of


surfaces (tracks) in each cylinder, the sorted records are stored starting
from the first cylinder and proceeding in the assumed sequence through
the cylinders.

• An index of the first record in each cylinder is maintained using the field on
which the records are sorted as the key field. This is the cylinder index.
• The key values of this index defines the ranges of the key value present in
each cylinder, so that for a search of a particular record the cylinder in
which the record may be present can be found out.

• So the search for the record can be performed only in that cylinder.

• Again within a cylinder, records are stored starting from the first surface
and through the assumed sequence of surfaces.

• Here an index of the first record in each surface is maintained using the
same key field. This is the surface index.

• There is a cylinder index for the entire file and one surface index for each
cylinder that the file spreads over.
• The idea of cylinder surface index can be used even without
considering cylinders and surfaces.

• If the records are sorted and kept in a disk file, then an index
containing the (key-value, offset) of equally spaced records records
can help a sequential search by requiring the search first in the index,
and then in the proper segment of the disk file.
Hashed indexing
• Hash functions and overflow handling techniques are used for hashing.

• Since the index is to be maintained on disk and disk access times are generally
several orders of magnitude larger than internal memory access times, much
consideration is given to hash table design and the choice of an overflow handling
technique.

Overflow handling techniques:

1. rehashing

2. open addressing (random[f(i)=random()], quadratic, linear)

3. chaining
• Expected number of bucket accesses when s=1 is roughly same for method
rehashing, random and quadratic probing.

• Since the hash table is on a disk and these overflow techniques tend to
randomize the use of hash table, we can expect each bucket access to take
almost the maximum seek time.

• In case of linear probing however overflow buckets are adjacent to home


bucket so their retrieval will require minimum seek time.

• While using chaining we can minimize the tendency to randomize use of


overflow area by designating certain tracks as overflow tracks for particular
buckets.
Tree Indexing B Tree
• The basic idea in searching using any search tree is to successively
divide the searching domain (set of elements where search is to be
done), until either the search is successful or it is found that the
element is not present in the given set.

• An m-way search tree is a very general search tree where the


elements are ordered on a key field and m denotes the maximum
number of partitions that the set can be divided into at any level.
• In other words it is the maximum number of children that each node
can have.

• At each level the root of the tree can contain upto m - 1 key values
that define the partition boundaries.

• A binary search tree is a m-way search tree of order 2.


Trie indexing
• An index structure that is particularly useful when key values are of
varying size is trie.

• A trie is a tree of degree m>=2 in which the branching at any level is


determined not by the entire key value but by only a portion of it.

• The trie contains two types of nodes. First type called the branch
node and second information node.
• Searching a trie for a key value X requires breaking up X into its
consequent characters and following the branching patterns
determined by these characters.

• The trie is a data structure that can be used to do a fast search in a


large text.

• For example, we can think of the Oxford English dictionary which


contains several gigabytes of text.
• The Oxford dictionary is a static structure because we do not want to
add or delete any items.

• However, searching for an item in the dictionary is very important.


Also, searching for a string should be efficient because overlapping of
strings can occur.
Thank you

You might also like