4 Chapter17 Index

Copyright © 2016 Ramez Elmasri and Shamkant B.
Navathe
CHAPTER 17
Indexing Structures for Files and

Physical Database Design
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe

Introduction
◼ Indexes used to speed up record retrieval in
response to certain search conditions
◼ Index structures provide secondary access paths
◼ Any field can be used to create an index
◼ Multiple indexes can be constructed
◼ Most indexes based on ordered files
◼ Tree data structures organize the index
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 3

17.1 Types of Single-Level Ordered
Indexes
◼ Ordered index similar to index in a textbook
◼ Indexing field (attribute)
◼ Index stores each value of the index field with list
of pointers to all disk blocks that contain records
with that field value
◼ Values in index are ordered
◼ Primary index
◼ Specified on the ordering key field of ordered file
of records

Types of Single-Level Ordered
Indexes (cont’d.)
◼ Clustering index
◼ Used if numerous records can have the same
value for the ordering field
◼ Secondary index
◼ Can be specified on any nonordering field
◼ Data file can have several secondary indexes

Primary Indexes
◼ Ordered file with two fields
◼ Primary key, K(i)
◼ Pointer to a disk block, P(i)
◼ One index entry in the index file for each block in
the data file
◼ Indexes may be dense or sparse
◼ Dense index has an index entry for every search
key value in the data file
◼ Sparse index has entries for only some search
values
Primary Indexes (cont’d.)
Figure 17.1 Primary index on the ordering key field of the file shown in Figure 16.7
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-7
Primary Indexes (cont’d.)
◼ Major problem: insertion and deletion of records
◼ Move records around and change index values
◼ Solutions
◼ Use unordered overflow file
◼ Use linked list of overflow records

Clustering Indexes
◼ Clustering field
◼ File records are physically ordered on a nonkey
field without a distinct value for each record
◼ Same type as clustering field
◼ Disk block pointer

Clustering Indexes (cont’d.)
Figure 17.2 A clustering index on the Dept_number ordering

nonkey field of an EMPLOYEE file
Secondary Indexes
◼ Provide secondary means of accessing a data file
◼ Some primary access exists
◼ Indexing field, K(i)
◼ Block pointer or record pointer, P(i)
◼ Usually need more storage space and longer
search time than primary index
◼ Improved search time for arbitrary record

Secondary Indexes (cont’d.)
Figure 17.4 Dense

secondary index (with
block pointers) on a
nonordering key field
of a file.

Types of Single-Level Ordered
Indexes (cont’d.)
Table 17.1 Types of indexes based on the properties of the indexing field
Table 17.2 Properties of index types

17.2 Multilevel Indexes
◼ Designed to greatly reduce remaining search
space as search is conducted
◼ Index file
◼ Considered first (or base level) of a multilevel
index
◼ Second level
◼ Primary index to the first level
◼ Third level
◼ Primary index to the second level

Figure 17.6 A two-level
primary index resembling
ISAM (indexed sequential
access method) organization

Example 1
Suppose that we have an ordered file with r = 300,000

records stored on a disk with block size B = 4,096 bytes
File records are of fixed size and are unspanned, with
record length R = 100 bytes.
A- What is the blocking factor?
B- What is the number of needed blocks?
C- What is the cost of linear search?
D- What is the cost of binary search on the file?

Example
Suppose we want to search for a record with a specific value with an index. The
index value V = 9 bytes long. Suppose a block pointer is P = 6 bytes.
E- How many index entries per block?
F- How many blocks for a primary index?
G. How many blocks will be accessed using the dense secondary index?
H- Binary search on secondary index?

Example
I- Suppose that the dense secondary index is converted into a
multilevel index.
How many blocks will be accessed?
Secondary Index Level3 5/273= 1 Block
Secondary Index Level2 1,099/273= 5 Blocks
Secondary Index Level1 300,000/273= 1,099 Blocks
Data File 300,000/40= 7500 Blocks

17.3 Dynamic Multilevel Indexes
Using B-Trees and B+ -Trees
◼ Tree data structure terminology
◼ Tree is formed of nodes
◼ Each node (except root) has one parent and zero
or more child nodes
◼ Leaf node has no child nodes
◼ Unbalanced if leaf nodes occur at different levels
◼ Nonleaf node called internal node
◼ Subtree of node consists of node and all
descendant nodes

Tree Data Structure
Figure 17.7 A tree data structure that shows an unbalanced tree

Search Trees and B-Trees
◼ Search tree used to guide search for a record
◼ Given value of one of record’s fields
Figure 17.8 A node in a search tree with pointers to subtrees below it

Search Trees and B-Trees (cont’d.)
◼ Algorithms necessary for inserting and deleting
search values into and from the tree
Figure 17.9 A search tree of order p = 3

B-Trees
◼ Provide multi-level access structure
◼ Tree is always balanced
◼ Space wasted by deletion never becomes
excessive
◼ Each node is at least half-full
◼ Each node in a B-tree of order p can have at
most p-1 search values

B-Tree Structures
Figure 17.10 B-tree structures (a) A node in a B-tree with q−1 search values (b) A
B-tree of order p=3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6
Demo
https://www.cs.usfca.edu/~galles/visualization/BTree.html

B+ -Trees
◼ Data pointers stored only at the leaf nodes
◼ Leaf nodes have an entry for every value of the
search field, and a data pointer to the record if
search field is a key field
◼ For a nonkey search field, the pointer points to a
block containing pointers to the data file records
◼ Internal nodes
◼ Some search field values from the leaf nodes
repeated to guide search

B+ -Trees (cont’d.)
Figure 17.11 The nodes of a B+-tree (a) Internal node of a B+-tree with q−1 search
values (b) Leaf node of a B+-tree with q−1 search values and q−1 data pointers

Searching for a Record With Search
Key Field Value K, Using a B+ -Tree
Algorithm 17.2 Searching for a record with search key field value K, using a B+ -Tree

Demo
◼ https://www.cs.usfca.edu/~galles/visualization/BP
lusTree.html

17.4 Indexes on Multiple Keys
◼ Multiple attributes involved in many retrieval and
update requests
◼ Composite keys
◼ Access structure using key value that combines
attributes
◼ Partitioned hashing
◼ Suitable for equality comparisons

Indexes on Multiple Keys (cont’d.)
◼ Grid files
◼ Array with one dimension for each search attribute
Figure 17.14 Example of a grid array on Dno and Age attributes

17.5 Other Types of Indexes
◼ Hash indexes
◼ Secondary structure for file access
◼ Uses hashing on a search key other than the one
used for the primary data file organization
◼ Index entries of form (K, Pr) or (K, P)
◼ Pr: pointer to the record containing the key
◼ P: pointer to the block containing the record for that
key

Hash Indexes (cont’d.)
Figure 17.15 Hash-based indexing

Bitmap Indexes
◼ Used with a large number of rows
◼ Creates an index for one or more columns
◼ Each value or value range in the column is
indexed
◼ Built on one particular value of a particular field
◼ Array of bits
◼ Existence bitmap
◼ Bitmaps for B+ -tree leaf nodes

Bitmap Indexes
EmpID Gender City
10 M Riyadh
20 F Riyadh
30 M Jeddah
40 F Riyadh
50 F Makkah
Gender Bitmap City Bitmap
EmpID M F EmpID Riyadh Jeddah Makkah
10 1 0 10 1 0 0
20 0 1 20 1 0 0
30 1 0 30 0 1 0
40 0 1 40 1 0 0
50 0 1 50 0 0 1

Bitmap Indexes

17.6 Some General Issues
Concerning Indexing
◼ Physical index
◼ Pointer specifies physical record address
◼ Disadvantage: pointer must be changed if record
is moved
◼ Logical index
◼ Used when physical record addresses expected to
change frequently
◼ Entries of the form (K, Kp)

Index Creation
◼ General form of the command to create an index
◼ Unique and cluster keywords optional

◼ Order can be ASC or DESC
◼ Secondary indexes can be created for any
primary record organization
◼ Complements other primary access methods

Tuning Indexes
◼ Tuning goals
◼ Dynamically evaluate requirements
◼ Reorganize indexes to yield best performance
◼ Reasons for revising initial index choice
◼ Certain queries may take too long to run due to
lack of an index
◼ Certain indexes may not get utilized
◼ Certain indexes may undergo too much updating if
based on an attribute that undergoes frequent
changes

Physical Database Design in
Relational Databases (cont’d.)
◼ Analyzing the expected frequency of invocation of
queries and transactions
◼ Expected frequency of using each attribute as a
selection or join attribute
◼ 80-20 rule: 80 percent of processing accounted for
by only 20 percent of queries and transactions
◼ Analyzing the time constraints of queries and
transactions
◼ Selection attributes associated with time
constraints are candidates for primary access
structures
17.8 Summary
◼ Indexes are access structures that improve
efficiency of record retrieval from a data file
◼ Ordered single-level index types
◼ Primary, clustering, and secondary
◼ Multilevel indexes can be implemented as B-trees
and B+ -trees
◼ Dynamic structures
◼ Multiple key access methods
◼ Logical and physical indexes

4 Chapter17 Index

Uploaded by

Copyright:

Available Formats

4 Chapter17 Index

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

4 Chapter17 Index

Uploaded by

Copyright:

Available Formats

Copyright © 2016 Ramez Elmasri and Shamkant B.

Indexing Structures for Files and

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 3

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 4

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 5

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 8

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 9

Figure 17.2 A clustering index on the Dept_number ordering

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 11

Figure 17.4 Dense

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-12

Table 17.2 Properties of index types

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 14

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-15

Suppose that we have an ordered file with r = 300,000

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-16

F- How many blocks for a primary index?

H- Binary search on secondary index?

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-17

Secondary Index Level2 1,099/273= 5 Blocks

Secondary Index Level1 300,000/273= 1,099 Blocks

Data File 300,000/40= 7500 Blocks

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-18

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 19

Figure 17.7 A tree data structure that shows an unbalanced tree

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-20

Figure 17.8 A node in a search tree with pointers to subtrees below it

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 21

Figure 17.9 A search tree of order p = 3

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 22

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 23

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-25

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 26

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-27

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 28

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 29

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 30

Figure 17.14 Example of a grid array on Dno and Age attributes

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 31

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 32

Figure 17.15 Hash-based indexing

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 34

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 35

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 36

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 37

◼ Unique and cluster keywords optional

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 38

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 39

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 41

You might also like