Nothing Special   »   [go: up one dir, main page]

4 Chapter17 Index

Download as pdf or txt
Download as pdf or txt
You are on page 1of 41

Copyright © 2016 Ramez Elmasri and Shamkant B.

Navathe
CHAPTER 17

Indexing Structures for Files and


Physical Database Design

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe


Introduction
◼ Indexes used to speed up record retrieval in
response to certain search conditions
◼ Index structures provide secondary access paths
◼ Any field can be used to create an index
◼ Multiple indexes can be constructed
◼ Most indexes based on ordered files
◼ Tree data structures organize the index

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 3


17.1 Types of Single-Level Ordered
Indexes
◼ Ordered index similar to index in a textbook
◼ Indexing field (attribute)
◼ Index stores each value of the index field with list
of pointers to all disk blocks that contain records
with that field value
◼ Values in index are ordered
◼ Primary index
◼ Specified on the ordering key field of ordered file
of records

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 4


Types of Single-Level Ordered
Indexes (cont’d.)
◼ Clustering index
◼ Used if numerous records can have the same
value for the ordering field
◼ Secondary index
◼ Can be specified on any nonordering field
◼ Data file can have several secondary indexes

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 5


Primary Indexes
◼ Ordered file with two fields
◼ Primary key, K(i)
◼ Pointer to a disk block, P(i)
◼ One index entry in the index file for each block in
the data file
◼ Indexes may be dense or sparse
◼ Dense index has an index entry for every search
key value in the data file
◼ Sparse index has entries for only some search
values
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 6
Primary Indexes (cont’d.)

Figure 17.1 Primary index on the ordering key field of the file shown in Figure 16.7
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-7
Primary Indexes (cont’d.)
◼ Major problem: insertion and deletion of records
◼ Move records around and change index values
◼ Solutions
◼ Use unordered overflow file
◼ Use linked list of overflow records

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 8


Clustering Indexes
◼ Clustering field
◼ File records are physically ordered on a nonkey
field without a distinct value for each record
◼ Ordered file with two fields
◼ Same type as clustering field
◼ Disk block pointer

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 9


Clustering Indexes (cont’d.)

Figure 17.2 A clustering index on the Dept_number ordering


nonkey field of an EMPLOYEE file
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-10
Secondary Indexes
◼ Provide secondary means of accessing a data file
◼ Some primary access exists
◼ Ordered file with two fields
◼ Indexing field, K(i)
◼ Block pointer or record pointer, P(i)
◼ Usually need more storage space and longer
search time than primary index
◼ Improved search time for arbitrary record

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 11


Secondary Indexes (cont’d.)

Figure 17.4 Dense


secondary index (with
block pointers) on a
nonordering key field
of a file.

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-12


Types of Single-Level Ordered
Indexes (cont’d.)

Table 17.1 Types of indexes based on the properties of the indexing field

Table 17.2 Properties of index types


Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-13
17.2 Multilevel Indexes
◼ Designed to greatly reduce remaining search
space as search is conducted
◼ Index file
◼ Considered first (or base level) of a multilevel
index
◼ Second level
◼ Primary index to the first level
◼ Third level
◼ Primary index to the second level

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 14


Figure 17.6 A two-level
primary index resembling
ISAM (indexed sequential
access method) organization

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-15


Example 1

Suppose that we have an ordered file with r = 300,000


records stored on a disk with block size B = 4,096 bytes
File records are of fixed size and are unspanned, with
record length R = 100 bytes.
A- What is the blocking factor?
B- What is the number of needed blocks?
C- What is the cost of linear search?
D- What is the cost of binary search on the file?

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-16


Example

Suppose we want to search for a record with a specific value with an index. The
index value V = 9 bytes long. Suppose a block pointer is P = 6 bytes.
E- How many index entries per block?

F- How many blocks for a primary index?

G. How many blocks will be accessed using the dense secondary index?

H- Binary search on secondary index?

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-17


Example
I- Suppose that the dense secondary index is converted into a
multilevel index.
How many blocks will be accessed?
Secondary Index Level3 5/273= 1 Block

Secondary Index Level2 1,099/273= 5 Blocks

Secondary Index Level1 300,000/273= 1,099 Blocks

Data File 300,000/40= 7500 Blocks

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-18


17.3 Dynamic Multilevel Indexes
Using B-Trees and B+ -Trees
◼ Tree data structure terminology
◼ Tree is formed of nodes
◼ Each node (except root) has one parent and zero
or more child nodes
◼ Leaf node has no child nodes
◼ Unbalanced if leaf nodes occur at different levels
◼ Nonleaf node called internal node
◼ Subtree of node consists of node and all
descendant nodes

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 19


Tree Data Structure

Figure 17.7 A tree data structure that shows an unbalanced tree

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-20


Search Trees and B-Trees
◼ Search tree used to guide search for a record
◼ Given value of one of record’s fields

Figure 17.8 A node in a search tree with pointers to subtrees below it

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 21


Search Trees and B-Trees (cont’d.)
◼ Algorithms necessary for inserting and deleting
search values into and from the tree

Figure 17.9 A search tree of order p = 3

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 22


B-Trees
◼ Provide multi-level access structure
◼ Tree is always balanced
◼ Space wasted by deletion never becomes
excessive
◼ Each node is at least half-full
◼ Each node in a B-tree of order p can have at
most p-1 search values

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 23


B-Tree Structures

Figure 17.10 B-tree structures (a) A node in a B-tree with q−1 search values (b) A
B-tree of order p=3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-24
Demo

https://www.cs.usfca.edu/~galles/visualization/BTree.html

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-25


B+ -Trees
◼ Data pointers stored only at the leaf nodes
◼ Leaf nodes have an entry for every value of the
search field, and a data pointer to the record if
search field is a key field
◼ For a nonkey search field, the pointer points to a
block containing pointers to the data file records
◼ Internal nodes
◼ Some search field values from the leaf nodes
repeated to guide search

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 26


B+ -Trees (cont’d.)

Figure 17.11 The nodes of a B+-tree (a) Internal node of a B+-tree with q−1 search
values (b) Leaf node of a B+-tree with q−1 search values and q−1 data pointers

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-27


Searching for a Record With Search
Key Field Value K, Using a B+ -Tree

Algorithm 17.2 Searching for a record with search key field value K, using a B+ -Tree

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 28


Demo
◼ https://www.cs.usfca.edu/~galles/visualization/BP
lusTree.html

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 29


17.4 Indexes on Multiple Keys
◼ Multiple attributes involved in many retrieval and
update requests
◼ Composite keys
◼ Access structure using key value that combines
attributes
◼ Partitioned hashing
◼ Suitable for equality comparisons

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 30


Indexes on Multiple Keys (cont’d.)
◼ Grid files
◼ Array with one dimension for each search attribute

Figure 17.14 Example of a grid array on Dno and Age attributes

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 31


17.5 Other Types of Indexes
◼ Hash indexes
◼ Secondary structure for file access
◼ Uses hashing on a search key other than the one
used for the primary data file organization
◼ Index entries of form (K, Pr) or (K, P)
◼ Pr: pointer to the record containing the key
◼ P: pointer to the block containing the record for that
key

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 32


Hash Indexes (cont’d.)

Figure 17.15 Hash-based indexing


Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17-33
Bitmap Indexes
◼ Used with a large number of rows
◼ Creates an index for one or more columns
◼ Each value or value range in the column is
indexed
◼ Built on one particular value of a particular field
◼ Array of bits
◼ Existence bitmap
◼ Bitmaps for B+ -tree leaf nodes

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 34


Bitmap Indexes
EmpID Gender City
10 M Riyadh
20 F Riyadh
30 M Jeddah
40 F Riyadh
50 F Makkah
Gender Bitmap City Bitmap
EmpID M F EmpID Riyadh Jeddah Makkah
10 1 0 10 1 0 0
20 0 1 20 1 0 0
30 1 0 30 0 1 0
40 0 1 40 1 0 0
50 0 1 50 0 0 1

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 35


Bitmap Indexes

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 16- 36


17.6 Some General Issues
Concerning Indexing
◼ Physical index
◼ Pointer specifies physical record address
◼ Disadvantage: pointer must be changed if record
is moved
◼ Logical index
◼ Used when physical record addresses expected to
change frequently
◼ Entries of the form (K, Kp)

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 37


Index Creation
◼ General form of the command to create an index

◼ Unique and cluster keywords optional


◼ Order can be ASC or DESC
◼ Secondary indexes can be created for any
primary record organization
◼ Complements other primary access methods

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 38


Tuning Indexes
◼ Tuning goals
◼ Dynamically evaluate requirements
◼ Reorganize indexes to yield best performance
◼ Reasons for revising initial index choice
◼ Certain queries may take too long to run due to
lack of an index
◼ Certain indexes may not get utilized
◼ Certain indexes may undergo too much updating if
based on an attribute that undergoes frequent
changes

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 39


Physical Database Design in
Relational Databases (cont’d.)
◼ Analyzing the expected frequency of invocation of
queries and transactions
◼ Expected frequency of using each attribute as a
selection or join attribute
◼ 80-20 rule: 80 percent of processing accounted for
by only 20 percent of queries and transactions
◼ Analyzing the time constraints of queries and
transactions
◼ Selection attributes associated with time
constraints are candidates for primary access
structures
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 40
17.8 Summary
◼ Indexes are access structures that improve
efficiency of record retrieval from a data file
◼ Ordered single-level index types
◼ Primary, clustering, and secondary
◼ Multilevel indexes can be implemented as B-trees
and B+ -trees
◼ Dynamic structures
◼ Multiple key access methods
◼ Logical and physical indexes

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe Slide 17- 41

You might also like