
Oracle® Database

SQL Tuning Guide

18c
E84296-06
August 2021
Oracle Database SQL Tuning Guide, 18c

E84296-06

Copyright © 2013, 2021, Oracle and/or its affiliates.

Primary Author: Lance Ashdown

Contributing Authors: Nigel Bayliss, Maria Colgan, Tom Kyte

Contributors: Hermann Baer, Bjorn Bolltoft, Ali Cakmak, Sunil Chakkappen, Immanuel Chan, Deba
Chatterjee, Chris Chiappa, Dinesh Das, Kurt Engeleiter, Leonidas Galanis, William Endress, Marcus Fallen,
Bruce Golbus, Katsumi Inoue, Praveen Kumar Tupati Jaganath, Mark Jefferys, Shantanu Joshi, Adam
Kociubes, Keith Laker, Allison Lee, Sue Lee, Cheng Li, David McDermid, Colin McGregor, Ajit Mylavarapu,
Ted Persky, Lei Sheng, Ekrem Soylemez, Hong Su, Murali Thiyagarajah, Randy Urbano, Sahil Vazirani,
Bharath Venkatakrishnan, Hailing Yu, John Zimmerman

Contributors: Frederick Kush

This software and related documentation are provided under a license agreement containing restrictions on
use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your
license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license,
transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse
engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is
prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If
you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on
behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software,
any programs embedded, installed or activated on delivered hardware, and modifications of such programs)
and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end
users are "commercial computer software" or "commercial computer software documentation" pursuant to the
applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use,
reproduction, duplication, release, display, disclosure, modification, preparation of derivative works, and/or
adaptation of i) Oracle programs (including any operating system, integrated software, any programs
embedded, installed or activated on delivered hardware, and modifications of such programs), ii) Oracle
computer documentation and/or iii) other Oracle data, is subject to the rights and limitations specified in the
license contained in the applicable contract. The terms governing the U.S. Government’s use of Oracle cloud
services are defined by the applicable contract for such services. No other rights are granted to the U.S.
Government.

This software or hardware is developed for general use in a variety of information management applications.
It is not developed or intended for use in any inherently dangerous applications, including applications that
may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you
shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its
safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this
software or hardware in dangerous applications.

Oracle, Java, and MySQL are registered trademarks of Oracle and/or its affiliates. Other names may be
trademarks of their respective owners.

Intel and Intel Inside are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are
used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Epyc,
and the AMD logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered
trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products,
and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly
disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise
set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be
responsible for any loss, costs, or damages incurred due to your access to or use of third-party content,
products, or services, except as set forth in an applicable agreement between you and Oracle.

Contents

Preface
Audience xxv
Documentation Accessibility xxv
Related Documents xxv
Conventions xxvi

Changes in This Release for Oracle Database SQL Tuning Guide


Changes in Oracle Database Release 18c, Version 18.1 xxvii
Changes in Oracle Database 12c Release 2 (12.2.0.1) xxviii
Changes in Oracle Database 12c Release 1 (12.1.0.2) xxxi
Changes in Oracle Database 12c Release 1 (12.1.0.1) xxxi

Part I SQL Performance Fundamentals

1 Introduction to SQL Tuning


1.1 About SQL Tuning 1-1
1.2 Purpose of SQL Tuning 1-1
1.3 Prerequisites for SQL Tuning 1-1
1.4 Tasks and Tools for SQL Tuning 1-2
1.4.1 SQL Tuning Tasks 1-2
1.4.2 SQL Tuning Tools 1-4
1.4.2.1 Automated SQL Tuning Tools 1-4
1.4.2.2 Manual SQL Tuning Tools 1-7
1.4.3 User Interfaces to SQL Tuning Tools 1-9

2 SQL Performance Methodology


2.1 Guidelines for Designing Your Application 2-1
2.1.1 Guideline for Data Modeling 2-1
2.1.2 Guideline for Writing Efficient Applications 2-1

2.2 Guidelines for Deploying Your Application 2-3
2.2.1 Guideline for Deploying in a Test Environment 2-3
2.2.2 Guidelines for Application Rollout 2-4

Part II Query Optimizer Fundamentals

3 SQL Processing
3.1 About SQL Processing 3-1
3.1.1 SQL Parsing 3-2
3.1.1.1 Syntax Check 3-2
3.1.1.2 Semantic Check 3-2
3.1.1.3 Shared Pool Check 3-3
3.1.2 SQL Optimization 3-5
3.1.3 SQL Row Source Generation 3-5
3.1.4 SQL Execution 3-7
3.2 How Oracle Database Processes DML 3-8
3.2.1 How Row Sets Are Fetched 3-8
3.2.2 Read Consistency 3-9
3.2.3 Data Changes 3-9
3.3 How Oracle Database Processes DDL 3-9

4 Query Optimizer Concepts


4.1 Introduction to the Query Optimizer 4-1
4.1.1 Purpose of the Query Optimizer 4-1
4.1.2 Cost-Based Optimization 4-1
4.1.3 Execution Plans 4-2
4.1.3.1 Query Blocks 4-2
4.1.3.2 Query Subplans 4-3
4.1.3.3 Analogy for the Optimizer 4-3
4.2 About Optimizer Components 4-4
4.2.1 Query Transformer 4-5
4.2.2 Estimator 4-5
4.2.2.1 Selectivity 4-7
4.2.2.2 Cardinality 4-8
4.2.2.3 Cost 4-8
4.2.3 Plan Generator 4-9
4.3 About Automatic Tuning Optimizer 4-11
4.4 About Adaptive Query Optimization 4-12
4.4.1 Adaptive Query Plans 4-12

4.4.1.1 About Adaptive Query Plans 4-13
4.4.1.2 Purpose of Adaptive Query Plans 4-13
4.4.1.3 How Adaptive Query Plans Work 4-14
4.4.1.4 When Adaptive Query Plans Are Enabled 4-20
4.4.2 Adaptive Statistics 4-21
4.4.2.1 Dynamic Statistics 4-21
4.4.2.2 Automatic Reoptimization 4-21
4.4.2.3 SQL Plan Directives 4-24
4.4.2.4 When Adaptive Statistics Are Enabled 4-24
4.5 About Approximate Query Processing 4-25
4.5.1 Approximate Query Initialization Parameters 4-26
4.5.2 Approximate Query SQL Functions 4-26
4.6 About SQL Plan Management 4-28
4.7 About the Expression Statistics Store (ESS) 4-29

5 Query Transformations
5.1 OR Expansion 5-1
5.2 View Merging 5-3
5.2.1 Query Blocks in View Merging 5-4
5.2.2 Simple View Merging 5-4
5.2.3 Complex View Merging 5-7
5.3 Predicate Pushing 5-9
5.4 Subquery Unnesting 5-11
5.5 Query Rewrite with Materialized Views 5-11
5.5.1 About Query Rewrite and the Optimizer 5-12
5.5.2 About Initialization Parameters for Query Rewrite 5-12
5.5.3 About the Accuracy of Query Rewrite 5-13
5.5.4 Example of Query Rewrite 5-14
5.6 Star Transformation 5-15
5.6.1 About Star Schemas 5-15
5.6.2 Purpose of Star Transformations 5-16
5.6.3 How Star Transformation Works 5-16
5.6.4 Controls for Star Transformation 5-16
5.6.5 Star Transformation: Scenario 5-17
5.6.6 Temporary Table Transformation: Scenario 5-20
5.7 In-Memory Aggregation (VECTOR GROUP BY) 5-22
5.8 Cursor-Duration Temporary Tables 5-22
5.8.1 Purpose of Cursor-Duration Temporary Tables 5-22
5.8.2 How Cursor-Duration Temporary Tables Work 5-23
5.8.3 Cursor-Duration Temporary Tables: Example 5-23

5.9 Table Expansion 5-25
5.9.1 Purpose of Table Expansion 5-25
5.9.2 How Table Expansion Works 5-25
5.9.3 Table Expansion: Scenario 5-26
5.9.4 Table Expansion and Star Transformation: Scenario 5-29
5.10 Join Factorization 5-31
5.10.1 Purpose of Join Factorization 5-31
5.10.2 How Join Factorization Works 5-31
5.10.3 Factorization and Join Orders: Scenario 5-32
5.10.4 Factorization of Outer Joins: Scenario 5-33

Part III Query Execution Plans

6 Generating and Displaying Execution Plans


6.1 Introduction to Execution Plans 6-1
6.2 About Plan Generation and Display 6-1
6.2.1 About the Plan Explanation 6-1
6.2.2 Why Execution Plans Change 6-2
6.2.2.1 Different Schemas 6-2
6.2.2.2 Different Costs 6-2
6.2.3 Guideline for Minimizing Throw-Away 6-3
6.2.4 Guidelines for Evaluating Execution Plans Using EXPLAIN PLAN 6-3
6.2.4.1 Guidelines for Evaluating Plans Using the V$SQL_PLAN Views 6-4
6.2.5 EXPLAIN PLAN Restrictions 6-4
6.2.6 Guidelines for Creating PLAN_TABLE 6-5
6.3 Generating Plan Output Using the EXPLAIN PLAN Statement 6-5
6.3.1 Executing EXPLAIN PLAN for a Single Statement 6-6
6.3.2 Executing EXPLAIN PLAN Using a Statement ID 6-7
6.3.3 Directing EXPLAIN PLAN Output to a Nondefault Table 6-7
6.4 Displaying PLAN_TABLE Output 6-7
6.4.1 Displaying an Execution Plan: Example 6-8
6.4.2 Customizing PLAN_TABLE Output 6-10

7 Reading Execution Plans


7.1 Reading Execution Plans: Basic 7-1
7.2 Reading Execution Plans: Advanced 7-2
7.2.1 Reading Adaptive Query Plans 7-2
7.2.2 Viewing Parallel Execution with EXPLAIN PLAN 7-6
7.2.2.1 About EXPLAIN PLAN and Parallel Queries 7-6

7.2.2.2 Viewing Parallel Queries with EXPLAIN PLAN: Example 7-7
7.2.3 Viewing Bitmap Indexes with EXPLAIN PLAN 7-8
7.2.4 Viewing Result Cache with EXPLAIN PLAN 7-9
7.2.5 Viewing Partitioned Objects with EXPLAIN PLAN 7-10
7.2.5.1 Displaying Range and Hash Partitioning with EXPLAIN PLAN: Examples 7-10
7.2.5.2 Pruning Information with Composite Partitioned Objects: Examples 7-12
7.2.5.3 Examples of Partial Partition-Wise Joins 7-14
7.2.5.4 Example of Full Partition-Wise Join 7-16
7.2.5.5 Examples of INLIST ITERATOR and EXPLAIN PLAN 7-17
7.2.5.6 Example of Domain Indexes and EXPLAIN PLAN 7-18
7.2.6 PLAN_TABLE Columns 7-18
7.3 Execution Plan Reference 7-29
7.3.1 Execution Plan Views 7-29
7.3.2 PLAN_TABLE Columns 7-30
7.3.3 DBMS_XPLAN Display Functions 7-39

Part IV SQL Operators: Access Paths and Joins

8 Optimizer Access Paths


8.1 Introduction to Access Paths 8-1
8.2 Table Access Paths 8-2
8.2.1 About Heap-Organized Table Access 8-2
8.2.1.1 Row Storage in Data Blocks and Segments: A Primer 8-2
8.2.1.2 Importance of Rowids for Row Access 8-3
8.2.1.3 Direct Path Reads 8-4
8.2.2 Full Table Scans 8-5
8.2.2.1 When the Optimizer Considers a Full Table Scan 8-5
8.2.2.2 How a Full Table Scan Works 8-6
8.2.2.3 Full Table Scan: Example 8-7
8.2.3 Table Access by Rowid 8-8
8.2.3.1 When the Optimizer Chooses Table Access by Rowid 8-8
8.2.3.2 How Table Access by Rowid Works 8-8
8.2.3.3 Table Access by Rowid: Example 8-8
8.2.4 Sample Table Scans 8-9
8.2.4.1 When the Optimizer Chooses a Sample Table Scan 8-9
8.2.4.2 Sample Table Scans: Example 8-10
8.2.5 In-Memory Table Scans 8-10
8.2.5.1 When the Optimizer Chooses an In-Memory Table Scan 8-11
8.2.5.2 In-Memory Query Controls 8-11
8.2.5.3 In-Memory Table Scans: Example 8-12

8.3 B-Tree Index Access Paths 8-12
8.3.1 About B-Tree Index Access 8-13
8.3.1.1 B-Tree Index Structure 8-13
8.3.1.2 How Index Storage Affects Index Scans 8-14
8.3.1.3 Unique and Nonunique Indexes 8-15
8.3.1.4 B-Tree Indexes and Nulls 8-15
8.3.2 Index Unique Scans 8-17
8.3.2.1 When the Optimizer Considers Index Unique Scans 8-17
8.3.2.2 How Index Unique Scans Work 8-18
8.3.2.3 Index Unique Scans: Example 8-19
8.3.3 Index Range Scans 8-20
8.3.3.1 When the Optimizer Considers Index Range Scans 8-20
8.3.3.2 How Index Range Scans Work 8-21
8.3.3.3 Index Range Scan: Example 8-22
8.3.3.4 Index Range Scan Descending: Example 8-23
8.3.4 Index Full Scans 8-24
8.3.4.1 When the Optimizer Considers Index Full Scans 8-24
8.3.4.2 How Index Full Scans Work 8-24
8.3.4.3 Index Full Scans: Example 8-25
8.3.5 Index Fast Full Scans 8-26
8.3.5.1 When the Optimizer Considers Index Fast Full Scans 8-26
8.3.5.2 How Index Fast Full Scans Work 8-26
8.3.5.3 Index Fast Full Scans: Example 8-26
8.3.6 Index Skip Scans 8-27
8.3.6.1 When the Optimizer Considers Index Skip Scans 8-27
8.3.6.2 How Index Skip Scans Work 8-27
8.3.6.3 Index Skip Scans: Example 8-28
8.3.7 Index Join Scans 8-29
8.3.7.1 When the Optimizer Considers Index Join Scans 8-29
8.3.7.2 How Index Join Scans Work 8-30
8.3.7.3 Index Join Scans: Example 8-30
8.4 Bitmap Index Access Paths 8-31
8.4.1 About Bitmap Index Access 8-31
8.4.1.1 Differences Between Bitmap and B-Tree Indexes 8-32
8.4.1.2 Purpose of Bitmap Indexes 8-33
8.4.1.3 Bitmaps and Rowids 8-34
8.4.1.4 Bitmap Join Indexes 8-35
8.4.1.5 Bitmap Storage 8-36
8.4.2 Bitmap Conversion to Rowid 8-36
8.4.2.1 When the Optimizer Chooses Bitmap Conversion to Rowid 8-37
8.4.2.2 How Bitmap Conversion to Rowid Works 8-37

8.4.2.3 Bitmap Conversion to Rowid: Example 8-37
8.4.3 Bitmap Index Single Value 8-37
8.4.3.1 When the Optimizer Considers Bitmap Index Single Value 8-38
8.4.3.2 How Bitmap Index Single Value Works 8-38
8.4.3.3 Bitmap Index Single Value: Example 8-38
8.4.4 Bitmap Index Range Scans 8-39
8.4.4.1 When the Optimizer Considers Bitmap Index Range Scans 8-39
8.4.4.2 How Bitmap Index Range Scans Work 8-39
8.4.4.3 Bitmap Index Range Scans: Example 8-40
8.4.5 Bitmap Merge 8-40
8.4.5.1 When the Optimizer Considers Bitmap Merge 8-41
8.4.5.2 How Bitmap Merge Works 8-41
8.4.5.3 Bitmap Merge: Example 8-41
8.5 Table Cluster Access Paths 8-42
8.5.1 Cluster Scans 8-42
8.5.1.1 When the Optimizer Considers Cluster Scans 8-42
8.5.1.2 How a Cluster Scan Works 8-43
8.5.1.3 Cluster Scans: Example 8-43
8.5.2 Hash Scans 8-44
8.5.2.1 When the Optimizer Considers a Hash Scan 8-44
8.5.2.2 How a Hash Scan Works 8-44
8.5.2.3 Hash Scans: Example 8-45

9 Joins
9.1 About Joins 9-1
9.1.1 Join Trees 9-1
9.1.2 How the Optimizer Executes Join Statements 9-3
9.1.3 How the Optimizer Chooses Execution Plans for Joins 9-4
9.2 Join Methods 9-5
9.2.1 Nested Loops Joins 9-6
9.2.1.1 When the Optimizer Considers Nested Loops Joins 9-6
9.2.1.2 How Nested Loops Joins Work 9-7
9.2.1.3 Nested Nested Loops 9-7
9.2.1.4 Current Implementation for Nested Loops Joins 9-10
9.2.1.5 Original Implementation for Nested Loops Joins 9-13
9.2.1.6 Nested Loops Controls 9-14
9.2.2 Hash Joins 9-16
9.2.2.1 When the Optimizer Considers Hash Joins 9-16
9.2.2.2 How Hash Joins Work 9-17
9.2.2.3 How Hash Joins Work When the Hash Table Does Not Fit in the PGA 9-19

9.2.2.4 Hash Join Controls 9-20
9.2.3 Sort Merge Joins 9-20
9.2.3.1 When the Optimizer Considers Sort Merge Joins 9-21
9.2.3.2 How Sort Merge Joins Work 9-21
9.2.3.3 Sort Merge Join Controls 9-25
9.3 Join Types 9-26
9.3.1 Inner Joins 9-26
9.3.1.1 Equijoins 9-26
9.3.1.2 Nonequijoins 9-26
9.3.1.3 Band Joins 9-27
9.3.2 Outer Joins 9-31
9.3.2.1 Nested Loops Outer Joins 9-31
9.3.2.2 Hash Join Outer Joins 9-31
9.3.2.3 Sort Merge Outer Joins 9-34
9.3.2.4 Full Outer Joins 9-34
9.3.2.5 Multiple Tables on the Left of an Outer Join 9-35
9.3.3 Semijoins 9-36
9.3.3.1 When the Optimizer Considers Semijoins 9-36
9.3.3.2 How Semijoins Work 9-36
9.3.4 Antijoins 9-38
9.3.4.1 When the Optimizer Considers Antijoins 9-38
9.3.4.2 How Antijoins Work 9-39
9.3.4.3 How Antijoins Handle Nulls 9-41
9.3.5 Cartesian Joins 9-44
9.3.5.1 When the Optimizer Considers Cartesian Joins 9-44
9.3.5.2 How Cartesian Joins Work 9-44
9.3.5.3 Cartesian Join Controls 9-45
9.4 Join Optimizations 9-46
9.4.1 Bloom Filters 9-46
9.4.1.1 Purpose of Bloom Filters 9-46
9.4.1.2 How Bloom Filters Work 9-47
9.4.1.3 Bloom Filter Controls 9-47
9.4.1.4 Bloom Filter Metadata 9-48
9.4.1.5 Bloom Filters: Scenario 9-48
9.4.2 Partition-Wise Joins 9-50
9.4.2.1 Purpose of Partition-Wise Joins 9-50
9.4.2.2 How Partition-Wise Joins Work 9-51
9.4.3 In-Memory Join Groups 9-53

Part V Optimizer Statistics

10 Optimizer Statistics Concepts


10.1 Introduction to Optimizer Statistics 10-1
10.2 About Optimizer Statistics Types 10-2
10.2.1 Table Statistics 10-2
10.2.1.1 Permanent Table Statistics 10-3
10.2.1.2 Temporary Table Statistics 10-3
10.2.2 Column Statistics 10-7
10.2.3 Index Statistics 10-8
10.2.3.1 Types of Index Statistics 10-8
10.2.3.2 Index Clustering Factor 10-9
10.2.3.3 Effect of Index Clustering Factor on Cost: Example 10-13
10.2.4 System Statistics 10-13
10.2.5 User-Defined Optimizer Statistics 10-14
10.3 How the Database Gathers Optimizer Statistics 10-14
10.3.1 DBMS_STATS Package 10-14
10.3.2 Supplemental Dynamic Statistics 10-15
10.3.3 Online Statistics Gathering for Bulk Loads 10-16
10.3.3.1 Purpose of Online Statistics Gathering for Bulk Loads 10-16
10.3.3.2 Global Statistics During Inserts into Empty Partitioned Tables 10-16
10.3.3.3 Index Statistics and Histograms During Bulk Loads 10-17
10.3.3.4 Restrictions for Online Statistics Gathering for Bulk Loads 10-17
10.3.3.5 Hints for Online Statistics Gathering for Bulk Loads 10-19
10.4 When the Database Gathers Optimizer Statistics 10-19
10.4.1 Sources for Optimizer Statistics 10-19
10.4.2 SQL Plan Directives 10-20
10.4.2.1 When the Database Creates SQL Plan Directives 10-20
10.4.2.2 How the Database Uses SQL Plan Directives 10-21
10.4.2.3 SQL Plan Directive Maintenance 10-22
10.4.2.4 How the Optimizer Uses SQL Plan Directives: Example 10-22
10.4.2.5 How the Optimizer Uses Extensions and SQL Plan Directives: Example 10-27
10.4.3 When the Database Samples Data 10-31
10.4.4 How the Database Samples Data 10-33

11 Histograms
11.1 Purpose of Histograms 11-1
11.2 When Oracle Database Creates Histograms 11-1
11.3 How Oracle Database Chooses the Histogram Type 11-3

11.4 Cardinality Algorithms When Using Histograms 11-4
11.4.1 Endpoint Numbers and Values 11-4
11.4.2 Popular and Nonpopular Values 11-4
11.4.3 Bucket Compression 11-5
11.5 Frequency Histograms 11-6
11.5.1 Criteria For Frequency Histograms 11-6
11.5.2 Generating a Frequency Histogram 11-7
11.6 Top Frequency Histograms 11-10
11.6.1 Criteria For Top Frequency Histograms 11-10
11.6.2 Generating a Top Frequency Histogram 11-11
11.7 Height-Balanced Histograms (Legacy) 11-14
11.7.1 Criteria for Height-Balanced Histograms 11-14
11.7.2 Generating a Height-Balanced Histogram 11-15
11.8 Hybrid Histograms 11-18
11.8.1 How Endpoint Repeat Counts Work 11-18
11.8.2 Criteria for Hybrid Histograms 11-20
11.8.3 Generating a Hybrid Histogram 11-21

12 Configuring Options for Optimizer Statistics Gathering


12.1 About Optimizer Statistics Collection 12-1
12.1.1 Purpose of Optimizer Statistics Collection 12-1
12.1.2 User Interfaces for Optimizer Statistics Management 12-1
12.1.2.1 Graphical Interface for Optimizer Statistics Management 12-1
12.1.2.2 Command-Line Interface for Optimizer Statistics Management 12-2
12.2 Setting Optimizer Statistics Preferences 12-3
12.2.1 About Optimizer Statistics Preferences 12-3
12.2.1.1 Purpose of Optimizer Statistics Preferences 12-3
12.2.1.2 Examples of Statistics Preferences 12-3
12.2.1.3 DBMS_STATS Procedures for Setting Statistics Preferences 12-4
12.2.1.4 Statistics Preference Overrides 12-5
12.2.1.5 Setting Statistics Preferences: Example 12-8
12.2.2 Setting Global Optimizer Statistics Preferences Using Cloud Control 12-9
12.2.3 Setting Object-Level Optimizer Statistics Preferences Using Cloud Control 12-9
12.2.4 Setting Optimizer Statistics Preferences from the Command Line 12-10
12.3 Configuring Options for Dynamic Statistics 12-12
12.3.1 About Dynamic Statistics Levels 12-12
12.3.2 Setting Dynamic Statistics Levels Manually 12-13
12.3.3 Disabling Dynamic Statistics 12-16
12.4 Managing SQL Plan Directives 12-16

13 Gathering Optimizer Statistics
13.1 Configuring Automatic Optimizer Statistics Collection 13-1
13.1.1 About Automatic Optimizer Statistics Collection 13-1
13.1.2 Configuring Automatic Optimizer Statistics Collection Using Cloud Control 13-2
13.1.3 Configuring Automatic Optimizer Statistics Collection from the Command Line 13-4
13.2 Gathering Optimizer Statistics Manually 13-5
13.2.1 About Manual Statistics Collection with DBMS_STATS 13-6
13.2.2 Guidelines for Gathering Optimizer Statistics Manually 13-7
13.2.2.1 Guideline for Setting the Sample Size 13-7
13.2.2.2 Guideline for Gathering Statistics in Parallel 13-8
13.2.2.3 Guideline for Partitioned Objects 13-8
13.2.2.4 Guideline for Frequently Changing Objects 13-9
13.2.2.5 Guideline for External Tables 13-9
13.2.3 Determining When Optimizer Statistics Are Stale 13-9
13.2.4 Gathering Schema and Table Statistics 13-11
13.2.5 Gathering Statistics for Fixed Objects 13-11
13.2.6 Gathering Statistics for Volatile Tables Using Dynamic Statistics 13-12
13.2.7 Gathering Optimizer Statistics Concurrently 13-14
13.2.7.1 About Concurrent Statistics Gathering 13-14
13.2.7.2 Enabling Concurrent Statistics Gathering 13-16
13.2.7.3 Monitoring Statistics Gathering Operations 13-19
13.2.8 Gathering Incremental Statistics on Partitioned Objects 13-21
13.2.8.1 Purpose of Incremental Statistics 13-21
13.2.8.2 How DBMS_STATS Derives Global Statistics for Partitioned Tables 13-22
13.2.8.3 Gathering Statistics for a Partitioned Table: Basic Steps 13-25
13.2.8.4 Maintaining Incremental Statistics for Partition Maintenance Operations 13-28
13.2.8.5 Maintaining Incremental Statistics for Tables with Stale or Locked Partition Statistics 13-30
13.3 Gathering System Statistics Manually 13-32
13.3.1 About System Statistics 13-32
13.3.2 Guidelines for Gathering System Statistics 13-34
13.3.3 Gathering System Statistics with DBMS_STATS 13-34
13.3.3.1 About the GATHER_SYSTEM_STATS Procedure 13-34
13.3.3.2 Gathering Workload Statistics 13-36
13.3.3.3 Gathering Noworkload Statistics 13-40
13.3.4 Deleting System Statistics 13-42
13.4 Running Statistics Gathering Functions in Reporting Mode 13-42

14 Managing Extended Statistics
14.1 Managing Column Group Statistics 14-1
14.1.1 About Statistics on Column Groups 14-2
14.1.1.1 Why Column Group Statistics Are Needed: Example 14-2
14.1.1.2 Automatic and Manual Column Group Statistics 14-4
14.1.1.3 User Interface for Column Group Statistics 14-5
14.1.2 Detecting Useful Column Groups for a Specific Workload 14-6
14.1.3 Creating Column Groups Detected During Workload Monitoring 14-9
14.1.4 Creating and Gathering Statistics on Column Groups Manually 14-11
14.1.5 Displaying Column Group Information 14-12
14.1.6 Dropping a Column Group 14-13
14.2 Managing Expression Statistics 14-14
14.2.1 About Expression Statistics 14-14
14.2.1.1 When Expression Statistics Are Useful: Example 14-15
14.2.2 Creating Expression Statistics 14-15
14.2.3 Displaying Expression Statistics 14-16
14.2.4 Dropping Expression Statistics 14-17

15 Controlling the Use of Optimizer Statistics


15.1 Locking and Unlocking Optimizer Statistics 15-1
15.1.1 Locking Statistics 15-1
15.1.2 Unlocking Statistics 15-2
15.2 Publishing Pending Optimizer Statistics 15-3
15.2.1 About Pending Optimizer Statistics 15-3
15.2.2 User Interfaces for Publishing Optimizer Statistics 15-5
15.2.3 Managing Published and Pending Statistics 15-6
15.3 Creating Artificial Optimizer Statistics for Testing 15-9
15.3.1 About Artificial Optimizer Statistics 15-9
15.3.2 Setting Artificial Optimizer Statistics for a Table 15-10
15.3.3 Setting Optimizer Statistics: Example 15-11

16 Managing Historical Optimizer Statistics


16.1 Restoring Optimizer Statistics 16-1
16.1.1 About Restore Operations for Optimizer Statistics 16-1
16.1.2 Guidelines for Restoring Optimizer Statistics 16-1
16.1.3 Restrictions for Restoring Optimizer Statistics 16-2
16.1.4 Restoring Optimizer Statistics Using DBMS_STATS 16-2
16.2 Managing Optimizer Statistics Retention 16-4
16.2.1 Obtaining Optimizer Statistics History 16-4

16.2.2 Changing the Optimizer Statistics Retention Period 16-5
16.2.3 Purging Optimizer Statistics 16-6
16.3 Reporting on Past Statistics Gathering Operations 16-7

17 Importing and Exporting Optimizer Statistics


17.1 About Transporting Optimizer Statistics 17-1
17.2 Transporting Optimizer Statistics to a Test Database: Tutorial 17-2

18 Analyzing Statistics Using Optimizer Statistics Advisor


18.1 About Optimizer Statistics Advisor 18-1
18.1.1 Purpose of Optimizer Statistics Advisor 18-2
18.1.1.1 Problems with a Traditional Script-Based Approach 18-3
18.1.1.2 Advantages of Optimizer Statistics Advisor 18-3
18.1.2 Optimizer Statistics Advisor Concepts 18-4
18.1.2.1 Components of Optimizer Statistics Advisor 18-4
18.1.2.2 Operational Modes for Optimizer Statistics Advisor 18-8
18.1.3 Command-Line Interface to Optimizer Statistics Advisor 18-9
18.2 Basic Tasks for Optimizer Statistics Advisor 18-10
18.2.1 Creating an Optimizer Statistics Advisor Task 18-13
18.2.2 Listing Optimizer Statistics Advisor Tasks 18-14
18.2.3 Creating Filters for an Optimizer Advisor Task 18-14
18.2.3.1 About Filters for Optimizer Statistics Advisor 18-15
18.2.3.2 Creating an Object Filter for an Optimizer Advisor Task 18-15
18.2.3.3 Creating a Rule Filter for an Optimizer Advisor Task 18-18
18.2.3.4 Creating an Operation Filter for an Optimizer Advisor Task 18-21
18.2.4 Executing an Optimizer Statistics Advisor Task 18-24
18.2.5 Generating a Report for an Optimizer Statistics Advisor Task 18-26
18.2.6 Implementing Optimizer Statistics Advisor Recommendations 18-29
18.2.6.1 Implementing Actions Recommended by Optimizer Statistics Advisor 18-29
18.2.6.2 Generating a Script Using Optimizer Statistics Advisor 18-32

Part VI Optimizer Controls

19 Influencing the Optimizer


19.1 Techniques for Influencing the Optimizer 19-1
19.2 Influencing the Optimizer with Initialization Parameters 19-2
19.2.1 About Optimizer Initialization Parameters 19-3
19.2.2 Enabling Optimizer Features 19-7

19.2.3 Choosing an Optimizer Goal 19-8
19.2.4 Controlling Adaptive Optimization 19-9
19.3 Influencing the Optimizer with Hints 19-11
19.3.1 About Optimizer Hints 19-11
19.3.1.1 Types of Hints 19-12
19.3.1.2 Scope of Hints 19-13
19.3.1.3 Guidelines for Hints 19-14
19.3.2 Guidelines for Join Order Hints 19-15

20 Improving Real-World Performance Through Cursor Sharing


20.1 Overview of Cursor Sharing 20-1
20.1.1 About Cursors 20-1
20.1.1.1 Private and Shared SQL Areas 20-2
20.1.1.2 Parent and Child Cursors 20-3
20.1.2 About Cursors and Parsing 20-7
20.1.3 About Literals and Bind Variables 20-11
20.1.3.1 Literals and Cursors 20-11
20.1.3.2 Bind Variables and Cursors 20-12
20.1.3.3 Bind Variable Peeking 20-13
20.1.4 About the Life Cycle of Shared Cursors 20-16
20.1.4.1 Cursor Marked Invalid 20-16
20.1.4.2 Cursors Marked Rolling Invalid 20-18
20.2 CURSOR_SHARING and Bind Variable Substitution 20-20
20.2.1 CURSOR_SHARING Initialization Parameter 20-20
20.2.2 Parsing Behavior When CURSOR_SHARING = FORCE 20-21
20.3 Adaptive Cursor Sharing 20-23
20.3.1 Purpose of Adaptive Cursor Sharing 20-23
20.3.2 How Adaptive Cursor Sharing Works: Example 20-24
20.3.3 Bind-Sensitive Cursors 20-25
20.3.4 Bind-Aware Cursors 20-29
20.3.5 Cursor Merging 20-33
20.3.6 Adaptive Cursor Sharing Views 20-33
20.4 Real-World Performance Guidelines for Cursor Sharing 20-34
20.4.1 Develop Applications with Bind Variables for Security and Performance 20-34
20.4.2 Do Not Use CURSOR_SHARING = FORCE as a Permanent Fix 20-35
20.4.3 Establish Coding Conventions to Increase Cursor Reuse 20-36
20.4.4 Minimize Session-Level Changes to the Optimizer Environment 20-38

Part VII Monitoring and Tracing SQL

21 Monitoring Database Operations


21.1 About Monitoring Database Operations 21-1
21.1.1 Purpose of Monitoring Database Operations 21-1
21.1.2 Database Operation Monitoring Concepts 21-2
21.1.2.1 About the Architecture of Real-Time SQL Monitoring 21-2
21.1.2.2 When the Database Monitors Operations 21-4
21.1.2.3 Attributes of Composite Database Operations 21-5
21.1.3 User Interfaces for Database Operations Monitoring 21-6
21.1.3.1 Monitored SQL Executions Page in Cloud Control 21-6
21.1.3.2 DBMS_SQL_MONITOR Package 21-7
21.1.3.3 Views for Monitoring and Reporting on Database Operations 21-8
21.1.4 Basic Tasks in Database Operations Monitoring 21-10
21.2 Enabling and Disabling Monitoring of Database Operations 21-10
21.2.1 Enabling Monitoring of Database Operations at the System Level 21-10
21.2.2 Enabling and Disabling Monitoring of Database Operations at the Statement Level 21-11
21.3 Defining a Composite Database Operation 21-12
21.4 Monitoring SQL Executions Using Cloud Control 21-15
21.5 Monitoring Database Operations: Scenarios 21-19
21.5.1 Reporting on a Simple Operation: Scenario 21-19
21.5.2 Reporting on a Composite Operation: Scenario 21-22

22 Gathering Diagnostic Data with SQL Test Case Builder


22.1 Purpose of SQL Test Case Builder 22-1
22.2 Concepts for SQL Test Case Builder 22-1
22.2.1 SQL Incidents 22-1
22.2.2 What SQL Test Case Builder Captures 22-2
22.2.3 Output of SQL Test Case Builder 22-3
22.3 User Interfaces for SQL Test Case Builder 22-4
22.3.1 Graphical Interface for SQL Test Case Builder 22-4
22.3.1.1 Accessing the Incident Manager 22-4
22.3.1.2 Accessing the Support Workbench 22-5
22.3.2 Command-Line Interface for SQL Test Case Builder 22-6
22.4 Running SQL Test Case Builder 22-6

23 Performing Application Tracing
23.1 Overview of End-to-End Application Tracing 23-1
23.1.1 Purpose of End-to-End Application Tracing 23-1
23.1.2 End-to-End Application Tracing in a Multitenant Environment 23-2
23.1.3 Tools for End-to-End Application Tracing 23-3
23.1.3.1 Overview of the SQL Trace Facility 23-3
23.1.3.2 Overview of TKPROF 23-4
23.2 Enabling Statistics Gathering for End-to-End Tracing 23-4
23.2.1 Enabling Statistics Gathering for a Client ID 23-4
23.2.2 Enabling Statistics Gathering for Services, Modules, and Actions 23-5
23.3 Enabling End-to-End Application Tracing 23-6
23.3.1 Enabling Tracing for a Client Identifier 23-6
23.3.2 Enabling Tracing for a Service, Module, and Action 23-7
23.3.3 Enabling Tracing for a Session 23-8
23.3.4 Enabling Tracing for an Instance or Database 23-9
23.4 Generating Output Files Using SQL Trace and TKPROF 23-10
23.4.1 Step 1: Setting Initialization Parameters for Trace File Management 23-11
23.4.2 Step 2: Enabling the SQL Trace Facility 23-12
23.4.3 Step 3: Generating Output Files with TKPROF 23-13
23.4.4 Step 4: Storing SQL Trace Facility Statistics 23-14
23.4.4.1 Generating the TKPROF Output SQL Script 23-14
23.4.4.2 Editing the TKPROF Output SQL Script 23-15
23.4.4.3 Querying the Output Table 23-15
23.5 Guidelines for Interpreting TKPROF Output 23-17
23.5.1 Guideline for Interpreting the Resolution of Statistics 23-17
23.5.2 Guideline for Recursive SQL Statements 23-17
23.5.3 Guideline for Deciding Which Statements to Tune 23-17
23.5.4 Guidelines for Avoiding Traps in TKPROF Interpretation 23-18
23.5.4.1 Guideline for Avoiding the Argument Trap 23-18
23.5.4.2 Guideline for Avoiding the Read Consistency Trap 23-19
23.5.4.3 Guideline for Avoiding the Schema Trap 23-19
23.5.4.4 Guideline for Avoiding the Time Trap 23-20
23.6.1 Application Tracing Utilities 23-21
23.6.1.1 TRCSESS 23-21
23.6.1.1.1 Purpose 23-21
23.6.1.1.2 Guidelines 23-21
23.6.1.1.3 Syntax 23-21
23.6.1.1.4 Options 23-22
23.6.1.1.5 Examples 23-22
23.6.1.2 TKPROF 23-23

23.6.1.2.1 Purpose 23-23
23.6.1.2.2 Guidelines 23-23
23.6.1.2.3 Syntax 23-24
23.6.1.2.4 Options 23-24
23.6.1.2.5 Output 23-26
23.6.1.2.6 Examples 23-29
23.7.1 Views for Application Tracing 23-34
23.7.1.1 Views Relevant for Trace Statistics 23-34
23.7.1.2 Views Related to Enabling Tracing 23-34

Part VIII Automatic SQL Tuning

24 Managing SQL Tuning Sets


24.1 About SQL Tuning Sets 24-1
24.1.1 Purpose of SQL Tuning Sets 24-1
24.1.2 Concepts for SQL Tuning Sets 24-2
24.1.3 User Interfaces for SQL Tuning Sets 24-3
24.1.3.1 Accessing the SQL Tuning Sets Page in Cloud Control 24-3
24.1.3.2 Command-Line Interface to SQL Tuning Sets 24-4
24.1.4 Basic Tasks for Managing SQL Tuning Sets 24-4
24.2 Creating a SQL Tuning Set Using CREATE_SQLSET 24-6
24.3 Loading a SQL Tuning Set Using LOAD_SQLSET 24-7
24.4 Querying a SQL Tuning Set 24-9
24.5 Modifying a SQL Tuning Set Using UPDATE_SQLSET 24-11
24.6 Transporting a SQL Tuning Set 24-13
24.6.1 About Transporting SQL Tuning Sets 24-13
24.6.1.1 Basic Steps for Transporting SQL Tuning Sets 24-13
24.6.1.2 Basic Steps for Transporting SQL Tuning Sets When the CON_DBID
Values Differ 24-14
24.6.2 Transporting SQL Tuning Sets with DBMS_SQLTUNE 24-15
24.7 Dropping a SQL Tuning Set Using DROP_SQLSET 24-17

25 Analyzing SQL with SQL Tuning Advisor


25.1 About SQL Tuning Advisor 25-1
25.1.1 Purpose of SQL Tuning Advisor 25-1
25.1.2 SQL Tuning Advisor Architecture 25-2
25.1.2.1 Input to SQL Tuning Advisor 25-3
25.1.2.2 Output of SQL Tuning Advisor 25-4
25.1.2.3 Automatic Tuning Optimizer Analyses 25-5

25.1.3 SQL Tuning Advisor Operation 25-15
25.1.3.1 Automatic and On-Demand SQL Tuning 25-15
25.1.3.2 Local and Remote SQL Tuning 25-16
25.2 Managing the Automatic SQL Tuning Task 25-17
25.2.1 About the Automatic SQL Tuning Task 25-18
25.2.1.1 Purpose of Automatic SQL Tuning 25-18
25.2.1.2 Automatic SQL Tuning Concepts 25-18
25.2.1.3 Command-Line Interface to SQL Tuning Advisor 25-19
25.2.1.4 Basic Tasks for Automatic SQL Tuning 25-19
25.2.2 Enabling and Disabling the Automatic SQL Tuning Task 25-20
25.2.2.1 Enabling and Disabling the Automatic SQL Tuning Task Using Cloud
Control 25-21
25.2.2.2 Enabling and Disabling the Automatic SQL Tuning Task from the
Command Line 25-22
25.2.3 Configuring the Automatic SQL Tuning Task 25-23
25.2.3.1 Configuring the Automatic SQL Tuning Task Using Cloud Control 25-24
25.2.3.2 Configuring the Automatic SQL Tuning Task Using the Command Line 25-24
25.2.4 Viewing Automatic SQL Tuning Reports 25-26
25.2.4.1 Viewing Automatic SQL Tuning Reports Using the Command Line 25-27
25.3 Running SQL Tuning Advisor On Demand 25-30
25.3.1 About On-Demand SQL Tuning 25-30
25.3.1.1 Purpose of On-Demand SQL Tuning 25-30
25.3.1.2 User Interfaces for On-Demand SQL Tuning 25-30
25.3.1.3 Basic Tasks in On-Demand SQL Tuning 25-32
25.3.2 Creating a SQL Tuning Task 25-33
25.3.3 Configuring a SQL Tuning Task 25-35
25.3.4 Executing a SQL Tuning Task 25-37
25.3.5 Monitoring a SQL Tuning Task 25-38
25.3.6 Displaying the Results of a SQL Tuning Task 25-39

26 Optimizing Access Paths with SQL Access Advisor


26.1 About SQL Access Advisor 26-1
26.1.1 Purpose of SQL Access Advisor 26-1
26.1.2 SQL Access Advisor Architecture 26-2
26.1.2.1 Input to SQL Access Advisor 26-3
26.1.2.2 Filter Options for SQL Access Advisor 26-3
26.1.2.3 SQL Access Advisor Recommendations 26-4
26.1.2.4 SQL Access Advisor Actions 26-5
26.1.2.5 SQL Access Advisor Repository 26-7
26.1.3 User Interfaces for SQL Access Advisor 26-7

26.1.3.1 Accessing the SQL Access Advisor: Initial Options Page Using Cloud
Control 26-7
26.1.3.2 Command-Line Interface to SQL Tuning Sets 26-8
26.2 Using SQL Access Advisor: Basic Tasks 26-8
26.2.1 Creating a SQL Tuning Set as Input for SQL Access Advisor 26-10
26.2.2 Populating a SQL Tuning Set with a User-Defined Workload 26-11
26.2.3 Creating and Configuring a SQL Access Advisor Task 26-14
26.2.4 Executing a SQL Access Advisor Task 26-15
26.2.5 Viewing SQL Access Advisor Task Results 26-16
26.2.6 Generating and Executing a Task Script 26-21
26.3 Performing a SQL Access Advisor Quick Tune 26-22
26.4 Using SQL Access Advisor: Advanced Tasks 26-23
26.4.1 Evaluating Existing Access Structures 26-23
26.4.2 Updating SQL Access Advisor Task Attributes 26-24
26.4.3 Creating and Using SQL Access Advisor Task Templates 26-25
26.4.4 Terminating SQL Access Advisor Task Execution 26-27
26.4.4.1 Interrupting SQL Access Advisor Tasks 26-27
26.4.4.2 Canceling SQL Access Advisor Tasks 26-28
26.4.5 Deleting SQL Access Advisor Tasks 26-29
26.4.6 Marking SQL Access Advisor Recommendations 26-30
26.4.7 Modifying SQL Access Advisor Recommendations 26-31
26.5 SQL Access Advisor Examples 26-32
26.6 SQL Access Advisor Reference 26-32
26.6.1 Action Attributes in the DBA_ADVISOR_ACTIONS View 26-32
26.6.2 Categories for SQL Access Advisor Task Parameters 26-34
26.6.3 SQL Access Advisor Constants 26-35

Part IX SQL Management Objects

27 Managing SQL Profiles


27.1 About SQL Profiles 27-1
27.1.1 Purpose of SQL Profiles 27-1
27.1.2 Concepts for SQL Profiles 27-2
27.1.2.1 Statistics in SQL Profiles 27-2
27.1.2.2 SQL Profiles and Execution Plans 27-2
27.1.2.3 SQL Profile Recommendations 27-3
27.1.2.4 SQL Profiles and SQL Plan Baselines 27-6
27.1.3 User Interfaces for SQL Profiles 27-6
27.1.4 Basic Tasks for SQL Profiles 27-6
27.2 Implementing a SQL Profile 27-7

27.2.1 About SQL Profile Implementation 27-8
27.2.2 Implementing a SQL Profile 27-9
27.3 Listing SQL Profiles 27-9
27.4 Altering a SQL Profile 27-10
27.5 Dropping a SQL Profile 27-11
27.6 Transporting a SQL Profile 27-12

28 Overview of SQL Plan Management


28.1 Purpose of SQL Plan Management 28-1
28.1.1 Benefits of SQL Plan Management 28-1
28.1.2 Differences Between SQL Plan Baselines and SQL Profiles 28-2
28.2 Plan Capture 28-3
28.2.1 Automatic Initial Plan Capture 28-3
28.2.1.1 Eligibility for Automatic Initial Plan Capture 28-4
28.2.1.2 Plan Matching for Automatic Initial Plan Capture 28-5
28.2.2 Manual Plan Capture 28-5
28.3 Plan Selection 28-7
28.4 Plan Evolution 28-8
28.4.1 Purpose of Plan Evolution 28-9
28.4.2 PL/SQL Subprograms for Plan Evolution 28-9
28.5 Storage Architecture for SQL Plan Management 28-10
28.5.1 SQL Management Base 28-10
28.5.2 SQL Statement Log 28-11
28.5.3 SQL Plan History 28-13
28.5.3.1 Enabled Plans 28-13
28.5.3.2 Accepted Plans 28-13
28.5.3.3 Fixed Plans 28-13

29 Managing SQL Plan Baselines


29.1 About Managing SQL Plan Baselines 29-1
29.1.1 User Interfaces for SQL Plan Management 29-1
29.1.1.1 Accessing the SQL Plan Baseline Page in Cloud Control 29-1
29.1.1.2 DBMS_SPM Package 29-2
29.1.2 Basic Tasks in SQL Plan Management 29-3
29.2 Configuring SQL Plan Management 29-4
29.2.1 Configuring the Capture and Use of SQL Plan Baselines 29-4
29.2.1.1 Enabling Automatic Initial Plan Capture for SQL Plan Management 29-5
29.2.1.2 Configuring Filters for Automatic Plan Capture 29-6
29.2.1.3 Disabling All SQL Plan Baselines 29-8

29.2.2 Managing the SPM Evolve Advisor Task 29-9
29.2.2.1 About the SPM Evolve Advisor Task 29-9
29.2.2.2 Enabling and Disabling the Automatic SPM Evolve Advisor Task 29-9
29.2.2.3 Configuring the Automatic SPM Evolve Advisor Task 29-10
29.3 Displaying Plans in a SQL Plan Baseline 29-13
29.4 Loading SQL Plan Baselines 29-14
29.4.1 About Loading SQL Plan Baselines 29-15
29.4.2 Loading Plans from AWR 29-16
29.4.3 Loading Plans from the Shared SQL Area 29-19
29.4.4 Loading Plans from a SQL Tuning Set 29-21
29.4.5 Loading Plans from a Staging Table 29-23
29.5 Evolving SQL Plan Baselines Manually 29-26
29.5.1 About the DBMS_SPM Evolve Functions 29-26
29.5.2 Managing an Evolve Task 29-28
29.6 Dropping SQL Plan Baselines 29-37
29.7 Managing the SQL Management Base 29-38
29.7.1 About Managing the SMB 29-38
29.7.2 Changing the Disk Space Limit for the SMB 29-39
29.7.3 Changing the Plan Retention Policy in the SMB 29-41

30 Migrating Stored Outlines to SQL Plan Baselines


30.1 About Stored Outline Migration 30-1
30.1.1 Purpose of Stored Outline Migration 30-1
30.1.2 How Stored Outline Migration Works 30-2
30.1.2.1 Stages of Stored Outline Migration 30-2
30.1.2.2 Outline Categories and Baseline Modules 30-3
30.1.3 User Interface for Stored Outline Migration 30-5
30.1.4 Basic Steps in Stored Outline Migration 30-6
30.2 Preparing for Stored Outline Migration 30-7
30.3 Migrating Outlines to Utilize SQL Plan Management Features 30-8
30.4 Migrating Outlines to Preserve Stored Outline Behavior 30-9
30.5 Performing Follow-Up Tasks After Stored Outline Migration 30-10

A Guidelines for Indexes and Table Clusters


A.1 Guidelines for Tuning Index Performance A-1
A.1.1 Guidelines for Tuning the Logical Structure A-1
A.1.2 Guidelines for Using SQL Access Advisor A-2
A.1.3 Guidelines for Choosing Columns and Expressions to Index A-2
A.1.4 Guidelines for Choosing Composite Indexes A-3

A.1.4.1 Guidelines for Choosing Keys for Composite Indexes A-4
A.1.4.2 Guidelines for Ordering Keys for Composite Indexes A-4
A.1.5 Guidelines for Writing SQL Statements That Use Indexes A-5
A.1.6 Guidelines for Writing SQL Statements That Avoid Using Indexes A-5
A.1.7 Guidelines for Avoiding Index Serialization on a Sequence-Generated Key A-5
A.1.8 Guidelines for Re-Creating Indexes A-7
A.1.9 Guidelines for Compacting Indexes A-7
A.1.10 Guidelines for Using Nonunique Indexes to Enforce Uniqueness A-8
A.1.11 Guidelines for Using Enabled Novalidated Constraints A-8
A.2 Guidelines for Using Function-Based Indexes for Performance A-9
A.3 Guidelines for Using Partitioned Indexes for Performance A-10
A.4 Guidelines for Using Index-Organized Tables for Performance A-11
A.5 Guidelines for Using Bitmap Indexes for Performance A-12
A.6 Guidelines for Using Bitmap Join Indexes for Performance A-12
A.7 Guidelines for Using Domain Indexes for Performance A-13
A.8 Guidelines for Using Table Clusters A-13
A.9 Guidelines for Using Hash Clusters for Performance A-14

Glossary

Index

Preface
This manual explains how to tune Oracle SQL.
This preface contains the following topics:

Audience
This document is intended for database administrators and application developers who
perform the following tasks:
• Generating and interpreting SQL execution plans
• Managing optimizer statistics
• Influencing the optimizer through initialization parameters or SQL hints
• Controlling cursor sharing for SQL statements
• Monitoring SQL execution
• Performing application tracing
• Managing SQL tuning sets
• Using SQL Tuning Advisor or SQL Access Advisor
• Managing SQL profiles
• Managing SQL baselines

Documentation Accessibility
For information about Oracle's commitment to accessibility, visit the Oracle Accessibility
Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.

Access to Oracle Support


Oracle customers that have purchased support have access to electronic support through My
Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info
or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

Related Documents
This manual assumes that you are familiar with Oracle Database Concepts. The following
books are frequently referenced:
• Oracle Database Data Warehousing Guide
• Oracle Database VLDB and Partitioning Guide


• Oracle Database SQL Language Reference


• Oracle Database Reference
Many examples in this book use the sample schemas, which are installed by default
when you select the Basic Installation option with an Oracle Database. See Oracle
Database Sample Schemas for information on how these schemas were created and
how you can use them.

Conventions
The following text conventions are used in this document:

Convention Meaning
boldface Boldface type indicates graphical user interface elements associated
with an action, or terms defined in text or the glossary.
italic Italic type indicates book titles, emphasis, or placeholder variables for
which you supply particular values.
monospace Monospace type indicates commands within a paragraph, URLs, code
in examples, text that appears on the screen, or text that you enter.

Changes in This Release for Oracle Database
SQL Tuning Guide
This preface describes the most important changes in Oracle Database SQL Tuning Guide.
This preface contains the following topics:

Changes in Oracle Database Release 18c, Version 18.1


Oracle Database SQL Tuning Guide for Oracle Database release 18c, version 18.1 has the
following changes.

New Features
The following features are new in this release:
• Private temporary tables
Private temporary tables are temporary database objects that are automatically dropped
at the end of a transaction or a session. A private temporary table is stored in memory
and is visible only to the session that created it. A private temporary table confines the
scope of a temporary table to a session or a transaction, thus providing more flexibility in
application coding, leading to easier code maintenance and better ready-to-use
functionality.
See "Statistics for Global Temporary Tables".
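As a brief sketch (the table name is illustrative, and the default ORA$PTT_ name prefix is assumed), a transaction-scoped private temporary table might be created as follows:

```sql
-- The ORA$PTT_ prefix is required by default (controlled by the
-- PRIVATE_TEMP_TABLE_PREFIX initialization parameter).
CREATE PRIVATE TEMPORARY TABLE ora$ptt_sales_tmp (
  sale_id  NUMBER,
  amount   NUMBER
) ON COMMIT DROP DEFINITION;  -- definition dropped at end of transaction
```

Specifying ON COMMIT PRESERVE DEFINITION instead keeps the table until the session ends.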
• Approximate Top-N Query Processing
To obtain “top n” query results much faster than traditional queries, use the APPROX_SUM
and APPROX_COUNT SQL functions with APPROX_RANK.
See "About Approximate Query Processing".
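For example, a query of this general shape (shown against a hypothetical sales table) returns the approximate top two products by row count in each region:

```sql
-- Sketch only: table and column names are illustrative.
SELECT region, product, APPROX_COUNT(*)
FROM   sales
GROUP  BY region, product
HAVING APPROX_RANK(PARTITION BY region
                   ORDER BY APPROX_COUNT(*) DESC) <= 2;
```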
• SQL Tuning Advisor enhancements for Oracle Exadata Database Machine
SQL Tuning Advisor can recommend an Exadata-aware SQL profile. On Oracle Exadata
Database Machine, the cost of smart scans depends on the system statistics I/O seek
time (ioseektim), multiblock read count (mbrc), and I/O transfer speed (iotfrspeed). The
values of these statistics usually differ on Exadata and can thus influence the choice of
plan. If system statistics are stale, and if gathering them improves performance, then
SQL Tuning Advisor recommends accepting an Exadata-aware SQL profile.
See "Statistical Analysis" and "Statistics in SQL Profiles".
• New package for managing SQL tuning sets
You can use DBMS_SQLSET instead of DBMS_SQLTUNE to create, modify, drop, and perform
all other SQL tuning set operations.
See "Command-Line Interface to SQL Tuning Sets".
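As a minimal sketch (the set name and description are illustrative), the new package mirrors the familiar DBMS_SQLTUNE calls:

```sql
-- Create a SQL tuning set with DBMS_SQLSET instead of DBMS_SQLTUNE.
BEGIN
  DBMS_SQLSET.CREATE_SQLSET(
    sqlset_name => 'my_sts',
    description => 'Example tuning set');
END;
/
-- Drop it when no longer needed.
EXEC DBMS_SQLSET.DROP_SQLSET(sqlset_name => 'my_sts');
```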


• Scalable sequences
Scalable sequences alleviate index leaf block contention when loading data into
tables that use sequence values as keys.
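A scalable sequence is declared with the SCALE clause; for example (the sequence name is illustrative):

```sql
-- SCALE prefixes each value with an instance/session offset so that
-- concurrent sessions insert into different index leaf blocks;
-- EXTEND widens the value to accommodate the prefix.
CREATE SEQUENCE order_id_seq SCALE EXTEND;
```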
• Decoupling OPTIMIZER_ADAPTIVE_STATISTICS from performance feedback
Unlike in previous releases, setting the OPTIMIZER_ADAPTIVE_STATISTICS
initialization parameter to TRUE or FALSE now has no effect on performance
feedback.

Changes in Oracle Database 12c Release 2 (12.2.0.1)


Oracle Database SQL Tuning Guide for Oracle Database 12c Release 2 (12.2.0.1) has
the following changes.

New Features
The following features are new in this release:
• Advisor enhancements
– Optimizer Statistics Advisor
Optimizer Statistics Advisor is built-in diagnostic software that analyzes the
quality of statistics and statistics-related tasks. The advisor task runs
automatically in the maintenance window, but you can also run it on demand.
You can then view the advisor report. If the advisor makes recommendations,
then in some cases you can run system-generated scripts to implement them.
See "Analyzing Statistics Using Optimizer Statistics Advisor".
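An on-demand run might look like the following sketch, where the task name is arbitrary:

```sql
-- Create and execute an Optimizer Statistics Advisor task.
DECLARE
  tname VARCHAR2(128) := 'my_stats_task';
  v     VARCHAR2(32767);
BEGIN
  v := DBMS_STATS.CREATE_ADVISOR_TASK(tname);
  v := DBMS_STATS.EXECUTE_ADVISOR_TASK(tname);
END;
/
-- View the report of findings and recommendations.
SELECT DBMS_STATS.REPORT_ADVISOR_TASK('my_stats_task') FROM DUAL;
```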
– Active Data Guard Support for SQL Tuning Advisor
Using database links, you can tune a standby database workload on a primary
database.
See "Local and Remote SQL Tuning".
• DBMS_STATS enhancements
– DBMS_STATS preference for automatic column group statistics
If the DBMS_STATS preference AUTO_STAT_EXTENSIONS is set to ON (by default it
is OFF), then a SQL plan directive can automatically trigger the creation of
column group statistics based on usage of predicates in the workload.
See "Purpose of Optimizer Statistics Preferences".
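Enabling the preference database-wide is a one-line change:

```sql
-- Allow SQL plan directives to trigger automatic creation of
-- column group statistics.
EXEC DBMS_STATS.SET_GLOBAL_PREFS('AUTO_STAT_EXTENSIONS', 'ON');
```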
– DBMS_STATS support for external table scan rates and In-Memory column store
(IM column store) statistics
If the database uses an IM column store, then you can set the im_imcu_count
parameter to the number of IMCUs in the table or partition, and
im_block_count to the number of blocks. For an external table, scanrate
specifies the rate at which data is scanned in MB/second.
See "Guideline for External Tables".
– DBMS_STATS statistics preference PREFERENCE_OVERRIDES_PARAMETER
The PREFERENCE_OVERRIDES_PARAMETER statistics preference determines
whether, when gathering optimizer statistics, to override the input value of a

parameter with the statistics preference. In this way, you control when the database
honors a parameter value passed to the statistics gathering procedures.
See "Statistics Preference Overrides".
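For example, the following sketch makes preferences take precedence over conflicting parameter values passed to gathering procedures:

```sql
-- When TRUE, a statistics preference (for example, ESTIMATE_PERCENT)
-- overrides the corresponding value passed to a gathering procedure.
EXEC DBMS_STATS.SET_GLOBAL_PREFS('PREFERENCE_OVERRIDES_PARAMETER', 'TRUE');
```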
– Access to current statistics does not require FLUSH_DATABASE_MONITORING_INFO
You no longer need to ensure that view metadata is up-to-date by using
DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO to save monitoring information to
disk. The statistics shown in DBA_TAB_STATISTICS and DBA_IND_STATISTICS come
from the same source as DBA_TAB_MODIFICATIONS, which means these views show
statistics obtained from disk and memory.
See "Determining When Optimizer Statistics Are Stale".
• Separate controls for adaptive plans and adaptive statistics
The OPTIMIZER_ADAPTIVE_PLANS initialization parameter enables (default) or disables
adaptive plans. The OPTIMIZER_ADAPTIVE_STATISTICS initialization parameter enables or
disables (default) adaptive statistics.
See "When Adaptive Query Plans Are Enabled" and "When Adaptive Statistics Are
Enabled".
• Join enhancements
– Join groups
A join group is a user-created object that lists two columns that can be meaningfully
joined. In certain queries, join groups enable the database to eliminate the
performance overhead of decompressing and hashing column values. Join groups
require an IM column store.
See "In-Memory Join Groups".
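As a sketch (the table and column names are illustrative, and both tables are assumed to be populated in the IM column store), a join group is created like this:

```sql
-- Declare that sales and products are commonly joined on prod_id,
-- allowing the database to join on compressed values.
CREATE INMEMORY JOIN GROUP sales_products_jg
  (sales(prod_id), products(prod_id));
```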
– Band join enhancements
A band join is a special type of nonequijoin in which key values in one data set must
fall within the specified range (“band”) of the second data set. When the database
detects a band join, the database evaluates the costs of band joins more efficiently,
avoiding unnecessary scans of rows that fall outside the defined bands. In most
cases, optimized performance is comparable to an equijoin.
See "Band Joins".
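The following query against a hypothetical employees table is a typical band join, matching rows whose salaries fall within 100 of each other:

```sql
SELECT e1.last_name, e2.last_name
FROM   employees e1, employees e2
WHERE  e2.salary BETWEEN e1.salary - 100
                     AND e1.salary + 100;
```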
• Cursor management enhancements
– Cursor-duration temporary tables
To materialize the intermediate results of a query, Oracle Database may create a
cursor-duration temporary table in memory during query compilation. For complex
operations such as WITH clause queries and star transformations, this internal
optimization, which enhances the materialization of intermediate results from
repetitively used subqueries, improves performance and optimizes I/O.
See "Cursor-Duration Temporary Tables".
– Fine-grained cursor invalidation
Starting in this release, you can specify deferred invalidation on DDL statements.
When shared SQL areas are marked rolling invalid, the database assigns each one a
randomly generated time period. A hard parse occurs only if the query executes after
the time period has expired. In this way, the database can diffuse the performance
overhead of hard parsing over time.


See "About the Life Cycle of Shared Cursors".
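As a sketch of the syntax (the index name is illustrative), a DDL statement can request deferred rather than immediate invalidation of dependent cursors:

```sql
-- Dependent cursors are marked rolling invalid and are hard parsed
-- only after a randomly assigned time period expires.
DROP INDEX sales_time_ix DEFERRED INVALIDATION;
```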


• OR expansion enhancement
In previous releases, the optimizer used the CONCATENATION operator to perform
the OR expansion. Now the optimizer uses the UNION-ALL operator instead. This
enhancement provides several benefits, including enabling interaction among
various transformations, and avoiding the sharing of query structures.
See "OR Expansion".
• SQL plan management enhancements
– You can now capture plans from AWR. See "Manual Plan Capture".
– In previous releases, automatic capture applied to all repeatable queries.
Starting in this release, you can create filters to capture only the plans for
statements that you choose. See "Eligibility for Automatic Initial Plan Capture".
• Real-Time database operation monitoring enhancements
A session can start or stop a database operation in a different session by
specifying its session ID and serial number.
See "Defining a Composite Database Operation".
• Expression tracking
SQL statements commonly include expressions such as plus (+) or minus (-).
More complicated examples include PL/SQL functions or SQL functions such as
LTRIM and TO_NUMBER. The Expression Statistics Store (ESS) maintains usage
information about expressions identified during compilation and captured during
execution.
See "About the Expression Statistics Store (ESS)".
• Enhancements for application tracing in a multitenant environment
CDB administrators and PDB administrators can use new V$ views to access trace
data that is relevant for a specific PDB.
See "End-to-End Application Tracing in a Multitenant Environment".

Desupported Features
The following features are desupported in Oracle Database 12c Release 2 (12.2.0.1).
• The OPTIMIZER_ADAPTIVE_FEATURES initialization parameter

See Also:
Oracle Database Upgrade Guide for a list of desupported features

Other Changes
This topic describes additional changes in the release.
• New Real-World Performance content


In this release, the book incorporates information provided by the Real-World
Performance group, including the following:
– "Improving Real-World Performance Through Cursor Sharing" explains how to use
bind variables and new features such as adaptive cursor sharing

Changes in Oracle Database 12c Release 1 (12.1.0.2)


Oracle Database SQL Tuning Guide for Oracle Database 12c Release 1 (12.1.0.2) has the
following changes.

New Features
The following features are new in this release.
• In-Memory aggregation
This optimization minimizes the join and GROUP BY processing required for each row when
joining a single large table to multiple small tables, as in a star schema. VECTOR GROUP BY
aggregation uses the infrastructure related to parallel query (PQ) processing, and blends
it with CPU-efficient algorithms to maximize the performance and effectiveness of the
initial aggregation performed before redistributing fact data.
See "In-Memory Aggregation (VECTOR GROUP BY)".
• SQL Monitor support for adaptive query plans
SQL Monitor supports adaptive query plans in the following ways:
– Indicates whether a query plan is adaptive, and show its current status: resolving or
resolved.
– Provides a list that enables you to select the current, full, or final query plans
See "Adaptive Query Plans" to learn more about adaptive query plans, and
"Monitoring SQL Executions Using Cloud Control" to learn more about SQL Monitor.

Changes in Oracle Database 12c Release 1 (12.1.0.1)


Oracle Database SQL Tuning Guide for Oracle Database 12c Release 1 (12.1) has the
following changes.

New Features
The following features are new in this release.
• Adaptive SQL Plan Management (SPM)
The SPM Evolve Advisor is a task infrastructure that enables you to schedule an evolve
task, rerun an evolve task, and generate persistent reports. The new automatic evolve
task, SYS_AUTO_SPM_EVOLVE_TASK, runs in the default maintenance window. This task
ranks all unaccepted plans and runs the evolve process for them. If the task finds a new
plan that performs better than the existing plan, the task automatically accepts the plan. You
can also run evolution tasks manually using the DBMS_SPM package.
See "Managing the SPM Evolve Advisor Task".
• Adaptive query optimization


Adaptive query optimization is a set of capabilities that enable the optimizer to
make run-time adjustments to execution plans and discover additional information
that can lead to better statistics. The set of capabilities includes:
– Adaptive query plans
An adaptive query plan has built-in options that enable the final plan for a
statement to differ from the default plan. During the first execution, before a
specific subplan becomes active, the optimizer makes a final decision about
which option to use. The optimizer bases its choice on observations made
during the execution up to this point. The ability of the optimizer to adapt plans
can improve query performance.
See "Adaptive Query Plans".
– Automatic reoptimization
When using automatic reoptimization, the optimizer monitors the initial
execution of a query. If the actual execution statistics vary significantly from
the original plan statistics, then the optimizer records the execution statistics
and uses them to choose a better plan the next time the statement executes.
The database uses information obtained during automatic reoptimization to
generate SQL plan directives automatically.
See "Automatic Reoptimization".
– SQL plan directives
In releases earlier than Oracle Database 12c, the database stored compilation
and execution statistics in the shared SQL area, which is nonpersistent.
Starting in this release, the database can use a SQL plan directive, which is
additional information and instructions that the optimizer can use to generate a
more optimal plan. The database stores SQL plan directives persistently in the
SYSAUX tablespace. When generating an execution plan, the optimizer can use
SQL plan directives to obtain more information about the objects accessed in
the plan.
See "SQL Plan Directives".
– Dynamic statistics enhancements
In releases earlier than Oracle Database 12c, Oracle Database only used
dynamic statistics (previously called dynamic sampling) when one or more of
the tables in a query did not have optimizer statistics. Starting in this release,
the optimizer automatically decides whether dynamic statistics are useful and
which dynamic statistics level to use for all SQL statements. Dynamically
gathered statistics are persistent and usable by other queries.
See "Supplemental Dynamic Statistics".
• New types of histograms
This release introduces top frequency and hybrid histograms. If a column contains
more than 254 distinct values, and if the top 254 most frequent values occupy
more than 99% of the data, then the database creates a top frequency histogram
using the top 254 most frequent values. By ignoring the nonpopular values, which
are statistically insignificant, the database can produce a better quality histogram
for highly popular values. A hybrid histogram is an enhanced height-based
histogram that stores the exact frequency of each endpoint in the sample, and
ensures that a value is never stored in multiple buckets.


Also, regular frequency histograms have been enhanced. The optimizer computes
frequency histograms during NDV computation based on a full scan of the data rather
than a small sample (when AUTO_SAMPLING is used). The enhanced frequency histograms
ensure that even highly infrequent values are properly represented with accurate bucket
counts within a histogram.
See "Histograms".
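Histograms are typically created automatically based on column usage, but you can also request one explicitly; for example (schema, table, and column names are illustrative):

```sql
-- Request a histogram of up to 254 buckets on cust_id; the database
-- chooses the type (frequency, top frequency, or hybrid) based on
-- the column's data distribution.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS('SH', 'SALES',
    method_opt => 'FOR COLUMNS SIZE 254 cust_id');
END;
/
```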
• Monitoring database operations
Real-Time Database Operations Monitoring enables you to monitor long running
database tasks such as batch jobs, scheduler jobs, and Extraction, Transformation, and
Loading (ETL) jobs as a composite business operation. This feature tracks the progress
of SQL and PL/SQL queries associated with the business operation being monitored. As
a DBA or developer, you can define business operations for monitoring by explicitly
specifying the start and end of the operation or implicitly with tags that identify the
operation.
See "Monitoring Database Operations".
• Concurrent statistics gathering
You can concurrently gather optimizer statistics on multiple tables, table partitions, or
table subpartitions. By fully utilizing multiprocessor environments, the database can
reduce the overall time required to gather statistics. Oracle Scheduler and Advanced
Queuing create and manage jobs to gather statistics concurrently. The scheduler decides
how many jobs to execute concurrently, and how many to queue based on available
system resources and the value of the JOB_QUEUE_PROCESSES initialization parameter.
See "Gathering Optimizer Statistics Concurrently".
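Concurrency is enabled through a DBMS_STATS preference; the schema name below is illustrative:

```sql
-- Enable concurrency for both manual and automatic statistics
-- gathering, then gather schema statistics; Oracle Scheduler runs
-- the per-object jobs concurrently.
EXEC DBMS_STATS.SET_GLOBAL_PREFS('CONCURRENT', 'ALL');
EXEC DBMS_STATS.GATHER_SCHEMA_STATS('SH');
```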
• Reporting mode for DBMS_STATS statistics gathering functions
You can run the DBMS_STATS functions in reporting mode. In this mode, the optimizer does
not actually gather statistics, but reports objects that would be processed if you were to
use a specified statistics gathering function.
See "Running Statistics Gathering Functions in Reporting Mode".
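For example, the following sketch reports what a schema-level gather would do without gathering anything (the schema name is illustrative):

```sql
-- Report which objects GATHER_SCHEMA_STATS would process,
-- without actually gathering statistics.
SELECT DBMS_STATS.REPORT_GATHER_SCHEMA_STATS(
         ownname      => 'SH',
         detail_level => 'TYPICAL',
         format       => 'TEXT')
FROM   DUAL;
```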
• Reports on past statistics gathering operations
You can use DBMS_STATS functions to report on a specific statistics gathering operation or
on operations that occurred during a specified time.
See "Reporting on Past Statistics Gathering Operations".
• Automatic column group creation
With column group statistics, the database gathers optimizer statistics on a group of
columns treated as a unit. Starting in Oracle Database 12c, the database automatically
determines which column groups are required in a specified workload or SQL tuning set,
and then creates the column groups. Thus, for any specified workload, you no longer
need to know which columns from each table must be grouped.
See "Detecting Useful Column Groups for a Specific Workload".
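A typical workflow is sketched below (the schema, table, and 300-second capture window are illustrative):

```sql
-- Record column usage for the next 300 seconds of workload.
EXEC DBMS_STATS.SEED_COL_USAGE(NULL, NULL, 300);

-- ... run the representative workload here ...

-- Create the column groups that the database detected for SALES.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS('SH', 'SALES') FROM DUAL;
```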
• Session-private statistics for global temporary tables
Starting in this release, global temporary tables have a different set of optimizer statistics
for each session. Session-specific statistics improve performance and manageability of
temporary tables because users no longer need to set statistics for a global temporary
table in each session or rely on dynamic statistics. The possibility of errors in cardinality
estimates for global temporary tables is lower, ensuring that the optimizer has the
necessary information to determine an optimal execution plan.


See "Temporary Table Statistics".


• SQL Test Case Builder enhancements
SQL Test Case Builder can capture and replay actions and events that enable you
to diagnose incidents that depend on certain dynamic and volatile factors. This
capability is especially useful for parallel query and automatic memory
management.
See Gathering Diagnostic Data with SQL Test Case Builder.
• Online statistics gathering for bulk loads
A bulk load is a CREATE TABLE AS SELECT or INSERT INTO ... SELECT operation.
In releases earlier than Oracle Database 12c, you needed to manually gather
statistics after a bulk load to avoid the possibility of a suboptimal execution plan
caused by stale statistics. Starting in this release, Oracle Database gathers
optimizer statistics automatically, which improves both performance and
manageability.
See "Online Statistics Gathering for Bulk Loads".
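As a sketch (the table names are illustrative), statistics are present immediately after the load, and the NOTES column records how they were gathered:

```sql
-- Bulk load: statistics are gathered online during the CTAS.
CREATE TABLE sales_copy AS SELECT * FROM sales;

-- NOTES shows STATS_ON_LOAD for statistics gathered during the load.
SELECT num_rows, notes
FROM   user_tab_statistics
WHERE  table_name = 'SALES_COPY';
```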
• Reuse of synopses after partition maintenance operations
ALTER TABLE EXCHANGE is a common partition maintenance operation. During a
partition exchange, the statistics of the partition and the table are also exchanged.
A synopsis is a set of auxiliary statistics gathered on a partitioned table when the
INCREMENTAL value is set to true. In releases earlier than Oracle Database 12c,
you could not gather table-level synopses on a nonpartitioned table, so you could
not exchange the table with a partition and end up with synopses on the partition.
You had to explicitly gather optimizer statistics in
incremental mode to create the missing synopses. Starting in this release, you can
gather table-level synopses on a table. When you exchange this table with a
partition in an incremental mode table, the synopses are also exchanged.
See "Maintaining Incremental Statistics for Partition Maintenance Operations".
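For example, table-level synopses on a staging table are enabled through preferences (schema and table names are illustrative):

```sql
-- Gather table-level synopses on the staging table so that they are
-- exchanged along with statistics during ALTER TABLE ... EXCHANGE.
EXEC DBMS_STATS.SET_TABLE_PREFS('SH', 'SALES_STAGE', 'INCREMENTAL', 'TRUE');
EXEC DBMS_STATS.SET_TABLE_PREFS('SH', 'SALES_STAGE', 'INCREMENTAL_LEVEL', 'TABLE');
```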
• Automatic updates of global statistics for tables with stale or locked partition
statistics
Incremental statistics can automatically calculate global statistics for a partitioned
table even if the partition or subpartition statistics are stale and locked.
See "Maintaining Incremental Statistics for Tables with Stale or Locked Partition
Statistics".
• Cube query performance enhancements
These enhancements minimize CPU and memory consumption and reduce I/O for
queries against cubes.
See Table 7-7 to learn about the CUBE JOIN operation.

Deprecated Features
The following features are deprecated in this release, and may be desupported in a
future release.
• Stored outlines
See "Managing SQL Plan Baselines" for information about alternatives.
• The SIMILAR value for the CURSOR_SHARING initialization parameter


This value is deprecated. Use FORCE instead.


See "Do Not Use CURSOR_SHARING = FORCE as a Permanent Fix".
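A system still using the deprecated value can be switched with a single parameter change; as the reference above notes, treat FORCE as an interim measure rather than a permanent fix:

```sql
-- Replace the deprecated SIMILAR setting with FORCE
ALTER SYSTEM SET cursor_sharing = 'FORCE' SCOPE = BOTH;
```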

Desupported Features
Some features previously described in this document are desupported in Oracle Database
12c.
See Oracle Database Upgrade Guide for a list of desupported features.

Other Changes
This manual has the following additional changes in Oracle Database 12c.
• New tuning books
The Oracle Database Performance Tuning Guide of Oracle Database 11g has been divided
into two books for Oracle Database 12c:
– Oracle Database Performance Tuning Guide, which contains only topics that pertain
to tuning the database
– Oracle Database SQL Tuning Guide, which contains only topics that pertain to tuning
SQL

Part I
SQL Performance Fundamentals
SQL tuning is improving SQL statement performance to meet specific, measurable, and
achievable goals.
This part contains the following chapters:
1
Introduction to SQL Tuning
SQL tuning is the attempt to diagnose and repair SQL statements that fail to meet a
performance standard.
This chapter contains the following topics:

1.1 About SQL Tuning


SQL tuning is the iterative process of improving SQL statement performance to meet
specific, measurable, and achievable goals.
SQL tuning implies fixing problems in deployed applications. In contrast, application design
sets the security and performance goals before deploying an application.

See Also:

• SQL Performance Methodology


• "Guidelines for Designing Your Application" to learn how to design for SQL
performance

1.2 Purpose of SQL Tuning


A SQL statement becomes a problem when it fails to perform according to a predetermined
and measurable standard.
After you have identified the problem, a typical tuning session has one of the following goals:
• Reduce user response time, which means decreasing the time between when a user
issues a statement and receives a response
• Improve throughput, which means using the least amount of resources necessary to
process all rows accessed by a statement
For a response time problem, consider an online book seller application that hangs for three
minutes after a customer updates the shopping cart. Contrast this with a three-minute parallel
query in a data warehouse that consumes all of the database host CPU, preventing other
queries from running. In each case, the user response time is three minutes, but the cause of
the problem is different, and so is the tuning goal.

1.3 Prerequisites for SQL Tuning


SQL performance tuning requires a foundation of database knowledge.
If you are tuning SQL performance, then this manual assumes that you have the knowledge
and skills shown in the following table.


Table 1-1 Required Knowledge

• Database architecture

  Description: Database architecture is not the domain of administrators alone. As a
  developer, you want to develop applications in the least amount of time against an
  Oracle database, which requires exploiting the database architecture and features.
  For example, not understanding Oracle Database concurrency controls and
  multiversioning read consistency may make an application corrupt the integrity of
  the data, run slowly, and decrease scalability.

  To learn more: Oracle Database Concepts explains the basic relational data
  structures, transaction management, storage structures, and instance architecture of
  Oracle Database.

• SQL and PL/SQL

  Description: Because of the existence of GUI-based tools, it is possible to create
  applications and administer a database without knowing SQL. However, it is
  impossible to tune applications or a database without knowing SQL.

  To learn more: Oracle Database Concepts includes an introduction to Oracle SQL and
  PL/SQL. You must also have a working knowledge of Oracle Database SQL Language
  Reference, Oracle Database PL/SQL Language Reference, and Oracle Database PL/SQL
  Packages and Types Reference.

• SQL tuning tools

  Description: The database generates performance statistics, and provides SQL tuning
  tools that interpret these statistics.

  To learn more: Oracle Database 2 Day + Performance Tuning Guide provides an
  introduction to the principal SQL tuning tools.

1.4 Tasks and Tools for SQL Tuning


After you have identified the goal for a tuning session, for example, reducing user
response time from three minutes to less than a second, the problem becomes how to
accomplish this goal.
This section contains the following topics:

1.4.1 SQL Tuning Tasks


The specifics of a tuning session depend on many factors, including whether you tune
proactively or reactively.
In proactive SQL tuning, you regularly use SQL Tuning Advisor to determine whether
you can make SQL statements perform better. In reactive SQL tuning, you correct a
SQL-related problem that a user has experienced.
Whether you tune proactively or reactively, a typical SQL tuning session involves all or
most of the following tasks:
1. Identifying high-load SQL statements


Review past execution history to find the statements responsible for a large share of the
application workload and system resources.
2. Gathering performance-related data
The optimizer statistics are crucial to SQL tuning. If these statistics do not exist or are no
longer accurate, then the optimizer cannot generate the best plan. Other data relevant to
SQL performance include the structure of tables and views that the statement accessed,
and definitions of any indexes available to the statement.
3. Determining the causes of the problem
Typically, causes of SQL performance problems include:
• Inefficiently designed SQL statements
If a SQL statement is written so that it performs unnecessary work, then the optimizer
cannot do much to improve its performance. Examples of inefficient design include
– Neglecting to add a join condition, which leads to a Cartesian join
– Using hints to specify a large table as the driving table in a join
– Specifying UNION instead of UNION ALL
– Making a subquery execute for every row in an outer query
• Suboptimal execution plans
The query optimizer (also called the optimizer) is internal software that determines
which execution plan is most efficient. Sometimes the optimizer chooses a plan with
a suboptimal access path, which is the means by which the database retrieves data
from the database. For example, the plan for a query predicate with low selectivity
may use a full table scan on a large table instead of an index.
You can compare the execution plan of an optimally performing SQL statement to the
plan of the statement when it performs suboptimally. This comparison, along with
information such as changes in data volumes, can help identify causes of
performance degradation.
• Missing SQL access structures
Absence of SQL access structures, such as indexes and materialized views, is a
typical reason for suboptimal SQL performance. The optimal set of access structures
can improve SQL performance by orders of magnitude.
• Stale optimizer statistics
Statistics gathered by DBMS_STATS can become stale when the statistics maintenance
operations, either automatic or manual, cannot keep up with the changes to the table
data caused by DML. Because stale statistics on a table do not accurately reflect the
table data, the optimizer can make decisions based on faulty information and
generate suboptimal execution plans.
• Hardware problems
Suboptimal performance might be connected with memory, I/O, and CPU problems.
4. Defining the scope of the problem
The scope of the solution must match the scope of the problem. Consider a problem at
the database level and a problem at the statement level. For example, the shared pool is
too small, which causes cursors to age out quickly, which in turn causes many hard
parses. Using an initialization parameter to increase the shared pool size fixes the
problem at the database level and improves performance for all sessions. However, if a


single SQL statement is not using a helpful index, then changing the optimizer
initialization parameters for the entire database could harm overall performance. If
a single SQL statement has a problem, then an appropriately scoped solution
addresses just this problem with this statement.
5. Implementing corrective actions for suboptimally performing SQL statements
These actions vary depending on circumstances. For example, you might rewrite a
SQL statement to be more efficient, avoiding unnecessary hard parsing by
rewriting the statement to use bind variables. You might also use equijoins,
remove functions from WHERE clauses, and break a complex SQL statement into
multiple simple statements.
In some cases, you improve SQL performance not by rewriting the statement, but
by restructuring schema objects. For example, you might index a new access
path, or reorder columns in a concatenated index. You might also partition a table,
introduce derived values, or even change the database design.
6. Preventing SQL performance regressions
To ensure optimal SQL performance, verify that execution plans continue to
provide optimal performance, and choose better plans if they become available. You
can achieve these goals using optimizer statistics, SQL profiles, and SQL plan
baselines.
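As a minimal illustration of a corrective action, refreshing possibly stale statistics on a single table is often the first step. The example assumes the standard hr sample schema:

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'HR',
    tabname => 'EMPLOYEES',
    cascade => TRUE);  -- also refresh statistics on the table's indexes
END;
/
```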

See Also:

• "Shared Pool Check"


• Oracle Database Concepts to learn more about the shared pool

1.4.2 SQL Tuning Tools


SQL tuning tools are either automated or manual.
In this context, a tool is automated if the database itself can provide diagnosis, advice,
or corrective actions. A manual tool requires you to perform all of these operations.
All tuning tools depend on the basic tools of the dynamic performance views, statistics,
and metrics that the database instance collects. The database itself contains the data
and metadata required to tune SQL statements.
This section contains the following topics:

1.4.2.1 Automated SQL Tuning Tools


Oracle Database provides several advisors relevant for SQL tuning.
Additionally, SQL plan management is a mechanism that can prevent performance
regressions and also help you to improve SQL performance.
All of the automated SQL tuning tools can use SQL tuning sets as input. A SQL tuning
set (STS) is a database object that includes one or more SQL statements along with
their execution statistics and execution context.
This section contains the following topics:


See Also:

• "About SQL Tuning Sets"


• Oracle Database 2 Day + Performance Tuning Guide to learn more about
managing SQL tuning sets

1.4.2.1.1 Automatic Database Diagnostic Monitor (ADDM)


ADDM is self-diagnostic software built into Oracle Database.
ADDM can automatically locate the root causes of performance problems, provide
recommendations for correction, and quantify the expected benefits. ADDM also identifies
areas where no action is necessary.
ADDM and other advisors use Automatic Workload Repository (AWR), which is an
infrastructure that provides services to database components to collect, maintain, and use
statistics. ADDM examines and analyzes statistics in AWR to determine possible
performance problems, including high-load SQL.
For example, you can configure ADDM to run nightly. In the morning, you can examine the
latest ADDM report to see what might have caused a problem and if there is a recommended
fix. The report might show that a particular SELECT statement consumed a huge amount of
CPU, and recommend that you run SQL Tuning Advisor.

See Also:

• Oracle Database 2 Day + Performance Tuning Guide


• Oracle Database Performance Tuning Guide

1.4.2.1.2 SQL Tuning Advisor


SQL Tuning Advisor is internal diagnostic software that identifies problematic SQL
statements and recommends how to improve statement performance.
When run during database maintenance windows as an automated maintenance task, SQL
Tuning Advisor is known as Automatic SQL Tuning Advisor.
SQL Tuning Advisor takes one or more SQL statements as an input and invokes the
Automatic Tuning Optimizer to perform SQL tuning on the statements. The advisor performs
the following types of analysis:
• Checks for missing or stale statistics
• Builds SQL profiles
A SQL profile is a set of auxiliary information specific to a SQL statement. A SQL profile
contains corrections for suboptimal optimizer estimates discovered during Automatic SQL
Tuning. This information can improve optimizer estimates for cardinality, which is the
number of rows that is estimated to be or actually is returned by an operation in an
execution plan, and selectivity. These improved estimates lead the optimizer to select
better plans.


• Explores whether a different access path can significantly improve performance


• Identifies SQL statements that lend themselves to suboptimal plans
The output is in the form of advice or recommendations, along with a rationale for each
recommendation and its expected benefit. The recommendation relates to a collection
of statistics on objects, creation of new indexes, restructuring of the SQL statement, or
creation of a SQL profile. You can choose to accept the recommendations to complete
the tuning of the SQL statements.
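From the command line, the advisor is driven through the DBMS_SQLTUNE package. The following sketch tunes a single statement from the shared pool; the SQL ID and task name are placeholders:

```sql
DECLARE
  l_task VARCHAR2(128);
BEGIN
  -- Create and run a tuning task for one cached statement
  l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(sql_id => 'b74nw722wjvy3');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => l_task);
END;
/

-- Display the findings and recommendations for the task
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('&task_name') FROM dual;
```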

See Also:

• "Analyzing SQL with SQL Tuning Advisor"


• Oracle Database 2 Day + Performance Tuning Guide

1.4.2.1.3 SQL Access Advisor


SQL Access Advisor is internal diagnostic software that recommends which
materialized views, indexes, and materialized view logs to create, drop, or retain.
SQL Access Advisor takes an actual workload as input, or the advisor can derive a
hypothetical workload from the schema. SQL Access Advisor considers the trade-offs
between space usage and query performance, and recommends the most cost-
effective configuration of new and existing materialized views and indexes. The
advisor also makes recommendations about partitioning.

See Also:

• "About SQL Access Advisor"


• Oracle Database 2 Day + Performance Tuning Guide

1.4.2.1.4 SQL Plan Management


SQL plan management is a preventative mechanism that enables the optimizer to
automatically manage execution plans, ensuring that the database uses only known or
verified plans.
This mechanism can build a SQL plan baseline, which contains one or more accepted
plans for each SQL statement. By using baselines, SQL plan management can
prevent plan regressions from environmental changes, while permitting the optimizer
to discover and use better plans.
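For example, you can seed a baseline manually from plans already in the cursor cache; loaded plans are marked as accepted. The SQL ID below is a placeholder:

```sql
DECLARE
  l_loaded PLS_INTEGER;
BEGIN
  -- Load the cached plan(s) for one statement into a SQL plan baseline
  l_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => 'b74nw722wjvy3');
  DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || l_loaded);
END;
/
```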


See Also:

• "Overview of SQL Plan Management"


• Oracle Database PL/SQL Packages and Types Reference
to learn about the DBMS_SPM package

1.4.2.1.5 SQL Performance Analyzer


SQL Performance Analyzer determines the effect of a change on a SQL workload by
identifying performance divergence for each SQL statement.
System changes such as upgrading a database or adding an index may cause changes to
execution plans, affecting SQL performance. By using SQL Performance Analyzer, you can
accurately forecast the effect of system changes on SQL performance. Using this information,
you can tune the database when SQL performance regresses, or validate and measure the
gain when SQL performance improves.

See Also:
Oracle Database Testing Guide

1.4.2.2 Manual SQL Tuning Tools


In some situations, you may want to run manual tools in addition to the automated tools.
Alternatively, you may not have access to the automated tools.
This section contains the following topics:

1.4.2.2.1 Execution Plans


Execution plans are the principal diagnostic tool in manual SQL tuning. For example, you can
view plans to determine whether the optimizer selects the plan you expect, or identify the
effect of creating an index on a table.
You can display execution plans in multiple ways. The following tools are the most commonly
used:
• DBMS_XPLAN
You can use the DBMS_XPLAN package methods to display the execution plan generated
by the EXPLAIN PLAN command and query of V$SQL_PLAN.
• EXPLAIN PLAN
This SQL statement enables you to view the execution plan that the optimizer would use
to execute a SQL statement without actually executing the statement. See Oracle
Database SQL Language Reference.
• V$SQL_PLAN and related views
These views contain information about executed SQL statements, and their execution
plans, that are still in the shared pool. See Oracle Database Reference.


• AUTOTRACE
The AUTOTRACE command in SQL*Plus generates the execution plan and statistics
about the performance of a query. This command provides statistics such as disk
reads and memory reads. See SQL*Plus User's Guide and Reference.
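For example, the first two tools are commonly combined: EXPLAIN PLAN populates the plan table, and DBMS_XPLAN formats the result. The query below assumes the standard hr sample schema:

```sql
EXPLAIN PLAN FOR
  SELECT e.last_name, d.department_name
  FROM   employees e JOIN departments d USING (department_id);

-- Format and display the most recent plan in the plan table
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```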

1.4.2.2.2 Real-Time SQL Monitoring and Real-Time Database Operations


The Real-Time SQL Monitoring feature of Oracle Database enables you to monitor the
performance of SQL statements while they are executing. By default, SQL monitoring
starts automatically when a statement runs in parallel, or when it has consumed at
least 5 seconds of CPU or I/O time in a single execution.
A database operation is a set of database tasks defined by end users or application
code, for example, a batch job or Extraction, Transformation, and Loading (ETL)
processing. You can define, monitor, and report on database operations. Real-Time
Database Operations provides the ability to monitor composite operations
automatically. The database automatically monitors parallel queries, DML, and DDL
statements as soon as execution begins.
Oracle Enterprise Manager Cloud Control (Cloud Control) provides easy-to-use SQL
monitoring pages. Alternatively, you can monitor SQL-related statistics using the
V$SQL_MONITOR and V$SQL_PLAN_MONITOR views. You can use these views with the
following views to get more information about executions that you are monitoring:
• V$ACTIVE_SESSION_HISTORY
• V$SESSION
• V$SESSION_LONGOPS
• V$SQL
• V$SQL_PLAN
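For example, you can list currently monitored executions directly from V$SQL_MONITOR. In recent releases the report function lives in DBMS_SQL_MONITOR (in earlier releases, DBMS_SQLTUNE); the SQL ID passed to it is a placeholder:

```sql
-- List executions currently being monitored
SELECT sql_id, status, elapsed_time, cpu_time
FROM   v$sql_monitor
WHERE  status = 'EXECUTING';

-- Generate a text report for one monitored statement
SELECT DBMS_SQL_MONITOR.REPORT_SQL_MONITOR(sql_id => 'b74nw722wjvy3',
                                           type   => 'TEXT')
FROM   dual;
```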

See Also:

• "About Monitoring Database Operations"


• Oracle Database Reference to learn about the V$ views

1.4.2.2.3 Application Tracing


A SQL trace file provides performance information on individual SQL statements:
parse counts, physical and logical reads, misses on the library cache, and so on.
Trace files are sometimes useful for diagnosing SQL performance problems. You can
enable and disable SQL tracing for a specific session using the DBMS_MONITOR or
DBMS_SESSION packages. Oracle Database implements tracing by generating a trace
file for each server process when you enable the tracing mechanism.
Oracle Database provides the following command-line tools for analyzing trace files:
• TKPROF
This utility accepts as input a trace file produced by the SQL Trace facility, and
then produces a formatted output file.


• trcsess
This utility consolidates trace output from multiple trace files based on criteria such as
session ID, client ID, and service ID. After trcsess merges the trace information into a
single output file, you can format the output file with TKPROF. trcsess is useful for
consolidating the tracing of a particular session for performance or debugging purposes.
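For example, tracing for one session can be toggled with DBMS_MONITOR; the session ID and serial number are placeholders obtained from V$SESSION:

```sql
-- Enable SQL trace, including wait events, for a single session
BEGIN
  DBMS_MONITOR.SESSION_TRACE_ENABLE(
    session_id => 127,
    serial_num => 55,
    waits      => TRUE,
    binds      => FALSE);
END;
/

-- ... reproduce the problem, then disable tracing
BEGIN
  DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 127, serial_num => 55);
END;
/
```

The resulting trace file can then be formatted with TKPROF.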
End-to-End Application Tracing simplifies the process of diagnosing performance problems in
multitier environments. In these environments, the middle tier routes a request from an end
client to different database sessions, making it difficult to track a client across database
sessions. End-to-End application tracing uses a client ID to uniquely trace a specific end-
client through all tiers to the database.

See Also:
Oracle Database PL/SQL Packages and Types Reference to learn more about
DBMS_MONITOR and DBMS_SESSION

1.4.2.2.4 Optimizer Hints


A hint is an instruction passed to the optimizer through comments in a SQL statement. Hints
enable you to make decisions normally made automatically by the optimizer.
In a test or development environment, hints are useful for testing the performance of a
specific access path. For example, you may know that a specific index is more selective for
certain queries. In this case, you may use hints to instruct the optimizer to use a better
execution plan, as in the following example:

SELECT /*+ INDEX (employees emp_department_ix) */
       employee_id, department_id
FROM   employees
WHERE  department_id > 50;

See Also:

• "Influencing the Optimizer with Hints"


• Oracle Database SQL Language Reference to learn more about hints

1.4.3 User Interfaces to SQL Tuning Tools


Cloud Control is a system management tool that provides centralized management of a
database environment. Cloud Control provides access to most tuning tools.
By combining a graphical console, Oracle Management Servers, Oracle Intelligent Agents,
common services, and administrative tools, Cloud Control provides a comprehensive system
management platform.
You can access all SQL tuning tools using a command-line interface. For example, the
DBMS_SQLTUNE package is the command-line interface for SQL Tuning Advisor.


Oracle recommends Cloud Control as the best interface for database administration
and tuning. In cases where the command-line interface better illustrates a particular
concept or task, this manual uses command-line examples. However, in these cases
the tuning tasks include a reference to the principal Cloud Control page associated
with the task.

2
SQL Performance Methodology
This chapter describes the recommended methodology for SQL tuning.

Note:
This book assumes that you have learned the Oracle Database performance
methodology described in Oracle Database 2 Day + Performance Tuning Guide.

This chapter contains the following topics:

2.1 Guidelines for Designing Your Application


The key to obtaining good SQL performance is to design your application with performance in
mind.
This section contains the following topics:

2.1.1 Guideline for Data Modeling


Data modeling is important to successful application design.
You must perform data modeling in a way that represents the business practices. Heated
debates may occur about the correct data model. The important thing is to apply the
greatest modeling effort to those entities affected by the most frequent business
transactions.
In the modeling phase, there is a great temptation to spend too much time modeling the non-
core data elements, which results in increased development lead times. Use of modeling
tools can then rapidly generate schema definitions and can be useful when a fast prototype is
required.

2.1.2 Guideline for Writing Efficient Applications


During the design and architecture phase of system development, ensure that the application
developers understand SQL execution efficiency.
To achieve this goal, the development environment must support the following characteristics:
• Good database connection management
Connecting to the database is an expensive operation that is not scalable. Therefore, a
best practice is to minimize the number of concurrent connections to the database. A
simple system, where a user connects at application initialization, is ideal. However, in a
web-based or multitiered application in which application servers multiplex database
connections to users, this approach can be difficult. With these types of applications,
design them to pool database connections, and not reestablish connections for each user
request.


• Good cursor usage and management


Maintaining user connections is equally important to minimizing the parsing activity
on the system. Parsing is the process of interpreting a SQL statement and creating
an execution plan for it. This process has many phases, including syntax checking,
security checking, execution plan generation, and loading shared structures into
the shared pool. There are two types of parse operations:
– Hard parsing
A SQL statement is submitted for the first time, and no match is found in the
shared pool. Hard parses are the most resource-intensive and unscalable,
because they perform all the operations involved in a parse.
– Soft parsing
A SQL statement is submitted for the first time, and a match is found in the
shared pool. The match can be the result of previous execution by another
user. The SQL statement is shared, which is optimal for performance.
However, soft parses are not ideal, because they still require syntax and
security checking, which consume system resources.
Because parsing should be minimized as much as possible, application
developers should design their applications to parse SQL statements once and
execute them many times. This is done through cursors. Experienced SQL
programmers should be familiar with the concept of opening and re-executing
cursors.
• Effective use of bind variables
Application developers must also ensure that SQL statements are shared within
the shared pool. To achieve this goal, use bind variables to represent the parts of
the query that change from execution to execution. If this is not done, then the
SQL statement is likely to be parsed once and never re-used by other users. To
ensure that SQL is shared, use bind variables and do not use string literals with
SQL statements. For example:
Statement with string literals:

SELECT *
FROM employees
WHERE last_name LIKE 'KING';

Statement with bind variables:

SELECT *
FROM employees
WHERE last_name LIKE :1;

The following example shows the results of some tests on a simple OLTP
application:

Test                                  #Users Supported

No Parsing all statements             270
Soft Parsing all statements           150
Hard Parsing all statements           60
Re-Connecting for each Transaction    30


These tests were performed on a four-CPU computer. The differences increase as the
number of CPUs on the system increase.

2.2 Guidelines for Deploying Your Application


To achieve optimal performance, deploy your application with the same care that you put into
designing it.
This section contains the following topics:

2.2.1 Guideline for Deploying in a Test Environment


The testing process mainly consists of functional and stability testing. At some point in the
process, you must perform performance testing.
The following list describes simple rules for performance testing an application. If correctly
documented, then this list provides important information for the production application and
the capacity planning process after the application has gone live.
• Use the Automatic Database Diagnostic Monitor (ADDM) and SQL Tuning Advisor for
design validation.
• Test with realistic data volumes and distributions.
All testing must be done with fully populated tables. The test database should contain
data representative of the production system in terms of data volume and cardinality
between tables. All the production indexes should be built and the schema statistics
should be populated correctly.
• Use the correct optimizer mode.
Perform all testing with the optimizer mode that you plan to use in production.
• Test a single user performance.
Test a single user on an idle or lightly-used database for acceptable performance. If a
single user cannot achieve acceptable performance under ideal conditions, then multiple
users cannot achieve acceptable performance under real conditions.
• Obtain and document plans for all SQL statements.
Obtain an execution plan for each SQL statement. Use this process to verify that the
optimizer is obtaining an optimal execution plan, and that the relative cost of the SQL
statement is understood in terms of CPU time and physical I/Os. This process assists in
identifying the heavy use transactions that require the most tuning and performance work
in the future.
• Attempt multiuser testing.
This process is difficult to perform accurately, because user workload and profiles might
not be fully quantified. However, transactions performing DML statements should be
tested to ensure that there are no locking conflicts or serialization problems.
• Test with the correct hardware configuration.
Test with a configuration as close to the production system as possible. Using a realistic
system is particularly important for network latencies, I/O subsystem bandwidth, and
processor type and speed. Failing to use this approach may result in an incorrect
analysis of potential performance problems.
• Measure steady state performance.


When benchmarking, it is important to measure the performance under steady


state conditions. Each benchmark run should have a ramp-up phase, where users
are connected to the application and gradually start performing work on the
application. This process allows for frequently cached data to be initialized into the
cache and single execution operations—such as parsing—to be completed before
the steady state condition. Likewise, after a benchmark run, a ramp-down period is
useful so that the system frees resources, and users cease work and disconnect.

2.2.2 Guidelines for Application Rollout


When new applications are rolled out, two strategies are commonly adopted: the Big
Bang approach, in which all users migrate to the new system at once, and the Trickle
approach, in which users slowly migrate from existing systems to the new one.
Both approaches have merits and disadvantages. The Big Bang approach relies on
reliable testing of the application at the required scale, but has the advantage of
minimal data conversion and synchronization with the old system, because it is simply
switched off. The Trickle approach allows debugging of scalability issues as the
workload increases, but might mean that data must be migrated to and from legacy
systems as the transition takes place.
It is difficult to recommend one approach over the other, because each technique has
associated risks that could lead to system outages as the transition takes place.
Certainly, the Trickle approach allows profiling of real users as they are introduced to
the new application, and allows the system to be reconfigured while only affecting the
migrated users. This approach affects the work of the early adopters, but limits the
load on support services. Thus, unscheduled outages only affect a small percentage of
the user population.
The decision on how to roll out a new application is specific to each business. Any
adopted approach has its own unique pressures and stresses. The more testing and
knowledge that you derive from the testing process, the more you realize what is best
for the rollout.

Part II
Query Optimizer Fundamentals
To tune Oracle SQL, you must understand the query optimizer. The optimizer is built-in
software that determines the most efficient method for a statement to access data.
This part contains the following chapters:
3
SQL Processing
This chapter explains how the database processes DDL statements to create objects, DML to
modify data, and queries to retrieve data.
This chapter contains the following topics:

3.1 About SQL Processing


SQL processing is the parsing, optimization, row source generation, and execution of a SQL
statement.
The following figure depicts the general stages of SQL processing. Depending on the
statement, the database may omit some of these stages.

Figure 3-1 Stages of SQL Processing

[The figure shows a flowchart. A SQL statement enters parsing, which performs a syntax
check, a semantic check, and a shared pool check. If the shared pool check finds a
matching statement, the soft parse path leads directly to execution. Otherwise, a hard
parse proceeds to optimization (generation of multiple execution plans) and row source
generation (generation of the query plan), and then to execution.]


This section contains the following topics:

3.1.1 SQL Parsing


The first stage of SQL processing is parsing.
The parsing stage involves separating the pieces of a SQL statement into a data
structure that other routines can process. The database parses a statement when
instructed by the application, which means that only the application, and not the
database itself, can reduce the number of parses.
When an application issues a SQL statement, the application makes a parse call to the
database to prepare the statement for execution. The parse call opens or creates a
cursor, which is a handle for the session-specific private SQL area that holds a parsed
SQL statement and other processing information. The cursor and private SQL area are
in the program global area (PGA).
During the parse call, the database performs checks that identify the errors that can be
found before statement execution. Some errors cannot be caught by parsing. For
example, the database can encounter deadlocks or errors in data conversion only
during statement execution.
This section contains the following topics:

See Also:
Oracle Database Concepts to learn about deadlocks

3.1.1.1 Syntax Check


Oracle Database must check each SQL statement for syntactic validity.
A statement that breaks a rule for well-formed SQL syntax fails the check. For
example, the following statement fails because the keyword FROM is misspelled as
FORM:

SQL> SELECT * FORM employees;


SELECT * FORM employees
*
ERROR at line 1:
ORA-00923: FROM keyword not found where expected

3.1.1.2 Semantic Check


The semantics of a statement are its meaning. A semantic check determines whether
a statement is meaningful, for example, whether the objects and columns in the
statement exist.


A syntactically correct statement can fail a semantic check, as shown in the following
example of a query of a nonexistent table:

SQL> SELECT * FROM nonexistent_table;


SELECT * FROM nonexistent_table
*
ERROR at line 1:
ORA-00942: table or view does not exist

3.1.1.3 Shared Pool Check


During the parse, the database performs a shared pool check to determine whether it can
skip resource-intensive steps of statement processing.
To this end, the database uses a hashing algorithm to generate a hash value for every SQL
statement. The statement hash value is the SQL ID shown in V$SQL.SQL_ID. This hash value
is deterministic within a version of Oracle Database, so the same statement in a single
instance or in different instances has the same SQL ID.
When a user submits a SQL statement, the database searches the shared SQL area to see if
an existing parsed statement has the same hash value. The hash value of a SQL statement
is distinct from the following values:
• Memory address for the statement
Oracle Database uses the SQL ID to perform a keyed read in a lookup table. In this way,
the database obtains possible memory addresses of the statement.
• Hash value of an execution plan for the statement
A SQL statement can have multiple plans in the shared pool. Typically, each plan has a
different hash value. If the same SQL ID has multiple plan hash values, then the
database knows that multiple plans exist for this SQL ID.
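As a sketch of how you might observe this, the following query lists the child cursors and plan hash values recorded for a single SQL ID. The SQL ID shown is a placeholder, and the query assumes that you have access to the V$SQL view:

```sql
-- Hypothetical example: 'gbn2af5fbq9dm' is a placeholder SQL ID.
-- Multiple distinct PLAN_HASH_VALUE rows for one SQL ID indicate
-- that multiple plans exist in the shared pool for this statement.
SELECT sql_id, child_number, plan_hash_value
FROM   v$sql
WHERE  sql_id = 'gbn2af5fbq9dm'
ORDER  BY child_number;
```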
Parse operations fall into the following categories, depending on the type of statement
submitted and the result of the hash check:
• Hard parse
If Oracle Database cannot reuse existing code, then it must build a new executable
version of the application code. This operation is known as a hard parse, or a library
cache miss.

Note:
The database always performs a hard parse of DDL.

During the hard parse, the database accesses the library cache and data dictionary
cache numerous times to check the data dictionary. When the database accesses these
areas, it uses a serialization device called a latch on required objects so that their
definition does not change. Latch contention increases statement execution time and
decreases concurrency.
• Soft parse


A soft parse is any parse that is not a hard parse. If the submitted statement is the
same as a reusable SQL statement in the shared pool, then Oracle Database
reuses the existing code. This reuse of code is also called a library cache hit.
Soft parses can vary in how much work they perform. For example, configuring the
session shared SQL area can sometimes reduce the amount of latching in the soft
parses, making them "softer."
In general, a soft parse is preferable to a hard parse because the database skips
the optimization and row source generation steps, proceeding straight to
execution.
The following graphic is a simplified representation of a shared pool check of an
UPDATE statement in a dedicated server architecture.

Figure 3-2 Shared Pool Check

The figure shows a user submitting an UPDATE statement through a client process to a
dedicated server process. The server process compares the hash value of the statement
(3967354608) with the hash values of the shared SQL areas in the library cache of the
shared pool (3667723989, 3967354608, and 2190280494) and finds a match. The private
SQL area for the statement resides in the session memory of the PGA, which also
contains the SQL work areas.

If a check determines that a statement in the shared pool has the same hash value,
then the database performs semantic and environment checks to determine whether
the statements have the same meaning. Identical syntax is not sufficient. For example,
suppose two different users log in to the database and issue the following SQL
statements:

CREATE TABLE my_table ( some_col INTEGER );


SELECT * FROM my_table;

The SELECT statements for the two users are syntactically identical, but two separate
schema objects are named my_table. This semantic difference means that the second
statement cannot reuse the code for the first statement.
Even if two statements are semantically identical, an environmental difference can
force a hard parse. In this context, the optimizer environment is the totality of session


settings that can affect execution plan generation, such as the work area size or optimizer
settings (for example, the optimizer mode). Consider the following series of SQL statements
executed by a single user:

ALTER SESSION SET OPTIMIZER_MODE=ALL_ROWS;
ALTER SYSTEM FLUSH SHARED_POOL;
SELECT * FROM sh.sales;                        # optimizer environment 1

ALTER SESSION SET OPTIMIZER_MODE=FIRST_ROWS;
SELECT * FROM sh.sales;                        # optimizer environment 2

ALTER SESSION SET SQL_TRACE=true;
SELECT * FROM sh.sales;                        # optimizer environment 3

In the preceding example, the same SELECT statement is executed in three different optimizer
environments. Consequently, the database creates three separate shared SQL areas for
these statements and forces a hard parse of each statement.
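To see why the database created separate child cursors rather than sharing one, you can query V$SQL_SHARED_CURSOR. The following sketch checks for an optimizer mode mismatch; the SQL ID is a placeholder that you would replace with a value from V$SQL:

```sql
-- Hypothetical example: replace the SQL ID with one obtained from V$SQL.
-- A 'Y' in OPTIMIZER_MODE_MISMATCH shows that differing optimizer
-- environments prevented the cursors from being shared.
SELECT child_number, optimizer_mode_mismatch
FROM   v$sql_shared_cursor
WHERE  sql_id = 'gbn2af5fbq9dm';
```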

See Also:

• Oracle Database Concepts to learn about private SQL areas and shared SQL
areas
• Oracle Database Performance Tuning Guide to learn how to configure the
shared pool
• Oracle Database Concepts to learn about latches

3.1.2 SQL Optimization


During optimization, Oracle Database must perform a hard parse at least once for every
unique DML statement and performs the optimization during this parse.
The database does not optimize DDL. The only exception is when the DDL includes a DML
component such as a subquery that requires optimization.

3.1.3 SQL Row Source Generation


The row source generator is software that receives the optimal execution plan from the
optimizer and produces an iterative execution plan that is usable by the rest of the database.
The iterative plan is a binary program that, when executed by the SQL engine, produces the
result set. The plan takes the form of a combination of steps. Each step returns a row set.
The next step either uses the rows in this set, or the last step returns the rows to the
application issuing the SQL statement.
A row source is a row set returned by a step in the execution plan along with a control
structure that can iteratively process the rows. The row source can be a table, view, or result
of a join or grouping operation.
The row source generator produces a row source tree, which is a collection of row sources.
The row source tree shows the following information:


• An ordering of the tables referenced by the statement
• An access method for each table mentioned in the statement
• A join method for tables affected by join operations in the statement
• Data operations such as filter, sort, or aggregation
Example 3-1 Execution Plan
This example shows the execution plan of a SELECT statement when AUTOTRACE is
enabled. The statement selects the last name, job title, and department name for all
employees whose last names begin with the letter A. The execution plan for this
statement is the output of the row source generator.

SELECT e.last_name, j.job_title, d.department_name
FROM   hr.employees e, hr.departments d, hr.jobs j
WHERE  e.department_id = d.department_id
AND    e.job_id = j.job_id
AND    e.last_name LIKE 'A%';

Execution Plan
----------------------------------------------------------
Plan hash value: 975837011

-----------------------------------------------------------------------------------------
| Id | Operation                      | Name        | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |             |    3 |   189 |     7 (15) | 00:00:01 |
|* 1 |  HASH JOIN                     |             |    3 |   189 |     7 (15) | 00:00:01 |
|* 2 |   HASH JOIN                    |             |    3 |   141 |     5 (20) | 00:00:01 |
|  3 |    TABLE ACCESS BY INDEX ROWID | EMPLOYEES   |    3 |    60 |     2  (0) | 00:00:01 |
|* 4 |     INDEX RANGE SCAN           | EMP_NAME_IX |    3 |       |     1  (0) | 00:00:01 |
|  5 |    TABLE ACCESS FULL           | JOBS        |   19 |   513 |     2  (0) | 00:00:01 |
|  6 |   TABLE ACCESS FULL            | DEPARTMENTS |   27 |   432 |     2  (0) | 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
   2 - access("E"."JOB_ID"="J"."JOB_ID")
   4 - access("E"."LAST_NAME" LIKE 'A%')
       filter("E"."LAST_NAME" LIKE 'A%')
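You can also obtain a plan without enabling AUTOTRACE. One common approach, shown here as a sketch, runs the query and then displays the plan of the last statement executed in the session by using the DBMS_XPLAN package:

```sql
SELECT e.last_name, j.job_title, d.department_name
FROM   hr.employees e, hr.departments d, hr.jobs j
WHERE  e.department_id = d.department_id
AND    e.job_id = j.job_id
AND    e.last_name LIKE 'A%';

-- Display the execution plan of the last statement run in this session
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
```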


3.1.4 SQL Execution


During execution, the SQL engine executes each row source in the tree produced by the row
source generator. This step is the only mandatory step in DML processing.
Figure 3-3 is an execution tree, also called a parse tree, that shows the flow of row sources
from one step to another in the plan in Example 3-1. In general, the order of the steps in
execution is the reverse of the order in the plan, so you read the plan from the bottom up.
Each step in an execution plan has an ID number. The numbers in Figure 3-3 correspond to
the Id column in the plan shown in Example 3-1. Initial spaces in the Operation column of
the plan indicate hierarchical relationships. For example, if the name of an operation is
preceded by two spaces, then this operation is a child of an operation preceded by one
space. Operations preceded by one space are children of the SELECT statement itself.

Figure 3-3 Row Source Tree

1 HASH JOIN
+-- 2 HASH JOIN
|   +-- 3 TABLE ACCESS BY INDEX ROWID employees
|   |   +-- 4 INDEX RANGE SCAN emp_name_ix
|   +-- 5 TABLE ACCESS FULL jobs
+-- 6 TABLE ACCESS FULL departments

In Figure 3-3, each node of the tree acts as a row source, which means that each step of the
execution plan in Example 3-1 either retrieves rows from the database or accepts rows from
one or more row sources as input. The SQL engine executes each row source as follows:


• Steps indicated by the black boxes physically retrieve data from an object in the
database. These steps are the access paths, or techniques for retrieving data from
the database.
– Step 6 uses a full table scan to retrieve all rows from the departments table.
– Step 5 uses a full table scan to retrieve all rows from the jobs table.
– Step 4 scans the emp_name_ix index in order, looking for each key that begins
with the letter A and retrieving the corresponding rowid. For example, the rowid
corresponding to Atkinson is AAAPzRAAFAAAABSAAe.
– Step 3 retrieves from the employees table the rows whose rowids were
returned by Step 4. For example, the database uses rowid
AAAPzRAAFAAAABSAAe to retrieve the row for Atkinson.
• Steps indicated by the clear boxes operate on row sources.
– Step 2 performs a hash join, accepting row sources from Steps 3 and 5,
joining each row from the Step 5 row source to its corresponding row in Step
3, and returning the resulting rows to Step 1.
For example, the row for employee Atkinson is associated with the job name
Stock Clerk.
– Step 1 performs another hash join, accepting row sources from Steps 2 and 6,
joining each row from the Step 6 source to its corresponding row in Step 2,
and returning the result to the client.
For example, the row for employee Atkinson is associated with the
department named Shipping.
In some execution plans the steps are iterative and in others sequential. The hash join
shown in Example 3-1 is sequential. The database completes the steps in their entirety
based on the join order. The database starts with the index range scan of
emp_name_ix. Using the rowids that it retrieves from the index, the database reads the
matching rows in the employees table, and then scans the jobs table. After it retrieves
the rows from the jobs table, the database performs the hash join.

During execution, the database reads the data from disk into memory if the data is not
in memory. The database also takes out any locks and latches necessary to ensure
data integrity and logs any changes made during the SQL execution. The final stage of
processing a SQL statement is closing the cursor.

3.2 How Oracle Database Processes DML


Most DML statements have a query component. In a query, execution of a cursor
places the results of the query into a set of rows called the result set.
This section contains the following topics:

3.2.1 How Row Sets Are Fetched


Result set rows can be fetched either a row at a time or in groups.
In the fetch stage, the database selects rows and, if requested by the query, orders the
rows. Each successive fetch retrieves another row of the result until the last row has
been fetched.


In general, the database cannot determine for certain the number of rows to be retrieved by a
query until the last row is fetched. Oracle Database retrieves the data in response to fetch
calls, so that the more rows the database reads, the more work it performs. For some queries
the database returns the first row as quickly as possible, whereas for others it creates the
entire result set before returning the first row.
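For example, a query that sorts its entire output typically cannot return the first row until all matching rows have been read, whereas a query with a row-limiting clause can often stop fetching early. The following statements, a sketch against the standard hr schema, illustrate the contrast:

```sql
-- The ORDER BY may force a sort of all matching rows before the
-- first row is returned to the client.
SELECT last_name FROM hr.employees ORDER BY hire_date;

-- A row-limiting clause lets the database stop producing rows
-- as soon as ten have been fetched.
SELECT last_name FROM hr.employees FETCH FIRST 10 ROWS ONLY;
```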

3.2.2 Read Consistency


In general, a query retrieves data by using the Oracle Database read consistency
mechanism, which guarantees that all data blocks read by a query are consistent to a single
point in time.
Read consistency uses undo data to show past versions of data. For example, suppose a
query must read 100 data blocks in a full table scan. The query processes the first 10 blocks
while DML in a different session modifies block 75. When the first session reaches block 75, it
realizes the change and uses undo data to retrieve the old, unmodified version of the data
and construct a noncurrent version of block 75 in memory.

See Also:
Oracle Database Concepts to learn about multiversion read consistency

3.2.3 Data Changes


DML statements that must change data use read consistency to retrieve only the data that
matched the search criteria when the modification began.
Afterward, these statements retrieve the data blocks as they exist in their current state and
make the required modifications. The database must perform other actions related to the
modification of the data such as generating redo and undo data.

3.3 How Oracle Database Processes DDL


Oracle Database processes DDL differently from DML.
For example, when you create a table, the database does not optimize the CREATE TABLE
statement. Instead, Oracle Database parses the DDL statement and carries out the
command.
The database processes DDL differently because it is a means of defining an object in the
data dictionary. Typically, Oracle Database must parse and execute many recursive SQL
statements to execute a DDL statement. Suppose you create a table as follows:

CREATE TABLE mytable (mycolumn INTEGER);

Typically, the database would run dozens of recursive statements to execute the preceding
statement. The recursive SQL would perform actions such as the following:
• Issue a COMMIT before executing the CREATE TABLE statement
• Verify that user privileges are sufficient to create the table
• Determine which tablespace the table should reside in


• Ensure that the tablespace quota has not been exceeded
• Ensure that no object in the schema has the same name
• Insert rows that define the table into the data dictionary
• Issue a COMMIT if the DDL statement succeeded or a ROLLBACK if it did not

See Also:
Oracle Database Development Guide to learn about processing DDL,
transaction control, and other types of statements

4
Query Optimizer Concepts
This chapter describes the most important concepts relating to the query optimizer, including
its principal components.
This chapter contains the following topics:

4.1 Introduction to the Query Optimizer


The query optimizer (called simply the optimizer) is built-in database software that
determines the most efficient method for a SQL statement to access requested data.
This section contains the following topics:

4.1.1 Purpose of the Query Optimizer


The optimizer attempts to generate the optimal execution plan for a SQL statement.
The optimizer chooses the plan with the lowest cost among all considered candidate plans.
The optimizer uses available statistics to calculate cost. For a specific query in a given
environment, the cost computation accounts for factors of query execution such as I/O, CPU,
and communication.
For example, a query might request information about employees who are managers. If the
optimizer statistics indicate that 80% of employees are managers, then the optimizer may
decide that a full table scan is most efficient. However, if statistics indicate that very few
employees are managers, then reading an index followed by a table access by rowid may be
more efficient than a full table scan.
Because the database has many internal statistics and tools at its disposal, the optimizer is
usually in a better position than the user to determine the optimal method of statement
execution. For this reason, all SQL statements use the optimizer.
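You can ask the optimizer for its chosen plan without executing the statement by using EXPLAIN PLAN. The following sketch shows the pattern; the predicate value is illustrative:

```sql
-- Ask the optimizer for its chosen plan without executing the query
EXPLAIN PLAN FOR
  SELECT * FROM hr.employees WHERE job_id = 'AD_VP';

-- Display the plan from the plan table
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Depending on the statistics, the displayed plan might show an index range scan on the job_id index rather than a full table scan.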

4.1.2 Cost-Based Optimization


Query optimization is the process of choosing the most efficient means of executing a SQL
statement.
SQL is a nonprocedural language, so the optimizer is free to merge, reorganize, and process
a query in any order. The database optimizes each SQL statement based on statistics collected about
the accessed data. The optimizer determines the optimal plan for a SQL statement by
examining multiple access methods, such as full table scan or index scans, different join
methods such as nested loops and hash joins, different join orders, and possible
transformations.
For a given query and environment, the optimizer assigns a relative numerical cost to each
step of a possible plan, and then factors these values together to generate an overall cost
estimate for the plan. After calculating the costs of alternative plans, the optimizer chooses
the plan with the lowest cost estimate. For this reason, the optimizer is sometimes called the
cost-based optimizer (CBO) to contrast it with the legacy rule-based optimizer (RBO).


Note:
The optimizer may not make the same decisions from one version of Oracle
Database to the next. In recent versions, the optimizer might make different
decisions because better information is available and more optimizer
transformations are possible.

4.1.3 Execution Plans


An execution plan describes a recommended method of execution for a SQL
statement.
The plan shows the combination of the steps Oracle Database uses to execute a SQL
statement. Each step either retrieves rows of data physically from the database or
prepares them for the user issuing the statement.
An execution plan displays the cost of the entire plan, indicated on line 0, and each
separate operation. The cost is an internal unit that the execution plan only displays to
allow for plan comparisons. Thus, you cannot tune or change the cost value.
In the following graphic, the optimizer generates two possible execution plans for an
input SQL statement, uses statistics to estimate their costs, compares their costs, and
then chooses the plan with the lowest cost.

Figure 4-1 Execution Plans

The figure shows the optimizer receiving a parsed representation of a SQL statement as
input. Using statistics, the optimizer generates multiple candidate plans, such as Plan 1,
which joins with nested loops (NL), and Plan 2, which joins with hash joins (HJ), and
compares their costs. The output is the final plan with the lowest cost, in this case
Plan 2.

This section contains the following topics:

4.1.3.1 Query Blocks


The input to the optimizer is a parsed representation of a SQL statement.


Each SELECT block in the original SQL statement is represented internally by a query block. A
query block can be a top-level statement, subquery, or unmerged view.
Example 4-1 Query Blocks
The following SQL statement consists of two query blocks. The subquery in parentheses is
the inner query block. The outer query block, which is the rest of the SQL statement, retrieves
names of employees in the departments whose IDs were supplied by the subquery. The
query form determines how query blocks are interrelated.

SELECT first_name, last_name
FROM   hr.employees
WHERE  department_id
       IN (SELECT department_id
           FROM   hr.departments
           WHERE  location_id = 1800);

See Also:

• "View Merging"
• Oracle Database Concepts for an overview of SQL processing

4.1.3.2 Query Subplans


For each query block, the optimizer generates a query subplan.
The database optimizes query blocks separately from the bottom up. Thus, the database
optimizes the innermost query block first and generates a subplan for it, and then generates
the outer query block representing the entire query.
The number of possible plans for a query block is proportional to the number of objects in the
FROM clause. This number rises exponentially with the number of objects. For example, the
number of possible plans for a join of five tables is significantly higher than for a join
of two tables.

4.1.3.3 Analogy for the Optimizer


One analogy for the optimizer is an online trip advisor.
A cyclist wants to know the most efficient bicycle route from point A to point B. A query is like
the directive "I need the most efficient route from point A to point B" or "I need the most
efficient route from point A to point B by way of point C." The trip advisor uses an internal
algorithm, which relies on factors such as speed and difficulty, to determine the most efficient
route. The cyclist can influence the trip advisor's decision by using directives such as "I want
to arrive as fast as possible" or "I want the easiest ride possible."
In this analogy, an execution plan is a possible route generated by the trip advisor. Internally,
the advisor may divide the overall route into several subroutes (subplans), and calculate the
efficiency for each subroute separately. For example, the trip advisor may estimate one
subroute at 15 minutes with medium difficulty, an alternative subroute at 22 minutes with
minimal difficulty, and so on.


The advisor picks the most efficient (lowest cost) overall route based on user-specified
goals and the available statistics about roads and traffic conditions. The more accurate
the statistics, the better the advice. For example, if the advisor is not frequently notified
of traffic jams, road closures, and poor road conditions, then the recommended route
may turn out to be inefficient (high cost).

4.2 About Optimizer Components


The optimizer contains three components: the transformer, estimator, and plan
generator.
The following graphic illustrates the components.

Figure 4-2 Optimizer Components

The figure shows the parsed query arriving from the parser at the query transformer,
which passes the transformed query to the estimator. The estimator retrieves statistics
from the data dictionary, and then passes the query and its estimates to the plan
generator. The plan generator sends the query plan to the row source generator.

A set of query blocks represents a parsed query, which is the input to the optimizer.
The following table describes the optimizer operations.

Table 4-1 Optimizer Operations

Phase Operation Description To Learn More


1 Query Transformer The optimizer determines whether it is helpful "Query
to change the form of the query so that the Transformer"
optimizer can generate a better execution
plan.
2 Estimator The optimizer estimates the cost of each plan "Estimator"
based on statistics in the data dictionary.

3 Plan Generator The optimizer compares the costs of plans "Plan Generator"
and chooses the lowest-cost plan, known as
the execution plan, to pass to the row source
generator.

This section contains the following topics:

4.2.1 Query Transformer


For some statements, the query transformer determines whether it is advantageous to rewrite
the original SQL statement into a semantically equivalent SQL statement with a lower cost.
When a viable alternative exists, the database calculates the cost of the alternatives
separately and chooses the lowest-cost alternative. The following graphic shows the query
transformer rewriting an input query that uses OR into an output query that uses UNION ALL.

Figure 4-3 Query Transformer

SELECT *
FROM sales
WHERE promo_id=33
OR prod_id=136;

Query Transformer

SELECT *
FROM sales
WHERE prod_id=136
UNION ALL
SELECT *
FROM sales
WHERE promo_id=33
AND LNNVL(prod_id=136);

4.2.2 Estimator
The estimator is the component of the optimizer that determines the overall cost of a given
execution plan.
The estimator uses three different measures to determine cost:
• Selectivity
The percentage of rows in the row set that the query selects, with 0 meaning no rows and
1 meaning all rows. Selectivity is tied to a query predicate, such as WHERE last_name


LIKE 'A%', or a combination of predicates. A predicate becomes more selective
as the selectivity value approaches 0 and less selective (or more unselective) as
the value approaches 1.

Note:
Selectivity is an internal calculation that is not visible in the execution
plans.

• Cardinality
The cardinality is the number of rows returned by each operation in an execution
plan. This input, which is crucial to obtaining an optimal plan, is common to all cost
functions. The estimator can derive cardinality from the table statistics collected by
DBMS_STATS, or derive it after accounting for effects from predicates (filter, join, and
so on), DISTINCT or GROUP BY operations, and so on. The Rows column in an
execution plan shows the estimated cardinality.
• Cost
This measure represents units of work or resource used. The query optimizer uses
disk I/O, CPU usage, and memory usage as units of work.
As shown in the following graphic, if statistics are available, then the estimator uses
them to compute the measures. The statistics improve the degree of accuracy of the
measures.

Figure 4-4 Estimator

The figure shows the estimator receiving a candidate plan and statistics as input,
computing the selectivity, cardinality, and cost measures, and producing the total cost
of the plan as output.

For the query shown in Example 4-1, the estimator uses selectivity, estimated
cardinality (a total return of 10 rows), and cost measures to produce its total cost
estimate of 3:

---------------------------------------------------------------------------------
| Id | Operation                     | Name              |Rows|Bytes|Cost %CPU| Time    |
---------------------------------------------------------------------------------
|  0 | SELECT STATEMENT              |                   | 10 | 250 | 3  (0) | 00:00:01|
|  1 |  NESTED LOOPS                 |                   |    |     |        |         |
|  2 |   NESTED LOOPS                |                   | 10 | 250 | 3  (0) | 00:00:01|
|* 3 |    TABLE ACCESS FULL          | DEPARTMENTS       |  1 |   7 | 2  (0) | 00:00:01|
|* 4 |    INDEX RANGE SCAN           | EMP_DEPARTMENT_IX | 10 |     | 0  (0) | 00:00:01|
|  5 |   TABLE ACCESS BY INDEX ROWID | EMPLOYEES         | 10 | 180 | 1  (0) | 00:00:01|
---------------------------------------------------------------------------------

This section contains the following topics:

4.2.2.1 Selectivity
The selectivity represents a fraction of rows from a row set.
The row set can be a base table, a view, or the result of a join. The selectivity is tied to a
query predicate, such as last_name = 'Smith', or a combination of predicates, such as
last_name = 'Smith' AND job_id = 'SH_CLERK'.

Note:
Selectivity is an internal calculation that is not visible in execution plans.

A predicate filters a specific number of rows from a row set. Thus, the selectivity of a
predicate indicates how many rows pass the predicate test. Selectivity ranges from 0.0 to 1.0.
A selectivity of 0.0 means that no rows are selected from a row set, whereas a selectivity of
1.0 means that all rows are selected. A predicate becomes more selective as the value
approaches 0.0 and less selective (or more unselective) as the value approaches 1.0.
The optimizer estimates selectivity depending on whether statistics are available:
• Statistics not available
Depending on the value of the OPTIMIZER_DYNAMIC_SAMPLING initialization parameter, the
optimizer either uses dynamic statistics or an internal default value. The database uses
different internal defaults depending on the predicate type. For example, the internal
default for an equality predicate (last_name = 'Smith') is lower than for a range
predicate (last_name > 'Smith') because an equality predicate is expected to return a
smaller fraction of rows.
• Statistics available
When statistics are available, the estimator uses them to estimate selectivity. Assume
there are 150 distinct employee last names. For an equality predicate last_name =
'Smith', selectivity is the reciprocal of the number n of distinct values of last_name,
which in this example is .006 because the query selects rows that contain 1 out of 150
distinct values.
If a histogram exists on the last_name column, then the estimator uses the histogram
instead of the number of distinct values. The histogram captures the distribution of
different values in a column, so it yields better selectivity estimates, especially for
columns that have data skew.
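As an illustration, and assuming statistics have been gathered on hr.employees, you can view the number of distinct values that underlies the 1/n selectivity estimate for an equality predicate:

```sql
-- If NUM_DISTINCT is 150, then the estimated selectivity of the
-- predicate last_name = 'Smith' without a histogram is 1/150 = .0067.
SELECT num_distinct, ROUND(1 / num_distinct, 4) AS est_selectivity
FROM   user_tab_col_statistics
WHERE  table_name  = 'EMPLOYEES'
AND    column_name = 'LAST_NAME';
```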


See Also:

• "Histograms "
• Oracle Database Reference to learn more about
OPTIMIZER_DYNAMIC_SAMPLING

4.2.2.2 Cardinality
The cardinality is the number of rows returned by each operation in an execution
plan.
For example, if the optimizer estimate for the number of rows returned by a full table
scan is 100, then the cardinality estimate for this operation is 100. The cardinality
estimate appears in the Rows column of the execution plan.

The optimizer determines the cardinality for each operation based on a complex set of
formulas that use both table and column level statistics, or dynamic statistics, as input.
The optimizer uses one of the simplest formulas when a single equality predicate
appears in a single-table query, with no histogram. In this case, the optimizer assumes
a uniform distribution and calculates the cardinality for the query by dividing the total
number of rows in the table by the number of distinct values in the column used in the
WHERE clause predicate.

For example, user hr queries the employees table as follows:

SELECT first_name, last_name
FROM   employees
WHERE  salary='10200';

The employees table contains 107 rows. The current database statistics indicate that
the number of distinct values in the salary column is 58. Therefore, the optimizer
estimates the cardinality of the result set as 2, rounding up the result of the formula 107/58=1.84.

Cardinality estimates must be as accurate as possible because they influence all
aspects of the execution plan. Cardinality is important when the optimizer determines
the cost of a join. For example, in a nested loops join of the employees and
departments tables, the number of rows in employees determines how often the
database must probe the departments table. Cardinality is also important for
determining the cost of sorts.

4.2.2.3 Cost
The optimizer cost model accounts for the machine resources that a query is
predicted to use.
The cost is an internal numeric measure that represents the estimated resource usage
for a plan. The cost is specific to a query in an optimizer environment. To estimate
cost, the optimizer considers factors such as the following:
• System resources, which includes estimated I/O, CPU, and memory
• Estimated number of rows returned (cardinality)


• Size of the initial data sets
• Distribution of the data
• Access structures
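
As an illustration only (the real cost model is internal and undocumented, as the note below explains), a cost function of this general shape combines I/O and CPU estimates and expresses the result in units of single-block reads. All constants here are invented:

```python
# Illustrative cost function (NOT Oracle's internal formula): combine
# single-block reads, multiblock reads, and CPU cycles, and express the
# total in units of one single-block read time.
def plan_cost(sblock_reads, mblock_reads, cpu_cycles,
              sreadtim_ms=10.0, mreadtim_ms=30.0, cycles_per_ms=1_000_000):
    io_ms = sblock_reads * sreadtim_ms + mblock_reads * mreadtim_ms
    cpu_ms = cpu_cycles / cycles_per_ms
    return (io_ms + cpu_ms) / sreadtim_ms

# Two hypothetical plans for the same query: the optimizer keeps the cheaper.
cost_a = plan_cost(100, 10, 5_000_000)   # 130.5
cost_b = plan_cost(10, 50, 20_000_000)   # 162.0
print(min(cost_a, cost_b))  # 130.5
```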

Note:
The cost is an internal measure that the optimizer uses to compare different plans
for the same query. You cannot tune or change cost.

The execution time is a function of the cost, but cost does not equate directly to time. For
example, if the plan for query A has a lower cost than the plan for query B, then the following
outcomes are possible:
• A executes faster than B.
• A executes slower than B.
• A executes in the same amount of time as B.
Therefore, you cannot compare the costs of different queries with one another. Also, you
cannot compare the costs of semantically equivalent queries that use different optimizer
modes.

4.2.3 Plan Generator


The plan generator explores various plans for a query block by trying out different access
paths, join methods, and join orders.
Many plans are possible because of the various combinations that the database can use to
produce the same result. The optimizer picks the plan with the lowest cost.
The following graphic shows the optimizer testing different plans for an input query.


Figure 4-5 Plan Generator

[The figure shows the optimizer processing the input query SELECT e.last_name,
d.department_name FROM hr.employees e, hr.departments d WHERE e.department_id =
d.department_id. After query transformation, the plan generator tries different join
orders (departments then employees, and employees then departments), access paths
(full table scan, index), and join methods (nested loops, sort merge, hash) before
settling on the lowest cost plan.]

The following snippet from an optimizer trace file shows some computations that the
optimizer performs:

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]: DEPARTMENTS[D]#0 EMPLOYEES[E]#1

***************
Now joining: EMPLOYEES[E]#1
***************
NL Join
Outer table: Card: 27.00 Cost: 2.01 Resp: 2.01 Degree: 1 Bytes: 16
Access path analysis for EMPLOYEES
. . .
Best NL cost: 13.17
. . .
SM Join
SM cost: 6.08
resc: 6.08 resc_io: 4.00 resc_cpu: 2501688
resp: 6.08 resp_io: 4.00 resp_cpu: 2501688
. . .
SM Join (with index on outer)
Access Path: index (FullScan)
. . .
HA Join


HA cost: 4.57
resc: 4.57 resc_io: 4.00 resc_cpu: 678154
resp: 4.57 resp_io: 4.00 resp_cpu: 678154
Best:: JoinMethod: Hash
Cost: 4.57 Degree: 1 Resp: 4.57 Card: 106.00 Bytes: 27
. . .

***********************
Join order[2]: EMPLOYEES[E]#1 DEPARTMENTS[D]#0
. . .

***************
Now joining: DEPARTMENTS[D]#0
***************
. . .
HA Join
HA cost: 4.58
resc: 4.58 resc_io: 4.00 resc_cpu: 690054
resp: 4.58 resp_io: 4.00 resp_cpu: 690054
Join order aborted: cost > best plan cost
***********************

The trace file shows the optimizer first trying the departments table as the outer table in the
join. The optimizer calculates the cost for three different join methods: nested loops join (NL),
sort merge (SM), and hash join (HA). The optimizer picks the hash join as the most efficient
method:

Best:: JoinMethod: Hash
Cost: 4.57 Degree: 1 Resp: 4.57 Card: 106.00 Bytes: 27

The optimizer then tries a different join order, using employees as the outer table. This join
order costs more than the previous join order, so it is abandoned.
The optimizer uses an internal cutoff to reduce the number of plans it tries when finding the
lowest-cost plan. The cutoff is based on the cost of the current best plan. If the current best
cost is large, then the optimizer explores alternative plans to find a lower cost plan. If the
current best cost is small, then the optimizer ends the search swiftly because further cost
improvement is not significant.
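
The cutoff behavior visible in the trace ("Join order aborted: cost > best plan cost") amounts to branch-and-bound pruning. This sketch uses the trace's numbers for the first join order and invented numbers for the second:

```python
# Sketch of cost-based pruning during the join-order search: abandon a join
# order as soon as its accumulated cost reaches the best complete plan found
# so far ("Join order aborted: cost > best plan cost" in the trace).
def search_join_orders(orders):
    """orders: list of join orders, each a list of per-step costs."""
    best_cost = float("inf")
    best_order = None
    for idx, step_costs in enumerate(orders):
        total = 0.0
        aborted = False
        for cost in step_costs:
            total += cost
            if total >= best_cost:  # no point finishing this join order
                aborted = True
                break
        if not aborted:
            best_cost, best_order = total, idx
    return best_order, best_cost

# Order 0 (departments first) completes at 4.57; order 1 is abandoned mid-search.
order, cost = search_join_orders([[2.01, 2.56], [3.0, 2.0]])
print(order, round(cost, 2))  # 0 4.57
```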

4.3 About Automatic Tuning Optimizer


The optimizer performs different operations depending on how it is invoked.
The database provides the following types of optimization:
• Normal optimization
The optimizer compiles the SQL and generates an execution plan. The normal mode
generates a reasonable plan for most SQL statements. Under normal mode, the
optimizer operates with strict time constraints, usually a fraction of a second, during
which it must find an optimal plan.
• SQL Tuning Advisor optimization
When SQL Tuning Advisor invokes the optimizer, the optimizer is known as Automatic
Tuning Optimizer. In this case, the optimizer performs additional analysis to further
improve the plan produced in normal mode. The optimizer output is not an
execution plan, but a series of actions, along with their rationale and expected
execution plan, but a series of actions, along with their rationale and expected
benefit for producing a significantly better plan.

See Also:

• "Analyzing SQL with SQL Tuning Advisor"


• Oracle Database 2 Day + Performance Tuning Guide to learn more
about SQL Tuning Advisor

4.4 About Adaptive Query Optimization


In Oracle Database, adaptive query optimization enables the optimizer to make run-
time adjustments to execution plans and discover additional information that can lead
to better statistics.
Adaptive optimization is helpful when existing statistics are not sufficient to generate
an optimal plan. The following graphic shows the feature set for adaptive query
optimization.

Figure 4-6 Adaptive Query Optimization

[The figure shows adaptive query optimization dividing into two feature sets:
adaptive plans, which comprise join methods, parallel distribution methods, and
bitmap index pruning; and adaptive statistics, which comprise dynamic statistics,
automatic reoptimization, and SQL plan directives.]

This section contains the following topics:

4.4.1 Adaptive Query Plans


An adaptive query plan enables the optimizer to make a plan decision for a
statement during execution.
Adaptive query plans enable the optimizer to fix some classes of problems at run time.
Adaptive plans are enabled by default.
This section contains the following topics:


4.4.1.1 About Adaptive Query Plans


An adaptive query plan contains multiple predetermined subplans, and an optimizer statistics
collector. Based on the statistics collected during execution, the dynamic plan coordinator
chooses the best plan at run time.

Dynamic Plans
To change plans at runtime, adaptive query plans use a dynamic plan, which is represented
as a set of subplan groups. A subplan group is a set of subplans. A subplan is a portion of a
plan that the optimizer can switch to as an alternative at run time. For example, a nested
loops join could switch to a hash join during execution.
The optimizer decides which subplan to use at run time. When notified of a new statistic
value relevant to a subplan group, the coordinator dispatches it to the handler function for
this subplan group.

Figure 4-7 Dynamic Plan Coordinator

[The figure shows a dynamic plan coordinator managing subplan groups. Each group
holds alternative subplans, such as a group-by over nested loops joins and a group-by
over hash joins, and the coordinator selects among them at run time.]
Optimizer Statistics Collector


An optimizer statistics collector is a row source inserted into a plan at key points to collect
run-time statistics relating to cardinality and histograms. These statistics help the optimizer
make a final decision between multiple subplans. The collector also supports optional
buffering up to an internal threshold.
For parallel buffering statistics collectors, each parallel execution server collects the statistics,
which the parallel query coordinator aggregates and then sends to the clients. In this context,
a client is a consumer of the collected statistics, such as a dynamic plan. Each client
specifies a callback function to be executed on each parallel server or on the query
coordinator.

4.4.1.2 Purpose of Adaptive Query Plans


The ability of the optimizer to adapt a plan, based on statistics obtained during execution, can
greatly improve query performance.
Adaptive query plans are useful because the optimizer occasionally picks a suboptimal
default plan because of a cardinality misestimate. The ability of the optimizer to pick the
best plan at run time based on actual execution statistics results in a better final
plan. After choosing the final plan, the optimizer uses it for subsequent executions,
thus ensuring that the suboptimal plan is not reused.

4.4.1.3 How Adaptive Query Plans Work


For the first execution of a statement, the optimizer uses the default plan, and then
stores an adaptive plan. The database uses the adaptive plan for subsequent
executions unless specific conditions are met.
During the first execution of a statement, the database performs the following steps:
1. The database begins executing the statement using the default plan.
2. The statistics collector gathers information about the in-progress execution, and
buffers some rows received by the subplan.
For parallel buffering statistics collectors, each slave process collects the statistics,
which the query coordinator aggregates before sending to the clients.
3. Based on the statistics gathered by the collector, the optimizer chooses a subplan.
The dynamic plan coordinator decides which subplan to use at runtime for all such
subplan groups. When notified of a new statistic value relevant to a subplan group,
the coordinator dispatches it to the handler function for this subgroup.
4. The collector stops collecting statistics and buffering rows, permitting rows to pass
through instead.
5. The database stores the adaptive plan in the child cursor, so that the next
execution of the statement can use it.
On subsequent executions of the child cursor, the optimizer continues to use the same
adaptive plan unless one of the following conditions is true, in which case it picks a
new plan for the current execution:
• The current plan ages out of the shared pool.
• A different optimizer feature (for example, adaptive cursor sharing or statistics
feedback) invalidates the current plan.
This section contains the following topics:

4.4.1.3.1 Adaptive Query Plans: Join Method Example


This example shows how the optimizer can choose a different plan based on
information collected at runtime.
The following query shows a join of the order_items and prod_info tables.

SELECT product_name
FROM order_items o, prod_info p
WHERE o.unit_price = 15
AND quantity > 1
AND p.product_id = o.product_id


An adaptive query plan for this statement shows two possible plans, one with a nested loops
join and the other with a hash join:

SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(FORMAT => 'ADAPTIVE'));

SQL_ID 7hj8dwwy6gm7p, child number 0


-------------------------------------
SELECT product_name FROM order_items o, prod_info p WHERE
o.unit_price = 15 AND quantity > 1 AND p.product_id = o.product_id

Plan hash value: 1553478007

-----------------------------------------------------------------------------
| Id | Operation | Name |Rows|Bytes|Cost (%CPU)|Time|
-----------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | |7(100)| |
| * 1| HASH JOIN | |4| 128 | 7 (0)|00:00:01|
|- 2| NESTED LOOPS | |4| 128 | 7 (0)|00:00:01|
|- 3| NESTED LOOPS | |4| 128 | 7 (0)|00:00:01|
|- 4| STATISTICS COLLECTOR | | | | | |
| * 5| TABLE ACCESS FULL | ORDER_ITEMS |4| 48 | 3 (0)|00:00:01|
|-* 6| INDEX UNIQUE SCAN | PROD_INFO_PK |1| | 0 (0)| |
|- 7| TABLE ACCESS BY INDEX ROWID| PROD_INFO |1| 20 | 1 (0)|00:00:01|
| 8| TABLE ACCESS FULL | PROD_INFO |1| 20 | 1 (0)|00:00:01|
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

1 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID")
5 - filter(("O"."UNIT_PRICE"=15 AND "QUANTITY">1))
6 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID")

Note
-----
- this is an adaptive plan (rows marked '-' are inactive)

A nested loops join is preferable if the database can avoid scanning a significant portion of
prod_info because its rows are filtered by the join predicate. If few rows are filtered,
however, then scanning the right table in a hash join is preferable.
The following graphic shows the adaptive process. For the query in the preceding example,
the adaptive portion of the default plan contains two subplans, each of which uses a different
join method. The optimizer automatically determines when each join method is optimal,
depending on the cardinality of the left side of the join.
The statistics collector buffers enough rows coming from the order_items table to determine
which join method to use. If the row count is below the threshold determined by the optimizer,
then the optimizer chooses the nested loops join; otherwise, the optimizer chooses the hash
join. In this case, the row count coming from the order_items table is above the threshold, so
the optimizer chooses a hash join for the final plan, and disables buffering.
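
The buffering decision can be sketched like this; the threshold value and row source are made up, and the real collector operates on row batches inside the plan rather than a Python iterator:

```python
# Sketch of the statistics collector's decision: buffer rows from the
# driving (left) side up to a threshold chosen by the optimizer. Below the
# threshold, keep the nested loops join; above it, switch to a hash join.
def choose_join_method(row_source, threshold):
    buffered = []
    for row in row_source:
        buffered.append(row)
        if len(buffered) > threshold:
            return "HASH JOIN", buffered   # threshold exceeded: subplan switches
    return "NESTED LOOPS", buffered

# Here the row count from the driving table exceeds the threshold.
method, rows = choose_join_method(iter(range(100)), threshold=10)
print(method)  # HASH JOIN
```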


Figure 4-8 Adaptive Join Methods

[The figure shows the adaptive process. In the default plan, a statistics collector sits
above the scan of order_items, feeding either a nested loops join (using an index scan
of prod_info_pk) or a hash join (using a table scan of prod_info). The optimizer buffers
rows coming from the order_items table up to a point: if the row count is less than the
threshold, it uses a nested loops join; otherwise it switches to a hash join. Here the
threshold is exceeded, so the subplan switches. After making the decision, the optimizer
disables the statistics collector and lets the rows pass through; the final plan is a
hash join.]

The Note section of the execution plan indicates whether the plan is adaptive, and
which rows in the plan are inactive.

See Also:

• "Controlling Adaptive Optimization"


• "Reading Execution Plans: Advanced" for an extended example showing
an adaptive query plan


4.4.1.3.2 Adaptive Query Plans: Parallel Distribution Methods


Typically, parallel execution requires data redistribution to perform operations such as parallel
sorts, aggregations, and joins.
Oracle Database can use many different data distribution methods. The database chooses
the method based on the number of rows to be distributed and the number of parallel server
processes in the operation.
For example, consider the following alternative cases:
• Many parallel server processes distribute few rows.
The database may choose the broadcast distribution method. In this case, each parallel
server process receives each row in the result set.
• Few parallel server processes distribute many rows.
If data skew is encountered during the data redistribution, then it could adversely affect
the performance of the statement. The database is more likely to pick a hash distribution
to ensure that each parallel server process receives an equal number of rows.
The hybrid hash distribution technique is an adaptive parallel data distribution that does not
decide the final data distribution method until execution time. The optimizer inserts statistics
collectors in front of the parallel server processes on the producer side of the operation. If the
number of rows is less than a threshold, defined as twice the degree of parallelism (DOP),
then the data distribution method switches from hash to broadcast. Otherwise, the distribution
method is hash.
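
The twice-the-DOP rule reduces to a simple comparison, sketched here with the row counts used later in this section:

```python
# Sketch of the hybrid hash rule: if the buffered row count is below
# 2 * DOP, broadcast the small side; otherwise hash-distribute the rows.
def choose_distribution(num_rows: int, dop: int) -> str:
    threshold = 2 * dop
    return "BROADCAST" if num_rows < threshold else "HASH"

print(choose_distribution(num_rows=5, dop=4))   # BROADCAST (5 < 8)
print(choose_distribution(num_rows=27, dop=4))  # HASH (27 >= 8)
```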

Broadcast Distribution
The following graphic depicts a hybrid hash join between the departments and employees
tables, with a query coordinator directing 8 parallel server processes: P5-P8 are producers,
whereas P1-P4 are consumers. Each producer has its own consumer.


Figure 4-9 Adaptive Query with DOP of 4

[The figure shows a query coordinator directing four consumer processes (P1-P4) and
four producer processes (P5-P8). The producers scan the departments and employees
tables, with a statistics collector in front of the departments scan. Because the number
of rows returned is below the threshold of twice the DOP, the optimizer chooses the
broadcast method.]

The database inserts a statistics collector in front of each producer process scanning
the departments table. The query coordinator aggregates the collected statistics. The
distribution method is based on the run-time statistics. In Figure 4-9, the number of
rows is below the threshold (8), which is twice the DOP (4), so the optimizer chooses a
broadcast technique for the departments table.

Hybrid Hash Distribution


Consider an example that returns a greater number of rows. In the following plan, the
threshold is 8, or twice the specified DOP of 4. However, because the statistics
collector (Step 10) discovers that the number of rows (27) is greater than the threshold
(8), the optimizer chooses a hybrid hash distribution rather than a broadcast
distribution. (The time column should show 00:00:01, but shows 0:01 so the plan can
fit the page.)

See Also:
Oracle Database VLDB and Partitioning Guide to learn more about parallel
data redistribution techniques


4.4.1.3.3 Adaptive Query Plans: Bitmap Index Pruning


Adaptive plans prune indexes that do not significantly reduce the number of matched rows.
When the optimizer generates a star transformation plan, it must choose the right
combination of bitmap indexes to reduce the relevant set of rowids as efficiently as possible.
If many indexes exist, some indexes might not reduce the rowid set substantially, but
nevertheless introduce significant processing cost during query execution. Adaptive plans
can solve this problem by not using indexes that degrade performance.
Example 4-2 Bitmap Index Pruning
In this example, you issue the following star query, which joins the cars fact table with
multiple dimension tables (sample output included):

SELECT /*+ star_transformation(r) */ l.color_name, k.make_name,
h.filter_col, count(*)
FROM cars r, colors l, makes k, models d, hcc_tab h
WHERE r.make_id = k.make_id
AND r.color_id = l.color_id
AND r.model_id = d.model_id
AND r.high_card_col = h.high_card_col
AND d.model_name = 'RAV4'
AND k.make_name = 'Toyota'
AND l.color_name = 'Burgundy'
AND h.filter_col = 100
GROUP BY l.color_name, k.make_name, h.filter_col;

COLOR_NA MAKE_N FILTER_COL COUNT(*)
-------- ------ ---------- ----------
Burgundy Toyota 100 15000

The following sample execution plan shows that the query generated no rows for the bitmap
node in Step 12 and Step 17. The adaptive optimizer determined that filtering rows by using
the CAR_MODEL_IDX and CAR_MAKE_IDX indexes was inefficient. The query did not use the
steps in the plan that begin with a dash (-).

-----------------------------------------------------------
| Id | Operation | Name |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT GROUP BY NOSORT | |
| 2 | HASH JOIN | |
| 3 | VIEW | VW_ST_5497B905 |
| 4 | NESTED LOOPS | |
| 5 | BITMAP CONVERSION TO ROWIDS | |
| 6 | BITMAP AND | |
| 7 | BITMAP MERGE | |
| 8 | BITMAP KEY ITERATION | |
| 9 | TABLE ACCESS FULL | COLORS |
| 10 | BITMAP INDEX RANGE SCAN | CAR_COLOR_IDX |
|- 11 | STATISTICS COLLECTOR | |
|- 12 | BITMAP MERGE | |
|- 13 | BITMAP KEY ITERATION | |
|- 14 | TABLE ACCESS FULL | MODELS |
|- 15 | BITMAP INDEX RANGE SCAN | CAR_MODEL_IDX |
|- 16 | STATISTICS COLLECTOR | |
|- 17 | BITMAP MERGE | |
|- 18 | BITMAP KEY ITERATION | |
|- 19 | TABLE ACCESS FULL | MAKES |
|- 20 | BITMAP INDEX RANGE SCAN | CAR_MAKE_IDX |
| 21 | TABLE ACCESS BY USER ROWID | CARS |
| 22 | MERGE JOIN CARTESIAN | |
| 23 | MERGE JOIN CARTESIAN | |
| 24 | MERGE JOIN CARTESIAN | |
| 25 | TABLE ACCESS FULL | MAKES |
| 26 | BUFFER SORT | |
| 27 | TABLE ACCESS FULL | MODELS |
| 28 | BUFFER SORT | |
| 29 | TABLE ACCESS FULL | COLORS |
| 30 | BUFFER SORT | |
| 31 | TABLE ACCESS FULL | HCC_TAB |
-----------------------------------------------------------

Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- star transformation used for this statement
- this is an adaptive plan (rows marked '-' are inactive)

4.4.1.4 When Adaptive Query Plans Are Enabled


Adaptive query plans are enabled by default.
Adaptive plans are enabled when the following initialization parameters are set:
• OPTIMIZER_ADAPTIVE_PLANS is TRUE (default)
• OPTIMIZER_FEATURES_ENABLE is 12.1.0.1 or later
• OPTIMIZER_ADAPTIVE_REPORTING_ONLY is FALSE (default)
Adaptive plans control the following optimizations:
• Nested loops and hash join selection
• Star transformation bitmap pruning
• Adaptive parallel distribution method

See Also:

• "Controlling Adaptive Optimization"


• Oracle Database Reference to learn more about
OPTIMIZER_ADAPTIVE_PLANS


4.4.2 Adaptive Statistics


The optimizer can use adaptive statistics when query predicates are too complex to rely on
base table statistics alone. By default, adaptive statistics are disabled
(OPTIMIZER_ADAPTIVE_STATISTICS is false).

The following topics describe types of adaptive statistics:

4.4.2.1 Dynamic Statistics


Dynamic statistics are an optimization technique in which the database executes a
recursive SQL statement to scan a small random sample of a table's blocks to estimate
predicate cardinalities.
During SQL compilation, the optimizer decides whether to use dynamic statistics by
considering whether available statistics are sufficient to generate an optimal plan. If the
available statistics are insufficient, then the optimizer uses dynamic statistics to augment the
statistics. To improve the quality of optimizer decisions, the optimizer can use dynamic
statistics for table scans, index access, joins, and GROUP BY operations.
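
Conceptually, dynamic sampling estimates a predicate's cardinality by scanning a small random sample and scaling up. This simplified sketch samples rows, whereas the real mechanism samples blocks via recursive SQL; the sampling fraction and data are invented:

```python
import random

# Simplified sketch of dynamic sampling: scan a small random sample of rows,
# count predicate matches, and scale up to a table-level cardinality
# estimate. (The real mechanism samples blocks via recursive SQL.)
def sample_cardinality(table, predicate, sample_frac=0.1, seed=42):
    rng = random.Random(seed)
    sample = [row for row in table if rng.random() < sample_frac]
    if not sample:
        return 0
    matches = sum(1 for row in sample if predicate(row))
    return round(matches / len(sample) * len(table))

# Hypothetical table: 10,000 rows, 1,000 of them with status 'OPEN'.
table = [{"status": "OPEN" if i % 10 == 0 else "CLOSED"} for i in range(10_000)]
est = sample_cardinality(table, lambda r: r["status"] == "OPEN")
print(500 <= est <= 1500)  # True: close to the true count of 1,000
```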

4.4.2.2 Automatic Reoptimization


In automatic reoptimization, the optimizer changes a plan on subsequent executions after
the initial execution.
Adaptive query plans are not feasible for all kinds of plan changes. For example, a query with
an inefficient join order might perform suboptimally, but adaptive query plans do not support
adapting the join order during execution. At the end of the first execution of a SQL statement,
the optimizer uses the information gathered during execution to determine whether automatic
reoptimization has a cost benefit. If execution information differs significantly from optimizer
estimates, then the optimizer looks for a replacement plan on the next execution.
The optimizer uses the information gathered during the previous execution to help determine
an alternative plan. The optimizer can reoptimize a query several times, each time gathering
additional data and further improving the plan.
Automatic reoptimization takes the following forms:

4.4.2.2.1 Reoptimization: Statistics Feedback


A form of reoptimization known as statistics feedback (formerly known as cardinality
feedback) automatically improves plans for repeated queries that have cardinality
misestimates.
The optimizer can estimate cardinalities incorrectly for many reasons, such as missing
statistics, inaccurate statistics, or complex predicates. The basic process of reoptimization
using statistics feedback is as follows:
1. During the first execution of a SQL statement, the optimizer generates an execution plan.
The optimizer may enable monitoring for statistics feedback for the shared SQL area in
the following cases:
• Tables with no statistics
• Multiple conjunctive or disjunctive filter predicates on a table

• Predicates containing complex operators for which the optimizer cannot accurately
compute selectivity estimates
2. At the end of the first execution, the optimizer compares its initial cardinality
estimates to the actual number of rows returned by each operation in the plan
during execution.
If estimates differ significantly from actual cardinalities, then the optimizer stores
the correct estimates for subsequent use. The optimizer also creates a SQL plan
directive so that other SQL statements can benefit from the information obtained
during this initial execution.
3. If the query executes again, then the optimizer uses the corrected cardinality
estimates instead of its usual estimates.
The OPTIMIZER_ADAPTIVE_STATISTICS initialization parameter does not control all
features of automatic reoptimization. Specifically, this parameter controls statistics
feedback for join cardinality only in the context of automatic reoptimization. For
example, setting OPTIMIZER_ADAPTIVE_STATISTICS to FALSE disables statistics
feedback for join cardinality misestimates, but it does not disable statistics feedback
for single-table cardinality misestimates.
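
The monitoring step amounts to comparing estimated and actual cardinalities per plan operation. In this sketch the 8x ratio is an invented stand-in for the optimizer's internal criterion:

```python
# Sketch of statistics feedback: after the first execution, compare the
# estimated cardinality of each plan operation with the actual row count;
# operations that miss by a large ratio get their actuals stored for the
# next parse. The ratio threshold is a made-up placeholder.
def needs_reoptimization(operations, ratio=8.0):
    """operations: list of (estimated_rows, actual_rows) per plan step."""
    corrected = {}
    for op_id, (est, act) in enumerate(operations):
        if est == 0 or act / est >= ratio or est / max(act, 1) >= ratio:
            corrected[op_id] = act  # keep the actual for the reparse
    return corrected

# Estimates of 1 and 4 rows versus actuals of 269 and 9135, as in Example 4-3.
feedback = needs_reoptimization([(1, 269), (4, 9135), (87, 87)])
print(feedback)  # {0: 269, 1: 9135}: the well-estimated step needs no fix
```
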
Example 4-3 Statistics Feedback
This example shows how the database uses statistics feedback to adjust incorrect
estimates.
1. The user oe runs the following query of the orders, order_items, and
product_information tables:

SELECT o.order_id, v.product_name
FROM orders o,
( SELECT order_id, product_name
FROM order_items o, product_information p
WHERE p.product_id = o.product_id
AND list_price < 50
AND min_price < 40 ) v
WHERE o.order_id = v.order_id

2. Querying the plan in the cursor shows that the estimated rows (E-Rows) are far
fewer than the actual rows (A-Rows).

--------------------------------------------------------------------------------------------------
| Id | Operation | Name |Starts|E-Rows|A-Rows|A-Time|Buffers|OMem|1Mem|O/1/M|
--------------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT | | 1| | 269 |00:00:00.14|1338| | | |
| 1| NESTED LOOPS | | 1| 1 | 269 |00:00:00.14|1338| | | |
| 2| MERGE JOIN CARTESIAN| | 1| 4 |9135 |00:00:00.05| 33| | | |
|*3| TABLE ACCESS FULL |PRODUCT_INFORMATION| 1| 1 | 87 |00:00:00.01| 32| | | |
| 4| BUFFER SORT | | 87| 105 |9135 |00:00:00.02| 1|4096|4096|1/0/0|
| 5| INDEX FULL SCAN |ORDER_PK | 1| 105 | 105 |00:00:00.01| 1| | | |
|*6| INDEX UNIQUE SCAN |ORDER_ITEMS_UK |9135| 1 | 269 |00:00:00.04|1305| | | |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50))


6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID")

3. The user oe reruns the query in Step 1.


4. Querying the plan in the cursor shows that the optimizer used statistics feedback (shown
in the Note) for the second execution, and also chose a different plan.

--------------------------------------------------------------------------------------------------
|Id | Operation | Name | Starts |E-Rows|A-Rows|A-Time|Buffers|Reads|OMem|1Mem|O/1/M|
--------------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT | | 1| | 269 |00:00:00.05|60|1| | | |
| 1| NESTED LOOPS | | 1|269| 269 |00:00:00.05|60|1| | | |
|*2| HASH JOIN | | 1|313| 269 |00:00:00.05|39|1|1398K|1398K|1/0/0|
|*3| TABLE ACCESS FULL |PRODUCT_INFORMATION| 1| 87| 87 |00:00:00.01|15|0| | | |
| 4| INDEX FAST FULL SCAN|ORDER_ITEMS_UK | 1|665| 665 |00:00:00.01|24|1| | | |
|*5| INDEX UNIQUE SCAN |ORDER_PK |269| 1| 269 |00:00:00.01|21|0| | | |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

2 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID")
3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50))
5 - access("O"."ORDER_ID"="ORDER_ID")

Note
-----
- statistics feedback used for this statement

In the preceding output, the estimated number of rows (269) in Step 1 matches the actual
number of rows.

4.4.2.2.2 Reoptimization: Performance Feedback


Another form of reoptimization is performance feedback. This reoptimization helps improve
the degree of parallelism automatically chosen for repeated SQL statements when
PARALLEL_DEGREE_POLICY is set to ADAPTIVE.

The basic process of reoptimization using performance feedback is as follows:


1. During the first execution of a SQL statement, when PARALLEL_DEGREE_POLICY is set to
ADAPTIVE, the optimizer determines whether to execute the statement in parallel, and if
so, which degree of parallelism to use.
The optimizer chooses the degree of parallelism based on the estimated performance of
the statement. Additional performance monitoring is enabled for all statements.
2. At the end of the initial execution, the optimizer compares the following:
• The degree of parallelism chosen by the optimizer
• The degree of parallelism computed based on the performance statistics (for
example, the CPU time) gathered during the actual execution of the statement
If the two values vary significantly, then the database marks the statement for reparsing,
and stores the initial execution statistics as feedback. This feedback helps better
compute the degree of parallelism for subsequent executions.
3. If the query executes again, then the optimizer uses the performance statistics gathered
during the initial execution to better determine a degree of parallelism for the statement.
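
The steps above can be sketched as follows; the target time, the 2x disparity test, and the DOP cap are all invented placeholders for internal heuristics:

```python
# Sketch of performance feedback for parallelism: compare the DOP the
# optimizer chose with a DOP recomputed from measured CPU time, and mark
# the statement for reparse when the two differ significantly.
def dop_feedback(chosen_dop, actual_cpu_secs, target_secs=10.0, max_dop=64):
    computed_dop = max(1, min(max_dop, round(actual_cpu_secs / target_secs)))
    differs = max(computed_dop, chosen_dop) >= 2 * min(computed_dop, chosen_dop)
    return computed_dop, differs  # differs => store feedback and reparse

# The optimizer chose DOP 2, but the statement consumed 160 CPU seconds.
computed, reparse = dop_feedback(chosen_dop=2, actual_cpu_secs=160)
print(computed, reparse)  # 16 True
```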


Note:
Even if PARALLEL_DEGREE_POLICY is not set to ADAPTIVE, statistics feedback
may influence the degree of parallelism chosen for a statement.

4.4.2.3 SQL Plan Directives


A SQL plan directive is additional information that the optimizer uses to generate a
more optimal plan.
The directive is a “note to self” by the optimizer that it is misestimating cardinalities of
certain types of predicates, and also a reminder to DBMS_STATS to gather statistics
needed to correct the misestimates in the future.
For example, during query optimization, when deciding whether the table is a
candidate for dynamic statistics, the database queries the statistics repository for
directives on a table. If the query joins two tables that have a data skew in their join
columns, then a SQL plan directive can direct the optimizer to use dynamic statistics to
obtain an accurate cardinality estimate.
The optimizer collects SQL plan directives on query expressions rather than at the
statement level so that it can apply directives to multiple SQL statements. The
optimizer not only corrects itself, but also records information about the mistake, so
that the database can continue to correct its estimates even after a query—and any
similar query—is flushed from the shared pool.
The database automatically creates directives, and stores them in the SYSAUX
tablespace. You can alter, save to disk, and transport directives using the PL/SQL
package DBMS_SPD.

See Also:

• "SQL Plan Directives"


• "Managing SQL Plan Directives"
• Oracle Database PL/SQL Packages and Types Reference to learn about
the DBMS_SPD package

4.4.2.4 When Adaptive Statistics Are Enabled


Adaptive statistics are disabled by default.
Adaptive statistics are enabled when the following initialization parameters are set:
• OPTIMIZER_ADAPTIVE_STATISTICS is TRUE (the default is FALSE)
• OPTIMIZER_FEATURES_ENABLE is 12.1.0.1 or later
Setting OPTIMIZER_ADAPTIVE_STATISTICS to TRUE enables the following features:

• SQL plan directives


• Statistics feedback for join cardinality


• Adaptive dynamic sampling
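For example, assuming the required privileges, you could enable the full set of adaptive statistics features for the current session as follows:

```sql
-- Enables SQL plan directives, statistics feedback for joins,
-- and adaptive dynamic sampling for this session
ALTER SESSION SET OPTIMIZER_ADAPTIVE_STATISTICS = TRUE;
```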

Note:
Setting OPTIMIZER_ADAPTIVE_STATISTICS to FALSE preserves statistics feedback for
single-table cardinality misestimates.

See Also:

• "Controlling Adaptive Optimization"


• Oracle Database Reference to learn more about
OPTIMIZER_ADAPTIVE_STATISTICS

4.5 About Approximate Query Processing


Approximate query processing is a set of optimization techniques that speed analytic
queries by calculating results within an acceptable range of error.
Business intelligence (BI) queries heavily rely on sorts that involve aggregate functions such
as COUNT DISTINCT, SUM, RANK, and MEDIAN. For example, an application generates reports
showing how many distinct customers are logged on, or which products were most popular
last week. It is not uncommon for BI applications to have the following requirements:
• Queries must be able to process data sets that are orders of magnitude larger than in
traditional data warehouses.
For example, the daily volumes of web logs of a popular website can reach tens or
hundreds of terabytes a day.
• Queries must provide near real-time response.
For example, a company requires quick detection and response to credit card fraud.
• Explorative queries of large data sets must be fast.
For example, a user might want to find out a list of departments whose sales have
approximately reached a specific threshold. A user would form targeted queries on these
departments to find more detailed information, such as the exact sales number, the
locations of these departments, and so on.
For large data sets, exact aggregation queries consume extensive memory, often spilling to
temp space, and can be unacceptably slow. Applications are often more interested in a
general pattern than exact results, so customers are willing to sacrifice exactitude for speed.
For example, if the goal is to show a bar chart depicting the most popular products, then
whether a product sold 1 million units or .999 million units is statistically insignificant.
Oracle Database implements its solution through approximate query processing. Typically,
the accuracy of the approximate aggregation is over 97% (with 95% confidence), but the
processing time is orders of magnitude faster. The database uses less CPU, and avoids the
I/O cost of writing to temp files.


See Also:
"NDV Algorithms: Adaptive Sampling and HyperLogLog"

4.5.1 Approximate Query Initialization Parameters


You can implement approximate query processing without changing existing code by
using the APPROX_FOR_* initialization parameters.

Set these parameters at the database or session level. The following table describes the
initialization parameters relevant to approximation techniques.

Table 4-2 Approximate Query Initialization Parameters

Initialization Parameter    Default  Description                                  See Also
APPROX_FOR_AGGREGATION      FALSE    Enables (TRUE) or disables (FALSE)           Oracle Database
                                     approximate query processing. This           Reference
                                     parameter acts as an umbrella parameter
                                     for enabling the use of functions that
                                     return approximate results.
APPROX_FOR_COUNT_DISTINCT   FALSE    Converts COUNT(DISTINCT) to                  Oracle Database
                                     APPROX_COUNT_DISTINCT.                       Reference
APPROX_FOR_PERCENTILE       none     Converts eligible exact percentile           Oracle Database
                                     functions to their APPROX_PERCENTILE_*       Reference
                                     counterparts.

See Also:

• "About Optimizer Initialization Parameters"


• Oracle Database Data Warehousing Guide to learn more about
approximate query processing

4.5.2 Approximate Query SQL Functions


Approximate query processing uses SQL functions to provide real-time responses to
explorative queries where approximations are acceptable.
The following table describes SQL functions that return approximate results.


Table 4-3 Approximate Query User Interface

SQL Function                 Description                                         See Also

APPROX_COUNT                 Calculates the approximate top n most common        Oracle Database SQL
                             values when used with the APPROX_RANK function.     Language Reference
                             Returns the approximate count of an expression.
                             If you supply MAX_ERROR as the second argument,
                             then the function returns the maximum error
                             between the actual and approximate count.
                             You must use this function with a corresponding
                             APPROX_RANK function in the HAVING clause. If a
                             query uses APPROX_COUNT, APPROX_SUM, or
                             APPROX_RANK, then the query must not use any
                             other non-approximate aggregation functions.
                             The following query returns the 10 most common
                             jobs within every department:

                               SELECT department_id, job_id, APPROX_COUNT(*)
                               FROM employees
                               GROUP BY department_id, job_id
                               HAVING APPROX_RANK (
                                        PARTITION BY department_id
                                        ORDER BY APPROX_COUNT(*) DESC ) <= 10;

APPROX_COUNT_DISTINCT        Returns the approximate number of rows that         Oracle Database SQL
                             contain distinct values of an expression.           Language Reference

APPROX_COUNT_DISTINCT_AGG    Aggregates the precomputed approximate count        Oracle Database SQL
                             distinct synopses to a higher level.                 Language Reference

APPROX_COUNT_DISTINCT_DETAIL Returns the synopses of the                          Oracle Database SQL
                             APPROX_COUNT_DISTINCT function as a BLOB. The        Language Reference
                             database can persist the returned result to disk
                             for further aggregation.

APPROX_MEDIAN                Accepts a numeric or date-time value, and returns    Oracle Database SQL
                             an approximate middle or approximate interpolated    Language Reference
                             value that would be the middle value when the
                             values are sorted.
                             This function provides an alternative to the
                             MEDIAN function.

APPROX_PERCENTILE            Accepts a percentile value and a sort                Oracle Database SQL
                             specification, and returns an approximate            Language Reference
                             interpolated value that falls into that percentile
                             value with respect to the sort specification.
                             This function provides an alternative to the
                             PERCENTILE_CONT function.

APPROX_RANK                  Returns the approximate rank of a value in a         Oracle Database SQL
                             group of values. This function takes an optional     Language Reference
                             PARTITION BY clause followed by a mandatory
                             ORDER BY ... DESC clause. The PARTITION BY key
                             must be a subset of the GROUP BY key. The
                             ORDER BY clause must include either APPROX_COUNT
                             or APPROX_SUM.

APPROX_SUM                   Calculates the approximate top n accumulated         Oracle Database SQL
                             values when used with the APPROX_RANK function.      Language Reference
                             If you supply MAX_ERROR as the second argument,
                             then the function returns the maximum error
                             between the actual and approximate sum.
                             You must use this function with a corresponding
                             APPROX_RANK function in the HAVING clause. If a
                             query uses APPROX_COUNT, APPROX_SUM, or
                             APPROX_RANK, then the query must not use any
                             other non-approximate aggregation functions.
                             The following query returns the 10 job types
                             within every department that have the highest
                             aggregate salary:

                               SELECT department_id, job_id, APPROX_SUM(salary)
                               FROM employees
                               GROUP BY department_id, job_id
                               HAVING APPROX_RANK (
                                        PARTITION BY department_id
                                        ORDER BY APPROX_SUM(salary)
                                        DESC ) <= 10;

                             Note that APPROX_SUM returns an error when the
                             input is a negative number.
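As a quick illustration of the simpler functions in this table, the following query against the sample hr.employees table returns an approximate distinct-employee count and an approximate median salary for each department:

```sql
SELECT department_id,
       APPROX_COUNT_DISTINCT(employee_id) AS approx_emps,
       APPROX_MEDIAN(salary)              AS approx_median_sal
FROM   employees
GROUP BY department_id;
```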

See Also:
Oracle Database Data Warehousing Guide to learn more about approximate
query processing

4.6 About SQL Plan Management


SQL plan management enables the optimizer to automatically manage execution
plans, ensuring that the database uses only known or verified plans.
SQL plan management can build a SQL plan baseline, which contains one or more
accepted plans for each SQL statement. The optimizer can access and manage the plan
history and SQL plan baselines of SQL statements. The main objectives are as follows:
• Identify repeatable SQL statements
• Maintain plan history, and possibly SQL plan baselines, for a set of SQL statements
• Detect plans that are not in the plan history
• Detect potentially better plans that are not in the SQL plan baseline
The optimizer uses the normal cost-based search method.
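A minimal sketch of capturing a baseline manually with the DBMS_SPM package follows. The &sql_id substitution variable is a placeholder for the SQL ID of a statement already in the shared SQL area; see Oracle Database PL/SQL Packages and Types Reference for the full parameter list.

```sql
DECLARE
  n PLS_INTEGER;
BEGIN
  -- Load the plan of a cached statement as an accepted baseline plan
  n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => '&sql_id');
  DBMS_OUTPUT.PUT_LINE(n || ' plan(s) loaded');
END;
/
```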

See Also:

• "Managing SQL Plan Baselines"


• Oracle Database PL/SQL Packages and Types Reference to learn about the
DBMS_SPM package

4.7 About the Expression Statistics Store (ESS)


The Expression Statistics Store (ESS) is a repository maintained by the optimizer to store
statistics about expression evaluation.
When an IM column store is enabled, the database leverages the ESS for its In-Memory
Expressions (IM expressions) feature. However, the ESS is independent of the IM column
store. The ESS is a permanent component of the database and cannot be disabled.
The database uses the ESS to determine whether an expression is “hot” (frequently
accessed), and thus a candidate for an IM expression. During a hard parse of a query, the
ESS looks for active expressions in the SELECT list, WHERE clause, GROUP BY clause, and so
on.
For each segment, the ESS maintains expression statistics such as the following:
• Frequency of execution
• Cost of evaluation
• Timestamp of evaluation
The optimizer assigns each expression a weighted score based on cost and the number of
times it was evaluated. The values are approximate rather than exact. More active
expressions have higher scores. The ESS maintains an internal list of the most frequently
accessed expressions.
The ESS resides in the SGA and also persists on disk. The database saves the statistics to
disk every 15 minutes, or immediately using the
DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure. The ESS statistics are visible in
the DBA_EXPRESSION_STATISTICS view.
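For example, you might flush the monitoring data immediately and then inspect the tracked expressions for a schema. This is a sketch; the full column list of DBA_EXPRESSION_STATISTICS is in Oracle Database Reference, and 'SH' is an example schema.

```sql
-- Persist the in-memory ESS data to disk now, rather than waiting
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- Review which expressions the database is tracking
SELECT table_name, expression_text, evaluation_count
FROM   dba_expression_statistics
WHERE  owner = 'SH';
```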


See Also:

• Oracle Database In-Memory Guide to learn more about the ESS


• Oracle Database PL/SQL Packages and Types Reference to learn more
about DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

5 Query Transformations
The optimizer employs many query transformation techniques. This chapter describes some
of the most important.
This chapter contains the following topics:

5.1 OR Expansion
In OR expansion, the optimizer transforms a query block containing top-level disjunctions into
the form of a UNION ALL query that contains two or more branches.

The optimizer achieves this goal by splitting the disjunction into its components, and then
associating each component with a branch of a UNION ALL query. The optimizer can choose
OR expansion for various reasons. For example, it may enable more efficient access paths or
alternative join methods that avoid Cartesian products. As always, the optimizer performs the
expansion only if the cost of the transformed statement is lower than the cost of the original
statement.
In previous releases, the optimizer used the CONCATENATION operator to perform the OR
expansion. Starting in Oracle Database 12c Release 2 (12.2), the optimizer uses the UNION-
ALL operator instead. The framework provides the following enhancements:

• Enables interaction among various transformations


• Avoids sharing query structures
• Enables the exploration of various search strategies
• Provides the reuse of cost annotation
• Supports the standard SQL syntax
Example 5-1 Transformed Query: UNION ALL Condition
To prepare for this example, log in to the database as an administrator, execute the following
statements to add a unique constraint to the hr.departments.department_name column, and
then add 100,000 rows to the hr.employees table:

ALTER TABLE hr.departments ADD CONSTRAINT department_name_uk UNIQUE
  (department_name);
DELETE FROM hr.employees WHERE employee_id > 999;
DECLARE
v_counter NUMBER(7) := 1000;
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO hr.employees
VALUES (v_counter,null,'Doe','Doe' || v_counter ||
'@example.com',null,'07-JUN-02','AC_ACCOUNT',null,null,null,50);
v_counter := v_counter + 1;
END LOOP;
END;


/
COMMIT;
EXEC DBMS_STATS.GATHER_TABLE_STATS ( ownname => 'hr', tabname => 'employees');

You then connect as the user hr, and execute the following query, which joins the
employees and departments tables:

SELECT *
FROM employees e, departments d
WHERE (e.email='SSTILES' OR d.department_name='Treasury')
AND e.department_id = d.department_id;

Without OR expansion, the optimizer treats e.email='SSTILES' OR
d.department_name='Treasury' as a single unit. Consequently, the optimizer cannot
use the index on either the e.email or d.department_name column, and so performs a
full table scan of employees and departments.

With OR expansion, the optimizer breaks the disjunctive predicate into two independent
predicates, as shown in the following example:

SELECT *
FROM employees e, departments d
WHERE e.email = 'SSTILES'
AND e.department_id = d.department_id
UNION ALL
SELECT *
FROM employees e, departments d
WHERE d.department_name = 'Treasury'
AND e.department_id = d.department_id;

This transformation enables the e.email and d.department_name columns to serve as
index keys. Performance improves because the database filters data using two unique
indexes instead of two full table scans, as shown in the following execution plan:

Plan hash value: 2512933241

-------------------------------------------------------------------------------------------
| Id| Operation                            | Name             |Rows|Bytes|Cost(%CPU)|Time    |
-------------------------------------------------------------------------------------------
| 0 |SELECT STATEMENT                      |                  |    |     |122 (100) |        |
| 1 | VIEW                                 |VW_ORE_19FF4E3E   |9102|1679K|122   (5) |00:00:01|
| 2 |  UNION-ALL                           |                  |    |     |          |        |
| 3 |   NESTED LOOPS                       |                  |  1 |  78 |  4   (0) |00:00:01|
| 4 |    TABLE ACCESS BY INDEX ROWID       |EMPLOYEES         |  1 |  57 |  3   (0) |00:00:01|
|*5 |     INDEX UNIQUE SCAN                |EMP_EMAIL_UK      |  1 |     |  2   (0) |00:00:01|
| 6 |    TABLE ACCESS BY INDEX ROWID       |DEPARTMENTS       |  1 |  21 |  1   (0) |00:00:01|
|*7 |     INDEX UNIQUE SCAN                |DEPT_ID_PK        |  1 |     |  0   (0) |        |
| 8 |   NESTED LOOPS                       |                  |9101| 693K|118   (5) |00:00:01|
| 9 |    TABLE ACCESS BY INDEX ROWID       |DEPARTMENTS       |  1 |  21 |  1   (0) |00:00:01|
|*10|     INDEX UNIQUE SCAN                |DEPARTMENT_NAME_UK|  1 |     |  0   (0) |        |
|*11|    TABLE ACCESS BY INDEX ROWID BATCHED|EMPLOYEES        |9101| 506K|117   (5) |00:00:01|
|*12|     INDEX RANGE SCAN                 |EMP_DEPARTMENT_IX |9101|     | 35   (6) |00:00:01|
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("E"."EMAIL"='SSTILES')
   7 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
  10 - access("D"."DEPARTMENT_NAME"='Treasury')
  11 - filter(LNNVL("E"."EMAIL"='SSTILES'))
  12 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

35 rows selected.

5.2 View Merging


In view merging, the optimizer merges the query block representing a view into the query
block that contains it.
View merging can improve plans by enabling the optimizer to consider additional join orders,
access methods, and other transformations. For example, after a view has been merged and
several tables reside in one query block, a table inside a view may permit the optimizer to use
join elimination to remove a table outside the view.
For certain simple views in which merging always leads to a better plan, the optimizer
automatically merges the view without considering cost. Otherwise, the optimizer uses cost to
make the determination. The optimizer may choose not to merge a view for many reasons,
including cost or validity restrictions.
If OPTIMIZER_SECURE_VIEW_MERGING is true (default), then Oracle Database performs checks
to ensure that view merging and predicate pushing do not violate the security intentions of the
view creator. To disable these additional security checks for a specific view, you can grant the
MERGE VIEW privilege to a user for this view. To disable additional security checks for all views
for a specific user, you can grant the MERGE ANY VIEW privilege to that user.
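For example, assuming a view hr.emp_details_view and a user app_user (both hypothetical names), the grants might look like this:

```sql
-- Relax the security checks for one specific view
GRANT MERGE VIEW ON hr.emp_details_view TO app_user;

-- Or relax the checks for all views queried by this user
GRANT MERGE ANY VIEW TO app_user;
```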

Note:
You can use hints to override view merging rejected because of cost or heuristics,
but not validity.

This section contains the following topics:


See Also:

• Oracle Database SQL Language Reference for more information about
  the MERGE ANY VIEW and MERGE VIEW privileges
• Oracle Database Reference for more information about the
OPTIMIZER_SECURE_VIEW_MERGING initialization parameter

5.2.1 Query Blocks in View Merging


The optimizer represents each nested subquery or unmerged view by a separate
query block.
The database optimizes query blocks separately from the bottom up. Thus, the
database optimizes the innermost query block first, generates the part of the plan for it,
and then generates the plan for the outer query block, representing the entire query.
The parser expands each view referenced in a query into a separate query block. The
block essentially represents the view definition, and thus the result of a view. One
option for the optimizer is to analyze the view query block separately, generate a view
subplan, and then process the rest of the query by using the view subplan to generate
an overall execution plan. However, this technique may lead to a suboptimal execution
plan because the view is optimized separately.
View merging can sometimes improve performance. As shown in "Example 5-2", view
merging merges the tables from the view into the outer query block, removing the
inner query block. Thus, separate optimization of the view is not necessary.

5.2.2 Simple View Merging


In simple view merging, the optimizer merges select-project-join views.
For example, a query of the employees table contains a subquery that joins the
departments and locations tables.

Simple view merging frequently results in a more optimal plan because of the
additional join orders and access paths available after the merge. A view may not be
valid for simple view merging because:
• The view contains constructs not included in select-project-join views, including:
– GROUP BY
– DISTINCT
– Outer join
– MODEL
– CONNECT BY
– Set operators
– Aggregation
• The view appears on the right side of a semijoin or antijoin.
• The view contains subqueries in the SELECT list.


• The outer query block contains PL/SQL functions.


• The view participates in an outer join, and does not meet one of the several additional
validity requirements that determine whether the view can be merged.
Example 5-2 Simple View Merging
The following query joins the hr.employees table with the dept_locs_v view, which returns
the street address for each department. dept_locs_v is a join of the departments and
locations tables.

SELECT e.first_name, e.last_name, dept_locs_v.street_address,
       dept_locs_v.postal_code
FROM employees e,
( SELECT d.department_id, d.department_name,
l.street_address, l.postal_code
FROM departments d, locations l
WHERE d.location_id = l.location_id ) dept_locs_v
WHERE dept_locs_v.department_id = e.department_id
AND e.last_name = 'Smith';

The database can execute the preceding query by joining departments and locations to
generate the rows of the view, and then joining this result to employees. Because the query
contains the view dept_locs_v, and this view contains two tables, the optimizer must use one
of the following join orders:
• employees, dept_locs_v (departments, locations)
• employees, dept_locs_v (locations, departments)
• dept_locs_v (departments, locations), employees
• dept_locs_v (locations, departments), employees
Join methods are also constrained. The index-based nested loops join is not feasible for join
orders that begin with employees because no index exists on the column from this view.
Without view merging, the optimizer generates the following execution plan:

-----------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 (15)|
|* 1 | HASH JOIN | | 7 (15)|
| 2 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 2 (0)|
|* 3 | INDEX RANGE SCAN | EMP_NAME_IX | 1 (0)|
| 4 | VIEW | | 5 (20)|
|* 5 | HASH JOIN | | 5 (20)|
| 6 | TABLE ACCESS FULL | LOCATIONS | 2 (0)|
| 7 | TABLE ACCESS FULL | DEPARTMENTS | 2 (0)|
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("DEPT_LOCS_V"."DEPARTMENT_ID"="E"."DEPARTMENT_ID")
3 - access("E"."LAST_NAME"='Smith')
5 - access("D"."LOCATION_ID"="L"."LOCATION_ID")


View merging merges the tables from the view into the outer query block, removing the
inner query block. After view merging, the query is as follows:

SELECT e.first_name, e.last_name, l.street_address, l.postal_code


FROM employees e, departments d, locations l
WHERE d.location_id = l.location_id
AND d.department_id = e.department_id
AND e.last_name = 'Smith';

Because all three tables appear in one query block, the optimizer can choose from the
following six join orders:
• employees, departments, locations
• employees, locations, departments
• departments, employees, locations
• departments, locations, employees
• locations, employees, departments
• locations, departments, employees
The joins to employees and departments can now be index-based. After view merging,
the optimizer chooses the following more efficient plan, which uses nested loops:

-------------------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 (0)|
| 1 | NESTED LOOPS | | |
| 2 | NESTED LOOPS | | 4 (0)|
| 3 | NESTED LOOPS | | 3 (0)|
| 4 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 2 (0)|
|* 5 | INDEX RANGE SCAN | EMP_NAME_IX | 1 (0)|
| 6 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS | 1 (0)|
|* 7 | INDEX UNIQUE SCAN | DEPT_ID_PK | 0 (0)|
|* 8 | INDEX UNIQUE SCAN | LOC_ID_PK | 0 (0)|
| 9 | TABLE ACCESS BY INDEX ROWID | LOCATIONS | 1 (0)|
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("E"."LAST_NAME"='Smith')
7 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
8 - access("D"."LOCATION_ID"="L"."LOCATION_ID")

See Also:
The Oracle Optimizer blog at https://blogs.oracle.com/optimizer/ to
learn about outer join view merging, which is a special case of simple view
merging


5.2.3 Complex View Merging


In complex view merging, the optimizer merges views containing GROUP BY and DISTINCT
operators. Like simple view merging, complex merging enables the optimizer to consider
additional join orders and access paths.
The optimizer can delay evaluation of GROUP BY or DISTINCT operations until after it has
evaluated the joins. Delaying these operations can improve or worsen performance
depending on the data characteristics. If the joins use filters, then delaying the operation until
after joins can reduce the data set on which the operation is to be performed. Evaluating the
operation early can reduce the amount of data to be processed by subsequent joins, or the
joins could increase the amount of data to be processed by the operation. The optimizer uses
cost to evaluate view merging and merges the view only when it is the lower cost option.
Aside from cost, the optimizer may be unable to perform complex view merging for the
following reasons:
• The outer query tables do not have a rowid or unique column.
• The view appears in a CONNECT BY query block.
• The view contains GROUPING SETS, ROLLUP, or PIVOT clauses.
• The view or outer query block contains the MODEL clause.
Example 5-3 Complex View Joins with GROUP BY
The following view uses a GROUP BY clause:

CREATE VIEW cust_prod_totals_v AS
  SELECT SUM(s.quantity_sold) total, s.cust_id, s.prod_id
  FROM sales s
  GROUP BY s.cust_id, s.prod_id;

The following query finds all of the customers from the United States who have bought at
least 100 fur-trimmed sweaters:

SELECT c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
FROM customers c, products p, cust_prod_totals_v
WHERE c.country_id = 52790
AND c.cust_id = cust_prod_totals_v.cust_id
AND cust_prod_totals_v.total > 100
AND cust_prod_totals_v.prod_id = p.prod_id
AND p.prod_name = 'T3 Faux Fur-Trimmed Sweater';

The cust_prod_totals_v view is eligible for complex view merging. After merging, the query
is as follows:

SELECT c.cust_id, cust_first_name, cust_last_name, cust_email
FROM customers c, products p, sales s
WHERE c.country_id = 52790
AND c.cust_id = s.cust_id
AND s.prod_id = p.prod_id
AND p.prod_name = 'T3 Faux Fur-Trimmed Sweater'
GROUP BY s.cust_id, s.prod_id, p.rowid, c.rowid, c.cust_email,
         c.cust_last_name, c.cust_first_name, c.cust_id
HAVING SUM(s.quantity_sold) > 100;

The transformed query is cheaper than the untransformed query, so the optimizer
chooses to merge the view. In the untransformed query, the GROUP BY operator applies
to the entire sales table in the view. In the transformed query, the joins to products
and customers filter out a large portion of the rows from the sales table, so the GROUP
BY operation is lower cost. The join is more expensive because the sales table has not
been reduced, but it is not much more expensive because the GROUP BY operation
does not reduce the size of the row set very much in the original query. If any of the
preceding characteristics were to change, merging the view might no longer be lower
cost. The final plan, which does not include a view, is as follows:

--------------------------------------------------------
| Id | Operation | Name | Cost (%CPU)|
--------------------------------------------------------
| 0 | SELECT STATEMENT | | 2101 (18)|
|* 1 | FILTER | | |
| 2 | HASH GROUP BY | | 2101 (18)|
|* 3 | HASH JOIN | | 2099 (18)|
|* 4 | HASH JOIN | | 1801 (19)|
|* 5 | TABLE ACCESS FULL| PRODUCTS | 96 (5)|
| 6 | TABLE ACCESS FULL| SALES | 1620 (15)|
|* 7 | TABLE ACCESS FULL | CUSTOMERS | 296 (11)|
--------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SUM("QUANTITY_SOLD")>100)
3 - access("C"."CUST_ID"="CUST_ID")
4 - access("PROD_ID"="P"."PROD_ID")
5 - filter("P"."PROD_NAME"='T3 Faux Fur-Trimmed Sweater')
7 - filter("C"."COUNTRY_ID"=52790)

Example 5-4 Complex View Joins with DISTINCT


The following query of the cust_prod_v view uses a DISTINCT operator:

SELECT c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email
FROM customers c, products p,
( SELECT DISTINCT s.cust_id, s.prod_id
FROM sales s) cust_prod_v
WHERE c.country_id = 52790
AND c.cust_id = cust_prod_v.cust_id
AND cust_prod_v.prod_id = p.prod_id
AND p.prod_name = 'T3 Faux Fur-Trimmed Sweater';

After determining that view merging produces a lower-cost plan, the optimizer rewrites
the query into this equivalent query:

SELECT nwvw.cust_id, nwvw.cust_first_name, nwvw.cust_last_name,
       nwvw.cust_email
FROM ( SELECT DISTINCT(c.rowid), p.rowid, s.prod_id, s.cust_id,
              c.cust_first_name, c.cust_last_name, c.cust_email
       FROM customers c, products p, sales s
       WHERE c.country_id = 52790
       AND c.cust_id = s.cust_id
       AND s.prod_id = p.prod_id
       AND p.prod_name = 'T3 Faux Fur-Trimmed Sweater' ) nwvw;

The plan for the preceding query is as follows:

-------------------------------------------
| Id | Operation | Name |
-------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | VIEW | VM_NWVW_1 |
| 2 | HASH UNIQUE | |
|* 3 | HASH JOIN | |
|* 4 | HASH JOIN | |
|* 5 | TABLE ACCESS FULL| PRODUCTS |
| 6 | TABLE ACCESS FULL| SALES |
|* 7 | TABLE ACCESS FULL | CUSTOMERS |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("C"."CUST_ID"="S"."CUST_ID")
4 - access("S"."PROD_ID"="P"."PROD_ID")
5 - filter("P"."PROD_NAME"='T3 Faux Fur-Trimmed Sweater')
7 - filter("C"."COUNTRY_ID"=52790)

The preceding plan contains a view named vm_nwvw_1, known as a projection view, even
after view merging has occurred. Projection views appear in queries in which a DISTINCT
view has been merged, or a GROUP BY view is merged into an outer query block that also
contains GROUP BY, HAVING, or aggregates. In the latter case, the projection view contains the
GROUP BY, HAVING, and aggregates from the original outer query block.

In the preceding example of a projection view, when the optimizer merges the view, it moves
the DISTINCT operator to the outer query block, and then adds several additional columns to
maintain semantic equivalence with the original query. Afterward, the query can select only
the desired columns in the SELECT list of the outer query block. The optimization retains all of
the benefits of view merging: all tables are in one query block, the optimizer can permute
them as needed in the final join order, and the DISTINCT operation has been delayed until
after all of the joins complete.

5.3 Predicate Pushing


In predicate pushing, the optimizer "pushes" the relevant predicates from the containing
query block into the view query block.
For views that are not merged, this technique improves the subplan of the unmerged view.
The database can use the pushed-in predicates to access indexes or to use as filters.


For example, suppose you create a table hr.contract_workers as follows:

DROP TABLE contract_workers;
CREATE TABLE contract_workers AS (SELECT * FROM employees where 1=2);
INSERT INTO contract_workers VALUES (306, 'Bill', 'Jones', 'BJONES',
'555.555.2000', '07-JUN-02', 'AC_ACCOUNT', 8300, 0,205, 110);
INSERT INTO contract_workers VALUES (406, 'Jill', 'Ashworth',
'JASHWORTH',
'555.999.8181', '09-JUN-05', 'AC_ACCOUNT', 8300, 0,205, 50);
INSERT INTO contract_workers VALUES (506, 'Marcie', 'Lunsford',
'MLUNSFORD', '555.888.2233', '22-JUL-01', 'AC_ACCOUNT', 8300,
0, 205, 110);
COMMIT;
CREATE INDEX contract_workers_index ON contract_workers(department_id);

You create a view that references employees and contract_workers. The view is
defined with a query that uses the UNION set operator, as follows:

CREATE VIEW all_employees_vw AS
  ( SELECT employee_id, last_name, job_id, commission_pct, department_id
FROM employees )
UNION
( SELECT employee_id, last_name, job_id, commission_pct, department_id
FROM contract_workers );

You then query the view as follows:

SELECT last_name
FROM all_employees_vw
WHERE department_id = 50;

Because the view is a UNION set query, the optimizer cannot merge the view's query
into the accessing query block. Instead, the optimizer can transform the accessing
statement by pushing its predicate, the WHERE clause condition department_id=50, into
the view's UNION set query. The equivalent transformed query is as follows:

SELECT last_name
FROM ( SELECT employee_id, last_name, job_id, commission_pct,
department_id
FROM employees
WHERE department_id=50
UNION
SELECT employee_id, last_name, job_id, commission_pct,
department_id
FROM contract_workers
WHERE department_id=50 );

The transformed query can now consider index access in each of the query blocks.
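If you need to experiment with this behavior for a join into an unmerged view, the PUSH_PRED and NO_PUSH_PRED hints control join predicate pushdown for a named view. The following is only a sketch; see Oracle Database SQL Language Reference for exact hint syntax and restrictions.

```sql
-- Ask the optimizer to push the join predicate into the view v
SELECT /*+ PUSH_PRED(v) */ e.last_name, v.department_id
FROM employees e, all_employees_vw v
WHERE e.employee_id = v.employee_id
AND e.department_id = 50;
```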


5.4 Subquery Unnesting


In subquery unnesting, the optimizer transforms a nested query into an equivalent join
statement, and then optimizes the join.
This transformation enables the optimizer to consider the subquery tables during access
path, join method, and join order selection. The optimizer can perform this transformation
only if the resulting join statement is guaranteed to return the same rows as the original
statement, and if subqueries do not contain aggregate functions such as AVG.

For example, suppose you connect as user sh and execute the following query:

SELECT *
FROM sales
WHERE cust_id IN ( SELECT cust_id
FROM customers );

Because the customers.cust_id column is a primary key, the optimizer can transform the
complex query into the following join statement that is guaranteed to return the same data:

SELECT sales.*
FROM sales, customers
WHERE sales.cust_id = customers.cust_id;

If the optimizer cannot transform a complex statement into a join statement, it selects
execution plans for the parent statement and the subquery as though they were separate
statements. The optimizer then executes the subquery and uses the rows returned to execute
the parent query. To improve execution speed of the overall execution plan, the optimizer
orders the subplans efficiently.

5.5 Query Rewrite with Materialized Views


A materialized view is a query result stored in a table.
When the optimizer finds a user query compatible with the query associated with a
materialized view, the database can rewrite the query in terms of the materialized view. This
technique improves query execution because the database has precomputed most of the
query result.
The optimizer looks for materialized views that are compatible with the user query, and then
uses a cost-based algorithm to select materialized views to rewrite the query. The optimizer
does not rewrite the query unless the plan generated with the materialized views has a lower
cost than the plan generated without them.
This section contains the following topics:

See Also:
Oracle Database Data Warehousing Guide to learn more about query rewrite


5.5.1 About Query Rewrite and the Optimizer


A query undergoes several checks to determine whether it is a candidate for query
rewrite.
If the query fails any check, then the query is applied to the detail tables rather than
the materialized view. The inability to rewrite can be costly in terms of response time
and processing power.
The optimizer uses two different methods to determine when to rewrite a query in
terms of a materialized view. The first method matches the SQL text of the query to the
SQL text of the materialized view definition. If the first method fails, then the optimizer
uses the more general method in which it compares joins, selections, data columns,
grouping columns, and aggregate functions between the query and materialized views.
Query rewrite operates on queries and subqueries in the following types of SQL
statements:
• SELECT
• CREATE TABLE … AS SELECT
• INSERT INTO … SELECT
It also operates on subqueries in the set operators UNION, UNION ALL, INTERSECT, and
MINUS, and subqueries in DML statements such as INSERT, DELETE, and UPDATE.

Dimensions, constraints, and rewrite integrity levels affect whether a query is rewritten
to use materialized views. Additionally, query rewrite can be enabled or disabled by
REWRITE and NOREWRITE hints and the QUERY_REWRITE_ENABLED session parameter.
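
For example, the following sketch shows both controls. It assumes query rewrite is enabled
system-wide and uses the sh sample schema; the specific query is illustrative only:

```sql
-- Disable query rewrite for all statements in the current session:
ALTER SESSION SET QUERY_REWRITE_ENABLED = FALSE;

-- Alternatively, bypass rewrite for a single statement with a hint:
SELECT /*+ NOREWRITE */ t.calendar_month_desc, SUM(s.amount_sold)
FROM   sales s, times t
WHERE  s.time_id = t.time_id
GROUP BY t.calendar_month_desc;
```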

The DBMS_MVIEW.EXPLAIN_REWRITE procedure advises whether query rewrite is
possible on a query and, if so, which materialized views are used. It also explains why
a query cannot be rewritten.
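
For example, the following sketch checks whether a query could be rewritten to use a
hypothetical materialized view named SALES_BY_MONTH_MV. It assumes that the output table
REWRITE_TABLE already exists (it is created by the utlxrw.sql script):

```sql
-- Assumes REWRITE_TABLE exists and SALES_BY_MONTH_MV is a rewrite candidate.
BEGIN
  DBMS_MVIEW.EXPLAIN_REWRITE(
    query => 'SELECT t.calendar_month_desc, SUM(s.amount_sold)
              FROM   sales s, times t
              WHERE  s.time_id = t.time_id
              GROUP BY t.calendar_month_desc',
    mv    => 'SALES_BY_MONTH_MV');
END;
/

-- Each row of output explains why rewrite did or did not occur:
SELECT message FROM rewrite_table;
```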

5.5.2 About Initialization Parameters for Query Rewrite


Query rewrite behavior is controlled by certain database initialization parameters.

Table 5-1 Initialization Parameters that Control Query Rewrite Behavior

OPTIMIZER_MODE
  Value: ALL_ROWS (default), FIRST_ROWS, or FIRST_ROWS_n
  Behavior: With OPTIMIZER_MODE set to FIRST_ROWS, the optimizer uses a mix of costs
  and heuristics to find a best plan for fast delivery of the first few rows. When set
  to FIRST_ROWS_n, the optimizer uses a cost-based approach and optimizes with a goal
  of best response time to return the first n rows (where n = 1, 10, 100, 1000).

QUERY_REWRITE_ENABLED
  Value: TRUE (default), FALSE, or FORCE
  Behavior: If set to TRUE, this option enables the query rewrite feature of the
  optimizer, enabling the optimizer to use materialized views to enhance performance.
  If set to FALSE, this option disables the query rewrite feature of the optimizer and
  directs the optimizer not to rewrite queries using materialized views, even when the
  estimated cost of the unrewritten query is higher.
  If set to FORCE, this option enables the query rewrite feature of the optimizer and
  directs the optimizer to rewrite queries using materialized views even when the
  estimated query cost of the unrewritten query is lower.

QUERY_REWRITE_INTEGRITY
  Value: STALE_TOLERATED, TRUSTED, or ENFORCED (the default)
  Behavior: This parameter is optional. However, if it is set, the value must be one
  of the listed values. By default, the integrity level is set to ENFORCED. In this
  mode, all constraints must be validated. Therefore, if you use ENABLE NOVALIDATE
  RELY, certain types of query rewrite might not work. To enable query rewrite in this
  environment (where constraints have not been validated), set the integrity level to
  a lower level of granularity such as TRUSTED or STALE_TOLERATED.

Related Topics
• About the Accuracy of Query Rewrite
Query rewrite offers three levels of rewrite integrity that are controlled by the initialization
parameter QUERY_REWRITE_INTEGRITY.

5.5.3 About the Accuracy of Query Rewrite


Query rewrite offers three levels of rewrite integrity that are controlled by the initialization
parameter QUERY_REWRITE_INTEGRITY.

The values that you can set for the QUERY_REWRITE_INTEGRITY parameter are as follows:

• ENFORCED
This is the default mode. The optimizer uses only fresh data from the materialized views
and uses only those relationships that are based on ENABLED VALIDATED primary, unique,
or foreign key constraints.
• TRUSTED
In TRUSTED mode, the optimizer trusts that the relationships declared in dimensions and
RELY constraints are correct. In this mode, the optimizer also uses prebuilt materialized
views or materialized views based on views, and it uses relationships that are not
enforced as well as those that are enforced. It also trusts declared but not ENABLED
VALIDATED primary or unique key constraints and data relationships specified using
dimensions. This mode offers greater query rewrite capabilities but also creates
the risk of incorrect results if any of the trusted relationships you have declared are
incorrect.
• STALE_TOLERATED
In STALE_TOLERATED mode, the optimizer uses materialized views that are valid but
contain stale data as well as those that contain fresh data. This mode offers the
maximum rewrite capability but creates the risk of generating inaccurate results.
If rewrite integrity is set to the safest level, ENFORCED, the optimizer uses only enforced
primary key constraints and referential integrity constraints to ensure that the results of
the query are the same as the results when accessing the detail tables directly.
If the rewrite integrity is set to levels other than ENFORCED, there are several situations
where the output with rewrite can be different from that without it:
• A materialized view can be out of synchronization with the master copy of the
data. This generally happens because the materialized view refresh procedure is
pending following bulk load or DML operations to one or more detail tables of a
materialized view. At some data warehouse sites, this situation is desirable
because it is not uncommon for some materialized views to be refreshed at certain
time intervals.
• The relationships implied by the dimension objects are invalid. For example,
values at a certain level in a hierarchy do not roll up to exactly one parent value.
• The values stored in a prebuilt materialized view table might be incorrect.
• A wrong answer can occur because of bad data relationships defined by
unenforced table or view constraints.
You can set QUERY_REWRITE_INTEGRITY either in your initialization parameter file or
using an ALTER SYSTEM or ALTER SESSION statement.
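
For example, both forms are sketched below. A session-level change reverts when the
session ends, and ALTER SYSTEM requires the appropriate administrative privilege:

```sql
-- Relax rewrite integrity for the current session only:
ALTER SESSION SET QUERY_REWRITE_INTEGRITY = TRUSTED;

-- Or set it system-wide:
ALTER SYSTEM SET QUERY_REWRITE_INTEGRITY = STALE_TOLERATED;
```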

5.5.4 Example of Query Rewrite


This example illustrates the power of query rewrite with materialized views.
Consider the following materialized view, cal_month_sales_mv, which provides an
aggregation of the dollar amount sold in every month:

CREATE MATERIALIZED VIEW cal_month_sales_mv
ENABLE QUERY REWRITE AS
SELECT t.calendar_month_desc, SUM(s.amount_sold) AS dollars
FROM   sales s, times t
WHERE  s.time_id = t.time_id
GROUP BY t.calendar_month_desc;

Let us assume that, in a typical month, the number of sales in the store is around one
million. So this materialized aggregate view has the precomputed aggregates for the
dollar amount sold for each month.
Consider the following query, which asks for the sum of the amount sold at the store
for each calendar month:

SELECT t.calendar_month_desc, SUM(s.amount_sold)
FROM   sales s, times t
WHERE  s.time_id = t.time_id
GROUP BY t.calendar_month_desc;


In the absence of the previous materialized view and the query rewrite feature, Oracle
Database must access the sales table directly and compute the sum of the amount sold to
return the results. Doing so involves reading many millions of rows from the sales table,
which invariably increases the query response time because of the disk access. The join in
the query slows the response further because the join must be computed over many millions
of rows.
In the presence of the materialized view cal_month_sales_mv, query rewrite will transparently
rewrite the previous query into the following query:

SELECT calendar_month_desc, dollars
FROM   cal_month_sales_mv;

Because there are only a few dozen rows in the materialized view cal_month_sales_mv and
no joins, Oracle Database returns the results instantly.

5.6 Star Transformation


Star transformation is an optimizer transformation that avoids full table scans of fact tables in
a star schema.
This section contains the following topics:

5.6.1 About Star Schemas


A star schema divides data into facts and dimensions.
Facts are the measurements of an event such as a sale and are typically numbers.
Dimensions are the categories that identify facts, such as date, location, and product.
A fact table has a composite key made up of the primary keys of the dimension tables of the
schema. Dimension tables act as lookup or reference tables that enable you to choose
values that constrain your queries.
Diagrams typically show a central fact table with lines joining it to the dimension tables, giving
the appearance of a star. The following graphic shows sales as the fact table and products,
times, customers, and channels as the dimension tables.

Figure 5-1 Star Schema

products times

sales
(amount_sold,
quantity_sold)

Fact Table
customers channels

Dimension Table Dimension Table

A snowflake schema is a star schema in which the dimension tables reference other tables. A
snowstorm schema is a combination of snowflake schemas.


See Also:
Oracle Database Data Warehousing Guide to learn more about star
schemas

5.6.2 Purpose of Star Transformations


In joins of fact and dimension tables, a star transformation can avoid a full scan of a
fact table.
The star transformation improves performance by fetching only relevant fact rows that
join to the constraint dimension rows. In some cases, queries have restrictive filters on
other columns of the dimension tables. The combination of filters can dramatically
reduce the data set that the database processes from the fact table.

5.6.3 How Star Transformation Works


Star transformation adds subquery predicates, called bitmap semijoin predicates,
corresponding to the constraint dimensions.
The optimizer performs the transformation when indexes exist on the fact join
columns. By driving bitmap AND and OR operations of key values supplied by the
subqueries, the database only needs to retrieve relevant rows from the fact table. If the
predicates on the dimension tables filter out significant data, then the transformation
can be more efficient than a full scan on the fact table.
After the database has retrieved the relevant rows from the fact table, the database
may need to join these rows back to the dimension tables using the original
predicates. The database can eliminate the join back of the dimension table when the
following conditions are met:
• All the predicates on dimension tables are part of the semijoin subquery predicate.
• The columns selected from the subquery are unique.
• The dimension columns are not in the SELECT list, GROUP BY clause, and so on.

5.6.4 Controls for Star Transformation


The STAR_TRANSFORMATION_ENABLED initialization parameter controls star
transformations.
This parameter takes the following values:
• true
The optimizer performs the star transformation by identifying the fact and
constraint dimension tables automatically. The optimizer performs the star
transformation only if the cost of the transformed plan is lower than the
alternatives. Also, the optimizer attempts temporary table transformation
automatically whenever materialization improves performance (see "Temporary
Table Transformation: Scenario").
• false (default)
The optimizer does not perform star transformations.


• TEMP_DISABLE
This value is identical to true except that the optimizer does not attempt temporary table
transformation.
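
For example, the following sketch shows the parameter at the session level. Star
transformation is disabled by default:

```sql
-- Enable star transformation, including temporary table transformation:
ALTER SESSION SET STAR_TRANSFORMATION_ENABLED = TRUE;

-- Enable star transformation but skip temporary table transformation:
ALTER SESSION SET STAR_TRANSFORMATION_ENABLED = TEMP_DISABLE;
```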

See Also:
Oracle Database Reference to learn about the STAR_TRANSFORMATION_ENABLED
initialization parameter

5.6.5 Star Transformation: Scenario


This scenario demonstrates a star transformation of a star query.
Example 5-5 Star Query
The following query finds the total Internet sales amount in all cities in California for quarters
Q1 and Q2 of year 1999:

SELECT c.cust_city,
t.calendar_quarter_desc,
SUM(s.amount_sold) sales_amount
FROM sales s,
times t,
customers c,
channels ch
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND s.channel_id = ch.channel_id
AND c.cust_state_province = 'CA'
AND ch.channel_desc = 'Internet'
AND t.calendar_quarter_desc IN ('1999-01','1999-02')
GROUP BY c.cust_city, t.calendar_quarter_desc;

Sample output is as follows:

CUST_CITY                      CALENDA SALES_AMOUNT
------------------------------ ------- ------------
Montara                        1999-02      1618.01
Pala                           1999-01      3263.93
Cloverdale                     1999-01        52.64
Cloverdale                     1999-02       266.28
. . .

In this example, sales is the fact table, and the other tables are dimension tables. The sales
table contains one row for every sale of a product, so it could conceivably contain billions of
sales records. However, only a few products are sold to customers in California through the
Internet for the specified quarters.


Example 5-6 Star Transformation


This example shows a star transformation of the query in Example 5-5. The
transformation avoids a full table scan of sales.

SELECT c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND c.cust_state_province = 'CA'
AND t.calendar_quarter_desc IN ('1999-01','1999-02')
AND s.time_id IN ( SELECT time_id
FROM times
WHERE calendar_quarter_desc
IN('1999-01','1999-02') )
AND s.cust_id IN ( SELECT cust_id
FROM customers
WHERE cust_state_province='CA' )
AND s.channel_id IN ( SELECT channel_id
FROM channels
WHERE channel_desc = 'Internet' )
GROUP BY c.cust_city, t.calendar_quarter_desc;

Example 5-7 Partial Execution Plan for Star Transformation


This example shows an edited version of the execution plan for the star transformation
in Example 5-6.
Line 26 shows that the sales table has an index access path instead of a full table
scan. For each key value that results from the subqueries of channels (line 14), times
(line 19), and customers (line 24), the database retrieves a bitmap from the indexes on
the sales fact table (lines 15, 20, 25).

Each bit in the bitmap corresponds to a row in the fact table. The bit is set when the
key value from the subquery is the same as the value in the row of the fact table. For
example, in the bitmap 101000... (the ellipsis indicates that the values for the
remaining rows are 0), rows 1 and 3 of the fact table have matching key values from
the subquery.
The operations in lines 12, 17, and 22 iterate over the keys from the subqueries and
retrieve the corresponding bitmaps. In Example 5-6, the customers subquery seeks
the IDs of customers whose state or province is CA. Assume that the bitmap 101000...
corresponds to the customer ID key value 103515 from the customers table subquery.
Also assume that the customers subquery produces the key value 103516 with the
bitmap 010000..., which means that only row 2 in sales has a matching key value
from the subquery.
The database merges (using the OR operator) the bitmaps for each subquery (lines 11,
16, 21). In our customers example, the database produces a single bitmap 111000...
for the customers subquery after merging the two bitmaps:

101000... # bitmap corresponding to key 103515
010000... # bitmap corresponding to key 103516
---------
111000... # result of OR operation

In line 10, the database applies the AND operator to the merged bitmaps. Assume that after
the database has performed all OR operations, the resulting bitmap for channels is 100000...
If the database performs an AND operation on this bitmap and the bitmap from the customers
subquery, then the result is as follows:

100000... # channels bitmap after all OR operations performed
111000... # customers bitmap after all OR operations performed
---------
100000... # bitmap result of AND operation for channels and customers

In line 9, the database generates the corresponding rowids of the final bitmap. The database
retrieves rows from the sales fact table using the rowids (line 26). In our example, the
database generates only one rowid, which corresponds to the first row, and thus fetches only
a single row instead of scanning the entire sales table.

---------------------------------------------------------------------------
| Id | Operation | Name
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT |
| 1 | HASH GROUP BY |
|* 2 | HASH JOIN |
|* 3 | TABLE ACCESS FULL | CUSTOMERS
|* 4 | HASH JOIN |
|* 5 | TABLE ACCESS FULL | TIMES
| 6 | VIEW | VW_ST_B1772830
| 7 | NESTED LOOPS |
| 8 | PARTITION RANGE SUBQUERY |
| 9 | BITMAP CONVERSION TO ROWIDS|
| 10 | BITMAP AND |
| 11 | BITMAP MERGE |
| 12 | BITMAP KEY ITERATION |
| 13 | BUFFER SORT |
|* 14 | TABLE ACCESS FULL | CHANNELS
|* 15 | BITMAP INDEX RANGE SCAN| SALES_CHANNEL_BIX
| 16 | BITMAP MERGE |
| 17 | BITMAP KEY ITERATION |
| 18 | BUFFER SORT |
|* 19 | TABLE ACCESS FULL | TIMES
|* 20 | BITMAP INDEX RANGE SCAN| SALES_TIME_BIX
| 21 | BITMAP MERGE |
| 22 | BITMAP KEY ITERATION |
| 23 | BUFFER SORT |
|* 24 | TABLE ACCESS FULL | CUSTOMERS
|* 25 | BITMAP INDEX RANGE SCAN| SALES_CUST_BIX
| 26 | TABLE ACCESS BY USER ROWID | SALES
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

2 - access("ITEM_1"="C"."CUST_ID")
3 - filter("C"."CUST_STATE_PROVINCE"='CA')
4 - access("ITEM_2"="T"."TIME_ID")
5 - filter(("T"."CALENDAR_QUARTER_DESC"='1999-01'
OR "T"."CALENDAR_QUARTER_DESC"='1999-02'))
14 - filter("CH"."CHANNEL_DESC"='Internet')
15 - access("S"."CHANNEL_ID"="CH"."CHANNEL_ID")
19 - filter(("T"."CALENDAR_QUARTER_DESC"='1999-01'
OR "T"."CALENDAR_QUARTER_DESC"='1999-02'))
20 - access("S"."TIME_ID"="T"."TIME_ID")
24 - filter("C"."CUST_STATE_PROVINCE"='CA')
25 - access("S"."CUST_ID"="C"."CUST_ID")

Note
-----
- star transformation used for this statement

5.6.6 Temporary Table Transformation: Scenario


In the preceding scenario, the optimizer does not join the channels table back to the
sales table because channels is not referenced outside the subquery and channel_id is unique.

If the optimizer cannot eliminate the join back, however, then the database stores the
subquery results in a temporary table to avoid rescanning the dimension table for
bitmap key generation and join back. Also, if the query runs in parallel, then the
database materializes the results so that each parallel execution server can select the
results from the temporary table instead of executing the subquery again.
Example 5-8 Star Transformation Using Temporary Table
In this example, the database materializes the results of the subquery on customers
into a temporary table:

SELECT t1.c1 cust_city, t.calendar_quarter_desc calendar_quarter_desc,
       SUM(s.amount_sold) sales_amount
FROM sales s, sh.times t, sys_temp_0fd9d6621_e7e24 t1
WHERE s.time_id=t.time_id
AND s.cust_id=t1.c0
AND (t.calendar_quarter_desc='1999-q1' OR
t.calendar_quarter_desc='1999-q2')
AND s.cust_id IN ( SELECT t1.c0
FROM sys_temp_0fd9d6621_e7e24 t1 )
AND s.channel_id IN ( SELECT ch.channel_id
FROM channels ch
WHERE ch.channel_desc='internet' )
AND s.time_id IN ( SELECT t.time_id
FROM times t
WHERE t.calendar_quarter_desc='1999-q1'
OR t.calendar_quarter_desc='1999-q2' )
GROUP BY t1.c1, t.calendar_quarter_desc;

The optimizer replaces customers with the temporary table
sys_temp_0fd9d6621_e7e24, and replaces references to columns cust_id and
cust_city with the corresponding columns of the temporary table. The database
creates the temporary table with two columns: (c0 NUMBER, c1 VARCHAR2(30)).
These columns correspond to cust_id and cust_city of the customers table. The database
populates the temporary table by executing the following query at the beginning of the
execution of the previous query:

SELECT c.cust_id, c.cust_city
FROM   customers c
WHERE  c.cust_state_province = 'CA';

Example 5-9 Partial Execution Plan for Star Transformation Using Temporary Table
The following example shows an edited version of the execution plan for the query in
Example 5-8:

---------------------------------------------------------------------------
| Id | Operation | Name
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT |
| 1 | TEMP TABLE TRANSFORMATION |
| 2 | LOAD AS SELECT |
|* 3 | TABLE ACCESS FULL | CUSTOMERS
| 4 | HASH GROUP BY |
|* 5 | HASH JOIN |
| 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6613_C716F
|* 7 | HASH JOIN |
|* 8 | TABLE ACCESS FULL | TIMES
| 9 | VIEW | VW_ST_A3F94988
| 10 | NESTED LOOPS |
| 11 | PARTITION RANGE SUBQUERY |
| 12 | BITMAP CONVERSION TO ROWIDS|
| 13 | BITMAP AND |
| 14 | BITMAP MERGE |
| 15 | BITMAP KEY ITERATION |
| 16 | BUFFER SORT |
|* 17 | TABLE ACCESS FULL | CHANNELS
|* 18 | BITMAP INDEX RANGE SCAN| SALES_CHANNEL_BIX
| 19 | BITMAP MERGE |
| 20 | BITMAP KEY ITERATION |
| 21 | BUFFER SORT |
|* 22 | TABLE ACCESS FULL | TIMES
|* 23 | BITMAP INDEX RANGE SCAN| SALES_TIME_BIX
| 24 | BITMAP MERGE |
| 25 | BITMAP KEY ITERATION |
| 26 | BUFFER SORT |
| 27 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6613_C716F
|* 28 | BITMAP INDEX RANGE SCAN| SALES_CUST_BIX
| 29 | TABLE ACCESS BY USER ROWID | SALES
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

3 - filter("C"."CUST_STATE_PROVINCE"='CA')
5 - access("ITEM_1"="C0")
7 - access("ITEM_2"="T"."TIME_ID")
8 - filter(("T"."CALENDAR_QUARTER_DESC"='1999-01' OR
"T"."CALENDAR_QUARTER_DESC"='1999-02'))
17 - filter("CH"."CHANNEL_DESC"='Internet')
18 - access("S"."CHANNEL_ID"="CH"."CHANNEL_ID")
22 - filter(("T"."CALENDAR_QUARTER_DESC"='1999-01' OR
"T"."CALENDAR_QUARTER_DESC"='1999-02'))
23 - access("S"."TIME_ID"="T"."TIME_ID")
28 - access("S"."CUST_ID"="C0")

Lines 1, 2, and 3 of the plan materialize the customers subquery into the temporary
table. In line 6, the database scans the temporary table (instead of the subquery) to
build the bitmap from the fact table. Line 27 scans the temporary table for the join
back instead of scanning customers. The database does not need to apply the customers
filter to the temporary table because the filter was already applied while
materializing the temporary table.

5.7 In-Memory Aggregation (VECTOR GROUP BY)


The key optimization of in-memory aggregation is to aggregate while scanning.
To optimize query blocks involving aggregation and joins from a single large table to
multiple small tables, such as in a typical star query, the transformation uses KEY
VECTOR and VECTOR GROUP BY operations. These operations use efficient in-memory
arrays for joins and aggregation, and are especially effective when the underlying
tables are in-memory columnar tables.

See Also:
Oracle Database In-Memory Guide to learn more about in-memory
aggregation

5.8 Cursor-Duration Temporary Tables


To materialize the intermediate results of a query, Oracle Database may implicitly
create a cursor-duration temporary table in memory during query compilation.
This section contains the following topics:

5.8.1 Purpose of Cursor-Duration Temporary Tables


Complex queries sometimes process the same query block multiple times, which
creates unnecessary performance overhead.
To avoid this scenario, Oracle Database can automatically create temporary tables for
the query results and store them in memory for the duration of the cursor. For complex
operations such as WITH clause queries, star transformations, and grouping sets, this
optimization enhances the materialization of intermediate results from repetitively used
subqueries. In this way, cursor-duration temporary tables improve performance and
optimize I/O.


5.8.2 How Cursor-Duration Temporary Tables Work


The definition of the cursor-duration temporary table resides in memory. The table definition
is associated with the cursor, and is only visible to the session executing the cursor.
When using cursor-duration temporary tables, the database performs the following steps:
1. Chooses a plan that uses a cursor-duration temporary table
2. Creates the temporary table using a unique name
3. Rewrites the query to refer to the temporary table
4. Loads data into memory until no memory remains, in which case it creates temporary
segments on disk
5. Executes the query, returning data from the temporary table
6. Truncates the table, releasing memory and any on-disk temporary segments

Note:
The metadata for the cursor-duration temporary table stays in memory as long as
the cursor is in memory. The metadata is not stored in the data dictionary, which
means it is not visible through data dictionary views. You cannot drop the metadata
explicitly.

The preceding scenario depends on the availability of memory. For serial queries, the
temporary tables use PGA memory.
The implementation of cursor-duration temporary tables is similar to sorts. If no more memory
is available, then the database writes data to temporary segments. For cursor-duration
temporary tables, the differences are as follows:
• The database releases memory and temporary segments at the end of the query rather
than when the row source is no longer active.
• Data in memory stays in memory, unlike in sorts where data can move between memory
and temporary segments.
When the database uses cursor-duration temporary tables, the keyword CURSOR DURATION
MEMORY appears in the execution plan.

5.8.3 Cursor-Duration Temporary Tables: Example


A WITH query that repeats the same subquery can sometimes benefit from a cursor-duration
temporary table.
The following query uses a WITH clause to create three subquery blocks:

WITH
q1 AS (SELECT department_id, SUM(salary) sum_sal
       FROM hr.employees
       GROUP BY department_id),
q2 AS (SELECT * FROM q1),
q3 AS (SELECT department_id, sum_sal FROM q1)
SELECT * FROM q1
UNION ALL
SELECT * FROM q2
UNION ALL
SELECT * FROM q3;

The following sample plan shows the transformation:

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'BASIC +ROWS +COST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------
| Id | Operation                                 | Name                      | Rows | Cost (%CPU)|
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                          |                           |      | 6 (100)    |
|  1 |  TEMP TABLE TRANSFORMATION                |                           |      |            |
|  2 |   LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D6606_1AE004 |      |            |
|  3 |    HASH GROUP BY                          |                           |  11  | 3 (34)     |
|  4 |     TABLE ACCESS FULL                     | EMPLOYEES                 | 107  | 2 (0)      |
|  5 |  UNION-ALL                                |                           |      |            |
|  6 |   VIEW                                    |                           |  11  | 2 (0)      |
|  7 |    TABLE ACCESS FULL                      | SYS_TEMP_0FD9D6606_1AE004 |  11  | 2 (0)      |
|  8 |   VIEW                                    |                           |  11  | 2 (0)      |
|  9 |    TABLE ACCESS FULL                      | SYS_TEMP_0FD9D6606_1AE004 |  11  | 2 (0)      |
| 10 |   VIEW                                    |                           |  11  | 2 (0)      |
| 11 |    TABLE ACCESS FULL                      | SYS_TEMP_0FD9D6606_1AE004 |  11  | 2 (0)      |
------------------------------------------------------------------------------------------------

In the preceding plan, TEMP TABLE TRANSFORMATION in Step 1 indicates that the
database used cursor-duration temporary tables to execute the query. The CURSOR
DURATION MEMORY keyword in Step 2 indicates that the database used memory, if
available, to store the results of SYS_TEMP_0FD9D6606_1AE004. If memory was
unavailable, then the database wrote the temporary data to disk.


5.9 Table Expansion


In table expansion, the optimizer generates a plan that uses indexes on the read-mostly
portion of a partitioned table, but not on the active portion of the table.
This section contains the following topics:

5.9.1 Purpose of Table Expansion


Index-based plans can improve performance, but index maintenance creates overhead. In
many databases, DML affects only a small portion of the data.
Table expansion uses index-based plans for high-update tables. You can create an index only
on the read-mostly data, eliminating index overhead on the active data. In this way, table
expansion improves performance while avoiding index maintenance.

5.9.2 How Table Expansion Works


Table partitioning makes table expansion possible.
If a local index exists on a partitioned table, then the optimizer can mark the index as
unusable for specific partitions. In effect, some partitions are not indexed.
In table expansion, the optimizer transforms the query into a UNION ALL statement, with some
subqueries accessing indexed partitions and other subqueries accessing unindexed
partitions. The optimizer can choose the most efficient access method available for a
partition, regardless of whether it exists for all of the partitions accessed in the query.
The optimizer does not always choose table expansion:
• Table expansion is cost-based.
While the database accesses each partition of the expanded table only once across all
branches of the UNION ALL, any tables that the database joins to it are accessed in each
branch.
• Semantic issues may render expansion invalid.
For example, a table appearing on the right side of an outer join is not valid for table
expansion.
You can control table expansion with the EXPAND_TABLE hint. The hint overrides the cost-
based decision, but not the semantic checks.
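
For example, the following sketch asks the optimizer to consider expanding the sales table
for a single query. The query itself is illustrative, and the semantic checks still apply:

```sql
SELECT /*+ EXPAND_TABLE(s) */ *
FROM   sales s
WHERE  time_id >= DATE '2000-01-01'
AND    prod_id = 38;
```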

See Also:

• "Influencing the Optimizer with Hints"


• Oracle Database SQL Language Reference to learn more about SQL hints


5.9.3 Table Expansion: Scenario


The optimizer keeps track of which partitions must be accessed from each table,
based on predicates that appear in the query. Partition pruning enables the optimizer
to use table expansion to generate more optimal plans.

Assumptions
This scenario assumes the following:
• You want to run a star query against the sh.sales table, which is range-partitioned
on the time_id column.
• You want to disable indexes on specific partitions to see the benefits of table
expansion.

To use table expansion:


1. Log in to the database as the sh user.
2. Run the following query:

SELECT *
FROM sales
WHERE time_id >= TO_DATE('2000-01-01 00:00:00', 'SYYYY-MM-DD
HH24:MI:SS')
AND prod_id = 38;

3. Explain the plan by querying DBMS_XPLAN:

SET LINESIZE 150
SET PAGESIZE 0
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'BASIC,PARTITION'));

As shown in the Pstart and Pstop columns in the following plan, the optimizer
determines from the filter that only 16 of the 28 partitions in the table must be
accessed:

Plan hash value: 3087065703

------------------------------------------------------------------------
|Id| Operation                                 | Name          |Pstart|Pstop|
------------------------------------------------------------------------
| 0| SELECT STATEMENT                          |               |      |     |
| 1|  PARTITION RANGE ITERATOR                 |               |  13  |  28 |
| 2|   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES        |  13  |  28 |
| 3|    BITMAP CONVERSION TO ROWIDS            |               |      |     |
|*4|     BITMAP INDEX SINGLE VALUE             | SALES_PROD_BIX|  13  |  28 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PROD_ID"=38)

After the optimizer has determined the partitions to be accessed, it considers any index
that is usable on all of those partitions. In the preceding plan, the optimizer chose to use
the sales_prod_bix bitmap index.
4. Disable the index on the SALES_1995 partition of the sales table:

   ALTER INDEX sales_prod_bix MODIFY PARTITION sales_1995 UNUSABLE;

The preceding DDL disables the index on partition 1, which contains all sales from before
1996.

Note:
You can obtain the partition information by querying the USER_IND_PARTITIONS
view.
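For example, the following query, a hedged sketch using standard data dictionary columns, lists each partition of the index and whether it is USABLE or UNUSABLE:

```sql
SELECT partition_position, partition_name, status
FROM   user_ind_partitions
WHERE  index_name = 'SALES_PROD_BIX'
ORDER BY partition_position;
```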

5. Execute the query of sales again, and then query DBMS_XPLAN to obtain the plan.
The output shows that the plan did not change:

Plan hash value: 3087065703

------------------------------------------------------------------------
|Id| Operation | Name |Pstart|Pstop
------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | |
| 1| PARTITION RANGE ITERATOR | |13|28 |
| 2| TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| SALES |13|28 |
| 3| BITMAP CONVERSION TO ROWIDS | | | |
|*4| BITMAP INDEX SINGLE VALUE | SALES_PROD_BIX|13|28 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

4 - access("PROD_ID"=38)

The plan is the same because the disabled index partition is not relevant to the query. If
all partitions that the query accesses are indexed, then the database can answer the
query using the index. Because the query only accesses partitions 13 through 28,
disabling the index on partition 1 does not affect the plan.


6. Disable the indexes for partition 28 (SALES_Q4_2003), which is a partition that the
query needs to access:

ALTER INDEX sales_prod_bix MODIFY PARTITION sales_q4_2003 UNUSABLE;
ALTER INDEX sales_time_bix MODIFY PARTITION sales_q4_2003 UNUSABLE;

By disabling the indexes on a partition that the query does need to access, the
query can no longer use these indexes (without table expansion).
7. Query the plan using DBMS_XPLAN.
As shown in the following plan, the optimizer does not use the index:

Plan hash value: 3087065703

------------------------------------------------------------------------
| Id| Operation                 | Name  |Pstart|Pstop|
------------------------------------------------------------------------
|  0| SELECT STATEMENT          |       |      |     |
|  1|  PARTITION RANGE ITERATOR |       |  13  |  28 |
|* 2|   TABLE ACCESS FULL       | SALES |  13  |  28 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("PROD_ID"=38)

In the preceding example, the query accesses 16 partitions. On 15 of these partitions, an
index is available, but no index is available for the final partition. Because the optimizer
has to choose one access path or the other, the optimizer cannot use the index on any of
the partitions.
8. With table expansion, the optimizer rewrites the original query as follows:

SELECT *
FROM sales
WHERE time_id >= TO_DATE('2000-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS')
AND time_id < TO_DATE('2003-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS')
AND prod_id = 38
UNION ALL
SELECT *
FROM sales
WHERE time_id >= TO_DATE('2003-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS')
AND time_id < TO_DATE('2004-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS')
AND prod_id = 38;

In the preceding query, the first query block in the UNION ALL accesses the partitions that
are indexed, while the second query block accesses the partition that is not. The two
subqueries enable the optimizer to choose to use the index in the first query block, if it is
more optimal than using a table scan of all of the partitions that are accessed.
9. Query the plan using DBMS_XPLAN.
The plan appears as follows:

Plan hash value: 2120767686

------------------------------------------------------------------------
|Id| Operation |Name |Pstart|Pstop|
------------------------------------------------------------------------
| 0|SELECT STATEMENT | | | |
| 1| VIEW |VW_TE_2 | | |
| 2| UNION-ALL | | | |
| 3| PARTITION RANGE ITERATOR | |13| 27|
| 4| TABLE ACCESS BY LOCAL INDEX ROWID BATCHED|SALES |13| 27|
| 5| BITMAP CONVERSION TO ROWIDS | | | |
|*6| BITMAP INDEX SINGLE VALUE |SALES_PROD_BIX|13| 27|
| 7| PARTITION RANGE SINGLE | |28| 28|
|*8| TABLE ACCESS FULL |SALES |28| 28|
------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

6 - access("PROD_ID"=38)
8 - filter("PROD_ID"=38)

As shown in the preceding plan, the optimizer uses a UNION ALL for two query blocks
(Step 2). The optimizer chooses an index to access partitions 13 to 27 in the first query
block (Step 6). Because no index is available for partition 28, the optimizer chooses a full
table scan in the second query block (Step 8).

5.9.4 Table Expansion and Star Transformation: Scenario


Star transformation enables specific types of queries to avoid accessing large portions of big
fact tables.
Star transformation requires defining several indexes, which in an actively updated table can
have overhead. With table expansion, you can define indexes on only the inactive partitions
so that the optimizer can consider star transformation on only the indexed portions of the
table.

Assumptions
This scenario assumes the following:
• You query the same schema used in "Star Transformation: Scenario".


• The last partition of sales is actively being updated, as is often the case with time-
partitioned tables.
• You want the optimizer to take advantage of table expansion.

To take advantage of table expansion in a star query:


1. Disable the indexes on the last partition as follows:

ALTER INDEX sales_channel_bix MODIFY PARTITION sales_q4_2003 UNUSABLE;
ALTER INDEX sales_cust_bix MODIFY PARTITION sales_q4_2003 UNUSABLE;

2. Execute the following star query:

SELECT t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount
FROM sales s, times t, customers c, channels ch
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND s.channel_id = ch.channel_id
AND c.cust_state_province = 'CA'
AND ch.channel_desc = 'Internet'
AND t.calendar_quarter_desc IN ('1999-01','1999-02')
GROUP BY t.calendar_quarter_desc;

3. Query the cursor using DBMS_XPLAN, which shows the following plan:

---------------------------------------------------------------------------
|Id| Operation | Name | Pstart| Pstop |
---------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | |
| 1| HASH GROUP BY | | | |
| 2| VIEW |VW_TE_14 | | |
| 3| UNION-ALL | | | |
| 4| HASH JOIN | | | |
| 5| TABLE ACCESS FULL |TIMES | | |
| 6| VIEW |VW_ST_1319B6D8 | | |
| 7| NESTED LOOPS | | | |
| 8| PARTITION RANGE SUBQUERY | |KEY(SQ)|KEY(SQ)|
| 9| BITMAP CONVERSION TO ROWIDS| | | |
|10| BITMAP AND | | | |
|11| BITMAP MERGE | | | |
|12| BITMAP KEY ITERATION | | | |
|13| BUFFER SORT | | | |
|14| TABLE ACCESS FULL |CHANNELS | | |
|15| BITMAP INDEX RANGE SCAN|SALES_CHANNEL_BIX|KEY(SQ)|KEY(SQ)|
|16| BITMAP MERGE | | | |
|17| BITMAP KEY ITERATION | | | |
|18| BUFFER SORT | | | |
|19| TABLE ACCESS FULL |TIMES | | |
|20| BITMAP INDEX RANGE SCAN|SALES_TIME_BIX |KEY(SQ)|KEY(SQ)|
|21| BITMAP MERGE | | | |
|22| BITMAP KEY ITERATION | | | |
|23| BUFFER SORT | | | |
|24| TABLE ACCESS FULL |CUSTOMERS | | |
|25| BITMAP INDEX RANGE SCAN|SALES_CUST_BIX |KEY(SQ)|KEY(SQ)|
|26| TABLE ACCESS BY USER ROWID |SALES | ROWID | ROWID |
|27| NESTED LOOPS | | | |
|28| NESTED LOOPS | | | |
|29| NESTED LOOPS | | | |
|30| NESTED LOOPS | | | |
|31| PARTITION RANGE SINGLE | | 28 | 28 |
|32| TABLE ACCESS FULL |SALES | 28 | 28 |
|33| TABLE ACCESS BY INDEX ROWID|CHANNELS | | |
|34| INDEX UNIQUE SCAN |CHANNELS_PK | | |
|35| TABLE ACCESS BY INDEX ROWID |CUSTOMERS | | |
|36| INDEX UNIQUE SCAN |CUSTOMERS_PK | | |
|37| INDEX UNIQUE SCAN |TIMES_PK | | |
|38| TABLE ACCESS BY INDEX ROWID |TIMES | | |
---------------------------------------------------------------------------

The preceding plan uses table expansion. The UNION ALL branch that is accessing every
partition except the last partition uses star transformation. Because the indexes on
partition 28 are disabled, the database accesses the final partition using a full table scan.

5.10 Join Factorization


In the cost-based transformation known as join factorization, the optimizer can factorize
common computations from branches of a UNION ALL query.

This section contains the following topics:

5.10.1 Purpose of Join Factorization


UNION ALL queries are common in database applications, especially in data integration
applications.
Often, branches in a UNION ALL query refer to the same base tables. Without join
factorization, the optimizer evaluates each branch of a UNION ALL query independently, which
leads to repetitive processing, including data access and joins. Join factorization
transformation can share common computations across the UNION ALL branches. Avoiding an
extra scan of a large base table can lead to a huge performance improvement.

5.10.2 How Join Factorization Works


Join factorization can factorize multiple tables, and can factorize tables from more than two
UNION ALL branches.

Join factorization is best explained through examples.


Example 5-10 UNION ALL Query
The following query references four tables (t1, t2, t3, and t4) in two UNION ALL
branches:

SELECT t1.c1, t2.c2
FROM t1, t2, t3
WHERE t1.c1 = t2.c1
AND t1.c1 > 1
AND t2.c2 = 2
AND t2.c2 = t3.c2
UNION ALL
SELECT t1.c1, t2.c2
FROM t1, t2, t4
WHERE t1.c1 = t2.c1
AND t1.c1 > 1
AND t2.c3 = t4.c3

In the preceding query, table t1 appears in both UNION ALL branches, as does the filter
predicate t1.c1 > 1 and the join predicate t1.c1 = t2.c1. Without any
transformation, the database must perform the scan and the filtering on table t1 twice,
once for each branch.
Example 5-11 Factorized Query
The optimizer can factorize the query in Example 5-10 as follows:

SELECT t1.c1, VW_JF_1.item_2
FROM t1, (SELECT t2.c1 item_1, t2.c2 item_2
FROM t2, t3
WHERE t2.c2 = t3.c2
AND t2.c2 = 2
UNION ALL
SELECT t2.c1 item_1, t2.c2 item_2
FROM t2, t4
WHERE t2.c3 = t4.c3) VW_JF_1
WHERE t1.c1 = VW_JF_1.item_1
AND t1.c1 > 1

In this case, because table t1 is factorized, the database performs the table scan and
the filtering on t1 only one time. If t1 is large, then this factorization avoids the huge
performance cost of scanning and filtering t1 twice.

Note:
If the branches in a UNION ALL query have clauses that use the DISTINCT
keyword, then join factorization is not valid.
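As an illustration, a hypothetical query of this shape (not taken from the guide) is not a candidate for join factorization, because pulling t1 out of the branches would change the rows that each branch's DISTINCT eliminates:

```sql
SELECT DISTINCT t1.c1, t2.c2
FROM   t1, t2
WHERE  t1.c1 = t2.c1
UNION ALL
SELECT DISTINCT t1.c1, t2.c2
FROM   t1, t2
WHERE  t1.c2 = t2.c2;
```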

5.10.3 Factorization and Join Orders: Scenario


Join factorization can create more possibilities for join orders.
Example 5-12 Query Involving Five Tables
In the following query, view V is the same as the query in Example 5-10:

SELECT *
FROM t5, (SELECT t1.c1, t2.c2
          FROM t1, t2, t3
          WHERE t1.c1 = t2.c1
          AND t1.c1 > 1
          AND t2.c2 = 2
          AND t2.c2 = t3.c2
          UNION ALL
          SELECT t1.c1, t2.c2
          FROM t1, t2, t4
          WHERE t1.c1 = t2.c1
          AND t1.c1 > 1
          AND t2.c3 = t4.c3) V
WHERE t5.c1 = V.c1

Without join factorization, the database cannot join t5 directly with t1; it must join t5 to the result of each UNION ALL branch.
Example 5-13 Factorization of t1 from View V
If join factorization factorizes t1 from view V, as shown in the following query, then the
database can join t1 with t5:

SELECT *
FROM t5, ( SELECT t1.c1, VW_JF_1.item_2
FROM t1, (SELECT t2.c1 item_1, t2.c2 item_2
FROM t2, t3
WHERE t2.c2 = t3.c2
AND t2.c2 = 2
UNION ALL
SELECT t2.c1 item_1, t2.c2 item_2
FROM t2, t4
WHERE t2.c3 = t4.c3) VW_JF_1
WHERE t1.c1 = VW_JF_1.item_1
AND t1.c1 > 1 )
WHERE t5.c1 = V.c1

The preceding query transformation opens up new join orders. However, join factorization
imposes specific join orders. For example, in the preceding query, tables t2 and t3 appear in
the first branch of the UNION ALL query in view VW_JF_1. The database must join t2 with t3
before it can join with t1, which is not defined within the VW_JF_1 view. The imposed join
order may not necessarily be the best join order. For this reason, the optimizer performs join
factorization using the cost-based transformation framework. The optimizer calculates the
cost of the plans with and without join factorization, and then chooses the cheapest plan.
Example 5-14 Factorization of t1 from View V with View Definition Removed
The following query is the same query in Example 5-13, but with the view definition removed
so that the factorization is easier to see:

SELECT *
FROM t5, (SELECT t1.c1, VW_JF_1.item_2
FROM t1, VW_JF_1
WHERE t1.c1 = VW_JF_1.item_1
AND t1.c1 > 1)
WHERE t5.c1 = V.c1

5.10.4 Factorization of Outer Joins: Scenario


The database supports join factorization of outer joins, antijoins, and semijoins, but only for
the right tables in such joins.


For example, join factorization can transform the following UNION ALL query by
factorizing t2:

SELECT t1.c2, t2.c2
FROM t1, t2
WHERE t1.c1 = t2.c1(+)
AND t1.c1 = 1
UNION ALL
SELECT t1.c2, t2.c2
FROM t1, t2
WHERE t1.c1 = t2.c1(+)
AND t1.c1 = 2

The following example shows the transformation. Table t2 now no longer appears in
the UNION ALL branches of the subquery.

SELECT VW_JF_1.item_2, t2.c2
FROM t2, (SELECT t1.c1 item_1, t1.c2 item_2
FROM t1
WHERE t1.c1 = 1
UNION ALL
SELECT t1.c1 item_1, t1.c2 item_2
FROM t1
WHERE t1.c1 = 2) VW_JF_1
WHERE VW_JF_1.item_1 = t2.c1(+)

Part III
Query Execution Plans
If a query has suboptimal performance, the execution plan is the key tool for understanding
the problem and supplying a solution.
This part contains the following chapters:
6 Generating and Displaying Execution Plans
A thorough understanding of execution plans is essential to SQL tuning.
This chapter contains the following topics:

6.1 Introduction to Execution Plans


The combination of the steps that Oracle Database uses to execute a statement is an
execution plan.
Each step either retrieves rows of data physically from the database or prepares them for the
user issuing the statement. An execution plan includes an access path for each table that the
statement accesses and an ordering of the tables (the join order) with the appropriate join
method.

6.2 About Plan Generation and Display


The EXPLAIN PLAN statement displays execution plans that the optimizer chooses for SELECT,
UPDATE, INSERT, and DELETE statements.

This section contains the following topics:

6.2.1 About the Plan Explanation


A statement execution plan is the sequence of operations that the database performs to run
the statement.
The row source tree is the core of the execution plan. The tree shows the following
information:
• An ordering of the tables referenced by the statement
• An access method for each table mentioned in the statement
• A join method for tables affected by join operations in the statement
• Data operations like filter, sort, or aggregation
In addition to the row source tree, the plan table contains information about the following:
• Optimization, such as the cost and cardinality of each operation
• Partitioning, such as the set of accessed partitions
• Parallel execution, such as the distribution method of join inputs
You can use the EXPLAIN PLAN results to determine whether the optimizer chose a particular
execution plan, such as a nested loops join. The results also help you to understand the
optimizer decisions, such as why the optimizer chose a nested loops join instead of a hash
join.


See Also:

• "SQL Row Source Generation"


• Oracle Database SQL Language Reference to learn about the EXPLAIN
PLAN statement

6.2.2 Why Execution Plans Change


Execution plans can and do change as the underlying optimizer inputs change.
EXPLAIN PLAN output shows how the database would run the SQL statement when the
statement was explained. This plan can differ from the actual execution plan a SQL
statement uses because of differences in the execution environment and explain plan
environment.

Note:
To avoid possible SQL performance regression that may result from
execution plan changes, consider using SQL plan management.

This section contains the following topics:

See Also:

• "Overview of SQL Plan Management"


• Oracle Database PL/SQL Packages and Types Reference to learn about
the DBMS_SPM package

6.2.2.1 Different Schemas


Schemas can differ for various reasons.
Principal reasons include the following:
• The execution and explain plan occur on different databases.
• The user explaining the statement is different from the user running the statement.
Two users might be pointing to different objects in the same database, resulting in
different execution plans.
• Schema changes (often changes in indexes) between the two operations.

6.2.2.2 Different Costs


Even if the schemas are the same, the optimizer can choose different execution plans
when the costs are different.


Some factors that affect the costs include the following:


• Data volume and statistics
• Bind variable types and values
• Initialization parameters set globally or at session level
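For example, one hedged way to observe cost-driven plan changes is to explain the same statement before and after changing a session-level optimizer parameter (the parameter value below is illustrative only):

```sql
EXPLAIN PLAN SET STATEMENT_ID = 'default_env' FOR
  SELECT last_name FROM employees WHERE employee_id < 103;

-- Make index access appear far more expensive for this session only
ALTER SESSION SET OPTIMIZER_INDEX_COST_ADJ = 1000;

EXPLAIN PLAN SET STATEMENT_ID = 'costly_index' FOR
  SELECT last_name FROM employees WHERE employee_id < 103;

-- Compare the two stored plans
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE','default_env'));
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE','costly_index'));
```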

6.2.3 Guideline for Minimizing Throw-Away


Examining an explain plan enables you to look for rows that are thrown away.
The database often throws away rows in the following situations:
• Full scans
• Unselective range scans
• Late predicate filters
• Wrong join order
• Late filter operations
In the plan shown in Example 6-1, the last step is a very unselective range scan that is
executed 76,563 times, accesses 11,432,983 rows, throws away 99% of them, and retains
76,563 rows. Why access 11,432,983 rows to realize that only 76,563 rows are needed?
Example 6-1 Looking for Thrown-Away Rows in an Explain Plan

Rows     Execution Plan
-------- ----------------------------------------------------
12 SORT AGGREGATE
2 SORT GROUP BY
76563 NESTED LOOPS
76575 NESTED LOOPS
19 TABLE ACCESS FULL CN_PAYRUNS_ALL
76570 TABLE ACCESS BY INDEX ROWID CN_POSTING_DETAILS_ALL
76570 INDEX RANGE SCAN (object id 178321)
76563 TABLE ACCESS BY INDEX ROWID CN_PAYMENT_WORKSHEETS_ALL
11432983 INDEX RANGE SCAN (object id 186024)

6.2.4 Guidelines for Evaluating Execution Plans Using EXPLAIN PLAN


The execution plan operation alone cannot differentiate between well-tuned statements and
those that perform suboptimally.
For example, an EXPLAIN PLAN output that shows that a statement uses an index does not
necessarily mean that the statement runs efficiently. Sometimes indexes are extremely
inefficient. In this case, a good practice is to examine the following:
• The columns of the index being used
• Their selectivity (fraction of table being accessed)
It is best to use EXPLAIN PLAN to determine an access plan, and then later prove that it is the
optimal plan through testing. When evaluating a plan, examine the statement's actual
resource consumption.
This section contains the following topics:


6.2.4.1 Guidelines for Evaluating Plans Using the V$SQL_PLAN Views


As an alternative to running the EXPLAIN PLAN command and displaying the plan, you
can display the plan by querying the V$SQL_PLAN view.

V$SQL_PLAN contains the execution plan for every statement stored in the shared SQL
area. Its definition is similar to PLAN_TABLE.

The advantage of V$SQL_PLAN over EXPLAIN PLAN is that you do not need to know the
compilation environment that was used to execute a particular statement. For EXPLAIN
PLAN, you would need to set up an identical environment to get the same plan when
executing the statement.
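For example, a sketch of locating a cursor in the shared SQL area and displaying its stored plan (the SQL text filter is illustrative):

```sql
-- Find the sql_id and child number of the cursor
SELECT sql_id, child_number
FROM   v$sql
WHERE  sql_text LIKE 'SELECT last_name FROM employees%';

-- Substitute the sql_id returned by the previous query
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', 0));
```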
The V$SQL_PLAN_STATISTICS view provides the actual execution statistics for every
operation in the plan, such as the number of output rows and elapsed time. All
statistics, except the number of output rows, are cumulative. For example, the
statistics for a join operation also includes the statistics for its two inputs. The statistics
in V$SQL_PLAN_STATISTICS are available for cursors that have been compiled with the
STATISTICS_LEVEL initialization parameter set to ALL.

The V$SQL_PLAN_STATISTICS_ALL view enables side-by-side comparisons of the
estimates that the optimizer provides for the number of rows and elapsed time. This
view combines both V$SQL_PLAN and V$SQL_PLAN_STATISTICS information for every
cursor.
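For example, a common hedged workflow for comparing estimated and actual row counts through DBMS_XPLAN:

```sql
-- Collect row-level execution statistics for subsequent statements
ALTER SESSION SET STATISTICS_LEVEL = ALL;

SELECT last_name FROM employees WHERE last_name LIKE 'Pe%';

-- ALLSTATS LAST displays estimated rows (E-Rows) beside actual rows
-- (A-Rows) for the last execution of the most recent cursor
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```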

See Also:

• "PLAN_TABLE Columns"
• "Monitoring Database Operations " for information about the
V$SQL_PLAN_MONITOR view
• Oracle Database Reference for more information about V$SQL_PLAN
views
• Oracle Database Reference for information about the STATISTICS_LEVEL
initialization parameter

6.2.5 EXPLAIN PLAN Restrictions


Oracle Database does not support EXPLAIN PLAN for statements performing implicit
type conversion of date bind variables.
With bind variables in general, the EXPLAIN PLAN output might not represent the real
execution plan.
From the text of a SQL statement, TKPROF cannot determine the types of the bind
variables. It assumes that the type is VARCHAR, and gives an error message otherwise.
You can avoid this limitation by putting appropriate type conversions in the SQL
statement.


See Also:

• "Performing Application Tracing "


• "Guideline for Avoiding the Argument Trap"
• Oracle Database SQL Language Reference to learn more about SQL data
types

6.2.6 Guidelines for Creating PLAN_TABLE


The PLAN_TABLE is automatically created as a public synonym to a global temporary table.

This temporary table holds the output of EXPLAIN PLAN statements for all users. PLAN_TABLE
is the default sample output table into which the EXPLAIN PLAN statement inserts rows
describing execution plans.
While a PLAN_TABLE table is automatically set up for each user, you can use the SQL script
catplan.sql to manually create the global temporary table and the PLAN_TABLE synonym.
The name and location of this script depends on your operating system. On UNIX and Linux,
the script is located in the $ORACLE_HOME/rdbms/admin directory.

For example, start a SQL*Plus session, connect with SYSDBA privileges, and run the script as
follows:

@$ORACLE_HOME/rdbms/admin/catplan.sql

Oracle recommends that you drop and rebuild your local PLAN_TABLE table after upgrading
the version of the database because the columns might change. Otherwise, scripts or
TKPROF can fail if they specify the table.

If you do not want to use the name PLAN_TABLE, create a new synonym after running the
catplan.sql script. For example:

CREATE OR REPLACE PUBLIC SYNONYM my_plan_table FOR plan_table$;

See Also:

• "PLAN_TABLE Columns" for a description of the columns in the table


• Oracle Database SQL Language Reference to learn about CREATE SYNONYM

6.3 Generating Plan Output Using the EXPLAIN PLAN Statement
The EXPLAIN PLAN statement enables you to examine the execution plan that the optimizer
chose for a SQL statement.


This section contains the following topics:

6.3.1 Executing EXPLAIN PLAN for a Single Statement


Explain the plan using database-supplied scripts.
The basics of using the EXPLAIN PLAN statement are as follows:

• Use the SQL script catplan.sql to create a sample output table called
PLAN_TABLE in your schema.
• Include the EXPLAIN PLAN FOR clause before the SQL statement.
• After issuing the EXPLAIN PLAN statement, use a script or package provided by
Oracle Database to display the most recent plan table output.
• The execution order in EXPLAIN PLAN output begins with the line that is the furthest
indented to the right. The next step is the parent of that line. If two lines are
indented equally, then the top line is normally executed first.

Note:

– The EXPLAIN PLAN output tables in this chapter were displayed with
the utlxpls.sql script.
– The steps in the EXPLAIN PLAN output in this chapter may be
different on your database. The optimizer may choose different
execution plans, depending on database configurations.

To explain a SQL statement, use the EXPLAIN PLAN FOR clause immediately before the
statement. For example:
EXPLAIN PLAN FOR
SELECT last_name FROM employees;

The preceding statement explains the plan and stores the output in the PLAN_TABLE table.
You can then select the execution plan from PLAN_TABLE.

See Also:

• "Guidelines for Creating PLAN_TABLE"


• "Displaying PLAN_TABLE Output"
• Oracle Database SQL Language Reference for the syntax and
semantics of EXPLAIN PLAN


6.3.2 Executing EXPLAIN PLAN Using a Statement ID


With multiple statements, you can specify a statement identifier and use that to identify your
specific execution plan.
Before using SET STATEMENT_ID, remove any existing rows for that statement ID. In the
following example, st1 is specified as the statement identifier.

Example 6-2 Using EXPLAIN PLAN with the STATEMENT ID Clause

EXPLAIN PLAN
SET STATEMENT_ID = 'st1' FOR
SELECT last_name FROM employees;

6.3.3 Directing EXPLAIN PLAN Output to a Nondefault Table


You can use the INTO clause to direct the output to a different table.

The following statement directs output to my_plan_table:

EXPLAIN PLAN
INTO my_plan_table FOR
SELECT last_name FROM employees;

You can specify a statement ID when using the INTO clause, as in the following statement:

EXPLAIN PLAN
SET STATEMENT_ID = 'st1'
INTO my_plan_table FOR
SELECT last_name FROM employees;

See Also:
Oracle Database SQL Language Reference for a complete description of EXPLAIN
PLAN syntax.

6.4 Displaying PLAN_TABLE Output


You can use scripts or a package to display the plan output.
After you have explained the plan, use the following SQL scripts or PL/SQL package
provided by Oracle Database to display the most recent plan table output:
• utlxpls.sql
This script displays the plan table output for serial processing. Example 6-4 is an
example of the plan table output when using the utlxpls.sql script.
• utlxplp.sql
This script displays the plan table output including parallel execution columns.


• DBMS_XPLAN.DISPLAY table function


This function accepts options for displaying the plan table output. You can specify:
– A plan table name if you are using a table different than PLAN_TABLE
– A statement ID if you have set a statement ID with the EXPLAIN PLAN statement
– A format option that determines the level of detail: BASIC, SERIAL, TYPICAL,
and ALL
Examples of using DBMS_XPLAN to display PLAN_TABLE output are:

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY('MY_PLAN_TABLE', 'st1','TYPICAL'));

This section contains the following topics:

See Also:
Oracle Database PL/SQL Packages and Types Reference for more
information about the DBMS_XPLAN package

6.4.1 Displaying an Execution Plan: Example


This example uses EXPLAIN PLAN to examine a SQL statement that selects the
employee_id, job_title, salary, and department_name for the employees whose IDs
are less than 103.
Example 6-3 Using EXPLAIN PLAN

EXPLAIN PLAN FOR
SELECT e.employee_id, j.job_title, e.salary, d.department_name
FROM employees e, jobs j, departments d
WHERE e.employee_id < 103
AND e.job_id = j.job_id
AND e.department_id = d.department_id;

Example 6-4 EXPLAIN PLAN Output


The following output shows execution plans that the optimizer might choose to
execute the SQL statement in Example 6-3:

-----------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |     3 |   189 |    10  (10)|
|   1 |  NESTED LOOPS                 |             |     3 |   189 |    10  (10)|
|   2 |   NESTED LOOPS                |             |     3 |   141 |     7  (15)|
|*  3 |    TABLE ACCESS FULL          | EMPLOYEES   |     3 |    60 |     4  (25)|
|   4 |    TABLE ACCESS BY INDEX ROWID| JOBS        |    19 |   513 |     2  (50)|
|*  5 |     INDEX UNIQUE SCAN         | JOB_ID_PK   |     1 |       |            |
|   6 |   TABLE ACCESS BY INDEX ROWID | DEPARTMENTS |    27 |   432 |     2  (50)|
|*  7 |    INDEX UNIQUE SCAN          | DEPT_ID_PK  |     1 |       |            |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("E"."EMPLOYEE_ID"<103)
   5 - access("E"."JOB_ID"="J"."JOB_ID")
   7 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

------------------------------------------------------------------------------------------------
| Id | Operation                       | Name          | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                |               |    3 |   189 |     8  (13)| 00:00:01 |
|  1 |  NESTED LOOPS                   |               |      |       |            |          |
|  2 |   NESTED LOOPS                  |               |    3 |   189 |     8  (13)| 00:00:01 |
|  3 |    MERGE JOIN                   |               |    3 |   141 |     5  (20)| 00:00:01 |
|  4 |     TABLE ACCESS BY INDEX ROWID | JOBS          |   19 |   513 |     2   (0)| 00:00:01 |
|  5 |      INDEX FULL SCAN            | JOB_ID_PK     |   19 |       |     1   (0)| 00:00:01 |
|* 6 |     SORT JOIN                   |               |    3 |    60 |     3  (34)| 00:00:01 |
|  7 |      TABLE ACCESS BY INDEX ROWID| EMPLOYEES     |    3 |    60 |     2   (0)| 00:00:01 |
|* 8 |       INDEX RANGE SCAN          | EMP_EMP_ID_PK |    3 |       |     1   (0)| 00:00:01 |
|* 9 |    INDEX UNIQUE SCAN            | DEPT_ID_PK    |    1 |       |     0   (0)| 00:00:01 |
| 10 |   TABLE ACCESS BY INDEX ROWID   | DEPARTMENTS   |    1 |    16 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("E"."JOB_ID"="J"."JOB_ID")
       filter("E"."JOB_ID"="J"."JOB_ID")
   8 - access("E"."EMPLOYEE_ID"<103)
   9 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")


6.4.2 Customizing PLAN_TABLE Output


If you have specified a statement identifier, then you can write your own script to query
the PLAN_TABLE.

For example:
• Start with ID = 0 and given STATEMENT_ID.
• Use the CONNECT BY clause to walk the tree from parent to child, the join keys
being STATEMENT_ID = PRIOR STATEMENT_ID and PARENT_ID = PRIOR ID.
• Use the pseudo-column LEVEL (associated with CONNECT BY) to indent the children.

SELECT cardinality "Rows", lpad(' ',level-1) || operation
  ||' '||options||' '||object_name "Plan"
FROM PLAN_TABLE
CONNECT BY prior id = parent_id
AND prior statement_id = statement_id
START WITH id = 0
AND statement_id = 'st1'
ORDER BY id;

Rows Plan
------- ----------------------------------------
SELECT STATEMENT
TABLE ACCESS FULL EMPLOYEES

The NULL in the Rows column indicates that the optimizer does not have any
statistics on the table. Analyzing the table shows the following:

Rows Plan
------- ----------------------------------------
16957 SELECT STATEMENT
16957 TABLE ACCESS FULL EMPLOYEES

You can also select the COST. This is useful for comparing execution plans or for
understanding why the optimizer chooses one execution plan over another.
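For example, a variant of the preceding script that also selects the cost might look as follows. This is a sketch only, assuming the same statement identifier st1 as above; COST and CARDINALITY are standard PLAN_TABLE columns.

```sql
SELECT cost "Cost", cardinality "Rows",
       lpad(' ',level-1) || operation || ' ' || options || ' ' || object_name "Plan"
FROM   PLAN_TABLE
CONNECT BY prior id = parent_id
       AND prior statement_id = statement_id
START WITH id = 0
       AND statement_id = 'st1'
ORDER BY id;
```

Comparing the Cost column across two plans for the same statement shows which steps the optimizer considers most expensive.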

Note:
These simplified examples are not valid for recursive SQL.

7
Reading Execution Plans
Execution plans are represented as a tree of operations.
This chapter contains the following topics:

7.1 Reading Execution Plans: Basic


This section uses EXPLAIN PLAN examples to illustrate execution plans.

The following query displays the execution plans:

SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'statement_id','BASIC'));

Examples of the output from this statement are shown in Example 7-4 and Example 7-1.
Example 7-1 EXPLAIN PLAN for Statement ID ex_plan1
The following plan shows execution of a SELECT statement. The table employees is accessed
using a full table scan. Every row in the table employees is accessed, and the WHERE clause
criteria are evaluated for every row.

EXPLAIN PLAN
SET statement_id = 'ex_plan1' FOR
SELECT phone_number
FROM employees
WHERE phone_number LIKE '650%';

---------------------------------------
| Id | Operation | Name |
---------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL| EMPLOYEES |
---------------------------------------

Example 7-2 EXPLAIN PLAN for Statement ID ex_plan2


The following plan shows the execution of a SELECT statement. In this example, the database
range scans the EMP_NAME_IX index to evaluate the WHERE clause criteria.

EXPLAIN PLAN
SET statement_id = 'ex_plan2' FOR
SELECT last_name
FROM employees
WHERE last_name LIKE 'Pe%';

SELECT PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, 'ex_plan2','BASIC'));

7-1
Chapter 7
Reading Execution Plans: Advanced

----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | INDEX RANGE SCAN| EMP_NAME_IX |
----------------------------------------

7.2 Reading Execution Plans: Advanced


In some cases, execution plans can be complicated and challenging to read.
This section contains the following topics:

7.2.1 Reading Adaptive Query Plans


The adaptive optimizer is a feature of the optimizer that enables it to adapt plans
based on run-time statistics. All adaptive mechanisms can execute a final plan for a
statement that differs from the default plan.
An adaptive query plan chooses among subplans during the current statement
execution. In contrast, automatic reoptimization changes a plan only on executions
that occur after the current statement execution.
You can determine whether the database used adaptive query optimization for a SQL
statement based on the comments in the Note section of the plan. The comments
indicate whether row sources are dynamic, or whether automatic reoptimization
adapted a plan.

Assumptions
This tutorial assumes the following:
• The STATISTICS_LEVEL initialization parameter is set to ALL.
• The database uses the default settings for adaptive execution.
• As user oe, you want to issue the following separate queries:
SELECT o.order_id, v.product_name
FROM orders o,
( SELECT order_id, product_name
FROM order_items o, product_information p
WHERE p.product_id = o.product_id
AND list_price < 50
AND min_price < 40 ) v
WHERE o.order_id = v.order_id

SELECT product_name
FROM order_items o, product_information p
WHERE o.unit_price = 15
AND quantity > 1
AND p.product_id = o.product_id
• Before executing each query, you want to query DBMS_XPLAN.DISPLAY_PLAN to see
the default plan, that is, the plan that the optimizer chose before applying its
adaptive mechanism.


• After executing each query, you want to query DBMS_XPLAN.DISPLAY_CURSOR to see the
final plan and adaptive query plan.
• SYS has granted oe the following privileges:
– GRANT SELECT ON V_$SESSION TO oe
– GRANT SELECT ON V_$SQL TO oe
– GRANT SELECT ON V_$SQL_PLAN TO oe
– GRANT SELECT ON V_$SQL_PLAN_STATISTICS_ALL TO oe

To see the results of adaptive optimization:


1. Start SQL*Plus, and then connect to the database as user oe.
2. Query orders.
For example, use the following statement:
SELECT o.order_id, v.product_name
FROM orders o,
( SELECT order_id, product_name
FROM order_items o, product_information p
WHERE p.product_id = o.product_id
AND list_price < 50
AND min_price < 40 ) v
WHERE o.order_id = v.order_id;
3. View the plan in the cursor.
For example, run the following commands:
SET LINESIZE 165
SET PAGESIZE 0
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'+ALLSTATS'));

The following sample output has been reformatted to fit on the page. In this plan, the
optimizer chooses a nested loops join. The original optimizer estimates are shown in the
E-Rows column, whereas the actual statistics gathered during execution are shown in the
A-Rows column. In the MERGE JOIN operation, the difference between the estimated and
actual number of rows is significant.
--------------------------------------------------------------------------------------------
|Id| Operation | Name |Start|E-Rows|A-Rows|A-Time|Buff|OMem|1Mem|O/1/M|
--------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT | | 1| | 269|00:00:00.09|1338| | | |
| 1| NESTED LOOPS | | 1| 1| 269|00:00:00.09|1338| | | |
| 2| MERGE JOIN CARTESIAN| | 1| 4|9135|00:00:00.03| 33| | | |
|*3| TABLE ACCESS FULL |PRODUCT_INFORMAT| 1| 1| 87|00:00:00.01| 32| | | |
| 4| BUFFER SORT | | 87|105|9135|00:00:00.01| 1|4096|4096|1/0/0|
| 5| INDEX FULL SCAN | ORDER_PK | 1|105| 105|00:00:00.01| 1| | | |
|*6| INDEX UNIQUE SCAN | ORDER_ITEMS_UK |9135| 1| 269|00:00:00.03|1305| | | |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50))
   6 - access("O"."ORDER_ID"="ORDER_ID" AND "P"."PRODUCT_ID"="O"."PRODUCT_ID")
4. Run the same query of orders that you ran in Step 2.


5. View the execution plan in the cursor by using the same SELECT statement that you
ran in Step 3.
The following example shows that the optimizer has chosen a different plan, using
a hash join. The Note section shows that the optimizer used statistics feedback to
adjust its cost estimates for the second execution of the query, thus illustrating
automatic reoptimization.
--------------------------------------------------------------------------------------------
|Id| Operation |Name |Start|E-Rows|A-Rows|A-Time|Buff|Reads|OMem|1Mem|O/1/M|
--------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT | | 1 | |269|00:00:00.02|60|1| | | |
| 1| NESTED LOOPS | | 1 |269|269|00:00:00.02|60|1| | | |
|*2| HASH JOIN | | 1 |313|269|00:00:00.02|39|1|1000K|1000K|1/0/0|
|*3| TABLE ACCESS FULL |PRODUCT_INFORMA| 1 | 87| 87|00:00:00.01|15|0| | | |
| 4| INDEX FAST FULL SCAN|ORDER_ITEMS_UK | 1 |665|665|00:00:00.01|24|1| | | |
|*5| INDEX UNIQUE SCAN |ORDER_PK |269| 1|269|00:00:00.01|21|0| | | |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID")
3 - filter(("MIN_PRICE"<40 AND "LIST_PRICE"<50))
5 - access("O"."ORDER_ID"="ORDER_ID")

Note
-----
- statistics feedback used for this statement
6. Query V$SQL to verify the performance improvement.
The following query shows the performance of the two statements (sample output
included).
SELECT CHILD_NUMBER, CPU_TIME, ELAPSED_TIME, BUFFER_GETS
FROM V$SQL
WHERE SQL_ID = 'gm2npz344xqn8';

CHILD_NUMBER   CPU_TIME ELAPSED_TIME BUFFER_GETS
------------ ---------- ------------ -----------
           0      92006       131485        1831
           1      12000        24156          60

The second statement executed, which is child number 1, used statistics feedback.
CPU time, elapsed time, and buffer gets are all significantly lower.
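The SQL_ID shown above is specific to one environment. To find the SQL_ID of your own statement, you could look it up by its text — a sketch, assuming the query text from Step 2:

```sql
SELECT sql_id, child_number, plan_hash_value
FROM   V$SQL
WHERE  sql_text LIKE 'SELECT o.order_id, v.product_name%';
```

Substitute the SQL_ID that this query returns into the WHERE clause of the performance query above.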
7. Explain the plan for the query of order_items.
For example, use the following statement:
EXPLAIN PLAN FOR
SELECT product_name
FROM order_items o, product_information p
WHERE o.unit_price = 15
AND quantity > 1
AND p.product_id = o.product_id;
8. View the plan in the plan table.
For example, run the following statement:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);


Sample output appears below:


-------------------------------------------------------------------------------
|Id| Operation | Name |Rows|Bytes|Cost (%CPU)|Time|
-------------------------------------------------------------------------------
| 0| SELECT STATEMENT | |4|128|7 (0)|00:00:01|
| 1| NESTED LOOPS | | | | | |
| 2| NESTED LOOPS | |4|128|7 (0)|00:00:01|
|*3| TABLE ACCESS FULL |ORDER_ITEMS |4|48 |3 (0)|00:00:01|
|*4| INDEX UNIQUE SCAN |PRODUCT_INFORMATION_PK|1| |0 (0)|00:00:01|
| 5| TABLE ACCESS BY INDEX ROWID|PRODUCT_INFORMATION |1|20 |1 (0)|00:00:01|
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("O"."UNIT_PRICE"=15 AND "QUANTITY">1)
   4 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID")

In this plan, the optimizer chooses a nested loops join.


9. Run the query that you previously explained.
For example, use the following statement:
SELECT product_name
FROM order_items o, product_information p
WHERE o.unit_price = 15
AND quantity > 1
AND p.product_id = o.product_id;
10. View the plan in the cursor.

For example, run the following commands:


SET LINESIZE 165
SET PAGESIZE 0
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'+ADAPTIVE'));

Sample output appears below. Based on statistics collected at run time (Step 4), the
optimizer chose a hash join rather than the nested loops join. The dashes (-) indicate the
steps in the nested loops plan that the optimizer considered but did not ultimately choose.
The switch illustrates the adaptive query plan feature.
-------------------------------------------------------------------------------
|Id | Operation | Name |Rows|Bytes|Cost(%CPU)|Time |
-------------------------------------------------------------------------------
| 0| SELECT STATEMENT | |4|128|7(0)|00:00:01|
| *1| HASH JOIN | |4|128|7(0)|00:00:01|
|- 2| NESTED LOOPS | | | | | |
|- 3| NESTED LOOPS | | |128|7(0)|00:00:01|
|- 4| STATISTICS COLLECTOR | | | | | |
| *5| TABLE ACCESS FULL | ORDER_ITEMS |4| 48|3(0)|00:00:01|
|-*6| INDEX UNIQUE SCAN | PRODUCT_INFORMATI_PK|1| |0(0)|00:00:01|
|- 7| TABLE ACCESS BY INDEX ROWID| PRODUCT_INFORMATION |1| 20|1(0)|00:00:01|
| 8| TABLE ACCESS FULL | PRODUCT_INFORMATION |1| 20|1(0)|00:00:01|
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID")
5 - filter("O"."UNIT_PRICE"=15 AND "QUANTITY">1)
6 - access("P"."PRODUCT_ID"="O"."PRODUCT_ID")


Note
-----
- this is an adaptive plan (rows marked '-' are inactive)

See Also:

• "Adaptive Query Plans"


• "Table 7-8"
• "Controlling Adaptive Optimization"
• Oracle Database Reference to learn about the STATISTICS_LEVEL
initialization parameter
• Oracle Database PL/SQL Packages and Types Reference to learn more
about DBMS_XPLAN

7.2.2 Viewing Parallel Execution with EXPLAIN PLAN


Plans for parallel queries differ in important ways from plans for serial queries.
This section contains the following topics:

7.2.2.1 About EXPLAIN PLAN and Parallel Queries


Tuning a parallel query begins much like a non-parallel query tuning exercise by
choosing the driving table. However, the rules governing the choice are different.
In the serial case, the best driving table produces the fewest rows after
applying limiting conditions. The database joins a small number of rows to larger
tables using non-unique indexes.
For example, consider a table hierarchy consisting of customer, account, and
transaction.

Figure 7-1 A Table Hierarchy

    TRANSACTION
        ACCOUNT
            CUSTOMER

In this example, customer is the smallest table, whereas transaction is the largest
table. A typical OLTP query retrieves transaction information about a specific customer
account. The query drives from the customer table. The goal is to minimize logical I/O,
which typically minimizes other critical resources including physical I/O and CPU time.
For parallel queries, the driving table is usually the largest table. It would not be
efficient to use parallel query in this case because only a few rows from each table are


accessed. However, what if it were necessary to identify all customers who had transactions
of a certain type last month? It would be more efficient to drive from the transaction table
because no limiting conditions exist on the customer table. The database would join rows
from the transaction table to the account table, and then finally join the result set to the
customer table. In this case, the indexes used on the account and customer tables are
probably highly selective primary key or unique indexes rather than the non-unique indexes
used in the first query. Because the transaction table is large and the transaction type column
is not selective, it would be beneficial to use parallel query driving from the transaction table.
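A query of this shape, driving from the transaction table, might look like the following. This is a sketch only; the transaction, account, and customer tables and their columns are hypothetical, matching the hierarchy in Figure 7-1.

```sql
SELECT /*+ PARALLEL(t) */ DISTINCT c.customer_name
FROM   transaction t, account a, customer c
WHERE  t.transaction_type = :type                              -- limiting condition on the large table
AND    t.transaction_date >= ADD_MONTHS(TRUNC(SYSDATE), -1)    -- last month
AND    t.account_id  = a.account_id
AND    a.customer_id = c.customer_id;
```

The full scan of transaction is parallelized, while the joins to account and customer would use their selective primary key or unique indexes.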

Parallel operations include the following:


• PARALLEL_TO_PARALLEL
• PARALLEL_TO_SERIAL
A PARALLEL_TO_SERIAL operation is always the step that occurs when the query
coordinator consumes rows from a parallel operation. Another type of operation that does
not occur in this query is a SERIAL operation. If these types of operations occur, then
consider making them parallel operations to improve performance because they too are
potential bottlenecks.
• PARALLEL_FROM_SERIAL
• PARALLEL_TO_PARALLEL
If the workloads in each step are relatively equivalent, then the PARALLEL_TO_PARALLEL
operations generally produce the best performance.
• PARALLEL_COMBINED_WITH_CHILD
• PARALLEL_COMBINED_WITH_PARENT
A PARALLEL_COMBINED_WITH_PARENT operation occurs when the database performs the
step simultaneously with the parent step.
If a parallel step produces many rows, then the QC may not be able to consume the rows as
fast as they are produced. Little can be done to improve this situation.
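To see which of these relationships a plan contains, you can query the OTHER_TAG column of PLAN_TABLE after running EXPLAIN PLAN — a sketch:

```sql
SELECT id, operation, options, object_name, other_tag
FROM   PLAN_TABLE
ORDER BY id;
```

Steps whose OTHER_TAG is SERIAL or PARALLEL_TO_SERIAL are the candidates to examine for potential bottlenecks.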

See Also:
The OTHER_TAG column in "PLAN_TABLE Columns"

7.2.2.2 Viewing Parallel Queries with EXPLAIN PLAN: Example


When using EXPLAIN PLAN with parallel queries, the database compiles and executes one
parallel plan. This plan is derived from the serial plan by allocating row sources specific to the
parallel support in the QC plan.
The table queue row sources (PX Send and PX Receive), the granule iterator, and buffer sorts,
required by the two parallel execution server set PQ model, are directly inserted into the
parallel plan. This plan is the same plan for all parallel execution servers when executed in
parallel or for the QC when executed serially.


Example 7-3 Parallel Query Explain Plan


The following simple example illustrates an EXPLAIN PLAN for a parallel query:

CREATE TABLE emp2 AS SELECT * FROM employees;

ALTER TABLE emp2 PARALLEL 2;

EXPLAIN PLAN FOR
  SELECT SUM(salary)
  FROM emp2
  GROUP BY department_id;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

-------------------------------------------------------------------------------------
|Id | Operation | Name |Rows| Bytes |Cost %CPU| TQ |IN-OUT|PQ Distrib|
-------------------------------------------------------------------------------------
|0| SELECT STATEMENT | |107| 2782 | 3 (34) | | | |
|1| PX COORDINATOR | | | | | | | |
|2| PX SEND QC (RANDOM) |:TQ10001|107| 2782 | 3 (34) | Q1,01 | P->S |QC (RAND) |
|3| HASH GROUP BY | |107| 2782 | 3 (34) | Q1,01 | PCWP | |
|4| PX RECEIVE | |107| 2782 | 3 (34) | Q1,01 | PCWP | |
|5| PX SEND HASH |:TQ10000|107| 2782 | 3 (34) | Q1,00 | P->P |HASH |
|6| HASH GROUP BY | |107| 2782 | 3 (34) | Q1,00 | PCWP | |
|7| PX BLOCK ITERATOR | |107| 2782 | 2 (0) | Q1,00 | PCWP | |
|8| TABLE ACCESS FULL|EMP2 |107| 2782 | 2 (0) | Q1,00 | PCWP | |
-------------------------------------------------------------------------------------

One set of parallel execution servers scans EMP2 in parallel, while the second set
performs the aggregation for the GROUP BY operation. The PX BLOCK ITERATOR row
source represents the splitting up of the table EMP2 into pieces to divide the scan
workload between the parallel execution servers. The PX SEND and PX RECEIVE row
sources represent the pipe that connects the two sets of parallel execution servers as
rows flow up from the parallel scan, get repartitioned through the HASH table queue,
and then read by and aggregated on the top set. The PX SEND QC row source
represents the aggregated values being sent to the QC in random (RAND) order. The
PX COORDINATOR row source represents the QC or Query Coordinator which controls
and schedules the parallel plan appearing below it in the plan tree.

7.2.3 Viewing Bitmap Indexes with EXPLAIN PLAN


Index row sources using bitmap indexes appear in the EXPLAIN PLAN output with the
word BITMAP indicating the type of the index.

Note:
Queries using bitmap join index indicate the bitmap join index access path.
The operation for bitmap join index is the same as bitmap index.


Example 7-4 EXPLAIN PLAN with Bitmap Indexes


In this example, the predicate c1=2 yields a bitmap from which a subtraction can take place.
From this bitmap, the bits in the bitmap for c2=6 are subtracted. Also, the bits in the bitmap
for c2 IS NULL are subtracted, explaining why there are two MINUS row sources in the plan.
The NULL subtraction is necessary for semantic correctness unless the column has a NOT
NULL constraint. The TO ROWIDS option generates the rowids necessary for the table access.

EXPLAIN PLAN FOR
  SELECT *
  FROM t
  WHERE c1 = 2
  AND c2 <> 6
  OR c3 BETWEEN 10 AND 20;

SELECT STATEMENT
TABLE ACCESS T BY INDEX ROWID
BITMAP CONVERSION TO ROWID
BITMAP OR
BITMAP MINUS
BITMAP MINUS
BITMAP INDEX C1_IND SINGLE VALUE
BITMAP INDEX C2_IND SINGLE VALUE
BITMAP INDEX C2_IND SINGLE VALUE
BITMAP MERGE
BITMAP INDEX C3_IND RANGE SCAN

7.2.4 Viewing Result Cache with EXPLAIN PLAN


When your query contains the result_cache hint, the ResultCache operator is inserted into
the execution plan.
For example, consider the following query:

SELECT /*+ result_cache */ deptno, avg(sal)
FROM emp
GROUP BY deptno;

To view the EXPLAIN PLAN for this query, use the following command:

EXPLAIN PLAN FOR
  SELECT /*+ result_cache */ deptno, avg(sal)
  FROM emp
  GROUP BY deptno;

SELECT PLAN_TABLE_OUTPUT FROM TABLE (DBMS_XPLAN.DISPLAY());

The EXPLAIN PLAN output for this query should look similar to the following:

---------------------------------------------------------------------------------------
|Id| Operation           | Name                       |Rows|Bytes|Cost(%CPU)| Time    |
---------------------------------------------------------------------------------------
| 0| SELECT STATEMENT    |                            | 11 |  77 |  4 (25)  | 00:00:01|
| 1|  RESULT CACHE       | b06ppfz9pxzstbttpbqyqnfbmy |    |     |          |         |
| 2|   HASH GROUP BY     |                            | 11 |  77 |  4 (25)  | 00:00:01|
| 3|    TABLE ACCESS FULL| EMP                        | 107| 749 |  3  (0)  | 00:00:01|
---------------------------------------------------------------------------------------

In this EXPLAIN PLAN, the ResultCache operator is identified by its CacheId, which is
b06ppfz9pxzstbttpbqyqnfbmy. You can now run a query on the
V$RESULT_CACHE_OBJECTS view by using this CacheId.
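For example, a query along the following lines looks up the cached result by that CacheId — a sketch, using the CacheId value shown in the plan above:

```sql
SELECT id, type, status, name
FROM   V$RESULT_CACHE_OBJECTS
WHERE  cache_id = 'b06ppfz9pxzstbttpbqyqnfbmy';
```

The STATUS column shows whether the cached result is still published and usable.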

7.2.5 Viewing Partitioned Objects with EXPLAIN PLAN


Use EXPLAIN PLAN to determine how Oracle Database accesses partitioned objects for
specific queries.
Partitions accessed after pruning are shown in the PARTITION START and PARTITION
STOP columns. The row source name for the range partition is PARTITION RANGE. For
hash partitions, the row source name is PARTITION HASH.
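If you query PLAN_TABLE directly instead of using DBMS_XPLAN, the same pruning information is available in the PARTITION_START and PARTITION_STOP columns — a sketch:

```sql
SELECT id, operation, options, partition_start, partition_stop
FROM   PLAN_TABLE
ORDER BY id;
```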

A join is implemented using partial partition-wise join if the DISTRIBUTION column of the
plan table of one of the joined tables contains PARTITION(KEY). Partial partition-wise
join is possible if one of the joined tables is partitioned on its join column and the table
is parallelized.
A join is implemented using full partition-wise join if the partition row source appears
before the join row source in the EXPLAIN PLAN output. Full partition-wise joins are
possible only if both joined tables are equipartitioned on their respective join columns.
Examples of execution plans for several types of partitioning follow.
This section contains the following topics:

7.2.5.1 Displaying Range and Hash Partitioning with EXPLAIN PLAN: Examples


This example illustrates pruning by using the emp_range table, which is partitioned by
range on hire_date.

Assume that the tables employees and departments from the Oracle Database sample
schema exist.

CREATE TABLE emp_range
  PARTITION BY RANGE(hire_date)
  (
    PARTITION emp_p1 VALUES LESS THAN (TO_DATE('1-JAN-1992','DD-MON-YYYY')),
    PARTITION emp_p2 VALUES LESS THAN (TO_DATE('1-JAN-1994','DD-MON-YYYY')),
    PARTITION emp_p3 VALUES LESS THAN (TO_DATE('1-JAN-1996','DD-MON-YYYY')),
    PARTITION emp_p4 VALUES LESS THAN (TO_DATE('1-JAN-1998','DD-MON-YYYY')),
    PARTITION emp_p5 VALUES LESS THAN (TO_DATE('1-JAN-2001','DD-MON-YYYY'))
  )
  AS SELECT * FROM employees;

For the first example, consider the following statement:

EXPLAIN PLAN FOR
  SELECT * FROM emp_range;

Oracle Database displays something similar to the following:

--------------------------------------------------------------------
|Id| Operation | Name |Rows| Bytes|Cost|Pstart|Pstop|
--------------------------------------------------------------------
| 0| SELECT STATEMENT | | 105| 13965 | 2 | | |
| 1| PARTITION RANGE ALL| | 105| 13965 | 2 | 1 | 5 |
| 2| TABLE ACCESS FULL | EMP_RANGE | 105| 13965 | 2 | 1 | 5 |
--------------------------------------------------------------------

The database creates a partition row source on top of the table access row source. It iterates
over the set of partitions to be accessed. In this example, the partition iterator covers all
partitions (option ALL), because a predicate was not used for pruning. The PARTITION_START
and PARTITION_STOP columns of the PLAN_TABLE show access to all partitions from 1 to 5.

For the next example, consider the following statement:

EXPLAIN PLAN FOR
  SELECT *
  FROM emp_range
  WHERE hire_date >= TO_DATE('1-JAN-1996','DD-MON-YYYY');

-----------------------------------------------------------------------
| Id | Operation | Name |Rows|Bytes|Cost|Pstart|Pstop|
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 399 | 2 | | |
| 1 | PARTITION RANGE ITERATOR| | 3 | 399 | 2 | 4 | 5 |
| *2 | TABLE ACCESS FULL |EMP_RANGE| 3 | 399 | 2 | 4 | 5 |
-----------------------------------------------------------------------

In the previous example, the partition row source iterates from partition 4 to 5 because the
database prunes the other partitions using a predicate on hire_date.

Finally, consider the following statement:

EXPLAIN PLAN FOR
  SELECT *
  FROM emp_range
  WHERE hire_date < TO_DATE('1-JAN-1992','DD-MON-YYYY');

-----------------------------------------------------------------------
| Id | Operation              | Name      |Rows|Bytes|Cost|Pstart|Pstop|
-----------------------------------------------------------------------
|  0 | SELECT STATEMENT       |           | 1  | 133 | 2  |      |     |
|  1 | PARTITION RANGE SINGLE |           | 1  | 133 | 2  |  1   |  1  |
|* 2 |  TABLE ACCESS FULL     | EMP_RANGE | 1  | 133 | 2  |  1   |  1  |
-----------------------------------------------------------------------

In the previous example, only partition 1 is accessed and known at compile time; thus,
there is no need for a partition row source.

Note:
Oracle Database displays the same information for hash partitioned objects,
except the partition row source name is PARTITION HASH instead of
PARTITION RANGE. Also, with hash partitioning, pruning is only possible using
equality or IN-list predicates.
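As an illustration — a sketch only; emp_hash is a hypothetical table not used elsewhere in this chapter — a hash-partitioned variant and a pruning query might look as follows:

```sql
CREATE TABLE emp_hash
  PARTITION BY HASH(department_id) PARTITIONS 4
  AS SELECT * FROM employees;

EXPLAIN PLAN FOR
  SELECT * FROM emp_hash
  WHERE department_id = 50;
```

Because the predicate is an equality on the partitioning column, the plan would contain a PARTITION HASH SINGLE row source rather than PARTITION HASH ALL.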

7.2.5.2 Pruning Information with Composite Partitioned Objects: Examples


To illustrate how Oracle Database displays pruning information for composite
partitioned objects, consider the table emp_comp. It is range-partitioned on hire_date
and subpartitioned by hash on department_id.

CREATE TABLE emp_comp PARTITION BY RANGE(hire_date)
  SUBPARTITION BY HASH(department_id) SUBPARTITIONS 3
(
PARTITION emp_p1 VALUES LESS THAN (TO_DATE('1-JAN-1992','DD-MON-YYYY')),
PARTITION emp_p2 VALUES LESS THAN (TO_DATE('1-JAN-1994','DD-MON-YYYY')),
PARTITION emp_p3 VALUES LESS THAN (TO_DATE('1-JAN-1996','DD-MON-YYYY')),
PARTITION emp_p4 VALUES LESS THAN (TO_DATE('1-JAN-1998','DD-MON-YYYY')),
PARTITION emp_p5 VALUES LESS THAN (TO_DATE('1-JAN-2001','DD-MON-YYYY'))
)
AS SELECT * FROM employees;

For the first example, consider the following statement:

EXPLAIN PLAN FOR
  SELECT * FROM emp_comp;

-----------------------------------------------------------------------
|Id| Operation | Name | Rows | Bytes |Cost|Pstart|Pstop|
-----------------------------------------------------------------------
| 0| SELECT STATEMENT | | 10120 | 1314K| 78 | | |
| 1| PARTITION RANGE ALL| | 10120 | 1314K| 78 | 1 | 5 |
| 2| PARTITION HASH ALL| | 10120 | 1314K| 78 | 1 | 3 |
| 3| TABLE ACCESS FULL| EMP_COMP | 10120 | 1314K| 78 | 1 | 15 |
-----------------------------------------------------------------------

This example shows the plan when Oracle Database accesses all subpartitions of all
partitions of a composite object. The database uses two partition row sources for this
purpose: a range partition row source to iterate over the partitions, and a hash partition row
source to iterate over the subpartitions of each accessed partition.
In this example, the range partition row source iterates from partition 1 to 5, because
the database performs no pruning. Within each partition, the hash partition row source
iterates over subpartitions 1 to 3 of the current partition. As a result, the table access row
source accesses subpartitions 1 to 15. In other words, the database accesses all
subpartitions of the composite object.
The next example prunes on the range dimension:

EXPLAIN PLAN FOR
  SELECT *
  FROM emp_comp
  WHERE hire_date = TO_DATE('15-FEB-1998', 'DD-MON-YYYY');

-----------------------------------------------------------------------
| Id | Operation | Name |Rows|Bytes |Cost|Pstart|Pstop|
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 2660 | 17 | | |
| 1 | PARTITION RANGE SINGLE| | 20 | 2660 | 17 | 5 | 5 |
| 2 | PARTITION HASH ALL | | 20 | 2660 | 17 | 1 | 3 |
|* 3 | TABLE ACCESS FULL | EMP_COMP | 20 | 2660 | 17 | 13 | 15 |
-----------------------------------------------------------------------

In the previous example, only the last partition, partition 5, is accessed. This partition is
known at compile time, so the database does not need to show it in the plan. The hash
partition row source shows accessing of all subpartitions within that partition; that is,
subpartitions 1 to 3, which translates into subpartitions 13 to 15 of the emp_comp table.

Now consider the following statement:

EXPLAIN PLAN FOR
  SELECT *
  FROM emp_comp
  WHERE department_id = 20;

------------------------------------------------------------------------
| Id | Operation |Name |Rows | Bytes |Cost|Pstart|Pstop|
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 101 | 13433 | 78 | | |
| 1 | PARTITION RANGE ALL | | 101 | 13433 | 78 | 1 | 5 |
| 2 | PARTITION HASH SINGLE| | 101 | 13433 | 78 | 3 | 3 |
|* 3 | TABLE ACCESS FULL | EMP_COMP | 101 | 13433 | 78 | | |
------------------------------------------------------------------------

In the previous example, the predicate department_id=20 enables pruning on the hash
dimension within each partition. Therefore, Oracle Database only needs to access a single
subpartition. The number of this subpartition is known at compile time, so the hash partition
row source is not needed.
Finally, consider the following statement:

VARIABLE dno NUMBER;

EXPLAIN PLAN FOR
  SELECT *
  FROM emp_comp
  WHERE department_id = :dno;

-----------------------------------------------------------------------
| Id| Operation | Name |Rows| Bytes |Cost|Pstart|Pstop|
-----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 101| 13433 | 78 | | |
| 1 | PARTITION RANGE ALL | | 101| 13433 | 78 | 1 | 5 |
| 2 | PARTITION HASH SINGLE| | 101| 13433 | 78 | KEY | KEY |
|*3 | TABLE ACCESS FULL | EMP_COMP | 101| 13433 | 78 | | |
-----------------------------------------------------------------------

The last two examples are the same, except that department_id = :dno replaces
department_id = 20. In this last case, the subpartition number is unknown at compile time,
and a hash partition row source is allocated. The option is SINGLE for this row source
because Oracle Database accesses only one subpartition within each partition. In Step
2, both PARTITION_START and PARTITION_STOP are set to KEY. This value means that
Oracle Database determines the number of subpartitions at run time.

7.2.5.3 Examples of Partial Partition-Wise Joins


In these examples, the PQ_DISTRIBUTE hint explicitly forces a partial partition-wise join
because the query optimizer could have chosen a different plan based on cost in this
query.
Example 7-5 Partial Partition-Wise Join with Range Partition
In the following example, the database joins emp_range_did on the partitioning column
department_id and parallelizes it. The database can use a partial partition-wise join
because the dept2 table is not partitioned. Oracle Database dynamically partitions the
dept2 table before the join.

CREATE TABLE dept2 AS SELECT * FROM departments;

ALTER TABLE dept2 PARALLEL 2;

CREATE TABLE emp_range_did PARTITION BY RANGE(department_id)
  (PARTITION emp_p1 VALUES LESS THAN (150),
   PARTITION emp_p5 VALUES LESS THAN (MAXVALUE) )
  AS SELECT * FROM employees;

ALTER TABLE emp_range_did PARALLEL 2;

EXPLAIN PLAN FOR
  SELECT /*+ PQ_DISTRIBUTE(d NONE PARTITION) ORDERED */ e.last_name, d.department_name
  FROM emp_range_did e, dept2 d
  WHERE e.department_id = d.department_id;

-------------------------------------------------------------------------------------------
|Id| Operation                    |Name         |Row|Byte |Cost|Pstart|Pstop|TQ   |IN-OUT|PQ Distrib|
-------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT             |             |284|16188|6   |      |     |     |      |          |
| 1|  PX COORDINATOR              |             |   |     |    |      |     |     |      |          |
| 2|   PX SEND QC (RANDOM)        |:TQ10001     |284|16188|6   |      |     |Q1,01|P->S  |QC (RAND) |
|*3|    HASH JOIN                 |             |284|16188|6   |      |     |Q1,01|PCWP  |          |
| 4|     PX PARTITION RANGE ALL   |             |284|7668 |2   |1     |2    |Q1,01|PCWC  |          |
| 5|      TABLE ACCESS FULL       |EMP_RANGE_DID|284|7668 |2   |1     |2    |Q1,01|PCWP  |          |
| 6|     BUFFER SORT              |             |   |     |    |      |     |Q1,01|PCWC  |          |
| 7|      PX RECEIVE              |             |21 |630  |2   |      |     |Q1,01|PCWP  |          |
| 8|       PX SEND PARTITION (KEY)|:TQ10000     |21 |630  |2   |      |     |     |S->P  |PART (KEY)|
| 9|        TABLE ACCESS FULL     |DEPT2        |21 |630  |2   |      |     |     |      |          |
-------------------------------------------------------------------------------------------

The execution plan shows that the table dept2 is scanned serially and all rows with the same
partitioning column value of emp_range_did (department_id) are sent through a PART
(KEY), or partition key, table queue to the same parallel execution server doing the partial
partition-wise join.
Example 7-6 Partial Partition-Wise Join with Composite Partition
In the following example, emp_comp is joined on its partitioning column and the query runs in parallel. Because dept2 is not partitioned, only a partial partition-wise join is possible, so the database dynamically partitions dept2 before the join.

ALTER TABLE emp_comp PARALLEL 2;

EXPLAIN PLAN FOR
  SELECT /*+ PQ_DISTRIBUTE(d NONE PARTITION) ORDERED */ e.last_name, d.department_name
  FROM   emp_comp e, dept2 d
  WHERE  e.department_id = d.department_id;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

-------------------------------------------------------------------------------------------
| Id| Operation | Name |Rows |Bytes |Cost|Pstart|Pstop|TQ |IN-OUT|PQ Distrib|
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 445 |17800| 5 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) |:TQ10001| 445 |17800| 5 | | | Q1,01 |P->S| QC (RAND)|
|*3 | HASH JOIN | | 445 |17800| 5 | | | Q1,01 |PCWP| |
| 4 | PX PARTITION RANGE ALL | | 107 | 1070| 3 |1 | 5 | Q1,01 |PCWC| |
| 5 | PX PARTITION HASH ALL | | 107 | 1070| 3 |1 | 3 | Q1,01 |PCWC| |
| 6 | TABLE ACCESS FULL |EMP_COMP| 107 | 1070| 3 |1 | 15| Q1,01 |PCWP| |
| 7 | PX RECEIVE | | 21 | 630| 1 | | | Q1,01 |PCWP| |
| 8 | PX SEND PARTITION (KEY)|:TQ10000| 21 | 630| 1 | | | Q1,00 |P->P|PART (KEY)|
| 9 | PX BLOCK ITERATOR | | 21 | 630| 1 | | | Q1,00 |PCWC| |
|10 | TABLE ACCESS FULL |DEPT2 | 21 | 630| 1 | | | Q1,00 |PCWP| |
-------------------------------------------------------------------------------------------

The plan shows that the optimizer selects a partial partition-wise join.
The PX SEND node type is PARTITION (KEY) and the PQ Distrib column contains the text
PART (KEY), or partition key. This implies that the table dept2 is re-partitioned based on the
join column department_id to be sent to the parallel execution servers executing the scan of
EMP_COMP and the join.


7.2.5.4 Example of Full Partition-Wise Join


In this example, emp_comp and dept_hash are joined on their hash partitioning
columns, enabling use of a full partition-wise join.
The PARTITION HASH row source appears on top of the join row source in the plan
table output.

CREATE TABLE dept_hash
   PARTITION BY HASH(department_id)
   PARTITIONS 3
   PARALLEL 2
   AS SELECT * FROM departments;

EXPLAIN PLAN FOR
  SELECT /*+ PQ_DISTRIBUTE(e NONE NONE) ORDERED */ e.last_name, d.department_name
  FROM   emp_comp e, dept_hash d
  WHERE  e.department_id = d.department_id;

-------------------------------------------------------------------------------------------
|Id| Operation              | Name     |Rows|Bytes|Cost|Pstart|Pstop| TQ   |IN-OUT|PQ Distrib|
-------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT       |          | 106| 2544|  8 |      |     |      |      |          |
| 1| PX COORDINATOR         |          |    |     |    |      |     |      |      |          |
| 2| PX SEND QC (RANDOM)    |:TQ10000  | 106| 2544|  8 |      |     | Q1,00| P->S |QC (RAND) |
| 3| PX PARTITION HASH ALL  |          | 106| 2544|  8 | 1    |  3  | Q1,00| PCWC |          |
|*4| HASH JOIN              |          | 106| 2544|  8 |      |     | Q1,00| PCWP |          |
| 5| PX PARTITION RANGE ALL |          | 107| 1070|  3 | 1    |  5  | Q1,00| PCWC |          |
| 6| TABLE ACCESS FULL      |EMP_COMP  | 107| 1070|  3 | 1    | 15  | Q1,00| PCWP |          |
| 7| TABLE ACCESS FULL      |DEPT_HASH |  27|  378|  4 | 1    |  3  | Q1,00| PCWP |          |
-------------------------------------------------------------------------------------------

The PX PARTITION HASH row source appears on top of the join row source in the plan
table output while the PX PARTITION RANGE row source appears over the scan of
emp_comp. Each parallel execution server performs the join of an entire hash partition
of emp_comp with an entire partition of dept_hash.


7.2.5.5 Examples of INLIST ITERATOR and EXPLAIN PLAN


An INLIST ITERATOR operation appears in the EXPLAIN PLAN output if an index implements
an IN-list predicate.

Consider the following statement:

SELECT * FROM emp WHERE empno IN (7876, 7900, 7902);

The EXPLAIN PLAN output appears as follows:

OPERATION OPTIONS OBJECT_NAME
---------------- --------------- --------------
SELECT STATEMENT
INLIST ITERATOR
TABLE ACCESS BY ROWID EMP
INDEX RANGE SCAN EMP_EMPNO

The INLIST ITERATOR operation iterates over the next operation in the plan for each value in
the IN-list predicate. The following sections describe the three possible types of IN-list
columns for partitioned tables and indexes.

7.2.5.5.1 When the IN-List Column is an Index Column: Example


If the IN-list column empno is an index column but not a partition column, then the IN-list
operator appears before the table operation but after the partition operation in the plan.

OPERATION OPTIONS OBJECT_NAME PARTITION_START PARTITION_STOP
---------------- ------------ ----------- --------------- --------------
SELECT STATEMENT
PARTITION RANGE ALL KEY(INLIST) KEY(INLIST)
INLIST ITERATOR
TABLE ACCESS BY LOCAL INDEX ROWID EMP KEY(INLIST) KEY(INLIST)
INDEX RANGE SCAN EMP_EMPNO KEY(INLIST) KEY(INLIST)

The KEY(INLIST) designation for the partition start and stop keys specifies that an IN-list
predicate appears on the index start and stop keys.

7.2.5.5.2 When the IN-List Column is an Index and a Partition Column: Example
If empno is an indexed and a partition column, then the plan contains an INLIST ITERATOR
operation before the partition operation.

OPERATION OPTIONS OBJECT_NAME PARTITION_START PARTITION_STOP
---------------- ------------ ----------- --------------- --------------
SELECT STATEMENT
INLIST ITERATOR
PARTITION RANGE ITERATOR KEY(INLIST) KEY(INLIST)

TABLE ACCESS BY LOCAL INDEX ROWID EMP KEY(INLIST) KEY(INLIST)
INDEX RANGE SCAN EMP_EMPNO KEY(INLIST) KEY(INLIST)

7.2.5.5.3 When the IN-List Column is a Partition Column: Example


If empno is a partition column and no indexes exist, then no INLIST ITERATOR operation
is allocated.

OPERATION OPTIONS OBJECT_NAME PARTITION_START PARTITION_STOP
---------------- ------------ ----------- --------------- --------------
SELECT STATEMENT
PARTITION RANGE INLIST KEY(INLIST) KEY(INLIST)
TABLE ACCESS FULL EMP KEY(INLIST) KEY(INLIST)

If emp_empno is a bitmap index, then the plan is as follows:

OPERATION OPTIONS OBJECT_NAME
---------------- --------------- --------------
SELECT STATEMENT
INLIST ITERATOR
TABLE ACCESS BY INDEX ROWID EMP
BITMAP CONVERSION TO ROWIDS
BITMAP INDEX SINGLE VALUE EMP_EMPNO

7.2.5.6 Example of Domain Indexes and EXPLAIN PLAN


You can use EXPLAIN PLAN to derive user-defined CPU and I/O costs for domain
indexes.
EXPLAIN PLAN displays domain index statistics in the OTHER column of PLAN_TABLE. For
example, assume table emp has user-defined operator CONTAINS with a domain index
emp_resume on the resume column, and the index type of emp_resume supports the
operator CONTAINS. You explain the plan for the following query:

SELECT * FROM emp WHERE CONTAINS(resume, 'Oracle') = 1

The database could display the following plan:

OPERATION OPTIONS OBJECT_NAME OTHER
----------------- ----------- ------------ ----------------
SELECT STATEMENT
TABLE ACCESS BY ROWID EMP
DOMAIN INDEX EMP_RESUME CPU: 300, I/O: 4
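For reference, a domain index such as emp_resume is created through an index type. The following sketch assumes the Oracle Text option supplies the CONTAINS operator; the table and column names are those of the example above:

```sql
-- Hypothetical setup for the preceding example: an Oracle Text
-- CONTEXT index provides the CONTAINS operator used in the query.
CREATE INDEX emp_resume ON emp(resume)
  INDEXTYPE IS CTXSYS.CONTEXT;
```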

7.2.6 PLAN_TABLE Columns


The PLAN_TABLE used by the EXPLAIN PLAN statement contains the columns listed in
this topic.


Table 7-1 PLAN_TABLE Columns

Column Type Description


STATEMENT_ID VARCHAR2(30) Value of the optional STATEMENT_ID parameter
specified in the EXPLAIN PLAN statement.
PLAN_ID NUMBER Unique identifier of a plan in the database.
TIMESTAMP DATE Date and time when the EXPLAIN PLAN
statement was generated.
REMARKS VARCHAR2(80) Any comment (of up to 80 bytes) you want to
associate with each step of the explained plan.
This column indicates whether the database
used an outline or SQL profile for the query.
If you need to add or change a remark on any
row of the PLAN_TABLE, then use the UPDATE
statement to modify the rows of the
PLAN_TABLE.
OPERATION VARCHAR2(30) Name of the internal operation performed in this
step. In the first row generated for a statement,
the column contains one of the following values:
• DELETE STATEMENT
• INSERT STATEMENT
• SELECT STATEMENT
• UPDATE STATEMENT
See Table 7-3 for more information about values
for this column.
OPTIONS VARCHAR2(255) A variation on the operation described in the
OPERATION column.
See Table 7-3 for more information about values
for this column.
OBJECT_NODE VARCHAR2(128) Name of the database link used to reference the
object (a table name or view name). For local
queries using parallel execution, this column
describes the order in which the database
consumes output from operations.
OBJECT_OWNER VARCHAR2(30) Name of the user who owns the schema
containing the table or index.
OBJECT_NAME VARCHAR2(30) Name of the table or index.
OBJECT_ALIAS VARCHAR2(65) Unique alias of a table or view in a SQL
statement. For indexes, it is the object alias of
the underlying table.
OBJECT_INSTANCE NUMERIC Number corresponding to the ordinal position of
the object as it appears in the original statement.
The numbering proceeds from left to right, outer
to inner for the original statement text. View
expansion results in unpredictable numbers.
OBJECT_TYPE VARCHAR2(30) Modifier that provides descriptive information
about the object; for example, NON-UNIQUE for
indexes.
OPTIMIZER VARCHAR2(255) Current mode of the optimizer.
SEARCH_COLUMNS NUMERIC Not currently used.


ID NUMERIC A number assigned to each step in the execution
plan.
PARENT_ID NUMERIC The ID of the next execution step that operates
on the output of the ID step.
DEPTH NUMERIC Depth of the operation in the row source tree
that the plan represents. You can use the value
to indent the rows in a plan table report.
POSITION NUMERIC For the first row of output, this indicates the
optimizer's estimated cost of executing the
statement. For the other rows, it indicates the
position relative to the other children of the same
parent.
COST NUMERIC Cost of the operation as estimated by the
optimizer's query approach. Cost is not
determined for table access operations. The
value of this column does not have any
particular unit of measurement; it is a weighted
value used to compare costs of execution plans.
The value of this column is a function of the
CPU_COST and IO_COST columns.
CARDINALITY NUMERIC Estimate by the query optimization approach of
the number of rows that the operation accessed.
BYTES NUMERIC Estimate by the query optimization approach of
the number of bytes that the operation
accessed.


OTHER_TAG VARCHAR2(255) Describes the contents of the OTHER column.
Values are:
• SERIAL (blank): Serial execution. Currently,
SQL is not loaded in the OTHER column for
this case.
• SERIAL_FROM_REMOTE (S -> R): Serial
execution at a remote site.
• PARALLEL_FROM_SERIAL (S -> P):
Serial execution. Output of step is
partitioned or broadcast to parallel
execution servers.
• PARALLEL_TO_SERIAL (P -> S): Parallel
execution. Output of step is returned to
serial QC process.
• PARALLEL_TO_PARALLEL (P -> P):
Parallel execution. Output of step is
repartitioned to second set of parallel
execution servers.
• PARALLEL_COMBINED_WITH_PARENT
(PWP): Parallel execution; Output of step
goes to next step in same parallel process.
No interprocess communication to parent.
• PARALLEL_COMBINED_WITH_CHILD
(PWC): Parallel execution. Input of step
comes from prior step in same parallel
process. No interprocess communication
from child.
PARTITION_START VARCHAR2(255) Start partition of a range of accessed partitions.
It can take one of the following values:
n indicates that the start partition has been
identified by the SQL compiler, and its partition
number is given by n.
KEY indicates that the start partition is identified
at run time from partitioning key values.
ROW REMOVE_LOCATION indicates that the
database computes the start partition (same as
the stop partition) at run time from the location of
each retrieved record. The record location is
obtained by a user or from a global index.
INVALID indicates that the range of accessed
partitions is empty.


PARTITION_STOP VARCHAR2(255) Stop partition of a range of accessed partitions.
It can take one of the following values:
n indicates that the stop partition has been
identified by the SQL compiler, and its partition
number is given by n.
KEY indicates that the stop partition is identified
at run time from partitioning key values.
ROW REMOVE_LOCATION indicates that the
database computes the stop partition (same as
the start partition) at run time from the location
of each retrieved record. The record location is
obtained by a user or from a global index.
INVALID indicates that the range of accessed
partitions is empty.
PARTITION_ID NUMERIC Step that has computed the pair of values of the
PARTITION_START and PARTITION_STOP
columns.
OTHER LONG Other information that is specific to the execution
step that a user might find useful. See the
OTHER_TAG column.
DISTRIBUTION VARCHAR2(30) Method used to distribute rows from producer
query servers to consumer query servers.
See Table 7-2 for more information about the
possible values for this column. For more
information about consumer and producer query
servers, see Oracle Database VLDB and
Partitioning Guide.
CPU_COST NUMERIC CPU cost of the operation as estimated by the
query optimizer's approach. The value of this
column is proportional to the number of machine
cycles required for the operation. For statements
that use the rule-based approach, this column is
null.
IO_COST NUMERIC I/O cost of the operation as estimated by the
query optimizer's approach. The value of this
column is proportional to the number of data
blocks read by the operation. For statements
that use the rule-based approach, this column is
null.
TEMP_SPACE NUMERIC Temporary space, in bytes, that the operation
uses as estimated by the query optimizer's
approach. For statements that use the rule-
based approach, or for operations that do not
use any temporary space, this column is null.
ACCESS_PREDICATES VARCHAR2(4000) Predicates used to locate rows in an access
structure. For example, start or stop predicates
for an index range scan.
FILTER_PREDICATES VARCHAR2(4000) Predicates used to filter rows before producing
them.


PROJECTION VARCHAR2(4000) Expressions produced by the operation.
TIME NUMBER(20,2) Elapsed time in seconds of the operation as
estimated by query optimization. For statements
that use the rule-based approach, this column is
null. In DBMS_XPLAN.DISPLAY_PLAN output, the
time is in the HH:MM:SS format.
QBLOCK_NAME VARCHAR2(30) Name of the query block, either system-
generated or defined by the user with the
QB_NAME hint.
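As noted for the REMARKS column above, you can annotate individual plan steps with an ordinary UPDATE against the plan table. In this sketch, the STATEMENT_ID value 'st1' and the step ID 3 are hypothetical:

```sql
-- Add a remark to one step of a previously explained plan.
-- 'st1' and step ID 3 are illustrative values only.
UPDATE plan_table
SET    remarks = 'Review this index range scan'
WHERE  statement_id = 'st1'
AND    id = 3;
```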

Table 7-2 describes the values that can appear in the DISTRIBUTION column:

Table 7-2 Values of DISTRIBUTION Column of the PLAN_TABLE

DISTRIBUTION Text Interpretation


PARTITION (ROWID) Maps rows to query servers based on the partitioning of a table or index, using the rowid of
the row to UPDATE or DELETE.
PARTITION (KEY) Maps rows to query servers based on the partitioning of a table or index using a set of
columns. Used for partial partition-wise join, PARALLEL INSERT, CREATE TABLE AS
SELECT of a partitioned table, and CREATE PARTITIONED GLOBAL INDEX.
HASH Maps rows to query servers using a hash function on the join key. Used for PARALLEL
JOIN or PARALLEL GROUP BY.
RANGE Maps rows to query servers using ranges of the sort key. Used when the statement
contains an ORDER BY clause.
ROUND-ROBIN Randomly maps rows to query servers.
BROADCAST Broadcasts the rows of the entire table to each query server. Used for a parallel join when
one table is very small compared to the other.
QC (ORDER) The QC consumes the input in order, from the first to the last query server. Used when the
statement contains an ORDER BY clause.
QC (RANDOM) The QC consumes the input randomly. Used when the statement does not have an ORDER
BY clause.
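The distribution method that appears in this column can be influenced with the PQ_DISTRIBUTE hint, as in the partition-wise join examples earlier in this chapter. For instance, a hash-hash distribution for a parallel join might be requested as follows (the tables and join column follow the earlier examples):

```sql
EXPLAIN PLAN FOR
  SELECT /*+ PARALLEL(e 2) PARALLEL(d 2) PQ_DISTRIBUTE(d HASH HASH) */
         e.last_name, d.department_name
  FROM   employees e, departments d
  WHERE  e.department_id = d.department_id;
```

With this hint, the PX SEND steps of the resulting plan should show a HASH distribution.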

Table 7-3 lists each combination of OPERATION and OPTIONS produced by the EXPLAIN PLAN
statement and its meaning within an execution plan.

Table 7-3 OPERATION and OPTIONS Values Produced by EXPLAIN PLAN

Operation Option Description


AND-EQUAL Operation accepting multiple sets of rowids, returning the
intersection of the sets, eliminating duplicates. Used for the
single-column indexes access path.


BITMAP CONVERSION TO ROWIDS converts bitmap representations to actual
rowids that you can use to access the table.
FROM ROWIDS converts the rowids to a bitmap
representation.
COUNT returns the number of rowids if the actual values are
not needed.
BITMAP INDEX SINGLE VALUE looks up the bitmap for a single key value in
the index.
RANGE SCAN retrieves bitmaps for a key value range.
FULL SCAN performs a full scan of a bitmap index if there is
no start or stop key.
BITMAP MERGE Merges several bitmaps resulting from a range scan into
one bitmap.
BITMAP MINUS Subtracts bits of one bitmap from another. Row source is
used for negated predicates. Use this option only if there
are nonnegated predicates yielding a bitmap from which the
subtraction can take place. An example appears in "Viewing
Bitmap Indexes with EXPLAIN PLAN".
BITMAP OR Computes the bitwise OR of two bitmaps.
BITMAP AND Computes the bitwise AND of two bitmaps.
BITMAP KEY ITERATION Takes each row from a table row source and finds the
corresponding bitmap from a bitmap index. These
bitmaps are then merged into one bitmap in a following
BITMAP MERGE operation.
CONNECT BY Retrieves rows in hierarchical order for a query containing a
CONNECT BY clause.
CONCATENATION Operation accepting multiple sets of rows and returning the
union-all of the sets.
COUNT Operation counting the number of rows selected from a
table.
COUNT STOPKEY Count operation where the number of rows returned is
limited by the ROWNUM expression in the WHERE clause.
CUBE SCAN Uses inner joins for all cube access.
CUBE SCAN PARTIAL OUTER Uses an outer join for at least one dimension, and inner
joins for the other dimensions.
CUBE SCAN OUTER Uses outer joins for all cube access.
DOMAIN INDEX Retrieval of one or more rowids from a domain index. The
OPTIONS column contains information supplied by a user-
defined domain index cost function, if any.
FILTER Operation accepting a set of rows, eliminating some of them,
and returning the rest.
FIRST ROW Retrieval of only the first row selected by a query.
FOR UPDATE Operation retrieving and locking the rows selected by a
query containing a FOR UPDATE clause.


HASH GROUP BY Operation hashing a set of rows into groups for a query with
a GROUP BY clause.
HASH GROUP BY PIVOT Operation hashing a set of rows into groups for a query with
a GROUP BY clause. The PIVOT option indicates a pivot-
specific optimization for the HASH GROUP BY operator.
HASH JOIN Operation joining two sets of rows and returning the result.
(These are join operations.) This join method is useful for joining large data sets
(DSS, batch). The join condition is an efficient way of
accessing the second table.
The query optimizer uses the smaller of the two tables or data
sources to build a hash table on the join key in memory,
then scans the larger table, probing the hash table to find
the joined rows.
HASH JOIN ANTI Hash (left) antijoin
HASH JOIN SEMI Hash (left) semijoin
HASH JOIN RIGHT ANTI Hash right antijoin
HASH JOIN RIGHT SEMI Hash right semijoin
HASH JOIN OUTER Hash (left) outer join
HASH JOIN RIGHT OUTER Hash right outer join
INDEX UNIQUE SCAN Retrieval of a single rowid from an index.
(These are access methods.)
INDEX RANGE SCAN Retrieval of one or more rowids from an index. Indexed
values are scanned in ascending order.
INDEX RANGE SCAN Retrieval of one or more rowids from an index. Indexed
DESCENDING values are scanned in descending order.
INDEX FULL SCAN Retrieval of all rowids from an index when there is no start
or stop key. Indexed values are scanned in ascending order.
INDEX FULL SCAN Retrieval of all rowids from an index when there is no start
DESCENDING or stop key. Indexed values are scanned in descending
order.
INDEX FAST FULL SCAN Retrieval of all rowids (and column values) using multiblock
reads. No sorting order can be defined. Compares to a full
table scan on only the indexed columns. Only available with
the cost based optimizer.
INDEX SKIP SCAN Retrieval of rowids from a concatenated index without using
the leading column(s) in the index. Only available with the
cost based optimizer.
INLIST ITERATOR Iterates over the next operation in the plan for each value in
the IN-list predicate.
INTERSECTION Operation accepting two sets of rows and returning the
intersection of the sets, eliminating duplicates.
MERGE JOIN Operation accepting two sets of rows, each sorted by a
(These are join operations.) value, combining each row from one set with the matching
rows from the other, and returning the result.


MERGE JOIN OUTER Merge join operation to perform an outer join statement.
MERGE JOIN ANTI Merge antijoin.
MERGE JOIN SEMI Merge semijoin.
MERGE JOIN CARTESIAN Can result from one or more of the tables not having any join
conditions to any other tables in the statement. Can occur
even with a join, and it may not be flagged as CARTESIAN in
the plan.
CONNECT BY Retrieval of rows in hierarchical order for a query containing
a CONNECT BY clause.
MAT_VIEW REWRITE ACCESS FULL Retrieval of all rows from a materialized view.
(These are access methods.)
MAT_VIEW REWRITE ACCESS SAMPLE Retrieval of sampled rows from a materialized view.
MAT_VIEW REWRITE ACCESS CLUSTER Retrieval of rows from a materialized view based on a value
of an indexed cluster key.
MAT_VIEW REWRITE ACCESS HASH Retrieval of rows from a materialized view based on a hash
cluster key value.
MAT_VIEW REWRITE ACCESS BY ROWID RANGE Retrieval of rows from a materialized view based on a rowid
range.
MAT_VIEW REWRITE ACCESS SAMPLE BY ROWID RANGE Retrieval of sampled rows from a materialized view based
on a rowid range.
MAT_VIEW REWRITE ACCESS BY USER ROWID If the materialized view rows are located using user-
supplied rowids.
MAT_VIEW REWRITE ACCESS BY INDEX ROWID If the materialized view is nonpartitioned and rows are
located using index(es).
MAT_VIEW REWRITE ACCESS BY GLOBAL INDEX ROWID If the materialized view is partitioned and rows are located
using only global indexes.


MAT_VIEW REWRITE ACCESS BY LOCAL INDEX ROWID If the materialized view is partitioned and rows are located
using one or more local indexes and possibly some global
indexes.
Partition Boundaries:
The partition boundaries might have been computed by:
A previous PARTITION step, in which case the
PARTITION_START and PARTITION_STOP column values
replicate the values present in the PARTITION step, and the
PARTITION_ID contains the ID of the PARTITION step.
Possible values for PARTITION_START and
PARTITION_STOP are NUMBER(n), KEY, INVALID.
The MAT_VIEW REWRITE ACCESS or INDEX step itself, in
which case the PARTITION_ID contains the ID of the step.
Possible values for PARTITION_START and
PARTITION_STOP are NUMBER(n), KEY, ROW
REMOVE_LOCATION (MAT_VIEW REWRITE ACCESS only),
and INVALID.
MINUS Operation accepting two sets of rows and returning rows
appearing in the first set but not in the second, eliminating
duplicates.
NESTED LOOPS Operation accepting two sets of rows, an outer set and an
(These are join operations.) inner set. Oracle Database compares each row of the outer
set with each row of the inner set, returning rows that satisfy
a condition. This join method is useful for joining small
subsets of data (OLTP). The join condition is an efficient
way of accessing the second table.
NESTED LOOPS OUTER Nested loops operation to perform an outer join statement.
PARTITION Iterates over the next operation in the plan for each partition
in the range given by the PARTITION_START and
PARTITION_STOP columns. PARTITION describes partition
boundaries applicable to a single partitioned object (table or
index) or to a set of equipartitioned objects (a partitioned
table and its local indexes). The partition boundaries are
provided by the values of PARTITION_START and
PARTITION_STOP of the PARTITION. Refer to Table 7-1 for
valid values of partition start and stop.
PARTITION SINGLE Access one partition.
PARTITION ITERATOR Access many partitions (a subset).
PARTITION ALL Access all partitions.
PARTITION INLIST Similar to iterator, but based on an IN-list predicate.
PARTITION INVALID Indicates that the partition set to be accessed is empty.
PX ITERATOR BLOCK, CHUNK Implements the division of an object into block or chunk
ranges among a set of parallel execution servers.
PX COORDINATOR Implements the query coordinator that controls, schedules,
and executes the parallel plan below it using parallel
execution servers. It also represents a serialization point, as
the end of the part of the plan executed in parallel and
always has a PX SEND QC operation below it.


PX PARTITION Same semantics as the regular PARTITION operation
except that it appears in a parallel plan.
PX RECEIVE Shows the consumer/receiver parallel execution node
reading repartitioned data from a send/producer (QC or
parallel execution server) executing on a PX SEND node.
This information was formerly displayed in the
DISTRIBUTION column. See Table 7-2.
PX SEND QC (RANDOM), Implements the distribution method taking place between
HASH, RANGE two parallel execution servers. Shows the boundary
between two sets and how data is repartitioned on the
send/producer side. This information was formerly displayed
in the DISTRIBUTION column. See Table 7-2.
REMOTE Retrieval of data from a remote database.
SEQUENCE Operation involving accessing values of a sequence.
SORT AGGREGATE Retrieval of a single row that is the result of applying a
group function to a group of selected rows.
SORT UNIQUE Operation sorting a set of rows to eliminate duplicates.
SORT GROUP BY Operation sorting a set of rows into groups for a query with
a GROUP BY clause.
SORT GROUP BY PIVOT Operation sorting a set of rows into groups for a query with
a GROUP BY clause. The PIVOT option indicates a pivot-
specific optimization for the SORT GROUP BY operator.
SORT JOIN Operation sorting a set of rows before a merge-join.
SORT ORDER BY Operation sorting a set of rows for a query with an ORDER
BY clause.
TABLE ACCESS FULL Retrieval of all rows from a table.
(These are access methods.)
TABLE ACCESS SAMPLE Retrieval of sampled rows from a table.
TABLE ACCESS CLUSTER Retrieval of rows from a table based on a value of an
indexed cluster key.
TABLE ACCESS HASH Retrieval of rows from table based on hash cluster key
value.
TABLE ACCESS BY ROWID RANGE Retrieval of rows from a table based on a rowid range.
TABLE ACCESS SAMPLE BY Retrieval of sampled rows from a table based on a rowid
ROWID RANGE range.
TABLE ACCESS BY USER ROWID If the table rows are located using user-supplied rowids.
TABLE ACCESS BY INDEX ROWID If the table is nonpartitioned and rows are located using
indexes.
TABLE ACCESS BY GLOBAL If the table is partitioned and rows are located using only
INDEX ROWID global indexes.


TABLE ACCESS BY LOCAL INDEX If the table is partitioned and rows are located using one or
ROWID more local indexes and possibly some global indexes.
Partition Boundaries:
The partition boundaries might have been computed by:
A previous PARTITION step, in which case the
PARTITION_START and PARTITION_STOP column values
replicate the values present in the PARTITION step, and the
PARTITION_ID contains the ID of the PARTITION step.
Possible values for PARTITION_START and
PARTITION_STOP are NUMBER(n), KEY, INVALID.
The TABLE ACCESS or INDEX step itself, in which case the
PARTITION_ID contains the ID of the step. Possible values
for PARTITION_START and PARTITION_STOP are
NUMBER(n), KEY, ROW REMOVE_LOCATION (TABLE ACCESS
only), and INVALID.
TRANSPOSE Operation evaluating a PIVOT operation by transposing the
results of GROUP BY to produce the final pivoted data.
UNION Operation accepting two sets of rows and returning the union
of the sets, eliminating duplicates.
UNPIVOT Operation that rotates data from columns into rows.
VIEW Operation performing a view's query and then returning the
resulting rows to another operation.

See Also:
Oracle Database Reference for more information about PLAN_TABLE

7.3 Execution Plan Reference


This section describes execution plan views and PLAN_TABLE columns.

7.3.1 Execution Plan Views


The following dynamic performance and data dictionary views provide information on
execution plans.


Table 7-4 Execution Plan Views

View Description
V$SQL_SHARED_CURSOR Explains why a particular child cursor is not shared with
existing child cursors. Each column identifies a specific
reason why the cursor cannot be shared.
The USE_FEEDBACK_STATS column shows whether a
child cursor fails to match because of reoptimization.
V$SQL_PLAN Includes a superset of all rows appearing in all final
plans. PLAN_LINE_ID is consecutively numbered, but
for a single final plan, the IDs may not be consecutive.
V$SQL_PLAN_STATISTICS_ALL Contains memory usage statistics for row sources that
use SQL memory (sort or hash join). This view
concatenates information in V$SQL_PLAN with
execution statistics from V$SQL_PLAN_STATISTICS
and V$SQL_WORKAREA.
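For example, you can query the cursor plan rows for a given statement directly from V$SQL_PLAN. The SQL_ID below is a placeholder; obtain a real value from V$SQL:

```sql
-- Hypothetical SQL_ID; look up the real one in V$SQL first.
SELECT id, operation, options, object_name, cost
FROM   v$sql_plan
WHERE  sql_id = 'abc1234567890'
AND    child_number = 0
ORDER  BY id;
```

In practice, DBMS_XPLAN.DISPLAY_CURSOR produces formatted output from these views.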

7.3.2 PLAN_TABLE Columns


The PLAN_TABLE is used by the EXPLAIN PLAN statement.

PLAN_TABLE contains the columns listed in Table 7-5.

Table 7-5 PLAN_TABLE Columns

Column Type Description


STATEMENT_ID VARCHAR2(30) Value of the optional STATEMENT_ID parameter specified
in the EXPLAIN PLAN statement.
PLAN_ID NUMBER Unique identifier of a plan in the database.
TIMESTAMP DATE Date and time when the EXPLAIN PLAN statement was
generated.
REMARKS VARCHAR2(80) Any comment (of up to 80 bytes) you want to associate
with each step of the explained plan. This column
indicates whether the database used an outline or SQL
profile for the query.
If you need to add or change a remark on any row of the
PLAN_TABLE, then use the UPDATE statement to modify
the rows of the PLAN_TABLE.
OPERATION VARCHAR2(30) Name of the internal operation performed in this step. In
the first row generated for a statement, the column
contains one of the following values:
• DELETE STATEMENT
• INSERT STATEMENT
• SELECT STATEMENT
• UPDATE STATEMENT
See Table 7-6 for more information about values for this
column.



OPTIONS VARCHAR2(255) A variation on the operation that the OPERATION column
describes.
See Table 7-6 for more information about values for this
column.
OBJECT_NODE VARCHAR2(128) Name of the database link used to reference the object
(a table name or view name). For local queries using
parallel execution, this column describes the order in
which the database consumes output from operations.
OBJECT_OWNER VARCHAR2(30) Name of the user who owns the schema containing the
table or index.
OBJECT_NAME VARCHAR2(30) Name of the table or index.
OBJECT_ALIAS VARCHAR2(65) Unique alias of a table or view in a SQL statement. For
indexes, it is the object alias of the underlying table.
OBJECT_INSTANCE NUMERIC Number corresponding to the ordinal position of the
object as it appears in the original statement. The
numbering proceeds from left to right, outer to inner for
the original statement text. View expansion results in
unpredictable numbers.
OBJECT_TYPE VARCHAR2(30) Modifier that provides descriptive information about the
object; for example, NONUNIQUE for indexes.
OPTIMIZER VARCHAR2(255) Current mode of the optimizer.
SEARCH_COLUMNS NUMERIC Not currently used.
ID NUMERIC A number assigned to each step in the execution plan.
PARENT_ID NUMERIC The ID of the next execution step that operates on the
output of the ID step.
DEPTH NUMERIC Depth of the operation in the row source tree that the
plan represents. You can use this value to indent the
rows in a plan table report.
POSITION NUMERIC For the first row of output, this indicates the optimizer's
estimated cost of executing the statement. For the other
rows, it indicates the position relative to the other children
of the same parent.
COST NUMERIC Cost of the operation as estimated by the optimizer's
query approach. Cost is not determined for table access
operations. The value of this column does not have any
particular unit of measurement; it is a weighted value
used to compare costs of execution plans. The value of
this column is a function of the CPU_COST and IO_COST
columns.
CARDINALITY NUMERIC Estimate by the query optimization approach of the
number of rows that the operation accessed.
BYTES NUMERIC Estimate by the query optimization approach of the
number of bytes that the operation accessed.

OTHER_TAG VARCHAR2(255) Describes the contents of the OTHER column. Values are:
• SERIAL (blank): Serial execution. Currently, SQL is
not loaded in the OTHER column for this case.
• SERIAL_FROM_REMOTE (S -> R): Serial execution
at a remote site.
• PARALLEL_FROM_SERIAL (S -> P): Serial
execution. Output of step is partitioned or broadcast
to parallel execution servers.
• PARALLEL_TO_SERIAL (P -> S): Parallel
execution. Output of step is returned to serial QC
process.
• PARALLEL_TO_PARALLEL (P -> P): Parallel
execution. Output of step is repartitioned to second
set of parallel execution servers.
• PARALLEL_COMBINED_WITH_PARENT (PWP):
Parallel execution; Output of step goes to next step
in same parallel process. No interprocess
communication to parent.
• PARALLEL_COMBINED_WITH_CHILD (PWC): Parallel
execution. Input of step comes from prior step in
same parallel process. No interprocess
communication from child.
PARTITION_START VARCHAR2(255) Start partition of a range of accessed partitions. It can
take one of the following values:
n indicates that the start partition has been identified by
the SQL compiler, and its partition number is given by n.
KEY indicates that the start partition is identified at run
time from partitioning key values.
ROW LOCATION indicates that the database computes
the start partition (same as the stop partition) at run time
from the location of each retrieved record. The record
location is obtained by a user-specified ROWID or from a
global index.
INVALID indicates that the range of accessed partitions
is empty.
PARTITION_STOP VARCHAR2(255) Stop partition of a range of accessed partitions. It can
take one of the following values:
n indicates that the stop partition has been identified by
the SQL compiler, and its partition number is given by n.
KEY indicates that the stop partition is identified at run
time from partitioning key values.
ROW LOCATION indicates that the database computes
the stop partition (same as the start partition) at run time
from the location of each retrieved record. The record
location is obtained by a user or from a global index.
INVALID indicates that the range of accessed partitions
is empty.
PARTITION_ID NUMERIC Step that has computed the pair of values of the
PARTITION_START and PARTITION_STOP columns.

OTHER LONG Other information that is specific to the execution step
that a user might find useful. See the OTHER_TAG
column.
DISTRIBUTION VARCHAR2(30) Method used to distribute rows from producer query
servers to consumer query servers.
See "Table 7-6" for more information about the possible
values for this column. For more information about
consumer and producer query servers, see Oracle
Database VLDB and Partitioning Guide.
CPU_COST NUMERIC CPU cost of the operation as estimated by the query
optimizer's approach. The value of this column is
proportional to the number of machine cycles required for
the operation. For statements that use the rule-based
approach, this column is null.
IO_COST NUMERIC I/O cost of the operation as estimated by the query
optimizer's approach. The value of this column is
proportional to the number of data blocks read by the
operation. For statements that use the rule-based
approach, this column is null.
TEMP_SPACE NUMERIC Temporary space, in bytes, used by the operation as
estimated by the query optimizer's approach. For
statements that use the rule-based approach, or for
operations that do not use any temporary space, this
column is null.
ACCESS_PREDICATES VARCHAR2(4000) Predicates used to locate rows in an access structure.
For example, start or stop predicates for an index range
scan.
FILTER_PREDICATES VARCHAR2(4000) Predicates used to filter rows before producing them.
PROJECTION VARCHAR2(4000) Expressions produced by the operation.
TIME NUMBER(20,2) Elapsed time in seconds of the operation as estimated by
query optimization. For statements that use the rule-
based approach, this column is null.
QBLOCK_NAME VARCHAR2(30) Name of the query block, either system-generated or
defined by the user with the QB_NAME hint.
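As an illustration of the QBLOCK_NAME and predicate columns, the following sketch names a query block with the QB_NAME hint and then retrieves the matching PLAN_TABLE rows (the statement ID and sh tables are assumptions):

```sql
EXPLAIN PLAN
  SET STATEMENT_ID = 'qb1' FOR
SELECT /*+ QB_NAME(sales_q) */ *
FROM   sh.sales
WHERE  amount_sold > 1000;

-- QBLOCK_NAME shows the user-defined name; the predicate
-- columns show where the filter is applied.
SELECT id, operation, qblock_name, access_predicates, filter_predicates
FROM   plan_table
WHERE  statement_id = 'qb1'
ORDER  BY id;
```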

Table 7-6 Values of DISTRIBUTION Column of the PLAN_TABLE

DISTRIBUTION Text Interpretation


PARTITION (ROWID) Maps rows to query servers based on the partitioning of a table or index using
the rowid of the row to UPDATE/DELETE.
PARTITION (KEY) Maps rows to query servers based on the partitioning of a table or index using a
set of columns. Used for partial partition-wise join, PARALLEL INSERT, CREATE
TABLE AS SELECT of a partitioned table, and CREATE PARTITIONED GLOBAL
INDEX.
HASH Maps rows to query servers using a hash function on the join key. Used for
PARALLEL JOIN or PARALLEL GROUP BY.

RANGE Maps rows to query servers using ranges of the sort key. Used when the
statement contains an ORDER BY clause.
ROUND-ROBIN Randomly maps rows to query servers.
BROADCAST Broadcasts the rows of the entire table to each query server. Used for a parallel
join when one table is very small compared to the other.
QC (ORDER) The QC consumes the input in order, from the first to the last query server.
Used when the statement contains an ORDER BY clause.
QC (RANDOM) The QC consumes the input randomly. Used when the statement does not have
an ORDER BY clause.
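To see these distribution methods in a plan, you can explain a parallel statement and inspect the OTHER_TAG and DISTRIBUTION columns. A hedged sketch (the degree of parallelism and the sh.sales table are illustrative):

```sql
EXPLAIN PLAN
  SET STATEMENT_ID = 'px1' FOR
SELECT /*+ PARALLEL(s 4) */ s.prod_id, SUM(s.amount_sold)
FROM   sh.sales s
GROUP  BY s.prod_id;

-- PX steps typically show tags such as P -> S and
-- distribution methods such as HASH or QC (RANDOM).
SELECT id, operation, options, other_tag, distribution
FROM   plan_table
WHERE  statement_id = 'px1'
ORDER  BY id;
```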

Table 7-7 lists each combination of OPERATION and OPTIONS produced by the EXPLAIN
PLAN statement and its meaning within an execution plan.

Table 7-7 OPERATION and OPTIONS Values Produced by EXPLAIN PLAN

Operation Option Description


AND-EQUAL Operation accepting multiple sets of rowids, returning the
intersection of the sets, eliminating duplicates. Used for the single-
column indexes access path.
BITMAP CONVERSION TO ROWIDS converts bitmap representations to actual rowids that
you can use to access the table.
FROM ROWIDS converts the rowids to a bitmap representation.
COUNT returns the number of rowids if the actual values are not
needed.
BITMAP INDEX SINGLE VALUE looks up the bitmap for a single key value in the
index.
RANGE SCAN retrieves bitmaps for a key value range.
FULL SCAN performs a full scan of a bitmap index if there is no start
or stop key.
BITMAP MERGE Merges several bitmaps resulting from a range scan into one bitmap.
BITMAP MINUS Subtracts bits of one bitmap from another. Row source is used for
negated predicates. This option is usable only if there are non-
negated predicates yielding a bitmap from which the subtraction can
take place.
BITMAP OR Computes the bitwise OR of two bitmaps.
BITMAP AND Computes the bitwise AND of two bitmaps.
BITMAP KEY ITERATION Takes each row from a table row source and finds the corresponding
bitmap from a bitmap index. This set of bitmaps is then merged
into one bitmap in a following BITMAP MERGE operation.
CONNECT BY Retrieves rows in hierarchical order for a query containing a
CONNECT BY clause.
CONCATENATION Operation accepting multiple sets of rows returning the union-all of
the sets.
COUNT Operation counting the number of rows selected from a table.

COUNT STOPKEY Count operation where the number of rows returned is limited by the
ROWNUM expression in the WHERE clause.
CUBE JOIN Joins a table or view on the left and a cube on the right.
See Oracle Database SQL Language Reference to learn about the
NO_USE_CUBE and USE_CUBE hints.
CUBE JOIN ANTI Uses an antijoin for a table or view on the left and a cube on the
right.
CUBE JOIN ANTI SNA Uses an antijoin (single-sided null aware) for a table or view on the
left and a cube on the right. The join column on the right (cube side)
is NOT NULL.
CUBE JOIN OUTER Uses an outer join for a table or view on the left and a cube on the
right.
CUBE JOIN RIGHT SEMI Uses a right semijoin for a table or view on the left and a cube on the
right.
CUBE SCAN Uses inner joins for all cube access.
CUBE SCAN PARTIAL OUTER Uses an outer join for at least one dimension, and inner joins for the
other dimensions.
CUBE SCAN OUTER Uses outer joins for all cube access.
DOMAIN INDEX Retrieval of one or more rowids from a domain index. The OPTIONS
column contains information supplied by a user-defined domain index
cost function, if any.
FILTER Operation accepting a set of rows, eliminating some of them, and
returning the rest.
FIRST ROW Retrieval of only the first row selected by a query.
FOR UPDATE Operation retrieving and locking the rows selected by a query
containing a FOR UPDATE clause.
HASH GROUP BY Operation hashing a set of rows into groups for a query with a
GROUP BY clause.
HASH GROUP BY PIVOT Operation hashing a set of rows into groups for a query with a
GROUP BY clause. The PIVOT option indicates a pivot-specific
optimization for the HASH GROUP BY operator.
HASH JOIN (These are join operations.) Operation joining two sets of rows and
returning the result. This join method is useful for joining large sets of
data (DSS, batch). The join condition is an efficient way of accessing the
second table. The query optimizer uses the smaller of the two tables or data
sources to build a hash table on the join key in memory, and then scans the
larger table, probing the hash table to find the joined rows.
HASH JOIN ANTI Hash (left) antijoin
HASH JOIN SEMI Hash (left) semijoin
HASH JOIN RIGHT ANTI Hash right antijoin
HASH JOIN RIGHT SEMI Hash right semijoin
HASH JOIN OUTER Hash (left) outer join
HASH JOIN RIGHT OUTER Hash right outer join

INDEX UNIQUE SCAN Retrieval of a single rowid from an index.
(These are access
methods.)
INDEX RANGE SCAN Retrieval of one or more rowids from an index. Indexed values are
scanned in ascending order.
INDEX RANGE SCAN Retrieval of one or more rowids from an index. Indexed values are
DESCENDING scanned in descending order.
INDEX FULL SCAN Retrieval of all rowids from an index when there is no start or stop
key. Indexed values are scanned in ascending order.
INDEX FULL SCAN Retrieval of all rowids from an index when there is no start or stop
DESCENDING key. Indexed values are scanned in descending order.
INDEX FAST FULL SCAN Retrieval of all rowids (and column values) using multiblock reads.
No sorting order can be defined. Compares to a full table scan on
only the indexed columns. Only available with the cost based
optimizer.
INDEX SKIP SCAN Retrieval of rowids from a concatenated index without using the
leading column(s) in the index. Only available with the cost based
optimizer.
INLIST ITERATOR Iterates over the next operation in the plan for each value in the IN-
list predicate.
INTERSECTION Operation accepting two sets of rows and returning the intersection
of the sets, eliminating duplicates.
MERGE JOIN (These are join operations.) Operation accepting two sets of rows,
each sorted by a value, combining each row from one set with the matching
rows from the other, and returning the result.

MERGE JOIN OUTER Merge join operation to perform an outer join statement.
MERGE JOIN ANTI Merge antijoin.
MERGE JOIN SEMI Merge semijoin.
MERGE JOIN CARTESIAN Can result when one or more of the tables has no join
condition to any other table in the statement. Can occur even with
a join, and may not be flagged as CARTESIAN in the plan.
CONNECT BY Retrieval of rows in hierarchical order for a query containing a
CONNECT BY clause.
MAT_VIEW REWRITE FULL Retrieval of all rows from a materialized view.
ACCESS
(These are access
methods.)
MAT_VIEW REWRITE SAMPLE Retrieval of sampled rows from a materialized view.
ACCESS
MAT_VIEW REWRITE CLUSTER Retrieval of rows from a materialized view based on a value of an
ACCESS indexed cluster key.
MAT_VIEW REWRITE HASH Retrieval of rows from a materialized view based on a hash cluster
ACCESS key value.
MAT_VIEW REWRITE BY ROWID RANGE Retrieval of rows from a materialized view based on a rowid range.
ACCESS
MAT_VIEW REWRITE SAMPLE BY ROWID Retrieval of sampled rows from a materialized view based on a rowid
ACCESS RANGE range.
MAT_VIEW REWRITE BY USER ROWID If the materialized view rows are located using user-supplied rowids.
ACCESS
MAT_VIEW REWRITE BY INDEX ROWID If the materialized view is nonpartitioned and rows are located using
ACCESS indexes.
MAT_VIEW REWRITE BY GLOBAL INDEX If the materialized view is partitioned and rows are located using
ACCESS ROWID only global indexes.
MAT_VIEW REWRITE BY LOCAL INDEX If the materialized view is partitioned and rows are located using one
ACCESS ROWID or more local indexes and possibly some global indexes.
Partition Boundaries:
The partition boundaries might have been computed by:
A previous PARTITION step, in which case the PARTITION_START
and PARTITION_STOP column values replicate the values present in
the PARTITION step, and the PARTITION_ID contains the ID of the
PARTITION step. Possible values for PARTITION_START and
PARTITION_STOP are NUMBER(n), KEY, and INVALID.
The MAT_VIEW REWRITE ACCESS or INDEX step itself, in which
case the PARTITION_ID contains the ID of the step. Possible values
for PARTITION_START and PARTITION_STOP are NUMBER(n), KEY,
ROW LOCATION (MAT_VIEW REWRITE ACCESS only), and INVALID.
MINUS Operation accepting two sets of rows and returning rows appearing
in the first set but not in the second, eliminating duplicates.
NESTED LOOPS (These are join operations.) Operation accepting two sets of
rows, an outer set and an inner set. Oracle Database compares each row of the
outer set with each row of the inner set, returning rows that satisfy a
condition. This join method is useful for joining small subsets of data
(OLTP). The join condition is an efficient way of accessing the second table.
NESTED LOOPS OUTER Nested loops operation to perform an outer join statement.
PARTITION Iterates over the next operation in the plan for each partition in the
range given by the PARTITION_START and PARTITION_STOP
columns. PARTITION describes partition boundaries applicable to a
single partitioned object (table or index) or to a set of equipartitioned
objects (a partitioned table and its local indexes). The partition
boundaries are provided by the values of PARTITION_START and
PARTITION_STOP of the PARTITION. Refer to Table 7-4 for valid
values of partition start and stop.
PARTITION SINGLE Access one partition.
PARTITION ITERATOR Access many partitions (a subset).
PARTITION ALL Access all partitions.
PARTITION INLIST Similar to iterator, but based on an IN-list predicate.
PARTITION INVALID Indicates that the partition set to be accessed is empty.
POLYMORPHIC Indicates the row source for a polymorphic table function, which is a
TABLE FUNCTION table function whose return type is determined by its arguments.

PX ITERATOR BLOCK, CHUNK Implements the division of an object into block or chunk ranges
among a set of parallel execution servers.
PX COORDINATOR Implements the Query Coordinator which controls, schedules, and
executes the parallel plan below it using parallel execution servers. It
also represents a serialization point, as the end of the part of the
plan executed in parallel and always has a PX SEND QC operation
below it.
PX PARTITION Same semantics as the regular PARTITION operation except that it
appears in a parallel plan.
PX RECEIVE Shows the consumer/receiver parallel execution node reading
repartitioned data from a send/producer (QC or parallel execution
server) executing on a PX SEND node. This information was
formerly displayed in the DISTRIBUTION column. See Table 7-5.
PX SEND QC (RANDOM), HASH, Implements the distribution method taking place between two sets of
RANGE parallel execution servers. Shows the boundary between the two sets
and how data is repartitioned on the send/producer side (QC or
parallel execution server). This information was formerly displayed
in the DISTRIBUTION column. See Table 7-5.
REMOTE Retrieval of data from a remote database.
SEQUENCE Operation involving accessing values of a sequence.
SORT AGGREGATE Retrieval of a single row that is the result of applying a group
function to a group of selected rows.
SORT UNIQUE Operation sorting a set of rows to eliminate duplicates.
SORT GROUP BY Operation sorting a set of rows into groups for a query with a GROUP
BY clause.
SORT GROUP BY PIVOT Operation sorting a set of rows into groups for a query with a GROUP
BY clause. The PIVOT option indicates a pivot-specific optimization
for the SORT GROUP BY operator.
SORT JOIN Operation sorting a set of rows before a merge-join.
SORT ORDER BY Operation sorting a set of rows for a query with an ORDER BY clause.
TABLE ACCESS FULL Retrieval of all rows from a table.
(These are access
methods.)
TABLE ACCESS SAMPLE Retrieval of sampled rows from a table.
TABLE ACCESS CLUSTER Retrieval of rows from a table based on a value of an indexed cluster
key.
TABLE ACCESS HASH Retrieval of rows from table based on hash cluster key value.
TABLE ACCESS BY ROWID RANGE Retrieval of rows from a table based on a rowid range.
TABLE ACCESS SAMPLE BY ROWID Retrieval of sampled rows from a table based on a rowid range.
RANGE
TABLE ACCESS BY USER ROWID If the table rows are located using user-supplied rowids.
TABLE ACCESS BY INDEX ROWID If the table is nonpartitioned and rows are located using index(es).
TABLE ACCESS BY GLOBAL INDEX If the table is partitioned and rows are located using only global
ROWID indexes.

TABLE ACCESS BY LOCAL INDEX If the table is partitioned and rows are located using one or more
ROWID local indexes and possibly some global indexes.
Partition Boundaries:
The partition boundaries might have been computed by:
A previous PARTITION step, in which case the PARTITION_START
and PARTITION_STOP column values replicate the values present in
the PARTITION step, and the PARTITION_ID contains the ID of the
PARTITION step. Possible values for PARTITION_START and
PARTITION_STOP are NUMBER(n), KEY, and INVALID.
The TABLE ACCESS or INDEX step itself, in which case the
PARTITION_ID contains the ID of the step. Possible values for
PARTITION_START and PARTITION_STOP are NUMBER(n), KEY, ROW
LOCATION (TABLE ACCESS only), and INVALID.
TRANSPOSE Operation evaluating a PIVOT operation by transposing the results of
GROUP BY to produce the final pivoted data.
UNION Operation accepting two sets of rows and returns the union of the
sets, eliminating duplicates.
UNPIVOT Operation that rotates data from columns into rows.
VIEW Operation performing a view's query and then returning the resulting
rows to another operation.
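A quick way to survey which of these OPERATION and OPTIONS combinations actually occur on your system is to aggregate over the cached plans; a sketch (it assumes SELECT privileges on V$SQL_PLAN):

```sql
-- Count OPERATION/OPTIONS combinations across all plans
-- currently in the cursor cache.
SELECT operation, options, COUNT(*) AS occurrences
FROM   v$sql_plan
GROUP  BY operation, options
ORDER  BY occurrences DESC;
```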

See Also:
Oracle Database Reference for more information about PLAN_TABLE

7.3.3 DBMS_XPLAN Display Functions


You can use the DBMS_XPLAN display functions to show plans.

The display functions accept options for displaying the plan table output. You can specify:
• A plan table name if you are using a table different from PLAN_TABLE
• A statement ID if you have set a statement ID with the EXPLAIN PLAN statement
• A format option that determines the level of detail: BASIC, SERIAL, TYPICAL, ALL, and in
some cases ADAPTIVE
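For example, a minimal sketch that combines these options (the statement ID, format, and sh table are illustrative):

```sql
EXPLAIN PLAN
  SET STATEMENT_ID = 'fmt1' FOR
SELECT * FROM sh.products WHERE prod_id = 42;

-- Display the plan from the default PLAN_TABLE at the
-- TYPICAL level of detail.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY('PLAN_TABLE', 'fmt1', 'TYPICAL'));
```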


Table 7-8 DBMS_XPLAN Display Functions

Display Functions Notes


DISPLAY This table function displays the contents of the plan table.
In addition, you can use this table function to display any plan (with or
without statistics) stored in a table as long as the columns of this table
are named the same as columns of the plan table (or
V$SQL_PLAN_STATISTICS_ALL if statistics are included). You can
apply a predicate on the specified table to select rows of the plan to
display.
The format parameter controls the level of the plan. It accepts the
values BASIC, TYPICAL, SERIAL, and ALL.
DISPLAY_AWR This table function displays the contents of an execution plan stored in
AWR.
The format parameter controls the level of the plan. It accepts the
values BASIC, TYPICAL, SERIAL, and ALL.
DISPLAY_CURSOR This table function displays the explain plan of any cursor loaded in the
cursor cache. In addition to the explain plan, various plan statistics
(such as I/O, memory, and timing) can be reported, based on the
V$SQL_PLAN_STATISTICS_ALL view.
The format parameter controls the level of the plan. It accepts the
values BASIC, TYPICAL, SERIAL, ALL, and ADAPTIVE. When you
specify ADAPTIVE, the output includes:
• The final plan. If the execution has not completed, then the output
shows the current plan. This section also includes notes about
run-time optimizations that affect the plan.
• Recommended plan. In reporting mode, the output includes the
plan that would be chosen based on execution statistics.
• Dynamic plan. The output summarizes the portions of the plan
that differ from the default plan chosen by the optimizer.
• Reoptimization. The output displays the plan that would be chosen
on a subsequent execution because of reoptimization.
DISPLAY_PLAN This table function displays the contents of the plan table in a variety of
formats with CLOB output type.
The format parameter controls the level of the plan. It accepts the
values BASIC, TYPICAL, SERIAL, ALL, and ADAPTIVE. When you
specify ADAPTIVE, the output includes the default plan. For each
dynamic subplan, the plan shows a list of the row sources from the
original that may be replaced, and the row sources that would replace
them.
If the format argument specifies the outline display, then the function
displays the hints for each option in the dynamic subplan. If the plan is
not an adaptive query plan, then the function displays the default plan.
When you do not specify ADAPTIVE, the plan is shown as-is, but with
additional comments in the Note section that show any row sources
that are dynamic.

DISPLAY_SQL_PLAN_ This table function displays one or more execution plans for the
BASELINE specified SQL handle of a SQL plan baseline.
This function uses plan information stored in the plan baseline to
explain and display the plans. The plan_id stored in the SQL
management base may not match the plan_id of the generated plan.
A mismatch between the stored plan_id and generated plan_id
means that it is a non-reproducible plan. Such a plan is deemed invalid
and is bypassed by the optimizer during SQL compilation.
DISPLAY_SQLSET This table function displays the execution plan of a given statement
stored in a SQL tuning set.
The format parameter controls the level of the plan. It accepts the
values BASIC, TYPICAL, SERIAL, and ALL.
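For instance, the following sketch uses DISPLAY_CURSOR to show the last plan executed by the current session, including execution statistics. The GATHER_PLAN_STATISTICS hint is one way to collect row-level statistics; the sh.sales table is an assumption:

```sql
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
FROM   sh.sales;

-- NULL sql_id and child number mean "the last statement
-- executed by this session".
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```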

See Also:
Oracle Database PL/SQL Packages and Types Reference to learn more about
DBMS_XPLAN display functions

Part IV
SQL Operators: Access Paths and Joins
A row source is a set of rows returned by a step in the execution plan. A SQL operator acts
on a row source.
A unary operator acts on one input, as with access paths. A binary operator acts on two
inputs, as with joins.
This part contains the following chapters:
8
Optimizer Access Paths
An access path is a technique used by a query to retrieve rows from a row source.
This chapter contains the following topics:

8.1 Introduction to Access Paths


A row source is a set of rows returned by a step in an execution plan. A row source can be a
table, view, or result of a join or grouping operation.
A unary operation such as an access path, which is a technique used by a query to retrieve
rows from a row source, accepts a single row source as input. For example, a full table scan
is the retrieval of rows of a single row source. In contrast, a join operation is binary and
receives inputs from two row sources.
The database uses different access paths for different relational data structures. The
following table summarizes common access paths for the major data structures.

Table 8-1 Data Structures and Access Paths

Access Path                   Data Structure
Full Table Scans              Heap-Organized Tables and IOTs
Table Access by Rowid         Heap-Organized Tables and IOTs
Sample Table Scans            Heap-Organized Tables and IOTs
Index Unique Scans            B-Tree Indexes
Index Range Scans             B-Tree Indexes
Index Full Scans              B-Tree Indexes
Index Fast Full Scans         B-Tree Indexes
Index Skip Scans              B-Tree Indexes
Index Join Scans              B-Tree Indexes
Bitmap Index Single Value     Bitmap Indexes
Bitmap Index Range Scans      Bitmap Indexes
Bitmap Merge                  Bitmap Indexes
Cluster Scans                 Table Clusters
Hash Scans                    Table Clusters

The optimizer considers different possible execution plans, and then assigns each plan a
cost. The optimizer chooses the plan with the lowest cost. In general, index access paths are
more efficient for statements that retrieve a small subset of table rows, whereas full table
scans are more efficient when accessing a large portion of a table.
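You can observe this trade-off directly. In a sketch like the following (the sh.customers table and its primary key index are assumptions), a highly selective predicate typically produces an index access path, whereas an unselective predicate produces a full table scan:

```sql
-- Likely an INDEX UNIQUE SCAN: one row by primary key.
EXPLAIN PLAN FOR SELECT * FROM sh.customers WHERE cust_id = 100;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'BASIC'));

-- Likely a TABLE ACCESS FULL: most rows of the table.
EXPLAIN PLAN FOR SELECT * FROM sh.customers WHERE cust_id > 0;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'BASIC'));
```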

8-1
Chapter 8
Table Access Paths

See Also:

• "Joins"
• "Cost-Based Optimization"
• Oracle Database Concepts for an overview of these structures

8.2 Table Access Paths


A table is the basic unit of data organization in an Oracle database.
Relational tables are the most common table type. Relational tables have the
following organizational characteristics:
• A heap-organized table does not store rows in any particular order.
• An index-organized table orders rows according to the primary key values.
• An external table is a read-only table whose metadata is stored in the database
but whose data is stored outside the database.
This section explains optimizer access paths for heap-organized tables, and contains
the following topics:

See Also:

• Oracle Database Concepts for an overview of tables


• Oracle Database Administrator’s Guide to learn how to manage tables

8.2.1 About Heap-Organized Table Access


By default, a table is organized as a heap, which means that the database places rows
where they fit best rather than in a user-specified order.
As users add rows, the database places the rows in the first available free space in the
data segment. Rows are not guaranteed to be retrieved in the order in which they were
inserted.
This section contains the following topics:

8.2.1.1 Row Storage in Data Blocks and Segments: A Primer


The database stores rows in data blocks. In tables, the database can write a row
anywhere in the bottom part of the block. Oracle Database uses the block overhead,
which contains the row directory and table directory, to manage the block itself.
An extent is made up of logically contiguous data blocks. The blocks may not be
physically contiguous on disk. A segment is a set of extents that contains all the data
for a logical storage structure within a tablespace. For example, Oracle Database


allocates one or more extents to form the data segment for a table. The database also
allocates one or more extents to form the index segment for a table.
By default, the database uses automatic segment space management (ASSM) for
permanent, locally managed tablespaces. When a session first inserts data into a table, the
database formats a bitmap block. The bitmap tracks the blocks in the segment. The database
uses the bitmap to find free blocks and then formats each block before writing to it. ASSM
spreads out inserts among blocks to avoid concurrency issues.
The high water mark (HWM) is the point in a segment beyond which data blocks are
unformatted and have never been used. Below the HWM, a block may be formatted and
written to, formatted and empty, or unformatted. The low high water mark (low HWM) marks
the point below which all blocks are known to be formatted because they either contain data
or formerly contained data.
During a full table scan, the database reads all blocks up to the low HWM, which are known
to be formatted, and then reads the segment bitmap to determine which blocks between the
HWM and low HWM are formatted and safe to read. The database knows not to read past
the HWM because these blocks are unformatted.
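The block counts that bound a full table scan can be inspected with the DBMS_SPACE package; a hedged PL/SQL sketch (the owner APP and nonpartitioned table ORDERS are hypothetical, and SERVEROUTPUT must be enabled to see the output):

```sql
DECLARE
  l_total_blocks  NUMBER;
  l_total_bytes   NUMBER;
  l_unused_blocks NUMBER;
  l_unused_bytes  NUMBER;
  l_file_id       NUMBER;
  l_block_id      NUMBER;
  l_last_block    NUMBER;
BEGIN
  -- Report blocks allocated to the segment versus blocks
  -- above the HWM, which a full table scan never reads.
  DBMS_SPACE.UNUSED_SPACE(
    segment_owner             => 'APP',
    segment_name              => 'ORDERS',
    segment_type              => 'TABLE',
    total_blocks              => l_total_blocks,
    total_bytes               => l_total_bytes,
    unused_blocks             => l_unused_blocks,
    unused_bytes              => l_unused_bytes,
    last_used_extent_file_id  => l_file_id,
    last_used_extent_block_id => l_block_id,
    last_used_block           => l_last_block);
  DBMS_OUTPUT.PUT_LINE('Allocated blocks : ' || l_total_blocks);
  DBMS_OUTPUT.PUT_LINE('Blocks above HWM : ' || l_unused_blocks);
END;
/
```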

See Also:
Oracle Database Concepts to learn about data block storage

8.2.1.2 Importance of Rowids for Row Access


Every row in a heap-organized table has a rowid unique to this table that corresponds to the
physical address of a row piece. A rowid is a 10-byte physical address of a row.
The rowid points to a specific file, block, and row number. For example, in the rowid
AAAPecAAFAAAABSAAA, the final AAA represents the row number. The row number is an index
into a row directory entry. The row directory entry contains a pointer to the location of the row
on the block.
The database can sometimes move a row in the bottom part of the block. For example, if row
movement is enabled, then the row can move because of partition key updates, Flashback
Table operations, shrink table operations, and so on. If the database moves a row within a
block, then the database updates the row directory entry to modify the pointer. The rowid
stays constant.
Oracle Database uses rowids internally for the construction of indexes. For example, each
key in a B-tree index is associated with a rowid that points to the address of the associated
row. Physical rowids provide the fastest possible access to a table row, enabling the
database to retrieve a row in as little as a single I/O.
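The components of a rowid can be decoded with the DBMS_ROWID package; a sketch (the sh.customers table is an assumption):

```sql
-- Decompose each rowid into its file, block, and
-- row-number components.
SELECT rowid,
       DBMS_ROWID.ROWID_RELATIVE_FNO(rowid) AS file_no,
       DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_no,
       DBMS_ROWID.ROWID_ROW_NUMBER(rowid)   AS row_no
FROM   sh.customers
WHERE  ROWNUM <= 5;
```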

See Also:
Oracle Database Concepts to learn about rowids


8.2.1.3 Direct Path Reads


In a direct path read, the database reads buffers from disk directly into the PGA,
bypassing the SGA entirely.
The following figure shows the difference between scattered and sequential reads,
which store buffers in the SGA, and direct path reads.

Figure 8-1 Direct Path Reads

[Figure: in a DB file sequential read and a DB file scattered read, the server
process reads buffers into the SGA buffer cache. In a direct path read, the server
process reads buffers directly into the PGA, which also holds session memory, the
persistent and runtime areas, and SQL work areas such as the sort area, hash area,
and bitmap merge area.]

Situations in which Oracle Database may perform direct path reads include:
• Execution of a CREATE TABLE AS SELECT statement
• Execution of an ALTER INDEX ... REBUILD or ALTER TABLE ... MOVE statement
• Reads from a temporary tablespace
• Parallel queries
• Reads from a LOB segment
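For example, statements like the following typically trigger direct path reads of the source data. This is a sketch only: sales_copy is a hypothetical table name, and whether the database actually chooses direct path reads depends on factors such as segment size and buffer cache contents.

```sql
-- CREATE TABLE AS SELECT reads the source table, often with direct path reads.
CREATE TABLE sales_copy AS
  SELECT * FROM sales;

-- A parallel query can also read table blocks directly into the PGA.
SELECT /*+ PARALLEL(sales,4) */ COUNT(*)
  FROM sales;
```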

See Also:
Oracle Database Performance Tuning Guide to learn about wait events for
direct path reads


8.2.2 Full Table Scans


A full table scan reads all rows from a table, and then filters out those rows that do not meet
the selection criteria.
This section contains the following topics:

8.2.2.1 When the Optimizer Considers a Full Table Scan


In general, the optimizer chooses a full table scan when it cannot use a different access path,
or when another usable access path exists but has a higher cost.
The following table shows typical reasons for choosing a full table scan.

Table 8-2 Typical Reasons for a Full Table Scan

Reason: No index exists.
Explanation: If no index exists, then the optimizer uses a full table scan.
To Learn More: Oracle Database Concepts

Reason: The query predicate applies a function to the indexed column.
Explanation: Unless the index is a function-based index, the database indexes the
values of the column, not the values of the column with the function applied. A
typical application-level mistake is to index a character column, such as char_col,
and then query the column using syntax such as WHERE char_col=1. The database
implicitly applies a TO_NUMBER function to char_col, which prevents use of the
index.
To Learn More: "Guidelines for Using Function-Based Indexes for Performance"

Reason: A SELECT COUNT(*) query is issued, and an index exists, but the indexed
column contains nulls.
Explanation: The optimizer cannot use the index to count the number of table rows
because the index cannot contain null entries.
To Learn More: "B-Tree Indexes and Nulls"

Reason: The query predicate does not use the leading edge of a B-tree index.
Explanation: For example, an index might exist on employees(first_name,last_name).
If a user issues a query with the predicate WHERE last_name='KING', then the
optimizer may not choose an index because column first_name is not in the
predicate. However, in this situation the optimizer may choose to use an index
skip scan.
To Learn More: "Index Skip Scans"

Reason: The query is unselective.
Explanation: If the optimizer determines that the query requires most of the blocks
in the table, then it uses a full table scan, even though indexes are available.
Full table scans can use larger I/O calls. Making fewer large I/O calls is cheaper
than making many smaller calls.
To Learn More: "Selectivity"


Table 8-2 (Cont.) Typical Reasons for a Full Table Scan

Reason: The table statistics are stale.
Explanation: For example, a table was small, but now has grown large. If the table
statistics are stale and do not reflect the current size of the table, then the
optimizer does not know that an index is now more efficient than a full table scan.
To Learn More: "Introduction to Optimizer Statistics"

Reason: The table is small.
Explanation: If a table contains fewer than n blocks under the high water mark,
where n equals the setting for the DB_FILE_MULTIBLOCK_READ_COUNT initialization
parameter, then a full table scan may be cheaper than an index range scan. The scan
may be less expensive regardless of the fraction of the table being accessed or the
indexes present.
To Learn More: Oracle Database Reference

Reason: The table has a high degree of parallelism.
Explanation: A high degree of parallelism for a table skews the optimizer toward
full table scans over range scans. Query the value in the ALL_TABLES.DEGREE column
to determine the degree of parallelism.
To Learn More: Oracle Database Reference

Reason: The query uses a full table scan hint.
Explanation: The hint FULL(table alias) instructs the optimizer to use a full table
scan.
To Learn More: Oracle Database SQL Language Reference

8.2.2.2 How a Full Table Scan Works


In a full table scan, the database sequentially reads every formatted block under the
high water mark. The database reads each block only once.
The following graphic depicts a scan of a table segment, showing how the scan skips
unformatted blocks below the high water mark.

Figure 8-2 High Water Mark

[Figure: a sequential read proceeds through the used (formatted) blocks up to the
low HWM. Between the low HWM and the HWM, used blocks are interspersed with blocks
that are never used and unformatted; the scan reads only the formatted blocks in
this region.]

Because the blocks are adjacent, the database can speed up the scan by making I/O calls
larger than a single block, known as a multiblock read. The size of a read call ranges from
one block to the number of blocks specified by the DB_FILE_MULTIBLOCK_READ_COUNT
initialization parameter. For example, setting this parameter to 4 instructs the database to
read up to 4 blocks in a single call.
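The benefit of larger calls can be estimated with simple arithmetic: the number of read calls for a serial scan is roughly the formatted block count divided by the multiblock read count. The following back-of-envelope sketch ignores real-world boundary effects (extent boundaries, blocks already cached, and blocks that cannot be read in the same call):

```python
import math

def read_calls(blocks_below_hwm, db_file_multiblock_read_count):
    """Approximate the number of read calls for a serial full table scan.

    This is a simplification: it ignores extent boundaries, cached blocks,
    and other conditions that can shorten a multiblock read.
    """
    return math.ceil(blocks_below_hwm / db_file_multiblock_read_count)

# 1000 formatted blocks read 4 at a time require 250 calls,
# whereas single-block reads would require 1000 calls.
print(read_calls(1000, 4))
print(read_calls(1000, 1))
```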
The algorithms for caching blocks during full table scans are complex. For example, the
database caches blocks differently depending on whether tables are small or large.

See Also:

• "Table 19-2"
• Oracle Database Concepts for an overview of the default caching mode
• Oracle Database Reference to learn about the
DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter

8.2.2.3 Full Table Scan: Example


This example scans the hr.employees table.

The following statement queries monthly salaries over $4000:

SELECT salary
FROM hr.employees
WHERE salary > 4000;

Example 8-1 Full Table Scan


The following plan was retrieved using the DBMS_XPLAN.DISPLAY_CURSOR function. Because
no index exists on the salary column, the optimizer cannot use an index range scan, and so
uses a full table scan.

SQL_ID 54c20f3udfnws, child number 0


-------------------------------------
select salary from hr.employees where salary > 4000

Plan hash value: 3476115102

---------------------------------------------------------------------------
| Id| Operation | Name | Rows | Bytes |Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | | 3 (100)| |
|* 1| TABLE ACCESS FULL| EMPLOYEES | 98 | 6762 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

1 - filter("SALARY">4000)


8.2.3 Table Access by Rowid


A rowid is an internal representation of the storage location of data.
The rowid of a row specifies the data file and data block containing the row and the
location of the row in that block. Locating a row by specifying its rowid is the fastest
way to retrieve a single row because it specifies the exact location of the row in the
database.

Note:
Rowids can change between versions. Accessing data based on position is
not recommended because rows can move.

This section contains the following topics:

See Also:
Oracle Database Development Guide to learn more about rowids

8.2.3.1 When the Optimizer Chooses Table Access by Rowid


In most cases, the database accesses a table by rowid after a scan of one or more
indexes.
However, table access by rowid need not follow every index scan. If the index contains
all needed columns, then access by rowid might not occur.

8.2.3.2 How Table Access by Rowid Works


To access a table by rowid, the database performs multiple steps.
The database does the following:
1. Obtains the rowids of the selected rows, either from the statement WHERE clause or
through an index scan of one or more indexes
Table access may be needed for columns in the statement not present in the
index.
2. Locates each selected row in the table based on its rowid
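These two steps can be seen directly in SQL. The following sketch first selects a rowid, and then fetches the row by that rowid; the literal rowid shown is hypothetical, so substitute a value returned by your own first query:

```sql
-- Step 1: obtain the rowid of a selected row.
SELECT ROWID, last_name
FROM   hr.employees
WHERE  employee_id = 101;

-- Step 2: locate the row by its rowid
-- (substitute the actual value returned by step 1).
SELECT last_name
FROM   hr.employees
WHERE  ROWID = 'AAAPecAAFAAAABSAAA';
```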

8.2.3.3 Table Access by Rowid: Example


This example demonstrates rowid access of the hr.employees table.


Assume that you run the following query:

SELECT *
FROM employees
WHERE employee_id > 190;

Step 2 of the following plan shows a range scan of the emp_emp_id_pk index on the
hr.employees table. The database uses the rowids obtained from the index to find the
corresponding rows from the employees table, and then retrieve them. The BATCHED access
shown in Step 1 means that the database retrieves a few rowids from the index, and then
attempts to access rows in block order to improve the clustering and reduce the number of
times that the database must access a block.

--------------------------------------------------------------------------------
|Id| Operation | Name |Rows|Bytes|Cost(%CPU)|Time|
--------------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | |2(100)| |
| 1| TABLE ACCESS BY INDEX ROWID BATCHED|EMPLOYEES |16|1104|2 (0)|00:00:01|
|*2| INDEX RANGE SCAN |EMP_EMP_ID_PK|16| |1 (0)|00:00:01|
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

2 - access("EMPLOYEE_ID">190)

8.2.4 Sample Table Scans


A sample table scan retrieves a random sample of data from a simple table or a complex
SELECT statement, such as a statement involving joins and views.

This section contains the following topics:

8.2.4.1 When the Optimizer Chooses a Sample Table Scan


The database uses a sample table scan when a statement FROM clause includes the SAMPLE
keyword.
The SAMPLE clause has the following forms:

• SAMPLE (sample_percent)
The database reads a specified percentage of rows in the table to perform a sample table
scan.
• SAMPLE BLOCK (sample_percent)
The database reads a specified percentage of table blocks to perform a sample table
scan.
The sample_percent specifies the percentage of the total row or block count to include in the
sample. The value must be in the range .000001 up to, but not including, 100. This
percentage indicates the probability of each row, or each cluster of rows in block sampling,
being selected for the sample. It does not mean that the database retrieves exactly
sample_percent of the rows.
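Because sample_percent is an inclusion probability rather than an exact quota, the returned row count varies from run to run; only its expected value equals the percentage of the total. A minimal model of row sampling, assuming each row is included independently:

```python
import random

def sample_rows(row_count, sample_percent, seed=None):
    """Model SAMPLE (sample_percent): include each row independently
    with probability sample_percent/100. The result size is random;
    only its expected value is row_count * sample_percent / 100.
    This models the semantics only, not Oracle's implementation."""
    rng = random.Random(seed)
    p = sample_percent / 100.0
    return [r for r in range(row_count) if rng.random() < p]

# For a 10,000-row table sampled at 1 percent, the expected sample
# size is 100 rows, but any individual run returns close to, not
# exactly, 100.
sample = sample_rows(10000, 1, seed=42)
print(len(sample))
```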


Note:
Block sampling is possible only during full table scans or index fast full
scans. If a more efficient execution path exists, then the database does not
sample blocks. To guarantee block sampling for a specific table or index, use
the FULL or INDEX_FFS hint.

See Also:

• "Influencing the Optimizer with Hints"


• Oracle Database SQL Language Reference to learn about the SAMPLE
clause

8.2.4.2 Sample Table Scans: Example


This example uses a sample table scan to access 1% of the employees table,
sampling by blocks instead of rows.
Example 8-2 Sample Table Scan

SELECT * FROM hr.employees SAMPLE BLOCK (1);

The EXPLAIN PLAN output for this statement might look as follows:

-------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |     1 |    68 |     3  (34)|
|   1 |  TABLE ACCESS SAMPLE| EMPLOYEES |     1 |    68 |     3  (34)|
-------------------------------------------------------------------------

8.2.5 In-Memory Table Scans


An In-Memory scan retrieves rows from the In-Memory Column Store (IM column
store).
The IM column store is an optional SGA area that stores copies of tables and
partitions in a special columnar format optimized for rapid scans.
This section contains the following topics:


See Also:
Oracle Database In-Memory Guide for an introduction to the IM column store

8.2.5.1 When the Optimizer Chooses an In-Memory Table Scan


The optimizer cost model is aware of the content of the IM column store.
When a user executes a query that references a table in the IM column store, the optimizer
calculates the cost of all possible access methods—including the In-Memory table scan—and
selects the access method with the lowest cost.

8.2.5.2 In-Memory Query Controls


You can control In-Memory queries using initialization parameters.
The following database initialization parameters affect the In-Memory features:
• INMEMORY_QUERY
This parameter enables or disables In-Memory queries for the database at the session or
system level. This parameter is helpful when you want to test workloads with and without
the use of the IM column store.
• OPTIMIZER_INMEMORY_AWARE
This parameter enables (TRUE) or disables (FALSE) all of the In-Memory enhancements
made to the optimizer cost model, table expansion, bloom filters, and so on. Setting the
parameter to FALSE causes the optimizer to ignore the In-Memory property of tables
during the optimization of SQL statements.
• OPTIMIZER_FEATURES_ENABLE
When set to values lower than 12.1.0.2, this parameter has the same effect as setting
OPTIMIZER_INMEMORY_AWARE to FALSE.
To enable or disable In-Memory queries, you can specify the INMEMORY or NO_INMEMORY hints,
which are the per-query equivalent of the INMEMORY_QUERY initialization parameter. If a SQL
statement uses the INMEMORY hint, but the object it references is not already loaded in the IM
column store, then the database does not wait for the object to be populated in the IM column
store before executing the statement. However, initial access of the object triggers the object
population in the IM column store.
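For example, to compare a workload with and without the IM column store, you might toggle In-Memory queries for a single session, leaving the rest of the database unaffected (a sketch; run it in a test session first):

```sql
-- Disable In-Memory queries for the current session only.
ALTER SESSION SET INMEMORY_QUERY = DISABLE;

-- Run and time the workload without the IM column store,
-- then restore the default behavior.
ALTER SESSION SET INMEMORY_QUERY = ENABLE;
```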

See Also:

• Oracle Database Reference to learn more about the INMEMORY_QUERY,


OPTIMIZER_INMEMORY_AWARE, and OPTIMIZER_FEATURES_ENABLE initialization
parameters
• Oracle Database SQL Language Reference to learn more about the INMEMORY
hints


8.2.5.3 In-Memory Table Scans: Example


This example shows an execution plan that includes the TABLE ACCESS INMEMORY
operation.
The following example shows a query of the oe.product_information table, which
has been altered with the INMEMORY HIGH option.

Example 8-3 In-Memory Table Scan

SELECT *
FROM oe.product_information
WHERE list_price > 10
ORDER BY product_id

The plan for this statement might look as follows, with the INMEMORY keyword in Step 2
indicating that some or all of the object was accessed from the IM column store:

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);

SQL_ID 2mb4h57x8pabw, child number 0


-------------------------------------
select * from oe.product_information where list_price > 10 order by product_id

Plan hash value: 2256295385


-------------------------------------------------------------------------------------------
|Id| Operation                   | Name                |Rows|Bytes|TempSpc|Cost(%CPU)| Time    |
-------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT            |                     |    |     |       | 21 (100)|         |
| 1|  SORT ORDER BY              |                     | 285|62415|  82000| 21   (5)| 00:00:01|
|*2|   TABLE ACCESS INMEMORY FULL| PRODUCT_INFORMATION | 285|62415|       |  5   (0)| 00:00:01|
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
2 - inmemory("LIST_PRICE">10)
filter("LIST_PRICE">10)

8.3 B-Tree Index Access Paths


An index is an optional structure, associated with a table or table cluster, that can
sometimes speed data access.
By creating an index on one or more columns of a table, you gain the ability in some
cases to retrieve a small set of randomly distributed rows from the table. Indexes are
one of many means of reducing disk I/O.


This section contains the following topics:

See Also:

• Oracle Database Concepts for an overview of indexes


• Oracle Database Administrator’s Guide to learn more about automatic and
manual index creation

8.3.1 About B-Tree Index Access


B-trees, short for balanced trees, are the most common type of database index.
A B-tree index is an ordered list of values divided into ranges. By associating a key with a row
or range of rows, B-trees provide excellent retrieval performance for a wide range of queries,
including exact match and range searches.
This section contains the following topics:

8.3.1.1 B-Tree Index Structure


A B-tree index has two types of blocks: branch blocks for searching and leaf blocks that store
values.
The following graphic illustrates the logical structure of a B-tree index. Branch blocks store
the minimum key prefix needed to make a branching decision between two keys. The leaf
blocks contain every indexed data value and a corresponding rowid used to locate the actual
row. Each index entry is sorted by (key, rowid). The leaf blocks are doubly linked.


Figure 8-3 B-Tree Index Structure

[Figure: a B-tree with a root branch block covering keys 0..250, which points to
branch blocks for the ranges 0..40, 41..80, 81..120, ..., 200..250. Each branch
block in turn points to leaf blocks covering subranges (for example, 0..10, 11..19,
20..25, ..., 32..40 under the 0..40 branch, and 246..250 under the 200..250
branch). Each leaf block stores key,rowid entries, such as 0,rowid through
10,rowid in the first leaf block, and the leaf blocks are doubly linked.]

8.3.1.2 How Index Storage Affects Index Scans
Index blocks can appear anywhere in the index segment.
Figure 8-3 shows the leaf blocks as adjacent to each other. For example, the 1-10
block is next to and before the 11-19 block. This sequencing illustrates the linked lists
that connect the index entries. However, index blocks need not be stored in order
within an index segment. For example, the 246-250 block could appear anywhere in
the segment, including directly before the 1-10 block. For this reason, ordered index
scans must perform single-block I/O. The database must read an index block to
determine which index block it must read next.
The index block body stores the index entries in a heap, just like table rows. For
example, if the value 10 is inserted first into a table, then the index entry with key 10
might be inserted at the bottom of the index block. If 0 is inserted next into the table,
then the index entry for key 0 might be inserted on top of the entry for 10. Thus, the
index entries in the block body are not stored in key order. However, within the index
block, the row header stores records in key order. For example, the first record in the
header points to the index entry with key 0, and so on sequentially up to the record
that points to the index entry with key 10. Thus, index scans can read the row header
to determine where to begin and end range scans, avoiding the necessity of reading
every entry in the block.


See Also:
Oracle Database Concepts to learn about index blocks

8.3.1.3 Unique and Nonunique Indexes


In a nonunique index, the database stores the rowid by appending it to the key as an extra
column. The entry adds a length byte to make the key unique.
For example, the first index key in the nonunique index shown in Figure 8-3 is the pair
0,rowid and not simply 0. The database sorts the data by index key values and then by rowid
ascending. For example, the entries are sorted as follows:

0,AAAPvCAAFAAAAFaAAa
0,AAAPvCAAFAAAAFaAAg
0,AAAPvCAAFAAAAFaAAl
2,AAAPvCAAFAAAAFaAAm

In a unique index, the index key does not include the rowid. The database sorts the data only
by the index key values, such as 0, 1, 2, and so on.
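The composite ordering can be modeled directly: sorting (key, rowid) pairs first by key and then by rowid reproduces the entry order shown above. A minimal sketch (for these all-letter rowids, Python's lexicographic string comparison happens to match the ordering; the real rowid encoding uses its own base-64 collation):

```python
# Model nonunique index entries as (key, rowid) pairs. Python sorts
# tuples element by element, so this sorts by key value first and
# then by rowid, mirroring the index entry order.
entries = [
    (0, "AAAPvCAAFAAAAFaAAg"),
    (2, "AAAPvCAAFAAAAFaAAm"),
    (0, "AAAPvCAAFAAAAFaAAa"),
    (0, "AAAPvCAAFAAAAFaAAl"),
]

sorted_entries = sorted(entries)

for key, rowid in sorted_entries:
    print(f"{key},{rowid}")
```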

See Also:
Oracle Database Concepts for an overview of unique and nonunique indexes

8.3.1.4 B-Tree Indexes and Nulls


B-tree indexes never store completely null keys, which is important for how the optimizer
chooses access paths. A consequence of this rule is that single-column B-tree indexes never
store nulls.
An example helps illustrate. The hr.employees table has a primary key index on
employee_id, and a non-unique index on department_id. The department_id column can contain
nulls, making it a nullable column, but the employee_id column cannot.

SQL> SELECT COUNT(*) FROM employees WHERE department_id IS NULL;

COUNT(*)
----------
1

SQL> SELECT COUNT(*) FROM employees WHERE employee_id IS NULL;

COUNT(*)
----------
0


The following example shows that the optimizer chooses a full table scan for a query
of all department IDs in hr.employees. The optimizer cannot use the index on
employees.department_id because the index is not guaranteed to include entries for
every row in the table.

SQL> EXPLAIN PLAN FOR SELECT department_id FROM employees;

Explained.

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 3476115102

---------------------------------------------------------------------------
| Id | Operation         | Name      | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |           |  107 |   321 |     2   (0)| 00:00:01 |
|  1 |  TABLE ACCESS FULL| EMPLOYEES |  107 |   321 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

The following example shows the optimizer can use the index on department_id for a
query of a specific department ID because all non-null rows are indexed.

SQL> EXPLAIN PLAN FOR SELECT department_id FROM employees WHERE department_id=10;

Explained.

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 67425611

---------------------------------------------------------------------------
| Id | Operation        | Name              | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT |                   |    1 |     3 |     1   (0)| 00:00:01 |
|* 1 |  INDEX RANGE SCAN| EMP_DEPARTMENT_IX |    1 |     3 |     1   (0)| 00:00:01 |


---------------------------------------------------------------------------

Predicate Information (identified by operation id):


1 - access("DEPARTMENT_ID"=10)

The following example shows that the optimizer chooses an index scan when the predicate
excludes null values:

SQL> EXPLAIN PLAN FOR SELECT department_id FROM employees
     WHERE department_id IS NOT NULL;

Explained.

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 1590637672

---------------------------------------------------------------------------
| Id| Operation | Name |Rows|Bytes| Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0| SELECT STATEMENT | |106| 318 | 1 (0)| 00:0 0:01|
|*1| INDEX FULL SCAN | EMP_DEPARTMENT_IX |106| 318 | 1 (0)| 00:0 0:01|
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


1 - filter("DEPARTMENT_ID" IS NOT NULL)

8.3.2 Index Unique Scans


An index unique scan returns at most 1 rowid.
This section contains the following topics:

8.3.2.1 When the Optimizer Considers Index Unique Scans


An index unique scan requires an equality predicate.
Specifically, the database performs a unique scan only when a query predicate references all
columns in a unique index key using an equality operator, such as WHERE prod_id=10.

A unique or primary key constraint is insufficient by itself to produce an index unique scan
because a non-unique index on the column may already exist. Consider the following
example, which creates the t_table table and then creates a non-unique index on numcol:

SQL> CREATE TABLE t_table(numcol INT);


SQL> CREATE INDEX t_table_idx ON t_table(numcol);
SQL> SELECT UNIQUENESS FROM USER_INDEXES WHERE INDEX_NAME = 'T_TABLE_IDX';

UNIQUENES
---------
NONUNIQUE


The following code creates a primary key constraint on a column with a non-unique
index, resulting in an index range scan rather than an index unique scan:

SQL> ALTER TABLE t_table ADD CONSTRAINT t_table_pk PRIMARY KEY(numcol);


SQL> SET AUTOTRACE TRACEONLY EXPLAIN
SQL> SELECT * FROM t_table WHERE numcol = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 868081059

---------------------------------------------------------------------------
| Id | Operation        | Name        | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT |             |    1 |    13 |     1   (0)| 00:00:01 |
|* 1 |  INDEX RANGE SCAN| T_TABLE_IDX |    1 |    13 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
1 - access("NUMCOL"=1)

You can use the INDEX(alias index_name) hint to specify the index to use, but not a
specific type of index access path.

See Also:

• Oracle Database Concepts for more details on index structures and for
detailed information on how a B-tree is searched
• Oracle Database SQL Language Reference to learn more about the
INDEX hint

8.3.2.2 How Index Unique Scans Work


The scan searches the index in order for the specified key. An index unique scan stops
processing as soon as it finds the first record because no second record is possible.
The database obtains the rowid from the index entry, and then retrieves the row
specified by the rowid.
The following figure illustrates an index unique scan. The statement requests the
record for product ID 19 in the prod_id column, which has a primary key index.


Figure 8-4 Index Unique Scan

[Figure: to find the entry for key 19, the scan reads the root branch block
(0..250), follows the pointer for the range 0..40 to the next branch block,
follows the pointer for the range 11..19 to a leaf block, and locates the entry
19,rowid in that leaf block.]

8.3.2.3 Index Unique Scans: Example


This example uses a unique scan to retrieve a row from the products table.

The following statement queries the record for product 19 in the sh.products table:

SELECT *
FROM sh.products
WHERE prod_id = 19;

Because a primary key index exists on the products.prod_id column, and the WHERE clause
references all of the columns using an equality operator, the optimizer chooses a unique
scan:

SQL_ID 3ptq5tsd5vb3d, child number 0


-------------------------------------
select * from sh.products where prod_id = 19

Plan hash value: 4047888317

---------------------------------------------------------------------------
| Id| Operation | Name |Rows|Bytes|Cost (%CPU)|Time |
---------------------------------------------------------------------------

| 0| SELECT STATEMENT             |             |  |     |1 (100)|        |
| 1|  TABLE ACCESS BY INDEX ROWID | PRODUCTS    |1 | 173 |1   (0)|00:00:01|
|*2|   INDEX UNIQUE SCAN          | PRODUCTS_PK |1 |     |0   (0)|        |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

2 - access("PROD_ID"=19)

8.3.3 Index Range Scans


An index range scan is an ordered scan of values.
The range in the scan can be bounded on both sides, or unbounded on one or both
sides. The optimizer typically chooses a range scan for selective queries, which return
relatively few rows.
By default, the database stores indexes in ascending order, and scans them in the
same order. For example, a query with the predicate department_id >= 20 uses a
range scan to return rows sorted by index keys 20, 30, 40, and so on. If multiple index
entries have identical keys, then the database returns them in ascending order by
rowid, so that 0,AAAPvCAAFAAAAFaAAa is followed by 0,AAAPvCAAFAAAAFaAAg, and so
on.
An index range scan descending is identical to an index range scan except that the
database returns rows in descending order. Usually, the database uses a descending
scan when ordering data in a descending order, or when seeking a value less than a
specified value.
This section contains the following topics:

8.3.3.1 When the Optimizer Considers Index Range Scans


For an index range scan, multiple values must be possible for the index key.
Specifically, the optimizer considers index range scans in the following circumstances:
• One or more leading columns of an index are specified in conditions.
A condition specifies a combination of one or more expressions and logical
(Boolean) operators and returns a value of TRUE, FALSE, or UNKNOWN. Examples of
conditions include:
– department_id = :id
– department_id < :id
– department_id > :id
– AND combination of the preceding conditions for leading columns in the index,
such as department_id > :low AND department_id < :hi.


Note:
For the optimizer to consider a range scan, wild-card searches of the form
col1 LIKE '%ASD' must not be in a leading position.

• 0, 1, or more values are possible for an index key.

Tip:
If you require sorted data, then use the ORDER BY clause, and do not rely on an
index. If an index can satisfy an ORDER BY clause, then the optimizer uses this option
and thereby avoids a sort.

The optimizer considers an index range scan descending when an index can satisfy an ORDER
BY DESCENDING clause.

If the optimizer chooses a full table scan or another index, then a hint may be required to
force this access path. The INDEX(tbl_alias ix_name) and INDEX_DESC(tbl_alias ix_name)
hints instruct the optimizer to use a specific index.

See Also:
Oracle Database SQL Language Reference to learn more about the INDEX and
INDEX_DESC hints

8.3.3.2 How Index Range Scans Work


During an index range scan, Oracle Database proceeds from root to branch.
In general, the scan algorithm is as follows:
1. Read the root block.
2. Read the branch block.
3. Alternate the following steps until all data is retrieved:
a. Read a leaf block to obtain a rowid.
b. Read a table block to retrieve a row.

Note:
In some cases, an index scan reads a set of index blocks, sorts the rowids, and
then reads a set of table blocks.

Thus, to scan the index, the database moves backward or forward through the leaf blocks.
For example, a scan for IDs between 20 and 40 locates the first leaf block that has the lowest


key value that is 20 or greater. The scan proceeds horizontally through the linked list of
leaf nodes until it finds a value greater than 40, and then stops.
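The bounded scan for IDs between 20 and 40 can be sketched over a sorted list of (key, rowid) entries: descend (here modeled as a binary search) to the first key that is 20 or greater, then walk forward until a key exceeds 40. This models the leaf-block walk only, not Oracle's actual block-level implementation:

```python
from bisect import bisect_left

def index_range_scan(entries, low, high):
    """Model a bounded index range scan over sorted (key, rowid) entries.

    Find the first entry whose key is >= low, then walk forward through
    the 'leaf' entries, stopping at the first key greater than high.
    """
    keys = [key for key, _ in entries]
    pos = bisect_left(keys, low)           # first key >= low
    result = []
    while pos < len(entries) and entries[pos][0] <= high:
        result.append(entries[pos])        # visit entry; rowid locates the row
        pos += 1
    return result                          # stopped at first key > high

# Hypothetical leaf entries: (key, rowid) pairs in ascending order.
entries = [(10, "r1"), (20, "r2"), (20, "r3"), (30, "r4"), (40, "r5"), (45, "r6")]
print(index_range_scan(entries, 20, 40))
```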
The following figure illustrates an index range scan using ascending order. A
statement requests the employees records with the value 20 in the department_id
column, which has a nonunique index. In this example, 2 index entries for department
20 exist.

Figure 8-5 Index Range Scan

[Figure: the scan descends from the root branch block (0..250) through the branch
block for the range 0..40 to the leaf block covering keys 11..25, reads the two
entries 20,rowid, and stops at the next entry because its key exceeds 20.]

8.3.3.3 Index Range Scan: Example


This example retrieves a set of values from the employees table using an index range
scan.
The following statement queries the records for employees in department 20 with
salaries greater than 1000:

SELECT *
FROM employees
WHERE department_id = 20
AND salary > 1000;

The preceding query has low cardinality (returns few rows), so the query uses the
index on the department_id column. The database scans the index, fetches the


records from the employees table, and then applies the salary > 1000 filter to these fetched
records to generate the result.

SQL_ID brt5abvbxw9tq, child number 0


-------------------------------------
SELECT * FROM employees WHERE department_id = 20 AND salary > 1000

Plan hash value: 2799965532

-------------------------------------------------------------------------------------------
|Id | Operation | Name |Rows|Bytes|Cost(%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
|*1 | TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES | 2 | 138 | 2 (0)|00:00:01|
|*2 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX| 2 | | 1 (0)|00:00:01|
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

1 - filter("SALARY">1000)
2 - access("DEPARTMENT_ID"=20)

8.3.3.4 Index Range Scan Descending: Example


This example uses an index to retrieve rows from the employees table in sorted order.

The following statement queries the records for employees in department 20 in descending
order:

SELECT *
FROM employees
WHERE department_id < 20
ORDER BY department_id DESC;

The preceding query has low cardinality, so it uses the index on the department_id
column.

SQL_ID 8182ndfj1ttj6, child number 0


-------------------------------------
SELECT * FROM employees WHERE department_id<20 ORDER BY department_id DESC

Plan hash value: 1681890450


---------------------------------------------------------------------------
|Id| Operation | Name |Rows|Bytes|Cost(%CPU)|Time |
---------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | |2(100)| |
| 1| TABLE ACCESS BY INDEX ROWID |EMPLOYEES |2|138|2 (0)|00:00:01|
|*2| INDEX RANGE SCAN DESCENDING|EMP_DEPARTMENT_IX|2| |1 (0)|00:00:01|
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------


2 - access("DEPARTMENT_ID"<20)

The database locates the first index leaf block that contains the highest key value that
is 20 or less. The scan then proceeds horizontally to the left through the linked list of
leaf nodes. The database obtains the rowid from each index entry, and then retrieves
the row specified by the rowid.

8.3.4 Index Full Scans


An index full scan reads the entire index in order. An index full scan can eliminate a
separate sorting operation because the data in the index is ordered by index key.
This section contains the following topics:

8.3.4.1 When the Optimizer Considers Index Full Scans


The optimizer considers an index full scan in a variety of situations.
The situations include the following:
• A predicate references a column in the index. This column need not be the leading
column.
• No predicate is specified, but all of the following conditions are met:
– All columns in the table and in the query are in the index.
– At least one indexed column is not null.
• A query includes an ORDER BY on indexed non-nullable columns.

8.3.4.2 How Index Full Scans Work


The database reads the root block, and then navigates down the left-hand side of the
index (or the right side for a descending full scan) until it reaches a leaf block.
When the database reaches a leaf block, the scan proceeds across the bottom of the
index, one block at a time, in sorted order. The database uses single-block I/O rather
than multiblock I/O.
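A minimal Python sketch of this walk, with invented leaf blocks, shows why no separate sort is needed: following the leaf chain from the leftmost block emits entries already in key order.

```python
# Illustrative model, not Oracle internals: each inner list is one leaf block
# of (key, rowid) entries, linked left to right.
leaf_chain = [[(1, "r1"), (5, "r2")],     # leftmost leaf block
              [(9, "r3")],
              [(12, "r4"), (20, "r5")]]   # rightmost leaf block

def index_full_scan(leaves):
    for leaf in leaves:                   # one block at a time (single-block I/O)
        for key, rowid in leaf:
            yield key, rowid

keys = [k for k, _ in index_full_scan(leaf_chain)]
print(keys)                               # already in ascending key order
```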
The following graphic illustrates an index full scan. A statement requests the
departments records ordered by department_id.


Figure 8-6 Index Full Scan

[Graphic: branch blocks holding key ranges (0..40, 41..80, 81..120, ..., 200..250)
point to a linked list of leaf blocks whose key,rowid entries run in ascending order
from 0,rowid to 250,rowid.]

8.3.4.3 Index Full Scans: Example


This example uses an index full scan to satisfy a query with an ORDER BY clause.

The following statement queries the ID and name for departments in order of department ID:

SELECT department_id, department_name
FROM departments
ORDER BY department_id;

The following plan shows that the optimizer chose an index full scan:

SQL_ID 94t4a20h8what, child number 0


-------------------------------------
select department_id, department_name from departments order by department_id

Plan hash value: 4179022242

------------------------------------------------------------------------
|Id| Operation                   | Name        |Rows|Bytes|Cost(%CPU)|Time    |
------------------------------------------------------------------------
| 0| SELECT STATEMENT            |             |    |     | 2 (100)  |        |
| 1| TABLE ACCESS BY INDEX ROWID | DEPARTMENTS | 27 | 432 | 2   (0)  |00:00:01|
| 2|  INDEX FULL SCAN            | DEPT_ID_PK  | 27 |     | 1   (0)  |00:00:01|
------------------------------------------------------------------------

The database locates the first index leaf block, and then proceeds horizontally to the
right through the linked list of leaf nodes. For each index entry, the database obtains
the rowid from the entry, and then retrieves the table row specified by the rowid.
Because the index is sorted on department_id, the database avoids a separate
operation to sort the retrieved rows.

8.3.5 Index Fast Full Scans


An index fast full scan reads the index blocks in unsorted order, as they exist on disk.
This scan does not use the index to probe the table, but reads the index instead of the
table, essentially using the index itself as a table.
This section contains the following topics:

8.3.5.1 When the Optimizer Considers Index Fast Full Scans


The optimizer considers this scan when a query only accesses attributes in the index.

Note:
Unlike a full scan, a fast full scan cannot eliminate a sort operation because it
does not read the index in order.

The INDEX_FFS(table_name index_name) hint forces a fast full index scan.

See Also:
Oracle Database SQL Language Reference to learn more about the INDEX
hint

8.3.5.2 How Index Fast Full Scans Work


The database uses multiblock I/O to read the root block and all of the leaf and branch
blocks. The database ignores the branch and root blocks and reads the index entries
on the leaf blocks.
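The contrast with a full scan can be sketched in Python. This is an illustrative model with invented block contents: blocks are consumed in physical (disk) order, root and branch blocks are skipped, and the resulting entries come out unsorted.

```python
# Illustrative model of an index segment as it sits on disk: (block kind, keys).
segment = [("branch", None),
           ("leaf",  [30, 10]),
           ("root",  None),
           ("leaf",  [50, 20])]

def index_fast_full_scan(blocks):
    entries = []
    for kind, keys in blocks:        # physical order, read with multiblock I/O
        if kind == "leaf":
            entries.extend(keys)     # root and branch blocks are ignored
    return entries                   # unsorted: cannot replace an ORDER BY sort

print(index_fast_full_scan(segment))
```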

8.3.5.3 Index Fast Full Scans: Example


This example uses a fast full index scan as a result of an optimizer hint.
The following statement counts the rows in the departments table:

SELECT /*+ INDEX_FFS(departments dept_id_pk) */ COUNT(*)
FROM departments;


The following plan shows that the optimizer chose a fast full index scan:

SQL_ID fu0k5nvx7sftm, child number 0


-------------------------------------
select /*+ index_ffs(departments dept_id_pk) */ count(*) from departments

Plan hash value: 3940160378


--------------------------------------------------------------------------
| Id | Operation | Name | Rows |Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| DEPT_ID_PK | 27 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------

8.3.6 Index Skip Scans


An index skip scan occurs when the initial column of a composite index is "skipped" or not
specified in the query.
This section contains the following topics:

See Also:
Oracle Database Concepts

8.3.6.1 When the Optimizer Considers Index Skip Scans


Often, skip scanning index blocks is faster than scanning table blocks, and faster than
performing full index scans.
The optimizer considers a skip scan when the following criteria are met:
• The leading column of a composite index is not specified in the query predicate.
For example, the query predicate does not reference the cust_gender column, and the
composite index key is (cust_gender,cust_email).
• Few distinct values exist in the leading column of the composite index, but many distinct
values exist in the nonleading key of the index.
For example, if the composite index key is (cust_gender,cust_email), then the
cust_gender column has only two distinct values, but cust_email has thousands.

8.3.6.2 How Index Skip Scans Work


An index skip scan logically splits a composite index into smaller subindexes.
The number of distinct values in the leading columns of the index determines the number of
logical subindexes. The lower the number, the fewer logical subindexes the optimizer must
create, and the more efficient the scan becomes. The scan reads each logical index
separately, and "skips" index blocks that do not meet the filter condition on the non-leading
column.
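The logical split can be sketched in Python. This is an illustrative model, not Oracle internals: the composite index is a sorted list of (leading value, email, rowid) tuples with invented rowids, and the scan performs one binary-search probe per distinct leading value.

```python
import bisect

# Invented composite index entries sorted on (cust_gender, cust_email).
entries = [("F", "Wolf@company.example.com",    "r1"),
           ("F", "Yang@company.example.com",    "r2"),
           ("M", "Abbassi@company.example.com", "r3"),
           ("M", "Abbey@company.example.com",   "r4")]

def skip_scan(ix, email):
    rowids = []
    for lead in sorted({e[0] for e in ix}):            # one probe per logical subindex
        i = bisect.bisect_left(ix, (lead, email, ""))  # binary search within the subindex
        if i < len(ix) and ix[i][:2] == (lead, email):
            rowids.append(ix[i][2])
    return rowids

print(skip_scan(entries, "Abbey@company.example.com"))
```

The scan probes the F subindex (no match) and the M subindex (one match), rather than reading every index entry.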


8.3.6.3 Index Skip Scans: Example


This example uses an index skip scan to satisfy a query of the sh.customers table.

The customers table contains a column cust_gender whose values are either M or F.
While logged in to the database as user sh, you create a composite index on the
columns (cust_gender, cust_email) as follows:

CREATE INDEX cust_gender_email_ix
ON sh.customers (cust_gender, cust_email);

Conceptually, a portion of the index might look as follows, with the gender value of F or
M as the leading edge of the index.

F,Wolf@company.example.com,rowid
F,Wolsey@company.example.com,rowid
F,Wood@company.example.com,rowid
F,Woodman@company.example.com,rowid
F,Yang@company.example.com,rowid
F,Zimmerman@company.example.com,rowid
M,Abbassi@company.example.com,rowid
M,Abbey@company.example.com,rowid

You run the following query for a customer in the sh.customers table:

SELECT *
FROM sh.customers
WHERE cust_email = 'Abbey@company.example.com';

The database can use a skip scan of the cust_gender_email_ix index even though
cust_gender is not specified in the WHERE clause. In the sample index, the leading
column cust_gender has two possible values: F and M. The database logically splits
the index into two. One subindex has the key F, with entries in the following form:

F,Wolf@company.example.com,rowid
F,Wolsey@company.example.com,rowid
F,Wood@company.example.com,rowid
F,Woodman@company.example.com,rowid
F,Yang@company.example.com,rowid
F,Zimmerman@company.example.com,rowid

The second subindex has the key M, with entries in the following form:

M,Abbassi@company.example.com,rowid
M,Abbey@company.example.com,rowid

When searching for the record for the customer whose email is
Abbey@company.example.com, the database searches the subindex with the leading
value F first, and then searches the subindex with the leading value M. Conceptually, the
database processes the query as follows:

( SELECT *
FROM sh.customers
WHERE cust_gender = 'F'
AND cust_email = 'Abbey@company.example.com' )
UNION ALL
( SELECT *
FROM sh.customers
WHERE cust_gender = 'M'
AND cust_email = 'Abbey@company.example.com' )

The plan for the query is as follows:

SQL_ID d7a6xurcnx2dj, child number 0


-------------------------------------
SELECT * FROM sh.customers WHERE cust_email = 'Abbey@company.example.com'

Plan hash value: 797907791

-----------------------------------------------------------------------------------------
|Id| Operation | Name |Rows|Bytes|Cost(%CPU)|Time|
-----------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | | |10(100)| |
| 1| TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMERS |33|6237| 10(0)|00:00:01|
|*2| INDEX SKIP SCAN | CUST_GENDER_EMAIL_IX |33| | 4(0)|00:00:01|
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

2 - access("CUST_EMAIL"='Abbey@company.example.com')
filter("CUST_EMAIL"='Abbey@company.example.com')

See Also:
Oracle Database Concepts to learn more about skip scans

8.3.7 Index Join Scans


An index join scan is a hash join of multiple indexes that together return all columns
requested by a query. The database does not need to access the table because all data is
retrieved from the indexes.
This section contains the following topics:

8.3.7.1 When the Optimizer Considers Index Join Scans


In some cases, avoiding table access is the most cost-efficient option.


The optimizer considers an index join in the following circumstances:


• A hash join of multiple indexes retrieves all data requested by the query, without
requiring table access.
• The cost of retrieving rows from the table is higher than reading the indexes
without retrieving rows from the table. An index join is often expensive. For
example, when scanning two indexes and joining them, it is often less costly to
choose the most selective index, and then probe the table.
You can specify an index join with the INDEX_JOIN(table_name) hint.

See Also:
Oracle Database SQL Language Reference

8.3.7.2 How Index Join Scans Work


An index join involves scanning multiple indexes, and then using a hash join on the
rowids obtained from these scans to return the rows.
In an index join scan, table access is always avoided. For example, the process for
joining two indexes on a single table is as follows:
1. Scan the first index to retrieve rowids.
2. Scan the second index to retrieve rowids.
3. Perform a hash join by rowid to obtain the rows.
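These steps can be sketched in Python. This is an illustrative model with invented rowids and column values: each index scan yields a rowid-to-column mapping, and a hash join on rowid assembles the final rows without visiting the table.

```python
# Invented rowid sets from two index scans on the same table.
name_scan  = {"r1": "Abel", "r2": "Ande", "r7": "Atkinson"}   # rowid -> last_name
email_scan = {"r1": "ABEL", "r2": "ANDE", "r5": "BAER"}       # rowid -> email

def index_join(build, probe):
    joined = []
    for rowid, email in probe.items():   # probe phase over the second rowid set
        if rowid in build:               # hash lookup on rowid (build phase is the dict)
            joined.append((build[rowid], email))
    return sorted(joined)

print(index_join(name_scan, email_scan))
```

Only rowids present in both scans produce output rows, which is why the indexes together must cover every column the query requests.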

8.3.7.3 Index Join Scans: Example


This example queries the last name and email for employees whose last name begins
with A, specifying an index join.

SELECT /*+ INDEX_JOIN(employees) */ last_name, email
FROM employees
WHERE last_name like 'A%';

Separate indexes exist on the (last_name,first_name) and email columns. Part of
the emp_name_ix index might look as follows:

Banda,Amit,AAAVgdAALAAAABSABD
Bates,Elizabeth,AAAVgdAALAAAABSABI
Bell,Sarah,AAAVgdAALAAAABSABc
Bernstein,David,AAAVgdAALAAAABSAAz
Bissot,Laura,AAAVgdAALAAAABSAAd
Bloom,Harrison,AAAVgdAALAAAABSABF
Bull,Alexis,AAAVgdAALAAAABSABV


The first part of the emp_email_uk index might look as follows:

ABANDA,AAAVgdAALAAAABSABD
ABULL,AAAVgdAALAAAABSABV
ACABRIO,AAAVgdAALAAAABSABX
AERRAZUR,AAAVgdAALAAAABSAAv
AFRIPP,AAAVgdAALAAAABSAAV
AHUNOLD,AAAVgdAALAAAABSAAD
AHUTTON,AAAVgdAALAAAABSABL

The following example retrieves the plan using the DBMS_XPLAN.DISPLAY_CURSOR function.
The database retrieves all rowids in the emp_email_uk index, and then retrieves rowids in
emp_name_ix for last names that begin with A. The database uses a hash join to search both
sets of rowids for matches. For example, rowid AAAVgdAALAAAABSABD occurs in both sets of
rowids, so the database probes the employees table for the record corresponding to this
rowid.
Example 8-4 Index Join Scan

SQL_ID d2djchyc9hmrz, child number 0


-------------------------------------
SELECT /*+ INDEX_JOIN(employees) */ last_name, email FROM employees
WHERE last_name like 'A%'

Plan hash value: 3719800892


-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
|* 1 | VIEW | index$_join$_001 | 3 | 48 | 3 (34)| 00:00:01 |
|* 2 | HASH JOIN | | | | | |
|* 3 | INDEX RANGE SCAN | EMP_NAME_IX | 3 | 48 | 1 (0)| 00:00:01 |
| 4 | INDEX FAST FULL SCAN| EMP_EMAIL_UK | 3 | 48 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
1 - filter("LAST_NAME" LIKE 'A%')
2 - access(ROWID=ROWID)
3 - access("LAST_NAME" LIKE 'A%')

8.4 Bitmap Index Access Paths


Bitmap indexes combine the indexed data with a rowid range.
This section explains how bitmap indexes work, and describes some of the more common
bitmap index access paths:

8.4.1 About Bitmap Index Access


In a conventional B-tree index, one index entry points to a single row. In a bitmap index, the
key is the combination of the indexed data and the rowid range.


The database stores at least one bitmap for each index key. Each value in the bitmap,
which is a series of 1 and 0 values, points to a row within a rowid range. Thus, in a
bitmap index, one index entry points to a set of rows rather than a single row.
This section contains the following topics:

8.4.1.1 Differences Between Bitmap and B-Tree Indexes


A bitmap index uses a different key from a B-tree index, but is stored in a B-tree
structure.
The following table shows the differences among types of index entries.

Table 8-3 Index Entries for B-Trees and Bitmaps

• Unique B-tree — Key: indexed data only. Data: rowid. In an entry of the index on
  the employees.employee_id column, employee ID 101 is the key, and the rowid
  AAAPvCAAFAAAAFaAAa is the data:

  101,AAAPvCAAFAAAAFaAAa

• Nonunique B-tree — Key: indexed data combined with the rowid. Data: none. In an
  entry of the index on the employees.last_name column, the name and rowid
  combination Smith,AAAPvCAAFAAAAFaAAa is the key, and there is no data:

  Smith,AAAPvCAAFAAAAFaAAa

• Bitmap — Key: indexed data combined with a rowid range. Data: bitmap. In an entry
  of the index on the customers.cust_gender column, the M,low-rowid,high-rowid part
  is the key, and the series of 1 and 0 values is the data:

  M,low-rowid,high-rowid,1000101010101010

The database stores a bitmap index in a B-tree structure. The database can search
the B-tree quickly on the first part of the key, which is the set of attributes on which the
index is defined, and then obtain the corresponding rowid range and bitmap.

See Also:

• "Bitmap Storage"
• Oracle Database Concepts for an overview of bitmap indexes
• Oracle Database Data Warehousing Guide for more information about
bitmap indexes


8.4.1.2 Purpose of Bitmap Indexes


Bitmap indexes are typically suitable for infrequently modified data with a low or medium
number of distinct values (NDV).
In general, B-tree indexes are suitable for columns with high NDV and frequent DML activity.
For example, the optimizer might choose a B-tree index for a query of a sales.amount
column that returns few rows. In contrast, the customers.state and customers.county
columns are candidates for bitmap indexes because they have few distinct values, are
infrequently updated, and can benefit from efficient AND and OR operations.

Bitmap indexes are a useful way to speed ad hoc queries in a data warehouse. They are
fundamental to star transformations. Specifically, bitmap indexes are useful in queries that
contain the following:
• Multiple conditions in the WHERE clause
Before the table itself is accessed, the database filters out rows that satisfy some, but not
all, conditions.
• AND, OR, and NOT operations on columns with low or medium NDV
Combining bitmap indexes makes these operations more efficient. The database can
merge bitmaps from bitmap indexes very quickly. For example, if bitmap indexes exist on
the customers.state and customers.county columns, then these indexes can
enormously improve the performance of the following query:

SELECT *
FROM customers
WHERE state = 'CA'
AND county = 'San Mateo'

The database can convert 1 values in the merged bitmap into rowids efficiently.
• The COUNT function
The database can scan the bitmap index without needing to scan the table.
• Predicates that select for null values
Unlike B-tree indexes, bitmap indexes can contain nulls. Queries that count the number
of nulls in a column can use the bitmap index without scanning the table.
• Columns that do not experience heavy DML
The reason is that one index key points to many rows. If a session modifies the indexed
data, then the database cannot lock a single bit in the bitmap: rather, the database locks
the entire index entry, which in practice locks the rows pointed to by the bitmap. For
example, if the county of residence for a specific customer changes from San Mateo to
Alameda, then the database must get exclusive access to the San Mateo index entry and
Alameda index entry in the bitmap. Rows containing these two values cannot be modified
until COMMIT.
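The efficiency of combining bitmaps can be sketched in a few lines of Python. The bit patterns below are invented; the point is that evaluating state = 'CA' AND county = 'San Mateo' reduces to a bitwise AND over two bitmaps covering the same rowid range.

```python
# Invented bitmaps: one bit per potential row in a shared rowid range.
state_ca  = 0b101101   # rows where state = 'CA'
county_sm = 0b100110   # rows where county = 'San Mateo'

merged = state_ca & county_sm            # rows satisfying both predicates
print(bin(merged), bin(merged).count("1"))   # merged bitmap and its COUNT
```

An OR predicate merges the same way with `|`, and COUNT queries need only count the 1 bits, without converting them to rowids or touching the table.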


See Also:

• "Star Transformation"
• Oracle Database SQL Language Reference to learn about the COUNT
function

8.4.1.3 Bitmaps and Rowids


For a particular value in a bitmap, the value is 1 if the row values match the bitmap
condition, and 0 if they do not. Based on these values, the database uses an internal
algorithm to map bitmaps onto rowids.
The bitmap entry contains the indexed value, the rowid range (start and end rowids),
and a bitmap. Each 0 or 1 value in the bitmap is an offset into the rowid range, and
maps to a potential row in the table, even if the row does not exist. Because the
number of possible rows in a block is predetermined, the database can use the range
endpoints to determine the rowid of an arbitrary row in the range.

Note:
The Hakan factor is an optimization used by the bitmap index algorithms to
limit the number of rows that Oracle Database assumes can be stored in a
single block. By artificially limiting the number of rows, the database reduces
the size of the bitmaps.

Table 8-4 shows part of a sample bitmap for the sh.customers.cust_marital_status
column, which is nullable. The actual index has 12 distinct values. Only 3 are shown in
the sample: null, married, and single.

Table 8-4 Bitmap Index Entries

Column Value for     Start Rowid  End Rowid  1st Row  2nd Row  3rd Row  4th Row  5th Row  6th Row
cust_marital_status  in Range     in Range   in Range in Range in Range in Range in Range in Range
(null) AAA ... CCC ... 0 0 0 0 0 1
married AAA ... CCC ... 1 0 1 1 1 0
single AAA ... CCC ... 0 1 0 0 0 0
single DDD ... EEE ... 1 0 1 0 1 1

As shown in Table 8-4, bitmap indexes can include keys that consist entirely of null
values, unlike B-tree indexes. In Table 8-4, the null has a value of 1 for the 6th row in
the range, which means that the cust_marital_status value is null for the 6th row in
the range. Indexing nulls can be useful for some SQL statements, such as queries with
the aggregate function COUNT.
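A hypothetical sketch of the offset-to-row mapping follows. A fixed rows-per-block limit stands in for the role the Hakan factor plays; the block numbers are invented and this is not Oracle's real rowid format.

```python
ROWS_PER_BLOCK = 4        # assumed Hakan-style limit on rows per block

def offset_to_address(start_block, offset):
    """Map a bit offset within a rowid range to a (block, row slot) address."""
    block = start_block + offset // ROWS_PER_BLOCK
    slot  = offset % ROWS_PER_BLOCK
    return (block, slot)

bitmap = "010001"         # one bit per potential row in the range
matches = [offset_to_address(100, i) for i, bit in enumerate(bitmap) if bit == "1"]
print(matches)            # addresses of the rows whose bit is 1
```

Because the maximum rows per block is predetermined, the database can compute any row's address from the range start and the bit's offset alone.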


See Also:
Oracle Database Concepts to learn about rowid formats

8.4.1.4 Bitmap Join Indexes


A bitmap join index is a bitmap index for the join of two or more tables.
The optimizer can use a bitmap join index to reduce or eliminate the volume of data that must
be joined during plan execution. Bitmap join indexes can be much more efficient in storage
than materialized join views.
The following example creates a bitmap index on the sh.sales and sh.customers tables:

CREATE BITMAP INDEX cust_sales_bji ON sales(c.cust_city)
FROM sales s, customers c
WHERE c.cust_id = s.cust_id LOCAL;

The FROM and WHERE clauses in the preceding CREATE statement represent the join condition
between the tables. The customers.cust_city column is the index key.

Each key value in the index represents a possible city in the customers table. Conceptually,
key values for the index might look as follows, with one bitmap associated with each key
value:

San Francisco 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 . . .
San Mateo 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 . . .
Smithville 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 . . .
.
.
.

Each bit in a bitmap corresponds to one row in the sales table. In the Smithville key, the
value 1 means that the first row in the sales table corresponds to a product sold to a
Smithville customer, whereas the value 0 means that the second row corresponds to a
product not sold to a Smithville customer.
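Building and using such an entry can be sketched in Python. The customer IDs, cities, and sales rows below are invented; the sketch shows how one bit per sales row encodes the join to a city, and how COUNT(*) becomes a sum over the bits.

```python
# Invented join data: customers and the cust_id of each sales row.
cust_city      = {1: "Smithville", 2: "San Mateo"}   # cust_id -> cust_city
sales_cust_ids = [1, 2, 1, 1, 2]                     # one entry per sales row

def join_index_bitmap(city):
    """One bit per sales row: 1 if the sale joins to a customer in the city."""
    return [1 if cust_city[c] == city else 0 for c in sales_cust_ids]

bm = join_index_bitmap("Smithville")
print(bm, sum(bm))    # the bitmap, and COUNT(*) without joining the tables
```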
Consider the following query of the number of separate sales to Smithville customers:

SELECT COUNT (*)
FROM sales s, customers c
WHERE c.cust_id = s.cust_id
AND c.cust_city = 'Smithville';

The following plan shows that the database reads the Smithville bitmap to derive the
number of Smithville sales (Step 4), thereby avoiding a join of the customers and sales
tables.

SQL_ID 57s100mh142wy, child number 0


-------------------------------------
SELECT COUNT (*) FROM sales s, customers c WHERE c.cust_id = s.cust_id
AND c.cust_city = 'Smithville'

Plan hash value: 3663491772

------------------------------------------------------------------------------------
|Id| Operation | Name |Rows|Bytes|Cost (%CPU)| Time|Pstart|Pstop|
------------------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | |29 (100)| | | |
| 1| SORT AGGREGATE | | 1 | 5| | | | |
| 2| PARTITION RANGE ALL | | 1708|8540|29 (0)|00:00:01|1|28|
| 3| BITMAP CONVERSION COUNT | | 1708|8540|29 (0)|00:00:01| | |
|*4| BITMAP INDEX SINGLE VALUE|CUST_SALES_BJI| | | | |1|28|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

4 - access("S"."SYS_NC00008$"='Smithville')

See Also:
Oracle Database Concepts to learn about the CREATE INDEX statement

8.4.1.5 Bitmap Storage


A bitmap index resides in a B-tree structure, using branch blocks and leaf blocks just
as in a B-tree.
For example, if the customers.cust_marital_status column has 12 distinct values,
then one branch block might point to the keys married,rowid-range and
single,rowid-range, another branch block might point to the widowed,rowid-range
key, and so on. Alternatively, a single branch block could point to a leaf block
containing all 12 distinct keys.
Each indexed column value may have one or more bitmap pieces, each with its own
rowid range occupying a contiguous set of rows in one or more extents. The database
can use a bitmap piece to break up an index entry that is large relative to the size of a
block. For example, the database could break a single index entry into three pieces,
with the first two pieces in separate blocks in the same extent, and the last piece in a
separate block in a different extent.
To conserve space, Oracle Database can compress consecutive ranges of 0 values.

8.4.2 Bitmap Conversion to Rowid


A bitmap conversion translates between an entry in the bitmap and a row in a table.
The conversion can go from entry to row (TO ROWID), or from row to entry (FROM
ROWID).

This section contains the following topics:


8.4.2.1 When the Optimizer Chooses Bitmap Conversion to Rowid


The optimizer uses a conversion whenever it retrieves a row from a table using a bitmap
index entry.

8.4.2.2 How Bitmap Conversion to Rowid Works


Conceptually, a bitmap can be represented as a table.
For example, Table 8-4 represents the bitmap as a table with customers row numbers as
columns and cust_marital_status values as rows. Each field in Table 8-4 has the value 1 or
0, and represents a column value in a row. Conceptually, the bitmap conversion uses an
internal algorithm that says, "Field F in the bitmap corresponds to the Nth row of the Mth
block of the table," or "The Nth row of the Mth block in the table corresponds to field F in the
bitmap."

8.4.2.3 Bitmap Conversion to Rowid: Example


In this example, the optimizer chooses a bitmap conversion operation to satisfy a query using
a range predicate.
A query of the sh.customers table selects the names of all customers born before 1918:

SELECT cust_last_name, cust_first_name
FROM customers
WHERE cust_year_of_birth < 1918;

The following plan shows that the database uses a range scan to find all key values less than
1918 (Step 3), converts the 1 values in the bitmap to rowids (Step 2), and then uses the
rowids to obtain the rows from the customers table (Step 1):

-------------------------------------------------------------------------------------------
|Id| Operation | Name |Rows|Bytes|Cost(%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | |421 (100)| |
| 1| TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMERS |3604|68476|421 (1)|00:00:01|
| 2| BITMAP CONVERSION TO ROWIDS | | | | | |
|*3| BITMAP INDEX RANGE SCAN | CUSTOMERS_YOB_BIX| | | | |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
3 - access("CUST_YEAR_OF_BIRTH"<1918)
filter("CUST_YEAR_OF_BIRTH"<1918)

8.4.3 Bitmap Index Single Value


This type of access path uses a bitmap index to look up a single key value.
This section contains the following topics:


8.4.3.1 When the Optimizer Considers Bitmap Index Single Value


The optimizer considers this access path when the predicate contains an equality
operator.

8.4.3.2 How Bitmap Index Single Value Works


The query scans a single bitmap for positions containing a 1 value. The database
converts the 1 values into rowids, and then uses the rowids to find the rows.

The database only needs to process a single bitmap. For example, the following table
represents the bitmap index (in two bitmap pieces) for the value widowed in the
sh.customers.cust_marital_status column. To satisfy a query of customers with the
status widowed, the database can search for each 1 value in the widowed bitmap and
find the rowid of the corresponding row.

Table 8-5 Bitmap Index Entries

Column Value  Start Rowid  End Rowid  1st Row  2nd Row  3rd Row  4th Row  5th Row  6th Row
              in Range     in Range   in Range in Range in Range in Range in Range in Range
widowed AAA ... CCC ... 0 1 0 0 0 0
widowed DDD ... EEE ... 1 0 1 0 1 1
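Scanning the pieces for one key can be sketched in Python. The rowid-range starts and bit strings below mirror the widowed entries above, but the values are illustrative, not Oracle's storage format.

```python
# Each bitmap piece for the 'widowed' key: (start of rowid range, bitmap bits).
pieces = [("AAA", "010000"),
          ("DDD", "101011")]

def single_value_hits(key_pieces):
    """Convert every 1 bit to a (range start, offset) row position."""
    hits = []
    for start, bits in key_pieces:
        hits += [(start, i) for i, b in enumerate(bits) if b == "1"]
    return hits

print(single_value_hits(pieces))
```

Each (range start, offset) pair is then converted to a rowid, which the database uses to fetch the row.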

8.4.3.3 Bitmap Index Single Value: Example


In this example, the optimizer chooses a bitmap index single value operation to satisfy
a query that uses an equality predicate.
A query of the sh.customers table selects all widowed customers:

SELECT *
FROM customers
WHERE cust_marital_status = 'Widowed';

The following plan shows that the database reads the entry with the Widowed key in the
customers bitmap index (Step 3), converts the 1 values in the bitmap to rowids (Step
2), and then uses the rowids to obtain the rows from the customers table (Step 1):

SQL_ID ff5an2xsn086h, child number 0


-------------------------------------
SELECT * FROM customers WHERE cust_marital_status = 'Widowed'

Plan hash value: 2579015045


-------------------------------------------------------------------------------------------
|Id| Operation                           | Name                 |Rows|Bytes|Cost (%CPU)|Time    |
-------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT                    |                      |    |     | 412 (100) |        |
| 1| TABLE ACCESS BY INDEX ROWID BATCHED | CUSTOMERS            |3461| 638K| 412   (2) |00:00:01|
| 2|  BITMAP CONVERSION TO ROWIDS        |                      |    |     |           |        |
|*3|   BITMAP INDEX SINGLE VALUE         | CUSTOMERS_MARITAL_BIX|    |     |           |        |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

3 - access("CUST_MARITAL_STATUS"='Widowed')

8.4.4 Bitmap Index Range Scans


This type of access path uses a bitmap index to look up a range of values.
This section contains the following topics:

8.4.4.1 When the Optimizer Considers Bitmap Index Range Scans


The optimizer considers this access path when the predicate selects a range of values.
The range in the scan can be bounded on both sides, or unbounded on one or both sides.
The optimizer typically chooses a range scan for selective queries.

See Also:
"Index Range Scans"

8.4.4.2 How Bitmap Index Range Scans Work


This scan works similarly to a B-tree range scan.
For example, the following table represents three values in the bitmap index for the
sh.customers.cust_year_of_birth column. If a query requests all customers born before
1917, then the database can scan this index for values lower than 1917, and then obtain the
rowids for rows that have a 1.

Table 8-6    Bitmap Index Entries

Column  Start Rowid  End Rowid  1st Row   2nd Row   3rd Row   4th Row   5th Row   6th Row
Value   in Range     in Range   in Range  in Range  in Range  in Range  in Range  in Range
1913    AAA ...      CCC ...    0         0         0         0         0         1
1917    AAA ...      CCC ...    1         0         1         1         1         0
1918    AAA ...      CCC ...    0         1         0         0         0         0
1918    DDD ...      EEE ...    1         0         1         0         1         1
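The range scan over the entries in Table 8-6 can be sketched as follows. This Python sketch is illustrative only; the bitmaps are taken from the table, but the merge logic is a simplification of Oracle internals:

```python
# Illustrative sketch of a bitmap index range scan. Each index entry maps
# a key value to a bitmap; a 1 in position i means "row i in the covered
# rowid range matches this key value".

index_entries = {              # key -> bitmap, as in Table 8-6
    1913: [0, 0, 0, 0, 0, 1],
    1917: [1, 0, 1, 1, 1, 0],
    1918: [0, 1, 0, 0, 0, 0],
}

def bitmap_range_scan(entries, upper_bound):
    """OR together the bitmaps of all keys below upper_bound."""
    merged = [0] * 6
    for key, bitmap in entries.items():
        if key < upper_bound:
            merged = [a | b for a, b in zip(merged, bitmap)]
    return merged

# Customers born before 1917: only the 1913 bitmap qualifies.
matches = bitmap_range_scan(index_entries, 1917)
# Positions with a 1 are then converted to rowids for the table fetch.
positions = [i for i, bit in enumerate(matches) if bit == 1]
print(positions)
```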


See Also:
"Index Range Scans"

8.4.4.3 Bitmap Index Range Scans: Example


This example uses a range scan to select customers born before a specified year.
A query of the sh.customers table selects the names of customers born before 1918:

SELECT cust_last_name, cust_first_name
FROM customers
WHERE cust_year_of_birth < 1918;

The following plan shows that the database obtains all bitmaps for
cust_year_of_birth keys lower than 1918 (Step 3), converts the bitmaps to rowids
(Step 2), and then fetches the rows (Step 1):

SQL_ID 672z2h9rawyjg, child number 0
-------------------------------------
SELECT cust_last_name, cust_first_name FROM customers WHERE
cust_year_of_birth < 1918

Plan hash value: 4198466611

-----------------------------------------------------------------------------------------
|Id| Operation                           | Name             |Rows|Bytes|Cost(%CPU)|Time    |
-----------------------------------------------------------------------------------------
| 0| SELECT STATEMENT                    |                  |    |     |421 (100) |        |
| 1|  TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMERS        |3604|68476|421   (1) |00:00:01|
| 2|   BITMAP CONVERSION TO ROWIDS       |                  |    |     |          |        |
|*3|    BITMAP INDEX RANGE SCAN          | CUSTOMERS_YOB_BIX|    |     |          |        |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CUST_YEAR_OF_BIRTH"<1918)
       filter("CUST_YEAR_OF_BIRTH"<1918)

8.4.5 Bitmap Merge


This access path merges multiple bitmaps, and returns a single bitmap as a result.
A bitmap merge is indicated by the BITMAP MERGE operation in an execution plan.



8.4.5.1 When the Optimizer Considers Bitmap Merge


The optimizer typically uses a bitmap merge to combine bitmaps generated from a bitmap
index range scan.

8.4.5.2 How Bitmap Merge Works


A merge uses a Boolean OR operation between two or more bitmaps. The resulting bitmap
selects all rows from the first bitmap, plus all rows from every subsequent bitmap.
A query might select all customers born before 1918. The following example shows sample
bitmaps for three customers.cust_year_of_birth keys: 1917, 1916, and 1915. If any position
in any bitmap has a 1, then the merged bitmap has a 1 in the same position. Otherwise, the
merged bitmap has a 0.

1917 1 0 1 0 0 0 0 0 0 0 0 0 0 1
1916 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1915 0 0 0 0 0 0 0 0 1 0 0 0 0 0
------------------------------------
merged: 1 1 1 0 0 0 0 0 1 0 0 0 0 1

The 1 values in the resulting bitmap correspond to rows that contain the value 1915, 1916,
or 1917.
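The merge shown above can be sketched as follows. This Python sketch is illustrative only and does not represent Oracle internals; the bitmaps are copied from the example:

```python
# Illustrative sketch of BITMAP MERGE: OR together the bitmaps retrieved
# by a bitmap index range scan, producing a single bitmap.

b_1917 = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
b_1916 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b_1915 = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]

def bitmap_merge(*bitmaps):
    """Return a bitmap with a 1 wherever any input bitmap has a 1."""
    return [int(any(bits)) for bits in zip(*bitmaps)]

merged = bitmap_merge(b_1917, b_1916, b_1915)
print(merged)   # matches the merged row shown in the text
```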

8.4.5.3 Bitmap Merge: Example


This example shows how the database merges bitmaps to optimize a query using a range
predicate.
A query of the sh.customers table selects the names of female customers born before 1918:

SELECT cust_last_name, cust_first_name
FROM customers
WHERE cust_gender = 'F'
AND cust_year_of_birth < 1918;

The following plan shows that the database obtains all bitmaps for cust_year_of_birth keys
lower than 1918 (Step 6), and then merges these bitmaps using OR logic to create a single
bitmap (Step 5). The database obtains a single bitmap for the cust_gender key of F (Step 4),
and then performs an AND operation on these two bitmaps. The result is a single bitmap that
contains 1 values for the requested rows (Step 3).

SQL_ID 1xf59h179zdg2, child number 0
-------------------------------------
select cust_last_name, cust_first_name from customers where cust_gender
= 'F' and cust_year_of_birth < 1918

Plan hash value: 49820847

--------------------------------------------------------------------------------------------
|Id| Operation                           | Name                |Rows|Bytes|Cost(%CPU)|Time    |
--------------------------------------------------------------------------------------------
| 0| SELECT STATEMENT                    |                     |    |     |288 (100) |        |
| 1|  TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMERS           |1802|37842|288   (1) |00:00:01|
| 2|   BITMAP CONVERSION TO ROWIDS       |                     |    |     |          |        |
| 3|    BITMAP AND                       |                     |    |     |          |        |
|*4|     BITMAP INDEX SINGLE VALUE       | CUSTOMERS_GENDER_BIX|    |     |          |        |
| 5|     BITMAP MERGE                    |                     |    |     |          |        |
|*6|      BITMAP INDEX RANGE SCAN        | CUSTOMERS_YOB_BIX   |    |     |          |        |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("CUST_GENDER"='F')
   6 - access("CUST_YEAR_OF_BIRTH"<1918)
       filter("CUST_YEAR_OF_BIRTH"<1918)

8.5 Table Cluster Access Paths


A table cluster is a group of tables that share common columns and store related
data in the same blocks. When tables are clustered, a single data block can contain
rows from multiple tables.

See Also:
Oracle Database Concepts for an overview of table clusters

8.5.1 Cluster Scans


An index cluster is a table cluster that uses an index to locate data.
The cluster index is a B-tree index on the cluster key. A cluster scan retrieves all rows
that have the same cluster key value from a table stored in an indexed cluster.

8.5.1.1 When the Optimizer Considers Cluster Scans


The database considers a cluster scan when a query accesses a table in an indexed
cluster.


8.5.1.2 How a Cluster Scan Works


In an indexed cluster, the database stores all rows with the same cluster key value in the
same data block.
For example, if the hr.employees2 and hr.departments2 tables are clustered in
emp_dept_cluster, and if the cluster key is department_id, then the database stores all
employees in department 10 in the same block, all employees in department 20 in the same
block, and so on.
The B-tree cluster index associates the cluster key value with the database block address
(DBA) of the block containing the data. For example, the index entry for key 30 shows the
address of the block that contains rows for employees in department 30:

30,AADAAAA9d

When a user requests rows in the cluster, the database scans the index to obtain the DBAs of
the blocks containing the rows. Oracle Database then locates the rows based on these
DBAs.
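The index lookup described above can be sketched as follows. This Python sketch is illustrative only; the block addresses and row values are invented and do not represent Oracle internals:

```python
# Illustrative sketch of an indexed cluster scan. The cluster index maps
# each cluster key to the data block address (DBA) of the block holding
# all rows with that key.

cluster_index = {        # cluster key -> DBA (addresses are made up)
    10: "AADAAAA9a",
    20: "AADAAAA9b",
    30: "AADAAAA9d",
}

data_blocks = {          # DBA -> rows stored in that block
    "AADAAAA9d": [("Raphaely", 30), ("Khoo", 30), ("Baida", 30)],
}

def cluster_scan(key):
    dba = cluster_index[key]     # scan the cluster index to obtain the DBA
    return data_blocks[dba]      # read the rows from that block

print(cluster_scan(30))
```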

8.5.1.3 Cluster Scans: Example


This example clusters the employees and departments tables on the department_id column,
and then queries the cluster for a single department.
As user hr, you create a table cluster, cluster index, and tables in the cluster as follows:

CREATE CLUSTER employees_departments_cluster


(department_id NUMBER(4)) SIZE 512;

CREATE INDEX idx_emp_dept_cluster


ON CLUSTER employees_departments_cluster;

CREATE TABLE employees2


CLUSTER employees_departments_cluster (department_id)
AS SELECT * FROM employees;
CREATE TABLE departments2
CLUSTER employees_departments_cluster (department_id)
AS SELECT * FROM departments;

You query the employees in department 30 as follows:

SELECT *
FROM employees2
WHERE department_id = 30;

To perform the scan, Oracle Database first obtains the rowid of the row describing
department 30 by scanning the cluster index (Step 2). Oracle Database then locates the rows
in employees2 using this rowid (Step 1).

SQL_ID b7xk1jzuwdc6t, child number 0
-------------------------------------
SELECT * FROM employees2 WHERE department_id = 30

Plan hash value: 49826199

---------------------------------------------------------------------------
|Id| Operation            | Name                |Rows|Bytes|Cost(%CPU)|Time    |
---------------------------------------------------------------------------
| 0| SELECT STATEMENT     |                     |    |     | 2 (100)  |        |
| 1|  TABLE ACCESS CLUSTER| EMPLOYEES2          |  6 | 798 | 2   (0)  |00:00:01|
|*2|   INDEX UNIQUE SCAN  | IDX_EMP_DEPT_CLUSTER|  1 |     | 1   (0)  |00:00:01|
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPARTMENT_ID"=30)

See Also:
Oracle Database Concepts to learn about indexed clusters

8.5.2 Hash Scans


A hash cluster is like an indexed cluster, except the index key is replaced with a hash
function. No separate cluster index exists.
In a hash cluster, the data is the index. The database uses a hash scan to locate rows
in a hash cluster based on a hash value.

8.5.2.1 When the Optimizer Considers a Hash Scan


The database considers a hash scan when a query accesses a table in a hash cluster.

8.5.2.2 How a Hash Scan Works


In a hash cluster, all rows with the same hash value are stored in the same data block.
To perform a hash scan of the cluster, Oracle Database first obtains the hash value by
applying a hash function to a cluster key value specified by the statement. Oracle
Database then scans the data blocks containing rows with this hash value.
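The hash lookup described above can be sketched as follows. This Python sketch is illustrative only; the modulo hash function and bucket layout are stand-ins, not Oracle's internal hash function:

```python
# Illustrative sketch of a hash cluster scan. The hash function maps a
# cluster key directly to a bucket of blocks; no separate index I/O occurs.

HASHKEYS = 100                 # as declared in CREATE CLUSTER ... HASHKEYS 100

def hash_value(key):
    return key % HASHKEYS      # stand-in for Oracle's internal hash function

buckets = {hash_value(30): [("Raphaely", 30), ("Khoo", 30)]}

def hash_scan(key):
    # Scan the blocks for this hash value, keeping rows that match the key
    # (different keys can hash to the same value, so the check is required).
    return [row for row in buckets.get(hash_value(key), []) if row[1] == key]

print(hash_scan(30))
```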


8.5.2.3 Hash Scans: Example


This example hashes the employees and departments tables on the department_id column,
and then queries the cluster for a single department.
You create a hash cluster and tables in the cluster as follows:

CREATE CLUSTER employees_departments_cluster


(department_id NUMBER(4)) SIZE 8192 HASHKEYS 100;

CREATE TABLE employees2


CLUSTER employees_departments_cluster (department_id)
AS SELECT * FROM employees;

CREATE TABLE departments2


CLUSTER employees_departments_cluster (department_id)
AS SELECT * FROM departments;

You query the employees in department 30 as follows:

SELECT *
FROM employees2
WHERE department_id = 30;

To perform a hash scan, Oracle Database first obtains the hash value by applying a hash
function to the key value 30, and then uses this hash value to scan the data blocks and
retrieve the rows (Step 1).

SQL_ID 919x7hyyxr6p4, child number 0
-------------------------------------
SELECT * FROM employees2 WHERE department_id = 30

Plan hash value: 2399378016

----------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 |
|* 1 | TABLE ACCESS HASH| EMPLOYEES2 | 10 | 1330 | |
----------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

1 - access("DEPARTMENT_ID"=30)

See Also:
Oracle Database Concepts to learn about hash clusters

9 Joins
Oracle Database provides several optimizations for joining row sets.

9.1 About Joins


A join combines the output from exactly two row sources, such as tables or views, and
returns one row source. The returned row source is the data set.
A join is characterized by multiple tables in the WHERE (non-ANSI) or FROM ... JOIN (ANSI)
clause of a SQL statement. Whenever multiple tables exist in the FROM clause, Oracle
Database performs a join.
A join condition compares two row sources using an expression. The join condition defines
the relationship between the tables. If the statement does not specify a join condition, then
the database performs a Cartesian join, matching every row in one table with every row in the
other table.

See Also:

• "Cartesian Joins"
• Oracle Database SQL Language Reference for a concise discussion of joins in
Oracle SQL

9.1.1 Join Trees


Typically, a join tree is represented as an upside-down tree structure.
As shown in the following graphic, table1 is the left table, and table2 is the right table. The
optimizer processes the join from left to right. For example, if this graphic depicted a nested
loops join, then table1 is the outer loop, and table2 is the inner loop.

Figure 9-1    Join Tree

         result set
         /        \
    table1        table2


The input of a join can be the result set from a previous join. If the right child of every
internal node of a join tree is a table, then the tree is a left deep join tree, as shown in
the following example. Most join trees are left deep joins.

Figure 9-2    Left Deep Join Tree

                   result set
                   /        \
               join          table4
              /    \
          join      table3
         /    \
    table1    table2

If the left child of every internal node of a join tree is a table, then the tree is called a
right deep join tree, as shown in the following diagram.

Figure 9-3    Right Deep Join Tree

      result set
      /        \
  table1        join
               /    \
          table2     join
                    /    \
               table3    table4

If either child of an internal node of a join tree can be a join node, then the tree is
called a bushy join tree. In the following example, table4 is the right child of a join
node, table1 is the left child of a join node, and table2 is the left child of a join node.


Figure 9-4    Bushy Join Tree

         result set
         /        \
     join          table4
    /    \
table1    join
         /    \
    table2    table3

In yet another variation, both inputs of a join are the results of a previous join.
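The join tree shapes above can be sketched as follows. This Python sketch is illustrative only: a join tree is modeled as nested (left, right) tuples with table names as leaves, which is a convenience representation, not an optimizer structure:

```python
# Classify a join tree shape. Leaves are table-name strings; each internal
# (join) node is a (left, right) tuple.

def is_left_deep(tree):
    """True if the right child of every internal node is a table."""
    if isinstance(tree, str):
        return True
    left, right = tree
    return isinstance(right, str) and is_left_deep(left)

def is_right_deep(tree):
    """True if the left child of every internal node is a table."""
    if isinstance(tree, str):
        return True
    left, right = tree
    return isinstance(left, str) and is_right_deep(right)

left_deep  = ((("table1", "table2"), "table3"), "table4")   # Figure 9-2
right_deep = ("table1", ("table2", ("table3", "table4")))   # Figure 9-3
bushy      = (("table1", ("table2", "table3")), "table4")   # Figure 9-4

print(is_left_deep(left_deep), is_right_deep(right_deep))
print(is_left_deep(bushy), is_right_deep(bushy))   # bushy is neither
```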

9.1.2 How the Optimizer Executes Join Statements


The database joins pairs of row sources. When multiple tables exist in the FROM clause, the
optimizer must determine which join operation is most efficient for each pair.
The optimizer must make the interrelated decisions shown in the following table.

Table 9-1 Join Operations

Operation Explanation To Learn More


Access paths As for simple statements, the optimizer must "Optimizer Access Paths"
choose an access path to retrieve data from
each table in the join statement. For example,
the optimizer might choose between a full table
scan or an index scan.
Join methods To join each pair of row sources, Oracle "Join Methods"
Database must decide how to do it. The "how"
is the join method. The possible join methods
are nested loop, sort merge, and hash joins. A
Cartesian join requires one of the preceding
join methods. Each join method has specific
situations in which it is more suitable than the
others.
Join types The join condition determines the join type. For "Join Types"
example, an inner join retrieves only rows that
match the join condition. An outer join also
retrieves rows that do not match the join condition.

Join order To execute a statement that joins more than N/A
two tables, Oracle Database joins two tables
and then joins the resulting row source to the
next table. This process continues until all
tables are joined into the result. For example,
the database joins two tables, and then joins
the result to a third table, and then joins this
result to a fourth table, and so on.

9.1.3 How the Optimizer Chooses Execution Plans for Joins


When determining the join order and method, the optimizer goal is to reduce the
number of rows early so it performs less work throughout the execution of the SQL
statement.
The optimizer generates a set of execution plans, according to possible join orders,
join methods, and available access paths. The optimizer then estimates the cost of
each plan and chooses the one with the lowest cost. When choosing an execution
plan, the optimizer considers the following factors:
• The optimizer first determines whether joining two or more tables results in a row
source containing at most one row.
The optimizer recognizes such situations based on UNIQUE and PRIMARY KEY
constraints on the tables. If such a situation exists, then the optimizer places these
tables first in the join order. The optimizer then optimizes the join of the remaining
set of tables.
• For join statements with outer join conditions, the table with the outer join operator
typically comes after the other table in the condition in the join order.
In general, the optimizer does not consider join orders that violate this guideline,
although the optimizer overrides this ordering condition in certain circumstances.
Similarly, when a subquery has been converted into an antijoin or semijoin, the
tables from the subquery must come after those tables in the outer query block to
which they were connected or correlated. However, hash antijoins and semijoins
are able to override this ordering condition in certain circumstances.
The optimizer estimates the cost of a query plan by computing the estimated I/Os and
CPU. These I/Os have specific costs associated with them: one cost for a single block
I/O, and another cost for multiblock I/Os. Also, different functions and expressions
have CPU costs associated with them. The optimizer determines the total cost of a
query plan using these metrics. These metrics may be influenced by many initialization
parameter and session settings at compile time, such as the
DB_FILE_MULTI_BLOCK_READ_COUNT setting, system statistics, and so on.

For example, the optimizer estimates costs in the following ways:


• The cost of a nested loops join depends on the cost of reading each selected row
of the outer table and each of its matching rows of the inner table into memory.
The optimizer estimates these costs using statistics in the data dictionary.
• The cost of a sort merge join depends largely on the cost of reading all the
sources into memory and sorting them.


• The cost of a hash join largely depends on the cost of building a hash table on one of the
input sides to the join and using the rows from the other side of the join to probe it.
Example 9-1 Estimating Costs for Join Order and Method
Conceptually, the optimizer constructs a matrix of join orders and methods and the cost
associated with each. For example, the optimizer must determine how best to join the
date_dim and lineorder tables in a query. The following table shows the possible variations
of methods and orders, and the cost for each. In this example, a nested loops join in the
order date_dim, lineorder has the lowest cost.

Table 9-2 Sample Costs for Join of date_dim and lineorder Tables

Join Method Cost of date_dim, lineorder Cost of lineorder, date_dim


Nested Loops 39,480 6,187,540
Hash Join 187,528 194,909
Sort Merge 217,129 217,129
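The choice from Table 9-2 can be sketched as follows. This Python sketch is illustrative only: the costs are copied from the table, and the exhaustive minimum is a simplification of the optimizer's search:

```python
# Conceptual sketch of choosing the cheapest (join method, join order)
# combination from the cost matrix in Table 9-2.

costs = {
    ("nested loops", ("date_dim", "lineorder")):    39480,
    ("nested loops", ("lineorder", "date_dim")):  6187540,
    ("hash join",    ("date_dim", "lineorder")):   187528,
    ("hash join",    ("lineorder", "date_dim")):   194909,
    ("sort merge",   ("date_dim", "lineorder")):   217129,
    ("sort merge",   ("lineorder", "date_dim")):   217129,
}

# The optimizer keeps the plan with the lowest estimated cost.
best = min(costs, key=costs.get)
print(best, costs[best])   # nested loops in the order date_dim, lineorder
```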

See Also:

• "Introduction to Optimizer Statistics"


• "Influencing the Optimizer " for more information about optimizer hints
• Oracle Database Reference to learn about DB_FILE_MULTIBLOCK_READ_COUNT

9.2 Join Methods


A join method is the mechanism for joining two row sources.
Depending on the statistics, the optimizer chooses the method with the lowest estimated
cost. As shown in Figure 9-5, each join method has two children: the driving (also called
outer) row source and the driven-to (also called inner) row source.

Figure 9-5    Join Method

                   Join Method
      (Nested Loops, Hash Join, or Sort Merge)
           /                        \
  Driving Row Source          Driven-To Row Source
  (Outer Row Source)          (Inner Row Source)



9.2.1 Nested Loops Joins


Nested loops join an outer data set to an inner data set.
For each row in the outer data set that matches the single-table predicates, the
database retrieves all rows in the inner data set that satisfy the join predicate. If an
index is available, then the database can use it to access the inner data set by rowid.

9.2.1.1 When the Optimizer Considers Nested Loops Joins


Nested loops joins are useful when the database joins small subsets of data, the
database joins large sets of data with the optimizer mode set to FIRST_ROWS, or the join
condition is an efficient method of accessing the inner table.

Note:
The number of rows expected from the join is what drives the optimizer
decision, not the size of the underlying tables. For example, a query might
join two tables of a billion rows each, but because of the filters the optimizer
expects data sets of 5 rows each.

In general, nested loops joins work best on small tables with indexes on the join
conditions. If a row source has only one row, as with an equality lookup on a primary
key value (for example, WHERE employee_id=101), then the join is a simple lookup. The
optimizer always tries to put the smallest row source first, making it the driving table.
Various factors enter into the optimizer decision to use nested loops. For example, the
database may read several rows from the outer row source in a batch. Based on the
number of rows retrieved, the optimizer may choose either a nested loop or a hash join
to the inner row source. For example, if a query joins departments to driving table
employees, and if the predicate specifies a value in employees.last_name, then the
database might read enough entries in the index on last_name to determine whether
an internal threshold is passed. If the threshold is not passed, then the optimizer picks
a nested loop join to departments, and if the threshold is passed, then the database
performs a hash join, which means reading the rest of employees, hashing it into
memory, and then joining to departments.

If the access path for the inner loop is not dependent on the outer loop, then the result
can be a Cartesian product: for every iteration of the outer loop, the inner loop
produces the same set of rows. To avoid this problem, use other join methods to join
two independent row sources.

See Also:

• "Table 19-2"
• "Adaptive Query Plans"


9.2.1.2 How Nested Loops Joins Work


Conceptually, nested loops are equivalent to two nested for loops.

For example, if a query joins employees and departments, then a nested loop in pseudocode
might be:

FOR erow IN (select * from employees where X=Y) LOOP


FOR drow IN (select * from departments where erow is matched) LOOP
output values from erow and drow
END LOOP
END LOOP

The inner loop is executed for every row of the outer loop. The employees table is the "outer"
data set because it is in the exterior for loop. The outer table is sometimes called a driving
table. The departments table is the "inner" data set because it is in the interior for loop.

A nested loops join involves the following basic steps:


1. The optimizer determines the driving row source and designates it as the outer loop.
The outer loop produces a set of rows for driving the join condition. The row source can
be a table accessed using an index scan, a full table scan, or any other operation that
generates rows.
The number of iterations of the inner loop depends on the number of rows retrieved in the
outer loop. For example, if 10 rows are retrieved from the outer table, then the database
must perform 10 lookups in the inner table. If 10,000,000 rows are retrieved from the
outer table, then the database must perform 10,000,000 lookups in the inner table.
2. The optimizer designates the other row source as the inner loop.
The outer loop appears before the inner loop in the execution plan, as follows:

NESTED LOOPS
outer_loop
inner_loop

3. For every fetch request from the client, the basic process is as follows:
a. Fetch a row from the outer row source
b. Probe the inner row source to find rows that match the predicate criteria
c. Repeat the preceding steps until all rows are obtained by the fetch request
Sometimes the database sorts rowids to obtain a more efficient buffer access pattern.
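The pseudocode and steps above can be made runnable as follows. This Python sketch is illustrative only: the row data is invented, and the dictionary lookup stands in for the indexed probe of the inner row source:

```python
# Runnable sketch of a nested loops join. For each row from the outer
# (driving) row source, probe the inner row source by the join key.

employees = [                       # outer row source: (last_name, dept_id)
    ("Abel", 80), ("Ande", 80), ("Atkinson", 50), ("Austin", 60),
]
departments = {80: "Sales", 50: "Shipping", 60: "IT"}   # inner, keyed by id

def nested_loops_join(outer, inner):
    result = []
    for last_name, dept_id in outer:        # one iteration per outer row
        if dept_id in inner:                # probe inner side by join key
            result.append((last_name, inner[dept_id]))
    return result

print(nested_loops_join(employees, departments))
```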

9.2.1.3 Nested Nested Loops


The outer loop of a nested loop can itself be a row source generated by a different nested
loop.


The database can nest two or more outer loops to join as many tables as needed.
Each loop is a data access method. The following template shows how the database
iterates through three nested loops:

SELECT STATEMENT
NESTED LOOPS 3
NESTED LOOPS 2 - Row source becomes OUTER LOOP 3.1
NESTED LOOPS 1 - Row source becomes OUTER LOOP 2.1
OUTER LOOP 1.1
INNER LOOP 1.2
INNER LOOP 2.2
INNER LOOP 3.2

The database orders the loops as follows:


1. The database iterates through NESTED LOOPS 1:

NESTED LOOPS 1
OUTER LOOP 1.1
INNER LOOP 1.2

The output of NESTED LOOP 1 is a row source.


2. The database iterates through NESTED LOOPS 2, using the row source generated
by NESTED LOOPS 1 as its outer loop:

NESTED LOOPS 2
OUTER LOOP 2.1 - Row source generated by NESTED LOOPS 1
INNER LOOP 2.2

The output of NESTED LOOPS 2 is another row source.


3. The database iterates through NESTED LOOPS 3, using the row source generated
by NESTED LOOPS 2 as its outer loop:

NESTED LOOPS 3
OUTER LOOP 3.1 - Row source generated by NESTED LOOPS 2
INNER LOOP 3.2

Example 9-2 Nested Nested Loops Join


Suppose you join the employees and departments tables as follows:

SELECT /*+ ORDERED USE_NL(d) */ e.last_name, e.first_name,


d.department_name
FROM employees e, departments d
WHERE e.department_id=d.department_id
AND e.last_name like 'A%';


The plan reveals that the optimizer chose two nested loops (Step 1 and Step 2) to access the
data:

SQL_ID ahuavfcv4tnz4, child number 0
-------------------------------------
SELECT /*+ ORDERED USE_NL(d) */ e.last_name, d.department_name FROM
employees e, departments d WHERE e.department_id=d.department_id AND
e.last_name like 'A%'

Plan hash value: 1667998133

----------------------------------------------------------------------------------
|Id| Operation |Name |Rows|Bytes|Cost(%CPU)|Time|
----------------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | |5 (100)| |
| 1| NESTED LOOPS | | | | | |
| 2| NESTED LOOPS | | 3|102|5 (0)|00:00:01|
| 3| TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES | 3| 54|2 (0)|00:00:01|
|*4| INDEX RANGE SCAN | EMP_NAME_IX | 3| |1 (0)|00:00:01|
|*5| INDEX UNIQUE SCAN | DEPT_ID_PK | 1| |0 (0)| |
| 6| TABLE ACCESS BY INDEX ROWID | DEPARTMENTS | 1| 16|1 (0)|00:00:01|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

4 - access("E"."LAST_NAME" LIKE 'A%')


filter("E"."LAST_NAME" LIKE 'A%')
5 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

In this example, the basic process is as follows:


1. The database begins iterating through the inner nested loop (Step 2) as follows:
a. The database searches emp_name_ix for the rowids of all last names that begin
with A (Step 4).
For example:

Abel,employees_rowid
Ande,employees_rowid
Atkinson,employees_rowid
Austin,employees_rowid

b. Using the rowids from the previous step, the database retrieves a batch of rows from
the employees table (Step 3). For example:

Abel,Ellen,80
Abel,John,50

These rows become the outer row source for the innermost nested loop.
The batch step is typically part of adaptive execution plans. To determine whether a
nested loops join is better than a hash join, the optimizer needs to determine how
many rows come back from the row source. If too many rows are returned, then the
optimizer switches to a different join method.
c. For each row in the outer row source, the database scans the dept_id_pk
index to obtain the rowid in departments of the matching department ID (Step
5), and joins it to the employees rows. For example:

Abel,Ellen,80,departments_rowid
Ande,Sundar,80,departments_rowid
Atkinson,Mozhe,50,departments_rowid
Austin,David,60,departments_rowid

These rows become the outer row source for the outer nested loop (Step 1).
2. The database iterates through the outer nested loop as follows:
a. The database reads the first row in outer row source.
For example:

Abel,Ellen,80,departments_rowid

b. The database uses the departments rowid to retrieve the corresponding row
from departments (Step 6), and then joins the result to obtain the requested
values (Step 1).
For example:

Abel,Ellen,80,Sales

c. The database reads the next row in the outer row source, uses the
departments rowid to retrieve the corresponding row from departments (Step
6), and iterates through the loop until all rows are retrieved.
The result set has the following form:

Abel,Ellen,80,Sales
Ande,Sundar,80,Sales
Atkinson,Mozhe,50,Shipping
Austin,David,60,IT

9.2.1.4 Current Implementation for Nested Loops Joins


Oracle Database 11g introduced a new implementation for nested loops that reduces
overall latency for physical I/O.
When an index or a table block is not in the buffer cache and is needed to process the
join, a physical I/O is required. The database can batch multiple physical I/O requests
and process them using a vector I/O (array) instead of one at a time. The database
sends an array of rowids to the operating system, which performs the read.
As part of the new implementation, two NESTED LOOPS join row sources might appear in
the execution plan where only one would have appeared in prior releases. In such
cases, Oracle Database allocates one NESTED LOOPS join row source to join the values
from the table on the outer side of the join with the index on the inner side. A second
row source is allocated to join the result of the first join, which includes the rowids
stored in the index, with the table on the inner side of the join.
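The batching described above can be sketched as follows. This Python sketch is illustrative only: the rowids and table are invented, and the request count merely contrasts one-at-a-time reads with a single vector read:

```python
# Conceptual sketch of batched nested loops I/O: instead of issuing one
# physical read per rowid, gather the rowids produced by the first join
# and submit them as a single vector (array) read.

def single_block_reads(rowids, table):
    # One I/O request per rowid (pre-11g style behavior).
    return [table[r] for r in rowids], len(rowids)

def vector_read(rowids, table):
    # One batched I/O request for the whole array of rowids; sorting the
    # rowids gives a more efficient access pattern.
    return [table[r] for r in sorted(rowids)], 1

table = {"r1": "Hartstein", "r2": "Fay", "r3": "Russell"}   # made-up data
rows, requests = vector_read(["r3", "r1", "r2"], table)
print(rows, requests)
```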


Consider the query in "Original Implementation for Nested Loops Joins". In the current
implementation, the execution plan for this query might be as follows:

-------------------------------------------------------------------------------------
| Id | Operation | Name |Rows|Bytes|Cost%CPU| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 722 | 3 (0)|00:00:01|
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 19 | 722 | 3 (0)|00:00:01|
|* 3 | TABLE ACCESS FULL | DEPARTMENTS | 2 | 32 | 2 (0)|00:00:01|
|* 4 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 10 | | 0 (0)|00:00:01|
| 5 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 10 | 220 | 1 (0)|00:00:01|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
3 - filter("D"."DEPARTMENT_NAME"='Marketing' OR "D"."DEPARTMENT_NAME"='Sales')
4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

In this case, rows from the hr.departments table form the outer row source (Step 3) of the
inner nested loop (Step 2). The index emp_department_ix is the inner row source (Step 4) of
the inner nested loop. The results of the inner nested loop form the outer row source (Step 2)
of the outer nested loop (Step 1). The hr.employees table is the inner row source (Step 5) of
the outer nested loop.
For each fetch request, the basic process is as follows:
1. The database iterates through the inner nested loop (Step 2) to obtain the rows
requested in the fetch:
a. The database reads the first row of departments to obtain the department IDs for
departments named Marketing or Sales (Step 3). For example:

Marketing,20

This row set is the outer loop. The database caches the data in the PGA.
b. The database scans emp_department_ix, which is an index on the employees table,
to find employees rowids that correspond to this department ID (Step 4), and then
joins the result (Step 2).
The result set has the following form:

Marketing,20,employees_rowid
Marketing,20,employees_rowid
Marketing,20,employees_rowid

c. The database reads the next row of departments, scans emp_department_ix to find
employees rowids that correspond to this department ID, and then iterates through
the loop until the client request is satisfied.

9-11
Chapter 9
Join Methods

In this example, the database only iterates through the outer loop twice
because only two rows from departments satisfy the predicate filter.
Conceptually, the result set has the following form:

Marketing,20,employees_rowid
Marketing,20,employees_rowid
Marketing,20,employees_rowid
.
.
.
Sales,80,employees_rowid
Sales,80,employees_rowid
Sales,80,employees_rowid
.
.
.

These rows become the outer row source for the outer nested loop (Step 1).
This row set is cached in the PGA.
2. The database organizes the rowids obtained in the previous step so that it can
more efficiently access them in the cache.
3. The database begins iterating through the outer nested loop as follows:
a. The database retrieves the first row from the row set obtained in the previous
step, as in the following example:

Marketing,20,employees_rowid

b. Using the rowid, the database retrieves a row from employees to obtain the
requested values (Step 1), as in the following example:

Michael,Hartstein,13000,Marketing

c. The database retrieves the next row from the row set, uses the rowid to probe
employees for the matching row, and iterates through the loop until all rows are
retrieved.
The result set has the following form:

Michael,Hartstein,13000,Marketing
Pat,Fay,6000,Marketing
John,Russell,14000,Sales
Karen,Partners,13500,Sales
Alberto,Errazuriz,12000,Sales
.
.
.
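The two-phase process above can be sketched in Python. This is a minimal illustration, not Oracle's implementation: in-memory lists and dicts stand in for the tables, the EMP_DEPARTMENT_IX index, and rowids, and all data values are hypothetical stand-ins for the HR sample schema.

```python
departments = [(10, "Administration"), (20, "Marketing"), (80, "Sales")]
# Index on employees.department_id: maps department_id -> employee rowids.
emp_department_ix = {20: ["AAA1", "AAA2"], 80: ["AAA9", "AAA3"]}
# Table rows addressable by rowid.
employees = {"AAA1": ("Michael", "Hartstein", 13000),
             "AAA2": ("Pat", "Fay", 6000),
             "AAA3": ("Alberto", "Errazuriz", 12000),
             "AAA9": ("John", "Russell", 14000)}

def nested_loops_batched(pred):
    # Inner nested loop (Steps 2-4): join departments to the index,
    # caching (name, dept_id, rowid) tuples, as in the PGA.
    intermediate = []
    for dept_id, name in departments:                   # outer row source (Step 3)
        if pred(name):
            for rid in emp_department_ix.get(dept_id, []):   # index probe (Step 4)
                intermediate.append((name, dept_id, rid))
    # Organize the cached rowids for more efficient access (step 2 of the text).
    intermediate.sort(key=lambda t: t[2])
    # Outer nested loop (Steps 1 and 5): fetch each employee row by rowid.
    result = []
    for name, dept_id, rid in intermediate:
        first, last, sal = employees[rid]
        result.append((first, last, sal, name))
    return result

rows = nested_loops_batched(lambda n: n in ("Marketing", "Sales"))
```

The key point of the sketch is the batching: all rowids are gathered and ordered before the table rows are fetched, rather than fetching each row as soon as its rowid is found.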

In some cases, a second join row source is not allocated, and the execution plan looks
the same as it did before Oracle Database 11g. The following list describes such
cases:


• All of the columns needed from the inner side of the join are present in the index, and
there is no table access required. In this case, Oracle Database allocates only one join
row source.
• The order of the rows returned might be different from the order returned in releases
earlier than Oracle Database 12c. Thus, when Oracle Database tries to preserve a
specific ordering of the rows, for example to eliminate the need for an ORDER BY sort,
Oracle Database might use the original implementation for nested loops joins.
• The OPTIMIZER_FEATURES_ENABLE initialization parameter is set to a release before
Oracle Database 11g. In this case, Oracle Database uses the original implementation for
nested loops joins.

9.2.1.5 Original Implementation for Nested Loops Joins


In the current release, both the new and original implementation of nested loops are possible.
For an example of the original implementation, consider the following join of the
hr.employees and hr.departments tables:

SELECT e.first_name, e.last_name, e.salary, d.department_name
FROM hr.employees e, hr.departments d
WHERE d.department_name IN ('Marketing', 'Sales')
AND e.department_id = d.department_id;

In releases before Oracle Database 11g, the execution plan for this query might appear as
follows:

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |Cost (%CPU)|Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 19 | 722 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 10 | 220 | 1 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 19 | 722 | 3 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | DEPARTMENTS | 2 | 32 | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 10 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
3 - filter("D"."DEPARTMENT_NAME"='Marketing' OR "D"."DEPARTMENT_NAME"='Sales')
4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

For each fetch request, the basic process is as follows:


1. The database iterates through the loop to obtain the rows requested in the fetch:
a. The database reads the first row of departments to obtain the department IDs for
departments named Marketing or Sales (Step 3). For example:
Marketing,20

This row set is the outer loop. The database caches the row in the PGA.


b. The database scans emp_department_ix, which is an index on the
employees.department_id column, to find employees rowids that correspond
to this department ID (Step 4), and then joins the result (Step 2).
Conceptually, the result set has the following form:
Conceptually, the result set has the following form:

Marketing,20,employees_rowid
Marketing,20,employees_rowid
Marketing,20,employees_rowid

c. The database reads the next row of departments, scans emp_department_ix
to find employees rowids that correspond to this department ID, and iterates
through the loop until the client request is satisfied.
In this example, the database only iterates through the outer loop twice
because only two rows from departments satisfy the predicate filter.
Conceptually, the result set has the following form:

Marketing,20,employees_rowid
Marketing,20,employees_rowid
Marketing,20,employees_rowid
.
.
.
Sales,80,employees_rowid
Sales,80,employees_rowid
Sales,80,employees_rowid
.
.
.

2. Depending on the circumstances, the database may organize the cached rowids
obtained in the previous step so that it can more efficiently access them.
3. For each employees rowid in the result set generated by the nested loop, the
database retrieves a row from employees to obtain the requested values (Step 1).
Thus, the basic process is to read a rowid and retrieve the matching employees
row, read the next rowid and retrieve the matching employees row, and so on.
Conceptually, the result set has the following form:

Michael,Hartstein,13000,Marketing
Pat,Fay,6000,Marketing
John,Russell,14000,Sales
Karen,Partners,13500,Sales
Alberto,Errazuriz,12000,Sales
.
.
.

9.2.1.6 Nested Loops Controls


You can add the USE_NL hint to instruct the optimizer to join each specified table to
another row source with a nested loops join, using the specified table as the inner
table.


The related USE_NL_WITH_INDEX(table index) hint instructs the optimizer to join the
specified table to another row source with a nested loops join, using the specified table as the
inner table. The index is optional. If no index is specified, then the nested loops join uses an
index with at least one join predicate as the index key.
Example 9-3 Nested Loops Hint
Assume that the optimizer chooses a hash join for the following query:

SELECT e.last_name, d.department_name
FROM employees e, departments d
WHERE e.department_id=d.department_id;

The plan looks as follows:

---------------------------------------------------------------------------
|Id | Operation | Name | Rows| Bytes |Cost(%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 5 (100)| |
|*1 | HASH JOIN | | 106 | 2862 | 5 (20)| 00:00:01 |
| 2 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 432 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMPLOYEES | 107 | 1177 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------

To force a nested loops join using departments as the inner table, add the USE_NL hint as in
the following query:

SELECT /*+ ORDERED USE_NL(d) */ e.last_name, d.department_name
FROM employees e, departments d
WHERE e.department_id=d.department_id;

The plan looks as follows:

---------------------------------------------------------------------------
| Id | Operation | Name | Rows |Bytes |Cost (%CPU)|Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 34 (100)| |
| 1 | NESTED LOOPS | | 106 | 2862 | 34 (3)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMPLOYEES | 107 | 1177 | 2 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| DEPARTMENTS | 1 | 16 | 0 (0)| |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

3 - filter("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

The database obtains the result set as follows:


1. In the nested loop, the database reads employees to obtain the last name and
department ID for an employee (Step 2). For example:

De Haan,90


2. For the row obtained in the previous step, the database scans departments to find
the department name that matches the employees department ID (Step 3), and
joins the result (Step 1). For example:

De Haan,Executive

3. The database retrieves the next row in employees, retrieves the matching row from
departments, and then repeats this process until all rows are retrieved.
The result set has the following form:

De Haan,Executive
Kochhar,Executive
Baer,Public Relations
King,Executive
.
.
.
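The row-at-a-time process above can be sketched as a classic nested loops join in Python. This is an illustrative sketch only: lists of tuples stand in for the employees and departments row sources, and the data values are hypothetical stand-ins for the HR sample schema.

```python
employees = [("De Haan", 90), ("Kochhar", 90), ("Baer", 70), ("King", 90)]
departments = [(70, "Public Relations"), (90, "Executive")]

def nested_loops(outer, inner):
    result = []
    for last_name, dept_id in outer:        # Step 2: read each employees row
        for d_id, d_name in inner:          # Step 3: full scan of departments
            if d_id == dept_id:             # join predicate on department_id
                result.append((last_name, d_name))
    return result

rows = nested_loops(employees, departments)
```

Note that the inner row source is scanned in full once per outer row, which is why the cost of a nested loops join with a full-scan inner table grows with the size of the outer row source.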

See Also:

• "Guidelines for Join Order Hints" to learn more about the USE_NL hint
• Oracle Database SQL Language Reference to learn about the USE_NL
hint

9.2.2 Hash Joins


The database uses a hash join to join larger data sets.
The optimizer uses the smaller of two data sets to build a hash table on the join key in
memory, using a deterministic hash function to specify the location in the hash table in
which to store each row. The database then scans the larger data set, probing the
hash table to find the rows that meet the join condition.
This section contains the following topics:

9.2.2.1 When the Optimizer Considers Hash Joins


In general, the optimizer considers a hash join when a relatively large amount of data
must be joined (or a large percentage of a small table must be joined), and the join is
an equijoin.
A hash join is most cost effective when the smaller data set fits in memory. In this
case, the cost is limited to a single read pass over the two data sets.
Because the hash table is in the PGA, Oracle Database can access rows without
latching them. This technique reduces logical I/O by avoiding the necessity of
repeatedly latching and reading blocks in the database buffer cache.
If the data sets do not fit in memory, then the database partitions the row sources, and
the join proceeds partition by partition. This can use a lot of sort area memory, and I/O


to the temporary tablespace. This method can still be the most cost effective, especially when
the database uses parallel query servers.

9.2.2.2 How Hash Joins Work


A hashing algorithm takes a set of inputs and applies a deterministic hash function to
generate a hash value between 1 and n, where n is the size of the hash table.
In a hash join, the input values are the join keys. The output values are indexes (slots) in an
array, which is the hash table.
This section contains the following topics:

9.2.2.2.1 Hash Tables


To illustrate a hash table, assume that the database hashes hr.departments in a join of
departments and employees. The join key column is department_id.

The first 5 rows of departments are as follows:

SQL> select * from departments where rownum < 6;

DEPARTMENT_ID DEPARTMENT_NAME                MANAGER_ID LOCATION_ID
------------- ------------------------------ ---------- -----------
           10 Administration                        200        1700
           20 Marketing                             201        1800
           30 Purchasing                            114        1700
           40 Human Resources                       203        2400
           50 Shipping                              121        1500

The database applies the hash function to each department_id in the table, generating a
hash value for each. For this illustration, the hash table has 5 slots (it could have more or
fewer). Because n is 5, the possible hash values range from 1 to 5. The hash function might
generate the following values for the department IDs:

f(10) = 4
f(20) = 1
f(30) = 4
f(40) = 2
f(50) = 5

Note that the hash function happens to generate the same hash value of 4 for departments
10 and 30. This is known as a hash collision. In this case, the database puts the records for
departments 10 and 30 in the same slot, using a linked list. Conceptually, the hash table looks
as follows:

1 20,Marketing,201,1800
2 40,Human Resources,203,2400
3
4 10,Administration,200,1700 -> 30,Purchasing,114,1700
5 50,Shipping,121,1500
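The slot layout above, including the collision chain for departments 10 and 30, can be reproduced with a short Python sketch. The hash_values dict is an illustrative stand-in for Oracle's internal hash function, hard-coded to produce the values shown in the example.

```python
rows = [(10, "Administration", 200, 1700),
        (20, "Marketing", 201, 1800),
        (30, "Purchasing", 114, 1700),
        (40, "Human Resources", 203, 2400),
        (50, "Shipping", 121, 1500)]

hash_values = {10: 4, 20: 1, 30: 4, 40: 2, 50: 5}   # f(department_id), as above

def f(join_key):
    return hash_values[join_key]

hash_table = {slot: [] for slot in range(1, 6)}     # n = 5 slots
for row in rows:
    hash_table[f(row[0])].append(row)               # collisions chain in a list
```

After the loop, slot 4 holds a two-entry chain (departments 10 and 30), slot 3 is empty, and the remaining slots each hold one row, matching the conceptual table above.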


9.2.2.2.2 Hash Join: Basic Steps


The optimizer uses the smaller data source to build a hash table on the join key in
memory, and then scans the larger table to find the joined rows.
The basic steps are as follows:
1. The database performs a full scan of the smaller data set, called the build table,
and then applies a hash function to the join key in each row to build a hash table in
the PGA.
In pseudocode, the algorithm might look as follows:

FOR small_table_row IN (SELECT * FROM small_table)
LOOP
  slot_number := HASH(small_table_row.join_key);
  INSERT_HASH_TABLE(slot_number,small_table_row);
END LOOP;

2. The database probes the second data set, called the probe table, using
whichever access mechanism has the lowest cost.
Typically, the database performs a full scan of both the smaller and larger data set.
The algorithm in pseudocode might look as follows:

FOR large_table_row IN (SELECT * FROM large_table)
LOOP
  slot_number := HASH(large_table_row.join_key);
  small_table_row = LOOKUP_HASH_TABLE(slot_number,large_table_row.join_key);
  IF small_table_row FOUND
  THEN
    output small_table_row + large_table_row;
  END IF;
END LOOP;

For each row retrieved from the larger data set, the database does the following:
a. Applies the same hash function to the join column or columns to calculate the
number of the relevant slot in the hash table.
For example, to probe the hash table for department ID 30, the database
applies the hash function to 30, which generates the hash value 4.
b. Probes the hash table to determine whether rows exist in the slot.
If no rows exist, then the database processes the next row in the larger data
set. If rows exist, then the database proceeds to the next step.
c. Checks the join column or columns for a match. If a match occurs, then the
database either reports the rows or passes them to the next step in the plan,
and then processes the next row in the larger data set.
If multiple rows exist in the hash table slot, the database walks through the
linked list of rows, checking each one. For example, if department 30 hashes
to slot 4, then the database checks each row until it finds 30.
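The build and probe pseudocode above can be sketched as one Python function. This is a simplified illustration: a dict of lists models the in-memory hash table, and the final key comparison stands in for the check of the join columns after a slot is located. The sample orders and order_items rows are hypothetical.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Build phase: hash the smaller data set on the join key.
    hash_table = defaultdict(list)
    for row in build_rows:
        hash_table[hash(build_key(row))].append(row)
    # Probe phase: scan the larger data set once, probing the table.
    result = []
    for row in probe_rows:
        for build_row in hash_table.get(hash(probe_key(row)), []):
            if build_key(build_row) == probe_key(row):   # check the join column
                result.append(build_row + row)           # walk the chain on collisions
    return result

orders = [(1001, "C1"), (1002, "C2")]                        # (order_id, customer_id)
items = [(1001, 10.0, 2), (1002, 5.0, 1), (1003, 9.9, 3)]    # (order_id, price, qty)
rows = hash_join(orders, items,
                 build_key=lambda o: o[0], probe_key=lambda l: l[0])
```

Each data set is read exactly once, which is why a hash join whose hash table fits in memory costs a single pass over both inputs.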


Example 9-4 Hash Joins


An application queries the oe.orders and oe.order_items tables, joining on the order_id
column.

SELECT o.customer_id, l.unit_price * l.quantity
FROM orders o, order_items l
WHERE l.order_id = o.order_id;

The execution plan is as follows:

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 665 | 13300 | 8 (25)|
|* 1 | HASH JOIN | | 665 | 13300 | 8 (25)|
| 2 | TABLE ACCESS FULL | ORDERS | 105 | 840 | 4 (25)|
| 3 | TABLE ACCESS FULL | ORDER_ITEMS | 665 | 7980 | 4 (25)|
--------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
1 - access("L"."ORDER_ID"="O"."ORDER_ID")

Because the orders table is small relative to the order_items table, which is 6 times larger,
the database hashes orders. In a hash join, the data set for the build table always appears
first in the list of operations (Step 2). In Step 3, the database performs a full scan of the larger
order_items table, probing the hash table for each row.

9.2.2.3 How Hash Joins Work When the Hash Table Does Not Fit in the PGA
The database must use a different technique when the hash table does not fit entirely in the
PGA. In this case, the database uses a temporary space to hold portions (called partitions) of
the hash table, and sometimes portions of the larger table that probes the hash table.
The basic process is as follows:
1. The database performs a full scan of the smaller data set, and then builds an array of
hash buckets in both the PGA and on disk.
When the PGA hash area fills up, the database finds the largest partition within the hash
table and writes it to temporary space on disk. The database stores any new row that
belongs to this on-disk partition on disk, and all other rows in the PGA. Thus, part of the
hash table is in memory and part of it on disk.
2. The database takes a first pass at reading the other data set.
For each row, the database does the following:
a. Applies the same hash function to the join column or columns to calculate the
number of the relevant hash bucket.
b. Probes the hash table to determine whether rows exist in the bucket in memory.
If the hashed value points to a row in memory, then the database completes the join
and returns the row. If the value points to a hash partition on disk, however, then the


database stores this row in the temporary tablespace, using the same
partitioning scheme used for the original data set.
3. The database reads each on-disk temporary partition one by one.
4. The database joins each row in that partition to the matching rows in the corresponding
on-disk temporary partition of the hash table.
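The partition-by-partition fallback can be sketched in Python. This is a conceptual illustration only, not Oracle's spill mechanism: both data sets are partitioned by the same hash of the join key, as if written to temporary space, and each build partition is then joined against only its matching probe partition. The partition count and sample rows are hypothetical.

```python
from collections import defaultdict

N_PARTITIONS = 4   # illustrative; a real system sizes this from available memory

def partition(rows, key):
    parts = defaultdict(list)
    for row in rows:
        parts[hash(key(row)) % N_PARTITIONS].append(row)   # same scheme for both sets
    return parts

def partitioned_hash_join(build_rows, probe_rows, bkey, pkey):
    build_parts = partition(build_rows, bkey)
    probe_parts = partition(probe_rows, pkey)
    result = []
    for p, brows in build_parts.items():        # one partition pair at a time
        table = defaultdict(list)
        for row in brows:                       # in-memory hash table for this
            table[bkey(row)].append(row)        # partition only
        for row in probe_parts.get(p, []):
            for brow in table.get(pkey(row), []):
                result.append(brow + row)
    return result

depts = [(10, "Administration"), (20, "Marketing")]
emps = [("Fay", 20), ("Whalen", 10), ("Grant", None)]
rows = partitioned_hash_join(depts, emps, bkey=lambda d: d[0], pkey=lambda e: e[1])
```

Because both inputs use the same partitioning scheme, a matching pair of rows always lands in the same partition, so each partition can be joined independently with a hash table that fits in memory.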

9.2.2.4 Hash Join Controls


The USE_HASH hint instructs the optimizer to use a hash join when joining two tables
together.

See Also:

• "Guidelines for Join Order Hints"


• Oracle Database SQL Language Reference to learn about USE_HASH

9.2.3 Sort Merge Joins


A sort merge join is a variation on a nested loops join.
If the two data sets in the join are not already sorted, then the database sorts them.
These are the SORT JOIN operations. For each row in the first data set, the database
probes the second data set for matching rows and joins them, basing its start position
on the match made in the previous iteration. This is the MERGE JOIN operation.

Figure 9-6 Sort Merge Join

MERGE JOIN

SORT JOIN SORT JOIN

First Row Second Row


Source Source

This section contains the following topics:


9.2.3.1 When the Optimizer Considers Sort Merge Joins


A hash join requires one hash table and one probe of this table, whereas a sort merge join
requires two sorts.
The optimizer may choose a sort merge join over a hash join for joining large amounts of data
when any of the following conditions is true:
• The join condition between two tables is not an equijoin, that is, uses an inequality
condition such as <, <=, >, or >=.
In contrast to sort merges, hash joins require an equality condition.
• Because of sorts required by other operations, the optimizer finds it cheaper to use a sort
merge.
If an index exists, then the database can avoid sorting the first data set. However, the
database always sorts the second data set, regardless of indexes.
A sort merge has the same advantage over a nested loops join as the hash join: the
database accesses rows in the PGA rather than the SGA, reducing logical I/O by avoiding the
necessity of repeatedly latching and reading blocks in the database buffer cache. In general,
hash joins perform better than sort merge joins because sorting is expensive. However, sort
merge joins offer the following advantages over a hash join:
• After the initial sort, the merge phase is optimized, resulting in faster generation of output
rows.
• A sort merge can be more cost-effective than a hash join when the hash table does not fit
completely in memory.
A hash join with insufficient memory requires both the hash table and the other data set
to be copied to disk. In this case, the database may have to read from disk multiple times.
In a sort merge, if memory cannot hold the two data sets, then the database writes them
both to disk, but reads each data set no more than once.

9.2.3.2 How Sort Merge Joins Work


As in a nested loops join, a sort merge join reads two data sets, but sorts them when they are
not already sorted.
For each row in the first data set, the database finds a starting row in the second data set,
and then reads the second data set until it finds a nonmatching row. In pseudocode, the high-
level algorithm for sort merge might look as follows:

READ data_set_1 SORT BY JOIN KEY TO temp_ds1
READ data_set_2 SORT BY JOIN KEY TO temp_ds2
READ ds1_row FROM temp_ds1
READ ds2_row FROM temp_ds2
WHILE NOT eof ON temp_ds1,temp_ds2
LOOP
  IF    ( temp_ds1.key =  temp_ds2.key ) OUTPUT JOIN ds1_row,ds2_row
  ELSIF ( temp_ds1.key <= temp_ds2.key ) READ ds1_row FROM temp_ds1
  ELSIF ( temp_ds1.key >= temp_ds2.key ) READ ds2_row FROM temp_ds2
END LOOP


For example, the following table shows sorted values in two data sets: temp_ds1 and
temp_ds2.

Table 9-3 Sorted Data Sets

temp_ds1 temp_ds2
10 20
20 20
30 40
40 40
50 40
60 40
70 40
. 60
. 70
. 70

As shown in the following table, the database begins by reading 10 in temp_ds1, and
then reads the first value in temp_ds2. Because 20 in temp_ds2 is higher than 10 in
temp_ds1, the database stops reading temp_ds2.

Table 9-4 Start at 10 in temp_ds1

temp_ds1 temp_ds2 Action


10 [start here] 20 [start here] [stop here] 20 in temp_ds2 is higher than 10 in temp_ds1. Stop.
Start again with next row in temp_ds1.
20 20 N/A
30 40 N/A
40 40 N/A
50 40 N/A
60 40 N/A
70 40 N/A
. 60 N/A
. 70 N/A
. 70 N/A

The database proceeds to the next value in temp_ds1, which is 20. The database
proceeds through temp_ds2 as shown in the following table.

Table 9-5 Start at 20 in temp_ds1

temp_ds1 temp_ds2 Action


10 20 [start here] Match. Proceed to next value in temp_ds2.
20 [start here] 20 Match. Proceed to next value in temp_ds2.
30 40 [stop here] 40 in temp_ds2 is higher than 20 in temp_ds1. Stop.
Start again with next row in temp_ds1.
40 40 N/A
50 40 N/A


Table 9-5 (Cont.) Start at 20 in temp_ds1

temp_ds1 temp_ds2 Action


60 40 N/A
70 40 N/A
. 60 N/A
. 70 N/A
. 70 N/A

The database proceeds to the next row in temp_ds1, which is 30. The database starts at the
number of its last match, which was 20, and then proceeds through temp_ds2 looking for a
match, as shown in the following table.

Table 9-6 Start at 30 in temp_ds1

temp_ds1 temp_ds2 Action


10 20 N/A
20 20 [start at last match] 20 in temp_ds2 is lower than 30 in temp_ds1. Proceed
to next value in temp_ds2.
30 [start here] 40 [stop here] 40 in temp_ds2 is higher than 30 in temp_ds1. Stop.
Start again with next row in temp_ds1.
40 40 N/A
50 40 N/A
60 40 N/A
70 40 N/A
. 60 N/A
. 70 N/A
. 70 N/A

The database proceeds to the next row in temp_ds1, which is 40. As shown in the following
table, the database starts at the number of its last match in temp_ds2, which was 20, and then
proceeds through temp_ds2 looking for a match.

Table 9-7 Start at 40 in temp_ds1

temp_ds1 temp_ds2 Action


10 20 N/A
20 20 [start at last match] 20 in temp_ds2 is lower than 40 in temp_ds1. Proceed to
next value in temp_ds2.
30 40 Match. Proceed to next value in temp_ds2.
40 [start here] 40 Match. Proceed to next value in temp_ds2.
50 40 Match. Proceed to next value in temp_ds2.
60 40 Match. Proceed to next value in temp_ds2.
70 40 Match. Proceed to next value in temp_ds2.
. 60 [stop here] 60 in temp_ds2 is higher than 40 in temp_ds1. Stop. Start
again with next row in temp_ds1.
. 70 N/A
. 70 N/A


The database continues in this way until it has matched the final 70 in temp_ds2. This
scenario demonstrates that the database, as it reads through temp_ds1, does not need
to read every row in temp_ds2. This is an advantage over a nested loops join.
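The merge phase walked through above can be sketched in Python. This is an illustrative sketch of the idea, not Oracle's implementation: both inputs are sorted, and the scan of the second data set restarts at the position of the last match rather than at the beginning, so rows below the current key are never revisited.

```python
def sort_merge_join(ds1, ds2):
    temp_ds1, temp_ds2 = sorted(ds1), sorted(ds2)   # the two SORT JOIN steps
    result = []
    start = 0                      # where matching last began in temp_ds2
    for v1 in temp_ds1:
        i = start
        while i < len(temp_ds2) and temp_ds2[i] < v1:
            i += 1                 # values permanently lower than v1: skip
        start = i                  # remember the start of the matching region
        while i < len(temp_ds2) and temp_ds2[i] == v1:
            result.append((v1, temp_ds2[i]))   # match: keep reading
            i += 1                 # stop at the first higher value
    return result

# The values from Table 9-3, minus the unmatched keys, for brevity.
rows = sort_merge_join([10, 20, 30, 40], [20, 20, 40, 40, 40])
```

Tracing this with the sample values reproduces the tables above: for key 10 the scan stops immediately at 20, for key 20 it produces two matches, and for key 40 it resumes from the remembered position instead of rescanning from the top.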

Example 9-5 Sort Merge Join Using Index


The following query joins the employees and departments tables on the
department_id column, ordering the rows on department_id as follows:

SELECT e.employee_id, e.last_name, e.first_name, e.department_id,
       d.department_name
FROM employees e, departments d
WHERE e.department_id = d.department_id
ORDER BY department_id;

A query of DBMS_XPLAN.DISPLAY_CURSOR shows that the plan uses a sort merge join:

---------------------------------------------------------------------------
|Id| Operation                    | Name       |Rows|Bytes|Cost (%CPU)|Time    |
---------------------------------------------------------------------------
| 0| SELECT STATEMENT             |            |    |     | 5 (100)|         |
| 1|  MERGE JOIN                  |            |106 |4028 | 5  (20)| 00:00:01|
| 2|   TABLE ACCESS BY INDEX ROWID|DEPARTMENTS | 27 | 432 | 2   (0)| 00:00:01|
| 3|    INDEX FULL SCAN           |DEPT_ID_PK  | 27 |     | 1   (0)| 00:00:01|
|*4|   SORT JOIN                  |            |107 |2354 | 3  (34)| 00:00:01|
| 5|    TABLE ACCESS FULL         |EMPLOYEES   |107 |2354 | 2   (0)| 00:00:01|
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
filter("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

The two data sets are the departments table and the employees table. Because an
index orders the departments table by department_id, the database can read this
index and avoid a sort (Step 3). The database only needs to sort the employees table
(Step 4), which is the most CPU-intensive operation.


Example 9-6 Sort Merge Join Without an Index


You join the employees and departments tables on the department_id column, ordering the
rows on department_id as follows. In this example, you specify NO_INDEX and USE_MERGE to
force the optimizer to choose a sort merge:

SELECT /*+ USE_MERGE(d e) NO_INDEX(d) */ e.employee_id, e.last_name,
       e.first_name, e.department_id, d.department_name
FROM employees e, departments d
WHERE e.department_id = d.department_id
ORDER BY department_id;

A query of DBMS_XPLAN.DISPLAY_CURSOR shows that the plan uses a sort merge join:

---------------------------------------------------------------------------
| Id| Operation | Name | Rows| Bytes|Cost (%CPU)|Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 6 (100)| |
| 1 | MERGE JOIN | | 106 | 9540 | 6 (34)| 00:00:01|
| 2 | SORT JOIN | | 27 | 567 | 3 (34)| 00:00:01|
| 3 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 567 | 2 (0)| 00:00:01|
|*4 | SORT JOIN | | 107 | 7383 | 3 (34)| 00:00:01|
| 5 | TABLE ACCESS FULL| EMPLOYEES | 107 | 7383 | 2 (0)| 00:00:01|
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
filter("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

Because the departments.department_id index is ignored, the optimizer performs a sort,
which increases the combined cost of Step 2 and Step 3 by 67% (from 3 to 5).

9.2.3.3 Sort Merge Join Controls


The USE_MERGE hint instructs the optimizer to use a sort merge join.

In some situations, it may make sense to override the optimizer with the USE_MERGE hint. For
example, to avoid a sort operation the optimizer might access a large table through an index
and single-block reads, even though a full table scan followed by a sort merge join would be
faster. The hint forces the sort merge in such cases.

See Also:
Oracle Database SQL Language Reference to learn about the USE_MERGE hint


9.3 Join Types


A join type is determined by the type of join condition.
This section contains the following topics:

9.3.1 Inner Joins


An inner join (sometimes called a simple join) is a join that returns only rows that
satisfy the join condition. Inner joins are either equijoins or nonequijoins.
This section contains the following topics:

9.3.1.1 Equijoins
An equijoin is an inner join whose join condition contains an equality operator.
The following example is an equijoin because the join condition contains only an
equality operator:

SELECT e.employee_id, e.last_name, d.department_name
FROM employees e, departments d
WHERE e.department_id=d.department_id;

In the preceding query, the join condition is e.department_id=d.department_id. If a
row in the employees table has a department ID that matches the value in a row in the
departments table, then the database returns the joined result; otherwise, the
database does not return a result.

9.3.1.2 Nonequijoins
A nonequijoin is an inner join whose join condition contains an operator that is not an
equality operator.
The following query lists all employees whose hire date occurred when employee 176
(who is listed in job_history because he changed jobs in 2007) was working at the
company:

SELECT e.employee_id, e.first_name, e.last_name, e.hire_date
FROM employees e, job_history h
WHERE h.employee_id = 176
AND e.hire_date BETWEEN h.start_date AND h.end_date;

In the preceding example, the condition joining employees and job_history does not
contain an equality operator, so it is a nonequijoin. Nonequijoins are relatively rare.
Note that a hash join requires at least a partial equijoin. The following SQL script
contains an equality join condition (e1.empno = e2.empno) and a nonequality
condition:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT *
FROM scott.emp e1 JOIN scott.emp e2
ON ( e1.empno = e2.empno
AND  e1.hiredate BETWEEN e2.hiredate-1 AND e2.hiredate+1 )

The optimizer chooses a hash join for the preceding query, as shown in the following plan:

Execution Plan
----------------------------------------------------------
Plan hash value: 3638257876

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 174 | 5 (20)| 00:00:01 |
|* 1 | HASH JOIN | | 1 | 174 | 5 (20)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 1218 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| EMP | 14 | 1218 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

1 - access("E1"."EMPNO"="E2"."EMPNO")
filter("E1"."HIREDATE">=INTERNAL_FUNCTION("E2"."HIREDATE")-1 AND
"E1"."HIREDATE"<=INTERNAL_FUNCTION("E2"."HIREDATE")+1)
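The partial-equijoin plan above can be sketched in Python: the hash join matches rows on the equality key only (the access predicate), and the nonequality hire-date condition is applied as a residual filter on each matched pair. This is an illustrative sketch; hire dates are represented as simple day numbers, and the sample rows are hypothetical.

```python
from collections import defaultdict

def hash_join_with_filter(build, probe, key, residual):
    # Build phase: hash on the equality key only (the access predicate).
    table = defaultdict(list)
    for row in build:
        table[key(row)].append(row)
    # Probe phase: equality match first, then the residual (filter) predicate.
    out = []
    for row in probe:
        for brow in table.get(key(row), []):
            if residual(brow, row):     # the nonequality part of the join
                out.append((brow, row))
    return out

emp = [(7369, 100), (7499, 105), (7521, 120)]   # (empno, hiredate as day number)
rows = hash_join_with_filter(
    emp, emp,                                   # self-join, as in the script
    key=lambda e: e[0],
    residual=lambda e1, e2: e2[1] - 1 <= e1[1] <= e2[1] + 1)
```

The split mirrors the plan output: the equality condition appears as the access predicate, and the BETWEEN condition as the filter predicate on the hash join.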

9.3.1.3 Band Joins


A band join is a special type of nonequijoin in which key values in one data set must fall
within the specified range (“band”) of the second data set. The same table can serve as both
the first and second data sets.
Starting in Oracle Database 12c Release 2 (12.2), the database evaluates band joins more
efficiently. The optimization avoids the unnecessary scanning of rows that fall outside the
defined bands.
The optimizer uses a cost estimate to choose the join method (hash, nested loops, or sort
merge) and the parallel data distribution method. In most cases, optimized performance is
comparable to an equijoin.
The following example queries employees whose salaries are between $100 less and $100
more than the salary of each employee. Thus, the band has a width of $200. The example
assumes that it is permissible to compare the salary of every employee with itself. The
following query includes partial sample output:

SELECT e1.last_name ||
' has salary between 100 less and 100 more than ' ||
e2.last_name AS "SALARY COMPARISON"
FROM employees e1,
employees e2
WHERE e1.salary
BETWEEN e2.salary - 100
AND e2.salary + 100;


SALARY COMPARISON
-------------------------------------------------------------
King has salary between 100 less and 100 more than King
Kochhar has salary between 100 less and 100 more than Kochhar
Kochhar has salary between 100 less and 100 more than De Haan
De Haan has salary between 100 less and 100 more than Kochhar
De Haan has salary between 100 less and 100 more than De Haan
Russell has salary between 100 less and 100 more than Russell
Partners has salary between 100 less and 100 more than Partners
...
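The effect of the band-join optimization can be sketched in Python. This is an illustrative sketch of the idea, not Oracle's implementation: with both inputs sorted on salary, the scan of e2 for each e1 row starts from a remembered lower bound and stops at the band's upper edge, so rows that have fallen below the band are never revisited.

```python
BAND = 100   # half-width of the band, as in the example query

def band_join(s1, s2):
    e1, e2 = sorted(s1), sorted(s2)
    result = []
    lo = 0                                   # first e2 row that can still match
    for v in e1:
        while lo < len(e2) and e2[lo] < v - BAND:
            lo += 1                          # permanently below the band: skip
        i = lo
        while i < len(e2) and e2[i] <= v + BAND:
            result.append((v, e2[i]))        # inside the band [v-100, v+100]
            i += 1                           # stop at the band's upper edge
    return result

# Salaries from the sample output: King, Kochhar, De Haan, Russell, Partners.
sal = [24000, 17000, 17000, 14000, 13500]
rows = band_join(sal, sal)
```

With these five salaries, the sketch produces the seven pairs shown in the sample output: each salary matches itself, and the two $17,000 salaries also match each other in both directions.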

Example 9-7 Query Without Band Join Optimization


Without the band join optimization, the database uses the following query plan:

------------------------------------------
PLAN_TABLE_OUTPUT
------------------------------------------
------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | MERGE JOIN | |
| 2 | SORT JOIN | |
| 3 | TABLE ACCESS FULL | EMPLOYEES |
|* 4 | FILTER | |
|* 5 | SORT JOIN | |
| 6 | TABLE ACCESS FULL| EMPLOYEES |
------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
4 - filter("E1"."SAL"<="E2"."SAL"+100)
5 - access(INTERNAL_FUNCTION("E1"."SAL")>="E2"."SAL"-100)
filter(INTERNAL_FUNCTION("E1"."SAL")>="E2"."SAL"-100)

In this plan, Step 2 sorts the e1 row source, and Step 5 sorts the e2 row source. The
sorted row sources are illustrated in the following table.

Table 9-8 Sorted Row Sources

e1 Sorted (Step 2 of Plan)    e2 Sorted (Step 5 of Plan)
24000 (King)                  24000 (King)
17000 (Kochhar)               17000 (Kochhar)
17000 (De Haan)               17000 (De Haan)
14000 (Russell)               14000 (Russell)
13500 (Partners)              13500 (Partners)

The join begins by iterating through the sorted input (e1), which is the left branch of the
join, corresponding to Step 2 of the plan. The original query contains two predicates:
• e1.sal >= e2.sal-100, which is the Step 5 filter
• e1.sal <= e2.sal+100, which is the Step 4 filter


For each iteration of the sorted row source e1, the database iterates through row source e2,
checking every row against the Step 5 filter e1.sal >= e2.sal-100. If the row passes the Step 5
filter, then the database sends it to the Step 4 filter, and then proceeds to test the next row in
e2 against the Step 5 filter. However, if a row fails the Step 5 filter, then the scan of e2 stops,
and the database proceeds through the next iteration of e1.

The following table shows the first iteration of e1, which begins with 24000 (King) in data set
e1. The database determines that the first row in e2, which is 24000 (King), passes the Step
5 filter. The database then sends the row to the Step 4 filter, e1.sal <= e2.sal+100, which
also passes. The database sends this row to the MERGE row source. Next, the database
checks 17000 (Kochhar) against the Step 5 filter, which also passes. However, the row fails
the Step 4 filter, and is discarded. The database proceeds to test 17000 (De Haan) against
the Step 5 filter.

Table 9-9 First Iteration of e1: Separate SORT JOIN and FILTER

Scan e2            Step 5 Filter (e1.sal >= e2.sal-100)   Step 4 Filter (e1.sal <= e2.sal+100)
24000 (King)       Pass because 24000 >= 23900.           Pass because 24000 <= 24100.
                   Send to Step 4 filter.                 Return row for merging.
17000 (Kochhar)    Pass because 24000 >= 16900.           Fail because 24000 <= 17100 is false.
                   Send to Step 4 filter.                 Discard row. Scan next row in e2.
17000 (De Haan)    Pass because 24000 >= 16900.           Fail because 24000 <= 17100 is false.
                   Send to Step 4 filter.                 Discard row. Scan next row in e2.
14000 (Russell)    Pass because 24000 >= 13900.           Fail because 24000 <= 14100 is false.
                   Send to Step 4 filter.                 Discard row. Scan next row in e2.
13500 (Partners)   Pass because 24000 >= 13400.           Fail because 24000 <= 13600 is false.
                   Send to Step 4 filter.                 Discard row. Scan next row in e2.

As shown in the preceding table, every e2 row necessarily passes the Step 5 filter because
the e2 salaries are sorted in descending order. Thus, the Step 5 filter always sends the row to
the Step 4 filter. Because the e2 salaries are sorted in descending order, the Step 4 filter
necessarily fails every row starting with 17000 (Kochhar). The inefficiency occurs because
the database tests every subsequent row in e2 against the Step 5 filter, which necessarily
passes, and then against the Step 4 filter, which necessarily fails.
Example 9-8 Query With Band Join Optimization
Starting in Oracle Database 12c Release 2 (12.2), the database optimizes the band join by
using the following plan, which does not have a separate FILTER operation:

------------------------------------------
PLAN_TABLE_OUTPUT
------------------------------------------
| Id | Operation | Name |
------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | MERGE JOIN | |
| 2 | SORT JOIN | |
| 3 | TABLE ACCESS FULL | EMPLOYEES |
|* 4 | SORT JOIN | |
| 5 | TABLE ACCESS FULL | EMPLOYEES |


------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
4 - access(INTERNAL_FUNCTION("E1"."SALARY")>="E2"."SALARY"-100)
filter(("E1"."SALARY"<="E2"."SALARY"+100 AND
INTERNAL_FUNCTION("E1"."SALARY")>="E2"."SALARY"-100))

The difference is that Step 4 uses Boolean AND logic for the two predicates to create a
single filter. Instead of checking a row against one filter, and then sending it to a
different row source for checking against a second filter, the database performs one
check against one filter. If the check fails, then processing stops.
In this example, the query begins the first iteration of e1, which begins with 24000
(King). The following figure represents the range. e2 values below 23900 and above
24100 fall outside the range.

Figure 9-7 Band Join

The following table shows that the database tests the first row of e2, which is 24000
(King), against the Step 4 filter. The row passes the test, so the database sends the
row to be merged. The next row in e2 is 17000 (Kochhar). This row falls outside of the
range (band) and thus does not satisfy the filter predicate, so the database stops
testing e2 rows in this iteration. The database stops testing because the descending
sort of e2 ensures that all subsequent rows in e2 fail the filter test. Thus, the database
can proceed to the second iteration of e1.

Table 9-10 First Iteration of e1: Single SORT JOIN

Scan e2            Filter 4 (e1.sal >= e2.sal - 100) AND (e1.sal <= e2.sal + 100)
24000 (King)       Pass because it is true that (24000 >= 23900) AND (24000 <= 24100).
                   Send row to MERGE. Test next row.
17000 (Kochhar)    Fail because it is false that (24000 >= 16900) AND (24000 <= 17100).
                   Stop scanning e2. Begin next iteration of e1.
17000 (De Haan)    n/a
14000 (Russell)    n/a
13500 (Partners)   n/a

In this way, the band join optimization eliminates unnecessary processing. Instead of
scanning every row in e2 as in the unoptimized case, the database scans only the
minimum two rows.
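The early-termination logic can be sketched in runnable form. The following Python fragment is an illustration only, not Oracle's implementation: it applies the combined band filter to the descending-sorted salaries from Table 9-8 and stops scanning e2 as soon as a row falls below the band.

```python
# Illustrative sketch only, not Oracle internals: a band join over the
# descending-sorted salaries from Table 9-8, with early termination.

def band_join(e1, e2, band=100):
    """Return (a, b) pairs where b lies within [a - band, a + band].

    Both inputs are assumed sorted in descending order, so once a row of
    e2 falls below a - band, no later e2 row can qualify and the scan stops.
    """
    pairs = []
    for a in e1:
        for b in e2:
            if b > a + band:
                continue   # above the band; later (smaller) rows may qualify
            if b < a - band:
                break      # below the band; all later rows also fail
            pairs.append((a, b))
    return pairs

salaries = [24000, 17000, 17000, 14000, 13500]  # King, Kochhar, De Haan,
                                                # Russell, Partners
pairs = band_join(salaries, salaries)
print(len(pairs))   # number of salary pairs within $100 of each other
```

With this data, each salary pairs only with itself and with equal salaries, which matches the sample output for the band join query.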


9.3.2 Outer Joins


An outer join returns all rows that satisfy the join condition and also rows from one table for
which no rows from the other table satisfy the condition. Thus, the result set of an outer join is
the superset of an inner join.
In ANSI syntax, the OUTER JOIN clause specifies an outer join. In the FROM clause, the left
table appears to the left of the OUTER JOIN keywords, and the right table appears to the right
of these keywords. The left table is also called the outer table, and the right table is also
called the inner table. For example, in the following statement the employees table is the left
or outer table:

SELECT employee_id, last_name, first_name
FROM employees LEFT OUTER JOIN departments
ON (employees.department_id=departments.department_id);

Outer joins require the outer-joined table to be the driving table. In the preceding example,
employees is the driving table, and departments is the driven-to table.

This section contains the following topics:

9.3.2.1 Nested Loops Outer Joins


The database uses this operation to loop through an outer join between two tables. The outer
join returns the outer (preserved) table rows, even when no corresponding rows are in the
inner (optional) table.
In a standard nested loop, the optimizer chooses the order of tables—which is the driving
table and which the driven table—based on the cost. However, in a nested loop outer join, the
join condition determines the order of tables. The database uses the outer, row-preserved
table to drive to the inner table.
The optimizer uses nested loops joins to process an outer join in the following circumstances:
• It is possible to drive from the outer table to the inner table.
• Data volume is low enough to make the nested loop method efficient.
For an example of a nested loop outer join, you can add the USE_NL hint to Example 9-9 to
instruct the optimizer to use a nested loop. For example:

SELECT /*+ USE_NL(c o) */ cust_last_name,
       SUM(NVL2(o.customer_id,0,1)) "Count"
FROM customers c, orders o
WHERE c.credit_limit > 1000
AND c.customer_id = o.customer_id(+)
GROUP BY cust_last_name;

9.3.2.2 Hash Join Outer Joins


The optimizer uses hash joins for processing an outer join when either the data volume is
large enough to make a hash join efficient, or it is impossible to drive from the outer table to
the inner table.


The cost determines the order of tables. The outer table, including preserved rows,
may be used to build the hash table, or it may be used to probe the hash table.
Example 9-9 Hash Join Outer Joins
This example shows a typical hash join outer join query, and its execution plan. In this
example, all the customers with credit limits greater than 1000 are queried. An outer
join is needed so that the query captures customers who have no orders.
• The outer table is customers.
• The inner table is orders.
• The join preserves the customers rows, including those rows without a
corresponding row in orders.
You could use a NOT EXISTS subquery to return the rows. However, because you are
querying all the rows in the table, the hash join performs better (unless the NOT EXISTS
subquery is not unnested).

SELECT cust_last_name, SUM(NVL2(o.customer_id,0,1)) "Count"
FROM customers c, orders o
WHERE c.credit_limit > 1000
AND c.customer_id = o.customer_id(+)
GROUP BY cust_last_name;

---------------------------------------------------------------------------
| Id | Operation           | Name      |Rows |Bytes|Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |           |     |     |    7 (100)|          |
|  1 |  HASH GROUP BY      |           | 168 | 3192|    7  (29)| 00:00:01 |
|* 2 |   HASH JOIN OUTER   |           | 318 | 6042|    6  (17)| 00:00:01 |
|* 3 |    TABLE ACCESS FULL| CUSTOMERS | 260 | 3900|    3   (0)| 00:00:01 |
|* 4 |    TABLE ACCESS FULL| ORDERS    | 105 |  420|    2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

2 - access("C"."CUSTOMER_ID"="O"."CUSTOMER_ID")

3 - filter("C"."CREDIT_LIMIT">1000)
4 - filter("O"."CUSTOMER_ID">0)


The query looks for customers that satisfy various conditions. An outer join returns NULL for
the inner table columns along with the outer (preserved) table rows when it does not find any
corresponding rows in the inner table. This operation finds all the customers rows that do not
have any orders rows.

In this case, the outer join condition is the following:

customers.customer_id = orders.customer_id(+)

The (+) operator on the orders.customer_id column marks orders as the optional (inner)
table: when no orders row matches a customers row, the database returns the customers row
extended with nulls for the orders columns.

Example 9-10 Outer Join to a Multitable View
In this example, the outer join is to a multitable view. The optimizer cannot drive into the view
as in a normal join or push the predicates, so it builds the entire row set of the view.

SELECT c.cust_last_name, sum(revenue)
FROM customers c, v_orders o
WHERE c.credit_limit > 2000
AND o.customer_id(+) = c.customer_id
GROUP BY c.cust_last_name;

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 144 | 4608 | 16 (32)|
| 1 | HASH GROUP BY | | 144 | 4608 | 16 (32)|
|* 2 | HASH JOIN OUTER | | 663 | 21216 | 15 (27)|
|* 3 | TABLE ACCESS FULL | CUSTOMERS | 195 | 2925 | 6 (17)|
| 4 | VIEW | V_ORDERS | 665 | 11305 | |
| 5 | HASH GROUP BY | | 665 | 15960 | 9 (34)|
|* 6 | HASH JOIN | | 665 | 15960 | 8 (25)|
|* 7 | TABLE ACCESS FULL| ORDERS | 105 | 840 | 4 (25)|
| 8 | TABLE ACCESS FULL| ORDER_ITEMS | 665 | 10640 | 4 (25)|
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
2 - access("O"."CUSTOMER_ID"(+)="C"."CUSTOMER_ID")
3 - filter("C"."CREDIT_LIMIT">2000)
6 - access("O"."ORDER_ID"="L"."ORDER_ID")
7 - filter("O"."CUSTOMER_ID">0)

The view definition is as follows:

CREATE OR REPLACE view v_orders AS
SELECT l.product_id, SUM(l.quantity*unit_price) revenue,
       o.order_id, o.customer_id
FROM orders o, order_items l
WHERE o.order_id = l.order_id
GROUP BY l.product_id, o.order_id, o.customer_id;


9.3.2.3 Sort Merge Outer Joins


When an outer join cannot drive from the outer (preserved) table to the inner (optional)
table, the database cannot use a hash join or nested loops join. In this case, it uses a sort
merge outer join.
The optimizer uses sort merge for an outer join in the following cases:
• A nested loops join is inefficient. A nested loops join can be inefficient because of
data volumes.
• The optimizer finds it is cheaper to use a sort merge over a hash join because of
sorts required by other operations.

9.3.2.4 Full Outer Joins


A full outer join is a combination of the left and right outer joins.
In addition to the inner join, rows from both tables that have not been returned in the
result of the inner join are preserved and extended with nulls. In other words, full outer
joins join tables together, yet show rows with no corresponding rows in the joined
tables.
Example 9-11 Full Outer Join
The following query retrieves all departments and all employees in each department,
but also includes:
• Any employees without departments
• Any departments without employees

SELECT d.department_id, e.employee_id
FROM employees e FULL OUTER JOIN departments d
ON e.department_id = d.department_id
ORDER BY d.department_id;

The statement produces the following output:

DEPARTMENT_ID EMPLOYEE_ID
------------- -----------
10 200
20 201
20 202
30 114
30 115
30 116
...
          270
          280
                      178
                      207

125 rows selected.


Example 9-12 Execution Plan for a Full Outer Join


Starting with Oracle Database 11g, Oracle Database automatically uses a native execution
method based on a hash join for executing full outer joins whenever possible. When the
database uses the new method to execute a full outer join, the execution plan for the query
contains HASH JOIN FULL OUTER. The query in Example 9-11 uses the following execution
plan:

---------------------------------------------------------------------------
| Id| Operation | Name |Rows|Bytes |Cost (%CPU)|Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT     |            |122 | 4758 | 6 (34)|00:00:01|
| 1 | SORT ORDER BY        |            |122 | 4758 | 6 (34)|00:00:01|
| 2 | VIEW                 | VW_FOJ_0   |122 | 4758 | 5 (20)|00:00:01|
|*3 | HASH JOIN FULL OUTER |            |122 | 1342 | 5 (20)|00:00:01|
| 4 | INDEX FAST FULL SCAN | DEPT_ID_PK | 27 |  108 | 2  (0)|00:00:01|
| 5 | TABLE ACCESS FULL    | EMPLOYEES  |107 |  749 | 2  (0)|00:00:01|
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
3 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")

HASH JOIN FULL OUTER is included in the preceding plan (Step 3), indicating that the query
uses the hash full outer join execution method. Typically, when the full outer join condition
between two tables is an equijoin, the hash full outer join execution method is possible, and
Oracle Database uses it automatically.
To instruct the optimizer to consider using the hash full outer join execution method, apply the
NATIVE_FULL_OUTER_JOIN hint. To instruct the optimizer not to consider using the hash full
outer join execution method, apply the NO_NATIVE_FULL_OUTER_JOIN hint. The
NO_NATIVE_FULL_OUTER_JOIN hint instructs the optimizer to exclude the native execution
method when joining each specified table. Instead, the full outer join is executed as a union of
a left outer join and an antijoin.
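To make the hash-based mechanics concrete, here is a small Python sketch (an illustration with made-up rows, not Oracle's native implementation): build a hash table on one input, probe it with the other, and null-extend the unmatched rows from both sides.

```python
# Illustrative sketch, not Oracle's native code: a hash full outer join.
# Rows are (name, key) tuples; None plays the role of SQL NULL padding.

def hash_full_outer_join(left, right, key):
    table = {}                        # build phase: hash the right input
    for r in right:
        table.setdefault(key(r), []).append(r)
    matched = set()
    out = []
    for l in left:                    # probe phase
        k = key(l)
        if k in table:
            matched.add(k)
            for r in table[k]:
                out.append((l, r))
        else:
            out.append((l, None))     # preserved left row, null-extended
    for k, unmatched in table.items():
        if k not in matched:          # preserved right rows, null-extended
            for r in unmatched:
                out.append((None, r))
    return out

employees = [("Whalen", 10), ("Hartstein", 20)]
departments = [("Administration", 10), ("Shipping", 50)]
rows = hash_full_outer_join(employees, departments, key=lambda r: r[1])
```

Here the employee without a matching department and the department without a matching employee both survive in the result, null-extended on the missing side.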

9.3.2.5 Multiple Tables on the Left of an Outer Join


In Oracle Database 12c, multiple tables may exist on the left side of an outer-joined table.
This enhancement enables Oracle Database to merge a view that contains multiple tables
and appears on the left of the outer join. In releases before Oracle Database 12c, a query
such as the following was invalid, and would trigger an ORA-01417 error message:

SELECT t1.d, t3.c
FROM t1, t2, t3
WHERE t1.z = t2.z
AND t1.x = t3.x (+)
AND t2.y = t3.y (+);

Starting in Oracle Database 12c, the preceding query is valid.


9.3.3 Semijoins
A semijoin is a join between two data sets that returns a row from the first set when a
matching row exists in the subquery data set.
The database stops processing the second data set at the first match. Thus, the join does
not duplicate rows from the first data set when multiple rows in the second data set satisfy
the subquery criteria.

Note:
Semijoins and antijoins are considered join types even though the SQL
constructs that cause them are subqueries. They are internal algorithms that
the optimizer uses to flatten subquery constructs so that they can be
resolved in a join-like way.

This section contains the following topics:

9.3.3.1 When the Optimizer Considers Semijoins


A semijoin avoids returning a huge number of rows when a query only needs to
determine whether a match exists.
With large data sets, this optimization can result in significant time savings over a
nested loops join that must loop through every record returned by the inner query for
every row in the outer query. The optimizer can apply the semijoin optimization to
nested loops joins, hash joins, and sort merge joins.
The optimizer may choose a semijoin in the following circumstances:
• The statement uses either an IN or EXISTS clause.
• The statement contains a subquery in the IN or EXISTS clause.
• The IN or EXISTS clause is not contained inside an OR branch.

9.3.3.2 How Semijoins Work


The semijoin optimization is implemented differently depending on what type of join is
used.
The following pseudocode shows a semijoin for a nested loops join:

FOR ds1_row IN ds1 LOOP
  match := false;
  FOR ds2_row IN ds2_subquery LOOP
    IF (ds1_row matches ds2_row) THEN
      match := true;
      EXIT -- stop processing second data set when a match is found
    END IF
  END LOOP
  IF (match = true) THEN
    RETURN ds1_row
  END IF
END LOOP

In the preceding pseudocode, ds1 is the first data set, and ds2_subquery is the subquery data
set. The code obtains the first row from the first data set, and then loops through the
subquery data set looking for a match. The code exits the inner loop as soon as it finds a
match, and then begins processing the next row in the first data set.
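The pseudocode translates directly into runnable form. The following Python sketch (an illustration with made-up data sets and key functions) returns each first-set row at most once and abandons the inner scan at the first match:

```python
# Runnable rendering of the nested loops semijoin pseudocode.
# The data sets and key functions below are made up for illustration.

def nl_semijoin(ds1, ds2_subquery, key1, key2):
    out = []
    for row in ds1:
        for sub in ds2_subquery:
            if key1(row) == key2(sub):
                out.append(row)
                break   # stop processing second data set at first match
    return out

departments = [(10, "Administration"), (20, "Marketing"), (30, "Purchasing")]
employees = [(200, 10), (201, 20), (202, 20)]  # (employee_id, department_id)

result = nl_semijoin(departments, employees,
                     key1=lambda d: d[0],   # departments.department_id
                     key2=lambda e: e[1])   # employees.department_id
```

Note that department 20 appears once in the result even though two employees match it: the break prevents duplication, which is the defining property of a semijoin.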
Example 9-13 Semijoin Using WHERE EXISTS
The following query uses a WHERE EXISTS clause to list only the departments that contain
employees:

SELECT department_id, department_name
FROM departments
WHERE EXISTS (SELECT 1
              FROM employees
              WHERE employees.department_id = departments.department_id)

The execution plan reveals a NESTED LOOPS SEMI operation in Step 1:

---------------------------------------------------------------------------
| Id| Operation | Name |Rows|Bytes|Cost (%CPU)|Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | NESTED LOOPS SEMI | |11 | 209 | 2 (0)|00:00:01 |
| 2 | TABLE ACCESS FULL| DEPARTMENTS |27 | 432 | 2 (0)|00:00:01 |
|*3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX |44 | 132 | 0 (0)| |
---------------------------------------------------------------------------

For each row in departments, which forms the outer loop, the database obtains the
department ID, and then probes the employees.department_id index for matching entries.
Conceptually, the index looks as follows:

10,rowid
10,rowid
10,rowid
10,rowid
30,rowid
30,rowid
30,rowid
...

If the first entry in the departments table is department 30, then the database performs a
range scan of the index until it finds the first 30 entry, at which point it stops reading the index
and returns the matching row from departments. If the next row in the outer loop is
department 20, then the database scans the index for a 20 entry, and not finding any
matches, performs the next iteration of the outer loop. The database proceeds in this way
until all matching rows are returned.


Example 9-14 Semijoin Using IN


The following query uses an IN clause to list only the departments that contain
employees:

SELECT department_id, department_name
FROM departments
WHERE department_id IN
  (SELECT department_id
   FROM employees);

The execution plan reveals a NESTED LOOPS SEMI operation in Step 1:

---------------------------------------------------------------------------
| Id| Operation          | Name              |Rows|Bytes|Cost (%CPU)|Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT   |                   |    |     | 2 (100)|        |
| 1 |  NESTED LOOPS SEMI |                   | 11 | 209 | 2   (0)|00:00:01|
| 2 |   TABLE ACCESS FULL| DEPARTMENTS       | 27 | 432 | 2   (0)|00:00:01|
|*3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 44 | 132 | 0   (0)|        |
---------------------------------------------------------------------------

The plan is identical to the plan in Example 9-13.

9.3.4 Antijoins
An antijoin is a join between two data sets that returns a row from the first set when a
matching row does not exist in the subquery data set.
Like a semijoin, an antijoin stops processing the subquery data set when the first
match is found. Unlike a semijoin, the antijoin only returns a row when no match is
found.
This section contains the following topics:

9.3.4.1 When the Optimizer Considers Antijoins


An antijoin avoids unnecessary processing when a query only needs to return a row
when a match does not exist.
With large data sets, this optimization can result in significant time savings over a
nested loops join. The latter join must loop through every record returned by the inner
query for every row in the outer query. The optimizer can apply the antijoin
optimization to nested loops joins, hash joins, and sort merge joins.
The optimizer may choose an antijoin in the following circumstances:


• The statement uses either the NOT IN or NOT EXISTS clause.


• The statement has a subquery in the NOT IN or NOT EXISTS clause.
• The NOT IN or NOT EXISTS clause is not contained inside an OR branch.
• The statement performs an outer join and applies an IS NULL condition to a join column,
as in the following example:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT emp.*
FROM emp, dept
WHERE emp.deptno = dept.deptno(+)
AND dept.deptno IS NULL

Execution Plan
----------------------------------------------------------
Plan hash value: 1543991079

------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |Cost (%CPU)|Time |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 1400 | 5 (20)| 00:00:01 |
|* 1 | HASH JOIN ANTI | | 14 | 1400 | 5 (20)| 00:00:01 |
| 2 | TABLE ACCESS FULL| EMP | 14 | 1218 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| DEPT | 4 | 52 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------

1 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

9.3.4.2 How Antijoins Work


The antijoin optimization is implemented differently depending on what type of join is used.
The following pseudocode shows an antijoin for a nested loops join:

FOR ds1_row IN ds1 LOOP
  match := true;
  FOR ds2_row IN ds2 LOOP
    IF (ds1_row matches ds2_row) THEN
      match := false;
      EXIT -- stop processing second data set when a match is found
    END IF
  END LOOP
  IF (match = true) THEN
    RETURN ds1_row
  END IF
END LOOP


In the preceding pseudocode, ds1 is the first data set, and ds2 is the second data set.
The code obtains the first row from the first data set, and then loops through the
second data set looking for a match. The code exits the inner loop as soon as it finds a
match, and begins processing the next row in the first data set.
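As with the semijoin, the pseudocode can be rendered as runnable Python. This sketch (an illustration with made-up data sets and key functions) returns a first-set row only when the inner scan finds no match:

```python
# Runnable rendering of the nested loops antijoin pseudocode.
# The data sets and key functions below are made up for illustration.

def nl_antijoin(ds1, ds2, key1, key2):
    out = []
    for row in ds1:
        match = False
        for sub in ds2:
            if key1(row) == key2(sub):
                match = True
                break   # stop processing second data set at first match
        if not match:
            out.append(row)   # return the row only when no match exists
    return out

departments = [(10, "Administration"), (20, "Marketing"), (30, "Purchasing")]
employees = [(200, 10), (202, 30)]   # (employee_id, department_id)

result = nl_antijoin(departments, employees,
                     key1=lambda d: d[0],   # departments.department_id
                     key2=lambda e: e[1])   # employees.department_id
```

Only the department with no matching employee survives, which mirrors the NOT EXISTS queries shown later in this section.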

9.3.4.3 How Antijoins Handle Nulls


For semijoins, IN and EXISTS are functionally equivalent. However, NOT IN and NOT EXISTS
are not functionally equivalent because of nulls.
If a null value is returned to a NOT IN operator, then the statement returns no records. To see
why, consider the following WHERE clause:

WHERE department_id NOT IN (null, 10, 20)

The database tests the preceding expression as follows:

WHERE (department_id != null)


AND (department_id != 10)
AND (department_id != 20)

For the entire expression to be true, each individual condition must be true. However, a null
value cannot be compared to another value, so the department_id != null condition cannot
be true, and thus the whole expression is always false. The following techniques enable a
statement to return records even when nulls are returned to the NOT IN operator:

• Apply an NVL function to the columns returned by the subquery.


• Add an IS NOT NULL predicate to the subquery.
• Implement NOT NULL constraints.
In contrast to NOT IN, the NOT EXISTS clause only considers predicates that return the
existence of a match, and ignores any row that does not match or could not be determined
because of nulls. If at least one row in the subquery matches the row from the outer query,
then NOT EXISTS returns false. If no rows match, then NOT EXISTS returns true. The
presence of nulls in the subquery does not affect the search for matching records.
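The difference can be simulated outside the database. In the following Python sketch (an illustration of SQL three-valued logic, with None standing for NULL), NOT IN yields no rows as soon as the subquery returns a null, while NOT EXISTS is unaffected:

```python
# Simulation of SQL three-valued logic; None stands for NULL.

def sql_not_in(value, subquery_values):
    # value NOT IN (...) expands to a conjunction of inequalities; any
    # comparison with NULL is unknown, so the predicate can never be true.
    for v in subquery_values:
        if v is None or value == v:
            return False   # false or unknown: the row is filtered out
    return True

def sql_not_exists(value, subquery_values):
    # NOT EXISTS only asks whether a definite match exists; NULL rows in
    # the subquery simply fail to match.
    return not any(v == value for v in subquery_values)

print(sql_not_in(40, [10, 20]))            # True: 40 matches nothing
print(sql_not_in(40, [None, 10, 20]))      # False: the NULL blocks every row
print(sql_not_exists(40, [None, 10, 20]))  # True: no definite match exists
```

This is why the NVL, IS NOT NULL, and NOT NULL constraint techniques above restore the intuitive NOT IN behavior: they remove the possibility of a null reaching the comparison.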
In releases earlier than Oracle Database 11g, the optimizer could not use an antijoin
optimization when nulls could be returned by a subquery. However, starting in Oracle
Database 11g, the ANTI NA (and ANTI SNA) optimizations described in the following sections
enable the optimizer to use an antijoin even when nulls are possible.
Example 9-16 Antijoin Using NOT IN
Suppose that a user issues the following query with a NOT IN clause to list the departments
that contain no employees:

SELECT department_id, department_name
FROM departments
WHERE department_id NOT IN
  (SELECT department_id
   FROM employees);

The preceding query returns no rows even though several departments contain no
employees. This result, which was not intended by the user, occurs because the
employees.department_id column is nullable.


The execution plan reveals a NESTED LOOPS ANTI SNA operation in Step 2:

---------------------------------------------------------------------------
| Id| Operation               | Name              |Rows|Bytes|Cost(%CPU)|Time    |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT        |                   |    |     | 4 (100)|        |
|*1 |  FILTER                 |                   |    |     |        |        |
| 2 |   NESTED LOOPS ANTI SNA |                   | 17 | 323 | 4 (50)|00:00:01|
| 3 |    TABLE ACCESS FULL    | DEPARTMENTS       | 27 | 432 | 2  (0)|00:00:01|
|*4 |    INDEX RANGE SCAN     | EMP_DEPARTMENT_IX | 41 | 123 | 0  (0)|        |
|*5 |   TABLE ACCESS FULL     | EMPLOYEES         |  1 |   3 | 2  (0)|00:00:01|
---------------------------------------------------------------------------

Predicate Information (identified by operation id):


---------------------------------------------------
1 - filter( IS NULL)
4 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
5 - filter("DEPARTMENT_ID" IS NULL)

ANTI SNA stands for "single null-aware antijoin," and ANTI NA stands for "null-aware
antijoin." The null-aware operation enables the optimizer to use the antijoin
optimization even on a nullable column. In releases earlier than Oracle Database 11g,
the database could not perform antijoins on NOT IN queries when nulls were possible.

Suppose that the user rewrites the query by applying an IS NOT NULL condition to the
subquery:

SELECT department_id, department_name
FROM departments
WHERE department_id NOT IN
  (SELECT department_id
   FROM employees
   WHERE department_id IS NOT NULL);

The preceding query returns 16 rows, which is the expected result. Step 1 in the plan
shows a standard NESTED LOOPS ANTI join instead of an ANTI NA or ANTI SNA join
because the subquery cannot return nulls:

---------------------------------------------------------------------------
|Id | Operation          | Name              |Rows|Bytes |Cost (%CPU)|Time    |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT   |                   |    |      | 2 (100)|        |
| 1 | NESTED LOOPS ANTI  |                   | 17 |  323 | 2   (0)|00:00:01|
| 2 |  TABLE ACCESS FULL | DEPARTMENTS       | 27 |  432 | 2   (0)|00:00:01|
|*3 |  INDEX RANGE SCAN  | EMP_DEPARTMENT_IX | 41 |  123 | 0   (0)|        |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
filter("DEPARTMENT_ID" IS NOT NULL)

Example 9-17 Antijoin Using NOT EXISTS


Suppose that a user issues the following query with a NOT EXISTS clause to list the
departments that contain no employees:

SELECT department_id, department_name
FROM departments d
WHERE NOT EXISTS
  (SELECT null
   FROM employees e
   WHERE e.department_id = d.department_id)

The preceding query avoids the null problem for NOT IN clauses. Thus, even though
employees.department_id column is nullable, the statement returns the desired result.

Step 1 of the execution plan reveals a NESTED LOOPS ANTI operation, not the ANTI NA variant,
which was necessary for NOT IN when nulls were possible:

---------------------------------------------------------------------------
| Id| Operation | Name |Rows|Bytes| Cost (%CPU)|Time|
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | NESTED LOOPS ANTI | | 17 | 323 | 2 (0)|00:00:01|
| 2 | TABLE ACCESS FULL| DEPARTMENTS | 27 | 432 | 2 (0)|00:00:01|
|*3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 41 | 123 | 0 (0)| |
---------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")


9.3.5 Cartesian Joins


The database uses a Cartesian join when one or more of the tables does not have
any join conditions to any other tables in the statement.
The optimizer joins every row from one data source with every row from the other data
source, creating the Cartesian product of the two sets. Therefore, the total number of
rows resulting from the join is calculated using the following formula, where rs1 is the
number of rows in first row set and rs2 is the number of rows in the second row set:

rs1 X rs2 = total rows in result set
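The formula is easy to verify with a sketch in Python (illustrative only), using the row counts that appear later in Example 9-18 (27 and 107 rows):

```python
from itertools import product

# Two row sources with no join condition between them.
rs1 = list(range(27))    # for example, 27 departments rows
rs2 = list(range(107))   # for example, 107 rows from an employees index

# A Cartesian join pairs every rs1 row with every rs2 row.
result = list(product(rs1, rs2))

print(len(result))   # 2889: rs1 x rs2
```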

This section contains the following topics:

9.3.5.1 When the Optimizer Considers Cartesian Joins


The optimizer uses a Cartesian join for two row sources only in specific circumstances.
Typically, the situation is one of the following:
• No join condition exists.
In some cases, the optimizer could pick up a common filter condition between the
two tables as a possible join condition.

Note:
If a Cartesian join appears in a query plan, it could be caused by an
inadvertently omitted join condition. In general, if a query joins n tables,
then n-1 join conditions are required to avoid a Cartesian join.

• A Cartesian join is an efficient method.


For example, the optimizer may decide to generate a Cartesian product of two
very small tables that are both joined to the same large table.
• The ORDERED hint specifies a table before its join table is specified.

9.3.5.2 How Cartesian Joins Work


A Cartesian join uses nested FOR loops.

At a high level, the algorithm for a Cartesian join looks as follows, where ds1 is
typically the smaller data set, and ds2 is the larger data set:

FOR ds1_row IN ds1 LOOP
  FOR ds2_row IN ds2 LOOP
    output ds1_row and ds2_row
  END LOOP
END LOOP
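The same algorithm is runnable in Python (an illustration, not Oracle code), with plain lists standing in for the row sources:

```python
# Nested-loops Cartesian join: every ds1 row pairs with every ds2 row.
ds1 = ['D10', 'D20']                        # smaller data set
ds2 = ['Smith', 'King', 'Kochhar']          # larger data set

output = []
for ds1_row in ds1:                         # outer FOR loop
    for ds2_row in ds2:                     # inner FOR loop
        output.append((ds1_row, ds2_row))   # output ds1_row and ds2_row

print(len(output))   # 6 rows: 2 x 3
```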


Example 9-18 Cartesian Join


In this example, a user intends to perform an inner join of the employees and departments
tables, but accidentally leaves off the join condition:

SELECT e.last_name, d.department_name
FROM employees e, departments d
The execution plan is as follows:

---------------------------------------------------------------------------
| Id| Operation | Name | Rows | Bytes |Cost (%CPU)|Time|
---------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | |11 (100)| |
| 1| MERGE JOIN CARTESIAN | | 2889 |57780 |11 (0)|00:00:01|
| 2| TABLE ACCESS FULL | DEPARTMENTS | 27 | 324 | 2 (0)|00:00:01|
| 3| BUFFER SORT | | 107 | 856 | 9 (0)|00:00:01|
| 4| INDEX FAST FULL SCAN| EMP_NAME_IX | 107 | 856 | 0 (0)| |
---------------------------------------------------------------------------

In Step 1 of the preceding plan, the CARTESIAN keyword indicates the presence of a Cartesian
join. The number of rows (2889) is the product of 27 and 107.
In Step 3, the BUFFER SORT operation indicates that the database is copying the data blocks
obtained by the scan of emp_name_ix from the SGA to the PGA. This strategy avoids repeated
scans of the same blocks in the database buffer cache, which would generate many logical
reads and increase resource contention.

9.3.5.3 Cartesian Join Controls


The ORDERED hint instructs the optimizer to join tables in the order in which they appear in the
FROM clause. By forcing a join between two row sources that have no direct connection, the
optimizer must perform a Cartesian join.
Example 9-19 ORDERED Hint
In the following example, the ORDERED hint instructs the optimizer to join employees and
locations, but no join condition connects these two row sources:

SELECT /*+ORDERED*/ e.last_name, d.department_name, l.country_id,
       l.state_province
FROM employees e, locations l, departments d
WHERE e.department_id = d.department_id
AND d.location_id = l.location_id

The following execution plan shows a Cartesian product (Step 3) between locations (Step 6)
and employees (Step 4), which is then joined to the departments table (Step 2):

---------------------------------------------------------------------------
| Id| Operation             | Name        | Rows | Bytes |Cost (%CPU)|Time    |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT      |             |      |       |37 (100)|          |
|*1 | HASH JOIN             |             |  106 |  4664 |37   (6)|00:00:01 |
| 2 |  TABLE ACCESS FULL    | DEPARTMENTS |   27 |   513 | 2   (0)|00:00:01 |
| 3 |  MERGE JOIN CARTESIAN |             | 2461 | 61525 |34   (3)|00:00:01 |
| 4 |   TABLE ACCESS FULL   | EMPLOYEES   |  107 |  1177 | 2   (0)|00:00:01 |
| 5 |   BUFFER SORT         |             |   23 |   322 |32   (4)|00:00:01 |
| 6 |    TABLE ACCESS FULL  | LOCATIONS   |   23 |   322 | 0   (0)|         |
---------------------------------------------------------------------------

See Also:
Oracle Database SQL Language Reference to learn about the ORDERED hint

9.4 Join Optimizations


Join optimizations enable joins to be more efficient.
This section describes common join optimizations:

9.4.1 Bloom Filters


A Bloom filter, named after its creator Burton Bloom, is a low-memory data structure
that tests membership in a set.
A Bloom filter correctly indicates when an element is not in a set, but can incorrectly
indicate when an element is in a set. Thus, false negatives are impossible but false
positives are possible.
This section contains the following topics:

9.4.1.1 Purpose of Bloom Filters


A Bloom filter tests one set of values to determine whether they are members of another
set.
For example, one set is (10,20,30,40) and the second set is (10,30,60,70). A Bloom
filter can determine that 60 and 70 are guaranteed to be excluded from the first set,
and that 10 and 30 are probably members. Bloom filters are especially useful when the
amount of memory needed to store the filter is small relative to the amount of data in
the data set, and when most data is expected to fail the membership test.
Oracle Database uses Bloom filters to achieve various specific goals, including the following:
• Reduce the amount of data transferred to slave processes in a parallel query,
especially when the database discards most rows because they do not fulfill a join
condition
• Eliminate unneeded partitions when building a partition access list in a join, known
as partition pruning
• Test whether data exists in the server result cache, thereby avoiding a disk read


• Filter members in Exadata cells, especially when joining a large fact table and small
dimension tables in a star schema
Bloom filters can occur in both parallel and serial processing.

9.4.1.2 How Bloom Filters Work


A Bloom filter uses an array of bits to indicate inclusion in a set.
For example, 8 elements (an arbitrary number used for this example) in an array are initially
set to 0:

e1 e2 e3 e4 e5 e6 e7 e8
0 0 0 0 0 0 0 0

This array represents a set. To represent an input value i in this array, three separate hash
functions (three is arbitrary) are applied to i, each generating a hash value between 1 and 8:

f1(i) = h1
f2(i) = h2
f3(i) = h3

For example, to store the value 17 in this array, the three hash functions are applied to
i = 17, and return the following hash values:

f1(17) = 5
f2(17) = 3
f3(17) = 5

In the preceding example, two of the hash functions happened to return the same value of 5,
known as a hash collision. Because the distinct hash values are 5 and 3, the 5th and 3rd
elements in the array are set to 1:

e1 e2 e3 e4 e5 e6 e7 e8
0 0 1 0 1 0 0 0

Testing the membership of 17 in the set reverses the process. To test whether the set
excludes the value 17, element 3 or element 5 must contain a 0. If a 0 is present in either
element, then the set cannot contain 17. No false negatives are possible.

To test whether the set includes 17, both element 3 and element 5 must contain 1 values.
However, if the test indicates a 1 for both elements, then it is still possible for the set not to
include 17. False positives are possible. For example, the following array might represent the
value 22, which also has a 1 for both element 3 and element 5:

e1 e2 e3 e4 e5 e6 e7 e8
1 0 1 0 1 0 0 0
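A minimal Bloom filter along these lines fits in a few lines of Python. This sketch uses an 8-bit array and three toy hash functions (arbitrary stand-ins chosen for illustration, not Oracle's); it shows the two key properties: a 0 bit proves absence (no false negatives), while an all-1 test can still be a false positive.

```python
BITS = 8  # array size, matching the 8-element example above

def hashes(i):
    # Three toy hash functions, each producing a bit position in 0..7.
    return [(i * k + k) % BITS for k in (3, 5, 7)]

class BloomFilter:
    def __init__(self):
        self.bits = [0] * BITS

    def add(self, i):
        for h in hashes(i):
            self.bits[h] = 1

    def might_contain(self, i):
        # A single 0 bit proves absence: no false negatives.
        return all(self.bits[h] == 1 for h in hashes(i))

bf = BloomFilter()
bf.add(17)                     # sets bits 6 and 2 (a hash collision on 6)

print(bf.might_contain(17))    # True: members always pass the test
print(bf.might_contain(99))    # False: bit 4 is 0, so 99 is provably absent
print(bf.might_contain(25))    # True, yet 25 was never added: a false positive
```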

9.4.1.3 Bloom Filter Controls


The optimizer automatically determines whether to use Bloom filters.


To override optimizer decisions, use the hints PX_JOIN_FILTER and NO_PX_JOIN_FILTER.

See Also:
Oracle Database SQL Language Reference to learn more about the Bloom
filter hints

9.4.1.4 Bloom Filter Metadata


V$ views contain metadata about Bloom filters.

You can query the following views:


• V$SQL_JOIN_FILTER
This view shows the number of rows filtered out (FILTERED column) and tested
(PROBED column) by an active Bloom filter.
• V$PQ_TQSTAT
This view displays the number of rows processed through each parallel execution
server at each stage of the execution tree. You can use it to monitor how much
Bloom filters have reduced data transfer among parallel processes.
In an execution plan, a Bloom filter is indicated by keywords JOIN FILTER in the
Operation column, and the prefix :BF in the Name column, as in the 9th step of the
following plan snippet:

---------------------------------------------------------------------------
| Id | Operation             | Name    | TQ    |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------
...
|  9 |    JOIN FILTER CREATE | :BF0000 | Q1,03 | PCWP |            |

In the Predicate Information section of the plan, filters that contain functions
beginning with the string SYS_OP_BLOOM_FILTER indicate use of a Bloom filter.

9.4.1.5 Bloom Filters: Scenario


In this example, a parallel query joins the sales fact table to the products and times
dimension tables, and filters on fiscal week 18.

SELECT /*+ parallel(s) */ p.prod_name, s.quantity_sold
FROM sh.sales s, sh.products p, sh.times t
WHERE s.prod_id = p.prod_id
AND s.time_id = t.time_id
AND t.fiscal_week_number = 18;


Querying DBMS_XPLAN.DISPLAY_CURSOR provides the following output:

SELECT * FROM
TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format => 'BASIC,+PARALLEL,+PREDICATE'));

EXPLAINED SQL STATEMENT:
------------------------
SELECT /*+ parallel(s) */ p.prod_name, s.quantity_sold FROM sh.sales s,
sh.products p, sh.times t WHERE s.prod_id = p.prod_id AND s.time_id =
t.time_id AND t.fiscal_week_number = 18

Plan hash value: 1183628457

---------------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10003 | Q1,03 | P->S | QC (RAND) |
|* 3 | HASH JOIN BUFFERED | | Q1,03 | PCWP | |
| 4 | PX RECEIVE | | Q1,03 | PCWP | |
| 5 | PX SEND BROADCAST | :TQ10001 | Q1,01 | S->P | BROADCAST |
| 6 | PX SELECTOR | | Q1,01 | SCWC | |
| 7 | TABLE ACCESS FULL | PRODUCTS | Q1,01 | SCWP | |
|* 8 | HASH JOIN | | Q1,03 | PCWP | |
| 9 | JOIN FILTER CREATE | :BF0000 | Q1,03 | PCWP | |
| 10 | BUFFER SORT | | Q1,03 | PCWC | |
| 11 | PX RECEIVE | | Q1,03 | PCWP | |
| 12 | PX SEND HYBRID HASH| :TQ10000 | | S->P | HYBRID HASH|
|*13 | TABLE ACCESS FULL | TIMES | | | |
| 14 | PX RECEIVE | | Q1,03 | PCWP | |
| 15 | PX SEND HYBRID HASH | :TQ10002 | Q1,02 | P->P | HYBRID HASH|
| 16 | JOIN FILTER USE | :BF0000 | Q1,02 | PCWP | |
| 17 | PX BLOCK ITERATOR | | Q1,02 | PCWC | |
|*18 | TABLE ACCESS FULL | SALES | Q1,02 | PCWP | |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("S"."PROD_ID"="P"."PROD_ID")
8 - access("S"."TIME_ID"="T"."TIME_ID")
13 - filter("T"."FISCAL_WEEK_NUMBER"=18)
18 - access(:Z>=:Z AND :Z<=:Z)
filter(SYS_OP_BLOOM_FILTER(:BF0000,"S"."TIME_ID"))

A single server process scans the times table (Step 13), and then uses a hybrid hash
distribution method to send the rows to the parallel execution servers (Step 12). The
processes in set Q1,03 create a Bloom filter (Step 9). The processes in set Q1,02 scan sales
in parallel (Step 18), and then use the Bloom filter to discard rows from sales (Step 16)
before sending them on to set Q1,03 using hybrid hash distribution (Step 15). The processes
in set Q1,03 hash join the times rows to the filtered sales rows (Step 8). The processes in set
Q1,01 scan products (Step 7), and then send the rows to Q1,03 (Step 5). Finally, the

9-49
Chapter 9
Join Optimizations

processes in Q1,03 join the products rows to the rows generated by the previous hash
join (Step 3).
The following figure illustrates the basic process.

Figure 9-8 Bloom Filter

[Graphic: server set Q1,03 creates Bloom filter :BF0000; server sets Q1,01 and Q1,02 feed rows into Q1,03.]

9.4.2 Partition-Wise Joins


A partition-wise join is an optimization that divides a large join of two tables, one of
which must be partitioned on the join key, into several smaller joins.
Partition-wise joins are either of the following:
• Full partition-wise join
Both tables must be equipartitioned on their join keys, or use reference partitioning
(that is, be related by referential constraints). The database divides a large join
into smaller joins between two partitions from the two joined tables.
• Partial partition-wise joins
Only one table is partitioned on the join key. The other table may or may not be
partitioned.
This section contains the following topics:

See Also:
Oracle Database VLDB and Partitioning Guide explains partition-wise joins in
detail

9.4.2.1 Purpose of Partition-Wise Joins


Partition-wise joins reduce query response time by minimizing the amount of data
exchanged among parallel execution servers when joins execute in parallel.
This technique significantly reduces response time and improves the use of CPU and
memory. In Oracle Real Application Clusters (Oracle RAC) environments, partition-
wise joins also avoid or at least limit the data traffic over the interconnect, which is the
key to achieving good scalability for massive join operations.


9.4.2.2 How Partition-Wise Joins Work


When the database serially joins two partitioned tables without using a partition-wise join, a
single server process performs the join.
In the following illustration, the join is not partition-wise because the server process joins
every partition of table t1 to every partition of table t2.

Figure 9-9 Join That Is Not Partition-Wise

[Graphic: a single server process joins every partition of t1 to every partition of t2.]

This section contains the following topics:

9.4.2.2.1 How a Full Partition-Wise Join Works


The database performs a full partition-wise join either serially or in parallel.
The following graphic shows a full partition-wise join performed in parallel. In this case, the
granule of parallelism is a partition. Each parallel execution server joins the partitions in pairs.
For example, the first parallel execution server joins the first partition of t1 to the first partition
of t2. The parallel execution coordinator then assembles the result.
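The pairing can be sketched in Python (a conceptual model only; Python's built-in hash() stands in for Oracle's partitioning hash function). Because equal join keys always hash to the same partition number in both tables, each partition pair can be joined independently:

```python
from collections import defaultdict

NPARTS = 4  # both tables hash partitioned into 4 partitions on the join key

def hash_partition(rows, key_index=0):
    """Assign each row to a partition by hashing its join key."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key_index]) % NPARTS].append(row)
    return parts

t1 = [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
t2 = [(1, 'X'), (3, 'Y'), (5, 'Z')]

p1, p2 = hash_partition(t1), hash_partition(t2)

# Join each partition pair independently, as one parallel execution
# server would; partitions with different numbers can never match.
result = []
for n in range(NPARTS):
    for r1 in p1.get(n, []):
        for r2 in p2.get(n, []):
            if r1[0] == r2[0]:
                result.append((r1[0], r1[1], r2[1]))

print(sorted(result))   # [(1, 'a', 'X'), (3, 'c', 'Y')]
```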


Figure 9-10 Full Partition-Wise Join in Parallel

[Graphic: four parallel execution servers each join one pair of t1 and t2 partitions; the PE coordinator assembles the result.]

A full partition-wise join can also join partitions to subpartitions, which is useful when
the tables use different partitioning methods. For example, customers is partitioned by
hash, but sales is partitioned by range. If you subpartition sales by hash, then the
database can perform a full partition-wise join between the hash partitions of the
customers and the hash subpartitions of sales.

In the execution plan, the presence of a partition operation before the join signals the
presence of a full partition-wise join, as in the following snippet:
| 8 | PX PARTITION HASH ALL|
|* 9 | HASH JOIN |

See Also:
Oracle Database VLDB and Partitioning Guide explains full partition-wise
joins in detail, and includes several examples

9.4.2.2.2 How a Partial Partition-Wise Join Works


Partial partition-wise joins, unlike their full partition-wise counterparts, must execute in
parallel.
The following graphic shows a partial partition-wise join between t1, which is
partitioned, and t2, which is not partitioned.


Figure 9-11 Partial Partition-Wise Join

[Graphic: one set of parallel execution servers scans t2 and dynamically creates partitions from it; a second set joins each t1 partition to the corresponding dynamically created partition, and the PE coordinator assembles the result.]

Because t2 is not partitioned, a set of parallel execution servers must generate partitions
from t2 as needed. A different set of parallel execution servers then joins the t1 partitions to
the dynamically generated partitions. The parallel execution coordinator assembles the result.
In the execution plan, the operation PX SEND PARTITION (KEY) signals a partial partition-wise
join, as in the following snippet:

| 11 | PX SEND PARTITION (KEY) |
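Conceptually, the redistribution step repartitions the non-partitioned table on the fly so that each of its rows lands in the partition holding its join key. A Python sketch (illustration only; hash() stands in for Oracle's partitioning hash function):

```python
from collections import defaultdict

NPARTS = 4  # t1's hash partition count on the join key

def hash_partition(rows, key_index=0):
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key_index]) % NPARTS].append(row)
    return parts

t1 = [(1, 'a'), (2, 'b'), (5, 'c')]      # partitioned table
t2 = [(1, 'X'), (5, 'Y'), (6, 'Z')]      # not partitioned

t1_parts = hash_partition(t1)

# First server set: dynamically partition t2 using t1's partitioning
# scheme (conceptually, the PX SEND PARTITION (KEY) redistribution).
t2_parts = hash_partition(t2)

# Second server set: join matching partition pairs, exactly as in a
# full partition-wise join.
result = [(r1[0], r1[1], r2[1])
          for n in range(NPARTS)
          for r1 in t1_parts.get(n, [])
          for r2 in t2_parts.get(n, [])
          if r1[0] == r2[0]]

print(sorted(result))   # [(1, 'a', 'X'), (5, 'c', 'Y')]
```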

See Also:
Oracle Database VLDB and Partitioning Guide explains full partition-wise joins in
detail, and includes several examples

9.4.3 In-Memory Join Groups


A join group is a user-created object that lists two or more columns that can be meaningfully
joined.
In certain queries, join groups eliminate the performance overhead of decompressing and
hashing column values. Join groups require an In-Memory Column Store (IM column store).


See Also:
Oracle Database In-Memory Guide to learn how to optimize In-Memory
queries with join groups

Part V
Optimizer Statistics
The accuracy of an execution plan depends on the quality of the optimizer statistics.
This part contains the following chapters:
10
Optimizer Statistics Concepts
Oracle Database optimizer statistics describe details about the database and its objects.
This chapter includes the following topics:

10.1 Introduction to Optimizer Statistics


The optimizer cost model relies on statistics collected about the objects involved in a query,
and the database and host where the query runs.
The optimizer uses statistics to get an estimate of the number of rows (and number of bytes)
retrieved from a table, partition, or index. The optimizer estimates the cost for the access,
determines the cost for possible plans, and then picks the execution plan with the lowest
cost.
Optimizer statistics include the following:
• Table statistics
– Number of rows
– Number of blocks
– Average row length
• Column statistics
– Number of distinct values (NDV) in a column
– Number of nulls in a column
– Data distribution (histogram)
– Extended statistics
• Index statistics
– Number of leaf blocks
– Number of levels
– Index clustering factor
• System statistics
– I/O performance and utilization
– CPU performance and utilization
As shown in Figure 10-1, the database stores optimizer statistics for tables, columns,
indexes, and the system in the data dictionary. You can access these statistics using data
dictionary views.


Note:
The optimizer statistics are different from the performance statistics visible
through V$ views.

Figure 10-1 Optimizer Statistics

[Graphic: the optimizer reads index, table, column, and system statistics (CPU and I/O) from the data dictionary and uses them to produce an execution plan.]
10.2 About Optimizer Statistics Types


The optimizer collects statistics on different types of database objects and
characteristics of the database environment.
This section contains the following topics:

10.2.1 Table Statistics


Table statistics contain metadata that the optimizer uses when developing an
execution plan.
This section contains the following topics:


10.2.1.1 Permanent Table Statistics


In Oracle Database, table statistics include information about rows and blocks.
The optimizer uses these statistics to determine the cost of table scans and table joins. The
database tracks all relevant statistics about permanent tables. For example, table statistics
stored in DBA_TAB_STATISTICS track the following:

• Number of rows
The database uses the row count stored in DBA_TAB_STATISTICS when determining
cardinality.
• Average row length
• Number of data blocks
The optimizer uses the number of data blocks with the DB_FILE_MULTIBLOCK_READ_COUNT
initialization parameter to determine the base table access cost.
• Number of empty data blocks
DBMS_STATS.GATHER_TABLE_STATS commits before gathering statistics on permanent tables.
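A rough sketch shows how the block count and multiblock read count feed the scan cost (a deliberately simplified model for illustration; Oracle's actual cost formula also adjusts for system statistics such as SREADTIM and MREADTIM):

```python
import math

def full_scan_io_cost(blocks, mbrc):
    """Approximate base I/O cost of a full table scan: the number of
    multiblock reads needed to read every table block.
    Simplified model for illustration only."""
    return math.ceil(blocks / mbrc)

# Using the sh.customers block count shown in Example 10-1:
print(full_scan_io_cost(blocks=1517, mbrc=128))   # 12 multiblock reads
print(full_scan_io_cost(blocks=1517, mbrc=8))     # 190 multiblock reads
```

A larger DB_FILE_MULTIBLOCK_READ_COUNT therefore lowers the base cost of a full scan relative to an index scan.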

Example 10-1 Table Statistics


This example queries table statistics for the sh.customers table.

SELECT NUM_ROWS, AVG_ROW_LEN, BLOCKS,
       EMPTY_BLOCKS, LAST_ANALYZED
FROM DBA_TAB_STATISTICS
WHERE OWNER='SH'
AND TABLE_NAME='CUSTOMERS';

Sample output appears as follows:

  NUM_ROWS AVG_ROW_LEN     BLOCKS EMPTY_BLOCKS LAST_ANAL
---------- ----------- ---------- ------------ ---------
     55500         189       1517            0 25-MAY-17

See Also:

• "About Optimizer Initialization Parameters"


• "Gathering Schema and Table Statistics"
• Oracle Database Reference for a description of the DBA_TAB_STATISTICS view
and the DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter

10.2.1.2 Temporary Table Statistics


DBMS_STATS can gather statistics for both permanent and global temporary tables, but
additional considerations apply to the latter.
This section contains the following topics:


10.2.1.2.1 Types of Temporary Tables


Temporary tables are classified as global, private, or cursor-duration.
In all types of temporary table, the data is only visible to the session that inserts it. The
tables differ as follows:
• A global temporary table is an explicitly created persistent object that stores
intermediate session-private data for a specific duration.
The table is global because the definition is visible to all sessions. The ON COMMIT
clause of CREATE GLOBAL TEMPORARY TABLE indicates whether the table is
transaction-specific (DELETE ROWS) or session-specific (PRESERVE ROWS). Optimizer
statistics for global temporary tables can be shared or session-specific.
• A private temporary table is an explicitly created object, defined by private
memory-only metadata, that stores intermediate session-private data for a specific
duration.
The table is private because the definition is visible only to the session that
created the table. The ON COMMIT clause of CREATE PRIVATE TEMPORARY TABLE
indicates whether the table is transaction-specific (DROP DEFINITION) or session-
specific (PRESERVE DEFINITION).
• A cursor-duration temporary table is an implicitly created memory-only object
that is associated with a cursor.
Unlike with global and private temporary tables, DBMS_STATS cannot gather statistics
for cursor-duration temporary tables.
The tables differ in where they store data, how they are created and dropped, and in
the duration and visibility of metadata. Note that the database allocates storage space
when a session first inserts data into a global temporary table, not at table creation.

Table 10-1 Important Characteristics of Temporary Tables

Visibility of Data
- Global temporary table: Session inserting data
- Private temporary table: Session inserting data
- Cursor-duration temporary table: Session inserting data

Storage of Data
- Global temporary table: Persistent
- Private temporary table: Memory or tempfiles, but only for the duration of the session or transaction
- Cursor-duration temporary table: Only in memory

Visibility of Metadata
- Global temporary table: All sessions
- Private temporary table: Session that created the table (in the USER_PRIVATE_TEMP_TABLES view, which is based on a V$ view)
- Cursor-duration temporary table: Session executing the cursor

Duration of Metadata
- Global temporary table: Until table is explicitly dropped
- Private temporary table: Until table is explicitly dropped, or end of session (PRESERVE DEFINITION) or transaction (DROP DEFINITION)
- Cursor-duration temporary table: Until cursor ages out of the shared pool

Creation of Table
- Global temporary table: CREATE GLOBAL TEMPORARY TABLE (supports AS SELECT)
- Private temporary table: CREATE PRIVATE TEMPORARY TABLE (supports AS SELECT)
- Cursor-duration temporary table: Implicitly created when the optimizer considers it useful

Effect of Creation on Existing Transactions
- Global temporary table: No implicit commit
- Private temporary table: No implicit commit
- Cursor-duration temporary table: No implicit commit

Naming Rules
- Global temporary table: Same as for permanent tables
- Private temporary table: Must begin with ORA$PTT_
- Cursor-duration temporary table: Internally generated unique name

Dropping of Table
- Global temporary table: DROP GLOBAL TEMPORARY TABLE
- Private temporary table: DROP PRIVATE TEMPORARY TABLE, or implicitly dropped at end of session (PRESERVE DEFINITION) or transaction (DROP DEFINITION)
- Cursor-duration temporary table: Implicitly dropped at end of session

See Also:

• "Cursor-Duration Temporary Tables"


• Oracle Database Administrator’s Guide to learn how to manage temporary
tables

10.2.1.2.2 Statistics for Global Temporary Tables


DBMS_STATS collects the same types of statistics for global temporary tables as for permanent
tables.

Note:
You cannot collect statistics for private temporary tables.

The following table shows how global temporary tables differ in how they gather and store
optimizer statistics, depending on whether the tables are scoped to a transaction or session.

Table 10-2 Optimizer Statistics for Global Temporary Tables

Characteristic                    Transaction-Specific   Session-Specific
--------------------------------  ---------------------  -----------------
Effect of DBMS_STATS collection   Does not commit        Commits
Storage of statistics             Memory only            Dictionary tables
Histogram creation                Not supported          Supported

The following procedures do not commit for transaction-specific temporary tables, so that
rows in these tables are not deleted:
• GATHER_TABLE_STATS


• DELETE_obj_STATS, where obj is TABLE, COLUMN, or INDEX


• SET_obj_STATS, where obj is TABLE, COLUMN, or INDEX
• GET_obj_STATS, where obj is TABLE, COLUMN, or INDEX
The preceding program units observe the GLOBAL_TEMP_TABLE_STATS statistics
preference. For example, if the table preference is set to SESSION, then
SET_TABLE_STATS sets the session statistics, and GATHER_TABLE_STATS preserves all
rows in a transaction-specific temporary table. If the table preference is set to SHARED,
however, then SET_TABLE_STATS sets the shared statistics, and GATHER_TABLE_STATS
deletes all rows from a transaction-specific temporary table.

See Also:

• "Gathering Schema and Table Statistics"


• Oracle Database Concepts to learn about global temporary tables
• Oracle Database PL/SQL Packages and Types Reference to learn about
the DBMS_STATS.GATHER_TABLE_STATS procedure

10.2.1.2.3 Shared and Session-Specific Statistics for Global Temporary Tables


Starting in Oracle Database 12c, you can set the table-level preference
GLOBAL_TEMP_TABLE_STATS to make statistics on a global temporary table shared
(SHARED) or session-specific (SESSION).

When GLOBAL_TEMP_TABLE_STATS is SESSION, you can gather optimizer statistics for a
global temporary table in one session, and then use the statistics for this session only.
Meanwhile, users can continue to maintain a shared version of the statistics. During
optimization, the optimizer first checks whether a global temporary table has session-
specific statistics. If yes, then the optimizer uses them. Otherwise, the optimizer uses
shared statistics if they exist.

Note:
In releases before Oracle Database 12c, the database did not maintain
optimizer statistics for global temporary tables and non-global temporary
tables differently. The database maintained one version of the statistics
shared by all sessions, even though data in different sessions could differ.

Session-specific optimizer statistics have the following characteristics:


• Dictionary views that track statistics show both the shared statistics and the
session-specific statistics in the current session.
The views are DBA_TAB_STATISTICS, DBA_IND_STATISTICS, DBA_TAB_HISTOGRAMS,
and DBA_TAB_COL_STATISTICS (each view has a corresponding USER_ and ALL_
version). The SCOPE column shows whether statistics are session-specific or
shared. Session-specific statistics must be stored in the data dictionary so that
multiple processes can access them in Oracle RAC.



• CREATE ... AS SELECT automatically gathers optimizer statistics. When
GLOBAL_TEMP_TABLE_STATS is set to SHARED, however, you must gather statistics manually
using DBMS_STATS.
• Pending statistics are not supported.
• Other sessions do not share a cursor that uses the session-specific statistics.
Different sessions can share a cursor that uses shared statistics, as in releases earlier
than Oracle Database 12c. The same session can share a cursor that uses session-
specific statistics.
• By default, GATHER_TABLE_STATS for the temporary table immediately invalidates previous
cursors compiled in the same session. However, this procedure does not invalidate
cursors compiled in other sessions.

See Also:

• Oracle Database PL/SQL Packages and Types Reference to learn about the
GLOBAL_TEMP_TABLE_STATS preference
• Oracle Database Reference for a description of the DBA_TAB_STATISTICS view

10.2.2 Column Statistics


Column statistics track information about column values and data distribution.
The optimizer uses column statistics to generate accurate cardinality estimates and make
better decisions about index usage, join orders, join methods, and so on. For example,
statistics in DBA_TAB_COL_STATISTICS track the following:

• Number of distinct values


• Number of nulls
• High and low values
• Histogram-related information
The optimizer can use extended statistics, which are a special type of column statistics.
These statistics are useful for informing the optimizer of logical relationships among columns.
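The basic column statistics amount to simple aggregates over the column values, which can be sketched as follows (an illustrative model only; None stands in for SQL NULL):

```python
def column_stats(values):
    """Compute the basic column statistics that the optimizer tracks
    in DBA_TAB_COL_STATISTICS. None stands in for NULL."""
    non_null = [v for v in values if v is not None]
    return {
        'num_distinct': len(set(non_null)),        # NDV
        'num_nulls': len(values) - len(non_null),
        'low_value': min(non_null),
        'high_value': max(non_null),
    }

stats = column_stats([10, 20, 20, None, 40, None])
print(stats)
# {'num_distinct': 3, 'num_nulls': 2, 'low_value': 10, 'high_value': 40}
```

For example, the selectivity of an equality predicate on a column with no histogram is typically estimated as 1/num_distinct, which is why an accurate NDV matters.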

See Also:

• "Histograms "
• "About Statistics on Column Groups"
• Oracle Database Reference for a description of the DBA_TAB_COL_STATISTICS
view


10.2.3 Index Statistics


The index statistics include information about the number of index levels, the number
of index blocks, and the relationship between the index and the data blocks. The
optimizer uses these statistics to determine the cost of index scans.
This section contains the following topics:

10.2.3.1 Types of Index Statistics


The DBA_IND_STATISTICS view tracks index statistics.

Statistics include the following:


• Levels
The BLEVEL column shows the number of blocks required to go from the root block
to a leaf block. A B-tree index has two types of blocks: branch blocks for searching
and leaf blocks that store values. See Oracle Database Concepts for a conceptual
overview of B-tree indexes.
• Distinct keys
This column tracks the number of distinct indexed values. If a unique constraint is
defined, and if no NOT NULL constraint is defined, then this value equals the
number of non-null values.
• Average number of leaf blocks for each distinct indexed key
• Average number of data blocks pointed to by each distinct indexed key

See Also:
Oracle Database Reference for a description of the DBA_IND_STATISTICS
view

Example 10-2 Index Statistics


This example queries some index statistics for the cust_lname_ix and customers_pk
indexes on the sh.customers table (sample output included):

SELECT INDEX_NAME, BLEVEL, LEAF_BLOCKS AS "LEAFBLK", DISTINCT_KEYS AS "DIST_KEY",
       AVG_LEAF_BLOCKS_PER_KEY AS "LEAFBLK_PER_KEY",
       AVG_DATA_BLOCKS_PER_KEY AS "DATABLK_PER_KEY"
FROM DBA_IND_STATISTICS
WHERE OWNER = 'SH'
AND INDEX_NAME IN ('CUST_LNAME_IX','CUSTOMERS_PK');

INDEX_NAME     BLEVEL LEAFBLK DIST_KEY LEAFBLK_PER_KEY DATABLK_PER_KEY
-------------- ------ ------- -------- --------------- ---------------
CUSTOMERS_PK        1     115    55500               1               1
CUST_LNAME_IX       1     141      908               1              10


10.2.3.2 Index Clustering Factor


For a B-tree index, the index clustering factor measures the physical grouping of rows in
relation to an index value, such as last name.
The index clustering factor helps the optimizer decide whether an index scan or full table
scan is more efficient for certain queries. A low clustering factor indicates an efficient index
scan.
A clustering factor that is close to the number of blocks in a table indicates that the rows are
physically ordered in the table blocks by the index key. If the database performs a full table
scan, then the database tends to retrieve the rows as they are stored on disk sorted by the
index key. A clustering factor that is close to the number of rows indicates that the rows are
scattered randomly across the database blocks in relation to the index key. If the database
performs a full table scan, then the database would not retrieve rows in any sorted order by
this index key.
The clustering factor is a property of a specific index, not a table. If multiple indexes exist on
a table, then the clustering factor for one index might be small while the factor for another
index is large. An attempt to reorganize the table to improve the clustering factor for one
index may degrade the clustering factor of the other index.
Example 10-3 Index Clustering Factor
This example shows how the optimizer uses the index clustering factor to determine whether
using an index is more effective than a full table scan.
1. Start SQL*Plus and connect to a database as sh, and then query the number of rows and
blocks in the sh.customers table (sample output included):

SELECT table_name, num_rows, blocks
FROM   user_tables
WHERE  table_name='CUSTOMERS';

TABLE_NAME                       NUM_ROWS     BLOCKS
------------------------------ ---------- ----------
CUSTOMERS                           55500       1486

2. Create an index on the customers.cust_last_name column.


For example, execute the following statement:

CREATE INDEX CUSTOMERS_LAST_NAME_IDX ON customers(cust_last_name);

3. Query the index clustering factor of the newly created index.


The following query shows that the customers_last_name_idx index has a high
clustering factor because the clustering factor is significantly more than the number of
blocks in the table:

SELECT index_name, blevel, leaf_blocks, clustering_factor
FROM   user_indexes
WHERE  table_name='CUSTOMERS'
AND    index_name='CUSTOMERS_LAST_NAME_IDX';

INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
CUSTOMERS_LAST_NAME_IDX                 1         141              9859

4. Create a new copy of the customers table, with rows ordered by cust_last_name.
For example, execute the following statements:

DROP TABLE customers3 PURGE;

CREATE TABLE customers3 AS
SELECT *
FROM customers
ORDER BY cust_last_name;

5. Gather statistics on the customers3 table.


For example, execute the GATHER_TABLE_STATS procedure as follows:
EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'CUSTOMERS3');
6. Query the number of rows and blocks in the customers3 table.
For example, enter the following query (sample output included):

SELECT TABLE_NAME, NUM_ROWS, BLOCKS
FROM   USER_TABLES
WHERE  TABLE_NAME='CUSTOMERS3';

TABLE_NAME                       NUM_ROWS     BLOCKS
------------------------------ ---------- ----------
CUSTOMERS3                          55500       1485

7. Create an index on the cust_last_name column of customers3.


For example, execute the following statement:

CREATE INDEX CUSTOMERS3_LAST_NAME_IDX ON customers3(cust_last_name);

8. Query the index clustering factor of the customers3_last_name_idx index.


The following query shows that the customers3_last_name_idx index has a lower
clustering factor:

SELECT INDEX_NAME, BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR
FROM   USER_INDEXES
WHERE  TABLE_NAME = 'CUSTOMERS3'
AND    INDEX_NAME = 'CUSTOMERS3_LAST_NAME_IDX';

INDEX_NAME                         BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
CUSTOMERS3_LAST_NAME_IDX                1         141              1455


The table customers3 has the same data as the original customers table, but the index
on customers3 has a much lower clustering factor because the data in the table is
ordered by cust_last_name. The clustering factor is now about 10 times the number
of leaf blocks instead of 70 times.
9. Query the customers table.
For example, execute the following query (sample output included):

SELECT cust_first_name, cust_last_name
FROM   customers
WHERE  cust_last_name BETWEEN 'Puleo' AND 'Quinn';

CUST_FIRST_NAME      CUST_LAST_NAME
-------------------- ----------------------------------------
Vida                 Puleo
Harriett             Quinlan
Madeleine            Quinn
Caresse              Puleo

10. Display the cursor for the query.


For example, execute the following query (partial sample output included):

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());

-------------------------------------------------------------------------------
| Id | Operation | Name | Rows |Bytes|Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0| SELECT STATEMENT | | | | 405 (100)| |
|* 1| TABLE ACCESS STORAGE FULL| CUSTOMERS | 2335|35025| 405 (1)|00:00:01|
-------------------------------------------------------------------------------

The preceding plan shows that the optimizer did not use the index on the original
customers table.
11. Query the customers3 table.

For example, execute the following query (sample output included):

SELECT cust_first_name, cust_last_name
FROM   customers3
WHERE  cust_last_name BETWEEN 'Puleo' AND 'Quinn';

CUST_FIRST_NAME      CUST_LAST_NAME
-------------------- ----------------------------------------
Vida                 Puleo
Harriett             Quinlan
Madeleine            Quinn
Caresse              Puleo

12. Display the cursor for the query.

For example, execute the following query (partial sample output included):

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());


------------------------------------------------------------------------------------------------
| Id | Operation                   | Name                     | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                          |      |       |    69 (100)|          |
|  1 |  TABLE ACCESS BY INDEX ROWID| CUSTOMERS3               | 2335 | 35025 |    69   (0)| 00:00:01 |
|* 2 |   INDEX RANGE SCAN          | CUSTOMERS3_LAST_NAME_IDX | 2335 |       |     7   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

The result set is the same, but the optimizer chooses the index. The plan cost is
much less than the cost of the plan used on the original customers table.
13. Query customers with a hint that forces the optimizer to use the index.

For example, execute the following query (partial sample output included):

SELECT /*+ index (Customers CUSTOMERS_LAST_NAME_IDX) */ cust_first_name,
       cust_last_name
FROM   customers
WHERE  cust_last_name BETWEEN 'Puleo' AND 'Quinn';

CUST_FIRST_NAME CUST_LAST_NAME
-------------------- ----------------------------------------
Vida Puleo
Caresse Puleo
Harriett Quinlan
Madeleine Quinn

14. Display the cursor for the query.

For example, execute the following query (partial sample output included):

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR());

------------------------------------------------------------------------------------------------
| Id | Operation                   | Name                     | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                          |      |       |   422 (100)|          |
|  1 |  TABLE ACCESS BY INDEX ROWID| CUSTOMERS                | 2335 | 35025 |   422   (0)| 00:00:01 |
|* 2 |   INDEX RANGE SCAN          | CUSTOMERS_LAST_NAME_IDX  | 2335 |       |     7   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------


The preceding plan shows that the cost of using the index on customers is higher than
the cost of a full table scan. Thus, using an index does not necessarily improve
performance. The index clustering factor is a measure of whether an index scan is more
effective than a full table scan.
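The two plan costs above can be reproduced with a well-known simplified model of index range scan I/O cost: blevel + ceil(leaf_blocks × selectivity) + ceil(clustering_factor × selectivity). The following Python sketch is illustrative only (the optimizer also performs CPU costing and other adjustments), but it shows how the clustering factor dominates the cost for this query:

```python
import math

def index_range_scan_cost(blevel, leaf_blocks, clustering_factor, selectivity):
    # Simplified I/O-only cost model for an index range scan followed by
    # table access by rowid: branch block reads, plus the matching leaf
    # blocks, plus the table block visits approximated by prorating the
    # clustering factor with the predicate selectivity.
    return (blevel
            + math.ceil(leaf_blocks * selectivity)
            + math.ceil(clustering_factor * selectivity))

# From the example: 2,335 of 55,500 rows match the BETWEEN predicate.
selectivity = 2335 / 55500

# customers: clustering factor 9859 (rows scattered relative to the key)
print(index_range_scan_cost(1, 141, 9859, selectivity))  # 422, as in the hinted plan

# customers3: clustering factor 1455 (rows ordered by the key)
print(index_range_scan_cost(1, 141, 1455, selectivity))  # 69, as in the chosen plan
```

Because 422 exceeds the full table scan cost of 405, the optimizer scans customers in full; 69 is far below 405, so it chooses the index on customers3.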

10.2.3.3 Effect of Index Clustering Factor on Cost: Example


This example illustrates how the index clustering factor can influence the cost of table
access.
Consider the following scenario:
• A table contains 9 rows that are stored in 3 data blocks.
• The col1 column currently stores the values A, B, and C.
• A nonunique index named col1_idx exists on col1 for this table.
Example 10-4 Collocated Data
Assume that the rows are stored in the data blocks as follows:

Block 1       Block 2       Block 3
-------       -------       -------
A  A  A       B  B  B       C  C  C

In this example, the index clustering factor for col1_idx is low. The rows that have the same
indexed column values for col1 are in the same data blocks in the table. Thus, the cost of
using an index range scan to return all rows with value A is low because only one block in the
table must be read.
Example 10-5 Scattered Data
Assume that the same rows are scattered across the data blocks as follows:

Block 1       Block 2       Block 3
-------       -------       -------
A  B  C       A  C  B       B  A  C

In this example, the index clustering factor for col1_idx is higher. The database must read all
three blocks in the table to retrieve all rows with the value A in col1.
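The behavior in these two examples can be modeled with the rule used to compute the clustering factor: scan the index entries in key order and add 1 each time the current entry points to a different table block than the previous entry. The following Python sketch (illustrative only, using hypothetical (key, block) pairs) applies that rule to the collocated and scattered layouts above:

```python
def clustering_factor(index_entries):
    """Count block switches while walking (key, block) index entries in key order.

    index_entries: one (key, table_block) pair per row. The factor increases
    by 1 each time the current entry points to a different table block than
    the previous entry.
    """
    factor = 0
    prev_block = None
    for _key, block in sorted(index_entries):
        if block != prev_block:
            factor += 1
            prev_block = block
    return factor

# Example 10-4: collocated data, blocks (1: A A A) (2: B B B) (3: C C C)
collocated = [('A', 1), ('A', 1), ('A', 1),
              ('B', 2), ('B', 2), ('B', 2),
              ('C', 3), ('C', 3), ('C', 3)]

# Example 10-5: scattered data, blocks (1: A B C) (2: A C B) (3: B A C)
scattered = [('A', 1), ('B', 1), ('C', 1),
             ('A', 2), ('C', 2), ('B', 2),
             ('B', 3), ('A', 3), ('C', 3)]

print(clustering_factor(collocated))  # 3: consecutive keys stay on the same block
print(clustering_factor(scattered))   # 9: every index entry lands on a different block
```

A factor near the block count (3) signals collocated data; a factor near the row count (9) signals scattered data, matching the definitions earlier in this section.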

See Also:
Oracle Database Reference for a description of the DBA_INDEXES view

10.2.4 System Statistics


The system statistics describe hardware characteristics such as I/O and CPU performance
and utilization.
System statistics enable the query optimizer to more accurately estimate I/O and CPU costs
when choosing execution plans. The database does not invalidate previously parsed SQL

10-13
Chapter 10
How the Database Gathers Optimizer Statistics

statements when updating system statistics. The database parses all new SQL
statements using new statistics.

See Also:

• "Gathering System Statistics Manually"


• Oracle Database Reference

10.2.5 User-Defined Optimizer Statistics


The extensible optimizer enables authors of user-defined functions and indexes to
create statistics collection, selectivity, and cost functions. The optimizer cost model is
extended to integrate information supplied by the user to assess CPU and the I/O cost.
Statistics types act as interfaces for user-defined functions that influence the choice of
execution plan. However, to use a statistics type, the optimizer requires a mechanism
to bind the type to a database object such as a column, standalone function, object
type, index, indextype, or package. The SQL statement ASSOCIATE STATISTICS allows
this binding to occur.
Functions for user-defined statistics are relevant for columns that use both standard
SQL data types and object types, and for domain indexes. When you associate a
statistics type with a column or domain index, the database calls the statistics
collection method in the statistics type whenever DBMS_STATS gathers statistics.

See Also:

• "Gathering Schema and Table Statistics"


• Oracle Database Data Cartridge Developer's Guide to learn about the
extensible optimizer and user-defined statistics

10.3 How the Database Gathers Optimizer Statistics


Oracle Database provides several mechanisms to gather statistics.
This section contains the following topics:

10.3.1 DBMS_STATS Package


The DBMS_STATS PL/SQL package collects and manages optimizer statistics.

This package enables you to control what and how statistics are collected, including
the degree of parallelism for statistics collection, sampling methods, granularity of
statistics collection in partitioned tables, and so on.


Note:
Do not use the COMPUTE and ESTIMATE clauses of the ANALYZE statement to collect
optimizer statistics. These clauses have been deprecated. Instead, use DBMS_STATS.

Statistics gathered with the DBMS_STATS package are required for the creation of accurate
execution plans. For example, table statistics gathered by DBMS_STATS include the number of
rows, number of blocks, and average row length.
By default, Oracle Database uses automatic optimizer statistics collection. In this case, the
database automatically runs DBMS_STATS to collect optimizer statistics for all schema objects
for which statistics are missing or stale. The process eliminates many manual tasks
associated with managing the optimizer, and significantly reduces the risks of generating
suboptimal execution plans because of missing or stale statistics. You can also update and
manage optimizer statistics by manually executing DBMS_STATS.

See Also:

• "Configuring Automatic Optimizer Statistics Collection"


• "Gathering Optimizer Statistics Manually"
• Oracle Database Administrator’s Guide to learn more about automated
maintenance tasks
• Oracle Database PL/SQL Packages and Types Reference to learn about
DBMS_STATS

10.3.2 Supplemental Dynamic Statistics


By default, when optimizer statistics are missing, stale, or insufficient, the database
automatically gathers dynamic statistics during a parse. The database uses recursive SQL
to scan a small random sample of table blocks.

Note:
Dynamic statistics augment statistics rather than providing an alternative to them.

Dynamic statistics supplement optimizer statistics such as table and index block counts, table
and join cardinalities (estimated number of rows), join column statistics, and GROUP BY
statistics. This information helps the optimizer improve plans by making better estimates for
predicate cardinality.
Dynamic statistics are beneficial in the following situations:
• An execution plan is suboptimal because of complex predicates.
• The sampling time is a small fraction of total execution time for the query.
• The query executes many times so that the sampling time is amortized.
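The sampling idea can be illustrated outside the database: read only a random fraction of the table's blocks, count the rows that satisfy the predicate, and scale the count by the inverse of the sampled fraction. The following Python sketch is purely illustrative (the database performs its sampling with recursive SQL, not application code, and the table data here is hypothetical):

```python
import random

def estimate_cardinality(blocks, predicate, sample_fraction, seed=42):
    # Scan a random sample of "table blocks", count the matching rows,
    # and scale the count up by the inverse of the sampled fraction.
    rng = random.Random(seed)
    n_sample = max(1, int(len(blocks) * sample_fraction))
    sampled = rng.sample(blocks, n_sample)
    matches = sum(1 for block in sampled for row in block if predicate(row))
    return round(matches * len(blocks) / n_sample)

# Hypothetical table: 1,000 blocks of 10 rows; about 20% of rows match.
rng = random.Random(0)
table = [[rng.randrange(5) for _ in range(10)] for _ in range(1000)]
true_count = sum(1 for block in table for row in block if row == 0)

estimate = estimate_cardinality(table, lambda row: row == 0, sample_fraction=0.05)
print(true_count, estimate)  # the estimate lands close to the true count
```

Reading 5% of the blocks typically yields an estimate close to the true cardinality, which is why the sampling overhead is acceptable when it is a small fraction of execution time or is amortized over many executions.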


10.3.3 Online Statistics Gathering for Bulk Loads


Starting in Oracle Database 12c, the database can gather table statistics automatically
during the following types of bulk loads: INSERT INTO ... SELECT into an empty table
using a direct path insert, and CREATE TABLE AS SELECT.

Note:
By default, a parallel insert uses a direct path insert. You can force a direct
path insert by using the /*+APPEND*/ hint.

This section contains the following topics:

See Also:
Oracle Database Data Warehousing Guide to learn more about bulk loads

10.3.3.1 Purpose of Online Statistics Gathering for Bulk Loads


Data warehouse applications typically load large amounts of data into the database.
For example, a sales data warehouse might load data every day, week, or month.
In releases earlier than Oracle Database 12c, the best practice was to gather statistics
manually after a bulk load. However, many applications did not gather statistics after
the load because of negligence or because they waited for the maintenance window to
initiate collection. Missing statistics are the leading cause of suboptimal execution
plans.
Automatic statistics gathering during bulk loads has the following benefits:
• Improved performance
Gathering statistics during the load avoids an additional table scan to gather table
statistics.
• Improved manageability
No user intervention is required to gather statistics after a bulk load.

10.3.3.2 Global Statistics During Inserts into Empty Partitioned Tables


When inserting rows into an empty partitioned table, the database gathers global
statistics during the insert.
For example, if sales is an empty partitioned table, and if you run INSERT INTO sales
SELECT, then the database gathers global statistics for sales. However, the database
does not gather partition-level statistics.
Assume a different case in which you use extended syntax to insert rows into a
particular partition or subpartition, which is empty. The database gathers statistics on
the empty partition during the insert. However, the database does not gather global statistics.
Assume that you run INSERT INTO sales PARTITION (sales_q4_2000) SELECT. If partition
sales_q4_2000 is empty before the insert (other partitions need not be empty), then the
database gathers statistics during the insert. Moreover, if the INCREMENTAL preference is
enabled for sales, then the database also gathers a synopsis for sales_q4_2000. Statistics
are immediately available after the INSERT statement. However, if you roll back the
transaction, then the database automatically deletes statistics gathered during the bulk load.

See Also:

• "Considerations for Incremental Statistics Maintenance"


• Oracle Database SQL Language Reference for INSERT syntax and semantics

10.3.3.3 Index Statistics and Histograms During Bulk Loads


While gathering online statistics, the database does not gather index statistics or create
histograms. If these statistics are required, then Oracle recommends running
DBMS_STATS.GATHER_TABLE_STATS with the options parameter set to GATHER AUTO after the
bulk load.
For example, the following command gathers statistics for the bulk-loaded sh_ctas table:

EXEC DBMS_STATS.GATHER_TABLE_STATS( user, 'SH_CTAS', options => 'GATHER AUTO' );

The preceding example only gathers missing or stale statistics. The database does not
gather table and basic column statistics collected during the bulk load.

Note:
You can set the table preference options to GATHER AUTO on the tables that you
plan to bulk load. In this way, you need not explicitly set the options parameter
when running GATHER_TABLE_STATS.

See Also:

• "Gathering Schema and Table Statistics"


• Oracle Database Data Warehousing Guide to learn more about bulk loading

10.3.3.4 Restrictions for Online Statistics Gathering for Bulk Loads


In some situations, optimizer statistics gathering does not occur automatically for bulk loads.


Specifically, bulk loads do not gather statistics automatically when any of the following
conditions applies to the target table, partition, or subpartition:
• It is not empty, and you perform an INSERT INTO ... SELECT.
In this case, an OPTIMIZER STATISTICS GATHERING row source appears in the
plan, but this row source is only a pass-through. The database does not actually
gather optimizer statistics.

Note:
The DBA_TAB_COL_STATISTICS.NOTES column is set to STATS_ON_LOAD by
a bulk load into an empty table. However, subsequent bulk loads into the
non-empty table do not reset the NOTES column. One technique for
determining whether the database gathered statistics is to query the
USER_TAB_MODIFICATIONS.INSERTS column. If the query returns a row
indicating the number of rows loaded, then the most recent bulk load did
not gather statistics automatically.

• It is loaded using an INSERT INTO ... SELECT, and neither of the following
conditions is true: all columns of the target table are specified, or a subset of the
target columns are specified and the unspecified columns have default values.
Put differently, the database only gathers statistics automatically for bulk loads
when either all columns of the target table are specified, or a subset of the target
columns are specified and the unspecified columns have default values. For
example, the sales table has only columns c1, c2, c3, and c4. The column c4 does
not have a default value. You load sales_copy by executing INSERT /*+ APPEND
*/ INTO sales_copy SELECT c1, c2, c3 FROM sales. In this case, the database
does not gather online statistics for sales_copy. The database would gather
statistics if c4 had a default value or if it were included in the SELECT list.
• It is in an Oracle-owned schema such as SYS.
• It is one of the following types of tables: nested table, index-organized table (IOT),
external table, or global temporary table defined as ON COMMIT DELETE ROWS.
• It has a PUBLISH preference set to FALSE.
• Its statistics are locked.
• It is partitioned, INCREMENTAL is set to true, and partition-extended syntax is not
used.
For example, assume that you execute DBMS_STATS.SET_TABLE_PREFS(null,
'sales', 'incremental', 'true'). In this case, the database does not gather
statistics for INSERT INTO sales SELECT, even when sales is empty. However, the
database does gather statistics automatically for INSERT INTO sales PARTITION
(sales_q4_2000) SELECT.
• It is loaded using a multitable INSERT statement.


See Also:

• "Gathering Schema and Table Statistics"


• Oracle Database PL/SQL Packages and Types Reference to learn more about
DBMS_STATS.SET_TABLE_PREFS

10.3.3.5 Hints for Online Statistics Gathering for Bulk Loads


By default, the database gathers statistics during bulk loads. You can disable the feature at
the statement level by using the NO_GATHER_OPTIMIZER_STATISTICS hint, and enable the
feature at the statement level by using the GATHER_OPTIMIZER_STATISTICS hint.

For example, the following statement disables online statistics gathering for bulk loads:

CREATE TABLE employees2 AS
  SELECT /*+ NO_GATHER_OPTIMIZER_STATISTICS */ * FROM employees;

See Also:
Oracle Database SQL Language Reference to learn about the
GATHER_OPTIMIZER_STATISTICS and NO_GATHER_OPTIMIZER_STATISTICS hints

10.4 When the Database Gathers Optimizer Statistics


The database collects optimizer statistics at various times and from various sources.
This section contains the following topics:

10.4.1 Sources for Optimizer Statistics


The optimizer uses several different sources for optimizer statistics.
The sources are as follows:
• DBMS_STATS execution, automatic or manual
This PL/SQL package is the primary means of gathering optimizer statistics.
• SQL compilation
During SQL compilation, the database can augment the statistics previously gathered by
DBMS_STATS. In this stage, the database runs additional queries to obtain more accurate
information on how many rows in the tables satisfy the WHERE clause predicates in the
SQL statement.
• SQL execution
During execution, the database can further augment previously gathered statistics. In this
stage, Oracle Database collects the number of rows produced by every row source
during the execution of a SQL statement. At the end of execution, the optimizer
determines whether the estimated number of rows is inaccurate enough to warrant
reparsing at the next statement execution. If the cursor is marked for reparsing,
then the optimizer uses actual row counts from the previous execution instead of
estimates.
• SQL profiles
A SQL profile is a collection of auxiliary statistics on a query. The profile stores
these supplemental statistics in the data dictionary. The optimizer uses SQL
profiles during optimization to determine the most optimal plan.
The database stores optimizer statistics in the data dictionary and updates or replaces
them as needed. You can query statistics in data dictionary views.

See Also:

• "When the Database Samples Data"


• "About SQL Profiles"
• Oracle Database PL/SQL Packages and Types Reference to learn about
the DBMS_STATS.GATHER_TABLE_STATS procedure

10.4.2 SQL Plan Directives


A SQL plan directive is additional information and instructions that the optimizer can
use to generate a more optimal plan.
The directive is a “note to self” by the optimizer that it is misestimating cardinalities of
certain types of predicates, and also a reminder to DBMS_STATS to gather statistics
needed to correct the misestimates in the future. For example, when joining two tables
that have a data skew in their join columns, a SQL plan directive can direct the
optimizer to use dynamic statistics to obtain a more accurate join cardinality estimate.
This section contains the following topics:

10.4.2.1 When the Database Creates SQL Plan Directives


The database creates SQL plan directives automatically based on information learned
during automatic reoptimization. If a cardinality misestimate occurs during SQL
execution, then the database creates SQL plan directives.
For each new directive, the DBA_SQL_PLAN_DIRECTIVES.STATE column shows the value
USABLE. This value indicates that the database can use the directive to correct
misestimates.
The optimizer defines a SQL plan directive on a query expression, for example, filter
predicates on two columns being used together. A directive is not tied to a specific
SQL statement or SQL ID. For this reason, the optimizer can use directives for
statements that are not identical. For example, directives can help the optimizer with
queries that use similar patterns, such as queries that are identical except for a select
list item.
The Notes section of the execution plan indicates the number of SQL plan directives
used for a statement. Obtain more information about the directives by querying the
DBA_SQL_PLAN_DIRECTIVES and DBA_SQL_PLAN_DIR_OBJECTS views.


See Also:
Oracle Database Reference to learn more about DBA_SQL_PLAN_DIRECTIVES

10.4.2.2 How the Database Uses SQL Plan Directives


When compiling a SQL statement, if the optimizer sees a directive, then it obeys the directive
by gathering additional information.
The optimizer uses directives in the following ways:
• Dynamic statistics
The optimizer uses dynamic statistics whenever it does not have sufficient statistics
corresponding to the directive. For example, the cardinality estimates for a query whose
predicate contains a specific pair of columns may be significantly wrong. A SQL plan
directive indicates that whenever a query that contains these columns is parsed, the
optimizer needs to use dynamic sampling to avoid a serious cardinality misestimate.
Dynamic statistics have some performance overhead. Every time the optimizer hard
parses a query to which a dynamic statistics directive applies, the database must perform
the extra sampling.
Starting in Oracle Database 12c Release 2 (12.2), the database writes statistics from
adaptive dynamic sampling to the SQL plan directives store, making them available to
other queries.
• Column groups
The optimizer examines the query corresponding to the directive. If there is a missing
column group, and if the DBMS_STATS preference AUTO_STAT_EXTENSIONS is set to ON (the
default is OFF) for the affected table, then the optimizer automatically creates this column
group the next time DBMS_STATS gathers statistics on the table. Otherwise, the optimizer
does not automatically create the column group.
If a column group exists, then the next time this statement executes, the optimizer uses
the column group statistics in place of the SQL plan directive when possible (equality
predicates, GROUP BY, and so on). In subsequent executions, the optimizer may create
additional SQL plan directives to address other problems in the plan, such as join or
GROUP BY cardinality misestimates.

Note:
Currently, the optimizer monitors only column groups. The optimizer does not
create an extension on expressions.

When the problem that occasioned a directive is solved, either because a better directive
exists or because a histogram or extension exists, the DBA_SQL_PLAN_DIRECTIVES.STATE
value changes from USABLE to SUPERSEDED. More information about the directive state is
exposed in the DBA_SQL_PLAN_DIRECTIVES.NOTES column.


See Also:

• "Managing Extended Statistics"


• "About Statistics on Column Groups"
• Oracle Database PL/SQL Packages and Types Reference to learn more
about the AUTO_STAT_EXTENSIONS preference for
DBMS_STATS.SET_TABLE_STATS

10.4.2.3 SQL Plan Directive Maintenance


The database automatically creates SQL plan directives. You cannot create them
manually.
The database initially creates directives in the shared pool. The database periodically
writes the directives to the SYSAUX tablespace. The database automatically purges any
SQL plan directive that is not used after the specified number of weeks
(SPD_RETENTION_WEEKS), which is 53 by default.

You can manage directives by using the DBMS_SPD package. For example, you can:

• Enable and disable SQL plan directives (ALTER_SQL_PLAN_DIRECTIVE)


• Change the retention period for SQL plan directives (SET_PREFS)
• Export a directive to a staging table (PACK_STGTAB_DIRECTIVE)
• Drop a directive (DROP_SQL_PLAN_DIRECTIVE)
• Force the database to write directives to disk (FLUSH_SQL_PLAN_DIRECTIVE)

See Also:
Oracle Database PL/SQL Packages and Types Reference to learn about the
DBMS_SPD package

10.4.2.4 How the Optimizer Uses SQL Plan Directives: Example


This example shows how the database automatically creates and uses SQL plan
directives for SQL statements.

Assumptions
You plan to run queries against the sh schema, and you have privileges on this
schema and on data dictionary and V$ views.

To see how the database uses a SQL plan directive:


1. Query the sh.customers table.

SELECT /*+gather_plan_statistics*/ *
FROM customers
