Oracle AWR (Automatic Workload Repository) Trending

By Kapil Goyal

Trending is very important for database performance analysis. It exposes the performance profile of a database in terms of IO, CPU usage, wait-event response time, etc. Starting in Oracle Database 10g, the AWR performance data that is collected and stored out of the box is very helpful for creating historical and comparative trend analyses to gain insight into critical database performance issues. AWR provides a rich history of Oracle performance statistics that shows how an Oracle database system has been trending for as long as the AWR data is retained.

So Where Do We Start?

Any statistic in AWR can be trended quite easily. The AWR report consists of SQLs running against the various "DBA_HIST_%" views, taking the difference between two snapshots at a time. This implies that we can develop and execute similar SQL against those views to report the trend for any required statistic or database wait event.

When analyzing AWR reports covering the time a performance problem manifested, and some wait event or response time figures seem high or a new set of SQL statements comes up, it is always a good idea to go back in history and check how the identified events, statistics, SQL or job behaved during similar times previously; that could be yesterday, last week or even last month. Of course, this comparison should be made against a similar period (same load), just to rule out whether that average response time is normal for this database. If a job had the same response time earlier and users were not complaining, it usually means that the problem is somewhere else and we need to dig further, probably using a specific SQL trace for that portion of the application.

Side Note: AWR is not a replacement for real-time monitoring; it contains historical data and can be used to investigate what happened or what caused a performance issue. One important difference between STATSPACK in Oracle9i and AWR in Oracle Database 10g is that 9i exposes the source code for STATSPACK while 10g does not. However, you can use most of the 9i scripts to understand the AWR structures, making it easier to write queries against the 10g DBA_HIST* tables. For your information, the STATSPACK source code in 9i can be seen in the $ORACLE_HOME/rdbms/admin/sprepins.sql file. This file provides a fair idea of the "STATS$%" tables used to store the data and generate STATSPACK reports. In most cases, you can use the same SQL and replace the corresponding "STATS$%" tables with "DBA_HIST_%" tables.

What is AWR?

AWR takes periodic snapshots of performance statistics (by default, every hour) and exposes them through the DBA_HIST* views. Please note that AWR is licensed under the Diagnostic Pack; hence, even accessing these views directly requires licenses to be purchased. Oracle Database 11g further expands on this. For example, there are 79 DBA_HIST* views available in Oracle 10g (10.2.0.4):

SQL> select count(*) from dictionary where table_name like 'DBA_HIST%';

  COUNT(*)
----------
        79

On the other hand, there are 100 DBA_HIST* views available in Oracle 11g (11.1.0.7):

SQL> select count(*) from dictionary where table_name like 'DBA_HIST%';

  COUNT(*)
----------
       100

How Much Space Does AWR Use?

The following query shows the space currently occupied by AWR data:

SQL> select space_usage_kbytes from v$sysaux_occupants where occupant_name like '%AWR%';

SPACE_USAGE_KBYTES
------------------
         4,384,640

1 row selected.

Note that this size will vary depending on the retention period, the snapshot frequency, the number of datafiles, etc. You can monitor this space in a development/test instance to project space requirements in case you want to increase the retention period in production.

Retention

You can use the following query to see the current snapshot interval and retention policy:

select extract(day from snap_interval)*24*60 +
       extract(hour from snap_interval)*60 +
       extract(minute from snap_interval) "Snapshot Interval",
       extract(day from retention)*24*60 +
       extract(hour from retention)*60 +
       extract(minute from retention) "Retention Interval(in Minutes)",
       extract(day from retention) "Retention (in Days)"
from   dba_hist_wr_control;

Snapshot Interval Retention Interval(in Minutes) Retention (in Days)
----------------- ------------------------------ -------------------
               30                        129,600                  90
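If these defaults need to change, the snapshot interval and retention period can be adjusted with Oracle's documented DBMS_WORKLOAD_REPOSITORY package. A sketch with example values (both parameters are in minutes; the values below are illustrative, not recommendations):

```sql
-- Sketch: keep 90 days (129,600 minutes) of AWR data and take
-- snapshots every 30 minutes. Requires the Diagnostic Pack license.
begin
  dbms_workload_repository.modify_snapshot_settings(
    retention => 129600,
    interval  => 30);
end;
/
```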
The Time Model Statistics is a great feature in Oracle Database 10g. It tells us where exactly the time is being spent. Industry performance experts like Anjo Kolk and Cary Millsap (method-r.com) have always talked about response time, and that is exactly what matters:

Response time = service time + wait time

Time Model Statistics is based on response time as well, as it shows the time spent in database calls broken down by operation type, such as DB time, sql execute elapsed time, DB CPU, parsing, and hard parsing. If you are comparing two AWR reports for a good and a bad period and you don't see much difference in the "DB time" statistic, then most likely the issue is not within the database but somewhere else (such as the client side or the middleware/application layer).

The most important statistic of the time model is DB time. This statistic represents the total time spent in database calls and is an indicator of the total instance workload. It is calculated by aggregating the CPU and wait times of all sessions not waiting on idle wait events (non-idle user sessions).

If the load increases on the system, then DB time increases: more users mean more database calls and hence higher DB time. If performance degrades, such as higher IO time or wait time, then DB time also increases (usually because many more sessions are waiting on non-idle events), and the focus should be on where the time is being spent and how we can reduce it.

The following script can be used to see the trend for Time Model Statistics. It can also be used to see when the database was busiest: just look for the highest DB time.

time_model.sql

alter session set nls_date_format='dd-mon-yy';
set lines 150 pages 1000
col date_time heading 'Date time' for a25
col stat_name heading 'Statistics Name' for a25
col time heading 'Time (s)' for 99,999,999,999
prompt "Enter the date in DD-Mon-YY Format and Stats you want to trend like 'DB time', 'DB CPU', 'sql execute elapsed time', 'PL/SQL execution elapsed time', 'parse time elapsed', 'background elapsed time'"

Sample Output

SQL> @"time_model.sql"
"Enter the date in DD-Mon-YY Format and Stats you want to trend like 'DB time', 'DB CPU', 'sql execute elapsed time', 'PL/SQL execution elapsed time', 'parse time elapsed', 'background elapsed time'"
Enter value for date: 29-jan-10
old 11: trunc(sn.begin_interval_time) ='&Date'
new 11: trunc(sn.begin_interval_time) ='29-jan-10'
Enter value for stat_name: DB time
old 17: and st.stat_name='&stat_name'
new 17: and st.stat_name='DB time'

Date time                 Statistics Name                  Time (s)
------------------------- ------------------------- ---------------
01/29/10_01_00_02_00      DB time                            45,789
01/29/10_02_00_03_00      DB time                            54,308
01/29/10_03_00_04_00      DB time                            62,480
01/29/10_04_00_05_00      DB time                            57,677
01/29/10_05_00_06_00      DB time                            80,028
01/29/10_06_00_07_00      DB time                            76,765
01/29/10_07_00_08_00      DB time                            45,680
01/29/10_08_00_09_00      DB time                            53,926
01/29/10_09_00_10_00      DB time                            45,778
01/29/10_10_00_11_00      DB time                            40,402
01/29/10_11_00_12_00      DB time                            39,801
01/29/10_12_00_13_00      DB time                            31,821
01/29/10_13_00_14_00      DB time                            17,866
01/29/10_14_00_15_00      DB time                            20,888
01/29/10_15_00_16_00      DB time                            18,000

The above data shows the instance was busiest between 05:00 and 06:00 a.m.

The following is a sample AWR report header where the application was spending most of its time on CPU:

            Snap Id    Snap Time          Sessions Curs/Sess
          --------- ------------------- -------- ---------
Begin Snap:   14137 23-Nov-09 10:15:18       414     175.7
  End Snap:   14138 23-Nov-09 10:18:33       425     175.0
   Elapsed:               3.25 (mins)
   DB Time:             125.30 (mins)
As you can see, in only 3.25 minutes (195 seconds) of wall clock time it chewed up 5469 CPU seconds. By looking at this number, you can estimate the minimum number of CPUs on this box. It is really nice to play with these numbers and draw some conclusions: to consume 5469 CPU seconds in 195 seconds, we must have at least 5469/195 = 28.046 CPUs. Generally, I would expect it to be a 32-CPU box, and that is consistent with what the "Operating System Statistics" section of the same AWR report shows below (BUSY_TIME + IDLE_TIME = 596,848 + 26,307 = 623,155 centiseconds, about 6,232 CPU seconds in 195 seconds of wall clock time, or roughly 32 CPUs).

Operating System Statistics

Statistic                             Total
-------------------------------- ----------
AVG_BUSY_TIME                        18,644
AVG_IDLE_TIME                           816
AVG_IOWAIT_TIME                         558
AVG_SYS_TIME                          2,017
AVG_USER_TIME                        16,622
BUSY_TIME                           596,848
IDLE_TIME                            26,307
IOWAIT_TIME                          18,141
SYS_TIME                             64,732
USER_TIME                           532,116
LOAD                                      0
OS_CPU_WAIT_TIME                    663,600

Want to Know if Execution Plan Changed Recently?

In my experience, a significant percentage of sudden performance degradations occur because the SQL execution plan for one or more key SQL queries changes. This usually shows up in cases where clients state that nothing has changed (no new code, no load change), yet performance has degraded drastically and queries are performing poorly. Whenever I conduct a performance analysis and identify that performance degraded because of one or a few SQLs, I always try to find out why that particular SQL statement is consuming higher logical I/O (usually along with increased physical I/O and CPU usage) compared to when it was running fine. The following SQL tells me exactly when the execution plan changed for a given SQLID.

sqlid_stat.sql

set lines 150 pages 150
col BEGIN_INTERVAL_TIME for a23
col PLAN_HASH_VALUE for 9999999999
col date_time for a18
col snap_id heading 'SnapId'
col executions_delta heading "No. of exec"
col sql_profile heading "SQL|Profile" for a7
col date_time heading 'Date time'
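The SELECT body of sqlid_stat.sql is not reproduced in this excerpt; a sketch of a query producing the per-execution columns shown in the sample output that follows could look like the one below (the column expressions and the join are my assumptions, not the author's exact script; time deltas in DBA_HIST_SQLSTAT are in microseconds):

```sql
-- Sketch only: per-snapshot AWR statistics for one SQL_ID;
-- a change in PLAN_HASH_VALUE between snapshots marks a plan change.
select ss.snap_id,
       ss.plan_hash_value,
       to_char(sn.begin_interval_time, 'mm/dd/yy_hh24mi')                  date_time,
       ss.executions_delta,
       round(ss.buffer_gets_delta       / nullif(ss.executions_delta, 0), 2) "LIO/exec",
       round(ss.cpu_time_delta / 1e6    / nullif(ss.executions_delta, 0), 2) "CPUTIM/exec",
       round(ss.elapsed_time_delta / 1e6/ nullif(ss.executions_delta, 0), 2) "ETIME/exec",
       round(ss.disk_reads_delta        / nullif(ss.executions_delta, 0), 2) "PIO/exec",
       round(ss.rows_processed_delta    / nullif(ss.executions_delta, 0), 2) "ROWs/exec"
from   dba_hist_sqlstat  ss,
       dba_hist_snapshot sn
where  ss.sql_id          = '&SQLID'
and    sn.snap_id         = ss.snap_id
and    sn.dbid            = ss.dbid
and    sn.instance_number = ss.instance_number
order  by ss.snap_id;
```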
Sample Output

SQL> @sqlid_stat.sql
Enter value for sqlid: 0pjnz23mbf3wm
old 23: ('&SQLID')
new 23: ('0pjnz23mbf3wm')
SnapId PLAN_HASH_VALUE Date time No. of exec LIO/exec CPUTIM/exec ETIME/exec PIO/exec ROWs/exec
------------ ----------------- -------------------- ------------- ----------------- ------------- ------------- ------------- -------------
105152 1415312706 12/31/09_0450_0500 1277 14.40 .00 .00 .00 9.23
105459 1415312706 01/02/10_0800_0810 166 20.68 .00 .01 .34 1.00
105460 1415312706 01/02/10_0810_0820 444 11.24 .00 .00 .20 .97
105461 1415312706 01/02/10_0820_0830 1081 13.84 .00 .00 .21 1.18
105462 1415312706 01/02/10_0830_0840 1239 16.59 .00 .00 .13 2.03
105464 1415312706 01/02/10_0850_0900 1194 16.75 .00 .00 .09 3.10
105465 1415312706 01/02/10_0900_0910 610 16.87 .00 .00 .19 7.08
105466 1415312706 01/02/10_0910_0920 673 6.94 .00 .00 .15 6.70
105470 1415312706 01/02/10_0950_1000 1909 14.46 .00 .00 .12 3.20
105471 1415312706 01/02/10_1000_1010 2242 16.68 .00 .00 .16 3.72
105472 1415312706 01/02/10_1010_1020 3030 16.72 .00 .00 .08 3.75
105473 1415312706 01/02/10_1020_1030 51 16.98 .00 .00 .98 224.00
105689 1415312706 01/03/10_2220_2230 2000 3.30 .00 .00 .00 .00
105746 1415312706 01/04/10_0750_0800 1458 6.38 .00 .00 .00 .42
105850 1415312706 01/05/10_0110_0120 46 13.59 .00 .00 .00 .02
105851 1415312706 01/05/10_0120_0130 220 3.14 .00 .00 .00 .01
105853 1415312706 01/05/10_0140_0150 2 253.50 .05 .05 .00 1.00
105854 1415312706 01/05/10_0150_0200 9 17.00 .00 .00 .00 1.22
105855 1415312706 01/05/10_0200_0210 1002 3.13 .00 .00 .00 .02
105881 1415312706 01/05/10_0620_0630 1569 11.79 .00 .00 .00 1.19
105987 731370628 01/06/10_0000_0010 118 10933.76 1.32 1.32 .00 .03
105988 731370628 01/06/10_0010_0020 965 10978.74 1.33 1.33 .00 .03
105989 731370628 01/06/10_0020_0030 2658 10717.09 1.31 1.31 .00 .01
105990 731370628 01/06/10_0030_0040 4019 10831.99 1.31 1.34 .00 .01
105991 731370628 01/06/10_0040_0050 4577 10715.70 1.30 1.33 .00 .01
105992 731370628 01/06/10_0050_0100 3488 10891.26 1.33 1.34 .00 .01
105993 731370628 01/06/10_0100_0110 2709 10833.78 1.32 1.32 .00 .09
xp_awr.sql
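The xp_awr.sql script itself is not reproduced in this excerpt. To display the historical execution plans that AWR captured for a SQL_ID, Oracle's documented DBMS_XPLAN.DISPLAY_AWR function can be used; a minimal sketch:

```sql
-- Sketch: show every plan recorded in AWR for a given SQL_ID
-- (one section per PLAN_HASH_VALUE; Diagnostic Pack license required).
select plan_table_output
from   table(dbms_xplan.display_awr('&sql_id'));
```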
We should now ask the question, "Why did the plan change?" Usually, the first suspect is optimizer statistics. Statistics collection jobs generally run at night, so it is quite possible that statistics were not collected for one or more tables but were completed for the other tables involved in the query. In this particular case, that was indeed the reason: multiple statistics collection jobs were running with different parameters at the same time. However, it could also have been a large data load into the tables involved in the query; this can be investigated further.

In many cases, for accurate measurement, we will still need to use 10046 traces or perhaps ASH (Active Session History) to see what a session was doing in the past, so it all depends on the problem.
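To check that first suspect, one option is to look at when optimizer statistics were last gathered on the tables involved in the query. A sketch (the owner and table names are placeholders, not from this case):

```sql
-- Sketch: when were statistics last gathered on the tables in the query?
select table_name, num_rows, last_analyzed
from   dba_tab_statistics
where  owner = '&OWNER'
and    table_name in ('&TABLE1', '&TABLE2')
order  by last_analyzed nulls first;
```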