Simple Performance Tuning Methodology

Thursday, September 3, 2015

Simple Performance Tuning Methodology

In this post I'd like to show a simple methodology for performance tuning using Wait-Event Analysis.
Logically, the process is very simple:

Determine the most significant bottleneck
Improve/Fix it
Repeat it until the performance is good enough

The question is, how can we find the most significant bottleneck for a specific session or for a specific SQL, or even for the entire instance?

Basically, as process in Oracle can be in one of the following states:

Consuming CPU - which is usually fine and useful
Non-Idle Wait event - Waiting for something (lock, latch, I/O)

For example: whenever a process is reading multi blocks in a single I/O operation (usually in cases of a Full Table Scan or Index Fast Full Scan), Oracle will report on a "db file scattered read" wait event
In order to get a list of all the Non-Idle wait events, execute the following query:

SELECT name

FROM v$event_name

WHERE wait_class <> 'Idle'

Idle Wait Event - Doesn't do anything, indication that the user session is inactive.

For example: whenever a server process is waiting for the user client to do something, Oracle will report on a "SQL*Net message from client"
In order to get a list of all the Idle wait events, execute the following query:

SELECT name

FROM v$event_name

WHERE wait_class = 'Idle'

Note: when a process is waiting for something it can also consume CPU, for example, in some cases of latch contentions, there will also be a high CPU consumption (due to the latch spinning).

So how can you measure the top non-idle wait events in the database?
There are severl moethods:

Oracle Dynamic Views:

V$SYSTEM_EVENT - Provides information on all the wait-events which occured in the instance since-startup. You can get all the non-idle wait events, ordered by the time (in seconds) using the following query:

SELECT event, time_waited / 100 "Seconds Waited"

FROM v$system_event

WHERE wait_class <> 'Idle'

ORDER BY 2 DESC

V$SESSION_EVENT - Provides information on all the wait-events which occudes in a specific session (relevant only for connected sessions, not historical sessions). You can get all the non-idle wait events for a specific session, ordered by the time (in seconds) using the following query:

SELECT event, time_waited / 100 "Seconds Waited"

FROM v$session_event

WHERE wait_class <> 'Idle' AND SID = <SID>

ORDER BY 2 DESC

AWR:

The AWR report also holds lots wait-events related information. However, usually the "Top 5 Timed Foreground Events" section in the beginning of the report will provide most of the important information. Following is a screenshot of this section in one of our Oracle instances:

So how can you measure the overall performance?
The answer is, "DB Time" - a statistic which represents the total amount of time that Oracle spent during CPU processing and waiting for non-idle wait events.
DB Time = CPU Time + Non Idle wait-events

Notes:

DB Time doesn't take into account background sessions. It takes into account foreground sessions only.
It doesn't take into account idle wait-events.

The DB Time could also be measured in a several ways:

Oracle Dynamic Views:

V$SYSSTAT - Displays instance-level statistics. You can view the value of the DB Time (since instance startup), using the following query:

SELECT value

FROM v$sysstat

WHERE name = 'DB time'

V$SESSTAT - Displays session-level statistics. You can view the value of the DB Time for a specific session (connected session only), using the following query:

SELECT value

FROM v$sesstat JOIN v$statname USING (statistic#)

WHERE name = 'DB time' AND sid= <SID>

AWR:
Obviously, the AWR report also hold information about the DB Time.
In the beginning of the report there's a summary of the DB Time for the AWR report duarion:

As you can see, this AWR report is based on a 2 snapshots which their distance is 1 hour - the first snapshot is taken at 13:00, and the second snapshot is taken at 16:00, therefore the value of "Elapsed" is 1 hour. You may now ask yourself, how is it possible that the value of DB time (3 hours) was higher than the enitre duration of the report (which is 1 hour). Is it possible that during a 1-hour period, Oracle spent 3 hours on cpu and non-idle wait events?!
The answer is: absolutely!
Since the DB Time equals to the total amount of CPU time + non-idle wait events, so in machines where you have more than 1 CPU core (in this particular example there are 4 CPU cores), the DB Tme can definitely be higher than the Elapsed Time.

For example:
If during a 10 second-period, all the 4 CPU cores were fully utilized, then the DB Time value will be 40 for those 10 second-period.

After we determined the most significant Wait-Events, i.e. those which impact the database mostly, how can we reduce the wait time or even totally avoid them?

The answer is that it mainly depends on your experience with performance tuning.
There are over 1000 wait-events in Oracle but the good news are the eventually there's a limited number of common wait events, and this number is much smaller than 1000, so eventually, you will be familiar with most of the common wait-events, and you'll know which actions you should take in order to solved them.
If you're encountering performance issues that are related to a specific wait-event which you are not familiar with, you can always ask our best friend, google :)

Why use Performance Dianostics tools?
Some tools can make our life easier by visualizing the workload of the instance.
Some tools can even display the workload filtered by a specific dimension or a combination of dimensions (e.g. SQL Statment, User, Program, Machine, PDB, Session, etc.).
Having the ability to filter the workload for a set of combinations allows performing advanced performance diagnostics research.
For example: viewing all the wait-events and statistics for all the SQL Statements that were executed by user HR using TOAD under PDB (Pluggable Database) PROD.

OEM ASH Analytics was introduced in OEM 12c and it allows you to perform OLAP operations on the database dimensions which makes the performance tuning process much simpler and faster.
Please note that in order to benefit this feature you must have the Enterprise Edition and also the tuning and diagnostics packs.

Foglight for Oracle also provides the ability to view a list of all the dimensions and drill-down to any combination of dimensions, as you can see in the below screenshot:

Foglight for Oracle comes with a full dimension stack:

Here is an example of how easily you can drill-down to all the SQL Statements that were executed from TOAD by the user SALES:

Conclusions:

DB Time is an important measurement which represents the total CPU + Non-Idle wait events of foreground sessions.
When a DBA wants to improve the performance, the best approach is to focus on the major/top wait-events and reduce/fix them in order to reduce the total DB time until the performance will be good enough.
Some tools (Oracle Enterprise Manager, Dell Foglight) allow visualizing the workload of the entire instance or the activity of a specific dimension/set of dimensions. This can reduce dramatically(!) the time it takes for the DBA identifying and solving these issues.