Using ASH Data To Solve A Performance Problem Using A Time Based-Like Approach

Posted on 23-March-2015 by Craig Shallahamer, craig@orapub.com

I'm going to show you how to use ASH data in a database time-like analysis. Sound dangerous? Perhaps, but doing this is important if you want an overview of your system before you drill down into session level details.

One of the benefits of an Oracle database time analysis is it naturally creates categories (that is, buckets) of time that attract my attention. If my objective is to reduce time, then finding big chunks of time presents me with opportunities...very specific opportunities. So, an Oracle time based analysis (OTBA) is a great strategy and works wonderfully in many situations. But not in all situations.

Oracle's active session history, or ASH for short, is sample based, not time based. Because of this sampling nature, it can capture and store much more information than Oracle's time based model. However, sampling can't provide precise timing information. In many situations this is not a problem, plus ASH provides some unique advantages over the time model that I'll blog about later.

In this article, my objective is to use ASH sample data to create an Oracle time based analysis-like report and drill down to create a list of spot-on solutions. It's super fast and, just like with an OTBA, it will highlight performance opportunities right before our eyes!

Using OraPub's BloodHound

In this post, I'm going to use my BloodHound tool. It's ASH based, free, and you can download it HERE. BloodHound has many unique and powerful features, but I'm not going to get into those in this post. If you are interested, just scan the readme.txt file in the BloodHound software distribution directory.

Set ASH Analysis Window

With BloodHound installed (super simple and fast), I need to point it to only the data I want to report on. Users were telling me, in their typical gentle and respectful way, that performance sucked on 01-Aug-2014 between 8am and 9am. Here's how I set the BloodHound "analysis window" when I'm pulling ASH from a remote database system into my local database for analysis. (You don't need to dig into the script below to benefit from this article.)

SQL> @bhset

The BloodHound scripts all pull ASH data from an ASH-like table.

You can set and reset the source details to whatever you want and whenever you want.

The setting remains until you disconnect, which means with each re-connect you must re-set.

ENTER the maximum of PQ slaves used by these tools. (default:0) 	: 5
ENTER the minimum reporting percentage threshold. (1=1%, default: 0.5)	: 
ENTER: Is the ASH data source local or remote? (default:local or remote): remote
Most DBAs will use either v$active_session_history or dba_hist_active_sess_history.

ENTER the remote BH reporting data source (default:v$active_session_history) : dba_hist_active_sess_history
ENTER: Is the data source based on v$active... or DBA_HIST... (default: v$ or dba) : dba
ENTER the remote database connection string (example: system/manager@prod30) : system/manager@prod40

Dropping interim table...
Gathering summary remote data information (this can take a while)...

Array fetch/bind size is 15. (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table OP_BH_REMOTE_INTERIM created.

   4 rows selected from system@prod40.
   4 rows inserted into OP_BH_REMOTE_INTERIM.
   4 rows committed into OP_BH_REMOTE_INTERIM at DEFAULT HOST connection.

Here is a summary of the available remote dba_hist_active_sess_history data.

      DBID INSTANCE_NUMBER MIN_SAMPLE_TIME	MAX_SAMPLE_TIME
---------- --------------- -------------------- --------------------
4217380439		 1 27-Dec-2014 21:59:27 04-Feb-2015 10:32:32
4243270474		 1 12-Jul-2014 02:00:05 11-Aug-2014 10:00:08
4243270474		 2 12-Jul-2014 02:00:07 11-Aug-2014 10:00:04
4243270474		 3 12-Jul-2014 02:00:08 11-Aug-2014 10:00:05

4 rows selected.


ENTER the DBID		  : 4243270474
ENTER the Instance Number : 3
ENTER BH reporting BEGIN date/time (fmt: DD-Mon-YYYY HH24:MI:SS, default: 23-Mar-2015 07:36:25) : 01-Aug-2014 08:00:00
ENTER BH reporting END	 date/time (fmt: DD-Mon-YYYY HH24:MI:SS, default: 23-Mar-2015 08:36:25) : 01-Aug-2014 09:00:00

Enabling super fast reporting by determining the begin and end snap IDs. This can take a while, but it's worth it...

Note: You can use bhcopydata.sql to copy the resulting data source rows into
a normal table for later analysis.

Done.

Dropping interim table...

Table dropped.

Copying remote data...

Array fetch/bind size is 1000. (arraysize is 1000)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table OP_BH_REMOTE_INTERIM created.

   5651 rows selected from system@prod40.
   5651 rows inserted into OP_BH_REMOTE_INTERIM.
   5651 rows committed into OP_BH_REMOTE_INTERIM at DEFAULT HOST connection.

There will be 5651 DBA_HIST ASH rows available during the analysis processing.

Until you reset (@bhset.sql) or disconnect from Oracle, all BH reports will report
from the just created op_bh_remote_interim table which contains data
from the remote table from 01-Aug-2014 08:00:00 (032048) to 01-Aug-2014 09:00:00 (032051).

If this is not what you want, start bhset.sql again.

Group Session Activity In A Time-Like Format

Now I want to group session samples in a way that will highlight performance improving opportunities and also provide a natural analysis drill-down path. Notice how I grouped the ASH samples.

SQL> @bhsum flags % 

Database: prod40                                                            23-Mar 09:54am
Report:   bhsum.sql             BloodHound by OraPub, Inc.                  Page         1

                    Profile Summary - ASH Sample Based Analysis Report
            (from 01-Aug-2014 08:00:00 to 01-Aug-2014 09:00:00, flags like %%)

   % CPU   % Wait % Wait IO % Wait Other
-------- -------- --------- ------------
   52.47    47.53     90.05         9.95

Wait Activity Detail                   % Wait Samples % All Samples
-------------------------------------- -------------- -------------
db file sequential read                         80.45         38.24
gc cr grant 2-way                                3.93          1.87
read by other session                            2.48          1.18
enq: TX - row lock contention                    2.20          1.05
log file sync                                    2.01          0.95
direct path read                                 1.77          0.84
db file parallel read                            1.73          0.82
gc current block 2-way                           1.18          0.56
db file scattered read                           1.02          0.49

There were 5348 filtered samples available and used for this analysis.
Only wait events with % Wait Samples >= 0.5% are displayed.
The BH/ASH data source is op_bh_remote_interim

You can see I categorized the ASH samples, at the highest level, into two groups: CPU and WAIT. Because ASH analysis is NOT based on time, I should never say "percent of time." Instead, I say "percent of samples." In the above situation, 52% of the ASH samples relate to active Oracle sessions that were on CPU, and 48% of the samples are from active sessions that were waiting.

While Oracle's time based model measures how much CPU time a session actually consumed, ASH only knows the session is "ON CPU." In reality, ASH does not know if the "ON CPU" session is consuming CPU or sitting in the OS CPU run queue ready to consume CPU. This is why some ASH reports will show "CPU and waiting on CPU" because ASH can't tell the difference. But I digress...
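
For readers who want to see the mechanics, here is a minimal sketch of how ASH samples can be bucketed into CPU versus wait, and IO versus non-IO waits, for a given window. This is not bhsum.sql; it simply assumes direct access to dba_hist_active_sess_history and uses this article's analysis window (DBID and instance filters omitted for brevity).

-- Sketch only: bucket ASH samples into CPU vs. wait, and IO vs. other waits.
-- Assumes dba_hist_active_sess_history; add dbid/instance_number filters as needed.
select
  round(100 * sum(case when session_state = 'ON CPU'  then 1 else 0 end) / nullif(count(*),0), 2) pct_cpu,
  round(100 * sum(case when session_state = 'WAITING' then 1 else 0 end) / nullif(count(*),0), 2) pct_wait,
  round(100 * sum(case when session_state = 'WAITING' and wait_class like '%I/O%' then 1 else 0 end)
      / nullif(sum(case when session_state = 'WAITING' then 1 else 0 end), 0), 2) pct_wait_io
from   dba_hist_active_sess_history
where  sample_time between to_date('01-Aug-2014 08:00:00','DD-Mon-YYYY HH24:MI:SS')
                       and to_date('01-Aug-2014 09:00:00','DD-Mon-YYYY HH24:MI:SS');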

If our objective is to reduce database time, which includes Oracle process CPU consumption, and 52% of the active session samples were "ON CPU", then we have a potential performance improving opportunity before us. More later on this.

We have also been presented with another performance improving opportunity because 47.53% of the ASH samples are from Oracle sessions that were waiting. Moving to the next level down, of the 48% of samples found waiting, 90% of those were waiting on IO related wait events! And looking at the wait event details, clearly the opportunity before us is with db file sequential read[s], that is, Oracle synchronous single block physical reads.

To summarize, we have two drill paths before us; CPU and IO reads.

CPU Category Drill-Down

Since 52% of the session samples were "on CPU", we need to explore this further, looking for opportunities to potentially reduce Oracle process CPU consumption. While I am not detailing this, I did check operating system CPU utilization. I discovered that during this one hour snapshot interval, the average CPU utilization was 27%. This means there is plenty of CPU power available.

We can use the available CPU power to reduce the elapsed time of long running, CPU intensive SQL statements by parallelizing them. Also, to make OLTP SQL more snappy, I'll want to find CPU intensive SQL statements and tune them by focusing on reducing their CPU consumption, which essentially means reducing their buffer gets (logical IO).

Here's how I did this using BloodHound. (Inside the below BloodHound script are many examples and notes.)

SQL> @bhtopcolcpu sql_id,sql_plan_hash_value sample_id %

Database: prod40                                                  23-Mar 10:13am
Report:   bhtopcolcpu.sql  BloodHound by OraPub, Inc.             Page         1

When On CPU Show Top sql_id,sql_plan_hash_value - ASH Sample Based Analysis Repo
 (from 01-Aug-2014 08:00:00 to 01-Aug-2014 09:00:00, where: sample_id filter:%)

                                         % Samples
SQL_ID               SQL_PLAN_HASH_VALUE    ON CPU
-------------------- ------------------- ---------
1r0xtxsud9g2m                 2226130904     23.96
                                       0      9.60
4tdpuxb82b7ju                 3939892812      3.22
cbwt5v4r6xyj0                 3939892812      2.45
2gjhuu71dp34j                 2770779293      2.32
gcb5z9gj5m8wv                 3939892812      2.28
chpa6821mm5h6                 2330618605      2.08
225r3hukpy4md                 2329100176      1.91
96086s7y49czb                 1120174381      1.78
...

Nice! Clearly, the top CPU consuming SQL statement has been identified. Push that on your mental stack and let's move forward.
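
If you don't have BloodHound handy, a roughly equivalent "top SQL when on CPU" breakdown can be pulled straight from the DBA_HIST ASH table. This is just an illustrative sketch under that assumption, not the actual bhtopcolcpu.sql:

-- Sketch only: % of "ON CPU" samples by sql_id, sql_plan_hash_value.
select sql_id, sql_plan_hash_value,
       round(100 * count(*) / sum(count(*)) over (), 2) pct_samples_on_cpu
from   dba_hist_active_sess_history
where  session_state = 'ON CPU'
and    sample_time between to_date('01-Aug-2014 08:00:00','DD-Mon-YYYY HH24:MI:SS')
                       and to_date('01-Aug-2014 09:00:00','DD-Mon-YYYY HH24:MI:SS')
group by sql_id, sql_plan_hash_value
order by count(*) desc;

Swap the grouping columns for module, action and you get the module-level view shown next.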

I know that this application is instrumented rather well using the module and action columns. Let's run the BloodHound report again, but this time focusing on the module and action.

I slightly modified the report output below to hide the production system's identity. I've done this throughout this article.

SQL> @bhtopcolcpu module,action sample_id %

Database: prod40                                                                      23-Mar 10:30am
Report:   bhtopcolcpu.sql            BloodHound by OraPub, Inc.                       Page         1

                When On CPU Show Top module,action - ASH Sample Based Analysis Report
           (from 01-Aug-2014 08:00:00 to 01-Aug-2014 09:00:00, where: sample_id filter:%)

                                                                                  % Samples
MODULE                                   ACTION                                      ON CPU
---------------------------------------- ---------------------------------------- ---------
ABC_12_INV_TXN_SRVCOPER                  OnBootup                                     32.26
                                                                                       6.18
PSQRYSRV@abcpab675 (TNS V1-V3)           xyzzy                                         5.94
PSQRYSRV@abcpab668 (TNS V1-V3)           xyzzy                                         4.37
PURCHASE_ORDER                           PO_LINE                                       3.63
AtPar_WindowsService.exe                                                               3.36
PV_REQ_STATUS                            PV_REQ_STATUS                                 3.12
PSQRYSRV@abcpab667 (TNS V1-V3)           xyzzy                                         2.62

You've got to love this! 32% of the on-CPU samples are focused on this "ABC" module's "OnBootup" action. Now I'll profile this module using BloodHound's profile summary report.

SQL> @bhsum  module ABC_12_INV_TXN_SRVCOPER

Database: prod40                                                            23-Mar 10:35am
Report:   bhsum.sql             BloodHound by OraPub, Inc.                  Page         1

                    Profile Summary - ASH Sample Based Analysis Report
 (from 01-Aug-2014 08:00:00 to 01-Aug-2014 09:00:00, module like ABC_12_INV_TXN_SRVCOPER%)

   % CPU   % Wait % Wait IO % Wait Other
-------- -------- --------- ------------
   99.18     0.82    100.00         0.00

Wait Activity Detail                   % Wait Samples % All Samples
-------------------------------------- -------------- -------------
db file sequential read                         66.67          0.54
log file sync                                   33.33          0.27

Can't get much clearer than this! The ABC_12_INV_TXN_SRVCOPER module is all about CPU. Keep this in your mental stack while we shift our focus to the wait situation.

Wait Time Category Drill-Down

Now I'll get the top db file sequential read wait SQL. If you look at the BloodHound report below, you can "read aloud" what I'm trying to do: Give me the top SQL_ID, SQL_PLAN_HASH_VALUE where the wait event is db file sequential read.

SQL> @bhtopcolwait sql_id,sql_plan_hash_value event db%file%seq%read

Database: prod40                                                                                23-Mar 10:46am
Report:   bhtopcolwait.sql                BloodHound by OraPub, Inc.                            Page         1

              When Waiting Show Top sql_id,sql_plan_hash_value - ASH Sample Based Analysis Report
           (from 01-Aug-2014 08:00:00 to 01-Aug-2014 09:00:00, where: event filter:db%file%seq%read)

                                         % Samples
SQL_ID               SQL_PLAN_HASH_VALUE    Waited
-------------------- ------------------- ---------
cbwt5v4r6xyj0                 3939892812     37.46
1r0xtxsud9g2m                 2226130904     18.93
0m0cvcbjsgwft                  716627210      4.95
8sksw60nkukuf                 1010090860      3.65
8sksw60nkukuf                 4188348960      2.88
d5fjz6usdbhdr                  124339810      2.14
...

Bingo! We found the clear top sequential read wait SQL statement.
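
For those working without BloodHound, the same "read aloud" question can be asked of the raw DBA_HIST ASH data with a sketch like this (an illustration only, not the actual bhtopcolwait.sql):

-- Sketch only: % of 'db file sequential read' wait samples by sql_id, sql_plan_hash_value.
select sql_id, sql_plan_hash_value,
       round(100 * count(*) / sum(count(*)) over (), 2) pct_samples_waited
from   dba_hist_active_sess_history
where  session_state = 'WAITING'
and    event = 'db file sequential read'
and    sample_time between to_date('01-Aug-2014 08:00:00','DD-Mon-YYYY HH24:MI:SS')
                       and to_date('01-Aug-2014 09:00:00','DD-Mon-YYYY HH24:MI:SS')
group by sql_id, sql_plan_hash_value
order by count(*) desc;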

Just like with the CPU situation, let's look at this situation from a module and action perspective. Again, if you look at the BloodHound report below, you can "read aloud" what I'm trying to do: Give me the top module, action where the wait event is db file sequential read.

SQL> @bhtopcolwait module,action event db%file%seq%read

Database: prod40                                                                                23-Mar 10:48am
Report:   bhtopcolwait.sql                BloodHound by OraPub, Inc.                            Page         1

                    When Waiting Show Top module,action - ASH Sample Based Analysis Report
           (from 01-Aug-2014 08:00:00 to 01-Aug-2014 09:00:00, where: event filter:db%file%seq%read)

                                                              % Samples
MODULE                         ACTION                            Waited
------------------------------ ------------------------------ ---------
PURCHASE_ORDER                 PO_LINE                            28.23
ABC_12_INV_TXN_SRVCOPER        OnBootup                           18.48
PSQRYSRV@abcpab675 (TNS V1-V3) xyzzy                               9.89
ATM_AVAIL_INV                  ATM_SUMMARY_INV                     7.50
RECV_PO                        xyzzy                               7.07
PV_REQUISITIONS                PV_REQ_FORM                         6.82
...

Again, clean reporting makes this obvious. The top module and action is the purchase order module's po_line action. Notice the second module and action? It's the top CPU module and action! (I love it when this happens!)

Now let's get an overall summary for the top sequential read wait module. Looking at the BloodHound report below, you can "read aloud" what I'm trying to do: Give me a time based-like summary when the module is "PURCHASE_ORDER".

SQL> @bhsum  module PURCHASE_ORDER

Database: prod40                                                            23-Mar 10:59am
Report:   bhsum.sql             BloodHound by OraPub, Inc.                  Page         1

                    Profile Summary - ASH Sample Based Analysis Report
     (from 01-Aug-2014 08:00:00 to 01-Aug-2014 09:00:00, module like PURCHASE_ORDER%)

   % CPU   % Wait % Wait IO % Wait Other
-------- -------- --------- ------------
   19.57    80.43     89.47        10.53

Wait Activity Detail                   % Wait Samples % All Samples
-------------------------------------- -------------- -------------
db file sequential read                         89.47         18.28
Disk file operations I/O                         5.26          1.08
library cache lock                               2.63          0.54
gc cr grant 2-way                                2.63          0.54
...

We aren't done yet with regard to IO. We need to understand if the IO subsystem is getting overwhelmed with IO calls. ASH tells us what a session was waiting for, but not the wait duration.

One way to get clues about the IO subsystem's read responsiveness is to look at the wait times for the db file sequential read wait event. If the average wait time surpasses your comfort threshold, that's a clue the IO subsystem is unable to respond to IO read calls quickly enough. But this all depends on your SLA.

Assuming 10ms is your threshold and the average sequential read wait time is, say, 18ms, then performance can be improved by increasing the IO subsystem's read capacity, tuning SQL_ID cbwt5v4r6xyj0, and looking at what's going on inside the PURCHASE_ORDER module.
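
Since the ASH samples alone won't give you that average, one place to get it is the AWR system event deltas. Here is a minimal sketch, assuming the begin and end snap IDs (32048 and 32051) that bhset reported above bound our analysis window:

-- Sketch only: average 'db file sequential read' wait time (ms) over the window,
-- from dba_hist_system_event deltas. The snap IDs, DBID and instance number are
-- assumptions taken from the bhset output earlier in this article.
select round( (e.time_waited_micro - b.time_waited_micro)
            / nullif(e.total_waits - b.total_waits, 0) / 1000, 2) avg_wait_ms
from   dba_hist_system_event b,
       dba_hist_system_event e
where  b.event_name      = 'db file sequential read'
and    e.event_name      = b.event_name
and    b.dbid            = 4243270474
and    e.dbid            = b.dbid
and    b.instance_number = 3
and    e.instance_number = b.instance_number
and    b.snap_id         = 32048
and    e.snap_id         = 32051;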

Summary & To Do List

In this article, I set up the analysis just like I do when performing an Oracle Time Based Analysis (OTBA). This is also called a database time or DB Time analysis. But instead of using Oracle's time model, I used ASH data. And my BloodHound ASH based toolkit helped make this very easy.

Specifically in regards to this system, here is my initial set of solutions to evaluate, rank, and potentially implement:

1. Tune the top CPU consuming SQL statement (SQL_ID 1r0xtxsud9g2m), focusing on reducing its buffer gets, and consider parallelizing long running CPU intensive SQL since there is plenty of available CPU power (27% average utilization).

2. Investigate the ABC_12_INV_TXN_SRVCOPER module's OnBootup action, which dominates the on-CPU samples.

3. Tune the top db file sequential read SQL statement (SQL_ID cbwt5v4r6xyj0).

4. Investigate the PURCHASE_ORDER module's PO_LINE action, the top db file sequential read waiter.

5. Check the IO subsystem's single block read response time and, if it exceeds your threshold, consider increasing the IO subsystem's read capacity.

Oracle Active Session History (ASH) data can be used in a variety of ways. What I did was focus on only one way to use the data: relating it to a time based analysis approach.

Pretty cool, eh?

Thanks for reading and enjoy your work!

Craig.

Start my FREE 18 lesson Machine Learning For Oracle Professionals E-Course here.


Craig Shallahamer is a long time Oracle DBA who specializes in predictive analytics, machine learning and Oracle performance tuning. Craig is a performance researcher and blogger, consultant, author of two books, an enthusiastic conference speaker, a passionate teacher, and an Oracle ACE Director. More about Craig Shallahamer...


If you have any questions or comments, feel free to email me directly at craig at orapub.com.
