Inferring SQL Run Time Using ASH PX_FLAGS Filtering And Max/Min SAMPLE_TIME

Posted on 23-July-2019 by Craig Shallahamer / OraPub / craig@orapub.com

As I wrote in my previous article, with ASH data it is very easy to infer SQL statement database time. However, it is not so simple to infer SQL statement run time. The challenge occurs when parallel query is involved.

In this second article, there are two distinct parts. In the first part, the focus is on using the ASH PX_FLAGS column to filter out PQ sessions. The second part will focus on using the maximum and minimum ASH sample times to infer the SQL run time.

As with the program column filtering strategy, this px_flags strategy is not as perfect as you may expect. For my purposes, the clear winner is the max/min sample time strategy!

Enjoy the read!

SQL Statement Run Time

While inferring database time using ASH samples is very straightforward, inferring run time is not.

That's because database time includes ALL the CPU and wait time for all the sessions involved, including any Oracle parallel query (PQ) session database time. When PQ is involved, database time and run time diverge.
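
As a quick refresher on the database time side, here is a minimal sketch of the inference, assuming the standard 10-second AWR sample interval and a SQL*Plus define, mysql_id, holding the SQL_ID of interest. Each retained AWR ASH sample represents roughly 10 seconds of database time, PQ slave sessions included.

def mysql_id='b62zukh71caj7'

-- Hedged sketch: infer database time (in seconds) per execution by
-- counting AWR ASH samples and multiplying by the assumed 10-second
-- sample interval. This counts ALL sessions, PQ slaves included.
select sql_exec_id,
       count(*)      samples,
       count(*) * 10 db_time_sec
from   DBA_HIST_ACTIVE_SESS_HISTORY
where  sql_id = '&mysql_id'
group by sql_exec_id
/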

Here's an example I made up.

DB Time   PQs   Run Time
100        0    100
100        2     50
100       10     10

So, one strategy to determine a statement's run time is to exclude all PQ sessions from the ASH sample count. But this turned out to be much more complicated than I expected.
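
To ground that, here is a minimal sketch of the program-column filter from my previous article, assuming PQ slave sessions show up with a program like "oracle@host (Pnnn)":

def mysql_id='b62zukh71caj7'

-- Hedged sketch: count only the samples whose program does NOT look
-- like a PQ slave process. The pattern is an assumption and can both
-- over- and under-match, which is part of the complication.
select count(*) all_samples,
       sum(case when program like '%(P%)%' then 0 else 1 end) non_pq_samples
from   DBA_HIST_ACTIVE_SESS_HISTORY
where  sql_id = '&mysql_id'
/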

Complicated? Yes... Read on.

What About The ASH PX_FLAGS Column?

A colleague mentioned there is a px_flags column in ASH and wondered if it could be used to filter out the PQ active sessions.

What?! Really? It would not be the first time I've overlooked a potentially useful ASH column. So I started my research.

You can see the px_flags column below.

SQL> desc v$active_session_history
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SAMPLE_ID                                          NUMBER
 SAMPLE_TIME                                        TIMESTAMP(3)
 IS_AWR_SAMPLE                                      VARCHAR2(1)
 SESSION_ID                                         NUMBER
 SESSION_SERIAL#                                    NUMBER
 SESSION_TYPE                                       VARCHAR2(10)
 ...
 QC_SESSION_SERIAL#                                 NUMBER
 PX_FLAGS                                           NUMBER
 EVENT                                              VARCHAR2(64)
 EVENT_ID                                           NUMBER
...

Using your AWR ASH table of choice, run the below query and take a close look. My results are shown below.

select distinct(program || '   '|| px_flags)
from   DBA_HIST_ACTIVE_SESS_HISTORY
where  program like '%(P%)%'
   or  px_flags > 0
/

(PROGRAM||'   '||PX_FLAGS)
-----------------------------------
...
oracle@oceanus (P004) 50397189
sqlplus@oceanus (TNS V1-V3)   327688
sqlplus@oceanus (TNS V1-V3)   393221
oracle@oceanus (P009)   50462738
...
oracle@oceanus (P044)   245497864
oracle@oceanus (P094)   245497864
oracle@oceanus (P087)   245497864
oracle@oceanus (P091)   245497864
...

18763 rows selected.

It turns out the px_flags column is super powerful in helping to understand a query's degree of parallelism (DOP) and also for understanding parallel operations. There are a number of very cool blog posts on this topic, so I will not attempt to redo their excellent work here.

More importantly, my objective is NOT to determine the DOP. My objective is to infer SQL statement run time. And, with a very simple and "always working" strategy.

Perhaps I can get creative with the px_flags column, but I would rather use another solution.
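
For completeness, here is a minimal sketch of what a px_flags-based filter might look like, assuming serial (non-PQ) samples carry a px_flags of 0 or NULL. Notice from my distinct listing above that even the sqlplus query coordinator rows appear to carry a non-zero px_flags, so a filter like this can undercount.

-- Hedged sketch: treat a px_flags of 0 or NULL as "not parallel".
-- Since query coordinator samples can also carry a non-zero px_flags,
-- treat this as an approximation, not a definitive PQ filter.
select count(*) all_samples,
       sum(case when nvl(px_flags,0) = 0 then 1 else 0 end) serial_samples
from   DBA_HIST_ACTIVE_SESS_HISTORY
where  sql_id = '&mysql_id'
/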

But there is a larger issue at hand that makes me want to explore other strategies.

According to my sources, the px_flags column may not be populated in Oracle versions earlier than 11.2.0.2. For some DBAs this may not be a roadblock, but for me it is an issue.

Like many of you, I work on many Oracle Database versions. While I suspect that, at the time of this writing, the majority of production Oracle systems are at version 11.2.0.2 or higher, there are plenty of large and stable production systems on earlier versions of Oracle. Many of those "early version" Oracle systems simply can't be upgraded because of application supportability or licensing.

So for me, I'm going to try to find a "better" solution.

And that's exactly what I did! Read on...

Using ASH SAMPLE_TIME To Infer Run Times

The max/min strategy is simply subtracting the minimum sample time from the maximum sample time for a specific SQL statement execution. Conceptually you can't get much simpler than this.

When I first started thinking about finding a strategy to infer SQL run times, I was not aware of the PQ active session issue. In fact, my first strategy was to simply use the minimum and maximum ASH sample times to estimate the run time.

You might be wondering why I did not use the max/min time strategy to begin with. Honestly, I'm not 100% sure. But I think it was because when I first needed to know SQL run times I was "time-lining" a specific active session, looking at its activity sample by sample. (See my ASH ScratchPad toolkit's Timeline script.)

This is not what I'm trying to do here. I want a list of run times for a specific SQL_ID, so I can do some data analysis, including machine learning anomaly detection. So my use case has changed.

Coming full circle now, it occurred to me to try the max/min sample time strategy again. It's just so crazy simple.

This strategy can be coded rather simply. Below is the code snippet followed by two example SQL_IDs.

def mysql_id='b62zukh71caj7'

col runtime format a25

select  (max(sample_time)-min(sample_time)) runtime,
        to_char(SQL_EXEC_START,'YYYY-Mon-DD HH24:MI:SS') start_date_time,
        sql_exec_id
from    OP_DBA_HIST_ACTV_SESS_HIST
where   sql_id = '&mysql_id'
group by to_char(SQL_EXEC_START,'YYYY-Mon-DD HH24:MI:SS'), sql_exec_id
order by 2,1
/

RUNTIME                   START_DATE_TIME      SQL_EXEC_ID
------------------------- -------------------- -----------
+000000000 00:00:50.142   2019-Feb-02 13:40:41    16777216
+000000000 00:00:50.202   2019-Feb-09 13:50:30    16777216
+000000000 00:01:00.210   2019-Feb-16 14:05:22    16777216
+000000000 00:00:40.170   2019-Jan-05 14:00:07    16777216
+000000000 00:01:00.206   2019-Jan-12 14:02:08    16777216
+000000000 00:01:40.350   2019-Jan-19 14:21:59    16777216
+000000000 00:01:40.354   2019-Jan-26 14:01:29    16777216

7 rows selected.

SQL> def mysql_id='8tjmvqptxpv4v'
SQL> /

RUNTIME                   START_DATE_TIME      SQL_EXEC_ID
------------------------- -------------------- -----------
+000000000 00:00:00.000   2019-Feb-06 20:02:16    16777296

Zero run time!? For SQL_ID 8tjmvqptxpv4v, all the associated ASH samples occurred at the same time. So the maximum sample_time minus the minimum sample_time is zero.

By the way, though I'm using AWR data with a sample frequency of once every 10 seconds, a single sample_time does not imply the execution took 10 seconds. It means the run time is greater than zero and less than 20 seconds. If the run time were 20 seconds or more, we would see at least two different sample times. But I digress...
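
To make that uncertainty explicit, here is a minimal sketch, assuming the standard 10-second AWR sample interval, that brackets each execution's run time using the number of distinct sample times. For the single-sample 8tjmvqptxpv4v execution it yields a lower bound of 0 and an upper bound of 20 seconds.

def mysql_id='8tjmvqptxpv4v'

-- Hedged sketch: with an assumed 10-second AWR sample interval, an
-- execution seen at n distinct sample times ran for more than
-- (n-1)*10 seconds and less than (n+1)*10 seconds.
select to_char(sql_exec_start,'YYYY-Mon-DD HH24:MI:SS') start_date_time,
       sql_exec_id,
       count(distinct sample_time)          sample_times,
       (count(distinct sample_time)-1) * 10 runtime_gt_sec,
       (count(distinct sample_time)+1) * 10 runtime_lt_sec
from   OP_DBA_HIST_ACTV_SESS_HIST
where  sql_id = '&mysql_id'
group by to_char(sql_exec_start,'YYYY-Mon-DD HH24:MI:SS'), sql_exec_id
/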

Human Readable (In Seconds) Please

I would like the inferred run time to be expressed in seconds. Because the sample_time is a timestamp column, the code is messy.

The way I coded below may seem strange. Personally, I like to develop my SQL from the inside out. That is, have all the data I need from the inner select and then wrap an outer select to do the formatting. This allows me to debug my code by first focusing on the content and then focusing on the format. It's just a personal preference of mine.

Here is my final code to infer SQL_ID run times for each execution in seconds, followed by two example SQL_IDs.

def mysql_id='b62zukh71caj7'

col runtime_sec format 99999990
select  start_date_time, sql_exec_id,
        -- convert the day-to-second interval into total seconds
        abs(extract( day from runtime ))*86400 +
        abs(extract( hour from runtime ))*3600 +
        abs(extract( minute from runtime ))*60 +
        abs(extract( second from runtime )) runtime_sec
from (
    select to_timestamp_tz(max(sample_time))-to_timestamp_tz(min(sample_time)) runtime,
           to_char(SQL_EXEC_START,'YYYY-Mon-DD HH24:MI:SS') start_date_time,
           sql_exec_id
    from   OP_DBA_HIST_ACTV_SESS_HIST
    where  sql_id='&mysql_id'
    group by to_char(SQL_EXEC_START,'YYYY-Mon-DD HH24:MI:SS'), sql_exec_id
)
order by 1,2
/

START_DATE_TIME      SQL_EXEC_ID RUNTIME_SEC
-------------------- ----------- -----------
2019-Feb-02 13:40:41    16777216          50
2019-Feb-09 13:50:30    16777216          50
2019-Feb-16 14:05:22    16777216          60
2019-Jan-05 14:00:07    16777216          40
2019-Jan-12 14:02:08    16777216          60
2019-Jan-19 14:21:59    16777216         100
2019-Jan-26 14:01:29    16777216         100

7 rows selected.

SQL> def mysql_id='8tjmvqptxpv4v'
SQL> /

START_DATE_TIME      SQL_EXEC_ID RUNTIME_SEC
-------------------- ----------- -----------
2019-Feb-06 20:02:16    16777296           0

Time-lining An Execution

When I do ASH analysis, it seems "stalking" a session is always necessary. The word "stalking" is a little creepy, so I typically use "timeline" or "timelining a session."

This particular situation is a little different. Instead of timelining a single session, I want to timeline all the sessions associated with a specific SQL_ID.

In my free ASH ScratchPad toolkit, I have a very simple "timeline" script that I use a lot. It's super flexible, simple and easily understood. My ASH ScratchPad is the source of the SQL below.

set tab off
set linesize 300
col sql_id format a15
col session_type format a15
col SESSION_STATE format a7 heading "STATE"
col program format a25 trunc
col sid format 9999
col event format a25 trunc
col module format a30

def ashCols="sql_id,event,program,module,to_char(SQL_EXEC_START,'MI:SS') BT,sql_exec_id"
--def ashWhere="sql_id='b62zukh71caj7'"
def ashWhere="sql_id='8tjmvqptxpv4v'"
def datasource='OP_DBA_HIST_ACTV_SESS_HIST'
def timingdetails='1=1'
def dbahistdetails='and 1=1'

select sample_id, to_char(sample_time,'DD HH24:MI:SS') ST,
       session_id sid, session_state,
       &ashCols
from   &datasource
where  &timingdetails
       &dbahistdetails
  and  &ashWhere
order by 1,3
/

If you recall from above, for SQL_ID 8tjmvqptxpv4v there is only one execution and its run time was "zero" seconds. The zero seconds is because all the associated ASH samples had the same sample_time. Looking at the below timeline, we can see this is true! And we can see all the PQ processes plus the main foreground process (SID 855). I love ASH analysis!!

 SAMPLE_ID ST            SID STATE   SQL_ID          EVENT                     PROGRAM                   MODULE                         BT    SQL_EXEC_ID
---------- ----------- ----- ------- --------------- ------------------------- ------------------------- ------------------------------ ----- -----------
  63459581 06 20:02:16   855 ON CPU  8tjmvqptxpv4v                             sqlplus@oceanus (TNS V1-V sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  1989 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P000)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2172 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P001)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2199 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P002)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2242 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P003)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2270 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P004)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2313 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P006)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2326 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P007)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2381 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P008)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2593 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P013)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2677 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P014)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2704 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P015)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2718 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P016)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2731 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P017)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296
  63459581 06 20:02:16  2747 ON CPU  8tjmvqptxpv4v                             oracle@oceanus (P018)     sqlplus@oceanus (TNS V1-V3)    02:16    16777296

15 rows selected.

For SQL_ID b62zukh71caj7, there were 7 executions, with the fastest starting at 2019-Jan-05 14:00:07 and running for around 40 seconds. I time-lined all the session activity for this SQL_ID, but only the 40 second execution is shown below.

SQL> def ashWhere="sql_id='b62zukh71caj7'"
SQL> /

 SAMPLE_ID ST            SID STATE   SQL_ID          EVENT                     PROGRAM                   MODULE                         BT    SQL_EXEC_ID
---------- ----------- ----- ------- --------------- ------------------------- ------------------------- ------------------------------ ----- -----------
  60691552 05 14:00:13    10 WAITING b62zukh71caj7   direct path read          oracle@oceanus (P014)     DBMS_SCHEDULER                 00:07    16777216
  60691552 05 14:00:13    65 ON CPU  b62zukh71caj7                             oracle@oceanus (P017)     DBMS_SCHEDULER                 00:07    16777216
  60691552 05 14:00:13    74 ON CPU  b62zukh71caj7                             oracle@oceanus (P018)     DBMS_SCHEDULER                 00:07    16777216
  60691552 05 14:00:13    97 ON CPU  b62zukh71caj7                             oracle@oceanus (P019)     DBMS_SCHEDULER                 00:07    16777216
  60691552 05 14:00:13   115 WAITING b62zukh71caj7   direct path read          oracle@oceanus (P020)     DBMS_SCHEDULER                 00:07    16777216
  60691552 05 14:00:13   137 ON CPU  b62zukh71caj7                             oracle@oceanus (P021)     DBMS_SCHEDULER                 00:07    16777216
  60691552 05 14:00:13   150 ON CPU  b62zukh71caj7                             oracle@oceanus (P022)     DBMS_SCHEDULER                 00:07    16777216
  60691552 05 14:00:13   180 ON CPU  b62zukh71caj7                             oracle@oceanus (P023)     DBMS_SCHEDULER                 00:07    16777216
  60691552 05 14:00:13  2991 ON CPU  b62zukh71caj7                             oracle@oceanus (P012)     DBMS_SCHEDULER                 00:07    16777216
  60691552 05 14:00:13  3008 ON CPU  b62zukh71caj7                             oracle@oceanus (P013)     DBMS_SCHEDULER                 00:07    16777216
  60691562 05 14:00:23    65 ON CPU  b62zukh71caj7                             oracle@oceanus (P017)     DBMS_SCHEDULER                 00:07    16777216
  60691562 05 14:00:23    74 ON CPU  b62zukh71caj7                             oracle@oceanus (P018)     DBMS_SCHEDULER                 00:07    16777216
  60691562 05 14:00:23    97 ON CPU  b62zukh71caj7                             oracle@oceanus (P019)     DBMS_SCHEDULER                 00:07    16777216
  60691562 05 14:00:23   115 ON CPU  b62zukh71caj7                             oracle@oceanus (P020)     DBMS_SCHEDULER                 00:07    16777216
  60691562 05 14:00:23   150 ON CPU  b62zukh71caj7                             oracle@oceanus (P022)     DBMS_SCHEDULER                 00:07    16777216
  60691572 05 14:00:33    65 ON CPU  b62zukh71caj7                             oracle@oceanus (P017)     DBMS_SCHEDULER                 00:07    16777216
  60691572 05 14:00:33    74 ON CPU  b62zukh71caj7                             oracle@oceanus (P018)     DBMS_SCHEDULER                 00:07    16777216
  60691572 05 14:00:33   115 ON CPU  b62zukh71caj7                             oracle@oceanus (P020)     DBMS_SCHEDULER                 00:07    16777216
  60691572 05 14:00:33   150 ON CPU  b62zukh71caj7                             oracle@oceanus (P022)     DBMS_SCHEDULER                 00:07    16777216
  60691582 05 14:00:43    65 WAITING b62zukh71caj7   direct path read          oracle@oceanus (P017)     DBMS_SCHEDULER                 00:07    16777216
  60691582 05 14:00:43    74 ON CPU  b62zukh71caj7                             oracle@oceanus (P018)     DBMS_SCHEDULER                 00:07    16777216
  60691582 05 14:00:43   115 ON CPU  b62zukh71caj7                             oracle@oceanus (P020)     DBMS_SCHEDULER                 00:07    16777216
  60691582 05 14:00:43   150 ON CPU  b62zukh71caj7                             oracle@oceanus (P022)     DBMS_SCHEDULER                 00:07    16777216
  60691592 05 14:00:53    74 ON CPU  b62zukh71caj7                             oracle@oceanus (P018)     DBMS_SCHEDULER                 00:07    16777216
...

Here are some observations regarding the above b62zukh71caj7 execution: the first sample lands at 14:00:13 with ten PQ processes active, the activity thins out over the following samples, and the last sample lands at 14:00:53. That max/min difference of around 40 seconds matches the inferred run time shown earlier.

It appears that inferring SQL run times using the maximum and minimum sample times works. Better yet, it works regardless of the Oracle version, it is conceptually simple, and the code is simple and easily customized for data analysis use.

Did someone say, "data analysis"? Read on...

Let's Do Some Data Analysis

With just a little data analysis effort, we can increase the value of our analysis.

It only takes a few commands to spot unusual or unexpected activity, see trends, and answer questions with actual data and visuals. That's super powerful.

I'm going to do some analysis on SQL_ID b62zukh71caj7, because it has a few samples to work with. First, I need to get the data into a comma-delimited format so I can easily paste it into R.

If you look closely at the SQL code below, you might notice that I only changed two lines from the "final" SQL run time statement above. The change is related to the SQL*Plus define, mydelim. I also added commands to turn off (and then back on) the display of the start_date_time and sql_exec_id columns. This is a slick trick I use so I can use the same SQL but show different outputs.

def mysql_id='b62zukh71caj7'

set heading off
set verify off

col start_date_time noprint
col sql_exec_id noprint
def mydelim=','

col runtime_sec format 99999990
select  start_date_time, sql_exec_id,
        -- convert the day-to-second interval into total seconds
        abs(extract( day from runtime ))*86400 +
        abs(extract( hour from runtime ))*3600 +
        abs(extract( minute from runtime ))*60 +
        abs(extract( second from runtime ))||'&mydelim' runtime_sec
from (
   select  to_timestamp_tz(max(sample_time))-to_timestamp_tz(min(sample_time)) runtime,
           to_char(SQL_EXEC_START,'YYYY-Mon-DD HH24:MI:SS') start_date_time,
           sql_exec_id
    from   OP_DBA_HIST_ACTV_SESS_HIST
    where  sql_id='&mysql_id'
    group by to_char(SQL_EXEC_START,'YYYY-Mon-DD HH24:MI:SS'), sql_exec_id
)
order by 1,2
/

50.142,
50.202,
60.21,
40.17,
60.206,
100.35,
100.354,

7 rows selected.

SQL> col start_date_time print
SQL> col sql_exec_id print
SQL> def mydelim=''

Below is all the R code I need for this analysis. Does it look like I copied and pasted the SQL run time values directly into R? Well... That's because I did!

mydata <- c(50.142,
50.202,
60.21,
40.17,
60.206,
100.35,
100.354)
summary(mydata)
plot(density(mydata))
abline(v=mean(mydata), col="blue")
abline(v=median(mydata), col="red")
abline(v=quantile(mydata,.90))

Here is what you see in the R console.

> mydata <- c(50.142,
+             50.202,
+             60.21,
+             40.17,
+             60.206,
+             100.35,
+             100.354)
> summary(mydata)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  40.17   50.17   60.21   65.95   80.28  100.35 
> plot(density(mydata))
> abline(v=mean(mydata), col="blue")
> abline(v=median(mydata), col="red")
> abline(v=quantile(mydata,.90))

The below "smooth histogram" image is the result of the above plot and the three abline commands.

Here's a quick summary of the (very small) seven-sample data set. The run time range is from around 40 to 100 seconds. While the average is around 66 seconds, the median (50th percentile) is around 60 seconds. This indicates our data is skewed, as the mean is "pulled" away from the median. This is very typical of Oracle performance data. While 50% of our samples ran in less than around 60 seconds, 90% of our run times were less than around 100 seconds.
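
By the way, if pasting into R is not convenient, the same summary statistics can be computed directly in the database. Here is a minimal sketch using Oracle's MEDIAN and PERCENTILE_CONT aggregate functions over the inferred per-execution run times.

-- Hedged sketch: mean, median and 90th percentile of the inferred
-- run times, computed in SQL rather than R.
select count(*)                     execs,
       round(avg(runtime_sec),1)    mean_sec,
       round(median(runtime_sec),1) median_sec,
       round(percentile_cont(0.90) within group (order by runtime_sec),1) p90_sec
from (
    select extract(day    from runtime)*86400 +
           extract(hour   from runtime)*3600  +
           extract(minute from runtime)*60    +
           extract(second from runtime)       runtime_sec
    from (
        select max(sample_time)-min(sample_time) runtime
        from   OP_DBA_HIST_ACTV_SESS_HIST
        where  sql_id = '&mysql_id'
        group by to_char(sql_exec_start,'YYYY-Mon-DD HH24:MI:SS'), sql_exec_id
    )
)
/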

There is much more we could do here, but this is enough for this article.

To Summarize

So it appears inferring SQL run times using the maximum and minimum sample times works wonderfully. Better still, it works regardless of the Oracle version or PQ activity, it is conceptually simple, and the code is simple and easily customizable for data analysis.

I hope this article has encouraged you to use ASH in interesting and unusual ways, because what I show in this article is just the beginning! If you can report on run times, then you can also report run times within a specific time range, or see which modules or users usually run the SQL. Only our creativity will limit what we can do with ASH analysis.

If you want to make ASH a part of your performance analysis arsenal, then check out my live virtual class, Tuning Oracle Using Advanced ASH Strategies.

All the best in your Oracle tuning and machine learning work,

Craig.

If you have any questions or comments, feel free to email me directly at craig at orapub.com.
