Understanding How An Intense And Diverse SQL Workload Causes Parsing Problems

Posted on 26-Jan-2017 by Craig Shallahamer / OraPub / craig@orapub.com

This post is about how an intense and diverse SQL statement workload can cause Oracle parsing problems. Big problems.

There are many reasons for Oracle parsing related performance issues and many ways to identify them. So, I am narrowing the scope by focusing on one possible cause, that is, an intense and diverse SQL statement workload. I will use Oracle's wait interface to identify the root cause.

Sound dry and boring? To make this a little more interesting and a whole lot more fun, I'm going to approach this from a non-traditional, almost backward perspective. After explaining some of the Oracle internals, I'm going to create a parsing intensive workload and then watch what happens... in real time. The educational impact is so powerful, I recorded a video (scroll to the bottom).

Enjoy the read... and the video!

About Parsing

When a client|user process submits SQL or PLSQL to the database, before it can be executed, it must be parsed. So, parsing had better happen fast and consume minimal resources.

Parsing requires a number of actions take place and in the proper order. For example, there are various checks related to syntax, semantics and permissions. There are memory management tasks such as looking for SQL in the library cache and also when creating cursors and child cursors.

Did you notice that all of this requires CPU? This is why "too much" parsing activity can consume so much CPU that you can end up with a CPU bottleneck, which makes a bad situation even worse.

So, if there is "too much" parsing, there will likely be a lack of available CPU resources and Oracle latching and mutex contention issues. If the situation becomes completely out of control, there could be out of shared pool memory errors or even an instance freeze. Great stuff if you enjoy solving Oracle performance problems, but bad stuff from an organization/business perspective.

The Three User Calls

There are three calls that a user|client process can make to a foreground process. They are parse, execute and fetch. So parsing can be simplified to say that it's everything up until the execution phase begins. A gross simplification for sure.

By the way, the instance statistic "user calls" is incremented anytime one of these three calls is received by a foreground process. If you want to dig into user calls a little deeper, I blogged about them HERE and HERE.

Unique SQL Statements and SQL_IDs

There are many ways to force Oracle library cache problems. My focus today is on pounding Oracle with tons of unique SQL statements, each having a unique SQL_ID.

Another perspective is pounding Oracle with a massive diversity of SQL statements that result in unique SQL_IDs.

In this post, I will not go into strategies related to reducing the number of unique SQL_IDs.

Take a close look at the below SQL. My objective is to dynamically create three similar, yet unique SQL statements.

SQL> set echo on verify on tab off
SQL> set linesize 150
SQL> col sql_text format a95
SQL> 
SQL> alter system flush shared_pool;

System altered.

SQL> drop table toomuch;

Table dropped.

SQL> create table toomuch ( c1 number );

Table created.

SQL> insert into toomuch select level just_a_column from dual connect by level <= 50000;

50000 rows created.

SQL> commit;

Commit complete.

SQL> select count(*) from toomuch;

  COUNT(*)
----------
     50000

SQL> select 'select '||dbms_random.value(0,9999999)||' from toomuch where c1 = 1' from dual;

'SELECT'||DBMS_RANDOM.VALUE(0,9999999)||'FROMTOOMUCHWHEREC1=1'
-------------------------------------------------------------------------
select 3368727.13606693486484336772687471926489 from toomuch where c1 = 1

SQL> /

'SELECT'||DBMS_RANDOM.VALUE(0,9999999)||'FROMTOOMUCHWHEREC1=1'
-------------------------------------------------------------------------
select 4443872.66702328733095306687574442351562 from toomuch where c1 = 1

SQL> /

'SELECT'||DBMS_RANDOM.VALUE(0,9999999)||'FROMTOOMUCHWHEREC1=1'
-------------------------------------------------------------------------
select 7250473.28864361529432819773982295315706 from toomuch where c1 = 1

Next, I am going to run each of the three generated "similar" SQL statatements. No tricks, just a straightforward copy and paste into SQLPlus.

SQL> select 3368727.13606693486484336772687471926489 from toomuch where c1 = 1;

3368727.13606693486484336772687471926489
----------------------------------------
                              3368727.14

SQL> select 4443872.66702328733095306687574442351562 from toomuch where c1 = 1;

4443872.66702328733095306687574442351562
----------------------------------------
                              4443872.67

SQL> select 7250473.28864361529432819773982295315706 from toomuch where c1 = 1;

7250473.28864361529432819773982295315706
----------------------------------------
                              7250473.29

Now comes the interesting part. Look at the output from the below SQL statement pulling from v$sql. Focus in on our three similar SQL statements. Notice anything about their SQL_IDs?


SQL> select sql_id, sql_text
  2  from   v$sql
  3  where  lower(sql_text) like '%from toomuch%'
  4    and  sql_text not like '%v$sql%';

SQL_ID        SQL_TEXT
------------- -----------------------------------------------------------------------------------------------
60yv8s17u4992 select 7250473.28864361529432819773982295315706 from toomuch where c1 = 1
cdmcvdn7uwm66 select count(*) from toomuch
2zu2t949nx3j1 select 'select '||dbms_random.value(0,9999999)||' from toomuch where c1 = 1' from dual
8kng37shkpwwq select 3368727.13606693486484336772687471926489 from toomuch where c1 = 1
b1h92czr43b2a select 4443872.66702328733095306687574442351562 from toomuch where c1 = 1

SQL> 

Our three similar SQL statements each have a different SQL_ID. What this means is, each of the SQL_IDs will have their own cursor within the shared pool's library cache.

What I write below is super important to understand.

Because these three SQL statements have never been run before, they did not reside in the library cache. And, they will likely never be run again! Therefore, our Oracle foreground process must do additional parsing related work, called "hard parsing," to get our SQL properly in the library cache. If these SQL statements had already resided in the library cache, the associated CPU and memory management would be significantly reduced, resulting in a "soft parse." (There is a softer and no-parse, but that's a conversation for a later time.)

Not convinced that having the SQL already in the library cache would make a significant performance difference? Check this out .

Now it's time to think about making the library cache angry.

Make Your Library Cache Angry: Setup

Now consider not just three statements but hundreds or thousands. You may say it will never happen. To be blunt... you're wrong. It does happen and it happens in production Oracle systems.

Look at the below script.

$ cat angry.sh

sqlplus system/manager <<EOF
declare
  sql_var varchar2(500);
begin
  while 1 = 1
  loop
    sql_var := 'select c1 from toomuch where c1 = '||dbms_random.value(0,9999999) ;
    execute immediate sql_var ;
    dbms_lock.sleep(0.01);
  end loop;
end;
/
EOF

The above script simply creates and executes an endless supply of similar yet unique SQL statements. (Yes, I realize there could be duplicates.)

Running the script will likely be undetectable, consuming very little resources. But that's not my plan. My plan is to launch this script in the background multiple times! This is how you make your library cache angry!

Make Your Library Cache Angry: Get Ready

When life is good and the database is humming along there is plenty of available CPU resources and Oracle sessions do not have to wait for anything... just milking the CPU subsystem for resources.

Below is the result from my realtime session sampler (rss.sql) that is contained in my free OSM Toolkit. The columns, left to right, are a sequence-like number, sample time, session ID, session serial number, Oracle user, session state (CPU, WAITING), SQL_ID, wait event (if waiting) and other stuff.

In this example, I am sampling all active Oracle sessions once every 0.25 seconds. The workload is very light because only a couple angry.sh scripts are running. Notice all the active sessions are consuming CPU, and therefore no sessions are "waiting."


190 11:42:47:764   401 28120     SYSTEM CPU  52zu30zuscaaw        -     -                              [,,]
190 11:42:48:015    15 12091     SYSTEM CPU  3u565sybjb9zh        -     -                              [,,]
190 11:42:48:266   254 27097     SYSTEM CPU  3qxun4q1zp54n        -     -                              [,,]
190 11:42:48:517   322 29824     SYSTEM CPU  d8sx36rhqm31m        -     -                              [,,]
190 11:42:48:768   412 45369     SYSTEM CPU  6zcs88udqq41b        -     -                              [,,]
190 11:42:49:019   412 45369     SYSTEM CPU  fst6zwt7mfcqw        -     -                              [,,]
190 11:42:49:019   333 49784     SYSTEM CPU  f6pj32wcuy1d0        -     -                              [,,]
190 11:42:49:019   324 31501     SYSTEM CPU  0k4xj6zha67pd        -     -                              [,,]
190 11:42:49:019    86 30715     SYSTEM CPU  g625nbz82r4rd        -     -                              [,,]
190 11:42:49:270   333 49784     SYSTEM CPU  dmun9dstqq8zj        -     -                              [,,]
191 11:42:49:521   335  2765     SYSTEM CPU  9ra4g5b1s26fb        -     -                              [,,]
191 11:42:49:521   180 50922     SYSTEM CPU  fca7a1v14mf63        -     -                              [,,]
191 11:42:49:772   411 52258     SYSTEM CPU  72hj0fr4b5wfx        -     -                              [,,]
191 11:42:49:772   405 51108     SYSTEM CPU  g625nbz82r4rd        -     -                              [,,]
191 11:42:49:772   400 27805     SYSTEM CPU  dv6ypf4paa8wq        -     -                              [,,]

Below is what the Linux OS was experiencing based on a vmstat report (vmstat 3 9999). There are 6 CPU cores in the bare metal server. The far left "r" columns shows the average CPU run queue is much less than 6, indicating there is plenty of CPU resources. And, the average CPU utilization is around 7% ("us" + "sy" columns). So, the CPU subsystem looks great!


procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 5789780 284132 17808136    0    0     0    11  544  253  7  0 93  0  0	
 0  0      0 5789780 284132 17808136    0    0     0    16  542  248  7  0 93  0  0	
 1  0      0 5790276 284132 17808136    0    0     0    17  702  308  9  0 91  0  0	
 0  0      0 5790276 284132 17808136    0    0     0    16  535  272  7  0 93  0  0	
 0  0      0 5790276 284140 17808136    0    0     0    16  527  239  7  0 93  0  0	
 1  0      0 5790276 284140 17808136    0    0     0    13  524  228  7  0 93  0  0	
 0  0      0 5790276 284140 17808136    0    0     0    16  520  245  7  0 93  0  0	
 1  0      0 5790276 284140 17808136    0    0     0    11  529  235  7  0 93  0  0	
 1  0      0 5790276 284144 17808136    0    0     0    16  530  237  7  0 93  0  0

Make Your Library Cache Angry: Get Angry

The above looks great... until the workload changes. I personally altered the workload by firing off 20 simultaneous "angry" sessions which generate over a thousand unique SQL statements every few seconds. Below is the result of me doing a simple count related to my "angry" SQL statements every second or so.


SQL> select count(*) from v$sql where lower(sql_text) like '%from toomuch%' and sql_text not like '%v$sql%';

  COUNT(*)
----------
      6921

SQL> /

  COUNT(*)
----------
     10392

SQL> /

  COUNT(*)
----------
     12398

SQL> /

  COUNT(*)
----------
     14847

SQL> /

  COUNT(*)
----------
     17998

Each Oracle session is generating hundreds of unique SQL statements every few seconds. To complete this work, each Oracle session must create new cursors and child cursors which require CPU and memory resources along with Oracle latch/mutex acquisitions. That is fine until there are lots of sessions doing this same thing... over and over.

At some point, the combination of workload, Oracle memory serialization requirements and OS resources is just too much. The result is lots of Oracle sessions waiting before they can consume CPU to get on with executing their SQL statement.

Below is the result from my realtime session sampler during the mayhem. As you can see, the activity profile is very different... lots of waiting for library cache mutexes and shared pool latches.


203 11:43:19:647   401 28120     SYSTEM WAIT 82kb2uku8wp47        other library cache: mutex X         [74979849,64424509440,485997024444417]
203 11:43:19:647   336 63081     SYSTEM WAIT 1zkzf3j26ugg3        other library cache: mutex X         [74979849,64424509440,485997024444417]
203 11:43:19:647   333 49784     SYSTEM WAIT d24u3kfd7hrhp        other library cache: mutex X         [74979849,64424509440,485997024444417]
203 11:43:19:647   324 31501     SYSTEM WAIT 5vw6uhbwsm19h        other library cache: mutex X         [74979849,64424509440,485997024444522]
203 11:43:19:647    83 46586     SYSTEM WAIT cxs8tjq89qxgv        other library cache: mutex X         [74979849,64424509440,485997024444417]
203 11:43:19:647     9 31992     SYSTEM WAIT 4hbtgmr7st7j3        other library cache: mutex X         [74979849,64424509440,485997024444522]
203 11:43:19:898   166 28604     SYSTEM CPU  5z4yaucgshgpx        -     -                              [,,]
203 11:43:20:149   412 45369     SYSTEM CPU  8cfkahv227q4w        -     -                              [,,]
203 11:43:20:400   412 45369     SYSTEM WAIT 70ug99n8xh9vk        other latch: shared pool             [1612000384,453,0]
203 11:43:20:651   333 49784     SYSTEM CPU  8pkntfm7v8w6r        -     -                              [,,]
203 11:43:20:902   335  2765     SYSTEM CPU  9ra4g5b1s26fb        -     -                              [,,]
203 11:43:21:153   258 53179     SYSTEM CPU  1xfwcurvsa1dz        -     -                              [,,]
203 11:43:21:153    15 12091     SYSTEM WAIT 79dqx1b39ttbw        other latch: shared pool             [1612000544,453,0]
203 11:43:21:153     9 31992     SYSTEM CPU  c08uwsrzduf65        -     -                              [,,]
203 11:43:21:404   400 27805     SYSTEM WAIT g625nbz82r4rd        other latch: shared pool             [1612000384,453,0]

Take a look at the vmstat output below. Wow! The CPU run queue is over 10, and remember the bare metal box only has 6 physical CPU cores. So, it makes sense the average CPU utilization is 99%. Ouch!


procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
14  0      0 5562620 284624 17808272    0    0     0    11 5987 1485 99  0  1  0  0	
 7  0      0 5562620 284632 17808272    0    0     0    20 6005 1509 99  0  1  0  0	
10  0      0 5562588 284632 17808272    0    0     0    11 6024 1493 99  0  1  0  0	
15  0      0 5562588 284640 17808272    0    0     0    20 6006 1502 99  0  1  0  0	
11  0      0 5562588 284640 17808272    0    0     0    11 5999 1481 99  0  1  0  0	
13  0      0 5562596 284648 17808272    0    0     0    28 6019 1508 99  0  1  0  0

Sit On The Couch And Watch: Video

In this video, I'm going to do exactly what I wrote about above. That is, cause shared pool latching and library cache mutex contention by having multiple sessions quickly flooding Oracle with thousands and thousands of unique SQL statements.

What Have We Learned?

What are the lessons learned here?

Of course understanding and correctly diagnosing the problem does not solve the problem. However, most DBAs struggle with performance tuning because they do not correctly diagnose the problem.

While I purposely did not get into solving the problem, the initial solution is to directly attack the cause of the problem. That is, reduce the number of concurrent Oracle sessions, reduce the number of unique SQL statements and increase the CPU capacity. There are a number of other options as well.

I hope you enjoy the video and thanks for reading!

Craig.

If you have any questions or comments, feel free to email me directly at craig at orapub.com.

IOUG 2016 Is Different: Here's How To Get Your Abstract Accepted More On Oracle Database User Calls Hmm... Oracle Database Cloud Vendors Want Efficient Systems